Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell...
![Page 2: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/2.jpg)
The problem
Binning: clustering sequences with the same origin together
A corner piece? GREAT! But where is the rest of the puzzle?
Drew Sheneman, New Jersey -- The Newark Star Ledger
![Page 3: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/3.jpg)
Potato processing wastewater treatment plant at Olburgen, The Netherlands
Stable system operated since 2006
Images (left and middle): Abma et al., Water Science & Technology (2010)
Study site
![Page 4: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/4.jpg)
Sampling strategy: 8 samples
Nitritation/anammox reactor (600 m³); depth labels in the figure: 0.2 m, 1.4 m, 2.6 m, 3.8 m, 5.0 m
Figure columns: sample location, sample treatment (total sample or washed granules), DNA isolation (organic extraction or PowerSoil kit)
![Page 5: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/5.jpg)
Data handles
Sequence composition
Prior knowledge (Databases)
Sequence abundance
Mate pair & Paired end
![Page 6: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/6.jpg)
Data handles: mate pair and paired end
![Page 7: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/7.jpg)
Data handles: mate pair and paired end
![Page 8: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/8.jpg)
Data handles: databases
![Page 9: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/9.jpg)
Data handles: composition
Limited chemical signature
Biological information
- Codon usage (tetramer frequency)
‘Unique’ long k-mers
Contig/read length matters!
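The tetramer signature mentioned above can be sketched as a 256-element frequency vector per sequence. This is a minimal illustration, not the procedure of any particular binning tool; the function name `tetranucleotide_frequencies` is an assumption, and reverse complements are not merged here.

```python
from collections import Counter
from itertools import product

# All 256 possible tetramers over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each of the 256 tetramers in seq."""
    seq = seq.upper()
    # Slide a window of width 4 over the sequence.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only count windows made of A, C, G, T (skips N and other ambiguity codes).
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


freqs = tetranucleotide_frequencies("ACGTACGT")
print(freqs[TETRAMERS.index("ACGT")])
```

A sequence of length n contributes only n - 3 windows to 256 possible tetramers, which is one way to see why contig or read length matters: short sequences give sparse, noisy profiles.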
![Page 10: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/10.jpg)
DNA isolation and library preparation
sequencing and assembly
Data handles: abundance
Abundance in the sample correlates with abundance in reads
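As a rough illustration of the abundance handle, mean sequencing depth of a contig can be approximated from the number of reads mapped to it. The sketch below assumes fixed-length reads and ignores edge effects and partial alignments; `mean_coverage` is an illustrative name, not a function from any tool mentioned in this talk.

```python
def mean_coverage(mapped_reads, read_length, contig_length):
    """Approximate mean sequencing depth: total mapped bases / contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return mapped_reads * read_length / contig_length


# 2000 reads of 100 bp on a 10 kb contig.
print(mean_coverage(2000, 100, 10_000))
```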
![Page 11: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/11.jpg)
Many roads try to get to Rome
Reference based and reference independent binning methods
Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Briefings in Bioinformatics 13, 669–681 (2012).
![Page 12: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/12.jpg)
Many roads try to get to Rome
Composition:
- GC content
- Tetranucleotide frequencies
Abundance:
- Long k-mer copy number
- Contig coverage
Content:
- Essential single copy genes
Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Briefings in Bioinformatics 13, 669–681 (2012).
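Essential single copy genes are expected once per genome, so counting them in a bin gives a rough check on completeness and on duplicated copies. A minimal sketch, assuming the marker genes have already been identified on the contigs of a bin; `marker_summary` and the marker names used in the example are illustrative, not taken from the slides.

```python
from collections import Counter


def marker_summary(marker_hits, marker_set):
    """Summarise single copy marker genes found in one bin.

    marker_hits: gene names found on the bin's contigs (may repeat).
    marker_set:  the set of marker genes expected once per genome.
    """
    counts = Counter(h for h in marker_hits if h in marker_set)
    return {
        # Fraction of expected markers seen at least once.
        "completeness": len(counts) / len(marker_set),
        # Copies beyond the first; many of these suggest a mixed bin.
        "duplicated_copies": sum(n - 1 for n in counts.values()),
    }


markers = {"rpoB", "recA", "gyrB", "rpsC"}
hits = ["rpoB", "recA", "recA", "gyrB", "unknown"]
print(marker_summary(hits, markers))
```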
![Page 13: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/13.jpg)
Binning approaches
(This is not an exhaustive list…)
Assembly independent read binning
Binning on GC content and coverage
Tetranucleotide ESOM
Differential coverage based binning
- Nucleotide extraction bias
- Different samples
Hi-C metagenomics
![Page 15: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/15.jpg)
Assembly independent binning
Wang, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics 28, i356–i362 (2012).
T = long k-mer abundance
w = long k-mer length
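To make the roles of w and T concrete, the sketch below counts all k-mers of length w across a read set and separates reads that contain at least one k-mer seen T or more times. Treating T as a threshold is an assumption made for illustration, and this is not MetaCluster's actual procedure; the function names are hypothetical, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, w):
    """Count every k-mer of length w across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - w + 1):
            counts[read[i:i + w]] += 1
    return counts


def split_by_kmer_abundance(reads, w, T):
    """Split reads into (high, low) by whether any w-mer occurs >= T times."""
    counts = count_kmers(reads, w)
    high, low = [], []
    for read in reads:
        kmers = (read[i:i + w] for i in range(len(read) - w + 1))
        if any(counts[k] >= T for k in kmers):
            high.append(read)
        else:
            low.append(read)
    return high, low


reads = ["ACGTACGTAA", "CGTACGTAAC", "TTTTGGGGCC"]
high, low = split_by_kmer_abundance(reads, w=8, T=2)
print(high, low)
```

Long k-mers are rarely shared between unrelated genomes, so reads sharing one are likely to come from the same organism, and how often a long k-mer recurs reflects that organism's abundance.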
![Page 17: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/17.jpg)
Separating genomes: binning
Binning based on coverage and GC content
Plot axes: sequencing depth (vertical) against GC content (horizontal)
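Binning on these two axes amounts to selecting the contigs that fall inside a region of the GC versus depth plot. The sketch below uses a simple rectangular gate; the function names and the input layout (a dict of name to sequence and depth) are assumptions for illustration, and in practice the region is usually drawn by eye on the plot.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def select_bin(contigs, gc_range, depth_range):
    """Return names of contigs inside a rectangular GC/depth gate.

    contigs: dict mapping contig name -> (sequence, sequencing depth).
    """
    gc_lo, gc_hi = gc_range
    d_lo, d_hi = depth_range
    return sorted(
        name
        for name, (seq, depth) in contigs.items()
        if gc_lo <= gc_content(seq) <= gc_hi and d_lo <= depth <= d_hi
    )


contigs = {
    "c1": ("GGCCGGCCAT", 50.0),
    "c2": ("GGCCGCGCAA", 55.0),
    "c3": ("ATATATATGC", 52.0),  # similar depth, different GC
    "c4": ("GGCCGGCCTT", 5.0),   # similar GC, different depth
}
print(select_bin(contigs, gc_range=(0.7, 0.9), depth_range=(40, 60)))
```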
![Page 19: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/19.jpg)
Binning: tetranucleotide ESOM
Dick, G. J., Andersson, A. F., Baker, B. J. & Simmons, S. L. Community-wide analysis of microbial genome sequence signatures. Genome Biology (2009).
![Page 20: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/20.jpg)
Emergent Self Organizing Map (ESOM) based on tetranucleotide frequency
Binning: tetranucleotide ESOM
Dick, G. J., Andersson, A. F., Baker, B. J. & Simmons, S. L. Community-wide analysis of microbial genome sequence signatures. Genome Biology (2009).
![Page 22: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/22.jpg)
Using nucleotide extraction bias to separate organisms
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31, 533–538 (2013).
Binning: differential coverage binning
http://madsalbertsen.github.io/multi-metagenome/
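The idea behind differential coverage binning can be sketched as follows: contigs from the same organism should have similar coverage in every sample, while two organisms that happen to match in one sample will usually differ in another. This is an illustration only, not the multi-metagenome workflow linked above; the function names, the log2 tolerance, and the pseudocount of 1 are assumptions.

```python
import math


def same_population(cov_a, cov_b, tolerance=0.5):
    """True if two coverage profiles agree within `tolerance` log2 units in every sample."""
    return all(
        abs(math.log2(a + 1) - math.log2(b + 1)) <= tolerance
        for a, b in zip(cov_a, cov_b)
    )


def bin_around_seed(coverages, seed, tolerance=0.5):
    """Return contigs whose coverage profile matches the seed contig's profile."""
    seed_cov = coverages[seed]
    return sorted(
        name
        for name, cov in coverages.items()
        if same_population(cov, seed_cov, tolerance)
    )


# Coverage per contig in (extraction 1, extraction 2).
coverages = {"c1": (40, 10), "c2": (44, 9), "c3": (42, 41), "c4": (11, 10)}
print(bin_around_seed(coverages, "c1"))

# With extraction 1 alone, c3 cannot be told apart from c1 and c2.
single = {name: cov[:1] for name, cov in coverages.items()}
print(bin_around_seed(single, "c1"))
```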
![Page 24: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/24.jpg)
Differential coverage binning: crAss
![Page 25: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/25.jpg)
Differential coverage binning: GroopM
http://minillinim.github.io/GroopM/
Imelfort, M., Parks, D., Woodcroft, B. J. & Dennis, P. GroopM: An automated tool for the recovery of population genomes from related metagenomes. (2014).
![Page 26: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/26.jpg)
Differential coverage binning: CONCOCT
Alneberg, J. et al. CONCOCT: Clustering cONtigs on COverage and ComposiTion. (2013).
![Page 27: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/27.jpg)
Differential coverage binning: ESOM
Kantor, R. S. et al. Small Genomes and Sparse Metabolisms of Sediment-Associated Bacteria from Four Candidate Phyla. mBio 4, e00708-13 (2013).
![Page 28: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/28.jpg)
Differential coverage binning: ESOM
Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol 32, 822–828 (2014).
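The cited Nielsen et al. paper groups genes whose abundance co-varies across many samples. A much simplified sketch of that idea is to collect everything whose abundance profile correlates strongly with a seed profile. This is not the published clustering procedure; the function names and the 0.9 threshold are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return sxy / math.sqrt(sxx * syy)


def co_abundant(profiles, seed, min_r=0.9):
    """Return names whose profile correlates with the seed at r >= min_r."""
    ref = profiles[seed]
    return sorted(
        name for name, prof in profiles.items() if pearson(prof, ref) >= min_r
    )


# Abundance of four genes across four samples.
profiles = {
    "g1": [10, 20, 5, 40],
    "g2": [11, 19, 6, 42],
    "g3": [40, 5, 20, 10],
    "g4": [5, 10, 2.5, 20],
}
print(co_abundant(profiles, "g1"))
```

The more samples are available, the less likely it is that two unrelated organisms share the same abundance profile by chance.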
![Page 30: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/30.jpg)
Determining what belongs together by crosslinking total cell content
1) Crosslink
2) Cut DNA
3) Religate randomly
4) Sequence paired end library of both crosslinked and native sample
Binning: Hi-C metagenomics
Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. (2014). doi:10.7287/peerj.preprints.260v1
![Page 31: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/31.jpg)
Clustering by organism (and even replicon!)
Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. (2014). doi:10.7287/peerj.preprints.260v1
Binning: Hi-C metagenomics
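Because crosslinked and religated fragments come from the same cell, read pairs that map to two different contigs are evidence that those contigs belong together. A minimal sketch of turning such links into clusters, assuming read pairs have already been mapped to contig names; it uses plain connected components with a link-count cutoff, which is not the clustering method of the cited preprint, and `cluster_by_contacts` and `min_links` are illustrative names.

```python
from collections import Counter


def cluster_by_contacts(contigs, read_pairs, min_links=2):
    """Group contigs connected by at least `min_links` Hi-C read pairs.

    contigs:    list of contig names (every name in read_pairs must be listed).
    read_pairs: iterable of (contig_a, contig_b), one entry per read pair.
    """
    # Count links between distinct contigs, ignoring pair orientation.
    links = Counter()
    for a, b in read_pairs:
        if a != b:
            links[frozenset((a, b))] += 1

    # Union-find over contigs.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for pair, n in links.items():
        if n >= min_links:  # drop weak, possibly spurious contacts
            a, b = pair
            parent[find(a)] = find(b)

    clusters = {}
    for c in contigs:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


contigs = ["A1", "A2", "A3", "B1", "B2"]
pairs = (
    [("A1", "A2")] * 3
    + [("A2", "A3")] * 2
    + [("B1", "B2")] * 4
    + [("A3", "B1")]  # a single spurious contact between organisms
    + [("A1", "A1")]  # within-contig pair, carries no binning signal
)
print(cluster_by_contacts(contigs, pairs))
```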
![Page 32: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/32.jpg)
Roads less travelled…
Whichever method you choose, do a background check…
![Page 33: Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell content 1) Crosslink 2) Cut DNA 3) Religate randomly 4) Sequence paired end labrary of](https://reader033.fdocuments.us/reader033/viewer/2022041800/5e5113e4ba5ead2f004329a5/html5/thumbnails/33.jpg)
When analyzing a complex community, experimental design largely determines how much you can get out.
Binning: concluding remarks