Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Computational Tools for Metagenomics

Surya Saha Twitter: @SahaSurya / LinkedIn: www.linkedin.com/in/suryasaha/

Magdalen Lindeberg Plant Pathology & Plant-Microbe Biology

Microbial Friends & Foes, Sep 25, 2012

Temperton, Current Opinion in Microbiology, 2012

Impact of Technology on Metagenomics

Types of “Meta” genomics

16S rRNA survey of bacterial microbiome

ITS survey of fungal microbiome

Bellemain, BMC Microbiology 2010

Slide: Julien Tremblay, JGI

Types of “Meta” genomics

Whole genome shotgun

• Varying complexity of microbial communities

• High coverage sequencing

• Sophisticated informatics

• Host associated metagenomes

– Deep sequencing of host meta-genome

– Bioinformatic screening of host sequences

• Environmental metagenomes

– E.g. soil samples

– Requires very high depth of coverage

– Complicated to assemble

Big picture!!

What users see

What users want!!

16S/ITS community surveys

• Multiple target regions in 16S gene and ITS region

• Comparison of results requires amplification of same region

• Advantages

– Fast survey of large communities

– Mature set of tools and statistics for analysis

– Good for first round survey

• 454 16S tags or pyrotags (~ 700 bp) have been the preferred method

• Illumina MiSeq runs (2x150 bp, 2x250 bp) are the next workhorses

• Depth of sampling

– 2,000-6,000 reads/sample for simple communities

– 20,000 reads/sample for complex soil metagenomes

16S/ITS issues

• Lack of tools for processing ITS/fungal microbiome data sets

– RDP classifier targets only ITS

– No ITS reconstruction tools

• Amplification bias affects accuracy and replication

• Use of short reads prevents disambiguation of similar strains

• 16S or ITS may not differentiate between similar strains

– Clustering is done at 97% identity

– Regions may be >99% similar

• Sequencing error inflates number of OTUs

• Chloroplast 16S sequences can get amplified in plant metagenomes
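To make the 97% threshold concrete, here is a minimal Python sketch of greedy, centroid-style OTU clustering. It assumes the sequences are already aligned to equal length and measures identity as the fraction of matching positions. The function names are hypothetical, and this is not the implementation used by UCLUST or CD-HIT-OTU.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid (new OTU).
    Returns (centroids, assignments), where assignments[i] is the OTU index of sequences[i].
    """
    centroids = []
    assignments = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignments.append(index)
                break
        else:
            centroids.append(seq)
            assignments.append(len(centroids) - 1)
    return centroids, assignments


# Hypothetical 100 bp marker fragments (assumption: already aligned).
strain_a = "ACGT" * 25
strain_b = "T" + strain_a[1:]       # 1 difference  -> 99% identical to strain_a
distant = "TGCAT" + strain_a[5:]    # 5 differences -> 95% identical to strain_a

centroids, assignments = greedy_otus([strain_a, strain_b, distant])
print(assignments)
```

In this toy input the two strains that are 99% identical fall into the same OTU, which is the resolution limit the slide describes. The same mechanism works in the other direction: a read carrying enough sequencing errors drops below the threshold and starts a spurious OTU.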

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

• Quality plots and read trimming

– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

– FASTX http://hannonlab.cshl.edu/fastx_toolkit/
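As an illustration of what read trimming does, the following Python sketch decodes FASTQ quality characters into Phred scores and trims low-quality bases from the 3' end. It assumes the Phred+33 encoding. The function names and the default cutoff of 20 are illustrative choices, not the behavior of FastQC or the FASTX toolkit.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove bases from the 3' end until the last base has quality >= min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "IIIIIIII##")
print(trimmed_seq, trimmed_qual)
```

Trimming only from the 3' end is a simplification chosen because quality typically degrades toward the end of a read. Real trimmers offer more options, such as sliding windows and minimum length filters.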

• Chimera removal

– AmpliconNoise http://code.google.com/p/ampliconnoise/

– UCHIME http://www.drive5.com/uchime/

Impact of Sequence Length

Slide: Feng Chen, JGI

• Merge overlapping paired end reads

– FLASH http://www.genomics.jhu.edu/software/FLASH/index.shtml

– FastqJoin http://code.google.com/p/ea-utils/wiki/FastqJoin

– CD-HIT read-linker http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit-auxtools-manual
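The idea behind merging overlapping paired-end reads can be sketched in a few lines of Python: reverse-complement the second read, then look for the longest suffix of the first read that matches a prefix of it within a mismatch tolerance. The function names, the minimum overlap of 10 bp, and the 10% mismatch rate are assumptions made for illustration. Tools such as FLASH use more careful scoring and also take base qualities into account.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a read pair into one fragment, or return None if no overlap is found.

    The reverse read is reverse-complemented, then overlaps are tried from the
    longest to the shortest; the first one within the mismatch tolerance wins.
    """
    rev = reverse_complement(reverse)
    longest = min(len(forward), len(rev))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = forward[-overlap:]
        head = rev[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep the forward read's bases in the overlap (a simplification).
            return forward + rev[overlap:]
    return None


# Hypothetical 30 bp amplicon sequenced as two 20 bp reads with a 10 bp overlap.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
read_1 = fragment[:20]
read_2 = reverse_complement(fragment[10:])
print(merge_pair(read_1, read_2) == fragment)
```

This also shows why read length matters for amplicon work: if the two reads together are shorter than the amplified region, no overlap exists and the pair cannot be merged.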

• Clustering with high stringency

– UCLUST/USEARCH (16S only) http://www.drive5.com/usearch/

– CD-HIT-OTU (16S only) http://weizhong-lab.ucsd.edu/cd-hit-otu/

– phylOTU (16S only) https://github.com/sharpton/PhylOTU

• Composition based classifiers

– RDP database + classifier http://rdp.cme.msu.edu/classifier/classifier.jsp
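A composition based classifier scores a read by its k-mer content rather than by alignment. The Python sketch below is a toy naive Bayes classifier over k-mer counts with add-one smoothing. The RDP classifier is usually described as a naive Bayesian classifier over 8-mers, but this sketch is not its implementation: the function names, the default k of 4, and the reference sequences are all hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=4):
    """Build k-mer counts per taxon from {taxon: [sequences]}."""
    model = {}
    for taxon, seqs in references.items():
        counts = Counter()
        for seq in seqs:
            counts.update(kmers(seq, k))
        model[taxon] = (counts, sum(counts.values()))
    return model


def classify(query, model, k=4):
    """Return the taxon with the highest smoothed log-likelihood for the query's k-mers."""
    vocabulary = 4 ** k  # number of possible DNA k-mers, used for add-one smoothing
    best_taxon, best_score = None, -math.inf
    for taxon, (counts, total) in model.items():
        score = sum(
            math.log((counts[kmer] + 1) / (total + vocabulary))
            for kmer in kmers(query, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical reference sequences for two made-up taxa.
references = {
    "taxon_A": ["ACGTACGTACGTACGTACGT"],
    "taxon_B": ["GGCCGGCCTTAAGGCCTTAA"],
}
model = train(references)
print(classify("ACGTACGTAC", model))
```

Because no alignment is needed, this style of classifier is fast, which is one reason composition based methods are attractive for large amplicon data sets.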

• Homology based classifiers

– ARB + Silva database (16S only) http://www.arb-home.de/

– GreenGenes database (16S only) http://greengenes.lbl.gov/cgi-bin/nph-index.cgi

– UNITE database (ITS only) http://unite.ut.ee/

– FungalITSPipeline (ITS only) http://www.emerencia.org/fungalitspipeline.html

QIIME http://www.qiime.org/

• Comprehensive suite of tools

– OTU picking

– Taxonomic classification

– Construction of phylogenetic trees

– Visualization

– Compute diversity statistics

• Available as Amazon EC2 image
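The diversity statistics mentioned above are simple functions of an OTU count table. The following Python sketch computes observed richness and the Shannon index for one sample, and Bray-Curtis dissimilarity between two samples. The function names and count vectors are hypothetical, and the code does not call QIIME.

```python
import math


def observed_otus(counts):
    """Richness: number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples listed over the same OTUs."""
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    denominator = sum(counts_a) + sum(counts_b)
    return numerator / denominator


# Hypothetical OTU count vectors for two samples (same OTU order in both).
sample_1 = [50, 30, 20, 0]
sample_2 = [25, 25, 25, 25]
print(observed_otus(sample_1), round(shannon_index(sample_1), 3))
print(round(bray_curtis(sample_1, sample_2), 3))
```

Both samples here have the same read depth. When depths differ, counts are usually normalized or subsampled before such metrics are compared.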

Whole Genome Shotgun (WGS) Metagenomics

• Better classification with increasing number of complete genomes

• Focus on whole genome based phylogeny (whole genome phylotyping)

• Advantages

– No amplification bias like in 16S/ITS

• Issues

– Poor sampling of fungal diversity

– Assembly of metagenomes is complicated due to uneven coverage

– Requires high depth of coverage
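A back-of-the-envelope calculation shows why rare community members drive the depth requirement. Coverage of a genome is roughly (reads from that genome × read length) / genome size, so a genome that contributes only a small fraction of the reads needs proportionally more total sequencing. The Python sketch below uses a hypothetical function name and made-up numbers, and it assumes that relative abundance means the fraction of reads originating from that genome.

```python
import math


def total_reads_needed(genome_size_bp, read_fraction, read_length_bp, target_coverage):
    """Total reads to sequence so that one community member reaches target coverage.

    read_fraction is the assumed share of all reads that come from this genome.
    """
    reads_from_genome = target_coverage * genome_size_bp / read_length_bp
    return math.ceil(reads_from_genome / read_fraction)


# Hypothetical 5 Mbp genome, 100 bp reads, 20x target coverage.
for fraction in (0.5, 0.01):
    print(fraction, total_reads_needed(5_000_000, fraction, 100, 20))
```

Dropping from half of the community to 1% of it raises the required read count about fifty-fold. The abundant members end up with far more coverage than the rare ones, which is the uneven depth that metagenome assemblers have to handle.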

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

• Quality plots and read trimming

– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

– FASTX http://hannonlab.cshl.edu/fastx_toolkit/

Assemble reads

• NGS assembly with uneven depth

– IDBA-UD http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/

– MIRA http://www.chevreux.org/projects_mira.html

– Velvet / MetaVelvet http://www.ebi.ac.uk/~zerbino/velvet/

http://metavelvet.dna.bio.keio.ac.jp/

• Hybrid composition/homology based classifiers – FCP http://kiwi.cs.dal.ca/Software/FCP

– Phymm/PhymmBL http://www.cbcb.umd.edu/software/phymm/

– AMPHORA2 http://wolbachia.biology.virginia.edu/WuLab/Software.html

– NBC http://nbc.ece.drexel.edu/

– MEGAN http://ab.inf.uni-tuebingen.de/software/megan/

• Web based classifiers

– MG-RAST http://metagenomics.anl.gov/

– CAMERA http://camera.calit2.net/

– IMG/M http://img.jgi.doe.gov/cgi-bin/m/main.cgi

MetaPhlAn

• Unique clade-specific markers for sequenced bacteria and archaea

• 400 genera/4000 genomes including HMP genomes

• Species level resolution

• MetaPhlAn 2 in the works

– Eukaryotes including Fungi

– Viruses

– Higher coverage of archaea

• Krona and GraPhlAn for visualization of output

• Websites

– https://bitbucket.org/nsegata/metaphlan

– http://huttenhower.sph.harvard.edu/metaphlan

PhyloSift/pplacer

• Reference database of marker genes

• Places reads on tree of life based on homology to reference protein

• Integration with metAMOS for pre-assembling next-generation datasets

• Bacterial and Archaeal classification only

• Plant and Fungi marker genes are being added

• Websites

– http://phylosift.wordpress.com/

– https://github.com/gjospin/PhyloSift

Real cost of Sequencing!!

Sboner, Genome Biology, 2011

Acknowledgements

Funding

Magdalen Lindeberg Cornell University

Dave Schneider USDA-ARS, Ithaca

Citrus greening / Wolbachia (wACP)

Thank you!

Surya Saha ss2489@cornell.edu

Suggestions

• Plan informatics workflow as early as possible

• Incorporate statistics at different stages in the workflow
