From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/2013
Anna Shcherbina
Bioinformatics Challenge Day
02/02/2013
From Metagenomic Sample to Useful Visual
This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Distribution Statement A: Approved for public release; distribution is unlimited.
The Opportunity
• Next-generation sequencing (NGS) instruments have recently given us the ability to characterize the microbiomes that we live in and that live in us.
• We can get a step closer to characterizing these microbiomes by creating a visualization program that makes manual data curation by a human easier.
Your Mission
Invent novel visualization approaches to represent metagenomic data.
Subgoals:
• Pick out anomalies within a given dataset.
• Generate a time series representation of multiple datasets.
• Compress data efficiently to allow visualization of huge datasets.
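One way to approach the "compress" and "multiple datasets" subgoals together is to collapse per-read assignments into a single taxon-by-dataset count table. A visual then draws one cell per taxon per sample rather than one mark per read. Because column order follows the order in which datasets are given, a time series would simply be a list of datasets in time order.

This is a minimal sketch. It assumes each dataset has already been reduced to one taxon label per read, which is a hypothetical input format. The genus labels below are toy values, not results from the challenge data.

```python
from collections import Counter


def build_count_table(assignments):
    """Collapse per-read taxon labels into a taxon-by-dataset count table.

    assignments: {dataset name: iterable of taxon labels, one per read}
    (hypothetical input format). Columns keep the order in which datasets
    are given (e.g. time points); rows are taxa in sorted order.
    """
    datasets = list(assignments)
    counts = {name: Counter(labels) for name, labels in assignments.items()}
    taxa = sorted(set().union(*counts.values()))
    # A Counter returns 0 for a missing taxon, so absent taxa become 0 cells.
    table = [[counts[name][taxon] for name in datasets] for taxon in taxa]
    return taxa, datasets, table


# Toy input: hypothetical labels, not results from the challenge datasets.
taxa, datasets, table = build_count_table({
    "oral_healthy": ["Neisseria", "Streptococcus", "Neisseria"],
    "oral_diseased": ["Streptococcus", "Streptococcus", "Veillonella"],
})
```

Three reads per dataset shrink to three rows of two integers each. On real datasets the saving grows with read depth, since the table size depends only on the number of taxa and samples.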
The Data (I)
Metagenomic datasets (FASTQ format) from clinical and environmental samples.
• Metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities.
  – “oral_healthy” and “oral_diseased” datasets
  – Roche 454
• Nose/throat swab from a Nicaraguan child with acute respiratory illness
  – “nicaragua” dataset
  – Illumina
The Data (II)
• Skin surface from the palm of a human hand
  – “palm” dataset
  – Roche 454
• Human abscess sample of unknown etiology
  – “abscess” dataset
  – Illumina
• Cultivated corn soil metagenome
  – “soil” dataset
  – Illumina
Our Processing Pipeline
Raw FASTA reads
→ BLAST against virus, bacteria, and archaea databases (from GenBank)
→ Data processing:
  • Parsed CSV summary of BLAST hits
  • BLAST hits sorted by species, FASTA format
  • Other BLAST parsers
Data is available from each stage of the processing pipeline.
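The step that turns the CSV summary into "BLAST hits sorted by species, FASTA format" can be sketched as follows. This is a minimal illustration under several assumptions. The CSV is assumed to have a header row whose column names match the field labels of the parsed hit ("Query Name", "Target Organism", "Query Sequence"), and the species is assumed to be taken from "Target Organism". Neither is the pipeline's documented format. The read names and sequences in the usage lines are hypothetical toy values.

```python
import csv
import io
from collections import defaultdict


def read_hits(csv_text):
    """Parse a CSV summary of BLAST hits: header row, then one row per hit."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def hits_to_fasta_by_species(hits, species_field="Target Organism"):
    """Group hits by species and return {species: FASTA text}, species sorted.

    Each query read is written at most once per species, even if it hit
    several targets from that species. Column names are assumptions.
    """
    groups = defaultdict(dict)  # species -> {query name: query sequence}
    for hit in hits:
        groups[hit[species_field]].setdefault(hit["Query Name"], hit["Query Sequence"])
    return {
        species: "".join(f">{name}\n{seq}\n" for name, seq in reads.items())
        for species, reads in sorted(groups.items())
    }


# Toy CSV (hypothetical reads); read1 hits the same species twice.
csv_text = (
    "Query Name,Target Organism,Query Sequence\n"
    "read1,Neisseria subflava,CTGGGCCGTG\n"
    "read2,Neisseria subflava,TCTCAGTCCC\n"
    "read1,Neisseria subflava,CTGGGCCGTG\n"
    "read3,Streptococcus mitis,AGTGTGGC\n"
)
fasta = hits_to_fasta_by_species(read_hits(csv_text))
```

Each value in `fasta` could be written to its own per-species file. Deduplicating by query name keeps a read with several hits in one species from being counted more than once in that species' file.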
Parsed BLAST File Example for a Single Hit
Query Name: S62.141238_159200
Query Strand: +
Query Start: 1
Query End: 232
Query Organism: Neisseria meningitidis
Query Taxonomy: Bacteria; Proteobacteria; Betaproteobacteria;
Identities: 232
Percent: 100
Number Gaps: 0
Number Characters: 0
Target Name: GU561418
Target Strand: -
Target Start: 47
Target End: 278
Target Organism: Neisseria subflava
Target Taxonomy: Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.
Query Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Target Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Analysis Program: BLASTN
Database: bacteria.gdna
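The fields of this hit are internally consistent. The query span 1 to 232 and the target span 47 to 278 both cover 232 bases, and with 0 gaps, 232 identities over 232 bases is exactly the reported 100 percent. A small record type makes that kind of sanity check automatic.

This is a minimal sketch. The class name `ParsedHit` and its snake_case field names are hypothetical, and '-' is assumed to mark a gap in aligned sequences. The slide shows far fewer than 232 bases of each sequence, so `percent_identity()` only describes the strings it is given.

```python
from dataclasses import dataclass


@dataclass
class ParsedHit:
    """A subset of the fields in one parsed BLAST hit (hypothetical names)."""
    query_name: str
    query_start: int
    query_end: int
    target_name: str
    target_start: int
    target_end: int
    identities: int
    query_sequence: str   # aligned query; '-' is assumed to mark a gap
    target_sequence: str  # aligned target

    @property
    def query_span(self) -> int:
        # 1-based inclusive coordinates; abs() also handles start > end.
        return abs(self.query_end - self.query_start) + 1

    @property
    def target_span(self) -> int:
        return abs(self.target_end - self.target_start) + 1

    def percent_identity(self) -> float:
        """Share of aligned columns where query and target carry the same base."""
        pairs = list(zip(self.query_sequence, self.target_sequence))
        if not pairs:
            return 0.0
        matches = sum(1 for q, t in pairs if q == t and q != "-")
        return 100.0 * matches / len(pairs)


# Values copied from the example hit above.
hit = ParsedHit(
    query_name="S62.141238_159200", query_start=1, query_end=232,
    target_name="GU561418", target_start=47, target_end=278,
    identities=232,
    query_sequence="CTGGGCCGTGTCTCAGTCCCAGTGTGGC",
    target_sequence="CTGGGCCGTGTCTCAGTCCCAGTGTGGC",
)
```

A hit whose identities exceeded its span, or whose query and target spans differed without any reported gaps, would be a candidate anomaly worth flagging during manual curation.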
Your Open-Source Toolkit
•MEGAN4
•IMG/M
•KRONA (included with PhymmBL)
•MG-RAST
•METAREP
•Mothur
•Feel free to use any additional tools you think are useful.
MEGAN4 – MEtaGenome ANalyzer
• A simple lowest common ancestor (LCA) algorithm assigns reads to taxa.
  – The taxonomic level at which a read is placed reflects how conserved its sequence is. A read matching many distantly related taxa is placed near the root; a species-specific read is placed near a leaf.
• Analyzes large datasets without assembly or targeting of specific phylogenetic markers.
• Graphical and statistical output for comparing different datasets.
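The core of the LCA idea is that a read is assigned to the deepest taxon shared by all of its hits. When lineages are written root to leaf, as in the Target Taxonomy field of the parsed BLAST hit, that taxon is the last entry of their common prefix.

This is a simplified sketch, not MEGAN's implementation. MEGAN also filters hits before taking the LCA, using parameters such as a minimum bit score and a top-percent threshold relative to the best hit. The second lineage below is a hypothetical hit used only for illustration.

```python
def parse_lineage(taxonomy):
    """Split a 'Bacteria;Proteobacteria;...' string into root-to-leaf ranks."""
    return [rank.strip() for rank in taxonomy.strip().rstrip(".").split(";") if rank.strip()]


def lowest_common_ancestor(lineages):
    """Deepest taxon shared by every lineage, or None if they share nothing."""
    common = None
    for ranks in zip(*lineages):  # walk all lineages one rank at a time
        if any(rank != ranks[0] for rank in ranks):
            break
        common = ranks[0]
    return common


neisseria = parse_lineage(
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.")
# Hypothetical second hit for the same read, used only for illustration.
haemophilus = parse_lineage(
    "Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus")

print(lowest_common_ancestor([neisseria, neisseria]))    # Neisseria
print(lowest_common_ancestor([neisseria, haemophilus]))  # Proteobacteria
```

This shows the conservation point from the slide. A read whose hits all fall in one genus stays at genus level, while a read that also matches a Gammaproteobacterium moves up to the phylum.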
MEGAN4 – MEtaGenome ANalyzer
[Figure panels: Oral Diseased Bacteria; Oral Healthy Bacteria; Oral Diseased Virus; Oral Healthy Virus]
MEGAN4 – MEtaGenome ANalyzer
[Figure panels: Oral healthy vs. oral diseased (Bacteria); Oral healthy vs. oral diseased (Virus)]
IMG/M – Integrated Microbial Genomes with Microbiome Samples
• Web interface: http://img.jgi.doe.gov/cgi-bin/m/main.cgi
source: http://img.jgi.doe.gov/m/doc/about_index.html
IMG/M Phylogenetic Distribution of Genes Based on BLAST Hits
source: http://img.jgi.doe.gov/m/doc/about_index.html
IMG/M Abundance Profile Overview
source: http://img.jgi.doe.gov/m/doc/about_index.html
KRONA
• KRONA allows hierarchical data to be explored with zoomable pie charts.
  – Input through an Excel template or KronaTools.
  – Supports output from several bioinformatics tools as well as raw data formats.
source: http://sourceforge.net/p/krona/home/krona/
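One raw format KronaTools accepts is the plain-text input of `ktImportText`. Each line holds a count followed by the lineage, highest level first, separated by tabs. The sketch below renders lineage counts in that layout. The function name and the toy counts are hypothetical, not challenge results.

```python
from collections import Counter


def krona_text(lineage_counts):
    """Render {lineage tuple: count} as 'count<TAB>rank1<TAB>rank2...' lines.

    Lineages run from the highest rank down, the layout ktImportText reads;
    sorting the lines by lineage is only cosmetic.
    """
    return "".join(
        "\t".join([str(count), *lineage]) + "\n"
        for lineage, count in sorted(lineage_counts.items())
    )


# Toy counts of reads per lineage (hypothetical, not challenge results).
counts = Counter([
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"),
    ("Bacteria", "Firmicutes"),
])
text = krona_text(counts)
```

The text could then be saved to a file and passed to KronaTools, for example `ktImportText -o oral_healthy.html oral_healthy.txt`. The file names are hypothetical, and options should be checked against the installed KronaTools version.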
MG-RAST
Oral Diseased
source: http://blog.metagenomics.anl.gov/
MG-RAST
Oral Healthy
source: http://blog.metagenomics.anl.gov/
MG-RAST
[Figure panels: Oral Diseased; Oral Healthy]
source: http://blog.metagenomics.anl.gov/
JCVI Metagenomics Reports (METAREP)
• A Web 2.0 application to analyze and compare annotated metagenomic datasets.
• Compares absolute and relative counts of multiple datasets at various functional and taxonomic levels.
• Statistical tests, multidimensional scaling, heatmap and hierarchical clustering plots.
source: http://blogs.jcvi.org/tag/metarep/
[Figure panels: Heatmap Plot; Hierarchical Clustering Plot; METASTAT Results]
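Comparing datasets by relative rather than absolute counts removes differences in sequencing depth. A pairwise dissimilarity matrix over those profiles is the kind of input a heatmap or hierarchical clustering plot is drawn from.

This is a minimal sketch. It uses Bray-Curtis dissimilarity, a common choice that is swapped in here; METAREP's own statistics and clustering settings are not described on the slide and may differ. Sample names and counts are hypothetical toy values.

```python
def relative_counts(counts):
    """Absolute read counts per taxon -> fraction of the dataset total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity of two {taxon: abundance} dicts (0 = identical)."""
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def distance_matrix(datasets):
    """Pairwise Bray-Curtis matrix; rows and columns follow sorted dataset names."""
    names = sorted(datasets)
    return names, [[bray_curtis(datasets[x], datasets[y]) for y in names] for x in names]


# Toy profiles (hypothetical samples, not challenge results).
profiles = {
    "sample_A": relative_counts({"Neisseria": 2, "Streptococcus": 2}),
    "sample_B": relative_counts({"Neisseria": 4}),
}
names, matrix = distance_matrix(profiles)
```

Here sample_A splits its reads evenly between two genera while sample_B has only one. Their profiles overlap in half of their mass, giving a dissimilarity of 0.5, whatever the two samples' total read counts were.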
Mothur: 16S rRNA Sequence Analysis
• A single platform for sequence alignment, pairwise distance calculation, and distance matrix analysis.
• Venn diagrams, community trees, heat maps, and sample-based rarefaction curves.
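A sample-based rarefaction curve plots how many distinct OTUs one would expect to have observed after pooling t of the T samples. For presence/absence data this expectation has a closed form, sometimes called Mao Tau. An OTU found in k of the T samples is missed by a random set of t samples with probability C(T-k, t)/C(T, t). So E[S_t] is the sum over OTUs of 1 - C(T-k, t)/C(T, t).

This is a minimal sketch of that quantity, not mothur's implementation; mothur's rarefaction commands may use random resampling instead. The OTU labels below are hypothetical.

```python
from math import comb


def sample_based_rarefaction(incidence):
    """Expected OTUs observed when pooling t of T samples, for t = 1..T.

    incidence: list of sets, one per sample, holding the OTU labels
    detected in that sample (presence/absence only).
    """
    T = len(incidence)
    otus = set().union(*incidence)
    # k = number of samples in which each OTU occurs.
    occurrences = [sum(otu in sample for sample in incidence) for otu in otus]
    curve = []
    for t in range(1, T + 1):
        # comb(n, r) is 0 when r > n, i.e. the OTU cannot be missed.
        expected = sum(1 - comb(T - k, t) / comb(T, t) for k in occurrences)
        curve.append(expected)
    return curve


# Three hypothetical samples.
curve = sample_based_rarefaction([{"OTU_A", "OTU_B"}, {"OTU_A"}, {"OTU_C"}])
```

As a check, the first point equals the mean number of OTUs per sample ((2 + 1 + 1) / 3 = 4/3), and the last point equals the total number of OTUs observed (3). A curve that is still rising at t = T suggests that more sampling would keep revealing new OTUs.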