Metagenomics 2015 Module1-Part1 · Course overview • Module 1: Introduction – definitions,...
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 1: Introduction to Metagenomics
Robert Beiko (@rob_beiko)
Module 1 · bioinformatics.ca · en.wikipedia.org
Avery–MacLeod–McCarty experiment
en.wikipedia.org
Course overview
• Module 1: Introduction – definitions, approaches, considerations
• Module 2: Marker genes – measuring community diversity
• Module 3: Metagenome taxonomy – classifying and binning sequence reads
• Module 4: Metagenome function – databases and pathways
• Module 5: Metatranscriptomics – data, taxonomy, function
• Module 6: Biomarker discovery
General Learning Objectives
At the end of this workshop, you will be able to:
• Define the objectives of different types of metagenomic projects
• Process raw data files using appropriate quality control
• Run standard pipelines for marker-gene, metagenome and metatranscriptome datasets
• Analyze results using statistical and network approaches
• Recognize the technical limitations of metagenomic studies
Learning objectives of Module 1
You will be able to:
• Apply key terms in metagenomics, for example microbial communities, OTUs and metadata
• Define the objectives of a metagenomic experiment, with an appropriate choice of technology
• Interpret the contents of sequence files
• Acquire data from online resources and reference databases
Defining Metagenomics
• Microbiome: attributed to Joshua Lederberg by Hooper and Gordon (2001): “the collective genome of our indigenous microbes (microflora), the idea being that a comprehensive genetic view of Homo sapiens as a life-form should include the genes in our microbiome”
– Also used to mean microbiota, the set of microorganisms found in a particular setting
• Metagenome: Handelsman et al. (1998): “…advances in molecular biology and eukaryotic genomics, which have laid the groundwork for cloning and functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”
– Strictly, does not encompass marker-gene surveys (e.g., 16S), although one report says it does
The big picture
Explore the relationship between microbes and their habitat.
To accomplish this, we use a series of experimental and computational techniques to make inferences about the community:
- Marker genes
- Metagenomes
- Metatranscriptomes
- Metaproteomes
- Metametabolomes
- “Culturomes”
Why metagenomics?
• The “great plate count anomaly”: <1% of organisms across many habitats are culturable (reviewed in Amann et al., 1995: PMID 7535888) – CONTROVERSIAL; probably not true for habitats such as human body sites
• In any event, it would be nearly impossible to culture ALL constituents of a given microbiome sample (apart from trivially simple ones)
• Metagenomics offers an effective (if imperfect) way to profile the structure and function of microbial communities
The Human Microbiome
Human gut microbiome: 2–3 million genes
Typically >160 “species” at any given sampling time
Host: ~25,000 genes
Qin et al., Nature (2010)
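Using only the figures on this slide, the gap between microbial and host gene counts is roughly two orders of magnitude. As a worked check (the range simply reflects the 2–3 million estimate):

```latex
\frac{2 \times 10^{6}}{2.5 \times 10^{4}} = 80,
\qquad
\frac{3 \times 10^{6}}{2.5 \times 10^{4}} = 120
```

So, on these estimates, the gut microbiome encodes on the order of 100 times as many genes as its human host.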
A Brief History of Metagenomics and Things Like It
1970s
1955: insulin protein sequence
1960: pyrimidine tract sequencing via depurination
1965: Atlas of Protein Sequence and Structure (Eck, Dayhoff)
Frederick Sanger, Margaret Dayhoff en.wikipedia.org
Staden (1979)
“The continuing rapid fall in the cost of computer components is making it possible for most DNA sequencing laboratories to have their own small computer. The fact that DNA sequencing is now a fast procedure, and the availability of computers gives the possibility of more efficient overall strategies for sequence determination.”
T4 genome map: Wood and Revel, 1976
PhiX174 phage genome: Sanger et al., 1977
1980s
Norm Pace http://pacelab.colorado.edu
1980: “Dr. Dayhoff established an on-line computer database and a sophisticated retrieval system, accessable by phone to outside users, in September 1980” http://www.dayhoff.cc/MODBiography.html
Octopus Spring: Stahl et al., 1985
1990s
Jo Handelsman en.wikipedia.org
2000s
Oded Béjà rbni.technion.ac.il Jill Banfield ourenvironment.berkeley.edu
2010s
“The microbiome of”: roller derby, kissing, mobile phones, beer, Irish rugby players
Jessica Green
Rob Knight
(Very) high-level workflows
The big picture
Microbial sample → Generate “Meta-omic” data → Process data (QC, etc.) → Analysis
Marker genes
Extract DNA → Amplify with targeted primers → Filter errors, build clusters → Diversity analysis
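The final “Diversity analysis” step usually starts from a table of read counts per OTU. Below is a minimal Python sketch of three common per-sample (alpha) diversity summaries, assuming the counts are already tabulated. The OTU labels and counts are hypothetical, and the choice of indices is an assumption for illustration, not necessarily what the course pipeline reports.

```python
import math
from collections import Counter

def richness(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), skipping OTUs with zero reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance that two reads drawn
    with replacement fall in different OTUs."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical OTU assignments for the reads of a single sample.
otu_assignments = ["OTU1", "OTU1", "OTU2", "OTU3", "OTU3", "OTU3", "OTU4", "OTU1"]
counts = list(Counter(otu_assignments).values())  # reads per OTU

print("richness:", richness(counts))
print("Shannon:", round(shannon(counts), 3))
print("Gini-Simpson:", round(gini_simpson(counts), 3))
```

Richness ignores abundance, while Shannon and Gini-Simpson also reward evenness, so two samples containing the same OTUs can still differ in diversity.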
Metagenomes
Extract DNA → Sequence random fragments → QC, assemble, annotate → Diversity, function analysis
Metatranscriptomes
Extract RNA, subtract rRNA → Sequence cDNA → QC → Gene expression, function
Scaling up
Metadata
Langille et al., Microbiome (2014)
Examples of “Metagenomics”
Remediation of C. difficile infection: Lawley et al., PLoS Pathogens (2012)
Analysis of membrane proteins in the GOS dataset: Patel et al., Genome Res (2010)
Metagenomic / metatranscriptomic analysis of acid mine drainage (AMD): Hua et al., ISME J (2015)
Draft genomes at MG-RAST
Metabolites and microbes in bacterial vaginosis: Srinivasan et al., Genome Res (2010)
Impact of low-dose penicillin on mouse development – Cox et al., Cell (2014)
Sequencing technologies
Sanger Ion Torrent Roche 454
Illumina *Seq Pacific Biosciences Nanopore
Resources
16S
Greengenes: McDonald et al., ISME J (2012)
SILVA: Quast et al., NAR (2013)
rrnDB: Stoddard et al., NAR (2014)
RDP II: Cole et al., NAR (2013)
Genomes
PATRIC GenBank Genomes
GOLD Ensembl Genomes
“Metagenomes”
EBI metagenomics MG-RAST
HMP DACC
Function
KEGG
UniProtKB
CARD
Gene Ontology
Major concerns in metagenomic analysis
Data Quality
• Sequencing errors
– Introduced in workup
– Error rates, error type (PacBio: 10% random; Illumina: 0.1% substitution)
• Chimeras
– Amplification artifacts, cloning of restriction fragments
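The per-base error rates above map directly onto Phred quality scores, Q = -10 log10(p), which is how FASTQ files record the confidence of each base. A minimal Python sketch, assuming Phred+33 quality encoding and a made-up four-line FASTQ record; the function names are hypothetical helpers, not part of any particular QC tool.

```python
import math

def error_to_phred(p):
    """Phred quality score for a per-base error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def phred_to_error(q):
    """Per-base error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def decode_quality(qual_line, offset=33):
    """Convert a FASTQ quality line to integer scores.

    Assumes Phred+33 encoding (Sanger / Illumina 1.8+); older Illumina
    files used an offset of 64 instead.
    """
    return [ord(ch) - offset for ch in qual_line]

def expected_errors(qual_line, offset=33):
    """Expected number of wrong bases in one read: the sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in decode_quality(qual_line, offset))

# Per-base error rates quoted on the slide, expressed as Phred scores.
for platform, rate in [("PacBio", 0.10), ("Illumina", 0.001)]:
    print(platform, round(error_to_phred(rate)))  # PacBio 10, Illumina 30

# A hypothetical 4-line FASTQ record: header, sequence, '+' separator, qualities.
header, seq, plus, qual = ["@read1", "ACGTACGTAC", "+", "IIIIIIIII#"]
print(header[1:], len(seq), decode_quality(qual), round(expected_errors(qual), 3))
```

Summing per-base error probabilities gives a per-read “expected errors” value; some quality filters discard reads whose expected errors exceed a cutoff rather than trimming on individual low scores.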
Comparability / Reproducibility
• 16S: different V regions give different results
• Different sequencing platforms / sampling conditions ALSO give different results
– Eisen paper about different recoveries under different conditions
• Workflow complexity / plethora of tools
Figure (Morgan Langille): categories labelled Reference, Young, “Middle-aged”, Old, and Useless, not published
Linkage and resolution
• Strain-level diversity in metagenomes will often be missed by amplicon (esp. short-read) and shotgun approaches
• This may be especially important when comparing samples
• Should you assemble metagenomic reads? What are the assumptions?
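A toy illustration of why short reads blur strain-level differences: two hypothetical 2,000-bp “strains” differ at a single position, and the sketch counts how many error-free 100-bp reads from one strain also occur verbatim in the other. The sequences, read length and exact-match criterion are all assumptions chosen for illustration; real reads also carry sequencing errors, which make the problem harder.

```python
import random

def simulate_reads(genome, read_len):
    """All error-free reads of length read_len, one starting at every position."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]

def indistinguishable_reads(reads, other_genome):
    """Count reads that also occur verbatim in another genome,
    so by themselves they cannot tell the two genomes apart."""
    return sum(1 for r in reads if r in other_genome)

# Hypothetical strains: a random 2,000-bp sequence and a copy with one substitution.
rng = random.Random(1)
strain_a = "".join(rng.choice("ACGT") for _ in range(2000))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
snp = 1000
strain_b = strain_a[:snp] + swap[strain_a[snp]] + strain_a[snp + 1:]

reads_b = simulate_reads(strain_b, read_len=100)
same = indistinguishable_reads(reads_b, strain_a)
print(f"{same}/{len(reads_b)} strain-B reads also match strain A exactly")
```

Every read that does not span the variant site (at least 1,801 of the 1,901 reads here) is identical in both strains, so those reads carry no information for separating the strains, whether they are classified individually or assembled.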
16S is not the only option!!
Martiny et al. (2009) Env Micro
Ribosomal internal transcribed spacer (ITS) regions
Taxonomy and OTUs
• RDP taxonomic predictions + taxonomy in general
• OTUs – arbitrary, quasi-phylogenetic
Figure: “Seed sequences”, “???”, “De novo”; 97% threshold
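A minimal sketch of greedy, centroid-based OTU clustering at the 97% threshold, in Python. It assumes equal-length, pre-aligned sequences and measures identity as the fraction of matching positions; real clustering tools align sequences and differ in many details, so this illustrates the idea rather than any specific tool's algorithm. Passing `seeds` mimics clustering against reference “seed” sequences (sequences that match no seed stay unassigned); leaving it out clusters de novo. All function names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length, aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(abundances, threshold=0.97, seeds=None):
    """Greedy centroid clustering of sequences into OTUs.

    abundances: dict mapping each unique sequence to its read count.
    seeds: optional reference sequences used as fixed centroids; if None,
           centroids are chosen de novo from the data.
    Returns (clusters, unassigned): clusters maps centroid -> member sequences.
    """
    centroids = list(seeds) if seeds is not None else []
    clusters = {c: [] for c in centroids}
    unassigned = []
    # Visit the most abundant sequences first so common variants become centroids.
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for c in centroids:
            if identity(seq, c) >= threshold:
                clusters[c].append(seq)
                break
        else:
            if seeds is None:
                centroids.append(seq)   # de novo: start a new OTU
                clusters[seq] = [seq]
            else:
                unassigned.append(seq)  # reference-based: no seed close enough
    return clusters, unassigned

def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    s = list(seq)
    for p in positions:
        s[p] = swap[s[p]]
    return "".join(s)

base = "ACGT" * 25  # hypothetical 100-bp amplicon
reads = {
    base: 50,
    mutate(base, [10]): 5,              # 99% identical to base
    mutate(base, [10, 20]): 3,          # 98% identical to base
    mutate(base, range(0, 50, 10)): 8,  # 95% identical to base
}

de_novo, _ = greedy_otus(reads)
closed, leftover = greedy_otus(reads, seeds=[base])
print("de novo OTUs:", len(de_novo))
print("reference OTUs:", len(closed), "unassigned:", len(leftover))
```

Because each sequence joins the first centroid that passes the threshold, the result depends on visiting order and on the fixed 97% cutoff, which is one reason OTUs are described as arbitrary and only quasi-phylogenetic.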
Functional annotation problems
CAFA (Radivojac et al., Nat Meth 2013)
Misannotations across databases (Schnoes et al., PLoS Comp Biol 2009)
Coverage vs accuracy
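The coverage vs accuracy trade-off can be made concrete with a score threshold: raising it annotates fewer genes (lower coverage), but typically a larger fraction of the remaining annotations are correct. A minimal Python sketch; the scores and correctness labels below are hypothetical, and in practice correctness would have to be judged against a trusted benchmark.

```python
def coverage_accuracy(predictions, threshold):
    """Coverage and accuracy of annotations kept at a score threshold.

    predictions: list of (score, is_correct) pairs, one per gene with a candidate annotation.
    coverage: fraction of genes that keep an annotation (score >= threshold).
    accuracy: fraction of the kept annotations that are correct.
    """
    kept = [ok for score, ok in predictions if score >= threshold]
    coverage = len(kept) / len(predictions)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy

# Hypothetical annotation scores, each labelled correct (True) or wrong (False).
preds = [(95, True), (88, True), (80, True), (72, False), (65, True),
         (58, False), (50, False), (41, True), (35, False), (20, False)]

for t in (30, 60, 90):
    cov, acc = coverage_accuracy(preds, t)
    print(f"threshold {t}: coverage {cov:.0%}, accuracy {acc:.0%}")
```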
We are on a Coffee Break & Networking Session