Metagenomics
What is metagenomics
Cloning genes from the environment, screening for function
16S sequencing
Random community genomics
Eukaryotic metagenomics
Screening from the environment
Random fragments of DNA
Clone into a vector
Low copy vectors: BACs, YACs
BACs (image: Science Creative Quarterly)
Screen for a phenotype, e.g. Diversa patents: > 1,000 amylase genes
Why did Diversa sequence whale-falls?
Screening from the environment
Expression host?
Pathway or single gene?
Get what you select
But remember …
A selection is worth a thousand screens
16S sequencing
Catalogs the bacteria that are present
PCR amplify the 16S gene with standard primers
Sequence the PCR products (amplicons)
Compare to databases of known 16S sequences
Prokaryotic ribosome:
Large subunit: 50S (5S and 23S rRNA)
Small subunit: 30S (16S rRNA)
Ribosomes
Ribosomes are made of proteins and RNA
Blue: protein; orange: rRNA
30S Thermus aquaticus subunit
E. coli 16S rRNA secondary structure
Highly conserved
Base pairs = stems
No pairing = loops
Variable regions in the 16S rRNA: Vn, 9 regions (V1 to V9); forward/reverse primers
[Figure: E. coli 16S rRNA secondary structure with variable regions V1 to V9 labeled]
16S Primers
27F – 1492R: full length, 1,465 base pairs
967F – 1046R: V6 region, 79 base pairs
1380F – 1510R: V9 region, 130 base pairs
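The amplicon sizes on this slide can be checked from the primer names, which carry E. coli 16S coordinates. The sketch below assumes, as the slide's own numbers suggest, that the amplicon length is simply the reverse coordinate minus the forward coordinate; the function names are hypothetical and exact lengths in a real organism will differ slightly.

```python
import re

def primer_position(name: str) -> int:
    """Return the coordinate embedded in a primer name such as '27F' or '1492R'."""
    match = re.fullmatch(r"(\d+)([FR])", name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name!r}")
    return int(match.group(1))

def amplicon_length(forward: str, reverse: str) -> int:
    """Approximate amplicon length: reverse coordinate minus forward coordinate."""
    return primer_position(reverse) - primer_position(forward)

# Primer pairs as listed on the slide.
PRIMER_PAIRS = {
    "full length": ("27F", "1492R"),
    "V6 region": ("967F", "1046R"),
    "V9 region": ("1380F", "1510R"),
}

for target, (fwd, rev) in PRIMER_PAIRS.items():
    print(f"{fwd}-{rev} ({target}): {amplicon_length(fwd, rev)} bp")
```

With this arithmetic the 79 bp product belongs to the V6 pair and the 130 bp product to the V9 pair.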
Variable regions = Variable results!
[Figure: results compared for regions V1-V3, V3-V5, and V6-V9]
16S databases
Greengenes: http://greengenes.lbl.gov/ (Gary Andersen, Lawrence Berkeley National Laboratory)
SILVA – ARB: http://www.arb-silva.de/ (Frank Oliver Glöckner, MPI, Bremen, Germany)
VAMPS: http://vamps.mbl.edu/ (Mitch Sogin, Woods Hole, USA)
Ribosomal Database Project (RDP): http://rdp.cme.msu.edu/ (James Cole, Michigan State University, USA)
16S sequencing
Cheap
Easy
Portable
PCR bias
Variable regions give variable answers
Only tells you which organisms are present and their abundance
Does not explain much of the variance of the data
What does 16S sequencing actually tell you?
What is metagenomics
Cloning genes from the environment, screening for function
16S sequencing
Random community genomics
Eukaryotic metagenomics
16S sequencing is not good for functions
How much of the data?
Findley et al., Nature 2013, doi: 10.1038/nature12171
Topography of [fungi and] bacteria on the skin
Study = 5,000 taxa, 14 skin sites, 10 people, 3 skin types
5,000 variables
They don't explain the meaning of j-q
The remainder of the variance (85.1%) is explained by a few taxa each
Each dimension only adds marginal information
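If 85.1% of the variance is "the remainder", the leading dimensions account for only 14.9% of it. The sketch below shows how that kind of statement is read off an ordination: each dimension's share is its eigenvalue divided by the total. The two eigenvalue spectra are invented for illustration and are not taken from either paper.

```python
from itertools import accumulate
from typing import List, Sequence

def explained_fractions(eigenvalues: Sequence[float]) -> List[float]:
    """Fraction of total variance carried by each dimension."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in eigenvalues]

def components_needed(eigenvalues: Sequence[float], target: float) -> int:
    """Smallest number of leading dimensions whose cumulative share reaches target."""
    fractions = explained_fractions(sorted(eigenvalues, reverse=True))
    for count, cumulative in enumerate(accumulate(fractions), start=1):
        if cumulative >= target - 1e-12:  # small tolerance for float rounding
            return count
    return len(fractions)

# Hypothetical spectra (assumptions, not data from the cited studies):
flat_spectrum = [1.0] * 100                  # many variables, each adding a little
steep_spectrum = [50.0, 30.0, 10.0] + [1.0] * 10  # a few variables dominate

print(components_needed(flat_spectrum, 0.5))
print(components_needed(steep_spectrum, 0.8))
```

A flat spectrum is the situation described here: every extra dimension adds only marginal information, so many are needed to reach any given share of the variance.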
How much of the data?
Nine biomes paper
Dinsdale et al., Nature 2008, doi: 10.1038/nature06810
Variance: 1,040,665 reads total (from 45 samples), 30 subsystems, 9 biomes
30 variables
Fewer of the variables explain more of the data
The variables are distinctive for each environment
Shotgun sequencing (HiSeq)
16S sequencing (MiSeq)
Shotgun + 16S (HiSeq)
Movies courtesy Will Trimble, Argonne National Labs, http://www.mcs.anl.gov/~trimble/flowcell/
There is no 16S for viruses
Rohwer and Edwards, 2002. The phage proteomic tree.
doi: 10.1128/JB.184.16.4529-4535.2002
200 liters water or 5-500 g fresh fecal matter
Concentrate and purify viruses or bacteria
Epifluorescent microscopy
Extract nucleic acids
DNA/RNA LASL
Sequence
Random community genomics
Extract DNA: soil extraction kit, water extraction kit
Create library: LASLs, fosmids
Sequence fragments
How do you sequence the environment?
Hydroshear
Blunt-ending
Addition of Linkers
Amplification of Fragments
This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA
Soil Extraction Kit
David Mead -
Breitbart (2002) PNAS
Linker-Amplified Shotgun Libraries (LASLs)
• http://phage.sdsu.edu/~rob/cgi-bin/remoteblast.cgi
• Submit BLAST to local and remote databases
  – Local (as fast as possible)
  – NCBI (one search every 3 seconds)
• Many concurrent searches
  – One search versus 1,000 searches
• Parse data into tables for Excel
  – Access to taxonomy etc.
Early Attempts at a Metagenomics Platform
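The "one search every 3 seconds" limit for remote searches can be enforced with a small throttle around whatever function submits a query. This is a minimal sketch, not the code behind remoteblast.cgi: `submit_rate_limited` and the stand-in `fake_search` are hypothetical names, and no real BLAST call is made.

```python
import time
from typing import Callable, Iterable, List

def submit_rate_limited(
    queries: Iterable[str],
    submit: Callable[[str], str],
    min_interval: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call submit(query) for each query, starting calls at least min_interval apart."""
    results = []
    last_start = None
    for query in queries:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # pause until the interval has elapsed
        last_start = clock()
        results.append(submit(query))
    return results

def fake_search(sequence: str) -> str:
    """Hypothetical stand-in for a remote search; returns a dummy summary line."""
    return f"{len(sequence)} bases submitted"

# Short interval so the demonstration finishes quickly; a remote service
# with the limit quoted above would use min_interval=3.0.
print(submit_rate_limited(["ACGT", "GGCCTTAA"], fake_search, min_interval=0.01))
```

Local searches need no throttle, so the same loop can be run with `min_interval=0` against a local database.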
More bacteria than somatic cells by at least an order of magnitude
More phages than bacteria by an order of magnitude
Sample the bacteria in the intestine by sampling their phage
Human-associated viruses
Known: 40%; Unknown: 60%
Breitbart (2003) J. Bacteriol.
Phages: 94%; Eukaryotic viruses: 6%
Most Viral DNA Sequences in Adult Human Feces are Unknown Phages
Abundance of viruses in twins
Reyes et al., Nature 2010
Microbial samples in guts don't change very much
Phage samples in guts change a lot
Known: 92%; Unknown: 8%
Pepper Mild Mottle Virus: 65%; Other plant viruses: 9%; Other: 26%
Zhang (2006) PLoS Biology
Most Human RNA Viruses are Known
ssRNA virus; ≈6 kb genome
Related to Tobacco Mosaic Virus
Infects members of the genus Capsicum (peppers)
Widely distributed – spread through seeds
Fruits are small, malformed, mottled
Rod-shaped virions
TOBACCO MOSAIC VIRUS http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/
Viral particles in fecal sample
Pepper Mild Mottle Virus (PMMV)
[Figure: PMMV detection in fecal samples S1 to S9]
Fecal samples
Extract total RNA
RT-PCR for PMMV
San Diego: 78% of people are positive
Singapore: 67% of people are positive
10-50 fold increase in feces compared to food
10^6-10^9 PMMV copies per gram dry weight of feces
PMMV is common in Human Feces
[Figure: foods tested for PMMV: Indian curry; Pork noodle red chili; Chicken rice; Chinese food; Hong Kong chili sauce; Hong Kong green chili; Vegetarian chili]
Chili powder
Chili sauces
NOT FOUND IN FRESH PEPPERS
Which Foods Contain PMMV?
Rosario et al. AEM (2009)
PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater
[Figure: tree of PMMV coat protein sequences (scale bar 0.1). Reads and contigs from Library 1, Library 2, and Library 3 are placed among GenBank reference sequences in five groups, I to V. Label on the libraries: same person, 6 months apart.]
• Diverse populations
• Differences between individuals and over time
Different PMMV families
Infected leaf Control
Fecal sample
Total RNA
PMMV RT-PCR
Viral concentrate
Plant leaf inoculation
• Spread of infection to Hungarian wax pepper evident within 1 week
• Infected leaf was positive by RT-PCR for PMMV
• Animals may serve as vectors for plant viruses
Human-fecal borne PMMV can infect plants
Thesunmachine.net
http://www.sweatnspice.com
Koch’s Postulates
Random community genomics
Eukaryotic metagenomics
ITS sequences: internal transcribed spacer regions, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4113289/
Individual genes: Cox1
Exome sequencing: pull out ESTs and sequence
What is there?
How many are there?
What are they doing?
Experimental manipulations
Diagnostics
Why Metagenomics?
Sequencing costs decreasing
http://genome.gov/sequencingcosts
[Figure: number of known sequences by year, with milestones: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
How much has been sequenced?
[Figure: projected sequencing by year, with milestones: everybody in San Diego, everybody in USA, all cultured bacteria, 100 people, one genome from every species, most major microbial environments]
How much will be sequenced?
Most pipelines work the same way!
Metagenomics Processing
Binning reads
Contamination removal
Contig Clustering
Functional Assignments
Gene Prediction
Merge paired-end reads
Preprocessing
Taxonomic assignments
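As an illustration of the preprocessing stage named on this slide, the sketch below drops reads that are too short or whose mean base quality is too low. It is not the method of any particular tool; it assumes plain four-line FASTQ records with Phred+33 quality encoding, and the function names and thresholds are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)

def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (assumes no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(it, "").strip()
        separator = next(it, "").strip()
        quality = next(it, "").strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality

def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def quality_filter(records: Iterable[Record],
                   min_mean_quality: float = 20.0,
                   min_length: int = 5) -> List[Record]:
    """Keep reads that are long enough and have a high enough mean quality."""
    return [r for r in records
            if len(r[1]) >= min_length and mean_quality(r[2]) >= min_mean_quality]

EXAMPLE_FASTQ = """\
@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
!!!!!!!!
@read3
ACG
+
III
"""

kept = quality_filter(parse_fastq(EXAMPLE_FASTQ.splitlines()))
print([name for name, _, _ in kept])
```

In the toy input, read2 fails on quality and read3 on length, so only read1 moves on to the later stages.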
Metagenomics
Quality control – Prinseq, Deconseq
Annotation – FOCUS, Real time metagenomics, mg-rast, Super FOCUS
Statistics – STAMP
Population genomes – crAss, metabat, ContigClustering
Metagenomics Processing
Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra
Gene Prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
Taxonomic assignment: CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, Metaplan, Taxy, CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler
Functional assignment
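Several of the contig clustering approaches group sequences by composition, for example tetranucleotide usage. The sketch below is not the algorithm of tetra or any other tool listed; it only illustrates the idea, under the assumption that contigs from similar genomes have similar 4-mer frequency profiles. The toy contigs and function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import List

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_profile(sequence: str) -> List[float]:
    """Relative frequency of each tetranucleotide; 4-mers with other letters are ignored."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

def profile_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two frequency profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy contigs (invented): two AT-rich sequences built from the same repeat,
# one of them rotated, and one GC-rich sequence.
at_repeat = "ATATTAATTTAAATAT"
contig_a = at_repeat * 5
contig_b = (at_repeat[5:] + at_repeat[:5]) * 5
contig_c = "GCGGCCGCGCCGGGCC" * 5

d_ab = profile_distance(tetra_profile(contig_a), tetra_profile(contig_b))
d_ac = profile_distance(tetra_profile(contig_a), tetra_profile(contig_c))
print(f"A vs B: {d_ab:.3f}   A vs C: {d_ac:.3f}")
```

A clustering step would then group contigs whose profiles are close, which here puts the two AT-rich contigs together and the GC-rich one apart.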