Initial results from the Microbiome Quality Control Project
pilot phase (MBQC-pilot)
Curtis Huttenhower
09-30-14
Harvard School of Public Health, Department of Biostatistics
U. Oregon
2
MBQC-pilot?
• Some amazing numbers:
  – 19 labs providing data
  – >50 researchers
  – >40 workshop attendees
  – 94 specimens studied
  – 2,238 samples sequenced
  – 155M sequences generated
  – 16,555 samples analyzed: ~3x HMP data, ~6x HMP seqs.!
Name | Affiliation | Handl. / Bioinf.
Allen-Vercoe | U. Guelph | X
Burk | Albert Einstein | X X
Bushman | U. Pennsylvania | X
Caporaso | NAU | X
Chia | Mayo Clinic | X
Flores | DCP, NCI | X
Gevers | Broad Institute | X
Gloor | U. Western Ontario | X
Goodman | Yale | X
Huttenhower | HSPH | X
Knight | UCSD | X X
Littman | New York University | X
Mills | UCD | X
Petrosino | Baylor | X X
Ravel | U. Maryland | X X
Schloss | U. Michigan | X
Shanks | UPA | X
Turnbaugh | UCSD | X
Yeager/Yu | DCEG, NCI | X X
3
Some ground rules
• The goal of the MBQC is not to identify which groups produce the best data
  – Nor which software or protocol is the “best”
  – The goal, particularly for the pilot, is to identify which protocol choices influence variation in results
• These evaluations are not quality judgments
  – No one sample handling protocol is the best
  – No one bioinformatics protocol is the best
• In the wild west of the pilot phase, every protocol looks good by some metrics, bad by others
• Be open, be honest – if something looks odd, say so
4
Samples and sample set design
Sample type | Health status | # Fresh | # DNA
Fresh | Sick (ICU) | 3 | 2
Fresh | Sick (Diabetes/RA) | 2 | 0
Fresh | Healthy | 2 | 3
Fresh | Healthy | 2 | 2
Fresh | Sick (ICU) | 3 | 2
Fresh | Sick (ICU) | 2 | 2
Fresh | Sick (ICU) | 2 | 0
Fresh | Sick (Diabetes/RA) | 3 | 2
Fresh | Sick (ICU) | 2 | 2
Fresh | Healthy | 2 | 2
Fresh | Healthy | 2 | 0
Freeze-dried | 6 month post-surgery CRC case | 3 | 2
Freeze-dried | 6 month post-surgery CRC case | 2 | 3
Freeze-dried | 4 month post-surgery CRC case | 2 | 0
Freeze-dried | Pre-surgery control | 3 | 2
Freeze-dried | 6 month post-surgery CRC case | 2 | 2
Freeze-dried | Pre-surgery control | 2 | 3
Freeze-dried | 3 month post-surgery CRC case | 2 | 2
Robogut | Healthy | 3 | 2
Robogut | Healthy | 3 | 2
Oral mock | | 3 | 3
Gut mock | | 3 | 3
Blank | | 0 | 2
Totals (96) | | 53 | 43
Emma Schwager, Josh Sampson
• ~40% triplicates, ~60% duplicates
• ~50% pre-extracted
• Range of phenotypes
• ~50% healthy
• Triplicates of mocks
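The totals row and the "~50% pre-extracted" bullet can be cross-checked by summing the two count columns of the sample set table. This is a minimal Python sketch; it assumes, from the flattened layout, that "# Fresh" and "# DNA" are per-specimen aliquot counts, so the pre-extracted share is the "# DNA" total divided by the grand total.

```python
# Rows transcribed from the sample set table: (sample type, health status, # Fresh, # DNA).
# Assumption: the two numeric columns are per-specimen aliquot counts.
ROWS = [
    ("Fresh", "Sick (ICU)", 3, 2),
    ("Fresh", "Sick (Diabetes/RA)", 2, 0),
    ("Fresh", "Healthy", 2, 3),
    ("Fresh", "Healthy", 2, 2),
    ("Fresh", "Sick (ICU)", 3, 2),
    ("Fresh", "Sick (ICU)", 2, 2),
    ("Fresh", "Sick (ICU)", 2, 0),
    ("Fresh", "Sick (Diabetes/RA)", 3, 2),
    ("Fresh", "Sick (ICU)", 2, 2),
    ("Fresh", "Healthy", 2, 2),
    ("Fresh", "Healthy", 2, 0),
    ("Freeze-dried", "6 month post-surgery CRC case", 3, 2),
    ("Freeze-dried", "6 month post-surgery CRC case", 2, 3),
    ("Freeze-dried", "4 month post-surgery CRC case", 2, 0),
    ("Freeze-dried", "Pre-surgery control", 3, 2),
    ("Freeze-dried", "6 month post-surgery CRC case", 2, 2),
    ("Freeze-dried", "Pre-surgery control", 2, 3),
    ("Freeze-dried", "3 month post-surgery CRC case", 2, 2),
    ("Robogut", "Healthy", 3, 2),
    ("Robogut", "Healthy", 3, 2),
    ("Oral mock", "", 3, 3),
    ("Gut mock", "", 3, 3),
    ("Blank", "", 0, 2),
]


def tally(rows):
    """Return (fresh total, pre-extracted DNA total, grand total)."""
    fresh = sum(r[2] for r in rows)
    dna = sum(r[3] for r in rows)
    return fresh, dna, fresh + dna


fresh, dna, total = tally(ROWS)
pre_extracted_share = dna / total
print(fresh, dna, total, round(pre_extracted_share, 2))  # 53 43 96 0.45
```

The sums reproduce the 53, 43, and 96 totals, and 43/96 is about 0.45, consistent with "~50% pre-extracted".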
5
Sample handling protocol variables
Lab | Extraction | Homog? | Primers | Sequencer | Read length | PhiX | Quality % | Reads
Allen-Vercoe | Promega | Yes | EMP | MiSeq | 210 | 0.005 | 0.937 | 22340000
Burk | Qiagen, MO-BIO | Yes | EMP | MiSeq | 300 | 0.3 | 0.943 | 20000000
Bushman | MO-BIO | Yes | EMP | MiSeq | 250 | 0.15 | | 1140000
Flores | Qiagen | Yes | EMP | MiSeq | 250 | 0.05 | 0.891 | 5200000
Gevers | Chemagen | Yes | EMP | MiSeq | 175 | 0.1045 | 0.688 | 23990000
Goodman | Omega BioTek | Yes | Schloss 2013 | MiSeq | 250 | 0.15 | 0.8875 | 53394580
Knight | MO-BIO | Yes | EMP | MiSeq | 150 | 0.1 | 0.846 | 21680000
Littman | MO-BIO | No | EMP | MiSeq | 150 | 0.1 | 0.961 | 9012607
Mills | Zymo | Yes | EMP | MiSeq | 250 | 0.065 | 0.914 | 16070679
Petrosino | MO-BIO | No | EMP | MiSeq | 250 | 0.09 | 0.884 | 15200000
Ravel | | Yes | 318F/806R | MiSeq | 300 | 0.1 | | 28514115
Schloss | MO-BIO | Yes | Schloss 2013 | MiSeq | 250 | 0.07 | 0.811 | 14570000
Shanks | GeneRite | Yes | EMP | | | | |
Sinha | | No | EMP | MiSeq | 150 | 0.25 | 0.939 | 15497673
Turnbaugh | MO-BIO | Yes | EMP | HiSeq 2500 | 150 | 0 | 0.573 | 59780000
6
Raw sequencing dataset sizes (per sample)
[Figure: raw sequences per sample by lab (Ravel, Flores, Mills, Goodman, Sinha, Knight, Shanks, Schloss, Gloor, Petrosino, Littman, Bushman, Turnbaugh, Gevers, Burk); HiSeq run marked; typical target = ~50k/sample]
7
Emma Schwager
No ultimate correlation between pre-filtering sequence quality and result quality.
[Figure: HiSeq run labeled]
8
Sample handling and bioinformatic protocol choices can produce comparable effects
[Figure (log10 axis), sample handling labs: Bushman, Goodman, Gevers, Mills, Gloor, Petrosino, Ravel, Yeager/Yu, Shanks, Schloss, Burk, Flores, Knight, Littman, Turnbaugh]
• Only lab to use V3-4 primers
• Only lab to use a non-MiSeq sequencer (HiSeq 2500)
• Only lab to use Promega extraction (?)
• Lowest average sequences per sample (~18,700)
Boyu Ren
9
Bioinformatics protocol variables
Lab | Quality trimming | QC software | Post-stitch filtering | Stitching software | Stitching overlap | OTU software | OTU ID | OTU clustering | Taxonomic assignment | OTU filtering
Gloor | None | None | Yes | pandaseq | 1 | UPARSE 7 | 0.97 | Yes | Classification | No
Burk | 20 | UPARSE 7 | Yes | FLASH | 20 | UPARSE 7 | 0.97 | Yes | Clustering | No
Chia | custom | None | No | pandaseq | 20 | QIIME 1.8 | 0.97 | No | Clustering | No
Sinha | 3 | Trimmomatic | Yes | QIIME 1.8 | 6 | QIIME 1.8 | 0.6 | Yes | Classification | Yes
Huttenhower | None | UPARSE 7 | Yes | UPARSE 7 | 16 | QIIME 1.8 | 0.97 | Yes | Clustering | No
Petrosino | 20 | UPARSE 7 | No | UPARSE 7 | 50 | UPARSE 7 | 0.96 | Yes | Mapping | No
Knight (deblur) | 20 | QIIME 1.9 | Yes | QIIME 1.9 | 1 | | | No | Mapping | No
Knight | 4 | QIIME 1.9 | No | QIIME 1.9 | 6 | QIIME 1.9 | 0.97 | Yes | Clustering | No
Ravel | 15 | Trimmomatic | No | FLASH | 20 | UPARSE 7 | 0.6 | Yes | Classification | Yes
Caporaso | 19 | QIIME 1.9 | No | QIIME 1.9 | 6 | QIIME 1.8 | 0.97 | Yes | Clustering | Yes
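The "Stitching overlap" column in the bioinformatics table ranges from 1 to 50 bases. To illustrate why that setting matters, here is a minimal Python sketch of paired-end read stitching that requires a minimum overlap. It is a simplified, hypothetical exact-match merger (the names `revcomp` and `stitch` are ours), not the algorithm used by FLASH, pandaseq, QIIME, or UPARSE, which score overlaps with base qualities; it also ignores read-through cases where the amplicon is shorter than the reads.

```python
# Hypothetical, simplified read merger; not FLASH, pandaseq, QIIME, or UPARSE.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch(fwd: str, rev: str, min_overlap: int, max_mismatch_frac: float = 0.0):
    """Merge a read pair: find the longest suffix of `fwd` that matches a prefix of
    revcomp(`rev`) over at least `min_overlap` bases, allowing up to
    max_mismatch_frac mismatches per overlapping base. Returns None if none qualifies."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = revcomp(rev)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for olap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olap:], rc[:olap]))
        if mismatches <= max_mismatch_frac * olap:
            return fwd + rc[olap:]
    return None


# Toy 20-base amplicon read from both ends with 14-base reads (true overlap = 8 bases).
insert = "AAACCCGGGTTTACGTAGCT"
fwd, rev = insert[:14], revcomp(insert[6:])
print(stitch(fwd, rev, min_overlap=6))  # AAACCCGGGTTTACGTAGCT
print(stitch(fwd, rev, min_overlap=9))  # None
```

With a 6-base minimum the true 8-base overlap is found and the full insert is reconstructed; raising the minimum to 9 rejects the same pair, which is one way a pipeline's stitching setting can silently discard data.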
10
Sample handling and bioinformatic protocol choices can produce comparable effects
[Figure, bioinformatics labs: Huttenhower, Knight (deblur), Sinha, Caporaso, Gloor, Petrosino, Ravel, Knight (qiime), Burk, Chia]
Annotations: pandaseq + UPARSE; Reads <0.1% removed; pandaseq + QIIME; flash + UPARSE; 97% GG 13.8 Open; QIIME; 60% GG 13.5 Open
Snowflakes, but the major determinants are openness + filtering
Boyu Ren
11
[Figure panels: Oral mock; Stool mock; Samples 2 and 8 (ICU); Sample 8 (ICU); Sample 9 (ICU); legend: Sample handling lab; labeled: dlittman, pturnbaugh]
Boyu Ren
12
[Figure panels: Oral mock; Stool mock; Samples 2 and 8 (ICU); Sample 8 (ICU); Sample 9 (ICU); legend: Bioinformatics lab]
Boyu Ren
13
Usual suspects drive variation in joint OTU table
[Figure: joint OTU table; lab labels in order: Huttenhower, Sinha, Caporaso, Petrosino, Ravel, Chia, Knight, Bushman, Goodman, Gevers, Littman, Mills, Allen-Vercoe, Petrosino, Ravel, Sinha, Schloss, Turnbaugh, Burk, Flores, Knight; legend: Clostridiales, Bacteroidales, Lactobacillales, Enterobacteriales, Fusobacteriales]
14
Partitioning variance due to sample type, handling, and bioinformatics
feature abundance = sample variables + handling + bioinformatics
  sample variables: Source, Storage, Pre-extraction, …
  handling: Extraction, Amplification, Sequencing, …
  bioinformatics: Stitching, Filtering, Clustering, Assignment, …
Phylum (others to come)
Emma Schwager
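The additive model above can be made concrete by fitting it to one feature (for example, one phylum's relative abundance) and asking how much variance each block of variables explains. The slide does not say how the partition was computed, so this Python sketch is a hypothetical illustration: it uses ordinary least squares with dummy-coded categorical variables and sequential (type I) R² increments, on simulated toy data rather than MBQC measurements. The function names `dummies`, `r_squared`, and `partition` are ours.

```python
import numpy as np


def dummies(labels):
    """Reference-coded indicator columns for a categorical vector (first sorted level dropped)."""
    levels = sorted(set(labels))[1:]
    return np.array([[float(x == lv) for lv in levels] for x in labels]).reshape(len(labels), len(levels))


def r_squared(y, X):
    """R^2 of an ordinary least-squares fit of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def partition(y, blocks):
    """Sequential (type I) R^2 increments: each block's share is the extra variance it
    explains after the blocks listed before it (hypothetical choice of method)."""
    X = np.ones((len(y), 1))
    shares, prev = {}, 0.0
    for name, labels in blocks.items():
        X = np.hstack([X, dummies(labels)])
        r2 = r_squared(y, X)
        shares[name] = r2 - prev
        prev = r2
    return shares


# Toy, simulated data (NOT MBQC results): 6 specimens x 3 handling labs x 2 pipelines.
rng = np.random.default_rng(0)
rows = [(s, h, b) for s in range(6) for h in "ABC" for b in ("p1", "p2")]
specimen_effect = rng.normal(0.0, 1.0, size=6)
handling_effect = {"A": 0.0, "B": 0.5, "C": -0.5}
pipeline_effect = {"p1": 0.0, "p2": 0.3}
y = np.array([specimen_effect[s] + handling_effect[h] + pipeline_effect[b] for s, h, b in rows])
y = y + rng.normal(0.0, 0.1, size=len(rows))
shares = partition(y, {
    "sample": [r[0] for r in rows],
    "handling": [r[1] for r in rows],
    "bioinformatics": [r[2] for r in rows],
})
print({k: round(v, 3) for k, v in shares.items()})
```

Because the increments are sequential, a block's share can depend on the order of `blocks` when the design is unbalanced; the toy design here is fully crossed, which keeps the blocks orthogonal.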
15
[Figure panels: Samples; Handling; Bioinfo. (HiSeq run labeled)]
• V3-4 primers slightly enrich Firmicutes
• NCI extraction slightly enriches Bacteroidetes
• Disease samples slightly enrich Proteobacteria
• Freeze-drying slightly enriches Bacteroidetes
Emma Schwager
16
Negative controls: water blanks
[Figure panels: Bioinformatics; Handling; labs: Petrosino, Littman, Gloor, Flores, Burk, Knight, Shanks, Ravel, Schloss, Turnbaugh, Goodman]
Length filtering???
Amnon Amir
17
Positive controls: mock communities
Bifidobacterium angulatum F16 #22 Collinsella aerofaciens 4_8_47FAA
Propionibacterium acnes 5_U_42AFAA Alistipes shahii ETR2 #14
Bacteroides caccae MR1 #13 Parabacteroides merdae UC1 BHI R
Anaerostipes hadrus 5_1_63FAA Clostridium bolteae CC43_001B
Coprobacillus cateniformis 29/1 Enterococcus gallinarum 30_1
Lactobacillus iners 7_1_47FAA Paenibacillus barengoltzii CC33_002B
Pediococcus acidilactici 7_4A Subdoligranulum variabile 6_1_47FAA
Fusobacterium gonidiaformans 3_1_5R Fusobacterium varium 12_1B
Bilophila wadsworthia AC2_8_11 AN D5 FAA1 Escherichia coli 1_1_43
Ralstonia pickettii 5_7_47FAA Pyramidobacter sp. 22-5-S 12 D6 FAA
Bifidobacterium longum 12_1_47BFAA Eggerthella lenta MR1 #12
Rothia mucilaginosa CC87LB Slackia exigua CD1 D6 FAA 13
Capnocytophaga sputigena CC21_001D Prevotella oralis CC98A
Barnesiella sp. 6_1_58FAA CT1 Bacillus licheniformis BT1BCT2
Dialister pneumosintes CD1 D5 FAA 6 Gemella morbillorum CC57F
Granulicatella adiacens CC94D Mogibacterium timidum CD1 D5 FAA 3
Parvimonas micra CD1 D6 FAA 3 Streptococcus gordonii 2_1_36FAA
Veillonella parvula sp. 3_1_44 Weissella cibaria F16 #1
Fusobacterium periodonticum 1_A_54 (D10) Leptotrichia goodfellowii 4_A_31 (D28)
Campylobacter concisus 10_1_50 Eikenella corrodens CC92I
Klebsiella pneumoniae 1_1_55 Neisseria sicca GT4ACT1
Gut-like mock
Oral-like mock
Roughly even, some bugs in 0.25x or 0.5x units
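The mock communities are roughly even, with some strains added in 0.25x or 0.5x units. Converting those units into expected relative abundances gives the reference profile against which each lab's observed mock composition can be scored. This is a minimal Python sketch; the unit assignments and the observed profile are made-up placeholders (the slide does not say which strains were added at reduced levels), and Bray-Curtis dissimilarity is our choice of comparison metric, not necessarily the one used in MBQC.

```python
def expected_profile(units):
    """Convert per-strain input units (1x, 0.5x, 0.25x, ...) into expected relative abundances."""
    total = sum(units.values())
    return {taxon: u / total for taxon, u in units.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical unit assignments for four gut-mock strains (placeholders, not the real design).
units = {
    "Escherichia coli": 1.0,
    "Bacteroides caccae": 1.0,
    "Eggerthella lenta": 0.5,
    "Ralstonia pickettii": 0.25,
}
expected = expected_profile(units)
# Made-up observed profile in which one low-abundance strain was missed entirely.
observed = {"Escherichia coli": 0.50, "Bacteroides caccae": 0.30, "Eggerthella lenta": 0.20}
print(round(bray_curtis(expected, observed), 3))  # 0.155
```

A strain spiked at 0.25x contributes only about 9% of this toy community, so dropping it (as a strict abundance or length filter might) moves the score noticeably even when the dominant strains are well recovered.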
18
Positive controls: mock communities
[Figure panels labeled Bioinformatics and Handling; labs shown: Bushman, Petrosino, Goodman, Gloor, Flores, Jones, Mills, Gevers, Knight, Shanks, Ravel, Schloss, plus REFERENCE]
Annotations: Primers; Omega ext.?; Low seq. #s?; Minimal filtering
Amnon Amir
19
Comments, conclusions, and next steps
• There exist multiple sample handling protocols that provide reasonable data
• There exist multiple bioinformatics protocols that can misanalyze reasonable data
  – Some can even denoise it when it’s suboptimal
• Experimenter beware – everything matters!
  – Comparable effect sizes of phenotype, sample handling, and bioinformatics
• Working to extract combinations of protocol choices that provide an overall happy medium
Galeb Abu-Ali
20
Recommendations and discussion points for MBQC-I
• Need to systematically evaluate the resulting controlled set of sample handling variables
  – Add sample collection methods and environments
  – Common extraction, amplification, and seq. protocols
• Ditto bioinformatics – in ways even more complex
  – QC: trimming, filtering, and chimera checking
  – Stitching
  – OTU clustering, classification, or mapping
  – Taxonomic assignment
• To be continued…
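OTU clustering is one of the bioinformatics choices slated for evaluation, and most clustering pipelines in the bioinformatics table use an OTU identity threshold of 0.96 to 0.97. For readers unfamiliar with what a 97% threshold does, here is a minimal Python sketch of greedy centroid clustering. It is a simplified, hypothetical illustration (the names `identity` and `greedy_otus` are ours), not the UPARSE or QIIME algorithm: it assumes equal-length, pre-aligned reads, scores identity by position-wise matches, and does no chimera checking.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering (hypothetical sketch, not UPARSE or QIIME):
    unique sequences are visited in decreasing abundance; each joins the first
    existing centroid with identity >= threshold, otherwise it founds a new OTU."""
    centroids, assignment = [], {}
    for seq, _count in Counter(reads).most_common():
        for c in centroids:
            if identity(seq, c) >= threshold:
                assignment[seq] = c
                break
        else:
            centroids.append(seq)
            assignment[seq] = seq
    return centroids, assignment


# Toy 100-base reads: one dominant sequence, a 1-mismatch variant (99% identical),
# and a 5-mismatch variant (95% identical).
base = "A" * 100
near = "C" + "A" * 99
far = "CCCCC" + "A" * 95
centroids, assignment = greedy_otus([base] * 5 + [near] * 2 + [far])
print(len(centroids))  # 2
```

The 1-mismatch variant is absorbed into the dominant OTU, while the 5-mismatch variant falls below 97% and becomes its own OTU; lowering the threshold to 0.95 would merge all three, which is one reason the OTU ID setting can shift downstream diversity estimates.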
21
Thanks!
Alex Kostic
Ayshwarya Subramanian
Joseph Moon
George Weingart
Tim Tickle
Xochi Morgan
Daniela Boernigen
Emma Schwager
Jim Kaminski
Afrah Shafquat
Eric Franzosa
Boyu Ren
Regina Joice
Koji Yasuda
Tiffany Hsu
Kevin Oh
Randall Schwager
Chengwei Luo
Keith Bayer
Moran Yassour
Alexandra Sirota
Galeb Abu-Ali
Ali Rahnavard
Soumya Banerjee
http://huttenhower.sph.harvard.edu
Christian Abnet, Rashmi Sinha, Emily Vogtmann, Josh Sampson
Jianxin Shi
Rob Knight, Amnon Amir
Owen White, Victor Felix
Emma Allen-Vercoe, Robby Burk, Rick Bushman, Greg Caporaso, Nick Chia, Roberto Flores, Dirk Gevers, Greg Gloor
Microbiome Quality Control Project
Andy Goodman, Dan Littman, David Mills, Joe Petrosino, Jacques Ravel, Pat Schloss, Orin Shanks, Peter Turnbaugh