Initial results from the Microbiome Quality Control Project pilot phase (MBQC-pilot) Curtis...

Page 1

Initial results from the Microbiome Quality Control Project pilot phase (MBQC-pilot)

Curtis Huttenhower

09-30-14

Harvard School of Public Health, Department of Biostatistics

U. Oregon

Page 2

MBQC-pilot?

• Some amazing numbers:
– 19 labs providing data
– >50 researchers
– >40 workshop attendees
– 94 specimens studied
– 2,238 samples sequenced
– 155M sequences generated
– 16,555 samples analyzed
– ~3x HMP data, ~6x HMP seqs.!

Name          Affiliation          Handl.  Bioinf.
Allen-Vercoe  U. Guelph            X
Burk          Albert Einstein      X       X
Bushman       U. Pennsylvania      X
Caporaso      NAU                          X
Chia          Mayo Clinic                  X
Flores        DCP, NCI             X
Gevers        Broad Institute      X
Gloor         U. Western Ontario           X
Goodman       Yale                 X
Huttenhower   HSPH                         X
Knight        UCSD                 X       X
Littman       New York University  X
Mills         UCD                  X
Petrosino     Baylor               X       X
Ravel         U. Maryland          X       X
Schloss       U. Michigan          X
Shanks        UPA                  X
Turnbaugh     UCSD                 X
Yeager/Yu     DCEG, NCI            X       X

Page 3

Some ground rules

• The goal of the MBQC is not to identify which groups produce the best data
– Nor which software or protocol is the “best”
– The goal, particularly for the pilot, is to identify which protocol choices influence variation in results

• These evaluations are not quality judgments
– No one sample handling protocol is the best
– No one bioinformatics protocol is the best

• In the wild west of the pilot phase, every protocol looks good by some metrics, bad by others

• Be open, be honest – if something looks odd, say so

Page 4

Samples and sample set design

Sample type   Health status                   # Fresh  # DNA
Fresh         Sick (ICU)                      3        2
Fresh         Sick (Diabetes/RA)              2        0
Fresh         Healthy                         2        3
Fresh         Healthy                         2        2
Fresh         Sick (ICU)                      3        2
Fresh         Sick (ICU)                      2        2
Fresh         Sick (ICU)                      2        0
Fresh         Sick (Diabetes/RA)              3        2
Fresh         Sick (ICU)                      2        2
Fresh         Healthy                         2        2
Fresh         Healthy                         2        0
Freeze-dried  6 month post-surgery CRC case   3        2
Freeze-dried  6 month post-surgery CRC case   2        3
Freeze-dried  4 month post-surgery CRC case   2        0
Freeze-dried  Pre-surgery control             3        2
Freeze-dried  6 month post-surgery CRC case   2        2
Freeze-dried  Pre-surgery control             2        3
Freeze-dried  3 month post-surgery CRC case   2        2
Robogut       Healthy                         3        2
Robogut       Healthy                         3        2
Oral mock                                     3        3
Gut mock                                      3        3
Blank                                         0        2
Totals        96                              53       43

Credits: Emma Schwager, Josh Sampson

• ~40% triplicates, ~60% duplicates

• ~50% pre-extracted

• Range of phenotypes

• ~50% healthy

• Triplicates of mocks
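The totals row of the sample table can be checked with a few lines of Python. This is only a sketch: the rows are transcribed from the table on this slide, and the function name `summarize` is an illustrative choice, not part of any MBQC tooling.

```python
# Rows transcribed from the sample set design table:
# (sample type, health status, number of fresh aliquots, number of DNA aliquots)
ROWS = [
    ("Fresh", "Sick (ICU)", 3, 2),
    ("Fresh", "Sick (Diabetes/RA)", 2, 0),
    ("Fresh", "Healthy", 2, 3),
    ("Fresh", "Healthy", 2, 2),
    ("Fresh", "Sick (ICU)", 3, 2),
    ("Fresh", "Sick (ICU)", 2, 2),
    ("Fresh", "Sick (ICU)", 2, 0),
    ("Fresh", "Sick (Diabetes/RA)", 3, 2),
    ("Fresh", "Sick (ICU)", 2, 2),
    ("Fresh", "Healthy", 2, 2),
    ("Fresh", "Healthy", 2, 0),
    ("Freeze-dried", "6 month post-surgery CRC case", 3, 2),
    ("Freeze-dried", "6 month post-surgery CRC case", 2, 3),
    ("Freeze-dried", "4 month post-surgery CRC case", 2, 0),
    ("Freeze-dried", "Pre-surgery control", 3, 2),
    ("Freeze-dried", "6 month post-surgery CRC case", 2, 2),
    ("Freeze-dried", "Pre-surgery control", 2, 3),
    ("Freeze-dried", "3 month post-surgery CRC case", 2, 2),
    ("Robogut", "Healthy", 3, 2),
    ("Robogut", "Healthy", 3, 2),
    ("Oral mock", "", 3, 3),
    ("Gut mock", "", 3, 3),
    ("Blank", "", 0, 2),
]


def summarize(rows):
    """Sum the fresh and DNA aliquot counts over all rows."""
    n_fresh = sum(row[2] for row in rows)
    n_dna = sum(row[3] for row in rows)
    return {"fresh": n_fresh, "dna": n_dna, "total": n_fresh + n_dna}


totals = summarize(ROWS)
print(totals)
print(f"pre-extracted fraction: {totals['dna'] / totals['total']:.2f}")
```

The sums agree with the totals row (53 fresh, 43 DNA, 96 overall), and the pre-extracted fraction is consistent with the "~50% pre-extracted" bullet.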

Page 5

Sample handling protocol variables

Lab           Extraction      Homog?  Primers       Sequencer   Read length  PhiX    Quality %  Reads
Allen-Vercoe  Promega         Yes     EMP           MiSeq       210          0.005   0.937      22340000
Burk          Qiagen, MO-BIO  Yes     EMP           MiSeq       300          0.3     0.943      20000000
Bushman       MO-BIO          Yes     EMP           MiSeq       250          0.15               1140000
Flores        Qiagen          Yes     EMP           MiSeq       250          0.05    0.891      5200000
Gevers        Chemagen        Yes     EMP           MiSeq       175          0.1045  0.688      23990000
Goodman       Omega BioTek    Yes     Schloss 2013  MiSeq       250          0.15    0.8875     53394580
Knight        MO-BIO          Yes     EMP           MiSeq       150          0.1     0.846      21680000
Littman       MO-BIO          No      EMP           MiSeq       150          0.1     0.961      9012607
Mills         Zymo            Yes     EMP           MiSeq       250          0.065   0.914      16070679
Petrosino     MO-BIO          No      EMP           MiSeq       250          0.09    0.884      15200000
Ravel                         Yes     318F/806R     MiSeq       300          0.1                28514115
Schloss       MO-BIO          Yes     Schloss 2013  MiSeq       250          0.07    0.811      14570000
Shanks        GeneRite        Yes     EMP
Sinha                         No      EMP           MiSeq       150          0.25    0.939      15497673
Turnbaugh     MO-BIO          Yes     EMP           HiSeq 2500  150          0       0.573      59780000

Page 6

Raw sequencing dataset sizes (per sample)

[Figure: raw sequencing dataset sizes per sample, by lab: Ravel, Flores, Mills, Goodman, Sinha, Knight, Shanks, Schloss, Gloor, Petrosino, Littman, Bushman, Turnbaugh, Gevers, Burk. Annotations: HiSeq; typical target = ~50k/sample]

Page 7

No ultimate correlation between pre-filtering sequence quality and result quality.

[Figure not reproduced. Annotation: HiSeq. Credit: Emma Schwager]

Page 8

Sample handling and bioinformatic protocol choices can produce comparable effects

[Figure: results by sample handling lab, log10 scale: Bushman, Goodman, Gevers, Mills, Gloor, Petrosino, Ravel, Yeager/Yu, Shanks, Schloss, Burk, Flores, Knight, Littman, Turnbaugh. Credit: Boyu Ren]

Annotations on the figure:

• Only lab to use V3-5 primers

• Only lab to use non-MiSeq (HiSeq 2500)

• Only lab to use Promega extraction (?)

• Lowest ave. seq. per sample (~18,700)

Page 9

Bioinformatics protocol variables

Lab              Quality trimming  QC software  Post-stitch filtering  Stitching software  Stitching overlap  OTU software  OTU ID  OTU clustering  Taxonomic assignment  OTU filtering
Gloor            None              None         Yes                    pandaseq            1                  UPARSE 7      0.97    Yes             Classification        No
Burk             20                UPARSE 7     Yes                    FLASH               20                 UPARSE 7      0.97    Yes             Clustering            No
Chia             custom            None         No                     pandaseq            20                 QIIME 1.8     0.97    No              Clustering            No
Sinha            3                 Trimmomatic  Yes                    QIIME 1.8           6                  QIIME 1.8     0.6     Yes             Classification        Yes
Huttenhower      None              UPARSE 7     Yes                    UPARSE 7            16                 QIIME 1.8     0.97    Yes             Clustering            No
Petrosino        20                UPARSE 7     No                     UPARSE 7            50                 UPARSE 7      0.96    Yes             Mapping               No
Knight (deblur)  20                QIIME 1.9    Yes                    QIIME 1.9                                            1       No              Mapping               No
Knight           4                 QIIME 1.9    No                     QIIME 1.9           6                  QIIME 1.9     0.97    Yes             Clustering            No
Ravel            15                Trimmomatic  No                     FLASH               20                 UPARSE 7      0.6     Yes             Classification        Yes
Caporaso         19                QIIME 1.9    No                     QIIME 1.9           6                  QIIME 1.8     0.97    Yes             Clustering            Yes

Page 10

Sample handling and bioinformatic protocol choices can produce comparable effects

[Figure: results by bioinformatics lab: Huttenhower, Knight (deblur), Sinha, Caporaso, Gloor, Petrosino, Ravel, Knight (qiime), Burk, Chia. Credit: Boyu Ren]

Annotations on the figure:

• pandaseq + UPARSE; reads <0.1% removed

• pandaseq + QIIME

• flash + UPARSE; 97% GG 13.8 Open

• QIIME; 60% GG 13.5 Open

Snowflakes, but major determinants are openness + filtering

Page 11

[Figure: samples labeled by sample handling lab. Labeled groups: Oral mock; Stool mock; Samples 2 and 8 (ICU); Sample 8 (ICU); Sample 9 (ICU). Labeled labs: dlittman, pturnbaugh. Credit: Boyu Ren]

Page 12

[Figure: samples labeled by bioinformatics lab. Labeled groups: Oral mock; Stool mock; Samples 2 and 8 (ICU); Sample 8 (ICU); Sample 9 (ICU). Credit: Boyu Ren]

Page 13

Usual suspects drive variation in joint OTU table

[Figure: joint OTU table. First group of lab labels: Huttenhower, Sinha, Caporaso, Petrosino, Ravel, Chia, Knight. Second group of lab labels: Bushman, Goodman, Gevers, Littman, Mills, Allen-Vercoe, Petrosino, Ravel, Sinha, Schloss, Turnbaugh, Burk, Flores, Knight. Taxa shown: Clostridiales, Bacteroidales, Lactobacillales, Enterobacteriales, Fusobacteriales]

Page 14

Partitioning variance due to sample type, handling, and bioinformatics

feature abundance = sample variables + handling + bioinformatics

Phylum (others to come)

Covariates shown: Source, Storage, Pre-extraction…, Extraction, Amplification, Sequencing, Stitching, Filtering, Clustering, Assignment

Credit: Emma Schwager
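One way to read the additive model on this slide (feature abundance = sample variables + handling + bioinformatics) is as a linear model whose explained variance is split between blocks of covariates. The sketch below is not the analysis behind the slide. It uses simulated data, hypothetical factor levels, and sequential (type I) sums of squares from ordinary least squares with NumPy, purely to illustrate the idea.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a categorical vector, dropping the first (reference) level."""
    levels = sorted(set(labels))
    return np.array([[float(lab == lev) for lev in levels[1:]] for lab in labels])


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partition_variance(y, blocks):
    """Fraction of total variance attributed to each covariate block,
    added in the given order (sequential sums of squares), plus the residual."""
    X = np.ones((len(y), 1))          # intercept-only model
    total = prev = rss(X, y)          # total sum of squares around the mean
    fractions = {}
    for name, block in blocks:
        X = np.hstack([X, block])     # add this block's dummy columns
        cur = rss(X, y)
        fractions[name] = (prev - cur) / total
        prev = cur
    fractions["residual"] = prev / total
    return fractions


# Simulated, fully crossed design with hypothetical levels and effect sizes.
rng = np.random.default_rng(0)
sample_effect = {"sample_1": 0.0, "sample_2": 1.0, "sample_3": 2.0, "sample_4": 3.0}
handling_effect = {"handling_A": 0.0, "handling_B": 0.3, "handling_C": 0.6}
bioinf_effect = {"pipeline_X": 0.0, "pipeline_Y": 0.2, "pipeline_Z": 0.4}

design = [(s, h, b) for s in sample_effect for h in handling_effect for b in bioinf_effect]
y = np.array([sample_effect[s] + handling_effect[h] + bioinf_effect[b] for s, h, b in design])
y = y + rng.normal(0.0, 0.05, size=len(y))   # measurement noise

blocks = [
    ("sample variables", one_hot([s for s, _, _ in design])),
    ("handling", one_hot([h for _, h, _ in design])),
    ("bioinformatics", one_hot([b for _, _, b in design])),
]
fractions = partition_variance(y, blocks)
for name, frac in fractions.items():
    print(f"{name:>16}: {frac:.3f}")
```

Because the simulated design is balanced, the order in which blocks are added barely changes the result. With unbalanced real data, sequential sums of squares depend on block order, so a different decomposition may be more appropriate.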

Page 15

[Figure panels: Samples; Handling; Bioinfo. Annotation: HiSeq. Credit: Emma Schwager]

• V3-4 primers slightly enrich Firmicutes

• NCI extraction slightly enriches Bacteroidetes

• Disease samples slightly enrich Proteobacteria

• Freeze-drying slightly enriches Bacteroidetes

Page 16

Negative controls: water blanks

[Figure: water blanks by lab, in two panels labeled Bioinformatics and Handling. Labs in each panel: Petrosino, Littman, Gloor, Flores, Burk, Knight, Shanks, Ravel, Schloss, Turnbaugh, Goodman. Credit: Amnon Amir]

Length filtering???

Page 17

Positive controls: mock communities

Bifidobacterium angulatum F16 #22
Collinsella aerofaciens 4_8_47FAA
Propionibacterium acnes 5_U_42AFAA
Alistipes shahii ETR2 #14
Bacteroides caccae MR1 #13
Parabacteroides merdae UC1 BHI R
Anaerostipes hadrus 5_1_63FAA
Clostridium bolteae CC43_001B
Coprobacillus cateniformis 29/1
Enterococcus gallinarum 30_1
Lactobacillus iners 7_1_47FAA
Paenibacillus barengoltzii CC33_002B
Pediococcus acidilactici 7_4A
Subdoligranulum variabile 6_1_47FAA
Fusobacterium gonidiaformans 3_1_5R
Fusobacterium varium 12_1B
Bilophila wadsworthia AC2_8_11 AN D5 FAA1
Escherichia coli 1_1_43
Ralstonia pickettii 5_7_47FAA
Pyramidobacter sp. 22-5-S 12 D6 FAA
Bifidobacterium longum 12_1_47BFAA
Eggerthella lenta MR1 #12
Rothia mucilaginosa CC87LB
Slackia exigua CD1 D6 FAA 13
Capnocytophaga sputigena CC21_001D
Prevotella oralis CC98A
Barnesiella sp. 6_1_58FAA CT1
Bacillus licheniformis BT1BCT2
Dialister pneumosintes CD1 D5 FAA 6
Gemella morbillorum CC57F
Granulicatella adiacens CC94D
Mogibacterium timidum CD1 D5 FAA 3
Parvimonas micra CD1 D6 FAA 3
Streptococcus gordonii 2_1_36FAA
Veillonella parvula sp. 3_1_44
Weissella cibaria F16 #1
Fusobacterium periodonticum 1_A_54 (D10)
Leptotrichia goodfellowii 4_A_31 (D28)
Campylobacter concisus 10_1_50
Eikenella corrodens CC92I
Klebsiella pneumoniae 1_1_55
Neisseria sicca GT4ACT1

Panel labels: Gut-like mock; Oral-like mock

Roughly even, some bugs in 0.25x or 0.5x units
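The "roughly even, some bugs in 0.25x or 0.5x units" design implies an expected relative abundance for each strain, which a positive control can be compared against. The sketch below shows the arithmetic only. The strain names and the assignment of 1x, 0.5x, and 0.25x units are hypothetical, since the slide does not say which strains were reduced, and it ignores 16S copy number and extraction bias.

```python
def expected_relative_abundance(units):
    """Convert per-strain input units (e.g. 1.0, 0.5, 0.25) to expected fractions."""
    total = sum(units.values())
    if total <= 0:
        raise ValueError("total input units must be positive")
    return {name: amount / total for name, amount in units.items()}


# Hypothetical four-strain mock: two strains at 1x, one at 0.5x, one at 0.25x.
mock_units = {
    "strain_A": 1.0,
    "strain_B": 1.0,
    "strain_C": 0.5,
    "strain_D": 0.25,
}

expected = expected_relative_abundance(mock_units)
for name, fraction in expected.items():
    print(f"{name}: {fraction:.4f}")
```

An observed profile from any lab could then be compared to `expected` strain by strain, for example as a log ratio of observed to expected fraction.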

Page 18

Positive controls: mock communities

[Figure: mock community results in four panels, labeled Bioinformatics and Handling. Columns in each panel: Bushman, Petrosino, Goodman, Gloor, Flores, Jones, Mills, Gevers, Knight, Shanks, Ravel, Schloss, REFERENCE. Credit: Amnon Amir]

Annotations on the figure:

• Primers

• Omega ext.?

• Low seq. #s?

• Minimal filtering

Page 19

Comments, conclusions, and next steps

• There exist multiple sample handling protocols that provide reasonable data

• There exist multiple bioinformatics protocols that can misanalyze reasonable data
– Some can even denoise it when it’s suboptimal

• Experimenter beware – everything matters!
– Comparable effect sizes of phenotype, sample handling, and bioinformatics

• Working to extract combinations of protocol choices that provide an overall happy medium

Credit: Galeb Abu-Ali

Page 20

Recommendations and discussion points for MBQC-I

• Need to systematically evaluate the resulting controlled set of sample handling variables
– Add sample collection methods and environments
– Common extraction, amplification, and seq. protocols

• Ditto bioinformatics – in ways even more complex
– QC: trimming, filtering, and chimera checking
– Stitching
– OTU clustering, classification, or mapping
– Taxonomic assignment

• To be continued…

Page 21

Thanks!

Alex Kostic, Ayshwarya Subramanian, Joseph Moon, George Weingart, Tim Tickle, Xochi Morgan, Daniela Boernigen, Emma Schwager, Jim Kaminski, Afrah Shafquat, Eric Franzosa, Boyu Ren, Regina Joice, Koji Yasuda, Tiffany Hsu, Kevin Oh, Randall Schwager, Chengwei Luo, Keith Bayer, Moran Yassour, Alexandra Sirota, Galeb Abu-Ali, Ali Rahnavard, Soumya Banerjee

http://huttenhower.sph.harvard.edu

Christian Abnet, Rashmi Sinha, Emily Vogtmann, Josh Sampson, Jianxin Shi

Rob Knight, Amnon Amir

Owen White, Victor Felix

Microbiome Quality Control Project: Emma Allen-Vercoe, Robby Burk, Rick Bushman, Greg Caporaso, Nick Chia, Roberto Flores, Dirk Gevers, Greg Gloor, Andy Goodman, Dan Littman, David Mills, Joe Petrosino, Jacques Ravel, Pat Schloss, Orin Shanks, Peter Turnbaugh
