Whole Genome Sequencing: How Is It Being Used? Is it Revolutionizing Food Safety? The Basics Henk C. den Bakker Center for Food Safety University of Georgia

Page 1: Whole Genome Sequencing: How Is It Being Used? Is it Revolutionizing Food Safety? The ... › ext › resources › FSS... · 2019-05-15 · Is it Revolutionizing Food Safety? The

Whole Genome Sequencing: How Is It Being Used? Is it Revolutionizing Food Safety? The Basics

Henk C. den Bakker

Center for Food Safety

University of Georgia

Overview

• What is Whole Genome Sequencing?

  • Sequencing platforms

  • What kind of data do we get?

  • How can we use the data?

• Bacterial Genomics

  • Genomic characteristics of some common foodborne pathogens

  • How do we use sequence data to infer outbreak clusters?

Whole genome sequencing

• All DNA from a single organism is sequenced (chromosome, plasmids)

• All platforms sequence a genome in fragments (reads)

  • Reads have a certain error rate

• To properly infer the true nucleotide at each position in a genome, we generally need information from several reads, i.e., we need to sequence a genome more than once (X-fold coverage), typically > 30X for Illumina data.

• Much of current bioinformatics involves reconstructing genomes from these reads or extracting information from them
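As a rough illustration of X-fold coverage: the expected average coverage is the total number of sequenced bases divided by the genome size. The sketch below is a minimal Python example under that simplification. The function names and the example numbers (a 5 Mb genome, 250 bp reads) are assumptions for illustration and do not come from the slides.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Minimum number of reads needed to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 5 Mb bacterial genome sequenced with 250 bp reads.
genome_size = 5_000_000
print(mean_coverage(600_000, 250, genome_size))  # 30.0
print(reads_needed(30, 250, genome_size))        # 600000
```

Real coverage is uneven along a genome, so the average is only a planning figure. Individual regions can fall well below it.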

Short read vs long read platforms

• All platforms sequence a genome in fragments (reads)

• Based on the size of these reads we can subdivide platforms into:

  • Short read platforms

    • Illumina

    • Ion Torrent

  • Long read platforms

    • Pacific Biosciences

    • Oxford Nanopore

Illumina

• Current maximum read size:

  • 2 x 300 bp (MiSeq)

  • Output: 15 Gb/25 million reads

• Low per-read error rate

• MiSeq is the ‘workhorse’ of GenomeTrakr/PulseNet

• Small footprint

• Can be used for:

  • Creating high quality draft genomes (draft = not closed)

  • Amplicon sequencing (e.g., 16S for microbiomes)

  • Variant detection (e.g., SNPs for genomic epidemiology)

• Relatively cheap, ~ $100 per bacterial genome

• ‘Easy’ library protocol (Nextera kits):

  • PCR machine

  • Qubit for DNA quantification

Pacific Biosciences

• SMRT sequencing:

  • Single Molecule Real Time

  • https://youtu.be/WMZmG00uhwU

• Long read platform:

  • Output per SMRT cell: 10 Gb

  • Read length: > 50% of the data in reads of > 30,000 bp

  • Small percentage of reads > 90,000 bp

  • Reads have a high per-read error rate, but the consensus error is low

• More expensive than Illumina, starting at $500 per genome

• Main uses:

  • Generating closed reference genomes

  • Sequencing the methylome

• Large laboratory footprint; specialized equipment is needed for library preps
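Why can a high per-read error rate still give a low consensus error? If errors are random and independent between reads, a majority vote over the reads covering a position is rarely wrong. The sketch below uses a simple binomial model. It is conservative, because it counts every case in which at least half of the reads are wrong, even if those reads disagree with each other. The 13% per-read error rate and the function name are assumptions for illustration, not figures from the slides.

```python
from math import comb


def majority_error_probability(per_read_error: float, coverage: int) -> float:
    """Probability that at least half of the reads covering a position are wrong,
    assuming errors are independent between reads (binomial model)."""
    threshold = (coverage + 1) // 2  # smallest k with k >= coverage / 2
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


# Hypothetical per-read error rate of 13% at increasing coverage.
for cov in (1, 10, 30):
    print(cov, majority_error_probability(0.13, cov))
```

Systematic (non-random) errors break the independence assumption and are not removed by adding coverage, so this model is only a first approximation.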

Oxford Nanopore

• The ‘new’ kid on the block

• Sequencers with an extremely small footprint

  • MinION

  • SmidgION

• Long read technology: read lengths of > 100,000 bp are not unusual, > 800,000 bp is possible

• Data can be analyzed while the sequencing run is in progress

• High per-read error rate

• Technology is still in a state of flux

• Applications largely overlap with PacBio, but with much lower cost and upfront investment

What can we do with the sequence data?

• Reconstruct the genome of the sequenced organism; genome assembly

• Map reads to the reconstructed genome of a related organism and infer genomic differences (e.g., SNPs); reference based methods

• There are also methods that use neither assembly nor reference mapping to infer information (e.g., antibiotic resistance) from sequence data
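One family of methods that needs neither assembly nor read mapping works directly on k-mers (all substrings of length k). The sketch below asks what fraction of the k-mers of a target gene occur in the raw reads. It is a toy illustration, not a specific tool: the sequences and function names are made up, and reverse complements and sequencing errors, which real tools must handle, are ignored.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_containment(target: str, reads: list[str], k: int) -> float:
    """Fraction of the target's k-mers that occur in at least one read."""
    target_kmers = kmers(target, k)
    if not target_kmers:
        return 0.0
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(target_kmers & read_kmers) / len(target_kmers)


# Toy data: a made-up "gene" and three made-up reads.
target_gene = "ATGGCGTACGTTAGC"
reads = ["TTATGGCGTACG", "GCGTACGTTAGCAA", "CCCCCCCCCCCC"]
print(kmer_containment(target_gene, reads, k=5))
```

A containment close to 1 suggests that the target is present in the sequenced isolate. A value close to 0 suggests that it is absent or highly divergent.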

Listeria monocytogenes

• Genome size between 2.7 Mb (million base-pairs) and 3.2 Mb

• Usually 0 or 1 plasmid in a strain

• Four lineages that are very different genetically

• Lineages are subdivided into clonal complexes

• Between clonal complexes: thousands of single-nucleotide differences

• Within clonal complexes: very little genomic variation (e.g., < 200 SNPs within clonal complex 1)

• Low number of insertion sequences (ISs), highly conserved genome (no major rearrangements, even between Lm and closely related species)

Salmonella enterica subsp. enterica

• Genome size between 4 and 5.5 Mb

• Large diversity of plasmids:

  • Virulence plasmids

  • AR plasmids

  • Plasmids can be large (> 300 kb)

• Large divergence between most serovars (> 10,000 SNPs)

• Limited genomic rearrangements, limited ISs

Escherichia coli

• Genome size 4.2 to 5.5 Mb

• Large plasmid diversity:

  • Virulence plasmids

  • AMR plasmids

  • Plasmids range in size from very small (ca. 2 kb) to > 250 kb

• Frequent genome rearrangements, many ISs

https://www.niaid.nih.gov/diseases-conditions/e-coli

How is WGS used? An example from a pilot study of outbreak cluster detection

Steps for outbreak detection:

• Sequence isolates of suspected outbreak subtype

• Map sequence reads against a publicly available reference

  • If no reference is publicly available, create a reference by de novo assembly

  • If the reference is too divergent from the outbreak strain, create a new reference from a more closely related strain as identified a posteriori

• Detect Single nucleotide polymorphisms (SNPs)

• Create phylogenetic tree based on SNPs

SNP detection

• Every homologous region that the reference and the query strain have in common is interrogated for SNPs

• Regions that have insufficient read coverage or noisy reads are excluded from the analysis

• For relatively high quality data this usually translates to >99% of the genome being interrogated!

• This means 4 to 5 million positions in pathogens like Salmonella, 2.5 to 3 million positions in Listeria
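The fraction of the genome that is interrogated can be estimated from the per-position read depth: positions below a minimum depth are excluded, and the rest count as callable. The sketch below is a minimal illustration. The depth values, the threshold, and the function name are assumptions, and real pipelines also apply quality-based filters for noisy reads.

```python
def interrogated_fraction(depths: list[int], min_depth: int) -> float:
    """Fraction of reference positions with read depth >= min_depth."""
    if not depths:
        return 0.0
    callable_positions = sum(1 for d in depths if d >= min_depth)
    return callable_positions / len(depths)


# Toy per-position depths for a 10 bp "reference" with a poorly covered stretch.
depths = [35, 40, 38, 2, 0, 31, 29, 33, 36, 30]
print(interrogated_fraction(depths, min_depth=8))  # 0.8
```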

Case: Outbreak detection in Salmonella Enteritidis PFGE/MLVA type JEGX01.0004W

• The Wadsworth Center Bacteriology laboratories type 1300-1800 isolates of Salmonella enterica each year

• Serovar Enteritidis is the most common (400-500 isolates each year)

• 25% of all isolates have the same combined PFGE/MLVA pattern

• Outbreaks with common patterns go undetected unless strong epidemiological links are found

• Strong epidemiological link found for Connecticut nursing home in which seven patients were found to have consumed cannolis from a NY deli

• Genome sequencing on the Ion Torrent PGM at the Wadsworth Center

• Collaboration of NYSDOH, FDA and Cornell University

Isolate selection

• Retrospective component: 29 isolates

  • 7 isolates linked to the ‘long-term care facility outbreak’

  • 21 isolates with a similar PFGE/MLVA type from NY or adjacent states, collected in 2010/2011

  • 1 isolate with a similar PFGE type, but a different Multi Locus VNTR Analysis (MLVA) type

• Prospective component: 64 isolates

De Novo Genome characterization of Outbreak strain

• Retrospective study

  • Mapping reads of the outbreak strains against the publicly available genome of S. Enteritidis P125109 showed a difference of ca. 580 SNPs

  • Certain regions (prophage related) present in P125109 were missing in the outbreak strain

  • To create a more closely related reference, the genome of an outbreak strain was assembled de novo using MIRA

De Novo Genome characterization of Outbreak strain

[Figure labels: ELPhiS-like prophage; virulence plasmid; total genome size ca. 4,727,619 bp]

Detect Single Nucleotide Polymorphisms (SNPs)

• Read mapping based approach:

  • Filtering is necessary to avoid false positives resulting from low coverage, sequence errors, etc.

  • SNPs were only used if:

    • They were found in a region with at least 8X coverage

    • They were supported by at least 90% of the reads

• Cortex variation assembler:

  • De novo variant detection
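The two filter criteria for the read mapping based approach (8X coverage, 90% read support) can be expressed as a small filter over candidate SNPs. The sketch below is an illustration only. The `CandidateSNP` record, its fields, and the example values are hypothetical, and treating both thresholds as minimums is an assumption about how the criteria were applied.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNP:
    position: int   # 1-based position on the reference
    ref_base: str
    alt_base: str
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the alternative base


def passes_filter(snp: CandidateSNP, min_depth: int = 8, min_support: float = 0.90) -> bool:
    """Keep a SNP only if coverage and read support both reach the thresholds."""
    if snp.depth == 0 or snp.depth < min_depth:
        return False
    return snp.alt_reads / snp.depth >= min_support


# Hypothetical candidates.
candidates = [
    CandidateSNP(1042, "A", "G", depth=35, alt_reads=35),  # kept
    CandidateSNP(2210, "C", "T", depth=5, alt_reads=5),    # rejected: low coverage
    CandidateSNP(3977, "G", "A", depth=40, alt_reads=22),  # rejected: low support
]
kept = [snp for snp in candidates if passes_filter(snp)]
print([snp.position for snp in kept])  # [1042]
```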

Variation of coverage within a genome

Phylogenetic reconstruction based on SNPs

• Steps:

  • Matrix construction for all SNPs that passed the filter

  • Maximum likelihood based phylogenetic inference in PhyML

  • Jukes-Cantor model of molecular evolution

  • 100 ML bootstrap replicates performed to test significance of phylogenetic clustering
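To make the SNP matrix and the Jukes-Cantor model concrete: the matrix has one row per isolate and one column per variable position. Under Jukes-Cantor, all substitutions are equally likely, and a proportion p of differing sites corresponds to a distance d = -3/4 ln(1 - 4p/3). The sketch below computes pairwise SNP counts and Jukes-Cantor distances for a toy matrix. It does not reproduce the maximum likelihood inference done in PhyML; the isolate names, the sequences, and the number of interrogated sites are made up.

```python
import math


def pairwise_differences(a: str, b: str) -> int:
    """Number of SNP-matrix columns at which two isolates differ."""
    if len(a) != len(b):
        raise ValueError("rows of the SNP matrix must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance for a proportion p of differing sites (0 <= p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy SNP matrix: one row per isolate, one column per variable position.
snp_matrix = {
    "isolate_A": "ACGTA",
    "isolate_B": "ACGTA",
    "isolate_C": "GCGTT",
}
interrogated_sites = 4_700_000  # hypothetical number of positions compared

names = sorted(snp_matrix)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        diffs = pairwise_differences(snp_matrix[first], snp_matrix[second])
        distance = jukes_cantor_distance(diffs / interrogated_sites)
        print(first, second, diffs, distance)
```

The proportion p is computed over all interrogated sites rather than over the variable columns only, because dividing by the number of SNP columns would inflate the distance.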

Temporal/Geographical distribution of ‘LTCF-outbreak’ clade

[Figure: number of cases (y-axis, 0 to 4) against time starting at the first case (x-axis, 1 to 50). Annotations: initial outbreak; Nassau county; 14 cases in the Westchester, Putnam NY, Greenwich CT region; 1 case in Washington county NY.]

Prospective and retrospective study combined:

• Two outbreaks

• Multiple small clusters, some of them with human clinical cases from three consecutive years

H. C. den Bakker et al. 2014. Emerging Infect Dis 20, 1306–1314 (4).

WGS in public health now

• WGS has been used routinely for surveillance of L. monocytogenes since 2013; PulseNet will be switching to WGS for many other foodborne pathogens too

• Routinely used by FDA, USDA

• Centralized and publicly accessible database of WGS data through NCBI (https://www.ncbi.nlm.nih.gov/pathogens/)

• Data deposited in NCBI are analyzed for putative outbreak clusters

den_bakker

github.com/hcdenbakker
