Outbreak of E. coli O104:H4 heralds a new paradigm in responding to disease threats

Nicola J. Holden, Leighton Pritchard

EHEC O104:H4 outbreak, Europe 2011

Unprecedented:

scale of outbreak (3950 affected, 53 deaths; multiple import restrictions)

emerging pathogen (one previous case in S. Korea)

rapid production of sequence data

crowd-sourcing of assembly and annotation via the collaborative revision control site GitHub: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki

EHEC O104:H4 outbreak – timeline

1st May: onset of outbreak

26th May: strain characteristics (Scheutz et al., 2012 Eurosurveill)

30th May: diagnostic laboratory information released (Muenster)

2nd June: first draft assembly available (GitHub)

9th to 21st June: additional sequences announced

22nd June: microbiological characteristics published (Bielaszewska et al., 2011 LID)

26th July: official end of the outbreak (RKI)

refs: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki; RKI; Institute of Hygiene, Muenster


27th July: publication of open-source genomic analysis

A changing paradigm?

Kwan et al. (2011) http://precedings.nature.com/documents/6663/version/1

Meanwhile: diagnostics, 27th June – 6th July

1. Outbreak isolate-specific, sub-serotype diagnostics

2. Exploit rapid sequencing: work directly from incomplete and unordered draft genome sequences

3. Rapidly generated (perhaps ahead of the biology?)

4. Validated (good estimates of error rates)

5. Easy to use and distribute

6. Cheap(er than sequencing everything)

Alignment-free PCR primer design: no need to identify conserved signature sequences prior to primer design

Alignment-free primer design: strategy

‘Positive’ genome set: 11 genome assemblies of 9 EHEC O104:H4 outbreak isolates (GitHub crowdsourcing)

‘Negative’ genome set: 31 genomes of E. coli and E. fergusonii (GenBank)

Design many (>1000) primers to positive genome set: target CDS; optimise for qRT-PCR; 20-mers; 100 bp amplicons; Ta = 58 °C

Filter primers in silico:

Exclude sets with predicted productive amplification in negative genomes.

Screen primers to exclude sets with strong sequence similarity to any of a larger set of off-target genomes (GenBank Enterobacteriaceae).

Automation

https://github.com/widdowquinn/find_differential_primers

Alignment-free primer design

[Figure: pipeline schematic showing the positive and negative genome sets and stages I to V]

1. Process configuration files: locations and classes of input sequence files.

2. Convert to single (pseudo)chromosomes: concatenate draft genome sequence.

3. Genome feature locations: from GBK file or predicted using Prodigal.
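Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the code in find_differential_primers: the function name `concatenate_contigs` and the spacer of 50 Ns are assumptions made here. Joining contigs with a run of Ns means that any primer-length window crossing a contig junction contains ambiguous bases and can be discarded.

```python
def concatenate_contigs(fasta_text, spacer="N" * 50):
    """Join every sequence in a FASTA-formatted string into one pseudochromosome.

    The spacer (an assumption of this sketch) separates contigs so that
    junctions between unordered contigs are easy to recognise downstream.
    """
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the contig collected so far.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return spacer.join(contigs)
```

Contig order in a draft assembly is arbitrary, so coordinates on the pseudochromosome are only meaningful within a contig.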

Primer prediction (on positive set)

4. Predict primer locations: > 1000 thermodynamically plausible primer sets on each (pseudo)chromosome, using Primer3.

Test cross-amplification in silico

5. Check cross-amplification: all primer sets tested against other organisms, using PrimerSearch.

6. BLAST screen: all primers screened for off-target sequences with BLAST; 7 possible primer sets remained.
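The idea behind the cross-amplification check in step 5 can be illustrated with a deliberately simplified stand-in. The sketch below is not PrimerSearch: it only accepts exact primer matches, whereas PrimerSearch tolerates a configurable level of mismatch. The function names and the `max_len` cut-off are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Start index of every (possibly overlapping) occurrence of needle."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def predicted_amplicons(genome, forward, reverse, max_len=500):
    """Lengths of products predicted from exact primer matches on either strand."""
    forward, reverse = forward.upper(), reverse.upper()
    # The reverse primer anneals to the opposite strand, so on the strand
    # being scanned its site appears as the reverse complement.
    rev_site = reverse_complement(reverse)
    top = genome.upper()
    lengths = []
    for strand in (top, reverse_complement(top)):
        for f in find_all(strand, forward):
            for r in find_all(strand, rev_site):
                length = r + len(rev_site) - f
                # Reverse site must lie downstream of the forward primer.
                if r >= f + len(forward) and length <= max_len:
                    lengths.append(length)
    return sorted(lengths)
```

A primer set would be excluded if `predicted_amplicons` returned any product for a genome in the negative set.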

Classify primers and validation

7. Classify primers: primer sets classified according to their ability to amplify specific classes of input sequence.

8. Validate primers: primer sets validated on positive and negative targets in vitro.

5 target sequences: prophage gp20 (2), hypothetical CDS (2), impB (1)
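The classification in step 7 amounts to set comparison. As a hedged sketch (the function name, the data layout and the strict "all members and nothing else" rule are assumptions for illustration, not the pipeline's actual implementation), a primer set is assigned to a class only when the genomes it is predicted to amplify are exactly the members of that class.

```python
def classify_primer_sets(amplification, genome_classes):
    """Assign each primer set to the one class it amplifies exclusively.

    amplification:  primer set name -> iterable of genomes it amplifies in silico
    genome_classes: genome name -> class label (e.g. "outbreak", "other")
    Returns primer set name -> class label, or None if not class-specific.
    """
    members = {}
    for genome, cls in genome_classes.items():
        members.setdefault(cls, set()).add(genome)

    classified = {}
    for primer_set, amplified in amplification.items():
        amplified = set(amplified)
        classified[primer_set] = next(
            (cls for cls, genomes in members.items() if amplified == genomes),
            None,
        )
    return classified
```

Primer sets that miss a member of the class, or that also amplify a genome outside it, come back as `None` and would not go forward to in vitro validation.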

Validation

In silico, diagnostic primers are just another classifier

Validation on unseen data is critical

(avoid overfitting, estimation of performance)

Direct experimental validation of primer candidates (Münster):

‘Positive’ set = 21 clinical outbreak isolates

‘Negative’ set = 32 HUSEC / EPEC isolates

Positive control = LB 226692

Primer design: validated in vitro

[Figure: in vitro validation results, panels labelled positive and negative]

Alignment-free primer design: summary

Individual primer sets: 100% sensitivity; 82–94% specificity; 9% < FDR < 22%

Combining primers: 100 % sensitivity and specificity

A minimal combination of two primer sets discriminated absolutely between outbreak O104:H4 isolates and non-outbreak E. coli isolates, including HUSEC 041

Flexibility in strategy allows for targeted design, e.g. multiplex PCR, different organisms, large gene families, etc.

Same approach used for

Resolving Dickeya plant pathogens

Discriminating between RxLR effectors in Phytophthora infestans
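The sensitivity, specificity and FDR figures in this summary follow from treating each primer set as a binary classifier. The sketch below shows the arithmetic on made-up calls; the isolate names and results are hypothetical, and the AND rule in `combine_and` (call an isolate positive only if every primer set amplifies it) is an assumption about how two primer sets might be combined, not something stated on the slides.

```python
def classifier_metrics(calls, truth):
    """Sensitivity, specificity and false discovery rate for one set of calls.

    calls and truth map isolate name -> bool (True = outbreak strain).
    """
    tp = sum(1 for s in truth if truth[s] and calls[s])
    fn = sum(1 for s in truth if truth[s] and not calls[s])
    fp = sum(1 for s in truth if not truth[s] and calls[s])
    tn = sum(1 for s in truth if not truth[s] and not calls[s])
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # FDR: fraction of positive calls that are wrong.
        "fdr": fp / (fp + tp) if fp + tp else 0.0,
    }


def combine_and(*call_sets):
    """Positive only when every primer set gives a product (assumed rule)."""
    return {s: all(c[s] for c in call_sets) for s in call_sets[0]}


# Hypothetical isolates: three outbreak strains, four non-outbreak strains.
truth = {"pos1": True, "pos2": True, "pos3": True,
         "neg1": False, "neg2": False, "neg3": False, "neg4": False}
primer_a = dict(truth, neg1=True)  # one false positive, on neg1
primer_b = dict(truth, neg2=True)  # one false positive, on neg2

single = classifier_metrics(primer_a, truth)
combined = classifier_metrics(combine_and(primer_a, primer_b), truth)
```

In this toy case each primer set alone has a false positive, but the two false positives fall on different isolates, so requiring both products removes them without losing any true positive.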

Alignment-free primer design: summary

Bypass the need for:

multiple genomic alignments

biological justification for primer choice (maybe even reveal biology…)

Produce diagnostic primers for any subgroup of organisms (possibly…)

Limitations

Scaling issue: PrimerSearch is slow (modular pipeline allows use of alternative programs)

Low specificity of primers -> use qPCR

Very similar organisms may not be distinguished

Time from genomes to primer sets: 90 hours

possibility for improvements as collaborative bioinformatics projects (speed up off-target primer mapping, make into user-friendly tool…)

Acknowledgements

[email protected]@hutton.ac.uk

Thanks to Nadine Brandt, Kath Wright and Sean Chapman

Sprouted seeds as a source of infections

‘Sproutbreak’ - Jimmy John's restaurant

Colonisation of spinach by VTEC O157:H7 Sakai (vt-)
