Outbreak of E. coli O104:H4 heralds a new paradigm in responding to disease threats
Nicola J. Holden, Leighton Pritchard
EHEC O104:H4 outbreak, Europe 2011
Unprecedented:
scale of outbreak (3950 affected, 53 deaths; multiple import restrictions)
emerging pathogen (one previous case in S. Korea)
rapid production of sequence data
crowd-sourcing of assembly and annotation via collaborative revision control site: GitHub
https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
EHEC O104:H4 outbreak – timeline
1st May: onset of outbreak
26th May: strain characteristics (Scheutz et al., 2012 Eurosurveill)
30th May: diagnostic laboratory information released (Muenster)
2nd June: first draft assembly available (GitHub)
9th to 21st June: additional sequences announced
22nd June: microbiological characteristics published (Bielaszewska et al., 2011 LID)
26th July: official end of the outbreak (RKI)
refs: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki; RKI; Institute of Hygiene, Muenster
27th July: publication of open-source genomic analysis
A changing paradigm?
Kwan et al. (2011) http://precedings.nature.com/documents/6663/version/1
Meanwhile: diagnostics (27th June – 6th July)
1. Outbreak isolate-specific, sub-serotype diagnostics
2. Exploit rapid sequencing: work directly from incomplete and unordered draft genome sequences
3. Rapidly generated (perhaps ahead of the biology?)
4. Validated (good estimates of error rates)
5. Easy to use and distribute
6. Cheap (cheaper than sequencing everything)
Alignment-free PCR primer design: no need to identify conserved signature sequences prior to primer design
Alignment-free primer design: strategy
‘Positive’ genome set: 11 genome assemblies of 9 EHEC O104:H4 outbreak isolates (GitHub crowdsourcing)
‘Negative’ genome set: 31 genomes of E. coli and E. fergusonii (GenBank)
Design many (>1000) primers to positive genome set: target CDS; optimise for qRT-PCR; 20-mers; 100 bp amplicons; Ta = 58 °C
Filter primers in silico:
Exclude sets with predicted productive amplification in negative genomes.
Screen primers to exclude sets with strong sequence similarity to any of a larger set of off-target genomes (GenBank Enterobacteriaceae).
Automation
https://github.com/widdowquinn/find_differential_primers
Alignment-free primer design
(Figure: schematic of positive and negative genome sets, genomes labelled I to V)
1. Process configuration files: locations and classes of input sequence files.
2. Convert to single (pseudo)chromosomes: concatenate draft genome sequence.
3. Genome feature locations: from GBK file or predicted with Prodigal.
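A minimal sketch of step 2, to make the idea of a (pseudo)chromosome concrete. It assumes the draft assembly is a multi-record FASTA file and that contigs are joined in input order with a spacer of N characters; the spacer, the function names and the example contigs are illustrative assumptions, not taken from the pipeline itself.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical spacer: a run of N marks each artificial contig junction.
SPACER = "N" * 20


def to_pseudochromosome(records, spacer=SPACER):
    """Join contigs in input order.

    Returns the concatenated sequence and a list of
    (contig name, start, end) offsets into that sequence.
    """
    parts, offsets, pos = [], [], 0
    for index, (name, seq) in enumerate(records):
        if index:  # spacer goes between contigs, not before the first
            parts.append(spacer)
            pos += len(spacer)
        offsets.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


# Toy draft assembly with two contigs (made-up sequences).
draft = ">contig1\nACGTAC\nGT\n>contig2\nTTGACA\n"
sequence, offsets = to_pseudochromosome(parse_fasta(draft), spacer="NNNN")
print(sequence)
print(offsets)
```

Keeping the offsets lets a primer location on the pseudochromosome be mapped back to its contig, and the spacer flags junctions that do not exist in the real genome.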
Primer prediction (on positive set)
4. Predict primer locations: > 1000 thermodynamically plausible primer sets on each (pseudo)chromosome, using Primer3.
Test cross-amplification in silico
5. Check cross-amplification: all primer sets tested against other organisms, using PrimerSearch.
6. BLAST screen: all primers screened for off-target sequences with BLAST: 7 possible primer sets
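The cross-amplification check in step 5 can be illustrated with a toy in silico PCR. This is a simplified sketch, not PrimerSearch: it assumes exact primer matches only (PrimerSearch can tolerate mismatches), and the function names, the 500 bp size limit and all sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """All start positions of needle in haystack, overlaps included."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def amplicon_sizes(genome, forward, reverse, max_size=500):
    """Product sizes for exact matches, forward primer on the given strand."""
    rev_site = revcomp(reverse)
    rev_hits = find_all(genome, rev_site)
    sizes = []
    for f in find_all(genome, forward):
        for r in rev_hits:
            size = r + len(rev_site) - f
            # The reverse site must lie downstream of the forward primer.
            if r >= f + len(forward) and size <= max_size:
                sizes.append(size)
    return sizes


def amplifies(genome, forward, reverse, max_size=500):
    """True if the pair is predicted to amplify in either orientation."""
    return bool(amplicon_sizes(genome, forward, reverse, max_size)
                or amplicon_sizes(genome, reverse, forward, max_size))


def filter_primer_sets(primer_sets, negatives, max_size=500):
    """Keep only primer sets with no predicted product in any negative genome."""
    return {name: pair for name, pair in primer_sets.items()
            if not any(amplifies(g, pair[0], pair[1], max_size) for g in negatives)}


# Made-up sequences: the positive genome carries both primer sites.
FORWARD, REVERSE = "ATGCCGTA", "GGATCCTT"
positive_genome = "TT" + FORWARD + "C" * 10 + revcomp(REVERSE) + "GG"
negative_genome = "TT" + FORWARD + "C" * 10 + "GG"
primer_sets = {"set_A": (FORWARD, REVERSE), "set_B": (FORWARD, "GGGGGGGG")}
kept = filter_primer_sets(primer_sets, [negative_genome])
print(sorted(kept))
```

Here set_B is dropped because both of its primers find sites in the negative genome, while set_A survives because its reverse primer site is absent there.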
Classify primers and validation
7. Classify primers: primer sets classified according to their ability to amplify specific classes of input sequence.
8. Validate primers: primer sets validated on positive and negative targets in vitro.
5 target sequences: prophage gp20 (2), hypothetical CDS (2), impB (1)
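Step 7 amounts to comparing, for each primer set, the genomes it is predicted to amplify against the membership of each input class. The sketch below assumes one class label per genome and calls a primer set specific to a class only when it amplifies every member of that class and nothing else; the function name, the class labels and the amplification results are hypothetical.

```python
def classify_primer_sets(amplified, genome_classes):
    """Assign each primer set to the class it amplifies exclusively.

    amplified: dict of primer set name -> set of genome names with a
        predicted product.
    genome_classes: dict of genome name -> class label.
    Returns a dict of primer set name -> class label, or None when the
    primer set is not specific to exactly one class.
    """
    members = {}
    for genome, label in genome_classes.items():
        members.setdefault(label, set()).add(genome)
    result = {}
    for name, hits in amplified.items():
        result[name] = next(
            (label for label, group in members.items() if hits == group),
            None,
        )
    return result


# Hypothetical input: five genomes in two classes.
genome_classes = {
    "I": "positive", "II": "positive",
    "III": "negative", "IV": "negative", "V": "negative",
}
# Hypothetical in silico amplification results.
amplified = {
    "set_1": {"I", "II"},        # all positives, no negatives
    "set_2": {"I", "II", "IV"},  # cross-amplifies in a negative genome
    "set_3": {"I"},              # misses one positive genome
}
classes = classify_primer_sets(amplified, genome_classes)
print(classes)
```

Only set_1 would go forward as a candidate diagnostic for the positive class under this rule; a real run may need looser rules or several labels per genome.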
Validation
In silico, diagnostic primers are just another classifier
Validation on unseen data is critical
(avoid overfitting, estimation of performance)
Direct experimental validation of primer candidates (Münster):
‘Positive’ set = 21 clinical outbreak isolates
‘Negative’ set = 32 HUSEC / EPEC isolates
Positive control = LB 226692
Primer design: validated in vitro
(Figure: positive and negative panels)
Alignment-free primer design: summary
Individual primer sets: 100% sensitivity; 82–94% specificity; 9% < FDR < 22%
Combining primers: 100% sensitivity and specificity
A minimal combination of two primer sets discriminated absolutely between outbreak O104:H4 isolates and non-outbreak E. coli isolates, including HUSEC 041
Flexibility in strategy allows for targeted design, e.g. multiplex PCR, different organisms, large gene families, etc.
Same approach used for:
Resolving Dickeya plant pathogens
Discriminating between RxLR effectors in Phytophthora infestans
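The sensitivity, specificity and FDR figures in this summary are standard classifier metrics, and the gain from combining primer sets can be sketched by treating each primer set as a binary test. The example assumes an isolate is called positive only when every primer set in the combination amplifies (an AND rule); that rule and the toy results are illustrative assumptions, not the validation data from the talk.

```python
def confusion(calls, truth):
    """Count true/false positives and negatives for boolean calls."""
    tp = sum(c and t for c, t in zip(calls, truth))
    fp = sum(c and not t for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    return tp, fp, fn, tn


def metrics(calls, truth):
    """Sensitivity, specificity and false discovery rate.

    Assumes at least one true positive, one true negative and one
    positive call, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion(calls, truth)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fdr": fp / (tp + fp),
    }


def combine_and(*call_lists):
    """Call an isolate positive only if every primer set amplifies."""
    return [all(calls) for calls in zip(*call_lists)]


# Hypothetical results: 4 outbreak isolates followed by 6 non-outbreak isolates.
truth = [True] * 4 + [False] * 6
set_a = [True] * 4 + [True, False, False, False, False, False]
set_b = [True] * 4 + [False, True, False, False, False, False]

single = metrics(set_a, truth)
combined = metrics(combine_and(set_a, set_b), truth)
print(single)
print(combined)
```

Because the two toy primer sets give false positives on different isolates, requiring both to amplify removes every false positive without losing any true positive.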
Alignment-free primer design: summary
Bypass the need for:
multiple genomic alignments
biological justification for primer choice (maybe even reveal biology…)
Produce diagnostic primers for any subgroup of organisms (possibly…)
Limitations
Scaling issue: PrimerSearch is slow (modular pipeline allows use of alternative programs)
Low specificity of primers -> use qPCR
Very similar organisms may not be distinguished
Time from genomes to primer sets: 90 hours
Possibility for improvements as collaborative bioinformatics projects (speed up off-target primer mapping, make into a user-friendly tool…)
Acknowledgements
[email protected]@hutton.ac.uk
Thanks to Nadine Brandt, Kath Wright and Sean Chapman
Sprouted seeds as a source of infections
‘Sproutbreak’ - Jimmy John's restaurant
Colonisation of spinach by VTEC O157:H7 Sakai (vt-)