“All of your answers are approximate, you might as well live with it…”
Andrew Rau-Chaplin, 1½ hours ago
Integrated Rapid Infectious Disease Analysis
www.irida.ca
Rob Beiko
Faculty of Computer Science
Dalhousie University
June 12
Microbial genomics for rapid investigation of infectious disease
Image © Kenneth Todar
2009 and Influenza A
VIRUS: Influenza A
RNA genome (14,000 nucleotides)
Eight segments
(Image: Tao and Zheng, Science 2012)
BACTERIUM: S. Typhi CT18
DNA genome (~5,100,000 nucleotides)
One chromosome + two plasmids
Science (2001)
Outbreak investigation
Similarities: place, time, genetics
fda.gov
2014
2010-2013
Inns et al. (2015)
Outbreak investigation in Canada
NATIONAL MICROBIOLOGY LABORATORY
PROVINCIAL PUBLIC HEALTH LABORATORIES
CLINICAL ISOLATES
SENTINEL SURVEILLANCE(FoodNet Canada)
CLINICAL, FOOD, ENVIRONMENTAL
CANADIAN FOOD INSPECTION AGENCY
(Regulatory)
FOOD ISOLATES
LISTERIA - E. COLI O157:H7 - SALMONELLA - SHIGELLA
PFGE/MLVA
PUBLIC HEALTH ACTION
Pulsed Field Gel Electrophoresis
Serratia - NICU
[Figure: PFGE gel. Lane groups: hospital cases; handwashes; environmental (doors, etc.); control (elsewhere in hospital)]
Jang et al., J Hosp Infect (2001)
DNA Sequencing
1977: Sanger sequencing
< 1 kilobase per run
$2 / run, 1-3 hours (96 in parallel)
Tiny pieces (600 – 1000 bases)
2011: Illumina MiSeq
15 gigabases per run
$1000 - $1500 / run, 1 day
Tinier pieces (150 – 400 bases)
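The per-run figures above can be turned into rough throughput numbers. The sketch below uses only the values quoted here (15 gigabases and $1000 - $1500 per MiSeq run; under 1 kilobase and $2 per Sanger run) together with the ~5,100,000-nucleotide S. Typhi CT18 genome size mentioned earlier. The 50x target coverage is an assumption chosen for illustration, not a figure from the talk, and the function names are hypothetical.

```python
# Back-of-envelope throughput comparison from the per-run figures quoted above.
# The 50x target coverage is an illustrative assumption.

GENOME_SIZE = 5_100_000          # ~S. Typhi CT18 genome, nucleotides

MISEQ_BASES_PER_RUN = 15e9       # 15 gigabases per run
MISEQ_COST_RANGE = (1000, 1500)  # dollars per run

SANGER_BASES_PER_RUN = 1000      # upper bound: "< 1 kilobase per run"
SANGER_COST = 2                  # dollars per run


def cost_per_megabase(cost_per_run, bases_per_run):
    """Dollars spent per million bases sequenced."""
    return cost_per_run / (bases_per_run / 1e6)


def genomes_per_run(bases_per_run, genome_size, coverage):
    """How many genomes fit in one run at the requested average coverage."""
    return int(bases_per_run // (genome_size * coverage))


if __name__ == "__main__":
    low, high = (cost_per_megabase(c, MISEQ_BASES_PER_RUN) for c in MISEQ_COST_RANGE)
    print(f"MiSeq:  ${low:.3f} - ${high:.3f} per megabase")
    sanger = cost_per_megabase(SANGER_COST, SANGER_BASES_PER_RUN)
    print(f"Sanger: at least ${sanger:.0f} per megabase")
    n = genomes_per_run(MISEQ_BASES_PER_RUN, GENOME_SIZE, 50)
    print(f"Genomes per MiSeq run at 50x: {n}")
```

Under these assumptions a single MiSeq run holds a few dozen bacterial genomes, which is what makes routine whole-genome typing of outbreak isolates plausible.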
MiSeq projects at Dalhousie
• Bedford Basin microbial monitoring
• Pediatric Crohn’s disease samples
• Global microbial air sampling
• Mink genomes
• Sequencing Lactobacillus genomes from the poop of old mice
• Wastewater diversity and function in the Arctic
• Verifying ingredients in dog food
• Exercise and the Microbiome
Integrated Rapid Infectious Disease Analysis
www.irida.ca
1.56M, 3-year Genome Canada Large-Scale Applied Platform Grant
SFU / BCCDC / PHAC-NML / Dalhousie
DNA sequencing and downstream applications
• data management / federation
• analysis workflows
• ontologies
• APIs
• 3rd-party applications
Implementation in provincial public health labs
Training
Five Pillars of IRIDA
Ontologies and data standards
NCBI, MIxS, vegetables
Metadata
Data provenance
Data quality
Environmental information
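As a rough illustration of the metadata categories listed above (provenance, quality, environmental information), the sketch below attaches them to a sample record and reports which required fields are missing. The field names, the `SampleRecord` class and the `REQUIRED_FIELDS` table are hypothetical; they are not taken from MIxS, NCBI or IRIDA.

```python
from dataclasses import dataclass, field

# Hypothetical field names grouped under the metadata categories named above.
REQUIRED_FIELDS = {
    "provenance": ["submitting_lab", "collection_date"],
    "quality": ["mean_coverage"],
    "environment": ["isolation_source"],
}


@dataclass
class SampleRecord:
    sample_id: str
    metadata: dict = field(default_factory=dict)

    def missing_fields(self):
        """Return 'category.field' names that are absent or empty."""
        missing = []
        for category, names in REQUIRED_FIELDS.items():
            values = self.metadata.get(category, {})
            for name in names:
                if values.get(name) in (None, ""):
                    missing.append(f"{category}.{name}")
        return missing


if __name__ == "__main__":
    record = SampleRecord(
        "F14231",
        {
            "provenance": {"submitting_lab": "Lab A", "collection_date": "2014-03-21"},
            "environment": {"isolation_source": "clinical"},
        },
    )
    print(record.missing_fields())
```

A real deployment would check against a published standard rather than an ad hoc table; the point is only that shared field definitions make completeness checkable.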
Data sharing!
• BIG challenges – different jurisdictions, “ownership” of epi data. Privacy!
• Health service providers – concerns about privacy and data breach
• Technology outstrips policy
• What digital records could we get TODAY?
• Canada lagging in data sharing
Calling isolates based on genetic variation
Traditional:
Pulsed-field gel electrophoresis
Multi-locus sequence typing (standards! mlst.net)
Whole genomes:
Lots of information!
Too much information!
Lots of filtering and quality control required
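Multi-locus typing reduces each isolate to one allele number per locus, and the combination of allele numbers maps to a named sequence type. The sketch below shows that lookup in miniature. The loci names, allele numbers and sequence types are invented placeholders and do not come from mlst.net or any real scheme.

```python
# Toy multi-locus sequence typing lookup. Loci names, allele numbers and
# sequence types are invented placeholders, not a real MLST scheme.

LOCI = ("locusA", "locusB", "locusC")

# Allele profile (one allele number per locus, in LOCI order) -> sequence type.
PROFILE_TABLE = {
    (1, 1, 2): "ST-1",
    (1, 3, 2): "ST-2",
    (4, 3, 5): "ST-3",
}


def call_sequence_type(alleles):
    """Map a dict of locus -> allele number to a sequence type, or None."""
    try:
        profile = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None  # at least one locus could not be typed
    return PROFILE_TABLE.get(profile)


def shared_loci(alleles_a, alleles_b):
    """Count loci at which two isolates carry the same allele."""
    return sum(
        1
        for locus in LOCI
        if locus in alleles_a
        and locus in alleles_b
        and alleles_a[locus] == alleles_b[locus]
    )


if __name__ == "__main__":
    isolate_1 = {"locusA": 1, "locusB": 1, "locusC": 2}
    isolate_2 = {"locusA": 1, "locusB": 3, "locusC": 2}
    print(call_sequence_type(isolate_1), call_sequence_type(isolate_2))
    print("shared loci:", shared_loci(isolate_1, isolate_2))
```

A handful of loci gives a compact, standardized label; whole genomes give far more resolution but need the filtering and quality control noted above before calls can be compared.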
IRIDA’s Federated Design
Workflow management
REST-like API (3rd-party applications)
Security: authentication / authorization
Data models & implementation
Local storage
Remote APIs
List Samples
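One way to picture a federated "list samples" request is a single call that checks authorization and then merges results from local storage and remote instances. The sketch below is a hypothetical illustration only: it uses in-memory stand-ins, and none of the class or function names correspond to the actual IRIDA API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Sample:
    sample_id: str
    project: str
    source: str  # name of the instance that holds the data


class SampleSource(Protocol):
    def list_samples(self, project: str) -> Iterable[Sample]: ...


class InMemorySource:
    """Stand-in for either local storage or a remote API client (hypothetical)."""

    def __init__(self, name: str, samples: Iterable[tuple]):
        self.name = name
        self._samples = [Sample(sid, proj, name) for sid, proj in samples]

    def list_samples(self, project: str) -> List[Sample]:
        return [s for s in self._samples if s.project == project]


def federated_list_samples(user_projects: set, project: str,
                           sources: Iterable[SampleSource]) -> List[Sample]:
    """List one project's samples across all sources after an authorization check."""
    if project not in user_projects:
        raise PermissionError(f"not authorized for project {project!r}")
    merged = []
    for source in sources:
        merged.extend(source.list_samples(project))
    return sorted(merged, key=lambda s: s.sample_id)


if __name__ == "__main__":
    local = InMemorySource("local", [("S2", "outbreak"), ("S9", "other")])
    remote = InMemorySource("remote", [("S1", "outbreak")])
    for sample in federated_list_samples({"outbreak"}, "outbreak", [local, remote]):
        print(sample)
```

In this sketch the caller sees one list regardless of where each sample lives, while each record still carries its source so that data "ownership" stays visible.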
Each pipeline is implemented as a Galaxy workflow
Internal analysis pipelines
• Assembly and annotation
• Phylogenetics
• “Line list” management
3rd-party applications
Single-Nucleotide Variant Phylogenetic Pipeline (SNVPhyl)
Sampled genomes
Quality control
Tree generation / visualization
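The core idea behind a SNV-based comparison can be shown with a toy example: find the positions where aligned genomes differ, discard low-quality columns, and count pairwise differences for a tree builder to consume. This is a minimal sketch and not the SNVPhyl implementation; the sequences are invented, and treating any non-ACGT character as a failed quality check is an assumption.

```python
from itertools import combinations


def variant_positions(alignment):
    """Positions where aligned sequences differ, skipping columns with gaps or N."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        column = {s[i] for s in seqs}
        # Keep only columns made entirely of unambiguous bases that vary.
        if column <= set("ACGT") and len(column) > 1:
            positions.append(i)
    return positions


def snv_distance_matrix(alignment):
    """Pairwise count of differing bases over the variant positions."""
    positions = variant_positions(alignment)
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(alignment[a][i] != alignment[b][i] for i in positions)
    return distances


if __name__ == "__main__":
    toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACCTNCGA"}
    print(variant_positions(toy))
    print(snv_distance_matrix(toy))
```

The resulting distances are the kind of input a distance-based tree method would take; isolates separated by very few SNVs are candidates for the same outbreak cluster.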
GenGIS
Data from Haiti cholera outbreak, 2010
http://kiwi.cs.dal.ca/GenGIS
IslandViewer
http://www.pathogenomics.sfu.ca/islandviewer/browse
Interfaces / environment
Personas
• Researchers
• Epidemiologists
• Clinical microbiologists / lab technicians
Workflow design and execution
Full Privileges
Cluster  Line List ID  Patient Name   Prov. Health No.  Age  Sex  Location   Sample ID  Collection Date  Culture Result
A        1             John Smith     4513253244        26   M    Vancouver  F14231     14/03/21         Salmonella sp.
A        2             Sally Smith    4519567458        24   F    Vancouver  F14235     14/03/21         Salmonella sp.
B        3             Tom Jones      4517543216        35   M    Vancouver  M6542      14/03/24         Salmonella sp.
B        4             Helen Jones    9856321124        35   F    Vancouver  S1245      14/03/22         Salmonella sp.
C        5             Jennifer Lee   4516853122        29   F    Vancouver  S5642      14/03/22         Salmonella sp.
C        6             Michael Brown  9456534561        45   M    Victoria   T68954     14/03/25         Salmonella sp.
Phylogenetic Tree
Genetic Distance
Limited Privileges
Cluster  Line List ID  Patient Name   Prov. Health No.  Age  Sex  Location   Sample ID  Collection Date  Culture Result
A        1             John Smith     4513253244        26   M    Vancouver  F14231     14/03/21         Salmonella sp.
A        2             Sally Smith    4519567458        24   F    Vancouver  F14235     14/03/21         Salmonella sp.
B        3             Tom Jones      4517543216        35   M    Vancouver  M6542      14/03/24         Salmonella sp.
B        4             Helen Jones    9856321124        35   F    Vancouver  S1245      14/03/22         Salmonella sp.
C        5             Jennifer Lee   4516853122        29   F    Vancouver  S5642      14/03/22         Salmonella sp.
C        6             Michael Brown  9456534561        45   M    Victoria   T68954     14/03/25         Salmonella sp.
Phylogenetic Tree
Genetic Distance
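The full and limited privilege views present the same line list to different personas. A minimal sketch of how a viewer might mask identifying columns for a limited-privilege user follows. Which columns count as identifying (patient name and provincial health number here) is an assumption, since the extracted text does not say what the limited view hides; the function and key names are hypothetical, and the row reuses the first example record.

```python
# Columns treated as identifying are an assumption made for illustration.
IDENTIFYING_COLUMNS = ("patient_name", "prov_health_no")


def redact_row(row, full_privileges):
    """Return a copy of a line-list row, masking identifying columns if needed."""
    if full_privileges:
        return dict(row)
    return {
        key: ("[hidden]" if key in IDENTIFYING_COLUMNS else value)
        for key, value in row.items()
    }


if __name__ == "__main__":
    row = {
        "cluster": "A",
        "line_list_id": 1,
        "patient_name": "John Smith",
        "prov_health_no": "4513253244",
        "age": 26,
        "sex": "M",
        "location": "Vancouver",
        "sample_id": "F14231",
        "collection_date": "14/03/21",
        "culture_result": "Salmonella sp.",
    }
    print(redact_row(row, full_privileges=False))
```

Cluster, sample ID and genetic distance remain visible in the redacted view, so a researcher can still work with the phylogenetic tree without seeing patient identifiers.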
Large-scale sequencing initiatives
en.wikipedia.org
FDA GenomeTrakr
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
Public Health England project (>10,000 Salmonella so far)
• As of 2015, sequencing every sampled Salmonella isolate collected in England
• Over 10,000 sequenced to date
• 8000 already available for download in the public databases
Gary Van Domselaar, NML
The Global Microbial Identifier
What’s next?
2015: Oxford Nanopore MinION
??? per run
$900 / run, 6 hours
Huge pieces (max so far – 200-300 kilobases)
Can stop / restart using same disposable flowcell
15 cm (-ish)
thehightechsociety.com
Quick et al. (2015)
“Using a novel streaming phylogenetic placement method samples can be assigned to a serotype in 40 minutes and determined to be part of the outbreak in less than 2 h.”
Ebola monitoring
blogs.biomedcentral.com
Joshua Quick, Nick Loman
Example workflow
[Figure: workflow diagram with labels: Load DNA; Samples evaluated against reference in real time; confidence; Positive ID / placement; Change flowcell; 6 hrs]
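The workflow evaluates reads against references as they come off the sequencer and stops once an identification is confident enough. The sketch below imitates that loop with a deliberately crude scheme: each read votes for the reference it shares the most exact k-mers with, and the run stops when one reference holds a large enough share of the votes. The scoring rule, the thresholds and all names are assumptions for illustration; this is not the streaming phylogenetic placement method of Quick et al.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(read, reference_kmers, k):
    """Reference sharing the most k-mers with the read, or None if no overlap.

    Ties go to the first reference listed.
    """
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & ks) for name, ks in reference_kmers.items()}
    name, score = max(scores.items(), key=lambda item: item[1])
    return name if score > 0 else None


def classify_stream(reads, references, k=4, min_reads=3, min_confidence=0.8):
    """Consume reads one at a time; stop once one reference is confidently ahead.

    Returns (reference name or None, confidence, number of reads consumed).
    """
    reference_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    votes = Counter()
    consumed = 0
    for read in reads:
        consumed += 1
        hit = best_reference(read, reference_kmers, k)
        if hit is None:
            continue  # uninformative read, keep sequencing
        votes[hit] += 1
        total = sum(votes.values())
        top, count = votes.most_common(1)[0]
        confidence = count / total
        if total >= min_reads and confidence >= min_confidence:
            return top, confidence, consumed
    return None, 0.0, consumed


if __name__ == "__main__":
    references = {"outbreak": "ACGTACGTTTGACCA", "other": "GGGCCCAAATTTGGG"}
    reads = ["ACGTACG", "GTTTGAC", "TTGACCA", "CCCAAAT"]
    print(classify_stream(reads, references))
```

Because the loop returns as soon as the threshold is met, the number of reads consumed (and therefore the time to a call) depends on how quickly the evidence accumulates, which is the property that makes "real-time" identification possible.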
Challenges
• Sample extraction: getting DNA from stuff
• Clinical-grade evaluation
• Training
• Equipment reliability
• Sequencing errors
• Quality of reference data / attribution algorithms
• Database updates in real time
• Ethics / privacy (Genomes Sequenced While U Wait)
The Point
Comprehensive monitoring
Accurate typing
Rapid identification
Real-time decision making
Acknowledgements
PIs
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
Morag Graham – NML
Rob Beiko – Dalhousie
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olsen, Tara Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Chrystal Berry, Lorelee Tschetter
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC)
Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais, Catherine Carrillo, Dominic Lambert
Dalhousie University
Alex Keddy
McMaster University
Andrew McArthur, Daim Sardar
European Nucleotide Archive
Guy Cochrane, Petra ten Hoopen, Clara Amid
European Food Safety Agency
Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina
Seminar from Will Hsiao, BC Centre for Disease Control
Materials to be available on http://bioinformatics.ca/
June 24-26, 2015
The Bioinformatics Exam of the Future
tagc.com.au
commons.wikimedia.org/wiki/File:DNA_ahelatest_moodustunud_niit_katsuti_korgil..JPG
http://omicfrontiers.com/2014/06/11/diaryofaminion_part2/
2009 was a long time ago
J. Craig Venter Institute
Photo credit: Emma Allen-Vercoe
Some slides courtesy of Gary Van Domselaar, NML
FIN