IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health
IRIDA: A federated bioinformatics platform enabling richer genomic epidemiology analysis in public health
William Hsiao, Ph.D.
@wlhsiao
BC Centre for Disease Control Public Health Laboratory and University of British Columbia
March 21, 2016, UT San Antonio
Roles of Public Health Agencies
• Public health (PH) agencies around the world track and intervene in the spread of diseases to improve the health of the population
• PH agencies also develop policies and strategies to prevent diseases from occurring
• PH laboratories test patient and environmental samples to determine the cause of diseases
• At the BC Public Health Lab, we process on average 3,000 samples a day, or about 1 million samples a year
Dual Arms of a Public Health Agency
• Epidemiological Investigation ("What did you eat? Where did you eat that? When?"): identify the common exposure
• Laboratory Investigation ("What strain of Salmonella Enteritidis is it?"): identify the culprit pathogen
• Confirmed by Epi / Confirmed by Lab
Current State of Clinical Microbiology Laboratory
Didelot et al. 2012. doi:10.1038/nrg3226.
• Culture to isolate organisms using different media
• Different diagnostic tests and typing and subtyping methods
• Different drug sensitivity tests
Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory
• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing of marker genes
• DNA restriction
• Electrophoresis (PFGE, capillary)
Each pathogen is characterized by methods specific to that pathogen, in multiple workflows (a separate workflow for each pathogen). Turnaround time (TAT): 5 minutes to weeks (or months).
Source: Rebecca Lindsey
Current State of Public Health Epidemiology
• Many incompatible systems
• Paper and fax communication is common
• Rich case information is conveyed verbally or in free text
• Requires data re-entry and re-coding
[Diagram: information flows among cases, physicians, local laboratories, local public health departments, provincial/state public health departments, provincial/state laboratories, the national laboratory and the national ministry of health, by fax, phone, paper, electronic transfer and mailing of samples.]
Source: M. Taylor, BCCDC
The Era of Molecular Epidemiology
• Molecular test results are often more specific and sensitive than traditional phenotypic or biochemical tests
• These biomarkers can be correlated with epidemiological investigations (people, place, time)
• Provides linkage based on common exposure to the same pathogen at the molecular level
BUT…
• Most tests detect only one or a few specific biomarkers, representing a fraction of the pathogen's genetic information
• As pathogens evolve, targeted tests can lose their specificity
Era of Whole Genome Sequencing (WGS) = Lots of High-Quality Data
• Captures the pathogen's entire genetic makeup
• Unbiased (~97-99+% of the genome captured using common sequencing approaches)
• Significantly more data than traditional methods
• Allows higher-resolution and higher-sensitivity analyses to be applied
• Allows value-added evolutionary and functional study of the pathogens (e.g., virulence factors, AMR genes)
• These genomic data can be useful for downstream research (e.g., comparative genomics)
NGS Reduces Sequencing Cost, Allowing PHM Sequencing
[Chart: sequencing cost falling from $100M per human genome to $10K per human genome, or $10 per bacterial genome.]
Whole Genome Sequencing of Foodborne Pathogens
• Public Health England (UK) committed to sequencing all Salmonella isolates submitted to its PH lab
• US FDA and CDC (supported by the National Center for Biotechnology Information) created a distributed network of labs to utilize WGS for pathogen identification
https://publichealthmatters.blog.gov.uk/2014/01/20/innovations-in-genomic-sequencing/
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
PulseNet Canada
• Part of PulseNet International, a global laboratory surveillance network for enteric pathogens based on pulsed-field gel electrophoresis (PFGE) fingerprinting technology
• Originally developed at CDC Atlanta in response to the 1993 E. coli O157:H7 outbreak investigation
• PulseNet Canada was formed in 2000 and shares fingerprint data with other PulseNet partners, including a direct database link with the CDC
• PulseNet is transitioning from PFGE to WGS within 3 years
• Sequencing facilities are being set up in PH labs across Canada this year
Whole Genome Sequencing-Based Workflow
Didelot et al. 2012. doi:10.1038/nrg3226.
Each year, one in eight Canadians (or 4 million people) get sick with a domestically acquired food-borne illness.
http://www.phac-aspc.gc.ca/efwd-emoha/efbi-emoa-eng.php
Each year, one in six Americans (or 48 million people) get sick with a domestically acquired food-borne illness.
http://www.cdc.gov/foodborneburden/
Improve Public Health Microbiology Using Genomic Epidemiology
• Genomic epidemiology definition: using whole genome sequencing data from pathogens, together with epidemiological investigations, to track the spread of an infectious disease
• Leads to a faster and simpler test menu and more actionable information (virulence factors, AMR, source tracking)
• However, there are a few hurdles to overcome…
Many Players in Surveillance and Outbreaks: Ineffective Information Sharing
[Diagram: cases, physicians, frontline labs, local and provincial public health departments, and provincial and national laboratories, with the labels "Information" and "Bioinformatics and Analytical Capacities".]
Sequencing Improvement Outpaces Computing Improvements
[Chart: sequencing vs. computing improvements over time, with labels for cloud computing, cluster computing and algorithm improvement.]
IRIDA Platform Overview
• IRIDA = Integrated Rapid Infectious Disease Analysis
• A free, open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time disease outbreak investigations
Core Functions:
• Management of strain and genomic sequence data
• Rapid processing and analysis of genomic data
• Informative display of genomic results
• Sample, case, and aggregate data (“metadata”) management
Target audience:
• Public health agencies that need a platform to manage and process genomic data
• Public health agencies that need a platform to use genomics for outbreak investigations
[Diagram: IRIDA web application with data management and built-in analytical tools, receiving data from sequencing instruments and connecting to external Galaxy and command-line tools.]
IRIDA is a Partnership
- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization
IRIDA Has a Simple User Interface
• Line List View (under testing)
• Timeline View (conceptualization)
• Selectable fields: Travel; Symptoms and Onset; Exposure Types; Hospitalization
• Launch a pipeline
IRIDA is a Robust, Extensible Platform
• IRIDA uses Galaxy to manage workflows
• Adding additional pipelines is relatively easy
• A standard API allows 3rd-party tools (e.g., IslandViewer and GenGIS) to obtain data from IRIDA
[Architecture diagram: IRIDA runs in a servlet container with a web interface, application logic, a REST API and central file storage; workflows run in Galaxy on a compute cluster.]
IRIDA is Built to Enable Collaboration
• Be able to compare pipelines: pipelines are implemented using Galaxy, so they are transparent and shareable; QC criteria are defined using an ontology to compare different pipelines with the same purpose
• Be able to share data to minimize data re-entry from one platform to another: federation of platforms using a standard API to share data and analysis results
Distributed in Multiple, Flexible Access Options
• IRIDA is available in several different flavours. Download the latest version at https://github.com/phac-nml/irida
• Local install. Advantages: full control of the system; your data never leaves your centre. Disadvantages: computing infrastructure and IT support are needed to maintain the resource.
• Virtual machine. Advantages: full control of the system; easy to set up. Disadvantages: not really scalable if run on your own desktop; some performance loss.
• Cloud instance. Advantages: full control of the system; does not require local computing infrastructure. Disadvantages: data goes into a cloud environment; uploading to the cloud environment can be slow.
• Public version. Advantages: no setup required; upload your data and have it processed using Compute Canada resources. Disadvantages: data goes into a public instance (data remain private to your account); upload can be slow.
Contextual Information is Crucial for Interpreting Genomics Data.
Sequence + Contextual Info = Find the Pathogenic Culprit!
Source: Emma Griffiths
Contextual Information Needs to be Shared…..So Keep the Next User in Mind.
International Partners Intervention Partners
Source: Emma Griffiths
The [image] of Contextual Information Isn't STANDARDIZED
Source: Emma Griffiths
When Words Can Mean Different Things.
Semantic Ambiguity.
http://www.neurolang.com/wp-content/uploads/2013/05/RhymesAmbiguity.png
Ontology, A Way of Structuring Information.
“Ontologies are for the digital age what dictionaries were in the age of print.”
[Diagram: Ontology = Vocabulary + Hierarchy + Logic, enabling knowledge extraction.]
• Standardized, well-defined terms organized in a hierarchy
• Interconnected with logical relationships
• A “knowledge-generation engine”
Source: Emma Griffiths
Ontologies Standardize Vocabulary and Enable Complex Querying
[Diagram: simple food ontology hierarchy with the terms Animal Feed, Poultry, Water and Produce, and the more specific terms Pellets, Nuggets, Deli Meats, Bottled, Well, Spinach, Sprouts and Whole Mice. Example properties attached to terms: Transmission_through ingestion or contact; Treated by_filtration; Taxonomy_Spinacia oleracea; Preparation_Ready-to-Eat; Animal (Consumer)_Snake; Synonym_Cold Cuts.]
Source: Emma Griffiths
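A hierarchy like this is what makes complex querying possible: a search for Produce should also return records coded with a more specific term such as Spinach. The Python sketch below illustrates that subsumption query. It is not GenEpiO or FoodOn code; the parent/child assignments are one reading of the slide's diagram, and the case records are hypothetical.

```python
from collections import defaultdict

# Hypothetical is_a edges (child -> parent), based on one reading of the slide's diagram.
IS_A = {
    "Pellets": "Animal Feed",
    "Whole Mice": "Animal Feed",
    "Nuggets": "Poultry",
    "Deli Meats": "Poultry",
    "Bottled": "Water",
    "Well": "Water",
    "Spinach": "Produce",
    "Sprouts": "Produce",
}


def descendants(term: str) -> set[str]:
    """Return the term plus every term that is (transitively) a subtype of it."""
    children = defaultdict(list)
    for child, parent in IS_A.items():
        children[parent].append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def matching_records(records, term):
    """Select records whose food term falls under `term` in the hierarchy."""
    wanted = descendants(term)
    return [r for r in records if r["food"] in wanted]


if __name__ == "__main__":
    # Hypothetical case records coded with ontology terms.
    cases = [
        {"id": "case-1", "food": "Spinach"},
        {"id": "case-2", "food": "Deli Meats"},
        {"id": "case-3", "food": "Sprouts"},
    ]
    print([r["id"] for r in matching_records(cases, "Produce")])  # ['case-1', 'case-3']
```

A free-text search for "produce" would miss both matching records; the hierarchy supplies the missing link.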
Case Studies: Ontology Can Help Resolve Issues of Taxonomy, Granularity and Specificity.
a) Taxonomy & Granularity
[Diagram: Leafy Greens → Spinach, Lettuce. Spinach terms annotated with Spinacia oleracea (Taxonomy_species found in N. America) and Amaranthus hybridus (Taxonomy_species found in S. Africa); Endive and Iceberg shown as equivalent subtypes of lettuce.]
b) Specificity
[Diagram: Poultry → Chicken Nuggets, Breast, with the properties Processing_Ready-to-Eat; Composition_breading, spices, chicken breast; Location of Purchase_Retail (grocery store vs. butcher); Preparation_marinated.]
Source: Emma Griffiths
Ontology Acts Like A Rosetta Stone.
• Need a common language
• Humans AND computers need to read it
• Mapping allows interoperability AND customization
*Ontologies can be translated into different human languages as well.
Rosetta Stone (Egypt, 196 BC): a stone tablet with the same text translated into different ancient languages.
Source: Emma Griffiths
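The Rosetta Stone idea can be sketched as two lookup tables: each partner's local labels (including synonyms such as cold cuts for deli meats) map to one shared term, and each shared term carries labels in several human languages. The identifiers FOOD:0001 and FOOD:0002 and the label lists below are hypothetical placeholders, not real ontology IDs.

```python
from typing import Optional

# Hypothetical shared identifiers; real ontologies use stable IRIs/CURIEs instead.
SHARED_TERMS = {
    "FOOD:0001": {"en": "deli meats", "fr": "viandes froides"},
    "FOOD:0002": {"en": "spinach", "fr": "épinards"},
}

# Each partner's free-text labels (including synonyms) map to one shared ID.
LOCAL_TO_SHARED = {
    "cold cuts": "FOOD:0001",
    "deli meats": "FOOD:0001",
    "luncheon meat": "FOOD:0001",
    "spinach": "FOOD:0002",
    "épinards": "FOOD:0002",
}


def normalize(label: str) -> Optional[str]:
    """Map a local label to a shared term ID, or None if it is unmapped."""
    return LOCAL_TO_SHARED.get(label.strip().lower())


def display(term_id: str, lang: str = "en") -> str:
    """Render a shared term in the reader's preferred language."""
    return SHARED_TERMS[term_id][lang]


if __name__ == "__main__":
    agency_a = ["Cold cuts", "spinach"]
    agency_b = ["Deli Meats", "Épinards"]
    # Different local wording, same shared terms.
    print([normalize(x) for x in agency_a] == [normalize(x) for x in agency_b])  # True
    print(display(normalize("Cold cuts"), "fr"))  # viandes froides
```

Each agency keeps its own labels (customization) while exchanged records carry the shared IDs (interoperability).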
GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.
GenEpiO (Genomic Epidemiology Application Ontology) covers:
• Lab Analytics: genomics, PFGE, serotyping, phage typing, MLST, AMR
• Sample Metadata: isolation source (food, host body product, environmental), BioSample
• Epidemiology Investigation: exposures
• Clinical Data: patient demographics, medical history, comorbidities, symptoms, health status
• Reporting: case/investigation status
Source: Emma Griffiths
GenEpiO Will Help Integrate Genomics and Epidemiological Data
• Use computers to identify common exposures, symptoms, etc. among genomic clusters
• Example: automating case definition generation. Correlate genomic Salmonella Cluster A cases between 01 Mar 2015 and 15 Mar 2015 with high-risk food types (spinach, leafy greens) and the geographical location of Vancouver
Source: Emma Griffiths
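With exposures and locations coded as ontology terms, the slide's example case definition reduces to a structured filter. A minimal Python sketch; the `Case` fields, the cluster label and the hard-coded set of leafy-green terms are all hypothetical (a real system would expand Leafy Greens through the ontology rather than a fixed set).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical expansion of "Leafy Greens" into its ontology subtypes.
LEAFY_GREENS = {"Leafy Greens", "Spinach", "Lettuce"}


@dataclass
class Case:
    case_id: str
    genomic_cluster: str
    onset: date
    foods: set
    city: str


def case_definition(cases, cluster, start, end, foods, city):
    """Return cases in `cluster`, with onset in [start, end], in `city`,
    that report at least one exposure from `foods`."""
    return [
        c for c in cases
        if c.genomic_cluster == cluster
        and start <= c.onset <= end
        and c.city == city
        and c.foods & foods  # non-empty intersection = shared exposure
    ]


if __name__ == "__main__":
    cases = [
        Case("c1", "A", date(2015, 3, 4), {"Spinach"}, "Vancouver"),
        Case("c2", "A", date(2015, 3, 20), {"Lettuce"}, "Vancouver"),  # outside window
        Case("c3", "B", date(2015, 3, 5), {"Spinach"}, "Vancouver"),   # other cluster
        Case("c4", "A", date(2015, 3, 10), {"Chicken"}, "Vancouver"),  # no leafy greens
    ]
    hits = case_definition(cases, "A", date(2015, 3, 1), date(2015, 3, 15),
                           LEAFY_GREENS, "Vancouver")
    print([c.case_id for c in hits])  # ['c1']
```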
[Diagram: GenEpiO across the genomic epidemiology workflow. Infectious Disease Epidemiology (from case to intervention): Public Health Surveillance, Case Cluster Analysis, Evidence Collection & Outbreak Investigation, Result Reporting. Lab Surveillance (from sample to strain typing results): Sample Collection & Processing, Sequence Data Generation & Processing, Bioinformatics Analysis, Result Reporting. Ontologies mapped onto these steps: Whole Genome Sequencing (SO, ERO, OBI), Quality Control (OBI, ERO), Anatomy (FMA), Environment (EnvO), Food (FoodOn), Clinical Sampling (OBI), AMR (ARO), Virulence (PATO), Phylogenetic Clustering (EDAM), Mobile Elements (MobiO), Nomenclature & Taxonomy (NCBITaxon), LOINC, Surveillance (SurvO), Demographics (SIO), Patient History (SIO), Symptoms (SYMP), Exposures (ExO), Source Attribution (IDO), Travel (IDO), Transmission (TRANS), Geography (OMRSE), Infectious Disease (IDO), Typing (TypON); also custom LIMS fields and outbreak protocols. Legend: GenEpiO, OBO, other.]
Genomic Epidemiology Ontology: Using a Common Language to Get Ahead of the Epidemiological Curve
Fewer cases…faster resolution!
Source: Emma Griffiths
Whole Genome Sequencing: Salmonella Enteritidis
Higher Salmonellosis Incidence in BC
• BC has had a higher salmonellosis rate than the Canadian national rate since 2007
• S. Enteritidis has been the most commonly isolated serotype since 2006 (30-50% of all Salmonella isolates in BC)
[Chart: salmonellosis incidence, BC vs. Canada]
Source: http://www.bccdc.ca/NR/rdonlyres/B24C1DFD-3996-493F-BEC7-0C9316E57721/0/2011_CD_Annual_Report_Final.pdf
• PFGE: over half of the isolates tested have 1 of 2 XbaI patterns
• Phage typing (PT): ~half of the isolates are 1 PT
• So either a better subtyping method is needed to discriminate between Enteritidis cases, or this is one very large outbreak (no supporting data for this)
[Charts: distribution of Enteritidis XbaI PFGE patterns, 1998-2012 (SENXAI.0003, .0001, .0038, .0006, .0036, .0004, .0007, .0008, .0062, .0041, .0077, .0002, .0025, .0060, .0009), and Enteritidis phage type distribution, 1998-2012 (PT 8, 13a, 13, 6a, 1, 4, 51, 5b, 41, 1b, 21, 14b, 6, 2, atypical, untypeable). All isolates have been PFGE'd but not all PT'd.]
S. Enteritidis subtyping in BC
Source: Kim Macdonald
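The slides do not give a metric, but a standard way to compare how well subtyping methods separate isolates is the Hunter-Gaston discriminatory index (Simpson's index of diversity): the probability that two isolates drawn at random are assigned different types, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)). A short Python sketch follows; the type counts are invented to mimic a case where two PFGE patterns cover more than half of the isolates, and are not BC data.

```python
def discriminatory_index(type_counts):
    """Hunter-Gaston discriminatory index (Simpson's index of diversity).

    type_counts: number of isolates assigned to each type by one typing method.
    Returns a value in [0, 1]; higher means better discrimination.
    """
    n = sum(type_counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in type_counts)
    return 1 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical: 100 isolates, two dominant PFGE patterns holding 55 of them.
    pfge = [30, 25, 10, 10, 5, 5, 5, 5, 5]
    # Hypothetical: the same 100 isolates split into more, smaller groups.
    finer_method = [10] * 5 + [5] * 10
    print(round(discriminatory_index(pfge), 3))          # 0.823
    print(round(discriminatory_index(finer_method), 3))  # 0.934
```

A value of 0.95 or higher is often quoted as desirable for outbreak typing, which is one way to frame why the two dominant PFGE patterns are a problem.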
Isolates and Methods
• 36 isolates from 9 confirmed food-borne outbreaks
• Collected over 9 years; many more isolates are in the freezer waiting to be organized
• Subtyping data by PFGE and PT available
• Isolates from epi-linked sources available for 2 of the outbreaks
• Isolate picking criteria: believed to be a single-source outbreak (common food, common food handler or common ingredients); clear epidemiological linkage through enhanced interviews; the majority of the clusters have the same PT and/or PFGE pattern, some with a one-band PFGE difference
• Sequencing libraries prepared using Nextera or Nextera XT
• Sequenced on Illumina MiSeq, 150 bp or 250 bp paired-end
• Minimum depth of coverage 30X per genome (average coverage 50X)
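Depth of coverage is roughly (number of reads × read length) / genome size. The sketch below turns the 30X and 50X figures into the number of MiSeq read pairs needed, assuming a genome size of about 4.7 Mb for S. Enteritidis (an approximation we supply, not a number from the slides) and ignoring duplicates, trimming and uneven coverage, so the results are lower bounds.

```python
import math


def read_pairs_needed(target_depth, genome_size_bp, read_length_bp):
    """Paired-end read pairs needed for an average depth (lower bound:
    ignores duplicate reads, adapter/quality trimming and uneven coverage)."""
    bases_needed = target_depth * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


def average_depth(read_pairs, genome_size_bp, read_length_bp):
    """Average depth = total sequenced bases / genome size."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


if __name__ == "__main__":
    genome = 4_700_000  # assumed approximate S. Enteritidis genome size (bp)
    for read_len in (150, 250):
        for depth in (30, 50):
            print(f"{read_len} bp PE, {depth}X: "
                  f"{read_pairs_needed(depth, genome, read_len):,} read pairs")
```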
SNP Analysis
• What is a SNP?
• A SNP (single nucleotide polymorphism) is a DNA sequence variation occurring when a single nucleotide differs between two or more genomes (here, position 6 is G in some sequences and A in others):
ATCGCGATATCATACGG
ATCGCAATATCATACGG
ATCGCGATATCATACGG
ATCGCGATATCATACGG
ATCGCAATATCATACGG
• SNPs can be created by point mutations, but can also be created by the insertion or deletion of one nucleotide
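For sequences that are already aligned, finding SNPs amounts to comparing each column. The Python sketch below uses the five 17-bp sequences from this slide. It is only an illustration: real SNP pipelines call variants from read alignments against a reference genome, with quality and depth filters that this does not attempt.

```python
def snp_positions(aligned):
    """Return 1-based positions where the aligned sequences do not all agree."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def alleles_at(aligned, position):
    """Return the nucleotide each sequence carries at a 1-based position."""
    return [s[position - 1] for s in aligned]


if __name__ == "__main__":
    seqs = [
        "ATCGCGATATCATACGG",
        "ATCGCAATATCATACGG",
        "ATCGCGATATCATACGG",
        "ATCGCGATATCATACGG",
        "ATCGCAATATCATACGG",
    ]
    print(snp_positions(seqs))   # [6]
    print(alleles_at(seqs, 6))   # ['G', 'A', 'G', 'G', 'A']
```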
Why Are SNPs Useful?
• Silent mutations that do not change protein sequences happen quite frequently due to DNA replication errors => high resolution
• SNPs occur across the whole genome and can be detected from whole genome sequencing => unbiased markers
• SNPs can be used to infer the phylogeny of organisms: more shared SNPs = more closely related
Minimal Spanning Tree – colored by PT
[Figure: minimal spanning tree of the isolates, colored by phage type (PT8, PT4, PT13a, PT52).]
Note: for PT13a, 3 isolates have identical SNVs and are collapsed into a single node; edges are not drawn to scale.
Minimal Spanning Tree – Coloured by outbreak
Created using PHYLOViZ Online: http://online.phyloviz.net/
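A minimum spanning tree of this kind connects all isolates with the smallest total number of SNP differences. The sketch below is not PHYLOViZ's implementation; it uses plain Prim's algorithm on pairwise SNP distances, collapses identical profiles into one node (as was done for PT13a), and runs on hypothetical SNP profiles.

```python
def snp_distance(a, b):
    """Number of aligned positions at which two SNP profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def collapse_identical(profiles):
    """Merge isolates with identical SNP profiles into one node."""
    nodes = {}
    for name, profile in profiles.items():
        nodes.setdefault(profile, []).append(name)
    return {"/".join(names): profile for profile, names in nodes.items()}


def prim_mst(profiles):
    """Prim's algorithm on the complete graph of pairwise SNP distances.
    Returns a list of (node_in_tree, new_node, distance) edges."""
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from any node already in the tree to any node outside it.
        dist, u, v = min(
            (snp_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    # Hypothetical SNP profiles (concatenated variable sites only).
    isolates = {
        "iso1": "AGTC",
        "iso2": "AGTC",  # identical to iso1, so collapsed into one node
        "iso3": "AGTA",
        "iso4": "CGTA",
        "iso5": "CCGA",
    }
    for a, b, d in prim_mst(collapse_identical(isolates)):
        print(f"{a} -- {b}: {d} SNPs")
```

This brute-force version is fine for tens of isolates; larger datasets would use a priority queue or a dedicated tool.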
Whole Genome Sequencing: Giardia lamblia (duodenalis)
Giardia
• Giardia is a primitive eukaryotic protozoan belonging to the diplomonads
• Its representatives are differentiated into 8 lineages (A-H), with 2 lineages (A and B) infecting humans. Genomes of 3 lineages (A, B, E) are available
• G. duodenalis (lineages A and B) causes gastrointestinal disease (giardiasis) in humans and is spread by drinking water
• There are over 1 billion cases per year worldwide
• In BC, various waterborne outbreaks have been reported (Isaac-Renton et al. 1992; Safaris and Isaac-Renton 1992)
• The infection may be transmitted by drinking water or food
• Giardia is often associated with an animal host (beaver, Castor canadensis), and giardiasis is called "beaver fever"
Study Overview
For the present study, 89 samples from 4 major outbreaks (Creston, Kitimat, Revelstoke and Barriere), as well as other events, were included.
Trophozoites were retrieved from a -80 °C freezer, and DNA was extracted from Giardia strains from surface water, humans and beavers using a QIAamp DNA Mini Kit.
The identity of isolates was confirmed by 18S rRNA, but 18S does not differentiate subtypes.
Paired-end (PE) DNA libraries were constructed with the Nextera® XT DNA kit, and whole genome re-sequencing was conducted on an Illumina MiSeq.
[Map labels: Aldergrove, Dawson Creek, Kamloops, Terrace, Mission Creek]
Source: Clement Tsui
Bioinformatics Pipelines
Genome sequencing (MiSeq) → Quality checking (FastQC, Trim Galore), then two branches:
• Reference mapping (Bowtie) → Variant calling (GATK or DiscoSNP) → SNP analysis
• De novo assembly (SPAdes) → Gene calling (MAKER) → Comparative genomics
Source: Clement Tsui
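The workflow can be represented as a dependency graph in which quality-checked reads feed both a reference-mapping branch and a de novo assembly branch, so the two branches can run in parallel. A minimal sketch with Python's standard `graphlib` module (Python 3.9+); the branch structure is our reading of the slide, and the step names are labels only, not tool invocations.

```python
from graphlib import TopologicalSorter

# Step labels (not commands); branch structure is one reading of the slide.
SEQ = "Genome sequencing (MiSeq)"
QC = "Quality checking (FastQC, Trim Galore)"
MAP = "Reference mapping (Bowtie)"
CALL = "Variant calling (GATK or DiscoSNP)"
SNP = "SNP analysis"
ASM = "De novo assembly (SPAdes)"
GENES = "Gene calling (MAKER)"
COMP = "Comparative genomics"

# Each step maps to the set of steps it depends on.
PIPELINE = {
    SEQ: set(),
    QC: {SEQ},
    MAP: {QC},
    CALL: {MAP},
    SNP: {CALL},
    ASM: {QC},
    GENES: {ASM},
    COMP: {GENES},
}


def run_order(graph):
    """Return one valid execution order (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(graph).static_order())


def parallel_batches(graph):
    """Group steps into batches whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(PIPELINE), 1):
        print(i, batch)
```

In IRIDA this kind of orchestration is delegated to Galaxy; the sketch only makes the dependency structure explicit.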
Both A and B are present in outbreaks
[Chart: number of Giardia isolates of types A1, A2 and B in the Barriere, Kitimat and Creston outbreaks.]
Outbreaks could have multiple sources.
Source: Clement Tsui
[Figure: phylogenetic tree of Giardia lineage A isolates, with clades A1 and A2 and the annotation "panglobal, zoonotic". Tips are labelled by isolate ID and location (mostly British Columbia, plus reference strains from elsewhere in Canada, the USA and New Zealand); isolates from the Creston, Revelstoke, Barriere and Kitimat outbreaks and the sample source (surface water, humans, veterinary) are marked; branch support values shown; scale bar 0.09.]
Source: Clement Tsui
[Figure: second phylogenetic tree of Giardia isolates from British Columbia, with tips labelled by isolate ID and location; clades highlighted for the Creston, Revelstoke, Barriere and Kitimat outbreaks and for Kelowna/Mission Creek; sample source (surface water, humans, veterinary) marked; branch support values shown; scale bar 0.06.]
Source: Clement Tsui
Microbial genomics has been a valuable research tool
• It helps us:
  • understand microbial evolution
  • understand pathogenesis
  • create novel industrial processes
  • create new laboratory tests
• However, research studies typically:
  • use historical isolates, not real-time samples
  • use laboratory strains that lack rich associated clinical and epidemiological metadata
Cultural and Practical Differences

| Genomics Research Laboratory | Genomics Diagnostic Laboratory |
|---|---|
| Curiosity driven | Production / case driven |
| Exploratory analysis tolerated | Exploratory analysis discouraged |
| Reproducibility is other labs’ problem | Reproducibility critical |
| Tweaking protocols desirable | Stability in protocols desirable |
| Protocols don’t need to be validated | Protocols need to be validated |
| Novelty justifies the high cost of experiments | Conscious of cost per unit test; tests need to be scalable |
By working together, we can bridge the cultural differences
Acknowledgements
IRIDA Project Principal Investigators: Fiona Brinkman (SFU), Will Hsiao (PHMRL), Gary Van Domselaar (NML), Rob Beiko (Dalhousie U.), João Carriço (U. of Lisboa), Morag Graham (NML), Eduardo Taboada (NML), Lynn Schriml (U. of Maryland)
National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall
Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella
Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert
Dalhousie University: Alex Keddy
McMaster University: Andrew McArthur, Daim Sardar
European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid
European Food Safety Authority: Ernesto Liébana Criado, Francesco Vernazza, Valentina Rizzi
Sidra Medical Center: Patrick Tang
Salmonella Project: Kim MacDonald, Matthew Croxen, Linda Hoang, Ana Paccagnella, Mark McCabe, Diane Eisler, Brian Auk, Natalie Prystajecky, Marsha Taylor, Eleni Galanis
Giardia Project: Clement Tsui, Ruth Miller, Anamaria Crisan, Damion Dooley, Kirby Cronin, Sara Tan, Justin Dirk, Mark McCabe, Sunny Mak, Brian Auk, Anna Li, C.P. Fung, Lorraine McIntyre, Renata Zanchettin, Natalie Prystajecky, Judy Isaac-Renton
IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015