6/27/2016
Fostering Collaboration for Public Health: The Role of NCBI
William Klimke, APHL 2016
The Interagency Collaboration on Genomics and Food Safety (Gen‐FS) established by this Charter represents a substantial effort to strengthen collaboration and coordination of Federal public health and regulatory food safety responsibilities of the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) of the US Department of Health and Human Services, and the Food Safety and Inspection Service (FSIS) of the US Department of Agriculture.
Gen‐FS will strengthen Federal collaboration by addressing cross‐cutting priorities for molecular sequencing of foodborne and other pathogens causing human illness, and data collection, analysis and use, as outlined in the key findings of the Report of the Real Time Whole Genome Sequencing Surveillance Multi‐Agency Collaboration Meeting, September 22‐23, 2014, Natcher Center, NIH, Bethesda, Maryland.
Interagency Collaboration on Genomics and Food Safety (Gen-FS)
FDA/CDC Real Time Listeria Project
FDA & CDC could leverage existing systems & work flows…
Could NCBI play a role?
Listeria Annual Stats (CDC)
• ~1600 cases
• ~260 deaths
• ~$230 million (USDA ERS)
Whole Genome Sequencing (WGS) Listeria Pilot Project
Started September 2013
Goal: Sequence all Listeria monocytogenes isolates
Near real‐time (<1 week for patient isolates)
Public Health Agency of Canada
http://www.globalmicrobialidentifier.org/
Vision and objectives
The vision is to develop a global system to aggregate, share, mine and use microbiological genomic data to address global public health and clinical challenges, a high impact area in need of focused effort. Such a system should be deployed in a manner which promotes equity in access and use of the current technology worldwide, enabling cost-effective improvements in plant, animal, environmental and human health.
Global Microbial Identifier
Minimal metadata (Where, When, Who, What)
sample_name
organism
strain/isolate
Category (attribute_package)
  1a) Clinical/Host‐associated
    1a1) specific_host
    1a2) isolation_source
    1a3) host‐disease
  OR
  1b) Environmental/Food/Other
    1b1) isolation_source
collection_date
Geographic location
  6a) geo_loc_name
  OR
  6b) lat_lon
collected by
NCBI Biosample – Pathogen Template (Foodborne Outbreaks)
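The minimal metadata rules above can be read as a small checklist. The following is a rough sketch only: the dictionary keys mirror the field names on the slide (with underscores substituted), the package labels are hypothetical strings, and treating "strain/isolate" as "at least one of the two" is an assumption. It is not the Submission Portal's actual validation logic.

```python
from typing import List, Mapping

# Hypothetical keys modelled on the slide; the real template may spell them differently.
ALWAYS_REQUIRED = ("sample_name", "organism", "collection_date", "collected_by")
CLINICAL = "clinical/host-associated"
ENVIRONMENTAL = "environmental/food/other"


def missing_fields(record: Mapping[str, str]) -> List[str]:
    """Return the minimal-metadata requirements that a record does not meet."""

    def has(key: str) -> bool:
        return bool(str(record.get(key) or "").strip())

    problems = [key for key in ALWAYS_REQUIRED if not has(key)]

    # Assumption: either a strain or an isolate name is enough.
    if not (has("strain") or has("isolate")):
        problems.append("strain or isolate")

    # Category decides which source fields are needed (1a vs 1b on the slide).
    package = str(record.get("attribute_package") or "").lower()
    if package == CLINICAL:
        needed = ("specific_host", "isolation_source", "host_disease")
        problems += [key for key in needed if not has(key)]
    elif package == ENVIRONMENTAL:
        if not has("isolation_source"):
            problems.append("isolation_source")
    else:
        problems.append("attribute_package")

    # Geographic location: a place name OR coordinates (6a vs 6b on the slide).
    if not (has("geo_loc_name") or has("lat_lon")):
        problems.append("geo_loc_name or lat_lon")

    return problems
```

An empty list means the record carries every field in the checklist; anything else names what a submitter would still need to supply.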
Type Submissions
pathogen 117406
pathogen: clinical/host‐associated 68458
pathogen: food/environmental/other 48948
with publicly available SRA data 83243
Salmonella 48967
Listeria 12116
Campylobacter 2978
Escherichia and Shigella 13011
Other 40334
NCBI Biosample – Pathogen Template Total Submissions (May 2016)
Type Submissions
Klebsiella 1815
Acinetobacter 1906
Enterobacter 822
Staphylococcus 1960
Streptococcus 4337
Legionella 296
Viruses 8589
Serratia 125
Pseudomonas 1133
Mycobacterium 6161
Vibrio 1149
Bordetella 205
Bacillus 332
Neisseria 985
NCBI Biosample – Pathogen Template Other pathogens (May 2016)
NCBI Pathogen Detection Pipeline Submissions and Analysis
NCBI Submission Portal
BioSamples
SRA
GenBank
BioProject
NCBI Pathogen Pipeline
Kmer analysis
Genome Assembly
Genome Annotation
Genome Placement
Clustering
SNP analysis
Tree Construction
Reports
QC
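The slides do not say how the "Kmer analysis" stage measures similarity. As a rough illustration of the general idea only, the sketch below compares two sequences by the Jaccard distance between their k-mer sets; the function names, the choice of Jaccard and the tiny default `k` are all assumptions, not the pipeline's method.

```python
def kmers(seq: str, k: int) -> set:
    """Collect every length-k substring of seq, skipping windows that contain N."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def jaccard_distance(seq_a: str, seq_b: str, k: int = 4) -> float:
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer content."""
    set_a, set_b = kmers(seq_a, k), kmers(seq_b, k)
    union = set_a | set_b
    if not union:
        return 0.0  # nothing to compare, treat as identical
    return 1.0 - len(set_a & set_b) / len(union)
```

Real genomes would call for a much larger `k` and a sketching scheme to keep memory manageable; the small default here only keeps the example readable.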
[Figure: NCBI Pathogen Detection Pipeline Submissions (Jan – May, 2016); series: USA, UK, Aus, Clinical]
Type                 Total targets in k‐mer tree   Targets in clusters (single linkage <= 50 SNPs)
Salmonella           45297                         38794
Listeria             9621                          8135
E. coli & Shigella   13144                         6046
Campylobacter        2234                          1569
Acinetobacter        2179                          1299
Elizabethkingia      89                            74
Serratia             336                           227
Klebsiella           1194                          677
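To compare organisms of very different sample sizes, it can help to express the table above as the share of targets that fall into a cluster. This short sketch just divides the two columns; the counts are copied from the table, and the variable and function names are illustrative.

```python
# (total targets in k-mer tree, targets in clusters), copied from the table above
TARGETS = {
    "Salmonella": (45297, 38794),
    "Listeria": (9621, 8135),
    "E. coli & Shigella": (13144, 6046),
    "Campylobacter": (2234, 1569),
    "Acinetobacter": (2179, 1299),
    "Elizabethkingia": (89, 74),
    "Serratia": (336, 227),
    "Klebsiella": (1194, 677),
}


def clustered_fractions(targets=TARGETS) -> dict:
    """Fraction of each organism's targets that sit in a <= 50 SNP cluster."""
    return {name: clustered / total for name, (total, clustered) in targets.items()}


if __name__ == "__main__":
    for name, fraction in clustered_fractions().items():
        print(f"{name}: {fraction:.1%} of targets in clusters")
```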
Contributions of enteric pathogens for food safety
http://www.ncbi.nlm.nih.gov/pathogens/contributors/
Contributions of clinical pathogens
http://www.ncbi.nlm.nih.gov/pathogens/
Results Available Now
NCBI’s Role in Combating Antibiotic Resistant Bacteria
“Create a repository of resistant bacterial strains (an “isolate bank”) and maintain a well‐curated reference database that describes the characteristics of these strains.”
“Develop and maintain a national sequence database of resistant pathogens.”
Clin Infect Dis. 2014 Aug 1;59(3):390‐7. doi: 10.1093/cid/ciu319. Epub 2014 May 1.
MBio. 2015 Jul 28;6(4):e01030. doi: 10.1128/mBio.01030‐15.
AMR efforts at NCBI
• With collaborators, build a database of sequenced isolates with standardized AMR metadata (i.e. accept antibiograms): 2,019 samples as of May 16 (http://www.ncbi.nlm.nih.gov/biosample/?term=antibiogram[filter])
• Collaborators include CDC, WRAIR, FDA, B&W
• Stable, up‐to‐date database of AMR genes with standardized nomenclature
• Collaborators (CARD)
  – RefSeq set released by June 2016
• Implement and validate tools for identifying AMR genes in new isolates
Antibiogram Fields
• Fields designed to find balance between comprehensiveness and ease of submission
• Data dictionaries based on outside expertise (ASM, CLSI) standardize input and minimize ‘data drift’
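The `antibiogram[filter]` term in the BioSample link above can also be sent to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact the network; the function name and the default `retmax` are illustrative choices, and the count returned today will differ from the May 2016 figure.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def antibiogram_search_url(retmax: int = 20) -> str:
    """Build an esearch URL for BioSample records that carry an antibiogram."""
    params = {
        "db": "biosample",
        "term": "antibiogram[filter]",  # same filter as the web link on the slide
        "retmode": "json",
        "retmax": retmax,
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(antibiogram_search_url())
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a JSON document describing the matching samples.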
NCBI Outputs
Kmer tree
ftp://ftp.ncbi.nlm.nih.gov/pathogen/Results/Listeria/latest/
• Genome Workbench
• full SNP reports
• Integrated web‐based interactive system*
• AMR reports*
• wgMLST*
Acknowledgements
Joshua Cherry, Michael DiCuccio, William Klimke, Aleksandr Morgulis, Eyal Mozes, Arjun Prasad, Kirill Rotmistrovsky, Alejandro Schaffer, Sergey Shiryev, Martin Shumway, Alexander Souvorov, Lukas Wagner, Alexander Zasypkin
CDC, FDA/CFSAN, USDA‐FSIS, PHE/FERA, NIAID, WRAIR, Broad, Wadsworth/MDH
pd‐help@ncbi.nlm.nih.gov
This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. http://www.ncbi.nlm.nih.gov
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
David Lipman, James Ostell
SRA team, Systems group, Submission Portal
Automated Bacterial Assembly
SRA Reads sample 1
Trim reads (Ns, adaptor)
Reference Distance tree
Find closest reference genome(s)
ArgoCA (Combined Assembly)
De novo assembly panel
Argo (Reference assisted assembly)
SOAP denovo
GS‐assembler (newbler)
MaSuRCA
Celera Assembler
Reads remapped to combined assembly
Contig fasta
Read placements (bam)
Quality profile
SPAdes
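The "Trim reads (Ns, adaptor)" step in the assembly flow above is not detailed on the slide. As a deliberately simplified sketch, the function below strips leading and trailing N calls and cuts the read at the first exact occurrence of an adaptor sequence. The function name and the exact-match rule are assumptions; production trimmers also handle partial adaptors, mismatches and base qualities.

```python
def trim_read(seq: str, adaptor: str = "") -> str:
    """Remove flanking Ns, then drop everything from the first adaptor match onward."""
    seq = seq.upper().strip("N")  # leading/trailing ambiguous bases
    if adaptor:
        start = seq.find(adaptor.upper())
        if start != -1:
            seq = seq[:start]  # keep only the insert before the adaptor
    return seq
```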
NCBI Pathogen Detection SNP Pipeline Web viewer (coming soon): example 3 – Elizabethkingia outbreak
704 Salmonella Enteritidis
7102 columns (filtered)
Compatibility Parsimony
Data plus “noise”: 7402 total columns
Add “noise”: 300 columns that had been removed by filtering
No changes to topology
Many changes and conflicts (47 + 43)
More conflicts (5 + 6 branches, of ~1100 total)
Few conflicts between topologies
R&D: Tree Building
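The slide contrasts compatibility and parsimony without showing how compatibility is evaluated. One textbook way to test whether two binary SNP columns can sit on the same tree without homoplasy is the pairwise four-state check sketched here. It assumes each column is coded 0/1 per isolate; the function names are illustrative, and this is not presented as NCBI's tree-building code.

```python
from itertools import combinations
from typing import Sequence


def compatible(col_a: Sequence[int], col_b: Sequence[int]) -> bool:
    """Two binary columns conflict only if all four pairs 00, 01, 10, 11 occur."""
    return len(set(zip(col_a, col_b))) < 4


def count_conflicts(columns: Sequence[Sequence[int]]) -> int:
    """Number of column pairs that cannot both fit one tree without homoplasy."""
    return sum(1 for a, b in combinations(columns, 2) if not compatible(a, b))
```

Columns that conflict with many others are natural candidates for the kind of "noise" that the filtering step is meant to remove.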
wgMLST approach
• Complementary to SNP analysis e.g. consistency check
• Efficient for initial clustering of all isolates in species
• Generate loci using “essentially complete” RefSeq genomes
Organism        Number of loci   Genome in loci   Number of genomes   Major species
Acinetobacter   2420             58.25%           43/47               baumannii
Campylobacter   1257             68.36%           90/132              jejuni
Escherichia     2896             52.97%           159/165             coli
Klebsiella      4004             82.54%           67/82               pneumoniae
Listeria        2364             73.88%           73/81               monocytogenes
Salmonella      3469             66.98%           137/147             enterica
R&D: wgMLST
• Fast & relatively simple
• Epidemiologists are familiar with it
• Good for initial clustering
• Different heuristics
• Can use special markers for e.g. serovars
• Still need to deal with assembly errors
• Recombination can still be a problem…
wgMLST – a complementary method
Loci are not independent
R&D: wgMLST
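A wgMLST comparison boils down to counting loci at which two isolates carry different alleles. The sketch below assumes each isolate is represented as a mapping from locus name to allele identifier, with `None` for a locus that could not be called (for example because of an assembly error). The representation and the function name are assumptions made for illustration.

```python
from typing import Mapping, Optional


def allele_distance(
    profile_a: Mapping[str, Optional[int]],
    profile_b: Mapping[str, Optional[int]],
) -> int:
    """Count loci called in both profiles where the allele identifiers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )
```

Ignoring uncalled loci keeps assembly gaps from inflating the distance, though it does nothing about the recombination and non-independence caveats listed above.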
1. Initial partition of isolates within each species by kmer distances
2. Within each partition, blast comparison of all pairs of genomes
3. Single linkage clusters with at most 50 SNPs
4. Within clusters, SNPs with respect to one reference
5. Generate final SNP list and phylogenetic trees
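Step 3 above, single linkage clustering at a threshold of 50 SNPs, can be sketched with a union-find structure: any two isolates within the threshold are merged, so a cluster may contain pairs that are farther apart than 50 SNPs as long as a chain of close neighbours links them. The function signature and the caller-supplied `snp_distance` callable are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence


def single_linkage_clusters(
    names: Sequence[str],
    snp_distance: Callable[[str, str], int],
    max_snps: int = 50,
) -> List[List[str]]:
    """Group isolates so that each is within max_snps of at least one cluster mate."""
    parent: Dict[str, str] = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_distance(a, b) <= max_snps:
                parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

The result includes singletons as one-member groups; counting only groups of two or more would correspond to "targets in clusters". The all-pairs loop is quadratic, which is one reason an initial k-mer partition (step 1) is useful before this step.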
Filtering:
• Base level
• Repeat
• Density
Problematic genomes are eliminated at various points along the way
SNP pipeline
[Figure: cumulative count of differences, with a region of high SNP density marked]
Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430‐4)
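The slides name density filtering but give no parameters or procedure. The sketch below shows only the basic idea behind such a filter, in a single pass: SNP positions that fall in a window holding more differences than expected are set aside, since dense runs often reflect recombination or misassembly rather than point mutation. The window size, the threshold and the function names are invented for illustration; this is not the iterative method cited above.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Set


def dense_positions(positions: Iterable[int], window: int = 1000, max_snps: int = 3) -> Set[int]:
    """Flag every SNP lying in a window of `window` bp that holds more than max_snps SNPs."""
    pos = sorted(positions)
    flagged: Set[int] = set()
    for p in pos:
        lo = bisect_left(pos, p)                 # first SNP at the window start
        hi = bisect_right(pos, p + window - 1)   # one past the last SNP in the window
        if hi - lo > max_snps:
            flagged.update(pos[lo:hi])
    return flagged


def filter_dense(positions: Iterable[int], window: int = 1000, max_snps: int = 3) -> List[int]:
    """Return the sorted SNP positions that survive the density filter."""
    pos = sorted(positions)
    dense = dense_positions(pos, window, max_snps)
    return [p for p in pos if p not in dense]
```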
Number of RefSeq genomes with AMR hits, by organism, and carbapenem‐resistant beta lactamase alleles (GES to IMI)
Genomes   Organism                   GES   KPC   NDM   OXA    IMP   VIM   IMI
3221      Escherichia coli           0     74    32    2      6     0     0
1096      Acinetobacter baumannii    0     2     32    2861   6     0     0
1081      Pseudomonas aeruginosa     0     6     0     0      0     234   0
781       Klebsiella pneumoniae      2     930   96    10     6     0     0
314       Enterobacter cloacae       0     278   8     0      6     0     1
74        Enterobacter aerogenes     0     16    0     0      0     0     0
72        Klebsiella oxytoca         0     20    4     0      0     0     0
70        Serratia marcescens        0     2     0     0      0     0     0
30        Citrobacter                0     24    4     0      0     3     0
NCBI Pathogen Detection: Carbapenem resistant beta lactamase alleles found
Organism                 Submitter      Number of genomes with carbapenemases   KPC   NDM   OXA
Salmonella               CDC            2                                       1     1     0
Salmonella               PHE            12                                      0     1     11
Serratia marcescens      B&W Hospital   1                                       2     0     0
Pseudomonas aeruginosa   B&W Hospital   2                                       0     0     3
Escherichia coli         B&W Hospital   1                                       0     0     2
Klebsiella pneumoniae    B&W Hospital   10                                      10    0     0
Enterobacter cloacae     B&W Hospital   7                                       7     0     0
Acinetobacter            B&W Hospital   6                                       0     0     10