Syndromic Surveillance from Emergency Department Triage Notes
Karin M. Verspoor, The University of Melbourne
Antonio Jimeno Yepes, The University of Melbourne
Bahadorreza Ofoghi, The University of Melbourne
Geoffrey White, DSTO
26 September 2014 - MQClinicalNLP workshop
SynSurv
• SynSurv
– Victorian Department of Health pilot syndromic surveillance program
– Detection of outbreaks based on ICD-10 diagnostic codes and presenting complaints as captured in free text notes
Our focus: Extracting information from unstructured free text to enable “early warning” monitoring
Objectives of our project
• Exploration of the application of natural language processing techniques to triage notes for syndromic surveillance
– To enable surveillance directly from notes; integration into natural workflow of ED
– To support higher sensitivity and higher precision than keyword-based methods
Emergency Department triage notes
• Free text notes
– written by the triage nurse upon assessment in the Emergency Department
– capture the presenting symptoms and complaints of a patient
CENTRAL CHEST DISCOMFORT WHILE EATING, RADIATING TO ARMS. PPM INSERTED 2/52 AGO. PAIN FREE O/A. HR72, BP160
FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA
L BASAL GANGLIAN BLEED POST COLLAPSE, NON VERBAL, EYES SPON OPENED, HYPERTENSIVE, P 70REG, PEARL, PMX CEREBRAL BLEED
SynSurv data characteristics
• 918,330 records
• 730,054 records with ICD-10 diagnosis
• 456,213 records with note text
• 316,362 records with ICD-10 diagnosis and note text
Two sets of Experiments
• Given a free text note,
– Predict the ICD-10 code(s) for the note
– Predict a syndromic group, based on pre-defined sets of ICD-10 codes of interest
Machine learning for text analysis
[Diagram, training: a training set of notes plus labels for classes of interest (e.g. ICD-10 codes) goes through language processing, which draws on biomedical knowledge sources (UMLS: SNOMED CT, ICD) to produce features (words, phrases, linguistic categories, names of entities, domain concepts, document features). A machine learning algorithm uses these to build a model relating features of the text to classes of interest.]
Machine learning for text analysis
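[Diagram, classification: new notes to be classified go through the same language processing, drawing on biomedical knowledge sources (UMLS: SNOMED CT, ICD) to produce features (words, phrases, linguistic categories, names of entities, domain concepts, document features). The model maps these features to a predicted classification (label).]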
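The training and classification steps outlined above can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: the slides do not name the learning algorithm, so multinomial naive Bayes over bag-of-words features is an assumption, and apart from the first note (taken from the examples earlier in the deck) the training notes and all labels are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(note):
    """Lower-case a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", note.lower())


def train(labelled_notes):
    """Count words per class; labelled_notes is a list of (text, label)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labelled_notes:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, note):
    """Return the class with the highest log-probability for the note."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(note):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data with hypothetical labels (not SynSurv records).
TRAINING = [
    ("FEBRILE ILLNESS FLU LIKE SYMPTOMS NAUSEA", "flu-like"),
    ("FEVER COUGH ACHES FLU LIKE SYMPTOMS", "flu-like"),
    ("CENTRAL CHEST DISCOMFORT RADIATING TO ARMS", "other"),
    ("FALL FROM LADDER PAIN IN LEFT WRIST", "other"),
]

model = train(TRAINING)
prediction = predict(model, "FEVER AND FLU SYMPTOMS SINCE YESTERDAY")
print(prediction)
```

In the approach the slides describe, the word tokens would be supplemented with richer features such as phrases, entity names and UMLS concepts.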
Abstracting linguistic variation
• Terminology mapping tools generalise language variation
• e.g. UMLS Concept C0027497
– nausea
– nauseated
– feels sick
– feeling sick
– queasy
– felt sick
– nauseous
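The idea of collapsing surface variants onto one concept can be shown with a simple dictionary lookup. This is only a sketch: the variant list is the one on the slide, but the `normalise` helper and the regular-expression matching are assumptions for illustration, and real terminology mapping tools handle far more variation than exact phrase matches.

```python
import re

CONCEPT_ID = "C0027497"  # UMLS concept for nausea, as given on the slide

# Surface forms listed on the slide, all mapped to the same concept.
VARIANTS = {
    "nausea": CONCEPT_ID,
    "nauseated": CONCEPT_ID,
    "feels sick": CONCEPT_ID,
    "feeling sick": CONCEPT_ID,
    "queasy": CONCEPT_ID,
    "felt sick": CONCEPT_ID,
    "nauseous": CONCEPT_ID,
}


def build_pattern(variants):
    """Compile one case-insensitive pattern, trying longer phrases first."""
    ordered = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


PATTERN = build_pattern(VARIANTS)


def normalise(note):
    """Replace every known variant in the note with its concept identifier."""
    return PATTERN.sub(lambda m: VARIANTS[m.group(1).lower()], note)


print(normalise("PT FEELS SICK AND NAUSEOUS"))
```

After normalisation, a classifier sees one feature for the concept instead of seven unrelated strings.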
Predicting ICD-10 codes (Results)
• Direct term matching strategy outperformed by machine learning
– Performance difference between micro-average and macro-average indicates that some ICD-10 codes are underrepresented in the data, and cannot be modeled well
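Why a gap between micro- and macro-averages points to underrepresented codes can be seen with a small calculation. The counts below are invented, and F1 is assumed as the metric (the transcript does not say which measure was averaged). Micro-averaging pools the counts, so the frequent code dominates; macro-averaging gives every code equal weight, so poorly modelled rare codes pull the score down.

```python
def f1(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented per-code counts: (true positives, false positives, false negatives).
COUNTS = {
    "frequent code": (900, 50, 50),
    "rare code A": (1, 4, 9),
    "rare code B": (0, 2, 8),
}

# Macro-average: compute F1 per code, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in COUNTS.values()) / len(COUNTS)

# Micro-average: pool the counts over all codes, then compute a single F1.
total_tp = sum(c[0] for c in COUNTS.values())
total_fp = sum(c[1] for c in COUNTS.values())
total_fn = sum(c[2] for c in COUNTS.values())
micro_f1 = f1(total_tp, total_fp, total_fn)

print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
```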
Predicting Syndromic Groups
• Task
– Syndromic groups are defined by sets of ICD-10 codes, e.g. Flu like group
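A syndromic group defined as a set of ICD-10 codes amounts to a lookup from codes to group labels. The sketch below assumes a hypothetical definition of the flu-like group (the ICD-10 influenza categories J10 and J11); the code sets actually used by SynSurv are not given in the slides.

```python
# Hypothetical group definitions, for illustration only.
SYNDROMIC_GROUPS = {
    "flu-like": {"J10", "J11"},
}


def category(code):
    """Reduce an ICD-10 code such as 'J11.1' to its three-character category."""
    return code.split(".")[0].upper()


def groups_for(codes):
    """Return the sorted names of all syndromic groups matched by the codes."""
    categories = {category(c) for c in codes}
    return sorted(
        name
        for name, members in SYNDROMIC_GROUPS.items()
        if categories & members
    )


print(groups_for(["J11.1"]))
print(groups_for(["T18"]))
```

Labels derived this way can serve as the training targets for the syndromic group classifier.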
Predicting Syndromic Groups (Detailed Results)
Issues for low performance
• Inconsistency in ICD-10 annotation
– ? FISH BONE IN THROAT J03
– ? FISH BONE IN THROAT T18
– ? FISH BONE IN THROAT T18
– ? FISH BONE IN THROAT S10.9
– ? FISH BONE IN THROAT J02.0
• Notes not related to the patient's visit
– DIRECT ADMISSION FROM BAIRNSDALE TO 3S BED 25
• Typos in the notes text
– ? FIH BONE IN THROAT
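One way to surface the annotation inconsistency described above is to group records by note text and list the notes that received more than one code. This is a minimal sketch using the fish bone examples from the slide; the `inconsistent_notes` helper is hypothetical and not part of the project described here.

```python
from collections import defaultdict

# (note text, ICD-10 code) pairs taken from the slide.
RECORDS = [
    ("? FISH BONE IN THROAT", "J03"),
    ("? FISH BONE IN THROAT", "T18"),
    ("? FISH BONE IN THROAT", "T18"),
    ("? FISH BONE IN THROAT", "S10.9"),
    ("? FISH BONE IN THROAT", "J02.0"),
]


def inconsistent_notes(records):
    """Map each note text to its codes, keeping notes with more than one code."""
    codes_by_note = defaultdict(set)
    for text, code in records:
        # Normalise case and whitespace so identical notes group together.
        key = " ".join(text.upper().split())
        codes_by_note[key].add(code)
    return {
        note: sorted(codes)
        for note, codes in codes_by_note.items()
        if len(codes) > 1
    }


conflicts = inconsistent_notes(RECORDS)
for note, codes in conflicts.items():
    print(note, "->", ", ".join(codes))
```

Exact grouping would miss the misspelled variant (FIH for FISH), which is part of why typos compound the problem.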
Integrating with DSTO’s BioSurv system
• Input to the DSTO BioSurv system
– Trained machine learning models used as input to BioSurv (e.g., C2 algorithm)
– Prediction probability > 0.5
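[Diagram: the model outputs a predicted classification (label) for each note. If the label is flu-like illness (Yes), the BioSurv count is incremented by 1; otherwise (No) it is not.]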
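The counting step can be sketched as follows: each note whose predicted probability for the syndrome exceeds 0.5 adds one to that day's count. The dates, probabilities and the `daily_counts` helper are invented for illustration; how BioSurv actually ingests the counts is not described in the slides.

```python
from collections import Counter

THRESHOLD = 0.5  # prediction probability cut-off stated on the slide


def daily_counts(predictions, threshold=THRESHOLD):
    """Count, per day, the notes classified as positive for the syndrome.

    predictions is an iterable of (date_string, probability) pairs.
    """
    counts = Counter()
    for day, probability in predictions:
        if probability > threshold:
            counts[day] += 1
    return counts


# Invented model outputs for the flu-like illness classifier.
PREDICTIONS = [
    ("2014-07-01", 0.91),
    ("2014-07-01", 0.32),
    ("2014-07-01", 0.50),  # not counted: the rule is strictly greater than 0.5
    ("2014-07-02", 0.77),
]

counts = daily_counts(PREDICTIONS)
print(dict(counts))
```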
Example: Flu like syndrome NLP notes annotation
• Records with no ICD-10 codes in the database are now available to BioSurv
• 730,054 out of 918,330 records with ICD-10 codes
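The record counts reported earlier in the deck imply how many records this could add. A quick calculation, assuming that NLP can only classify records that have note text:

```python
# Record counts from the SynSurv data characteristics slide.
TOTAL = 918_330
WITH_ICD10 = 730_054
WITH_NOTE = 456_213
WITH_BOTH = 316_362

# Records that the ICD-10 based input cannot count.
without_icd10 = TOTAL - WITH_ICD10

# Of those, the records that have note text and so can be classified by NLP.
note_only = WITH_NOTE - WITH_BOTH

print(f"{without_icd10:,} records without an ICD-10 code")
print(f"{note_only:,} of them have note text")
```

That is, 188,276 records lack an ICD-10 code, and 139,851 of those have note text.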
C2 algorithm: ICD-10 vs NLP
• Earlier alert time using NLP methods
[Figure: two panels, labelled ICD-10 and NLP]
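For readers unfamiliar with C2, here is a minimal sketch of the detector as it is commonly described: each day's count is compared with the mean and standard deviation of a 7-day baseline that ends 2 days earlier, and an alert is raised when the count exceeds the mean by more than 3 standard deviations. The baseline length, lag, threshold and the handling of a zero standard deviation are assumptions here; the settings used in BioSurv are not given in the slides, and the counts are invented.

```python
from statistics import mean, stdev


def c2_alerts(counts, baseline=7, lag=2, threshold=3.0):
    """Return the indices of days whose count triggers a C2-style alert."""
    alerts = []
    for t in range(baseline + lag, len(counts)):
        # Baseline window: `baseline` days ending `lag` days before day t.
        window = counts[t - lag - baseline : t - lag]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # Assumption: skip days with a flat baseline to avoid dividing by zero.
            continue
        if (counts[t] - mu) / sigma > threshold:
            alerts.append(t)
    return alerts


# Invented daily counts with a spike on the last day.
DAILY_COUNTS = [5, 6, 4, 5, 7, 5, 6, 5, 6, 20]

print(c2_alerts(DAILY_COUNTS))
```

Running the same detector on ICD-10 based counts and on NLP-based counts is what allows the alert times of the two inputs to be compared.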
Conclusions
• NLP methods can be used to support the BioSurv tool
• Machine learning methods perform better than dictionary-based methods
• Expansion of original syndromic groups improves machine learning performance
• Evaluation is a challenge
– Noisy training data
– What’s a “gold standard” alert?
Acknowledgements
• Victorian Department of Health (for SynSurv data)
• Defence Science and Technology Organisation (DSTO) (BioSurv system)
(funding and collaboration)
© Copyright The University of Melbourne 2011