Tracy Edinger, ND, MS
Oregon Health & Science University
Twitter: #AMIA2017
Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval
Enhanced Cohort Identification and Retrieval
S105
Co-Authors
Dina Demner-Fushman, MD, PhD (National Library of Medicine)
Aaron Cohen, MD, MS (Oregon Health & Science University)
Steven Bedrick, PhD (Oregon Health & Science University)
William Hersh, MD (Oregon Health & Science University)
AMIA 2017 | amia.org
Acknowledgements
• DMICE Faculty, Staff, and Students (OHSU)
• NLM Scientists, Staff, and Fellows (National Library of Medicine)
• NLM 2 T15 LM 7088-21
Disclosure
I and my spouse/partner have no relevant relationships with commercial
interests to disclose.
Learning Objectives
After participating in this session the learner should be better able to:
• Understand the importance of identifying document section headings for natural language
processing
• Understand rule-based identification of document section headings
Use of Clinical Data
• Secondary use of EHR data
- Quality improvement
- Regulatory reporting
- Disease surveillance
- Research
• To use this data, it is important to be able to retrieve specific patient cohorts
Image from http://epidemiologystudy.com/study.php
Structured and Unstructured Data for Cohort Retrieval
• Structured data including diagnosis and procedure codes are commonly used to identify clinical cohorts
• Relying solely on structured data may not retrieve the full cohort
Denny JC (2012) Chapter 13: Mining Electronic Health Records in the Genomics
Era. PLoS Comput Biol 8(12): e1002823. doi:10.1371/journal.pcbi.1002823
Patients who had colonoscopies during the last 10 years
Cohort Retrieval from Clinical Text
• Cohort retrieval from clinical text is difficult
• Terminology and spelling differences
• Multiple meanings for terms
• Temporality
• Negation
• References to illnesses in other people
• Clinical text may provide clues to help resolve some of these issues
Structure of Clinical Text
S: Patient reports not much sleep last night; no complaints
this morning.
O: T 99 F, HR 68, RR 16, BP 107/75
Chest – CTA, bilateral breath sounds
CV – RRR without murmur
A: Ovarian carcinoma – POD #1 for staging laparotomy.
Adequate UOP, incision in good condition.
P: Clear liquids today. D/C foley catheter.
SOAP Format
Structure of Clinical Text
Chief Complaint: Sent from NWH with left sided hemorrhage
History of Present Illness: The pt is a 44 year-old right handed woman with no significant PMH and family history significant for stroke (father, paternal uncle and sister @ 46 years) who was transferred from [**Hospital 1771**] Hospital with a left sided intraparenchymal hemorrhage. The patient was in her USOH ...
Past Medical History: Had an ulcer at age 10
Social History: Works at the [**Last Name (un) 10457**] Laboratories in [**Location (un) 2997**]. Married. Has a son. No ETOH, TOBACCO, or Drugs.
Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.
Facilitating Retrieval by Segmenting Clinical Text
Past Medical History: Had an ulcer at age 10
Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.
Several algorithms have been published that segment clinical documents
- Segmenting was validated
- No published studies evaluate whether segmenting improves recall and precision
Sections provide clues that may avoid some retrieval issues
- Temporal differences
- References to illnesses in other people
Project Overview
• Segmented a set of clinical documents
• Developed topics for several patient cohorts
• Developed queries with and without sections
• Judged a subset of documents for performance
• Analyzed results
Methods - Data
• MIMIC-II database – neonatal and adult patients
• De-identified ICU records developed by MIT, Philips Medical Systems, and Beth Israel Deaconess Medical Center
• Relational database containing structured data and unstructured documents
- 25,000 patients
- Discharge summaries
- MD notes
- Radiology reports
- Nursing notes
Methods – Segmenting Documents
• Identified section indicators
Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M
Service: SURGERY Allergies: Penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M
Service: SURGERY Allergies - penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M
Service: SURGERY Allergic to penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
• Searched for indicators and inserted XML tags
Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY
<allergies>Allergic to penicillin</allergies> Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to
Methods – Segmenting Documents
Original format
<TEXT>Admission Date: [**3391-5-21**] Discharge Date:
[**3391-6-1**] Date of Birth: [**3312-11-5**] Sex: M
Service: SURGERY Allergies: Penicillin
Attending:[**First Name3 (LF) 2679**] Addendum: Pt is
discharged to [**Hospital3 **] Hospital [**3391-6-1**].
This is an updated medication list, which has been
faxed to [**Hospital3 **]. Discharge Medications: 1.
Acetaminophen 325 mg Tablet Sig: 1-2 Tablets PO Q6H
(every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet
Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro
100 unit/mL Solution Sig: One (1) injection
Subcutaneous ASDIR (AS DIRECTED). Discharge
Disposition: Extended Care Facility: [**Hospital6 694**]
– [((Location (un) 695**] [**First Name11 (Name
Pattern1) 531**] [**Last Name (NamePattern1) 2684**]
MD [**MD Number 2685**]</TEXT>
Segmented text
<TEXT>
<preamble>Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**]
Date of Birth: [**3312-11-5**] Sex: M Service: SURGERY</preamble>
<allergies>Allergies: Penicillin</allergies>
<addendum>Addendum: Pt is discharged to [**Hospital3 **] Hospital [**3391-6-1**].
This is an updated medication list, which has been faxed to
[**Hospital3 **]. </addendum>
<dc_meds>Discharge Medications: 1. Acetaminophen 325 mg Tablet Sig: 1-2
Tablets PO Q6H (every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet
Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro 100 unit/mL
Solution Sig: One (1) injection Subcutaneous ASDIR (AS DIRECTED).
</dc_meds>
<dc_disposition>Discharge Disposition: Extended Care Facility: [**Hospital6
694**] – [((Location (un) 695**] [**First Name11 (Name Pattern1) 531**]
[**Last Name (NamePattern1) 2684**] MD [**MD Number 2685**]
</dc_disposition>
</TEXT>
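The "search for indicators and insert XML tags" step could be sketched roughly as follows. This is a minimal illustration, not the algorithm used in the study: the regular expressions and the `segment` helper are assumptions, and only the tag names are taken from the slide's example.

```python
import re

# Hypothetical indicator patterns mapped to tag names. The tag names follow
# the slide's example; the patterns themselves are assumptions.
SECTION_PATTERNS = [
    ("allergies", r"Allerg(?:ies\s*[:\-]|ic to)"),
    ("addendum", r"Addendum\s*:"),
    ("dc_meds", r"Discharge Medications\s*:"),
    ("dc_disposition", r"Discharge Disposition\s*:"),
]

def segment(text):
    """Wrap each detected section in XML-style tags.

    Text before the first indicator becomes <preamble>. Each section runs
    from its indicator to the start of the next detected indicator.
    """
    hits = []
    for tag, pattern in SECTION_PATTERNS:
        for match in re.finditer(pattern, text):
            hits.append((match.start(), tag))
    hits.sort()

    pieces = []
    start, tag = 0, "preamble"
    for pos, next_tag in hits:
        body = text[start:pos].strip()
        if body:
            pieces.append(f"<{tag}>{body}</{tag}>")
        start, tag = pos, next_tag
    body = text[start:].strip()
    if body:
        pieces.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(pieces)

note = (
    "Admission Date: [**3391-5-21**] Sex: M Service: SURGERY "
    "Allergies: Penicillin Addendum: Pt is discharged. "
    "Discharge Medications: 1. Acetaminophen 325 mg"
)
print(segment(note))
```

The sketch does not escape XML special characters, and any field without its own pattern (such as "Attending:") stays inside the preceding section, so a real rule set would need many more indicators.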
Methods – Search Engine
NLM’s Essie
• Developed to facilitate searching of medical literature by non-clinicians through use of UMLS
• UMLS relates terms by concept
• Allows matching even if different words used
• Maps text corpus to the UMLS and indexes the corpus on these concepts
• Maps the search concepts to the UMLS
• Returns a ranked, scored list of documents
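The idea of indexing and searching on concepts rather than words can be illustrated with a toy example. This is only a sketch of the general technique, not Essie's implementation: the term table stands in for the UMLS, the concept IDs are made-up placeholders, and no ranking or scoring is attempted.

```python
from collections import defaultdict

# Hypothetical term-to-concept table standing in for the UMLS. The concept
# IDs are made-up placeholders, not real UMLS identifiers.
TERM_TO_CONCEPT = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}

def concepts_in(text):
    """Return the set of concept IDs whose terms occur in the text."""
    lowered = text.lower()
    return {cid for term, cid in TERM_TO_CONCEPT.items() if term in lowered}

def build_index(docs):
    """Map each concept ID to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for cid in concepts_in(text):
            index[cid].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every concept found in the query."""
    wanted = concepts_in(query)
    if not wanted:
        return set()
    return set.intersection(*(index.get(cid, set()) for cid in wanted))

docs = {
    "d1": "Pt admitted with heart attack.",
    "d2": "History of hypertension.",
    "d3": "Myocardial infarction and high blood pressure.",
}
index = build_index(docs)
print(search(index, "myocardial infarction"))
```

Because both "heart attack" and "myocardial infarction" map to the same placeholder concept, the query matches `d1` even though it shares no words with that document.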
Methods – Clinical Topics
• Began with topics from TRECMed 2012 and adapted them to the MIMIC ICU data
• Modified or eliminated topics that retrieved few documents
Methods – Clinical Topic Examples
• Patients who develop thrombocytopenia in pregnancy
• Patients taking atypical antipsychotics without a diagnosis of schizophrenia or bipolar depression
• Patients with delirium, hypertension, and tachycardia
• Patients with thyrotoxicosis treated with beta-blockers
• Final set included 22 topics
Methods – Query Development
• Developed initial query without sections
• Ran queries against data
• Examined retrieved documents to refine query
• Rewrote query using sections
• Ran queries against data
• Examined retrieved documents to refine query
• Ran all queries and recorded documents returned and scores
Methods – Query Development
Topic: Patients with diabetes who also have thrombocytosis
• Baseline query
diabetes AND thrombocytosis
• With sections we could avoid Family History
thrombocytosis AND AREA[AdmissionDiagnosis]
diabetes OR AREA[ChiefComplaint] diabetes OR
AREA[Course] diabetes …
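The difference between the two queries can be mimicked in a few lines of Python. This is a toy illustration under stated assumptions, not Essie's query engine: the `sections` and `mentions` helpers are hypothetical, the document is assumed to be a flat sequence of tagged sections with no outer wrapper, and the tag names simply reuse the AREA names from the query above.

```python
import re

def sections(doc):
    """Map tag name -> section text for a flat, XML-style tagged document.

    If a tag repeats, only the last occurrence is kept.
    """
    return {m.group(1): m.group(2)
            for m in re.finditer(r"<(\w+)>(.*?)</\1>", doc, flags=re.DOTALL)}

def mentions(doc, term, areas=None):
    """True if term occurs in the named sections (all sections if areas is None)."""
    secs = sections(doc)
    names = secs if areas is None else areas
    return any(term.lower() in secs.get(name, "").lower() for name in names)

doc = (
    "<ChiefComplaint>Chief Complaint: thrombocytosis</ChiefComplaint>"
    "<FamilyHistory>Family History: mother has diabetes</FamilyHistory>"
)

# Baseline: diabetes AND thrombocytosis, anywhere in the document.
baseline = mentions(doc, "diabetes") and mentions(doc, "thrombocytosis")

# Sectioned: diabetes must appear in a section that describes the patient.
patient_areas = ["AdmissionDiagnosis", "ChiefComplaint", "Course"]
sectioned = (mentions(doc, "thrombocytosis")
             and mentions(doc, "diabetes", patient_areas))

print(baseline, sectioned)
```

Here the baseline query matches because diabetes appears in the family history, while the sectioned query does not, which is the behavior the slide describes.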
Methods – Document Sampling
• Samples selected for each topic based on difference in scores
• Total sample size was 574 documents
• Sample sizes ranged from 10 to 40
• Average sample size 26 documents
[Figure: per-topic sampling diagram. Segmented Documents: 0-10 docs; Whole Document: 0-10 docs; 0-10 high; 0-10 low]
Methods – Document Evaluation
1. Was the document relevant to the topic?
2. Why were non-relevant documents retrieved?
3. Did segmentation help retrieval and why?
Results – Document Relevance
574 Documents Analyzed
- Retrieved only by queries of segmented documents: 26
- Retrieved by both queries: 328
- Retrieved only by queries of whole documents: 220
Results – Document Relevance
343 Relevant Documents
- Segmented documents only: 20
- Both: 246
- Whole document only: 77
231 Non-relevant Documents
- Segmented documents only: 6
- Both: 82
- Whole document only: 143
Results – Reasons for Retrieving Non-relevant Documents
- Non-relevant reference to condition: 84
- Past or possible future condition: 70
- Condition mentioned but not diagnosed: 23
- Condition denied or ruled out: 22
- Issue with term mapping: 20
- Query issue: 11
Results – Effect of Segmenting on Document Retrieval
- Segmenting avoided retrieval of non-relevant document by avoiding specific sections: 132
- Segmenting allowed retrieval of relevant document by focusing on specific sections: 20
- Performance unrelated to segmenting: 320
- Query error (did not look in the right section): 80
- Document not segmented correctly: 18
- Condition included in incorrect section of notes: 1
Results
Segmenting avoided retrieval of non-relevant documents
Patients who develop thrombocytopenia in pregnancy
Issue: Neonatal notes often document mother’s
pregnancy history
Solution: Look in sections containing the patient’s
diagnosis
Results
Segmenting allowed retrieval of relevant documents by
focusing on specific sections
Patients taking atypical antipsychotics without a diagnosis
of schizophrenia or bipolar depression
Issue: Need to ignore mentions of these conditions in
family members
Solution: Look in sections containing the patient’s
diagnosis; avoid family-history section
Quantitative Analysis
• Correlation to indicate whether querying the
segmented documents impacted performance
• Precision and recall
Analysis – Matthews Correlation Coefficient
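- Document relevant, segmented score higher than base: True Positive (TP)
- Document relevant, segmented score lower than base: False Negative (FN)
- Document not relevant, segmented score higher than base: False Positive (FP)
- Document not relevant, segmented score lower than base: True Negative (TN)
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$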
Values range from -1 to 1
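As a worked illustration of the formula, the sketch below counts the four cells from judged documents and computes the coefficient. The function names, the input layout, the handling of tied scores (counted as "not higher"), and the convention of returning 0 when the denominator is zero are all assumptions, not details from the study.

```python
import math

def confusion_counts(judged):
    """Count TP, FP, FN, TN from (segmented_score, base_score, relevant) triples.

    Assumption: a tie between the two scores is treated as "not higher".
    """
    tp = fp = fn = tn = 0
    for seg_score, base_score, relevant in judged:
        higher = seg_score > base_score
        if relevant and higher:
            tp += 1
        elif relevant:
            fn += 1
        elif higher:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Made-up scores and judgments, for illustration only.
judged = [
    (0.9, 0.4, True), (0.8, 0.5, True), (0.2, 0.6, True),
    (0.7, 0.3, False), (0.1, 0.5, False), (0.0, 0.4, False),
]
print(mcc(*confusion_counts(judged)))
```

A value near 1 would mean that relevant documents tend to score higher under segmented queries and non-relevant ones lower; a value near 0 would mean no association.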
Analysis – Matthews Correlation Coefficient
[Figure: Matthews correlation coefficient on an axis from -0.2 to 1, with significance markers (p<0.05, p<0.01) and the average]
Analysis – Recall and Precision
• Recall: $\text{Recall} = \frac{\text{number of relevant documents retrieved}}{\text{all relevant documents judged}}$
• Precision: $\text{Precision} = \frac{\text{number of relevant documents retrieved}}{\text{all retrieved documents judged}}$
• Values range from 0 to 1
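Both measures can be computed from sets of judged document ids, as in the sketch below. It assumes the standard definition of precision, with the denominator being the judged documents that the query retrieved; the function name and the example ids are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: set of judged document ids the query returned
    relevant:  set of judged document ids deemed relevant to the topic
    """
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Made-up judgments for a single topic, for illustration only.
relevant = {"d1", "d2", "d3", "d4"}
whole_query = {"d1", "d2", "d3", "d5", "d6", "d7"}
segmented_query = {"d1", "d2", "d5"}

print(recall_precision(whole_query, relevant))
print(recall_precision(segmented_query, relevant))
```

In this made-up case the segmented query returns fewer documents, so its recall is lower while its precision is higher than that of the whole-document query.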
Analysis - Recall
[Figure: recall on a 0 to 1 axis for whole-document and segmented-document queries, with average]
Analysis - Precision
[Figure: precision on a 0 to 1 axis for whole-document and segmented-document queries, with average]
Discussion
• Queries of segmented documents retrieved fewer
documents
• These documents were more likely to be relevant
and less likely to be non-relevant
• Some queries performed better
• Some documents were easier to segment accurately
Limitations
• Small sample size
• Only one person writing queries and doing relevance
judgments
• Inaccuracies in identifying note segments
• Some queries did not perform well
Future Work
• Use validated algorithm to segment text
• Use larger sample and independent relevance judges
• Develop queries for specific type of clinical note
• Identify specific types of information that benefit from
searching specific sections
• Search unstructured and structured data together to
reflect real-world EHR data use
Thank you!