Chunhua Weng, PhD - EliXR
EliXR: An Approach to Eligibility Criteria Extraction and Representation
Chunhua Weng, PhD, Zhihui Luo, PhD, Stephen B. Johnson, PhD
Department of Biomedical Informatics
Columbia University
March 10, 2011
Problem
Free-text clinical research eligibility criteria are not amenable to machine processing.
Computational representations (e.g., ontologies) are much needed to support electronic eligibility determination, clinical evidence application, clinical research knowledge management, etc.
Related Work
• Eligibility Rule Grammar and Ontology (ERGO)
• Agreement on Standardized Protocol Inclusion Requirements for Eligibility (ASPIRE)
• Many other prior efforts [1]
[1] Weng C, Tu SW, Sim I, Richesson R. Formal Representations of Eligibility Criteria: A Literature Review. Journal of Biomedical Informatics: 43 (2010), 451-467.
The Research Gap
• Plethora of representations, no canonical model
• Ontology for human annotation vs. ontology for NLP
• Lacking ontology and NLP symbiosis
Our Research Question
Can we induce templates that can facilitate both representation and extraction from criteria text?
(A template is a “world model” for eligibility criteria as a semantic network)
From Text to Templates
TEXT
• Phrases
• Sentences
• Phrase Co-occurrence Frequency
TEMPLATES
• Concepts
• Semantic Relationships
Template development = segmentation of UMLS Semantic Network for the eligibility criteria domain
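Methods: The EliXR Framework
[Diagram: EliXR framework; only the box labels survive. Bracketed numbers are the superscript markers shown on the slide.]
• Criteria Corpus
• UMLS
• Lexicon Creation [1]
• Semantic Annotation [1]
• Semantic Dependency Parsing [4]
• Semantic Pattern Mining [5]
• Template Induction [6]
• Dynamic Criteria Categorization [2,3]
• Automatic Template Selection
• Template Filling
• Structured Criteria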
Semantic Annotation Compared with MMTx
Example: Patients with complications such as serious cardiac, renal and hepatic disorders.
EliXR Annotation: {Patients | Patient or Disabled Group} {with|} {complications | Pathologic Function} {such|} {as|} {serious | Qualitative Concept} {cardiac | Body Part, Organ, or Organ Component} {renal | Body Part, Organ, or Organ Component} {and|} {hepatic | Body Location or Region} {disorders | Disease or Syndrome} {.|}
MMTx 2.4C Annotation: {Patients | Patient or Disabled Group} {with complications | Pathologic Function} {such as serious cardiac, renal | Idea or Concept} {and|} {hepatic disorders | Disease or Syndrome}
[Luo, CRI-10]
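The brace-and-pipe notation used in the annotations above can be read mechanically. The following is a minimal sketch, assuming each unit has the form {text | semantic type} with an empty type for unannotated tokens; the function name parse_annotation and the regular expression are illustrative and are not part of EliXR or MMTx.

```python
import re
from typing import List, Optional, Tuple

# One "{text | semantic type}" unit; the type part may be empty, as in "{and|}".
UNIT = re.compile(r"\{([^|{}]*)\|([^{}]*)\}")


def parse_annotation(line: str) -> List[Tuple[str, Optional[str]]]:
    """Split an annotated line into (text, semantic type or None) pairs."""
    pairs = []
    for text, sem_type in UNIT.findall(line):
        sem_type = sem_type.strip()
        pairs.append((text.strip(), sem_type or None))
    return pairs


# The MMTx annotation of the example sentence, copied from the slide.
mmtx = (
    "{Patients | Patient or Disabled Group} "
    "{with complications | Pathologic Function} "
    "{such as serious cardiac, renal | Idea or Concept} {and|} "
    "{hepatic disorders| Disease or Syndrome}"
)

for text, sem_type in parse_annotation(mmtx):
    print(f"{text!r:35} -> {sem_type}")
```

Parsed this way, the MMTx output has five units, one of which lumps "such as serious cardiac, renal" into a single Idea or Concept, whereas the EliXR annotation assigns a type to each content word.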
Dependency Parsing for “at least 1 week since discontinuation of prior pulmonary hypertension medication”
Frequent Semantic Patterns
Group                          Semantic patterns in each group
Disease Criteria               60
Lab Results Criteria           36
Cancer Criteria                28
Medication Criteria            23
Therapy or Surgery Criteria    23
Temporal Expression            9
Total Unique Patterns          175
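Aggregated Patterns for Disease Criteria
[Diagram: semantic network of aggregated patterns; only the labels survive.]
Nodes (UMLS semantic type abbreviations):
• CLAS | FTCN | QLCO | QNCO | CLNA
• DSYN | NEOP
• PATF | SOSY | FNDG
• TOPP | PHSU | CLDG
• BLOR | BPOC
• LBPR | DIAP
• TMCO
• VIRS
• PODG
• ORGA
Relationships: Causes, Manifestation Of, Occurs in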
Disease Criteria Template
[Diagram: template graph; only the labels survive.]
Nodes: Modifier, Disease, Manifestation, Treatment (Therapy or Drug), Body Location, Diagnostic Procedure, Temporal Constraints, Etiology, Population Group, History
Relationships: Causes, Manifestation of, Occurs in
Legend: 0:m, attribute, Class, Has-a
Micro-Templates for Temporal Expressions
[Diagram: template graph; only the labels survive.]
Nodes: Temporal Expression, temporal relationship, reference interval, temporal pattern, intrinsic temporal pattern, intrinsic duration, anchor event, event, cycle, frequency
Edge cardinalities: 1:m, 0:m
AQUA Parsing Accuracy
• Only 900 criteria sentences were used for training
• A human review served as the gold standard
Five test sets (100 criteria each)
Test set    Tree structure correctness
1           90.60%
2           94.30%
3           92.80%
4           93.00%
5           93.00%
Avg.        92.70%
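As a quick check on the table, the reported average can be recomputed from the five per-set figures. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Tree structure correctness (%) for the five test sets, copied from the slide.
accuracies = [90.60, 94.30, 92.80, 93.00, 93.00]

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean = {mean_accuracy:.2f}%")
```

The unweighted mean is 92.74%, which agrees with the reported 92.70% at one decimal place. Because every test set holds 100 criteria, the unweighted mean equals the pooled accuracy.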
Evaluation of Semantic Patterns

Semantic Type Patterns (min. support = 2)
  Frequent sub-trees:          total patterns 1825; total binary patterns 669; coverage 91.30%
  Maximal frequent sub-trees:  total patterns 175; coverage 81.30%
  Min. pattern cover:          total number 183; overlap with patterns 90

Semantic Group Patterns (min. support = 2)
  Frequent sub-trees:          total patterns 2378; total binary patterns 120; coverage 92.50%
  Maximal frequent sub-trees:  total patterns 39; coverage 90.60%
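To make "minimum support", "binary patterns" and "coverage" concrete, here is a minimal sketch that counts parent-child label pairs across a set of labeled trees. It is not the frequent tree mining algorithm used in EliXR. The tree encoding, the function names, the toy trees, and the definition of coverage as the share of trees containing at least one frequent pattern are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Tree = Tuple[str, list]     # (semantic type label, list of child trees)
Pattern = Tuple[str, str]   # (parent label, child label)


def binary_patterns(tree: Tree) -> Set[Pattern]:
    """Collect every distinct parent-child label pair in one tree."""
    label, children = tree
    found: Set[Pattern] = set()
    for child in children:
        found.add((label, child[0]))
        found |= binary_patterns(child)
    return found


def frequent_binary_patterns(trees: List[Tree], min_support: int = 2) -> Dict[Pattern, int]:
    """Keep patterns that occur in at least min_support trees."""
    support: Counter = Counter()
    for tree in trees:
        support.update(binary_patterns(tree))  # a set, so each tree counts once
    return {p: n for p, n in support.items() if n >= min_support}


def coverage(trees: List[Tree], patterns: Dict[Pattern, int]) -> float:
    """Fraction of trees containing at least one of the given patterns (assumed definition)."""
    wanted = set(patterns)
    covered = sum(1 for tree in trees if binary_patterns(tree) & wanted)
    return covered / len(trees)


# Toy trees invented for illustration; labels are UMLS semantic type abbreviations.
toy_trees: List[Tree] = [
    ("DSYN", [("QLCO", []), ("BPOC", [])]),
    ("DSYN", [("BPOC", []), ("TMCO", [])]),
    ("PHSU", [("TMCO", [])]),
    ("LBPR", [("QNCO", [])]),
]

frequent = frequent_binary_patterns(toy_trees, min_support=2)
print(frequent)
print(coverage(toy_trees, frequent))
```

On the toy data only the pair (DSYN, BPOC) reaches a support of 2, and it covers two of the four trees. A full sub-tree miner generalizes this idea from pairs to larger connected fragments.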
Contributions
1. Templates with rich semantics that can be mapped to UMLS and semantically aligned with text;
2. A method for segmenting UMLS for bootstrapping knowledge representation for eligibility criteria;
3. A method combining machine learning and dependency tree pattern mining for iterative, (semi-)automatic knowledge acquisition.
Acknowledgements
• NLM R01 LM009886 (04/01/09 - ) “Bridging the semantic gap between eligibility criteria and clinical data” (PI: Weng)
• Colleagues on the AQUA and EliXR team
  – Eneida Mendonça, MD, PhD
  – Robert Duffy, MS
  – Xiaoying Wu, PhD
• Feedback from Ida Sim, Samson Tu, James J. Cimino, Nigam Shah, GQ Zhang, Albert Lai
Resources for Sharing & Collaboration
1. A UMLS-based semantic lexicon for eligibility criteria
2. A semantic annotator
3. A dynamic semantic classifier for criteria sentences
4. A dependency parser with enriched semantic information
5. A temporal expression ontology for eligibility criteria
6. A tool for temporal expression extraction and encoding
References
1. Johnson SB. Conceptual graph grammar--a simple formalism for sublanguage. Methods of Information in Med. 1998 Nov;37(4-5):345-52.
2. Johnson SB. A semantic lexicon for medical language processing. J Am Med Inform Assoc. 1999;6:205-218.
3. Campbell DA, Johnson SB. A transformational-based learner for dependency grammars in discharge summaries. Proc. of ACL-02 workshop on BioNLP. 2002:37-44.
4. Zaki MJ. Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Trans. Knowl. Data Eng. 2005;17(8):1021-1035.
5. Luo Z, Duffy R, Johnson SB, Weng C. Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS. Proc of AMIA Summit on Clinical Research Informatics. 2010:26-31.
6. Luo Z, Johnson SB, Chase HS, Weng C. Semi-automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. Proc of AMIA Symp. 2010:487-91.
7. Weng C, Luo Z. Dynamic Categorization of Eligibility Criteria. Proc of AMIA Fall Symp. 2010:1306.