Zhe_2014JointSummits_v6
Categorizing the Relationships between Structurally Congruent Concepts from Pairs of Terminologies for Semantic Harmonization
Zhe He, PhD1,2, James Geller, PhD2, Gai Elhanan, MD3
1Department of Biomedical Informatics, Columbia University
2Department of Computer Science, New Jersey Institute of Technology
3Halfpenny Technologies, Inc
2014 AMIA Summit on Translational Bioinformatics, April 8, 2014
Disclosure
• Zhe He discloses that he has no relationships with commercial interests.
• James Geller discloses that he has no relationships with commercial interests.
• Gai Elhanan discloses that he has no relationships with commercial interests.
Learning Objective
• Use a structural method to find potential concepts for enriching the conceptual content of a biomedical terminology
Overview
• Motivation – Exploring a structural method for semantic harmonization
• Background – Importance of the conceptual content of SNOMED CT
• Methods – Structural matching of pairs of terminologies in the UMLS
• Results – Reusable knowledge can be derived by structural matching, including discovery of possible synonyms
• Limitations and Future Work
• Conclusions
Motivation
• Need for well-developed and well-maintained terminologies
• NLP tools that process clinical text need a terminology with fruitful concepts and synonyms.
• Complex clinical research texts require combined use of multiple terminologies (Weng et al. 2010)
• Terminologies need harmonization to achieve semantic interoperability (Bittner et al. 2005)
Semantic Harmonization Between Different Terminologies
• The BRIDG model as a user-centered harmonization framework (Weng et al. 2010)
• Harmonized existing time ontologies for annotating temporal relations in clinical narratives (Tao et al. 2011)
• Semantic harmonization efforts have recently been extended to various terminologies
– SNOMED CT and LOINC (AMIA 2013 Informatics Year in Review)
– SNOMED CT and ICD 11 (Rodrigues et al. 2013)
Importance of Conceptual Content of SNOMED CT
• SNOMED CT is going to be a major terminology for EHR encoding of diagnoses and problem lists by 2015
• SNOMED CT has many problems!
• Top two mentioned deficiencies of SNOMED CT (Elhanan 2011):
– Missing concepts – 23%
– Missing synonyms – 17%
• Users will expect SNOMED CT to have correct synonyms and sufficient concepts to be used in EHRs
Leveraging Common Structure of Pairs of Terminologies in the UMLS
• UMLS (Unified Medical Language System):
– More than 170 source terminologies and 8.9 million terms (2012AB release)
[Figure: one concept named differently across sources]
– Breast Cancer (NCIt)
– Carcinoma of breast (SNOMED)
– Carcinoma breast (MedDRA)
– Breast Carcinoma (UMLS)
– etc.
Structurally Congruent Concepts
[Figure: In Terminology 1, Concept B IS_A Concept X IS_A Concept A. In Terminology 2, Concept B IS_A Concept Y IS_A Concept A. Concept A and Concept B are the same concepts in both terminologies.]
• X does not occur in Terminology 2
• Y does not occur in Terminology 1
• Cycles were eliminated during the processing
Possible interpretations:
1) X and Y are alternative classifications
2) X can be a parent of Y
3) Y can be a parent of X
4) X and Y are synonymous
5) Structural errors in Terminology 1
6) Structural errors in Terminology 2
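The congruence pattern on this slide can be sketched in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each terminology is assumed to be a list of (child, parent) IS_A pairs keyed by concept identifier, the hierarchies are assumed to be acyclic already, only a single intermediate concept is considered, and the function names `children_map` and `congruent_pairs` are hypothetical.

```python
from collections import defaultdict


def children_map(is_a_edges):
    """Map each parent identifier to the set of its direct children."""
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    return children


def congruent_pairs(edges1, edges2):
    """Return sorted (A, X, Y, B) tuples such that
    B IS_A X IS_A A holds in terminology 1,
    B IS_A Y IS_A A holds in terminology 2,
    X does not occur in terminology 2 and Y does not occur in terminology 1."""
    kids1, kids2 = children_map(edges1), children_map(edges2)
    concepts1 = {c for edge in edges1 for c in edge}
    concepts2 = {c for edge in edges2 for c in edge}
    results = []
    # A must have children in both terminologies.
    for a in kids1.keys() & kids2.keys():
        for x in kids1[a] - concepts2:        # X is unique to terminology 1
            for y in kids2[a] - concepts1:    # Y is unique to terminology 2
                # B must be a shared child of X and Y.
                for b in kids1.get(x, set()) & kids2.get(y, set()):
                    results.append((a, x, y, b))
    return sorted(results)


# Toy data using the slide's abstract labels (illustrative only).
terminology1 = [("X", "A"), ("B", "X")]
terminology2 = [("Y", "A"), ("B", "Y")]
print(congruent_pairs(terminology1, terminology2))
```

Each returned tuple is only a candidate pair (X, Y); deciding which of the six interpretations applies still requires expert review.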
Alternative Classification
[Figure: Concept A and Concept B are the same concepts in both terminologies and are connected through IS_A links. Concept X is labeled "Classified by assumption of Terminology 1 Designer"; Concept Y is labeled "Classified by assumption of Terminology 2 Designer".]
Parent-Child Relationship
[Figure: two cases connected by IS_A links]
• X can be a parent of Y
• Y can be a parent of X
Synonymous
• X and Y are synonymous but have not been identified as such by the UMLS
[Figure: Concept X (Synonym: Y)]
META Terminologies in this Study
• Metathesaurus terminologies with the "PAR" relationship and the "inverse_isa" relationship attribute were chosen
• SCTUSX and UWDA were excluded
• Reference Terminologies (Terminology 1):
– MEDCIN (MEDCIN)
– National Cancer Institute Thesaurus (NCIt)
– Gene Ontology (GO)
– Medical Entities Dictionary (CPM)
– UMDNS: product category thesaurus (UMD)
– Foundational Model of Anatomy Ontology (FMA)
• Terminology 2: SNOMED CT
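Selecting hierarchical links of this kind from the Metathesaurus could look roughly like the following. This is a hedged sketch rather than the procedure used in the study: it assumes the pipe-delimited MRREL.RRF column order of the Rich Release Format, and the helper name `parent_edges`, the source abbreviations `SRC1` and `SRC2`, and all identifiers in the sample rows are made up for illustration. Reading CUI1 as the child and CUI2 as the parent is an assumption that should be verified against the UMLS documentation.

```python
# Column order of MRREL.RRF (each row also ends with a trailing "|").
MRREL_COLUMNS = [
    "CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
    "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF",
]


def parent_edges(lines, sab):
    """Collect (CUI1, CUI2) pairs from rows of source `sab` that have
    REL == "PAR" and RELA == "inverse_isa".
    Assumption: CUI1 is the child and CUI2 is its parent."""
    edges = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        row = dict(zip(MRREL_COLUMNS, fields))
        if (row.get("SAB") == sab
                and row.get("REL") == "PAR"
                and row.get("RELA") == "inverse_isa"):
            edges.add((row["CUI1"], row["CUI2"]))
    return edges


# Hypothetical rows; in practice `lines` would be an open MRREL.RRF file.
sample_rows = [
    "C0000002|A0000002|AUI|PAR|C0000001|A0000001|AUI|inverse_isa|R0000001||SRC1|SRC1|||N||",
    "C0000002|A0000002|AUI|RO|C0000003|A0000003|AUI||R0000002||SRC1|SRC1|||N||",
    "C0000004|A0000004|AUI|PAR|C0000001|A0000001|AUI|inverse_isa|R0000003||SRC2|SRC2|||N||",
]
src1_edges = parent_edges(sample_rows, "SRC1")
print(src1_edges)
```

Because an open file yields lines one at a time, the same function can stream a full MRREL.RRF without loading it into memory.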
Evaluation: Pairs of Congruent Concepts Found
Reference Terminology   Size of Terminology   # of Pairs of Congruent Concepts   Sample Size
MEDCIN                  279529                655                                70
NCI                     95523                 582                                70
FMA                     82062                 116                                70
UMD                     15956                 18                                 18
GO                      61925                 6                                  6
CPM                     3078                  7                                  7
Total                   --                    1384                               241
Review Results for Pairs of Congruent Concepts
Reference Terminology   Sample Size   Alternative Classification   Y → X   X → Y
MEDCIN                  70            44                           10      7
NCI                     70            38                           12      6
GO                      6             2                            --      4
CPM                     7             5                            --      --
UMD                     18            9                            1       --
FMA                     70            45                           13      4
Total                   241           143                          36      21
Percentage              100%          59.3%                        14.9%   8.7%
Parent-child relationships combined (Y → X and X → Y): 23.6%
Co-author GE reviewed the sample
Review Results for Pairs of Congruent Concepts (continued)
Reference Terminology   Sample Size   Error in Terminology 1   Error in Terminology 2   Synonym
MEDCIN                  70            --                       1                        8
NCI                     70            --                       3                        11
GO                      6             --                       --                       --
CPM                     7             --                       --                       2
UMD                     18            --                       --                       8
FMA                     70            2                        --                       6
Total                   241           2                        4                        35
Percentage              100%          0.8%                     1.7%                     14.5%
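The percentages in the two review tables can be rechecked from the category totals. The short Python sketch below is only an arithmetic check on the numbers reported on the slides; the dictionary keys and the function name `percentages` are illustrative choices, and the sample size of 241 is the sum of the per-terminology sample sizes.

```python
# Category totals as reported in the two review tables.
counts = {
    "Alternative classification": 143,
    "Y -> X": 36,
    "X -> Y": 21,
    "Error in Terminology 1": 2,
    "Error in Terminology 2": 4,
    "Synonym": 35,
}
sample_size = 70 + 70 + 6 + 7 + 18 + 70  # MEDCIN, NCI, GO, CPM, UMD, FMA


def percentages(category_counts, total):
    """Share of each category in percent, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in category_counts.items()}


shares = percentages(counts, sample_size)
for name, share in shares.items():
    print(f"{name}: {counts[name]} / {sample_size} = {share}%")

# Every reviewed pair falls into exactly one category.
print("Categories cover the sample:", sum(counts.values()) == sample_size)
```

The six categories add up to the full sample, so each reviewed pair was assigned exactly one interpretation.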
Example: Alternative Classification
[Figure: FMA3_1 vs. SNOMEDCT_2012_07_31]
• Shared parent: Structure of posterior intercostal vein, C0226639
• Intermediate concepts: Eleventh posterior intercostal vein, C0506471; Lower right posterior intercostal veins, C1283497
• Shared child: Eleventh right posterior intercostal vein, C0501203
Making Explicit an Implicit Assumption of the Two Original Terminology Designers
[Figure: FMA3_1 vs. SNOMEDCT_2012_07_31, with two added grouping concepts]
• Shared parent: Structure of posterior intercostal vein, C0226639
• Added grouping concepts: Posterior intercostal vein classified by position; Posterior intercostal vein classified by ordinality
• Intermediate concepts: Eleventh posterior intercostal vein, C0506471; Lower right posterior intercostal veins, C1283497
• Shared child: Eleventh right posterior intercostal vein, C0501203
Example: Parent of the Other
[Figure: FMA3_1 vs. SNOMEDCT_2012_07_31]
• Shared parent: Sign and Symptoms, C0037088
• Intermediate concepts: Finding by Site or System, C1333618; Finding by site, C1290906
• Shared child: Integumentary System Finding, C1291044
Suggestion: Concept Import
[Figure: a single hierarchy containing all four concepts]
• Sign and Symptoms, C0037088
• Finding by Site or System, C1333618
• Finding by site, C1290906
• Integumentary System Finding, C1291044
Example: Synonym of the Other
[Figure: CPM2003 vs. SNOMEDCT_2012_07_31]
• Shared parent: Chemicals, C0220806
• Intermediate concepts: Chemical Viewed Structurally, C1254350; Chemical categorized structurally, C0729761
• Shared child: Organic Chemicals, C0029224
Suggestion: Merge Two Synonymous Concepts
[Figure: merged hierarchy]
• Chemicals, C0220806
• Chemical categorized structurally, C0729761 (Synonym: Chemical Viewed Structurally)
• Organic Chemicals, C0029224
Limitations
• Harmonization cannot be done without the consent of terminology curators
• All of the terminologies are in the UMLS Rich Release Format
Future Work
• More complex configurations: more intermediate concepts
• Algorithm to identify the relationships between intermediate concepts in complex configurations
• Pairs of any two terminologies
Conclusions
• Six reference terminologies of the UMLS were compared with SNOMED CT
• In a sample of 241 pairs of congruent concepts:
– 143 out of 241 (59.3%) concept pairs: alternative classification
– 57 out of 241 (23.6%) concept pairs: parent-child relationships
– 35 out of 241 (14.5%) concept pairs: new synonyms
– Six concept pairs indicated errors (2 in Terminology 1, 4 in Terminology 2)
• Take-home message:
– A semi-automated method based on the common structure of the UMLS may complement existing human-expert-centered methods for finding potential concepts to import into or export from a terminology.
Acknowledgment
We would like to thank: Dr. Yehoshua Perl and Dr. Chunhua Weng for sharing their insights and giving feedback for this work
References (1)
Weng C, Tu SW, Sim I, Richesson R. Formal representation of eligibility criteria: a literature review. J Biomed Inform. 2010;43(3):451-67.
Bittner T, Donnelly M, S. W. Ontology and semantic interoperability. In: Prosperi D, Zlatanova S, editors. Large-scale 3D data integration: Problems and challenges: CRC Press (Taylor & Francis); 2005: 139-60.
Tao C, Solbrig HR, Chute CG. CNTRO 2.0: A Harmonized Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives. AMIA Summits Transl Sci Proc. 2011;2011:64-8.
Rodrigues JM, Schulz S, Rector A, Spackman KA, Üstün B, Chute CG, Della Mea V, Millar J, Persson KB. Sharing Ontology between ICD 11 and SNOMED CT will enable Seamless Re-use and Semantic Interoperability. Stud Health Technol Inform. 2013;192:343-6.
Elhanan G, Perl Y, Geller J. A survey of SNOMED CT direct users, 2010: impressions and preferences regarding content and quality. J Am Med Inform Assoc. 2011; Suppl 1:i33-44.
References (2)
Kumar A, Smith B, Novotny DD. Biomedical informatics and granularity. Comp Funct Genomics. 2004;5(6-7):501-8.
Weng C, Gennari JH, Fridsma DB. User-centered Semantic Harmonization: A Case Study. J Biomed Inform. 2007;40(3):353-64.
Schulz S, Boeker M, Stenzhorn H. How Granularity Issues Concern Biomedical Ontology Integration. In Proceedings of the International Congress of the European Federation for Medical Informatics (MIE 2008). Gothenburg, Sweden; 2008: 863-68.
Rector A, Rogers J, Bittner T. Granularity, scale and collectivity: when size does and does not matter. J Biomed Inform. 2006 Jun;39(3):333-49.
Mougin F, Bodenreider O. Approaches to eliminating cycles in the UMLS Metathesaurus: naive vs. formal. AMIA Annu Symp Proc. 2005:550-4.