Zhe_2014JointSummits_v6

Categorizing the Relationships between Structurally Congruent Concepts from Pairs of Terminologies for Semantic Harmonization
Zhe He, PhD 1,2; James Geller, PhD 2; Gai Elhanan, MD 3
1 Department of Biomedical Informatics, Columbia University
2 Department of Computer Science, New Jersey Institute of Technology
3 Halfpenny Technologies, Inc
2014 AMIA Summit on Translational Bioinformatics, April 8, 2014

Transcript of Zhe_2014JointSummits_v6


Disclosure

• Zhe He discloses that he has no relationships with commercial interests.
• James Geller discloses that he has no relationships with commercial interests.
• Gai Elhanan discloses that he has no relationships with commercial interests.

Learning Objective

• Use a structural method to find potential concepts for enriching the conceptual content of a biomedical terminology

Overview

• Motivation – Exploring a structural method for semantic harmonization
• Background – Importance of the conceptual content of SNOMED CT
• Methods – Structural matching of pairs of terminologies in the UMLS
• Results – Reusable knowledge can be derived by structural matching, including discovery of possible synonyms
• Limitations and Future Work
• Conclusions

Motivation

• Need for well-developed and well-maintained terminologies
• NLP tools that process clinical text need a terminology with a rich set of concepts and synonyms.
• Complex clinical research texts require combined use of multiple terminologies (Weng et al. 2010)
• Terminologies need harmonization to achieve semantic interoperability (Bittner et al. 2005)

Semantic Harmonization Between Different Terminologies

• The BRIDG model as a user-centered harmonization framework (Weng et al. 2010)
• Harmonized existing time ontologies for annotating temporal relations in clinical narratives (Tao et al. 2011)
• Semantic harmonization efforts have recently been extended to various terminologies
  – SNOMED CT and LOINC (AMIA 2013 Informatics Year in Review)
  – SNOMED CT and ICD 11 (Rodrigues et al. 2013)

Importance of Conceptual Content of SNOMED CT

• SNOMED CT is going to be a major terminology for EHR encoding of diagnoses and problem lists by 2015
• SNOMED CT has many problems!
• Top two mentioned deficiencies of SNOMED CT (Elhanan 2011):
  – Missing concepts – 23%
  – Missing synonyms – 17%
• Users will expect SNOMED CT to have correct synonyms and sufficient concepts to be used in EHRs

Leveraging Common Structure of Pairs of Terminologies in the UMLS

• UMLS (Unified Medical Language System):
  – More than 170 source terminologies, 8.9 million terms, 2012AB release

[Figure: terms from different source terminologies, Breast Cancer (NCIt), Carcinoma of breast (SNOMED), Carcinoma breast (MedDRA), etc., shown together with the UMLS concept Breast Carcinoma.]

Structurally Congruent Concepts

[Figure: two hierarchies connected by IS_A links. Terminology 1 contains Concept A, Concept X, and Concept B; Terminology 2 contains Concept A, Concept Y, and Concept B. Concept A and Concept B are each marked "same concept" across the two terminologies. X does not occur in Terminology 2; Y does not occur in Terminology 1.]

Possible relationships between X and Y:

1) X and Y are alternative classifications
2) X can be a parent of Y
3) Y can be a parent of X
4) X and Y are synonymous
5) Structural errors in Terminology 1
6) Structural errors in Terminology 2

Cycles were eliminated during the processing.
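The configuration on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each terminology is given as a child-to-parents mapping, that shared concepts carry the same identifier in both terminologies (in the study, the UMLS CUI), and that X sits exactly one IS_A step between B and A. The function names are hypothetical.

```python
from itertools import product


def all_concepts(parents):
    """Every concept mentioned in a child -> set-of-parents mapping."""
    found = set(parents)
    for parent_set in parents.values():
        found.update(parent_set)
    return found


def two_step_paths(parents):
    """Map (top, bottom) -> intermediates M with bottom IS_A M IS_A top."""
    paths = {}
    for bottom, mids in parents.items():
        for mid in mids:
            for top in parents.get(mid, ()):
                paths.setdefault((top, bottom), set()).add(mid)
    return paths


def congruent_pairs(parents1, parents2):
    """Return (A, X, Y, B) tuples: A and B shared, X only in 1, Y only in 2."""
    concepts1, concepts2 = all_concepts(parents1), all_concepts(parents2)
    paths1, paths2 = two_step_paths(parents1), two_step_paths(parents2)
    result = []
    for top, bottom in sorted(paths1.keys() & paths2.keys()):
        xs = sorted(paths1[(top, bottom)] - concepts2)  # X absent from Terminology 2
        ys = sorted(paths2[(top, bottom)] - concepts1)  # Y absent from Terminology 1
        result.extend((top, x, y, bottom) for x, y in product(xs, ys))
    return result


# Toy hierarchies mirroring the slide's labels (not real UMLS data).
term1 = {"B": {"X"}, "X": {"A"}}
term2 = {"B": {"Y"}, "Y": {"A"}}
pairs = congruent_pairs(term1, term2)
print(pairs)
```

Each returned tuple is only a candidate; deciding which of the six relationships holds between X and Y still requires expert review.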

Alternative Classification

[Figure: the two hierarchies side by side, connected by IS_A links. Concept A and Concept B are each marked "same concept" across the two terminologies. Concept X is labeled "Classified by assumption of Terminology 1 Designer"; Concept Y is labeled "Classified by assumption of Terminology 2 Designer".]

Parent-Child Relationship

• X can be a parent of Y
• Y can be a parent of X

[Figure: two diagrams, each linking Concept X and Concept Y by an IS_A relationship, one for each direction.]

Synonymous

• X and Y are synonymous but have not been identified by the UMLS

[Figure: a single node, Concept X (Synonym: Y).]

META Terminologies in this Study

• Metathesaurus terminologies with the "PAR" relationship and the "inverse_isa" relationship attribute were chosen
• SCTUSX and UWDA were excluded
• Reference Terminologies (Terminology 1):
  – MEDCIN (MEDCIN)
  – National Cancer Institute Thesaurus (NCIt)
  – Gene Ontology (GO)
  – Medical Entities Dictionary (CPM)
  – UMDNS: product category thesaurus (UMD)
  – Foundational Model of Anatomy Ontology (FMA)
• Terminology 2: SNOMED CT
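The selection criterion above can be sketched against the pipe-delimited MRREL.RRF file of the Metathesaurus, where REL, RELA, and SAB are the 4th, 8th, and 11th fields. The sketch below is an assumption about how such a filter might look, not the authors' code; the function name and the toy rows (including their CUIs and AUIs) are made up for illustration.

```python
# 0-based field positions in an MRREL.RRF row.
REL, RELA, SAB = 3, 7, 10


def sources_with_isa_hierarchy(mrrel_lines, excluded=("SCTUSX", "UWDA")):
    """Collect source abbreviations having PAR rows with RELA inverse_isa."""
    sources = set()
    for line in mrrel_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SAB:
            continue  # skip malformed or truncated rows
        if fields[REL] == "PAR" and fields[RELA] == "inverse_isa":
            sources.add(fields[SAB])
    return sources - set(excluded)


# Toy rows in MRREL layout; identifiers are invented, not real UMLS content.
toy_rows = [
    "C0000001|A0000001|AUI|PAR|C0000002|A0000002|AUI|inverse_isa|R1||NCI|NCI|||N||",
    "C0000003|A0000003|AUI|PAR|C0000004|A0000004|AUI|inverse_isa|R2||UWDA|UWDA|||N||",
    "C0000005|A0000005|AUI|PAR|C0000006|A0000006|AUI||R3||MSH|MSH|||N||",
]
selected = sources_with_isa_hierarchy(toy_rows)
print(selected)
```

In practice the lines would come from iterating over an open MRREL.RRF file rather than a list.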

Evaluation: Pairs of Congruent Concepts Found

Reference Terminology   Size of Terminology   # of Pairs of Congruent Concepts   Sample Size
MEDCIN                  279529                655                                70
NCI                     95523                 582                                70
FMA                     82062                 116                                70
UMD                     15956                 18                                 18
GO                      61925                 6                                  6
CPM                     3078                  7                                  7
Total                   --                    1384                               241
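The sample sizes in the table are consistent with reviewing at most 70 pairs per reference terminology and all pairs when fewer were found. The sketch below reproduces those sizes under that reading; the cap of 70 is inferred from the table, and simple random sampling with a fixed seed is an assumption, since the slides do not state how the samples were drawn.

```python
import random

# Number of congruent pairs found per reference terminology (from the table).
pairs_found = {"MEDCIN": 655, "NCI": 582, "FMA": 116, "UMD": 18, "GO": 6, "CPM": 7}


def sample_pairs(pairs, cap=70, seed=0):
    """Return all pairs if there are at most `cap`, else a random subset of size `cap`."""
    pairs = list(pairs)
    if len(pairs) <= cap:
        return pairs
    return random.Random(seed).sample(pairs, cap)


# Stand-in pair identifiers 0..n-1; real pairs would be (X, Y) concept tuples.
sample_sizes = {
    name: len(sample_pairs(range(count))) for name, count in pairs_found.items()
}
print(sample_sizes, sum(sample_sizes.values()))
```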

Review Results for Pairs of Congruent Concepts

Reference Terminology   Sample Size   Alternative Classification   Y → X    X → Y
MEDCIN                  70            44                           10       7
NCI                     70            38                           12       6
GO                      6             2                            --       4
CPM                     7             5                            --       --
UMD                     18            9                            1        --
FMA                     70            45                           13       4
Total                   241           143                          36       21
Percentage              100%          59.3%                        14.9%    8.7%

Y → X and X → Y combined: 23.6%

Co-author GE reviewed the sample.

Review Results for Pairs of Congruent Concepts (continued)

Reference Terminology   Sample Size   Error in Terminology 1   Error in Terminology 2   Synonym
MEDCIN                  70            --                       1                        8
NCI                     70            --                       3                        11
GO                      6             --                       --                       --
CPM                     7             --                       --                       2
UMD                     18            --                       --                       8
FMA                     70            2                        --                       6
Total                   241           2                        4                        35
Percentage              100%          0.8%                     1.7%                     14.5%
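As a quick consistency check, the category totals from the two review tables can be tallied to confirm that they account for all 241 sampled pairs and to recompute the percentages. The counts below are copied from the tables; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Totals per category, taken from the two review-result tables.
counts = {
    "alternative_classification": 143,
    "y_to_x": 36,
    "x_to_y": 21,
    "error_in_terminology_1": 2,
    "error_in_terminology_2": 4,
    "synonym": 35,
}

total = sum(counts.values())


def pct(n):
    """Share of the reviewed sample, in percent, rounded to one decimal."""
    return round(100 * n / total, 1)


parent_child = counts["y_to_x"] + counts["x_to_y"]
errors = counts["error_in_terminology_1"] + counts["error_in_terminology_2"]

for name, n in counts.items():
    print(f"{name}: {n} ({pct(n)}%)")
print(f"total reviewed: {total}; parent-child: {parent_child}; errors: {errors}")
```

The combined parent-child count is 57 pairs; the 23.6% shown on the slide is the sum of the two rounded column percentages (14.9% + 8.7%).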

Example: Alternative Classification

[Figure: two hierarchies side by side, labeled FMA3_1 and SNOMEDCT_2012_07_31. Both contain Structure of posterior intercostal vein (C0226639) and Eleventh right posterior intercostal vein (C0501203). The intermediate concepts are Eleventh posterior intercostal vein (C0506471) and Lower right posterior intercostal veins (C1283497).]

Making Explicit an Implicit Assumption of the Two Original Terminology Designers

[Figure: the same two hierarchies, labeled SNOMEDCT_2012_07_31 and FMA3_1, with the classification criterion added as an explicit node. Labels shown: Structure of posterior intercostal vein (C0226639); Posterior intercostal vein classified by position; Posterior intercostal vein classified by ordinality; Eleventh posterior intercostal vein (C0506471); Lower right posterior intercostal veins (C1283497); Eleventh right posterior intercostal vein (C0501203).]

Example: Parent of the Other

[Figure: two hierarchies side by side, labeled FMA3_1 and SNOMEDCT_2012_07_31. Both contain Sign and Symptoms (C0037088) and Integumentary System Finding (C1291044). The intermediate concepts are Finding by Site or System (C1333618) and Finding by site (C1290906).]

Suggestion: Concept Import

[Figure: proposed hierarchy after import, containing Sign and Symptoms (C0037088), Finding by Site or System (C1333618), Finding by site (C1290906), and Integumentary System Finding (C1291044).]

Example: Synonym of the Other

[Figure: two hierarchies side by side, labeled CPM2003 and SNOMEDCT_2012_07_31. Both contain Chemicals (C0220806) and Organic Chemicals (C0029224). The intermediate concepts are Chemical Viewed Structurally (C1254350) and Chemical categorized structurally (C0729761).]

Suggestion: Merge Two Synonymous Concepts

[Figure: proposed merged hierarchy containing Chemicals (C0220806); Chemical categorized structurally (C0729761), with synonym Chemical Viewed Structurally; and Organic Chemicals (C0029224).]

Limitations

• Harmonization cannot be done without the consent of terminology curators
• All the terminologies are in UMLS Rich Release Format

Future Work

• More complex configurations: more intermediate concepts
• Algorithm to identify the relationships between intermediate concepts in complex configurations
• Pairs of any two terminologies

Conclusions

• Six reference terminologies of the UMLS vs. SNOMED CT
• In a sample of 241 congruent pairs:
  – 143 out of 241 (59.3%) concept pairs: alternative classification.
  – 57 out of 241 (23.6%) concept pairs: parent-child relationships.
  – 35 (14.5%) new synonyms
  – Six pairs of concepts indicated errors
• Take-home message:
  – A semi-automated method based on the common structure of the UMLS may complement existing human-expert-centered methods for finding potential concepts to import into or export from a terminology.

Acknowledgment

We would like to thank Dr. Yehoshua Perl and Dr. Chunhua Weng for sharing their insights and giving feedback on this work.

References (1)

Weng C, Tu SW, Sim I, Richesson R. Formal representation of eligibility criteria: a literature review. J Biomed Inform. 2010;43(3):451-67.

Bittner T, Donnelly M, S. W. Ontology and semantic interoperability. In: Prosperi D, Zlatanova S, editors. Large-scale 3D data integration: Problems and challenges. CRC Press (Taylor & Francis); 2005:139-60.

Tao C, Solbrig HR, Chute CG. CNTRO 2.0: A Harmonized Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives. AMIA Summits Transl Sci Proc. 2011;2011:64-8.

Rodrigues JM, Schulz S, Rector A, Spackman KA, Üstün B, Chute CG, Della Mea V, Millar J, Persson KB. Sharing Ontology between ICD 11 and SNOMED CT will enable Seamless Re-use and Semantic Interoperability. Stud Health Technol Inform. 2013;192:343-6.

Elhanan G, Perl Y, Geller J. A survey of SNOMED CT direct users, 2010: impressions and preferences regarding content and quality. J Am Med Inform Assoc. 2011;Suppl 1:i33-44.

References (2)

Kumar A, Smith B, Novotny DD. Biomedical informatics and granularity. Comp Funct Genomics. 2004;5(6-7):501-8.

Weng C, Gennari JH, Fridsma DB. User-centered Semantic Harmonization: A Case Study. J Biomed Inform. 2007;40(3):353-64.

Schulz S, Boeker M, Stenzhorn H. How Granularity Issues Concern Biomedical Ontology Integration. In: Proceedings of the International Congress of the European Federation for Medical Informatics (MIE 2008). Gothenburg, Sweden; 2008:863-68.

Rector A, Rogers J, Bittner T. Granularity, scale and collectivity: when size does and does not matter. J Biomed Inform. 2006 Jun;39(3):333-49.

Mougin F, Bodenreider O. Approaches to eliminating cycles in the UMLS Metathesaurus: naive vs. formal. AMIA Annu Symp Proc. 2005:550-4.

Thanks!

Zhe He, PhD [email protected]