Using UMLS CUIs for WSD in the Biomedical Domain
Bridget T. McInnes¹, Ted Pedersen² and John Carlis¹
University of Minnesota Twin Cities¹ and University of Minnesota Duluth²
09/11/07
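What is WSD?
Word sense disambiguation (WSD) is the task of choosing, from a sense inventory, the sense an ambiguous word has in a given context.
Example: The culture count doubled.
Sense inventory for "culture":
Laboratory Culture
Anthropological Culture
Here "culture" refers to a laboratory culture.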
Sense Inventory: UMLS
The Unified Medical Language System (UMLS) contains a list of Concept Unique Identifiers (CUIs), each identifying a concept (sense) that a word or term can refer to.
Senses of "culture":
Laboratory Culture (C0430400)
Anthropological Culture (C0010453)
UMLS: Semantic Network
A framework of semantic types and the semantic relations between them.
Anthropological Culture (C0010453): semantic type Idea or Concept
Laboratory Culture (C0430400): semantic type Laboratory Procedure
These semantic types are connected to the semantic type Mental Process through semantic relations (assesses_effect_of, result_of).
MetaMap
A concept mapping system that maps text to concepts in the UMLS.
It provides a wealth of information for all words in a document:
- phrasal information
- part of speech (POS) of a word
- CUI of a word
- semantic types of a word
Example
The culture count doubled
count: CUI Count (C0750480); semantic type Idea or Concept (idcn); POS noun
doubled: CUI Duplicate (C0205173); semantic type Functional Concept (ftcn); POS verb
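A small sketch of why CUIs and semantic types behave differently as features: distinct concepts can share a semantic type, so semantic-type features may merge what CUI features keep apart. The CUIs and semantic types below are the ones on these slides; the `Concept` record and both helper functions are hypothetical and do not reflect MetaMap's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical record for one MetaMap mapping (not MetaMap's real output format)."""
    cui: str
    name: str
    semantic_type: str


# CUIs and semantic types as shown on the slides.
COUNT = Concept("C0750480", "Count", "Idea or Concept")
DUPLICATE = Concept("C0205173", "Duplicate", "Functional Concept")
ANTHROPOLOGICAL_CULTURE = Concept("C0010453", "Anthropological Culture", "Idea or Concept")


def cui_features(concepts):
    """One feature per distinct CUI in the context."""
    return {c.cui for c in concepts}


def semantic_type_features(concepts):
    """One feature per distinct semantic type; distinct CUIs may collapse into one feature."""
    return {c.semantic_type for c in concepts}


# Count and Anthropological Culture share a semantic type but not a CUI.
print(semantic_type_features([COUNT]) == semantic_type_features([ANTHROPOLOGICAL_CULTURE]))  # True
print(cui_features([COUNT]) == cui_features([ANTHROPOLOGICAL_CULTURE]))  # False
```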
Supervised Approaches
Leroy and Rindflesch 2005: semantic types, semantic relations, part of speech, and head information (from MetaMap)
Joshi, Pedersen and Maclin 2005: unigrams in the same sentence as the ambiguous word, and unigrams in the same abstract as the ambiguous word
Liu, Teller and Friedman 2004: unigrams, direction and orientation of unigrams, and collocations
Questions
Would UMLS CUIs be an improvement over semantic types?
Would CUIs, a biomedical-specific feature, be an improvement over unigrams, a more general feature?
Would increasing the context window in which surrounding CUIs are found improve the results?
Our supervised approach
Algorithm:
Naïve Bayes from the WEKA data mining package, evaluated with 10-fold cross-validation
Features:
UMLS CUIs obtained from MetaMap that occur more than one time:
- in the same sentence as the ambiguous word (s-1-cui)
- in the same abstract as the ambiguous word (a-1-cui)
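A minimal sketch of the two feature sets, assuming each abstract has already been run through MetaMap and is available as a list of sentences, each a list of CUIs. The function name `extract_cui_features`, that input layout, and the `min_count` parameter are hypothetical; `min_count=1` encodes the "more than one time" cutoff. The slides do not say whether the ambiguous word's own CUI is excluded, so this sketch keeps it.

```python
from collections import Counter


def extract_cui_features(sentences, target_index, window="sentence", min_count=1):
    """Return {CUI: count} for CUIs occurring more than `min_count` times in the window.

    sentences:    one abstract as a list of sentences, each a list of CUI strings
                  (hypothetical input layout, e.g. derived from MetaMap output)
    target_index: index of the sentence containing the ambiguous word
    window:       "sentence" (s-1-cui style) or "abstract" (a-1-cui style)
    """
    if window == "sentence":
        cuis = sentences[target_index]
    elif window == "abstract":
        cuis = [cui for sentence in sentences for cui in sentence]
    else:
        raise ValueError(f"unknown window: {window!r}")
    counts = Counter(cuis)
    # Keep only CUIs above the frequency cutoff (the target's own CUI is not dropped).
    return {cui: n for cui, n in counts.items() if n > min_count}


toy_abstract = [
    ["C0750480", "C0750480", "C0205173"],              # toy CUIs for sentence 0
    ["C0007634", "C0205173", "C1517001", "C1521828"],  # toy CUIs for sentence 1
]
print(extract_cui_features(toy_abstract, 0, "sentence"))  # {'C0750480': 2}
print(extract_cui_features(toy_abstract, 0, "abstract"))  # {'C0750480': 2, 'C0205173': 2}
```

The toy CUI lists are invented for illustration; the abstract window sees more occurrences of each CUI, so more CUIs pass the same cutoff.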
Example
... The culture count doubled. The cells multiplied by twice the expected rate ...
Sentence: C0750480 Count (2), C0205173 Duplicate (1), ...
Abstract: C0750480 Count (2), C0205173 Duplicate (3), C0007634 Cells (4), C1517001 Expected (1), C1521828 Rate (3), ...
Example Instances
→ Extract Relevant CUIs
→ Training Data and Test Data
→ Naïve Bayes Algorithm
→ Sense-Tagged Test Data
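The approach uses WEKA's Naïve Bayes (a Java package). As a language-neutral stand-in, the sketch below trains a Bernoulli-style Naïve Bayes over sets of CUI features with add-one smoothing and scores it with k-fold cross-validation. It is not WEKA's implementation: WEKA stratifies its folds and this sketch does not. All names (`train_nb`, `predict_nb`, `cross_validate`) are hypothetical, and each instance is assumed to be a (set of CUIs, sense) pair.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(instances):
    """Fit a Bernoulli Naive Bayes model on (feature_set, sense) pairs, add-one smoothed."""
    sense_counts = Counter(sense for _, sense in instances)
    present = defaultdict(Counter)  # sense -> feature -> number of instances containing it
    vocabulary = set()
    for features, sense in instances:
        present[sense].update(features)
        vocabulary.update(features)
    model = {}
    for sense, n in sense_counts.items():
        log_prior = math.log(n / len(instances))
        # P(feature present | sense), smoothed so it is never exactly 0 or 1.
        probs = {f: (present[sense][f] + 1) / (n + 2) for f in vocabulary}
        model[sense] = (log_prior, probs)
    return model


def predict_nb(model, features):
    """Return the sense with the highest log posterior; features unseen in training are ignored."""
    def log_posterior(sense):
        log_prior, probs = model[sense]
        return log_prior + sum(
            math.log(p) if f in features else math.log(1 - p) for f, p in probs.items()
        )
    return max(model, key=log_posterior)


def cross_validate(instances, k=10, seed=0):
    """Mean accuracy over k folds (shuffled once, unstratified round-robin split)."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        if not test:
            continue  # fewer instances than folds
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_nb(train)
        correct = sum(predict_nb(model, feats) == sense for feats, sense in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy instances with hypothetical feature names, not NLM-WSD data.
toy = [({"a", "b"}, "lab")] * 5 + [({"c"}, "anthro")] * 5
print(cross_validate(toy, k=5))  # 1.0
```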
Dataset
National Library of Medicine's Word Sense Disambiguation (NLM-WSD) Dataset
50 words from the 1998 MEDLINE abstracts
100 instances for each of the 50 words
Each instance has been tagged by MetaMap
The target word was manually assigned a UMLS concept or None
Average number of concepts per ambiguous word is 2.26 (not including None)
Data subsets
Liu subset (Liu, Teller and Friedman 2004): 22 out of the 50 words in NLM-WSD
Leroy subset (Leroy and Rindflesch 2005): 15 out of the 50 words in NLM-WSD
Joshi subset (Joshi, Pedersen and Maclin 2005): 28 out of the 50 words in NLM-WSD (the union of the Leroy and Liu subsets)
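Since the Joshi subset is the union of the other two, the subset sizes also fix how many words the Leroy and Liu subsets share, by inclusion-exclusion:

```latex
|\text{Leroy} \cup \text{Liu}| = |\text{Leroy}| + |\text{Liu}| - |\text{Leroy} \cap \text{Liu}|
\;\Rightarrow\; 28 = 15 + 22 - |\text{Leroy} \cap \text{Liu}|
\;\Rightarrow\; |\text{Leroy} \cap \text{Liu}| = 9
```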
Results
Results for Question 1
Would CUIs be an improvement over semantic types?
Comparative results with Leroy and Rindflesch 2005
Chart: accuracy using the Leroy subset
s-1-cui 71%, a-1-cui 74.5%, s-0-Leroy 65.6%
Significance of Differences
Pairwise t-test
s-1-cui (71%) and s-0-Leroy (65.6%): p <= 0.001
a-1-cui (74.5%) and s-0-Leroy (65.6%): p <= 0.00005
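The slides do not say what the t-test pairs. Assuming it pairs per-word accuracies of two systems on the same words, the paired t statistic is the mean per-word difference divided by its standard error, with n - 1 degrees of freedom. The sketch below computes only the statistic; the p-value comes from the t distribution (SciPy's `scipy.stats.ttest_rel`, for example, returns both). The function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two equal-length score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("the differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy per-word scores, not the results reported above.
t, df = paired_t([3, 5, 4, 6], [1, 2, 3, 2])
print(round(t, 4), df)  # 3.873 3
```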
Results for Question 2
Would CUIs, a biomedical-specific feature, be an improvement over unigrams, a more general feature?
Comparative results with Joshi, Pedersen and Maclin 2005
Chart: accuracy using the Joshi subset
s-1-cui 77.7%, a-1-cui 80%, s-4-Joshi 79.3%, a-4-Joshi 82.5%
Significance of Results
Pairwise t-test
s-1-cui (77.7%) and s-4-Joshi (79.3%): p < 0.135
a-1-cui (80.0%) and a-4-Joshi (82.5%): p < 0.003
Results for Question 3
Would increasing the size of the context window in which surrounding CUIs are found improve the results, as Joshi, Pedersen and Maclin found with unigrams?
Comparative results between size of context window
Chart: accuracy using the NLM-WSD dataset
s-1-cui 83.3%, a-1-cui 85.6%
Significance of Results
Pairwise t-test
s-1-cui (83.3%) and a-1-cui (85.6%): p < 0.0006
Comparative results with Liu, Teller and Friedman 2004
Chart: accuracy using the Liu subset
a-1-cui 81.9%, s-0-Liu 85.5%
Significance of Results
Pairwise t-test
a-1-cui (81.9%) and s-1-Liu (85.5%): p < 0.001
Conclusions
CUIs result in more accurate disambiguation than semantic types and are comparable to unigrams
Using the whole abstract rather than only the sentence as context improves the results
MetaMap generates useful information that can be used as features for supervised disambiguation
Future Work
Combination approach
Exploring additional UMLS features
Unsupervised approach using information from the UMLS
Software and Data
CuiTools version 0.05: http://cuitools.sourceforge.net
NLM-WSD Dataset: http://wsd.nlm.nih.gov
Pairwise t-test: http://www.quantitativeskills.com/sisa/statistics/