A Word Sense Disambiguation Technique for Sinhala
Janindu Arukgoda, Vidudaya Bandara, Samiththa Bashani, Vijayindu Gamage, Daya Wimalasuriya
Department of Computer Science & Engineering, University of Moratuwa
Sri Lanka
Overview
Problem Statement
Uses of Word Sense Disambiguation for Sinhala
Existing Solutions for Other Languages
Attempts on Sinhala
Our Approach
Sinhala WordNet
System Architecture
Evaluation
Future work
Q & A
Problem Statement
All natural languages have words that take different senses in different contexts (polysemy)
Identifying the intended sense of a polysemous word in a given context is called Word Sense Disambiguation (WSD)
WSD is an open problem in NLP
Sinhala has many polysemous words
However, Sinhala does not yet have a word sense disambiguation tool
Examples
දත් දොස්තර මුව පරීක්ෂා කළේය. (The dentist checked the mouth)
මුව රංචුව කැලය තුලට වැදුණි. (The herd of deer disappeared into the woods)
ඇයගේ මුව සඳක් වැනිය. (Her face looks like the moon)
Uses of Word Sense Disambiguation for Sinhala
Accurate translation from Sinhala to other languages
Better web search methods for finding Sinhala resources
Text summarization
Better information retrieval systems for Sinhala
Word processing and spell checking
Content analysis
Information extraction
Text-to-speech conversion
Existing Solutions for Other Languages
A graph based approach for Hindi language
A rule based approach for Hindi language
Machine learning approaches for Hindi language
Semantic relatedness based approaches for German language
A WordNet based approach for German language
Rule based approaches for English language
Machine learning approaches for English language
Attempts on Sinhala
Word sense discrimination vs. disambiguation
No large enough corpus
High complexity
No morphological processor
Our Approach
Based on Sinhala WordNet
Important properties of human languages
o One sense per collocation
o One sense per discourse
Lesk Algorithm
Simplified Lesk Algorithm
Lesk Algorithm
Get the glosses of the senses of the target word from the WordNet
Compare the gloss of each sense of the target word with the glosses of every other word in the given window of words
Keep a count of the overlapping words in each sense pair
The most appropriate sense will be the one with highest count of overlaps
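The steps of the Lesk algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the English sense inventory `GLOSSES`, the `STOPWORDS` set, and the function names are hypothetical stand-ins, and a real system would read glosses from the Sinhala WordNet instead.

```python
# Hypothetical toy sense inventory: word -> {sense id -> gloss}.
GLOSSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    },
    "deposit": {
        "deposit#money": "money placed in an institution for safekeeping",
        "deposit#sediment": "matter laid down by water or ice",
    },
    "loan": {
        "loan#money": "money lent by an institution at interest",
    },
}

# Assumed stop-word list so that function words do not count as overlaps.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "of", "on",
             "or", "such", "that", "the", "to"}


def content_words(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def lesk(target, window, glosses):
    """Score each sense of `target` by gloss overlap with every sense
    of every other word in `window`; return (best sense, all scores)."""
    scores = {}
    for sense, gloss in glosses[target].items():
        target_words = content_words(gloss)
        total = 0
        for other in window:
            if other == target:
                continue
            # Words missing from the inventory contribute no overlap.
            for other_gloss in glosses.get(other, {}).values():
                total += len(target_words & content_words(other_gloss))
        scores[sense] = total
    # Ties are broken by dictionary order here; this is an arbitrary choice.
    best = max(scores, key=scores.get)
    return best, scores


best, scores = lesk("bank", ["deposit", "loan"], GLOSSES)
print(best, scores)
```

With this toy inventory the financial sense of "bank" wins because its gloss shares "institution" and "money" with the glosses of both context words.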
Simplified Lesk Algorithm
Get the glosses of the senses of the target word
Compare the gloss of each sense of the target word with a selected window of words (the n words around the target word in the given context, typically n/2 on the left and n/2 on the right, or chosen as appropriate)
Keep a count of the overlapping words between each sense gloss and the window
The most appropriate sense will be the one with highest count of overlaps
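The Simplified Lesk steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' system: the English glosses in `BANK_GLOSSES`, the `STOPWORDS` set, the function names, and the default window size are all hypothetical, and a real system would take the glosses from the Sinhala WordNet.

```python
# Hypothetical glosses for the senses of one target word.
BANK_GLOSSES = {
    "bank#finance": "an institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water such as a river",
}

# Assumed stop-word list so that function words do not count as overlaps.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "of", "on",
             "or", "such", "that", "the", "to"}


def content_words(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(tokens, index, glosses, n=6):
    """Compare each sense gloss of tokens[index] with a window of up to
    n/2 words on each side; return (best sense, overlap counts)."""
    half = n // 2
    left = tokens[max(0, index - half):index]
    right = tokens[index + 1:index + 1 + half]
    context = {w.lower() for w in left + right} - STOPWORDS
    scores = {sense: len(context & content_words(gloss))
              for sense, gloss in glosses.items()}
    # Ties are broken by dictionary order here; this is an arbitrary choice.
    best = max(scores, key=scores.get)
    return best, scores


tokens = "she sat on the bank of the river".split()
best, scores = simplified_lesk(tokens, tokens.index("bank"), BANK_GLOSSES)
print(best, scores)
```

Unlike the original algorithm, only the target word's glosses are looked up, so the cost grows with the number of senses of one word rather than with all sense pairs in the window.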
Sinhala WordNet
A lexical semantic network
Modeled after the Princeton WordNet for English
Crowdsourced
Evaluation Criteria
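$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

$$\text{precision} = \frac{\text{No. of correct answers returned}}{\text{Total no. of answers returned}}$$

$$\text{recall} = \frac{\text{No. of correct answers returned}}{\text{Total no. of test cases}}$$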
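As a worked illustration of the evaluation criteria, the sketch below computes precision, recall, and F1 from a set of returned answers and a gold standard. The function name `evaluate`, the dictionary layout, and the sample data are assumptions made for this example and are not results from the system.

```python
def evaluate(predictions, gold):
    """Return (precision, recall, f1).

    predictions: test case id -> sense returned by the system
                 (cases with no returned answer are simply absent)
    gold:        test case id -> correct sense, one entry per test case
    """
    returned = len(predictions)
    correct = sum(1 for case, sense in predictions.items()
                  if gold.get(case) == sense)
    precision = correct / returned if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up data: 5 test cases, 4 answers returned, 3 of them correct.
gold = {1: "mouth", 2: "deer", 3: "face", 4: "deer", 5: "mouth"}
predictions = {1: "mouth", 2: "deer", 3: "mouth", 4: "deer"}

precision, recall, f1 = evaluate(predictions, gold)
print(precision, recall, f1)
```

Here precision is 3/4 = 0.75 and recall is 3/5 = 0.6, which gives an F1 of about 0.667.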