WSD approaches
-
8/11/2019 WSD approaches
1/24
TMP4013 - Information Systems Laboratory
Word Sense Disambiguation (WSD), Part I
Suhaila Saee
Faculty of Computer Sciences and Information Technology
Universiti Sarawak Malaysia
Wednesday, 10 September 2014
A Big Picture

Word Sense Disambiguation (WSD)
    Introduction
        Definitions
        Word Sense Representation
        WSD Tasks
        Basic Approaches
    WSD Applications
        Machine Translation
        Information Retrieval
        Question Answering
        Knowledge Acquisition
        Information Extraction
        Content Analysis
        Word Processing
        Speech Processing
    WSD Challenges
        Ambiguous
        Linguists
        Dictionaries
Introduction
Scenario
Computers versus Humans
Polysemy: most words have many possible meanings
    A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human
Ambiguity: a property of text
    It is rarely a problem for humans in their day-to-day communication, except in extreme cases
Motivation
1. Ambiguity for Humans: Newspaper Headlines
    INCLUDE CHILDREN WHEN BAKING COOKIES
    FARMER BILL DIES IN HOUSE
2. Ambiguity for Computers
    The fisherman jumped off the bank and into the water.
    The bank down the street was robbed!
    Back in the day, we had an entire bank of computers devoted to this problem.
    The bank in that road is entirely too steep and is really dangerous.
    The plane took a bank to the left, and then headed off towards the mountains.
A problem for Machine Translation (Weaver, 1949)
A word can often only be translated if you know the specific sense intended:
    Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.
Is pen a writing instrument or an enclosure where children play?
Bar-Hillel (1960), who posed this example, declared it unsolvable and left the field of MT!
Definitions
Word Sense Disambiguation (WSD)
Ambiguity is inherent to natural language
    A word has several senses
    In a particular context, only one sense is activated
Definition
    WSD: a computational task to determine the sense of a word that has been activated in a particular context.
    A word sense is a commonly accepted meaning of a word.
    A sense inventory partitions the range of meaning of a word into its senses.
Word Sense Representation
Word Sense Representation I
With respect to a dictionary
Example
    chair = a seat for one person, with a support for the back
    chair = the position of professor
With respect to the translation in a second language
Example
    chair = chaise (in French)
    chair = directeur (in French)
Word Sense Representation II
With respect to the context where it occurs (discrimination)
Example
    Sit on a chair; Take a seat on this chair
    The chair of the Math Department; The chair of the meeting
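The three representations above (dictionary gloss, translation, example contexts) can be held in one small data structure. The following is only a sketch: the glosses, French translations and contexts are the ones from the slides, while the sense identifiers (`chair#furniture`, `chair#position`) and the names `Sense`, `INVENTORY` and `translations` are hypothetical labels introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    sense_id: str          # hypothetical label, not taken from any real inventory
    gloss: str             # representation with respect to a dictionary
    french: str            # representation with respect to a second language
    contexts: tuple = ()   # representation with respect to contexts of use


# A toy sense inventory: it partitions the meaning of "chair" into two senses.
INVENTORY = {
    "chair": [
        Sense(
            "chair#furniture",
            "a seat for one person, with a support for the back",
            "chaise",
            ("Sit on a chair", "Take a seat on this chair"),
        ),
        Sense(
            "chair#position",
            "the position of professor",
            "directeur",
            ("The chair of the Math Department", "The chair of the meeting"),
        ),
    ]
}


def translations(word):
    """Return the French translation of each sense of `word` (empty if unknown)."""
    return [sense.french for sense in INVENTORY.get(word, [])]


if __name__ == "__main__":
    print(translations("chair"))
```

A WSD system would then be any function that, given a word and its context, returns exactly one `Sense` from the list.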
The WSD Tasks
What are the Tasks?
1. The task:
    To identify the intended sense of a word in context
    Usually assumes a fixed inventory of senses (e.g. WordNet)
2. Can be viewed as categorisation
    Similar to the POS tagging task
3. A crucial prerequisite for many NLP applications
    WSD is not itself an end application
    Many other tasks need WSD (e.g. ??)
WSD Tasks Variants
Tasks Variants I
Lexical sample task (or targeted WSD)
    WSD for a small, fixed set of words
    The focus of early work in WSD
    Disambiguates a restricted set of target words, usually occurring one per sentence
    Supervised systems are typically employed in this setting, as they can be trained using a number of hand-labelled instances (training set) and then applied to classify a set of unlabelled examples (test set)
All-words WSD
    WSD for every content word in a text
    Disambiguates all open-class words in a text (i.e., nouns, verbs, adjectives, and adverbs)
    This task requires wide-coverage systems
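The supervised lexical sample setting can be sketched for a single target word, bank. This is only an illustration under stated assumptions: the classifier is a bag-of-words Naive Bayes with add-one smoothing (one common choice, not necessarily the one used in the course), the four training sentences and the sense labels `river` and `finance` are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace and strip simple punctuation."""
    tokens = (w.strip(".,!?").lower() for w in text.split())
    return [t for t in tokens if t]


def train(labelled):
    """Count senses and per-sense context words from (sentence, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in labelled:
        sense_counts[sense] += 1
        for word in tokenize(sentence):
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(model, sentence):
    """Return the sense with the highest Naive Bayes log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)           # prior
        denom = sum(word_counts[sense].values()) + len(vocab)   # add-one smoothing
        for word in tokenize(sentence):
            if word in vocab:                                   # skip unseen words
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hand-labelled training set (toy data invented for this sketch).
TRAINING_SET = [
    ("The fisherman sat on the bank of the river", "river"),
    ("We walked along the grassy bank beside the water", "river"),
    ("The bank approved my loan", "finance"),
    ("She deposited money at the bank", "finance"),
]

MODEL = train(TRAINING_SET)

if __name__ == "__main__":
    # Unlabelled test instances.
    print(classify(MODEL, "He took a loan and some money from the bank"))
    print(classify(MODEL, "The fisherman stood by the water on the bank"))
```

One classifier is trained per target word, which is why this setting only scales to a small, fixed set of words.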
Tasks Variants II
Big data sparsity problem: we don't have labelled data for every word
    Can't train a separate classifier for every word
Pseudowords: artificial words created by concatenating two randomly chosen words
Pseudoword Task
To disambiguate artificial ambiguous words
Why artificial data?
    Because manually disambiguating naturally ambiguous words is a time-intensive and laborious task
    The text with the pseudowords is treated as the ambiguous source text
    The original text is treated as containing the disambiguated words
Example
    Pseudoword = banana-door
    All occurrences of banana and door in a corpus are replaced by banana-door
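The banana-door construction can be sketched in a few lines. This is a minimal illustration, assuming whole-word, case-insensitive matching; the function name `make_pseudoword_corpus` and the two sample sentences are hypothetical. The original words are kept as the gold labels, so no manual annotation is needed.

```python
import re


def make_pseudoword_corpus(sentences, word_a, word_b):
    """Replace every whole-word occurrence of word_a or word_b by the
    pseudoword "word_a-word_b".

    Returns (ambiguous_sentences, gold_labels), where gold_labels lists the
    original word behind each pseudoword occurrence, in order.
    Inflected forms such as "bananas" are deliberately left untouched.
    """
    pseudoword = f"{word_a}-{word_b}"
    pattern = re.compile(
        rf"\b({re.escape(word_a)}|{re.escape(word_b)})\b", re.IGNORECASE
    )
    ambiguous, gold = [], []
    for sentence in sentences:
        gold.extend(match.group(1).lower() for match in pattern.finditer(sentence))
        ambiguous.append(pattern.sub(lambda match: pseudoword, sentence))
    return ambiguous, gold


if __name__ == "__main__":
    corpus = ["She ate a banana.", "Close the door, please."]
    ambiguous, gold = make_pseudoword_corpus(corpus, "banana", "door")
    print(ambiguous)
    print(gold)
```

A WSD system is then evaluated on how often it recovers the gold word (banana or door) for each occurrence of banana-door.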
Basic Approaches
Supervised approaches
    Use machine-learning techniques to learn a classifier from labelled (annotated) training sets
Unsupervised approaches
    Based on raw unlabelled (unannotated) corpora
Knowledge-based (or knowledge-rich, or dictionary-based) approaches
    Rely on the use of external lexical resources (e.g. dictionaries and thesauri)
Corpus-based (or knowledge-poor) approaches
    Do not make use of any lexical resources for disambiguation
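As an illustration of the knowledge-based family, the sketch below uses the simplified Lesk idea: pick the sense whose dictionary gloss shares the most content words with the context. The slides do not name this algorithm, so treat it as one possible example only. The two glosses for bank, the sense identifiers, the stopword list and the function names are all assumptions made for this sketch.

```python
# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "the", "of", "and", "that", "such", "as",
    "in", "into", "off", "on", "to", "is", "was",
}

# Toy sense inventory with invented glosses (not copied from a real dictionary).
TOY_INVENTORY = {
    "bank": [
        ("bank/river", "sloping land beside a body of water such as a river"),
        ("bank/finance", "a financial institution that accepts deposits and lends money"),
    ]
}


def content_words(text):
    """Return the set of lowercased non-stopword tokens in `text`."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def simplified_lesk(target, context, inventory):
    """Choose the sense of `target` whose gloss overlaps most with `context`.

    Ties (including zero overlap) fall back to the first sense listed.
    """
    context_words = content_words(context) - {target}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory[target]:
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


if __name__ == "__main__":
    sentence = "The fisherman jumped off the bank and into the water"
    print(simplified_lesk("bank", sentence, TOY_INVENTORY))
```

No labelled training data is needed here; the quality of the result depends entirely on the glosses in the lexical resource.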
WSD Applications
WSD & Machine Translation
WSD is required for lexical choice in MT for:
    words that have different translations for different senses
    words that are potentially ambiguous within a given domain
Example
    Translate bill from English to Spanish.
    Is it a pico (a bird's beak) or a cuenta (an invoice)?
WSD & Information Retrieval
Ambiguity has to be resolved in some queries.
Example
    Find all Web pages about cricket.
    The sport or the insect?
WSD & Question Answering
Example
    What is George Miller's position on gun control?
    The psychologist or the US congressman?
WSD & Knowledge Acquisition
Example
    Add to knowledge base: Herb Bergson is the mayor of Duluth.
    Minnesota or Georgia?
WSD & Information Extraction
WSD is required for the accurate analysis of text in many applications
Example
    The BMW slowed down.
    BMW: a specific car or the car company?
WSD & Content Analysis
Example
    Classification of blogs by main topic and finding semantic connections between them
WSD & Word Processing
Example
    To determine when diacritics should be inserted (spelling correction)
    Italian: da (= from) vs dà (= gives); Papa (= Pope) vs papà (= dad)
Other possible tasks:
    For case changes, e.g., HE READ THE TIMES → He read the Times
    For lexical access of Semitic languages (in which vowels are not written), e.g., Arabic: the root meaning "write" has the form k, t, b.
WSD & Speech Processing
Speech synthesis: WSD for correct phonetisation of words, e.g., the word conjure in:
    He conjured up an image
    OR
    I conjure you to help me
Speech recognition: WSD for word segmentation and homophone discrimination
WSD Challenges
Dictionary-based word sense definitions are ambiguous
Inter-annotator agreement between linguists who manually annotate word senses is not as high as would be expected
WSD involves much world knowledge or common sense, which is difficult to verbalise in dictionaries