From Words to Senses: A Case Study of Subjectivity Recognition
Author: Fangzhong Su & Katja Markert (University of Leeds, UK)
Source: COLING 2008
Reporter: Yong-Xiang Chen
Background & Motivation
• Subjectivity analysis determines whether a language unit expresses subjectivity
  – that is, a private state, opinion, or attitude
  – and, if so, what polarity is expressed
• Many words are subjectivity-ambiguous
  – They have both subjective and objective senses
  – Example: two senses of the word “positive”
    • having a positive electric charge (objective)
    • involving advantage or good (subjective)
  – Annotating words independently of sense or domain does not capture such distinctions
Goal & Advantage
• Determine the subjectivity of word senses
  – Avoid costly annotation in the training step
  – Evaluate how useful existing resources are
    • which are not tailored towards word senses
• Increase the lexica’s usability
  – Allow fine-grained senses to be grouped into higher-level classes based on subjectivity/objectivity
• Improve the WSD task
  – for subjectivity-ambiguous words
Related work
• Esuli and Sebastiani (2006)
  – Determine the polarity of word senses in WordNet
  – Training set: expand a small, manually determined seed set of WordNet senses
  – Use the resulting larger training set for supervised classification
• Wiebe and Mihalcea (2006)
  – Label word senses in WordNet as subjective or objective
  – The method relies on
    • an independent, large, manually annotated opinion corpus (MPQA)
    • distributional similarity
Subjectivity vs. Polarity
• This study does not treat polarity as an indicator of the subjectivity of a sense
  – Most subjective senses have a relatively clear polarity
  – But polarity can be attached to objective words/senses as well
    • Tuberculosis (objective, negative)
Annotation for subjectivity and polarity of word senses
• Annotate the Micro-WNOp corpus as the test set
  – containing 1,105 WordNet synsets
• Subjectivity
  – subjective (S), objective (O), both (B)
  – (B): a WordNet synset contains both opinionated and objective expressions
• Polarity
  – positive (P), negative (N), varied (V)
  – (V): a sense’s polarity varies strongly with the context
    • “Uncompromising” can be positive or negative depending on what a person is uncompromising about
• 7 subcategories
  – O:NoPol, O:P, O:N, S:P, S:N, S:V, and B
Annotation scheme
• Manually annotate polarity for subjective senses, as well as for objective senses that carry a strong association
  – Annotate subjectivity to find and analyse directly expressed opinions
  – Annotate polarity either to classify these further or to extract objective words
High Agreement
• The overall agreement between the two annotators using all 7 categories is 84.6%, with a kappa of 0.77
• High agreement is due to
  – annotation of senses instead of words
  – sense descriptions providing more information
  – the split of subjectivity and polarity annotation, which made the task clearer
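Kappa corrects the raw agreement rate for the agreement two annotators would reach by chance. A minimal sketch of Cohen's kappa follows; the two label sequences are invented for illustration and are not the Micro-WNOp annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels over the 7 categories (hypothetical, not from the corpus).
ann1 = ["S:P", "O:NoPol", "O:NoPol", "S:N", "B", "O:NoPol", "S:P", "O:N"]
ann2 = ["S:P", "O:NoPol", "O:N", "S:N", "B", "O:NoPol", "S:V", "O:N"]
kappa = cohen_kappa(ann1, ann2)
```

With the reported figures (84.6% agreement, kappa 0.77), the same formula implies a chance agreement of roughly 0.33.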
Gold Standard
• The focus is on subjectivity, so the labels are merged into S, O, and B
• The Micro-WNOp corpus includes 298 different words
  – 97 (32.5%) are subjectivity-ambiguous
• All senses with the label B were excluded from Micro-WNOp for testing the automatic algorithms
  – resulting in a final set of 1,061 senses
    • 703 objective
    • 358 subjective
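The label merging and the exclusion of B senses can be sketched as below. The sense identifiers are hypothetical; the last lines only derive the majority-class rate from the counts on this slide, which the slides themselves do not state.

```python
def collapse(label):
    """Map one of the 7 fine-grained labels (e.g. 'S:P') to S, O, or B."""
    return label.split(":")[0]

def build_gold(annotations):
    """Keep only S and O senses; B senses are excluded from testing."""
    gold = {}
    for sense_id, label in annotations.items():
        coarse = collapse(label)
        if coarse != "B":
            gold[sense_id] = coarse
    return gold

# Hypothetical sense identifiers with fine-grained labels.
toy = {
    "positive#1": "O:NoPol",
    "positive#2": "S:P",
    "tuberculosis#1": "O:N",
    "uncompromising#1": "S:V",
    "mixed#1": "B",
}
gold = build_gold(toy)

# Share of objective senses in the final test set, from the slide's counts.
objective, subjective = 703, 358
majority_rate = objective / (objective + subjective)  # about 0.663
```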
Algorithms
1. Standard Supervised Approach
2. Sentence Collections: Movie
3. Sentence Collections: MPQA
4. Word Lists: General Inquirer
5. Word Lists: Subjectivity List
Standard Supervised Approach
• 10-fold cross-validation for training and testing on the annotated Micro-WNOp corpus
• Applied a Naive Bayes classifier
• Three types of features:
  – Lexical features
    • unigrams in the glosses as bag-of-words
    • WordNet synsets
  – Part-of-speech features
  – Relation features
    • Employ 8 relations
      – antonym, similar-to, derived-from, attribute, also-see, direct-hyponym, direct-hypernym, and extended-antonym
    • Each relation R leads to 2 features
      – for a sense A, they describe how many links of that type it has to synsets in the subjective or the objective training set
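A minimal sketch of the relation features, assuming the WordNet links are already available as a plain dictionary. The `links` structure, the feature names, and the sense identifiers are all hypothetical; a real setup would read the links from WordNet.

```python
RELATIONS = [
    "antonym", "similar-to", "derived-from", "attribute",
    "also-see", "direct-hyponym", "direct-hypernym", "extended-antonym",
]

def relation_features(sense, links, subj_train, obj_train):
    """Return 2 counts per relation (16 in total): links from `sense`
    into the subjective and into the objective training set."""
    features = {}
    for rel in RELATIONS:
        targets = links.get(sense, {}).get(rel, [])
        features[f"{rel}:subj"] = sum(t in subj_train for t in targets)
        features[f"{rel}:obj"] = sum(t in obj_train for t in targets)
    return features

# Toy link table and training sets (hypothetical).
links = {"good#1": {"antonym": ["bad#1"], "similar-to": ["fine#1", "solid#2"]}}
subj_train = {"bad#1", "fine#1"}
obj_train = {"solid#2"}
feats = relation_features("good#1", links, subj_train, obj_train)
```

Under cross-validation, `subj_train` and `obj_train` would have to be rebuilt from the training folds only, so that no test label leaks into the counts.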
Sentence Collections Approach
• Cast word sense subjectivity classification as a sentence classification task
• Take the glosses that WordNet provides for each sense as the sentences to be classified
• In theory, any collection of annotated sentences can be fed in as training data
  1. Movie-domain Subjectivity Data Set (Movie)
    • 5,000 subjective sentences and 5,000 objective sentences
  2. MPQA Corpus
    • contains news articles manually annotated at the phrase level
    • 6,127 subjective and 4,985 objective sentences
• Use a Naive Bayes algorithm with lexical unigram features
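The idea of training on labelled sentences and then classifying glosses can be sketched with a small multinomial Naive Bayes over unigrams. This is an illustration only: the whitespace tokenizer, the add-one smoothing, and the four training sentences are assumptions, not the setup used in the paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; punctuation handling is left out.
    return text.lower().split()

class UnigramNaiveBayes:
    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sent, lab in zip(sentences, labels):
            self.word_counts[lab].update(tokenize(sent))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus add-one smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # unseen words are ignored
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Toy training sentences (hypothetical, standing in for Movie or MPQA).
train_sents = [
    "a wonderful and moving story",
    "the acting is terrible and boring",
    "the film was shot in paris",
    "the story takes place in 1950",
]
train_labels = ["S", "S", "O", "O"]
clf = UnigramNaiveBayes().fit(train_sents, train_labels)

# Glosses are then classified as if they were sentences.
pred_subjective = clf.predict("terrible and boring")
pred_objective = clf.predict("shot in paris")
```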
Word Lists Approach
• General Inquirer (GI)
  – concentrates on word polarity
  – assume that both positive and negative words in the GI list are subjective clues
  – 1,915 positive, 2,291 negative, and 7,582 no-polarity words
• Subjectivity clues list (SL)
  – centers on subjectivity and provides part-of-speech, subjectivity strength, and prior polarity
  – 8,000 subjective words
• Neither list includes word sense information, so neither can be used directly
Unsupervised algorithm
• Treat the occurrence of subjective words in a gloss as an indication of a subjective sense overall
• Adopt a rule-based unsupervised algorithm
• Compute a subjectivity score S for each WordNet synset
  – by summing the weights of all subjectivity clues in its gloss
    • GI: all subjectivity clues are weighted 1
    • SL: weight 2 for strongly subjective clues and 1 for weakly subjective clues
1. If S is equal to or higher than a chosen threshold T, the synset is classified as subjective
  • Best thresholds: 2 for SL and 4 for GI
2. Set two thresholds as a rule to divide all synsets into subjective/objective training sets
  • Best thresholds
    – SL: T1=4 and T2=2
    – GI: T1=3 and T2=1
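The scoring and the two rules can be sketched as follows. The clue weights and glosses are toy values, not entries from SL or GI. The slide does not spell out how T1 and T2 are applied; the sketch assumes that S >= T1 puts a synset in the subjective training set, S <= T2 puts it in the objective training set, and anything in between is left out.

```python
def subjectivity_score(gloss, clue_weights):
    """Sum the weights of all subjectivity clues occurring in a gloss."""
    return sum(clue_weights.get(tok, 0) for tok in gloss.lower().split())

def classify(gloss, clue_weights, threshold):
    """Rule 1: subjective if S >= T, objective otherwise."""
    return "S" if subjectivity_score(gloss, clue_weights) >= threshold else "O"

def split_training(glosses, clue_weights, t1, t2):
    """Rule 2 under the assumption stated above: S >= T1 goes to the
    subjective set, S <= T2 to the objective set, the rest is dropped."""
    subj, obj = set(), set()
    for sense_id, gloss in glosses.items():
        s = subjectivity_score(gloss, clue_weights)
        if s >= t1:
            subj.add(sense_id)
        elif s <= t2:
            obj.add(sense_id)
    return subj, obj

# Toy SL-style weights: 2 = strong clue, 1 = weak clue (hypothetical values).
sl_weights = {"wonderful": 2, "hate": 2, "good": 1, "advantage": 1}

label_a = classify("involving advantage or good", sl_weights, threshold=2)
label_b = classify("having a positive electric charge", sl_weights, threshold=2)

toy_glosses = {
    "s1": "wonderful and good advantage",       # score 4
    "s2": "having a positive electric charge",  # score 0
    "s3": "good wonderful",                     # score 3, falls between T2 and T1
}
subj_set, obj_set = split_training(toy_glosses, sl_weights, t1=4, t2=2)
```

The whitespace split ignores punctuation and inflection; a fuller version would tokenize and lemmatize the glosses before lookup.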
Experiments and Evaluation
Discussion
• For the three starred methods, there is a small but consistent improvement when additional features are used
• Why does using SL always greatly outperform GI?
  – the GI lexicon is annotated for polarity, not subjectivity
    • It includes words that we see as objective but with a strong positive or negative association
  – the GI lexicon does not operate with a clearly expressed polarity definition, leading to conflicting annotations
  – GI contains fewer features
  – GI contains many fewer subjective clues
Discussion
• The results of using the sentence datasets are not satisfactory
  – the subjectivity definition in the Movie corpus does not seem to match ours
    • we define a word sense or a sentence as subjective if it expresses a private state (i.e., emotion, opinion, sentiment, etc.)
    • in the Movie dataset, the “objective” data rarely contain opinions about the movie itself, but do contain other opinionated content
      • for example, about the characters
Comparison to Prior Approaches
• vs. SentiWordNet
  – If the sum of the positive and negative scores of a sense in SentiWordNet is greater than or equal to 0.5, the sense is subjective; otherwise it is objective
  – SentiWordNet achieves 75.3% accuracy on Micro-WNOp
  – CV* and SL* perform slightly better than SentiWordNet
• The test data of Wiebe and Mihalcea (2006) is not publicly available
  – They report precision = 48.9% and recall = 60% for subjective senses
  – our best SL* method has a precision of 66% at about the same recall
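The SentiWordNet conversion rule and the precision/recall measures used in the comparison can be sketched as below. The score pairs and gold labels are toy values chosen for illustration, not SentiWordNet entries or results from the paper.

```python
def sentiwordnet_label(pos_score, neg_score, cutoff=0.5):
    """Subjective if positive + negative score reaches the cutoff."""
    return "S" if pos_score + neg_score >= cutoff else "O"

def precision_recall(gold, predicted, target="S"):
    """Precision and recall for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, predicted))
    pred_pos = sum(p == target for p in predicted)
    gold_pos = sum(g == target for g in gold)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    return precision, recall

# Toy (positive, negative) score pairs and gold labels (hypothetical).
scores = [(0.5, 0.125), (0.0, 0.0), (0.25, 0.25), (0.125, 0.25)]
gold = ["S", "O", "O", "S"]
predicted = [sentiwordnet_label(p, n) for p, n in scores]
precision, recall = precision_recall(gold, predicted)
```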
Conclusion
• Proposed different ways of extracting training data and clue sets
• The effectiveness of the resulting algorithms depends on the different definitions of subjectivity
• At least one of the proposed methods performed on a par with a supervised classifier
  – it is possible to avoid any manual annotation for the subjectivity classification of word senses