Sa Mincut Adityankfn
-
Upload
sukhvir-singh-lal -
Category
Documents
-
view
217 -
download
2
description
Transcript of Sa Mincut Adityankfn
![Page 1: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/1.jpg)
SSentiment AAnalysis
Presented byAditya Joshi 08305908
Guided byProf. Pushpak Bhattacharyya
IIT Bombay
![Page 2: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/2.jpg)
What is SA & OM?• Identify the orientation of opinion in a
piece of text
• Can be generalized to a wider set of emotions
The movie was fabulous!
The movie stars Mr. X
The movie was horrible!
![Page 3: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/3.jpg)
Motivation• Knowing sentiment is a very natural ability
of a human being.Can a machine be trained to do it?
• SA aims at getting sentiment-related knowledge especially from the huge amount of information on the internet
• Can be generally used to understand opinion in a set of documents
![Page 4: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/4.jpg)
Tripod of Sentiment AnalysisCognitive Science
Natural Language Processing
Machine Learning
Sentiment Analysis
Natural Language Processing
Machine Learning
![Page 5: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/5.jpg)
Contents :
Challenges
Subjectivitydetection
SAApproaches
Applications
Lexical Resources
![Page 6: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/6.jpg)
Challenges• Contrasts with standard text-based
categorization• Domain dependent• Sarcasm• Thwarted expressions
Mere presence of words is Indicative of the category
in case of text categorization.
Not the case with sentiment analysis
• Contrasts with standard text-based categorization
• Domain dependent• Sarcasm• Thwarted expressions
Sentiment of a word is w.r.t. the
domain.
Example: ‘unpredictable’
For steering of a car,
For movie review,
• Contrasts with standard text-based categorization
• Domain dependent• Sarcasm• Thwarted expressions
Sarcasm uses words ofa polarity to represent
another polarity.
Example: The perfume is soamazing that I suggest you wear it
with your windows shut
• Contrasts with standard text-based categorization
• Domain dependent• Sarcasm• Thwarted expressions
the sentences/words that contradict the overall sentiment
of the set are in majority Example: The actors are good,
the music is brilliant and appealing.Yet, the movie fails to strike a chord.
![Page 7: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/7.jpg)
SentiWordNet
•Lexical resource for sentiment analysis
•Built on the top of WordNet synsets
•Attaches sentiment-related information with synsets
![Page 8: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/8.jpg)
Quantifying sentiment
Objective Polarity
Subjective Polarity
Positive Negative
Term sense position
Each term has a Positive, Negative and Objective score. The scores sum to one.
![Page 9: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/9.jpg)
Building SentiWordNet
• Ln, Lo, Lp are the three seed sets• Iteratively expand the seed sets through K
steps• Train the classifier for the expanded sets
![Page 10: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/10.jpg)
LpLn
also-see
antonymy
Expansion of seed sets
The sets at the end of kth step are called Tr(k,p) and Tr(k,n)
Tr(k,o) is the set that is not present in Tr(k,p) and Tr(k,n)
![Page 11: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/11.jpg)
Committee of classifiers• Train a committee of classifiers of different
types and different K-values for the given data
• Observations:– Low values of K give high precision and low
recall– Accuracy in determining positivity or
negativity, however, remains almost constant
![Page 12: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/12.jpg)
WordNet Affect
• Similar to SentiWordNet (an earlier work)• WordNet-Affect: WordNet + annotated
affective concepts in hierarchical order• Hierarchy called ‘affective domain labels’
– behaviour– personality– cognitive state
![Page 13: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/13.jpg)
Subjectivity detection• Aim: To extract subjective portions of text• Algorithm used: Minimum cut algorithm
![Page 14: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/14.jpg)
Constructing the graph• Why graphs?• Nodes and edges?• Individual Scores• Association scores
To model item-specific and pairwise information
independently.
• Why graphs?• Nodes and edges?• Individual Scores• Association scores
Nodes: Sentences of the document and source & sink
Source & sink representthe two classes of sentences
Edges: Weighted with either of the two scores
• Why graphs?• Nodes and edges?• Individual Scores• Association scores
Prediction whether the sentence is subjective or not
Indsub(si)=
• Why graphs?• Nodes and edges?• Individual Scores• Association scores
Prediction whether two sentences should have
the same subjectivity level
T : Threshold – maximum distance upto which sentences may be considered proximalf: The decaying functioni, j : Position numbers
![Page 15: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/15.jpg)
Constructing the graph• Build an undirected graph G with vertices
{v1, v2…,s, t} (sentences and s,t)• Add edges (s, vi) each with weight ind1(xi)
• Add edges (t, vi) each with weight ind2(xi)
• Add edges (vi, vk) with weight assoc (vi, vk)
• Partition cost:
![Page 16: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/16.jpg)
Example
Sample cuts:
![Page 17: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/17.jpg)
DocumentSubjective
Results (1/2)
• Naïve Bayes, no extraction : 82.8%• Naïve Bayes, subjective extraction : 86.4%• Naïve Bayes, ‘flipped experiment’ : 71 %
DocumentSubjectivity
detectorObjective
POLARITY CLASSIFIER
![Page 18: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/18.jpg)
Results (2/2)
![Page 19: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/19.jpg)
Approach 1: Using adjectives
• Many adjectives have high sentiment value– A ‘beautiful’ bag– A ‘wooden’ bench– An ‘embarrassing’ performance
• An idea would be to augment this polarity information to adjectives in the WordNet
![Page 20: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/20.jpg)
Setup• Two anchor words (extremes of the polarity
spectrum) were chosen• PMI of adjectives with respect to these
adjectives is calculated
Polarity Score (W)= PMI(W,excellent) – PMI (W, poor)
excellent poor
wordPMI PMI
![Page 21: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/21.jpg)
Experimentation• K-means clustering algorithm used on the
basis of polarity scores• The clusters contain words with similar
polarities• These words can be linked using an
‘isopolarity link’ in WordNet
![Page 22: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/22.jpg)
Results• Three clusters seen• Major words were with negative polarity
scores• The obscure words were removed by
selecting adjectives with familiarity count of 3– the ones that are not very common
![Page 23: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/23.jpg)
Approach 2: Using Adverb-Adjective Combinations (AACs)
• Calculate sentiment value based on the effect of adverbs on adjectives
• Linguistic ideas:• Adverbs of affirmation: certainly• Adverbs of doubt: possibly• Strong intensifying adverbs: extremely• Weak intensifying adverbs: scarcely• Negation and Minimizers: never
![Page 24: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/24.jpg)
Moving towards computation…
• Based on type of adverb, the score of the resultant AAC will be affected
• Example of an axiom:
• Example : ‘extremely good’ is more positive than ‘good’
![Page 25: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/25.jpg)
AAC Scoring Algorithms
1. Variable Scoring Algorithm2. Adjective Priority Scoring Algorithm3. Adverb first scoring algorithm
![Page 26: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/26.jpg)
Scoring the sentiment on a topic
• Rel (t) : Sentences in d that reference to topic t
• s : Sentence is Rel (t)• Appl+(s) : AACs with positive score in s• Appl-(s) : AACs with negative score in s• Return strength =
![Page 27: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/27.jpg)
Findings• APSr with r=0.35 worked the best (Better
correlation with human subject)– Adjectives are more important than adverbs in
terms of sentiment
• AACs give better precision and recall as compared to only adjectives
![Page 28: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/28.jpg)
Approach 3: Subject-based SA
• Examples:
The horse bolted.
The movie lacks a good story.
![Page 29: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/29.jpg)
Lexiconsubj. bolt
b VB bolt subj
subj. lack obj.
b VB lack obj ~subj
Argument that sends the sentiment (subj./obj.)
Argument that receives the sentiment (subj./obj.)
Argument that receives the sentiment (subj./obj.)
![Page 30: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/30.jpg)
Lexicon• Also allows ‘\S+’ characters• Similar to regular expressions• E.g. to put \S+ to risk
– The favorability of the subject depends on the favorability of ‘\S+’.
![Page 31: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/31.jpg)
Example
The movie lacks a good story.
G JJ good obj.
The movie lacks \S+.
B VB lack obj ~subj.
Lexicon : Steps :
1) Consider a context window of upto five words
2) Shallow parse the sentence
3) Step-by-step calculate the sentiment value based on lexicon and by adding ‘\S+’ characters at each step
![Page 32: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/32.jpg)
ResultsDescription Precision Recall
Benchmark corpus
Mixed statements
94.3% 28%
Open Test corpus
Reviews of a camera
94% 24%
![Page 33: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/33.jpg)
Applications• Review-related analysis• Developing ‘hate mail filters’ analogous to
‘spam mail filters’• Question-answering (Opinion-oriented
questions may involve different treatment)
![Page 34: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/34.jpg)
Conclusion & Future Work• Lexical Resources have been developed to
capture sentiment-related nature• Subjective extracts provide a better accuracy of
sentiment prediction• Several approaches use algorithms like Naïve
Bayes, clustering, etc. to perform sentiment analysis
• The cognitive angle to Sentiment Analysis can be explored in the future
![Page 35: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/35.jpg)
References (1/2)• Tetsuya Nasukawa, Jeonghee Yi. ‘Sentiment Analysis: Capturing
Favorability Using Natural Language Processing’. In K-CAP ’03, Florida, pages 1-8. 2003.
• Alekh Agarwal, Pushpak Bhattacharyya. ‘Augmenting WordNet with polarity information on adjectives’. In K-CAP ’03, Florida, pages 1-8. 2003.
• SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining Andrea Esuli, Fabrizio Sebastiani
• ‘Machine Learning’, Han and Kamber, 2nd edition, 310-330.• http://wordnet.princeton.edu• Farah Benamara, Carmine Cesarano, Antonio Picariello, VS Subrahmanian
et al; ‘Sentiment Analysis: Adjectives and Adverbs are better than Adjectives Alone’; In ICWSM ’2007 Boulder, CO USA, 2007.
![Page 36: Sa Mincut Adityankfn](https://reader035.fdocuments.us/reader035/viewer/2022062811/5695d1a91a28ab9b02976cfd/html5/thumbnails/36.jpg)
References (2/2)• Jon M. Kleinberg; ‘Authoritative Sources in a Hyperlinked Environment’ as
IBM Research Report RJ 10076, May 1997, Pgs. 1 – 34.• www.cs.uah.edu/~jrushing/cs696-summer2004/notes/Ch8Supp.ppt• Opinion Mining and Sentiment Analysis, Foundations and Trends in
Information Retrieval, B. Pang and L. Lee, Vol. 2, Nos. 1–2 (2008) 1–135, 2008.
• Bo Pang, Lillian Lee; ‘A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts’; Proceedings of the 42nd ACL; pp. 271–278; 2004.
• http://www.cse.iitb.ac.in/~veeranna/ppt/Wordnet-Affect.ppt