“Alexandru Ioan Cuza” University of Iași, Faculty of Computer Science
SEMANTICS AND PRAGMATICS OF NATURAL LANGUAGE
Daniela GÎFU
Iași, 9 Oct. 2014
IMPACT OF THE TOPIC
Sentiment Analysis (SA) - one of the most active topics in NLP.
SA - offers the possibility to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.
SA - very popular in social media.
- Target: academia and industry.
PURPOSE AND MOTIVATION
- to create a complete state of the art (SOTA) in SA, with a focus on social media posts;
- to enhance the results of context-based SA;
- to clarify the descriptive behavior of the receiver, affected by the multitude of information on forums;
- to improve the performance of SA classifiers based on two approaches (machine learning and lexicon).
CONTENT
1. Introduction
2. A general view on the subject
3. SA levels
3.1. SA at document level
3.2. SA at clause/sentence level
3.3. Feature-based SA
3.4. Comparative sentiment analysis
3.5. Sentiment lexicon acquisition
3.6. Conclusions
4. Applications
4.1. Business and government
4.2. Review sites
4.3. Other domains: politics and sociology
4.4. Conclusions
5. Conclusions and discussions
2. A general view on the subject
SA - the task of extracting opinions, sentiments and subjectivity from text;
SA - related terminology:
- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor 2014].
3. SA levels - document
a) supervised approach
Fig. 2 Supervised learning for three classes (positive, negative, neutral)
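To make the three-class supervised setting concrete, here is a minimal sketch of a document classifier: multinomial naive Bayes with add-one smoothing, using only the Python standard library. Naive Bayes is just one common supervised choice, not necessarily the classifier behind the figure; the training sentences, the whitespace tokenizer and the names `NaiveBayes` and `tokenize` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase + whitespace split (no POS tagging or lemmatization).
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> NaiveBayes:
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(docs)
        return self

    def log_score(self, doc: str, label: str) -> float:
        # log P(label) + sum of log P(word | label) over words seen in training
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / self.n_docs)
        for tok in tokenize(doc):
            if tok in self.vocab:  # unseen words are ignored
                score += math.log((self.word_counts[label][tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(doc, label))


# Hypothetical toy training set with three classes.
train = [
    ("great phone excellent battery", "positive"),
    ("i love this excellent camera", "positive"),
    ("terrible screen awful battery", "negative"),
    ("i hate this awful phone", "negative"),
    ("the phone has a camera", "neutral"),
    ("the battery is removable", "neutral"),
]
clf = NaiveBayes().fit([doc for doc, _ in train], [label for _, label in train])
print(clf.predict("excellent camera"))  # positive
print(clf.predict("awful battery"))     # negative
print(clf.predict("the camera"))        # neutral
```

Working in log space avoids numeric underflow when documents are long; with real data, the tokenizer would be replaced by POS tagging and lemmatization.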
3. SA levels - document
a) supervised approach
Fig. 2 Python NLTK demos for natural language text processing (http://text-processing.com/demo/)
3. SA levels - document
b) unsupervised approach
Based on determining the semantic orientation (SO) of specific words/phrases, using:
1. a sentiment lexicon (words/expressions) – [Taboada et al., 2011]
2. a set of predefined POS patterns – [Turney, 2002]
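The POS-pattern approach can be illustrated with Turney's (2002) recipe: extract two-word phrases whose tags match one of five patterns, then score each phrase as SO = PMI(phrase, "excellent") - PMI(phrase, "poor"), estimated from co-occurrence hit counts. The sketch below assumes English Penn Treebank tags (a Romanian tagset would need its own patterns) and takes hit counts as plain numbers supplied by the caller; Turney obtained them from search-engine NEAR queries, which are not reproduced here. The function names are hypothetical.

```python
import math

NOUN = {"NN", "NNS"}
ADVERB = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def matches_pattern(t1, t2, t3):
    """Turney's five (first tag, second tag, next tag) patterns; t3 is None at sentence end."""
    not_noun = t3 not in NOUN
    return (
        (t1 == "JJ" and t2 in NOUN)
        or (t1 in ADVERB and t2 == "JJ" and not_noun)
        or (t1 == "JJ" and t2 == "JJ" and not_noun)
        or (t1 in NOUN and t2 == "JJ" and not_noun)
        or (t1 in ADVERB and t2 in VERB)
    )


def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs; returns candidate two-word phrases."""
    phrases = []
    for i in range(len(tagged) - 1):
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if matches_pattern(tagged[i][1], tagged[i + 1][1], t3):
            phrases.append(f"{tagged[i][0]} {tagged[i + 1][0]}")
    return phrases


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor):
    """SO = log2[hits(phrase NEAR excellent) * hits(poor) / (hits(phrase NEAR poor) * hits(excellent))].
    0.01 is added to the phrase hit counts to avoid division by zero."""
    return math.log2(((near_excellent + 0.01) * hits_poor) /
                     ((near_poor + 0.01) * hits_excellent))


print(extract_phrases([("very", "RB"), ("good", "JJ"), ("camera", "NN")]))  # ['good camera']
# Hypothetical hit counts: the phrase co-occurs far more often with "excellent".
print(semantic_orientation(120, 15, 1000, 1000) > 0)  # True
```

A review is then classified as positive if the average SO of its extracted phrases is positive.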
3. SA levels – clause/sentence
More complex – identifying whether a sentence is opinionated and establishing the nature of the opinion;
- using supervised methods:
1. classifying clauses into two classes, facts vs. opinions [Yu and Hatzivassiloglou, 2003];
2. an approach based on minimum cuts in graphs [Pang and Lee, 2004].
The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?
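Pang and Lee (2004) cast sentence-level subjectivity detection as a minimum s-t cut: each sentence gets an individual score for being subjective (edge from the source) and for being objective (edge to the sink), and pairs of nearby sentences get an association score that makes separating them costly. A minimal sketch with a hand-written Edmonds-Karp max-flow follows; the scores in the example are made up, whereas in the original work they come from a trained classifier and from sentence proximity. The function name is hypothetical.

```python
from collections import deque


def min_cut_subjective(ind_subj, ind_obj, assoc):
    """Return indices of sentences on the source (subjective) side of a minimum s-t cut.

    ind_subj[i]   -- capacity of edge source -> i (evidence that sentence i is subjective)
    ind_obj[i]    -- capacity of edge i -> sink   (evidence that sentence i is objective)
    assoc[(i, j)] -- capacity of the undirected edge i <-> j (cost of separating i and j)
    """
    n = len(ind_subj)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for i in range(n):
        cap[s][i] = ind_subj[i]
        cap[i][t] = ind_obj[i]
    for (i, j), w in assoc.items():
        cap[i][j] += w
        cap[j][i] += w

    def reachable_parents():
        # BFS over edges with positive residual capacity; parent[v] == -1 means unreachable.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until the sink is unreachable.
    while True:
        parent = reachable_parents()
        if parent[t] == -1:
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # After max flow, nodes still reachable from the source form the subjective side of the cut.
    parent = reachable_parents()
    return [i for i in range(n) if parent[i] != -1]


# Sentence 1 is undecided on its own (0.5 vs 0.5); a strong link to sentence 0 pulls it to "subjective".
print(min_cut_subjective([0.9, 0.5, 0.1], [0.1, 0.5, 0.9], {}))            # [0]
print(min_cut_subjective([0.9, 0.5, 0.1], [0.1, 0.5, 0.9], {(0, 1): 1.0}))  # [0, 1]
```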
3. SA levels – features
- several entities in each analyzed text, or several attributes for each entity;
- extraction of the attributes of an object;
Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/.
(En. Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)
- extract and store all noun phrases (NPs);
- keep only the NPs whose frequency is above an experimentally determined threshold [Hu and Liu, 2004].
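The two NP-filtering steps above can be sketched as follows, assuming noun phrases have already been extracted per sentence (for instance from the NP chunks of a tagger). Hu and Liu (2004) used association mining with a minimum support threshold; this simplified version only keeps NPs that occur in at least a `min_support` fraction of sentences. The function name, the data and the threshold value are hypothetical.

```python
from collections import Counter


def frequent_features(np_per_sentence, min_support):
    """Keep noun phrases occurring in at least `min_support` (a fraction) of the sentences."""
    n = len(np_per_sentence)
    sentence_freq = Counter()
    for nps in np_per_sentence:
        sentence_freq.update(set(nps))  # count each NP at most once per sentence
    return sorted(phrase for phrase, count in sentence_freq.items() if count / n >= min_support)


# Hypothetical NPs extracted from five review sentences about a phone.
reviews = [
    ["battery", "screen"],
    ["battery"],
    ["battery life", "price"],
    ["screen"],
    ["price", "battery"],
]
print(frequent_features(reviews, min_support=0.4))  # ['battery', 'price', 'screen']
```

Counting per sentence rather than per occurrence keeps a single repetitive review from inflating a phrase's support.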
3. SA levels – comparative
- when a user does not express a direct opinion about a product, but compares it with another one [Jindal and Liu, 2006]:
Dacia Logan arată mult mai bine decât Dacia Solenza. (En. Dacia Logan looks much better than Dacia Solenza.)
- comparative adjectives and adverbs: mai mult, mai puţin (En. more, less);
- superlative adjectives and adverbs: mai, cel puţin (En. more, at least);
- additional keywords: decât, împotriva (En. than, against).
- these keywords cover 98% of the comparative opinions.
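A keyword filter of this kind can be sketched as below: a sentence becomes a candidate comparative if it contains one of the Romanian keywords listed above. The keyword list is only the slide's short list, and the function names are hypothetical. In Jindal and Liu (2006), keyword matching is a high-recall first step followed by a learned classifier that removes false positives (the bare "mai" also occurs in non-comparative uses, e.g. "nu mai", "no longer").

```python
import re
import unicodedata


def normalize(text):
    # Lowercase, NFC-compose, and map cedilla letters (ş, ţ) to comma-below letters (ș, ț).
    text = unicodedata.normalize("NFC", text.lower())
    return text.replace("\u015f", "\u0219").replace("\u0163", "\u021b")


# Keywords from the slide only; a realistic list would be much longer.
KEYWORDS = [normalize(k) for k in ("mai mult", "mai puţin", "cel puţin", "mai", "decât", "împotriva")]


def comparative_keywords(sentence):
    """Return the keywords (single words or contiguous word pairs) found in the sentence."""
    padded = " " + " ".join(re.findall(r"\w+", normalize(sentence))) + " "
    return [kw for kw in KEYWORDS if f" {kw} " in padded]


def is_candidate_comparative(sentence):
    return bool(comparative_keywords(sentence))


print(comparative_keywords("Dacia Logan arată mult mai bine decât Dacia Solenza."))  # ['mai', 'decât']
print(is_candidate_comparative("Telefonul are o baterie bună."))  # False
```

Normalizing both the keywords and the input makes the match robust to the two Unicode spellings of ș/ț that coexist in Romanian web text.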
3. SA levels – sentiment lexicon
a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004].
Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001], with 9 emotional classes:
<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>
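Since every class points to its parent, a word tagged with a leaf class (e.g. anger) implicitly belongs to its ancestors as well (negative, emotional). A minimal sketch that loads the hierarchy with the standard `xml.etree.ElementTree` module and walks the parent chain; the helper names are hypothetical and not part of AnaDiP-2010.

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""


def load_classes(xml_text):
    """Map class id -> (name, parent id or None)."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): (c.get("name"), c.get("parent")) for c in root.iter("class")}


def ancestors(classes, class_id):
    """Names of all ancestors of a class, nearest first."""
    chain, parent = [], classes[class_id][1]
    while parent is not None:
        chain.append(classes[parent][0])
        parent = classes[parent][1]
    return chain


classes = load_classes(CLASSES_XML)
print(ancestors(classes, "5"))  # ['negative', 'emotional']
```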
3. SA levels – sentiment lexicon
Our software performs part-of-speech (POS) tagging and lemmatization of words.
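For example:
<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
  …
</lexic>
Here clevetitor (En. backbiter) belongs to the classes emotional, negative and sadness, and genial (En. brilliant) to emotional, positive and spectacular.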
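Because the lexicon is keyed by lemma, lookup happens after POS tagging and lemmatization. A minimal sketch that loads the two sample entries and counts emotional classes over a list of lemmas; `CLASS_NAMES` copies the AnaDiP-2010 class ids, while the function names and the input lemmas are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Class ids as defined in the AnaDiP-2010 class list.
CLASS_NAMES = {1: "emotional", 2: "positive", 3: "negative", 4: "anxiety", 5: "anger",
               6: "sadness", 7: "spectacular", 8: "firmness", 9: "moderation"}

LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""


def load_lexicon(xml_text):
    """Map lemma -> list of class ids."""
    root = ET.fromstring(xml_text)
    return {w.get("lemma"): [int(c) for c in w.get("classes").split(",")] for w in root.iter("word")}


def class_profile(lemmas, lexicon):
    """Count lemmas per emotional class; lemmas missing from the lexicon are skipped."""
    counts = Counter()
    for lemma in lemmas:
        for class_id in lexicon.get(lemma, []):
            counts[CLASS_NAMES[class_id]] += 1
    return counts


lexicon = load_lexicon(LEXICON_XML)
profile = class_profile(["un", "politician", "genial", "sau", "clevetitor"], lexicon)
print(profile)
```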
3. SA levels – sentiment lexicon
b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.
- a classical work [Hatzivassiloglou and McKeown, 1997] uses linguistic connectors such as şi, sau, nici, fie (En. and, or, neither...nor, either...or).
Examples: bărbat puternic şi armonios / bărbat puternic şi armonios (En. strong and harmonious man)
femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. sensual or intelligent woman? / poor or wealthy woman?)
băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
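The examples show that conjoined adjectives usually share their orientation, but not always, so a few seed words can label many others once each link is known to be "same" or "opposite". Hatzivassiloglou and McKeown (1997) predicted these links with a log-linear model and then clustered the resulting graph; the sketch below replaces both steps with hand-labelled links taken from the examples and a simple breadth-first propagation from hypothetical seeds. The function name `propagate` is hypothetical.

```python
from collections import deque


def propagate(seeds, links):
    """seeds: {word: +1 or -1}; links: (word_a, word_b, same_orientation) triples.
    Breadth-first propagation; the first label a word receives is kept (conflicts are ignored)."""
    graph = {}
    for a, b, same in links:
        sign = 1 if same else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity


# Links hand-labelled from the examples above (masculine singular lemmas).
links = [
    ("puternic", "armonios", True),   # şi: same orientation
    ("sărman", "înstărit", False),    # sau: opposite orientation
    ("prost", "deștept", False),      # nici ... nici: opposite orientation
    ("prost", "urât", True),          # nici ... nici: same orientation
]
result = propagate({"puternic": 1, "prost": -1, "sărman": -1}, links)
print(result)
```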
4. Applications – business and government
“Why aren’t consumers buying our laptop?”, even though the price is good and the weight clearly matches consumers’ wishes [Lee, 2004].
Two kinds of answers:
- subjective reasons about intangible qualities (e.g. "the physical keyboard is tacky");
- misperceptions (which matter even though they are wrong).
Solution: by tracking consumers’ opinions, one can predict sales trends, etc. [Mishne & Glance, 2006].
4. Applications – business and government
Solution based on a dictionary + the semantic role of negations and pragmatic connectors:
- classification of emotionally charged words into two classes, positive and negative (optionally also a neutral class);
- more classes, associating each word with a value in the range -5 to +5;
- [Gîfu and Cristea, 2012a]: a scale on the interval -3 to +3;
- [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
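The dictionary-based scheme can be sketched as: look up each lemma's score, flip the sign of a sentiment word that follows a negation, average, and rescale from [-3, +3] to [-1, +1]. The lexicon entries, the negator list, the "flip the next sentiment word" scope rule and the rescaling are hypothetical simplifications, and pragmatic connectors are not modelled.

```python
# Hypothetical lemma scores on the [-3, +3] scale.
LEXICON = {"bun": 2, "genial": 3, "odios": -3, "oribil": -3, "prost": -2}
NEGATORS = {"nu", "nici"}  # hypothetical negation cues


def sentence_score(lemmas, lexicon=LEXICON, max_abs=3):
    """Average lexicon score of the sentiment words, rescaled from [-max_abs, +max_abs] to [-1, +1].
    A negator flips the sign of the next sentiment word that follows it."""
    values, negate = [], False
    for lemma in lemmas:
        if lemma in NEGATORS:
            negate = True
        elif lemma in lexicon:
            values.append(-lexicon[lemma] if negate else lexicon[lemma])
            negate = False
    if not values:
        return 0.0  # no sentiment words: neutral
    return sum(values) / len(values) / max_abs


print(sentence_score(["film", "nu", "fi", "bun"]))  # -0.6666666666666666 (En. "the film is not good")
print(sentence_score(["nimic", "mai", "odios", "mai", "oribil"]))  # -1.0
```

"nimic" is deliberately not a negator here: in "nimic mai odios" (nothing more odious) it intensifies rather than reverses the adjective, which is one reason negation scope needs linguistic rules rather than a plain word list.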
4. Process phases: POS tagging, NER and anaphora resolution
Example output for "Nimic mai odios, mai oribil decât pantofii sport cu platformă" (En. "Nothing more odious, more horrible than platform sneakers"):
<DOCUMENT>
<P ID="1">
<S ID="1">
<W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
<NP HEADID="11.2" ID="0" ref="0">
  <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
  <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
  <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
  <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
  <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
  <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
  <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
</NP>
<NP HEADID="11.9" ID="1" ref="1">
  <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
  <NP HEADID="11.10" ID="2" ref="2">
    <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
    <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
    <NP HEADID="11.12" ID="3" ref="3">
      <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
    </NP>
  </NP>
</NP>
</S>
</P>
</DOCUMENT>
4. Applications – business and government
- 46 rules for assigning values, e.g. the superlative pattern:
<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>
Ex: cel mai bun (En. the best)
<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="LEMMA" value="bun"/>
</rule>
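Rules of this form can be matched against the tagger output by sliding over consecutive tokens and checking each `<word>` constraint against the corresponding token attribute (LEMMA, POS, ...). A minimal sketch, assuming tokens are dicts carrying the same attribute names as the `<W>` elements of the tagger output; how each of the 46 rules maps to a sentiment value is not shown here, so the sketch only reports match positions. The function names and the token list (including the POS label of "cel") are hypothetical.

```python
import xml.etree.ElementTree as ET

RULE_XML = """<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>"""


def load_rule(xml_text):
    """Turn a <rule> element into a list of (attribute, value) constraints."""
    return [(w.get("attribute"), w.get("value")) for w in ET.fromstring(xml_text).iter("word")]


def find_matches(rule, tokens):
    """Start indices where the rule's constraints match consecutive tokens."""
    k = len(rule)
    return [i for i in range(len(tokens) - k + 1)
            if all(tokens[i + j].get(attr) == value for j, (attr, value) in enumerate(rule))]


# Hypothetical tagger output for "Acesta este cel mai bun telefon" (En. "This is the best phone").
tokens = [
    {"LEMMA": "acesta", "POS": "PRONOUN"},
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "DETERMINER"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
    {"LEMMA": "telefon", "POS": "NOUN"},
]
rule = load_rule(RULE_XML)
print(find_matches(rule, tokens))  # [2]
```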
4. Applications – review sites
- to assess the reviews and ratings about your company or yourself;
- to summarize reviews.
Our work: consumers' behaviour, civic identity [Gîfu et al., 2013]
6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and the-supporter.
- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.
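Such features can be computed per post and fed to any classifier that assigns one of the 6 profiles. A minimal sketch of a few style and emotional-class features; the feature names, the tiny emotion lexicon (keyed by lowercased word forms rather than lemmas) and the choice of features are hypothetical and are not the feature set of [Gîfu et al., 2013].

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: word form -> emotional classes (class names as in AnaDiP-2010).
EMOTION_LEXICON = {"genial": ["positive", "spectacular"], "clevetitor": ["negative", "sadness"]}
CLASSES = ["positive", "negative", "anxiety", "anger", "sadness", "spectacular", "firmness", "moderation"]


def post_features(text):
    """A few style features plus the share of tokens in each emotional class."""
    words = re.findall(r"\w+", text)
    n = max(len(words), 1)  # avoid division by zero on empty posts
    counts = Counter(c for w in words for c in EMOTION_LEXICON.get(w.lower(), []))
    features = {
        "n_tokens": len(words),
        "exclamations": text.count("!"),
        # share of all-caps words longer than one letter ("shouting")
        "uppercase_ratio": sum(1 for w in words if len(w) > 1 and w.isupper()) / n,
    }
    features.update({f"class_{c}": counts[c] / n for c in CLASSES})
    return features


f = post_features("Genial! Un om GENIAL.")
print(f)
```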
4. Applications – politics/sociology
Two dimensions in politics:
1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];
2. to clarify the politicians' positions, in order to enhance the quality of information that voters have access to [Bansal et al., 2008; Gîfu, 2013b].
In sociology:
- how ideas and innovations are propagated [Rosen, 1974]; e.g. polls on different issues.
CONCLUSIONS AND DISCUSSIONS
SA - a complex task;
SA - an emerging discipline with promising academic and, most importantly, industrial applications;
... the sentiment classification problem is more challenging.
Future work...
- to develop an independent sentiment classifier using machine learning methods;
- to compare the results of machine-learning sentiment classification with those of traditional topic-based categorization;
- to analyse the sentiment lexicon of Old Romanian in terms of diachronic semantics.