“Alexandru Ioan Cuza” University of Iași, Faculty of Computer Science

SEMANTICS AND PRAGMATICS OF NATURAL LANGUAGE (SEMANTICA ȘI PRAGMATICA LIMBAJULUI NATURAL)

Daniela GÎFU

Iași, 09 Oct. 2014


SENTIMENT ANALYSIS – AN OVERVIEW

Lecture no. 2

IMPACT OF TOPIC

Sentiment Analysis (SA) - one of the most active topics in NLP.

SA - makes it possible to monitor, identify and understand in real time consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.

SA - very popular in social media.

- Target: academia and industry.

PURPOSE AND MOTIVATION

- to create a complete state-of-the-art (SOTA) survey of SA, with a focus on social media posts.

- to enhance the results of context-based SA.

- to clarify the behavior of the receiver (the reader), who is affected by the multitude of information on forums.

- to improve the performance of SA classifiers based on two approaches (machine learning and lexicon).

CONTENT

1. Introduction
2. A general view on the subject
3. SA levels
   3.1. SA at document level
   3.2. SA at clause/sentence level
   3.3. Feature-based SA
   3.4. Comparative sentiment analysis
   3.5. Sentiment lexicon acquisition
   3.6. Conclusions
4. Applications
   4.1. Business and government
   4.2. Review sites
   4.3. Other domains: politics and sociology
   4.4. Conclusions
5. Conclusions and discussions

2. A general view on the subject

SA - a module for extracting opinions, sentiments and subjectivity from text;

SA - terminology:

- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor 2014].

3. Sentiment classification techniques

Fig. 1 Sentiment classification techniques

3. SA levels - document

a) supervised approach

Fig. 2 Supervised learning for three classes: positive, negative, neutral

3. SA levels - document

Fig. 2 Python NLTK Demos for Natural Language Text Processing

a) supervised approach

http://text-processing.com/demo/
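A minimal sketch of a supervised three-class classifier, assuming a bag-of-words multinomial Naive Bayes model with add-one smoothing. This is not the implementation behind the NLTK demo above; the training sentences, the function names train_naive_bayes and classify, and the choice of Naive Bayes are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log prior of the class
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothed word likelihood
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TOY_TRAINING = [
    ("great phone love it", "positive"),
    ("excellent battery great screen", "positive"),
    ("terrible phone hate it", "negative"),
    ("awful battery terrible screen", "negative"),
    ("the phone has a screen", "neutral"),
    ("it ships with a battery", "neutral"),
]

model = train_naive_bayes(TOY_TRAINING)
print(classify(model, "great screen"))
```

A real system would train on a large annotated corpus and use richer features than whitespace-separated tokens.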

3. SA levels - document

b) unsupervised approach

Based on determining the semantic orientation (SO) of specific words/phrases.

1. Sentiment lexicon (words/expressions) [Taboada et al., 2011]

2. Set of predefined POS patterns [Turney, 2002]
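A minimal sketch of how the semantic orientation of a phrase can be estimated in the style of [Turney, 2002], where SO is the log-ratio of the phrase's co-occurrence with a positive reference word ("excellent") versus a negative one ("poor"). The hit counts below are invented for illustration, and the function names and the smoothing parameter are assumptions of this sketch.

```python
import math


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO(phrase) = log2( hits(phrase NEAR pos) * hits(neg)
                          / (hits(phrase NEAR neg) * hits(pos)) ).

    near_pos / near_neg: co-occurrence counts of the phrase with the
    positive / negative reference word.
    hits_pos / hits_neg: overall counts of the reference words.
    A small smoothing constant avoids division by zero.
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def document_orientation(phrase_scores):
    """Average the SO of all extracted phrases; the sign gives the class."""
    average = sum(phrase_scores) / len(phrase_scores)
    return "positive" if average > 0 else "negative"


# Hypothetical counts for two phrases extracted by POS patterns.
scores = [
    semantic_orientation(near_pos=120, near_neg=15, hits_pos=1000, hits_neg=1000),
    semantic_orientation(near_pos=8, near_neg=40, hits_pos=1000, hits_neg=1000),
]
print(scores, document_orientation(scores))
```

The lexicon-based alternative replaces the co-occurrence counts with values looked up in a prepared sentiment dictionary.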

3. SA levels – clause/sentence

More complex: identifying whether a sentence is opinionated and establishing the nature of the opinion;

- using supervised methods;

1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003]

2. an approach based on minimum cuts [Pang and Lee, 2004]

The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?

3. SA levels – features

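- several entities for each analyzed text, or several attributes for each entity;

- extraction of the attributes of an object;

Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)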

- extract and store all NPs;

- keep only NPs with frequency above a learned-by-experiments threshold [Hu and Liu, 2004]
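The two steps above can be sketched as follows, assuming the noun phrases (NPs) have already been extracted from each review by a chunker. The function name frequent_features, the min_support parameter and the toy reviews are illustrative assumptions, not part of [Hu and Liu, 2004].

```python
from collections import Counter


def frequent_features(noun_phrases_per_review, min_support):
    """Keep NPs that occur in at least min_support (a fraction) of reviews."""
    counts = Counter()
    for phrases in noun_phrases_per_review:
        # count each NP at most once per review
        counts.update({phrase.lower() for phrase in phrases})
    n_reviews = len(noun_phrases_per_review)
    return sorted(
        phrase for phrase, count in counts.items()
        if count / n_reviews >= min_support
    )


# Toy input: NPs already extracted from four reviews (invented data).
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["screen", "battery", "my cousin"],
    ["price", "battery"],
]
print(frequent_features(reviews, min_support=0.5))
```

The threshold is the value the slide describes as learned by experiments; 0.5 here is chosen only to make the toy example readable.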

3. SA levels – comparative

- When a user does not offer a direct opinion about a product, but compares it with another one. [Jindal and Liu, 2006]

Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)

- adverbial adjectives: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - than, against).

These keywords cover 98% of the comparative opinions.
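A minimal sketch of keyword spotting for comparative sentences, using only the Romanian keywords listed above. It is a high-recall candidate filter, not a full classifier; the function names and the diacritic normalization step are assumptions of this sketch.

```python
import re

# Keywords taken from the slide; longer expressions are listed first.
COMPARATIVE_KEYWORDS = ["mai mult", "mai puțin", "cel puțin", "mai", "decât", "împotriva"]


def normalize(text):
    """Lowercase and unify the cedilla and comma-below forms of ț and ș."""
    return text.lower().replace("ţ", "ț").replace("ş", "ș")


def find_comparative_keywords(sentence):
    """Return the keywords that occur as whole words in the sentence."""
    text = normalize(sentence)
    found = []
    for keyword in COMPARATIVE_KEYWORDS:
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(keyword)
    return found


print(find_comparative_keywords("Dacia Logan arată mult mai bine decât Dacia Solenza."))
```

Sentences flagged this way are only candidates: a word such as "mai" also appears in non-comparative contexts, so a second classification step is still needed.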

3. SA levels – sentiment lexicon

a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]

Our work: AnaDiP-2010 inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.

<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>

3. SA levels – sentiment lexicon

Our software performs part-of-speech (POS) tagging and lemmatization of words.

For example:

<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
  …
</lexic>
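A minimal sketch of how such a lexicon could be loaded and its class ids resolved to class names, using Python's standard xml.etree.ElementTree module. Only the two entries shown on the slide are included; the function names load_classes and load_lexicon are assumptions, and this is not the actual loader of the software described here.

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""

# Only the two entries visible on the slide.
LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""


def load_classes(xml_text):
    """Map class id -> class name."""
    root = ET.fromstring(xml_text)
    return {node.get("id"): node.get("name") for node in root.iter("class")}


def load_lexicon(xml_text, class_names):
    """Map lemma -> list of class names."""
    root = ET.fromstring(xml_text)
    lexicon = {}
    for node in root.iter("word"):
        ids = node.get("classes").split(",")
        lexicon[node.get("lemma")] = [class_names[class_id] for class_id in ids]
    return lexicon


class_names = load_classes(CLASSES_XML)
lexicon = load_lexicon(LEXICON_XML, class_names)
print(lexicon)
```

Each lemma lists its full path in the class hierarchy, so "clevetitor" resolves to emotional, negative and sadness.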

3. SA levels – sentiment lexicon

b) corpus-based approaches - a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents on a single domain.

- a classical work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, nor, either).

Examples:

bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)

femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)

băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
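A minimal sketch of the idea behind the connector-based approach: once each conjoined adjective pair has been judged as having the same or the opposite orientation, the polarity of a few seed words can be propagated through the resulting graph. The same/opposite judgments and the seed polarities below are assumptions made for illustration; in the original work they come from a trained model, which this sketch does not reproduce.

```python
from collections import deque


def propagate(seeds, links):
    """Spread polarity (+1 / -1) from seed adjectives along links.

    seeds: dict adjective -> +1 or -1
    links: list of (adjective_a, adjective_b, same_orientation)
    """
    graph = {}
    for a, b, same in links:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for neighbour, same in graph.get(current, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[current] if same else -polarity[current]
                queue.append(neighbour)
    return polarity


# Pairs from the examples above; the True/False flags are assumed judgments.
links = [
    ("puternic", "armonios", True),    # "puternic şi armonios"
    ("senzuală", "inteligentă", True), # "senzuală sau inteligentă"
    ("sărmană", "înstărită", False),   # "sărmană sau înstărită"
]
seeds = {"puternic": 1, "sărmană": -1}
print(propagate(seeds, links))
```

The two "sau" examples show why the connector alone is not enough: the same connector links adjectives of equal orientation in one pair and of opposite orientation in the other.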

4. Applications – business and government

“Why aren’t consumers buying our laptop?” when the price is good and the weight clearly matches consumers’ wishes. [Lee, 2004]

Two kinds of answers:

- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky)

or

- misperceptions (which matter even though they are wrong)

Solution: by tracking consumers’ opinions, one can predict trends in sales, etc. [Mishne & Glance, 2006].

4. Applications – business and government

Solution based on a dictionary + the semantic role of negations and pragmatic connectors:

- classification of emotionally charged words into two classes: positive and negative (plus a neutral class);

- more classes, associating each word with a value in the range -5 to +5;

- [Gîfu and Cristea, 2012a] a scale in the interval -3 to +3;

- [Gîfu and Scutelnicu, 2013] a scale of values from -1 to +1.
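A minimal sketch of dictionary-based scoring on the -1 to +1 scale, with a simple negation rule and a helper that converts between the scales listed above. The lexicon values, the single negation word "nu" and the function names are hypothetical; the cited systems use richer rules, including pragmatic connectors, which are not modelled here.

```python
# Hypothetical lemma values on the -1..+1 scale.
LEXICON = {"genial": 1.0, "bun": 0.5, "odios": -1.0, "oribil": -1.0}
NEGATIONS = {"nu"}  # assumption: only "nu" (En. "not") is handled


def score_lemmas(lemmas, lexicon=LEXICON, negations=NEGATIONS):
    """Sum lexicon values; a negation flips the next sentiment word."""
    total = 0.0
    negate = False
    for lemma in lemmas:
        if lemma in negations:
            negate = True
            continue
        value = lexicon.get(lemma)
        if value is not None:
            total += -value if negate else value
            negate = False
    return total


def rescale(value, old_max, new_max):
    """Convert a value from a -old_max..+old_max scale to -new_max..+new_max."""
    return value * new_max / old_max


print(score_lemmas(["film", "genial"]))      # lemmas of "film genial"
print(score_lemmas(["nu", "fi", "bun"]))     # lemmas of "nu este bun"
print(rescale(score_lemmas(["nu", "fi", "bun"]), old_max=1, new_max=5))
```

The input is a list of lemmas, which matches a pipeline where POS tagging and lemmatization run before the lexicon lookup.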

4. Process phases: POS-tagger & NER & Anaphora Resolution

<DOCUMENT>
  <P ID="1">
    <S ID="1">
      <W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
      <NP HEADID="11.2" ID="0" ref="0">
        <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
        <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
        <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
        <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
        <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
        <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
        <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
      </NP>
      <NP HEADID="11.9" ID="1" ref="1">
        <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
        <NP HEADID="11.10" ID="2" ref="2">
          <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
          <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
          <NP HEADID="11.12" ID="3" ref="3">
            <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
          </NP>
        </NP>
      </NP>
    </S>
  </P>
</DOCUMENT>

4. Process phases: POS-tagger & NER & Anaphora Resolution

Fig. 3 The interface of the EAT system

4. Applications – business and government

- 46 rules for values.

<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>

Ex: cel mai bun (En. - the best)

<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="LEMMA" value="bun"/>
</rule>
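A minimal sketch of how a rule of this kind could be matched against a tagged sentence: each rule is a sequence of (attribute, value) constraints that must hold for consecutive tokens. The token dictionaries, the POS value given to the verb, and the function name find_matches are assumptions; how the actual system applies its 46 rules, and what value each rule assigns, is not shown in the slides.

```python
# The first rule from the slide: "cel" + "mai" + any adjective.
RULE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]


def find_matches(tokens, rule):
    """Return the start positions where all rule constraints match in order."""
    matches = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(token.get(attr) == value for token, (attr, value) in zip(window, rule)):
            matches.append(start)
    return matches


# Tagged tokens for "este cel mai bun" (En. "is the best"); attributes are illustrative.
tokens = [
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
]
print(find_matches(tokens, RULE))
```

Because the third constraint tests the POS attribute, the same rule also covers other superlatives built with "cel mai" followed by an adjective.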

4. Applications – review sites

- to assess the reviews and ratings about your company or yourself;

- to summarize reviews.

Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]

6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.

- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.

4. Applications – politics/sociology

Two dimensions in politics:

1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];

2. to clarify the politicians’ positions, in order to enhance the quality of information that voters have access to [Bansal et al., 2008; Gîfu, 2013b]

In sociology:

- how ideas and innovations are propagated [Rosen, 1974]

Ex: the polls on different issues

CONCLUSIONS AND DISCUSSIONS

SA - a complex task;

SA - an emerging discipline with promising academic and, most importantly, industrial applications;

....the sentiment classification problem - more challenging

Future work...

- to develop an independent sentiment classifier using machine learning methods;

- to compare machine learning results on sentiment classification with those on traditional topic-based categorization;

- to analyse the sentiment lexicon of old Romanian in terms of diachronic semantics.

Thank you for your attention!

?