Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam,...

31
Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book

Transcript of Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam,...

Page 1: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Natural Language ProcessingSentiment Analysis

Potsdam, 7 June 2012

Saeedeh MomtaziInformation Systems Group

based on the slides of the course book

Page 2: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Sentiment Analysis

---------------------------------------------------------------------------------------------------------

Saeedeh Momtazi | NLP | 07.06.2012

2

Page 3: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Outline

1 Applications

2 Task

3 Machine Learning Approach

4 Rule-based Approach

Saeedeh Momtazi | NLP | 07.06.2012

3

Page 4: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Outline

1 Applications

2 Task

3 Machine Learning Approach

4 Rule-based Approach

Saeedeh Momtazi | NLP | 07.06.2012

4

Page 5: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Hotel Reviews

Saeedeh Momtazi | NLP | 07.06.2012

5

Page 6: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Product Reviews

Picture Quality

Ease of Use

Size

Weight

Color

Zoom

Saeedeh Momtazi | NLP | 07.06.2012

6

Page 7: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Social Media

Saeedeh Momtazi | NLP | 07.06.2012

7

Page 8: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Event Analysis and Prediction

Analyzing the side effects of events in different communitiesPredicting the election resultsPredicting the Stock exchange...

Saeedeh Momtazi | NLP | 07.06.2012

8

Page 9: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Outline

1 Applications

2 Task

3 Machine Learning Approach

4 Rule-based Approach

Saeedeh Momtazi | NLP | 07.06.2012

9

Page 10: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Sentiment Analysis Levels

�� ��Text

⇓ ⇓�� ��Opinion�� ��Fact

⇓ ⇓�� ��+�� ��−

⇓ ⇓

⇓ ⇓�

�happy

surprised...

�angry

afraid...

Saeedeh Momtazi | NLP | 07.06.2012

10

Page 11: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Advanced Sentiment Analysis

Opinion holderOpinion target / aspect

�Students︸ ︷︷ ︸ like Wikipedia︸ ︷︷ ︸ because it is easy to use and it sounds authoritative.

op holder target�

�I had a nice stay in this hotel and the rooms︸ ︷︷ ︸ were very clean.

. aspect

Mixed opinions��

��The restaurant has an amazing view but it is very dirty.

Saeedeh Momtazi | NLP | 07.06.2012

11

Page 12: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Other Names

Opinion miningOpinion extractionSentiment miningSubjectivity detectionSubjectivity analysis

Saeedeh Momtazi | NLP | 07.06.2012

12

Page 13: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Sentiment Analysis Approaches

Machine learning methods⇒classification

Rule-based methods⇒ dictionary oriented

Saeedeh Momtazi | NLP | 07.06.2012

13

Page 14: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Outline

1 Applications

2 Task

3 Machine Learning Approach

4 Rule-based Approach

Saeedeh Momtazi | NLP | 07.06.2012

14

Page 15: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Machine Learning Approach

Training

T1 → C1T2 → C2

...

Tn → Cn

−−−−→

f1f2...

fn

−−−−→�� ��Model

Testing

Tn+1 →? −−−−→ fn+1 −−−−→

←−−−−−−−−

Cn+1

Saeedeh Momtazi | NLP | 07.06.2012

15

Page 16: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Sentiment Classification

Using any kinds of supervised classifiersK Nearest NeighborSupport Vector MachinesNaïve BayesMaximum EntropyLogistic Regression...

Saeedeh Momtazi | NLP | 07.06.2012

16

Page 17: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Features

Word

All words or adjectives?All words works better than adjectives only

Word occurrence or frequency?Word occurrence is more useful than frequency

Using binary value for wordsReplace all word counts higher than 0 in each text by 1

Saeedeh Momtazi | NLP | 07.06.2012

17

Page 18: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Features

Negation

Negation words change the text polarityAdding prefix NOT− to every word between negation and nextpunctuation

�� ��“I did not like the restaurant location, but the food ...”

I did not NOT-like NOT-the NOT-restaurant NOT-location but the food ...

Saeedeh Momtazi | NLP | 07.06.2012

18

Page 19: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Features

Other emotions

Considering emoticons as additional features:):(

Saeedeh Momtazi | NLP | 07.06.2012

19

Page 20: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Fine-grained Analysis

Dealing with finer classes of sentiment-3,-2,-1,+1,+2,+3

ApproachesUsing multiclass classifier (6 classes in this case)Using two level classifier

First level: polarity classifier (positive or negative)Second level: strength classifier (1 or 2 or 3)

Saeedeh Momtazi | NLP | 07.06.2012

20

Page 21: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Outline

1 Applications

2 Task

3 Machine Learning Approach

4 Rule-based Approach

Saeedeh Momtazi | NLP | 07.06.2012

21

Page 22: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Training

T1 → C1T2 → C2...

Tn → Cn

goodlove

braveintelligent

nice...

badhatelie

uglypoor...

Testing

Tn+1 →? −−−−−−−→

←−−−−

Cn+1

Saeedeh Momtazi | NLP | 07.06.2012

22

Page 23: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Looking for opinionated words in each textClassifying the text based on the number of positive andnegative words

Considering different rules for classificationFine-grained dictionaryNegation wordsBooster wordsIdiomsEmoticonsMixed opinionsLinguistic features of the language

Saeedeh Momtazi | NLP | 07.06.2012

23

Page 24: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Fine-grained Dictionary

�� ��“It was a good song.”

�� ��“The song was excellent.”

Saeedeh Momtazi | NLP | 07.06.2012

24

Page 25: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Negation Words

�� ��“The song was good.”

�� ��“The song was not good.”

Saeedeh Momtazi | NLP | 07.06.2012

25

Page 26: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Booster Words

�� ��“The song was interesting.”

�� ��“The song was very interesting.”

�� ��“The song was somewhat interesting.”

Saeedeh Momtazi | NLP | 07.06.2012

26

Page 27: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Idioms

�� ��“shock horror”

Saeedeh Momtazi | NLP | 07.06.2012

27

Page 28: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

Mixed Opinions

�� ��“The song was good, but I think its title was strange.”

Saeedeh Momtazi | NLP | 07.06.2012

28

Page 29: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Rule-based Approach

German Linguistic Features

�“I do not love the song.” ⇒ “Ich liebe nicht das Lied.”

“Ich liebe das Lied nicht.”

Saeedeh Momtazi | NLP | 07.06.2012

29

Page 30: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Opinion Dictionary

English

Subjectivity Clues (2005)SentiSpin (2005)SentiWordNet (2006)Polarity Enhancement (2009)SentiStrength (2010)

German

GermanPolarityClues (2010)SentiWortSchatz (2010)GermanSentiStrength (2012)

Saeedeh Momtazi | NLP | 07.06.2012

30

Page 31: Sentiment Analysis Potsdam, 7 June 2012 · Natural Language Processing Sentiment Analysis Potsdam, 7 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the

Machine Learningwith Opinion Dictionary

Using opinion words as a feature in the algorithmsIgnoring other words in the text

Adjectives alone do not work well,but opinion words are the best features to be used

Saeedeh Momtazi | NLP | 07.06.2012

31