Sentiment Analysis: State of the Art in Research and Industry. 16.9.2016, SDS 2016, Zurich. Mark Cieliebak, Zurich University of Applied Sciences.


Sentiment Analysis

State of the Art in Research and Industry

16.9.2016 – SDS 2016 – Zurich

Mark Cieliebak Zurich University of Applied Sciences


Mark Cieliebak

+ PhD in Theoretical Computer Science

+ IT Consultant in Major Swiss Bank

+ CIO at Netbreeze (bought by Microsoft)

+ >30 Publications

Lecturer, Conference Chair, CEO

SwissText


Sentiment Analysis

Goal: Decide whether a text expresses positive or negative emotion.

"This is a nice conference!"


Insights for Marketing and Sales

Sentiment Analysis can identify trends in Social Media


Characteristics of Sentiment Analysis

Labels:

• Positive

• Negative

• Neutral

• Mixed

• (unknown)

Tasks:

• Single sentence

• Complete document

• Specific aspect/target

• Quantification
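The labels and the quantification task can be sketched in a few lines of Python. This is only an illustration: the `Sentiment` enum and the `quantify` helper are hypothetical names, and "quantification" is assumed to mean estimating the share of each label across a set of texts (here with the simple classify-and-count approach).

```python
from collections import Counter
from enum import Enum


class Sentiment(Enum):
    """The five labels listed on the slide."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"
    UNKNOWN = "unknown"


def quantify(labels):
    """Classify-and-count: return the share of each label in a list of predictions."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: counts[label] / total for label in Sentiment}


# Hypothetical per-text predictions for four texts.
predictions = [
    Sentiment.POSITIVE,
    Sentiment.POSITIVE,
    Sentiment.NEGATIVE,
    Sentiment.NEUTRAL,
]
for label, share in quantify(predictions).items():
    print(f"{label.value}: {share:.2f}")
```

The per-sentence, per-document, and per-aspect tasks differ only in what counts as one "text"; quantification aggregates over many of them.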


Sentiment Analysis sounds easy

…but it isn't

@francesco_con40 2nd worst QB. DEFINITELY Tony Romo. The man who likes to share the ball with everyone. Including the other team

Tim Tebow may be availible ! Wow Jerry , what the heck you waiting for ! http://t.co/a7z9FBL4

@prodnose is this one of your little jokes like Elvis playing at the Marquee next Tuesday?

#YouCantDateMe if u still sag ur pants super hard...dat shit is played the fuck out!!!
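A naive word-counting baseline shows why such tweets are hard. The sketch below is only an illustration: the two word lists and the `toy_polarity` function are made up for this example and are not a method from the talk.

```python
# Toy lexicon-based polarity check. The word lists are hypothetical and
# far too small for real use; they only illustrate the idea.
POSITIVE_WORDS = {"nice", "good", "great", "likes", "love"}
NEGATIVE_WORDS = {"bad", "awful", "boring", "hate", "worst"}


def toy_polarity(text):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    # Lowercase and strip simple punctuation so "conference!" matches "conference".
    tokens = [tok.strip(".,!?\"'") for tok in text.lower().split()]
    positive_hits = sum(tok in POSITIVE_WORDS for tok in tokens)
    negative_hits = sum(tok in NEGATIVE_WORDS for tok in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


print(toy_polarity("This is a nice conference!"))
print(toy_polarity(
    "The man who likes to share the ball with everyone. Including the other team"
))
```

Both calls return "positive". The second sentence, taken from the first tweet, reads as sarcasm to most people, which word counting cannot detect.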


A Remark about Tool Quality

"They all suck…and we suck, too."

CEO of a sentiment analysis company (2013)


Evaluation of Commercial Sentiment Analysis Tools in 2013

7 Text Corpora

• Single statements

• Various media types (tweets, news, reviews, speech transcripts, etc.)

• Total: 28'653 texts

9 Commercial APIs

• Stand-alone

• Free for this evaluation

• English text


Quality of Commercial Tools in 2013

[Figure: F1-score (axis from 0.0 to 0.7). Series: Best Tool per Corpus; Overall Best Tool (Sentigem).]

Source: M. Cieliebak et al.: Potential and Limitations of Commercial Sentiment Detection Tools, ESSEM 2013.


Quality of Commercial Tools in 2013

[Figure: F1-score (axis from 0.0 to 0.7). Series: Worst Tool per Corpus; Best Tool per Corpus; Overall Best Tool (Sentigem).]

Source: M. Cieliebak et al.: Potential and Limitations of Commercial Sentiment Detection Tools, ESSEM 2013.


SemEval: International Competition for Sentiment Analysis

Task: Build a system for sentiment analysis (pos, neg, neutral) on tweets in English

Year  Winning Team  F1-Score  Winning Technology                            Remarks
2013  NRC Canada    69.02     Features + large dictionaries                 First run of the competition
2014  TeamX         72.12     Similar approach as in 2013                   First two participants using deep learning
2015  Webis         64.84     Ensemble of 4 approaches from previous years  -
2016  SwissCheese   63.30     CNN + Distant Supervision                     30'000 new tweets; dominance of deep learning among submissions
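The slide only says "F1-Score". As a sketch, and assuming the metric is the one commonly used for this SemEval task (the average of the F1 scores of the positive and the negative class, with neutral counted only through its errors), the computation looks like this. The function names and the example labels are made up for illustration.

```python
def f1_for_class(gold, pred, label):
    """F1 score of one class from parallel lists of gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_pos_neg(gold, pred):
    """Average of the F1 scores of the 'pos' and 'neg' classes."""
    return (f1_for_class(gold, pred, "pos") + f1_for_class(gold, pred, "neg")) / 2


# Hypothetical gold labels and system output for six tweets.
gold = ["pos", "pos", "neg", "neutral", "neg", "neutral"]
pred = ["pos", "neutral", "neg", "neutral", "pos", "neutral"]
print(f"F1(pos/neg) = {100 * f1_pos_neg(gold, pred):.2f}")
```

In this example F1 is 0.5 for the positive class and 2/3 for the negative class, so the reported score is about 58.33 on the 0 to 100 scale used in the table.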


Did Sentiment Technology Improve?

[Figure: F1-score (axis from 60 to 80) for the years 2013, 2014, 2015, 2016. Series: Winner of the Respective Year.]


Did Sentiment Technology Improve?

[Figure: F1-score (axis from 60 to 80) for the years 2013, 2014, 2015, 2016. Series: Winner of the Respective Year; Winner of 2016.]

Red line: performance of the SemEval winner from 2016 (SwissCheese) when trained only on the training data for each year


A Shallow Dive into Technology


SwissCheese: 3-Phase Training with Distant Supervision

[Diagram: 3-phase training on Twitter data, followed by application of the trained model.]

3-Phase Training:

• Raw Tweets (200M): word2vec / GloVe yield Word Embeddings

• Smiley Tweets (90M): Distant Supervision yields Adapted Word Embeddings

• Annotated Tweets (18k): training the 2-Layer CNN yields the Predictive Model

Application:

• Unknown Tweet: the Predictive Model outputs :-) or :-(

Example word embeddings shown in the diagram:

cat   0.1  0.9  0.3
cats  0.3  0.2  0.7
cute  0.2  0.3  0.1
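The distant supervision step can be sketched as follows: tweets containing a smiley are labeled automatically by that smiley, which is then removed so a model cannot simply memorize it. This is a minimal illustration; the emoticon lists, the handling of conflicting emoticons, and the name `distant_label` are assumptions, not details taken from the SwissCheese system.

```python
# Hypothetical emoticon lists; a real system would use a much larger set.
POSITIVE_EMOTICONS = (":-)", ":)", ":D")
NEGATIVE_EMOTICONS = (":-(", ":(")


def distant_label(tweet):
    """Return (cleaned_text, label), or None if the tweet gives no usable signal."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        # Either no emoticon at all or conflicting emoticons: skip the tweet.
        return None
    cleaned = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(emoticon, "")
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    label = "positive" if has_pos else "negative"
    return cleaned, label


for tweet in ["Great talk today :-)", "Missed my train :(", "Just a plain tweet"]:
    print(tweet, "->", distant_label(tweet))
```

The resulting labels are noisy, which is why this phase sits between unsupervised embedding training and the final training on manually annotated tweets.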


2-Layer Convolutional Neural Network for Sentiment Analysis
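The slide shows the network only as an image. As a rough sketch of the basic building block of such a network (one convolution over consecutive word vectors, followed by max-over-time pooling), the following pure-Python example uses the toy embeddings from the previous slide. The filter weights, the ReLU activation, and the filter width of two words are assumptions for illustration; the actual architecture stacks two such layers and learns its weights.

```python
# Toy embeddings copied from the previous slide (3 dimensions per word).
EMBEDDINGS = {
    "cat": [0.1, 0.9, 0.3],
    "cats": [0.3, 0.2, 0.7],
    "cute": [0.2, 0.3, 0.1],
}


def conv1d(sequence, kernel, bias=0.0):
    """Slide a filter spanning len(kernel) words over the sequence, then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        activation = bias + sum(
            w * x
            for word_vec, kernel_row in zip(window, kernel)
            for x, w in zip(word_vec, kernel_row)
        )
        outputs.append(max(0.0, activation))  # ReLU
    return outputs


def max_pool(feature_map):
    """Max-over-time pooling: keep the strongest response of the filter."""
    return max(feature_map)


sentence = ["cute", "cats", "cat"]
vectors = [EMBEDDINGS[word] for word in sentence]
# One hypothetical filter covering two consecutive words; the weights are made up.
kernel = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
feature_map = conv1d(vectors, kernel)
feature = max_pool(feature_map)
print(feature_map, feature)
```

With three words and a filter width of two, the feature map has two entries; pooling reduces it to a single number per filter regardless of tweet length.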


Distant Phase rearranges Word Embeddings

[Figure: word embeddings in two panels: Before the Distant Phase; After the Distant Phase.]


The More Data, The Better!

[Figure: performance (axis from 0.4 to 0.75) versus the number of annotated tweets and versus the number of tweets in the distant phase. Test sets: test2013-task-B, test2014-task-B, test2015-task-B, test2016-task-A.]


Learn on Tweets, Classify News?

Cross-Domain Performance of SemEval Winner 2016

Train \ Test                SemEval'13_tweets  MPQ_reviews  DIL_reviews  DAI_tweets  Union of All Test Data
SemEval'13_tweets           72.4               45.8         53.1         62.2        63.9
MPQ_reviews                 62.2               54.1         40.9         57.8        58.7
DIL_reviews                 57.3               36.8         55.1         48.5        52.9
DAI_tweets                  67.9               37.7         50.4         70.8        60.4
Union of All Training Data  73.0               50.8         49.9         76.6        66.6

Measured in F1 score
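To put a number on the cross-domain gap, the short calculation below averages the in-domain scores (training and test corpus are the same) and the cross-domain scores (they differ). It is a sketch that assumes rows are training corpora and columns are test corpora, and it leaves out the "Union" row and column; the variable names are illustrative.

```python
DOMAINS = ["SemEval'13_tweets", "MPQ_reviews", "DIL_reviews", "DAI_tweets"]

# F1[train][test index], copied from the table (union row and column omitted).
F1 = {
    "SemEval'13_tweets": [72.4, 45.8, 53.1, 62.2],
    "MPQ_reviews": [62.2, 54.1, 40.9, 57.8],
    "DIL_reviews": [57.3, 36.8, 55.1, 48.5],
    "DAI_tweets": [67.9, 37.7, 50.4, 70.8],
}

# Diagonal entries: trained and tested on the same corpus.
in_domain = [F1[name][i] for i, name in enumerate(DOMAINS)]
# Off-diagonal entries: trained on one corpus, tested on another.
cross_domain = [
    F1[train][j]
    for i, train in enumerate(DOMAINS)
    for j in range(len(DOMAINS))
    if i != j
]

mean_in = sum(in_domain) / len(in_domain)
mean_cross = sum(cross_domain) / len(cross_domain)
print(f"in-domain mean F1:    {mean_in:.1f}")
print(f"cross-domain mean F1: {mean_cross:.1f}")
```

The in-domain mean is 63.1 and the cross-domain mean is about 51.7, a drop of roughly 11 F1 points when the training and test domains differ.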


Sentiment for Other Languages

Language  Available Data                  Best Known Result (F1 Score)  Reference
German    10'000 Tweets                   64.19                         Deriu et al., 2016, WSDM (submitted)
Spanish   68'000 Tweets                   71.1 (precision)              Villena-Roman et al., 2013, Procesamiento del Lenguaje Natural
Italian   7'000 Tweets                    65.87                         Deriu et al., 2016, WSDM (submitted)
Dutch     1'100 Tweets (labeled pos/neg)  88.33                         Deriu et al., 2016, WSDM (submitted)
Arabic    1'100 Tweets                    73.5                          Salab et al., 2015, ANLP


We did it: Theory is over!


Do It Yourself:

Sentiment Analysis Tools and APIs

Big Players

• Google Prediction API

• IBM AlchemyAPI

• Microsoft Azure Text Analytics API

NLP Specialists

• RapidMiner

• Repustate

• Semantria

• SentiStrength

• SpinningBytes

Development Toolkits

• Natural Language Toolkit, NLTK (Python)

• StanfordNLP (Java)


Understand Customer Reviews

Source: http://blog.aylien.com/aspect-based-sentiment-analysis-now-available-in/

Example: Aspect-based Sentiment Analysis for Hotel Reviews


Use Twitter to Predict Heart Disease Mortality

Source: Eichstaedt et al., 2015: Psychological Language on Twitter Predicts County-Level Heart Disease Mortality


"Cleantechness" of Company Products

[Diagram: a Company Website is fed to an Automatic Classifier, which assigns Cleantech Topics.]

Cleantech Topics:

• Air and Environment

• Disaster Prevention

• Energy Production

• Energy Transportation

• Energy Efficiency

• Mobility


Age and Gender of "Anonymous" Users

Goal: Predict age (18-24, 25-34, 35-49, 50+) and gender (male/female) of Twitter users

Results PAN 2015:

Age: 86%

Gender: 84%

Source: Rangel et al., 2015: Overview of the 3rd Author Profiling Task at PAN 2015


Talk in Short!

Sentiment Analysis:

• reaches approx. 70% F1-score

• the more data, the better

• has important applications


Thanks!!

Mark Cieliebak

Zurich University of Applied Sciences (ZHAW)

Email: [email protected], Website: www.zhaw.ch/~ciel

This presentation is based on joint work with:

• Aurelien Lucchi, ETH

• Dominic Egger, ZHAW

• Fatih Uzdilli, ZHAW

• Jan Deriu, ZHAW

• Leon Derczynski, Univ. of Sheffield

• Martin Jaggi, EPFL

• Maurice Gonzenbach, ZHAW

• Valeria de Luca, ETH