Potential and Limitations of Commercial Sentiment Detection Tools

17
Zurich Universitiy of Applied Sciences Potential and Limitations of Commercial Sentiment Detection Tools Fatih Uzdilli joint work with Mark Cieliebak and Oliver Dürr 03.12.2013 @ ESSEM’13

description

In this paper, we analyze the quality of several commercial tools for sentiment detection. All tools are tested on nearly 30’000 short texts from various sources, such as tweets, news, reviews etc. In addition to the quality analysis (measured by various metrics), we also investigate the effect of increasing text length on the performance. Finally, we show that combining all tools using machine learning techniques increases the overall performance significantly.

Transcript of Potential and Limitations of Commercial Sentiment Detection Tools

Page 1: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Potential and Limitations of Commercial Sentiment Detection Tools

Fatih Uzdillijoint work with Mark Cieliebak and Oliver Dürr

03.12.2013 @ ESSEM’13

Page 2: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

About Me

Fatih UzdilliInstitute of Applied Information Technology (InIT)

ZHAW, Winterthur, Switzerland

Email: , more about me: home.zhaw.ch/~uzdi

Research Interest

Information Retrieval, Machine Learning, Sentiment Analysis

Background

Software Engineer, Social Media Monitoring, Search Technologies

203.12.2013 Fatih Uzdilli

Page 3: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Abstract

Evaluation of 9 commercial sentiment tools on approx. 30'000 short texts.

Best commercial tools have accuracy of only 60%.

Combining all tools using Random Forest improved the accuracy.

303.12.2013 Fatih Uzdilli

Page 4: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Motivation

• Scientific results for sentiment detection: «very good performance: > 80%

accuracy»

• Blog posts about commercial tools: «very poor quality,

unusable»4

C

D03.12.2013 Fatih Uzdilli

Page 5: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Motivation

• Scientific results for sentiment detection: «very good performance: > 80%

accuracy»

• Blog posts about commercial tools: «very poor quality,

unusable»5

C

DReally?

03.12.2013 Fatih Uzdilli

Page 6: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

How good is commercial Sentiment Detection?

Is there potential for improvement?

6

source: http://www.commute.com/images/schools_evaluation.jpg source: http://3.bp.blogspot.com/-u3acK_WjaLU/ULYv51mHEhI/ AAAAAAAAARY/ DIZqOfxuswc/s1600/IcebergQ1.jpg

03.12.2013 Fatih Uzdilli

Page 7: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Evaluation Setup

• 7 Public Text Corpora– Single Statements– Different Media Types

• Tweet, News, Review, Speech Transcript

– Total: 28653 Texts

• 9 Commercial APIs– Stand-alone– Free for this evaluation– Arbitrary Text

NEGATIVEPOSITIVE

OTHER( neutral / mixed )

703.12.2013 Fatih Uzdilli

Page 8: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Tool Accuracy

C1(Tweets)

C2(Quotations)

C3(Reviews)

C4(Headlines)

C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per Corpus

Average of All Tools

Worst Tool per Corpus

Acc

ura

cy

8

61%

52%

40%

Avg.

03.12.2013 Fatih Uzdilli

Page 9: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Tool Accuracy

C1(Tweets)

C2(Quotations)

C3(Reviews)

C4(Headlines)

C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusOverall Best Tool

Acc

ura

cy

9

61%52%40%59%45%

Avg.

03.12.2013 Fatih Uzdilli

Page 10: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Further Findings

• Longer texts are hard to classify

• Corpus annotations might be erroneous

1003.12.2013 Fatih Uzdilli

Page 11: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Can a Meta-Classifier do better?

• 1st Approach: Majority Classifier– Sentiment with most votes chosen

Illustration:

1103.12.2013 Fatih Uzdilli

api1 api2 api3 api4 api5 api6 api7 MajorityText 1 + + - o - + o +Text 2 - + + - - - - -Text 3 - o + + + + - +Text n o o + o - o o o

Page 12: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Tool Accuracy

C1(Tweets)

C2(Quotations)C3(Reviews)

C4(Headlines)C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per Corpus

Average of All Tools

Worst Tool per Corpus

Acc

ura

cy

1203.12.2013 Fatih Uzdilli

Page 13: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Majority Classifier beats Average

C1(Tweets)

C2(Quotations)C3(Reviews)

C4(Headlines)C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier

Acc

ura

cy

1303.12.2013 Fatih Uzdilli

Page 14: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

2nd Approach: Random-Forest

1403.12.2013 Fatih Uzdilli

api1 api2 api3 … api n

annotation

Text 1 + - + … o +

Text 2 - + o … + -

Text 3 - o - … + -

Text 4 + o + … - +

Text 5 + o + … o o

Text 6 + o o … - o

Text 7 + - + … o unknown

Text 8 + + o … - unknown

Text 9 o - + … o unknown

Random Forest

Classifier

+

+

o

Train

Train

Train

Train

Predict

Predict

Predict

Train

Train

Page 15: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Before Random Forest

C1(Tweets)

C2(Quotations)C3(Reviews)

C4(Headlines)C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier

Acc

ura

cy

1503.12.2013 Fatih Uzdilli

Page 16: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Random Forest Beats Best Single Tool

C1(Tweets)

C2(Quotations)C3(Reviews)

C4(Headlines)C5(Reviews)

C6(Reviews)C7(News)

0.2

0.3

0.4

0.5

0.6

0.7

0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier

Acc

ura

cy

1603.12.2013 Fatih Uzdilli

Page 17: Potential and Limitations of Commercial Sentiment Detection Tools

Zurich Universitiy of Applied Sciences

Summary

• Best Tool: 59% Accuracy• Random Forest combination: Up to 9% improvement

<=9%

1703.12.2013 Fatih Uzdilli