What Business Innovators Need to Know about Sentiment Analysis, Claire Cardie

Post on 26-Jan-2015

What Business Innovators Need to Know about Sentiment Analysis, presented by Prof. Claire Cardie of Cornell University.

What Business Innovators Need to Know about Sentiment Analysis

Claire Cardie

Department of Computer Science
Chair, Information Science Department

Cornell University

Co-founder, Chief Scientist

Plan for the Talk

Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level

Next steps…

Subjective Language

Subjective text expresses speculations, beliefs, emotions, evaluations, goals, opinions, judgments, …

• Jill said, "I hate Bill."
• John thought about whom to vote for.
• Seth knew his symposium would go well.

Subjectivity vs. Sentiment

Sentiment-bearing text expresses positive and negative speculations, beliefs, emotions, evaluations, goals, opinions, judgments,…

• Jill said, "I hate Bill." (-)
• John thought about whom to vote for. (~)
• Seth knew his symposium would go well. (+)

sentiment analysis tome [Pang & Lee, 2008]

A Word on Polarity (tone, valence)

Positive: “I love NY.”
Negative: “I hate NY.”

Neither positive nor negative
– Objective? “I thought about NY.”
– Neutral? “I’m ambivalent about NY.”
– Mixed polarity? “Sometimes I love NY; other times I hate it.”

And What About Intensity?

Strength/intensity

– Low, medium, high, very high, extreme
– Ratings
– Rotten Tomatoes

“I love NY.”

“I absolutely adore NY!”

Plan for the Talk

Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level

Next steps…

Document-level Sentiment Analysis

Is the overall sentiment in the document positive? negative? neutral?

Identifying Tone of a Collection

Sentiment (w.r.t. a topic)
– Example: Tone on “economic stimulus”

Detecting “chatter” or “buzz”

Chatter (w.r.t. a topic)
– Example: Buzz on “economic stimulus”

Keyword-based Approaches

Search the text for the presence of specific terms from a manually created “sentiment lexicon”
– +: “great”, “praise”, “peace”, “superb”, …
– -: “war”, “dull”, “messy”, “criticize”, …

Sentiment is based on the counts
– E.g., if more positive terms than negative terms, then return +, else return -
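The counting rule above can be sketched in a few lines of Python. This is a minimal illustration only: the lexicon holds just the example terms from the slide, and the function name `keyword_sentiment` and the simple tokenizer are assumptions, not part of the talk.

```python
import re

# Tiny illustrative lexicon: only the example terms listed on the slide.
POSITIVE = {"great", "praise", "peace", "superb"}
NEGATIVE = {"war", "dull", "messy", "criticize"}


def keyword_sentiment(text):
    """Return "+" if positive terms outnumber negative terms, else "-"."""
    # Hypothetical tokenizer: lowercase the text and keep runs of letters.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    # Ties (including "no lexicon terms at all") fall through to "-",
    # exactly as the rule on the slide is worded.
    return "+" if pos > neg else "-"


print(keyword_sentiment("A superb film, full of praise for peace."))  # +
print(keyword_sentiment("A dull and messy plot."))                    # -
```

Note that a text with no lexicon terms at all is labeled "-" under this rule, which is one reason real systems usually add a neutral class.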

Keyword-based Approaches

Complications
– Inherent ambiguities of language…
  – This laptop is a great deal.
  – A great deal of media attention surrounded the release of the new laptop model.
  – If you think this laptop is a great deal, I’ve got a nice bridge for you to buy.

[Examples from Lillian Lee]

[Pang & Lee, 2008]
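To make the complication concrete, the sketch below counts occurrences of the lexicon word "great" in the three example sentences. The one-word lexicon and the helper name `count_positive` are assumptions for illustration. All three sentences get the same count, although only the first is a positive opinion; the second is not an opinion at all and the third is sarcastic.

```python
import re

# A one-word lexicon is enough to show the problem.
POSITIVE = {"great"}

SENTENCES = [
    "This laptop is a great deal.",
    "A great deal of media attention surrounded the release of the new laptop model.",
    "If you think this laptop is a great deal, I’ve got a nice bridge for you to buy.",
]


def count_positive(text):
    """Count tokens of the text that appear in the positive lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in POSITIVE for token in tokens)


for sentence in SENTENCES:
    # Every sentence scores 1, so counting alone cannot tell them apart.
    print(count_positive(sentence), sentence)
```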

Machine-learning Approaches

Learn from training data
Are better able to take advantage of context to disambiguate terms

[Diagram: examples → ML algorithm → statistical model (program); (novel) examples → statistical model → class]
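The flow in the diagram (labeled examples go into a learning algorithm, which produces a statistical model that assigns a class to novel examples) can be sketched with a small classifier. The talk does not name a specific algorithm; multinomial Naive Bayes with add-one smoothing is swapped in here as one common choice, and the four training sentences are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Build the model (plain counts) from (text, label) training pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for a novel text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
TRAINING = [
    ("this laptop is a great deal", "+"),
    ("superb screen and great battery", "+"),
    ("a dull and messy design", "-"),
    ("a great deal of trouble and a dull screen", "-"),
]

model = train(TRAINING)
print(classify(model, "superb battery"))  # +
print(classify(model, "messy and dull"))  # -
```

Unlike the fixed lexicon, the model here learns from data that "great" appears in both classes, so that word alone carries less weight.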

Measuring Performance

Precision: #correct / #attempted
Recall: #correct / #possible
F-measure: harmonic mean of P and R

[Figure: four numbered items (1 to 4), labeled "accuracy"]

P = 3 / 4 = .75
P = 3 / 3 = 1.00
R = 3 / 4 = .75
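The three measures can be computed directly from the counts in their definitions. The sketch below uses the figures from the slide (3 correct out of 3 attempted, with 4 possible); the function name `precision_recall_f` is an assumption.

```python
def precision_recall_f(correct, attempted, possible):
    """Compute precision, recall, and F-measure (harmonic mean of P and R)."""
    precision = correct / attempted
    recall = correct / possible
    if precision + recall == 0:
        return precision, recall, 0.0  # nothing correct: define F as 0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = precision_recall_f(correct=3, attempted=3, possible=4)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=1.00 R=0.75 F=0.86
```

Because the harmonic mean is pulled toward the smaller value, a system cannot earn a high F-measure by maximizing precision alone or recall alone.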

Measuring Performance

How well do document-level sentiment analysis systems work?

It depends…
– Product reviews easier than movie reviews, easier than news/editorials
– Shorter documents harder than longer ones
– Messy documents harder than clean ones

~75 F to ~85 F

This is actually quite good…

Comparison is not vs. 100% P/R… but vs. human sentiment analysis accuracy
– Cohen’s kappa

Machine-learning methods for sentiment analysis approach human agreement levels
– ~85 F: for positive/negative
– ~75 F: when neutrals are included
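Cohen's kappa measures how much two annotators agree beyond what chance alone would produce: kappa = (observed agreement - expected agreement) / (1 - expected agreement). The sketch below is a minimal version under stated assumptions: the two label sequences are made up, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    # Undefined when expected == 1 (both annotators always use one label).
    return (observed - expected) / (1 - expected)


# Made-up annotations of eight documents, for illustration only.
annotator_1 = ["+", "+", "-", "-", "+", "-", "+", "-"]
annotator_2 = ["+", "+", "-", "+", "+", "-", "-", "-"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.5
```

Here the annotators agree on 6 of 8 documents (0.75), but half of that agreement is expected by chance, so kappa is 0.5.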

Sentiment Analysis at Passage Level

Passage tone
– Optionally w.r.t. a topic
– E.g., AIG or Geithner

The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace him, despite the president’s insistence to Leno that Geithner is doing "an outstanding job".

Sentiment Analysis at Phrase Level

Fine-grained opinion analysis
Identify who is saying what about what

Fine-Grained Sentiment Extraction

The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace him, despite the president’s insistence to Leno that Geithner is doing "an outstanding job".

Fine-Grained Sentiment Extraction

– Opinion trigger
– Polarity
– Intensity
– Opinion holder
– Target (topic)

…the president insisted to Leno that Geithner is doing "an outstanding job".

Opinion Frame
Polarity: positive
Intensity: high
Opinion Holder: “the president”
Target: “Geithner”
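One way to picture an opinion frame is as a small record with one field per component. The sketch below is an illustration, not the representation used in the talk: the class name `OpinionFrame` and its field names are assumptions, and the trigger value "insisted" is a reading of the example sentence, since the frame shown on the slide does not list a trigger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionFrame:
    """One fine-grained opinion: who expresses what about what."""
    trigger: str    # word or phrase that signals the opinion
    polarity: str   # e.g. "positive", "negative", "neutral"
    intensity: str  # e.g. "low", "medium", "high"
    holder: str     # who holds the opinion
    target: str     # what the opinion is about


# Frame for: ...the president insisted to Leno that Geithner is doing
# "an outstanding job".
frame = OpinionFrame(
    trigger="insisted",  # assumption: not given in the slide's frame
    polarity="positive",
    intensity="high",
    holder="the president",
    target="Geithner",
)
print(frame.holder, "->", frame.target, ":", frame.polarity, frame.intensity)
```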

Example – fine-grained opinions

[Figure: seven opinion frames annotated on the passage below]

The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… the president insisted to Leno that Geithner is doing "an outstanding job".

Example – Opinion Summary

[Figure: opinion summary graph linking AIG, Obama, Menendez, Geithner, and Americans]

Example – Opinion Summary

Summarize thoughts and views across documents
– Critical addition: opinion holder
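A summary of this kind can be sketched as a grouping of opinion frames by (opinion holder, target) pair, counting polarities across documents. The function name `summarize` and the tuple layout are assumptions. Only the president/Geithner frame comes from the slides; the other frames, including the one from a second document, are hypothetical and included just to show the aggregation.

```python
from collections import Counter, defaultdict


def summarize(frames):
    """Group (holder, target, polarity) frames by holder/target pair."""
    summary = defaultdict(Counter)
    for holder, target, polarity in frames:
        summary[(holder, target)][polarity] += 1
    return {pair: dict(counts) for pair, counts in summary.items()}


FRAMES = [
    ("the president", "Geithner", "positive"),  # from the example passage
    ("the president", "Geithner", "positive"),  # hypothetical second document
    ("Americans", "AIG bonuses", "negative"),   # assumed reading of "infuriated"
]

for (holder, target), counts in summarize(FRAMES).items():
    print(f"{holder} -> {target}: {counts}")
```

Keying the summary on the opinion holder as well as the target is what separates it from document-level tone: the same target can be viewed positively by one holder and negatively by another.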

What makes this hard?

Same issues of ambiguity as before plus…
Need to associate opinion with topic and with opinion holder
Requires different machine learning methods
Requires many language-processing modules

Noun Phrase Coreference Resolution

Ng & Cardie [2002, 2003]; Stoyanov & Cardie [2006, 2008]

The suggestion that the White House never took seriously an issue that infuriated millions of Americans was supported by Senator Robert Menendez, a New Jersey Democrat who claimed that several weeks earlier he warned Timothy Geithner, the Treasury secretary, that AIG was planning to use taxpayer funds to pay out $165m in bonuses… speculation that Obama will have to replace Geithner, despite the president’s insistence to Leno that he is doing "an outstanding job".

Performance

opinion extraction

OH extractor

link classifier

79F

69F

82F

Choi, Breck & Cardie [2006, 2007]

–<opinion holder> expresses <opinion>

Plan for the Talk

Subjectivity and sentiment in language
Continuum of capabilities
– Surface-level to in-depth understanding
– Document-level to phrase-level

Next steps…

Next Steps…

Predicting business outcomes from opinions
– Doable in some settings

Determining the key influencers

Thank you!

Questions?