
Page 1: Sentiment Analysis of Scientific Citations Awais Athar Natural Language and Information Processing Group, Computer Lab Supervised by: Dr. Simone Teufel.

Sentiment Analysis of Scientific Citations

Awais Athar

Natural Language and Information Processing Group, Computer Lab

Supervised by: Dr. Simone Teufel


Sentiment Analysis

Sentiment Analysis focuses on identifying positive and negative opinions, emotions or expressions in a given text.

Subjectivity Analysis


Example: Movie Reviews

Can we do it automatically?


Simple Sentiment Analysis

This movie is absolutely HILARIOUS!!! I hated the Spice Girls before my friend made me watch this movie, and now I LOVE them! This movie is one of the funniest movies I've ever seen in my life, and I watch comedies all the time. This is definitely my new favorite movie.

Sentiment = sign(Number of positive words - Number of negative words) = sign(4 - 1) = sign(3) = +ve
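The word-counting rule above can be sketched in a few lines of Python. This is only an illustration: the two word sets are hypothetical mini-lexicons picked to reproduce the slide's count of 4 positive and 1 negative word, not a lexicon used in the talk.

```python
import re

# Hypothetical mini-lexicons (assumption): chosen only so that the review
# from the slide yields 4 positive words and 1 negative word.
POSITIVE = {"hilarious", "love", "funniest", "favorite"}
NEGATIVE = {"hated"}


def simple_sentiment(text):
    """Return +1, -1 or 0: the sign of (#positive words - #negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    diff = pos - neg
    return (diff > 0) - (diff < 0)


review = (
    "This movie is absolutely HILARIOUS!!! I hated the Spice Girls before "
    "my friend made me watch this movie, and now I LOVE them! This movie is "
    "one of the funniest movies I've ever seen in my life, and I watch "
    "comedies all the time. This is definitely my new favorite movie."
)

print(simple_sentiment(review))  # sign(4 - 1) = +1
```

A rule this simple has no notion of context, which is what the next slide probes.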


Does it always work?

I hate the Spice Girls. I hate how their music is so … I hate how they promote … And I hate how they're all over … Why I saw this movie is a really, really, really long story, but I did, and one would think I'd despise every minute of it. But... Okay, I'm really ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie, a wannabe … filled with excuses for them to act wacky as hell… the ninth floor of hell … a cheap ass cameo in a cheap ass movie. The plot is such a mess that it's terrible. But I loved it.


http://www.imdb.com/reviews/111/11181.html


I work on scientific text…

• Scientific papers cite other papers
• A citation is any mention of another document
• Used in citation indexes for search


What do researchers think about this paper?

[Figure: four pairs of counts from the slide image: 35 / 237, 43 / 151, 6 / 75, 18 / 163]

Is citation count a good measure?


A citation sub-graph

[Figure: citation graph over the papers J93-2003, P02-1039, W03-1002, N04-1021 and N09-1025]


Colour the edges

[Figure: citation graph over the papers J93-2003, P02-1039, W03-1002, N04-1021 and N09-1025, with coloured edges]


After a top-sort

J93-2003 P02-1039 W03-1002 N04-1021 N09-1025
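A topological sort of a citation graph can be sketched with Python's standard `graphlib` module (Python 3.9+). The slide text does not list the edges of the sub-graph, so the `cites` mapping below is a hypothetical set of edges, chosen so that every paper cites only earlier ones and the resulting order matches the one shown above.

```python
from graphlib import TopologicalSorter

# Hypothetical edges (assumption): each key cites the papers in its value set.
# The real edges of the sub-graph are not recoverable from the slide text.
cites = {
    "J93-2003": set(),
    "P02-1039": {"J93-2003"},
    "W03-1002": {"J93-2003", "P02-1039"},
    "N04-1021": {"P02-1039", "W03-1002"},
    "N09-1025": {"W03-1002", "N04-1021"},
}

# TopologicalSorter treats the value sets as predecessors, so every cited
# paper is emitted before the papers that cite it.
order = list(TopologicalSorter(cites).static_order())
print(" ".join(order))
```

With these assumed edges the order is unique, because the graph contains a chain through all five papers.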


Why not reuse existing classifiers?

• Sentiment is often hidden

• Often neutral

While SCL has been successfully applied to POS tagging and Sentiment Analysis (Blitzer et al., 2006), its effectiveness for parsing was rather unexplored.

There are five different IBM translation models (Brown et al., 1993).


Scientific Text

• Negative polarity is often expressed in contrastive terms

• Variation in lexicon

This method was shown to outperform the class based model proposed in (Brown et al., 1992) . . .

Similarity-based smoothing (Dagan, Lee, and Pereira 1999) provides an intuitively appealing approach to language modeling.


Scientific Text

• Technical terms play a major role

• Scope of influence of citations varies widely

Current state of the art machine translation systems (Och, 2003) use phrasal (n-gram) features . . .

As reported in Table 3, small increases in METEOR (Banerjee and Lavie, 2005), BLEU (Papineni et al., 2002) and NIST scores (Doddington, 2002) suggest that . . .


Applications

• Determining the quality of a paper for ranking in citation indexes by including negative citations in the weighting scheme

• Identifying contributions of some research work in the domain.

• Identifying shortcomings and detecting problems in a particular approach

• Recognising unaddressed issues and possible gaps in current research approaches.

• Identifying personal bias of an author by observing their criticism trends.


Task 1

Given a formal citation, predict its sentiment


Corpus for Citation Sentiment Analysis

• Manually annotated 8736 citations
• From 310 research papers
• ACL Anthology (Bird et al., 2008)

Distribution of Sentiment across Citations

Objective: 7541
Negative: 293
Positive: 902


Features

Word Level
• N-grams
• Parts of Speech
• Science Lexicon

Contextual Polarity*
• Subjectivity Clues
• Negation Phrases
• Valence Shifters

Sentence Structure
• Dependency Structures
• Sentence Splitting
• Negation

* Wilson et al. 2009


Word Level Features

• N-grams: “The results were good”
– Unigrams: The, results, were, good
– Bigrams: The results, results were, were good
– Trigrams: The results were, results were good

• Parts of Speech
– This lead to good results
– DT VBP TO JJ NNS
– This/DT lead/VBP to/TO good/JJ results/NNS
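The n-gram features listed above can be reproduced with a short helper. This is a minimal sketch that assumes simple whitespace tokenisation; the helper name `ngrams` is illustrative and not taken from the talk.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Whitespace tokenisation is an assumption made for this example.
tokens = "The results were good".split()
features = {n: ngrams(tokens, n) for n in (1, 2, 3)}

for n, grams in features.items():
    print(n, grams)
```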


Word Level Features

• Science Specific Sentiment Lexicon
– 83 manually extracted polar phrases
– From 736 citations
– Negative: complicated, daunting, deficiencies, degrade, difficult, inability, lack, poor, restrict, unexplored, worse
– Positive: acceptance, accurately, adequately, aided, appealing, bestperforming, better …


Contextual Polarity Features

• Adjectives
• Adverbs
• Subjectivity Clues
– Strong / Weak
– Positive / Negative
• Cardinal Numbers
• Modal Auxiliary Verbs (can, may, could, might, …)
• Negation Phrases (no, not, never, …)
• Polarity Shifters (so-called effort)


Sentence Structure Features

Dependency Relations

<CIT> showed that the results for French-English were competitive

[Figure: dependency parse of the sentence above, output from the Stanford parser, with arcs labelled nsubj, ccomp, complm, det, prep, nsubj, pobj and cop]

The relationship between results and competitive will be missed by trigrams but the dependency representation captures it in the nsubj(competitive, results) feature.


Sentence Structure Features

Sentence trimming: removing irrelevant polar phrases around a citation might improve results.


Sentence Trimming Algorithm


Sentence Structure Features

Negation

“Turney’s method did not work_neg well_neg although they reported 80% accuracy in <CIT>.”

All words inside a k-word window of any negation term are suffixed with the token _neg to distinguish them from their non-polar versions.
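The negation marking can be sketched as follows. Two assumptions are made here that the slide does not state: the window covers the k words that follow the negation term, and k = 2 (which reproduces the example sentence). The negation terms are the ones listed earlier in the deck (no, not, never); the function name is illustrative.

```python
# Negation terms taken from the "Negation Phrases (no, not, never, ...)" slide.
NEGATION_TERMS = {"no", "not", "never"}


def mark_negation(tokens, k=2):
    """Suffix the k tokens following each negation term with "_neg".

    Assumption: the window extends only to the right of the negation term.
    """
    marked = []
    remaining = 0  # how many upcoming tokens still fall inside the window
    for tok in tokens:
        if tok.lower() in NEGATION_TERMS:
            marked.append(tok)
            remaining = k
        elif remaining > 0:
            marked.append(tok + "_neg")
            remaining -= 1
        else:
            marked.append(tok)
    return marked


sentence = "Turney's method did not work well although they reported 80% accuracy in <CIT>"
marked = mark_negation(sentence.split(), k=2)
print(" ".join(marked))
```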


Classifier

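• Support Vector Machine
• 10-fold cross-validation

[Figure: linear SVM with weight vector $\mathbf{w}$, decision boundary $\mathbf{w}\cdot\mathbf{x} + b = 0$, and margin hyperplanes $\mathbf{w}\cdot\mathbf{x} + b = 1$ and $\mathbf{w}\cdot\mathbf{x} + b = -1$ separating the blue points from the red points]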


Kernel Trick

$$\varphi(\mathbf{x}) \to \left(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\right)$$
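For this particular feature map, a well-known identity is that the dot product in the mapped space equals the squared dot product in the original space, $\varphi(\mathbf{x})\cdot\varphi(\mathbf{y}) = (\mathbf{x}\cdot\mathbf{y})^2$, so a kernel can be evaluated without building the mapped vectors. The sketch below checks this numerically; the two input points are arbitrary values chosen for illustration.

```python
import math


def phi(x):
    """Explicit quadratic feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


# Arbitrary example points (assumption, not from the slides).
x, y = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(y))  # dot product in the mapped 3-D space
implicit = dot(x, y) ** 2       # kernel evaluated in the original 2-D space

print(explicit, implicit)
```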


Evaluation

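Citations: 7541 objective, 293 negative, 902 positive

$$F_{micro} = F_o \frac{7541}{8736} + F_n \frac{293}{8736} + F_p \frac{902}{8736} \approx 0.87 F_o + 0.03 F_n + 0.10 F_p$$

$$F_{macro} = \frac{F_o + F_n + F_p}{3}$$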
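The two averages can be compared with a small calculation. The class counts are the ones from the corpus slide; the per-class F-scores below are made-up illustrative values, not results from the talk. `f_micro` follows the slide's definition, which weights each class's F-score by its share of the citations.

```python
# Class sizes from the corpus slide.
COUNTS = {"objective": 7541, "negative": 293, "positive": 902}


def f_micro(f_scores):
    """Class-frequency-weighted F-score, as defined on the slide."""
    total = sum(COUNTS.values())
    return sum(f_scores[c] * COUNTS[c] / total for c in COUNTS)


def f_macro(f_scores):
    """Unweighted mean of the per-class F-scores."""
    return sum(f_scores[c] for c in COUNTS) / len(COUNTS)


# Made-up per-class F-scores (assumption), for illustration only.
f_scores = {"objective": 0.9, "negative": 0.3, "positive": 0.6}

print(round(f_micro(f_scores), 3), round(f_macro(f_scores), 3))
```

Because the objective class dominates the corpus, the weighted score stays high even when the minority classes are classified poorly, while the macro average exposes that weakness.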


Results


Challenges in Citation Sentiment Analysis

• Negative sentiment is ‘politically dangerous’- (Ziman, 1968)

• Personal biases are hedged - (Hyland, 1995)

• Criticism is ‘sweetened’ - (MacRoberts and MacRoberts, 1984; Hornsey et al., 2008)

“While SCL has been successfully applied to POS tagging and Sentiment Analysis (Blitzer et al., 2006), its effectiveness for parsing was rather unexplored.”


Problem: Context is Ignored


Problem: Informal Citations Are Ignored

Current work assumes that the sentiment present in the citation sentence represents the true sentiment


Task 2

Given a sentence, predict whether or not it contains an informal citation


Corpus Construction

• Starting point: Athar's 2011 citation sentence corpus
• Select top 20 papers; treat all incoming citations to these
• 1,741 citations (from >850 papers)
• 4-class scheme
– objective/neutral
– positive
– negative
– excluded


View of the Annotation Tool

Demo


Distribution of Classes


Features: Formal Citation


Features: Author’s Name


Features: Acronyms


Features: Work Nouns (Teufel, 2010)


Features: Pronoun


Features: Connectors


Features: Section Markers


Features: Citation Lists


Features: Lexical Hooks


Features: n-Grams

• Used as baseline
• SVM
• 10-fold cross-validation
• F-score


Results


Task 3 (redefinition of Task 1?)

Given a citation, predict sentiment (taking informal citations into account)


Impact on Sentiment Detection

• n-grams of length 1 to 3
• Dependency triplets (Athar, 2011)
– det_results_The
– nsubj_good_results
– cop_good_were
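The triplet strings above follow a relation_head_dependent pattern. A minimal sketch of producing them is shown below; the parse of “The results were good” is written out by hand as an assumption rather than obtained from a parser, and the function name is illustrative.

```python
# Hand-written dependency parse of "The results were good" (assumption):
# each tuple is (relation, head, dependent).
parse = [
    ("det", "results", "The"),
    ("nsubj", "good", "results"),
    ("cop", "good", "were"),
]


def triplet_features(parse):
    """Turn (relation, head, dependent) tuples into relation_head_dependent strings."""
    return ["_".join(triple) for triple in parse]


triplets = triplet_features(parse)
print(triplets)
```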


Annotation Unit is the Citation

• Problem
– There may be more than 1 sentiment per citation
• Annotation unit = citation. Projection needed:
– For Gold Standard: assume last sentiment is what is really meant
– For Automatic Treatment: merge citation context into one single sentence


Results: Context Helps!

• SVM
• 10-fold cross-validation
• F-score


Back to the original question

[Figure: four pairs of counts from the slide image: 35 / 237, 43 / 151, 6 / 75, 18 / 163]

Is citation count a good measure?


Referenced Papers and Citation Count

• Traditional Measure: Citation count

• Misses informal citations
– 1 formal, 27 informal


• Most papers are cited out of ‘politeness, policy or piety’
– Ziman (1968)
• Out of 2,300 citations, 80% were cited only to point towards further information
– Spiegel-Rosing (1977)
• Out of 623 references, only 9% were of essential importance to the citing paper
– Hanney et al. (2005)


Task 4

Given a referenced paper, predict whether or not it is significant


Features


Features


New Features


Results


Class-based Comparison


Conclusion

• New large citation sentiment corpus
– more than 200,000 sentences
• Citation contexts carry subjective references
– ignoring them would result in loss of a lot of sentiment, especially criticism.
• Citation sentiment detection
– all forms of citations
– indirect mentions and acronyms.
• New task of detecting ‘in passing’ references


References

• A. Athar, “Detecting Sentiment in Scientific Citations”, PhD Thesis, Computer Lab, University of Cambridge. 2013 (expected)

• A. Athar and S. Teufel, “Detection of implicit citations for sentiment detection”, in Proc. of Workshop on Detecting Structure in Scholarly Discourse 2012, Jeju, Republic of Korea. 2012.

• A. Athar and S. Teufel, “Context-Enhanced Citation Sentiment Detection”, in Proc. of NAACL/HLT 2012, Montréal, Canada. 2012.

• A. Athar, “Sentiment Analysis of Citations using Sentence Structure-Based Features”, in Proc. of ACL 2011, Portland, Oregon, US. 2011.


Thank you!