NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf ·...

34
NLP for Social Media: POS Tagging, Sentiment Analysis Pawan Goyal CSE, IITKGP August 05, 2016 Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 1 / 23

Transcript of NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf ·...

Page 1: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

NLP for Social Media: POS Tagging, Sentiment Analysis

Pawan Goyal

CSE, IITKGP

August 05, 2016

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 1 / 23

Page 2: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Part-of-Speech (POS) Tagging

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 2 / 23

Page 3: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Penn Treebank POS Tags

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 3 / 23

Page 4: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Part-of-Speech (POS) Tagging

Words often have more than one POS

POS tagging problem is to determine the POS tag for a particular instance of aword.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 4 / 23

Page 5: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Part-of-Speech (POS) Tagging

Words often have more than one POS

POS tagging problem is to determine the POS tag for a particular instance of aword.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 4 / 23

Page 6: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Relevant papers and tool

Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das,Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, JeffreyFlanigan, and Noah A. Smith. Part-of-Speech Tagging for Twitter:Annotation, Features, and Experiments. In Proceedings of ACL 2011.

Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, NathanSchneider and Noah A. Smith. Improved Part-of-Speech Tagging forOnline Conversational Text with Word Clusters. In Proceedings of NAACL2013.

Parts-of-Speech tagget for twitter -http://www.cs.cmu.edu/~ark/TweetNLP/

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 5 / 23

Page 7: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

POS Tagset for Twitter

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 6 / 23

Page 8: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Twitter-specific Tags

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 7 / 23

Page 9: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

The case of Hashtags

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 8 / 23

Page 10: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Combined forms

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 9 / 23

Page 11: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Tagging Scheme

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 10 / 23

Page 12: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Tagging Scheme

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 11 / 23

Page 13: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Tagging Scheme

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 12 / 23

Page 14: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Building the POS tagger

CRF model was used. Some insighful features:

Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.

Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.

Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.

Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.

Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23

Page 15: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Building the POS tagger

CRF model was used. Some insighful features:

Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.

Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.

Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.

Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.

Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23

Page 16: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Building the POS tagger

CRF model was used. Some insighful features:

Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.

Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.

Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.

Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.

Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23

Page 17: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Building the POS tagger

CRF model was used. Some insighful features:

Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.

Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.

Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.

Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.

Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23

Page 18: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Building the POS tagger

CRF model was used. Some insighful features:

Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.

Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.

Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.

Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.

Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23

Page 19: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Related Problems

Entity RecognitionNamed Entity: Names of people, places, organization

Date and time

Can you model it as a sequence labeling problem?

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 14 / 23

Page 20: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Related Problems

Entity RecognitionNamed Entity: Names of people, places, organization

Date and time

Can you model it as a sequence labeling problem?

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 14 / 23

Page 21: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Sentiment Analysis

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 15 / 23

Page 22: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Sentiment Analysis: Relevant Paper

Pak, Alexander, and Patrick Paroubek. “Twitter as a Corpus for SentimentAnalysis and Opinion Mining.” LREC. Vol. 10. 2010.

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 16 / 23

Page 23: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Sentiment Analysis: Examples

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 17 / 23

Page 24: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Data Creation

How to do that without manual labeling?

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23

Page 25: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Data Creation

How to do that without manual labeling?

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23

Page 26: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Data Creation

How to do that without manual labeling?

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23

Page 27: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

POS tag Distribution: Subjective vs. Objective

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 19 / 23

Page 28: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

POS tag Distribution: Positive vs. Negative

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 20 / 23

Page 29: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Classification Model

Naïve Bayes Model: Main featuresPOS tags, Word n-grams

Using negations in Word n-grams

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 21 / 23

Page 30: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Classification Model

Naïve Bayes Model: Main featuresPOS tags, Word n-grams

Using negations in Word n-grams

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 21 / 23

Page 31: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Disregarding common n-grams

Using entropy and salience

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23

Page 32: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Disregarding common n-grams

Using entropy and salience

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23

Page 33: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

Disregarding common n-grams

Using entropy and salience

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23

Page 34: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as

N-grams with high salience and low entropy

Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 23 / 23