Extending SASI to Satirical Product Reviews: A Preview
Bernease Herman, University of Michigan
Monday, April 22, 2013
Satirical Amazon Reviews
For a fun list: http://www.geekosystem.com/funny-amazon-reviews/
Defining Irony, Sarcasm and Satire
• Irony: “the use of words to convey a meaning that is the opposite of its literal meaning”
• Sarcasm: “a sharply ironical taunt; sneering or cutting remark”
• Satire: “the use of irony, sarcasm, ridicule, or the like, in exposing, denouncing, or deriding vice, folly, etc.”
Sarcastic Review: Shure SE110 Sound Isolating Earphones
Satirical Review: BIC Cristal For Her ballpoint pens
Satirical Review: Zenith Men’s Defy Xtreme Titanium Watch
Semi-supervised Algorithm for Sarcasm Identification (SASI)
• Overview
• Data preprocessing
• Data enrichment
• Pattern features
• Punctuation features
• Additional features
• Classification
• Baseline options
• Summary
The algorithm detects sarcasm in individual sentences using a k-Nearest Neighbors-type classifier.
Features include pattern matching and punctuation.
Satire calls for additional features that are not present in the sarcasm model.
The classification baseline needs to be chosen from multiple options.
SASI is a sentence-based sarcasm detector, not a full-document one.
Semi-supervised Algorithm for Sarcasm Identification (SASI): Data preprocessing
Jindal and Liu (2008) provide a data set of 66,000 book and product reviews.
Filatova (2012) provides corpora of Amazon reviews labeled ironic, sarcastic, both, or regular.
• Specific products, authors, companies, and book titles were replaced with [product], [author], etc.
• HTML and special symbols were removed from text
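The two preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the preprocessing actually used for SASI: the `ENTITY_TAGS` table, the `[company]` tag name, and the set of symbols treated as "special" are all assumptions made for the example.

```python
import html
import re

# Hypothetical lookup from known entity strings to placeholder tags.
# How entities were actually identified is not stated; this table is an assumption.
ENTITY_TAGS = {
    "571B Banana Slicer": "[product]",
    "Hutzler": "[company]",
}

TAG_RE = re.compile(r"<[^>]+>")  # anything that looks like an HTML tag
# Assumed definition of "special symbols": everything outside this small set.
SYMBOL_RE = re.compile(r'''[^\w\s.,!?'"\[\]-]''')
SPACE_RE = re.compile(r"\s+")


def preprocess(text, entity_tags=ENTITY_TAGS):
    """Strip HTML and special symbols, then replace known entities with tags."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)  # decode entities such as &amp; before filtering
    # Replace longer names first so a short name cannot pre-empt a longer one.
    for name in sorted(entity_tags, key=len, reverse=True):
        text = re.sub(re.escape(name), entity_tags[name], text, flags=re.IGNORECASE)
    text = SYMBOL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```

Quotes, "!" and "?" are deliberately kept, since the punctuation features rely on them.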
Semi-supervised Algorithm for Sarcasm Identification (SASI): Data enrichment
Tsur et al. (2010) posited that sarcastic sentences co-appear with other sarcastic sentences. They gathered nearby sentences using the Yahoo! BOSS API, with seed sentences as queries.
This assumption proves true for satirical reviews, but not for sarcastic ones.
[Figure panels: Sarcasm, Satire]
Semi-supervised Algorithm for Sarcasm Identification (SASI): Pattern features
Via Davidov and Rappoport (2006, 2008):
• High frequency words (HFWs)
• Content words (CWs)
Example: What can I say about the 571B Banana Slicer that hasn't already been said about the wheel, penicillin or the iPhone…
• “What can I CW CW the”
• “I CW CW the [product]”
• “[product] that hasn’t CW been CW about”
• “about the CW”
• “CW or the CW”
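A rough sketch of how such patterns might be enumerated is below. It assumes a fixed set of HFWs is already available (in practice these would come from corpus frequency thresholds), treats every other word as a CW slot, and uses illustrative limits (`min_hfw`, `max_len`) that are not taken from the cited papers.

```python
import re

# Words and bracketed placeholder tags such as [product].
TOKEN_RE = re.compile(r"\[\w+\]|\w+(?:'\w+)?")


def abstract(sentence, hfws):
    """Keep HFWs and [tags]; replace every other token with the slot CW."""
    out = []
    for tok in TOKEN_RE.findall(sentence):
        if tok.startswith("["):
            out.append(tok)
        elif tok.lower() in hfws:
            out.append(tok.lower())
        else:
            out.append("CW")
    return out


def patterns(sentence, hfws, min_hfw=2, max_len=6):
    """Collect contiguous windows with >= min_hfw non-slot tokens and >= 1 CW slot."""
    toks = abstract(sentence, hfws)
    found = set()
    for start in range(len(toks)):
        last_end = min(start + max_len, len(toks))
        for end in range(start + 2, last_end + 1):
            window = toks[start:end]
            n_cw = window.count("CW")
            if n_cw >= 1 and len(window) - n_cw >= min_hfw:
                found.add(" ".join(window))
    return found
```

With a hypothetical HFW set such as `{"what", "can", "i", "the"}`, the opening of the banana slicer review yields "what can i CW CW the" and "i CW CW the [product]", matching the first two patterns listed above up to capitalization.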
Semi-supervised Algorithm for Sarcasm Identification (SASI): Punctuation features
Generic features regarding punctuation, all normalized to [0, 1].
• Sentence length in words
• Number of “!” characters
• Number of “?” characters
• Number of quotes in sentence
• Number of capitalized words or words in all capitals
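The five features above could be computed along these lines. Dividing each feature by its maximum observed value is one simple way to reach [0, 1]; it is assumed here for illustration, as is the whitespace tokenization.

```python
import string


def raw_punctuation_features(sentence):
    """Return the five raw counts listed above for one sentence."""
    words = sentence.split()
    return [
        len(words),                                   # sentence length in words
        sentence.count("!"),                          # number of "!" characters
        sentence.count("?"),                          # number of "?" characters
        sum(sentence.count(q) for q in '"“”'),        # straight and curly quotes
        # Words starting with a capital letter (this also covers ALL-CAPS words).
        sum(1 for w in words if w.strip(string.punctuation)[:1].isupper()),
    ]


def normalize(rows):
    """Scale each feature column to [0, 1] by its maximum observed value."""
    maxima = [max(col) or 1 for col in zip(*rows)]    # avoid dividing by zero
    return [[value / m for value, m in zip(row, maxima)] for row in rows]
```

Normalization needs the whole corpus (or at least the training set) at once, since the maxima are taken across sentences.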
Semi-supervised Algorithm for Sarcasm Identification (SASI): Additional features
• Burfoot and Baldwin (2009) introduced the notion of validity, which models absurdity via a measure close to pointwise mutual information (PMI). It is related to the number of made-up or mismatched named entities. It works well with satire, but not here.
• Absurdity of product
• Relevancy of product
• How often product is reviewed
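For reference, plain PMI between two items can be computed from co-occurrence counts as sketched below. This is only the textbook definition; the validity measure of Burfoot and Baldwin (2009) is described above as close to PMI, not identical to it, and the count-based interface here is an assumption.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2 of p(x, y) / (p(x) * p(y)).

    Counts are occurrences out of `total` observations, so the ratio
    simplifies to count_xy * total / (count_x * count_y).
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")  # never observed together (or not at all)
    return math.log2(count_xy * total / (count_x * count_y))
```

A value near zero means the two items co-occur about as often as chance predicts; strongly negative values indicate an unusual pairing, which is the intuition behind using it as an absurdity signal.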
Semi-supervised Algorithm for Sarcasm Identification (SASI): Classification
Classification is done via feature vectors, with a feature for each pattern in the training set.
Euclidean distance is computed to each of the training vectors that share at least one pattern with the vector being classified.
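A minimal sketch of this kNN-style step follows. It assumes the pattern features occupy the first `n_pattern` positions of each vector and that training labels are numeric scores; taking the plain average of the nearest labels is a simplification chosen for the example, and `k` and `default` are illustrative parameters.

```python
import math


def knn_score(query, training, n_pattern, k=5, default=0.0):
    """Average the labels of the k nearest training vectors that share a pattern.

    query: feature vector; its first n_pattern entries are pattern features.
    training: list of (vector, label) pairs with the same layout.
    """
    def shares_pattern(vec):
        # True if some pattern feature is active in both vectors.
        return any(q > 0 and v > 0
                   for q, v in zip(query[:n_pattern], vec[:n_pattern]))

    candidates = [(math.dist(query, vec), label)
                  for vec, label in training if shares_pattern(vec)]
    if not candidates:
        return default  # no training vector shares a pattern with the query
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

Filtering on shared patterns before measuring distance keeps unrelated sentences from voting just because their punctuation features happen to be similar.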
Semi-supervised Algorithm for Sarcasm Identification (SASI): Baseline options
Because SASI is semi-supervised, the classification algorithm takes advantage of the definition of sarcasm. It assumes a low star rating combined with text whose literal meaning is positive.
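The sarcasm heuristic can be illustrated with a toy rule like the one below. The word lists, the `max_stars` cutoff, and the word-counting notion of "positive literal meaning" are all placeholders invented for this sketch; a real baseline would use a proper sentiment lexicon or classifier.

```python
# Tiny illustrative lexicons; both are hypothetical stand-ins.
POSITIVE = {"great", "love", "perfect", "amazing", "best", "wonderful"}
NEGATIVE = {"bad", "broken", "terrible", "worst", "awful", "useless"}


def literal_polarity(text):
    """Count of positive words minus count of negative words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def is_sarcasm_seed(stars, text, max_stars=2):
    """Flag reviews with a low rating but literally positive wording."""
    return stars <= max_stars and literal_polarity(text) > 0
```

For example, a one-star review reading "Great earphones, I love replacing them monthly!" would be flagged, while the same text with five stars would not.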
Not as clear-cut with satire; options:
• Variation in rating for product
• Purchases vs. page views of product
• People finding review helpful
• Other heuristics
Semi-supervised Algorithm for Sarcasm Identification (SASI): Summary
Satire seems to have a distinct advantage in the data enrichment phase in comparison to sarcasm.
Satire seems to have a huge disadvantage in the baseline options for classification compared to sarcasm. This is the detail that must be worked out before moving forward with implementation.
Future Goals
Following the end of the course, I wish to implement SASI, taking the features mentioned today into account.
Extend the model to sarcasm in other domains.
Any questions or comments?