An Effective Statistical Approach to Blog Post Opinion Retrieval


Ben He, Craig Macdonald, Jiyin He, Iadh Ounis

(CIKM 2008)


Introduction

Blogs have recently emerged as a new grassroots publishing medium.

A key feature that distinguishes blog content from other Web content is its subjective nature.

Bloggers tend to express opinions and comments towards given targets, such as persons, organizations, or products.


In the TREC opinion-finding task, only a handful of groups achieved an improvement over their baseline, using techniques such as NLP or SVM classifiers.

These approaches either involve considerable manual effort in collecting evidence of opinion, or yield little improvement over a baseline that includes no opinion-finding feature.


This paper proposes a statistical, lightweight, automatic dictionary-based approach.

It also shows that, despite its apparent simplicity, this approach provides statistically significant improvements over robust baselines, including the best TREC baseline run, without any manual effort.


The Statistical Dictionary-based Approach to Opinion Retrieval

1. Automatically generates a dictionary from the collection without requiring manual effort.

2. Assigns a weight to each term in the dictionary, which represents how opinionated the term is.

3. Assigns an opinion score to each document in the collection using the top weighted terms from the dictionary as a query.

4. Appropriately combines the opinion score with the initial relevance score produced by the retrieval baseline.


Dictionary Generation

To derive the dictionary, we filter out terms that are too frequent or too rare in the collection.

We remove these terms because if a term appears too many or too few times in the collection, it likely carries too little, or too specific, information to generalize across different queries as an indicator of opinion.


We first rank all terms in the collection by their within-collection frequency, in descending order.

The terms whose ranks fall in the range (s·#terms, u·#terms) are selected for the dictionary.

We apply s = 0.00007 and u = 0.001.
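The frequency-based filter described above can be sketched as follows; the function name and the document representation (each document as a list of terms) are illustrative, not from the paper.

```python
from collections import Counter

def build_opinion_dictionary(documents, s=0.00007, u=0.001):
    """Rank all collection terms by within-collection frequency
    (descending) and keep those whose rank falls between
    s * #terms and u * #terms, dropping both the very frequent
    and the very rare terms."""
    freq = Counter()
    for terms in documents:          # each document is a list of terms
        freq.update(terms)
    ranked = [t for t, _ in freq.most_common()]
    n = len(ranked)
    return ranked[int(s * n):int(u * n)]
```

With the paper's s and u, a toy corpus of a few terms yields an empty dictionary; larger thresholds (e.g. s = 0.2, u = 0.8) make the cut-offs visible on small examples.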


Term Weighting

D(Rel): the relevant document set. D(opRel): the opinionated relevant document set. For each term t in the opinion term dictionary, we measure w_opn(t), the divergence of the term's distribution in D(opRel) from that in D(Rel).

This divergence value measures how a term stands out from the opinionated documents, compared with all relevant documents.

The higher the divergence is, the more opinionated the term is.


A commonly used measure for term weighting is the KL divergence from a term’s distribution in a document set to its distribution in the whole collection.
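A minimal sketch of such a KL-style term weight, assuming maximum-likelihood estimates of the term distributions; here the divergence is taken between the term's distribution in D(opRel) and in D(Rel), as on the previous slide. The variable names are mine, not the paper's.

```python
import math

def kl_weight(tf_oprel, tokens_oprel, tf_rel, tokens_rel):
    """KL-based opinion weight of a single term:
    P(t|opRel) * log2(P(t|opRel) / P(t|Rel)),
    where both probabilities are maximum-likelihood estimates
    (term frequency divided by total token count in the set)."""
    p = tf_oprel / tokens_oprel   # P(t | D(opRel))
    q = tf_rel / tokens_rel       # P(t | D(Rel))
    if p == 0.0 or q == 0.0:
        return 0.0                # term absent from one set: no evidence
    return p * math.log2(p / q)
```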


The KL divergence measure considers only the divergence from one distribution to the other, while ignoring how frequently a term occurs in the opinionated documents.

As a result, the weights of the terms in the opinion dictionary may be biased towards terms with high KL divergence values but low information content in the opinionated document set D(opRel).


Another method is the Bo1 term weighting model, which measures how informative a term is in the set D(opRel) against D(Rel).

λ = tf_rel / N_rel
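The slide gives only λ; the sketch below fills in the rest using the standard Bose-Einstein (Bo1) form from the DFR framework, which is an assumption on my part — the exact formulation used in the paper may differ.

```python
import math

def bo1_weight(tf_oprel, tf_rel, n_rel):
    """Bo1 informativeness of a term in D(opRel) against D(Rel).
    lam = tf_rel / n_rel is the term's average frequency over the
    N_rel relevant documents; tf_oprel is the term's frequency in
    the opinionated relevant set."""
    lam = tf_rel / n_rel
    return tf_oprel * math.log2((1 + lam) / lam) + math.log2(1 + lam)
```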


Generating the Opinion Score

We take the X top-weighted terms (X = 100 in the experiments) from the opinion dictionary, and submit them to the retrieval system as a query Q_opn.

The retrieval system assigns a relevance score to each document in the collection.

Such a relevance score reflects the extent to which the top weighted opinionated terms are informative in the document, capturing the overall opinionated nature of the document.

This is called the opinion score: Score(d, Qopn).
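Selecting the opinion query from the weighted dictionary can be sketched as follows (the function name is illustrative):

```python
def opinion_query(term_weights, x=100):
    """Return the x top-weighted dictionary terms. These are then
    submitted to the retrieval system as the query Q_opn, and the
    relevance score it assigns to each document d becomes the
    opinion score Score(d, Q_opn)."""
    ranked = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:x]]
```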


Score Combination

1. Linear combination

2. Logarithmic combination
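The combination formulas on this slide did not survive the transcript. The sketch below shows two generic schemes of the kinds named above, with a mixing weight a for the linear form and a scaling constant k for the logarithmic one; these are assumptions for illustration, not the paper's exact formulas.

```python
import math

def combine_linear(score_rel, score_opn, a=0.25):
    """Linear mixture of the baseline relevance score and the
    opinion score (both assumed to be on comparable scales)."""
    return (1 - a) * score_rel + a * score_opn

def combine_log(score_rel, score_opn, k=250):
    """Dampen the opinion score logarithmically before adding it
    to the relevance score, so large opinion scores cannot
    dominate the ranking."""
    return score_rel + math.log(1 + score_opn / k)
```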


Experiment: Data

Dataset: the Blog06 collection. We use the permalinks, which are the blog posts and their associated comments. Each term is stemmed using Porter's English stemmer, and standard English stopwords are removed.


Experiment: Baseline

The InLB document weighting model, with b = 0.2337.


Experiment: External Opinion Dictionary

We also use a dictionary manually compiled from various external linguistic resources.

This dictionary contains approximately 12,000 English words, mostly adjectives, adverbs and nouns, which are supposed to be subjective.

In this paper, we refer to the manually compiled dictionary as the external dictionary, and to the automatically derived one as the internal dictionary.


Experiment: Evaluation


We use the Bo1 term weighting method, setting a = 0.25 and k = 250.

Conclusions and Future Work

This paper has proposed an effective and practical approach to retrieving opinionated blog posts without the need for manual effort.

The use of the automatically generated internal dictionary provides a retrieval performance that is as good as the use of an external dictionary manually compiled from various linguistic resources.


In the future:

1. Extend the work to detecting the polarity or orientation of the retrieved opinionated documents.

2. Study the connection of the opinion finding task to question answering.

E.g., extracting the opinionated sentences within a blog post about a given target.
