A Joint Model of Text and Aspect Ratings for Sentiment Summarization Ivan Titov (University of...

A Joint Model of Text and Aspect Ratings for Sentiment Summarization

Ivan Titov (University of Illinois)

Ryan McDonald (Google Inc.)

ACL 2008

Introduction

An example of an aspect-based summary

Q1: Aspect identification and mention extraction (coarse or fine?)

Q2: sentiment classification

Introduction: Extraction problem

Assumptions for their model

Ratable aspects normally represent coherent topics which can be potentially discovered from co-occurrence information in the text.

Most predictive features of an aspect rating are features derived from the text segments discussing the corresponding aspect.

Multi-Aspect Sentiment model (MAS)

This model consists of two parts:

Multi-Grain Latent Dirichlet Allocation (Titov and McDonald, 2008): builds topics that are representative of ratable aspects.

A set of sentiment predictors: force specific topics to be correlated with a particular aspect.

MG-LDA (1)

An extension of LDA (Latent Dirichlet Allocation). Standard LDA tends to build topics that globally classify terms into product instances (e.g., Creative Labs Mp3 players versus iPods, New York versus Paris hotels) rather than into ratable aspects.

MG-LDA models global topics and local topics.

The distribution of global topics is fixed for a document, while the distribution of local topics is allowed to vary across the document.

MG-LDA (2)

Ratable aspects will be captured by local topics and global topics will capture properties of reviewed items.

Example: “. . . public transport in London is straightforward, the tube station is about an 8 minute walk . . . or you can get a bus for £1.50”

A mixture of the global topic London (London, tube, £) and the ratable aspect location (transport, walk, bus).

Local topics are reused between very different types of items.

MG-LDA (3)

A doc is represented as a set of sliding windows, each covering T adjacent sentences.
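The window layout can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the indexing convention (window v covers sentences v - T + 1 through v, with partial windows at both ends of the document) is an assumption chosen so that every sentence is covered by exactly T windows.

```python
def windows_covering(sentence_idx, window_size):
    """Indices of the sliding windows that cover one sentence.

    Assumed convention: window v covers sentences v - window_size + 1 .. v,
    so sentence s is covered by windows s .. s + window_size - 1.
    """
    return list(range(sentence_idx, sentence_idx + window_size))


def sentences_in_window(window_idx, window_size, num_sentences):
    """Sentences covered by one window, clipped to the document boundaries."""
    first = max(0, window_idx - window_size + 1)
    last = min(num_sentences - 1, window_idx)
    return list(range(first, last + 1))
```

With T = 3, sentence 2 is covered by windows 2, 3, and 4, and window 3 covers sentences 1, 2, and 3, so neighbouring sentences share two of their three windows.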

Each window v in doc d has an associated distribution over local topics and a distribution defining preference for local topics versus global topics.

A word can be sampled using any window covering its sentence s, where the window is chosen according to a categorical distribution.

Window overlap permits the model to exploit a larger co-occurrence domain.
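The choices made for a single word can be sketched as follows. This is a hedged illustration of the generative story on this slide only: the parameter names (`psi_s`, `pi`, `theta_loc`, `theta_gl`) and the window indexing (sentence s is covered by windows s .. s + T - 1) are assumptions, and the final step of drawing the word itself from the chosen topic is left out.

```python
import random


def draw(rng, probs):
    """Draw an index from a categorical distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_topic_for_word(s, psi_s, pi, theta_loc, theta_gl, rng):
    """Sample (kind, window, topic) for one word of sentence s.

    psi_s     : categorical distribution over the T windows covering sentence s
    pi[v]     : probability that window v prefers a local topic over a global one
    theta_loc : theta_loc[v] is window v's distribution over local topics
    theta_gl  : the document's fixed distribution over global topics
    """
    # 1. Choose one of the windows covering the sentence.
    v = s + draw(rng, psi_s)
    # 2. Choose between local and global topics for that window.
    if rng.random() < pi[v]:
        # 3a. Local topic: distribution varies from window to window.
        return ("loc", v, draw(rng, theta_loc[v]))
    # 3b. Global topic: distribution is shared by the whole document.
    return ("gl", v, draw(rng, theta_gl))
```

Because adjacent sentences share windows, their words draw on overlapping local topic distributions, which is how the overlap widens the co-occurrence domain.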

Symmetrical Dirichlet prior for

Dirichlet distribution: Dir(α)

Its probability density function returns the belief that the probabilities of K rival events are x_i, given that each event has been observed α_i - 1 times.
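A small sketch of the density itself may help. It uses the standard formula f(x; α) = Γ(Σ α_i) / Π Γ(α_i) · Π x_i^(α_i - 1), computed in log space; the function name is hypothetical, and restricting the input to points with strictly positive coordinates is a simplification made here.

```python
import math


def dirichlet_pdf(x, alpha):
    """Density of Dir(alpha) at a point x with strictly positive coordinates summing to 1."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must have positive coordinates that sum to 1")
    # Log of the normalizing constant: Gamma(sum(alpha)) / prod(Gamma(alpha_i)).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Log of the kernel: prod(x_i ** (alpha_i - 1)).
    log_kernel = sum((a - 1.0) * math.log(xi) for xi, a in zip(x, alpha))
    return math.exp(log_norm + log_kernel)
```

With α = (1, 1, 1) the kernel is constant, so the density is flat over the simplex; larger α_i pull the mass toward points where x_i is large.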

Several images of the probability density of the Dirichlet distribution when K=3 for various parameter vectors α. Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).

Multi-Aspect Sentiment Model (1)

Assumption: the text of the review discussing an aspect is predictive of its rating.

MAS introduces a classifier for each aspect, which is used to predict its rating.

Only words assigned to that topic can participate in the prediction of the sentiment rating of the aspect.
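The restriction can be illustrated with a toy feature extractor. This is a sketch under assumptions, not the model's actual predictor: topics are represented by string labels for readability, the per-word weights are made up, and the function names are hypothetical.

```python
from collections import Counter


def aspect_features(tokens, topic_assignments, aspect_topic):
    """Count only the words whose sampled topic is the one tied to the aspect."""
    return Counter(
        word
        for word, topic in zip(tokens, topic_assignments)
        if topic == aspect_topic
    )


def aspect_score(features, weights):
    """Linear score of the aspect's words under hypothetical per-word weights."""
    return sum(weights.get(word, 0.0) * count for word, count in features.items())
```

For the tokens `staff friendly walk tube friendly` with the first, second, and last word assigned to the service topic, only `staff` and `friendly` contribute to the service score, however strongly `walk` might be weighted.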

However, ratings for different aspects can be correlated. Ex. a negative opinion about cleanliness -> low ratings for rooms, service, and dining.

Multi-Aspect Sentiment Model (2)

Opinions about an item in general without referring to any particular aspect. Ex. This product is the worst I have ever purchased -> low ratings for every aspect.

Rating predictors are based on the overall sentiment rating; aspect-specific words are used only to compute corrections for each aspect.
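One way to picture "overall rating plus corrections" is a per-aspect distribution over ratings whose score is a base term plus an aspect-specific term. The sketch below assumes a softmax (log-linear) form and made-up scores for illustration; it is not the paper's exact parameterization, and the function name is hypothetical.

```python
import math


def rating_distribution(base_scores, correction_scores):
    """Distribution over ratings from base (overall) scores plus aspect corrections.

    base_scores[i]       : score of rating i + 1 from the overall sentiment of the text
    correction_scores[i] : adjustment from words assigned to the aspect's topic
    """
    totals = [b + c for b, c in zip(base_scores, correction_scores)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(totals)
    exps = [math.exp(t - top) for t in totals]
    norm = sum(exps)
    return [e / norm for e in exps]
```

If the overall sentiment favours rating 5 but the words on one aspect are strongly negative, a large correction on rating 1 moves that aspect's prediction down while the other aspects keep the overall estimate.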

N-gram model:

Inference in MAS

Gibbs sampling

Appears only if ratings are known
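For readers unfamiliar with Gibbs sampling in topic models, the sketch below shows the conditional used to resample one token's topic in plain collapsed-Gibbs LDA. It is deliberately the simpler, standard LDA case rather than the MAS sampler, which also handles windows, local versus global topics, and a rating factor; the function and parameter names are hypothetical.

```python
def lda_topic_conditional(doc_topic_counts, topic_word_counts, topic_totals,
                          alpha, beta, vocab_size):
    """P(topic = k | all other assignments) for one token in plain LDA.

    All counts must exclude the token currently being resampled.
    doc_topic_counts[k]  : tokens in this document assigned to topic k
    topic_word_counts[k] : occurrences of this token's word assigned to topic k
    topic_totals[k]      : all tokens in the corpus assigned to topic k
    """
    weights = [
        (n_dk + alpha) * (n_kw + beta) / (n_k + vocab_size * beta)
        for n_dk, n_kw, n_k in zip(doc_topic_counts, topic_word_counts, topic_totals)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

A sampler sweeps over the tokens, removes each token's counts, draws a new topic from this distribution, and adds the counts back.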

Experiments - Corpus

Reviews of hotels from TripAdvisor.com.

10,000 reviews (109,024 sentences, 2,145,313 words in total).

Every review was rated with at least 3 aspects: service, location, and rooms.

Ratings from 1 to 5.

Result Example

Evaluation

779 random sentences labeled with one or more aspects.

164, 176, 263 sentences for service, location, and rooms, respectively.

Results: Aspect Service

Results: Aspect Location

Results: Aspect Rooms
