Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD '09) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling


Entity Discovery and Assignment for Opinion Mining Applications

(ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang

Date: 09/01/09

Speaker: Hsu, Yu-Wen

Advisor: Dr. Koh, Jia-Ling


Outline

Introduction
Problem Definition
Entity Discovery
Entity Assignment
Opinion Mining
Empirical Evaluation
Conclusion & Future Work


Introduction

Most opinion mining research is based on product reviews, because a review usually focuses on a specific product or entity and contains little irrelevant information.

In forum discussions and blogs the situation is very different: authors often talk about multiple entities (e.g., products) and compare them.


Introduction
*This raises two important issues:

(1) How to discover the entities that are talked about in a sentence.
This is the named entity recognition (NER) problem, made harder in forum text by:
Over-capitalization
Under-capitalization


(2) How to assign entities to each sentence, because in many sentences entity names are not explicitly mentioned but are implied.
Similar to pronoun resolution in NLP
Harder due to ungrammatical sentences and missing or wrong punctuation


Example 1: “(1) I bought Camera-A yesterday. (2) I took some pictures in the evening in my living room. (3) The images are very clear. (4) They are definitely better than those from my old Camera-B. (5) The battery is very good too.”

Example 2: sentences (1)–(4) are the same as in Example 1; “(5) The pictures of that camera were blurring for night shots, but for day shots it was ok.”


Sentiment consistency: consecutive sentiment expressions should be consistent with each other. The text would be ambiguous if this consistency were not observed in writing.


Opinion Mining: Two tasks are necessary:

(1) for a comparative sentence, we need to identify which entity is superior

(2) for the subsequent sentence, we need to determine whether its first clause (sentence 5 of Example 2) is positive, negative, or neutral.


Problem Definition

Thread: consists of a start post and a list of follow-up posts or replies.

$t = \langle p_1, p_2, \dots, p_n \rangle$: A thread can thus be modeled as a sequence of posts, where $p_1$ is the start post.

$p_j = \langle s_1, s_2, \dots, s_m \rangle$: Each post consists of a sequence of sentences.

Each sentence $s$ describes something about a subset of the entities in $E$.


Problem statement: Given a set of threads T in a particular domain, two tasks are performed in this paper:
1. Entity discovery: discover the set of entities E discussed in the posts of the threads.
2. Entity assignment: assign the entities in E that each sentence of each post in T talks about.


Entity Discovery

Step 1 – Data preparation for sequential pattern mining
<{JJ, mad}{NN, everyone}{NN, doesnt}{VBP, have}{DT, a}{CD, ENTITYXYZ}{NN, phone}{NN, fetish}{JJ, ducky}>

Step 2 – Sequential pattern mining
<{IN}, {DT}, {NNP, ENTITYXYZ}, {is}>

Step 3 – Pattern matching to extract candidate entities
a/DT Nokia/NNP 7390/CD at/IN
<{DT}, {NNP, ENTITYXYZ}, {CD}> → Nokia
<{DT}, {NNP}, {CD, ENTITYXYZ}, {IN}> → 7390
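The pattern matching in Step 3 can be illustrated with a minimal sketch. It assumes that a pattern is a list of POS tags with one or more slots marked as the ENTITYXYZ position, and that a pattern must match a contiguous window of the tagged sentence. Both are simplifications for illustration: mined sequential patterns may also contain word items (such as {is}) or allow gaps, and the function name `extract_candidates` is hypothetical.

```python
def extract_candidates(tagged, pattern):
    """Return the words sitting in an ENTITYXYZ slot wherever the pattern's
    POS tags match a contiguous window of the tagged sentence.

    tagged  -- list of (word, pos_tag) pairs
    pattern -- list of (pos_tag, is_entity_slot) pairs (assumed encoding)
    """
    found = []
    n = len(pattern)
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        # Every POS tag in the window must equal the pattern's tag.
        if all(tag == p_tag for (_, tag), (p_tag, _) in zip(window, pattern)):
            # Collect the words aligned with the entity slot(s).
            found.extend(
                word for (word, _), (_, slot) in zip(window, pattern) if slot
            )
    return found


# The slide's example: a/DT Nokia/NNP 7390/CD at/IN
sentence = [("a", "DT"), ("Nokia", "NNP"), ("7390", "CD"), ("at", "IN")]
brand_pattern = [("DT", False), ("NNP", True), ("CD", False)]
model_pattern = [("DT", False), ("NNP", False), ("CD", True), ("IN", False)]

print(extract_candidates(sentence, brand_pattern))
print(extract_candidates(sentence, model_pattern))
```

With the two patterns from the slide, the first call picks out the word in the NNP slot and the second the word in the CD slot.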


Step 4 – Candidate pruning
… with/IN all/PDT the/DT Sony/NNP Ericsson/NNP walkman/NN phone/NN accessories/NNS (pruning)

Step 5 – Pruning using brand and model relation and syntactic patterns
Discover relationships from the entities: Nokia, 7390
Nokia: brand
7390: model
Remove those entities discovered in Step 4 that never appear together with a <Brand> or a <Model>, or never appear with a candidate in the syntactic patterns.
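The brand and model part of Step 5 might be sketched as follows. This is only an illustration under assumptions: co-occurrence is given as a list of word pairs seen together, the sets of known brands and models are already available, and the syntactic-pattern condition is left out. The name `prune_by_brand_model` and the sample data are hypothetical.

```python
def prune_by_brand_model(candidates, cooccurrences, brands, models):
    """Keep only candidates that co-occur with a known brand or model.

    cooccurrences -- list of (word_a, word_b) pairs observed together
                     (assumed input format, not taken from the paper)
    """
    known = set(brands) | set(models)
    kept = []
    for cand in candidates:
        # Every word that was seen together with this candidate.
        partners = {b for a, b in cooccurrences if a == cand}
        partners |= {a for a, b in cooccurrences if b == cand}
        if partners & known:
            kept.append(cand)
    return kept


candidates = ["Nokia", "7390", "walkman", "ducky"]
cooccurrences = [("Nokia", "7390"), ("Sony", "walkman")]
print(prune_by_brand_model(candidates, cooccurrences,
                           brands={"Nokia", "Sony"}, models={"7390"}))
```

In this toy input, "ducky" never appears next to a brand or a model, so it is the only candidate removed.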


Entity Assignment
*Comparatives and Superlatives

Comparative sentences
Non-equal gradable: “greater or less”
Equative: “equal to”
Non-gradable: compare two or more entities

Superlative sentences: –est


Entity Assignment
*Sentiment Consistency

If the author wants to introduce a new entity e, he/she has to state the name of the entity explicitly in a sentence $s_i$, which can be:

(1) a normal sentence
$s_i$: normal, $s_{i+1}$: normal → e
$s_i$: normal, $s_{i+1}$: comparative → e & new entity


(2) the sentence $s_i$ is a comparative sentence
$s_{i+1}$: normal
non-equal gradable: positive (respectively negative) sentiment → the superior (or inferior) entity
equative → the previous entity before $s_i$
non-gradable → the previous entity before $s_i$
$s_{i+1}$: comparative → the entities in $s_i$

(3) the sentence $s_i$ is a superlative sentence
$s_{i+1}$: normal → the superlative entity in $s_i$
$s_{i+1}$: comparative → the entities in $s_i$
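The case of a comparative sentence followed by a normal sentence that names no entity can be written as a small decision function. This is a sketch of the rule as listed on the slide, not the authors' implementation: the function name, argument names, and the choice to return `None` for a neutral follow-up sentence are all assumptions.

```python
def assign_after_comparative(kind, next_sentiment, superior, inferior, previous):
    """Pick the entity for a normal sentence that follows a comparative one.

    kind           -- "non-equal gradable", "equative" or "non-gradable"
    next_sentiment -- "positive", "negative" or "neutral" (of the next sentence)
    superior       -- the entity rated better in the comparative sentence
    inferior       -- the entity rated worse in the comparative sentence
    previous       -- the entity under discussion before the comparative sentence
    """
    if kind == "non-equal gradable":
        # Sentiment consistency: praise continues with the superior entity,
        # criticism with the inferior one.
        if next_sentiment == "positive":
            return superior
        if next_sentiment == "negative":
            return inferior
        return None  # assumption: no decision for a neutral sentence
    if kind in ("equative", "non-gradable"):
        return previous
    raise ValueError(f"unknown comparative kind: {kind}")


# Example 1: sentence (4) rates Camera-A above Camera-B,
# and sentence (5) "The battery is very good too." is positive.
print(assign_after_comparative("non-equal gradable", "positive",
                               superior="Camera-A", inferior="Camera-B",
                               previous="Camera-A"))
```

Applied to Example 1, the positive sentence (5) is assigned to the superior entity of sentence (4).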


Opinion Mining

Opinion Indicators

Opinion words and phrases
opinion lexicon
orientations depend on contexts

Negations
“not”, except within “not only…but also”

But-clauses
The orientation before “but” is opposite to that after “but”.
Does not apply when the clause contains “but also”.


*Specification for Opinion Indicators

We propose a specification language to enable the user to specify indicators, which are:
(1) opinion words and phrases,
(2) negation words and phrases,
(3) but-like words and phrases,
(4) non-opinion phrases involving sentiment words, e.g. “a good deal of”,
(5) non-negation phrases involving negation words, e.g. “not only”,
(6) non-but phrases involving but-like words, e.g. “but also”.


*Two Types of Specification

Specification of Individual Words

ex: like [VB] => Po


Specification for Phrases

“great => Po”

“a great[T] + deal + of => NEU”
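How word rules and phrase rules interact can be sketched in a few lines. The encoding below is an assumption made for illustration: word rules are keyed by (word, POS tag) with `None` meaning any tag, and a phrase rule carries the index of the word whose label it overrides (read here as the meaning of the [T] marker, which the slide does not define). The names `WORD_RULES`, `PHRASE_RULES` and `tag_indicators` are hypothetical.

```python
# Word rules: (word, required POS tag or None for any tag) -> label
WORD_RULES = {("like", "VB"): "Po", ("great", None): "Po"}

# Phrase rules: (phrase words, index of the target word, label for it)
PHRASE_RULES = [(("a", "great", "deal", "of"), 1, "NEU")]


def tag_indicators(tagged):
    """Label each token of a POS-tagged sentence; phrase rules override word rules."""
    words = [word.lower() for word, _ in tagged]
    labels = []
    for word, pos in zip(words, (pos for _, pos in tagged)):
        # Prefer a POS-specific rule, then fall back to a POS-independent one.
        labels.append(WORD_RULES.get((word, pos)) or WORD_RULES.get((word, None)))
    for phrase, target, label in PHRASE_RULES:
        n = len(phrase)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == phrase:
                labels[i + target] = label
    return labels


print(tag_indicators([("I", "PRP"), ("like", "VB"), ("it", "PRP")]))
print(tag_indicators([("a", "DT"), ("great", "JJ"), ("deal", "NN"),
                      ("of", "IN"), ("noise", "NN")]))
```

The second call shows the point of the phrase rule: “great” alone would be labeled Po, but inside “a great deal of” it is relabeled NEU.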


Opinion Mining

Step 1 – Part-of-speech tagging

Step 2 – Applying indicator word rules
The picture quality is not[Ng] good[Po], reaction is too slow[Neu], but[But] the battery life is long[Neu].

Step 3 – Applying phrase rules
The picture quality is not[Ng] good[Po], reaction is too slow[NE], but[But] the battery life is long[Neu].


Step 4 – Handling negations
The picture quality is not[Ng] good[Negative], reaction is too slow[NE], but[But] the battery life is long[Neu].

Step 5 – Aggregating opinions
Opinion aggregation: positive: +1, negative: -1
Sum up: >0: positive, =0: neutral, <0: negative
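Steps 4 and 5 can be sketched on the label sequence produced by the earlier steps. The sketch makes two assumptions that the slide does not state: a negation flips the next opinion label that follows it, and but-clauses are not treated (the But label is simply ignored). The flipped label is written NE here rather than the slide's [Negative], and the function names are hypothetical.

```python
SCORES = {"Po": 1, "NE": -1}  # every other label contributes 0


def handle_negations(labels):
    """Flip the first opinion label after each negation (assumed scope rule)."""
    out = list(labels)
    negate = False
    for i, label in enumerate(out):
        if label == "Ng":
            negate = True
        elif label in SCORES and negate:
            out[i] = "NE" if label == "Po" else "Po"
            negate = False
    return out


def aggregate(labels):
    """Sum +1 / -1 over the opinion labels and map the sign to an orientation."""
    total = sum(SCORES.get(label, 0) for label in labels)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Labels after Step 3 for: not[Ng] good[Po] ... slow[NE] ... but[But] ... long[Neu]
step3 = ["Ng", "Po", "NE", "But", "Neu"]
step4 = handle_negations(step3)
print(step4, aggregate(step4))
```

For the running example, “not good” becomes negative in Step 4, and together with “too slow” the sum is below zero.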


Opinion Mining of Comparisons

more/most + Pos → Positive
more/most + Neg → Negative
less/least + Pos → Negative
less/least + Neg → Positive

Non-standard words: “In term of battery life, Camera-X is superior to Camera-Y”
depend on the meaning

Identify comparative and superlative sentences

Discover superior entities
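The four more/most and less/least rules on this slide amount to a sign flip, which the following sketch makes explicit. The function name and the string encoding of the arguments are assumptions for illustration; non-standard comparative words such as “superior” are not covered.

```python
def comparative_orientation(comparator, base):
    """Combine a comparative word with the orientation of the word it modifies.

    comparator -- "more", "most", "less" or "least"
    base       -- "Pos" or "Neg", the orientation of the modified word
    """
    if base not in ("Pos", "Neg"):
        raise ValueError(f"unknown base orientation: {base}")
    if comparator in ("more", "most"):
        flipped = False  # increasing comparators keep the orientation
    elif comparator in ("less", "least"):
        flipped = True   # decreasing comparators reverse it
    else:
        raise ValueError(f"unknown comparator: {comparator}")
    positive = (base == "Pos") != flipped
    return "Positive" if positive else "Negative"


for comparator in ("more", "less"):
    for base in ("Pos", "Neg"):
        print(comparator, base, "->", comparative_orientation(comparator, base))
```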


Empirical Evaluation


Experimental Results

Entity Discovery

NET: Named Entity Tagger

CRF: Conditional Random Fields Method


Entity Assignment


Conclusion

This paper studied two problems: mining the entities discussed in a set of posts and assigning entities to each sentence.

Our experimental results show that the proposed techniques are effective.