A Corpus for Entity Profiling in Microblog Posts
![Page 1: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/1.jpg)
A Corpus for Entity Profiling in Microblog Posts
UNED NLP & IR Group
Madrid, Spain
ISLA, University of Amsterdam
Amsterdam, The Netherlands
LREC Workshop on Language Engineering for Online Reputation Management
May 26th, 2012 - Istanbul, Turkey
Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss,
Maarten de Rijke, Damiano Spina
![Page 2: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/2.jpg)
Introduction
• Online Reputation Management
– Public image of an entity in Online Media
– Entity = { brand, organization, company, person, product }
• Microblogging services (e.g. Twitter)
– People sharing thoughts about an entity
– Dynamic, Real-Time
• Human Language Technologies
– Aid to reputation managers
– Retrieval and Analysis of entity mentions
![Page 3: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/3.jpg)
Sentiment vs. Profiling
• Sentiment analysis
• Entity Profiling
– “hot” topics that people talk about in the context of an entity
![Page 4: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/4.jpg)
Our task: Aspect identification
• @xbox_news here we go again,
microsoft being jealous of sony again.
• I lov big Sony headphones .. I lov my #music 2 b more beautiful
• not surprising that @graypowell was out and about - he used to be a ’Field Verification & Operator Acceptance Engineer’ at Sony
![Page 6: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/6.jpg)
Goal
• Build manually annotated corpora
– Evaluate the task of entity profiling in microblog streams
![Page 7: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/7.jpg)
A Corpus for Entity Profiling in Microblog Posts
WePS-3 ORM Corpus
• Collection of tweets
• Disambiguated company names (e.g. apple the fruit vs. Apple Inc.)
![Page 8: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/8.jpg)
A Corpus for Entity Profiling in Microblog Posts
WePS-3 ORM Corpus
• Pooling → Aspects
• Tweet annotation → Opinion targets
![Page 10: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/10.jpg)
Approach I: Pooling aspects
• Pooling methodology
– 4 Ranking Methods:
• TF.IDF [Salton and Buckley, 1988]
• Log-Likelihood Ratio [Dunning, 1993]
• Parsimonious Language Model [Hiemstra et al., 2004]
• Opinion target extraction using topic-specific subjective lexicons [Jijkoun et al., 2010]
– Top 10 terms
• Manual annotation
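The pooling step above can be sketched as follows: each ranking method proposes its top-k terms for an entity, and the union of those lists is what annotators judge. This is a minimal sketch under assumptions, not the authors' implementation. Only a simple smoothed TF.IDF ranker and a raw-frequency stand-in are shown (the other three cited methods are not reproduced), tokenization is naive whitespace splitting, and the names `tokenize`, `tfidf_rank`, `frequency_rank` and `pool_terms` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def tfidf_rank(entity_tweets, background_tweets):
    """Rank terms from an entity's tweets by TF.IDF against a background set."""
    tf = Counter(t for tweet in entity_tweets for t in tokenize(tweet))
    docs = [set(tokenize(tweet)) for tweet in background_tweets]
    n_docs = len(docs)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in docs if term in d)
        # Smoothed IDF so unseen terms do not divide by zero (one common variant).
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def frequency_rank(entity_tweets, background_tweets):
    """Stand-in second ranker: raw term frequency (not one of the cited methods)."""
    tf = Counter(t for tweet in entity_tweets for t in tokenize(tweet))
    return sorted(tf, key=lambda t: (-tf[t], t))


def pool_terms(entity_tweets, background_tweets, rankers, k=10):
    """Union of the top-k terms proposed by each ranker."""
    pool = set()
    for rank in rankers:
        pool.update(rank(entity_tweets, background_tweets)[:k])
    return pool


if __name__ == "__main__":
    entity = ["sony headphones are great", "love my sony headphones"]
    background = entity + ["apple pie", "sony tv"]
    print(sorted(pool_terms(entity, background, [tfidf_rank, frequency_rank], k=2)))
```

Pooling keeps the annotation effort bounded: with four rankers and k = 10, annotators see at most 40 candidate terms per entity, fewer when the rankers agree.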
![Page 11: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/11.jpg)
Aspects dataset: annotation example
![Page 12: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/12.jpg)
Aspects dataset: outcome
• Three annotators, substantial agreement (> 0.6 Cohen/Fleiss’ kappa)
• 94 entities, 17775 tweets, ≈177 tweets/entity
• 2455 terms, 1304 aspects (54.11%)
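For readers unfamiliar with the agreement figure, the sketch below shows how Cohen's kappa is computed for one pair of annotators: observed agreement corrected by the agreement expected from each annotator's label distribution. It is an illustration only. The labels are toy values invented for this example, not data from the corpus, and the function name `cohen_kappa` is hypothetical. Fleiss' kappa, which handles all three annotators at once, is not shown.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy judgments (1 = term is an aspect, 0 = it is not); invented for illustration.
    annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    # 8 of 10 items agree and chance agreement is 0.5, so kappa is about 0.6.
    print(round(cohen_kappa(annotator_1, annotator_2), 3))
```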
![Page 13: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/13.jpg)
Approach II: Tweet annotation
• Opinion targets dataset
• Tweet-level annotation
– Is the tweet subjective?
• Phrase-level annotation
– Subjective phrase
– Opinion target phrase p:
• p is an aspect of the entity
• p is included in a sentence that contains a direct subjective phrase
• p is the target of the expressed opinion
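The two annotation levels above could be represented with a small record type, sketched below. This is a hypothetical schema, not the released corpus format: the class and field names are assumptions, and the code assumes that all three listed conditions must hold for a phrase to count as an opinion target. The example tweet is taken from the aspect identification slide; the judgments attached to it are illustrative, not actual corpus annotations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhraseJudgment:
    """One annotator judgment about a candidate phrase p (hypothetical schema)."""
    phrase: str
    is_aspect: bool               # p is an aspect of the entity
    in_subjective_sentence: bool  # p's sentence contains a direct subjective phrase
    is_target_of_opinion: bool    # p is the target of the expressed opinion

    def is_opinion_target(self) -> bool:
        # Assumption: the three conditions are conjunctive.
        return (self.is_aspect
                and self.in_subjective_sentence
                and self.is_target_of_opinion)


@dataclass
class TweetAnnotation:
    """Tweet-level and phrase-level annotation for one tweet."""
    text: str
    subjective: bool
    subjective_phrases: List[str] = field(default_factory=list)
    candidates: List[PhraseJudgment] = field(default_factory=list)

    def opinion_targets(self) -> List[str]:
        return [c.phrase for c in self.candidates if c.is_opinion_target()]


if __name__ == "__main__":
    tweet = TweetAnnotation(
        text="I lov big Sony headphones .. I lov my #music 2 b more beautiful",
        subjective=True,
        subjective_phrases=["lov"],
        candidates=[
            PhraseJudgment("headphones", True, True, True),
            PhraseJudgment("#music", False, True, True),
        ],
    )
    print(tweet.opinion_targets())
```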
![Page 14: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/14.jpg)
Opinion Targets dataset: annotation example
![Page 15: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/15.jpg)
Opinion targets dataset: outcome
• 59 entities, 9396 tweets, ≈159 tweets/entity
• 15.16% of tweets with subjective phrases
• 13.82% of tweets with opinion targets
![Page 17: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/17.jpg)
Aspects vs. Opinion targets
[Venn diagram: Aspects vs. Terms in Opinion Targets; region counts 1650, 783, 270; percentages shown: 26.69%, 12.67%]
![Page 18: A Corpus for Entity Profiling in Microblog Posts](https://reader034.fdocuments.us/reader034/viewer/2022042700/55865522d8b42a221b8b462f/html5/thumbnails/18.jpg)
A Corpus for Entity Profiling in Microblog Posts
• Available at http://bitly.com/profilingTwitter
• WePS-3 ORM Corpus
– Pooling → Aspects dataset
• 94 entities, 17,775 tweets, ≈177 tweets/entity
• 2455 terms, 1304 aspects (54.11%)
– Tweet annotation → Opinion targets dataset
• 59 entities, 9,396 tweets, ≈159 tweets/entity
• 15.16% of tweets with subjective phrases
• 13.82% of tweets with opinion targets