Exploiting Wikipedia Inlinks for Linking Entities in Queries

Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014
The 37th Annual International ACM SIGIR Conference, July 6-11, 2014

Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad, India

SIEL@ERD


ERD Challenge

SIEL@ERD team from IIIT, Hyderabad

The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection [1] or knowledge base.

http://www.freebase.com/m/046yc7


SIEL@ERD system

TAGME [2] system with time and performance optimizations

Mention detection
• Reduce the number of DB look-ups.

Disambiguation
• Use (1-δ) instead of δ
• Prominent senses restriction

Pruning


Data Preprocessing and Measures

Inlinks are links made from an anchor to a Wikipedia article.

Indexes: process the English Wikipedia dump to create three indexes.
1. In-Link Graph Index
2. Anchor Dictionary
3. WikiTitlePageId Index

Measures
1. Link frequency link(a)
2. Total frequency freq(a)
3. Pages linking to anchor a, Pg(a)
4. Prior probability Pr(p/a)
5. Link probability lp(a)
6. Wikipedia Link-based Measure (δ) [3]
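The slide only names the measures, so here is a minimal sketch of how link frequency, total frequency, link probability and prior probability could be computed from an anchor dictionary. It assumes the TAGME definitions [2] (lp(a) = link(a) / freq(a), and Pr(p/a) as the share of links with anchor a that point to page p). The counts, the `anchor_targets` and `text_frequency` structures, and the function names are hypothetical and are not taken from the SIEL@ERD indexes.

```python
from collections import Counter

# Toy anchor dictionary (hypothetical data): anchor text -> counts of target pages.
anchor_targets = {
    "jaguar": Counter({"Jaguar_Cars": 60, "Jaguar": 40}),
}
# Hypothetical number of times the text occurs at all, linked or not.
text_frequency = {"jaguar": 400}


def link(a):
    """link(a): number of times `a` occurs as link text."""
    return sum(anchor_targets[a].values())


def freq(a):
    """freq(a): number of times `a` occurs, as a link or as plain text."""
    return text_frequency[a]


def link_probability(a):
    """lp(a) = link(a) / freq(a), assuming the TAGME definition."""
    return link(a) / freq(a)


def prior(p, a):
    """Pr(p/a): fraction of links with anchor `a` that point to page `p`."""
    return anchor_targets[a][p] / link(a)


print(link_probability("jaguar"), prior("Jaguar_Cars", "jaguar"))
```

With these toy counts, "jaguar" is linked in a quarter of its occurrences, and most of those links point to the car maker.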


Optimization - Mention detection

A mention is any word or group of words that can potentially identify an entity.

Checking every word (and word group) for DB presence increases the number of DB look-ups. Mention filtering methods reduce the number of mention candidates:
1. Stopword filtering
2. Twitter POS filtering


Mention filtering methods

1. Stopword Filtering: If the mention identified in the given query text contains only stopwords, we ignore that mention. We use the standard JMLR stopword list.
2. Twitter POS Filtering: The query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [12]. Mentions that do not contain at least one word with POS tag NN (indicating a noun) are ignored.

RUNS: Run5 and Run7. Stopword filtering gave better results (F1=0.53) than TPOS Filtering (F1=0.48).
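The two filters can be sketched as follows, as an illustration rather than the SIEL@ERD code. The stopword set is a tiny stand-in for the JMLR list, the n-gram enumeration with `max_len` is an assumption about how candidates are generated, and the POS tags are passed in as a plain dictionary because no tagger is run here.

```python
# Tiny illustrative stopword set; the JMLR list itself is not reproduced here.
STOPWORDS = {"the", "of", "in", "a", "an", "and", "to", "for"}


def candidate_mentions(query, max_len=3):
    """Enumerate every word n-gram of the query, up to max_len words."""
    words = query.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def stopword_filter(mentions):
    """Drop mentions made up only of stopwords, so they never reach the DB."""
    return [m for m in mentions if not all(w in STOPWORDS for w in m.split())]


def pos_filter(mentions, tags):
    """Keep mentions with at least one word tagged NN.

    `tags` maps word -> POS tag and is assumed to come from an external tagger.
    """
    return [m for m in mentions if any(tags.get(w) == "NN" for w in m.split())]


mentions = candidate_mentions("the king of pop", max_len=2)
print(stopword_filter(mentions))
print(pos_filter(mentions, {"the": "DT", "king": "NN", "of": "IN", "pop": "NN"}))
```

Only the mentions that survive a filter are looked up in the anchor dictionary, which is where the saving in DB look-ups comes from.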


Optimization - Disambiguation

Identify all senses of the mention and choose the right one.
1. For identical pages, relatedness should be 1, so we measured relatedness between pages as (1-δ) instead of δ.
2. Prominent senses restriction
3. Disambiguation score for a mention a from candidate sense pa

RUNS: Run3 achieved an F1 of 0.483
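The formulas on this slide did not survive in the transcript, so the sketch below is a hedged reconstruction and not the authors' exact scoring. It computes δ from inlink sets using the Milne and Witten measure [3], takes relatedness as (1-δ), and sums normalized votes from the other mentions in the style of TAGME [2]. The function names, the clipping at 0, the handling of pages with no shared inlinks, and the toy data are all assumptions.

```python
import math


def delta(in_a, in_b, total_pages):
    """Link-based measure [3] over two inlink sets; 0 for identical pages.

    Assumes total_pages is larger than either inlink set.
    """
    common = len(in_a & in_b)
    if common == 0:
        return 1.0  # assumption: no shared inlinks means maximally distant
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    return (math.log(big) - math.log(common)) / (
        math.log(total_pages) - math.log(small)
    )


def relatedness(in_a, in_b, total_pages):
    """Relatedness as (1 - delta), clipped at 0, so identical pages score 1."""
    return max(0.0, 1.0 - delta(in_a, in_b, total_pages))


def disambiguation_score(p_a, other_mentions, inlinks, total_pages):
    """Sum of normalized votes for sense p_a from every other mention.

    other_mentions: one dict per other mention, mapping candidate page -> Pr(p/b).
    """
    score = 0.0
    for senses in other_mentions:
        vote = sum(
            relatedness(inlinks[p_b], inlinks[p_a], total_pages) * pr
            for p_b, pr in senses.items()
        )
        score += vote / len(senses)  # normalized vote
    return score


# Hypothetical inlink sets (page ids) for four pages in a 100-page collection.
inlinks = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4}, "C": {5, 6}, "D": {3, 4}}
print(relatedness(inlinks["A"], inlinks["D"], 100))
print(disambiguation_score("A", [{"B": 0.5, "C": 0.5}], inlinks, 100))
```

Dividing each vote by the number of candidate senses corresponds to the "Normalized vote" variant named in the results; dropping that division would give a non-normalized vote.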


Optimization - Pruning

Identify and discard senses that are not semantically coherent.

Coherence is defined as the average relatedness between the given sense pa and the senses assigned to all other anchors. The pruning score combines coherence and link probability.

RUNS: Run6
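A minimal sketch of the pruning step, under stated assumptions: coherence is the average relatedness described above, the combination is a plain average of coherence and lp(a) (one possible reading of "Linear Combination" in the results), and the cut-off `threshold=0.2` is a hypothetical value. The relatedness function is passed in as `rel` so the example stays self-contained.

```python
def coherence(p_a, assigned_senses, rel):
    """Average relatedness between p_a and the senses chosen for other anchors."""
    others = [p for p in assigned_senses if p != p_a]
    if not others:
        return 0.0  # assumption: a lone annotation has no coherence evidence
    return sum(rel(p_a, p_b) for p_b in others) / len(others)


def pruning_score(p_a, lp_a, assigned_senses, rel):
    """Plain average of coherence and link probability (an assumption)."""
    return (coherence(p_a, assigned_senses, rel) + lp_a) / 2


def prune(annotations, rel, threshold=0.2):
    """Keep (anchor, sense, lp) triples whose pruning score reaches the threshold."""
    senses = [p for _, p, _ in annotations]
    return [
        (a, p, lp)
        for a, p, lp in annotations
        if pruning_score(p, lp, senses, rel) >= threshold
    ]


# Hypothetical relatedness values between assigned senses.
pair_rel = {frozenset(("Jaguar_Cars", "Jaguar_XJ")): 0.8}


def toy_rel(x, y):
    return pair_rel.get(frozenset((x, y)), 0.0)


annotations = [
    ("jaguar", "Jaguar_Cars", 0.3),
    ("xj", "Jaguar_XJ", 0.5),
    ("the who", "The_Who", 0.1),
]
print(prune(annotations, toy_rel))
```

In this toy query the two car-related senses support each other and are kept, while the unrelated sense with a low link probability falls below the threshold and is discarded.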


Results

RUN #  Description                                                                            F1
1      Base System*                                                                           0.53**
2      Disambiguation score uses Pr(p/a) instead of lp(a)                                     0.50**
3      Threshold Combination + Stopword Filtering + Prominent senses restriction              0.483
4      Linear Combination + Non-normalized vote + single-row anchor index + Singleton Object  0.472
5      TPOS Filtering                                                                         0.483
6      Pruning score uses lp(a) instead of Pr(p/a)                                            0.44
7      Stopword Filtering                                                                     0.53

* Base System: Linear Combination + TPOS Filtering + Normalized vote + Multi-row anchor index
** Evaluated on 100 query set


References

[1] D. Carmel, M. W. Chang, E. Gabrilovich, B. J. P. Hsu, K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.

[2] P. Ferragina, U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM, 2010.

[3] D. Milne, I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In Proc. of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, 2008.