Exploiting Wikipedia Inlinks for Linking Entities in Queries
Entity Recognition and Disambiguation Challenge
ACM SIGIR 2014, July 6-11, 2014
The 37th Annual International ACM SIGIR Conference
Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad, India
SIEL@ERD
ERD Challenge
SIEL@ERD team from IIIT, Hyderabad
The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection [1] or knowledge base.
http://www.freebase.com/m/046yc7
SIEL@ERD system
TAGME [2] system with time and performance optimizations
Mention detection
• Reduce the number of DB look-ups.
Disambiguation
• Use (1-δ) instead of δ
• Prominent senses restriction
Pruning
Data Preprocessing and Measures
Inlinks are the links made from anchor text to a Wikipedia article.
Indexes: process the English Wikipedia dump to create three indexes
1. In-Link Graph Index
2. Anchor Dictionary
3. WikiTitlePageId Index
Measures
4. Link frequency link(a)
5. Total frequency freq(a)
6. Pages linking to anchor a, Pg(a)
7. Prior probability Pr(p/a)
8. Link probability lp(a)
9. Wikipedia Link-based Measure (δ) [3]
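The two probability measures above can be sketched as follows. This is a minimal illustration that assumes the TAGME-style definitions [2]: lp(a) = link(a) / freq(a), and Pr(p/a) is the fraction of links with anchor text a that point to page p. The toy counts and the names `anchor_targets`, `text_freq`, `link_count`, `link_probability`, and `prior_probability` are hypothetical and do not come from the team's indexes.

```python
from collections import Counter

# Hypothetical toy statistics, not taken from a real Wikipedia dump.
# anchor_targets[a][p] = number of times anchor text a links to page p.
anchor_targets = {
    "jaguar": Counter({"Jaguar_Cars": 60, "Jaguar": 40}),
}
# text_freq[a] = number of times a occurs in Wikipedia text, linked or not.
text_freq = {"jaguar": 400}


def link_count(a):
    """link(a): how often a occurs as anchor text of a link."""
    return sum(anchor_targets.get(a, Counter()).values())


def link_probability(a):
    """lp(a) = link(a) / freq(a); 0.0 for unseen text."""
    freq = text_freq.get(a, 0)
    return link_count(a) / freq if freq else 0.0


def prior_probability(p, a):
    """Pr(p/a): share of the links with anchor a that point to page p."""
    total = link_count(a)
    return anchor_targets.get(a, Counter()).get(p, 0) / total if total else 0.0


print(link_probability("jaguar"))                  # 100 links / 400 occurrences
print(prior_probability("Jaguar_Cars", "jaguar"))  # 60 of the 100 links
```

A low lp(a) suggests the text is rarely used as a link, which is why it is useful both for mention filtering and for pruning.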
Optimization - Mention detection
A mention is any word or group of words that can potentially identify an entity.
Checking every word (and word group) for DB presence increases the number of DB look-ups.
Mention filtering methods reduce the number of mention candidates:
1. Stopword filtering
2. Twitter POS filtering
Optimization - Mention detection
Mention filtering methods.
1. Stopword Filtering: If the mention identified in the given query text contains only stopwords, we ignore that mention. We use the standard JMLR stopword list.
2. Twitter POS Filtering: The query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [12]. Mentions that do not contain at least one word with POS tag NN (indicating noun) are ignored.
RUNS: Run5 and Run7. Stopword filtering gave better results (F1=0.53) than TPOS Filtering (F1=0.48).
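The two filters can be sketched as below, under stated assumptions: mention candidates are taken to be word n-grams of the query, the stopword set is a tiny hypothetical stand-in for the JMLR list, and POS tags are supplied by the caller as (token, tag) pairs rather than produced by the tweet POS tagger. The function names and the `max_len` parameter are illustrative only.

```python
# Hypothetical, tiny stopword set for illustration; the slides use the JMLR list.
STOPWORDS = {"the", "of", "in", "a", "an", "and", "to", "is", "for", "on"}


def candidate_mentions(query, max_len=3):
    """Yield every word n-gram of the query (up to max_len words) as a token list."""
    tokens = query.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tokens[i:i + n]


def passes_stopword_filter(mention_tokens):
    """Keep a mention unless it consists only of stopwords."""
    return any(tok not in STOPWORDS for tok in mention_tokens)


def passes_pos_filter(tagged_tokens, noun_tags=("NN",)):
    """Keep a mention if at least one (token, tag) pair carries a noun tag."""
    return any(tag in noun_tags for _, tag in tagged_tokens)


kept = [" ".join(m) for m in candidate_mentions("the total recall", max_len=2)
        if passes_stopword_filter(m)]
print(kept)  # only candidates that survive would be looked up in the DB
```

Either filter runs before the anchor dictionary is consulted, so each rejected candidate saves one DB look-up.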
Optimization - Disambiguation
Identify all senses of the mention and choose the right one.
1. For identical pages the relatedness should be 1, so relatedness between pages is measured as (1-δ) instead of δ.
2. Prominent senses restriction
3. Disambiguation score for a mention a from candidate sense Pa
RUNS: Run3 achieved an F1 of 0.483
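As a rough sketch of the (1-δ) relatedness, the code below assumes δ is the Milne and Witten link-based measure [3] computed over the in-link sets of two pages, δ = (log max(|A|,|B|) - log |A∩B|) / (log W - log min(|A|,|B|)), where W is the number of Wikipedia pages. Clamping δ to 1 when the pages share no in-links is an assumption of this sketch, and the names `mw_distance` and `relatedness` are hypothetical.

```python
import math


def mw_distance(inlinks_a, inlinks_b, num_pages):
    """Milne-Witten link-based measure between two pages, clamped to [0, 1]."""
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 1.0  # assumption: no shared in-links means maximally distant
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(num_pages) - math.log(small)
    if denom <= 0:
        return 0.0  # both pages are linked from every page in the collection
    return min(1.0, (math.log(big) - math.log(common)) / denom)


def relatedness(inlinks_a, inlinks_b, num_pages):
    """Relatedness as (1 - delta): 1.0 for identical in-link sets."""
    return 1.0 - mw_distance(inlinks_a, inlinks_b, num_pages)


a = {1, 2, 3, 4}
b = {3, 4, 5}
print(relatedness(a, a, 100))  # identical pages
print(relatedness(a, b, 100))  # partially overlapping in-links
```

With identical in-link sets the numerator of δ is zero, so the relatedness is exactly 1, which is the property the slide asks for.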
Optimization - Pruning
Identify and discard senses that are not semantically coherent.
Coherence is defined as the average relatedness between the given sense pa and the senses assigned to all other anchors.
The pruning score combines coherence and link probability.
RUNS : Run6
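The pruning step might look like the sketch below. The slides do not state how coherence and link probability are combined, so the plain average and the threshold of 0.2 used here are assumptions, as are the toy relatedness table, the lp values, and every function name.

```python
def coherence(sense, other_senses, rel):
    """Average relatedness between one sense and the senses of all other anchors."""
    if not other_senses:
        return 0.0
    return sum(rel(sense, other) for other in other_senses) / len(other_senses)


def prune(assignments, lp, rel, threshold=0.2):
    """Keep anchor -> sense pairs whose pruning score reaches the threshold.

    Assumption: the score is the plain average of coherence and lp(a).
    """
    kept = {}
    for anchor, sense in assignments.items():
        others = [s for a, s in assignments.items() if a != anchor]
        score = (coherence(sense, others, rel) + lp[anchor]) / 2.0
        if score >= threshold:
            kept[anchor] = sense
    return kept


# Hypothetical toy data for illustration only.
TOY_REL = {frozenset(("Total_Recall_(film)", "Arnold_Schwarzenegger")): 0.8}


def toy_rel(p, q):
    """Look up a symmetric toy relatedness; unrelated pairs default to 0.05."""
    return TOY_REL.get(frozenset((p, q)), 0.05)


assignments = {
    "total recall": "Total_Recall_(film)",
    "arnold": "Arnold_Schwarzenegger",
    "apple": "Apple_Inc.",
}
lp = {"total recall": 0.6, "arnold": 0.3, "apple": 0.1}
kept = prune(assignments, lp, toy_rel)
print(sorted(kept))  # anchors whose senses survive pruning
```

In this toy query the sense for "apple" is weakly related to the other two senses and has a low link probability, so it falls below the threshold and is discarded.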
Results
| Run # | Description | F1 |
|---|---|---|
| 1 | Base System* | 0.53** |
| 2 | Disambiguation score uses Pr(p/a) instead of lp(a) | 0.50** |
| 3 | Threshold Combination + Stopword Filtering + Prominent senses restriction | 0.483 |
| 4 | Linear Combination + Non-normalized vote + single-row anchor index + Singleton Object | 0.472 |
| 5 | TPOS Filtering | 0.483 |
| 6 | Pruning score uses lp(a) instead of Pr(p/a) | 0.44 |
| 7 | Stopword Filtering | 0.53 |

*Base System: Linear Combination + TPOS Filtering + Normalized vote + Multi-row anchor index

**Evaluated on a 100-query set
Please visit our poster
Source code and datasets: https://github.com/priyaradhakrishnan0/Entity-Recognition-and-Disambiguation-Challenge
References
[1] D. Carmel, M. W. Chang, E. Gabrilovich, B. J. P. Hsu, and K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.
[2] P. Ferragina and U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM, 2010.
[3] D. Milne and I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In Proc. of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, 2008.