Similarity Measures for Query Expansion in TopX
Caroline Gherbaoui
Universität des Saarlandes Naturwissenschaftlich-Technische Fak. I
Fachrichtung 6.2 - Informatik
Max-Planck-Institut für Informatik
AG 5 - Datenbanken und Informationssysteme
Prof. Dr. Gerhard Weikum
Overview
background knowledge
similarity measures for the query expansion
evaluation of the computed similarity values
changes in TopX
conclusion
Background
top-k query processing: provides the k most relevant results
query expansion: extends the terms of the source query
word sense disambiguation: extracts the correct meaning of a term
ontology: a set of terms with their meanings and semantic relations
Word Sense Disambiguation
query: "java, coffee"
candidate senses of "java": "island", "coffee", "programming language", …
Query Expansion
"coffee" → "drink, espresso"
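The expansion step on this slide can be illustrated with a minimal sketch. It assumes a toy ontology that maps each term to related terms with a similarity value, and a cut-off threshold below which related terms are dropped. The names `ONTOLOGY`, `expand_query`, and `threshold`, as well as all similarity values, are hypothetical and not taken from TopX.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: term -> [(related term, similarity)].
# The similarity values are made up for illustration only.
ONTOLOGY: Dict[str, List[Tuple[str, float]]] = {
    "coffee": [("drink", 0.8), ("espresso", 0.6), ("mug", 0.05)],
}


def expand_query(
    terms: List[str],
    ontology: Dict[str, List[Tuple[str, float]]],
    threshold: float = 0.1,
) -> Dict[str, float]:
    """Return the original terms (weight 1.0) plus related terms whose
    similarity reaches the threshold, keeping the highest weight per term."""
    expanded: Dict[str, float] = {}
    for term in terms:
        expanded[term] = 1.0  # source terms keep full weight
        for related, sim in ontology.get(term, []):
            if sim >= threshold:
                expanded[related] = max(expanded.get(related, 0.0), sim)
    return expanded


if __name__ == "__main__":
    print(expand_query(["coffee"], ONTOLOGY))
```

With these made-up values, "coffee" is expanded to "drink" and "espresso", while "mug" falls below the threshold and is dropped.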
TopX
top-k retrieval engine
text and XML data
word sense disambiguation
query expansion
ontology
TopX – WordNet Ontology
lexicon for the English language
hierarchical relations
one relation, one direction
~160,000 words
~120,000 synsets
~210,000 relations
TopX – YAGO Ontology
Wikipedia and WordNet
hierarchical and non-hierarchical relations
one relation, two directions
~2,100,000 words
~2,200,000 concepts
~6,000,000 relations
Similarity Measures
Dice similarity: the measure already used in TopX
NAGA similarity: the measure applied for YAGO
Best WordNet similarity: the measure with the best result among the WordNet measures
Dice Similarity Measure
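measures the intersection of two regions

$$\mathrm{DICE}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

$$\mathrm{DICE}(A, B) = \frac{2 \cdot \mathrm{FREQ}(A, B)}{\mathrm{FREQ}(A) + \mathrm{FREQ}(B)}$$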
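The Dice measure, in its set form 2|A ∩ B| / (|A| + |B|) and its frequency form 2·FREQ(A, B) / (FREQ(A) + FREQ(B)), can be sketched as follows. The slide does not say what FREQ counts; the sketch assumes occurrence counts for each term and co-occurrence counts for the pair. The function names and the zero-denominator handling are assumptions.

```python
def dice(freq_a: int, freq_b: int, freq_ab: int) -> float:
    """Dice coefficient from counts.

    freq_a, freq_b: occurrence counts of A and B (assumed meaning of FREQ).
    freq_ab: count of joint occurrences of A and B.
    """
    total = freq_a + freq_b
    if total == 0:
        return 0.0  # assumption: no occurrences means no similarity
    return 2.0 * freq_ab / total


def dice_sets(a: set, b: set) -> float:
    """Set form of the same measure: 2|A ∩ B| / (|A| + |B|)."""
    return dice(len(a), len(b), len(a & b))


if __name__ == "__main__":
    print(dice(10, 30, 5))
    print(dice_sets({1, 2, 3}, {2, 3, 4}))
```

The value is 1 when both regions coincide and 0 when they do not overlap at all.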
NAGA Similarity Measure
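combination of the confidence of a relation and the informativeness of a relation

$$\mathrm{NAGA}(A, B) = \beta \cdot \mathrm{conf}(A, B) + (1 - \beta) \cdot \mathrm{inf}(A, B)$$

$$\mathrm{conf}(A, B) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{acc}(A, B, w_i) \cdot \mathrm{trust}(w_i)$$

$$\mathrm{inf}(A, B) = \frac{\mathrm{FREQ}(A, B)}{\mathrm{FREQ}(A)}$$

$$\mathrm{inf}(B, A) = \frac{\mathrm{FREQ}(A, B)}{\mathrm{FREQ}(B)}$$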
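A minimal sketch of how confidence and informativeness might be combined. It assumes that the confidence is the mean of accuracy times trust over the n witnesses w_i of a relation, that the informativeness is the joint frequency divided by the frequency of one endpoint, and that both are mixed by a weighted sum with a weight `beta`. The function names, the pair representation of witnesses, and the default `beta = 0.5` are hypothetical.

```python
from typing import Sequence, Tuple


def confidence(witnesses: Sequence[Tuple[float, float]]) -> float:
    """Mean of acc * trust over all witnesses.

    Each witness is an (accuracy, trust) pair; this representation is an
    assumption made for the sketch.
    """
    if not witnesses:
        return 0.0
    return sum(acc * trust for acc, trust in witnesses) / len(witnesses)


def informativeness(freq_ab: int, freq_endpoint: int) -> float:
    """FREQ(A, B) / FREQ(A) or FREQ(A, B) / FREQ(B), depending on which
    endpoint frequency is passed in (one value per direction)."""
    if freq_endpoint == 0:
        return 0.0
    return freq_ab / freq_endpoint


def naga(conf: float, inf: float, beta: float = 0.5) -> float:
    """Weighted combination; beta = 0.5 is an illustrative default."""
    return beta * conf + (1.0 - beta) * inf


if __name__ == "__main__":
    conf = confidence([(0.9, 1.0), (0.8, 0.5)])
    # The two directions differ only in the endpoint frequency used.
    print(naga(conf, informativeness(5, 10)))
    print(naga(conf, informativeness(5, 50)))
```

Because the two directions divide by different endpoint frequencies, the same relation can receive clearly different values depending on the direction in which it is traversed.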
Best WordNet Similarity Measure
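product of the transfer function of the path length and the transfer function of the concept depth

$$\mathrm{WordNet}(A, B) = f_1(l) \cdot f_2(h)$$

$$f_1(l) = e^{-\alpha l}$$

$$f_2(h) = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}$$

with path length $l$ and concept depth $h$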
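The product of the two transfer functions can be sketched as below, assuming an exponentially decaying function of the path length l and a hyperbolic-tangent-shaped function of the concept depth h. The parameter names `alpha` and `beta` and their default values are illustrative assumptions, not values taken from the slides.

```python
import math


def f1(path_length: float, alpha: float = 0.2) -> float:
    """Transfer function of the path length: e^(-alpha * l)."""
    return math.exp(-alpha * path_length)


def f2(depth: float, beta: float = 0.6) -> float:
    """Transfer function of the concept depth.

    (e^(beta*h) - e^(-beta*h)) / (e^(beta*h) + e^(-beta*h)) is tanh(beta*h);
    math.tanh avoids overflow for large depths.
    """
    return math.tanh(beta * depth)


def wordnet_similarity(
    path_length: float, depth: float, alpha: float = 0.2, beta: float = 0.6
) -> float:
    """Product of the two transfer functions."""
    return f1(path_length, alpha) * f2(depth, beta)


if __name__ == "__main__":
    print(wordnet_similarity(path_length=1, depth=3))
    print(wordnet_similarity(path_length=4, depth=3))
```

Under these assumptions the value falls as the path between two concepts gets longer and rises as the concepts sit deeper in the hierarchy.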
Evaluation
[Figure: chart "All Relation Types". Y-axis: amount in % (0% to 90%). X-axis: similarity value ranges with boundaries 0, 0.01, 0.025, 0.1, 0.25, 0.5, 1. Series: Dice (WordNet ontology), Dice, NAGA backward, NAGA forward, Best WordNet.]
Evaluation
Dice measure: also applicable to the YAGO ontology
NAGA measure: applicable if the forward direction is omitted
Best WordNet measure: not applicable due to the density of YAGO
Changes for TopX
tuning of some procedures: Dijkstra algorithm, word sense disambiguation, query expansion
extension of the configuration file
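The slides name the Dijkstra algorithm only as one of the tuned procedures and do not show how TopX uses it. One plausible use in ontology-based expansion, sketched here purely as an assumption, is a best-first search that treats the similarity of a path as the product of its edge similarities and keeps, for every reachable concept, the best such product. The function name, the graph representation, and the `threshold` parameter are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def best_path_similarities(
    graph: Graph, source: str, threshold: float = 0.0
) -> Dict[str, float]:
    """Dijkstra variant maximising the product of edge similarities.

    Assumes every edge similarity lies in (0, 1], so extending a path never
    increases its similarity and the greedy order stays valid.
    """
    best: Dict[str, float] = {source: 1.0}
    heap: List[Tuple[float, str]] = [(-1.0, source)]  # max-heap via negation
    while heap:
        neg_sim, node = heapq.heappop(heap)
        sim = -neg_sim
        if sim < best.get(node, 0.0):
            continue  # stale entry; a better path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = sim * weight
            if candidate >= threshold and candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


if __name__ == "__main__":
    # Made-up similarity values for illustration only.
    toy_graph: Graph = {
        "coffee": [("drink", 0.8), ("espresso", 0.5)],
        "drink": [("espresso", 0.9), ("water", 0.1)],
    }
    print(best_path_similarities(toy_graph, "coffee", threshold=0.1))
```

In the toy graph, "espresso" is reached better via "drink" (0.8 · 0.9) than directly (0.5), and "water" is cut off by the threshold.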
Conclusion
larger knowledge base
more flexibility
increased complexity
an additional measure for the similarity computation: NAGA similarity
Questions?