Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities
[Slide 1]
ELIS – Multimedia Lab
Tom De Nies, Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle
Ghent University – IBBT – Multimedia Lab
MediaEval: Search and Hyperlinking, 4-5 October 2012, Pisa, Italy
[Slide 2]
MediaEval 2012 Brave New Task: Search and Hyperlinking, Tom De Nies (IBBT-MMLab), 05/10/2012
Our approach in a nutshell:
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
[Slide 3]
Enriched Data Representation
[Slide 4]
Enriched Data Representation

Advantages:
- Comparable queries and videos
- Extra metadata containing disambiguated concepts
- Easy conversion from video to query object → possible to use the same approach for Search and Linking!

Disadvantages:
- Enrichment step when ingesting data can take a while
- Only English NER tools → automatic translation step needed for other languages
[Slide 5]
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
[Slide 6]
Similarity metrics:
1. “Bag of words” similarity
2. Named Entity-based similarity
3. Tag-based similarity
[Slide 7]
Bag of Words similarity

Pipeline: text → stop-word removal → text without stop words → calculate term frequency (TF) and inverse document frequency (IDF) for each word → vector of TF-IDF weights:

TF(t, d) = number of occurrences of term t in document d

IDF(t) = log( |D| / |{d ∈ D : t ∈ d}| ), where D is the document corpus
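The TF and IDF definitions above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the stop-word list is a tiny hard-coded stand-in.

```python
import math
from collections import Counter

def tf_idf_vectors(docs, stop_words=frozenset({"the", "a", "of", "in"})):
    """Compute a sparse TF-IDF weight vector for each tokenized document."""
    # Stop-word removal, as in the pipeline on the slide.
    docs = [[t for t in doc if t not in stop_words] for doc in docs]
    n = len(docs)
    # Document frequency: |{d in D : t in d}|
    df = Counter(t for doc in docs for t in set(doc))
    # IDF(t) = log(|D| / |{d in D : t in d}|)
    idf = {t: math.log(n / df[t]) for t in df}
    # TF(t, d) = number of occurrences of t in d; weight = TF * IDF
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
```

Note that a term appearing in every document gets IDF 0 and therefore weight 0, which is exactly the down-weighting of common words mentioned on the next slides.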
[Slide 8]
Bag of Words similarity

Similarity between two documents = cosine similarity of their TF-IDF vectors
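The cosine measure can be sketched like this, assuming the sparse dict-of-weights representation from the TF-IDF step (illustrative only):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (dicts: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction; define similarity as 0
    return dot / (norm_u * norm_v)
```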
[Slide 9]
Bag of Words similarity
- Both corpus & documents taken into account
- Common words get a lower weight, to exploit unique features
- Expensive training step (IDF initialization)
- No semantics → ambiguity
[Slide 10]
Named Entity-based Similarity
Named Entities are extracted from the content.
Similar content will have similar entities!
[Slide 11]
Named Entity-based Similarity
Cosine similarity of vectors, as in Bag of Words, except:
- Words → Named Entities
- IDF → inverse support (IS)
TF-IS weights are used in place of TF-IDF weights.
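The slides do not spell out the inverse-support formula. As a rough sketch, one can mirror the TF-IDF scheme with entities in place of words; the definition IS(e) = 1 / support(e), with `support` a per-entity prominence score from the NER tool, is an assumption for illustration, not taken from the talk:

```python
from collections import Counter

def tf_is_vector(entities, support):
    """TF-IS weights for one document's named entities.

    `entities`: list of disambiguated entity identifiers found in the document.
    `support`: dict entity -> support score (assumed to come from the NER
    tool). Its inverse replaces IDF, so no corpus indexing is needed.
    The exact IS definition here is an assumption, not from the slides.
    """
    return {e: tf / support[e] for e, tf in Counter(entities).items()}
```

The resulting vectors can be compared with the same cosine measure as the Bag of Words vectors.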
[Slide 12]
Named Entity-based Similarity
- Fewer entities than terms → fewer calculations than BoW
- Named Entities are unambiguous
- IDF → IS: no indexing of the corpus required
- Lower precision / coarser granularity than BoW
[Slide 13]
Tag-based similarity
- Tags expanded with synonyms (WordNet) for maximal recall
- Jaccard similarity of the tag sets
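A sketch of the tag comparison: expand each tag set with synonyms, then take the Jaccard similarity of the expanded sets. The synonym lookup here is a hypothetical dict standing in for WordNet:

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0  # two empty tag sets: define similarity as 0
    return len(a & b) / len(a | b)

def expand_tags(tags, synonyms):
    """Add synonyms of each tag (hypothetical lookup; WordNet in the talk)."""
    expanded = set(tags)
    for tag in tags:
        expanded |= synonyms.get(tag, set())
    return expanded
```

Expansion lets "film" and "movie" overlap, which is the recall boost the slide mentions at the cost of precision.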
[Slide 14]
Tag-based similarity
- Uses user-generated metadata
- Very coarse granularity / low precision
- Synonyms for higher recall
[Slide 15]
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
[Slide 16]
Late Fusion
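The slides do not give the exact fusion rule. A common late-fusion scheme, shown here purely as an assumption, is a weighted sum of the per-metric scores followed by re-ranking:

```python
def late_fusion(score_lists, weights):
    """Merge per-metric results by weighted score summation.

    `score_lists`: one dict per similarity metric, mapping item id -> score
    (e.g. BoW, Named Entity-based, and tag-based similarities).
    `weights`: one weight per metric. The weighted-sum rule is an
    illustrative choice, not necessarily the one used in the talk.
    """
    fused = {}
    for scores, w in zip(score_lists, weights):
        for item, s in scores.items():
            fused[item] = fused.get(item, 0.0) + w * s
    # Rank items by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```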
[Slide 17]
Evaluation: Search

| Run | MRR @60 | MRR @30 | MRR @10 | mGAP @60 | mGAP @30 | mGAP @10 | MASP @60 | MASP @30 | MASP @10 |
|-----|---------|---------|---------|----------|----------|----------|----------|----------|----------|
| 1 (LIMSI: BoW+NE) | 0.188 | 0.15 | 0.117 | 0.120 | 0.089 | 0.033 | 0.066 | 0.066 | 0.061 |
| 2 (LIUM: BoW+NE) | 0.254 | 0.187 | 0.054 | 0.140 | 0.069 | 0.033 | 0.046 | 0.046 | 0.028 |
| 3 (LIMSI: BoW+NE+Tags) | 0.165 | 0.128 | 0.094 | 0.099 | 0.069 | 0.017 | 0.061 | 0.061 | 0.057 |
| 4 (LIUM: BoW+NE+Tags) | 0.221 | 0.154 | 0.038 | 0.115 | 0.053 | 0.017 | 0.040 | 0.041 | 0.023 |
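For reference, mean reciprocal rank (the MRR column) averages 1/rank of the first relevant result over all queries; a minimal sketch:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR over queries: average of 1 / rank of the first relevant item.

    `ranked_results`: per query, a list of item ids, best first.
    `relevant`: per query, the set of relevant item ids.
    """
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_results)
```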
[Slide 18]
Evaluation: Search

Unexpected:
- LIUM > LIMSI, even though LIMSI had better language detection → due to automatic translation?
- NE + BoW > NE + BoW + Tags → tags give false positives a higher rank and find more results, so MRR decreases
[Slide 19]
Evaluation: Search

| Run | Precision @60 | Recall @60 |
|-----|---------------|------------|
| 1 (LIMSI: BoW+NE) | 0.056 | 0.40 |
| 2 (LIUM: BoW+NE) | 0.061 | 0.467 |
| 3 (LIMSI: BoW+NE+Tags) | 0.054 | 0.433 |
| 4 (LIUM: BoW+NE+Tags) | 0.059 | 0.50 |
[Slide 20]
Evaluation: Linking

| Run | MAP (Ground Truth) | MAP (Search results) |
|-----|--------------------|----------------------|
| LIMSI (BoW + NE) | 0.157 | 0.014 |
| LIUM (BoW + NE) | 0.171 | 0.040 |
| LIMSI (BoW + NE + Tags) | 0.157 | 0.003 |
| LIUM (BoW + NE + Tags) | 0.171 | 0.037 |

Possible explanations:
- Thresholds optimized for the Search task, not for Linking
- User-generated tags vs. extracted tags
… to be investigated!
[Slide 21]
Improvements / Future Work
- Better ranking criteria / late fusion
- Improve tag similarity
- Optimize parameters for linking
[Slide 22]
Discussion
These research activities were funded by Ghent University, IBBT, the IWT Flanders, the FWO-Flanders, and the European Union, in the context of the IBBT project Smarter Media in Flanders (SMIF).