Semantics-Based News Recommendation
International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012)
June 14, 2012
Michel Capelle
Marnix Moerland
Flavius Frasincar
Frederik Hogenboom
Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
Introduction (1)
• Recommender systems help users plough through a massive and increasing amount of information
• Recommender systems:
  – Content-based
  – Collaborative filtering
  – Hybrid
• Content-based systems are often term-based
• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF), as proposed by Salton and Buckley [1988]
Introduction (2)
• One could take into account semantics:
  – Semantic Similarity (SS) recommenders:
    • Jiang & Conrath [1997]
    • Leacock & Chodorow [1998]
    • Lin [1998]
    • Resnik [1995]
    • Wu & Palmer [1994]
  – Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
    • Published at WIMS 2011
Introduction (3)
• One could take into account semantics:
  – Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Does not rely on a domain ontology
• Implementations in Ceryx (as a plug-in for Hermes [Frasincar et al., 2009], a news processing framework)
• What is the performance of semantic recommenders?
  – TF-IDF vs. SF-IDF
  – TF-IDF vs. SS
Framework: User Profile
• User profile consists of all read news items
• Implicit preference for specific topics
Framework: Preprocessing
• Before recommendations can be made, each news item is parsed:
  – Tokenizer
  – Sentence splitter
  – Lemmatizer
  – Part-of-Speech tagger
Framework: Synsets
• We make use of the WordNet dictionary and word sense disambiguation (WSD)
• Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
  – Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  – Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)
• Synsets are linked using semantic pointers
  – Hypernym, hyponym, …
Framework: TF-IDF
• Term Frequency: the occurrence of a term $t_i$ in a document $d_j$, i.e.,
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
• Inverse Document Frequency: the occurrence of a term $t_i$ in a set of documents $D$, i.e.,
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
• And hence
$$tf\text{-}idf_{i,j} = tf_{i,j} \times idf_i$$
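
The TF-IDF definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Ceryx implementation: the function name `tf_idf`, the toy documents, and the choice of the natural logarithm are assumptions (the slide does not state a log base), and the input is assumed to be already tokenized and lemmatized.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per document.

    documents: list of token lists (assumed tokenized and lemmatized).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)          # n_{i,j} for every term i in document j
        total = sum(counts.values())   # sum over k of n_{k,j}
        weights.append({
            # Natural log is an assumption; the slide gives no base.
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in counts.items()
        })
    return weights


# Hypothetical toy corpus.
docs = [
    ["microsoft", "release", "windows"],
    ["microsoft", "buy", "company"],
    ["turkey", "fly", "south"],
]
weights = tf_idf(docs)
```

A term that occurs in fewer documents, such as `windows` here, receives a higher weight than one that occurs in several, such as `microsoft`.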
Framework: SF-IDF
• Synset Frequency: the occurrence of a synset $s_i$ in a document $d_j$, i.e.,
$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
• Inverse Document Frequency: the occurrence of a synset $s_i$ in a set of documents $D$, i.e.,
$$idf_i = \log \frac{|D|}{|\{j : s_i \in d_j\}|}$$
• And hence
$$sf\text{-}idf_{i,j} = sf_{i,j} \times idf_i$$
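
SF-IDF applies the same weighting to synsets instead of terms, so synonyms that share a sense are counted together. The sketch below assumes that word sense disambiguation has already been done: the `SYNSET_OF` lookup table, its sense labels, and the synset identifiers are hypothetical stand-ins for WordNet and a WSD step, and the natural logarithm is an assumption.

```python
import math
from collections import Counter

# Hypothetical WSD result: (lemma, sense label) -> synset identifier.
SYNSET_OF = {
    ("fly", "operate airplane"): "fly/aviate/pilot",
    ("aviate", "operate airplane"): "fly/aviate/pilot",
    ("pilot", "operate airplane"): "fly/aviate/pilot",
    ("flee", "run away"): "flee/fly/take_flight",
    ("turkey", "animal"): "turkey/Meleagris_gallopavo",
    ("turkey", "country"): "Turkey/Republic_of_Turkey",
}


def sf_idf(synset_docs):
    """Return one {synset: sf-idf weight} dict per document."""
    n_docs = len(synset_docs)
    doc_freq = Counter(s for doc in synset_docs for s in set(doc))
    weights = []
    for doc in synset_docs:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({
            s: (n / total) * math.log(n_docs / doc_freq[s])
            for s, n in counts.items()
        })
    return weights


# Toy documents as disambiguated (lemma, sense) pairs.
tagged_docs = [
    [("fly", "operate airplane"), ("aviate", "operate airplane"), ("turkey", "country")],
    [("flee", "run away"), ("turkey", "animal")],
    [("turkey", "country"), ("pilot", "operate airplane")],
]
synset_docs = [[SYNSET_OF[word] for word in doc] for doc in tagged_docs]
weights = sf_idf(synset_docs)
```

In the first document, "fly" and "aviate" contribute to a single synset count, while the two senses of "turkey" stay separate across documents. A term-based count would do the opposite.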
Framework: SS (1)
• TF-IDF and SF-IDF use cosine similarity:
  – Two vectors:
    • User profile item scores
    • News message item scores
  – Measures the cosine of the angle between the vectors
• Semantic Similarity (SS):
  – Two vectors:
    • User profile synsets
    • News message synsets
  – Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  – Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
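
The cosine comparison between a user profile vector and a news message vector can be sketched as follows. The sparse dict representation, the function name `cosine_similarity`, and the toy scores are assumptions made for illustration. The 0.5 threshold is the cut-off mentioned in the evaluation.

```python
import math


def cosine_similarity(profile, item):
    """Cosine of the angle between two sparse score vectors (dicts)."""
    dot = sum(score * item.get(key, 0.0) for key, score in profile.items())
    norm_profile = math.sqrt(sum(v * v for v in profile.values()))
    norm_item = math.sqrt(sum(v * v for v in item.values()))
    if norm_profile == 0.0 or norm_item == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_profile * norm_item)


# Hypothetical TF-IDF (or SF-IDF) scores.
user_profile = {"microsoft": 0.4, "windows": 0.3}
news_item = {"microsoft": 0.2, "xbox": 0.5}

score = cosine_similarity(user_profile, news_item)
recommended = score >= 0.5  # cut-off value taken from the evaluation setup
```

Only keys present in both vectors contribute to the dot product, so a news item that shares few terms or synsets with the profile scores low regardless of its length.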
Framework: SS (2)
• SS score is calculated by computing the pair-wise similarities between synsets in the unread document u and the user profile r:
$$rank(u) = \frac{\sum_{(u,r) \in W} sim(u,r)}{|W|}$$
where W is a vector with all combinations of synsets from r and u that have a common Part-of-Speech, and where sim(u,r) is any of the mentioned SS measures.
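
The SS score above is an average of pair-wise similarities over all synset pairs that share a part of speech. The following sketch illustrates that averaging with a Wu & Palmer style measure on a tiny hand-made taxonomy. The `PARENT` hierarchy, the `(name, pos)` synset representation, and the depth convention (root at depth 1) are assumptions for illustration and do not reflect WordNet or Ceryx.

```python
from itertools import product

# Hypothetical hypernym hierarchy: node -> parent (None marks a root).
PARENT = {
    "entity": None,
    "organization": "entity",
    "company": "organization",
    "software_company": "company",
    "hardware_company": "company",
    "act": None,
    "release": "act",
    "buy": "act",
}


def ancestors(name):
    """Path from a node up to its root, the node itself included."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return path


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)), with the root at depth 1."""
    path_a, path_b = ancestors(a[0]), ancestors(b[0])
    common = [node for node in path_a if node in path_b]
    if not common:
        return 0.0
    lcs_depth = len(ancestors(common[0]))  # deepest shared ancestor
    return 2 * lcs_depth / (len(path_a) + len(path_b))


def ss_rank(unread, profile, sim):
    """Average similarity over all pairs with a common part of speech."""
    pairs = [(u, r) for u, r in product(unread, profile) if u[1] == r[1]]
    if not pairs:
        return 0.0
    return sum(sim(u, r) for u, r in pairs) / len(pairs)


# Synsets as (name, part of speech) tuples.
unread_doc = [("software_company", "n"), ("release", "v")]
user_profile = [("hardware_company", "n"), ("buy", "v"), ("company", "n")]
rank = ss_rank(unread_doc, user_profile, wu_palmer)
```

Because `sim` is passed in as a parameter, any of the other measures could be substituted without changing `ss_rank`.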
Implementation: Hermes
• The Hermes framework is used to build a news personalization service for RSS
• Its implementation is the Hermes News Portal (HNP):
  – Programmed in Java
  – Uses OWL / SPARQL / Jena / GATE / WordNet
Implementation: Ceryx
• Ceryx is a plug-in for HNP
• Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD
• Main focus is on recommendation support
• User profiles are constructed
• Computes TF-IDF, SF-IDF, and SS
Evaluation (1)
• Experiment:
  – We let 19 participants evaluate 100 news items
  – User profile: all articles that are related to Microsoft, its products, and its competitors
  – Ceryx computes TF-IDF, SF-IDF, and SS with a cut-off of 0.5
  – Measurements:
    • Accuracy
    • Precision
    • Recall
    • Specificity
    • F1-measure
    • t-tests for determining significance
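
The listed measurements follow from comparing thresholded recommender scores with a participant's judgements. The sketch below shows the standard confusion-matrix definitions. The function names, the toy scores, and the toy judgements are hypothetical and are not data from the experiment.

```python
def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(predicted, actual):
    """Compute the five measures from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # recommended, interesting
    fp = sum(1 for p, a in pairs if p and not a)      # recommended, not interesting
    fn = sum(1 for p, a in pairs if not p and a)      # missed interesting item
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly not recommended
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical similarity scores and user judgements for six news items.
scores = [0.81, 0.35, 0.62, 0.10, 0.55, 0.48]
interesting = [True, False, False, False, True, True]

predicted = [s >= 0.5 for s in scores]  # cut-off of 0.5, as in the experiment
metrics = evaluate(predicted, interesting)
```

Raising the cut-off generally trades recall for precision and specificity, which is one way to read the differences between the recommenders in the results.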
Evaluation (2)
• Results:
  – SF-IDF significantly outperforms TF-IDF
  – Almost all SS methods significantly outperform TF-IDF
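| Measure | TF-IDF | SF-IDF | J&C | L&C | L | R | W&P |
|---|---|---|---|---|---|---|---|
| Accuracy | 78.2% | 80.1% | 78.3% | 59.5% | 38.1% | 74.5% | 58.5% |
| Precision | 77.4% | 77.8% | 64.2% | 33.7% | 19.9% | 56.4% | 35.3% |
| Recall | 22.0% | 35.9% | 29.3% | 63.5% | 49.7% | 40.0% | 73.6% |
| Specificity | 97.2% | 94.7% | 94.6% | 57.9% | 34.0% | 86.3% | 52.6% |
| F1-measure | 32.0% | 46.8% | 38.4% | 43.2% | 27.7% | 42.8% | 47.1% |

J&C: Jiang & Conrath, L&C: Leacock & Chodorow, L: Lin, R: Resnik, W&P: Wu & Palmer.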
Conclusions
• Recommendation is commonly performed using TF-IDF
• Semantics can be taken into account by using synsets:
  – SF-IDF
  – SS
• Semantics-based recommendation outperforms the classic term-based recommendation
• Future work:
  – Also employ the similarity of words (e.g., named entities) missing from WordNet (e.g., based on the Google Distance)
  – Compare CF-IDF, SF-IDF, and SS with LDA (Latent Dirichlet Allocation) and ESA (Explicit Semantic Analysis)
Questions