
Semantics-Based News Recommendation

International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012)

June 14, 2012

Michel Capelle (michelcapelle@gmail.com)

Marnix Moerland (marnix.moerland@gmail.com)

Flavius Frasincar (frasincar@ese.eur.nl)

Frederik Hogenboom (fhogenboom@ese.eur.nl)

Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)

• Recommender systems help users to plough through a massive and increasing amount of information

• Recommender systems:
  – Content-based
  – Collaborative filtering
  – Hybrid

• Content-based systems are often term-based

• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988]


Introduction (2)

• One could take semantics into account:
  – Semantic Similarity (SS) recommenders:
    • Jiang & Conrath [1997]
    • Leacock & Chodorow [1998]
    • Lin [1998]
    • Resnik [1995]
    • Wu & Palmer [1994]
  – Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
    • Published at WIMS 2011


Introduction (3)

• One could take semantics into account:
  – Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Does not rely on a domain ontology

• Implemented in Ceryx (a plug-in for Hermes [Frasincar et al., 2009], a news processing framework)

• What is the performance of semantic recommenders?
  – TF-IDF vs. SF-IDF
  – TF-IDF vs. SS


Framework: User Profile

• User profile consists of all read news items

• Implicit preference for specific topics


Framework: Preprocessing

• Before recommendations can be made, each news item is parsed (a minimal sketch of such a pipeline follows below):
  – Tokenizer
  – Sentence splitter
  – Lemmatizer
  – Part-of-Speech tagger
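As a rough illustration of these four stages, the sketch below uses NLTK as a stand-in for the GATE, Stanford POS Tagger, and JAWS components named later in these slides; the example text is purely illustrative.

```python
# Rough equivalent of the parsing pipeline using NLTK (illustrative only;
# HNP/Ceryx use GATE, the Stanford POS Tagger, and the JAWS lemmatizer).
# Requires: nltk.download("punkt"), nltk.download("averaged_perceptron_tagger"),
# nltk.download("wordnet")
import nltk
from nltk.stem import WordNetLemmatizer

text = "Microsoft released a new product. Analysts flew to Redmond."  # illustrative

lemmatizer = WordNetLemmatizer()
for sentence in nltk.sent_tokenize(text):        # sentence splitter
    tokens = nltk.word_tokenize(sentence)        # tokenizer
    for token, tag in nltk.pos_tag(tokens):      # part-of-speech tagger
        # Map Penn Treebank tags to WordNet POS categories for lemmatization
        wn_pos = {"J": "a", "V": "v", "N": "n", "R": "r"}.get(tag[0], "n")
        print(token, tag, lemmatizer.lemmatize(token.lower(), pos=wn_pos))
```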


Framework: Synsets

• We make use of the WordNet dictionary and word sense disambiguation (WSD)

• Each word has a set of senses, and each sense corresponds to a set of semantically equivalent synonyms (a synset):
  – Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  – Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)

• Synsets are linked using semantic pointers
  – Hypernym, hyponym, …
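The "turkey" example above can be reproduced directly with NLTK's WordNet interface; a small sketch, assuming the WordNet corpus is installed:

```python
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

# Each sense of "turkey" is a synset: a set of semantically equivalent lemmas.
for synset in wn.synsets("turkey"):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()], synset.definition())

# Synsets are linked by semantic pointers such as hypernymy and hyponymy.
bird = wn.synset("turkey.n.01")   # the Meleagris gallopavo sense
print(bird.hypernyms())
print(bird.hyponyms())
```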


Framework: TF-IDF

• Term Frequency: the occurrence of a term $t_i$ in a document $d_j$, i.e.,

  $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$

• Inverse Document Frequency: the occurrence of a term $t_i$ in a set of documents $D$, i.e.,

  $idf_i = \log \frac{|D|}{|\{ j : t_i \in d_j \}|}$

• And hence

  $\text{tf-idf}_{i,j} = tf_{i,j} \cdot idf_i$
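A compact sketch of these formulas in Python; the tokenized-document representation and the tiny corpus are assumptions for illustration, not the Ceryx implementation.

```python
import math
from collections import Counter

def tf_idf_vector(doc_terms, corpus):
    """TF-IDF weights for one tokenized document against a corpus of tokenized documents."""
    counts = Counter(doc_terms)
    total = sum(counts.values())                          # sum_k n_{k,j}
    weights = {}
    for term, n in counts.items():
        tf = n / total                                    # tf_{i,j}
        df = sum(1 for doc in corpus if term in doc)      # |{j : t_i in d_j}|
        weights[term] = tf * math.log(len(corpus) / df)   # tf_{i,j} * idf_i
    return weights

corpus = [["microsoft", "releases", "windows"],
          ["apple", "releases", "phone"]]                 # illustrative corpus
print(tf_idf_vector(corpus[0], corpus))
```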

Framework: SF-IDF

• Synset Frequency: the occurrence of a synset $s_i$ in a document $d_j$, i.e.,

  $sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$

• Inverse Document Frequency: the occurrence of a synset $s_i$ in a set of documents $D$, i.e.,

  $idf_i = \log \frac{|D|}{|\{ j : s_i \in d_j \}|}$

• And hence

  $\text{sf-idf}_{i,j} = sf_{i,j} \cdot idf_i$
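SF-IDF follows the same weighting scheme over WSD-disambiguated synsets instead of terms. The sketch below naively takes the first WordNet sense in place of the Lesk-based WSD used by Ceryx; documents and helper names are illustrative.

```python
import math
from collections import Counter
from nltk.corpus import wordnet as wn

def first_sense_synsets(tokens):
    """Map tokens to synset identifiers. Ceryx applies word sense disambiguation
    (a Lesk-style algorithm); here we naively pick the first WordNet sense."""
    return [wn.synsets(t)[0].name() for t in tokens if wn.synsets(t)]

def sf_idf_vector(doc_synsets, corpus_synsets):
    """SF-IDF weights: the same scheme as TF-IDF, with synsets in place of terms."""
    counts = Counter(doc_synsets)
    total = sum(counts.values())
    return {s: (n / total) * math.log(len(corpus_synsets) /
                                      sum(1 for d in corpus_synsets if s in d))
            for s, n in counts.items()}

docs = [["turkey", "fly", "market"], ["windows", "phone", "market"]]  # illustrative
corpus_synsets = [first_sense_synsets(d) for d in docs]
print(sf_idf_vector(corpus_synsets[0], corpus_synsets))
```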

Framework: SS (1)

• TF-IDF and SF-IDF use cosine similarity (see the sketch below):
  – Two vectors:
    • User profile item scores
    • News message item scores
  – Measures the cosine of the angle between the vectors

• Semantic Similarity (SS):
  – Two vectors:
    • User profile synsets
    • News message synsets
  – Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  – Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
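A minimal cosine similarity over two sparse weight vectors; the dictionary representation and the example weights are assumptions for this sketch.

```python
import math

def cosine_similarity(profile, item):
    """Cosine of the angle between two sparse weight vectors (dicts: key -> weight)."""
    dot = sum(w * item[k] for k, w in profile.items() if k in item)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_item = math.sqrt(sum(w * w for w in item.values()))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)

# E.g., comparing a user-profile vector with a news-item vector (illustrative weights)
print(cosine_similarity({"microsoft": 0.4, "windows": 0.2},
                        {"microsoft": 0.3, "apple": 0.1}))
```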


Framework: SS (2)

• The SS score is calculated by computing the pair-wise similarities between synsets in the unread document u and the user profile r:

  $rank(u) = \frac{\sum_{(u,r) \in W} sim(u,r)}{|W|}$

  where W is the set of all combinations of synsets from r and u that share a Part-of-Speech, and where sim(u,r) is any of the mentioned SS measures.
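A sketch of this ranking using NLTK's WordNet similarity measures, here Wu & Palmer via wup_similarity; the other measures plug in the same way (the information-content-based ones additionally need an IC corpus). The synset lists below are illustrative.

```python
from nltk.corpus import wordnet as wn

def ss_rank(unread_synsets, profile_synsets):
    """Average pair-wise similarity over all synset pairs sharing a part of speech."""
    scores = []
    for u in unread_synsets:
        for r in profile_synsets:
            if u.pos() != r.pos():        # W only contains pairs with a common POS
                continue
            sim = u.wup_similarity(r)     # Wu & Palmer [1994]; other measures plug in here
            if sim is not None:
                scores.append(sim)
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative synset lists for an unread item and a user profile
unread = [wn.synset("turkey.n.01"), wn.synset("fly.v.01")]
profile = [wn.synset("chicken.n.01"), wn.synset("travel.v.01")]
print(ss_rank(unread, profile))
```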

Implementation: Hermes

• The Hermes framework is utilized for building a news personalization service for RSS

• Its implementation is the Hermes News Portal (HNP):
  – Programmed in Java
  – Uses OWL / SPARQL / Jena / GATE / WordNet


Implementation: Ceryx

• Ceryx is a plug-in for HNP

• Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD

• Main focus is on recommendation support

• User profiles are constructed

• Computes TF-IDF, SF-IDF, and SS


Evaluation (1)

• Experiment:
  – We let 19 participants evaluate 100 news items
  – User profile: all articles that are related to Microsoft, its products, and its competitors
  – Ceryx computes TF-IDF, SF-IDF, and SS with a cut-off of 0.5
  – Measurements (a sketch of how these follow from the confusion matrix is given below):
    • Accuracy
    • Precision
    • Recall
    • Specificity
    • F1-measure
    • t-tests for determining significance
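The reported measures follow directly from confusion-matrix counts; a minimal sketch (the counts in the example call are illustrative, not the reported results):

```python
def classification_measures(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, specificity, f1

# Illustrative counts only
print(classification_measures(tp=25, fp=7, tn=60, fn=8))
```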


Evaluation (2)

• Results:
  – SF-IDF significantly outperforms TF-IDF
  – Almost all SS methods significantly outperform TF-IDF


Measure      | TF-IDF | SF-IDF | J&C   | L&C   | L     | R     | W&P
-------------|--------|--------|-------|-------|-------|-------|------
Accuracy     | 78.2%  | 80.1%  | 78.3% | 59.5% | 38.1% | 74.5% | 58.5%
Precision    | 77.4%  | 77.8%  | 64.2% | 33.7% | 19.9% | 56.4% | 35.3%
Recall       | 22.0%  | 35.9%  | 29.3% | 63.5% | 49.7% | 40.0% | 73.6%
Specificity  | 97.2%  | 94.7%  | 94.6% | 57.9% | 34.0% | 86.3% | 52.6%
F1-measure   | 32.0%  | 46.8%  | 38.4% | 43.2% | 27.7% | 42.8% | 47.1%

(J&C = Jiang & Conrath, L&C = Leacock & Chodorow, L = Lin, R = Resnik, W&P = Wu & Palmer)

Conclusions

• Common recommendation is performed using TF-IDF

• Semantics can be taken into account by considering synsets:
  – SF-IDF
  – SS

• Semantics-based recommendation outperforms classic term-based recommendation

• Future work:
  – Also employ the similarity of words (e.g., named entities) missing from WordNet (e.g., based on the Google Distance)
  – Compare CF-IDF, SF-IDF, and SS with LDA (Latent Dirichlet Allocation) and ESA (Explicit Semantic Analysis)


Questions
