
Similarity & Recommendation

Arjen P. de Vries

arjen@cwi.nl

CWI Scientific Meeting

September 27th 2013

Recommendation

• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering (CF), today’s focus!
    • Memory-based
    • Model-based
  – Hybrid approaches

Collaborative Filtering

• Collaborative filtering (originally introduced by Patti Maes as “social information filtering”)
  1. Compare user judgments
  2. Recommend differences between similar users
• Leading principle: people’s tastes are not randomly distributed
  – A.k.a. “You are what you buy”
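The two steps above can be sketched as follows. This is a minimal illustration, not the method used in the talk: the toy ratings are hypothetical, and `agreement` (1 / (1 + mean squared difference) over co-rated items) is just one assumed way to "compare user judgments".

```python
# Hypothetical toy ratings on a 1-5 scale: user -> {item: rating}
RATINGS = {
    "alice": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "c": 1, "d": 5},
    "carol": {"a": 1, "b": 1, "c": 5, "e": 5},
}


def agreement(p, q):
    """Step 1: compare two users' judgments on the items both have rated.

    Illustrative choice: 1 / (1 + mean squared difference); 1.0 means
    identical judgments, values near 0 mean strong disagreement.
    """
    common = p.keys() & q.keys()
    if not common:
        return 0.0
    msd = sum((p[i] - q[i]) ** 2 for i in common) / len(common)
    return 1.0 / (1.0 + msd)


def recommend(ratings, user, k=1):
    """Step 2: recommend items the k most similar users rated but `user` has not."""
    mine = ratings[user]
    others = [u for u in ratings if u != user]
    neighbours = sorted(others, key=lambda u: agreement(mine, ratings[u]), reverse=True)[:k]
    scores = {}
    for v in neighbours:
        weight = agreement(mine, ratings[v])
        for item, rating in ratings[v].items():
            if item not in mine:  # only the "differences" between the two profiles
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend(RATINGS, "alice"))  # ['d']
```

Alice agrees closely with Bob and strongly disagrees with Carol, so with `k=1` only Bob's unseen item `d` is suggested.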

Collaborative Filtering

• Benefits over content-based approach
  – Overcomes problems with finding suitable features to represent e.g. art, music
  – Serendipity
  – Implicit mechanism for qualitative aspects like
• Problems: large groups, broad domains

Context

• Recommender systems
  – Users interact (rate, purchase, click) with items

Context

• Nearest-neighbour recommendation methods
  – The item prediction is based on “similar” users
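A minimal sketch of such a prediction, assuming the common similarity-weighted average over the k most similar users who rated the item; the slides do not specify the exact formula, and the names `sims`, `ratings` and `predict` are illustrative.

```python
def predict(sims, ratings, item, k=2):
    """Predict the target user's rating for `item` from the k most similar
    users who rated it (similarity-weighted average; one common variant).

    sims:    {user: similarity to the target user}
    ratings: {user: {item: rating}}
    Returns None when no neighbour has rated the item.
    """
    raters = [u for u in sims if item in ratings.get(u, {})]
    top = sorted(raters, key=lambda u: sims[u], reverse=True)[:k]
    norm = sum(abs(sims[u]) for u in top)
    if norm == 0:
        return None
    return sum(sims[u] * ratings[u][item] for u in top) / norm


# Hypothetical example data
sims = {"bob": 0.75, "carol": 0.25, "dave": 0.1}
ratings = {"bob": {"x": 4}, "carol": {"x": 2}, "dave": {"x": 5}}
print(predict(sims, ratings, "x"))  # 3.5 (dave falls outside the top k=2)
```

Many systems subtract each user's mean rating before averaging; that refinement is left out here to keep the dependence on the similarity values visible.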


Similarity

[Figure: similarity values s( , ) and sim( , ) computed between pairs of users]

Research Question

• How does the choice of similarity measure determine the quality of the recommendations?

Sparseness

• Too many items exist, so many ratings will be missing

• A user’s neighbourhood is therefore likely to extend to include “not-so-similar” users and/or items
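To make the sparseness problem concrete, the sketch below measures how much of a (hypothetical) user × item matrix is observed and how many items each pair of users has in common. With so little overlap, similarities rest on one item or none, which is how weak neighbours end up in a neighbourhood.

```python
from itertools import combinations


def density(ratings, n_items):
    """Fraction of the user x item matrix that is actually observed."""
    observed = sum(len(r) for r in ratings.values())
    return observed / (len(ratings) * n_items)


def co_rated(ratings):
    """Number of items each pair of users has both rated."""
    return {
        (u, v): len(ratings[u].keys() & ratings[v].keys())
        for u, v in combinations(sorted(ratings), 2)
    }


# Hypothetical example: 3 users, catalogue of 10 items
ratings = {"u1": {"a": 4, "b": 3}, "u2": {"b": 5, "c": 2}, "u3": {"d": 1}}
print(round(density(ratings, n_items=10), 3))  # 0.167
print(co_rated(ratings))  # {('u1', 'u2'): 1, ('u1', 'u3'): 0, ('u2', 'u3'): 0}
```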

“Best” similarity?

• Consider cosine similarity vs. Pearson similarity

• Most existing studies report Pearson correlation to lead to superior recommendation accuracy

• Common variations to deal with sparse observations:
  – Item selection:
    • Compare full profiles, or only the overlap (co-rated items)
  – Imputation:
    • Impute a default value for unrated items
  – Filtering:
    • Threshold on the minimal similarity value

• Cosine superior (!), but not for all settings
  – No consistent results
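The variations listed above can be combined with either measure. Below is a sketch under stated assumptions: profiles are `{item: rating}` dicts, Pearson is computed as the cosine of mean-centred vectors with the means taken over the aligned items only, and a pair below `threshold` is reported as `None` (not a neighbour). All names are illustrative, not taken from the paper.

```python
from math import sqrt


def align(p, q, mode="overlap", default=0.0):
    """Turn two sparse profiles {item: rating} into aligned vectors.

    mode="overlap": item selection, keep only co-rated items.
    mode="full":    compare full profiles (union of items), imputing
                    `default` for items a user has not rated.
    """
    items = p.keys() & q.keys() if mode == "overlap" else p.keys() | q.keys()
    items = sorted(items)
    return [p.get(i, default) for i in items], [q.get(i, default) for i in items]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = sqrt(sum(a * a for a in x)), sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x, y):
    """Pearson correlation = cosine of the mean-centred vectors.
    Means are taken over the aligned items only (one of several variants)."""
    if len(x) < 2:
        return 0.0
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def similarity(p, q, measure=cosine, mode="overlap", default=0.0, threshold=None):
    """Similarity under one combination of the variations; returns None when
    filtering with `threshold` discards the pair as a neighbour."""
    s = measure(*align(p, q, mode, default))
    if threshold is not None and s < threshold:
        return None
    return s


p = {"a": 5, "b": 3, "c": 4}  # hypothetical profiles
q = {"a": 4, "b": 2, "d": 5}
for measure in (cosine, pearson):
    for mode in ("overlap", "full"):
        print(measure.__name__, mode, round(similarity(p, q, measure, mode), 3))
# cosine overlap 0.997
# cosine full 0.548
# pearson overlap 1.0
# pearson full -0.487
```

On this toy pair the four combinations already disagree, Pearson even in sign, so the ranking of neighbours depends on the variant chosen as much as on the measure.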

Analysis

Distance Distribution

• In high dimensions, nearest neighbour is unstable: a query is unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour

Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999

• Quality q(n, f): fraction of users for which the similarity function has ranked at least n percent of the user community within a factor f of the nearest neighbour’s similarity value (more precisely, of its corresponding distance)
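Both notions can be computed directly from user-to-user distances. This is a sketch under assumptions: distances are given as a dict of dicts (for example `1 - similarity` for similarities in [0, 1], an assumed conversion), and "most" in the instability test is read as "more than half", exposed as the hypothetical parameter `most`.

```python
def is_unstable(dists, eps, most=0.5):
    """Instability check for one query (after Beyer et al.).

    dists: distances from the query point to all other points.
    Assumption: "most" is read as more than a fraction `most` of the points.
    """
    d_min = min(dists)
    close = sum(d < (1 + eps) * d_min for d in dists)
    return close / len(dists) > most


def quality(dist, n, f):
    """q(n, f): fraction of users for which at least n percent of the other
    users lie within a factor f of that user's nearest-neighbour distance.

    dist: {user: {other_user: distance}}, e.g. distance = 1 - similarity
    (an assumed conversion for similarities in [0, 1]).
    """
    hits = 0
    for u, row in dist.items():
        dists = [d for v, d in row.items() if v != u]
        d_min = min(dists)
        within = sum(d <= f * d_min for d in dists)
        if 100.0 * within / len(dists) >= n:
            hits += 1
    return hits / len(dist)


# Hypothetical distances between three users
D = {
    "a": {"b": 0.10, "c": 0.105},
    "b": {"a": 0.10, "c": 0.80},
    "c": {"a": 0.105, "b": 0.80},
}
print(is_unstable(list(D["a"].values()), eps=0.1))  # True
print(quality(D, n=100, f=1.1))  # 0.3333333333333333 (only user "a" qualifies)
```

User "a" sees everyone at almost the same distance, so its nearest neighbour carries little information; users "b" and "c" have one clearly closer neighbour.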

NNk Graph

• Graph that links each user to its top k nearest neighbours

• Analysis focusing on the binary relation of whether a user does or does not belong to a neighbourhood
  – Ignore similarity values (already included in the distance distribution analysis)
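A minimal sketch of building the NNk graph from a user-user similarity table, keeping only neighbourhood membership as described above. The in-degree (how many neighbourhoods a user belongs to) is shown as one example of a graph feature; it is an illustrative choice, not necessarily one of the features used in the study.

```python
def nnk_graph(sim, k):
    """NN_k graph as adjacency sets: edge u -> v iff v is among u's k most
    similar users. Only membership is kept; similarity values are dropped."""
    graph = {}
    for u, row in sim.items():
        # sort by decreasing similarity, breaking ties by user id for determinism
        ranked = sorted((v for v in row if v != u), key=lambda v: (-row[v], v))
        graph[u] = set(ranked[:k])
    return graph


def in_degree(graph):
    """Number of neighbourhoods each user belongs to (an example graph feature)."""
    degree = {u: 0 for u in graph}
    for neighbours in graph.values():
        for v in neighbours:
            degree[v] += 1
    return degree


S = {  # hypothetical symmetric user-user similarities
    "a": {"b": 0.9, "c": 0.8, "d": 0.1},
    "b": {"a": 0.9, "c": 0.2, "d": 0.3},
    "c": {"a": 0.8, "b": 0.2, "d": 0.7},
    "d": {"a": 0.1, "b": 0.3, "c": 0.7},
}
G = nnk_graph(S, k=1)
print(in_degree(G))  # {'a': 2, 'b': 1, 'c': 1, 'd': 0}
```

With `k=1`, user "d" belongs to no neighbourhood at all, while "a" acts as a hub; such structural differences are visible without looking at the similarity values themselves.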


MRR vs. Features

• Quality:
  – If most of the user population is far away, high similarity correlates with effectiveness
  – If most of the user population is close, high similarity correlates with ineffectiveness


Conclusions (so far)

• “Similarity features” correlate with recommendation effectiveness
  – “Stability” of a metric (as defined in the database literature on k-NN search in high dimensions) is related to its ability to discriminate between good and bad neighbours

Future Work

• How can we exploit this knowledge to improve recommender systems?

News Recommendation Challenge

Thanks

• Alejandro Bellogín – ERCIM fellow in the Information Access group

Details: Bellogín and De Vries, ICTIR 2013.

