Similarity & Recommendation
Arjen P. de Vries
arjen@cwi.nl
CWI Scientific Meeting, September 27th 2013



Recommendation

• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering (CF)
    • Memory-based
    • Model-based
  – Hybrid approaches


Today’s focus: collaborative filtering, in particular memory-based (nearest-neighbour) methods.


Collaborative Filtering
• Collaborative filtering (originally introduced by Patti Maes as “social information filtering”):
  1. Compare user judgments
  2. Recommend differences between similar users
• Leading principle: people’s tastes are not randomly distributed
  – A.k.a. “You are what you buy”
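The two steps above can be sketched in a few lines of Python. This is a toy illustration, not Maes’s algorithm or the method studied in this talk: the judgment comparison (`agreement`, one over one plus the mean absolute rating difference on co-rated items), the use of a single most-similar user, and the `min_rating` threshold are all assumptions made for the example.

```python
from typing import Dict, List

Profile = Dict[str, float]  # item -> rating

def agreement(a: Profile, b: Profile) -> float:
    """Toy judgment comparison (assumption): 1 / (1 + mean absolute rating
    difference) over co-rated items; 0.0 if the users share no items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    mad = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)

def recommend_differences(target: str, profiles: Dict[str, Profile],
                          min_rating: float = 4.0) -> List[str]:
    """Step 1: find the user whose judgments agree most with the target.
    Step 2: recommend items that user rated highly but the target has not rated."""
    others = [u for u in profiles if u != target]
    best = max(others, key=lambda u: agreement(profiles[target], profiles[u]))
    mine = profiles[target]
    return sorted(i for i, r in profiles[best].items()
                  if i not in mine and r >= min_rating)

profiles = {
    "alice": {"jazz": 5, "blues": 4, "techno": 1},
    "bob":   {"jazz": 5, "blues": 5, "techno": 1, "soul": 5, "metal": 2},
    "carol": {"jazz": 1, "techno": 5, "house": 5},
}
print(recommend_differences("alice", profiles))
```

Here bob agrees far more with alice than carol does, so alice is offered the item bob liked that she has not rated yet (“soul”); bob’s low-rated “metal” is filtered out.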


Collaborative Filtering
• Benefits over the content-based approach:
  – Overcomes the problem of finding suitable features to represent, e.g., art or music
  – Serendipity
  – Implicit mechanism for qualitative aspects such as style
• Problems: large groups, broad domains


Context
• Recommender systems
  – Users interact (rate, purchase, click) with items


Context
• Nearest-neighbour recommendation methods
  – The item prediction is based on “similar” users
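A minimal sketch of user-based nearest-neighbour prediction, assuming ratings are stored as per-user dictionaries: the predicted rating is the similarity-weighted average of the ratings given to the item by the k most similar users who rated it. The names `predict` and `cosine`, the default k = 2, and the choice of cosine over full profiles (unrated items treated as 0) are illustrative assumptions; many variants exist, for example mean-centring the neighbours’ ratings.

```python
import math
from typing import Callable, Dict

Profile = Dict[str, float]  # item -> rating

def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity over full profiles; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(user: str, item: str, profiles: Dict[str, Profile], k: int = 2,
            sim: Callable[[Profile, Profile], float] = cosine) -> float:
    """Weighted average of the ratings for `item` by the k users most similar
    to `user` among those who rated it; NaN if no neighbour is usable."""
    candidates = [(sim(profiles[user], p), p[item])
                  for v, p in profiles.items() if v != user and item in p]
    neighbours = sorted(candidates, reverse=True)[:k]  # highest similarity first
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return float("nan")
    return sum(s * r for s, r in neighbours) / norm

profiles = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
print(predict("u1", "c", profiles, k=1), predict("u1", "c", profiles, k=2))
```

The `sim` parameter is where the choice of similarity measure enters; swapping it changes which users count as “similar” and hence the prediction.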


Similarity


[Figure: similarity function s(·, ·), or sim(·, ·), applied to pairs of users]


Research Question

• How does the choice of similarity measure determine the quality of the recommendations?


Sparseness

• So many items exist that many ratings will be missing

• A user’s neighbourhood is therefore likely to extend to “not-so-similar” users and/or items


“Best” similarity?

• Consider cosine similarity vs. Pearson similarity

• Most existing studies report that Pearson correlation leads to superior recommendation accuracy
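To make the comparison concrete: cosine similarity compares raw rating vectors, whereas Pearson correlation is the cosine of the mean-centred vectors, so it ignores differences in a user’s overall rating level. A minimal sketch over two dense rating vectors of equal length (the example data are invented for illustration):

```python
import math
from typing import Sequence

def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two rating vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation = cosine of the mean-centred vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Two users with the same preference pattern but different rating levels
u = [5, 4, 5, 4]
v = [2, 1, 2, 1]
print(round(cosine(u, v), 3), round(pearson(u, v), 3))
```

Here cosine gives about 0.978 and Pearson exactly 1.0: the two users agree perfectly on which items they prefer, but use different parts of the rating scale.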


“Best” similarity?

• Common variations to deal with sparse observations:
  – Item selection:
    • Compare full profiles, or only the overlap
  – Imputation:
    • Impute a default value for unrated items
  – Filtering:
    • Threshold on the minimum similarity value
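The three variations can be combined in a single similarity function. A minimal sketch, assuming cosine as the base measure and dictionary profiles; the parameter names `overlap_only`, `default` and `min_sim` are hypothetical, and the slides do not say which default value or threshold was used.

```python
import math
from typing import Dict

Profile = Dict[str, float]  # item -> rating

def cosine_variant(a: Profile, b: Profile, overlap_only: bool = True,
                   default: float = 0.0) -> float:
    """Cosine similarity under two of the sparsity variations.
    Item selection: overlap_only=True compares only co-rated items;
    overlap_only=False compares full profiles over the union of items.
    Imputation: in the full-profile case, unrated items get `default`."""
    items = (a.keys() & b.keys()) if overlap_only else (a.keys() | b.keys())
    x = [a.get(i, default) for i in items]
    y = [b.get(i, default) for i in items]
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0

def neighbourhood(user: str, profiles: Dict[str, Profile], min_sim: float,
                  **variant) -> Dict[str, float]:
    """Filtering: keep only users whose similarity to `user` is >= min_sim."""
    sims = {v: cosine_variant(profiles[user], p, **variant)
            for v, p in profiles.items() if v != user}
    return {v: s for v, s in sims.items() if s >= min_sim}

profiles = {
    "a": {"x": 5, "y": 3},
    "b": {"x": 5, "y": 3, "z": 1, "w": 1},
    "c": {"z": 4},
}
print(cosine_variant(profiles["a"], profiles["b"]))                      # overlap only
print(cosine_variant(profiles["a"], profiles["b"], overlap_only=False))  # impute 0
print(cosine_variant(profiles["a"], profiles["b"], overlap_only=False, default=3.0))
print(neighbourhood("a", profiles, min_sim=0.5))
```

On this toy data the overlap-only similarity between a and b is 1.0, because they agree exactly on the items they share, while comparing full profiles lowers it (to about 0.97 with 0 imputed, about 0.92 with 3 imputed). The same pair of users can therefore look more or less similar depending on the variation chosen.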


“Best” similarity?

• Cosine superior (!), but not in all settings
  – No consistent results


Analysis


Distance Distribution

• In high dimensions, nearest-neighbour search becomes unstable:
  – A query is unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour

Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
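Beyer et al.’s criterion can be checked directly for a single query, and the underlying effect can be reproduced on synthetic data. A minimal sketch, assuming Euclidean distance, uniformly random points, and reading “most data points” as more than half (the `most` parameter); `is_unstable` and `contrast` are hypothetical helper names.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]

def is_unstable(query: Point, points: List[Point], eps: float,
                most: float = 0.5) -> bool:
    """True if more than a fraction `most` of the points lie within
    (1 + eps) times the query's nearest-neighbour distance."""
    d = [math.dist(query, p) for p in points]
    d_nn = min(d)
    close = sum(1 for x in d if x <= (1 + eps) * d_nn)
    return close > most * len(d)

def contrast(dim: int, n: int = 200, seed: int = 0) -> float:
    """Ratio of farthest to nearest distance from a random query to n uniform points."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    d = [math.dist(query, p) for p in points]
    return max(d) / min(d)

for dim in (2, 10, 100, 1000):
    print(dim, round(contrast(dim), 2))  # the ratio tends towards 1 as dim grows
```

As the ratio between the farthest and nearest distance approaches 1, almost every point falls within (1 + ε) of the nearest neighbour for small ε, which is exactly the unstable case above.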


Distance Distribution

• Quality q(n, f): the fraction of users for which the similarity function has ranked at least n percent of the user community within a factor f of the nearest neighbour’s similarity value (strictly speaking, of its corresponding distance)
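The quality measure can be computed from a matrix of pairwise user distances. A minimal sketch, assuming the distances are already given (one possible conversion from a similarity s is 1 − s, but the slide does not specify one), that f ≥ 1, and that the nearest neighbour itself counts among the users within the factor; `quality` is a hypothetical name.

```python
from typing import Sequence

def quality(dist: Sequence[Sequence[float]], n: float, f: float) -> float:
    """q(n, f): fraction of users u for which at least n percent of the other
    users lie within distance f * d_nn(u), where d_nn(u) is the distance from
    u to its nearest neighbour (self-distances on the diagonal are ignored)."""
    users = range(len(dist))
    hits = 0
    for u in users:
        others = [dist[u][v] for v in users if v != u]
        d_nn = min(others)
        within = sum(1 for d in others if d <= f * d_nn)
        if 100.0 * within / len(others) >= n:
            hits += 1
    return hits / len(dist)

# Toy symmetric distance matrix for four users
D = [
    [0.0, 1.0, 1.1, 5.0],
    [1.0, 0.0, 4.0, 4.0],
    [1.1, 4.0, 0.0, 4.2],
    [5.0, 4.0, 4.2, 0.0],
]
print(quality(D, n=50, f=1.2))
```

For this matrix, users 0 and 3 have at least half of the community within a factor 1.2 of their nearest neighbour, so q(50, 1.2) = 0.5. A high q(n, f) means that, for many users, much of the community is nearly as close as the nearest neighbour, i.e. the unstable situation described by Beyer et al.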


NNk Graph

• Graph associated with the top k nearest neighbours
• Analysis focusing on the binary relation of whether a user does or does not belong to a neighbourhood
  – Ignores similarity values (already included in the distance distribution analysis)
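A minimal sketch of building the NNk graph from a user-user similarity matrix and reading off one statistic of the binary membership relation, the in-degree (how many neighbourhoods a user belongs to). The helper names and the choice of in-degree as the example statistic are assumptions; the slides do not list which graph features were analysed. Ties in similarity are broken by user index.

```python
from typing import Dict, Sequence, Set

def nnk_graph(sim: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """Directed NN_k graph: edge u -> v iff v is among the k users most similar
    to u (u excluded). Only membership is kept, not the similarity values."""
    graph: Dict[int, Set[int]] = {}
    for u, row in enumerate(sim):
        others = [v for v in range(len(row)) if v != u]
        # Stable sort: equal similarities keep ascending user-index order.
        ranked = sorted(others, key=row.__getitem__, reverse=True)
        graph[u] = set(ranked[:k])
    return graph

def in_degrees(graph: Dict[int, Set[int]]) -> Dict[int, int]:
    """Number of neighbourhoods each user belongs to."""
    deg = {u: 0 for u in graph}
    for neighbours in graph.values():
        for v in neighbours:
            deg[v] += 1
    return deg

S = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
g = nnk_graph(S, k=2)
print(g)
print(in_degrees(g))
```

User 3 has in-degree 0: it appears in nobody’s neighbourhood, so with k = 2 it never contributes to another user’s predictions.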


MRR (Mean Reciprocal Rank) vs. Features

• Quality:
  – If most of the user population is far away, high similarity correlates with effectiveness
  – If most of the user population is close, high similarity correlates with ineffectiveness
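MRR averages, over users, the reciprocal of the rank at which the first relevant item appears in the recommendation list. A minimal sketch, assuming each user’s relevant items are known (e.g. held-out test items; the slides do not state the evaluation protocol) and that a user with no relevant item in the list contributes 0:

```python
from typing import Dict, List, Set

def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / (1-based position of the first relevant item); 0.0 if none appears."""
    for pos, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0

def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         relevant: Dict[str, Set[str]]) -> float:
    """Average reciprocal rank over all users that received a ranking."""
    rrs = [reciprocal_rank(rankings[u], relevant.get(u, set())) for u in rankings]
    return sum(rrs) / len(rrs)

rankings = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h"]}
relevant = {"u1": {"a"}, "u2": {"f"}, "u3": {"x"}}
print(mean_reciprocal_rank(rankings, relevant))  # (1 + 1/3 + 0) / 3
```

Keeping the per-user reciprocal ranks, rather than only their mean, would allow plotting them against per-user similarity features; that pairing is an illustration, not necessarily how the plots in the talk were produced.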


Conclusions (so far)

• “Similarity features” correlate with recommendation effectiveness
  – The “stability” of a metric (as defined in the database literature on k-NN search in high dimensions) is related to its ability to discriminate between good and bad neighbours


Future Work

• How can this knowledge now be exploited to improve recommender systems?


News Recommendation Challenge


Thanks

• Alejandro Bellogín – ERCIM fellow in the Information Access group

Details: Bellogín and De Vries, ICTIR 2013.