EVALUATING RECOMMENDER SYSTEMS - Home | Ml … · 2020-03-13 · 3 REFERENCES [1] Humberto Jesús...


Page 1

EVALUATING RECOMMENDER SYSTEMS

ACCURACY AND BEYOND

HUMBERTO CORONA @TOTOPAMPIN

24-10-2016

GITHUB.COM/HCORONA/AICS-2016

Page 2

ABOUT ME

Page 3

REFERENCES

[1] Humberto Jesús Corona Pampín, Houssem Jerbi, and Michael P. O’Mahony. "Evaluating the Relative Performance of Neighbourhood-Based Recommender Systems." Spanish Conference on Information Retrieval, 2014.

[2] Humberto Jesús Corona Pampín, Houssem Jerbi, and Michael P. O’Mahony. "Evaluating the Relative Performance of Collaborative Filtering Recommender Systems." Journal of Universal Computer Science 21.13 (2015): 1849-1868.

Page 4

https://www.zalando.co.uk/women-street-style/
https://www.zalando.co.uk/men-street-style/


ZALANDO

Page 5

RECOMMENDER SYSTEMS

Recommender systems enable content discovery by learning user preferences and exploiting the wisdom of the crowd.

Page 6

EVALUATION

Page 7

EVALUATION METRICS

PRECISION, RECALL, F-1, RMSE

DIVERSITY, POPULARITY, CATALOG COVERAGE, PER-USER ITEM COVERAGE, UNIQUENESS

Page 8

EVALUATION METRICS, ACCURACY

PRECISION, RECALL, F-1, RMSE
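The slides name the accuracy metrics without giving formulas. As a minimal sketch, assuming each user's top-N list is compared against that user's held-out relevant items (for precision, recall and F-1) and that predicted and true ratings are paired by position (for RMSE), they could be computed as follows; the function names are hypothetical.

```python
import math

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F-1 of one user's top-N list against held-out relevant items."""
    hits = len(set(recommended) & set(relevant))  # recommended items the user actually likes
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return precision, recall, f1

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings, paired by position."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

For example, recommending `["a", "b", "c", "d"]` to a user whose held-out items are `["b", "d", "e"]` gives 2 hits, so precision is 2/4 = 0.5, recall is 2/3 and F-1 is 4/7. In a full evaluation these per-user values would typically be averaged over all test users.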

Page 9

EVALUATION METRICS, BEYOND ACCURACY

DIVERSITY, POPULARITY, CATALOG COVERAGE, PER-USER ITEM COVERAGE, UNIQUENESS

Page 10

EVALUATION METRICS

DIVERSITY

Page 11

EVALUATION METRICS

POPULARITY

Page 12

EVALUATION METRICS

CATALOG COVERAGE: The proportion of items in the catalog that ever get recommended.

PER-USER ITEM COVERAGE: The proportion of items across the catalog that are candidates for recommendation to a user.
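A minimal sketch of both coverage metrics, assuming recommendation lists and candidate sets are held in plain Python dicts keyed by user. Averaging the per-user figure over users is an assumption, and the function names are hypothetical.

```python
def catalog_coverage(rec_lists, catalog):
    """Fraction of catalog items that appear in at least one user's recommendation list."""
    catalog_set = set(catalog)
    recommended = set()
    for items in rec_lists.values():
        recommended.update(items)
    return len(recommended & catalog_set) / len(catalog_set)

def per_user_item_coverage(candidates, catalog):
    """Average, over users, of the fraction of the catalog the model can score for that user.

    `candidates` maps each user to the items the model considers as recommendation
    candidates; for UKNN, for example, only items rated by the user's neighbours.
    """
    catalog_set = set(catalog)
    if not candidates:
        return 0.0
    per_user = [len(set(items) & catalog_set) / len(catalog_set) for items in candidates.values()]
    return sum(per_user) / len(per_user)
```

The two metrics answer different questions: catalog coverage looks at what is actually recommended across all users, while per-user item coverage looks at what the model is able to recommend to each user at all.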

Page 13

EVALUATION METRICS

UNIQUENESS

Page 16

ARE UKNN AND IKNN REALLY THAT DIFFERENT?

A COMPARATIVE ANALYSIS

Page 17

EXPERIMENT DESIGN

THE DATA: MOVIELENS-100K and MOVIELENS-1M, split into TRAINING DATA and TESTING DATA (10 ITEMS TEST SET)

THE MODELS: UKNN (neighbourhood sizes in [20, 200]) and IKNN (fixed)

EVALUATION: ACCURACY and BEYOND ACCURACY
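The slide only states that the test set consists of 10 items. One reading, sketched below as an assumption rather than the talk's exact protocol, is a per-user hold-out: 10 randomly chosen items per user go to the test set and the remaining items to the training set. Keeping users with too few ratings entirely in training, the fixed seed and the function name `holdout_split` are likewise assumptions.

```python
import random

def holdout_split(ratings, n_test=10, seed=42):
    """Split {user: {item: rating}} into train/test by holding out n_test random items per user.

    Users with n_test or fewer ratings are kept entirely in the training data (an assumption).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = {}, {}
    for user, items in ratings.items():
        item_ids = sorted(items)  # sort so the sample depends only on the seed
        if len(item_ids) <= n_test:
            train[user] = dict(items)
            continue
        held_out = set(rng.sample(item_ids, n_test))
        test[user] = {i: items[i] for i in item_ids if i in held_out}
        train[user] = {i: items[i] for i in item_ids if i not in held_out}
    return train, test
```

Every model is then trained on `train` and asked for top-N lists that are scored against `test`, so UKNN and IKNN are compared on exactly the same held-out items.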

Page 18

THE ALGORITHMS

USER-BASED COLLABORATIVE FILTERING (UKNN)

• Find similar users
• Word of mouth
• The neighbours paradigm
• Scales with number of users

ITEM-BASED COLLABORATIVE FILTERING (IKNN)

• Find similar items
• Scalable
• Widely used
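Both neighbourhood-based approaches can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: it assumes cosine similarity, weighted-sum scoring and a dict-of-dicts rating matrix, and all function names are hypothetical. The parameter `k` plays the role of the neighbourhood size: UKNN picks the k most similar users, while IKNN scores each unseen item from its k most similar items among those the user has rated.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {key: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def by_item(ratings):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    columns = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            columns[item][user] = r
    return columns

def top_n(scores, n):
    """Return the n highest-scoring items, breaking ties by item id."""
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

def uknn_recommend(ratings, user, k=20, n=10):
    """User-based kNN: score unseen items with the ratings of the k most similar users."""
    seen = ratings[user]
    sims = sorted(((cosine(seen, ratings[v]), v) for v in ratings if v != user), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, neighbour in sims:
        if sim <= 0:
            continue
        for item, r in ratings[neighbour].items():
            if item not in seen:  # only items the neighbourhood rated can become candidates
                scores[item] += sim * r
    return top_n(scores, n)

def iknn_recommend(ratings, user, k=20, n=10):
    """Item-based kNN: score each unseen item by its k most similar items the user rated."""
    columns = by_item(ratings)
    seen = ratings[user]
    scores = {}
    for item, column in columns.items():
        if item in seen:
            continue
        neighbours = sorted(((cosine(column, columns[j]), j) for j in seen), reverse=True)[:k]
        score = sum(sim * seen[j] for sim, j in neighbours if sim > 0)
        if score > 0:
            scores[item] = score
    return top_n(scores, n)
```

The UKNN sketch makes the "neighbours paradigm" visible: only items rated by the selected neighbours can ever be scored, so the neighbourhood size directly limits the candidate set. In IKNN, item-item similarities depend only on the data and not on the target user, which is why they can be precomputed offline in practice; the sketch recomputes them for clarity.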

Page 19

RESULTS

Page 20

RESULTS

Page 21

RESULTS

Page 22

SUMMARY

Page 23

LESSONS LEARNED

• One size does not fit all, never, ever!
• Use many metrics, even if you don’t optimise for them: they help you understand what the model is doing.
• Use various datasets (if you want to publish a paper): do the results generalise?
• Understand which proxy or dataset best suits your evaluation goal.

Page 24

CONCLUSIONS

• User-based (UKNN) and item-based (IKNN) collaborative filtering algorithms show a strong inverse correlation between popularity and diversity.

• Smaller neighbourhood sizes (for UKNN) lead to more unique, less popular, and more diverse recommendations.

• At large neighbourhood sizes, UKNN and IKNN recommend a common set of items.

• The matrix factorisation approach (WMF) leads to more accurate and more diverse recommendations, while being less biased toward popularity.

• Item-based collaborative filtering (IKNN) has significantly better catalog coverage.

Page 26

EXPERIMENT II

Page 27

A BIAS ANALYSIS

Page 28

EXPERIMENT DESIGN

THE DATA: FACEBOOK DATASET, MOVIELENS-HETREC and LASTFM-HETREC, split into TRAINING DATA and TESTING DATA (10-FOLD CROSS-VALIDATION)

THE MODELS: UKNN, IKNN and WMF

EVALUATION: ACCURACY, BEYOND ACCURACY, ACCURACY OPTIMISATION and SIGNIFICANCE
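The slide names 10-fold cross-validation without further detail. A minimal sketch, assuming folds are formed by shuffling (user, item, rating) triples uniformly at random; a per-user stratified split would be an equally valid design, and the function name `k_fold_splits` is hypothetical.

```python
import random

def k_fold_splits(triples, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation over (user, item, rating) triples.

    Every triple lands in exactly one test fold; the other k - 1 folds form the training data.
    """
    data = list(triples)
    random.Random(seed).shuffle(data)       # reproducible random assignment to folds
    folds = [data[f::k] for f in range(k)]  # round-robin folds whose sizes differ by at most one
    for f in range(k):
        train = [t for g in range(k) if g != f for t in folds[g]]
        yield train, folds[f]
```

Each algorithm is trained and evaluated once per fold and the ten scores are averaged; the per-fold scores are also what a significance test between algorithms would typically be run on.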

Page 29

THE DATASETS

FACEBOOK DATASET: MUSIC/BANDS

MOVIELENS-HETREC: MOVIES

LASTFM-HETREC: MUSIC/BANDS

Page 30

THE ALGORITHMS

USER-BASED COLLABORATIVE FILTERING (UKNN)

• Find similar users
• Word of mouth
• The neighbours paradigm
• Scales with number of users

ITEM-BASED COLLABORATIVE FILTERING (IKNN)

• Find similar items
• Scalable
• Widely used

MATRIX FACTORISATION (WEIGHTED)

• Latent factors
• Very accurate
• Scalable
• Parallel computing
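The slide does not say which weighted matrix factorisation variant was used. A widely used formulation for implicit feedback is that of Hu, Koren and Volinsky (2008): every user-item cell is fitted with a confidence weight `1 + alpha * r`, and user and item factors are learned by alternating least squares (ALS). The sketch below assumes that formulation; the dense NumPy matrices (suitable only for toy data), `alpha`, the regularisation strength `reg` and the function names are all assumptions.

```python
import numpy as np

def wmf_als(R, n_factors=10, reg=0.1, alpha=40.0, n_iters=10, seed=0):
    """Weighted matrix factorisation for implicit feedback, trained with alternating least squares.

    R is a dense (n_users, n_items) array of non-negative interaction counts.
    Returns user factors X (n_users, n_factors) and item factors Y (n_items, n_factors).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)  # binary preference: did the user interact with the item?
    C = 1.0 + alpha * R        # confidence weight for every user-item cell
    X = rng.normal(scale=0.01, size=(n_users, n_factors))
    Y = rng.normal(scale=0.01, size=(n_items, n_factors))
    reg_eye = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix Y and solve a weighted ridge regression for each user vector.
        for u in range(n_users):
            YtCu = Y.T * C[u]  # equals Y^T diag(C[u]) without building the diagonal matrix
            X[u] = np.linalg.solve(YtCu @ Y + reg_eye, YtCu @ P[u])
        # Fix X and solve a weighted ridge regression for each item vector.
        for i in range(n_items):
            XtCi = X.T * C[:, i]
            Y[i] = np.linalg.solve(XtCi @ X + reg_eye, XtCi @ P[:, i])
    return X, Y

def wmf_recommend(X, Y, R, user, n=10):
    """Rank the items the user has not interacted with by predicted preference X[user] . Y[i]."""
    scores = X[user] @ Y.T
    scores[R[user] > 0] = -np.inf  # never recommend already-seen items
    return [int(i) for i in np.argsort(-scores)[:n]]
```

Within each half-step, every user's (and every item's) least-squares solve is independent of the others, which is what makes ALS straightforward to parallelise. Unlike UKNN, the model produces a score for every user-item pair, so almost the whole catalog becomes a recommendation candidate for each user.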

Page 31

EVALUATION METRICS

• PRECISION: Of the items recommended, how many are good recommendations?

• RECALL: How many of the items the user likes are recommended?

• F-1: Combines precision and recall into a single metric (their harmonic mean).

• DIVERSITY: How different are the items in the recommendation list from one another?

• POPULARITY: How popular are the recommended items?

• (PER-USER) ITEM COVERAGE: The proportion of items that are candidates for recommendation.

• CATALOG COVERAGE: The proportion of items in the catalog that ever get recommended.

• UNIQUENESS: How many items differ between two recommendation lists?
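The slides define the beyond-accuracy metrics only informally. A minimal sketch under stated assumptions: diversity as the mean pairwise dissimilarity (1 minus the Jaccard similarity of item feature sets, such as genres) within one list; popularity as the mean fraction of users who interacted with each recommended item; uniqueness as the fraction of one list's items that are absent from another list. These formulas and the function names are illustrative choices, not necessarily those used in the papers.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def intra_list_diversity(rec_list, item_features):
    """Mean pairwise dissimilarity (1 - Jaccard) between the items of one recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard(item_features[i], item_features[j]) for i, j in pairs) / len(pairs)

def mean_popularity(rec_list, ratings):
    """Mean fraction of users (in {user: {item: rating}}) who interacted with each recommended item."""
    if not rec_list or not ratings:
        return 0.0
    counts = {}
    for items in ratings.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return sum(counts.get(i, 0) for i in rec_list) / (len(rec_list) * len(ratings))

def uniqueness(list_a, list_b):
    """Fraction of the distinct items in list_a that do not appear in list_b."""
    a = set(list_a)
    return len(a - set(list_b)) / len(a) if a else 0.0
```

Uniqueness is naturally computed between the lists two algorithms produce for the same user (for example UKNN versus IKNN), then averaged over users; a value near 0 means both algorithms recommend essentially the same items.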

Page 32

RESULTS

Page 33

RESULTS - POPULARITY BIAS

Page 34

RESULTS - OTHER PROPERTIES

• Accuracy: WMF performs best in terms of F-1 on the Facebook and MovieLens datasets, while UKNN and IKNN achieve similar accuracy.

• Per-user item coverage: the WMF algorithm considers almost every item as a candidate (UICov > 98%). For the UKNN algorithm, by definition, only items rated by users in the target user’s neighbourhood can be considered as recommendation candidates. IKNN was seen to outperform UKNN in all datasets in terms of this metric.

• Catalog coverage: the IKNN algorithm performs significantly better than the other algorithms, covering up to 30% of the item catalog, up to 6 times more items than the UKNN and WMF algorithms.

• Diversity: the WMF algorithm performs best, scoring around 9% higher on average than the best neighbourhood-based approach.

Page 35

RESULTS - CONSISTENCY

• It is important to evaluate on different datasets.

• On the MovieLens dataset (3 times denser than the Facebook and LastFM datasets), the catalog coverage of the IKNN algorithm is roughly 10 times smaller than on the LastFM and Facebook datasets.
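Dataset density is commonly measured as the fraction of the user-item matrix that is observed. A minimal sketch, assuming the item count is taken from the items that appear in the data (a hypothetical helper, not the papers' preprocessing):

```python
def density(ratings):
    """Fraction of observed cells in the user-item matrix given as {user: {item: rating}}."""
    n_users = len(ratings)
    n_items = len({item for items in ratings.values() for item in items})
    n_observed = sum(len(items) for items in ratings.values())
    return n_observed / (n_users * n_items) if n_users and n_items else 0.0
```

A dataset that is "3 times more dense" than another simply has a density value three times larger: each user has, on average, rated a three times larger share of the catalog.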