Recommending From Experience · Insight Centre for Data Analytics 30 Oct 2015 Slide 19 • Yelp...


Recommending From Experience

Francisco J Peña

30 Oct 2015



Recommender Systems

Collaborative Filtering

User reviews

Sentiment Analysis

Feature Extraction

Context Extraction

30 Oct 2015 Insight Centre for Data Analytics Slide 3


Context

30 Oct 2015 Insight Centre for Data Analytics Slide 4


Context in Recommender Systems

Context-Aware Recommender Systems

•  Most of them predefine context
•  Small number of features
•  Small number of values
•  Example: “I’m travelling for: Work / Leisure; Companion: Solo / Couple / Family”

Open-ended context is very wide

•  Context is richer: it is open-ended
•  Birthday, anniversary, parking, accessibility, eat-in vs. takeaway, pet friendly, …

30 Oct 2015 Insight Centre for Data Analytics Slide 7


Motivation

•  Discover open-ended contextual information from user reviews. Context doesn’t have to be predefined.

•  Incorporate the open-ended contextual information, together with ratings, into a context-sensitive recommender system.

•  Make recommendations that better satisfy user goals.

30 Oct 2015 Insight Centre for Data Analytics Slide 8


Assumptions

Specific Review

•  “During the summer, we like to take a mini staycation. This year it was extra special as we also got engaged. Our stay at the Biltmore was just fantastic. The service exceptional, the food amazing.”

Generic Review

•  “Nice hotel, all the amenities you need, great complex of pools.”

30 Oct 2015 Insight Centre for Data Analytics Slide 10

Specific reviews contain more contextual information than generic ones.


Rich Context (RC)

RC has two components:

•  RCMiner
•  RCRecommender

30 Oct 2015 Insight Centre for Data Analytics Slide 11


RCMiner

30 Oct 2015 Insight Centre for Data Analytics Slide 12

Reviews Classifier → Specific Reviews, Generic Reviews → LDA Topic Model → Filter → Contextual Topics

Inspired by [Bauman & Tuzhilin 2014]


Classification

30 Oct 2015 Insight Centre for Data Analytics Slide 14

•  300 tagged reviews
•  Logistic Regression
•  Feature Subset Selection
    •  Number of words
    •  Number of verbs in past tense
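A minimal sketch of this classifier, assuming each review is reduced to the two selected features and fed to a plain logistic regression trained by batch gradient descent (label 1 = specific, 0 = generic). The past-tense counter is a crude hypothetical heuristic (a few irregular forms plus longer words ending in "-ed") standing in for a real part-of-speech tagger. The names `features`, `train` and `prob_specific`, the learning rate, the epoch count and the four toy training reviews are illustrative assumptions, not the authors' setup or their 300 tagged reviews.

```python
import math

# Hypothetical stand-in for a part-of-speech tagger: a few irregular past-tense forms.
IRREGULAR_PAST = {"was", "were", "got", "had", "went", "took", "came", "made", "did"}


def features(review):
    """Return [number of words, number of past-tense verbs] for a review."""
    words = [w.strip(".,!?;:\"'").lower() for w in review.split()]
    words = [w for w in words if w]
    # Assumed heuristic: irregular forms plus words of 5+ letters ending in "-ed".
    past = sum(1 for w in words if w in IRREGULAR_PAST or (w.endswith("ed") and len(w) > 4))
    return [float(len(words)), float(past)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(reviews, labels, lr=0.5, epochs=3000):
    """Fit logistic regression (1 = specific, 0 = generic) by batch gradient descent."""
    rows = [features(r) for r in reviews]
    n, d = len(rows), len(rows[0])
    # Standardize each feature with the training mean and standard deviation.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    X = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, labels):
            # Derivative of the log-loss with respect to the linear score.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, means, stds


def prob_specific(review, model):
    """Estimated probability that a review is specific."""
    w, b, means, stds = model
    x = [(v - m) / s for v, m, s in zip(features(review), means, stds)]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: the two example reviews from the slides plus two made-up ones.
SPECIFIC = ("During the summer, we like to take a mini staycation. This year it was extra "
            "special as we also got engaged. Our stay at the Biltmore was just fantastic. "
            "The service exceptional, the food amazing.")
GENERIC = "Nice hotel, all the amenities you need, great complex of pools."
model = train(
    [SPECIFIC,
     "We stayed here for our anniversary and the staff surprised us with champagne.",
     GENERIC,
     "Clean rooms, friendly staff and a good breakfast."],
    [1, 1, 0, 0],
)
print(round(prob_specific(SPECIFIC, model), 3), round(prob_specific(GENERIC, model), 3))
```

With only two features the model is easy to inspect; the feature extractor is the part a real system would replace first.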


LDA (Latent Dirichlet Allocation)

30 Oct 2015 Insight Centre for Data Analytics Slide 15

“During the summer, we like to take a mini staycation. This year it was extra special as we also got engaged. Our stay at the Biltmore was just fantastic. The service exceptional, the food amazing- it was great at the pool, Wrights and also at Frank and Alberts. The only reason I am not giving it a full 5 stars is the 'upgraded' room was just a nice basic room. Though it was certainly nice, it wasnt what I expected for being the Biltmore. However, everything else certainly lived up to that expectation”.

Summer (0.04) Weekend (0.02) Monday (0.01) …

Holiday (0.05) Romantic (0.03) Staycation (0.01) …

Room (0.05) Pool (0.04) Sauna (0.01) …

Free (0.03) Cheap (0.02) Expensive (0.01) …

Figure: Topic Distribution of the Review (scale from 0 to 1.0)

•  Each document is a mixture of topics

•  Each topic is composed of words that co-occur across documents
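To make the two bullets concrete, here is a small LDA sketch using collapsed Gibbs sampling, which is one standard way of fitting the model; the slides do not say which implementation or inference method was used. It returns θ (each review's topic mixture, as in the "Topic Distribution of the Review" plot) and φ (each topic's word distribution, like the word lists with probabilities above). The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, the seed and the toy documents are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (assumed method).

    Returns (theta, phi, vocab): theta[d][k] is the topic mixture of document d,
    phi[k][v] is the probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of each topic in each document
    nkw = [[0] * V for _ in range(K)]  # occurrences of each word in each topic
    nk = [0] * K                       # total tokens assigned to each topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                ndk[d][k] -= 1         # remove this token's current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | all other assignments), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


docs = [
    "summer weekend staycation holiday romantic".split(),
    "pool sauna room pool room".split(),
    "summer holiday romantic staycation weekend".split(),
    "room pool sauna free cheap".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
print(top_words(phi, vocab))
print([round(p, 2) for p in theta[0]])
```

Each row of `theta` is the per-review mixture that the filtering step works with; library implementations (for example in gensim or scikit-learn) use faster inference but expose the same two quantities.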


Filtering

•  Apply the topic model to both specific and generic reviews.

•  Count the number of times topics appear in specific and generic reviews.

•  Topics that appear more frequently in specific reviews than in generic reviews are labeled as contextual topics.

30 Oct 2015 Insight Centre for Data Analytics Slide 16

Figure: Topic Distribution Among Reviews (series: Specific, Generic; scale from 0 to 1.0; contextual topics marked)
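A sketch of the filtering step, assuming every review already has a topic distribution (a list of K weights). Two details are assumptions rather than facts from the slides: a topic counts as "appearing" in a review when its weight exceeds a threshold (0.1 here), and counts are divided by the number of reviews in each group, since specific and generic reviews need not be equally numerous. `topic_frequencies`, `contextual_topics` and the toy distributions are hypothetical.

```python
def topic_frequencies(distributions, threshold):
    """Fraction of reviews in which each topic's weight exceeds `threshold`."""
    counts = [0] * len(distributions[0])
    for dist in distributions:
        for t, weight in enumerate(dist):
            # Assumption: a topic "appears" in a review above this weight.
            if weight > threshold:
                counts[t] += 1
    return [c / len(distributions) for c in counts]


def contextual_topics(specific, generic, threshold=0.1):
    """Indices of topics that appear relatively more often in specific reviews."""
    f_spec = topic_frequencies(specific, threshold)
    f_gen = topic_frequencies(generic, threshold)
    return [t for t, (fs, fg) in enumerate(zip(f_spec, f_gen)) if fs > fg]


# Toy topic distributions over 4 topics (one row per review).
specific = [[0.5, 0.3, 0.2, 0.0], [0.6, 0.0, 0.35, 0.05]]
generic = [[0.05, 0.05, 0.75, 0.15], [0.0, 0.05, 0.65, 0.3], [0.02, 0.5, 0.4, 0.08]]
print(contextual_topics(specific, generic))  # [0, 1]
```

Topic 2 appears in every review of both groups, so it is not labelled contextual even though it is prominent; only topics that are over-represented in specific reviews survive the filter.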


RCRecommender

30 Oct 2015 Insight Centre for Data Analytics Slide 17

Diagram: Reviews & Ratings, User Text, Context & Ratings, Context k-NN, Recommendation


Rating Prediction

30 Oct 2015 Insight Centre for Data Analytics Slide 18

Inspired by [Zheng, Burke & Mobasher 2013]

Component | Traditional k-NN | RC
Neighbourhood | Most similar users | Users who have rated the item in a similar context
User similarity | Based on rating similarity | Based on rating similarity, weighting each rating by its context similarity
Aggregated rating | Average weighted by user similarity | Average weighted by user similarity, which includes context similarity
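The RC column can be sketched as follows, under several assumptions that are not from the slides. Every rating is stored with a context vector (for example, the review's weights on the contextual topics), and context similarity is cosine similarity. Each rating is multiplied by the similarity between its context and the target context before computing a cosine user similarity over co-rated items. The neighbourhood keeps at most `k` users whose rating of the item was given in a context at least `ctx_threshold` similar to the target. The function names, data layout, `k`, `ctx_threshold` and the toy ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def user_similarity(u, v, ratings, target_ctx):
    """Cosine over co-rated items, each rating first weighted by its context similarity."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    # Assumption: the weight is the cosine between the rating's context and the target.
    wu = [ratings[u][i][0] * cosine(ratings[u][i][1], target_ctx) for i in common]
    wv = [ratings[v][i][0] * cosine(ratings[v][i][1], target_ctx) for i in common]
    return cosine(wu, wv)


def predict(user, item, target_ctx, ratings, k=2, ctx_threshold=0.5):
    """Context-aware k-NN prediction of `user`'s rating for `item`, or None."""
    candidates = []
    for v, items in ratings.items():
        if v == user or item not in items:
            continue
        rating, ctx = items[item]
        # Neighbourhood: only users who rated the item in a similar context.
        if cosine(ctx, target_ctx) < ctx_threshold:
            continue
        sim = user_similarity(user, v, ratings, target_ctx)
        if sim > 0:
            candidates.append((sim, rating))
    neighbours = sorted(candidates, reverse=True)[:k]
    if not neighbours:
        return None  # no prediction possible: this lowers coverage
    # Average of neighbour ratings weighted by the context-aware user similarity.
    return sum(s * r for s, r in neighbours) / sum(s for s, _ in neighbours)


# ratings[user][item] = (rating, context vector); toy contexts are [romantic, business].
ratings = {
    "alice": {"h1": (5, [1.0, 0.0]), "h2": (2, [0.0, 1.0])},
    "bob": {"h1": (5, [0.9, 0.1]), "h2": (2, [0.1, 0.9]), "h3": (4, [1.0, 0.0])},
    "carol": {"h1": (1, [0.0, 1.0]), "h3": (2, [0.0, 1.0])},
}
print(predict("alice", "h3", [1.0, 0.0], ratings))  # 4.0: only bob rated h3 in a romantic context
```

Carol's rating of h3 is ignored because it was given in a business context; the price of this stricter neighbourhood is that more predictions return None, which is the coverage concern raised in the future work.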


Evaluation

30 Oct 2015 Insight Centre for Data Analytics Slide 19

•  Yelp Data Challenge dataset.
•  5034 hotel reviews.
•  5579 spa reviews.
•  5-fold cross-validation.
•  Compared against a user-based k-NN baseline recommender.

Dataset | RMSE | Coverage | RMSE change over user baseline | Coverage change over user baseline | Top-N Recall | Top-N Recall change over user baseline
Hotels | 1.30 | 8% | +13.7% | +74.6% | 0.56 | +49.5%
Spas | 1.32 | 3% | +18.2% | +34.8% | 0.51 | +59.6%
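The evaluation protocol and metrics can be sketched as below. The definitions are common ones assumed here, not taken from the slides: RMSE is computed only over test ratings that receive a prediction, coverage is the fraction of test ratings that receive one, and recall@N is the fraction of a user's relevant test items found in the top-N list. `fit` and `predict` stand for any hypothetical recommender that may return None when it cannot predict; `fit_global_mean` and the toy records are illustrative only.

```python
import math
import random


def k_folds(records, k=5, seed=0):
    """Shuffle a copy of the records and deal them into k folds of near-equal size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def rmse_and_coverage(test, predict):
    """test: list of (user, item, rating); predict(user, item) returns a float or None."""
    errors = []
    for user, item, rating in test:
        p = predict(user, item)
        if p is not None:  # assumed: RMSE only over predictions actually made
            errors.append((p - rating) ** 2)
    coverage = len(errors) / len(test) if test else 0.0
    rmse = math.sqrt(sum(errors) / len(errors)) if errors else float("nan")
    return rmse, coverage


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear among the first n ranked items."""
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & set(relevant)) / len(relevant)


def cross_validate(records, fit, k=5):
    """fit(train_records) returns a predict(user, item) function.
    Returns mean RMSE and mean coverage over the k test folds."""
    folds = k_folds(records, k)
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(rmse_and_coverage(test, fit(train)))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k


def fit_global_mean(train):
    """Trivial hypothetical baseline: always predict the mean training rating."""
    mean = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mean


records = [("u%d" % (i % 7), "item%d" % (i % 11), float(i % 5 + 1)) for i in range(50)]
print(cross_validate(records, fit_global_mean))
```

Reporting coverage next to RMSE matters for a context-filtered k-NN: a method can lower its error simply by declining to predict the hard cases.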


Final Thoughts

Conclusions

•  RC is capable of extracting open-ended context from user reviews.

•  It integrates open-ended contextual information into a k-NN-based model.

Future Work

•  Improve RCMiner by designing new features and using different classifiers.

•  Improve the coverage of RCRecommender on sparse datasets.

30 Oct 2015 Insight Centre for Data Analytics Slide 20


Thanks!



References

[1] K. Bauman and A. Tuzhilin. Discovering contextual information from user reviews for recommendation purposes. In 1st Workshop on New Trends in Content-based Recommender Systems, pages 2–9, 2014.

[2] L. Chen, G. Chen, and F. Wang. Recommender systems based on user reviews: the state of the art. User Modeling and User-Adapted Interaction, 25(2):99–154, 2015.

[3] Y. Zheng, R. Burke, and B. Mobasher. Recommendation with differential context weighting. In User Modeling, Adaptation, and Personalization, LNCS 7899, pages 152–164, 2013.

30 Oct 2015 Insight Centre for Data Analytics Slide 22