Gl conference2014 recsys_and_text_chris

27
Demos for the next session https://s3.amazonaws.com/GraphLab-Datasets/demos/recommendation- systems.ipynb https://s3.amazonaws.com/GraphLab-Datasets/demos/matrix-factorization- demo.ipynb https://s3.amazonaws.com/GraphLab-Datasets/demos/text-analysis.ipynb Survey: https://www.surveymonkey.com/s/GraphLab2014TrainingDay

description

temp

Transcript of Gl conference2014 recsys_and_text_chris

Page 1: Gl conference2014 recsys_and_text_chris

Demos for the next session

https://s3.amazonaws.com/GraphLab-Datasets/demos/recommendation-systems.ipynb

https://s3.amazonaws.com/GraphLab-Datasets/demos/matrix-factorization-demo.ipynb

https://s3.amazonaws.com/GraphLab-Datasets/demos/text-analysis.ipynb

Survey: !https://www.surveymonkey.com/s/GraphLab2014TrainingDay

Page 2: Gl conference2014 recsys_and_text_chris

Recommendation systems and text analysis with GraphLab Create

Page 3: Gl conference2014 recsys_and_text_chris

Outline• Recommendation systems • Background • Computing item similarities • Matrix factorization methods

• Text analysis • Munging and preprocessing • Finding similar documents • Topic modeling

Page 4: Gl conference2014 recsys_and_text_chris

Recommendation systems with!GraphLab Create

Page 5: Gl conference2014 recsys_and_text_chris

Why recommendation systems?

Page 6: Gl conference2014 recsys_and_text_chris

user_id item_idAlex Game of Thrones

Alex True Detective

Alex House of Cards

Alex Usual Suspects

Bob Game of Thrones

Bob True Detective

Bob Vikings

Alice Game of Thrones

Alice True Detective

… …

Page 7: Gl conference2014 recsys_and_text_chris

Alex

Bob

Alice

Barbara

user_id item_idAlex Game of Thrones

Alex True Detective

Alex House of Cards

Alex Usual Suspects

Bob Game of Thrones

Bob True Detective

Bob Vikings

Alice Game of Thrones

Alice True Detective

… …

GoT

True Detective

House of Cards

Usual Suspects

Vikings

SFrame SGraph

Page 8: Gl conference2014 recsys_and_text_chris

user_id item_idAlex Game of Thrones

Alex True Detective

Alex House of Cards

Alex Usual Suspects

Bob Game of Thrones

Bob True Detective

Bob Vikings

Alice Game of Thrones

Alice True Detective

… …

Similarity between True Detective and Usual Suspects: !# who watched both # who watched either !!

=

Alex

Bob

Alice

Barbara

GoT

True Detective

House of Cards

Usual Suspects

Vikings

24

Page 9: Gl conference2014 recsys_and_text_chris

user_id item_idAlex Game of Thrones

Alex True Detective

Alex House of Cards

Alex Usual Suspects

Bob Game of Thrones

Bob True Detective

Bob Vikings

Alice Game of Thrones

Alice True Detective

… …

For each item: • Accumulate statistics about

the number of users in common

• Rank top 100 nearest items

Alex

Bob

Alice

Barbara

GoT

True Detective

House of Cards

Usual Suspects

Vikings

Page 10: Gl conference2014 recsys_and_text_chris

>>> import graphlab >>> m = graphlab.recommender.create(data) >>> recs = m.recommend()

Getting recommendations for a set of users

>>> r = m.recommend(users=my_user)

Restricting recommendations to a particular set of items

Excluding previously seen observations

>>> r = m.recommend(items=candidates)

>>> r = m.recommend(exclude=ignore_these)

Creating a recommendation system in GraphLab Create

Page 11: Gl conference2014 recsys_and_text_chris

Demo time!

Page 12: Gl conference2014 recsys_and_text_chris

user_idd

item_id ratingAlex Game of Thrones 5

Alex True Detective 5

Alex House of Cards 5

Alex Usual Suspects 3

Bob Game of Thrones 5

Bob True Detective 4

Bob Vikings 5

Alice Game of Thrones 1

Alice True Detective 5

… …

5 5 5 3

5 4 5

1 5 4

3 5 5

Alex

Bob

Alice

Barbara

Game of Thrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

Page 13: Gl conference2014 recsys_and_text_chris

Alex

Bob

Alice

Barbara

Game of Thrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

5 5 5 3

5 4 5

1 5 4

3 5 5

Page 14: Gl conference2014 recsys_and_text_chris

5 5 5 3

5 4 5

1 5 4

3 5 5

Game of Thrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

Alex

Bob

Alice

Barbara

Model parameters

Page 15: Gl conference2014 recsys_and_text_chris

5 5 5 3

5 4 5

1 5 4

3 5 5

HBO peopleGame of T

hrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

Alex

Bob

Alice

Barbara

Page 16: Gl conference2014 recsys_and_text_chris

5 5 5 3

5 4 5

1 5 4

3 5 5

HBO peopleViolent historical

Game of Thrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

Alex

Bob

Alice

Barbara

Page 17: Gl conference2014 recsys_and_text_chris

5 5 5 3

5 4 5

1 5 4

3 5 5

HBO peopleViolent historicalKevin Spacey fans

Game of Thrones

Vikings

House of Cards

True Detectiv

e

Usual Suspects

Alex

Bob

Alice

Barbara

Page 18: Gl conference2014 recsys_and_text_chris

Matrix factorization: Extensible

Side features factorization_machine

Ranking unobserved_rating

Overfitting regularization

Page 19: Gl conference2014 recsys_and_text_chris

from graphlab import recommender recommender.create(data, method=‘matrix_factorization’, n_factors=20)

Page 20: Gl conference2014 recsys_and_text_chris

Demo!

Page 21: Gl conference2014 recsys_and_text_chris

Text analytics

Page 22: Gl conference2014 recsys_and_text_chris

Text• Data often has free-form text • Reviews of movies, restaurants, etc. • Email, tweets, etc.

• Hard to include in automated analysis • Hand-crafted features are not ideal

Page 23: Gl conference2014 recsys_and_text_chris

Tools for common tasks• SFrames help with typical cleaning tasks • Method for computing “bag-of-words” • TF-IDF: discount common words • Topic modeling • More to come!

Page 24: Gl conference2014 recsys_and_text_chris

The burrito was terrible. I…

Sometimes sushi here …

The waiters never came until…

When you need gyoza, you…

My favorite place ever! You…

Topic Models• Statistical model of text that assumes a

document collection can be explained by a small set of topics.

Page 25: Gl conference2014 recsys_and_text_chris

Topic Models• Statistical model of text that assumes a

document collection can be explained by a small set of topics.

Terrible

AwfulNever

WorstDisgusting

Chips

BurritoSalsa

TacoGuacamole

Soy

SushiGyoza

WasabiNigiri

The burrito was terrible. I…

Sometimes sushi here …

The waiters never came until…

When you need gyoza, you…

My favorite place ever! You…

Page 26: Gl conference2014 recsys_and_text_chris

Demo

Page 27: Gl conference2014 recsys_and_text_chris

Create scalable data products fast in Python !Got questions? Join our community at graphlab.com

Questions?