Page 1: Recommender Systems, Matrices and Graphs

Recommender Systems, Matrices and Graphs

Roelof Pieters [email protected]

14 May 2014 @ KTH

Page 2: Recommender Systems, Matrices and Graphs

About me

Interests in:
• IR, RecSys, Big Data, ML, NLP, SNA, Graphs, CV, Data Visualization, Discourse Analysis

History:
• 2002-2006: almost-BA Computer Science @ Amsterdam Tech Uni (dropped out in 2006)
• 2006-2010: BA Cultural Anthropology @ Leiden & Amsterdam Unis
• 2010-2012: MA Social Anthropology @ Stockholm Uni
• 2011-Current: Working @ Vionlabs

se.linkedin.com/in/roelofpieters/ [email protected]

Page 3: Recommender Systems, Matrices and Graphs

Say Hello! St: Eriksgatan 63

112 33 Stockholm - Sweden Email: [email protected]

Tech company here in Stockholm with geeks and movie lovers. Since 2009:

• Digital ecosystems for network operators, cable TV companies, and film distributors such as Tele2/Comviq, Cyberia, and Warner Bros
• Various software and hardware hacks for different companies: Webbstory, Excito, Spotify, Samsung

Focus since 2012:

• Movie and TV recommendation service FoorSee

Page 4: Recommender Systems, Matrices and Graphs

WE LOVE MOVIES….

Page 5: Recommender Systems, Matrices and Graphs
Page 6: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 7: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Taxonomy

•History

•Evaluating Recommenders

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 8: Recommender Systems, Matrices and Graphs
Page 9: Recommender Systems, Matrices and Graphs

Information Retrieval

• Recommender Systems as part of Information Retrieval

[Diagram: USER → Query → Retrieval → Document(s)]

• Information Retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.

Page 10: Recommender Systems, Matrices and Graphs

IR: Measure Success

• Recall: the fraction of all relevant documents that were retrieved

• Precision: the fraction of the retrieved documents that are relevant

• Given a set of query terms and a set of document terms, select only relevant documents (precision), and preferably all the relevant ones (recall)
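A minimal sketch of the two measures for a single query. The function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the system returned; relevant: ids that are truly relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents returned, 3 documents are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(p, r)
```

Two of the four returned documents are relevant (precision), and two of the three relevant documents were found (recall).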

Page 11: Recommender Systems, Matrices and Graphs

“generate meaningful recommendations to a (collection of) user(s) for items or products that

might interest them”

Recommender Systems

Page 12: Recommender Systems, Matrices and Graphs

Where can RS be found?

• Movie recommendation (Netflix)
• Related product recommendation (Amazon)
• Web page ranking (Google)
• Social recommendation (Facebook)
• News content recommendation (Yahoo)
• Priority inbox & spam filtering (Google)
• Online dating (OK Cupid)
• Computational Advertising (Yahoo)

Page 13: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Taxonomy

•History

•Evaluating Recommenders

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 14: Recommender Systems, Matrices and Graphs

Taxonomy of RS

• Collaborative Filtering (CF)

• Content Based Filtering (CBF)

• Knowledge Based Filtering (KBF)

• Hybrid

Page 15: Recommender Systems, Matrices and Graphs

Taxonomy of RS

• Collaborative Filtering (CF)

• Content Based Filtering (CBF)

• Knowledge Based Filtering (KBF)

• Hybrid

Page 16: Recommender Systems, Matrices and Graphs

Collaborative Filtering:

• relies on past user behavior
• Implicit feedback
• Explicit feedback
• requires no gathering of external data
• sparse data
• domain free
• cold start problem

Page 17: Recommender Systems, Matrices and Graphs

Collaborative

(Dietmar et. al. At ‘AI 2011)

User based Collaborative Filtering

Page 18: Recommender Systems, Matrices and Graphs

User based Collaborative Filtering

Page 19: Recommender Systems, Matrices and Graphs

Taxonomy of RS

• Collaborative Filtering (CF)

• Content Based Filtering (CBF)

• Knowledge Based Filtering (KBF)

• Hybrid

Page 20: Recommender Systems, Matrices and Graphs

Content Filtering

• creates profile for user/movie
• requires gathering external data
• dense data
• domain-bounded
• no cold start problem

Page 21: Recommender Systems, Matrices and Graphs

Content based

(Dietmar et. al. At ‘AI 2013)

Item based Collaborative Filtering

Page 22: Recommender Systems, Matrices and Graphs

Item based Collaborative Filtering

Page 23: Recommender Systems, Matrices and Graphs

Taxonomy of RS

• Collaborative Filtering (CF)

• Content Based Filtering (CBF)

• Knowledge Based Filtering (KBF)

• Hybrid

Page 24: Recommender Systems, Matrices and Graphs

Knowledge based

(Dietmar et. al. At ‘AI 2013)

Knowledge based Content Filtering

Page 25: Recommender Systems, Matrices and Graphs

Knowledge based Content Filtering

Page 26: Recommender Systems, Matrices and Graphs

Knowledge based Content Filtering

Page 27: Recommender Systems, Matrices and Graphs

Taxonomy of RS

• Collaborative Filtering (CF)

• Content Based Filtering (CBF)

• Knowledge Based Filtering (KBF)

• Hybrid

Page 28: Recommender Systems, Matrices and Graphs

Hybrid

(Dietmar et. al. At ‘AI 2013)

Page 29: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Taxonomy

•History

•Evaluating Recommenders

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 30: Recommender Systems, Matrices and Graphs

History

• 1992-1995: Manual Collaborative Filtering

• 1994-2000: Automatic Collaborative Filtering + Content

• 2000+: Commercialization…

Page 31: Recommender Systems, Matrices and Graphs

TQL:

Tapestry (1992)

(Goldberg et al. 1992)

Page 32: Recommender Systems, Matrices and Graphs

Grouplens (1994)

(Resnick et al. 1994)

Page 33: Recommender Systems, Matrices and Graphs

2000+: Commercial CFs

• 2001: Amazon starts using item-based collaborative filtering (patent filed in 1998)

• 2000: Pandora starts the Music Genome Project, where each song “is analyzed using up to 450 distinct musical characteristics by a trained music analyst.” (http://www.pandora.com/about/mgp)

• 2006-2009: Netflix contest: two of the many algorithms put in use by Netflix, replacing “Cinematch”: Matrix Factorization (SVD) and Restricted Boltzmann Machines (RBM) (http://www.netflixprize.com)

Page 34: Recommender Systems, Matrices and Graphs

Annual Conferences

• RecSys (since 2007) http://recsys.acm.org

• SIGIR (since 1978) http://sigir.org/

• KDD (official since 1998) http://www.kdd.org/

• KDD Cup

Page 35: Recommender Systems, Matrices and Graphs

Ongoing Discussion

• Evaluation
• Scalability
• Similarity versus Diversity
• Cold start (items + users)
• Fraud
• Imbalanced dataset or Sparsity
• Personalization
• Filter Bubbles
• Privacy
• Data Collection

Page 36: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Taxonomy

•History

•Evaluating Recommenders

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 37: Recommender Systems, Matrices and Graphs

Evaluating Recommenders

• Least mean squares prediction error
• RMSE
• Similarity measure enough?

$\mathrm{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$
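A small sketch of RMSE over a test set S, here passed as a list of (predicted, actual) rating pairs. The sample pairs are invented for illustration.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    squared_error = sum((pred - actual) ** 2 for pred, actual in pairs)
    return math.sqrt(squared_error / len(pairs))


# Hypothetical test set of three ratings.
score = rmse([(4.5, 5.0), (3.0, 2.0), (1.0, 1.0)])
print(round(score, 4))
```

Because errors are squared before averaging, one large miss hurts the score more than several small ones.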

Page 45: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 46: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Content based Algorithms *

•Collaborative Algorithms *

•Classification

•Rating/Ranking *

•Graphs

(* math magicians better pay attention here)

Page 47: Recommender Systems, Matrices and Graphs

Content-based Filtering

• content is exploited (item to item filtering)
• content model:
  • keywords (e.g. TF-IDF)
• similarity/distance measures:
  • L1- and L2-norm distances (Manhattan, Euclidean)
  • Jaccard distance
  • (adjusted) Cosine distance
  • Edit distance
  • Hamming distance

Page 48: Recommender Systems, Matrices and Graphs

Content-based Filtering

• similarity/distance measures:
  • Euclidean distance
  • Jaccard distance
  • Cosine distance

i.e.: x = [1, 2, −1] and y = [2, 1, 1]

dot product x·y is 1 × 2 + 2 × 1 + (−1) × 1 = 3

L2-norms: ‖x‖ = √(1² + 2² + (−1)²) = √6 and ‖y‖ = √(2² + 1² + 1²) = √6

Page 49: Recommender Systems, Matrices and Graphs

Content-based Filtering

• similarity/distance measures:
  • Euclidean distance
  • Jaccard distance
  • Cosine distance

i.e.: x = [1, 2, −1] and y = [2, 1, 1]

dot product x·y is 1 × 2 + 2 × 1 + (−1) × 1 = 3

L2-norms: ‖x‖ = √(1² + 2² + (−1)²) = √6 and ‖y‖ = √(2² + 1² + 1²) = √6

cosine of angle: 3 / (√6 × √6) = 1/2; a cosine of 1/2 corresponds to an angle of 60 degrees
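The cosine example can be checked with a few lines of plain Python. The helper names are illustrative, not taken from the slides.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def l2_norm(x):
    return math.sqrt(dot(x, x))


def cosine_similarity(x, y):
    return dot(x, y) / (l2_norm(x) * l2_norm(y))


x, y = [1, 2, -1], [2, 1, 1]
cos_xy = cosine_similarity(x, y)
# Clamp before acos so rounding noise cannot push the value outside [-1, 1].
angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_xy))))
print(dot(x, y), round(cos_xy, 3), round(angle, 1))
```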

Page 50: Recommender Systems, Matrices and Graphs

Examples

• Item to Query

• Item to Item

• Item to User

Page 51: Recommender Systems, Matrices and Graphs

Examples

• Item to Query

• Item to Item

• Item to User

Page 52: Recommender Systems, Matrices and Graphs

Example: Item to Query

Title          Price   Genre    Rating
The Avengers   5       Action   3.7
Spiderman II   10      Action   4.5

user query q: “price (6) AND genre (Adventure) AND rating (4)”

weights of features: price 0.22, genre 0.33, rating 0.45

Weighted Sum:

Sim(q, “The Avengers”) = 0.22 × (1 − 1/25) + 0.33 × 0 + 0.45 × (1 − 0.3/5) = 0.6342

• price: diff of 1 on a 1-25 price range
• genre: no match
• rating: diff of 0.3 on a 0-5 rating range

Sim(q, “Spiderman II”) = 0.5898 (0.6348 if we count rating 4.5 > 4 as a match)
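A sketch of the weighted sum for this query. The weight-to-feature mapping is read off the worked formula; the dictionary layout and function name are assumptions made for illustration.

```python
WEIGHTS = {"price": 0.22, "genre": 0.33, "rating": 0.45}
PRICE_RANGE = 25.0   # the slide normalises price differences by 25
RATING_RANGE = 5.0   # ratings run from 0 to 5


def query_similarity(query, item, weights=WEIGHTS):
    """Weighted sum of per-feature similarities, each in [0, 1]."""
    price_sim = 1 - abs(query["price"] - item["price"]) / PRICE_RANGE
    genre_sim = 1.0 if query["genre"] == item["genre"] else 0.0
    rating_sim = 1 - abs(query["rating"] - item["rating"]) / RATING_RANGE
    return (weights["price"] * price_sim
            + weights["genre"] * genre_sim
            + weights["rating"] * rating_sim)


query = {"price": 6, "genre": "Adventure", "rating": 4.0}
avengers = {"price": 5, "genre": "Action", "rating": 3.7}
spiderman = {"price": 10, "genre": "Action", "rating": 4.5}

sim_avengers = query_similarity(query, avengers)
sim_spiderman = query_similarity(query, spiderman)
print(round(sim_avengers, 4), round(sim_spiderman, 4))
```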

Page 53: Recommender Systems, Matrices and Graphs

Examples

• Item to Query

• Item to Item

• Item to User

Page 54: Recommender Systems, Matrices and Graphs

Example: Item to Item Similarity

Title   ReleaseTime            Genres                    Actors   Rating
TA      90s, start 90s, 1993   Action, Comedy, Romance   X,Y,Z    3.7
S2      90s, start 90s, 1991   Action                    W,X,Z    4.5

Feature types: ReleaseTime is a set of hierarchically related symbols, Genres and Actors are arrays of Booleans, Rating is numeric.

Sim(X,Y) = 1 − d(X,Y) or

Sim(X,Y) = exp(−d(X,Y))

where 0 ≤ wi ≤ 1, and i = 1..n (number of features).

Page 55: Recommender Systems, Matrices and Graphs

Example: Item to Item Similarity

Title   ReleaseTime            Genres                    Actors   Rating
TA      90s, start 90s, 1993   Action, Comedy, Romance   X,Y,Z    3.7
S2      90s, start 90s, 1991   Action                    W,X,Z    4.5

Feature types: ReleaseTime is a set of hierarchically related symbols, Genres and Actors are arrays of Booleans, Rating is numeric.

TA: X1 = (90s, S90s, 1993), X2 = (1,1,1), X3 = (0,1,1,1), X4 = 3.7

S2: X1 = (90s, S90s, 1991), X2 = (1,0,0), X3 = (1,1,0,1), X4 = 4.5

W = (0.5, 0.3, 0.2): the weights of the categories within “Release time” are different

the weights of the features are all the same

Page 56: Recommender Systems, Matrices and Graphs

Example: Item to Item Similarity

TA: X1 = (90s, S90s, 1993), X2 = (1,1,1), X3 = (0,1,1,1), X4 = 3.7

S2: X1 = (90s, S90s, 1991), X2 = (1,0,0), X3 = (1,1,0,1), X4 = 4.5

W = (0.5, 0.3, 0.2)

Sim(TA, S2) = exp(−(1/√4) √(d1(X1,Y1)² + … + d4(X4,Y4)²))

= exp(−(1/√4) √((1 − (0.3 + 0.5))² + (1 − 1/3)² + (1 − 2/4)² + (1 − 0.8/5)²))

= exp(−(1/√4) √1.44) = exp(−0.6) ≈ 0.549
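A sketch that evaluates the exp(−d) similarity from per-feature distances. The four distance values are the terms written out in the worked expression; the function name is an assumption.

```python
import math


def similarity(distances):
    """exp(-d), where d = (1/sqrt(n)) * sqrt(sum of squared feature distances)."""
    n = len(distances)
    d = math.sqrt(sum(di ** 2 for di in distances)) / math.sqrt(n)
    return math.exp(-d)


feature_distances = [
    1 - (0.3 + 0.5),  # release time: the two shared categories weigh 0.5 and 0.3
    1 - 1 / 3,        # genres: 1 of 3 shared
    1 - 2 / 4,        # actors: 2 of 4 shared
    1 - 0.8 / 5,      # rating term as written in the expression
]
sim = similarity(feature_distances)
print(round(sim, 3))
```

With equal feature weights, the 1/√n factor keeps d comparable across items described by different numbers of features.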

Page 57: Recommender Systems, Matrices and Graphs

(content factors)

Page 58: Recommender Systems, Matrices and Graphs

Examples

• Item to Query

• Item to Item

• Item to User

Page 59: Recommender Systems, Matrices and Graphs

Example: Item to User

Title          Roelof   Klas   Mo   Max   X(Action)   X( )
The Avengers   5        1      2    5     0.8         0.1
Spiderman II   ?        2      1    ?     0.9         0.2
American Pie   2        5      ?    1     0.05        0.9

x(1) = [1, 0.8, 0.1]ᵀ

For each user u, learn a parameter θ(u) ∈ ℝ^(n+1). Predict user u as rating movie i with (θ(u))ᵀ x(i)

Page 60: Recommender Systems, Matrices and Graphs

Example: Item to User

Title          Roelof   Klas   Mo   Max   X(Action)   X( )
The Avengers   5        1      2    5     0.8         0.1
Spiderman II   ?        2      1    ?     0.9         0.2
American Pie   2        5      ?    1     0.05        0.9

One parameter vector per user: θ(1), θ(2), θ(3), θ(4), with Mo (θ(3)) and Klas (θ(2)). One feature vector per movie: x(1), x(2), x(3).

predict rating of Mo (θ(3)) for American Pie (x(3)):

x(3) = [1, 0.05, 0.9]ᵀ

temp θ(3) = [0, 0, 5]ᵀ

Page 61: Recommender Systems, Matrices and Graphs

Example: Item to User

Title          Roelof   Klas   Mo   Max   X(Action)   X( )
The Avengers   5        1      2    5     0.8         0.1
Spiderman II   ?        2      1    ?     0.9         0.2
American Pie   2        5      ?    1     0.05        0.9

predict rating of Mo (θ(3)) for American Pie (x(3)), with x(3) = [1, 0.05, 0.9]ᵀ and θ(3) = [0, 0, 5]ᵀ:

dot product: (θ(3))ᵀ x(3) = 0 × 1 + 0 × 0.05 + 5 × 0.9 = 4.5
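The prediction step is a plain dot product. A minimal sketch using the temporary θ(3) and the feature vector x(3); the variable names are illustrative.

```python
def predict_rating(theta, x):
    """Predicted rating (theta)^T x for one user and one movie."""
    return sum(t * f for t, f in zip(theta, x))


theta_mo = [0.0, 0.0, 5.0]          # temporary parameter vector for Mo
x_american_pie = [1.0, 0.05, 0.9]   # intercept term plus two content features
rating = predict_rating(theta_mo, x_american_pie)
print(rating)
```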

Page 62: Recommender Systems, Matrices and Graphs

Example: Item to User

Title          Roelof   Klas   Mo    Max   X(Action)   X( )
The Avengers   5        1      2     5     0.8         0.1
Spiderman II   ?        2      1     ?     0.9         0.2
American Pie   2        5      4.5   1     0.05        0.9

Predicted rating of Mo (θ(3)) for American Pie (x(3)), now filled into the table: (θ(3))ᵀ x(3) = [0, 0, 5] · [1, 0.05, 0.9] = 4.5

Page 63: Recommender Systems, Matrices and Graphs

Example: Item to User

Title          Roelof   Klas   Mo    Max   X(Action)   X( )
The Avengers   5        1      2     5     0.8         0.1
Spiderman II   ≈4       2      1     ≈4    0.9         0.2
American Pie   2        5      4.5   1     0.05        0.9

How do we learn these user factor parameters θ(1), θ(2), θ(3), θ(4)?

Page 64: Recommender Systems, Matrices and Graphs

Example: Item to User

problem formulation:

• r(i,u) = 1 if user u has rated movie i, otherwise 0
• y(i,u) = rating by user u on movie i (if defined)
• θ(u) = parameter vector for user u
• x(i) = feature vector for movie i
• For user u, movie i, predicted rating: (θ(u))ᵀ x(i)
• temp m(u) = # of movies rated by user u

Say what?

• learning θ(u):

$\min_{\theta^{(u)}} \frac{1}{2 m^{(u)}} \sum_{i:\, r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2 m^{(u)}} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$

(A. Ng. 2013)

Page 65: Recommender Systems, Matrices and Graphs

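Example: Item to User

problem formulation:

• learning θ(u):

$\min_{\theta^{(u)}} \frac{1}{2} \sum_{i:\, r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$

• learning θ(1), θ(2), …, θ(n_u) (learn for “all” users):

$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{u=1}^{n_u} \sum_{i:\, r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2} \sum_{u=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$

The first sum is the squared error term (predicted minus actual rating), the second is the regularization term.

remember: y = rating, θ = parameter vector for a user, x = feature vector for a movie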
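One way to minimise the per-user objective is batch gradient descent. The sketch below learns θ for Klas from his three ratings in the table. The learning rate `alpha`, `lam` and the step count are assumptions for illustration, not values from the slides.

```python
def predict(theta, x):
    return sum(t * f for t, f in zip(theta, x))


def loss(theta, features, ratings, lam):
    """0.5 * squared error + 0.5 * lam * sum of theta_k^2 for k >= 1."""
    err = sum((predict(theta, x) - y) ** 2 for x, y in zip(features, ratings))
    reg = sum(t ** 2 for t in theta[1:])
    return 0.5 * err + 0.5 * lam * reg


def learn_user_parameters(features, ratings, lam=0.1, alpha=0.1, steps=2000):
    n = len(features[0])
    theta = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in zip(features, ratings):
            err = predict(theta, x) - y
            for k in range(n):
                grad[k] += err * x[k]
        for k in range(1, n):  # the intercept theta[0] is not regularised
            grad[k] += lam * theta[k]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Movie feature vectors [1, x_action, x_other] and Klas' ratings from the table.
features = [[1, 0.8, 0.1], [1, 0.9, 0.2], [1, 0.05, 0.9]]
klas_ratings = [1, 2, 5]
theta_klas = learn_user_parameters(features, klas_ratings)
print([round(t, 2) for t in theta_klas])
```

Learning all users at once repeats the same loop per user, since each θ(u) only appears in that user's own terms.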

Page 66: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Content based Algorithms *

•Collaborative Algorithms *

•Classification

•Rating/Ranking *

•Graphs

(* math magicians better pay attention here)

Page 67: Recommender Systems, Matrices and Graphs

Collaborative Filtering:

• User-based approach
  • Find a set of users Si who rated item j and are most similar to ui
  • compute the predicted score Vij as a function of the ratings of item j given by Si (usually a weighted linear combination)
• Item-based approach
  • Find a set Sj of the items most similar to item j that were rated by ui
  • compute the predicted score Vij as a function of ui's ratings for Sj
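A sketch of the user-based approach, assuming cosine similarity over co-rated items and a similarity-weighted average of the neighbours' ratings. Both choices, and the small ratings dictionary, are illustrative assumptions.

```python
import math

ratings = {
    "Roelof": {"The Avengers": 5, "American Pie": 2},
    "Klas": {"The Avengers": 1, "Spiderman II": 2, "American Pie": 5},
    "Mo": {"The Avengers": 2, "Spiderman II": 1},
    "Max": {"The Avengers": 5, "American Pie": 1},
}


def user_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item, k=2):
    """Weighted average of the k most similar users who rated the item."""
    neighbours = [(user_similarity(user, v), v)
                  for v in ratings if v != user and item in ratings[v]]
    top = sorted(neighbours, reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return None
    return sum(s * ratings[v][item] for s, v in top) / total


prediction = predict("Roelof", "Spiderman II")
print(round(prediction, 2))
```

The item-based variant swaps the roles: compare items by the users who rated them, then average the target user's own ratings of the nearest items.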

Page 68: Recommender Systems, Matrices and Graphs

Collaborative Filtering:

• Two primary models:
  • Neighborhood models
    • focus on relationships between movies or users
  • Latent Factor models
    • focus on factors inferred from (rating) patterns
    • computerized alternative to naive content creation
    • predicts rating by dot product of user and movie locations on known dimensions

(Sarwar, B. et al. 2001)

Page 69: Recommender Systems, Matrices and Graphs

Neighborhood (user oriented)


(pic from Koren et al. 2009)

Page 70: Recommender Systems, Matrices and Graphs

Neighbourhood Methods

• Problems:
  • Ratings biased per user
  • Ratings biased towards certain items
  • Ratings change over time
  • Ratings can rapidly change through real time events (Oscar nomination, etc)
  • Bias correction needed

Page 71: Recommender Systems, Matrices and Graphs

Latent Factors


• latent factor models map users and items into a latent feature space

• user's feature vector denotes the user's affinity to each of the features

• item's feature vector represents how much the item itself is related to the features.

• rating is approximated by the dot product of the user feature vector and the item feature vector.

Page 72: Recommender Systems, Matrices and Graphs

Latent Factors (users+movies)


(pic from Koren et al. 2009)

Page 73: Recommender Systems, Matrices and Graphs

Latent Factors (x+y)


(http://xkcd.com/388/)

xkcd.com

Page 74: Recommender Systems, Matrices and Graphs

Latent Factor models

• Matrix Factorization:
  • characterizes items + users by vectors of factors inferred from (ratings or other user-item related) patterns
  • Given a list of users and items, and user-item interactions, predict user behavior
  • can deal with sparse data (matrix)
  • can incorporate additional information

Page 75: Recommender Systems, Matrices and Graphs

Matrix Factorization

• Dimensionality reduction

• Principal Components Analysis, PCA

• Singular Value Decomposition, SVD

• Non Negative Matrix Factorization, NNMF

Page 76: Recommender Systems, Matrices and Graphs

Matrix Factorization: SVD

SVD, Singular Value Decomposition

• transforms correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items.

• identifies and orders the dimensions along which data points exhibit the most variation.

• allowing us to find the best approximation of the original data points using fewer dimensions.

Page 77: Recommender Systems, Matrices and Graphs

SVD: Matrix Decomposition

• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• ƛ: diagonal matrix whose diagonal elements give the ‘strength’ of each concept

(pic by Xavier Amatriain 2013)

Page 78: Recommender Systems, Matrices and Graphs

SVD for Collaborative Filtering

• each item i is associated with a vector qi ∈ ℝf
• each user u is associated with a vector pu ∈ ℝf
• qi measures the extent to which the item possesses those factors
• pu measures the extent of the user's interest in items that score high on the corresponding factors
• user-item interactions are modeled as dot products within the factor space, measured by qiᵀ pu
• user u's rating of item i is approximated by: r̂ui = qiᵀ pu

Page 79: Recommender Systems, Matrices and Graphs

SVD for Collaborative Filtering

• compute u,i mappings: qi, pu ∈ ℝf
• factor user, item matrix
  • imputation (Sarwar et al. 2000)
  • model only observed ratings + regularization (Funk 2006; Koren 2008)
• learn the factor vectors qi and pu by minimizing the (regularized) squared error on the set κ of known ratings, where r̂ui = qiᵀ pu approximates user u's rating of item i, leading to the learning algorithm:

$\min_{q^*, p^*} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)$

Page 80: Recommender Systems, Matrices and Graphs

SVD Visualized

regression line reducing two dimensional space into one dimensional one

Page 81: Recommender Systems, Matrices and Graphs

reducing three dimensional (multidimensional) space into two dimensional plane

SVD Visualized

Page 82: Recommender Systems, Matrices and Graphs

SVD: Code Away!

<Coding Time>


Page 83: Recommender Systems, Matrices and Graphs

Stochastic Gradient Descent

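• optimizable by Stochastic Gradient Descent (SGD) (Funk 2006)
• incremental learning
• loops through the ratings and computes the prediction error for the predicted rating of each rui:

$e_{ui} = r_{ui} - q_i^T p_u$

• modify the parameters by a magnitude proportional to γ in the opposite direction of the gradient, giving the learning rule:

$q_i \leftarrow q_i + \gamma \, (e_{ui} \, p_u - \lambda \, q_i)$

and

$p_u \leftarrow p_u + \gamma \, (e_{ui} \, q_i - \lambda \, p_u)$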
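A pure-Python sketch of SGD matrix factorization: for every known rating, compute the error and nudge q_i and p_u against the gradient. The factor count, learning rate `gamma`, `lam`, epoch count and toy ratings are all assumptions.

```python
import random


def squared_error(ratings, p, q):
    return sum((r - sum(a * b for a, b in zip(q[i], p[u]))) ** 2
               for u, i, r in ratings)


def train_mf(ratings, n_factors=2, gamma=0.01, lam=0.02, epochs=500, seed=0):
    """ratings: list of (user, item, rating). Returns user and item factor dicts."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - sum(a * b for a, b in zip(q[i], p[u]))  # prediction error
            for f in range(n_factors):
                qf, pf = q[i][f], p[u][f]
                q[i][f] += gamma * (e * pf - lam * qf)
                p[u][f] += gamma * (e * qf - lam * pf)
    return p, q


data = [
    ("Roelof", "The Avengers", 5), ("Klas", "The Avengers", 1),
    ("Mo", "The Avengers", 1), ("Max", "The Avengers", 4),
    ("Roelof", "Spiderman II", 4), ("Klas", "Spiderman II", 2),
    ("Mo", "Spiderman II", 1), ("Max", "Spiderman II", 5),
    ("Roelof", "American Pie", 3), ("Klas", "American Pie", 5),
    ("Mo", "American Pie", 4), ("Max", "American Pie", 1),
    ("Roelof", "Shrek", 1), ("Klas", "Shrek", 4),
    ("Mo", "Shrek", 5), ("Max", "Shrek", 2),
]
p, q = train_mf(data)
print(round(squared_error(data, p, q), 3))
```

Only observed ratings are visited, which is why the approach copes with a sparse rating matrix.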

Page 84: Recommender Systems, Matrices and Graphs

Gradient Descent

<Coding Time>


Page 85: Recommender Systems, Matrices and Graphs

Alternating Least Squares

• optimizable by Alternating Least Squares (ALS) (2006)
• with both qi and pu unknown, the objective is not convex, so it cannot be solved directly for a global minimum
• ALS rotates between fixing the qi’s and the pu’s
• fixing either the qi’s or the pu’s makes the optimization problem quadratic, so the set that is not fixed can be solved for optimally
• each qi and pu is computed independently of the other item/user factors: parallelization
• Best for implicit data (dense matrix)

Page 86: Recommender Systems, Matrices and Graphs

Alternating Least Squares

• rotates between fixing qi’s and pu’s
• when all pu’s are fixed, recompute the qi’s by solving a least squares problem
• fix matrix P to some constant matrix and minimize over Q, or fix Q similarly and minimize over P
• Learning Rule:
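To make the alternation concrete, here is a deliberately simplified sketch with a single latent factor (f = 1), so each least squares step has a closed form. The objective used here, squared error plus lam times the squared factors, is an assumed simplification rather than the exact formula from the slide.

```python
def objective(ratings, p, q, lam):
    err = sum((r - q[i] * p[u]) ** 2 for u, i, r in ratings)
    reg = lam * (sum(v ** 2 for v in p.values()) + sum(v ** 2 for v in q.values()))
    return err + reg


def als_rank1(ratings, lam=0.1, sweeps=10):
    """Alternate exact least-squares solves for item factors and user factors."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: 1.0 for u in users}
    q = {i: 1.0 for i in items}
    history = []
    for _ in range(sweeps):
        for i in items:  # users fixed: solve for each item factor
            rows = [(p[u], r) for u, j, r in ratings if j == i]
            q[i] = sum(r * pu for pu, r in rows) / (lam + sum(pu ** 2 for pu, _ in rows))
        for u in users:  # items fixed: solve for each user factor
            rows = [(q[i], r) for v, i, r in ratings if v == u]
            p[u] = sum(r * qi for qi, r in rows) / (lam + sum(qi ** 2 for qi, _ in rows))
        history.append(objective(ratings, p, q, lam))
    return p, q, history


toy = [("u1", "a", 5), ("u1", "b", 4), ("u2", "a", 1), ("u2", "b", 2),
       ("u3", "a", 4), ("u3", "c", 1), ("u2", "c", 5)]
p, q, history = als_rank1(toy)
print([round(h, 3) for h in history])
```

Each item (or user) solve depends only on the fixed side, so the inner loops could run in parallel.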

Page 87: Recommender Systems, Matrices and Graphs

Develop Further…

• Add Biases
• Add Input Sources: Implicit Feedback: pu in rui becomes (pu + … + (…))
• Add Temporal Aspect / time-varying parameters
• Vary Confidence Levels of Inputs

pic: Lei Guo 2012

(Salakhutdinov & Mnih 2008; Koren 2010)

Page 88: Recommender Systems, Matrices and Graphs

Develop Further…

• Final Algorithm:

confidence bias terms

regularization

(Paterek,A. 2007)

Page 89: Recommender Systems, Matrices and Graphs

• Final Algorithm with Temporal dimensions:

Develop Further…


Page 90: Recommender Systems, Matrices and Graphs

• So what if we don’t have any content factors known?

• Probabilistic Matrix Factorization to the rescue!

• describe each user and each movie by a small set of attributes

Page 91: Recommender Systems, Matrices and Graphs

Probabilistic Matrix Factorization

• Imagine we have the following rating data: we could say that Roelof and Max like Action movies but don’t like Comedies, while it’s the opposite for Klas and Mo

Title Roelof Klas Mo Max

The Avengers 5 1 1 4

Spiderman II 4 2 1 5

American Pie 3 5 4 1

Shrek 1 4 5 2

Page 92: Recommender Systems, Matrices and Graphs

Probabilistic Matrix Factorization

• This could be represented by the PMF model by using two-dimensional vectors to describe each user and each movie.

• example latent vectors:
  • AV: [0, 0.3]
  • SPII: [1, 0.3]
  • AP: [1, 0.3]
  • SH: [1, 0.3]
  • Roelof: [0, 3]
  • Klas: [8, 3]
  • Mo: [10, 3]
  • Max: [10, 3]

• predict rating by dot product of user vector with the item vector

• So predicting Klas’ rating for Spiderman II = 8*1 + 3*0.3 = 8.9

• But descriptions of users and movies are not known ahead of time.

• PMF discovers such latent characteristics

Page 93: Recommender Systems, Matrices and Graphs

<CODE TIME>

ratings

Probabilistic Matrix Factorization

Page 94: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Content based Algorithms *

•Collaborative Algorithms *

•Classification

•Rating/Ranking *

•Graphs

(* math magicians better pay attention here)

Page 95: Recommender Systems, Matrices and Graphs

Classification

• k-Nearest Neighbors (KNN)

• Decision Trees

• Rule-Based

• Bayesian

• Artificial Neural Networks

• Support Vector Machines

Page 96: Recommender Systems, Matrices and Graphs

Classification

• k-Nearest Neighbors (KNN)

• Decision Trees

• Rule-Based

• Bayesian

• Artificial Neural Networks

• Support Vector Machines

Page 97: Recommender Systems, Matrices and Graphs

k-Nearest Neighbors

• non parametric lazy learning algorithm

• data as feature space

• simple and fast

• k-nn classification

• k-nn regression: density estimation

Page 98: Recommender Systems, Matrices and Graphs

kNN: Classification

• Classify

• several Xi used to classify Y

• compare (X1p, X2p) and (X1q, X2q) by squared Euclidean distance: d²pq = (X1p − X1q)² + (X2p − X2q)²

• find k-Nearest Neighbors
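A minimal kNN classifier along these lines, using squared Euclidean distance and a majority vote. The two features and the labels are invented for illustration (think of two emotional dimensions per movie).

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_classify(train, point, k=3):
    """train: list of (features, label). Majority vote among the k nearest rows."""
    nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (anger, love) scores with a genre label per movie.
train = [
    ((0.9, 0.1), "drama"), ((0.8, 0.2), "drama"), ((0.7, 0.3), "drama"),
    ((0.1, 0.9), "romance"), ((0.2, 0.8), "romance"),
]
label = knn_classify(train, (0.75, 0.2), k=3)
print(label)
```

There is no training step: the "model" is the stored data, which is what makes kNN a lazy learner.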

Page 99: Recommender Systems, Matrices and Graphs

kNN: Classification

• input: emotional values extracted from the content of 561 movies (thanks: Johannes Östling :)), i.e. the dimensions of the movie “Hamlet”:

Page 100: Recommender Systems, Matrices and Graphs

KNN

<CODE>

Page 101: Recommender Systems, Matrices and Graphs

k-Nearest Neighbors

emotional dimension “Anger” vs “Love”

Page 102: Recommender Systems, Matrices and Graphs

k-Nearest Neighbors

Negative: afraid, confused, helpless, hurt, sad, angry, depressed

Positive: good, interested, love, positive, strong

aggregate of positive and negative emotions

Page 103: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Content based Algorithms *

•Collaborative Algorithms *

•Classification

•Rating/Ranking *

•Graphs

(* math magicians better pay attention here)

Page 104: Recommender Systems, Matrices and Graphs

Rating predictions:

• Pos - Neg

• Average

• Bayesian (Weighted) Estimates

• Lower bound of Wilson score confidence interval for a Bernoulli parameter

Page 105: Recommender Systems, Matrices and Graphs

Rating predictions:

• Pos - Neg

• Average

• Bayesian (Weighted) Estimates

• Lower bound of Wilson score confidence interval for a Bernoulli parameter

Page 106: Recommender Systems, Matrices and Graphs

P - N

• (Positive ratings) - (Negative ratings)

• Problematic:

(http://www.urbandictionary.com/define.php?term=movies)

Page 107: Recommender Systems, Matrices and Graphs

Rating predictions:

• Pos - Neg

• Average!

• Bayesian (Weighted) Estimates

• Lower bound of Wilson score confidence interval for a Bernoulli parameter

Page 108: Recommender Systems, Matrices and Graphs

Average

• (Positive ratings) / (Total ratings)

• Problematic:

(http://www.amazon.co.uk/gp/bestsellers/electronics/)

Page 109: Recommender Systems, Matrices and Graphs

Rating predictions:

• Pos - Neg

• Average

• Bayesian (Weighted) Estimates!

• Lower bound of Wilson score confidence interval for a Bernoulli parameter

Page 110: Recommender Systems, Matrices and Graphs

Ratings

• Top Ranking at IMDB (gives Bayesian estimate):

• Weighted Rating (WR) = (v / (v+m)) × R + (m / (v+m)) × C

• Where:
  R = average for the movie (mean) = (Rating)
  v = number of votes for the movie = (votes)
  m = minimum votes required to be listed in the Top 250 (currently 25000)
  C = the mean vote across the whole report (currently 7.0)
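The weighted rating translates directly into code. The defaults below reuse the m and C values quoted above; the example movies are hypothetical.

```python
def weighted_rating(R, v, m=25000, C=7.0):
    """Bayesian estimate: blend the movie's own mean R with the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


few_votes = weighted_rating(R=9.0, v=100)       # stays close to the global mean
many_votes = weighted_rating(R=9.0, v=500000)   # approaches the movie's own mean
at_threshold = weighted_rating(R=9.0, v=25000)  # equal weight on both parts
print(round(few_votes, 3), round(many_votes, 3), at_threshold)
```

The more votes a movie has, the less the global mean pulls its score toward the average.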

Page 111: Recommender Systems, Matrices and Graphs

Bayesian (Weighted) Estimates

• :

• weighted average on a per-item basis:

(source(s): http://www.imdb.com/title/tt0368891/ratings)

Page 112: Recommender Systems, Matrices and Graphs

Bayesian (Weighted) Estimates @ IMDB

[Chart: Bayesian weights for m = 1250. Horizontal axis from 0 to 5000, vertical axis from 0 to 1; series: specific, global]

• specific part for individual items
• global part is constant over all items
  • can be precalculated

Page 113: Recommender Systems, Matrices and Graphs

m=1250

Page 114: Recommender Systems, Matrices and Graphs

Rating predictions:

• Pos - Neg

• Average

• Bayesian (Weighted) Estimates

• Lower bound of Wilson score confidence interval for a Bernoulli parameter

Page 115: Recommender Systems, Matrices and Graphs

Wilson Score interval

• 1927 by Edwin B. Wilson

• Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what?

Page 116: Recommender Systems, Matrices and Graphs

Wilson Score interval

• used by Reddit for comments ranking

• “rank the best comments highest regardless of their submission time”

• algorithm introduced to Reddit by Randall Munroe (the author of xkcd).

• treats the vote count as a statistical sampling of a hypothetical full vote by everyone, much as in an opinion poll.

Page 117: Recommender Systems, Matrices and Graphs

Wilson Score interval

• Endpoints for Wilson Score interval:

• Reddit’s comment ranking function:

(phat + z*z/(2*n) - z*sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)
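The ranking expression as a runnable function. Here `phat` is the observed fraction of positive ratings and `n` the total number of ratings; the default z = 1.96 (a common choice for 95% confidence) and the zero-vote guard are assumptions.

```python
import math


def wilson_lower_bound(positive, n, z=1.96):
    """Lower endpoint of the Wilson score interval for a Bernoulli parameter."""
    if n == 0:
        return 0.0
    phat = positive / n
    return ((phat + z * z / (2 * n)
             - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))


small_sample = wilson_lower_bound(9, 10)     # 90% positive, few votes
large_sample = wilson_lower_bound(90, 100)   # 90% positive, many votes
print(round(small_sample, 3), round(large_sample, 3))
```

Both items are 90% positive, but the one with more votes ranks higher because its interval is tighter.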

Page 118: Recommender Systems, Matrices and Graphs

CODE

Page 119: Recommender Systems, Matrices and Graphs

CODE

Page 120: Recommender Systems, Matrices and Graphs

Bernoulli anyone?

*as the number of trials (N) = 2 (2 throws of dice), it's actually not a real Bernoulli distribution

Page 121: Recommender Systems, Matrices and Graphs

What’s next?

GRAPHS

Page 122: Recommender Systems, Matrices and Graphs

Outline

•Recommender Systems

•Algorithms*

•Graphs

(* math magicians better pay attention here)

Page 123: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 124: Recommender Systems, Matrices and Graphs

What’s a Graph?

[Figure: example graph. Vertices (Nodes): Movie, Genre, Actor, Director, User, Comment. Edges (Relations): has_genre, features_actor, directed_by, likes, watches, rates, likes_user, friends, follows, comments_movie, likes_comment, likes_actor, has_X. Et cetera: locations, time, moods, keywords, …]

Page 125: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 126: Recommender Systems, Matrices and Graphs

Why Graphs?

It's the nature of today's data, which is getting:

• more complex (social networks…)
• more connected (wikis, pingbacks, rdf, collaborative tagging)
• more semi-structured (wikis, rss)
• more decentralized: democratization of content production (blogs, twitter*, social media*)

and just: MORE

Page 127: Recommender Systems, Matrices and Graphs

Why Graphs?

Data Trend

“Every 2 days we create as much information as we did up to 2003” (Eric Schmidt, Google)

Page 128: Recommender Systems, Matrices and Graphs

Graphs vs Relational

[Figure: relational vs graph (pic by Michael Hunger, neo4j)]

Why Graphs? It's Fast!

Matrix based Calculations: Exponential run-time (items x users x factor_i x …)

Page 129: Recommender Systems, Matrices and Graphs

Graphs vs Relational

[Figure: relational vs graph (pic by Michael Hunger, neo4j)]

Why Graphs? It's Fast!

Graph based Calculations: Linear/Constant run-time (item of interest x relations)

Page 130: Recommender Systems, Matrices and Graphs

Its White-Board

Friendly !

(pic by Michael Hunger, neo4j)

Why Graphs?

Page 131: Recommender Systems, Matrices and Graphs

(pic by Michael Hunger, neo4j)

Its White-Board

Friendly !

Why Graphs?

Page 132: Recommender Systems, Matrices and Graphs

(pic by Michael Hunger, neo4j)

Its White-Board

Friendly !

Why Graphs?

Page 133: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 134: Recommender Systems, Matrices and Graphs

Who uses Graphs?

• Facebook: Open Graph (https://developers.facebook.com/docs/opengraph)

• Google: Knowledge Graph (http://www.google.com/insidesearch/features/search/knowledge.html)

• Twitter: FlockDB (https://github.com/twitter/flockdb)

• Mozilla: Pancake (https://wiki.mozilla.org/Pancake)

• (…)

Page 135: Recommender Systems, Matrices and Graphs

(pic by Michael Hunger, neo4j)

Page 136: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 137: Recommender Systems, Matrices and Graphs

Talking with Graphs

• Graphs can be queried!

• no unions for comparison, but traversals!

• many different graph traversal patterns

(xkcd)

Page 138: Recommender Systems, Matrices and Graphs

graph traversal patterns

• traversals can be seen as a diffusion process over a graph

• “Energy” moves over a graph and spreads out through the network

• energy:

(Ghahramani 2012)

Page 139: Recommender Systems, Matrices and Graphs

Energy Diffusion

(pic by Marko A. Rodriguez, 2011)

Page 140: Recommender Systems, Matrices and Graphs

Energy Diffusion

(pic by Marko A. Rodriguez, 2011)

energy = 4

Page 141: Recommender Systems, Matrices and Graphs

Energy Diffusion

(pic by Marko A. Rodriguez, 2011)

energy = 3

Page 142: Recommender Systems, Matrices and Graphs

Energy Diffusion

(pic by Marko A. Rodriguez, 2011)

energy = 2

Page 143: Recommender Systems, Matrices and Graphs

Energy Diffusion

(pic by Marko A. Rodriguez, 2011)

energy = 1

Page 144: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 145: Recommender Systems, Matrices and Graphs

Diffusion Example: Recommendations

• Energy diffusion is an easy algorithm for making recommendations

• different paths make different recommendations

• different problems can be solved with different paths on the same graph/domain

• recommendation = “jumps” through the data
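The idea that different paths over the same graph answer different questions can be sketched in Python as follows. The edge triples, the `step` and `walk` helpers and the two example paths are hypothetical names and data chosen for illustration only.

```python
# Hypothetical toy graph: (source, label, target) triples.
EDGES = [
    ("me", "knows", "anna"),
    ("anna", "knows", "carl"),
    ("me", "likes", "Fargo"),
    ("anna", "likes", "Fargo"),
    ("anna", "likes", "Heat"),
    ("carl", "likes", "Alien"),
]


def step(edges, vertices, label, direction):
    """Follow every `label` edge one hop from the given vertices.

    direction "out" goes source -> target; anything else goes target -> source.
    """
    result = []
    for source, edge_label, target in edges:
        if edge_label != label:
            continue
        if direction == "out":
            result.extend(target for v in vertices if v == source)
        else:
            result.extend(source for v in vertices if v == target)
    return result


def walk(edges, start, path):
    """Apply a path (a list of (label, direction) jumps) starting at `start`."""
    vertices = [start]
    for label, direction in path:
        vertices = step(edges, vertices, label, direction)
    return vertices


# Two different paths over the same graph give two different recommendations.
friend_path = [("knows", "out"), ("knows", "out")]
taste_path = [("likes", "out"), ("likes", "in"), ("likes", "out")]

people = walk(EDGES, "me", friend_path)
movies = walk(EDGES, "me", taste_path)
```

Only the path changes between the two calls; the graph and the traversal code stay the same. Note that `movies` still contains items already liked, since nothing is filtered out in this sketch.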

Page 146: Recommender Systems, Matrices and Graphs

Friend Recommendation

• Who are my friends’ friends that are not me or my friends

(pic by Marko A. Rodriguez, 2011)

Page 147: Recommender Systems, Matrices and Graphs

Friend Recommendation

• Who are my friends’ friends

G.V('me').outE[knows].inV.outE.inV

• Who are my friends’ friends that are not me or my friends

G.V('me').outE[knows].inV.aggregate(x).outE.inV{!x.contains(it)}
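The same two questions can be sketched in plain Python with sets. The `KNOWS` data and both function names are hypothetical; the sketch mirrors the logic of the traversals (collect friends, take one more hop, drop what is already known) rather than any graph database API.

```python
# Hypothetical "knows" relation: person -> set of people they know.
KNOWS = {
    "me": {"anna", "bob"},
    "anna": {"bob", "carl"},
    "bob": {"carl", "dana", "me"},
    "carl": set(),
    "dana": set(),
}


def friends_of_friends(knows, person):
    """Everyone reachable in exactly two 'knows' hops, unfiltered."""
    two_hops = set()
    for friend in knows.get(person, set()):
        two_hops |= knows.get(friend, set())
    return two_hops


def recommend_friends(knows, person):
    """Friends' friends that are neither the person nor already a friend."""
    friends = knows.get(person, set())
    return friends_of_friends(knows, person) - friends - {person}


unfiltered = friends_of_friends(KNOWS, "me")
suggestions = recommend_friends(KNOWS, "me")
```

The set of direct friends plays the role of the aggregated `x` in the traversal. The sketch also removes the person explicitly, to match the "not me" part of the question.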

Page 148: Recommender Systems, Matrices and Graphs

Product Recommendation

• Who likes what I like -> of these people, what do they like which I don't already like

(pic by Marko A. Rodriguez, 2011)

Page 149: Recommender Systems, Matrices and Graphs

Product Recommendation

• Who likes what I like

G.V('me').outE[likes].inV.inE[likes].outV

• Who likes what I like -> of these people, what do they like

G.V('me').outE[likes].inV.inE[likes].outV.outE[likes].inV

• Who likes what I like -> of these people, what do they like which I don't already like

G.V('me').outE[likes].inV.aggregate(x).inE[likes].outV.outE[likes].inV{!x.contains(it)}
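A rough Python equivalent of the last question, with a ranking added: each candidate item is scored by the number of likes-paths that lead to it, so items liked by people who share more of my taste come first. The `LIKES` data, the `recommend_items` function and the path-count scoring are assumptions for illustration, not the FoorSee implementation.

```python
from collections import Counter

# Hypothetical "likes" relation: person -> set of liked items.
LIKES = {
    "me": {"Pulp Fiction", "Fargo"},
    "anna": {"Pulp Fiction", "Fargo", "Heat"},
    "bob": {"Fargo", "Heat", "Alien"},
    "carl": {"Alien"},
}


def recommend_items(likes, person):
    """Rank items liked by like-minded people that `person` does not like yet.

    The score of an item is the number of paths
    person -> shared item -> other person -> item.
    """
    mine = likes.get(person, set())
    scores = Counter()
    for other, theirs in likes.items():
        if other == person:
            continue
        overlap = len(mine & theirs)  # number of shared items with `other`
        if overlap == 0:
            continue  # no shared taste, so no path through this person
        for item in theirs - mine:  # skip what the person already likes
            scores[item] += overlap
    return scores.most_common()


ranking = recommend_items(LIKES, "me")
```

Here `mine` plays the role of the aggregated `x`: everything already liked is filtered out before scoring.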

Page 150: Recommender Systems, Matrices and Graphs

Recommendations with FoorSee

Page 151: Recommender Systems, Matrices and Graphs
Page 152: Recommender Systems, Matrices and Graphs

Graph Based Approaches

• What's a Graph?

• Why Graphs?

• Who uses Graphs?

• Talking with Graphs

• Graph example: Recommendations

• Graph example: Data Analysis

Page 153: Recommender Systems, Matrices and Graphs

Pulp Fiction

Page 154: Recommender Systems, Matrices and Graphs

Graphs: Conclusion

• Fast

• Scalable

• Diversification

• No Cold Start

• Sparsity/Density not applicable

Page 155: Recommender Systems, Matrices and Graphs

Graphs: Conclusion

• Naturally visualizable

• Feedback / Understandable

• Connectable to the “web” / semantic web

• Social Network Analysis

• Real Time Updates / Recommendations

Page 156: Recommender Systems, Matrices and Graphs

WARNING

Graphs are Addictive!

Page 157: Recommender Systems, Matrices and Graphs

Les Miserables

Page 158: Recommender Systems, Matrices and Graphs

Facebook Network

Page 159: Recommender Systems, Matrices and Graphs

References

• D. Jannach, G. Friedrich and M. Zanker (2011) “Recommender Systems”, International Joint Conference on Artificial Intelligence, Barcelona

• Z. Ghahramani (2012) “Graph-based Semi-supervised Learning”, MLSS, La Palma

• D. Goldberg, D. Nichols, B.M. Oki and D. Terry (1992) “Using collaborative filtering to weave an information tapestry”, Communications of the ACM 35 (12)

• M. Hunger (2013) “Data Modeling with Neo4j”, http://www.slideshare.net/neo4j/data-modeling-with-neo4j-25767444

• S. Funk (2006) “Netflix Update: Try This at Home”, sifter.org/~simon/journal/20061211.html

Page 160: Recommender Systems, Matrices and Graphs

References

• Y. Koren (2008) “Factorization meets the Neighborhood: A Multifaceted Collaborative Filtering Model”, SIGKDD, http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf

• R. Bell & Y. Koren (2007) “Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights”

• Y. Koren (2010) “Collaborative filtering with temporal dynamics”

• A. Ng. (2013) Machine Learning, ml-004 @ Coursera

• A. Paterek (2007) “Improving Regularized Singular Value Decomposition for Collaborative Filtering”, KDD

Page 161: Recommender Systems, Matrices and Graphs

References

• P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994) “GroupLens: An Open Architecture for Collaborative Filtering of Netnews”, Proceedings of ACM

• B.M. Sarwar et al. (2000) “Application of Dimensionality Reduction in Recommender System - A Case Study”, WebKDD

• B. Sarwar, G. Karypis, J. Konstan and J. Riedl (2001) “Item-Based Collaborative Filtering Recommendation Algorithms”

• R. Salakhutdinov & A. Mnih (2008) “Probabilistic Matrix Factorization”

• xkcd.com

Page 162: Recommender Systems, Matrices and Graphs

Take Away Points

• Focus on the best Question, not just the Answer…

• Best Match (most similar) vs Most Popular

• Personalized vs Global Factors

• Less is More?

• What is relevant?

Page 163: Recommender Systems, Matrices and Graphs

Thanks for listening!

(xkcd)

Page 164: Recommender Systems, Matrices and Graphs

Say What?

• So what other stuff do we do at Vionlabs?

• Some examples of data extraction which is fed into our BAG (Big Ass Graph)…

Page 165: Recommender Systems, Matrices and Graphs
Page 166: Recommender Systems, Matrices and Graphs

Computer Vision

Page 167: Recommender Systems, Matrices and Graphs

NLTK
