Recommender Systems, Matrices and Graphs

description

talk at KTH 14 May 2014 about matrix factorization, different latent and neighborhood models, graphs and energy diffusion for recommender systems, as well as what makes good/bad recommendations.

Transcript of Recommender Systems, Matrices and Graphs

  • Recommender Systems, Matrices and Graphs. Roelof Pieters [email protected] 14 May 2014 @ KTH
  • About me Interests in: IR, RecSys, Big Data, ML, NLP, SNA, Graphs, CV, Data Visualization, Discourse Analysis History: 2002-2006: almost-BA Computer Science @ Amsterdam Tech Uni (dropped out in 2006) 2006-2010: BA Cultural Anthropology @ Leiden & Amsterdam Unis 2010-2012: MA Social Anthropology @ Stockholm Uni 2011-Current: Working @ Vionlabs se.linkedin.com/in/roelofpieters/ [email protected]
  • Say Hello! S:t Eriksgatan 63, 112 33 Stockholm, Sweden. Email: [email protected] Tech company here in Stockholm with Geeks and Movie lovers. Since 2009: Digital ecosystems for network operators, cable TV companies, and film distributors such as Tele2/Comviq, Cyberia, and Warner Bros. Various software and hardware hacks for different companies: Webbstory, Excito, Spotify, Samsung. Focus since 2012: Movie and TV recommendation service FoorSee
  • WE LOVE MOVIES.
  • Outline Recommender Systems Algorithms* Graphs (* math magicians better pay attention here)
  • Outline Recommender Systems Taxonomy History Evaluating Recommenders Algorithms* Graphs (* math magicians better pay attention here)
  • Information Retrieval: Recommender Systems as part of Information Retrieval. [Diagram: USER, Query, Retrieval, Document(s)] Information Retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.
  • IR: Measure Success. Recall: success in retrieving all correct documents. Precision: success in retrieving the most relevant documents. Given a set of terms and a set of document terms, select only the most relevant documents (precision), and preferably all the relevant ones (recall).
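As a rough illustration of the two measures above, the following Python sketch computes precision and recall for a single query. The document ids and the `precision_recall` helper are hypothetical, made up for this example, and are not taken from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]   # what the system returned
relevant = ["d2", "d4", "d7"]          # what the user actually needed

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned documents are relevant, and two of the three relevant documents were found, so the system trades a little precision for recall.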
  • Recommender Systems: generate meaningful recommendations to a (collection of) user(s) for items or products that might interest them
  • Where can RS be found? Movie recommendation (Netflix); Related product recommendation (Amazon); Web page ranking (Google); Social recommendation (Facebook); News content recommendation (Yahoo); Priority inbox & spam filtering (Google); Online dating (OK Cupid); Computational Advertising (Yahoo)
  • Taxonomy of RS Collaborative Filtering (CF) Content Based Filtering (CBF) Knowledge Based Filtering (KBF) Hybrid
  • Taxonomy of RS Collaborative Filtering (CF)! Content Based Filtering (CBF) Knowledge Based Filtering (KBF) Hybrid
  • Collaborative Filtering: relies on past user behavior (implicit feedback, explicit feedback); requires no gathering of external data; sparse data; domain free; cold start problem
  • Collaborative (Dietmar et al. At AI 2011) User based Collaborative Filtering
  • User based Collaborative Filtering
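A minimal sketch of user-based collaborative filtering, assuming explicit ratings and cosine similarity over co-rated items (the slides do not say which similarity measure was used). The users, items, ratings and function names are all hypothetical.

```python
import math

# Hypothetical explicit feedback: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_sim(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_sim(user, other)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbour rated the item


pred = predict("alice", "D")
print(pred)
```

Because bob's tastes are closer to alice's than carol's are, the prediction for item D is pulled towards bob's rating. An item nobody has rated yields `None`, which is the cold start problem in miniature.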
  • Taxonomy of RS Collaborative Filtering (CF) Content Based Filtering (CBF)! Knowledge Based Filtering (KBF) Hybrid
  • Content Filtering: creates profile for user/movie; requires gathering external data; dense data; domain-bounded; no cold start problem
  • Content based (Dietmar et al. At AI 2013) Item based Collaborative Filtering
  • Item based Collaborative Filtering
  • Taxonomy of RS Collaborative Filtering (CF) Content Based Filtering (CBF) Knowledge Based Filtering (KBF)! Hybrid
  • Knowledge based (Dietmar et al. At AI 2013) Knowledge based Content Filtering
  • Knowledge based Content Filtering
  • Knowledge based Content Filtering
  • Taxonomy of RS Collaborative Filtering (CF) Content Based Filtering (CBF) Knowledge Based Filtering (KBF) Hybrid
  • Hybrid (Dietmar et al. At AI 2013)
  • History. 1992-1995: Manual Collaborative Filtering. 1994-2000: Automatic Collaborative Filtering + Content. 2000+: Commercialization
  • TQL: Tapestry (1992) (Goldberg et al. 1992)
  • Grouplens (1994) (Resnick et al. 1994)
  • 2000+: Commercial CFs. 2001: Amazon starts using item based collaborative filtering (patent filed in 1998). 2000: Pandora starts music genome project, where each song is analyzed using up to 450 distinct musical characteristics by a trained music analyst. 2006-2009: Netflix Contest: 2 of many algorithms put in use by Netflix replacing "Cinematch": Matrix Factorization (SVD) and Restricted Boltzmann Machines (RBM) (http://www.pandora.com/about/mgp) (http://www.netflixprize.com)
  • Annual Conferences: RecSys (since 2007) http://recsys.acm.org; SIGIR (since 1978) http://sigir.org/; KDD (official since 1998) http://www.kdd.org/; KDD Cup
  • Ongoing Discussion Evaluation Scalability Similarity versus Diversity Cold start (items + users) Fraud Imbalanced dataset or Sparsity Personalization Filter Bubbles Privacy Data Collection
  • Evaluating Recommenders: Least mean squares prediction error (RMSE). Similarity measure enough? $\mathrm{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$
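The RMSE over a held-out set S can be sketched in a few lines of Python. This assumes S is given as (predicted rating, actual rating) pairs; the `rmse` helper and the sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("rmse needs at least one (predicted, actual) pair")
    squared_errors = sum((pred - actual) ** 2 for pred, actual in pairs)
    return math.sqrt(squared_errors / len(pairs))


# Hypothetical held-out ratings: (predicted, actual).
held_out = [(4.0, 5.0), (3.0, 3.0), (2.0, 4.0)]

score = rmse(held_out)
print(round(score, 4))
```

The squared errors here are 1, 0 and 4, so the score is the square root of 5/3. Squaring means the single two-star miss dominates the result, which is one reason a low RMSE alone does not guarantee good recommendations.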
  • Outline Recommender Systems Algorithms* Content based Algorithms * Collaborative Algorithms * Classification Rating/Ranking * Graphs (* math magicians better pay attention here)
  • Content-based Filtering: content is exploited (item to item filtering); content model: keywords (i.e. TF-IDF); similarity/distance measures: Euclidean distance (L1 and L2-norm), Jaccard distance, (adjusted) Cosine distance, Edit distance, Hamming distance
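Several of the distance measures listed above are short enough to sketch directly in Python. This is an illustrative sketch under stated assumptions: the function names and sample inputs are hypothetical, the L1 norm is taken to be the Manhattan distance, and edit distance is implemented as the Levenshtein variant (insertions, deletions and substitutions), which the slides do not specify.

```python
import math


def l1_distance(x, y):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x, y):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two sets (e.g. keyword sets)."""
    a, b = set(a), set(b)
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]


print(l1_distance([1, 2, -1], [2, 1, 1]))
print(jaccard_distance({"action", "sci-fi"}, {"sci-fi", "drama"}))
print(hamming_distance("10101", "11110"))
print(edit_distance("kitten", "sitting"))
```

Which measure fits depends on the content model: set-valued features such as genres or keywords suit Jaccard, while weighted keyword vectors such as TF-IDF are usually compared with a vector measure.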
  • Content-based Filtering, similarity/distance measures: Euclidean distance, Jaccard distance, Cosine distance. I.e.: $x = [1, 2, -1]$ and $y = [2, 1, 1]$. Dot product: $x \cdot y = 1 \times 2 + 2 \times 1 + (-1) \times 1 = 3$. L2-norm of x: $\sqrt{1^2 + 2^2 + (-1)^2} = \sqrt{6}$ (y has the same L2-norm, $\sqrt{6}$). Cosine of angle: $3 / (\sqrt{6}\sqrt{6}) = 1/2$. Cosine distance for a cosine of 1/2: 60 degrees.
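The worked cosine example can be checked with a short Python sketch. The vectors are the ones from the slide; the `cosine_similarity` helper is a hypothetical name introduced here, and the angle is reported in degrees as on the slide.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


x = [1, 2, -1]
y = [2, 1, 1]

cos_xy = cosine_similarity(x, y)
# Clamp to [-1, 1] before acos to guard against floating-point rounding.
angle_degrees = math.degrees(math.acos(max(-1.0, min(1.0, cos_xy))))

print(round(cos_xy, 6), round(angle_degrees, 6))
```

Up to floating-point rounding this gives a cosine of 1/2 and an angle of 60 degrees, matching the slide.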
  • Examples Item to Query Item to Item Item to User
  • Examples Item to Query! Item to Item Item to User