Advances of Deep & Reinforcement Learning on Recommender ... · Factorization Machine...

Advances of Deep & Reinforcement Learning on Recommender Systems Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net Jan. 06, 2020 at Tsinghua University

Advances of Deep & Reinforcement Learning on Recommender Systems

Weinan Zhang, Shanghai Jiao Tong University

http://wnzhang.net

Jan. 06, 2020 at Tsinghua University

Content

• A brief review of recommender system road map

• Deep learning for recommender systems

• Deep reinforcement learning for recommender systems

• Summary

Road Map of Recommendation Technique

2000-2006 Neighborhood based collaborative filtering

2007-2009 Matrix factorization and variants

2010-2015 Factorization machine and variants

2015-2017 Deep neural networks for user behavior prediction

2017-2019 Deep reinforcement learning for decision making

Matrix Factorization Techniques

Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix factorization techniques for recommender systems." Computer 42.8 (2009).

\hat{r}_{u,i} = \mu + b_u + b_i + p_u^\top q_i

• \mu: global bias
• b_u: user bias
• b_i: item bias
• p_u^\top q_i: user-item interaction
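
The biased matrix factorization prediction above can be sketched in a few lines of NumPy. The parameter values below are made up for illustration; in practice they would be learned from observed ratings.

```python
import numpy as np

def predict_rating(mu, b_user, b_item, P, Q, u, i):
    """Predicted rating = global bias + user bias + item bias + p_u . q_i."""
    return mu + b_user[u] + b_item[i] + P[u] @ Q[i]

# Toy, made-up parameters: 2 users, 3 items, 2 latent factors.
mu = 3.5
b_user = np.array([0.2, -0.1])
b_item = np.array([0.1, 0.0, -0.3])
P = np.array([[1.0, 0.5], [0.0, 1.0]])                # user latent factors
Q = np.array([[0.4, 0.2], [1.0, -1.0], [0.0, 0.6]])   # item latent factors

r_hat = predict_rating(mu, b_user, b_item, P, Q, u=0, i=2)
```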

Factorization Machine

• Incorporate all available information for recommender systems
• One-hot encoding for each discrete (categorical) field
• One real-valued feature for each continuous field
• Every feature is associated with a latent factor vector
• A more general regression model

Steffen Rendle. Factorization Machines. ICDM 2010. (10-year Best Paper)
http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Open source: http://www.libfm.org/
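
As a rough illustration of the second-order factorization machine, the sketch below computes the prediction with the linear-time reformulation of the pairwise term and compares it against the naive double loop. The feature layout (two one-hot fields plus one real-valued feature) and the random parameters are assumptions made for this example only.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed in O(k * n)."""
    linear = w0 + w @ x
    xv = x @ V                      # sum_i v_{i,f} x_i for each factor f
    x2v2 = (x ** 2) @ (V ** 2)      # sum_i v_{i,f}^2 x_i^2 for each factor f
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def fm_predict_naive(x, w0, w, V):
    """Same model, written as the explicit O(k * n^2) double loop."""
    y = w0 + w @ x
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

rng = np.random.default_rng(0)
# Assumed layout: field A one-hot (3 dims), field B one-hot (2 dims), one real value.
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.7])
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 4))

y_fast = fm_predict(x, w0, w, V)
y_slow = fm_predict_naive(x, w0, w, V)
```

Both functions should agree up to floating-point error; the fast form is what makes FM practical on sparse, high-dimensional one-hot inputs.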

Factorization Machine is a Neural Network

A NEW PERSPECTIVE

Content

• A brief review of recommender system road map

• Deep learning for recommender systems

• Deep reinforcement learning for recommender systems

• Summary

Factorization-machine Neural Networks (FNN)

[Factorization Machine Initialized]

Weinan Zhang et al. Deep Learning over Multi-Field Categorical Data: A Case Study on User Response Prediction. ECIR 2016

But factorization machine is still different from common additive neural networks!

Product operation

Product Operations as Feature Interactions

Yanru Qu, Weinan Zhang et al. Product-based Neural Networks for User Response Prediction. ICDM 2016

Product-based Neural Network (PNN)

• Blue P_i nodes are product operators

[Figure: PNN architecture. Features 1..N -> embedding layer (Embed 1..N) -> product layer (P1, P2, ..., Pi) -> fully connected layers -> prediction]

Yanru Qu, Weinan Zhang et al. Product-based Neural Networks for User Response Prediction. ICDM 2016
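
A minimal sketch of an inner-product PNN forward pass, assuming one embedding per field, a single hidden layer, and a sigmoid output. The layer sizes and the names `inner_product_layer` and `pnn_forward` are illustrative choices, not taken from the paper.

```python
import numpy as np

def inner_product_layer(E):
    """E has shape (num_fields, k). Returns the inner products <e_i, e_j> for all i < j."""
    gram = E @ E.T
    return gram[np.triu_indices(E.shape[0], k=1)]

def pnn_forward(E, W1, b1, w2, b2):
    """Concatenate raw embeddings with pairwise products, then apply an MLP."""
    z = E.reshape(-1)                       # linear part: the embeddings themselves
    p = inner_product_layer(E)              # product part: pairwise interactions
    h = np.maximum(0.0, W1 @ np.concatenate([z, p]) + b1)
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))

E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 fields, embedding size 2
products = inner_product_layer(E)                    # pairs (0,1), (0,2), (1,2)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 9)), np.zeros(4)        # 6 embedding dims + 3 products
w2, b2 = rng.normal(size=4), 0.0
prob = pnn_forward(E, W1, b1, w2, b2)
```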

DeepFM

Huifeng Guo, Ruiming Tang and Xiuqiang He et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. IJCAI 2017.

Attentional Factorization Machines

• Basic idea: reweight each field-pair interaction with an attention network

Jun Xiao et al. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. IJCAI 2017.

element-wise product of two vectors
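
The attention-based pooling can be sketched as follows: each pair of embeddings is combined with an element-wise product, a small one-hidden-layer network scores each pair, and a softmax turns the scores into weights. The shapes and the names `W`, `b`, `h`, and `p` are assumptions for this sketch, and the embeddings are assumed to be already scaled by their feature values.

```python
import numpy as np

def afm_interaction(E, W, b, h, p):
    """E: (num_fields, k) embeddings. Returns (interaction score, attention weights)."""
    i_idx, j_idx = np.triu_indices(E.shape[0], k=1)
    pairs = E[i_idx] * E[j_idx]                  # element-wise products, (num_pairs, k)
    hidden = np.maximum(0.0, pairs @ W.T + b)    # attention network hidden layer
    scores = hidden @ h                          # one unnormalized score per pair
    scores = scores - scores.max()               # for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum() # softmax over field pairs
    pooled = attn @ pairs                        # attention-weighted sum, shape (k,)
    return p @ pooled, attn

rng = np.random.default_rng(0)
E = rng.normal(size=(3, 4))                      # 3 fields, embedding size 4
W, b = rng.normal(size=(5, 4)), np.zeros(5)      # attention hidden size 5
h, p = rng.normal(size=5), rng.normal(size=4)

score, attn = afm_interaction(E, W, b, h, p)
```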

Field-aware Interaction

• In FFM, this interaction is implemented with field-aware embeddings
• Substitute with a unified embedding and a "field-aware parameter"

• Network-in-Network (NFM, PIN)
  • Generalize interaction to any function with sub-networks

• Kernel Interaction (KFM, KPNN)
  • Use different kernels to project interactions separately

[Figure: kernel interaction (F1 * Kernel * F2) and network-in-network interaction (F1, F2 -> FC layer -> score)]

Yanru Qu, Weinan Zhang, Ruiming Tang, Xiuqiang He et al. Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data. TOIS 2018.
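
One way to read the kernel interaction is as a bilinear form in which a kernel matrix, specific to the field pair, sits between two unified embeddings. The sketch below assumes that form; the exact parameterization used in KFM/KPNN may differ, and `kernel_product` is a hypothetical helper name.

```python
import numpy as np

def kernel_product(e_i, e_j, K):
    """Bilinear interaction e_i^T K e_j; K is the assumed per-field-pair kernel."""
    return e_i @ K @ e_j

e1 = np.array([1.0, 2.0])
e2 = np.array([0.5, -1.0])

# With an identity kernel the interaction reduces to the plain inner product (FM-style).
plain = kernel_product(e1, e2, np.eye(2))

# A diagonal kernel reweights each embedding dimension separately.
weighted = kernel_product(e1, e2, np.diag([2.0, 0.5]))
```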

Product-network In Network (PIN)

• We can design various sub-nets to explore the interaction pattern between two fields

[Figure: PIN architecture. Features 1..N -> embedding layer (Embed 1..N) -> sub-nets (Sub-net 1, Sub-net 2, ..., Sub-net i) -> fully connected layers -> prediction. Each sub-net feeds F1, F2, and F1*F2 through an FC layer to produce a hidden state.]

Yanru Qu, Weinan Zhang, Ruiming Tang, Xiuqiang He et al. Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data. TOIS 2018.
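
A single PIN sub-net, as the figure suggests, takes the two field embeddings together with their element-wise product and maps them through a fully connected layer to a hidden state. The sketch below assumes one ReLU layer; the hidden size and the name `pin_subnet` are illustrative.

```python
import numpy as np

def pin_subnet(e_i, e_j, W, b):
    """Sub-net for one field pair: [F1, F2, F1*F2] -> FC layer -> hidden state."""
    features = np.concatenate([e_i, e_j, e_i * e_j])
    return np.maximum(0.0, W @ features + b)

rng = np.random.default_rng(0)
k, hidden_size = 4, 3
e1, e2 = rng.normal(size=k), rng.normal(size=k)
W, b = rng.normal(size=(hidden_size, 3 * k)), np.zeros(hidden_size)

hidden_state = pin_subnet(e1, e2, W, b)
```

In the full model, one such sub-net would be applied per field pair and the hidden states concatenated before the fully connected layers.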

Public Data Experiment Performance

Yanru Qu, Weinan Zhang, Ruiming Tang, Xiuqiang He et al. Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data. TOIS 2018.

PIN achieved the best performance on well-recognized benchmarks and Huawei’s private dataset.

Content

• A brief review of recommender system road map

• Deep learning for recommender systems

• Deep reinforcement learning for recommender systems

• Summary

From PCs to Mobiles

• Users only provide feedback on the items they are recommended, and those items depend on the current recommendation algorithm

• Learning from interactions with users

Reinforcement Learning

• At each step t, the agent
  • Receives observation O_t
  • Executes action A_t
  • Receives scalar reward R_t

• The environment
  • Receives action A_t
  • Emits observation O_{t+1}
  • Emits scalar reward R_{t+1}

• t increments at environment step

[Figure: agent-environment interaction loop]

Learning from interaction: Given the current situation, what to do next in order to maximize utility?
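
The interaction loop can be sketched with a toy environment. `ToyEnvironment` is a hypothetical stand-in (the observation is a target item id and the reward is 1 when the action matches it); it is only meant to show the order of observe, act, and reward.

```python
import random

class ToyEnvironment:
    """Hypothetical environment: observation is a target item in {0, 1, 2};
    the reward is 1.0 if the agent's action matches it, else 0.0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.target = self.rng.randrange(3)

    def observe(self):
        return self.target

    def step(self, action):
        reward = 1.0 if action == self.target else 0.0
        self.target = self.rng.randrange(3)   # environment moves to the next step
        return self.target, reward            # emits O_{t+1} and R_{t+1}

def run_episode(env, policy, num_steps):
    observation = env.observe()
    total_reward = 0.0
    for _ in range(num_steps):
        action = policy(observation)            # agent executes A_t given O_t
        observation, reward = env.step(action)  # environment responds
        total_reward += reward
    return total_reward

# A policy that simply echoes the observation is optimal in this toy setting.
total = run_episode(ToyEnvironment(), policy=lambda obs: obs, num_steps=10)
```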

Deep RL for Recommender Systems

• Methodologies

  • Policy-based solutions
    • Policy gradient
    • Deep deterministic policy gradient / actor-critic

  • Value-based solutions
    • Deep Q-learning

Policy-based works

• Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017. (ICT)

• Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv 2015. (DeepMind)

• Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020. (JD)

• Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019. (SJTU)

Reinforcement Learning to Rank with Markov Decision Process

• State: the state s_t is defined as a pair [t, X_t], where X_t is the set of documents remaining to be ranked

• Action: a_t ∈ A(s_t) selects a document x_{m(a_t)} ∈ X_t for ranking position t+1

• Transition:

• Reward:

• Model how to select the next item in the ranking list as an MDP

Zeng Wei et al. Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.

[Figure: state and action in the ranking MDP]
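
The state and action definitions above can be sketched as plain functions. The transition simply removes the chosen document and advances t. The reward shown is a standard DCG-style gain, which is an assumption here because the slide's reward formula is not preserved in this transcript.

```python
import math

def initial_state(documents):
    """State is the pair (t, remaining documents)."""
    return (0, tuple(documents))

def transition(state, action):
    """Place remaining[action] at position t+1 and remove it from the candidate set."""
    t, remaining = state
    chosen = remaining[action]
    return (t + 1, remaining[:action] + remaining[action + 1:]), chosen

def dcg_reward(relevance, t):
    """Assumed DCG-style reward for a document placed at position t+1."""
    return (2 ** relevance - 1) / math.log2(t + 2)

# Documents as (id, relevance label) pairs; the labels are made up.
state = initial_state([("d1", 0), ("d2", 2), ("d3", 1)])
ranking, total_reward = [], 0.0
for action in [1, 1, 0]:                      # indices into the remaining documents
    t = state[0]
    state, (doc_id, rel) = transition(state, action)
    ranking.append(doc_id)
    total_reward += dcg_reward(rel, t)
```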

Reinforcement Learning to Rank with Markov Decision Process

Zeng Wei et al. Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.

REINFORCE Policy Gradient
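
A rough sketch of REINFORCE applied to ranking, assuming a linear scoring policy with a softmax over the remaining documents and a DCG-style reward. The features, relevance labels, learning rate, and reward definition are all assumptions for illustration; this is not the paper's exact algorithm listing.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_episode(theta, X, rel, rng, lr=0.05, gamma=1.0):
    """Sample one ranking from the current policy, then take a policy-gradient step."""
    remaining = list(range(len(X)))
    trajectory = []                               # (grad of log pi, reward) per step
    t = 0
    while remaining:
        feats = X[remaining]
        probs = softmax(feats @ theta)
        k = rng.choice(len(remaining), p=probs)   # sample the next document
        grad_log_pi = feats[k] - probs @ feats    # gradient of log softmax policy
        reward = (2.0 ** rel[remaining[k]] - 1.0) / np.log2(t + 2)
        trajectory.append((grad_log_pi, reward))
        remaining.pop(k)
        t += 1
    # Accumulate gamma^t * G_t * grad log pi(a_t | s_t), with G_t the return from step t.
    G, grad = 0.0, np.zeros_like(theta)
    for t in reversed(range(len(trajectory))):
        g, r = trajectory[t]
        G = r + gamma * G
        grad += (gamma ** t) * G * g
    return theta + lr * grad

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # made-up document features
rel = np.array([2, 0, 1])                            # made-up relevance labels
theta = np.zeros(2)
for _ in range(100):
    theta = reinforce_episode(theta, X, rel, rng)
```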

Reinforcement Learning to Rank with Markov Decision Process

Zeng Wei et al. Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.

Ranking accuracies on MQ2007 dataset

Deep Reinforcement Learning in Large Discrete Action Spaces

• Algorithm

Dulac-Arnold G et al. Deep reinforcement learning in large discrete action spaces[J]. arXiv preprint arXiv:1512.07679, 2015.

Argmax on KNN
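
The "argmax on KNN" step can be sketched as follows: an actor proposes a continuous proto-action, the k nearest discrete actions are looked up in an embedding space, and the critic's Q-value picks the best of them. The `actor` and `critic` callables and the toy embeddings are hypothetical stand-ins for trained networks.

```python
import numpy as np

def select_action(state, actor, critic, action_embeddings, k):
    """Return the index of the best discrete action among the k nearest to the proto-action."""
    proto = actor(state)                                      # continuous proto-action
    dists = np.linalg.norm(action_embeddings - proto, axis=1)
    neighbors = np.argsort(dists)[:k]                         # k nearest discrete actions
    q_values = np.array([critic(state, action_embeddings[a]) for a in neighbors])
    return int(neighbors[np.argmax(q_values)])                # argmax of Q over neighbors

embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
state = np.array([0.9, 0.2])
actor = lambda s: s                      # toy actor: proto-action equals the state
critic = lambda s, a: float(a.sum())     # toy critic: prefers larger embeddings

best_of_two = select_action(state, actor, critic, embeddings, k=2)
nearest_only = select_action(state, actor, critic, embeddings, k=1)
```

With k=1 the method reduces to a pure nearest-neighbor lookup; a larger k lets the critic correct a proto-action that lands near a poor discrete action.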

Deep Reinforcement Learning in Large Discrete Action Spaces

• Experiment performance

Dulac-Arnold G et al. Deep reinforcement learning in large discrete action spaces[J]. arXiv preprint arXiv:1512.07679, 2015.

TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions

• There are a large number of candidate items as actions to take
  • Cause very large computational complexity
  • No previous literature on this topic
  • No previous application with such a setting

• TPGR solution: building hierarchical item structure for sequential decision making

Haokun Chen et al. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.
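
To illustrate why a hierarchy helps, the sketch below picks an item by walking a balanced tree: each internal node holds a small softmax policy over its children, so choosing among c^d items takes d decisions over c options each. The tree layout, the linear node policies, and the names `sample_item` and `node_weights` are assumptions for this sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_item(state, node_weights, c, depth, rng):
    """Walk from the root to a leaf; return (item index, log-probability of the path)."""
    node, item, log_prob = 0, 0, 0.0
    for _ in range(depth):
        probs = softmax(node_weights[node] @ state)  # policy over this node's c children
        child = int(rng.choice(c, p=probs))
        log_prob += np.log(probs[child])
        item = item * c + child                      # leaf index in base-c digits
        node = node * c + child + 1                  # level-order index of the child
    return item, log_prob

rng = np.random.default_rng(0)
c, depth, state_dim = 3, 2, 5
num_internal = (c ** depth - 1) // (c - 1)           # 4 internal nodes for 9 items
node_weights = rng.normal(size=(num_internal, c, state_dim))
state = rng.normal(size=state_dim)

item, log_prob = sample_item(state, node_weights, c, depth, rng)
```

The returned log-probability is the sum over the path, which is the quantity a policy-gradient update would differentiate.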

TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions

• Item correlation is based on current policy
• Policy (and value function) can be regarded as a table

         Action 1  Action 2  Action 3  Action 4
State 1    0.1       0.3       0.2       0.4
State 2    0.4       0.3       0.1       0.2
State 3    0.1       0.1       0.3       0.5
State 4    0.4       0.2       0.2       0.2
State 5    0.2       0.3       0.3       0.2
State 6    0.1       0.1       0.6       0.2

• Based on such a table, we can cluster the items into a hierarchy

Haokun Chen et al. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.

Model-based RL for RecSys

• Motivations: model-free deep RL methods
  • Consume huge amount of data (low sample efficiency)
  • Suffer from sparse positive feedback (sparse reward)

Xiangyu Zhao et al. Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.

Actor Critic

• Estimate the value of an action in different scenarios
  • Entrance/detail page

[Figure: user behaviors on the entrance page (skip, click, leave) and on the item detail page (skip, click, leave)]

Build predictive models to estimate user behaviors: skip/click/leave

Xiangyu Zhao et al. Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.

Critic network

Deep RL for Recommender Systems

• Methodologies

  • Policy-based solutions
    • Policy gradient
    • Deep deterministic policy gradient

  • Value-based solutions
    • Deep Q-learning

DQN-based works

• Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018. (JD)

• DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018. (MSR)

• Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017. (PolyU)

• Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. 2020. (SJTU)

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

feed positive input and negative input separately

Xiangyu Zhao et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.

Problem: how to effectively represent the user state?

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

maximizing the difference of Q-values between enemy items

enemy items

Xiangyu Zhao et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.
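
One plausible form of such a pairwise objective is the usual squared TD error minus a weighted squared gap between the Q-value of the chosen item and that of its "enemy" item, so that minimizing the loss widens the gap. The sketch below assumes this form; the weight `alpha` and the function name are illustrative.

```python
def pairwise_dqn_loss(q_sa, q_s_enemy, target, alpha):
    """Squared TD error, minus alpha times the squared Q-value gap to the enemy item."""
    td_error = (target - q_sa) ** 2
    gap = (q_sa - q_s_enemy) ** 2
    return td_error - alpha * gap

# Made-up values: Q(s, a) = 1.0, Q(s, enemy) = 0.2, TD target = 1.5.
loss = pairwise_dqn_loss(q_sa=1.0, q_s_enemy=0.2, target=1.5, alpha=0.1)
```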

DRN: A Deep Reinforcement Learning Framework for News Recommendation

• How to train an RL policy online and offline?

Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.

DRN: A Deep Reinforcement Learning Framework for News Recommendation

• Use user and item features to represent Q(s,a)

Dueling Q-network

Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.
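
A dueling Q-network splits Q(s, a) into a state value V(s) and an action-dependent advantage A(s, a). The sketch below assumes user features play the role of the state and item features the role of the action, and uses linear heads for brevity; the real network would use deeper layers.

```python
import numpy as np

def dueling_q(user_feats, item_feats, w_value, w_advantage):
    """Q(s, a) = V(s) + A(s, a), with linear heads standing in for sub-networks."""
    value = w_value @ user_feats                                        # depends on state only
    advantage = w_advantage @ np.concatenate([user_feats, item_feats])  # state and action
    return value + advantage

user = np.array([1.0, 2.0])            # made-up user (state) features
item = np.array([3.0])                 # made-up item (action) features
w_value = np.array([0.5, 0.5])
w_advantage = np.array([0.1, 0.1, 1.0])

q = dueling_q(user, item, w_value, w_advantage)
```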

Exploration by Dueling Bandit Gradient Descent

• Trial-and-update learning (somewhat like evolutionary search)

Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.
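
The trial-and-update idea can be sketched as: perturb the current weights to get an exploration network, compare the feedback of the two, and move toward the perturbed weights only if they did better. This is a simplified variant; the exact update rule in the paper may differ, and the `feedback` function here is a hypothetical stand-in for comparing user responses.

```python
import numpy as np

def dbgd_step(W, feedback, rng, alpha=0.1, eta=0.05):
    """One trial-and-update step with a random multiplicative perturbation."""
    delta = alpha * rng.uniform(-1.0, 1.0, size=W.shape) * W
    W_explore = W + delta                    # exploration network
    if feedback(W_explore) > feedback(W):    # exploration network performed better
        return W + eta * delta               # move a small step toward it
    return W                                 # otherwise keep the current network

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
feedback = lambda W: -np.sum((W - target) ** 2)   # toy feedback: closer to target is better

W_init = np.array([0.5, -1.0, 1.0])
W = W_init.copy()
for _ in range(200):
    W = dbgd_step(W, feedback, rng)
```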

DRN: A Deep Reinforcement Learning Framework for News Recommendation

Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.

Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream

• Observations: previous interactions

• States: Hidden layer of an LSTM

• Actions: push or not

Haihui Tan et al. Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017.

KGQR: Leveraging Knowledge Graphs for Better Item & State Representation

Sijin Zhou et al. Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. In submission 2020.

Summary of Current RL Solutions for Rec.

• State: Weak user profile representation
• Action: Unable to handle large-scale discrete action spaces well
• Learning: Off-policy model-free RL to avoid data bias and user modeling
• System: Lack of online experiments or long-time tuning; online/offline learning combination
• Data efficiency is quite low

• Modeling user dynamics would be a promising direction
• Efficient state/action representation

Content

• A brief review of recommender system road map

• Deep learning for recommender systems

• Deep reinforcement learning for recommender systems

• Summary

Summary: Road Map of Recommendation Technique

2000-2006 Neighborhood based collaborative filtering

2007-2009 Matrix factorization and variants

2010-2015 Factorization machine and variants

2015-2017 Deep neural networks for user behavior prediction

2017-2019 Deep reinforcement learning for decision making

Design neural nets to automatically capture complex interaction patterns in user-item data

Design RL settings for sequential recommendation decision making; train policies in an effective way

Thank You! Questions?

Dr. Weinan Zhang
Assistant Professor
APEX Data & Knowledge Management Lab
John Hopcroft Center for Computer Science
Shanghai Jiao Tong University

Know more about me at http://wnzhang.net