New Laplacian-regularized graph bandits: Algorithms and theoretical … · 2019. 6. 6. ·...



Laplacian-regularized graph bandits: Algorithms and theoretical analysis
Kaige Yang¹, Xiaowen Dong² and Laura Toni¹
[email protected]; xdong.robots.ox.ac.uk; [email protected]
¹ Department of Electrical and Electronic Engineering, University College London
² Department of Engineering Science, University of Oxford

Abstract
We study contextual multi-armed bandit problems with multiple users, exploiting structure in the user domain to reduce the cumulative regret.
• We model user relations as a graph and assume that the users' parameters (preferences) form smooth signals on that graph.
• Based on a graph Laplacian-regularized estimator, we propose a novel bandit algorithm whose performance depends on a notion of local smoothness on the graph.
• We derive a closed-form solution to the estimator and analyse its estimation error, the single-user upper confidence bound (UCB) and the cumulative regret.
• The single-user estimation and UCB further allow us to propose a low-complexity algorithm whose computational complexity scales linearly with the number of users.
• We validate our theoretical claims and test the algorithms empirically on both synthetic and real-world datasets.

Problem Setting
• Linear contextual bandit (n users and m arms)

• Graph-based bandit

Theorems
• Single-user estimation:

• Local smoothness measure:

• Estimation error bound:

• Single-user UCB:

Conclusion
• We have proposed G-UCB and G-UCB SIM, two Laplacian-regularized graph bandit algorithms that exploit the relation between users for more efficient learning.

• As a future direction, we may consider negative edge weights in the adjacency matrix, which relaxes the condition that all user parameters are encouraged to be similar to each other.

• We may also consider a directed graph, which may help account for the different levels of influence that a pair of users may have on each other.

Theorems Validation

Bandit Experiments


Payoff model (linear contextual bandit):
$$y = x^{T}\theta + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2})$$

Cumulative regret:
$$R_T = \sum_{t=1}^{T}\left(x_{i,*}^{T}\theta_{t,i} - x_{t}^{T}\theta_{t,i}\right)$$
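The payoff model and the regret can be made concrete with a small simulation. This is a minimal sketch, assuming a single user whose parameter θ stays fixed across rounds and a finite arm set; the arms, θ, noise level and chosen-arm sequence are made-up illustrative values, and `dot`, `payoff` and `cumulative_regret` are hypothetical helper names. Regret is computed from expected (noise-free) payoffs, taking x_{i,*} to be the arm that maximizes x^T θ.

```python
import random


def dot(u, v):
    """Inner product x^T theta."""
    return sum(a * b for a, b in zip(u, v))


def payoff(x, theta, sigma, rng):
    """Noisy linear payoff y = x^T theta + eps, with eps ~ N(0, sigma^2)."""
    return dot(x, theta) + rng.gauss(0.0, sigma)


def cumulative_regret(arms, theta, chosen):
    """Sum over rounds of (expected payoff of best arm - expected payoff of chosen arm)."""
    best = max(dot(x, theta) for x in arms)
    return sum(best - dot(x, theta) for x in chosen)


arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]  # illustrative arm set (m = 3, d = 2)
theta = [0.2, 0.9]                           # illustrative user parameter (assumed fixed)
rng = random.Random(0)
chosen = [arms[0], arms[2], arms[1], arms[1]]
observed = [payoff(x, theta, 0.1, rng) for x in chosen]  # what the learner would observe
print(cumulative_regret(arms, theta, chosen))  # 0.7 + 0.24 + 0 + 0, i.e. about 0.94
```

Only the expected payoffs enter the regret; the noisy `observed` values are what an estimator would be fitted to.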

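Laplacian regularizer (smoothness of user parameters on the graph):
$$\mathrm{tr}(\Theta^{T} L \Theta) = \frac{1}{2}\sum_{k=1}^{d}\sum_{i \sim j} W_{ij}\left(\frac{\Theta_{ik}}{\sqrt{D_{ii}}} - \frac{\Theta_{jk}}{\sqrt{D_{jj}}}\right)^{2}$$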
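The regularizer identity can be checked numerically. This sketch assumes the symmetric normalized Laplacian L = I − D^{-1/2} W D^{-1/2} (which is what the D_ii terms suggest), a symmetric weight matrix with no self-loops and no isolated users, and reads i ∼ j as running over ordered pairs of connected users, which is why the factor 1/2 appears. The weights and user parameters are illustrative, and the function names are hypothetical.

```python
import math


def normalized_laplacian(w):
    """L = I - D^{-1/2} W D^{-1/2}; assumes symmetric W, zero diagonal, no isolated users."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


def trace_form(theta, lap):
    """tr(Theta^T L Theta) = sum_k sum_{i,j} Theta_ik L_ij Theta_jk."""
    n, d = len(theta), len(theta[0])
    return sum(theta[i][k] * lap[i][j] * theta[j][k]
               for k in range(d) for i in range(n) for j in range(n))


def pairwise_form(theta, w):
    """(1/2) sum_k sum_{i~j} W_ij (Theta_ik/sqrt(D_ii) - Theta_jk/sqrt(D_jj))^2,
    with i~j running over ordered pairs of connected users."""
    n, d = len(theta), len(theta[0])
    deg = [sum(row) for row in w]
    return 0.5 * sum(
        w[i][j] * (theta[i][k] / math.sqrt(deg[i]) - theta[j][k] / math.sqrt(deg[j])) ** 2
        for k in range(d) for i in range(n) for j in range(n) if w[i][j] > 0)


w = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.2],
     [0.5, 0.2, 0.0]]                         # illustrative symmetric weights, n = 3
theta = [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]]  # rows are user parameters, d = 2
lap = normalized_laplacian(w)
print(trace_form(theta, lap), pairwise_form(theta, w))  # equal up to rounding
```

Setting every row to √D_ii times a common vector drives both sides to zero, since D^{1/2}1 spans the null space of this Laplacian for a connected graph.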

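Laplacian-regularized estimator:
$$\hat{\Theta}_t = \arg\min_{\Theta \in \mathbb{R}^{n \times d}} \sum_{i=1}^{n} \left\| Y_t(i,:) - \Theta(i,:)X_{t,i} \right\|_F^{2} + \alpha\,\mathrm{tr}(\Theta^{T} L \Theta)$$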
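One way to obtain a closed-form solution is to set the gradient of the objective to zero. Assuming L is symmetric, this gives, for every user i, A_{t,i} θ_i + α Σ_j L_ij θ_j = B_{t,i} with A_{t,i} = X_{t,i}X_{t,i}^T and B_{t,i} = X_{t,i}Y_t(i,:)^T, i.e. one linear system of size nd. The sketch below builds and solves that system in pure Python; the derivation, the data and the names (`solve`, `closed_form`, `objective`) belong to this sketch, not to the poster.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a square, non-singular)."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def closed_form(contexts, payoffs, lap, alpha):
    """Jointly solve A_i theta_i + alpha * sum_j L_ij theta_j = B_i for all users i.
    Assumes user 0 has at least one context (used to read d)."""
    n, d = len(contexts), len(contexts[0][0])
    size = n * d
    big = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i in range(n):
        for x, y in zip(contexts[i], payoffs[i]):  # accumulate A_i = X X^T and B_i = X Y^T
            for p in range(d):
                rhs[i * d + p] += y * x[p]
                for q in range(d):
                    big[i * d + p][i * d + q] += x[p] * x[q]
        for j in range(n):                          # add alpha * (L kron I_d)
            for p in range(d):
                big[i * d + p][j * d + p] += alpha * lap[i][j]
    flat = solve(big, rhs)
    return [flat[i * d:(i + 1) * d] for i in range(n)]


def objective(theta, contexts, payoffs, lap, alpha):
    """sum_i ||Y_i - theta_i^T X_i||^2 + alpha * tr(Theta^T L Theta)."""
    n, d = len(theta), len(theta[0])
    fit = sum((y - sum(t * v for t, v in zip(theta[i], x))) ** 2
              for i in range(n) for x, y in zip(contexts[i], payoffs[i]))
    reg = sum(theta[i][k] * lap[i][j] * theta[j][k]
              for k in range(d) for i in range(n) for j in range(n))
    return fit + alpha * reg


r = math.sqrt(0.5)                 # 1/sqrt(2)
lap = [[1.0, -r, 0.0],
       [-r, 1.0, -r],
       [0.0, -r, 1.0]]             # normalized Laplacian of the path graph 0-1-2 (unit weights)
contexts = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
            [[1.0, 0.0], [1.0, 1.0]],
            [[0.0, 1.0], [1.0, 0.0]]]   # illustrative past contexts per user, d = 2
payoffs = [[1.0, 0.1, 1.1], [0.9, 1.0], [0.3, 0.2]]
theta_hat = closed_form(contexts, payoffs, lap, alpha=0.5)
print(theta_hat)
```

With α = 0 the system decouples into per-user least-squares problems, which is a quick sanity check on the construction.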

Local smoothness measure:
$$\|\Delta_i\|_2 = \Big\| \theta_i - \Big(\sum_{j \neq i}^{n} -L_{ij}\,\theta_j\Big) \Big\|_2$$
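The local smoothness measure is straightforward to compute. When L_ii = 1, as for the symmetric normalized Laplacian assumed in this sketch, Δ_i is simply the i-th row of LΘ. The example uses a three-user path graph with unit weights (an illustrative choice) and compares a signal with zero local variation against one where the middle user disagrees with both neighbours; `local_smoothness` is a hypothetical name.

```python
import math


def local_smoothness(i, theta, lap):
    """||Delta_i||_2 with Delta_i = theta_i - sum_{j != i} (-L_ij) theta_j."""
    delta = list(theta[i])
    for j, theta_j in enumerate(theta):
        if j != i:
            for k in range(len(delta)):
                # Subtracting (-L_ij) * theta_jk is the same as adding L_ij * theta_jk.
                delta[k] += lap[i][j] * theta_j[k]
    return math.sqrt(sum(v * v for v in delta))


r = math.sqrt(0.5)
lap = [[1.0, -r, 0.0],
       [-r, 1.0, -r],
       [0.0, -r, 1.0]]   # normalized Laplacian of the path graph 0-1-2, degrees (1, 2, 1)

# Rows proportional to sqrt(D_ii): the smoothest possible signal for this Laplacian.
smooth = [[1.0, 0.5], [math.sqrt(2.0), math.sqrt(2.0) / 2], [1.0, 0.5]]
# The middle user disagrees with both neighbours.
rough = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print([round(local_smoothness(i, smooth, lap), 12) for i in range(3)])  # all (numerically) zero
print(local_smoothness(1, rough, lap))  # about sqrt(3) = 1.732...
```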

Estimation error bound:
$$\|\theta_i - \hat{\theta}_{t,i}\|_2 \le \mathrm{tr}(A_{t,i}^{-2})\left(\alpha\|\hat{\Delta}_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_2\right)$$

Single-user UCB:
$$\beta_{t,i} = \alpha\, M_{t,ii}^{-1}\,\|\hat{\Delta}_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_{M_{t,ii}^{-1}}$$
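The confidence width enters arm selection through the rule UCB(i,t) = x^T θ̂_{t−1,i} + β_{t,i} ||x||_{M^{-1}_{t−1,ii}} used in Algorithm 1. A minimal sketch, assuming β_{t,i} is simply passed in as a number (the expression above involves the noise term η_{t,i}, which the sketch does not model) and M_ii = X_i X_i^T + α L_ii I as in the algorithm; `solve`, `user_matrix` and `ucb_select` are hypothetical helpers, and ||x||_{M^{-1}} is computed with a linear solve rather than an explicit inverse.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a square, non-singular)."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def user_matrix(contexts, d, alpha, l_ii):
    """M_ii = X_i X_i^T + alpha * L_ii * I built from the user's past contexts."""
    m = [[alpha * l_ii if p == q else 0.0 for q in range(d)] for p in range(d)]
    for x in contexts:
        for p in range(d):
            for q in range(d):
                m[p][q] += x[p] * x[q]
    return m


def ucb_select(arms, theta_hat, beta, m):
    """Index of the arm maximising x^T theta_hat + beta * ||x||_{M^{-1}}."""
    def score(x):
        mean = sum(a * b for a, b in zip(x, theta_hat))
        width = math.sqrt(sum(a * b for a, b in zip(x, solve(m, x))))  # sqrt(x^T M^{-1} x)
        return mean + beta * width
    return max(range(len(arms)), key=lambda k: score(arms[k]))


m = user_matrix([[1.0, 0.0]] * 3, d=2, alpha=1.0, l_ii=1.0)  # arm 0 already played three times
arms = [[1.0, 0.0], [0.0, 1.0]]
theta_hat = [0.5, 0.4]
print(ucb_select(arms, theta_hat, beta=0.0, m=m))  # 0: greedy choice
print(ucb_select(arms, theta_hat, beta=1.0, m=m))  # 1: exploration bonus wins
```

With β = 0 the rule is greedy; a positive β favours the arm whose direction the user has explored less.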

Single-user estimation:
$$\hat{\theta}_{t,i} = (A_{t,i} + \alpha L_{ii} I)^{-1} B_{t,i} + \alpha A_{t,i}^{-1}\sum_{j \neq i}^{n} -L_{ij}\,A_{t,j}^{-1} B_{t,j}$$
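The single-user estimate needs only user i's own statistics and its neighbours' least-squares solutions A_{t,j}^{-1}B_{t,j}. A minimal sketch, assuming user i and every neighbour j (L_ij ≠ 0) have invertible A matrices; the one-dimensional two-user example is chosen so the result can be checked by hand (2/3 + 3/2 = 13/6 for user 0), and `stats` and `single_user_estimate` are hypothetical names.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a square, non-singular)."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def stats(contexts, payoffs, d):
    """A = X X^T and B = X Y^T for one user, where the columns of X are past contexts."""
    a = [[sum(x[p] * x[q] for x in contexts) for q in range(d)] for p in range(d)]
    b = [sum(y * x[p] for x, y in zip(contexts, payoffs)) for p in range(d)]
    return a, b


def single_user_estimate(i, a_list, b_list, lap, alpha):
    """(A_i + alpha L_ii I)^{-1} B_i + alpha A_i^{-1} sum_{j != i} (-L_ij) A_j^{-1} B_j."""
    d = len(b_list[i])
    ridge_matrix = [[a_list[i][p][q] + (alpha * lap[i][i] if p == q else 0.0)
                     for q in range(d)] for p in range(d)]
    first = solve(ridge_matrix, b_list[i])
    neighbour_sum = [0.0] * d
    for j in range(len(a_list)):
        if j != i and lap[i][j] != 0.0:          # only neighbours contribute
            ls_j = solve(a_list[j], b_list[j])   # A_j^{-1} B_j
            for k in range(d):
                neighbour_sum[k] += -lap[i][j] * ls_j[k]
    second = solve(a_list[i], neighbour_sum)     # A_i^{-1} * neighbour_sum
    return [f + alpha * s for f, s in zip(first, second)]


contexts = [[[1.0], [1.0]], [[1.0]]]   # two users, d = 1
payoffs = [[1.0, 1.0], [3.0]]
lap = [[1.0, -1.0], [-1.0, 1.0]]       # normalized Laplacian of one unit-weight edge
pairs = [stats(c, y, 1) for c, y in zip(contexts, payoffs)]
a_list = [a for a, _ in pairs]
b_list = [b for _, b in pairs]
print(single_user_estimate(0, a_list, b_list, lap, alpha=1.0))  # 2/3 + 3/2, about 2.1667
```

Setting α = 0 recovers the user's own least-squares estimate A_i^{-1}B_i.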

[Figure panels: Smoothness, Smoothness, Noise, Sparsity, Netflix, MovieLens, Approximation, UCB]

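Algorithm 1: G-UCB
Input: $\alpha$, $T$, empty graph Laplacian $L$, $\hat{\theta}_{0,i} = \mathbf{0} \in \mathbb{R}^{d}$, $M_{0,ii} = I \in \mathbb{R}^{d \times d}$, $\beta_{0,i} = 0$
for $t \in [1, T]$ do
• For the user $i$ appearing at round $t$, calculate $\hat{\Delta}_{t,i} = \hat{\theta}_{t-1,i} - \big(\sum_{j \neq i}^{n} -L_{ij}\,\hat{\theta}^{ls}_{t-1,j}\big)$
• Calculate $\beta_{t,i}$
• Select one arm $x_t$ from the set $\mathcal{D}$ by maximizing $\mathrm{UCB}(i,t) = x^{T}\hat{\theta}_{t-1,i} + \beta_{t,i}\|x\|_{M^{-1}_{t-1,ii}}$
• Receive payoff $y_t$; update $Y_{t-1} \to Y_t$ and $X_{t-1,i} \to X_{t,i}$ for all $i$
• Update $\hat{\Theta}_t$ (and $\hat{\theta}_{t,i}$)
• Update $\hat{\theta}^{ls}_{t,i}$ and $M^{-1}_{t,ii}$: $\hat{\theta}^{ls}_{t,i} = (X_{t,i}X_{t,i}^{T})^{-1}X_{t,i}Y_t(i,:)^{T}$, $M_{t,ii} = X_{t,i}X_{t,i}^{T} + \alpha L_{ii} I$
• Update $W$ (and $L$) via a Gaussian RBF kernel: $W_{ij} = \exp\big(-\|\hat{\theta}_{t,i} - \hat{\theta}_{t,j}\|_2^{2} / (2\sigma_w^{2})\big)$, where $\sigma_w$ is the kernel width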
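The graph update in the last step of Algorithm 1 can be sketched as follows. The algorithm box does not say how L is rebuilt from W; this sketch assumes the symmetric normalized Laplacian and no self-loops (W_ii = 0). The estimates, σ_w and the function names are illustrative.

```python
import math


def rbf_weights(theta_hat, sigma_w):
    """W_ij = exp(-||theta_i - theta_j||_2^2 / (2 sigma_w^2)) for i != j, and W_ii = 0."""
    n = len(theta_hat)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(theta_hat[i], theta_hat[j]))
                w[i][j] = math.exp(-sq / (2.0 * sigma_w ** 2))
    return w


def normalized_laplacian(w):
    """L = I - D^{-1/2} W D^{-1/2}; entries with W_ij = 0 are left at 0."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[(1.0 if i == j else 0.0)
             - (w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] > 0 else 0.0)
             for j in range(n)] for i in range(n)]


theta_hat = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # illustrative current estimates
w = rbf_weights(theta_hat, sigma_w=0.5)
lap = normalized_laplacian(w)
print(round(w[0][1], 4), round(w[0][2], 4))  # about 0.9608 and 0.0183
```

Users with nearby estimates get weights close to 1, while distant ones are pushed towards 0 at a rate set by σ_w.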


$$A_{t,i} = X_{t,i}X_{t,i}^{T}, \qquad M_{t,ii} \approx X_{t,i}X_{t,i}^{T} + \alpha L_{ii} I$$

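G-UCB SIM (low-complexity algorithm)
If user i has at least one neighbour:
$$\hat{\theta}_{t,i} = \hat{\theta}^{ridge}_{t,i} + \alpha A_{t,i}^{-1}\sum_{j \neq i}^{n} -L_{ij}\,\hat{\theta}^{ls}_{t,j}$$
If user i has no neighbours:
$$\hat{\theta}_{t,i} = \hat{\theta}^{ls}_{t,i}$$
$$B_{t,i} = X_{t,i}Y_t(i,:)^{T}$$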
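The G-UCB SIM update can be sketched per user. Reading θ̂^{ridge}_{t,i} as (A_{t,i} + αL_ii I)^{-1}B_{t,i} and θ̂^{ls}_{t,j} as A_{t,j}^{-1}B_{t,j} (as in Algorithm 1) makes the first case coincide with the single-user estimation formula. The sketch treats j as a neighbour of i when L_ij ≠ 0, keeps each user's least-squares estimate in a cache, and assumes the relevant A matrices are invertible; `sim_update` and the cache layout are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a square, non-singular)."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def sim_update(i, a_list, b_list, ls_cache, lap, alpha):
    """G-UCB SIM estimate for user i from its own A_i, B_i and cached neighbour LS estimates."""
    d = len(b_list[i])
    neighbours = [j for j in range(len(lap)) if j != i and lap[i][j] != 0.0]
    if not neighbours:
        # No neighbours: fall back to the user's own least-squares estimate A_i^{-1} B_i.
        return solve(a_list[i], b_list[i])
    # Ridge term: (A_i + alpha * L_ii * I)^{-1} B_i.
    ridge_matrix = [[a_list[i][p][q] + (alpha * lap[i][i] if p == q else 0.0)
                     for q in range(d)] for p in range(d)]
    ridge = solve(ridge_matrix, b_list[i])
    # Neighbour pull: sum_{j != i} (-L_ij) * theta_ls_j, read from the cache.
    pull = [0.0] * d
    for j in neighbours:
        for k in range(d):
            pull[k] += -lap[i][j] * ls_cache[j][k]
    correction = solve(a_list[i], pull)  # A_i^{-1} * pull
    return [rg + alpha * cr for rg, cr in zip(ridge, correction)]


a_list = [[[2.0]], [[1.0]], [[4.0]]]   # A_i for three users, d = 1
b_list = [[2.0], [3.0], [2.0]]         # B_i
ls_cache = [solve(a, b) for a, b in zip(a_list, b_list)]  # [[1.0], [3.0], [0.5]]
lap = [[1.0, -1.0, 0.0],
       [-1.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]   # users 0 and 1 share an edge; user 2 is isolated (its diagonal is unused)
print(sim_update(0, a_list, b_list, ls_cache, lap, alpha=1.0))  # 2/3 + 3/2, about 2.1667
print(sim_update(2, a_list, b_list, ls_cache, lap, alpha=1.0))  # [0.5]
```

Each call solves only d × d systems for user i and reads cached estimates for its neighbours, rather than re-solving the joint nd × nd system.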