New Laplacian-regularized graph bandits: Algorithms and theoretical … · 2019. 6. 6. ·...

Laplacian-regularized graph bandits: Algorithms and theoretical analysis

Kaige Yang1, Xiaowen Dong2 and Laura Toni1

[email protected]; xdong.robots.ox.ac.uk; [email protected]

1 Department of Electrical and Electronic Engineering, University College London

2 Department of Engineering Science, University of Oxford

Abstract

We study contextual multi-armed bandit problems with multiple users, where we exploit structure in the user domain to reduce the cumulative regret.

• We model the relations between users as a graph and assume that the users' parameters (preferences) form smooth signals on the graph.

• Based on a graph Laplacian-regularized estimator, we propose a novel bandit algorithm whose performance depends on a notion of local smoothness on the graph.

• We derive a closed-form solution to the estimator and analyze the estimation error, the single-user upper confidence bound (UCB) and the cumulative regret.

• The single-user estimate and UCB also lead to a low-complexity algorithm whose computational cost scales linearly with the number of users.

• We validate the theoretical claims and test the algorithms empirically on both synthetic and real-world datasets.

Problem Setting

• Linear contextual bandit (n users and m arms)

• Graph-based bandit

Theorems

• Single-user estimation:

• Local smoothness measure:

• Estimation error bound:

• Single-user UCB:

Conclusion

• We have proposed G-UCB and G-UCB SIM, two Laplacian-regularized graph bandit algorithms that exploit the relations between users for more efficient learning.

• As a future direction, we may consider negative edge weights in the adjacency matrix, which would relax the condition that all user parameters are encouraged to be similar to each other.

• We may also consider a directed graph, which may help account for the different levels of influence that a pair of users may have on each other.

Theorems Validation

Bandit Experiments


$$ y = x^\top \theta + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) $$

$$ R_T = \sum_{t=1}^{T} \left( x_{i,*}^\top \theta_{t,i} - x_t^\top \theta_{t,i} \right) $$
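As a minimal sketch of the linear payoff model and the cumulative regret, the following NumPy snippet simulates a single user whose arms are pulled uniformly at random. The function names, the random placeholder policy and all numeric values are illustrative assumptions and do not come from the poster.

```python
import numpy as np

def simulate_payoff(x, theta, sigma, rng):
    """Noisy linear payoff y = x^T theta + eps with eps ~ N(0, sigma^2)."""
    return float(x @ theta + rng.normal(0.0, sigma))

def cumulative_regret(arms, theta, chosen):
    """Sum over rounds of (best expected payoff - expected payoff of the chosen arm)."""
    expected = arms @ theta              # expected payoff of every arm, shape (m,)
    best = expected.max()                # payoff of the optimal arm
    return float(np.sum(best - expected[chosen]))

rng = np.random.default_rng(0)
d, m, T = 3, 5, 50                       # hypothetical dimension, arm count, horizon
theta = rng.normal(size=d)               # unknown user parameter
arms = rng.normal(size=(m, d))           # one context vector per arm
chosen = rng.integers(0, m, size=T)      # placeholder policy: uniform random pulls
payoffs = [simulate_payoff(arms[k], theta, 0.1, rng) for k in chosen]
regret = cumulative_regret(arms, theta, chosen)
```

A policy that always pulls the arm with the highest expected payoff has zero regret under this definition; any other policy accumulates a non-negative amount.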

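$$ \mathrm{tr}(\Theta^\top L \Theta) = \frac{1}{2} \sum_{k=1}^{d} \sum_{i \sim j} W_{ij} \left( \frac{\Theta_{ik}}{D_{ii}} - \frac{\Theta_{jk}}{D_{jj}} \right)^2 $$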
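The smoothness term can be checked numerically. The sketch below assumes the unnormalized combinatorial Laplacian L = D − W of a symmetric weight matrix, which is a simpler variant than the degree-normalized form written above; for that variant the trace equals half the weighted sum of squared differences over all ordered pairs of users. The helper names are hypothetical.

```python
import numpy as np

def combinatorial_laplacian(W):
    """L = D - W for a symmetric, non-negative weight matrix W (assumed variant)."""
    return np.diag(W.sum(axis=1)) - W

def smoothness_trace(Theta, L):
    """tr(Theta^T L Theta), with one user parameter per row of Theta."""
    return float(np.trace(Theta.T @ L @ Theta))

def smoothness_pairwise(Theta, W):
    """0.5 * sum_{i,j} W_ij * ||Theta_i - Theta_j||^2 over ordered pairs (i, j)."""
    diff = Theta[:, None, :] - Theta[None, :, :]      # shape (n, n, d)
    return float(0.5 * np.sum(W * np.sum(diff ** 2, axis=2)))
```

When neighbouring users have similar parameters the value is small, and it is exactly zero when all rows of Theta coincide.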

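$$ \Theta_t = \arg\min_{\Theta \in \mathbb{R}^{n \times d}} \sum_{i=1}^{n} \left\| Y_t(i,:) - \Theta(i,:) X_{t,i} \right\|_F^2 + \alpha \, \mathrm{tr}(\Theta^\top L \Theta) $$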
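One way to solve this objective in closed form, sketched under the assumption that L is symmetric, is to set the gradient to zero. This gives, for each user i, the linear condition X_i X_i^T θ_i + α Σ_j L_ij θ_j = X_i Y_i, and stacking these conditions yields a single linear system. The function name and the list-based data layout are hypothetical, and the sketch is not necessarily the form used by the authors.

```python
import numpy as np

def laplacian_regularized_estimate(X_list, Y_list, L, alpha):
    """Solve the stacked normal equations of the Laplacian-regularized objective.

    X_list[i]: (d, t_i) array of contexts served to user i.
    Y_list[i]: (t_i,) array of payoffs observed for user i.
    L: (n, n) symmetric graph Laplacian (assumption).
    Returns an (n, d) array with one estimated parameter per row.
    """
    n = len(X_list)
    d = X_list[0].shape[0]
    # The Kronecker product places L_ij * I_d in block (i, j) of the system matrix.
    H = alpha * np.kron(L, np.eye(d))
    b = np.zeros(n * d)
    for i, (X, y) in enumerate(zip(X_list, Y_list)):
        block = slice(i * d, (i + 1) * d)
        H[block, block] += X @ X.T       # data term A_i = X_i X_i^T
        b[block] = X @ y                 # B_i = X_i Y_i
    return np.linalg.solve(H, b).reshape(n, d)
```

With alpha set to 0 the system decouples into independent least-squares problems, one per user. The system matrix has n·d rows, so a direct solve grows quickly with the number of users.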

$$ \|\Delta_i\|_2 = \left\| \theta_i - \left( \sum_{j \neq i}^{n} -L_{ij} \theta_j \right) \right\|_2 $$
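To make the local smoothness measure concrete, the sketch below assumes a random-walk normalized Laplacian L = I − D⁻¹W, for which −L_ij = W_ij / D_ii. Under that assumption the bracketed sum is the weighted average of the neighbours' parameters, so the measure is the distance between a user and that average. The function names are hypothetical, and every user is assumed to have at least one neighbour.

```python
import numpy as np

def random_walk_laplacian(W):
    """L = I - D^{-1} W; assumes every node has positive degree."""
    deg = W.sum(axis=1)
    return np.eye(len(W)) - W / deg[:, None]

def local_smoothness(Theta, L, i):
    """||theta_i - sum_{j != i} (-L_ij) theta_j||_2 for user i."""
    mask = np.arange(L.shape[0]) != i
    neighbour_term = (-L[i, mask]) @ Theta[mask]   # weighted combination of the others
    return float(np.linalg.norm(Theta[i] - neighbour_term))
```

The measure is zero when user i sits exactly at the weighted average of its neighbours and grows as the user departs from it.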

$$ \|\theta_i - \theta_{t,i}\|_2 \le \mathrm{tr}(A_{t,i}^{-2}) \left( \alpha \|\Delta_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_2 \right) $$

$$ \beta_{t,i} = \alpha \, M_{t,ii}^{-1} \|\Delta_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_{M_{t,ii}^{-1}} $$

$$ \theta_{t,i} = (A_{t,i} + \alpha L_{ii} I)^{-1} B_{t,i} + \alpha A_{t,i}^{-1} \sum_{j \neq i}^{n} -L_{ij} A_{t,j}^{-1} B_{t,j} $$

[Figure panels: Smoothness, Smoothness, Noise, Sparsity, Netflix, Movielens, Approximation, UCB]

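Algorithm 1: G-UCB Algorithm

Input: $\alpha$, $T$, empty graph Laplacian $L$, $\theta_{0,i} = 0 \in \mathbb{R}^d$, $M_{0,ii} = I \in \mathbb{R}^{d \times d}$, $\beta_{0,i} = 0$

for $t \in [1, T]$ do

• For the appeared user $i$, calculate $\Delta_{t,i} = \theta_{t-1,i} - \left( \sum_{j \neq i}^{n} -L_{ij} \, \theta^{ls}_{t-1,j} \right)$

• Calculate $\beta_{t,i}$

• Select one arm $x_t$ from the set $D$ by maximizing $\mathrm{UCB}(i, t) = x^\top \theta_{t-1,i} + \beta_{t,i} \|x\|_{M_{t-1,ii}^{-1}}$

• Receive payoff $y_t$, update $Y_{t-1} \to Y_t$ and $X_{t-1,i} \to X_{t,i}$ for all $i$

• Update $\Theta_t$ (and $\theta_{t,i}$)

• Update $\theta^{ls}_{t,i}$ and $M_{t,ii}^{-1}$: $\theta^{ls}_{t,i} = (X_{t,i} X_{t,i}^\top)^{-1} X_{t,i} Y_t(i,:)^\top$, $M_{t,ii} = X_{t,i} X_{t,i}^\top + \alpha L_{ii} I$

• Update $W$ (and $L$) via Gaussian RBF: $W_{ij} = \exp\left( -\frac{\|\theta_{t,i} - \theta_{t,j}\|_2^2}{2\sigma_w^2} \right)$, where $\sigma_w$ is the kernel width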
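Two steps of the algorithm above lend themselves to a short illustration: choosing the arm with the largest upper confidence bound, and rebuilding the weight matrix with a Gaussian RBF kernel. The sketch below is a hedged reading of those two steps. The function names are hypothetical, and zeroing the diagonal of W (no self-loops) is an assumption that the poster does not state.

```python
import numpy as np

def select_arm(arms, theta_hat, M_inv, beta):
    """Index of the arm maximising x^T theta_hat + beta * ||x||_{M^{-1}}.

    arms: (m, d) candidate context vectors; M_inv: (d, d) inverse of M_{t-1,ii}.
    """
    means = arms @ theta_hat
    # Quadratic form x^T M^{-1} x for every arm, clipped against round-off.
    quad = np.einsum("kd,de,ke->k", arms, M_inv, arms)
    widths = np.sqrt(np.maximum(quad, 0.0))
    return int(np.argmax(means + beta * widths))

def rbf_weights(Theta, sigma_w):
    """W_ij = exp(-||theta_i - theta_j||_2^2 / (2 * sigma_w^2)), zero diagonal."""
    sq = np.sum((Theta[:, None, :] - Theta[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq / (2.0 * sigma_w ** 2))
    np.fill_diagonal(W, 0.0)             # assumption: no self-loops
    return W
```

With beta equal to 0 the selection is purely greedy; a larger beta favours arms along directions in which M is still small, that is, directions the user has rarely been served.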


$$ A_{t,i} = X_{t,i} X_{t,i}^\top $$

$$ M_{t,ii} \approx X_{t,i} X_{t,i}^\top + \alpha L_{ii} I $$

G-UCB SIM (Low Complexity Algorithm)

If user i has at least one neighbour:

$$ \theta_{t,i} = \theta^{ridge}_{t,i} + \alpha A_{t,i}^{-1} \sum_{j \neq i}^{n} -L_{ij} \, \theta^{ls}_{t,j} $$

If user i has no neighbours:

$$ \theta_{t,i} = \theta^{ls}_{t,i} $$
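The two cases of the low-complexity update can be sketched as follows. The poster does not spell out the ridge estimate, so this sketch assumes it is (A_i + α L_ii I)⁻¹ B_i, matching the first term of the single-user estimate. It also assumes that every A_j = X_j X_jᵀ is invertible and treats user j as a neighbour of i whenever L_ij is non-zero. The function name and data layout are hypothetical.

```python
import numpy as np

def gucb_sim_estimate(i, X_list, Y_list, L, alpha):
    """Single-user estimate for user i in the style of G-UCB SIM (sketch).

    X_list[j]: (d, t_j) contexts of user j; Y_list[j]: (t_j,) payoffs of user j.
    """
    d = X_list[i].shape[0]
    n = L.shape[0]
    A_i = X_list[i] @ X_list[i].T
    B_i = X_list[i] @ Y_list[i]
    theta_ls_i = np.linalg.solve(A_i, B_i)          # least-squares estimate
    neighbours = [j for j in range(n) if j != i and L[i, j] != 0]
    if not neighbours:
        return theta_ls_i                           # isolated user: plain least squares
    # Assumed form of the ridge estimate: (A_i + alpha * L_ii * I)^{-1} B_i.
    theta_ridge = np.linalg.solve(A_i + alpha * L[i, i] * np.eye(d), B_i)
    pull = np.zeros(d)
    for j in neighbours:
        A_j = X_list[j] @ X_list[j].T
        theta_ls_j = np.linalg.solve(A_j, X_list[j] @ Y_list[j])
        pull += -L[i, j] * theta_ls_j               # neighbours' least-squares estimates
    return theta_ridge + alpha * np.linalg.solve(A_i, pull)
```

Each call touches only the data of user i and of its neighbours, which is what keeps the per-user cost independent of the full n·d system.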

$$ B_{t,i} = X_{t,i} Y_t(i,:) $$
