Multi Armed Bandits - cs.ubc.ca · I There are a variety of MAB speci cations with various...

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Online ConvexOptimization (OCO)

Measuring ThePerformance

Bandit ConvexOptimization

Motivation

Multi Armed Bandit

A Simple MABAlgorithm

Stochastic MultiArmed Bandit

Definition

Bernoulli Multi ArmedBandit

Algorithms

Contextual Bandits

Motivation

Multi Armed Bandits

Alireza Shafaei

Machine Learning Reading GroupThe University of British Columbia

Summer 2017

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Outline

A Quick ReviewOnline Convex Optimization (OCO)Measuring The Performance

Bandit Convex OptimizationMotivationMulti Armed BanditA Simple MAB AlgorithmEXP3

Stochastic Multi Armed BanditDefinitionBernoulli Multi Armed BanditAlgorithms

Contextual BanditsMotivation

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Online Convex OptimizationDefinition

I At each iteration t, the player chooses xt ∈ K.

I A convex loss function ft ∈ F : K → R is revealed.I A cost ft(xt) is incurred.

→ F is a set of bounded functions.→ ft is revealed after choosing xt .→ ft can be adversarially chosen.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Online Convex OptimizationRegret

I Given an algorithm A.

I Regret of A after T iterations is defined as:

regretT (A) := sup{fi}Ti=1∈F

{T∑t=1

ft(xt)−minx∈K

T∑t=1

ft(x)}

I Online Gradient Descent O(GD√T ).

I If fi is α-strongly convex then O(G2

2α (1 + log(T )).

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Bandit Convex OptimizationDefinition1

I In OCO we had access to ∇ft(xt).

I in BCO we only observe ft(xt).

I Multi Armed Bandit further constrains the BCO setting.

1Material from [1].

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Bandit Convex OptimizationMotivation

I In data networks, the decision maker can measure theRTD of a packet, but rarely has access to thecongestion pattern of the entire network.

I In ad-placement, the search engine can inspect whichads were clicked through, but cannot know whetherdifferent ads, had they been chosen, would have beenclick through or not.

I Given a fixed budget, how to allocate resources amongthe research projects whose outcome is only partiallyknown at the time of allocation and may changethrough time.

I Wikipedia. Originally considered by Allied scientists inWorld War II, it proved so intractable that, according toPeter Whittle, the problem was proposed to be droppedover Germany so that German scientists could alsowaste their time on it.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Multi Armed Bandit (MAB)Definition

I On each iteration t, the player choses an action it froma predefined set of discrete actions {1, . . . , n}.

I An adversary, independently, chooses a loss ∈ [0, 1] foreach action.

I The loss associated with it is then revealed to theplayer.

I There are a variety of MAB specifications with variousassumptions and constraints.

I This definition is similar to the multi-expert problem,except we do not observe the loss associated with theother experts.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

MAB as a BCO

I The algorithms usually choose an action w.r.t adistribution over the actions.

I If we define K = ∆n, an n-dimensional simplex then

ft(x) = `>t x =n∑

`t(i)x(i) ∀x ∈ K

I We have an exploration-exploitation trade-off.I A simple approach would be to

→ Exploration With some probability, explore by choosingactions uniformly at random. Construct an estimate ofthe actions’ losses with the feedback.

→ Exploitation Otherwise, use the estimates to make adecision.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

A Simple MAB AlgorithmDefinition

I An algorithm can be constructed:

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

A Simple MAB AlgorithmAnalysis

I This algorithm guarantees:

E[T∑t=1

`t(it)−mini

T∑t=1

`t(i)] ≤ O(T34√n)

E[ˆ̀t(i)] = P[bt = 1] · P[it = i |bt = 1] · nδ`t(i) = `t(i)

‖ˆ̀t‖2 ≤

δ· |`t(it)| ≤

I E[f̂t ] = ft

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

A Simple MAB AlgorithmAnalysis

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

I Has a worst-case near-optimal regret bound ofO(√Tn log n).

I See P104 of the OCO book for proof.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Stochastic Multi Armed BanditDefinition

I On each iteration t, the player choses an action it froma predefined set of discrete actions {1, . . . , n}.

I Each action i has an underlying (fixed) probabilitydistribution Pi with mean µi .

I The loss associated with it is then revealed to theplayer. (A sample is taken from Pit ).

I Pi ’s could be a simple Bernoulli variable.

I A more complex version could assume a Markov processfor each action, within which the state of one or allprocesses change after each iteration.

I We still have the exploration-exploitation trade-off.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Bernoulli Multi Armed BanditDefinition2

I N[a] The number of times arm a is pulled.

I Q[a] The running average of rewards for arm a.

I S [a] The number of successes for arm a.

I F [a] The number of failures for arm a.

I The same notion of regret, except the optimal strategyis to pull the arm with the largest mean, µ∗.

2Material from [2].

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Algorithms

I Random Selection ⇒ O(T ).

I Greedy Selection ⇒ O(T ).

I ε-Greedy Selection ⇒ O(T ).

I Boltzmann Exploration ⇒ O(T ).

I Upper-Confidence Bound ⇒ O(ln(T )).

I Thompson Sampling ⇒ O(ln(T )).

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

AlgorithmsEmpirical Evaluation

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Contextual BanditsMotivation

I There are scenarios within which we can have access tomore information.

I The extra information can be encoded as a contextvector.

I In online advertising, the behaviour of each user, or thesearch context for instance, can provide valuableinformation.

I One simple way is to treat each context having its ownbandit problem.

I Variations of the previous algorithms relate the contextvector with the expected reward through linear models,neural networks, kernels, or random forests.

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

Thanks

Thanks!Questions?

Multi ArmedBandits

Alireza Shafaei

A Quick Review

Motivation

Multi Armed Bandit

Definition

Algorithms

Contextual Bandits

Motivation

References I

E. Hazan.Introduction to Online Convex Optimization.http://ocobook.cs.princeton.edu/OCObook.pdf

S. Raja.Multi Armed Bandits and Exploration Strategies.https://sudeepraja.github.io/Bandits/

Multi Armed Bandits - cs.ubc.ca · I There are a variety of MAB speci cations with various...

Documents

Transcript of Multi Armed Bandits - cs.ubc.ca · I There are a variety of MAB speci cations with various...

COMBINATORIAL IDENTIFICATION IN MULTI-ARMED BANDITS · COMBINATORIAL IDENTIFICATION IN MULTI-ARMED BANDITS 4 1. Introduction The multi-armed bandit problem is a sequential decision

Multi-arm Bandits

the DB - cs.ubc.ca

Multi-Armed Bandits in Metric Spaces - Brown …cs.brown.edu/~eli/papers/bandits-lip-full.pdfMulti-Armed Bandits in Metric Spaces Robert Kleinbergy Aleksandrs Slivkinsz Eli Upfalx

Foraging and Multi-armed Bandits Optimal Foraging …vaibhav/talks/2013a.pdfForaging and Multi-armed Bandits ... the multi-armed bandit problem with switching cost. IEEE Transactions

The Dueling Bandits Problem Yisong Yue. Outline Brief Overview of Multi-Armed Bandits Dueling Bandits – Mathematical properties – Connections to other.

American Options, Multi–armed Bandits, and Optimal ...

Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation.

Multi-armed Bandits with Cost Subsidy - Facebook Research

Deep Sets - cs.ubc.ca

Multitasking, Multi-Armed Bandits, and the Italian Judiciarytintin.hec.ca/pages/decio.coviello/research_files/judges.pdf · Multitasking, Multi-Armed Bandits, and the ... The Italian

Active Learning in Multi-Armed Bandits Iszepesva/papers/Allocation-TCS09.pdfActive Learning in Multi-Armed Bandits I Andr as Antosa,, Varun Groverb, Csaba Szepesv aria,b, a Computer

Bayesian Contextual Multi-armed Bandits Contextual Multi-armed Bandits ... The Epoch-Greedy Algorithm for Contextual Multi-armed ... topic model w/ a Bayesian multi-armed bandit analysis

Designing Truthful Contextual Multi-Armed Bandits based ...

clipping - cs.ubc.ca

Top Arm Identiﬁcation in Multi-Armed Bandits with Batch ...nowak.ece.wisc.edu/aistats16.pdfTop Arm Identiﬁcation in Multi-Armed Bandits with Batch Arm Pulls ... We introduce a

Foraging and Multi-armed Bandits Optimal Foraging and Multi-armed Bandits Vaibhav ...vaibhav/talks/2013a.pdf · 2016-08-16 · Optimal Foraging and Multi-armed Bandits Vaibhav Srivastava

Lecture 2: Exploration and Exploitation in Multi … 2: Exploration and Exploitationin Multi-Armed Bandits Lecture 2: Exploration and Exploitation in Multi-Armed Bandits Hado van Hasselt

Multi-armed bandits for fun and profit

Know Your Customer: Multi-armed Bandits with Capacity ... · Know Your Customer: Multi-armed Bandits with Capacity Constraints Ramesh Johari Vijay Kambley Yash Kanoriaz Abstract A