R.US mean Eexp

sub Gaussian R.US

A meanzero RV Z is d sub Gaussian if E expX2 Eexp O'a HER

Chernoff bound w sub Gaussian RU.s

If Xi Xa are IID RV w Elk M and X M is 82sub Gaussian then

IPC Xi p e e exp it S

If XE 0,13 then take a 0,6 1 X Ely is t sus GaussianHoeffding's Lemma

If XE La b the X Efx is bid sub GaussianUnion Bound or sub additivity of probability

Let A An be events in a welldefined probability space

00ther IPC A 3 E ÉPCA

Given n treatments Input horizon TEIN

for til 2 T

Patient t arrives w illness

prescribes treatment It Eln l inAgent

Observe XI t patient t recovers given It

Let Xi patient t recovers given i

Assume for each ie n Xi É are IID w E Xi O

Questionsperhaps Xia e

How long does it take to identify the best treatment 917 Oi

with probability at least I 8

If Rt MEEEEEXie Xia how can we minimize Rt

Expected Regret

Rt YES III Xie Xi É Ii ELTwhere Ai YE O Oi T É It i

Warm up n tonsi.de strategy that plays

action ieln round robin until Ti 3 then

plays 91 4 Gi for the remainder T ns times

T empirical mean of first 3 plays

ieCn

Define di 10 Oilers at

hemal PLAYED I IPC E 3 21 8

Prod Gi III Xis Xi e40,13 ae iid IELXi.itXi Oi is Isas Cess

with 02 1 4

PCE PAO Oil FIXE 2expt E.ltZexp log 2n 8

IPC id 3 eÉ IP d 12 4 8

Because IP f Ei I J assume I di occurs

Assume WLOG 02022 On

If 2 580 log 2m18 then on events d ME

8 O Fast Cd

O

Out

28 FÉ Ez

Oss

E

With probability at least I J strategy

identifies 937Y Oi0,2022 20h n 2 O Oz

Rt É Ai ELT A ELI since m2 and A O

A ELT HEE.n 3 1144 ve 3

Az Tolland A ELINE VE

E Az 3ELILENED A IT 3 ELISE VE 3

Note on d ME we proved above On

If Q then T J T T J

d nd 35 3 T T 33

IE To II nd 31K nd

3 ELI d ndI 3

Tell d Adz 31154 nd

It is true that IP d n dz 21 8 thus

W.p 21 8 Aztz E Az

I IeeeS

E A J t AZT 8

It 8A a log 2n 8 A TS

Idea choose 8 117 AEI

E 8A log antI

Ve wanted Rt oct we have it 105ft so

Note If treatments were nearly identical so that A 10

then our regret bound says R I 1010

But this is nonsense ble it shouldn't matter what

is prescribed they act nearly the same

We know RT A ELT E AT

Rtt min 2 88102 glut Azt

É J

I

maximize over Az ofRt 2 4 O

minlehtl.IM

A Fi2 Attend

Note R ITinstance indep regretbound

Multi armed Bandit problem

Imagine Casino

farmthat you pull to play I armed bandit

B DI postI 2 n

Each machine i pays out random amount w mean Oi

Elimination style algorithm

using pullsonlyd from round l

an

Assume observation Xi at time t is sub Gaussian

and iid and IELXi.AE oil

Assume WLOG O Or On Ai O Oi A Y'Is HiAz

hemmed With probability at least l S

arm IEXe and fixed I See KLEIN

Proof For all le IN and ie n define

die 18 0 see

Note that Eez feats Thus

IPCY.Yexed.ie E1P II dieEEFE.PCE.ie

fact É 2

III retrieslixen

P I I die 21 J thus assume these events hold

II ai is kicked out of Xe then 7 arm je Xe

Oj e Fi Zee

Assume IEXe and show I excel

For all j exe we have

One One Oe O 10 One A

E Ee t Ee to

ZeeI EXey arm I E Xe KLEIN

Fix any ie te s t Ai teeRecall I E Xe te EN

Want to show that arm it does not enter XeWe need to show YYxe

j.e Oi Zee

Yet Qe Qe One dieO e 0 10 One Ai

E Ee Ee t Ai

1 2 ee t 4 Ee

Z Z Ee

II One One 24 i X ee

Max

Jeter Aj 4 Ee 8Ey Ee Ze

Max

j exe Aj E 84 KLEINBY

Sample complexity Let To be the total pullsfrom arm i at the end of game

Total pulls II TiClaim Xe for all l See LA

85 Es 85 Eze l 1kg 817

Ee It He for l Tim 18s l

For any

in Ti É Yelliexe

e'If Je 111A see

Je flog a logicallog za

Éeilogite 17

E T2 log tee ItE zloglsloillft.tl EgI

Ec É log oshi

Thug 81 E logo 81 log 1615

For any C O log a I 7 C 20

S.t log ca E c'log a

T YY To EIT T ZITI EXIT

the Wip 21 8 after at most

c I log losidiff total pulls we

ha thatalyexits w best arm 1

ee due to eat doe

L i

antiQin

myX2

dit 10 0 EffinClaim p 11 di 21 8

Consider UCB upper confidencebound alg

Pull every arm once

for tiny l ntl

pull It 71592FÉ

i

i

t.ESIY

fyg

On yourhomework you show that Tilt Ees log E

FEETRt III ELT

it

We've demonstrated the existence of an algorithm

that satisfies Eltabectiloy T Rtt CÉÉ'doge

Lai Robbins 3

That Any strategy that satisfies E Told o ta tomato

w Ai O we have himto gtÉL

gg

It EI Ei at 2449ft

Thd As Tye a I UCB a

Thomasonsatistis i'III É Sampling

Thompson Sampling 1930s Thompson

Or Plo Observe Xi Xu N Ops

i

i

Plo Xiii P Exile 8 PCO

Given posterior distribution P Oilskin Xi toofor ith arm

I can answer the question P i 970 xi.d.ttThompson Sampling

for til

Pull arm Iii ul prob P i 970 xi.d.tt

I

Tequivalent

Of PLO xi.LI

Ie aroma Otit r

IP i as O xi.d.tt filling no dP O xi.d.tt

In MAB typically no sharing of information

Plo it Piloi s

R.US mean Eexp

Documents

Transcript of R.US mean Eexp