Dueling Bandits

Transcript of Dueling Bandits

  • 7/25/2019 Dueling Bandits

    1/68

    The Dueling Bandits Problem

    Yisong Yue

    2/68

    Outline

    Brief Overview of Multi-Armed Bandits

    Sequential Experimental Design

    Dueling Bandits

    Mathematical properties

    Connections to other problems

    Algorithmic Principles

    New Directions & Ongoing Research

    3/68

    Multi-Armed Bandit Problem (stochastic version)

    K actions (aka arms or bandits)

    Each action has an average reward $\mu_k$, unknown to us

    Assume WLOG that $\mu_1$ is largest

    For t = 1…T: Algorithm chooses action a(t)

    Receives random reward y(t), with expectation $\mu_{a(t)}$

    Goal: minimize $T\mu_1 - (\mu_{a(1)} + \mu_{a(2)} + \dots + \mu_{a(T)})$ (Regret!)

    $T\mu_1$: expected reward if we had perfect information to start

    $\mu_{a(1)} + \dots + \mu_{a(T)}$: expected reward of Algorithm

    Algorithm only receives feedback on chosen action
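
    The regret on this slide can be computed directly once the average rewards are known. The sketch below is only an illustration: the three mean values and the two action sequences are made-up assumptions, not numbers from the talk.

```python
def expected_regret(means, actions):
    """Return T * mu_1 - sum_t mu_{a(t)} for a fixed sequence of chosen arms."""
    best_mean = max(means)  # mu_1 in the slide's notation (the largest mean)
    return sum(best_mean - means[a] for a in actions)


# Hypothetical average rewards; arm 0 is the best arm.
means = [0.5, 0.4, 0.2]

always_best = [0] * 9                     # plays the best arm every round
round_robin = [t % 3 for t in range(9)]   # cycles through all three arms

print(expected_regret(means, always_best))
print(expected_regret(means, round_robin))
```

    Playing the best arm every round incurs zero regret; every pull of a suboptimal arm adds its gap to the best mean.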

    4/68 to 11/68

    Example: Interactive Personalization

    Slide    Topic shown   Average Likes     # Shown      Total Likes
    4/68     Sports        -- -- -- -- --    0 0 0 1 0    0
    5/68     Sports        -- -- -- 0 --     0 0 0 1 0    0
    6/68     Politics      -- -- -- 0 --     0 0 1 1 0    0
    7/68     Politics      -- -- 1 0 --      0 0 1 1 0    1
    8/68     World         -- -- 1 0 --      0 0 1 1 1    1
    9/68     World         -- -- 1 0 0       0 0 1 1 1    1
    10/68    Economy       -- -- 1 0 0       0 1 1 1 1    1
    11/68    Economy       -- 1 1 0 0        0 1 1 1 1    2

    …

    12/68

    Average Likes: -- 0.44 0.4 0.33 0.2

    # Shown: 0 25 10 15 20

    Total Likes: 24

    What should Algorithm Recommend?

    Exploit: Economy    Explore: Celebrity    Best: Politics

    How to Optimally Balance Explore/Exploit Tradeoff? Characterized by the Multi-Armed Bandit Problem
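
    One standard way to balance the tradeoff is an upper-confidence-bound index. The slide does not name an algorithm, so the UCB1 rule below is a swapped-in example, not the talk's method; the data are the Average Likes and # Shown rows from this slide, and the function names are hypothetical.

```python
import math


def greedy_choice(avg_likes, n_shown):
    """Pure exploitation: the already-shown topic with the highest average."""
    shown = [i for i, n in enumerate(n_shown) if n > 0]
    return max(shown, key=lambda i: avg_likes[i])


def ucb1_choice(avg_likes, n_shown):
    """UCB1: try any never-shown topic first, else maximize mean + bonus."""
    for i, n in enumerate(n_shown):
        if n == 0:
            return i
    total = sum(n_shown)
    return max(
        range(len(n_shown)),
        key=lambda i: avg_likes[i] + math.sqrt(2 * math.log(total) / n_shown[i]),
    )


# Rows from the slide; None marks the topic that was never shown.
avg_likes = [None, 0.44, 0.4, 0.33, 0.2]
n_shown = [0, 25, 10, 15, 20]

print(greedy_choice(avg_likes, n_shown))  # index of the topic to exploit
print(ucb1_choice(avg_likes, n_shown))    # index of the topic UCB1 explores
```

    The exploration bonus shrinks as a topic is shown more often, so rarely shown topics keep getting a chance until their estimates are reliable.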

    13/68

    Regret R(T): opportunity cost of not knowing preferences

    "No-regret" if $R(T)/T \to 0$

    14/68

    The Motivating Problem

    Slot Machine = One-Armed Bandit

    Each Arm Has Different Payoff

    Goal: Minimize regret from pulling suboptimal arms

    Image source: http://research.microsoft.com/en-us/projects/bandits/

    15/68

    Many Applications

    Online Advertising, Search Engines, Recommender Systems

    Personalized Clinical Treatment

    Sequential Experimental Design

    16/68

    Experimental Design

    How to split trials to collect information

    Static Experimental Design: Standard practice (pre-planned)

    Treatment, Placebo, Treatment, Placebo, Treatment, …

    http://en.wikipedia.org/wiki/Design_of_experiments

    17/68

    Sequential Experimental Design

    Adapt experiments based on outcomes

    Treatment, Placebo, Treatment, Treatment, …, Treatment

    Sequential Experimental Design as Interactive Personalization

    […]et = total # of positive outcomes

    18/68

    Sequential Experimental Design Matters

    http://www.nytimes.com/2010/09/19/health/research/19trial.html

    Two Cousins, Two Paths -- Thomas McLaughlin, left, was given a promising experimental drug to treat his lethal skin cancer in a medical trial. Brandon Ryan had to go without it.

    19/68

    What if Rewards aren't Directly Measurable?

    20/68

    Evaluating using Click Data

    Click

    Interpretation 1: Result 2 is good. (Absolute)

    Interpretation 2: Result 2 is better than Result 1. (Relative / Preference)

    21/68

    Evaluating using Click Data

    Retrieval Function A    Retrieval Function B

    Click    Click    Click

    Which is better?

    22/68

    Analogy to Sensory Testing

    (Hypothetical) taste experiment: [image] vs [image]

    Natural usage context

    Experiment 1: Absolute Metrics

    3 cans, 3 cans, 2 cans (Total: 8 cans)

    1 can, 5 cans, 3 cans (Total: 9 cans)

    Very Thirsty!

    23/68

    Analogy to Sensory Testing

    (Hypothetical) taste experiment: [image] vs [image]

    Natural usage context

    Experiment 1: Relative Metrics

    2 - 1, 3 - 0, 2 - 0, 1 - 0, 4 - 1, 2 - 1

    All 6 prefer Pepsi

    24/68

    Interleaving (Taste Test in Search)

    Ranking A
    1. Napa Valley - The authority for lodging... www.napavalley.com
    2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
    3. Napa Valley College www.napavalley.edu/homex.asp
    4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681
    5. Napa Valley Wineries and Wine www.napavintners.com
    6. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley

    Ranking B
    1. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley
    2. Napa Valley - The authority for lodging... www.napavalley.com
    3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
    4. Napa Valley Hotels - Bed and Breakfast... www.napalinks.com
    5. NapaValley.org www.napavalley.org
    6. The Napa Valley Marathon www.napavalleymarathon.org

    Presented Ranking
    1. Napa Valley - The authority for lodging... www.napavalley.com
    2. Napa Country, California - Wikipedia en.wikipedia.org/wiki/Napa_Valley
    3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
    4. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
    5. Napa Valley Hotels - Bed and ...

    [Radlinski et al. 2008]

    25/68

    Interleaving (Taste Test in Search)

    (Ranking A, Ranking B, and Presented Ranking as on slide 24/68.)

    Click    Click

    B wins!

    [Radlinski et al. 2008]
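
    The slides do not say which interleaving scheme produced the presented ranking. As one possible reading, the sketch below implements team-draft interleaving, with click credit going to the ranking that contributed each clicked result. The function names and the toy document ids are assumptions made for illustration.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings; return (presented list, doc -> contributing team)."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    team = {}
    presented = []
    total = len(set(ranking_a) | set(ranking_b))
    while len(presented) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            side = "A" if picks["A"] < picks["B"] else "B"
        else:
            side = rng.choice(["A", "B"])
        remaining = [d for d in rankings[side] if d not in team]
        if not remaining:  # this ranking is exhausted, let the other one pick
            side = "B" if side == "A" else "A"
            remaining = [d for d in rankings[side] if d not in team]
        doc = remaining[0]  # highest-ranked result not yet presented
        team[doc] = side
        picks[side] += 1
        presented.append(doc)
    return presented, team


def winner(team, clicked_docs):
    """The ranking whose contributed results received more clicks wins."""
    a = sum(1 for d in clicked_docs if team.get(d) == "A")
    b = sum(1 for d in clicked_docs if team.get(d) == "B")
    if a == b:
        return "tie"
    return "A" if a > b else "B"


presented, team = team_draft_interleave(["a", "b", "c"], ["b", "d", "a"])
print(presented, winner(team, presented[:1]))
```

    Each user session then yields one noisy pairwise outcome (A wins, B wins, or tie), which is the kind of feedback a dueling mechanism consumes.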

    26/68

    Deployment on Yahoo! Search Engine: Comparing Two Ranking Functions

    Interleaving is more sensitive and more reliable

    [Plot: Disagreement Probability vs. Queries, for Interleaving and Absolute Metrics]

    [Chapelle, Joachims, Radlinski, Yue, TOIS 2012]

    27/68

    Interleave A vs B

              Left wins   Right wins
    A vs B    0           1
    A vs C    0           0
    B vs C    0           0

    28/68

    Interleave A vs C

              Left wins   Right wins
    A vs B    0           1
    A vs C    0           1
    B vs C    0           0

    29/68

    Interleave B vs C

              Left wins   Right wins
    A vs B    0           1
    A vs C    0           1
    B vs C    0           1

    30/68

    Interleave A vs C

              Left wins   Right wins
    A vs B    0           1
    A vs C    1           1
    B vs C    0           1

    Goal: Maximize total User Utility

    Exploit: run B (interleave B with itself)

    Explore: interleave A vs C

    Best: A (interleave A with itself)

    How to interact optimally?

    Dueling Bandits Problem

    31/68

    Example Pairwise Preferences

         A      B      C      D      E      F
    A    0      0.03   0.04   0.06   0.10   0.11
    B   -0.03   0      0.03   0.05   0.0?   0.11
    C   -0.04  -0.03   0      0.04   0.07   0.0?
    D   -0.06  -0.05  -0.04   0      0.05   0.07
    E   -0.10  -0.0?  -0.07  -0.05   0      0.03
    F   -0.11  -0.11  -0.0?  -0.07  -0.03   0

    Values are Pr(row > col) - 0.5

    Utility function may not exist

    How to define regret?

    32/68

    Example Pairwise Preferences

    (Same table as on slide 31/68, with row A highlighted.)

    Values are Pr(row > col) - 0.5

    Utility function may not exist

    How to define regret?

    33/68

    Dueling Bandits Problem (with Josef Broder, Robert Kleinberg and Thorsten Joachims)

    K bandits $b_1, \dots, b_K$

    Each iteration: compare (duel) two bandits

    Observe (noisy) outcome

    Requires Dueling Mechanism

    Cost function (regret):

    $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$

    $(b_t, b_t')$ are the two bandits chosen

    $b^*$ is the overall best one

    (How much human user preferred $b^*$ over chosen bandits)

    [Yue, Broder, Kleinberg, Joachims, COLT 2009]
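
    Because the preference tables in this talk store Pr(row > col) - 0.5, each term of the regret sum reduces to the sum of two table entries in the row of the best bandit. The sketch below shows this on a small made-up 3-bandit table; the values and the function name are assumptions for illustration only.

```python
def dueling_regret(eps, best, duels):
    """Sum over duels of P(b* > b_t) + P(b* > b_t') - 1.

    eps[i][j] holds P(b_i > b_j) - 0.5, so each term equals
    eps[best][b_t] + eps[best][b_t'].
    """
    return sum(eps[best][i] + eps[best][j] for i, j in duels)


# Hypothetical preference table; bandit 0 is the overall best one.
eps = [
    [0.0, 0.1, 0.2],
    [-0.1, 0.0, 0.1],
    [-0.2, -0.1, 0.0],
]

print(dueling_regret(eps, 0, [(0, 0)] * 5))      # best bandit vs itself
print(dueling_regret(eps, 0, [(1, 2), (0, 1)]))  # two suboptimal duels
```

    Dueling the best bandit against itself contributes nothing, which is why the exploit step in this setting is to play the best bandit vs itself.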

    34/68 to 36/68

    Dueling Bandits Problem

    Values are Pr(row > col) - 0.5

    $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$

    37/68

    Modeling Assumptions

    $P(b_i > b_j) = 1/2 + \epsilon_{ij}$ (distinguishability)

    Strong Stochastic Transitivity

    For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$

    Monotonicity property

    Stochastic Triangle Inequality

    For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$

    Diminishing returns property

    Satisfied by many standard models, e.g., Logistic / Bradley-Terry
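
    Both assumptions can be checked mechanically on any preference table. The sketch below builds a table from a Bradley-Terry model with made-up utilities and tests every ordered triple; the helper names, the utilities, and the small counterexample table are assumptions for illustration.

```python
import itertools


def bradley_terry_eps(utilities):
    """eps[i][j] = P(b_i > b_j) - 1/2 with P(b_i > b_j) = u_i / (u_i + u_j)."""
    return [[ui / (ui + uj) - 0.5 for uj in utilities] for ui in utilities]


def satisfies_sst(eps, order, tol=1e-12):
    """Strong stochastic transitivity: eps_ik >= max(eps_ij, eps_jk)."""
    return all(
        eps[i][k] >= max(eps[i][j], eps[j][k]) - tol
        for i, j, k in itertools.combinations(order, 3)
    )


def satisfies_sti(eps, order, tol=1e-12):
    """Stochastic triangle inequality: eps_ik <= eps_ij + eps_jk."""
    return all(
        eps[i][k] <= eps[i][j] + eps[j][k] + tol
        for i, j, k in itertools.combinations(order, 3)
    )


order = [0, 1, 2]  # bandit indices listed from best to worst
bt = bradley_terry_eps([4.0, 2.0, 1.0])  # hypothetical utilities
print(satisfies_sst(bt, order), satisfies_sti(bt, order))

# A hand-made table that breaks monotonicity: eps_02 is smaller than eps_01.
bad = [[0.0, 0.1, 0.05], [-0.1, 0.0, 0.1], [-0.05, -0.1, 0.0]]
print(satisfies_sst(bad, order), satisfies_sti(bad, order))
```

    `itertools.combinations` preserves the order of its input, so every triple (i, j, k) respects the best-to-worst ordering required by both definitions.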

    38/68

    Strong Stochastic Transitivity

         A      B      C      D      E      F
    A    0      0.03   0.04   0.06   0.10   0.11
    B   -0.03   0      0.03   0.05   0.0?   0.11
    C   -0.04  -0.03   0      0.04   0.07   0.0?
    D   -0.06  -0.05  -0.04   0      0.05   0.07
    E   -0.10  -0.0?  -0.07  -0.05   0      0.0?

    Values are Pr(row > col) - 0.5

    Monotonic across each row and down each column

    $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$

    39/68 to 40/68

    Stochastic Triangle Inequality

    (Same table as on slide 31/68, with three entries highlighted in red, blue, and green.)

    Values are Pr(row > col) - 0.5

    Red ≤ Blue + Green

    $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$

    41/68

    Aside: Confidence Intervals

    True preference

    Current Estimate

    Hoeffding's Inequality:

    Desired Error Tolerance

    42/68

    Example

    t=100    t=400    t=1600
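
    The formula on the confidence-interval slide did not survive extraction. As a stand-in, the sketch below uses the textbook two-sided Hoeffding bound $P(|\hat{p} - p| \ge c) \le 2\exp(-2nc^2)$ for a win rate estimated from n duels; the constants used in the talk may differ, and the tolerance 0.05 is an arbitrary assumption.

```python
import math


def hoeffding_radius(n, delta):
    """Half-width c such that P(|p_hat - p| >= c) <= delta after n duels."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def comparisons_needed(c, delta):
    """Smallest n for which the half-width is at most c."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * c * c))


for n in (100, 400, 1600):
    print(n, round(hoeffding_radius(n, 0.05), 4))

print(comparisons_needed(0.1, 0.05))
```

    Quadrupling the number of duels halves the interval, which is why separating two bandits with a small preference gap takes on the order of one over the squared gap comparisons.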

    43/68

    Explore-then-Exploit

    Decompose into 2 Phases

    Explore Phase: Identify the best bandit w.h.p.

    Minimize incurred regret

    Exploit Phase: Play best bandit vs itself

    Incurs no regret

    44/68

    Connection to Tournaments

    Each pair "duels" until statistical significance

    Aka Noisy Tournament

    Guarantees finding best bandit w.h.p.

    45/68

    Tournament is Bad

    Each pair "duels" until statistical significance

    Problem: two equally bad bandits

    Analogy: Hypothetical Soccer Tournament

    A team wins when it has a 3-goal lead

    Audience prefers good teams play (regret)

    Two (nearly) equally bad teams will play for a long time

    46/68

    Other Explore Strategies

    Tournament: E.g., tennis [Feige et al. 1994]

    Champion: E.g., boxing [Yue, Broder, Kleinberg, Joachims 2009]

    Swiss: E.g., group rounds in World Cup [Yue, Joachims 2011]

    47/68

    Champion (Interleaved Filter)

    Champion duels each challenger (round robin)

    Until statistical significance

    [Yue, Broder, Kleinberg, Joachims, COLT 2009]
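
    A simplified sketch of the champion idea follows. It is not the published Interleaved Filter algorithm: it omits the pruning step that the paper applies when the champion changes, and the confidence radius $\sqrt{\log(1/\delta)/n}$ with $\delta = 1/(TK^2)$ is an assumption of this sketch. `duel(i, j)` is a hypothetical callback that returns True when bandit i wins the comparison.

```python
import math
import random


def interleaved_filter(duel, num_bandits, horizon, rng=None):
    """Return (final champion, number of duels used)."""
    rng = rng or random.Random()
    delta = 1.0 / (horizon * num_bandits ** 2)
    champion = rng.randrange(num_bandits)
    challengers = [b for b in range(num_bandits) if b != champion]
    wins = {b: 0 for b in challengers}   # champion's wins against b
    plays = {b: 0 for b in challengers}
    used = 0
    while challengers and used < horizon:
        # One round-robin pass: the champion duels every remaining challenger.
        for b in challengers:
            if used >= horizon:
                break
            plays[b] += 1
            wins[b] += 1 if duel(champion, b) else 0
            used += 1
        new_champion = None
        for b in list(challengers):
            if plays[b] == 0:
                continue
            p_hat = wins[b] / plays[b]
            radius = math.sqrt(math.log(1.0 / delta) / plays[b])
            if p_hat - radius > 0.5:
                challengers.remove(b)   # champion beats b with confidence
            elif p_hat + radius < 0.5:
                new_champion = b        # b beats the champion with confidence
                break
        if new_champion is not None:
            champion = new_champion
            challengers = [b for b in challengers if b != champion]
            wins = {b: 0 for b in challengers}
            plays = {b: 0 for b in challengers}
    return champion, used


# Noise-free toy duel: the bandit with the lower index always wins.
champion, used = interleaved_filter(lambda i, j: i < j, 4, 10000)
print(champion, used)
```

    Once the loop ends, the remaining rounds up to the horizon would be spent playing the champion vs itself, which is the exploit phase.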

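    48/68

    Champion is Good

    Regret per champion bounded

    Quickly replaced if bad

    ($b_c$: current champion, $b_i$: challenger, $b_1$: best bandit)

    Comparisons until elimination: $O\left(\min\left\{\frac{1}{\epsilon_{ic}^2}, \frac{1}{\epsilon_{1i}^2}\right\} \log T\right)$

    Regret per comparison: $\epsilon_{1i} + \epsilon_{1c}$

    Regret of challenge/champion pair: $O\left(\frac{1}{\epsilon_{1i}} \log T\right)$

    Leverage Transitivity & Triangle Inequality

    Regret per Champion: $O\left(\frac{K}{\epsilon} \log T\right)$

    K: Remaining Bandits; $\epsilon$: Margin between best bandit and rest

    [Yue, Broder, Kleinberg, Joachims, COLT 2009]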

    49/68 to 51/68

    Champion is Good

    Regret per champion bounded

    Quickly replaced if bad

    Sequence of champions as a random walk

    Log rounds to arrive at best

    [Diagram: bandits ordered by quality, arrow labeled "Better"] One of these will become next champion

    Regret per Champion: $O\left(\frac{K}{\epsilon} \log T\right)$

    K: Remaining Bandits; $\epsilon$: Margin between best bandit and rest

    [Yue, Broder, Kleinberg, Joachims, COLT 2009]

    52/68

    Champion is Good

    Regret per champion bounded

    Quickly replaced if bad

    Sequence of champions as a random walk

    Log rounds to arrive at best

    Regret per Champion: $O\left(\frac{K}{\epsilon} \log T\right)$

    K: Remaining Bandits; $\epsilon$: Margin between best bandit and rest

    Regret: Optimal Regret Guarantee!

    $E[R_T] = O\left(\frac{K}{\epsilon} \log T\right)$

    K: Bandits; $\epsilon$: Margin between best bandit and others; T: Time Horizon

    [Yue, Broder, Kleinberg, Joachims, COLT 2009]

    53/68

    Swiss (Beat the Mean)

    Each iteration: Duel random pair

    Eliminate bandit w/ worst record

    Record = win rate vs "mean" bandit

    Remove duels with eliminated bandit from all remaining records (only a fraction of a record)

    Related to action elimination algorithms [Even-Dar et al. 2006]

    Regret until next removal: $O\left(\frac{1}{\epsilon} \log T\right)$

    [Diagram labels: Best, Dueling Mechanism]

    [Yue, Joachims, ICML 2011]
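
    The steps on this slide can be sketched as follows. This is a simplified illustration, not the published Beat the Mean algorithm: the confidence radius, the choice $\delta = 1/(2TK)$, and the rule of always dueling the least-compared bandit are assumptions of the sketch, and `duel(i, j)` is a hypothetical callback returning True when bandit i wins.

```python
import math
import random


def beat_the_mean(duel, num_bandits, horizon, rng=None):
    """Return the bandit left standing (or the current leader at the horizon)."""
    rng = rng or random.Random()
    delta = 1.0 / (2.0 * horizon * num_bandits)
    active = list(range(num_bandits))
    # wins[b][o] / plays[b][o]: record of b in duels where o was its opponent.
    wins = [[0] * num_bandits for _ in range(num_bandits)]
    plays = [[0] * num_bandits for _ in range(num_bandits)]

    def count(b):
        # Only duels against still-active opponents count toward the record.
        return sum(plays[b][o] for o in active)

    def win_rate(b):
        n = count(b)
        return sum(wins[b][o] for o in active) / n if n else 0.5

    for _ in range(horizon):
        if len(active) == 1:
            break
        b = min(active, key=count)   # bandit with the fewest comparisons
        o = rng.choice(active)       # random opponent, i.e. the "mean" bandit
        plays[b][o] += 1
        wins[b][o] += 1 if duel(b, o) else 0
        n_min = min(count(x) for x in active)
        if n_min == 0:
            continue
        radius = math.sqrt(math.log(1.0 / delta) / n_min)
        worst = min(active, key=win_rate)
        best = max(active, key=win_rate)
        if win_rate(worst) + radius <= win_rate(best) - radius:
            # Removing the bandit also drops its duels from every record,
            # because count() and win_rate() only sum over active opponents.
            active.remove(worst)
    return max(active, key=win_rate)


# Noise-free toy duel: the bandit with the lower index always wins.
print(beat_the_mean(lambda i, j: i < j, 3, 20000))
```

    Only the record of the first bandit in each duel is updated, so every record estimates that bandit's win rate against a uniformly random active opponent.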

    54/68

    Swiss is Better

    Champion has high variance

    Depends on initial champion

    Swiss offers low-variance alternative

    Successively eliminate worst bandit

    Regret: Optimal Regret w/ High Probability!

    $R_T = O\left(\frac{K}{\epsilon} \log T\right)$

    [Yue, Joachims, ICML 2011]

    55/68

    [Plot: Regret vs. Bandits]

    Tournament performs poorly ([…] worse; in some experiments >10000x worse)

    Swiss has lower variance than […]

    [Yue, Joachims, ICML 2011]

    56/68

    General Algorithmic Structure (most existing DB algorithms follow this structure)

    First bandit is a pivot (anchor)

    Interleaved Filter: Champion

    Beat the Mean: Uniform (all remaining bandits)

    Second bandit explores relative to pivot

    Interleaved Filter: Round Robin

    Beat the Mean: Randomized Round Robin

    Update Pivot Periodically

    This structure makes it easier to analyze regret

    E.g., spend more time exploring good pivots

    57/68

    More Recent Results

    Relaxing Strong Transitivity: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$

    Beyond Condorcet Winner: Borda Winner, von Neumann Winner, Copeland Winner

    Anytime algorithms: Original work required knowing T a priori

    More sophisticated dueling mechanisms, including off-policy evaluation

    Adversarial Setting

    Contextual Setting

    58/68

    Ongoing Work: Dependent Arms (with Yanan Sui, Vincent Zhuang and Joel Burdick)

    Suppose K is very large (possibly infinite)

    But arms have dependency structure

    E.g., $P(a > b) \approx P(a > b')$ if b similar to b'

    Measure similarity using kernel

    Initial results: Asymptotically optimal but impractical algorithm

    Degrees of freedom of kernel/manifold.

    59/68

    Personalized Clinical Treatment (with Yanan Sui, Vincent Zhuang and Joel Burdick)

    SCI Patient

    Medtronic human array (4mm, 10mm)

    Image source: williamcapicottomd.com

    Each patient is unique

    […] possible configurations!

    60/68

    Centralized vs Distributed

    Most DB algorithms are centralized

    Single algorithm controls choice of both bandits

    What about distributed algorithms?

    Two algorithms each controlling one bandit: Sparring! [Ailon, Karnin and Joachims, ICML 2014]

    Each algorithm plays a standard MAB algorithm

    Recently Analyzed [Dudik, Schapire and Slivkins …]

    61/68 to 64/68

    Dueling Bandits = Zero-Sum Game

    Player 1

    Player 2

         A      B      C      D      E      F
    A    0      0.03   0.04   0.06   0.10   0.11
    B   -0.03   0      0.03   0.05   0.0?   0.11
    C   -0.04  -0.03   0      0.04   0.07   0.0?
    D   -0.06  -0.05  -0.04   0      0.05   0.07
    E   -0.10  -0.0?  -0.07  -0.05   0      0.03
    F   -0.11  -0.11  -0.0?  -0.07  -0.03   0

    Values are Pr(row > col) - 0.5

    Basic Setting: Single Dominant Strategy

    Regret = Opportunity Cost to Social Welfare

    65/68 to 66/68

    Ongoing Work: Learning in Games (with Sid Barman and Katrina Ligett)

    Centralized algorithms = "coordination"

    Settings that benefit from minimal coordination

    Low Rank Matrix: Two algorithms coordinate on exploration

    Small initial phase

    67/68

    Summary: Dueling Bandits Problem

    Elicits preference feedback

    Motivated by human-centric personalization

    Characterizes explore/exploit tradeoff

    Connections to noisy tournaments

    Connections to learning in games

    References

    68/68

    The K-armed Dueling Bandits Problem, by Yisong Yue, Josef Broder, Robert Kleinberg and Thorsten Joachims, COLT 2009

    Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, by Yisong Yue and Thorsten Joachims, ICML 2009

    Beat the Mean Bandit, by Yisong Yue and Thorsten Joachims, ICML 2011

    Reusing Historical Interaction Data for Faster Online Learning to Rank for IR, by Katja Hofmann, Anne Schuth, Shimon Whiteson and Maarten de Rijke, WSDM 2013

    Generic Exploration and K-armed Voting Bandits, by Tanguy Urvoy, Fabrice Clerot, Raphael Feraud and Sami Naamane, ICML 2013

    Reducing Dueling Bandits to