Beat the Mean Bandit

Post on 19-Feb-2016


Beat the Mean Bandit

ICML 2011

Yisong Yue Carnegie Mellon University

Joint work with Thorsten Joachims (Cornell University)

Optimizing Information Retrieval Systems

• Increasingly reliant on user feedback
  – E.g., clicks on search results

• Online learning is a popular modeling tool
  – Especially partial-information (bandit) settings

• Our focus: learning from relative preferences
  – Motivated by recent work on interleaved retrieval evaluation (example following)

Team Draft Interleaving (Comparison Oracle for Search)

Ranking A
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Valley Wineries - Plan your wine...
   www.napavalley.com/wineries
3. Napa Valley College
   www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley
   www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine
   www.napavintners.com
6. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley

Ranking B
1. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging...
   www.napavalley.com
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
5. NapaValley.org
   www.napavalley.org
6. The Napa Valley Marathon
   www.napavalleymarathon.org

Presented Ranking
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine...
   www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
6. Napa Valley College
   www.napavalley.edu/homex.asp
7. NapaValley.org
   www.napavalley.org

[Radlinski et al. 2008]

(Same rankings as above, now with two user clicks on the presented ranking.)

Click, Click: B wins!
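The interleaving step above can be sketched in code. This is a minimal illustration of team-draft interleaving as described by Radlinski et al. 2008, not the authors' implementation: the function names, the shortened result identifiers, and the tie handling in `interleaving_winner` are assumptions made for the example.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Interleave two rankings; return (results, team credited per result)."""
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    results, teams = [], []
    while len(results) < length:
        # The team with fewer picks drafts next; a coin flip breaks ties.
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            # Draft the team's highest-ranked result that is not shown yet.
            choice = next((r for r in sources[team] if r not in results), None)
            if choice is not None:
                results.append(choice)
                teams.append(team)
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return results, teams


def interleaving_winner(teams, clicked_positions):
    """Credit each click to the team that drafted the clicked result."""
    a = sum(1 for pos in clicked_positions if teams[pos] == "A")
    b = sum(1 for pos in clicked_positions if teams[pos] == "B")
    return "A" if a > b else "B" if b > a else "tie"


# Hypothetical, shortened identifiers standing in for the slide's results.
ranking_a = ["napavalley.com", "napavalley.com/wineries", "napavalley.edu"]
ranking_b = ["wikipedia", "napavalley.com", "books.google"]
shown, teams = team_draft_interleave(ranking_a, ranking_b, 4, random.Random(0))
print(list(zip(shown, teams)))
```

Each duel between two retrieval functions then reduces to one call to `team_draft_interleave` followed by `interleaving_winner` on the observed click positions.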

Team Draft Interleaving (Comparison Oracle for Search)

Running tally of interleaving outcomes (row = winner, column = loser).

Interleave A vs B (A wins):

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0       1             0
B wins vs…    0   0   0       0             1
C wins vs…    0   0   0       0             0

Interleave A vs C (C wins):

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0       1             1
B wins vs…    0   0   0       0             1
C wins vs…    1   0   0       1             0

Interleave B vs C (B wins):

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0       1             1
B wins vs…    0   0   1       1             1
C wins vs…    1   0   0       1             1

Interleave A vs B (B wins):

              A   B   C   Total wins   Total losses
A wins vs…    0   1   0       1             2
B wins vs…    1   0   1       2             1
C wins vs…    1   0   0       1             1
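The running tally above can be reproduced with a few lines of code. This is a small sketch; the `tally` helper is hypothetical, and the four duel outcomes are inferred from how the total wins and total losses change between the tables.

```python
def tally(bandits, duels):
    """Build a win matrix plus per-bandit win/loss totals from (winner, loser) pairs."""
    wins = {w: {l: 0 for l in bandits} for w in bandits}
    for winner, loser in duels:
        wins[winner][loser] += 1
    total_wins = {b: sum(wins[b].values()) for b in bandits}
    total_losses = {b: sum(wins[o][b] for o in bandits) for b in bandits}
    return wins, total_wins, total_losses


# Outcomes inferred from the totals: A beats B, C beats A, B beats C, B beats A.
duels = [("A", "B"), ("C", "A"), ("B", "C"), ("B", "A")]
wins, total_wins, total_losses = tally("ABC", duels)
print(total_wins, total_losses)
```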

Outline

• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]

• Modeling transitivity violation
  – E.g., (A >> B) AND (B >> C) IMPLIES (A >> C) ??
  – Not done in previous work

• Algorithm: Beat-the-Mean

• Empirical Validation

Dueling Bandits Problem

• Given K bandits b1, …, bK

• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions

• Cost function (regret):

  $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$

  – (bt, bt’) are the two bandits chosen
  – b* is the overall best one
  – (% users who prefer best bandit over chosen ones)

[Yue et al. 2009]

Example Pairwise Preferences

        A       B       C       D       E       F
A       0      0.05    0.05    0.04    0.11    0.11
B     -0.05     0      0.05    0.06    0.08    0.10
C     -0.05   -0.05     0      0.04    0.01    0.06
D     -0.04   -0.04   -0.04     0      0.04    0.00
E     -0.11   -0.08   -0.01   -0.04     0      0.01
F     -0.11   -0.10   -0.06   -0.00   -0.01     0

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org

Example regret computations (same table as above), using $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$ with b* = A:

Compare E & F:
• P(A > E) = 0.61
• P(A > F) = 0.61
• Incurred Regret = 0.22

Compare B & C:
• P(A > B) = 0.55
• P(A > C) = 0.55
• Incurred Regret = 0.10

Compare A & A:
• P(A > A) = 0.50
• P(A > A) = 0.50
• Incurred Regret = 0.00
• Interleaving shows ranking produced by A.
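The three worked examples can be checked numerically. The sketch below assumes, as the examples do, that A is the best bandit, and uses only the first row of the preference table; the names `EPS_BEST`, `step_regret`, and `total_regret` are made up for illustration.

```python
# First row of the slide's table: Pr(A > col) - 0.5.
EPS_BEST = {"A": 0.00, "B": 0.05, "C": 0.05, "D": 0.04, "E": 0.11, "F": 0.11}


def step_regret(first, second, eps_best=EPS_BEST):
    """Regret of one duel: P(b* > first) + P(b* > second) - 1."""
    return (0.5 + eps_best[first]) + (0.5 + eps_best[second]) - 1.0


def total_regret(duels, eps_best=EPS_BEST):
    """Cumulative regret R_T over a sequence of duels."""
    return sum(step_regret(x, y, eps_best) for x, y in duels)


for pair in [("E", "F"), ("B", "C"), ("A", "A")]:
    print(pair, round(step_regret(*pair), 2))
```

Dueling the best bandit against itself is the only choice with zero regret, which is why the algorithm should converge to it.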

Violation in internal consistency! (same table as above)

For strong stochastic transitivity:
• A > D should be at least 0.06 (observed: 0.04)
• C > E should be at least 0.04 (observed: 0.01)
• D > F should be at least 0.04 (observed: 0.00)

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org
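The three violations listed above can be found mechanically. The following sketch assumes the bandits are ordered A, B, …, F as in the table and checks, for every triplet i > j > k, whether Pr(i > k) – 0.5 is at least the larger of the two intermediate values. Only the upper triangle of the table is used; the dictionary layout and function name are illustrative choices.

```python
from itertools import combinations

NAMES = "ABCDEF"
# Upper triangle of the slide's table: EPS[(i, j)] = Pr(i > j) - 0.5.
EPS = {
    ("A", "B"): 0.05, ("A", "C"): 0.05, ("A", "D"): 0.04, ("A", "E"): 0.11, ("A", "F"): 0.11,
    ("B", "C"): 0.05, ("B", "D"): 0.06, ("B", "E"): 0.08, ("B", "F"): 0.10,
    ("C", "D"): 0.04, ("C", "E"): 0.01, ("C", "F"): 0.06,
    ("D", "E"): 0.04, ("D", "F"): 0.00,
    ("E", "F"): 0.01,
}


def sst_violations(eps=EPS, names=NAMES):
    """Map each violating pair (i, k) to the value strong transitivity requires."""
    required = {}
    for i, j, k in combinations(names, 3):
        need = max(eps[(i, j)], eps[(j, k)])
        if eps[(i, k)] < need:
            required[(i, k)] = max(need, required.get((i, k), 0.0))
    return required


print(sst_violations())
```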

Modeling Assumptions

• P(bi > bj) = ½ + εij

• Let b1 be the best overall bandit

• Relaxed Stochastic Transitivity
  – For three bandits b1 > bj > bk: $\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$
  – γ ≥ 1 (γ = 1 for strong transitivity **)
  – Relaxed internal consistency property

• Stochastic Triangle Inequality
  – For three bandits b1 > bj > bk: $\epsilon_{1k} \le \epsilon_{1j} + \epsilon_{jk}$
  – Diminishing returns property

(** γ = 1 required in previous work, and required to apply for all bandit triplets)

Example Pairwise Preferences (same table as above)

• γ = 1.5 for this data: the smallest γ satisfying $\gamma\,\epsilon_{1k} \ge \max\{\epsilon_{1j}, \epsilon_{jk}\}$ for every triplet b1 > bj > bk
• Binding triplet: A > B > D, since 1.5 × 0.04 = 0.06

• Values are Pr(row > col) – 0.5
• Derived from interleaving experiments on http://arXiv.org
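The value γ = 1.5 can be recovered from the table. This sketch assumes A is the best bandit b1 and that the remaining bandits are ordered B, …, F as in the table; it takes the largest ratio max(ε1j, εjk) / ε1k over all triplets that start with A. The helper name and data layout are illustrative.

```python
from itertools import combinations

# Upper triangle of the slide's table: EPS[(i, j)] = Pr(i > j) - 0.5.
EPS = {
    ("A", "B"): 0.05, ("A", "C"): 0.05, ("A", "D"): 0.04, ("A", "E"): 0.11, ("A", "F"): 0.11,
    ("B", "C"): 0.05, ("B", "D"): 0.06, ("B", "E"): 0.08, ("B", "F"): 0.10,
    ("C", "D"): 0.04, ("C", "E"): 0.01, ("C", "F"): 0.06,
    ("D", "E"): 0.04, ("D", "F"): 0.00,
    ("E", "F"): 0.01,
}


def relaxation_gamma(eps, names, best):
    """Smallest gamma >= 1 with gamma * eps[best, k] >= max(eps[best, j], eps[j, k])."""
    gamma = 1.0
    others = [n for n in names if n != best]
    for j, k in combinations(others, 2):
        need = max(eps[(best, j)], eps[(j, k)])
        gamma = max(gamma, need / eps[(best, k)])
    return gamma


print(round(relaxation_gamma(EPS, "ABCDEF", "A"), 2))
```

Note that the relaxed condition only involves triplets containing the best bandit, so the C > E and D > F violations do not affect γ.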

Beat-the-Mean

Initial state. Cells are wins/total against each opponent; the Mean column is mean/total comparisons.

     A      B      C      D      E      F      Mean     Lower Bound   Upper Bound
A    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00
B    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00
C    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00
D    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00
E    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00
F    0/0    0/0    0/0    0/0    0/0    0/0    --/0        0.00          1.00

• Columns A–F: Comparison Results
• Columns Mean, Lower Bound, Upper Bound: Mean Score & Confidence Interval
• Row A: A’s performance vs rest
• Row A, Mean column: A’s mean performance

Beat-the-Mean

First six comparisons, one per bandit (only the row of the bandit being evaluated is updated):
• A vs B: A wins
• B vs E: B loses
• C vs F: C wins
• D vs C: D loses
• E vs A: E loses
• F vs C: F loses

     A      B      C      D      E      F      Mean     Lower Bound   Upper Bound
A    0/0    1/1    0/0    0/0    0/0    0/0    1.00/1      0.00          1.00
B    0/0    0/0    0/0    0/0    0/1    0/0    0.00/1      0.00          1.00
C    0/0    0/0    0/0    0/0    0/0    1/1    1.00/1      0.00          1.00
D    0/0    0/0    0/1    0/0    0/0    0/0    0.00/1      0.00          1.00
E    0/1    0/0    0/0    0/0    0/0    0/0    0.00/1      0.00          1.00
F    0/0    0/0    0/1    0/0    0/0    0/0    0.00/1      0.00          1.00
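The bookkeeping behind these tables is simple enough to sketch. The helpers below (`new_stats`, `record`, `mean_score`) are hypothetical names; the sketch replays the six comparisons shown above and updates only the row of the bandit being evaluated, which is how the slide's tables behave.

```python
def new_stats(bandits):
    """One [wins, total] cell per (bandit, opponent) pair."""
    return {b: {o: [0, 0] for o in bandits} for b in bandits}


def record(stats, bandit, opponent, won):
    """Record one comparison in the evaluated bandit's row only."""
    cell = stats[bandit][opponent]
    cell[0] += int(won)
    cell[1] += 1


def mean_score(stats, bandit):
    """Fraction of comparisons won against all opponents (None if no data)."""
    wins = sum(w for w, _ in stats[bandit].values())
    total = sum(t for _, t in stats[bandit].values())
    return wins / total if total else None


stats = new_stats("ABCDEF")
first_six = [("A", "B", True), ("B", "E", False), ("C", "F", True),
             ("D", "C", False), ("E", "A", False), ("F", "C", False)]
for bandit, opponent, won in first_six:
    record(stats, bandit, opponent, won)
print({b: mean_score(stats, b) for b in "ABCDEF"})
```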

Beat-the-Mean

After 150 comparisons per bandit:

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    13/25    16/24    11/22    16/28    20/30    13/21    0.59/150       0.49          0.69
B    14/30    15/30    13/19    15/20    17/26    20/25    0.63/150       0.53          0.73
C    12/28    10/22    13/23    15/28    20/24    13/25    0.55/150       0.45          0.65
D     9/20    15/28    10/21    11/23    15/28    15/30    0.50/150       0.40          0.60
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    11/29     4/25    10/18    12/25    14/30    13/23    0.43/150       0.33          0.53

B dominates E! (B’s lower bound greater than E’s upper bound)
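The dominance test is a comparison of interval endpoints. As a sketch, the function below takes the (lower, upper) bounds read off the 150-comparison table and reports the pair where the highest lower bound exceeds the lowest upper bound; `find_dominated` and the `BOUNDS` layout are illustrative, and how the bounds themselves are derived is left out here.

```python
# (lower bound, upper bound) per bandit, from the 150-comparison table.
BOUNDS = {
    "A": (0.49, 0.69), "B": (0.53, 0.73), "C": (0.45, 0.65),
    "D": (0.40, 0.60), "E": (0.32, 0.52), "F": (0.33, 0.53),
}


def find_dominated(bounds):
    """Return (dominating, dominated) if some interval lies wholly above another."""
    best = max(bounds, key=lambda b: bounds[b][0])   # highest lower bound
    worst = min(bounds, key=lambda b: bounds[b][1])  # lowest upper bound
    if bounds[best][0] > bounds[worst][1]:
        return best, worst
    return None


print(find_dominated(BOUNDS))
```

F is not yet dominated at this point: B's lower bound (0.53) equals F's upper bound (0.53) rather than exceeding it.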

Beat-the-Mean

After removing E (comparisons against E no longer count toward the means of the remaining bandits):

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    13/25    16/24    11/22    16/28    20/30    13/21    0.58/120       0.49          0.67
B    14/30    15/30    13/19    15/20    15/26    20/25    0.62/124       0.51          0.73
C    12/28    10/22    13/23    15/28    20/24    13/25    0.50/126       0.39          0.61
D     9/20    15/28    10/21    11/23    15/28    15/30    0.49/122       0.38          0.60
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    11/29     4/25    10/18    12/25    14/30    13/23    0.42/120       0.31          0.53

Next comparison: A vs B, A wins. Row A becomes 17/25 against B, with Mean 0.58/121 and bounds 0.49 to 0.67; all other rows are unchanged.
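The drop in totals from 150 to 120–126 follows from discarding every comparison against the removed bandit. The sketch below recomputes the means from the 150-comparison table with column E excluded; `ROWS` and `mean_over` are illustrative names, and the cell values are those of the table shown before E was removed.

```python
# (wins, total) per opponent, from the 150-comparison table.
ROWS = {
    "A": {"A": (13, 25), "B": (16, 24), "C": (11, 22), "D": (16, 28), "E": (20, 30), "F": (13, 21)},
    "B": {"A": (14, 30), "B": (15, 30), "C": (13, 19), "D": (15, 20), "E": (17, 26), "F": (20, 25)},
    "C": {"A": (12, 28), "B": (10, 22), "C": (13, 23), "D": (15, 28), "E": (20, 24), "F": (13, 25)},
    "D": {"A": (9, 20), "B": (15, 28), "C": (10, 21), "D": (11, 23), "E": (15, 28), "F": (15, 30)},
    "F": {"A": (11, 29), "B": (4, 25), "C": (10, 18), "D": (12, 25), "E": (14, 30), "F": (13, 23)},
}


def mean_over(row, active):
    """Mean score and comparison count restricted to active opponents."""
    wins = sum(row[o][0] for o in active)
    total = sum(row[o][1] for o in active)
    return wins / total, total


active = [b for b in "ABCDEF" if b != "E"]  # E has been removed
recomputed = {b: mean_over(ROWS[b], active) for b in ROWS}
for b, (mean, total) in recomputed.items():
    print(b, f"{mean:.3f}", total)
```

Removing the weakest bandit lowers every remaining mean, because each survivor loses its easiest opponent; the estimates stay comparable because all of them are measured against the same set of active bandits.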

Beat-the-Mean

After further comparisons (145 per active bandit, not counting comparisons against E):

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    15/30    19/29    14/28    18/33    23/30    15/25    0.56/145       0.46          0.66
B    15/33    17/34    15/24    20/27    15/26    23/27    0.62/145       0.52          0.72
C    13/31    11/28    14/29    15/30    20/24    16/27    0.48/145       0.38          0.68
D    11/26    17/31    12/26    14/29    15/28    17/33    0.49/145       0.39          0.59
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    12/32     7/30    13/26    13/28    14/30    15/29    0.41/145       0.31          0.51

B dominates F! (B’s lower bound greater than F’s upper bound)

Beat-the-Mean

After removing F:

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    15/30    19/29    14/28    18/33    23/30    15/25    0.55/120       0.43          0.67
B    15/33    17/34    15/24    20/27    15/26    23/27    0.56/118       0.44          0.68
C    13/31    11/28    14/29    15/30    20/24    16/27    0.45/118       0.33          0.57
D    11/26    17/31    12/26    14/29    15/28    17/33    0.48/112       0.36          0.60
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    12/32     7/30    13/26    13/28    14/30    15/29    0.41/145       0.31          0.51

Beat-the-Mean

After further comparisons (300 per active bandit):

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    41/80    44/75    38/70    42/75    23/30    15/25    0.55/300       0.48          0.62
B    31/69    38/78    47/78    51/75    15/26    23/27    0.56/300       0.49          0.63
C    33/77    31/77    35/70    39/76    20/24    16/27    0.46/300       0.49          0.53
D    30/76    27/77    35/74    35/73    15/28    17/33    0.42/300       0.35          0.49
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    12/32     7/30    13/26    13/28    14/30    15/29    0.41/145       0.31          0.51

B dominates D! (B’s lower bound greater than D’s upper bound)

Beat-the-Mean

After removing D:

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    41/80    44/75    38/70    42/75    23/30    15/25    0.55/225       0.46          0.64
B    31/69    38/78    47/78    51/75    15/26    23/27    0.52/225       0.43          0.61
C    33/77    31/77    35/70    39/76    20/24    16/27    0.33/225       0.24          0.42
D    30/76    27/77    35/74    35/73    15/28    17/33    0.42/300       0.35          0.49
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    12/32     7/30    13/26    13/28    14/30    15/29    0.41/145       0.31          0.51

A dominates C! (A’s lower bound greater than C’s upper bound)

Beat-the-Mean

After removing C:

     A        B        C        D        E        F        Mean        Lower Bound   Upper Bound
A    41/80    44/75    38/70    42/75    23/30    15/25    0.51/80        0.38          0.64
B    31/69    38/78    47/78    51/75    15/26    23/27    0.52/147       0.45          0.49
C    33/77    31/77    35/70    39/76    20/24    16/27    0.33/225       0.24          0.42
D    30/76    27/77    35/74    35/73    15/28    17/33    0.42/300       0.35          0.49
E     8/24    11/25     6/22    14/29    14/31    10/19    0.42/150       0.32          0.52
F    12/32     7/30    13/26    13/28    14/30    15/29    0.41/145       0.31          0.51

Eventually… A is last bandit remaining. A is declared best bandit!
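The walkthrough above can be condensed into a short simulation. This is a simplified sketch of the elimination loop, not the authors' algorithm as published: the confidence radius `sqrt(log(1/delta) / n_min)` is a Hoeffding-style stand-in assumed for illustration, ties are broken by list order, and the function name, parameters, and the three-bandit preference matrix are all hypothetical.

```python
import math
import random


def beat_the_mean(p, horizon, delta, rng):
    """p[i][j] = probability that bandit i beats bandit j. Returns the surviving bandit."""
    k = len(p)
    active = list(range(k))
    wins = [[0] * k for _ in range(k)]
    plays = [[0] * k for _ in range(k)]

    def count(b):
        # Comparisons against removed bandits are ignored.
        return sum(plays[b][o] for o in active)

    def mean(b):
        total = count(b)
        return sum(wins[b][o] for o in active) / total if total else 0.5

    for _ in range(horizon):
        if len(active) == 1:
            break
        b = min(active, key=count)   # bandit with the fewest comparisons
        o = rng.choice(active)       # random active opponent: the "mean bandit"
        plays[b][o] += 1
        if rng.random() < p[b][o]:
            wins[b][o] += 1
        n_min = min(count(x) for x in active)
        if n_min == 0:
            continue
        # Assumed confidence radius (simplified stand-in, see lead-in).
        radius = math.sqrt(math.log(1.0 / delta) / n_min)
        best = max(active, key=mean)
        worst = min(active, key=mean)
        if mean(best) - radius > mean(worst) + radius:
            active.remove(worst)     # worst bandit is dominated: eliminate it
    return max(active, key=mean)


# Hypothetical preferences: bandit 0 beats the others 90% of the time.
P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.7],
     [0.1, 0.3, 0.5]]
print(beat_the_mean(P, horizon=20000, delta=0.01, rng=random.Random(0)))
```

Because each bandit is scored against the same pool of active opponents, one mean per bandit suffices, which is what keeps the number of estimates linear in the number of bandits.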

Regret Guarantee

• Playing against mean bandit calibrates preference scores
  – Estimates of (active) bandits directly comparable
  – One estimate per active bandit = linear number of estimates

• We can bound comparisons needed to remove worst bandit
  – Varies smoothly with transitivity parameter γ
  – High probability bound

• We can bound the regret incurred by each comparison
  – Varies smoothly with transitivity parameter γ

• Thus, we can bound the total regret with high probability:

  $R_T = O\left(\frac{\gamma^7 K}{\epsilon} \log T\right)$

  – γ is typically close to 1

We also have a similar PAC guarantee.

Not possible with previous approaches!

Simulation experiment where γ = 1.3
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean maintains linear regret guarantee
• Interleaved Filter suffers quadratic regret in the worst case

Simulation experiment where γ = 1 (original dueling bandits setting)
• Light = Beat-the-Mean
• Dark = Interleaved Filter [Yue et al. 2009]
• Beat-the-Mean has high probability bound
• Beat-the-Mean exhibits significantly lower variance

Conclusions

• Online learning approach using pairwise feedback
  – Well-suited for optimizing information retrieval systems from user feedback
  – Models violations in preference transitivity

• Algorithm: Beat-the-Mean
  – Regret linear in #bandits and logarithmic in #iterations
  – Degrades smoothly with transitivity violation
  – Stronger guarantees than previous work
  – Empirically supported