Beat the Mean Bandit

ICML 2011

Yisong Yue Carnegie Mellon University

Joint work with Thorsten Joachims (Cornell University)

Optimizing Information Retrieval Systems

• Increasingly reliant on user feedback
  – E.g., clicks on search results

• Online learning is a popular modeling tool
  – Especially partial-information (bandit) settings

• Our focus: learning from relative preferences
  – Motivated by recent work on interleaved retrieval evaluation (example following)

Team Draft Interleaving (Comparison Oracle for Search)

Ranking A
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Valley Wineries – Plan your wine...
   www.napavalley.com/wineries
3. Napa Valley College
   www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley
   www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine
   www.napavintners.com
6. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley

Ranking B
1. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging...
   www.napavalley.com
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
5. NapaValley.org
   www.napavalley.org
6. The Napa Valley Marathon
   www.napavalleymarathon.org

Presented Ranking
1. Napa Valley – The authority for lodging...
   www.napavalley.com
2. Napa Country, California – Wikipedia
   en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden...
   books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine...
   www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast...
   www.napalinks.com
6. Napa Valley College
   www.napavalley.edu/homex.asp
7. NapaValley.org
   www.napavalley.org

B wins!

[Radlinski et al. 2008]
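The interleaving example above can be sketched in code. This is a minimal sketch of team-draft interleaving in the spirit of [Radlinski et al. 2008], not the authors' implementation: the function names, the shortened document IDs, and the simulated clicks are assumptions made for illustration. The two rankings take turns drafting their highest-ranked result not yet shown, a coin flip decides who drafts first when both have drafted equally often, and each click is credited to the ranking that drafted the clicked result.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Interleave two rankings; return (presented list, doc -> team map)."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    presented, team_of = [], {}
    while len(presented) < length:
        # The team with fewer picks drafts next; a coin flip breaks ties.
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            # Draft the team's highest-ranked result not yet presented.
            doc = next((d for d in rankings[team] if d not in team_of), None)
            if doc is not None:
                presented.append(doc)
                team_of[doc] = team
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return presented, team_of


def interleaving_winner(team_of, clicked_docs):
    """Credit each click to the team that drafted the clicked result."""
    clicks = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            clicks[team_of[doc]] += 1
    if clicks["A"] == clicks["B"]:
        return "tie"
    return "A" if clicks["A"] > clicks["B"] else "B"


# Shortened, hypothetical document IDs loosely based on the slide's rankings.
ranking_a = ["napavalley.com", "napavalley.com/wineries", "napavalley.edu",
             "ivebeenthere.co.uk", "napavintners.com", "en.wikipedia.org"]
ranking_b = ["en.wikipedia.org", "napavalley.com", "books.google.co.uk",
             "napalinks.com", "napavalley.org", "napavalleymarathon.org"]

presented, team_of = team_draft_interleave(
    ranking_a, ranking_b, length=7, rng=random.Random(0))

# Simulated clicks (an assumption): the user clicks two results drafted by B.
clicks = [d for d in presented if team_of[d] == "B"][:2]
print(presented)
print(interleaving_winner(team_of, clicks))
```

Because the draft order depends on coin flips, the presented ranking varies between runs unless the random generator is seeded. As one design choice of this sketch, a ranking that has run out of results lets the other ranking keep drafting.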

Team Draft Interleaving (Comparison Oracle for Search)

             A  B  C  Total wins  Total losses
A wins vs…   0  1  0  1           0
B wins vs…   0  0  0  0           1
C wins vs…   0  0  0  0           0

Interleave A vs B

Interleave A vs C

             A  B  C  Total wins  Total losses
A wins vs…   0  1  0  1           1
B wins vs…   0  0  0  0           1
C wins vs…   1  0  0  1           0

Interleave B vs C

Interleave A vs B
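The win/loss bookkeeping in the tables above can be sketched as a small tally structure. This is an illustrative sketch only: the class name `PairwiseTally` and its methods are hypothetical, and the two recorded outcomes (A beats B, then C beats A) are read off the tables rather than taken from any real experiment.

```python
class PairwiseTally:
    """Counts how often each retrieval function wins an interleaving test."""

    def __init__(self, names):
        self.names = list(names)
        # wins[x][y] = number of comparisons in which x beat y
        self.wins = {a: {b: 0 for b in self.names} for a in self.names}

    def record(self, winner, loser):
        self.wins[winner][loser] += 1

    def total_wins(self, name):
        return sum(self.wins[name].values())

    def total_losses(self, name):
        return sum(self.wins[other][name] for other in self.names)


tally = PairwiseTally(["A", "B", "C"])
tally.record("A", "B")  # interleave A vs B: A wins
tally.record("C", "A")  # interleave A vs C: C wins

for name in tally.names:
    row = [tally.wins[name][other] for other in tally.names]
    print(name, row, tally.total_wins(name), tally.total_losses(name))
```

After these two comparisons the counts agree with the second table: A has one win and one loss, B has one loss, and C has one win.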

Outline

• Learning Formulation
  – Dueling Bandits Problem [Yue et al. 2009]

• Modeling transitivity violation
  – E.g., (A >> B) AND (B >> C) IMPLIES (A >> C) ??
  – Not done in previous work

• Algorithm: Beat-the-Mean

• Empirical Validation

Dueling Bandits Problem

• Given K bandits b1, …, bK

• Each iteration: compare (duel) two bandits
  – E.g., interleaving two retrieval functions

• Cost function (regret):

  \[ R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right] \]

  – (b_t, b_t') are the two bandits chosen
  – b* is the overall best one
  – (% users who prefer best bandit over chosen ones)

[Yue et al. 2009]
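As a worked illustration of the cost function, the sketch below assumes the regret sums P(b* > b_t) + P(b* > b_t') − 1 over all iterations, so that dueling the best bandit against itself costs nothing. The function name `dueling_regret`, the three-bandit preference matrix, and the sequence of duels are all made-up values for illustration, not data from the talk.

```python
def dueling_regret(pref, best, duels):
    """Cumulative regret of a sequence of duels.

    pref[i][j] is the probability that bandit i beats bandit j,
    best is the index of b*, and duels is a list of (b_t, b_t') pairs.
    """
    return sum(pref[best][b] + pref[best][b2] - 1.0 for b, b2 in duels)


# Hypothetical preference matrix for three bandits; bandit 0 is the best.
pref = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]

duels = [(0, 0), (0, 1), (1, 2)]
regret = dueling_regret(pref, best=0, duels=duels)
print(round(regret, 3))
```

Under these assumed values, the duel (0, 0) contributes nothing, (0, 1) contributes 0.5 + 0.6 − 1 = 0.1, and (1, 2) contributes 0.6 + 0.7 − 1 = 0.3, so the total comes to about 0.4.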

Example Pairwise Preferences

A B C D E F