UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic...

19
UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits Fang Liu 1 , Sinong Wang 1 , Swapna Buccapatnam 2 and Ness Shroff 1 1 The Ohio State University 2 AT&T Labs Research

Transcript of UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic...

Page 1: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

Fang Liu1, Sinong Wang1, Swapna Buccapatnam2 and Ness Shroff1

1The Ohio State University2AT&T Labs Research

Page 2: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Background and MotivationsqMulti-Armed Bandits FrameworkqStochastic Bandits At a GlanceqComplexity vs Optimality DilemmaqOur results Overview

• UCBoost AlgorithmqGeneric UCB AlgorithmqUCBoost(D) AlgorithmqUCBoost( ) Algorithm

• Numerical ResultsqExperiment SettingqRegret ResultsqComputation Results

• Conclusion

Outline

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

Page 3: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Repeated game between an agent and an environment

Multi-Armed Bandits Framework

reward

agentenvironment

action

Reinforcement learning

x

Online learning

MAB

xStochastic

Page 4: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Model§ At each (discrete) time t, the agent plays action At from a set of K actions§ The agent receives reward , drawn from unknown distribution At

Stochastic Bandits At a Glance

YAt,t<latexit sha1_base64="fK3DLAWfZrBJ4YWuVe+8yrb6qWg=">AAAB73icbVBNS8NAEN34WetX1aOXxSJ4kJKIoN6qXjxWMLalDWGz3bRLdzdhdyKU0F/hxYOKV/+ON/+N2zYHbX0w8Hhvhpl5USq4Adf9dpaWV1bX1ksb5c2t7Z3dyt7+o0kyTZlPE5HoVkQME1wxHzgI1ko1IzISrBkNbyd+84lpwxP1AKOUBZL0FY85JWCldjvMr0M4hXFYqbo1dwq8SLyCVFGBRlj56vYSmkmmgApiTMdzUwhyooFTwcblbmZYSuiQ9FnHUkUkM0E+PXiMj63Sw3GibSnAU/X3RE6kMSMZ2U5JYGDmvYn4n9fJIL4Mcq7SDJiis0VxJjAkePI97nHNKIiRJYRqbm/FdEA0oWAzKtsQvPmXF4l/Vruquffn1fpNkUYJHaIjdII8dIHq6A41kI8okugZvaI3RzsvzrvzMWtdcoqZA/QHzucP89iQCw==</latexit><latexit sha1_base64="fK3DLAWfZrBJ4YWuVe+8yrb6qWg=">AAAB73icbVBNS8NAEN34WetX1aOXxSJ4kJKIoN6qXjxWMLalDWGz3bRLdzdhdyKU0F/hxYOKV/+ON/+N2zYHbX0w8Hhvhpl5USq4Adf9dpaWV1bX1ksb5c2t7Z3dyt7+o0kyTZlPE5HoVkQME1wxHzgI1ko1IzISrBkNbyd+84lpwxP1AKOUBZL0FY85JWCldjvMr0M4hXFYqbo1dwq8SLyCVFGBRlj56vYSmkmmgApiTMdzUwhyooFTwcblbmZYSuiQ9FnHUkUkM0E+PXiMj63Sw3GibSnAU/X3RE6kMSMZ2U5JYGDmvYn4n9fJIL4Mcq7SDJiis0VxJjAkePI97nHNKIiRJYRqbm/FdEA0oWAzKtsQvPmXF4l/Vruquffn1fpNkUYJHaIjdII8dIHq6A41kI8okugZvaI3RzsvzrvzMWtdcoqZA/QHzucP89iQCw==</latexit><latexit sha1_base64="fK3DLAWfZrBJ4YWuVe+8yrb6qWg=">AAAB73icbVBNS8NAEN34WetX1aOXxSJ4kJKIoN6qXjxWMLalDWGz3bRLdzdhdyKU0F/hxYOKV/+ON/+N2zYHbX0w8Hhvhpl5USq4Adf9dpaWV1bX1ksb5c2t7Z3dyt7+o0kyTZlPE5HoVkQME1wxHzgI1ko1IzISrBkNbyd+84lpwxP1AKOUBZL0FY85JWCldjvMr0M4hXFYqbo1dwq8SLyCVFGBRlj56vYSmkmmgApiTMdzUwhyooFTwcblbmZYSuiQ9FnHUkUkM0E+PXiMj63Sw3GibSnAU/X3RE6kMSMZ2U5JYGDmvYn4n9fJIL4Mcq7SDJiis0VxJjAkePI97nHNKIiRJYRqbm/FdEA0oWAzKtsQvPmXF4l/Vruquffn1fpNkUYJHaIjdII8dIHq6A41kI8okugZvaI3RzsvzrvzMWtdcoqZA/QHzucP89iQCw==</latexit>

• Regret lower bounds§ Problem-dependent: where is expected reward

• Performance measure§ Regret(loss)

§ Minimize regret = maximize total reward

• Popular algorithms§ Upper Confidence Bounds (UCB), Thompson Sampling, epsilon-greedy

R(T ) = E"max

i2[K]

TX

t=1

Yi,t �TX

t=1

YAt,t

#

<latexit sha1_base64="VZwEYdZ9FuCainyckm02XT8hLWM=">AAACPHicbZBLS8QwFIVT346vUZdugoOgoEMrgroQfCAIblRmfNDWkmbSmWCaluRWHEr/mBv/gzt3blyouHVtZpyFjl4InHz3XpJzwlRwDbb9ZA0MDg2PjI6NlyYmp6ZnyrNz5zrJFGV1mohEXYZEM8ElqwMHwS5TxUgcCnYR3hx0+he3TGmeyBq0U+bHpCl5xCkBg4Jy7Wy5trLjxQRaYZgfFp5gEbjmfhfk3OMSu8d+4eksDnLYcYrr2pXhq1Cs9bG9AAz1FG+2wA/KFbtqdwv/FU5PVFCvToLyo9dIaBYzCVQQrV3HTsHPiQJOBStKXqZZSugNaTLXSElipv28677AS4Y0cJQocyTgLv25kZNY63YcmsmOTd3f68D/em4G0Zafc5lmwCT9fijKBIYEd6LEDa4YBdE2glDFzV8xbRFFKJjASyYEp9/yX1Ffr25X7dONyu5+L40xtIAW0TJy0CbaRUfoBNURRffoGb2iN+vBerHerY/v0QGrtzOPfpX1+QUmG690</latexit><latexit sha1_base64="VZwEYdZ9FuCainyckm02XT8hLWM=">AAACPHicbZBLS8QwFIVT346vUZdugoOgoEMrgroQfCAIblRmfNDWkmbSmWCaluRWHEr/mBv/gzt3blyouHVtZpyFjl4InHz3XpJzwlRwDbb9ZA0MDg2PjI6NlyYmp6ZnyrNz5zrJFGV1mohEXYZEM8ElqwMHwS5TxUgcCnYR3hx0+he3TGmeyBq0U+bHpCl5xCkBg4Jy7Wy5trLjxQRaYZgfFp5gEbjmfhfk3OMSu8d+4eksDnLYcYrr2pXhq1Cs9bG9AAz1FG+2wA/KFbtqdwv/FU5PVFCvToLyo9dIaBYzCVQQrV3HTsHPiQJOBStKXqZZSugNaTLXSElipv28677AS4Y0cJQocyTgLv25kZNY63YcmsmOTd3f68D/em4G0Zafc5lmwCT9fijKBIYEd6LEDa4YBdE2glDFzV8xbRFFKJjASyYEp9/yX1Ffr25X7dONyu5+L40xtIAW0TJy0CbaRUfoBNURRffoGb2iN+vBerHerY/v0QGrtzOPfpX1+QUmG690</latexit><latexit sha1_base64="VZwEYdZ9FuCainyckm02XT8hLWM=">AAACPHicbZBLS8QwFIVT346vUZdugoOgoEMrgroQfCAIblRmfNDWkmbSmWCaluRWHEr/mBv/gzt3blyouHVtZpyFjl4InHz3XpJzwlRwDbb9ZA0MDg2PjI6NlyYmp6ZnyrNz5zrJFGV1mohEXYZEM8ElqwMHwS5TxUgcCnYR3hx0+he3TGmeyBq0U+bHpCl5xCkBg4Jy7Wy5trLjxQRaYZgfFp5gEbjmfhfk3OMSu8d+4eksDnLYcYrr2pXhq1Cs9bG9AAz1FG+2wA/KFbtqdwv/FU5PVFCvToLyo9dIaBYzCVQQrV3HTsHPiQJOBStKXqZZSugNaTLXSElipv28677AS4Y0cJQocyTgLv25kZNY63YcmsmOTd3f68D/em4G0Zafc5lmwCT9fijKBIYEd6LEDa4YBdE2glDFzV8xbRFFKJjASyYEp9/yX1Ffr25X7dONyu5+L40xtIAW0TJy0CbaRUfoBNURRffoGb2iN+vBerHerY/v0QGrtzOPfpX1+QUmG690</latexit>

X

i

µ⇤ � µi

KL(µa, µ⇤)

log T

!

<latexit sha1_base64="BuR+vJ1wgr8ILupSCqF3XjLOl6o=">AAACLHicbZDfShtBFMZn/VM12hrtpTeDQYjShk0pqHdBeyEoqJBUIROX2cnZzZCZ3WXmrBCWfSFvfJVS8MIWb30OJzEXNvbADD++7xxmzhdmSlr0/Udvbn5h8cPS8kplde3jp/XqxuZPm+ZGQEekKjXXIbegZAIdlKjgOjPAdajgKhwej/2rWzBWpkkbRxn0NI8TGUnB0UlB9Qc71xBzpiDCOrO5DgpZsshwUTCd3+x9dXcgy+L0rD4m/mWi7pZMpTFtMyPjAe4G1Zrf8CdF30NzCjUyrYug+pv1U5FrSFAobm236WfYK7hBKRSUFZZbyLgY8hi6DhOuwfaKybYl3XFKn0apcSdBOlHfThRcWzvSoevUHAd21huL//O6OUYHvUImWY6QiNeHolxRTOk4OtqXBgSqkQMujHR/pWLAXVToAq64EJqzK7+HzrfGYcO//F5rHU3TWCZbZJvUSZPskxY5IRekQwS5I7/II/nj3XsP3l/v6bV1zpvOfCb/lPf8AoXCqHk=</latexit><latexit sha1_base64="BuR+vJ1wgr8ILupSCqF3XjLOl6o=">AAACLHicbZDfShtBFMZn/VM12hrtpTeDQYjShk0pqHdBeyEoqJBUIROX2cnZzZCZ3WXmrBCWfSFvfJVS8MIWb30OJzEXNvbADD++7xxmzhdmSlr0/Udvbn5h8cPS8kplde3jp/XqxuZPm+ZGQEekKjXXIbegZAIdlKjgOjPAdajgKhwej/2rWzBWpkkbRxn0NI8TGUnB0UlB9Qc71xBzpiDCOrO5DgpZsshwUTCd3+x9dXcgy+L0rD4m/mWi7pZMpTFtMyPjAe4G1Zrf8CdF30NzCjUyrYug+pv1U5FrSFAobm236WfYK7hBKRSUFZZbyLgY8hi6DhOuwfaKybYl3XFKn0apcSdBOlHfThRcWzvSoevUHAd21huL//O6OUYHvUImWY6QiNeHolxRTOk4OtqXBgSqkQMujHR/pWLAXVToAq64EJqzK7+HzrfGYcO//F5rHU3TWCZbZJvUSZPskxY5IRekQwS5I7/II/nj3XsP3l/v6bV1zpvOfCb/lPf8AoXCqHk=</latexit><latexit sha1_base64="BuR+vJ1wgr8ILupSCqF3XjLOl6o=">AAACLHicbZDfShtBFMZn/VM12hrtpTeDQYjShk0pqHdBeyEoqJBUIROX2cnZzZCZ3WXmrBCWfSFvfJVS8MIWb30OJzEXNvbADD++7xxmzhdmSlr0/Udvbn5h8cPS8kplde3jp/XqxuZPm+ZGQEekKjXXIbegZAIdlKjgOjPAdajgKhwej/2rWzBWpkkbRxn0NI8TGUnB0UlB9Qc71xBzpiDCOrO5DgpZsshwUTCd3+x9dXcgy+L0rD4m/mWi7pZMpTFtMyPjAe4G1Zrf8CdF30NzCjUyrYug+pv1U5FrSFAobm236WfYK7hBKRSUFZZbyLgY8hi6DhOuwfaKybYl3XFKn0apcSdBOlHfThRcWzvSoevUHAd21huL//O6OUYHvUImWY6QiNeHolxRTOk4OtqXBgSqkQMujHR/pWLAXVToAq64EJqzK7+HzrfGYcO//F5rHU3TWCZbZJvUSZPskxY5IRekQwS5I7/II/nj3XsP3l/v6bV1zpvOfCb/lPf8AoXCqHk=</latexit>

µi<latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit><latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit><latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit>

Page 5: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Computational Complexity matters§ Real-time applications: robotic control, portfolio optimization§ Large-scale applications: recommendation systems, meta-algorithm for learning

• Optimal algorithms involve heavy computation L§ kl-UCB, DMED: optimization problems§ Thompson Sampling: posterior updating

• Simple algorithms are far from being optimal L§ UCB1: gap of Pinsker’s inequality is unbounded§ Epsilon-greedy: same gap as UCB1, requires one more prior knowledge

Complexity vs Optimality Dilemma

Can we design an algorithm that can trade-off complexity and optimality?

Page 6: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• UCBoost algorithms§ Ensemble a set of “weak” but closed-form

UCB-type algorithms§ Propose two solutions: a finite set and an

infinite set for any epsilon§ First to offer trade-off between complexity

and optimality with guarantees

Our Results Overview

Table 1: Regret guarantee and computational complexity per arm per round of various algorithms

kl-UCB UCBoost(✏) UCBoost(D) UCB1

Regret/log(T ) O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)

◆O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)�✏

◆O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)�1/e

◆O

✓Pa

µ⇤�µa

2(µ⇤�µa)2

Complexity unbounded O(log(1/✏)) O(1) O(1)

low-complexity optimal UCB algorithm, which has remainedopen till now. In this work, we make the following contri-butions to this open problem. (Table 1 summarizes the mainresults.)

• We propose a generic UCB algorithm. By plugging asemi-distance function, one can obtain a specific UCBalgorithm with regret guarantee (Theorem 1). As a by-product, we propose two new UCB algorithms that arealternatives to UCB1 (Corollary 1 and 2).

• We propose a boosting algorithm, UCBoost, which canobtain a strong (i.e., with regret guarantee close to thelower bound) UCB algorithm from a set of weak (i.e.,with regret guarantee far away from the lower bound)generic UCB algorithms (Theorem 2). By boosting afinite number of weak generic UCB algorithms, we finda UCBoost algorithm that enjoys the same complexityas UCB1 as well as a regret guarantee that is 1/e-closeto the kl-UCB algorithm (Corollary 3)1. That is to say,such a UCBoost algorithm is low-complexity and near-optimal under the Bernoulli case.

• We propose an approximation-based UCBoost algo-rithm, UCBoost(✏), that enjoys ✏-optimal regret guaran-tee under the Bernoulli case and O(log(1/✏)) computa-tional complexity for each arm at each round for any✏ > 0 (Theorem 3). This algorithm provides a non-trivial trade-off between complexity and optimality.

Related Work. There are other asymptotically optimalalgorithms, such as Thompson Sampling [Agrawal andGoyal, 2012], Bayes-UCB [Kaufmann et al., 2012] andDMED [Honda and Takemura, 2010]. However, the com-putations involved in these algorithms become non-trivialin non-Bernoulli cases. First, Bayesian methods, includingThompson Sampling, Information Directed Sampling [Russoand Van Roy, 2014; Liu et al., 2017] and Bayes-UCB, requireupdating and sampling from the posterior distribution, whichis computationally difficult for models other than exponen-tial families [Korda et al., 2013]. Second, the computationalcomplexity of DMED policy is larger than UCB policies be-cause the computation involved in DMED is formulated asa univariate convex optimization problem. In contrast, ouralgorithms are computationally efficient in general boundedsupport models and don’t need the knowledge of prior infor-mation on the distributions of the arms.

Our work is also related to DMED-M proposed by Hondaand Takemura [2012]. DMED-M uses the first d empiricalmoments to construct a lower bound of the objective func-tion involved in DMED. As d goes to infinity, the lowerbound converges to the objective function and DMED-M

1Note that e is the natural number

converges to DMED while the computational complexity in-creases. However, DMED-M has no explicit form whend > 4 and there is no guarantee on the regret gap to the op-timality for any finite d. Unlike DEMD-M, our UCBoost al-gorithms can provide guarantees on the complexity and regretperformance for arbitrary ✏, which offers a controlled tradeoffbetween complexity and optimality.

Agarwal et al. [2017] proposed a boosting technique to ob-tain a strong bandit algorithm from the existing algorithms,that is adaptive to the environment. However, our boostingtechnique is specifically designed for stochastic setting andhence allows us to obtain near-optimal algorithms that havebetter regret gurantees than those obtained using the boostingtechnique by Agarwal et al. [2017].

2 PreliminariesWe consider a stochastic bandit problem with finitely manyarms indexed by a 2 K , {1, . . . ,K}, where K is a positiveinteger. Each arm a is associated with an unknown probabil-ity distribution va over the bounded support2 ⇥ = [0, 1]. Ateach time step t = 1, 2, . . . , the agent chooses an action At

according to past observations (possibly using some indepen-dent randomization) and receives a reward XAt,NAt (t)

inde-pendently drawn from the distribution vAt , where Na(t) ,Pt

s=1

{As = a} denotes the number of times that arm awas chosen up to time t. Note that the agent can only observethe reward XAt,NAt (t)

at time t. Let ¯Xa(t) be the empiricalmean of arm a based on the observations up to time t.

For each arm a, we denote by µa the expectation of itsassociated probability distribution va. Let a⇤ be any optimalarm, that is a⇤ 2 argmaxa2K µa. We write µ⇤ as a shorthandnotation for the largest expectation µa⇤ and denote the gap ofthe expected reward of arm a to µ⇤ as �a = µ⇤ � µa. Theperformance of a policy ⇡ is evaluated through the standardnotion of expected regret, defined at time horizon T as

R⇡(T ) =

X

a2K�aE[Na(T )]. (1)

The goal of the agent is to minimize the expected regret.Now, we introduce the concept of semi-distance functions,

which measure the distance between two expectations of ran-dom variables over ⇥, and show several related properties.Definition 1. (Candidate semi-distance) A function d : ⇥ ⇥⇥ ! R is said to be a candidate semi-distance function if

1. d(p, p) 0, 8p 2 ⇥;

2. d(p, q) d(p, q0), 8p q q0 2 ⇥;

3. d(p, q) � d(p0, q), 8p p0 q 2 ⇥.

2If the supports are bounded in another interval, rescale to [0,1].

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic BanditsFang Liu1, Sinong Wang1, Swapna Buccapatnam2 and Ness Shroff1

1The Ohio State University2AT&T Labs Research

What is Stochastic Bandit?

reward

action

• Repeated game between agent and environment with random rewards

Complexity vs Optimality

• UCBoost connect the dots smoothly

Gap to optimality

Com

plex

ity

kl-UCB

UCB1

UCBoost( )

UCBoost(D)

Boosting!

UCBoost

✏<latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit>

• UCB kernel is a distance function dP (d) : max

q2⇥q

s.t. d(p, q) �<latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit>

• UCBoost ensemble a set D of distance functions (i.e. UCBs) by taking the minimum.

• For each d in D, P(d) closed-form

Why Taking the Minimum?Philosophy of voting:• If the ordering is known, follow the

leader. No majority vote.• UCBoost takes the minimum, thus

the tightest UCB.

UCB1

decision

0.9

UCB2

0.8

UCB3

0.6

UCBoost

0.6

0.8 0.75 0.7 0.7

0.2 0.2 0.3 0.2

1 1 2 2

Geometric view of UCBoost:• The kernel of UCBoost is• Take the minimum = solve

max

d2Dd

<latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit>

P

✓max

d2Dd

<latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit>

1

d(p,

q)

pValue of q

d1<latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit>

d2<latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit>

dkl<latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit>

max(d1, d2)<latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit>

Numerical ResultsTable 1: Regret guarantee and computational complexity per arm per round of various algorithms

kl-UCB UCBoost(✏) UCBoost(D) UCB1

Regret/log(T ) O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)

◆O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)�✏

◆O

✓Pa

µ⇤�µa

dkl(µa,µ⇤)�1/e

◆O

✓Pa

µ⇤�µa

2(µ⇤�µa)2

Complexity unbounded O(log(1/✏)) O(1) O(1)

low-complexity optimal UCB algorithm, which has remainedopen till now. In this work, we make the following contri-butions to this open problem. (Table 1 summarizes the mainresults.)

• We propose a generic UCB algorithm. By plugging asemi-distance function, one can obtain a specific UCBalgorithm with regret guarantee (Theorem 1). As a by-product, we propose two new UCB algorithms that arealternatives to UCB1 (Corollary 1 and 2).

• We propose a boosting algorithm, UCBoost, which canobtain a strong (i.e., with regret guarantee close to thelower bound) UCB algorithm from a set of weak (i.e.,with regret guarantee far away from the lower bound)generic UCB algorithms (Theorem 2). By boosting afinite number of weak generic UCB algorithms, we finda UCBoost algorithm that enjoys the same complexityas UCB1 as well as a regret guarantee that is 1/e-closeto the kl-UCB algorithm (Corollary 3)1. That is to say,such a UCBoost algorithm is low-complexity and near-optimal under the Bernoulli case.

• We propose an approximation-based UCBoost algo-rithm, UCBoost(✏), that enjoys ✏-optimal regret guaran-tee under the Bernoulli case and O(log(1/✏)) computa-tional complexity for each arm at each round for any✏ > 0 (Theorem 3). This algorithm provides a non-trivial trade-off between complexity and optimality.

Related Work. There are other asymptotically optimalalgorithms, such as Thompson Sampling [Agrawal andGoyal, 2012], Bayes-UCB [Kaufmann et al., 2012] andDMED [Honda and Takemura, 2010]. However, the com-putations involved in these algorithms become non-trivialin non-Bernoulli cases. First, Bayesian methods, includingThompson Sampling, Information Directed Sampling [Russoand Van Roy, 2014; Liu et al., 2017] and Bayes-UCB, requireupdating and sampling from the posterior distribution, whichis computationally difficult for models other than exponen-tial families [Korda et al., 2013]. Second, the computationalcomplexity of DMED policy is larger than UCB policies be-cause the computation involved in DMED is formulated asa univariate convex optimization problem. In contrast, ouralgorithms are computationally efficient in general boundedsupport models and don’t need the knowledge of prior infor-mation on the distributions of the arms.

Our work is also related to DMED-M proposed by Hondaand Takemura [2012]. DMED-M uses the first d empiricalmoments to construct a lower bound of the objective func-tion involved in DMED. As d goes to infinity, the lowerbound converges to the objective function and DMED-M

1Note that e is the natural number

converges to DMED while the computational complexity in-creases. However, DMED-M has no explicit form whend > 4 and there is no guarantee on the regret gap to the op-timality for any finite d. Unlike DEMD-M, our UCBoost al-gorithms can provide guarantees on the complexity and regretperformance for arbitrary ✏, which offers a controlled tradeoffbetween complexity and optimality.

Agarwal et al. [2017] proposed a boosting technique to ob-tain a strong bandit algorithm from the existing algorithms,that is adaptive to the environment. However, our boostingtechnique is specifically designed for stochastic setting andhence allows us to obtain near-optimal algorithms that havebetter regret gurantees than those obtained using the boostingtechnique by Agarwal et al. [2017].

2 PreliminariesWe consider a stochastic bandit problem with finitely manyarms indexed by a 2 K , {1, . . . ,K}, where K is a positiveinteger. Each arm a is associated with an unknown probabil-ity distribution va over the bounded support2 ⇥ = [0, 1]. Ateach time step t = 1, 2, . . . , the agent chooses an action At

according to past observations (possibly using some indepen-dent randomization) and receives a reward XAt,NAt (t)

inde-pendently drawn from the distribution vAt , where Na(t) ,Pt

s=1

{As = a} denotes the number of times that arm awas chosen up to time t. Note that the agent can only observethe reward XAt,NAt (t)

at time t. Let ¯Xa(t) be the empiricalmean of arm a based on the observations up to time t.

For each arm a, we denote by µa the expectation of itsassociated probability distribution va. Let a⇤ be any optimalarm, that is a⇤ 2 argmaxa2K µa. We write µ⇤ as a shorthandnotation for the largest expectation µa⇤ and denote the gap ofthe expected reward of arm a to µ⇤ as �a = µ⇤ � µa. Theperformance of a policy ⇡ is evaluated through the standardnotion of expected regret, defined at time horizon T as

R⇡(T ) =

X

a2K�aE[Na(T )]. (1)

The goal of the agent is to minimize the expected regret.Now, we introduce the concept of semi-distance functions,

which measure the distance between two expectations of ran-dom variables over ⇥, and show several related properties.Definition 1. (Candidate semi-distance) A function d : ⇥ ⇥⇥ ! R is said to be a candidate semi-distance function if

1. d(p, p) 0, 8p 2 ⇥;

2. d(p, q) d(p, q0), 8p q q0 2 ⇥;

3. d(p, q) � d(p0, q), 8p p0 q 2 ⇥.

2If the supports are bounded in another interval, rescale to [0,1].

Time 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

20

40

60

80

100

120

140

160UCB(d

h)

UCB1UCB(d

bq)

UCBoost({dbq

,dh,d

lb})

kl-UCB

Time0 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

10

20

30

40

50

60

70

80

UCBoost({dbq

,dh,d

lb})

UCBoost(0.08)UCBoost(0.05)UCBoost(0.01)kl-UCB

• Theoretical bounds

• Bernoulli case

(a) Bernoulli scenario 1 (b) Bernoulli scenario 2 (c) Beta scenario

Figure 1: Regret of the various algorithms as a function of time in three scenarios.

Table 2: Average computational time for each arm per round of various algorithms.

Scenario kl-UCB UCBoost(✏) UCBoost(✏) UCBoost(✏) UCBoost({dbq, dh, dlb}) UCB1✏ = 0.01(0.001) ✏ = 0.05(0.005) ✏ = 0.08

Bernoulli 1 933µs 7.67µs 6.67µs 5.78µs 1.67µs 0.31µsBernoulli 2 986µs 8.76µs 7.96µs 6.27µs 1.60µs 0.30µs

Beta 907µs 8.33µs 6.89µs 5.89µs 2.01µs 0.33µs

because the Hellinger distance between µa and µ⇤ is muchlarger than the l

2

distance in this scenario. So UCB(dh) en-joys better regret performance than UCB1 in this scenario.

Second, UCBoost({dbq, dh, dlb}) performs as ex-pected and is between UCB1 and kl-UCB. Althoughthe gap between UCB1 and kl-UCB becomes largerwhen compared to Bernoulli scenario 1, the gap betweenUCBoost({dbq, dh, dlb}) and kl-UCB remains. This ver-ifies our result in Corollary 3 that the gap between theconstants in the regret guarantees is bounded by 1/e. Thisresult also demonstrates the power of boosting in thatUCBoost({dbq, dh, dlb}) performs no worse than UCB(dh)and UCB(dbq) in all cases.

Third, UCBoost(✏) algorithm fills the gap betweenUCBoost({dbq, dh, dlb}) and kl-UCB, which is consistentwith the results in Bernoulli scenario 1. The regret ofUCBoost(✏) matches with that of kl-UCB when ✏ = 0.001.Compared to the results in Bernoulli scenario 1, we needmore accurate approximation for UCBoost when the expecta-tions are lower. However, this accuracy is moderate comparedto the requirements in numerical methods for kl-UCB.Beta Scenario. Our results in the previous sections hold forany distributions with bounded support. In this scenario, weconsider K = 9 arms with Beta distributions. More precisely,each arm 1 i 9 is associated with Beta(↵i,�i) distribu-tion such that ↵i = i and �i = 2. Note that the expectationof Beta(↵i,�i) is ↵i/(↵i + �i). The regret results shown in

Figure 1c are consistent with that of Bernoulli scenario 1.Computational time. We obtain the average running time foreach arm per round by measuring the total computational timeof 10, 000 independent runs of each algorithms in each sce-nario. Note that kl-UCB is implemented by the py/maBanditspackage developed by Cappe et al. [2012], which sets accu-racy to 10

�5 for the Newton method. The average computa-tional time results are shown in Table 2. The average runningtime of UCBoost(✏) that matches the regret of kl-UCB is nomore than 1% of the time of kl-UCB.

5 ConclusionIn this work, we introduce the generic UCB algorithm andprovide the regret guarantee for any UCB algorithm gener-ated by a kl-dominated strong semi-distance function. Then,we propose a boosting framework, UCBoost, to boost any setof generic UCB algorithms. We find a specific finite set D,such that UCBoost(D) enjoys O(1) complexity for each armper round as well as regret guarantee that is 1/e-close to thekl-UCB algorithm. Finally, we propose an approximation-based UCBoost algorithm, UCBoost(✏), that enjoys regretguarantee ✏-close to that of kl-UCB as well as O(log(1/✏))complexity for each arm per round. This algorithm bridgesthe regret guarantee to the computational complexity, thus of-fering an efficient trade-off between regret performance andcomplexity for practitioners. By experiments, we show thatUCBoost(✏) can achieve the same regret performance as stan-

• Computation time

Page 7: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Background and MotivationsqMulti-Armed Bandits FrameworkqStochastic Bandits At a GlanceqComplexity vs Optimality DilemmaqOur results Overview

• UCBoost AlgorithmqGeneric UCB AlgorithmqUCBoost(D) AlgorithmqUCBoost( ) Algorithm

• Numerical ResultsqExperiment SettingqRegret ResultsqComputation Results

• Conclusion

Outline

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

Page 8: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Semi-distance function§ Between expectations of random variables over bounded support§ Non-negative, triangle inequality, not necessary to be symmetric § Strong semi-distance function satisfies

Generic UCB Algorithm

⇥<latexit sha1_base64="P/DLCgecp7lGFnWoXzzjIoFt95g=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rNG2hDWWz3bRrN5uwOxFK6X/w4kHFqz/Im//GbZuDtj4YeLw3w8y8MJXCoOt+O4W19Y3NreJ2aWd3b/+gfHjUNEmmGfdZIhPdDqnhUijuo0DJ26nmNA4lb4Wju5nfeuLaiEQ1cJzyIKYDJSLBKFqp2W0MOdJeueJW3TnIKvFyUoEc9V75q9tPWBZzhUxSYzqem2IwoRoFk3xa6maGp5SN6IB3LFU05iaYzK+dkjOr9EmUaFsKyVz9PTGhsTHjOLSdMcWhWfZm4n9eJ8PoOpgIlWbIFVssijJJMCGz10lfaM5Qji2hTAt7K2FDqilDG1DJhuAtv7xK/IvqTdV9uKzUbvM0inACp3AOHlxBDe6hDj4weIRneIU3J3FenHfnY9FacPKZY/gD5/MH3x2O2w==</latexit><latexit sha1_base64="P/DLCgecp7lGFnWoXzzjIoFt95g=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rNG2hDWWz3bRrN5uwOxFK6X/w4kHFqz/Im//GbZuDtj4YeLw3w8y8MJXCoOt+O4W19Y3NreJ2aWd3b/+gfHjUNEmmGfdZIhPdDqnhUijuo0DJ26nmNA4lb4Wju5nfeuLaiEQ1cJzyIKYDJSLBKFqp2W0MOdJeueJW3TnIKvFyUoEc9V75q9tPWBZzhUxSYzqem2IwoRoFk3xa6maGp5SN6IB3LFU05iaYzK+dkjOr9EmUaFsKyVz9PTGhsTHjOLSdMcWhWfZm4n9eJ8PoOpgIlWbIFVssijJJMCGz10lfaM5Qji2hTAt7K2FDqilDG1DJhuAtv7xK/IvqTdV9uKzUbvM0inACp3AOHlxBDe6hDj4weIRneIU3J3FenHfnY9FacPKZY/gD5/MH3x2O2w==</latexit><latexit sha1_base64="P/DLCgecp7lGFnWoXzzjIoFt95g=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rNG2hDWWz3bRrN5uwOxFK6X/w4kHFqz/Im//GbZuDtj4YeLw3w8y8MJXCoOt+O4W19Y3NreJ2aWd3b/+gfHjUNEmmGfdZIhPdDqnhUijuo0DJ26nmNA4lb4Wju5nfeuLaiEQ1cJzyIKYDJSLBKFqp2W0MOdJeueJW3TnIKvFyUoEc9V75q9tPWBZzhUxSYzqem2IwoRoFk3xa6maGp5SN6IB3LFU05iaYzK+dkjOr9EmUaFsKyVz9PTGhsTHjOLSdMcWhWfZm4n9eJ8PoOpgIlWbIFVssijJJMCGz10lfaM5Qji2hTAt7K2FDqilDG1DJhuAtv7xK/IvqTdV9uKzUbvM0inACp3AOHlxBDe6hDj4weIRneIU3J3FenHfnY9FacPKZY/gD5/MH3x2O2w==</latexit>

d : ⇥⇥⇥ ! R<latexit sha1_base64="rcPp2Cat9rDtdpmvAuP+3UHQ870=">AAACDXicbVDLSsNAFJ3UV62vqEs3g6XgqiQi+FgV3bis0thCE8pkMm2HTh7M3Agl9Avc+CtuXKi4de/Ov3HSBtHWAwNnzr2Xe8/xE8EVWNaXUVpaXlldK69XNja3tnfM3b07FaeSMofGIpYdnygmeMQc4CBYJ5GMhL5gbX90ldfb90wqHkctGCfMC8kg4n1OCWipZ9aCC7c1ZECwCzxkCv/8YuyGBIa+n91OembVqltT4EViF6SKCjR75qcbxDQNWQRUEKW6tpWAlxEJnAo2qbipYgmhIzJgXU0jold72dTOBNe0EuB+LPWLAE/V3xMZCZUah77uzC9U87Vc/K/WTaF/5mU8SlJgEZ0t6qcCa695NjjgklEQY00IlVzfiumQSEJBJ1jRIdjzlheJc1w/r1s3J9XGZZFGGR2gQ3SEbHSKGugaNZGDKHpAT+gFvRqPxrPxZrzPWktGMbOP/sD4+AbC4puD</latexit><latexit sha1_base64="rcPp2Cat9rDtdpmvAuP+3UHQ870=">AAACDXicbVDLSsNAFJ3UV62vqEs3g6XgqiQi+FgV3bis0thCE8pkMm2HTh7M3Agl9Avc+CtuXKi4de/Ov3HSBtHWAwNnzr2Xe8/xE8EVWNaXUVpaXlldK69XNja3tnfM3b07FaeSMofGIpYdnygmeMQc4CBYJ5GMhL5gbX90ldfb90wqHkctGCfMC8kg4n1OCWipZ9aCC7c1ZECwCzxkCv/8YuyGBIa+n91OembVqltT4EViF6SKCjR75qcbxDQNWQRUEKW6tpWAlxEJnAo2qbipYgmhIzJgXU0jold72dTOBNe0EuB+LPWLAE/V3xMZCZUah77uzC9U87Vc/K/WTaF/5mU8SlJgEZ0t6qcCa695NjjgklEQY00IlVzfiumQSEJBJ1jRIdjzlheJc1w/r1s3J9XGZZFGGR2gQ3SEbHSKGugaNZGDKHpAT+gFvRqPxrPxZrzPWktGMbOP/sD4+AbC4puD</latexit><latexit sha1_base64="rcPp2Cat9rDtdpmvAuP+3UHQ870=">AAACDXicbVDLSsNAFJ3UV62vqEs3g6XgqiQi+FgV3bis0thCE8pkMm2HTh7M3Agl9Avc+CtuXKi4de/Ov3HSBtHWAwNnzr2Xe8/xE8EVWNaXUVpaXlldK69XNja3tnfM3b07FaeSMofGIpYdnygmeMQc4CBYJ5GMhL5gbX90ldfb90wqHkctGCfMC8kg4n1OCWipZ9aCC7c1ZECwCzxkCv/8YuyGBIa+n91OembVqltT4EViF6SKCjR75qcbxDQNWQRUEKW6tpWAlxEJnAo2qbipYgmhIzJgXU0jold72dTOBNe0EuB+LPWLAE/V3xMZCZUah77uzC9U87Vc/K/WTaF/5mU8SlJgEZ0t6qcCa695NjjgklEQY00IlVzfiumQSEJBJ1jRIdjzlheJc1w/r1s3J9XGZZFGGR2gQ3SEbHSKGugaNZGDKHpAT+gFvRqPxrPxZrzPWktGMbOP/sD4+AbC4puD</latexit>

d(p, q) = 0 i↵ p = q<latexit sha1_base64="MMzM0YA5IF5wGPxRu7zJV8hb5/Q=">AAACA3icbVA9SwNBEN2LXzF+nVqmWQxCBAkXEdQiELSxjOCZQHKEvb29ZMneR3bnxHCksPGv2Fio2Pon7Pw3bpIrNPHBwOO9GWbmubHgCizr28gtLa+sruXXCxubW9s75u7enYoSSZlNIxHJlksUEzxkNnAQrBVLRgJXsKY7uJr4zXsmFY/CWxjFzAlIL+Q+pwS01DWLXjk+Hh7hGrZwB9gDpJj7Ph7juDbsmiWrYk2BF0k1IyWUodE1vzpeRJOAhUAFUapdtWJwUiKBU8HGhU6iWEzogPRYW9OQBEw56fSJMT7Uiof9SOoKAU/V3xMpCZQaBa7uDAj01bw3Ef/z2gn4507KwzgBFtLZIj8RGCI8SQR7XDIKYqQJoZLrWzHtE0ko6NwKOoTq/MuLxD6pXFSsm9NS/TJLI4+K6ACVURWdoTq6Rg1kI4oe0TN6RW/Gk/FivBsfs9ackc3soz8wPn8A4gSV5w==</latexit><latexit sha1_base64="MMzM0YA5IF5wGPxRu7zJV8hb5/Q=">AAACA3icbVA9SwNBEN2LXzF+nVqmWQxCBAkXEdQiELSxjOCZQHKEvb29ZMneR3bnxHCksPGv2Fio2Pon7Pw3bpIrNPHBwOO9GWbmubHgCizr28gtLa+sruXXCxubW9s75u7enYoSSZlNIxHJlksUEzxkNnAQrBVLRgJXsKY7uJr4zXsmFY/CWxjFzAlIL+Q+pwS01DWLXjk+Hh7hGrZwB9gDpJj7Ph7juDbsmiWrYk2BF0k1IyWUodE1vzpeRJOAhUAFUapdtWJwUiKBU8HGhU6iWEzogPRYW9OQBEw56fSJMT7Uiof9SOoKAU/V3xMpCZQaBa7uDAj01bw3Ef/z2gn4507KwzgBFtLZIj8RGCI8SQR7XDIKYqQJoZLrWzHtE0ko6NwKOoTq/MuLxD6pXFSsm9NS/TJLI4+K6ACVURWdoTq6Rg1kI4oe0TN6RW/Gk/FivBsfs9ackc3soz8wPn8A4gSV5w==</latexit><latexit sha1_base64="MMzM0YA5IF5wGPxRu7zJV8hb5/Q=">AAACA3icbVA9SwNBEN2LXzF+nVqmWQxCBAkXEdQiELSxjOCZQHKEvb29ZMneR3bnxHCksPGv2Fio2Pon7Pw3bpIrNPHBwOO9GWbmubHgCizr28gtLa+sruXXCxubW9s75u7enYoSSZlNIxHJlksUEzxkNnAQrBVLRgJXsKY7uJr4zXsmFY/CWxjFzAlIL+Q+pwS01DWLXjk+Hh7hGrZwB9gDpJj7Ph7juDbsmiWrYk2BF0k1IyWUodE1vzpeRJOAhUAFUapdtWJwUiKBU8HGhU6iWEzogPRYW9OQBEw56fSJMT7Uiof9SOoKAU/V3xMpCZQaBa7uDAj01bw3Ef/z2gn4507KwzgBFtLZIj8RGCI8SQR7XDIKYqQJoZLrWzHtE0ko6NwKOoTq/MuLxD6pXFSsm9NS/TJLI4+K6ACVURWdoTq6Rg1kI4oe0TN6RW/Gk/FivBsfs9ackc3soz8wPn8A4gSV5w==</latexit>

dkl(p, q) = p logp

q+ (1� p) log

1� p

1� q<latexit sha1_base64="WAayD624EaihKDdYz7D/NSINAn0=">AAACHXicbVBNS8MwGE7n15xfU49eikPYUEcrinoQhl48TrBusJaSpukWlrZZkgqj9Jd48a948aDiwYv4b8y2grr5QMKT53lf3ryPxygR0jC+tMLc/MLiUnG5tLK6tr5R3ty6E3HCEbZQTGPe9qDAlETYkkRS3GYcw9CjuOX1r0Z+6x5zQeLoVg4ZdkLYjUhAEJRKcssnvpv2aVZlB4PaBbNp3LUDDlHKsnSQ7VfNQ1b7EdUrU9cgc8sVo26Moc8SMycVkKPplj9sP0ZJiCOJKBSiYxpMOinkkiCKs5KdCMwg6sMu7igawRALJx2vl+l7SvH1IObqRFIfq787UhgKMQw9VRlC2RPT3kj8z+skMjhzUhKxROIITQYFCdVlrI+y0n3CMZJ0qAhEnKi/6qgHVRJSJVpSIZjTK88S66h+XjdujiuNyzyNItgBu6AKTHAKGuAaNIEFEHgAT+AFvGqP2rP2pr1PSgta3rMN/kD7/AYH66IB</latexit><latexit sha1_base64="WAayD624EaihKDdYz7D/NSINAn0=">AAACHXicbVBNS8MwGE7n15xfU49eikPYUEcrinoQhl48TrBusJaSpukWlrZZkgqj9Jd48a948aDiwYv4b8y2grr5QMKT53lf3ryPxygR0jC+tMLc/MLiUnG5tLK6tr5R3ty6E3HCEbZQTGPe9qDAlETYkkRS3GYcw9CjuOX1r0Z+6x5zQeLoVg4ZdkLYjUhAEJRKcssnvpv2aVZlB4PaBbNp3LUDDlHKsnSQ7VfNQ1b7EdUrU9cgc8sVo26Moc8SMycVkKPplj9sP0ZJiCOJKBSiYxpMOinkkiCKs5KdCMwg6sMu7igawRALJx2vl+l7SvH1IObqRFIfq787UhgKMQw9VRlC2RPT3kj8z+skMjhzUhKxROIITQYFCdVlrI+y0n3CMZJ0qAhEnKi/6qgHVRJSJVpSIZjTK88S66h+XjdujiuNyzyNItgBu6AKTHAKGuAaNIEFEHgAT+AFvGqP2rP2pr1PSgta3rMN/kD7/AYH66IB</latexit><latexit sha1_base64="WAayD624EaihKDdYz7D/NSINAn0=">AAACHXicbVBNS8MwGE7n15xfU49eikPYUEcrinoQhl48TrBusJaSpukWlrZZkgqj9Jd48a948aDiwYv4b8y2grr5QMKT53lf3ryPxygR0jC+tMLc/MLiUnG5tLK6tr5R3ty6E3HCEbZQTGPe9qDAlETYkkRS3GYcw9CjuOX1r0Z+6x5zQeLoVg4ZdkLYjUhAEJRKcssnvpv2aVZlB4PaBbNp3LUDDlHKsnSQ7VfNQ1b7EdUrU9cgc8sVo26Moc8SMycVkKPplj9sP0ZJiCOJKBSiYxpMOinkkiCKs5KdCMwg6sMu7igawRALJx2vl+l7SvH1IObqRFIfq787UhgKMQw9VRlC2RPT3kj8z+skMjhzUhKxROIITQYFCdVlrI+y0n3CMZJ0qAhEnKi/6qgHVRJSJVpSIZjTK88S66h+XjdujiuNyzyNItgBu6AKTHAKGuAaNIEFEHgAT+AFvGqP2rP2pr1PSgta3rMN/kD7/AYH66IB</latexit>

• kl-dominated: upper-bounded by

At time t, play arm argmax

a2Kmax{q 2 ⇥ : Na(t)d( ¯Ya(t), q) log(t)}

<latexit sha1_base64="5zNIJJ2atB3E+SPjZiTL6ohc8pg=">AAACN3icbVBNSyNBEO3R1dXoalaPXgaDEGEJExFcPYleFgQ/wKxKOgw1nUrS2NMz6a5ZDMP8LC/7M7yJFw+67HX/wfbEHHbVBw2v3quiul6UKmkpCO69qekPM7Mf5+YrC4uflparn1e+2yQzAlsiUYm5jMCikhpbJEnhZWoQ4kjhRXR9WPoXP9BYmehzGqXYiaGvZU8KICeF1RMOps9juAlz4FI7RgMBKj8qilLl+bBUzwdIsHccQp02u3UegcmvinH1ZbjJFQ65Svqu4kVYrQWNYAz/LWlOSI1NcBpW73g3EVmMmoQCa9vNIKVODoakUFhUeGYxBXENfWw7qiFG28nHhxf+hlO6fi8x7mnyx+q/EznE1o7iyHWWh9nXXim+57Uz6n3t5FKnGaEWL4t6mfIp8csU/a40KEiNHAFhpPurLwZgQJDLuuJCaL4++S1pbTV2G8HZdm3/YJLGHFtj66zOmmyH7bNv7JS1mGC37IE9sWfvp/fo/fJ+v7ROeZOZVfYfvD9/AQAprT8=</latexit><latexit sha1_base64="5zNIJJ2atB3E+SPjZiTL6ohc8pg=">AAACN3icbVBNSyNBEO3R1dXoalaPXgaDEGEJExFcPYleFgQ/wKxKOgw1nUrS2NMz6a5ZDMP8LC/7M7yJFw+67HX/wfbEHHbVBw2v3quiul6UKmkpCO69qekPM7Mf5+YrC4uflparn1e+2yQzAlsiUYm5jMCikhpbJEnhZWoQ4kjhRXR9WPoXP9BYmehzGqXYiaGvZU8KICeF1RMOps9juAlz4FI7RgMBKj8qilLl+bBUzwdIsHccQp02u3UegcmvinH1ZbjJFQ65Svqu4kVYrQWNYAz/LWlOSI1NcBpW73g3EVmMmoQCa9vNIKVODoakUFhUeGYxBXENfWw7qiFG28nHhxf+hlO6fi8x7mnyx+q/EznE1o7iyHWWh9nXXim+57Uz6n3t5FKnGaEWL4t6mfIp8csU/a40KEiNHAFhpPurLwZgQJDLuuJCaL4++S1pbTV2G8HZdm3/YJLGHFtj66zOmmyH7bNv7JS1mGC37IE9sWfvp/fo/fJ+v7ROeZOZVfYfvD9/AQAprT8=</latexit><latexit sha1_base64="5zNIJJ2atB3E+SPjZiTL6ohc8pg=">AAACN3icbVBNSyNBEO3R1dXoalaPXgaDEGEJExFcPYleFgQ/wKxKOgw1nUrS2NMz6a5ZDMP8LC/7M7yJFw+67HX/wfbEHHbVBw2v3quiul6UKmkpCO69qekPM7Mf5+YrC4uflparn1e+2yQzAlsiUYm5jMCikhpbJEnhZWoQ4kjhRXR9WPoXP9BYmehzGqXYiaGvZU8KICeF1RMOps9juAlz4FI7RgMBKj8qilLl+bBUzwdIsHccQp02u3UegcmvinH1ZbjJFQ65Svqu4kVYrQWNYAz/LWlOSI1NcBpW73g3EVmMmoQCa9vNIKVODoakUFhUeGYxBXENfWw7qiFG28nHhxf+hlO6fi8x7mnyx+q/EznE1o7iyHWWh9nXXim+57Uz6n3t5FKnGaEWL4t6mfIp8csU/a40KEiNHAFhpPurLwZgQJDLuuJCaL4++S1pbTV2G8HZdm3/YJLGHFtj66zOmmyH7bNv7JS1mGC37IE9sWfvp/fo/fJ+v7ROeZOZVfYfvD9/AQAprT8=</latexit>

• Generic UCB algorithm is

Theorem 1. If d is a strong semi-distance function and is also kl-dominated, then the regret of UCB(d) algorithm is

lim sup

T!1

E[R(T )]

log T

X

a

µ⇤ � µa

d(µa, µ⇤)

<latexit sha1_base64="Q9kIQALldPTKSfDGkYwHXxIEh6Q=">AAACSHicbVDPaxQxGM2sra1ra1c9egkuwlbsMlsK6q0oQo9VdtvCZjrNZDPb0PyYJl+EJcy/58WbN/8HLz204s3sdIX+8EHC4733kXyvqKRwkKY/k9aDpeWHK6uP2o/X1p9sdJ4+O3DGW8ZHzEhjjwrquBSaj0CA5EeV5VQVkh8WZx/n/uFXbp0wegizimeKTrUoBaMQpbxzQqRQzld5GBIwROgSZjUpLWWBKAqnRRE+1eMvveFmVgcizRQPayL5OSbOq5z+S/rj11vxzmkdJr2GvGnEzTrvdNN+2gDfJ4MF6aIF9vPODzIxzCuugUnq3HiQVpAFakEwyes28Y5XlJ3RKR9HqqniLgtNEzV+FZUJLo2NRwNu1JsTgSrnZqqIyfl27q43F//njT2U77IgdOWBa3b9UOklBoPnteKJsJyBnEVCmRXxr5id0lgOxPLbsYTB3ZXvk9F2/30//bzT3f2waGMVvUAvUQ8N0Fu0i/bQPhohhr6hX+gSXSXfk4vkd/LnOtpKFjPP0S20Wn8Bj3y0Ag==</latexit><latexit sha1_base64="Q9kIQALldPTKSfDGkYwHXxIEh6Q=">AAACSHicbVDPaxQxGM2sra1ra1c9egkuwlbsMlsK6q0oQo9VdtvCZjrNZDPb0PyYJl+EJcy/58WbN/8HLz204s3sdIX+8EHC4733kXyvqKRwkKY/k9aDpeWHK6uP2o/X1p9sdJ4+O3DGW8ZHzEhjjwrquBSaj0CA5EeV5VQVkh8WZx/n/uFXbp0wegizimeKTrUoBaMQpbxzQqRQzld5GBIwROgSZjUpLWWBKAqnRRE+1eMvveFmVgcizRQPayL5OSbOq5z+S/rj11vxzmkdJr2GvGnEzTrvdNN+2gDfJ4MF6aIF9vPODzIxzCuugUnq3HiQVpAFakEwyes28Y5XlJ3RKR9HqqniLgtNEzV+FZUJLo2NRwNu1JsTgSrnZqqIyfl27q43F//njT2U77IgdOWBa3b9UOklBoPnteKJsJyBnEVCmRXxr5id0lgOxPLbsYTB3ZXvk9F2/30//bzT3f2waGMVvUAvUQ8N0Fu0i/bQPhohhr6hX+gSXSXfk4vkd/LnOtpKFjPP0S20Wn8Bj3y0Ag==</latexit><latexit sha1_base64="Q9kIQALldPTKSfDGkYwHXxIEh6Q=">AAACSHicbVDPaxQxGM2sra1ra1c9egkuwlbsMlsK6q0oQo9VdtvCZjrNZDPb0PyYJl+EJcy/58WbN/8HLz204s3sdIX+8EHC4733kXyvqKRwkKY/k9aDpeWHK6uP2o/X1p9sdJ4+O3DGW8ZHzEhjjwrquBSaj0CA5EeV5VQVkh8WZx/n/uFXbp0wegizimeKTrUoBaMQpbxzQqRQzld5GBIwROgSZjUpLWWBKAqnRRE+1eMvveFmVgcizRQPayL5OSbOq5z+S/rj11vxzmkdJr2GvGnEzTrvdNN+2gDfJ4MF6aIF9vPODzIxzCuugUnq3HiQVpAFakEwyes28Y5XlJ3RKR9HqqniLgtNEzV+FZUJLo2NRwNu1JsTgSrnZqqIyfl27q43F//njT2U77IgdOWBa3b9UOklBoPnteKJsJyBnEVCmRXxr5id0lgOxPLbsYTB3ZXvk9F2/30//bzT3f2waGMVvUAvUQ8N0Fu0i/bQPhohhr6hX+gSXSXfk4vkd/LnOtpKFjPP0S20Wn8Bj3y0Ag==</latexit>

Page 9: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• UCB kernel is a semi-distance function d, with problem§ kl-UCB: kl-divergence dkl, need iterative method to solve§ UCB1: , closed-form solution

Generic UCB Algorithm

P (d) : max

q2⇥q

s.t. d(p, q) �<latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit>

dsq(p, q) = 2(p� q)2<latexit sha1_base64="9qriFzpq8+xe+rEJ4/NhG1HRzmE=">AAAB/HicbVDLSsNAFJ3UV62v+Ni5GSxCC1qSIqgLoejGZQVjC20Mk8m0HTpJpjMToYbir7hxoeLWD3Hn3zhts9DWAxcO59zLvff4nFGpLOvbyC0sLi2v5FcLa+sbm1vm9s6djBOBiYNjFoumjyRhNCKOooqRJhcEhT4jDb9/NfYbD0RIGke3asiJG6JuRDsUI6Ulz9wLvFQORiV+NChfVEv8eFC+r3pm0apYE8B5YmekCDLUPfOrHcQ4CUmkMENStmyLKzdFQlHMyKjQTiThCPdRl7Q0jVBIpJtOrh/BQ60EsBMLXZGCE/X3RIpCKYehrztDpHpy1huL/3mtRHXO3JRGPFEkwtNFnYRBFcNxFDCggmDFhpogLKi+FeIeEggrHVhBh2DPvjxPnGrlvGLdnBRrl1kaebAPDkAJ2OAU1MA1qAMHYPAInsEreDOejBfj3fiYtuaMbGYX/IHx+QO0mJOh</latexit><latexit sha1_base64="9qriFzpq8+xe+rEJ4/NhG1HRzmE=">AAAB/HicbVDLSsNAFJ3UV62v+Ni5GSxCC1qSIqgLoejGZQVjC20Mk8m0HTpJpjMToYbir7hxoeLWD3Hn3zhts9DWAxcO59zLvff4nFGpLOvbyC0sLi2v5FcLa+sbm1vm9s6djBOBiYNjFoumjyRhNCKOooqRJhcEhT4jDb9/NfYbD0RIGke3asiJG6JuRDsUI6Ulz9wLvFQORiV+NChfVEv8eFC+r3pm0apYE8B5YmekCDLUPfOrHcQ4CUmkMENStmyLKzdFQlHMyKjQTiThCPdRl7Q0jVBIpJtOrh/BQ60EsBMLXZGCE/X3RIpCKYehrztDpHpy1huL/3mtRHXO3JRGPFEkwtNFnYRBFcNxFDCggmDFhpogLKi+FeIeEggrHVhBh2DPvjxPnGrlvGLdnBRrl1kaebAPDkAJ2OAU1MA1qAMHYPAInsEreDOejBfj3fiYtuaMbGYX/IHx+QO0mJOh</latexit><latexit sha1_base64="9qriFzpq8+xe+rEJ4/NhG1HRzmE=">AAAB/HicbVDLSsNAFJ3UV62v+Ni5GSxCC1qSIqgLoejGZQVjC20Mk8m0HTpJpjMToYbir7hxoeLWD3Hn3zhts9DWAxcO59zLvff4nFGpLOvbyC0sLi2v5FcLa+sbm1vm9s6djBOBiYNjFoumjyRhNCKOooqRJhcEhT4jDb9/NfYbD0RIGke3asiJG6JuRDsUI6Ulz9wLvFQORiV+NChfVEv8eFC+r3pm0apYE8B5YmekCDLUPfOrHcQ4CUmkMENStmyLKzdFQlHMyKjQTiThCPdRl7Q0jVBIpJtOrh/BQ60EsBMLXZGCE/X3RIpCKYehrztDpHpy1huL/3mtRHXO3JRGPFEkwtNFnYRBFcNxFDCggmDFhpogLKi+FeIeEggrHVhBh2DPvjxPnGrlvGLdnBRrl1kaebAPDkAJ2OAU1MA1qAMHYPAInsEreDOejBfj3fiYtuaMbGYX/IHx+QO0mJOh</latexit>

• New semi-distance functions with closed-form solutions§ Hellinger distance:§ Biquadratic distance:§ Theorem 1 provides regret bounds for these new UCB algorithms§ Closed-form solution allows O(1) complexity

dh(p, q) = (pp�p

q)2 +⇣p

1� p�p

1� q⌘2

<latexit sha1_base64="bn++NfW9YkRq7Cis7C1AEvHQMvQ=">AAACNnicbVDLSsNAFJ3UV62vqks3wSK0aEtSBHUhFN24kgrGFppaJtNJO3Ty6MyNUEL+yo2/4U43LlTc+glOH0htPTBwOOdc7tzjhJxJMIwXLbWwuLS8kl7NrK1vbG5lt3fuZBAJQi0S8EDUHSwpZz61gAGn9VBQ7Dmc1pze5dCvPVAhWeDfwiCkTQ93fOYygkFJrex1u9XNh0f9wrnNqQt5W/YFxGFSHJN+YgvW6ULhvnw4HTCLvxGzOBVqZXNGyRhBnyfmhOTQBNVW9tluByTyqA+EYykbphFCM8YCGOE0ydiRpCEmPdyhDUV97FHZjEd3J/qBUtq6Gwj1fNBH6vREjD0pB56jkh6Grpz1huJ/XiMC97QZMz+MgPpkvMiNuA6BPixRbzNBCfCBIpgIpv6qky4WmICqOqNKMGdPnidWuXRWMm6Oc5WLSRtptIf2UR6Z6ARV0BWqIgsR9Ihe0Tv60J60N+1T+xpHU9pkZhf9gfb9A4x9rI8=</latexit><latexit sha1_base64="bn++NfW9YkRq7Cis7C1AEvHQMvQ=">AAACNnicbVDLSsNAFJ3UV62vqks3wSK0aEtSBHUhFN24kgrGFppaJtNJO3Ty6MyNUEL+yo2/4U43LlTc+glOH0htPTBwOOdc7tzjhJxJMIwXLbWwuLS8kl7NrK1vbG5lt3fuZBAJQi0S8EDUHSwpZz61gAGn9VBQ7Dmc1pze5dCvPVAhWeDfwiCkTQ93fOYygkFJrex1u9XNh0f9wrnNqQt5W/YFxGFSHJN+YgvW6ULhvnw4HTCLvxGzOBVqZXNGyRhBnyfmhOTQBNVW9tluByTyqA+EYykbphFCM8YCGOE0ydiRpCEmPdyhDUV97FHZjEd3J/qBUtq6Gwj1fNBH6vREjD0pB56jkh6Grpz1huJ/XiMC97QZMz+MgPpkvMiNuA6BPixRbzNBCfCBIpgIpv6qky4WmICqOqNKMGdPnidWuXRWMm6Oc5WLSRtptIf2UR6Z6ARV0BWqIgsR9Ihe0Tv60J60N+1T+xpHU9pkZhf9gfb9A4x9rI8=</latexit><latexit sha1_base64="bn++NfW9YkRq7Cis7C1AEvHQMvQ=">AAACNnicbVDLSsNAFJ3UV62vqks3wSK0aEtSBHUhFN24kgrGFppaJtNJO3Ty6MyNUEL+yo2/4U43LlTc+glOH0htPTBwOOdc7tzjhJxJMIwXLbWwuLS8kl7NrK1vbG5lt3fuZBAJQi0S8EDUHSwpZz61gAGn9VBQ7Dmc1pze5dCvPVAhWeDfwiCkTQ93fOYygkFJrex1u9XNh0f9wrnNqQt5W/YFxGFSHJN+YgvW6ULhvnw4HTCLvxGzOBVqZXNGyRhBnyfmhOTQBNVW9tluByTyqA+EYykbphFCM8YCGOE0ydiRpCEmPdyhDUV97FHZjEd3J/qBUtq6Gwj1fNBH6vREjD0pB56jkh6Grpz1huJ/XiMC97QZMz+MgPpkvMiNuA6BPixRbzNBCfCBIpgIpv6qky4WmICqOqNKMGdPnidWuXRWMm6Oc5WLSRtptIf2UR6Z6ARV0BWqIgsR9Ihe0Tv60J60N+1T+xpHU9pkZhf9gfb9A4x9rI8=</latexit>

dbq(p, q) = 2 (p� q)2 +4

9(p� q)4

<latexit sha1_base64="/EQAhm++NpsxL5DTkuW3hMWPovU=">AAACJXicbVDLSsNAFJ34rPVVdekmWIQWtSSloF0Uim5cVrC20NQwmUzaoZNHZ26EEvI1bvwVNy6qCK78FaePhbYeuHA4517uvceJOJNgGF/ayura+sZmZiu7vbO7t587OHyQYSwIbZKQh6LtYEk5C2gTGHDajgTFvsNpyxncTPzWExWShcE9jCLa9XEvYB4jGJRk52qunTjDtBCdD4u1ssWpB4XoYmgJ1utD8bF8ZnkCk6SSJtV0ya3YubxRMqbQl4k5J3k0R8POjS03JLFPAyAcS9kxjQi6CRbACKdp1ooljTAZ4B7tKBpgn8puMn0z1U+V4upeKFQFoE/V3xMJ9qUc+Y7q9DH05aI3Ef/zOjF4V92EBVEMNCCzRV7MdQj1SWa6ywQlwEeKYCKYulUnfaxyAZVsVoVgLr68TJrlUrVk3FXy9et5Ghl0jE5QAZnoEtXRLWqgJiLoGb2iMXrXXrQ37UP7nLWuaPOZI/QH2vcPOEGknQ==</latexit><latexit sha1_base64="/EQAhm++NpsxL5DTkuW3hMWPovU=">AAACJXicbVDLSsNAFJ34rPVVdekmWIQWtSSloF0Uim5cVrC20NQwmUzaoZNHZ26EEvI1bvwVNy6qCK78FaePhbYeuHA4517uvceJOJNgGF/ayura+sZmZiu7vbO7t587OHyQYSwIbZKQh6LtYEk5C2gTGHDajgTFvsNpyxncTPzWExWShcE9jCLa9XEvYB4jGJRk52qunTjDtBCdD4u1ssWpB4XoYmgJ1utD8bF8ZnkCk6SSJtV0ya3YubxRMqbQl4k5J3k0R8POjS03JLFPAyAcS9kxjQi6CRbACKdp1ooljTAZ4B7tKBpgn8puMn0z1U+V4upeKFQFoE/V3xMJ9qUc+Y7q9DH05aI3Ef/zOjF4V92EBVEMNCCzRV7MdQj1SWa6ywQlwEeKYCKYulUnfaxyAZVsVoVgLr68TJrlUrVk3FXy9et5Ghl0jE5QAZnoEtXRLWqgJiLoGb2iMXrXXrQ37UP7nLWuaPOZI/QH2vcPOEGknQ==</latexit><latexit sha1_base64="/EQAhm++NpsxL5DTkuW3hMWPovU=">AAACJXicbVDLSsNAFJ34rPVVdekmWIQWtSSloF0Uim5cVrC20NQwmUzaoZNHZ26EEvI1bvwVNy6qCK78FaePhbYeuHA4517uvceJOJNgGF/ayura+sZmZiu7vbO7t587OHyQYSwIbZKQh6LtYEk5C2gTGHDajgTFvsNpyxncTPzWExWShcE9jCLa9XEvYB4jGJRk52qunTjDtBCdD4u1ssWpB4XoYmgJ1utD8bF8ZnkCk6SSJtV0ya3YubxRMqbQl4k5J3k0R8POjS03JLFPAyAcS9kxjQi6CRbACKdp1ooljTAZ4B7tKBpgn8puMn0z1U+V4upeKFQFoE/V3xMJ9qUc+Y7q9DH05aI3Ef/zOjF4V92EBVEMNCCzRV7MdQj1SWa6ywQlwEeKYCKYulUnfaxyAZVsVoVgLr68TJrlUrVk3FXy9et5Ghl0jE5QAZnoEtXRLWqgJiLoGb2iMXrXXrQ37UP7nLWuaPOZI/QH2vcPOEGknQ==</latexit>

Can we ensemble these closed-form UCB algorithms to a “stronger” one?

• A natural question is

Page 10: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Consider a set D of kl-dominated semi-distance functions. If is a strong semi-distance function, then D is said to be feasible§ Sufficient condition: exists one strong semi-distance in D§ Easy to construct and verify a feasible set J

UCBoost(D) Algorithm

At time t, play arm

• UCBoost(D) algorithm is

Theorem 2. If D is a feasible set of kl-dominated semi-distance functions, then the regret of UCBoost(D) algorithm is

max

d2Dd

<latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit>

argmax

a2Kmin

d2Dmax{q 2 ⇥ : Na(t)d( ¯Ya(t), q) log(t)}

<latexit sha1_base64="KnCRYbSbO5bjThr5etAzG/1Q604=">AAACRHicbVBNaxsxENWmX6n7Ebc99iJqCg4Usy6FNDmFNIdAoaQQNymWWWa1Y1tEq11LsyVG7J/rJffe8g9yyaEtvYZoHR/apA8Eb968YTQvLbVyFMdn0cqdu/fuP1h92Hr0+MnTtfaz519cUVmJA1nowh6l4FArgwNSpPGotAh5qvEwPf7Q9A+/oXWqMAc0L3GUw8SosZJAQUraQoCdiBxOEg9CmcBoKkH7j3UtcmUSnwWV79aNRfhZYzmYIsHWpwS6tJ51RQrWf60X1ZvZutA4E7qYhErUSbsT9+IF+G3SX5IOW2I/af8QWSGrHA1JDc4N+3FJIw+WlNRYt0TlsAR5DBMcBmogRzfyixRq/jooGR8XNjxDfKH+PeEhd26ep8HZXOlu9hrxf71hReP3I69MWREaeb1oXGlOBW8i5ZmyKEnPAwFpVfgrl1OwICkE3woh9G+efJsM3vY2e/Hnd53tnWUaq+wle8W6rM822DbbY/tswCT7zs7ZT/YrOo0uot/Rn2vrSrScecH+QXR5BRMusrM=</latexit><latexit sha1_base64="KnCRYbSbO5bjThr5etAzG/1Q604=">AAACRHicbVBNaxsxENWmX6n7Ebc99iJqCg4Usy6FNDmFNIdAoaQQNymWWWa1Y1tEq11LsyVG7J/rJffe8g9yyaEtvYZoHR/apA8Eb968YTQvLbVyFMdn0cqdu/fuP1h92Hr0+MnTtfaz519cUVmJA1nowh6l4FArgwNSpPGotAh5qvEwPf7Q9A+/oXWqMAc0L3GUw8SosZJAQUraQoCdiBxOEg9CmcBoKkH7j3UtcmUSnwWV79aNRfhZYzmYIsHWpwS6tJ51RQrWf60X1ZvZutA4E7qYhErUSbsT9+IF+G3SX5IOW2I/af8QWSGrHA1JDc4N+3FJIw+WlNRYt0TlsAR5DBMcBmogRzfyixRq/jooGR8XNjxDfKH+PeEhd26ep8HZXOlu9hrxf71hReP3I69MWREaeb1oXGlOBW8i5ZmyKEnPAwFpVfgrl1OwICkE3woh9G+efJsM3vY2e/Hnd53tnWUaq+wle8W6rM822DbbY/tswCT7zs7ZT/YrOo0uot/Rn2vrSrScecH+QXR5BRMusrM=</latexit><latexit sha1_base64="KnCRYbSbO5bjThr5etAzG/1Q604=">AAACRHicbVBNaxsxENWmX6n7Ebc99iJqCg4Usy6FNDmFNIdAoaQQNymWWWa1Y1tEq11LsyVG7J/rJffe8g9yyaEtvYZoHR/apA8Eb968YTQvLbVyFMdn0cqdu/fuP1h92Hr0+MnTtfaz519cUVmJA1nowh6l4FArgwNSpPGotAh5qvEwPf7Q9A+/oXWqMAc0L3GUw8SosZJAQUraQoCdiBxOEg9CmcBoKkH7j3UtcmUSnwWV79aNRfhZYzmYIsHWpwS6tJ51RQrWf60X1ZvZutA4E7qYhErUSbsT9+IF+G3SX5IOW2I/af8QWSGrHA1JDc4N+3FJIw+WlNRYt0TlsAR5DBMcBmogRzfyixRq/jooGR8XNjxDfKH+PeEhd26ep8HZXOlu9hrxf71hReP3I69MWREaeb1oXGlOBW8i5ZmyKEnPAwFpVfgrl1OwICkE3woh9G+efJsM3vY2e/Hnd53tnWUaq+wle8W6rM822DbbY/tswCT7zs7ZT/YrOo0uot/Rn2vrSrScecH+QXR5BRMusrM=</latexit>

lim sup

T!1

E[R(T )]

log T

X

a

µ⇤ � µa

maxd2D d(µa, µ⇤)

<latexit sha1_base64="USR3az7HTLJ9efnSjwA4SrfgGKI=">AAACVXicbVFbaxQxGM1Ma63rpWv76EvoImxFl1kRWt9KVfCxLbu2sBmHTCazDc1lzKV0CfmT9kV/ii9iZrqFXvwg4XC+c5J8J2XDmbFZ9jtJV1YfrT1ef9J7+uz5i43+y81vRjlN6JQorvRpiQ3lTNKpZZbT00ZTLEpOT8rzT23/5IJqw5Sc2EVDc4HnktWMYBupoi8QZ8K4pvATZBVisraLgGqNiUcC27Oy9F/C7Hg42cmDR1zN4SQgTn9AZJwo8I3SfX/zLu4FDq3tsvBVPAp+DtWwY992ip1Q9AfZKOsKPgTjJRiAZR0W/Z+oUsQJKi3h2JjZOGts7rG2jHAaesgZ2mByjud0FqHEgprcd7EE+DoyFayVjkta2LG3HR4LYxaijMp2VHO/15L/682crfdyz2TjLJXk+qLacWgVbDOGFdOUWL6IABPN4lshOcMxKRt/ohdDGN8f+SGYvh99HGVHHwb7B8s01sErsA2GYAx2wT74Cg7BFBBwBf4kabKS/Er+pqvp2rU0TZaeLXCn0o1/ZVq1fA==</latexit><latexit sha1_base64="USR3az7HTLJ9efnSjwA4SrfgGKI=">AAACVXicbVFbaxQxGM1Ma63rpWv76EvoImxFl1kRWt9KVfCxLbu2sBmHTCazDc1lzKV0CfmT9kV/ii9iZrqFXvwg4XC+c5J8J2XDmbFZ9jtJV1YfrT1ef9J7+uz5i43+y81vRjlN6JQorvRpiQ3lTNKpZZbT00ZTLEpOT8rzT23/5IJqw5Sc2EVDc4HnktWMYBupoi8QZ8K4pvATZBVisraLgGqNiUcC27Oy9F/C7Hg42cmDR1zN4SQgTn9AZJwo8I3SfX/zLu4FDq3tsvBVPAp+DtWwY992ip1Q9AfZKOsKPgTjJRiAZR0W/Z+oUsQJKi3h2JjZOGts7rG2jHAaesgZ2mByjud0FqHEgprcd7EE+DoyFayVjkta2LG3HR4LYxaijMp2VHO/15L/682crfdyz2TjLJXk+qLacWgVbDOGFdOUWL6IABPN4lshOcMxKRt/ohdDGN8f+SGYvh99HGVHHwb7B8s01sErsA2GYAx2wT74Cg7BFBBwBf4kabKS/Er+pqvp2rU0TZaeLXCn0o1/ZVq1fA==</latexit><latexit sha1_base64="USR3az7HTLJ9efnSjwA4SrfgGKI=">AAACVXicbVFbaxQxGM1Ma63rpWv76EvoImxFl1kRWt9KVfCxLbu2sBmHTCazDc1lzKV0CfmT9kV/ii9iZrqFXvwg4XC+c5J8J2XDmbFZ9jtJV1YfrT1ef9J7+uz5i43+y81vRjlN6JQorvRpiQ3lTNKpZZbT00ZTLEpOT8rzT23/5IJqw5Sc2EVDc4HnktWMYBupoi8QZ8K4pvATZBVisraLgGqNiUcC27Oy9F/C7Hg42cmDR1zN4SQgTn9AZJwo8I3SfX/zLu4FDq3tsvBVPAp+DtWwY992ip1Q9AfZKOsKPgTjJRiAZR0W/Z+oUsQJKi3h2JjZOGts7rG2jHAaesgZ2mByjud0FqHEgprcd7EE+DoyFayVjkta2LG3HR4LYxaijMp2VHO/15L/682crfdyz2TjLJXk+qLacWgVbDOGFdOUWL6IABPN4lshOcMxKRt/ohdDGN8f+SGYvh99HGVHHwb7B8s01sErsA2GYAx2wT74Cg7BFBBwBf4kabKS/Er+pqvp2rU0TZaeLXCn0o1/ZVq1fA==</latexit>

• If all d in D have closed-form solutions, complexity is O(|D|)

Page 11: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Why taking the minimum?• Philosophy of voting

§ Majority vote? § No! (If the ordering is known, follow the leader)§ UCBoost takes the minimum, thus the tightest

upper confidence bound

UCBoost(D) Algorithm

reward

action

Gap to optimality

Com

plex

itykl-UCB

UCB1

UCBoost( )

UCBoost(D)

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit> ✏

<latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit>

P (d) : max

q2⇥q

s.t. d(p, q) �<latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit>

UCB1

decision

0.9

UCB2

0.8

UCB3

0.6

UCBoost

0.6

0.8 0.75 0.7 0.7

0.2 0.2 0.3 0.2

1 1 2 2

max

d2Dd

<latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit>

1

d(p,

q)

pValue of q

P

✓max

d2Dd

<latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit>

d1<latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit>

d2<latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit>

dkl<latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit>

max(d1, d2)<latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit>

reward

action

Gap to optimality

Com

plex

ity

kl-UCB

UCB1

UCBoost( )

UCBoost(D)

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit> ✏

<latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit><latexit sha1_base64="VtERuh8kUklrAsJFtQ19b1sT8NE=">AAAB7nicdVDLSgNBEOz1GeMr6tHLYBA8hY0I6i3oxWME1wSSJcxOepMhszPrzKwQlvyEFw8qXv0eb/6Nk4cQXwUNRVU33V1RKrixvv/hLSwuLa+sFtaK6xubW9ulnd1bozLNMGBKKN2MqEHBJQaWW4HNVCNNIoGNaHA59hv3qA1X8sYOUwwT2pM85oxaJzXbmBoulOyUytWKPwHxf5Evqwwz1Dul93ZXsSxBaZmgxrSqfmrDnGrLmcBRsZ0ZTCkb0B62HJU0QRPmk3tH5NApXRIr7UpaMlHnJ3KaGDNMIteZUNs3P72x+JfXymx8FuZcpplFyaaL4kwQq8j4edLlGpkVQ0co09zdSlifasqsi6g4H8L/JDiunFf865Ny7WKWRgH24QCOoAqnUIMrqEMADAQ8wBM8e3feo/fivU5bF7zZzB58g/f2Cblwj/Q=</latexit>

P (d) : max

q2⇥q

s.t. d(p, q) �<latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit><latexit sha1_base64="PPAp5AJWMK3wZiWQaYLjrEJAGV4=">AAACInicbVBNTxsxEPVSPsNX2h65WESgIKHVBlUq5YTKpccgEUCKo2jWOyEWXu/Gnq0arcJv6aV/pRcOLXBC4sfghBz4etJIz+/NyDMvzrVyFEX3wcyH2bn5hcWlyvLK6tp69eOnU5cVVmJLZjqz5zE41MpgixRpPM8tQhprPIsvj8b+2U+0TmXmhIY5dlK4MKqnJJCXutWDZj3ZOeAihV/dciCUESd9JBjxbX51NRCi4kIKJ4+knu8OdoTGARcJaoJutRaF0QT8LWlMSY1N0exWb0WSySJFQ1KDc+1GlFOnBEtKahxVROEwB3kJF9j21ECKrlNObhzxLa8kvJdZX4b4RH0+UULq3DCNfWcK1HevvbH4ntcuqLffKZXJC0Ijnz7qFZpTxseB8URZlKSHnoC0yu/KZR8sSPKxVnwIjdcnvyWtvfBbGB1/qR1+n6axyDbYJquzBvvKDtkP1mQtJtlv9pf9Y/+DP8F1cBPcPbXOBNOZz+wFgodH5VyiMA==</latexit>

UCB1

decision

0.9

UCB2

0.8

UCB3

0.6

UCBoost

0.6

0.8 0.75 0.7 0.7

0.2 0.2 0.3 0.2

1 1 2 2

max

d2Dd

<latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit>

1

d(p,

q)

pValue of q

P

✓max

d2Dd

<latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit>

d1<latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit><latexit sha1_base64="Inbc64N22iqPYmmAXvoPt893o4Y=">AAAB6XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGltoQ9lstu3SzSbsToQS+hO8eFDx6j/y5r9x2+agrQ8GHu/NMDMvTKUw6LrfTmlldW19o7xZ2dre2d2r7h88miTTjPsskYluh9RwKRT3UaDk7VRzGoeSt8LRzdRvPXFtRKIecJzyIKYDJfqCUbTSfdTzetWaW3dnIMvEK0gNCjR71a9ulLAs5gqZpMZ0PDfFIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVSLST7QthWSm/p7IaWzMOA5tZ0xxaBa9qfif18mwfxnkQqUZcsXmi/qZJJiQ6d8kEpozlGNLKNPC3krYkGrK0KZTsSF4iy8vE/+sflV3785rjesijTIcwTGcggcX0IBbaIIPDAbwDK/w5kjnxXl3PuatJaeYOYQ/cD5/AFq/jV8=</latexit>

d2<latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit><latexit sha1_base64="st/vHNf51EH9VP/YhbBsju9J4rQ=">AAAB6XicbVBNS8NAEJ34WetX1aOXxSJ4KkkR1FvRi8eKxhbaUDabSbt0swm7G6GU/gQvHlS8+o+8+W/ctjlo64OBx3szzMwLM8G1cd1vZ2V1bX1js7RV3t7Z3duvHBw+6jRXDH2WilS1Q6pRcIm+4UZgO1NIk1BgKxzeTP3WEyrNU/lgRhkGCe1LHnNGjZXuo169V6m6NXcGsky8glShQLNX+epGKcsTlIYJqnXHczMTjKkynAmclLu5xoyyIe1jx1JJE9TBeHbqhJxaJSJxqmxJQ2bq74kxTbQeJaHtTKgZ6EVvKv7ndXITXwZjLrPcoGTzRXEuiEnJ9G8ScYXMiJEllClubyVsQBVlxqZTtiF4iy8vE79eu6q5d+fVxnWRRgmO4QTOwIMLaMAtNMEHBn14hld4c4Tz4rw7H/PWFaeYOYI/cD5/AFxCjWA=</latexit>

dkl<latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit><latexit sha1_base64="lNDyzncRcldzJK9FUtoylgQyKkA=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEqN6KXjxWMG2hDWWz2bRrN7thdyOU0P/gxYOKV3+QN/+N2zYHbX0w8Hhvhpl5YcqZNq777ZTW1jc2t8rblZ3dvf2D6uFRW8tMEeoTyaXqhlhTzgT1DTOcdlNFcRJy2gnHtzO/80SVZlI8mElKgwQPBYsZwcZK7WiQj/l0UK25dXcOtEq8gtSgQGtQ/epHkmQJFYZwrHXPc1MT5FgZRjidVvqZpikmYzykPUsFTqgO8vm1U3RmlQjFUtkSBs3V3xM5TrSeJKHtTLAZ6WVvJv7n9TITXwU5E2lmqCCLRXHGkZFo9jqKmKLE8IklmChmb0VkhBUmxgZUsSF4yy+vEv+ifl137y9rzZsijTKcwCmcgwcNaMIdtMAHAo/wDK/w5kjnxXl3PhatJaeYOYY/cD5/AEB9jxs=</latexit>

max(d1, d2)<latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit><latexit sha1_base64="9l0AI7h7cuWFzPQvO6lUpwm/cjQ=">AAAB83icbVBNS8NAEJ3Ur1q/qh69BItQQUpSBPVW9OKxgrGFNoTNZtsu3d3E3U2xhP4OLx5UvPpnvPlv3LY5aOuDgcd7M8zMCxNGlXacb6uwsrq2vlHcLG1t7+zulfcPHlScSkw8HLNYtkOkCKOCeJpqRtqJJIiHjLTC4c3Ub42IVDQW93qcEJ+jvqA9ipE2kt/l6KkaBe5ZFNRPg3LFqTkz2MvEzUkFcjSD8lc3inHKidCYIaU6rpNoP0NSU8zIpNRNFUkQHqI+6RgqECfKz2ZHT+wTo0R2L5amhLZn6u+JDHGlxjw0nRzpgVr0puJ/XifVvUs/oyJJNRF4vqiXMlvH9jQBO6KSYM3GhiAsqbnVxgMkEdYmp5IJwV18eZl49dpVzbk7rzSu8zSKcATHUAUXLqABt9AEDzA8wjO8wps1sl6sd+tj3lqw8plD+APr8weNhpDX</latexit>

• Geometric view of UCBoost§ Kernel of UCBoost is§ Taking the minimum = solving§ The closer to KL divergence, the better the regret

max

d2Dd

<latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit><latexit sha1_base64="4yGfIv5Pgb49/XNOEMDIUWsygxI=">AAAB9HicbVBNS8NAEJ34WetX1aOXxSJ4KokI6q2oB48VjC20sWw2m3bp7ibsbtQS+j+8eFDx6o/x5r9x2+agrQ8GHu/NMDMvTDnTxnW/nYXFpeWV1dJaeX1jc2u7srN7p5NMEeqThCeqFWJNOZPUN8xw2koVxSLktBkOLsd+84EqzRJ5a4YpDQTuSRYzgo2V7jsCP3XzqMMkuhpF3UrVrbkToHniFaQKBRrdylcnSkgmqDSEY63bnpuaIMfKMMLpqNzJNE0xGeAebVsqsaA6yCdXj9ChVSIUJ8qWNGii/p7IsdB6KELbKbDp61lvLP7ntTMTnwU5k2lmqCTTRXHGkUnQOAIUMUWJ4UNLMFHM3opIHytMjA2qbEPwZl+eJ/5x7bzm3pxU6xdFGiXYhwM4Ag9OoQ7X0AAfCCh4hld4cx6dF+fd+Zi2LjjFzB78gfP5A7d3kjE=</latexit>

P

✓max

d2Dd

<latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit><latexit sha1_base64="48eq9fyIL9Aul/JVazbe7LxayQI=">AAACBHicbVBNS8NAEN3Ur1q/qh71sFiEeimpCOqtqAePFYwtNCFsNpt26WYTdidiCb148a948aDi1R/hzX/j9uOgrQ8GHu/NMDMvSAXXYNvfVmFhcWl5pbhaWlvf2Nwqb+/c6SRTlDk0EYlqB0QzwSVzgINg7VQxEgeCtYL+5chv3TOleSJvYZAyLyZdySNOCRjJL+83XcEiqLoxefDz0OUSXw1DV/FuD478csWu2WPgeVKfkgqaoumXv9wwoVnMJFBBtO7U7RS8nCjgVLBhyc00Swntky7rGCpJzLSXj78Y4kOjhDhKlCkJeKz+nshJrPUgDkxnTKCnZ72R+J/XySA683Iu0wyYpJNFUSYwJHgUCQ65YhTEwBBCFTe3YtojilAwwZVMCPXZl+eJc1w7r9k3J5XGxTSNItpDB6iK6ugUNdA1aiIHUfSIntErerOerBfr3fqYtBas6cwu+gPr8wdkVJgQ</latexit>

Page 12: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• A new candidate semi-distance function§ Lower bound of dkl:§ Closed-form solution of P(dlb)§ Tight to dkl when q goes to 1§ Allows bounded gap to optimality

UCBoost(D) Algorithm

Corollary 1. If D={dbq,dh,dlb}, then the regret of UCBoost(D) algorithm is

where e is the natural number. The complexity is O(1) per arm per round.

dlb(p, q) = p log(p) + (1� p) log1� p

1� q<latexit sha1_base64="JS0CWbPO8kf/YHU2W7px0tKxduc=">AAACFnicbVBNS8MwGE7n15xfVY9egkPoUEcrgnoQhl48TrBusI2SpukWlrZZkgqj9F948a948aDiVbz5b8w+Drr5QMLzPs/7kryPzxmVyra/jcLC4tLySnG1tLa+sbllbu/cyyQVmLg4YYlo+kgSRmPiKqoYaXJBUOQz0vD71yO/8UCEpEl8p4acdCLUjWlIMVJa8sxq4GXMzy1+NKhc8jZLutDilUPLOeaVUdUOBcKZrnJ9DXLPLNtVeww4T5wpKYMp6p751Q4SnEYkVpghKVuOzVUnQ0JRzEheaqeScIT7qEtamsYoIrKTjffK4YFWAhgmQp9YwbH6eyJDkZTDyNedEVI9OeuNxP+8VqrC805GY54qEuPJQ2HKoErgKCQYUEGwYkNNEBZU/xXiHtJJKB1lSYfgzK48T9yT6kXVvj0t166maRTBHtgHFnDAGaiBG1AHLsDgETyDV/BmPBkvxrvxMWktGNOZXfAHxucPaCGdyg==</latexit><latexit sha1_base64="JS0CWbPO8kf/YHU2W7px0tKxduc=">AAACFnicbVBNS8MwGE7n15xfVY9egkPoUEcrgnoQhl48TrBusI2SpukWlrZZkgqj9F948a948aDiVbz5b8w+Drr5QMLzPs/7kryPzxmVyra/jcLC4tLySnG1tLa+sbllbu/cyyQVmLg4YYlo+kgSRmPiKqoYaXJBUOQz0vD71yO/8UCEpEl8p4acdCLUjWlIMVJa8sxq4GXMzy1+NKhc8jZLutDilUPLOeaVUdUOBcKZrnJ9DXLPLNtVeww4T5wpKYMp6p751Q4SnEYkVpghKVuOzVUnQ0JRzEheaqeScIT7qEtamsYoIrKTjffK4YFWAhgmQp9YwbH6eyJDkZTDyNedEVI9OeuNxP+8VqrC805GY54qEuPJQ2HKoErgKCQYUEGwYkNNEBZU/xXiHtJJKB1lSYfgzK48T9yT6kXVvj0t166maRTBHtgHFnDAGaiBG1AHLsDgETyDV/BmPBkvxrvxMWktGNOZXfAHxucPaCGdyg==</latexit><latexit sha1_base64="JS0CWbPO8kf/YHU2W7px0tKxduc=">AAACFnicbVBNS8MwGE7n15xfVY9egkPoUEcrgnoQhl48TrBusI2SpukWlrZZkgqj9F948a948aDiVbz5b8w+Drr5QMLzPs/7kryPzxmVyra/jcLC4tLySnG1tLa+sbllbu/cyyQVmLg4YYlo+kgSRmPiKqoYaXJBUOQz0vD71yO/8UCEpEl8p4acdCLUjWlIMVJa8sxq4GXMzy1+NKhc8jZLutDilUPLOeaVUdUOBcKZrnJ9DXLPLNtVeww4T5wpKYMp6p751Q4SnEYkVpghKVuOzVUnQ0JRzEheaqeScIT7qEtamsYoIrKTjffK4YFWAhgmQp9YwbH6eyJDkZTDyNedEVI9OeuNxP+8VqrC805GY54qEuPJQ2HKoErgKCQYUEGwYkNNEBZU/xXiHtJJKB1lSYfgzK48T9yT6kXVvj0t166maRTBHtgHFnDAGaiBG1AHLsDgETyDV/BmPBkvxrvxMWktGNOZXfAHxucPaCGdyg==</latexit>

lim sup

T!1

E[R(T )]

log T

X

a

µ⇤ � µa

dkl(µa, µ⇤)� 1/e

<latexit sha1_base64="Lg8uIRzrB+yFEYLLM9Qmlw4fUa8=">AAACUXicbVFbaxQxGP12vNWtl60++hJchK3Y7YwI1reiCD5W2bWFzThksplt2FzGXIQl5DcWii/+EF98ULPTFbT1g4STc85H8p3UreDW5fm3Xnbt+o2bt7Zu97fv3L13f7Dz4KPV3lA2pVpoc1ITywRXbOq4E+ykNYzIWrDjevlmrR9/YcZyrSZu1bJSkoXiDafEJaoacCy4tL6twgQ7jblq3CrixhAasCTutK7D2zj7MJrsljFgoRdoErFgnxG2Xlbkj9N/erqX9orEMK/CUsRRd3rWKbt7xT6L1WCYj/Ou0FVQbMAQNnVUDc7xXFMvmXJUEGtnRd66MhDjOBUs9rG3rCV0SRZslqAiktkydJFE9CQxc9Rok5ZyqGP/7ghEWruSdXKux7SXtTX5P23mXXNQBq5a75iiFxc1XiCn0TpfNOeGUSdWCRBqeHoroqckpeTSL/RTCMXlka+C6fPxq3H+/sXw8PUmjS14BI9hBAW8hEN4B0cwBQpn8B1+wq/e196PDLLswpr1Nj0P4Z/Ktn8DuDm1fg==</latexit><latexit sha1_base64="Lg8uIRzrB+yFEYLLM9Qmlw4fUa8=">AAACUXicbVFbaxQxGP12vNWtl60++hJchK3Y7YwI1reiCD5W2bWFzThksplt2FzGXIQl5DcWii/+EF98ULPTFbT1g4STc85H8p3UreDW5fm3Xnbt+o2bt7Zu97fv3L13f7Dz4KPV3lA2pVpoc1ITywRXbOq4E+ykNYzIWrDjevlmrR9/YcZyrSZu1bJSkoXiDafEJaoacCy4tL6twgQ7jblq3CrixhAasCTutK7D2zj7MJrsljFgoRdoErFgnxG2Xlbkj9N/erqX9orEMK/CUsRRd3rWKbt7xT6L1WCYj/Ou0FVQbMAQNnVUDc7xXFMvmXJUEGtnRd66MhDjOBUs9rG3rCV0SRZslqAiktkydJFE9CQxc9Rok5ZyqGP/7ghEWruSdXKux7SXtTX5P23mXXNQBq5a75iiFxc1XiCn0TpfNOeGUSdWCRBqeHoroqckpeTSL/RTCMXlka+C6fPxq3H+/sXw8PUmjS14BI9hBAW8hEN4B0cwBQpn8B1+wq/e196PDLLswpr1Nj0P4Z/Ktn8DuDm1fg==</latexit><latexit sha1_base64="Lg8uIRzrB+yFEYLLM9Qmlw4fUa8=">AAACUXicbVFbaxQxGP12vNWtl60++hJchK3Y7YwI1reiCD5W2bWFzThksplt2FzGXIQl5DcWii/+EF98ULPTFbT1g4STc85H8p3UreDW5fm3Xnbt+o2bt7Zu97fv3L13f7Dz4KPV3lA2pVpoc1ITywRXbOq4E+ykNYzIWrDjevlmrR9/YcZyrSZu1bJSkoXiDafEJaoacCy4tL6twgQ7jblq3CrixhAasCTutK7D2zj7MJrsljFgoRdoErFgnxG2Xlbkj9N/erqX9orEMK/CUsRRd3rWKbt7xT6L1WCYj/Ou0FVQbMAQNnVUDc7xXFMvmXJUEGtnRd66MhDjOBUs9rG3rCV0SRZslqAiktkydJFE9CQxc9Rok5ZyqGP/7ghEWruSdXKux7SXtTX5P23mXXNQBq5a75iiFxc1XiCn0TpfNOeGUSdWCRBqeHoroqckpeTSL/RTCMXlka+C6fPxq3H+/sXw8PUmjS14BI9hBAW8hEN4B0cwBQpn8B1+wq/e196PDLLswpr1Nj0P4Z/Ktn8DuDm1fg==</latexit>

Page 13: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Recall geometric view of UCBoost:§ The closer to KL divergence, the better the regret§ Design a sequence of semi-distance functions to approximate dkl

UCBoost( ) Algorithm

Theorem 3. The regret of UCBoost( ) algorithm is

The complexity is per arm per round.

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

lim sup

T!1

E[R(T )]

log T

X

a

µ⇤ � µa

dkl(µa, µ⇤)� ✏

<latexit sha1_base64="fVUQo8C2usCmSULfOLzBcoJIwoA=">AAACVnicbVFdaxQxFM1MW1vXj4766EtwEbZil1kR1LeiCD5W2bWFzThksplt2HyMyY2whPxJ8aV/xRfNTlfQ1gsJJ+ecS3JPmk4KB2V5meU7u3u39g9uD+7cvXf/sHjw8LMz3jI+Y0Yae95Qx6XQfAYCJD/vLKeqkfysWb3b6GffuHXC6CmsO14putSiFYxCoupCEymU810dpgQMEbqFdSStpSwQReGiacL7OP80mh5VMRBplngaieRfMXFe1fSP0395dpz2msawqMNKxlF/et4rR8eEd05Io2NdDMtx2Re+CSZbMETbOq2L72RhmFdcA5PUufmk7KAK1IJgkscB8Y53lK3oks8T1FRxV4U+l4ifJmaBW2PT0oB79u+OQJVza9Uk52ZWd13bkP/T5h7a11UQuvPANbu6qPUSg8GbkPFCWM5ArhOgzIr0VswuaIoK0lcMUgiT6yPfBLMX4zfj8uPL4cnbbRoH6DF6gkZogl6hE/QBnaIZYugH+pntZLvZZfYr38v3r6x5tu15hP6pvPgN0cC2Qw==</latexit><latexit sha1_base64="fVUQo8C2usCmSULfOLzBcoJIwoA=">AAACVnicbVFdaxQxFM1MW1vXj4766EtwEbZil1kR1LeiCD5W2bWFzThksplt2HyMyY2whPxJ8aV/xRfNTlfQ1gsJJ+ecS3JPmk4KB2V5meU7u3u39g9uD+7cvXf/sHjw8LMz3jI+Y0Yae95Qx6XQfAYCJD/vLKeqkfysWb3b6GffuHXC6CmsO14putSiFYxCoupCEymU810dpgQMEbqFdSStpSwQReGiacL7OP80mh5VMRBplngaieRfMXFe1fSP0395dpz2msawqMNKxlF/et4rR8eEd05Io2NdDMtx2Re+CSZbMETbOq2L72RhmFdcA5PUufmk7KAK1IJgkscB8Y53lK3oks8T1FRxV4U+l4ifJmaBW2PT0oB79u+OQJVza9Uk52ZWd13bkP/T5h7a11UQuvPANbu6qPUSg8GbkPFCWM5ArhOgzIr0VswuaIoK0lcMUgiT6yPfBLMX4zfj8uPL4cnbbRoH6DF6gkZogl6hE/QBnaIZYugH+pntZLvZZfYr38v3r6x5tu15hP6pvPgN0cC2Qw==</latexit><latexit sha1_base64="fVUQo8C2usCmSULfOLzBcoJIwoA=">AAACVnicbVFdaxQxFM1MW1vXj4766EtwEbZil1kR1LeiCD5W2bWFzThksplt2HyMyY2whPxJ8aV/xRfNTlfQ1gsJJ+ecS3JPmk4KB2V5meU7u3u39g9uD+7cvXf/sHjw8LMz3jI+Y0Yae95Qx6XQfAYCJD/vLKeqkfysWb3b6GffuHXC6CmsO14putSiFYxCoupCEymU810dpgQMEbqFdSStpSwQReGiacL7OP80mh5VMRBplngaieRfMXFe1fSP0395dpz2msawqMNKxlF/et4rR8eEd05Io2NdDMtx2Re+CSZbMETbOq2L72RhmFdcA5PUufmk7KAK1IJgkscB8Y53lK3oks8T1FRxV4U+l4ifJmaBW2PT0oB79u+OQJVza9Uk52ZWd13bkP/T5h7a11UQuvPANbu6qPUSg8GbkPFCWM5ArhOgzIr0VswuaIoK0lcMUgiT6yPfBLMX4zfj8uPL4cnbbRoH6DF6gkZogl6hE/QBnaIZYugH+pntZLvZZfYr38v3r6x5tu15hP6pvPgN0cC2Qw==</latexit>

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

O(log(1/✏))<latexit sha1_base64="sm3cnt6JbbgeAl69/Bo+AmPzO44=">AAAB+3icbVBNS8NAEN3Ur1q/oj16WSxCe6mJCOqt6MWbFYwtNKFstpt26WY37G6EEOpf8eJBxat/xJv/xm2bg7Y+GHi8N8PMvDBhVGnH+bZKK6tr6xvlzcrW9s7unr1/8KBEKjHxsGBCdkOkCKOceJpqRrqJJCgOGemE4+up33kkUlHB73WWkCBGQ04jipE2Ut+u3tZ9JoZ198QniaJM8Eajb9ecpjMDXCZuQWqgQLtvf/kDgdOYcI0ZUqrnOokOciQ1xYxMKn6qSILwGA1Jz1COYqKCfHb8BB4bZQAjIU1xDWfq74kcxUplcWg6Y6RHatGbiv95vVRHF0FOeZJqwvF8UZQyqAWcJgEHVBKsWWYIwpKaWyEeIYmwNnlVTAju4svLxDttXjadu7Na66pIowwOwRGoAxecgxa4AW3gAQwy8AxewZv1ZL1Y79bHvLVkFTNV8AfW5w9iv5OB</latexit><latexit sha1_base64="sm3cnt6JbbgeAl69/Bo+AmPzO44=">AAAB+3icbVBNS8NAEN3Ur1q/oj16WSxCe6mJCOqt6MWbFYwtNKFstpt26WY37G6EEOpf8eJBxat/xJv/xm2bg7Y+GHi8N8PMvDBhVGnH+bZKK6tr6xvlzcrW9s7unr1/8KBEKjHxsGBCdkOkCKOceJpqRrqJJCgOGemE4+up33kkUlHB73WWkCBGQ04jipE2Ut+u3tZ9JoZ198QniaJM8Eajb9ecpjMDXCZuQWqgQLtvf/kDgdOYcI0ZUqrnOokOciQ1xYxMKn6qSILwGA1Jz1COYqKCfHb8BB4bZQAjIU1xDWfq74kcxUplcWg6Y6RHatGbiv95vVRHF0FOeZJqwvF8UZQyqAWcJgEHVBKsWWYIwpKaWyEeIYmwNnlVTAju4svLxDttXjadu7Na66pIowwOwRGoAxecgxa4AW3gAQwy8AxewZv1ZL1Y79bHvLVkFTNV8AfW5w9iv5OB</latexit><latexit sha1_base64="sm3cnt6JbbgeAl69/Bo+AmPzO44=">AAAB+3icbVBNS8NAEN3Ur1q/oj16WSxCe6mJCOqt6MWbFYwtNKFstpt26WY37G6EEOpf8eJBxat/xJv/xm2bg7Y+GHi8N8PMvDBhVGnH+bZKK6tr6xvlzcrW9s7unr1/8KBEKjHxsGBCdkOkCKOceJpqRrqJJCgOGemE4+up33kkUlHB73WWkCBGQ04jipE2Ut+u3tZ9JoZ198QniaJM8Eajb9ecpjMDXCZuQWqgQLtvf/kDgdOYcI0ZUqrnOokOciQ1xYxMKn6qSILwGA1Jz1COYqKCfHb8BB4bZQAjIU1xDWfq74kcxUplcWg6Y6RHatGbiv95vVRHF0FOeZJqwvF8UZQyqAWcJgEHVBKsWWYIwpKaWyEeIYmwNnlVTAju4svLxDttXjadu7Na66pIowwOwRGoAxecgxa4AW3gAQwy8AxewZv1ZL1Y79bHvLVkFTNV8AfW5w9iv5OB</latexit>

• For any , step-function approximation§ A sequence of points: for any§ For each k, step function:§ For each p, construct dynamic set§ Bisection search over step functions in D(p)

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

qk = 1� 1/(1 + ✏)k<latexit sha1_base64="qrcVjvbexW7RILjOOW9ZD8CdVBg=">AAAB/nicbVBNS8NAEN3Ur1q/ooIXL4tFqIg1EUE9CEUvHisYW2hj2Gw37ZLNJu5uhBJ78K948aDi1d/hzX/jts1BWx8MPN6bYWaenzAqlWV9G4WZ2bn5heJiaWl5ZXXNXN+4lXEqMHFwzGLR9JEkjHLiKKoYaSaCoMhnpOGHl0O/8UCEpDG/Uf2EuBHqchpQjJSWPHPr3gvP7QP7sGLvt0kiKYv53l3omWWrao0Ap4mdkzLIUffMr3YnxmlEuMIMSdmyrUS5GRKKYkYGpXYqSYJwiLqkpSlHEZFuNrp/AHe10oFBLHRxBUfq74kMRVL2I193Rkj15KQ3FP/zWqkKTt2M8iRVhOPxoiBlUMVwGAbsUEGwYn1NEBZU3wpxDwmElY6spEOwJ1+eJs5R9axqXR+Xaxd5GkWwDXZABdjgBNTAFagDB2DwCJ7BK3gznowX4934GLcWjHxmE/yB8fkD/Q2UXA==</latexit><latexit sha1_base64="qrcVjvbexW7RILjOOW9ZD8CdVBg=">AAAB/nicbVBNS8NAEN3Ur1q/ooIXL4tFqIg1EUE9CEUvHisYW2hj2Gw37ZLNJu5uhBJ78K948aDi1d/hzX/jts1BWx8MPN6bYWaenzAqlWV9G4WZ2bn5heJiaWl5ZXXNXN+4lXEqMHFwzGLR9JEkjHLiKKoYaSaCoMhnpOGHl0O/8UCEpDG/Uf2EuBHqchpQjJSWPHPr3gvP7QP7sGLvt0kiKYv53l3omWWrao0Ap4mdkzLIUffMr3YnxmlEuMIMSdmyrUS5GRKKYkYGpXYqSYJwiLqkpSlHEZFuNrp/AHe10oFBLHRxBUfq74kMRVL2I193Rkj15KQ3FP/zWqkKTt2M8iRVhOPxoiBlUMVwGAbsUEGwYn1NEBZU3wpxDwmElY6spEOwJ1+eJs5R9axqXR+Xaxd5GkWwDXZABdjgBNTAFagDB2DwCJ7BK3gznowX4934GLcWjHxmE/yB8fkD/Q2UXA==</latexit><latexit sha1_base64="qrcVjvbexW7RILjOOW9ZD8CdVBg=">AAAB/nicbVBNS8NAEN3Ur1q/ooIXL4tFqIg1EUE9CEUvHisYW2hj2Gw37ZLNJu5uhBJ78K948aDi1d/hzX/jts1BWx8MPN6bYWaenzAqlWV9G4WZ2bn5heJiaWl5ZXXNXN+4lXEqMHFwzGLR9JEkjHLiKKoYaSaCoMhnpOGHl0O/8UCEpDG/Uf2EuBHqchpQjJSWPHPr3gvP7QP7sGLvt0kiKYv53l3omWWrao0Ap4mdkzLIUffMr3YnxmlEuMIMSdmyrUS5GRKKYkYGpXYqSYJwiLqkpSlHEZFuNrp/AHe10oFBLHRxBUfq74kMRVL2I193Rkj15KQ3FP/zWqkKTt2M8iRVhOPxoiBlUMVwGAbsUEGwYn1NEBZU3wpxDwmElY6spEOwJ1+eJs5R9axqXR+Xaxd5GkWwDXZABdjgBNTAFagDB2DwCJ7BK3gznowX4934GLcWjHxmE/yB8fkD/Q2UXA==</latexit>

k � 0<latexit sha1_base64="zjB8ZM+xX8a89ew5FaeabITpWVs=">AAAB7XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lFUG9FLx4rGFtoQ9lsJ+2SzSbuboQS+iO8eFDx6v/x5r9x2+agrQ8GHu/NMDMvSAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGHosEYnqBFSj4BI9w43ATqqQxoHAdhDdTP32EyrNE3lvxin6MR1KHnJGjZXaUW+Ij8TtV2tu3Z2BLJNGQWpQoNWvfvUGCctilIYJqnW34abGz6kynAmcVHqZxpSyiA6xa6mkMWo/n507ISdWGZAwUbakITP190ROY63HcWA7Y2pGetGbiv953cyEl37OZZoZlGy+KMwEMQmZ/k4GXCEzYmwJZYrbWwkbUUWZsQlVbAiNxZeXiXdWv6q7d+e15nWRRhmO4BhOoQEX0IRbaIEHDCJ4hld4c1LnxXl3PuatJaeYOYQ/cD5/AAmojuc=</latexit><latexit sha1_base64="zjB8ZM+xX8a89ew5FaeabITpWVs=">AAAB7XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lFUG9FLx4rGFtoQ9lsJ+2SzSbuboQS+iO8eFDx6v/x5r9x2+agrQ8GHu/NMDMvSAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGHosEYnqBFSj4BI9w43ATqqQxoHAdhDdTP32EyrNE3lvxin6MR1KHnJGjZXaUW+Ij8TtV2tu3Z2BLJNGQWpQoNWvfvUGCctilIYJqnW34abGz6kynAmcVHqZxpSyiA6xa6mkMWo/n507ISdWGZAwUbakITP190ROY63HcWA7Y2pGetGbiv953cyEl37OZZoZlGy+KMwEMQmZ/k4GXCEzYmwJZYrbWwkbUUWZsQlVbAiNxZeXiXdWv6q7d+e15nWRRhmO4BhOoQEX0IRbaIEHDCJ4hld4c1LnxXl3PuatJaeYOYQ/cD5/AAmojuc=</latexit><latexit sha1_base64="zjB8ZM+xX8a89ew5FaeabITpWVs=">AAAB7XicbVBNS8NAEJ3Ur1q/qh69LBbBU0lFUG9FLx4rGFtoQ9lsJ+2SzSbuboQS+iO8eFDx6v/x5r9x2+agrQ8GHu/NMDMvSAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGHosEYnqBFSj4BI9w43ATqqQxoHAdhDdTP32EyrNE3lvxin6MR1KHnJGjZXaUW+Ij8TtV2tu3Z2BLJNGQWpQoNWvfvUGCctilIYJqnW34abGz6kynAmcVHqZxpSyiA6xa6mkMWo/n507ISdWGZAwUbakITP190ROY63HcWA7Y2pGetGbiv953cyEl37OZZoZlGy+KMwEMQmZ/k4GXCEzYmwJZYrbWwkbUUWZsQlVbAiNxZeXiXdWv6q7d+e15nWRRhmO4BhOoQEX0IRbaIEHDCJ4hld4c1LnxXl3PuatJaeYOYQ/cD5/AAmojuc=</latexit>

dks(p, q) = dkl(p, qk)1{q > qk}<latexit sha1_base64="4xCaW3sSJM6rdWw8EdJ4JYQrQ3c=">AAACDHicbZDLSsNAFIYnXmu9RV26GaxCC1ISEdSFUnTjsoKxhbaGyWTSDplcOjMRSsgLuPFV3LhQcesDuPNtnLRZaOsPA9/85xxmzu/EjAppGN/a3PzC4tJyaaW8ura+salvbd+JKOGYWDhiEW87SBBGQ2JJKhlpx5ygwGGk5fhXeb31QLigUXgrRzHpBagfUo9iJJVl6/uuLe79anw4rMFz6Nqpz7L8Zvs1s5sOLxR0M1uvGHVjLDgLZgEVUKhp619dN8JJQEKJGRKiYxqx7KWIS4oZycrdRJAYYR/1SUdhiAIieul4mwweKMeFXsTVCSUcu78nUhQIMQoc1RkgORDTtdz8r9ZJpHfaS2kYJ5KEePKQlzAoI5hHA13KCZZspABhTtVfIR4gjrBUAZZVCOb0yrNgHdXP6sbNcaVxWaRRArtgD1SBCU5AA1yDJrAABo/gGbyCN+1Je9HetY9J65xWzOyAP9I+fwCvH5pF</latexit><latexit sha1_base64="4xCaW3sSJM6rdWw8EdJ4JYQrQ3c=">AAACDHicbZDLSsNAFIYnXmu9RV26GaxCC1ISEdSFUnTjsoKxhbaGyWTSDplcOjMRSsgLuPFV3LhQcesDuPNtnLRZaOsPA9/85xxmzu/EjAppGN/a3PzC4tJyaaW8ura+salvbd+JKOGYWDhiEW87SBBGQ2JJKhlpx5ygwGGk5fhXeb31QLigUXgrRzHpBagfUo9iJJVl6/uuLe79anw4rMFz6Nqpz7L8Zvs1s5sOLxR0M1uvGHVjLDgLZgEVUKhp619dN8JJQEKJGRKiYxqx7KWIS4oZycrdRJAYYR/1SUdhiAIieul4mwweKMeFXsTVCSUcu78nUhQIMQoc1RkgORDTtdz8r9ZJpHfaS2kYJ5KEePKQlzAoI5hHA13KCZZspABhTtVfIR4gjrBUAZZVCOb0yrNgHdXP6sbNcaVxWaRRArtgD1SBCU5AA1yDJrAABo/gGbyCN+1Je9HetY9J65xWzOyAP9I+fwCvH5pF</latexit><latexit sha1_base64="4xCaW3sSJM6rdWw8EdJ4JYQrQ3c=">AAACDHicbZDLSsNAFIYnXmu9RV26GaxCC1ISEdSFUnTjsoKxhbaGyWTSDplcOjMRSsgLuPFV3LhQcesDuPNtnLRZaOsPA9/85xxmzu/EjAppGN/a3PzC4tJyaaW8ura+salvbd+JKOGYWDhiEW87SBBGQ2JJKhlpx5ygwGGk5fhXeb31QLigUXgrRzHpBagfUo9iJJVl6/uuLe79anw4rMFz6Nqpz7L8Zvs1s5sOLxR0M1uvGHVjLDgLZgEVUKhp619dN8JJQEKJGRKiYxqx7KWIS4oZycrdRJAYYR/1SUdhiAIieul4mwweKMeFXsTVCSUcu78nUhQIMQoc1RkgORDTtdz8r9ZJpHfaS2kYJ5KEePKQlzAoI5hHA13KCZZspABhTtVfIR4gjrBUAZZVCOb0yrNgHdXP6sbNcaVxWaRRArtgD1SBCU5AA1yDJrAABo/gGbyCN+1Je9HetY9J65xWzOyAP9I+fwCvH5pF</latexit>

D(p) = {dsq, dlb, dks : p qk exp(�✏/p)}<latexit sha1_base64="rgzx84ZrelvMkm8fb0ZmuvMEsbM=">AAACJHicbVDLSgMxFM34tr6qLt0Ei9CC1hkRfKAg6sKlgtVCpw6Z9LYNzWTSJCOWoT/jxl9x40LFhRu/xUztwteFcA7n3MvNPaHkTBvXfXdGRsfGJyanpnMzs3PzC/nFpSsdJ4pChcY8VtWQaOBMQMUww6EqFZAo5HAddk4y//oWlGaxuDQ9CfWItARrMkqMlYL8wWlRlg79tBGkuttft8DDDPRNZ1/6HLq4G3QG6MOdLG74IDXjsdiUJb8f5Atu2R0U/ku8ISmgYZ0H+Re/EdMkAmEoJ1rXPFeaekqUYZRDP+cnGiShHdKCmqWCRKDr6eDKPl6zSgM3Y2WfMHigfp9ISaR1LwptZ0RMW//2MvE/r5aY5m49ZUImBgT9WtRMODYxziLDDaaAGt6zhFDF7F8xbRNFqLHB5mwI3u+T/5LKVnmv7F5sF46Oh2lMoRW0iorIQzvoCJ2hc1RBFN2jR/SMXpwH58l5dd6+Wkec4cwy+lHOxydVs6TE</latexit><latexit sha1_base64="rgzx84ZrelvMkm8fb0ZmuvMEsbM=">AAACJHicbVDLSgMxFM34tr6qLt0Ei9CC1hkRfKAg6sKlgtVCpw6Z9LYNzWTSJCOWoT/jxl9x40LFhRu/xUztwteFcA7n3MvNPaHkTBvXfXdGRsfGJyanpnMzs3PzC/nFpSsdJ4pChcY8VtWQaOBMQMUww6EqFZAo5HAddk4y//oWlGaxuDQ9CfWItARrMkqMlYL8wWlRlg79tBGkuttft8DDDPRNZ1/6HLq4G3QG6MOdLG74IDXjsdiUJb8f5Atu2R0U/ku8ISmgYZ0H+Re/EdMkAmEoJ1rXPFeaekqUYZRDP+cnGiShHdKCmqWCRKDr6eDKPl6zSgM3Y2WfMHigfp9ISaR1LwptZ0RMW//2MvE/r5aY5m49ZUImBgT9WtRMODYxziLDDaaAGt6zhFDF7F8xbRNFqLHB5mwI3u+T/5LKVnmv7F5sF46Oh2lMoRW0iorIQzvoCJ2hc1RBFN2jR/SMXpwH58l5dd6+Wkec4cwy+lHOxydVs6TE</latexit><latexit sha1_base64="rgzx84ZrelvMkm8fb0ZmuvMEsbM=">AAACJHicbVDLSgMxFM34tr6qLt0Ei9CC1hkRfKAg6sKlgtVCpw6Z9LYNzWTSJCOWoT/jxl9x40LFhRu/xUztwteFcA7n3MvNPaHkTBvXfXdGRsfGJyanpnMzs3PzC/nFpSsdJ4pChcY8VtWQaOBMQMUww6EqFZAo5HAddk4y//oWlGaxuDQ9CfWItARrMkqMlYL8wWlRlg79tBGkuttft8DDDPRNZ1/6HLq4G3QG6MOdLG74IDXjsdiUJb8f5Atu2R0U/ku8ISmgYZ0H+Re/EdMkAmEoJ1rXPFeaekqUYZRDP+cnGiShHdKCmqWCRKDr6eDKPl6zSgM3Y2WfMHigfp9ISaR1LwptZ0RMW//2MvE/r5aY5m49ZUImBgT9WtRMODYxziLDDaaAGt6zhFDF7F8xbRNFqLHB5mwI3u+T/5LKVnmv7F5sF46Oh2lMoRW0iorIQzvoCJ2hc1RBFN2jR/SMXpwH58l5dd6+Wkec4cwy+lHOxydVs6TE</latexit>

Page 14: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Background and MotivationsqMulti-Armed Bandits FrameworkqStochastic Bandits At a GlanceqComplexity vs Optimality DilemmaqOur results Overview

• UCBoost AlgorithmqGeneric UCB AlgorithmqUCBoost(D) AlgorithmqUCBoost( ) Algorithm

• Numerical ResultsqExperiment SettingqRegret ResultsqComputation Results

• Conclusion

Outline

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

Page 15: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Average results over 10k independent runs of the algorithms• Bernoulli Scenario 1

§ 9 arms with = i/10§ Basic scenario with Bernoulli rewards

• Bernoulli Scenario 2§ 10 arms with§ Model the cases in online recommendations

• Beta Scenario§ 9 arms with Beta distributions, Beta(ai,2), where ai = i§ Another typical distribution with bounded support

Experiment Setting

µi<latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit><latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit><latexit sha1_base64="L/g2Nbif5/pHGY/b/yPoMjLzzs4=">AAAB63icbVA9SwNBEJ2LXzF+RS1tFoNgFS4iqF3QxjKCZwLJEfY2e8mS3b1jd04IR36DjYWKrX/Izn/jJrlCow8GHu/NMDMvSqWw6PtfXmlldW19o7xZ2dre2d2r7h882CQzjAcskYnpRNRyKTQPUKDkndRwqiLJ29H4Zua3H7mxItH3OEl5qOhQi1gwik4Keirri3615tf9Ochf0ihIDQq0+tXP3iBhmeIamaTWdht+imFODQom+bTSyyxPKRvTIe86qqniNsznx07JiVMGJE6MK41krv6cyKmydqIi16kojuyyNxP/87oZxpdhLnSaIddssSjOJMGEzD4nA2E4QzlxhDIj3K2EjaihDF0+FRdCY/nlvyQ4q1/V/bvzWvO6SKMMR3AMp9CAC2jCLbQgAAYCnuAFXj3tPXtv3vuiteQVM4fwC97HN0W8joU=</latexit>

µ1 = µ2 = µ3 = 0.01, µ4 = µ5 = µ6 = 0.02, µ7 = µ8 = µ9 = 0.05, µ10 = 0.1<latexit sha1_base64="tUQE/+iPBpBWwVVcmJw0RGywpIY=">AAACPnicbZBLTwIxFIU7PnF8jbp000hMXBAygyCwICG6cYmJPBIgpFMKNHQeaTsmZMI/c+NfcOfWjQs1bl3aKbNQ8CY9Oflub9p73JBRIW37xVhb39jc2s7smLt7+weH1tFxSwQRx6SJAxbwjosEYdQnTUklI52QE+S5jLTd6U3Sbz8QLmjg38tZSPoeGvt0RDGSCg2sVs+LBk4t0YLWy5qdt50cTHxRk5LWq4QXcmbiy5pUtFYTXlrw2LHnsAbtvDOwsgrrgqvGSU0WpNUYWM+9YYAjj/gSMyRE17FD2Y8RlxQzMjd7kSAhwlM0Jl1lfeQR0Y/1/nN4rsgQjgKuji+hpr8nYuQJMfNcddNDciKWewn8r9eN5KjSj6kfRpL4ePHQKGJQBjAJEw4pJ1iymTIIc6r+CvEEcYSlitxUITjLK6+aZiFfzdt3xWz9Ok0jA07BGbgADiiDOrgFDdAEGDyCV/AOPown4834NL4WV9eMdOYE/Cnj+wd5zald</latexit><latexit sha1_base64="tUQE/+iPBpBWwVVcmJw0RGywpIY=">AAACPnicbZBLTwIxFIU7PnF8jbp000hMXBAygyCwICG6cYmJPBIgpFMKNHQeaTsmZMI/c+NfcOfWjQs1bl3aKbNQ8CY9Oflub9p73JBRIW37xVhb39jc2s7smLt7+weH1tFxSwQRx6SJAxbwjosEYdQnTUklI52QE+S5jLTd6U3Sbz8QLmjg38tZSPoeGvt0RDGSCg2sVs+LBk4t0YLWy5qdt50cTHxRk5LWq4QXcmbiy5pUtFYTXlrw2LHnsAbtvDOwsgrrgqvGSU0WpNUYWM+9YYAjj/gSMyRE17FD2Y8RlxQzMjd7kSAhwlM0Jl1lfeQR0Y/1/nN4rsgQjgKuji+hpr8nYuQJMfNcddNDciKWewn8r9eN5KjSj6kfRpL4ePHQKGJQBjAJEw4pJ1iymTIIc6r+CvEEcYSlitxUITjLK6+aZiFfzdt3xWz9Ok0jA07BGbgADiiDOrgFDdAEGDyCV/AOPown4834NL4WV9eMdOYE/Cnj+wd5zald</latexit><latexit sha1_base64="tUQE/+iPBpBWwVVcmJw0RGywpIY=">AAACPnicbZBLTwIxFIU7PnF8jbp000hMXBAygyCwICG6cYmJPBIgpFMKNHQeaTsmZMI/c+NfcOfWjQs1bl3aKbNQ8CY9Oflub9p73JBRIW37xVhb39jc2s7smLt7+weH1tFxSwQRx6SJAxbwjosEYdQnTUklI52QE+S5jLTd6U3Sbz8QLmjg38tZSPoeGvt0RDGSCg2sVs+LBk4t0YLWy5qdt50cTHxRk5LWq4QXcmbiy5pUtFYTXlrw2LHnsAbtvDOwsgrrgqvGSU0WpNUYWM+9YYAjj/gSMyRE17FD2Y8RlxQzMjd7kSAhwlM0Jl1lfeQR0Y/1/nN4rsgQjgKuji+hpr8nYuQJMfNcddNDciKWewn8r9eN5KjSj6kfRpL4ePHQKGJQBjAJEw4pJ1iymTIIc6r+CvEEcYSlitxUITjLK6+aZiFfzdt3xWz9Ok0jA07BGbgADiiDOrgFDdAEGDyCV/AOPown4834NL4WV9eMdOYE/Cnj+wd5zald</latexit>

Page 16: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

Regret Results

Time 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

20

40

60

80

100

120

140

160UCB(d

h)

UCB1UCB(d

bq)

UCBoost({dbq

,dh,d

lb})

kl-UCB

Time0 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

50

100

150

200

250

300UCB(d

h)

UCB1UCB(d

bq)

UCBoost({dbq

,dh,d

lb})

kl-UCB

Time 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

20

40

60

80

100

120

140

160

180

200

220UCB(d

h)

UCB1UCB(d

bq)

UCBoost({dbq

,dh,d

lb})

kl-UCB

Time0 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

10

20

30

40

50

60

70

80

UCBoost({dbq

,dh,d

lb})

UCBoost(0.08)UCBoost(0.05)UCBoost(0.01)kl-UCB

(a) Bernoulli scenario 1

Time0 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

20

40

60

80

100

120

140

160

UCBoost({dbq

,dh,d

lb})

UCBoost(0.08)UCBoost(0.005)UCBoost(0.001)kl-UCB

(b) Bernoulli scenario 2

Time0 2000 4000 6000 8000 10000

Ave

rag

e r

eg

ret

0

20

40

60

80

100

120

140

UCBoost({dbq

,dh,d

lb})

UCBoost(0.08)UCBoost(0.05)UCBoost(0.01)kl-UCB

(c) Beta scenario

Figure 1: Regret of the various algorithms as a function of time in three scenarios.

Table 2: Average computational time for each arm per round of various algorithms.

Scenario kl-UCB UCBoost(✏) UCBoost(✏) UCBoost(✏) UCBoost({dbq

, dh

, dlb

}) UCB1✏ = 0.01(0.001) ✏ = 0.05(0.005) ✏ = 0.08

Bernoulli 1 933µs 7.67µs 6.67µs 5.78µs 1.67µs 0.31µsBernoulli 2 986µs 8.76µs 7.96µs 6.27µs 1.60µs 0.30µs

Beta 907µs 8.33µs 6.89µs 5.89µs 2.01µs 0.33µs

constants in the regret guarantees is bounded by 1/e. Thisresult also demonstrates the power of boosting in thatUCBoost({d

bq

, dh

, dlb

}) performs no worse than UCB(dh

)and UCB(d

bq

) in all cases.Third, UCBoost(✏) algorithm fills the gap between

UCBoost({dbq

, dh

, dlb

}) and kl-UCB, which is consistentwith the results in Bernoulli scenario 1. The regret ofUCBoost(✏) matches with that of kl-UCB when ✏ = 0.001.Compared to the results in Bernoulli scenario 1, we needmore accurate approximation for UCBoost when the expec-tations are lower. However, this accuracy is still moderatecompared to the requirements in numerical methods for kl-UCB.

6.3 Beta Scenario

Our results in the previous sections hold for any distributionswith bounded support. In this scenario, we consider K = 9

arms with Beta distributions. More precisely, each arm 1 i 9 is associated with Beta(↵

i

,�i

) distribution such that↵i

= i and �i

= 2. Note that the expectation of Beta(↵i

,�i

)is ↵

i

/(↵i

+ �i

). The regret results of various algorithms areshown in Figure 1c. The results are consistent with that ofBernoulli scenario 1.

6.4 Computational timeWe obtain the average running time for each arm per round bymeasuring the total computational time of 10, 000 indepen-dent runs of each algorithms in each scenario. Note that kl-UCB is implemented by the py/maBandits package developedby Cappe et al. [2012], which sets accuracy to 10

�5 for theNewton method. The average computational time results areshown in Table 2. The average running time of UCBoost(✏)that matches the regret of kl-UCB is no more than 1% of thetime of kl-UCB.

7 ConclusionIn this work, we introduce the generic UCB algorithm andprovide the regret guarantee for any UCB algorithm gener-ated by a kl-dominated strong semi-distance function. Asa by-product, we find two closed-form UCB algorithms,UCB(d

h

) and UCB(dbq

), that are alternatives to the tradi-tional UCB1 algorithm. Then, we propose a boosting frame-work, UCBoost, to boost any set of generic UCB algorithms.We find a specific finite set D, such that UCBoost(D) en-joys O(1) complexity for each arm per round as well asregret guarantee that is 1/e-close to the kl-UCB algorithm.Finally, we propose an approximation-based UCBoost algo-rithm, UCBoost(✏), that enjoys regret guarantee ✏-close to that

Page 17: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Computational Costs per arm per round

§ UCBoost(D) always outperforms UCB1 with same scale of computational cost§ 1% computation cost of kl-UCB to achieve competitive regret§ 100x faster response time or 100x capacity of arms

Computation Results(a) Bernoulli scenario 1 (b) Bernoulli scenario 2 (c) Beta scenario

Figure 1: Regret of the various algorithms as a function of time in three scenarios.

Table 2: Average computational time for each arm per round of various algorithms.

Scenario kl-UCB UCBoost(✏) UCBoost(✏) UCBoost(✏) UCBoost({dbq, dh, dlb}) UCB1✏ = 0.01(0.001) ✏ = 0.05(0.005) ✏ = 0.08

Bernoulli 1 933µs 7.67µs 6.67µs 5.78µs 1.67µs 0.31µsBernoulli 2 986µs 8.76µs 7.96µs 6.27µs 1.60µs 0.30µs

Beta 907µs 8.33µs 6.89µs 5.89µs 2.01µs 0.33µs

because the Hellinger distance between µa and µ⇤ is muchlarger than the l

2

distance in this scenario. So UCB(dh) en-joys better regret performance than UCB1 in this scenario.

Second, UCBoost({dbq, dh, dlb}) performs as ex-pected and is between UCB1 and kl-UCB. Althoughthe gap between UCB1 and kl-UCB becomes largerwhen compared to Bernoulli scenario 1, the gap betweenUCBoost({dbq, dh, dlb}) and kl-UCB remains. This ver-ifies our result in Corollary 3 that the gap between theconstants in the regret guarantees is bounded by 1/e. Thisresult also demonstrates the power of boosting in thatUCBoost({dbq, dh, dlb}) performs no worse than UCB(dh)and UCB(dbq) in all cases.

Third, UCBoost(✏) algorithm fills the gap betweenUCBoost({dbq, dh, dlb}) and kl-UCB, which is consistentwith the results in Bernoulli scenario 1. The regret ofUCBoost(✏) matches with that of kl-UCB when ✏ = 0.001.Compared to the results in Bernoulli scenario 1, we needmore accurate approximation for UCBoost when the expecta-tions are lower. However, this accuracy is moderate comparedto the requirements in numerical methods for kl-UCB.Beta Scenario. Our results in the previous sections hold forany distributions with bounded support. In this scenario, weconsider K = 9 arms with Beta distributions. More precisely,each arm 1 i 9 is associated with Beta(↵i,�i) distribu-tion such that ↵i = i and �i = 2. Note that the expectationof Beta(↵i,�i) is ↵i/(↵i + �i). The regret results shown in

Figure 1c are consistent with that of Bernoulli scenario 1.Computational time. We obtain the average running time foreach arm per round by measuring the total computational timeof 10, 000 independent runs of each algorithms in each sce-nario. Note that kl-UCB is implemented by the py/maBanditspackage developed by Cappe et al. [2012], which sets accu-racy to 10

�5 for the Newton method. The average computa-tional time results are shown in Table 2. The average runningtime of UCBoost(✏) that matches the regret of kl-UCB is nomore than 1% of the time of kl-UCB.

5 ConclusionIn this work, we introduce the generic UCB algorithm andprovide the regret guarantee for any UCB algorithm gener-ated by a kl-dominated strong semi-distance function. Then,we propose a boosting framework, UCBoost, to boost any setof generic UCB algorithms. We find a specific finite set D,such that UCBoost(D) enjoys O(1) complexity for each armper round as well as regret guarantee that is 1/e-close to thekl-UCB algorithm. Finally, we propose an approximation-based UCBoost algorithm, UCBoost(✏), that enjoys regretguarantee ✏-close to that of kl-UCB as well as O(log(1/✏))complexity for each arm per round. This algorithm bridgesthe regret guarantee to the computational complexity, thus of-fering an efficient trade-off between regret performance andcomplexity for practitioners. By experiments, we show thatUCBoost(✏) can achieve the same regret performance as stan-

Page 18: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

• Generic UCB algorithm§ New alternatives to UCB1

• Two recipes for complexity vs optimality dilemma§ UCBoost(D) algorithm: bounded gap, O(1) complexity§ UCBoost( ) algorithm: -gap, O(log(1/ ) complexity

• A boosting framework§ Design of UCB algorithm reduces to finding new semi-distance functions§ Try your own semi-distance functions

Conclusion

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

✏<latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit><latexit sha1_base64="ro97phRfhQVDYY93qu+r3a4UEZ4=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lEUG9FLx4rGFtoQ9lsJ+3SzSbuboQS+ie8eFDx6u/x5r9x2+agrQ8GHu/NMDMvTAXXxnW/ndLK6tr6RnmzsrW9s7tX3T940EmmGPosEYlqh1Sj4BJ9w43AdqqQxqHAVji6mfqtJ1SaJ/LejFMMYjqQPOKMGiu1u5hqLhLZq9bcujsDWSZeQWpQoNmrfnX7CctilIYJqnXHc1MT5FQZzgROKt1MY0rZiA6wY6mkMeogn907ISdW6ZMoUbakITP190ROY63HcWg7Y2qGetGbiv95ncxEl0HOZZoZlGy+KMoEMQmZPk/6XCEzYmwJZYrbWwkbUkWZsRFVbAje4svLxD+rX9Xdu/Na47pIowxHcAyn4MEFNOAWmuADAwHP8ApvzqPz4rw7H/PWklPMHMIfOJ8/t/GP8w==</latexit>

Page 19: UCBoost: A Boosting Approach to Tame Complexity and ... · Complexity and Optimality for Stochastic Bandits Fang Liu1, SinongWang1, SwapnaBuccapatnam2andNess Shroff1 1The Ohio State

Thanks!

End