“Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. ·...
Transcript of “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. ·...
![Page 1: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/1.jpg)
“Multi-players Bandit Algorithmsfor Internet of Things Networks”
By Lilian Besson
PhD defense at CentraleSupélec (Rennes)
Wednesday 20th of November, 2019
Supervisors: Prof. Christophe Moy at SCEE team, IETR & CentraleSupélec Dr. Émilie Kaufmann at SequeL team, CNRS & Inria, in Lille
![Page 2: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/2.jpg)
Introduction:
Spectrum issuesin wireless networks
Ref: Chapter 1 of my thesis.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 2/ 52
![Page 3: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/3.jpg)
All spectrum is allocated to different applications
But all zones are not always used everywhere
What if we could dynamically use the (most) empty channels?
Free
United States of North America, Department of Commerce, © 16
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 3/ 52
.Wireless networks
![Page 4: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/4.jpg)
Wireless networks. . .
We focus on Internet of Things networks (IoT) in unlicensed bands.→ networks with decentralized access. . .→ many wireless devices access a wireless network
served from one access pointthe base station is not affecting devices to radio resources. . .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 52
.Target of this study
![Page 5: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/5.jpg)
Wireless networks. . .
We focus on Internet of Things networks (IoT) in unlicensed bands.→ networks with decentralized access. . .→ many wireless devices access a wireless network
served from one access pointthe base station is not affecting devices to radio resources. . .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 52
.Target of this study
![Page 6: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/6.jpg)
Main constraints
decentralized: devices initiate transmission
can be in unlicensed radio bands
massive number of devices
long range
ultra-low power devices
low duty cycle
low data rate
Images from http://IBM.com/blogs/internet-of-things/what-is-the-iot and http://www.globalsign.com/en/blog/
connected-cows-and-crop-control-to-drones-the-internet-of-things-is-rapidly-improving-agriculture/
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 5/ 52
.The “Internet of Things”
![Page 7: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/7.jpg)
Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52
.Main questions
![Page 8: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/8.jpg)
Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .
Goal: increase the battery life of IoT devices
Fight the spectrum scarcity issue by using the spectrum moreefficiently than a static or uniformly random allocation
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52
.Main questions
![Page 9: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/9.jpg)
Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .
Goal: increase the battery life of IoT devices
Fight the spectrum scarcity issue by using the spectrum moreefficiently than a static or uniformly random allocation
Main solutions !
Yes we can!
By letting the radio devicesbecome “intelligent”
With MAB algorithms !
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52
.Main questions
![Page 10: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/10.jpg)
Outline of thispresentation
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 7/ 52
![Page 11: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/11.jpg)
Chapter 1Introduction
Chapter 2The Stochastic
Multi-Armed Bandit models
Chapter 3SMPyBandits: simulation
library for MAB
Chapter 4Online selection
of the best algorithm
Chapter 5Two MAB modelsfor IoT networks
Chapter 6Multi-players
Multi-Armed Bandits
Chapter 7Piece-Wise StationaryMulti-Armed Bandits
Chapter 8General Conclusion
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 8/ 52
.Contributions of my thesis highlighted today
![Page 12: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/12.jpg)
Introduction. Spectrum issues in wireless networks
Part I. Selfish MAB learning in a new model of IoT network
Part II. Two tractable problems extending the classical bandit multi-player bandits in stationary settings single-player bandits in piece-wise stationary settings
Conclusion and perspectives
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 9/ 52
.Outline of this presentation
![Page 13: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/13.jpg)
Part I.
Selfish MAB Learning
in IoT Networks
Ref: Chapter 5 of my thesis, and [Bonnefoi, Besson et al, 17].
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 10/ 52
![Page 14: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/14.jpg)
We control a lot of IoT devices
We want to insert them in an already crowded wireless network
Within a protocol slotted in time and frequency
Each device / has a low duty cycle ex: few messages per day
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 52
.We want
![Page 15: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/15.jpg)
We control a lot of IoT devices
We want to insert them in an already crowded wireless network
Within a protocol slotted in time and frequency
Each device / has a low duty cycle ex: few messages per day
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 52
.We want
![Page 16: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/16.jpg)
Discrete time t ∈ N∗ and K radio channels (e.g., 10) (known)
Chosen protocol: uplink messages ր followed by acknowledgements ց[Bonnefoi, Besson et al, 17], Sec.5.2
D dynamic devices trying to access the network independently
S = S1 + · · · + SK static devices occupying the network:S1, . . . , SK in each channel 1, . . . , K (unknown)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 12/ 52
.A new model for IoT networks
![Page 17: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/17.jpg)
1st case: Successful transmission if no collision on uplink messages ր !
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 52
.Protocol: decentralized access with Ack. mode
![Page 18: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/18.jpg)
2nd case: Failed transmission if collision on uplink messages ր. . .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 52
.Protocol: decentralized access with Ack. mode
![Page 19: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/19.jpg)
Emission model for IoT devices with low duty cycle
Each device / has the same low emission probability:each step, each device sends a packet with probability p
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52
.Our model of IoT devices [Bonnefoi, Besson et al, 17]
![Page 20: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/20.jpg)
Emission model for IoT devices with low duty cycle
Each device / has the same low emission probability:each step, each device sends a packet with probability p
Background stationary ambiant traffic
Each static device uses only one channel (Sk devices in channel k)
Their repartition is fixed in time
=⇒ This surrounding traffic is disturbing the dynamic devices
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52
.Our model of IoT devices [Bonnefoi, Besson et al, 17]
![Page 21: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/21.jpg)
Emission model for IoT devices with low duty cycle
Each device / has the same low emission probability:each step, each device sends a packet with probability p
Background stationary ambiant traffic
Each static device uses only one channel (Sk devices in channel k)
Their repartition is fixed in time
=⇒ This surrounding traffic is disturbing the dynamic devices
Dynamic radio reconfiguration
Dynamic device decide the channel to use to send their packets
They all have memory and computational capacity to implementsmall decision algorithms
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52
.Our model of IoT devices [Bonnefoi, Besson et al, 17]
![Page 22: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/22.jpg)
Goal minimize packet loss ratio (max = number of received Ack)
in a finite-space discrete-time Decision Making Problem
Baseline (naive solution)
Purely random (uniform) spectrum access for the D dynamic devices .
A possible solution
Embed a decentralized Multi-Armed Bandit algorithm, runningindependently on each dynamic device .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 15/ 52
.Problem
![Page 23: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/23.jpg)
If an oracle can affect Dk dynamic devices to channel k , thesuccessful transmission probability of the entire network is
P(success|sent) =K∑
k=1
(1 − p)Dk −1
︸ ︷︷ ︸Dk −1 others
× (1 − p)Sk
︸ ︷︷ ︸No static device
× Dk/D︸ ︷︷ ︸Sent in channel k
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 52
.1) Oracle centralized strategy [Bonnefoi, Besson et al, 17]
![Page 24: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/24.jpg)
If an oracle can affect Dk dynamic devices to channel k , thesuccessful transmission probability of the entire network is
P(success|sent) =K∑
k=1
(1 − p)Dk −1
︸ ︷︷ ︸Dk −1 others
× (1 − p)Sk
︸ ︷︷ ︸No static device
× Dk/D︸ ︷︷ ︸Sent in channel k
The oracle has to solve this optimization problem:
arg maxD1,...,DK
K∑k=1
Dk(1 − p)Sk +Dk −1
such thatK∑
k=1
Dk = D and Dk ≥ 0, ∀1 ≤ k ≤ K .
Contribution: a (numerical) solver for this quasi-convex optimizationproblem, with Lagrange multipliers.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 52
.1) Oracle centralized strategy [Bonnefoi, Besson et al, 17]
![Page 25: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/25.jpg)
=⇒ This oracle strategy has very good performance, as it maximizes thetransmission rate of all the D dynamic devices
But unrealistic
But not achievable in practice!
because there is no centralized supervision!
and (S1, . . . , SK ) are unknown!
We propose a realistic decentralized approach, with bandits!
MachineLearning
ReinforcementLearning
Multi-ArmedBandits
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 17/ 52
.1) Oracle centralized strategy
![Page 26: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/26.jpg)
It’s an old name for a casino machine !
© Dargaud 1981, Lucky Luke tome 18,.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 18/ 52
.Hum, what is a (one-armed) bandit?
![Page 27: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/27.jpg)
A player tries to collect rewards when playing a K -armed bandit game.
At each round t ∈ 1, . . . , T player chooses an arm A(t) ∈ 1, . . . , K the arm generates an i.i.d. reward rA(t)(t) ∼ νA(t)
Ex: from a Bernoulli distribution νk = B(µk)
player observes the reward rA(t)(t)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 52
.Stochastic Multi-Armed Bandit formulation
![Page 28: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/28.jpg)
A player tries to collect rewards when playing a K -armed bandit game.
At each round t ∈ 1, . . . , T player chooses an arm A(t) ∈ 1, . . . , K the arm generates an i.i.d. reward rA(t)(t) ∼ νA(t)
Ex: from a Bernoulli distribution νk = B(µk)
player observes the reward rA(t)(t)
Goal (Reinforcement Learning)
Maximize the sum reward or its expectation
maxA
T∑
t=1
rA(t) or maxA E
[T∑
t=1
rA(t)
].
[Bubeck, 12], [Lattimore & Szepesvári, 19], [Slivkins, 19]
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 52
.Stochastic Multi-Armed Bandit formulation
![Page 29: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/29.jpg)
A dynamic device tries to collect rewards when transmitting:
it transmits following a random Bernoulli process(ie. probability p of transmitting at each round t)
it chooses a channel A(τ) ∈ 1, . . . , K (= arm ) if Ack (no collision) =⇒ reward rA(τ) = 1 (successful transm.!) if collision (no Ack) =⇒ reward rA(τ) = 0 (failed transmission!)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 52
.2) Pseudo MAB formulation of our IoT problem
![Page 30: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/30.jpg)
A dynamic device tries to collect rewards when transmitting:
it transmits following a random Bernoulli process(ie. probability p of transmitting at each round t)
it chooses a channel A(τ) ∈ 1, . . . , K (= arm ) if Ack (no collision) =⇒ reward rA(τ) = 1 (successful transm.!) if collision (no Ack) =⇒ reward rA(τ) = 0 (failed transmission!)
Goal: Maximize transmission rate ≡ maximize cumulated rewards
It is not a stochastic Multi-Armed Bandit problem
It looks like a MAB but the environment is not stochastic or stationary
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 52
.2) Pseudo MAB formulation of our IoT problem
![Page 31: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/31.jpg)
A dynamic device keeps τ number of sent packets1 For the first K activations (τ = 1, . . . , K ), try each channel once.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52
.2) Upper Confidence Bound algorithm [Auer et al, 02]
![Page 32: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/32.jpg)
A dynamic device keeps τ number of sent packets1 For the first K activations (τ = 1, . . . , K ), try each channel once.2 Then for the next steps t:
With probability p, the device is active (τ := τ + 1)
Compute the index UCBk(τ) :=
Mean µk (τ)︷ ︸︸ ︷Xk(τ)Nk(τ)
+
Confidence Bonus︷ ︸︸ ︷√log(τ)2Nk(τ)
,
Choose channel A(τ) = arg maxk
UCBk(τ),
Observe reward rA(τ)(τ) from arm A(τ) Update Nk(τ + 1) nb selections of channel k Update Xk(τ) nb of successful transmissions
Wait for next message. . . (mean waiting time ≃ 1/p)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52
.2) Upper Confidence Bound algorithm [Auer et al, 02]
![Page 33: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/33.jpg)
1 For any dynamic device , for any round t: With probability p, the device is active (τ := τ + 1) Play UCB algorithm. . . [Auer et al, 02] Wait for next message. . . (mean waiting time ≃ 1/p)
Problem 1: multiple dynamic devices
The collisions between dynamic devices are not stochastic!
Problem 2: random activation times τ? Devices transmits only with probability p at each time t
(following its Bernoulli activation pattern)
The times τ are not the global time indexes t (synchronized clock) !
=⇒ These two problems make the model hard to analyze !
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52
.2) Upper Confidence Bound algorithm [Auer et al, 02]
![Page 34: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/34.jpg)
K = 10 channels ,
S + D = 10000 devices in total,
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52
.Experimental setting: simulation parameters
![Page 35: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/35.jpg)
K = 10 channels ,
S + D = 10000 devices in total,
p = 10−3 probability of emission,
Horizon T = 105 total time slots (avg. ≃ 100 messages / device),
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52
.Experimental setting: simulation parameters
![Page 36: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/36.jpg)
K = 10 channels ,
S + D = 10000 devices in total,
p = 10−3 probability of emission,
Horizon T = 105 total time slots (avg. ≃ 100 messages / device),
We change the proportion of dynamic devices D / (S + D ),
For one example of repartition of (S1, . . . , SK ) static devices .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52
.Experimental setting: simulation parameters
![Page 37: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/37.jpg)
Number of slots ×1052 4 6 8 10
Successfu
l tr
ansm
issio
n r
ate
0.82
0.83
0.84
0.85
0.86
0.87
0.88
0.89
0.9
0.91
UCBThompson-samplingOptimalGood sub-optimalRandom
10% of dynamic devices . Gives 7% of gain.[Bonnefoi, Besson et al, 17], Sec.5.2
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 23/ 52
.One result for 10% of dynamic devices
![Page 38: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/38.jpg)
Proportion of dynamic devices (%)0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Gain
com
pare
d to r
andom
channel sele
ction
-0.02
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
Optimal strategyUCB
1, α=0.5
Thomson-sampling
The MAB selfish learning is almost optimal, for any proportion ofdynamic devices , after a short learning time.
In this example, it gives up-to 16% gain over the naive approach!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 24/ 52
.Growing proportion of dynamic devices D/(S + D)
![Page 39: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/39.jpg)
We developed a realistic demonstration using USRP boards and GNURadio, as a proof-of-concept in a “toy” IoT network.
[Bonnefoi et al, ICT 18], [Besson et al, WCNC 19], Ch.5.3 and video published on YouTu.be/HospLNQhcMk
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 25/ 52
.We implemented this with real hardware (1/3)
![Page 40: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/40.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 26/ 52
.Using USRP board to simulate IoT devices (2/3)
![Page 41: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/41.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 27/ 52
.GNU Radio for the UI of the demo (3/3)
![Page 42: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/42.jpg)
It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .
Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K
Dor D ≤ K
pgives best performance
Hyp 2: we assumed a stationary background traffic . . .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52
.From practice to theory
![Page 43: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/43.jpg)
It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .
Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K
Dor D ≤ K
pgives best performance
Hyp 2: we assumed a stationary background traffic . . .
Goal: obtain theoretical result for our proposed model of IoT networks,and guarantees about the observed behavior of Selfish MAB learning.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52
.From practice to theory
![Page 44: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/44.jpg)
It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .
Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K
Dor D ≤ K
pgives best performance
Hyp 2: we assumed a stationary background traffic . . .
Goal: obtain theoretical result for our proposed model of IoT networks,and guarantees about the observed behavior of Selfish MAB learning.
We can study theoretically two more specific models
Model 1: multi-player bandits: devices are always activatedie. p = 1 in their random activation process =⇒ D = M ≤ K
p= K
Model 2: non-stationary bandits (for one device )
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52
.From practice to theory
![Page 45: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/45.jpg)
Part II.
Theoretical analysis oftwo relaxed models
Ref: Chapters 6 and 7 of my thesisand [Besson & Kaufmann, 18] and [Besson et al, 19].
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 29/ 52
![Page 46: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/46.jpg)
Theoretical analysis oftwo relaxed models
Multi-player banditsRef: Chapter 6 of my thesis, and [Besson & Kaufmann, 18].
Piece-wise stationary bandits
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 30/ 52
![Page 47: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/47.jpg)
M ≥ 2 players playing the same K -armed bandit (2 ≤ M ≤ K )they are all activated at each time step, ie. p = 1
At round t ∈ 1, . . . , T:
player m selects arm Amt ; then this arm generates sAm
t ,t ∈ 0, 1 and the reward is computed as
rm,t =
sAm
t ,t if no other player chose the same arm0 else (= COLLISION)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 31/ 52
.Multi-players bandits: setup
![Page 48: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/48.jpg)
M ≥ 2 players playing the same K -armed bandit (2 ≤ M ≤ K )they are all activated at each time step, ie. p = 1
At round t ∈ 1, . . . , T:
player m selects arm Amt ; then this arm generates sAm
t ,t ∈ 0, 1 and the reward is computed as
rm,t =
sAm
t ,t if no other player chose the same arm0 else (= COLLISION)
Goal
maximize centralized (sum) rewardsM∑
m=1
T∑t=1
rm,t
. . . without (explicit) communication between players
trade-off: exploration / exploitation / and collisions !
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 31/ 52
.Multi-players bandits: setup
![Page 49: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/49.jpg)
Different observation models: players observe sAmt ,t and/or rm,t
# 1: “Listen before talk” [Liu & Zhao, 10], [Jouini et al. 10], [Anandkumar et al. 11]
Good model for Opportunistic Spectrum Access (OSA)
First do sensing, attempt of transmission if no Primary User (PU),possible collisions with other Secondary Users (SU).
Feedback model: observe first sAm
t ,t , if sAm
t ,t = 1, transmit and then observe the joint rm,t , else don’t transmit and don’t observe a reward.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 32/ 52
.Multi-Players bandits for Cognitive Radios
![Page 50: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/50.jpg)
# 2: “Talk and maybe collide” [Besson & Kaufmann, 18]
Good model for Internet of Things (IoT)
Do not do any sensing, just transmit, and wait for anacknowledgment before any next message.
Feedback model: observe only the joint information rm,t , no collision if rm,t 6= 0, but cannot distinguish between collision or zero reward if rm,t = 0.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 33/ 52
.M-P bandits for Cognitive Radios: proposed models
![Page 51: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/51.jpg)
# 2: “Talk and maybe collide” [Besson & Kaufmann, 18]
Good model for Internet of Things (IoT)
Do not do any sensing, just transmit, and wait for anacknowledgment before any next message.
Feedback model: observe only the joint information rm,t , no collision if rm,t 6= 0, but cannot distinguish between collision or zero reward if rm,t = 0.
# 3: “Observe collision then talk?” [Besson & Kaufmann, 18], [Boursier et al, 19]
A third “hybrid” model studied by several recent papers, following ourwork
Feedback model: first check if collision, then if not collision, receive joint reward rm,t .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 33/ 52
.M-P bandits for Cognitive Radios: proposed models
![Page 52: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/52.jpg)
Hypothesis: arms sorted by decreasing mean: µ1 ≥ µ2 ≥ · · · ≥ µK
Rµ(A, T ) :=
(M∑
k=1
µk
)T
︸ ︷︷ ︸oracle total reward
−EAµ
[T∑
t=1
M∑
m=1
rm,t
]
Regret decomposition [Besson & Kaufmann, 18]
Rµ(A, T ) =K∑
k=M+1
(µM − µk)E[Nk(T )]
+M∑
k=1
(µk − µM) (T − E[Nk(T )]) +K∑
k=1
µkE[Ck(T )].
Nk(T ) total number of selections of arm k
Ck(T ) total number of collisions experienced on arm k
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 34/ 52
.Regret for multi-player bandits (M players on K arms)
![Page 53: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/53.jpg)
Regret decomposition [Besson & Kaufmann, 18]
Rµ(A, T ) ≤ cstK∑
k=M+1
E [Nk(T )] + cst’M∑
k=1
E [Ck(T )] .
A good algorithm has to control both
the number of selections of sub-optimal arms→ with a good classical bandit policy: like kl-UCB
the number of collisions on optimal arms→ with a good orthogonalization procedure
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 34/ 52
.Regret for multi-player bandits (M players on K arms)
![Page 54: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/54.jpg)
At round t, player m uses his past sensing information to:
compute an Upper Confidence Bound for each mean µk , UCBmk (t)
use the UCBs to estimate the M best arms
Mm(t) := arms with M largest UCBmk (t)
Two simple ideas: inspired by Musical Chair [Rosenski et al. 16]
always pick an arm estimated as “good” Am(t) ∈ Mm(t − 1)
try not to switch arm too often
σm(t) := player m is “fixed” at the end of round t
Other UCB-based algorithms: TDFS [Lui and Zhao, 10],Rho-Rand [Anandkumar et al., 11], Selfish [Bonnefoi, Besson et al., 17]
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 35/ 52
.The MC-Top-M algorithm (for the OSA case)
![Page 55: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/55.jpg)
(0) Start t = 0
Not fixed, σm(t) (2) Cm(t), Am(t) ∈ Mm(t)
(3) Am(t) /∈ Mm(t)
Fixed, σm(t)
(1) Cm(t), Am(t) ∈ Mm(t)
(4) Am(t) ∈ Mm(t)
(5) Am(t) /∈ Mm(t)
Sketch of the proof to bound number of collisions
any sequence of transitions (2) has constant length
O(log T ) number of transitions (3) and (5), by kl-UCB
=⇒ player m is fixed, for almost all rounds (O(T − log T ) times)
nb of collisions ≤ M× nb of collisions of non fixed players
=⇒ nb of collisions = O(log T ) & O(log(T )) sub-optimal selections (4)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 36/ 52
.The MC-Top-M algorithm (for the OSA case)
![Page 56: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/56.jpg)
MC-Top-M with kl-based confidence intervals [Cappé et al. 13]UCBm
k (t) = max q : Nmk (t)kl (µm
k (t), q) ≤ ln(t) ,
where kl(x , y) = KL (B(x), B(y)) = x ln(
xy
)+ (1 − x) ln
(1−x1−y
).
Control of the sub-optimal selections (state-of-the-art)
For all sub-optimal arms k ∈ M + 1, . . . , K,
E[Nmk (T )] ≤ ln(T )
kl(µk , µM)+ Cµ
√ln(T ).
Control of the collisions (new result)
E
[K∑
k=1
Ck(T )
]≤ M2
∑
a,b:µa<µb
2M + 1kl(µa, µb)
ln(T ) + O(ln T ).
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 37/ 52
.Theoretical results for MC-Top-M
![Page 57: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/57.jpg)
MC-Top-M with kl-based confidence intervals [Cappé et al. 13]UCBm
k (t) = max q : Nmk (t)kl (µm
k (t), q) ≤ ln(t) ,
where kl(x , y) = KL (B(x), B(y)) = x ln(
xy
)+ (1 − x) ln
(1−x1−y
).
Control of the sub-optimal selections (state-of-the-art)
For all sub-optimal arms k ∈ M + 1, . . . , K,
E[Nmk (T )] ≤ ln(T )
kl(µk , µM)+ Cµ
√ln(T ).
logarithmic regret =⇒ Rµ(A, T ) = O((MCM,µ + M2C6) log(T ))
Control of the collisions (new result)
E
[K∑
k=1
Ck(T )
]≤ M2
∑
a,b:µa<µb
2M + 1kl(µa, µb)
ln(T ) + O(ln T ).
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 37/ 52
.Theoretical results for MC-Top-M
![Page 58: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/58.jpg)
0 2000 4000 6000 8000 10000Time steps t=1. . . T, horizon T=10000
0
200
400
600
800
1000
1200
1400
1600
Cum
ulat
ed n
umbe
r of c
ollis
ions
on
all a
rms
Multi-players M=9 : Cumulated number of collisions, averaged 200 times9 arms: [B(0.1) ∗ , B(0.2) ∗ , B(0.3) ∗ , B(0.4) ∗ , B(0.5) ∗ , B(0.6) ∗ , B(0.7) ∗ , B(0.8) ∗ , B(0.9) ∗ ]
CentralizedMultiplePlay(kl-UCB)Selfish-kl-UCBRhoRand-kl-UCBMCTopM-kl-UCB
For M = K , our strategy MC-Top-M (D) achieves constant nb of collisions!=⇒ Our new orthogonalization procedure is very efficient!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 38/ 52
.Results on a multi-player MAB problem (1/2)
![Page 59: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/59.jpg)
0 2000 4000 6000 8000 10000Time steps t=1. . . T, horizon T=10000,
0
50
100
150
200
250
300Cu
mul
ativ
e ce
ntra
lized
regr
et t
6 ∑ k=
1µ∗ k−
t ∑ s=
1
9 ∑ k=
1µk(s)
200[A
j(t)=k,C
j(t)]
Multi-players M=6 : Cumulated centralized regret, averaged 200 times9 arms: [B(0.1), B(0.2), B(0.3), B(0.4) ∗ , B(0.5) ∗ , B(0.6) ∗ , B(0.7) ∗ , B(0.8) ∗ , B(0.9) ∗ ]
CentralizedMultiplePlay(kl-UCB)Selfish-kl-UCBRhoRand-kl-UCBMCTopM-kl-UCB
For M = 6 devices, our strategy MC-Top-M (D) largely outperforms ρrand andother previous state-of-the-art policies (not included).
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 39/ 52
.Results on a multi-player MAB problem (2/2)
![Page 60: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/60.jpg)
Algorithm Ref. Regret boundPerformance
⋆ is worst
Speed
is worstParameters
Centralized multi-play kl-UCB
[1] CM,µ log(T ) ⋆⋆⋆⋆⋆ just M butin another model!
ρrand UCB [2] M3C2 log(T ) ⋆⋆ just M
MEGA [3] C3T 3/4 ⋆ 4 params,
impossible to tune
Musical Chair [4](2M
M
)C4 log(T ) ⋆⋆ 1 parameter T0
hard to tune
Selfish UCB [5] T in some case ⋆ / ⋆⋆⋆⋆ none!
MCTopM klUCB [6] (MCM,µ + M2C6) log(T ) ⋆⋆⋆⋆ just M
Sic-MMAB [7] (CM,µ+MK ) log(T ) ⋆⋆⋆⋆ none! butin another model!
DPE [8] CM,µ log(T ) ??none! butin another model!
Optimal regret bound is multiple-play bound R(A, T ) ≤ CM,µ log(T ) + o(log(T )), with
CM,µ =∑
k:µk<µ∗
M
M∑j=1
µ∗
M
kl(µk ,µ∗
j) , and Ci ≫ CM,µ are much larger constants.
Papers: [1] Anantharam et al, 87 [2] Anandkumar et al, 11 [3] Avner et al, 15 [4] Rosenski et al, 15[5] Bonnefoi et al 17 [6] Besson & Kaufmann, 18 [7] Boursier et al, 19 [8] Proutière et al, 19
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 40/ 52
.State-of-the-art multi-player algorithms
![Page 61: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/61.jpg)
Theoretical analysis oftwo relaxed models
Multi-player bandits
Piece-wise stationary banditsRef: Chapter 7 of my thesis, and [Besson et al, 19].
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 41/ 52
![Page 62: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/62.jpg)
Stationary MAB problems
Arm k samples rewards from the same distribution for any round
∀t, rk(t) iid∼ νk = B(µk).
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52
.Piece-wise stationary bandits
![Page 63: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/63.jpg)
Stationary MAB problems
Arm k samples rewards from the same distribution for any round
∀t, rk(t) iid∼ νk = B(µk).
Non stationary MAB problems?
(possibly) different distributions for any round !
∀t, rk(t) iid∼ νk(t) = B(µk(t)).
=⇒ harder problem! And impossible with no extra hypothesis
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52
.Piece-wise stationary bandits
![Page 64: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/64.jpg)
Stationary MAB problems
Arm k samples rewards from the same distribution for any round
∀t, rk(t) iid∼ νk = B(µk).
Non stationary MAB problems?
(possibly) different distributions for any round !
∀t, rk(t) iid∼ νk(t) = B(µk(t)).
=⇒ harder problem! And impossible with no extra hypothesis
Piece-wise stationary problems!
The literature usually focuses on the easier case, when there are at mostΥT = o(
√T ) intervals, on which the means are all stationary.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52
.Piece-wise stationary bandits
![Page 65: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/65.jpg)
We plots the means µ1(t), µ2(t), µ3(t) of K = 3 arms .There are ΥT = 4 break-points and 5 sequences in 1, . . . , T = 5000
0 1000 2000 3000 4000 5000Time steps t=1. . . T, horizon T=5000
0.2
0.4
0.6
0.8
Succ
essiv
e m
eans
of t
he K
=3 a
rms
History of means for Non-Stationary MAB, Bernoulli with 4 break-pointsArm #0Arm #1Arm #2
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 43/ 52
.Example of a piece-wise stationary MAB problem
![Page 66: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/66.jpg)
The “oracle” plays the (unknown) best arm k∗(t) = argmax µk(t)(which changes between the ΥT ≥ 1 stationary sequences)
R(A, T ) = E
[T∑
t=1
rk∗(t)(t)
]−
T∑
t=1
E [r(t)]
=
(T∑
t=1
maxk
µk(t)
)
︸ ︷︷ ︸oracle total reward
−T∑
t=1
E [r(t)] .
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 44/ 52
.Regret for piece-wise stationary bandits
![Page 67: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/67.jpg)
The “oracle” plays the (unknown) best arm k∗(t) = argmax µk(t)(which changes between the ΥT ≥ 1 stationary sequences)
R(A, T ) = E
[T∑
t=1
rk∗(t)(t)
]−
T∑
t=1
E [r(t)]
=
(T∑
t=1
maxk
µk(t)
)
︸ ︷︷ ︸oracle total reward
−T∑
t=1
E [r(t)] .
Typical regimes for piece-wise stationary bandits
The (minimax) worst-case lower-bound is R(A, T ) ≥ Ω(√
KTΥT )
State-of-the-art algorithms A obtain R(A, T ) ≤ O(K√
TΥT log(T ))
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 44/ 52
.Regret for piece-wise stationary bandits
![Page 68: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/68.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 69: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/69.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 70: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/70.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 71: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/71.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 72: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/72.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)
Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 73: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/73.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)
Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 74: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/74.jpg)
Three components of our algorithm [Besson et al, 19]
Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB
[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]
A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected
A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)
Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )
Regret bound (if T and ΥT are both known)
Our algorithm obtains R(A, T ) ≤ O(
K∆2
change
√TΥT log(T )
)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52
.Our new algorithm: kl-UCB index + BGLR detector
![Page 75: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/75.jpg)
→ kl-UCB + BGLR (⋆) achieves the best performance (among non-oracle)!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 46/ 52
.Results on a piece-wise stationary MAB problem
![Page 76: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/76.jpg)
Algorithm Ref. Regret boundPerformance⋆ is worst
Speedis worst
Parameters
Naive UCB [1] T in worst case ⋆ none!
Oracle-Restart UCB [1] CΥT log(T ) ⋆⋆⋆⋆⋆ the break-points
(unrealistic oracle!)
Discounted UCB [2] C2
√TΥT log(T ) ⋆ T and ΥT
Sliding-Window UCB [2] C′2
√TΥT log(T ) ⋆ T and ΥT
Exp3.S [3] C√
TΥT log(T ) ⋆ ΥT
Discounted TS [4] not yet proven ⋆⋆ how to tune γ ?
CUSUM-UCB [5] C5
√TΥT log( T
ΥT) ⋆⋆⋆ T , ΥT and δmin
M-UCB [6] C6
√TΥT log(T ) ⋆⋆ T , ΥT and δmin
BGLR + kl-UCB [7] C√
TΥT log(T ) ⋆⋆⋆⋆ T and ΥT
AdSwitch [8] C8
√TΥT log(T ) ⋆⋆ just T
Ada-ILTCB+ [9] C9
√TΥT log(T ) ?? just T
Optimal minimax regret bound is R(A, T ) = O(√
KTΥT ), and C = CΥT ,µ = O( K∆2
change
).
Ci ≫ CΥT ,µ are much larger constants, and δmin < ∆change lower-bounds the problem difficulty.
Papers: [1] Auer et al. 02 [2] Garivier et al. 09 [3] Auer et al. 02 [5] Raj et al. 17[5] Liu et al. 18 [6] Cao et al. 19 [7] Besson et al. 19 [8] Auer et al. 19 [9] Chen et al. 19
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 47/ 52
.State-of-the-art piece-wise stationary algorithms
![Page 77: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/77.jpg)
Summary
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 48/ 52
![Page 78: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/78.jpg)
Part I:
Part II:
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52
.Contributions (1/3)
![Page 79: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/79.jpg)
Part I:
A simple model of IoT network, where autonomous IoT devicescan embed decentralized learning (“selfish MAB learning”),
numerical simulations proving the quality of our solution,
a realistic implementation on radio hardware.
Part II:
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52
.Contributions (1/3)
![Page 80: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/80.jpg)
Part I:
A simple model of IoT network, where autonomous IoT devicescan embed decentralized learning (“selfish MAB learning”),
numerical simulations proving the quality of our solution,
a realistic implementation on radio hardware.
Part II: New algorithms and regret bounds, in two simplified models:
for multi-player bandits, with M ≤ K players, for piece-wise stationary bandits, with ΥT = o(T ) break-points,
our proposed algorithms achieve state-of-the-art performance on both numerical, and theoretical results.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52
.Contributions (1/3)
![Page 81: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/81.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 82: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/82.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 83: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/83.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
Study the “Graal” goal:
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 84: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/84.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
Study the “Graal” goal: propose a more realistic model for IoT networks
(exogenous activation, non stationary traffic, etc)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 85: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/85.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
Study the “Graal” goal: propose a more realistic model for IoT networks
(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 86: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/86.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
Study the “Graal” goal: propose a more realistic model for IoT networks
(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm that works empirically and has strong theoretical guarantees!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 87: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/87.jpg)
Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!
More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy
Study the “Graal” goal: propose a more realistic model for IoT networks
(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm that works empirically and has strong theoretical guarantees!
Extend my Python library SMPyBandits to cover many other banditmodels (cascading, delay feedback, combinatorial, contextual etc)→ it is already online, free and open-source on GitHub.com/SMPyBandits
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52
.Perspectives (2/3)
![Page 88: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/88.jpg)
8 International conferences with proceedings:
“MAB Learning in IoT Networks”, Bonnefoi, Besson et al, CROWNCOM, 2017 “Aggregation of MAB for OSA”, Besson, Kaufmman, Moy, IEEE WCNC, 2018 “Multi-Player Bandits Revisited”, Besson & Kaufmann, ALT, 2018 “MALIN with GRC . . . ”, Bonnefoi, Besson, Moy, demo at ICT, 2018 “GNU Radio Implementation of MALIN . . . ”, Besson et al, IEEE WCNC, 2019 “UCB . . . LPWAN w/ Retransmissions”, Bonnefoi, Besson et al, IEEE WCNC, 2019 “Decentralized Spectrum Learning . . . ”, Moy & Besson, ISIoT, 2019 “Analyse non asymptotique . . . ”, Besson & Kaufmann, GRETSI, 2019
1 Preprints:
“Doubling-Trick . . . ”, Besson & Kaufmann, arXiv:1803.06971, 2018
3 Submitted works:
“Decentralized Spectrum Learning . . . ”, Moy, Besson et al, for Annals ofTelecommunications, July 2019
“GLRT meets klUCB . . . ”, Besson & Kaufmann & Maillard, for AISTATS, Oct.2019 “SMPyBandits . . . ”, Besson, for JMLR MLOSS, October 2019
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 51/ 52
.List of publications (3/3)
![Page 89: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/89.jpg)
Thanks for your attention!
Questions & Discussion
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 52/ 52
.Conclusion
![Page 90: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/90.jpg)
an extension of our model of IoT network to account forretransmissions (Section 5.4),
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27
.Didn’t have time to talk about. . . (2/4)
![Page 91: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/91.jpg)
an extension of our model of IoT network to account forretransmissions (Section 5.4),
my Python library SMPyBandits (Chapter 3),
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27
.Didn’t have time to talk about. . . (2/4)
![Page 92: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/92.jpg)
an extension of our model of IoT network to account forretransmissions (Section 5.4),
my Python library SMPyBandits (Chapter 3),
our proposed algorithm for aggregating bandit algorithms(Chapter 4),
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27
.Didn’t have time to talk about. . . (2/4)
![Page 93: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/93.jpg)
an extension of our model of IoT network to account forretransmissions (Section 5.4),
my Python library SMPyBandits (Chapter 3),
our proposed algorithm for aggregating bandit algorithms(Chapter 4),
details about our algorithms, their precise theoretical results andproofs (Chapters 6 & 7),
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27
.Didn’t have time to talk about. . . (2/4)
![Page 94: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/94.jpg)
an extension of our model of IoT network to account forretransmissions (Section 5.4),
my Python library SMPyBandits (Chapter 3),
our proposed algorithm for aggregating bandit algorithms(Chapter 4),
details about our algorithms, their precise theoretical results andproofs (Chapters 6 & 7),
our work on the “doubling trick” (to make an algorithm Aanytime and keep its regret bounds).
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27
.Didn’t have time to talk about. . . (2/4)
![Page 95: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/95.jpg)
References and
publications
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 2/ 27
![Page 96: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/96.jpg)
Check out the
“The Bandit Book”by Tor Lattimore and Csaba Szepesvári
Cambridge University Press, 2019.
→ tor-lattimore.com/downloads/book/book.pdf
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 3/ 27
.Where to know more: about bandits (1/3)
![Page 97: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/97.jpg)
Reach me (or Christophe or Émilie) out by email, if you have questions
Lilian.Besson @ CentraleSupelec.fr
→ perso.crans.org/besson/
Christophe.Moy @ Univ-Rennes1.fr
→ moychristophe.wordpress.com
Emilie.Kaufmann @ Univ-Lille.fr
→ chercheurs.lille.inria.fr/ekaufman
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 27
.Where to know more: about our work? (2/3)
![Page 98: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/98.jpg)
Experiment with bandits by yourself!
Interactive demo on this web-page→ perso.crans.org/besson/phd/MAB_interactive_demo/
Use my Python library for simulations of MAB problems SMPyBandits→ SMPyBandits.GitHub.io & GitHub.com/SMPyBandits
Install with $ pip install SMPyBandits
Free and open-source (MIT license)
Easy to set up your own bandit experiments, add new algorithms etc.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 5/ 27
.Where to know more: in practice (3/3)
![Page 99: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/99.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 27
.→ SMPyBandits.GitHub.io
![Page 100: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/100.jpg)
My PhD thesis (Lilian Besson)“Multi-players Bandit Algorithms for Internet of Things Networks”→ Online at perso.crans.org/besson/phd/
→ Open-source at GitHub.com/Naereen/phd-thesis/
My Python library for simulations of MAB problems, SMPyBandits→ SMPyBandits.GitHub.io
“The Bandit Book”, by Tor Lattimore and Csaba Szepesvári→ tor-lattimore.com/downloads/book/book.pdf
“Introduction to Multi-Armed Bandits”, by Alex Slivkins→ arXiv.org/abs/1904.07272
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 7/ 27
.Main references
![Page 101: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/101.jpg)
List of publications
Cf.: CV.archives-ouvertes.fr/lilian-besson
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 8/ 27
![Page 102: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/102.jpg)
Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation,by Christophe Moy & Lilian Besson.1st International ISIoT workshop, at Conference on Distributed Computing inSensor Systems, Santorini, Greece, May 2019.See Chapter 5.
Upper-Confidence Bound for Channel Selection in LPWA Networks withRetransmissions,by Rémi Bonnefoi, Lilian Besson, Julio Manco-Vasquez & Christophe Moy.1st International MOTIoN workshop, at WCNC, Marrakech, Morocco, April 2019.See Section 5.4.
GNU Radio Implementation of MALIN: “Multi-Armed bandits Learning forInternet-of-things Networks”,by Lilian Besson, Rémi Bonnefoi & Christophe Moy.Wireless Communication and Networks Conference, Marrakech, April 2019.See Section 5.3.
For more details, see: CV.Archives-Ouvertes.fr/lilian-besson.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 9/ 27
.International conferences with proceedings (1/2)
![Page 103: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/103.jpg)
Multi-Player Bandits Revisited,by Lilian Besson & Émilie Kaufmann.Algorithmic Learning Theory, Lanzarote, Spain, April 2018.See Chapter 6.
Aggregation of Multi-Armed Bandits learning algorithms for OpportunisticSpectrum Access,by Lilian Besson, Émilie Kaufmann & Christophe Moy.Wireless Communication and Networks Conference, Barcelona, Spain, April 2018.See Chapter 4.
Multi-Armed Bandit Learning in IoT Networks and non-stationary settings,by Rémi Bonnefoi, L.Besson, C.Moy, É.Kaufmann & Jacques Palicot.Conference on Cognitive Radio Oriented Wireless Networks, Lisboa, Portugal,September 2017. Best Paper Award.See Section 5.2.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 10/ 27
.International conferences with proceedings (2/2)
![Page 104: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/104.jpg)
MALIN: “Multi-Arm bandit Learning for Iot Networks” with GRC: A TestBedImplementation and Demonstration that Learning Helps,by Lilian Besson, Rémi Bonnefoi, Christophe Moy.Demonstration presented in International Conference on Communication,Saint-Malo, France, June 2018.See YouTu.be/HospLNQhcMk for a 6-minutes presentation video.See Section 5.3.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 27
.Demonstrations in international conferences
![Page 105: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/105.jpg)
Analyse non asymptotique d’un test séquentiel de détection de ruptures etapplication aux bandits non stationnaires (in French),by Lilian Besson & Émilie Kaufmann,GRETSI, August 2019.See Chapter 7.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 12/ 27
.French language conferences with proceedings
![Page 106: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/106.jpg)
Decentralized Spectrum Learning for Radio Collision Mitigation in Ultra-Dense IoTNetworks: LoRaWAN Case Study and Measurements,by Christophe Moy, Lilian Besson, G. Delbarre & L. Toutain, July 2019.Submitted for a special volume of the Annals of Telecommunications journal, on“Machine Learning for Intelligent Wireless Communications and Networking”.See Chapter 5.
The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm forPiece-Wise Non-Stationary Bandits,by Lilian Besson & Émilie Kaufmann & Odalric-Ambrym Maillard, October 2019.Submitted for AISTATS 2020. Preprint at HAL.Inria.fr/hal-02006471.See Chapter 7.
SMPyBandits: an Open-Source Research Framework for Single and Multi-PlayersMulti-Arms Bandits (MAB) Algorithms in Python,by Lilian BessonActive development since October 2016, HAL.Inria.fr/hal-01840022.It currently consists in about 45000 lines of code, hosted onGitHub.com/SMPyBandits, and a complete documentation accessible onSMPyBandits.rtfd.io or SMPyBandits.GitHub.io.Submitted for JMLR MLOSS, in October 2019.See Chapter 3.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 27
.Submitted works. . .
![Page 107: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/107.jpg)
What Doubling-Trick Can and Can’t Do for Multi-Armed Bandits,by Lilian Besson & Émilie Kaufmann, September 2018.Preprint at HAL.Inria.fr/hal-01736357.
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 27
.In progress works waiting for a new submission. . .
![Page 108: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/108.jpg)
I included here some extra slides. . .
pseudo code of Rand-Top-M + kl-UCB
pseudo code of MC-Top-M + kl-UCB
exact regret bound of MC-Top-M + kl-UCB
pseudo code of GLRT + kl-UCB
exact regret bound of GLRT + kl-UCB
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 15/ 27
.Backup slides
![Page 109: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/109.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 27
.Our algorithm Rand-Top-M
![Page 110: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/110.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 17/ 27
.Our algorithm MC-Top-M
![Page 111: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/111.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 18/ 27
.Lemma: bad selections for MC-Top-M with kl-UCB
![Page 112: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/112.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 27
.Lemma: collisions for MC-Top-M with kl-UCB
![Page 113: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/113.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 27
.Theoreom: regret for MC-Top-M with kl-UCB
![Page 114: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/114.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 27
.Our algorigthm GLRT and kl-UCB
![Page 115: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/115.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 27
.Theorem: regret bound for GLRT + kl-UCB (global)
![Page 116: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/116.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 23/ 27
.Corollary: regret bounds for GLRT + kl-UCB (global)
![Page 117: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/117.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 24/ 27
.Theorem: regret bound for GLRT + kl-UCB (local)
![Page 118: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/118.jpg)
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 25/ 27
.Corollary: regret bounds for GLRT + kl-UCB (local)
![Page 119: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/119.jpg)
End of backup slides
Thanks for your attention!
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 26/ 27
.End of backup slides
![Page 120: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD](https://reader034.fdocuments.us/reader034/viewer/2022051902/5ff29adc7f184e57955565c5/html5/thumbnails/120.jpg)
© Jeph Jacques, 2015, QuestionableContent.net/view.php?comic=3074
PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 27/ 27
.What about the climatic crisis?