Nearly optimal scheduling under time varying departure probabilities · 2011-02-10
Nearly optimal scheduling under time varying departure probabilities
Martin Erauskin
UPV / EHU
Martin Erauskin (UPV / EHU) Nearly optimal scheduling 1 / 20
Outline of talk
Introduction
Problem description
A nearly optimal solution
Numerical results
Conclusions and future work
Multi-armed bandit problem
A sequential decision problem in which, at each time slot, the agent must choose one of $K$ available options.
Depending on the chosen action, the agent receives a payoff at the end of the time slot.
Goal: maximize the present value of the future payoffs by choosing the right sequence of actions.
Motivation: wireless application
In each time slot, the base station selects a customer to serve.
Channel conditions vary due to fading and interference effects.
Each state represents different channel conditions.
In each channel condition, the probability of completing the job in one time slot is different.
Problem description
Time is slotted.
$K$ customers waiting for service, $k \in \mathcal{K} = \{1, 2, \ldots, K\}$.
$\mathcal{N}_k = \{1, 2, \ldots, N_k\}$: set of possible states of customer $k$.
$\forall n \in \mathcal{N}_k$, $\mu_{k,n}$: departure probability of customer $k$ if it is served while in state $n$.
$\forall n \in \mathcal{N}_k$, $q_{k,n}$: probability that customer $k$ is in state $n$.
$c_k$: holding cost of customer $k$ per slot spent waiting for service.
$0 \le \mu_{k,1} \le \mu_{k,2} \le \cdots \le \mu_{k,N_k} \le 1$ (state $N_k$ is the best state).
The state of a customer in each slot is independent of its past states, and the current states of different customers are independent of each other.
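To make the model concrete, here is a minimal Python sketch of one customer's parameters. It assumes (our reading of the independence assumption above) that a customer's state is redrawn from $q_{k,\cdot}$ independently in every slot. The class `Customer` and its methods are hypothetical names; states are 0-indexed in code, so state $n$ on the slides is index `n - 1`.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical container for one customer's parameters."""
    c: float          # holding cost c_k per slot
    q: list[float]    # q[n]: probability of being in state n (0-indexed)
    mu: list[float]   # mu[n]: departure probability if served in state n

    def __post_init__(self) -> None:
        if len(self.q) != len(self.mu):
            raise ValueError("q and mu must have the same length")
        if abs(sum(self.q) - 1.0) > 1e-9:
            raise ValueError("q must sum to 1")
        # Model assumption: 0 <= mu_1 <= mu_2 <= ... <= mu_N <= 1.
        if any(a > b for a, b in zip(self.mu, self.mu[1:])):
            raise ValueError("mu must be non-decreasing in the state")
        if self.mu and not (0.0 <= self.mu[0] and self.mu[-1] <= 1.0):
            raise ValueError("mu must lie in [0, 1]")

    def sample_state(self, rng: random.Random) -> int:
        """Draw the state for one slot, independently of past slots."""
        return rng.choices(range(len(self.q)), weights=self.q)[0]

    def served_departs(self, state: int, rng: random.Random) -> bool:
        """True if serving the customer in `state` completes its job."""
        return rng.random() < self.mu[state]
```

For example, `Customer(c=1.0, q=[0.5, 0.5], mu=[0.2, 0.8])` describes a customer that is equally likely to see a bad or a good channel and, if served, completes its job with probability 0.2 or 0.8 respectively.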
A particular case: cµ rule
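Let $|\mathcal{N}_k| = 1$ for all $k \in \mathcal{K}$, i.e., each customer has a single state with departure probability $\mu_k$. Then: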
Theorem
The policy that serves customer $k^*$, where
$$k^* = \arg\max_{k\in\mathcal{K}} c_k\,\mu_k,$$
minimizes the expected total cost incurred by the system.
Remark: This policy also minimizes the one-period expected cost incurred by the system.
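With a single state per customer, the rule is an argmax over the products $c_k\mu_k$. A short Python sketch; the function name is hypothetical, and ties are broken in favor of the lowest index, which the slide does not specify.

```python
def c_mu_rule(c: list[float], mu: list[float]) -> int:
    """Return the index k* maximizing c[k] * mu[k] (first maximizer on ties)."""
    if not c or len(c) != len(mu):
        raise ValueError("c and mu must be non-empty and of equal length")
    # max() returns the first maximal element, giving lowest-index tie-breaking.
    return max(range(len(c)), key=lambda k: c[k] * mu[k])
```

For instance, $c = (1, 2, 1)$ and $\mu = (0.9, 0.3, 0.5)$ give products $0.9, 0.6, 0.5$, so the first customer is served even though the second has the highest holding cost.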
MDP formulation
$\mathcal{A} = \{0, 1\}$: action space. Action 0 means 'not serving' and action 1 means 'serving'.
The expected one-period reward earned by customer $k$ in state $n$, depending on whether it is served or not, is given by
$$R^1_{k,n} = -c_k(1 - \mu_{k,n}), \qquad R^0_{k,n} = -c_k.$$
$X_k(\cdot)$: state process of customer $k$.
$a_k(\cdot)$: action process of customer $k$.
We define the following β-average operator for $0 < \beta < 1$:
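$$\mathbb{B}^{\pi}_0\left[Q^{a(\cdot)}_{X(\cdot)}, \beta\right] := \lim_{T\to\infty} \frac{\sum_{t=0}^{T-1} \beta^t \, \mathbb{E}^{\pi}_0\left[Q^{a(t)}_{X(t)}\right]}{\sum_{t=0}^{T-1} \beta^t}$$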
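Because $\sum_{t=0}^{T-1}\beta^t \to 1/(1-\beta)$, the operator equals $(1-\beta)\sum_{t\ge 0}\beta^t\,\mathbb{E}^{\pi}_0[\cdot]$, a normalized discounted sum. The Python sketch below approximates it by truncating at a finite horizon $T$; the function name and the truncation are our assumptions, and `expected[t]` stands for the per-slot expectation, which in practice would come from the policy and the model.

```python
def beta_average(expected: list[float], beta: float) -> float:
    """Truncated beta-average: sum(beta^t * x_t) / sum(beta^t) over t < T.

    expected[t] plays the role of E_0^pi[Q^{a(t)}_{X(t)}]; T = len(expected).
    For large T this approaches (1 - beta) * sum_t beta^t * expected[t].
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    if not expected:
        raise ValueError("need at least one term")
    num = 0.0
    den = 0.0
    weight = 1.0  # current value of beta^t
    for x in expected:
        num += weight * x
        den += weight
        weight *= beta
    return num / den
```

For instance, a customer with $c_k = 1$ and $\mu_{k,n} = 0.5$ that is served in every slot earns $R^1_{k,n} = -0.5$ per slot, and `beta_average([-0.5] * 100, 0.9)` returns $-0.5$ up to rounding, as it would for any constant sequence.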
MDP formulation
Let $\Pi$ be the set of admissible policies.
The optimization problem can be described as follows:
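$$\max_{\pi\in\Pi} \; \mathbb{B}^{\pi}_0\left[\sum_{k\in\mathcal{K}} R^{a_k(\cdot)}_{k,X_k(\cdot)}\right] \qquad \text{(P)}$$
subject to $\sum_{k\in\mathcal{K}} a_k(t) = 1$ for all $t \in \mathcal{T}$.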
The original problem cannot be solved either analytically or numerically.
Relaxations (P. Whittle, 1988)
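We first relax the constraint so that one customer is served on average:
$$\sum_{k\in\mathcal{K}} a_k(t) = 1 \;\Longrightarrow\; \mathbb{E}^{\pi}\left[\sum_{k\in\mathcal{K}} a_k(t)\right] = 1, \quad \forall t \in \mathcal{T}.$$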
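We then relax this constraint further to the β-average constraint:
$$\mathbb{E}^{\pi}\left[\sum_{k\in\mathcal{K}} a_k(t)\right] = 1, \ \forall t \in \mathcal{T} \;\Longrightarrow\; \mathbb{B}^{\pi}_0\left[\sum_{k\in\mathcal{K}} a_k(\cdot)\right] = 1.$$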
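We obtain the following relaxed problem:
$$\max_{\pi\in\Pi} \; \mathbb{B}^{\pi}_0\left[\sum_{k\in\mathcal{K}} R^{a_k(\cdot)}_{k,X_k(\cdot)}\right] \qquad \text{(RP)}$$
subject to $\mathbb{B}^{\pi}_0\left[\sum_{k\in\mathcal{K}} a_k(\cdot)\right] = 1$.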
Solution: Potential Improvement rule
The relaxed problem can be approached with Lagrangian methods. Attaching a multiplier $\nu$ to the constraint gives
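$$\max_{\pi\in\Pi} \; \mathbb{B}^{\pi}_0\left[\sum_{k\in\mathcal{K}} R^{a_k(\cdot)}_{k,X_k(\cdot)} - \nu \sum_{k\in\mathcal{K}} a_k(\cdot)\right] + \nu,$$
where the constant $\nu$ does not affect the optimal policy.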
For fixed $\nu$, this problem decomposes into $K$ subproblems, one per customer:
$$\max_{\tilde\pi_k\in\Pi_k} \; \mathbb{B}^{\tilde\pi_k}_0\left[R^{a_k(\cdot)}_{k,X_k(\cdot)} - \nu\, a_k(\cdot)\right] \qquad \text{(SRP)}$$
Solving the $K$ subproblems and combining their optimal policies yields a joint optimal policy for the relaxed problem.
Solution: Potential Improvement rule
Theorem
Let
$$\nu_{k,n} = \frac{c_k\,\mu_{k,n}}{(1-\beta) + \beta \sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})} \quad \text{for } n \neq N_k, \qquad \nu_{k,N_k} = \infty.$$
Then:
If $\nu \le \nu_{k,n}$, it is optimal to serve customer $k$ in state $n \in \mathcal{N}_k$;
If $\nu \ge \nu_{k,n}$, it is optimal not to serve customer $k$ in state $n \in \mathcal{N}_k$.
Sketch of the proof.
By solving the dynamic programming equation.
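The index in the theorem can be computed directly from $c_k$, $q_{k,\cdot}$, $\mu_{k,\cdot}$ and $\beta$. A Python sketch with 0-indexed states and hypothetical function names, which also applies the threshold rule of the theorem for a given multiplier $\nu$. At $\nu = \nu_{k,n}$ both actions are optimal and the sketch chooses to serve; allowing $\beta = 1$ (the time-average index) is our extension of the stated range $0 < \beta < 1$.

```python
import math


def pi_index(c: float, q: list[float], mu: list[float], n: int,
             beta: float = 1.0) -> float:
    """Index nu_{k,n} from the theorem (0-indexed state n).

    nu = c*mu[n] / ((1 - beta) + beta * sum_{m>n} q[m] * (mu[m] - mu[n])),
    with nu = inf in the best state. With beta = 1 the denominator can be 0,
    in which case inf is returned as well.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if n == len(mu) - 1:
        return math.inf
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    denom = (1.0 - beta) + beta * gain
    return math.inf if denom == 0.0 else c * mu[n] / denom


def serve_in_subproblem(nu: float, c: float, q: list[float],
                        mu: list[float], n: int, beta: float) -> bool:
    """Optimal action in (SRP) for multiplier nu: serve iff nu <= nu_{k,n}."""
    return nu <= pi_index(c, q, mu, n, beta)
```

As $\beta \to 0$ the denominator tends to 1 and the index reduces to $c_k\mu_{k,n}$, consistent with the remark that the cµ rule minimizes the one-period expected cost.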
Solution: Potential Improvement rule
We construct a feasible policy for the original problem using the optimal solution of the relaxed problem:
Potential Improvement (PI) rule: at time $t$, serve the job $k^*(t)$ given by
$$k^*(t) := \arg\max_{k\in\mathcal{K}} \nu_{k,X_k(t)}.$$
The Potential Improvement rule is not necessarily optimal for the original problem.
For $\beta = 1$, we obtain the time-average index:
$$\nu_{k,n} = \frac{c_k\,\mu_{k,n}}{\sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})} \quad \text{for } n \neq N_k, \qquad \nu_{k,N_k} = \infty.$$
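Given each customer's current state, the PI rule serves the customer with the largest time-average index. A self-contained Python sketch with $\beta = 1$; the function names and lowest-index tie-breaking are our assumptions, and states are 0-indexed. A customer in its best state (or one that cannot improve by waiting) gets an infinite index and therefore priority.

```python
import math


def time_average_index(c: float, q: list[float], mu: list[float], n: int) -> float:
    """PI index with beta = 1 (0-indexed state n); inf if waiting cannot help."""
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    return math.inf if gain == 0.0 else c * mu[n] / gain


def pi_rule(customers: list[tuple[float, list[float], list[float]]],
            states: list[int]) -> int:
    """k*(t) = argmax_k nu_{k, X_k(t)}; ties go to the lowest index.

    customers: (c_k, q_k, mu_k) per customer; states: current 0-indexed states.
    """
    nus = [time_average_index(c, q, mu, n)
           for (c, q, mu), n in zip(customers, states)]
    return max(range(len(nus)), key=nus.__getitem__)
```

With $c_1 = c_2 = 1$, $q = (0.5, 0.5)$ for both, $\mu_1 = (0.4, 1.0)$ and $\mu_2 = (0.3, 0.35)$, both customers in their worse state have indices $0.4/0.3 \approx 1.33$ and $0.3/0.025 = 12$, so PI serves customer 2, whereas cµ would serve customer 1 ($0.4 > 0.3$): customer 1 has much more to gain by waiting for its good state.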
Scheduling disciplines
cµ index:
$\nu^{c\mu}_{k,n} := c_k\,\mu_{k,n}$, for $n \in \mathcal{N}_k$;
Score Based index (T. Bonald, 2004):
$\nu^{SB}_{k,n} := c_k \sum_{m=1}^{n} q_{k,m}$, for $n \in \mathcal{N}_k$.
Relatively Best index (Qualcomm 3G standard, 2000):
$\nu^{RB}_{k,n} := \dfrac{c_k\,\mu_{k,n}}{\sum_{m=1}^{N_k} q_{k,m}\,\mu_{k,m}}$, for $n \in \mathcal{N}_k$.
Potential Improvement index:
$\nu^{PI}_{k,n} = \dfrac{c_k\,\mu_{k,n}}{\sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})}$ for $n \neq N_k$, $\quad \nu^{PI}_{k,N_k} = \infty$.
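The four indices can be evaluated side by side for a single customer. In this Python sketch (0-indexed states; the function name and dictionary keys are ours) note that SB does not depend on $\mu$, while RB normalizes $c_k\mu_{k,n}$ by the mean departure probability $\sum_m q_{k,m}\mu_{k,m}$.

```python
import math


def all_indices(c: float, q: list[float], mu: list[float], n: int) -> dict[str, float]:
    """cmu, SB, RB and PI (beta = 1) indices for a customer in 0-indexed state n."""
    mean_mu = sum(qm * mm for qm, mm in zip(q, mu))  # sum_m q_{k,m} mu_{k,m}
    if mean_mu == 0.0:
        raise ValueError("RB index undefined when all departure probabilities are 0")
    # Expected improvement sum_{m>n} q_{k,m} (mu_{k,m} - mu_{k,n}); 0 in the best state.
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    return {
        "cmu": c * mu[n],
        "SB": c * sum(q[: n + 1]),  # c_k * sum_{m<=n} q_{k,m}
        "RB": c * mu[n] / mean_mu,
        "PI": math.inf if gain == 0.0 else c * mu[n] / gain,
    }
```

For $c_k = 1$, $q = (0.5, 0.5)$ and $\mu = (0.2, 0.8)$, the worse state gets cµ $= 0.2$, SB $= 0.5$, RB $= 0.4$ and PI $\approx 0.67$.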
Problem with arrivals of new customers
We consider different classes of customers, $k \in \mathcal{K}$.
$\lambda_k$: probability that a new class-$k$ customer arrives in a given time slot.
Definition
A system is called stable if the number of customers in the system does not grow without bound.
Consider
$$\rho_k = \frac{\lambda_k}{\mu_{k,N_k}}, \qquad \rho = \sum_{k\in\mathcal{K}} \rho_k.$$
Theorem (S. Aalto, P. Lassila, 2010)
If every customer in its best state is preferred over every customer that is not in its best state, then the policy is stable for every $\rho < 1$.
Remark: the PI rule is stable for every $\rho < 1$, because $\nu^{PI}_{k,N_k} = \infty$ gives priority to any customer in its best state.
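The load follows from the arrival probabilities and the best-state departure probabilities. A Python sketch (function names ours) that computes $\rho$ and checks the sufficient condition $\rho < 1$ from the theorem, which applies to policies that prioritize best-state customers, such as PI.

```python
def load(lam: list[float], mu_best: list[float]) -> float:
    """rho = sum_k lambda_k / mu_{k,N_k}; mu_best[k] is the best-state departure probability."""
    if len(lam) != len(mu_best):
        raise ValueError("need one best-state departure probability per class")
    if any(m <= 0.0 for m in mu_best):
        raise ValueError("best-state departure probabilities must be positive")
    return sum(lam_k / mu_k for lam_k, mu_k in zip(lam, mu_best))


def stability_guaranteed(lam: list[float], mu_best: list[float]) -> bool:
    """Sufficient condition of the theorem for best-state-priority policies such as PI."""
    return load(lam, mu_best) < 1.0
```

For example, $\lambda = (0.2, 0.3)$ and $(\mu_{1,N_1}, \mu_{2,N_2}) = (0.8, 0.6)$ give $\rho = 0.25 + 0.5 = 0.75 < 1$.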
Numerical simulations: scenario 1
Two classes, $k \in \mathcal{K} = \{1, 2\}$.
$\lambda_2$ fixed.
Departure probabilities fixed for both classes of customers.
$c_1 = c_2 = 1$.
We vary $\lambda_1$ so that $\rho$ ranges from 0.5 to 1.
Figure: Mean number of customers in the system versus $\rho$, Scenario 1.
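Experiments of this kind can be approximated with a slot-by-slot simulation. The Python sketch below is our own assumption about the dynamics, not the authors' simulator: in each slot, at most one class-$k$ customer arrives with probability $\lambda_k$; every customer present then draws a fresh state from $q_{k,\cdot}$; the scheduler serves the customer with the largest index, who departs with probability $\mu_{k,n}$. Function names and any parameter values used with it are hypothetical placeholders, not those of the talk.

```python
import math
import random


def pi_priority(c, q, mu, n):
    """Time-average (beta = 1) PI index; inf in the best state."""
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    return math.inf if gain == 0.0 else c * mu[n] / gain


def cmu_priority(c, q, mu, n):
    """c-mu index; q is unused but kept for a common signature."""
    return c * mu[n]


def simulate(classes, lam, index, slots=100_000, seed=0):
    """Estimate the time-average number of customers in the system.

    classes: list of (c_k, q_k, mu_k); lam: arrival probability per class;
    index: priority function (c, q, mu, n) -> float for a customer in state n.
    """
    rng = random.Random(seed)
    count = [0] * len(classes)  # customers of each class currently present
    area = 0                    # running sum of the system size over slots
    for _ in range(slots):
        # Assumed order within a slot: arrivals first, then one service.
        for k, p in enumerate(lam):
            if rng.random() < p:
                count[k] += 1
        best = None  # (priority, class, state) of the customer to serve
        for k, (c, q, mu) in enumerate(classes):
            for _ in range(count[k]):
                n = rng.choices(range(len(q)), weights=q)[0]  # fresh state this slot
                prio = index(c, q, mu, n)
                if best is None or prio > best[0]:
                    best = (prio, k, n)
        if best is not None:
            _, k, n = best
            if rng.random() < classes[k][2][n]:  # departs with prob mu_{k,n}
                count[k] -= 1
        area += sum(count)
    return area / slots
```

Calling `simulate(classes, lam, pi_priority)` and `simulate(classes, lam, cmu_priority)` while sweeping one $\lambda_k$ gives curves of the mean number of customers versus $\rho$ in the spirit of Scenario 1.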
Numerical simulations: scenario 1
Figure: Sample path of the number of customers in the system in Scenario 1, $\rho = 0.95$.
Numerical simulations: scenario 1
Mean number of class-2 customers versus mean number of class-1 jobs.
Indifference curves link points with the same value of $\rho$.
Numerical simulations: scenario 2
$\lambda_1$ and $\lambda_2$ fixed.
Departure probabilities fixed for class-2 customers.
We scale the departure probabilities of class-1 customers proportionally, moving $\rho$ between 0.50 and 1.
Figure: Mean number of customers in the system versus $\rho$, Scenario 2.
Stochastic dominance
Definition
Two random variables $X$ and $Y$ are stochastically ordered (denoted as $X \le_{st} Y$) if and only if $P(X \le z) \ge P(Y \le z)$ for all $z$.
Simulations suggest stochastic dominance of the Potential Improvement rule over the other rules:
$X$, the number of jobs in the system, appears stochastically smaller under PI than under the other rules.
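Given simulated samples of $X$ under two rules, the definition can be checked empirically by comparing empirical CDFs. Because empirical CDFs only jump at sample points, checking the union of the observed values suffices. A Python sketch (function name ours); a check on finite samples can only suggest the ordering, not prove it.

```python
from bisect import bisect_right


def empirically_st_leq(xs: list[float], ys: list[float]) -> bool:
    """True if the empirical CDFs satisfy F_X(z) >= F_Y(z) at every sample point z."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    sx, sy = sorted(xs), sorted(ys)
    for z in set(sx) | set(sy):
        fx = bisect_right(sx, z) / len(sx)  # empirical P(X <= z)
        fy = bisect_right(sy, z) / len(sy)  # empirical P(Y <= z)
        if fx < fy:
            return False
    return True
```

For instance, samples `[0, 1, 1, 2]` under one rule and `[1, 2, 2, 3]` under another satisfy the empirical ordering, while the reversed comparison fails at $z = 0$.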
Conclusions and future work
Main conclusions:
Depending on the value of the parameters, RB or cµ might outperform the others.
PI consistently outperforms all the other policies (or is equivalent to the best one).
Simulations strongly suggest stochastic dominance of PI over the other rules.
The stability region is maximal for the PI rule, while it is not for the cµ and RB rules.
Future work:
Include correlations between the states of different jobs in the model.
Overload analysis: the slope of the sample paths when the system is unstable.
Stability region for different policies.