Nearly optimal scheduling under time varying departure probabilities · 2011-02-10 · Nearly...

Nearly optimal scheduling under time varying departure probabilities

Martin Erauskin

UPV / EHU


Outline of talk

Introduction

Problem description

A nearly optimal solution

Numerical results

Conclusions and future work


Multi armed bandit problem

A sequential decision problem where at each time slot the agent must choose one of K available options.

Depending on the chosen action, the agent receives a payoff at the end of the time slot.

Goal: maximize the present value of the future payoffs by choosing the right sequence of actions.


Motivation: wireless application

In each time slot, the base station selects a customer to serve.

Channel conditions vary due to fading and interference effects.

Each state represents different channel conditions.

In each channel condition, the probability of completing the job in one time slot is different.


Problem description

Time is slotted.

K customers waiting for service, $k \in \mathcal{K} = \{1, 2, \dots, K\}$.

$\mathcal{N}_k = \{1, 2, \dots, N_k\}$: set of possible states for customer $k$.

$\forall n \in \mathcal{N}_k$, $\mu_{k,n}$: departure probability of customer $k$, if served, when it is in state $n$.

$\forall n \in \mathcal{N}_k$, $q_{k,n}$: probability that customer $k$ is in state $n$.

$c_k$: holding cost of customer $k$ per slot spent waiting for service.

$0 \le \mu_{k,1} \le \mu_{k,2} \le \dots \le \mu_{k,N_k} \le 1$.

Each customer's state evolves independently of its own history, and the current states of different customers are independent of each other.


A particular case: cµ rule

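Let $|\mathcal{N}_k| = 1$, $\forall k \in \mathcal{K}$, so that each customer has a single departure probability $\mu_k$. Then:

Theorem

The policy that gives service to the customer $k^*$, where

$$k^* = \arg\max_{k \in \mathcal{K}} c_k \mu_k,$$

minimizes the expected total cost incurred by the system.

Remark: This policy also minimizes the one-period expected cost incurred by the system.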
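
The selection step of the cµ rule can be sketched in a few lines of Python. This is only an illustration: the function name `c_mu_choice` and the numbers in the example are assumptions, not taken from the talk, and customers are numbered from 0 rather than 1.

```python
from typing import Sequence


def c_mu_choice(costs: Sequence[float], mus: Sequence[float]) -> int:
    """Return the 0-based position of the customer with the largest c_k * mu_k."""
    if len(costs) != len(mus) or len(costs) == 0:
        raise ValueError("costs and mus must be non-empty and of equal length")
    # max() keeps the first maximiser, so ties go to the lowest position
    # (a tie-breaking convention assumed here, not stated in the slides).
    return max(range(len(costs)), key=lambda k: costs[k] * mus[k])


# Hypothetical instance: the products c_k * mu_k are 0.9, 1.0 and 0.8.
costs = [1.0, 2.0, 4.0]
mus = [0.9, 0.5, 0.2]
print(c_mu_choice(costs, mus))
```

Note that the customer with the highest holding cost is not served here, because its departure probability is too low.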


MDP formulation

$\mathcal{A} = \{0, 1\}$: action space. Action 0 means 'not serving', action 1 means 'serving'.

The expected one-period reward earned by customer $k$ in state $n$, depending on whether it is served or not, is given by

$$R^1_{k,n} = -c_k (1 - \mu_{k,n}), \qquad R^0_{k,n} = -c_k.$$

$X_k(\cdot)$: state process of customer $k$.

$a_k(\cdot)$: action process of customer $k$.

We define the following $\beta$-average operator for $0 < \beta < 1$:

$$B^{\pi}_{0}\left[Q^{a(\cdot)}_{X(\cdot)}, \beta\right] := \lim_{T \to \infty} \frac{\sum_{t=0}^{T-1} \beta^t \, \mathbb{E}^{\pi}_{0}\left[Q^{a(t)}_{X(t)}\right]}{\sum_{t=0}^{T-1} \beta^t}$$
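
To get a feeling for the β-average operator, the ratio inside the limit can be evaluated for a finite horizon T. The sketch below assumes that the expectations are already available as a plain sequence of numbers. The function name `beta_average` is hypothetical, and β = 1 is also accepted as an assumption so that the ordinary time average can be compared.

```python
from typing import Iterable


def beta_average(values: Iterable[float], beta: float) -> float:
    """Truncated beta-average: sum_t beta^t x_t / sum_t beta^t for t = 0..T-1."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    numerator = 0.0
    denominator = 0.0
    weight = 1.0  # beta^t, starting at t = 0
    for x in values:
        numerator += weight * x
        denominator += weight
        weight *= beta
    if denominator == 0.0:
        raise ValueError("values must not be empty")
    return numerator / denominator


# A constant sequence is mapped to the same constant for any beta.
print(beta_average([3.0] * 50, 0.9))
# With beta = 1 the operator reduces to the plain arithmetic mean.
print(beta_average([1.0, 2.0, 3.0], 1.0))
```

Because the weights are normalized, a constant is mapped to itself, which is what makes a constraint such as "the β-average of the number of served customers equals 1" meaningful.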


MDP formulation

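Let $\Pi$ be the set of admissible policies.

The optimization problem can be described as follows:

$$\max_{\pi \in \Pi} \; B^{\pi}_{0}\left[\sum_{k \in \mathcal{K}} R^{a_k(\cdot)}_{k, X_k(\cdot)}\right] \qquad \text{(P)}$$

$$\text{subject to} \quad \sum_{k \in \mathcal{K}} a_k(t) = 1, \quad \text{for all } t \in \mathcal{T}.$$

The original problem cannot be solved either analytically or numerically.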


Relaxations (P. Whittle (1988))

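We relax the constraint: serve 1 customer on average.

$$\sum_{k \in \mathcal{K}} a_k(t) = 1 \;\Longrightarrow\; \mathbb{E}^{\pi}\left[\sum_{k \in \mathcal{K}} a_k(t)\right] = 1, \quad \forall t \in \mathcal{T}$$

We relax this constraint again, to the $\beta$-average constraint:

$$\mathbb{E}^{\pi}\left[\sum_{k \in \mathcal{K}} a_k(t)\right] = 1, \; \forall t \in \mathcal{T} \;\Longrightarrow\; B^{\pi}_{0}\left[\sum_{k \in \mathcal{K}} a_k(\cdot)\right] = 1$$

We obtain the following relaxed problem:

$$\max_{\pi \in \Pi} \; B^{\pi}_{0}\left[\sum_{k \in \mathcal{K}} R^{a_k(\cdot)}_{k, X_k(\cdot)}\right] \qquad \text{(RP)}$$

$$\text{subject to} \quad B^{\pi}_{0}\left[\sum_{k \in \mathcal{K}} a_k(\cdot)\right] = 1$$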


Solution: Potential Improvement rule

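The relaxed problem can be approached using Lagrangian methods:

$$\max_{\pi \in \Pi} \; B^{\pi}_{0}\left[\sum_{k \in \mathcal{K}} R^{a_k(\cdot)}_{k, X_k(\cdot)} - \nu \sum_{k \in \mathcal{K}} a_k(\cdot)\right] + \nu$$

We decompose this problem into $K$ subproblems:

$$\max_{\tilde{\pi}_k \in \Pi_k} \; B^{\tilde{\pi}_k}_{0}\left[R^{a_k(\cdot)}_{k, X_k(\cdot)} - \nu \, a_k(\cdot)\right] \qquad \text{(SRP)}$$

We solve the $K$ subproblems and combine their solutions to obtain the joint optimal policy for the relaxed problem.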



Theorem

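Let

$$\nu_{k,n} = \frac{c_k \mu_{k,n}}{(1-\beta) + \beta \sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})} \quad \text{for } n \neq N_k, \qquad \nu_{k,N_k} = \infty.$$

Then:

If $\nu \le \nu_{k,n}$, it is optimal to serve customer $k$ in state $n \in \mathcal{N}_k$;

If $\nu \ge \nu_{k,n}$, it is optimal not to serve customer $k$ in state $n \in \mathcal{N}_k$.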

Sketch of the proof.

By solving the dynamic programming equation.
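
The index of the theorem is cheap to evaluate numerically. The following Python sketch computes it for one customer and applies the threshold test. The function names and the example parameters are assumptions made for illustration, and states are numbered from 0, so the best state is the last list entry.

```python
import math
from typing import List, Sequence


def discounted_pi_indices(c: float, mu: Sequence[float], q: Sequence[float],
                          beta: float) -> List[float]:
    """Index nu_{k,n} of the theorem for every state of one customer."""
    if len(mu) != len(q) or len(mu) == 0:
        raise ValueError("mu and q must be non-empty and of equal length")
    if not 0.0 < beta < 1.0:
        raise ValueError("the theorem is stated for 0 < beta < 1")
    last = len(mu) - 1
    indices = []
    for n in range(last):
        # Expected improvement of the departure probability over state n.
        gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, last + 1))
        indices.append(c * mu[n] / ((1.0 - beta) + beta * gain))
    indices.append(math.inf)  # the best state always has an infinite index
    return indices


def serve(nu: float, index: float) -> bool:
    """Threshold test: serving is optimal when nu <= nu_{k,n}."""
    return nu <= index


# Hypothetical customer with three states.
indices = discounted_pi_indices(1.0, [0.2, 0.5, 0.8], [0.3, 0.3, 0.4], 0.9)
print(indices)
print([serve(1.0, v) for v in indices])
```

Since µ is nondecreasing in the state, the index is nondecreasing too, so for a given ν the customer is served exactly in the states above a threshold.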



We construct a feasible policy for the original problem, using the optimal solution of the relaxed problem:

Potential Improvement rule: gives service at time $t$ to job $k^*(t)$ such that:

$$k^*(t) := \arg\max_{k \in \mathcal{K}} \nu_{k, X_k(t)}$$

Not necessarily optimal for the original problem.

For $\beta = 1$, we have the time-average index:

$$\nu_{k,n} = \frac{c_k \mu_{k,n}}{\sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})} \quad \text{for } n \neq N_k, \qquad \nu_{k,N_k} = \infty$$
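
A minimal Python sketch of the Potential Improvement rule with the time-average index follows. Several details are assumptions rather than part of the talk: the function names, the example data, the 0-based state numbering, the treatment of a zero denominator as an infinite index, and breaking ties in favour of the first customer.

```python
import math
from typing import Sequence, Tuple

Customer = Tuple[float, Sequence[float], Sequence[float]]  # (c, mu, q)


def pi_index_time_average(c: float, mu: Sequence[float], q: Sequence[float],
                          n: int) -> float:
    """Time-average (beta = 1) index of 0-based state n."""
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    # Best state (empty sum) or no possible improvement: infinite index.
    return c * mu[n] / gain if gain > 0 else math.inf


def pi_rule(customers: Sequence[Customer], states: Sequence[int]) -> int:
    """Position of the customer with the largest index in its current state."""
    scores = [pi_index_time_average(c, mu, q, n)
              for (c, mu, q), n in zip(customers, states)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two hypothetical customers with three and two states respectively.
customers = [
    (1.0, [0.2, 0.5, 0.8], [0.3, 0.3, 0.4]),
    (1.0, [0.1, 0.9], [0.5, 0.5]),
]
print(pi_rule(customers, [1, 0]))  # customer 1 is in its worst state
print(pi_rule(customers, [0, 1]))  # customer 1 is in its best state
```

A customer in its best state has an infinite index, so it is always preferred over customers that can still improve.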


Scheduling disciplines

cµ index:

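$$\nu^{c\mu}_{k,n} := c_k \mu_{k,n}, \quad \text{for } n \in \mathcal{N}_k;$$

Score Based index (T. Bonald, 2004):

$$\nu^{SB}_{k,n} := c_k \sum_{m=1}^{n} q_{k,m}, \quad \text{for } n \in \mathcal{N}_k.$$

Relatively Best index (Qualcomm 3G standard, 2000):

$$\nu^{RB}_{k,n} := \frac{c_k \mu_{k,n}}{\sum_{m=1}^{N_k} q_{k,m} \mu_{k,m}}, \quad \text{for } n \in \mathcal{N}_k.$$

Potential Improvement index:

$$\nu^{PI}_{k,n} = \frac{c_k \mu_{k,n}}{\sum_{m>n} q_{k,m}(\mu_{k,m} - \mu_{k,n})} \quad \text{for } n \neq N_k, \qquad \nu^{PI}_{k,N_k} = \infty$$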
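
The four indices can be tabulated side by side for a single customer. The sketch below is illustrative only: the function name, the dictionary keys and the example parameters are assumptions, states are 0-based, and the mean departure probability is assumed to be positive so that the RB index is defined.

```python
import math
from typing import Dict, List, Sequence


def all_indices(c: float, mu: Sequence[float],
                q: Sequence[float]) -> Dict[str, List[float]]:
    """c-mu, SB, RB and PI indices of one customer, one value per state."""
    n_states = len(mu)
    mean_mu = sum(qm * mm for qm, mm in zip(q, mu))  # assumed to be > 0
    c_mu = [c * m for m in mu]
    sb = [c * sum(q[: n + 1]) for n in range(n_states)]
    rb = [c * m / mean_mu for m in mu]
    pi = []
    for n in range(n_states):
        gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, n_states))
        # Best state (empty sum) or zero gain: infinite index.
        pi.append(c * mu[n] / gain if gain > 0 else math.inf)
    return {"c_mu": c_mu, "SB": sb, "RB": rb, "PI": pi}


# Hypothetical customer with three states.
for name, values in all_indices(1.0, [0.2, 0.5, 0.8], [0.3, 0.3, 0.4]).items():
    print(name, values)
```

All four indices are nondecreasing in the state, but only PI assigns an infinite value to the best state.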


Problem with arrivals of new customers

We consider different classes of customers, $k \in \mathcal{K}$.

$\lambda_k$: probability, in each time slot, of having a new customer of class $k$.

Definition

A system is called stable if the number of customers does not explode.

Consider

$$\varrho_k = \frac{\lambda_k}{\mu_{k,N_k}}, \qquad \varrho = \sum_{k \in \mathcal{K}} \varrho_k.$$

Theorem (S. Aalto, P. Lassila (2010))

If any customer in its best state is preferred over any other customer that is not in its best state, then the policy is stable for every $\varrho < 1$.

Remark: PI rule is stable for every $\varrho < 1$.
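
Both ingredients of the theorem, the load and the preference for best states, are easy to check numerically for an index rule. The sketch below uses hypothetical function names and data. Reading "preferred" as a strictly larger index is an assumption, and each row of an index table lists one class with its best state last.

```python
import math
from typing import Sequence


def load(lams: Sequence[float], best_mus: Sequence[float]) -> float:
    """Total load: sum over classes of lambda_k / mu_{k,N_k}."""
    if len(lams) != len(best_mus):
        raise ValueError("one arrival probability per class is required")
    return sum(lam / mu for lam, mu in zip(lams, best_mus))


def prefers_best_state(index_rows: Sequence[Sequence[float]]) -> bool:
    """True if every best-state index strictly exceeds every other index."""
    lowest_best = min(row[-1] for row in index_rows)
    others = [v for row in index_rows for v in row[:-1]]
    return all(lowest_best > v for v in others)


# Hypothetical two-class example.
print(load([0.3, 0.2], [0.8, 0.5]))
# PI-type table: best states carry an infinite index.
print(prefers_best_state([[0.6, 4.2, math.inf], [0.25, math.inf]]))
# c-mu-type table where class 2 is slow even in its best state.
print(prefers_best_state([[0.5, 0.9], [0.1, 0.3]]))
```

The last call illustrates why an index such as cµ need not satisfy the condition: a non-best state of one class can outrank the best state of another.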


Numerical simulations: scenario 1

Two classes, $k \in \mathcal{K} = \{1, 2\}$.

$\lambda_2$ fixed.

Departure probabilities fixed for both classes of customers.

$c_1 = c_2 = 1$.

We vary $\lambda_1$ such that $\varrho$ ranges from 0.5 to 1.

Figure: Mean number of customers in the system versus $\varrho$, Scenario 1.
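
A slotted simulation of this kind can be sketched in Python as follows. This is not the simulator used for the figures. The parameter values, the order of events within a slot (arrivals, state draws, service), the tie-breaking and all function names are assumptions chosen for illustration.

```python
import math
import random


def pi_index(c, mu, q, n):
    """Time-average PI index of 0-based state n (+inf if no gain is possible)."""
    gain = sum(q[m] * (mu[m] - mu[n]) for m in range(n + 1, len(mu)))
    return c * mu[n] / gain if gain > 0 else math.inf


def c_mu_index(c, mu, q, n):
    """c-mu index; q is unused but kept so both rules share one signature."""
    return c * mu[n]


def simulate(index_fn, classes, lams, slots, seed=0):
    """Time-averaged number of customers in the system under an index rule."""
    rng = random.Random(seed)
    system = []  # class label of every customer currently present
    total = 0
    for _ in range(slots):
        # Arrivals: at most one new customer per class and slot.
        for k, lam in enumerate(lams):
            if rng.random() < lam:
                system.append(k)
        if system:
            # Every customer draws a fresh state, independently of the past.
            states = [rng.choices(range(len(classes[k][1])),
                                  weights=classes[k][2])[0] for k in system]
            scores = [index_fn(*classes[k], n) for k, n in zip(system, states)]
            served = max(range(len(system)), key=scores.__getitem__)
            # The served customer departs with probability mu_{k,n}.
            if rng.random() < classes[system[served]][1][states[served]]:
                system.pop(served)
        total += len(system)
    return total / slots


# Hypothetical classes (c, mu, q); load = 0.2/0.8 + 0.2/0.6, about 0.58.
classes = [(1.0, [0.2, 0.8], [0.5, 0.5]), (1.0, [0.4, 0.6], [0.5, 0.5])]
lams = [0.2, 0.2]
for name, fn in [("PI", pi_index), ("c-mu", c_mu_index)]:
    print(name, simulate(fn, classes, lams, slots=20000))
```

Sweeping the first arrival probability and plotting the returned mean against the load gives a curve of the kind described in the caption above.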



Figure: Sample path of the number of customers in the system in Scenario 1, $\varrho = 0.95$.



Mean number of class-2 customers versus mean number of class-1 jobs.

Indifference curves link points with the same value of $\varrho$.



Numerical simulations: scenario 2

$\lambda_1$ and $\lambda_2$ fixed.

Departure probabilities fixed for class-2 customers.

We vary the departure probabilities of class-1 customers proportionally, moving $\varrho$ between 0.50 and 1.

Figure: Mean number of customers in the system versus $\varrho$, Scenario 2.


Stochastic dominance

Definition

Two random variables $X$ and $Y$ are stochastically ordered (denoted as $X \le_{st} Y$) if and only if $P(X \le z) \ge P(Y \le z)$, $\forall z$.

Simulations suggest stochastic dominance of the Potential Improvement rule over the other rules.

X: Number of jobs in the system.
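
On simulated samples, the ordering can be checked by comparing empirical distribution functions. The sketch below is one possible way to do this. The function name and the sample data are hypothetical, and agreement on finite samples only suggests, and does not prove, stochastic dominance.

```python
from typing import Sequence


def empirically_dominated(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if the empirical CDF of x is >= that of y everywhere (X <=_st Y)."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")

    def cdf(sample: Sequence[float], z: float) -> float:
        return sum(1 for v in sample if v <= z) / len(sample)

    # Empirical CDFs only jump at sample points, so checking those suffices.
    points = sorted(set(x) | set(y))
    return all(cdf(x, z) >= cdf(y, z) for z in points)


# Hypothetical queue-length samples under two rules.
rule_a = [0, 1, 1, 2]
rule_b = [1, 2, 2, 3]
print(empirically_dominated(rule_a, rule_b))
print(empirically_dominated(rule_b, rule_a))
```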


Conclusions and future work

Main conclusions:

Depending on the value of the parameters, RB or cµ might outperform the others.

PI consistently outperforms all the other policies (or is equivalent to the best one).

Simulations strongly suggest stochastic dominance of PI over the other rules.

The stability region is the maximum for PI rule, while it is not for cµ and RB rules.

Future work:

Include correlations between the states of different jobs in the model.

Overload analysis: the slope of the sample paths when the system is unstable.

Stability region for different policies.

