Optimized scheduling of sequential resource allocation systems (poster)



5.2 Static random switches

Static random switches are defined only by the set of enabled untimed transitions and not by the state itself, i.e., Ξi = Ξj if the vanishing states vi and vj activate the same set of untimed transitions.

• The corresponding policy space contains all the “static-priority” policies

• Mathematically, the proposed restriction corresponds to a state space aggregation

• Hence, we can refine the obtained solution through (partial) disaggregation
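The aggregation above can be sketched in a few lines: every vanishing state that enables the same set of untimed transitions shares one switch, so the number of decision variables is keyed by distinct transition sets rather than by states. This is our own illustrative sketch (the data layout and names are assumptions, not the poster's implementation):

```python
# Sketch: share one random switch across all vanishing states that enable
# the same set of untimed transitions (static random switches).

def build_static_switches(vanishing_states):
    """Map each vanishing state to a switch keyed by its enabled-transition set."""
    switches = {}    # frozenset of transitions -> shared switch (transition -> prob)
    assignment = {}  # vanishing state -> key of its shared switch
    for state, enabled in vanishing_states.items():
        key = frozenset(enabled)
        if key not in switches:
            n = len(key)
            switches[key] = {t: 1.0 / n for t in sorted(key)}  # uniform init
        assignment[state] = key
    return switches, assignment

# Example: v1 and v2 enable the same pair {t2, t6}, so they share one switch.
states = {"v1": ["t2", "t6"], "v2": ["t6", "t2"], "v3": ["t0", "t2", "t6"]}
switches, assignment = build_static_switches(states)
```

Disaggregation then amounts to re-introducing a private switch for selected vanishing states and re-optimizing only those variables.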

4. The methodological framework (demo with an example resource allocation system)

1. Background and motivation

Resource allocation in flexibly automated operations

Optimized Scheduling of Sequential Resource Allocation Systems
Ran Li ([email protected])
Spyros Reveliotis ([email protected])

[Figure: the example production cell, with workstations WS1 and WS2 and an I/O Port. Process route: WS1 -> WS2 -> WS1]

[Figure: state transition diagram of the underlying SMP. Nodes are the states (numbered 0–65); the edges leaving tangible states are labeled with the branching probabilities μi / (μi + μj) of the exponential races.]

maximize_ζ   η(ζ) = π(ζ)^T · r

subject to   Ξ_i^T · 1 = 1.0   for all v_i
             ε ≤ ζ_ij          for all v_i and all j in {1,…,k(i)}

where
Ξ_i = ⟨ζ_ij : j = 1,…,k(i)⟩, the random switch for vanishing state v_i
ζ = the vector collecting all ζ_ij
ε = a minimal degree of randomization in each Ξ_i
π(ζ) = the steady-state distribution of the tangible states, determined by the values assigned to the elements of ζ
r = the vector collecting the reward rates at the tangible states
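The constraints above make each random switch a probability vector bounded below by ε. A minimal sketch (our own, not from the poster) of the Euclidean projection onto this set, useful for restoring feasibility when a search step leaves it:

```python
# Projection onto {y : y.sum() == 1, y >= eps}, by shifting to a standard
# simplex of mass 1 - n*eps and projecting there (Duchi-style sort algorithm).
import numpy as np

def project_switch(x, eps):
    """Euclidean projection of x onto the eps-truncated probability simplex."""
    n = x.size
    z = x - eps
    mass = 1.0 - n * eps                      # requires n * eps < 1
    u = np.sort(z)[::-1]
    css = np.cumsum(u) - mass
    rho = np.nonzero(u - css / np.arange(1, n + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(z - theta, 0.0) + eps

xi = project_switch(np.array([0.9, 0.4, -0.1]), 0.05)  # -> [0.725, 0.225, 0.05]
```

An already-feasible point is left unchanged by the projection, so it can be applied after every update without bias at interior solutions.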

4.1 The example system

A flexibly automated production cell

Objective: maximize the long-run time-average throughput

Configuration:
• 2 workstations (WS), each with 1 server and 2 buffer slots
• Jobs in processing occupy their buffer slots
• 1 process type with 3 stages
• Stage j takes an exponentially distributed amount of time with rate μj

4.2 Generalized stochastic Petri net (GSPN)

Route t0 – p0 – t1 … p6 – t7: the process route
• Untimed transitions: their firing is immediate, and models the allocation of resources
• Timed transitions: their firing has an exponentially distributed delay, has lower priority than the firing of untimed transitions, and models the processing of job instances
• Places: model the different process stages
• Places p7 – p10: model resource availability
• Place p11 and its arcs (the red subnet): model the applied DAP

Model as a discrete event system

State space for the timed dynamics

The underlying optimization problem

4.3 State transition diagram for the underlying semi-Markov process (SMP) with reward

Tangible state: only timed transitions are enabled, and their branching probabilities are determined by an exponential race

Tangible state with rewards: the timed transition that models the output (i.e., transition t7) is enabled

Vanishing state: at least one untimed transition is enabled

Vanishing state with a random switch: at least two untimed transitions are enabled, and a decision of “which fires first” is needed
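The taxonomy above can be written as a tiny classifier. A sketch of ours (the timed/untimed split below is an assumed illustration, not taken from the poster):

```python
# Classify a state of the SMP by the transitions its marking enables.

UNTIMED = {"t0", "t2", "t4", "t6"}          # hypothetical untimed/timed split

def classify(enabled):
    """Return the state class for a set of enabled transitions."""
    untimed = set(enabled) & UNTIMED
    if len(untimed) >= 2:
        return "vanishing (random switch)"  # "which fires first" must be decided
    if untimed:
        return "vanishing"                  # the single untimed firing is immediate
    return "tangible"                       # exponential race among timed firings

cls = classify({"t2", "t6"})                # e.g., state 25 enables both t2 and t6
```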

Flexibly automated production cells, automated guided vehicle (AGV) systems, 2D traffic systems of free-ranging mobile agents, multi-threaded software

[Figure: two example process types and the resources they require. Process Type I: Stage I-1 -> Stage I-2. Process Type II: Stage II-1 -> Stage II-2a or Stage II-2b (choose one alternative) -> Stage II-3.]

All these applications can be abstracted as sequential resource allocation systems (RAS)

Sequential resource allocation systems

• A sequential resource allocation system consists of several process types, and reusable but finite resources of different types.

• A job instance of a process type can be executed by going through a number of stages.

• Each stage requires a certain amount of certain resource types and a random processing time.

• The job instances of different process types, or the same process type but different stages, may compete for the required resource.
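The definition above can be sketched as a minimal data model. The class and field names are our own illustration; the data is the example cell of Section 4.1, with the stage rates left as placeholders:

```python
# Minimal illustrative encoding of a sequential RAS.
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    needs: dict          # resource type -> units required by this stage
    rate: float          # rate of the exponential processing time

# The example cell: one process type, three stages, routed WS1 -> WS2 -> WS1,
# each stage holding one buffer slot of the workstation that serves it.
route = (
    Stage("stage1", {"WS1_buffer": 1}, 1.0),   # mu_1 (placeholder value)
    Stage("stage2", {"WS2_buffer": 1}, 1.0),   # mu_2
    Stage("stage3", {"WS1_buffer": 1}, 1.0),   # mu_3
)
capacity = {"WS1_buffer": 2, "WS2_buffer": 2}  # finite, reusable resources
```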

2. Problem definition

Objective

• Maximize some time-related performance measure, while
• maintaining behavioral correctness (e.g., avoiding deadlocks).

What can be regulated?

• Allocation of resources to the competing job instances

3. Method overview

The logical control problem has been well studied in the discrete event systems community. The performance control problem is in the domain of stochastic optimization. This research defines a discrete event model as the framework for solving the performance control problem while integrating the existing logical control results, and develops the supporting methodology.

[Figure: the control architecture. Configuration data instantiate a system state model of the RAS domain; on each event, logical control restricts the feasible actions to the admissible ones, from which performance control selects the commanded action.]

Deadlock
A pattern of “circular waiting”: all jobs in a given set cannot advance to their next stage since they are waiting for resources currently allocated to some other job in the set.

Optimal deadlock avoidance policy (DAP)
Forbid the actions that will unavoidably lead to deadlock states.

[Figure: a deadlock in the example cell. Stage 1 job instances fill the buffers of WS1 and Stage 2 job instances fill the buffers of WS2; no job instance can advance further, because all buffers are full.]

Optimal DAP: do not load new jobs if the total number of job instances in stages 1 and 2 is three
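The stated DAP reduces to a one-line admissibility test for the “load” action, sketched below (our own phrasing of the rule, not the poster's code):

```python
# The optimal DAP of the example cell as an admissibility test.

def can_load(n_stage1, n_stage2):
    """Admit a new job only if stages 1 and 2 together hold fewer than 3 jobs."""
    return n_stage1 + n_stage2 < 3

assert can_load(1, 1)      # two jobs in the first two stages: loading is safe
assert not can_load(2, 1)  # three jobs: loading would complete the circular wait
```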

Deadlock and deadlock avoidance in the example system

Implementation

[Figure: the GSPN of the example cell — transitions t0 through t7 and places p0 through p6 along the process route, places p7–p10 modeling resource availability, and place p11 (the red subnet) implementing the DAP; the untimed transitions fire immediately, and the timed transitions fire with rates μ1, μ2, μ3.]

5. Coping with the underlying complexity

t2 and t6 are enabled at state 25, but firing one transition does not disable the other

5.1 Random switch refinement

Some random switches are not necessary since they do not reflect “real conflicts” in resource allocation

Example:

We can replace {t2, t6} by the singleton {t2}, but not by {t6}: firing t6 first loses the possibility to reach the tangible state 39

For each vanishing state, the replacement can be performed if it does not impact the potential to reach any tangible state. Such a refinement maintains the performance potential of the policy space.
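The refinement test amounts to a reachability check: a transition may be dropped from a random switch only if the set of tangible states reachable from that vanishing state is unchanged. A sketch of ours on a toy graph (the graph encoding and the toy state numbers are illustrative, loosely modeled on the t2/t6 example):

```python
# Refinement test: dropping transition t at vanishing state v is allowed only
# if the reachable set of tangible states stays the same.

def reachable_tangibles(succ, tangible, start, disabled=frozenset()):
    """Tangible states reachable from `start`, ignoring edges in `disabled`."""
    seen, stack, found = set(), [start], set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if s in tangible:
            found.add(s)
            continue                      # stop expanding at tangible states
        for t, nxt in succ.get(s, []):
            if (s, t) not in disabled:
                stack.append(nxt)
    return found

# Toy fragment: at vanishing state 25 both t2 and t6 are enabled.
succ = {25: [("t2", 26), ("t6", 36)], 26: [("t6", 31), ("t5", 39)],
        36: [("t2", 31)]}
full = reachable_tangibles(succ, {31, 39}, 25)
# Dropping t6 at 25 keeps {31, 39} reachable; dropping t2 loses state 39.
```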


4.4 Mathematical programming formulation

Note that the vanishing states can be “collapsed” onto tangible states since they have zero sojourn times and zero rewards; the SMP then becomes a continuous-time Markov chain (CTMC). The steady-state distribution π(ζ) can either be (i) computed through the balance equations, or (ii) estimated through steady-state simulation.
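Option (i) can be sketched directly: stack the balance equations πQ = 0 with the normalization Σπ = 1 and solve the (overdetermined) linear system. The 3-state generator below is illustrative, not the example RAS:

```python
# Stationary distribution of a CTMC from its generator Q via pi Q = 0, sum = 1.
import numpy as np

def stationary(Q):
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])      # balance equations plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  1.0, -1.0]])
pi = stationary(Q)                        # -> [1/7, 2/7, 4/7]
```

The reward rate of the objective is then simply pi @ r for the tangible-state reward vector r.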

The whole state space

The green and yellow nodes correspond to the two static random switches that remain in the state space of the example RAS of Section 4, after refinement of the initial random switches.

4.5 Computational challenges

Explosion of vi => Explosion of ζij

Explosion of π(ζ)

In the example system:
• 3 stages, 2 single servers, 2 buffers of capacity 2
• 19 tangible states, 47 vanishing states
• 20 random switches, 27 decision variables

Increasing system size => state space explosion

5.3 Stochastic approximation: coping with the explosion of π(ζ)

A typical iteration of stochastic approximation is:

ζk+1 = ζk + γk Yk

ζk is the vector of decision variables at iteration k, γk is the positive step size, and Yk is the improvement direction. A typical choice of Yk for the average-reward problem of irreducible Markov chains is the estimated gradient. In this work, we adapt the likelihood ratio (LR) gradient estimator with a sample size of 2N regenerative cycles at each iteration; then:

where p is the transition probability, u_i is the i-th revisiting time to the reference state, and Λ_k is the accumulated likelihood ratio of p, i.e.,

$$\Lambda_k \;=\; \sum_{j=1}^{k} \frac{\nabla_{\zeta}\, p(m_{j-1}, m_j)}{p(m_{j-1}, m_j)}$$

Writing $\tau_i = u_i - u_{i-1}$ for the length and $R_i = \sum_{k=u_{i-1}+1}^{u_i} r(m_k)$ for the reward of the $i$-th cycle, the paired-cycle estimate is

$$\hat{Y} \;=\; \frac{1}{2N\hat{u}^{2}} \sum_{i=1}^{N} \Big[\, \tau_{2i} \!\!\sum_{k=u_{2i-2}+1}^{u_{2i-1}}\!\! r(m_k)\big(\Lambda_k-\Lambda_{u_{2i-2}}\big) \;-\; R_{2i} \!\!\sum_{k=u_{2i-2}+1}^{u_{2i-1}}\!\! \big(\Lambda_k-\Lambda_{u_{2i-2}}\big) \;+\; \tau_{2i-1} \!\!\sum_{k=u_{2i-1}+1}^{u_{2i}}\!\! r(m_k)\big(\Lambda_k-\Lambda_{u_{2i-1}}\big) \;-\; R_{2i-1} \!\!\sum_{k=u_{2i-1}+1}^{u_{2i}}\!\! \big(\Lambda_k-\Lambda_{u_{2i-1}}\big) \Big]$$

with $\hat{u}$ the sample mean cycle length.
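The LR machinery can be illustrated end-to-end on a toy chain. This is our own minimal sketch (the two-state chain, its reward, and the pairing of scored/plain cycles are our assumptions, not the poster's code): state 0 is the reference state, p(0→1) = θ is the lone decision variable, state 1 pays unit reward, so η(θ) = θ/(1+θ) and dη/dθ = 1/(1+θ)².

```python
# Toy likelihood-ratio gradient estimation over regenerative cycles.
import numpy as np

rng = np.random.default_rng(1)

def run_cycle(theta):
    """One regenerative cycle from state 0: returns (reward, length, score)."""
    reward, length, score, state = 0.0, 0.0, 0.0, 0
    while True:
        if state == 0:                     # p(0,1) = theta, p(0,0) = 1 - theta
            go = rng.random() < theta
            score += 1.0 / theta if go else -1.0 / (1.0 - theta)  # grad p / p
            state = 1 if go else 0
        else:                              # state 1 returns to 0 w.p. 1
            reward += 1.0
            state = 0
        length += 1.0
        if state == 0:
            return reward, length, score

def lr_gradient(theta, N):
    """Estimate d eta / d theta from 2N cycles, pairing scored and plain ones."""
    num, total_len = 0.0, 0.0
    for _ in range(N):
        R1, t1, L1 = run_cycle(theta)      # odd cycle carries the LR score
        R2, t2, _ = run_cycle(theta)       # even cycle: independent plain sample
        num += t2 * (R1 * L1) - R2 * (t1 * L1)
        total_len += t1 + t2
    u_bar = total_len / (2.0 * N)
    return num / (N * u_bar ** 2)

g = lr_gradient(0.4, 20000)                # true gradient: 1/1.4**2 ~ 0.51
```

Pairing a scored cycle with an independent plain cycle keeps the products t2·(R1·L1) and R2·(t1·L1) unbiased for E[τ]·∇E[R] and E[R]·∇E[τ], which is exactly why the estimator draws 2N cycles per iteration.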

6. Conclusion

An integrated framework for real-time management of sequential resource allocation systems based on
• the (formal) representational power of GSPNs;
• a parsimonious representation of the underlying conflicts;
• a pertinent specification of the set of target scheduling policies;
• results from sensitivity analysis of Markov reward processes.

The table shows the effectiveness of the complexity control for 20 RAS configurations (Config. 1 is the example system of Section 4).

R.S. = random switch(es); D.V. = decision variable(s)

Config.   Origin                Apply refinement     Apply static R.S.
          R.S.      D.V.        R.S.      D.V.       R.S.     D.V.
1         20        27          5         5          2        2
2         4         4           1         1          1        1
3         40        56          11        11         2        2
4         128       177         35        35         2        2
5         1,007     1,374       269       269        2        2
6         71        84          9         9          1        1
7         346       463         49        49         2        2
8         742       966         112       112        2        2
9         4,304     5,498       677       677        2        2
10        13,302    20,948      2,083     2,290      13       15
11        7,573     11,368      1,513     1,513      4        4
12        2,781     4,018       678       678        4        4
13        2,468     3,759       609       609        5        5
14        519       693         106       106        5        5
15        4,256     5,887       759       759        6        6
16        1,851     2,534       243       243        6        6
17        163,695   270,738     30,805    35,420     15       17
18        74,655    109,948     12,313    12,313     4        4
19        322,052   525,166     80,142    85,117     19       22
20        788,731   1,270,562   139,496   154,069    14       17

[Figure: the state space of the example RAS after random-switch refinement; the green and yellow nodes mark the two static random switches that remain.]

[Figure: successive refinement steps for the subnet around vanishing state 25, where t2 and t6 are both enabled; replacing the random switch {t2, t6} by the singleton {t2} preserves the reachability of every tangible state, including state 39.]