Optimized scheduling of sequential resource allocation systems (poster)
5.2 Static random switches
Static random switches are defined only by the set of enabled untimed transitions, not by the state itself, i.e.,
Ξi = Ξj if the vanishing states vi and vj activate the same set of untimed transitions
• The corresponding policy space contains all the “static-priority” policies
• Mathematically, the proposed restriction corresponds to a state space aggregation
• Hence, we can refine the obtained solution through (partial) disaggregation
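The aggregation described above can be sketched as a simple grouping of vanishing states by their enabled-transition sets, so that one decision vector is shared per group. This is a minimal sketch; the function name and data encoding are illustrative, not part of the poster's implementation:

```python
# Static random switches: all vanishing states that enable the same set of
# untimed transitions share one random switch (a state-space aggregation).

def group_static_switches(enabled):
    """enabled: dict mapping vanishing state -> set of enabled untimed
    transitions. Returns dict: frozenset of transitions -> list of states
    sharing the corresponding static random switch."""
    groups = {}
    for state, transitions in enabled.items():
        groups.setdefault(frozenset(transitions), []).append(state)
    return groups
```

Refining the solution through (partial) disaggregation then amounts to splitting one of these groups back into per-state switches.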
4. The methodological framework (demo with an example resource allocation system)

1. Background and motivation
Resource allocation in flexibly automated operations
Optimized Scheduling of Sequential Resource Allocation Systems
Ran Li ([email protected]), Spyros Reveliotis ([email protected])
[Figure: production cell layout with workstations WS1 and WS2 and an I/O port.]
Process route: WS1 -> WS2 -> WS1
[Figure: state transition diagram of the underlying SMP for the example system; nodes labeled 0–65, with branching probabilities µ1/(µ1+µ2), µ2/(µ1+µ2), µ2/(µ2+µ3), and µ3/(µ2+µ3) labeling the arcs out of the tangible states.]
maximize_ζ  η(ζ) = π(ζ)T • r
subject to
    ΞiT • 1 = 1.0  for all vi
    ε ≤ ζij  for all vi and all j in {1,…,k(i)}
where
    Ξi = < ζij : j = 1,…,k(i) > is the random switch for vanishing state vi,
    ζ is the vector collecting all ζij,
    ε is a minimal degree of randomization in each Ξi,
    π(ζ) is the steady-state distribution of the tangible states, determined by the values assigned to the elements of ζ, and
    r is the vector collecting the reward rates at the tangible states.
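The constraints above require each random switch to lie on the probability simplex with an ε floor on every entry. A minimal sketch of re-normalizing a candidate switch onto that feasible set follows; the helper name and the proportional redistribution rule are illustrative assumptions, not the poster's method:

```python
def project_switch(xi, eps=0.01):
    # Map an arbitrary weight vector onto the feasible set
    # { z : sum(z) = 1, z_j >= eps }; requires len(xi) * eps <= 1.
    # Give every entry the floor eps, then spread the remaining mass
    # proportionally to the (clamped) input weights.
    k = len(xi)
    w = [max(x, 0.0) for x in xi]
    s = sum(w)
    if s == 0.0:
        w, s = [1.0] * k, float(k)
    return [eps + (1.0 - k * eps) * wi / s for wi in w]
```

By construction the result sums to k·eps + (1 − k·eps) = 1 and respects the minimal degree of randomization ε.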
4.1 The example system
A flexibly automated production cell
Objective
• Maximize long-run time-average throughput
Configuration
• 2 workstations (WS), each with 1 server and 2 buffer slots
• The jobs in processing occupy their buffer slots
• 1 process type with 3 stages
• Stage j takes an exponentially distributed time with rate µj
4.2 Generalized stochastic Petri net (GSPN)
Route t0 – p0 – t1 … p6 – t7: the process route
• Untimed transitions: their firing is immediate and models the allocation of resources
• Timed transitions: their firing has an exponentially distributed delay, has lower priority than the firing of untimed transitions, and models the processing of job instances
• Places p0 – p6: model the different process stages
• Places p7 – p10: model resource availability
• Place p11 and its arcs (the red subnet): model the applied DAP
Model as a discrete event system → State space for the timed dynamics → The underlying optimization problem
4.3 State transition diagram for the underlying semi-Markov process (SMP) with rewards
• Tangible state: only timed transitions are enabled, and their branching probabilities are determined by the exponential race
• Tangible state with reward: the timed transition that models the output (i.e., transition t7) is enabled
• Vanishing state: at least one untimed transition is enabled
• Vanishing state with a random switch: at least two untimed transitions are enabled, and a decision of "which fires first" is needed
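The "exponential race" that fixes the branching probabilities at a tangible state can be checked with a short simulation. This is a sketch with illustrative function names; the closed form µi/Σµ follows from the memorylessness of the exponential distribution:

```python
import random

def race_branching_probs(rates):
    # In an exponential race, transition i fires first with
    # probability rate_i / sum(rates).
    total = sum(rates)
    return [r / total for r in rates]

def simulate_race(rates, trials=20000, seed=1):
    # Empirical check: sample all firing delays, count which fires first.
    rng = random.Random(seed)
    wins = [0] * len(rates)
    for _ in range(trials):
        delays = [rng.expovariate(r) for r in rates]
        wins[delays.index(min(delays))] += 1
    return [w / trials for w in wins]
```

For rates (µ1, µ2) the analytic answer µ2/(µ1+µ2) matches the labels on the arcs of the state transition diagram.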
Application domains:
• Flexibly automated production cells
• Automated guided vehicles (AGVs)
• 2D traffic systems of free-ranging mobile agents
• Multi-threaded software
[Figure: two process types. Process Type I: Stage I-1 → Stage I-2. Process Type II: Stage II-1 → Stage II-2a or Stage II-2b (choose one alternative) → Stage II-3. Each stage is annotated with its resources and the requirement on them.]
All these applications can be abstracted as sequential resource allocation systems (RAS)
Sequential resource allocation systems
• A sequential resource allocation system consists of several process types and a set of reusable but finite resources of different types.
• A job instance of a process type is executed by going through a number of stages.
• Each stage requires a certain amount of certain resource types and a random processing time.
• Job instances of different process types, or of the same process type at different stages, may compete for the required resources.
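The RAS abstraction in the bullets above can be sketched as a small data model. The class and field names are illustrative assumptions, not part of the poster:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    needs: dict      # resource type -> units required at this stage
    rate: float      # exponential processing rate

@dataclass
class ProcessType:
    name: str
    stages: list     # ordered list of Stage objects

@dataclass
class RAS:
    resources: dict      # resource type -> finite, reusable capacity
    process_types: list

    def feasible(self, stage, free):
        """A stage may start iff every required resource is available."""
        return all(free.get(r, 0) >= n for r, n in stage.needs.items())
```

Competition arises exactly when `feasible` holds for several waiting job instances but the free resources cannot serve them all.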
2. Problem definition
Objective
• Maximize some time-related performance measure, while
• maintaining behavioral correctness (e.g., avoiding deadlocks).
What can be regulated?
• Allocation of resources to the competing job instances
3. Method overview
The logical control problem has been well studied in the discrete event systems community. The performance control problem lies in the domain of stochastic optimization. This research defines a discrete event model as the framework for solving the performance control problem while integrating the existing logical control results, and develops the supporting methodology.
[Figure: control architecture. The RAS Domain reports Events to a System State Model parameterized by Configuration Data; the model supplies Feasible Actions to the Logical Control module, which passes Admissible Actions to the Performance Control module, which returns the Commanded Action to the RAS Domain.]
Deadlock
A pattern of "circular waiting": all jobs in a given set cannot advance to their next stage, since each is waiting for resources currently allocated to some other job in the set.
Optimal deadlock avoidance policy (DAP)
Forbid the actions that unavoidably lead to deadlock states.
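For the example cell of Section 4, the optimal DAP stated below reduces to a one-line admissibility check. This is a minimal sketch; the function name and the `limit` parameter are illustrative:

```python
def dap_admits_load(n_stage1, n_stage2, limit=3):
    # Optimal DAP for the example cell: a new job may be loaded only if
    # the total number of job instances in stages 1 and 2 stays below
    # the limit (here 3), so at least one buffer slot remains free and
    # the circular wait across WS1 and WS2 can never close.
    return n_stage1 + n_stage2 < limit
```

The GSPN implements the same rule structurally via place p11 and its arcs (the red subnet).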
Deadlock and deadlock avoidance in the example system
[Figure: WS1 and WS2 with every buffer slot occupied by stage-1 and stage-2 job instances; no job instance can advance further, because all buffers are full.]
Optimal DAP: do not load new jobs if the total number of job instances in stages 1 and 2 is three.
Implementation
[Figure: the GSPN implementation, with untimed and timed transitions t0 – t7, places p0 – p11, and timed-transition rates µ1, µ2, µ3.]
5. Coping with the underlying complexity
5.1 Random switch refinement
Some random switches are not necessary, since they do not reflect "real conflicts" in resource allocation.
Example: t2 and t6 are both enabled at state 25, but firing one transition does not disable the other. We can replace {t2, t6} by the singleton {t2}, but not by {t6}: firing t6 first loses the possibility of reaching tangible state 39.
For each vanishing state, the replacement can be performed if it does not impact the potential to reach any tangible state. Such a refinement maintains the performance potential of the policy space.
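The refinement test described above can be sketched on an explicit state graph. The encoding (states mapping to `{transition: successor}`, tangible states having no outgoing untimed firings) and the function names are assumptions for illustration:

```python
def reachable_tangibles(graph, state):
    """Tangible states reachable from `state` through untimed firings."""
    seen, stack, tangibles = set(), [state], set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        succs = graph.get(s, {})
        if not succs:           # no untimed transition enabled: tangible
            tangibles.add(s)
        else:
            stack.extend(succs.values())
    return tangibles

def can_refine_to(graph, state, transition):
    """The random switch at `state` may be replaced by the singleton
    {transition} iff the set of reachable tangible states is preserved."""
    full = reachable_tangibles(graph, state)
    restricted = reachable_tangibles(graph, graph[state][transition])
    return restricted == full
```

In the poster's example, restricting to {t2} preserves all reachable tangible states, while restricting to {t6} drops one, so only the former refinement is admissible.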
4.4 Mathematical programming formulation
Note that the vanishing states can be "collapsed" into tangible states, since they have zero sojourn times and zero rewards; the SMP then becomes a continuous-time Markov chain (CTMC). The steady-state distribution π(ζ) can either be
(i) computed through the balance equations, or
(ii) estimated through steady-state simulation.
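Option (i) can be sketched as follows for a small CTMC: solve π Q = 0 together with the normalization Σ π = 1. The generator matrix and the plain Gaussian-elimination routine are illustrative, not the poster's implementation:

```python
def ctmc_steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 for a small CTMC generator Q."""
    n = len(Q)
    # Balance equations are the columns of Q; replace the (redundant)
    # last one with the normalization constraint sum(pi) = 1.
    A = [[Q[i][j] for i in range(n)] for j in range(n)]  # Q transposed
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi
```

For the state spaces reported in Section 4.5, option (ii), steady-state simulation, becomes the practical alternative.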
The whole state space: the green and yellow nodes correspond to the two static random switches that remain in the state space of the example RAS of Section 4, after refinement of the initial random switches.
4.5 Computational challenges
Increasing system size =>
• explosion of the number of vanishing states vi => explosion of the number of decision variables ζij
• explosion of the size of π(ζ)
In the example system:
• 3 stages, 2 single servers, 2 buffers of capacity 2
• state space: 19 tangible states, 47 vanishing states
• 20 random switches, 27 decision variables

5.3 Stochastic approximation: coping with the explosion of π(ζ)
A typical iteration of stochastic approximation is:
ζk+1 = ζk + γk Yk
ζk is the vector of decision variables at iteration k, γk is the positive step size, and Yk is the improvement direction. A typical choice of Yk for the average-reward problem of irreducible Markov chains is the estimated gradient. In this work, we adapt the Likelihood Ratio gradient estimator with a sample size of 2N regenerative cycles at each iteration; then:
where
• p is the transition probability,
• u is the revisiting (regeneration) time to the reference state, and
• Λ is the accumulated likelihood ratio of p, i.e., Λk = Σ_{j=1}^{k} ∇ζ p(mj−1, mj) / p(mj−1, mj).
[Equation: the estimator ŶN averages, over the N pairs of regenerative cycles, products of the cycle reward sums Σk r(mk), the cycle lengths u2i − u2i−1, and the accumulated likelihood ratios Λk.]
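The iteration ζk+1 = ζk + γk Yk can be sketched generically. This is a minimal Robbins-Monro sketch: `grad_estimate` is a hypothetical callback standing in for the Likelihood Ratio estimator, and the step-size rule γk = 1/(k+1) is one standard choice, not necessarily the poster's:

```python
import random

def stochastic_approximation(grad_estimate, zeta0, iters=200, seed=0):
    """Generic Robbins-Monro iteration zeta_{k+1} = zeta_k + gamma_k * Y_k.

    grad_estimate(zeta, rng) returns a noisy improvement direction Y_k;
    gamma_k = 1 / (k + 1) is the classic diminishing step size.
    """
    rng = random.Random(seed)
    zeta = list(zeta0)
    for k in range(iters):
        gamma = 1.0 / (k + 1)
        Y = grad_estimate(zeta, rng)
        zeta = [z + gamma * y for z, y in zip(zeta, Y)]
    return zeta
```

In the poster's setting, each call to the gradient estimator consumes 2N regenerative cycles of the simulated chain, and a projection step would keep ζ inside the feasible set of Section 4.4.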
6. Conclusion
An integrated framework for real-time management of sequential resource allocation systems, based on:
• the (formal) representational power of GSPNs;
• a parsimonious representation of the underlying conflicts;
• a pertinent specification of the set of target scheduling policies;
• results from the sensitivity analysis of Markov reward processes.
The table shows the effectiveness of the complexity control for 20 RAS configurations (Config. 1 is the example system of Section 4). R.S. = random switch(es); D.V. = decision variable(s).

Config. | Origin (R.S. / D.V.) | Apply refinement (R.S. / D.V.) | Apply static R.S. (R.S. / D.V.)
1  | 20 / 27            | 5 / 5             | 2 / 2
2  | 4 / 4              | 1 / 1             | 1 / 1
3  | 40 / 56            | 11 / 11           | 2 / 2
4  | 128 / 177          | 35 / 35           | 2 / 2
5  | 1,007 / 1,374      | 269 / 269         | 2 / 2
6  | 71 / 84            | 9 / 9             | 1 / 1
7  | 346 / 463          | 49 / 49           | 2 / 2
8  | 742 / 966          | 112 / 112         | 2 / 2
9  | 4,304 / 5,498      | 677 / 677         | 2 / 2
10 | 13,302 / 20,948    | 2,083 / 2,290     | 13 / 15
11 | 7,573 / 11,368     | 1,513 / 1,513     | 4 / 4
12 | 2,781 / 4,018      | 678 / 678         | 4 / 4
13 | 2,468 / 3,759      | 609 / 609         | 5 / 5
14 | 519 / 693          | 106 / 106         | 5 / 5
15 | 4,256 / 5,887      | 759 / 759         | 6 / 6
16 | 1,851 / 2,534      | 243 / 243         | 6 / 6
17 | 163,695 / 270,738  | 30,805 / 35,420   | 15 / 17
18 | 74,655 / 109,948   | 12,313 / 12,313   | 4 / 4
19 | 322,052 / 525,166  | 80,142 / 85,117   | 19 / 22
20 | 788,731 / 1,270,562 | 139,496 / 154,069 | 14 / 17
[Figure: the fragment of the state transition diagram around vanishing state 25, showing the firings of t0, t2, t3, t5, and t6 among states 25–39, as used in the refinement example of Section 5.1.]