Solving Generalized Semi-Markov Decision Processes using Continuous Phase-Type Distributions
Håkan L. S. Younes, Reid G. Simmons
Carnegie Mellon University
Introduction
Asynchronous processes are abundant in the real world: telephone systems, computer networks, etc.
Discrete-time and semi-Markov models are inappropriate for systems with asynchronous events.
Generalized semi-Markov (decision) processes, GSM(D)Ps, are great for this!
Approximate solution using phase-type distributions and your favorite MDP solver.
Asynchronous Processes: Example
Two machines, m1 and m2:
t = 0: m1 up, m2 up
t = 2.5: m2 crashes (m1 up, m2 down)
t = 3.1: m1 crashes (m1 down, m2 down)
t = 4.9: m2 rebooted (m1 down, m2 up)
A Model of Stochastic Discrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962]:
A set of events E
A set of states S
GSMDP: actions A ⊆ E are controllable events
Events
With each event e is associated:
A condition φe identifying the set of states in which e is enabled
A distribution Ge governing the time e must remain enabled before it triggers
A distribution pe(s′|s) determining the probability that the next state is s′ if e triggers in state s
Events: Example
Network with two machines; crash time Exp(1), reboot time U(0,1).
t = 0 (m1 up, m2 up): Gc1 = Exp(1), Gc2 = Exp(1)
t = 0.6, crash m2 (m1 up, m2 down): Gc1 = Exp(1), Gr2 = U(0,1)
t = 1.1, crash m1 (m1 down, m2 down): Gr2 = U(0,0.5)
Asynchronous events take us beyond semi-Markov models: at t = 1.1 the reboot of m2 has already been enabled for 0.5 time units, so its remaining delay is distributed U(0,0.5), which depends on history, not just on the current state.
Policies
Actions are controllable events: we can choose to disable an action even if its enabling condition is satisfied.
A policy determines the set of actions to keep enabled at any given time during execution.
Rewards and Optimality
Lump-sum reward k(s,e,s′) associated with the transition from s to s′ caused by e
Continuous reward rate r(s,A) associated with A being enabled in s
Infinite-horizon discounted reward: unit reward earned at time t counts as e^(–αt)
Optimal choice may depend on the entire execution history
GSMDP Solution Method
GSMDP → continuous-time MDP: phase-type distributions (approximation)
Continuous-time MDP → discrete-time MDP: uniformization [Jensen 1953]
Discrete-time MDP → MDP policy: any MDP solver
MDP policy → GSMDP policy: simulate phase transitions
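The uniformization step can be sketched in a few lines. This is the standard construction P = I + Q/Λ for a rate Λ at least as large as the largest exit rate; the function name is ours, not from the talk:

```python
def uniformize(Q, rate=None):
    # Q: generator matrix of a continuous-time Markov chain
    # (off-diagonal entries are transition rates, rows sum to 0).
    # Returns the transition matrix P = I + Q/rate of the
    # uniformized discrete-time chain.
    n = len(Q)
    if rate is None:
        rate = max(-Q[i][i] for i in range(n))  # largest exit rate
    return [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
            for i in range(n)]

# Two-state chain: leave state 0 at rate 1, state 1 at rate 2.
P = uniformize([[-1.0, 1.0], [2.0, -2.0]])
```

With the default Λ = 2, the example yields P = [[0.5, 0.5], [1.0, 0.0]], whose rows are proper probability distributions.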
Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states:
Exponential: a single phase with rate λ.
Two-phase Coxian: from the first phase, absorption occurs with rate (1 – p)λ1 and transition to the second phase (rate λ2) with rate pλ1.
n-phase generalized Erlang: a chain of n phases, each with rate λ; with probability p the process exits after the first phase, and with probability 1 – p it traverses all n phases.
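A minimal sampler for these three families, simulating the underlying transient chain directly (function names and parameterization are ours; for the generalized Erlang, p is the probability of exiting after the first phase):

```python
import random

def sample_exponential(rate):
    # Single phase: time to absorption is one Exp(rate) holding time.
    return random.expovariate(rate)

def sample_coxian2(lam1, p, lam2):
    # Phase 1 holds Exp(lam1); with probability p continue to phase 2.
    t = random.expovariate(lam1)
    if random.random() < p:
        t += random.expovariate(lam2)
    return t

def sample_gen_erlang(n, p, lam):
    # With probability p, exit after the first phase; otherwise
    # traverse all n phases, each holding Exp(lam).
    phases = 1 if random.random() < p else n
    return sum(random.expovariate(lam) for _ in range(phases))
```

For example, the empirical mean of sample_gen_erlang(3, 0.0, 6.0) over many draws approaches 3/6 = 0.5.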
Approximating GSMDP with Continuous-time MDP
Approximate each distribution Ge with a continuous phase-type distribution.
Phases become part of the state description.
Phases represent a discretization into random-length intervals of the time events have been enabled.
Policy Execution
The policy we obtain is a mapping from the modified state space to actions.
To execute a policy we need to simulate phase transitions.
Times when the action choice may change:
Triggering of an actual event or action
Simulated phase transition
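A sketch of what simulating phase transitions during execution looks like, assuming the delay of an enabled event has been approximated by a generalized Erlang (names and parameterization are ours). Each yielded time is a decision point at which the policy may be re-consulted:

```python
import random

def phase_transition_times(n, p, lam, start=0.0):
    # Successive phase-transition times of a generalized-Erlang
    # approximation (p = probability of exiting after the first
    # phase).  In addition to real event and action triggers, the
    # executing policy is re-consulted at each yielded time.
    t = start
    phases = 1 if random.random() < p else n
    for phase in range(1, phases + 1):
        t += random.expovariate(lam)
        yield t, phase
```

With p = 0 the generator always yields n strictly increasing transition times.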
Method of Moments
Approximate a general distribution G with a phase-type distribution PH by matching the first k moments:
The i-th moment: μi = E[X^i]
Mean (first moment): μ1
Variance: σ² = μ2 – μ1²
Coefficient of variation: cv = σ/μ1
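These definitions in code, applied to the two distributions used in the examples at the end of the talk (helper name ours; the Weibull raw moments follow from μi = Γ(1 + i/k) for scale 1 and shape k = 1/2):

```python
def cv2_from_moments(mu1, mu2):
    # sigma^2 = mu2 - mu1^2;  cv^2 = sigma^2 / mu1^2
    return (mu2 - mu1 ** 2) / mu1 ** 2

# U(0,1): raw moments mu_i = E[X^i] = 1/(i + 1)
u_cv2 = cv2_from_moments(1 / 2, 1 / 3)   # cv^2 = 1/3
# W(1,1/2): mu1 = Gamma(3) = 2, mu2 = Gamma(5) = 24
w_cv2 = cv2_from_moments(2.0, 24.0)      # cv^2 = 5
```

The results, 1/3 and 5, are the squared coefficients of variation quoted on the example slides.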
Matching One Moment
Exponential distribution: λ = 1/μ1
Matching Two Moments
In the (μ1, cv²) plane, cv² = 1 is matched by the exponential distribution, cv² < 1 by an n-phase generalized Erlang distribution, and cv² > 1 by a two-phase Coxian distribution.
Exponential distribution (cv² = 1): λ = 1/μ1
Generalized Erlang distribution (cv² < 1):
n = ⌈1/cv²⌉
p = (2n·cv² + n – 2 – √(n² + 4 – 4n·cv²)) / (2(cv² + 1)(n – 1))
λ = (n – (n – 1)p)/μ1
(p is the probability of exiting after the first phase; with probability 1 – p the process traverses all n phases)
Two-phase Coxian distribution (cv² > 1):
λ1 = 2/μ1, p = 1/(2cv²), λ2 = 1/(μ1·cv²)
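The two-moment recipe can be packaged into one function (our code and naming; p for the generalized Erlang is the probability of exiting after its first phase):

```python
import math

def match_two_moments(mu1, cv2):
    # Returns ('exp', lam), ('coxian2', lam1, p, lam2) for cv2 > 1,
    # or ('erlang', n, p, lam) for cv2 < 1.
    if abs(cv2 - 1.0) < 1e-9:
        return ('exp', 1.0 / mu1)
    if cv2 > 1.0:
        return ('coxian2', 2.0 / mu1, 1.0 / (2.0 * cv2), 1.0 / (mu1 * cv2))
    n = math.ceil(1.0 / cv2 - 1e-9)   # guard against float round-up
    p = ((2 * n * cv2 + n - 2 - math.sqrt(n * n + 4 - 4 * n * cv2))
         / (2 * (cv2 + 1) * (n - 1)))
    p = min(1.0, max(0.0, p))         # clamp tiny numerical drift
    lam = (n - (n - 1) * p) / mu1
    return ('erlang', n, p, lam)
```

This reproduces the talk’s two worked examples: U(0,1) (μ1 = 1/2, cv² = 1/3) gives a three-phase Erlang with p = 0 and rate 6, and μ1 = 2, cv² = 5 gives the two-phase Coxian (λ1 = 1, p = 1/10, λ2 = 1/10).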
Matching Three Moments
Combination of Erlang and two-phase Coxian [Osogami & Harchol-Balter, TOOLS’03]: an Erlang chain of n – 2 phases feeding a two-phase Coxian (branch rates pλ1 and (1 – p)λ1), giving an n-phase distribution that matches the first three moments.
The Foreman’s Dilemma
When to enable the “Service” action in the “Working” state?
States (reward rates): Working (c = 1), Failed (c = 0), Serviced (c = 0.5)
Events:
Service ~ Exp(10): Working → Serviced
Return ~ Exp(1): Serviced → Working
Fail ~ G: Working → Failed
Replace ~ Exp(1/100): Failed → Working
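A Monte Carlo sketch of this model (our code, not from the talk; we assume discount rate α = 1, that the failure clock restarts whenever the machine re-enters “Working”, and a finite horizon long enough that the discounted tail is negligible). It estimates the value v0 of “Working” for a given enabling delay t0:

```python
import math
import random

def discounted(rate, t_from, t_to, alpha=1.0):
    # Discounted reward for earning `rate` continuously over [t_from, t_to].
    return rate / alpha * (math.exp(-alpha * t_from) - math.exp(-alpha * t_to))

def estimate_v0(t0, sample_fail, horizon=40.0, runs=2000, alpha=1.0):
    # Average discounted reward starting in "Working", with Service
    # enabled t0 time units after each entry into "Working".
    total = 0.0
    for _ in range(runs):
        t, v = 0.0, 0.0
        while t < horizon:
            # "Working": reward rate 1; race between Fail ~ G and
            # Service ~ Exp(10), the latter enabled at time t + t0.
            fail_at = t + sample_fail()
            service_at = t + t0 + random.expovariate(10.0)
            nxt = min(fail_at, service_at)
            v += discounted(1.0, t, min(nxt, horizon), alpha)
            if nxt >= horizon:
                break
            if service_at < fail_at:
                # "Serviced": reward rate 0.5, Return ~ Exp(1).
                leave = nxt + random.expovariate(1.0)
                v += discounted(0.5, nxt, min(leave, horizon), alpha)
            else:
                # "Failed": no reward, Replace ~ Exp(1/100).
                leave = nxt + random.expovariate(1.0 / 100.0)
            t = leave
        total += v
    return total / runs

random.seed(1)
v = estimate_v0(t0=0.0, sample_fail=lambda: random.uniform(5.0, 20.0))
```

Comparing estimate_v0(0.0, ...) with estimate_v0(float('inf'), ...) reproduces the restricted “immediately enabled” versus “never enabled” SMDP choice on the next slide. Since the reward rate never exceeds 1, any estimate lies strictly between 0 and 1.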
The Foreman’s Dilemma: Optimal Solution
Find the t0 that maximizes v0 (discount rate α = 1), where

v0 = ∫0..∞ e^(–t) [ (1 – FX(t))(1 – FY(t)) + fX(t)(1 – FY(t)) v1 + fY(t)(1 – FX(t)) v2 ] dt

v1 = (0.5 + v0)/2,  v2 = v0/101

fX(t) = 0 for t < t0 and 10·e^(–10(t – t0)) for t ≥ t0,  FX(t) = ∫0..t fX(x) dx

Y is the time to failure in the “Working” state.
The Foreman’s Dilemma: SMDP Solution
Same formulas, but restricted choice:
Action is immediately enabled (t0 = 0)
Action is never enabled (t0 = ∞)
The Foreman’s Dilemma: Performance
Failure-time distribution: U(5,x)
[Plot: percent of optimal (50 to 100) vs. x (5 to 50) for the SMDP solution and the 1-, 2-, and 3-moment approximations.]
The Foreman’s Dilemma: Performance
Failure-time distribution: W(1.6x,4.5)
[Plot: percent of optimal (50 to 100) vs. x (0 to 40) for the SMDP solution and the 1-, 2-, and 3-moment approximations.]
System Administration
Network of n machines
Reward rate c(s) = k in states where k machines are up
One crash event and one reboot action per machine
At most one action enabled at any time (single agent)
System Administration: Performance
Reboot-time distribution: U(0,1)
[Plot: reward (15 to 50) vs. number of machines n (1 to 13) for the 1-, 2-, and 3-moment approximations.]
System Administration: Performance

size    1 moment            2 moments           3 moments
        states   time (s)   states   time (s)   states   time (s)
4       16       0.36       32       3.57       112      10.30
5       32       0.82       80       7.72       272      22.33
6       64       1.89       192      16.24      640      40.98
7       128      3.65       448      28.04      1472     69.06
8       256      6.98       1024     48.11      3328     114.63
9       512      16.04      2304     80.27      7424     176.93
10      1024     33.58      5120     136.4      16384    291.70
11      2048     66.00      24576    264.17     35840    481.10
12      4096     111.96     53248    646.97     77824    1051.33
13      8192     210.03     114688   2588.95    167936   3238.16
n       2^n                 (n+1)2^n            (1.5n+1)2^n
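The state-count formulas in the bottom row, as code (function name ours); for size 13 they reproduce the table’s 8192, 114688, and 167936 states:

```python
def mdp_states(n, moments):
    # State counts from the formula row of the table: 2^n for the
    # 1-moment approximation, (n+1)2^n for 2 moments, and
    # (1.5n+1)2^n for 3 moments.
    if moments == 1:
        return 2 ** n
    if moments == 2:
        return (n + 1) * 2 ** n
    return int((1.5 * n + 1) * 2 ** n)
```

The phase variables of the approximating distributions multiply the basic 2^n up/down state space, which is why richer moment matching costs a larger (but still polynomial) factor.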
Summary
Generalized semi-Markov (decision) processes allow asynchronous events.
Phase-type distributions can be used to approximate a GSMDP with an MDP.
Allows us to approximately solve GSMDPs and SMDPs using existing MDP techniques.
Phase does matter!
Future Work
Discrete phase-type distributions: handle deterministic distributions, avoid the uniformization step
Other optimization criteria: finite horizon, etc.
Computational complexity of optimal GSMDP planning
Tempastic-DTP
A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html
Matching Moments: Example 1
Weibull distribution: W(1,1/2); μ1 = 2, cv² = 5
One moment: exponential with λ = 1/2
Two moments: two-phase Coxian with λ1 = 1, p = 1/10, λ2 = 1/10
[Plot: CDF F(t) of W(1,1/2) and of the one- and two-moment approximations, t from 0 to 8.]
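A closed-form check (helper name ours) that the fitted Coxian really has the first two moments of W(1,1/2):

```python
def coxian2_moments(lam1, p, lam2):
    # First two raw moments of a two-phase Coxian: with probability
    # 1-p the time is one Exp(lam1) phase, with probability p it is
    # the sum of an Exp(lam1) and an Exp(lam2) phase.
    mu1 = 1 / lam1 + p / lam2
    mu2 = 2 / lam1 ** 2 + 2 * p / (lam1 * lam2) + 2 * p / lam2 ** 2
    return mu1, mu2

mu1, mu2 = coxian2_moments(1.0, 0.1, 0.1)
cv2 = (mu2 - mu1 ** 2) / mu1 ** 2
# Matches mu1 = 2 and cv^2 = 5 up to floating-point rounding.
```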
Matching Moments: Example 2
Uniform distribution: U(0,1); μ1 = 1/2, cv² = 1/3
One moment: exponential with λ = 2
Two moments: three-phase generalized Erlang with p = 0 and λ = 6 (an ordinary three-phase Erlang)
[Plot: CDF F(t) of U(0,1) and of the one- and two-moment approximations, t from 0 to 2.]
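Since p = 0 here, the two-moment fit is a plain three-phase Erlang, and a quick closed-form check (helper name ours) confirms it reproduces the mean and variance of U(0,1) exactly:

```python
def erlang_moments(n, lam):
    # Mean and variance of an n-phase Erlang: the sum of n
    # independent Exp(lam) phases.
    return n / lam, n / lam ** 2

mean, var = erlang_moments(3, 6.0)
# U(0,1) has mean 1/2 and variance 1/12 -- an exact two-moment match.
```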