On Time with Minimal Expected Cost ! Time with Minimal Expected Cost ! Alexandre David, Peter G...

Post on 22-Mar-2018

218 views 3 download

Transcript of On Time with Minimal Expected Cost ! Time with Minimal Expected Cost ! Alexandre David, Peter G...

On Time with Minimal Expected Cost !

Alexandre David, Peter G Jensen, Kim G Larsen, Axel Legay, Didier Lime, Mathias G Sørensen, Jakob H Taankvist

– Aalborg University

DENMARK

TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAA

Motivation

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [2]

Optimal WC strategy:

E=WC=45 (bicycle)

Optimal Expected Strategy:

E=33, WC=71 (car)

Optimal Expected Strategy ensuring WC<=60:

try train upto 3 delays then bike

E=37.5, WC=59

Bruyere, V., Filiot, E., Randour, M., Raskin, J.F.: Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. STACS14

2-Player Game (Antagonistic opponent)

Markov Decision Process (probabilistic opponent)

Duration Probabilistic Automata

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [3]

P2

P1

Pr[<=1000](<> P1.Done && P2.Done)

Kempf, J.F., Bozga, M., Maler, O.: As soon as probable: Optimal scheduling under stochastic uncertainty. In: TACAS. pp. 385{400 (2013)

Uni[2,6]

Race

Uni[2,4]

Strategy that that will minimize

expected completion time??

Motivation

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [4]

× Priced Timed Game

Timed Game Timed Automata Priced Timed Automata Stochastic (P)TA × Priced Timed MDP ∼ Decision Stochastic Priced Timed Automata

TIGA

UPPAAL

CORA

SMC

TIGA/SMC

Minimize expected cost subject to guaranteed time-bound

Uni[0,100]

Nondeterministic choice

Overview

Priced Timed Games

Time bounded reachability strategies

Priced Timed Markov Decision Processes

Minimal expected cost reachability strategy

Optimal Strategy Synthesis Using Reinforcement Learning

Representation of Stochastic Strategies

Experimental Results

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [5]

PTA and PTG

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [6]

PTA Semantics

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [7]

Strategies & Outcome

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [8]

Cost Bounded Reachability Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [9]

Motivation

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [10]

Objective: 𝐴⟨⟩(END ∧ time ≤ 210)

Deterministic, memoryless strategy:

100 200

x w w

𝝀 𝝀

time

100 200

time

x w w

𝝀

90 70

𝝀

a

𝝀

a b

Most permissive, memoryless strategy

Priced Timed MDPs

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [11]

Stochastic Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [12]

Induced Probability Measure

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [13]

Minimum Expected Cost

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [14]

Motivation

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [15]

Minimal Expected Cost Strategy (0,b) 2*80=160 Expected Cost for TIGA Strategy (100,w) 4*95=380

Minimal Expected Cost while

guaranteeing END is reached

within time 210:

Strat.: t>90 (100,w)

t>70 (0,b) ow (0,a) = 204

Reinforcement Learning

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [16]

Time Bounded Reachability (G,T)

TIGA

SMC

SMC

Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [17]

Nondeterministic Strategies (UPPAAL TIGA)

Stochastic Strategies (non-lazy *)

* Non-lazy strategies suffices for DPAs

Classes allowing for

efficient

representation and

learning

Learning

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [18]

Simulation of 𝑼𝒏𝒊 𝝈𝒑 for 𝐴⟨⟩(END ∧ time ≤ 210) time(𝝅)@Choice

C(𝝅) wait

a

b

Strategies – Representation & Manipulation

Covariance Matrices

Logistic Regression

Splitting

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [19]

Using Learning Determinization

Experiments

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [20]

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [21]

More plots of runs according to strategies learne.

Covariance

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [22]

More plots of runs according to strategies learne.

Covariance

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [23]

More plots of runs according to strategies learne.

Covariance

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [24]

More plots of runs according to strategies learne.

Covariance

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [25]

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [26]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [27]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [28]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [29]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [30]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [31]

More plots of runs according to strategies learne.

Regression

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [32]

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [33]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [34]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [35]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [36]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [37]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [38]

More plots of runs according to strategies learne.

Splitting

Learned Strategies

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [39]

More plots of runs according to strategies learne.

Splitting

Experiments /DPA

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [40]

Kempf, J.F., Bozga, M., Maler, O.: As soon as probable: Optimal scheduling under stochastic uncertainty. In: TACAS. pp. 385{400 (2013)

http://www-verimag.imag.fr/PROJECTS/TEMPO/DATA/201304_dpa/

Experiments /DPA Random

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [41]

Experiments /DPA Random

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [42]

Conclusion & Future Work

Efficient synthesis of strategies for PTMDP ensuring time-bounds and minimizing expected cost.

If not time-bound needed we can omit the UPPAAL TIGA synthesis

Extension to Hybrid MDPs utilizing UPPAAL SMCs support for SHAs.

Make TIGA/SMC available to you! Datastructures supporting general stochastic

strategies – not just non-lazy ones. More clever filtrations of runs.

CASSTING, Brussels, May 21-23, 2014 Kim Larsen [43]