
Analysis of Markov Reward Models with Partial Reward Loss Based on a Time Reverse Approach

Gábor Horváth, Miklós Telek
Technical University of Budapest, 1521 Budapest, Hungary

{hgabor,telek}@webspn.hit.bme.hu

Markov Anniversary Meeting, June 2006


Outline

■ Markov Reward models with reward loss

■ The difficulty of the time forward approach

■ The time reverse analysis approach

■ Properties of the obtained solution

■ Numerical examples

■ Conclusions


Markov Reward models without reward loss

Markov reward models (MRM):

■ a finite-state CTMC,
■ non-negative reward rates ($r_i$),
■ performance measures:
  ◆ the reward accumulated up to time $t$,
  ◆ the time to accumulate reward $w$.

[Figure: sample paths of the CTMC state $Z(t)$ and the accumulated reward $B(t)$; $B(t)$ grows at rate $r_i$ while $Z(t) = i$.]
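To make the model concrete, here is a minimal simulation sketch (our addition, not from the talk); the generator Q, reward vector r, initial state and toy numbers are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_mrm(Q, r, T):
    """One CTMC trajectory on [0, T]; the reward B grows at rate r[i]
    while the chain stays in state i (no reward loss)."""
    n = Q.shape[0]
    state, t, B = 0, 0.0, 0.0          # hypothetical: start in state 0
    while True:
        rate = -Q[state, state]
        dwell = rng.exponential(1.0 / rate) if rate > 0 else np.inf
        if t + dwell >= T:             # horizon reached during this sojourn
            return B + r[state] * (T - t)
        B += r[state] * dwell
        t += dwell
        p = np.clip(Q[state], 0.0, None)      # off-diagonal jump rates
        state = rng.choice(n, p=p / p.sum())

# toy two-state chain (hypothetical numbers)
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
print(simulate_mrm(Q, np.array([3.0, 0.0]), T=10.0))
```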


Markov Reward models with total reward loss

We consider:

■ first-order MRMs (deterministic dependence on $Z(t)$),
■ without impulse reward,
■ but with potential reward loss at state transitions.

[Figure: sample paths of $Z(t)$ and $B(t)$ with total reward loss; the accumulated reward drops to zero at every state transition.]



Markov Reward models with partial reward loss

In case of partial reward loss:

■ $\alpha_i$ is the remaining portion of the reward when leaving state $i$;
■ the lost reward is proportional to:
  ◆ the total accumulated reward ⇒ partial total loss,
  ◆ the reward accumulated in the last state ⇒ partial incremental loss.

[Figure: sample paths of $Z(t)$ and $B(t)$ under the two loss models. With partial total loss, the whole accumulated reward is scaled at each transition: $B(T_1^-)\alpha_i$, $B(T_2^-)\alpha_k$, $B(T_3^-)\alpha_j$. With partial incremental loss, only the increment earned in the last state is scaled, e.g. $\alpha_j[B(T_3^-) - B(T_2)]$, which is equivalent to accumulating completed sojourns at the reduced rates $r_i\alpha_i$, $r_k\alpha_k$, $r_j\alpha_j$.]
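A one-line change to the earlier simulation sketch captures partial incremental loss (again our addition; Q, r and alpha are placeholders).

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_partial_incremental(Q, r, alpha, T):
    """Like simulate_mrm above, but on leaving state i only alpha[i] of
    the reward earned during that sojourn survives; the running sojourn
    still accumulates at the full rate."""
    n = Q.shape[0]
    state, t, B = 0, 0.0, 0.0
    while True:
        rate = -Q[state, state]
        dwell = rng.exponential(1.0 / rate) if rate > 0 else np.inf
        if t + dwell >= T:
            return B + r[state] * (T - t)     # last sojourn: no loss applied
        B += alpha[state] * r[state] * dwell  # keep alpha_i of the increment
        # (partial *total* loss would instead be: B = alpha[state] * (B + r[state] * dwell))
        t += dwell
        p = np.clip(Q[state], 0.0, None)
        state = rng.choice(n, p=p / p.sum())
```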



Time forward approach

Possible interpretation:

■ reduced ($r_i\alpha_i$) reward accumulation up to the last state transition,
■ and total ($r_i$) reward accumulation in the last state,

without reward loss.

[Figure: the partial incremental loss sample path from the previous slide; completed sojourns accumulate at the reduced rates $r_i\alpha_i$, while the last sojourn before the horizon accumulates at the full rate.]

Unfortunately, the last state transition before time $T$ is not a stopping time: whether a transition is the last one depends on the future of the process, so this forward decomposition is not directly tractable.



Time reverse approach

Behaviour of the time reverse process:

■ an inhomogeneous CTMC with initial probability $\overleftarrow{\gamma}(0) = \gamma(T)$ and generator $\overleftarrow{Q}(\tau) = \{\overleftarrow{q}_{ij}(\tau)\}$, where

$$\overleftarrow{q}_{ij}(\tau) =
\begin{cases}
\dfrac{\gamma_j(T-\tau)}{\gamma_i(T-\tau)}\, q_{ji} & \text{if } i \neq j,\\[2ex]
-\displaystyle\sum_{k \in S,\, k \neq i} \dfrac{\gamma_k(T-\tau)}{\gamma_i(T-\tau)}\, q_{ki} & \text{if } i = j;
\end{cases}$$

■ total ($r_i$) reward accumulation in the first state,
■ and reduced ($r_i\alpha_i$) reward accumulation in all consecutive states,
■ without reward loss.
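The reversed generator is straightforward to evaluate numerically from the forward transient distribution; a sketch (our addition) follows, computing $\gamma(t) = \gamma(0)\,e^{Qt}$ via scipy.

```python
import numpy as np
from scipy.linalg import expm

def reversed_generator(Q, gamma0, T, tau):
    """Generator of the time-reversed inhomogeneous CTMC at reverse time
    tau, built from the forward transient distribution gamma(T - tau)."""
    g = gamma0 @ expm(Q * (T - tau))          # gamma(T - tau)
    n = Q.shape[0]
    Qr = np.zeros_like(Q, dtype=float)
    for i in range(n):
        for j in range(n):
            if i != j:
                Qr[i, j] = g[j] / g[i] * Q[j, i]
        Qr[i, i] = -Qr[i].sum()               # diagonal per the formula above
    return Qr
```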



Time reverse approach

Potential model description: duplicate the state space to describe

■ the total reward accumulation in the first state ($r_i$),
■ and the reduced reward accumulation in all further states ($r_i\alpha_i$):

$$\pi^*(0) = [\gamma(T),\, 0], \qquad
\overleftarrow{Q}^*(\tau) =
\begin{bmatrix}
\overleftarrow{Q}_D(\tau) & \overleftarrow{Q}(\tau) - \overleftarrow{Q}_D(\tau)\\
0 & \overleftarrow{Q}(\tau)
\end{bmatrix}, \qquad
R^* =
\begin{bmatrix}
R & 0\\
0 & R_\alpha
\end{bmatrix}$$
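Assembling these blocks numerically is mechanical; the sketch below (our addition) reuses reversed_generator from above and assumes $\overleftarrow{Q}_D(\tau)$ is the diagonal part of $\overleftarrow{Q}(\tau)$, $R = \mathrm{diag}(r_i)$ and $R_\alpha = \mathrm{diag}(r_i\alpha_i)$, definitions the slides leave implicit.

```python
import numpy as np
from scipy.linalg import expm

def duplicated_model(Q, gamma0, r, alpha, T, tau):
    """pi*(0), Q*(tau) and R* of the duplicated-state-space description."""
    n = Q.shape[0]
    Qr = reversed_generator(Q, gamma0, T, tau)  # sketch above
    QrD = np.diag(np.diag(Qr))                  # assumed: diagonal part
    Z = np.zeros((n, n))
    pi0 = np.concatenate([gamma0 @ expm(Q * T), np.zeros(n)])  # [gamma(T), 0]
    Qstar = np.block([[QrD, Qr - QrD], [Z, Qr]])
    Rstar = np.block([[np.diag(r), Z], [Z, np.diag(r * alpha)]])
    return pi0, Qstar, Rstar
```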



Inhomogeneous differential equation

Introducing
$$\overleftarrow{Y}_i(\tau, w) = \Pr\!\left(\overleftarrow{B}(\tau) \le w,\ \overleftarrow{Z}(\tau) = i\right)$$
we can apply the analysis approach available for inhomogeneous MRMs.

It is based on the solution of the inhomogeneous partial differential equation
$$\frac{\partial}{\partial \tau} \overleftarrow{Y}(\tau, w) + \frac{\partial}{\partial w} \overleftarrow{Y}(\tau, w)\, R = \overleftarrow{Y}(\tau, w)\, \overleftarrow{Q}(\tau),$$
where $\overleftarrow{Y}(\tau, w) = \{\overleftarrow{Y}_i(\tau, w)\}$.

But a drawback of this approach is that it requires the computation of $\overleftarrow{Q}(\tau)$.



Homogeneous differential equation

To overcome this drawback we introduce the conditional distribution of the reward accumulated by the reverse process
$$\overleftarrow{V}_i(\tau, w) = \Pr\!\left(\overleftarrow{B}(\tau) \le w \,\middle|\, \overleftarrow{Z}(\tau) = i\right)$$
and the row vector $\overleftarrow{V}(\tau, w) = \{\overleftarrow{V}_i(\tau, w)\}$.

Using this performance measure we have to solve
$$\frac{\partial}{\partial \tau} \overleftarrow{V}(\tau, w) + \frac{\partial}{\partial w} \overleftarrow{V}(\tau, w)\, R = \overleftarrow{V}(\tau, w)\, Q^T,$$
where $Q^T$ is the transpose of $Q$.
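The slides state but do not derive this; a short verification sketch (our addition) substitutes $\overleftarrow{Y}_i(\tau,w) = \overleftarrow{V}_i(\tau,w)\,\gamma_i(T-\tau)$ into the inhomogeneous PDE and uses $\frac{d}{dt}\gamma(t) = \gamma(t)\,Q$:

```latex
% Verification sketch (our addition): why the time dependence cancels.
\begin{align*}
\partial_\tau \overleftarrow{Y}_i
  &= \gamma_i(T-\tau)\,\partial_\tau \overleftarrow{V}_i
   - \gamma_i'(T-\tau)\,\overleftarrow{V}_i,\\
\sum_j \overleftarrow{Y}_j\, \overleftarrow{q}_{ji}(\tau)
  &= \gamma_i(T-\tau)\sum_{j\neq i} \overleftarrow{V}_j\, q_{ij}
   - \overleftarrow{V}_i \sum_{k\neq i} \gamma_k(T-\tau)\, q_{ki},\\
\gamma_i'(T-\tau)
  &= \sum_{k\neq i} \gamma_k(T-\tau)\, q_{ki} + \gamma_i(T-\tau)\, q_{ii}.
\end{align*}
% The sums over k != i cancel; dividing by gamma_i(T - tau) leaves
%   d/dtau V + (d/dw V) R = V Q^T,  a constant-coefficient PDE.
```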


Block structure of the differential equation

Utilizing the special block structure of the $\overleftarrow{Q}^*(\tau)$ and $R^*$ matrices (of size $2\#S$) we can obtain two homogeneous partial differential equations of size $\#S$:
$$\frac{\partial}{\partial \tau} \overleftarrow{X}_1(\tau, w) + \frac{\partial}{\partial w} \overleftarrow{X}_1(\tau, w)\, R = \overleftarrow{X}_1(\tau, w)\, Q_D,$$
and
$$\frac{\partial}{\partial \tau} \overleftarrow{X}_2(\tau, w) + \frac{\partial}{\partial w} \overleftarrow{X}_2(\tau, w)\, R_\alpha = \overleftarrow{X}_1(\tau, w)\,(Q - Q_D)^T + \overleftarrow{X}_2(\tau, w)\, Q^T.$$



Moments of accumulated reward

The analysis approach available for inhomogeneous MRMs allows us to describe the moments of IMRMs with an inhomogeneous ordinary differential equation.

As in the reward distribution case, this approach is also applicable to our model, but it requires the computation of $\overleftarrow{Q}(\tau)$.

Using similar state-dependent moment measures we obtain homogeneous ordinary differential equations

$$\frac{d}{d\tau} \overleftarrow{M}_1^{(n)}(\tau) = n\, \overleftarrow{M}_1^{(n-1)}(\tau)\, R + \overleftarrow{M}_1^{(n)}(\tau)\, Q_D,$$
and
$$\frac{d}{d\tau} \overleftarrow{M}_2^{(n)}(\tau) = n\, \overleftarrow{M}_2^{(n-1)}(\tau)\, R_\alpha + \overleftarrow{M}_1^{(n)}(\tau)\,(Q - Q_D)^T + \overleftarrow{M}_2^{(n)}(\tau)\, Q^T.$$
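Since these are plain linear ODEs with constant coefficients, any standard integrator can evaluate them; below is a sketch (our addition) using scipy's solve_ivp, with the moment measures treated as row vectors and the initial values M1_0, M2_0 supplied by the caller (the slides do not spell them out).

```python
import numpy as np
from scipy.integrate import solve_ivp

def reverse_moments(Q, r, alpha, T, nmax, M1_0, M2_0):
    """Integrate the coupled moment ODEs for n = 0..nmax on [0, T]."""
    S = len(r)
    R, Ra = np.diag(r), np.diag(r * alpha)
    QD = np.diag(np.diag(Q))
    QmD_T, QT = (Q - QD).T, Q.T

    def rhs(tau, y):
        M1 = y[:S * (nmax + 1)].reshape(nmax + 1, S)
        M2 = y[S * (nmax + 1):].reshape(nmax + 1, S)
        dM1, dM2 = np.zeros_like(M1), np.zeros_like(M2)
        dM1[0] = M1[0] @ QD                       # n = 0: no R term
        dM2[0] = M1[0] @ QmD_T + M2[0] @ QT
        for n in range(1, nmax + 1):
            dM1[n] = n * M1[n - 1] @ R + M1[n] @ QD
            dM2[n] = n * M2[n - 1] @ Ra + M1[n] @ QmD_T + M2[n] @ QT
        return np.concatenate([dM1.ravel(), dM2.ravel()])

    y0 = np.concatenate([M1_0.ravel(), M2_0.ravel()])
    sol = solve_ivp(rhs, (0.0, T), y0, method="LSODA", rtol=1e-10)
    return sol.y[:, -1]
```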


Randomization based numerical method

The ordinary differential equations with constant coefficients allow us to compose a randomization-based numerical method.

$$\overleftarrow{M}_1^{(n)}(\tau) = \tau^n\, e\, R^n E_D(\tau),$$
and
$$\overleftarrow{M}_2^{(n)}(\tau) = n!\, d^n \sum_{k=0}^{\infty} e^{-\lambda\tau}\, \frac{(\lambda\tau)^k}{k!}\, D^{(n)}(k),$$
where
$$D^{(n)}(k) =
\begin{cases}
e\,(I - A_D^k) & n = 0,\\
0 & k \le n,\ n \ge 1,\\
D^{(n-1)}(k-1)\, S_\alpha + D^{(n)}(k-1)\, A + \dbinom{k-1}{n}\, e\, S^n A_D^{\,k-1-n}\,(A - A_D) & k > n,\ n \ge 1.
\end{cases}$$
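The slide leaves the randomization quantities implicit; the sketch below (our addition) assumes the usual uniformization choices, $\lambda \ge \max_i |q_{ii}|$, $A = I + Q^T/\lambda$, $A_D = I + Q_D/\lambda$, $S = R/\lambda$, $S_\alpha = R_\alpha/\lambda$, $e$ a row vector, and $d$ a scalar constant (presumably $1/\lambda$). These are assumptions, not the authors' stated definitions.

```python
import numpy as np
from math import comb, factorial
from scipy.stats import poisson

def D_table(A, AD, S, Sa, e, nmax, kmax):
    """Tabulate the row vectors D^(n)(k) of the randomization recursion."""
    m = e.shape[0]
    D = np.zeros((nmax + 1, kmax + 1, m))
    for k in range(kmax + 1):
        D[0, k] = e @ (np.eye(m) - np.linalg.matrix_power(AD, k))
    for n in range(1, nmax + 1):
        for k in range(n + 1, kmax + 1):   # D^(n)(k) = 0 for k <= n
            D[n, k] = (D[n - 1, k - 1] @ Sa + D[n, k - 1] @ A
                       + comb(k - 1, n)
                         * e @ np.linalg.matrix_power(S, n)
                             @ np.linalg.matrix_power(AD, k - 1 - n)
                             @ (A - AD))
    return D

def M2_moment(n, tau, lam, d, D):
    """Poisson-weighted sum, truncated at the table size kmax."""
    w = poisson.pmf(np.arange(D.shape[1]), lam * tau)
    return factorial(n) * d ** n * (w[:, None] * D[n]).sum(axis=0)
```

Truncating the Poisson sum at a finite kmax yields computable error bounds, which is the basis of the numerical stability and error control claims on the concluding slide.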


Numerical Example

[Figure: structure of the Markov chain. States $N, N-1, N-2, \ldots, 0$ carry reward rates $r_N = Nr$, $r_{N-1} = (N-1)r$, $r_{N-2} = (N-2)r$, \ldots, $r_0 = 0$ and loss ratios $\alpha_N = \alpha_{N-1} = \alpha_{N-2} = 0.5$; an additional state $M$ has $r_M = 0$ and $\alpha_M = \alpha_0 = 1$; the transition rates are $\lambda$, $\sigma$ and $\rho$.]

[Figure: moments of the accumulated reward; the 1st through 5th moments are plotted against $t \in [0.5, 5]$ on a logarithmic scale spanning $10^{-5}$ to $10$.]

With parameters $N = 500000$, $\lambda = 0.000004$, $\sigma = 1.5$, $\rho = 0.1$, $r = 0.000002$, $\alpha = 0.5$.
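Since the chain's exact topology lives in the lost figure, the sketch below (our addition) wires up one plausible reading purely as a placeholder: level $i$ degrades to $i-1$ at rate $\lambda$, is repaired to $N$ at rate $\sigma$, and state $M$ is visited from $N$ at rate $\rho$. This wiring is a guess, not the authors' model, but it produces inputs in the shape the earlier sketches expect.

```python
import numpy as np

def example_model(N=5, lam=4e-6, sigma=1.5, rho=0.1, r=2e-6, alpha=0.5):
    """Hypothetical wiring of the example chain (topology is a guess)."""
    S = N + 2                          # levels 0..N plus the extra state M
    M = N + 1
    Q = np.zeros((S, S))
    for i in range(1, N + 1):
        Q[i, i - 1] = lam              # gradual degradation (guess)
    for i in range(N):
        Q[i, N] = sigma                # repair to full level (guess)
    Q[N, M] = rho                      # excursion to M (guess)
    Q[M, N] = sigma                    # return from M (guess)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    rvec = np.array([i * r for i in range(N + 1)] + [0.0])  # r_i = i*r, r_M = 0
    avec = np.array([1.0] + [alpha] * N + [1.0])            # alpha_0 = alpha_M = 1
    return Q, rvec, avec
```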



Conclusions

The analysis of partial loss MRMs is usually rather complex.

We propose an analysis method with the following features:

■ non-stopping time ⇒ time reverse approach,
■ inhomogeneous differential equation ⇒ proper performance measure,
■ partial differential equation ⇒ ordinary differential equations,
■ numerical stability, error control ⇒ randomization-based analysis.

Thanks for your attention.