Stochastic Dynamic Programming
gdrro.lip6.fr/sites/default/files/Expose-Leclere...

Transcript of "Stochastic Dynamic Programming"

Page 1:

Stochastic Dynamic Programming

V. Leclere (CERMICS, ENPC)

July 5, 2016

V. Leclere Dynamic Programming July 5, 2016 1 / 20

Page 2:

Contents

1 Deterministic Dynamic Programming

2 Stochastic Dynamic Programming

3 Curses of Dimensionality



Page 4:

Controlled Dynamic System

A controlled dynamic system is defined by its dynamics

    x_{t+1} = f_t(x_t, u_t)

and an initial state x_0. The variables:

x_t is the state of the system,

u_t is the control applied to the system at time t.

Examples:

x_t is the position and speed of a satellite, u_t the acceleration due to the engine (at time t).

x_t is the stock of products available, u_t the consumption at time t.

...

Page 5:

Optimization Problem

We want to solve the following optimization problem:

    min_{u_0, ..., u_{T−1}}  ∑_{t=0}^{T−1} L_t(x_t, u_t) + K(x_T)    (1a)

    s.t.  x_{t+1} = f_t(x_t, u_t),  x_0 given                        (1b)

          u_t ∈ U_t(x_t)                                             (1c)

where

L_t(x, u) is the cost incurred between t and t+1 for a starting state x with control u;

K(x) is the final cost incurred for the final state x;

f_t is the dynamics of the dynamical system;

U_t(x) is the set of admissible controls at time t with starting state x.

Note: this is a Shortest Path Problem on an acyclic directed graph.

Page 6:

Problem decomposition

The problem can be written

    min_{u_0}  { L_0(x_0, u_0) + min_{u_1, ..., u_{T−1}}  ∑_{t=1}^{T−1} L_t(x_t, u_t) + K(x_T) }

    s.t.  x_{t+1} = f_t(x_t, u_t)

          x_1 = f_0(x_0, u_0)

          u_t ∈ U_t(x_t)

Or, more simply,

    min_{u_0}  L_0(x_0, u_0) + V_1(f_0(x_0, u_0))

where V_1(x) is the value of the problem starting at time t = 1 with state x_1 = x.

Page 7:

Bellman value function

More generically, we denote by V_{t_0}(x) the optimal value of the problem starting at time t_0 with state x:

    V_{t_0}(x) = min_{u_{t_0}, ..., u_{T−1}}  ∑_{t=t_0}^{T−1} L_t(x_t, u_t) + K(x_T)    (2a)

    s.t.  x_{t+1} = f_t(x_t, u_t),  x_{t_0} = x                                         (2b)

          u_t ∈ U_t(x_t)                                                                (2c)

Page 8:

Bellman Equation

Theorem

We have the Bellman equation (we assume existence of minimizers):

    V_T(x) = K(x)    ∀ x ∈ X_T

    V_t(x) = min_{u_t ∈ U_t(x)}  L_t(x, u_t) + V_{t+1}(f_t(x, u_t))    ∀ x ∈ X_t,

where f_t(x, u_t) plays the role of x_{t+1}. The optimal policy is given by

    π♯_t(x) ∈ arg min_{u_t ∈ U_t(x)}  { L_t(x, u_t) + V_{t+1}(f_t(x, u_t)) }    ∀ x ∈ X_t.
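The backward recursion above can be sketched in a few lines of Python. The horizon, dynamics, costs and control sets below are illustrative assumptions (a toy inventory problem), not data from the slides:

```python
# Deterministic dynamic programming by backward induction (Bellman equation).
# Toy inventory problem (illustrative): state = stock in {0..4},
# control = quantity ordered, one unit consumed per period.

T = 3
STATES = range(5)

def f(t, x, u):                 # dynamics f_t: next stock, clipped to [0, 4]
    return max(0, min(4, x + u - 1))

def L(t, x, u):                 # stage cost L_t: ordering cost + holding cost
    return 2 * u + 0.5 * x

def K(x):                       # final cost K
    return 0.0

def U(t, x):                    # admissible controls U_t(x)
    return range(0, 6 - x)

V = {T: {x: K(x) for x in STATES}}      # V_T = K
policy = {}
for t in reversed(range(T)):            # t = T-1, ..., 0
    V[t], policy[t] = {}, {}
    for x in STATES:
        # Bellman equation: V_t(x) = min_u L_t(x, u) + V_{t+1}(f_t(x, u))
        best_u = min(U(t, x), key=lambda u: L(t, x, u) + V[t + 1][f(t, x, u)])
        policy[t][x] = best_u
        V[t][x] = L(t, x, best_u) + V[t + 1][f(t, x, best_u)]
```

Running the loop yields both the look-up tables V_t and the policy π_t for every state at every time, which is exactly what the theorem provides.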

Page 9:

Policy

Definition

An admissible policy for problem (1) is a sequence of functions (π_t), each mapping the set X_t of possible states at time t into the set U_t of possible controls, and such that

    ∀ t ∈ {0, ..., T−1},  ∀ x ∈ X_t,  π_t(x) ∈ U_t(x).

Page 10:

Open-Loop vs Closed-Loop solution

Problem (1) can be solved with a Pontryagin approach, which yields a sequence of optimal controls (u♯_0, ..., u♯_{T−1}). This is a so-called open-loop solution, as it is decided once (at time t = 0) and never questioned. This type of solution is easy to store and use, but not robust to errors or imprecisions.

The Dynamic Programming approach yields an optimal policy {π♯_t}_{t ∈ {0, ..., T−1}}. This is a so-called closed-loop solution, as the control u_t is chosen at time t according to the actual state x_t. It is more complex to compute and use, but more robust to errors or imprecisions.

In a deterministic and exact setting, an open-loop solution is equivalent to a closed-loop solution.

Page 11:

Contents

1 Deterministic Dynamic Programming

2 Stochastic Dynamic Programming

3 Curses of Dimensionality


Page 12:

Stochastic Controlled Dynamic System

A stochastic controlled dynamic system is defined by its dynamics

    x_{t+1} = f_t(x_t, u_t, ξ_{t+1})

and an initial state x_0. The variables:

x_t is the state of the system,

u_t is the control applied to the system at time t,

ξ_t is an exogenous noise.

Page 13:

Examples

Stock of water in a dam:
  x_t is the amount of water in the dam at time t,
  u_t is the amount of water turbined at time t,
  ξ_t is the inflow of water at time t.

Boat in the ocean:
  x_t is the position of the boat at time t,
  u_t is the direction and speed chosen at time t,
  ξ_t is the wind and current at time t.

Subway network:
  x_t is the position and speed of each train at time t,
  u_t is the acceleration chosen at time t,
  ξ_t is the delay due to passengers and incidents on the network at time t.

Page 14:

Optimization Problem

We want to solve the following optimization problem:

    min  E[ ∑_{t=0}^{T−1} L_t(x_t, u_t, ξ_{t+1}) + K(x_T) ]    (3a)

    s.t.  x_{t+1} = f_t(x_t, u_t, ξ_{t+1}),  x_0 given         (3b)

          u_t ∈ U_t(x_t)                                       (3c)

          σ(u_t) ⊂ F_t := σ(ξ_0, ..., ξ_t)                     (3d)

where

constraint (3b) is the dynamics of the system;

constraint (3c) is the constraint on the controls;

constraint (3d) is the information constraint: u_t is chosen knowing the realisations of the noises ξ_0, ..., ξ_t, but without knowing the realisations of the noises ξ_{t+1}, ..., ξ_{T−1}.
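A policy satisfying the information constraint (3d) — here in the form u_t = π_t(x_t), so u_t uses no future noise — can be evaluated by Monte-Carlo simulation of the expected cost (3a). The dynamics, costs, noise law and policy below are illustrative assumptions:

```python
import random

# Monte-Carlo estimate of the expected cost (3a) for a fixed admissible policy.
# Dynamics f_t, costs L_t and K, and the noise law are illustrative assumptions.

T = 5
random.seed(0)

def f(t, x, u, xi):             # dynamics f_t(x, u, xi)
    return x + u - xi

def L(t, x, u, xi):             # stage cost L_t
    return u + 0.1 * x * x

def K(x):                       # final cost
    return abs(x)

def policy(t, x):               # a feasible policy: u_t = pi_t(x_t)
    return max(0, 1 - x)        # order up to stock level 1 (illustrative)

def simulate(x0):
    """One trajectory: u_t depends only on x_t, respecting constraint (3d)."""
    x, cost = x0, 0.0
    for t in range(T):
        u = policy(t, x)
        xi = random.choice([0, 1, 2])   # exogenous demand noise xi_{t+1}
        cost += L(t, x, u, xi)
        x = f(t, x, u, xi)
    return cost + K(x)

estimate = sum(simulate(x0=1) for _ in range(10_000)) / 10_000
```

Note that sampling the noise *after* the control is computed enforces the decision-hazard information structure discussed below.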

Page 15:

Dynamic Programming Principle

Theorem

Assume that the noises ξ_t are independent and exogenous. Then there exists (under technical assumptions, satisfied in the discrete case) an optimal solution, called a strategy, of the form u_t = π_t(x_t). We have

    π_t(x) ∈ arg min_{u ∈ U_t(x)}  E[ L_t(x, u, ξ_{t+1}) + V_{t+1}(f_t(x, u, ξ_{t+1})) ],

where the first term is the current cost and the second the future costs, and where (Dynamic Programming Equation)

    V_T(x) = K(x)

    V_t(x) = min_{u ∈ U_t(x)}  E[ L_t(x, u, ξ_{t+1}) + V_{t+1}(f_t(x, u, ξ_{t+1})) ],

with f_t(x, u, ξ_{t+1}) playing the role of the next state x_{t+1}.


Page 17:

Interpretation of Bellman Value

The Bellman value function V_{t_0}(x) can be interpreted as the value of the problem starting at time t_0 from the state x. More precisely, we have

    V_{t_0}(x) = min  E[ ∑_{t=t_0}^{T−1} L_t(x_t, u_t, ξ_{t+1}) + K(x_T) ]

    s.t.  x_{t+1} = f_t(x_t, u_t, ξ_{t+1}),  x_{t_0} = x

          u_t ∈ U_t(x_t)

          σ(u_t) ⊂ σ(ξ_0, ..., ξ_t)

Page 18:

Information structure I

In Problem (3), constraint (3d) is the information constraint. There are different possible information structures.

If constraint (3d) reads σ(u_t) ⊂ F_0, the problem is open-loop, as the controls are chosen without knowledge of the realisation of any noise.

If constraint (3d) reads σ(u_t) ⊂ F_t, the problem is said to be in decision-hazard structure, as the decision u_t is chosen without knowing ξ_{t+1}.

If constraint (3d) reads σ(u_t) ⊂ F_{t+1}, the problem is said to be in hazard-decision structure, as the decision u_t is chosen with knowledge of ξ_{t+1}.

If constraint (3d) reads σ(u_t) ⊂ F_{T−1}, the problem is said to be anticipative, as the decision u_t is chosen with knowledge of all the noises.


Page 23:

Information structure II

Be careful when modeling your information structure:

An open-loop information structure might happen in practice (you have to decide on a planning and stick to it). If the problem does not require an open-loop solution, then an open-loop solution might be largely suboptimal (imagine driving a car with your eyes closed...). In any case it yields an upper bound on the problem.

In some cases decision-hazard and hazard-decision are both approximations of reality. Hazard-decision yields a lower value than decision-hazard.

The anticipative structure is never an accurate model of reality. However, it can yield a lower bound on your optimization problem, relying on deterministic optimization and Monte-Carlo.


Page 26:

Non-independence of noise in DP

The Dynamic Programming equation requires only the time-independence of the noises. This can be relaxed if we consider an extended state.

Consider a dynamic system driven by an equation

    y_{t+1} = f_t(y_t, u_t, ε_{t+1})

where the random noise ε_t is an AR(1) process:

    ε_t = α_t ε_{t−1} + β_t + ξ_t,

with {ξ_t}_{t ∈ Z} independent. Then y_t is called the physical state of the system, and DP can be used with the information state x_t = (y_t, ε_{t−1}).

Generically speaking, if the noise ξ_t is exogenous (not affected by the decisions u_t), then we can always apply Dynamic Programming with the state (x_t, ξ_1, ..., ξ_t).
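A minimal sketch of this state augmentation, with illustrative AR(1) coefficients and physical dynamics: the pair (y_t, ε_{t−1}) evolves as a Markov chain driven only by the independent innovations ξ_t, so DP applies to the augmented state:

```python
import random

# Sketch: when the noise eps_t is AR(1), eps_t = a * eps_prev + b + xi_t,
# the pair (y_t, eps_{t-1}) is a valid Markovian state for DP.
# The coefficients and the physical dynamics below are illustrative assumptions.

a, b = 0.8, 0.1

def step(state, u):
    """Evolve the augmented information state (y, eps_prev)."""
    y, eps_prev = state
    xi = random.gauss(0.0, 1.0)          # independent innovation xi_t
    eps = a * eps_prev + b + xi          # reconstruct the AR(1) noise eps_t
    y_next = y + u - eps                 # physical dynamics (illustrative)
    return (y_next, eps)                 # next state carries eps for time t+1

state = (0.0, 0.0)
for t in range(3):
    state = step(state, u=1.0)
```

The point of the sketch is only that `step` needs nothing but the augmented state and an independent draw, which is exactly the Markov property DP requires.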

Page 27:

Contents

1 Deterministic Dynamic Programming

2 Stochastic Dynamic Programming

3 Curses of Dimensionality


Page 28:

Dynamic Programming Algorithm

    Data: problem parameters
    Result: optimal policy and value
    V_T ≡ K
    for t = T−1 down to 0 do
        for x ∈ X_t do
            V_t(x) = +∞
            for u ∈ U_t(x) do
                v_u = E[ L_t(x, u, ξ_{t+1}) + V_{t+1}(f_t(x, u, ξ_{t+1})) ]
                if v_u < V_t(x) then
                    V_t(x) = v_u
                    π_t(x) = u

Algorithm 1: Dynamic Programming Algorithm (discrete case)

Number of flops: O(T × |X_t| × |U_t| × |Ξ_t|).
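Algorithm 1 translates almost line by line into Python. The problem data below (states, controls, noise distribution, costs) are illustrative assumptions; the triple loop and the expectation over the discrete noise follow the pseudocode:

```python
# Algorithm 1 (discrete stochastic DP), with illustrative problem data:
# states {0..3}, controls {0, 1}, noise xi in {0, 1} with equal probability.

T = 3
STATES = range(4)
NOISE = [(0, 0.5), (1, 0.5)]            # (value, probability) pairs

def f(t, x, u, xi):                     # dynamics, clipped to the state grid
    return max(0, min(3, x + u - xi))

def L(t, x, u, xi):                     # stage cost (illustrative)
    return u + 0.2 * abs(x - 2)

def K(x):                               # final cost (illustrative)
    return float(x)

def U(t, x):                            # admissible controls
    return (0, 1)

V = {T: {x: K(x) for x in STATES}}      # V_T = K
pi = {}
for t in reversed(range(T)):            # for t: T-1 -> 0
    V[t], pi[t] = {}, {}
    for x in STATES:                    # for x in X_t
        V[t][x] = float("inf")
        for u in U(t, x):               # for u in U_t(x)
            # v_u = E[ L_t(x, u, xi) + V_{t+1}(f_t(x, u, xi)) ]
            v_u = sum(p * (L(t, x, u, xi) + V[t + 1][f(t, x, u, xi)])
                      for xi, p in NOISE)
            if v_u < V[t][x]:
                V[t][x], pi[t][x] = v_u, u
```

The cost is visible in the code: one pass over T × |X_t| × |U_t| × |Ξ_t| combinations, matching the flop count above.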

Page 29:

3 curses of dimensionality

1 State. If we consider 3 independent states, each taking 10 values, then |X_t| = 10^3 = 1000. In practice, DP is not applicable for states of dimension more than 5.

2 Decision. The decisions are often vector decisions, that is, a number of independent decisions, hence leading to a huge |U_t(x)|.

3 Expectation. In practice, random information comes from large data sets. Without proper statistical treatment, computing an expectation is costly. Monte-Carlo approaches are costly too, and imprecise.

Page 30:

Numerical considerations

The DP equation holds in (almost) any case. The algorithm shown before computes offline a look-up table of controls, one for every possible state. This is impossible to do if the state is (partly) continuous.

Alternatively, we can focus on computing offline an approximation of the value function V_t, and derive the optimal control online by solving a one-step problem at the current state only:

    π_t(x) ∈ arg min_{u ∈ U_t(x)}  E[ L_t(x, u, ξ_{t+1}) + V_{t+1}(f_t(x, u, ξ_{t+1})) ]

The field of Approximate DP gives methods for computing these approximate value functions (decomposed on a basis of functions). The simplest one consists in discretizing the state, and then interpolating the value function.
