
Dynamic Programming and Optimal Control

Fall 2009

Problem Set: Infinite Horizon Problems, Value Iteration, Policy Iteration

Notes:

• Problems marked with BERTSEKAS are taken from the book Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages, hardcover.

• The solutions were derived by the teaching assistants. Please report any error that you may find to [email protected] or [email protected].

• The solutions in this document are handwritten. A typed version will be made available in the future.


Problem Set

Problem 1 (BERTSEKAS, p. 445, exercise 7.1)

Problem 2 (BERTSEKAS, p. 446, exercise 7.3)

Problem 3 (BERTSEKAS, p. 448, exercise 7.12)

Problem 4 (BERTSEKAS, p. 60, exercise 1.23)


[Pages 3-6 of the original document contain the handwritten solution to Problem 1 (Bertsekas 7.1); the scanned handwriting is not legible in this transcript. Part (b) is solved numerically in the MATLAB script Bertsekas7_1b.m reproduced below.]
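For reference, the update implemented in the MATLAB script below can be read as a value-iteration recursion over states x = (i, j, s), where i and j are the points won so far by the server and the receiver and s in {1, 2} indicates first or second serve. Writing p_u for the probability that serve u in {F, S} lands in and q_u for the probability that the server then wins the rally (this notation is ours; it matches the arrays p and q in the code), a sketch of the update at interior states is

    J_{k+1}(i,j,1) = max_{u in {F,S}} [ q_u p_u J_k(i+1,j,1) + (1-q_u) p_u J_k(i,j+1,1) + (1-p_u) J_k(i,j,2) ]
    J_{k+1}(i,j,2) = max_{u in {F,S}} [ q_u p_u J_k(i+1,j,1) + (1 - q_u p_u) J_k(i,j+1,1) ],

i.e. a fault on the second serve loses the point. At the boundary, a J_k term for a winning score is replaced by 1 and one for a losing score by 0, which is what the constant q.*p terms and the omitted terms in the code's boundary updates encode; J(1,1,1) then converges to the server's probability of winning the game.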


% Bertsekas7_1b.m
%
% Solution to Exercise 7.1 b)
% in Bertsekas - "Dynamic Programming and Optimal Control" Vol. 1 p. 445
%
% --
% ETH Zurich
% Institute for Dynamic Systems and Control
% Angela Schöllig
% [email protected]
%
% --
% Revision history
% [12.11.09, AS] first version
%
%
% clear workspace and command window
clear;
clc;

%% PARAMETERS
% Landing probability
p(2) = 0.95;    % Slow serve

% Winning probability
q(1) = 0.6;     % Fast serve
q(2) = 0.4;     % Slow serve

% Define value iteration error bound
err = 1e-100;

% Specify if you want to test one value for p(1) - landing probability for
% fast serve: use test = 1, or a row of probabilities: use test = 2
test = 1;

if test == 1
    % Define p(1)
    p(1) = 0.2;
    % Number of runs
    mend = 1;
else
    % Define increment for p(1)
    incr = 0.01;
    % Number of runs
    mend = ((0.95-incr)/incr+2);
end

%% VALUE ITERATION
for m=1:mend
    %% PARAMETER
    % Landing probability
    if test == 2


        p(1) = (m-1)*incr;      % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code

    % Set the initial costs (any value will do, except for the
    % termination cost, which must be equal to 0).
    J = (p(1)+0.1).*ones(4, 4, 2);

    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve
    FVal = ones(4, 4, 2);

    %% VALUE ITERATION LOOP
    % Initialize cost-to-go
    costToGo = zeros(4, 4, 2);
    iter = 0;

    while (1)
        iter = iter+1;
        if (mod(iter,1e3)==0)
            disp(['Value Iteration Number ',num2str(iter)]);
        end

        % Update the value
        for i = 1:3
            [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1)+(1-p).*J(4,i,2));
            [costToGo(4,i,2),FVal(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));
            [costToGo(i,4,1),FVal(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            [costToGo(i,4,2),FVal(i,4,2)] = max(q.*p.*J(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FVal(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1)+(1-p).*J(i,j,2));
                [costToGo(i,j,2),FVal(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FVal(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1)+(1-p).*J(4,4,2));
        [costToGo(4,4,2),FVal(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));

        % max(max(max(J-costToGo)))/max(max(max(costToGo)))
        if (max(max(max(J-costToGo)))/max(max(max(costToGo))) < err)
            % update cost
            J = costToGo;
            break;


        else
            J = costToGo;
        end
    end

    % Probability of player 1 winning the game
    JValsave(m) = J(1, 1, 1);
end;

%% POLICY ITERATION
for m=1:mend
    %% PARAMETER
    % Landing probability
    if test == 2
        p(1) = (m-1)*incr;      % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code

    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve - in vector form
    FPol = 2.*ones(32,1);

    %% POLICY ITERATION LOOP
    iter = 0;

    while (1)
        iter = iter+1;
        disp(['Policy Iteration Number ',num2str(iter)]);
        if (mod(iter,1e3)==0)
            disp(['Policy Iteration Number ',num2str(iter)]);
        end

        % Update the value
        % Initialize matrices
        APol = zeros(32,32);
        bPol = zeros(32,1);

        for i = 1:3
            % [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1)+(1-p).*J(4,i,2));
            f = FPol(12+i);
            APol(12+i,12+i) = -1;
            bPol(12+i) = -q(f)*p(f);
            APol(12+i,13+i) = (1-q(f))*p(f);
            APol(12+i,28+i) = (1-p(f));
            % [costToGo(4,i,2),FPol(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));


            f = FPol(28+i);
            APol(28+i,28+i) = -1;
            bPol(28+i) = -q(f)*p(f);
            APol(28+i,13+i) = 1-p(f)*q(f);
            % [costToGo(i,4,1),FPol(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            f = FPol(4*i);
            APol(4*i,4*i) = -1;
            APol(4*i,4*i+4) = q(f)*p(f);
            APol(4*i,20+4*(i-1)) = 1-p(f);
            % [costToGo(i,4,2),FPol(i,4,2)] = max(q.*p.*J(i+1,4,1));
            f = FPol(4*i+16);
            APol(4*i+16,4*i+16) = -1;
            APol(4*i+16,4*i+4) = q(f)*p(f);

            for j = 1:3
                % Maximize over input 1 and 2
                % [costToGo(i,j,1),FPol(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1)+(1-p).*J(i,j,2));
                f = FPol(4*(i-1)+j);
                APol(4*(i-1)+j,4*(i-1)+j) = -1;
                APol(4*(i-1)+j,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+1) = (1-q(f))*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+16) = (1-p(f));
                % [costToGo(i,j,2),FPol(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
                f = FPol(4*(i-1)+j+16);
                APol(4*(i-1)+j+16,4*(i-1)+j+16) = -1;
                APol(4*(i-1)+j+16,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j+16,4*(i-1)+j+1) = 1-q(f)*p(f);
            end
        end
        % [costToGo(4,4,1),FPol(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1)+(1-p).*J(4,4,2));
        f = FPol(16);
        APol(16,16) = -1;
        APol(16,15) = q(f)*p(f);
        APol(16,12) = (1-q(f))*p(f);
        APol(16,32) = (1-p(f));
        % [costToGo(4,4,2),FPol(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));
        f = FPol(32);
        APol(32,32) = -1;
        APol(32,15) = q(f)*p(f);
        APol(32,12) = 1-q(f)*p(f);

        % Calculate cost
        JPol = zeros(32,1);
        JPol = APol\bPol;

        % Can now update the policy.
        FPolNew = FPol;
        JPolNew = JPol;
        JPol_re = reshape(JPol,4,4,2);


        JPol_re(:,:,1) = JPol_re(:,:,1)';
        JPol_re(:,:,2) = JPol_re(:,:,2)';

        % Initialize cost-to-go
        costToGo = zeros(4, 4, 2);
        FPolNew_re = zeros(4, 4, 2);

        % Update the optimal policy
        for i = 1:3
            [costToGo(4,i,1),FPolNew_re(4,i,1)] = max(q.*p + (1-q).*p.*JPol_re(4,i+1,1)+(1-p).*JPol_re(4,i,2));
            [costToGo(4,i,2),FPolNew_re(4,i,2)] = max(q.*p + (1-q.*p).*JPol_re(4,i+1,1));
            [costToGo(i,4,1),FPolNew_re(i,4,1)] = max(q.*p.*JPol_re(i+1,4,1) + (1-p).*JPol_re(i,4,2));
            [costToGo(i,4,2),FPolNew_re(i,4,2)] = max(q.*p.*JPol_re(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FPolNew_re(i,j,1)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q).*p.*JPol_re(i,j+1,1)+(1-p).*JPol_re(i,j,2));
                [costToGo(i,j,2),FPolNew_re(i,j,2)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q.*p).*JPol_re(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FPolNew_re(4,4,1)] = max(q.*p.*JPol_re(4,3,1) + (1-q).*p.*JPol_re(3,4,1)+(1-p).*JPol_re(4,4,2));
        [costToGo(4,4,2),FPolNew_re(4,4,2)] = max(q.*p.*JPol_re(4,3,1) + (1-q.*p).*JPol_re(3,4,1));

        JPolNew(1:16)  = reshape(costToGo(:,:,1)', 16,1);
        JPolNew(17:32) = reshape(costToGo(:,:,2)', 16,1);
        FPolNew(1:16)  = reshape(FPolNew_re(:,:,1)', 16,1);
        FPolNew(17:32) = reshape(FPolNew_re(:,:,2)', 16,1);

        % Quit if the policy is the same, or if the costs are essentially the
        % same.
        if ((norm(JPol - JPolNew)/norm(JPol) < 1e-10))
            break;
        else
            FPol = FPolNew;
        end
    end

    % Probability of player 1 winning the game
    JPolsave(m) = JPolNew(1,1);
end;


if test == 2
    plot(0:incr:(0.95-incr), JValsave,'.');
    hold on
    plot(0:incr:(0.95-incr), JPolsave,'r');
    grid on
    title('Probability of the server winning a game')
    xlabel('p_F')
    ylabel('Probability of winning')
    legend('Value Iteration', 'Policy Iteration')
else
    disp(['Probability of player 1 winning ',num2str(JValsave(1))]);
    subplot(1,2,1)
    bar3(FVal(:,:,1),0.25,'detached')
    title(['First Serve for p_F =', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
    subplot(1,2,2)
    bar3(FVal(:,:,2),0.25,'detached')
    title(['Second Serve for p_F =', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
end
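A remark on the policy-iteration half of the script: the policy evaluation step is carried out exactly rather than iteratively. For the fixed policy stored in FPol, the winning probabilities J_mu satisfy the linear equations J_mu(x) = g_mu(x) + sum over x' of P_mu(x,x') J_mu(x') for all 32 states, i.e. (I - P_mu) J_mu = g_mu (the symbols g_mu and P_mu are ours). The script assembles this system as APol*JPol = bPol, with -1 on the diagonal of APol, the transition probabilities as off-diagonal entries, and the immediate winning terms -q(f)*p(f) in bPol, and solves it with the backslash operator, JPol = APol\bPol.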

[Figure (produced by Bertsekas7_1b.m with test = 2): probability of the server winning a game as a function of p_F, computed by value iteration and by policy iteration.]

[Pages 15-18 of the original document contain further handwritten working (policy evaluation and policy iteration carried out by hand); the scanned handwriting is not legible in this transcript. Problem 2 part (c) is solved numerically in the MATLAB script script_P73c.m reproduced below.]
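Problem 2 part (c) (Bertsekas 7.3 c) is solved numerically in the script below. With the stage rewards g(i,u), the transition probabilities p_ij(u) and the discount factor alpha defined in the script, the value iteration implements the Bellman update (a sketch in our notation)

    J_{k+1}(i) = max_{u in {1,2}} [ g(i,u) + alpha * sum_{j=1,2} p_ij(u) J_k(j) ],    i = 1, 2,

whose fixed point is the optimal discounted reward J*; the maximizing u in the final iteration is reported as the stationary optimal policy mu*.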

% script_P73c.m

%

% Matlab Script with implementation of a solution to Problem 7.3c from the

% class textbook.

%

% Dynamic Programming and Optimal Control

% Fall 2009

% Problem Set, Chapter 7

% Problem 7.3 c

%

% --

% ETH Zurich

% Institute for Dynamic Systems and Control

% Sebastian Trimpe

% [email protected]

%

% --

% Revision history

% [11.11.09, ST] first version

%

%

% clear workspace and command window

clear;

%clc;

%% Program control

% Flag defining whether we run the value iteration with or without error

% bounds.

flag_errbnds = true;

% False: No error bounds; fixed number of iterations MAX_ITER.

% True: Use error bounds. Terminate when result within tolerance ERR_TOL.

MAX_ITER = 1000;

ERR_TOL = 1e-6;

%% Setup problem data

% Stage costs g. g is a 2x2 matrix where the first index corresponds to the

% state i=1,2 and the second index corresponds to the admissible control

% input u=1 (=advertising/do research), or 2 (=don't advertise/don't do

% research).

g = [4 6; -5 -3];

% Transition probabilities. P is a 2x2x2 matrix, where the first index

% corresponds to the origin state, the second index corresponds to the

% destination state and the third index corresponds to the applied control

% input where (1,2) maps to (advertise/do research, don't advertise/don't

% do research). For example, the probability of transition from node 1 to

% node 2 given that we do not advertise is P(1,2,2).

P = zeros(2,2,2);

% Advertise/do research (u=1):


P(:,:,1) = [0.8 0.2; 0.7 0.3];

% Don't advertise/don't do research (u=2):

P(:,:,2) = [0.5 0.5; 0.4 0.6];

% Discount factor.

%alpha = 0.9;

alpha = 0.99;

%% Value Iteration

% Initialize variables that we update during value iteration.

% Cost (here it really is the reward):

costJ = [0,0];

costJnew = [0,0];

% Policy

policy = [0,0];

% Control variable for breaking the loop

doBreak = false;

% Loop over value iterations k.

for k=1:MAX_ITER

for i=1:2 % loop over two states

% One value iteration step for each state.

[costJnew(i),policy(i)] = max( g(i,:) + alpha*costJ*squeeze(P(i,:,:)) );

end;

% Save results for plotting later.

res_J(k,:) = costJnew;

% Construct string to be displayed later:

dispstr = ['k=',num2str(k,'%5d'), ...

' J(1)=',num2str(costJnew(1),'%6.4f'), ...

' J(2)=',num2str(costJnew(2),'%6.4f'), ...

' mu(1)=',num2str(policy(1),'%3d'), ...

' mu(2)=',num2str(policy(2),'%3d')];

% Use error bounds if flag has been set.

if flag_errbnds

% Compute error bounds.

lower = costJnew + alpha/(1-alpha)*min(costJnew-costJ); % lower bound on J*

upper = costJnew + alpha/(1-alpha)*max(costJnew-costJ); % upper bound on J*

% display string:

dispstr = [dispstr,' lower=[',num2str(lower,'%9.4f'), ...

'] upper=[',num2str(upper,'%9.4f'),']'];

% If desired tolerance reached: can terminate the algorithm.

if all((upper-lower)<=ERR_TOL)

% Difference between upper and lower bound less than specified


% tolerance for all states: set cost to average of upper and

% lower bound and terminate.

% Note that using the bounds, we can get a better result than

% just using J_{k+1}.

costJnew = mean([lower; upper],1);

dispstr = [dispstr, ...

sprintf('\nResult within desired tolerance. Terminate.\n')];

doBreak = true;

end;

% Save results for plotting later.

res_lower(k,:) = lower;

res_upper(k,:) = upper;

end;

% Update the cost. One could also implement an immediate update of the

% cost as a Gauss-Seidel-type update, which would yield faster

% convergence.

costJ = costJnew;

% Display:

disp(dispstr);

if doBreak

break;

end;

end;

% display the obtained costs

disp('Result:');

disp([' J*(1) = ',num2str(costJ(1),'%9.6f')]);

disp([' J*(2) = ',num2str(costJ(2),'%9.6f')]);

disp([' mu*(1) = ',num2str(policy(1))]);

disp([' mu*(2) = ',num2str(policy(2))]);

%% Plots

% Plot costs and bounds over iterations.

kk = 1:size(res_J,1);

figure;

subplot(2,1,1);

if(flag_errbnds)

plot(kk,[res_J(:,1),res_lower(:,1), res_upper(:,1)], ...

[kk(1),kk(end)],[costJ(1), costJ(1)],'k');

else

plot(kk,[res_J(:,1)],[kk(1),kk(end)],[costJ(1), costJ(1)],'k');

end;

grid;

%legend('lower bound','upper bound','J_k','J*');

%xlabel('iteration');

%


subplot(2,1,2);

if(flag_errbnds)

plot(kk,[res_J(:,2),res_lower(:,2), res_upper(:,2)], ...

[kk(1),kk(end)],[costJ(2),costJ(2)],'k');

else

plot(kk,[res_J(:,2)],[kk(1),kk(end)],[costJ(2), costJ(2)],'k');

end;

grid;

if(flag_errbnds)

legend('J_k','lower bound','upper bound','J*');

else

legend('J_k','J*');

end;

xlabel('iteration');
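The quantities lower and upper computed in the script are the standard value-iteration error bounds for discounted problems: for every state i,

    J_{k+1}(i) + alpha/(1-alpha) * min_j [ J_{k+1}(j) - J_k(j) ]  <=  J*(i)  <=  J_{k+1}(i) + alpha/(1-alpha) * max_j [ J_{k+1}(j) - J_k(j) ].

Once the gap between the two bounds is below ERR_TOL for all states, the script terminates and takes the midpoint of the bounds as its estimate of J*, which is a better estimate than the last iterate J_{k+1} alone.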

[Pages 24-27 of the original document contain the remaining handwritten solutions; the scanned handwriting is not legible in this transcript.]