A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING

PROBLEM SET-UP

- Problem is arrayed as a set of decisions made over time.
- System has a discrete state.
- Each decision results in some reward or cost, and results in the system being moved to another state.
- Usually has a finite number of transitions.
- Transitions can be probabilistic, as can the rewards.
- Solution is a decision strategy that maximizes summed reward (minimizes cost).

NOTATION

- $N$ = finite planning horizon.
- $S_n(x)$ = cost of optimally operating from $n$ to $N$ given state $x$ at time $n$.
- $d_n^*(x)$ is the optimal policy at stage $n$ given state $x$ at time $n$.
- $x(d_n)$ is the state resulting from deciding $d$ at stage $n$.
- $c(d_n)$ is the cost of taking decision $d_n$.

EXAMPLE

- You have moved to Singapore, and you need to operate a car for 3 yrs.
- You plan to sell the car when you leave.
- Your quality of life (QOL) is not affected by your wheels.
- Cost/resale prices of cars and operating costs are below.

| car age (yr) | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| sale price | 1000 | 800 | 450 | 150 |
| op cost | 200 | 400 | 600 | |


MAPPING TO THE NOTATION

- State: age of your car
- Stage: years you have been in Singapore
- Policy: age of the car you buy at the END of the year

COST EXAMPLE

- You have a 2 yr old car.
- You operate it for the year ($600).
- You sell your 3 yr old car (-$150).
- You buy a new (to you) 1 yr old used car ($800).
- TOTAL: $1250

Cost of one year, by car age at the start of the year (rows) and car age owned at the finish (columns):

| start \ finish | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 400 | 200 | | |
| 1 | 950 | 750 | 400 | |
| 2 | 1450 | 1250 | 900 | 600 |

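
The entries of this table can be recomputed from the price and operating-cost table. The sketch below is one way to do it, assuming that a car of a given age is bought and sold at the same listed price and that trading for a car older than the one you already own is not tabulated (the blank cells). The names `SALE_PRICE`, `OP_COST`, and `year_cost` are illustrative and do not come from the slides.

```python
# Prices and operating costs copied from the example table, indexed by car age in years.
SALE_PRICE = [1000, 800, 450, 150]   # ages 0..3; assumed to be both the buying and the selling price
OP_COST = [200, 400, 600]            # cost of running a car that starts the year at age 0..2


def year_cost(start_age, finish_age):
    """Cost of one year: run the car you own, then end the year owning a car of finish_age."""
    aged = start_age + 1                     # the car you own is one year older at year end
    if finish_age > aged:
        return None                          # blank cell: trading for an older car is not tabulated
    if finish_age == aged:
        return OP_COST[start_age]            # keep the car: operating cost only
    # Otherwise sell the aged car and buy the replacement.
    return OP_COST[start_age] - SALE_PRICE[aged] + SALE_PRICE[finish_age]


cost_table = [[year_cost(s, f) for f in range(4)] for s in range(3)]
for start_age, row in enumerate(cost_table):
    print(start_age, row)
```

For instance, `year_cost(2, 1)` is 600 - 150 + 800 = 1250, the total from the cost example.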

| car age | "cost" end of yr 3 |
| --- | --- |
| 0 | -1000 |
| 1 | -800 |
| 2 | -450 |
| 3 | -150 |


CONTINUED COST EXAMPLE

It's the beginning of yr 2, and you possess a 2 yr old car. You can:

- operate the car: 600 + $S_3$(3 yr old car)
- operate the car, sell it, buy a new car: 600 - 150 + 1000 + $S_3$(new car)
- operate the car, sell it, buy a 1 yr old car: 600 - 150 + 800 + $S_3$(1 yr old car)
- ...

| car age \ yr | 1 | 2 | 3 | "cost" end of yr 3 |
| --- | --- | --- | --- | --- |
| 0 | 1200 | -200 | -600 | -1000 |
| 1 | 1550 | 350 | -50 | -800 |
| 2 | 1700 | 850 | 450 | -450 |
| 3 | | | | -150 |

Annotations on the slide: 1450, 1250, 900.


BELLMAN’S EQUATION

$$S_n(x) = \min_{d} \left[ c(d) + S_{n+1}(x(d)) \right]$$

Sometimes it's easy to get your name on something!
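
Applied to the car example, the recursion can be run backward from the end of year 3. The following is a minimal sketch rather than the author's own implementation. It assumes that the listed price is both the buying and the selling price, that the decision is the age of the car owned at year end (keep the aged car or swap for one no older), and that a 3 yr old car cannot be operated because no operating cost is listed for it. All names are illustrative.

```python
import math

SALE_PRICE = [1000, 800, 450, 150]   # price of a car aged 0..3
OP_COST = [200, 400, 600]            # cost of operating a car aged 0..2 for one year
YEARS = 3


def year_cost(start_age, finish_age):
    """c(d): run the car for a year, then keep it or swap it for a car of finish_age."""
    aged = start_age + 1
    if finish_age == aged:
        return OP_COST[start_age]
    return OP_COST[start_age] - SALE_PRICE[aged] + SALE_PRICE[finish_age]


def solve_car_problem():
    # S[n][x]: least cost from the start of year n until departure, owning a car of age x.
    # Terminal condition: the car is sold on departure, so the "cost" is minus its price.
    S = {YEARS + 1: {age: -SALE_PRICE[age] for age in range(4)}}
    policy = {}
    for n in range(YEARS, 0, -1):
        S[n], policy[n] = {}, {}
        for x in range(3):  # no operating cost is given for a 3 yr old car
            # d = age of the car owned at year end: keep (d = x + 1) or swap for one no older.
            totals = {d: year_cost(x, d) + S[n + 1].get(d, math.inf) for d in range(x + 2)}
            policy[n][x] = min(totals, key=totals.get)
            S[n][x] = totals[policy[n][x]]
    return S, policy


S, policy = solve_car_problem()
for n in (1, 2, 3):
    print(n, S[n])
# Adding the purchase price of the first car to S[1] gives the total cost of each starting choice.
print([SALE_PRICE[x] + S[1][x] for x in range(3)])
```

Under these assumptions the year 2 and year 3 values match the table (-200, 350, 850 and -600, -50, 450), and adding the initial purchase price to the year 1 values gives 1200, 1550, 1700.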

EXEMPLAR

- A specialized tool is available during the period 9am, ..., 3pm.
- Each hour, a bid for the asset is made according to the table below.
- The asset is busy for 3 hr. if the bid is accepted.

| hour | 9 | 10 | 11 | 12 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bid | 100 | 150 | 160 | 50 | 175 | 40 | 10 |


[Diagram, built up over eight slides: a network with a node for each hour and an "end" node. Arcs are labeled 0 or with one of the bids (100, 150, 160, 50, 175, 40, 10). Node values are added one slide at a time, in this order: 10, 40, 175, 175, 175, 325, 325.]

Note 1: Once the diagram is drawn, the problem can be solved by a shortest (longest) path algorithm.
Note 2: Dynamic Programming = Shortest Path
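
The longest-path view of the bidding exemplar can be sketched as a backward pass over the hours. This is an illustration under stated assumptions, not the slides' own procedure: rejecting a bid moves to the next hour with reward 0, accepting the bid at hour t earns the bid and frees the tool at hour t + 3, and a job may run past 3pm without penalty. The function name `best_schedule` is hypothetical.

```python
HOURS = ["9", "10", "11", "12", "1", "2", "3"]
BIDS = [100, 150, 160, 50, 175, 40, 10]
BUSY = 3  # hours the tool is tied up once a bid is accepted


def best_schedule(bids=BIDS, busy=BUSY):
    n = len(bids)
    # value[t]: best total revenue from hour index t onward; 0 for every index past the end.
    value = [0] * (n + busy)
    accept = [False] * n
    for t in range(n - 1, -1, -1):
        take = bids[t] + value[t + busy]   # accept: earn the bid, resume when the tool is free
        skip = value[t + 1]                # reject: move to the next hour
        accept[t] = take > skip
        value[t] = max(take, skip)
    # Walk forward through the recorded decisions to list the accepted hours.
    chosen, t = [], 0
    while t < n:
        if accept[t]:
            chosen.append(t)
            t += busy
        else:
            t += 1
    return value[0], chosen


total, chosen = best_schedule()
print(total, [HOURS[t] for t in chosen])
```

With these assumptions the value at 9am is 325, obtained by accepting the 10 o'clock and 1 o'clock bids (150 + 175).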

PROBABILISTIC TRANSITIONS

$$E\,S_n(x) = \min_{d} \left[ E\,c(d) + E\,S_{n+1}(x(d)) \right]$$

1. c(d) is a random variable
2. x(d) is random
3. the “trial” takes place after the decision

EXEMPLAR (Probabilistic)

- An “asset” is available during the period 8pm, 9pm, ..., 3am.
- Each hour, a bid for the asset is made according to the discrete probability distribution below.
- The asset is busy for 3 hr. if the bid is accepted.

| bid ($1) | 3 | 6 | 9 |
| --- | --- | --- | --- |
| probability | 0.1 | 0.6 | 0.3 |


MANY APPROACHES TO FORMULATION

- $N$ = 4am
- $S_n(x)$ = profit of optimally operating from $n$ to $N$ given state $x$ at time $n$.
- $d_n^*(x)$ is the optimal policy at stage $n$ given state $x$ at time $n$ (ACCEPT, REJECT).
- $c(d_n)$ is the profit of taking decision $d_n$.
- $x(d_n)$ is the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1 hr, 2 hr).

RECURSION

$$S_{11}(3) = \max\left\{\, 3 + S_{12}(2\,\text{hr}),\;\; 0.1\,S_{12}(3) + 0.6\,S_{12}(6) + 0.3\,S_{12}(9) \,\right\}$$

The subscript is the time; the argument "2 hr" is the number of hours before the asset is available again.

See DP Example.xls
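
The spreadsheet is not reproduced here, but the recursion can be sketched in a few lines. The version below is an assumption-laden illustration: it uses probability 0.3 for the bid of 9 (as in the recursion), treats 8pm through 3am as eight decision hours with zero profit from 4am on, lets a job accepted late run past 4am without penalty, and folds the "hours left" states into a single table `V` of expected profit when the asset is free and the bid has not yet been seen. In that form, accepting bid b at hour n is worth b + V[n + 3] and rejecting is worth V[n + 1]. All names are hypothetical.

```python
BIDS = {3: 0.1, 6: 0.6, 9: 0.3}   # bid value -> probability
HOURS = 8                          # decision hours 8pm, 9pm, ..., 3am; index 8 is 4am (N)
BUSY = 3                           # hours the asset is tied up after accepting


def expected_profit(bids=BIDS, hours=HOURS, busy=BUSY):
    # V[n]: expected profit from hour n on, asset free, bid not yet seen; 0 from 4am onward.
    V = [0.0] * (hours + busy)
    accept = []
    for n in range(hours - 1, -1, -1):
        keep_waiting = V[n + 1]
        take = {b: b + V[n + busy] for b in bids}
        # The "trial" (the bid draw) happens first, then the better decision is taken.
        V[n] = sum(p * max(take[b], keep_waiting) for b, p in bids.items())
        accept.insert(0, sorted(b for b in bids if take[b] > keep_waiting))
    return V[:hours], accept


V, accept = expected_profit()
for n, (v, a) in enumerate(zip(V, accept)):
    print(n, round(v, 3), a)   # hour index, expected profit, bids worth accepting
```

At the last hour every bid is accepted, so the expected profit there is 0.1·3 + 0.6·6 + 0.3·9 = 6.6; one hour earlier only the bid of 9 beats waiting.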

UNLOCKING THE JARGON

- x(d) can be governed by a Markov Chain: a different $P_{i,j}$ matrix for each decision $d$.
- Result is a Markov Decision Process.

$$E(S(i)) = \min_{d} \Big[ E\,c(i) + \sum_{j} P_{i,j}(d)\, E\,S(j) \Big]$$

$$E(S_n(i)) = \min_{d} \Big[ E\,c(i) + \sum_{j} P_{i,j}(d)\, E\,S_{n+1}(j) \Big]$$
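
The stage-indexed equation can be evaluated by backward induction over a small Markov Decision Process. The sketch below is illustrative only. It assumes the one-stage cost may depend on the decision as well as the state (`cost[d][i]`), which is slightly more general than the $c(i)$ written above, and the two-state machine with its numbers is invented for the example.

```python
def backward_induction(P, cost, horizon, terminal=None):
    """Finite-horizon expected-cost minimization.

    P[d][i][j] : probability of moving from state i to state j under decision d
    cost[d][i] : expected one-stage cost of decision d in state i (assumed form)
    """
    n_states = len(cost[0])
    S = [None] * (horizon + 1)
    S[horizon] = list(terminal) if terminal is not None else [0.0] * n_states
    policy = [None] * horizon
    for n in range(horizon - 1, -1, -1):
        S[n], policy[n] = [], []
        for i in range(n_states):
            # One total per decision: immediate cost plus expected cost-to-go.
            totals = [
                cost[d][i] + sum(P[d][i][j] * S[n + 1][j] for j in range(n_states))
                for d in range(len(P))
            ]
            best = min(range(len(P)), key=totals.__getitem__)
            policy[n].append(best)
            S[n].append(totals[best])
    return S, policy


# Hypothetical two-state machine: state 0 = working, state 1 = broken.
P = [
    [[0.9, 0.1], [0.0, 1.0]],   # decision 0: do nothing
    [[1.0, 0.0], [1.0, 0.0]],   # decision 1: repair
]
cost = [
    [0.0, 10.0],                # doing nothing is free when working, costly when broken
    [5.0, 5.0],                 # a repair costs 5 in either state
]
S, policy = backward_induction(P, cost, horizon=2)
print(S[0], policy[0])
```

In this toy case the policy at both stages is to do nothing while the machine works and to repair it when it is broken.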