Markov Decision Processesbboots3/ACRL-Spring2019/... · § Idea: get DFS’s space advantage with...


Page 1:

Markov Decision Processes

§ An MDP is defined by:
  § A set of states s ∈ S
  § A set of actions a ∈ A
  § A transition function T(s, a, s’)
    § Probability that a from s leads to s’, i.e., P(s’ | s, a)
    § Also called the model or the dynamics
  § A reward function R(s, a, s’)
    § Sometimes just R(s) or R(s’)
  § A start state
  § Maybe a terminal state

§ MDPs can be thought of as non-deterministic search problems
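The components listed above can be collected in a small container. This is only a sketch: the class name `MDP`, its field names, and the two-state robot example are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MDP:
    """Container for the pieces of an MDP; all names are illustrative."""
    states: set
    actions: set
    transitions: dict   # (s, a) -> {s_next: P(s_next | s, a)}
    rewards: dict       # (s, a, s_next) -> reward
    start: str
    terminals: set = field(default_factory=set)

    def T(self, s, a, s_next):
        """Transition probability P(s_next | s, a); 0 if not listed."""
        return self.transitions.get((s, a), {}).get(s_next, 0.0)

    def R(self, s, a, s_next):
        """Reward for the transition (s, a, s_next); 0 if not listed."""
        return self.rewards.get((s, a, s_next), 0.0)

    def is_well_formed(self):
        """Each listed (s, a) must have outcome probabilities summing to 1."""
        return all(abs(sum(dist.values()) - 1.0) < 1e-9
                   for dist in self.transitions.values())


# Hypothetical robot: "move" reaches the goal with 0.8, wheels slip with 0.2.
robot = MDP(
    states={"hall", "goal"},
    actions={"move"},
    transitions={("hall", "move"): {"goal": 0.8, "hall": 0.2}},
    rewards={("hall", "move", "goal"): 1.0},
    start="hall",
    terminals={"goal"},
)
```

Storing T as a nested dictionary keeps the non-deterministic successor set of each (s, a) pair explicit, which is what separates this from a deterministic search problem.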

Page 2:

MDP Search Trees

[Figure: MDP search tree. s is a state; (s, a) is a q-state; (s, a, s’) is a transition, with T(s, a, s’) = P(s’ | s, a).]

Page 3:

Compare to Adversarial Search (Minimax)

§ Deterministic, zero-sum games:
  § Tic-tac-toe, chess, checkers
  § One player maximizes result
  § The other minimizes result

§ Minimax search:
  § A state-space search tree
  § Players alternate turns
  § Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: minimax tree. Terminal values 8, 2, 5, 6 are part of the game; minimax values are computed recursively: the min nodes evaluate to 2 and 5, the max root to 5.]
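The recursive computation can be sketched as follows. The nested-list tree encoding and the function name `minimax` are assumptions made for illustration; the leaf values are the ones on the slide.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists of numbers."""
    if isinstance(node, (int, float)):   # terminal value: part of the game
        return node
    # Players alternate turns, so children are evaluated for the other player.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Max root over two min nodes with terminal values 8, 2 and 5, 6.
slide_tree = [[8, 2], [5, 6]]
root_value = minimax(slide_tree)
```

On this tree the min nodes evaluate to 2 and 5, so the root value is 5.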

Page 4:

Worst-Case vs. Average Case

[Figure: game tree with a max root over min nodes; leaf values 10, 10, 9, 100.]

Idea: Uncertain outcomes controlled by chance, not an adversary!

Page 5:

Expectimax Search

§ Why wouldn’t we know what the result of an action will be?
  § Explicit randomness: rolling dice
  § Unpredictable opponents respond randomly
  § Actions can fail: when moving a robot, wheels might slip

§ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

§ Expectimax search: compute the average score under optimal play
  § Max nodes as in minimax search
  § Chance nodes are like min nodes but the outcome is uncertain
  § Calculate their expected utilities
  § I.e. take weighted average (expectation) of children

§ MDPs and value iteration formalize this.

[Figure: expectimax tree with a max root over chance nodes; values shown: 10, 4, 5, 7 and 10, 10, 9, 100.]
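A minimal expectimax sketch, assuming a two-level tree with a max root over chance nodes. The grouping of the leaves 10, 10, 9, 100 into two chance nodes and the uniform 0.5 probabilities are assumptions; the slide residue does not state them.

```python
def expectimax(node, is_max=True):
    """Max nodes hold a list of children; chance nodes hold (prob, child) pairs."""
    if isinstance(node, (int, float)):
        return node
    if is_max:
        # Max nodes behave exactly as in minimax search.
        return max(expectimax(child, False) for child in node)
    # Chance node: weighted average (expectation) of the children.
    return sum(prob * expectimax(child, True) for prob, child in node)


tree = [
    [(0.5, 10), (0.5, 10)],    # expected utility 10.0
    [(0.5, 9), (0.5, 100)],    # expected utility 54.5
]
value = expectimax(tree)
```

With min nodes in place of the chance nodes, the same leaves would give min(10, 10) = 10 and min(9, 100) = 9, so the root would prefer the left branch. Under the averaging assumption the right branch wins instead.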

Page 6:

Optimal Quantities

§ The value (utility) of a state s: V*(s) = expected utility starting in s and acting optimally

§ The value (utility) of a q-state (s, a): Q*(s, a) = expected utility starting out having taken action a from state s and (thereafter) acting optimally

§ The optimal policy: π*(s) = optimal action from state s

[Figure: MDP search tree. s is a state; (s, a) is a q-state; (s, a, s’) is a transition.]
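The three quantities are tied together by the transition and reward functions from the MDP definition. The relations below are a sketch under one assumption not stated on the slides: utility is taken to be the plain (undiscounted) sum of rewards.

```latex
\begin{align*}
V^*(s)    &= \max_{a} Q^*(s, a) \\
Q^*(s, a) &= \sum_{s'} T(s, a, s')\,\bigl[\, R(s, a, s') + V^*(s') \,\bigr] \\
\pi^*(s)  &= \arg\max_{a} Q^*(s, a)
\end{align*}
```

The first line is a max node and the second a chance node of the expectimax tree: a state takes the best q-state below it, and a q-state averages over its transitions.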

Page 7:

Deterministic Search

[Figure: two trees. Labels on the first: s, a, (s, a), (s, a, s’), s’. Labels on the second: s, a, s’.]

Page 8:

Policies

§ For MDPs, the solution is an optimal policy π*: S → A
  § A policy π gives an action for each state
  § An optimal policy is one that maximizes expected utility if followed

§ In deterministic single-agent search problems, we want an optimal plan, just a sequence of actions, from start to a goal

Page 9:

Example: Traveling in Romania

§ State space:
  § Cities

§ Successor function:
  § Roads: Go to adjacent city with cost = distance

§ Start state:
  § Arad

§ Goal test:
  § Is state == Bucharest?

§ Solution?

Page 10:

Searching with a Search Tree

§ Search:
  § Expand out potential plans (tree nodes)
  § Maintain a fringe of partial plans under consideration
  § Try to expand as few tree nodes as possible

Page 11:

General Tree Search

§ Important ideas:
  § Fringe
  § Expansion
  § Exploration strategy

§ Main question: which fringe nodes to explore?

Page 12:

Search Algorithm Properties

§ Complete: Guaranteed to find a solution if one exists?
§ Optimal: Guaranteed to find the least cost path?
§ Time complexity?
§ Space complexity?

§ Cartoon of search tree:
  § b is the branching factor
  § m is the maximum depth
  § solutions at various depths

§ Number of nodes in entire tree?
  § 1 + b + b^2 + … + b^m = O(b^m)

[Figure: search tree with m tiers: 1 node, b nodes, b^2 nodes, …, b^m nodes.]
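The node count can be checked numerically. The helper name `count_tree_nodes` is hypothetical; it sums the tier sizes 1, b, b^2, …, b^m of a complete tree.

```python
def count_tree_nodes(b, m):
    """Total nodes in a complete tree with branching factor b and depth m."""
    return sum(b ** depth for depth in range(m + 1))


small = count_tree_nodes(2, 3)    # 1 + 2 + 4 + 8
wide = count_tree_nodes(10, 5)    # 1 + 10 + ... + 100000
```

For b = 10 and m = 5 the last tier alone holds 100000 of the 111111 nodes, which is why the total is O(b^m).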

Page 13:

Depth-First Search

[Figure: example state graph (states S, a, b, c, d, e, f, h, p, q, r, G) and the corresponding search tree.]

Strategy: expand a deepest node first

Page 14:

Depth-First Search (DFS) Properties

[Figure: search tree with m tiers: 1 node, b nodes, b^2 nodes, …, b^m nodes.]

§ What nodes does DFS expand?
  § Some left prefix of the tree.
  § Could process the whole tree!
  § If m is finite, takes time O(b^m)

§ How much space does the fringe take?
  § Only has siblings on path to root, so O(bm) (the product b × m, not b^m)

§ Is it complete?
  § m could be infinite, so only if we prevent cycles

§ Is it optimal?
  § No, it finds the “leftmost” solution, regardless of depth or cost
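A minimal DFS tree-search sketch, with the fringe kept as a stack of partial plans. The graph below is a hypothetical example, not the one in the slide figure; the `child not in path` test is the cycle prevention mentioned above.

```python
def depth_first_search(graph, start, goal):
    """Expand a deepest node first; returns the first path found, or None."""
    fringe = [[start]]                    # stack of partial plans (paths)
    while fringe:
        path = fringe.pop()               # most recently added = deepest
        state = path[-1]
        if state == goal:
            return path
        # Push children in reverse so the leftmost child is expanded first.
        for child in reversed(graph[state]):
            if child not in path:         # prevent cycles along this path
                fringe.append(path + [child])
    return None


# Hypothetical graph: the left branch reaches G at depth 3, the right at depth 2.
GRAPH = {"S": ["A", "B"], "A": ["C"], "C": ["G"], "B": ["G"], "G": []}
dfs_path = depth_first_search(GRAPH, "S", "G")
```

On this graph DFS returns the deeper path through A and C, illustrating that it finds the “leftmost” solution rather than the shallowest one.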

Page 15:

Breadth-First Search

[Figure: example state graph (states S, a, b, c, d, e, f, h, p, q, r, G) and the corresponding search tree, with search tiers marked.]

Strategy: expand a shallowest node first

Page 16:

Breadth-First Search (BFS) Properties

§ What nodes does BFS expand?
  § Processes all nodes above shallowest solution
  § Let depth of shallowest solution be s
  § Search takes time O(b^s)

§ How much space does the fringe take?
  § Has roughly the last tier, so O(b^s)

§ Is it complete?
  § s must be finite if a solution exists, so yes!

§ Is it optimal?
  § Only if costs are all 1 (more on costs later)

[Figure: search tree with s tiers searched: 1 node, b nodes, b^2 nodes, …, b^s nodes; the full tree extends to b^m nodes.]
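A matching BFS sketch, where the only change from a stack-based search is that the fringe becomes a FIFO queue. The graph is a hypothetical example, not the one in the slide figure.

```python
from collections import deque


def breadth_first_search(graph, start, goal):
    """Expand a shallowest node first; returns a shallowest path, or None."""
    fringe = deque([[start]])             # FIFO queue of partial plans
    while fringe:
        path = fringe.popleft()           # oldest entry = shallowest
        state = path[-1]
        if state == goal:
            return path
        for child in graph[state]:
            if child not in path:         # avoid looping along this path
                fringe.append(path + [child])
    return None


# Hypothetical graph: the left branch reaches G at depth 3, the right at depth 2.
GRAPH = {"S": ["A", "B"], "A": ["C"], "C": ["G"], "B": ["G"], "G": []}
bfs_path = breadth_first_search(GRAPH, "S", "G")
```

Here BFS returns the depth-2 path through B even though B is the right-hand child, because every depth-2 plan is dequeued before any depth-3 plan.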

Page 17:

Iterative Deepening

§ Idea: get DFS’s space advantage with BFS’s time / shallow-solution advantages
  § Run a DFS with depth limit 1. If no solution…
  § Run a DFS with depth limit 2. If no solution…
  § Run a DFS with depth limit 3. …..

§ Isn’t that wastefully redundant?
  § Generally most work happens in the lowest level searched, so not so bad!

[Figure: search-tree cartoon with branching factor b.]
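The loop of depth-limited searches can be sketched as below. The function names, the `max_limit` cutoff, and the graph are assumptions for illustration.

```python
def depth_limited_search(graph, path, goal, limit):
    """DFS that refuses to go more than `limit` steps below path[-1]."""
    state = path[-1]
    if state == goal:
        return path
    if limit == 0:
        return None
    for child in graph[state]:
        if child in path:                 # skip cycles along this path
            continue
        found = depth_limited_search(graph, path + [child], goal, limit - 1)
        if found is not None:
            return found
    return None


def iterative_deepening(graph, start, goal, max_limit=20):
    """Run DFS with depth limit 1, 2, 3, ... until a solution appears."""
    for limit in range(1, max_limit + 1):
        found = depth_limited_search(graph, [start], goal, limit)
        if found is not None:
            return found, limit
    return None, None


# Hypothetical graph: the left branch reaches G at depth 3, the right at depth 2.
GRAPH = {"S": ["A", "B"], "A": ["C"], "C": ["G"], "B": ["G"], "G": []}
ids_path, ids_limit = iterative_deepening(GRAPH, "S", "G")
```

Each pass keeps only the current path in memory, as DFS does, while the increasing limit means the shallowest solution is found first, as with BFS.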

Page 18:

Uniform Cost Search

Strategy: expand a cheapest node first

[Figure: example state graph with arc costs, and the corresponding search tree annotated with cumulative path costs and cost contours.]

Page 19:

Uniform Cost Search (UCS) Properties

§ What nodes does UCS expand?
  § Processes all nodes with cost less than cheapest solution!
  § If that solution costs C* and arcs cost at least ε, then the “effective depth” is roughly C*/ε
  § Takes time O(b^(C*/ε)) (exponential in effective depth)

§ How much space does the fringe take?
  § Has roughly the last tier, so O(b^(C*/ε))

§ Is it complete?
  § Assuming best solution has a finite cost and minimum arc cost is positive, yes!

§ Is it optimal?
  § Yes! (Proof next via A*)

[Figure: search tree with C*/ε “tiers” and cost contours c ≤ 1, c ≤ 2, c ≤ 3.]
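A minimal UCS sketch using a priority queue keyed on cumulative path cost. The weighted graph is hypothetical; the arc costs in the slide figure are not recoverable from the transcript.

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """Expand a cheapest node first; returns (cost, path) or None."""
    fringe = [(0, [start])]               # priority queue keyed on path cost
    while fringe:
        cost, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:                 # goal test when dequeued, not when added
            return cost, path
        for child, step_cost in graph[state].items():
            if child not in path:
                heapq.heappush(fringe, (cost + step_cost, path + [child]))
    return None


# Hypothetical weighted graph: the cheapest route to G takes the most steps.
GRAPH = {
    "S": {"A": 1, "B": 5},
    "A": {"B": 1, "G": 10},
    "B": {"G": 2},
    "G": {},
}
ucs_cost, ucs_path = uniform_cost_search(GRAPH, "S", "G")
```

Testing for the goal only when a node leaves the fringe matters here: the direct arc from A to G is generated early at cost 11, but the cost-4 route through B is dequeued first.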

Page 20:

Uniform Cost Issues

§ Remember: UCS explores increasing cost contours

§ The good: UCS is complete and optimal!

§ The bad:
  § Explores options in every “direction”
  § No information about goal location

§ We’ll fix that with search heuristics!

[Figure: cost contours c ≤ 1, c ≤ 2, c ≤ 3 around Start, with Goal marked.]

Page 21:

Search Heuristics

§ A heuristic is:
  § A function that estimates how close a state is to a goal
  § Designed for a particular search problem
  § Examples: Manhattan distance, Euclidean distance for pathfinding

[Figure: example distances labeled 10, 5, and 11.2.]
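Both example heuristics are short to write down for grid positions. The coordinate pairs below are an assumption: a goal offset of 10 and 5, chosen because it matches the figure's 10, 5, and 11.2 labels.

```python
import math


def manhattan_distance(state, goal):
    """Sum of the horizontal and vertical offsets."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


def euclidean_distance(state, goal):
    """Straight-line distance."""
    return math.hypot(state[0] - goal[0], state[1] - goal[1])


start, goal = (0, 0), (10, 5)
h_manhattan = manhattan_distance(start, goal)   # 10 + 5
h_euclidean = euclidean_distance(start, goal)   # sqrt(10^2 + 5^2)
```

With offsets of 10 and 5, the straight-line estimate is sqrt(125), roughly 11.2, and the Manhattan estimate is 15.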

Page 22:

Example: Heuristic Function

[Figure: example heuristic function h(x).]

Page 23:

Greedy Search

§ Expand the node that seems closest…

§ What can go wrong?

Page 24:

Greedy Search

§ Strategy: expand a node that you think is closest to a goal state
  § Heuristic: estimate of distance to nearest goal for each state

§ Worst-case: like a badly-guided DFS

[Figure: two search-tree cartoons with branching factor b.]
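A greedy search sketch that orders the fringe by h alone. The graph and heuristic values are hypothetical, chosen so that the state that looks closest leads to an expensive path.

```python
import heapq


def greedy_search(graph, h, start, goal):
    """Expand the node with the smallest heuristic value; path cost is ignored."""
    fringe = [(h[start], 0, [start])]     # (heuristic, cost so far, path)
    while fringe:
        _, cost, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:
            return cost, path
        for child, step_cost in graph[state].items():
            if child not in path:
                heapq.heappush(fringe, (h[child], cost + step_cost, path + [child]))
    return None


# Hypothetical example: B looks closer (h = 1) but its arc to G costs 10.
GRAPH = {"S": {"A": 2, "B": 1}, "A": {"G": 2}, "B": {"G": 10}, "G": {}}
H = {"S": 3, "A": 2, "B": 1, "G": 0}
greedy_cost, greedy_path = greedy_search(GRAPH, H, "S", "G")
```

Greedy commits to B and returns a path of cost 11, although the route through A costs only 4. The cost so far is carried along for reporting but never influences the ordering.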

Page 25:

Combining UCS and Greedy

§ Uniform-cost orders by path cost, or backward cost g(n)
§ Greedy orders by goal proximity, or forward cost h(n)

§ A* Search orders by the sum: f(n) = g(n) + h(n)

[Figure: example graph with states S, a, b, c, d, e, G, and its search tree. Tree node labels: S (g = 0, h = 6); a (g = 1, h = 5); b (g = 2, h = 6); c (g = 3, h = 7); d (g = 4, h = 2); G (g = 6, h = 0); e (g = 9, h = 1); d (g = 10, h = 2); G (g = 12, h = 0).]

Example: Teg Grenager
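A minimal A* tree-search sketch. The arc costs below are not printed legibly in the transcript; they are inferred from the differences between the g values in the search tree (for example a at g = 1 and d at g = 4 imply an a to d arc of cost 3), so treat the graph as an assumption.

```python
import heapq


def a_star_search(graph, h, start, goal):
    """Order the fringe by f(n) = g(n) + h(n); returns (cost, path) or None."""
    fringe = [(h[start], 0, [start])]     # (f, g, path)
    while fringe:
        _, g, path = heapq.heappop(fringe)
        state = path[-1]
        if state == goal:                 # goal test when dequeued
            return g, path
        for child, step_cost in graph[state].items():
            if child not in path:
                g_child = g + step_cost   # backward cost
                f_child = g_child + h[child]   # plus estimated forward cost
                heapq.heappush(fringe, (f_child, g_child, path + [child]))
    return None


# Arc costs inferred from the g values in the slide's search tree (assumption).
GRAPH = {
    "S": {"a": 1},
    "a": {"b": 1, "d": 3, "e": 8},
    "b": {"c": 1},
    "c": {},
    "d": {"G": 2},
    "e": {"d": 1},
    "G": {},
}
H = {"S": 6, "a": 5, "b": 6, "c": 7, "d": 2, "e": 1, "G": 0}
astar_cost, astar_path = a_star_search(GRAPH, H, "S", "G")
```

After a is expanded the fringe holds b (f = 8), d (f = 6), and e (f = 10), so d is expanded next and G is reached at cost 6 without ever expanding b, c, or e.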

Page 26:

Is A* Optimal?

§ What went wrong?
§ Actual bad goal cost < estimated good goal cost
§ We need estimates to be less than actual costs!

[Figure: graph with states S, A, G; arc costs 1, 3, 5; heuristic values h = 7, h = 6, h = 0.]

Page 27:

Admissible Heuristics

§ A heuristic h is admissible (optimistic) if:

    0 ≤ h(n) ≤ h*(n)

  where h*(n) is the true cost to a nearest goal

§ Coming up with admissible heuristics is most of what’s involved in using A* in practice.
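On a small graph, admissibility can be checked directly by computing the true cost to the goal for every state. The three-state example is an assumption: it reads the S, A, G figure from the preceding slide as arcs S to A (cost 1), A to G (cost 3), S to G (cost 5) with h(S) = 7, h(A) = 6, h(G) = 0. The helper names are hypothetical.

```python
import heapq


def true_cost_to_goal(graph, start, goal):
    """Cheapest path cost from start to goal (inf if unreachable)."""
    fringe = [(0, start)]
    closed = set()
    while fringe:
        cost, state = heapq.heappop(fringe)
        if state == goal:
            return cost
        if state in closed:
            continue
        closed.add(state)
        for child, step_cost in graph[state].items():
            heapq.heappush(fringe, (cost + step_cost, child))
    return float("inf")


def inadmissible_states(graph, h, goal):
    """States where 0 <= h(n) <= h*(n) fails."""
    return sorted(s for s in graph
                  if not 0 <= h[s] <= true_cost_to_goal(graph, s, goal))


# Assumed reading of the three-state example figure.
GRAPH = {"S": {"A": 1, "G": 5}, "A": {"G": 3}, "G": {}}
H_BAD = {"S": 7, "A": 6, "G": 0}
H_OK = {"S": 4, "A": 3, "G": 0}
bad = inadmissible_states(GRAPH, H_BAD, "G")
ok = inadmissible_states(GRAPH, H_OK, "G")
```

Under this reading the true costs are h*(S) = 4 and h*(A) = 3, so both h(S) = 7 and h(A) = 6 overestimate. That is the failure on the earlier slide: the direct path has f = 5, which is less than the estimate f(A) = 1 + 6 = 7 for the path that actually costs 4.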

Page 28:

Optimality of A* Tree Search

Assume:
§ A is an optimal goal node
§ B is a suboptimal goal node
§ h is admissible

Claim:
§ A will exit the fringe before B

Page 29:

Optimality of A* Tree Search: Blocking

Proof:
§ Imagine B is on the fringe
§ Some ancestor n of A is on the fringe, too (maybe A!)
§ Claim: n will be expanded before B
  1. f(n) is less or equal to f(A)

    f(n) = g(n) + h(n)    (definition of f-cost)
    f(n) ≤ g(A)           (admissibility of h)
    g(A) = f(A)           (h = 0 at a goal)

Page 30:

Optimality of A* Tree Search: Blocking

Proof:
§ Imagine B is on the fringe
§ Some ancestor n of A is on the fringe, too (maybe A!)
§ Claim: n will be expanded before B
  1. f(n) is less or equal to f(A)
  2. f(A) is less than f(B)

    g(A) < g(B)    (B is suboptimal)
    f(A) < f(B)    (h = 0 at a goal)

Page 31:

Optimality of A* Tree Search: Blocking

Proof:
§ Imagine B is on the fringe
§ Some ancestor n of A is on the fringe, too (maybe A!)
§ Claim: n will be expanded before B
  1. f(n) is less or equal to f(A)
  2. f(A) is less than f(B)
  3. n expands before B

§ All ancestors of A expand before B
§ A expands before B
§ A* search is optimal

Page 32:

Properties of A*

Page 33:

UCS vs A* Contours

§ Uniform-cost expands equally in all “directions”

§ A* expands mainly toward the goal, but does hedge its bets to ensure optimality

[Figures: two contour diagrams, each with Start and Goal marked.]

Page 34:

A* Applications

§ Video games
§ Pathing / routing problems
§ Resource planning problems
§ Robot motion planning
§ Language analysis
§ Machine translation
§ Speech recognition
§ …

Page 35:

Robotics Example

Page 36:

A*: Summary

§ A* uses both backward costs and (estimates of) forward costs

§ A* is optimal with admissible / consistent heuristics

§ Heuristic design is key