Markov Decision Processesbboots3/ACRL-Spring2019/... · § Idea: get DFS’s space advantage with...

Markov Decision Processes

§ An MDP is defined by:
  § A set of states s ∈ S
  § A set of actions a ∈ A
  § A transition function T(s, a, s’)
    § Probability that a from s leads to s’, i.e., P(s’ | s, a)
    § Also called the model or the dynamics
  § A reward function R(s, a, s’)
    § Sometimes just R(s) or R(s’)
  § A start state
  § Maybe a terminal state

§ MDPs can be thought of as non-deterministic search problems
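The ingredients listed above map naturally onto a small container type. The following is only a sketch: the class name `MDP`, its field layout, and the toy robot example are hypothetical choices for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: set
    actions: set
    # transitions[(s, a)] is a list of (s_next, probability) pairs, i.e. T(s, a, s')
    transitions: dict
    # rewards[(s, a, s_next)] is R(s, a, s')
    rewards: dict
    start: str
    terminals: frozenset = frozenset()

    def T(self, s, a, s_next):
        """P(s' | s, a); zero for outcomes that are not listed."""
        return dict(self.transitions.get((s, a), [])).get(s_next, 0.0)

    def is_well_formed(self, tol=1e-9):
        """Each listed (s, a) pair must define a probability distribution over s'."""
        return all(abs(sum(p for _, p in outcomes) - 1.0) <= tol
                   for outcomes in self.transitions.values())


# Hypothetical toy robot: "go" usually advances, but the wheels may slip.
toy = MDP(
    states={"start", "mid", "goal"},
    actions={"go"},
    transitions={
        ("start", "go"): [("mid", 0.8), ("start", 0.2)],
        ("mid", "go"): [("goal", 0.8), ("mid", 0.2)],
    },
    rewards={("mid", "go", "goal"): 1.0},
    start="start",
    terminals=frozenset({"goal"}),
)
```

Because one action can lead to several successor states, the same (s, a) pair lists more than one outcome; that is the sense in which the search problem is non-deterministic.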

MDP Search Trees

§ s is a state
§ (s, a) is a q-state
§ (s, a, s’) is a transition
§ T(s, a, s’) = P(s’ | s, a)

Compare to Adversarial Search (Minimax)

§ Deterministic, zero-sum games:
  § Tic-tac-toe, chess, checkers
  § One player maximizes result
  § The other minimizes result

§ Minimax search:
  § A state-space search tree
  § Players alternate turns
  § Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: minimax tree. Terminal values 8, 2, 5, 6 are part of the game; minimax values are computed recursively, giving min-node values 2 and 5.]
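The recursive computation of minimax values can be sketched in a few lines. The nested-list tree encoding and the assumption that the figure shows a max root over two min nodes are illustrative choices.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if isinstance(node, (int, float)):
        return node  # terminal value: part of the game
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Assumed shape: a max root over two min nodes with terminal values 8, 2 and 5, 6.
tree = [[8, 2], [5, 6]]
min_values = [minimax(child, maximizing=False) for child in tree]
root_value = minimax(tree)
```

Here `min_values` holds the two min-node values and `root_value` is what the maximizer can guarantee against an optimal adversary.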

Worst-Case vs. Average Case

[Figure: tree with leaf values 10, 10, 9, 100.]

Idea: Uncertain outcomes controlled by chance, not an adversary!

Expectimax Search

§ Why wouldn’t we know what the result of an action will be?
  § Explicit randomness: rolling dice
  § Unpredictable opponents respond randomly
  § Actions can fail: when moving a robot, wheels might slip

§ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

§ Expectimax search: compute the average score under optimal play
  § Max nodes as in minimax search
  § Chance nodes are like min nodes but the outcome is uncertain
  § Calculate their expected utilities
  § I.e. take weighted average (expectation) of children

§ MDPs and value iteration formalize this.

[Figure: expectimax tree with a max root over chance nodes; values shown include 10, 4, 5, 7 and leaf values 10, 10, 9, 100.]
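To see how much the worst-case and average-case views can differ, the sketch below evaluates one tree both ways. The tree shape (a max root over two nodes with leaves 10, 10 and 9, 100) and the uniform outcome probabilities are assumptions made for illustration.

```python
def minimax(node, maximizing=True):
    """Worst-case value: the opponent picks the child that is worst for us."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def expectimax(node, kind="max"):
    """Average-case value: max nodes alternate with chance nodes.
    Chance nodes use uniform probabilities here (an assumption)."""
    if isinstance(node, (int, float)):
        return node
    if kind == "max":
        return max(expectimax(child, "chance") for child in node)
    return sum(expectimax(child, "max") for child in node) / len(node)


tree = [[10, 10], [9, 100]]
worst_case = minimax(tree)       # the adversary would answer the right branch with 9
average_case = expectimax(tree)  # chance averages 9 and 100 instead
```

Minimax prefers the safe left branch, while expectimax prefers the right branch because its expected value is much higher.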

Optimal Quantities

§ The value (utility) of a state s: V*(s) = expected utility starting in s and acting optimally

§ The value (utility) of a q-state (s, a): Q*(s, a) = expected utility starting out having taken action a from state s and (thereafter) acting optimally

§ The optimal policy: π*(s) = optimal action from state s
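These three quantities are tied together by the standard Bellman optimality relations. The block below is a sketch that assumes a discount factor γ ∈ [0, 1], which these slides do not introduce; setting γ = 1 gives the undiscounted case.

```latex
\begin{align*}
V^*(s)    &= \max_{a} Q^*(s, a) \\
Q^*(s, a) &= \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right] \\
\pi^*(s)  &= \arg\max_{a} Q^*(s, a)
\end{align*}
```

The first line is the max node of an expectimax tree, and the second is the chance node: a probability-weighted average over the transitions (s, a, s’).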

[Figure: MDP search tree. s is a state; (s, a) is a q-state; (s, a, s’) is a transition.]

Deterministic Search

Policies

§ For MDPs, solution is an optimal policy π*: S → A
  § A policy π gives an action for each state
  § An optimal policy is one that maximizes expected utility if followed

§ In deterministic single-agent search problems, we want an optimal plan, just a sequence of actions, from start to a goal

Example: Traveling in Romania

§ State space:
  § Cities
§ Successor function:
  § Roads: Go to adjacent city with cost = distance
§ Start state:
  § Arad
§ Goal test:
  § Is state == Bucharest?
§ Solution?

Searching with a Search Tree

§ Search:
  § Expand out potential plans (tree nodes)
  § Maintain a fringe of partial plans under consideration
  § Try to expand as few tree nodes as possible

General Tree Search

§ Important ideas:
  § Fringe
  § Expansion
  § Exploration strategy

§ Main question: which fringe nodes to explore?
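One way to make the three ideas concrete is a single loop in which only the exploration strategy varies. This is a minimal sketch: the function name `tree_search`, the `pop` parameter, and the toy graph are hypothetical, and the graph is acyclic so that plain tree search terminates.

```python
from collections import deque


def tree_search(start, is_goal, successors, pop):
    """Generic tree search. The fringe holds partial plans (paths);
    `pop` is the exploration strategy that picks which plan to expand next."""
    fringe = deque([[start]])
    expanded = 0
    while fringe:
        path = pop(fringe)
        if is_goal(path[-1]):
            return path, expanded
        expanded += 1                      # expansion: generate successor plans
        for nxt in successors(path[-1]):
            fringe.append(path + [nxt])
    return None, expanded


# Hypothetical acyclic toy graph.
graph = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"], "G": []}


def successors(state):
    return graph[state]


def is_goal(state):
    return state == "G"


# LIFO fringe: expand the most recently added (deepest) plan first.
dfs_path, dfs_expanded = tree_search("S", is_goal, successors, pop=lambda f: f.pop())
# FIFO fringe: expand the oldest (shallowest) plan first.
bfs_path, bfs_expanded = tree_search("S", is_goal, successors, pop=lambda f: f.popleft())
```

The two calls differ only in which end of the fringe they take from, yet they return different plans, which is the sense in which the choice of fringe node is the main question.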

Search Algorithm Properties

§ Complete: Guaranteed to find a solution if one exists?
§ Optimal: Guaranteed to find the least cost path?
§ Time complexity?
§ Space complexity?

§ Cartoon of search tree:
  § b is the branching factor
  § m is the maximum depth
  § solutions at various depths

§ Number of nodes in entire tree?
  § 1 + b + b^2 + … + b^m = O(b^m)

[Figure: search tree with m tiers: 1 node at the root, then b nodes, b^2 nodes, …, b^m nodes in the deepest tier.]

Depth-First Search

Strategy: expand a deepest node first

Depth-First Search (DFS) Properties

[Figure: search tree with m tiers: 1 node, b nodes, b^2 nodes, …, b^m nodes.]

§ What nodes does DFS expand?
  § Some left prefix of the tree.
  § Could process the whole tree!
  § If m is finite, takes time O(b^m)

§ How much space does the fringe take?
  § Only has siblings on path to root, so O(b·m)

§ Is it complete?
  § m could be infinite, so only if we prevent cycles

§ Is it optimal?
  § No, it finds the “leftmost” solution, regardless of depth or cost
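The gap between DFS's exponential time and its linear fringe can be checked empirically. The sketch below runs DFS over an implicit complete tree with no goal, so the whole tree is processed; the function name and the choice b = 3, m = 4 are hypothetical.

```python
def dfs_fringe_stats(b, m):
    """DFS over a complete tree with branching factor b and maximum depth m.
    Nodes are represented only by their depth, which is all we need to count."""
    fringe = [0]            # stack of depths; the root has depth 0
    processed = 0
    max_fringe = 1
    while fringe:
        depth = fringe.pop()                  # deepest node first (LIFO)
        processed += 1
        if depth < m:
            fringe.extend([depth + 1] * b)    # push the b children
        max_fringe = max(max_fringe, len(fringe))
    return processed, max_fringe


processed, max_fringe = dfs_fringe_stats(b=3, m=4)
```

With these values every one of the 1 + 3 + 9 + 27 + 81 nodes is processed, but the fringe never holds more than the unexpanded siblings along the current path to the root.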

Breadth-First Search

Strategy: expand a shallowest node first

Breadth-First Search (BFS) Properties

§ What nodes does BFS expand?
  § Processes all nodes above shallowest solution
  § Let depth of shallowest solution be s
  § Search takes time O(b^s)

§ How much space does the fringe take?
  § Has roughly the last tier, so O(b^s)

§ Is it complete?
  § s must be finite if a solution exists, so yes!

§ Is it optimal?
  § Only if costs are all 1 (more on costs later)

[Figure: search tree with the shallowest solution in tier s; the s tiers hold 1, b, b^2, …, b^s nodes, and the full tree continues down to b^m nodes.]
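The optimality caveat is easy to reproduce: BFS returns the plan with the fewest arcs, which is not the cheapest plan once arc costs differ. A minimal sketch on a hypothetical weighted graph:

```python
from collections import deque


def bfs(graph, start, goal):
    """Breadth-first tree search; arc costs are ignored when ordering the fringe."""
    fringe = deque([[start]])
    while fringe:
        path = fringe.popleft()               # shallowest plan first (FIFO)
        if path[-1] == goal:
            return path
        for nxt, _cost in graph.get(path[-1], []):
            fringe.append(path + [nxt])
    return None


def path_cost(graph, path):
    """Sum of arc costs along a path."""
    return sum(dict(graph[u])[v] for u, v in zip(path, path[1:]))


# Hypothetical graph: graph[u] is a list of (v, arc_cost) pairs.
graph = {"S": [("G", 10), ("A", 1)], "A": [("G", 1)]}

shallowest = bfs(graph, "S", "G")
shallowest_cost = path_cost(graph, shallowest)
cheaper_cost = path_cost(graph, ["S", "A", "G"])
```

BFS finds the one-arc plan even though a two-arc plan costs far less; with all costs equal to 1 the two notions coincide.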

Iterative Deepening

§ Idea: get DFS’s space advantage with BFS’s time / shallow-solution advantages
  § Run a DFS with depth limit 1. If no solution…
  § Run a DFS with depth limit 2. If no solution…
  § Run a DFS with depth limit 3. …..

§ Isn’t that wastefully redundant?
  § Generally most work happens in the lowest level searched, so not so bad!
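The repeated depth-limited runs can be sketched as follows. The helper names, the `max_depth` cap, and the toy graph are hypothetical; the graph is arranged so that an unlimited leftmost-first DFS would dive down the long branch first.

```python
def depth_limited_dfs(graph, node, goal, limit, path=None):
    """DFS that refuses to go more than `limit` arcs below `node`."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for nxt in graph.get(node, []):
        found = depth_limited_dfs(graph, nxt, goal, limit - 1, path)
        if found:
            return found
    return None


def iterative_deepening(graph, start, goal, max_depth=20):
    """Run DFS with depth limit 1, 2, 3, ... until a solution appears."""
    for limit in range(1, max_depth + 1):
        found = depth_limited_dfs(graph, start, goal, limit)
        if found:
            return found, limit
    return None, None


# Hypothetical graph: the left branch reaches G at depth 4, the right one at depth 2.
graph = {"S": ["A", "B"], "A": ["C"], "C": ["D"], "D": ["G"], "B": ["G"]}
path, limit = iterative_deepening(graph, "S", "G")
```

Each run uses only DFS-sized memory, and the first limit at which a solution appears is the depth of the shallowest solution, as with BFS.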

Uniform Cost Search

Strategy: expand a cheapest node first

Cost contours

Uniform Cost Search (UCS) Properties

§ What nodes does UCS expand?
  § Processes all nodes with cost less than cheapest solution!
  § If that solution costs C* and arcs cost at least ε, then the “effective depth” is roughly C*/ε
  § Takes time O(b^(C*/ε)) (exponential in effective depth)

§ How much space does the fringe take?
  § Has roughly the last tier, so O(b^(C*/ε))

§ Is it complete?
  § Assuming best solution has a finite cost and minimum arc cost is positive, yes!

§ Is it optimal?
  § Yes! (Proof next via A*)

[Figure: C*/ε “tiers” drawn as cost contours c ≤ 1, c ≤ 2, c ≤ 3.]
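Expanding a cheapest node first amounts to keeping the fringe in a priority queue keyed on path cost. A minimal tree-search sketch, assuming positive arc costs and a hypothetical toy graph:

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """graph[u] is a list of (v, arc_cost) pairs with positive costs.
    Returns (cost, path) for a cheapest plan, or None if there is no plan."""
    fringe = [(0, [start])]                   # priority = backward cost so far
    while fringe:
        cost_so_far, path = heapq.heappop(fringe)   # cheapest plan first
        if path[-1] == goal:                  # goal test when popped, not when pushed
            return cost_so_far, path
        for nxt, arc_cost in graph.get(path[-1], []):
            heapq.heappush(fringe, (cost_so_far + arc_cost, path + [nxt]))
    return None


# Hypothetical graph with an expensive direct arc and a cheap detour.
graph = {"S": [("G", 10), ("A", 1)], "A": [("G", 1)]}
cost, path = uniform_cost_search(graph, "S", "G")
```

Testing for the goal only when a plan is popped matters: the direct plan to G enters the fringe first, but the cheaper detour is popped before it.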

Uniform Cost Issues

§ Remember: UCS explores increasing cost contours

§ The good: UCS is complete and optimal!

§ The bad:
  § Explores options in every “direction”
  § No information about goal location

§ We’ll fix that with search heuristics!

[Figure: cost contours c ≤ 1, c ≤ 2, c ≤ 3 around Start, with Goal marked.]

Search Heuristics

§ A heuristic is:
  § A function that estimates how close a state is to a goal
  § Designed for a particular search problem
  § Examples: Manhattan distance, Euclidean distance for pathfinding

Example: Heuristic Function

Greedy Search

§ Expand the node that seems closest…

§ What can go wrong?

Greedy Search

§ Strategy: expand a node that you think is closest to a goal state
  § Heuristic: estimate of distance to nearest goal for each state

§ Worst-case: like a badly-guided DFS
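What can go wrong is visible on a small example. In this sketch the fringe is ordered by the heuristic alone; the graph and the heuristic table `h` are hypothetical values chosen so that the node that looks closest sits on an expensive route.

```python
import heapq


def greedy_search(graph, h, start, goal):
    """Greedy best-first tree search: priority is h only.
    The path cost is carried along just so it can be reported."""
    fringe = [(h[start], 0, [start])]
    while fringe:
        _, cost_so_far, path = heapq.heappop(fringe)   # smallest h first
        if path[-1] == goal:
            return cost_so_far, path
        for nxt, arc_cost in graph.get(path[-1], []):
            heapq.heappush(fringe, (h[nxt], cost_so_far + arc_cost, path + [nxt]))
    return None


# Hypothetical graph and heuristic: A looks closest to G but its arc to G is costly.
graph = {
    "S": [("A", 1), ("B", 1)],
    "A": [("G", 10)],
    "B": [("C", 1)],
    "C": [("G", 1)],
}
h = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}

cost, path = greedy_search(graph, h, "S", "G")
```

Greedy commits to A because h(A) is smallest and ends up with a plan of cost 11, although the route through B and C costs only 3.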

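Combining UCS and Greedy

§ Uniform-cost orders by path cost, or backward cost g(n)
§ Greedy orders by goal proximity, or forward cost h(n)

§ A* Search orders by the sum: f(n) = g(n) + h(n)

[Figure: example search graph labeled with heuristic values, and the corresponding search tree annotated with (g, h) pairs: g=0 h=6; g=1 h=5; g=2 h=6; g=3 h=7; g=4 h=2; g=6 h=0; g=9 h=1; g=10 h=2; g=12 h=0. Example: Teg Grenager]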
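Ordering the fringe by f(n) = g(n) + h(n) needs only a one-line change to a priority-queue search. The sketch below is a tree-search version on a hypothetical graph with a hypothetical heuristic that never overestimates; it is not the graph from the figure.

```python
import heapq


def a_star(graph, h, start, goal):
    """A* tree search. Fringe entries are (f, g, path) with f = g + h."""
    fringe = [(h[start], 0, [start])]
    while fringe:
        _f, g, path = heapq.heappop(fringe)       # smallest f first
        if path[-1] == goal:
            return g, path
        for nxt, arc_cost in graph.get(path[-1], []):
            g_next = g + arc_cost                 # backward cost
            heapq.heappush(fringe, (g_next + h[nxt], g_next, path + [nxt]))
    return None


# Hypothetical graph; graph[u] is a list of (v, arc_cost) pairs.
graph = {
    "S": [("A", 1), ("B", 1)],
    "A": [("G", 10)],
    "B": [("C", 1)],
    "C": [("G", 1)],
}
# Hypothetical heuristic values, none larger than the true remaining cost.
h = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}

cost, path = a_star(graph, h, "S", "G")
```

A* expands A early because its f value is low, but the plan through A then gets f = 11 and waits on the fringe while the plan through B and C reaches the goal with f = 3.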

Is A* Optimal?

§ What went wrong?
  § Actual bad goal cost < estimated good goal cost
  § We need estimates to be less than actual costs!

[Figure: small example graph; surviving labels: 1, 3, h = 6.]
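The failure can be reproduced with a three-node graph. All numbers below are hypothetical, picked to be in the spirit of the surviving figure labels: the heuristic value 6 at A overestimates the true remaining cost of 3.

```python
import heapq


def a_star(graph, h, start, goal):
    """A* tree search; fringe entries are (f, g, path) with f = g + h."""
    fringe = [(h[start], 0, [start])]
    while fringe:
        _f, g, path = heapq.heappop(fringe)
        if path[-1] == goal:
            return g, path
        for nxt, arc_cost in graph.get(path[-1], []):
            g_next = g + arc_cost
            heapq.heappush(fringe, (g_next + h[nxt], g_next, path + [nxt]))
    return None


# Hypothetical graph: the good route S-A-G costs 4, the bad direct arc costs 5.
graph = {"S": [("A", 1), ("G", 5)], "A": [("G", 3)]}

pessimistic = {"S": 7, "A": 6, "G": 0}   # h(A) = 6 exceeds the true cost 3
optimistic = {"S": 4, "A": 3, "G": 0}    # never above the true cost

bad_cost, bad_path = a_star(graph, pessimistic, "S", "G")
good_cost, good_path = a_star(graph, optimistic, "S", "G")
```

With the pessimistic table the plan through A gets f = 7, so the direct plan with actual cost 5 is popped first: the actual bad goal cost is below the estimated good goal cost.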

Admissible Heuristics

§ A heuristic h is admissible (optimistic) if:

    0 ≤ h(n) ≤ h*(n)

  where h*(n) is the true cost to a nearest goal

§ Coming up with admissible heuristics is most of what’s involved in using A* in practice.
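On a small explicit graph, admissibility can be checked directly by computing the true cost h*(n) for every node and comparing. The sketch below does this with Dijkstra's algorithm on the reversed arcs; the function names, the graph, and both heuristic tables are hypothetical.

```python
import heapq


def true_costs_to_goal(graph, goal):
    """h*(n) for every node that can reach the goal (Dijkstra on reversed arcs)."""
    reverse = {}
    for u, arcs in graph.items():
        for v, cost in arcs:
            reverse.setdefault(v, []).append((u, cost))
    best = {goal: 0}
    fringe = [(0, goal)]
    while fringe:
        dist, node = heapq.heappop(fringe)
        if dist > best.get(node, float("inf")):
            continue                          # stale queue entry
        for prev, cost in reverse.get(node, []):
            candidate = dist + cost
            if candidate < best.get(prev, float("inf")):
                best[prev] = candidate
                heapq.heappush(fringe, (candidate, prev))
    return best


def is_admissible(graph, h, goal):
    """True when 0 <= h(n) <= h*(n) for every node that can reach the goal."""
    h_star = true_costs_to_goal(graph, goal)
    return all(0 <= h[n] <= h_star[n] for n in h_star)


graph = {
    "S": [("A", 1), ("B", 1)],
    "A": [("G", 10)],
    "B": [("C", 1)],
    "C": [("G", 1)],
}
optimistic = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}
overestimate = dict(optimistic, B=5)          # true cost from B is only 2

h_star = true_costs_to_goal(graph, "G")
```

Such a check is only practical on toy problems, since computing h* exactly is as hard as solving the search problem itself; in practice admissibility is usually argued from the way the heuristic is constructed.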

Optimality of A* Tree Search

Assume:
§ A is an optimal goal node
§ B is a suboptimal goal node
§ h is admissible

Claim:
§ A will exit the fringe before B

Optimality of A* Tree Search: Blocking

Proof:
§ Imagine B is on the fringe
§ Some ancestor n of A is on the fringe, too (maybe A!)
§ Claim: n will be expanded before B
  1. f(n) is less or equal to f(A) (definition of f-cost; admissibility of h; h = 0 at a goal)
  2. f(A) is less than f(B) (B is suboptimal; h = 0 at a goal)
  3. n expands before B
§ All ancestors of A expand before B
§ A expands before B
§ A* search is optimal
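Written out as a chain of inequalities, the three steps of the proof look as follows. This is a sketch in the notation of the slides, where g is the backward cost and h* is the true cost to a nearest goal.

```latex
\begin{align*}
f(n) &= g(n) + h(n)      && \text{definition of $f$-cost} \\
     &\le g(n) + h^*(n)  && \text{admissibility of $h$} \\
     &\le g(A)           && \text{continuing from $n$ to $A$ is one way to reach a goal} \\
     &= f(A)             && \text{$h = 0$ at a goal} \\[4pt]
f(A) &= g(A) < g(B) = f(B) && \text{$B$ is suboptimal; $h = 0$ at a goal}
\end{align*}
```

Together these give f(n) ≤ f(A) < f(B), so n is popped from the fringe before B.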

Properties of A*

UCS vs A* Contours

§ Uniform-cost expands equally in all “directions”

§ A* expands mainly toward the goal, but does hedge its bets to ensure optimality

[Figure: expansion contours between Start and Goal for uniform-cost search and for A*.]

A* Applications

§ Video games
§ Pathing / routing problems
§ Resource planning problems
§ Robot motion planning
§ Language analysis
§ Machine translation
§ Speech recognition
§ …

Robotics Example

A*: Summary

§ A* uses both backward costs and (estimates of) forward costs

§ A* is optimal with admissible / consistent heuristics

§ Heuristic design is key
