Efficient Solution Algorithms for Factored MDPs
by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman
Presented by Arkady Epshteyn
Problem with MDPs
• Exponential number of states
• Example: Sysadmin Problem
• 4 computers: M1, M2, M3, M4
• Each machine is working or has failed.
• State space: $2^4$
• 8 actions: whether to reboot each machine or not
• Reward: depends on the number of working machines
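To make the exponential growth concrete, here is a minimal sketch that enumerates the joint states of the Sysadmin example. The function names and the exact reward (a plain count of working machines) are assumptions for illustration; the slide only says the reward depends on that count.

```python
from itertools import product

def enumerate_states(num_machines):
    """Return every joint state; each machine is True (working) or False (failed)."""
    return list(product([True, False], repeat=num_machines))

def reward(state):
    """Assumed reward: the number of working machines in the state."""
    return sum(state)

states = enumerate_states(4)
print(len(states))                # 16 joint states for 4 machines
print(len(enumerate_states(10)))  # 1024: the count doubles with every added machine
```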
Factored Representation
• Transition model: DBN
• Reward model: $R(x) = \sum_{j=1}^{k} r_j(x)$
Approximate Value Function
• Linear value function: $V(x) = \sum_{j=1}^{k} w_j h_j(x)$
• Basis functions:
$h_i(X_i = \text{true}) = 1$
$h_i(X_i = \text{false}) = 0$
$h_0 = 1$
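A minimal sketch of evaluating this linear value function with the indicator basis functions from the slide. The weights and the sample state are hypothetical values chosen only for illustration.

```python
def make_indicator(i):
    """h_i(x) = 1 if variable i is true in state x, else 0."""
    return lambda x: 1.0 if x[i] else 0.0

def linear_value(weights, basis, x):
    """V(x) = sum_j w_j * h_j(x)."""
    return sum(w * h(x) for w, h in zip(weights, basis))

# h_0 is the constant basis function; h_1..h_4 are indicators on each machine.
basis = [lambda x: 1.0] + [make_indicator(i) for i in range(4)]
weights = [2.0, 1.0, 1.0, 0.5, 0.5]   # hypothetical weights w_0..w_4
x = (True, False, True, True)         # machine 2 has failed
value = linear_value(weights, basis, x)
print(value)
```

Only 5 weights describe a function over all 16 states, which is the point of the approximation.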
Markov Decision Processes
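For a fixed policy $\pi$ (with discount factor $\gamma$):
$V_\pi(x) = R_{\pi(x)}(x) + \gamma \sum_{x'} P_{\pi(x)}(x' \mid x)\, V_\pi(x')$
The optimal value function $V^*$:
$V^*(x) = \max_a \big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \big]$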
Solving MDP, Method 1: Policy Iteration
• Value determination:
$V_{\pi_t}(x) = R_{\pi_t(x)}(x) + \gamma \sum_{x'} P_{\pi_t(x)}(x' \mid x)\, V_{\pi_t}(x')$
• Policy improvement:
$\pi_{t+1}(x) = \arg\max_a \big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V_{\pi_t}(x') \big]$
• Polynomial in the number of states N
• Exponential in the number of variables K
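The two steps can be sketched on a tiny tabular MDP. Everything numeric here is made up for illustration (a single machine, two actions, a reboot cost of 0.5), and value determination is done by repeated sweeps rather than by solving the linear system exactly, which is a simplification.

```python
GAMMA = 0.9
STATES = [0, 1]    # 0 = failed, 1 = working
ACTIONS = [0, 1]   # 0 = wait, 1 = reboot
# P[a][x][x2]: probability of moving from x to x2 under action a (made-up numbers)
P = {0: [[1.0, 0.0], [0.1, 0.9]],
     1: [[0.0, 1.0], [0.0, 1.0]]}
# R[a][x]: 1 for a working machine, minus 0.5 when rebooting (made-up numbers)
R = {0: [0.0, 1.0], 1: [-0.5, 0.5]}

def q_value(x, a, V):
    """R_a(x) + gamma * sum_x' P_a(x'|x) V(x')."""
    return R[a][x] + GAMMA * sum(P[a][x][x2] * V[x2] for x2 in STATES)

def evaluate(policy, sweeps=2000):
    """Value determination by iterating the fixed-policy equation to convergence."""
    V = [0.0 for _ in STATES]
    for _ in range(sweeps):
        V = [q_value(x, policy[x], V) for x in STATES]
    return V

def policy_iteration():
    policy = [0 for _ in STATES]
    while True:
        V = evaluate(policy)
        # Policy improvement: act greedily with respect to the current values.
        improved = [max(ACTIONS, key=lambda a: q_value(x, a, V)) for x in STATES]
        if improved == policy:
            return policy, V
        policy = improved

policy, V = policy_iteration()
print(policy)  # [1, 0]: reboot when failed, wait when working
```

Both steps loop over every state, so the cost is polynomial in N but N itself is exponential in the number of machines.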
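Solving MDP, Method 2: Linear Programming
Variables: $V_1, \ldots, V_N$
Minimize: $\sum_{x_i} \alpha(x_i)\, V_i$, where $\alpha(x_i) > 0$ for all $x_i$
Subject to: $V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j$ for all $x_i, a$
Intuition: compare with the fixed point of V(x):
$V^*(x) = \max_a \big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \big]$
• Polynomial in the number of states N
• Exponential in the number of variables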
Value Function Approximation
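Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$ for all $x, a$
Compare with the exact LP:
Variables: $V_1, \ldots, V_N$
Minimize: $\sum_{x_i} \alpha(x_i)\, V_i$, where $\alpha(x_i) > 0$ for all $x_i$
Subject to: $V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j$ for all $x_i, a$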
Objective function
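Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$ for all $x, a$
• Objective function polynomial in the number of basis functions
$\sum_x \alpha(x) \sum_i w_i h_i(x) = \sum_i w_i \sum_x \alpha(x)\, h_i(x) = \sum_i w_i \sum_{c_i \in C_i} \alpha(c_i)\, h_i(c_i)$,
where $\alpha(c_i) = \sum_{x \text{ consistent with } c_i} \alpha(x)$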
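A small numerical check of this identity, as a sketch under assumptions: the state-relevance weights are taken to factor as a product of per-variable terms (so the marginal on $C_i$ is available directly), and each basis function is an indicator with scope $C_i = \{X_i\}$. The numbers in `p` are made up.

```python
from itertools import product

N = 4
# Assumed factored state-relevance weights: alpha(x) = prod_j p[j][x_j]
p = [{True: 0.7, False: 0.3}, {True: 0.5, False: 0.5},
     {True: 0.2, False: 0.8}, {True: 0.9, False: 0.1}]

def alpha(x):
    result = 1.0
    for j, value in enumerate(x):
        result *= p[j][value]
    return result

def h(i, x_i):
    """Indicator basis function with scope C_i = {X_i}."""
    return 1.0 if x_i else 0.0

def coefficient_brute_force(i):
    """sum_x alpha(x) h_i(x): touches all 2^N states."""
    return sum(alpha(x) * h(i, x[i]) for x in product([True, False], repeat=N))

def coefficient_factored(i):
    """sum_{c_i} alpha(c_i) h_i(c_i): touches only the assignments to C_i."""
    return sum(p[i][c] * h(i, c) for c in (True, False))

for i in range(N):
    print(i, coefficient_brute_force(i), coefficient_factored(i))
```

The factored version does work proportional to the size of each basis function's scope, not to the number of states.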
Each Constraint: Backprojection
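Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$ for all $x, a$
$\sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') = \sum_i w_i \sum_{x'} P_a(x' \mid x)\, h_i(x')$
$\sum_{x'} P_a(x' \mid x)\, h_i(x') = E[h_i(x') \mid x] = E[h_i(c_i') \mid x] = E[h_i(c_i') \mid pa(c_i')]$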
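The following sketch checks the backprojection identity on a toy DBN. The ring topology, the conditional probabilities in `p_working`, and the indicator basis function are all assumptions made for illustration; the point is only that the expectation over all next states collapses to one that reads the parents of the basis function's scope.

```python
from itertools import product

N = 3

def parents(i):
    """Assumed DBN structure: X_i' depends on X_i and its ring neighbour."""
    return (i, (i - 1) % N)

def p_working(own, neighbour):
    """Assumed CPD: P(X_i' = true | X_i, neighbour)."""
    if not own:
        return 0.05
    return 0.9 if neighbour else 0.6

def p_next(x_next, x):
    """P(x' | x) as a product of per-variable CPDs."""
    prob = 1.0
    for j in range(N):
        own, neighbour = (x[k] for k in parents(j))
        p_true = p_working(own, neighbour)
        prob *= p_true if x_next[j] else 1.0 - p_true
    return prob

def backprojection_brute_force(i, x):
    """sum_x' P(x'|x) h_i(x') over all 2^N next states, h_i = indicator on X_i'."""
    return sum(p_next(x_next, x) * (1.0 if x_next[i] else 0.0)
               for x_next in product([True, False], repeat=N))

def backprojection_factored(i, x):
    """E[h_i(c_i') | pa(c_i')]: only the parents of X_i' are read."""
    own, neighbour = (x[k] for k in parents(i))
    return p_working(own, neighbour)
```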
Representing Exponentially Many Constraints
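Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$ for all $x, a$
The constraints can be rewritten step by step:
$\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$ for all $x, a$
$0 \ge \sum_i w_i \big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \big] + R_a(x)$ for all $x, a$
$0 \ge \max_x \big\{ \sum_i w_i \big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \big] + R_a(x) \big\}$ for all $a$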
Restricted Domain
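$0 \ge \max_x \big\{ \sum_i w_i \big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \big] + R_a(x) \big\}$
$\;\; = \max_x \big\{ \sum_i w_i \big[ \gamma\, g_i^a(x) - h_i(x) \big] + R_a(x) \big\}$
$\;\; = \max_x \big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \big\}$
The three kinds of terms:
1. Backprojection $g_i^a(x)$: depends on few variables
2. Basis function $h_i(x)$
3. Reward function $R_a(x)$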
Variable Elimination
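$\max_x \big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \big\}$
$= \max_{x_1, x_2, x_3, x_4} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + r_1(x_2, x_4) + r_2(x_3, x_4) \big]$
$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ] \big]$
$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + e_1(x_2, x_3) \big]$
where $e_1(x_2, x_3) = \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ]$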
- similar to Bayesian Networks
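A minimal sketch of this elimination step, checked against brute-force maximization. The tables for `f1`, `f2`, `r1`, `r2` and the weights are hypothetical; only the scopes of the functions follow the slide.

```python
from itertools import product

B = (False, True)
w1, w2 = 0.5, -1.0   # hypothetical weights

# Hypothetical local functions with the scopes used on the slide.
def f1(x1, x2): return 1.0 if x1 and x2 else 0.0
def f2(x1, x3): return 2.0 if x1 != x3 else 0.0
def r1(x2, x4): return 3.0 if x2 and not x4 else 1.0
def r2(x3, x4): return 2.0 if x3 == x4 else 0.0

def brute_force_max():
    """Maximize over all 2^4 joint assignments."""
    return max(w1 * f1(x1, x2) + w2 * f2(x1, x3) + r1(x2, x4) + r2(x3, x4)
               for x1, x2, x3, x4 in product(B, repeat=4))

def eliminated_max():
    """Eliminate x4 first: e1(x2, x3) = max_x4 [r1(x2, x4) + r2(x3, x4)]."""
    e1 = {(x2, x3): max(r1(x2, x4) + r2(x3, x4) for x4 in B)
          for x2, x3 in product(B, repeat=2)}
    return max(w1 * f1(x1, x2) + w2 * f2(x1, x3) + e1[(x2, x3)]
               for x1, x2, x3 in product(B, repeat=3))

print(brute_force_max(), eliminated_max())
```

Only the functions that mention $x_4$ take part in the inner maximization, so the cost depends on the size of their combined scope.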
Maximization as Linear Constraints
$e_1(x_2, x_3) = \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ]$
Equivalent to constraints:
$e_1(x_2, x_3) \ge r_1(x_2, x_4) + r_2(x_3, x_4)$, one linear constraint for each assignment to $x_2, x_3, x_4$
...
• Exponential in the size of each function’s domain, not the number of states
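The equivalence can be illustrated with a short sketch: enumerate one constraint per assignment to $(x_2, x_3, x_4)$ and observe that the smallest values of $e_1$ satisfying all of them are exactly the maxima. The tables `r1` and `r2` are hypothetical, and the "solver" here is a direct computation rather than an LP library.

```python
from itertools import product

B = (False, True)

# Hypothetical reward components with the scopes used on the slide.
def r1(x2, x4): return 3.0 if x2 and not x4 else 1.0
def r2(x3, x4): return 2.0 if x3 == x4 else 0.0

# One constraint e1(x2, x3) >= r1(x2, x4) + r2(x3, x4) per assignment,
# stored as ((x2, x3), right-hand side).
constraints = [((x2, x3), r1(x2, x4) + r2(x3, x4))
               for x2, x3, x4 in product(B, repeat=3)]

def tightest_e1(constraints):
    """Smallest e1 table that satisfies every lower-bound constraint."""
    e1 = {}
    for key, bound in constraints:
        e1[key] = max(e1.get(key, float("-inf")), bound)
    return e1

e1 = tightest_e1(constraints)
print(len(constraints))  # 8 constraints: 2^3 assignments to (x2, x3, x4)
```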
Factored LP: Scaling
Rule-based Representation
Approximate Value Function
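$V(x) = \sum_{j=1}^{k} w_j h_j(x) = \sum_{j=1}^{k} w_j h_j(x_1, x_2, x_3, x_4) = \sum_{j=1}^{k} \sum_{\mathrm{Rule}_i \in h_j} w_j\, \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$
[Figure: decision tree for h1 that tests x1, then x3, with leaves 0, 5 and 0.6]
h1 as rules (each rule fixes a truth value for the variables it lists):
Rule 1: condition on x1 only, value 0
Rule 2: condition on x1 and x3, value 5
Rule 3: condition on x1 and x3, value 0.6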
Notice: compact representation (2/4 variables, 3/16 rules)
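A minimal sketch of a rule-based basis function. The slide does not preserve which truth values each rule tests, so the polarities below (Rule 1 fires when x1 is false, Rule 2 when x1 and x3 are both true, Rule 3 when x1 is true and x3 is false) are an assumption, as are the data layout and function names.

```python
from itertools import product

# Each rule is (context, value); the context is a partial assignment.
# ASSUMED polarities: the transcript does not show which branch is which.
RULES_H1 = [
    ({"x1": False}, 0.0),
    ({"x1": True, "x3": True}, 5.0),
    ({"x1": True, "x3": False}, 0.6),
]

def consistent(context, x):
    """A rule applies when the state agrees with every variable in its context."""
    return all(x[var] == val for var, val in context.items())

def evaluate_rules(rules, x):
    """Sum the values of all rules whose context is consistent with x."""
    return sum(value for context, value in rules if consistent(context, x))

x = {"x1": True, "x2": False, "x3": True, "x4": False}
print(evaluate_rules(RULES_H1, x))  # 5.0 under the assumed polarities
```

Three rules over two variables cover all 16 joint states, matching the "2/4 variables, 3/16 rules" remark.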
Summing Over Rules
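$V(x) = \sum_{j=1}^{k} \sum_{\mathrm{Rule}_i \in h_j} w_j\, \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$
[Figure: h1(x) is a tree that tests x1, then x3, with leaves u1, u2, u3; h2(x) is a tree that tests x2, then x1, with leaves u4, u5, u6. Their sum is a single tree over x2, x1 and x3 with leaves u1+u4, u2+u4, u3+u4, u5+u1, u2+u6, u3+u6.]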
Multiplying over Rules
• Analogous construction
$0 \ge \max_x \big\{ \sum_i w_i \big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \big] + R_a(x) \big\}$ for all $a$
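Rule-based Maximization
$0 \ge \max_x \big\{ \sum_i w_i \big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \big] + R_a(x) \big\}$ for all $a$
[Figure: a tree that tests x1, then x2, then x3, with leaves u1, u2, u3, u4. Eliminating x2 gives a tree over x1 and x3 with leaves u1, max(u2,u3), max(u2,u4).]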
Rule-based Linear Program
• Backprojection, objective function – handled in a similar way
• All the operations (summation, multiplication, maximization) – keep rule representation intact
• $\sum_{\mathrm{Rule}_i \in h_j} w_j\, \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$ is a linear function
Conclusions
• Compact representation can be exploited to solve MDPs with exponentially many states efficiently.
• Still NP-complete in the worst case.
• Factored solution may increase the size of the LP when the number of states is small (but it scales better).
• Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.