LEARNING PROBABILISTIC HIERARCHICAL TASK NETWORKS TO CAPTURE USER PREFERENCES
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281
[email protected], [email protected], [email protected]
Thanks to William Cushing
A riddle for you:
What is the magic idea in planning that is at once more efficient in practice and yet of higher worst-case complexity than vanilla planning?
TWO TALES OF HTN PLANNING
Tale 1 (abstraction): efficiency, top-down. Learning: most prior work.
Tale 2 (preference handling): quality, bottom-up. Learning: our work.
Hitchhike? No way!
Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (observed 2 times)
Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) (observed 8 times)
Phike: Hitchhike(source, dest) (observed 0 times)
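The counts above directly give a maximum-likelihood estimate of the user's preference distribution. A minimal sketch (the plan names are shorthand for the three plans above):

```python
# Observed plan counts from the slide: bus 2, train 8, hitchhike 0.
counts = {"bus": 2, "train": 8, "hike": 0}

total = sum(counts.values())
# Maximum-likelihood preference distribution: normalize the counts.
prefs = {plan: n / total for plan, n in counts.items()}

print(prefs)  # {'bus': 0.2, 'train': 0.8, 'hike': 0.0}
```

These normalized counts are exactly the selection probabilities that show up on the S schemas of the pHTN later in the talk.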
LEARNING USER PLAN PREFERENCES
LEARNING USER PREFERENCES AS PHTNS
Given a set O of plans executed by the user, find the generative model Hl that best explains them:

Hl = argmax_H p(O | H)
Probabilistic Hierarchical Task Networks(pHTNs)
S -> A1 B1 (0.2)
S -> A2 B2 (0.8)
B1 -> A2 A3 (1.0)
B2 -> A1 A3 (1.0)
A1 -> Getin (1.0)
A2 -> Buyticket (1.0)
A3 -> Getout (1.0)
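A pHTN like this is a generative model, so plans can be sampled from it by expanding S top-down. A minimal sketch (the rule encoding and sampler are illustrative assumptions, not the paper's implementation):

```python
import random

# pHTN schemas from the slide: head -> list of (probability, body).
# Primitive actions are any symbols with no schema entry.
rules = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample(symbol):
    """Recursively expand a task symbol into a primitive action sequence."""
    if symbol not in rules:          # primitive action, emit as-is
        return [symbol]
    probs, bodies = zip(*rules[symbol])
    body = random.choices(bodies, weights=probs)[0]
    return [a for s in body for a in sample(s)]

plan = sample("S")
# With probability 0.2 this is the bus plan (Getin, Buyticket, Getout),
# with probability 0.8 the train plan (Buyticket, Getin, Getout).
```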
LEARNING pHTNs
HTNs can be seen as providing a grammar of desired solutions:
Actions <-> Words
Plans <-> Sentences
HTNs <-> Grammar
HTN learning <-> Grammar induction
pHTN learning by probabilistic context-free grammar (pCFG) induction. Assumptions: parameter-less actions, unconditional schemas.
S -> A1 B1 (0.2)
S -> A2 B2 (0.8)
B1 -> A2 A3 (1.0)
B2 -> A1 A3 (1.0)
A1 -> Getin (1.0)
A2 -> Buyticket (1.0)
A3 -> Getout (1.0)
A TWO-STEP ALGORITHM
• Greedy Structure Hypothesizer: hypothesizes the schema structure.
• Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas.
Generalizes the inside-outside algorithm (Lari & Young, 1990).
GREEDY STRUCTURE HYPOTHESIZER
Structure learning is bottom-up and prefers recursive schemas to non-recursive ones.
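One plausible bottom-up sketch of this idea (the pair-frequency heuristic and all names are assumptions, and the recursive-schema preference is omitted): repeatedly find the most frequent adjacent action pair across the observed plans and introduce a new schema covering it.

```python
from collections import Counter

def most_frequent_pair(plans):
    """Count adjacent symbol pairs across all plans; the most frequent
    pair is the bottom-up cue for which schema to hypothesize next."""
    pairs = Counter()
    for plan in plans:
        for a, b in zip(plan, plan[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def introduce_schema(plans, pair, name):
    """Replace every occurrence of the pair with a new non-primitive
    task symbol, shrinking the plans for the next iteration."""
    a, b = pair
    new_plans = []
    for plan in plans:
        out, i = [], 0
        while i < len(plan):
            if i + 1 < len(plan) and plan[i] == a and plan[i + 1] == b:
                out.append(name)
                i += 2
            else:
                out.append(plan[i])
                i += 1
        new_plans.append(out)
    return new_plans

# 8 train-style plans and 2 bus-style plans, as in the running example.
plans = [["Buyticket", "Getin", "Getout"]] * 8 + \
        [["Getin", "Buyticket", "Getout"]] * 2
pair = most_frequent_pair(plans)
reduced = introduce_schema(plans, pair, "B1")
```

Iterating this until every plan reduces to a single symbol yields a candidate schema structure, whose probabilities the EM phase then refines.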
EM PHASE
E step: compute the most probable parse tree for each plan.
M step: update the selection probabilities p of the schemas s: ai -> aj ak.
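A minimal sketch of the M-step update implied above: each schema's selection probability is re-estimated from how often the schema is used in the E-step parse trees, normalized over all schemas sharing the same head task (the counting interface here is an assumption for illustration):

```python
from collections import defaultdict

def m_step(usage_counts):
    """usage_counts: {(head, body): count} gathered from the most
    probable parse trees in the E step. Returns updated selection
    probabilities, normalized per head task."""
    head_totals = defaultdict(float)
    for (head, _), n in usage_counts.items():
        head_totals[head] += n
    return {(head, body): n / head_totals[head]
            for (head, body), n in usage_counts.items()}

# In the running example, S -> A1 B1 parses 2 plans and S -> A2 B2
# parses 8, so the updated probabilities are 0.2 and 0.8.
counts = {("S", ("A1", "B1")): 2, ("S", ("A2", "B2")): 8}
probs = m_step(counts)
```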
EVALUATION
Ideal: user studies (too hard). Our approach:
Assume H* represents the user's preferences.
Generate observed plans O using H* (H* -> O).
Learn Hl from O (O -> Hl).
Compare H* and Hl via their plan distributions (H* -> T*, Hl -> Tl).
Syntactic similarity is not important; only the distributions matter.
Use the KL divergence between T* and Tl, which measures the distance between the two distributions.
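A minimal sketch of this comparison step, assuming both distributions are given explicitly over the same set of plans (the learned numbers below are hypothetical):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a shared support of plans; assumes q[x] > 0
    wherever p[x] > 0."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# True plan distribution T* induced by H*, and a hypothetical
# learned distribution Tl induced by Hl.
t_star = {"bus": 0.2, "train": 0.8}
t_learned = {"bus": 0.25, "train": 0.75}

d = kl_divergence(t_star, t_learned)
# d is non-negative and equals 0 only when the distributions match,
# so smaller values mean Hl better captures the user's preferences.
```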
Domains: randomly generated, Logistics Planning, Gold Miner.
Pipeline: H* -> P1, P2, ..., Pn -> Learner -> Hl
RATE OF LEARNING AND CONCISENESS
Rate of learning: the more training plans, the better the learned schemas.
Conciseness:
• In small domains, the learned pHTN has only 1 or 2 more non-primitive actions than H*.
• In large domains, it has many more non-primitive actions. Refine structure learning?
Randomly Generated Domains
EFFECTIVENESS OF EM
• Comparing the greedy (pre-EM) schemas with the final learned schemas shows that the EM phase is very effective in capturing user preferences.
Randomly Generated Domains
“BENCHMARK” DOMAINS
Logistics Planning
H*: move by plane or truck; prefer plane; prefer fewer steps.
KL divergence: 0.04. Recovers plane > truck and fewer steps > more steps.

Gold Miner
H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall.
KL divergence: 0.52. Reproduces the basic strategy.
CONCLUSIONS & EXTENSIONS
Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions.
Evaluate predictive power: compare distributions rather than structure.
Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car.
Learning user plan preferences obfuscated by feasibility constraints. ICAPS'09.