Connecting Learning and Logic
Eyal Amir, Cambridge Presentation, May 2006
U. of Illinois, Urbana-Champaign
Joint work with: Dafna Shahaf, Allen Chang
Slide 2: Problem: Learn Actions' Effects
• Given: a sequence of observations over time
  - Example: Action a was executed
  - Example: State feature f has value T
• Want: an estimate of the actions' effect model
  - Example: a is executable if the state satisfies some property
  - Example: under condition _, a has effect _
Slide 3: Example: Light Switch

Time  Action  Observe (after action)
              Posn.  Bulb  Switch
0     -       E      -     ~up
1     go-W    ~E     ~on   -
2     sw-up   ~E     ~on   -       FAIL
3     go-E    E      -     ~up
4     sw-up   E      -     up
5     go-W    ~E     on    -
Slide 5: Example: Light Switch
State 1: ~up ^ ~on ^ E        State 2: up ^ on ^ E
• Flipping the switch changes the world state
• We do not observe the state fully
[Diagram: two rooms (west, east); the switch is ~up and the bulb ~on in state 1, up and on in state 2]
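As a concrete illustration, the light-switch dynamics above can be simulated in a few lines of Python. This is a hypothetical model written for this transcript, not the authors' code; the feature names E, up, on and the failure rule for sw-up follow the table on slide 3.

```python
# Hypothetical simulator for the light-switch domain of the slides.
# Features: E (agent in east room), up (switch up), on (bulb on).
# Assumed dynamics: go-E/go-W move the agent; sw-up works only in the
# east room and pushes the switch up, turning the bulb on.

def step(state, action):
    """Apply an action; return (new_state, failed)."""
    s = dict(state)
    if action == "go-W":
        s["E"] = False
    elif action == "go-E":
        s["E"] = True
    elif action == "sw-up":
        if not s["E"]:          # precondition: must be in the east room
            return state, True  # action FAILs, state unchanged
        s["up"] = True
        s["on"] = True
    return s, False

def observe(state, visible):
    """Partial observability: only some features are seen."""
    return {f: state[f] for f in visible}

s0 = {"E": True, "up": False, "on": False}
s1, fail1 = step(s0, "go-W")
s2, fail2 = step(s1, "sw-up")   # fails: the agent is in the west room
```

Replaying the whole table from slide 3 through `step` reproduces the FAIL at time 2 and the observed up/on values at times 4 and 5.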
Slide 6: Motivation: Exploration Agents
• Exploring partially observable domains
  - Interfaces to new software
  - Game-playing/companion agents
  - Robots exploring buildings, cities, planets
  - Agents acting in the WWW
• Difficulties:
  - No knowledge of actions' effects a priori
  - Many features
  - Partially observable domain
Slide 7: Rest of This Talk
1. Actions in partially observed domains
2. Efficient learning algorithms
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 8: Learning Transition Models
• Learning: update knowledge of the transition relation and the state of the world
[Diagram: world states s1-s4 evolve under actions a1-a4 via the transition relation; knowledge states k1-k4 track them via transition knowledge]
Slide 9: Action Model: <State, Transition> Set
Problem: n world features → 2^(2^n) transitions
[Diagram: a belief as a set of <state, transition> pairs; states 1-3 paired with candidate transition relations T1, T2, T3]
Slide 10: Rest of This Talk
1. Actions in partially observed domains
2. Efficient algorithms
   1. Updating a Directed Acyclic Graph (DAG)
   2. Factored update (flat formula representation)
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 11: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
Slide 12: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
• Actions = propositional symbols that assert effect rules
  - "sw-up causes on ^ up if E"
  - "go-W keeps up" (= "go-W causes up if up" …)
  - Propositional symbols: go-W^{up|up}, sw-up^{on|E}, sw-up^{up|E}
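The effect-rule propositions can be generated mechanically. The sketch below is a hypothetical naming scheme (the `causes`/`keeps` helpers and the `a^f|g` string format are our own, not the paper's notation), showing how each rule "a causes f if g" collapses to a single propositional symbol:

```python
# Hypothetical encoding of effect rules as propositional atoms, in the
# spirit of the slides: the rule "a causes f if g" becomes one symbol,
# written here as the string "a^f|g".

def causes(action, effect, cond=None):
    """Atom asserting: `action` causes `effect` (if `cond` holds)."""
    return f"{action}^{effect}|{cond}" if cond else f"{action}^{effect}"

def keeps(action, fluent):
    """'a keeps f' abbreviates 'a causes f if f'."""
    return causes(action, fluent, fluent)

# The two rules from the slide:
r1 = causes("sw-up", "on^up", "E")   # "sw-up causes on ^ up if E"
r2 = keeps("go-W", "up")             # "go-W keeps up"
```

A transition belief state can then constrain these atoms with an ordinary propositional formula, which is what makes the encoding compact when it works.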
Slide 13: Updating the Status of "Locked", Time 0
[Diagram: candidate transition rules tr1 ("PressB causes ¬locked if locked") and tr2 ("PressB causes locked if ¬locked"); expl(0) points to init_locked]
Slide 14: Updating the Status of "Locked", Time t
[Diagram: as on slide 13; expl(t) now explains "locked" in terms of tr1, tr2, and expl(0)]
Slide 15: Updating the Status of "Locked", Time t+1
[Diagram: expl(t+1) is a disjunction of two cases, each pointing back to expl(t): "locked" holds because PressB caused it, or "locked" holds because PressB did not change it]
Slide 16: Algorithm: Update of a DAG
1. Given: action a, observation o, transition-belief formula φ_t
2. For each fluent f:
   a. kb := kb ∧ (logical formula "a is executable")
   b. expl'_f := logical formula for the possible explanations of f's value after action a
   c. Replace every fluent g in expl'_f with a pointer to expl_g
   d. Update expl_f := expl'_f
3. φ_{t+1} is the result of step 2 together with o
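A minimal sketch of the pointer idea behind this algorithm (not the authors' implementation; it keeps only the "a caused f, or a kept f" explanation and ignores executability and observations). Because each new explanation node points at the old nodes instead of copying them, one update step adds a constant number of nodes per fluent, which is why the structure stays a compact DAG:

```python
# Sketch of pointer-based DAG update. Each fluent f keeps an explanation
# node expl_f; a step builds a new node referencing the *old* nodes by
# pointer (no deep copy), so one update is O(1) per fluent regardless of
# history length.

class Node:
    """A node in the explanation DAG: an operator over child pointers."""
    def __init__(self, op, children):
        self.op = op                # "lit", "and", "or"
        self.children = children    # Node pointers, or a literal string

def update_step(expl, action, fluents):
    """One DAG-update step: new explanations point at old ones."""
    old = dict(expl)                # snapshot of pointers, not of subtrees
    new = {}
    for f in fluents:
        caused = Node("lit", f"{action}^{f}")                     # a caused f
        kept = Node("and", [Node("lit", f"{action}^{f}|{f}"), old[f]])  # a kept f
        new[f] = Node("or", [caused, kept])
    return new

def size(node, seen=None):
    """DAG size: shared nodes are counted once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    n = 1
    if node.op != "lit":
        for c in node.children:
            n += size(c, seen)
    return n

expl = {"locked": Node("lit", "init_locked")}
for t in range(3):
    expl = update_step(expl, "PressB", ["locked"])
```

Each step adds exactly four nodes for the single fluent here, so after T steps the DAG has 1 + 4T nodes: linear growth, matching the O(T·n^k + |φ_0|) bound on the next slide.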
Slide 17: Fast Update: DAG Action Model
• The DAG-update algorithm takes constant time (using a hash table) to update the formula
• The algorithm is exact
• The resulting DAG has size O(T·n^k + |φ_0|)
  - T steps, n features, k features in action preconditions
  - Still only n features/variables
• Use φ_t with a DAG-DPLL SAT solver
Slide 18: Experiments: DAG Update
[Chart, "DAG SLAF: Time for Step": time per step (msec, 0-1.2) vs. number of steps (0-5000), for 19, 55, and 109 fluents]
Slide 19: Experiments: DAG Update
[Chart, "DAG SLAF: Formula Size": formula size (nodes/10^4, 0-1000) vs. number of steps (0-5000), for 19, 55, and 109 fluents]
Slide 20: Experiments: DAG Queries
Inference on the DAG
[Chart: time (sec/1000, 0-2000) vs. number of steps (0-4000), for the queries: Find a Model, False Rule, True Rule, False Fluent, True Fluent]
Slide 21: Rest of This Talk
1. Actions in partially observed domains
2. Efficient algorithms
   1. Updating a Directed Acyclic Graph (DAG)
   2. Factored update (flat formula representation)
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 22: Distribution for Some Actions
Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
(base case: Project[a](TRUE))
• Compute the update for the literals in the formula separately, and combine the results
• Known success/failure
• 1:1 actions
Slide 23: Actions That Map States 1:1
• Reason for distribution over ∨:
  Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
[Diagram: a 1:1 action maps distinct states to distinct states; a non-1:1 action merges states, breaking the equivalence]
Slide 24: Algorithm: Factored Learning
• Given: action a, observation o, transition-belief formula φ_t
1. Precompute the update for every literal
2. Decompose φ_t recursively, update every literal separately, and combine the results
3. Conjoin the result of step 2 with o, producing φ_{t+1}
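The decomposition in step 2 can be sketched as a recursion over the formula tree. Here the `bump` literal update is a stand-in for the real precomputed per-literal projection (a hypothetical placeholder, not the paper's operator); the point is that the connective structure is preserved, so the whole pass costs O(|φ_t|):

```python
# Sketch of the factored update: precompute the update of every literal,
# then distribute over the connectives. Formulas are nested tuples:
# ("and", lhs, rhs), ("or", lhs, rhs), or a literal string.

def factored_update(formula, literal_update):
    """Distribute a per-literal update over and/or structure."""
    if isinstance(formula, str):                    # a literal
        return literal_update(formula)
    op, lhs, rhs = formula
    return (op,
            factored_update(lhs, literal_update),
            factored_update(rhs, literal_update))

# Toy literal update: time-stamp each literal with a prime.
bump = lambda lit: lit + "'"
phi = ("and", ("or", "up", "on"), "E")
phi_next = factored_update(phi, bump)
```

For 1:1 actions the distribution over both connectives is exact (slide 23); otherwise the combined result may admit extra models, which is the approximation discussed on the next slide.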
Slide 25: Fast Update of Action Model
• The Factored Learning algorithm takes time O(|φ_t|) to update the formula
• The algorithm is exact when
  - we know that actions are 1:1 mappings between states, or
  - actions' effects are always the same
• Otherwise, the result is approximate: it includes the exact action model, but also others
• The resulting representation is flat (CNF)
Slide 26: Compact Flat Representation: How?
• Keep some property of φ invariant, e.g.,
  - k-CNF (CNF with k literals per clause)
  - a bounded number of clauses
• Factored Learning yields a compact representation if
  - we know whether the action succeeded, or
  - action failure leaves the affected propositions in a specified nondeterministic state, or
  - approximate: we discard large clauses (allows more states)
Slide 27: Compact Representation in CNF
• If an action affects and depends on ≤ k features: |φ_{t+1}| ≤ |φ_t| · n^{k(k+1)}
• If actions always have the same effect: |φ_{t+1}| ≤ O(t·n)
  - If, in addition, every feature is observed at least every k steps: |φ_{t+1}| ≤ O(n^{k+1})
  - If (instead) the number of actions is ≤ k: |φ_{t+1}| ≤ O(n·2^{k log k})
Slide 28: Experiments: Factored Learning
[Chart, "Time per Step": time per step (millisec, 0-10) vs. number of steps]
Slide 29: Summary
• Learning of effects and preconditions of actions in partially observable domains
• Shown in this talk:
  - Exact DAG update for any action
  - Exact CNF update, if actions are 1:1 or without conditional effects
  - The model can be updated efficiently without an increase in the number of variables in the belief state
  - Compact representation
• Adventure games, virtual worlds, robots
Slide 30: Innovation in This Research
• First scalable learning algorithm for partially observable dynamic domains
• Algorithm (DAG)
  - Always exact and optimal
  - Takes constant update time
• Algorithm (Factored)
  - Exact for actions that always have the same effect
  - Takes polynomial update time
• Can solve problems with n > 1000 domain features (> 2^1000 states)
Slide 31: Current Approaches and Work
• Reinforcement learning & HMMs
  - [Chrisman '92], [McCallum '95], [Boyen & Koller '98], [Murphy et al. '00], [Kearns, Mansour, Ng '00]
  - Maintain a probability distribution over the current state
  - Problem: exact solutions are intractable for domains of high (>100) dimensionality
  - Problem: approximate solutions have unbounded errors, or make strong mixing assumptions
• Learning AI-planning operators
  - [Wang '95], [Benson '95], [Pasula et al. '04], …
  - Problem: assume a fully observable domain
Slide 32: Open Problems
• Efficient inference with the learned formula
• Compact, efficient stochastic learning
• Average case of formula size?
• Dynamic observation models; filtering in expanding worlds
• Software: http://www.cs.uiuc.edu/~eyal
Slide 33: Acknowledgements
• Dafna Shahaf
• Megan Nance
• Brian Hlubocky
• Allen Chang
• … and the rest of my incredible group of students
Slide 34: THE END
Slide 35: Talk Outline
1. Actions in partially observed domains
2. Representation and update of models
3. Efficient algorithms
4. Related work & conclusions
Slide 36: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
Slide 37: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
• Actions = propositional symbols that assert effect rules
  - "sw-up causes on ^ up if E"
  - "go-W keeps up" (= "go-W causes up if up" …)
  - Propositional symbols: go-W^{up|up}, sw-up^{on|E}, sw-up^{up|E}
Slide 38: Example: Light Switch
• Initial belief state (time 0) = set of pairs:
  { <E,~on,~up>, <E,on,~up> } × all transition relations
  Space = O(2^(2^n))
• New encoding: E ^ ~up
  Space = 2
• Question: how do we update the new representation?
Slide 39: Updating Action Model
• The transition belief state is represented by φ
• Action-Definition(a)_{t,t+1} =
  ⋀_f [ (a_t ∧ (a^f ∨ (a^{f|f} ∧ f_t)) → f_{t+1})
      ∧ (a_t ∧ f_{t+1} → (a^f ∨ (a^{f|f} ∧ f_t))) ]
  (effect axioms + explanation closure)
• Update: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Slide 40: Example Update: Light Switch
• Transition belief state: φ_t = E_t ∧ ~up_t
• Project[sw-on](φ_t) =
  (E_{t+1} ↔ (sw-on^E ∨ (sw-on^{E|E} ∧ E_t)))
  ∧ (~up_{t+1} ↔ (sw-on^{~up} ∨ (sw-on^{~up|~up} ∧ ~up_t)))
  ∧ …
• Update: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Slide 41: Updating Action Model
• The transition belief state is represented by φ
• φ_{t+1} = Update[o](Project[a](φ_t))
• Actions: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
• Observations: Update[o](φ) = φ ∧ o
Theorem: formula filtering is equivalent to the <transition, state>-set semantics
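The set semantics the theorem refers to can be sanity-checked on tiny domains with a brute-force reference implementation (hypothetical code, exponential in the number of features, which is exactly what the formula representation avoids). Here a belief is an explicit set of <transition-function, state> pairs; Project pushes each pair through its own transition function, and Update filters by the observation:

```python
# Brute-force reference semantics for filtering, over explicit
# <transition-function, state> pairs (tiny domains only).
# A transition function is stored as a hashable tuple of
# ((action, state), next_state) entries.

def project(pairs, action):
    """Progress each <transition, state> pair through `action`."""
    return {(tr, dict(tr)[(action, s)]) for tr, s in pairs}

def update(pairs, obs):
    """Keep only pairs whose state satisfies the observation predicate."""
    return {(tr, s) for tr, s in pairs if obs(s)}

# Two candidate transition relations for a single boolean feature `on`:
toggle = tuple(sorted({("press", s): (not s) for s in (True, False)}.items()))
noop   = tuple(sorted({("press", s): s       for s in (True, False)}.items()))

# Initially: unknown model, known state on=False.
belief = {(tr, False) for tr in (toggle, noop)}
belief = project(belief, "press")      # act
belief = update(belief, lambda s: s)   # observe on=True
models = {tr for tr, s in belief}      # only the toggling model survives
```

One act-observe cycle already eliminates the no-op model, mirroring how the formula update prunes effect propositions.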
Slide 42: Larger Picture: An Exploration Agent
[Architecture diagram: World Model, Interface Module, Learning Module, Filtering Module, Decision-Making Module, Knowledge Base, Commonsense extraction]
Slide 43: Example: Light Switch
• Initial belief state (time 0) = set of pairs:
  { <E,~on,~up>, <E,on,~up> } × all transition relations
• Apply action a = go-W
• Resulting belief state (after the action):
  { <E,~on,~up> } × { transitions that map to the same state }
  ∪ { <E,on,~up> } × { transitions that map to the same state }
  ∪ { <~E,~on,~up> } × { transitions that set the position to ~E }
  ∪ …
Slide 44: Example: Light Switch
• Resulting belief state (after the action):
  { <E,~on,~up> } × { transitions that map to the same state }
  ∪ { <E,on,~up> } × { transitions that map to the same state }
  ∪ { <~E,~on,~up> } × { transitions that set the position to ~E }
  ∪ …
• Observe: ~E, ~on
Slide 45: Experiments w/DAG-Update
[Chart, "DAG SLAF: Time to Find a Model": total time (msec, 0-900) vs. number of steps (0-6000)]
Slide 46: Some Learned Rules
• Pickup(b1) causes Holding(b1)
• Stack(b3,b5) causes On(b3,b5)
• Pickup() does not cause Arm-Empty
• Move(room1,room4) causes At(book5,room4) if In-Briefcase(book5)
• Move(room1,room4) does not cause At(book5,room4) if ¬In-Briefcase(book5)
Slide 47: Approximate Learning
• The result of Factored-Learning(φ_t) always includes the exact action model
• The same compactness results apply
• Approximation decreases size: discard clauses longer than k (allows more action models), giving |φ_t| = O(n^k)
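The clause-discarding approximation itself is a one-liner (a sketch, not the authors' code). Dropping a clause only weakens a CNF formula, so every model of the original survives; in particular the exact action model stays among those represented, and at most O(n^k) distinct clauses over ≤ k of the n atoms can remain:

```python
# Sketch of the approximation from the slide: after each update, discard
# CNF clauses with more than k literals. A CNF formula is a list of
# clauses; a clause is a list of literal strings ("-x" for negation).

def truncate_cnf(clauses, k):
    """Keep only clauses with at most k literals (a sound weakening)."""
    return [c for c in clauses if len(c) <= k]

cnf = [["a"], ["b", "-c"], ["a", "b", "c", "-d"]]
small = truncate_cnf(cnf, 2)   # drops the 4-literal clause
```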
Slide 48: More in the Paper
• An algorithm that uses deduction for exact updating of the model representation, always
• Arbitrary preconditions and conditional effects
• Formal justification of the algorithms and complexity results
Slide 49: Experiments
[Chart, "Time per Step": time per step (millisec, 0-10) vs. number of steps]
DAG-SLAF: The Algorithm
Input: a formula φ, an action-observation sequence <a_i, o_i>, i = 1..t
Initialize: for each fluent f, expl_f := init_f
  kb := φ, where each f is replaced by init_f
Process the sequence: for i = 1..t do Update-Belief(a_i, o_i)
Return kb ∧ base ∧ ⋀_f (f ↔ expl_f)
Slide 52: Current Game + Translation
• LambdaMOO
  - MUD code base
  - Uses a database to store the game world
  - Emphasis on player-world interaction
  - Powerful in-game programming language
• The game sends agents a logical description of the world
Slide 56: Present Contribution
• Identify (useful) sufficient conditions for efficient computation of action models
  - Actions map states 1:1
  - Deterministic actions with limited effect
• Polynomial-time algorithms for exact update of the action model
• (Useful) sufficient conditions for compact representation of the action model
Slide 58: Effect of Concept Expansion
[Chart: relations (0-1400) vs. concepts (1-10); series: Total Relations, Relations per retrieved concept]
Slide 59: Text Translation to Logic
• Difficulties:
  - Counterintuitive LambdaMOO actions
  - Enumerating observations for an action's result
1. Create a small game world
2. Define the predicates needed to describe the world
3. Define STRIPS-like actions required to interact with the game world
4. The MUD outputs logical descriptions of the world
Slide 60: Map Propositions to Semantics
• E.g., assume that every action a is non-failing, deterministic, and non-conditional
  - For every domain description D,
    T_D = Rules_D ∧ ⋀_f (a^{≈f} ∨ a^f ∨ a^{-f})
  - Rules_D = { a^{f|g} | "a causes f if g" ∈ D }
Slide 61: Mapping Theory to Semantics
• Every set of state-transition pairs is represented using a logical formula
• Theorem: every consistent propositional theory maps to a set of <transition, state> pairs, and vice versa
• We have formulas for deterministic actions:
  - Conditional effects, sometimes inexecutable
  - Non-conditional, sometimes inexecutable
  - Non-conditional, always executable
Slide 62: Distribution Properties
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
• Filtering a DNF belief state by factoring
Slide 63: Distribution Properties
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
(base case: Project[a](TRUE))
• Filtering a DNF belief state by factoring
Slide 64: Knowledge About Actions
• Goal 1: conclude that one of sw-up, go-W, go-E causes on
• Goal 2: show that sw-up is possible only when E is true (with some assumptions)
Slide 65: Challenges
(disregarding NLP for now)
• Partially observable world
  - E.g., cannot see the light bulb in the east room
• Incomplete knowledge about the state of the world and the effects of actions
  - E.g., we do not know the effects/preconditions of flipping the switch
• …
Slide 67: Relating Effect Propositions
• Action-Definition(a)_{t,t+1} =
  ⋀_f [ (a_t ∧ (a^f ∨ (a^{f|f} ∧ f_t)) → f_{t+1})
      ∧ (a_t ∧ f_{t+1} → (a^f ∨ (a^{f|f} ∧ f_t))) ]
Slide 69: Example: Moving Items
• From here on: assume actions are non-conditional and always succeed
• Initial belief state: the set of { at(r,closet,room) } × all transition relations
• Apply action a = move(r,closet,room) at time t
• Resulting belief state:
  { at(r,closet,room) } × { a^{at(r,closet,room)|at(r,closet,room)}, … }
  ∪ { -at(r,closet,room) } × { a^{at(r,closet,room)|-at(r,closet,room)}, … }
  ∪ …
Example: Moving Items
• Initial belief state:
  { at(r,closet,room) } x { a^{at(r,closet,room)}_{at(r,closet,room)}, … }
  U { -at(r,closet,room) } x { a^{-at(r,closet,room)}_{at(r,closet,room)}, … }
  U …
• Apply observation -at(r,closet,room)_t
• Resulting belief state:
  { -at(r,closet,room) } x { a^{-at(r,closet,room)}_{at(r,closet,room)}, … }
  U …
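The pair representation on these two slides can be run directly at toy scale. A sketch (my own enumeration, not the paper's data structure; the three model names are hypothetical stand-ins for transition relations, and only unconditional candidate models are enumerated) over the single fluent at(r,closet,room):

```python
from itertools import product

# A belief state is a set of (state, model) pairs: the state gives the truth
# value of the fluent "at" = at(r,closet,room), and the model says what the
# action move(r,closet,room) does to that fluent.
FLUENT_VALUES = (True, False)
MODELS = ("causes_at", "causes_not_at", "keeps_at")  # candidate effect models

def apply_model(at, model):
    """Successor value of the fluent under one candidate effect model."""
    if model == "causes_at":
        return True
    if model == "causes_not_at":
        return False
    return at  # "keeps_at": frame model, the value persists

def progress_action(belief):
    """Apply the action: advance the state of each (state, model) pair."""
    return {(apply_model(at, m), m) for (at, m) in belief}

def progress_observation(belief, observed_at):
    """Keep only pairs whose current state agrees with the observation."""
    return {(at, m) for (at, m) in belief if at == observed_at}

# Initially, every combination of state and effect model is possible.
belief = set(product(FLUENT_VALUES, MODELS))
belief = progress_action(belief)              # as on the first slide
belief = progress_observation(belief, False)  # observe -at(r,closet,room)
```

After the observation, only pairs whose model produces -at survive: the "causes_not_at" model (from any prior state) and the "keeps_at" model paired with a prior -at state.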
Filtering Stochastic Processes
• Dynamic Bayes Nets (DBNs): factored representation
[figure: DBN over state variables s1..s5 unrolled across time slices; filtering sums out the variables one at a time]
Filtering Stochastic Processes
• Dynamic Bayes Nets (DBNs): factored representation: O(2^n) space, O(2^2n) time
• Kalman Filter: Gaussian belief state and linear transition model: O(n^2) space, O(n^3) time
[figure: Kalman-filter DBN unrolled over time]
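The reason the Kalman filter is cheap is that the Gaussian belief state is closed under a linear transition and observation, so filtering only updates a mean and a variance. A one-dimensional sketch (illustrative only; the quoted O(n^2) space / O(n^3) time come from the matrix form of these same updates):

```python
def kalman_step(mean, var, a, q, z, r):
    """One predict/update step of a 1-D Kalman filter.

    Transition:  x' = a*x + noise with variance q
    Observation: z  = x' + noise with variance r
    The belief stays Gaussian, so (mean, var) is the entire belief state.
    """
    # Predict through the linear transition model.
    mean_p = a * mean
    var_p = a * a * var + q
    # Update on the observation via the Kalman gain.
    k = var_p / (var_p + r)
    mean_u = mean_p + k * (z - mean_p)
    var_u = (1.0 - k) * var_p
    return mean_u, var_u

# Example: prior N(0, 1), identity transition, observe z = 1 with noise 1.
m, v = kalman_step(0.0, 1.0, 1.0, 0.0, 1.0, 1.0)
print(m, v)  # posterior mean moves halfway to z; variance shrinks
```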
Example: Moving Items
• Initial belief state:
  set of all world states
• Apply action
  move(r,closet,room)_t
• Resulting belief state:
  all states that satisfy ~in(r,closet)_{t+1}
• Reason:
  – If initially in(r,closet), then now ~in(r,closet)
  – If initially ~in(r,closet), then still ~in(r,closet)
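This step can be reproduced by brute-force enumeration: filter a set of states through the action's transition function. A sketch (my own illustration, not the paper's algorithm; the ASCII fluent names stand in for in(r,closet), in(r,room), and locked(closet), and the action is assumed non-conditional and always succeeding, as above):

```python
from itertools import product

FLUENTS = ("in_r_closet", "in_r_room", "locked")

def move_r_closet_room(state):
    """Effects: in(r,room) ^ ~in(r,closet); locked(closet) is untouched."""
    s = dict(state)
    s["in_r_closet"] = False
    s["in_r_room"] = True
    return tuple(sorted(s.items()))

def filter_action(belief, act):
    """Filter[a](belief) = images of all possible states under the action."""
    return {act(s) for s in belief}

# Initial belief state: the set of all world states over the three fluents.
all_states = {
    tuple(sorted(zip(FLUENTS, vals)))
    for vals in product([True, False], repeat=len(FLUENTS))
}
after = filter_action(all_states, move_r_closet_room)
```

Every state in `after` satisfies ~in(r,closet), matching the slide's conclusion regardless of which of the eight states the world started in.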
Example: Filtering a Literal
• Initial knowledge (time t):
  in(r,closet)
• Apply move(r,closet,room)
  Preconds: in(r,closet) ^ ~locked(closet)
  Effects: in(r,room) ^ ~in(r,closet)
• Resulting knowledge (time t+1):
  (in(r,room) ^ ~in(r,closet))
  v (locked(closet) ^ in(r,closet))
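The disjunction in the result is exactly a case split on whether the precondition held. A small enumeration check (my own illustration, with the same ASCII fluent names; here the action is modeled as a no-op when its precondition fails, which is one reading consistent with the slide's result):

```python
from itertools import product

def move_conditional(state):
    """move(r,closet,room): succeeds iff in(r,closet) ^ ~locked(closet)."""
    s = dict(state)
    if s["in_r_closet"] and not s["locked"]:
        s["in_r_closet"] = False
        s["in_r_room"] = True
    return tuple(sorted(s.items()))

fluents = ("in_r_closet", "in_r_room", "locked")
states = [dict(zip(fluents, v)) for v in product([True, False], repeat=3)]

# Initial knowledge: in(r,closet) -- keep only states satisfying it.
belief_t = {tuple(sorted(s.items())) for s in states if s["in_r_closet"]}
belief_t1 = {move_conditional(s) for s in belief_t}

def satisfies_result(state):
    """(in(r,room) ^ ~in(r,closet)) v (locked(closet) ^ in(r,closet))"""
    s = dict(state)
    return (s["in_r_room"] and not s["in_r_closet"]) or (
        s["locked"] and s["in_r_closet"]
    )
```

Every state in the filtered belief satisfies the slide's resulting formula: either the move went through, or the closet was locked and r stayed put.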
Example: Filtering a Formula
• Initial knowledge (time t):
  in(broom,closet) v locked(closet)
• Apply move(r,closet,room)
  Preconds: in(r,closet) ^ ~locked(closet)
  Effects: in(r,room) ^ ~in(r,closet)
• Resulting knowledge (time t+1):
  (in(r,room) ^ ~in(r,closet)) v locked(closet)
Brief Outline of Future Effort
• Filtering FOL representations
• Compact reductions from FOL to propositional logic
• More classes of filtering that maintain compact representations
• Learning world models in partially observable domains
• Stochastic filtering
Open Problems
• More families of actions/observations
  – Stochastic conditions on observations
  – Different data structures (BDDs? Horn?)
• Compact, efficient stochastic filtering
• Average case?
• Relational / first-order filtering
• Dynamic observation models, filtering in expanding worlds
STRIPS-Filter: Experimental Results
[A. & Russell ’03]