Connecting Learning and Logic
Eyal Amir, Cambridge Presentation, May 2006
U. of Illinois, Urbana-Champaign
Joint work with: Dafna Shahaf, Allen Chang
Slide 2: Problem: Learn Actions' Effects
• Given: a sequence of observations over time
  - Example: Action a was executed
  - Example: State feature f has value T
• Want: an estimate of the actions' effect model
  - Example: a is executable if the state satisfies some property
  - Example: under condition _, a has effect _
Slide 3: Example: Light Switch

Time  Action  Observe (after action)
              Posn.  Bulb  Switch
0     -       E      -     ~up
1     go-W    ~E     ~on   -
2     sw-up   ~E     ~on   -       FAIL
3     go-E    E      -     ~up
4     sw-up   E      -     up
5     go-W    ~E     on    -
Slide 5: Example: Light Switch
State 1: ~up ^ ~on ^ E        State 2: up ^ on ^ E
• Flipping the switch changes the world state
• We do not observe the state fully
[Diagram: two rooms (west, east); the switch is ~up and the bulb ~on in state 1, up and on in state 2]
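As a concrete illustration, the light-switch dynamics above can be simulated in a few lines of Python. This is a hypothetical model written for this transcript, not the authors' code; the feature names E, up, on and the failure rule for sw-up follow the table on slide 3.

```python
# Hypothetical simulator for the light-switch domain of the slides.
# Features: E (agent in east room), up (switch up), on (bulb on).
# Assumed dynamics: go-E/go-W move the agent; sw-up works only in the
# east room and pushes the switch up, turning the bulb on.

def step(state, action):
    """Apply an action; return (new_state, failed)."""
    s = dict(state)
    if action == "go-W":
        s["E"] = False
    elif action == "go-E":
        s["E"] = True
    elif action == "sw-up":
        if not s["E"]:          # precondition: must be in the east room
            return state, True  # action FAILs, state unchanged
        s["up"] = True
        s["on"] = True
    return s, False

def observe(state, visible):
    """Partial observability: only some features are seen."""
    return {f: state[f] for f in visible}

s0 = {"E": True, "up": False, "on": False}
s1, fail1 = step(s0, "go-W")
s2, fail2 = step(s1, "sw-up")   # fails: the agent is in the west room
```

Replaying the whole table from slide 3 through `step` reproduces the FAIL at time 2 and the observed up/on values at times 4 and 5.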
Slide 6: Motivation: Exploration Agents
• Exploring partially observable domains
  - Interfaces to new software
  - Game-playing/companion agents
  - Robots exploring buildings, cities, planets
  - Agents acting in the WWW
• Difficulties:
  - No knowledge of actions' effects a priori
  - Many features
  - Partially observable domain
Slide 7: Rest of This Talk
1. Actions in partially observed domains
2. Efficient learning algorithms
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 8: Learning Transition Models
• Learning: update knowledge of the transition relation and the state of the world
[Diagram: world states s1-s4 evolve under actions a1-a4 via the transition relation; knowledge states k1-k4 track them via transition knowledge]
Slide 9: Action Model: <State, Transition> Set
Problem: n world features → 2^(2^n) transitions
[Diagram: a belief as a set of <state, transition> pairs; states 1-3 paired with candidate transition relations T1, T2, T3]
Slide 10: Rest of This Talk
1. Actions in partially observed domains
2. Efficient algorithms
   1. Updating a Directed Acyclic Graph (DAG)
   2. Factored update (flat formula representation)
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 11: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
Slide 12: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
• Actions = propositional symbols that assert effect rules
  - "sw-up causes on ^ up if E"
  - "go-W keeps up" (= "go-W causes up if up" …)
  - Propositional symbols: go-W^{up|up}, sw-up^{on|E}, sw-up^{up|E}
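The effect-rule propositions can be generated mechanically. The sketch below is a hypothetical naming scheme (the `causes`/`keeps` helpers and the `a^f|g` string format are our own, not the paper's notation), showing how each rule "a causes f if g" collapses to a single propositional symbol:

```python
# Hypothetical encoding of effect rules as propositional atoms, in the
# spirit of the slides: the rule "a causes f if g" becomes one symbol,
# written here as the string "a^f|g".

def causes(action, effect, cond=None):
    """Atom asserting: `action` causes `effect` (if `cond` holds)."""
    return f"{action}^{effect}|{cond}" if cond else f"{action}^{effect}"

def keeps(action, fluent):
    """'a keeps f' abbreviates 'a causes f if f'."""
    return causes(action, fluent, fluent)

# The two rules from the slide:
r1 = causes("sw-up", "on^up", "E")   # "sw-up causes on ^ up if E"
r2 = keeps("go-W", "up")             # "go-W keeps up"
```

A transition belief state can then constrain these atoms with an ordinary propositional formula, which is what makes the encoding compact when it works.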
Slide 13: Updating the Status of "Locked", Time 0
[Diagram: candidate transition rules tr1 ("PressB causes ¬locked if locked") and tr2 ("PressB causes locked if ¬locked"); expl(0) points to init_locked]
Slide 14: Updating the Status of "Locked", Time t
[Diagram: as on slide 13; expl(t) now explains "locked" in terms of tr1, tr2, and expl(0)]
Slide 15: Updating the Status of "Locked", Time t+1
[Diagram: expl(t+1) is a disjunction of two cases, each pointing back to expl(t): "locked" holds because PressB caused it, or "locked" holds because PressB did not change it]
Slide 16: Algorithm: Update of a DAG
1. Given: action a, observation o, transition-belief formula φ_t
2. For each fluent f:
   a. kb := kb ∧ (logical formula "a is executable")
   b. expl'_f := logical formula for the possible explanations of f's value after action a
   c. Replace every fluent g in expl'_f with a pointer to expl_g
   d. Update expl_f := expl'_f
3. φ_{t+1} is the result of step 2 together with o
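A minimal sketch of the pointer idea behind this algorithm (not the authors' implementation; it keeps only the "a caused f, or a kept f" explanation and ignores executability and observations). Because each new explanation node points at the old nodes instead of copying them, one update step adds a constant number of nodes per fluent, which is why the structure stays a compact DAG:

```python
# Sketch of pointer-based DAG update. Each fluent f keeps an explanation
# node expl_f; a step builds a new node referencing the *old* nodes by
# pointer (no deep copy), so one update is O(1) per fluent regardless of
# history length.

class Node:
    """A node in the explanation DAG: an operator over child pointers."""
    def __init__(self, op, children):
        self.op = op                # "lit", "and", "or"
        self.children = children    # Node pointers, or a literal string

def update_step(expl, action, fluents):
    """One DAG-update step: new explanations point at old ones."""
    old = dict(expl)                # snapshot of pointers, not of subtrees
    new = {}
    for f in fluents:
        caused = Node("lit", f"{action}^{f}")                     # a caused f
        kept = Node("and", [Node("lit", f"{action}^{f}|{f}"), old[f]])  # a kept f
        new[f] = Node("or", [caused, kept])
    return new

def size(node, seen=None):
    """DAG size: shared nodes are counted once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    n = 1
    if node.op != "lit":
        for c in node.children:
            n += size(c, seen)
    return n

expl = {"locked": Node("lit", "init_locked")}
for t in range(3):
    expl = update_step(expl, "PressB", ["locked"])
```

Each step adds exactly four nodes for the single fluent here, so after T steps the DAG has 1 + 4T nodes: linear growth, matching the O(T·n^k + |φ_0|) bound on the next slide.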
Slide 17: Fast Update: DAG Action Model
• The DAG-update algorithm takes constant time (using a hash table) to update the formula
• The algorithm is exact
• The resulting DAG has size O(T·n^k + |φ_0|)
  - T steps, n features, k features in action preconditions
  - Still only n features/variables
• Use φ_t with a DAG-DPLL SAT solver
Slide 18: Experiments: DAG Update
[Chart, "DAG SLAF: Time for Step": time per step (msec, 0-1.2) vs. number of steps (0-5000), for 19, 55, and 109 fluents]
Slide 19: Experiments: DAG Update
[Chart, "DAG SLAF: Formula Size": formula size (nodes/10^4, 0-1000) vs. number of steps (0-5000), for 19, 55, and 109 fluents]
Slide 20: Experiments: DAG Queries
Inference on the DAG
[Chart: time (sec/1000, 0-2000) vs. number of steps (0-4000), for the queries: Find a Model, False Rule, True Rule, False Fluent, True Fluent]
Slide 21: Rest of This Talk
1. Actions in partially observed domains
2. Efficient algorithms
   1. Updating a Directed Acyclic Graph (DAG)
   2. Factored update (flat formula representation)
3. Related work & conclusions
4. [Theory behind algorithms]
Slide 22: Distribution for Some Actions
Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
(base case: Project[a](TRUE))
• Compute the update for the literals in the formula separately, and combine the results
• Known success/failure
• 1:1 actions
Slide 23: Actions That Map States 1:1
• Reason for distribution over ∨:
  Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
[Diagram: a 1:1 action maps distinct states to distinct states; a non-1:1 action merges states, breaking the equivalence]
Slide 24: Algorithm: Factored Learning
• Given: action a, observation o, transition-belief formula φ_t
1. Precompute the update for every literal
2. Decompose φ_t recursively, update every literal separately, and combine the results
3. Conjoin the result of step 2 with o, producing φ_{t+1}
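The decomposition in step 2 can be sketched as a recursion over the formula tree. Here the `bump` literal update is a stand-in for the real precomputed per-literal projection (a hypothetical placeholder, not the paper's operator); the point is that the connective structure is preserved, so the whole pass costs O(|φ_t|):

```python
# Sketch of the factored update: precompute the update of every literal,
# then distribute over the connectives. Formulas are nested tuples:
# ("and", lhs, rhs), ("or", lhs, rhs), or a literal string.

def factored_update(formula, literal_update):
    """Distribute a per-literal update over and/or structure."""
    if isinstance(formula, str):                    # a literal
        return literal_update(formula)
    op, lhs, rhs = formula
    return (op,
            factored_update(lhs, literal_update),
            factored_update(rhs, literal_update))

# Toy literal update: time-stamp each literal with a prime.
bump = lambda lit: lit + "'"
phi = ("and", ("or", "up", "on"), "E")
phi_next = factored_update(phi, bump)
```

For 1:1 actions the distribution over both connectives is exact (slide 23); otherwise the combined result may admit extra models, which is the approximation discussed on the next slide.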
Slide 25: Fast Update of Action Model
• The Factored Learning algorithm takes time O(|φ_t|) to update the formula
• The algorithm is exact when
  - we know that actions are 1:1 mappings between states, or
  - actions' effects are always the same
• Otherwise, the result is approximate: it includes the exact action model, but also others
• The resulting representation is flat (CNF)
Slide 26: Compact Flat Representation: How?
• Keep some property of φ invariant, e.g.,
  - k-CNF (CNF with k literals per clause)
  - a bounded number of clauses
• Factored Learning yields a compact representation if
  - we know whether the action succeeded, or
  - action failure leaves the affected propositions in a specified nondeterministic state, or
  - approximate: we discard large clauses (allows more states)
Slide 27: Compact Representation in CNF
• If an action affects and depends on ≤ k features: |φ_{t+1}| ≤ |φ_t| · n^{k(k+1)}
• If actions always have the same effect: |φ_{t+1}| ≤ O(t·n)
  - If, in addition, every feature is observed at least every k steps: |φ_{t+1}| ≤ O(n^{k+1})
  - If (instead) the number of actions is ≤ k: |φ_{t+1}| ≤ O(n·2^{k log k})
Slide 28: Experiments: Factored Learning
[Chart, "Time per Step": time per step (millisec, 0-10) vs. number of steps]
Slide 29: Summary
• Learning of effects and preconditions of actions in partially observable domains
• Shown in this talk:
  - Exact DAG update for any action
  - Exact CNF update, if actions are 1:1 or without conditional effects
  - The model can be updated efficiently without an increase in the number of variables in the belief state
  - Compact representation
• Adventure games, virtual worlds, robots
Slide 30: Innovation in This Research
• First scalable learning algorithm for partially observable dynamic domains
• Algorithm (DAG)
  - Always exact and optimal
  - Takes constant update time
• Algorithm (Factored)
  - Exact for actions that always have the same effect
  - Takes polynomial update time
• Can solve problems with n > 1000 domain features (> 2^1000 states)
Slide 31: Current Approaches and Work
• Reinforcement learning & HMMs
  - [Chrisman '92], [McCallum '95], [Boyen & Koller '98], [Murphy et al. '00], [Kearns, Mansour, Ng '00]
  - Maintain a probability distribution over the current state
  - Problem: exact solutions are intractable for domains of high (>100) dimensionality
  - Problem: approximate solutions have unbounded errors, or make strong mixing assumptions
• Learning AI-planning operators
  - [Wang '95], [Benson '95], [Pasula et al. '04], …
  - Problem: assume a fully observable domain
Slide 32: Open Problems
• Efficient inference with the learned formula
• Compact, efficient stochastic learning
• Average case of formula size?
• Dynamic observation models; filtering in expanding worlds
• Software: http://www.cs.uiuc.edu/~eyal
Slide 33: Acknowledgements
• Dafna Shahaf
• Megan Nance
• Brian Hlubocky
• Allen Chang
• … and the rest of my incredible group of students
Slide 34: THE END
Slide 35: Talk Outline
1. Actions in partially observed domains
2. Representation and update of models
3. Efficient algorithms
4. Related work & conclusions
Slide 36: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
Slide 37: Compact Encoding (Sometimes)
• Transition belief state = a logical formula (transition relation and state)
• Observation = logical state formulae
• Actions = propositional symbols that assert effect rules
  - "sw-up causes on ^ up if E"
  - "go-W keeps up" (= "go-W causes up if up" …)
  - Propositional symbols: go-W^{up|up}, sw-up^{on|E}, sw-up^{up|E}
Slide 38: Example: Light Switch
• Initial belief state (time 0) = set of pairs:
  { <E,~on,~up>, <E,on,~up> } × all transition relations
  Space = O(2^(2^n))
• New encoding: E ^ ~up
  Space = 2
• Question: how do we update the new representation?
Slide 39: Updating Action Model
• The transition belief state is represented by φ
• Action-Definition(a)_{t,t+1} =
  ⋀_f [ (a_t ∧ (a^f ∨ (a^{f|f} ∧ f_t)) → f_{t+1})
      ∧ (a_t ∧ f_{t+1} → (a^f ∨ (a^{f|f} ∧ f_t))) ]
  (effect axioms + explanation closure)
• Update: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Slide 40: Example Update: Light Switch
• Transition belief state: φ_t = E_t ∧ ~up_t
• Project[sw-on](φ_t) =
  (E_{t+1} ↔ (sw-on^E ∨ (sw-on^{E|E} ∧ E_t)))
  ∧ (~up_{t+1} ↔ (sw-on^{~up} ∨ (sw-on^{~up|~up} ∧ ~up_t)))
  ∧ …
• Update: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
Slide 41: Updating Action Model
• The transition belief state is represented by φ
• φ_{t+1} = Update[o](Project[a](φ_t))
• Actions: Project[a](φ_t) = logical consequences at time t+1 of φ_t ∧ Action-Definition(a)_{t,t+1}
• Observations: Update[o](φ) = φ ∧ o
Theorem: formula filtering is equivalent to the <transition, state>-set semantics
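The set semantics the theorem refers to can be sanity-checked on tiny domains with a brute-force reference implementation (hypothetical code, exponential in the number of features, which is exactly what the formula representation avoids). Here a belief is an explicit set of <transition-function, state> pairs; Project pushes each pair through its own transition function, and Update filters by the observation:

```python
# Brute-force reference semantics for filtering, over explicit
# <transition-function, state> pairs (tiny domains only).
# A transition function is stored as a hashable tuple of
# ((action, state), next_state) entries.

def project(pairs, action):
    """Progress each <transition, state> pair through `action`."""
    return {(tr, dict(tr)[(action, s)]) for tr, s in pairs}

def update(pairs, obs):
    """Keep only pairs whose state satisfies the observation predicate."""
    return {(tr, s) for tr, s in pairs if obs(s)}

# Two candidate transition relations for a single boolean feature `on`:
toggle = tuple(sorted({("press", s): (not s) for s in (True, False)}.items()))
noop   = tuple(sorted({("press", s): s       for s in (True, False)}.items()))

# Initially: unknown model, known state on=False.
belief = {(tr, False) for tr in (toggle, noop)}
belief = project(belief, "press")      # act
belief = update(belief, lambda s: s)   # observe on=True
models = {tr for tr, s in belief}      # only the toggling model survives
```

One act-observe cycle already eliminates the no-op model, mirroring how the formula update prunes effect propositions.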
Slide 42: Larger Picture: An Exploration Agent
[Architecture diagram: World Model, Interface Module, Learning Module, Filtering Module, Decision-Making Module, Knowledge Base, Commonsense extraction]
Slide 43: Example: Light Switch
• Initial belief state (time 0) = set of pairs:
  { <E,~on,~up>, <E,on,~up> } × all transition relations
• Apply action a = go-W
• Resulting belief state (after the action):
  { <E,~on,~up> } × { transitions that map to the same state }
  ∪ { <E,on,~up> } × { transitions that map to the same state }
  ∪ { <~E,~on,~up> } × { transitions that set the position to ~E }
  ∪ …
Slide 44: Example: Light Switch
• Resulting belief state (after the action):
  { <E,~on,~up> } × { transitions that map to the same state }
  ∪ { <E,on,~up> } × { transitions that map to the same state }
  ∪ { <~E,~on,~up> } × { transitions that set the position to ~E }
  ∪ …
• Observe: ~E, ~on
Slide 45: Experiments w/DAG-Update
[Chart, "DAG SLAF: Time to Find a Model": total time (msec, 0-900) vs. number of steps (0-6000)]
Slide 46: Some Learned Rules
• Pickup(b1) causes Holding(b1)
• Stack(b3,b5) causes On(b3,b5)
• Pickup() does not cause Arm-Empty
• Move(room1,room4) causes At(book5,room4) if In-Briefcase(book5)
• Move(room1,room4) does not cause At(book5,room4) if ¬In-Briefcase(book5)
Slide 47: Approximate Learning
• The result of Factored-Learning(φ_t) always includes the exact action model
• The same compactness results apply
• Approximation decreases size: discard clauses longer than k (allows more action models), giving |φ_t| = O(n^k)
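The clause-discarding approximation itself is a one-liner (a sketch, not the authors' code). Dropping a clause only weakens a CNF formula, so every model of the original survives; in particular the exact action model stays among those represented, and at most O(n^k) distinct clauses over ≤ k of the n atoms can remain:

```python
# Sketch of the approximation from the slide: after each update, discard
# CNF clauses with more than k literals. A CNF formula is a list of
# clauses; a clause is a list of literal strings ("-x" for negation).

def truncate_cnf(clauses, k):
    """Keep only clauses with at most k literals (a sound weakening)."""
    return [c for c in clauses if len(c) <= k]

cnf = [["a"], ["b", "-c"], ["a", "b", "c", "-d"]]
small = truncate_cnf(cnf, 2)   # drops the 4-literal clause
```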
Slide 48: More in the Paper
• An algorithm that uses deduction for exact updating of the model representation, always
• Arbitrary preconditions and conditional effects
• Formal justification of the algorithms and complexity results
Slide 49: Experiments
[Chart, "Time per Step": time per step (millisec, 0-10) vs. number of steps]
DAG-SLAF: The Algorithm
Input: a formula φ, an action-observation sequence <a_i, o_i>, i = 1..t
Initialize: for each fluent f, expl_f := init_f
  kb := φ, where each f is replaced by init_f
Process the sequence: for i = 1..t do Update-Belief(a_i, o_i)
Return kb ∧ base ∧ ⋀_f (f ↔ expl_f)
Slide 52: Current Game + Translation
• LambdaMOO
  - MUD code base
  - Uses a database to store the game world
  - Emphasis on player-world interaction
  - Powerful in-game programming language
• The game sends agents a logical description of the world
Slide 56: Present Contribution
• Identify (useful) sufficient conditions for efficient computation of action models
  - Actions map states 1:1
  - Deterministic actions with limited effect
• Polynomial-time algorithms for exact update of the action model
• (Useful) sufficient conditions for compact representation of the action model
Slide 58: Effect of Concept Expansion
[Chart: relations (0-1400) vs. concepts (1-10); series: Total Relations, Relations per retrieved concept]
Slide 59: Text Translation to Logic
• Difficulties:
  - Counterintuitive LambdaMOO actions
  - Enumerating observations for an action's result
1. Create a small game world
2. Define the predicates needed to describe the world
3. Define STRIPS-like actions required to interact with the game world
4. The MUD outputs logical descriptions of the world
Slide 60: Map Propositions to Semantics
• E.g., assume that every action a is non-failing, deterministic, and non-conditional
  - For every domain description D,
    T_D = Rules_D ∧ ⋀_f (a^{≈f} ∨ a^f ∨ a^{-f})
  - Rules_D = { a^{f|g} | "a causes f if g" ∈ D }
Slide 61: Mapping Theory to Semantics
• Every set of state-transition pairs is represented using a logical formula
• Theorem: every consistent propositional theory maps to a set of <transition, state> pairs, and vice versa
• We have formulas for deterministic actions:
  - Conditional effects, sometimes inexecutable
  - Non-conditional, sometimes inexecutable
  - Non-conditional, always executable
Slide 62: Distribution Properties
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
• Filtering a DNF belief state by factoring
Slide 63: Distribution Properties
Project[a](φ ∨ ψ) ≡ Project[a](φ) ∨ Project[a](ψ)
Project[a](φ ∧ ψ) ≡ Project[a](φ) ∧ Project[a](ψ)
(base case: Project[a](TRUE))
• Filtering a DNF belief state by factoring
Slide 64: Knowledge About Actions
• Goal 1: conclude that one of sw-up, go-W, go-E causes on
• Goal 2: show that sw-up is possible only when E is true (with some assumptions)
Slide 65: Challenges
(disregarding NLP for now)
• Partially observable world
  - E.g., cannot see the light bulb in the east room
• Incomplete knowledge about the state of the world and the effects of actions
  - E.g., we do not know the effects/preconditions of flipping the switch
• …
Slide 67: Relating Effect Propositions
• Action-Definition(a)_{t,t+1} =
  ⋀_f [ (a_t ∧ (a^f ∨ (a^{f|f} ∧ f_t)) → f_{t+1})
      ∧ (a_t ∧ f_{t+1} → (a^f ∨ (a^{f|f} ∧ f_t))) ]
Slide 69: Example: Moving Items
• From here on: assume actions are non-conditional and always succeed
• Initial belief state: the set of { at(r,closet,room) } × all transition relations
• Apply action a = move(r,closet,room) at time t
• Resulting belief state:
  { at(r,closet,room) } × { a^{at(r,closet,room)|at(r,closet,room)}, … }
  ∪ { -at(r,closet,room) } × { a^{at(r,closet,room)|-at(r,closet,room)}, … }
  ∪ …
Example: Moving Items
• Initial belief state:
  { at(r,closet,room) } x { a^{at(r,closet,room)}_{at(r,closet,room)}, … }
  U { -at(r,closet,room) } x { a^{-at(r,closet,room)}_{at(r,closet,room)}, … }
  U …
• Apply observation -at(r,closet,room)_t
• Resulting belief state:
  { -at(r,closet,room) } x { a^{-at(r,closet,room)}_{at(r,closet,room)}, … }
  U …
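The pair representation on these two slides can be run directly at toy scale. A sketch (my own enumeration, not the paper's data structure; the three model names are hypothetical stand-ins for transition relations, and only unconditional candidate models are enumerated) over the single fluent at(r,closet,room):

```python
from itertools import product

# A belief state is a set of (state, model) pairs: the state gives the truth
# value of the fluent "at" = at(r,closet,room), and the model says what the
# action move(r,closet,room) does to that fluent.
FLUENT_VALUES = (True, False)
MODELS = ("causes_at", "causes_not_at", "keeps_at")  # candidate effect models

def apply_model(at, model):
    """Successor value of the fluent under one candidate effect model."""
    if model == "causes_at":
        return True
    if model == "causes_not_at":
        return False
    return at  # "keeps_at": frame model, the value persists

def progress_action(belief):
    """Apply the action: advance the state of each (state, model) pair."""
    return {(apply_model(at, m), m) for (at, m) in belief}

def progress_observation(belief, observed_at):
    """Keep only pairs whose current state agrees with the observation."""
    return {(at, m) for (at, m) in belief if at == observed_at}

# Initially, every combination of state and effect model is possible.
belief = set(product(FLUENT_VALUES, MODELS))
belief = progress_action(belief)              # as on the first slide
belief = progress_observation(belief, False)  # observe -at(r,closet,room)
```

After the observation, only pairs whose model produces -at survive: the "causes_not_at" model (from any prior state) and the "keeps_at" model paired with a prior -at state.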
Filtering Stochastic Processes
• Dynamic Bayes Nets (DBNs): factored representation
[figure: DBN over state variables s1..s5 unrolled across time slices; filtering sums out the variables one at a time]
Filtering Stochastic Processes
• Dynamic Bayes Nets (DBNs): factored representation: O(2^n) space, O(2^2n) time
• Kalman Filter: Gaussian belief state and linear transition model: O(n^2) space, O(n^3) time
[figure: Kalman-filter DBN unrolled over time]
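The reason the Kalman filter is cheap is that the Gaussian belief state is closed under a linear transition and observation, so filtering only updates a mean and a variance. A one-dimensional sketch (illustrative only; the quoted O(n^2) space / O(n^3) time come from the matrix form of these same updates):

```python
def kalman_step(mean, var, a, q, z, r):
    """One predict/update step of a 1-D Kalman filter.

    Transition:  x' = a*x + noise with variance q
    Observation: z  = x' + noise with variance r
    The belief stays Gaussian, so (mean, var) is the entire belief state.
    """
    # Predict through the linear transition model.
    mean_p = a * mean
    var_p = a * a * var + q
    # Update on the observation via the Kalman gain.
    k = var_p / (var_p + r)
    mean_u = mean_p + k * (z - mean_p)
    var_u = (1.0 - k) * var_p
    return mean_u, var_u

# Example: prior N(0, 1), identity transition, observe z = 1 with noise 1.
m, v = kalman_step(0.0, 1.0, 1.0, 0.0, 1.0, 1.0)
print(m, v)  # posterior mean moves halfway to z; variance shrinks
```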
Example: Moving Items
• Initial belief state:
  set of all world states
• Apply action
  move(r,closet,room)_t
• Resulting belief state:
  all states that satisfy ~in(r,closet)_{t+1}
• Reason:
  – If initially in(r,closet), then now ~in(r,closet)
  – If initially ~in(r,closet), then still ~in(r,closet)
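This step can be reproduced by brute-force enumeration: filter a set of states through the action's transition function. A sketch (my own illustration, not the paper's algorithm; the ASCII fluent names stand in for in(r,closet), in(r,room), and locked(closet), and the action is assumed non-conditional and always succeeding, as above):

```python
from itertools import product

FLUENTS = ("in_r_closet", "in_r_room", "locked")

def move_r_closet_room(state):
    """Effects: in(r,room) ^ ~in(r,closet); locked(closet) is untouched."""
    s = dict(state)
    s["in_r_closet"] = False
    s["in_r_room"] = True
    return tuple(sorted(s.items()))

def filter_action(belief, act):
    """Filter[a](belief) = images of all possible states under the action."""
    return {act(s) for s in belief}

# Initial belief state: the set of all world states over the three fluents.
all_states = {
    tuple(sorted(zip(FLUENTS, vals)))
    for vals in product([True, False], repeat=len(FLUENTS))
}
after = filter_action(all_states, move_r_closet_room)
```

Every state in `after` satisfies ~in(r,closet), matching the slide's conclusion regardless of which of the eight states the world started in.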
Example: Filtering a Literal
• Initial knowledge (time t):
  in(r,closet)
• Apply move(r,closet,room)
  Preconds: in(r,closet) ^ ~locked(closet)
  Effects: in(r,room) ^ ~in(r,closet)
• Resulting knowledge (time t+1):
  (in(r,room) ^ ~in(r,closet))
  v (locked(closet) ^ in(r,closet))
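The disjunction in the result is exactly a case split on whether the precondition held. A small enumeration check (my own illustration, with the same ASCII fluent names; here the action is modeled as a no-op when its precondition fails, which is one reading consistent with the slide's result):

```python
from itertools import product

def move_conditional(state):
    """move(r,closet,room): succeeds iff in(r,closet) ^ ~locked(closet)."""
    s = dict(state)
    if s["in_r_closet"] and not s["locked"]:
        s["in_r_closet"] = False
        s["in_r_room"] = True
    return tuple(sorted(s.items()))

fluents = ("in_r_closet", "in_r_room", "locked")
states = [dict(zip(fluents, v)) for v in product([True, False], repeat=3)]

# Initial knowledge: in(r,closet) -- keep only states satisfying it.
belief_t = {tuple(sorted(s.items())) for s in states if s["in_r_closet"]}
belief_t1 = {move_conditional(s) for s in belief_t}

def satisfies_result(state):
    """(in(r,room) ^ ~in(r,closet)) v (locked(closet) ^ in(r,closet))"""
    s = dict(state)
    return (s["in_r_room"] and not s["in_r_closet"]) or (
        s["locked"] and s["in_r_closet"]
    )
```

Every state in the filtered belief satisfies the slide's resulting formula: either the move went through, or the closet was locked and r stayed put.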
Example: Filtering a Formula
• Initial knowledge (time t):
  in(broom,closet) v locked(closet)
• Apply move(r,closet,room)
  Preconds: in(r,closet) ^ ~locked(closet)
  Effects: in(r,room) ^ ~in(r,closet)
• Resulting knowledge (time t+1):
  (in(r,room) ^ ~in(r,closet)) v locked(closet)
Brief Outline of Future Effort
• Filtering FOL representations
• Compact reductions from FOL to propositional logic
• More classes of filtering that maintain compact representations
• Learning world models in partially observable domains
• Stochastic filtering
Open Problems
• More families of actions/observations
  – Stochastic conditions on observations
  – Different data structures (BDDs? Horn?)
• Compact, efficient stochastic filtering
• Average case?
• Relational / first-order filtering
• Dynamic observation models, filtering in expanding worlds
STRIPS-Filter: Experimental Results
[A. & Russell ’03]