Machine learning in robotics beyond the robot itself


Machine learning in robotics: beyond the robot itself

Marc Toussaint

Machine Learning & Robotics Lab – FU Berlin
[email protected]

Oxford, Oct 1st, 2012

(I’ll soon open a new ML & Robotics lab at U Stuttgart – I’m looking for postdocs and PhDs!)

1/34

Outline

• Brief comment on ML in robotics: internal vs. external DoFs

• Relational model-based RL

• Learning and grounding state/action symbols

2/34

Wolfgang Köhler (1917): Intelligenzprüfungen an Menschenaffen (The Mentality of Apes)

[pigeons]

• What models of the environment enable such reasoning?

• How could such models be learnt?

3/34

Internal vs. external DoFs

• internal DoFs: vector space; great advances; mature ML/RL methods; model identification, motor-skill learning, map learning

• external DoFs: hybrid, combinatorial space; great challenges; relational learning & inference; manipulation learning, affordance learning, physical exploration

4/34


What would this imply from an ML perspective?

• Express priors for natural situations
– “glass on table” is likely, “table on glass” unlikely
– generalization, e.g. across objects with similar properties

• Generative models of manipulation sequences

• Models that combine geometry (e.g. no penetration of rigid objects), logic (1st-order generalization on symbolic representations), and probabilities (e.g. uncertainty due to partial knowledge or abstraction)

• Analogous to learning generative models of images/video.

5/34

• I think learning models for planned manipulation raises huge and interesting ML challenges

• In the following I’ll address one aspect: relational representations

6/34

Part 1. Model-based relational RL

Learning models P(s′ | a, s) and R(a, s), and using them for planned manipulation, given symbolic relational representations

7/34

Objects & Relations

• The world is composed of objects
– objects have (changeable) properties & relations
– the state s is given by the set of all properties/relations of all objects

• Example: We have a symbolic property big and relation on

– In a world with 2 objects A,B, the state s is composed of 3 variables

s = (big(A), big(B), on(A,B))

– In a world with 3 objects A,B,C, s is composed of 6 variables:

s = (big(A), big(B), big(C), on(A,B), on(B,C), on(A,C))

– In a world with n objects we have n + n(n−1)/2 state variables

The size of the state space is exponential in the # of objects (the sketch below makes this concrete)
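To make the counting concrete, here is a small Python sketch (mine, not from the talk) that enumerates the ground atoms for the big/on vocabulary above; the predicate names match the slide, everything else is illustrative.

```python
# Illustrative sketch: enumerating the ground atoms that constitute a
# relational state, for one unary property big(.) and one binary relation
# on(.,.) over unordered object pairs, as on the slide.
from itertools import combinations

def state_variables(objects):
    """All ground atoms of big/1 and on/2 for the given objects."""
    unary = [f"big({o})" for o in objects]
    binary = [f"on({a},{b})" for a, b in combinations(objects, 2)]
    return unary + binary

objs = ["A", "B", "C"]
atoms = state_variables(objs)
print(atoms)       # ['big(A)', 'big(B)', 'big(C)', 'on(A,B)', 'on(A,C)', 'on(B,C)']
print(len(atoms))  # 6 = n + n(n-1)/2 for n = 3, as on the slide

# Every atom is boolean, so the number of states doubles with each atom:
n = 20
num_vars = n + n * (n - 1) // 2   # 210 atoms for n = 20
print(len(str(2 ** num_vars)))    # a 64-digit number of states already

# With ordered pairs and a richer vocabulary the atom count grows like n^2,
# which is where the > 2^(n^2) figure later in the talk comes from.
```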

8/34

Relational models

• Relational models...
– generalize data seen in one world (with objects A, B, C, ...) to another world (with objects D, E, ...)
– make predictions based only on the properties/relations of objects, not their identity
– thereby imply a very strong type of generalization/prior, which allows us to learn efficiently in the exponentially large space

• Relational models are typically a blueprint for the generation of a graphical model

9/34

Example: Markov Logic Networks

• For every domain D (set of objects), an MLN defines a factor graph

• The MLN is defined by a set {(F_i, w_i)} of pairs, where
– F_i is a formula in first-order logic
– w_i ∈ ℝ is a weight

• For a domain D this generates a factor graph with
– one random variable for each grounding of each predicate
– one factor for each grounding of each formula

F(x, D) ∝ exp{ ∑_i w_i · n_i(x) },  where n_i(x) is the number of true groundings of F_i in x

• The resulting factor graph has many shared parameters → learning the weights implies strong generalization across objects (a toy scoring sketch follows below)
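As a toy illustration of this construction (my sketch, not an MLN implementation), the unnormalized score of a world is the exponential of the summed weights of all true groundings; the stable/on/big formula and its weight are invented for the example.

```python
# Toy Markov-Logic-style scoring: one factor per grounding of each
# weighted first-order formula; the score multiplies their exponentials.
from itertools import permutations
from math import exp

def unnormalized_score(world, objects, weighted_formulas):
    """world: set of true ground atoms, e.g. {('on','A','B'), ('big','B')}."""
    total = 0.0
    for weight, arity, formula in weighted_formulas:
        for args in permutations(objects, arity):   # one factor per grounding
            if formula(world, *args):
                total += weight
    return exp(total)

# Formula with weight 1.5:  on(X,Y) and big(Y) => stable(X)
def stable_rule(world, x, y):
    body = ('on', x, y) in world and ('big', y) in world
    return (not body) or (('stable', x) in world)   # material implication

wfs = [(1.5, 2, stable_rule)]
objs = ['A', 'B']
w1 = {('on', 'A', 'B'), ('big', 'B'), ('stable', 'A')}   # satisfies the rule
w2 = {('on', 'A', 'B'), ('big', 'B')}                    # violates it once
print(unnormalized_score(w1, objs, wfs))   # exp(3.0): both groundings true
print(unnormalized_score(w2, objs, wfs))   # exp(1.5): one grounding violated
```

Note how the single weight 1.5 is shared by every grounding of the formula; this parameter sharing is exactly what drives the generalization across objects.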

10/34

Example: Bayesian Logic Programs

• In logic programming,
  stable(A) :- on(A,B), big(B)
means “For all A, B: if B is big and A is on B, then A is stable”

• In Bayesian Logic Programs [Kersting and de Raedt],
  stable(A) :- on(A,B), big(B)
means “For all A, B: if big(B) and on(A,B) are random variables, then stable(A) is a random variable”

Associated with this rule is a conditional probability table (CPT) that specifies the probability distribution over stable(A) for any possible values of on(A,B) and big(B)

• We have a knowledge representation that allows us to construct a grounded Bayesian network for specific worlds (sets of objects); this BN will have many shared parameters (sketched below)
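A minimal sketch of that grounding step (the CPT numbers and helper names are my own, purely illustrative):

```python
# The single rule  stable(A) :- on(A,B), big(B)  plus one CPT generates a
# CPT-equipped network node per grounding; all groundings share the CPT.

# P(stable | on, big), keyed by the truth values of (on(A,B), big(B))
CPT = {
    (True,  True):  0.9,
    (True,  False): 0.3,
    (False, True):  0.5,
    (False, False): 0.5,
}

def ground_rule(objects):
    """Yield (head, parents, cpt) for every grounding of the rule."""
    for a in objects:
        for b in objects:
            if a != b:
                yield (f"stable({a})", [f"on({a},{b})", f"big({b})"], CPT)

for head, parents, cpt in ground_rule(["A", "B"]):
    print(head, "<-", parents)   # two groundings, one shared CPT
```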

11/34


• There exist many, many more relational modelling formalisms: Markov Logic Networks, Probabilistic Relational Models, Relational Markov Networks, Relational Probability Trees, Stochastic Logic Programming, ...

See the ECML/PKDD 2007 tutorial by Lise Getoor!
http://www.ecmlpkdd2007.org/CD/tutorials/T3_Getoor/Getoor_CD.pdf

• In my view: currently the only way to express & learn uncertain knowledge about environments with objects & properties/relations

SRL + Robotics = perfect match!

12/34


Model-based RL in the relational case

D = {
  grab(c):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
         →  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
  puton(a): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
         →  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
  puton(b): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
         →  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
  grab(b):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
         →  box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
  ...
}

• How can we learn a predictive model P(x′ | u, x) for this data?

With n = 20 objects, the state space has > 2^(n²) ≈ 10^120 states

13/34

Learning probabilistic rules
Pasula, Zettlemoyer & Kaelbling: Learning probabilistic relational planning rules (ICAPS 2004)

• Compress this data into probabilistic relational rules:

grab(X):  on(X,Y), block(Y), table(Z)
  → 0.7:  inhand(X), ¬on(X,Y)
    0.2:  on(X,Z), ¬on(X,Y)
    0.1:  noise

• Find a rule set that maximizes (likelihood - description length)

• Introducing uncertainty in the rules not only allows us to model stochastic worlds, it enables us to compress/regularize and thereby learn strongly generalizing models! (a sampling sketch follows below)
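To illustrate how such a rule acts as a generative model (NID-style, after Pasula et al.), here is a sketch of mine; the outcome probabilities are the ones above, while the state representation and update code are my assumptions.

```python
# Sampling an outcome of the probabilistic grab(X) rule shown above.
import random

def apply_grab(state, X, Y, Z):
    """state: set of ground atoms; assumes the context on(X,Y), block(Y),
    table(Z) has already been checked."""
    s = set(state)
    r = random.random()
    if r < 0.7:                    # outcome 1 (0.7): X ends up in the hand
        s.add(('inhand', X)); s.discard(('on', X, Y))
    elif r < 0.9:                  # outcome 2 (0.2): X drops onto the table Z
        s.add(('on', X, Z)); s.discard(('on', X, Y))
    # remaining 0.1: the noise outcome -- no structured prediction is made
    return s

state = {('on', 'a', 'b'), ('block', 'b'), ('table', 'd')}
print(apply_grab(state, 'a', 'b', 'd'))
```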

14/34

Planning as inference in relational domains

• Once the model is learnt, using it (planning) is hard

• SST & UCT do not scale with # objects

• The rules are a blueprint for a Dynamic Bayesian Network → use planning-as-inference:

[Diagram: the current situation determines which rules are relevant; grounding the relevant rules yields a rule-based DBN model]

(Lang & Toussaint, JAIR 2010)

15/34


Planning as inference in relational domains

(we’re using the factored frontier for approximate inference; a toy sketch follows below)
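For intuition, here is a toy version of the factored-frontier update (my sketch, not the paper's code): the joint filtering distribution over binary state variables is approximated by a product of per-variable marginals, each propagated one step through its CPD using only the parents' marginals. The two-variable model below is invented.

```python
# Factored-frontier step for binary variables: new marginal of each variable
# is the CPD averaged over all parent assignments, weighted by the product
# of the parents' current marginals.
from itertools import product

def ff_step(marginals, parents, cpd):
    """marginals: {var: P(var=1)}; parents: {var: [parent vars]};
    cpd: {var: {parent_assignment_tuple: P(var'=1)}}."""
    new = {}
    for var, pa in parents.items():
        p1 = 0.0
        for assign in product([0, 1], repeat=len(pa)):
            w = 1.0
            for parent, val in zip(pa, assign):
                w *= marginals[parent] if val else 1.0 - marginals[parent]
            p1 += w * cpd[var][assign]
        new[var] = p1
    return new

# Two variables: inhand' depends on (inhand, on); on' depends on (on,)
parents = {'inhand': ['inhand', 'on'], 'on': ['on']}
cpd = {
    'inhand': {(0, 0): 0.0, (0, 1): 0.7, (1, 0): 1.0, (1, 1): 1.0},
    'on':     {(0,): 0.0, (1,): 0.3},
}
print(ff_step({'inhand': 0.0, 'on': 1.0}, parents, cpd))
# -> {'inhand': 0.7, 'on': 0.3}
```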

→ Advances in lifted inference will translate to better robot manipulation planning.

16/34

Planning as inference in relational domains

• International Probabilistic Planning Competition (IPPC)
• Desktop clearance task

(Lang & Toussaint, JAIR 2010) 17/34

Relational Exploration

• The state space is inherently exponential in the # of objects. How could we realize strategies like R-max or E3 in relational domains?

• Key point:

strong generalization of the model
↔
strong implications for what is considered novel / gets explored

For instance, if you’ve seen a red, a green, and a yellow ball rolling, will you explore whether the blue ball also rolls? Or rather explore something totally different, like dropping a blue box?

18/34


Relational Exploration

• Active learning: set an exploration bonus where the “generalizing” empirical density is low

• Relational empirical distribution:

propositional:    P(s) ∝ c_D(s)
distance-based:   P_d(s) ∝ exp{ −min_{(s_e, a_e, s′_e) ∈ D} d(s, s_e)² }
predicate-based:  P_p(s) ∝ c_p(s) I(s ⊨ p) + c_¬p(s) I(s ⊨ ¬p)
context-based:    P_φ(s) ∝ ∑_{φ ∈ Φ} c_D(φ) I(∃σ : s ⊨ σ(φ))

(a count-based exploration-bonus sketch follows below)
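A hedged sketch of how such a density can drive exploration (R-max flavour; the context representation, counts, and threshold are simplified assumptions of mine, in the spirit of the context-based density above):

```python
# A state earns an optimistic exploration bonus when the rule contexts it
# satisfies have been experienced too rarely -- novelty is judged through
# the generalizing density, not through raw state identity.

def context_count(state, contexts, counts):
    """Sum experience counts of all contexts that match the state."""
    return sum(counts[c] for c in contexts if c(state))

def exploration_bonus(state, contexts, counts, known_threshold=5.0):
    # R-max flavour: bonus wherever the generalizing density is low
    return 1.0 if context_count(state, contexts, counts) < known_threshold else 0.0

# Context "something is in hand", seen 7 times; "the ball is on the box", never.
contexts = [lambda s: any(a[0] == 'inhand' for a in s),
            lambda s: ('on', 'ball', 'box') in s]
counts = {contexts[0]: 7, contexts[1]: 0}
print(exploration_bonus({('inhand', 'c')}, contexts, counts))        # 0.0 (known)
print(exploration_bonus({('on', 'ball', 'box')}, contexts, counts))  # 1.0 (novel)
```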

Lang, Toussaint & Kersting (ECML 2010, JMLR 2012) 19/34

Application

[Videos: random exploration; planning; real-world; online explore-exploit]

Lang & Toussaint: Planning with Noisy Probabilistic Relational Rules (JAIR 2010)
Toussaint et al.: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)
Lang, Toussaint & Kersting: Exploration in Relational Worlds (ECML 2010, JMLR 2012)

20/34

We developed a full-fledged model-based relational RL system (the first applicable one).

21/34

Part 2. Learning and grounding state/action symbols

Model-based relational RL requires given symbols. Where do they come from?

• Clustering sensor (and motor) space. E.g.: Jacob Feldman: Symbolic representation of probabilistic worlds, Cognition 123 (2012) 61–83

• Communication & Language. E.g.: Luc Steels’ Language Games

23/34

• Instead: Symbols to enable behavior

Optimize symbols such that an RL method using these symbols is successful

• In the following:
– how we approached this in the model-based relational RL setting

24/34


Top-down or bottom-up?

• Pure bottom-up invention of symbols? (dreaming...)

• Pure top-down grounding? (unsatisfactory)

[Diagram: state/action symbols ↔ perceptual grounding / motion primitives; bottom-up (e.g. from imitation learning) and top-down (e.g. from a human teacher); the grounding is re-optimized to be predictive/discriminative, and the symbolic models are re-learnt]

25/34

Symbols

• Define a symbol as a classifier of geometric features
– for each object pair (i, j) we have a geometric feature y_ij ∈ ℝ^(~20)
– an “unnamed” symbol σ(i, j) is true iff a classifier with discriminative function f_σ(y_ij) returns 1

What is a good objective function to train the symbols f_σ?

26/34

Objectives for good symbols

• Good symbols should allow us to
– learn a symbolic transition model, discriminating the effects of actions
– learn a symbolic reward model
– and be compact, promising generalization

L(Σ; D) = L_transition(Σ; D) + L_reward(Σ; D) + ‖Σ‖²

27/34

L_transition(Σ; D) = ∑_{(s_t, r_t, a_t, s_{t+1}) ∈ D} ‖s_t − s_{t+1}‖₁ + ∑_{(s_t, r_t, a_t, s_{t+1}) ∈ D} log P(s_{t+1} | s_t, a_t; T)

L_reward(Σ; D) = ∑_{(s_t, r_t, a_t, s_{t+1}) ∈ D} log P(r_t | s_t; R)

‖Σ‖² = ∑_s w_s²

f(y_ij; w) = exp{ −(y_ij − w_c)ᵀ diag(w_s²)⁻¹ (y_ij − w_c) }

(a sketch of this classifier follows below)
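In code, the classifier on the slide looks as follows (the 0.5 decision threshold and the test data are my assumptions):

```python
# Squared-exponential discriminative function over the ~20-dimensional
# geometric feature y_ij of an object pair, with learnable center w_c and
# per-dimension scales w_s, as in the formula above.
import numpy as np

def f(y, w_c, w_s):
    """f(y; w) = exp{ -(y - w_c)^T diag(w_s^2)^{-1} (y - w_c) }"""
    d = (y - w_c) / w_s
    return np.exp(-np.dot(d, d))

def symbol_true(y, w_c, w_s, threshold=0.5):
    return f(y, w_c, w_s) > threshold

rng = np.random.default_rng(0)
w_c = np.zeros(20)   # center of the "symbol region" in feature space
w_s = np.ones(20)    # scales; the regularizer sums their squares, sum w_s^2
y_near = 0.05 * rng.standard_normal(20)   # close to the center -> symbol true
y_far = 2.0 + rng.standard_normal(20)     # far from the center  -> symbol false
print(symbol_true(y_near, w_c, w_s), symbol_true(y_far, w_c, w_s))  # True False
```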

28/34

Application

• Task of stacking seven objects

[Plots: reward vs. #data for symbol learning, comparing random, continuous NN, symbolic NN, and symbolic PRADA, in three settings: only cubes (3 traces), only cubes (9 traces), and cubes and balls]

29/34

Application

• Task is to knock down 10 pins with 2 throws

30/34

• ongoing work, submitted

• In the above: motion primitives → state symbols
Other lines of work:
– learning state symbols from a teacher
– training motion primitives to be consistent with state symbols

31/34

Summary

• Modelling the environment in terms of objects & relations → first full-fledged model-based relational RL
– learning probabilistic relational rules (Pasula et al.)
– planning as inference in the grounded DBN
– relational exploration

• Learning and grounding state/action symbols
Integrate motion-primitive and state-symbol learning in a larger relational RL framework

32/34

Conclusion

• The challenge basically becomes to understand and abstract the basic structure of (manipulation) environments (objects, relations, geometry, physics, etc.) ... and to translate it into ML priors & inference methods

• With relational RL and symbol learning we scratched the surface

33/34


Thanks

– PhD students: Tobias Lang, Nikolay Jetchev, Stanio Dragiev
– Kristian Kersting (U Bonn)
– Manuel Lopes (INRIA Bordeaux)
– Akshat Kumar, Shlomo Zilberstein (U Mass)
– Sethu Vijayakumar (U Edinburgh), Danica Kragic (KTH Stockholm)
– Matthew Botvinick (U Princeton)
– Michael Gienger, Christian Goerick (Honda, Offenbach)

thanks for your attention! 34/34