
Foundations of Robotics and Autonomous Learning Summer School (RALSS’17)

Berlin, Sep 4-8, 2017

Active Learning & Bayesian Optimization

Marc Toussaint
University of Stuttgart
Summer 2017


• Detailed reference: http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/14-BanditsOptimizationActiveLearningBayesianRL.pdf

• or http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/Lecture-Optimization.pdf

Chapter 5 “Global & Bayesian Optimization”

Active Learning & Bayesian Optimization – – 2/14


4 Sessions

• Bandits

• Bayesian Optimization

Active Learning & Bayesian Optimization – – 3/14


(1) Bandits

• Problem: You have B binary bandits; when choosing bandit i at time t, it returns y_t ∼ Bern(θ_i); you want to maximize ⟨∑_{t=1}^T y_t⟩. How do you choose bandits?

Active Learning & Bayesian Optimization – – 4/14


• Representing the state of knowledge: statistics & beliefs

• Exploration vs. exploitation
– Exploration: choose the next decision to minimize ⟨H(b_t)⟩
– Exploitation: choose the next decision to maximize ⟨y_t⟩

• Belief planning
– What would be an optimal strategy?
– How large is V(b_t) for T = 10 and 3 bandits?

• UCB: α(i) = ȳ_i + β √(2 ln n / n_i)

• Optimism in the face of uncertainty
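The UCB rule above can be sketched directly; a minimal simulation, assuming Bernoulli bandits as in the problem statement (the success probabilities, horizon, and β below are illustrative choices, not values from the slides):

```python
import numpy as np

def ucb_bandit(theta, T, beta=1.0, rng=None):
    """Run UCB on Bernoulli bandits with success probabilities theta."""
    rng = np.random.default_rng(rng)
    B = len(theta)
    n = np.zeros(B)      # pull counts n_i
    s = np.zeros(B)      # summed rewards per arm
    total = 0.0
    for t in range(1, T + 1):
        if t <= B:
            i = t - 1    # pull each arm once to initialize
        else:
            ybar = s / n
            alpha = ybar + beta * np.sqrt(2.0 * np.log(t) / n)  # UCB index
            i = int(np.argmax(alpha))
        y = float(rng.random() < theta[i])   # y_t ~ Bern(theta_i)
        n[i] += 1
        s[i] += y
        total += y
    return total, n

total, n = ucb_bandit([0.3, 0.5, 0.7], T=2000, rng=0)
```

With a clear gap between the arms, the best arm should accumulate by far the most pulls, which is the "optimism" mechanism at work: arms pulled rarely keep a large bonus and get retried until their bound drops.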

Active Learning & Bayesian Optimization – – 5/14


Practical: UCB
git checkout master
git pull
cd teaching/RoboticsCourse/12-bandits
make cleanAll; make
./x.exe

• What you see are returns for t = 1, …, T averaged over K runs

• Implement a better decision policy

• Play around with different bandits, e.g. θ = (.5, .6) (harder), or Gaussian bandits
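As one candidate for a "better decision policy" (an alternative the slides do not prescribe), Thompson sampling keeps a Beta posterior per bandit and pulls the arm whose posterior sample is largest; a minimal sketch on the harder θ = (.5, .6) case:

```python
import numpy as np

def thompson_bandit(theta, T, rng=None):
    """Thompson sampling with Beta(1,1) priors on Bernoulli bandits."""
    rng = np.random.default_rng(rng)
    B = len(theta)
    a = np.ones(B)   # 1 + successes of arm i
    b = np.ones(B)   # 1 + failures of arm i
    total = 0.0
    for _ in range(T):
        i = int(np.argmax(rng.beta(a, b)))  # one posterior sample per arm
        y = float(rng.random() < theta[i])
        a[i] += y
        b[i] += 1.0 - y
        total += y
    return total, a + b - 2.0   # second return value: pull counts

total, n = thompson_bandit([0.5, 0.6], T=5000, rng=1)
```

This is exploration via posterior sampling rather than an explicit bonus: an arm is pulled exactly with the probability that it is currently believed to be best.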

Active Learning & Bayesian Optimization – – 6/14


The Robotics Active Learning Challenge
Autonomously explore the environment to learn what is manipulable and how

(bird research by Alex Kacelnik et al., U Oxford)

Active Learning & Bayesian Optimization – – 7/14


• Active learning for ‘kinematic beliefs’:
Otte, Kulick, Toussaint & Brock: Entropy Based Strategies for Physical Exploration of the Environment’s Degrees of Freedom. IROS’14

Kulick, Otte & Toussaint: Active Exploration of Joint Dependency Structures. ICRA’15

• Application

◦ Bernstein, Hofer, Kulick, Martin-Martin, Baum, Toussaint, Brock: ICRA submission

Active Learning & Bayesian Optimization – – 8/14


More work on Active Learning

• More efficient active learning of hyperparameters:
Kulick, Lieck & Toussaint: Cross-Entropy as a Criterion for Robust Interactive Learning of Latent Properties. NIPS workshop FILM’16

• Safe Active Learning:Schreiter et al: Safe Exploration for Active Learning with Gaussian Processes. ECML’15

• In relational RL:
Lang, Toussaint & Kersting: Exploration in Relational Domains for Model-based Reinforcement Learning. JMLR 2012

Lang, Toussaint & Kersting: Exploration in Relational Worlds. ECML’10

• Non-information-gain-based:
Lopes, Lang & Toussaint: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress. NIPS’12

• Application symbol learning:
Kulick, Toussaint, Lang & Lopes: Active Learning for Teaching a Robot Grounded Relational Symbols. IJCAI’13

Active Learning & Bayesian Optimization – – 9/14


(2) Global Optimization

• Problem: Let x ∈ R^n, f : R^n → R. Find

min_x f(x)

only by sampling values y_t = f(x_t). No access to ∇f or ∇²f.
Observations may be noisy: y ∼ N(y | f(x_t), σ)
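A baseline under exactly these access constraints (pointwise, possibly noisy samples only, no gradients) is random search; a minimal sketch with an illustrative test function:

```python
import numpy as np

def random_search(f, lo, hi, T, rng=None):
    """Minimize f over [lo, hi] using only T sampled values y_t = f(x_t)."""
    rng = np.random.default_rng(rng)
    best_x, best_y = None, np.inf
    for _ in range(T):
        x = rng.uniform(lo, hi)
        y = f(x)                 # the only access we have to f
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

x, y = random_search(lambda x: (x - 0.3) ** 2, lo=-1.0, hi=1.0, T=200, rng=0)
```

The point of the session is that a belief over f lets us do much better than this uninformed sampling.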

Active Learning & Bayesian Optimization – – 10/14


• Global Optimization = Infinite Bandits

• Gaussian Processes as belief: b_t = GP(f | D_t)

• Optimal Global Optimization? V(b_t)

• Acquisition functions:

x_t^MPI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) dy

x_t^EI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) (y* − y) dy

x_t^UCB = argmin_x f(x) − β_t σ(x)
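For Gaussian predictive marginals, the MPI and EI integrals above have standard closed forms in terms of the normal cdf Φ and pdf φ, with z = (y* − μ(x))/σ(x); a sketch (the function names are ours, not from the course code):

```python
import numpy as np
from math import erf

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)

def mpi(mu, sigma, y_star):
    """Probability of improvement: integral of N(y | mu, sigma) up to y*."""
    z = (y_star - mu) / sigma
    return norm_cdf(z)

def ei(mu, sigma, y_star):
    """Expected improvement: integral of N(y | mu, sigma) (y* - y) up to y*."""
    z = (y_star - mu) / sigma
    return (y_star - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def lcb(mu, sigma, beta):
    """UCB acquisition for minimization: mu - beta * sigma."""
    return mu - beta * sigma
```

Here mu and sigma are the GP posterior mean and standard deviation at x, and y* is the best value observed so far (the minimization convention of the slides).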

Active Learning & Bayesian Optimization – – 11/14


Practical
git checkout master
git pull
cd teaching/RoboticsCourse/13-bayesOpt
make
./x.exe

• This implements GP updates with random querying

• Implement an active learning strategy... discuss!

• Implement a GP-UCB strategy
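A GP-UCB strategy as asked for in the last bullet can be sketched end to end with a plain squared-exponential GP; the lengthscale, noise level, grid, β, and test function below are illustrative choices, not the settings of the 13-bayesOpt practical:

```python
import numpy as np

def rbf(A, B, ell=0.2):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(X, y, Xq, ell=0.2, noise=1e-3):
    """GP posterior mean and std at query points Xq, given data (X, y)."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(v * v, axis=0)   # prior variance k(x, x) = 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def gp_ucb_minimize(f, lo, hi, T, beta=2.0, rng=None):
    """Repeatedly query argmin_x mu(x) - beta * sigma(x) over a grid."""
    rng = np.random.default_rng(rng)
    Xq = np.linspace(lo, hi, 200)
    X = np.array([rng.uniform(lo, hi)])   # one random initial query
    y = np.array([f(X[0])])
    for _ in range(T - 1):
        mu, sd = gp_posterior(X, y, Xq)
        x_next = Xq[np.argmin(mu - beta * sd)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    i = int(np.argmin(y))
    return X[i], y[i]

x, y = gp_ucb_minimize(lambda x: np.sin(3 * x) + 0.5 * x,
                       lo=-2.0, hi=2.0, T=25, rng=0)
```

Early queries land where the posterior standard deviation is large (exploration); as uncertainty shrinks, the rule concentrates near the best observed region (exploitation), mirroring the bandit UCB rule from session (1).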

Active Learning & Bayesian Optimization – – 12/14


Issues

• Hyperparameter choice
– Standard proofs do not apply anymore!
– Fully Bayesian, or Entropy Search → where to get the prior from?
– Should we really assume a squared exponential kernel? If not, what else?

Active Learning & Bayesian Optimization – – 13/14



No Free Lunch

Active Learning & Bayesian Optimization – – 14/14