Foundations of Robotics and Autonomous Learning Summer School (RALSS’17)
Berlin, Sep 4-8, 2017
Active Learning & Bayesian Optimization
Marc Toussaint
University of Stuttgart
Summer 2017
• Detailed reference: http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/14-BanditsOptimizationActiveLearningBayesianRL.pdf
• or http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/Lecture-Optimization.pdf
Chapter 5 “Global & Bayesian Optimization”
Active Learning & Bayesian Optimization – – 2/14
4 Sessions
• Bandits
• Bayesian Optimization
Active Learning & Bayesian Optimization – – 3/14
(1) Bandits
• Problem: You have B binary bandits; when choosing bandit i at time t, it returns y_t ∼ Bern(θ_i); you want to maximize ⟨∑_{t=1}^T y_t⟩. How do you choose bandits?
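As a concrete illustration of the problem, the following is a minimal sketch (not the course code): B Bernoulli arms with hypothetical probabilities θ = (0.3, 0.5, 0.7), a uniform Beta(1,1) prior on each θ_i as the belief state, and a placeholder random policy. The class name and seed are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class BernoulliBandits:
    """B binary bandits; pulling arm i returns y ~ Bern(theta_i)."""
    def __init__(self, theta):
        self.theta = np.asarray(theta)

    def pull(self, i):
        return int(rng.random() < self.theta[i])

# Beta(a_i, b_i) belief over each theta_i, starting from a uniform prior
bandits = BernoulliBandits([0.3, 0.5, 0.7])
a = np.ones(3)  # 1 + number of observed successes per arm
b = np.ones(3)  # 1 + number of observed failures per arm
total = 0
for t in range(100):
    i = rng.integers(3)   # placeholder policy: choose an arm at random
    y = bandits.pull(i)
    a[i] += y
    b[i] += 1 - y
    total += y

print("posterior means:", a / (a + b))
print("cumulative return:", total)
```

The Beta parameters (a_i, b_i) are exactly the sufficient statistics mentioned on the next slide: the belief b_t over the unknown θ.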
Active Learning & Bayesian Optimization – – 4/14
• Representing the state of knowledge: statistics & beliefs
• Exploration vs exploitation
– Exploration: Choose the next decision to min ⟨H(b_t)⟩
– Exploitation: Choose the next decision to max ⟨y_t⟩
• Belief planning
– What would be an optimal strategy?
– How large is V(b_t) for T = 10 and 3 bandits?
• UCB: α(i) = ŷ_i + β √(2 ln n / n_i)
• Optimism in the face of uncertainty
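A minimal sketch of the UCB rule above, not the course implementation: each arm is pulled once, then the arm maximizing ŷ_i + β √(2 ln n / n_i) is chosen, where ŷ_i is the empirical mean, n_i the pull count, and n the total number of pulls. The arm probabilities, β = 1, and the seed are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.3, 0.5, 0.7])   # hypothetical arm probabilities
B, T, beta = len(theta), 1000, 1.0

counts = np.zeros(B)   # n_i: number of pulls per arm
means = np.zeros(B)    # empirical mean return per arm

for t in range(T):
    if t < B:
        i = t          # initialize: pull each arm once
    else:
        n = counts.sum()
        ucb = means + beta * np.sqrt(2 * np.log(n) / counts)
        i = int(np.argmax(ucb))
    y = int(rng.random() < theta[i])
    counts[i] += 1
    means[i] += (y - means[i]) / counts[i]   # incremental mean update

print("pull counts:", counts)   # the best arm accumulates most pulls
```

The optimism principle is visible in the bonus term: arms with few pulls get a large confidence bonus and are revisited until the data rules them out.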
Active Learning & Bayesian Optimization – – 5/14
Practical: UCB
git checkout master
git pull
cd teaching/RoboticsCourse/12-bandits
make cleanAll; make
./x.exe
• What you see are returns for t = 1, .., T averaged over K runs
• Implement a better decision policy
• Play around with different bandits, e.g. θ = (.5, .6) (harder), or Gaussian bandits
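The kind of comparison the practical asks for can be sketched as follows, assuming the harder θ = (.5, .6) case: average returns of a random policy vs. UCB over K runs. Horizon, K, and seed are arbitrary choices for the example; with such a small gap between arms, the difference is expected to be modest.

```python
import numpy as np

def run(policy, theta, T, rng):
    """Return the cumulative reward of one bandit episode."""
    B = len(theta)
    counts, means, ret = np.zeros(B), np.zeros(B), 0
    for t in range(T):
        if policy == "random":
            i = int(rng.integers(B))
        elif t < B:
            i = t   # UCB init: pull each arm once
        else:
            i = int(np.argmax(means + np.sqrt(2 * np.log(counts.sum()) / counts)))
        y = int(rng.random() < theta[i])
        counts[i] += 1
        means[i] += (y - means[i]) / counts[i]
        ret += y
    return ret

theta, T, K = (0.5, 0.6), 200, 100
rng = np.random.default_rng(2)
avg = {p: np.mean([run(p, theta, T, rng) for _ in range(K)])
       for p in ("random", "ucb")}
print(avg)   # average return for t = 1, .., T over K runs, per policy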
Active Learning & Bayesian Optimization – – 6/14
The Robotics’ Active Learning Challenge
Autonomously explore the environment to learn what is manipulable and how
◦(bird research by Alex Kacelnik et al. (U Oxford))
Active Learning & Bayesian Optimization – – 7/14
• Active Learning for ‘kinematic beliefs’
Otte, Kulick, Toussaint & Brock: Entropy Based Strategies for Physical Exploration of the Environment’s Degrees of Freedom. IROS’14
Kulick, Otte & Toussaint: Active Exploration of Joint Dependency Structures. ICRA’15
• Application
◦Bernstein, Hofer, Kulick, Martin-Martin, Baum, Toussaint, Brock: . ICRA submission
Active Learning & Bayesian Optimization – – 8/14
More work on Active Learning
• More efficient active learning of hyperparameters:
Kulick, Lieck & Toussaint: Cross-Entropy as a Criterion for Robust Interactive Learning of Latent Properties. NIPS workshop FILM’16
• Safe Active Learning:
Schreiter et al: Safe Exploration for Active Learning with Gaussian Processes. ECML’15
• In relational RL:
Lang, Toussaint & Kersting: Exploration in Relational Domains for Model-based Reinforcement Learning. JMLR 2012
Lang, Toussaint & Kersting: Exploration in Relational Worlds. ECML’10
• Non-information-gain-based:
Lopes, Lang & Toussaint: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress. NIPS’12
• Application: symbol learning:
Kulick, Toussaint, Lang & Lopes: Active Learning for Teaching a Robot Grounded Relational Symbols. IJCAI’13
Active Learning & Bayesian Optimization – – 9/14
(2) Global Optimization
• Problem: Let x ∈ R^n, f : R^n → R, find
min_x f(x)
only by sampling values y_t = f(x_t). No access to ∇f or ∇²f. Observations may be noisy: y ∼ N(y | f(x_t), σ).
Active Learning & Bayesian Optimization – – 10/14
• Global Optimization = Infinite Bandits
• Gaussian Processes as belief: b_t = GP(f | D_t)
• Optimal Global Optimization? V(b_t)
• Acquisition functions:
x_t^MPI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) dy
x_t^EI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) (y* − y) dy
x_t^UCB = argmin_x f(x) − β_t σ(x)
Active Learning & Bayesian Optimization – – 11/14
Practical
git checkout master
git pull
cd teaching/RoboticsCourse/13-bayesOpt
make
./x.exe
• This implements GP updates with random querying
• Implement an active learning strategy... discuss!
• Implement a GP-UCB strategy
Active Learning & Bayesian Optimization – – 12/14
Issues
• Hyperparameter choice
– Standard proofs do not apply anymore!
– Fully Bayesian, or Entropy Search → where does the prior come from?
– Should we really assume a squared exponential kernel? If not, what else?
Active Learning & Bayesian Optimization – – 13/14
No Free Lunch
Active Learning & Bayesian Optimization – – 14/14