Transcript of Robot, Learning From Data
Department of Electrical and Computer Engineering, Seoul National University
Robot, Learning From Data: Direct Policy Learning in RKHS
& Inverse Reinforcement Learning Methods
Presenter: Sungjoon Choi, Cyber-Physical Systems Laboratory (CPSLAB, http://cpslab.snu.ac.kr)
Seoul National University
(Course syllabus shown on the slide: https://canvas.northwestern.edu/courses/20122/assignments/syllabus)
Contents
Learning from Demonstration
Direct Policy Learning
Reward Learning
Kernel Methods
Reproducing Kernel Hilbert Space
Learning Theory in RKHS
Inverse Reinforcement Learning Methods
Learning From Demonstration
Human Expert → Learning from Demonstration → Execute in Unseen Environments
(Slide figure: cooking scenes from Disney's Ratatouille; image credits: http://villains.wikia.com/wiki/Chef_Skinner, http://www.filmspotting.net/forum/index.php?topic=12312.660, http://blogs.disney.com/oh-my-disney/2014/09/04/learn-to-love-cooking-with-ratatouille/)
Learning From Demonstration
There are two approaches: direct policy learning and reward learning.
Direct policy learning
• Try to find a policy function which maps the state space to the action space: state space $\mathcal{S}$, action space $\mathcal{A}$, policy function $\pi: \mathcal{S} \to \mathcal{A}$.
• Cast the learning problem as a regression or multi-class classification problem.
• Standard learning theory or approximation theory is often used to analyze the performance of learning.

Reward learning
• Try to find a reward function indicating how 'good' each state-action pair is: joint state-action space $\mathcal{S} \times \mathcal{A}$, reward space $\mathbb{R}$.
• "The reward function, rather than the policy, is the most succinct, robust, and transferable definition of the task." [1]
• Often referred to as an inverse reinforcement learning (IRL) problem.

[1] Ng, Andrew Y., and Stuart J. Russell. "Algorithms for inverse reinforcement learning." ICML, 2000.
Direct Policy Learning
Direct Policy Learning in Reproducing Kernel Hilbert Space (RKHS)
State space $\mathcal{S}$, action space $\mathcal{A}$, policy function $\pi: \mathcal{S} \to \mathcal{A}$.
We will view this problem as a nonlinear regression problem.
• In particular, we will use a kernel-based regression method.
• RKHS stands for reproducing kernel Hilbert space, the space in which our policy function is included, $\pi \in \mathcal{H}$; in other words, the hypothesis space.
We will also focus on how well this function generalizes, that is, how well it estimates the outputs for previously unseen inputs, based on learning theory.
Reproducing Kernel Hilbert Space – Definition and Existence
The existence of the RKHS is shown by the Moore-Aronszajn theorem.
The following is the definition of the reproducing kernel Hilbert space [1]. Let $\mathcal{H}$ be a Hilbert space of real functions $f$ defined on an index set $X$. Then $\mathcal{H}$ is called a reproducing kernel Hilbert space endowed with an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ if there exists a function $k: X \times X \to \mathbb{R}$ such that: 1. for every $x$, $k(\cdot, x)$ as a function belongs to $\mathcal{H}$, and 2. $k$ has the reproducing property, $\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$.
[1] Rasmussen, Carl Edward, and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
What…?
Reproducing Kernel Hilbert Space – Approach 1
The first thing that might come to mind when you hear about kernel methods would be mapping input data into a feature space.
What Mercer's theorem says is that every kernel function can be expressed as an inner product of infinite-dimensional (finite for degenerate kernels) feature vectors built from its eigenfunctions.
Suppose we are given a kernel function $k(x, x')$.
From Mercer's theorem, we have an infinite number of basis (eigen)functions: $k(x, x') = \sum_{i=1}^{\infty} \lambda_i \phi_i(x)\, \phi_i(x')$.
Then, let's think about a vector space spanned by the eigenfunctions: $f(x) = \sum_{i=1}^{\infty} f_i \phi_i(x)$ and $g(x) = \sum_{i=1}^{\infty} g_i \phi_i(x)$.
If we define the inner product of this space as $\langle f, g \rangle_{\mathcal{H}} = \sum_{i=1}^{\infty} f_i g_i / \lambda_i$, this space satisfies the reproducing property! In other words, this space spanned by the eigenfunctions (features) is an RKHS!!
Reproducing property: by definition $k(\cdot, x) = \sum_i \lambda_i \phi_i(x)\, \phi_i(\cdot)$ has coefficients $\lambda_i \phi_i(x)$, so $\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i f_i \lambda_i \phi_i(x) / \lambda_i = f(x)$. Check!
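To make this concrete, here is a small numeric sketch (the RBF kernel, sample points, and all names below are my own choices, not from the slides): on a finite sample, Mercer's expansion becomes the eigendecomposition of the Gram matrix, with nonnegative eigenvalues and eigenvector "features" that reconstruct the kernel.

```python
# Finite-sample view of Mercer's theorem: the Gram matrix of a PSD kernel
# decomposes as K = sum_i lam_i * v_i v_i^T with lam_i >= 0.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=20)

def rbf(a, b, gamma=2.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf(X, X)
lam, V = np.linalg.eigh(K)              # eigenvalues (ascending) and eigenvectors
print("min eigenvalue:", lam.min())     # >= 0 up to round-off: K is PSD
K_rec = (V * lam) @ V.T                 # rebuild K from its eigen-"features"
print("max reconstruction error:", np.abs(K - K_rec).max())
```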
Reproducing Kernel Hilbert Space – Approach 2
The RKHS $\mathcal{H}$ has two properties: 1. For every $x$, $k(\cdot, x)$ as a function belongs to $\mathcal{H}$. 2. $k$ has the reproducing property.
Suppose our kernel function is $k(x, x')$.
Then, from the Moore-Aronszajn theorem, there exists an RKHS $\mathcal{H}$. But what does this mean? We want to define a space of functions whose elements have the following form:
$f(\cdot) = \sum_i \alpha_i k(z_i, \cdot)$ for some $\alpha_i \in \mathbb{R}$ and $z_i \in X$.
If we define the inner product of the Hilbert space so that $\langle k(z, \cdot), k(\cdot, x) \rangle_{\mathcal{H}} = k(z, x)$, then this space satisfies the reproducing property:
$$\langle f(\cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \Big\langle \sum_i \alpha_i k(z_i, \cdot),\, k(\cdot, x) \Big\rangle_{\mathcal{H}} = \sum_i \alpha_i \langle k(z_i, \cdot), k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i \alpha_i k(z_i, x) = f(x).$$
More generally, for $f(\cdot) = \sum_i \alpha_i k(z_i, \cdot)$ and $g(\cdot) = \sum_j \beta_j k(z_j, \cdot)$, if we define the inner product of the Hilbert space as
$$\langle f, g \rangle_{\mathcal{H}} = \sum_i \sum_j \alpha_i \beta_j\, k(z_i, z_j),$$
the space defined as above is a reproducing kernel Hilbert space.
As $\langle f, f \rangle_{\mathcal{H}} = \sum_i \sum_j \alpha_i \alpha_j\, k(z_i, z_j)$ must be greater than or equal to zero, the kernel function should be positive semi-definite!
A norm is defined as follows: $\| f \|_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$.
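A small sanity check on that positive semi-definiteness requirement (the kernels and sample points below are my own toy choices): $\alpha^{\top} K \alpha \geq 0$ for all $\alpha$ holds exactly when the Gram matrix has no negative eigenvalues, which a valid kernel satisfies and a plain distance function does not.

```python
# ||f||_H^2 = alpha^T K alpha can only be guaranteed >= 0 when K is PSD.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, size=(50, 1))
D2 = (Z - Z.T) ** 2

K_rbf = np.exp(-2.0 * D2)        # a valid (PSD) kernel
K_bad = np.sqrt(D2)              # plain distance: symmetric but NOT a valid kernel

for name, K in [("rbf", K_rbf), ("distance", K_bad)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(K).min())
# rbf: ~0 or positive; distance: clearly negative, so some alpha gives
# alpha^T K alpha < 0 and the "norm" would be imaginary.
```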
Practical Usage
Empirical risk minimization with Hilbert norm regularization:
$$\min_{f \in \mathcal{H}} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \| f \|_{\mathcal{H}}^2$$
If we set our hypothesis space as radial basis function networks, $f(x) = \sum_{j=1}^{m} \alpha_j k(z_j, x)$, then the optimization becomes
$$\min_{\alpha} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{m} \alpha_j k(z_j, x_i) \Big)^2 + \lambda \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j\, k(z_i, z_j).$$
If we rewrite it in matrix form,
$$\min_{\alpha} \| Y - K_{XZ}\, \alpha \|^2 + \lambda\, \alpha^{\top} K_{ZZ}\, \alpha,$$
which is a quadratic program with respect to the $m$-dimensional vector $\alpha$.
If $Z$ is identical to $X$, then the above equation is identical to Gaussian process regression or kernel ridge regression. In practice, we can add additional constraints on $\alpha$, which greatly improves stability!
With $m < n$ inducing points $Z$, this is often referred to as sparse Gaussian process regression with inducing points.
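Here is a minimal sketch of that solve (the toy data, kernel bandwidth, and names such as `K_XZ` are my assumptions): setting the gradient of the objective to zero gives the closed form $\alpha = (K_{XZ}^{\top} K_{XZ} + \lambda K_{ZZ})^{-1} K_{XZ}^{\top} Y$. In the direct-policy-learning reading, $X$ would hold demonstrated states and $Y$ the expert's actions.

```python
# Kernel regression with m inducing points Z (sparse GP flavor):
# minimizes ||Y - K_XZ alpha||^2 + lam * alpha^T K_ZZ alpha in closed form.
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 200, 15, 1e-2
X = rng.uniform(-3, 3, n)                    # e.g., demonstrated states
Y = np.sin(X) + 0.1 * rng.normal(size=n)     # e.g., expert actions (noisy)
Z = np.linspace(-3, 3, m)                    # inducing points

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K_XZ, K_ZZ = rbf(X, Z), rbf(Z, Z)
alpha = np.linalg.solve(K_XZ.T @ K_XZ + lam * K_ZZ, K_XZ.T @ Y)

X_new = np.linspace(-3, 3, 5)                # previously unseen inputs
print(np.c_[np.sin(X_new), rbf(X_new, Z) @ alpha])  # ground truth vs prediction
```

With $Z$ identical to $X$ this reduces to the kernel ridge solution; shrinking $m$ trades a little accuracy for a much smaller $m \times m$ solve.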
Learning Theory
Suppose our training data $\{(x_i, y_i)\}_{i=1}^{n}$ are sampled from a distribution $P(x, y)$ on $X \times Y$.
The expected risk of a function $f$ is defined as
$$I[f] = \int_{X \times Y} (y - f(x))^2\, P(x, y)\, dx\, dy.$$
Then, the expected risk can be decomposed into
$$I[f] = \int_{X \times Y} (y - f_\rho(x))^2\, P(x, y)\, dx\, dy + \int_X (f(x) - f_\rho(x))^2\, P(x)\, dx,$$
where $f_\rho(x) = \int_Y y\, P(y \mid x)\, dy$ is the regression function.
$$I[f] = \underbrace{\int_{X \times Y} (y - f_\rho(x))^2\, P(x, y)\, dx\, dy}_{\text{Intrinsic Error}} + \underbrace{\int_X (f(x) - f_\rho(x))^2\, P(x)\, dx}_{\text{Estimation Error}}$$
Expected Risk = Intrinsic Error + Estimation Error
Intrinsic error (approximation error): $\int_{X \times Y} (y - f_\rho(x))^2\, P(x, y)\, dx\, dy$. We cannot handle this.
Estimation error: $\int_X (f(x) - f_\rho(x))^2\, P(x)\, dx$. We can handle only part of this.
The goal of learning theory is to minimize the following functional: $\| f_\rho - \hat{f}_{n,m} \|^2_{L_2(P)}$, which is often called the generalization error.
If we use a radial basis function network with $m$ basis functions learned from $n$ samples, we can achieve the following bound on the generalization error [1]:
$$\| f_\rho - \hat{f}_{n,m} \|^2_{L_2(P)} \leq O\!\left(\frac{1}{m}\right) + O\!\left(\sqrt{\frac{m d \ln(n m) - \ln \delta}{n}}\right)$$
with probability at least $1 - \delta$, where $d$ is the input dimension.
[1] Niyogi, Partha, and Federico Girosi. "On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions." Neural Computation 8.4 (1996): 819-842.
There are two sources of error [1]:
1. We are trying to approximate an infinite-dimensional object, the regression function $f_\rho$, with a finite number of parameters (approximation error).
Approximation error: the distance between $f_\rho$ and the best network $f_m$ with $m$ basis functions; this gives the $O(1/m)$ term.
2. We minimize the empirical risk and obtain $\hat{f}_{n,m}$, rather than minimizing the expected risk (estimation error).
Estimation error: the distance between $f_m$ and $\hat{f}_{n,m}$ caused by using $n$ samples instead of the true distribution; this gives the $O(\sqrt{\cdot/n})$ term.
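To see the two error sources in a toy simulation (everything here, the target function, noise level, and network size, is my own assumed setup): for a fixed number of basis functions $m$, growing $n$ drives the estimation error down, and the error floor that remains is the approximation error of the $m$-center network.

```python
# Toy decomposition of generalization error for a fixed-size RBF network:
# as n grows the estimation error vanishes, leaving the approximation error.
import numpy as np

rng = np.random.default_rng(0)
f_rho = lambda x: np.sin(2 * np.pi * x)            # the regression function

def rbf_features(x, centers, gamma=20.0):
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

m = 10                                             # fixed hypothesis complexity
centers = np.linspace(0, 1, m)
x_test = np.linspace(0, 1, 1000)

for n in [20, 100, 1000, 10000]:
    x = rng.uniform(0, 1, n)
    y = f_rho(x) + 0.3 * rng.normal(size=n)        # samples from P(x, y)
    alpha, *_ = np.linalg.lstsq(rbf_features(x, centers), y, rcond=None)
    gen_err = np.mean((rbf_features(x_test, centers) @ alpha - f_rho(x_test)) ** 2)
    print(f"n={n:6d}  generalization error ~ {gen_err:.5f}")
```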
Reward Learning
Understanding the basic concepts of solving a Markov decision process (MDP) or reinforcement learning (RL) is crucial for the reward learning problem.
The goal of RL is to find a policy function which maximizes an expected sum of rewards:
$$\pi^{\star} = \arg\max_{\pi}\; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \right].$$
If we define the discounted state-action visitation measure $\mu(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t \mathbf{1}(s_t = s, a_t = a) \right]$, then the optimization becomes
$$\max_{\mu \geq 0}\; \langle \mu, R \rangle \quad \text{s.t.} \quad \sum_{a} \mu(s, a) = d_0(s) + \gamma \sum_{s', a'} T(s \mid s', a')\, \mu(s', a') \quad \forall s,$$
where the equality constraint is often called the Bellman flow constraint.
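A compact sketch of this linear program on a made-up two-state MDP (the transition matrix, rewards, and all variable names are assumptions for illustration; scipy.optimize.linprog does the solving):

```python
# Solve max_mu <mu, R> s.t. Bellman flow constraints, as a linear program.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 2, 2, 0.9
T = np.zeros((nS, nA, nS))                        # T[s, a, s'] = P(s' | s, a)
T[0, 0] = [0.9, 0.1]; T[0, 1] = [0.2, 0.8]
T[1, 0] = [0.7, 0.3]; T[1, 1] = [0.1, 0.9]
R = np.array([[1.0, 0.0], [0.0, 2.0]])            # reward for each (s, a)
d0 = np.array([0.5, 0.5])                         # initial state distribution

# Flow constraint per state s:
#   sum_a mu(s, a) - gamma * sum_{s', a'} T(s' , a', s) * mu(s', a') = d0(s)
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for sp in range(nS):
        for ap in range(nA):
            A_eq[s, sp * nA + ap] = float(s == sp) - gamma * T[sp, ap, s]

res = linprog(c=-R.ravel(), A_eq=A_eq, b_eq=d0,
              bounds=[(0, None)] * (nS * nA))     # maximize <mu, R>, mu >= 0
mu = res.x.reshape(nS, nA)
print("visitation measure mu:\n", mu)
print("optimal policy:", mu.argmax(axis=1))       # nonzero mu marks optimal actions
```

This dual-LP view is what makes the $\max_{R} \langle \mu, R \rangle$ formulation on the next slide natural: IRL searches over $R$ while the forward problem searches over $\mu$.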
Inverse Reinforcement Learning
Solving MDP: true reward + true density → state-action trajectories.
Solving IRL: state-action trajectories → estimated reward, $\max_{R \in \mathcal{H}} \langle \mu, R \rangle$.
(Figure: http://users.eecs.northwestern.edu/~argall/learning.html)
Inverse Reinforcement Learning Methods
NR [1]
MMP [2], AN [3]
MaxEnt [4], BIRL [6], RelEnt [7]
StructIRL [8], GPIRL [5]
DeepIRL [9]
[1] Ng, Andrew Y., and Stuart J. Russell. "Algorithms for inverse reinforcement learning." ICML, 2000.
[2] Ratliff, Nathan D., J. Andrew Bagnell, and Martin A. Zinkevich. "Maximum margin planning." ICML, 2006.
[3] Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." ICML, 2004.
[4] Ziebart, Brian D., Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum entropy inverse reinforcement learning." AAAI, 2008.
[5] Levine, Sergey, Zoran Popovic, and Vladlen Koltun. "Nonlinear inverse reinforcement learning with Gaussian processes." NIPS, 2011.
[6] Ramachandran, Deepak, and Eyal Amir. "Bayesian inverse reinforcement learning." IJCAI, 2007.
[7] Boularias, Abdeslam, Jens Kober, and Jan R. Peters. "Relative entropy inverse reinforcement learning." AISTATS, 2011.
[8] Klein, Edouard, Matthieu Geist, Bilal Piot, and Olivier Pietquin. "Inverse reinforcement learning through structured classification." NIPS, 2012.
[9] Wulfmeier, Markus, Peter Ondruska, and Ingmar Posner. "Deep inverse reinforcement learning." arXiv, 2015.
Method / Objective
NR [1]: Maximize the discrepancy between the expert's and sampled values.
MMP [2]: Maximize the margin between the expert's demonstrations and every other state-action.
AN [3]: Minimize the value difference between the expert's and sampled ones.
StructIRL [8]: Cast IRL as a multi-class classification problem.
MaxEnt [4]: Define the likelihood of state-action trajectories and use MLE.
BIRL [6]: Define the posterior over rewards given state-action trajectories and use MH sampling.
GPIRL [5]: Define the likelihood using a sparse Gaussian process (SGP) and use a gradient ascent method.
RelEnt [7]: Minimize the relative entropy between the expert's and the learner's distributions.
DeepIRL [9]: Model the likelihood with neural networks.
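As one concrete instance, here is a rough sketch of the MaxEnt IRL [4] loop on a made-up tabular MDP (the toy dynamics, horizon, expert counts, and step size are all my assumptions, and rewards are one-hot state features): a backward soft value iteration gives the current stochastic policy, a forward pass gives its expected state visitations, and the gradient of the log-likelihood is the expert visitations minus the model's.

```python
# Sketch of MaxEnt IRL on a toy MDP: gradient of the demonstration log-likelihood
# w.r.t. a tabular reward is (expert visitation counts) - (expected visitations).
import numpy as np

nS, nA, T_horizon = 4, 2, 20
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a] = next-state distribution

def soft_value_iteration(theta):
    # Backward pass: soft (log-sum-exp) Bellman backups -> stochastic policy.
    V = np.zeros(nS)
    for _ in range(T_horizon):
        Q = theta[:, None] + P @ V                # Q[s, a] with reward R(s) = theta[s]
        V = np.logaddexp.reduce(Q, axis=1)
    return np.exp(Q - V[:, None])                 # pi[s, a] = exp(Q - V)

def expected_visitations(pi, d0):
    # Forward pass: propagate the state distribution under pi for T steps.
    d, visits = d0.copy(), np.zeros(nS)
    for _ in range(T_horizon):
        visits += d
        d = np.einsum('s,sa,sat->t', d, pi, P)
    return visits

expert_visits = np.array([1.0, 5.0, 1.0, 13.0])   # made-up demonstration counts
d0 = np.ones(nS) / nS
theta = np.zeros(nS)
for _ in range(200):
    pi = soft_value_iteration(theta)
    grad = expert_visits / expert_visits.sum() * T_horizon - expected_visitations(pi, d0)
    theta += 0.05 * grad                          # gradient ascent on log-likelihood
print("learned reward per state:", np.round(theta, 2))
```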
Conclusion
I believe selecting a proper machine learning algorithm is more than selecting a chocolate from a chocolate box.
"Deep Learning is brute force learning. It is not intelligent learning." "Machine learning is not only about machines, but also about humans."
- Vladimir Vapnik @ NIPS15
(Image credits: http://www.forbes.com/forbes/welcome/, http://aboutintelligence.blogspot.kr/2009/01/vapniks-picture-explained.html)
Robotics deals with humans!
Thank you for your attention!! Any questions?