Academic Course: 10 On-line adaptation, learning, evolution
Designed by Gusz Eiben & Mark Hoogendoorn
On-line adaptation, learning, evolution
Outline
• Population-based Adaptive Systems
• Types of adaptation: evolution, individual (lifetime) learning, social learning
• Machine learning
• Reinforcement learning
• Off-line vs. on-line adaptation
Population-based Adaptive Systems
Population-based adaptive systems (PAS) have two essential features:
• They consist of a group of basic units that can perform actions, e.g., computation, communication, interaction, etc.
• They have the ability to adapt at
– individual level (modify agent) and/or
– group level (add/remove agent).
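The two features above can be illustrated with a minimal sketch. The class names `Agent` and `Population`, the `gain` parameter, and the scoring rule are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A basic unit that can perform an action; `gain` is a hypothetical parameter."""
    gain: float

    def act(self, observation: float) -> float:
        # The action here is a simple computation on the observation.
        return self.gain * observation

    def adapt(self, delta: float) -> None:
        # Individual-level adaptation: modify the agent itself.
        self.gain += delta


@dataclass
class Population:
    agents: list = field(default_factory=list)

    def add(self, agent: Agent) -> None:
        # Group-level adaptation: add an agent.
        self.agents.append(agent)

    def remove_worst(self, score) -> Agent:
        # Group-level adaptation: remove the lowest-scoring agent.
        worst = min(self.agents, key=score)
        self.agents.remove(worst)
        return worst


pop = Population([Agent(0.5), Agent(1.0), Agent(2.0)])
pop.agents[0].adapt(0.25)                      # individual level: modify
pop.add(Agent(1.5))                            # group level: add
removed = pop.remove_worst(lambda a: a.gain)   # group level: remove
```

In this sketch the population itself holds no behavior beyond membership changes; everything an agent does lives in `act`, and everything that changes an agent lives in `adapt`.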
Types of adaptation
• Evolutionary learning (EL): Changes at population level (assumed non-Lamarckian)
• Lifetime learning (LL): Changes at agent level
– Individual learning (IL): adaptation autonomously through a purely internal procedure
– Social learning (SL): adaptation through interaction/communication
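To make the three types concrete, here is a small sketch under strong simplifying assumptions: each agent is reduced to a single number, and the `reward` function, step sizes, and function names are all hypothetical. Individual learning uses only the agent's own trial and error, social learning copies from peers, and the evolutionary step changes the composition of the population.

```python
import random


def reward(x: float) -> float:
    """Hypothetical task: reward is highest when the parameter equals 3.0."""
    return -(x - 3.0) ** 2


def individual_learning(x, rng, step=0.5):
    # Purely internal procedure: try a perturbation, keep it only if it pays off.
    candidate = x + rng.uniform(-step, step)
    return candidate if reward(candidate) > reward(x) else x


def social_learning(x, peers):
    # Interaction/communication: adopt the best peer's parameter if it is better.
    best = max(peers, key=reward)
    return best if reward(best) > reward(x) else x


def evolutionary_step(population, rng, sigma=0.3):
    # Population level: the better half survives and produces mutated offspring.
    ranked = sorted(population, key=reward, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    offspring = [x + rng.gauss(0.0, sigma) for x in survivors]
    return survivors + offspring


rng = random.Random(0)
population = [0.0, 1.0, 5.5, 2.0]
learned = individual_learning(population[0], rng)        # agent level, alone
copied = social_learning(population[0], population[1:])  # agent level, via peers
next_generation = evolutionary_step(population, rng)     # population level
```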
Taxonomy of adaptation
Adaptation
– Evolutionary learning
– Lifetime learning: individual learning, social learning
Taxonomy of adaptation 2
[Figure: the same taxonomy tree, annotated with the additional labels "Learning" and "Evolution".]
Adaptation ≠ operation
• Operation: controller is being used
– Sensory inputs → outputs (motor, comm. device)
– Robot behavior changes, not the controller
• Adaptation: controller is being changed
– Present controller → new controller
– Uses utility/reward/fitness info
– It may require one single robot (learning) or more robots (evolution, social learning)
• Adaptation + operation = generate + test
• Off-line (initial controller design, before start) vs. on-line (after start)
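The distinction between operation and adaptation, and the "generate + test" reading, can be sketched as follows. This is a minimal illustration, not the course's method: the controller is assumed to be a pair `(weight, bias)`, and the `fitness` target is a made-up task.

```python
import random


def operate(controller, sensor_input):
    # Operation: the controller is used, not changed (sensory input -> output).
    weight, bias = controller
    return weight * sensor_input + bias


def fitness(controller):
    # Test: hypothetical utility, how closely the outputs track 2*x + 1.
    inputs = [0.0, 1.0, 2.0, 3.0]
    return -sum((operate(controller, x) - (2.0 * x + 1.0)) ** 2 for x in inputs)


def adapt(controller, rng, sigma=0.2):
    # Adaptation: generate a new controller from the present one,
    # and keep it only if it tests better.
    candidate = tuple(p + rng.gauss(0.0, sigma) for p in controller)
    return candidate if fitness(candidate) > fitness(controller) else controller


rng = random.Random(1)
controller = (0.0, 0.0)
initial_fitness = fitness(controller)
for _ in range(200):
    controller = adapt(controller, rng)
final_fitness = fitness(controller)
```

Running the loop before deployment would correspond to off-line adaptation; interleaving calls to `adapt` with calls to `operate` while the robot is running would correspond to the on-line case.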
[Figure: diagram relating genotype, developmental engine (decoder), phenotype = controller, robot behavior, state of the environment, reward, and fitness, together with the genetic operators (mutation & crossover), learning operators, and selection operators.]
[Figure: the same diagram, with the phenotype comprising both controller and shape.]
Evolutionary loop
[Figure: the diagram with the elements genotype, developmental engine, controller = phenotype, robot behavior, changes in environment, reward, fitness, genetic operators (mutation & crossover), learning operator(s), and selection operator(s).]
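As a rough sketch of one way the evolutionary loop can be realized, the code below uses a bit-string genotype, a decoder standing in for the developmental engine, tournament selection, one-point crossover, and bit-flip mutation. All of these concrete choices, the elitism, and the `fitness` target of 0.7 are assumptions made for illustration; the slides do not specify them.

```python
import random


def decode(genotype):
    # Developmental engine (decoder): bit string -> controller gain in [0, 1].
    return int("".join(map(str, genotype)), 2) / (2 ** len(genotype) - 1)


def fitness(genotype):
    # Hypothetical task: behavior is best when the decoded gain is 0.7.
    return -abs(decode(genotype) - 0.7)


def tournament(population, rng, k=2):
    # Selection operator: best of k randomly drawn individuals.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # Genetic operator: one-point crossover.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genotype, rng, rate=0.1):
    # Genetic operator: flip each bit with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in genotype]


def evolve(generations=30, pop_size=20, length=8, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # keep the current best unchanged
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=fitness)


best = evolve()
```

Note that the genetic operators act on the genotype only, while fitness is computed from the decoded phenotype.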
Learning loop
[Figure: the diagram with the elements genotype, developmental engine, controller = phenotype, robot behavior, changes in environment, reward, fitness, genetic operators (mutation & crossover), learning operator(s), and selection operator(s).]
[Figure: agent-environment interaction diagram, with the labels AGENT, ENVIRONMENT, action a(t), state s(t), and reward r(t).]
Reinforcement learning
Agent in situation/state $s_t$ chooses action $a_t$
World changes to situation/state $s_{t+1}$
Agent perceives situation $s_{t+1}$ and gets reward $r_{t+1}$
What tells the agent what to do is its policy:
$\pi_t(s, a) = \Pr\{a_t = a \mid s_t = s\}$
Given that the situation at time $t$ is $s$, the policy gives the probability that the agent's action will be $a$.
For example: $\pi_t(s, \text{goforward}) = 0.5$, $\pi_t(s, \text{gobackward}) = 0.5$.
Reinforcement learning ⇒ get/find/learn the policy
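The interaction cycle and the stochastic policy from this slide can be sketched in a few lines. The action names come from the slide's example; the one-dimensional world in `step`, its reward rule, and the function names are hypothetical. The sketch only samples from a fixed policy and does not learn one.

```python
import random


def policy(state):
    # pi(s, a): probability of each action in state s.
    # As in the slide's example, both actions are equally likely.
    return {"goforward": 0.5, "gobackward": 0.5}


def choose_action(action_probs, rng):
    # Sample an action according to the policy's probabilities.
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def step(state, action):
    # Hypothetical world: a position on a line; reward 1.0 on reaching position 3.
    next_state = state + (1 if action == "goforward" else -1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


rng = random.Random(0)
state, trajectory = 0, []
for t in range(5):
    action = choose_action(policy(state), rng)   # agent in s_t chooses a_t
    next_state, reward = step(state, action)     # world moves to s_{t+1}, gives r_{t+1}
    trajectory.append((state, action, next_state, reward))
    state = next_state

# Empirical check that sampled actions follow the policy's probabilities.
samples = [choose_action(policy(0), rng) for _ in range(10_000)]
forward_share = samples.count("goforward") / len(samples)
```

Learning the policy would mean replacing the fixed table in `policy` with probabilities that are updated from the observed rewards.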
Further reading
• Evert Haasdijk, A.E. Eiben, and Alan F.T. Winfield, "Individual, Social and Evolutionary Adaptation in Collective Systems", in Serge Kernbach (ed.), Handbook of Collective Robotics, Pan Stanford, 2011.