Academic Course: 10 On-line adaptation, learning, evolution

Page 1: Academic Course: 10 On-line adaptation, learning, evolution

Designed by Gusz Eiben & Mark Hoogendoorn

On-line adaptation, learning, evolution

Page 2: Academic Course: 10 On-line adaptation, learning, evolution

Outline

• Population-based Adaptive Systems

• Types of adaptation: evolution, individual (lifetime) learning, social learning

• Machine learning

• Reinforcement learning

• Off-line vs. on-line adaptation

Page 3: Academic Course: 10 On-line adaptation, learning, evolution

Population-based Adaptive Systems

PAS have two essential features:

• They consist of a group of basic units that can perform actions, e.g., computation, communication, interaction.

• They can adapt at

– the individual level (modify an agent) and/or

– the group level (add/remove an agent).
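
The two features above can be illustrated with a minimal sketch. The names `Agent`, `Population`, and `gain` are hypothetical and do not come from the course material; the "controller" is reduced to a single number purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A basic unit; `gain` is a hypothetical stand-in for its controller."""
    gain: float

    def act(self, sensor_value: float) -> float:
        # The unit performs an action (here, a trivial computation).
        return self.gain * sensor_value


@dataclass
class Population:
    agents: List[Agent] = field(default_factory=list)

    def adapt_individual(self, index: int, delta: float) -> None:
        # Individual-level adaptation: modify an existing agent.
        self.agents[index].gain += delta

    def add_agent(self, agent: Agent) -> None:
        # Group-level adaptation: add an agent.
        self.agents.append(agent)

    def remove_agent(self, index: int) -> None:
        # Group-level adaptation: remove an agent.
        del self.agents[index]


pop = Population([Agent(1.0), Agent(2.0)])
pop.adapt_individual(0, 0.5)   # modify agent 0
pop.add_agent(Agent(3.0))      # the group grows
pop.remove_agent(1)            # the group shrinks
gains = [a.gain for a in pop.agents]
```

In this sketch, individual-level adaptation changes an agent in place, while group-level adaptation changes which agents exist.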

Page 4: Academic Course: 10 On-line adaptation, learning, evolution

Types of adaptation

• Evolutionary learning (EL): Changes at population level (assumed non-Lamarckian)

• Lifetime learning (LL): Changes at agent level

– Individual learning (IL): adaptation performed autonomously through a purely internal procedure

– Social learning (SL): adaptation through interaction/communication

Page 5: Academic Course: 10 On-line adaptation, learning, evolution

Taxonomy of adaptation

Adaptation

– Evolutionary Learning

– Lifetime Learning

  – Individual Learning

  – Social Learning

Page 6: Academic Course: 10 On-line adaptation, learning, evolution

Taxonomy of adaptation 2

Adaptation

– Evolutionary Learning

– Lifetime Learning

  – Individual Learning

  – Social Learning

[Additional diagram labels: Learning; Evolution]

Page 7: Academic Course: 10 On-line adaptation, learning, evolution

Adaptation ≠ operation

• Operation: controller is being used

– Sensory inputs → outputs (motor, comm. device)

– Robot behavior changes, not the controller

• Adaptation: controller is being changed

– Present controller → new controller

– Uses utility/reward/fitness info

– It may require

  • one single robot: learning

  • more robots: evolution, social learning

• Adaptation + operation = generate + test

• Off-line (initial controller design, before start) vs. on-line (after start)
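
The distinction between operation and adaptation can be sketched in a few lines. Everything here is an assumption made for illustration: the controller is a hypothetical `(weight, bias)` pair, and the `utility` function scores a made-up task in which the desired output is twice the sensor input. Operation uses the controller without changing it; adaptation generates a candidate and tests it through operation.

```python
from typing import Tuple

Controller = Tuple[float, float]  # hypothetical controller: (weight, bias)
SENSOR_SAMPLES = (0.0, 1.0, 2.0)  # hypothetical sensor readings used for testing


def operate(controller: Controller, sensor: float) -> float:
    # Operation: sensory input -> output. The controller is only read.
    weight, bias = controller
    return weight * sensor + bias


def utility(controller: Controller) -> float:
    # Test step: negative squared error against the (assumed) desired
    # behavior "output = 2 * sensor". Higher is better; 0 is perfect.
    return -sum((operate(controller, s) - 2.0 * s) ** 2 for s in SENSOR_SAMPLES)


def adapt(present: Controller, candidate: Controller) -> Controller:
    # Adaptation: present controller -> new controller, guided by utility.
    return candidate if utility(candidate) > utility(present) else present


present = (1.0, 0.0)
output_before = operate(present, 1.0)    # operation leaves `present` unchanged
improved = adapt(present, (2.0, 0.0))    # a better candidate is adopted
unchanged = adapt(improved, (0.0, 0.0))  # a worse candidate is rejected
```

Run once before deployment, `adapt` would correspond to off-line design; called repeatedly while the robot is operating, it would correspond to on-line adaptation.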

Page 8: Academic Course: 10 On-line adaptation, learning, evolution

[Diagram. Components: Genotype; Developmental Engine (decoder); Phenotype = controller; Robot behavior; State of the environment; Reward; Fitness. Operators: Genetic operators (mutation & xover); Learning operators; Selection operators.]

Page 10: Academic Course: 10 On-line adaptation, learning, evolution

[Diagram. Components: Genotype; Developmental Engine (decoder); Phenotype (controller, shape); Robot behavior; State of the environment; Reward; Fitness. Operators: Genetic operators (mutation & xover); Learning operators; Selection operators.]

Page 11: Academic Course: 10 On-line adaptation, learning, evolution

[Diagram. Components: Genotype; Developmental Engine (decoder); Phenotype (controller, shape); Robot behavior; State of the environment; Reward; Fitness. Operators: Genetic operators (mutation & xover); Learning operators; Selection operators.]

Page 12: Academic Course: 10 On-line adaptation, learning, evolution

Evolutionary loop

[Diagram. Components: Genotype; Developmental Engine; Controller = phenotype; Robot behavior; Changes in environment; Reward; Fitness. Operators: Genetic operators (mutation & xover); Learning operator(s); Selection operator(s).]
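
One possible reading of the evolutionary loop is sketched below: genotypes are decoded into controllers, the controllers are scored by fitness, selection picks parents, and mutation and crossover produce the next genotypes. All specifics are assumptions chosen for illustration and are not taken from the slides: the bit-string genotype, the one-parameter controller, the target value 0.7 in `fitness`, truncation selection with elitism, and the parameter defaults.

```python
import random
from typing import List, Tuple

Genotype = List[int]


def decode(genotype: Genotype) -> float:
    # Developmental engine (decoder): bit string -> controller parameter in [0, 1].
    value = int("".join(map(str, genotype)), 2)
    return value / (2 ** len(genotype) - 1)


def fitness(controller: float) -> float:
    # Hypothetical task: behavior is best when the parameter equals 0.7.
    return -abs(controller - 0.7)


def mutate(genotype: Genotype, rate: float, rng: random.Random) -> Genotype:
    # Genetic operator: flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in genotype]


def crossover(a: Genotype, b: Genotype, rng: random.Random) -> Genotype:
    # Genetic operator: one-point crossover.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(pop_size: int = 20, length: int = 8, generations: int = 30,
           seed: int = 0) -> Tuple[List[Genotype], List[float]]:
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Evaluate phenotypes, then rank the genotypes that produced them.
        ranked = sorted(population, key=lambda g: fitness(decode(g)), reverse=True)
        best_per_generation.append(fitness(decode(ranked[0])))
        parents = ranked[: pop_size // 2]   # selection operator (truncation)
        offspring = [ranked[0]]             # elitism: keep the best unchanged
        while len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            offspring.append(mutate(crossover(a, b, rng), 0.05, rng))
        population = offspring
    return population, best_per_generation


population, best_per_generation = evolve()
```

Only genotypes are inherited here; nothing a controller does during evaluation is written back into its genotype, which matches the non-Lamarckian assumption stated earlier in the deck.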

Page 13: Academic Course: 10 On-line adaptation, learning, evolution

Learning loop

[Diagram. Components: Genotype; Developmental Engine; Controller = phenotype; Robot behavior; Changes in environment; Reward; Fitness. Operators: Genetic operators (mutation & xover); Learning operator(s); Selection operator(s).]
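
A minimal sketch of a learning loop for a single robot follows. The learning operator shown is plain stochastic hill climbing, chosen only because it is short; the slides do not prescribe a particular operator. The `reward` function, its target value 0.7, the example genotype, and the step parameters are likewise hypothetical.

```python
import random
from typing import List


def decode(genotype: List[int]) -> float:
    # Developmental engine: bit string -> initial controller parameter in [0, 1].
    value = int("".join(map(str, genotype)), 2)
    return value / (2 ** len(genotype) - 1)


def reward(controller: float) -> float:
    # Hypothetical environment feedback: highest when the parameter is 0.7.
    return -abs(controller - 0.7)


def learn(controller: float, steps: int = 200, step_size: float = 0.05,
          seed: int = 0) -> float:
    rng = random.Random(seed)
    current_reward = reward(controller)
    for _ in range(steps):
        # Learning operator: propose a small change to the controller.
        candidate = controller + rng.uniform(-step_size, step_size)
        candidate_reward = reward(candidate)
        # Keep the change only if the reward does not get worse.
        if candidate_reward >= current_reward:
            controller, current_reward = candidate, candidate_reward
    return controller


genotype = [0, 0, 0, 1, 1, 0, 1, 0]
initial_controller = decode(genotype)
learned_controller = learn(initial_controller)
```

The loop changes the controller (phenotype) during the robot's lifetime and never touches the genotype.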

Page 14: Academic Course: 10 On-line adaptation, learning, evolution

[Diagram: the AGENT sends action a(t) to the ENVIRONMENT; the ENVIRONMENT returns state s(t) and reward r(t) to the AGENT.]

Page 15: Academic Course: 10 On-line adaptation, learning, evolution

Reinforcement learning

Agent in situation/state $s_t$ chooses action $a_t$.

World changes to situation/state $s_{t+1}$.

Agent perceives situation $s_{t+1}$ and gets reward $r_{t+1}$.

What tells the agent what to do is its policy:

$\pi_t(s, a) = \Pr\{a_t = a \mid s_t = s\}$

Given that the situation at time $t$ is $s$, the policy gives the probability that the agent's action will be $a$.

For example: $\pi_t(s, \text{goforward}) = 0.5$, $\pi_t(s, \text{gobackward}) = 0.5$.

Reinforcement learning ⇒ get/find/learn the policy.
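
To make the policy definition concrete, the sketch below stores a policy as a table of action probabilities per state and runs the agent-environment interaction for a fixed number of steps. The one-state world in `step` and its reward values are invented for illustration. The policy is fixed at the 0.5/0.5 example from the slide; no learning takes place here.

```python
import random
from typing import Dict, List, Tuple

# policy[state][action] = Pr{a_t = action | s_t = state}
Policy = Dict[str, Dict[str, float]]

policy: Policy = {"s": {"goforward": 0.5, "gobackward": 0.5}}


def choose_action(policy: Policy, state: str, rng: random.Random) -> str:
    # Sample an action according to the probabilities the policy assigns.
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def step(state: str, action: str) -> Tuple[str, float]:
    # Hypothetical one-state world: going forward is rewarded, going back is not.
    reward = 1.0 if action == "goforward" else 0.0
    return "s", reward


rng = random.Random(0)
state = "s"
history: List[Tuple[str, float]] = []
for t in range(1000):
    action = choose_action(policy, state, rng)   # agent in s_t chooses a_t
    state, r = step(state, action)               # world moves to s_{t+1}, gives r_{t+1}
    history.append((action, r))

forward_share = sum(a == "goforward" for a, _ in history) / len(history)
```

With this fixed policy, `forward_share` should land near 0.5. A reinforcement learning method would instead adjust the entries of `policy` from the observed rewards.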

Page 16: Academic Course: 10 On-line adaptation, learning, evolution

Further reading

• Evert Haasdijk, A.E. Eiben, and Alan F.T. Winfield, Individual, Social and Evolutionary Adaptation in Collective Systems, in Serge Kernbach (ed.), Handbook of Collective Robotics, Pan Stanford, 2011