
Slide 1

Learning Behavior-Selection by Emotions and Cognition in a Multi-Goal Robot Task

Sandra Clara Gadanho

Presented by Jamie Levy

Slide 2

Purpose

Build an autonomous robot controller which can learn to master a complex task when situated in a realistic environment:
- Continuous time and space
- Noisy sensors
- Unreliable actuators

Slide 3

Possible problems for the learning algorithm:
- Multiple goals may conflict with each other.
- Situations in which the agent needs to temporarily overlook one goal to accomplish another.
- Short-term and long-term goals.

Slide 4

Possible problems for the learning algorithm (cont):
- May need a sequence of different behaviors to accomplish one goal.
- Behaviors are unreliable.
- A behavior's appropriate duration is undetermined; it depends on the environment and on its success.

Slide 5

Emotion-based Architecture

Traditional RL adaptive system complemented with an emotion system responsible for behavior switching.

- Innate emotions define goals.
- The agent learns emotion associations of environment-state and behavior pairs to determine its decisions.
- Q-learning is used to learn a behavior-selection policy, which is stored in neural networks (see the sketch below).
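A minimal sketch of this kind of setup, assuming one small network per hand-designed behavior, a state vector of homeostatic and perceptual values, and illustrative hyperparameters; the names (BehaviorQNet, q_learning_step) are mine, not the paper's:

```python
import numpy as np

class BehaviorQNet:
    """One-hidden-layer regressor approximating Q(state, this behavior)."""
    def __init__(self, n_inputs, n_hidden=8, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.3, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.3, size=n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def q(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return float(h @ self.w2 + self.b2)

    def update(self, x, target):
        # One gradient step on the squared TD error for this (state, behavior) pair.
        h = np.tanh(x @ self.w1 + self.b1)
        err = float(h @ self.w2 + self.b2) - target
        grad_h = err * self.w2 * (1.0 - h ** 2)   # backprop through tanh
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h

def q_learning_step(nets, state, behavior, reward, next_state, gamma=0.9):
    """Q-learning target: reward + gamma * max over behaviors of Q(next_state, b)."""
    target = reward + gamma * max(net.q(next_state) for net in nets)
    nets[behavior].update(state, target)
```

Here `nets` would hold one network per available behavior (avoid obstacles, seek light, wall following), and `state`/`next_state` are the sensor-derived input vectors.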

Slide 6

ALEC – Asynchronous Learning by Emotion and Cognition

- Augments the EB architecture with a cognitive system, which has explicit rule knowledge extracted from environment interactions.
- Is based on the CLARION model by Sun and Peterson (1998).
- Allows learning the decision rules in a bottom-up fashion.

Slide 7

ALEC architecture (cont)

- ALEC I: cognitive system directly inspired by the top level of the CLARION model.
- ALEC II: has some changes (to be discussed later).
- ALEC III: the emotion system learns about goal states exclusively, while the cognitive system learns about goal-state transitions.
- LEC (Learning by Emotion and Cognition): the non-asynchronous variant, used to test the usefulness of behavior switching.

Slide 8

EB II

Replaces the emotional model with a goal system.

The goal system is based on a set of homeostatic variables that the agent attempts to maintain within certain bounds.

Slide 9

The EB II architecture is composed of two parts:
- Goal System
- Adaptive System

Slide 10

Perceptual Values

- Light intensity
- Obstacle density
- Energy availability: indicates whether a nearby source is releasing energy

Slide 11

Behavior System

Three hand-designed behaviors to select from:
- Avoid obstacles
- Seek light
- Wall following

These behaviors are not designed to be very reliable and may fail (e.g., wall following may lead to a crash).

Slide 12

Goal System

Responsible for deciding when behavior switching should occur.

Goals are explicitly identified and associated with homeostatic variables.

Slide 13

Three different states:
- Target
- Recovery
- Danger

Slide 14

Homeostatic Variables

A variable remains in its target state as long as its values are optimal or acceptable.

A well-being variable is derived from the homeostatic variables.

Each variable has an effect on well-being.

Slide 15

Homeostatic Variables

- Energy: reflects the goal of maintaining the robot's energy.
- Welfare: maintains the goal of avoiding collisions.
- Activity: ensures the agent keeps moving; otherwise its value slowly decreases and the target state is not maintained.

Slide 16

Well-Being

State Change – when a homeostatic variable changes from one state to another, the well-being is positively influenced.

Predictions of State Change – when some perceptual cue predicts the state change of a homeostatic variable, the influence is similar to the above, but lower in value.

These are modeled after emotions and may describe “pain” or “pleasure.”

Slide 17

Well-Being (cont)

cs = state coefficient
rs = influence of the state on well-being

Slide 18

Well-Being (cont)

ct(sh) = state transition coefficient
wh = weight of the homeostatic variable:
- 1.0 for energy
- 0.6 for welfare
- 0.4 for activity

Slide 19

Well-Being (cont)

cp = prediction coefficient
rph = value of the prediction

Predictions are only considered for the energy and activity variables.
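The well-being equation itself appears only as images in the original slides. Combining the coefficient definitions from slides 17-19, one plausible weighted-sum form is sketched below; this is a reconstruction, not the paper's verbatim formula:

```latex
% Hedged reconstruction: weighted sum over homeostatic variables h of a state
% term, a state-transition term and a prediction term.
W_b = \sum_{h} w_h \left( c_s \, r_{s_h} + c_t(s_h) + c_p \, r_{p_h} \right)
```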

Slides 20-21: [no transcribed content]

Slide 22

Well-being calculation (cont)

Slide 23

Well-being calculation - Prediction

Values of rph depend on the strengths of the current predictions and vary between -1 (for predictions of no desirable change) and 1.

If there is no prediction, rph = 0.

Slide 24

Well-being calculation - Prediction

Slide 25

Well-being calculation - Prediction

The activity prediction provides a no-progress indicator, given at regular time intervals when the activity of the robot has been low for a long period: rp(activity) = -1.

There is no prediction for welfare: rp(welfare) = 0.
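A small sketch of how these prediction terms might be computed each step, assuming the energy-availability cue is used directly as the strength of the energy prediction (that mapping, and the function name, are illustrative assumptions):

```python
def prediction_values(energy_availability, no_progress_signal):
    """Return r_p for each homeostatic variable, each in [-1, 1].

    energy_availability: perceptual cue in [0, 1] indicating that a nearby
        source is releasing energy (used here as the prediction strength).
    no_progress_signal: True at the regular intervals when robot activity
        has stayed low for a long period.
    """
    return {
        "energy": energy_availability,                     # predicts a desirable change
        "activity": -1.0 if no_progress_signal else 0.0,   # no-progress indicator
        "welfare": 0.0,                                    # no prediction for welfare
    }
```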

Slide 26

Adaptive System

- Uses Q-learning.
- The state information fed to the neural networks comprises the homeostatic variable values and other perceptual values gathered from the sensors.

Slide 27

Adaptive System (cont)

The developed controller tries to maximize the reinforcement received by selecting one of the available hand-designed behaviors.

Slide 28

Adaptive System (cont)

- The agent may select between performing the behavior that has proven better in the past or an arbitrary one.
- The selection function is based on the Boltzmann-Gibbs distribution (p. 30 in the class textbook); a sketch follows below.
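A minimal sketch of Boltzmann (softmax) selection over the current Q-values; the temperature value here is an illustrative assumption, not the paper's setting:

```python
import numpy as np

def boltzmann_select(q_values, temperature=0.1, rng=None):
    """Pick a behavior index with probability proportional to exp(Q / T)."""
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    logits = (q - q.max()) / temperature   # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(q), p=probs)
```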

Slide 29

EB II Architecture

Slide 30

ALEC Architecture

Slide 31

ALEC I

- Inspired by the CLARION model.
- Each individual rule consists of a condition for activation and a behavior suggestion.
- The activation condition is dictated by a set of intervals, one for each dimension of the input space.
- Six input dimensions, each varying between 0 and 1, with interval granularity of 0.2 (see the sketch below).
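A sketch of such a rule as a data structure: one interval per input dimension (with boundaries on the 0.2 grid) plus a behavior suggestion. Class and field names are illustrative, not the paper's:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Rule:
    """ALEC-I-style rule: a per-dimension interval condition plus a suggested behavior.

    The six dimensions are energy, activity, welfare, light intensity,
    obstacle density and energy availability, each in [0, 1].
    """
    condition: Dict[str, Tuple[float, float]]   # dimension -> (low, high), on the 0.2 grid
    behavior: str                               # suggested behavior
    successes: int = 0
    applications: int = 0

    def matches(self, state: Dict[str, float]) -> bool:
        # The rule fires when every input value falls inside its interval.
        return all(lo <= state[d] <= hi for d, (lo, hi) in self.condition.items())

    def success_rate(self) -> float:
        return self.successes / self.applications if self.applications else 0.0
```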

Slide 32

ALEC I (cont)

- A condition interval may only start or end at pre-defined points of the input space.
- Since this may lead to a large number of possible states, rule learning is limited to the few cases with successful behavior selection.
- Other cases are left to the emotion system, which uses its generalization abilities to cover the state space.

Slide 33

ALEC I (cont)

Successful behaviors in particular states are used to extract rules corresponding to the decisions made, and these rules are added to the agent's rule set.

If the same decision is made again, the agent updates the Success Rate (SR) of that rule.

Slide 34

ALEC I – Success

- r = immediate reinforcement.
- The difference in Q-value between state x, where decision a was made, and the resulting state y.
- Tsuccess = 0.2 (constant threshold; see the sketch below).
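One plausible reading of this success test; whether and how the next-state value is discounted is not stated on the slide, so the exact form below is an assumption:

```python
T_SUCCESS = 0.2   # constant threshold from the slide

def is_successful(r, q_x_a, max_q_y, gamma=0.9):
    """A decision counts as successful when the immediate reinforcement plus the
    Q-value difference between resulting state y and deciding state x exceeds
    the threshold."""
    return r + gamma * max_q_y - q_x_a > T_SUCCESS
```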

Slide 35

ALEC I (cont) – Rule expansion, shrinkage

- If a rule is often successful, the agent tries to generalize it to cover nearby environmental states.
- If a rule performs very poorly, the agent makes it more specific.
- If it still does not improve, the rule is deleted.
- A maximum of 100 rules is kept.

Slide 36

ALEC I (cont) – Rule expansion, shrinkage

- Statistics are kept for the success rate of every possible one-state expansion or shrinkage of the rule, in order to select the best option.
- A rule is compared to a "match all" rule (rule_all) with the same behavior suggestion, and against itself after the best expansion or shrinkage (rule_exp, rule_shrink).

Slide 37

ALEC I (cont) – Rule expansion, shrinkage

A rule is expanded if it is significantly better than the match-all rule and the expanded rule is better or equal to the original rule.

A rule that is insufficiently better than the match-all rule is shrunk if this results in an improvement or otherwise is deleted.

Slide 38

ALEC I (cont) – Rule expansion, shrinkage

Slide 39

Rule expansion, shrinkage (cont)

Constant thresholds (used in the sketch below):
- Tsuccess = 0.2
- Texpand = 2.0
- Tshrunk = 1.0
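The exact comparison statistic appears only as an image in the slides; the sketch below encodes one plausible reading, comparing success-rate ratios against the thresholds above (written T_EXPAND and T_SHRINK here). Every comparison in it is an assumption for illustration:

```python
# One plausible reading of the expansion/shrinkage decision, assuming the
# comparison statistic is a success-rate ratio against the match-all rule.
T_EXPAND, T_SHRINK = 2.0, 1.0

def revise_rule(sr_rule, sr_all, sr_exp, sr_shrink):
    """Return 'expand', 'shrink', 'delete' or 'keep' for a rule.

    sr_rule   -- success rate of the rule itself
    sr_all    -- success rate of the match-all rule with the same behavior
    sr_exp    -- success rate of the best one-state expansion
    sr_shrink -- success rate of the best one-state shrinkage
    """
    eps = 1e-9
    ratio = sr_rule / (sr_all + eps)
    if ratio >= T_EXPAND and sr_exp >= sr_rule:
        return "expand"   # clearly better than match-all; expansion is not worse
    if ratio < T_SHRINK:
        return "shrink" if sr_shrink > sr_rule else "delete"
    return "keep"
```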

Slide 40

ALEC I (cont) – Rule expansion, shrinkage

- A rule that performs badly is deleted.
- A rule is also deleted if its condition has not been met for a while.
- When two rules propose the same behavior selection and their conditions are sufficiently similar, they are merged into a single rule.
- The success rate is reset whenever a rule is modified by merging, expansion or shrinkage.

Slide 41

Cognitive System (cont)

If the cognitive system has a rule that applies to the current environmental state, then it influences the behavior decision: an arbitrary constant of 1.0 is added to the respective Q-value before the stochastic behavior selection is made (see the snippet below).
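A short sketch of that bias step, reusing the `Rule` and `boltzmann_select` sketches from the earlier slides; only the 1.0 constant comes from the slide, the rest is illustrative:

```python
RULE_BIAS = 1.0   # arbitrary constant from the slide

def select_behavior(q_values, rules, state, behaviors, temperature=0.1):
    """Boltzmann selection over Q-values, biased toward behaviors suggested by matching rules."""
    biased = list(q_values)
    for rule in rules:
        if rule.matches(state):                          # rule applies to current state
            biased[behaviors.index(rule.behavior)] += RULE_BIAS
    return boltzmann_select(biased, temperature)
```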

Slide 42

ALEC Architecture

Slide 43: [no transcribed content]

Slide 44

Example of a Rule – execute "avoid obstacles."

Six input dimensions segmented with 0.2 granularity (0, 0.2, 0.4, 0.6, 0.8, 1):
- energy = [0.6, 1]
- activity = [0, 1]
- welfare = [0, 0.6]
- light intensity = [0, 1]
- obstacle density = [0.8, 1]
- energy availability = [0, 1]
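Expressed with the `Rule` sketch from the ALEC I slides, this example would look roughly like:

```python
avoid_obstacles_rule = Rule(
    condition={
        "energy": (0.6, 1.0),
        "activity": (0.0, 1.0),
        "welfare": (0.0, 0.6),
        "light intensity": (0.0, 1.0),
        "obstacle density": (0.8, 1.0),
        "energy availability": (0.0, 1.0),
    },
    behavior="avoid obstacles",
)
# Fires when energy is fairly high, welfare is relatively low and obstacle density is high.
```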

Slide 45

ALEC II

Instead of the above success function, the agent considers a behavior successful if there is a positive homeostatic variable transition: a variable's state changes to the target state from the danger state.

Slide 46

ALEC III

Same as ALEC II, except that well-being depends neither on state transitions nor on predictions:
- ct(sh) = 0
- cp = 0

Slide 47

Experiments

The goal of ALEC is to allow an agent faced with realistic world conditions to adapt on-line and autonomously to its environment, coping with:
- Continuous time and space
- Limited memory
- Time constraints
- Noisy sensors
- Unreliable actuators

Slide 48

Khepera Robot

- Left and right wheel motors
- 8 infrared sensors that allow it to detect object proximity and ambient light: 6 in the front, 2 in the rear

Slide 49

Experiment (cont)

Slide 50

Goals

- Maintain energy
- Avoid obstacles
- Move around in the environment (not as important as the first two)

Slide 51

Energy Acquisition

- The robot must overlook the goal of avoiding obstacles: it must bump into the source.
- Energy is available for a short period, so the robot must look for new sources.
- Energy being received is indicated by high light values in the rear sensors.

Slide 52

Procedure

Each experiment consisted of:
- 100 different robot trials of 3 million simulation steps each.
- In each trial, a new, fully recharged robot with all state values reset was placed at a randomly selected starting position.
- For evaluation, the trial period was divided into 60 smaller periods of 50,000 steps.

Slide 53

Procedure (cont)

For each of these periods the following were recorded:
- Reinforcement – mean of the reinforcement (well-being) value calculated at each step.
- Energy – mean energy level of the robot.
- Distance – mean value of the Euclidean distance d taken at 100-step intervals (approximately the number of steps needed to move between corners of the environment).
- Collisions – percentage of steps involving collisions.

Slide 54

Results

Pairs of controllers were compared using a randomized analysis of variance (RANOVA) by Piater (1999)

Slide 55

Results (cont)

The most important contribution to reinforcement is the state value.

For the successful accomplishment of the task and goals, all homeostatic variables should be taken into consideration in the reinforcement. Agents without:
- Energy-dependent reinforcement fail in their main task of maintaining energy levels.
- Welfare-dependent reinforcement show increased collisions.
- Activity-dependent reinforcement move only as a last resort (to avoid collisions).

Slide 56

Results (cont)

Predictions of state transitions proved essential for an agent to accomplish its tasks:
- A controller with no energy prediction is unable to acquire energy.
- A controller with no activity prediction will eventually stop moving.

Slide 57

Results – EB, EBII and Random

The first set of graphs deals with three different agents:
- EB – discussed in an earlier paper
- EB II
- Random – selects randomly among the available behaviors at regular intervals

Slide 58

Results – EB, EBII and Random

Slides 59-66: [results graphs; no transcribed content]

Slide 67

Conclusion

The emotion and cognitive systems can improve learning, but they are unable to store and consult all of the single events the agent experiences.

The emotion system gives a "sense" of what is right, while the cognitive system constructs a model of reality and corrects the emotion system when it reaches incorrect conclusions.

Slide 68

Future work

Adding more specific knowledge to the cognitive system, which may then be used for planning more complex tasks.