Learning for Physically Diverse Robot Teams Robot Teams - Chapter 7 CS8803 Autonomous Multi-Robot...

Learning for Physically Diverse Robot Teams Robot Teams - Chapter 7 CS8803 Autonomous Multi-Robot Systems 10/3/02


Page 1:

Learning for Physically Diverse Robot Teams
Robot Teams - Chapter 7

CS8803 Autonomous Multi-Robot Systems
10/3/02

Page 2:

Motivations

• Robots are cool.
• Robot teams are cooler.
• Robots are hard to program/control.
• Robot teams are even harder.

Page 3:

Motivations

• Robotic soccer - hard!

Page 4:

Motivations

• Diagnose and rebuild the transmission of this 1969 Jaguar E-Type - Really Hard!

Page 5:

Motivations

• Answer: Robot Learning!

Page 6:

Motivations

• Challenges:
  – Very large state spaces
  – Uncertain credit assignments
  – Limited training time
  – Uncertainty in sensing and shared info
  – Non-deterministic actions
  – Difficulty in defining appropriate abstractions for learned info
  – Difficulty of merging info from different robot experiences

Page 7:

Motivations

• Benefits
  – Increased robustness
  – Reduced complexity
  – Increased ease of adding new assets to team

Page 8:

Motivations

• 4 types of learning in robotic systems:
  – Learning numerical functions for calibrations or parameter adjustments
  – Learning about the world
  – Learning to coordinate behaviors
  – Learning new behaviors

Page 9:

Learning New Cooperative Behaviors

• Inherently cooperative tasks are difficult to learn!
  – Utility of the action of a robot dependent on the actions of other robots
  – Soccer a good example
  – Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT)
    • Scalable

Page 10:

Learning New Cooperative Behaviors

• CMOMMT Application:
  – S: a 2-D bounded, enclosed spatial region
  – V: a team of m robot vehicles w/ 360° field of view w/ limited range
  – O(t): a set of n targets in region S at time t
  – B(t): a matrix such that Bij = 1 if robot i is observing target j at time t
  – Sensor coverage is much less than region area

Page 11:

Learning New Cooperative Behaviors

• Goal: develop algorithm A-CMOMMT
  – Maximize average number of targets observed at any given time.
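
The objective can be sketched in a few lines of Python. This is a minimal illustration, assuming each B(t) is stored as a list of 0/1 rows (one row per robot) and that a target counts once even when several robots see it. The function name `average_targets_observed` and the sample data are made up for the example.

```python
def average_targets_observed(b_history):
    """Average, over time steps, of the number of targets seen by >= 1 robot.

    b_history: list of matrices B(t); b[i][j] == 1 if robot i observes target j.
    """
    if not b_history:
        return 0.0
    total = 0
    for b in b_history:
        n_targets = len(b[0]) if b else 0
        # A target is counted once even if several robots observe it.
        total += sum(1 for j in range(n_targets) if any(row[j] for row in b))
    return total / len(b_history)


# Two robots, three targets, two time steps (made-up data).
history = [
    [[1, 0, 0],
     [1, 1, 0]],   # t = 0: targets 0 and 1 are observed
    [[0, 0, 0],
     [0, 0, 1]],   # t = 1: only target 2 is observed
]
score = average_targets_observed(history)
```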

Page 12:

Learning New Cooperative Behaviors

• Human-Generated Solution
  – Local force vectors
    • Targets attract
    • Teammates repel
  – Magnitude dependent on distance from robot
  – Weight reduced if target already being observed
  – Direction given by summing vectors
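
A rough sketch of the force-vector idea follows. The slide only says that magnitude depends on distance and that the weight drops for already-observed targets, so the linear magnitude profiles, the `sensor_range` cutoff, and the `observed_weight` of 0.5 are all assumptions chosen for illustration, as is the function name `force_direction`.

```python
import math

def force_direction(robot, targets, teammates, observed,
                    sensor_range=10.0, observed_weight=0.5):
    """Sum attractive (target) and repulsive (teammate) vectors for one robot.

    robot, targets[k], teammates[k]: (x, y) tuples.
    observed[k]: True if target k is already watched by another robot.
    Returns a unit direction (dx, dy), or (0.0, 0.0) if the forces cancel.
    """
    fx = fy = 0.0
    for (tx, ty), seen in zip(targets, observed):
        dx, dy = tx - robot[0], ty - robot[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > sensor_range:
            continue
        # Assumed profile: attraction grows linearly with distance in range.
        mag = dist / sensor_range
        if seen:
            mag *= observed_weight   # reduced weight for watched targets
        fx += mag * dx / dist
        fy += mag * dy / dist
    for (mx, my) in teammates:
        dx, dy = robot[0] - mx, robot[1] - my
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > sensor_range:
            continue
        # Assumed profile: repulsion is strongest when the teammate is close.
        mag = 1.0 - dist / sensor_range
        fx += mag * dx / dist
        fy += mag * dy / dist
    norm = math.hypot(fx, fy)
    return (fx / norm, fy / norm) if norm > 0.0 else (0.0, 0.0)


direction = force_direction(
    robot=(0.0, 0.0),
    targets=[(5.0, 0.0)],
    teammates=[(0.0, -2.0)],
    observed=[False],
)
```

In this example the robot is pulled toward the target on its right and pushed away from the teammate below it, so the resulting direction points up and to the right.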

Page 13:

Learning New Cooperative Behaviors

• Results:

Page 14:

Learning New Cooperative Behaviors

• Distributed, Pessimistic Lazy Q-Learning
  – No a priori model
  – Reinforcement learning
  – Instance-based learning
  – Assumes lower bound on utility

Page 15:

Learning New Cooperative Behaviors

• Q-Learning
  – For each action/state pair, initialize Q(s,a) = 0
  – Observe state s.
  – Do:
    • Select an action and execute
    • Receive reward r
    • Observe new state s’
    • Update table entry for Q(s,a)
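
The loop above can be sketched as tabular Q-learning. The slide does not give the update rule, so this assumes the standard one-step rule Q(s,a) ← Q(s,a) + α[r + γ max Q(s’,·) − Q(s,a)]. The epsilon-greedy exploration, the parameter values, and the toy corridor environment are illustrative assumptions, not part of the slides.

```python
import random
from collections import defaultdict

def q_learning(step, actions, start_state, episodes=200, max_steps=50,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `step(s, a)` returns (reward, next_state, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)            # Q(s, a) = 0 for every pair initially
    for _ in range(episodes):
        s = start_state               # observe state s
        for _ in range(max_steps):
            # Select an action (epsilon-greedy, ties broken at random).
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, act)] for act in actions)
                a = rng.choice([act for act in actions if q[(s, act)] == best])
            r, s_next, done = step(s, a)   # execute, receive r, observe s'
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            # Assumed standard one-step update of the entry for Q(s, a).
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
            if done:
                break
    return q


# Toy corridor: states 0..3, reward 1 for reaching state 3.
def corridor_step(s, a):
    s_next = max(0, min(3, s + (1 if a == "right" else -1)))
    return (1.0 if s_next == 3 else 0.0), s_next, s_next == 3

q_table = q_learning(corridor_step, ["left", "right"], start_state=0)
```

After training on the corridor, moving right from the state next to the goal should have a higher value than moving left.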

Page 16:

Learning New Cooperative Behaviors

• Lazy Learning (instance-based learning)
  – Delays use of gathered info until necessary

[Diagram: World, State, Situation Matcher, randomly built look-up table of (state, action), Evaluation Function, Reinforcement Function, Action]

Page 17:

Learning New Cooperative Behaviors

• Pessimistic Algorithm
  – Rates utility of an action based on lower bound
    • Predict the state following each possible action in current state
    • Compute lower bound of utility of each new state
    • Choose action corresponding to highest lower bound
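
The three steps above can be sketched as a small selection routine. The slides do not say how the successor state is predicted or how the lower bound is computed, so `predict` and `utility_lower_bound` are hypothetical callables supplied by the caller. In the toy example the bound is simply the minimum of a few stored utility samples per state.

```python
def pessimistic_action(state, actions, predict, utility_lower_bound):
    """Pick the action whose predicted next state has the highest lower bound."""
    best_action, best_bound = None, float("-inf")
    for action in actions:
        next_state = predict(state, action)        # predicted successor state
        bound = utility_lower_bound(next_state)    # worst-case utility estimate
        if bound > best_bound:
            best_action, best_bound = action, bound
    return best_action, best_bound


# Toy example: states are integers and actions shift the state.
samples = {1: [0.4, 0.9], 2: [0.6, 0.7], 3: [0.1, 1.0]}   # made-up utilities
choice, bound = pessimistic_action(
    state=2,
    actions=[-1, 0, 1],
    predict=lambda s, a: s + a,
    utility_lower_bound=lambda s: min(samples.get(s, [0.0])),
)
```

Here state 3 has the best single sample (1.0) but also the worst (0.1), so the pessimistic rule prefers staying in state 2, whose worst case is 0.6.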

Page 18:

Learning New Cooperative Behaviors

• Results
  – Much better than random
  – Not as good as human-generated
  – Significant results

Page 19:

Learning New Cooperative Behaviors

• Q-Learning w/ VQQL and GLA
  – State space huge
  – Want generalized algorithm
  – 2 Phases
    • Learn quantizer
    • Learn Q function

Page 20:

Learning New Cooperative Behaviors

• Generalized Lloyd Algorithm
  – Clustering technique
    • Converts continuous state space to discrete
  – Takes set T of M states
  – Returns set C of N states
  – Stopping criterion: (D_m - D_{m+1}) / D_m < a small threshold
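
A compact sketch of a Lloyd-style quantizer design is given below. It assumes squared Euclidean distance as the distortion measure, a random choice of N samples as the initial codebook, and a default threshold of 1e-3. None of these specifics come from the slides, and the function name `generalized_lloyd` is illustrative.

```python
import random

def generalized_lloyd(samples, n_codes, threshold=1e-3, max_iter=100, seed=0):
    """Cluster M sample states into N prototype states (a codebook)."""
    rng = random.Random(seed)
    # Assumed initialization: N distinct samples drawn at random.
    codebook = [list(v) for v in rng.sample(samples, n_codes)]
    prev_distortion = None
    for _ in range(max_iter):
        # Assign every sample to its nearest prototype; accumulate distortion.
        groups = [[] for _ in codebook]
        distortion = 0.0
        for v in samples:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
            k = dists.index(min(dists))
            groups[k].append(v)
            distortion += dists[k]
        distortion /= len(samples)
        # Move each prototype to the centroid of its group (empty ones stay).
        for k, group in enumerate(groups):
            if group:
                codebook[k] = [sum(col) / len(group) for col in zip(*group)]
        # Stop when the relative drop in average distortion is small.
        if prev_distortion is not None:
            drop = prev_distortion - distortion
            if prev_distortion == 0.0 or drop / prev_distortion < threshold:
                break
        prev_distortion = distortion
    return codebook


# Six 2-D states forming two obvious clusters (made-up data).
states = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = generalized_lloyd(states, n_codes=2)
```

For this data the two prototypes settle on the centroids of the two clusters.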

Page 21:

Learning New Cooperative Behaviors

• Vector Quantization for Q-Learning
  – Obtain a set T of examples of states
  – Design a vector quantizer C using T with GLA
  – Learn the Q function
    • Choose an action following an exploration strategy
    • Receive experience tuple <s, a, s’, r>
    • Quantize the tuple obtaining <s^, a, s^’, r>
    • Update the Q table
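
The quantize-then-update step can be sketched as follows. This assumes nearest-neighbour quantization with squared Euclidean distance and the standard one-step Q update. The codebook values, the action names, and the helper names `quantize` and `vqql_update` are illustrative only.

```python
from collections import defaultdict

def quantize(state, codebook):
    """Index of the nearest codebook vector (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(state, c)) for c in codebook]
    return dists.index(min(dists))

def vqql_update(q, codebook, actions, experience, alpha=0.5, gamma=0.9):
    """Quantize one tuple <s, a, s', r> and update the Q table in place."""
    s, a, s_next, r = experience
    s_hat, s_next_hat = quantize(s, codebook), quantize(s_next, codebook)
    best_next = max(q[(s_next_hat, b)] for b in actions)
    # Assumed standard one-step Q-learning update on the discrete indices.
    q[(s_hat, a)] += alpha * (r + gamma * best_next - q[(s_hat, a)])
    return s_hat, s_next_hat


codebook = [(0.0, 0.0), (10.0, 10.0)]   # stands in for a quantizer built with GLA
q = defaultdict(float)
indices = vqql_update(q, codebook, ["stay", "move"],
                      experience=((0.2, -0.1), "move", (9.7, 10.4), 1.0))
```

The continuous states (0.2, -0.1) and (9.7, 10.4) map to codebook indices 0 and 1, so the Q table only ever needs one row per prototype state.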

Page 22:

Learning New Cooperative Behaviors

• 2 experiments
  – Local reward function
  – Collaborative reward function

Page 23:

Learning New Cooperative Behaviors

• Results
  – Competitive
  – Can handle higher dimension state spaces

Page 24:

Learning for Parameter Adjustment

• Need robots to perform life-long tasks
  – Environmental changes
  – Variations in robot capabilities
  – Heterogeneity
    • Overlap in capabilities
    • Change in heterogeneity

Page 25:

Learning for Parameter Adjustment

• Problem def
  – R: set of n robots
  – T: set of m tasks
  – Ai: set of actions robot i can perform
  – H: set of functions hi: Ai -> T, each returning the task completed by an action in Ai
  – q(aij): quality metric for action aij
  – Ui: set of actions robot i performs in current mission

Page 26:

Learning for Parameter Adjustment

– Given R, T, Ai and H, determine set of actions Ui that optimizes the performance metric

Page 27:

Learning for Parameter Adjustment

• ALLIANCE overview
  – Completely distributed
  – Behaviors grouped into sets
    • Activated as a set
    • Controlled by high-level motivational behaviors
  – Impatience and Acquiescence thresholds
  – Broadcast communication

Page 28:

Learning for Parameter Adjustment

• L-ALLIANCE overview
  – Extension of ALLIANCE
    • Automatically updates motivational behaviors
  – 2 problems to solve:
    • How to give robots ability to obtain knowledge about the quality of team member performance
    • How to use team member performance knowledge to select a task to pursue

Page 29:

Learning for Parameter Adjustment

– Performance monitors
  • One for every behavior set
  • Monitors how self and others are performing

Page 30:

Learning for Parameter Adjustment

– Control phases
  • Active learning phase
    – Random choices
    – Maximally patient
    – Catalog monitors and update control parameters
  • Adaptive learning phase
    – Must make effort to accomplish mission
    – Acquiesce and become impatient quickly
    – Still catalog monitors and update control parameters

Page 31:

Learning for Parameter Adjustment

– Action Selection Strategy
  • At each iteration, robot ri divides remaining tasks into two categories
    – Tasks that ri expects to perform better than all others and are not being currently done
    – All other tasks ri can do
  • Robot ri repeats following until no tasks left to do
    – Select tasks from the first category, longest first until none left
    – Select tasks from second category, shortest first
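
One way to read the selection strategy is sketched below. It assumes "perform better" means a strictly lower expected completion time than every teammate, and that "longest" and "shortest" refer to the robot's own expected time. The function name `select_task`, the dictionary layout, and the task names are hypothetical.

```python
def select_task(my_times, teammate_times, in_progress):
    """Return the next task for robot r_i, or None if nothing is left.

    my_times: {task: expected time for r_i}, over remaining tasks r_i can do.
    teammate_times: {task: [expected times of teammates able to do it]}.
    in_progress: set of tasks currently being executed by some robot.
    """
    first, second = [], []
    for task, mine in my_times.items():
        others = teammate_times.get(task, [])
        best_here = all(mine < t for t in others)   # assumed meaning of "better"
        if best_here and task not in in_progress:
            first.append(task)      # category 1: r_i is best and task is free
        else:
            second.append(task)     # category 2: everything else r_i can do
    if first:
        return max(first, key=lambda t: my_times[t])    # longest first
    if second:
        return min(second, key=lambda t: my_times[t])   # shortest first
    return None


# Made-up expected times (seconds) for one robot and a single teammate.
my_times = {"push_left": 30, "push_right": 50, "scout": 20}
teammate_times = {"push_left": [40], "push_right": [60], "scout": [10]}
first_pick = select_task(my_times, teammate_times, in_progress={"push_left"})

remaining = {t: v for t, v in my_times.items() if t != first_pick}
second_pick = select_task(remaining, teammate_times, in_progress={"push_left"})
```

In the example the robot first takes `push_right` (it is the best robot for it and nobody is doing it), then falls back to the shortest remaining task, `scout`.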

Page 32:

Learning for Parameter Adjustment

• Results - Box Pushing
  – Experiment 1
    • 2 identical robots
    • 1 fails

Page 33:

Learning for Parameter Adjustment

  – Experiment 2
    • 2 different robots
    • Different capabilities
  – L-ALLIANCE capable of keeping teams working toward goal
    • Changes to composition
    • Changes to ability

Page 34:

Conclusions

• Lots of challenges left
• Rewards tantalizing
• Learning approaches not yet superior to human generated solutions

Page 35:

Questions?