Learning for Physically Diverse Robot Teams
Robot Teams - Chapter 7
CS8803 Autonomous Multi-Robot Systems
10/3/02
Motivations
• Robots are cool.
• Robot teams are cooler.
• Robots are hard to program/control.
• Robot teams are even harder.
Motivations
• Robotic soccer - hard!
Motivations
• Diagnose and rebuild the transmission of this 1969 Jaguar E-Type - Really Hard!
Motivations
• Answer: Robot Learning!
Motivations
• Challenges:
– Very large state spaces
– Uncertain credit assignment
– Limited training time
– Uncertainty in sensing and shared info
– Non-deterministic actions
– Difficulty in defining appropriate abstractions for learned info
– Difficulty of merging info from different robot experiences
Motivations
• Benefits:
– Increased robustness
– Reduced complexity
– Increased ease of adding new assets to the team
Motivations
• 4 types of learning in robotic systems:
– Learning numerical functions for calibration or parameter adjustment
– Learning about the world
– Learning to coordinate behaviors
– Learning new behaviors
Learning New Cooperative Behaviors
• Inherently cooperative tasks are difficult to learn!
– Utility of a robot's action depends on the actions of the other robots
– Soccer is a good example
– Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT)
• Scalable
Learning New Cooperative Behaviors
• CMOMMT Application:
– S: a 2-D bounded, enclosed spatial region
– V: a team of m robot vehicles with 360° field of view and limited sensing range
– O(t): a set of n targets in region S at time t
– B(t): a matrix such that B_ij(t) = 1 if robot i is observing target j at time t
– Sensor coverage is much less than the region area
Learning New Cooperative Behaviors
• Goal: develop algorithm A-CMOMMT
– Maximize the average number of targets observed at any given time (formalized in the sketch below)
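Using the definitions above, the objective can be written as a time-averaged count of observed targets. A hedged LaTeX sketch (the horizon T and the indicator g are notation introduced here, not from the slides):

\[
\max \; A = \frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{n} g(B(t), j),
\qquad
g(B(t), j) =
\begin{cases}
1 & \text{if } B_{ij}(t) = 1 \text{ for some robot } i \\
0 & \text{otherwise}
\end{cases}
\]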
Learning New Cooperative Behaviors
• Human-Generated Solution
– Local force vectors (see the sketch below)
• Targets attract
• Teammates repel
– Magnitude depends on distance from the robot
– Weight reduced if the target is already being observed
– Direction given by summing the vectors
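A minimal Python sketch of this force-vector policy; the magnitude profiles, range constants, and per-target weights are illustrative assumptions, not the published parameters:

    import numpy as np

    def attract(dist, sense_range):
        # Illustrative profile: targets attract within sensing range;
        # back off slightly when already very close.
        if dist > sense_range:
            return 0.0
        return -0.5 if dist < 0.1 * sense_range else 1.0

    def repel(dist, comm_range):
        # Teammates repel when nearby, spreading the team across targets.
        return -1.0 if dist < 0.25 * comm_range else 0.0

    def force_vector(robot, targets, weights, teammates,
                     sense_range=10.0, comm_range=20.0):
        """Sum weighted target attraction and teammate repulsion.
        weights[j] < 1 when target j is already being observed."""
        total = np.zeros(2)
        for tgt, w in zip(targets, weights):
            d = tgt - robot
            dist = np.linalg.norm(d)
            if dist > 0:
                total += w * attract(dist, sense_range) * d / dist
        for mate in teammates:
            d = mate - robot
            dist = np.linalg.norm(d)
            if dist > 0:
                total += repel(dist, comm_range) * d / dist
        return total  # desired heading for the next control step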
Learning New Cooperative Behaviors
• Results:
Learning New Cooperative Behaviors
• Distributed, Pessimistic Lazy Q-Learning
– No a priori model
– Reinforcement learning
– Instance-based learning
– Assumes a lower bound on utility
Learning New Cooperative Behaviors
• Q-Learning
– For each state/action pair, initialize Q(s,a) = 0
– Observe state s
– Loop (see the sketch below):
• Select an action and execute it
• Receive reward r
• Observe new state s'
• Update the table entry for Q(s,a)
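A minimal tabular Q-learning loop in Python; the environment interface (reset/step) and the learning-rate and discount values are assumptions for illustration:

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
        """Tabular Q-learning with epsilon-greedy exploration."""
        Q = defaultdict(float)                       # Q[(s, a)] starts at 0
        for _ in range(episodes):
            s = env.reset()                          # observe state s
            done = False
            while not done:
                # Select an action and execute it
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])
                s2, r, done = env.step(a)            # receive reward r, observe s'
                # Update the table entry for Q(s, a)
                best_next = max(Q[(s2, act)] for act in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q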
Learning New Cooperative Behaviors
• Lazy Learning (instance-based learning)
– Delays use of gathered info until necessary
– Randomly built look-up table of (state, action) pairs
[Block diagram in original slides relating the World, State, and Action to the Situation Matcher, Evaluation Function, Reinforcement Function, and the look-up table]
Learning New Cooperative Behaviors
• Pessimistic Algorithm
– Rates the utility of an action based on a lower bound:
• Predict the state following each possible action in the current state
• Compute a lower bound on the utility of each predicted state
• Choose the action with the highest lower bound (see the sketch below)
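A Python sketch of this pessimistic selection step; predict_next and utility_lower_bound are hypothetical helpers standing in for the learned transition model and the instance-based lower-bound estimate:

    def pessimistic_action(state, actions, predict_next, utility_lower_bound):
        """Choose the action whose predicted successor state has the
        highest worst-case (lower-bound) utility."""
        best_action, best_bound = None, float("-inf")
        for a in actions:
            s_next = predict_next(state, a)         # predicted following state
            bound = utility_lower_bound(s_next)     # pessimistic utility
            if bound > best_bound:
                best_action, best_bound = a, bound
        return best_action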
Learning New Cooperative Behaviors
• Results:
– Much better than random
– Not as good as the human-generated solution
– Significant results
Learning New Cooperative Behaviors
• Q-Learning w/ VQQL and GLA
– State space is huge
– Want a generalized algorithm
– 2 phases:
• Learn the quantizer
• Learn the Q function
Learning New Cooperative Behaviors
• Generalized Lloyd Algorithm (GLA)
– Clustering technique
• Converts a continuous state space to a discrete one
– Takes a set T of M states
– Returns a set C of N codeword states
– Stopping criterion: (D_m - D_{m+1}) / D_m < ε, for a small threshold ε (see the sketch below)
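A compact Python sketch of a Lloyd-style iteration with the stopping rule above; random initialization and Euclidean distortion are simplifying assumptions:

    import numpy as np

    def gla(T, N, eps=1e-3, seed=0):
        """Quantize the M states in T (an M x d float array) to N codewords.
        Stops when (D_m - D_{m+1}) / D_m < eps."""
        rng = np.random.default_rng(seed)
        C = T[rng.choice(len(T), N, replace=False)]      # initial codebook
        D_prev = np.inf
        while True:
            # Assign each state to its nearest codeword
            dists = np.linalg.norm(T[:, None, :] - C[None, :, :], axis=2)
            nearest = dists.argmin(axis=1)
            D = dists[np.arange(len(T)), nearest].mean() # current distortion
            if np.isfinite(D_prev) and (D_prev - D) / D_prev < eps:
                return C
            # Move each codeword to the centroid of its assigned states
            for k in range(N):
                members = T[nearest == k]
                if len(members):
                    C[k] = members.mean(axis=0)
            D_prev = D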
Learning New Cooperative Behaviors
• Vector Quantization for Q-Learning (VQQL)
– Obtain a set T of example states
– Design a vector quantizer C from T using GLA
– Learn the Q function:
• Choose an action following an exploration strategy
• Receive the experience tuple <s, a, s', r>
• Quantize the tuple, obtaining <s^, a, s^', r>
• Update the Q table (see the sketch below)
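A sketch of one VQQL learning step, combining the GLA codebook with the tabular update; the quantize helper and the hyperparameter values are assumptions:

    import numpy as np

    def quantize(C, s):
        """Map a continuous state to the index of its nearest codeword."""
        return int(np.linalg.norm(C - s, axis=1).argmin())

    def vqql_update(Q, C, s, a, s2, r, actions, alpha=0.1, gamma=0.9):
        """Quantize <s, a, s', r> to <s^, a, s^', r>, then apply the
        standard Q-learning update on the discrete codes."""
        sq, sq2 = quantize(C, s), quantize(C, s2)
        best_next = max(Q[(sq2, act)] for act in actions)
        Q[(sq, a)] += alpha * (r + gamma * best_next - Q[(sq, a)])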
Learning New Cooperative Behaviors
• 2 experiments:
– Local reward function
– Collaborative reward function
Learning New Cooperative Behaviors
• Results:
– Competitive
– Can handle higher-dimensional state spaces
Learning for Parameter Adjustment
• Need robots to perform life-long tasks:
– Environmental changes
– Variations in robot capabilities
– Heterogeneity
• Overlap in capabilities
• Changes in heterogeneity over time
Learning for Parameter Adjustment
• Problem definition:
– R: set of n robots
– T: set of m tasks
– A_i: set of actions robot i can perform
– H: set of functions H_i : A_i -> T returning the task accomplished by each action
– q(a_ij): quality metric for robot i performing action a_ij
– U_i: set of actions robot i performs in the current mission
Learning for Parameter Adjustment
– Given R, T, A_i, and H, determine the sets of actions U_i that optimize the performance metric (one possible formalization is sketched below)
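One natural reading of this metric, as a hedged LaTeX sketch (the additive quality objective and the full-coverage constraint are assumptions; the slides leave the metric abstract):

\[
\max_{U_1, \dots, U_n} \; \sum_{i=1}^{n} \sum_{a \in U_i} q(a)
\quad \text{subject to} \quad
\bigcup_{i=1}^{n} \{ H_i(a) : a \in U_i \} = T
\]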
Learning for Parameter Adjustment
• ALLIANCE overview
– Completely distributed
– Behaviors grouped into behavior sets
• Activated as a set
• Controlled by high-level motivational behaviors (sketched below)
– Impatience and acquiescence thresholds
– Broadcast communication
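A simplified Python sketch of how a motivational behavior might gate its behavior set; the linear impatience growth and reset rule follow the spirit of ALLIANCE, but the exact update terms here are illustrative assumptions:

    class MotivationalBehavior:
        """Gates one behavior set: motivation grows with impatience and
        activates the set once it crosses the threshold."""
        def __init__(self, impatience_rate, threshold):
            self.impatience_rate = impatience_rate
            self.threshold = threshold
            self.motivation = 0.0

        def update(self, teammate_doing_task, acquiesce):
            if teammate_doing_task or acquiesce:
                self.motivation = 0.0        # reset: yield the task
            else:
                self.motivation += self.impatience_rate
            return self.motivation >= self.threshold   # True => activate set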
Learning for Parameter Adjustment
• L-ALLIANCE overview
– Extension of ALLIANCE
• Automatically updates the motivational behaviors
– 2 problems to solve:
• How to give robots the ability to obtain knowledge about the quality of team members' performance
• How to use that performance knowledge to select a task to pursue
Learning for Parameter Adjustment
– Performance monitors
• One for every behavior set
• Monitor how self and others are performing
Learning for Parameter Adjustment
– Control phases:
• Active learning phase
– Random task choices
– Maximally patient
– Catalog monitor data and update control parameters
• Adaptive learning phase
– Must make a real effort to accomplish the mission
– Acquiesces and becomes impatient quickly
– Still catalogs monitor data and updates control parameters
Learning for Parameter Adjustment
– Action Selection Strategy (see the sketch below)
• At each iteration, robot r_i divides the remaining tasks into two categories:
– Tasks that r_i expects to perform better than all others and that are not currently being done
– All other tasks r_i can do
• Robot r_i repeats the following until no tasks are left:
– Select tasks from the first category, longest first, until none are left
– Then select tasks from the second category, shortest first
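A Python sketch of this greedy two-category loop; the Task fields and the expects_to_beat_all predicate are hypothetical stand-ins for L-ALLIANCE's learned performance estimates:

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        expected_duration: float    # this robot's learned time estimate
        in_progress: bool           # is another robot already doing it?

    def select_next_task(tasks, expects_to_beat_all):
        """Category 1 (robot expects to be best, task unclaimed): longest
        first. Category 2 (all other doable tasks): shortest first."""
        cat1 = [t for t in tasks if expects_to_beat_all(t) and not t.in_progress]
        if cat1:
            return max(cat1, key=lambda t: t.expected_duration)
        if tasks:
            return min(tasks, key=lambda t: t.expected_duration)
        return None   # nothing left to do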
Learning for Parameter Adjustment
• Results - Box Pushing
– Experiment 1
• 2 identical robots
• 1 fails
Learning for Parameter Adjustment
– Experiment 2
• 2 different robots with different capabilities
– L-ALLIANCE is capable of keeping the team working toward the goal despite:
• Changes to team composition
• Changes to robot abilities
Conclusions
• Lots of challenges left
• Rewards are tantalizing
• Learning approaches are not yet superior to human-generated solutions
Questions?