17º International Congress of Mechanical Engineering, November 10–14, 2003 – Holiday Inn Select Jaraguá - Hotel

São Paulo - SP - Brazil

Authors:
• Areolino de Almeida Neto - UFMA
• Bodo Heimann - University of Hannover
• Luiz Carlos S. Góes - ITA
• Cairo L. Nascimento Jr. - ITA

Avoidance of Multiple Dynamic Obstacles



Objective

To drive a mobile robot along a safe path, using direction indications that avoid both dynamic and static obstacles.

[Figure: robot, obstacle, and goal]



Reinforcement Learning

Characteristics:
• Intuitive data
• Cumulative learning
• Constructive solution
• Direct knowledge acquisition
• Very well suited to decision making




States:
• Distance to the possible point of collision (4)
• Direction of the obstacle (8)
• Shortest distance between the obstacle and the robot's path (8)
• Time condition of arriving (3)




Actions:
• Lateral velocity: 3 to the right, 1 null, 3 to the left
• Frontal velocity: 3 ahead, 1 null, 3 back




States are mapped to actions using a coding scheme. There are 768 states.

For each state there are 49 possible actions and their corresponding “evaluation value”.

Training means creating the “evaluation values” for each state and the possible actions.
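
The counts on these slides are consistent with one another: 4 × 8 × 8 × 3 = 768 states and 7 × 7 = 49 actions. The slides do not describe the coding scheme itself, so the following is only a minimal sketch of one possible scheme (a mixed-radix index). The names `state_index`, `action_index` and `evaluation`, the ordering of the state variables, and the sign convention for velocities are all assumptions made for illustration.

```python
# Number of discrete levels per state variable, as given on the "States" slide.
STATE_LEVELS = (4, 8, 8, 3)
# 3 levels one way, 1 null, 3 levels the other way, for each velocity component.
VELOCITY_LEVELS = 7

N_STATES = 4 * 8 * 8 * 3                       # 768
N_ACTIONS = VELOCITY_LEVELS * VELOCITY_LEVELS  # 49


def state_index(collision_distance, obstacle_direction, path_clearance, arrival_condition):
    """Map the four discretized state variables to one index in 0..767.

    The mixed-radix ordering used here is an assumption, not the authors' scheme.
    """
    values = (collision_distance, obstacle_direction, path_clearance, arrival_condition)
    index = 0
    for value, levels in zip(values, STATE_LEVELS):
        if not 0 <= value < levels:
            raise ValueError(f"state value {value} outside 0..{levels - 1}")
        index = index * levels + value
    return index


def action_index(lateral, frontal):
    """Map (lateral, frontal) velocity levels, each in -3..3, to an index in 0..48."""
    if not (-3 <= lateral <= 3 and -3 <= frontal <= 3):
        raise ValueError("velocity levels must lie in -3..3")
    return (lateral + 3) * VELOCITY_LEVELS + (frontal + 3)


# One "evaluation value" per (state, action) pair; training fills this table.
evaluation = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
```

With this layout, `evaluation[state_index(...)][action_index(...)]` holds the evaluation value of one action in one state, and the table has 768 × 49 = 37632 entries.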




600/n10/at100f

Training using only one obstacle:
• 1st level: Monte Carlo (~450000 runs): direct, fast computation, evaluation function
• 2nd level: Q-learning: necessary in around 50 situations

t: duration of movement; a: number of actions; n: iteration number
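
The two training levels can be illustrated with the textbook tabular updates below. This is a hedged sketch, not the authors' implementation: the slides do not give the exact evaluation function, reward definition, or learning parameters, so `observed_return`, `reward`, `alpha` and `gamma` are placeholders, and both function names are hypothetical.

```python
def monte_carlo_update(table, counts, state, action, observed_return):
    """1st level (assumed form): running average of the returns observed
    for a (state, action) pair over many simulated runs."""
    counts[state][action] += 1
    n = counts[state][action]
    table[state][action] += (observed_return - table[state][action]) / n


def q_learning_update(table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """2nd level (standard Q-learning rule): move the stored value toward
    the reward plus the discounted best value of the next state.
    alpha and gamma are illustrative values, not taken from the slides."""
    target = reward + gamma * max(table[next_state])
    table[state][action] += alpha * (target - table[state][action])


# Tiny demonstration on a 2-state, 2-action table.
table = [[0.0, 0.0], [0.0, 5.0]]
counts = [[0, 0], [0, 0]]
monte_carlo_update(table, counts, 0, 0, 1.0)
monte_carlo_update(table, counts, 0, 0, 3.0)   # table[0][0] is now the mean, 2.0
q_learning_update(table, 0, 0, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
```

In the full problem the table would have 768 rows and 49 columns, with the Q-learning pass applied only to the situations where the Monte Carlo values proved insufficient.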



Obstacle Avoidance

Architecture:
• Use of an a priori path (static environment)
• Detection of a possibility of collision
• Classification of a collision: possible or immediate




Algorithm for Multiple Obstacle Avoidance:
• One obstacle is defined as the main obstacle, and actions are indicated based on it;
• The situation is divided into sectors:




Algorithm for Multiple Obstacle Avoidance:
• If the last action decided has a chance of avoiding the obstacles (that is, if it can drive the robot through a free and sufficiently large sector), then it is maintained;
• If not, the RL technique indicates 10 actions for the present situation. For a new situation, these are the 10 best actions; otherwise, they are the 10 best actions belonging to the same quadrant as the last action;




Algorithm for Multiple Obstacle Avoidance:
• For the 10 actions indicated, if more than one can drive the robot along a safe trajectory, then the action with the smallest change in lateral velocity is chosen;
• If none can, the 10 actions indicated are reflected to the other side (left or right) and a safe trajectory is searched for again;
• If none is found, an action that presents the possibility of no collision, considering the 10 best actions for all quadrants, is immediately chosen;




Algorithm for Multiple Obstacle Avoidance:
• If none is found, an action that presents the possibility of arriving at the collision point before or after the obstacle, considering the 10 best actions for all quadrants, is immediately chosen;
• Finally, if none was found, the robot should stop.
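
The decision cascade on the preceding slides can be sketched as a single function. This is an illustration under stated assumptions, not the authors' code: the four predicates are hypothetical callables standing in for the geometric checks, actions are (lateral, frontal) pairs in -3..3, a quadrant is taken to be the sign pattern of the two velocities, "reflected to the other side" is read as negating the lateral velocity, and "the 10 best actions for all quadrants" is read as the 10 best per quadrant.

```python
from typing import Callable, Optional, Sequence, Tuple

Action = Tuple[int, int]   # (lateral, frontal); the sign convention is an assumption
STOP: Action = (0, 0)
QUADRANTS = [(True, True), (True, False), (False, True), (False, False)]


def quadrant(action: Action) -> Tuple[bool, bool]:
    """Assumed quadrant convention: signs of lateral and frontal velocity."""
    lateral, frontal = action
    return (lateral >= 0, frontal >= 0)


def choose_action(
    last_action: Optional[Action],
    ranked_actions: Sequence[Action],          # best first, as ranked by the RL values
    new_situation: bool,
    keeps_free_sector: Callable[[Action], bool],
    is_safe: Callable[[Action], bool],
    may_avoid_collision: Callable[[Action], bool],
    passes_before_or_after: Callable[[Action], bool],
    k: int = 10,
) -> Action:
    # Keep the last action if it still leads through a free, large enough sector.
    if last_action is not None and keeps_free_sector(last_action):
        return last_action

    # Otherwise take the k best actions, restricted to the last action's
    # quadrant unless the situation is new.
    if new_situation or last_action is None:
        candidates = list(ranked_actions[:k])
    else:
        q = quadrant(last_action)
        candidates = [a for a in ranked_actions if quadrant(a) == q][:k]

    # Among safe candidates prefer the smallest change in lateral velocity;
    # if none is safe, try the same candidates mirrored to the other side.
    reference = last_action[0] if last_action is not None else 0
    mirrored = [(-lateral, frontal) for lateral, frontal in candidates]
    for group in (candidates, mirrored):
        safe = [a for a in group if is_safe(a)]
        if safe:
            return min(safe, key=lambda a: abs(a[0] - reference))

    # Fallbacks over the k best actions of every quadrant: first any action
    # that may avoid the collision, then any that passes the collision point
    # before or after the obstacle.
    pool = []
    for q in QUADRANTS:
        pool.extend([a for a in ranked_actions if quadrant(a) == q][:k])
    for test in (may_avoid_collision, passes_before_or_after):
        for a in pool:
            if test(a):
                return a

    # Nothing found: stop the robot.
    return STOP
```

Passing the checks in as callables keeps the cascade itself independent of how sectors and trajectories are actually computed, which the slides do not specify.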




Neural Representation:
• Problem: state-action matrix explosion (37632)
• Solution: neural representation
• Use of multiple neural networks
• training the 1st NN: E = D - Y1
• training the 2nd NN: E = (D - Y1) - Y2
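
The two error definitions say that the second network is trained on what the first one leaves unexplained, so the combined output is Y1 + Y2. The sketch below illustrates only that residual scheme. As a stand-in for a real neural network, each stage is a single tanh unit with fixed, arbitrarily chosen hidden weights and a least-squares output layer; the 1-D inputs, the target function and all names are assumptions for illustration.

```python
import math


def fit_stage(xs, targets, w, b):
    """Stand-in for training one network: a tanh hidden unit with fixed
    weights (w, b) whose output scale and bias are fitted by least squares."""
    hs = [math.tanh(w * x + b) for x in xs]
    n = len(xs)
    mean_h = sum(hs) / n
    mean_t = sum(targets) / n
    var_h = sum((h - mean_h) ** 2 for h in hs)
    cov = sum((h - mean_h) * (t - mean_t) for h, t in zip(hs, targets))
    scale = cov / var_h if var_h > 0 else 0.0
    bias = mean_t - scale * mean_h
    return lambda x: scale * math.tanh(w * x + b) + bias


def sum_squared(errors):
    return sum(e * e for e in errors)


# Stand-in data: xs plays the role of coded (state, action) inputs,
# D the role of the desired evaluation values.
xs = [i / 20 - 1 for i in range(41)]
D = [math.sin(3 * x) for x in xs]

# 1st NN is trained on D; its error is E = D - Y1.
net1 = fit_stage(xs, D, w=1.0, b=0.0)
Y1 = [net1(x) for x in xs]
E1 = [d - y1 for d, y1 in zip(D, Y1)]

# 2nd NN is trained on that residual; its error is E = (D - Y1) - Y2.
net2 = fit_stage(xs, E1, w=4.0, b=0.5)
Y2 = [net2(x) for x in xs]
E2 = [e1 - y2 for e1, y2 in zip(E1, Y2)]

# The combined approximation of D is Y1 + Y2.
combined = [y1 + y2 for y1, y2 in zip(Y1, Y2)]
```

Because the second stage is fitted to the first stage's error, the remaining error of the combined output cannot exceed that of the first stage alone in this least-squares setting.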




Results:




Conclusion:
• Complex avoidance with primitive actions
• Direct knowledge with the Monte Carlo technique
• Improvement in knowledge with Q-learning
• Neural representation can compact the state-action matrix well

Acknowledgements: CAPES, DAAD, UFMA and ITA for the financial support
