17º International Congress of Mechanical Engineering
November 10–14, 2003 – Holiday Inn Select Jaraguá - Hotel
São Paulo - SP - Brazil

Avoidance of Multiple Dynamic Obstacles

Authors:
Areolino de Almeida Neto - UFMA
Bodo Heimann - University of Hannover
Luiz Carlos S. Góes - ITA
Cairo L. Nascimento Jr. - ITA


Objective

To drive a mobile robot along a safe path, using direction indications that avoid dynamic and static obstacles.

[Figure: robot, obstacle, goal]

Reinforcement Learning

Characteristics:
• Intuitive data
• Cumulative learning
• Constructive solution
• Direct knowledge acquisition
• Well suited to decision making

Reinforcement Learning

States (number of discrete values in parentheses):
• Distance to the possible point of collision (4)
• Direction of the obstacle (8)
• Shortest distance between the obstacle and the robot’s path (8)
• Time condition of arriving (3)
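The four state variables above multiply out to 4 × 8 × 8 × 3 = 768 combinations. A minimal sketch of one way to index them, assuming each variable has already been discretized to an integer level. The names `LEVELS`, `encode_state`, and `decode_state`, and the mixed-radix ordering, are illustrative assumptions and are not taken from the slides.

```python
# Number of discrete levels per state variable, in the order listed on the slide:
# collision distance, obstacle direction, path clearance, arrival timing.
LEVELS = (4, 8, 8, 3)


def encode_state(levels):
    """Pack one level per variable into a single index (mixed-radix encoding)."""
    if len(levels) != len(LEVELS):
        raise ValueError("expected one level per state variable")
    index = 0
    for value, size in zip(levels, LEVELS):
        if not 0 <= value < size:
            raise ValueError(f"level {value} outside 0..{size - 1}")
        index = index * size + value
    return index


def decode_state(index):
    """Inverse of encode_state: recover the per-variable levels."""
    levels = []
    for size in reversed(LEVELS):
        index, value = divmod(index, size)
        levels.append(value)
    return tuple(reversed(levels))


# Total number of distinct states.
NUM_STATES = 1
for size in LEVELS:
    NUM_STATES *= size
```

Any bijection between level tuples and indices would serve; the mixed-radix form is used here only because it is short and easy to invert.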

Reinforcement Learning

Actions:
• lateral velocity: 3 to the right, 1 null, 3 to the left
• frontal velocity: 3 ahead, 1 null, 3 back
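Seven lateral settings combined with seven frontal settings give 7 × 7 = 49 actions. A short sketch enumerating them, assuming integer velocity levels from -3 to 3. The sign convention (negative for right/back, positive for left/ahead) and the names `ACTIONS` and `action_index` are assumptions made for illustration.

```python
from itertools import product

# Three levels one way, 0 (null), three levels the other way.
VELOCITY_LEVELS = range(-3, 4)

# Each action is a (lateral, frontal) pair of velocity levels.
ACTIONS = list(product(VELOCITY_LEVELS, VELOCITY_LEVELS))


def action_index(lateral, frontal):
    """Position of an action in ACTIONS, usable as a table column index."""
    return (lateral + 3) * 7 + (frontal + 3)
```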

Reinforcement Learning

States are mapped to actions using a coding scheme. There are 768 states.

For each state there are 49 possible actions, each with a corresponding “evaluation value”.

Training means creating the “evaluation values” for each state and its possible actions.
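With 768 states and 49 actions, the full table holds 768 × 49 = 37632 evaluation values. A minimal sketch of such a table and of picking the best-rated actions for a state, assuming a plain list-of-lists layout and zero initialization. Both are illustrative choices, and `best_actions` is a hypothetical helper.

```python
NUM_STATES = 768
NUM_ACTIONS = 49

# evaluation[s][a] is the "evaluation value" of action a in state s.
evaluation = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]


def best_actions(state, k=1):
    """Return the indices of the k highest-valued actions for a state."""
    row = evaluation[state]
    ranked = sorted(range(NUM_ACTIONS), key=lambda a: row[a], reverse=True)
    return ranked[:k]
```

Calling `best_actions(state, k=10)` returns a ranked shortlist of ten action indices for that state.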

Reinforcement Learning

Training using only one obstacle:
• 1st level: Monte Carlo (~450000 runs): direct, fast computation, evaluation function
• 2nd level: Q-learning: necessary in around 50 situations

600/n10/at100f

t: duration of movement
a: number of actions
n: iteration number
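The two training levels can be illustrated with the textbook forms of each method. This is a generic sketch, not the authors' training code: the learning rate `alpha`, the discount factor `gamma`, and the function names are placeholders, and the evaluation function from the slide is not reproduced here.

```python
def monte_carlo_update(table, counts, state, action, episode_return):
    """Running average of the returns observed for a state-action pair."""
    counts[state][action] += 1
    n = counts[state][action]
    table[state][action] += (episode_return - table[state][action]) / n


def q_learning_update(table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """One-step Q-learning: move the value toward reward + discounted best next value."""
    target = reward + gamma * max(table[next_state])
    table[state][action] += alpha * (target - table[state][action])
```

Here the Monte Carlo update fills the table directly from complete runs, and the Q-learning update can then refine individual entries.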

Obstacle Avoidance

Architecture:
• Use of an a priori path (static environment)
• Detection of a possibility of collision
• Classification of a collision: possible or immediate
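One common way to detect a possible collision with a moving obstacle is the closest point of approach under constant velocities. The slides do not say how detection or classification is done, so the sketch below is only an assumed stand-in. The safety radius and the time threshold separating "immediate" from "possible" are hypothetical values.

```python
import math


def closest_approach(rel_pos, rel_vel):
    """Time and distance of closest approach for constant relative velocity.

    rel_pos is the obstacle position minus the robot position,
    rel_vel is the obstacle velocity minus the robot velocity.
    """
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    # With no relative motion the distance never changes, so use t = 0.
    t = 0.0 if speed_sq == 0 else max(0.0, -(px * vx + py * vy) / speed_sq)
    return t, math.hypot(px + vx * t, py + vy * t)


def classify_collision(rel_pos, rel_vel, safety_radius=1.0, immediate_time=2.0):
    """Return 'none', 'possible', or 'immediate' (thresholds are hypothetical)."""
    t, distance = closest_approach(rel_pos, rel_vel)
    if distance > safety_radius:
        return "none"
    return "immediate" if t <= immediate_time else "possible"
```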

Obstacle Avoidance

Algorithm for Multiple Obstacle Avoidance:
• One obstacle is defined as the main obstacle, and actions are indicated based on it;
• The situation is divided into sectors.

Obstacle Avoidance

Algorithm for Multiple Obstacle Avoidance:
• If the last action decided has a chance of avoiding the obstacles (that is, if it can drive the robot through a free and sufficiently large sector), then it is maintained;
• If not, the RL technique indicates 10 actions for the present situation. For a new situation, these are the 10 best actions; otherwise, they are the 10 best actions belonging to the same quadrant as the last action;

Obstacle Avoidance

Algorithm for Multiple Obstacle Avoidance:
• Among the 10 actions indicated, if more than one can drive the robot along a safe trajectory, the action with the smallest change in lateral velocity is chosen;
• If none can, the 10 actions indicated are reflected to the other side (left or right) and a safe trajectory is searched for again;
• If none is found, an action that presents the possibility of no collision, taken from the 10 best actions for all quadrants, is immediately chosen;

Obstacle Avoidance

Algorithm for Multiple Obstacle Avoidance:
• If none is found, an action that presents the possibility of arriving at the collision point before or after the obstacle, taken from the 10 best actions for all quadrants, is immediately chosen;
• Finally, if none was found, the robot should stop.
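The fallback order described for multiple obstacles can be sketched as a single decision function. This is an interpretation under stated assumptions, not the authors' implementation. Actions are taken to be (lateral, frontal) velocity pairs, stopping is taken to be the null action (0, 0), and every predicate (`last_still_valid`, `is_safe`, `may_avoid`, `passes_before_or_after`) is a hypothetical callable that the caller must supply. For a new situation the caller would pass the 10 overall best actions as `quadrant_best`.

```python
STOP = (0, 0)  # assumed encoding of "the robot should stop"


def reflect(action):
    """Mirror an action to the other side by flipping its lateral velocity."""
    lateral, frontal = action
    return (-lateral, frontal)


def choose_action(last_action, last_still_valid, quadrant_best, overall_best,
                  is_safe, may_avoid, passes_before_or_after):
    """Fallback cascade; every predicate is a caller-supplied placeholder."""
    # 1. Keep the last action while it still leads through a free, wide sector.
    if last_action is not None and last_still_valid(last_action):
        return last_action
    # 2. Try the indicated actions, then their mirror images; among the safe
    #    ones, prefer the smallest change in lateral velocity.
    reference = last_action[0] if last_action is not None else 0
    for candidates in (quadrant_best, [reflect(a) for a in quadrant_best]):
        safe = [a for a in candidates if is_safe(a)]
        if safe:
            return min(safe, key=lambda a: abs(a[0] - reference))
    # 3. Fall back to the best actions over all quadrants, with weaker tests:
    #    first "possibility of no collision", then "arrives before or after".
    for test in (may_avoid, passes_before_or_after):
        for action in overall_best:
            if test(action):
                return action
    # 4. Nothing acceptable was found: stop the robot.
    return STOP
```

Keeping the predicates as parameters leaves the geometry (sectors, trajectories, collision points) outside the sketch, since the slides do not specify it.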

Reinforcement Learning

Neural Representation:
• Problem: state-action matrix explosion (37632 entries)
• Solution: neural representation
• Use of multiple neural networks
  • training the 1st NN: E = D – Y1
  • training the 2nd NN: E = (D – Y1) – Y2
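The two error formulas describe residual training: the second network learns what the first one left over, so its error equals D – (Y1 + Y2), the error of the summed outputs. The sketch below shows only that structure, reading D as the desired output and Y1, Y2 as the model outputs (an assumption, since the slide does not define them). Two deliberately simple pure-Python fits, a constant and a straight line, stand in for the neural networks to keep the example short and runnable.

```python
def fit_constant(xs, targets):
    """Stage 1 stand-in: always predict the mean of the targets."""
    mean = sum(targets) / len(targets)
    return lambda x: mean


def fit_line(xs, targets):
    """Stage 2 stand-in: least-squares straight line through (x, target)."""
    n = len(xs)
    mx, mt = sum(xs) / n, sum(targets) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (t - mt) for x, t in zip(xs, targets))
    slope = cov / var if var else 0.0
    return lambda x: mt + slope * (x - mx)


def train_cascade(xs, desired):
    """Train model 1 on D, then model 2 on the residual D - Y1."""
    model1 = fit_constant(xs, desired)
    residual = [d - model1(x) for x, d in zip(xs, desired)]  # E = D - Y1
    model2 = fit_line(xs, residual)  # second model's target is the residual
    return lambda x: model1(x) + model2(x)  # combined output Y1 + Y2
```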

Obstacle Avoidance

Results:

Obstacle Avoidance

Conclusion:
• Complex avoidance with primitive actions
• Direct knowledge with the Monte Carlo technique
• Improvement in knowledge with Q-learning
• Neural representation can compact the state-action matrix well

Acknowledgements: CAPES, DAAD, UFMA and ITA for the financial support