Transcript of December 4, Project

Page 1: December 4, Project

Combining Reactive and Deliberative Algorithms

CSCI7000: Final Presentation

Maciej Stachura

Dec. 4, 2009

Page 2: December 4, Project

Outline

• Project Overview

• Positioning System

• Hardware Demo

Page 3: December 4, Project

Project Goals

• Combine deliberative and reactive algorithms

• Show stability and completeness

• Demonstrate multi-robot coverage on iCreate robots.

Page 4: December 4, Project

Coverage Problem

• Cover the entire area.

• The deliberative algorithm plans the next point to visit.

• The reactive algorithm pushes the robot to that point (a sketch follows below).

• The reactive algorithm adds two constraints:
  • Maintain communication distance
  • Collision avoidance
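The transcript doesn't preserve the controller itself, so here is a minimal Python sketch of one reactive step in this style: an attractive pull toward the deliberatively chosen waypoint plus the two constraint terms. The gains and distances (k_att, k_rep, d_safe, d_comm) are invented placeholders, not the project's values.

```python
import numpy as np

def reactive_step(pos, goal, others, dt=0.05, k_att=1.0,
                  k_rep=0.5, d_safe=0.5, d_comm=3.0, k_comm=1.0):
    """One reactive update: attract to the waypoint chosen by the
    deliberative layer, repel from robots that are too close, and
    pull back toward teammates about to leave communication range."""
    force = -k_att * (pos - goal)               # attractive gradient
    for q in others:
        d = np.linalg.norm(pos - q)
        if d < d_safe:                          # collision avoidance
            force += k_rep * (pos - q) / (d**3 + 1e-9)
        elif d > 0.9 * d_comm:                  # communication constraint
            force -= k_comm * (pos - q)
    return pos + dt * force
```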

Page 5: December 4, Project

Proof of Stability

[Slide equations not captured in the transcript.] The error decays; therefore the system is stable.
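As a hedged reconstruction of what such an argument typically looks like, assuming first-order error dynamics with gain k > 0 (an assumption; the slide's actual derivation was lost):

```latex
% A sketch of the usual shape of such a proof, assuming
% error dynamics \dot{e} = -k e with k > 0:
\[
V(e) = \tfrac{1}{2}\, e^{\mathsf{T}} e \;>\; 0, \qquad
\dot{V} = e^{\mathsf{T}} \dot{e} = -k\, e^{\mathsf{T}} e = -2kV \;\le\; 0,
\]
\[
\Rightarrow\quad V(t) = V(0)\, e^{-2kt} \;\to\; 0,
\]
% so the error decays exponentially and the closed-loop
% system is stable in the sense of Lyapunov.
```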

Page 6: December 4, Project

Demo for single vehicle

• Implemented on the iCreate.

• 5 points to visit.

• The deliberative algorithm selects the next point.

• The reactive algorithm uses a potential field to reach it.

• A point counts as reached when the robot is within some minimum distance.

VIDEO

Page 7: December 4, Project

Multi-robot Case

• Two-robot coverage.

• Blue is free to move.

• Green must stay in communication range.

• MATLAB simulation.

VIDEO

Page 8: December 4, Project

Outline

• Project Overview

• Positioning System

• Hardware Demo

Page 9: December 4, Project

Positioning System

• Problems with Stargazer:
  • Periods of no measurement
  • Occasional bad measurements

• State estimation (SPF):
  • Combine Stargazer with odometry
  • Reject bad measurements

Page 10: December 4, Project

SPF Explanation

• The Sigma Point Filter (SPF) fuses Stargazer and odometry measurements to estimate the robot's position.

• Handles non-Gaussian noise.

• Implemented and tested on the robot platform.

• Performs well despite dropped and bad measurements.
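The slides don't show the filter internals. The sketch below illustrates the general shape of a sigma-point measurement update with a chi-square innovation gate for rejecting bad fixes; the equal weights, spread factor, and gate threshold are simplifications assumed here, not the project's actual SPF.

```python
import numpy as np

def spf_update(x, P, z, R, h, gate=9.0):
    """Sigma-point measurement update with an innovation gate:
    Stargazer fixes that disagree wildly with the odometry-based
    prediction are rejected instead of corrupting the estimate."""
    n = len(x)
    S = np.linalg.cholesky((n + 0.5) * P)        # sqrt of scaled covariance
    sigmas = np.vstack([x, x + S.T, x - S.T])    # 2n+1 sigma points
    w = np.full(2 * n + 1, 1.0 / (2 * n + 1))    # equal weights (simplified)
    Z = np.array([h(s) for s in sigmas])         # measurement model h(x)
    z_hat = w @ Z                                # predicted measurement
    dZ = Z - z_hat
    Pzz = dZ.T @ np.diag(w) @ dZ + R             # innovation covariance
    Pxz = (sigmas - x).T @ np.diag(w) @ dZ       # cross covariance
    nu = z - z_hat                               # innovation
    if nu @ np.linalg.solve(Pzz, nu) > gate:     # chi-square gate
        return x, P                              # reject bad measurement
    K = Pxz @ np.linalg.inv(Pzz)                 # Kalman gain
    return x + K @ nu, P - K @ Pzz @ K.T
```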

Page 11: December 4, Project

Outline

• Project Overview

• Positioning System

• Hardware Demo

Page 12: December 4, Project

Roomba Pac-Man

• Implemented a 5-robot demo together with Jack Elston.

• Re-creation of the Pac-Man game.

• Demonstrates the NetUAS system.

• Showcases most of the concepts from class.

Page 13: December 4, Project

Video

Page 14: December 4, Project

Roomba Pac-Man

• Reactive algorithms:
  • Walls of maze
  • Potential field

• Deliberative algorithms:
  • Ghost planning (enumerate states; see the sketch below)
  • Collision avoidance
  • Game modes

• Decentralized:
  • Each ghost ran the planning algorithm
  • Collaborated on positions

• Communication:
  • 802.11b ad-hoc network
  • AODV, no centralized node
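The planner itself isn't in the transcript. As a loose illustration of "enumerate states", a ghost can breadth-first search the free maze cells and take the first step of a shortest path to Pac-Man; the maze encoding (a dict mapping cell to 0 for free, 1 for wall) is invented for this example, and the real demo also coordinated ghost positions and game modes.

```python
from collections import deque

def ghost_next_cell(maze, ghost, pacman):
    """BFS over free maze cells; returns the cell the ghost should
    step into next on a shortest path to Pac-Man."""
    frontier = deque([ghost])
    parent = {ghost: None}
    while frontier:
        cell = frontier.popleft()
        if cell == pacman:
            while parent[cell] not in (ghost, None):
                cell = parent[cell]      # walk back to the first step
            return cell
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if maze.get(nxt, 1) == 0 and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return ghost                         # unreachable: stay put
```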


Page 18: December 4, Project

Roomba Pac-Man

• Simulation:
  • Multi-threaded simulation of robots
  • Combine software with hardware

• Probabilistic modelling:
  • Sigma Point Filter

• Human/robot interaction:
  • Limited human control of Pac-Man
  • Autonomous ghosts

• Hardware implementation:
  • SBCs running Gentoo
  • Experimental verification


Page 22: December 4, Project

Left to Do

• Implement the inter-robot potential field.

• Conduct Experiments

• Generalize Theory?

Page 23: December 4, Project

End

Questions?

http://pacman.elstonj.com

Page 24: December 4, Project

A Gradient Based Approach

Greg Brown

Page 25: December 4, Project

• Introduction
• Robot State Machine
• Gradients for “Grasping” the Object
• Gradient for Moving the Object
• Convergence Simulation Results
• Continuing Work

Page 26: December 4, Project

Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object.

Goals:
• Minimal/no robot communication
• Object has an unknown geometry
• Use gradients for reactive navigation

Page 27: December 4, Project
Page 28: December 4, Project

• Each robot knows:
  ◦ Distance/direction to object
  ◦ Distance/direction to destination
  ◦ Distance/direction to all other robots
  ◦ Bumper sensor to detect collision

• Robots do not know:
  ◦ Object geometry
  ◦ Actions other robots are taking

Page 29: December 4, Project
Page 30: December 4, Project

• Related “grasping” work:
  ◦ Grasping with a hand – maximize torque [Liu et al]
  ◦ Cage objects for pushing [Fink et al]
  ◦ Tug boats manipulating a barge [Esposito]
  ◦ ALL require known geometry

• My hybrid approach:
  ◦ Even distribution around object
  ◦ Alternate between convergence and repulsion gradients
  ◦ Similar to the cow-herding example from class

Page 31: December 4, Project

Pull towards object:

$$\gamma = \| r_i - r_{obj} \|$$

Avoid nearby robots:

$$\beta = 1 - \frac{1 + d_c^4}{d_c^4} \sum_{j=1}^{N} \frac{\left( \| r_i - r_j \|^2 - d_c^2 \right)^2}{\left( \| r_i - r_j \|^2 - d_c^2 \right)^2 + 1} \cdot \frac{\operatorname{sign}\!\left( d_c - \| r_i - r_j \| \right) + 1}{2}$$

Page 32: December 4, Project

Combined cost function:

$$\mathrm{Cost} = \frac{\gamma^2}{\left( \gamma^{\kappa_c} + \beta \right)^{1/\kappa_c}}$$
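To make the navigation-function form concrete, here is a small Python sketch that evaluates the reconstructed γ and β and takes a numerical gradient step on the combined cost. It relies on the equation reconstruction above, and d_c, κ_c, and the step sizes are placeholder values.

```python
import numpy as np

def grasp_cost(ri, r_obj, others, dc=1.0, kc=4.0):
    """Combined cost: gamma^2 / (gamma^kc + beta)^(1/kc).
    beta -> 0 as robot i crowds a neighbour, blowing the cost up."""
    gamma = np.linalg.norm(ri - r_obj)
    beta = 1.0
    for rj in others:
        d2 = np.sum((ri - rj) ** 2) - dc ** 2
        gate = (np.sign(dc - np.linalg.norm(ri - rj)) + 1) / 2
        beta -= (1 + dc**4) / dc**4 * d2**2 / (d2**2 + 1) * gate
    return gamma**2 / (gamma**kc + max(beta, 1e-9)) ** (1 / kc)

def gradient_step(ri, r_obj, others, step=0.02, eps=1e-5):
    """Reactive navigation: descend the cost numerically."""
    g = np.zeros_like(ri)
    for k in range(len(ri)):
        e = np.zeros_like(ri)
        e[k] = eps
        g[k] = (grasp_cost(ri + e, r_obj, others)
                - grasp_cost(ri - e, r_obj, others)) / (2 * eps)
    return ri - step * g
```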

Page 33: December 4, Project

Repel from all robots:

$$\mathrm{Cost} = \frac{1}{(1 + \beta)^{1/\kappa_r}}, \qquad \beta = \sum_{j=1}^{N} \left( \| r_i - r_j \|^2 - d_r^2 \right)$$

Page 34: December 4, Project
Page 35: December 4, Project

• Related work:
  ◦ Formations [Tanner and Kumar]
  ◦ Flocking [Lindhé et al]
  ◦ Pushing objects [Fink et al, Esposito]
  ◦ No catastrophic failure if out of position

• My approach:
  ◦ Head towards destination in steps
  ◦ Keep close to object
  ◦ Communicate “through” the object
  ◦ Maintain orientation

• Assuming the forklift on the robot can rotate 360º

Page 36: December 4, Project

Next-step vector. Pull to destination:

$$\gamma_1 = \| r_i - r_{\gamma i} \|, \qquad r_{\gamma i} = r_{ideal_i} + d_m \, \frac{r_{ObjCenter} - r_{ObjDest}}{\| r_{ObjCenter} - r_{ObjDest} \|}$$

Page 37: December 4, Project

Valley perpendicular to the travel vector:

$$m = -\,\frac{r_{ObjCenter_x} - r_{ObjDest_x}}{r_{ObjCenter_y} - r_{ObjDest_y} + 0.0001}$$

$$\gamma_2 = \frac{m\, r_{i_x} - r_{i_y} - m\, r_{\gamma x} + r_{\gamma y}}{\sqrt{m^2 + 1}}$$

Page 38: December 4, Project

$$\mathrm{Cost} = \gamma_1^{\kappa_1} \, \gamma_2^{\kappa_2}$$
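A direct transcription of the two terms into Python, under the same reconstruction; the absolute value on the valley term is an addition so the cost stays nonnegative, which the slide leaves implicit.

```python
import numpy as np

def move_cost(ri, r_gamma, r_center, r_dest, k1=1.0, k2=1.0):
    """gamma1 pulls robot i to its next-step target r_gamma; gamma2
    is a valley along the line through r_gamma perpendicular to the
    object-center -> destination travel vector."""
    g1 = np.linalg.norm(ri - r_gamma)
    m = -(r_center[0] - r_dest[0]) / (r_center[1] - r_dest[1] + 1e-4)
    g2 = abs(m * ri[0] - ri[1] - m * r_gamma[0] + r_gamma[1]) / np.sqrt(m**2 + 1)
    return g1**k1 * g2**k2
```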

Page 39: December 4, Project
Page 40: December 4, Project
Page 41: December 4, Project

[Figure: histogram of convergence times (number of occurrences vs. time steps) for 3, 4, 5, and 6 robots.]

Page 42: December 4, Project

• Resolve convergence problems
• Noise in sensing
• Noise in actuation

Page 43: December 4, Project

[Figure: histogram of convergence times (number of occurrences vs. time steps) for 3, 4, 5, and 6 robots.]

Page 44: December 4, Project


A Young Modular Robot’s Guide to Locomotion

Ben Pearre

Computer Science

University of Colorado at Boulder, USA

December 6, 2009


Page 45: December 4, Project


Outline

Modular Robots

Learning: The Problem, The Policy Gradient, Domain Knowledge

Contributions: Going forward, Steering, Curriculum Development

Conclusion


Page 46: December 4, Project


Modular Robots

How to get these to move?


Page 47: December 4, Project


The Learning Problem

Given unknown sensations and actions, learn a task:

◮ Sensations $s \in \mathbb{R}^n$

◮ State $x \in \mathbb{R}^d$

◮ Action $u \in \mathbb{R}^p$

◮ Reward $r \in \mathbb{R}$

◮ Policy $\pi(x, \theta) = \Pr(u \mid x, \theta) : \mathbb{R}^{|\theta|} \times \mathbb{R}^{|u|}$

Example policy:

$$u(x, \theta) = \theta_0 + \sum_i \theta_i \, (x - b_i)^T D_i \, (x - b_i) + \mathcal{N}(0, \sigma)$$

What does that mean for locomotion?


Page 48: December 4, Project


Policy Gradient Reinforcement Learning: Finite Difference

Vary θ:

◮ Measure performance $J_0$ of $\pi(\theta)$

◮ Measure performance $J_{1 \ldots n}$ of $\pi(\theta + \Delta_{1 \ldots n} \theta)$

◮ Solve the regression, move $\theta$ along the gradient.

$$\text{gradient} = \left( \Delta\Theta^T \Delta\Theta \right)^{-1} \Delta\Theta^T J,
\quad \text{where} \quad
\Delta\Theta = \begin{bmatrix} \Delta\theta_1 \\ \vdots \\ \Delta\theta_n \end{bmatrix}
\quad \text{and} \quad
J = \begin{bmatrix} J_1 - J_0 \\ \vdots \\ J_n - J_0 \end{bmatrix}$$


Page 49: December 4, Project


Policy Gradient Reinforcement Learning: Likelihood Ratio

Vary u:

◮ Measure performance $J(\pi(\theta))$ of $\pi(\theta)$ with noise...

◮ Compute the log-probability of the generated trajectory $\Pr(\tau \mid \theta)$

$$\text{Gradient} = \left\langle \left( \sum_{k=0}^{H} \nabla_\theta \log \pi_\theta(u_k \mid x_k) \right) \left( \sum_{l=0}^{H} r_l \right) \right\rangle$$
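A minimal sketch of this estimator, assuming episodes have already been rolled out and stored as per-step pairs of (the gradient of log π for that action, the reward); the angle brackets become an average over episodes.

```python
import numpy as np

def likelihood_ratio_gradient(episodes):
    """Each episode is a list of (grad_log_pi, reward) pairs.
    Estimate: mean over episodes of
    (sum_k grad log pi(u_k|x_k)) * (sum_l r_l)."""
    estimates = []
    for ep in episodes:
        glp = sum(g for g, _ in ep)     # sum of score-function terms
        ret = sum(r for _, r in ep)     # total episode reward
        estimates.append(glp * ret)
    return np.mean(estimates, axis=0)
```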


Page 50: December 4, Project


Why is RL slow?

“Curse of Dimensionality”

◮ Exploration

◮ Learning rate

◮ Domain representation

◮ Policy representation

◮ Over- and under-actuation

◮ Domain knowledge


Page 51: December 4, Project


Domain Knowledge

Infinite space of policies to explore.

◮ RL is model-free. So what?

◮ Representation is bias.

◮ Bias search towards “good” solutions

◮ Learn all of physics... and apply it?

◮ Previous experience in this domain?

◮ Is a policy implemented by the <programmer, agent> pair “autonomous”?

How would knowledge of this domain help?


Page 52: December 4, Project


Dimensionality Reduction

Task learning as domain-knowledge acquisition:

◮ Experience with a domain

◮ Skill at completing some task

◮ Skill at completing some set of tasks?

◮ Taskspace Manifold


Page 53: December 4, Project


Goals

1. Apply PGRL to a new domain.

2. Learn mapping from task manifold to policy manifold.

3. Robot school?


Page 54: December 4, Project


1: Learning to locomote

◮ Sensors: force feedback on servos? Or not.

◮ Policy: $u \in \mathbb{R}^8$ controls the servos, $u_i = \mathcal{N}(\theta_i, \sigma)$

◮ Reward: forward speed

◮ Domain knowledge: none

Demo?


Page 55: December 4, Project


1: Learning to locomote

[Figure: “Learning to move”: 10-step forward speed $v$ and policy parameters $\theta$ vs. time step $s$, with per-servo traces (steer bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) and effort.]


Page 56: December 4, Project


2: Learning to get to a target

◮ Sensors: bearing to goal.

◮ Policy: $u \in \mathbb{R}^8$ controls servos

◮ Policy parameters: $\theta \in \mathbb{R}^{16}$

$$\mu_i(x, \theta) = \theta_i \cdot s = \left[ \theta_{i,0} \;\; \theta_{i,1} \right] \begin{bmatrix} s_0 \\ s_1 \end{bmatrix} \tag{1, 2}$$

$$u_i = \mathcal{N}(\mu_i, \sigma) \tag{3}$$

$$\nabla_{\theta_i} \log \pi(x, \theta) = \frac{1}{\sigma^2} \left( u_i - \theta_i \cdot s \right) s \tag{4}$$
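Equations (1)-(4) translate nearly line-for-line into code. A sketch, assuming theta is an 8×2 weight matrix (one row per servo), s is the 2-dimensional sensor vector, and σ is a placeholder.

```python
import numpy as np

def act_and_score(theta, s, sigma=0.1):
    """Linear-Gaussian policy of eqs. (1)-(4): returns sampled servo
    commands u and the gradient of log pi for each servo's weights."""
    mu = theta @ s                                  # eqs. (1)-(2)
    u = np.random.normal(mu, sigma)                 # eq. (3)
    grad_log_pi = np.outer(u - mu, s) / sigma**2    # eq. (4)
    return u, grad_log_pi
```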


Page 57: December 4, Project


2: Task space → policy space

◮ 16-DOF learning FAIL!

◮ Try a simpler task:
  ◦ Learn to locomote with $\theta \in \mathbb{R}^{16}$

◮ Try bootstrapping:
  1. Learn to locomote with 8 DOF
  2. Add new sensing and control DOF

◮ CHEATING! Why?

[Figure: “Time to complete task”: seconds vs. task number.]


Page 58: December 4, Project


Curriculum development for manifold discovery?

◮ Etude in Locomotion
  ◦ Task-space manifold for locomotion:
    $$\theta \in \xi \cdot \left[\, 0 \;\; 0 \;\; 1 \; {-1} \;\; 1 \; {-1} \;\; 1 \;\; 1 \,\right]^T$$
  ◦ Stop exploring in the task nullspace
  ◦ FAST!

◮ Etude in Steering
  ◦ Can the task be completed on the locomotion manifold?
  ◦ One possible approximate solution uses the bases
    $$\begin{bmatrix} 0 & 0 & 1 & -1 & 1 & -1 & 1 & 1 \\ 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T$$
  ◦ Can the second basis be learned?
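A sketch of what exploring only on the task manifold looks like, using the basis rows above: noise is injected in the basis coefficients ξ, never in the task nullspace, which is what makes the restricted search fast. The function name and σ are illustrative.

```python
import numpy as np

def sample_on_manifold(xi, basis, sigma=0.1):
    """Sample an 8-DOF parameter vector theta = coeffs @ basis,
    exploring only along the given task-manifold basis rows."""
    B = np.atleast_2d(np.asarray(basis, dtype=float))
    coeffs = np.asarray(xi, dtype=float) + sigma * np.random.randn(B.shape[0])
    return coeffs @ B

# e.g. the locomotion manifold from the slide:
# theta = sample_on_manifold([1.0], [[0, 0, 1, -1, 1, -1, 1, 1]])
```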


Page 59: December 4, Project


3: How to teach a robot?

How to teach an animal?

1. Reward basic skills

2. Develop control along useful DOFs

3. Make skill more complex

4. A good solution NOW!


Page 60: December 4, Project


Conclusion

Exorcising the Curse of Dimensionality

◮ PGRL works for low-DOF problems.

◮ Task-space dimension < state-space dimension.

◮ Learn f: task-space manifold → policy-space manifold.
