Generating Informative Paths for Persistent Sensing in...

Generating Informative Paths for Persistent Sensing

in Unknown Environments

Daniel E. Soltero Mac Schwager Daniela Rus

Abstract— We present an online algorithm for a robot to

shape its path to a locally optimal configuration for collecting

information in an unknown dynamic environment. As the robot

travels along its path, it identifies both where the environment

is changing, and how fast it is changing. The algorithm then

morphs the robot’s path online to concentrate on the dynamic

areas in the environment in proportion to their rate of change.

A Lyapunov-like stability proof is used to show that, under our

proposed path shaping algorithm, the path converges to a locally

optimal configuration according to a Voronoi-based coverage

criterion. The path shaping algorithm is then combined with a

previously introduced speed controller to produce guaranteed

persistent monitoring trajectories for a robot in an unknown

dynamic environment. Simulation and experimental results with

a quadrotor robot support the proposed approach.

I. INTRODUCTION

Robots operating in dynamic and unknown environmentsare often faced with the problem of deciding where to goto get the most relevant information for their current task.Informative path planning addresses this problem. Given adynamic unknown environment, and a robot with a sensor tomeasure the environment, we want to find a path for the robotthat will maximize information gathering. To achieve this, therobot needs to do two things: 1) learn the structure of theenvironment by identifying the areas within the environmentthat are dynamic and the rate of change for these areas;2) compute a path which allows it to sense the dynamicareas. Such a path is referred to as an informative path.

In this paper we present a new online path shaping al-gorithm for generating closed informative paths in unknowndynamic environments. The key insight is to use a parameteradaptation law inspired by [1] to learn a function representingthe rate of change of each point in the environment. Therobot moves along the path, marking the areas it observesas dynamic or static, and learning the rate of change ofthe dynamic areas. While learning the environment model,the algorithm simultaneously modifies its path based on thismodel by changing its waypoints to optimize a cost functionrelated to the informativeness of the path. The cost functionattempts to place the path waypoints at the centroids of theirVoronoi cells, while also making sure the waypoints are nottoo far from one another. An example of the results of ourpath shaping algorithm can be seen in Figure 1.

This material is based upon work supported in part by ONR-MURI AwardN00014-09-1-1051, the NSF Graduate Research Fellowship Award 0645960& The Boeing Company.

D. E. Soltero and D. Rus are with the Computer Science and ArtificialIntelligence Laboratory, Massachusetts Institute of Technology, Cambridge,MA ([email protected], [email protected]). M. Schwager iswith the Mechanical Engineering Department, Boston University, Boston,MA ([email protected]).

(a) Iteration: 1 (b) Iteration: 30

(c) Iteration: 65 (d) Iteration: 95

(e) Iteration: 130 (f) Iteration: 255

Fig. 1: Simulated system during the path shaping phase. The path, shown asthe black line, connects all the waypoints, shown as blue circles. The pointsof interest are shown as purple dots. The simulated robot and its sensorfootprint are shown as a green dot and green circle, respectively.

Many important applications call for robots to move alonginformative paths in unknown dynamic environments. Forexample, our algorithm could be used to provide surveillanceof a city by allowing a robot to learn the regions wherecrime is frequently committed, and generate paths to visitthe crime-ridden areas more frequently. Our algorithm couldalso be used for a robot vacuum cleaning task, where we wantthe robot to visit locations that accumulate dust and avoidlocations that remain clean, or an estimation task, where wewant the robot to keep an updated mental model of realityby visiting dynamic locations more frequently than staticlocations.

2012 IEEE/RSJ International Conference onIntelligent Robots and SystemsOctober 7-12, 2012. Vilamoura, Algarve, Portugal

U.S. Government work not protected by U.S.copyright

2172

In this paper we apply our path shaping algorithm incombination with the speed control algorithm presentedin [2] in order to design an informative trajectory (bywhich we mean the path and the speed along the path)to provide persistent sensing of an environment. In thisapplication we wish for the robot, assumed to have a finitesensor footprint, to gather information so as to guarantee abound on the difference between the robot’s current modelof the environment and the actual state of the environmentfor all time and over all locations. Since its sensor hasa finite footprint, the robot cannot collect the data aboutall of the environment at once. As data about a dynamicregion becomes outdated, the robot must return to that regionrepeatedly to collect new data. More generally, if differentparts of the environment change at different rates, we wishto have the robot visit these areas in proportion to their ratesof change to ensure a bounded uncertainty in the estimation.The algorithm from [2] calculates the speed of the robotat each point along a given path in order to perform apersistent sensing task, i.e. to prevent the robot’s model ofthe environment from becoming too outdated. We use thepath morphing algorithm in this paper as the input to thespeed controller from [2] to produce informative trajectoriesonline with provable performance guarantees.

More specifically, the persistent sensing problem is definedin [2] as an optimization whose goal is to keep a timechanging environment, modeled as an accumulation function,as low as possible everywhere. The accumulation functiongrows where it is not covered by the robot’s sensor, indicatinga growing need to collect data at that location, and shrinkswhere it is covered by the robot’s sensor, indicating a de-creasing need for data collection. This accumulation functioncan be thought of as the amount of dust in a cleaning task, asthe difference between the robot’s mental model and realityin a mapping or estimation task, or as the chance of crimein a surveillance task. The goal of a persistent sensing taskis to maintain this accumulation function bounded for theentire environment and for all time. In this paper we describean algorithm for informative path planning that enables therobot to identify where and when to collect information.

The contributions of this paper are:• a provably stable adaptive controller for learning the

location of dynamic events in an environment, andcomputing an informative path for these events,

• a provably stable extension to the adaptive controller forcomputing informative paths to perform locally optimalpersistent sensing tasks,

• simulation and hardware implementation.

A. Relation to Previous Work

Most of the previous work in path planning focuses onreaching a goal while avoiding collision with obstacles,e.g. [3], or on computing an optimal path according to somemetric, e.g. [4]. Other works have focused on probabilisticapproaches to path planning, e.g. [5]. More relevant to ourwork is the prior work in adaptive path planning, e.g. [6], [7],which considers adapting the path to changes in the system’s

state. Even more relevant to our work is that of informativesensing, where the goal is to calculate paths that provide themost information for robots [8], provide the most informationwhile maintaining periodic connectivity in order for robotsto share information and synchronize [9], provide the mostinformation taking into account complex vehicle dynamicsand sensor limitations [10], and other variations. All of thementioned previous work in informative sensing, as withmost in path planning, treats the path planning problem asa search and/or recursive problem, with running times thatcan be very high for large systems, and some of them onlyproviding approximations of optimal solutions. In contrast,in this paper we took another approach to path planning; wetreat the problem as a continuous-time dynamical systemscontrol problem and use adaptive control tools to create anovel algorithm for computing informative paths.

The adaptive controller we use to solve the informativepath planning problem is based on Voronoi partitions. Re-lated work in Voronoi coverage includes [11], where theobjective was to design a sampling trajectory that minimizedthe uncertainty of an estimated random field at the end ofa time frame. A form of generalized Voronoi partitions wasused to solve this optimization problem. In our work, wealso use Voronoi partitions as the basis of our algorithms,but the field, although unknown, is not random. Also, weare not concerned with optimizing trajectories that minimizethe predictive variance, but rather we generate paths byoptimizing the location of waypoints defining the path, ac-cording to a Voronoi-based coverage criterion in an unknownenvironment. In that sense, this work builds upon [1], wherea group of agents were coordinated to place themselvesin static, locally optimal locations to sense an unknownenvironment. Voronoi partitions were used to position therobots, while a parameter adaptation law allowed the agentsto learn the environment model. We build upon this work byletting the waypoints in a single robot’s path define a Voronoidecomposition, and using a parameter adaptation law to learnboth where and how fast the environment is changing.

The persistent sensing concept motivating this work wasintroduced in [2], where a linear program was designed tocalculate the robot’s speed at each point along a path inorder for it to perform a persistent sensing task, i.e. maintainthe accumulation function bounded. The robot was assumedto have full knowledge of the environment, and was givena pre-designed path. In this paper we alleviate these twoassumptions by having the robot estimate the static/dynamicstructure of the environment, and use these estimates to shapeits path into a useful path. These two alleviations providea significant step towards persistent sensing in dynamicenvironments.

In Section II we set up the problem, present locationaloptimization tools, and present a basis function approxima-tion of the environment. Section III introduces the adaptivecontroller and proves stability of the system under thiscontroller. In Section IV, we introduce the controller exten-sion, designed to execute persistent sensing tasks. Finally, inSection V we provide simulated and hardware results.

2173

II. PROBLEM FORMULATION

We assume we are given a robot whose task is to sensean unknown dynamic environment while traveling along aclosed path consisting of a finite number of waypoints. Thegoal is for the robot to identify the areas within the environ-ment that are dynamic and compute a path which allowsit to sense these dynamic areas. A formal mathematicaldescription of the problem follows.

Let there be a robot, with position pr, traveling along afinite number n of waypoints in a convex, bounded areaQ ⇢ R2. An arbitrary point in Q is denoted q and theposition of the ith waypoint is denoted pi. Let {p1, . . . , pn}be the configuration of the path and let {V1, . . . , Vn} bethe Voronoi partitions of Q, with waypoint positions as thegenerator points, defined as

Vi = {q 2 Q : kq� pik kq� pjk, 8j 6= i},where k ·k denotes the l2-norm. We assume that the robot isable to compute the Voronoi partitions based on the waypointlocations, as is common in the literature [12], [13].

Since the path is closed, each waypoint i has a previouswaypoint i� 1 and next waypoint i+ 1 related to it, whichare referred to as the neighbor waypoints of i. Note thati + 1 = 1 for i = n, and i � 1 = n for i = 1. Once therobot reaches a waypoint, it continues to move to the nextwaypoint, in a straight line interpolation.

A sensory function, defined as a map � : Q 7! R�0

(where R�0 refers to non-negative scalars) determines theconstant rate of change of the environment at point q 2 Q.The function �(q) is not known by the robot, but the robotis equipped with a sensor to make a point measurement of�(pr) at its position pr.Remark II.1. The interpretation of the sensory function �(q)may be adjusted for a broad range of applications. It canbe any kind of weighting of importance for points q 2 Q.In this paper we treat it as a rate of change in a dynamicenvironment.

A. Locational Optimization

Let pi,j = pi � pj . Notice that pi,j = �pj,i. Buildingupon [1], we can formulate the cost incurred by the systemover the region Q as

H =

nX

i=1

Z

Vi

Ws

2

kq� pik2�(q)dq+

nX

i=1

Wn

2

kpi,i+1k2, (1)

where kq � pik can be interpreted as the unreliability ofmeasuring the sensory function �(q) when the robot is atpi, and kpi,i+1k can be interpreted as the cost of a waypointbeing too far away from a neighboring waypoint. Ws and Wn

are constant positive scalar weights assigned to the sensingtask and neighbor distance, respectively. Note that unreliablesensing and distance between neighboring waypoints areexpensive. A formal definition of informative path follows.Definition II.2 (Informative Path). An informative pathcorresponds to a set of waypoint locations that locallyminimize (1).

Next we define three properties analogous to mass-moments of rigid bodies. The mass, first mass-moment,and centroid of Vi are defined as MVi =

R

ViWs�(q)dq,

LVi =

R

ViWsq�(q)dq and CVi =

LViMVi

, respectively. Also,let ei = CVi � pi.

From locational optimization [14], and from differentiationunder the integral sign for Voronoi partitions [15] we have

@H@pi

= �MViei �Wnpi+1,i �Wnpi�1,i. (2)

An equilibrium is reached when @H@pi

= 0. Assigning to eachwaypoint dynamics of the form

pi = ui, (3)

where ui is the control input, we propose the followingcontrol law for the waypoints to converge to an equilibriumconfiguration:

ui =Ki(MViei + ↵i)

�i, (4)

where ↵i = Wnpi+1,i + Wnpi�1,i, �i = MVi + 2Wn andKi is a uniformly positive definite matrix.Remark II.3. �i > 0 has the nice effect of normalizingthe weight distribution between sensing and staying close toneighboring waypoints. Also, Ki could potentially be time-varying to improve performance.

B. Sensory Function ApproximationWe assume that the sensory function �(q) can be parame-

terized as an unknown linear combination of a set of knownbasis functions. That is,Assumption II.4 (Matching condition). 9a 2 Rm

�0 andK : Q 7! Rm

�0, where Rm�0 is a vector of non-negative en-

tries, such that

�(q) = K(q)Ta, (5)

where the vector of basis functions K(q) is known by therobot, but the parameter vector a is unknown.

Let a(t) be the robot’s approximation of the parametervector a. Then, ˆ�(q) = K(q)T a is the robot’s approximationof �(q). Building on this, we define the mass-momentapproximations as

ˆMVi =

Z

Vi

Wsˆ�(q)dq, ˆLVi =

Z

Vi

Wsqˆ�(q)dq,

ˆCVi =

ˆLVi

ˆMVi

.

Additionally, we can define a = a � a, and the sensoryfunction error, and mass-moment errors as

˜�(q) =

ˆ�(q)� �(q) = K(q)T a, (6)

˜MVi =

ˆMVi �MVi =

Z

Vi

WsK(q)T dq a, (7)

˜LVi =

ˆLVi � LVi =

Z

Vi

WsqK(q)T dq a, (8)

˜CVi =

˜LVi

˜MVi

. (9)

2174

Finally, in order to compress the notation, let Kr(t) and �r(t)be the value of the basis function vector and the value of �at the robot’s position pr(t), respectively.

III. ADAPTIVE CONTROLLER

We design an adaptive control law and prove that itcauses the path to converge to a locally optimal configurationaccording to (1), while causing the robot’s estimate of theenvironment change rates to converge to the real descriptionby integrating its sensory measurements along its trajectory.

Since the robot does not know �(q), but has an estimateˆ�(q), the control law from (4) becomes

ui =Ki(

ˆMVi ei + ↵i)

ˆ�i

, (10)

where

↵i = Wnpi+1,i +Wnpi�1,i, ˆ�i =ˆMVi + 2Wn,

ei =

ˆCVi � pi.

The parameter a is adjusted according to an adaptationlaw which is described next. Let

⇤ =

Z t

0w(⌧)Kr(⌧)Kr(⌧)

T d⌧, (11)

� =

Z t

0w(⌧)Kr(⌧)�r(⌧)d⌧, (12)

where w(t) is a positive constant scalar if t < ⌧w, and zerootherwise, and ⌧w is some positive time at which part of theadaptation shuts down to maintain ⇤ and � bounded. Let

b =nX

i=1

Z

Vi

WsK(q)(q� pi)T dq pi, (13)

˙apre = �b� �(⇤a� �), (14)

where � > 0 is the adaptation gain scalar. Since a(j) � 0, 8j(where a(j) denotes the jth element of a), we enforcea(j) � 0, 8j. We do this by using a projection law [1],

˙a = �(

˙apre � Iproj ˙apre), (15)

where � 2 Rm⇥m is a diagonal positive definite adaptationgain matrix, and the diagonal matrix Iproj is defined element-wise as

Iproj(j) =

8

>

<

>

:

0, if a(j) > 0,

0, if a(j) = 0 and ˙apre(j) � 0,

1, otherwise,(16)

where (j) denotes the jth element for a vector and the jth

diagonal element for a matrix.Theorem III.1 (Convergence Theorem). Under Assump-tion II.4, with waypoint dynamics specified by (3), controllaw specified by (10) and adaptive law specified by (15), wehave

(i) limt!1 k ˆMVi(t)ei(t) + ↵i(t)k = 0 8i 2 {1, . . . , n},

(ii) limt!1 k˜�r(⌧)k = 0 8⌧ | w(⌧) > 0.

Proof. We define a Lyapunov-like function based on therobot’s path and environment estimate, and use Barbalat’slemma to prove asymptotic stability of the system to a locallyoptimal equilibrium.

Let

V = H+

1

2

aT��1a. (17)

Taking the time derivative of V , we obtain

˙V =

nX

i=1

@H@pi

T

pi + aT��1˙a

=

nX

i=1

�(MViei + ↵r)T pi + aT��1

˙a. (18)

From (7), (8), (9), it is easy to check that

LVi = MViCVi = MViˆCVi +

˜MVi(ˆCVi � ˜CVi).

Plugging this into (18),

˙V =

nX

i=1

�(MVi ei + ↵i)T pi + (

˜LVi � ˜MViˆCVi)

T pi

+aT��1˙a.

Using (7), we have

˙V =

nX

i=1

�(

ˆMVi ei + ↵i)T pi + (

˜LVi � ˜MVipi)T pi

+aT��1˙a.

Substituting the dynamics specified by (3) and control lawspecified by (10), we obtain

˙V =

nX

i=1

� 1

ˆ�i

(

ˆMVi ei + ↵i)TKi(

ˆMVi ei + ↵i)

+

nX

i=1

(

˜LVi � ˜MVipi)Tpi + aT��1

˙a.

Using (7) and (8),

˙V =

nX

i=1

� 1

ˆ�i

(


ˆMVi ei + ↵i)

+aTnX

i=1

Z

Vi

WsK(q)(q� pi)T dq pi

+aT��1˙a.

Plugging in the adaptation law from (15), (11) and (12),

˙V =

nX

i=1

� 1

ˆ�i

(


ˆMVi ei + ↵i)

��

Z t

0w(⌧)(˜�r(⌧))

2d⌧ � aT Iproj ˙apre. (19)

Denote the three terms in (19) as ⇠1(t), ⇠2(t) and ⇠3(t), sothat ˙V(t) = ⇠1(t)+⇠2(t)+⇠3(t). Notice that ⇠1(t) 0 sinceKi is uniformly positive definite and ˆ�i > 0, ⇠2(t) 0

since it is the negative integral of a squared quantity, andit was proven in [1] that ⇠3(t) 0. Now consider the time

2175

integral of each of these three terms,R t0 ⇠k(⌧)d⌧ , k = 1, 2, 3.

Since each of the terms is non-positive,R t0 ⇠k(⌧)d⌧ 0, 8k,

and since V > 0, each integral is lower bounded byR t0 ⇠k(⌧)d⌧ � �V0, where V0 is the initial value of V . There-

fore, these integrals are lower bounded and non-increasing,and hence limt!1

R t0 ⇠k(⌧)d⌧ exists and is finite for all k.

Furthermore, it was shown in [1] (Lemma 1) that ˙⇠1(t) and˙⇠2(t) are uniformly bounded, therefore ⇠1(t) and ⇠2(t) areuniformly continuous. Hence, by Barbalat’s Lemma [16],⇠1(t) ! 0 and ⇠2(t) ! 0. This implies (i) and (ii).

Remark III.2. Property (i) from Theorem III.1 implies thatthe path will reach a locally optimal configuration for sens-ing, i.e. an informative path, where the waypoints reach astable balance between providing good sensing locations andbeing close to their neighbor waypoints. Additionally, thisbalance can be tuned to the desired behavior (short paths vs.good coverage) by proper selection of Wn and Ws.Remark III.3. Property (ii) from Theorem III.1 implies thatthe sensory function estimate error, ˜�(q), will converge tozero for all points on the robot’s trajectory with positiveweight w(t), but not necessarily for all the environment. Thismeans that the robot will learn the true sensory function forthe environment if its trajectory is rich enough while theweight is positive. Since it is crucial for the sensory functionestimate to approach the true sensory function for all theenvironment, the initial waypoint locations can be designedso the robot initially travels most of the environment indynamic unknown environments (see Figure 2).

This adaptive controller is applicable to any sensing taskdone by a robot without full knowledge of the environmentstructure. In the next section we present an extension that isspecific to persistent sensing tasks.

IV. ADAPTIVE CONTROLLER FOR PERSISTENT SENSING

The main motivation behind this work is to generate aninformative path that can be used by a robot tasked withpersistent sensing. We now extend the adaptive controllerfrom Section III and prove that it enables a robot with afinite sensor footprint to learn the environment dynamics, inthe form of growth rates of the environment’s accumulationfunction, and to generate an informative path to follow andsense the growing accumulation function in order to maintainit bounded.

The robot is assumed to be equipped with a sensor with afinite footprint. Examples of sensors with finite footprint area camera for a surveillance task and a vacuum cleaner fora cleaning task. Although any footprint shape can be used,we use a constant circular footprint for simplicity, definedas F (pr) = {q 2 Q : kq� prk ⇢}, when the robot is atlocation pr, where ⇢ is a constant positive scalar.

We would like to use the stability criterion for a persistentsensing task given the speed of the robot at each pointalong the path, referred to as the speed profile, definedin [2], and plug it into the adaptive controller, such that thecontrol action increases the stability margin of the persistentsensing task through time. However, since the robot does not

know the environment, but has an estimate of it, it uses theestimated version of this stability criterion, given by

ˆ�(q, t)

c(q)T (t)� ⌧c(q, t) = s(q, t) < 0,

8q s.t. ˆ�(q, t) > 0, (20)

where ˆ�(q, t) (the estimated sensory function) is the esti-mated rate at time t at which the environment’s accumulationfunction grows at point q (referred to as growth rate), theconstant scalar c(q) is the rate at which the accumulationfunction shrinks (is consumed) when the robot’s sensor iscovering point q, and c(q) > �(q), 8q is necessary for astable persistent task to be feasible. Also, T (t) is the time ittakes the robot to complete the path at time t, and ⌧c(q, t)is the time the robot’s sensor covers point q along the pathat time t. These last two quantities are calculated using thespeed profile for the path at time t and the robot’s sensorfootprint F (pr). The estimated stability margin of the systemis ˆS(t) = �(maxq s(q, t)). A stable persistent task is onein which S (the true version of ˆS) is positive, which meansthe robot is able to maintain the environment’s accumulationfunction bounded at all points q. Note that only points qthat satisfy ˆ�(q, t) > 0 are considered in a persistent sensingtask since it is not necessary to persistently sense a point thathas no sensory interest. Points that satisfy this condition arereferred to as points of interest.

In [2], a linear program was given which can calculate thespeed profile for the path at time t that maximizes ˆS(t) (orS(t) for ground-truth). With the speed profiles obtained withthis linear program, and using (20), we can formulate a newcontroller to drive the robot’s path in a direction to performstable persistent sensing tasks. Hence, from this point on,we assume that this maximizing speed profile is known andused to obtain s(q, t), 8q, 8t.

Let the waypoints have new dynamics of the form

pi = Iiui, (21)

where ui is defined in (10),

Ii =

(

0, if @S@pi

Tui < 0 and t� tiu > ⌧dwell,

1, otherwise,(22)

⌧dwell is a design parameter, and tiu is the most recent time atwhich Ii switched from zero to one (switched “up”). Equa-tions (21) and (22) look at how each waypoint movementaffects the stability margin through time, and ensure that thisquantity does not decreases.Remark IV.1. For positive ✏ ! 0, @S

@pi(t) is not always

defined when argmaxq s(q, t � ✏) 6= argmaxq s(q, t + ✏).In such cases @S

@pi(t) refers to @S

@pi(t+ ✏).

Theorem IV.2 (Convergence Theorem for Persistent Sensing).Under Assumption II.4, with waypoint dynamics specified by(21), control law specified by (10), and adaptive law specifiedby (15), we have

(i) limt!1 Ii(t)k ˆMVi(t)ei(t) + ↵i(t)k = 0,8i 2 {1, . . . , n},

2176

(ii) limt!1 k˜�r(⌧)k = 0, 8⌧ | w(⌧) > 0.

Proof. We define a Lyapunov-like function based on therobot’s path and environment-rates estimate, and proveasymptotic stability of the system to a locally optimalequilibrium. However, contrary to the previous section, abit more work is needed to prove property (i) due to thepiecewise differentiability of the new Lyapunov-like functioncaused by the new waypoint dynamics.

Let Vf be the new Lyapunov-like function, and let Vf= V

from (17). Then, following the procedure from Section III,but with pi defined by (21), we get

˙Vf=

nX

i=1

� 1

ˆ�i

(

ˆMVi ei + ↵i)TIiKi(

ˆMVi ei + ↵i)

��

Z t

0w(⌧)(˜�r(⌧))

2d⌧ � aT Iproj ˙apre. (23)

Using a similar analysis as in Section III, denote thethree terms in (23) as �⇠1(t), �⇠2(t) and �⇠3(t), sothat ˙Vf

(t) = �⇠1(t)� ⇠2(t)� ⇠3(t). In Section III weshowed that ⇠1(t) � 0, ⇠2(t) � 0, and ⇠3(t) � 0. Addi-tionally, the time integral of each of these three terms,limt!1

R t0 ⇠k(⌧)d⌧ , exists and is finite for all k. It was

shown in [1] (Lemma 1) that ˙⇠2(t) is uniformly bounded,therefore ⇠2(t) is uniformly continuous. Hence, by Barbalat’sLemma, ⇠2(t) ! 0. This implies (ii).

Now, as proven before,R10 ⇠1(⌧)d⌧ exists and is finite,

and we want to show that limt!1 ⇠1(t) = 0. Let ⇠i1be defined such that ⇠1 =

Pni=1 ⇠

i1. Let us assume that

limt!1 ⇠i1(t) 6= 0, that is, 8t 9tj � t, ✏ > 0, such that⇠i1(tj) � ✏. Let {tj}1j=1 be the infinite sequence of tj’sseparated by more than 2⌧dwell such that ⇠i1(tj) � ✏. Thatis, |tj � tj0 | > 2⌧dwell, 8j 6= j0 and tj , tj0 2 {tj}1j=1. Since,from [1] (Lemma 1), ˙⇠i1(t) is uniformly bounded by somevalue B when Ii = 1 (i.e. | ˙⇠i1(t)| B), and whenever Ii = 1,it remains with this value for at least ⌧dwell, then we have that8tj 2 {tj}1j=1,

Z tj+⌧dwell

tj�⌧dwell

⇠i1(⌧)d⌧ � ✏� > 0, (24)

where

� = min{ ✏

2B,⌧dwell

2

} > 0. (25)

ThenZ 1

0⇠i1(⌧)d⌧ �

X

tj2{tj}1j=1

Z tj+⌧dwell

tj�⌧dwell

⇠i1(⌧)d⌧

�X

tj2{tj}1j=1

✏�, (26)

which is infinite, and contradicts the already proven fact thatR10 ⇠1(⌧)d⌧ =

Pni=1

R10 ⇠i1(⌧)d⌧ exists and is finite. There-

fore, by contradiction, limt!1 ⇠i1(t) = 0, hence property (i)holds.

Remark IV.3. The stability margin can theoretically worsenwhile Ii, for some i, cannot switch from one to zero because

(a) Learning iteration: 0

(b) Learning iteration: 75

(c) Learning iteration: 210

Fig. 2: Simulated system during the learning phase. Left: 1) the path, shownas the black line, connects all the waypoints, shown as blue circles; 2)the points of interest in the environment are shown as purple dots; 3)the simulated robot and its sensor footprint are shown as a green dot andgreen circle, respectively. Right: the environment growth rates, where thetranslucent data represents the true environment growth rates and the solid-colored data represents the estimated growth rates.

0 100 200 3000

0.5

1

Learning iterations

Inte

g.

Pa

ram

. E

rro

r

Fig. 3: Integral parameter error

0 50 100 150 200

100

200

300

400

500

Learning iterations

Ly

ap

un

ov

!li

ke

Fu

nc

tio

n

Fig. 4: Lyapunov-like function

it is waiting for t � tiu > ⌧dwell. However, ⌧dwell can beselected arbitrarily small and, in practice, any computer willenforce a ⌧dwell due to discrete time steps. Therefore, it isnot a practical restriction. As a result, intuitively, (i) fromTheorem IV.2 means that limt!1 k ˆMVi(t)ei(t)+↵i(t)k = 0

only if this helps the persistent sensing task. Otherwiselimt!1 Ii(t) = 0, meaning that the persistent task will notbenefit if waypoint i moves.

2177

0 200 400 600 8000

0.05

0.1

0.15

Iterations

Mean

Po

sit

ion

Err

or

(m)

EstimatedTrue

Fig. 5: Mean waypoint position error

0 200 400 600 800

43.5

44

44.5

45

Iterations

Lyap

un

ov!

like F

un

cti

on

Fig. 6: Lyapunov-like function

0 200 400 600 800!0.1

!0.05

0

0.05

Iterations

Sta

bilit

y M

arg

in

Estimated

True

Fig. 7: Stability margin for persistent sensing task

V. SIMULATION, IMPLEMENTATION AND RESULTS

A. Numerical Simulation

The adaptive controller for persistent tasks was simulatedin a MATLAB environment for many test cases. Here wepresent a case for n = 71 waypoints. A fixed-time stepnumerical solver was used with a time step of 0.01 seconds,and ⌧dwell = 0.009. The region Q was taken to be theunit square. The sensory function �(q) was parametrizedas a Gaussian network with 25 truncated Gaussians, i.e.K = [K(1) · · · K(25)]

T , where

G(j) =

1

�p2⇡

expn

� (q� µj)2

2�2

o

,

Gtrunc =

1

�p2⇡

expn

� ⇢trunc2

2�2

o

,

K(j) =

(

G(j)�Gtrunc, if kq� µjk < ⇢trunc,

0, otherwise,(27)

� = 0.18 and ⇢trunc = 0.1. The unit square was dividedinto an even 5 ⇥ 5 grid and each µj was chosen sothat each of the 25 Gaussians was centered at its cor-responding grid square. The parameters were chosen asa(6) = 20, a(7) = 10, a(8) = 16, a(14) = 10, a(17) = 16,and a(j) = 0 otherwise. The environment growth rates cre-ated with these parameters can be seen in Figure 2c. Theestimated parameter a was initialized element-wise to 5,and ⇤ and � were initialized to zero. The parameters forthe controller were Ki = 30, 8i, � = identity, � = 500,Wn = 3, Ws = 50, w = 10, and ⇢ = 0.05. The spatialintegrals were approximated by discretizing each Voronoiregion into a 7 ⇥ 7 grid and summing contributions of theintegrand over the grid. Voronoi regions were computedusing a decentralized algorithm similar to the one in [12].The environment was discretized into a 12⇥12 grid and onlypoints in this grid that satisfied ˆ�(q) > 0 were used as pointsof interest in (20) . This discretization is only used in (20),and by using this discretized version of the environment, therunning time for experiments is greatly reduced. For moresensitive systems, this grid can be refined.

The initial path was designed to “zig-zag” across theenvironment (see Figure 2) to provide a rich initial trajectoryfor the robot to sample the environment. We first allowedthe robot to go through the initial path without reshaping itso that it could sample the space and learn the distributionof sensory information in the environment. Therefore, wepresent results in two separate phases: 1) learning phase, and

2) path shaping phase. The learning phase corresponds to therobot traveling through its full path once, without reshapingit, in order to learn the environment rates of change. The pathshaping phase corresponds to when (21) is used to reshapethe path into an informative path, and starts when the learningphase is done. In the path shaping phase, w = 0.

1) Learning Phase: The robot travels its entire pathonce, measuring the environment growth rates as it travelsand using the adaptation law from (15) to estimate theserates. This process can be seen in Figure 2. As the robottravels its path, the adaptation law causes ˜�(q) ! 0,8q 2 Q, as can be seen from Figures 2a, 2b, 2c, wherethe robot’s estimate of the environment growth rates (ˆ�(q),solid-colored data) converges to the real environment growthrates (�(q), translucent data) for all of the space. Thismeans that the robot’s trajectory was rich enough to generateaccurate estimates of the environment structure. Figure 3shows that limt!1

R t0 w(⌧)(˜�r(⌧))2d⌧ = 0, in accordance

with (ii) from Theorem IV.2. Finally, for this learning phase,we see in Figure 4 that the Lyapunov-like function Vf ismonotonically non-increasing, which supports our theory.

2) Path Shaping Phase: Once the robot travels throughits path once (and learns the environment growth rates),the controller from Section IV is activated. We can seehow the path evolves under this controller in Figure 1.The robot learns the location of the points of interest (withpositive growth rate) in the learning phase and, consequently,shapes its path to cover these points while improving thestability margin of the persistent sensing task. After 65iterations (Figure 1c), the path already tends to go throughall points of interest, while avoiding all other points. At 255iterations (Figure 1f), the path clearly only goes throughpoints of interest, enabling the robot to perform a locallyoptimal persistent sensing task. The path at 800 iterations isapproximately the same to the one at 255 iterations.

Figure 5 shows the true and estimated mean way-point position errors, where the estimated error refersto the quantity Ii(t)k ˆMVi(t)ei(t) + ↵i(t)k. As shown,limt!1 Ii(t)k ˆMVi(t)ei(t) + ↵i(t)k = 0, in accordance to(i) from Theorem IV.2. Figure 6 shows the Lyapunov-likefunction Vf monotonically non-increasing during this pathmorphing phase and settling to a local minimum, meaningthat the path is driven to a locally optimal configuration thatis helpful for the persistent sensing task. The initial value ofthis function in the path morphing phase is the final valueof the function in the learning phase.

2178

(a) Iteration 0 (b) Iteration 50 (c) Iteration 375

Fig. 8: Hardware implementation of the system using a quadrotor robot. Three snapshots of the path shaping phase at different iteration values are shown.The path, shown as the black line, connects all the waypoints. The points of interest in the environment are shown as purple dots. The robot is the green-litquadrotor, and its sensor footprint is represented by the green circle under the robot’s position.

Finally, Figure 7 shows the persistent sensing task’s sta-bility margin evolving through time, using the speed profilesfrom [2]. The estimated stability margins is shown, as wellas the true value for ground-truth. The stability marginstarts off with a negative value, indicating that the persistentsensing task is initially unstable, and increases through time,indicating that the path is morphing to make the persistentsensing task “more stable”. By the end of the simulation, thepersistent sensing task is stable, i.e. the stability margin ispositive, and the robot achieves an informative trajectory. Thediscrete jumps in the stability margin are due to the discretetime steps, and can be minimized by shortening these steps.

B. ImplementationThe adaptive controller for persistent tasks was imple-

mented with a quadrotor robot for the same simulatedenvironment as described earlier and for a path consisting of71 waypoints. The accompanying video submission presentsthe results of this implementation. The path shaping phasewas run for 375 iterations. Results for the implementationwere practically identical to simulated results and are omit-ted to avoid redundancy. Figure 8 shows snapshots of theimplementation at different iteration values. Figure 8c showsthe final informative path, which is approximately the sameto that in Figure 1f from the simulations.

VI. CONCLUSION

This paper uses a Voronoi-based coverage approach, build-ing upon previous work in [1], to generate an adaptivecontroller for a robot to learn the rates at which the environ-ment changes, through parameter estimation, and shape itspath into an informative path, i.e. a locally optimal path forsensing dynamic regions in the environment. A Lyapunov-like proof showed that the controller will shape the pathto a locally optimal configuration, and drive the estimatedparameter error to zero, assuming the robot’s trajectory isrich enough. An extension of the adaptive controller wasdesigned and proven to drive the path to a locally optimalconfiguration that is beneficial to a persistent sensing task.The adaptive control law was simulated and implementedfor a robot moving through 71 waypoints, generating resultsthat support the theory, and producing stable persistent tasksbased only on the robot’s estimate of the environment.

REFERENCES

[1] M. Schwager, D. Rus, and J.-J. Slotine, “Decentralized, adaptivecoverage control for networked robots,” The International Journalof Robotics Research, vol. 28, no. 3, pp. 357–375, 2009. [Online].Available: http://ijr.sagepub.com/content/28/3/357.abstract

[2] S. L. Smith, M. Schwager, and D. Rus, “Persistent robotic tasks:Monitoring and sweeping in changing environments,” Robotics, IEEETransactions on, vol. PP, no. 99, pp. 1 –17, 2011.

[3] A. Piazzi, C. Guarino Lo Bianco, and M. Romano, “⌘3-splines for thesmooth path generation of wheeled mobile robots,” Robotics, IEEETransactions on, vol. 23, no. 5, pp. 1089 –1095, oct. 2007.

[4] K. Kyriakopoulos and G. Saridis, “Minimum jerk path generation,”in Robotics and Automation, 1988. Proceedings., 1988 IEEE Interna-tional Conference on, apr 1988, pp. 364 –369 vol.1.

[5] L. Kavraki, P. Svestka, J.-C. Latombe, and M. Overmars, “Probabilisticroadmaps for path planning in high-dimensional configuration spaces,”Robotics and Automation, IEEE Transactions on, vol. 12, no. 4, pp.566 –580, aug 1996.

[6] C. Cunningham and R. Roberts, “An adaptive path planning algorithmfor cooperating unmanned air vehicles,” in Robotics and Automation,2001. Proceedings 2001 ICRA. IEEE International Conference on,vol. 4, 2001, pp. 3981 – 3986 vol.4.

[7] A. Stentz, “Optimal and efficient path planning for partially-knownenvironments,” in Robotics and Automation, 1994. Proceedings., 1994IEEE International Conference on, may 1994, pp. 3310 –3317 vol.4.

[8] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, “Efficientinformative sensing using multiple robots,” J. Artif. Int. Res.,vol. 34, no. 1, pp. 707–755, Apr. 2009. [Online]. Available:http://dl.acm.org/citation.cfm?id=1622716.1622735

[9] G. Hollinger and S. Singh, “Multi-robot coordination with periodicconnectivity,” in Robotics and Automation (ICRA), 2010 IEEE Inter-national Conference on, may 2010, pp. 4457 –4462.

[10] D. Levine, B. Luders, and J. P. How, “Information-rich path plan-ning with general constraints using rapidly-exploring random trees,”in AIAA Infotech@Aerospace Conference, Atlanta, GA, April 2010,(AIAA-2010-3360).

[11] R. Graham and J. Cortes, “Adaptive information collection by roboticsensor networks for spatial estimation,” Automatic Control, IEEETransactions on, vol. PP, no. 99, p. 1, 2011.

[12] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control formobile sensing networks,” Robotics and Automation, IEEE Transac-tions on, vol. 20, no. 2, pp. 243 – 255, april 2004.

[13] S. Salapaka, A. Khalak, and M. Dahleh, “Constraints on locationaloptimization problems,” in Decision and Control, 2003. Proceedings.42nd IEEE Conference on, vol. 2, dec. 2003, pp. 1741 – 1746 Vol.2.

[14] Z. Drezner, “Facility location: A survey of applications and methods,”in Springer Series in Operations Research. New York: Springer-Verlag, 1995.

[15] L. C. A. Pimenta, M. Schwager, Q. Lindsey, V. Kumar, D. Rus,R. C. Mesquita, and G. A. S. Pereira, “Simultaneous coverage andtracking (scat) of moving targets with robot networks,” in Proceedingsof the Eighth International Workshop on the Algorithmic Foundationsof Robotics (WAFR 08), December 2008.

[16] P. A. Ioannou and J. Sun, Robust Adaptive Control. Englewood Cliffs,NJ: Prentice-Hall, 1996.

2179

Generating Informative Paths for Persistent Sensing in...

Documents

Transcript of Generating Informative Paths for Persistent Sensing in...