Receding Horizon Stochastic Control Algorithms for Sensor Management ACC 2010


Receding Horizon Stochastic Control Algorithms for Sensor Management

Darin Hitchings and David A. Castañón

Abstract— The increasing use of smart sensors that can dynamically adapt their observations has created a need for algorithms to control the information acquisition process. While such problems can usually be formulated as stochastic control problems, the resulting optimization problems are complex and difficult to solve in real-time applications. In this paper, we consider sensor management problems for sensors that are trying to find and classify objects. We propose alternative approaches for sensor management based on receding horizon control using a stochastic control approximation to the sensor management problem. This approximation can be solved using combinations of linear programming and stochastic control techniques for partially observed Markov decision problems in a hierarchical manner. We explore the performance of our proposed receding horizon algorithms in simulations using heterogeneous sensors, and show that their performance is close to that of a theoretical lower bound. Our results also suggest that a modest horizon is sufficient to achieve near-optimal performance.

I. INTRODUCTION

Recent advances in embedded computing have introduced a new generation of sensors that have the capability of adapting their sensing dynamically in response to collected information. For instance, unmanned aerial vehicles (UAVs) have multiple sensors, such as radar and electro-optical cameras, which can dynamically change their fields of view and measurement modes. These advances have created a need for a commensurate theory of sensor management (SM) and control to ensure that relevant information is collected for the mission of the sensor system given the available sensor resources. There are numerous applications involving surveillance, diagnosis and fault identification that require such control.

One of the earliest examples of SM arose in the context of search, with applications to anti-submarine warfare [1]. Sensors had the ability to move spatially and allocate their search effort over time and space. Most of the early work on search theory focused on open-loop search plans rather than feedback control of search trajectories [2]. Extensions of search theory to problems requiring adaptive feedback strategies have been developed in some restricted contexts [3].

Adaptive SM has its roots in the field of statistics, where Bayesian experiment design was used to configure subsequent experiments based on observed information. Wald [4], [5] considered sequential hypothesis testing with costly observations. Lindley [6] and Kiefer [7] expanded the concepts to include variations in potential measurements. Chernoff [8]

This work was supported by a grant from AFOSR. The authors are with the Dept. of Electrical & Computer Eng., Boston University, [email protected], [email protected].

and Fedorov [9] used Cramér-Rao bounds for selecting sequences of measurements for nonlinear regression problems. Most of the strategies proposed for Bayesian experiment design involve single-step optimization criteria, resulting in greedy or myopic strategies that optimize bounds on the expected performance after the next experiment. Other approaches to adaptive SM using single-stage optimization have been proposed using alternative information-theoretic measures [10], [11].

Feedback control approaches to SM that consider optimization over time have also been explored. Athans [12] considered a two-point boundary value approach to controlling the error covariance in linear estimators by choosing the measurement matrices. Multi-armed bandit formulations have been used to control individual sensors in applications related to target tracking [13], [14]. Such approaches are restricted to single-sensor control, selecting among individual subproblems to measure, in order to obtain solutions using Gittins indices [15], [16]. Approximate dynamic programming (DP) techniques have also been proposed using approximations to the optimal cost-to-go based on information-theoretic measures evaluated using Monte Carlo techniques [17], [18]. A good overview of these techniques is available in [19].

The above approaches for dynamic feedback control are limited in application to problems with a small number of sensor-action choices and simple constraints, because the algorithms must enumerate and evaluate the various control actions. In [20], combinatorial optimization techniques are integrated into a DP formulation to obtain approximate stochastic dynamic programming (SDP) algorithms that extend to large numbers of sensor actions. Subsequent work in [21] derived an SDP formulation using partially observed Markov decision processes (POMDPs) and obtained a computable lower bound to the achievable performance of feedback strategies for complex multi-sensor management problems. The lower bound was obtained by a convex relaxation of the original combinatorial POMDP using mixed strategies and averaged constraints. However, the results in [21] do not specify algorithms with performance close to the lower bound.

In this paper, we develop and implement algorithms for the efficient computation of adaptive SM strategies for complex problems involving multiple sensors with different observation modes and large numbers of objects. The algorithms are based on using the lower bound formulation from [21] as an objective in a receding horizon (RH) optimization problem and developing techniques for obtaining feasible decisions from the mixed strategy solutions. The resulting


algorithms are scalable to large numbers of tasks, and suitable for real-time SM. We also extend the model of [21] to incorporate search actions in addition to classification. We evaluate alternative approaches for obtaining feasible decision strategies, and evaluate the resulting performance of the RH algorithms using multi-sensor simulations. Our simulation results demonstrate that our RH algorithms achieve performance comparable to the predicted lower bound of [21] and shed insight into the relative value of different strategies for partitioning sensor resources either geographically or by sensor specialization.

The rest of this paper is organized as follows: Section II describes the formulation of the stochastic SM problem. Section III provides an example of the column generation technique for generating mixed strategies for SM. Section IV discusses how we create feasible, sequenced sensor schedules from these mixed strategies. Section V documents our simulation results for various scenarios. Section VI summarizes our results and discusses areas for future work.

II. PROBLEM FORMULATION AND BACKGROUND

The problem formulation is an extension of the POMDP formulation presented in [21]. Assume that there are a finite number of locations 1, . . . , N, each of which may have an object with a given type, or which may be empty. Assume that there is a set of S sensors, each of which has multiple sensor modes, and that each sensor can observe one and only one location at each time with a selected mode.

Let xi ∈ {0, 1, . . . , D} denote the state of location i, where xi = 0 if location i is unoccupied, and otherwise xi = k > 0 indicates location i has an object of type k. Let πi(0) ∈ ℝ^{D+1} be a discrete probability distribution over the possible states for the ith location for i = 1, . . . , N, where D ≥ 2. Assume additionally that the random variables xi, i = 1, . . . , N are mutually independent.

There are s = 1, . . . , S sensors, each of which has m = 1, . . . , Ms possible modes of observation. We assume there is a series of T discrete decision stages where sensors can select which location to measure, where T is large enough so that all of the sensors can use their available resources. At each stage, each sensor can choose to employ one and only one of its modes on a single location to collect a noisy measurement concerning the state xi at that location. Each sensor s has a limited set of locations that it can observe, denoted by Os ⊆ {1, . . . , N}. A sensor action by sensor s at stage t is a pair:

us(t) = (is(t), ms(t))   (1)

consisting of a location to observe, is ∈ Os, and a mode for that observation, ms.

Sensor measurements are modeled as belonging to a finite set y ∈ {1, . . . , Ls}. The likelihood of the measured value is assumed to depend on the sensor s, sensor mode m, location i, and on the true state at the location, xi, but not on the states of other locations. Denote this likelihood as P(y|xi, i, s, m). We assume that this likelihood is time-invariant, and that the random measurements yi,s,m(t) are conditionally independent of other measurements yj,σ,n(τ) given the location states xi, xj for all sensor modes m, n, provided i ≠ j or τ ≠ t.

Each sensor s has a limited quantity Rs of resources available for measurements. Associated with the use of mode m by sensor s on location i is a resource cost rs(us(t)), representing power or some other type of resource required to use this mode on this sensor:

Σ_{t=0}^{T−1} rs(us(t)) ≤ Rs   ∀ s ∈ [1, . . . , S]   (2)

This is a hard constraint for each realization of observations and decisions.

Let I(t) denote the sequence of past sensing actions and measurement outcomes up to and including stage t − 1:

I(t) = {(us(k), ys(k)), s = 1, . . . , S; k = 0, . . . , t − 1}

Under the assumption of conditional independence of measurements and independence of the individual states at each location, the conditional probability of (x1, . . . , xN) given I(t) can be factored as a product of belief states at each location. Denote the belief state at location i as πi(t) = p(xi|I(t)). When a sensor measurement is taken, the belief state is updated according to Bayes' rule. A measurement of location i with the sensor-mode combination us(t) = (i, m) at stage t that generates observable y(t) updates the belief vector as:

πi(t + 1) = diag{P(y(t)|xi = j, i, s, m)} πi(t) / ( 1ᵀ diag{P(y(t)|xi = j, i, s, m)} πi(t) )   (3)

where 1 is the (D + 1)-dimensional vector of all ones. Eq. (3) captures the relevant information dynamics that SM controls.
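The update in Eq. (3) is a standard per-location Bayes step. The sketch below is our own illustrative code (the function name is not from the paper), using NumPy, applying one measurement to a single location's belief vector:

```python
import numpy as np

def belief_update(pi, likelihood_col):
    """One-step Bayes update of a location's belief vector (Eq. 3).

    pi: belief over the D+1 location states, shape (D+1,)
    likelihood_col: P(y | xi = j, i, s, m) for the observed y, one entry per state j
    """
    unnorm = likelihood_col * pi        # diag{P(y|xi=j, i, s, m)} pi(t)
    return unnorm / unnorm.sum()        # normalize by 1^T diag{...} pi(t)

# Example: two states, a sensor that reports the true state 90% of the time,
# starting from a uniform belief and observing y = 1
pi = np.array([0.5, 0.5])
lik_y1 = np.array([0.9, 0.1])           # P(y = 1 | xi = j) for the two states
post = belief_update(pi, lik_y1)        # -> [0.9, 0.1]
```

Repeating this update with independent measurements of the same quality drives the belief toward a vertex of the simplex, which is what the sensing budget is spent on.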

In addition to the information dynamics, there are resource dynamics that characterize the available resources at stage t. The dynamics for sensor s are given as:

Rs(t + 1) = Rs(t) − rs(us(t));   Rs(0) = Rs   (4)

These dynamics constrain the admissible decisions by a sensor, in that it can only use modes that do not use more resources than are available.

Given the final information I(T), the quality of the information collected is measured by making an estimate of the state of each location i given the available information. Denote these estimates as vi, i = 1, . . . , N. The Bayes cost of selecting estimate vi when the true state is xi is denoted c(xi, vi) ∈ ℝ, with c(xi, vi) ≥ 0. The objective of the SM stochastic control formulation is to minimize:

J = Σ_{i=1}^{N} E[c(xi, vi)]   (5)

by selecting adaptive sensor control policies and final estimates subject to the dynamics of Eq. (3) and the constraints of Eq. (4) and Eq. (2).
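For a fixed final belief, the inner minimization over the estimate vi has a closed form: pick the estimate with the smallest posterior expected cost. A minimal sketch (our own illustrative names, not the paper's code):

```python
import numpy as np

def bayes_estimate(pi, cost):
    """Estimate v minimizing the posterior expected Bayes cost E[c(xi, vi)].

    pi: belief over the K location states, shape (K,)
    cost: cost matrix with cost[x, v] = c(x, v), shape (K, K)
    """
    expected = pi @ cost                # expected cost of each candidate estimate v
    v = int(np.argmin(expected))
    return v, float(expected[v])

# 0/1 classification cost: any misclassification costs 1 unit
cost = np.ones((2, 2)) - np.eye(2)
v, ec = bayes_estimate(np.array([0.9, 0.1]), cost)   # -> v = 0, expected cost 0.1
```

Under the 0/1 cost this reduces to the maximum a posteriori estimate, and the residual expected cost is the probability mass on the other states.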

The results in [21] provide an SDP algorithm to solve the above problem, with the cost-to-go at stage t depending on the joint belief state π(t) = [π1(t), . . . , πN(t)] and the residual resource state R(t) = [R1(t), . . . , RS(t)]. Because


of this dependency, the cost-to-go does not decouple over locations. This leads to a very large POMDP problem with combinatorially many actions and an underlying belief state of dimension (D + 1)^N that is computationally intractable unless there are very few locations.

In [21], the above stochastic control problem was replaced with a simpler problem that provided a lower bound on the optimal cost, by expanding the set of admissible strategies, replacing the constraints of Eq. (2) by the "soft" constraints:

E[ Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs   ∀ s ∈ [1, . . . , S]   (6)

To solve the simpler problem, [21] proposed incorporating the soft constraints of Eq. (6) into the objective function using Lagrange multipliers λs for each sensor s. The augmented objective function is:

Jλ = J + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (7)

A key result in [21] was that when the optimization of Eq. (7) was performed over mixed strategies for given values of the Lagrange multipliers λs, the stochastic control problem decoupled into independent POMDPs for each location, and the optimization could be performed using feedback strategies for each location i that depend only on the information collected for that location, Ii(t). These POMDPs have an underlying information state space of dimension D + 1, corresponding to the number of possible states at a single location, and can be solved efficiently. Because the measurements and possible sensor actions are finite-valued, the set of possible SM strategies Γ is also finite. Let Q(Γ) denote the set of mixed strategies that assign probability q(γ) to the choice of strategy γ ∈ Γ. The problem of finding the optimal mixed strategies can be written as:

min_{q∈Q(Γ)} Σ_{γ∈Γ} q(γ) Eγ[J(γ)]   (8)

Σ_{γ∈Γ} q(γ) Eγ[ Σ_{i=1}^{N} Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs   s ∈ [1, . . . , S]   (9)

Σ_{γ∈Γ} q(γ) = 1   (10)

where we have one constraint for each of the S sensor resource pools and an additional simplex constraint in Eq. (10), which ensures that q ∈ Q(Γ) forms a valid probability distribution. This is a large linear program (LP), whose possible variables are the strategies in Γ. However, the total number of constraints is S + 1, which establishes that optimal solutions of this LP are mixtures of no more than S + 1 strategies. Thus, one can use a column generation approach [22], [23], [24] to quickly identify an optimal mixed strategy. In this approach, one solves Eq. (8) and Eq. (9) restricting the mixed strategies to be mixtures of a small subset Γ′ ⊂ Γ. The solution of the restricted LP has optimal dual prices λs, s = 1, . . . , S. Using these prices,

one can determine a corresponding optimal pure strategy by minimizing:

Jλ = Σ_{i=1}^{N} E[c(xi, vi)] + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (11)

which the results in [21] show can be decoupled into N independent optimization problems, one for each location. Each of these problems is solved as a POMDP using standard algorithms such as point-based value iteration (PBVI) [25] to determine the best pure strategy γ1 for these prices. If the best pure strategy γ1 is already in the set Γ′, then the solution of Eq. (8) and Eq. (9) restricted to Q(Γ′) is an optimal mixed strategy over all of Q(Γ). Otherwise, the strategy γ1 is added to the admissible set Γ′, and the iteration is repeated. The result is a set of mixed strategies that achieves a performance level that is a lower bound on the original SM optimization problem with hard constraints.
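The master/pricing interaction can be sketched generically. In the sketch below (our own illustration, not the authors' implementation) the restricted master LP is solved with SciPy's linprog (HiGHS), and a pricing oracle, standing in for the decoupled per-location POMDP solves of Eq. (11), returns the cheapest pure strategy at the current dual prices; the loop stops when pricing returns a column already in Γ′:

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(costs, resource_use, budgets):
    """Restricted master LP over the current strategy set Gamma'.

    costs: (K,) expected cost of each pure strategy in Gamma'
    resource_use: (S, K) expected per-sensor resource use of each strategy
    budgets: (S,) resource budgets Rs
    Returns mixture weights q, dual prices lambda of the resource rows, value.
    """
    K = len(costs)
    res = linprog(costs, A_ub=resource_use, b_ub=budgets,
                  A_eq=np.ones((1, K)), b_eq=[1.0], method="highs")
    lam = -res.ineqlin.marginals        # duals of <= rows are nonpositive
    return res.x, lam, res.fun

def column_generation(costs, use, pricing_oracle, budgets, max_iters=50):
    """Generic column generation: re-solve the master, price out a new pure
    strategy at the current duals, stop when no new column is generated."""
    seen = {(c, tuple(u)) for c, u in zip(costs, use)}
    for _ in range(max_iters):
        q, lam, val = solve_master(np.array(costs), np.array(use).T, budgets)
        c_new, u_new = pricing_oracle(lam)   # best pure strategy at prices lam
        if (c_new, tuple(u_new)) in seen:    # column already present: optimal
            return q, val
        seen.add((c_new, tuple(u_new)))
        costs.append(c_new)
        use.append(list(u_new))
    return q, val

# Toy pricing oracle: score a fixed candidate set by cost + lambda' * use;
# in the paper this step is the decoupled POMDP solve of Eq. (11).
cands = [(10.0, [0.0]), (2.0, [2.0]), (5.0, [1.0])]
def oracle(lam):
    return min(cands, key=lambda cu: cu[0] + float(np.dot(lam, cu[1])))

q, val = column_generation([10.0], [[0.0]], oracle, np.array([1.0]))
```

In the toy run the loop converges to the pure strategy of cost 5.0, the cheapest option that fits the budget of 1 resource unit; with S sensors the master never needs more than S + 1 active columns.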

III. COLUMN GENERATION AND POMDP SUBPROBLEM EXAMPLE

We present an example to illustrate the column generation and POMDP algorithms discussed previously. In this simple example we consider 100 objects (N = 100), 2 possible object types (D = 2) with X = {non-military vehicle, military vehicle}, and 2 sensors that each have one mode (S = 2 and Ms = 1 ∀ s ∈ {1, 2}). Sensor s actions have resource costs rs, where r1 = 1, r2 = 2. Sensors return 2 possible observation values, corresponding to binary object classifications, with likelihoods:

P(yi,1,1(t)|xi, u1(t)) = [0.90 0.10; 0.10 0.90]      P(yi,2,1(t)|xi, u2(t)) = [0.92 0.08; 0.08 0.92]

where the (j, k) matrix entry denotes the likelihood that y = j if xi = k. The second sensor has 2% better performance than the first sensor but requires twice as many resources to use. Each sensor has Rs = 100 units of resources and can view each location. Each of the 100 locations has a uniform prior πi = [0.5 0.5]ᵀ ∀ i. For the performance objective, we use c(xi, vi) = 1 if xi ≠ vi, and 0 otherwise; i.e., the cost is 1 unit for a classification error.

Table I demonstrates the column generation solution process. The first three columns are initialized by guessing values of resource prices and obtaining the POMDP solutions, yielding expected costs and expected resource use for each sensor at those resource prices. A small LP is solved to obtain the optimal mixture of the first three strategies γ1, . . . , γ3, and a corresponding set of dual prices. These dual prices are used in the POMDP solver to generate the fourth column γ4, which yields a strategy that is different from those of the first 3 columns. The LP is re-solved for mixtures of the first 4 strategies, yielding new resource prices that are used to generate the next column. This process continues until the solution using the prices after 7 columns yields a strategy that was already represented in a previous column, terminating


                 γ1       γ2      γ3      γ4      γ5      γ6      γ7
min              50.0     2.80    2.44    1.818   8       10      6.22
R1               0        218     200     0       0       100     150     ≤ 100
R2               0        0       36      800     200     0       18      ≤ 100
Simplex          1        1       1       1       1       1       1       = 1
Optimal cost     -        -       26.22   21.28   7.35    5.95    5.95
Mixture weights  0        0.424   0       0       0.500   0.076   0
λ1               1.0e15   0.024   0.010   0.238   0.227   0.217   0.061
λ2               1.0e15   0.025   0.015   0       0.060   0.210   0.041

TABLE I: Column generation example with 100 objects. The tableau is displayed in its final form after convergence. The λ1 and λ2 rows trace the dual-price trajectories up to convergence. R1 and R2 are resource constraints. γ1 is a 'do-nothing' strategy.

Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and 6 of Table I. The frequency of choosing each of these 3 strategies is controlled by the relative proportion of the mixture weight qc ∈ (0, 1) with c ∈ {2, 5, 6}.

the algorithm. The optimal mixture combines the strategies of the second, fifth and sixth columns. When the master problem converges, the optimal cost J* for the mixed strategy is 5.95 units. The resulting policy graphs are illustrated in Fig. 1, where branches up indicate measurements y = 1 ('non-military') and branches down y = 2 ('military'). The red and green nodes denote the final decision, vi, for a location. Note that the strategy of column 5 uses only the second sensor, whereas the strategies of columns 2 and 6 use only the first sensor. The mixed strategy allows the soft resource constraints to be satisfied with equality. Table I also shows the resource costs and expected classification performance of each column.
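As a quick arithmetic check on Table I, the reported mixture weights for columns 2, 5 and 6 reproduce both the optimal cost and the tight expected-resource constraints (plain Python; the column data is copied from the 'min', R1 and R2 rows of the table):

```python
# (expected cost, expected R1 use, expected R2 use) for columns 2, 5, 6 of Table I
cols = {2: (2.80, 218.0, 0.0), 5: (8.0, 0.0, 200.0), 6: (10.0, 100.0, 0.0)}
w = {2: 0.424, 5: 0.500, 6: 0.076}         # optimal mixture weights (sum to 1)

cost = sum(w[c] * cols[c][0] for c in w)   # ~5.95, the reported optimal cost J*
r1 = sum(w[c] * cols[c][1] for c in w)     # ~100: E[R1 use] meets the budget
r2 = sum(w[c] * cols[c][2] for c in w)     # 100: E[R2 use] meets the budget
```

Both expected-resource rows come out at the budget of 100 units, which is the "satisfied with equality" behavior noted above; the cost constraints hold only in expectation, not per realization.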

The example illustrates some of the issues associated with the use of soft constraints in the optimization: the resulting solution does not lead to SM strategies that will always satisfy the hard constraints of Eq. (2). We address this issue in the subsequent section.

IV. RECEDING HORIZON CONTROL

The column generation algorithm described previously solves the approximate SM problem with "soft" constraints in terms of mixed strategies that, on average, satisfy the resource constraints. However, for control purposes, one must select actual SM actions that satisfy the hard constraints

Eq. (2). Another issue is that the solutions of the decoupled POMDPs provide individual sensor schedules for each location that must be interleaved into a single coherent sensor schedule. Furthermore, exact solution of the small decoupled POMDPs for each set of prices can be time-consuming, making the resulting algorithm unsuited for real-time SM.

To address this, we explore a set of RH algorithms that convert the mixed strategy solutions discussed in the previous section to actions that satisfy the hard constraints, and limit the computational complexity of the resulting algorithm. The RH algorithms have adjustable parameters whose effects we will explore in simulation.

The RH algorithms start at stage t with an information state/resource state pair, consisting of the available information about each location i = 1, . . . , N, represented by the conditional probability vector πi(t), and the available sensor resources Rs(t), s = 1, . . . , S. The first step in the algorithms is to solve the SM problem of Eq. (5) starting at stage t to final stage T subject to the soft constraints of Eq. (6), using the hierarchical column generation / POMDP algorithms to get a set of mixed strategies. We introduce a parameter corresponding to the maximum number of sensing actions per location to control the resulting computational complexity of the POMDP algorithms.

The second step is to select sensing actions to implement at the current stage t from the mixed strategies. These strategies are mixtures of at most S + 1 pure strategies, with associated probabilistic weights. We explore three approaches for selecting sensing actions:

• str1: Select the pure strategy with maximum probability.
• str2: Randomly select a pure strategy per location according to the optimal mixture probabilities.
• str3: Select the pure strategy with positive probability that minimizes the expected sensor resource use (and thus leaves resources for use in future stages).

Once pure strategies for each location have been selected, the third step is to select a sensing action to be implemented for each location. Our approach is to select the first sensing action of the pure strategy for each location. Note that there may not be enough sensor resources to execute the selected actions, particularly in the case where the pure strategy with maximum probability is selected. To address this, we rank sensing actions by their expected entropy gain [26], which is the expected reduction in entropy of the conditional probability distribution πi(t) based on the anticipated measurement value. We schedule sensor actions in order of decreasing expected entropy gain, and perform those actions at stage t that have enough sensor resources to be feasible.
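The expected entropy gain used for this ranking is the current belief entropy minus the expected posterior entropy over the predicted measurement outcomes (equivalently, the mutual information between the location state and the next measurement). A sketch with our own illustrative names, using NumPy:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_entropy_gain(pi, lik):
    """Expected reduction in belief entropy from one measurement.

    pi:  current belief over the K states, shape (K,)
    lik: likelihood matrix with lik[y, x] = P(y | x), shape (Y, K)
    """
    gain = entropy(pi)
    for y in range(lik.shape[0]):
        unnorm = lik[y] * pi
        py = unnorm.sum()               # predictive probability of outcome y
        if py > 0:
            gain -= py * entropy(unnorm / py)
    return gain

# A 90%-accurate binary sensor is most informative on an uncertain belief
lik = np.array([[0.9, 0.1],
                [0.1, 0.9]])
g_uniform = expected_entropy_gain(np.array([0.5, 0.5]), lik)    # ~0.53 bits
g_peaked = expected_entropy_gain(np.array([0.99, 0.01]), lik)   # much smaller
```

Ranking by this quantity naturally deprioritizes locations whose beliefs are already nearly certain, so scarce resources go to the measurements expected to resolve the most uncertainty.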

The measurements collected from the scheduled actions are used to update the information states πi(t + 1) using Eq. (3). The resources used by the actions are subtracted from the available resources to compute Rs(t + 1) using Eq. (4). The RH algorithm is then executed from the new information state/resource state condition.


            Search                Low-res               Hi-res
            y1     y2     y3     y1     y2     y3     y1     y2     y3
empty       0.92   0.04   0.04   0.95   0.03   0.02   0.95   0.03   0.02
car         0.08   0.46   0.46   0.05   0.85   0.10   0.02   0.95   0.03
truck       0.08   0.46   0.46   0.05   0.10   0.85   0.02   0.90   0.08
military    0.08   0.46   0.46   0.05   0.10   0.85   0.02   0.03   0.95

TABLE II: Observation likelihoods for the different sensor modes with the observation symbols y1, y2 and y3.

Fig. 2: Illustration of a scenario with two partially-overlapping sensors.

V. SIMULATION RESULTS

In order to evaluate the relative performance of the different RH algorithms, we performed a set of experiments with simulations. In these experiments, there were 100 locations, each of which could be empty or contain an object of one of three types, so the possible states of location i were xi ∈ {0, 1, 2, 3}, where type 1 represents cars, type 2 trucks, and type 3 military vehicles. Sensors can have several modes: a search mode, a low-resolution mode and a high-resolution mode. The search mode primarily detects the presence of objects; the low-resolution mode can identify cars but confuses the other two types, whereas the high-resolution mode can separate the three types. Observations are modeled as having three possible values. The search mode consumes 0.25 units of resources, whereas the low-resolution mode consumes 1 unit and the high-resolution mode 5 units, uniformly for each sensor and location. Table II shows the likelihood functions that were used in the simulations.

Initially, each location has a state with one of two prior probability distributions: πi(0) = [0.10 0.60 0.20 0.10]ᵀ, i ∈ [1, . . . , 10], or πi(0) = [0.80 0.12 0.06 0.02]ᵀ, i ∈ [11, . . . , 100]. Thus, the first 10 locations are likely to contain objects, whereas the other 90 locations are likely to be empty. When multiple sensors are present, they may share some locations in common, and have locations that can only be seen by a specific sensor, as illustrated in Fig. 2.

The cost function used in the experiments, c(xi, vi), is shown in Table III. The parameter MD represents the cost of a missed detection, and will be varied in the experiments.

Table IV shows simulation results for a search and classify scenario involving 2 identical sensors (with the same

xi \ vi     empty   car     truck   military
empty       0       1       1       1
car         1       0       0       1
truck       1       0       0       1
military    MD      MD      MD      0

TABLE III: Decision costs

            MD = 1                  MD = 5                   MD = 10
Hor. 3      str1   str2   str3     str1    str2    str3     str1    str2    str3
Res 30      3.64   3.85   3.85     11.82   12.88   12.23    15.28   14.57   14.50
Res 50      2.40   2.80   2.43     6.97    6.93    7.84     10.98   9.99    10.45
Res 70      2.45   2.32   1.88     3.44    3.99    4.04     6.14    6.48    5.10
Hor. 4
Res 30      3.58   3.46   3.52     12.28   12.62   11.90    14.48   15.91   15.59
Res 50      2.37   2.21   2.33     7.44    7.44    7.20     9.94    9.28    10.65
Res 70      1.68   1.33   1.60     3.59    3.57    3.62     6.30    5.18    5.86
Hor. 6
Res 30      3.51   3.44   3.73     11.17   11.85   12.09    15.17   14.99   13.6
Res 50      2.28   2.11   2.31     7.29    8.02    7.70     10.67   10.47   11.25
Res 70      1.43   1.38   1.44     3.60    3.73    3.84     4.91    5.09    5.94
Bounds
Res 30      3.35                   11.50                    13.85
Res 50      2.21                   6.27                     9.40
Res 70      1.32                   2.95                     4.96

TABLE IV: Simulation results for 2 homogeneous, multi-modal sensors. str1: select the most likely pure strategy for all locations; str2: randomize the choice of strategy per location according to the mixture probabilities; str3: select the strategy that yields the least expected use of resources for all locations.

visibility), evaluating different versions of the RH control algorithms at different resource levels. The variable "Horizon" is the total number of sensor actions allowed per location, plus one additional action for estimating the location content. The table shows results for different resource levels per sensor, from 30 to 70 units, and displays the computed lower bound performance. The missed detection cost MD is varied from 1 to 10. The results shown in Table IV represent the average of 100 Monte Carlo simulation runs of the RH algorithms.

The results show that using a longer horizon in the planning improves performance only minimally, so a RH replanning approach with a short horizon can be used to reduce computation time with limited performance degradation. The results also show that the different RH algorithms have performance close to the optimal lower bound in most cases, the exception being the case of MD = 5 with 70 units of sensing resources per sensor. For a horizon 6 plan, the longest horizon studied, the simulation performance is close to that of the associated bound. In terms of which strategy is preferable for converting the mixed strategies, the results of Table IV are unclear. For short planning horizons in the RH algorithms, the preferred strategy appears to be to use the least resources (str3), thus allowing for improvement from replanning. For the longer horizons, there was no significant difference in performance among the three strategies. To illustrate the computational requirements of this scenario (4 states, 3 observations, 2 sensors (6 actions), full sensor overlap), the number of columns generated by the column generation algorithm to compute a set of mixed strategies was on the order of 10-20 columns for the horizon 6 algorithms, which takes about 60 sec on a 2.2 GHz, single-core, Intel P4 machine under Linux using C code in 'Debug' mode (with 1000 belief points for PBVI). Memory usage without optimizations is around 3 MB.

There are typically 4-5 planning sessions in a simulation. Profiling indicates that roughly 80% of the computing time


                 Homogeneous               Heterogeneous
                 MD = 1  MD = 5  MD = 10   MD = 1  MD = 5  MD = 10
Horizon 3
Res[150, 150]     5.69   16.93   30.38     6.34   18.15   31.23
Res[250, 250]     4.61   16.11   25.92     5.53   16.77   29.32
Res[350, 350]     4.23   15.30   21.45     5.12   16.41   27.41
Horizon 4
Res[150, 150]     5.02   16.06   20.61     5.64   16.85   20.61
Res[250, 250]     3.94    9.46   12.66     4.58   12.05   14.87
Res[350, 350]     3.35    8.58   12.47     4.28    9.41   12.65
Horizon 6
Res[150, 150]     4.62   15.66   19.56     5.27   16.20   19.56
Res[250, 250]     2.92    8.24   10.91     3.32    8.83   11.35
Res[350, 350]     2.18    4.86    7.15     2.66    6.63    9.17

TABLE V: Comparison of lower bounds for 2 homogeneous, bi-modal sensors vs. 2 heterogeneous sensors.

goes towards value backups in the PBVI routine and 15% goes towards (recursively) tracing decision trees in order to back out (deduce) the measurement costs from hyperplane costs. (Every node in a decision tree / policy graph (for each pure strategy) has a corresponding hyperplane with a vector of cost coefficients that represent classification + measurement costs.)
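The dominant operation, the point-based value backup of PBVI [25], can be sketched as follows for a generic discrete POMDP. This is a minimal illustration, not the authors' C implementation; the array layout and the reward-maximization convention (rather than the cost minimization used in the paper) are assumptions.

```python
import numpy as np

def pbvi_backup(B, alphas, T, O, R, gamma):
    """One point-based backup: for each belief in B, construct the best
    new alpha-vector from the current set `alphas`.

    B: (m, n) array of belief points over n states
    T[a]: (n, n) transition matrix for action a
    O[a]: (n, k) observation likelihoods P(z | s', a)
    R[a]: length-n immediate rewards
    All names and shapes are illustrative assumptions.
    """
    new_alphas = []
    for b in B:
        best_val, best_alpha = -np.inf, None
        for a in range(len(T)):
            g = R[a].astype(float)
            for z in range(O[a].shape[1]):
                # Back-projected vectors: proj[s] = sum_s' T[a][s,s'] O[a][s',z] alpha[s']
                proj = [T[a] @ (O[a][:, z] * al) for al in alphas]
                # Keep the projection that maximizes value at this belief point
                g = g + gamma * max(proj, key=lambda v: float(b @ v))
            val = float(b @ g)
            if val > best_val:
                best_val, best_alpha = val, g
        new_alphas.append(best_alpha)
    return new_alphas
```

Each backup is O(|B| |A| |Z| |alphas| n^2), which is consistent with this step dominating the profile when 1000 belief points are used.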

In the next set of experiments, we compare the use of heterogeneous sensors that have different modes available. In these experiments, the 100 locations are guaranteed to contain an object, so xi = 0 is not feasible. The prior probability of object type for each location is πi(0) = [0 0.7 0.2 0.1]T. Table V shows the results of experiments with sensors that have all sensing modes, versus an experiment where one sensor has only a low-resolution mode and the other sensor has both high- and low-resolution modes. The table shows the lower bounds predicted by the column generation algorithm, to illustrate the change in performance expected from the different architectural choices of sensors. The results indicate that specialization of one sensor can lead to significant degradation in performance due to inefficient use of its resources.

The next set of results explores the effect of spatial distribution of sensors. We consider experiments where there are two homogeneous sensors with only partially overlapping coverage zones. (We define a 'visibility group' as a set of sensors that have a common coverage zone.) Table VI gives bounds for different percentages of overlap. Note that, even when there is only 20% overlap, the achievable performance is similar to that of the 100% overlap case in Table V, indicating that proper choice of strategies can lead to efficient sharing of resources from different sensors and equalization of their workload. The last set of results shows the performance of the RH algorithms for three homogeneous sensors with partial overlap and different resource levels. The visibility groups are portrayed graphically in Fig. 3. Table VII presents the simulated cost values averaged over 100 simulations of the different RH algorithms, along with the lower bounds. The results support our previous conclusions: when a short horizon is used in the RH algorithm, and there are sufficient resources, the strategy that uses the least resources is preferred, as it allows for replanning when new information

                 Overlap 60%               Overlap 20%
                 MD = 1  MD = 5  MD = 10   MD = 1  MD = 5  MD = 10
Horizon 3
Res[150, 150]     5.69   16.93   30.38     5.69   16.93   30.38
Res[250, 250]     4.61   16.11   25.98     4.61   16.11   25.92
Res[350, 350]     4.23   15.30   21.45     4.23   15.30   21.45
Horizon 4
Res[150, 150]     5.02   16.06   20.61     5.02   15.93   20.61
Res[250, 250]     3.94    9.46   12.66     3.94    9.46   12.66
Res[350, 350]     3.35    8.58   12.47     3.35    8.58   12.47
Horizon 6
Res[150, 150]     4.62   15.66   19.56     4.62   15.66   19.56
Res[250, 250]     2.92    8.25   10.91     2.94    8.24   10.91
Res[350, 350]     2.18    4.86    7.19     2.18    4.86    7.16

TABLE VI: Comparison of performance bounds with 2 homogeneous sensors with partial overlap in coverage. Only the bold numbers are different.

Fig. 3: The 7 visibility groups for the 3-sensor experiment, indicating the number of locations in each group.

is available. If the RH algorithm uses a longer horizon, then its performance approaches the theoretical lower bound, and the difference in performance among the three approaches for sampling the mixed strategy to obtain a pure strategy is statistically insignificant. Our results suggest that RH control with modest horizons of 2 or 3 sensor actions per location can yield performance close to the best achievable performance using mixed strategies. If shorter horizons are used to reduce computation, then an approach that samples mixed strategies by using the smallest amount of resources is preferred. The results also show that, with proper SM, geographically distributed sensors with limited visibility can be coordinated to achieve performance equivalent to centrally

            MD = 1                 MD = 5                 MD = 10
Horizon 3   str1   str2   str3    str1   str2   str3     str1   str2   str3
Res 100     5.26   6.08   5.57   17.23  17.44  16.79    22.02  21.93  22.16
Res 166     5.91   4.81   3.13   10.23  11.91   9.21    14.19  16.66  12.85
Res 233     3.30   3.75   3.43   10.15   9.32   5.88    14.49  12.55   8.21
Horizon 4
Res 100     5.32   5.58   5.93   17.26  16.88  16.17    21.92  20.94  21.35
Res 166     3.42   4.07   3.24    8.63   8.00   9.04    12.05  11.71  14.08
Res 233     3.65   3.07   3.29    5.27   7.14   5.38     8.25  10.08   7.90
Horizon 6
Res 100     5.79   5.51   5.98   17.13  17.90  17.44    22.03  20.56  22.17
Res 166     2.96   2.68   2.72   10.22   8.33   9.08     9.82  11.47  11.57
Res 233     1.52   2.00   1.70    4.81   4.13   4.24     5.64   7.20   5.11
Bounds
Res 100     4.62                 15.66                  19.56
Res 166     2.92                  8.22                  10.89
Res 233     2.18                  4.87                   7.18

TABLE VII: Simulation results for 3 homogeneous sensors with partial overlap as shown in Fig. 3.


pooled resources.

In terms of the computational complexity of our RH algorithms, the main bottleneck is the solution of the POMDP problems. The LPs solved in the column generation approach are small and are solved in minimal time. Solving the POMDPs required to generate each column (one POMDP for each visibility group in cases with partial sensor overlap) is tractable by virtue of the hierarchical breakdown of the SM problem into independent subproblems. These computations can also be accelerated using multi-core CPUs or (NVIDIA) GPUs, as the POMDPs are highly parallelizable.
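The structure of the restricted master LP in the column generation scheme [22], [23] can be sketched as follows. This is a hedged illustration under assumed data structures, not the paper's code: each column j is a pure sensing strategy with an expected cost and an expected per-sensor resource use, the LP chooses mixture weights over the columns, and the dual prices it returns would be passed to the POMDP subproblems to price out new columns. SciPy's HiGHS-based LP solver is used for convenience.

```python
import numpy as np
from scipy.optimize import linprog

def solve_master_lp(costs, resource_use, budget):
    """Restricted master LP of a column generation scheme (sketch).

    costs[j]:           expected cost of pure strategy (column) j
    resource_use[i, j]: expected use of sensor resource i by column j
    budget[i]:          resource budget of sensor i
    The variables x_j are mixture weights: sum_j x_j = 1, x >= 0.
    Returns the weights and the dual prices (resource and convexity
    constraints) needed to price out new columns.
    """
    n = len(costs)
    res = linprog(
        c=costs,
        A_ub=resource_use, b_ub=budget,    # expected resource constraints
        A_eq=np.ones((1, n)), b_eq=[1.0],  # weights form a distribution
        bounds=[(0, None)] * n,
        method="highs",
    )
    return res.x, res.ineqlin.marginals, res.eqlin.marginals
```

Because only the column pool changes between iterations, each master solve stays small, consistent with the observation that the LPs take minimal time relative to the POMDP solves.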

VI. CONCLUSIONS

In this paper, we introduce RH algorithms for near-optimal, closed-loop SM with multi-modal, resource-constrained, heterogeneous sensors. These RH algorithms exploit a lower bound formulation developed in earlier work that decomposes the SM optimization into a master problem, which is addressed with linear programming techniques, and single-location stochastic control problems that are solved using POMDP algorithms. The resulting algorithm generates mixed strategies for sensor plans, and the RH algorithms convert these mixed strategies into sensor actions that satisfy sensor resource constraints.

Our simulation results show that the RH algorithms achieve performance close to that of the theoretical lower bounds in [21]. The results also highlight the benefits of choosing a longer horizon for RH strategies and of the alternative approaches to sampling the mixed strategy solutions. Our simulations also show the effects of geographically distributing sensors so that there is limited overlap in field of view, and the effects of specializing sensors by restricting their number of modes.

There are many interesting directions for extensions to this work. First, one could consider the presence of object dynamics, where objects can arrive at or depart from specific locations. Second, one could consider options for sensor motion, where individual sensors can change locations and thus observe new areas. Third, one could consider a set of objects that have deterministic, but time-varying, visibility profiles. Finally, one could consider approaches that reduce the computational complexity of the resulting algorithms, either through exploitation of parallel computing architectures or through the use of off-line learning or other approximation techniques.

REFERENCES

[1] B. Koopman, Search and Screening: General Principles with Historical Applications. Pergamon, New York, NY, 1980.

[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, "A survey of the search theory literature," Naval Research Logistics, vol. 38, no. 4, pp. 469–494, 1991.

[3] D. A. Castanon, "Optimal search strategies in dynamic hypothesis testing," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 7, pp. 1130–1138, Jul. 1995.

[4] A. Wald, "On the efficient design of statistical investigations," The Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.

[5] ——, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945. [Online]. Available: http://www.jstor.org/stable/2235829

[6] D. V. Lindley, "On a measure of the information provided by an experiment," The Annals of Mathematical Statistics, vol. 27, pp. 986–1005, 1956.

[7] J. C. Kiefer, "Optimum experimental designs," Journal of the Royal Statistical Society, Series B, vol. 21, pp. 272–319, 1959.

[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM, Philadelphia, PA, 1972.

[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New York, 1972.

[10] K. Kastella, "Discrimination gain to optimize detection and classification," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 27, no. 1, pp. 112–116, Jan. 1997.

[11] C. Kreucher, K. Kastella, and A. O. Hero III, "Sensor management using an active sensing approach," Signal Processing, vol. 85, no. 3, pp. 607–624, 2005.

[12] M. Athans, "On the determination of optimal costly measurement strategies for linear stochastic systems," Automatica, vol. 8, no. 4, pp. 397–412, 1972. [Online]. Available: http://www.sciencedirect.com/science/article/B6V21-47SV18C-5/2/dc50e03b2ec82f34c592d4056c0da466

[13] V. Krishnamurthy and R. Evans, "Hidden Markov model multiarm bandits: A methodology for beam scheduling in multitarget tracking," IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2893–2908, Dec. 2001.

[14] R. Washburn, M. Schneider, and J. Fox, "Stochastic dynamic programming based approaches to sensor resource management," in Proceedings of the Fifth International Conference on Information Fusion, vol. 1, 2002, pp. 608–615.

[15] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society, Series B (Methodological), vol. 41, no. 2, pp. 148–177, 1979. [Online]. Available: http://www.jstor.org/stable/2985029

[16] W. Macready and D. H. Wolpert, "Bandit problems and the exploration/exploitation tradeoff," IEEE Transactions on Evolutionary Computation, vol. 2, no. 1, pp. 2–22, Apr. 1998.

[17] C. Kreucher and A. O. Hero III, "Monte Carlo methods for sensor management in target tracking," in IEEE Nonlinear Statistical Signal Processing Workshop, 2006.

[18] E. Chong, C. Kreucher, and A. Hero, "Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing," in Proceedings of the 9th International Workshop on Discrete Event Systems (WODES 2008), May 2008, pp. 173–180.

[19] D. A. Castanon, A. Hero, D. Cochran, and K. Kastella, Foundations and Applications of Sensor Management, 1st ed. Springer, 2008, ch. 1.

[20] D. A. Castanon, "Approximate dynamic programming for sensor management," in Proc. 36th IEEE Conference on Decision and Control, 1997, pp. 1202–1207.

[21] ——, "Stochastic control bounds on sensor network performance," in Proc. 44th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC '05), Dec. 2005, pp. 4939–4944.

[22] P. C. Gilmore and R. E. Gomory, "A linear programming approach to the cutting-stock problem," Operations Research, vol. 9, no. 6, pp. 849–859, 1961. [Online]. Available: http://www.jstor.org/stable/167051

[23] G. B. Dantzig and P. Wolfe, "The decomposition algorithm for linear programs," Econometrica, vol. 29, no. 4, pp. 767–778, 1961. [Online]. Available: http://www.jstor.org/stable/1911818

[24] K. A. Yost and A. R. Washburn, "The LP/POMDP marriage: Optimization with imperfect information," Naval Research Logistics, vol. 47, no. 8, pp. 607–619, 2000.

[25] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.

[26] K. Kastella, "Discrimination gain for sensor management in multitarget detection and tracking," in IEEE-SMC and IMACS Multiconference CESA 1996, vol. 1, Jul. 1996, pp. 167–172.