MS Project


Darin Hitchings, 8/18/02

Vehicle Routing Project

Overview:

The purpose of this vehicle routing project is to explore sub-optimal solutions to the difficult problem of object classification with noisy sensors on a 2D grid with locality constraints. The problem at hand is to find the best policy for planning vehicle movements to explore the graph in order to minimize a cost-function. An exact Dynamic Programming solution to such a problem is highly infeasible because the complexity of the DP algorithm grows with the number of possible states, which in turn grows exponentially with the number of cells in the grid. Instead, this project formulates the problem as a multi-commodity network flow problem with each vehicle as a commodity and tasks assigned at each grid square. The vehicles must flow through the graph to collect value and are constrained to move in 4 directions (the 4-neighbors of each cell) as well as constrained by the boundaries of the grid. A simulation was created to compare the benefit of this algorithm with a standard myopic policy for vehicle movement.

Terms and Definitions:

Throughout this paper, the words “cell”, “node” and “coordinate” are all defined to mean a grid-square within the graph where a task can be performed. The simulation uses a regular, square grid and hence there will be G^2 possible tasks at any one time for one vehicle on the graph, where G is defined to be the width of the grid (width = height). In addition, the words “vehicle”, “sensor platform” or just “platform” are used interchangeably. Each side of each cell has an associated directed arc which represents the flow across that boundary of the cell of a given sensor platform at a given time. These flows come in pairs: there exists a corresponding flow entering every cell for each direction in which there is a flow leaving a cell. Note that corner cells have 2 pairs of flows crossing their two interior faces, cells on the side of the grid have 3 pairs of flows associated with their interior faces and cells within the interior of the grid have 4 pairs of flows crossing their exposed sides. The convention within this project is that a positive flow at a cell is directed outwards from that cell. Lastly, the coordinate system defines the cell at the bottom-left corner of the grid as (0,0) and the cell at the top-right corner as (G-1,G-1).

A task for this problem consists of an assignment for a given vehicle to make a measurement at a particular cell at a particular time. Tasks at cells are denoted X(m,k,t), where m is the cell index, k is the number of the vehicle doing the task and t is the time at which the task is done. “Arcs”, which are the vehicle flows, are denoted Y(m,n,k,t). This notation specifies that platform k is flowing from cell index m to cell index n at time t. All vehicle flows are positive by convention. Bounds on k and t are 0 ≤ k < K and 0 ≤ t < T, where K is the number of platforms in the simulation and T is the number of time steps in which the platforms can move; G, K and T are positive integers. In this project T gives the initial length of the planning horizon. T needs to be an even number or else it is not possible for the platforms to end up back at base at the end of the simulation. As time progresses the planning horizon shrinks because there are fewer and fewer moves to go before time is up. A constraint is placed on all platforms that says they must return to base by the end of the planning horizon. If a cell is referred to by index instead of coordinate, the indices are taken to start at (0,0) and increase along the rows in the direction of +x: coordinate (0,1) is at index 10 and coordinate (9,9) is at index 99 when G=10.
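As a concrete illustration of these conventions, the short C++ sketch below converts a coordinate to a cell index and lists the 4-neighbors of a cell. The helper names are illustrative only and are not taken from the simulator code.

    #include <utility>
    #include <vector>

    // Cell index from (x, y): indices start at (0,0) and increase along rows in the +x direction.
    int cellIndex(int x, int y, int G) { return y * G + x; }

    // The 4-neighbors F*(m) of the cell at (x, y), clipped at the grid boundary.
    std::vector<std::pair<int, int>> neighbors(int x, int y, int G) {
        std::vector<std::pair<int, int>> result;
        if (y + 1 < G)  result.push_back({x, y + 1});   // N
        if (x + 1 < G)  result.push_back({x + 1, y});   // E
        if (y - 1 >= 0) result.push_back({x, y - 1});   // S
        if (x - 1 >= 0) result.push_back({x - 1, y});   // W
        return result;
    }

    // Example: with G = 10, cellIndex(0, 1, 10) == 10 and cellIndex(9, 9, 10) == 99,
    // matching the indices quoted above.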

Variables:

1) x(m,k,t) : the fractional amount of task m (task m = node m = cell m) which is done at time t by sensor k. This value is a real number. (sensor k = platform k = vehicle k)


2) y(m,n,k,t) : fractional flow of sensor k from node m to node n at time t, is 0 for n not in F*(m) which is to say for cells that are not neighbors of m. Fractional flow values are real numbers.

3) F*(m) : the 4 neighboring cells of node m

4) m0 : the base cell where all of the platforms start and to which they return, typically at the center of the grid

5) Ns(m,t) : number of sensors entering the system, non-zero only when t=0 and m=m0, (for the initial LP problem)

6) Ne(m,t) : number of sensors leaving the system, non-zero only when t=T and m= m0

7) v(m) : value of sensing task at node m = entropy of node m

Algorithm:

The solution method used in this project uses an approximation to the optimal cost-to-go function, which would otherwise be computed with the D.P. algorithm. The basic equation of dynamic programming is the Bellman Equation, which is a recursive equation backwards in time. This equation states:

J*(x(t)) = min over u in U(x(t)) of [ g(x(t), u) + J*(x(t+1)) ],   where x(t+1) is the state reached by applying action u in state x(t).

This equation basically says that if one starts at the end of the time horizon one can work one's way backwards to the beginning until the current time is reached and build a (very big) table of optimal moves for every state that can occur through the future course of the simulation. In the Bellman Equation, J*() is the optimal cost-to-go, x is the state, U(x(t)) is the set of possible actions available at state x at time t, and g is the local cost incurred in the current state. This equation does not lend itself to an easy solution when the length of the time horizon or the number of possible states (or both) grows large, so even though the solution is optimal, other suboptimal methods are required for real-time decision making.

Several relevant papers [1, 2] indicate that using the rollout algorithm (described below) to approximate the cost-to-go, J(x(t)), yields much of the performance given by the optimal policy from D.P. With this idea in mind, this project breaks the solution procedure into two steps. First there is a look-ahead step in which every possible move is considered up to a certain depth into the future. After this given depth, the program then uses a rollout step to approximate the cost-to-go farther ahead in time than the look-ahead can see. Given the sum of the exact local costs of every possible future state for the next several moves in the look-ahead region plus an approximate cost-to-go from the rollout region, an approximation to the actual cost-to-go is computed. The aforementioned “regions” in the solution procedure are regions in time. The cost-to-go values are used to decide which of the possible next moves is best: by choosing moves that minimize the cost-to-go (or, equivalently, maximize the reward-to-go), the vehicles follow paths that are optimal with respect to this approximation. The state of the system is comprised of the positions of the sensor platforms and the probability vector at each cell, so minimizing the cost-to-go from a given state positions each platform as well as possible at all future times. This is enforced by penalizing bad future vehicle positions (bad states) with extra cost in the objective function. The current version of the simulation uses a look-ahead window of one move.

The idea behind the rollout algorithm is that a base policy is fixed and then successively evaluated in time down to the end of the planning horizon to get an approximation of the optimal cost-to-go that the base policy would give were it actually used at every decision step. Base policies are heuristic in nature because, again, optimal policies are intractable. By rolling out these heuristics to the end of the planning horizon, a program can greatly increase the performance of the base heuristic by predicting how it will do over time. Using a heuristic in the rollout algorithm helps to eliminate the explosive complexity of looking forward in time. The rollout step for this project is performed by making a Linear Programming approximation, which relaxes the constraints that specify that every vehicle must be in one place at one time. Instead, the relaxed constraints decree that, on average, vehicles must be in one place at one time. This approximation converts the rollout step into a linear problem, which can be solved efficiently using the Simplex Method of Linear Programming. The algorithm is linear in the number of possible moves at a given state times the cost of computing an approximate cost-to-go, which is of polynomial complexity given an LP approximation.
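The resulting decision rule can be summarized with the sketch below. The State and Move types and the helper functions are stand-ins introduced for illustration (the simulator's actual classes are described in Appendix A): for each platform, every feasible next move is scored by forcing that move in the LP relaxation, solving it, and keeping the move with the best approximate reward-to-go.

    #include <vector>

    struct State { int x = 1, y = 1; };          // stand-in: platform positions plus cell probabilities
    enum class Move { N, E, S, W };

    // Stand-in: moves that stay on the grid and still allow a return to base in time.
    std::vector<Move> feasibleMoves(const State&) { return {Move::N, Move::E, Move::S, Move::W}; }

    // Stand-in: force the candidate move in the LP relaxation, solve it with the Simplex
    // Method, and return the resulting approximate reward-to-go (entropy-to-go).
    double entropyToGo(const State&, Move) { return 0.0; }

    // One-step look-ahead with an LP-relaxation rollout as the cost-to-go approximation.
    Move chooseMove(const State& s) {
        Move best = Move::N;
        double bestScore = -1e300;
        for (Move m : feasibleMoves(s)) {
            double score = entropyToGo(s, m);
            if (score > bestScore) { bestScore = score; best = m; }
        }
        return best;
    }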

A simplex is a higher-dimensional version of a triangle: a simplex in 2D is a triangle, in 3D a tetrahedron, and so forth. The simplex method works by checking boundary points of a region enclosed by a system of linear inequalities. As seen from Fig 1, the optimal solution must reside at one of the red corner points (vertices) for any objective function. If one of the vertices is not optimal, then by following the edge of one of the constraints that comprises that vertex, one can find a new vertex with a better objective function value, until there are no better vertices. The simplex method can be thought of as an amoeba that flows down through the valleys of an N-dimensional surface, checking one corner point after another and reflecting off the walls of the surface when it hits one. Eventually it will always find the optimal vertex of the polytope formed by the system of inequalities, which gives the optimal values of the N variables according to the objective function.
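A small made-up instance shows the vertex property directly: maximize 3x + 2y subject to x + y <= 4, x <= 2 and x, y >= 0. The feasible region has the four vertices (0,0), (2,0), (2,2) and (0,4); enumerating them confirms that the maximum, 10 at (2,2), is attained at a vertex.

    #include <cstdio>

    int main() {
        // Vertices of the region { x + y <= 4, x <= 2, x >= 0, y >= 0 }.
        const double verts[4][2] = { {0, 0}, {2, 0}, {2, 2}, {0, 4} };
        double best = -1e300;
        int bestIdx = 0;
        for (int i = 0; i < 4; ++i) {
            double obj = 3 * verts[i][0] + 2 * verts[i][1];     // objective 3x + 2y
            std::printf("vertex (%g, %g): objective = %g\n", verts[i][0], verts[i][1], obj);
            if (obj > best) { best = obj; bestIdx = i; }
        }
        std::printf("maximum %g at vertex (%g, %g)\n", best, verts[bestIdx][0], verts[bestIdx][1]);
        return 0;
    }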

Figure 1


For this project the open-source program “lp_solve 3.0” was used as the solver implementing the Simplex Method; it is suitable for solving systems of several hundred variables and constraints.
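The same toy LP set up through the lp_solve C interface gives a feel for the calling pattern the simulator relies on (make_lp, set_obj_fn, add_constraint, set_maxim, solve). The signatures and header name below are recalled from the lp_solve C API and may differ slightly in version 3.0; note that the coefficient arrays are 1-indexed and that variables default to a lower bound of 0.

    #include <cstdio>
    extern "C" {
    #include "lpkit.h"      // assumed lp_solve 3.0 header; later versions use lp_lib.h
    }

    int main() {
        lprec* lp = make_lp(0, 2);          // 0 rows to start, 2 columns (x and y)
        double obj[3] = {0, 3, 2};          // maximize 3x + 2y (index 0 is unused)
        set_obj_fn(lp, obj);
        set_maxim(lp);
        double r1[3] = {0, 1, 1};
        add_constraint(lp, r1, LE, 4);      // x + y <= 4
        double r2[3] = {0, 1, 0};
        add_constraint(lp, r2, LE, 2);      // x <= 2
        if (solve(lp) == OPTIMAL)
            std::printf("solved\n");        // the optimum is 10 at (x, y) = (2, 2)
        delete_lp(lp);
        return 0;
    }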

In order to evaluate which moves are better than others in the look-ahead region, a value metric was required. Several different choices were available, but the standard Shannon Information / Entropy criterion works well. Minimizing entropy was also the objective of the base policy in the rollout algorithm. The entropy of the probability distribution at a cell m is defined to be

H(m) = - sum over the C classes of p_c(m) log p_c(m),

where C is the number of different class / object types possible and p_c(m) is the probability that the object at cell m belongs to class c. In this project C=2. Objects of class type 0 are labeled as hostile and objects of type 1 are labeled as friendly. Equivalently, one could say class 0 objects are “interesting” and type 1 objects are not. By minimizing the entropy of all cells the vehicles gain information about the probability distribution of each cell and act to minimize the number of False Alarms and Missed Detections that occur when it is time to decide which cells are friendly and which are not at the end of the simulation. The closer an entropy becomes to 0, the more the probability distribution looks like a delta function, with nearly all of its mass on one class, and the more skewed it is in favor of being one type of object or the other.
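For the two-class case used here, a minimal sketch of the entropy computation (the function name is illustrative and the natural logarithm is assumed; the simulator may scale the values differently):

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    // Shannon entropy of a two-class probability vector (p, 1 - p).
    double entropy(double p) {
        double h = 0.0;
        for (double q : {p, 1.0 - p})
            if (q > 0.0) h -= q * std::log(q);      // the 0 * log 0 term is taken to be 0
        return h;
    }

    int main() {
        // The entropy shrinks toward 0 as the distribution concentrates on one class.
        for (double p : {0.5, 0.3, 0.1, 0.01})
            std::printf("p = %.2f  entropy = %.3f\n", p, entropy(p));
        return 0;
    }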

Definition of the LP Problem:

In order to create an LP problem, a set of constraint equations needed to be specified so that the Simplex Method would be guaranteed to only consider moves which are possible for the platforms to make (on average). For a grid of size GxG, there will be G^2 possible tasks at each time, one for each of the G^2 cells in the grid. Every time a task is completed, a sensor measurement is taken at the cell associated with the task and information is gleaned which reduces the probability of classification error when objects are classified. The problem is posed as follows:


Problem: max

Subject to:

I.

II. t > 0

t = 0 for m0

III.

Constraint Equation I is just a set of bounds on every task that makes sure the LP problem only tries to complete each task once. Equation II gives the important conservation of flow constraints for each platform and specifies that only as many pieces of a platform can leave a cell as have previously entered it. The constraint is defined piecewise because at time 0 there is a discontinuity where all of the platforms are entering the system (the grid) for the first time. Thus there is a source of platforms at the base cell at time 0, and no other cell generates vehicles at any time. The base cell does not source any platforms after time 0 and, conversely, it does not sink any platforms except at time t=T. No other cells sink sensor platforms at any time. The discontinuity of having a sink of platforms at the last time t=T is a non-issue because no planning is done for any time after time T, and for time T-1 the planning is trivial: all platforms should be one move away from home and must make the appropriate move to return to base.
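For reference, the objective and the first two constraint families just described can be written as follows in the notation of the Variables section. This is a sketch consistent with the verbal description above rather than a transcription of the original equations; Constraint III, which couples the task variables x to the flows y, is omitted, and the exact form used in the simulator may differ.

    maximize    sum over m, k, t of  v(m) * x(m,k,t)

    I.   sum over k, t of  x(m,k,t)  <=  1                                  for every cell m

    II.  sum over n in F*(m) of  y(m,n,k,t)  =  sum over n in F*(m) of  y(n,m,k,t-1)
                                                                            for every m, k and t > 0
         sum over n in F*(m0) of  y(m0,n,k,0)  =  1                         for every k  (t = 0)

The source term Ns(m,t) and sink term Ne(m,t) from the Variables section encode the same boundary conditions at the base cell m0.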

Two sets of bounds were added in the initialization of the LP problem for lp_solve. These bounds are treated separately from the constraint equations by lp_solve. The bounds are a little redundant when the conservation of flow equations are in place, but they can be specified at little extra computational cost:

x(m,k,t) ≤ 1, y(m,n,k,t) ≤ 1

Simulation:

The simulation for this project was created as a C++ program that interfaces with the C code used in the lp_solve program. The simulator is a console application which is designed for large numbers of Monte Carlo simulations, so all output is dumped to a log file and to a file which stores simulation results. The program takes two command line arguments: a) the file name that contains all of the simulation parameters, b) a file name to log error messages to. The convention used was to name input files “input_G3_K2_T4.dat”, for example, and output files “output.dat”. The input file contains parameters for G, K and T as well as FA / MD numbers, the number of Monte Carlo simulations to run and statistics for the accuracy of the sensors. The order of the parameters in the input file is documented with comments at the bottom of the input files (see also Appendix D).

In the simulator the actual entropy reward that is gained from visiting a cell is a random variable, so the information that is gained from visiting a cell is not known in advance. Therefore a random measurement is used to simulate the accuracy of the sensor reading that happens when a platform passes by a cell; there are good measurements and there are bad measurements. Sometimes measurements will lead the object classification algorithm (a Likelihood-Ratio Test) astray; however, if the sensors have any utility, then on average information is gained from visiting cells. Since the actual reward is stochastic in nature, the simulator assigns expected rewards to visiting and taking a measurement of each cell. By convention, the terminology “first-level entropy” is used to refer to the value of visiting a cell once and the term “second-level entropy” is used for the value of visiting the cell a second time. Good measurements cause these numbers to decrease monotonically.

The tricky part about the simulation was making the LP problem solved for each possible platform move at time t+1 mesh with the problems that were solved for all the possible moves at time t. If the simulation did not initialize the LP problem at time t+1 in a manner consistent with the intended state trajectory at time t, the planning would become inconsistent with itself and the platforms would move unpredictably.

Great care was taken in organizing the constraints' rows and variables' columns in the definition of the LP problem for lp_solve so that its sparse matrix representation would be conveniently organized in memory. Memory was allocated with the variables arranged backwards in time so that after each time update the LP problem could be down-sized to correspond to the new (smaller) planning horizon; all data corresponding to early simulation steps is therefore located last in memory. It was a simple matter to eliminate the last columns of the LP problem corresponding to the previous simulation step. However, in paring down the LP problem after each time update, taking away unused constraints means memory must be moved around, because lp_solve stores variables in column order and the constraints are represented as rows of non-zero coefficients in each column. With this arrangement deleting columns is fast but deleting rows is slow and requires shuffling data.

So in the long run it actually proved faster not to resize the LP problem each round and instead to use the LP variables' bounds to keep the planning process synchronized with the simulation's state from one turn to the next. If the planning process dictated that a platform should move in a particular direction, then the lower bound on the flow corresponding to that platform at that time was set to 1.0. These bounds acted as constraints that informed the LP solver that for all moves prior to the current time there is no flexibility in assigning vehicle flows. This way the LP problem does not try to re-plan for moves that have already happened. In addition, after each simulation step, all of the task variables for the previous time step are zeroed out. This allows the LP problem to take a second look at cells where it has been before without running into the constraint that says a task's value can only be collected once. The constraints specified by Equation I are meant to disallow a first-level entropy at a cell from being collected twice. By zeroing out previous tasks and resetting the value of revisiting that cell in the objective function to a new entropy value, the simulator can treat second-level entropies for cells that have been visited as if they were first-level entropies. These modifications to the LP problem after each time step act to enforce the desirable condition that tasks can only be completed at times after the current time when the LP problem is solved once for every possible platform move during each vehicle's path-planning process. See Appendix A for more information on how the program's main loop progresses.
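In lp_solve terms, the bookkeeping described above amounts to a few bound updates per committed move. The sketch below uses the generic bound-setting calls set_lowbo and set_upbo as recalled from the lp_solve C API (their exact form in version 3.0 may differ), and the column indices are placeholders:

    // lprec comes from the lp_solve header (see the earlier lp_solve sketch).
    // After a platform's move at the current time has been committed:
    //  - pin the chosen flow variable so the LP cannot re-plan the past,
    //  - close off the task variable for the cell that was just measured.
    void commitMove(lprec* lp, int chosenFlowColumn, int visitedTaskColumn) {
        set_lowbo(lp, chosenFlowColumn, 1.0);   // committed flow: lower bound forced to 1.0
        set_upbo(lp, visitedTaskColumn, 0.0);   // task already done: upper bound 0 blocks re-collection
    }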

Figure 2 displays a graphical representation of the course of one simulation when the simulator is running in Debug mode (_DEBUG is defined). In this mode the simulator sends debug output to several files (see Appendix C) and does not randomize the order in which the directions are checked when the look-ahead routine is examining the possible moves of each vehicle. This setup makes tracing through the code much easier, but it means that this example shows a bias: platforms have a tendency to move in the order in which the directions are enumerated, N, E, S, W. (When entropy values are compared in Debug mode, a platform will only move in a direction which comes later in this enumeration if its entropy value is strictly greater than the previous direction's entropy.) Appendix D contains a listing of all cost-to-go values, optimal or not, along with a chart of how the entropy coefficients of the LP problem change over time.

The simulator was run on a 1.2 GHz Athlon machine running WinXP. The solver lp_solve was able to handle 4x4 grids with 2-4 platforms and T = 4; however, it sometimes complained of numerical instabilities. A grid of size 4x4 is also problematic because it has no center cell. Therefore, for the purposes of creating this report, problem sizes with 3x3 grids were used.


Figure 2

Problem sizes of G=4 and larger caused the lp_solve function solve() to terminate with a return value other than OPTIMAL, which indicates that after several million iterations the solver still had not succeeded in finding the solution.

As far as the scalability of the simulator code goes, the computer the simulator was run on could run 1000 Monte Carlo simulations with the system G=3, K=2, T=4 in under a minute. For the system G=4, K=2, T=4 each simulation takes around 1 sec. For a 5x5 system with K=2 and T=8, each LP problem used to study a potential move took about 4 sec to solve.

Unfortunately, lp_solve 3.0 was unable to solve simulations with most of the parameter sets that were intended to be analyzed. It could not run 1000 simulations for G=3, K=2 and T=6 without failing to solve one of the LP problems for a potential move. The algorithm would solve most of the LPs with no complaints but every once in a while complain about numerical instability. Eventually it would return a result of “no optimal solution found”, at which point the solver was considered to have crashed. This same behavior was observed for parameter sets such as (G=4, K=2, T=8) and (G=5, K=1, T=8), only the solver crashed much faster.

In order to demonstrate that it was not the simulator code which had an instability, a trial license for the commercial product “MOSEK Optimization Toolkit for Matlab v. 5” was obtained. When lp_solve failed to solve an LP problem that was given to it, the LP problem in question was saved to disk in MPS format. Then MOSEK was used to read in the MPS file and solve it, taking about 2.5 sec. This commercial program did not have any difficulty solving the LP problem and reported that its result was the optimal solution. Therefore lp_solve is not a program that is suited for use with the large simulations in this project, and a license for a better solver needs to be obtained. Unfortunately, the simulator code was designed for use with the lp_solve functional interface in mind, not for the MOSEK toolkit for Matlab or the interface for the CPLEX solver. Even if access to these commercial products were available, it would require rewriting a significant portion of the simulator code as well as perhaps porting the project to the Unix operating system.

A formula for the number of variables in the simulation is (8 + 12*(G-2) + 4*(G-2)^2)*K*(T+1) + G^2*K*(T+1). So for (G=3, K=2, T=4) there are 330 variables in the LP problem. For (G=4, K=2, T=4) there are 640. A system with (G=5, K=4, T=8) gives 3780 variables. Lastly, the original system envisioned for simulation, (G=10, K=4, T=50), has 93,840 unknowns. None of these larger systems could be analyzed with the lp_solve routine. The MOSEK toolkit was able to solve 5x5 grids without trouble, although there was a lot of manual labor required in preparing an LP problem for its use. Judging from its documentation, the MOSEK program would likely have been able to solve the system (G=10, K=4, T=50); however, it had trouble reading in the MPS file. Since it could read in the smaller MPS files without difficulty, it was concluded that it is either not made to read in 18 MB MPS files or else there was a limitation placed on reading in large files in the version of the program used.
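The variable-count formula above can be checked directly with a few lines of code (the helper name is illustrative):

    #include <cstdio>

    // Number of LP variables: directed arcs per platform per time step
    // (corner cells contribute 2 each, edge cells 3, interior cells 4),
    // plus one task variable per cell per platform, replicated over T+1 times.
    long numVariables(long G, long K, long T) {
        long arcs = 8 + 12 * (G - 2) + 4 * (G - 2) * (G - 2);
        return arcs * K * (T + 1) + G * G * K * (T + 1);
    }

    int main() {
        std::printf("%ld %ld %ld %ld\n",
                    numVariables(3, 2, 4),      // 330
                    numVariables(4, 2, 4),      // 640
                    numVariables(5, 4, 8),      // 3780
                    numVariables(10, 4, 50));   // 93840
        return 0;
    }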

Analysis:

This section discusses the simulation results that were obtained in comparing the look-ahead + rollout policy versus a simple myopic policy. The basic instrument for comparing the performance of each strategy is the Receiver Operating Characteristic (ROC) graph. This graph plots the probability of a successful Detection of a target versus the probability of a False Alarm for a given policy. Reducing the probability of a False Alarm necessarily increases the probability of a Missed Detection, and vice versa. The closer the ROC curve of a policy is to the left-hand and top sides of the graph, the better the policy is. Each data point on these graphs was obtained by running 1000 Monte Carlo simulations and averaging the results together. The following sets of parameters were chosen for comparison, subject to the limitations of the computing platform and the LP solver used: (G=3, K=1, T=4), (G=3, K=2, T=4), (G=3, K=4, T=4).

As mentioned previously, other systems that used a larger grid or more simulation time steps had numerical instability problems and could not successfully run 1000 times without failing to solve an LP problem. This limitation has another important effect on the course of each simulation. The current version of the simulator is only built to deal with one level of look-ahead and first-level entropies. On a grid as small as 3x3, there are only 8 cells which have non-zero entropy values and are thus worth visiting. (The base/home cell has no value for exploring.) Therefore for any system in which K*T > G^2 - 1, there will be sensor platforms sitting around idly during the path-planning process because they cannot see the second-level entropies and so they think there is nothing to do. At any given time step the current simulator will only let a task at a cell be done once, and if there are enough time steps for some of the vehicles to cover all of the cells, the remaining vehicles will not use their time productively. The constraints still force the vehicles to move, however, and so they can still take measurements at cells they visit (provided no one else is there), but these under-utilized vehicles move aimlessly. This problem provides one more reason why a better solver is needed, or else the simulation needs to be revamped to deal with multi-level look-aheads. A two-level look-ahead would more than double the number of variables in the simulation, though, so there is no way around having a professional solver as the back-end of this project.

ROC curves were generated in Excel for the parameter sets that were successfully simulated; see Fig 3. The graphs allow the various entropy- and rollout-based policies to be compared with each other. On account of the small grid sizes and the idleness of the platforms when K=4 in the rollout policy, these ROC curves do not demonstrate that the rollout policy is superior to a simple entropy-based one. This less-than-stellar performance from the rollout occurs because the simulations were so simple that no real planning needed to be done. The single key factor that would be required to bring out the inherent potential of the rollout algorithm is a larger grid. On such a small system an entropy-based policy does very well with the addition of a simple heuristic to make the platforms avoid each other; there is not really very far they can go, and their one-move level of foresight covers a large percentage of the area of the grid. If such a policy were used on a 5x5 grid, then simply avoiding other platforms would not help a platform to effectively take measurements at the cells that are important to learn about. One more reason why both policies perform very similarly is that the cells' prior probabilities are uniform across the grid and thus every cell starts with the same initial entropy value. After the first look these entropy numbers start to fork apart in two different directions each time the cells are measured, but these simulations generally only have enough time in them for the platforms to make one pass by each cell.


Figure 3

The graphs in Fig. 3 actually plot points for the FA to MD cost ratios 1:1, 1:5, 1:10, 1:20, 1:30, 1:50, 1:70, 1:90, 1:95 and 1:100. However, the data points for Pr(FA) and Pr(D) tend to line up on top of each other and only generate a handful of distinct points. This situation is similar to how one would react if asked to choose between having a gift worth more than $20 or else having $5 cash: one would choose the gift, and one would continue to choose the gift even if offered $10 or $15 cash. In a similar way, only when the FA-to-MD cost ratio gets above certain cut-off points does the Likelihood Ratio Test accept one more MD in exchange for fewer FAs. Therefore there is a very discrete nature to the way the data points fall on the ROC graph.

One good thing about these simulations is that they show the ROC curves consistently improve when more platforms are available for making measurements, which is what they should do. The curves do not point out much of a difference between the rollout and entropy policies except in the case of K=2. For the most part they depend solely on the number of vehicles available for exploring the grid. Were this a larger system, it would be expected that a rollout policy with fewer vehicles could out-perform an entropy-based policy with more vehicles, but this conclusion cannot be drawn from the data collected in these simulations.

Conclusion:

In conclusion, this project suffered for want of a professional-grade implementation of the Simplex Method for solving Linear Programming problems. The simulator did not show much difference between a myopic policy and a more sophisticated look-ahead + rollout policy for the vehicle routing problem; however, it did provide some intelligent results for the simulation cases it could handle. Further analysis is required before any conclusions about the performance of the algorithms used in this problem can be drawn. In the meantime, the work undertaken here should serve as a reliable foundation for developing more powerful simulators in the future.


References:

[1] Bertsekas, D.P. and Castañon, D.A., “Rollout Algorithms for Stochastic Scheduling Problems,” Journal of Heuristics, Vol. 5, 1999.

[2] Bertsekas, D.P., Castañon, D.A., Curry, M.L. and Logan, D., “Adaptive Multi-platform Scheduling in a Risky Environment,” Proceedings of the Symposium on Advances in Enterprise Control, San Diego, CA, November 1999.


Appendix A, Overview of Simulation’s Code:

Note: The simulation uses the class CCoord to help do movement arithmetic. (All class types are prefaced with a “C” to indicate they are C++ data types.) CCoord overloads the “+” operator so that, for a CCoord object called “pos”, pos+dir evaluates to the coordinate offset by the given direction. The directions are enumerated as 0..3 for N, E, S, W. For example, (2,2) + 0 is interpreted as going north from cell (2,2), which is cell (2,3).
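A minimal sketch of this convention (an illustration only, not the simulator's actual CCoord implementation):

    // Directions are enumerated 0..3 as N, E, S, W.
    struct CCoord {
        int x, y;
        CCoord operator+(int dir) const {
            static const int dx[4] = { 0, 1,  0, -1 };   // N, E, S, W
            static const int dy[4] = { 1, 0, -1,  0 };
            return CCoord{ x + dx[dir], y + dy[dir] };
        }
    };
    // Example: CCoord{2, 2} + 0 gives (2, 3), i.e. one cell to the north.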

This program only implements a single level of look-ahead, and the look-ahead step of the solution is built into the LP approximation used for the rollout step. The function CBase::GetEntropyAt() was meant to be used to retrieve expected entropy values at different look-ahead levels in the future; however, it currently only supports one level. These expected entropy values are different from the actual ones obtained during the measurements taken by the platforms in CBase::ScanObjects, because the latter are stochastic quantities determined by what kind of sensor observation is made. In order to have more than one level of look-ahead, several changes must be made:

a. There are 2^N calls to a “calculate entropy” function each time a cell is visited and approximately 4^N LP problems to be solved per sensor, where N is the number of levels ahead to look, so the complexity explodes for even small values of look-ahead. For two platforms with a look-ahead of 2, the LP problem would have to be solved 32 times per time step in the simulation if the platforms were in the interior of the grid.

b. A full set of constraints (all three equations) would need to be added for each tier of entropy values included. A look-ahead of 2 would touch on some second level entropies and would thus require conservation of flow constraints etc and would double the problem’s complexity.

c. The variable-encoding scheme within the vector x in the LP problem “Ax [<=>] b” has to have another layer of complexity in it to account for the x's and y's of the different entropy levels. Therefore “GetYPtr_time()” and “GetXPtr_time()” must change and become functions of look-ahead depth as well. See “5)” below.

d. Constraints would need to be added to guarantee that the first-level entropies are picked up before the second-level ones, because the entropy values may not be monotonically decreasing and it would be non-causal if the solver attempted to collect values out of order.

Pseudocode:

1) Instantiate the global object “simConsts” of type CSimConstants that contains read-only values of all simulation parameters in one place where they are easy to get to from all program scopes. This class opens the input and output files and reads in the parameters when it is constructed.

2) Instantiate a CGrid object that contains the current state of all the cells’ probability vectors. A CGrid object creates CCell objects for each of the cells in the grid, G^2 in all. Call CGrid’s member function InitiatizeEntropyGrid() to allocate a matrix of storage space for caching current entropy values of each cell in the grid.

3) Instantiate a CBase object, which creates one CSearcher object for each platform, specified by the value of K. CSearcher objects represent the simulation’s platforms and are initialized to start at home position at the base. CBase creates a set of CGrid objects called “futureGrids[]” for its internal use to cache entropy values used in the look-ahead process. This way the computational overhead of calculating entropy values into the future is reduced because the entropies can be reused and only need to be changed if a platform actually visits a cell (not just looks at it in the planning process.)


4) CBase creates a CLPInterface object called “VRPlp” in its constructor. This object acts as the interface between the C++ code of the simulation and the C code of lp_solve. When this object is constructed, it allocates the memory needed by lp_solve for the solution of an LP problem. The CLPInterface object uses a member variable called “oneConstraintRow[]” which is used to add constraint equations to the LP problem contained within the CLPInterface object. A related pointer variable called “arrayBase” is set to the value &oneConstraintRow[1]. lp_solve indexes arrays from 1..N, and so the simulator references arrayBase from 0..N-1, which accesses oneConstraintRow[] from 1..N, where N is its length. This memory is used to set constraint coefficients and add constraints using the lp_solve function add_constraint. lp_solve searches the array oneConstraintRow[] for non-zero coefficients and allocates memory in its sparse matrix data structure for storing non-zeros in the constraint equations at these locations. All of the CLPInterface functions AddConstraintEqOne(), AddConstraintEqTwo() and AddConstraintEqThree() assume that the array oneConstraintRow[] is all zeros before using it. Recycling the array by just turning on coefficients, adding a constraint and then turning off those coefficients is much faster than allocating a large array each time, even though the array is only one-dimensional. Two flags in the file SimConstants.h are used to determine how the problem is constructed. The flag “CREATE_LP_FILE” tells the program whether or not it should store the original LP problem at t=0 in a format which can be read by the simulator later on. The code stores LP problems in a format with an “.lp” extension. Storing the LP to a file rather than creating a new one can allow the program to initialize much faster for large LP problems with thousands of unknowns and constraints. A second flag “READ_FROM_OLD_FILE” in SimConstants.h specifies whether the program should read in an old file or go through the work of generating a new LP problem, which requires running more initialization to generate the problem’s constraint equations. These flags work together. Important note: lp_solve fails to read in files with long variable and constraint names. Therefore the flag NAME_VRP in simConstants.h is used to leave a *.lp file unnamed when it is stored. Then, by turning the flag back on, the simulator can read in the unnamed LP problem and give it some easier-to-read names after the read process is completed.

5) If required, the code in CLPInterface will create a fresh LP problem and generate constraints for it; otherwise it will read in the problem (in its initial state) from the file. Generating constraint equations from scratch is done by calling a function for each of the three main equations of the simulation along with a function to set the coefficients in the objective function to the appropriate entropy values. No value is assigned to visiting the home cell. Two important functions used in this process are “GetYPtr_time” and “GetXPtr_time”. These functions return an index into the encoded array of variables that becomes the “x” in the system Ax [< = >] b that lp_solve will solve. GetYPtr_time returns a pointer to a flow variable as a function of the time, platform, coordinate and direction it is associated with. Similarly, the function GetXPtr_time returns a pointer to a task variable as a function of the time, platform and coordinate associated with it. These pointers are either used directly, or pointer arithmetic is done to find the index value of a variable from the beginning of the array. (See Appendix B for information on how the LP variables are encoded in an array.) lp_solve has a non-standard convention because it indexes columns from 1 to N rather than 0 to N-1 as is the standard C convention. Rows are indexed from 1 to N as well. Row 0 in the lp_solve code is used for storing the objective function, not a constraint equation.

6) The program begins its main loops: it loops over simulations and within each simulation over simulation times from t = 0 to t ≤ T. The main loop proceeds as follows:

a) Call CBase::Update():
1) If not at the first step of the simulation:

i) Call CBase::ScanObjects() to do the sensor measurements for each platform and update the state of each cell on which a measurement was performed. Update the cached entropy values (in the main CGrid object) which were affected at cells where the platforms currently reside. When two platforms are co-located, a cell is only observed and updated once.

ii) Call CBase::InitializeData() to update the state of the LP problem by adjusting the entropy coefficients in the objective function of the affected cells for all future times. These coefficients are shared, so if platform k0 does a measurement at cell index 3 at time 1 then the new entropy value will be stored for all 1 < t ≤ T for all platforms 0 ≤ k < K. For example, this means that measurements taken by platform k0 will affect the coefficients in the LP problem for k1 and vice-versa in a two-platform simulation. Thus information is shared. In order to enforce that only future task assignments are allowed in an LP problem for t > 0, the task variables for all cells that are visited are set to have an upper bound of 0.


2) Do the Look-ahead Process to plan the next platform movements:
i) Construct an object of class CRandomDirMap to randomize the directions platforms move in.
ii) For k = 0..K-1, loop over the possible directions that vehicle k can move in:

a) Call FindPossibleDirections() to figure out how many ways platform k can move from its current position. FindPossibleDirections() will filter out directions which are in the grid but which would lead a platform out of range of the base by the end of the simulation.

b) If in Release mode and not Debug mode, each time k changes value, assign a new enumeration to the directions N, E, S, W with CRandomDirMap::Reset()

c) If the current move = pos+dir for platform k is valid, then call CLPInterface::Solve() to calculate the future cost-to-go from this possible future state:
1) Set a temporary bound to force platform k on the next turn to move to cell pos+dir

where “pos” is its current position. The move is forced by setting the appropriate flow variable’s lower bound to be 1.0.

2) Call lp_solve to get the cost-to-go of this possible future state.
3) Remove the temporary bound by setting the lower bound of the flow variable back to 0.

d) Check if the cost-to-go that is returned is better than the best one found so far for the other moves. If so, store the value in the variable “maxEntropy” and the current direction “dir” in the variable “bestDir”.

e) After looping over all possible directions, choose the best direction, store its value in the array sensorMoves[], set the lower bound for that flow to 1.0 and leave it that way.

3) Move the sensor platforms:
i) Move the platforms one unit in the direction stored in sensorMoves[].
ii) Update the number of visits stored in each cell that a platform moves to.

b) If at the last time in the current simulation, calculate the cost information, update statistics for the simulation and increase the simulation count.

7) If all the simulations are done let the objects go out of scope and their destructors will deallocate all of the memory that was used.


Appendix B, LP Variable Encoding Scheme:

This appendix contains information about the packing scheme that the simulation uses for storing the LP variables in the array that represents the x vector in the equation Ax [< = >] b. Task variables X are stored in a three-level hierarchy while flow variables Y are stored in a four-level hierarchy. (If this simulation were modified to use look-ahead values of > 1, then another level of encoding would have to be added to both encoding schemes.) The member functions GetYPtr_time() and GetXPtr_time() within the CLPInterface class are responsible for doing this encoding work. There is a set of functions {GetX(), GetY() and SetX(), SetY()} in this class which use the pointers returned by GetYPtr_time() and GetXPtr_time() to read and set task and flow variables. The functions Get*Ptr_time() get a pointer to the first byte of the chunk of memory associated with that time and then call Get*Ptr_sensor() to find the first byte of the chunk of memory associated with that sensor platform. In a similar way Get*Ptr_sensor() ends up calling Get*Ptr_coord(). At the bottommost level GetYPtr_coord() will call GetYPtr_dir(). The task variables don’t have a fourth level in their packing scheme. All of these functions are declared as inline functions, which effectively removes the function calls and substitutes the code in place of each call once the program is compiled in Release mode. This makes the program easy to debug in Debug mode while still running without too much function call overhead in Release mode. Here is a graphical representation of the encoding scheme:

X and Y: First grouping by time:

ex using T=50, k = 0, m = (0,0)

y(m,dir,k,t) and x(m,k,t):

+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|y_50 |x_50 |y_49 |x_49 |     | ... | y_0 | x_0 |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+

X and Y: Second grouping by sensor:


y_50(m,dir,k):

+-----+-----+-----+-----+-----+---------+
|y_k0 |y_k1 |y_k2 | ... | y_k(K-1)|
+-----+-----+-----+-----+-----+---------+

x_50(m,k):

+-----+-----+-----+-----+-----+---------+
| x_k0| x_k1| x_k2| ... | x_k(K-1)|
+-----+-----+-----+-----+-----+---------+

X and Y: Third grouping by cell:

ex y_(50,k=0)(m,dir):

+------+------+------+------+------+------+------+------+------+--- ... ------+------------+
|y(0,0)|y(1,0)|y(2,0)| ...  |y(0,1)|y(1,1)|y(2,1)|y(3,1)|          ... | y(G-1,G-1) |
+------+------+------+------+------+------+------+------+------+--- ... ------+------------+

ex x_(50,k=0)(m):

+------+------+------+------+------+------+------+------+------+--- ... ------+------------+
|x(0,0)|x(1,0)|x(2,0)| ...  |x(0,1)|x(1,1)|x(2,1)|x(3,1)|          ... | x(G-1,G-1) |
+------+------+------+------+------+------+------+------+------+--- ... ------+------------+

Y: Fourth Grouping by Direction of Flow:

ex y_(50,k=0,(0,0))(dir):

+-----+-----+
| y_N | y_E |   (The number of arcs varies with y’s position in the grid)
+-----+-----+
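To make the grouping order concrete, here is a simplified sketch of the index arithmetic for a task variable x(m,k,t) under this layout. It ignores the per-cell variation in arc counts handled by GetYPtr_dir() and uses illustrative names, so it shows the idea of the encoding rather than the simulator's exact code.

    // A = total number of directed arcs in the grid = 8 + 12*(G-2) + 4*(G-2)^2.
    // One time block holds A*K flow variables followed by G*G*K task variables,
    // and the blocks are laid out backwards in time: t = T first, t = 0 last.
    long taskIndex(long m, long k, long t, long G, long K, long T) {
        long A = 8 + 12 * (G - 2) + 4 * (G - 2) * (G - 2);
        long timeBlock = A * K + G * G * K;     // size of one [y_t | x_t] block
        long offset = (T - t) * timeBlock;      // skip the blocks for later times
        offset += A * K;                        // skip this block's flow variables
        return offset + k * G * G + m;          // then group by sensor, then by cell
    }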


Appendix C, Simulator Debug Information:

There are several functions that were used extensively in debugging the simulator. The program is single-threaded and can thus be stepped through line-by-line when necessary. The only exception to this ability is in the code of lp_solve which reads in *.lp files. The lp_solve function read_lp_file() uses code in ytab.c and lex.c, which implement the yacc and lex parsing functions ported from the Unix operating system. Unfortunately, ytab.c actually includes the source code of lex.c into its own file. This very atypical maneuver makes both files’ code very difficult to debug. This is further complicated by the fact that there are numerous goto’s in the files as well as #line directives which arbitrarily relabel where the compiler says it is in the program’s compilation. This code is a mess and cannot be stepped through in any kind of reasonable fashion. It is also written to be cross-platform compatible, so there are layers upon layers of #ifdef … #endif statements to wade through: many functions in the files have different versions corresponding to what platform or compiler is being used to compile them. If break-points were set at the top of every function in the file ytab.c and the program were stepped through, one could possibly learn enough about the program’s flow to convert the spaghetti code into a more modern, procedural program. It might be easier to just rewrite the whole parser, though there is a large body of code in the files read.c, ytab.c and lex.c.

With regard to testing the output of lp_solve and its ability to read in *.lp files, the following three member functions of class CLPInterface were used: PrintSolution(), PrintLP() and WriteLP(). The first function, PrintSolution(), will print the LP variables to the file "Solution.dat" once an LP program has been solved by calling Solve(). This file prints out a list of every task and flow variable at every time for that solution. PrintLP() can be called with a filename such as “VRP.dat” to print out the entire matrix structure A and the right hand side vector b in the equation “Ax [< = >] b” in dense format. For small simulations where there are fewer than 2048 columns in the dense matrix, Microsoft Excel allowed the text file to be imported and viewed relatively easily. For larger simulations it was necessary to either go through the labor of telling Excel to skip over importing some columns or use another program such as Word and view the matrix’s coefficients in a much less orderly format. Lastly, the member function WriteLP() can be used for debug purposes as well as for storing an LP problem to a file in its initial state. This is the function which will write out an LP problem to a file with a *.lp extension. It is useful for looking at the LP problem in sparse format without all the extra zeros. All of these test / debug functions are generally wrapped in “#ifdef _DEBUG … #endif” preprocessor statements so that they will be stripped out when the program is compiled in Release mode and thus won’t waste processor time on I/O operations when the program is doing a real simulation.

To ease the difficulty of the debugging process, the #define’d constant “NAME_VRP” can be set to TRUE in simConstants.h, which will cause the initialization routine in CLPInterface to label the constraint rows and variables of the LP problem. That way, when debug output is dumped to a file with one of the aforementioned functions, it is much easier to read the constraint equations and see what is going on. Each constraint row has a label such as “Ey(n',2,k1,6)-Ey(2,n',k1,7) = 0.00” to specify the purpose of that constraint. One important note: the letter “E” was used not to mean expectation but as a space-abbreviated representation of the symbol “∑” for summation. This particular example is a conservation of flow constraint for platform k1 that says there must be as many vehicles flowing out of cell index 2 at time 7 as there are flowing into cell index 2 at time 6.

For debug output the class CSimConstants ties the output file (argv[2]) to the standard error object “cerr”. By using “cerr << “a string” << endl;” throughout the program (after the simConsts constructor), error reports can be logged to a file in this fashion. The default name for this log file is “output.dat”.
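One common way to implement this kind of redirection (a generic sketch, not the simulator's exact constructor code):

    #include <fstream>
    #include <iostream>

    int main(int argc, char* argv[]) {
        // Redirect cerr to the log file named on the command line (e.g. "output.dat").
        std::ofstream logFile(argc > 2 ? argv[2] : "output.dat");
        std::streambuf* oldBuf = std::cerr.rdbuf(logFile.rdbuf());
        std::cerr << "simulation started" << std::endl;     // goes to the log file
        std::cerr.rdbuf(oldBuf);                             // restore before logFile is destroyed
        return 0;
    }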


Appendix D, Simulator Trial Run:

/* ------------------------------------------------------------------------------ */

input.dat:

3 2 2 10.3 0.701.01.0 10.00.10 0.90100 4

// 1: <GRID_SIZE> <K> <numOfClasses> <numOfModes>
// 2: <init-prob-type1> <init-prob-type2> ... <init-prob-type(numOfClasses-1)>
// 3: <time-mode1> <time-mode2> ... <time-mode(numOfModes)>
// 4: <FA cost> <MD cost>
// 5: <prob-y=0-mode1-type1> <prob-y=0-mode1-type2> ... <prob-y=0-mode1-type(numOfClasses)>
// 6: <prob-y=0-mode2-type1> <prob-y=0-mode2-type2> ... <prob-y=0-mode2-type(numOfClasses)>
// 7: <prob-y=0-modeN-type1> <prob-y=0-modeN-type2> ... <prob-y=0-modeN-type(numOfClasses)>
// 8: <maxNumOfSimulations> <T>

/* ------------------------------------------------------------------------------ */

t = 0:

+-----+-----+-----+
|0.316|0.316|0.316|
|  6  |  7  |  8  |
+-----+-----+-----+
|0.316| 0.00|0.316|
|  3  |K0 K1|  5  |
+-----+-----+-----+
|0.316|0.316|0.316|
|  0  |  1  |  2  |
+-----+-----+-----+

sensorPos[0] == (1,1) sensorPos[1] == (1,1)

Objective Function Coeffs for t == 0:

(coefficients of X(0,k,t) through X(8,k,t), i.e. cells 0..8 left to right)
k0, t4: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k1, t4: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k0, t3: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k1, t3: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k0, t2: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k1, t2: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k0, t1: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k1, t1: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k0, t0: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316
k1, t0: 0.316 0.316 0.316 0.316 0.00 0.316 0.316 0.316 0.316

lp_solve()'s costs-to-go for prospective directions:

t==0, k==0, dir==N, entropyToGo == 1.8957150269382, k0 chooses N (for debugging, turned off logic to break ties randomly)
t==0, k==0, dir==E, entropyToGo == 1.8957150269382
t==0, k==0, dir==S, entropyToGo == 1.8957150269382
t==0, k==0, dir==W, entropyToGo == 1.8957150269382

t==0, k==1, dir==N, entropyToGo == 1.8957150269382
t==0, k==1, dir==E, entropyToGo == 1.8957150269382, k1 chooses E
t==0, k==1, dir==S, entropyToGo == 1.8957150269382
t==0, k==1, dir==W, entropyToGo == 1.8957150269382

t = 1:

+-----+-----+-----+
|0.316|0.073|0.316|
|  6  | K0  |  8  |
+-----+-----+-----+
|0.316| 0.00|0.253|
|  3  |  4  | K1  |
+-----+-----+-----+
|0.316|0.316|0.316|
|  0  |  1  |  2  |
+-----+-----+-----+

sensorPos[0] == (1,2) sensorPos[1] == (2,1)

Objective Function Coeffs for t == 1:

(coefficients of X(0,k,t) through X(8,k,t), i.e. cells 0..8 left to right)
k0, t4: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k1, t4: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k0, t3: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k1, t3: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k0, t2: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k1, t2: 0.316 0.316 0.316 0.316 0.00 0.253 0.316 0.073 0.316
k0, t1: 0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
k1, t1: 0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
k0, t0: 0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316
k1, t0: 0.316 0.316 0.316 0.316 0.00 0.00 0.316 0.00 0.316

lp_solve()'s costs-to-go for prospective directions:


t==1, k==0, dir==E, entropyToGo == 1.2006963347606
t==1, k==0, dir==S, entropyToGo == 0.94785751346912
t==1, k==0, dir==W, entropyToGo == 1.2638100179588, k0 chooses W

t==1, k==1, dir==N, entropyToGo == 1.2006963347606
t==1, k==1, dir==S, entropyToGo == 1.2638100179588, k1 chooses S
t==1, k==1, dir==W, entropyToGo == 0.94785751346912

t = 2:

+-----+-----+-----+
|0.073|0.073|0.316|
| K0  |  7  |  8  |
+-----+-----+-----+
|0.316| 0.00|0.253|
|  3  |  4  |  5  |
+-----+-----+-----+
|0.316|0.316|0.253|
|  0  |  1  | K1  |
+-----+-----+-----+

sensorPos[0] == (0,2) sensorPos[1] == (2,0)

Objective Function Coeffs for t == 2:

(coefficients of X(0,k,t) through X(8,k,t), i.e. cells 0..8 left to right)
k0, t4: 0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
k1, t4: 0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
k0, t3: 0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
k1, t3: 0.316 0.316 0.253 0.316 0.00 0.253 0.073 0.073 0.316
k0, t2: 0.316 0.316 0.00 0.316 0.00 0.253 0.00 0.073 0.316
k1, t2: 0.316 0.316 0.00 0.316 0.00 0.253 0.00 0.073 0.316
k0, t1: 0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
k1, t1: 0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
k0, t0: 0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316
k1, t0: 0.316 0.316 0.00 0.316 0.00 0.00 0.00 0.00 0.316

lp_solve()'s costs-to-go for prospective directions:

t==2, k==0, dir==E, entropyToGo == 0.38917664490354
t==2, k==0, dir==S, entropyToGo == 0.63190500897941, k0 chooses S

t==2, k==1, dir==N, entropyToGo == 0.56879132578116
t==2, k==1, dir==W, entropyToGo == 0.63190500897941, k1 chooses W


t = 3:

+-----+-----+-----+
|0.073|0.073|0.316|
|  6  |  7  |  8  |
+-----+-----+-----+
|0.253| 0.00|0.253|
| K0  |  4  |  5  |
+-----+-----+-----+
|0.316|0.073|0.253|
|  0  | K1  |  2  |
+-----+-----+-----+

sensorPos[0] == (0,1) sensorPos[1] == (1,0)

Objective Function Coeffs for t == 3:

(coefficients of X(0,k,t) through X(8,k,t), i.e. cells 0..8 left to right)
k0, t4: 0.316 0.073 0.253 0.253 0.00 0.253 0.073 0.073 0.316
k1, t4: 0.316 0.073 0.253 0.253 0.00 0.253 0.073 0.073 0.316
k0, t3: 0.316 0.00 0.253 0.00 0.00 0.253 0.073 0.073 0.316
k1, t3: 0.316 0.00 0.253 0.00 0.00 0.253 0.073 0.073 0.316
k0, t2: 0.316 0.00 0.00 0.00 0.00 0.253 0.00 0.073 0.316
k1, t2: 0.316 0.00 0.00 0.00 0.00 0.253 0.00 0.073 0.316
k0, t1: 0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
k1, t1: 0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
k0, t0: 0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316
k1, t0: 0.316 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.316

(Don't bother calling lp_solve for last move)

t==3, k==0, dir==E, entropyToGo == 0.0, k0 forced to move E, no entropy at base's coordinate

t==3, k==1, dir==N, entropyToGo == 0.0, k1 forced to move N, no entropy at base's coordinate

t = 4:

+-----+-----+-----+
|0.073|0.073|0.316|
|  6  |  7  |  8  |
+-----+-----+-----+
|0.253| 0.00|0.253|
|  3  |K0 K1|  5  |
+-----+-----+-----+
|0.316|0.073|0.253|
|  0  |  1  |  2  |
+-----+-----+-----+

t==4, no more moves to plan

sensorPos[0] == (1,1) sensorPos[1] == (1,1)