The Optimizing Simulator: An Intelligent Analysis Tool for ...castlelab.princeton.edu/html/Papers/Wu...
The Optimizing Simulator:
An Intelligent Analysis Tool for the Military Airlift Problem
Tongqiang Tony Wu, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544
Alan Whisman, Air Mobility Command, Scott Air Force Base
July 2003
Abstract
There have been two primary modeling and algorithmic strategies for problems in military logistics:
simulation, offering tremendous modeling flexibility, and optimization, which offers the intelligence
of math programming. Each offers significant theoretical and practical advantages. In this paper,
we introduce the concept of an “optimizing simulator” which offers a spectrum of models and algo-
rithms which are differentiated only by the information content of the decision function. We argue
that there are four information classes, allowing us to construct decision functions with increasing
degrees of sophistication and realism as the information content of a decision is increased. Using
the context of the military airlift problem, we illustrate the succession of models, and demonstrate
that we can achieve improved performance as more information is used.
The military airlift problem, faced by the Air Mobility Command (AMC), deals with effec-
tively routing a fleet of aircraft to deliver requirements (troops, equipment, food and other forms
of freight) from different origins to different destinations as quickly as possible under a variety of
constraints. There are two major classes of models to solve the military airlift (and closely related
sealift) problem: optimization models (Morton et al. (1996), Rosenthal et al. (1997), Baker et al.
(2002)), and simulation models, such as MASS (Mobility Analysis Support System) and AMOS,
which are heavily used within the AMC. Bridging these approaches are stochastic programming
models (Goggins (1995),Morton et al. (2002)). In general, optimization models are intelligent but
less flexible while simulation models are flexible but are governed by simple rules. In this paper,
we introduce the concept of the “optimizing simulator” for this problem class, and propose a range
of models that span from simulation to optimization. We introduce four classes of information,
and create a spectrum of decision functions by controlling the information available to the decision
function. Our presentation is facilitated by a notational framework that provides natural connec-
tions with both simulation and optimization, while also providing constructs needed for complex
dynamic resource management problems.
Ferguson & Dantzig (1955) were among the first to apply mathematical models to air-based transportation. Subsequently, numerous studies were conducted on the application of optimization modeling to the military airlift problem. Mattock et al. (1995) describe several mathematical modeling
formulations for military airlift operations. The RAND Corporation published a very extensive
formulations for military airlift operations. The RAND Corporation published a very extensive
analysis of airfield capacity in Stucker & Berg (1999). According to Baker et al. (2002), research
on air mobility optimization at the Naval Postgraduate School (NPS) started with the Mobility
Optimization Model (MOM). This model is described in Wing et al. (1991) and concentrates on
both sealift and airlift operations. Therefore, the MOM model is not designed to capture the char-
acteristics specific to airlift operations, but it is a good model in the sense that it is time-dependent.
THRUPUT, a general airlift model developed by Yost (1994), captures the specifics of airlift opera-
tions but is static. The United States Air Force Studies and Analysis Agency in the Pentagon asked
NPS to combine the MOM and THRUPUT models into one model that would be time dependent
and would also capture the specifics of airlift operations (Baker et al. (2002)). The resulting model
is called THRUPUT II, described in detail in Rosenthal et al. (1997). During the development
of THRUPUT II at NPS, a group at RAND developed a similar model called CONOP (CONcept
of OPerations), described in Killingsworth & Melody (1997). The THRUPUT II and CONOP
models each possessed several features that the other lacked. Therefore, the Naval Postgraduate
School and the RAND Corporation merged the two models into NRMO (NPS/RAND Mobility
Optimizer), described in Baker et al. (2002). Crino et al. (2002) introduced the group theoretic
tabu search method to solve the theater distribution vehicle routing and scheduling problem. Their
heuristic methodology evaluates and determines the routing and scheduling of multi-modal theater
transportation assets at the individual asset operational level.
A number of simulation models have also been proposed for airlift problems (and related prob-
lems in military logistics). Schank et al. (1991) reviews several deterministic simulation models.
Burke et al. (2002) at the Argonne National Laboratory developed a model called TRANSCAP
(Transportation System Capability) to simulate the deployment of forces from Army bases. The
heart of TRANSCAP is the discrete-event simulation module developed in MODSIM III. Perhaps
the most widely used model at the Air Mobility Command is MASS (Mobility Analysis Support
System) which is a discrete-event worldwide airlift simulation model used in strategic and theater
operations to deploy military and commercial airlift assets. It once was the standard for all airlift
problems in AMC and all airlift studies were compared with the results produced by the MASS
model. MASS (and its replacement AMOS) provides for a very high level of detail, allowing AMC
to run analyses on a wide variety of issues.
One feature of simulation models is their ability to handle uncertainty, and as a result there
has been a steady level of academic attention toward incorporating uncertainty into optimization
models. Dantzig & Ferguson (1956) is one of the first to study uncertain customer demands in the
airlift problem. Midler & Wollmer (1969) also takes into account stochastic cargo requirements.
They formulated two two-stage stochastic linear programming models – a monthly planning model
and a daily operational model for the flight scheduling. Goggins (1995) extended Throughput
II (Rosenthal et al. (1997)) to a two-stage stochastic linear program to address the uncertainty
of aircraft reliability. Niemi (2000) expands the NRMO model to incorporate stochastic ground
times through a two-stage stochastic programming model. To reduce the number of scenarios and
obtain a tractable solution, the model assumes that the set of scenarios is identical for each airfield
and time period, and that a scenario is determined by the outcomes of repair times of different
types of aircraft. The resulting stochastic programming model has an equivalent deterministic linear programming
formulation. Granger et al. (2001) compared the simulation model and the network approximation
model for the impact of stochastic flying times and ground times on a simplified airlift network
(one origin aerial port of embarkation (APOE), three intermediate airfields and one destination
aerial port of debarkation (APOD)). Based on the study, they suspect that a combination of
simulation and network optimization models should yield much better performance than either one
of these alone. Such an approach would use a network model to explore the variable space and
identify parameter values that promise improvements in system performance, then validate these
by simulation. Morton et al. (2002) developed the Stochastic Sealift Deployment Model (SSDM), a
multi-stage stochastic programming model to plan the wartime sealift deployment of military cargo
subject to stochastic attack. They used scenarios to represent the possibility of random attacks.
NRMO is a linear programming model that is intelligent but not as flexible as a traditional
simulation. On the other hand, MASS is a simulation model with tremendous flexibility but is
governed by relatively simple rules. There has been interest in combining the strengths of these
two technologies. For example, Schank et al. (1994) makes the point:
For the future, we recommend research into the integration of the KB (Knowledge-Based) and the MP (Math Programming) procedures. ... The optimization approach directly determines the least-cost set of assets but does so with some loss of detail and by assuming that the world operates in an optimal fashion. KB simulations capture details and provide more realistic representations of real-world activities but do not currently balance trade-offs among transport assets, procedures, and cost.
Optimization models are often promoted because the solutions are “optimal” (or in some sense
“better” than what is happening in practice); simulation models lay claim to being more realistic,
exploiting their ability to capture a high level of detail. In practice, the real value of optimization
models is not in their solution quality, but rather in their ability to respond with intelligence to new
datasets. Simulation models are driven by rules, and it is often the case that new sets of rules have
to be coded to produce desirable behaviors when faced with new datasets. Optimization models
often behave very reasonably when given entirely new datasets, but this behavior depends on the
calibration of a cost model which typically has to combine hard dollar costs (such as the cost of fuel)
against “soft costs” which are introduced to encourage desirable behaviors. As a result, “optimal”
only takes meaning relative to a particular objective function; it is not hard for an experienced
expert to look at a solution and claim “it may be optimal but it is wrong!” Of course, experienced
modelers have long recognized that optimization models are only approximations of reality, but
this has not prevented claims of savings based on the results of an optimization model.
We present a model of the military airlift problem based on the DRTP (dynamic resource
transformation problem) notational framework of Powell et al. (2001). We use the concept of an
“optimizing simulator” for making decisions, where we propose a spectrum of decision functions
by controlling the amount of information available to the decision function. The DRTP notational
framework provides a natural bridge between optimization and simulation by providing some subtle
but critical notational changes that make it easier to model the information content of a decision. In
addition, the DRTP modeling paradigm provides for a high level of generality in the representation
of the underlying problem.
In section 1, we briefly review the NRMO model, MASS model and SSDM model. Our for-
mulation of the military airlift problem is presented in section 2. In section 3, we introduce all
the policies employed in this paper. These policies help to close the gap between simulation and
optimization models. One policy is based on the use of value function approximations that capture
the impact of decisions on the future. These are introduced in section 4. We conduct numerical
experiments on the optimizing simulator in section 5, and section 6 concludes the paper.
1 A comparison of models
In this section, we review the characteristics of the NRMO and MASS models to compare linear
programming with simulation, where a major difference is the modeling of information and the
ability to handle different forms of complexity. We also compare a stochastic programming model,
which represents one way of incorporating uncertainty into an optimization model. Following this
discussion, section 2 presents a model of the military airlift problem based on a modeling framework
presented in Powell et al. (2001). At the end of section 2, we compare all four modeling strategies.
1.1 The NRMO Model
The NRMO model has been in development since 1996 and has been employed in several airlift
studies. For a detailed review of the NRMO model, including a mathematical description, see Baker
et al. (2002). The goal of the NRMO model is to move equipment and personnel in a timely fashion
from a set of origin bases to a set of destination bases using a fixed fleet of aircraft with differing
characteristics. This deployment is driven by the movement and delivery requirements specified in
a list dubbed the Time-Phased Force Deployment Data set (TPFDD). This list essentially contains
the cargo and troops, along with their attributes, that must be delivered to each of the bases of a
given military scenario.
Aircraft types for the NRMO runs reported by Baker et al. (2002) include C-5, C-17, C-141,
Boeing 747, KC-10 and KC-135. Different types of aircraft have different features, such as passenger
and cargo-carrying capacity, airfield capacity consumed, range-payload curve, and so on. The
range-payload curve specifies the weight that an aircraft can carry given a distance traveled. These
range-payload curves are piecewise linear concave.
The activities in NRMO are represented using three time-space networks: the first flows aircraft,
the second flows cargo (freight and passengers), and the third flows crews. Cargo and troops are
carried from onload aerial ports of embarkation (APOEs) to either offload aerial ports of debarkation
(APODs) or forward operating bases (FOBs). Certain requirements need to be delivered to the
FOBs via some other means of transportation after being offloaded in the APODs. Some aircraft,
however, can bypass the APODs and deliver directly to the FOBs. Each requirement starts at a
specific APOE dictated by the TPFDD and then is either dropped off at an APOD or a FOB.
NRMO makes decisions about which aircraft to assign to which requirement, which freight
will move on an aircraft, and which crew will handle a move, as well as variables that manage the
allocation of aircraft between roles such as long-range, intertheater operations and shorter shuttle
missions within a theater. The model can even handle the assignment of KC-10s between the roles
of strategic cargo hauler and mid-air refueler.
In the model, NRMO minimizes a rather complex objective function that is based on several
“costs” assigned to the decisions. There are a variety of costs, including hard operating costs (fuel,
maintenance) and soft costs to encourage desirable behaviors. For example, NRMO assigns a cost
to penalize deliveries of cargo or troops that arrive after the required delivery date. This penalty
structure charges a heavier penalty the later the requirement is delivered. Another term penalizes
the cargo that is simply not delivered. The objective also penalizes reassignments of cargo and
deadheading crews. Lastly, the objective function offers a small reward for planes that remain at
an APOE as these bases are often in the continental US and are therefore close to home bases of
most planes. The idea behind this reward is to account for uncertainty in the world of war and to
keep unused planes well positioned in case of unforeseen contingencies.
The constraints of the model can be grouped into seven categories: (1) demand satisfaction; (2)
flow balance of aircraft, cargo and crews; (3) aircraft delivery capacity for cargo and passengers; (4)
the number of shuttle and tanker missions per period; (5) initial allocation of aircraft and crews;
(6) the usage of aircraft of each type; and (7) aircraft handling capacity at airfields. A general
statement of the model (see Baker et al. (2002) for a detailed description) is given by:
min_x Σ_t c_t x_t

subject to:
A_t x_t − Σ_{τ=1}^{t} B_{t−τ,t} x_{t−τ} = R_t
D_t x_t ≤ u_t
x_t ≥ 0        (1)
Here t represents when an activity starts and τ is the time required to complete an action. The
matrices A_t and B_{t−τ,t} handle system dynamics. D_t captures possible coupling constraints that
run across the flows of aircraft, cargo and crews. u_t is the upper bound on flows, and R_t models
the arrival of new resources to be managed. x_t is the decision vector, and c_t is the cost vector
corresponding to decision vector x_t.
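To make the time-staged structure of (1) concrete, the following is a minimal Python sketch of the flow-balance constraints for a toy, hypothetical instance: one aircraft type, missions lasting τ = 1 period, and a decision vector x_t = (holds, flights). The data and decision encoding are illustrative, not part of NRMO.

```python
# Toy flow-balance check for the time-staged structure in model (1).
T = 3
R = [2, 0, 0]   # new aircraft arriving at each period (the R_t term)
tau = 1         # mission duration

def available(t, x):
    """Aircraft available at period t: new arrivals R_t plus aircraft
    returning from decisions made tau periods earlier
    (the B_{t-tau,t} x_{t-tau} term)."""
    returns = sum(x[t - tau]) if t - tau >= 0 else 0
    return R[t] + returns

def feasible(x):
    """Check A_t x_t - sum_tau B_{t-tau,t} x_{t-tau} = R_t for all t:
    aircraft used at t (holds + flights) must equal aircraft available."""
    return all(sum(x[t]) == available(t, x) for t in range(T))

# A feasible plan: 2 aircraft arrive at t=0, both fly, both return at t=1, ...
plan = [(0, 2), (1, 1), (2, 0)]
print(feasible(plan))   # True
```

The equality constraints couple periods through the return terms; an LP solver would choose the x_t minimizing Σ_t c_t x_t over all plans passing this check.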
NRMO is implemented with the algebraic modeling language GAMS (Brooke et al. (1992)),
which facilitates the handling of states. There are a number of sets and subsets in NRMO, such
as sets of time periods, requirements, cargo types, aircraft types, bases, and routes. To make a
decision, one element of each suitable set is pulled out to form an index combination, such as an
aircraft of a certain type delivering a certain requirement on a certain route departing at a certain
time. They screen the infeasible index combinations to achieve a tractable computation. The
information in the NRMO model is captured in the TPFDD, which is known at the beginning of
the horizon.
NRMO requires that the behavior of the model be governed by a cost model (which simulators
do not require). The use of a cost model minimizes the need for extensive tables of rules to
produce “good” behaviors. The optimization model also responds in a natural way to changes in
the input data (for example, an increase in the capacity of an airbase will not produce a decrease
in overall throughput). But linear programming formulations suffer from weaknesses. A significant
limitation is the difficulty in modelling complex system dynamics. For example, a simulation model
can include logic such as “if there are four aircraft occupying all the parking spaces, a fifth aircraft
will have to be pulled off the runway where it cannot be refueled or repaired.” In addition, linear
programming models cannot directly model the evolution of information. This limits their use in
analyzing strategies which directly affect uncertainty (What is the impact of last-minute requests
on operational efficiency? What is the cost of sending an aircraft through an airbase with limited
maintenance facilities where the aircraft might break down?).
1.2 The MASS Model
The MASS model is a rule-based simulation model. Let
S_t = The state vector of the system at time t.
π = The policy employed.
X^π = The function that maps a state S_t to a decision x_t under policy π.
x_t = The vector of decisions.
ω_t = Exogenous information at time t.
M_t = The transfer function that represents the system dynamics: starting at state S_t, applying decision x_t, and receiving new exogenous information ω_{t+1}, M_t leads to the new system state S_{t+1}.
The policy π in the MASS model requires a sequence of feasibility checks. In the TPFDD,
the MASS model picks the first available requirement and checks the first available aircraft to
see whether this aircraft can deliver the requirement. In this feasibility check, the MASS model
examines whether the aircraft and the requirement are at the same airbase, whether the aircraft
can handle the size of the cargo in the requirement, as well as whether the en-route and destination
airbases can accommodate the aircraft. If the aircraft cannot deliver the requirement, the MASS
model checks the second available aircraft, and so on, until it finds an aircraft that can deliver the
requirement. Once it does, the MASS model finishes the first requirement and moves to the
second. It continues this procedure for each requirement in the TPFDD until all the requirements
are finished. This policy is rule-based; it does not use a cost function to make
the decision.
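The sequential feasibility check described above can be sketched in a few lines of Python. This is a deliberately minimal sketch with hypothetical attribute names (location, capacity); MASS itself applies many more checks (cargo size, en-route and destination airbase capacity, and so on).

```python
# Sketch of the rule-based assignment policy: for each requirement in
# TPFDD order, take the first aircraft passing the feasibility check.
def can_deliver(aircraft, req):
    """Illustrative feasibility check: co-located and enough capacity."""
    return (aircraft["location"] == req["origin"]
            and aircraft["capacity"] >= req["weight"])

def rule_based_policy(aircraft_list, requirements):
    """No cost function is used; the first feasible aircraft wins."""
    assignments = {}
    free = list(aircraft_list)
    for req in requirements:
        for ac in free:
            if can_deliver(ac, req):
                assignments[req["id"]] = ac["id"]
                free.remove(ac)
                break
    return assignments

reqs = [{"id": 1, "origin": "KBOS", "weight": 50_000},
        {"id": 2, "origin": "KDOV", "weight": 90_000}]
fleet = [{"id": "C17-1", "location": "KBOS", "capacity": 106_540},
         {"id": "C5-1",  "location": "KDOV", "capacity": 190_000}]
print(rule_based_policy(fleet, reqs))   # {1: 'C17-1', 2: 'C5-1'}
```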
The evolution of states and decisions in the MASS model is as follows. At time t, the state of
the system, St, is known. Employing the policy π described above to make decisions given state
St, produces the decision xt. After the decision is made at time t, and the new information at time
t+1, ωt+1, has arrived, the system moves to a new state St+1 at time t+1. Thus, the MASS model
can be mathematically presented as:
x_t ← X^π(S_t)
S_{t+1} ← M_t(S_t, x_t, ω_{t+1})        (2)
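The recursion (2) is simply a simulation loop: a policy maps the state to a decision, and a transfer function advances the state given new exogenous information. The following sketch uses a trivial, invented state (aircraft position and clock) and policy purely to show the loop structure.

```python
# The recursion (2) as a generic simulation loop (illustrative dynamics).
def X_pi(state):
    """A trivial rule-based policy: always fly one leg east."""
    return "fly_east"

def M(state, decision, omega):
    """Transfer function: apply the decision, then an exogenous delay."""
    pos, clock = state
    pos += 1 if decision == "fly_east" else 0
    return (pos, clock + 1 + omega)

state = (0, 0)
delays = [0, 1, 0]              # exogenous information omega_{t+1}
for omega in delays:
    x = X_pi(state)             # x_t <- X_pi(S_t)
    state = M(state, x, omega)  # S_{t+1} <- M_t(S_t, x_t, omega_{t+1})
print(state)   # (3, 4)
```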
1.3 The SSDM Model
The Stochastic Sealift Deployment Model (SSDM) (Morton et al. (2002)) is a multi-stage stochastic
mixed-integer program designed to hedge against potential enemy attacks on seaports of debarka-
tion (SPODs) with probabilistic knowledge of the time, location and severity of the attacks. They
test the model on a network with two SPODs (the number actually used in the Gulf War) and
other locations such as a seaport of embarkation (SPOE) and locations near SPODs where ships
wait to enter berths. To keep the model computationally tractable, they assumed only a single
biological attack can occur. Thus, associated with each scenario is whether or not an attack occurs,
and if it does occur, the time t_s at which it occurs. We then let T_s = {1, . . . , t_s − 1} be the set
of time periods preceding an attack which occurs at time t_s. This structure results in a scenario
tree that grows linearly with the number of time periods, as illustrated in figure 1. The stochastic
programming model for SSDM can then be written:
min_x Σ_s Σ_t p_s c_t x_{ts}        (3)

subject to:
A_t x_{ts} − Σ_{τ=1}^{t} B_{t−τ,t} x_{t−τ,s} = R_t        (4)
D_t x_{ts} ≤ u_{ts}        (5)
x_{ts} = x_{ts′}  ∀ t ∈ T_s ∩ T_{s′}        (6)
x_{ts} ≥ 0 and integer.        (7)
Here s is the index for scenarios, and p_s is the probability of scenario s. t, τ, A_t, B_{t−τ,t}, c_t, D_t and
R_t are the same as introduced in the NRMO model. u_{ts} is the upper bound on flows, where the
capacities of SPOD cargo handling vary under different attack scenarios, and x_{ts} is the decision
vector. The non-anticipativity constraints (6) say that if scenarios s and s′ share the same branch
of the scenario tree at time t, the decisions at time t must be the same for those scenarios.
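A small sketch makes the non-anticipativity constraints (6) concrete for the single-attack tree: a scenario is identified by its attack time t_s (infinity for "no attack"), and two scenarios are indistinguishable at time t whenever t precedes both attack times. The scenario names and decisions below are illustrative.

```python
# Checking non-anticipativity (6) on the single-attack scenario tree.
import math

def shared(t, ts, ts2):
    """Scenarios with attack times ts, ts2 are on the same branch at t
    if t precedes both attacks (nothing has happened in either)."""
    return t < min(ts, ts2)

def non_anticipative(x, attack_times):
    """x[s][t] is the decision at time t under scenario s; decisions
    must agree wherever the scenarios still share a branch."""
    for s in x:
        for s2 in x:
            for t in range(len(x[s])):
                if shared(t, attack_times[s], attack_times[s2]) \
                        and x[s][t] != x[s2][t]:
                    return False
    return True

attack_times = {"no_attack": math.inf, "attack_t2": 2}
ok  = {"no_attack": ["load", "sail", "berth"],
       "attack_t2": ["load", "sail", "divert"]}
bad = {"no_attack": ["load", "sail", "berth"],
       "attack_t2": ["wait", "sail", "divert"]}
print(non_anticipative(ok, attack_times),
      non_anticipative(bad, attack_times))  # True False
```

In the stochastic program these equalities are imposed as constraints rather than checked after the fact, but the branching logic is the same.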
In general, the stochastic programming model is much larger than a deterministic linear pro-
gramming model and still struggles with the same difficulty in modelling complex system dynamics.
One has to design the scenarios carefully to avoid the exponential growth of scenarios in the stochas-
tic programming model. The value of the stochastic programming formulation is that it provides
Figure 1: The single-attack scenario tree used in SSDM (a "no attack" branch and, at each time t_1, . . . , t_4, a branch for an attack occurring at that time)
for the analysis of certain forms of uncertainty (such as which SPOD or SPODs might be attacked
and the nature of the attack) so that robust strategies can be derived. Such problems do not require
a massive number of scenarios, and do not call for a high level of detail in the model.
2 A model of the military airlift problem
To represent the fundamentals behind the Air Mobility Command problem, it is useful to view the
problem as a Dynamic Resource Transformation Problem. This framework, established in Powell
et al. (2001), offers several features that are especially important for this problem class, including
the explicit modeling of the information content of decisions, the ability to model complex resource
attributes, and the ease with which new classes of decisions can be introduced. In this section, we
present the core elements of the problem, sometimes sacrificing completeness in order to focus on
the modeling of information.
Throughout this paper, we represent space and time using:
J = The set of airbases to which the airplanes can go. These airbases can be located all over the world. We always use i and j to denote airbases, i.e., i, j ∈ J.
τ_{ij} = The travel time between airbases i and j.
T = The discrete set of time periods that we simulate.
From time to time, we need to restrict our planning to information within a horizon. For this, we
let:
T^{ph}_t = The set of time periods within a planning horizon for a problem being solved at time t.
In section 2.1, resource classes and resource layering are described. Processes, including infor-
mation processes and system dynamics, are introduced in section 2.2. The controls are described
in section 2.3. We present the optimization formulation in section 2.3.3 and finally, we compare
the NRMO, MASS and DRTP models in section 2.4.
2.1 Resources
We use the term “resources” to refer to all physical objects that are being managed. In the airlift
problem, this refers to both the aircraft and the requirements. Our notation easily generalizes to
handle additional resources layers such as pilots, fuel and special equipment.
2.1.1 Resource Classes
There are two resource classes in the military airlift problem: aircraft and requirements. Define,
C^R = The set of resource classes
    = {Aircraft (A), Requirement (R)}
We use the superscript A to denote quantities associated with aircraft and the superscript R to
denote those associated with requirements. In a more realistic model, we might include additional
resource classes such as pilots, fuel, maintenance resources and equipment for loading and unloading aircraft.
The resource spaces and attribute vectors can be represented as follows:
A^A = The attribute space of aircraft.
a^A = The attribute vector of an aircraft, a^A ∈ A^A.
A^R = The attribute space of requirements.
a^R = The attribute vector of a requirement, a^R ∈ A^R.
a^φ = The null attribute vector. It has the dimensionality of the attribute vector for requirements but its elements are empty, a^φ ∈ A^R.
The attribute vector of an aircraft is shown in Example 1. Certain attributes are inherent to the
particular resource and are therefore static, such as the type of aircraft. However, other attributes
are dynamic and constantly change, such as the location of the aircraft.
Example 1 (Attribute vector of an aircraft):
a^A_1 = The type of aircraft (e.g., C-17).
a^A_2 = The weight capacity of the plane (e.g., 106,540 lbs).
a^A_3 = The current airfield where the plane is located (e.g., EDAR).
a^A_4 = The weight loaded in this aircraft (e.g., 0).
The attribute vector of a requirement is shown in Example 2.
Example 2 (Attribute vector of a requirement):
a^R_1 = The requirement ID (e.g., 2).
a^R_2 = The type of the cargo size (e.g., Bulk).
a^R_3 = The origin of the requirement (e.g., KBOS).
a^R_4 = The current airfield where the requirement is located (e.g., KBOS).
a^R_5 = The final destination of the requirement (e.g., OEDF).
The elements of the attribute vector can be divided into two broad groups of elements: static
attributes (e.g. the type of an aircraft) and dynamic attributes (e.g. the location of an aircraft). The
concept of a commodity that is common in modeling is equivalent to the vector of static attributes
of a resource, while the state of that commodity is equivalent to the vector of dynamic attributes
of that resource. Thus, one can view the military airlift problem as a multi-commodity problem
with a large number of states, but we have found that a single attribute vector is more flexible
and compact. In particular, certain algorithmic strategies depend on the ability to aggregate the
attribute space A in a general way (see section 3.5).
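The attribute vectors of Examples 1 and 2, and the kind of aggregation of the attribute space A mentioned above, can be sketched directly in code. The class layout and aggregation levels below are illustrative assumptions, not the paper's exact scheme.

```python
# Attribute vectors and a simple aggregation of the attribute space A.
from dataclasses import dataclass

@dataclass(frozen=True)
class AircraftAttr:
    ac_type: str        # static attribute  (a^A_1)
    capacity_lbs: int   # static attribute  (a^A_2)
    location: str       # dynamic attribute (a^A_3)
    loaded_lbs: int     # dynamic attribute (a^A_4)

def aggregate(a: AircraftAttr, level: int) -> tuple:
    """Coarser levels drop the most dynamic attributes first,
    shrinking the attribute space for approximation purposes."""
    full = (a.ac_type, a.capacity_lbs, a.location, a.loaded_lbs)
    return full[: 4 - level]

a = AircraftAttr("C-17", 106_540, "EDAR", 0)
print(aggregate(a, 0))  # ('C-17', 106540, 'EDAR', 0)
print(aggregate(a, 2))  # ('C-17', 106540)
```

Grouping aircraft by aggregated attributes is what lets algorithmic strategies treat a huge attribute space at several levels of detail at once.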
2.1.2 Resource Layering
In our presentation, aircraft and requirements are primitive resources. We can also create a com-
posite resource consisting of an aircraft and a requirement. We distinguish between the primitive
resource class “aircraft” and the aircraft layer which might consist of an aircraft and a requirement.
We would need to model the composite layer if we need to model information and decisions that
occur after a requirement is loaded onto an aircraft. For example, if a loaded aircraft breaks down,
we face the decision of whether to offload the requirement, or to delay the requirement while the
aircraft is delayed. Such a decision requires modeling a loaded aircraft which has just broken down.
To clearly distinguish between the resource and the resource layer, we use the same letter as a
resource surrounded by a pair of parentheses to denote that resource layer. We let:
(A) = A|R denotes the aircraft layer.
(R) = R denotes the requirement layer (the requirement layer is primitive).
Aircraft and requirements can be coupled to form an aircraft layer (composite resources) that
can be moved toward intermediate stops or the destination and uncoupled back to aircraft and
requirement after the trip. We may also have to uncouple the aircraft from the requirement enroute
if there is a mechanical failure. An aircraft can also be a primitive resource, which we represent
with a^(A) = a^A|a^φ, an aircraft layer that couples an aircraft with a null requirement. Thus, the
attribute vector of the aircraft layer consists of the attribute vectors of both aircraft and requirement,
or of aircraft and the null attribute vector. We define:
A^(A) = A^{A|R} = A^A × A^R, the attribute space of an aircraft layer.
a^(A) = a^{A|R} = a^A|a^R ∈ A^(A), the attribute vector of an aircraft layer.
The requirement layer is a primitive layer with attribute vector of only requirement attributes.
A^(R) = A^R, the attribute space of a requirement layer.
a^(R) = a^R, the attribute vector of a requirement layer.
The attribute space is determined largely by the design of the decision set D, which in turn must
reflect the evolution of information. In Figure 2, an aircraft has to pick up a requirement at i and
move it to j, in the process moving through intermediate airbases i′ and i′′. We can represent this as
a single composite decision where at the completion of the decision, the plane is sitting at j and the
requirement has been delivered. Alternatively, we could represent the primitive decision of simply
moving the aircraft one leg at a time. This requires modeling the aircraft with the requirement at
each intermediate stop. This produces a much larger attribute space, but allows the modeling of
new information as the aircraft arrives at each stop. At the same time, the number of primitive
decisions is much less than that of composite decisions because the number of reachable airbases
is much less than the number of routes between origin and destination. As a result, we typically
Figure 2: Primitive and Composite Actions (an aircraft at origin i moves through intermediate airbases i′ and i′′ to destination j, either as one composite decision or as a sequence of primitive one-leg decisions)
find that there is a tradeoff between the complexity of the attribute space and the complexity of
the decision space.
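The tradeoff above can be illustrated by counting decisions on a toy network: primitive decisions grow with the number of edges (reachable neighbors), while composite decisions must enumerate whole origin-destination routes. The network below is invented for illustration.

```python
# Primitive vs. composite decision counts on a toy airbase network.
graph = {  # i -> i1 -> i2 -> j, plus shortcuts i -> i2 and i1 -> j
    "i": ["i1", "i2"], "i1": ["i2", "j"], "i2": ["j"], "j": [],
}

def primitive_decisions(g):
    """One 'move one leg' decision per edge."""
    return sum(len(nbrs) for nbrs in g.values())

def composite_decisions(g, src, dst):
    """One decision per simple route from src to dst (DFS enumeration)."""
    def paths(node, seen):
        if node == dst:
            return 1
        return sum(paths(n, seen | {n}) for n in g[node] if n not in seen)
    return paths(src, {src})

print(primitive_decisions(graph))            # 5
print(composite_decisions(graph, "i", "j"))  # 3
```

On realistic networks the gap is far larger: routes grow combinatorially with network size, whereas legs grow only linearly, at the price of a larger attribute space to track intermediate stops.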
2.1.3 Time Staging of Information
It is very common that information about activities is available before the activity actually takes
place. This arises when we know that an aircraft is inbound (but has not yet arrived) or because
a requirement has been booked now to be moved in the future. We model the difference between
when we know about something and when it happens using:
R_{t,t′a′} = The number of resources that are knowable at time t, and will be actionable with attribute vector a′ ∈ A at time t′. For the aircraft layer, the unit is the number of aircraft. For requirements, the unit is the number of pounds with attribute vector a′ ∈ A^(R).
R_{tt′} = The resource layer vector that is knowable at time t and actionable at time t′.
       = (R_{t,t′a′})_{a′∈A}
       = (R^(A)_{tt′}, R^(R)_{tt′})
R_t = The resource layer vector that is knowable at time t.
    = (R_{tt′})_{t′≥t}
    = (R^(A)_t, R^(R)_t)
When we refer to the vector R_{tt′}, t is referred to as the knowable time while t′ is referred to as
the actionable time.
Deterministic optimization models such as NRMO assume that all information is known at
time t = 0. Simulation models such as MASS (or AMOS), on the other hand, usually assume that
resources become knowable and actionable at the same time. Although this is not represented ex-
plicitly in a TPFDD, we explicitly model the difference between when a resource becomes knowable
and actionable.
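The two-index resource vector is easy to sketch as a mapping keyed by (knowable time, actionable time). The counts below are invented for illustration.

```python
# A sketch of the two-index resource vector R_{t,t'} for one attribute.
from collections import defaultdict

R = defaultdict(int)
R[(0, 0)] += 3   # 3 aircraft known and actionable now
R[(0, 2)] += 1   # 1 inbound aircraft known now, actionable at t' = 2
R[(1, 1)] += 2   # 2 aircraft that become known and actionable at t = 1

def knowable(R, t):
    """R_t: everything known by time t, regardless of actionable time."""
    return sum(v for (k, _), v in R.items() if k <= t)

def actionable(R, t):
    """Resources that are both known and actionable at time t."""
    return sum(v for (k, a), v in R.items() if k <= t and a <= t)

print(knowable(R, 0), actionable(R, 0))  # 4 3
print(knowable(R, 2), actionable(R, 2))  # 6 6
```

In this encoding, a deterministic model such as NRMO corresponds to making every entry knowable at t = 0, while a standard simulator corresponds to keys with t = t′.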
2.2 Processes
In this section, we model exogeneous information processes, decisions, and system dynamics.
2.2.1 Exogenous Information Processes
We can have information arriving about resources (aircraft entering the system for the first time,
new requirements becoming known for the first time). As before, we capture the property that a
resource that becomes known at time t may not be actionable until time t′ ≥ t.
R_{t,t′a′} = The number of new resources that first become known at time t, and will first become actionable with attribute vector a′ ∈ A at time t′.
R_{tt′} = The vector of new resources that first become known at time t and actionable at time t′.
       = (R_{t,t′a′})_{a′∈A}
       = (R^(A)_{tt′}, R^(R)_{tt′})
R_t = Vector of new resources that first become known to the system at time t.
    = (R_{tt′})_{t′≥t}
    = (R^(A)_t, R^(R)_t)
In addition to R̂_t, we may also wish to model other information processes such as weather delays
or equipment breakdowns. More general notation is to introduce an information variable W_t that
would include R̂_t as well as other exogenous information such as wind speeds, delays at airbases
and the probability of equipment failures. We then let:
Ω = The set of possible outcomes of exogenous information Wt, t ≥ 0.
ω = One sample of the exogenous information (ω ∈ Ω). For simplicity, assume that
outcomes are discrete.
p(ω) = The probability of outcome ω.
2.2.2 Decisions
In addition to exogenous information, the AMC system also evolves as a result of endogenous
information, in the form of a sequence of decisions. Let
C^D = Set of decision classes.
D_c = Set of specific decisions in class c ∈ C^D.
D   = The set of decisions that can be applied to the resource layer.
    = ∪_{c∈C^D} D_c
For example, the airlift problem in this paper has the decision classes:
C^D = {coupling aircraft and requirement, uncoupling aircraft layer, moving aircraft layer,
       do nothing}
The class "moving aircraft layer" might include the decisions:
D_{moving aircraft layer} = {Cannon AFB, Ramstein, Kuwait City, . . .}
It is convenient to define:
D_a = The subset of decisions that can be applied to a resource with attribute vector
      a ∈ A, D_a ⊆ D.
d^φ = The decision to do nothing (hold).
x_tad = The number of times that we apply decision d ∈ D_a to a resource with attribute
        vector a ∈ A during time t. For the aircraft layer, the unit of flow is the number of
        aircraft. For requirements, the unit of flow is the number of pounds with attribute
        vector a ∈ A^{(R)}.
x_t   = (x_tad)_{a∈A, d∈D_a}
Decisions have to satisfy the constraints:

∑_{d∈D_a} x_tad = R_{t,ta}     ∀a ∈ A                      (8)
x_tad ≥ 0 and integer          ∀a ∈ A^{(A)}, d ∈ D_a       (9)
x_tad ≥ 0                      ∀a ∈ A^{(R)}, d ∈ D_a       (10)
Our notation contrasts with the more classical x^k_{ij} notation employed by the multicommodity
flow community. One limitation of the classical notation is that the outcome of the decision (the
destination node j) is deterministic. We may wish to simulate weather delays or en-route equipment
failures, which are most easily modeled using our notation. The x_tad notation is indexed purely by
the information available when the decision is made. The notation also allows us to use aggregation
in a flexible and powerful way, which we illustrate in section 3.5.
We defer until section 2.3 the discussion of how decisions are made. However, the next section
models the impact of decisions on our system.
2.2.3 System dynamics
To capture the effect of a decision, we define the modify function as follows:
M(t, a, d) → (a', c, τ)
In this equation, a ∈ A is the attribute vector of the resource that the decision is being applied
to, d ∈ Da is the decision that acts on the resource, and t is the time at which we make the decision.
After the decision is made, the outcome of the transformation is captured in the triplet (a′, c, τ),
where a′ is the new attribute vector of the modified layered resource, c is the cost of the decision,
and τ is the time it takes to carry out the decision and transform the resource. Having this function,
we can define the following:
c_tad = The cost of applying decision d to a resource with attribute a ∈ A at time t.
c_t   = (c_tad)_{a∈A, d∈D_a}
It is useful to define:

δ_{t,t'a'}(t, a, d) = 1 if M(t, a, d) produces an aircraft layer with attribute vector a' and t + τ = t';
                      0 otherwise.
The dynamics of the layered resource can be modeled as follows:

R_{t+1,t'a'} = R_{t,t'a'} + ∑_{a∈A} ∑_{d∈D_a} δ_{t,t'a'}(t, a, d) x_tad + R̂_{t+1,t'a'}     ∀a' ∈ A, t' > t     (11)

Equation (11) contains three elements on the right-hand side: 1) what we know at time t about the
status of resources for time t'; 2) the endogenous changes to resources for time t' that are made
during period t; and 3) the exogenous changes to resources for time t' that arrive during time t.
We may also express equation (11) in matrix notation as:

R_{t+1} = R_t + A_t x_t + R̂_{t+1}     (12)
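A minimal sketch of the update in equation (11), with a toy modify function of our own devising: each decision is pushed through M, the transformed resource is credited to its new (actionable time, attribute) slot, and the exogenous arrivals are added last.

```python
def modify(t, attr, decision):
    """Toy modify function M(t, a, d) -> (a', cost, tau).
    A 'move' decision relocates the resource in 2 periods at unit cost;
    anything else is treated as 'hold': same place, 1 period, no cost."""
    if decision.startswith("move:"):
        return decision.split(":", 1)[1], 1.0, 2
    return attr, 0.0, 1

def step(R, t, x, R_hat):
    """One period of equation (11).
    R, R_hat: dict (t_act, attr) -> count; x: dict (attr, decision) -> flow."""
    R_next = {k: v for k, v in R.items() if k[0] > t}  # carry resources not yet actionable
    for (attr, d), flow in x.items():                  # endogenous changes
        a2, _cost, tau = modify(t, attr, d)
        key = (t + tau, a2)
        R_next[key] = R_next.get(key, 0) + flow
    for key, n in R_hat.items():                       # exogenous arrivals
        R_next[key] = R_next.get(key, 0) + n
    return R_next

R = {(0, "Dover"): 2, (3, "Ramstein"): 1}
x = {("Dover", "move:Ramstein"): 2}                    # act on the actionable aircraft
R_next = step(R, 0, x, R_hat={(1, "Scott"): 1})
print(R_next)   # {(3, 'Ramstein'): 1, (2, 'Ramstein'): 2, (1, 'Scott'): 1}
```

The three additions inside `step` correspond one-for-one to the three terms on the right-hand side of equation (11).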
2.3 Controls
For the purposes of this paper, we restrict our discussion of controls to the types of decisions that
are being made for our model of the airlift problem, the decision function and the optimization
formulation that forms the basis of our optimizing simulator. For a more complete discussion of
controls, see Powell et al. (2001).
2.3.1 Types of decisions
Decisions that act on resources (these are the only decisions we consider in our model) can be
organized into three fundamental groups: couple (an aircraft and a requirement), modify (move an
aircraft with or without a requirement), and uncouple (remove the requirement from the aircraft).
In our model, we assume a requirement is decoupled when the aircraft arrives at the destination. We
have not explicitly modeled uncoupling at intermediate points (for example, to model the decision
to offload a requirement if an aircraft breaks down), but this would be straightforward. One goal
of our notation is to represent every action on a resource as an attribute-decision (a, d) pair. Not
only is the resulting optimization model quite compact, it also facilitates one of our algorithmic
strategies (see section 3.5).
Our decisions are given by:
D^{A,c}_a   = The set of decisions to couple an aircraft with attribute a ∈ A^A to requirements.
              Each element of D^{A,c}_a is a potential requirement.
            = {d^c_{aa'} : a' ∈ A^R}
D^{A,c}     = The set of decisions to couple an aircraft with a requirement.
            = (D^{A,c}_a)_{a∈A^A}
            = {d^c_{aa'} : a ∈ A^A, a' ∈ A^R}
D^{(A),m}_a = The set of decisions to move an aircraft (layer) with attribute a ∈ A^{(A)}.
            = {d^m_{aj} : j ∈ J}
D^{(A),m}   = (D^{(A),m}_a)_{a∈A^{(A)}}
D^{R,c}_{a'} = The set of decisions to couple a requirement with attribute a' ∈ A^R to an aircraft.
              Each element of D^{R,c}_{a'} is a potential aircraft.
            = {d^c_{a'a} : a ∈ A^A}
D^{R,c}     = (D^{R,c}_{a'})_{a'∈A^R}
            = {d^c_{a'a} : a ∈ A^A, a' ∈ A^R}
Thus,
D^{(A)} = The set of decisions that can be applied to an aircraft layer.
        = D^{A,c} ∪ D^{(A),m} ∪ {d^φ}
D^R     = The set of decisions that can be applied to a requirement.
        = D^{R,c} ∪ {d^φ}
D       = The set of decisions that can be applied to a layered resource (d ∈ D).
        = D^{(A)} ∪ D^R
A coupled aircraft (an aircraft with a requirement) may only be moved or held. A coupled
requirement is modified according to the aircraft it is coupled with, and no other decision can
be made on it.
Let:
a_w = Capacity in weight (pounds) of an aircraft with attribute vector a ∈ A^{(A)}.
The capacity of an aircraft a ∈ A^A coupled with a requirement a' ∈ A^R must be at least the
weight of the requirement a' ∈ A^R coupled with the aircraft a ∈ A^A:

x_{ta'd'} ≤ a_w · x_tad     ∀a ∈ A^A, a' ∈ A^R : d = d^c_{aa'}, d' = d^c_{a'a}     (13)
Coupling takes some time, transforming an empty aircraft into a coupled aircraft layer. Even when
already coupled, the aircraft can still be held (do nothing, d^φ) to wait for a suitable time to take off.
At the destination, we uncouple the coupled aircraft layer. In the middle of a trip, however, the
aircraft layer may also be uncoupled to handle the situation where the aircraft fails during the trip.
In this case, the uncoupled requirement becomes a totally new requirement that needs to be reassigned.
One benefit of adopting the DRTP framework is that it is very easy to extend the model to
handle new types of decisions. For example, in the event that AMC would charter outside aircraft
to move requirements (equivalent to moving a requirement without an (AMC) aircraft), the
only modification is the extension of the decision set D^R:

D^R = D^{R,c} ∪ {d^φ} ∪ D^{R,m},

where,

D^{R,m} = The set of decisions to move a requirement by outside aircraft (not AMC aircraft).
          This set consists of a single element (effectively stating "move the requirement to its
          destination") if we model outside aircraft as an unlimited resource that we are not
          managing.
2.3.2 The decision function
The decision function depends on the available information. Define:
I_t = The information available at time t.
x_t = Vector of decisions for time t.
Π   = The set of policies, π ∈ Π.
X^π_t(I_t) = The function that, under policy π ∈ Π, determines the set of decisions x_t based on
             the information I_t.
When optimizing the AMC system, our challenge is to find the best function X^π_t. Note that
the structure of X^π depends on the information that is available. The most important class of
information is the resource vector R_t, but we might also include current estimates of travel times or
forecasts of future activities.
2.3.3 The optimization formulation
We are now ready to state the overall optimization problem. Let:
C_t(x_t) = The total contribution due to activities initiated in time period t
         = ∑_{a∈A} ∑_{d∈D} c_tad x_tad
Our objective function is linear, although it is fairly straightforward to use nonlinear functions.
The decisions x_t are returned by our decision function X^π_t(I_t), so our challenge is not so much to
choose the best decisions but rather to choose the best policy. Thus, our optimization problem is
given by:

max_{π∈Π} E[ ∑_{t∈T} C_t(X^π_t(I_t)) ]     (14)
This optimization is subject to the following constraints:
• Resource dynamics: Equation (11)
• Resource flow conservation: Equation (8)
• Non-negativity: Equation (9) and (10)
• Coupling (layering): Equation (13)
This formulation forms the basis of our optimizing simulator. In the remainder of the paper, we
describe different policies π, where each corresponds to the availability of a particular information
set (which determines It).
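In practice, the expectation in (14) is estimated by simulation: fix a policy π, simulate it against sampled information, and average the total contribution. The sketch below is illustrative only; the information process and the myopic policy are stand-ins, not the airlift model.

```python
import random

def evaluate_policy(policy, sample_info, T=5, n_samples=200, seed=0):
    """Monte Carlo estimate of E[ sum_t C_t(X^pi_t(I_t)) ] for a fixed policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        contribution = 0.0
        for t in range(T):
            I_t = sample_info(t, rng)        # information available at time t
            x_t, c_t = policy(t, I_t)        # decisions and their contribution
            contribution += c_t
        total += contribution
    return total / n_samples

# Stand-in information process and myopic policy for illustration.
def sample_info(t, rng):
    return {"demand": rng.randint(0, 3)}     # e.g. new requirements this period

def myopic_policy(t, I_t):
    x_t = I_t["demand"]                      # serve everything we can see
    return x_t, 10.0 * x_t                   # contribution of 10 per unit served

print(round(evaluate_policy(myopic_policy, sample_info), 2))
```

Comparing two policies then amounts to running `evaluate_policy` for each against the same information process, which is exactly how the policies of section 3 are compared in the experiments.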
2.4 Comparison of NRMO, MASS, SSDM and DRTP models
The characteristics of the NRMO, MASS, SSDM and DRTP models are listed in Table 1. In this
table, we are focusing on the model (that is, the representation of the physical problem) rather than
the algorithm used to solve the problem. The primary distinguishing feature of these models is how
they have captured the flow of information, but they also differ in areas such as model flexibility,
and the responsiveness to changes in input data. The DRTP representation can produce a linear
programming model such as NRMO if we ignore evolving information processes, or a simulation
model such as MASS or AMOS if we explicitly model the TPFDD as an evolving information
process and code the appropriate rules for making decisions. As such, the DRTP representation
provides a mathematical representation that spans optimization and simulation.
3 A spectrum of decision functions
To build a decision function, we have to specify the information that is available to us when we
make a decision. There are four fundamental classes of information that we may use in making a
decision:
• Knowledge (K_t) - Exogenously supplied information on the state of the system (resources and
  parameters).
• Forecasts of the exogenous information process (Ω) - Forecasts of future exogenous information.
• Forecasts of the impact of decisions on other subproblems (V̄) - These are functions which
  capture the impact of decisions made at one point in time on another.
• Forecasts of decisions (ρ) - Patterns of behavior derived from a source of "expert knowledge."
  These represent a form of forecast of decisions at some level of aggregation.
Our view of information, then, can be viewed as one class of exogenously supplied information (data
knowledge), and three classes of forecasts. In the remainder of this section, we provide examples
and explain the general notation given above.
Category:
• NRMO: Large-scale linear programming
• MASS: Simulation
• SSDM: Multi-stage stochastic programming
• DRTP: Optimizing simulator

Information processes:
• NRMO: Requires knowing all information within the horizon T at time 0. Cannot distinguish
  between the knowable time t and the actionable time t' ≥ t.
• MASS: Assumes the actionable time equals the knowable time. Can, but does not, distinguish
  between the knowable time t and the actionable time t' ≥ t.
• SSDM: Actionable time equals knowable time.
• DRTP: General modeling of knowable and actionable time. At time t, knows the information
  that is actionable at time t' ≥ t.

Attribute space:
• NRMO: Multicommodity flow (attribute includes aircraft type and location)
• MASS: Multi-attribute (location, fuel level, maintenance)
• SSDM: Homogeneous ships and cargo, extendable to multiple ship types
• DRTP: General resource layering

Complexity of system dynamics:
• NRMO: Linear systems of equations
• MASS: Complex system dynamics
• SSDM: Simple linear systems of equations
• DRTP: Complex system dynamics

Number of scenarios:
• NRMO: Single
• MASS: May be multiple
• SSDM: Multiple
• DRTP: May be multiple

Decision selection:
• NRMO: Cost-based
• MASS: Rule-based
• SSDM: Cost-based
• DRTP: Spans from rule-based to cost-based

Information modeling:
• NRMO: Assumes that everything is known
• MASS: Myopic, local information
• SSDM: Assumes that the probability distribution of scenarios is known
• DRTP: General modeling of information

Model behavior:
• NRMO: Reacts to data changes intelligently
• MASS: Noisy response to changes in input data
• SSDM: Similar to LP, but produces robust allocations
• DRTP: Can react with intelligence; will display some noise characteristic of simulation

Table 1: Characteristics of the NRMO, MASS, SSDM and DRTP models

For each information set, there are different classes of decision functions (policies). Since Π is
the set of all possible decision functions that may be specified, in this paper we investigate five
subclasses of Π: rule-based policies, myopic cost-based policies, rolling horizon policies, approximate
dynamic programming policies and expert knowledge policies. These can be classified into three
broad groups: rule-based, cost-based and hybrid. The policies we consider are as follows:
Rule-based policies:
Π^{RB} = The class of rule-based policies (missing information on costs).
Cost-based policies:
Π^{MP} = The class of myopic cost-based policies (includes cost information, but limited to information
         that is knowable now).
Π^{RH} = The class of rolling horizon policies (includes forecasted information). Note that if |Ω| = 1, we
         get a classical deterministic rolling horizon policy, which is widely used in dynamic settings.
Π^{ADP} = The class of approximate dynamic programming policies. We use a functional approximation
          to capture the impact of decisions at one point in time on the rest of the system.
Hybrid policies:
Π^{EK} = The class of policies that use expert knowledge. This is represented using low-dimensional
         patterns expressed in the vector ρ. Policies in Π^{EK} combine rules (in the form of
         low-dimensional patterns) and costs, and therefore represent a hybrid policy.
In the sections below, we describe the information content of different policies. The following
short-hand notation is used:
RB - Rule-based (the policy uses a rule rather than a cost function). All policies that
are not rule-based use an objective function.
MP - Myopic policy which uses only information that is known and actionable now.
RH - Rolling horizon policy, which uses forecasts of activities that might happen in the
future.
R - A single requirement.
RL - A list of requirements.
A - A single aircraft.
AL - A list of aircraft.
ADP - Approximate dynamic programming - this refers to policies that use an approxi-
mation of the value of resources (in particular, aircraft) in the future.
EK - Expert knowledge - these are policies that use patterns to guide behavior.
KNAN - Known now, actionable now - These are policies that only use resources (aircraft
and requirements) that are actionable now.
KNAF - Known now, actionable future - These policies use information about resources
that are actionable in the future.
These abbreviations are used to specify the information content of a policy. The remainder of the
section describes this spectrum of policies.
3.1 Rule-based policies
Our rule-based policy is denoted (RB:R-A) (rule-based, one requirement and one aircraft). In the
Time-Phased Force Deployment Data (TPFDD), we pick the first available requirement and check
the first available aircraft to see whether it can deliver the requirement. In this feasibility
check, we examine whether the aircraft and the requirement are at the same airbase, whether the
aircraft can handle the size of the cargo in the requirement, and whether the en-route and
destination airbases can accommodate the aircraft. If the aircraft cannot deliver the requirement,
we check the second available aircraft, and so on, until we find an aircraft that can. We then
load the requirement onto that aircraft and update the remaining weight of the requirement.
After the first requirement is finished, we move to the second requirement. We continue this
procedure for each requirement in the TPFDD until we finish all the requirements.
See Figure 3 for an outline of this policy.
Step 1: Pick the first available remaining requirement in the TPFDD.
Step 2: Pick the first available remaining aircraft.
Step 3: Do the following feasibility check: Are the aircraft and the requirement at the same
        airbase? Can the aircraft handle the requirement? Can the en-route and destination
        airbases accommodate the aircraft?
Step 4: If the answers in Step 3 are all yes, deliver the requirement with that aircraft and
        update the remaining weight of that requirement.
Step 5: If that requirement is not finished, go to Step 2.
Step 6: If there are remaining requirements, go to Step 1.

Figure 3: Policy (RB:R-A)
This policy is rule-based; it does not use a cost function to make the decision. Policy (RB:R-A)
uses the information set I_t = R_tt, that is, the actionable time is the same as the knowable time.
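The loop of Figure 3 can be sketched in a few lines of code. The feasibility test here is a placeholder for the airbase, cargo-size, and en-route checks described above; the data layout is our own.

```python
def feasible(aircraft, req):
    """Placeholder for Step 3: same airbase and some lift capability."""
    return aircraft["base"] == req["base"] and aircraft["capacity"] > 0

def rule_based_policy(requirements, aircraft):
    """Policy (RB:R-A): first requirement, first feasible aircraft, repeat."""
    assignments = []
    for req in requirements:                   # Step 1 / Step 6
        while req["weight"] > 0:
            for ac in aircraft:                # Step 2
                if feasible(ac, req):          # Step 3
                    lift = min(req["weight"], ac["capacity"])
                    req["weight"] -= lift      # Step 4; Step 5 via the while loop
                    assignments.append((req["id"], ac["id"], lift))
                    break
            else:
                break                          # no feasible aircraft: give up on this requirement
    return assignments

reqs = [{"id": "R1", "base": "Dover", "weight": 150.0}]
fleet = [{"id": "C-17a", "base": "Ramstein", "capacity": 90.0},
         {"id": "C-5a", "base": "Dover", "capacity": 120.0}]
print(rule_based_policy(reqs, fleet))  # [('R1', 'C-5a', 120.0), ('R1', 'C-5a', 30.0)]
```

Note that no cost appears anywhere: the first feasible aircraft wins, which is exactly what distinguishes this policy from the cost-based policies that follow.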
3.2 Myopic cost-based policies
In this section, we describe a series of myopic, cost-based policies which differ in terms of how many
requirements and aircraft are considered at the same time.
Policy (MP:R-AL)
In policy (MP:R-AL) (myopic policy, one requirement and a list of aircraft), we choose the first
requirement that needs to be moved, and then create a list of potential aircraft that might be used
to move the requirement. Now we have a decision vector instead of a scalar (as occurred in our
rule-based system). As a result, we now need a cost function to identify the best out of a set of
decisions. Given a cost function, finding the best aircraft out of a set is a trivial sort. However,
developing a cost function that captures the often unstated behaviors of a rule-based policy can be
a surprisingly difficult task.
There are two flavors of policy (MP:R-AL): known now and actionable now (MP:R-AL/KNAN),
and known now, actionable in the future (MP:R-AL/KNAF). Policy (MP:R-AL/KNAN) uses the
information set I_t = (R_tt, c_t), while policy (MP:R-AL/KNAF) uses the information set
I_t = ((R_tt')_{t'≥t}, c_t). We explicitly include the costs as part of the information set. Solving the
problem requires that we sort the aircraft by contribution and choose the one with the highest
contribution.
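Once a cost function is available, policy (MP:R-AL) reduces to a sort over the aircraft list. A sketch with a made-up contribution function (reward for capacity fit, penalty for repositioning):

```python
def best_aircraft(req, aircraft, contribution):
    """Policy (MP:R-AL): score every aircraft in the list and take the best."""
    scored = [(contribution(ac, req), ac["id"]) for ac in aircraft]
    return max(scored)  # (highest contribution, aircraft id)

# Illustrative contribution: reward usable capacity, penalize repositioning.
def contribution(ac, req):
    reposition_cost = 0.0 if ac["base"] == req["base"] else 25.0
    return min(ac["capacity"], req["weight"]) - reposition_cost

req = {"base": "Dover", "weight": 100.0}
fleet = [{"id": "C-17a", "base": "Dover", "capacity": 90.0},
         {"id": "C-5a", "base": "Ramstein", "capacity": 120.0}]
print(best_aircraft(req, fleet, contribution))  # (90.0, 'C-17a')
```

The sort itself is trivial; as the text notes, the hard part is designing a `contribution` function that reproduces the unstated preferences of a rule-based planner.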
Policy (MP:RL-AL)
This policy is a direct extension of the policy (MP:R-AL). However, now we are matching a
list of requirements to a list of aircraft, which requires that we solve a linear program instead of
a simple sort. The linear program (actually, a small integer program) has to balance the needs of
multiple requirements and aircraft.
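The sketch below illustrates the flavor of this matching step. It enumerates one-to-one matchings by brute force, which is practical only for tiny instances; the actual model solves an integer program, and the contribution function here is invented for illustration.

```python
from itertools import permutations

def best_matching(requirements, aircraft, contribution):
    """Tiny stand-in for the (MP:RL-AL) integer program: enumerate one-to-one
    matchings of requirements to aircraft and keep the best total contribution."""
    best_value, best_plan = float("-inf"), None
    for perm in permutations(range(len(aircraft)), len(requirements)):
        value = sum(contribution(requirements[i], aircraft[j])
                    for i, j in enumerate(perm))
        if value > best_value:
            best_value, best_plan = value, perm
    return best_value, best_plan

contribution = lambda r, a: -abs(r["weight"] - a["capacity"])  # prefer close fits
reqs = [{"weight": 90.0}, {"weight": 120.0}]
fleet = [{"capacity": 120.0}, {"capacity": 90.0}, {"capacity": 60.0}]
value, plan = best_matching(reqs, fleet, contribution)
print(value, plan)  # 0.0 (1, 0): req 0 -> aircraft 1, req 1 -> aircraft 0
```

The point of the example is the balancing act: aircraft 0 is the greedy choice for requirement 0 in isolation, but the joint optimum reserves it for requirement 1.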
There are again two flavors of policy (MP:RL-AL), namely known now and actionable now
(MP:RL-AL/KNAN) as well as known now, actionable in the future (MP:RL-AL/KNAF). In
the policy (MP:RL-AL/KNAN), the information set is I_t = (R_tt, c_t), and in the policy
(MP:RL-AL/KNAF), the information set is I_t = ((R_tt')_{t'≥t}, c_t). If we include aircraft and
requirements that are known now but actionable in the future, then decisions that involve these
resources represent plans that may be changed in the future.

Figure 4: Illustration of the information content of myopic policies. (a) A rule-based policy considers
only one aircraft and one requirement at a time. (b) The policy (R-AL) considers one requirement
but sorts over a list of aircraft. (c) The policy (RL-AL) works with a requirement list and an
aircraft list all at the same time.

Figure 4 illustrates the rule-based and myopic cost-based policies.
3.3 Rolling horizon policy
Our next step is to bring into play forecasted activities, which refer to information that is not
known now.
A rolling horizon policy considers not only all aircraft and requirements that are known now
(and possibly actionable in the future), but also forecasts of what might become known in the
future. In the rolling horizon policy, we use the information that might become available within
the planning horizon: I_t = {(R_{t't''})_{t''≥t'}, c_{t'} : t', t'' ∈ T^{ph}_t}. The structure of the decision
function is typically the same as with a myopic policy (but with a larger set of resources). However,
we generally require the information set I_t to be treated deterministically for practical reasons.
As a result, all forecasted information has to be modeled using point estimates. If we used more
than one outcome, we would be solving sequences of stochastic programs.
3.4 Approximate dynamic programming policy
An approximate dynamic programming policy employs a value function approximation to evaluate
the impact of decisions made in one time period on another time period. The value function
evaluates the reward of the resources in certain state. We define:
Vt(Rt) = The value of resource vector Rt at time t.
We are not able to compute the value function exactly, so we replace it with an approximation
Vt(Rt) which is a linear approximation of the resource vector:
Vt(Rt) =∑t′>t
vt,t′aRt,t′a
To illustrate the use of the value function, consider the example of a requirement that has to move
from England to Australia using either a C-5A or a C-5B. A problem is that the route from England
to Australia involves passing through airbases that are not prepared to repair a C-5B if it breaks
down, which might happen with a 20 percent probability. When a breakdown occurs, additional
costs are incurred to complete the repair, and the aircraft is delayed, possibly producing a late
delivery (and penalties for a late delivery). Furthermore, the amount of delay, and the cost of the
breakdown, can depend on whether there are other comparable aircraft present at those airbases
at the time.
A cost-based policy might assign an aircraft to move this requirement based on the cost of
getting the aircraft to the requirement. It is also possible to include the cost of moving the load,
but this gets more complicated when the additional costs of using the C-5B are purely related to
random elements. A rule-based policy would tend either to ignore the issue, or to include a rule
that completely prevents the use of C-5B's on certain routes. We could resort to Monte Carlo sampling
to represent the random events, but the decision of which aircraft to use needs to be made before
we know whether the aircraft will break down (or the status at the intermediate airbases). Monte
Carlo methods are useful for simulating the dynamics of the system, but are more limited when
simulating the information available to make a decision.
Approximate dynamic programming computes an approximation of the expected value of, say,
a C-5B in England, which would reflect both the costs and potential delays that might be incurred
if the C-5B is used on requirements going to Australia. The values v̄_{t,t'a} would be computed using
observations of the value of a C-5B over multiple iterations. These values can use Monte Carlo
samples of breakdowns, including both repair costs and penalties for late deliveries. The statistics
v̄_{t,t'a} represent an average over different samples, and as such approximate an expectation. A brief
description of how the value function is estimated is given in section 4.
In policy (ADP), we represent the information set as I_t = (R_t, c_t, V̄_t), where V̄_t = (V̄_{tt'})_{t'>t}
represents the set of values v̄_{t,t'a} described earlier.
3.5 Expert knowledge
All mathematical models require some level of simplification, often because of a simple lack of
information. As a result, a solution may be optimal but incorrect in the eyes of a knowledgeable
expert. In effect, the expert knows how the model should behave, reflecting information that is not
available in the model. We represent expert knowledge in the form of low-dimensional patterns,
such as “avoid sending C-5B’s on routes through India.” Simulation models capture this sort of
knowledge easily within their rules, but typically require that the rules be stated as hard constraints,
as in “never send C-5B’s on routes through India.”
To capture these patterns, as well as the pattern aggregation on an attribute or a decision, we
define:

P     = A collection of different patterns.
G^p_a = Aggregation function for pattern p ∈ P that is applied to the attribute vector a.
G^p_d = Aggregation function for pattern p ∈ P that is applied to the decision d.
ā     = An aggregated attribute vector. For pattern p, ā ∈ G^p_a(A).
d̄     = An aggregated decision. For pattern p, d̄ ∈ G^p_d(D).
ρ^p_{ād̄} = The percent of the time that we should act on a resource with attribute in the set
           {a : G^p_a(a) = ā} with a decision in the set {d : G^p_d(d) = d̄}.
ρ^p   = (ρ^p_{ād̄})_{ā∈G^p_a(A), d̄∈G^p_d(D)}
ρ     = (ρ^p)_{p∈P}
It is unlikely that we can match a given pattern exactly (in fact, different pattern classes may
conflict with each other), but it does provide a means for guiding the model. To determine how
well our current solution matches a pattern, we have to express our decision vector x_t and resource
vector R_t at an aggregated level. Using the indicator notation I_X = 1 if X is true (and 0
otherwise), define:

x̄^p_{tād̄}  = G^p(x), the aggregated decision
           = ∑_{a∈A} ∑_{d∈D_a} x_tad · I_{G^p_a(a)=ā} · I_{G^p_d(d)=d̄}
R̄^p_{t,tā} = ∑_{a∈A} R_{t,ta} · I_{G^p_a(a)=ā}, the aggregated resources
           = ∑_{a∈A} ( ∑_{d∈D_a} x_tad ) · I_{G^p_a(a)=ā}
ρ̂^p_{tād̄}  = x̄^p_{tād̄} / R̄^p_{t,tā}, the instantaneous pattern flow
x̄^p_{ād̄}   = ∑_t x̄^p_{tād̄}, the aggregated decision over the entire horizon
R̄^p_ā(x)   = ∑_t R̄^p_{t,tā}(x), the total number of resources with attribute ā over the entire horizon
ρ̂^p_{ād̄}   = x̄^p_{ād̄} / R̄^p_ā, the (average) pattern flow from the model

If we wish our flows to follow the exogenous pattern ρ^p_{ād̄}, then we would like the model pattern
flow ρ̂^p_{ād̄} to be as close as possible to ρ^p_{ād̄}, or equivalently, we want x̄^p_{ād̄} to be as close
as possible to R̄^p_ā ρ^p_{ād̄}. Following Marar & Powell (2002) and Powell et al. (to appear), we
incorporate patterns into our cost model using a proximal point term of the form:

x_t(θ) = X^π_t(θ) = arg max_{x_t∈X_t} ∑_{a∈A} ∑_{d∈D} ( c_tad x_tad + θ ∑_{p∈P} R̄^p_ā (ρ^p_{ād̄} − ρ̂^p_{ād̄})² · I^p_{a,d} )     (15)

where I^p_{a,d} = I_{G^p_a(a)=ā, G^p_d(d)=d̄}. As θ increases, the model places greater emphasis on
trying to match its own flows x̄^p_{ād̄} with the flows that would be predicted by the pattern,
R̄^p_ā ρ^p_{ād̄}. The algorithm for incorporating patterns is based on the work of Marar & Powell
(2002), using a modification introduced in Powell et al. (to appear) to improve stability.
We represent the information content of a knowledge-based decision as I_t = (R_t, c_t, V̄_t, ρ).
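The pattern term in equation (15) can be sketched as follows; the aggregation function, flows and expert pattern here are all invented for illustration, and we score a fixed flow vector rather than optimizing over it.

```python
def pattern_penalty(flows, expert_rho, aggregate, theta=1.0):
    """Proximal-point pattern score: theta * sum over patterns of
    R_bar * (rho_expert - rho_model)^2, as in the penalty term of equation (15).
    flows:      dict (attribute, decision) -> flow x_ad
    expert_rho: dict (abar, dbar) -> desired fraction rho
    aggregate:  (a, d) -> (abar, dbar), playing the role of (G^p_a, G^p_d)."""
    x_bar, R_bar = {}, {}
    for (a, d), flow in flows.items():
        abar, dbar = aggregate(a, d)
        x_bar[(abar, dbar)] = x_bar.get((abar, dbar), 0.0) + flow
        R_bar[abar] = R_bar.get(abar, 0.0) + flow
    penalty = 0.0
    for (abar, dbar), rho in expert_rho.items():
        rho_model = x_bar.get((abar, dbar), 0.0) / R_bar.get(abar, 1.0)
        penalty += R_bar.get(abar, 0.0) * (rho - rho_model) ** 2
    return theta * penalty

# Expert pattern: C-5Bs should route via India at most 10% of the time.
aggregate = lambda a, d: (a.split("@")[0], "via India" if "India" in d else "other")
flows = {("C-5B@England", "move via India"): 4.0,
         ("C-5B@England", "move via Pacific"): 6.0}
rho = {("C-5B", "via India"): 0.1}
print(pattern_penalty(flows, rho, aggregate))   # model flow 0.4 vs target 0.1
```

Subtracting θ times this penalty from the contribution of a candidate flow vector is what nudges the model toward the expert pattern without imposing it as a hard constraint.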
3.6 Discussion
In this section we have introduced a series of policies, each characterized by increasing information
sets. The policies are summarized in Table 2, listed in order of the information content of each
policy.

• Rule-based: information I_t = R_tt; decision function (RB:R-A).
• Myopic cost-based, one requirement to a list of aircraft, known now and actionable now:
  information I_t = (R_tt, c_t); decision function (MP:R-AL/KNAN).
• Myopic cost-based, one requirement to a list of aircraft, known now and actionable in the
  future: information I_t = ((R_tt')_{t'≥t}, c_t); decision function (MP:R-AL/KNAF).
• Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable now:
  information I_t = (R_tt, c_t); decision function (MP:RL-AL/KNAN).
• Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable in
  the future: information I_t = ((R_tt')_{t'≥t}, c_t); decision function (MP:RL-AL/KNAF).
• Rolling horizon: information I_t = {(R_tt')_{t'≥t}, c_{t'} : t' ∈ T^{ph}_t}; decision function (RH).
• Adaptive: information I_t = {(R_tt')_{t'≥t}, c_{t'}, V̄_{tt'} : t' ≥ t}; decision function (ADP).
• Expert knowledge: information I_t = {(R_tt')_{t'≥t}, c_{t'}, V̄_{tt'}, ρ : t' ≥ t}; decision function (EK).

Table 2: Information classes and decision functions for different policies

Research has shown that the adaptive policy ADP can compete with linear programming
solvers on deterministic problems. For single and multi-commodity flow problems, the results are
near-optimal, and they significantly outperform deterministic approximations on stochastic datasets
(Godfrey & Powell (2002), Topaloglu & Powell (2002), Spivey & Powell (to appear)). Since the
algorithmic strategy involves repeatedly simulating the system, these results are achieved without
losing the generality of simulation (but it does require that the simulation be run iteratively). Like
any cost model, the policy based on the ADP information set means that analysts can change the
behavior of the model primarily by changing costs. The final information set, EK (which we have
presented as including the value functions), allows us to manipulate the behavior of the model by
changing the exogenous patterns. Needless to say, the imposition of exogenous patterns will not, in
general, improve the results of the model as measured by the cost function (or even other statistics
such as throughput).
4 Approximating the value function
In this section, we provide a brief introduction to the computation of value function approximations
for resource allocation problems.
It is well known that problems of the form (14) can be solved via Bellman's optimality equations
(see, for example, Puterman (1994)). The optimality equations are:

V_t(R_t) = max_{x_t∈X_t} ( C_t(R_t, x_t) + E[ V_{t+1}(R_{t+1}) | R_t ] )     (16)

where X_t is the feasible region for time period t. The challenge is in computing the value function
V_t(R_t). In this section, we briefly describe a strategy for approximately solving these large-scale
dynamic programs in a computationally tractable way. The discussion here is complete, but brief.
For a more detailed presentation, see Godfrey & Powell (2002) and Powell & Topaloglu (to appear).
Our algorithmic approach involves several strategies. We begin by replacing the value function
with a continuous approximation, which we denote by V̄_t(R_t). Recalling that R_t = (R_tt')_{t'>t},
we first assume that V̄_t(R_t) is separable in R_tt'. This allows us to write:

V_t(R_t) = max_{x_t∈X_t} ( C_t(R_t, x_t) + E[ ∑_{t'≥t+1} V̄_{t+1,t'}(R_{t+1,t'}) | R_t ] )     (17)

where we use V̄_t(R_t) as a placeholder. Even with an approximate value function, the expectation in
equation (17) is typically impossible to compute. Even replacing the expectations with a summation
over a small sample produces intractably large subproblems, which are especially difficult when we
are interested in integer solutions. One strategy is to drop the expectation and solve the problem
for a single sample realization. However, at time t, we do not know Rt+1. Thus, we introduce a
post-decision state variable that is known at time t:

R^x_t = The post-decision state variable at time t. It is the state after the decision x_t
        was made and before the new information R̂_{t+1} has arrived. We follow previous
        convention and let R^x_t = (R^x_{tt'})_{t'>t} and R^x_{tt'} = (R^x_{t,t'a'})_{a'∈A}.

The equation for the post-decision resource vector is given by:

R^x_t = R_t + A_t x_t     (18)

Clearly, R_{t+1} = R^x_t + R̂_{t+1}. The evolution of states, actions and information is given by:

R_0, x_0, R^x_0, R̂_1, R_1, x_1, R^x_1, . . . , R̂_t, R_t, x_t, R^x_t, . . .
When we formulate the optimality recursion around the state variable R^x_t, we obtain:

V_{t−1}(R^x_{t−1}) = E[ max_{x_t∈X_t} ( C_t(R_t, x_t) + V_t(R^x_t) ) | R^x_{t−1} ]     (19)

We may now drop the expectation and solve for a single sample realization:

V̂^n_{t−1}(R^x_{t−1}, ω^n_t) = max_{x_t(ω^n_t)∈X_t(ω^n_t)} ( C_t(R_t, x_t, ω^n_t) + V̄^{n−1}_t(R^x_t(x_t, ω^n_t)) )     (20)
where,

V̄^{n−1}_t = The approximation of the value function V_t at time t and iteration n − 1.
V̂^n_{t−1} = A placeholder holding the temporary value at time t − 1 and iteration n.
Of course, the solution that we obtain from solving this problem is only a sample realization, but
our goal is not to find the optimal action (for a particular outcome) but rather to build the function
X^π_t(R_t) that can be used to find an action given a state R_t. Our "policy" is to solve functions of
the form:

X^π_t(R_t, V̄_t) = arg max_{x_t∈X_t(ω_t)} ( C_t(R_t, x_t, ω_t) + V̄_t(R^x_t(x_t, ω_t)) )     (21)
Choosing the best policy, then, can be viewed as choosing the best function V̄.
We choose a functional form for the value function approximation so that the optimization problem in equation (20) is computationally tractable. It is especially convenient when (20) has the same structure as the myopic problem \max_{x_t \in X_t} C_t(x_t). For example, it is often the case that \max_{x_t \in X_t} C_t(x_t) is a transportation problem. We would like to choose a functional approximation for \bar{V}_t(R^x_t) that retains the basic structure of the original problem as a network problem. Clearly, a linear approximation has this property. The value function approximation would now be given by:
    \bar{V}^n_{t,t'}(R^x_{tt'}) = \sum_{a' \in A} \bar{v}^n_{t'a'} R^x_{t,t'a'}   \forall t' > t        (22)

    \bar{V}^n_t(R^x_t) = \sum_{t' > t} \bar{V}^n_{t,t'}(R^x_{tt'})        (23)
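Equations (22)-(23) amount to a dot product between estimated slopes and resource counts. A minimal sketch, assuming resources and slopes are indexed by (t', attribute) pairs; the names and numbers are illustrative only:

```python
# Sketch of the separable linear approximation (22)-(23): the value of
# the post-decision state is a sum over (t', attribute) keys of an
# estimated slope v-bar times the resource count.

def linear_value(R_x, v_bar):
    """R_x and v_bar are dicts keyed by (t_prime, attribute)."""
    return sum(v_bar.get(key, 0.0) * qty for key, qty in R_x.items())

R_x = {(5, "C-17@OEDR"): 2, (6, "C-5B@KDOV"): 1}
v_bar = {(5, "C-17@OEDR"): 10.0, (6, "C-5B@KDOV"): 4.0}
total = linear_value(R_x, v_bar)   # 2*10.0 + 1*4.0 = 24.0
```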
A linear approximation has the advantage that it is easy to estimate. Let x^n_t be the optimal solution to this problem, and let \hat{v}^n_{t-1,a} be the dual variable for the resource constraint (8). \hat{v}^n_{t-1,a} is, of course, a subgradient of \bar{V}^n_{t-1}(R^x_{t-1}), and therefore forms the basis of a linear approximation of the value function. Because the gradient can fluctuate from iteration to iteration, we perform smoothing:

    \bar{v}^n_{t-1,a} = (1 - \alpha_n) \bar{v}^{n-1}_{t-1,a} + \alpha_n \hat{v}^n_{t-1,a}   \forall a \in A        (24)
In principle, the stepsize should satisfy the standard conditions from stochastic approximation theory: \sum_{n=1}^{\infty} \alpha_n = \infty and \sum_{n=1}^{\infty} (\alpha_n)^2 < \infty. In practice, we have found that a constant stepsize works well, as does the McClain stepsize:

    \alpha_{n+1} = \alpha_n / (1 + \alpha_n - \bar{\alpha})

which declines arithmetically to the limit point \bar{\alpha}, which might be a value such as 0.05.
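The smoothing update (24) and the McClain rule can be sketched together. The snippet below is only an illustration; the starting stepsize and the dual values fed to the update are hypothetical.

```python
# Sketch of the smoothing update (24) driven by the McClain stepsize,
# with alpha_bar = 0.05 as suggested in the text.

def mcclain_stepsizes(alpha0, alpha_bar, n_iters):
    """McClain rule: behaves like 1/n early on, levels off at alpha_bar."""
    alphas = [alpha0]
    for _ in range(n_iters - 1):
        a = alphas[-1]
        alphas.append(a / (1 + a - alpha_bar))
    return alphas

def smooth(v_bar, v_hat, alpha):
    """Equation (24): v-bar^n = (1 - alpha_n) v-bar^{n-1} + alpha_n v-hat^n."""
    return (1 - alpha) * v_bar + alpha * v_hat

alphas = mcclain_stepsizes(1.0, 0.05, 200)
v = 0.0
for a in alphas:               # feed in noiseless duals equal to 10.0
    v = smooth(v, 10.0, a)     # v converges to 10.0
```

Because the stepsize never drops below \bar{\alpha}, the slopes keep adapting in later iterations rather than freezing, which matters when the underlying duals drift.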
This approximation strategy has been shown to produce solutions that are near-optimal when applied to deterministic fleet management problems, and to outperform deterministic approximations when applied to problems under uncertainty (Godfrey & Powell (2002), Topaloglu & Powell (2002)).
5 Numerical experiments
To test the optimizing simulator with policies of different levels of information, we use an unclassified
TPFDD data set for a military airlift problem. The problem is to manage six aircraft types (C-5A,
C-5B, C-17, C-141B, KC-10A and NBC) to move a set of requirements of cargo and passengers
between the USA and Saudi Arabia, where the total weight of the requirements is about four times the total capacity of the fleet. In the simulation, a typical delivery trip (pickup plus loaded movement) needs four days to complete, so all the requirements need roughly 16 days to be delivered if the full capacity of the fleet is used. The simulation horizon is 40 days, divided into four-hour time intervals. Moving a requirement involves being assigned to a route which will bring the
aircraft through a series of intermediate airbases for refueling and maintenance. One of the biggest
operational challenges of these aircraft is that their probability of a failure of sufficient severity to
prevent a timely takeoff ranges between 10 and 25 percent. A failure can result in a delay or even
require an off-loading of the freight to another aircraft. To make the model interesting, we assume
that an aircraft of type C-141B (regardless of whether it is empty or loaded) has a 20 percent
probability of failure, and needs five days to be repaired if it fails at an airbase in region E (airbase
code names starting with E are located primarily in Northern Europe). Failures of all other aircraft types, or at all other airbases, are assumed to be repaired without delay.
The TPFDD file does not capture when the information about a requirement becomes known.
For our experiments, we assumed that requirements are known two days before they have to be
moved. Aircraft, on the other hand, are either actionable now (if they are empty and on the ground)
or are actionable at the end of a trip that is in progress. We assume that there are three types of
costs involved in the military airlift problem: transportation costs (0.01 cent per mile per pound of capacity of an aircraft), aircraft repair costs (12 cents per period per pound of capacity of a broken aircraft) and penalties for late deliveries (4 cents per period per pound of requirement delivered late).
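As a toy illustration of how these three rates combine into a total cost, consider the sketch below; only the rates come from the text, while the mission numbers are fabricated for the example.

```python
# Toy calculator for the three cost components, in dollars.

CENT = 0.01  # dollars

def mission_cost(miles, capacity_lbs, repair_periods, broken_lbs,
                 late_periods, late_lbs):
    transport = 0.01 * CENT * miles * capacity_lbs    # 0.01 cent/mile/lb
    repair = 12 * CENT * repair_periods * broken_lbs  # 12 cents/period/lb
    late = 4 * CENT * late_periods * late_lbs         # 4 cents/period/lb
    return transport + repair + late

# e.g. a 7,000-mile trip by a 150,000 lb capacity aircraft delivering a
# 50,000 lb requirement two periods late, with no repairs:
cost = mission_cost(7000, 150000, 0, 0, 2, 50000)  # 105,000 + 4,000 dollars
```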
We have such a rich family of models that comparing all the policies introduced in section 3 would be unwieldy. To focus on the main idea of this paper, we run the optimizing simulator
on the following policies: (1) rule based, one requirement to one aircraft (RB:R-A), (2) cost based,
one requirement to a list of aircraft that are knowable now/actionable now (MP:R-AL/KNAN),
(3) cost based, a list of requirements to a list of aircraft that are knowable now/actionable now
(MP:RL-AL/KNAN), (4) the above policy but with aircraft that are knowable now/actionable in
the future (MP:RL-AL/KNAF), (5) the approximate dynamic programming policy (ADP) and (6)
the policy that uses expert knowledge (EK) on (MP:RL-AL/KNAF). The first five classes should
provide progressively better solutions as information is added, while the inclusion of expert knowledge improves the solution in terms of user acceptability. We did not explicitly test rolling horizon procedures
since this would have required generating a forecast of future events from the TPFDD. This would
be straightforward in the context of a civilian application such as freight transportation where
historical activities would form the basis of a forecast, but for these applications a historical record
does not exist.
We use three measures of solution quality. The first is the traditional measure of the objective
function. It is important to emphasize that this is an imperfect measure; simulation models capture
a number of issues in the rules that govern behavior that may not be reflected in a cost function.
The second measure is throughput, which is of considerable interest in the study of airlift problems.
Our cost function captures throughput indirectly through costs that penalize late deliveries. Finally,
when we study the use of expert knowledge, we measure the degree to which the model matches
exogenously specified patterns.
Figure 5 shows the costs for each of the first five policies. The total cost is the sum of transportation costs, late delivery costs, and aircraft repair costs. The late delivery costs decrease steadily as the information set grows. The repair costs also decrease, most noticeably under policy (ADP), since this policy learns from the early iterations and avoids sending aircraft to airbases that lead to a longer repair time. However, the resulting detours slightly increase the transportation cost of policy (ADP) relative to policy (MP:RL-AL/KNAF). For the remaining policies, transportation costs decrease as the information sets grow, and the overall total costs decrease as expected.
The throughputs of five different policies (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN),
Figure 5: Costs of different policies. Policy (RB:R-A) is rule based, one requirement to one aircraft. Policy (MP:R-AL/KNAN) is cost based, one requirement to a list of aircraft that are knowable now, actionable now. Policy (MP:RL-AL/KNAN) is cost based, a list of requirements to a list of aircraft that are knowable now, actionable now. Policy (MP:RL-AL/KNAF) is cost based, a list of requirements to a list of aircraft that are knowable now, actionable in the future. Policy (ADP) is the approximate dynamic programming policy.
(MP:RL-AL/KNAF) and (ADP), as well as the cumulative expected throughput, are plotted in Figure 6. The throughput curves follow the same ordering, from right to left: the richer the information class, the faster the delivery, and hence the further to the left the throughput curve lies. Since some of the throughput curves cross each other, we calculate the area between the expected throughput curve and the throughput curve of each policy and plot these areas in Figure 7. These areas measure the lateness of delivery under each policy: the smaller the area, the faster the delivery. From policy (RB:R-A) to (ADP), the areas decrease from 395 million to 119 million pound-days.
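This area statistic can be computed with a simple trapezoidal rule over a common time grid. The curves below are fabricated for illustration, since the real curves come from the simulator.

```python
# Sketch of the lateness measure behind Figure 7: the area, in
# pound-days, between the cumulative expected throughput curve and a
# policy's cumulative throughput curve.

def area_between(times, expected, actual):
    """Trapezoidal area between two cumulative curves on a shared grid."""
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        gap0 = expected[i] - actual[i]
        gap1 = expected[i + 1] - actual[i + 1]
        area += 0.5 * (gap0 + gap1) * dt
    return area

times = [0, 10, 20, 30]             # days
expected = [0.0, 20.0, 40.0, 50.0]  # million pounds delivered
actual = [0.0, 10.0, 30.0, 50.0]    # a slower policy, same endpoint
lateness = area_between(times, expected, actual)   # million pound-days
```

Because the curves are cumulative and end at the same total, the area captures how long freight sat undelivered, which is why it serves as a lateness measure even when the curves cross.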
We tested the inclusion of expert knowledge by using an exogenous pattern to control the
Figure 6: Throughput curves of different policies, showing cumulative pounds delivered over the simulation.
percentage of C-5A's and C-5B's through region E. Starting with the best myopic policy (MP:RL-AL/KNAF), we found that C-5A's and B's went through region E 18.5 percent of the time. We then imposed a single exogenous pattern on the attribute/decision pair (a = C-5A or C-5B) and (d = region E). We varied the exogenous pattern \rho^p_{ad} from 0 to 0.5 in steps of 0.10. For each value of \rho^p_{ad}, we varied the pattern weight \theta over the values 0, 3, 5, 10, 100, 1000.
The results are shown in Figure 8. The horizontal line corresponds to \theta = 0, and we also show
Figure 7: Areas between the cumulative expected throughput curve and the throughput curves of different policies.
a 45 degree dashed line representing the case where the model matches the pattern exactly. For the remaining runs, varying \theta for different values of \rho^p_{ad} produces a series of lines that are bounded by the no-pattern and the exact-match lines. Choosing the correct value of \theta is a subjective judgment. Presumably, the cost function has been carefully designed, which means that deviations from the solution that matches the exogenous pattern may still be acceptable.
These results indicate that we can retain the ability that exists in traditional simulators to
guide the behavior of the cost model using simple rules. It is important to emphasize, however,
that our exogenous patterns are restricted in their complexity. A rule must be a function of the
attribute/decision pair (a, d), which means that the rule may not reflect, for example, information
about other aircraft (the cost model must pick this up).
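The qualitative effect of the pattern weight \theta can be sketched with a simplified penalty term. The model's actual proximal point form follows Marar & Powell (2002); the quadratic penalty below is only a stand-in to convey the behavior, and all names are illustrative.

```python
# Simplified stand-in for the pattern term: adjust the cost of a
# candidate solution by a theta-weighted quadratic penalty on the gap
# between its pattern flow and the expert target rho.

def penalized_cost(base_cost, pattern_flow, rho, theta):
    return base_cost + theta * (pattern_flow - rho) ** 2

# With theta = 0 the pattern is ignored; as theta grows, solutions that
# deviate from rho become increasingly expensive, pushing the model's
# flow toward the expert's target.
costs = [penalized_cost(5.0, 0.185, 0.5, th) for th in (0, 3, 10, 100)]
```

This mirrors the behavior in Figure 8: at \theta = 0 the flow stays at the unconstrained 18.5 percent, while large \theta drives it toward the exact-match line.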
Figure 8: Pattern flow versus the expert knowledge target \rho^p_{ad}, for pattern weights \theta = 0, 3, 5, 10, 100, 1000.
6 Conclusions
This paper demonstrates that we can create a spectrum of models for the military airlift problem,
ranging from a classical simulation through a model based on approximate dynamic programming
that has been shown in other research (on simpler problems) to produce near-optimal solutions (and
solutions that outperform deterministic approximations when uncertainty is present). Finally, we
demonstrate that we can combine the strengths of optimization with rule-based simulation by
incorporating low-dimensional patterns through a proximal point term.
Our experimental work shows that increasing the information content of a decision does in fact
improve performance, both in terms of the objective function (measured in terms of costs) and
throughput. We also showed that we can push the solution in the direction of an exogenously
specified pattern to produce what an expert might describe as more realistic behavior. The last
two policies (ADP and EK) both require running the simulator iteratively, but we were encouraged
that we obtained very reasonable results in just three iterations, suggesting that the additional
computational burden is not too large. We also note that the simulator can be warm-started,
which means that we can perform a series of training iterations after which the results are stored.
Then, the analysis of new scenarios can be performed with fewer iterations.
References
Baker, S., Morton, D., Rosenthal, R. & Williams, L. (2002), 'Optimizing military airlift', Operations Research 50(4), 582-602.
Brooke, A., Kendrick, D. & Meeraus, A. (1992), GAMS: A User's Guide, 2.25 edn, Duxbury Press/Wadsworth Publishing Company, Belmont, CA.
Burke, J. F. J., Love, R. J. & Macal, C. M. (2002), 'Modelling force deployments from army installations using the transportation system capability (TRANSCAP) model: A standardized approach', submitted to Mathematical and Computer Modelling.
Crino, J. R., Moore, J. T., Barnes, J. W. & Nanry, W. P. (2002), 'Solving the theater distribution vehicle routing and scheduling problem using group theoretic tabu search', submitted to Mathematical and Computer Modelling.
Dantzig, G. & Ferguson, A. (1956), 'The allocation of aircrafts to routes: An example of linear programming under uncertain demand', Management Science 3, 45-73.
Ferguson, A. & Dantzig, G. B. (1955), 'The problem of routing aircraft - a mathematical solution', Aeronautical Engineering Review 14, 51-55.
Godfrey, G. & Powell, W. B. (2002), 'An adaptive, dynamic programming algorithm for stochastic resource allocation problems I: Single period travel times', Transportation Science 36(1), 21-39.
Goggins, D. A. (1995), Stochastic modeling for airlift mobility, Master's thesis, Naval Postgraduate School, Monterey, CA.
Granger, J., Krishnamurthy, A. & Robinson, S. M. (2001), Stochastic modeling of airlift operations, in B. A. Peters, J. S. Smith, D. J. Medeiros & M. W. Rohrer, eds, 'Proceedings of the 2001 Winter Simulation Conference'.
Killingsworth, P. & Melody, L. J. (1997), Should C-17s be deployed as theater assets?: An application of the CONOP air mobility model, Technical Report RAND/DB-171-AF/OSD, RAND Corporation.
Marar, A. & Powell, W. B. (2002), Using static flow patterns in time-staged resource allocation problems, Technical report, Princeton University, Department of Operations Research and Financial Engineering.
Mattock, M. G., Schank, J. F., Stucker, J. P. & Rothenberg, J. (1995), New capabilities for strategic mobility analysis using mathematical programming, Technical report, RAND Corporation.
Midler, J. L. & Wollmer, R. D. (1969), 'Stochastic programming models for airlift operations', Naval Research Logistics Quarterly 16, 315-330.
Morton, D. P., Rosenthal, R. E. & Lim, T. W. (1996), 'Optimization modeling for airlift mobility', Military Operations Research, 49-67.
Morton, D. P., Salmeron, J. & Wood, R. K. (2002), 'A stochastic program for optimizing military sealift subject to attack', working paper.
Niemi, A. (2000), Stochastic modeling for the NPS/RAND Mobility Optimization Model, Department of Industrial Engineering, University of Wisconsin-Madison. Available: http://ie.engr.wisc.edu/robinson/Niemi.htm.
Powell, W. B. & Topaloglu, H. (to appear), Stochastic programming in transportation and logistics, in A. Ruszczynski & A. Shapiro, eds, 'Handbook in Operations Research and Management Science, Volume on Stochastic Programming', North Holland, Amsterdam.
Powell, W. B., Shapiro, J. A. & Simao, H. P. (2001), A representational paradigm for dynamic resource transformation problems, in R. F. C. Coullard & J. H. Owens, eds, 'Annals of Operations Research', J.C. Baltzer AG, pp. 231-279.
Powell, W. B., Wu, T. T. & Whisman, A. (to appear), 'Using low dimensional patterns in optimizing simulators: An illustration for the airlift mobility problem', Mathematical and Computer Modelling.
Puterman, M. L. (1994), Markov Decision Processes, John Wiley and Sons, Inc., New York.
Rosenthal, R., Morton, D., Baker, S., Lim, T., Fuller, D., Goggins, D., Toy, A., Turker, Y., Horton, D. & Briand, D. (1997), 'Application and extension of the Thruput II optimization model for airlift mobility', Military Operations Research 3(2), 55-74.
Schank, J. F., Stucker, J. P., Mattock, M. G. & Rothenberg, J. (1994), New capabilities for strategic mobility analysis: Executive summary, Technical report, RAND Corporation.
Schank, J., Mattock, M., Sumner, G., Greenberg, I. & Rothenberg, J. (1991), A review of strategic mobility models and analysis, Technical report, RAND Corporation.
Spivey, M. & Powell, W. B. (to appear), 'The dynamic assignment problem', Transportation Science.
Stucker, J. P. & Berg, R. T. (1999), Understanding airfield capacity for airlift operations, Technical report, RAND Corporation.
Topaloglu, H. & Powell, W. B. (2002), Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems, Technical report, Princeton University, Department of Operations Research and Financial Engineering.
Wing, V., Rice, R. E., Sherwood, R. & Rosenthal, R. E. (1991), Determining the optimal mobility mix, Technical report, Force Design Division, The Pentagon, Washington D.C.
Yost, K. A. (1994), The thruput strategic airlift flow optimization model, Technical report, Air Force Studies and Analyses Agency, The Pentagon, Washington D.C.