The Optimizing Simulator: An Intelligent Analysis Tool for ...castlelab.princeton.edu/html/Papers/Wu...
The Optimizing Simulator:
An Intelligent Analysis Tool for the Military Airlift Problem
Tongqiang Tony Wu, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544
Alan Whisman, Air Mobility Command, Scott Air Force Base
July 2003
Abstract
There have been two primary modeling and algorithmic strategies for problems in military logistics:
simulation, offering tremendous modeling flexibility, and optimization, which offers the intelligence
of math programming. Each offers significant theoretical and practical advantages. In this paper,
we introduce the concept of an “optimizing simulator” which offers a spectrum of models and algo-
rithms which are differentiated only by the information content of the decision function. We argue
that there are four information classes, allowing us to construct decision functions with increasing
degrees of sophistication and realism as the information content of a decision is increased. Using
the context of the military airlift problem, we illustrate the succession of models, and demonstrate
that we can achieve improved performance as more information is used.
The military airlift problem, faced by the Air Mobility Command (AMC), deals with effec-
tively routing a fleet of aircraft to deliver requirements (troops, equipment, food and other forms
of freight) from different origins to different destinations as quickly as possible under a variety of
constraints. There are two major classes of models to solve the military airlift (and closely related
sealift) problem: optimization models (Morton et al. (1996), Rosenthal et al. (1997), Baker et al.
(2002)), and simulation models, such as MASS (Mobility Analysis Support System) and AMOS,
which are heavily used within the AMC. Bridging these approaches are stochastic programming
models (Goggins (1995),Morton et al. (2002)). In general, optimization models are intelligent but
less flexible while simulation models are flexible but are governed by simple rules. In this paper,
we introduce the concept of the “optimizing simulator” for this problem class, and propose a range
of models that span from simulation to optimization. We introduce four classes of information,
and create a spectrum of decision functions by controlling the information available to the decision
function. Our presentation is facilitated by a notational framework that provides natural connec-
tions with both simulation and optimization, while also providing constructs needed for complex
dynamic resource management problems.
Ferguson & Dantzig (1955) were among the first to apply mathematical models to air-based transportation. Subsequently, numerous studies were conducted on the application of optimization modeling to the military airlift problem. Mattock et al. (1995) describe several mathematical modeling
formulations for military airlift operations. The RAND Corporation published a very extensive
formulations for military airlift operations. The RAND Corporation published a very extensive
analysis of airfield capacity in Stucker & Berg (1999). According to Baker et al. (2002), research
on air mobility optimization at the Naval Postgraduate School (NPS) started with the Mobility
Optimization Model (MOM). This model is described in Wing et al. (1991) and concentrates on
both sealift and airlift operations. Therefore, the MOM model is not designed to capture the char-
acteristics specific to airlift operations, but it is a good model in the sense that it is time-dependent.
THRUPUT, a general airlift model developed by Yost (1994), captures the specifics of airlift opera-
tions but is static. The United States Air Force Studies and Analysis Agency in the Pentagon asked
NPS to combine the MOM and THRUPUT models into one model that would be time dependent
and would also capture the specifics of airlift operations (Baker et al. (2002)). The resulting model
is called THRUPUT II, described in detail in Rosenthal et al. (1997). During the development
of THRUPUT II at NPS, a group at RAND developed a similar model called CONOP (CONcept
of OPerations), described in Killingsworth & Melody (1997). The THRUPUT II and CONOP
models each possessed several features that the other lacked. Therefore, the Naval Postgraduate
School and the RAND Corporation merged the two models into NRMO (NPS/RAND Mobility
Optimizer), described in Baker et al. (2002). Crino et al. (2002) introduced the group theoretic
tabu search method to solve the theater distribution vehicle routing and scheduling problem. Their
heuristic methodology evaluates and determines the routing and scheduling of multi-modal theater
transportation assets at the individual asset operational level.
A number of simulation models have also been proposed for airlift problems (and related prob-
lems in military logistics). Schank et al. (1991) reviews several deterministic simulation models.
Burke et al. (2002) at the Argonne National Laboratory developed a model called TRANSCAP
(Transportation System Capability) to simulate the deployment of forces from Army bases. The
heart of TRANSCAP is the discrete-event simulation module developed in MODSIM III. Perhaps
the most widely used model at the Air Mobility Command is MASS (Mobility Analysis Support
System) which is a discrete-event worldwide airlift simulation model used in strategic and theater
operations to deploy military and commercial airlift assets. It once was the standard for all airlift
problems in AMC and all airlift studies were compared with the results produced by the MASS
model. MASS (and its replacement AMOS) provides for a very high level of detail, allowing AMC
to run analyses on a wide variety of issues.
One feature of simulation models is their ability to handle uncertainty, and as a result there
has been a steady level of academic attention toward incorporating uncertainty into optimization
models. Dantzig & Ferguson (1956) is one of the first to study uncertain customer demands in the
airlift problem. Midler & Wollmer (1969) also takes into account stochastic cargo requirements.
They formulated two two-stage stochastic linear programming models – a monthly planning model
and a daily operational model for the flight scheduling. Goggins (1995) extended Throughput
II (Rosenthal et al. (1997)) to a two-stage stochastic linear program to address the uncertainty
of aircraft reliability. Niemi (2000) expands the NRMO model to incorporate stochastic ground
times through a two-stage stochastic programming model. To reduce the number of scenarios and
obtain a tractable solution, the model assumes that the set of scenarios is identical for each airfield
and time period, and that a scenario is determined by the outcomes of repair times of different
types of aircraft. The resulting stochastic programming model has an equivalent deterministic linear programming
formulation. Granger et al. (2001) compared the simulation model and the network approximation
model for the impact of stochastic flying times and ground times on a simplified airlift network
(one origin aerial port of embarkation (APOE), three intermediate airfields and one destination
aerial port of debarkation (APOD)). Based on the study, they suspect that a combination of
simulation and network optimization models should yield much better performance than either one
of these alone. Such an approach would use a network model to explore the variable space and
identify parameter values that promise improvements in system performance, then validate these
by simulation. Morton et al. (2002) developed the Stochastic Sealift Deployment Model (SSDM), a
multi-stage stochastic programming model to plan the wartime sealift deployment of military cargo
subject to stochastic attack. They used scenarios to represent the possibility of random attacks.
NRMO is a linear programming model that is intelligent but not as flexible as a traditional
simulation. On the other hand, MASS is a simulation model with tremendous flexibility but is
governed by relatively simple rules. There has been interest in combining the strengths of these
two technologies. For example, Schank et al. (1994) makes the point:
For the future, we recommend research into the integration of the KB (Knowledge-Based) and the MP (Math Programming) procedures. ... The optimization approach directly determines the least-cost set of assets but does so with some loss of detail and by assuming that the world operates in an optimal fashion. KB simulations capture details and provide more realistic representations of real-world activities but do not currently balance trade-offs among transport assets, procedures, and cost.
Optimization models are often promoted because the solutions are “optimal” (or in some sense
“better” than what is happening in practice); simulation models lay claim to being more realistic,
exploiting their ability to capture a high level of detail. In practice, the real value of optimization
models is not in their solution quality, but rather in their ability to respond with intelligence to new
datasets. Simulation models are driven by rules, and it is often the case that new sets of rules have
to be coded to produce desirable behaviors when faced with new datasets. Optimization models
often behave very reasonably when given entirely new datasets, but this behavior depends on the
calibration of a cost model which typically has to combine hard dollar costs (such as the cost of fuel)
against “soft costs” which are introduced to encourage desirable behaviors. As a result, “optimal”
only takes meaning relative to a particular objective function; it is not hard for an experienced
expert to look at a solution and claim “it may be optimal but it is wrong!” Of course, experienced
modelers have long recognized that optimization models are only approximations of reality, but
this has not prevented claims of savings based on the results of an optimization model.
We present a model of the military airlift problem based on the DRTP (dynamic resource
transformation problem) notational framework of Powell et al. (2001). We use the concept of an
“optimizing simulator” for making decisions, where we propose a spectrum of decision functions
by controlling the amount of information available to the decision function. The DRTP notational
framework provides a natural bridge between optimization and simulation by providing some subtle
but critical notational changes that make it easier to model the information content of a decision. In
addition, the DRTP modeling paradigm provides for a high level of generality in the representation
of the underlying problem.
In section 1, we briefly review the NRMO model, MASS model and SSDM model. Our for-
mulation of the military airlift problem is presented in section 2. In section 3, we introduce all
the policies employed in this paper. These policies help to close the gap between simulation and
optimization models. One policy is based on the use of value function approximations that capture
the impact of decisions on the future. These are introduced in section 4. We conduct numerical
experiments on the optimizing simulator in section 5, and section 6 concludes the paper.
1 A comparison of models
In this section, we review the characteristics of the NRMO and MASS models to compare linear
programming with simulation, where a major difference is the modeling of information and the
ability to handle different forms of complexity. We also compare a stochastic programming model,
which represents one way of incorporating uncertainty into an optimization model. Following this
discussion, section 2 presents a model of the military airlift problem based on a modeling framework
presented in Powell et al. (2001). At the end of section 2, we compare all four modeling strategies.
1.1 The NRMO Model
The NRMO model has been in development since 1996 and has been employed in several airlift
studies. For a detailed review of the NRMO model, including a mathematical description, see Baker
et al. (2002). The goal of the NRMO model is to move equipment and personnel in a timely fashion
from a set of origin bases to a set of destination bases using a fixed fleet of aircraft with differing
characteristics. This deployment is driven by the movement and delivery requirements specified in
a list dubbed the Time-Phased Force Deployment Data set (TPFDD). This list essentially contains
the cargo and troops, along with their attributes, that must be delivered to each of the bases of a
given military scenario.
Aircraft types for the NRMO runs reported by Baker et al. (2002) include C-5, C-17, C-141,
Boeing 747, KC-10 and KC-135. Different types of aircraft have different features, such as passenger
and cargo-carrying capacity, airfield capacity consumed, range-payload curve, and so on. The
range-payload curve specifies the weight that an aircraft can carry given a distance traveled. These
range-payload curves are piecewise linear concave.
The activities in NRMO are represented using three time-space networks: the first flows aircraft,
the second flows cargo (freight and passengers), and the third flows crews. Cargo and troops are
carried from onload aerial ports of embarkation (APOEs) to either offload aerial ports of debarkation
(APODs) or forward operating bases (FOBs). Certain requirements need to be delivered to the
FOBs via some other means of transportation after being offloaded in the APODs. Some aircraft,
however, can bypass the APODs and deliver directly to the FOBs. Each requirement starts at a
specific APOE dictated by the TPFDD and then is either dropped off at an APOD or a FOB.
NRMO makes decisions about which aircraft to assign to which requirement, which freight
will move on an aircraft, and which crew will handle a move, as well as variables that manage the
allocation of aircraft between roles such as long-range, intertheater operations and shorter shuttle
missions within a theater. The model can even handle the assignment of KC-10s between the roles
of strategic cargo hauler and mid-air refueler.
In the model, NRMO minimizes a rather complex objective function that is based on several
“costs” assigned to the decisions. There are a variety of costs, including hard operating costs (fuel,
maintenance) and soft costs to encourage desirable behaviors. For example, NRMO assigns a cost
to penalize deliveries of cargo or troops that arrive after the required delivery date. This penalty
structure charges a heavier penalty the later the requirement is delivered. Another term penalizes
the cargo that is simply not delivered. The objective also penalizes reassignments of cargo and
deadheading crews. Lastly, the objective function offers a small reward for planes that remain at
an APOE as these bases are often in the continental US and are therefore close to home bases of
most planes. The idea behind this reward is to account for uncertainty in the world of war and to
keep unused planes well positioned in case of unforeseen contingencies.
The constraints of the model can be grouped into seven categories: (1) demand satisfaction; (2)
flow balance of aircraft, cargo and crews; (3) aircraft delivery capacity for cargo and passengers; (4)
the number of shuttle and tanker missions per period; (5) initial allocation of aircraft and crews;
(6) the usage of aircraft of each type; and (7) aircraft handling capacity at airfields. A general
statement of the model (see Baker et al. (2002) for a detailed description) is given by:
min_x Σ_t c_t x_t

subject to:
A_t x_t − Σ_{τ=1}^{t} B_{t−τ,t} x_{t−τ} = R_t
D_t x_t ≤ u_t
x_t ≥ 0        (1)
Here t represents when an activity starts and τ is the time required to complete an action. The
matrices A_t and B_{t−τ,t} handle system dynamics. D_t captures possible coupling constraints that
run across the flows of aircraft, cargo and crews. u_t is the upper bound on flows, and R_t models
the arrival of new resources to be managed. x_t is the decision vector, and c_t is the cost vector
corresponding to decision vector x_t.
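To make the time-staged structure of (1) concrete, the following is a minimal Python sketch of the flow-balance constraints for a toy, hypothetical instance: one aircraft type, missions lasting τ = 1 period, and a decision vector x_t = (holds, flights). The data and decision encoding are illustrative, not part of NRMO.

```python
# Toy flow-balance check for the time-staged structure in model (1).
T = 3
R = [2, 0, 0]   # new aircraft arriving at each period (the R_t term)
tau = 1         # mission duration

def available(t, x):
    """Aircraft available at period t: new arrivals R_t plus aircraft
    returning from decisions made tau periods earlier
    (the B_{t-tau,t} x_{t-tau} term)."""
    returns = sum(x[t - tau]) if t - tau >= 0 else 0
    return R[t] + returns

def feasible(x):
    """Check A_t x_t - sum_tau B_{t-tau,t} x_{t-tau} = R_t for all t:
    aircraft used at t (holds + flights) must equal aircraft available."""
    return all(sum(x[t]) == available(t, x) for t in range(T))

# A feasible plan: 2 aircraft arrive at t=0, both fly, both return at t=1, ...
plan = [(0, 2), (1, 1), (2, 0)]
print(feasible(plan))   # True
```

The equality constraints couple periods through the return terms; an LP solver would choose the x_t minimizing Σ_t c_t x_t over all plans passing this check.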
NRMO is implemented with the algebraic modeling language GAMS (Brooke et al. (1992)),
which facilitates the handling of states. There are a number of sets and subsets in NRMO, such
as sets of time periods, requirements, cargo types, aircraft types, bases, and routes. To make a
decision, one element of each suitable set is pulled out to form an index combination, such as an
aircraft of a certain type delivering a certain requirement on a certain route departing at a certain
time. They screen the infeasible index combinations to achieve a tractable computation. The
information in the NRMO model is captured in the TPFDD, which is known at the beginning of
the horizon.
NRMO requires that the behavior of the model be governed by a cost model (which simulators
do not require). The use of a cost model minimizes the need for extensive tables of rules to
produce “good” behaviors. The optimization model also responds in a natural way to changes in
the input data (for example, an increase in the capacity of an airbase will not produce a decrease
in overall throughput). But linear programming formulations suffer from weaknesses. A significant
limitation is the difficulty in modelling complex system dynamics. For example, a simulation model
can include logic such as “if there are four aircraft occupying all the parking spaces, a fifth aircraft
will have to be pulled off the runway where it cannot be refueled or repaired.” In addition, linear
programming models cannot directly model the evolution of information. This limits their use in
analyzing strategies which directly affect uncertainty (What is the impact of last-minute requests
on operational efficiency? What is the cost of sending an aircraft through an airbase with limited
maintenance facilities where the aircraft might break down?).
1.2 The MASS Model
The MASS model is a rule-based simulation model. Let
S_t = The state vector of the system at time t.
π = The policy employed.
X^π = The function that maps a state S_t to a decision x_t under policy π.
x_t = The vector of decisions.
ω_t = Exogenous information at time t.
M_t = The transfer function that represents the system dynamics: starting at state S_t, applying decision x_t, and receiving new exogenous information ω_{t+1}, M_t leads to the new system state S_{t+1}.
The policy π in the MASS model requires a sequence of feasibility checks. In the TPFDD,
the MASS model picks the first available requirement and checks the first available aircraft to
see whether this aircraft can deliver the requirement. In this feasibility check, the MASS model
examines whether the aircraft and the requirement are at the same airbase, whether the aircraft
can handle the size of the cargo in the requirement, as well as whether the en-route and destination
airbases can accommodate the aircraft. If the aircraft cannot deliver the requirement, the MASS
model checks the second available aircraft, and so on, until it finds an aircraft that can deliver the
requirement. Once it does, the MASS model finishes the first requirement and moves to the
second. It continues this procedure for each requirement in the TPFDD until all the requirements
are finished. This policy is rule-based; it does not use a cost function to make
the decision.
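The sequential feasibility check described above can be sketched in a few lines of Python. This is a deliberately minimal sketch with hypothetical attribute names (location, capacity); MASS itself applies many more checks (cargo size, en-route and destination airbase capacity, and so on).

```python
# Sketch of the rule-based assignment policy: for each requirement in
# TPFDD order, take the first aircraft passing the feasibility check.
def can_deliver(aircraft, req):
    """Illustrative feasibility check: co-located and enough capacity."""
    return (aircraft["location"] == req["origin"]
            and aircraft["capacity"] >= req["weight"])

def rule_based_policy(aircraft_list, requirements):
    """No cost function is used; the first feasible aircraft wins."""
    assignments = {}
    free = list(aircraft_list)
    for req in requirements:
        for ac in free:
            if can_deliver(ac, req):
                assignments[req["id"]] = ac["id"]
                free.remove(ac)
                break
    return assignments

reqs = [{"id": 1, "origin": "KBOS", "weight": 50_000},
        {"id": 2, "origin": "KDOV", "weight": 90_000}]
fleet = [{"id": "C17-1", "location": "KBOS", "capacity": 106_540},
         {"id": "C5-1",  "location": "KDOV", "capacity": 190_000}]
print(rule_based_policy(fleet, reqs))   # {1: 'C17-1', 2: 'C5-1'}
```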
The evolution of states and decisions in the MASS model is as follows. At time t, the state of
the system, St, is known. Employing the policy π described above to make decisions given state
St, produces the decision xt. After the decision is made at time t, and the new information at time
t+1, ωt+1, has arrived, the system moves to a new state St+1 at time t+1. Thus, the MASS model
can be mathematically presented as:
x_t ← X^π(S_t)
S_{t+1} ← M_t(S_t, x_t, ω_{t+1})        (2)
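The recursion (2) is simply a simulation loop: a policy maps the state to a decision, and a transfer function advances the state given new exogenous information. The following sketch uses a trivial, invented state (aircraft position and clock) and policy purely to show the loop structure.

```python
# The recursion (2) as a generic simulation loop (illustrative dynamics).
def X_pi(state):
    """A trivial rule-based policy: always fly one leg east."""
    return "fly_east"

def M(state, decision, omega):
    """Transfer function: apply the decision, then an exogenous delay."""
    pos, clock = state
    pos += 1 if decision == "fly_east" else 0
    return (pos, clock + 1 + omega)

state = (0, 0)
delays = [0, 1, 0]              # exogenous information omega_{t+1}
for omega in delays:
    x = X_pi(state)             # x_t <- X_pi(S_t)
    state = M(state, x, omega)  # S_{t+1} <- M_t(S_t, x_t, omega_{t+1})
print(state)   # (3, 4)
```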
1.3 The SSDM Model
The Stochastic Sealift Deployment Model (SSDM) (Morton et al. (2002)) is a multi-stage stochastic
mixed-integer program designed to hedge against potential enemy attacks on seaports of debarka-
tion (SPODs) with probabilistic knowledge of the time, location and severity of the attacks. They
test the model on a network with two SPODs (the number actually used in the Gulf War) and
other locations such as a seaport of embarkation (SPOE) and locations near SPODs where ships
wait to enter berths. To keep the model computationally tractable, they assumed only a single
biological attack can occur. Thus, associated with each scenario is whether or not an attack occurs,
and if it does occur, the time t_s at which it occurs. We then let T_s = {1, . . . , t_s − 1} be the set
of time periods preceding an attack which occurs at time t_s. This structure results in a scenario
tree that grows linearly with the number of time periods, as illustrated in figure 1. The stochastic
programming model for SSDM can then be written:
min_x Σ_s Σ_t p_s c_t x_{ts}        (3)

subject to:
A_t x_{ts} − Σ_{τ=1}^{t} B_{t−τ,t} x_{t−τ,s} = R_t        (4)
D_t x_{ts} ≤ u_{ts}        (5)
x_{ts} = x_{ts′}  ∀ t ∈ T_s ∩ T_{s′}        (6)
x_{ts} ≥ 0 and integer.        (7)
Here s is the index for scenarios, and p_s is the probability of scenario s. t, τ, A_t, B_{t−τ,t}, c_t, D_t and
R_t are the same as introduced in the NRMO model. u_{ts} is the upper bound on flows, where the
capacities of SPOD cargo handling vary under different attack scenarios, and x_{ts} is the decision
vector. The non-anticipativity constraints (6) say that if scenarios s and s′ share the same branch
of the scenario tree at time t, the decisions at time t must be the same for those scenarios.
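A small sketch makes the non-anticipativity constraints (6) concrete for the single-attack tree: a scenario is identified by its attack time t_s (infinity for "no attack"), and two scenarios are indistinguishable at time t whenever t precedes both attack times. The scenario names and decisions below are illustrative.

```python
# Checking non-anticipativity (6) on the single-attack scenario tree.
import math

def shared(t, ts, ts2):
    """Scenarios with attack times ts, ts2 are on the same branch at t
    if t precedes both attacks (nothing has happened in either)."""
    return t < min(ts, ts2)

def non_anticipative(x, attack_times):
    """x[s][t] is the decision at time t under scenario s; decisions
    must agree wherever the scenarios still share a branch."""
    for s in x:
        for s2 in x:
            for t in range(len(x[s])):
                if shared(t, attack_times[s], attack_times[s2]) \
                        and x[s][t] != x[s2][t]:
                    return False
    return True

attack_times = {"no_attack": math.inf, "attack_t2": 2}
ok  = {"no_attack": ["load", "sail", "berth"],
       "attack_t2": ["load", "sail", "divert"]}
bad = {"no_attack": ["load", "sail", "berth"],
       "attack_t2": ["wait", "sail", "divert"]}
print(non_anticipative(ok, attack_times),
      non_anticipative(bad, attack_times))  # True False
```

In the stochastic program these equalities are imposed as constraints rather than checked after the fact, but the branching logic is the same.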
In general, the stochastic programming model is much larger than a deterministic linear pro-
gramming model and still struggles with the same difficulty in modelling complex system dynamics.
One has to design the scenarios carefully to avoid the exponential growth of scenarios in the stochas-
tic programming model. The value of the stochastic programming formulation is that it provides
Figure 1: The single-attack scenario tree used in SSDM (a "no attack" branch and, at each time t_1, . . . , t_4, a branch for an attack occurring at that time)
for the analysis of certain forms of uncertainty (such as which SPOD or SPODs might be attacked
and the nature of the attack) so that robust strategies can be derived. Such problems do not require
a massive number of scenarios, and do not call for a high level of detail in the model.
2 A model of the military airlift problem
To represent the fundamentals behind the Air Mobility Command problem, it is useful to view the
problem as a Dynamic Resource Transformation Problem. This framework, established in Powell
et al. (2001), offers several features that are especially important for this problem class, including
the explicit modeling of the information content of decisions, the ability to model complex resource
attributes, and the ease with which new classes of decisions can be introduced. In this section, we
present the core elements of the problem, sometimes sacrificing completeness in order to focus on
the modeling of information.
Throughout this paper, we represent space and time using:
J = The set of airbases to which the airplanes can go. These airbases can be located all over the world. We always use i and j to denote airbases, i.e., i, j ∈ J.
τ_{ij} = The travel time between airbases i and j.
T = The discrete set of time periods that we simulate.
From time to time, we need to restrict our planning to information within a horizon. For this, we
let:
T^{ph}_t = The set of time periods within a planning horizon for a problem being solved at time t.
In section 2.1, resource classes and resource layering are described. Processes, including infor-
mation processes and system dynamics, are introduced in section 2.2. The controls are described
in section 2.3. We present the optimization formulation in section 2.3.3 and finally, we compare
the NRMO, MASS and DRTP models in section 2.4.
2.1 Resources
We use the term “resources” to refer to all physical objects that are being managed. In the airlift
problem, this refers to both the aircraft and the requirements. Our notation easily generalizes to
handle additional resources layers such as pilots, fuel and special equipment.
2.1.1 Resource Classes
There are two resource classes in the military airlift problem: aircraft and requirements. Define,
C^R = The set of resource classes
    = {Aircraft (A), Requirement (R)}
We use the superscript A to denote quantities associated with aircraft and the superscript R to
denote those associated with requirements. In a more realistic model, we might include additional
resource classes such as pilots, fuel, maintenance resources and equipment for loading and unloading aircraft.
The resource spaces and attribute vectors can be represented as follows:
A^A = The attribute space of aircraft.
a^A = The attribute vector of an aircraft, a^A ∈ A^A.
A^R = The attribute space of requirements.
a^R = The attribute vector of a requirement, a^R ∈ A^R.
a^φ = The null attribute vector. It has the dimensionality of the attribute vector for requirements but its elements are empty, a^φ ∈ A^R.
The attribute vector of an aircraft is shown in Example 1. Certain attributes are inherent to the
particular resource and are therefore static, such as the type of aircraft. However, other attributes
are dynamic and constantly change, such as the location of the aircraft.
Example 1 (Attribute vector of an aircraft):
a^A_1 = The type of aircraft (e.g., C-17).
a^A_2 = The weight capacity of the plane (e.g., 106,540 lbs).
a^A_3 = The current airfield where the plane is located (e.g., EDAR).
a^A_4 = The weight loaded in this aircraft (e.g., 0).
The attribute vector of a requirement is shown in Example 2.
Example 2 (Attribute vector of a requirement):
a^R_1 = The requirement ID (e.g., 2).
a^R_2 = The type of the cargo size (e.g., Bulk).
a^R_3 = The origin of the requirement (e.g., KBOS).
a^R_4 = The current airfield where the requirement is located (e.g., KBOS).
a^R_5 = The final destination of the requirement (e.g., OEDF).
The elements of the attribute vector can be divided into two broad groups of elements: static
attributes (e.g. the type of an aircraft) and dynamic attributes (e.g. the location of an aircraft). The
concept of a commodity that is common in modeling is equivalent to the vector of static attributes
of a resource, while the state of that commodity is equivalent to the vector of dynamic attributes
of that resource. Thus, one can view the military airlift problem as a multi-commodity problem
with a large number of states, but we have found that a single attribute vector is more flexible
and compact. In particular, certain algorithmic strategies depend on the ability to aggregate the
attribute space A in a general way (see section 3.5).
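The attribute vectors of Examples 1 and 2, and the kind of aggregation of the attribute space A mentioned above, can be sketched directly in code. The class layout and aggregation levels below are illustrative assumptions, not the paper's exact scheme.

```python
# Attribute vectors and a simple aggregation of the attribute space A.
from dataclasses import dataclass

@dataclass(frozen=True)
class AircraftAttr:
    ac_type: str        # static attribute  (a^A_1)
    capacity_lbs: int   # static attribute  (a^A_2)
    location: str       # dynamic attribute (a^A_3)
    loaded_lbs: int     # dynamic attribute (a^A_4)

def aggregate(a: AircraftAttr, level: int) -> tuple:
    """Coarser levels drop the most dynamic attributes first,
    shrinking the attribute space for approximation purposes."""
    full = (a.ac_type, a.capacity_lbs, a.location, a.loaded_lbs)
    return full[: 4 - level]

a = AircraftAttr("C-17", 106_540, "EDAR", 0)
print(aggregate(a, 0))  # ('C-17', 106540, 'EDAR', 0)
print(aggregate(a, 2))  # ('C-17', 106540)
```

Grouping aircraft by aggregated attributes is what lets algorithmic strategies treat a huge attribute space at several levels of detail at once.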
2.1.2 Resource Layering
In our presentation, aircraft and requirements are primitive resources. We can also create a com-
posite resource consisting of an aircraft and a requirement. We distinguish between the primitive
resource class “aircraft” and the aircraft layer which might consist of an aircraft and a requirement.
We would need to model the composite layer if we need to model information and decisions that
occur after a requirement is loaded onto an aircraft. For example, if a loaded aircraft breaks down,
we face the decision of whether to offload the requirement, or to delay the requirement while the
aircraft is delayed. Such a decision requires modeling a loaded aircraft which has just broken down.
To clearly distinguish between the resource and the resource layer, we use the same letter as a
resource surrounded by a pair of parentheses to denote that resource layer. We let:
(A) = A|R denotes the aircraft layer.
(R) = R denotes the requirement layer (the requirement layer is primitive).
Aircraft and requirements can be coupled to form an aircraft layer (composite resources) that
can be moved toward intermediate stops or the destination and uncoupled back to aircraft and
requirement after the trip. We may also have to uncouple the aircraft from the requirement enroute
if there is a mechanical failure. An aircraft can also be a primitive resource, which we represent
with a^(A) = a^A|a^φ, an aircraft layer that couples an aircraft with a null requirement. Thus, the
attribute vector of the aircraft layer consists of the attribute vectors of both aircraft and requirement,
or of aircraft and the null attribute vector. We define:
A^(A) = A^{A|R} = A^A × A^R, the attribute space of an aircraft layer.
a^(A) = a^{A|R} = a^A|a^R ∈ A^(A), the attribute vector of an aircraft layer.
The requirement layer is a primitive layer with attribute vector of only requirement attributes.
A^(R) = A^R, the attribute space of a requirement layer.
a^(R) = a^R, the attribute vector of a requirement layer.
The attribute space is determined largely by the design of the decision set D, which in turn must
reflect the evolution of information. In Figure 2, an aircraft has to pick up a requirement at i and
move it to j, in the process moving through intermediate airbases i′ and i′′. We can represent this as
a single composite decision where at the completion of the decision, the plane is sitting at j and the
requirement has been delivered. Alternatively, we could represent the primitive decision of simply
moving the aircraft one leg at a time. This requires modeling the aircraft with the requirement at
each intermediate stop. This produces a much larger attribute space, but allows the modeling of
new information as the aircraft arrives at each stop. At the same time, the number of primitive
decisions is much less than that of composite decisions because the number of reachable airbases
is much less than the number of routes between origin and destination. As a result, we typically
Figure 2: Primitive and Composite Actions (an aircraft at origin i moves through intermediate airbases i′ and i′′ to destination j, either as one composite decision or as a sequence of primitive one-leg decisions)
find that there is a tradeoff between the complexity of the attribute space and the complexity of
the decision space.
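The tradeoff above can be illustrated by counting decisions on a toy network: primitive decisions grow with the number of edges (reachable neighbors), while composite decisions must enumerate whole origin-destination routes. The network below is invented for illustration.

```python
# Primitive vs. composite decision counts on a toy airbase network.
graph = {  # i -> i1 -> i2 -> j, plus shortcuts i -> i2 and i1 -> j
    "i": ["i1", "i2"], "i1": ["i2", "j"], "i2": ["j"], "j": [],
}

def primitive_decisions(g):
    """One 'move one leg' decision per edge."""
    return sum(len(nbrs) for nbrs in g.values())

def composite_decisions(g, src, dst):
    """One decision per simple route from src to dst (DFS enumeration)."""
    def paths(node, seen):
        if node == dst:
            return 1
        return sum(paths(n, seen | {n}) for n in g[node] if n not in seen)
    return paths(src, {src})

print(primitive_decisions(graph))            # 5
print(composite_decisions(graph, "i", "j"))  # 3
```

On realistic networks the gap is far larger: routes grow combinatorially with network size, whereas legs grow only linearly, at the price of a larger attribute space to track intermediate stops.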
2.1.3 Time Staging of Information
It is very common that information about activities is available before the activity actually takes
place. This arises when we know that an aircraft is inbound (but has not yet arrived) or because
a requirement has been booked now to be moved in the future. We model the difference between
when we know about something and when it happens using:
R_{t,t′a′} = The number of resources that are knowable at time t, and will be actionable with attribute vector a′ ∈ A at time t′. For the aircraft layer, the unit is the number of aircraft. For requirements, the unit is the number of pounds with attribute vector a′ ∈ A^(R).
R_{tt′} = The resource layer vector that is knowable at time t and actionable at time t′.
       = (R_{t,t′a′})_{a′∈A}
       = (R^(A)_{tt′}, R^(R)_{tt′})
R_t = The resource layer vector that is knowable at time t.
    = (R_{tt′})_{t′≥t}
    = (R^(A)_t, R^(R)_t)
When we refer to the vector R_{tt′}, t is referred to as the knowable time while t′ is referred to as
the actionable time.
Deterministic optimization models such as NRMO assume that all information is known at
time t = 0. Simulation models such as MASS (or AMOS), on the other hand, usually assume that
resources become knowable and actionable at the same time. Although this is not represented ex-
plicitly in a TPFDD, we explicitly model the difference between when a resource becomes knowable
and actionable.
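The two-index resource vector is easy to sketch as a mapping keyed by (knowable time, actionable time). The counts below are invented for illustration.

```python
# A sketch of the two-index resource vector R_{t,t'} for one attribute.
from collections import defaultdict

R = defaultdict(int)
R[(0, 0)] += 3   # 3 aircraft known and actionable now
R[(0, 2)] += 1   # 1 inbound aircraft known now, actionable at t' = 2
R[(1, 1)] += 2   # 2 aircraft that become known and actionable at t = 1

def knowable(R, t):
    """R_t: everything known by time t, regardless of actionable time."""
    return sum(v for (k, _), v in R.items() if k <= t)

def actionable(R, t):
    """Resources that are both known and actionable at time t."""
    return sum(v for (k, a), v in R.items() if k <= t and a <= t)

print(knowable(R, 0), actionable(R, 0))  # 4 3
print(knowable(R, 2), actionable(R, 2))  # 6 6
```

In this encoding, a deterministic model such as NRMO corresponds to making every entry knowable at t = 0, while a standard simulator corresponds to keys with t = t′.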
2.2 Processes
In this section, we model exogeneous information processes, decisions, and system dynamics.
2.2.1 Exogenous Information Processes
We can have information arriving about resources (aircraft entering the system for the first time,
new requirements becoming known for the first time). As before, we capture the property that a
resource that becomes known at time t may not be actionable until time t′ ≥ t.
R_{t,t′a′} = The number of new resources that first become known at time t, and will first become actionable with attribute vector a′ ∈ A at time t′.
R_{tt′} = The vector of new resources that first become known at time t and actionable at time t′.
       = (R_{t,t′a′})_{a′∈A}
       = (R^(A)_{tt′}, R^(R)_{tt′})
R_t = Vector of new resources that first become known to the system at time t.
    = (R_{tt′})_{t′≥t}
    = (R^(A)_t, R^(R)_t)
In addition to R̂_t, we may also wish to model other information processes such as weather delays
or equipment breakdowns. More general notation is to introduce an information variable W_t that
would include R̂_t as well as other exogenous information such as wind speeds, delays at airbases
and the probability of equipment failures. We then let:
Ω = The set of possible outcomes of exogenous information Wt, t ≥ 0.
ω = One sample of the exogenous information (ω ∈ Ω). For simplicity, assume that
outcomes are discrete.
p(ω) = The probability of outcome ω.
2.2.2 Decisions
In addition to exogenous information, the AMC system also evolves as a result of endogenous
information, in the form of a sequence of decisions. Let
C^D = Set of decision classes.
D_c = Set of specific decisions in class c ∈ C^D.
D   = The set of decisions that can be applied to the resource layer.
    = ∪_{c∈C^D} D_c
For example, the airlift problem in this paper has the decision classes:
C^D = {coupling aircraft and requirement, uncoupling aircraft layer, moving aircraft layer,
       do nothing}
The class "moving aircraft layer" might include the decisions:
D_{moving aircraft layer} = {Cannon AFB, Ramstein, Kuwait City, . . .}
It is convenient to define:
D_a = The subset of decisions that can be applied to a resource with attribute vector
      a ∈ A, D_a ⊆ D.
d^φ = The decision to do nothing (hold).
x_tad = The number of times that we apply decision d ∈ D_a to a resource with attribute
        vector a ∈ A during time t. For the aircraft layer, the unit of flow is the number of
        aircraft. For requirements, the unit of flow is the number of pounds with attribute
        vector a ∈ A^{(R)}.
x_t   = (x_tad)_{a∈A, d∈D_a}
Decisions have to satisfy the constraints:

∑_{d∈D_a} x_tad = R_{t,ta}     ∀a ∈ A                      (8)
x_tad ≥ 0 and integer          ∀a ∈ A^{(A)}, d ∈ D_a       (9)
x_tad ≥ 0                      ∀a ∈ A^{(R)}, d ∈ D_a       (10)
Our notation contrasts with the more classical x^k_{ij} notation employed by the multicommodity
flow community. One limitation of the classical notation is that the outcome of the decision (the
destination node j) is deterministic. We may wish to simulate weather delays or en-route equipment
failures, which are most easily modeled using our notation. The x_tad notation is indexed purely by
the information available when the decision is made. The notation also allows us to use aggregation
in a flexible and powerful way, which we illustrate in section 3.5.
We defer until section 2.3 the discussion of how decisions are made. However, the next section
models the impact of decisions on our system.
2.2.3 System dynamics
To capture the effect of a decision, we define the modify function as follows:
M(t, a, d) → (a', c, τ)
In this equation, a ∈ A is the attribute vector of the resource that the decision is being applied
to, d ∈ Da is the decision that acts on the resource, and t is the time at which we make the decision.
After the decision is made, the outcome of the transformation is captured in the triplet (a′, c, τ),
where a′ is the new attribute vector of the modified layered resource, c is the cost of the decision,
and τ is the time it takes to carry out the decision and transform the resource. Having this function,
we can define the following:
c_tad = The cost of applying decision d to a resource with attribute a ∈ A at time t.
c_t   = (c_tad)_{a∈A, d∈D_a}
It is useful to define:

δ_{t,t'a'}(t, a, d) = 1 if M(t, a, d) produces an aircraft layer with attribute vector a' and t + τ = t';
                      0 otherwise.
The dynamics of the layered resource can be modeled as follows:

R_{t+1,t'a'} = R_{t,t'a'} + ∑_{a∈A} ∑_{d∈D_a} δ_{t,t'a'}(t, a, d) x_tad + R̂_{t+1,t'a'}     ∀a' ∈ A, t' > t     (11)

Equation (11) contains three elements on the right-hand side: 1) what we know at time t about the
status of resources for time t'; 2) the endogenous changes to resources for time t' that are made
during period t; and 3) the exogenous changes to resources for time t' that arrive during time t.
We may also express equation (11) in matrix notation as:

R_{t+1} = R_t + A_t x_t + R̂_{t+1}     (12)
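A minimal sketch of the update in equation (11), with a toy modify function of our own devising: each decision is pushed through M, the transformed resource is credited to its new (actionable time, attribute) slot, and the exogenous arrivals are added last.

```python
def modify(t, attr, decision):
    """Toy modify function M(t, a, d) -> (a', cost, tau).
    A 'move' decision relocates the resource in 2 periods at unit cost;
    anything else is treated as 'hold': same place, 1 period, no cost."""
    if decision.startswith("move:"):
        return decision.split(":", 1)[1], 1.0, 2
    return attr, 0.0, 1

def step(R, t, x, R_hat):
    """One period of equation (11).
    R, R_hat: dict (t_act, attr) -> count; x: dict (attr, decision) -> flow."""
    R_next = {k: v for k, v in R.items() if k[0] > t}  # carry resources not yet actionable
    for (attr, d), flow in x.items():                  # endogenous changes
        a2, _cost, tau = modify(t, attr, d)
        key = (t + tau, a2)
        R_next[key] = R_next.get(key, 0) + flow
    for key, n in R_hat.items():                       # exogenous arrivals
        R_next[key] = R_next.get(key, 0) + n
    return R_next

R = {(0, "Dover"): 2, (3, "Ramstein"): 1}
x = {("Dover", "move:Ramstein"): 2}                    # act on the actionable aircraft
R_next = step(R, 0, x, R_hat={(1, "Scott"): 1})
print(R_next)   # {(3, 'Ramstein'): 1, (2, 'Ramstein'): 2, (1, 'Scott'): 1}
```

The three additions inside `step` correspond one-for-one to the three terms on the right-hand side of equation (11).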
2.3 Controls
For the purposes of this paper, we restrict our discussion of controls to the types of decisions that
are being made for our model of the airlift problem, the decision function and the optimization
formulation that forms the basis of our optimizing simulator. For a more complete discussion of
controls, see Powell et al. (2001).
2.3.1 Types of decisions
Decisions that act on resources (these are the only decisions we consider in our model) can be
organized into three fundamental groups: couple (an aircraft and a requirement), modify (move an
aircraft with or without a requirement), and uncouple (remove the requirement from the aircraft).
In our model, we assume a requirement is decoupled when the aircraft arrives at the destination. We
have not explicitly modeled uncoupling at intermediate points (for example, to model the decision
to offload a requirement if an aircraft breaks down), but this would be straightforward. One goal
of our notation is to represent every action on a resource as an attribute-decision (a, d) pair. Not
only is the resulting optimization model quite compact, it also facilitates one of our algorithmic
strategies (see section 3.5).
Our decisions are given by:
D^{A,c}_a   = The set of decisions to couple an aircraft with attribute a ∈ A^A to requirements.
              Each element of D^{A,c}_a is a potential requirement.
            = {d^c_{aa'} : a' ∈ A^R}
D^{A,c}     = The set of decisions to couple an aircraft with a requirement.
            = (D^{A,c}_a)_{a∈A^A}
            = {d^c_{aa'} : a ∈ A^A, a' ∈ A^R}
D^{(A),m}_a = The set of decisions to move an aircraft (layer) with attribute a ∈ A^{(A)}.
            = {d^m_{aj} : j ∈ J}
D^{(A),m}   = (D^{(A),m}_a)_{a∈A^{(A)}}
D^{R,c}_{a'} = The set of decisions to couple a requirement with attribute a' ∈ A^R to an aircraft.
              Each element of D^{R,c}_{a'} is a potential aircraft.
            = {d^c_{a'a} : a ∈ A^A}
D^{R,c}     = (D^{R,c}_{a'})_{a'∈A^R}
            = {d^c_{a'a} : a ∈ A^A, a' ∈ A^R}
Thus,
D^{(A)} = The set of decisions that can be applied to an aircraft layer.
        = D^{A,c} ∪ D^{(A),m} ∪ {d^φ}
D^R     = The set of decisions that can be applied to a requirement.
        = D^{R,c} ∪ {d^φ}
D       = The set of decisions that can be applied to a layered resource (d ∈ D).
        = D^{(A)} ∪ D^R
A coupled aircraft (an aircraft with a requirement) may only be moved or held. A coupled
requirement is modified according to the aircraft it is coupled with, and no other decision can
be made on it.
Let:
a_w = Capacity in weight (pounds) of an aircraft with attribute vector a ∈ A^{(A)}.
The capacity of an aircraft a ∈ A^A coupled with a requirement a' ∈ A^R must be at least the
weight of the requirement a' ∈ A^R coupled with the aircraft a ∈ A^A:

x_{ta'd'} ≤ a_w · x_tad     ∀a ∈ A^A, a' ∈ A^R : d = d^c_{aa'}, d' = d^c_{a'a}     (13)
Coupling takes some time, transforming an empty aircraft into a coupled aircraft layer. Even when
already coupled, the aircraft can still be held (do nothing, d^φ) to wait for a suitable time to take off.
At the destination, we uncouple the coupled aircraft layer. In the middle of a trip, however, the
aircraft layer may also be uncoupled to handle the situation where the aircraft fails during the trip.
In this case, the uncoupled requirement becomes a totally new requirement that needs to be reassigned.
One benefit of adopting the DRTP framework is that it is very easy to extend the model to
handle new types of decisions. For example, in the event that AMC would charter outside aircraft
to move requirements (equivalent to moving a requirement without an (AMC) aircraft), the
only modification is the extension of the decision set D^R:

D^R = D^{R,c} ∪ {d^φ} ∪ D^{R,m},

where,

D^{R,m} = The set of decisions to move a requirement by outside aircraft (not AMC aircraft).
          This set consists of a single element (effectively stating "move the requirement to its
          destination") if we model outside aircraft as an unlimited resource that we are not
          managing.
2.3.2 The decision function
The decision function depends on the available information. Define:
I_t = The information available at time t.
x_t = Vector of decisions for time t.
Π   = The set of policies, π ∈ Π.
X^π_t(I_t) = The function that, under policy π ∈ Π, determines the set of decisions x_t based on
             the information I_t.
When optimizing the AMC system, our challenge is to find the best function X^π_t. Note that
the structure of X^π depends on the information that is available. The most important class of
information is the resource vector R_t, but we might also include current estimates of travel times or
forecasts of future activities.
2.3.3 The optimization formulation
We are now ready to state the overall optimization problem. Let:
C_t(x_t) = The total contribution due to activities initiated in time period t
         = ∑_{a∈A} ∑_{d∈D} c_tad x_tad
Our objective function is linear, although it is fairly straightforward to use nonlinear functions.
The decisions x_t are returned by our decision function X^π_t(I_t), so our challenge is not so much to
choose the best decisions but rather to choose the best policy. Thus, our optimization problem is
given by:

max_{π∈Π} E[ ∑_{t∈T} C_t(X^π_t(I_t)) ]     (14)
This optimization is subject to the following constraints:
• Resource dynamics: Equation (11)
• Resource flow conservation: Equation (8)
• Non-negativity: Equation (9) and (10)
• Coupling (layering): Equation (13)
This formulation forms the basis of our optimizing simulator. In the remainder of the paper, we
describe different policies π, where each corresponds to the availability of a particular information
set (which determines It).
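In practice, the expectation in (14) is estimated by simulation: fix a policy π, simulate it against sampled information, and average the total contribution. The sketch below is illustrative only; the information process and the myopic policy are stand-ins, not the airlift model.

```python
import random

def evaluate_policy(policy, sample_info, T=5, n_samples=200, seed=0):
    """Monte Carlo estimate of E[ sum_t C_t(X^pi_t(I_t)) ] for a fixed policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        contribution = 0.0
        for t in range(T):
            I_t = sample_info(t, rng)        # information available at time t
            x_t, c_t = policy(t, I_t)        # decisions and their contribution
            contribution += c_t
        total += contribution
    return total / n_samples

# Stand-in information process and myopic policy for illustration.
def sample_info(t, rng):
    return {"demand": rng.randint(0, 3)}     # e.g. new requirements this period

def myopic_policy(t, I_t):
    x_t = I_t["demand"]                      # serve everything we can see
    return x_t, 10.0 * x_t                   # contribution of 10 per unit served

print(round(evaluate_policy(myopic_policy, sample_info), 2))
```

Comparing two policies then amounts to running `evaluate_policy` for each against the same information process, which is exactly how the policies of section 3 are compared in the experiments.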
2.4 Comparison of NRMO, MASS, SSDM and DRTP models
The characteristics of the NRMO, MASS, SSDM and DRTP models are listed in Table 1. In this
table, we are focusing on the model (that is, the representation of the physical problem) rather than
the algorithm used to solve the problem. The primary distinguishing feature of these models is how
they have captured the flow of information, but they also differ in areas such as model flexibility,
and the responsiveness to changes in input data. The DRTP representation can produce a linear
programming model such as NRMO if we ignore evolving information processes, or a simulation
model such as MASS or AMOS if we explicitly model the TPFDD as an evolving information
process and code the appropriate rules for making decisions. As such, the DRTP representation
provides a mathematical representation that spans optimization and simulation.
3 A spectrum of decision functions
To build a decision function, we have to specify the information that is available to us when we
make a decision. There are four fundamental classes of information that we may use in making a
decision:
• Knowledge (K_t) - Exogenously supplied information on the state of the system (resources and
  parameters).
• Forecasts of the exogenous information process (Ω) - Forecasts of future exogenous information.
• Forecasts of the impact of decisions on other subproblems (V̄) - These are functions which
  capture the impact of decisions made at one point in time on another.
• Forecasts of decisions (ρ) - Patterns of behavior derived from a source of "expert knowledge."
  These represent a form of forecast of decisions at some level of aggregation.
Our view of information, then, can be viewed as one class of exogenously supplied information (data
knowledge), and three classes of forecasts. In the remainder of this section, we provide examples
and explain the general notation given above.
Category:
• NRMO: Large-scale linear programming
• MASS: Simulation
• SSDM: Multi-stage stochastic programming
• DRTP: Optimizing simulator

Information processes:
• NRMO: Requires knowing all information within the horizon T at time 0. Cannot distinguish
  between the knowable time t and the actionable time t' ≥ t.
• MASS: Assumes the actionable time equals the knowable time. Can, but does not, distinguish
  between the knowable time t and the actionable time t' ≥ t.
• SSDM: Actionable time equals knowable time.
• DRTP: General modeling of knowable and actionable time. At time t, knows the information
  that is actionable at time t' ≥ t.

Attribute space:
• NRMO: Multicommodity flow (attribute includes aircraft type and location)
• MASS: Multi-attribute (location, fuel level, maintenance)
• SSDM: Homogeneous ships and cargo, extendable to multiple ship types
• DRTP: General resource layering

Complexity of system dynamics:
• NRMO: Linear systems of equations
• MASS: Complex system dynamics
• SSDM: Simple linear systems of equations
• DRTP: Complex system dynamics

Number of scenarios:
• NRMO: Single
• MASS: May be multiple
• SSDM: Multiple
• DRTP: May be multiple

Decision selection:
• NRMO: Cost-based
• MASS: Rule-based
• SSDM: Cost-based
• DRTP: Spans from rule-based to cost-based

Information modeling:
• NRMO: Assumes that everything is known
• MASS: Myopic, local information
• SSDM: Assumes that the probability distribution of scenarios is known
• DRTP: General modeling of information

Model behavior:
• NRMO: Reacts to data changes intelligently
• MASS: Noisy response to changes in input data
• SSDM: Similar to LP, but produces robust allocations
• DRTP: Can react with intelligence; will display some noise characteristic of simulation

Table 1: Characteristics of the NRMO, MASS, SSDM and DRTP models

For each information set, there are different classes of decision functions (policies). Since Π is
the set of all possible decision functions that may be specified, in this paper we investigate five
subclasses of Π: rule-based policies, myopic cost-based policies, rolling horizon policies, approximate
dynamic programming policies and expert knowledge policies. These can be classified into three
broad groups: rule-based, cost-based and hybrid. The policies we consider are as follows:
Rule-based policies:
Π^{RB} = The class of rule-based policies (missing information on costs).
Cost-based policies:
Π^{MP} = The class of myopic cost-based policies (includes cost information, but limited to information
         that is knowable now).
Π^{RH} = The class of rolling horizon policies (includes forecasted information). Note that if |Ω| = 1, we
         get a classical deterministic rolling horizon policy, which is widely used in dynamic settings.
Π^{ADP} = The class of approximate dynamic programming policies. We use a functional approximation
          to capture the impact of decisions at one point in time on the rest of the system.
Hybrid policies:
Π^{EK} = The class of policies that use expert knowledge. This is represented using low-dimensional
         patterns expressed in the vector ρ. Policies in Π^{EK} combine rules (in the form of
         low-dimensional patterns) and costs, and therefore represent a hybrid policy.
In the sections below, we describe the information content of different policies. The following
short-hand notation is used:
RB - Rule-based (the policy uses a rule rather than a cost function). All policies that
are not rule-based use an objective function.
MP - Myopic policy which uses only information that is known and actionable now.
RH - Rolling horizon policy, which uses forecasts of activities that might happen in the
future.
R - A single requirement.
RL - A list of requirements.
A - A single aircraft.
AL - A list of aircraft.
ADP - Approximate dynamic programming - this refers to policies that use an approxi-
mation of the value of resources (in particular, aircraft) in the future.
EK - Expert knowledge - these are policies that use patterns to guide behavior.
KNAN - Known now, actionable now - These are policies that only use resources (aircraft
and requirements) that are actionable now.
KNAF - Known now, actionable future - These policies use information about resources
that are actionable in the future.
These abbreviations are used to specify the information content of a policy. The remainder of the
section describes this spectrum of policies.
3.1 Rule-based policies
Our rule-based policy is denoted (RB:R-A) (rule-based, one requirement and one aircraft). In the
Time-Phased Force Deployment Data (TPFDD), we pick the first available requirement and check
the first available aircraft to see whether it can deliver the requirement. In this feasibility
check, we examine whether the aircraft and the requirement are at the same airbase, whether the
aircraft can handle the size of the cargo in the requirement, and whether the en-route and
destination airbases can accommodate the aircraft. If the aircraft cannot deliver the requirement,
we check the second available aircraft, and so on, until we find an aircraft that can. We then
load the requirement onto that aircraft and update the remaining weight of the requirement.
After the first requirement is finished, we move to the second requirement. We continue this
procedure for each requirement in the TPFDD until we finish all the requirements.
See Figure 3 for an outline of this policy.
Step 1: Pick the first available remaining requirement in the TPFDD.
Step 2: Pick the first available remaining aircraft.
Step 3: Do the following feasibility check: Are the aircraft and the requirement at the same
        airbase? Can the aircraft handle the requirement? Can the en-route and destination
        airbases accommodate the aircraft?
Step 4: If the answers in Step 3 are all yes, deliver the requirement with that aircraft and
        update the remaining weight of that requirement.
Step 5: If that requirement is not finished, go to Step 2.
Step 6: If there are remaining requirements, go to Step 1.

Figure 3: Policy (RB:R-A)
This policy is rule-based; it does not use a cost function to make the decision. Policy (RB:R-A)
uses the information set I_t = R_tt, that is, the actionable time is the same as the knowable time.
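The loop of Figure 3 can be sketched in a few lines of code. The feasibility test here is a placeholder for the airbase, cargo-size, and en-route checks described above; the data layout is our own.

```python
def feasible(aircraft, req):
    """Placeholder for Step 3: same airbase and some lift capability."""
    return aircraft["base"] == req["base"] and aircraft["capacity"] > 0

def rule_based_policy(requirements, aircraft):
    """Policy (RB:R-A): first requirement, first feasible aircraft, repeat."""
    assignments = []
    for req in requirements:                   # Step 1 / Step 6
        while req["weight"] > 0:
            for ac in aircraft:                # Step 2
                if feasible(ac, req):          # Step 3
                    lift = min(req["weight"], ac["capacity"])
                    req["weight"] -= lift      # Step 4; Step 5 via the while loop
                    assignments.append((req["id"], ac["id"], lift))
                    break
            else:
                break                          # no feasible aircraft: give up on this requirement
    return assignments

reqs = [{"id": "R1", "base": "Dover", "weight": 150.0}]
fleet = [{"id": "C-17a", "base": "Ramstein", "capacity": 90.0},
         {"id": "C-5a", "base": "Dover", "capacity": 120.0}]
print(rule_based_policy(reqs, fleet))  # [('R1', 'C-5a', 120.0), ('R1', 'C-5a', 30.0)]
```

Note that no cost appears anywhere: the first feasible aircraft wins, which is exactly what distinguishes this policy from the cost-based policies that follow.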
3.2 Myopic cost-based policies
In this section, we describe a series of myopic, cost-based policies which differ in terms of how many
requirements and aircraft are considered at the same time.
Policy (MP:R-AL)
In policy (MP:R-AL) (myopic policy, one requirement and a list of aircraft), we choose the first
requirement that needs to be moved, and then create a list of potential aircraft that might be used
to move the requirement. Now we have a decision vector instead of a scalar (as occurred in our
rule-based system). As a result, we now need a cost function to identify the best out of a set of
decisions. Given a cost function, finding the best aircraft out of a set is a trivial sort. However,
developing a cost function that captures the often unstated behaviors of a rule-based policy can be
a surprisingly difficult task.
There are two flavors of policy (MP:R-AL): known now and actionable now (MP:R-AL/KNAN),
and known now, actionable in the future (MP:R-AL/KNAF). Policy (MP:R-AL/KNAN) uses the
information set I_t = (R_tt, c_t), while policy (MP:R-AL/KNAF) uses the information set
I_t = ((R_tt')_{t'≥t}, c_t). We explicitly include the costs as part of the information set. Solving the
problem requires that we sort the aircraft by contribution and choose the one with the highest
contribution.
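Once a cost function is available, policy (MP:R-AL) reduces to a sort over the aircraft list. A sketch with a made-up contribution function (reward for capacity fit, penalty for repositioning):

```python
def best_aircraft(req, aircraft, contribution):
    """Policy (MP:R-AL): score every aircraft in the list and take the best."""
    scored = [(contribution(ac, req), ac["id"]) for ac in aircraft]
    return max(scored)  # (highest contribution, aircraft id)

# Illustrative contribution: reward usable capacity, penalize repositioning.
def contribution(ac, req):
    reposition_cost = 0.0 if ac["base"] == req["base"] else 25.0
    return min(ac["capacity"], req["weight"]) - reposition_cost

req = {"base": "Dover", "weight": 100.0}
fleet = [{"id": "C-17a", "base": "Dover", "capacity": 90.0},
         {"id": "C-5a", "base": "Ramstein", "capacity": 120.0}]
print(best_aircraft(req, fleet, contribution))  # (90.0, 'C-17a')
```

The sort itself is trivial; as the text notes, the hard part is designing a `contribution` function that reproduces the unstated preferences of a rule-based planner.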
Policy (MP:RL-AL)
This policy is a direct extension of the policy (MP:R-AL). However, now we are matching a
list of requirements to a list of aircraft, which requires that we solve a linear program instead of
a simple sort. The linear program (actually, a small integer program) has to balance the needs of
multiple requirements and aircraft.
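The sketch below illustrates the flavor of this matching step. It enumerates one-to-one matchings by brute force, which is practical only for tiny instances; the actual model solves an integer program, and the contribution function here is invented for illustration.

```python
from itertools import permutations

def best_matching(requirements, aircraft, contribution):
    """Tiny stand-in for the (MP:RL-AL) integer program: enumerate one-to-one
    matchings of requirements to aircraft and keep the best total contribution."""
    best_value, best_plan = float("-inf"), None
    for perm in permutations(range(len(aircraft)), len(requirements)):
        value = sum(contribution(requirements[i], aircraft[j])
                    for i, j in enumerate(perm))
        if value > best_value:
            best_value, best_plan = value, perm
    return best_value, best_plan

contribution = lambda r, a: -abs(r["weight"] - a["capacity"])  # prefer close fits
reqs = [{"weight": 90.0}, {"weight": 120.0}]
fleet = [{"capacity": 120.0}, {"capacity": 90.0}, {"capacity": 60.0}]
value, plan = best_matching(reqs, fleet, contribution)
print(value, plan)  # 0.0 (1, 0): req 0 -> aircraft 1, req 1 -> aircraft 0
```

The point of the example is the balancing act: aircraft 0 is the greedy choice for requirement 0 in isolation, but the joint optimum reserves it for requirement 1.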
There are again two flavors of policy (MP:RL-AL), namely known now and actionable now
(MP:RL-AL/KNAN) as well as known now, actionable in the future (MP:RL-AL/KNAF). In
the policy (MP:RL-AL/KNAN), the information set is I_t = (R_tt, c_t), and in the policy
(MP:RL-AL/KNAF), the information set is I_t = ((R_tt')_{t'≥t}, c_t). If we include aircraft and
requirements that are known now but actionable in the future, then decisions that involve these
resources represent plans that may be changed in the future.

Figure 4: Illustration of the information content of myopic policies. (a) A rule-based policy considers
only one aircraft and one requirement at a time. (b) The policy (R-AL) considers one requirement
but sorts over a list of aircraft. (c) The policy (RL-AL) works with a requirement list and an
aircraft list all at the same time.

Figure 4 illustrates the rule-based and myopic cost-based policies.
3.3 Rolling horizon policy
Our next step is to bring into play forecasted activities, which refer to information that is not
known now.
A rolling horizon policy considers not only all aircraft and requirements that are known now
(and possibly actionable in the future), but also forecasts of what might become known in the
future. In the rolling horizon policy, we use the information that might become available within
the planning horizon: I_t = {(R_{t't''})_{t''≥t'}, c_{t'} : t', t'' ∈ T^{ph}_t}. The structure of the decision
function is typically the same as with a myopic policy (but with a larger set of resources). However,
we generally require the information set I_t to be treated deterministically for practical reasons.
As a result, all forecasted information has to be modeled using point estimates. If we used more
than one outcome, we would be solving sequences of stochastic programs.
3.4 Approximate dynamic programming policy
An approximate dynamic programming policy employs a value function approximation to evaluate
the impact of decisions made in one time period on another time period. The value function
evaluates the reward of the resources in certain state. We define:
Vt(Rt) = The value of resource vector Rt at time t.
We are not able to compute the value function exactly, so we replace it with an approximation
Vt(Rt) which is a linear approximation of the resource vector:
Vt(Rt) =∑t′>t
vt,t′aRt,t′a
To illustrate the use of the value function, consider the example of a requirement that has to move
from England to Australia using either a C-5A or a C-5B. A problem is that the route from England
to Australia involves passing through airbases that are not prepared to repair a C-5B if it breaks
down, which might happen with a 20 percent probability. When a breakdown occurs, additional
costs are incurred to complete the repair, and the aircraft is delayed, possibly producing a late
delivery (and penalties for a late delivery). Furthermore, the amount of delay, and the cost of the
breakdown, can depend on whether there are other comparable aircraft present at those airbases
at the time.
A cost-based policy might assign an aircraft to move this requirement based on the cost of
getting the aircraft to the requirement. It is also possible to include the cost of moving the load,
but this gets more complicated when the additional costs of using the C-5B are purely related to
random elements. A rule-based policy would tend either to ignore the issue, or to include a rule
that completely prevents the use of C-5B's on certain routes. We could resort to Monte Carlo sampling
to represent the random events, but the decision of which aircraft to use needs to be made before
we know whether the aircraft will break down (or the status at the intermediate airbases). Monte
Carlo methods are useful for simulating the dynamics of the system, but are more limited when
simulating the information available to make a decision.
Approximate dynamic programming computes an approximation of the expected value of, say,
a C-5B in England, which would reflect both the costs and potential delays that might be incurred
if the C-5B is used on requirements going to Australia. The values v̄_{t,t'a} would be computed using
observations of the value of a C-5B over multiple iterations. These values can use Monte Carlo
samples of breakdowns, including both repair costs and penalties for late deliveries. The statistics
v̄_{t,t'a} represent an average over different samples, and as such approximate an expectation. A brief
description of how the value function is estimated is given in section 4.
In policy (ADP), we represent the information set as I_t = (R_t, c_t, V̄_t), where V̄_t = (V̄_{tt'})_{t'>t}
represents the set of values v̄_{t,t'a} described earlier.
3.5 Expert knowledge
All mathematical models require some level of simplification, often because of a simple lack of
information. As a result, a solution may be optimal but incorrect in the eyes of a knowledgeable
expert. In effect, the expert knows how the model should behave, reflecting information that is not
available in the model. We represent expert knowledge in the form of low-dimensional patterns,
such as “avoid sending C-5B’s on routes through India.” Simulation models capture this sort of
knowledge easily within their rules, but typically require that the rules be stated as hard constraints,
as in “never send C-5B’s on routes through India.”
To capture these patterns, as well as the pattern aggregation on an attribute or a decision, we
define:

P     = A collection of different patterns.
G^p_a = Aggregation function for pattern p ∈ P that is applied to the attribute vector a.
G^p_d = Aggregation function for pattern p ∈ P that is applied to the decision d.
ā     = An aggregated attribute vector. For pattern p, ā ∈ G^p_a(A).
d̄     = An aggregated decision. For pattern p, d̄ ∈ G^p_d(D).
ρ^p_{ād̄} = The percent of the time that we should act on a resource with attribute in the set
           {a : G^p_a(a) = ā} with a decision in the set {d : G^p_d(d) = d̄}.
ρ^p   = (ρ^p_{ād̄})_{ā∈G^p_a(A), d̄∈G^p_d(D)}
ρ     = (ρ^p)_{p∈P}
It is unlikely that we can match a given pattern exactly (in fact, different pattern classes may
conflict with each other), but it does provide a means for guiding the model. To determine how
well our current solution matches a pattern, we have to express our decision vector x_t and resource
vector R_t at an aggregated level. Using the indicator notation I_X = 1 if X is true (and 0
otherwise), define:

x̄^p_{tād̄}  = G^p(x), the aggregated decision
           = ∑_{a∈A} ∑_{d∈D_a} x_tad · I_{G^p_a(a)=ā} · I_{G^p_d(d)=d̄}
R̄^p_{t,tā} = ∑_{a∈A} R_{t,ta} · I_{G^p_a(a)=ā}, the aggregated resources
           = ∑_{a∈A} ( ∑_{d∈D_a} x_tad ) · I_{G^p_a(a)=ā}
ρ̂^p_{tād̄}  = x̄^p_{tād̄} / R̄^p_{t,tā}, the instantaneous pattern flow
x̄^p_{ād̄}   = ∑_t x̄^p_{tād̄}, the aggregated decision over the entire horizon
R̄^p_ā(x)   = ∑_t R̄^p_{t,tā}(x), the total number of resources with attribute ā over the entire horizon
ρ̂^p_{ād̄}   = x̄^p_{ād̄} / R̄^p_ā, the (average) pattern flow from the model

If we wish our flows to follow the exogenous pattern ρ^p_{ād̄}, then we would like the model pattern
flow ρ̂^p_{ād̄} to be as close as possible to ρ^p_{ād̄}, or equivalently, we want x̄^p_{ād̄} to be as close
as possible to R̄^p_ā ρ^p_{ād̄}. Following Marar & Powell (2002) and Powell et al. (to appear), we
incorporate patterns into our cost model using a proximal point term of the form:

x_t(θ) = X^π_t(θ) = arg max_{x_t∈X_t} ∑_{a∈A} ∑_{d∈D} ( c_tad x_tad + θ ∑_{p∈P} R̄^p_ā (ρ^p_{ād̄} − ρ̂^p_{ād̄})² · I^p_{a,d} )     (15)

where I^p_{a,d} = I_{G^p_a(a)=ā, G^p_d(d)=d̄}. As θ increases, the model places greater emphasis on
trying to match its own flows x̄^p_{ād̄} with the flows that would be predicted by the pattern,
R̄^p_ā ρ^p_{ād̄}. The algorithm for incorporating patterns is based on the work of Marar & Powell
(2002), using a modification introduced in Powell et al. (to appear) to improve stability.
We represent the information content of a knowledge-based decision as I_t = (R_t, c_t, V̄_t, ρ).
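The pattern term in equation (15) can be sketched as follows; the aggregation function, flows and expert pattern here are all invented for illustration, and we score a fixed flow vector rather than optimizing over it.

```python
def pattern_penalty(flows, expert_rho, aggregate, theta=1.0):
    """Proximal-point pattern score: theta * sum over patterns of
    R_bar * (rho_expert - rho_model)^2, as in the penalty term of equation (15).
    flows:      dict (attribute, decision) -> flow x_ad
    expert_rho: dict (abar, dbar) -> desired fraction rho
    aggregate:  (a, d) -> (abar, dbar), playing the role of (G^p_a, G^p_d)."""
    x_bar, R_bar = {}, {}
    for (a, d), flow in flows.items():
        abar, dbar = aggregate(a, d)
        x_bar[(abar, dbar)] = x_bar.get((abar, dbar), 0.0) + flow
        R_bar[abar] = R_bar.get(abar, 0.0) + flow
    penalty = 0.0
    for (abar, dbar), rho in expert_rho.items():
        rho_model = x_bar.get((abar, dbar), 0.0) / R_bar.get(abar, 1.0)
        penalty += R_bar.get(abar, 0.0) * (rho - rho_model) ** 2
    return theta * penalty

# Expert pattern: C-5Bs should route via India at most 10% of the time.
aggregate = lambda a, d: (a.split("@")[0], "via India" if "India" in d else "other")
flows = {("C-5B@England", "move via India"): 4.0,
         ("C-5B@England", "move via Pacific"): 6.0}
rho = {("C-5B", "via India"): 0.1}
print(pattern_penalty(flows, rho, aggregate))   # model flow 0.4 vs target 0.1
```

Subtracting θ times this penalty from the contribution of a candidate flow vector is what nudges the model toward the expert pattern without imposing it as a hard constraint.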
3.6 Discussion
In this section we have introduced a series of policies, each characterized by increasing information
sets. The policies are summarized in Table 2, listed in order of the information content of each
policy.

• Rule-based: information I_t = R_tt; decision function (RB:R-A).
• Myopic cost-based, one requirement to a list of aircraft, known now and actionable now:
  information I_t = (R_tt, c_t); decision function (MP:R-AL/KNAN).
• Myopic cost-based, one requirement to a list of aircraft, known now and actionable in the
  future: information I_t = ((R_tt')_{t'≥t}, c_t); decision function (MP:R-AL/KNAF).
• Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable now:
  information I_t = (R_tt, c_t); decision function (MP:RL-AL/KNAN).
• Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable in
  the future: information I_t = ((R_tt')_{t'≥t}, c_t); decision function (MP:RL-AL/KNAF).
• Rolling horizon: information I_t = {(R_tt')_{t'≥t}, c_{t'} : t' ∈ T^{ph}_t}; decision function (RH).
• Adaptive: information I_t = {(R_tt')_{t'≥t}, c_{t'}, V̄_{tt'} : t' ≥ t}; decision function (ADP).
• Expert knowledge: information I_t = {(R_tt')_{t'≥t}, c_{t'}, V̄_{tt'}, ρ : t' ≥ t}; decision function (EK).

Table 2: Information classes and decision functions for different policies

Research has shown that the adaptive policy ADP can compete with linear programming
solvers on deterministic problems. For single and multi-commodity flow problems, the results are
near-optimal, and they significantly outperform deterministic approximations on stochastic datasets
(Godfrey & Powell (2002), Topaloglu & Powell (2002), Spivey & Powell (to appear)). Since the
algorithmic strategy involves repeatedly simulating the system, these results are achieved without
losing the generality of simulation (but it does require that the simulation be run iteratively). Like
any cost model, the policy based on the ADP information set means that analysts can change the
behavior of the model primarily by changing costs. The final information set, EK (which we have
presented as including the value functions), allows us to manipulate the behavior of the model by
changing the exogenous patterns. Needless to say, the imposition of exogenous patterns will not, in
general, improve the results of the model as measured by the cost function (or even other statistics
such as throughput).
4 Approximating the value function
In this section, we provide a brief introduction to the computation of value function approximations
for resource allocation problems.
It is well known that problems of the form (14) can be solved via Bellman's optimality equations
(see, for example, Puterman (1994)). The optimality equations are:

V_t(R_t) = max_{x_t∈X_t} ( C_t(R_t, x_t) + E[ V_{t+1}(R_{t+1}) | R_t ] )     (16)

where X_t is the feasible region for time period t. The challenge is in computing the value function
V_t(R_t). In this section, we briefly describe a strategy for approximately solving these large-scale
dynamic programs in a computationally tractable way. The discussion here is complete, but brief.
For a more detailed presentation, see Godfrey & Powell (2002) and Powell & Topaloglu (to appear).
Our algorithmic approach involves several strategies. We begin by replacing the value function
with a continuous approximation, which we denote by V̄_t(R_t). Recalling that R_t = (R_tt')_{t'>t},
we first assume that V̄_t(R_t) is separable in R_tt'. This allows us to write:

V_t(R_t) = max_{x_t∈X_t} ( C_t(R_t, x_t) + E[ ∑_{t'≥t+1} V̄_{t+1,t'}(R_{t+1,t'}) | R_t ] )     (17)

where we use V̄_t(R_t) as a placeholder. Even with an approximate value function, the expectation in
equation (17) is typically impossible to compute. Even replacing the expectations with a summation
over a small sample produces intractably large subproblems, which are especially difficult when we
are interested in integer solutions. One strategy is to drop the expectation and solve the problem
for a single sample realization. However, at time t, we do not know Rt+1. Thus, we introduce a
post-decision state variable that is known at time t:

R^x_t = The post-decision state variable at time t. It is the state after the decision x_t
        was made and before the new information R̂_{t+1} has arrived. We follow previous
        convention and let R^x_t = (R^x_{tt'})_{t'>t} and R^x_{tt'} = (R^x_{t,t'a'})_{a'∈A}.

The equation for the post-decision resource vector is given by:

R^x_t = R_t + A_t x_t     (18)

Clearly, R_{t+1} = R^x_t + R̂_{t+1}. The evolution of states, actions and information is given by:

R_0, x_0, R^x_0, R̂_1, R_1, x_1, R^x_1, . . . , R̂_t, R_t, x_t, R^x_t, . . .
When we formulate the optimality recursion around the state variable R^x_t, we obtain:

V_{t−1}(R^x_{t−1}) = E[ max_{x_t∈X_t} ( C_t(R_t, x_t) + V_t(R^x_t) ) | R^x_{t−1} ]     (19)

We may now drop the expectation and solve for a single sample realization:

V̂^n_{t−1}(R^x_{t−1}, ω^n_t) = max_{x_t(ω^n_t)∈X_t(ω^n_t)} ( C_t(R_t, x_t, ω^n_t) + V̄^{n−1}_t(R^x_t(x_t, ω^n_t)) )     (20)
where,

V̄^{n−1}_t = The approximation of the value function V_t at time t and iteration n − 1.
V̂^n_{t−1} = A placeholder holding the temporary value at time t − 1 and iteration n.
Of course, the solution that we obtain from solving this problem is only a sample realization, but
our goal is not to find the optimal action (for a particular outcome) but rather to build the function
X^π_t(R_t) that can be used to find an action given a state R_t. Our "policy" is to solve functions of
the form:

X^π_t(R_t, V̄_t) = arg max_{x_t∈X_t(ω_t)} ( C_t(R_t, x_t, ω_t) + V̄_t(R^x_t(x_t, ω_t)) )     (21)
Choosing the best policy, then, can be viewed as choosing the best function V̄.
We choose a functional form for the value function approximation so that the optimization problem in equation (20) is computationally tractable. It is especially convenient when (20) has the same structure as the myopic problem \max_{x_t \in X_t} C_t(x_t). For example, it is often the case that \max_{x_t \in X_t} C_t(x_t) is a transportation problem. We would like to choose a functional approximation for \bar{V}_t(R^x_t) that retains the basic structure of the original problem as a network problem. Clearly, a linear approximation has this property. The value function approximation would now be given by:
    \bar{V}^n_{t,t'}(R^x_{tt'}) = \sum_{a' \in A} \bar{v}^n_{t'a'} R^x_{t,t'a'}   \forall t' > t        (22)

    \bar{V}^n_t(R^x_t) = \sum_{t' > t} \bar{V}^n_{t,t'}(R^x_{tt'})        (23)
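Equations (22)-(23) amount to a dot product between estimated slopes and resource counts. A minimal sketch, assuming resources and slopes are indexed by (t', attribute) pairs; the names and numbers are illustrative only:

```python
# Sketch of the separable linear approximation (22)-(23): the value of
# the post-decision state is a sum over (t', attribute) keys of an
# estimated slope v-bar times the resource count.

def linear_value(R_x, v_bar):
    """R_x and v_bar are dicts keyed by (t_prime, attribute)."""
    return sum(v_bar.get(key, 0.0) * qty for key, qty in R_x.items())

R_x = {(5, "C-17@OEDR"): 2, (6, "C-5B@KDOV"): 1}
v_bar = {(5, "C-17@OEDR"): 10.0, (6, "C-5B@KDOV"): 4.0}
total = linear_value(R_x, v_bar)   # 2*10.0 + 1*4.0 = 24.0
```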
A linear approximation has the advantage that it is easy to estimate. Let x^n_t be the optimal solution to this problem, and let \hat{v}^n_{t-1,a} be the dual variable for the resource constraint (8). \hat{v}^n_{t-1,a} is, of course, a subgradient of \bar{V}^n_{t-1}(R^x_{t-1}), and therefore forms the basis of a linear approximation of the value function. Because the gradient can fluctuate from iteration to iteration, we perform smoothing:

    \bar{v}^n_{t-1,a} = (1 - \alpha_n) \bar{v}^{n-1}_{t-1,a} + \alpha_n \hat{v}^n_{t-1,a}   \forall a \in A        (24)
In principle, the stepsize should satisfy the standard conditions from stochastic approximation theory: \sum_{n=1}^{\infty} \alpha_n = \infty and \sum_{n=1}^{\infty} (\alpha_n)^2 < \infty. In practice, we have found that a constant stepsize works well, as does the McClain stepsize:

    \alpha_{n+1} = \alpha_n / (1 + \alpha_n - \bar{\alpha})

which declines arithmetically to the limit point \bar{\alpha}, which might be a value such as 0.05.
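The smoothing update (24) and the McClain rule can be sketched together. The snippet below is only an illustration; the starting stepsize and the dual values fed to the update are hypothetical.

```python
# Sketch of the smoothing update (24) driven by the McClain stepsize,
# with alpha_bar = 0.05 as suggested in the text.

def mcclain_stepsizes(alpha0, alpha_bar, n_iters):
    """McClain rule: behaves like 1/n early on, levels off at alpha_bar."""
    alphas = [alpha0]
    for _ in range(n_iters - 1):
        a = alphas[-1]
        alphas.append(a / (1 + a - alpha_bar))
    return alphas

def smooth(v_bar, v_hat, alpha):
    """Equation (24): v-bar^n = (1 - alpha_n) v-bar^{n-1} + alpha_n v-hat^n."""
    return (1 - alpha) * v_bar + alpha * v_hat

alphas = mcclain_stepsizes(1.0, 0.05, 200)
v = 0.0
for a in alphas:               # feed in noiseless duals equal to 10.0
    v = smooth(v, 10.0, a)     # v converges to 10.0
```

Because the stepsize never drops below \bar{\alpha}, the slopes keep adapting in later iterations rather than freezing, which matters when the underlying duals drift.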
This approximation strategy has been shown to produce solutions that are near-optimal when applied to deterministic fleet management problems, and to outperform deterministic approximations when applied to problems under uncertainty (Godfrey & Powell (2002), Topaloglu & Powell (2002)).
5 Numerical experiments
To test the optimizing simulator with policies of different levels of information, we use an unclassified
TPFDD data set for a military airlift problem. The problem is to manage six aircraft types (C-5A,
C-5B, C-17, C-141B, KC-10A and NBC) to move a set of requirements of cargo and passengers
between the USA and Saudi Arabia, where the total weight of the requirements is about four times the total capacity of the fleet. In the simulation, a typical delivery trip (pickup plus loaded movement) needs four days to complete, so all the requirements need roughly 16 days to be delivered if the full capacity of the fleet is used. The simulation horizon is 40 days, divided into four-hour time intervals. Moving a requirement involves being assigned to a route which will bring the
aircraft through a series of intermediate airbases for refueling and maintenance. One of the biggest
operational challenges of these aircraft is that their probability of a failure of sufficient severity to
prevent a timely takeoff ranges between 10 and 25 percent. A failure can result in a delay or even
require an off-loading of the freight to another aircraft. To make the model interesting, we assume
that an aircraft of type C-141B (regardless of whether it is empty or loaded) has a 20 percent
probability of failure, and needs five days to be repaired if it fails at an airbase in region E (airbase
code names starting with E are located primarily in Northern Europe). Failures of all other aircraft types, or at all other airbases, are assumed to be repaired without delay.
The TPFDD file does not capture when the information about a requirement becomes known.
For our experiments, we assumed that requirements are known two days before they have to be
moved. Aircraft, on the other hand, are either actionable now (if they are empty and on the ground)
or are actionable at the end of a trip that is in progress. We assume that there are three types of
costs involved in the military airlift problem: transportation costs (0.01 cent per mile per pound of capacity of an aircraft), aircraft repair costs (12 cents per period per pound of capacity of a broken aircraft) and penalties for late deliveries (4 cents per period per pound of requirement delivered late).
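As a toy illustration of how these three rates combine into a total cost, consider the sketch below; only the rates come from the text, while the mission numbers are fabricated for the example.

```python
# Toy calculator for the three cost components, in dollars.

CENT = 0.01  # dollars

def mission_cost(miles, capacity_lbs, repair_periods, broken_lbs,
                 late_periods, late_lbs):
    transport = 0.01 * CENT * miles * capacity_lbs    # 0.01 cent/mile/lb
    repair = 12 * CENT * repair_periods * broken_lbs  # 12 cents/period/lb
    late = 4 * CENT * late_periods * late_lbs         # 4 cents/period/lb
    return transport + repair + late

# e.g. a 7,000-mile trip by a 150,000 lb capacity aircraft delivering a
# 50,000 lb requirement two periods late, with no repairs:
cost = mission_cost(7000, 150000, 0, 0, 2, 50000)  # 105,000 + 4,000 dollars
```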
We have such a rich family of models that comparing all the policies introduced in section 3 would be unwieldy. To focus on the main idea of this paper, we run the optimizing simulator
on the following policies: (1) rule based, one requirement to one aircraft (RB:R-A), (2) cost based,
one requirement to a list of aircraft that are knowable now/actionable now (MP:R-AL/KNAN),
(3) cost based, a list of requirements to a list of aircraft that are knowable now/actionable now
(MP:RL-AL/KNAN), (4) the above policy but with aircraft that are knowable now/actionable in
the future (MP:RL-AL/KNAF), (5) the approximate dynamic programming policy (ADP) and (6)
the policy that uses expert knowledge (EK) on (MP:RL-AL/KNAF). The first five classes should
provide progressively better solutions as information is added, while the inclusion of expert knowledge improves the solution in terms of user acceptability. We did not explicitly test rolling horizon procedures
since this would have required generating a forecast of future events from the TPFDD. This would
be straightforward in the context of a civilian application such as freight transportation where
historical activities would form the basis of a forecast, but for these applications a historical record
does not exist.
We use three measures of solution quality. The first is the traditional measure of the objective
function. It is important to emphasize that this is an imperfect measure; simulation models capture
a number of issues in the rules that govern behavior that may not be reflected in a cost function.
The second measure is throughput, which is of considerable interest in the study of airlift problems.
Our cost function captures throughput indirectly through costs that penalize late deliveries. Finally,
when we study the use of expert knowledge, we measure the degree to which the model matches
exogenously specified patterns.
Figure 5 shows the costs for each of the first five policies. The total cost is the sum of transportation costs, late delivery costs, and aircraft repair costs. The late delivery costs decrease steadily as the information set grows. The repair costs also decrease, most noticeably under policy (ADP), since this policy learns from the early iterations and avoids sending aircraft to airbases that lead to a longer repair time. However, the resulting detours slightly increase the transportation cost of policy (ADP) relative to policy (MP:RL-AL/KNAF). For the remaining policies, transportation costs decrease as the information sets grow, and the overall total costs decrease as expected.
The throughputs of five different policies (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN),
Figure 5: Costs of different policies. Policy (RB:R-A) is rule based, one requirement to one aircraft. Policy (MP:R-AL/KNAN) is cost based, one requirement to a list of aircraft that are knowable now, actionable now. Policy (MP:RL-AL/KNAN) is cost based, a list of requirements to a list of aircraft that are knowable now, actionable now. Policy (MP:RL-AL/KNAF) is cost based, a list of requirements to a list of aircraft that are knowable now, actionable in the future. Policy (ADP) is the approximate dynamic programming policy.
(MP:RL-AL/KNAF) and (ADP), as well as the cumulative expected throughput, are plotted in Figure 6. The throughput curves follow the same ordering, from right to left: the richer the information class, the faster the delivery, and hence the further to the left the throughput curve lies. Since some of the throughput curves cross each other, we calculate the area between the expected throughput curve and the throughput curve of each policy and plot these areas in Figure 7. These areas measure the lateness of delivery under each policy: the smaller the area, the faster the delivery. From policy (RB:R-A) to (ADP), the areas decrease from 395 million to 119 million pound-days.
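This area statistic can be computed with a simple trapezoidal rule over a common time grid. The curves below are fabricated for illustration, since the real curves come from the simulator.

```python
# Sketch of the lateness measure behind Figure 7: the area, in
# pound-days, between the cumulative expected throughput curve and a
# policy's cumulative throughput curve.

def area_between(times, expected, actual):
    """Trapezoidal area between two cumulative curves on a shared grid."""
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        gap0 = expected[i] - actual[i]
        gap1 = expected[i + 1] - actual[i + 1]
        area += 0.5 * (gap0 + gap1) * dt
    return area

times = [0, 10, 20, 30]             # days
expected = [0.0, 20.0, 40.0, 50.0]  # million pounds delivered
actual = [0.0, 10.0, 30.0, 50.0]    # a slower policy, same endpoint
lateness = area_between(times, expected, actual)   # million pound-days
```

Because the curves are cumulative and end at the same total, the area captures how long freight sat undelivered, which is why it serves as a lateness measure even when the curves cross.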
We tested the inclusion of expert knowledge by using an exogenous pattern to control the
Figure 6: Throughput curves of different policies, showing cumulative pounds delivered over the simulation.
percentage of C-5A's and C-5B's through region E. Starting with the best myopic policy (MP:RL-AL/KNAF), we found that C-5A's and B's went through region E 18.5 percent of the time. We then imposed a single exogenous pattern on the attribute/decision pair (a = C-5A or C-5B) and (d = region E). We varied the exogenous pattern \rho^p_{ad} from 0 to 0.5 in steps of 0.10. For each value of \rho^p_{ad}, we varied the pattern weight \theta over the values 0, 3, 5, 10, 100, 1000.
The results are shown in Figure 8. The horizontal line corresponds to \theta = 0, and we also show
Figure 7: Areas between the cumulative expected throughput curve and the throughput curves of different policies.
a 45 degree dashed line representing the case where the model matches the pattern exactly. For the remaining runs, varying \theta for different values of \rho^p_{ad} produces a series of lines that are bounded by the no-pattern and the exact-match lines. Choosing the correct value of \theta is a subjective judgment. Presumably, the cost function has been carefully designed, which means that deviations from the solution that matches the exogenous pattern may still be acceptable.
These results indicate that we can retain the ability that exists in traditional simulators to
guide the behavior of the cost model using simple rules. It is important to emphasize, however,
that our exogenous patterns are restricted in their complexity. A rule must be a function of the
attribute/decision pair (a, d), which means that the rule may not reflect, for example, information
about other aircraft (the cost model must pick this up).
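The qualitative effect of the pattern weight \theta can be sketched with a simplified penalty term. The model's actual proximal point form follows Marar & Powell (2002); the quadratic penalty below is only a stand-in to convey the behavior, and all names are illustrative.

```python
# Simplified stand-in for the pattern term: adjust the cost of a
# candidate solution by a theta-weighted quadratic penalty on the gap
# between its pattern flow and the expert target rho.

def penalized_cost(base_cost, pattern_flow, rho, theta):
    return base_cost + theta * (pattern_flow - rho) ** 2

# With theta = 0 the pattern is ignored; as theta grows, solutions that
# deviate from rho become increasingly expensive, pushing the model's
# flow toward the expert's target.
costs = [penalized_cost(5.0, 0.185, 0.5, th) for th in (0, 3, 10, 100)]
```

This mirrors the behavior in Figure 8: at \theta = 0 the flow stays at the unconstrained 18.5 percent, while large \theta drives it toward the exact-match line.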
Figure 8: Pattern flow versus the expert knowledge target \rho^p_{ad}, for pattern weights \theta = 0, 3, 5, 10, 100, 1000.
6 Conclusions
This paper demonstrates that we can create a spectrum of models for the military airlift problem,
ranging from a classical simulation through a model based on approximate dynamic programming
that has been shown in other research (on simpler problems) to produce near-optimal solutions (and
solutions that outperform deterministic approximations when uncertainty is present). Finally, we
demonstrate that we can combine the strengths of optimization with rule-based simulation by
incorporating low-dimensional patterns through a proximal point term.
Our experimental work shows that increasing the information content of a decision does in fact
improve performance, both in terms of the objective function (measured in terms of costs) and
throughput. We also showed that we can push the solution in the direction of an exogenously
specified pattern to produce what an expert might describe as more realistic behavior. The last
two policies (ADP and EK) both require running the simulator iteratively, but we were encouraged
that we obtained very reasonable results in just three iterations, suggesting that the additional
computational burden is not too large. We also note that the simulator can be warm-started,
which means that we can perform a series of training iterations after which the results are stored.
Then, the analysis of new scenarios can be performed with fewer iterations.
References
Baker, S., Morton, D., Rosenthal, R. & Williams, L. (2002), 'Optimizing military airlift', Operations Research 50(4), 582-602.
Brooke, A., Kendrick, D. & Meeraus, A. (1992), GAMS: A User's Guide, 2.25 edn, Duxbury Press/Wadsworth Publishing Company, Belmont, CA.
Burke, J. F. J., Love, R. J. & Macal, C. M. (2002), 'Modelling force deployments from army installations using the transportation system capability (TRANSCAP) model: A standardized approach', submitted to Mathematical and Computer Modelling.
Crino, J. R., Moore, J. T., Barnes, J. W. & Nanry, W. P. (2002), 'Solving the theater distribution vehicle routing and scheduling problem using group theoretic tabu search', submitted to Mathematical and Computer Modelling.
Dantzig, G. & Ferguson, A. (1956), 'The allocation of aircrafts to routes: An example of linear programming under uncertain demand', Management Science 3, 45-73.
Ferguson, A. & Dantzig, G. B. (1955), 'The problem of routing aircraft - a mathematical solution', Aeronautical Engineering Review 14, 51-55.
Godfrey, G. & Powell, W. B. (2002), 'An adaptive, dynamic programming algorithm for stochastic resource allocation problems I: Single period travel times', Transportation Science 36(1), 21-39.
Goggins, D. A. (1995), Stochastic modeling for airlift mobility, Master's thesis, Naval Postgraduate School, Monterey, CA.
Granger, J., Krishnamurthy, A. & Robinson, S. M. (2001), Stochastic modeling of airlift operations, in B. A. Peters, J. S. Smith, D. J. Medeiros & M. W. Rohrer, eds, 'Proceedings of the 2001 Winter Simulation Conference'.
Killingsworth, P. & Melody, L. J. (1997), Should C-17s be deployed as theater assets?: An application of the CONOP air mobility model, Technical Report RAND/DB-171-AF/OSD, RAND Corporation.
Marar, A. & Powell, W. B. (2002), Using static flow patterns in time-staged resource allocation problems, Technical report, Princeton University, Department of Operations Research and Financial Engineering.
Mattock, M. G., Schank, J. F., Stucker, J. P. & Rothenberg, J. (1995), New capabilities for strategic mobility analysis using mathematical programming, Technical report, RAND Corporation.
Midler, J. L. & Wollmer, R. D. (1969), 'Stochastic programming models for airlift operations', Naval Research Logistics Quarterly 16, 315-330.
Morton, D. P., Rosenthal, R. E. & Lim, T. W. (1996), 'Optimization modeling for airlift mobility', Military Operations Research, 49-67.
Morton, D. P., Salmeron, J. & Wood, R. K. (2002), 'A stochastic program for optimizing military sealift subject to attack', working paper.
Niemi, A. (2000), Stochastic modeling for the NPS/RAND Mobility Optimization Model, Department of Industrial Engineering, University of Wisconsin-Madison. Available: http://ie.engr.wisc.edu/robinson/Niemi.htm.
Powell, W. B. & Topaloglu, H. (to appear), Stochastic programming in transportation and logistics, in A. Ruszczynski & A. Shapiro, eds, 'Handbook in Operations Research and Management Science, Volume on Stochastic Programming', North Holland, Amsterdam.
Powell, W. B., Shapiro, J. A. & Simao, H. P. (2001), A representational paradigm for dynamic resource transformation problems, in R. F. C. Coullard & J. H. Owens, eds, 'Annals of Operations Research', J.C. Baltzer AG, pp. 231-279.
Powell, W. B., Wu, T. T. & Whisman, A. (to appear), 'Using low dimensional patterns in optimizing simulators: An illustration for the airlift mobility problem', Mathematical and Computer Modelling.
Puterman, M. L. (1994), Markov Decision Processes, John Wiley and Sons, Inc., New York.
Rosenthal, R., Morton, D., Baker, S., Lim, T., Fuller, D., Goggins, D., Toy, A., Turker, Y., Horton, D. & Briand, D. (1997), 'Application and extension of the Thruput II optimization model for airlift mobility', Military Operations Research 3(2), 55-74.
Schank, J. F., Stucker, J. P., Mattock, M. G. & Rothenberg, J. (1994), New capabilities for strategic mobility analysis: Executive summary, Technical report, RAND Corporation.
Schank, J., Mattock, M., Sumner, G., Greenberg, I. & Rothenberg, J. (1991), A review of strategic mobility models and analysis, Technical report, RAND Corporation.
Spivey, M. & Powell, W. B. (to appear), 'The dynamic assignment problem', Transportation Science.
Stucker, J. P. & Berg, R. T. (1999), Understanding airfield capacity for airlift operations, Technical report, RAND Corporation.
Topaloglu, H. & Powell, W. B. (2002), Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems, Technical report, Princeton University, Department of Operations Research and Financial Engineering.
Wing, V., Rice, R. E., Sherwood, R. & Rosenthal, R. E. (1991), Determining the optimal mobility mix, Technical report, Force Design Division, The Pentagon, Washington D.C.
Yost, K. A. (1994), The thruput strategic airlift flow optimization model, Technical report, Air Force Studies and Analyses Agency, The Pentagon, Washington D.C.