
Distributionally Robust Linear and Discrete Optimization with Marginals

Louis Chen¹, Will Ma¹, Karthik Natarajan³, James Orlin¹, David Simchi-Levi¹˒², Zhenzhen Yan⁴

¹ Operations Research Center, Massachusetts Institute of Technology

² Institute for Data, Systems, and Society, Massachusetts Institute of Technology

³ Singapore University of Technology and Design

⁴ Nanyang Technological University

October 2018


Outline of Talk

1 Motivation: Distributionally Robust Max Flow as an Example

2 Main Results and Connection to Other Results


Optimization under Uncertainty

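Stochastic Optimization:

\[ \inf_{r \in C} \; \mathbb{E}_{u \sim \theta}\left[ Z(r, u) \right] \]

Robust Optimization:

\[ \inf_{r \in C} \; \sup_{u \in \Omega} \; Z(r, u) \]

Distributionally Robust Optimization:

\[ \inf_{r \in C} \; \sup_{\theta \in P} \; \mathbb{E}_{u \sim \theta}\left[ Z(r, u) \right] \]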

Specification of P: Moments, Marginals, Distributions within some distance around a “reference” distribution - often guided by expressive power, computational and calibration issues
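To make the three formulations concrete, here is a minimal sketch on finite sets. The decision set, scenario set, cost function and the two distributions are all hypothetical, and a two-element list stands in for the ambiguity set P.

```python
# Toy comparison of the three formulations on finite sets.
# Every number and the cost function below are hypothetical.

decisions = [0, 1, 2]          # candidate decisions r in C
scenarios = [0, 1, 2]          # support Omega of the uncertain u


def cost(r, u):
    """Illustrative cost Z(r, u): absolute mismatch."""
    return abs(r - u)


def expected_cost(r, theta):
    """E_{u ~ theta}[Z(r, u)] for a probability vector theta over scenarios."""
    return sum(p * cost(r, u) for p, u in zip(theta, scenarios))


reference = [0.5, 0.3, 0.2]                      # the single distribution theta
ambiguity_set = [reference, [0.2, 0.3, 0.5]]     # finite stand-in for P

stochastic = min(expected_cost(r, reference) for r in decisions)
robust = min(max(cost(r, u) for u in scenarios) for r in decisions)
dro = min(max(expected_cost(r, th) for th in ambiguity_set) for r in decisions)

print(stochastic, dro, robust)
```

Because the reference distribution belongs to the ambiguity set and every distribution is supported on the scenario set, the three values are ordered: stochastic ≤ distributionally robust ≤ robust.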


Problem of Interest

$P := \Gamma(\mu_1, \ldots, \mu_n)$ is the set of joint distributions consistent with given marginals $\mu_1, \ldots, \mu_n$.

Suppose $X$ is some finite set (potentially very large and might not be explicitly specified) where:

\[ Z(u) = \max_{\chi \in X} \; u^\top \chi \]

We are interested in:

\[ \sup_{\theta \in P} \; \mathbb{E}_{u \sim \theta}\left[ Z(u) \right] \]

Consider a distributionally robust max flow problem where $r$ is a capacity decision vector which needs to be made on arcs in addition to a random capacity vector $u$, and $Z(u + r)$ is the random maximum flow on the network:

\[ \max_{r \in C} \; \inf_{\theta \in P} \; \mathbb{E}_{u \sim \theta}\left[ Z(u + r) \right] \]


Distributionally Robust Max Flow

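(Max-flow)

\[
\begin{aligned}
Z(u) = \max \quad & v \\
\text{s.t.} \quad & \sum_{j:(i,j) \in A} x_{ij} - \sum_{j:(j,i) \in A} x_{ji} =
\begin{cases} v, & i = s \\ 0, & i \neq s, t \\ -v, & i = t \end{cases} \\
& 0 \le x_{ij} \le u_{ij}, \quad (i,j) \in A
\end{aligned}
\]

(Min-cut)

\[ Z(u) = \min \; u^\top \chi \quad \text{s.t. } \chi \in X_{\mathrm{cut}}. \]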
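The equality of the two formulations can be checked numerically on a small instance. The sketch below is only an illustration: the 4-node network and its capacities are hypothetical, the max flow is computed with a plain Edmonds-Karp routine, and the min cut is found by brute-force enumeration of node sets, which is viable only for tiny graphs.

```python
from collections import deque
from itertools import combinations


def max_flow(n, cap, s, t):
    """Edmonds-Karp on nodes 0..n-1; cap maps arc (i, j) -> capacity."""
    res = {}
    for (i, j), c in cap.items():
        res[(i, j)] = res.get((i, j), 0) + c
        res.setdefault((j, i), 0)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            i = queue.popleft()
            for j in range(n):
                if j not in parent and res.get((i, j), 0) > 0:
                    parent[j] = i
                    queue.append(j)
        if t not in parent:
            return flow
        # Recover the path, find its bottleneck, then augment.
        path, j = [], t
        while parent[j] is not None:
            path.append((parent[j], j))
            j = parent[j]
        delta = min(res[a] for a in path)
        for (i, j) in path:
            res[(i, j)] -= delta
            res[(j, i)] += delta
        flow += delta


def min_cut(n, cap, s, t):
    """Brute force over all node sets containing s but not t."""
    others = [v for v in range(n) if v not in (s, t)]
    best = float("inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            side = {s, *extra}
            value = sum(c for (i, j), c in cap.items()
                        if i in side and j not in side)
            best = min(best, value)
    return best


# Hypothetical network with source 0 and sink 3.
capacities = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
flow_value = max_flow(4, capacities, 0, 3)
cut_value = min_cut(4, capacities, 0, 3)
print(flow_value, cut_value)
```

Each node set corresponds to one cut-set incidence vector χ, so the enumeration is exactly the minimization of uᵀχ over the cut family.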


Distributionally Robust Max Flow with Marginals


Dual Form: A Lower Bound

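Let $w$ be a vector of arc capacities and $\theta$ be a probability measure in the set $\Gamma$.

We obtain a lower bound on the expected maximum flow:

\[ \mathbb{E}_{u \sim \theta}\left[ Z(w) - Z(u) \right] \le \sum_{(i,j) \in A} \int \max(w_{ij} - u_{ij}, 0) \, d\mu_{ij} \]

\[ \implies \quad \mathbb{E}_{u \sim \theta}\left[ Z(u) \right] \ge \max_{w} \left\{ Z(w) - \sum_{(i,j) \in A} \int \max(w_{ij} - u_{ij}, 0) \, d\mu_{ij} \right\} \]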
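A small numerical check of this lower bound, under stated assumptions: the network is a hypothetical pair of arcs in series (so the max flow is the smaller capacity), the marginals are two-point distributions chosen for illustration, and a coarse grid search replaces a proper concave maximization over w.

```python
from itertools import product

# Hypothetical instance: two arcs in series, so the max flow is min(u1, u2).
marginals = [
    {1.0: 0.5, 3.0: 0.5},   # distribution of u_1: value -> probability
    {2.0: 0.5, 4.0: 0.5},   # distribution of u_2
]


def Z(u):
    """Max flow through two arcs in series."""
    return min(u)


def dual_value(w):
    """Z(w) - sum_ij E[max(w_ij - u_ij, 0)]; needs only the marginals."""
    penalty = sum(p * max(w_ij - v, 0.0)
                  for w_ij, mu in zip(w, marginals)
                  for v, p in mu.items())
    return Z(w) - penalty


# Coarse grid search over w (a stand-in for a concave maximization routine).
grid = [k / 4 for k in range(0, 21)]            # 0, 0.25, ..., 5
best_bound = max(dual_value(w) for w in product(grid, repeat=2))

# Expected max flow under one member of Gamma: the independent coupling.
independent = sum(p1 * p2 * Z((v1, v2))
                  for v1, p1 in marginals[0].items()
                  for v2, p2 in marginals[1].items())

print(best_bound, independent)
```

The bound holds for every coupling in Γ, so it cannot exceed the value obtained under independence; it depends on the joint distribution only through univariate expectations.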



Primal Form: An Upper Bound

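Let $(u, \chi)$ be a random vector, where $u$ is consistent with the marginals and $\chi$ is a random cut-set incidence vector.

\[ \inf_{\theta \in \Gamma} \mathbb{E}_{u \sim \theta}\left[ Z(u) \right] = \inf_{\theta \in \Gamma} \mathbb{E}_{u \sim \theta}\left[ \min_{\chi \in X_{\mathrm{cut}}} u^\top \chi \right] \le \mathbb{E}_{(u,\chi)}\left[ u^\top \chi \right] \]

Search over $(u, \chi)$ for the tightest bound:

1 Find a distribution $\nu \in P(X_{\mathrm{cut}})$ for $\chi$. ($\mathrm{conv}(X_{\mathrm{cut}}) \equiv P(X_{\mathrm{cut}})$)

2 Construct $u \mid \chi$ (in minimal fashion) s.t. $u$ is a coupling consistent with the marginals

How does the construction work in step 2? Use $u \mid \chi = \bigotimes_{(i,j)} u_{ij} \mid \chi_{ij}$ (conditionally independent) where $\chi_{ij} = 1$ w.p. $\Pi_{ij}\nu(1)$ and $\chi_{ij} = 0$ w.p. $\Pi_{ij}\nu(0)$.

How do we define $u_{ij} \mid \chi_{ij}$ in a minimizing fashion?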



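Such feasible couplings yield an upper bound of the form:

\[
\begin{aligned}
\inf_{\theta \in \Gamma} \mathbb{E}_{u \sim \theta}\left[ Z(u) \right]
&\le \mathbb{E}_{(u,\chi)}\left[ u^\top \chi \right] \\
&= \mathbb{E}_{\chi} \sum_{(i,j) \in A} \mathbb{E}\left[ u_{ij}\chi_{ij} \mid \chi \right] \\
&= \mathbb{E}_{\chi} \sum_{(i,j) \in A} \mathbb{E}\left[ u_{ij}\chi_{ij} \mid \chi_{ij} \right] \\
&= \sum_{(i,j) \in A} \Pi_{ij}\nu(1) \, \mathbb{E}\left[ u_{ij} \mid \chi_{ij} = 1 \right]
\end{aligned}
\]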



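\[ \inf_{\theta \in \Gamma} \mathbb{E}_{u \sim \theta}\left[ Z(u) \right] \le \min_{\nu \in P(X_{\mathrm{cut}})} \sum_{(i,j) \in A} \int_{0}^{\Pi_{ij}\nu(1)} F^{-1}_{\mu_{ij}}(q) \, dq. \]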
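The quantile-integral bound can be evaluated directly for discrete marginals. The sketch below assumes a hypothetical network of two arcs in series (its only s-t cuts are the two single arcs) with illustrative two-point marginals, and compares the bound against an exhaustive scan over couplings, which is possible here because couplings of two-point marginals form a one-parameter family.

```python
# Hypothetical instance: two arcs in series; the s-t cuts are {arc 1} and {arc 2}.
marginals = [
    [(1.0, 0.5), (3.0, 0.5)],   # u_1 as (value, probability), ascending values
    [(2.0, 0.5), (4.0, 0.5)],   # u_2
]


def quantile_integral(dist, p):
    """Integral of the quantile function F^{-1}(q) over q in [0, p]."""
    total, used = 0.0, 0.0
    for value, prob in dist:                  # atoms in ascending order
        take = min(prob, max(p - used, 0.0))  # mass of this atom below level p
        total += value * take
        used += prob
    return total


def primal_value(q):
    """Bound for nu placing mass q on cut {arc 1} and 1 - q on cut {arc 2}."""
    return (quantile_integral(marginals[0], q)
            + quantile_integral(marginals[1], 1.0 - q))


steps = 100
primal_bound = min(primal_value(k / steps) for k in range(steps + 1))


def expected_min(t):
    """E[min(u1, u2)] for the coupling with P(u1 = 3, u2 = 4) = t."""
    joint = {(3.0, 4.0): t, (3.0, 2.0): 0.5 - t,
             (1.0, 4.0): 0.5 - t, (1.0, 2.0): t}
    return sum(p * min(u) for u, p in joint.items())


# Every coupling of these marginals is indexed by t in [0, 0.5].
worst_case = min(expected_min(k / 200) for k in range(101))

print(primal_bound, worst_case)
```

On this instance the bound and the exhaustive worst case coincide, in line with the primal-dual picture: each arc in the random cut takes its lowest quantiles.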


Primal-Dual Formulations

Dual:

\[ \max_{w} \; Z(w) - \sum_{(i,j) \in A} \int \max(w_{ij} - u_{ij}, 0) \, d\mu_{ij} \]

Concave maximization with a max-flow problem with deterministic capacities that needs to be optimized

Univariate expectations

Primal:

\[ \min_{\nu \in P(X_{\mathrm{cut}})} \sum_{(i,j) \in A} \int_{0}^{\Pi_{ij}\nu(1)} F^{-1}_{\mu_{ij}}(q) \, dq \]

Convex minimization over the s-t cut polytope

Suffices to find a distribution $\nu^*$ over the family of s-t cuts

$\nu^* \mapsto \theta^*$

How to find $\nu^*$?


Finding ν∗ Under Finite-Supported Marginals


Decoding ν∗ from π∗

We can restrict to probability measures supported on “nested” collections of (s-t)-cuts (submodularity of cut-capacity function).


Main Results

Theorem (Maximization Form)

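Let

\[ Z^* := \max_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}_{c \sim \theta}\left[ \max_{x \in X} c^\top x \right], \]

where $\mathbb{E}_{\mu_i}|c_i| < \infty$ for all $i$. Then,

\[
\begin{aligned}
Z^* &= \max_{\nu \in P(X)} \sum_{i=1}^{n} \sum_{x_i \in X_i} x_i \int_{\Pi_i\nu((-\infty, x_i - e_i])}^{\Pi_i\nu((-\infty, x_i])} F^{-1}_{\mu_i}(t) \, dt, \\
&= \min_{\{\psi_i : X_i \to \mathbb{R}\}_{i=1}^{n}} \left( \max_{\nu \in P(X)} \sum_{x \in X} \left[ \sum_{i=1}^{n} \psi_i(x_i) \right] \nu(x) + \sum_{i=1}^{n} \int \psi_i^*(c_i) \, d\mu_i \right),
\end{aligned}
\]

where $\psi_i^*(c_i) = \max_{x_i} (c_i x_i - \psi_i(x_i))$. Let $\nu_{rel}$ denote the optimal solution. If $\mu_1, \ldots, \mu_n$ are absolutely continuous w.r.t. the Lebesgue measure, there exists some suitably defined measurable selection $x^* : \mathbb{R}^n \to X$ of $x_{OPT}$ s.t. the “persistence values” $\mathbb{P}_{c \sim \theta^*}(x_i^*(c) = x_i)$ are given by:

\[ \mathbb{P}_{c \sim \theta^*}(x_i^*(c) = x_i) = \Pi_i\nu_{rel}(x_i), \quad \forall x_i \in X_i, \ \forall i \in [n]. \]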
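As a sanity check, the first formula can be specialized to 0-1 problems. This is a sketch under two assumptions not stated on the slide: every coordinate set X_i is {0, 1}, and x_i − e_i is read as the support point just below x_i. The x_i = 0 term then vanishes, and for x_i = 1 the integral runs from Π_iν((−∞, 0]) = 1 − Π_iν(1) up to Π_iν((−∞, 1]) = 1:

```latex
Z^* \;=\; \max_{\nu \in P(X)} \; \sum_{i=1}^{n} \int_{1 - \Pi_i\nu(1)}^{1} F^{-1}_{\mu_i}(t)\, dt
\qquad \text{when } X \subseteq \{0,1\}^n .
```

This mirrors the min-cut bound from the motivating example: there the worst case assigns each selected arc its lowest quantiles, over [0, Π_ijν(1)], whereas in the maximization form each selected coordinate receives its highest quantiles.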


Connection to Previous Results

Natarajan, Song and Teo (2009) provided the primal formulation for absolutely continuous random variables - the result can be generalized to arbitrary marginals using techniques from optimal transport theory.

Under the assumption of absolutely continuous marginals, one can show that if $\nu$ and $\tau$ both solve the primal formulation, then $\Pi_i\nu = \Pi_i\tau$ for all $i$.

This provides justification to use it in choice modeling - for example the model can recreate the multinomial logit choice probabilities for an appropriate choice of $\Gamma$ (see Mishra, Natarajan, Padmanabhan, Teo and Li (2014)).

Natarajan, Song and Teo (2009) study $Z^*$ where $X$ is the feasible region to a bounded integer program (such as integer knapsack). Using a binary reformulation, they obtain tractable upper bounds, but the complexity of finding $Z^*$ is not discussed.


Main Results: Hardness

Proposition (Related to Mangasarian and Shiau (1986))

Computing $Z^*$ for the class of linear optimization problems given discrete marginal distributions and an H-polytope is NP-hard.

Related results:

Computing the worst-case expected value of a function of binary random variables with fixed marginal probabilities is NP-hard, even when the function is submodular - MAX-CUT problem (see Agrawal, Ding, Saberi and Ye (2012)).

Computing the worst-case expected value in distributionally robust linear optimization problems with a given mean and covariance matrix is NP-hard - 2-norm maximization over a polytope (see Bertsimas, Doan, Natarajan and Teo (2010)).


Main Results: Tractable Instances

Assume that the expected value of the univariate convex functions $\max_{x_i} (c_i x_i - \psi_i(x_i))$ and the subgradients are efficiently computable.

$X$ is described as a V-polytope:

\[ \max_{\nu \in P(X)} \sum_{x \in X} \left[ \sum_{i=1}^{n} \psi_i(x_i) \right] \nu(x) = \max_{x \in X} \sum_{i=1}^{n} \psi_i(x_i). \]

$X$ describes the extreme points of a 0-1 H-polytope ($P = \mathrm{conv}(X)$):

\[ \max_{\nu \in P(X)} \sum_{x \in X} \left[ \sum_{i=1}^{n} \psi_i(x_i) \right] \nu(x) = \max_{x \in P} \sum_{i=1}^{n} \psi_i(1) x_i. \]

Example: Max-flow (min-cut)

Prior research: Meilijson and Nadas (1979) - longest path on a directed acyclic graph (PERT), Birge and Maddox (1995) - PERT with marginal moments, Bertsimas, Natarajan and Teo (2004, 2006) - 0/1 optimization problem with marginal moments
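For an explicitly listed X (the V-polytope case), the dual objective from the main theorem can be evaluated by direct enumeration. The sketch below is a toy illustration only: the feasible set (pick exactly one of two items), the two-point marginals and the one-parameter family of functions ψ_i are all hypothetical choices, and a grid scan replaces a subgradient method.

```python
# Hypothetical instance: pick exactly one of two items, X = {(1, 0), (0, 1)}.
X = [(1, 0), (0, 1)]
marginals = [
    {0.0: 0.5, 2.0: 0.5},   # c_1: value -> probability
    {0.0: 0.5, 2.0: 0.5},   # c_2
]


def conjugate(psi_i, c_i):
    """psi_i^*(c_i) = max over x_i in {0, 1} of c_i * x_i - psi_i(x_i)."""
    return max(c_i * x_i - psi_i[x_i] for x_i in (0, 1))


def dual_objective(psi):
    """max_{x in X} sum_i psi_i(x_i) + sum_i E[psi_i^*(c_i)]."""
    inner = max(sum(psi_i[x_i] for psi_i, x_i in zip(psi, x)) for x in X)
    expectations = sum(p * conjugate(psi_i, c)
                       for psi_i, mu in zip(psi, marginals)
                       for c, p in mu.items())
    return inner + expectations


# One-parameter family psi_i(0) = 0, psi_i(1) = z, scanned over a grid of z.
candidates = [k / 10 for k in range(0, 31)]
dual_bound = min(dual_objective([{0: 0.0, 1: z}, {0: 0.0, 1: z}])
                 for z in candidates)

# A feasible coupling: c_1 = 2 exactly when c_2 = 0, each with probability 1/2.
coupling = {(2.0, 0.0): 0.5, (0.0, 2.0): 0.5}
coupling_value = sum(p * max(c[0] * x[0] + c[1] * x[1] for x in X)
                     for c, p in coupling.items())

print(dual_bound, coupling_value)
```

Any ψ gives an upper bound on Z* and any coupling gives a lower bound, so matching values certify optimality on this instance. The inner maximum over ν reduces to a maximum over the listed points of X, as in the V-polytope identity.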


Main Results: Tractable Instances

Theorem

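Suppose there exists a compact extended formulation of $\mathrm{conv}(X)$ as:

\[ \Pi_x \left\{ (x, y) : y \in P, \ x_i = \sum_{x_i \in X_i} x_i \sum_{j=1}^{n_{x_i}} y_{F^{x_i}_j} \ \text{for } F^{x_i}_j \in B, \ \forall j, \ \forall i \right\}, \]

where $n_{x_i}$ is a finite integer for each $i$, $x_i \in X_i$ and $P$ is a 0-1 polytope of the form:

\[ P \subseteq \left\{ y \in [0, 1]^{B} : \sum_{x_i \in X_i} \sum_{j=1}^{n_{x_i}} y_{F^{x_i}_j} = 1, \ \forall i \in [n] \right\}, \]

then $Z^*$ is efficiently computable.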


Main Results: Appointment Scheduling

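$n$ patients, service time of patient $i$ is random with $c_i \sim \mu_i$, $s_i$ is the service time scheduled for patient $i$:

\[ \min_{s \in S} \; \sup_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}_{c \sim \theta}\left[ Z(s, c) \right], \]

where:

\[ S = \left\{ s \in \mathbb{R}^n : \sum_{i=1}^{n} s_i \le T, \ s_i \ge 0 \ \forall i \in [n] \right\} \]

\[
\begin{aligned}
Z(s, c) = \max \quad & \sum_{i=1}^{n} (c_i - s_i) x_i \\
\text{s.t.} \quad & x_i - x_{i-1} \ge -1, \quad \forall i = 2, \ldots, n, \\
& x_n \le 1, \\
& x_i \ge 0, \quad \forall i = 1, \ldots, n
\end{aligned}
\]

In this case, you can find such a compact extended formulation (this was first identified by Mak, Rong and Zhang (2016) with mean and variance information).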
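To get a feel for Z(s, c), the sketch below evaluates the inner linear program on tiny instances by scanning integer points; this relies on the constraints being difference constraints, so the polytope has integer vertices. It also compares the result with a Lindley-type recursion for total waiting time plus overtime. That interpretation (unit waiting and overtime costs) is an assumption added here for illustration, not something stated on the slide, and all numbers are hypothetical.

```python
from itertools import product


def z_by_enumeration(s, c):
    """Maximize sum (c_i - s_i) x_i over integer points of the feasible region."""
    n = len(s)
    # x_n <= 1 and x_{i-1} <= x_i + 1 force x_i <= n - i + 1 (1-indexed i).
    ranges = [range(n - i + 1) for i in range(n)]
    best = float("-inf")
    for x in product(*ranges):
        if x[-1] <= 1 and all(x[i] - x[i - 1] >= -1 for i in range(1, n)):
            value = sum((ci - si) * xi for ci, si, xi in zip(c, s, x))
            best = max(best, value)
    return best


def z_by_recursion(s, c):
    """Total delay: each patient's overrun is carried to the next slot."""
    wait, total = 0.0, 0.0
    for ci, si in zip(c, s):
        wait = max(0.0, wait + ci - si)   # delay handed to the next patient
        total += wait                     # last term is the overtime
    return total


schedule = [1.0, 1.0, 1.0]      # hypothetical scheduled durations s
durations = [2.0, 0.5, 1.5]     # hypothetical realized durations c
print(z_by_enumeration(schedule, durations), z_by_recursion(schedule, durations))
```

The enumeration is exponential in n and only meant for checking small cases; the recursion is linear in n.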


Main Results: Scheduling with Ranking

$n$ jobs, single machine, duration of job $i$ is random with $c_i \sim \mu_i$, $t_i$ is the amount by which job duration is reduced. This is decided before knowing the true realization or arrival sequence and the objective is to minimize sum of completion times:

\[ \min_{t \in T} \; \sup_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}_{c \sim \theta}\left[ Z(t, c) \right], \]

where:

\[ Z(t, c) := \max \; \sum_{i=1}^{n} (c_i - t_i) x_i \quad \text{s.t. } x \in X_{\mathrm{perm}}. \]

In this case, you can find such a compact extended formulation using the Birkhoff polytope for permutations.
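For a fixed realization, the inner problem is easy to evaluate. The sketch below assumes that X_perm is the set of permutations of (1, ..., n), which the slide does not spell out, and checks a brute-force maximum against the sorted pairing given by the rearrangement inequality. The function names and data are hypothetical.

```python
from itertools import permutations


def z_brute_force(t, c):
    """Maximize sum (c_i - t_i) x_i over all permutations x of 1..n."""
    n = len(c)
    return max(sum((ci - ti) * xi for ci, ti, xi in zip(c, t, x))
               for x in permutations(range(1, n + 1)))


def z_sorted(t, c):
    """Same value via the rearrangement inequality: largest with largest."""
    reduced = sorted(ci - ti for ci, ti in zip(c, t))
    return sum(d * position for position, d in enumerate(reduced, start=1))


reductions = [0.5, 0.0, 1.0]   # hypothetical t
durations = [3.0, 1.0, 2.0]    # hypothetical realization of c
print(z_brute_force(reductions, durations), z_sorted(reductions, durations))
```

The sorted version runs in O(n log n), whereas the brute force grows factorially and is only useful as a check.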


Main Results: Scheduling with Random Irregular Starting Time Costs

$n$ jobs that need to be scheduled within a fixed time horizon $0, 1, \ldots, T$ where job $j \in N$ incurs a random cost $c_j(S_j) = c_j^0(S_j)\,\varepsilon_j$ if it is started at time $S_j$.

Precedence constraints among two jobs $i$ and $j$: $S_j \ge S_i + d_{ij}$ where $d_{ij}$ is an integer number imposing a time lag between the jobs. Precedence among jobs results in directed graph with no cycles of positive length.

A lower bound on the total cost:

\[ \inf_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}\left[ \min_{S \in \mathcal{S}} \sum_{j=1}^{n} c_j(S_j) \right]. \]

In this case, you can find such a compact extended formulation using the time-indexed polytope.


Main Results: (Near) Sufficiency of Tractability Conditions

Given $m \le n$ linearly independent vectors $x^i$, define:

\[ \text{(Parallelotope)} \quad Q = \sum_{i=1}^{m} [-x^i, x^i]. \]

Suppose $x^i \in \{-1, 0, 1\}^n$. Extreme point entries in $\{-m, \ldots, m\}$:

\[ \mathrm{Extr}(Q) = \sum_{i=1}^{m} \varepsilon_i x^i \ \text{ with } |\varepsilon_i| = 1 \text{ for all } i. \]

\[ \Pi_x \left\{ (x, \varepsilon) : x = \sum_{i=1}^{m} \varepsilon_i x^i, \ -1 \le \varepsilon_i \le 1, \ \forall i \right\}. \]

Corollary (Related to Bodlaender, Gritzmann, Klee and van Leeuwen (1990))

Computing $Z^*$ for linear optimization problems over a parallelotope of the form $Q = \sum_{i=1}^{m} [-x^i, x^i]$, in which all $x^i \in \{-1, 0, 1\}^n$, is NP-hard.
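The structure of the parallelotope is easy to explore on a small example. The sketch below uses two hypothetical generators in {−1, 0, 1}³, enumerates the signed sums that form the extreme points, and evaluates a deterministic linear objective, for which the maximum over Q is the sum of |cᵀxⁱ| over the generators.

```python
from itertools import product


def extreme_points(generators):
    """All signed sums sum_i eps_i * x^i with eps_i in {-1, +1}."""
    n = len(generators[0])
    points = set()
    for signs in product((-1, 1), repeat=len(generators)):
        points.add(tuple(sum(e * g[k] for e, g in zip(signs, generators))
                         for k in range(n)))
    return points


def max_linear(c, generators):
    """Maximum of c^T x over Q: each generator contributes |c^T x^i|."""
    return sum(abs(sum(ck * gk for ck, gk in zip(c, g))) for g in generators)


gens = [(1, 0, 1), (0, 1, -1)]      # hypothetical generators, m = 2, n = 3
costs = (1, 2, 3)                   # hypothetical fixed cost vector

points = extreme_points(gens)
brute = max(sum(ck * xk for ck, xk in zip(costs, x)) for x in points)
print(sorted(points), brute, max_linear(costs, gens))
```

With a fixed cost vector the optimization is trivial, as the closed form shows; the hardness in the corollary concerns Z*, where the costs are random and only their marginals are known.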


Connection to Known Results: Dependence

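Price of correlations (POC) where $Z(\cdot, \cdot) \ge 0$ (see Agrawal, Ding, Saberi and Ye (2012)):

\[ \frac{\sup_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}_{\theta}\left[ Z(r, u) \right]}{\min_{r \in C} \sup_{\theta \in \Gamma(\mu_1, \ldots, \mu_n)} \mathbb{E}_{\theta}\left[ Z(r, u) \right]} = \frac{\text{“From Independent Coupling”}}{\text{“From Worst-case Coupling”}} \]

where the $r$ in the numerator is the solution obtained under independence:

\[ r \in \arg\min_{r \in C} \; \mathbb{E}_{u \sim \mu_1 \otimes \cdots \otimes \mu_n}\left[ Z(r, u) \right] \]

$Z(\cdot, u)$ - monotone and submodular in $u$, POC $\le e/(e - 1)$ (other generalizations discussed in their paper).

$Z(\cdot, u)$ - monotone and supermodular in $u$, POC can be very large.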


Connection to Known Results: Independence

$\mathbb{E}[Z(u)]$ might however be #P-hard to compute for discrete independent distributions for many types of functions:

Submodular functions ($Z(u) = \min \{ u^\top \chi : \chi \in X_{\mathrm{cut}} \}$)

Supermodular functions ($Z(u) = (\sum_i u_i - K)^+$).

Given a network where the edges are subject to random failure, independently and each with equal probability $p$, the probability that the failed edges do not contain an s-t cut is #P-hard to compute (see Provan and Ball (1983)).

#P - Set of the counting problems associated with the decision problems in the set NP. This class was introduced by Valiant in 1979.

Counting versions of NP-hard problems are #P-hard.

However, there are easy decision problems for which the counting versions can be hard (number of perfect matchings in a bipartite graph).
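The network reliability quantity above can be computed by brute force on small graphs, which also shows where the difficulty comes from: the sum ranges over all 2^m edge subsets. The sketch assumes an undirected graph, so that "the failed edges do not contain an s-t cut" is the same event as "the surviving edges connect s and t"; the example graphs are hypothetical.

```python
from itertools import product


def connected(edges, s, t):
    """Depth-first search on an undirected graph given as a list of edges."""
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == v and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return t in seen


def st_reliability(edges, s, t, p):
    """P(surviving edges connect s and t) when each edge fails w.p. p."""
    total = 0.0
    for alive in product((False, True), repeat=len(edges)):
        kept = [e for e, ok in zip(edges, alive) if ok]
        weight = 1.0
        for ok in alive:
            weight *= (1.0 - p) if ok else p
        if connected(kept, s, t):
            total += weight
    return total


series = [(0, 1), (1, 2)]       # two edges in series between 0 and 2
parallel = [(0, 1), (0, 1)]     # two parallel edges between 0 and 1
print(st_reliability(series, 0, 2, 0.1), st_reliability(parallel, 0, 1, 0.1))
```

For series and parallel compositions the answer has a closed form, consistent with the special cases in which such problems are known to be tractable; the enumeration itself doubles in cost with every added edge.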


Known Results: Independence

A project is specified by precedence relations among tasks. Task durations are independent random variables with discrete, finite ranges. Then,

Computing a value of the cumulative distribution function of project duration is #P-hard.

Computing the mean of the distribution is at least as hard.

Neither of the problems can be computed in time polynomial in the number of points in the range of the project duration unless P = NP (see Hagstrom (1988)).

Only in special cases such as series-parallel graphs, with restricted assumptions on the randomness (such as binary random variables), can these problems be solved in polynomial time (see Ball, Colbourn and Provan (1995), Mohring (2001)).


Concluding Remarks

Bounds under this set of distributions have been extensively studied in risk and insurance - $P(\sum_i c_i \ge T)$, $E(\sum_i c_i - T)^+$, $P(\max_i c_i \ge T)$, $E(\max_i c_i - T)^+$.

However, in operations research and operations management, our interest is often in more complicated decision-making problems with constraints - interplay of optimization and probability is often a challenge and needs to be carefully analyzed.

Some applications where this model has been studied include facility location design (Lu, Ran and Shen (2014)), appointment scheduling (Mak, Rong and Zhang (2016)), traffic equilibrium (Arikan, Ahipasaoglu and Natarajan (2018)) and multi-product pricing (Yan, Cheng, Natarajan and Teo (2018)).

