Optimization under uncertainty: modeling and solution...

MotivationCrude Monte Carlo methods

Improving scenario generationStability in stochastic programming

Optimization under uncertainty:modeling and solution methods

Paolo BrandimarteDipartimento di Scienze Matematiche

Politecnico di Torino

e-mail: [email protected]: http://staff.polito.it/paolo.brandimarte

Lecture 5: Scenario generation for stochastic programming

P. Brandimarte – Dip. di Scienze Matematiche Optimization Under Uncertainty



REFERENCES

P. Brandimarte. Numerical methods for finance and economics: A Matlab-based introduction, (2nd ed.).Wiley, 2006.

P. Brandimarte. Handbook in Monte Carlo simulation: Applications in financial engineering, riskmanagement, and economics. Wiley, late 2013.

J. Dupacová. Stability and sensitivity analysis for stochastic programming. Annals of Operations Research,27 (1990) 115-142.

H. Heitsch, W. Roemisch. Scenario reduction algorithms in stochastic programming. ComputationalOptimization and Applications, 24 (2003) 187-206.

J.L. Higle. Variance reduction and objective function evaluation in stochastic linear programs. INFORMSJournal of Computing, 10 (1998) 236-247.

K. Hoyland, S.W. Wallace. Generating scenario trees for multistage decision problems. ManagementScience, 47 (2001) 296-307.

G. Infanger. Planning under uncertainty: solving large-scale stochastic linear programs Boyd & Fraser,1998.

A.J. King, S.W. Wallace. Modeling with stochastic programming. Springer, 2012.

D.P. Kroese, T. Taimre, Z.I. Botev. Handbook of Monte Carlo methods. Wiley, 2011.

S.T. Rachev. Probability metrics and the stability of stochastic models. Wiley, 1991.




OUTLINE

1 MOTIVATION

2 CRUDE MONTE CARLO METHODS

3 IMPROVING SCENARIO GENERATION

4 STABILITY IN STOCHASTIC PROGRAMMING




The need for scenario generation

THE NEED FOR SCENARIO GENERATION

The problem involving the “true” distribution of random variables,

min cT x + Ehq(ξ)T y(ξ)

i(1)

s.t. Ax = b (2)

Wy(ξ) + T(ξ)x = h(ξ), ∀ξ ∈ Ξ (3)

x, y(ξ) ≥ 0 (4)

is typically intractable, either because the distribution is continuous, orbecause there are too many scenarios.

Then, we consider the discretized, or reduced, problem:

min cT x +Xs∈S

πsqTs ys (5)

s.t. Ax = b (6)

Wys + Tsx = hs, ∀s ∈ S (7)

x, ys ≥ 0 (8)




The need for scenario generation

THE NEED FOR SCENARIO GENERATION

A few algorithms (stochastic decomposition and stochastic dual dynamicprogramming) interleave scenario generation (sampling) and optimization.

They generate statistical estimates of cutting planes and test optimalityconditions statistically.

Here we just consider simpler methods that generate one scenario treebefore optimization, within a two-stage setting.

Even so, we have to deal with difficult issues:

How can we generate scenarios?

How many scenarios should we include?

How can we check the quality of the solution we obtain?

We should not take for granted that Monte Carlo sampling, the seeminglyobvious choice, is the best method for scenario generation.




Numerical integrationMonte Carlo sampling

NUMERICAL INTEGRATION

Monte Carlo methods should be regarded as numerical integration methods.

They can be used to estimate the expected value of a function g(·) of arandom variable X with probability density fX (x):

E[g(X )] =

Z +∞

−∞g(x)fX (x) dx .

Note that probabilities, too, can be regarded as expected values:

P(A) =

Z +∞

−∞IA(x)fX (x) dx ,

where IA(x) is the indicator function for event A.

In stochastic optimization, we are interested in a function defined by anexpectation:

H(z) = EX [g(X , z)] =

Z +∞

−∞g(x , z)fX (x) dx .





DETERMINISTIC GRID-BASED METHODS

Consider the problem of approximating the value of a definite integral like

I[f ] =

Z b

af (x) dx

over a bounded interval [a, b] for a function f of a single variable.

Since the integration is a linear operator, it is natural to look for anapproximation preserving this property. Using a finite number of values of fover a set of nodes xj such that

a = x0 < x1 < · · · < xN = b,

we define a quadrature formula, characterized by weights wj and nodes xj :

Q[f ] =nX

j=0

wj f (xj).

The idea does not scale well to high dimensions.





MONTE CARLO INTEGRATION

Consider an integral on the unit interval [0, 1]:

I =

Z 1

0g(x) dx .

We may think of this integral as the expected value E[g(U)], where U is auniform random variable on the interval (0, 1), i.e., U ∼ (0, 1).

What we have to do is generating a sequence {Ui} of independent randomobservations from the uniform distribution and then evaluate the samplemean: bIm =

1m

mXi=1

g(Ui).

The strong law of large numbers implies that, with probability 1,

limm→∞

bIm = I.





MC MECHANICS: (PSEUDO-)RANDOM VARIATE

GENERATION

The raw material of any Monte Carlo simulation is a stream of independentrealizations of a uniform random variable U(0, 1). An array of methods isoffered by software tools like R or Matlab.

Then, a transformation is applied in order to sample from another distribution(see Kroese et. al, 2011). The simplest one is the inverse transform method:

1 draw a random number U ∼ U(0, 1),2 return X = F−1(U), where F (x) = P{X ≤ x} is the CDF of X .

Thus, we see that a Monte Carlo simulation is essentially a way to estimate amultidimensional integral of some function on a unit hypercube:Z

[0,1]dF(x) dx,

where [0, 1]d = [0, 1]× [0, 1]× · · · × [0, 1].





CONVERGENCE OF MONTE CARLO METHODS

We are quite familiar with confidence intervals of the form

X ± tn−1,1−α/2S√n

.

Indeed, convergence of Monte Carlo estimates (assuming that suitableconditions hold) is related to σ/

√n.

Good news: this rate of convergence does not depend on problemdimensionality.

Bad news: this rate of convergence is slow. How many additional replicationsdo we need to gain one order of magnitude in terms of precision?

Brute force certainly does not work with stochastic programming.




Variance reductionGaussian quadratureLow-discrepancy sequencesOther deterministic strategies for scenario generation

IMPROVING SCENARIO GENERATION

Alternative ideas:

Monte Carlo with variance reduction

Gaussian quadrature

Low discrepancy sequences (quasi-Monte Carlo methods)

Optimized scenario generation by property matching

Optimal scenario reduction





VARIANCE REDUCTION

In order to improve convergence, which depends on σ/√

n, we may tryreducing σ, rather than increasing the sample size.Several variance reduction strategies are available:

Antithetic sampling

Common random numbers

Control variates

Importance sampling

Conditional Monte Carlo

Latin Hypercube sampling

Stratified sampling

See (Brandimarte, 2006) or (Brandimarte, 2013) for details, and (Higle, 1998)or (Infanger, 1998) for applications to stochastic programming.





ANTITHETIC SAMPLING

In plain Monte Carlo, we generate a sequence of independent observations.However, inducing some correlation in a clever way may be helpful.

Consider the idea of generating a sequence of paired replications (X (i)1 , X (i)

2 ),i = 1, . . . , n:

X (1)1 X (2)

1 . . . X (n)1

X (1)2 X (2)

2 . . . X (n)2 .

These observations are “horizontally” independent, in the sense that X (i1)j

and X (i2)k are independent however we choose j, k = 1, 2, provided i1 6= i2.

Thus, the pair-averaged observations X (i) = (X (i)1 + X (i)

2 )/2 are independent,and we may build a confidence interval based on them.





ANTITHETIC SAMPLING

The variance of the sample mean X (n) based on the observations X (i) is

Var[X (n)] =Var(X (i))

n

=Var(X (i)

1 ) + Var(X (i)2 ) + 2 Cov(X (i)

1 , X (i)2 )

4n

=Var(X )

2n(1 + ρ(X1, X2)) .

In order to reduce variance, we should take negatively correlated replicationswithin each pair.

To induce a negative correlation, we may use a random number sequence{Uk} for the first replication in each pair, and then {1−Uk} in the second one.

However, the negative correlation in the input streams need not inducenegative correlation in the output, unless some monotonicity conditionapplies to the whole simulation chain.





GAUSSIAN QUADRATURE

A typical way to measure the quality of a quadrature formula is the order ofthe polynomials for which the quadrature formula is exact.

In Newton–Cotes formulas we fix nodes and try to find suitable weights sothat the order of the formula is as large as possible.

In Gaussian quadrature nodes and weights are jointly determined, in order toimprove order.

Gaussian quadrature formulas are associated with non-negative weightfunction w(x), and we look for a quadrature formula likeZ b

aw(x)f (x) dx ≈

nXi=1

wi f (xi), (9)

In our setting, w(x) can be interpreted as a probability density.Gauss–Hermite quadrature, where w(x) = e−x2

, is clearly connected withcomputing the expected value of a function of a normal random variable.





GAUSSIAN QUADRATURE

Let Y be a random variable with normal distribution N (µ, σ2). Then

E[f (Y )] =

Z +∞

−∞

1√2πσ

e−12 (

y−µσ )

2

f (y) dy .

In order to use weights and nodes from a Gauss–Hermite formula, we needthe following change of variable

−x2 = −12

“ y − µ

σ

”2⇒ y =

√2σx + µ ⇒ dy√

2σ= dx .

Hence

E[f (Y )] ≈ 1√π

nXi=1

wi f (√

2σxi + µ).

We may interpret the formula as a discretized distribution, with values xi andprobabilities wi .





GAUSSIAN QUADRATURE: NUMERICAL EXAMPLE

From the properties of the lognormal distribution, we know that ifX ∼ N (µ, σ2), then

EheX

i= eµ+σ2/2

Let µ = 4 and σ = 2, then eµ+σ2/2 = 403.4288.

Using a sample size n = 1000, crude Monte Carlo in R yields:> set.seed(55555)> mean(exp(rnorm(1000,mu,sigma)))355.9714> mean(exp(rnorm(1000,mu,sigma)))387.6861> mean(exp(rnorm(1000,mu,sigma)))449.087> mean(exp(rnorm(1000,mu,sigma)))

329.0448

We observe a lot of variability.





GAUSSIAN QUADRATURE: NUMERICAL EXAMPLE

The following R script uses the statmod package and illustrates theprecision achieved with a few nodes:

# script to check Gauss-Hermite quadraturerequire(statmod)mu <- 4sigma <- 2numNodes <- c(5,10,15,20)trueValue <- exp(mu + sigma^2/2)for (i in 1:length(numNodes)){

out <- gauss.quad(numNodes[i], kind="hermite")w <- out$weights/sqrt(pi)x <- sqrt(2)*sigma*out$nodes + muapproxValue <- sum(w*exp(x))percError <- 100*abs(trueValue-approxValue)/trueValuecat("N=",numNodes[i]," True=",trueValue, " Approx=",

approxValue, " %error", percError,"\n")}

N= 5 True= 403.4288 Approx= 398.6568 %error 1.182865N= 10 True= 403.4288 Approx= 403.4286 %error 5.537709e-05N= 15 True= 403.4288 Approx= 403.4288 %error 1.893283e-10N= 20 True= 403.4288 Approx= 403.4288 %error 9.863052e-14





LOW-DISCREPANCY SEQUENCES

A Monte Carlo simulation is the calculation of an integral over a unithypercube. Hence, we should fill the unit hypercube in the most “uniform”way.

Consider a sequence P =`x1, x2, . . . , xN´

of N points in the m-dimensionalhypercube Im = [0, 1]m ⊂ Rm.

If the points are “well distributed,” their number included in any subset G of Im

should be roughly proportional to its volume vol(G).

Given a point z = (z1, z2, . . . , zm), consider the rectangular subset Gz

Gz = [0, z1)× [0, z2)× · · · × [0, zm),

with volume z1z2 · · · zm.

Let SP(G) be a function counting the number of points in the sequence Pthat are contained in a subset G ⊂ Im.





LOW-DISCREPANCY SEQUENCES

The star discrepancy of the sequence P is defined as:

D(P) = supz∈Im

˛SP(Gz)

N− z1z2 · · · zm

˛.

When computing a multidimensional integral on the unit hypercube, it isnatural to look for low-discrepancy sequences.

There is a theoretical result suggesting that low-discrepancy sequences mayperform better than pseudorandom sequences.

We know that the estimation error with Monte Carlo simulation is somethinglike O(1/

√N), where N is the sample size.

Sequences achieving a star discrepancy of order O(ln N)m/N are calledlow-discrepancy.





HALTON SEQUENCES IN MATLAB

The simplest low-discrepancy sequence is the Halton sequence, in which aprime number is associated with each dimension and the corresponding Vander Corput sequence is generated.

This is a sequence in the unit interval [0, 1], and it based on a simple recipe:

Represent an integer number n in a base b, where b is a prime number:

n = (· · · d4d3d2d1d0)b

Reflect the digits to obtain a number within the unit interval:

h = (0.d0d1d2d3d4 · · · )b

More formally:

n =∞X

k=0

dk bk ⇒ h(n, b) =∞X

k=0

dk b−(k+1).





HALTON SEQUENCES IN MATLAB

This is the Halton sequence with bases (2, 3, 5), using the Statistics Toolboxof Matlab:» pp = haltonset(3);» net(pp,10)ans =

0 0 00.5000 0.3333 0.20000.2500 0.6667 0.40000.7500 0.1111 0.60000.1250 0.4444 0.80000.6250 0.7778 0.04000.3750 0.2222 0.24000.8750 0.5556 0.44000.0625 0.8889 0.64000.5625 0.0370 0.8400





HALTON VS. PSEUDORANDOM SEQUENCES

The following picture suggests that a Halton sequence provides a moreuniform fill than pseudorandom numbers:





HALTON WITH LARGE PRIME NUMBERS

Unfortunately, Halton sequences have bad properties when large primenumbers are involved. This is the filling with bases 109 and 113:

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1





SOBOL SEQUENCES IN MATLAB

To solve the difficulty, Sobol sequences yield a suitable permutation of theVan der Corput with base 2:» pp = sobolset(5);» net(pp,10)ans =

0 0 0 0 00.5000 0.5000 0.5000 0.5000 0.50000.2500 0.7500 0.2500 0.7500 0.25000.7500 0.2500 0.7500 0.2500 0.75000.1250 0.6250 0.8750 0.8750 0.62500.6250 0.1250 0.3750 0.3750 0.12500.3750 0.3750 0.6250 0.1250 0.87500.8750 0.8750 0.1250 0.6250 0.37500.0625 0.9375 0.6875 0.4375 0.81250.5625 0.4375 0.1875 0.9375 0.3125





PROPERTY MATCHING

This is a deterministic scenario generation approach, based on the idea thatthe discretized distribution should match some properties (e.g., moments) ofthe original distribution.

Consider a multivariate normal variable X, with expected value µ andcovariance matrix Σ. We also know that, for each marginal distribution,skewness ξi = E[(Xi − µi)

3/σ3i ] is zero and kurtosis χi = E[(Xi − µi)

4/σ4i ] is 3.

Say that we want to generate a scenario fan (the single stage of a tree) ofsize S and that each realization has probability 1/S, for the sake of simplicity.Let us denote by xs

i the value of the random variable i in scenario s.

Then, we should have:

1S

Xs

xsi ≈ µi ∀i

1S

Xs

(xsi − µi)(xs

j − µj) ≈ σij ∀i, j

1S

Xs

(xsi − µi)

3

σ3i

≈ 0 ∀i1S

Xs

(xsi − µi)

4

σ4i

≈ 3 ∀i.





PROPERTY MATCHING

Approximate moment matching is obtained by solving the problem:

min w1

Xi

"1S

Xs

xsi − µi

#2

+ w2

Xi,j

"1S

Xs

`xs

i − µi´ `

xsj − µj

´− σij

#2

+ w3

Xi

"1S

Xs

„xs

i − µi

σi

«3#2

+ w4

Xi

"1S

Xs

„xs

i − µi

σi

«4

− 3

#2

The objective function includes four weights wk which may be used to finetune performance.

Also note that this is (in general) a nonconvex problem.See (Hoyland, Wallace, 2001) and (King, Wallace, 2012) for examples andheuristic strategies for the generation of multistage trees.





OPTIMAL SCENARIO REDUCTION

The idea of optimal scenario reduction is to sample a large set of scenarios,resulting in a probability measure P, and then to reduce the tree, resulting ina probability measure Q.

The optimal reduction should minimize a suitable metric for the distancebetween probability measures.

For instance let us consider:

a set of S scenarios ξi =˘ξi

t

¯Tt=1 with probabilities pi ;

a set of S scenarios ξj=

nξ

jt

oT

t=1with probabilities qj .






The Kantorovich distance DK , in this discrete case, is defined as

DK (P, Q) = infSX

i=1

SXj=1

ηijc(ξi , ξj)

s.t.SX

i=1

ηij = qj , ∀j

SXj=1

ηij = pi , ∀i

ηij ≥ 0,

where

c(ξi , ξj) =

TXτ=1

˛ξi

τ − ξjτ

˛






We see that evaluating the distance boils down to solving a transportationproblem.

Given this distance, it is possible to build heuristics in which scenarios aresuccessively eliminated, redistributing probability mass in an optimal way.

It is possible to prescribe scenario cardinality or the quality of theapproximation.

This idea was implemented in the Scenred package, for use with GAMS.See, e.g., (Heitsch, Roemisch, 2003) for more details.




Stability: in-sample and out-of-sample

STABILITY: IN-SAMPLE VS. OUT-OF-SAMPLE

Whatever scenario reduction strategy is employed (deterministic orstochastic) it is important to check the stability of the resulting solution(Dupacová, 1990). There are quite sophisticated studies on stability ofstochastic models (e.g., Rachev, 1991). Here we just consider adown-to-Earth approach (King, Wallace, 2012).

Let us use the streamlined notation

minx

f (x; ξ), minx

f (x; T )

to refer to the problem for the “true” probability distribution and the scenariotree, respectively.

In-sample stability means that if we sample trees Ti and Tj , leading tosolutions x∗i and x∗j , respectively, we obtain

f (x∗i ; Ti) ≈ f (x∗j ; Tj)

Note that we are not requiring stability in the solution itself, but in theobjective value. This is sensible if what matters is cost, profit, etc.




Stability: in-sample and out-of-sample

STABILITY: IN-SAMPLE VS. OUT-OF-SAMPLE

If the scenario tree is generated deterministically, we might compare treeswith slightly different branching structures.

Out of sample stability means

f (x∗i ; ξ) ≈ f (x∗j ; ξ)

In two stage problems, this check is not hard, as it requires solving manysecond stage problems. In multistage models, a (costly) rolling horizonsimulation procedure is needed.

Alternatively, we might check whether

f (x∗i ; Tj) ≈ f (x∗j ; Ti)


Optimization under uncertainty: modeling and solution...

Documents

Transcript of Optimization under uncertainty: modeling and solution...