
  • 1

    MADS - Mesh Adaptive Direct Search for constrained optimization

    Mark Abramson, Charles Audet, Gilles Couture, John Dennis

    www.gerad.ca/Charles.Audet/

    Thanks to: ExxonMobil, AFOSR, Boeing, LANL,

    FQRNT, NSERC, SANDIA, NSF.

    Rice 2004

  • 2

    Outline

    ! Statement of the optimization problem

    ! The GPS and MADS algorithm classes

    ! GPS theory and limiting examples

    ! The MADS algorithm class
      • An implementable MADS instance
      • MADS theory
      • Numerical results

    ! Discussion

  • 4

    The target problem

    (NLP)    minimize  f(x)
             subject to  x ∈ Ω,

    where f : Rn → R ∪ {∞} may be discontinuous and:

    ! the functions are expensive black boxes,

    ! the functions provide few correct digits and may fail even for x ∈ Ω,

    ! accurate approximation of derivatives is problematic.

  • 5

    minimize  f(x)
    subject to  x ∈ X,

    where f : Rn → R ∪ {∞} may be discontinuous and:

    ! the functions are expensive black boxes, often produced by simulations or output of MDO codes,

    ! the functions provide few correct digits and may fail even for x ∈ X,

    ! accurate approximation of derivatives is problematic,

    ! surrogate models s ≈ f and P ≈ X may be available.

  • 6

    Goals – or validation of the method

    NOMAD : Algo(NLP) → x̂

    if f is continuously differentiable, then ∇f(x̂) = 0
    if f is convex, then 0 ∈ ∂f(x̂)
    if f is Lipschitz near x̂, then 0 ∈ ∂f(x̂)

  • 7

    Clarke Calculus – for f Lipschitz near x

    ! Clarke generalized derivative at x in the direction v:

      f◦(x; v) = lim sup_{y→x, t↓0}  [f(y + tv) − f(y)] / t.

    ! The generalized gradient of f at x is the set

      ∂f(x) := {ζ ∈ Rn : f◦(x; v) ≥ vᵀζ for all v ∈ Rn}
             = co{ lim ∇f(yi) : yi → x and ∇f(yi) exists }.

    ! f◦(x; v) can be obtained from ∂f(x):

      f◦(x; v) = max{vᵀζ : ζ ∈ ∂f(x)}.
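    A quick worked example (added here for illustration, not on the original slide): for f(x) = |x| on R, which is Lipschitz but not differentiable at 0,

      f◦(0; v) = lim sup_{y→0, t↓0} [|y + tv| − |y|] / t = |v|,
      ∂f(0) = {ζ ∈ R : |v| ≥ vζ for all v ∈ R} = [−1, 1],
      f◦(0; v) = max{vζ : ζ ∈ [−1, 1]} = |v|.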


  • 9

    The two iterated phases of GPS and MADS

    ! The global search in the variable space is flexible enough to allow user heuristics that incorporate knowledge of the driving simulation model and facilitate the use of surrogate functions.

    ! The local poll around the incumbent solution is more rigidly defined, but it ensures convergence to a point satisfying necessary first-order optimality conditions.

    ! This talk focuses on the basic algorithm and the convergence analysis. In the next talks, Alison, Mark and Gilles will talk about surrogates in the search.

  • 10

    [Figure: the search step – mesh points in the (x1, x2) plane with objective f; ∗ is the incumbent solution, and a surrogate guides the choice of trial points.]

  • 11

    [Figure: the poll step – we poll near the incumbent solution; ∗ is still the incumbent solution.]

  • 12

    New iteration from the same incumbent solution, but on a finer mesh.

  • 13

    Positive spanning sets and meshes

    ! A positive spanning set D is a set of vectors whose nonnegative linear combinations span Rn.

    ! The mesh is centered around xk ∈ Rn and its fineness is parameterized by ∆^m_k > 0 as follows:

      Mk = {xk + ∆^m_k D z : z ∈ N^|D|}.

    [Figure: the mesh around xk for the example D = [I; −I].]
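    As a concrete illustration of this definition, the sketch below enumerates a small sample of mesh points for D = [I; −I] in n = 2; it is my own sketch, and the function and variable names are illustrative, not part of any MADS/NOMAD code.

    import numpy as np

    # Positive spanning set D = [I, -I] for n = 2, stored column-wise.
    n = 2
    D = np.hstack([np.eye(n), -np.eye(n)])            # shape (n, 2n)

    def mesh_points(xk, delta_m, zmax=3):
        """Enumerate the mesh points xk + delta_m * D @ z for z in {0,...,zmax-1}^|D|.

        The true mesh is infinite (z ranges over all of N^|D|); this only samples it."""
        pts = []
        for z in np.ndindex(*(zmax,) * D.shape[1]):
            pts.append(xk + delta_m * D @ np.array(z))
        return np.unique(np.round(pts, 12), axis=0)

    xk = np.array([1.0, 2.0])
    print(mesh_points(xk, delta_m=0.5).shape)         # a finite sample of mesh points around xk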

  • 14

    Basic pattern search algorithm for unconstrained optimization

    Given ∆^m_0, x0 ∈ M0 with f(x0) < ∞, and D,
    for k = 0, 1, · · · do:

    1. Employ some finite strategy to try to choose xk+1 ∈ Mk such that f(xk+1) < f(xk), and then set ∆^m_{k+1} = ∆^m_k or ∆^m_{k+1} = 2∆^m_k (xk+1 is called an improved mesh point);

    2. Else, if xk minimizes f(x) for x ∈ Pk, then set xk+1 = xk and ∆^m_{k+1} = ∆^m_k / 2 (xk is called a minimal frame center).
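    The following sketch turns the loop above into runnable code for the coordinate-search case D = [I; −I], with an empty search step and an opportunistic poll; doubling ∆^m on success and the stopping rule are choices I made within what the slide allows, and all names are mine.

    import numpy as np

    def coordinate_search(f, x0, delta0=1.0, tol=1e-6, max_iter=1000):
        """Basic pattern search with D = [I, -I]: poll the frame P_k, double the mesh
        size on an improved mesh point, halve it at a minimal frame center."""
        n = len(x0)
        D = np.vstack([np.eye(n), -np.eye(n)])         # the 2n poll directions (rows)
        x = np.asarray(x0, dtype=float)
        fx, delta = f(x), delta0
        for _ in range(max_iter):
            improved = False
            for d in D:                                # poll step (the search step is empty)
                y = x + delta * d
                fy = f(y)
                if fy < fx:                            # improved mesh point
                    x, fx, delta, improved = y, fy, 2 * delta, True
                    break                              # opportunistic: stop polling on success
            if not improved:                           # minimal frame center
                delta /= 2
            if delta < tol:
                break
        return x, fx

    # Example on a smooth convex quadratic: the iterates approach the origin.
    print(coordinate_search(lambda v: float(np.sum(v ** 2)), [1.3, -0.7]))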

  • 15

    The Coordinate Search (CS) frame Pk

    Pk = {xk + ∆^m_k d : d ∈ [I; −I]};  the 2n points adjacent to xk in Mk.

    [Figure: the frames p1, . . . , p4 around xk, xk+1, xk+2 for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4.]

    Always the same 2n = 4 directions, regardless of ∆k.

  • 16

    The GPS frame Pk

    Pk = {xk + ∆^m_k d : d ∈ Dk ⊂ D};  points adjacent to xk in Mk (with respect to the positive spanning set Dk).

    [Figure: the frames p1, p2, p3 around xk, xk+1, xk+2 for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4.]

    Here, only 14 different ways of selecting Dk, regardless of ∆k.


  • 18

    Convergence results – unconstrained GPS

    If all iterates are in a compact set, then lim inf_k ∆^m_k = 0.

    ! The mesh is refined only at a minimal frame center.

    ! There is a limit point x̂ of a subsequence {xk}k∈K of minimal frame centers with {∆^m_k}k∈K → 0.
      {xk}k∈K is called a refining subsequence.

    ! f(xk) ≤ f(xk + ∆^m_k d) for all d ∈ Dk ⊂ D with k ∈ K.
      Let D̂ ⊆ D be the set of poll directions used infinitely often in the refining subsequence.
      D̂ is the set of refining directions.

  • 19

    Set of refining directions D̂

    [Figure: the minimal frame centers xk1, xk2, . . . , xk5, . . . of a refining subsequence, each with its poll points xki + ∆^m_ki d for d ∈ Dki, converging to x̂ = lim_{k∈K} xk; the poll directions used infinitely often form D̂.]

  • 20

    Convergence results – unconstrained GPS

    If all iterates are in a compact set, then for any refining subsequence (with limit x̂ and refining directions D̂):

    ! f is Lipschitz near x̂ ⇒ f◦(x̂; d) ≥ 0 for every d ∈ D̂.
      This says that the Clarke derivatives are nonnegative on a finite set of directions that positively span Rn, since

      f◦(x̂; d) := lim sup_{y→x̂, t↓0} [f(y + td) − f(y)] / t  ≥  lim_{k∈K} [f(xk + ∆k d) − f(xk)] / ∆k  ≥  0,

      the last inequality holding because every poll step of the refining subsequence fails: f(xk + ∆k d) ≥ f(xk).

    ! f is regular at x̂ ⇒ f′(x̂; d) ≥ 0 for every d ∈ D̂.

    ! f is strictly differentiable at x̂ ⇒ ∇f(x̂) = 0.

  • 21

    Limitations of GPS

    GPS methods are directional, so the restriction to a finite set of directions is a big limitation, particularly when dealing with nonlinear constraints.

    GPS with an empty search: the iterates stall at x0.

    [Figure: level sets of f(x) = ‖x‖∞ and the starting point x0 = (1, 1)ᵀ, where the coordinate-search iterates stall.]

    Even with a C1 function, GPS may generate infinitely many limit points, some of them non-stationary.
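    A quick numerical check of the stalling example above (my own sketch, not from the slides): at x0 = (1, 1)ᵀ, no coordinate poll direction strictly decreases f(x) = ‖x‖∞, at any mesh size, even though x0 is clearly not a minimizer.

    import numpy as np

    f = lambda x: float(np.max(np.abs(x)))             # f(x) = ||x||_inf
    x0 = np.array([1.0, 1.0])

    # Coordinate poll directions [I, -I]: no strict decrease, for any mesh size.
    D = np.vstack([np.eye(2), -np.eye(2)])
    for delta in [1.0, 0.5, 0.25, 0.125]:
        best = min(f(x0 + delta * d) for d in D)
        print(delta, best < f(x0))                     # always False -> the poll never improves

    # Yet x0 is not optimal: the (non-coordinate) direction (-1, -1) gives descent.
    print(f(x0 + 0.5 * np.array([-1.0, -1.0])))        # 0.5 < f(x0) = 1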

  • 22

    GPS convergence to a bad solution

    [Figure: level sets of f near the origin and the generalized gradient ∂f(0, 0); note that (0, 0) ∉ ∂f(0, 0).]

    GPS iterates – with a bad strategy – converge to the origin, where the gradient exists and is nonzero (f is differentiable at (0, 0) but not strictly differentiable).


  • 24

    The MADS frames

    In addition to the mesh size parameter ∆^m_k, we introduce the poll size parameter ∆^p_k.  Ex: ∆^p_k = n √(∆^m_k).

    [Figure: MADS frames p1, p2, p3 around xk for (∆^m_k, ∆^p_k) = (1, 2), (1/4, 1), and (1/16, 1/2).]

    Number of ways of selecting Dk increases as ∆^p_k gets smaller.
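    To make the last remark concrete, the sketch below (mine, with illustrative names) counts, for the three (∆^m, ∆^p) pairs of the figure and n = 2, how many mesh points lie inside the poll-size box around xk; the count grows as the mesh becomes finer relative to the poll size, which is what gives MADS its richer choice of poll directions.

    import numpy as np

    def candidate_poll_points(delta_m, delta_p, n=2):
        """Count nonzero integer steps z with |delta_m * z_i| <= delta_p for every i,
        i.e. mesh points that stay inside the poll-size box around xk."""
        zmax = int(round(delta_p / delta_m))           # the loop bounds enforce the box constraint
        count = 0
        for z in np.ndindex(*(2 * zmax + 1,) * n):
            z = np.array(z) - zmax
            if np.any(z):                              # exclude the frame center itself
                count += 1
        return count

    for dm, dp in [(1.0, 2.0), (0.25, 1.0), (1.0 / 16.0, 0.5)]:
        print(dm, dp, candidate_poll_points(dm, dp))   # 24, 80, 288 candidate mesh points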

  • 25

    Barrier approach to constraints

    To enforce the Ω constraints, replace f by a barrier objective:

      fΩ(x) := f(x) if x ∈ Ω,  +∞ otherwise.

    This is a standard construct in nonsmooth optimization.

    Then apply the unconstrained algorithm to fΩ.
    This is NOT a standard construct in optimization algorithms.

    Quality of the limit solution depends on the local smoothness of f, not of fΩ.
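    A minimal sketch of this extreme-barrier construction (the names are mine); the wrapped function can be handed directly to the unconstrained loop sketched earlier.

    import math

    def make_barrier(f, in_Omega):
        """Barrier objective f_Omega: equal to f on Omega, +infinity outside."""
        def f_Omega(x):
            return f(x) if in_Omega(x) else math.inf
        return f_Omega

    # Example with the disk problem of a later slide: min x + y  s.t.  x^2 + y^2 <= 6.
    f_Omega = make_barrier(lambda x: x[0] + x[1],
                           lambda x: x[0] ** 2 + x[1] ** 2 <= 6.0)
    print(f_Omega([1.0, 1.0]), f_Omega([3.0, 3.0]))    # 2.0  inf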

  • 26

    A MADS instance

    note: GPS = MADS with ∆^p_k = ∆^m_k.

    An implementable way to generate Dk (see the sketch below):

    ! Let B be a lower triangular nonsingular random integer matrix.

    ! Randomly permute the rows of B.

    ! Complete to a positive basis:
      • Dk = [B; −B] (maximal positive basis, 2n directions), or
      • Dk = [B; −Σ_{b∈B} b] (minimal positive basis, n+1 directions).
      • Use Luis' talk to order the poll directions.
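    A schematic sketch of this recipe; it is my own simplification, and the integer ranges and scaling below are illustrative rather than the exact LTMADS construction of the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_Dk(n, ell, maximal=True):
        """Random lower-triangular nonsingular integer matrix B, rows permuted,
        completed to a positive basis; directions are returned as rows."""
        bound = 2 ** ell
        B = np.tril(rng.integers(-bound + 1, bound, size=(n, n)))
        np.fill_diagonal(B, rng.choice([-bound, bound], size=n))   # nonzero diagonal => nonsingular
        B = B[rng.permutation(n), :]                               # randomly permute the rows of B
        if maximal:
            return np.vstack([B, -B])                # 2n directions (maximal positive basis)
        return np.vstack([B, -B.sum(axis=0)])        # n+1 directions (minimal positive basis)

    print(make_Dk(n=3, ell=2))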

  • 27

    Dense polling directions

    Theorem 1. As k → ∞, MADS's polling directions form a dense set in Rn (with probability 1).

    The ultimate goal is a way to be sure that the subset of refining directions D̂ is dense.

    Then the barrier approach to constraints promises strong optimality under weak assumptions – the existence of a hypertangent vector, e.g., a vector that makes a negative inner product with all the active constraint gradients.

  • 28

    MADS convergence results

    Let f be Lipschitz near a limit x̂ of a refining subsequence.

    Theorem 2. Suppose that D̂ is dense in Ω.
    • If either Ω = Rn, or x̂ ∈ int(Ω), then 0 ∈ ∂f(x̂).

    Theorem 3. Suppose that D̂ is dense in T^H_Ω(x̂) ≠ ∅.
    • Then x̂ is a Clarke stationary point of f over Ω:
      f◦(x̂; v) ≥ 0 for all v ∈ T^Cl_Ω(x̂).
    • In addition, if f is strictly differentiable at x̂ and if Ω is regular at x̂, then x̂ is a contingent KKT stationary point of f over Ω: −∇f(x̂)ᵀv ≤ 0 for all v ∈ T^Co_Ω(x̂).

  • 29

    A problem for which GPS stagnates

    [Figure: iterates in the (x, y) plane, with a zoomed view near the region where GPS stagnates.]

  • 30

    Our results


  • 31

    Results for a ChemE PID problem

  • 32

    Constrained optimization

    A disk constrained problem:

      min_{x,y}  x + y
      s.t.       x² + y² ≤ 6

    How hard can that be?

    Very hard for GPS and filter-GPS with the standard 2n directions and an empty search.
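    For reference (my own short computation, not on the slide), this problem has a simple closed-form solution, which is the value the convergence plot that follows should approach: at the optimum the constraint is active and (1, 1) must be parallel to the outward normal (2x, 2y) with a nonpositive multiplier, so x = y < 0 and x² + y² = 6, giving

      (x*, y*) = (−√3, −√3),   f* = −2√3 ≈ −3.464.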

  • 33

    [Figure: objective value f versus number of function evaluations (50–450) on the disk problem, with dynamic 2n directions – GPS, Filter-GPS, and Barrier-MADS (5 runs).]

  • 34

    Parameter fit in a rheology problem

    Rheology is a branch of mechanics that studies properties of materials which determine their response to mechanical force.

    MODEL: The viscosity η of a material can be modelled as a function of the shear rate γ̇:

      η(γ̇) = η0 (1 + λ² γ̇²)^((β−1)/2)

    A parameter fit problem.

  • 35

    Observation i | Strain rate γ̇i (s−1) | Viscosity ηi (Pa · s)
     1 | 0.0137 | 3220
     2 | 0.0274 | 2190
     3 | 0.0434 | 1640
     4 | 0.0866 | 1050
     5 | 0.137  | 766
     6 | 0.274  | 490
     7 | 0.434  | 348
     8 | 0.866  | 223
     9 | 1.37   | 163
    10 | 2.74   | 104
    11 | 4.34   | 76.7
    12 | 5.46   | 68.1
    13 | 6.88   | 58.2

    The unconstrained optimization problem:

      min_{η0, λ, β}  g(η0, λ, β),   with   g = Σ_{i=1}^{13} |η(γ̇i) − ηi|.
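    A direct transcription of this data and objective into code (a sketch; the parameter values in the last line are arbitrary, just to show one black-box evaluation):

    # (shear rate gamma_i [1/s], observed viscosity eta_i [Pa*s]) from the table above
    data = [(0.0137, 3220), (0.0274, 2190), (0.0434, 1640), (0.0866, 1050),
            (0.137, 766), (0.274, 490), (0.434, 348), (0.866, 223),
            (1.37, 163), (2.74, 104), (4.34, 76.7), (5.46, 68.1), (6.88, 58.2)]

    def eta(gdot, eta0, lam, beta):
        """Viscosity model eta(gdot) = eta0 * (1 + lam^2 * gdot^2)^((beta - 1) / 2)."""
        return eta0 * (1.0 + lam ** 2 * gdot ** 2) ** ((beta - 1.0) / 2.0)

    def g(eta0, lam, beta):
        """Nonsmooth misfit g = sum_i |eta(gdot_i) - eta_i|, minimized over (eta0, lam, beta)."""
        return sum(abs(eta(gd, eta0, lam, beta) - e) for gd, e in data)

    print(g(3000.0, 30.0, 0.3))   # one evaluation of the black-box objective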

  • 36

    [Figure: f versus number of function evaluations on the rheology problem – Coordinate search; Fixed Complete; Fixed opportunist; Dynamic opportunist; Dynamic opportunist with cache.]

  • 37

    [Figure: f versus number of function evaluations – GPS with n+1 directions: no search, LHS search, random search.]

  • 38

    [Figure: f versus number of function evaluations – MADS with n+1 directions: no search, LHS search, random search.]

  • 39

    Discussion

    ! The MADS variant looks good as a first try, and is more general than the instance shown here.

    ! Numerically, randomness is a blessing and a curse.

    ! MADS can handle oracular or yes/no constraints.

    ! The underlying mesh is finer in MADS than in GPS: good for general searches and surrogates.

    ! MADS is the result of nonsmooth analysis pointing up the weaknesses in GPS.

  • 40

    Later today...

    ! This is a meeting about surrogates, but I did not talk about surrogates... Alison will present in the next talk the use of a surrogate in a specific mechanical engineering problem using GPS/MADS.

    ! MADS replaces GPS in our NOMADm and NOMAD software. Gilles and Mark will present a demo of these packages after lunch.