MADS - Mesh Adaptive Direct Search for constrained optimization

Mark Abramson, Charles Audet, Gilles Couture, John Dennis
www.gerad.ca/Charles.Audet/

Thanks to: ExxonMobil, AFOSR, Boeing, LANL, FQRNT, NSERC, Sandia, NSF.

Rice 2004

Outline

- Statement of the optimization problem
- The GPS and MADS algorithm classes
- GPS theory and limiting examples
- The MADS algorithm class
  - An implementable MADS instance
  - MADS theory
  - Numerical results
- Discussion

The target problem

(NLP)   minimize f(x)   subject to x ∈ Ω,

where f : R^n → R ∪ {∞} may be discontinuous and:

- the functions are expensive black boxes, often produced by simulations or the output of MDO codes,
- the functions provide few correct digits and may fail even for x ∈ Ω,
- accurate approximation of derivatives is problematic,
- surrogate models s ≈ f and P ≈ Ω may be available.

Goals – or validation of the method

NOMAD: Algo(NLP) → x̂

- if f is continuously differentiable, then ∇f(x̂) = 0
- if f is convex, then 0 ∈ ∂f(x̂)
- if f is Lipschitz near x̂, then 0 ∈ ∂f(x̂)

Clarke Calculus – for f Lipschitz near x

- Clarke generalized derivative at x in the direction v:

  f°(x; v) = lim sup_{y → x, t ↓ 0} [f(y + tv) − f(y)] / t.

- The generalized gradient of f at x is the set

  ∂f(x) := {ζ ∈ R^n : f°(x; v) ≥ v^T ζ for all v ∈ R^n}
         = co{ lim ∇f(y_i) : y_i → x and ∇f(y_i) exists }.

- f°(x; v) can be obtained from ∂f(x):

  f°(x; v) = max{ v^T ζ : ζ ∈ ∂f(x) }.

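A one-dimensional sanity check of these definitions, using the standard example f(x) = |x| at x = 0 (an illustration added here, not from the talk):

```latex
f^{\circ}(0;v)=\limsup_{y\to 0,\ t\downarrow 0}\frac{|y+tv|-|y|}{t}=|v|,
\qquad
\partial f(0)=\operatorname{co}\{\lim \nabla f(y_i)\}=\operatorname{co}\{-1,+1\}=[-1,1].
```

Indeed max{ vζ : ζ ∈ [−1, 1] } = |v| = f°(0; v), and 0 ∈ ∂f(0): the kink at the origin is a Clarke stationary point.
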
The two iterated phases of GPS and MADS

- The global search in the variable space is flexible enough to allow user heuristics that incorporate knowledge of the driving simulation model and facilitate the use of surrogate functions.
- The local poll around the incumbent solution is more rigidly defined, but it ensures convergence to a point satisfying necessary first-order optimality conditions.
- This talk focuses on the basic algorithm and the convergence analysis. In the next talks, Alison, Mark and Gilles will talk about surrogates in the search.

[Figure: the search step. f is plotted over (x1, x2); ∗ marks the incumbent solution, and • marks search trial points, here suggested by a surrogate of f.]

[Figure: the poll step. ∗ is still the incumbent solution; we poll near the incumbent solution at the frame points •.]

[Figure: a new iteration from the same incumbent solution, but on a finer mesh.]

Positive spanning sets and meshes

- A positive spanning set D is a set of vectors whose nonnegative linear combinations span R^n.
- The mesh is centered around x_k ∈ R^n and its fineness is parameterized by ∆^m_k > 0 as follows:

  M_k = { x_k + ∆^m_k D z : z ∈ N^|D| }.

[Figure: the mesh around x_k for the example D = [I, −I].]

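A minimal sketch of this definition (illustration only; the center x_k, the mesh size, and the enumeration bound are arbitrary choices, not from the talk):

```python
import numpy as np

# Enumerate mesh points x_k + Delta_m * D @ z for small nonnegative integer
# weight vectors z, with the example D = [I, -I] in R^2.
n = 2
D = np.hstack([np.eye(n), -np.eye(n)])        # positive spanning set, |D| = 2n
x_k, delta_m = np.zeros(n), 0.5               # arbitrary center and mesh size

points = {tuple(x_k + delta_m * (D @ np.array(z)))
          for z in np.ndindex(*(3,) * D.shape[1])}   # z in {0,1,2}^|D|
print(len(points), "distinct mesh points near x_k")
```
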
Basic pattern search algorithm for unconstrained optimization

Given ∆^m_0, x_0 ∈ M_0 with f(x_0) < ∞, and D, for k = 0, 1, ..., do:

1. Employ some finite strategy to try to choose x_{k+1} ∈ M_k such that f(x_{k+1}) < f(x_k), and then set ∆^m_{k+1} = ∆^m_k or ∆^m_{k+1} = 2∆^m_k (x_{k+1} is called an improved mesh point);
2. Else, if x_k minimizes f(x) for x ∈ P_k, then set x_{k+1} = x_k and ∆^m_{k+1} = ∆^m_k / 2 (x_k is called a minimal frame center).

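A runnable sketch of this loop with a coordinate-search poll and an empty search step (an illustration under simplifying assumptions: the mesh size is kept on success rather than doubled, and the test function is arbitrary):

```python
import numpy as np

# Minimal coordinate-search instance of the basic pattern search loop
# (illustration only; poll directions D = [I, -I], no search step).
def pattern_search(f, x0, delta=1.0, tol=1e-6, max_iter=1000):
    n = len(x0)
    D = np.vstack([np.eye(n), -np.eye(n)])       # rows are poll directions
    x, fx = np.asarray(x0, float), f(x0)
    for _ in range(max_iter):
        if delta < tol:
            break
        for d in D:                               # poll the frame P_k
            y = x + delta * d
            if f(y) < fx:                         # improved mesh point
                x, fx = y, f(y)
                break
        else:                                     # minimal frame center
            delta /= 2.0                          # refine the mesh
    return x, fx, delta

x, fx, delta = pattern_search(lambda v: (v[0] - 1)**2 + 10*(v[1] + 2)**2,
                              x0=[0.0, 0.0])
print(x, fx, delta)
```
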
The Coordinate Search (CS) frame P_k

P_k = { x_k + ∆^m_k d : d ∈ [I, −I] }; the 2n points adjacent to x_k in M_k.

[Figure: CS frames for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4; at each scale the four poll points p1, ..., p4 surround the incumbent.]

Always the same 2n = 4 directions, regardless of ∆_k.

The GPS frame P_k

P_k = { x_k + ∆^m_k d : d ∈ D_k ⊂ D }; points adjacent to x_k in M_k (with respect to the positive spanning set D_k).

[Figure: GPS frames for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4; at each scale three poll points p1, p2, p3 are chosen via a positive spanning subset D_k of D.]

Here, only 14 different ways of selecting D_k, regardless of ∆_k.

Convergence results – unconstrained GPS

If all iterates are in a compact set, then lim inf_k ∆^m_k = 0.

- The mesh is refined only at a minimal frame center.
- There is a limit point x̂ of a subsequence {x_k}_{k∈K} of minimal frame centers with {∆^m_k}_{k∈K} → 0. {x_k}_{k∈K} is called a refining subsequence.
- f(x_k) ≤ f(x_k + ∆^m_k d) for all d ∈ D_k ⊂ D with k ∈ K. Let D̂ ⊆ D be the set of poll directions used infinitely often in the refining subsequence; D̂ is called the set of refining directions.

Set of refining directions D̂

[Figure: successive minimal frame centers x_{k1}, x_{k2}, ..., each with poll points x_{ki} + ∆^m_{ki} d_j, converge to x̂ = lim_{k∈K} x_k; the directions polled infinitely often along the way form D̂.]

Convergence results – unconstrained GPS

If all iterates are in a compact set, then for any refining subsequence (with limit x̂ and refining directions D̂):

- f is Lipschitz near x̂ ⇒ f°(x̂; d) ≥ 0 for every d ∈ D̂.
  This says that the Clarke derivatives are nonnegative on a finite set of directions that positively span R^n:

  f°(x̂; d) := lim sup_{y→x̂, t↓0} [f(y + td) − f(y)] / t ≥ lim_{k∈K} [f(x_k + ∆_k d) − f(x_k)] / ∆_k ≥ 0.

- f is regular at x̂ ⇒ f′(x̂; d) ≥ 0 for every d ∈ D̂.
- f is strictly differentiable at x̂ ⇒ ∇f(x̂) = 0.

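Spelling out the key step: the poll at each minimal frame center failed, so for every refining direction d ∈ D̂ and every k ∈ K,

```latex
f(x_k)\le f(x_k+\Delta_k d)
\quad\Longrightarrow\quad
\frac{f(x_k+\Delta_k d)-f(x_k)}{\Delta_k}\ \ge\ 0,
```

and since x_k → x̂ with ∆_k ↓ 0, these quotients are among those over which the lim sup defining f°(x̂; d) is taken.
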
Limitations of GPS

GPS methods are directional, so the restriction to a finite set of directions is a big limitation, particularly when dealing with nonlinear constraints.

GPS with an empty search: the iterates stall at x_0.

[Figure: f(x) = ‖x‖_∞ with x_0 = (1, 1)^T; no coordinate direction improves f.]

Even with a C^1 function, GPS may generate infinitely many limit points, some of them non-stationary.

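A quick numerical check of this stalling example (an illustration added here): at x_0 = (1, 1)^T, no coordinate poll step improves f(x) = ‖x‖_∞ for any step size, even though (−1, −1) is a descent direction.

```python
import numpy as np

# f(x) = ||x||_inf cannot be improved from x0 = (1, 1) along +-e1, +-e2:
# each poll point has max-norm >= 1 = f(x0), for every step size delta.
f = lambda x: np.max(np.abs(x))
x0 = np.array([1.0, 1.0])
for delta in (1.0, 0.5, 0.25, 0.125):
    polls = [f(x0 + delta * np.array(d))
             for d in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
    assert min(polls) >= f(x0)           # no poll point improves f
print("coordinate polling never improves f at (1, 1)")
```
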
GPS convergence to a bad solution

[Figure: level sets of f near the origin, and the generalized gradient ∂f(0, 0); here (0, 0) ∉ ∂f(0, 0).]

GPS iterates – with a bad strategy – converge to the origin, where the gradient exists and is nonzero (f is differentiable at (0, 0) but not strictly differentiable).

The MADS frames

In addition to the mesh size parameter ∆^m_k, we introduce the poll size parameter ∆^p_k. Example: ∆^p_k = n √(∆^m_k).

[Figure: MADS frames for (∆^m_k, ∆^p_k) = (1, 2), (1/4, 1), and (1/16, 1/2); the poll points p1, p2, p3 lie on the mesh but within distance ∆^p_k of x_k.]

Number of ways of selecting D_k increases as ∆^p_k gets smaller.

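A small illustration of that last point (assumptions: n = 2 and D = [I, −I]; the counting script is added here, not from the talk). Shrinking ∆^m_k faster than ∆^p_k multiplies the mesh directions available for polling:

```python
import numpy as np

# Count mesh directions of length <= Delta_p when Delta_p = n * sqrt(Delta_m):
# the candidate poll directions multiply as the mesh gets finer.
n = 2
for delta_m in (1.0, 1/4, 1/16):
    delta_p = n * np.sqrt(delta_m)
    r = int(delta_p / delta_m)             # max integer steps per coordinate
    dirs = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)
            if (i, j) != (0, 0)
            and np.hypot(i * delta_m, j * delta_m) <= delta_p]
    print(f"Delta_m={delta_m:<6} Delta_p={delta_p:<4} "
          f"candidate mesh directions: {len(dirs)}")
```

The count grows as ∆^p_k / ∆^m_k increases, which is what allows the union of poll directions to become dense, as Theorem 1 below states.
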
Barrier approach to constraints

To enforce the constraints x ∈ Ω, replace f by the barrier objective

  f_Ω(x) := f(x) if x ∈ Ω, and +∞ otherwise.

This is a standard construct in nonsmooth optimization.

Then apply the unconstrained algorithm to f_Ω. This is NOT a standard construct in optimization algorithms.

The quality of the limit solution depends on the local smoothness of f, not of f_Ω.

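A minimal sketch of this extreme-barrier wrapper (the function names are illustrative; failed black-box evaluations are also mapped to +∞, matching the target-problem description):

```python
import math

# Barrier objective f_Omega: infeasible or failed evaluations return +inf,
# so the unconstrained poll simply rejects them.
def barrier(f, feasible):
    def f_omega(x):
        try:
            return f(x) if feasible(x) else math.inf
        except Exception:          # black box failed even to evaluate
            return math.inf
    return f_omega

# Example: the disk-constrained problem min x+y s.t. x^2 + y^2 <= 6.
f_omega = barrier(lambda x: x[0] + x[1],
                  lambda x: x[0]**2 + x[1]**2 <= 6)
print(f_omega((1.0, 1.0)), f_omega((3.0, 3.0)))    # 2.0, inf
```
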
A MADS instance

Note: GPS = MADS with ∆^p_k = ∆^m_k.

An implementable way to generate D_k (see the sketch after this list):

- Let B be a lower triangular nonsingular random integer matrix.
- Randomly permute the rows of B.
- Complete to a positive basis:
  - D_k = [B, −B] (maximal positive basis, 2n directions), or
  - D_k = [B, −Σ_{b∈B} b] (minimal positive basis, n+1 directions).
- Use Luis' talk to order the poll directions.

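A sketch of this recipe (an illustration, not the exact construction used in NOMAD; the integer bound and the way the diagonal is forced nonzero are arbitrary choices):

```python
import numpy as np

# Build a random poll direction set D_k: lower-triangular nonsingular
# integer matrix, row permutation, then positive-basis completion.
rng = np.random.default_rng(0)

def make_Dk(n, bound, minimal=True):
    B = np.tril(rng.integers(-bound, bound + 1, size=(n, n)))
    np.fill_diagonal(B, rng.choice([-bound, bound], size=n))   # nonsingular
    B = B[rng.permutation(n)]                                  # permute rows
    if minimal:                 # n+1 directions: [B, -sum of columns of B]
        return np.hstack([B, -B.sum(axis=1, keepdims=True)])
    return np.hstack([B, -B])   # 2n directions: maximal positive basis

print(make_Dk(3, bound=2))
```
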
Dense polling directions

Theorem 1. As k → ∞, the MADS polling directions form a dense set in R^n (with probability 1).

The ultimate goal is a way to be sure that the subset of refining directions D̂ is dense.

Then the barrier approach to constraints promises strong optimality under weak assumptions – the existence of a hypertangent vector, i.e., a vector that makes a negative inner product with all the active constraint gradients.

MADS convergence results

Let f be Lipschitz near a limit x̂ of a refining subsequence.

Theorem 2. Suppose that D̂ is dense in R^n.
- If either Ω = R^n or x̂ ∈ int(Ω), then 0 ∈ ∂f(x̂).

Theorem 3. Suppose that D̂ is dense in T^H_Ω(x̂) ≠ ∅.
- Then x̂ is a Clarke stationary point of f over Ω:

  f°(x̂; v) ≥ 0 for all v ∈ T^Cl_Ω(x̂).

- In addition, if f is strictly differentiable at x̂ and if Ω is regular at x̂, then x̂ is a contingent KKT stationary point of f over Ω:

  −∇f(x̂)^T v ≤ 0 for all v ∈ T^Co_Ω(x̂).

A problem for which GPS stagnates

[Figure: iterates in the (x, y) plane (left: full view, x ∈ [−10, 10], y ∈ [−3, 3]; right: zoom on x ∈ [−4, −2], y ∈ [0.8, 1.6]) on a problem where GPS stagnates.]

Our results

[Figure]

Results for a ChemE PID problem

[Figure]

Constrained optimization

A disk-constrained problem:

  min_{x,y}  x + y   s.t.  x^2 + y^2 ≤ 6.

How hard can that be?

Very hard for GPS and filter-GPS with the standard 2n directions and an empty search.

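A numerical illustration of the difficulty (added here, consistent with the slide's claim): at any boundary point in the third quadrant other than the optimum, none of the four coordinate poll steps is both feasible and descending, so GPS with an empty search stalls there.

```python
import math

# At a boundary point z with z1 < 0 and z2 < 0 (other than the optimum),
# none of +-e1, +-e2 is both feasible and descending for f(x, y) = x + y.
f = lambda p: p[0] + p[1]
feasible = lambda p: p[0]**2 + p[1]**2 <= 6.0

z = (-1.0, -math.sqrt(5.0))              # on the boundary, not the optimum
for delta in (1.0, 0.1, 0.001):
    improving = [d for d in [(1, 0), (-1, 0), (0, 1), (0, -1)]
                 if feasible((z[0] + delta * d[0], z[1] + delta * d[1]))
                 and f((z[0] + delta * d[0], z[1] + delta * d[1])) < f(z)]
    print(delta, improving)              # always [] -> coordinate GPS stalls
```
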
[Figure: f vs. number of function evaluations (up to ~450) on the disk problem with dynamic 2n directions, comparing GPS, filter-GPS, and barrier-MADS (5 runs); the y-axis spans −3.46 to −3.36, with −2√3 ≈ −3.46 the optimal value.]

Parameter fit in a rheology problem

Rheology is a branch of mechanics that studies the properties of materials which determine their response to mechanical force.

MODEL: the viscosity η of a material can be modelled as a function of the shear rate γ̇:

  η(γ̇) = η_0 (1 + λ^2 γ̇^2)^((β−1)/2).

A parameter fit problem.

The data: 13 observations of strain rate γ̇_i (s^−1) and viscosity η_i (Pa·s):

  i    γ̇_i (s^−1)   η_i (Pa·s)
  1    0.0137        3220
  2    0.0274        2190
  3    0.0434        1640
  4    0.0866        1050
  5    0.137          766
  6    0.274          490
  7    0.434          348
  8    0.866          223
  9    1.37           163
  10   2.74           104
  11   4.34           76.7
  12   5.46           68.1
  13   6.88           58.2

The unconstrained optimization problem:

  min_{η_0, λ, β} g(η_0, λ, β),   with   g = Σ_{i=1}^{13} |η(γ̇_i) − η_i|.

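The resulting black box, coded directly from the table and model above (the evaluation point in the last line is an arbitrary guess, not from the talk):

```python
import numpy as np

# The rheology fitting objective, as a black box for GPS/MADS.
gamma = np.array([0.0137, 0.0274, 0.0434, 0.0866, 0.137, 0.274, 0.434,
                  0.866, 1.37, 2.74, 4.34, 5.46, 6.88])     # shear rates (1/s)
eta_obs = np.array([3220, 2190, 1640, 1050, 766, 490, 348,
                    223, 163, 104, 76.7, 68.1, 58.2])       # viscosities (Pa.s)

def g(x):
    """Sum of absolute deviations of the model from the observations."""
    eta0, lam, beta = x
    eta_model = eta0 * (1.0 + lam**2 * gamma**2) ** ((beta - 1.0) / 2.0)
    return np.sum(np.abs(eta_model - eta_obs))

print(g((1000.0, 1.0, 0.5)))    # evaluate at an arbitrary starting guess
```
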
[Figure: f vs. number of function evaluations (up to 2500) on the rheology fit, comparing coordinate-search variants: fixed complete, fixed opportunistic, dynamic opportunistic, and dynamic opportunistic with cache.]

[Figure: f vs. number of function evaluations for GPS with n+1 directions: no search, LHS search, random search.]

[Figure: f vs. number of function evaluations for MADS with n+1 directions: no search, LHS search, random search.]

Discussion

- The MADS variant looks good as a 1st try, and is more general than the instance shown here.
- Numerically, randomness is a blessing and a curse.
- MADS can handle oracular or yes/no constraints.
- The underlying mesh is finer in MADS than in GPS: good for general searches and surrogates.
- MADS is the result of nonsmooth analysis pointing up the weaknesses in GPS.

Later today...

- This is a meeting about surrogates, but I did not talk about surrogates... Alison will present in the next talk the use of a surrogate in a specific mechanical engineering problem using GPS/MADS.
- MADS replaces GPS in our NOMADm and NOMAD software. Gilles and Mark will present a demo of these packages after lunch.

Rice 2004