
  • 1

    MADS - Mesh Adaptive Direct Search for constrained optimization

    Mark Abramson, Charles Audet, Gilles Couture, John Dennis

    www.gerad.ca/Charles.Audet/

    Thanks to: ExxonMobil, AFOSR, Boeing, LANL,

    FQRNT, NSERC, SANDIA, NSF.

    Rice 2004

  • 2

    Outline

    ! Statement of the optimization problem

    ! The GPS and MADS algorithm classes

    ! GPS theory and limiting examples

    ! The MADS algorithm class
      • An implementable MADS instance
      • MADS theory
      • Numerical results

    ! Discussion

  • 4

    The target problem

    (NLP)    minimize  f(x)
             subject to  x ∈ Ω,

    where f : Rn → R ∪ {∞} may be discontinuous and:

    ! the functions are expensive black boxes,

    ! the functions provide few correct digits and may fail even for x ∈ Ω,

    ! accurate approximation of derivatives is problematic.

  • 5

    minimize  f(x)
    subject to  x ∈ X,

    where f : Rn → R ∪ {∞} may be discontinuous and:

    ! the functions are expensive black boxes, often produced by simulations or output of MDO codes,

    ! the functions provide few correct digits and may fail even for x ∈ X,

    ! accurate approximation of derivatives is problematic,

    ! surrogate models s ≈ f and P ≈ X may be available.

  • 6

    Goals – or validation of the method

    NOMAD : Algo(NLP) → x̂

    if f is continuously differentiable, then ∇f(x̂) = 0
    if f is convex, then 0 ∈ ∂f(x̂)
    if f is Lipschitz near x̂, then 0 ∈ ∂f(x̂)

  • 7

    Clarke Calculus – for f Lipschitz near x

    ! Clarke generalized derivative at x in the direction v:

      f◦(x; v) = lim sup_{y→x, t↓0}  [f(y + tv) − f(y)] / t.

    ! The generalized gradient of f at x is the set

      ∂f(x) := {ζ ∈ Rn : f◦(x; v) ≥ vᵀζ for all v ∈ Rn}
             = co{ lim ∇f(yi) : yi → x and ∇f(yi) exists }.

    ! f◦(x; v) can be obtained from ∂f(x):

      f◦(x; v) = max{vᵀζ : ζ ∈ ∂f(x)}.
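    A quick worked example (added here for illustration, not on the original slide): for f(x) = |x| on R, which is Lipschitz but not differentiable at 0,

      f◦(0; v) = lim sup_{y→0, t↓0} [|y + tv| − |y|] / t = |v|,
      ∂f(0) = {ζ ∈ R : |v| ≥ vζ for all v ∈ R} = [−1, 1],
      f◦(0; v) = max{vζ : ζ ∈ [−1, 1]} = |v|.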


  • 9

    The two iterated phases of GPS and MADS

    ! The global search in the variable space is flexible enough to allow user heuristics that incorporate knowledge of the driving simulation model and facilitate the use of surrogate functions.

    ! The local poll around the incumbent solution is more rigidly defined, but it ensures convergence to a point satisfying necessary first-order optimality conditions.

    ! This talk focuses on the basic algorithm and the convergence analysis. In the next talks, Alison, Mark and Gilles will talk about surrogates in the search.

  • 10

    [Figure: the search step – mesh points in the (x1, x2) plane with objective f; ∗ is the incumbent solution, and a surrogate guides the choice of trial points.]

  • 11

    [Figure: the poll step – we poll near the incumbent solution; ∗ is still the incumbent solution.]

  • 12

    New iteration from the same incumbent solution, but on a finer mesh.

  • 13

    Positive spanning sets and meshes

    ! A positive spanning set D is a set of vectors whose nonnegative linear combinations span Rn.

    ! The mesh is centered around xk ∈ Rn and its fineness is parameterized by ∆^m_k > 0 as follows:

      Mk = {xk + ∆^m_k D z : z ∈ N^|D|}.

    [Figure: the mesh around xk for the example D = [I; −I].]
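    As a concrete illustration of this definition, the sketch below enumerates a small sample of mesh points for D = [I; −I] in n = 2; it is my own sketch, and the function and variable names are illustrative, not part of any MADS/NOMAD code.

    import numpy as np

    # Positive spanning set D = [I, -I] for n = 2, stored column-wise.
    n = 2
    D = np.hstack([np.eye(n), -np.eye(n)])            # shape (n, 2n)

    def mesh_points(xk, delta_m, zmax=3):
        """Enumerate the mesh points xk + delta_m * D @ z for z in {0,...,zmax-1}^|D|.

        The true mesh is infinite (z ranges over all of N^|D|); this only samples it."""
        pts = []
        for z in np.ndindex(*(zmax,) * D.shape[1]):
            pts.append(xk + delta_m * D @ np.array(z))
        return np.unique(np.round(pts, 12), axis=0)

    xk = np.array([1.0, 2.0])
    print(mesh_points(xk, delta_m=0.5).shape)         # a finite sample of mesh points around xk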

  • 14

    Basic pattern search algorithm for unconstrained optimization

    Given ∆^m_0, x0 ∈ M0 with f(x0) < ∞, and D,
    for k = 0, 1, · · · do:

    1. Employ some finite strategy to try to choose xk+1 ∈ Mk such that f(xk+1) < f(xk), and then set ∆^m_{k+1} = ∆^m_k or ∆^m_{k+1} = 2∆^m_k (xk+1 is called an improved mesh point);

    2. Else, if xk minimizes f(x) for x ∈ Pk, then set xk+1 = xk and ∆^m_{k+1} = ∆^m_k / 2 (xk is called a minimal frame center).
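    The following sketch turns the loop above into runnable code for the coordinate-search case D = [I; −I], with an empty search step and an opportunistic poll; doubling ∆^m on success and the stopping rule are choices I made within what the slide allows, and all names are mine.

    import numpy as np

    def coordinate_search(f, x0, delta0=1.0, tol=1e-6, max_iter=1000):
        """Basic pattern search with D = [I, -I]: poll the frame P_k, double the mesh
        size on an improved mesh point, halve it at a minimal frame center."""
        n = len(x0)
        D = np.vstack([np.eye(n), -np.eye(n)])         # the 2n poll directions (rows)
        x = np.asarray(x0, dtype=float)
        fx, delta = f(x), delta0
        for _ in range(max_iter):
            improved = False
            for d in D:                                # poll step (the search step is empty)
                y = x + delta * d
                fy = f(y)
                if fy < fx:                            # improved mesh point
                    x, fx, delta, improved = y, fy, 2 * delta, True
                    break                              # opportunistic: stop polling on success
            if not improved:                           # minimal frame center
                delta /= 2
            if delta < tol:
                break
        return x, fx

    # Example on a smooth convex quadratic: the iterates approach the origin.
    print(coordinate_search(lambda v: float(np.sum(v ** 2)), [1.3, -0.7]))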

  • 15

    The Coordinate Search (CS) frame Pk

    Pk = {xk + ∆^m_k d : d ∈ [I; −I]};  the 2n points adjacent to xk in Mk.

    [Figure: the frames p1, . . . , p4 around xk, xk+1, xk+2 for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4.]

    Always the same 2n = 4 directions, regardless of ∆k.

  • 16

    The GPS frame Pk

    Pk = {xk + ∆^m_k d : d ∈ Dk ⊂ D};  points adjacent to xk in Mk (with respect to the positive spanning set Dk).

    [Figure: the frames p1, p2, p3 around xk, xk+1, xk+2 for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4.]

    Here, only 14 different ways of selecting Dk, regardless of ∆k.


  • 18

    Convergence results – unconstrained GPS

    If all iterates are in a compact set, then lim inf_k ∆^m_k = 0.

    ! The mesh is refined only at a minimal frame center.

    ! There is a limit point x̂ of a subsequence {xk}k∈K of minimal frame centers with {∆^m_k}k∈K → 0.
      {xk}k∈K is called a refining subsequence.

    ! f(xk) ≤ f(xk + ∆^m_k d) for all d ∈ Dk ⊂ D with k ∈ K.
      Let D̂ ⊆ D be the set of poll directions used infinitely often in the refining subsequence.
      D̂ is the set of refining directions.

  • 19

    Set of refining directions D̂

    [Figure: the minimal frame centers xk1, xk2, . . . , xk5, . . . of a refining subsequence, each with its poll points xki + ∆^m_ki d for d ∈ Dki, converging to x̂ = lim_{k∈K} xk; the poll directions used infinitely often form D̂.]

  • 20

    Convergence results – unconstrained GPS

    If all iterates are in a compact set, then for any refining subsequence (with limit x̂ and refining directions D̂):

    ! f is Lipschitz near x̂ ⇒ f◦(x̂; d) ≥ 0 for every d ∈ D̂.
      This says that the Clarke derivatives are nonnegative on a finite set of directions that positively span Rn, since

      f◦(x̂; d) := lim sup_{y→x̂, t↓0} [f(y + td) − f(y)] / t  ≥  lim_{k∈K} [f(xk + ∆k d) − f(xk)] / ∆k  ≥  0,

      the last inequality holding because every poll step of the refining subsequence fails: f(xk + ∆k d) ≥ f(xk).

    ! f is regular at x̂ ⇒ f′(x̂; d) ≥ 0 for every d ∈ D̂.

    ! f is strictly differentiable at x̂ ⇒ ∇f(x̂) = 0.

  • 21

    Limitations of GPS

    GPS methods are directional, so the restriction to a finite set of directions is a big limitation, particularly when dealing with nonlinear constraints.

    GPS with an empty search: the iterates stall at x0.

    [Figure: level sets of f(x) = ‖x‖∞ and the starting point x0 = (1, 1)ᵀ, where the coordinate-search iterates stall.]

    Even with a C1 function, GPS may generate infinitely many limit points, some of them non-stationary.
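    A quick numerical check of the stalling example above (my own sketch, not from the slides): at x0 = (1, 1)ᵀ, no coordinate poll direction strictly decreases f(x) = ‖x‖∞, at any mesh size, even though x0 is clearly not a minimizer.

    import numpy as np

    f = lambda x: float(np.max(np.abs(x)))             # f(x) = ||x||_inf
    x0 = np.array([1.0, 1.0])

    # Coordinate poll directions [I, -I]: no strict decrease, for any mesh size.
    D = np.vstack([np.eye(2), -np.eye(2)])
    for delta in [1.0, 0.5, 0.25, 0.125]:
        best = min(f(x0 + delta * d) for d in D)
        print(delta, best < f(x0))                     # always False -> the poll never improves

    # Yet x0 is not optimal: the (non-coordinate) direction (-1, -1) gives descent.
    print(f(x0 + 0.5 * np.array([-1.0, -1.0])))        # 0.5 < f(x0) = 1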

  • 22

    GPS convergence to a bad solution

    [Figure: level sets of f near the origin and the generalized gradient ∂f(0, 0); note that (0, 0) ∉ ∂f(0, 0).]

    GPS iterates – with a bad strategy – converge to the origin, where the gradient exists and is nonzero (f is differentiable at (0, 0) but not strictly differentiable).


  • 24

    The MADS frames

    In addition to the mesh size parameter ∆^m_k, we introduce the poll size parameter ∆^p_k.  Ex: ∆^p_k = n √(∆^m_k).

    [Figure: MADS frames p1, p2, p3 around xk for (∆^m_k, ∆^p_k) = (1, 2), (1/4, 1), and (1/16, 1/2).]

    Number of ways of selecting Dk increases as ∆^p_k gets smaller.
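    To make the last remark concrete, the sketch below (mine, with illustrative names) counts, for the three (∆^m, ∆^p) pairs of the figure and n = 2, how many mesh points lie inside the poll-size box around xk; the count grows as the mesh becomes finer relative to the poll size, which is what gives MADS its richer choice of poll directions.

    import numpy as np

    def candidate_poll_points(delta_m, delta_p, n=2):
        """Count nonzero integer steps z with |delta_m * z_i| <= delta_p for every i,
        i.e. mesh points that stay inside the poll-size box around xk."""
        zmax = int(round(delta_p / delta_m))           # the loop bounds enforce the box constraint
        count = 0
        for z in np.ndindex(*(2 * zmax + 1,) * n):
            z = np.array(z) - zmax
            if np.any(z):                              # exclude the frame center itself
                count += 1
        return count

    for dm, dp in [(1.0, 2.0), (0.25, 1.0), (1.0 / 16.0, 0.5)]:
        print(dm, dp, candidate_poll_points(dm, dp))   # 24, 80, 288 candidate mesh points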

  • 25

    Barrier approach to constraints

    To enforce the Ω constraints, replace f by a barrier objective:

      fΩ(x) := f(x) if x ∈ Ω,  +∞ otherwise.

    This is a standard construct in nonsmooth optimization.

    Then apply the unconstrained algorithm to fΩ.
    This is NOT a standard construct in optimization algorithms.

    Quality of the limit solution depends on the local smoothness of f, not of fΩ.
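    A minimal sketch of this extreme-barrier construction (the names are mine); the wrapped function can be handed directly to the unconstrained loop sketched earlier.

    import math

    def make_barrier(f, in_Omega):
        """Barrier objective f_Omega: equal to f on Omega, +infinity outside."""
        def f_Omega(x):
            return f(x) if in_Omega(x) else math.inf
        return f_Omega

    # Example with the disk problem of a later slide: min x + y  s.t.  x^2 + y^2 <= 6.
    f_Omega = make_barrier(lambda x: x[0] + x[1],
                           lambda x: x[0] ** 2 + x[1] ** 2 <= 6.0)
    print(f_Omega([1.0, 1.0]), f_Omega([3.0, 3.0]))    # 2.0  inf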

  • 26

    A MADS instance

    note: GPS = MADS with ∆^p_k = ∆^m_k.

    An implementable way to generate Dk (see the sketch below):

    ! Let B be a lower triangular nonsingular random integer matrix.

    ! Randomly permute the rows of B.

    ! Complete to a positive basis:
      • Dk = [B; −B] (maximal positive basis, 2n directions), or
      • Dk = [B; −Σ_{b∈B} b] (minimal positive basis, n+1 directions).
      • Use Luis' talk to order the poll directions.
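    A schematic sketch of this recipe; it is my own simplification, and the integer ranges and scaling below are illustrative rather than the exact LTMADS construction of the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_Dk(n, ell, maximal=True):
        """Random lower-triangular nonsingular integer matrix B, rows permuted,
        completed to a positive basis; directions are returned as rows."""
        bound = 2 ** ell
        B = np.tril(rng.integers(-bound + 1, bound, size=(n, n)))
        np.fill_diagonal(B, rng.choice([-bound, bound], size=n))   # nonzero diagonal => nonsingular
        B = B[rng.permutation(n), :]                               # randomly permute the rows of B
        if maximal:
            return np.vstack([B, -B])                # 2n directions (maximal positive basis)
        return np.vstack([B, -B.sum(axis=0)])        # n+1 directions (minimal positive basis)

    print(make_Dk(n=3, ell=2))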

  • 27

    Dense polling directions

    Theorem 1. As k → ∞, MADS's polling directions form a dense set in Rn (with probability 1).

    The ultimate goal is a way to be sure that the subset of refining directions D̂ is dense.

    Then the barrier approach to constraints promises strong optimality under weak assumptions – the existence of a hypertangent vector, e.g., a vector that makes a negative inner product with all the active constraint gradients.

  • 28

    MADS convergence results

    Let f be Lipschitz near a limit x̂ of a refining subsequence.

    Theorem 2. Suppose that D̂ is dense in Ω.
    • If either Ω = Rn, or x̂ ∈ int(Ω), then 0 ∈ ∂f(x̂).

    Theorem 3. Suppose that D̂ is dense in T^H_Ω(x̂) ≠ ∅.
    • Then x̂ is a Clarke stationary point of f over Ω:
      f◦(x̂; v) ≥ 0 for all v ∈ T^Cl_Ω(x̂).
    • In addition, if f is strictly differentiable at x̂ and if Ω is regular at x̂, then x̂ is a contingent KKT stationary point of f over Ω: −∇f(x̂)ᵀv ≤ 0 for all v ∈ T^Co_Ω(x̂).

  • 29

    A problem for which GPS stagnates

    [Figure: iterates in the (x, y) plane, with a zoomed view near the region where GPS stagnates.]

  • 30

    Our results


  • 31

    Results for a ChemE PID problem

  • 32

    Constrained optimization

    A disk constrained problem:

      min_{x,y}  x + y
      s.t.       x² + y² ≤ 6

    How hard can that be?

    Very hard for GPS and filter-GPS with the standard 2n directions and an empty search.
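    For reference (my own short computation, not on the slide), this problem has a simple closed-form solution, which is the value the convergence plot that follows should approach: at the optimum the constraint is active and (1, 1) must be parallel to the outward normal (2x, 2y) with a nonpositive multiplier, so x = y < 0 and x² + y² = 6, giving

      (x*, y*) = (−√3, −√3),   f* = −2√3 ≈ −3.464.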

  • 33

    [Figure: objective value f versus number of function evaluations (50–450) on the disk problem, with dynamic 2n directions – GPS, Filter-GPS, and Barrier-MADS (5 runs).]

  • 34

    Parameter fit in a rheology problem

    Rheology is a branch of mechanics that studies properties of materials which determine their response to mechanical force.

    MODEL: The viscosity η of a material can be modelled as a function of the shear rate γ̇:

      η(γ̇) = η0 (1 + λ² γ̇²)^((β−1)/2)

    A parameter fit problem.

  • 35

    Observation i | Strain rate γ̇i (s−1) | Viscosity ηi (Pa · s)
     1 | 0.0137 | 3220
     2 | 0.0274 | 2190
     3 | 0.0434 | 1640
     4 | 0.0866 | 1050
     5 | 0.137  | 766
     6 | 0.274  | 490
     7 | 0.434  | 348
     8 | 0.866  | 223
     9 | 1.37   | 163
    10 | 2.74   | 104
    11 | 4.34   | 76.7
    12 | 5.46   | 68.1
    13 | 6.88   | 58.2

    The unconstrained optimization problem:

      min_{η0, λ, β}  g(η0, λ, β),   with   g = Σ_{i=1}^{13} |η(γ̇i) − ηi|.
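    A direct transcription of this data and objective into code (a sketch; the parameter values in the last line are arbitrary, just to show one black-box evaluation):

    # (shear rate gamma_i [1/s], observed viscosity eta_i [Pa*s]) from the table above
    data = [(0.0137, 3220), (0.0274, 2190), (0.0434, 1640), (0.0866, 1050),
            (0.137, 766), (0.274, 490), (0.434, 348), (0.866, 223),
            (1.37, 163), (2.74, 104), (4.34, 76.7), (5.46, 68.1), (6.88, 58.2)]

    def eta(gdot, eta0, lam, beta):
        """Viscosity model eta(gdot) = eta0 * (1 + lam^2 * gdot^2)^((beta - 1) / 2)."""
        return eta0 * (1.0 + lam ** 2 * gdot ** 2) ** ((beta - 1.0) / 2.0)

    def g(eta0, lam, beta):
        """Nonsmooth misfit g = sum_i |eta(gdot_i) - eta_i|, minimized over (eta0, lam, beta)."""
        return sum(abs(eta(gd, eta0, lam, beta) - e) for gd, e in data)

    print(g(3000.0, 30.0, 0.3))   # one evaluation of the black-box objective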

  • 36

    [Figure: f versus number of function evaluations on the rheology problem – Coordinate search; Fixed Complete; Fixed opportunist; Dynamic opportunist; Dynamic opportunist with cache.]

  • 37

    [Figure: f versus number of function evaluations – GPS with n+1 directions: no search, LHS search, random search.]

  • 38

    [Figure: f versus number of function evaluations – MADS with n+1 directions: no search, LHS search, random search.]

  • 39

    Discussion

    ! The MADS variant looks good as a first try, and is more general than the instance shown here.

    ! Numerically, randomness is a blessing and a curse.

    ! MADS can handle oracular or yes/no constraints.

    ! The underlying mesh is finer in MADS than in GPS: good for general searches and surrogates.

    ! MADS is the result of nonsmooth analysis pointing up the weaknesses in GPS.

  • 40

    Later today...

    ! This is a meeting about surrogates, but I did not talk about surrogates... Alison will present in the next talk the use of a surrogate in a specific mechanical engineering problem using GPS/MADS.

    ! MADS replaces GPS in our NOMADm and NOMAD software. Gilles and Mark will present a demo of these packages after lunch.