Characterizing the Worst-Case Performance of Algorithms for Nonconvex Optimization
Transcript of "Characterizing the Worst-Case Performance of Algorithms for Nonconvex Optimization"
Motivation Contemporary Analyses Partitioning Common Methods Summary
Characterizing the Worst-Case Performance of Algorithms for Nonconvex Optimization
Frank E. Curtis, Lehigh University
joint work with
Daniel P. Robinson, Johns Hopkins University
presented at
Rutgers University
29 March 2018
Characterizing the Worst-Case Performance of Algorithms for Nonconvex Optimization 1 of 43
Outline
Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Common Methods
Summary & Perspectives
Problem
Consider the problem of minimizing an objective function $f : \mathbb{R}^n \to \mathbb{R}$:

$$\min_{x \in \mathbb{R}^n} f(x).$$

Various iterative algorithms have been proposed of the form

$$x_{k+1} \gets x_k + s_k \quad \text{for all } k \in \mathbb{N},$$

where $\{x_k\}$ is the iterate sequence and $\{s_k\}$ is the step sequence.
Question: What is the best algorithm for solving my problem?
History
Nonlinear optimization algorithm design has had parallel developments:

| | convexity | smoothness |
| --- | --- | --- |
| key figures | Rockafellar, Fenchel, Nemirovski, Nesterov | Powell, Fletcher, Goldfarb, Nocedal |
| central tool | subgradient inequality | sufficient decrease |
| guarantees | convergence, complexity guarantees | convergence, fast local convergence |

Worlds are finally colliding!
Worst-case complexity for convex optimization
Worst-case complexity: an upper limit on the resources an algorithm will require to (approximately) solve a given problem.

. . . for convex optimization: a bound on the number of iterations (or function or derivative evaluations) until

$$\|x_k - x_*\| \le \epsilon_x \quad \text{or} \quad f(x_k) - f(x_*) \le \epsilon_f,$$

where $x_*$ is some global minimizer of $f$.

Fact(?): In the convex setting, better complexity often implies better performance.

(Really, one also needs to consider work complexity, conditioning, structure, etc.)
Worst-case complexity for nonconvex optimization
. . . for nonconvex optimization: here is how we do it now.

Since one generally cannot guarantee that $\{x_k\}$ converges to a minimizer, one asks for an upper bound on the number of iterations until

$$\|\nabla f(x_k)\| \le \epsilon_g \quad \text{(first-order stationarity)}$$

and maybe also

$$\lambda(\nabla^2 f(x_k)) \ge -\epsilon_H \quad \text{(second-order stationarity)}.$$

For example, for first-order stationarity the following bounds are known:

| Algorithm | iterations until $\|\nabla f(x_k)\|_2 \le \epsilon_g$ |
| --- | --- |
| Gradient descent | $O(\epsilon_g^{-2})$ |
| Newton / trust region | $O(\epsilon_g^{-2})$ |
| Cubic regularization | $O(\epsilon_g^{-3/2})$ |
Self-examination
But. . .

- Is this the best way to characterize our algorithms?
- Is this the best way to represent our algorithms?

People listen! Cubic regularization. . .

- Griewank (1981)
- Nesterov & Polyak (2006)
- Weiser, Deuflhard, Erdmann (2007)
- Cartis, Gould, Toint (2011), the ARC method

. . . is a framework to which researchers have been attracted. . .

- Agarwal, Allen-Zhu, Bullins, Hazan, Ma (2017)
- Carmon, Duchi (2017)
- Kohler, Lucchi (2017)
- Peng, Roosta-Khorasani, Mahoney (2017)

However, there remains a large gap between theory and practice!

(Trust-region methods arguably perform better in general.)
Example: Matrix factorization
Symmetric low-rank matrix factorization problem:
$$\min_{X \in \mathbb{R}^{d \times r}} \tfrac{1}{2}\|XX^T - M\|_F^2,$$

where $M \in \mathbb{R}^{d \times d}$ with $\mathrm{rank}(M) = r$.

- Nonconvex, but. . .
- the global minimum value is known (it's zero)
- all local minima are global minima
Jin, Ge, Netrapalli, Kakade, Jordan (2017)
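The setup above is easy to explore numerically. The following sketch (the dimensions, random seed, step size, and iteration count are illustrative assumptions, not values from the talk) builds a rank-$r$ symmetric target, evaluates the objective, and runs a few gradient-descent steps; for symmetric $M$ the gradient is $2(XX^T - M)X$:

```python
import numpy as np

# Illustrative sketch of the symmetric low-rank factorization problem;
# d, r, the seed, and the step size are arbitrary choices.
rng = np.random.default_rng(0)
d, r = 6, 2
X_true = rng.standard_normal((d, r))
M = X_true @ X_true.T               # symmetric target with rank(M) = r

def f(X):
    """Objective 0.5 * ||X X^T - M||_F^2."""
    R = X @ X.T - M
    return 0.5 * np.sum(R * R)

def grad_f(X):
    """Gradient 2 (X X^T - M) X (valid since M is symmetric)."""
    return 2.0 * (X @ X.T - M) @ X

# The known global minimum value (zero) is attained at the planted factor.
X = X_true + 0.1 * rng.standard_normal((d, r))   # perturbed start
f_start = f(X)
for _ in range(200):                              # plain gradient descent
    X -= 0.01 * grad_f(X)
print(f(X_true), f_start > f(X))
```

Running gradient descent from a small perturbation of the planted factor decreases the objective back toward the known global minimum value of zero.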
Example: Dictionary learning
Learning a representation of input data in the form of linear combinations of some (unknown) basic elements, called atoms, which compose a dictionary:

$$\min_{X \in \mathcal{X},\, Y \in \mathbb{R}^{n \times n}} \|Z - XY\|^2 + \psi(Y)$$

$$\text{with } \mathcal{X} := \{X \in \mathbb{R}^{d \times n} : \|X_i\|_2 \le 1 \text{ for all } i \in \{1, \dots, n\}\},$$

where $Z \in \mathbb{R}^{d \times n}$ is a given input.

Nonconvex, but, under some conditions, all saddle points can be "escaped".
Sun, Qu, Wright (2016)
Other examples
- Phase retrieval
- Orthogonal tensor decomposition
- Deep linear learning
- . . .
Pedagogical example

But if we're talking about nonconvex optimization, we also could have. . .

What real problem exhibits this behavior? (I don't know!)
More on this example later. . .
Purpose of this talk
Our goal: a complementary approach to characterizing algorithms.

- global convergence
- worst-case complexity: contemporary type + our approach
- local convergence rate

We're admitting: our approach does not always give the complete picture.

But we believe it is useful.
Outline
Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Common Methods
Summary & Perspectives
Simple setting
Suppose the gradient $g := \nabla f$ is Lipschitz continuous with constant $L > 0$. Consider the iteration

$$x_{k+1} \gets x_k - \tfrac{1}{L} g_k \quad \text{for all } k \in \mathbb{N}, \quad \text{where } g_k := \nabla f(x_k).$$

A contemporary complexity analysis considers the set

$$\mathcal{G}(\epsilon_g) := \{x \in \mathbb{R}^n : \|g(x)\|_2 \le \epsilon_g\}$$

and aims to find an upper bound on the cardinality of

$$\mathcal{K}_g(\epsilon_g) := \{k \in \mathbb{N} : x_k \notin \mathcal{G}(\epsilon_g)\}.$$
Upper bound on $|\mathcal{K}_g(\epsilon_g)|$

Using $s_k = -\tfrac{1}{L} g_k$ and the upper bound

$$f_{k+1} \le f_k + g_k^T s_k + \tfrac{L}{2}\|s_k\|_2^2,$$

one finds with $f_{\inf} := \inf_{x \in \mathbb{R}^n} f(x)$ that

$$f_k - f_{k+1} \ge \tfrac{1}{2L}\|g_k\|_2^2$$
$$\implies f_0 - f_{\inf} \ge \tfrac{1}{2L}\,|\mathcal{K}_g(\epsilon_g)|\,\epsilon_g^2$$
$$\implies |\mathcal{K}_g(\epsilon_g)| \le 2L(f_0 - f_{\inf})\,\epsilon_g^{-2}.$$
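This bound can be checked empirically. In the sketch below (the test function $f(x) = \log\cosh(x)$, starting point, and tolerance are illustrative choices, not from the talk), the gradient $\tanh(x)$ is Lipschitz with $L = 1$ and $f_{\inf} = 0$, so we can count the actual iterations of $x_{k+1} = x_k - \tfrac{1}{L} g_k$ and compare against $2L(f_0 - f_{\inf})\epsilon_g^{-2}$:

```python
import math

# Sketch: count gradient-descent iterations with step 1/L on
# f(x) = log(cosh(x)); its gradient tanh(x) is Lipschitz with L = 1,
# and f_inf = 0. The starting point and tolerance are arbitrary.
L = 1.0
f = lambda x: math.log(math.cosh(x))
g = math.tanh

def iters_until_small_grad(x0, eps_g, max_iter=100_000):
    """Return |K_g(eps_g)|: the number of iterates with |g(x_k)| > eps_g."""
    x, k = x0, 0
    while abs(g(x)) > eps_g and k < max_iter:
        x -= g(x) / L
        k += 1
    return k

x0, eps_g = 3.0, 0.1
bound = 2.0 * L * (f(x0) - 0.0) / eps_g**2   # 2 L (f_0 - f_inf) eps^-2
k = iters_until_small_grad(x0, eps_g)
print(k, bound)  # the observed count sits far below the worst-case bound
```

On this run the observed count is orders of magnitude below the bound, previewing the pessimism discussed later in the talk.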
"Nice" $f$

But what if $f$ is "nice"?

E.g., satisfying the Polyak-Łojasiewicz (PL) condition for some $c \in (0, \infty)$, i.e.,

$$f(x) - f_{\inf} \le \tfrac{1}{2c}\|g(x)\|_2^2 \quad \text{for all } x \in \mathbb{R}^n.$$

Now consider the set

$$\mathcal{F}(\epsilon_f) := \{x \in \mathbb{R}^n : f(x) - f_{\inf} \le \epsilon_f\}$$

and consider an upper bound on the cardinality of

$$\mathcal{K}_f(\epsilon_f) := \{k \in \mathbb{N} : x_k \notin \mathcal{F}(\epsilon_f)\}.$$
Upper bound on $|\mathcal{K}_f(\epsilon_f)|$

Using $s_k = -\tfrac{1}{L} g_k$ and the upper bound

$$f_{k+1} \le f_k + g_k^T s_k + \tfrac{L}{2}\|s_k\|_2^2,$$

one finds that

$$f_k - f_{k+1} \ge \tfrac{1}{2L}\|g_k\|_2^2 \ge \tfrac{c}{L}(f_k - f_{\inf})$$
$$\implies \left(1 - \tfrac{c}{L}\right)(f_k - f_{\inf}) \ge f_{k+1} - f_{\inf}$$
$$\implies \left(1 - \tfrac{c}{L}\right)^k (f_0 - f_{\inf}) \ge f_k - f_{\inf}$$
$$\implies |\mathcal{K}_f(\epsilon_f)| \le \log\left(\frac{f_0 - f_{\inf}}{\epsilon_f}\right)\left(\log\left(\frac{L}{L - c}\right)\right)^{-1}.$$
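The linear rate can be verified on a toy PL function. In this sketch (tolerance chosen arbitrarily), $f(x) = x^2/2$ satisfies the PL condition with $c = 1$, since $f(x) - f_{\inf} = g(x)^2/2$, and gradient descent with step $1/L$ for $L = 2$ halves the iterate at every step:

```python
import math

# Sketch: f(x) = x^2/2 satisfies PL with c = 1, since
# f(x) - f_inf = g(x)^2 / 2; run gradient descent with step 1/L, L = 2.
L, c, x0, f_inf = 2.0, 1.0, 10.0, 0.0
f = lambda x: 0.5 * x * x
g = lambda x: x

eps_f = 1e-6
x, k = x0, 0
while f(x) - f_inf > eps_f:   # count iterates outside F(eps_f)
    x -= g(x) / L             # here x_{k+1} = x_k / 2
    k += 1

bound = math.log((f(x0) - f_inf) / eps_f) / math.log(L / (L - c))
print(k, bound)  # the observed count respects the O(log(1/eps_f)) bound
```

The observed iteration count grows only logarithmically in $1/\epsilon_f$, exactly as the derivation predicts.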
For the first step. . .
In the "general nonconvex" analysis, the guaranteed decrease for the first step is much more pessimistic:

$$\text{general nonconvex:} \quad f_0 - f_1 \ge \tfrac{1}{2L}\epsilon_g^2$$
$$\text{PL condition:} \quad \left(1 - \tfrac{c}{L}\right)(f_0 - f_{\inf}) \ge f_1 - f_{\inf}$$

. . . and it remains more pessimistic throughout!
Upper bounds on $|\mathcal{K}_f(\epsilon_f)|$ versus $|\mathcal{K}_g(\epsilon_g)|$

Let $f(x) = \tfrac{1}{2}x^2$, meaning that $g(x) = x$.

- Let $\epsilon_f = \tfrac{1}{2}\epsilon_g^2$, meaning that $\mathcal{F}(\epsilon_f) = \mathcal{G}(\epsilon_g)$.
- Let $x_0 = 10$, $c = 1$, and $L = 2$. (Similar pictures for any $L > 1$.)
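With these parameters the two theoretical bounds can be tabulated directly; this sketch stands in for the slide's plot (the sequence of tolerances is an illustrative choice):

```python
import math

# Slide parameters: x0 = 10, c = 1, L = 2, and eps_f = eps_g^2 / 2 so
# that F(eps_f) = G(eps_g); tabulate both upper bounds as eps_g shrinks.
L, c, x0, f_inf = 2.0, 1.0, 10.0, 0.0
f0 = 0.5 * x0 * x0

for eps_g in (1e-1, 1e-2, 1e-3):
    eps_f = 0.5 * eps_g**2
    bound_g = 2.0 * L * (f0 - f_inf) / eps_g**2              # general nonconvex
    bound_f = (math.log((f0 - f_inf) / eps_f)
               / math.log(L / (L - c)))                      # PL condition
    print(eps_g, bound_g, bound_f)  # bound_f grows only logarithmically
```

The general nonconvex bound grows like $\epsilon_g^{-2}$ while the PL-based bound grows only logarithmically, so the gap between the two widens dramatically as the tolerance shrinks.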
Upper bounds on $|\mathcal{K}_f(\epsilon_f)|$ versus $|\{k \in \mathbb{N} : \tfrac{1}{2}\|g_k\|_2^2 > \epsilon_g\}|$

Let $f(x) = \tfrac{1}{2}x^2$, meaning that $\tfrac{1}{2}g(x)^2 = \tfrac{1}{2}x^2$.

- Let $\epsilon_f = \epsilon_g$, meaning that $\mathcal{F}(\epsilon_f) = \mathcal{G}(\epsilon_g)$.
- Let $x_0 = 10$, $c = 1$, and $L = 2$. (Similar pictures for any $L > 1$.)
Bad worst-case!
Worst-case complexity bounds in the general nonconvex case are very pessimistic.

- The analysis immediately admits a large gap when the function is nice.
- The "essentially tight" examples for the worst-case bounds are. . . weird.¹

¹Cartis, Gould, Toint (2010)
Plea
My plea to optimizers:

Let's not have these be the problems that dictate how we

- characterize our algorithms and
- represent our algorithms to the world!
Outline
Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Common Methods
Summary & Perspectives
Motivation
We want a characterization strategy that

- attempts to capture behavior in actual practice
  - i.e., is not "bogged down" by pedagogical examples
- can be applied consistently across different classes of functions
- shows more than just the worst of the worst case

Our idea is to

- partition the search space (dependent on $f$ and $x_0$)
- analyze how an algorithm behaves over different regions
- characterize an algorithm's behavior by region

For some functions, there will be holes, but for some of interest there are none!
Intuition
Think about an arbitrary point in the search space, i.e., in

$$\mathcal{L} := \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}.$$

- If $\|g(x)\|_2 \gg 0$, then "a lot" of progress can be made.
- If $\min(\mathrm{eig}(\nabla^2 f(x))) \ll 0$, then "a lot" of progress can also be made.
Assumption
Assumption 1

- $f$ is $\bar{p}$-times continuously differentiable
- $f$ is bounded below by $f_{\inf} := \inf_{x \in \mathbb{R}^n} f(x)$
- for all $p \in \{1, \dots, \bar{p}\}$, there exists $L_p \in (0, \infty)$ such that

$$f(x+s) \le \underbrace{f(x) + \sum_{j=1}^{p} \frac{1}{j!}\nabla^j f(x)[s]^j}_{t_p(x,s)} + \frac{L_p}{p+1}\|s\|_2^{p+1}$$
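As a quick numerical sanity check of this assumption (the test function $f(x) = \log\cosh(x)$ and the constants $L_1 = L_2 = 1$ are illustrative, safe over-estimates, not values from the talk):

```python
import math

# Check the p = 1 and p = 2 bounds for f(x) = log(cosh(x)), for which
# |f''| <= 1 and |f'''| <= 1, so L_1 = L_2 = 1 are valid constants.
f = lambda x: math.log(math.cosh(x))
f1 = math.tanh                               # f'
f2 = lambda x: 1.0 / math.cosh(x) ** 2       # f''

ok = True
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):        # a few sample base points
    for s in (-1.5, -0.1, 0.2, 2.0):         # a few sample steps
        t1 = f(x) + f1(x) * s                # t_1(x, s)
        t2 = t1 + 0.5 * f2(x) * s * s        # t_2(x, s)
        ok &= f(x + s) <= t1 + 0.5 * abs(s) ** 2          # p = 1, L_1 = 1
        ok &= f(x + s) <= t2 + (1.0 / 3.0) * abs(s) ** 3  # p = 2, L_2 = 1
print(ok)
```

Every sampled pair satisfies both inequalities, as the Taylor remainder bound guarantees whenever $L_p$ dominates the relevant derivative.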
$p$th-order term reduction

Definition 2

For each $p \in \{1, \dots, \bar{p}\}$, define the function

$$m_p(x, s) = \frac{1}{p!}\nabla^p f(x)[s]^p + \frac{r_p}{p+1}\|s\|_2^{p+1}.$$

Letting $s_p^m(x) := \arg\min_{s \in \mathbb{R}^n} m_p(x, s)$, the reduction in the $p$th-order term from $x$ is

$$\Delta m_p(x) = m_p(x, 0) - m_p(x, s_p^m(x)) \ge 0.$$

*The exact definition of $r_p$ is not complicated, but we'll skip it here
1st-order and 2nd-order term reductions
Theorem 3

For $\bar{p} \ge 2$, the following hold:

$$\Delta m_1(x) = \frac{1}{2r_1}\|\nabla f(x)\|_2^2 \quad \text{and} \quad \Delta m_2(x) = \frac{1}{6r_2^2}\max\{-\lambda(\nabla^2 f(x)),\, 0\}^3.$$
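Both closed forms can be verified directly from Definition 2 (the values of $r_1$, $r_2$ and the stand-in gradient and Hessian below are arbitrary choices). For $p = 1$, minimizing $g^T s + \tfrac{r_1}{2}\|s\|_2^2$ gives $s = -g/r_1$; for $p = 2$, the global minimizer of $\tfrac12 s^T H s + \tfrac{r_2}{3}\|s\|_2^3$ lies along the eigenvector of $\lambda_{\min}$, at magnitude $-\lambda_{\min}/r_2$ when $\lambda_{\min} < 0$:

```python
import numpy as np

# Stand-in data: r1, r2 and the derivative objects are arbitrary choices.
r1, r2 = 2.0, 3.0
g = np.array([1.0, -2.0])                   # plays the role of grad f(x)
H = np.array([[1.0, 0.0], [0.0, -4.0]])     # plays the role of the Hessian

# p = 1: m_1(x, s) = g^T s + (r1/2)||s||^2 is minimized at s = -g/r1.
s1 = -g / r1
dm1 = 0.0 - (g @ s1 + 0.5 * r1 * (s1 @ s1))

# p = 2: m_2(x, s) = (1/2) s^T H s + (r2/3)||s||^3; along the eigenvector
# of lam_min the minimizer has magnitude t = -lam_min / r2 (lam_min < 0).
lam = np.linalg.eigvalsh(H)[0]              # eigvalsh returns ascending order
t = max(-lam, 0.0) / r2
dm2 = -(0.5 * lam * t**2 + (r2 / 3.0) * t**3)

print(dm1, dm2)  # equal to ||g||^2/(2 r1) and max{-lam, 0}^3/(6 r2^2)
```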
Regions
We propose to partition the search space, given $(\kappa, f_{\mathrm{ref}}) \in (0,1) \times [f_{\inf}, f(x_0)]$, into

$$\mathcal{R}_1 := \{x \in \mathcal{L} : \Delta m_1(x) \ge \kappa(f(x) - f_{\mathrm{ref}})\},$$

$$\mathcal{R}_p := \{x \in \mathcal{L} : \Delta m_p(x) \ge \kappa(f(x) - f_{\mathrm{ref}})\} \setminus \bigcup_{j=1}^{p-1}\mathcal{R}_j \quad \text{for all } p \in \{2, \dots, \bar{p}\},$$

$$\text{and} \quad \mathcal{R} := \mathcal{L} \setminus \bigcup_{j=1}^{\bar{p}}\mathcal{R}_j.$$

*We don't need $f_{\mathrm{ref}} = f_{\inf}$, but, for simplicity, think of it that way here
Illustration
($\bar{p} = 2$) $\mathcal{R}_1$: black, $\mathcal{R}_2$: gray, $\mathcal{R}$: white (partition figure omitted)
Functions satisfying Polyak-Łojasiewicz

Theorem 4

A continuously differentiable $f$ with a Lipschitz continuous gradient satisfies the Polyak-Łojasiewicz condition if and only if $\mathcal{R}_1 = \mathcal{L}$ for any $x_0 \in \mathbb{R}^n$.

Hence, if we prove something about the behavior of an algorithm over $\mathcal{R}_1$, then

- we know how it behaves if $f$ satisfies PL and
- we know how it behaves at any point satisfying the PL inequality.
Functions satisfying a strict-saddle-type property

Theorem 5

If $f$ is twice-continuously differentiable with Lipschitz continuous gradient and Hessian functions such that, at all $x \in \mathcal{L}$ and for some $\zeta \in (0, \infty)$, one has

$$\max\{\|\nabla f(x)\|_2^2,\; -\lambda_{\min}(\nabla^2 f(x))^3\} \ge \zeta(f(x) - f_{\inf}),$$

then $\mathcal{R}_1 \cup \mathcal{R}_2 = \mathcal{L}$.
Outline
Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Common Methods
Summary & Perspectives
Linearly convergent behavior over Rp
Let swp(x) be a minimum-norm global minimizer of the regularized Taylor model

wp(x, s) = tp(x, s) + (lp / (p + 1)) ‖s‖2^(p+1).

Theorem 6

If {xk} is generated by the iteration

xk+1 ← xk + swp(xk),

then, with εf ∈ (0, f(x0) − fref), the number of iterations in

Rp ∩ {x ∈ Rn : f(x) − fref ≥ εf}

is bounded above by

⌈ log( (f(x0) − fref) / εf ) / log( 1 / (1 − κ) ) ⌉ = O( log( (f(x0) − fref) / εf ) ).
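The bound in Theorem 6 is easy to evaluate numerically; the sketch below computes it for an illustrative initial gap, tolerance εf, and per-iteration contraction factor κ (all three values are assumptions, not from the talk):

```python
import math

# Evaluate the Theorem 6 iteration bound
#   ceil( log((f(x0) - f_ref)/eps_f) / log(1/(1 - kappa)) )
# for illustrative inputs.

def iteration_bound(gap0, eps_f, kappa):
    """Upper bound on iterations spent in R_p above the f_ref + eps_f level."""
    return math.ceil(math.log(gap0 / eps_f) / math.log(1.0 / (1.0 - kappa)))

# Example: initial gap 1e2, tolerance 1e-6, kappa = 0.5 (gap halves per step).
print(iteration_bound(1e2, 1e-6, 0.5))  # -> 27
```

With κ = 0.5 the gap halves each iteration, so the bound is the number of halvings needed to cross eight orders of magnitude, ⌈log2(1e8)⌉ = 27.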
Regularized gradient and Newton methods
- Regularized gradient method: computes sk by solving

  min_{s ∈ Rn}  f(xk) + ∇f(xk)ᵀ s + (l1/2) ‖s‖2²   ⟹   sk = −(1/l1) ∇f(xk)

- Regularized Newton method: computes sk by solving

  min_{s ∈ Rn}  f(xk) + ∇f(xk)ᵀ s + (1/2) sᵀ ∇²f(xk) s + (l2/3) ‖s‖2³,

  also known as cubic regularization (mentioned earlier)
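A minimal sketch of the regularized gradient method on a quadratic: the closed-form step sk = −(1/l1)∇f(xk) from the slide is just a gradient step with stepsize 1/l1. The test function and the value of l1 are illustrative assumptions:

```python
import numpy as np

# Regularized gradient method on f(x) = (1/2) x^T A x (f_inf = 0).
# The model  f(xk) + grad^T s + (l1/2)||s||^2  is minimized by
# s = -(1/l1) grad, so each iteration is a fixed-stepsize gradient step.

A = np.diag([1.0, 4.0])          # illustrative test problem
l1 = 4.0                         # chosen >= Lipschitz constant of grad f

def grad_f(x):
    return A @ x

x = np.array([1.0, 1.0])
for _ in range(100):
    s = -(1.0 / l1) * grad_f(x)  # closed-form minimizer of the model
    x = x + s

print(np.linalg.norm(grad_f(x)))  # gradient norm after 100 steps
```

On this strongly convex (hence PL) example the gradient norm contracts linearly, matching the logarithmic R1 bounds discussed next.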
Characterization: Contemporary
Let RG and RN denote the regularized gradient and regularized Newton methods, respectively.

Theorem 7

With p ≥ 2, let

K1(εg) := {k ∈ N : ‖∇f(xk)‖2 > εg}
and K2(εH) := {k ∈ N : λmin(∇²f(xk)) < −εH}.

Then the cardinalities of K1(εg) and K2(εH) are of the order...

- RG: |K1(εg)| = O( l1 (f(x0) − finf) / εg² );  |K2(εH)| = ∞
- RN: |K1(εg)| = O( l2^(1/2) (f(x0) − finf) / εg^(3/2) );  |K2(εH)| = O( l2² (f(x0) − finf) / εH³ )
Characterization: Our approach
Theorem 8

The numbers of iterations in R1 and R2 with fref = finf are of the order...

- RG: R1: O( log( (f(x0) − finf) / εf ) );  R2: ∞
- RN: R1: O( l2² (f(x0) − finf) / r1³ ) + O( log( (f(x0) − finf) / εf ) );  R2: O( log( (f(x0) − finf) / εf ) )

There is an initial phase, as seen in Nesterov & Polyak (2006).
An ∞ can appear, but one could consider probabilistic bounds, too.
Trust region method: Gradient-dependent radii
min_{s ∈ Rn}  f(xk) + ∇f(xk)ᵀ s + (1/2) sᵀ ∇²f(xk) s   s.t.  ‖s‖2 ≤ δk

- Set δk ← νk ‖∇f(xk)‖2
- Initialize ν0 ∈ [ν̲, ν̄]
- For some (η, β) ∈ (0, 1) × (0, 1), if

  ρk = ( f(xk) − f(xk + sk) ) / ( t2(xk, 0) − t2(xk, sk) ) ≥ η,

  then xk+1 ← xk + sk and νk+1 ∈ [ν̲, ν̄]; else, xk+1 ← xk and νk+1 ← β νk.

Theorem 9

The number of iterations in R1 is at most O( σ log( (f(x0) − fref) / εf ) ). For R2, there is no guarantee.
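A hedged sketch of this trust-region scheme. Simplifications and assumptions not from the talk: the subproblem is solved only approximately at the Cauchy point rather than exactly, and the test function, ν-interval, η, and β are all illustrative choices.

```python
import numpy as np

# Trust-region iteration with gradient-dependent radius delta_k = nu_k ||grad||,
# the ratio test rho_k, and the nu-update from the slide.  The step is the
# Cauchy point of the quadratic model (an approximation to the true
# subproblem solution).

def f(x):      return 0.5 * x[0]**2 + 2.0 * x[1]**2
def grad_f(x): return np.array([x[0], 4.0 * x[1]])
def hess_f(x): return np.diag([1.0, 4.0])

nu, nu_min, nu_max, eta, beta = 1.0, 0.1, 1.0, 0.1, 0.5
x = np.array([2.0, 1.0])

for _ in range(50):
    g, H = grad_f(x), hess_f(x)
    gnorm = np.linalg.norm(g)
    if gnorm < 1e-12:
        break
    delta = nu * gnorm                      # gradient-dependent radius
    gHg = g @ H @ g
    t = delta / gnorm if gHg <= 0 else min(delta / gnorm, (g @ g) / gHg)
    s = -t * g                              # Cauchy point of the model
    pred = -(g @ s + 0.5 * s @ H @ s)       # model decrease t2(x,0) - t2(x,s)
    rho = (f(x) - f(x + s)) / pred
    if rho >= eta:
        x, nu = x + s, nu_max               # accept; reset nu into [nu_min, nu_max]
    else:
        nu *= beta                          # reject; shrink nu

print(np.linalg.norm(grad_f(x)))
```

Because the test function is quadratic, the model is exact, every step is accepted, and the iterates contract linearly, the behavior Theorem 9 guarantees over R1.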
Trust region method: Gradient- and Hessian-dependent radii
min_{s ∈ Rn}  f(xk) + ∇f(xk)ᵀ s + (1/2) sᵀ ∇²f(xk) s   s.t.  ‖s‖2 ≤ δk

- Set

  δk ← νk ‖∇f(xk)‖2         if ‖∇f(xk)‖2² ≥ |λmin(∇²f(xk))|³,
  δk ← νk |λmin(∇²f(xk))|   otherwise

- Initialize ν0 ∈ [ν̲, ν̄]
- For some (η, β) ∈ (0, 1) × (0, 1), if

  ρk = ( f(xk) − f(xk + sk) ) / ( t2(xk, 0) − t2(xk, sk) ) ≥ η,

  then xk+1 ← xk + sk and νk+1 ∈ [ν̲, ν̄]; else, xk+1 ← xk and νk+1 ← β νk.

Theorem 10

The number of iterations in R1 is at most O( σ log( (f(x0) − fref) / εf ) ).
The number of iterations in R2 is at most O( σ² log( (f(x0) − fref) / εf ) ).
Trust region method: Always good?
What about the classical update?

δk+1 ≥ δk   if ρk ≥ η,
δk+1 < δk   otherwise.

Two challenges:
- proving a uniform upper bound on the number of consecutive rejected steps
- proving that accepted steps yield sufficient decrease in R1 and R2
Outline
Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Common Methods
Summary & Perspectives
Summary & Perspectives
Our goal: a complementary approach to characterizing algorithms.
- global convergence
- worst-case complexity, contemporary type + our approach
- local convergence rate

Our idea is to
- partition the search space (dependent on f and x0),
- analyze how an algorithm behaves over different regions, and
- characterize an algorithm's behavior by region.

For some functions there are holes, but for others the characterization is complete.

F. E. Curtis and D. P. Robinson, "How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization," Lehigh ISE/COR@L Technical Report 18T-003, submitted for publication, 2018.