Andrew Fitzgibbon Optimization


Second talk by Andrew Fitzgibbon at the Microsoft Computer Vision Summer School 2011

Transcript of Andrew Fitzgibbon Optimization

The right solution to a nearby problem? Or a nearby solution to the right problem?

OPTIMIZATION

HOW TO GET IDEAS

Chase the high-hanging fruit.

Try to make stuff really work.

Look for things that confuse/annoy you.

And occasionally you’ll get a good idea.

Write about it well.


COMPUTER VISION AS OPTIMIZATION

Computing PCA, LDA, ....

Clustering, Gaussian mixture fitting, ....

Sometimes the optimization is easy...

...sometimes it’s hard

Sometimes the hard problem is the one you must solve

GOAL

Given a function 𝑓(𝑥) : ℝᵈ ↦ ℝ,

devise strategies for finding the 𝑥 which minimizes 𝑓.

CLASSES OF FUNCTIONS

Given a function 𝑓(𝑥) : ℝᵈ ↦ ℝ,

devise strategies for finding the 𝑥 which minimizes 𝑓.

[Figure: example functions of each class: quadratic, convex, single-extremum, multi-extremum, noisy, horrible]

EXAMPLE

[Figure: 𝑦 plotted against 𝑥, with 𝑥 ∈ [0, 1] and 𝑦 ranging from 0 to 7]

EXAMPLE

>> print -dmeta
>> print -dpdf   % then go to pdf and paste
>> set(gcf, 'PaperUnits', 'centimeters', 'PaperPosition', [1 1 9 6.6])
>> print -dpdf   % then go to pdf and paste

[switch to MATLAB…]

ALTERNATION

[Figure: two cases, labelled Easy and Hard]

WHAT TO DO?

Ignore the hard ones

If you search, you’ll find plenty of easy examples

But your customers won’t…

Or update more than one coordinate at once
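A minimal MATLAB sketch of the alternation idea just described: repeatedly minimize over one coordinate at a time, holding the others fixed. The objective, starting point, bounds and iteration count below are illustrative assumptions, not taken from the talk.

% Alternation (coordinate descent): minimize over each coordinate in turn.
f = @(x) (x(1) - 1)^2 + 10*(x(1) - x(2))^2;    % example objective; its valley is not axis-aligned
x = [5; -5];                                    % starting point
for sweep = 1:100
    for i = 1:numel(x)
        % 1-D minimization over coordinate i, with the other coordinates held fixed
        g1d = @(t) f([x(1:i-1); t; x(i+1:end)]);
        x(i) = fminbnd(g1d, x(i) - 10, x(i) + 10);
    end
end
disp(x)    % approaches the minimizer [1; 1], but slowly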

GRADIENT DESCENT

Alternation is slow because valleys may not be axis aligned

So try gradient descent?

[Figure: iterates on a contour plot, both axes running from −5 to 15]


Note that convergence proofs are available for both of the above

But so what?
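For concreteness, a minimal sketch of fixed-step gradient descent on an assumed ill-conditioned quadratic (the matrix, step size and tolerance are illustrative, not from the talk); the narrow, non-axis-aligned valley is exactly what makes both alternation and plain gradient descent slow.

% Fixed-step gradient descent on a quadratic with a narrow, tilted valley.
A = [1, 0.9; 0.9, 1];                 % Hessian of the quadratic (condition number 19)
f = @(x) 0.5 * x' * A * x;            % objective
grad = @(x) A * x;                    % gradient
x = [10; -5];                         % starting point
step = 0.1;                           % fixed step size (must be below 2 / max eigenvalue)
for iter = 1:5000
    g = grad(x);
    if norm(g) < 1e-6, break; end     % stop once the gradient is tiny
    x = x - step * g;                 % step downhill
end
fprintf('f(x) = %g after %d iterations\n', f(x), iter);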

AND ON A HARD PROBLEM


USE A BETTER ALGORITHM

(Nonlinear) conjugate gradients

Uses 1st derivatives only

And avoids “undoing” previous work

101 iterations on this problem

but we can do better…
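A rough sketch of nonlinear conjugate gradients (a Polak-Ribiere variant with a simple backtracking line search), applied to the standard Rosenbrock function. This is an assumed stand-in for the demo code, and its iteration count will not match the 101 quoted above.

% Nonlinear conjugate gradients: 1st derivatives only, and each new search
% direction keeps a memory of the previous one instead of undoing it.
f    = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;               % Rosenbrock
grad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2); ...
              200*(x(2) - x(1)^2)];
x = [-1.2; 1];
g = grad(x);
d = -g;                                        % first direction: steepest descent
for iter = 1:2000
    if g' * d >= 0, d = -g; end                % safeguard: ensure a descent direction
    t = 1;                                     % backtracking (Armijo) line search
    while f(x + t*d) > f(x) + 1e-4 * t * (g' * d)
        t = t / 2;
    end
    x = x + t * d;
    g_new = grad(x);
    if norm(g_new) < 1e-6, break; end
    beta = max(0, (g_new' * (g_new - g)) / (g' * g));   % Polak-Ribiere+ coefficient
    d = -g_new + beta * d;
    g = g_new;
end
fprintf('x = [%g, %g] after %d iterations\n', x(1), x(2), iter);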

USE SECOND DERIVATIVES…

Starting with 𝑥, how can I choose 𝛿 so that 𝑓(𝑥 + 𝛿) is better than 𝑓(𝑥)?

So compute min_𝛿 𝑓(𝑥 + 𝛿)

But hang on, that's the same problem we were trying to solve?

Instead, minimize a local quadratic model:

min_𝛿 𝑓(𝑥 + 𝛿) ≈ min_𝛿 𝑓(𝑥) + 𝛿⊤𝑔(𝑥) + ½ 𝛿⊤𝐻(𝑥)𝛿

where 𝑔(𝑥) = 𝛻𝑓(𝑥) and 𝐻(𝑥) = 𝛻𝛻⊤𝑓(𝑥)

USE SECOND DERIVATIVES…

How does it look?

𝑓(𝑥) + 𝛿⊤𝑔(𝑥) + ½ 𝛿⊤𝐻(𝑥)𝛿,  where 𝑔(𝑥) = 𝛻𝑓(𝑥) and 𝐻(𝑥) = 𝛻𝛻⊤𝑓(𝑥)

USE SECOND DERIVATIVES…

Choose 𝛿 so that 𝑓(𝑥 + 𝛿) is better than 𝑓(𝑥)?

Compute min_𝜹 𝑓 + 𝜹⊤𝑔 + ½ 𝜹⊤𝐻𝜹

Setting the derivative with respect to 𝜹 to zero gives 𝑔 + 𝐻𝜹 = 0, hence

𝜹 = −𝐻⁻¹𝑔
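A minimal sketch of the resulting Newton iteration on the Rosenbrock function, with the gradient and Hessian written out by hand. This is an assumed stand-in for demo_taylor_2d, not the talk's code; as the demos on the next slide show, the pure Newton step is not always a good idea.

% Pure Newton iteration: at each step solve H*delta = -g rather than forming inv(H).
f    = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;               % Rosenbrock
grad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2); ...
              200*(x(2) - x(1)^2)];
hess = @(x) [2 - 400*(x(2) - 3*x(1)^2), -400*x(1); ...
             -400*x(1),                  200];
x = [-1.2; 1];
for iter = 1:50
    g = grad(x);
    if norm(g) < 1e-10, break; end
    x = x - hess(x) \ g;              % Newton step: delta = -H^(-1) * g
end
fprintf('x = [%g, %g] after %d Newton steps\n', x(1), x(2), iter);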

IS THAT A GOOD IDEA?

demo_taylor_2d(0, 'newton', 'rosenbrock')

demo_taylor_2d(1, 'newton', 'rosenbrock')

demo_taylor_2d(1, 'newton', 'sqrt_rosenbrock')

USE SECOND DERIVATIVES…

Choose 𝛿 so that 𝑓(𝑥 + 𝛿) is better than 𝑓(𝑥)?

Updates: 𝜹Newton = −𝐻⁻¹𝑔

𝜹GradientDescent = −𝜆𝑔

USE SECOND DERIVATIVES…

Updates: 𝜹Newton = −𝐻⁻¹𝑔, 𝜹GradientDescent = −𝜆𝑔

So combine them: 𝜹DampedNewton = −(𝐻 + 𝜆⁻¹𝐼𝑑)⁻¹𝑔 = −𝜆(𝜆𝐻 + 𝐼𝑑)⁻¹𝑔

𝜆 small ⇒ conservative gradient step

𝜆 large ⇒ Newton step
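A minimal sketch of damped Newton using the combined update above, with a simple rule for adapting 𝜆: grow it after a successful step (towards the Newton step), shrink it after a failed one (towards a conservative gradient step). The objective and the factor-of-10 schedule are illustrative assumptions.

% Damped Newton: delta = -lambda * (lambda*H + I)^(-1) * g, with lambda adapted online.
f    = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
grad = @(x) [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2); 200*(x(2) - x(1)^2)];
hess = @(x) [2 - 400*(x(2) - 3*x(1)^2), -400*x(1); -400*x(1), 200];
x = [-1.2; 1];
lambda = 1;
for iter = 1:200
    g = grad(x);  H = hess(x);
    if norm(g) < 1e-8, break; end
    delta = -lambda * ((lambda * H + eye(2)) \ g);
    if f(x + delta) < f(x)
        x = x + delta;  lambda = lambda * 10;   % success: behave more like Newton
    else
        lambda = lambda / 10;                   % failure: damp towards a small gradient step
    end
end
fprintf('x = [%g, %g] after %d iterations\n', x(1), x(2), iter);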

1ST DERIVATIVES AGAIN

Levenberg-Marquardt

Just damped Newton with approximate 𝐻

For a special form of 𝑓

𝑓(𝑥) = Σ𝑖 𝑓𝑖(𝑥)²

where the residuals 𝑓𝑖(𝑥) are small at the optimum

BACK TO FIRST DERIVATIVES

Levenberg-Marquardt

Just damped Newton with approximate 𝐻

For a special form of 𝑓:

𝑓(𝑥) = Σ𝑖 𝑓𝑖(𝑥)²

𝛻𝑓(𝑥) = Σ𝑖 2 𝑓𝑖(𝑥) 𝛻𝑓𝑖(𝑥)

𝛻𝛻⊤𝑓(𝑥) = Σ𝑖 2 ( 𝑓𝑖(𝑥) 𝛻𝛻⊤𝑓𝑖(𝑥) + 𝛻𝑓𝑖(𝑥) 𝛻⊤𝑓𝑖(𝑥) )

Because the 𝑓𝑖(𝑥) are small near the optimum, drop the 𝑓𝑖(𝑥) 𝛻𝛻⊤𝑓𝑖(𝑥) term and use 𝐻 ≈ Σ𝑖 2 𝛻𝑓𝑖(𝑥) 𝛻⊤𝑓𝑖(𝑥), which needs only first derivatives.
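Putting the pieces together, a minimal Levenberg-Marquardt sketch for 𝑓(𝑥) = Σ𝑖 𝑓𝑖(𝑥)², using the first-derivative approximation 𝐻 ≈ 2 𝐽⊤𝐽 and 𝑔 = 2 𝐽⊤𝑟, where 𝑟 stacks the residuals 𝑓𝑖(𝑥) and 𝐽 is their Jacobian. The curve-fitting residual below (an exponential decay fitted to synthetic data) is an illustrative assumption, not an example from the talk.

% Levenberg-Marquardt: damped (Gauss-)Newton on a sum of squared residuals.
rng(0);                                            % reproducible synthetic data
t = (0:0.1:2)';
y = 3 * exp(-1.5 * t) + 0.01 * randn(size(t));     % noisy observations of a*exp(-b*t)
resid = @(x) x(1) * exp(-x(2) * t) - y;            % residual vector r(x)
jac   = @(x) [exp(-x(2) * t), -x(1) * t .* exp(-x(2) * t)];   % Jacobian dr/dx
x = [1; 1];                                        % initial guess for [a; b]
lambda = 1;
for iter = 1:100
    r = resid(x);  J = jac(x);
    g = 2 * (J' * r);                              % gradient of sum(r.^2)
    H = 2 * (J' * J);                              % first-derivative approximation to the Hessian
    if norm(g) < 1e-8, break; end
    delta = -lambda * ((lambda * H + eye(2)) \ g); % damped step, as on the earlier slides
    if sum(resid(x + delta).^2) < sum(r.^2)
        x = x + delta;  lambda = lambda * 10;      % good step: move towards Gauss-Newton
    else
        lambda = lambda / 10;                      % bad step: damp towards gradient descent
    end
end
fprintf('estimated parameters: a = %.3f, b = %.3f\n', x(1), x(2));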

CONCLUSION: YMMV

[Results figures: GIRAFFE (500 runs), FACE (1000 runs), DINOSAUR (1000 runs)]

TRUTH AND B𝑢T [∃𝑢 ∈ {H, I, L, S, T, U}]

Truth                                         B𝑢T
On many problems, alternation is just fine    Indeed; always start with a couple of alternation steps
Computing 2nd derivatives is a pain           Anyone with compiler/parser experience, please contact me
Just alternation is fine                      Unless you're willing to problem-select
Alternation has a convergence guarantee
Inverting the Hessian is always 𝑂(𝑛³)
There is a universal optimizer