Optimization II: Unconstrained Multivariable



CS 205A: Mathematical Methods for Robotics, Vision, and Graphics

Doug James (and Justin Solomon)



Announcements

- Today's class: unconstrained optimization
  - Newton's method (uses Hessians)
  - BFGS method (no Hessians)


Unconstrained Multivariable Problems

minimize f : Rⁿ → R


Recall

∇f (~x)

“Direction of steepest ascent”


Recall

−∇f (~x)

“Direction of steepest descent”


Observation

If ∇f(~x) ≠ ~0, then for sufficiently small α > 0,

f(~x − α∇f(~x)) ≤ f(~x)
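A quick numerical check of this observation; the quadratic f, the point, and the step size α below are illustrative choices, not from the slides:

```python
import numpy as np

# Quick numerical check of the observation above.  The quadratic f, the point,
# and the step size alpha are illustrative choices, not from the slides.
def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])
g = grad_f(x)                      # nonzero gradient at this point
alpha = 1e-2                       # "sufficiently small" alpha > 0

print(f(x - alpha * g) <= f(x))    # True: a small step along -grad f does not increase f
```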


Gradient Descent Algorithm

Iterate until convergence:

1. gk(t) ≡ f(~xk − t∇f(~xk))
2. Find t∗ ≥ 0 minimizing (or decreasing) gk
3. ~xk+1 ≡ ~xk − t∗∇f(~xk)
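A minimal Python sketch of these three steps, assuming f and its gradient are given. The quadratic test function, the use of scipy's bounded 1-D minimizer for step 2, and the bracket t ∈ [0, 1] are illustrative choices, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent(f, grad_f, x0, tol=1e-8, max_iters=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:                 # stopping condition: grad f ~ 0
            break
        gk = lambda t: f(x - t * g)                 # step 1: restrict f to the search ray
        t_star = minimize_scalar(gk, bounds=(0.0, 1.0), method="bounded").x  # step 2
        x = x - t_star * g                          # step 3: take the step
    return x

# Illustrative test function (not from the slides): a simple quadratic bowl.
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
print(gradient_descent(f, grad_f, [5.0, 5.0]))      # converges to roughly [1, -2]
```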


Stopping Condition

∇f(~xk) ≈ ~0

Don't forget: check optimality!


Line Search

gk(t) ≡ f (~xk − t∇f (~xk))

- One-dimensional optimization
- Don't have to minimize completely: Wolfe conditions (a backtracking sketch follows below)
- Constant t: “Learning rate”

Worth reading about: Nesterov’s Accelerated Gradient Descent
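Since a full Wolfe line search is beyond the slide, here is a sketch of the simpler backtracking idea that enforces only the sufficient-decrease (Armijo) half of the Wolfe conditions. The constants c and the shrink factor are conventional choices, and the test function is illustrative, not from the slides:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, c=1e-4, shrink=0.5, t0=1.0):
    """Return a step t giving sufficient decrease along -grad_f(x).

    This enforces only the sufficient-decrease (Armijo) half of the Wolfe
    conditions; a full Wolfe search would also check a curvature condition.
    """
    g = grad_f(x)
    fx = f(x)
    t = t0
    while f(x - t * g) > fx - c * t * g.dot(g):   # not enough decrease yet
        t *= shrink                               # shrink the step and retry
    return t

# Example use on an illustrative quadratic (not from the slides):
f = lambda x: x[0]**2 + 3.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
print(backtracking_line_search(f, grad_f, np.array([1.0, -2.0])))   # prints 0.25 here
```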


Gradient Descent (book)


Newton’s Method (again!)

f(~x) ≈ f(~xk) + ∇f(~xk)ᵀ(~x − ~xk) + ½(~x − ~xk)ᵀ Hf(~xk)(~x − ~xk)

⟹ ~xk+1 = ~xk − [Hf(~xk)]⁻¹ ∇f(~xk)

Consideration: What if Hf is not positive (semi-)definite?
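A minimal sketch of this Newton iteration, solving Hf(~xk) p = ∇f(~xk) rather than forming the inverse. The quadratic test function is illustrative, and the indefinite-Hessian caveat is only noted in a comment, not handled:

```python
import numpy as np

def newton(grad_f, hess_f, x0, tol=1e-10, max_iters=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve Hf(x) p = grad_f(x) rather than forming the inverse explicitly.
        # Caveat from the slide: if Hf is not positive (semi-)definite, -p may not
        # be a descent direction; practical codes modify Hf or damp the step.
        p = np.linalg.solve(hess_f(x), g)
        x = x - p
    return x

# Illustrative quadratic (not from the slides); Newton converges in one step here.
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess_f = lambda x: np.diag([2.0, 20.0])
print(newton(grad_f, hess_f, [5.0, 5.0]))   # roughly [1, -2]
```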


Motivation

- ∇f might be hard to compute, but Hf is harder
- Hf might be dense: n² entries


Quasi-Newton Methods

Approximate derivatives to avoid expensive calculations

e.g. secant, Broyden, ...


Common Optimization Assumption

- ∇f known
- Hf unknown or hard to compute


Quasi-Newton Optimization

~xk+1 = ~xk − αk Bk⁻¹ ∇f(~xk),  where  Bk ≈ Hf(~xk)


Warning

<advanced material>

See Nocedal & Wright


Broyden-Style Update

Bk+1(~xk+1 − ~xk) = ∇f(~xk+1) − ∇f(~xk)
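The secant condition alone does not pin down Bk+1. One classical rank-one choice that satisfies it is Broyden's update; the closed form below is a standard formula from the literature, not stated on the slide. Note it is generally neither symmetric nor positive definite, which is what the following slides go on to fix:

```python
import numpy as np

def broyden_update(B, s, y):
    """Rank-one update satisfying the secant condition B_new @ s = y.

    s = x_{k+1} - x_k and y = grad_f(x_{k+1}) - grad_f(x_k).  This is Broyden's
    classical choice; it is generally neither symmetric nor positive definite.
    """
    r = y - B @ s
    return B + np.outer(r, s) / s.dot(s)

# Quick check of the secant condition on illustrative data:
B = np.eye(2)
s = np.array([1.0, 2.0]); y = np.array([3.0, -1.0])
print(np.allclose(broyden_update(B, s, y) @ s, y))   # True
```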


Additional Considerations

- Bk should be symmetric
- Bk should be positive (semi-)definite


Davidon-Fletcher-Powell (DFP)

min over Bk+1:  ‖Bk+1 − Bk‖

s.t.  Bk+1ᵀ = Bk+1

      Bk+1(~xk+1 − ~xk) = ∇f(~xk+1) − ∇f(~xk)
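The slide gives only this variational characterization. The standard closed-form solution (as found in references such as Nocedal & Wright, for a particular weighted choice of ‖ · ‖) is sketched below; s and y denote the step and gradient differences:

```python
import numpy as np

def dfp_update(B, s, y):
    """Closed-form DFP update of the Hessian approximation B.

    s = x_{k+1} - x_k, y = grad_f(x_{k+1}) - grad_f(x_k).  The result is
    symmetric and satisfies the secant condition B_new @ s = y (and stays
    positive definite when s.dot(y) > 0).
    """
    rho = 1.0 / y.dot(s)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

# Quick check of symmetry and the secant condition on illustrative data:
B = np.eye(2)
s = np.array([1.0, 2.0]); y = np.array([3.0, 1.0])
Bn = dfp_update(B, s, y)
print(np.allclose(Bn, Bn.T), np.allclose(Bn @ s, y))   # True True
```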


Observation

‖Bk+1 − Bk‖ small does not mean ‖Bk+1⁻¹ − Bk⁻¹‖ is small.

Idea: Try to approximate Bk⁻¹ directly.


BFGS Update

min over Hk+1:  ‖Hk+1 − Hk‖

s.t.  Hk+1ᵀ = Hk+1

      ~xk+1 − ~xk = Hk+1(∇f(~xk+1) − ∇f(~xk))

(Here Hk plays the role of Bk⁻¹, the inverse-Hessian approximation.)

State of the art!
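As with DFP, the slide gives the variational form; the standard closed-form BFGS update of the inverse approximation Hk (again from references such as Nocedal & Wright, for a particular weighted norm) and a minimal quasi-Newton loop built on it are sketched below. The test function and the backtracking constants are illustrative, not from the slides:

```python
import numpy as np

def bfgs_update(H, s, y):
    """Closed-form BFGS update of the inverse-Hessian approximation H.

    s = x_{k+1} - x_k, y = grad_f(x_{k+1}) - grad_f(x_k).  The result is
    symmetric, satisfies the secant condition H_new @ y = s, and stays
    positive definite as long as s.dot(y) > 0.
    """
    rho = 1.0 / y.dot(s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def bfgs(f, grad_f, x0, tol=1e-8, max_iters=200):
    """Minimal quasi-Newton loop: x_{k+1} = x_k - alpha_k * H_k @ grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad_f(x)
    for _ in range(max_iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                                   # quasi-Newton direction
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * g.dot(p):
            alpha *= 0.5                             # simple backtracking line search
        s = alpha * p
        x_new, g_new = x + s, grad_f(x + s)
        y = g_new - g
        if y.dot(s) > 1e-12:                         # skip update if curvature test fails
            H = bfgs_update(H, s, y)
        x, g = x_new, g_new
    return x

# Illustrative quadratic (not from the slides):
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
print(bfgs(f, grad_f, [5.0, 5.0]))   # roughly [1, -2]
```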


BFGS (book - typo)


Lots of Missing Details

- Choice of ‖ · ‖
- Limited-memory alternative (L-BFGS)



Automatic Differentiation

- Techniques to numerically evaluate the derivative of a function specified by a computer program.
- https://en.wikipedia.org/wiki/Automatic_differentiation
- Different from finite differences (approximation) and symbolic differentiation.
- In Julia: http://www.juliadiff.org
- Example: ForwardDiff.jl (uses dual numbers), https://github.com/JuliaDiff/ForwardDiff.jl (a minimal dual-number sketch follows below)
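The slide points to ForwardDiff.jl; as an illustration of the same dual-number idea (not the ForwardDiff.jl implementation), here is a minimal forward-mode sketch in Python:

```python
import math

class Dual:
    """Minimal forward-mode dual number: carries a value and its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)  # product rule
    __rmul__ = __mul__

def sin(x):
    # Propagate the chain rule through sin; fall back to math.sin for plain floats.
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot) if isinstance(x, Dual) else math.sin(x)

def derivative(f, x):
    """Derivative of a scalar program f at x, exact to machine precision
    (no finite-difference approximation, no symbolic expression)."""
    return f(Dual(x, 1.0)).dot

f = lambda x: x * x * sin(x)            # f(x) = x^2 sin(x), written as ordinary code
print(derivative(f, 1.3))               # equals 2*1.3*sin(1.3) + 1.3**2*cos(1.3)
```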

