L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent...

24
50.579 Optimization for Machine Learning Ioannis Panageas ISTD, SUTD L05 Non-convex Optimization: GD + noise converges to second order stationarity

Transcript of L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent...

Page 1: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

50.579 Optimization for Machine Learning

Ioannis Panageas

ISTD, SUTD

L05Non-convex Optimization: GD + noise converges to second order stationarity

Page 2: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Recap

Optimization for Machine Learning

Page 3: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Recap

Optimization for Machine Learning

• This is only true in the unconstrained case!

Page 4: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Recap

Optimization for Machine Learning

• This is only true in the unconstrained case!

Page 5: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Recap

Optimization for Machine Learning

• This is only true in the unconstrained case!

Page 6: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Vanishing step-sizes

Optimization for Machine Learning

Page 7: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Vanishing step-sizes

Optimization for Machine Learning

Page 8: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Vanishing step-sizes

Optimization for Machine Learning

Page 9: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Vanishing step-sizes

Optimization for Machine Learning

Page 10: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Vanishing step-sizes

Optimization for Machine Learning

Page 11: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Definitions

Optimization for Machine Learning

Page 12: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Convergence to first order stationarity

Optimization for Machine Learning

Page 13: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Convergence to first order stationarity

Optimization for Machine Learning

Page 14: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Convergence to first order stationarity

Optimization for Machine Learning

Page 15: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Convergence to first order stationarity

Optimization for Machine Learning

Page 16: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Perturbed Gradient Descent

Optimization for Machine Learning

Page 17: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

• High level proof strategy:

1) When the current iterate is not an 𝜖-second order stationary point, it must either (a) have a large gradient or (b) have a strictly negative eigenvalue the Hessian.

2) We can show in both cases that yield a significant decrease in function value in a controlled number of iterations.

3) Since the decrease cannot be more that f x0 − 𝑓(𝑥∗) (global minimum is bounded) we can reach contradiction.

Page 18: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 19: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 20: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 21: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 22: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 23: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Analysis of Perturbed Gradient Descent

Optimization for Machine Learning

Page 24: L05 Non-convex Optimization: GD + noise converges to ... · Analysis of Perturbed Gradient Descent Optimization for Machine Learning • High level proof strategy: 1) When the current

Conclusion

• Introduction to Non-convex Optimization.

– Perturbed Gradient Descent avoids strict saddles!

– Same is true for Perturbed SGD.

• Next lecture we will talk about more about accelerated methods.

• Week 8 we are going to talk about min-max optimization (GANs).

SUTD ISTD 50.004 Intro to Algorithms