IE 598: Incremental Gradient Methods - SAGA (niaohe.ise.illinois.edu/IE598_2016/pdf/IE598...)


IE 598: Incremental Gradient Methods - SAGA

Meghana Bande (mbande2)


Outline

• Introduction
• SAGA algorithm
• Convergence proof


Finite sum problem

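Minimize f(x) of the form

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

Each component f_i is µ-strongly convex and L-smooth.

Applications: empirical risk minimization (e.g., regularized least squares or logistic regression, with one f_i per training example)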
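A minimal sketch of one concrete finite sum: ℓ2-regularized least squares, a standard empirical risk minimization instance (our choice of example; the synthetic data and the names `make_ridge_problem`, `f_i`, `grad_i`, `full_grad`, `constants` are hypothetical). With f_i(x) = ½(a_iᵀx − b_i)² + (µ/2)‖x‖², each f_i is µ-strongly convex and L-smooth with L = max_i ‖a_i‖² + µ, and the gradient of f is the average of the n component gradients.

```python
import numpy as np

def make_ridge_problem(n=200, d=10, mu=0.1, seed=0):
    """Hypothetical synthetic data: rows a_i of A, targets b_i."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    return A, b, mu

def f_i(x, A, b, mu, i):
    # i-th component: 0.5 * (a_i^T x - b_i)^2 + 0.5 * mu * ||x||^2
    return 0.5 * (A[i] @ x - b[i]) ** 2 + 0.5 * mu * (x @ x)

def grad_i(x, A, b, mu, i):
    # Gradient of the i-th component.
    return A[i] * (A[i] @ x - b[i]) + mu * x

def f(x, A, b, mu):
    # Finite-sum objective f(x) = (1/n) * sum_i f_i(x).
    return np.mean([f_i(x, A, b, mu, i) for i in range(A.shape[0])])

def full_grad(x, A, b, mu):
    # Closed form of (1/n) * sum_i grad_i(x): touches all n components.
    return A.T @ (A @ x - b) / A.shape[0] + mu * x

def constants(A, mu):
    # Hessian of f_i is a_i a_i^T + mu*I, with eigenvalues in [mu, ||a_i||^2 + mu],
    # so every f_i is mu-strongly convex and L-smooth with L = max_i ||a_i||^2 + mu.
    return mu, np.max(np.sum(A ** 2, axis=1)) + mu

A, b, mu = make_ridge_problem()
x = np.ones(A.shape[1])
avg_grad = np.mean([grad_i(x, A, b, mu, i) for i in range(A.shape[0])], axis=0)
print(np.allclose(avg_grad, full_grad(x, A, b, mu)))  # average of components == full gradient
print(constants(A, mu))
```

A full gradient costs n component evaluations while one `grad_i` call costs one; that per-iteration cost gap drives the comparison on the motivation slide.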


Motivation

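• Gradient descent
  Convergence rate: linear, $O((1 - \mu/L)^k)$
  Iteration cost: $O(n)$
  Total complexity: $O(n \frac{L}{\mu} \log(1/\epsilon))$

• Stochastic gradient descent
  Convergence rate: sublinear, $O(1/k)$
  Iteration cost: $O(1)$
  Total complexity: $O(1/\epsilon)$

Goal: algorithms that combine linear convergence with a cheap iteration cost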
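To make the gradient descent versus SGD trade-off concrete, a minimal sketch on a hypothetical ridge-regression instance (synthetic data; all names are ours). Both methods get the same budget of component-gradient evaluations: gradient descent uses step 1/L_f (L_f is the smoothness constant of the full f) and spends n evaluations per step, while SGD spends one per step with a decaying step 1/(L + µk), a schedule we chose for illustration.

```python
import numpy as np

def make_ridge_problem(n=200, d=10, mu=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    return A, b, mu

def exact_solution(A, b, mu):
    # Minimizer of f: solve (A^T A / n + mu*I) x = A^T b / n.
    n, d = A.shape
    return np.linalg.solve(A.T @ A / n + mu * np.eye(d), A.T @ b / n)

def gd(A, b, mu, budget):
    """Gradient descent with step 1/L_f; each step costs n component gradients."""
    n, d = A.shape
    L_f = np.linalg.eigvalsh(A.T @ A / n).max() + mu  # smoothness constant of f
    x = np.zeros(d)
    for _ in range(budget // n):
        x -= (A.T @ (A @ x - b) / n + mu * x) / L_f
    return x

def sgd(A, b, mu, budget, seed=0):
    """SGD with step 1/(L + mu*k); each step costs one component gradient."""
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1)) + mu  # largest smoothness constant of the f_i
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for k in range(budget):
        i = rng.integers(n)
        x -= (A[i] * (A[i] @ x - b[i]) + mu * x) / (L + mu * k)
    return x

A, b, mu = make_ridge_problem()
x_star = exact_solution(A, b, mu)
budget = 30 * A.shape[0]  # 30 passes' worth of component-gradient evaluations
for name, x in (("GD", gd(A, b, mu, budget)), ("SGD", sgd(A, b, mu, budget))):
    print(name, np.linalg.norm(x - x_star))
```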


Variance reduction technique

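To be estimated: E[X]. Given: a random variable Y that is correlated with X and whose mean E[Y] can be easily computed. Estimator, for α ∈ [0, 1]:

$$\theta_\alpha = \alpha (X - Y) + E[Y], \qquad E[\theta_\alpha] = \alpha E[X] + (1 - \alpha) E[Y],$$
$$\mathrm{Var}[\theta_\alpha] = \alpha^2 \left( \mathrm{Var}[X] + \mathrm{Var}[Y] - 2\,\mathrm{Cov}[X, Y] \right)$$

• α = 1: unbiased
• α = 0: highly biased ($\theta_0 = E[Y]$ is a constant with zero variance)
• If Cov[X, Y] is large, the variance of the estimator is lower

SAGA uses α = 1 (unbiased); SAG corresponds to α = 1/n (biased, lower variance) [2].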
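A small Monte Carlo check of the control-variate estimator θ_α = α(X − Y) + E[Y], as a sketch. The toy pair is our own choice: X = exp(U) and Y = U with U ~ Uniform(0, 1), so E[Y] = 1/2 is known in closed form and E[X] = e − 1 is the quantity being estimated; the helper names are hypothetical.

```python
import math
import random
import statistics

def sample_xy(n_samples, seed=0):
    """Draw pairs (X, Y) = (exp(U), U) with U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n_samples)]
    return [math.exp(u) for u in us], us

def theta(alpha, xs, ys, ey):
    """Samples of the estimator theta_alpha = alpha * (X - Y) + E[Y]."""
    return [alpha * (x - y) + ey for x, y in zip(xs, ys)]

xs, ys = sample_xy(20_000)
EY = 0.5                # E[U], known in closed form
TRUE_MEAN = math.e - 1  # E[exp(U)], the quantity being estimated

print("plain X:", statistics.fmean(xs), statistics.pvariance(xs))
for alpha in (0.0, 0.5, 1.0):
    est = theta(alpha, xs, ys, EY)
    print(f"alpha={alpha}:", statistics.fmean(est), statistics.pvariance(est))
```

Because Cov[X, Y] is large here, the α = 1 estimator stays unbiased while its variance drops from about 0.242 (plain X) to about 0.044 in theory; α = 0.5 shrinks the variance further by a factor of 4 but centres on 0.5·E[X] + 0.5·E[Y], and α = 0 returns the constant E[Y].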


SAGA: Algorithm
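A minimal sketch of the SAGA iteration from [2], run on a hypothetical ridge-regression instance (synthetic data; function names are ours). SAGA keeps a table holding, for each i, the gradient f_i'(φ_i) from the last time f_i was sampled. At step k it picks j uniformly from {1, ..., n}, sets x^{k+1} = x^k − γ[f_j'(x^k) − f_j'(φ_j^k) + (1/n)Σ_i f_i'(φ_i^k)], which is the α = 1 (unbiased) control-variate estimator, and then sets φ_j^{k+1} = x^k. We assume the step size γ = 1/(3L) from the strongly convex result in [2].

```python
import numpy as np

def make_ridge_problem(n=200, d=10, mu=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    return A, b, mu

def grad_i(x, A, b, mu, i):
    # Gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2 + 0.5 * mu * ||x||^2.
    return A[i] * (A[i] @ x - b[i]) + mu * x

def saga(A, b, mu, n_iters, seed=0):
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1)) + mu  # every f_i is L-smooth
    gamma = 1.0 / (3.0 * L)                  # assumed step size from the result in [2]
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    # Gradient table: row i holds f_i'(phi_i); initially phi_i = x^0 for all i.
    table = np.array([grad_i(x, A, b, mu, i) for i in range(n)])
    avg = table.mean(axis=0)                 # (1/n) * sum_i f_i'(phi_i)
    for _ in range(n_iters):
        j = rng.integers(n)
        g_new = grad_i(x, A, b, mu, j)       # f_j'(x^k)
        # Unbiased estimate: f_j'(x^k) - f_j'(phi_j^k) + average of the table.
        x = x - gamma * (g_new - table[j] + avg)
        # phi_j^{k+1} = x^k: update the running average, then the table entry.
        avg += (g_new - table[j]) / n
        table[j] = g_new
    return x

A, b, mu = make_ridge_problem()
n, d = A.shape
x_star = np.linalg.solve(A.T @ A / n + mu * np.eye(d), A.T @ b / n)
x_saga = saga(A, b, mu, n_iters=200 * n)  # 200 passes over the data
print(np.linalg.norm(x_saga - x_star))
```

Keeping a running average makes each iteration cost O(d) beyond one component gradient, at the price of O(nd) memory for the table.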


Convergence results


Convergence result

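• Define the Lyapunov function

$$T^k = \frac{1}{n}\sum_{i=1}^{n} f_i(\phi_i^k) - f(x^*) - \frac{1}{n}\sum_{i=1}^{n} \langle f_i'(x^*), \phi_i^k - x^* \rangle + c\,\|x^k - x^*\|^2, \quad c > 0,$$

where $\phi_i^k$ is the point at which the stored gradient of $f_i$ was last evaluated.

• Show that $E[T^{k+1}] \le (1 - 1/\kappa)\, T^k$ for some $\kappa > 1$.

• Note that $c\,\|x^k - x^*\|^2 \le T^k$ (the remaining terms are nonnegative by convexity of each $f_i$) and conclude that $E\|x^k - x^*\|^2 \le (1 - 1/\kappa)^k\, T^0 / c$.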


Composite case

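Consider $F(x) = f(x) + h(x)$, where $h$ is convex but not necessarily L-smooth (e.g., $h(x) = \lambda\|x\|_1$).

SAGA takes the gradient step on $f$ only and then applies the proximal operator $\mathrm{prox}_{\gamma h}(v) = \arg\min_x \{ h(x) + \frac{1}{2\gamma}\|x - v\|^2 \}$. The convergence rate is the same as in the smooth case because the proximal operator is non-expansive.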
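A sketch of the composite variant under stated assumptions: the smooth part f is the same hypothetical ridge objective as before, and h(x) = λ‖x‖₁ with an illustrative λ = 0.1. The proximal operator of γλ‖·‖₁ is soft-thresholding, prox(v)_j = sign(v_j)·max(|v_j| − γλ, 0), applied after each SAGA gradient step. The step size γ = 1/(3L) is assumed, carried over from the smooth case, and all names are hypothetical.

```python
import numpy as np

def make_ridge_problem(n=200, d=10, mu=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    return A, b, mu

def grad_i(x, A, b, mu, i):
    # Gradient of the smooth component f_i(x) = 0.5*(a_i^T x - b_i)^2 + 0.5*mu*||x||^2.
    return A[i] * (A[i] @ x - b[i]) + mu * x

def full_grad(x, A, b, mu):
    return A.T @ (A @ x - b) / A.shape[0] + mu * x

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink every coordinate toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_saga(A, b, mu, lam, n_iters, seed=0):
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1)) + mu
    gamma = 1.0 / (3.0 * L)                     # assumed, as in the smooth case
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    table = np.array([grad_i(x, A, b, mu, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(n_iters):
        j = rng.integers(n)
        g_new = grad_i(x, A, b, mu, j)
        w = x - gamma * (g_new - table[j] + avg)  # SAGA step on the smooth part f
        x = soft_threshold(w, gamma * lam)        # proximal step on h = lam*||.||_1
        avg += (g_new - table[j]) / n
        table[j] = g_new
    return x

A, b, mu = make_ridge_problem()
lam = 0.1  # illustrative l1 weight
L = np.max(np.sum(A ** 2, axis=1)) + mu
gamma = 1.0 / (3.0 * L)
x_hat = prox_saga(A, b, mu, lam, n_iters=200 * A.shape[0])
# At a minimizer of F = f + h: x = prox_{gamma*h}(x - gamma * f'(x)).
step = x_hat - gamma * full_grad(x_hat, A, b, mu)
residual = np.linalg.norm(x_hat - soft_threshold(step, gamma * lam))
print(residual)
```

The fixed-point residual is a convenient stopping check because it vanishes exactly at minimizers of F, whether or not F is differentiable there.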


Other Variance Reduction Techniques


Convergence rates

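• SAG [1]: with step size $\gamma = 1/(16L)$, $E[f(x^k)] - f(x^*) \le \left(1 - \min\left\{\frac{\mu}{16L}, \frac{1}{8n}\right\}\right)^k C_0$

• SAGA [2]: with step size $\gamma = 1/(3L)$, $E\|x^k - x^*\|^2 \le \left(1 - \min\left\{\frac{\mu}{3L}, \frac{1}{4n}\right\}\right)^k C_0$

• SVRG [3]: with step size $\eta$ and inner-loop length $m$, $E[f(\tilde{x}_s)] - f(x^*) \le \alpha^s \left(f(\tilde{x}_0) - f(x^*)\right)$ after $s$ outer epochs, where $\alpha = \frac{1}{\mu\eta(1 - 2L\eta)m} + \frac{2L\eta}{1 - 2L\eta} < 1$

Here $C_0$ is a constant that depends on the initialization.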
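For comparison, minimal sketches of SAG [1] and SVRG [3] on the same kind of hypothetical ridge-regression instance (synthetic data; names are ours). SAG keeps the same gradient table as SAGA but first refreshes entry j and then steps along the table average, which is the biased α = 1/n estimator. SVRG keeps no table: at the start of each epoch it computes a full gradient at a snapshot x̃ and then uses f_j'(x) − f_j'(x̃) + f'(x̃). The step sizes (1/(16L) for SAG, 1/(10L) for SVRG) and the inner-loop length m = 2n are our assumptions. The snapshot update uses the last inner iterate, a common practical variant; the guarantee in [3] is stated for a randomly chosen inner iterate.

```python
import numpy as np

def make_ridge_problem(n=200, d=10, mu=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    return A, b, mu

def grad_i(x, A, b, mu, i):
    return A[i] * (A[i] @ x - b[i]) + mu * x

def full_grad(x, A, b, mu):
    return A.T @ (A @ x - b) / A.shape[0] + mu * x

def sag(A, b, mu, n_iters, seed=0):
    """SAG: refresh table entry j, then step along the (biased) table average."""
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1)) + mu
    gamma = 1.0 / (16.0 * L)                  # assumed step size
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    table = np.array([grad_i(x, A, b, mu, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(n_iters):
        j = rng.integers(n)
        g_new = grad_i(x, A, b, mu, j)
        avg += (g_new - table[j]) / n         # average now includes f_j'(x^k)
        table[j] = g_new
        x = x - gamma * avg
    return x

def svrg(A, b, mu, n_epochs, seed=0):
    """SVRG: full gradient at a snapshot each epoch; no gradient table."""
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1)) + mu
    eta, m = 1.0 / (10.0 * L), 2 * n          # assumed step size and inner-loop length
    rng = np.random.default_rng(seed)
    x_snap = np.zeros(d)
    for _ in range(n_epochs):
        g_snap = full_grad(x_snap, A, b, mu)  # f'(x_snap): n component gradients
        x = x_snap.copy()
        for _ in range(m):
            j = rng.integers(n)
            x = x - eta * (grad_i(x, A, b, mu, j) - grad_i(x_snap, A, b, mu, j) + g_snap)
        x_snap = x                            # last inner iterate becomes the snapshot
    return x_snap

A, b, mu = make_ridge_problem()
n, d = A.shape
x_star = np.linalg.solve(A.T @ A / n + mu * np.eye(d), A.T @ b / n)
x_sag = sag(A, b, mu, n_iters=300 * n)
x_svrg = svrg(A, b, mu, n_epochs=40)
print("SAG ", np.linalg.norm(x_sag - x_star))
print("SVRG", np.linalg.norm(x_svrg - x_star))
```

The design trade-off is memory versus computation. SAG and SAGA store n gradients, while SVRG stores only the snapshot. In exchange, SVRG pays for a full pass at every epoch and two component gradients per inner step.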


References

[1] M. W. Schmidt, N. Le Roux, and F. R. Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388, 2013.

[2] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In NIPS 27, pages 1646-1654. 2014.

[3] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS 26, pages 315-323. 2013.

[4] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, 2004.

[5] Incremental Gradient Methods. IE 598 Course Notes. http://niaohe.ise.illinois.edu/IE598/pdf/IE598-lecture23-incremental%20gradient%20algorithms.pdf