IE 598: Incremental Gradient Methods - SAGA
Meghana Bandembande
Outline
• Introduction
• SAGA algorithm
• Convergence proof
Finite sum problem
Minimize f(x) of the form
f(x) = (1/n) Σ_{i=1}^{n} f_i(x),
where each f_i is µ-strongly convex and L-smooth.
Applications: empirical risk minimization.
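To make the finite-sum structure concrete, here is a minimal sketch using ℓ2-regularized least squares as a hypothetical instance of empirical risk minimization; the helper name make_finite_sum and its interface are assumptions for illustration, not from the lecture.

    import numpy as np

    def make_finite_sum(A, b, mu):
        """f(x) = (1/n) * sum_i f_i(x), with
        f_i(x) = 0.5 * (a_i^T x - b_i)**2 + (mu/2) * ||x||^2.
        Each f_i is mu-strongly convex and L-smooth with L = max_i ||a_i||^2 + mu.
        """
        n = A.shape[0]

        def f(x):
            return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * mu * x @ x

        def grad_i(x, i):
            # Gradient of the single component f_i -- the quantity an
            # incremental method samples at each iteration.
            return (A[i] @ x - b[i]) * A[i] + mu * x

        return f, grad_i, n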
Motivation
• Gradient Descent: convergence rate O((1 − µ/L)^k) (linear); iteration cost: n component gradients; total complexity: O(n (L/µ) log(1/ε)).
• Stochastic Gradient Descent: convergence rate O(1/k) (sublinear); iteration cost: one component gradient; total complexity: O(1/(µε)).
• Goal: algorithms with linear convergence and cheap iteration cost.
Variance reduction technique
• To be estimated: E[X]. Given: a random variable Y correlated with X whose expectation E[Y] can be computed easily.
• Estimator: θ_α = α(X − Y) + E[Y], so that E[θ_α] = α E[X] + (1 − α) E[Y] and Var[θ_α] = α² (Var[X] + Var[Y] − 2 Cov[X, Y]).
• α = 1: unbiased. α = 0: zero variance but highly biased. If Cov[X, Y] is large, the variance of the estimator is lower.
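A quick numerical check of this tradeoff, as a sketch; the distributions, the correlation structure, and all variable names below are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # X is the quantity whose mean we want to estimate; Y is correlated with X
    # and has a known expectation E[Y] = 0.
    Z = rng.normal(size=100_000)
    X = Z + 1.0 + rng.normal(scale=0.3, size=Z.size)   # E[X] = 1
    Y = Z                                              # E[Y] = 0, easy to compute
    EY = 0.0

    print(f"naive estimator X: var={X.var():.3f}")
    for alpha in (1.0, 0.5, 0.1):
        theta = alpha * (X - Y) + EY                   # variance-reduced estimator
        print(f"alpha={alpha}: var={theta.var():.3f}, "
              f"bias={theta.mean() - X.mean():+.3f}")

    # alpha = 1 is unbiased; smaller alpha shrinks the variance (X and Y are
    # highly correlated here) at the cost of bias toward E[Y].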
SAGA: Algorithm
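SAGA (Defazio et al. [2]) keeps a table containing, for each component i, the gradient ∇f_i(φ_i) evaluated at the last point φ_i at which component i was sampled. At iteration k it draws j uniformly at random and updates

    x^{k+1} = x^k − γ [ ∇f_j(x^k) − ∇f_j(φ_j) + (1/n) Σ_i ∇f_i(φ_i) ],

then overwrites the j-th table entry with ∇f_j(x^k). Below is a minimal NumPy sketch; the grad_i interface matches the hypothetical helper above, and the function signature is an assumption for illustration.

    import numpy as np

    def saga(grad_i, n, x0, step, iters, rng=None):
        """Minimal SAGA sketch for f(x) = (1/n) * sum_i f_i(x).
        grad_i(x, i) must return the gradient of the i-th component at x."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        table = np.array([grad_i(x, i) for i in range(n)])  # stored gradients
        table_avg = table.mean(axis=0)                       # their running average

        for _ in range(iters):
            j = rng.integers(n)
            g_new = grad_i(x, j)
            # Unbiased, variance-reduced gradient estimate (the alpha = 1 case
            # of the estimator on the previous slide).
            v = g_new - table[j] + table_avg
            x -= step * v
            # Maintain the table and its average in O(d) per iteration.
            table_avg += (g_new - table[j]) / n
            table[j] = g_new
        return x

The analysis in [2] uses the step size γ = 1/(2(µn + L)), which is the natural value to pass for step when each f_i is µ-strongly convex and L-smooth.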
Convergence results
• With step size γ = 1/(2(µn + L)), SAGA converges linearly: E‖x^k − x*‖² ≤ (1 − µ/(2(µn + L)))^k C₀, where C₀ depends only on the starting point ([2], Theorem 1).
• Total complexity: O((n + L/µ) log(1/ε)) component-gradient evaluations.
Convergence result: proof sketch
• Define a Lyapunov function T^k that combines the suboptimality of the stored points φ_i^k with the distance term c‖x^k − x*‖² (written out after this list).
• Show that E[T^{k+1}] ≤ (1 − 1/κ) T^k for suitable constants, with 1/κ = γµ for the step size above.
• Note that c‖x^k − x*‖² ≤ T^k and conclude the result by iterating the expectation.
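For reference, the Lyapunov function and the single-step contraction from [2] can be written as follows; this is a sketch of the standard argument, with c and κ the constants chosen in the paper and γ = 1/(2(µn + L)) as above.

    T^k = \frac{1}{n}\sum_{i=1}^{n} f_i(\phi_i^k) - f(x^\ast)
          - \frac{1}{n}\sum_{i=1}^{n} \langle \nabla f_i(x^\ast),\, \phi_i^k - x^\ast \rangle
          + c\,\lVert x^k - x^\ast \rVert^2,
    \qquad
    \mathbb{E}\bigl[T^{k+1}\bigr] \le \Bigl(1 - \tfrac{1}{\kappa}\Bigr) T^k,
    \qquad \tfrac{1}{\kappa} = \gamma\mu.

Every term of T^k other than c‖x^k − x*‖² is nonnegative by convexity of the f_i, so iterating the contraction gives E‖x^k − x*‖² ≤ (1/c)(1 − γµ)^k T^0.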
Composite case
Consider F(x) = f(x) + h(x), where h is convex but not L-smooth. The gradient step is replaced by a proximal step, x^{k+1} = prox_{γh}(x^k − γ v^k), with v^k the same variance-reduced gradient estimate.
The same linear convergence rate holds, essentially because the proximal operator is non-expansive.
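As an illustration of the proximal step, take h(x) = λ‖x‖₁, whose proximal operator is soft-thresholding. The sketch below assumes v is the variance-reduced gradient estimate from the SAGA sketch above; the parameter names are illustrative.

    import numpy as np

    def soft_threshold(z, tau):
        """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def composite_saga_step(x, v, step, lam):
        """One prox-SAGA step for F(x) = f(x) + lam * ||x||_1:
        x <- prox_{step * lam * ||.||_1}(x - step * v)."""
        return soft_threshold(x - step * v, step * lam)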
Other Variance Reduction Techniques
Convergence rates (strongly convex case; total component-gradient complexity to reach accuracy ε):
• SAG: O((n + L/µ) log(1/ε)); biased gradient estimate; stores a table of n component gradients.
• SAGA: O((n + L/µ) log(1/ε)); unbiased estimate; stores a gradient table; supports composite objectives via the proximal step.
• SVRG: O((n + L/µ) log(1/ε)); unbiased estimate; no gradient table, but recomputes the full gradient at periodic snapshots.
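For contrast with the table-based SAGA update, here is a minimal SVRG sketch in the spirit of Johnson and Zhang [3]: instead of storing n gradients, it recomputes the full gradient at a periodic snapshot. The epoch length m and the function signature are assumptions for illustration.

    import numpy as np

    def svrg(grad_i, n, x0, step, epochs, m, rng=None):
        """Minimal SVRG sketch: full gradient at a snapshot, m inner steps."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
            for _ in range(m):
                j = rng.integers(n)
                # Unbiased variance-reduced estimate, anchored at the snapshot.
                v = grad_i(x, j) - grad_i(snapshot, j) + full_grad
                x -= step * v
        return x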
References
[1] M. W. Schmidt, N. L. Roux, and F. R. Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388, 2013.
[2] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In NIPS 27, pages 1646-1654. 2014.
[3] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS 26, pages 315-323. 2013.
[4] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.
[5] Incremental Gradient Methods, IE 598 Course Notes, http://niaohe.ise.illinois.edu/IE598/pdf/IE598-lecture23-incremental%20gradient%20algorithms.pdf