IE 598: Incremental Gradient Methods - SAGA
Meghana Bandembande
Outline
• Introduction
• SAGA algorithm
• Convergence proof
Finite sum problem
Minimize f(x) of the form
f(x) = (1/n) Σ_{i=1}^{n} f_i(x),
where each f_i is µ-strongly convex and L-smooth.
Applications: empirical risk minimization.
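To make the finite-sum structure concrete, here is a minimal sketch using ℓ2-regularized least squares as a hypothetical instance of empirical risk minimization; the helper name make_finite_sum and its interface are assumptions for illustration, not from the lecture.

    import numpy as np

    def make_finite_sum(A, b, mu):
        """f(x) = (1/n) * sum_i f_i(x), with
        f_i(x) = 0.5 * (a_i^T x - b_i)**2 + (mu/2) * ||x||^2.
        Each f_i is mu-strongly convex and L-smooth with L = max_i ||a_i||^2 + mu.
        """
        n = A.shape[0]

        def f(x):
            return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * mu * x @ x

        def grad_i(x, i):
            # Gradient of the single component f_i -- the quantity an
            # incremental method samples at each iteration.
            return (A[i] @ x - b[i]) * A[i] + mu * x

        return f, grad_i, n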
Motivation
• Gradient Descent: convergence rate O((1 − µ/L)^k) (linear); iteration cost: n component gradients; total complexity: O(n (L/µ) log(1/ε)).
• Stochastic Gradient Descent: convergence rate O(1/k) (sublinear); iteration cost: one component gradient; total complexity: O(1/(µε)).
• Goal: algorithms with linear convergence and cheap iteration cost.
Variance reduction technique
• To be estimated: E[X]. Given: a random variable Y correlated with X whose expectation E[Y] can be computed easily.
• Estimator: θ_α = α(X − Y) + E[Y], so that E[θ_α] = α E[X] + (1 − α) E[Y] and Var[θ_α] = α² (Var[X] + Var[Y] − 2 Cov[X, Y]).
• α = 1: unbiased. α = 0: zero variance but highly biased. If Cov[X, Y] is large, the variance of the estimator is lower.
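A quick numerical check of this tradeoff, as a sketch; the distributions, the correlation structure, and all variable names below are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # X is the quantity whose mean we want to estimate; Y is correlated with X
    # and has a known expectation E[Y] = 0.
    Z = rng.normal(size=100_000)
    X = Z + 1.0 + rng.normal(scale=0.3, size=Z.size)   # E[X] = 1
    Y = Z                                              # E[Y] = 0, easy to compute
    EY = 0.0

    print(f"naive estimator X: var={X.var():.3f}")
    for alpha in (1.0, 0.5, 0.1):
        theta = alpha * (X - Y) + EY                   # variance-reduced estimator
        print(f"alpha={alpha}: var={theta.var():.3f}, "
              f"bias={theta.mean() - X.mean():+.3f}")

    # alpha = 1 is unbiased; smaller alpha shrinks the variance (X and Y are
    # highly correlated here) at the cost of bias toward E[Y].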
SAGA: Algorithm
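SAGA (Defazio et al. [2]) keeps a table containing, for each component i, the gradient ∇f_i(φ_i) evaluated at the last point φ_i at which component i was sampled. At iteration k it draws j uniformly at random and updates

    x^{k+1} = x^k − γ [ ∇f_j(x^k) − ∇f_j(φ_j) + (1/n) Σ_i ∇f_i(φ_i) ],

then overwrites the j-th table entry with ∇f_j(x^k). Below is a minimal NumPy sketch; the grad_i interface matches the hypothetical helper above, and the function signature is an assumption for illustration.

    import numpy as np

    def saga(grad_i, n, x0, step, iters, rng=None):
        """Minimal SAGA sketch for f(x) = (1/n) * sum_i f_i(x).
        grad_i(x, i) must return the gradient of the i-th component at x."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        table = np.array([grad_i(x, i) for i in range(n)])  # stored gradients
        table_avg = table.mean(axis=0)                       # their running average

        for _ in range(iters):
            j = rng.integers(n)
            g_new = grad_i(x, j)
            # Unbiased, variance-reduced gradient estimate (the alpha = 1 case
            # of the estimator on the previous slide).
            v = g_new - table[j] + table_avg
            x -= step * v
            # Maintain the table and its average in O(d) per iteration.
            table_avg += (g_new - table[j]) / n
            table[j] = g_new
        return x

The analysis in [2] uses the step size γ = 1/(2(µn + L)), which is the natural value to pass for step when each f_i is µ-strongly convex and L-smooth.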
Convergence results
• With step size γ = 1/(2(µn + L)), SAGA converges linearly: E‖x^k − x*‖² ≤ (1 − µ/(2(µn + L)))^k C₀, where C₀ depends only on the starting point ([2], Theorem 1).
• Total complexity: O((n + L/µ) log(1/ε)) component-gradient evaluations.
Convergence result: proof sketch
• Define a Lyapunov function T^k that combines the suboptimality of the stored points φ_i^k with the distance term c‖x^k − x*‖² (written out after this list).
• Show that E[T^{k+1}] ≤ (1 − 1/κ) T^k for suitable constants, with 1/κ = γµ for the step size above.
• Note that c‖x^k − x*‖² ≤ T^k and conclude the result by iterating the expectation.
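For reference, the Lyapunov function and the single-step contraction from [2] can be written as follows; this is a sketch of the standard argument, with c and κ the constants chosen in the paper and γ = 1/(2(µn + L)) as above.

    T^k = \frac{1}{n}\sum_{i=1}^{n} f_i(\phi_i^k) - f(x^\ast)
          - \frac{1}{n}\sum_{i=1}^{n} \langle \nabla f_i(x^\ast),\, \phi_i^k - x^\ast \rangle
          + c\,\lVert x^k - x^\ast \rVert^2,
    \qquad
    \mathbb{E}\bigl[T^{k+1}\bigr] \le \Bigl(1 - \tfrac{1}{\kappa}\Bigr) T^k,
    \qquad \tfrac{1}{\kappa} = \gamma\mu.

Every term of T^k other than c‖x^k − x*‖² is nonnegative by convexity of the f_i, so iterating the contraction gives E‖x^k − x*‖² ≤ (1/c)(1 − γµ)^k T^0.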
Composite case
Consider F(x) = f(x) + h(x), where h is convex but not L-smooth. The gradient step is replaced by a proximal step, x^{k+1} = prox_{γh}(x^k − γ v^k), with v^k the same variance-reduced gradient estimate.
The same linear convergence rate holds, essentially because the proximal operator is non-expansive.
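As an illustration of the proximal step, take h(x) = λ‖x‖₁, whose proximal operator is soft-thresholding. The sketch below assumes v is the variance-reduced gradient estimate from the SAGA sketch above; the parameter names are illustrative.

    import numpy as np

    def soft_threshold(z, tau):
        """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def composite_saga_step(x, v, step, lam):
        """One prox-SAGA step for F(x) = f(x) + lam * ||x||_1:
        x <- prox_{step * lam * ||.||_1}(x - step * v)."""
        return soft_threshold(x - step * v, step * lam)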
Other Variance Reduction Techniques
Convergence rates (strongly convex case; total component-gradient complexity to reach accuracy ε):
• SAG: O((n + L/µ) log(1/ε)); biased gradient estimate; stores a table of n component gradients.
• SAGA: O((n + L/µ) log(1/ε)); unbiased estimate; stores a gradient table; supports composite objectives via the proximal step.
• SVRG: O((n + L/µ) log(1/ε)); unbiased estimate; no gradient table, but recomputes the full gradient at periodic snapshots.
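For contrast with the table-based SAGA update, here is a minimal SVRG sketch in the spirit of Johnson and Zhang [3]: instead of storing n gradients, it recomputes the full gradient at a periodic snapshot. The epoch length m and the function signature are assumptions for illustration.

    import numpy as np

    def svrg(grad_i, n, x0, step, epochs, m, rng=None):
        """Minimal SVRG sketch: full gradient at a snapshot, m inner steps."""
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
            for _ in range(m):
                j = rng.integers(n)
                # Unbiased variance-reduced estimate, anchored at the snapshot.
                v = grad_i(x, j) - grad_i(snapshot, j) + full_grad
                x -= step * v
        return x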
References
[1] M. W. Schmidt, N. L. Roux, and F. R. Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388, 2013.
[2] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In NIPS 27, pages 1646-1654. 2014.
[3] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS 26, pages 315-323. 2013.
[4] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.
[5] Incremental Gradient Methods, IE 598 Course Notes, http://niaohe.ise.illinois.edu/IE598/pdf/IE598-lecture23-incremental%20gradient%20algorithms.pdf