CS B553: Algorithms for Optimization and Learning

Gradient descent

Transcript of CS B553: Algorithms for Optimization and Learning

Page 1: CS B553: Algorithms for Optimization and Learning

CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Gradient descent

Page 2: CS B553: Algorithms for Optimization and Learning

KEY CONCEPTS

Gradient descent
Line search
Convergence rates depend on scaling
Variants: discrete analogues, coordinate descent
Random restarts

Page 3: CS B553: Algorithms for Optimization and Learning

The gradient direction is orthogonal to the level sets (contours) of f, and points in the direction of steepest increase

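A short justification of this claim (the standard argument, not spelled out on the slide): along any curve x(s) that stays inside a level set, f(x(s)) = c is constant, so by the chain rule

    d/ds f(x(s)) = ∇f(x(s)) · x'(s) = 0,

i.e. ∇f is orthogonal to every tangent direction of the contour. And for a unit direction u, the directional derivative is

    D_u f(x) = ∇f(x) · u = ||∇f(x)|| cos θ ≤ ||∇f(x)||,

with equality exactly when u points along ∇f(x), so the gradient is the direction of steepest increase.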

Pages 5-11: CS B553: Algorithms for Optimization and Learning

Gradient descent: iteratively move in the direction -∇f(x)
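As a minimal runnable sketch of that iteration in Python (the quadratic objective, the fixed step size α = 0.05, and the iteration count are illustrative choices; the next slides replace the fixed step with a line search):

    import numpy as np

    f = lambda x: x[0]**2 + 10 * x[1]**2               # example objective
    grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])

    x = np.array([3.0, 1.0])                           # starting point
    alpha = 0.05                                       # fixed step size
    for t in range(100):
        x = x - alpha * grad_f(x)                      # move along -grad f(x)
    print(x, f(x))                                     # approaches the minimizer (0, 0)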

Pages 12-13: CS B553: Algorithms for Optimization and Learning

Line search: pick the step size α so that the function value decreases

(Use your favorite univariate optimization method)

[Figure: f(x - α∇f(x)) plotted as a function of the step size α; the minimizer is α*]
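The slide leaves the univariate method open. One possibility, as a sketch, is to hand φ(α) = f(x - α∇f(x)) to an off-the-shelf scalar minimizer (scipy here; the objective and the bracket (0, 1) are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    f = lambda x: x[0]**2 + 10 * x[1]**2               # example objective
    grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])

    x = np.array([3.0, 1.0])
    d = -grad_f(x)                                     # descent direction -grad f(x)

    phi = lambda alpha: f(x + alpha * d)               # univariate function of the step size
    alpha_star = minimize_scalar(phi, bounds=(0.0, 1.0), method="bounded").x
    print(alpha_star, phi(alpha_star) < f(x))          # the chosen step decreases f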

Page 14: CS B553: Algorithms for Optimization and Learning

GRADIENT DESCENT PSEUDOCODE

Input: f, starting value x_1, termination tolerances ε_g, ε_x

For t = 1, 2, …, maxIters:
    Compute the search direction d_t = -∇f(x_t)
    If ||d_t|| < ε_g then:
        return “Converged to critical point”, output x_t
    Find α_t so that f(x_t + α_t d_t) < f(x_t) using line search
    If ||α_t d_t|| < ε_x then:
        return “Converged in x”, output x_t
    Let x_{t+1} = x_t + α_t d_t

Return “Max number of iterations reached”, output x_maxIters
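A direct Python transcription of this pseudocode, as a sketch (the backtracking loop stands in for the unspecified line search; the tolerances and test problem are illustrative):

    import numpy as np

    def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
        x = np.asarray(x1, dtype=float)
        for t in range(max_iters):
            d = -grad_f(x)                             # search direction d_t = -grad f(x_t)
            if np.linalg.norm(d) < eps_g:
                return x, "Converged to critical point"
            alpha = 1.0                                # backtracking line search:
            while f(x + alpha * d) >= f(x) and alpha > 1e-16:
                alpha *= 0.5                           # halve the step until f decreases
            if np.linalg.norm(alpha * d) < eps_x:
                return x, "Converged in x"
            x = x + alpha * d                          # x_{t+1} = x_t + alpha_t d_t
        return x, "Max number of iterations reached"

    f = lambda x: x[0]**2 + 10 * x[1]**2
    grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
    print(gradient_descent(f, grad_f, [3.0, 1.0]))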

Page 20: CS B553: Algorithms for Optimization and Learning

RELATED METHODS

Steepest descent (discrete)
Coordinate descent (see the sketch below)
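A minimal coordinate-descent sketch: each sweep exactly minimizes f along one coordinate at a time, holding the others fixed (the coupled quadratic and scipy's scalar minimizer are illustrative choices):

    import numpy as np
    from scipy.optimize import minimize_scalar

    f = lambda x: x[0]**2 + 10 * x[1]**2 + x[0] * x[1]   # example objective

    x = np.array([3.0, 1.0])
    for sweep in range(50):
        for i in range(len(x)):
            def along_i(v, i=i):                          # f restricted to coordinate i
                y = x.copy()
                y[i] = v
                return f(y)
            x[i] = minimize_scalar(along_i).x             # exact minimization along axis i
    print(x)                                              # approaches the minimizer (0, 0)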

Page 21: CS B553: Algorithms for Optimization and Learning

Many local minima: good initialization, or random restarts
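A sketch of random restarts (the multimodal objective, the sampling box, and scipy's local minimizer are illustrative choices; the idea is just to run a local optimizer from several random starting points and keep the best result):

    import numpy as np
    from scipy.optimize import minimize

    # Example objective with several local minima along x[0]
    f = lambda x: np.sin(3 * x[0]) + 0.1 * x[0]**2 + x[1]**2

    rng = np.random.default_rng(0)
    best_x, best_val = None, np.inf
    for restart in range(20):
        x0 = rng.uniform(-5.0, 5.0, size=2)        # random starting point in a box
        res = minimize(f, x0)                      # local gradient-based optimizer
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    print(best_x, best_val)                        # best local minimum found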