CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Gradient descent
KEY CONCEPTS
Gradient descent
Line search
Convergence rates depend on scaling
Variants: discrete analogues, coordinate descent
Random restarts
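The point that convergence rates depend on scaling can be seen numerically. A minimal sketch (the quadratic objective, step size, and tolerance are illustrative choices, not from the slides):

```python
import numpy as np

def iters_to_converge(k, tol=1e-6, cap=100_000):
    """Fixed-step gradient descent on f(x) = 0.5*(x1^2 + k*x2^2).

    The step size 1/k is the natural stable choice for this quadratic;
    as the scaling factor k grows, convergence takes many more iterations.
    """
    scales = np.array([1.0, k])
    x = np.array([1.0, 1.0])
    alpha = 1.0 / k
    for t in range(cap):
        g = scales * x                 # gradient of f at x
        if np.linalg.norm(g) < tol:
            return t
        x = x - alpha * g
    return cap

print(iters_to_converge(1.0))      # well-scaled: converges almost immediately
print(iters_to_converge(100.0))    # badly scaled: over a thousand iterations
```

The same algorithm, on the same kind of function, slows down dramatically when one coordinate is scaled differently from the other.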
The gradient direction is orthogonal to the level sets (contours) of f and points in the direction of steepest increase.
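This orthogonality can be checked numerically. A small sketch, using the illustrative function f(x, y) = x^2 + 2y^2 (my choice, not from the slides):

```python
import numpy as np

# Illustrative function f(x, y) = x^2 + 2*y^2; its gradient is (2x, 4y).
def grad(p):
    x, y = p
    return np.array([2.0 * x, 4.0 * y])

p = np.array([1.0, 1.0])            # a point on the contour x^2 + 2y^2 = 3
g = grad(p)

# Tangent direction of the level set at p, from implicit differentiation:
# 2x + 4y*(dy/dx) = 0  =>  dy/dx = -x/(2y) = -1/2 at p.
tangent = np.array([1.0, -0.5])

print(np.dot(g, tangent))           # 0.0: the gradient is orthogonal to the contour
```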
Gradient descent: iteratively move in the direction of the negative gradient, x ← x − α∇f(x).
Line search: pick a step size α that leads to a decrease in the function value. (Use your favorite univariate optimization method to minimize f(x − α∇f(x)) over α.)
[Figure: f(x − α∇f(x)) plotted as a function of α; the line-search minimum is at α*.]
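The slides leave the univariate method open; one simple common choice is backtracking, which repeatedly shrinks the step until the function value decreases. A minimal sketch (the objective, starting point, and halving factor are illustrative assumptions):

```python
import numpy as np

def backtracking(f, x, g, alpha0=1.0, beta=0.5, max_halvings=50):
    """Shrink alpha from alpha0 until f(x - alpha*g) < f(x)."""
    fx = f(x)
    alpha = alpha0
    for _ in range(max_halvings):
        if f(x - alpha * g) < fx:
            return alpha
        alpha *= beta
    return 0.0                        # no decrease found (e.g., at a critical point)

f = lambda x: float(x[0]**2 + 2 * x[1]**2)   # illustrative objective
x = np.array([1.0, 1.0])
g = np.array([2.0, 4.0])                     # gradient of f at x
alpha = backtracking(f, x, g)
print(alpha, f(x - alpha * g), f(x))         # the accepted step decreases f
```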
GRADIENT DESCENT PSEUDOCODE
Input: f, starting value x1, termination tolerances εg and εx, maximum iterations maxIters
For t = 1, 2, …, maxIters:
  Compute the search direction dt = −∇f(xt)
  If ||dt|| < εg then:
    return “Converged to critical point”, output xt
  Find αt so that f(xt + αt·dt) < f(xt) using line search
  If ||αt·dt|| < εx then:
    return “Converged in x”, output xt
  Let xt+1 = xt + αt·dt
Return “Max number of iterations reached”, output xmaxIters
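The pseudocode above translates almost line for line into Python. A minimal sketch, using a simple halving line search and an illustrative quadratic objective (both are my assumptions, not prescribed by the slides):

```python
import numpy as np

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    """Direct translation of the pseudocode; eps_g, eps_x are the tolerances."""
    x = np.asarray(x1, dtype=float)
    for t in range(max_iters):
        d = -grad_f(x)                           # search direction d_t = -grad f(x_t)
        if np.linalg.norm(d) < eps_g:
            return "Converged to critical point", x
        alpha = 1.0                              # line search: halve until f decreases
        while f(x + alpha * d) >= f(x) and alpha > 1e-20:
            alpha *= 0.5
        if np.linalg.norm(alpha * d) < eps_x:
            return "Converged in x", x
        x = x + alpha * d
    return "Max number of iterations reached", x

f = lambda x: float(x[0]**2 + 2 * x[1]**2)       # illustrative objective
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])
status, xmin = gradient_descent(f, grad_f, [3.0, -2.0])
print(status, xmin)                              # converges to the minimum at (0, 0)
```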
RELATED METHODS
Steepest descent (discrete)
Coordinate descent
Many local minima: good initialization, or random restarts
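Random restarts can be sketched as follows: run plain gradient descent from several random starting points and keep the best result. The multimodal objective, fixed step size, and number of restarts below are all illustrative assumptions:

```python
import numpy as np

# Illustrative 1-D objective with several local minima.
f = lambda x: float(np.sin(3 * x) + 0.1 * x**2)
grad_f = lambda x: 3 * np.cos(3 * x) + 0.2 * x

def descend(x, alpha=0.01, iters=2000):
    """Plain fixed-step gradient descent from a single start."""
    for _ in range(iters):
        x = x - alpha * grad_f(x)
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(-5, 5, size=50)       # random restarts
candidates = [descend(x0) for x0 in starts]
best = min(candidates, key=f)
print(best, f(best))                       # best of the local minima found
```

Each restart converges to whichever local minimum its basin contains; taking the best candidate makes it likely (though not guaranteed) that the global minimum is found.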