Qualifier Exam in HPC
February 10th, 2010
Quasi-Newton methods
Alexandru Cioaca
Quasi-Newton methods (nonlinear systems)
Nonlinear systems: F(x) = 0, F : R^n -> R^n
F(x) = [ f_i(x_1, ..., x_n) ]^T
Such systems appear in the simulation of processes (physical, chemical, etc.)
Iterative algorithm to solve nonlinear systems
Newton’s method != Nonlinear least-squares
Quasi-Newton methods (nonlinear systems)
Standard assumptions:
1. F is continuously differentiable in an open convex set D
2. F' is Lipschitz continuous on D
3. There is x* in D s.t. F(x*) = 0 and F'(x*) is nonsingular
Newton's method: starting from x0 (the initial iterate),
xk+1 = xk - F'(xk)^-1 F(xk), {xk} -> x*
until a termination criterion is satisfied
Quasi-Newton methods (nonlinear systems)
Linear model around xn:
Mn(x) = F(xn) + F'(xn)(x - xn)
Mn(x) = 0 => xn+1 = xn - F'(xn)^-1 F(xn)
Iterates are computed as:
F'(xn) sn = F(xn)
xn+1 = xn - sn
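The iteration above can be sketched in a few lines. This is a minimal illustration, not from the slides: the 2x2 test system F(x, y) = (x^2 + y^2 - 4, x - y), with root x = y = sqrt(2), is a hypothetical example.

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, maxit=50):
    # Newton's method: solve F'(x_n) s_n = F(x_n), then x_{n+1} = x_n - s_n,
    # until the residual norm drops below the tolerance.
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        s = np.linalg.solve(J(x), F(x))
        x = x - s
        if np.linalg.norm(F(x)) < tol:
            break
    return x

# Hypothetical test system with solution x = y = sqrt(2)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
root = newton(F, J, x0=[1.0, 0.5])
```

Starting close enough to the root, the residual shrinks quadratically, so a handful of iterations suffices.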
Quasi-Newton methods (nonlinear systems)
Evaluate F'(xn):
- symbolically
- numerically, with finite differences
- automatic differentiation
Solve the linear system F'(xn) sn = F(xn):
- direct solve: LU, Cholesky
- iterative methods: GMRES, CG
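Of the three options for F', finite differences is the easiest to sketch. Below is a forward-difference Jacobian, checked against a hypothetical 2x2 system (the function F and the point are my own illustration):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    # Forward-difference approximation of F'(x): column j is
    # (F(x + h*e_j) - F(x)) / h, costing n extra F-evaluations.
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - Fx) / h
    return J

F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
Japprox = fd_jacobian(F, np.array([1.0, 2.0]))
```

The approximation error is O(h) per entry, so h trades truncation error against floating-point cancellation.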
Quasi-Newton methods (nonlinear systems)
Computation:
- F(xk): n scalar function evaluations
- F'(xk): n^2 scalar function evaluations
- LU: O(2n^3/3); Cholesky: O(n^3/3)
- Krylov methods: cost depends on the condition number
Quasi-Newton methods (nonlinear systems)
- LU and Cholesky are useful when we want to reuse the factorization (quasi-implicit)
- They are difficult to parallelize and to load-balance
- Cholesky is faster and more stable, but needs SPD
- For large n (n ~ 10^6), factorization is impractical
- Krylov methods contain easily parallelizable elements (updates, inner products, matrix-vector products)
- CG is faster and more stable, but needs SPD
Quasi-Newton methods (nonlinear systems)
Advantages:
Under standard assumptions, Newton’s method converges locally and quadratically
There exists a domain of attraction S which contains the solution
Once the iterates enter S, they stay in S and eventually converge to x*
The algorithm is memoryless (self-corrective)
Quasi-Newton methods (nonlinear systems)
Disadvantages:
Convergence depends on the choice of x0
F’(x) has to be evaluated for each xk
Computation can be expensive: F(xk), F’(xk), sk
Quasi-Newton methods (nonlinear systems)
Implicit schemes for ODEs
y’ = f(t,y)
Forward Euler: yn+1 = yn + hf(tn,yn) (explicit)
Backward Euler: yn+1 = yn + hf(tn+1, yn+1) (implicit)
Implicit schemes need the solution of a nonlinear system
(also Crank-Nicolson, Runge-Kutta, and linear multistep formulas)
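To make the connection concrete, here is a sketch of one backward Euler step that solves its nonlinear equation with a scalar Newton iteration. The test problem y' = -y^3 is my own illustrative choice, not from the slides:

```python
def backward_euler_step(f, dfdy, t1, yn, h, tol=1e-12, maxit=20):
    # One backward Euler step: solve g(u) = u - yn - h*f(t1, u) = 0
    # by scalar Newton iteration, starting from an explicit predictor.
    u = yn + h * f(t1 - h, yn)        # forward Euler predictor
    for _ in range(maxit):
        g = u - yn - h * f(t1, u)
        gp = 1.0 - h * dfdy(t1, u)    # g'(u)
        du = g / gp
        u -= du
        if abs(du) < tol:
            break
    return u

# Illustrative nonlinear problem y' = -y^3, y(0) = 1
f = lambda t, y: -y**3
dfdy = lambda t, y: -3.0 * y**2
y, t, h = 1.0, 0.0, 0.05
for _ in range(20):
    t += h
    y = backward_euler_step(f, dfdy, t, y, h)
```

Each implicit step is itself a (one-dimensional) nonlinear root-finding problem, which is exactly why Newton-type solvers appear inside implicit ODE integrators.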
Quasi-Newton methods (nonlinear systems)
How to circumvent evaluating F'(xk)? Broyden's method:
Bk+1 = Bk + (yk - Bk sk) sk^T / <sk, sk>
xk+1 = xk - Bk^-1 F(xk)
Inverse update (Sherman-Morrison formula):
Hk+1 = Hk + (sk - Hk yk) sk^T Hk / <sk, Hk yk>
xk+1 = xk - Hk F(xk)
( sk = xk+1 - xk, yk = F(xk+1) - F(xk) )
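A minimal sketch of the direct Broyden update above, applied to the same hypothetical 2x2 test system used earlier; seeding B0 with the Jacobian at x0 is my choice for the illustration (any reasonable initial matrix works locally):

```python
import numpy as np

def broyden(F, x0, B0, tol=1e-10, maxit=100):
    # Broyden's method: B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / <s_k, s_k>,
    # with s_k = x_{k+1} - x_k and y_k = F(x_{k+1}) - F(x_k).
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    Fx = F(x)
    for _ in range(maxit):
        s = -np.linalg.solve(B, Fx)      # x_{k+1} = x_k - B_k^{-1} F(x_k)
        x = x + s
        Fnew = F(x)
        y = Fnew - Fx
        B = B + np.outer(y - B @ s, s) / (s @ s)
        Fx = Fnew
        if np.linalg.norm(Fx) < tol:
            break
    return x

F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
x0 = np.array([1.5, 1.5])
B0 = np.array([[3.0, 3.0], [1.0, -1.0]])   # F'(x0), used only to start
root = broyden(F, x0, B0)
```

Note that F' is never re-evaluated: each iteration spends one F-evaluation plus a rank-one matrix update.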
Quasi-Newton methods (nonlinear systems)
Advantages:
- No need to compute F'(xk)
- For the inverse update, no linear system to solve
Disadvantages:
- Only superlinear (not quadratic) convergence
- No longer memoryless
Quasi-Newton methods (unconstrained optimization)
Problem: find the global minimizer of a cost function
f : R^n -> R, x* = arg min f
If f is differentiable, the problem can be attacked by looking for zeros of the gradient
Quasi-Newton methods (unconstrained optimization)
Descent methods: xk+1 = xk - λk Pk ∇f(xk)
- Pk = In: steepest descent
- Pk = ∇^2 f(xk)^-1: Newton's method
- Pk = Bk^-1: quasi-Newton
The angle between Pk ∇f(xk) and ∇f(xk) must be less than 90°
Bk has to mimic the behavior of the Hessian
Quasi-Newton methods (unconstrained optimization)
Global convergence:
- Line search: step length via backtracking or interpolation; sufficient decrease via the Wolfe conditions
- Trust regions
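The line-search ingredients can be sketched as a backtracking loop enforcing the sufficient-decrease (Armijo) part of the Wolfe conditions; the quadratic test function is a hypothetical example of mine:

```python
import numpy as np

def backtracking(f, grad, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    # Shrink the step until the Armijo (sufficient-decrease) condition
    # f(x + a p) <= f(x) + c a <grad f(x), p> holds.
    fx, gx = f(x), grad(x)
    slope = gx @ p          # must be negative for a descent direction
    a = alpha0
    while f(x + a * p) > fx + c * a * slope:
        a *= rho
    return a

f = lambda x: 0.5 * x @ x          # quadratic bowl, minimizer at 0
grad = lambda x: x
x = np.array([3.0, -4.0])
p = -grad(x)                        # steepest-descent direction
a = backtracking(f, grad, x, p)
```

A full Wolfe line search also checks a curvature condition on the new gradient; backtracking alone is the simplest practical variant.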
Quasi-Newton methods (unconstrained optimization)
For quasi-Newton, Bk has to resemble ∇^2 f(xk):
- SR1 (single-rank): Bk+1 = Bk + (yk - Bk sk)(yk - Bk sk)^T / <yk - Bk sk, sk>
- PSB (symmetry): Bk+1 = Bk + [(yk - Bk sk) sk^T + sk (yk - Bk sk)^T] / <sk, sk> - <yk - Bk sk, sk> sk sk^T / <sk, sk>^2
- DFP (positive definiteness): Bk+1 = Bk + [(yk - Bk sk) yk^T + yk (yk - Bk sk)^T] / <yk, sk> - <yk - Bk sk, sk> yk yk^T / <yk, sk>^2
- BFGS (inverse update): Hk+1 = (I - sk yk^T / <yk, sk>) Hk (I - yk sk^T / <yk, sk>) + sk sk^T / <yk, sk>
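As a concrete sketch of the BFGS inverse update, here it is driving a small quadratic minimization. The quadratic test problem and the exact step length (valid for quadratics) are my own illustrative choices:

```python
import numpy as np

def bfgs_update(H, s, y):
    # Inverse update: H+ = (I - s y^T/<y,s>) H (I - y s^T/<y,s>) + s s^T/<y,s>
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V.T @ H @ V + rho * np.outer(s, s)

# Minimize f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
H = np.eye(2)
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    p = -H @ g
    a = -(g @ p) / (p @ A @ p)   # exact line search along p (quadratic case)
    s = a * p
    y = grad(x + s) - g
    H = bfgs_update(H, s, y)
    x = x + s
```

Because y^T s > 0 holds here, each update keeps H symmetric positive definite, which is the property BFGS is prized for.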
Quasi-Newton methods (unconstrained optimization)
Computation:
- matrix updates, inner products
- DFP, PSB: 3 matrix-vector products
- BFGS: 2 matrix-matrix products
Storage:
- limited-memory versions (L-BFGS) store {sk, yk} for the last m iterations and recompute H
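The "store pairs, recompute H" idea is usually realized with the L-BFGS two-loop recursion; a minimal sketch (the sanity-check vectors are hypothetical):

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    # Two-loop recursion: apply the limited-memory inverse-Hessian
    # approximation built from the stored pairs {s_k, y_k} to g,
    # without ever forming H explicitly.
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial scaling H0 = gamma * I, a common heuristic
    s, y = s_list[-1], y_list[-1]
    q = q * ((s @ y) / (y @ y))
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q = q + (a - b) * s
    return q   # = H @ g; the search direction is -q

# Sanity check: with one stored pair, H satisfies the secant condition
# H y = s, so applying it to g = y must return s.
s = np.array([1.0, 2.0])
y = np.array([0.5, 1.0])
d = lbfgs_direction(y, [s], [y])
```

The cost per step is O(mn) storage and arithmetic instead of the O(n^2) of a dense H, which is what makes the method viable for very large n.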
Further improvements
Preconditioning the linear system:
- For faster convergence, one may solve K Bk pk = K F(xk)
- If Bk is SPD (and sparse), we can use sparse approximate inverses to generate the preconditioner
- This preconditioner can be refined on a subspace of Bk using an algebraic multigrid technique
- We need to solve an eigenvalue problem
Further improvements
Model reduction:
- Sometimes the dimension of the system is very large
- We want a smaller model that captures the essence of the original
- An approximation of the model variability can be retrieved from an ensemble of forward simulations
- The covariance matrix gives the subspace
- We need to solve an eigenvalue problem
![Page 21: Qualifier](https://reader035.fdocuments.us/reader035/viewer/2022070302/548b9844b47959963d8b46bd/html5/thumbnails/21.jpg)
QR/QL algorithmsfor symmetric matrices
Solves the eigenvalue problem Iterative algorithm Uses QR/QL factorization at each step
(A=Q*R, Q unitary, R upper triangular)
for k = 1,2,..Ak=Qk*Rk
Ak+1=Rk*Qk
end
Diagonal of Ak converges to eigenvalues of A
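The loop above can be sketched directly; the small symmetric matrix is a hypothetical test case of mine:

```python
import numpy as np

# Unshifted QR iteration on a symmetric matrix: A_k = Q_k R_k, A_{k+1} = R_k Q_k.
# Each A_{k+1} = Q_k^T A_k Q_k is similar to A, so the eigenvalues are preserved
# while the off-diagonal entries decay.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
Ak = A.copy()
for _ in range(100):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q

eigs = sorted(np.diag(Ak))   # eigenvalues of A are (7 ± sqrt(5)) / 2
```

The off-diagonal entries shrink like |λ2/λ1|^k, which is exactly why the shifts discussed on the next slide are needed in practice.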
QR/QL algorithms for symmetric matrices
- The matrix A is reduced to upper Hessenberg form before starting the iterations
- Householder reflections (U = I - 2vv^T, v a unit vector)
- The reduction is made column-wise
- If A is symmetric, it is reduced to tridiagonal form
QR/QL algorithms for symmetric matrices
- Convergence to a triangular form can be slow; origin shifts are used to accelerate it
for k = 1, 2, ...
    Ak - zk I = Qk Rk
    Ak+1 = Rk Qk + zk I
end
- Wilkinson shift
- QR makes heavy use of matrix-matrix products
Alternatives to quasi-Newton
Inexact Newton methods:
- Inner iteration: determine a search direction by solving the linear system to a certain tolerance
- Only Hessian-vector products are necessary
- Outer iteration: line search along the search direction
Nonlinear CG:
- Residual replaced by the gradient of the cost function
- Line search
- Different flavors
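One such flavor, Fletcher-Reeves, can be sketched on a quadratic test function (my own illustrative choice). On a quadratic the exact step length is available in closed form; in general a line search supplies it:

```python
import numpy as np

# Fletcher-Reeves nonlinear CG on f(x) = 0.5 x^T A x - b^T x:
# the gradient A x - b plays the role of the residual in linear CG.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
g = grad(x)
d = -g
for _ in range(50):
    if np.linalg.norm(g) < 1e-10:
        break
    a = -(g @ d) / (d @ A @ d)      # exact minimizer along d (quadratic case)
    x = x + a * d
    gnew = grad(x)
    beta = (gnew @ gnew) / (g @ g)  # Fletcher-Reeves coefficient
    d = -gnew + beta * d
    g = gnew
```

On a quadratic this reduces to linear CG, so it terminates in at most n steps; for general cost functions the beta formula (FR, Polak-Ribiere, ...) is what distinguishes the flavors.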
Alternatives to quasi-Newton
Direct search:
- Does not involve derivatives of the cost function
- Uses a structure called a simplex to search for decrease in f
- Stops when further progress cannot be achieved
- Can get stuck in a local minimum
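To illustrate the derivative-free idea, here is a compass (coordinate) search, a simpler member of the direct-search family than the simplex method the slide describes; the test function is hypothetical:

```python
def compass_search(f, x, step=0.5, tol=1e-6):
    # Poll the 2n coordinate directions for a point that decreases f;
    # if none improves, shrink the step. No derivatives are used.
    x = list(x)
    fx = f(x)
    n = len(x)
    while step > tol:
        improved = False
        for i in range(n):
            for d in (+step, -step):
                trial = x.copy()
                trial[i] += d
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5
    return x, fx

best, fbest = compass_search(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2,
                             [0.0, 0.0])
```

As with the simplex search, the method stalls when no polled point improves f, which is also how it can get trapped near a local minimum.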
More alternatives
Monte Carlo:
- A computational method relying on random sampling
- Can be used for optimization (MDO) and for inverse problems, via random walks
- With multiple correlated variables, the correlation matrix is SPD, so we can use a Cholesky factorization
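The Cholesky trick can be sketched concretely: if C = L L^T and z is standard normal, then x = L z has covariance C. The 2x2 correlation matrix below is a hypothetical example:

```python
import numpy as np

# Sampling correlated Gaussian variables via Cholesky:
# C = L L^T, z ~ N(0, I)  =>  x = L z has E[x x^T] = L L^T = C.
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.8], [0.8, 1.0]])   # SPD correlation matrix
L = np.linalg.cholesky(C)

z = rng.standard_normal((2, 100000))
x = L @ z
sample_cov = x @ x.T / x.shape[1]        # empirical covariance, approx C
```

The empirical covariance converges to C at the usual O(1/sqrt(N)) Monte Carlo rate.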
Conclusions
Newton's method is a powerful method with many applications (solving nonlinear systems, finding minima of cost functions), and it can be used together with many other numerical algorithms (factorizations, linear solvers).
Optimizing and parallelizing matrix-vector and matrix-matrix products, decompositions, and other numerical methods can have a significant impact on overall performance.
Thank you for your time!