Reducing the memory footprint in Krylov methods through...



Page 1: Reducing the memory footprint in Krylov methods through lossy compression (icl.utk.edu/jlesc9/files/PTM1.1/jlesc9_schenkels.pdf)

April 15th, Knoxville, Tennessee

Reducing the memory footprint in Krylov methods through lossy compression
Project talk @ 9th JLESC workshop

Nick Schenkels (1)

Joint work with: Emmanuel Agullo (1), Luc Giraud (1), Sheng Di (2), Franck Cappello (2)

(1) Inria  (2) Argonne National Laboratory


Overview

- Introduction
  - Krylov methods & inexactness
  - Mixed precision & compression

- Preliminary results
  - Some theory
  - Some experiments

- Outlook


Introduction

- Krylov methods & inexactness
- Mixed precision & compression


Krylov methods

We want to solve

    A x = b

with A ∈ R^{n×n} and x, b ∈ R^n.

Krylov methods:
- Very efficient when A is large and sparse.
- Typically require one matrix-vector multiplication with A and/or A^T per iteration.
- Construct a Krylov subspace

      K_k(A, r_0) = { r_0, A r_0, A^2 r_0, ..., A^{k-1} r_0 },

  with r_0 = b − A x_0 and x_k ∈ x_0 + K_k(A, r_0).
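For illustration, the generators of the subspace above can be built with a few lines of numpy. This is a naive, unorthogonalized sketch of our own (the function name and example matrix are not from the talk); practical Krylov methods orthogonalize on the fly because these columns quickly become nearly linearly dependent.

```python
import numpy as np

def krylov_basis(A, r0, k):
    """Return the raw generators [r0, A r0, ..., A^(k-1) r0] of K_k(A, r0).

    Purely illustrative: real solvers use Arnoldi/Lanczos instead of
    storing these increasingly ill-conditioned columns directly.
    """
    V = np.empty((r0.size, k))
    V[:, 0] = r0
    for j in range(1, k):
        V[:, j] = A @ V[:, j - 1]   # next generator: multiply by A once
    return V

# Tiny example with x0 = 0, so r0 = b.
A = np.diag([1.0, 2.0, 3.0, 4.0]) + np.eye(4, k=1)
b = np.ones(4)
K = krylov_basis(A, b, 3)
```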


Inexact matvec

What happens if the matrix-vector product is not calculated accurately?

    z_k = A v_k   ←→   z_k = A v_k + p_k   or   z_k = (A + E_k) v_k

Recent results:
A. Bouras and V. Frayssé, Inexact matrix-vector products in Krylov methods for solving linear systems: a relaxation strategy, SIAM Journal on Matrix Analysis and Applications (2005).

V. Simoncini and D. B. Szyld, Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM Journal on Scientific Computing (2003).

L. Giraud, S. Gratton and J. Langou, Convergence in backward error of relaxed GMRES, SIAM Journal on Scientific Computing (2007).


An example …

- A ∈ R^{225×225} is the PDE225 matrix, x = (1, ..., 1)^T ∈ R^225.
- Stopping criterion: η_{A,b}(x_k) = ‖b − A x_k‖ / (‖A‖ ‖x_k‖ + ‖b‖) ≤ ε = 1e-12.
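The normwise backward error used as a stopping criterion is cheap to evaluate. A minimal sketch (the helper name is ours, and the 2-norm stands in for whatever norms the talk uses):

```python
import numpy as np

def eta(A, b, x):
    """Normwise backward error  ||b - A x|| / (||A|| ||x|| + ||b||)."""
    return np.linalg.norm(b - A @ x) / (
        np.linalg.norm(A, 2) * np.linalg.norm(x) + np.linalg.norm(b)
    )

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x_true = np.ones(2)
b = A @ x_true
```

For the exact solution the backward error is at machine-precision level; for x = 0 it equals 1 by construction.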


Mixed precision – method I

Method: perform part of the computation in 32 bit or 16 bit, while still achieving full accuracy for the reconstruction.

- Exploit faster 32 bit & 16 bit calculations on modern hardware.
- Often it is the preconditioner that is calculated in lower precision.

Recent results:
A. Buttari et al., Mixed precision iterative refinement techniques for the solution of dense linear systems, International Journal of High Performance Computing Applications (2007).

M. Arioli and I. S. Duff, Using FGMRES to obtain backward stability in mixed precision, Electronic Transactions on Numerical Analysis (2009).

E. Carson and N. J. Higham, Accelerating the solution of linear systems by iterative refinement in three precisions, MIMS EPrint (2017).


Mixed precision – method II

Method: store part of the data in lower precision, while still achieving full accuracy for the reconstruction.

- Calculations are done in full precision.
- Less communication and lower memory requirements.
- Read/write operations consume more energy than computations.
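The idea above can be demonstrated in a few lines: a vector produced in 64 bit is stored in 32 bit (half the memory) and promoted back to 64 bit before it is used in any computation.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)        # vector produced in 64 bit

z32 = z.astype(np.float32)           # *stored* in 32 bit: half the memory
z_back = z32.astype(np.float64)      # promoted again before it is *used*

# float32 carries ~7 decimal digits, so the point-wise relative error
# of the store/load round trip is around 1e-7.
rel_err = np.max(np.abs(z_back - z) / np.abs(z))
```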

Recent results:
H. Anzt et al., Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, MIMS EPrint (2017).

Why limit ourselves to 64 bit, 32 bit, 16 bit, etc.?
⇓
Use data compression techniques


The SZ compressor developed @ Argonne et al.

Lossy compression technique:
- Higher compression factors than lossless techniques.
- Designed to deal with irregular data and spiky changes.
- The compression error can be controlled: absolute error, relative error or point-wise relative error.

D. Tao et al., Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization, IEEE IPDPS (2017).

https://github.com/disheng222/SZ

Project goal: incorporate the SZ compressor into a Krylov method in order to reduce the memory and communication costs.
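To make "error-controlled" concrete, here is a toy quantizer of our own that guarantees an absolute error bound. It is emphatically not SZ's algorithm (SZ additionally predicts each value from its neighbours and entropy-codes the quantization codes), only the core idea of error-bounded lossy compression.

```python
import numpy as np

def compress(x, eps):
    """Quantize to integers so that the absolute error is <= eps."""
    return np.round(x / (2.0 * eps)).astype(np.int64)

def decompress(q, eps):
    return q * (2.0 * eps)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
eps = 1e-3
x_hat = decompress(compress(x, eps), eps)

# |round(t) - t| <= 1/2, so |x_hat - x| <= eps point-wise by construction.
max_err = np.max(np.abs(x_hat - x))
```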


Preliminary results

- Some theory
- Some experiments


GMRES inexact vs. compression

The GMRES algorithm revolves around the Arnoldi relation

    A V_k = V_{k+1} H_k.

Here V_k ∈ R^{n×k} with V_k^T V_k = I_k, span(V_k) = K_k(A, r_0), H_k ∈ R^{(k+1)×k} is upper Hessenberg, and

    x_k = x_0 + V_k y_k   with   y_k = argmin_{y ∈ R^k} ‖ ‖r_0‖ e_1 − H_k y ‖.

- Inexact GMRES:
  → z = (A + E_k) v_k
  → A V_k = V_{k+1} H_k
  → still GMRES

- Compressed GMRES:
  → compress v_k at some point
  → z ⊥ the compressed v_k ???
  → no Arnoldi relation …

GMRES: Arnoldi part

1: v_1 = r_0 / ‖r_0‖
2: for k = 1, 2, … do
3:   z = A v_k
4:   z ← Arnoldi(V_k)
5:   v_{k+1} = z / ‖z‖
6: end for


FGMRES = GMRES with a varying right preconditioner M_k^{-1}

A different Arnoldi relation:

    A Z_k = V_{k+1} H_k.

Here V_k ∈ R^{n×k} with V_k^T V_k = I_k, Z_k ∈ R^{n×k}, H_k ∈ R^{(k+1)×k} is upper Hessenberg, and

    x_k = x_0 + Z_k y_k   with   y_k = argmin_{y ∈ R^k} ‖ ‖r_0‖ e_1 − H_k y ‖.

- span(V_k) ≠ K_k(A, r_0).
- Not a Krylov method.
- z_k can in theory be random.

FGMRES: Arnoldi part

1: v_1 = r_0 / ‖r_0‖
2: for k = 1, 2, … do
3:   z_k = M_k^{-1} v_k
4:   w = A z_k
5:   w ← Arnoldi(V_k)
6:   v_{k+1} = w / ‖w‖
7: end for


FGMRES

Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific Computing (1993).

- Allows iterative methods to be used as preconditioners.
- Is not guaranteed to converge.
- Double the memory: Z_k and V_{k+1}.

The z_k are not orthogonal ⇒ compress z_k instead of v_k.

If compressed FGMRES converges within the same number of iterations, then we gain in memory.
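One way to read "compress z_k instead of v_k": the Arnoldi recurrence keeps using the exact z_k, and only the stored copy in Z_k goes through the compressor. A sketch under our own assumptions (a callable preconditioner interface, and a random point-wise relative perturbation standing in for a real compress/decompress round trip):

```python
import numpy as np

def lossy(z, eps, rng):
    """Stand-in for compress+decompress with point-wise relative error <= eps."""
    delta = rng.uniform(-eps, eps, size=z.size)
    return (1.0 + delta) * z

def fgmres_arnoldi(A, M_inv, r0, m, eps):
    """FGMRES Arnoldi part; only the compressed copies of z_k are kept in Z."""
    n = r0.size
    rng = np.random.default_rng(0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Z = np.zeros((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        z = M_inv(V[:, k])
        Z[:, k] = lossy(z, eps, rng)     # store the compressed copy only
        w = A @ z                        # the recurrence still uses the exact z_k
        for j in range(k + 1):
            H[j, k] = V[:, j] @ w
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H, Z

A = np.diag(np.arange(1.0, 9.0)) + np.eye(8, k=-1)
r0 = np.ones(8)
V, H, Z = fgmres_arnoldi(A, lambda v: v.copy(), r0, 4, eps=0.0)
```

With eps = 0 (no compression) and the identity preconditioner, the relation A Z_k = V_{k+1} H_k holds exactly; with eps > 0 a small residual gap appears.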


Assumptions & compression model

- Assume that FGMRES converges with the preconditioners {M_k^{-1}}_{k=1}^{n}.
- Let ε_k be the maximum relative point-wise error.

We introduce an error after applying the preconditioner M_k^{-1}:

    z_k = M_k^{-1} v_k   ←→   z̃_k = (I + diag(δ_k)) z_k = (I + diag(δ_k)) M_k^{-1} v_k,

with |(δ_k)_i| ≤ ε_k for i = 1, …, n.

- Still FGMRES, but with the preconditioners {(I + diag(δ_k)) M_k^{-1}}_{k=1}^{n}.
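The perturbation model above is easy to exercise directly: a point-wise relative perturbation scales every component independently and is exactly the action of (I + diag(δ_k)). A small self-check of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps_k = 8, 1e-5
z = rng.standard_normal(n)
delta = rng.uniform(-eps_k, eps_k, size=n)   # |(delta_k)_i| <= eps_k
z_tilde = (1.0 + delta) * z                  # z~ = (I + diag(delta)) z

# The point-wise relative error of z~ versus z is exactly |delta_i|.
rel = np.abs(z_tilde - z) / np.abs(z)
```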


Compressed FGMRES

Compressed Arnoldi relation:

    A Z̃_k = V_{k+1} H_k   →   optimization in a different subspace of R^n !!!

If

    ỹ_k = argmin_{y ∈ R^k} ‖r_0 − A Z̃_k y‖   and   y_k = argmin_{y ∈ R^k} ‖r_0 − A Z_k y‖,

then

    ‖r_0 − A Z̃_k ỹ_k‖   ≤   ‖r_0 − A Z_k ỹ_k‖   +   ‖A [diag(δ_1) z_1, …, diag(δ_k) z_k] ỹ_k‖
    (compressed residual)    (exact residual)        (residual gap)

How large can ε_k become such that η_{A,b}(x̃_k) ≤ ε ?


Bounding ε_k

For some targeted tolerance ε and c ∈ ]0, 1[ we know that there exists a k such that

    η_{A,b}(x_k) = ‖b − A x_k‖ / (‖A‖ ‖x_k‖ + ‖b‖) ≤ cε.

Result: if, for all i = 1, …, k,

    ε_i ≤ (1 − c) · σ_min(H_k) / (k^2 ‖A‖) · ‖b‖ / ‖r_{i−1}‖ · ε,   (1)

then

    η_{A,b}(x̃_k) = ‖b − A x̃_k‖ / (‖A‖ ‖x̃_k‖ + ‖b‖) ≤ ε.


Some practical bounds

- (1) contains values that are not available.
- (1) is a very conservative bound.
- Consider the relaxed bounds:

      ε_i ≤ (1 − c) · σ_min(A) / (k^2 ‖A‖) · ‖b‖ / ‖r_{i−1}‖ · ε   (b1)

      ε_i ≤ (1 − c) · 1 / ‖A‖ · ‖b‖ / ‖r_{i−1}‖ · ε   (b2)

      ε_i ≤ (1 − c) · ‖b‖ / ‖r_{i−1}‖ · ε   (b3)
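Bound (b3) only needs quantities the solver already has at hand. A minimal sketch (helper name ours) showing its key qualitative property, namely that the admissible compression error grows as the residual norm drops, so later vectors can be compressed more aggressively:

```python
def eps_b3(c, b_norm, r_prev_norm, tol):
    """Relaxed bound (b3):  eps_i <= (1 - c) * (||b|| / ||r_{i-1}||) * tol."""
    return (1.0 - c) * (b_norm / r_prev_norm) * tol

# Admissible eps_i as the residual shrinks from ||b|| toward convergence:
tols = [eps_b3(0.5, 1.0, r, 1e-12) for r in (1.0, 1e-4, 1e-8)]
```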


Numerical experiments

- FGMRES-GMRES(25).
- Outer stopping criterion: η_{A,b}(x_k) ≤ 1e-12 and maxit ≤ 256.
- Inner stopping criterion: η_{A,v_k}(z_k) ≤ 1e-4 and maxit ≤ 50.

    Matrix     n             FGMRES   cFGMRES-(b1)    cFGMRES-(b2)    cFGMRES-(b3)
    pde225     225 × 225       3        3   5.5e-11     3   1.4e-5      3   1.6e-4
    e05r0000   236 × 236      11       11   2.1e-11    11   3.2e-3     11   3.6e-2
    orsirr_1   1030 × 1030    33       33   2.3e-15    33   2.4e-11    33   1.2e-5
    1138_bus   1138 × 1138    72       72   1.1e-15    71   1.6e-8     72   5.9e-4
    cavity05   1182 × 1182    22       22   3.8e-13    22   1.6e-3     24   3.0e-2
    fidap003   1821 × 1821   136      136   0.0       136   4.0e-10   194   9.8e-3
    watt__1    1856 × 1856   138      138   2.9e-16   138   8.8e-2    138   8.7e-2
    bwm2000    2000 × 2000    31       31   6.6e-14    31   8.1e-9     32   2.5e-3
    olm2000    2000 × 2000   113      113   6.8e-15   115   4.1e-9    190   3.0e-3
    add20      2395 × 2395    13       13   2.9e-11    50   5.5e-2     19   2.9e-2

Matrices taken from Matrix Market and x = (1, …, 1)^T ∈ R^n.


Results for 1138_bus


Results for add20


Outlook


Outlook

Done:
- Theoretical framework for the size of the compression.
- Initial numerical experiments.
- A compressor.

To be done:
- Combine the theoretical results with the compressor.
- Study the gain in memory & speedup.
- Empirical generalizations:
  • Compression of blocks.
  • Restarting strategies.
  • …


Is this the end …

Thank you for your attention.

Any questions?