Reducing the memory footprint in Krylov methods through...



Page 1: Reducing the memory footprint in Krylov methods through lossy compression (icl.utk.edu/jlesc9/files/PTM1.1/jlesc9_schenkels.pdf)

April 15th, Knoxville, Tennessee

Reducing the memory footprint in Krylov methods through lossy compression
Project talk @ 9th JLESC workshop

Nick Schenkels (1)

Joint work with: Emmanuel Agullo (1), Luc Giraud (1), Sheng Di (2), Franck Cappello (2)

(1) Inria  (2) Argonne National Laboratory


Overview

- Introduction
  - Krylov methods & inexactness
  - Mixed precision & compression

- Preliminary results
  - Some theory
  - Some experiments

- Outlook


Introduction

- Krylov methods & inexactness
- Mixed precision & compression


Krylov methods

We want to solve

    A x = b

with A ∈ R^{n×n} and x, b ∈ R^n.

Krylov methods:
- Very efficient when A is large and sparse.
- Typically require one matrix-vector multiplication with A and/or A^T per iteration.
- Construct a Krylov subspace

      K_k(A, r_0) = { r_0, A r_0, A^2 r_0, ..., A^{k-1} r_0 },

  with r_0 = b − A x_0 and x_k ∈ x_0 + K_k(A, r_0).
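For illustration, the generators of the subspace above can be built with a few lines of numpy. This is a naive, unorthogonalized sketch of our own (the function name and example matrix are not from the talk); practical Krylov methods orthogonalize on the fly because these columns quickly become nearly linearly dependent.

```python
import numpy as np

def krylov_basis(A, r0, k):
    """Return the raw generators [r0, A r0, ..., A^(k-1) r0] of K_k(A, r0).

    Purely illustrative: real solvers use Arnoldi/Lanczos instead of
    storing these increasingly ill-conditioned columns directly.
    """
    V = np.empty((r0.size, k))
    V[:, 0] = r0
    for j in range(1, k):
        V[:, j] = A @ V[:, j - 1]   # next generator: multiply by A once
    return V

# Tiny example with x0 = 0, so r0 = b.
A = np.diag([1.0, 2.0, 3.0, 4.0]) + np.eye(4, k=1)
b = np.ones(4)
K = krylov_basis(A, b, 3)
```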


Inexact matvec

What happens if the matrix-vector product is not calculated accurately?

    z_k = A v_k   ←→   z_k = A v_k + p_k   or   z_k = (A + E_k) v_k

Recent results:
A. Bouras and V. Frayssé, Inexact matrix-vector products in Krylov methods for solving linear systems: a relaxation strategy, SIAM Journal on Matrix Analysis and Applications (2005).

V. Simoncini and D. B. Szyld, Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM Journal on Scientific Computing (2003).

L. Giraud, S. Gratton and J. Langou, Convergence in backward error of relaxed GMRES, SIAM Journal on Scientific Computing (2007).


An example …

- A ∈ R^{225×225} is the PDE225 matrix, x = (1, ..., 1)^T ∈ R^225.
- Stopping criterion: η_{A,b}(x_k) = ‖b − A x_k‖ / (‖A‖ ‖x_k‖ + ‖b‖) ≤ ε = 1e-12.
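The normwise backward error used as a stopping criterion is cheap to evaluate. A minimal sketch (the helper name is ours, and the 2-norm stands in for whatever norms the talk uses):

```python
import numpy as np

def eta(A, b, x):
    """Normwise backward error  ||b - A x|| / (||A|| ||x|| + ||b||)."""
    return np.linalg.norm(b - A @ x) / (
        np.linalg.norm(A, 2) * np.linalg.norm(x) + np.linalg.norm(b)
    )

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x_true = np.ones(2)
b = A @ x_true
```

For the exact solution the backward error is at machine-precision level; for x = 0 it equals 1 by construction.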


Mixed precision – method I

Method: perform part of the computation in 32 bit or 16 bit, while still achieving full accuracy for the reconstruction.

- Exploit faster 32 bit & 16 bit calculations on modern hardware.
- Often it is the preconditioner that is calculated in lower precision.

Recent results:
A. Buttari et al., Mixed precision iterative refinement techniques for the solution of dense linear systems, International Journal of High Performance Computing Applications (2007).

M. Arioli and I. S. Duff, Using FGMRES to obtain backward stability in mixed precision, Electronic Transactions on Numerical Analysis (2009).

E. Carson and N. J. Higham, Accelerating the solution of linear systems by iterative refinement in three precisions, MIMS EPrint (2017).


Mixed precision – method II

Method: store part of the data in lower precision, while still achieving full accuracy for the reconstruction.

- Calculations are done in full precision.
- Less communication and lower memory requirements.
- Read/write operations consume more energy than computations.
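The idea above can be demonstrated in a few lines: a vector produced in 64 bit is stored in 32 bit (half the memory) and promoted back to 64 bit before it is used in any computation.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)        # vector produced in 64 bit

z32 = z.astype(np.float32)           # *stored* in 32 bit: half the memory
z_back = z32.astype(np.float64)      # promoted again before it is *used*

# float32 carries ~7 decimal digits, so the point-wise relative error
# of the store/load round trip is around 1e-7.
rel_err = np.max(np.abs(z_back - z) / np.abs(z))
```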

Recent results:
H. Anzt et al., Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, MIMS EPrint (2017).

Why limit ourselves to 64 bit, 32 bit, 16 bit, etc.?
⇓
Use data compression techniques


The SZ compressor developed @ Argonne et al.

Lossy compression technique:
- Higher compression factors than lossless techniques.
- Designed to deal with irregular data and spiky changes.
- The compression error can be controlled: absolute error, relative error or point-wise relative error.

D. Tao et al., Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization, IEEE IPDPS (2017).

https://github.com/disheng222/SZ

Project goal: incorporate the SZ compressor into a Krylov method in order to reduce the memory and communication costs.
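To make "error-controlled" concrete, here is a toy quantizer of our own that guarantees an absolute error bound. It is emphatically not SZ's algorithm (SZ additionally predicts each value from its neighbours and entropy-codes the quantization codes), only the core idea of error-bounded lossy compression.

```python
import numpy as np

def compress(x, eps):
    """Quantize to integers so that the absolute error is <= eps."""
    return np.round(x / (2.0 * eps)).astype(np.int64)

def decompress(q, eps):
    return q * (2.0 * eps)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
eps = 1e-3
x_hat = decompress(compress(x, eps), eps)

# |round(t) - t| <= 1/2, so |x_hat - x| <= eps point-wise by construction.
max_err = np.max(np.abs(x_hat - x))
```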


Preliminary results

- Some theory
- Some experiments


GMRES inexact vs. compression

The GMRES algorithm revolves around the Arnoldi relation

    A V_k = V_{k+1} H_k.

Here V_k ∈ R^{n×k} with V_k^T V_k = I_k, span(V_k) = K_k(A, r_0), H_k ∈ R^{(k+1)×k} is upper Hessenberg, and

    x_k = x_0 + V_k y_k   with   y_k = argmin_{y ∈ R^k} ‖ ‖r_0‖ e_1 − H_k y ‖.

- Inexact GMRES:
  → z = (A + E_k) v_k
  → A V_k = V_{k+1} H_k
  → still GMRES

- Compressed GMRES:
  → compress v_k at some point
  → z ⊥ the compressed v_k ???
  → no Arnoldi relation …

GMRES: Arnoldi part

1: v_1 = r_0 / ‖r_0‖
2: for k = 1, 2, … do
3:   z = A v_k
4:   z ← Arnoldi(V_k)
5:   v_{k+1} = z / ‖z‖
6: end for


FGMRES = GMRES with a varying right preconditioner M_k^{-1}

A different Arnoldi relation:

    A Z_k = V_{k+1} H_k.

Here V_k ∈ R^{n×k} with V_k^T V_k = I_k, Z_k ∈ R^{n×k}, H_k ∈ R^{(k+1)×k} is upper Hessenberg, and

    x_k = x_0 + Z_k y_k   with   y_k = argmin_{y ∈ R^k} ‖ ‖r_0‖ e_1 − H_k y ‖.

- span(V_k) ≠ K_k(A, r_0).
- Not a Krylov method.
- z_k can in theory be random.

FGMRES: Arnoldi part

1: v_1 = r_0 / ‖r_0‖
2: for k = 1, 2, … do
3:   z_k = M_k^{-1} v_k
4:   w = A z_k
5:   w ← Arnoldi(V_k)
6:   v_{k+1} = w / ‖w‖
7: end for


FGMRES

Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific Computing (1993).

- Allows iterative methods to be used as preconditioners.
- Is not guaranteed to converge.
- Double the memory: Z_k and V_{k+1}.

The z_k are not orthogonal ⇒ compress z_k instead of v_k.

If compressed FGMRES converges within the same number of iterations, then we gain in memory.
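One way to read "compress z_k instead of v_k": the Arnoldi recurrence keeps using the exact z_k, and only the stored copy in Z_k goes through the compressor. A sketch under our own assumptions (a callable preconditioner interface, and a random point-wise relative perturbation standing in for a real compress/decompress round trip):

```python
import numpy as np

def lossy(z, eps, rng):
    """Stand-in for compress+decompress with point-wise relative error <= eps."""
    delta = rng.uniform(-eps, eps, size=z.size)
    return (1.0 + delta) * z

def fgmres_arnoldi(A, M_inv, r0, m, eps):
    """FGMRES Arnoldi part; only the compressed copies of z_k are kept in Z."""
    n = r0.size
    rng = np.random.default_rng(0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Z = np.zeros((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        z = M_inv(V[:, k])
        Z[:, k] = lossy(z, eps, rng)     # store the compressed copy only
        w = A @ z                        # the recurrence still uses the exact z_k
        for j in range(k + 1):
            H[j, k] = V[:, j] @ w
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H, Z

A = np.diag(np.arange(1.0, 9.0)) + np.eye(8, k=-1)
r0 = np.ones(8)
V, H, Z = fgmres_arnoldi(A, lambda v: v.copy(), r0, 4, eps=0.0)
```

With eps = 0 (no compression) and the identity preconditioner, the relation A Z_k = V_{k+1} H_k holds exactly; with eps > 0 a small residual gap appears.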


Assumptions & compression model

- Assume that FGMRES converges with the preconditioners {M_k^{-1}}_{k=1}^{n}.
- Let ε_k be the maximum relative point-wise error.

We introduce an error after applying the preconditioner M_k^{-1}:

    z_k = M_k^{-1} v_k   ←→   z̃_k = (I + diag(δ_k)) z_k = (I + diag(δ_k)) M_k^{-1} v_k,

with |(δ_k)_i| ≤ ε_k for i = 1, …, n.

- Still FGMRES, but with the preconditioners {(I + diag(δ_k)) M_k^{-1}}_{k=1}^{n}.
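The perturbation model above is easy to exercise directly: a point-wise relative perturbation scales every component independently and is exactly the action of (I + diag(δ_k)). A small self-check of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps_k = 8, 1e-5
z = rng.standard_normal(n)
delta = rng.uniform(-eps_k, eps_k, size=n)   # |(delta_k)_i| <= eps_k
z_tilde = (1.0 + delta) * z                  # z~ = (I + diag(delta)) z

# The point-wise relative error of z~ versus z is exactly |delta_i|.
rel = np.abs(z_tilde - z) / np.abs(z)
```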


Compressed FGMRES

Compressed Arnoldi relation:

    A Z̃_k = V_{k+1} H_k   →   optimization in a different subspace of R^n !!!

If

    ỹ_k = argmin_{y ∈ R^k} ‖r_0 − A Z̃_k y‖   and   y_k = argmin_{y ∈ R^k} ‖r_0 − A Z_k y‖,

then

    ‖r_0 − A Z̃_k ỹ_k‖   ≤   ‖r_0 − A Z_k ỹ_k‖   +   ‖A [diag(δ_1) z_1, …, diag(δ_k) z_k] ỹ_k‖
    (compressed residual)    (exact residual)        (residual gap)

How large can ε_k become such that η_{A,b}(x̃_k) ≤ ε ?


Bounding ε_k

For some targeted tolerance ε and c ∈ ]0, 1[ we know that there exists a k such that

    η_{A,b}(x_k) = ‖b − A x_k‖ / (‖A‖ ‖x_k‖ + ‖b‖) ≤ cε.

Result: if, for all i = 1, …, k,

    ε_i ≤ (1 − c) · σ_min(H_k) / (k^2 ‖A‖) · ‖b‖ / ‖r_{i−1}‖ · ε,   (1)

then

    η_{A,b}(x̃_k) = ‖b − A x̃_k‖ / (‖A‖ ‖x̃_k‖ + ‖b‖) ≤ ε.


Some practical bounds

- (1) contains values that are not available.
- (1) is a very conservative bound.
- Consider the relaxed bounds:

      ε_i ≤ (1 − c) · σ_min(A) / (k^2 ‖A‖) · ‖b‖ / ‖r_{i−1}‖ · ε   (b1)

      ε_i ≤ (1 − c) · 1 / ‖A‖ · ‖b‖ / ‖r_{i−1}‖ · ε   (b2)

      ε_i ≤ (1 − c) · ‖b‖ / ‖r_{i−1}‖ · ε   (b3)
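Bound (b3) only needs quantities the solver already has at hand. A minimal sketch (helper name ours) showing its key qualitative property, namely that the admissible compression error grows as the residual norm drops, so later vectors can be compressed more aggressively:

```python
def eps_b3(c, b_norm, r_prev_norm, tol):
    """Relaxed bound (b3):  eps_i <= (1 - c) * (||b|| / ||r_{i-1}||) * tol."""
    return (1.0 - c) * (b_norm / r_prev_norm) * tol

# Admissible eps_i as the residual shrinks from ||b|| toward convergence:
tols = [eps_b3(0.5, 1.0, r, 1e-12) for r in (1.0, 1e-4, 1e-8)]
```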


Numerical experiments

- FGMRES-GMRES(25).
- Outer stopping criterion: η_{A,b}(x_k) ≤ 1e-12 and maxit ≤ 256.
- Inner stopping criterion: η_{A,v_k}(z_k) ≤ 1e-4 and maxit ≤ 50.

    Matrix     n             FGMRES   cFGMRES-(b1)    cFGMRES-(b2)    cFGMRES-(b3)
    pde225     225 × 225       3        3   5.5e-11     3   1.4e-5      3   1.6e-4
    e05r0000   236 × 236      11       11   2.1e-11    11   3.2e-3     11   3.6e-2
    orsirr_1   1030 × 1030    33       33   2.3e-15    33   2.4e-11    33   1.2e-5
    1138_bus   1138 × 1138    72       72   1.1e-15    71   1.6e-8     72   5.9e-4
    cavity05   1182 × 1182    22       22   3.8e-13    22   1.6e-3     24   3.0e-2
    fidap003   1821 × 1821   136      136   0.0       136   4.0e-10   194   9.8e-3
    watt__1    1856 × 1856   138      138   2.9e-16   138   8.8e-2    138   8.7e-2
    bwm2000    2000 × 2000    31       31   6.6e-14    31   8.1e-9     32   2.5e-3
    olm2000    2000 × 2000   113      113   6.8e-15   115   4.1e-9    190   3.0e-3
    add20      2395 × 2395    13       13   2.9e-11    50   5.5e-2     19   2.9e-2

Matrices taken from Matrix Market and x = (1, …, 1)^T ∈ R^n.


Results for 1138_bus


Results for add20


Outlook


Outlook

Done:
- Theoretical framework for the size of the compression.
- Initial numerical experiments.
- A compressor.

To be done:
- Combine the theoretical results with the compressor.
- Study the gain in memory & speedup.
- Empirical generalizations:
  • Compression of blocks.
  • Restarting strategies.
  • …


Is this the end …

Thank you for your attention.

Any questions?