
Preconditioning Techniques for Large Linear Systems

Part III: General-Purpose Algebraic Preconditioners

Michele Benzi

Department of Mathematics and Computer Science

Emory University

Atlanta, Georgia, USA

Scuola di Dottorato di Ricerca in Scienze Matematiche

Dipartimento di Matematica

Università degli Studi di Padova


Outline

1 Introduction

2 Generalities about preconditioning

3 Basic concepts of algebraic preconditioning

4 Incomplete factorizations

5 Sparse approximate inverses

6 IF via approximate inverses

7 Balanced Incomplete Factorization (BIF)

8 Conclusions


Preconditioned iterative methods

Solving large linear systems by Krylov-type methods:

$$Ax = b$$

Preconditioning may be viewed as a transformation:

$$M^{-1}Ax = M^{-1}b, \quad \text{or} \quad AM^{-1}y = b, \; x = M^{-1}y$$

Examples: Matrix Splittings (block Jacobi, Gauss-Seidel, SSOR); Incomplete Factorizations; Sparse Approximate Inverses; AMG ...

The preconditioner $M$ (or $M^{-1}$) should be cheap, fast to compute, and result in rapid convergence of the preconditioned iterative method, but also:

- sufficiently robust
- sparse (i.e., low storage requirements)

The case of sequences of linear systems $A^{(k)}x^{(k)} = b^{(k)}$, $k = 0, 1, 2, \ldots$
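As a concrete check of the two transformations above, here is a minimal NumPy sketch. The 3 × 3 matrix, the right-hand side, and the choice of a Jacobi preconditioner M = diag(A) are illustrative assumptions, not taken from the lecture. It verifies that left and right preconditioning both recover the solution of the original system.

```python
import numpy as np

# Small symmetric test matrix (hypothetical data, for illustration only).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Jacobi preconditioner: M = diag(A), so M^{-1} is trivial to form and apply.
M_inv = np.diag(1.0 / np.diag(A))

# Left preconditioning: solve (M^{-1} A) x = M^{-1} b.
x_left = np.linalg.solve(M_inv @ A, M_inv @ b)

# Right preconditioning: solve (A M^{-1}) y = b, then recover x = M^{-1} y.
y = np.linalg.solve(A @ M_inv, b)
x_right = M_inv @ y

# Reference solution of the unpreconditioned system.
x_ref = np.linalg.solve(A, b)
```

In practice the transformed matrix is never formed explicitly; a Krylov method only needs the action of $A$ and of $M^{-1}$ on vectors. The dense solves here serve only to confirm that the solution is unchanged.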


Structure of this lecture:

1 Brief discussion of algebraic vs. problem-specific preconditioning

2 Description of guiding principles behind algebraic preconditioning (IF and SAI). Robustness problems of standard techniques

3 Some recent approaches which exploit information on the matrix inverse

4 An approach based on a novel decomposition of the input matrix

5 Other recent developments: hybrid and multi-level methods (briefly)


A quote

In ending this book with the subject of preconditioners, we find ourselves at the philosophical center of the scientific computing of the future... Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning.

From L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, 1997.

Algebraic vs. Problem-Specific Preconditioning

Algebraic preconditioners only use information extracted from the input matrix $A$, usually supplemented by some user-provided tuning parameters, like drop tolerances or limits on the amount of fill-in allowed.

Main examples include:

- Preconditioners based on classical (block) splittings $A = M - N$
- Incomplete factorizations: $M = LU \approx A$
- Approximate inverse preconditioners: $G = M^{-1} \approx A^{-1}$
- Algebraic Multi-Grid (AMG)
- Hybrids obtained by combining some of the above

Such preconditioners are good candidates for inclusion in general-purpose software packages. Although they may not be “optimal” for almost any problem, they are widely applicable and have proven to be reasonably robust in countless applications. Also, they are being continually improved.


Discretization of a continuous problem (a system of PDEs, an integral equation, etc.) leads to a sequence of linear systems $A_n x_n = b_n$ where $A_n$ is $n \times n$ and $n \to \infty$ as the discretization is refined (that is, “as $h \to 0$”).

Definition: A preconditioner is optimal if it results in a rate of convergence of the preconditioned iteration that is asymptotically constant as the problem size increases, and if the cost of each preconditioned iteration scales linearly in the size of the problem.

For integral equations, the cost of each iteration may instead scale as $O(n \log n)$ or similar.

For example, in the SPD case if $\kappa_2(M_n^{-1} A_n) \le C$ where $C$ is some constant independent of $n$, then $M_n$ is an optimal preconditioner if the action of $M_n^{-1} A_n$ on a vector can be computed in $O(n)$ work.
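To see why a bounded condition number yields an asymptotically constant convergence rate in the SPD case, recall the classical error bound for the (preconditioned) conjugate gradient method. It is stated here as standard background rather than as part of the slides, with $\kappa := \kappa_2(M_n^{-1}A_n)$:

$$\|x - x_k\|_{A_n} \;\le\; 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \|x - x_0\|_{A_n}$$

If $\kappa \le C$ for all $n$, the reduction factor is at most $(\sqrt{C} - 1)/(\sqrt{C} + 1) < 1$ regardless of $n$, so the number of iterations needed to reach a fixed relative accuracy is bounded independently of the problem size. With $O(n)$ work per iteration, the total cost is then $O(n)$.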


In contrast, problem-specific preconditioners, which are designed to solve a narrow class of problems, are often optimal. These methods make extensive use of the developer’s knowledge of the application at hand, including information about the physics, the geometry, and the particular discretization technique used.

These preconditioners are usually not suitable for other types of problems, so their range of applicability is limited.

Many PDE-based (or physics-based) preconditioners belong to this class. An example is Diffusion Synthetic Acceleration (DSA) in radiation transport.

Other examples of problem-specific preconditioners, especially for incompressible flow problems, will be discussed later in these lectures.


The two approaches, algebraic and problem-specific, are not necessarily mutually exclusive (similar to ‘direct vs. iterative methods’).

Most problem-specific preconditioners use algebraic ones as building blocks, e.g., to solve or to approximate subproblems arising within the overall preconditioning strategy.

Some algebraic preconditioners are flexible enough that they can be tailored to specific applications.

Conversely, there has been a trend in recent years to build algebraic preconditioners that mimic the properties of specialized preconditioners; for instance, algebraic multilevel methods.


Implicit vs. explicit preconditioners

An implicit, or direct, preconditioner is an approximation of the input matrix: $M \approx A$.

An explicit, or inverse, preconditioner is an approximation of the inverse of the input matrix: $G = M^{-1} \approx A^{-1}$. This is motivated by the observation that even though $A^{-1}$ is in general a dense matrix, many of its entries are often negligibly small.

Examples of implicit preconditioners include classical splittings, incomplete factorizations, block and multilevel variants.

Examples of explicit preconditioners include polynomial preconditioners, sparse approximate inverses, and data-sparse approximate inverses.

Both factored and non-factored forms are in use.


Application of an implicit preconditioner within a Krylov method (like CG or GMRES) requires solving one or more linear systems, often with triangular or block triangular matrices.

In contrast, application of an explicit preconditioner requires one or more matrix-vector products.

Explicit preconditioners are easier to parallelize. Generally speaking, however, the construction of an explicit preconditioner tends to be more costly than an implicit one. This is to be expected, since $A$ (or its action) is known but $A^{-1}$ is not.

Also, convergence rates are usually better with implicit preconditioners than with explicit ones. But there are exceptions!
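The difference between the two application modes can be sketched as follows. This is a minimal illustration with hypothetical 2 × 2 factors and helper names of our own choosing; the explicit preconditioner is taken to be the exact inverse only so that the two routes can be compared.

```python
import numpy as np

def forward_sub(L, r):
    """Solve L y = r for lower triangular L (row i depends on rows 0..i-1)."""
    n = len(r)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (r[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_sub(U, y):
    """Solve U z = y for upper triangular U (row i depends on rows i+1..n-1)."""
    n = len(y)
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - U[i, i + 1:] @ z[i + 1:]) / U[i, i]
    return z

def apply_implicit(L, U, r):
    """z = M^{-1} r with M = L U: two triangular solves."""
    return backward_sub(U, forward_sub(L, r))

def apply_explicit(G, r):
    """z = G r with G approximating A^{-1}: one matrix-vector product."""
    return G @ r

# Hypothetical factors and residual vector.
L = np.array([[1.0, 0.0], [0.5, 1.0]])
U = np.array([[4.0, 2.0], [0.0, 3.0]])
r = np.array([2.0, 7.0])
G = np.linalg.inv(L @ U)  # exact inverse, used here only for comparison

z_implicit = apply_implicit(L, U, r)
z_explicit = apply_explicit(G, r)
```

The loops in the triangular solves carry a dependency from one row to the next, which is what makes implicit preconditioners harder to parallelize than a matrix-vector product.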


Incomplete Factorization (IF) methods

When a sparse matrix is factored by Gaussian elimination, fill-in usually takes place. This means that the triangular factors $L$ and $U$ of the coefficient matrix $A$ are considerably less sparse than $A$.

Even though sparsity-preserving reordering techniques can be used to reduce fill-in, sparse direct methods are not considered viable for solving very large linear systems such as those arising from the discretization of three-dimensional boundary value problems, due to time and space constraints.

However, by discarding part of the fill-in in the course of the factorization process, simple but powerful preconditioners can be obtained in the form $M = LU$ where $L$ and $U$ are the incomplete (approximate) LU factors.


Incomplete factorization algorithms differ in the rules that govern the dropping of fill-in in the incomplete factors. Fill-in can be discarded based on several different criteria, such as position, value, or a combination of the two.

Letting $\mathbf{n} = \{1, 2, \ldots, n\}$, one can fix a subset $S \subseteq \mathbf{n} \times \mathbf{n}$ of positions in the matrix, usually including the main diagonal and all $(i, j)$ such that $a_{ij} \neq 0$, and allow fill-in in the LU factors only in positions which are in $S$.

Formally, an incomplete factorization step can be described as

$$a_{ij} \leftarrow \begin{cases} a_{ij} - a_{ik} a_{kk}^{-1} a_{kj} & \text{if } (i, j) \in S \\ a_{ij} & \text{otherwise} \end{cases}$$

for each $k$ and for $i, j > k$.
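The position-based update rule can be sketched in a few lines of NumPy. This is a dense toy version under stated assumptions: $S$ is passed as a set of 0-based index pairs that contains the diagonal and the nonzero pattern of $A$, the function name `ilu_pattern` is hypothetical, and dense storage is used purely for brevity (a real implementation would work on sparse rows).

```python
import numpy as np

def ilu_pattern(A, S):
    """Incomplete LU restricted to the position set S (set of (i, j) pairs).

    Assumes S contains the diagonal and every nonzero position of A.
    Returns unit lower triangular L and upper triangular U.
    """
    n = A.shape[0]
    W = A.astype(float).copy()
    for k in range(n - 1):
        if W[k, k] == 0.0:
            raise ZeroDivisionError(f"pivot breakdown at step {k}")
        for i in range(k + 1, n):
            if W[i, k] == 0.0:
                continue                      # nothing to eliminate in this row
            W[i, k] /= W[k, k]                # multiplier a_ik / a_kk
            for j in range(k + 1, n):
                if (i, j) in S:               # update only positions in S
                    W[i, j] -= W[i, k] * W[k, j]
    L = np.tril(W, -1) + np.eye(n)
    U = np.triu(W)
    return L, U

# Hypothetical 3 x 3 "arrow" matrix: exact LU creates fill at (1, 2) and (2, 1).
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 0.0],
              [1.0, 0.0, 4.0]])
S = {(i, j) for i in range(3) for j in range(3) if A[i, j] != 0.0}
L, U = ilu_pattern(A, S)
product = L @ U
```

With $S$ equal to the nonzero pattern of $A$, the product $LU$ agrees with $A$ on $S$ and differs only in the discarded fill positions; with $S$ equal to all positions, the routine reduces to ordinary LU without pivoting.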

Incomplete Factorization (IF) methods

Very simple patterns for cheap / cache-efficient preconditioners?

Example: banded pattern. Matrix BCSSTK38, n = 8032, nnz = 181,746; SPD (small structural analysis problem from Boeing).

bandwidth (full)    PCG its
        1              426
        3              821
        5              648
        9             1638
       15              792
     1011              105
     1311               56
     1511               nc
     3111               35
     4111               18


Notice that the incomplete factorization may fail due to division by zero or near-zero (this is usually referred to as a pivot breakdown), even if $A$ admits an LU factorization without pivoting.

Partial pivoting can help, but it is costly and does not always suffice in the incomplete case.

If $S$ coincides with the set of positions which are nonzero in $A$, we obtain the no-fill ILU factorization, or ILU(0). For SPD matrices the same concept applies to the Cholesky factorization $A = LL^T$, resulting in the no-fill IC factorization, or IC(0).

When used with the conjugate gradient algorithm, this preconditioner leads to the ICCG method (Meijerink & van der Vorst, 1977).


The no-fill ILU and IC preconditioners are very simple to implement, their computation is inexpensive, and they are reasonably effective for significant problems, such as low-order discretizations of scalar elliptic PDEs leading to $M$-matrices or to diagonally dominant ones. No pivot breakdown can occur in these cases (Meijerink & van der Vorst, 1977; Manteuffel, 1980).

However, for more difficult and realistic problems the no-fill factorizations result in too crude an approximation of $A$, and more sophisticated preconditioners, which allow some fill-in in the incomplete factors, are needed. For instance, this is the case for highly nonsymmetric and indefinite matrices such as those arising in many CFD applications.


A hierarchy of ILU preconditioners may be obtained based on the “levels of fill-in” concept. A level of fill is attributed to each matrix entry that occurs in the incomplete factorization process. Fill-ins are dropped based on the value of the level of fill. The formal definition is as follows.

The initial level of fill of a matrix entry $a_{ij}$ is defined to be

$$\mathrm{lev}_{ij} = \begin{cases} 0, & \text{if } a_{ij} \neq 0 \text{ or } i = j \\ \infty, & \text{otherwise.} \end{cases}$$

Each time this element is modified by the ILU process, its level of fill must be updated according to

$$\mathrm{lev}_{ij} = \min\{\mathrm{lev}_{ij},\ \mathrm{lev}_{ik} + \mathrm{lev}_{kj} + 1\}.$$

Let $\ell$ be a nonnegative integer. With ILU($\ell$), all fill-ins whose level is greater than $\ell$ are dropped. Note that for $\ell = 0$, we recover the no-fill ILU(0) preconditioner.
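Because the level of fill depends only on the nonzero pattern, the structure of the ILU($\ell$) factors can be computed symbolically, before any numerical work. The following pure-Python sketch illustrates the update rule; the function name `ilu_levels`, the 0-based indexing, and the 4 × 4 arrow pattern are assumptions made for this example, and the naive triple loop is not how an efficient implementation would proceed.

```python
import math

def ilu_levels(pattern, n, max_level):
    """Symbolic ILU(l) for an n x n matrix.

    `pattern` is the set of (i, j) positions with a_ij != 0 (0-based).
    Returns a dict mapping every retained position to its level of fill.
    """
    # Initial levels: 0 on the diagonal and on the pattern of A.
    # Positions with level infinity are simply absent from the dict.
    lev = {(i, i): 0 for i in range(n)}
    lev.update({pos: 0 for pos in pattern})

    for k in range(n - 1):
        rows = [i for i in range(k + 1, n) if (i, k) in lev]
        cols = [j for j in range(k + 1, n) if (k, j) in lev]
        for i in rows:
            for j in cols:
                candidate = lev[(i, k)] + lev[(k, j)] + 1
                new_level = min(lev.get((i, j), math.inf), candidate)
                if new_level <= max_level:   # otherwise the fill-in is dropped
                    lev[(i, j)] = new_level
    return lev

# Hypothetical 4 x 4 "arrow" pattern: full first row and first column.
arrow = {(0, j) for j in range(4)} | {(i, 0) for i in range(4)}
levels0 = ilu_levels(arrow, 4, max_level=0)   # ILU(0): pattern of A only
levels1 = ilu_levels(arrow, 4, max_level=1)   # ILU(1): adds level-1 fill
```

For the arrow pattern, eliminating the first column fills the whole trailing block with level-1 entries, so ILU(0) keeps 10 positions while ILU(1) keeps all 16.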

Example

Level-based incomplete LU factorizations ILU($\ell$)

- Motivated by decay in factors of diagonally dominant matrices
- Structure of incomplete factors can be predicted using matrix graph

[Figure: sparsity patterns of the ILU($\ell$) factors for $\ell = 0, 1, \ldots, 7$, each shown next to the ILU(0) pattern; both axes range from 0 to 50.]

Number of nonzeros (nz) in each pattern:

factorization    nz
ILU(0)          217
ILU(1)          289
ILU(2)          349
ILU(3)          457
ILU(4)          541
ILU(5)          601
ILU(6)          637
ILU(7)          649

Numerical Example

Fast symbolic construction (Hysom & Pothen, SISC 2001)

But, typically expensive to apply even for a modest number of levels

Example: Matrix ENGINE, n = 143,571, nnz = 2,424,822; SPD.

levels    size prec     PCG its.
  0        2,424,822      523
  1        4,458,588      300
  2        7,595,466      199
  3       12,128,289      115
  4       18,078,603       87
  5       25,474,380       54
  6       34,153,746       45
  7       43,861,328       46
  8       54,276,063       36

Preprocessing incomplete factorizations

Preprocessing originally designed for direct solvers is often very useful to improve the robustness of ILU preconditioners:

- Symmetric reorderings (RCM, MD, ND, etc.)
- “Static pivoting”: nonsymmetric permutations and scalings aimed at increasing diagonal dominance (Duff & Koster, SIMAX 1999, 2001; B., Haws & Tůma, SISC 2000; Saad, SISC 2005; Mayer, SISC 2008)
- Extension to symmetric indefinite problems (Duff & Pralet, SIMAX 2005; Hagemann & Schenk, SISC 2006)
- Block variants (many authors)

But, for very tough problems this is still not enough to guarantee convergence of the preconditioned iteration.

Example (cont.)

Preprocessing: matrix is reordered with Multiple Minimum Degree (MMD), a fill-reducing ordering.

Matrix ENGINE, n = 143,571, nnz = 2,424,822. The first size/its pair repeats the results for the original ordering; the second pair is for the MMD ordering.

          original ordering        MMD ordering
levels    size           its       size           its
  0        2,424,822     523        2,424,822     439
  1        4,458,588     300        4,394,040     214
  2        7,595,466     199        6,509,826     159
  3       12,128,289     115        8,859,522      96
  4       18,078,603      87       11,292,927      66
  5       25,474,380      54       13,664,157      49
  6       34,153,746      45       15,891,321      34
  7       43,861,328      46           –           nc
  8       54,276,063      36       19,590,303      18

Some improvement observed, but not entirely robust.

The use of drop tolerances

In many cases, an efficient preconditioner can be obtained from an incomplete factorization where new fill-ins are accepted or discarded on the basis of their size. In this way, only fill-ins that contribute significantly to the quality of the preconditioner are stored and used.

A drop tolerance is a positive number $\tau$ which is used in a dropping criterion. An absolute dropping strategy can be used, whereby new fill-ins are accepted only if greater than $\tau$ in absolute value. This criterion may work poorly if the matrix is badly scaled, in which case it is better to use a relative drop tolerance.

For example, when eliminating row $i$, a new fill-in is accepted only if it is greater in absolute value than $\tau \|a_i\|_2$, where $a_i$ denotes the $i$th row of $A$. Other criteria are also in use.


A drawback of this approach is that it is difficult to choose a good value of the drop tolerance: usually, this is done by trial-and-error for a few sample matrices from a given application, until a satisfactory value of $\tau$ is found. In many cases, good results are obtained for values of $\tau$ in the range $10^{-4}$ to $10^{-2}$, but the optimal value is strongly problem-dependent.

Another difficulty is that it is impossible to predict the amount of storage that will be needed to store the incomplete LU factors. An efficient, predictable algorithm is obtained by limiting the number of nonzeros allowed in each row of the triangular factors. Saad (1994) has proposed the following dual threshold strategy:

Fix a drop tolerance $\tau$ and a number $p$ of fill-ins to be allowed in each row of the incomplete $L$/$U$ factors; at each step of the elimination process, drop all fill-ins that are smaller than $\tau$ times the 2-norm of the current row; of all the remaining ones, keep (at most) the $p$ largest ones in magnitude.
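The dual threshold rule applied to a single row can be sketched as follows. This is a simplified illustration, not Saad's full algorithm: the hypothetical helper `dual_threshold_drop` treats the entries it receives as the whole current row, whereas an actual ILUT code applies the rule separately to the $L$ and $U$ parts of each row and always keeps the diagonal.

```python
import math

def dual_threshold_drop(row, tau, p):
    """Apply the (tau, p) dropping rule to one sparse row.

    `row` maps column index -> value. Entries smaller in magnitude than
    tau * ||row||_2 are dropped; of the survivors, at most the p largest
    in magnitude are kept.
    """
    norm = math.sqrt(sum(v * v for v in row.values()))
    # First threshold: relative drop tolerance.
    kept = {j: v for j, v in row.items() if abs(v) >= tau * norm}
    # Second threshold: cap on the number of retained entries.
    largest = sorted(kept, key=lambda j: abs(kept[j]), reverse=True)[:p]
    return {j: kept[j] for j in sorted(largest)}

# Hypothetical row with five entries.
row = {0: 10.0, 1: 0.05, 2: -3.0, 3: 0.5, 4: 2.0}
result = dual_threshold_drop(row, tau=0.01, p=3)
```

Here the row norm is about 10.64, so the entry 0.05 falls below $\tau \|\cdot\|_2 \approx 0.106$ and is dropped by the tolerance test; the cap $p = 3$ then removes the entry 0.5, the smallest of the remaining four.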


A variant of this approach allows in each row of the incomplete factors $p$ nonzeros in addition to the positions that were already nonzeros in the original matrix $A$. This makes sense for irregular problems in which the nonzeros in $A$ are not distributed uniformly.

The resulting preconditioner, denoted by ILUT($\tau$, $p$), is quite powerful. If it fails on a problem for a given choice of the parameters $\tau$ and $p$, it will often succeed by taking a smaller value of $\tau$ and/or a larger value of $p$. The corresponding incomplete Cholesky preconditioner for SPD matrices, denoted ICT, can also be defined.

ILUT($\tau$, $p$) and the variant with partial pivoting ILUTP($\tau$, $p$) are quite effective and widely used in many industrial applications. However, failures can still occur.

Example

IC(0)/ICT may fail while simple diagonal scaling works!

Matrix LDOOR (structural analysis of car door), n = 952,203, nnz = 23,737,339.

precond    precond. size    PCG its
Jacobi          952,203        810
IC(0)        23,737,339     > 1000
ICT          23,838,704     > 1000
ICT          24,614,381     > 1000
ICT          26,167,321     > 1000
ICT          30,047,027     > 1000
ICT          37,809,756     > 1000

Stability considerations

ILU preconditioners attempt to make the residual matrix

$$R := A - M$$

small in some norm. However, this does not always result in good preconditioners.

As observed by several authors (Elman, Saad, ...), a more meaningful approximation measure is based on the size of the error matrix

$$E := I - AM^{-1}$$

Approximate inverse preconditioners attempt to make $\|E\|$ small, but this may require a huge number of nonzeros in the preconditioner (unless the entries of $A^{-1}$ exhibit fast off-diagonal decay).

Note that $E = (M - A)M^{-1} = -RM^{-1}$, so $\|E\| = \|RM^{-1}\| \le \|R\|\,\|M^{-1}\|$. Hence, if $M$ is very ill-conditioned ($\|M^{-1}\|$ is very large), then a very large error matrix may occur even if $\|A - M\|$ is small. This often results in failure to converge.
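A tiny numerical illustration of this effect, using hypothetical 2 × 2 diagonal matrices chosen so that $M$ is close to $A$ but nearly singular:

```python
import numpy as np

# A and M differ by less than 1e-3, but M has a tiny pivot (illustrative data).
A = np.diag([1.0, 1e-3])
M = np.diag([1.0, 1e-6])

R = A - M                          # residual matrix
M_inv = np.linalg.inv(M)
E = np.eye(2) - A @ M_inv          # error matrix

norm_R = np.linalg.norm(R, "fro")
norm_E = np.linalg.norm(E, "fro")
# Upper bound ||R||_F * ||M^{-1}||_2 on ||E||_F.
bound = norm_R * np.linalg.norm(M_inv, 2)
```

In this example $\|R\|_F$ is just under $10^{-3}$ while $\|E\|_F$ is about 999, because $\|M^{-1}\|_2 = 10^6$: a small residual matrix does not by itself indicate a good preconditioner.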


Example (B., Szyld & van Duin, SISC 1999):

System $Ax = b$ is a discretization of a convection-dominated convection-diffusion equation. Solver: Bi-CGSTAB. Orderings: lexicographic and MMD.

Let $N_1 := \|A - LU\|_F$ and $N_2 := \|I - A(LU)^{-1}\|_F$.

ILU(0)          Lexicogr.               MMD
$N_1$           $4.06 \cdot 10^{-1}$    $4.53 \cdot 10^{0}$
$N_2$           $3.26 \cdot 10^{6}$     $2.00 \cdot 10^{2}$
Its             nc                      59

ILUT(0.01,5)    Lexicogr.               MMD
$N_1$           $1.78 \cdot 10^{-1}$    $7.39 \cdot 10^{1}$
$N_2$           $2.79 \cdot 10^{1}$     $5.81 \cdot 10^{6}$
Its             11                      nc

Permuting large entries of $A$ to the main diagonal

[Figure: sparsity patterns of the original and of the permuted matrix; both axes range from 0 to 3500, nz = 25407 in both panels.]

Jacobian from Navier-Stokes equations (original and permuted with MC64 + RCM). After preprocessing, ILUT with Bi-CGSTAB converges in 24 iterations. No convergence on original system.


Sparse approximate inverses

Idea: directly approximate the inverse with a sparse matrix $G \approx A^{-1}$, then preconditioner application only needs mat-vecs with $G$.

Mostly motivated by parallel processing; also, less prone to instabilities than ILU, and easy to update when solving a sequence of linear systems.

Also useful for constructing robust smoothers for multigrid, and for other purposes like approximating Schur complements.

By now, a large body of literature exists (100’s of papers since the 1990s).

Successfully used in numerous applications, including

- solution of dense linear systems from BEM in electromagnetics, acoustics, and elastodynamics problems
- solution of sparse linear systems from photon and neutron transport, CFD, Markov chains, eigenproblems, etc.
- quantum chemistry applications
- image processing (restoration, deblurring, inpainting)


Main approaches: sparse approximate inverses (SAIs) can be factored or unfactored.

Factored forms are of the type $G = ZW$ where, for instance, $Z \approx U^{-1}$ and $W \approx L^{-1}$.

Factored forms are especially useful if $A$ is SPD. In this case $W = Z^T$ and the approximate inverse $G = ZZ^T$ is guaranteed to be SPD. This allows for the use of the conjugate gradient (CG) method.

Another advantage is that factored forms contain more information for the same number of nonzeros than unfactored ones. However, application of the preconditioner requires two mat-vecs (with $Z$ and $W$) rather than just one (with $G$).

Sparse approximate inverses can be computed by different methods.

Sparse approximate inverses
Frobenius norm minimization: SPAI

This class of approximate inverse techniques was the first to be proposed and investigated, back in the early 1970s (Benson et al.).

The basic idea is to compute a sparse matrix $G \approx A^{-1}$ as the solution of the following constrained minimization problem:

$$\min_{G \in S} \|I - AG\|_F$$

where $S$ is a set of matrices with a given sparsity pattern.

Since

$$\|I - AG\|_F^2 = \sum_{j=1}^{n} \|e_j - Ag_j\|_2^2,$$

where $e_j$ denotes the $j$th column of the identity matrix and $g_j$ the $j$th column of $G$, the computation of $G$ reduces to solving $n$ independent linear least squares problems, subject to sparsity constraints.

These (small) LS problems can be solved efficiently by QR factorization.
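The column-by-column construction can be sketched as follows for a fixed (static) sparsity pattern. The function name `spai_static` and the tridiagonal test matrix are assumptions made for this example, and `numpy.linalg.lstsq` stands in for the QR-based solver mentioned above. The reduction to a small problem uses the fact that only the rows of $A$ with a nonzero in the selected columns influence the minimizer.

```python
import numpy as np

def spai_static(A, pattern):
    """Frobenius-norm minimizing approximate inverse with a fixed pattern.

    pattern[j] lists the row indices allowed to be nonzero in column j of G.
    Each column is an independent small least squares problem.
    """
    n = A.shape[0]
    G = np.zeros((n, n))
    for j in range(n):
        J = pattern[j]
        # Rows of A that have a nonzero in at least one of the columns J.
        I = np.flatnonzero(np.abs(A[:, J]).sum(axis=1))
        e = (I == j).astype(float)            # e_j restricted to the rows I
        g, *_ = np.linalg.lstsq(A[np.ix_(I, J)], e, rcond=None)
        G[J, j] = g
    return G

# Hypothetical test: 4 x 4 tridiagonal matrix, pattern of G = pattern of A.
n = 4
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
pattern = [list(np.flatnonzero(A[:, j])) for j in range(n)]
G = spai_static(A, pattern)
```

Since the $n$ column problems are independent, they can be distributed over processors with no communication beyond access to the needed rows and columns of $A$.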

Sparse approximate inverses
Frobenius norm minimization: SPAI (cont.)

The main issue is the choice of the sparsity pattern $S$ for $G$.

Two options: fixed (static), or adaptive (dynamic).

Static sparsity patterns are usually based on the pattern of $A$ or of some power $A^k$ of $A$, with $k$ small. This is motivated by the Neumann series expansion of $A^{-1}$.

Small entries $a_{ij}$ (with $i \neq j$) are usually removed from $A$ prior to determining the pattern of $A^2$, $A^3$, ...

Heuristics for dynamically determining the sparsity pattern have been proposed by Cosgrove, Diaz & Griewank (IJCM 1992) and by Grote & Huckle (SISC 1997). Several user-defined parameters are needed as input.
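A static pattern based on a power of the sparsified matrix can be computed structurally, without forming $A^k$ numerically. The sketch below is a dense toy version; the function name `power_pattern`, the thresholding convention (keep $|a_{ij}| >$ threshold, always keep the diagonal), and the test matrix are assumptions made for this example.

```python
import numpy as np

def power_pattern(A, k, threshold=0.0):
    """Boolean sparsity pattern of the k-th power of the sparsified A.

    Off-diagonal entries with |a_ij| <= threshold are removed first.
    The pattern is structural: numerical cancellation is ignored.
    """
    n = A.shape[0]
    B = (np.abs(A) > threshold) | np.eye(n, dtype=bool)
    P = B.copy()
    for _ in range(k - 1):
        # Position (i, j) is in the product pattern if some m links i -> m -> j.
        P = (P.astype(int) @ B.astype(int)) > 0
    return P

# Hypothetical 5 x 5 tridiagonal matrix with one weak coupling between 0 and 1.
A = 4.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
A[0, 1] = A[1, 0] = 1e-3
P1 = power_pattern(A, 1)                    # pattern of A itself
P2 = power_pattern(A, 2)                    # pentadiagonal pattern of A^2
P2_thr = power_pattern(A, 2, threshold=1e-2)  # weak coupling removed first
```

Removing the weak coupling before squaring keeps position (0, 2) out of the pattern, which is the purpose of the thresholding step described above.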


There is a trade-off: dynamic sparsity patterns give better preconditioners, but they are expensive and harder to parallelize.

Available implementations include:

- ParaSails (Chow, 2000), based on fixed sparsity patterns. Available in the hypre software package, see https://computation.llnl.gov/casc/hypre/software.html
- SPAI (Grote & Huckle, 1997), based on dynamic sparsity patterns; see http://www.computational.unibas.ch/software/spai
- MSPAI (Modified SPAI: Huckle et al., 2008), see http://www5.in.tum.de/wiki/index.php/MSPAI

NOTE: All these implementations are MPI-based.

Sparse approximate inverses
Factorized forms: FSAI, AINV

Factorized sparse approximate inverses approximate the inverse Cholesky or LU factors directly from $A$.

Two main approaches: Frobenius norm minimization (FSAI) and bi-conjugation (AINV).

FSAI (Kolotilina & Yeremin, SIMAX 1994): compute $Z$ by minimizing $\|I - L^T Z\|_F$ over all triangular matrices $Z$ with a given sparsity pattern. Remarkably, this can be done without knowing the Cholesky factor $L$. Inherently parallel construction.

AINV (B., Meyer & Tuma, SISC 1996; B. & Tuma, SISC 1998): compute $Z$ and $W$ by $A$-biconjugation of the standard basis vectors $e_1, e_2, \ldots, e_n$.

If $A$ is SPD, this is just Gram-Schmidt orthogonalization with respect to the inner product $\langle x, y \rangle_A := x^T A y$. Sparsity is preserved by dropping small entries in $Z$, $W$.
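For the SPD case, the $A$-orthogonalization with dropping can be sketched as below. This is a dense toy version under stated assumptions: the function name `ainv_spd` and the absolute dropping rule are ours, and each pivot is computed as $z_j^T A z_j$, which is positive for SPD $A$ as long as $z_j \neq 0$ (guaranteed here because the unit diagonal of $Z$ is never dropped).

```python
import numpy as np

def ainv_spd(A, drop_tol=0.0):
    """A-orthogonalize e_1, ..., e_n with dropping of small entries.

    Returns unit upper triangular Z and pivots d such that
    Z diag(1/d) Z^T approximates A^{-1} (exactly if drop_tol == 0).
    """
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        Az = A @ Z[:, j]
        d[j] = Z[:, j] @ Az                   # pivot <z_j, z_j>_A
        for i in range(j + 1, n):
            # Remove the A-component of z_i along z_j.
            Z[:, i] -= (Z[:, i] @ Az) / d[j] * Z[:, j]
            # Drop small entries, but never the unit diagonal.
            small = np.abs(Z[:, i]) < drop_tol
            small[i] = False
            Z[small, i] = 0.0
    return Z, d

# Hypothetical SPD test matrix (5 x 5, tridiagonal, diagonally dominant).
A = 4.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
Z_exact, d_exact = ainv_spd(A)
G_exact = Z_exact @ np.diag(1.0 / d_exact) @ Z_exact.T
Z_sparse, d_sparse = ainv_spd(A, drop_tol=0.05)
```

Without dropping the process reproduces $A^{-1}$; with a drop tolerance, the far-from-diagonal entries of $Z$, which decay quickly for this matrix, are discarded and $Z$ stays sparse.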

Sparse approximate inverses
Factorized forms: FSAI, AINV (cont.)

If $A$ is not SPD, the bilinear form $\langle x, y \rangle_A := x^T A y$ does not define an inner product and breakdowns can occur, due to division by zero (since $x^T A x = 0$ can happen even if $x \neq 0$). Moreover, due to dropping of small entries the incomplete process could break down even if the complete one does not.

However, we have proved that the incomplete process does not break down if $A$ is an $M$-matrix or a diagonally dominant matrix (more generally, an $H$-matrix: B., Meyer & Tuma, SISC 1996).

Furthermore, there exists a stabilized variant of AINV (SAINV: B., Cullum & Tuma, SISC 2000) that does not break down if $A$ is positive definite.

In practice, the robustness of (S)AINV is essentially the same as for ILUT preconditioning.

Sparse approximate inverses
Factorized forms: Parallel Block AINV

Construction phase in AINV is sequential, but can be parallelized using graph partitioning and the fact that the inverse factors of

$$A = \begin{pmatrix} A_1 & & & & B_1 \\ & A_2 & & & B_2 \\ & & \ddots & & \vdots \\ & & & A_p & B_p \\ C_1 & C_2 & \cdots & C_p & A_S \end{pmatrix}$$

have the same block structure as the lower and upper block triangular parts of $A$, allowing for considerable parallelism in the construction of the preconditioner (B., Marín & Tuma, 1999).

Sparse approximate inverses
Example

[Figure: sparsity patterns of the coefficient matrix and of the sparse approximate inverse; both axes range from 200 to 1400.]

Coefficient matrix and sparse approximate inverse: FEM for fluid-structure interaction problem; $\mathrm{nnz}(Z + W)/\mathrm{nnz}(A) \approx 1.56$. Preconditioned Bi-CGSTAB converges in 39 iterations.

Sparse approximate inverses
Sample parallel AINV-PCG results

Table: 2D neutron diffusion problem; FEM, n = 804,609.

p            2       4       8       16      32      64
Prec-Time    11.6    5.89    3.24    1.79    1.12    0.94
It-Time      227.1   113.3   56.9    26.7    13.6    11.7
PCG Its      157     157     156     157     156     157

Table: Barotropic equation; FDM, n = 370,000.

p            2       4       8       16      32
Prec-Time    13.4    7.0     3.72    1.91    1.15
It-Time      119.2   59.3    25.9    11.9    6.8
PCG Its      189     189     189     202     187

Note: Computations done 12 years ago on an SGI Origin 2000 (LANL).


Implementations of AINV and the stabilized variant SAINV are available at

http://www2.cs.cas.cz/~tuma/sparslab.html

and

http://www.dmsa.unipd.it/~sartoret/Pdacg/pdacg.htm

and also in the CEA library SLOOP (Meurant, 2006).

Sparse approximate inverses share some of the limitations of IF methods:

- factored form may suffer breakdowns, especially if $A$ is highly indefinite
- convergence rate may be unsatisfactory for sparse $G$
- in general, performance is unpredictable and failures may occur
- like IF, lack of scalability for increasing problem size

Several authors have addressed these issues in the last few years.


The lack of scalability for increasing problem size has motivated the development of multilevel methods based on SAIs, including:

- Wavelet-based SPAI (Chan, Wang & Tang, BIT 1997)
- Multilevel SPAI (Bollhöfer & Mehrmann, SIMAX 2002)
- MLAINV (Meurant, Numer. Alg. 2002)
- Multiresolution AINV (Bridson & Tang, SISC 2002)
- SPAI as smoothers for (A)MG (Bröker and Grote, APNUM 2002)
- Spectral preconditioners based on SPAI (Carpentieri, Duff, Giraud et al. 2005)
- Data-sparse approximate inverses (Bebendorf, SIMAX 2006)
- Multilevel SPAI for AMR (Wang and de Sturler, LAA 2009)

Although not always $h$-independent, these preconditioners exhibit much better scalability than one-level SAIs.


IF via approximate inverses

RIF motivation

RIF (Robust incomplete factorization; B. & Tuma, NLAA 2003)

Based on the factored approximate inverse SAINV

Consider the triangular decomposition $A^{-1} \approx L^{-T} D^{-1} L^{-1}$

Notation: $L = (l_{ij})$, $Z^T \equiv L^{-1}$, $L^{-1} = (\ell_{ij}) \equiv (\ell_j)$, where $\ell_j$ denotes the $j$-th column of $Z = L^{-T}$

Compare with the (exact) $LDL^T$ decomposition of $A$: the factor $L$ of $A = LDL^T$ is $L = A L^{-T} D^{-1}$

It can be easily retrieved from this inverse factorization

⇓

$$AZ = A L^{-T} = LD, \quad \text{lower triangular}$$

⇓

$$l_{kj} = \frac{\langle e_k, A\ell_j \rangle}{d_j} \quad \text{for } k \ge j$$

Using $Z^T = L^{-1}$ we can get $L$ (from $L^{-1}$ get $L$) at no extra cost
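The retrieval formula can be checked numerically on a small dense example. The following is a minimal sketch, assuming exact (no dropping) factors and a randomly generated SPD test matrix; all variable names are hypothetical, and the exact $LDL^T$ factors are obtained here from NumPy's Cholesky factorization purely for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # small SPD test matrix (made-up data)

# Reference LDL^T factors from the Cholesky factor C (A = C C^T).
C = np.linalg.cholesky(A)
c = np.diag(C)
L = C / c                          # unit lower triangular: column j scaled by 1/c_j
d = c**2                           # pivots d_j

# Inverse factor Z = L^{-T}; its columns play the role of the vectors l_j.
Z = np.linalg.inv(L).T

# Entry-wise retrieval: l_kj = <e_k, A z_j> / d_j for k >= j.
L_from_Z = np.zeros((n, n))
for j in range(n):
    Az = A @ Z[:, j]
    for k in range(j, n):
        L_from_Z[k, j] = Az[k] / d[j]

print(np.allclose(L_from_Z, L))    # L is recovered from Z and A
print(np.allclose(A @ Z, L * d))   # A Z = L D (column j of L scaled by d_j)
```

In a sparse setting the same inner products are already formed during the A-orthogonalization, which is why the factor L comes essentially for free.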

IF via approximate inverses

RIF implementation

Note:

$$l_{kj} = \frac{\langle e_k, A\ell_j \rangle}{d_j} \equiv \frac{\langle \ell_k, A\ell_j \rangle}{d_j} \quad \text{for } k \ge j,$$

where $\ell_k$ in the second inner product is the $k$-th vector as it stands when $\ell_j$ has just been completed, i.e., before $\ell_k$ is A-orthogonalized against $\ell_j$.

⇓

The latter equivalence provides a breakdown-free implementation, since $d_k = \langle \ell_k, A\ell_k \rangle > 0$ for A SPD (B. & Tuma, NLAA 2003)

Experimentally, often more space-efficient for the same iteration counts

[Figure: schematic of the two factors, with labels L ("done, not used") and inv(L) ("done, used").]

One-way transfer of information
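The way L falls out of the A-orthogonalization as a by-product can be sketched as follows. This is a dense illustration without any dropping, so it reproduces the exact factors; a sparse preconditioner would additionally discard small entries, which is omitted here. The function name `a_orthogonalize` and the test matrix are assumptions made for the example.

```python
import numpy as np


def a_orthogonalize(A):
    """Right-looking A-orthogonalization of the unit vectors (dense, no dropping).

    Returns Z (unit upper triangular), d (pivots) and L (unit lower triangular)
    such that Z^T A Z = diag(d) and A = L diag(d) L^T for SPD A.
    """
    n = A.shape[0]
    Z = np.eye(n)
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        w = A @ Z[:, j]                # A z_j
        d[j] = Z[:, j] @ w             # d_j = <z_j, A z_j>, positive for SPD A
        for k in range(j + 1, n):
            # At this point z_k has only been orthogonalized against z_1..z_{j-1},
            # so <z_k, A z_j> equals <e_k, A z_j> in exact arithmetic.
            m = (Z[:, k] @ w) / d[j]
            L[k, j] = m                # the multiplier is the entry l_kj
            Z[:, k] -= m * Z[:, j]     # information flows from Z to L only
    return Z, d, L


rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # small SPD test matrix (made-up data)

Z, d, L = a_orthogonalize(A)
print(np.allclose(L @ np.diag(d) @ L.T, A))   # A = L D L^T
print(np.allclose(Z.T @ A @ Z, np.diag(d)))   # Z^T A Z = D
```

Note that L is written but never read inside the loop, which matches the one-way transfer of information described above.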


IF with approximate inverses

$(I - A^{-1})^{-1}$ biconjugation

Consider

$$A = I + \sum_{k=1}^{n} e_k (a_k - e_k)^T,$$

where $a_k^T$ is the $k$-th row of $A$.

Apply n Sherman-Morrison updates to get $A^{-1}$ (Bru, Cerdán, Marín, Mas, SISC 2003)

The process for $R = (r_k)$, $V = (v_k)$, $D = \operatorname{diag}(d_1, \dots, d_n)$, for $k = 1, 2, \dots, n$:

$$r_k = e_k - \sum_{i=1}^{k-1} \frac{v_i^T e_k}{d_i}\, r_i, \qquad v_k = (a_k - e_k) - \sum_{i=1}^{k-1} \frac{(a_k - e_k)^T r_i}{d_i}\, v_i,$$

$$d_k = 1 + (a_k - e_k)^T r_k = 1 + v_k^T e_k$$

$$I - A^{-1} = R D^{-1} V^T, \quad R \text{ unit upper triangular}$$
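The recurrences above can be tried out directly on a small dense matrix. The sketch below assumes the unscaled process (starting from the identity, with $d_i$ as the pivots) and uses a strictly diagonally dominant test matrix so that no pivot vanishes; the function name `ism_factorization` and the test data are hypothetical.

```python
import numpy as np


def ism_factorization(A):
    """Dense sketch of the n Sherman-Morrison updates (no dropping, no pivoting)."""
    n = A.shape[0]
    I = np.eye(n)
    R = np.zeros((n, n))               # columns r_k
    V = np.zeros((n, n))               # columns v_k
    d = np.zeros(n)                    # pivots d_k
    for k in range(n):
        y = A[k, :] - I[k, :]          # a_k - e_k, with a_k^T the k-th row of A
        r = I[:, k].copy()
        v = y.copy()
        for i in range(k):
            r -= (V[k, i] / d[i]) * R[:, i]          # v_i^T e_k is entry k of v_i
            v -= ((y @ R[:, i]) / d[i]) * V[:, i]
        d[k] = 1.0 + y @ r
        R[:, k] = r
        V[:, k] = v
    return R, V, d


rng = np.random.default_rng(2)
n = 6
# Strictly diagonally dominant, nonsymmetric test matrix (made-up data).
A = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)

R, V, d = ism_factorization(A)
A_inv = np.eye(n) - (R / d) @ V.T      # I - R D^{-1} V^T
print(np.allclose(A_inv, np.linalg.inv(A)))
print(np.allclose(R, np.triu(R)), np.allclose(np.diag(R), 1.0))
```

Each pass of the outer loop corresponds to one rank-one update: row k of the identity is replaced by row k of A.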

IF with approximate inverses

balancing $L$ and $L^{-1}$

Theorem

(Bru, Marín, Mas & Tuma, SISC 2008) For A SPD, let there exist the decomposition

$$A^{-1} = I - R D^{-1} V^T \tag{1}$$

and let $A = L \Delta L^T$ be the $LDL^T$ decomposition of A. Then

$$V = L\Delta - L^{-T}, \qquad R = L^{-T}, \qquad \Delta = D.$$

Pictorially:

$$V = \begin{pmatrix} \ddots & & -L^{-T} \\ & \ddots & \\ LD & & \ddots \end{pmatrix}, \qquad \operatorname{diag}(V) = D - I. \tag{2}$$


That is, we compute $L$ and $L^{-1}$ at the same time, by columns. To get $L$, only $V$ is needed

Can be extended to nonsymmetric matrices (Bru, Marín, Mas & Tuma, SIMAX 2010)

Sparse case used for preconditioning: the factors $L$ and $L^{-1}$ influence (balance) each other during the computation and can be connected via dropping (Bru, Marín, Mas & Tuma, SISC 2008)

Note that this preconditioner is based on a novel matrix factorization
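The structure of V can be verified numerically in the dense, no-dropping case. The sketch below reruns the Sherman-Morrison process on a small SPD matrix and compares the result with reference $LDL^T$ factors obtained from NumPy's Cholesky factorization; the helper name `ism` and the test matrix are assumptions made for this illustration, and no balancing or dropping is performed.

```python
import numpy as np


def ism(A):
    """Sherman-Morrison process starting from the identity (dense, no dropping)."""
    n = A.shape[0]
    I = np.eye(n)
    R, V, d = np.zeros((n, n)), np.zeros((n, n)), np.zeros(n)
    for k in range(n):
        y = A[k, :] - I[k, :]                              # a_k - e_k
        R[:, k] = I[:, k] - R[:, :k] @ (V[k, :k] / d[:k])
        V[:, k] = y - V[:, :k] @ ((y @ R[:, :k]) / d[:k])
        d[k] = 1.0 + y @ R[:, k]
    return R, V, d


rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # small SPD test matrix (made-up data)

R, V, d = ism(A)

# Reference LDL^T factors from the Cholesky factor C (A = C C^T).
C = np.linalg.cholesky(A)
c = np.diag(C)
L, D = C / c, c**2
Linv_T = np.linalg.inv(L).T

print(np.allclose(R, Linv_T), np.allclose(d, D))
print(np.allclose(V, L * D - Linv_T))  # V = L D - L^{-T}

# Both factors can be read off V alone:
L_from_V = np.eye(n) + np.tril(V, -1) / d      # strictly lower part holds L D
Linv_T_from_V = np.eye(n) - np.triu(V, 1)      # strictly upper part holds -L^{-T}
print(np.allclose(L_from_V, L), np.allclose(Linv_T_from_V, Linv_T))
```

Since the lower and upper triangles of V hold the direct and the inverse factor respectively, a single array carries both in this dense setting.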

IF with approximate inverses

BIF experiments

Example: matrix PWTK, n=217,918, nnz=5,926,171

[Figure: time to compute the preconditioner (in seconds) versus size of the preconditioner (in the number of nonzeros, ×10^6), for RIF and BIF.]

IF with approximate inverses

BIF experiments (cont.)

[Figure: total time (in seconds) versus size of the preconditioner (in the number of nonzeros, ×10^6), for RIF and BIF.]

IF with approximate inverses

BIF pros and cons

Generally much faster and smoother preconditioner construction than RIF, for similar or even better preconditioner quality.

Taking approximate inverses into account, dropping must always be aggressive. Prefiltration of entries of A seems to be the standard strategy.

BIF uses the inverse-based dropping rules based on Bollhöfer & Saad, 2002. They need to be further investigated. They often seem to influence entries of the factors nonuniformly. Also, the dropping often forces skipping a lot of updates in the decomposition. Is this really the right way to go?

IF with approximate inverses

Other recent work

Monitoring the growth of entries in the inverse factors can be used to devise new and improved dropping and diagonal pivoting strategies in ILU (Bollhöfer, LAA 2001; Bollhöfer & Saad, SIMAX 2002; Bollhöfer, SISC 2003).

The resulting algorithm keeps the size of the error matrix $I - A(LU)^{-1}$ bounded.

Combined with preprocessing (nonsymmetric permutations/scalings, reorderings), this approach results in preconditioners that are much more robust and effective than standard ILUs.

Inverse-based multilevel ILUs have been developed by Bollhöfer and Saad and implemented in ILUPACK, see

http://www-public.tu-bs.de/~bolle/ilupack/

ILUPACK can handle symmetric indefinite and complex symmetric matrices. Parallel version is still under development.
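One way to see why the growth of the inverse factors matters is the short derivation below. It is added here for illustration and is not taken from the cited papers: the symbol $E$ for the factorization error is introduced only for this purpose, and any submultiplicative matrix norm is assumed.

```latex
\begin{aligned}
E &:= A - LU, \\
A (LU)^{-1} &= (LU + E)\, U^{-1} L^{-1} = I + E\, U^{-1} L^{-1}, \\
I - A (LU)^{-1} &= -\,E\, U^{-1} L^{-1}, \\
\| I - A (LU)^{-1} \| &\le \|E\| \, \|U^{-1}\| \, \|L^{-1}\| .
\end{aligned}
```

A small factorization error $E$ alone therefore does not guarantee a good preconditioner: if $\|L^{-1}\|$ or $\|U^{-1}\|$ is large, the error matrix can still be large, which is why the size of the inverse factors is worth monitoring during the factorization.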

IF with approximate inverses

Other recent work (cont.)

Another multilevel ILU package that incorporates a number of recent improvements is ILU++, developed by Jan Mayer (ACM TOMS, 2009):

http://www.iluplusplus.de/

Finally, we mention recent work by Raghavan & Teranishi (SISC 2010) combining parallel IC factorization with SAI. Here the IC factorization $A \approx LL^T$ is computed in parallel using a nested dissection ordering, and SAI is used to approximately invert the diagonal blocks in L.

The resulting hybrid algorithm, ICT-SSAI, achieves good convergence rates (close to those of IC) and scales very well on parallel architectures (like SAI). Parallel code available at

http://www.cse.psu.edu/~teranish/dscpack-ic.html


Conclusions

Many advances in algebraic preconditioning in the last few years

‘Old’ methods, like ILUs, are continually being improved

New methods are often hybrids, taking the best features of existing methods

Better robustness by borrowing techniques designed for direct solvers

Better scalability by borrowing features of PDE solvers (multilevel schemes)

Also: many excellent software packages available

Many challenges remain. Highly indefinite problems?

References

M. Benzi, Preconditioning techniques for large linear systems: a survey, Journal of Computational Physics, 182 (2002), pp. 418–477.

K. Chen, Matrix Preconditioning Techniques and Applications, Cambridge University Press, 2005.

G. Meurant, Computer Solution of Large Linear Systems, North-Holland, Elsevier, 1999.

Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, Philadelphia, 2003.

P. S. Vassilevski, Multilevel Block Factorization Preconditioners, Springer, 2008.
