A grid-based multilevel incomplete LU factorization preconditioning technique for general sparse matrices
Jun Zhang
Department of Computer Science, University of Kentucky, 773 Anderson Hall, Lexington,
KY 40506-0046, USA
Abstract
We design a grid-based multilevel incomplete LU preconditioner (GILUM) for solving general sparse matrices. This preconditioner combines a high accuracy ILU factorization with an algebraic multilevel recursive reduction. The GILUM preconditioner is a complement to the domain-based multilevel block ILUT preconditioner. A major difference between these two preconditioners is the way that the coarse level nodes are chosen. The approach of GILUM is analogous to that of the algebraic multigrid method. The GILUM approach avoids some controversial issues in algebraic multigrid method, such as how to construct the interlevel transfer operators and how to compute the coarse level operator. Numerical experiments are conducted to compare GILUM with other ILU preconditioners. © 2001 Elsevier Science Inc. All rights reserved.
Keywords: Incomplete LU factorization; Multilevel ILU preconditioner; Algebraic multigrid method; Sparse matrices
1. Introduction
We propose a new preconditioning technique that is based on a multilevel
recursive incomplete LU factorization of general sparse matrices. Unstructured
sparse matrices are often solved by Krylov subspace methods coupled with
Applied Mathematics and Computation 124 (2001) 95–115
www.elsevier.com/locate/amc
E-mail address: [email protected] (J. Zhang).
http://www.cs.uky.edu/~jzhang
0096-3003/01/$ - see front matter © 2001 Elsevier Science Inc. All rights reserved.
PII: S0096-3003(00)00081-3
suitable preconditioners [34]. The development of robust preconditioners has received considerable attention in recent years due to their critical role in preconditioned iterative schemes.
Although originally proposed for structured matrices, the standard incomplete LU factorization (ILU(0)) has been used as a general purpose preconditioner for general sparse matrices for more than two decades [27]. For many realistic problems, however, this rather simple preconditioner is inefficient and may fail completely. More robust preconditioners, many of them based on different extensions of ILU(0), have since been proposed. We refer to [34] for a partial account of the literature along this line.
The multi-elimination ILU preconditioner (ILUM), introduced in [33], is based on exploiting the idea of successive independent set orderings. It has a multilevel structure and offers a good degree of parallelism without sacrificing overall effectiveness. Similar preconditioners developed in [6,36] show near-grid independent convergence for certain types of problems. Block versions of ILUM have recently been designed using small dense blocks (BILUM) or large domains (BILUTM) as pivots instead of scalars [36,37,40]. For some hard to solve problems, BILUM and BILUTM may perform much better than ILUM. Various strategies have been proposed to invert or factor the blocks or domains efficiently. We remark that extracting parallelism from ILU factorizations has been the initial motivation behind the development of these multilevel ILU preconditioners [33,36,37]. In a recent paper [45], BILUM was tested with several popular Krylov subspace accelerators for solving a few nonsymmetric matrices from applications in computational fluid dynamics. The test results show that the quality of the preconditioner determines the convergence rates of preconditioned iterative schemes.
Alternative multilevel approaches have been developed by other researchers. Examples of such approaches include nested recursive two-level factorization and repeated red-black orderings [1], generalized cyclic reduction [24], and parallel point- and domain-oriented multilevel methods [26]. Some recently developed multilevel methods require only the adjacency graph of the coefficient matrices [33,36,37]. Other generalized multigrid techniques are the algebraic multigrid method [8,12,31] and certain types of multigrid methods employing matrix-dependent interlevel transfer operators [19]. Equally interesting are multilevel preconditioning techniques based on hierarchical basis [5], multigraph [3], approximate cyclic reduction [30], Schur complement [43], ILU decomposition [4], and other approaches associated with finite difference or finite element matrices [10,41].
One major difference between multilevel preconditioning technique and algebraic multigrid method is the choice of the coarse level nodes (the rows of the coarse level matrix). The coarse level nodes in multilevel preconditioning technique [36] are the fine level nodes in algebraic multigrid method [31], and vice versa. There are differences in constructing multilevel preconditioning
matrices and algebraic multigrid matrices. However, a series of recently proposed methods [3,7,30,43], which we refer to as algebraic multigrid preconditioning methods, have brought the two classes of methods close to each other.
In this paper we propose a fully algebraic multilevel ILU preconditioning technique. This grid-based multilevel ILU preconditioning technique (GILUM) takes the approach of algebraic multigrid method in choosing coarse level nodes, in contrast to the approach of BILUM. However, GILUM is fully algebraic with respect to general sparse matrices. Such generality does not seem to have been achieved by other algebraic multigrid methods or algebraic multigrid preconditioning methods.
This paper is organized as follows. Section 2 gives an overview and background on algebraic multigrid method and multilevel preconditioning technique. Section 3 illustrates a partial Gaussian elimination process for constructing the coarse level systems. Section 4 introduces a diagonal threshold strategy. Section 5 discusses the grid-based multilevel ILU preconditioner (GILUM). Section 6 contains numerical experiments and Section 7 gives concluding remarks.
2. Multilevel preconditioners and multigrid methods
Multilevel preconditioning technique and algebraic multigrid method take advantage of the fact that different parts of the error spectrum can be treated independently on different levels. In construction, multilevel preconditioners also exploit, explicitly or implicitly, the property that a set of unknowns that are not coupled to each other can be eliminated simultaneously in a Gaussian elimination type process. Such a set is usually called an independent set. The concept of independence can easily be generalized to a block version. Thus a block independent set is a set of groups (blocks) of unknowns such that there is no coupling between unknowns of any two different groups (blocks) [36]. Unknowns within the same group (block) may be coupled.
Various heuristic strategies may be used to find an independent set with different properties [33,36]. A maximal independent set is an independent set that cannot be augmented by other nodes and still remain independent. Independent sets are often constructed with some constraints, such as to guarantee certain diagonal dominance for the nodes of the independent set or of the vertex cover, which is defined as the complement of the independent set. Thus, in practice, the maximality of an independent set is rarely guaranteed, especially when some dropping strategies or diagonal threshold strategies are applied [40].
Algebraic and black box multigrid methods attempt to mimic geometric multigrid method by choosing the coarse level nodes as those in the independent set [8,31]. These methods usually define a prolongation operator $I^{a}_{a+1}$
based on some heuristic arguments, where $0 \le a$
inverse technique is used to compute $D_a^{-1}$ by inverting each small block independently (in parallel). In [40], a regularized inverse technique based on singular value decomposition is used to invert the (potentially near singular) blocks approximately. The domain-based BILUTM preconditioner utilizes an ILUT factorization procedure similar to the one used in this paper and avoids the sparsity problems associated with inverting large domains [37].
Although multilevel preconditioning technique and algebraic multigrid method originated from different sources, there has been reported success in using multigrid methods as preconditioners for Krylov subspace methods [28,29,42]. Further, a series of papers recently published by several multigrid practitioners advocate algebraic multigrid preconditioning methods for discretized partial differential equations or sparse matrices. These methods include the multigraph algorithm of Bank and Smith [3], the multilevel ILU decomposition of Bank and Wagner [4], the algebraic multigrid method of Braess [7], the approximate cyclic reduction preconditioning method of Reusken [30], and the Schur complement multigrid of Wagner, Kinzelbach and Wittum [43]. These methods began to adopt the concept of (incomplete) matrix factorization and preconditioning in algebraic multigrid type approaches. However, most of these methods are not fully algebraic and do not aim at solving general sparse matrices.
The grid-based multilevel ILU preconditioning technique (GILUM) of this paper is a fully algebraic multilevel method. It targets general sparse matrices. The GILUM preconditioner can be considered as a converging point of algebraic multigrid method and multilevel preconditioning technique. This is because GILUM adopts the coarse level choice of algebraic multigrid method, employs the concept of preconditioning, and takes the approach of ILU factorization to construct the coarse level operator and interlevel transfer operators.
In reality, we can use algebraic multigrid ordering as in the right part of (2) to write another block LU factorization analogous to (3) as
$$\begin{pmatrix} C_a & E_a \\ F_a & D_a \end{pmatrix} = \begin{pmatrix} I_a & 0 \\ F_a C_a^{-1} & I_a \end{pmatrix} \begin{pmatrix} C_a & E_a \\ 0 & A_{a+1} \end{pmatrix}, \qquad (4)$$
where $A_{a+1}$ is the Schur complement with respect to $D_a$. Now $C_a^{-1}$ is not easy to compute exactly and we will use an ILU factorization of $C_a$ instead.
The initial motivation for developing the grid-based multilevel preconditioner was to utilize the nice property of the independent set of algebraic multigrid method for discretized partial differential equations on regular grids. For example, suppose the standard central difference scheme is used to discretize the Poisson equation with Dirichlet boundary conditions on a square domain. The greedy algorithm [34] will find an independent set and a vertex cover of roughly equal size. Both ILUM and GILUM will yield similar preconditioners. However, if a fourth-order 9-point compact scheme is employed, the greedy algorithm will
find an independent set that is only one-third as large as its vertex cover, see Fig. 1. This will cause slow reduction of the system size in ILUM, which chooses the independent set as the fine level system. On the other hand, algebraic multigrid method uses the vertex cover as the fine level system and thus a faster reduction of the system size can be expected. Of course, the difficulty of slow reduction of the system size in ILUM can be alleviated in BILUM and BILUTM, which utilize block independent sets.
For unstructured general sparse matrices with many nonzero elements in each row, the standard greedy algorithm will yield a very small independent set with a very large vertex cover. It is not uncommon that the size of the vertex cover is more than 10 times larger than that of the independent set. Such a partitioning of the nodes is not suitable for either ILUM or algebraic multigrid method. Hence, certain strategies must be employed to restrict (balance) the sizes of both the independent set and the vertex cover (see Section 4).
Furthermore, there are controversies in algebraic multigrid method as to how to define the interlevel transfer operators and how to compute the coarse level operator [11,12]. For general sparse matrices, the concept of conventional relaxation may not be reliable, as there is no guarantee that a given relaxation method will converge or will even have a smoothing effect. These problems of algebraic multigrid method are avoided in multilevel preconditioning techniques such as ILUM and BILUTM, which do not use conventional relaxation methods and do not use heuristic formulas to define interlevel transfer operators.
The previous discussions show that neither multilevel preconditioning technique nor algebraic multigrid method is perfect for all types of problems. It may be beneficial to take the advantages of and to avoid the disadvantages of both approaches. GILUM is designed as a hybrid of multilevel preconditioning technique and algebraic multigrid method.
Fig. 1. Results of greedy algorithm search on 5-point (left) and 9-point (right) stencils. The empty circles are independent set nodes and the solid circles are vertex cover nodes.
3. Partial ILUT factorization
The partial ILUT factorization for a reordered sparse matrix is similar to that described in [37], with the exception that the matrix is under a different ordering. For the purpose of a clear illustration we highlight a few key parts.
ILUT is a high accuracy preconditioner and its implementation is based on the IKJ variant of Gaussian elimination [32,34,37]. ILUT attempts to limit the fill-in elements by applying a dual dropping strategy during the construction. The accuracy of ILUT(s, p) is controlled by two dropping parameters s and p. Elements with small magnitude relative to s are dropped as soon as they are computed. After an incomplete row is computed, a sorting operation is performed such that only the largest p elements in absolute value are kept. After the dual dropping strategy, there are at most p elements kept in each row of the L and U factors [34].
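The dual dropping rule can be sketched as follows; this is a minimal illustration with hypothetical names, where `tau` plays the role of the drop tolerance s and `ref` is a reference magnitude for the row (here an average of its nonzero absolute values):

```python
import numpy as np

def dual_drop(row, tau, p, ref):
    """Apply an ILUT-style dual dropping rule to a dense working row.

    First rule: entries smaller in magnitude than tau * ref are dropped.
    Second rule: only the p entries of largest absolute value are kept.
    """
    w = row.copy()
    w[np.abs(w) < tau * ref] = 0.0                  # first dropping rule
    nz = np.flatnonzero(w)
    if len(nz) > p:
        order = nz[np.argsort(-np.abs(w[nz]))]      # sort by decreasing |w|
        w[order[p:]] = 0.0                          # second dropping rule
    return w

row = np.array([4.0, -0.01, 0.5, 0.002, -2.0, 0.3])
ref = np.mean(np.abs(row[row != 0]))
kept = dual_drop(row, tau=0.05, p=3, ref=ref)       # keeps 4.0, -2.0, 0.5
```

In a real ILUT implementation the row is sparse and the dropping is interleaved with the elimination, but the two rules act exactly as above.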
Assume that the first m equations are associated with the vertex cover as in (4) without the subscript a. If we perform an LU factorization (Gaussian elimination) on the upper part (the first m rows) of the matrix, i.e., on the submatrix $(C \;\; E)$, we have
$$(C \;\; E) = L\,(U \;\; L^{-1}E).$$
We then continue the Gaussian elimination to the lower part, but the elimination is only performed with respect to the submatrix F. In other words, we only eliminate those elements $a_{i,k}$ for which $m < i \le n$, $1 \le k \le m$. Appropriate linear combinations are also performed with respect to the D submatrix, in connection with the eliminations in the F submatrix, as in the usual Gaussian elimination. Note that, when doing these operations on the lower part, the upper part of the matrix is only accessed, but is not modified [37]. The processed rows of the lower part are never accessed again. Note again that the nodes in the lower part are processed independently, since they only need to use the nodes in the upper part to eliminate nonzero elements in the F submatrix, see [37]. This partial (restricted) Gaussian elimination is equivalent to a block LU factorization of the form
$$\begin{pmatrix} C & E \\ F & D \end{pmatrix} = \begin{pmatrix} L & 0 \\ F U^{-1} & I \end{pmatrix} \begin{pmatrix} U & L^{-1}E \\ 0 & A_1 \end{pmatrix} = \bar{L}\,\bar{U}.$$
The $a_{i,k}$'s (of the lower part) for $k \le m$ are the elements in $FU^{-1}$ and the other elements are those in $A_1$.
It has been proved in [37] that the matrix $A_1$ computed by the partial Gaussian elimination is the Schur complement of A with respect to D. Note that the submatrices $FU^{-1}$ and $L^{-1}E$ are formed automatically, and the Schur complement is formed implicitly, during the partial Gaussian elimination with respect to the lower part of A.
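The block factorization above can be checked on a small dense example; this is a sketch with made-up data in which an exact (unpivoted) LU of the C block stands in for the ILUT factors:

```python
import numpy as np

def lu_nopivot(C):
    """Doolittle LU factorization without pivoting (adequate for strongly
    diagonally dominant blocks, which the diagonal threshold strategy
    is designed to produce)."""
    k = C.shape[0]
    L, U = np.eye(k), C.astype(float).copy()
    for j in range(k - 1):
        for i in range(j + 1, k):
            L[i, j] = U[i, j] / U[j, j]
            U[i, :] -= L[i, j] * U[j, :]
    return L, U

rng = np.random.default_rng(0)
n, m = 8, 3                        # n unknowns; first m rows = vertex cover
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant
C, E = A[:m, :m], A[:m, m:]
F, D = A[m:, :m], A[m:, m:]

L, U = lu_nopivot(C)
A1 = D - F @ np.linalg.inv(C) @ E  # Schur complement of A w.r.t. D
Lbar = np.block([[L, np.zeros((m, n - m))],
                 [F @ np.linalg.inv(U), np.eye(n - m)]])
Ubar = np.block([[U, np.linalg.inv(L) @ E],
                 [np.zeros((n - m, m)), A1]])
# partial elimination = block LU: Lbar @ Ubar reproduces A exactly
```

With inexact (dropped) factors, the same identity holds only approximately, which is exactly what makes the result a preconditioner rather than a direct solver.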
Dropping strategies similar to those used in ILUT can be applied to the partial Gaussian elimination, resulting in an ILU factorization with an approximate Schur complement $A_1$. We formally describe the partial ILUT factorization as Algorithm 3.1, where w is a work array of length n.

Algorithm 3.1 (Partial ILUT(s, p) factorization).
1.  For i = 2, ..., n, Do
2.    w := a_{i,*}
3.    For k = 1, ..., min(i−1, m) and when w_k ≠ 0, Do
4.      w_k := w_k / a_{k,k}
5.      Set w_k := 0 if |w_k| < s · nzavg(a_{i,*})
6.      If w_k ≠ 0, then
7.        w := w − w_k · u_{k,*}
8.      End If
9.    End Do
10.   Apply a dropping strategy to the row w
11.   Set l_{i,j} := w_j for j = 1, ..., min(i−1, m) whenever w_j ≠ 0
12.   Set u_{i,j} := w_j for j = min(i, m), ..., n whenever w_j ≠ 0
13.   Set w := 0
14. End Do

In Line 5 the function nzavg(a_{i,*}) returns the average absolute value of the nonzero elements of a given sparse row. We mention that in Algorithm 3.1 the diagonals of the approximate Schur complement ($A_1$) are not dropped regardless of their values. It may be profitable to use different dropping parameter sets (s, p) for the upper and lower parts of the ILU factorization. However, the issue of adjusting parameters within the GILUM construction process is not discussed in this paper.

4. Independent set and diagonal thresholding

It is obvious that Algorithm 3.1 will fail when a zero pivot is encountered in the upper part ILU factorization. Even a small pivot may cause stability problems by producing large elements in the L and U factors [23]. In Gaussian elimination such a problem may be avoided by employing a partial or full pivoting strategy. In multilevel preconditioning technique a diagonal threshold strategy may be used to force the nodes (rows) with small diagonal elements into the coarse level system [38,39].
Thus, all rows in the vertex cover should have large absolute diagonal values. Moreover, there is a concept of strong coupling (connection) among a group of nodes. A node j is said to be strongly connected to a node i if $|a_{i,j}|$ is
large. In multilevel block preconditioning technique (BILUM and BILUTM) the nodes that are strongly connected to each other are solved together (within one block independent set) in order to preserve the physical couplings among them [36]. However, algebraic multigrid method requires that each node on the fine level is strongly connected to some nodes on the coarse level [31]. Since GILUM takes the approach of algebraic multigrid method, we require that a node in the vertex cover is strongly connected to at least one node in the independent set. Hence, if a node j is in the vertex cover (fine level) it must satisfy the conditions that $|a_{j,j}|$ is greater than a certain threshold tolerance and $|a_{i,j}|$ is large for at least one node i in the independent set.
It is important to design an efficient implementation of the multilevel ILU preconditioner with a diagonal threshold strategy. A preconditioner with a diagonal threshold strategy certainly incurs an additional cost over one which does not use the matrix values in constructing the independent sets. With a carefully designed implementation this additional cost can be kept to a minimum. A diagonal threshold strategy may be implemented with respect to a certain norm of the rows of the matrix. This is called a diagonal threshold strategy with a relative tolerance. It is not efficient to compute the norm of a given row of the matrix during an independent set search. Thus, before the search of an independent set begins, we use Algorithm 4.1 to compute a measure of each row of the matrix based on the diagonal value and the sum of the absolute nonzero values of the row.
Algorithm 4.1 (Computing a measure for each row of the matrix).
1. For i = 1, ..., n, Do
2.   r_i := Σ_{j ∈ Nz_i} |a_{i,j}|
3.   If r_i ≠ 0, then
4.     t_i := |a_{i,i}| / r_i
5.   End If
6. End Do
7. T := max_i {t_i}
8. For i = 1, ..., n, Do
9.   t_i := t_i / T
10. End Do

In Line 2 of Algorithm 4.1 the set Nz_i is the set of the indices j for which $a_{i,j} \ne 0$, i.e., the nonzero pattern of the row i. A row with a small diagonal value will have a small $t_i$ measure, and a row with a zero diagonal value will have an exactly zero $t_i$ measure. The real array $\{t_i\}$ of length n is used in the independent set algorithm. Note that $0 \le t_i \le 1$. The diagonal threshold strategy is enforced by forcing a node i into the independent set if $t_i$ falls below a threshold tolerance. Such an implementation uses the matrix values only once, to compute the measure $\{t_i\}$.
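In dense form (a sketch for illustration; the actual implementation works only on the sparse nonzero pattern), Algorithm 4.1 amounts to:

```python
import numpy as np

def row_measures(A):
    """Relative diagonal-dominance measure t_i for each row:
    t_i = |a_ii| / sum_j |a_ij|, then normalized so that max_i t_i = 1.
    A zero diagonal yields t_i = 0 (assumes at least one nonzero diagonal,
    so the normalizing maximum is positive)."""
    A = np.asarray(A, dtype=float)
    r = np.abs(A).sum(axis=1)                  # absolute row sums
    t = np.where(r != 0, np.abs(np.diag(A)) / r, 0.0)
    return t / t.max()

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 0.1, 2.0],
              [0.0, 1.0, 3.0]])
t = row_measures(A)      # row 1 has a weak diagonal, so t[1] is small
```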
The graph of the matrix is used to build an independent set, along with the array $\{t_i\}$. Algorithm 4.2 is an implementation of the greedy algorithm for constructing an independent set with a diagonal threshold strategy and a strong coupling constraint, where S and $S_{vc}$ denote the independent set and the vertex cover, respectively.
Algorithm 4.2 (Greedy algorithm for independent set with constraints).
1.  Set S := S_vc := ∅ and select a threshold tolerance ε > 0
2.  For i = 1, ..., n, Do
3.    If the node i is not marked, then
4.      S := S ∪ {i} and mark the node i
5.      For j = 1, ..., n (the neighbors of the node i), Do
6.        If the node j is not marked, then
7.          If t_j ≤ ε, then
8.            S := S ∪ {j} and mark the node j
9.          Else If t_j > ε and |a_{i,j}| > 0.001 t_i, then
10.           S_vc := S_vc ∪ {j} and mark the node j
11.         End If
12.       End If
13.     End Do
14.   End If
15. End Do
16. Put all unmarked nodes in S

The number 0.001 in Line 9 is used to determine strong connection between the node j and the node i and was chosen based on a few numerical experiments. It could be made an input parameter as well, but we kept it fixed for the numerical results reported in this paper. If the node i is put in the D submatrix because of its small diagonal value and it has no link to any other nodes in the C submatrix, then the same row in the F submatrix contains all zero elements. According to our partial Gaussian elimination, elimination is only applied to nonzero elements in the F submatrix. It follows that there will be no modification of the ith row in either the F or the D submatrices. Hence, the node with a small diagonal value will not change in the Schur complement. However, since we use a relative threshold tolerance to compute the measure of the rows, a node with a small diagonal value may have different measures on different levels. Hence, it may be included in the vertex cover on a coarse level, even if it was excluded from the vertex cover on fine levels.
In Algorithm 4.2 the threshold tolerance ε plays two roles. The first role is to control the diagonal values of the nodes in the vertex cover such that a stable ILU
factorization may be obtained. The second role, which is implicit in Line 9, is to balance the sizes of the independent set and the vertex cover and to make sure neither of them is too small or too large. Based on our experience, we would like to have a vertex cover that is slightly larger than the independent set. With these constraints, the independent set and the vertex cover found by Algorithm 4.2 obviously have symbolic meaning only.
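Algorithm 4.2 can be sketched in a few lines; this uses a dense adjacency matrix for simplicity (the real implementation walks the sparse graph), and the example matrix and tolerance are made up:

```python
import numpy as np

def greedy_independent_set(A, t, eps):
    """Greedy search for an 'independent' set S and vertex cover S_vc with
    a diagonal threshold.  t holds the row measures of Algorithm 4.1 and
    eps is the threshold tolerance; as noted in the text, S may contain
    coupled nodes and has symbolic meaning only."""
    n = A.shape[0]
    marked = np.zeros(n, dtype=bool)
    S, S_vc = [], []
    for i in range(n):
        if marked[i]:
            continue
        S.append(i)
        marked[i] = True
        for j in np.flatnonzero(A[i]):          # neighbors of node i
            if j == i or marked[j]:
                continue
            if t[j] <= eps:                     # weak diagonal: force into S
                S.append(j)
                marked[j] = True
            elif abs(A[i, j]) > 0.001 * t[i]:   # strong connection to node i
                S_vc.append(j)
                marked[j] = True
    S.extend(np.flatnonzero(~marked).tolist())  # leftover nodes go to S
    return S, S_vc

A = np.array([[2.0, 1.0,  0.0, 0.0],
              [1.0, 0.01, 1.0, 0.0],
              [0.0, 1.0,  3.0, 1.0],
              [0.0, 0.0,  1.0, 2.5]])
t = np.abs(np.diag(A)) / np.abs(A).sum(axis=1)
t /= t.max()
S, S_vc = greedy_independent_set(A, t, eps=0.1)
```

Here node 1 has a weak diagonal and is forced into S, while node 3 is strongly connected to node 2 and lands in the vertex cover.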
4.1. Minimum degree ordering
After the independent set is found, we may reorder the nodes in the vertex cover using a minimum degree algorithm [25]. (Note that the minimum degree algorithm is applied before the ILU factorization, not during the ILU factorization.) Then an explicit permutation is performed. This variant of GILUM will be denoted by GILUMm. Such a reordering strategy can usually reduce fill-in elements during ILU factorization. We will compare GILUM and GILUMm through numerical experiments.
There are other graph reordering algorithms [18,21,22] that may be used to reorder the nodes in the vertex cover. However, the overall effect of these reordering algorithms on the convergence of preconditioned iterative methods is not clear. For this reason, we did not experiment with other reordering algorithms. We remark that there is no need to reorder the nodes in the independent set, even if they are not really independent. This is because they are processed independently, per our discussion in Section 3.
5. Multilevel ILU preconditioning
The grid-based multilevel ILU preconditioner (GILUM) is based on the partial ILUT Algorithm 3.1. On each level a, a partial ILUT factorization is performed and an approximate coarse level system $A_{a+1}$ is formed. Formally, we have
$$\begin{pmatrix} C_a & E_a \\ F_a & D_a \end{pmatrix} = \begin{pmatrix} L_a & 0 \\ F_a U_a^{-1} & I_a \end{pmatrix} \begin{pmatrix} U_a & L_a^{-1} E_a \\ 0 & A_{a+1} \end{pmatrix} \equiv \bar{L}_a \bar{U}_a. \qquad (5)$$
The whole process of finding an independent set, permuting the matrix, and performing the partial ILUT factorization is recursively repeated on the matrix $A_{a+1}$. The recursion is stopped when the coarsest level system $A_L$ is small. Then a standard ILUT factorization $L_L U_L$ is performed on $A_L$. However, we do not store the coarse level systems on any level, including the last one. Instead, we store two sparse matrices on each level,
$$\bar{L}_a = \begin{pmatrix} L_a & 0 \\ F_a U_a^{-1} & I_a \end{pmatrix} \quad \text{and} \quad \bar{U}_a = \begin{pmatrix} U_a & L_a^{-1} E_a \\ 0 & 0 \end{pmatrix} \quad \text{for } 0 \le a < L,$$
along with the factors $L_L$ and $U_L$. All such matrices are stored one after another, level by level, in one long vector. The preconditioning matrix has a recursively nested multilevel structure of the form
$$M = \begin{pmatrix} L_0 U_0 & L_0^{-1} E_0 \\ F_0 U_0^{-1} & M_1 \end{pmatrix}, \qquad M_a = \begin{pmatrix} L_a U_a & L_a^{-1} E_a \\ F_a U_a^{-1} & M_{a+1} \end{pmatrix},$$
where the innermost block, after $L-1$ nested reductions, is
$$M_{L-1} = \begin{pmatrix} L_{L-1} U_{L-1} & L_{L-1}^{-1} E_{L-1} \\ F_{L-1} U_{L-1}^{-1} & L_L U_L \end{pmatrix}.$$
The preconditioning process consists of a level by level forward elimination, the coarsest level approximate solution, and a level by level backward substitution. Vector permutations and reverse permutations with respect to the independent set orderings are performed on each level. The preconditioned iteration process structurally looks like a multigrid V-cycle algorithm [37]. A Krylov subspace iteration is performed on the finest level acting as a smoother, the residual is then transferred level by level to the coarsest level, where one sweep of ILUT is used to yield an approximate solution. In the current situation, the coarsest level ILUT is actually a direct solver with limited accuracy comparable to the accuracy of the whole preconditioning process.
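The level-by-level forward elimination and backward substitution can be illustrated with a dense recursive sketch in which exact LU solves replace the ILUT factors, so the "preconditioner" below is actually a direct solver; all sizes and data are made up:

```python
import numpy as np

def build_gilum(A, m, coarse_size=2):
    """Recursive exact analogue of factorization (5): each level stores the
    C, E, F blocks of the (reordered) matrix and recurses on the Schur
    complement until it is small enough to solve directly."""
    n = A.shape[0]
    if n <= coarse_size:
        return {'coarse': A.copy()}
    C, E = A[:m, :m], A[:m, m:]
    F, D = A[m:, :m], A[m:, m:]
    A1 = D - F @ np.linalg.inv(C) @ E           # coarse level system
    return {'C': C, 'E': E, 'F': F,
            'next': build_gilum(A1, min(m, A1.shape[0] - 1), coarse_size)}

def apply_gilum(M, r):
    """Level-by-level forward elimination, coarsest level solve, then
    level-by-level backward substitution."""
    if 'coarse' in M:
        return np.linalg.solve(M['coarse'], r)
    m = M['C'].shape[0]
    y1 = np.linalg.solve(M['C'], r[:m])               # forward: C^{-1} r_1
    z2 = apply_gilum(M['next'], r[m:] - M['F'] @ y1)  # restrict and recurse
    z1 = y1 - np.linalg.solve(M['C'], M['E'] @ z2)    # backward substitution
    return np.concatenate([z1, z2])

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
M = build_gilum(A, m=3)
x = apply_gilum(M, b)        # exact blocks, so A @ x recovers b
```

In GILUM the C-block solves use the incomplete factors $L_a, U_a$ instead of exact solves, so the recursion returns only an approximate solution, which is what the Krylov accelerator then corrects.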
Let us rewrite Eq. (5) as
$$\begin{pmatrix} C_a & E_a \\ F_a & D_a \end{pmatrix} = \begin{pmatrix} I_a & 0 \\ F_a U_a^{-1} L_a^{-1} & I_a \end{pmatrix} \begin{pmatrix} L_a U_a & 0 \\ 0 & A_{a+1} \end{pmatrix} \begin{pmatrix} I_a & U_a^{-1} L_a^{-1} E_a \\ 0 & I_a \end{pmatrix} \qquad (6)$$
and examine a few interesting properties. It is clear that the central part of (6) is an operator acting on the full vector on level a. $L_a U_a$ may also be viewed as an ILU smoother on the fine grid nodes on level a. In a two-level analysis, we may define
$$I_a^{a+1} = \begin{pmatrix} -F_a U_a^{-1} L_a^{-1} & I_a \end{pmatrix} \quad \text{and} \quad I_{a+1}^{a} = \begin{pmatrix} -U_a^{-1} L_a^{-1} E_a \\ I_a \end{pmatrix}$$
as the restriction and interpolation operators, respectively. Then the following results linking GILUM with algebraic multigrid method can be verified directly.
Proposition 5.1. Suppose factorization (6) exists and is exact. Then
1. the coarse level system $A_{a+1}$ satisfies the Galerkin condition (1), and
2. if, in addition, $A_a$ is symmetric, then $I_a^{a+1} = (I_{a+1}^{a})^{T}$.
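Both parts of Proposition 5.1 can be verified numerically in the exact dense case, reading the Galerkin condition as $A_{a+1} = I_a^{a+1} A_a I_{a+1}^{a}$; this is a sketch on made-up data where an exact inverse stands in for $(L_a U_a)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 7, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)
C, E, F, D = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
Ci = np.linalg.inv(C)                          # stands in for (L_a U_a)^{-1}
A1 = D - F @ Ci @ E                            # coarse level operator
I_r = np.hstack([-F @ Ci, np.eye(n - m)])      # restriction  I_a^{a+1}
I_p = np.vstack([-Ci @ E, np.eye(n - m)])      # interpolation I_{a+1}^a

assert np.allclose(I_r @ A @ I_p, A1)          # part 1: Galerkin condition

B = (A + A.T) / 2                              # part 2: symmetric case
Bi = np.linalg.inv(B[:m, :m])
R = np.hstack([-B[m:, :m] @ Bi, np.eye(n - m)])
P = np.vstack([-Bi @ B[:m, m:], np.eye(n - m)])
assert np.allclose(R, P.T)                     # restriction = interpolation^T
```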
One advantage of the ILUT type factorizations is that the memory cost of the resulting preconditioner can be predicted in advance. The sparsity of GILUM depends primarily on the parameter p used to control the amount of fill-in allowed and on the sizes of the vertex covers. The following proposition is analogous to the one for BILUTM in [37] and the proofs are exactly the same.
Proposition 5.2. Let $m_a$ be the size of the vertex cover on level a. The number of nonzero elements of GILUM with L levels of reduction is bounded by $p\,(2n + \sum_{a=1}^{L} a\, m_a)$.
Note that in the above bound the term $2pn$ is the bound for the number of nonzero elements of standard ILUT. The term $p \sum_{a=1}^{L} a\, m_a$ represents the extra nonzeros for the multilevel implementation. Since $m_0$ is not in the second term and the factor a grows as the level increases, it is advantageous to have large vertex covers.
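As a quick sanity check of the bound, with entirely made-up sizes (a fill-in cap p = 50, n = 10000 unknowns, and hypothetical vertex cover sizes m_a on four levels):

```python
# hypothetical setting for the bound p(2n + sum_{a=1}^{L} a*m_a)
p, n, L = 50, 10_000, 4
m = {1: 4000, 2: 2400, 3: 1400, 4: 800}   # m[a] = vertex cover size, level a
extra = p * sum(a * m[a] for a in range(1, L + 1))   # multilevel overhead
bound = 2 * p * n + extra                             # total nonzero bound
```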
6. Numerical experiments
Implementations of multilevel preconditioning techniques have been described in detail in [33,36,37]. We also added the diagonal threshold strategy described in Section 4 and a local reordering of the blocks by the reverse Cuthill–McKee algorithm to the BILUTM preconditioner [37,39]. Unless otherwise indicated explicitly, we used the following default parameters for our preconditioned iterative solver: GMRES(100) without restart was used as the accelerator; the maximum number of levels allowed was 10, i.e., L = 10; the diagonal threshold parameter was 0.3. For BILUTM, the block size was chosen to equal p.
A set of unstructured sparse matrices from realistic applications was tested. Most of these matrices have been used in other tests [36,37] and none of them is easy to solve by standard ILU preconditioning techniques. The right-hand side was generated by assuming that the solution was a vector of all ones, and the initial guess was a vector of random numbers. The computations were terminated when the 2-norm of the residual was reduced by a factor of $10^7$. The numerical experiments were conducted on an SGI Power Challenge workstation. The codes were written in Fortran 77 and were run in 64-bit arithmetic.
In all tables of numerical results, ``iter'' shows the number of GMRES iterations, p and s are the parameters used to control fill-in elements, and ``spar'' shows the sparsity ratio, which is the ratio of the number of nonzeros of the preconditioner to that of the original matrix.¹ GILUMm represents GILUM with a minimum degree reordering of the $C_a$ submatrix. A dash (``–'') in an ``iter'' column indicates that more than 100 iterations were required. Since these ILU preconditioners approach direct solvers as $p \to n$ and $s \to 0$, we compare their robustness with
¹ The definition of sparsity ratio is different from that of operator complexity in algebraic multigrid method [31]. The operator complexity does not count the storage cost of the interlevel transfer operators, which may account for more than half of the total storage cost in multilevel ILU preconditioners.
respect to the memory cost (sparsity ratio). We remark that our codes have not
been optimized and the chosen parameters were not meant to be optimal.
6.1. RAEFSKY4 matrix
The RAEFSKY4 matrix² has 19,779 unknowns and 1,328,611 nonzeros. It is from a buckling problem for a container model and was supplied by H. Simon from Lawrence Berkeley National Laboratory (originally created by A. Raefsky from Centric Engineering). Table 1 lists a set of test results when the parameters p and s were varied.
Based on the results in Table 1 we can make some comments with respect to solving the RAEFSKY4 matrix. It is seen that GILUMm is the most robust preconditioner and used the least memory to achieve fast convergence. ILUT is the least robust among the four preconditioners compared. BILUTM is more robust than GILUM, but is less efficient than GILUMm. The number of iterations is directly related to the sparsity ratios. High accuracy preconditioners (with large sparsity ratios) have fast convergence rates.
6.2. WIGTO966 matrix
The WIGTO966 matrix³ has 3864 unknowns and 238,252 nonzeros. It comes from an Euler equation model and was supplied by L. Wigton from Boeing. It is solvable by ILUT with large values of p [13]. This matrix was also
² The RAEFSKY matrices are available online from the University of Florida sparse matrix collection [17] at http://www.cise.ufl.edu/~davis/sparse.
Table 1
Solving the RAEFSKY4 matrix by different preconditioners with different choices of the fill-in parameters

  p     s        BILUTM        GILUM         GILUMm        ILUT
                 iter  spar    iter  spar    iter  spar    iter  spar
  70    10^-4     23   2.44     –    2.93     44   1.97     –    1.95
  90    10^-4     41   2.95     98   3.40     40   2.25     –    2.45
  70    10^-5     26   2.58     –    3.21     21   2.05     –    1.96
 100    10^-5     22   3.44     86   3.25     20   2.84     –    2.71
  70    10^-6     22   2.58     –    2.68      9   2.10     –    1.95
 100    10^-6     11   3.53     72   3.38      4   2.91     –    2.72
 120    10^-6      4   4.01     13   4.36      5   3.42     –    3.22
 110    10^-7      2   3.79     78   3.71      1   3.22     –    2.98
 130    10^-7     43   4.45      2   3.91      1   3.72     73   3.47
³ The WIGTO966 matrix is available from the author.
used to compare BILUM with ILUT in [35] and to test point and block pre-
conditioning techniques in [14,15].
The test results in Table 2 show that BILUTM is the most robust precon-
ditioner and ILUT is the least robust in solving the WIGTO966 matrix. In fact,
for all the parameters tested, ILUT did not converge at all, even though in some
cases it used more storage space than the other preconditioners did. It is interesting to
see that GILUMm used even less storage space than ILUT did. This and other
tests show that the minimum degree reordering of the vertex cover nodes does
reduce the amount of fill-in elements significantly.
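The minimum degree reordering referred to here can be sketched with a textbook greedy algorithm on the sparsity graph of a symmetric matrix. This is a simplified illustration only, not the implementation used in the paper; production codes follow the far more sophisticated variants surveyed by George and Liu [25].

```python
# Textbook greedy minimum degree sketch: repeatedly eliminate a node of
# minimum current degree, then connect its remaining neighbors to simulate
# the fill-in created by Gaussian elimination.  Illustrative only.

def minimum_degree_order(adj):
    """adj: dict node -> set of neighbors (symmetric sparsity graph).
    Returns a fill-reducing elimination order."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))      # minimum degree node
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= (nbrs - {u})  # eliminating v makes its neighbors a clique
        order.append(v)
    return order

# Arrow matrix: node 0 is coupled to all others.  The greedy rule
# eliminates the degree-1 nodes first, so the dense node 0 is only
# eliminated after its degree has shrunk, avoiding fill-in.
arrow = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(minimum_degree_order(arrow))
```

Applied before ILUT, such an ordering tends to reduce the number of fill-in elements generated during the factorization.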
6.3. BARTHS1A matrix

The BARTHS1A matrix 4 has 15,735 rows and 539,225 nonzeros and was
supplied by T. Barth of NASA Ames. It is from a 2D high Reynolds number
airfoil problem with a one-equation turbulence model. For this set of tests we
chose the diagonal threshold parameter to be 0.05. The results are given in Table 3.
Once again, we see that ILUT did not converge under our test conditions. BI-
LUTM seems to perform slightly better than GILUM and GILUMm and used
less storage space. GILUM is the only preconditioner that converged for all the param-
eters tested, and it used the most storage space.
6.4. OLAFU matrix

The OLAFU matrix 5 has 16,146 unknowns and 1,015,156 nonzeros. It is a
structural modeling problem from NASA Langley. The diagonal threshold
Table 2
Solving the WIGTO966 matrix by different preconditioners with different choices of the fill-in
parameters (a dash indicates no convergence)

Parameters      BILUTM         GILUM          GILUMm         ILUT
p     s         iter   spar    iter   spar    iter   spar    iter   spar
60    10^-4     27     2.29    -      2.32    -      1.75    -      1.86
90    10^-4     29     3.21    52     3.20    -      2.53    -      2.78
140   10^-4     8      4.47    25     4.62    36     3.73    -      4.28
80    10^-5     22     3.00    -      3.02    67     2.31    -      2.48
140   10^-5     11     4.64    25     4.85    30     3.84    -      4.28
80    10^-6     22     3.01    -      3.06    74     2.33    -      2.48
140   10^-6     10     4.67    26     4.98    29     3.92    -      4.29
90    10^-7     22     3.25    53     3.39    -      2.63    -      2.78
140   10^-7     10     4.70    25     5.08    29     3.97    -      4.29
4 The BARTHS1A matrix is available from the author.
5 The OLAFU matrix is available online from the University of Florida sparse matrix collection
[17] at http://www.cise.ufl.edu/~davis/sparse.
parameter was set to 0.2. Table 4 lists test results with a few sets of parameters (p, s).
We point out that both GILUM and GILUMm did much better than BI-
LUTM in this set of tests. GILUM converged with fewer iterations but
used more storage space than GILUMm did. ILUT did very poorly: for all
test parameters chosen there was no convergence, and in fact, in our tests, we
observed little residual reduction in 100 iterations.
6.5. Diagonal threshold parameter

The choice of the diagonal threshold parameter plays an important role in
determining the convergence rate of the grid-based multilevel preconditioner.
A good choice can result in a stable and accurate preconditioner, while a
bad choice can lead to a useless preconditioner. Fig. 2 shows the convergence
history of GILUMm with different values of the threshold for solving the BARTHS1A
matrix with the other parameters fixed at p = 120 and s = 10^-5. We see that the best
choice in this case is 0.15; larger and smaller values hampered the convergence
of the preconditioner. In particular, choosing the value 0, which is equivalent to no
Table 4
Solving the OLAFU matrix by different preconditioners with different choices of the fill-in pa-
rameters (a dash indicates no convergence)

Parameters      BILUTM         GILUM          GILUMm         ILUT
p     s         iter   spar    iter   spar    iter   spar    iter   spar
120   10^-5     -      4.10    77     3.86    96     3.61    -      3.42
150   10^-5     -      4.78    52     4.63    69     4.39    -      4.05
130   10^-6     98     4.58    52     4.28    69     4.02    -      3.70
150   10^-6     89     5.16    68     4.85    73     4.55    -      4.14
110   10^-7     -      4.14    84     3.74    95     3.51    -      3.23
150   10^-7     47     5.21    40     4.96    39     4.66    -      4.17
Table 3
Solving the BARTHS1A matrix by different preconditioners with different choices of the fill-in
parameters (a dash indicates no convergence)

Parameters      BILUTM         GILUM          GILUMm         ILUT
p     s         iter   spar    iter   spar    iter   spar    iter   spar
110   10^-5     90     6.74    92     8.49    -      7.37    -      6.00
140   10^-5     -      8.24    87     10.85   82     8.86    -      7.58
110   10^-6     86     7.16    86     8.79    -      7.77    -      6.06
140   10^-6     50     8.54    78     10.99   77     9.58    -      7.65
140   10^-7     45     8.75    77     11.26   77     9.99    -      7.68
diagonal threshold strategy, yielded a preconditioner that provided almost no
preconditioning effect at all.
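A common form of the diagonal threshold test (in the spirit of the diagonal threshold techniques of Saad and Zhang [38]) can be sketched as follows. This is a hedged illustration, not the exact criterion GILUM uses (which is defined earlier in the paper): a row is accepted for fine-level elimination only if its diagonal is strong enough relative to its off-diagonal entries, and rows failing the test are deferred to the coarse level system.

```python
# Hedged sketch of a diagonal threshold node selection (cf. Saad & Zhang
# [38]); the precise dominance measure used by GILUM may differ.

def split_by_diagonal_threshold(rows, threshold):
    """rows: dict i -> dict j -> a_ij (one sparse row per unknown).
    Returns (fine, coarse) index lists."""
    fine, coarse = [], []
    for i, row in rows.items():
        diag = abs(row.get(i, 0.0))
        offdiag = sum(abs(v) for j, v in row.items() if j != i)
        # a row with no off-diagonal entries is trivially dominant
        ratio = diag / offdiag if offdiag > 0 else float("inf")
        (fine if ratio >= threshold else coarse).append(i)
    return fine, coarse

# Toy 3x3 example: row 1 has a weak diagonal and is sent to the coarse level.
rows = {0: {0: 4.0, 1: -1.0}, 1: {1: 0.1, 0: -1.0}, 2: {2: 2.0, 1: -1.0}}
print(split_by_diagonal_threshold(rows, 0.15))
```

Setting the threshold to 0 makes every row pass the test, which corresponds to the unstable no-threshold behavior observed above; a larger threshold defers more weak-diagonal rows to the coarse level, slowing the reduction of the system size.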
Corresponding to the test results in Fig. 2, Fig. 3 shows the dimensions of
the original matrix and the coarse level systems for different values of the
threshold parameter. It can be seen that a small threshold leads to fast reduction of the system
Fig. 3. Dimension of the original and coarse level systems of GILUMm with different values of the
diagonal threshold parameter for solving the BARTHS1A matrix with p = 120, s = 10^-5.

Fig. 2. Convergence history of GILUMm with different values of the diagonal threshold parameter
for solving the BARTHS1A matrix with p = 120, s = 10^-5.
size as the number of reduction levels increases. As depicted in Fig. 2, a faster
reduction of the system size usually yields a faster convergence rate. This obser-
vation, however, tells only half of the story; the other half is that too small a
threshold reduces the effect of diagonal thresholding. Choosing the value 0
leads to very fast reduction of the system size: in fact, only three reductions are
needed. However, as we see in Fig. 2 and as remarked in Section 4, a precondi-
tioner constructed without a diagonal threshold strategy may be unstable. (We
note that the BARTHS1A matrix can be solved by BILUTM without a di-
agonal threshold strategy, but the required parameters were p = 250, s = 10^-6;
see [37].)
7. Concluding remarks

We have presented a grid-based multilevel ILU preconditioning technique
(GILUM) with a dual dropping strategy for solving general sparse matrices.
The method offers flexibility in controlling the amount of fill-in during the ILU
factorization and a cost-effective construction of the coarse level operator. We also
implemented a diagonal threshold strategy in both the grid- and domain-based
multilevel preconditioning techniques. GILUM combines ideas and concepts
from multilevel preconditioning techniques and algebraic multigrid methods, and
demonstrates the convergence of two of the most promising classes of iterative
techniques.
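The dual dropping strategy behind the (p, s) parameters varied in Tables 1-4 follows the spirit of ILUT [32]: during the elimination of each row, (1) entries smaller than s times a row norm are dropped, and (2) at most p of the largest remaining entries are kept. The sketch below illustrates the rule on a single working row; it omits the surrounding factorization loops, and in a real ILUT implementation the diagonal is always retained and the fill limit is applied separately to the L and U parts.

```python
# Illustrative dual dropping rule on one working row (cf. ILUT [32]).
# The row-norm choice (mean magnitude) is one common convention.

def dual_drop(row, p, s):
    """row: dict j -> value of a working row; returns the kept entries."""
    norm = sum(abs(v) for v in row.values()) / max(len(row), 1)
    # rule 1: drop entries below the relative tolerance s
    kept = {j: v for j, v in row.items() if abs(v) >= s * norm}
    # rule 2: keep only the p largest remaining entries in magnitude
    largest = sorted(kept, key=lambda j: abs(kept[j]), reverse=True)[:p]
    return {j: kept[j] for j in largest}

row = {0: 4.0, 1: -0.5, 2: 1e-9, 3: 0.2, 4: -2.0}
print(dual_drop(row, p=3, s=1e-5))
```

Larger p and smaller s retain more fill-in, giving the higher sparsity ratios and faster convergence seen in the tables.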
Our numerical experiments with several realistic unstructured sparse
matrices show that the proposed preconditioning technique indeed demon-
strates the anticipated robustness and effectiveness. Both GILUM and BI-
LUTM are more robust and more efficient than standard ILUT. We
also showed that it is sometimes useful to reorder the fine level nodes with a
minimum degree ordering before the ILU factorization is applied. Such a
reordering can at least reduce the amount of fill-in elements during the
ILUT factorization. Our numerical experiments seem to show that the ro-
bustness of the grid- and domain-based multilevel ILU preconditioning
techniques is comparable. One implication of the results of this paper is that
multilevel ILU preconditioning techniques and algebraic multigrid precon-
ditioning approaches should have comparable robustness when they are
fully algebraic with respect to general sparse matrices. Thus, future research
on either multilevel preconditioning techniques or algebraic multigrid methods
should take both approaches into consideration and combine the strengths
of both.
Unlike BILUTM, the current version of GILUM does not seem to possess
inherent parallelism. However, parallelism may be introduced by using a sparse
approximate inverse strategy to replace ILUT [46,44]. The construction process
will have to be modified. We will extend our research along this line.
Acknowledgements
This work was supported by the US National Science Foundation under
grants CCR-9902022 and CCR-9988165, and in part by the University of
Kentucky Center for Computational Sciences.
References

[1] O. Axelsson, P.S. Vassilevski, Algebraic multilevel preconditioning methods, SIAM J. Numer.
Anal. 27 (6) (1990) 1569–1590.
[2] V.A. Bandy, Black box multigrid for convection–diffusion equations on advanced computers,
Ph.D. Thesis, University of Colorado, Denver, CO, 1996.
[3] R.E. Bank, R.K. Smith, The incomplete factorization multigraph algorithm, SIAM J. Sci.
Comput. 20 (4) (1999) 1349–1364.
[4] R.E. Bank, C. Wagner, Multilevel ILU decomposition, Numer. Math. 82 (4) (1999) 543–576.
[5] R.E. Bank, J. Xu, The hierarchical basis multigrid method and incomplete LU decomposition,
in: D. Keyes, J. Xu (Eds.), Proceedings of the Seventh International Symposium on Domain
Decomposition Methods for Partial Differential Equations, AMS, Providence, RI, 1994, pp.
163–173.
[6] E.F.F. Botta, F.W. Wubs, Matrix renumbering ILU: an effective algebraic multilevel ILU
preconditioner for sparse matrices, SIAM J. Matrix Anal. Appl. 20 (4) (1999) 1007–1026.
[7] D. Braess, Towards algebraic multigrid for elliptic problems of second order, Computing 55
(4) (1995) 379–393.
[8] A. Brandt, S. McCormick, J. Ruge, Algebraic multigrid (AMG) for sparse equations, in: D.J.
Evans (Ed.), Sparsity and its Applications (Loughborough, 1983), Cambridge University
Press, Cambridge, 1985, pp. 257–284.
[9] M. Brezina, A.J. Cleary, R.D. Falgout, V.E. Henson, J.E. Jones, T.A. Manteuffel, S.F.
McCormick, J.W. Ruge, Algebraic multigrid based on element interpolation (AMGe), SIAM
J. Sci. Comput. 22 (5) (2000) 1570–1592.
[10] T.F. Chan, S. Go, J. Zou, Multilevel domain decomposition and multigrid methods for
unstructured meshes: algorithms and theory, Technical Report CAM 95-24, Department of
Mathematics, UCLA, Los Angeles, CA, 1995.
[11] Q.S. Chang, Y.S. Wong, L.Z. Feng, New interpolation formulas of using geometric
assumptions in the algebraic multigrid method, Appl. Math. Comput. 50 (2–3) (1992) 223–254.
[12] Q.S. Chang, Y.S. Wong, H.Q. Fu, On the algebraic multigrid method, J. Comput. Phys. 125
(1996) 279–292.
[13] A. Chapman, Y. Saad, L. Wigton, High-order ILU preconditioners for CFD problems, Int.
J. Numer. Meth. Fluids 33 (6) (2000) 767–788.
[14] E. Chow, M.A. Heroux, An object-oriented framework for block preconditioning, ACM
Trans. Math. Software 24 (2) (1998) 159–183.
[15] E. Chow, Y. Saad, Experimental study of ILU preconditioners for indefinite matrices,
J. Comput. Appl. Math. 86 (2) (1997) 387–414.
[16] A.J. Cleary, R.D. Falgout, V.E. Henson, J.E. Jones, T.A. Manteuffel, S.F. McCormick, G.N.
Miranda, J.W. Ruge, Robustness and scalability of algebraic multigrid, SIAM J. Sci. Comput.
21 (5) (2000) 1886–1908.
[17] T. Davis, University of Florida sparse matrix collection, NA Digest 97 (23) (1997).
[18] E.F. D'Azevedo, F.A. Forsyth, W.P. Tang, Ordering methods for preconditioned conjugate
gradient methods applied to unstructured grid problems, SIAM J. Matrix Anal. Appl. 13
(1992) 944–961.
[19] P.M. de Zeeuw, Matrix-dependent prolongations and restrictions in a blackbox multigrid
solver, J. Comput. Appl. Math. 33 (1990) 1–25.
[20] J.E. Dendy Jr., Black box multigrid, J. Comput. Phys. 48 (3) (1982) 366–386.
[21] I.S. Duff, G.A. Meurant, The effect of reordering on preconditioned conjugate gradients, BIT
29 (1989) 635–657.
[22] L.C. Dutto, The effect of reordering on the preconditioned GMRES algorithm for solving the
compressible Navier–Stokes equations, Int. J. Numer. Meth. Engrg. 36 (3) (1993) 457–497.
[23] H.C. Elman, A stability analysis of incomplete LU factorization, Math. Comput. 47 (175)
(1986) 191–217.
[24] H.C. Elman, Approximate Schur complement preconditioners on serial and parallel
computers, SIAM J. Sci. Statist. Comput. 10 (3) (1989) 581–605.
[25] J.A. George, J.W.H. Liu, The evolution of the minimum degree ordering algorithm, SIAM
Rev. 31 (1989) 1–19.
[26] M. Griebel, T. Neunhoeffer, Parallel point- and domain-oriented multilevel methods for
elliptic PDEs on workstation networks, J. Comput. Appl. Math. 66 (1996) 267–278.
[27] J.A. Meijerink, H.A. van der Vorst, An iterative solution method for linear systems of which
the coefficient matrix is a symmetric M-matrix, Math. Comput. 31 (1977) 148–162.
[28] C.W. Oosterlee, T. Washio, An evaluation of parallel multigrid as a solver and a
preconditioner for singularly perturbed problems, SIAM J. Sci. Comput. 19 (1) (1998) 87–110.
[29] A. Ramage, A multigrid preconditioner for stabilised discretizations of advection–diffusion
problems, Technical Report 33, Department of Mathematics, University of Strathclyde,
Glasgow, UK, 1998.
[30] A.A. Reusken, Approximate cyclic reduction preconditioning, Technical Report RANA 97-02,
Department of Mathematics and Computing Science, Eindhoven University of Technology,
The Netherlands, 1997.
[31] J.W. Ruge, K. Stuben, Algebraic multigrid, in: S. McCormick (Ed.), Multigrid Methods,
Frontiers in Appl. Math., SIAM, Philadelphia, PA, 1987, pp. 73–130 (Chapter 4).
[32] Y. Saad, ILUT: a dual threshold incomplete LU preconditioner, Numer. Linear Algebra Appl.
1 (4) (1994) 387–402.
[33] Y. Saad, ILUM: a multi-elimination ILU preconditioner for general sparse matrices, SIAM J.
Sci. Comput. 17 (4) (1996) 830–847.
[34] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing, New York, 1996.
[35] Y. Saad, M. Sosonkina, J. Zhang, Domain decomposition and multi-level type techniques for
general sparse linear systems, in: J. Mandel, C. Farhat, X.-C. Cai (Eds.), Domain
Decomposition Methods 10, Contemporary Mathematics, vol. 218, AMS, Providence, RI,
1998, pp. 174–190.
[36] Y. Saad, J. Zhang, BILUM: block versions of multielimination and multilevel ILU
preconditioner for general sparse linear systems, SIAM J. Sci. Comput. 20 (6) (1999) 2103–2121.
[37] Y. Saad, J. Zhang, BILUTM: a domain-based multilevel block ILUT preconditioner for
general sparse matrices, SIAM J. Matrix Anal. Appl. 21 (1) (1999) 279–299.
[38] Y. Saad, J. Zhang, Diagonal threshold techniques in robust multi-level ILU preconditioners
for general sparse linear systems, Numer. Linear Algebra Appl. 6 (4) (1999) 257–280.
[39] Y. Saad, J. Zhang, A multi-level preconditioner with applications to the numerical simulation
of coating problems, in: D.R. Kincaid, A.C. Elster (Eds.), Iterative Methods in Scientific
Computing II, IMACS, New Brunswick, NJ, 1999, pp. 437–449.
[40] Y. Saad, J. Zhang, Enhanced multilevel block ILU preconditioning strategies for general
sparse linear systems, J. Comput. Appl. Math. 130 (2001) 99–118.
[41] G. Starke, Multilevel minimal residual methods for nonsymmetric elliptic problems, Numer.
Linear Algebra Appl. 3 (5) (1996) 351–367.
[42] O. Tatebe, The multigrid preconditioned conjugate gradient method, in: N.D. Melson, T.A.
Manteuffel, S.F. McCormick (Eds.), Proceedings of the Sixth Copper Mountain Conference on
Multigrid Methods, Copper Mountain, CO, 1993, pp. 621–634.
[43] C. Wagner, W. Kinzelbach, G. Wittum, Schur-complement multigrid: a robust method for
groundwater flow and transport problems, Numer. Math. 75 (1997) 523–545.
[44] J. Zhang, Two-grid analysis of minimal residual smoothing as a multigrid acceleration
technique, Appl. Math. Comput. 96 (1) (1998) 27–45.
[45] J. Zhang, Preconditioned Krylov subspace methods for solving nonsymmetric matrices from
CFD applications, Comput. Meth. Appl. Mech. Engrg. 189 (3) (2000) 825–840.
[46] J. Zhang, Sparse approximate inverse and multilevel block ILU preconditioning techniques for
general sparse matrices, Appl. Numer. Math. 35 (1) (2000) 89–108.