Coloring update methods

BIT 38:1 (1998), 177-185

COLORING UPDATE METHODS*

YAIR SHAPIRA t

Computer Science Department, Technion-lsrael Institute of Technology Haifa 32000, Israel

A b s t r a c t .

For linear update methods (such as SOR), a coloring method is introduced for which the multicolor iteration matrix has the same spectrum as the original iteration matrix. It applies to general linear systems, not necessarily arising from PDEs. When the iteration matrices are nonsingular, it is shown that they are similar to each other.

AMS subject classification: 65F10.

Key words: Linear system, iteration matrix, SOR, multicoloring.

1 I n t r o d u c t i o n .

The Successive Over Relaxation (SOR) method introduced in [8] is one of the most popular methods for the solution of linear systems of equations. Although it is inferior to multigrid for certain problems arising from PDEs ([2]) it has the advantage of being suitable for parallel implementations. Maximum parallelism is often obtained when the equations are relaxed color by color, where a color is a set of variables which can be relaxed simultaneously. This ordering is also more stable than the standard one; this is especially important when indefinite problems are encountered.

In [1] it is shown that, for linear systems involving structured grids and structurally symmetric stencils (typically, 9-coefficient stencils arising from 2- D PDEs), the iteration matrix of lexicographically ordered SOR and that of multicolor SOR (with the coloring introduced there) have the same spectrum. This, however, does not imply, similarity of the iteration matrices. This gap is filled here, where it is shown that nonsingularity of the iteration matrices is a sufficient condition for similarity. The framework used here applies to general update methods, including point and block SOR for general linear systems, such as those arising from unstructured grid and Markov chain problems. The result of [1] is obtained as a particular case of the present result.

To illustrate the necessity of this sufficient condition consider a lower trian- gular bidiagonal linear system. For this system, the Gauss-Seidel iteration is a direct solver and the iteration matrix is zero. The present coloring method

*Received April 1996. Revised November 1996. Communicated by Petter Bj0rstad. ?Present address: Los Alamos National Laboratory, Mall Stop B-256, Los Alamos, NM

87545, USA. emaih yairs~lanl.gov

178 Y A I R SHAPIRA

yields the usual odd-even ordering, for which the Gauss-Seidel iteration is ob- viously not a direct solver. Therefore, the iteration matrix cannot possibly be similar to that for the standard ordering. However, the spectrum of both iteration matrices is equal to {0}, since they are both nilpotent. In this particular case, the spectrum does not contain the whole information about the iteration matrix and, hence, the pseudo-spectrum is a more appropriate character. For more complicated situations such as those involving upwind schemes for convec- tion problems with variable coefficients the colored relaxation is not necessarily inferior to the sequential one as it is in the above example.

2 Analys is o f the coloring method .

2.1 The update method.

Let N be a positive integer. For 1 < i < N, let ~2i and Oi be some index sets and ai,j and ~/i,j (1 _< j _< N) nonvanishing scalars. Consider the update method

ALGORITHM 1. For i = 1, 2 , . . . , N do:

(2.1) xi +- Z cq,jfj + Z ~/i,jxj, jEo~ j ~

where x and f are N-d imens iona l vectors. For a fixed i, the substitution ' ~ ' in (2.1) is referred to as an update of the

ith component of x. Such update methods are commonly used for the iterative solution of linear systems. For example, when Oi = {i} (1 < i < N), 1)1 = {2}, l)N = { N - 1}, ~i = { i - 1,i + 1} (2 < i < N - 1), (~i,i -- 1/2 and "Yi,j - 1/2, Algorithm 1 represents a Gauss-Seidel iteration for the solution of A x = f , where A = tridiag ( -1 , 2 , - 1). More complicated choices of the above parameters may lead to point and block relaxation methods for multi-dimensional problems.

2.2 General data-flow algorithms.

Let l)i : {1 _< j _< N ] j E ~i or i E ~j}.

From this definition, we have

(2.2) j �9 l~i r162 i �9 l~j.

Let L denote the iteration matrix of Algorithm 1, i.e., Algorithm 1 with f - 0 is represented by x +- Lx . Let v be an arbitrary N-dimensional initial vector. The following algorithm is a general form of the data f low parallel implementation presented in [1]. The parameter ki(t) used in it represents the number of times the variable vi is updated in steps 1, 2 , . . . , t.

ALGORITHM 2. Let ki(O) = O, 1 i =v k i ( t - 1 ) _< k j ( t - 1 )

C O L O R I N G U P D A T E M E T H O D S 179

then update vi (using (2.1) with f - 0 and x replaced by v) and set ki(t) = ki(t - 1) + 1; otherwise, set ki(t) = ki(t - 1).

The following lemma will be used in Lemma 2.2 below to give an equivalent representation for the conditions (2.3)-(2.4).

LEMMA 2.1. For any t > 0 and 1 _ kj(t)

(2.6) J �9 fii and j > i ~ ki(t) > kj(t).

PROOF. The proof is by induction on t (t >_ 0) for a fixed 1 0). Consider step to of Algorithm 2 and pick a j E ~i- If no update takes place for j at this step, (2.5)-(2.6) clearly hold for this j also for t = to; otherwise, (2.2) implies that (2.5) follows from (2.4) (with the roles of i and j interchanged) and (2.6) follows from (2.3) for this j and t = to. []

In the following lemma we show that the updates in Algorithm 2 are done in a synchronous manner, namely, they axe equivalent to those of Algorithm 1 with f = 0. In the sequel, we refer to the situation before step 1 also as the situation after step 0.

LEMMA 2.2. Right after step t of Algorithm 2 (t >_ O) there holds

(2.7) vi = (Lk'(t)v)i 1 < i < g .

PROOF: Note that Lemma 2.1 implies that an update takes place for vi at step t of Algorithm 2 if and only if

(2.8) j E ~ i a n d j i ~ k j ( t - 1 ) = k i ( t - 1 ) .

The proof of the lemma is by induction on t (t > 0) uniformly for all 1 0). If no update takes place for vi at step to of Algorithm 2, (2.7) holds trivially; otherwise, it follows from (2.8)-(2.9) that the update is equivalent to that of Algorithm 1. []

The next lemma gives more information about the dynamics of Algorithm 2.

LEMMA 2.3. For any t > 1, there is at least one index i for which an update takes place for vi at step t of Algorithm 2.

PROOF. Let

n = m i n { l < j < N I k j ( t - 1 ) = min k t ( t - 1 ) } . - - - - l < l < N

From (2.3)-(2.4), an update takes place for v,~ at step t. []

180 YAIR SHAPIRA

The next lemma shows that each step in Algorithm 2 can be done in parallel.

LEMMA 2.4. If j E l~i then vi and vj are not updated together at any step t (t > 1) of Algorithm 2.

PROOF. Since j E fli C ~i, we have from (2.2) that i E ~j . Without loss of generality, assume j < i. Suppose vi and vj are updated together for some step t > 1. Then, from (2.3), kj( t - 1) > ki(t - 1). But from (2.4) (with the roles of i and j interchanged) we have kj( t - 1) < ki(t - 1), which is a contradiction. []

Next, we show that Algorithm 2 does not stagnate for any of the variables.

LEMMA 2.5. ki(t) -~t-~oo 0% 1 < i < N.

PROOF. For each i, {ki ( t )}~o is a monotonically nondecreasing sequence of integers. Suppose there exists an index i for which ki(t) -~t-+oo Mi < oo. From (2.2) and (2.3)-(2.4), there exist such constants Mj also for every j E ~i and, hence, also for every j in the connectivity domain of i defined by

F i = { m [ 3 i l , i 2 , . . . , i n suchtha t n > l ,

il = i, in = m, it+l E ~i~, I = 1 , 2 , . . . , n - 1} U (i}.

Let r be so large that kj (r) = Mj Vj E Fi. Let

n = min{j e r i l Mj = min Mr}. IEF~

From (2.3)-(2.4), an update takes place for vn at step v + 1, which contradicts the definition of r. []

We now define the 'complementary' algorithm of Algorithm 2 (in a sense which is made clear below). Let v (t) denote the vector v right after step t of Algo- rithm 2. Let T be a fixed nonnegative integer so large that

(2.10) k i ( T + l ) > O , l < i < N .

(The reason for this choice is that it yields a reasonable coloring method in Section 2.3 below.) From Lemma 2.5, T exists. Let

k = max k i ( T ) + l . I < i < N

For an arbitrary N-dimensional initial vector u, define

ALGORITHM 3. For t : T + 1, T + 2 , . . . do: for i = 1, 2 , . . . , N, if

(2.11) j E (~i and j i ~ k i ( t - 1 ) _< k j ( t - 1 )

(2.13) k i ( t - 1) < k

then update ui (using (2.1) with f - 0 and x replaced by u).

COLORING UPDATE METHODS 181

Let u (t) denote the vector u right after step t of Algorithm 3. Let T, T > T, be the first integer for which ki(T) > k, 1 < i < N. From Lemma 2.5, T exists. Clearly, Algorithms 2 and 3 uniquely define linear operators L2 : v --~ v (T) and L3 : u --~ u (r). We now show that Algorithm 3, when applied to the initial vector L2v, is synchronous, namely, its updates are done as in Algorithm 1 (with f - - 0 ) .

LEMMA 2.6. Let Algorithm 3 be applied to the initial vector L2v. For any 1 1) to be the total number of updates of vi in steps 1 , 2 , . . . , m i n ( t , T ) of Algorithm 2 and (when t > T ) in steps T + 1, T + 2 , . . . , t of Algorithm 3. Then

(2.14) mi(t) = min(k, k~(t)), t > O, 1 O) we have

(2.15) vi = (Lm'(t)v)i 1 0) for a fixed 1 0). Suppose (2.13) holds for t = to. Then, since (2.11)-(2.12) are the same as (2.3)-(2.4), we have

mi(to) = mi(to - 1) + 1 = ki(to - 1) + 1 = ki(to)

if they hold for t = to and

mi(to) = mi(to - 1) = ki(to - 1) = ki(to)

if they don't. Suppose (2.13) does not hold for t = to. Let s be the last step for which it does hold, namely, ki(s - 1) < k. From the above, we have mi(s) = ki(s) = k. Since s < to, we have from (2.13) that mi(to) = k.

The second part of the lemma is proved by induction on t (t > 0) uniformly for all 1 0) for all 1 < i < N. Consider a specific 1 < i < N. If no update takes place for vi at step to, (2.15) holds trivially; otherwise, (2.11)- (2.13) hold for t = to and, since (2.11)-(2.12) imply (2.8)-(2.9), one may use (2.13) and (2.14) to make the substitution k +-- m in (2.8)-(2.9) and obtain

j E ~ i a n d j i :~ mj(to - 1 ) = mi(to - 1 ) ,

which implies that the update is done as in Algorithm 1. []

From Lemma 2.6 we have the useful result

(2.16) L3L2 = L k.

182 YAIR SHAPIRA

2.3 The coloring method.

We axe now able to define the coloring method. Let the first color be the set of indices i for which an update takes place for vi at step T + 1 of Algorithm 2. Define the j t h color, j = 2, 3 , . . . , to be the set of indices i for which an update takes place for v~ at step T + j of Algorithm 2, minus the first, second,. . . , and (j - 1)st colors. With this construction the colors axe disjoint. Furthermore, they have the following properties.

LEMMA 2.7. I f the ith color (i >_ 1) is empty, so are all the next colors.

PROOF. Let t = T + i and F = {j I kj( t - 1) = kj(T)} . Suppose F is not empty. Let

n = min{j e r I kj(T ) = rain kt(T)}. IEF

From Lemma 2.1 (with t = T) and (2.3)-(2.4) an update takes place for vn at step t, which contradicts the assumption that the ith color is empty. Conse- quently, F is empty and the lemma follows from the definition of the colors.

LEMMA 2.8. Let c be the number of non-empty colors. Then

c

U (jth color) = {1 ,2 , . . . ,N} . j = l

PROOF. From Lemma 2.5, we have

U (j th color) = {1 ,2 , . . . ,N} . j = l

From Lemma 2.7 and the definition of c we have that the first c colors axe non-empty, which implies that all the others are empty. []

Consider the multicolor update method

ALGORITHM 4. For n = 1, 2 , . . . , c do: for every i in the nth color, do

xi +- Z (~i,jfj + Z 7iSxj" jEol j e l l

Let Lmc denote the iteration matrix of Algorithm 4, namely, Algorithm 4 with f --- 0 is represented by x +-- Lmcx. Let w be an arbitrary N-dimensional initial vector. From Lemmas 2.2 and 2.8, the ith component of w is updated in the operation LmcL2w exactly ki(T) + 1 times. Furthermore, Lemmas 2.1- 2.2 guarantee that the updates are synchronous, namely, they are equivalent to those of Algorithm 1 (with f ~_ 0). From that and (2.16), we have

(2.17) L3LmcL2 = L3L2L = L k+l.

This implies the main result of this paper:

C O L O R I N G U P D A T E M E T H O D S 183

THEOREM 2.9. Assume that L is nonsingular. Then L = L21LmcL2.

PROOF: Since L is nonsingular we have from (2.16) that (L-kL3)L2 is the identity matrix. From this and (2.17) we have

L = L-kL3LmcL2 = L21LmcL2.

COROLLARY 2.10. Assume that L is nonsingular. Then, given an eigenvector of L, an eigenvector of Line with the same eigenvalue may be obtained from it by applying Algorithm 2 to it until Step T is completed.

PROOF. From Theorem 2.9, we have Lv = )~v ~ LmcL2v = )~L2v. []

The matrix L2 can be also obtained explicitly by applying Algorithm 2 until step T is completed to the columns of the identity matrix. We conclude with the following more general result, which does not assume nonsingularity of L.

COROLLARY 2.11. L and nmc have the same spectrum. PROOF. If L is nonsingular, this follows from Theorem 2.9. Suppose L is

singular. Let L (~) and L(m ~) denote the corresponding iteration matrices for the modified update method

ALGORITHM 5. For i = 1, 2 , . . . , N do:

Clearly, Algorithm 2 yields the same coloring for every w ~ 0 in (2.18). Since IL(~)I (the determinant of L (~)) is a polynomial in w, IL(~)I = 0 for all w in a neighborhood of 1 implies that IL(~ = 0 which contradicts the fact that

co L (~ = I. Therefore, there exists a sequence {Wk)k=l such that wk "-+k-~ 1

and IL(~)I ~ 0 for all k > 1. From Theorem 2.9, L (~) and L(,~c k) have the same spectrum for all k > 1. Consider the polynomials in the variable )~ (and

- r(~) AII. F o r t Wk, k > l , t h e z e r o s o f w as a parameter) IL (~) AII and ~,mc - = _ these polynomials are the same. Since the zeros of a polynomial are continuous functions of its coefficients, they are also the same for w :- 1. []

3 Discuss ion .

For linear update methods, a coloring method is introduced whose iteration matrix has the same spectrum as that of the sequential ordering. This result is more general than that of [1] in the following senses.

�9 It applies to general update methods and is independent of the definition of grids and stencils.

�9 It guarantees similarity of the iteration matrices of the multicolor and sequential orderings, provided that they are nonsingular.

184 YAIR SHAPIRA

It is interesting to consider certain particular cases. Let d be a positive integer (denoting the dimension of the problem) and consider a d-dimensional structured grid consisting of No d nodes, where No is a positive integer. Denote a grid point by a d-dimensional integer vector of the form ~"-- ( i0, i l , . . - , id-1), where the components ij are integers between 0 and No - 1. The stencil of the coefficient matrix for which a Gauss-Seidel relaxation is considered is a structurally symmetric d-dimensional cube of 3 d coefficients. Two coloring methods are proposed:

METHOD A:

(n + 1)st color { ~" d-1 24 } = )"]~j=O 2Jij = n mod ,

n = 0 ,1 , . . . , 2 d - 1.

METHOD B:

(~'~jdz_l o 2Jnj + i ) s t color = { ~ ] i , = n t m o d 2 , 1 = O , 1 , . . . , d - 1 ) ,

n m = 0,1, m = O, 1 , . . . , d - 1.

Method A is obtained from the usual lexicographical ordering by applying the present coloring method to it. (In 2-D, it is equivalent to the method of [1].) Method B was introduced to me by Irad Yavneh and has the advantage that its vectorization is obvious. However, Method A is also efficiently vec- torized by decomposing each color as a collection of disjoint sub-colors which are aligned with the grid and therefore suitable for vector operations or by using a periodic continuation as in [7]. For Gauss-Seidel smoothing in multigrid, Method A appears to be slightly superior for certain diffusion problems with discontinuous coefficients in 3-D. We have tested Example (11) in [5] with the Gauss-Seidel smoother in a four-level V(1,1)-cycle of the Black Box multigrid method of [4]. The convergence factor of the multigrid iteration is .136, .145 and .187 (respectively) when lexicographical ordering, Method A and Method B are used. The advantage of Method A may be explained heuristically by the close relation expressed in Theorem 2.9 and Corollaries 2.10 and 2.11 between Method A and the lexicographical ordering, the natural ordering for smoothing purposes ([3]). Indeed, the red-black ordering which is known to be the best ordering for Gauss-Seidel smoothing in multigrid for the Poisson equation with the usual (2d + 1)-coefficient stencil ([6]) is obtained from the present coloring method applied to the lexicographical ordering. (An important exception to this is the periodic case, where the red-black ordering yields a bad smoother and, indeed, cannot be obtained from the present coloring method.)

The present coloring method is not always optimal in terms of minimizing the number of colors. Indeed, it is found for certain unstructured grids that colorings in the spirit of Methods A and B might use smaller numbers of colors than that of the present coloring method. Still, the number of colors used in the present method is also acceptable. In practice, it is advisable in some cases to

COLORING UPDATE METHODS 185

eliminate certain points (or even a line of points) from the grid, color the rest and then color the eliminated points accordingly. (This is recommended, e.g., for periodic problems or for irregular finite element meshes.) It is well-known that the coloring of a graph using a minimal number of colors is an NP-Complete problem; therefore, one cannot expect a general coloring method to be optimal in this sense. It should also be kept in mind that a small number of colors is not always advantageous. On vector computers, for example, arithmetical operations are usually performed on vectors of length 64; therefore, colors with more than 64 elements are not superior to colors with 64 elements in terms of vectorization.

Acknowledgment.

The author wishes to thank Irad Yavneh for his valuable comments and for introducing Method B.

REFERENCES

1. L. M. Adams and H. F. Jordan, Is SOR color-blind? SIAM J. Sci. Stat. Comput., 7 (1986), pp. 490-506.

2. R. Alcouife, A. Brandt, J. E. Dendy, and J. Painter, The multigrid method for the diffusion equation with strongly discontinuous coel~cients, SIAM J. Sci. Stat. Comput., 2 (1981), pp. 430-454.

3. A. Bremdt, Guide to multigrid development, in Multigrid Methods, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Mathematics 960, Springer-Verlag, Berlin, Heidelberg, 1982, pp. 220-312.

4. J. E. Dendy, Two multigrid methods for the three-dimensional problems with discontinuous and anisotropic eoe~cients, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 673-685.

5. Y. Shapira, Multigrid methods for 3-D definite and indefinite problems, Appl. Nu- met. Math. (to appear).

6. K. Stuben and U. Trottenberg, Multigrid methods: Fundamental algorithms, model problem analysis and applications, in Multigrid Methods, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Mathematics 960, Springer-Verlag, Berlin, Hei- delberg, 1982, pp. 1-176.

7. T. Washio and K. Hayami, Overlapped multicolor MIL U preconditioning, SIAM J. Sci. Comput., 16 (1995), pp. 636-650.

8. D. Young, Iterative Solution of Large Linear Systems, Academic Press, N.Y., 1971.

Coloring update methods

Documents

Transcript of Coloring update methods