1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is...

38
1SE-TRd-90-86 On the Efficiency of an SOR-like Method Suited by Masaaki Sugihara, Yoshio Oyanagi, Masatake Mo October 4, 1990

Transcript of 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is...

Page 1: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

1SE-TRd-90-86

On the Efficiency of an SOR-like Method Suited to Vector Processor

                          by

Masaaki Sugihara, Yoshio Oyanagi, Masatake Mori and Seiji Fujino

                     October 4, 1990

Page 2: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Qn the Efiiciency of an SOR-like Method Suited to

                   Vector Processor *

               Masaaki Sugihara

  Depαrtment of EconomiC8, Hitotsiめashi Univers吻

           Ifunitachi, Tokyo, 186 Japan

                Yoshio Oyanagi

Institzeteげlnformαtion Sciences, Univers吻。!TSttkubα

           Tsukuba,乃a rα ki,305」’αPαn

                Masatake Mori

     Fac吻・!恥卿εε7吻, Unive7’吻・ノT・ky・

          Bu吻。一紘Tokyo, 113 」αpan

                  Seij i Fuj ino

     Institute of Computational 171uid Dynamz’cs

   Haramachi 2-1-4, Meguro-ku, Tokyo, 152 Japan

                 October 199. 0

                           Abstract

      A simple implementation on vector processors of the SOR method for solving linear

   system of equations ari’sing’ from’the discretization of partial differential equations makes

   the SOR method into another, which, though, looks like the SOR met・hod, lii this

   paper, the efficiency of this SOR-like method (pseudo-SOR method) is investiga,tcd,

 “lnvited talk presented by Oyanagi at lnternatio7}al Congress en Computational an(1 A;)p17ied A・fathematics

held at Katholieke Universiteit Leuven, Belgium, on July 23-28, 1990.

                              1

Page 3: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

For the Poisson equation in a rectangular region, in both five-point and nine-point

discretization, we plove a皿alytica皿y that the optimal acceleratio皿1)arameter is smaller

and the optimal convergence rate is lower in the pseudo-SOR method. Comparison

on several vector computers was made between the SOR method vectorized with the

hyperplane technique and the pseudoSOR method. lt turned out that the hyperplane

SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on

vector processors. We also tested an example of Poisson equation descretized in a curved

coordinates in a region bounded by two eccentric circles and found numerically that

the optimal acceleration parameter becomes small and the convergence becomes slow in

the pseudo-SOR method, while the optimal acceleration parameter in the SOR method

does not change appreciably. The genuin SOR method is superior to the pseudo-SOR

method.

1 Introduction

    Various vectorization techniques have been proposed to improve the computational speed

on vector processors. The essential ingredients are to remove recursion in loops and to make

the vector length as long ag possible. We will consider the vectorization (>f the SOR(Successive

OverRelaxation) method for two-dimentional partial differential equations.

   The following discussion holds for general linear second-order elliptic partial differential

equations. For simplicity, however, we will start with the Poisson equation in a rect angular

re 9i・nの={@,y)10<x<1,0<y<1},

                △一1舞+1舞一・・nD,姻一ψ(ゆn∂P・ (1)

   Discretization of the second derivatives of (1) by five-point or nine-point finite differences

on an (N 十 1) × (N 十 1)一grid with grid spacing h = 1/N leads to a set of linear equations:

σ臼,ゴ+σi+、,ゴ+σらゴー、+σらゴ+一4Ui,ゴ=’ O (2)

or

4(Ui-i,」 + Ui+i,」 + Ui,」一i + Ui,」+i) + Ui+i,」一i + Ui+i,」+i + Ui-i,」一i + Ui-i,」+i 一一 20Uif一一・一一 O (3)

for i,2’ 一一“= 1,2,...,N 一 1, which form a system of linear equations for t,he solution vector U.

2

Page 4: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

   Eqs.(2) and (3) can be solved with the polular SOR method. We vvill show here a program

for the five-point case:

   DO 10 J=1, N-1

    DO 20 1=1, N-1

      U(1,J)=U(1,J)+OMEGA*(O.25*(U(1,J-i)+U(1,J+1)+U(1-1,J)+U(1+1,J))一U(1,J))

20 CONTINUE10 CONTINUE

Here, OMEGA(w) is the aceeleration parameter. Running with i and 2’ from 1 to IV 一一 1, two

neighboring grid points of (i,1’), (i 一 1,」’) and (i,2’ 一一 1) have already been updated, so that

Ui,」 i computed with these new values and old values of Ui+i,」 and Ui,」t+i. The SOR method,

therefore, is recursive in both i一 and 2Ldirections. This data dependency makes this loop

hard to be vectorized.

   There are some well-known techniques to vectorize the SOR method [1]. We will consider

here a so-ca皿ed hyperplane method[2], which essentially reorders the numbering of the lattice

points without changing the algorithm. The program will be as follows:

  DO 10 K=2, 2*(N-1)

*VOPTION NODEP(U)

    DO 20 J=MAX(1,K+1・一N), MIN(N-1,K-1)

      工嵩K-J

      U(1,J)=U(1,J)+ONEGA*(O.25*(U(1,J-1)+U(Z,J+1)+U(r-1,J)+U(1+1,J))rU(1,J))

20 CONTIN’UE10 CONT工]EfUE

This program is methematica皿y equivalent to the SOR method. The inner loop has no

recursion and the access to the array U is with only constant strides, Howe’ 魔?s the hyperplane

SOR method involves short average vector length (N 一 1)2/(21V 一一 3) f」 N/2 and ordinary

vectorizing compilers cannot recognize that there are no data dePendencies, so that we have

to insert a directive, *VOPTION in this example, to the compiler. For some vector computers,

access with stride more than one may cause a decrease in the data transfer rate from memory

to vector registers and shorter vector length may increase relative start-up time. Nine-point

discretization can also be processed by a hyperplane method, but with average vector length

fu N/3.

   A more naive implementation of this loop on a vector processor is to modify the program

as

3

Page 5: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

  DO 10 J=1, N-1

    DO 20 1=1, N-1

      V(1)=U(工,J) + OMEGA*(0.25*(U(1,J-1)+U(1,J+1)+U(1-1,J)+U(工+1,J))一U(1,J))

20 CONTIN’UE    DO 30 工=1, N-1

30 U(1,J)=V(1)

tO CONTINUE

In this program七he data dependency in the SOR method is removed so that the loops

can be successfully vectorized. The program involves only stride one access to the memory.

Obviously this is a different algorithm, since Ui-i.・ on the right hand side comes from the old

i七eration Ievel in contrast to the SOR method. We will call it the pseu(1(”SOR method. The

pseudo-SOR method is sometimes used as a simple vectori zation technique in computational

fluid dynamics.

   In the next section, the optimal acceleration parameter w and the rate of converngence of

the two methods for five point discretization is discussed. Similar arguments for nine-point

discretization are given in section 3, Numerical results on various vector processors are shown

in section 4, inc}uding one for curvilinear coordinates. ln section 5, we will conclude that the

hyperplane SOR is superior to the pseudo-SOR method.

2 Comparison of SOR and pseudo-SOR Methods for

Five-Point Discretization

2.1 Matrix Formulation

Equation(2) forms a system of liner equations with matrix As for the solution vector u. If

we order the elements in the solution vector lexicographically as

              (Ul,1)Ul,2;…7Ul,N一一1)112,1)U2,2)…)U2,N一一1)..・)UN-1,N-1))

the coefficient matrix A.r is,

4

Page 6: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

with

Bs =

 1

-1 4

o

 1-m一@4

 1

As :

 1

 4

一  一  一

Bs

Cs

o

o

一一一 獅P

oBs Cs

一  一  一 “  一  一

   一  一  一 一  一  一

   Cs Bs Cs

       Cs Bs

         一1

          4

,Cs =

o

1

4

1

4

o

1

4

   The iteration in the SOR method is written as usual,

        u(m+i) = lcgORu(m) + b’ = (1 一一 cvL,SOR)一’((1 一 c,7)1 + cvU,SOR)u(M) + b’

where

and

 SORL,Dv. =

一.Bま

一〇、一瑳

o

o

一C, 一B,L

B,L =

σ『o丑=(五ぎOE)T.

o

一一 1

 4

o

o

o

一1 0 4

   On the other hand, the iteration in七he pseudo-SOR method is forlnulated

     u(m+i) = ,ce,一SORu(m) + bt ,., (1 . cdLe,一SOR)一i((1 一 co)1 + cou,p’SOR),t,(m) + btt

where

                              o     O

                            -C, O

                   .Lg-SOR,=1 …,.. 1,

                             0 一C,0

                                  5

(4)

(5)

(6)

(7)

(8)

(9)

(10)

Page 7: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

and

             (一B,L-B,U) 一Cs O                        (一B,L-B,U) 一Cs

  u,p-SOR=1 ... ... 1, (11)                                        (一瑠一瑠) 一〇5

                 0 (一 B,L-B,U)

with B,U 一一一: (B,L)T.

2.2 Acceleration Parameter w and the Spectral Radius

The order of convergence of Eq.(6) and Eq.(9) is determined by the spectral radius of LsSOR

and Lg’一SOR, respectively, As an illustration, p(LgOR) and p(Lg-SOR) are numerically esti-

mated for IV = 6 and shown in Fig. 1. The result shows:

  1.The optimal accelelation parameterω。pt f()I the pseudo。SOR method is substantia皿y

    smaller than that for the SOR method.

  2. The rate of convergence of the pseudo-SOR method using the optimal acceleration

    parameter is worse than that of the SOR method.

  3. The region of the acceleration parameter which ensures the convergence of the iteration

    is narrower for the pseud()一SOR method.

In the next subsection, we will generalize the results to arbitrary N.

2.3 Analytic Determination of the Optimal Acceleration Param-

     eter of the SOR and the pseudo-SOR Methods

The optimal acceleration parameter of the SOR method for our case and the correspon(ling

spectra} radius is given by Varga [3]:

                cvopt = ir一 Fgtllk’lki;ii’in .z1.),Aopt =p(LgOR(cvopt)) = cc一,.pt 一一 i. (i2)

Asymptotically,                c-o.,t =2 一T 3i’li + o( 1N2), A.t =i一 3i’; + o( ?」}i,T)・ (i3)

                                   6

Page 8: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

   On the other hand, we will later show that the optimal acceleration parameter for the

pseudo-SOR method w.pt and the corresponding spectral radius A.pt are given as the solution

of the following equation with respect to w and A:

                  憶鰍潜醐 (14)

The asymptotic estimates are:

             ,..,, = g一 g(.z[一)2+ o(lieT, ), A.,, =1 一一一 (ft)2 + o(7(!:,r). (ls)

We note that the optimal spectral radius A.pt approaches unity in O(N-i) in the SOR

method, while in 0(IV-2) in the pseudo-SOR method. The observation for IV == 6 given in

the previous subsection also applies to large N.

In order to derive Eq.(14), we will find all the eigenvalues of Cg-SOR by solving t he secular

equation. We note t he following equality holds:

det(cg-SOR(c,7) 一一 Ai)

   = det((1 一 cvLg-SOR)一i.((1 一 co)1 十 cvU,PtaSOR) 一一 AJ)

   =(det(1-cv L9曙SOR))一1 det((レω一λ)1+λcvLg-SOR+ωσぎ一SOR),

(16)

We have only to solve the equation

det((1 一 c,7 一一 A)1 + AcvLg-SOR + co u,p-SOR) = o. (17)

The matrix in the parenthesis is a block triangular matrix:

E  F

A.F E lii

    -  e  一 一  一

o

o

AF E

    N 一一 1 blocks

7

Page 9: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

with

E=

1一一 cL2 一A

   里   4

o

   里   4

1一 cv 一A

  一  一  一

里4

一  一  一

4

o

1一ω一λ  景

   ¥ 1一一cv一一一A   4

N 一一 1 elements

       wand F= =1.       4

We will show the following lemma.

エemma 1ゐθ田αnd・C be squαre mαtrices・∫・rder・n,抗θノ・〃・ω吻iden鞠ん・lds・

det

B  C

AC B

    一  一  一

o

C

一  一  一

AC

o

B C

AC B

 : det

Proof Since the first equality is self-evident

matrix of order m with

A=

B V‘]iCC

>仮O B vCXc

      一  一  一 {  一  一

           viXc

o

o

B vi;Slc

vi51C B

m blocks

               一壷d・t(B卿),一一2…(m律1)・(18)

                     , we will prove the second one. The following

    null dia,gonal elements is diagonalized by an orthogonal matrix P.

  i O 1 l  l 62     ... .” 1 -pl .” Ip-i. (lg)

         1 0 1 1 1 6m-i

Replacing ea・h element aiゴwith a bl・・k曜rσand adding B t・the diag・nal b1・cks, we

8

Page 10: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

have the folowing relation between blocked matrices of ordel nm.

    β〉「AC  O B+δ1>『o   ’ 0   ~/XO    jB    ~/ヲ(C                                   1ヲ十δ2 v!天0

          ..,  ...        =戸                  ...            戸一1

                                                                               ,

               ●  .  ●             o  ●  ●                                                            ・  ●  0

    0   ~バoB   O     B+δm>ro                                                                          (20)

      ガwhere」P is a matrix of order nm extended from、P, where p‘ゴis replaced with a block, jpiゴ1,、.

Estimating the determinants on both sides, we obtain Eq.(18).          Q.E,D.

    We applyもhis lemma to Eq.(17)and丘nd that the equation is equivalent to the system

of equations:

                       d・t(E+δゴ〉伐∬)一・,δゴー2…(影)

forゴ=1,_,1V-1. Si皿ce」醒十6ゴ〉「A∬is again a七ridiagonal matrix, applying this lemma

again, we丘nd that Eq.(17)is equivalent to the sysmtem of equations forλ:

          1一(1+生   4)ω一λ一害孤ω(ゴー1,…,N-1;k・・= 1,...㌍)(21)

The following proposition holds:

Proposition l Theαろsolute vαlues o/th e solutions o/th e egzzαtion(21?is smα〃eT thαn or

eguαl to

   ・The 1αrgest abs・lute vαlue・f the s・lutions for 1一(1一圭。・s(k))ω一λ=一去。・s(持)V伐ω

     07’

   ・1+圭。・s(T))ω一1・

If this proposition is proved, it is straiglltforward to show thatω。pt is given as the solution

of Eq.(15)、

Pr・gf Eq・(21)・an b・ ・xpressgd a・ th・fa血1y・f quad・atic equati・・曲rλ,

                       1一αω一λ                                 =士β〉「A,withα〉β≧0.             (22)

                           ω

   We will first discuss the case forβ.>0. Figure 2 sllows that Eq.(22)has two real solutiolls

for O<ω≦ω1 and two complex solutions forω1<ωwith the absohlte valueαω一1, where

                                     9

Page 11: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

ω1=2/(α十〉両).For real solu七ions, the larger one monotonicaHy decreases asω

increases and the absolute value of complex solutions increases as w increases.

   It c an easily be understood that the absolute value of the complex solution of Eq.(22) is

smaller than or equal t o (1 十 t2 cos(ft))w 一 1, where 2 cos(ft) is the maximum value of 6k.

   Let us consider the re al solutions. Figure 3 shows the curves for two sets of equations

with the same a and different fi, fi2 〉 fii. lf the equation with fii has real solutions, the

equation with fi2 has also real solutiohs whose larger value is greater than that for 6i. Figure・

4 shows the curves for two sets of equations with the same fi and different a, or2 〉 ori. It

can easily be seen that, if the equation with a2 has real solutions, the equation with ai has

also real solutions whose larger value is greater than that for a2.

   From these observations the absolute value of the real solution for Eq.(22) is always

smaller than or equal to the larger solution for Eq.(22) with the smallest or and the largest

6. The real solution of Eq.(21) is no greater than the larger solution of

                   1 一一 (1 一 icos(ft))cL, 一一 A = 一icos(ft-v/]Xcv).

This completes the proof for fi 〉 O.

   If 6 = O, Eq.(22) is a linear equation. The same arguments hold if real and complex

solutions are understood as positive and negative solutions, respectively. Q.E.D.

   The numerical results of w.pt and A.pt are shown in Table. 1.

3 Comparison of SOR and pseudo-SOR Methods for

Nine-Point Discretization

    In this section we wi皿analyse the SOR and pseudo-SOR method for the nine-point

discretization of Poisson equation, Eq,(3). In the SOR method, Ui,,・ i s computed with four

new values Ui-i,」’, Ui-i,」’一i, Ui,」’.i and Ui,」+i, while in the pseudQSOR method three new

values Ui-i“・一i, Ui“’一i and Ui,」+i are used, so that we expect the degradation in using pseude

SOR method may not be so large as in the five-point discretization.

10

Page 12: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

3.1 Matrix Formulation

For the nine-point discretization, the linear equation is described in the matrix form,

                                  B, Cg’ O

                                  Cg Bg Cg

                    Agu=b, Ag=1 ...... 1, (23)

                                          Cg Bg Cg

                                   O C,B,

wiもh

               1-9 0  -t一売  0

              一一去1一去 .  一売一一去一売

       Bg=1 ”.”. 1,C,一1 .” ”, 1. (24)                                                      1    1    1                       一  一  一 一  一  一                                                     20 5 20

               0 一gi7 kO 一“,一g

   We皿ote that Og is not a diago皿al matrix in contrast to O5. The iteration in the SOR

method is,

         .(m+i)=,C80Ru(M)+bt==(1一一ctiL,SOR)一i((1-cu)1+coU,SOR)u(M)+bt (2s)

where

            一瑠       o     o     O

            -C, 一B,L 1 1一一gOL,SOR eI .,. ., 1,B,L一一.p 1 ...... 1, (26)

                       一C,一B,L 1 1 .一一・

             0  一σ9一増  0  一去・

and

                              U,SOR ., ,(L,SOR)T. (27)

   On the other hand, the iteration of the pseudo-SOR method for nine-point discretization

is formulated in the same way as for five-point discretization,

    u(m+i):,cB-SORu(m)+bt=(1-cvLgTSOR)tsi((1-cv)1+cou,P-SOR)u(m)+b” (2s)

                                    11

Page 13: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

where

                              O      o

                             “Cg 〇

                   五召一50R=  ...,..   ,    (29)

                                     一一C, 0

                              0 一一 C, O

and

             (一Bポ殉 一〇・     0                        (一B,L-B,U) 一Cg

   u,p-SOR,.1 ... ... 1. (30)                                        (一B,L-B,U) 一Cg

                 O (一B,L-B,U)

3.2 Acceleration Parameter w and the Spectral Radius

First we estimated numerically the spectral radia of LgOR(w) and Lg-SOR(w) for IV = g. The

results are shown in Fig. 5. The same statements holds as in Section 2.2. Although three

new values are used to update Ui“・, the pseudeSOR method is considerably inefficient, We

will discuss the convergence of these two methods for general N.

3.3 Analytic Determination of the Optimal Acceleration Param-

     eter of SOR and pseudo-SOR Methods

In a similar manner to the five-point case, we have to solve det((1 一一 cv 一 A)1 十 Acx7LgSOR 十

wUgSOR) == O, to obtain the eigenvalues of LgSOR. Using Lemma 1 twice, the problem is

reduced to a set of equations,

        レω一λ一6ゴ(ω/5)vCX 一 6,(ω/5)(λ一(δゴ/4)孤)(1一(δ、/4)一N/‘iY)=0, (31)

for 2’ 一一一 1,...,IV 一一 1,k = 1,...,N一 1. Based on our numerical experience, we conjecture that

the optimal acceleration parameter w.pt is given by the w for which the largest real solution

coincides wi七h the absolute value of the complex solutio皿of the quartic equaもion wi七h Iespect

to A,

               (1 一 ct7 一一一 A)4 一一 262(8 + 62)(!ll)2A(1 一一一一 cv 一 A)2 + 68(llll)‘ +

                                  12

Page 14: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

                166‘(w5)3A(A+ 1)(1 一 cv 一 A) 一一一一 466({ilA(A+ 1)2 = O. (32)

Assuming this conjecture, we calculated w.pt and A.pt numerically for IV S 1000 and found

an approximate formula:                                               5.61                                  6.63

                        cuopt f;2一 :[i’i(1:VV) Aopt f s 1一” E’iiiir:V““・ (33)

   On the other hand, in a similar manner to the伽e-point case, the opもimal acceleration

parameter and the corresponding spectral radius for the pseudo-SOR method are determined

in terms of the se七〇f equation:

                 1 一 (1 一 g ,os(ft,)),, 一 A = 一!ill ,.,(li(i,)(2 + ,.,(ft,))vlx

                        2 .r

                A=(1-Eicos(:ii’7・))co-1. (34)

Asymptotically,

             ,,,.,=IUti一一:51(k)2+o(liel,T), A.,,=1-g(ft)2+o(lier,). ’ (3s)

Table 2 shows the numerical values for w.pt and A.pt. VSie can see that the optimal accelera-

tion parameter for the pseudo-SOR method is substantially smaller than that for the SOR

method and the rate of convergence of the pseudo-SOR method using the optimal acceleration

parameter is worse than that of the SOR method,

4 Numerical Results on Vector Processors

    We have seen that the pseudo-SOR method is inferior to the original SOR method in

terms of the convergence ratio for both five-point and ninepoint discretization, so that the

former method requires more number of i七erations than the latter. On the other hand, the

pseudeSOR method is very suited to vector supercomputers, since it involves only stride

one access and has longer vector length than the hyperplane SOR. We will implement those

method on various supercomputers and compare the CPU time for solving the problem.

13

Page 15: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

4.1 Dirichlet Problem on a Rectangular Mesh

As an example, we will consider a Dirichlet problem on the unit square with the following

boundary conditions:

                U(O,g)=O, U(1,y)=O (O〈y〈1)

                u(x,o) = o, U(x,1) = O.5 一 lx 一一一 O.51. (O 〈 x 〈 1)

4.1.1 Numerical Verification of w.pt for IV =6

As a numerical test, we discretized the square by six in one direction (N = 6) and solved the

equation. We show in Figs.6-9 the behavior of convergence for SOR method and the pseudo-

SOR method for both five-point and nine-point discretizations in ternrs of the Li-norm of

the residual vector. The initial vector u(O) is a zero vector. The calculations were performed

in the double precision. Figures (a) are for cv 〈 wopt and (b) for w 〉 co.pt. The theoretical

optimal acceleration par ameters w.pt for N = 6. are given in Table 1. Those figures show that

the predicted w.pt is actually optimal.

4.1.2 Performance on Vector Computers

In order to make a realistic evaluation, we have tested the same problem with more grip points

and measured the performance on various vector supercomputers. We only applyed the five

poin七discre七ization, We tested七en cases with/V=45,63,77,89,99,109,119,ユ27,135,141.

The acceleration parameter is fixed on the thgoretical optimal value. The initial vector is

null and the computations were done in the double precision. The convergence criterion is

that the L2-norm of the residual is less than 10-6. The computers used are,

                      Vendor Model Max. Speed

a) Fujitsu

b) Fujitsu・

c) Hitachi

d) NEC

e)C・皿vex

f) Ardent

g) Stellar

VP-200

VP-400E

S-820/80

SX-2

C-1 XP

TITAN

GS 1000

570 MFLOPS

ユ700MFLOPS

2000 MFLOPS

1300 MFLOPS

 20 MFLOPS

  ユ6MF:LOPS

 40 MFLOPS

14

Page 16: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

The CPU time spent till the convergence is shown in Fig.10 as a function of the number

of grid points, The hyperplane vectorized SOR is superior to the pseudo-SOR for all the

supercomputers tested here.

4.2 Dirichret Problem on a Curved Mesh

In general curved coordinate (〈, ny), the ’Laplacian contains a cross term a2u/O〈Oo, which

is discretized in terms of (Ui+i“・+i 十 Ui.一i,」’.i 一一 Ui+i,」一i 一一 Ui-i,」’+i), so that the discretized

equation is a nine-point difference formula in general [4]. We test here two cases, a) where

the numbers of grid points in two dimensions are the same, and b) where the number of grid

points in the horizontal direction is larger than the one in the vertical directibn.

   a) The domain is shown in Fig.11. The boundary values are 20 on the top, 50 on

the bottom and have interpolated values on the right and left sides. The grid points are

462,622,782,902,1022,1102,1182,1262,1342,1422. The initial value is all O and the conVer-

gence condition is that the L2-norm of the residual vector be less than 10-6. Since the optimal

acceleration parameter is not known theoretically, we searched the optimal one in step O.02.

The CPU time for the SOR and pseudo-SOR methods is shown in Fig.12. The hyperplane

vectorized SOR method is much faster than the pseudfrSOR method even in the nine-point

discretization.

   b) In this case a horizontalIy long domain with curved boundaries is treated(Fig.13). We

tested three types of grids, 40×82, 40 × 162,40 ×322. Other conditions are same as in a). Table

3 shows the CPU time for VP-200 and S-820/80. Although a long vector length is favorable

for the pseudo-SOR method, the result in this table indicates, the larger the number of grid

points, the faster the hyperplane SOR method as compared to the pseud(>SOR method.

4.3 Mixed Boundary Problem on a Circular Mesh

In this subsection we wi11 discuss Poisson equation

                         O2u 02u                         52tilii + 3ii71/ = 一一2sin(x)sin(y)

in regions bounded by two eccentric circles, as shown i.n Fig,14. VkTe tested six kinds of

grid with different eccentricity and concentricity [5]. Such grids are used in calculating

                                    15

Page 17: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

the stream around a circular cylinder. Two kinds of boundary conditions are assumed: a)

Dirichlet contition on all boundaries, and b) Dirichlet on the inner boundary and Neumann

on the outer boundary. The size of the grid is 16(r-direction) × 50(e-direction). The analytic

solution is u = sin(x)sin(y).

   The optimal acceleration parameter and the corresponding spectral radius are searched

numerically in step O.02. The results are shown in Table 4 for the six kinds of grid in Fig.14.

The optimal acceleration parameters in the hyperplane SOR method are stable for both

Dirichet and mixed conditions with respect to the eccentricity, while those the pseudo-SOR

method changes appreciably. ln the case of hyperplane SOR method, the optimal acceleration

parameter for the mixed condition is larger by O.12-O.18 than that for Dirichlet condition,

while in the case of the pseudQSOR method, there is no difference.

   Moreover, the spectral radius for the optimal acceleration parameter for the mixed con-

dition is larger than that for Dirichlet condition. Especially, the spectral radius for the

pseudo-SOR method is nearly 1.0, the convergence limit. Fig.15 shows the numbers of iter-

ations for different acceleration parameters. It is to be noted that the pseudo-SOR method

is very critical, in the sense that an acceleration parameter slightly larger than the optimal

value may cause divergence of the iteration. On the other hand, the converge’nce region for

the hyperplane SOR method is large.

5 C o nclusio n

    We have aI1記ysed both theoretically and numeIica皿y the performance of the genuine

SOR and pseudo-SOR. Our observation is as follows.

  1. For the rectangular region, in both five-point and nine-point discretiz ation, the asymp-

    totic convergence rate is 1 一 C/IV for the SOR method, while 1 一 C’/N2 for the pseudQ

     SOR method. Although the latter is easily vectorizable, the overa11 efliciency of the

    former is higher.

  2. Even when the aspect ratio of the grid is large, which case is favorable for the pseudo-

     SOR method, this method is inferior to the hyperplane SOR method.

3. ln the curved mesh discussed here,

16

Page 18: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

      (a) As the eccentricity increases, the optimal acceleration parameter becomes small

         and the spectral radius approches unity in the pseudo-SOR method. On the other

         hand, the optimal acceleration parameter in the hyperplane SOR does not change

         appreciably.

     (b) ln the pseudcHSOR method, acceleration parameter slightly larger than the opti-

         mal value may cause divergence.

      (c) When the outer boundary is with Neumann condition, the spectral radius for the

         pseudo-SOR method is nearly unity.

   We conclude the pseudo-SOR method is less than the genuin SOR method in efficiency

on vector supercomputers, although the former is easily vectoriz able with unit stride. The

rate of convergence should be most respected in the vectorization of an algorithm.

AcknowledgementThis work is partly supported by the Grand-in-Aid for General Scientific Research of The

Ministry of Education, Science and Culture of the Japanese Government (No.01540169).

References

[1] J. M. Ortega, ”lntroduction to Parallel and Vector solution of Linear Systems”, Plenum

    Press, New York and London, 1988,

[2] L. Lamport, ”The Parallel Execution of DO Loops”, Comm, of ACM, 17, pp.83-93, 1974.

[3] R. S. Varga, ”Matrix lterative Analysis”, Prentice-Hall, New York, 1962.

[4] J. F. Thompson, Z. U. A. Warsi, ”Numerical Grid Generation”, North Holland, 1974.

[5] T, Kawamura, H. Takami, K. Kuwahara, ”Computaion of High Reynolds Number Flow

   around a Circular Cylinder with Surface Roughness”, Fluid Dynamics Research, 1,

   pp.145-162, 1986.

17

Page 19: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Table 1 co,pt and A,pt for five-point discretization

SOR method pseud(トSOR method

ω。μ   λ、ゆ ω。μ  λ。μ

1V=6 1.33333  0.33333 1.23431   0.76878

ノV=10 1.52786  0.52786 1.29285   0.90764

1V=20 1.72945  0.72945 1.32259   0.97584

1V=50 1.88183  0.88183 1.33158    0.99606

2V=100 1.93909  0.93909 1.33289   0.99901

1V=1000 1.99373  0.99373 1.33332   0.99999

Table 2 wopt and Aopt for nine-point discretization

SOR method pseud(トSOR method

ω。Pε λ。P・ ω。P・  λ。μ

1V’=6 1.31393  0.37071 1.26184   0.69896

1V=10 1.50902  0.56335 1.35459   0.86991

ノV=20 1.71627  0.75377 1.40799   0.96425

!V=50 1.87542  0。89351 1.42517   0.99411

N=100 1.93567  0.94529 1.42772   0.99852

ノV=1000 1.99337  0.99439 1.42856   0.99999

Table 3 CPU time in sec for the optimal acceleration

grid 40×82  40×162  40×322

pseudo-SOR

?凾垂?窒垂撃≠獅?@SOR

0.49     2.55      17.2

O.27     0.72      3.29

(VP-200)

grid 40×82  40×162  40×322

pseud(トSOR

?凾垂?窒垂撃≠獅?@SOR

0.17     0.82      9.08

O.08     0.27      1.69

(S-820/80)

18

Page 20: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Table 4 tu,pt and popt for an eccentric double circle

mesll pseud(》SOR hyperplane SOR

ω・μ ρ・μ ω(μ   ρqp舌

Fig.14(a) 1.22 0.971 1.68  0.720

Fig.14(b) 1.12 0.971 1.70  0.717

Fig。14(c) 1.00 0.977 1.70  0.782

Fig.14(d) 1.20 0.970 1.68  0.720

Fig.14(e) 1.14 0.975 1.72  0.766

Fig.14(f) 1.02 0.975 1.72  0.766

a) Dirichlet condition on both boundaries

mesh pseud(トSOR hyperplane SOR

ω・Pε ρ・P診 ωqP舌  ρqp舌

Fig.14(a) 1.26 0.991 1.84  0.840

Fig.14(b) 1.16 0.993 1.88  0.856

Fig.14(c) 1.02 0.994 1.82  0.953

Fig.14(d) 1.26 0.998 1.88  0.900

Fig.14(e) 1.16 0.998 1.88  0.913

Fig.14(f) 1.00 0.997 1.80  0.924

b) mixed boundary conditions

19

Page 21: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Figure Captions

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

Spectral radii of L,SOR(cx7) and Lg一一SOR(cv).

Parabola士βへ/:X and lines(1一αω一λ)/ω)withω〈ω1,ω=・ω1 andω〉ω1・

Line (1 一 cvco 一一 A)/cv) and parabolas ±x3v’ZX for (6i 〈 P2).

   Par ab ol as ±60 V‘jX and Iines (1 一一 cMcv 一 A)/co) for (cvi 〈 cu2).

   Spectral radii of C,SOR(cv) and 1 g-SOR(c,?).

   Convergence behavior of the SOR method with five-point discretization.

   Convergence behavior of the pseudo-SOR method with five-point discretization,

   Convergence behavior of the SOR method with nine-point discretization.

   Convergence behavior of the pseudo-SOR method with nine-point discretization.

   Comparison of the CPU time on several vector computers (five-point discretization).

   The horizontal a・xis is the number of grid points and the vertical axis is the CPU time

   in seconds.

11. The domain for analysis.

12. Comparison of the CPU time on several vector computers (nine-point discretization in

   curved mesh).

13. The domain for analysis.

14. The grid systems of eccentric double circles.

15. The number of iterations needed for convergence as a function of the acceleration pa一

rameter for (1) pseudoSOR and (II) hyperplane SOR method, The boundary condition

is Dirichlet and the grid systems are (a), (b) and (c) in Fig. 14.

20

Page 22: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

・,Ji i)

 le5

t6 i.o

           O.5t(cos(g))=v一? 一ii =o.437t一..

11

0.0/

 1eO

1

1

P(瑠凸))

1

1

111

    ロ    i

ア(劇ω  1))

    i

    i

    :

1.5 2.0 u,)re laxat ion factor

F  i g . 1

Page 23: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

一 ct

1

p /x

j)t一’

    (W = U・)1)

F i g. 2

22

Page 24: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

一輪

1

ρ2灰

Bi 1X

  :  i  i’

  /

 F i g. 3

x

一戸i!叉

一P 2M

23

Page 25: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

1

噺吻

   2@  i@  i●圏幽  」■●  鯛■●  ■嗣) ●■画 一   1…フ、凶ω一λ

危1天

A

・巫

C‘Ci)

Fi g. 4

24

Page 26: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Jl’

g(cos(g))一fg=o.34i6...

1.5

も0 P(縄澱ω))

L一鋤。 ■9■■DO■顧齢●■馳■一一■ ■一■■爾 ロー ●願■・騨。■膠口■.陶

8紆               {

@             3

P/ P(趣))0.5 1                 1

!1                  1

@              62

46…33

()0

@ 1.0 1.5    2Drelaxation facto

F  i g . 5

Page 27: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

ほ邸

℃リリ

ga

o』

ロ〇一一s’

翼,gi IT,

静碓1.3ω箋i∵

の6;

℃  の

の①』

コ〇一

fo一

章鶏

llω議

9i 、鍋慣ul =1.7

to =1.9

to =1.8

o No. of iterations

(a)

50 o No. of iterations

(b)

so

F i g. 6

Page 28: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

5℃   の

mの』

の〇一

”i

鱒?3

担hヨω=ωopt        園口。」

覇補

to =1.0

 /

 争ω=1・1to tl.2

;ゴ=1●鶴1・6

93

o No. of iterations 50

 e“;, .F

蓉8塁

鉾ヨ2  ヨ

?3

鷺蜀績

CO = (D OPt

w =1. 5

to =1.4

ul =1.3

o No. of iterations so

(a) (b)

F i g. 7

Page 29: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

〇一 9

り㊦=

で  ロ

ca

o』

:, g

9ヨ

難9耳

翼。”,...

9

/」ωoptヂ

(D =1.0

    to =L2to =1. 3

 邸,

での  ロ

co

o』

一コ

雪当

日鷺電もヨ

ー{ωopt

顎a

 ’lx

to tl.4

so

to =1.9

            w =1i8(D =1e6      to =1.7

oNo. of iterations o No.’ of iterations se

(a)(b)

F i g. 8

Page 30: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

一,

℃ のり

の。』

o.,..,

.,1

翼12 i’

〇一

   /

to = to opt

(D =1.0

 亦

(D ’= 1. 2

の¢コ

℃ ロ

の①

 o肖

一ロ

0の甲

0り門ほ

〇一〒

oの頃1

の㌣

oの氏8

 oI

 3.ミ

g・ {

tl g

襲ちヨ

(D =1.9

   Nto =L8     a) =1.7

to = w opt

to =1.6

to =L4

叡  w =1.3

“3一一一一’N一一一一一一一一一一一一一一一一一一一’r一一一’一一

R

o No. ef iterations

(a)

50 o No. of iterations

(b)

so

F i g一 9

Page 31: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

8「一一一「6い_。R/l

8[

6,i

41

2 1

mo

     1 04

(a) vp-200

pseudo-SOR

     Hyperplane

      1 04

(b) VP-400E

8, L,

4 [i

2[

  h

2Xl 04

   20 Oi

2Xl 04

   200

pseudo-SOR

      Hyperplane

8

6F

41 1

oL∠一===========コ          104 2XIO4

    (c) S-820/80

 r

 ト

2[

o

pseudo-SOR

Hyperplane

200

 r

r pseudo-soR t

L. /Hyperpianeo1

     1 04

(d) SX-2

2Xl 04

    L

    L

    L

    L

    l

loor    江

    t’

o

pseudo-SOR

     Hyperplane

l

ili oo

ll

l

    L

lOOY    L

    l

    r

    い

   o

pseudo-SOR

ffyperpianel

        l

     1 04

(e)C-1 XP

2Xl 04      1 04

(f) TITAN

    F  i g

 2Xl 04

. 1 0

    104 2xlO4

(g) Gslooo

fiii]

Page 32: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Y

X 扁

Page 33: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

40u2。[

    セ

 0

{103

    「

     r

     亡

1 03/!

     1

     「

     「

     の     l

     P

   O

,se、d。.s。R 40i

           Hyperplane

                     lO

      104        2×104

(a)VP-200

                         「

                  1  「

                         [

                         に                         t

・seudo-SOR @    「                         ロ                         し

                  }   0

      104       2×104

(e)C-1 XP

            2 Ot

                                         /

         2×103「一一一「2×103し                                             ii

  pseudo’SOR / 1

                     20

           Hyperplane i .

      一t, 2xl o//

(b) VP-400E

                  l

    pseudo-SOR/ 1 l o3

         kyperpian6

                  i

                       o

      104 2xlO4

(f) TITAN

      F  i g . 1 2

                     i40

      pseudo-soR /20

           Hyperplane

       104 2XIO4 0

(c) S“820/80

      pseudo“SOR

        Hyperp lane

       104 2XIO4

 (g) Gslooo

pseud6-soR /zl

      1 04

(d) SX-2

Hyperplane       ;

     2Xl 04

Page 34: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Y

ヨ嵩

El:ti#. il”一:i,}:ILtr’.?.J.].1

螺こに園

海気雌遡i雌

耳隅司塾㌃.一一

      鋼製総懸

一L・“U.1.“b

墾 X E3

F i g一 1 3

Page 35: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

Y Y

X

N

X

(a)Y

(d)Y

X X

(b) (e)

X

(c) (f)

Fig一 1434

Page 36: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

No. of iterations

300:’ic・=r.m

1

200

100  0e9

雛 、へし

@  ’⊆L     、

5                         5                           .

@       ∫        5

、口、  、

@  、口、、 、口、   、《國L-r■」     一

一:

1(b)       「

P

(a)

D

1eO        lel le2

  aceeleration parameter

Fi g一 15 (1)

le3

Page 37: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

No. of iterations

50●1、

q 、 、

、、、 \●

、、 11口 、 !●

I

、自 ノ@ノ

獣\

40、、b 、  ’

@ノ@’

ヒ●、

  (c)、、 、 、

   1@ 1@ !

@1鴻m

mヒ 、

Q(b)

、、 、 、

  ’

@ノS

b”

3

(a)

2望601.70

acceleration parameter

F  i g. 1 5 ( II)

36

Page 38: 1SE-TRd-90-86 On the Efficiency of an SOR-like Method … · 2020. 4. 1. · SOR is superior to the pseudo-SOR method although the latter is easily vectorizable on vector

INSTITUTE OF INFORMATION SCIENCES AND ELECTRONICS

              UNIVERSITY OF TSUKVBA

           TSUKUBA-SHI, IBARAKI 305 JAPAN

REPORTA

DOCUMENTATION PAGE

REPORT NUMBER

      工SE-TR-90-h86

TITLEOn the E伍ciency of a:n SOR-like Method Suited to Vector

Processor “

AUTHOR (S)

Masaaki Sugihara, Department of Economics, Hitotsubashi University

Yoshio Oyanagi, lnstitute of lnformation Sciences, UniveTsity of Tsukuba

      Masatake Mori,Fαculty o/EngineeTing, Univers吻of Tokyo

        Seiji Fujino, lnstitute of Computational Fluid Dynamics

REPORT DATEOctober 1990

MAIN CATEGORY

numerica1 analysユS

NUMBER OF PAGES

    36

CR CATEGORIES

KEY WORDS

 SOR, vector processor, acceleration parameter, hyperplane method

ABSTRACT        A・impl・impl・m・nt・ti…nv・ct・・p・・ce…rs・fth・SOR m・th・d f・・s・1・i・g li・…ly・tem・f・quati・n・

      arising from the discretization of partial differential equations makes the SOR method into another, which,

      though, looks like the SOR method・In this paper, the.e伍ciency of this SOR-like method(pseudo-SOR

                                           コ      method)is investigated. For the Poisson equation in a rectangular region, in both五ve-point and nine.point

      discretization, we prove analytically that the optimal acceleration parameter is smalIer and the optimal

      convergence rate is lower in the pseudo-SOR method. Comparison on several vector computers was

      made between the SOR method vectorized with the hyperplane technique and the pseudo-SOR method.

      It turned out that the hyperplane SOR is superior to the pseudo-SOR method although the latter is

      easily vectorizable on vector processors. We also tested an example of Poisson equation descretized in a

      curved coordinates in a region bounded by two eccentric circles and found numerically that the optimal

      acceleration parameter becomes small and the convergence becomes slow in the pseudo-SOR method,

      while the optimal acceleration parameter in the SOR method does not change appreciably. The genuill

      SOR method is superior to the pseudo-SOR method.

SUPPLEMENTARY NeTES   .lnvited talk presented by Oyanagi at lnternational Congress on Computational and Applied Mathematics held ,at Katholieke

 Universiteit Leuven, Belgium, on July 23-28, 1990.