1SE－TRd－90－86 On the Efficiency of an SOR－like Method … · 2020. 4. 1. · SOR is...

1SE－TRd－90－86

On　the　Efficiency　of　an　SOR－like　Method　Suited　to　Vector　Processor

　　　　　　　　　　　　　　　　　　　　　　　　　　by

Masaaki　Sugihara，　Yoshio　Oyanagi，　Masatake　Mori　and　Seiji　Fujino

　　　　　　　　　　　　　　　　　　　　　October　4，　1990

Qn　the　Efiiciency　of　an　SOR－like　Method　Suited　to

　　　　　　　　　　　　　　　　　　　Vector　Processor　＊

　　　　　　　　　　　　　　　Masaaki　Sugihara

　　Depαrtment　of　EconomiC8，　Hitotsiめashi　Univers吻

　　　　　　　　　　　Ifunitachi，　Tokyo，　186　Japan

　　　　　　　　　　　　　　　　Yoshio　Oyanagi

Institzeteげlnformαtion　Sciences，　Univers吻。！TSttkubα

　　　　　　　　　　　Tsukuba，乃a　rα　ki，305」’αPαn

　　　　　　　　　　　　　　　　Masatake　Mori

　　　　　Fac吻・！恥卿εε7吻，　Unive7’吻・ノT・ky・

　　　　　　　　　　Bu吻。一紘Tokyo，　113　」αpan

　　　　　　　　　　　　　　　　　　Seij　i　Fuj　ino

　　　　　Institute　of　Computational　171uid　Dynamz’cs

　　　Haramachi　2－1－4，　Meguro－ku，　Tokyo，　152　Japan

　　　　　　　　　　　　　　　　　October　199．　0

　　　　　　　　　　　　　　　　　　　　　　　　　　　Abstract

　　　　　　A　simple　implementation　on　vector　processors　of　the　SOR　method　for　solving　linear

　　　system　of　equations　ari’sing’　from’the　discretization　of　partial　differential　equations　makes

　　　the　SOR　method　into　another，　which，　though，　looks　like　the　SOR　met・hod，　lii　this

　　　paper，　the　efficiency　of　this　SOR－like　method　（pseudo－SOR　method）　is　investiga，tcd，

　“lnvited　talk　presented　by　Oyanagi　at　lnternatio7｝al　Congress　en　Computational　an（1　A；）p17ied　A・fathematics

held　at　Katholieke　Universiteit　Leuven，　Belgium，　on　July　23－28，　1990．

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　1

For　the　Poisson　equation　in　a　rectangular　region，　in　both　five－point　and　nine－point

discretization，　we　plove　a皿alytica皿y　that　the　optimal　acceleratio皿1）arameter　is　smaller

and　the　optimal　convergence　rate　is　lower　in　the　pseudo－SOR　method．　Comparison

on　several　vector　computers　was　made　between　the　SOR　method　vectorized　with　the

hyperplane　technique　and　the　pseudoSOR　method．　lt　turned　out　that　the　hyperplane

SOR　is　superior　to　the　pseudo－SOR　method　although　the　latter　is　easily　vectorizable　on

vector　processors．　We　also　tested　an　example　of　Poisson　equation　descretized　in　a　curved

coordinates　in　a　region　bounded　by　two　eccentric　circles　and　found　numerically　that

the　optimal　acceleration　parameter　becomes　small　and　the　convergence　becomes　slow　in

the　pseudo－SOR　method，　while　the　optimal　acceleration　parameter　in　the　SOR　method

does　not　change　appreciably．　The　genuin　SOR　method　is　superior　to　the　pseudo－SOR

method．

1 Introduction

　　　　Various　vectorization　techniques　have　been　proposed　to　improve　the　computational　speed

on　vector　processors．　The　essential　ingredients　are　to　remove　recursion　in　loops　and　to　make

the　vector　length　as　long　ag　possible．　We　will　consider　the　vectorization　（＞f　the　SOR（Successive

OverRelaxation）　method　for　two－dimentional　partial　differential　equations．

　　　The　following　discussion　holds　for　general　linear　second－order　elliptic　partial　differential

equations．　For　simplicity，　however，　we　will　start　with　the　Poisson　equation　in　a　rect　angular

re　9i・nの＝｛＠，y）10＜x＜1，0＜y＜1｝，

　　　　　　　　　　　　　　　　△一1舞＋1舞一・・nD，姻一ψ（ゆn∂P・　（1）

　　　Discretization　of　the　second　derivatives　of　（1）　by　five－point　or　nine－point　finite　differences

on　an　（N　十　1）　×　（N　十　1）一grid　with　grid　spacing　h　＝　1／N　leads　to　a　set　of　linear　equations：

σ臼，ゴ＋σi＋、，ゴ＋σらゴー、＋σらゴ＋一4Ui，ゴ＝’ O （2）

or

4（Ui－i，」　＋　Ui＋i，」　＋　Ui，」一i　＋　Ui，」＋i）　＋　Ui＋i，」一i　＋　Ui＋i，」＋i　＋　Ui－i，」一i　＋　Ui－i，」＋i　一一　20Uif一一・一一　O　（3）

for　i，2’　一一“＝　1，2，．．．，N　一　1，　which　form　a　system　of　linear　equations　for　t，he　solution　vector　U．

2

　　　Eqs．（2）　and　（3）　can　be　solved　with　the　polular　SOR　method．　We　vvill　show　here　a　program

for　the　five－point　case：

　　　DO　10　J＝1，　N－1

　　　　DO　20　1＝1，　N－1

　　　　　　U（1，J）＝U（1，J）＋OMEGA＊（O．25＊（U（1，J－i）＋U（1，J＋1）＋U（1－1，J）＋U（1＋1，J））一U（1，J））

20　CONTINUE10　CONTINUE

Here，　OMEGA（w）　is　the　aceeleration　parameter．　Running　with　i　and　2’　from　1　to　IV　一一　1，　two

neighboring　grid　points　of　（i，1’），　（i　一　1，」’）　and　（i，2’　一一　1）　have　already　been　updated，　so　that

Ui，」　i　computed　with　these　new　values　and　old　values　of　Ui＋i，」　and　Ui，」t＋i．　The　SOR　method，

therefore，　is　recursive　in　both　i一　and　2Ldirections．　This　data　dependency　makes　this　loop

hard　to　be　vectorized．

　　　There　are　some　well－known　techniques　to　vectorize　the　SOR　method　［1］．　We　will　consider

here　a　so－ca皿ed　hyperplane　method［2］，　which　essentially　reorders　the　numbering　of　the　lattice

points　without　changing　the　algorithm．　The　program　will　be　as　follows：

　　DO　10　K＝2，　2＊（N－1）

＊VOPTION　NODEP（U）

　　　　DO　20　J＝MAX（1，K＋1・一N），　MIN（N－1，K－1）

　　　　　　工嵩K－J

　　　　　　U（1，J）＝U（1，J）＋ONEGA＊（O．25＊（U（1，J－1）＋U（Z，J＋1）＋U（r－1，J）＋U（1＋1，J））rU（1，J））

20　CONTIN’UE10　CONT工］EfUE

This　program　is　methematica皿y　equivalent　to　the　SOR　method．　The　inner　loop　has　no

recursion　and　the　access　to　the　array　U　is　with　only　constant　strides，　Howe’ 魔?s　the　hyperplane

SOR　method　involves　short　average　vector　length　（N　一　1）2／（21V　一一　3）　f」　N／2　and　ordinary

vectorizing　compilers　cannot　recognize　that　there　are　no　data　dePendencies，　so　that　we　have

to　insert　a　directive，　＊VOPTION　in　this　example，　to　the　compiler．　For　some　vector　computers，

access　with　stride　more　than　one　may　cause　a　decrease　in　the　data　transfer　rate　from　memory

to　vector　registers　and　shorter　vector　length　may　increase　relative　start－up　time．　Nine－point

discretization　can　also　be　processed　by　a　hyperplane　method，　but　with　average　vector　length

fu　N／3．

　　　A　more　naive　implementation　of　this　loop　on　a　vector　processor　is　to　modify　the　program

as

3

　　DO　10　J＝1，　N－1

　　　　DO　20　1＝1，　N－1

　　　　　　V（1）＝U（工，J）　＋　OMEGA＊（0．25＊（U（1，J－1）＋U（1，J＋1）＋U（1－1，J）＋U（工＋1，J））一U（1，J））

20　CONTIN’UE　　　　DO　30　工＝1，　N－1

30　U（1，J）＝V（1）

tO　CONTINUE

In　this　program七he　data　dependency　in　the　SOR　method　is　removed　so　that　the　loops

can　be　successfully　vectorized．　The　program　involves　only　stride　one　access　to　the　memory．

Obviously　this　is　a　different　algorithm，　since　Ui－i．・　on　the　right　hand　side　comes　from　the　old

i七eration　Ievel　in　contrast　to　the　SOR　method．　We　will　call　it　the　pseu（1（”SOR　method．　The

pseudo－SOR　method　is　sometimes　used　as　a　simple　vectori　zation　technique　in　computational

fluid　dynamics．

　　　In　the　next　section，　the　optimal　acceleration　parameter　w　and　the　rate　of　converngence　of

the　two　methods　for　five　point　discretization　is　discussed．　Similar　arguments　for　nine－point

discretization　are　given　in　section　3，　Numerical　results　on　various　vector　processors　are　shown

in　section　4，　inc｝uding　one　for　curvilinear　coordinates．　ln　section　5，　we　will　conclude　that　the

hyperplane　SOR　is　superior　to　the　pseudo－SOR　method．

2 Comparison　of　SOR　and　pseudo－SOR　Methods　for

Five－Point　Discretization

2．1　Matrix　Formulation

Equation（2）　forms　a　system　of　liner　equations　with　matrix　As　for　the　solution　vector　u．　If

we　order　the　elements　in　the　solution　vector　lexicographically　as

　　　　　　　　　　　　　　（Ul，1）Ul，2；…7Ul，N一一1）112，1）U2，2）…）U2，N一一1）．．・）UN－1，N－1））

the　coefficient　matrix　A．r　is，

4

with

Bs　＝

　1

－1　4

o

　1－m一@4

　1

As　：

　1

　4

一　　一　　一

Bs

Cs

o

o

一一一獅P

oBs　Cs

一　　一　　一　“　　一　　一

　　　一　　一　　一　一　　一　　一

　　　Cs　Bs　Cs

　　　　　　　Cs　Bs

　　　　　　　　　一1

　　　　　　　　　　4

，Cs　＝

o

’

1

4

1

4

o

1

4

　　　The　iteration　in　the　SOR　method　is　written　as　usual，

　　　　　　　　u（m＋i）　＝　lcgORu（m）　＋　b’　＝　（1　一一　cvL，SOR）一’（（1　一　c，7）1　＋　cvU，SOR）u（M）　＋　b’

where

and

　SORL，Dv．　＝

一．Bま

一〇、一瑳

o

o

一C，　一B，L

，

B，L　＝

σ『o丑＝（五ぎOE）T．

o

一一　1

　4

o

o

o

一1　0　4

　　　On　the　other　hand，　the　iteration　in七he　pseudo－SOR　method　is　forlnulated

　　　　　u（m＋i）　＝　，ce，一SORu（m）　＋　bt　，．，　（1　．　cdLe，一SOR）一i（（1　一　co）1　＋　cou，p’SOR），t，（m）　＋　btt

where

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　o　　　　　O

　　　　　　　　　　　　　　　　　　　　　　　　　　　　－C，　O

　　　　　　　　　　　　　　　　　　　．Lg－SOR，＝1　…，．．　1，

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　0　一C，0

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　5

’

（4）

（5）

（6）

（7）

（8）

（9）

（10）

and

　　　　　　　　　　　　　（一B，L－B，U）　一Cs　O　　　　　　　　　　　　　　　　　　　　　　　　（一B，L－B，U）　一Cs

　　u，p－SOR＝1　．．．　．．．　1，　（11）　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　（一瑠一瑠）　一〇5

　　　　　　　　　　　　　　　　　0　（一　B，L－B，U）

with　B，U　一一一：　（B，L）T．

2．2　Acceleration　Parameter　w　and　the　Spectral　Radius

The　order　of　convergence　of　Eq．（6）　and　Eq．（9）　is　determined　by　the　spectral　radius　of　LsSOR

and　Lg’一SOR，　respectively，　As　an　illustration，　p（LgOR）　and　p（Lg－SOR）　are　numerically　esti－

mated　for　IV　＝　6　and　shown　in　Fig．　1．　The　result　shows：

　　1．The　optimal　accelelation　parameterω。pt　f（）I　the　pseudo。SOR　method　is　substantia皿y

　　　　smaller　than　that　for　the　SOR　method．

　　2．　The　rate　of　convergence　of　the　pseudo－SOR　method　using　the　optimal　acceleration

　　　　parameter　is　worse　than　that　of　the　SOR　method．

　　3．　The　region　of　the　acceleration　parameter　which　ensures　the　convergence　of　the　iteration

　　　　is　narrower　for　the　pseud（）一SOR　method．

In　the　next　subsection，　we　will　generalize　the　results　to　arbitrary　N．

2．3　Analytic　Determination　of　the　Optimal　Acceleration　Param－

　　　　　eter　of　the　SOR　and　the　pseudo－SOR　Methods

The　optimal　acceleration　parameter　of　the　SOR　method　for　our　case　and　the　correspon（ling

spectra｝　radius　is　given　by　Varga　［3］：

　　　　　　　　　　　　　　　　cvopt　＝　ir一　Fgtllk’lki；ii’in　．z1．），Aopt　＝p（LgOR（cvopt））　＝　cc一，．pt　一一　i．　（i2）

Asymptotically，　　　　　　　　　　　　　　　　c－o．，t　＝2　一T　3i’li　＋　o（　1N2），　A．t　＝i一　3i’；　＋　o（　？」｝i，T）・　（i3）

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　6

　　　On　the　other　hand，　we　will　later　show　that　the　optimal　acceleration　parameter　for　the

pseudo－SOR　method　w．pt　and　the　corresponding　spectral　radius　A．pt　are　given　as　the　solution

of　the　following　equation　with　respect　to　w　and　A：

　　　　　　　　　　　　　　　　　　憶鰍潜醐　（14）

The　asymptotic　estimates　are：

　　　　　　　　　　　　　，．．，，　＝　g一　g（．z［一）2＋　o（lieT，　），　A．，，　＝1　一一一　（ft）2　＋　o（7（！：，r）．　（ls）

We　note　that　the　optimal　spectral　radius　A．pt　approaches　unity　in　O（N－i）　in　the　SOR

method，　while　in　0（IV－2）　in　the　pseudo－SOR　method．　The　observation　for　IV　＝＝　6　given　in

the　previous　subsection　also　applies　to　large　N．

In　order　to　derive　Eq．（14），　we　will　find　all　the　eigenvalues　of　Cg－SOR　by　solving　t　he　secular

equation．　We　note　t　he　following　equality　holds：

det（cg－SOR（c，7）　一一　Ai）

　　　＝　det（（1　一　cvLg－SOR）一i．（（1　一　co）1　十　cvU，PtaSOR）　一一　AJ）

　　　＝（det（1－cv　L9曙SOR））一1　det（（レω一λ）1＋λcvLg－SOR＋ωσぎ一SOR），

（16）

We　have　only　to　solve　the　equation

det（（1　一　c，7　一一　A）1　＋　AcvLg－SOR　＋　co　u，p－SOR）　＝　o．（17）

The　matrix　in　the　parenthesis　is　a　block　triangular　matrix：

E　　F

A．F　E　lii

　　　　－　　e　　一　一　　一

o

o

AF　E

　　　　N　一一　1　blocks

，

7

with

E＝

1一一　cL2　一A

　　　里　　　4

o

　　　里　　　4

1一　cv　一A

　　一　　一　　一

里4


！

4

o

1一ω一λ　　景

　　　￥　1一一cv一一一A　　　4

N　一一　1　elements

　　　　　　　wand　F＝　＝1．　　　　　　　4

We　will　show　the　following　lemma．

エemma　1ゐθ田αnd・C　be　squαre　mαtrices・∫・rder・n，抗θノ・〃・ω吻iden鞠ん・lds・

det

B　　C

AC　B

　　　　一　　一　　一

o

C


AC

o

B　C

AC　B

　：　det

Proof　Since　the　first　equality　is　self－evident

matrix　of　order　m　with

A＝

B　V‘］iCC

＞仮O　B　vCXc

　　　　　　一　　一　　一　｛　　一　　一

　　　　　　　　　　　viXc

o

o

B　vi；Slc

vi51C　B

m　blocks

　　　　　　　　　　　　　　　一壷d・t（B卿），一一2…（m律1）・（18）

　　　　　　　　　　　　　　　　　　　　　，　we　will　prove　the　second　one．　The　following

　　　　null　dia，gonal　elements　is　diagonalized　by　an　orthogonal　matrix　P．

　　i　O　1　l　　l　62　　　　　．．．　．”　1　－pl　．”　Ip－i．　（lg）

　　　　　　　　　1　0　1　1　1　6m－i

Replacing　ea・h　element　aiゴwith　a　bl・・k曜rσand　adding　B　t・the　diag・nal　b1・cks，　we

8

have　the　folowing　relation　between　blocked　matrices　of　ordel　nm．

　　　　β〉「AC　　O　B＋δ1＞『o　　　’　0　　　～／XO　　　　jB　　　　～／ヲ（C　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　1ヲ十δ2　v！天0

　　　　　　　　　　．．，　　．．．　　　　　　　　＝戸　　　　　　　　　　　　　　　　　　．．．　　　　　　　　　　　　戸一1

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　，

　　　　　　　　　　　　　　　●　　．　　●　　　　　　　　　　　　　o　　●　　●　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　・　　●　　0

　　　　0　　　～バoB　　　O　　　　　B＋δm＞ro　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　（20）

　　　　　　ガwhere」P　is　a　matrix　of　order　nm　extended　from、P，　where　p‘ゴis　replaced　with　a　block，　jpiゴ1，、．

Estimating　the　determinants　on　both　sides，　we　obtain　Eq．（18）．　　　　　　　　　　Q．E，D．

　　　　We　applyもhis　lemma　to　Eq．（17）and丘nd　that　the　equation　is　equivalent　to　the　system

of　equations：

　　　　　　　　　　　　　　　　　　　　　　　d・t（E＋δゴ〉伐∬）一・，δゴー2…（影）

forゴ＝1，＿，1V－1．　Si皿ce」醒十6ゴ〉「A∬is　again　a七ridiagonal　matrix，　applying　this　lemma

again，　we丘nd　that　Eq．（17）is　equivalent　to　the　sysmtem　of　equations　forλ：

　　　　　　　　　　1一（1＋生　　　4）ω一λ一害孤ω（ゴー1，…，N－1；k・・＝　1，．．．㌍）（21）

The　following　proposition　holds：

Proposition　l　Theαろsolute　vαlues　o／th　e　solutions　o／th　e　egzzαtion（21？is　smα〃eT　thαn　or

eguαl　to

　　　・The　1αrgest　abs・lute　vαlue・f　the　s・lutions　for　1一（1一圭。・s（k））ω一λ＝一去。・s（持）V伐ω

　　　　　07’

　　　・1＋圭。・s（T））ω一1・

If　this　proposition　is　proved，　it　is　straiglltforward　to　show　thatω。pt　is　given　as　the　solution

of　Eq．（15）、

Pr・gf　Eq・（21）・an　b・　・xpressgd　a・　th・fa血1y・f　quad・atic　equati・・曲rλ，

　　　　　　　　　　　　　　　　　　　　　　　1一αω一λ　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　＝士β〉「A，withα〉β≧0．　　　　　　　　　　　　　（22）

　　　　　　　　　　　　　　　　　　　　　　　　　　　ω

　　　We　will　first　discuss　the　case　forβ．＞0．　Figure　2　sllows　that　Eq．（22）has　two　real　solutiolls

for　O＜ω≦ω1　and　two　complex　solutions　forω1＜ωwith　the　absohlte　valueαω一1，　where

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　9

ω1＝2／（α十〉両）．For　real　solu七ions，　the　larger　one　monotonicaHy　decreases　asω

increases　and　the　absolute　value　of　complex　solutions　increases　as　w　increases．

　　　It　c　an　easily　be　understood　that　the　absolute　value　of　the　complex　solution　of　Eq．（22）　is

smaller　than　or　equal　t　o　（1　十　t2　cos（ft））w　一　1，　where　2　cos（ft）　is　the　maximum　value　of　6k．

　　　Let　us　consider　the　re　al　solutions．　Figure　3　shows　the　curves　for　two　sets　of　equations

with　the　same　a　and　different　fi，　fi2　〉　fii．　lf　the　equation　with　fii　has　real　solutions，　the

equation　with　fi2　has　also　real　solutiohs　whose　larger　value　is　greater　than　that　for　6i．　Figure・

4　shows　the　curves　for　two　sets　of　equations　with　the　same　fi　and　different　a，　or2　〉　ori．　It

can　easily　be　seen　that，　if　the　equation　with　a2　has　real　solutions，　the　equation　with　ai　has

also　real　solutions　whose　larger　value　is　greater　than　that　for　a2．

　　　From　these　observations　the　absolute　value　of　the　real　solution　for　Eq．（22）　is　always

smaller　than　or　equal　to　the　larger　solution　for　Eq．（22）　with　the　smallest　or　and　the　largest

6．　The　real　solution　of　Eq．（21）　is　no　greater　than　the　larger　solution　of

　　　　　　　　　　　　　　　　　　　1　一一　（1　一　icos（ft））cL，　一一　A　＝　一icos（ft－v／］Xcv）．

This　completes　the　proof　for　fi　〉　O．

　　　If　6　＝　O，　Eq．（22）　is　a　linear　equation．　The　same　arguments　hold　if　real　and　complex

solutions　are　understood　as　positive　and　negative　solutions，　respectively．　Q．E．D．

　　　The　numerical　results　of　w．pt　and　A．pt　are　shown　in　Table．　1．

3 Comparison　of　SOR　and　pseudo－SOR　Methods　for

Nine－Point　Discretization

　　　　In　this　section　we　wi皿analyse　the　SOR　and　pseudo－SOR　method　for　the　nine－point

discretization　of　Poisson　equation，　Eq，（3）．　In　the　SOR　method，　Ui，，・　i　s　computed　with　four

new　values　Ui－i，」’，　Ui－i，」’一i，　Ui，」’．i　and　Ui，」＋i，　while　in　the　pseudQSOR　method　three　new

values　Ui－i“・一i，　Ui“’一i　and　Ui，」＋i　are　used，　so　that　we　expect　the　degradation　in　using　pseude

SOR　method　may　not　be　so　large　as　in　the　five－point　discretization．

10

3．1　Matrix　Formulation

For　the　nine－point　discretization，　the　linear　equation　is　described　in　the　matrix　form，

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　B，　Cg’　O

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Cg　Bg　Cg

　　　　　　　　　　　　　　　　　　　　Agu＝b，　Ag＝1　．．．．．．　1，　（23）

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Cg　Bg　Cg

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　O　C，B，

wiもh

　　　　　　　　　　　　　　　1－9　0　　－t一売　　0

　　　　　　　　　　　　　　一一去1一去　．　　一売一一去一売

　　　　　　　Bg＝1　”．”．　1，C，一1　．”　”，　1．　（24）　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　1　　　　1　　　　1　　　　　　　　　　　　　　　　　　　　　　　一　　一　　一　一　　一　　一　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　20　5　20

　　　　　　　　　　　　　　　0　一gi7　kO　一“，一g

　　　We皿ote　that　Og　is　not　a　diago皿al　matrix　in　contrast　to　O5．　The　iteration　in　the　SOR

method　is，

　　　　　　　　　．（m＋i）＝，C80Ru（M）＋bt＝＝（1一一ctiL，SOR）一i（（1－cu）1＋coU，SOR）u（M）＋bt　（2s）

where

　　　　　　　　　　　　一瑠　　　　　　　o　　　　　o　　　　　O

　　　　　　　　　　　　－C，　一B，L　1　1一一gOL，SOR eI　．，．　．，　1，B，L一一．p　1　．．．．．．　1，　（26）

　　　　　　　　　　　　　　　　　　　　　　　一C，一B，L　1　1　．一一・

　　　　　　　　　　　　　0　　一σ9一増　　0　　一去・

and

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　U，SOR　．，　，（L，SOR）T．　（27）

　　　On　the　other　hand，　the　iteration　of　the　pseudo－SOR　method　for　nine－point　discretization

is　formulated　in　the　same　way　as　for　five－point　discretization，

　　　　u（m＋i）：，cB－SORu（m）＋bt＝（1－cvLgTSOR）tsi（（1－cv）1＋cou，P－SOR）u（m）＋b”　（2s）

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　11

where

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　O　　　　　　o

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　“Cg　〇

　　　　　　　　　　　　　　　　　　　五召一50R＝　　．．．，．．　　　，　　　　（29）

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　一一C，　0

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　0　一一　C，　O

and

　　　　　　　　　　　　　（一Bポ殉　一〇・　　　　　0　　　　　　　　　　　　　　　　　　　　　　　　（一B，L－B，U）　一Cg

　　　u，p－SOR，．1　．．．　．．．　1．　（30）　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　（一B，L－B，U）　一Cg

　　　　　　　　　　　　　　　　　O　（一B，L－B，U）

3．2　Acceleration　Parameter　w　and　the　Spectral　Radius

First　we　estimated　numerically　the　spectral　radia　of　LgOR（w）　and　Lg－SOR（w）　for　IV　＝　g．　The

results　are　shown　in　Fig．　5．　The　same　statements　holds　as　in　Section　2．2．　Although　three

new　values　are　used　to　update　Ui“・，　the　pseudeSOR　method　is　considerably　inefficient，　We

will　discuss　the　convergence　of　these　two　methods　for　general　N．

3．3　Analytic　Determination　of　the　Optimal　Acceleration　Param－

　　　　　eter　of　SOR　and　pseudo－SOR　Methods

In　a　similar　manner　to　the　five－point　case，　we　have　to　solve　det（（1　一一　cv　一　A）1　十　Acx7LgSOR　十

wUgSOR）　＝＝　O，　to　obtain　the　eigenvalues　of　LgSOR．　Using　Lemma　1　twice，　the　problem　is

reduced　to　a　set　of　equations，

　　　　　　　　レω一λ一6ゴ（ω／5）vCX　一　6，（ω／5）（λ一（δゴ／4）孤）（1一（δ、／4）一N／‘iY）＝0，　（31）

for　2’　一一一　1，．．．，IV　一一　1，k　＝　1，．．．，N一　1．　Based　on　our　numerical　experience，　we　conjecture　that

the　optimal　acceleration　parameter　w．pt　is　given　by　the　w　for　which　the　largest　real　solution

coincides　wi七h　the　absolute　value　of　the　complex　solutio皿of　the　quartic　equaもion　wi七h　Iespect

to　A，

　　　　　　　　　　　　　　　（1　一　ct7　一一一　A）4　一一　262（8　＋　62）（！ll）2A（1　一一一一　cv　一　A）2　＋　68（llll）‘　＋

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　12

　　　　　　　　　　　　　　　　166‘（w5）3A（A＋　1）（1　一　cv　一　A）　一一一一　466（｛ilA（A＋　1）2　＝　O．　（32）

Assuming　this　conjecture，　we　calculated　w．pt　and　A．pt　numerically　for　IV　S　1000　and　found

an　approximate　formula：　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　5．61　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　6．63

　　　　　　　　　　　　　　　　　　　　　　　　cuopt　f；2一　：［i’i（1：VV）　Aopt　f　s　1一”　E’iiiir：V““・　（33）

　　　On　the　other　hand，　in　a　similar　manner　to　the伽e－point　case，　the　opもimal　acceleration

parameter　and　the　corresponding　spectral　radius　for　the　pseudo－SOR　method　are　determined

in　terms　of　the　se七〇f　equation：

　　　　　　　　　　　　　　　　　1　一　（1　一　g　，os（ft，）），，　一　A　＝　一！ill　，．，（li（i，）（2　＋　，．，（ft，））vlx

　　　　　　　　　　　　　　　　　　　　　　　　2　．r

　　　　　　　　　　　　　　　　A＝（1－Eicos（：ii’7・））co－1．　（34）

Asymptotically，

　　　　　　　　　　　　　，，，．，＝IUti一一：51（k）2＋o（liel，T），　A．，，＝1－g（ft）2＋o（lier，）．　’　（3s）

Table　2　shows　the　numerical　values　for　w．pt　and　A．pt．　VSie　can　see　that　the　optimal　accelera－

tion　parameter　for　the　pseudo－SOR　method　is　substantially　smaller　than　that　for　the　SOR

method　and　the　rate　of　convergence　of　the　pseudo－SOR　method　using　the　optimal　acceleration

parameter　is　worse　than　that　of　the　SOR　method，

4　Numerical　Results　on　Vector　Processors

　　　　We　have　seen　that　the　pseudo－SOR　method　is　inferior　to　the　original　SOR　method　in

terms　of　the　convergence　ratio　for　both　five－point　and　ninepoint　discretization，　so　that　the

former　method　requires　more　number　of　i七erations　than　the　latter．　On　the　other　hand，　the

pseudeSOR　method　is　very　suited　to　vector　supercomputers，　since　it　involves　only　stride

one　access　and　has　longer　vector　length　than　the　hyperplane　SOR．　We　will　implement　those

method　on　various　supercomputers　and　compare　the　CPU　time　for　solving　the　problem．

13

4．1　Dirichlet　Problem　on　a　Rectangular　Mesh

As　an　example，　we　will　consider　a　Dirichlet　problem　on　the　unit　square　with　the　following

boundary　conditions：

　　　　　　　　　　　　　　　　U（O，g）＝O，　U（1，y）＝O　（O〈y〈1）

　　　　　　　　　　　　　　　　u（x，o）　＝　o，　U（x，1）　＝　O．5　一　lx　一一一　O．51．　（O　〈　x　〈　1）

4．1．1　Numerical　Verification　of　w．pt　for　IV　＝6

As　a　numerical　test，　we　discretized　the　square　by　six　in　one　direction　（N　＝　6）　and　solved　the

equation．　We　show　in　Figs．6－9　the　behavior　of　convergence　for　SOR　method　and　the　pseudo－

SOR　method　for　both　five－point　and　nine－point　discretizations　in　ternrs　of　the　Li－norm　of

the　residual　vector．　The　initial　vector　u（O）　is　a　zero　vector．　The　calculations　were　performed

in　the　double　precision．　Figures　（a）　are　for　cv　〈　wopt　and　（b）　for　w　〉　co．pt．　The　theoretical

optimal　acceleration　par　ameters　w．pt　for　N　＝　6．　are　given　in　Table　1．　Those　figures　show　that

the　predicted　w．pt　is　actually　optimal．

4．1．2　Performance　on　Vector　Computers

In　order　to　make　a　realistic　evaluation，　we　have　tested　the　same　problem　with　more　grip　points

and　measured　the　performance　on　various　vector　supercomputers．　We　only　applyed　the　five

poin七discre七ization，　We　tested七en　cases　with／V＝45，63，77，89，99，109，119，ユ27，135，141．

The　acceleration　parameter　is　fixed　on　the　thgoretical　optimal　value．　The　initial　vector　is

null　and　the　computations　were　done　in　the　double　precision．　The　convergence　criterion　is

that　the　L2－norm　of　the　residual　is　less　than　10－6．　The　computers　used　are，

　　　　　　　　　　　　　　　　　　　　　　Vendor　Model　Max．　Speed

a）　Fujitsu

b）　Fujitsu・

c）　Hitachi

d）　NEC

e）C・皿vex

f）　Ardent

g）　Stellar

VP－200

VP－400E

S－820／80

SX－2

C－1　XP

TITAN

GS　1000

570　MFLOPS

ユ700MFLOPS

2000　MFLOPS

1300　MFLOPS

　20　MFLOPS

　　ユ6MF：LOPS

　40　MFLOPS

14

The　CPU　time　spent　till　the　convergence　is　shown　in　Fig．10　as　a　function　of　the　number

of　grid　points，　The　hyperplane　vectorized　SOR　is　superior　to　the　pseudo－SOR　for　all　the

supercomputers　tested　here．

4．2　Dirichret　Problem　on　a　Curved　Mesh

In　general　curved　coordinate　（〈，　ny），　the　’Laplacian　contains　a　cross　term　a2u／O〈Oo，　which

is　discretized　in　terms　of　（Ui＋i“・＋i　十　Ui．一i，」’．i　一一　Ui＋i，」一i　一一　Ui－i，」’＋i），　so　that　the　discretized

equation　is　a　nine－point　difference　formula　in　general　［4］．　We　test　here　two　cases，　a）　where

the　numbers　of　grid　points　in　two　dimensions　are　the　same，　and　b）　where　the　number　of　grid

points　in　the　horizontal　direction　is　larger　than　the　one　in　the　vertical　directibn．

　　　a）　The　domain　is　shown　in　Fig．11．　The　boundary　values　are　20　on　the　top，　50　on

the　bottom　and　have　interpolated　values　on　the　right　and　left　sides．　The　grid　points　are

462，622，782，902，1022，1102，1182，1262，1342，1422．　The　initial　value　is　all　O　and　the　conVer－

gence　condition　is　that　the　L2－norm　of　the　residual　vector　be　less　than　10－6．　Since　the　optimal

acceleration　parameter　is　not　known　theoretically，　we　searched　the　optimal　one　in　step　O．02．

The　CPU　time　for　the　SOR　and　pseudo－SOR　methods　is　shown　in　Fig．12．　The　hyperplane

vectorized　SOR　method　is　much　faster　than　the　pseudfrSOR　method　even　in　the　nine－point

discretization．

　　　b）　In　this　case　a　horizontalIy　long　domain　with　curved　boundaries　is　treated（Fig．13）．　We

tested　three　types　of　grids，　40×82，　40　×　162，40　×322．　Other　conditions　are　same　as　in　a）．　Table

3　shows　the　CPU　time　for　VP－200　and　S－820／80．　Although　a　long　vector　length　is　favorable

for　the　pseudo－SOR　method，　the　result　in　this　table　indicates，　the　larger　the　number　of　grid

points，　the　faster　the　hyperplane　SOR　method　as　compared　to　the　pseud（＞SOR　method．

4．3　Mixed　Boundary　Problem　on　a　Circular　Mesh

In　this　subsection　we　wi11　discuss　Poisson　equation

　　　　　　　　　　　　　　　　　　　　　　　　　O2u　02u　　　　　　　　　　　　　　　　　　　　　　　　　52tilii　＋　3ii71／　＝　一一2sin（x）sin（y）

in　regions　bounded　by　two　eccentric　circles，　as　shown　i．n　Fig，14．　VkTe　tested　six　kinds　of

grid　with　different　eccentricity　and　concentricity　［5］．　Such　grids　are　used　in　calculating

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　15

the　stream　around　a　circular　cylinder．　Two　kinds　of　boundary　conditions　are　assumed：　a）

Dirichlet　contition　on　all　boundaries，　and　b）　Dirichlet　on　the　inner　boundary　and　Neumann

on　the　outer　boundary．　The　size　of　the　grid　is　16（r－direction）　×　50（e－direction）．　The　analytic

solution　is　u　＝　sin（x）sin（y）．

　　　The　optimal　acceleration　parameter　and　the　corresponding　spectral　radius　are　searched

numerically　in　step　O．02．　The　results　are　shown　in　Table　4　for　the　six　kinds　of　grid　in　Fig．14．

The　optimal　acceleration　parameters　in　the　hyperplane　SOR　method　are　stable　for　both

Dirichet　and　mixed　conditions　with　respect　to　the　eccentricity，　while　those　the　pseudo－SOR

method　changes　appreciably．　ln　the　case　of　hyperplane　SOR　method，　the　optimal　acceleration

parameter　for　the　mixed　condition　is　larger　by　O．12－O．18　than　that　for　Dirichlet　condition，

while　in　the　case　of　the　pseudQSOR　method，　there　is　no　difference．

　　　Moreover，　the　spectral　radius　for　the　optimal　acceleration　parameter　for　the　mixed　con－

dition　is　larger　than　that　for　Dirichlet　condition．　Especially，　the　spectral　radius　for　the

pseudo－SOR　method　is　nearly　1．0，　the　convergence　limit．　Fig．15　shows　the　numbers　of　iter－

ations　for　different　acceleration　parameters．　It　is　to　be　noted　that　the　pseudo－SOR　method

is　very　critical，　in　the　sense　that　an　acceleration　parameter　slightly　larger　than　the　optimal

value　may　cause　divergence　of　the　iteration．　On　the　other　hand，　the　converge’nce　region　for

the　hyperplane　SOR　method　is　large．

5 C　o　nclusio　n

　　　　We　have　aI1記ysed　both　theoretically　and　numeIica皿y　the　performance　of　the　genuine

SOR　and　pseudo－SOR．　Our　observation　is　as　follows．

　　1．　For　the　rectangular　region，　in　both　five－point　and　nine－point　discretiz　ation，　the　asymp－

　　　　totic　convergence　rate　is　1　一　C／IV　for　the　SOR　method，　while　1　一　C’／N2　for　the　pseudQ

　　　　　SOR　method．　Although　the　latter　is　easily　vectorizable，　the　overa11　efliciency　of　the

　　　　former　is　higher．

　　2．　Even　when　the　aspect　ratio　of　the　grid　is　large，　which　case　is　favorable　for　the　pseudo－

　　　　　SOR　method，　this　method　is　inferior　to　the　hyperplane　SOR　method．

3．　ln　the　curved　mesh　discussed　here，

16

　　　　　　（a）　As　the　eccentricity　increases，　the　optimal　acceleration　parameter　becomes　small

　　　　　　　　　and　the　spectral　radius　approches　unity　in　the　pseudo－SOR　method．　On　the　other

　　　　　　　　　hand，　the　optimal　acceleration　parameter　in　the　hyperplane　SOR　does　not　change

　　　　　　　　　appreciably．

　　　　　（b）　ln　the　pseudcHSOR　method，　acceleration　parameter　slightly　larger　than　the　opti－

　　　　　　　　　mal　value　may　cause　divergence．

　　　　　　（c）　When　the　outer　boundary　is　with　Neumann　condition，　the　spectral　radius　for　the

　　　　　　　　　pseudo－SOR　method　is　nearly　unity．

　　　We　conclude　the　pseudo－SOR　method　is　less　than　the　genuin　SOR　method　in　efficiency

on　vector　supercomputers，　although　the　former　is　easily　vectoriz　able　with　unit　stride．　The

rate　of　convergence　should　be　most　respected　in　the　vectorization　of　an　algorithm．

AcknowledgementThis　work　is　partly　supported　by　the　Grand－in－Aid　for　General　Scientific　Research　of　The

Ministry　of　Education，　Science　and　Culture　of　the　Japanese　Government　（No．01540169）．

References

［1］　J．　M．　Ortega，　”lntroduction　to　Parallel　and　Vector　solution　of　Linear　Systems”，　Plenum

　　　　Press，　New　York　and　London，　1988，

［2］　L．　Lamport，　”The　Parallel　Execution　of　DO　Loops”，　Comm，　of　ACM，　17，　pp．83－93，　1974．

［3］　R．　S．　Varga，　”Matrix　lterative　Analysis”，　Prentice－Hall，　New　York，　1962．

［4］　J．　F．　Thompson，　Z．　U．　A．　Warsi，　”Numerical　Grid　Generation”，　North　Holland，　1974．

［5］　T，　Kawamura，　H．　Takami，　K．　Kuwahara，　”Computaion　of　High　Reynolds　Number　Flow

　　　around　a　Circular　Cylinder　with　Surface　Roughness”，　Fluid　Dynamics　Research，　1，

　　　pp．145－162，　1986．

17

Table　1　co，pt　and　A，pt　for　five－point　discretization

SOR　method pseud（トSOR　method

ω。μ　　　λ、ゆ ω。μ　　λ。μ

1V＝6 1．33333　　0．33333 1．23431　　　0．76878

ノV＝10 1．52786　　0．52786 1．29285　　　0．90764

1V＝20 1．72945　　0．72945 1．32259　　　0．97584

1V＝50 1．88183　　0．88183 1．33158　　　　0．99606

2V＝100 1．93909　　0．93909 1．33289　　　0．99901

1V＝1000 1．99373　　0．99373 1．33332　　　0．99999

Table　2　wopt　and　Aopt　for　nine－point　discretization

SOR　method pseud（トSOR　method

ω。Pε　λ。P・ ω。P・　　λ。μ

1V’＝6 1．31393　　0．37071 1．26184　　　0．69896

1V＝10 1．50902　　0．56335 1．35459　　　0．86991

ノV＝20 1．71627　　0．75377 1．40799　　　0．96425

！V＝50 1．87542　　0。89351 1．42517　　　0．99411

N＝100 1．93567　　0．94529 1．42772　　　0．99852

ノV＝1000 1．99337　　0．99439 1．42856　　　0．99999

Table　3　CPU　time　in　sec　for　the　optimal　acceleration

grid 40×82　　40×162　　40×322

pseudo－SOR

?凾垂?窒垂撃≠獅?@SOR

0．49　　　　　2．55　　　　　　17．2

O．27　　　　　0．72　　　　　　3．29

（VP－200）

grid 40×82　　40×162　　40×322

pseud（トSOR

?凾垂?窒垂撃≠獅?@SOR

0．17　　　　　0．82　　　　　　9．08

O．08　　　　　0．27　　　　　　1．69

（S－820／80）

18

Table　4　tu，pt　and　popt　for　an　eccentric　double　circle

mesll pseud（》SOR hyperplane　SOR

ω・μ　ρ・μ ω（μ　　　ρqp舌

Fig．14（a） 1．22　0．971 1．68　　0．720

Fig．14（b） 1．12　0．971 1．70　　0．717

Fig。14（c） 1．00　0．977 1．70　　0．782

Fig．14（d） 1．20　0．970 1．68　　0．720

Fig．14（e） 1．14　0．975 1．72　　0．766

Fig．14（f） 1．02　0．975 1．72　　0．766

a）　Dirichlet　condition　on　both　boundaries

mesh pseud（トSOR hyperplane　SOR

ω・Pε　ρ・P診 ωqP舌　　ρqp舌

Fig．14（a） 1．26　0．991 1．84　　0．840

Fig．14（b） 1．16　0．993 1．88　　0．856

Fig．14（c） 1．02　0．994 1．82　　0．953

Fig．14（d） 1．26　0．998 1．88　　0．900

Fig．14（e） 1．16　0．998 1．88　　0．913

Fig．14（f） 1．00　0．997 1．80　　0．924

b）　mixed　boundary　conditions

19

Figure　Captions

1．

2．

3．

4．

5．

6．

7．

8．

9．

10．

Spectral　radii　of　L，SOR（cx7）　and　Lg一一SOR（cv）．

Parabola士βへ／：X　and　lines（1一αω一λ）／ω）withω〈ω1，ω＝・ω1　andω〉ω1・

Line　（1　一　cvco　一一　A）／cv）　and　parabolas　±x3v’ZX　for　（6i　〈　P2）．

　　　Par　ab　ol　as　±60　V‘jX　and　Iines　（1　一一　cMcv　一　A）／co）　for　（cvi　〈　cu2）．

　　　Spectral　radii　of　C，SOR（cv）　and　1　g－SOR（c，？）．

　　　Convergence　behavior　of　the　SOR　method　with　five－point　discretization．

　　　Convergence　behavior　of　the　pseudo－SOR　method　with　five－point　discretization，

　　　Convergence　behavior　of　the　SOR　method　with　nine－point　discretization．

　　　Convergence　behavior　of　the　pseudo－SOR　method　with　nine－point　discretization．

　　　Comparison　of　the　CPU　time　on　several　vector　computers　（five－point　discretization）．

　　　The　horizontal　a・xis　is　the　number　of　grid　points　and　the　vertical　axis　is　the　CPU　time

　　　in　seconds．

11．　The　domain　for　analysis．

12．　Comparison　of　the　CPU　time　on　several　vector　computers　（nine－point　discretization　in

　　　curved　mesh）．

13．　The　domain　for　analysis．

14．　The　grid　systems　of　eccentric　double　circles．

15．　The　number　of　iterations　needed　for　convergence　as　a　function　of　the　acceleration　pa一

rameter　for　（1）　pseudoSOR　and　（II）　hyperplane　SOR　method，　The　boundary　condition

is　Dirichlet　and　the　grid　systems　are　（a），　（b）　and　（c）　in　Fig．　14．

20

・，Ji　i）

　le5

雪

空

t6　i．o

誤

　　　　　　　　　　　O．5t（cos（g））＝v一？　一ii　＝o．437t一．．

11

0．0／

　1eO

1

／

／

1

P（瑠凸））

ノ

1

1

111

　　　　ロ　　　　i

ア（劇ω　　1））

　　　　i

　　　　i

　　　　：

1．5　2．0　u，）re　laxat　ion　factor

F　　i　g　．　1

一　ct

1

p　／x

j）t一’

　　　　（W　＝　U・）1）

F　i　g．　2

22

一輪

1

ρ2灰

Bi　1X

　　：　　i　　i’

　　／

　F　i　g．　3

x

一戸i！叉

一P　2M

23

1

噺吻

　　　2@　　i@　　i●圏幽　　」■●　　鯛■●　　■嗣）　●■画　一　　　1…フ、凶ω一λ

危1天

A

・巫

C‘Ci）

Fi　g．　4

24

Jl’

g（cos（g））一fg＝o．34i6．．．

1．5

も0 P（縄澱ω））

冨

L一鋤。 ■9■■DO■顧齢●■馳■一一■　■一■■爾　ロー　●願■・騨。■膠口■．陶

8紆　　　　　　　　　　　　　　｛

@　　　　　　　　　　　　　3

P／　P（趣））0．5 1　　　　　　　　　　　　　　　　　1

！1　　　　　　　　　　　　　　　　　　1

@　　　　　　　　　　　　　　62

46…33

（）0

@　1．0 1．5　　　　2Drelaxation　facto

F　　i　g　．　5

ほ邸

慶

℃リリ

ga

o』

ロ〇一一s’

翼，gi　IT，

鍵

静碓1．3ω箋i∵

の6；

℃　　の

の①』

コ〇一

“

fo一

章鶏

llω議

9i 、鍋慣ul　＝1．7

to　＝1．9

，

to　＝1．8

爲

o No．　of　iterations

（a）

，

50 o No．　of　iterations

（b）

，

so

F　i　g．　6

一

5℃　　　の

mの』

の〇一

”i

翼

鱒？3

担hヨω＝ωopt　　　　　　　　園口。」

覇補

to　＝1．0

　／

　争ω＝1・1to　tl．2

；ゴ＝1●鶴1・6

93

o No．　of　iterations 50

　e“；，　．F

蓉8塁

鉾ヨ2　　ヨ

？3

鷺蜀績

岱

CO　＝　（D　OPt

w　＝1．　5

to　＝1．4

ul　＝1．3

鎚

o No．　of　iterations so

（a）（b）

F　i　g．　7

コ

〇一 9

り㊦＝

で　　ロ

ca

o』

：，　g

9ヨ

醤

難9耳

翼。”，．．．

；

9

／」ωoptヂ

（D　＝1．0

　　　　to　＝L2to　＝1．　3

　邸，

での　　ロ

co

o』

一コ

補

雪当

日鷺電もヨ

ー｛ωopt

顎a

／

　’lx

to　tl．4

so

to　＝1．9

　　　　　　　　　　　　w　＝1i8（D　＝1e6　　　　　　to　＝1．7

圏

oNo．　of　iterations o No．’　of　iterations se

（a）（b）

F　i　g．　8

一，

℃　のり

の。』

o．，．．，

．，1

導

葺

翼12　i’

昏

〇一

　　　／

to　＝　to　opt

（D　＝1．0

　亦

（D　’＝　1．　2

の¢コ

℃　ロ

の①

』

　o肖

一ロ

0の甲

0り門ほ

〇一〒

oの頃1

の㌣

oの氏8

　oI

　3．ミ

g・　｛

tl　g

襲ちヨ

（D　＝1．9

　　　Nto　＝L8　　　　　a）　＝1．7

to　＝　w　opt

to　＝1．6

to　＝L4

叡　　w　＝1．3

“3一一一一’N一一一一一一一一一一一一一一一一一一一’r一一一’一一

R

，

o No．　ef　iterations

（a）

50 o No．　of　iterations

（b）

so

F　i　g一　9

8「一一一「6い＿。R／l

8［

6，i

41

2　1

mo

　　　　　1　04

（a）　vp－200

pseudo－SOR

　　　　　Hyperplane

　　　　　　1　04

（b）　VP－400E

8，　L，

4　［i

2［

　　h

2Xl　04

　　　20　Oi

2Xl　04

　　　200

pseudo－SOR

　　　　　　Hyperplane

8

6F

41　1

oL∠一＝＝＝＝＝＝＝＝＝＝＝コ　　　　　　　　　　104　2XIO4

　　　　（c）　S－820／80

　r

　ト

2［

o

pseudo－SOR

Hyperplane

200

　r

r　pseudo－soR　t

L．　／Hyperpianeo1

　　　　　1　04

（d）　SX－2

2Xl　04

　　　　L

　　　　L

　　　　L

　　　　L

　　　　l

loor　　　　江

　　　　t’

o

pseudo－SOR

　　　　　Hyperplane

l

ili　oo

ll

l

　　　　L

lOOY　　　　L

　　　　l

　　　　r

　　　　い

　　　o

pseudo－SOR

ffyperpianel

　　　　　　　　l

　　　　　1　04

（e）C－1　XP

2Xl　04 　　　　　1　04

（f）　TITAN

　　　　F　　i　g

　2Xl　04

．　1　0

　　　　104　2xlO4

（g）　Gslooo

fiii］

Y

X 扁

40u2。［

　　　　セ

　0

｛103

　　　　「

　　　　　r

　　　　　亡

1　03／！

　　　　　1

　　　　　「

　　　　　「

　　　　　の　　　　　l

　　　　　P

　　　O

，se、d。．s。R　40i

　　　　　　　　　　　Hyperplane

　　　　　　　　　　　　　　　　　　　　　lO

　　　　　　104　　　　　　　　2×104

（a）VP－200

　　　　　　　　　　　　　　　　　　　　　　　　　「

　　　　　　　　　　　　　　　　　　1　　「

　　　　　　　　　　　　　　　　　　　　　　　　　［

　　　　　　　　　　　　　　　　　　　　　　　　　に　　　　　　　　　　　　　　　　　　　　　　　　　t

・seudo－SOR @　　　　「　　　　　　　　　　　　　　　　　　　　　　　　　ロ　　　　　　　　　　　　　　　　　　　　　　　　　し

　　　　　　　　　　　　　　　　　　｝　　　0

　　　　　　104　　　　　　　2×104

（e）C－1　XP

　　　　　　　　　　　　2　Ot

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　／

　　　　　　　　　2×103「一一一「2×103し　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　ii

　　pseudo’SOR　／　1

　　　　　　　　　　　　　　　　　　　　　20

　　　　　　　　　　　Hyperplane　i　．

　　　　　　一t，　2xl　o／／

（b）　VP－400E

　　　　　　　　　　　　　　　　　　l

　　　　pseudo－SOR／　1　l　o3

　　　　　　　　　kyperpian6

　　　　　　　　　　　　　　　　　　i

　　　　　　　　　　　　　　　　　　　　　　　o

　　　　　　104　2xlO4

（f）　TITAN

　　　　　　F　　i　g　．　1　2

　　　　　　　　　　　　　　　　　　　　　i40

　　　　　　pseudo－soR　／20

　　　　　　　　　　　Hyperplane

　　　　　　　104　2XIO4　0

（c）　S“820／80

　　　　　　pseudo“SOR

　　　　　　　　Hyperp　lane

　　　　　　　104　2XIO4

　（g）　Gslooo

pseud6－soR　／zl

　　　　　　1　04

（d）　SX－2

Hyperplane　　　　　　　；

　　　　　2Xl　04

朗

Y

ヨ嵩

El：ti＃．　il”一：i，｝：ILtr’．？．J．］．1

螺こに園

海気雌遡i雌

耳隅司塾㌃．一一

　　　　　　鋼製総懸

一L・“U．1．“b

墾 X E3

F　i　g一　1　3

Y Y

X

N

X

（a）Y

（d）Y

X X

（b）（e）

X

（c）（f）

Fig一　1434

No．　of　iterations

300：’ic・＝r．m

1

200

100　　0e9

雛　、へし

@　　’⊆L　　　　　、

5　　　　　　　　　　　　　　　　　　　　　　　　　5　　　　　　　　　　　　　　　　　　　　　　　　　　　．

@　　　　　　　∫　　　　　　　　5

、口、　　、

@　　、口、、　、口、　　　、《國L－r■」　　　　　一

一：

1（b）　　　　　　　「

P

（a）

D

1eO 　　　　　　　lel　le2

　　aceeleration　parameter

Fi　g一　15　（1）

le3

霧

No．　of　iterations

50●1、

q　、　、

、、、　＼●

、、 11口　、！●

I

、自　ノ@ノ

獣＼

40、、b　、　’

@ノ@’

ﾋ●、

　　（c）、、　、　、

　　　1@　1@　！

@1鴻m

mヒ　、

Q（b）

、、　、　、

’

　　’

@ノS

b”

3

（a）

2望601．70

acceleration　parameter

F　　i　g． 1　5 （　II）

36

INSTITUTE　OF　INFORMATION　SCIENCES　AND　ELECTRONICS

　　　　　　　　　　　　　　UNIVERSITY　OF　TSUKVBA

　　　　　　　　　　　TSUKUBA－SHI，　IBARAKI　305　JAPAN

REPORTA

DOCUMENTATION　PAGE

REPORT　NUMBER

　　　　　　工SE－TR－90－h86

TITLEOn　the　E伍ciency　of　a：n SOR－like　Method　Suited　to　Vector

Processor　“

AUTHOR　（S）

Masaaki　Sugihara，　Department　of　Economics，　Hitotsubashi　University

Yoshio　Oyanagi，　lnstitute　of　lnformation　Sciences，　UniveTsity　of　Tsukuba

　　　　　　Masatake　Mori，Fαculty　o／EngineeTing，　Univers吻of　Tokyo

　　　　　　　　Seiji　Fujino，　lnstitute　of　Computational　Fluid　Dynamics

REPORT　DATEOctober　1990

MAIN　CATEGORY

numerica1 analysユS

NUMBER　OF　PAGES

　　　　36

CR　CATEGORIES

KEY　WORDS

　SOR，　vector processor，　acceleration　parameter，　hyperplane　method

ABSTRACT　　　　　　　　A・impl・impl・m・nt・ti…nv・ct・・p・・ce…rs・fth・SOR　m・th・d　f・・s・1・i・g　li・…ly・tem・f・quati・n・

　　　　　　arising　from　the　discretization　of　partial　differential　equations　makes　the　SOR　method　into　another，　which，

　　　　　　though，　looks　like　the　SOR　method・In　this　paper，　the．e伍ciency　of　this　SOR－like　method（pseudo－SOR

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　コ　　　　　　method）is　investigated．　For　the　Poisson　equation　in　a　rectangular　region，　in　both五ve－point　and　nine．point

　　　　　　discretization，　we　prove　analytically　that　the　optimal　acceleration　parameter　is　smalIer　and　the　optimal

　　　　　　convergence　rate　is　lower　in　the　pseudo－SOR　method．　Comparison　on　several　vector　computers　was

　　　　　　made　between　the　SOR　method　vectorized　with　the　hyperplane　technique　and　the　pseudo－SOR　method．

　　　　　　It　turned　out　that　the　hyperplane　SOR　is　superior　to　the　pseudo－SOR　method　although　the　latter　is

　　　　　　easily　vectorizable　on　vector　processors．　We　also　tested　an　example　of　Poisson　equation　descretized　in　a

　　　　　　curved　coordinates　in　a　region　bounded　by　two　eccentric　circles　and　found　numerically　that　the　optimal

　　　　　　acceleration　parameter　becomes　small　and　the　convergence　becomes　slow　in　the　pseudo－SOR　method，

　　　　　　while　the　optimal　acceleration　parameter　in　the　SOR　method　does　not　change　appreciably．　The　genuill

　　　　　　SOR　method　is　superior　to　the　pseudo－SOR　method．

SUPPLEMENTARY　NeTES　　　．lnvited　talk　presented　by　Oyanagi　at　lnternational　Congress　on　Computational　and　Applied　Mathematics　held　，at　Katholieke

　Universiteit　Leuven，　Belgium，　on　July　23－28，　1990．

1SE－TRd－90－86 On the Efficiency of an SOR－like Method … · 2020. 4. 1. · SOR is...

Documents

Transcript of 1SE－TRd－90－86 On the Efficiency of an SOR－like Method … · 2020. 4. 1. · SOR is...