
NCAR/TN-357+STR

NCAR TECHNICAL NOTE

February 1991

MUltigriD Software for Elliptic Partial Differential Equations

MUDPACK

John C. Adams


SCIENTIFIC COMPUTING DIVISION

NATIONAL CENTER FOR ATMOSPHERIC RESEARCH

BOULDER, COLORADO


Preface

This Technical Note describes the multigrid package MUDPACK which includes

Fortran subroutines for efficiently approximating the solution to a variety of linear

elliptic partial differential equations. The software was developed over the past three

years by the author at NCAR. Its purpose is to make the complex collection of

integrated numerical procedures known as "multigrid iteration" available in a user-

friendly form to atmospheric scientists and others.

The package is introduced in the first section. The second section outlines some of

the special features of MUDPACK. The third section describes contents of the pack-

age. Section four contains seven examples which illustrate software use and super-

computer performance on a variety of problems. A sample documentation is given in

the appendix.


Foreword

Dr. John Adams has invested a very substantial amount of effort in the development of MUDPACK, a general purpose software package for multigrid solution of elliptic partial differential equations. It is already in serious demand as a software tool in many scientific disciplines. I understand that the methods used are very efficient and robust multigrid algorithms that were carefully implemented and extensively tested for reliability and accuracy.

MUDPACK is a very important contribution to the Multigrid field in particular, and computational sciences in general. Multigrid is often perceived as difficult to implement, so it has been slow to find its way into applications software. John's work is therefore an especially important advance in providing the scientific community with access to multigrid technology.

Achi Brandt

Professor, Weizmann Institute


Table of Contents

Preface

Foreword

1. Introduction

2. Special Features
   Solving Linear Elliptic PDEs in a Variety of Forms
   Handling of General Boundary Conditions
   Ease of Input of the Continuous Problem
   Automatic Discretization of the Continuous Problem
   Use of Multigrid Iteration to Approximate the Discretization
   Selection of Multigrid Options
   Selection of the Relaxation Method
   Generating Second- and Fourth-Order Approximations
   Flexibility in Choosing Grid Size
   Availability of "Hybrid" Multigrid/Direct Method Solvers
   Availability of Subroutines to Compute Residuals
   No Initial Guess Requirement
   Non-initialization Calls
   Error Control
   Flagging of Errors Involving Input Parameters
   Output of Exact Minimal Work Space Requirements
   Extensive Documentation and Test Programs

3. MUDPACK Files
   File Description
   Obtaining Files
   Selecting Solvers

4. Examples
   Example 1. 2-D Separable Elliptic Equation
   Example 2. 2-D Nonseparable Elliptic Equation
   Example 3. Helmholtz Equation on the Sphere (One Degree Grid)
   Example 4. 3-D Separable Elliptic Equation
   Example 5. 3-D Helmholtz Equation in Spherical Coordinates
   Example 6. 3-D Anisotropic Elliptic Equation
   Example 7. Asymmetric Grid Size

Appendix

References

Acknowledgements


1. Introduction

MUDPACK is a collection of portable Fortran 77 subprograms, vectorized on Cray

computers, that efficiently solve linear elliptic partial differential equations (PDEs)

using multigrid iteration. The package was created to make multigrid methods

available in user friendly form. It is similar in format to the separable elliptic pack-

age FISHPAK ([4]). MUDPACK extends the domain of solvable problems to include

both separable and nonseparable PDEs. Detailed descriptions of the current and an

earlier version of MUDPACK are given in [1,6]. Some of this is repeated in the

technical note.

Multigrid iteration ([7,8,10,12,14,16]) combines classical iterative techniques, such as

Gauss-Seidel line or point relaxation, with subgrid refinement procedures to yield a

method superior to the iterative techniques alone. By iterating and transferring

approximations and corrections at subgrid levels, it can achieve a good initial guess

and rapid convergence at the fine grid level. Multigrid iteration requires less storage

and computation than direct methods for nonseparable elliptic PDEs and is competi-

tive with direct methods such as cyclic reduction ([4,9,20,21]) for separable equations.

In particular, three-dimensional problems can often be handled at reasonable compu-

tational cost.

The generality of the equations solved by MUDPACK may sometimes result in loss

of efficiency since hand-tailored coding is required to achieve optimal performance for

certain problems. It is hoped that this is compensated for by the package's ease of

use, applicability to a wide range of real problems (including those typically encountered

in geophysical fluid dynamics at NCAR [5]), and avoidance of repeated "re-inventions

of the wheel." Savings in code development time can be at least as important

as economic use of machine cycles. With careful selection of relaxation and

multigrid parameters, near optimal performance can often be obtained using MUD-

PACK software. See the examples in this document and in [2,6] for a variety of

problems where discretization level error (i.e., the same error level that a direct

method will reach in approximating the continuous solution) is reached in only one

full multigrid cycle using MUDPACK solvers.


2. Special Features of MUDPACK

Some of the special features MUDPACK software brings to bear in approximating

the continuous linear elliptic PDE (in operator form)

$$l(u) = f \qquad \text{(a)}$$

are listed and discussed in this section.

Solving Linear Elliptic PDEs in a Variety of Forms

These forms include real and complex, two- and three-dimensional, self-adjoint, and

separable and nonseparable. For example, the most general two dimensional PDE

solved can have the form:

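$$a(x,y)\,\frac{\partial^2 u}{\partial x^2} + b(x,y)\,\frac{\partial^2 u}{\partial x\,\partial y} + c(x,y)\,\frac{\partial^2 u}{\partial y^2} + d(x,y)\,\frac{\partial u}{\partial x} + e(x,y)\,\frac{\partial u}{\partial y} + f(x,y)\,u(x,y) = r(x,y) \qquad \text{(b)}$$

on the region $A \le x \le B$, $C \le y \le D$. The coefficients a, b, c, d, e, f; the right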

hand side r; and u are all real or complex valued functions of the independent vari-

ables x, y.

The solution regions are rectangular in the sense that the domain of each indepen-

dent variable must be a bounded interval on the real line. This means that curvi-

linear coordinate systems such as spherical or cylindrical coordinates are acceptable.

The codes are not restricted to Cartesian coordinates (see Examples 3 and 5).

Handling of General Boundary Conditions

Any combination of periodic, specified (Dirichlet), and mixed derivative boundary

conditions is allowed. For example, in (b) u(x,y) can satisfy any one of the following

boundary conditions at x = A:

(1) u(x+B-A,y) = u(x,y) for all x, y (periodic in x)

(2) u(A,y) is specified for all y (Dirichlet)

(3) $\alpha(y)\,\partial u/\partial x + \beta(y)\,\partial u/\partial y + \gamma(y)\,u(A,y) = g(y)$ for all y

Similar boundary conditions, in any combination, are allowed at the other boundaries.

Pure tangential derivatives (e.g., $\alpha(y) = 0$ at any point in (3)) are not

allowed and result in a fatal error flag.

Ease of Input of the Continuous Problem

User defined input subroutines are the mechanisms for passing PDE coefficients and

boundary conditions. For example, a SUBROUTINE COF(X,Y,A,B,C,D,E,F) is used

to input the coefficients in (b) at any grid point (x,y). A SUBROUTINE

BNDY(KBDY,XORY,ALFA,BETA,GAMA,GBDY) is used to input oblique mixed

derivative conditions for (b). KBDY=1,2,3, and 4 identify the x = A, x = B,

y = C, and y = D boundaries. For example, if BNDY is called with KBDY=1 then

XORY will input the current y value and ALFA, BETA, GAMA, GBDY should output

values for $\alpha(y)$, $\beta(y)$, $\gamma(y)$, $g(y)$.

Automatic Discretization of the Continuous Problem

The discretization is transparent to a user who only needs to supply the PDE, boun-

dary conditions, and grid size information. Standard second-order finite difference

formulas, on a uniform grid G superimposed on the solution region, are used to

approximate the partial derivatives in (a). The result is a linear system of equations

$$L\,U = F \qquad \text{(c)}$$

where the coefficient matrix L is block tridiagonal. The coefficients multiplying the

second partial derivatives in the PDE are adjusted during discretization at coarser

grid levels if there are nonzero first-order coefficients which would destroy the diago-

nal dominance of L. This helps preserve convergence of the relaxation schemes.

The internal discretization process is illustrated by outlining part of it for the PDE in

(b). Assume that an equally spaced n by m grid is superimposed on the rectangle

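$[A,B] \times [C,D]$ and let

$$\Delta x = (B-A)/(n-1), \qquad \Delta y = (D-C)/(m-1)$$

be the grid increments in the x and y direction respectively. Then (x(i),y(j)) is the

solution grid where $x(i) = A + (i-1)\Delta x$ for i = 1,...,n and $y(j) = C + (j-1)\Delta y$

for j = 1,...,m. Let U(i,j) approximate the continuous solution u(x(i),y(j)). The

standard second order finite difference formulas

$$\begin{aligned}
\partial^2 u/\partial x^2 &\approx \big(U(i+1,j) - 2U(i,j) + U(i-1,j)\big)/\Delta x^2 \\
\partial^2 u/\partial x\,\partial y &\approx \big(U(i+1,j+1) + U(i-1,j-1) - U(i+1,j-1) - U(i-1,j+1)\big)/(4\,\Delta x\,\Delta y) \\
\partial^2 u/\partial y^2 &\approx \big(U(i,j+1) - 2U(i,j) + U(i,j-1)\big)/\Delta y^2 \\
\partial u/\partial x &\approx \big(U(i+1,j) - U(i-1,j)\big)/(2\,\Delta x) \\
\partial u/\partial y &\approx \big(U(i,j+1) - U(i,j-1)\big)/(2\,\Delta y)
\end{aligned}$$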

are substituted into (b) on the interior. This gives the 9 point stencil

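$$\begin{aligned}
& c_1(i,j)\,U(i+1,j) + c_2(i,j)\,U(i+1,j+1) + c_3(i,j)\,U(i,j+1) \\
{}+{} & c_4(i,j)\,U(i-1,j+1) + c_5(i,j)\,U(i-1,j) + c_6(i,j)\,U(i-1,j-1) \\
{}+{} & c_7(i,j)\,U(i,j-1) + c_8(i,j)\,U(i+1,j-1) + c_9(i,j)\,U(i,j) = r(i,j)
\end{aligned}$$

where

$$\begin{aligned}
(x,y) &= (x(i),y(j)), \qquad r(i,j) = r(x,y) \\
c_1(i,j) &= a(x,y)/\Delta x^2 + d(x,y)/(2\,\Delta x) \\
c_2(i,j) &= b(x,y)/(4\,\Delta x\,\Delta y) \\
c_3(i,j) &= c(x,y)/\Delta y^2 + e(x,y)/(2\,\Delta y) \\
c_4(i,j) &= -b(x,y)/(4\,\Delta x\,\Delta y) \\
c_5(i,j) &= a(x,y)/\Delta x^2 - d(x,y)/(2\,\Delta x) \\
c_6(i,j) &= b(x,y)/(4\,\Delta x\,\Delta y) \\
c_7(i,j) &= c(x,y)/\Delta y^2 - e(x,y)/(2\,\Delta y) \\
c_8(i,j) &= -b(x,y)/(4\,\Delta x\,\Delta y) \\
c_9(i,j) &= f(x,y) - 2\big(a(x,y)/\Delta x^2 + c(x,y)/\Delta y^2\big)
\end{aligned}$$

for $1 < i < n$ and $1 < j < m$. The "virtual" points

$$x(0) = A - \Delta x, \qquad y(0) = C - \Delta y$$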

are used to center difference formulas along the boundaries x = A, y = C. For

example, if u(x,y) satisfies the corner mixed derivative boundary conditions:

$$\alpha_A(y)\,\partial u/\partial x + \beta_A(y)\,\partial u/\partial y + \gamma_A(y)\,u(A,y) = g_A(y) \qquad (\text{at } x = A)$$

$$\alpha_C(x)\,\partial u/\partial x + \beta_C(x)\,\partial u/\partial y + \gamma_C(x)\,u(x,C) = g_C(x) \qquad (\text{at } y = C)$$

then the second order finite difference approximations

$$\partial u/\partial x \approx \big(U(2,j) - U(0,j)\big)/(2\,\Delta x) \qquad (x = A)$$

$$\partial u/\partial y \approx \big(U(i,2) - U(i,0)\big)/(2\,\Delta y) \qquad (y = C)$$

are used. This permits the elimination of the unknowns U(i,0), U(0,j) from the

discretization of (b) and gives a 2 by 2 system for U(1,0), U(0,1). The latter will

lead to a division by zero if the determinant quantity

$$\alpha_A(C)\,\beta_C(A) - \alpha_C(A)\,\beta_A(C) = 0.$$

Consequently this is flagged as a fatal singular error condition. At the nonspecified

corner (A ,C), the special second order formula,

$$\partial^2 u/\partial x\,\partial y \approx \big(U(2,1) + U(0,1) + U(1,0) + U(1,2) - 2U(1,1) - U(0,2) - U(2,0)\big)/(2\,\Delta x\,\Delta y)$$

which eliminates the need to compute U(0,0), is used ([13]).

If lexicographic ordering is employed and

U = (U(1,1),...,U(n,1),...,U(1,m),...,U(n,m))

R = (r(1,1),...,r(n,1),...,r(1,m),...,r(n,m))
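The interior stencil coefficients $c_1,\dots,c_9$ listed above can be sketched in Python as follows. This is an illustration only, not MUDPACK code: the function name and argument order are hypothetical, and the PDE coefficients are assumed to have already been evaluated at the grid point (x(i),y(j)).

```python
def stencil_coefficients(a, b, c, d, e, f, dx, dy):
    """Return (c1, ..., c9) for the interior 9-point stencil of PDE (b).

    a..f are the PDE coefficients evaluated at one grid point;
    dx, dy are the grid increments. Hypothetical helper for illustration.
    """
    c1 = a / dx**2 + d / (2 * dx)         # multiplies U(i+1, j)
    c2 = b / (4 * dx * dy)                # multiplies U(i+1, j+1)
    c3 = c / dy**2 + e / (2 * dy)         # multiplies U(i, j+1)
    c4 = -b / (4 * dx * dy)               # multiplies U(i-1, j+1)
    c5 = a / dx**2 - d / (2 * dx)         # multiplies U(i-1, j)
    c6 = b / (4 * dx * dy)                # multiplies U(i-1, j-1)
    c7 = c / dy**2 - e / (2 * dy)         # multiplies U(i, j-1)
    c8 = -b / (4 * dx * dy)               # multiplies U(i+1, j-1)
    c9 = f - 2 * (a / dx**2 + c / dy**2)  # multiplies U(i, j)
    return (c1, c2, c3, c4, c5, c6, c7, c8, c9)
```

A quick consistency check: applied to a constant grid function U = 1, the stencil must reproduce f(x,y), so the nine coefficients should sum to f.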

then the substitutions above give the linear system of equations (c). The same pro-

cess is performed at all subgrid levels (see the next feature). The diagonal elements

of the coefficient matrix at any grid level contain the terms (see $c_9(i,j)$ above)

$$-2\big(a(x,y)/\Delta x^2 + c(x,y)/\Delta y^2\big)$$

while the off diagonal elements have the form

$$a(x,y)/\Delta x^2 \pm d(x,y)/(2\,\Delta x)$$

and

$$c(x,y)/\Delta y^2 \pm e(x,y)/(2\,\Delta y)$$

Consequently, if

$$0.5\,\Delta x\,|d(x,y)| > a(x,y)$$

or

$$0.5\,\Delta y\,|e(x,y)| > c(x,y)$$

then diagonal dominance of the coefficient matrix may be lost. This is more likely at

coarser grid levels in the presence of nonzero first order terms. If corrective action is

not taken, this can inhibit or even prevent convergence of the multigrid iteration.

Consequently the replacements

$$a(x,y) = \max\big(0.5\,\Delta x\,|d(x,y)|,\ a(x,y)\big)$$

and

$$c(x,y) = \max\big(0.5\,\Delta y\,|e(x,y)|,\ c(x,y)\big)$$

are made. If this changes the diagonal element at the finest grid level, a nonfatal

error condition is flagged. It may be remedied by increasing resolution. If effected at

coarser grid levels, this is equivalent to replacing the PDE with a first order approximation.

Convergence and accuracy at the finest grid level are preserved. If the

zero order term has the wrong sign then this can also destroy diagonal dominance

(e.g., f(x,y) > 0 in (b)). There is currently no correction for this in MUDPACK.

In principle, coarse grid coefficient adjustments similar to those described for nonzero

first order terms could also be made.
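The coefficient replacement described above can be sketched in Python. This is a minimal illustration assuming positive second-order coefficients a and c; the function name and the returned flag are hypothetical and do not reproduce MUDPACK's internal interface.

```python
def adjust_for_diagonal_dominance(a, c, d, e, dx, dy):
    """Apply a = max(0.5*dx*|d|, a) and c = max(0.5*dy*|e|, c).

    Returns the adjusted (a, c) and a flag telling whether either value
    changed (the situation the text flags as nonfatal on the finest grid).
    """
    a_adj = max(0.5 * dx * abs(d), a)
    c_adj = max(0.5 * dy * abs(e), c)
    changed = (a_adj != a) or (c_adj != c)
    return a_adj, c_adj, changed
```

After the adjustment both off diagonal terms a/dx**2 - |d|/(2*dx) and c/dy**2 - |e|/(2*dy) are nonnegative. Because the test compares 0.5*dx*|d| with a, halving the grid increment makes the replacement less likely, which is why increasing resolution can remedy the flag.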


Use of Multigrid Iteration to Approximate the Discretization Equations

This is the essential feature of the MUDPACK software. Introductions to multigrid

iteration are given in [8,10,14]. We give an abbreviated and simplified basic descrip-

tion here. Assume

G(0) < ... < G(s) < ... < G(t) = G

is an ascending chain of subgrids terminating in G and let

L(s) U(s) = F(s)

denote the discretization of the continuous PDE (a) on G(s) for each s=0,...,t. Ordi-

narily G(s) is an "every other point subset" of G(s+l) which includes boundaries.

Let I(s-1,s) denote a "prolongation" operator for transferring grid values from

G(s-1) to G(s) and let I(s,s-1) denote a "restriction" operator for transferring grid

values from G(s) to G(s-1). Linear or cubic interpolation are used in defining

I(s-1,s). The identity restriction mapping is the most obvious choice for I(s,s-1)

but this works poorly in practice. Some form of weighted averaging is necessary.

In steps 1-6 we describe how an initial guess U(s) on the grid G(s) can be inexpen-

sively improved using the subgrid G(s-1). This is called the "coarse grid correction"

algorithm and is the heart of multigrid iteration as implemented in MUDPACK.

(take f(s) = F(s))

Step 1. Perform relaxation sweeps (usually one or two in practice) on

L(s) U(s) = f(s)

Step 2. Compute the residual

R(s) = F(s)-L(s) U(s)

(Clearly, if we solve the residual equation L (s) E(s) = R (s) exactly then

U(s) = U(s) + E(s) is the exact solution. So we approximate E(s) economically

using the subgrid G(s-1)).

Step 3. Restrict the residual to G(s-1)


f (s-1) = I(s,s-1) R(s)

Step 4. "Solve"

L(s-1) E(s-1) = f(s-1)

Step 5. Prolong the correction E(s-1) to G(s) and add to U(s)

U(s) = U(s) + I(s-1,s) E(s-1)

(If I(s-1,s) E(s-1) is a good approximation to E(s) then the result should be an

improved approximation in U(s)).

Step 6. Perform more relaxation sweeps (usually one in practice) on the new

L(s) U(s) = F(s)

With recursion, steps 1-6 can be used within step 4 when "solving" for the correction

term on G(s-1). In this way, all the subgrids G(s-1), G(s-2),..., G(0) are brought

into play in improving the approximation on G(s). When s =0 the linear system

can be solved with a direct method or approximated with the relaxation sweeps in

steps 1 and 6. After step 6, the improved approximation in U(s) can be prolonged to

serve as an initial guess at G(s+1).

Step 7. Prolong U(s) to G(s+1)

U(s+1) = I(s,s+1) U(s)

Repeating steps 1-7 for s = 0,1,...,t-1 lifts the approximation to the finest grid level

and is called a Full Multigrid Cycle. Once the fine grid G(t) is reached, steps 1-6

can be repeated with s = t until the desired accuracy is achieved. In practice, con-

vergence to discretization level error should occur in a few (1-3) cycles. Achieving

this rapid convergence requires error free coding and selection of the "correct" under-

lying numerical schemes. For example, the relaxation method used and form of the

prolongation and restriction operators can be crucial in obtaining good algorithm per-

formance.
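Steps 1-6 can be sketched in Python for a one-dimensional model problem. This is an illustrative sketch, not MUDPACK code. It assumes the model equation -u'' = f on [0,1] with zero Dirichlet values, a grid of 2**k + 1 points, lexicographic Gauss-Seidel relaxation, full-weighting restriction and linear prolongation; all function names are hypothetical. Step 7 (full multigrid) is omitted for brevity.

```python
import math

def relax(u, f, h, sweeps):
    """Gauss-Seidel sweeps for -u'' = f; the end values of u stay fixed."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    """R = f - L u for the 3-point discretization of -u''."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting onto the every-other-point subgrid."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for j in range(1, nc - 1):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong(ec):
    """Linear interpolation from a coarse grid to the next finer grid."""
    e = [0.0] * (2 * (len(ec) - 1) + 1)
    for j in range(len(ec)):
        e[2 * j] = ec[j]
    for j in range(len(ec) - 1):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e

def coarse_grid_correction(u, f, h):
    """Steps 1-6 applied recursively, i.e. one V(2,1) cycle, in place."""
    if len(u) == 3:                       # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    relax(u, f, h, 2)                     # step 1: two relaxation sweeps
    r = residual(u, f, h)                 # step 2: residual
    fc = restrict(r)                      # step 3: restrict residual
    ec = coarse_grid_correction([0.0] * len(fc), fc, 2.0 * h)  # step 4: "solve"
    e = prolong(ec)                       # step 5: prolong and add correction
    for i in range(1, len(u) - 1):
        u[i] += e[i]
    relax(u, f, h, 1)                     # step 6: one more relaxation sweep
    return u

def demo(n=65, cycles=8):
    """Solve -u'' = pi^2 sin(pi x); return (max error, residual reduction)."""
    h = 1.0 / (n - 1)
    x = [i * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    r0 = max(abs(v) for v in residual(u, f, h))
    for _ in range(cycles):
        coarse_grid_correction(u, f, h)
    rk = max(abs(v) for v in residual(u, f, h))
    err = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n))
    return err, rk / r0
```

Calling `demo()` returns the maximum error against the exact solution sin(pi x) and the ratio of final to initial residual. Once the residual has dropped far enough, the error stalls at the discretization level (of order h**2), which illustrates why driving the residual to roundoff is usually wasted work.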

Optimal multigrid iteration will converge in $O(n^2)$ operations on n by n grids in

two dimensions. Much of the theoretical work has concentrated on proving this

[7,12,16]. By way of comparison, the direct solution of (c) on n by n grids requires

$O(n^2 \log(n))$ operations for separable PDEs with cyclic reduction [9,20,21] and $O(n^4)$

operations for nonseparable PDEs with banded Gaussian elimination. The potential

improvement with three-dimensional problems is even greater.

A heuristic description of why multigrid iteration works focuses on the Fourier components

of the error in the approximation at the finest grid level. The higher frequencies

in the error are rapidly damped by standard relaxation procedures. The

lower frequency components are damped slowly by the fixed fine grid relaxation thus

reducing the overall convergence rate. The transference to coarser grids with mul-

tigrid iteration effectively raises the relative frequency of the remaining fine grid

error components where they too are rapidly damped. The coarsest grid must be

sparse for removal of the lowest frequency fine grid error components with relaxa-

tion.

Selection of Multigrid Options

The current version of MUDPACK [6] has options for implementing variants of mul-

tigrid iteration and default options for those preferring black box solvers. The default

options were chosen for robustness and set cubic prolongation, fully weighted residual

restriction, and W(2,1) cycling. The earlier version of MUDPACK described in [1,2]

only allowed V(2,1) cycling with linear prolongation. This is still available as a possi-

bly more efficient choice for certain problems. More general cycling including F

cycles [15] is also available. The description of the integer vector parameter

MGOPT in the appendix of this document contains more discussion of multigrid

options. The grid schedules in executing one full multigrid cycle (FMG) for both

V(2,1) and W(2,1) cycles are illustrated for a four level grid in figure 1. The number

of relaxation sweeps executed at each grid level is recorded.

Figure 1. Grid schedules for one full multigrid cycle (FMG) on a four-level grid (level 1 coarsest, level 4 finest): one FMG with V(2,1) cycles (top) and one FMG with W(2,1) cycles (bottom).

Selection of the Relaxation Method

A relaxation menu is provided. It includes vectorized Gauss-Seidel schemes [11] on

alternating points (red/black), lines (in any combination of directions) and planes (for

three-dimensional anisotropic elliptic PDEs [22]). Choice of the correct method for a

particular problem can be crucial. It depends on the relative grid and PDE

coefficient size. Usually this can be pre-determined. Sometimes experimentation is

required. Advice on method selection is given in the documentation. For example,

suppose we wish to solve a three-dimensional problem of the form:

$$a(x,y,z)\,\frac{\partial^2 u}{\partial x^2} + b(x,y,z)\,\frac{\partial^2 u}{\partial y^2} + c(x,y,z)\,\frac{\partial^2 u}{\partial z^2} = r(x,y,z).$$

Let A, B, C denote the quantities $a(x,y,z)/\Delta x^2$, $b(x,y,z)/\Delta y^2$, $c(x,y,z)/\Delta z^2$ where

$\Delta x$, $\Delta y$, $\Delta z$ are the x, y, z grid increments. Choice of the method proceeds as follows:

(0) If $A \approx B \approx C$ then choose point relaxation

(1) If $A \gg B \approx C$ then choose line relaxation in the x direction

(2) If $B \gg A \approx C$ then choose line relaxation in the y direction

(3) If $C \gg B \approx A$ then choose line relaxation in the z direction

(4) If $A \approx B \gg C$ then choose line relaxations in the x and y directions

(5) If $A \approx C \gg B$ then choose line relaxations in the x and z directions

(6) If $C \approx B \gg A$ then choose line relaxations in the y and z directions

(7) If $A \approx B \approx C$ and these quantities vary considerably then choose

line relaxations in the x and y and z directions

(8) If $A \gg B \gg C$ or $B \gg A \gg C$ then choose x-y planar relaxation

(9) If $A \gg C \gg B$ or $C \gg A \gg B$ then choose x-z planar relaxation

(10) If $C \gg B \gg A$ or $B \gg C \gg A$ then choose y-z planar relaxation

In (8),(9) and (10), point or line relaxation in the plane can be set. Point relaxation is

the least expensive of all methods in cost per multigrid cycle and coefficient storage

and should be used when appropriate. Line relaxation requires computation, factori-

zation and storage of tridiagonal matrices (pentadiagonal if boundary conditions are

periodic). Planar relaxation uses full two-dimensional multigrid cycling on each plane

and is expensive to implement. Nevertheless it solves three-dimensional anisotropic

PDEs which are not otherwise tractable (see Example 6 and [22]).
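The selection rules (0)-(6) and (8)-(10) above can be sketched as a small Python function. This is an illustration under stated assumptions, not part of MUDPACK: A, B, C are taken to be positive numbers at a single representative grid point, "much greater than" is modeled by a hypothetical ratio threshold `much_greater` (10 by default), and rule (7), which concerns strongly varying coefficients, is not covered.

```python
def choose_relaxation(A, B, C, much_greater=10.0):
    """Suggest a relaxation method from A = a/dx^2, B = b/dy^2, C = c/dz^2.

    much_greater is an assumed threshold for the ">>" relation in the text.
    """
    sizes = {"x": A, "y": B, "z": C}
    ordered = sorted(sizes, key=sizes.get, reverse=True)  # largest first
    v1, v2, v3 = (sizes[name] for name in ordered)
    top_gap = v1 >= much_greater * v2   # largest >> middle
    low_gap = v2 >= much_greater * v3   # middle >> smallest
    if top_gap and low_gap:
        # rules (8)-(10): plane spanned by the two dominant directions
        return "planar relaxation in the " + "-".join(sorted(ordered[:2])) + " plane"
    if top_gap:
        # rules (1)-(3): a single dominant direction
        return "line relaxation in " + ordered[0]
    if low_gap:
        # rules (4)-(6): two comparable dominant directions
        return "line relaxation in " + " and ".join(sorted(ordered[:2]))
    return "point relaxation"           # rule (0)
```

The threshold is a modeling choice; as the text notes, experimentation is sometimes required when the ratios are borderline.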


Generating Second- and Fourth-order Approximations

Second-order finite difference approximations are generated on uniform grids super-

imposed on the solution region. These can be improved to fourth-order estimates

using "deferred corrections" ([17]). We briefly review this technique in one dimension.

The extension to higher dimensions is straightforward. Suppose we wish to solve

the PDE in (a) and have obtained the linear system (c) on a one dimensional grid of

size h. We can solve this to discretization level error using multigrid iteration. The

truncation error

t =F -L u

measures how closely the exact continuous solution satisfies the discretization equa-

tions. Simple Taylor series arguments show that t has the form

$$t = h^2\,(c\,u_{xxx} + d\,u_{xxxx}) + O(h^4)$$

where c, d are known coefficients from the elliptic equation. If U satisfies the

discretization equations exactly, then it can be used to generate second-order approximations

to $u_{xxx}$ and $u_{xxxx}$. For example if the uniform grid on the solution region

[a,b] is

a = x(1) < ... < x(i) < ... < x(n) = b

and U(i) is the approximation at x(i) then the following difference formula can be

used to approximate the third and fourth partial derivatives of u(x):

at $x = a$

$$u_{xxx} = \big(-5U(1) + 18U(2) - 24U(3) + 14U(4) - 3U(5)\big)/(2h^3) + O(h^2)$$

$$u_{xxxx} = \big(3U(1) - 14U(2) + 26U(3) - 24U(4) + 11U(5) - 2U(6)\big)/h^4 + O(h^2)$$

at $x = a + h$

$$u_{xxx} = \big(-3U(1) + 10U(2) - 12U(3) + 6U(4) - U(5)\big)/(2h^3) + O(h^2)$$

$$u_{xxxx} = \big(2U(1) - 9U(2) + 16U(3) - 14U(4) + 6U(5) - U(6)\big)/h^4 + O(h^2)$$

at $x = x(i)$ where $a + h < x(i) < b - h$

$$u_{xxx} = \big(-U(i-2) + 2U(i-1) - 2U(i+1) + U(i+2)\big)/(2h^3) + O(h^2)$$

$$u_{xxxx} = \big(U(i-2) - 4U(i-1) + 6U(i) - 4U(i+1) + U(i+2)\big)/h^4 + O(h^2).$$

Similar difference formulae are used at x = b-h and x = b. These can be obtained

by using the Fortran difference formula generators described in [3]. The necessary

difference equations are encoded in the MUDPACK fourth order solvers. If we

denote all of these by the difference operators

$$\delta_3(U) = u_{xxx} + O(h^2)$$

$$\delta_4(U) = u_{xxxx} + O(h^2)$$

and let

$$T = h^2\,\big(c\,\delta_3(U) + d\,\delta_4(U)\big),$$

then

$$T = t + O(h^4).$$

The fourth-order truncation error estimate T is computed and passed down to

coarser grids using weighted averaging. Then one full multigrid cycle is used to solve

the correction equation

$$L\,E = -T.$$

Since E is an $O(h^4)$ approximation to the exact error

$$e = u - U = L^{-1}(-t)$$

it follows that

$$V = U + E$$

yields a fourth-order approximation in V. Another related and effective method for

generating higher-order approximations with multigrid is $\tau$-extrapolation (see [7]).

The use of higher-order stencils is investigated in [19].
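The difference formulas used in the deferred correction step can be checked numerically. The Python sketch below encodes the interior and left-boundary formulas for the third and fourth derivatives and is only an illustration: the function names are hypothetical, and Python indices start at 0, so `U[0]` corresponds to U(1) in the text.

```python
def third_derivative_interior(U, i, h):
    """Centered second-order formula for u_xxx at interior point i."""
    return (-U[i - 2] + 2 * U[i - 1] - 2 * U[i + 1] + U[i + 2]) / (2 * h**3)

def fourth_derivative_interior(U, i, h):
    """Centered second-order formula for u_xxxx at interior point i."""
    return (U[i - 2] - 4 * U[i - 1] + 6 * U[i] - 4 * U[i + 1] + U[i + 2]) / h**4

def third_derivative_left(U, h):
    """One-sided second-order formula for u_xxx at the left boundary x = a."""
    return (-5 * U[0] + 18 * U[1] - 24 * U[2] + 14 * U[3] - 3 * U[4]) / (2 * h**3)

def fourth_derivative_left(U, h):
    """One-sided second-order formula for u_xxxx at the left boundary x = a."""
    return (3 * U[0] - 14 * U[1] + 26 * U[2] - 24 * U[3]
            + 11 * U[4] - 2 * U[5]) / h**4
```

Because each formula is second-order accurate, it differentiates low-degree polynomials exactly up to roundoff. For example, applied to samples of x**3 the third-derivative formulas return 6, and applied to samples of x**4 the fourth-derivative formulas return 24.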


Flexibility in Choosing Grid Size

Second- and fourth-order approximations are generated on uniform l by m by n

grids superimposed on boxes in three dimensions or l by m grids superimposed on

rectangles in two dimensions. The grid sizes have the form:

$$l = p\,2^{i} + 1$$

$$m = q\,2^{j} + 1$$

$$n = r\,2^{k} + 1$$

where p, q, and r are integers greater than 1 and i , j , and k are nonnegative

integers. In earlier versions of MUDPACK, i = j = k was required. Since p, q, r

should be small for effective error reduction with multigrid iteration, the old con-

straint was not well suited for asymmetric grid sizes (see Examples 1,5 and 7). An

earlier MUDPACK requirement that p (q, r) must be greater than 2 when line

relaxation in the x (y, z) direction is used and the x (y, z) boundary condition is

periodic has also been removed.

Let G denote the l by m by n fine grid. In MUDPACK, multigrid iteration is

implemented on the ascending chain of grids

G(0) < ... < G(s) < ... < G(t) = G,

where t = max(i,j,k) and each G(s) for s=0,...,t has l(s) by m(s) by n(s) grid

points given by

$$l(s) = p\,2^{\max(s+i-t,\,0)} + 1$$

$$m(s) = q\,2^{\max(s+j-t,\,0)} + 1$$

$$n(s) = r\,2^{\max(s+k-t,\,0)} + 1.$$

The coarsest grid, G(0), has p+1 by q+1 by r+1 points and should be as small as

possible within grid size constraints for effective error reduction with multigrid iteration.

Each of p, q and r should be 2 or a small odd value since even values


greater than 2 can be reduced by increasing i, j, or k . Large values for p (q, r)

can reduce the convergence rate even if line relaxation in the x (y, z) directions is

chosen (see Example 3). In two dimensions, larger values for p or q cause no problem

if one of the "hybrid" solvers (discussed below) is used (see Examples 2 and 3).
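The grid chain defined by the formulas for l(s), m(s), n(s) can be tabulated with a short Python function. This is a sketch for illustration; the function name is hypothetical and the arguments are the integers p, q, r, i, j, k from the text.

```python
def grid_chain(p, q, r, i, j, k):
    """Return [(l(s), m(s), n(s)) for s = 0..t] with t = max(i, j, k)."""
    t = max(i, j, k)
    chain = []
    for s in range(t + 1):
        chain.append((p * 2 ** max(s + i - t, 0) + 1,
                      q * 2 ** max(s + j - t, 0) + 1,
                      r * 2 ** max(s + k - t, 0) + 1))
    return chain
```

For an asymmetric choice such as p=2, q=3, r=2 with i=4, j=2, k=1, the chain runs from the 3 by 4 by 3 coarsest grid to the 33 by 13 by 5 fine grid. A direction with the smaller exponent stays at its coarsest resolution until the other directions have caught up, which is what allows unequal i, j, k.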

Availability of "Hybrid" Multigrid/direct Method Solvers

The certainty of direct methods is combined with the efficiency of multigrid iteration

in "hybrid" multigrid/direct method solvers in MUDPACK. This has been done for

two-dimensional nonseparable elliptic equations. Separable PDEs can be approxi-

mated with cyclic reduction [4,9,20,21] if a direct method is required. The hybrid

solvers use block Gaussian elimination whenever the coarsest grid level is encountered

within multigrid cycling. This provides additional grid size flexibility by eliminating

the usual constraint that p, q be small (see Examples 2 and 3) and provides a

natural way to compare solutions from direct and iterative schemes. In the extreme

case of l = p+1 and m = q+1, the hybrid codes become direct method solvers. The

use of Gaussian elimination requires approximately 4(p+1)(p+1)(q+1) additional

words of storage if periodic boundary conditions are set in the y direction and

approximately 2(p+1)(p+1)(q+1) additional words of storage if periodic boundary

conditions are not set in the y direction. If there are l = m = n grid points in each

direction, then balance between multigrid iteration, which is an $O(n^2)$ algorithm, and

the direct method, which requires $O(p^4)$ operations for solution on the coarsest grid,

is roughly achieved when

$$p^4 \approx \frac{n^4}{2^{4k}} \approx n^2.$$

This holds when

$$k = \log_2(n)/2$$

grid levels are used before switching to the direct method. Choosing

$$p \approx n^{1/2}$$

will achieve rough parity between the direct and iterative parts of the hybrid algo-

rithm. Larger values for p mean the direct method will dominate the computation

while smaller values will only marginally increase the cost of multigrid iteration.

Availability of Subroutines to Compute Residuals

Subroutines to compute fine grid residual after calling any of the second-order solvers

are provided. The residual measures how well the current approximation satisfies the

linear system of equations coming from the discretization. If we consider the linear

system (c) and let

u be the exact solution to (a) evaluated on G,

U be the exact solution to (c),

V(n) be the approximation to U after n multigrid cycles

then the residual is given by

R(n)=F-L V(n).

The ratios

$$\|R(n+1)\| \,/\, \|R(n)\|$$

provide a convenient estimate of the convergence rate of multigrid iteration. A com-

mon measure of multigrid efficiency is to achieve discretization level error, defined by

$$\|U - V(n)\| < \|U - u\|,$$

in one full multigrid cycle with no initial guess [2]. In such cases, one cannot expect

R(1) to be reduced to the level of roundoff error. Consequently, the norm of the

residual is a conservative measure of accuracy which can be wasteful if multigrid

cycles are executed until it reaches roundoff level error.

No Initial Guess Requirement

Unlike the case with classical iterative schemes, initial guesses are not necessary and

should not be supplied unless they are very good (as, for example, when restarting


multigrid iteration using an approximation generated earlier). Full multigrid cycling

[7], beginning at the coarsest grid level, should be used when there is no good initial

guess. Caution should be exercised with time-dependent marching problems where

one is tempted to use the previous time step solution u(t) as an initial guess to

u(t+dt). A better choice is to solve for the time correction term e(t,dt) = u(t+dt)-

u(t) using multigrid cycling which commences at the coarsest grid level (an initial

guess of zero or u(t)-u(t-dt) can be set there). If (in operator notation)

$$l(u(t)) = r(t), \qquad l(u(t+dt)) = r(t+dt)$$

then

$$l(e(t,dt)) = r(t+dt) - r(t)$$

and the discretization for e(t,dt) is the same as for u(t). Time dependent boundary

conditions for u(t) and u(t+dt) transfer similarly to e(t,dt). A few digits of accuracy

in e(t,dt), which is ordinarily much smaller than u(t+dt), yields several additional

digits of accuracy in the final approximation u(t+dt) = u(t) + e(t,dt). Using

this approach to integrate in time will give more accuracy than using u(t) as an initial

guess at the finest grid level.

Non-initialization Calls

Redundant discretization and matrix factorization processes can and should be

bypassed on recalls to the software. For example, this happens when only the right-

hand side array has changed from a previous call or when more multigrid cycles are

needed for additional accuracy.

Error Control

Maximum relative error can be used to monitor convergence. Use of error control is

optional and requires additional storage and computation. If u(n), u(n+1) are the
last two computed iterates and $\epsilon$ is the tolerance then

$$ \| u(n+1) - u(n) \| < \epsilon \, \| u(n+1) \| $$

is the stopping criterion in error control. The number of relaxation sweeps at the
finest grid level should not be large if the multigrid iteration is working correctly.
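The stopping test can be sketched as follows. This is a minimal illustration, not MUDPACK code: the function name RELDIF and its argument list are hypothetical, and the maximum norm is assumed because the text speaks of "maximum relative error".

```fortran
      PROGRAM TSTOP
C     ILLUSTRATIVE ONLY. RELDIF AND ITS ARGUMENTS ARE HYPOTHETICAL
C     NAMES AND ARE NOT PART OF MUDPACK.
      INTEGER N
      PARAMETER (N=4)
      REAL UOLD(N),UNEW(N),TOL,RELDIF
      DATA UOLD /1.000,2.000,3.000,4.000/
      DATA UNEW /1.001,2.001,2.999,4.002/
      TOL = 1.0E-3
C     COMPARE THE RELATIVE CHANGE BETWEEN ITERATES WITH THE TOLERANCE
      IF (RELDIF(N,UOLD,UNEW) .LT. TOL) THEN
         PRINT *, 'CONVERGED'
      ELSE
         PRINT *, 'CONTINUE CYCLING'
      END IF
      END

      REAL FUNCTION RELDIF(N,UOLD,UNEW)
C     RETURNS MAX(ABS(UNEW-UOLD))/MAX(ABS(UNEW)), THE RELATIVE
C     DIFFERENCE OF TWO ITERATES IN THE MAXIMUM NORM
      INTEGER N,I
      REAL UOLD(N),UNEW(N),DIF,UNRM
      DIF = 0.0
      UNRM = 0.0
      DO 10 I=1,N
         DIF = MAX(DIF,ABS(UNEW(I)-UOLD(I)))
         UNRM = MAX(UNRM,ABS(UNEW(I)))
   10 CONTINUE
C     GUARD AGAINST DIVISION BY ZERO FOR A ZERO ITERATE
      RELDIF = 0.0
      IF (UNRM .GT. 0.0) RELDIF = DIF/UNRM
      END
```

Keeping both iterates is what accounts for the additional storage mentioned above.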

Flagging of Errors involving Input Parameters

This includes detection of singular and/or nonelliptic PDEs. Fatal and nonfatal

errors are flagged. For example, consider the two-dimensional elliptic PDE with

cross derivative term in (b). If f(x,y) = 0 for all (x,y) and the boundary conditions

are either pure derivative or periodic then an arbitrary constant plus the solution is

also a solution. This means that the PDE is singular which is flagged as a nonfatal

error. The PDE is flagged as "nonelliptic" if

$$ b(x,y)^2 - 4\,a(x,y)\,c(x,y) > 0 $$

In either case, the matrix coming from the discretization might be ill-conditioned and

convergence of the iterative scheme is questionable.
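A pointwise version of the ellipticity test might look like the sketch below. It assumes a, b, c are the coefficients of the xx, xy and yy derivatives in (b) and that the test uses the usual discriminant b**2 - 4ac. The function name ELLIPT is hypothetical and is not a MUDPACK routine.

```fortran
      PROGRAM TELLIP
C     ILLUSTRATIVE ONLY. ELLIPT IS A HYPOTHETICAL HELPER.
      LOGICAL ELLIPT
C     LAPLACIAN-LIKE COEFFICIENTS (A=1, B=0, C=1)
      PRINT *, ELLIPT(1.0,0.0,1.0)
C     DOMINANT CROSS DERIVATIVE (A=1, B=3, C=1)
      PRINT *, ELLIPT(1.0,3.0,1.0)
      END

      LOGICAL FUNCTION ELLIPT(A,B,C)
C     .TRUE. WHEN THE DISCRIMINANT B**2 - 4*A*C IS NOT POSITIVE,
C     I.E. WHEN THE POINT WOULD NOT BE FLAGGED AS NONELLIPTIC
      REAL A,B,C
      ELLIPT = (B*B - 4.0*A*C) .LE. 0.0
      END
```

In practice such a test would be applied at every grid point, since the coefficients vary with (x,y).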

Output of Exact Minimal Work Space Requirements

This is especially important with three-dimensional problems where central memory

is easily exhausted. In certain cases (e.g., if no error control is selected) equivalencing

between right hand side, solution, and work arrays is allowed to save storage. The

work length depends on the grid size and relaxation method. Simplified estimation

formulas are given in the documentation and exact requirements are output. For

example, if the PDE in (b) is approximated on a n by m grid then the appropriate

solver requires at most

4 n m (13 + ix +jy)/3

work space locations where

ix = 0 if point or line y relaxation only is used

ix = 3 if line x relaxation is used and u is not periodic in x

ix = 5 if line x relaxation is used and u is periodic in x

jy = 0 if point or line x relaxation only is used

jy = 3 if line y relaxation is used and u is not periodic in y

jy = 5 if line y relaxation is used and u is periodic in y
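The bound above is easy to evaluate before dimensioning the work array. The following sketch does so. The function name LWORK2 is hypothetical, and rounding the quotient up to the next integer is a choice made here, not something stated in the text.

```fortran
      PROGRAM TWORK
C     ILLUSTRATIVE ONLY. LWORK2 IS A HYPOTHETICAL HELPER, NOT A
C     MUDPACK ROUTINE.
      INTEGER LWORK2
C     POINT RELAXATION ON A 97 BY 193 GRID (IX = JY = 0)
      PRINT *, LWORK2(97,193,0,0)
C     LINE X RELAXATION, U NOT PERIODIC IN X (IX = 3, JY = 0)
      PRINT *, LWORK2(97,193,3,0)
      END

      INTEGER FUNCTION LWORK2(N,M,IX,JY)
C     UPPER BOUND 4*N*M*(13+IX+JY)/3 ON THE WORK SPACE LENGTH,
C     ROUNDED UP TO AN INTEGER
      INTEGER N,M,IX,JY
      LWORK2 = (4*N*M*(13+IX+JY) + 2)/3
      END
```

Because this is only an upper bound, the exact requirement output by the solver should still be used for the final array dimension.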

Extensive Documentation and Test Programs

See the next section and the appendix of this Technical Note. The test programs can

be used when installing the codes on new systems.


3. MUDPACK files

As of March 1, 1991 there are 98 MUDPACK documentation, solver, common sub-

routine, residual subroutine, and test program files. Collectively these contain over

100,000 lines of code and documentation. All the code is written in portable Fortran

77 which has been tested on a variety of computers and operating systems. Care was

taken to achieve vectorization on Cray machines.

File Description

The following seven real solvers are central to the software package:

1. MUD2 solves two-dimensional nonseparable elliptic PDEs.

2. MUD2CR solves two-dimensional elliptic PDEs with cross-derivatives.

3. MUD2SA solves two-dimensional nonseparable self-adjoint elliptic PDEs.

4. MUD2SP solves two-dimensional separable elliptic PDEs.

5. MUD3 solves three-dimensional nonseparable elliptic PDEs.

6. MUD3SA solves three-dimensional nonseparable self-adjoint elliptic PDEs.

7. MUD3SP solves three-dimensional separable elliptic PDEs.

Additional solvers include the hybrid multigrid/direct method codes MUH2 and

MUH2CR. MUD24, MUH24, MUD24CR, MUH24CR, MUD24SP, MUD34, and

MUD34SP are fourth-order solvers corresponding to MUD2, MUH2, MUD2CR,

MUH2CR, MUD2SP, MUD3 and MUD3SP. Second- and fourth-order complex

solvers are identified by replacing the "M" with a "C" in the real solvers' names. For

example, CUD24CR produces a fourth-order approximation to the two-dimensional

complex elliptic PDE with a cross-derivative term. More solvers may be added in

the future. A solver's name in lower case followed by ".d" identifies a documentation

file and the name in lower case preceded by a "t" and followed by a ".f" identifies a

Fortran test program file. For example, mud3.d contains complete documentation


for MUD3 and tmud3.f is a test program illustrating use of MUD3. Users are

encouraged to carefully read the documentation and study and execute the sample
program associated with the solver wanted. Most of mud3.d is listed in the

appendix of this document.

Routines to compute fine grid residual are resm2.f (for MUD2, MUH2, MUD2SA),

resm2cr.f (for MUD2CR, MUH2CR), resm2sp.f (for MUD2SP), resm3.f (for MUD3,

MUD3SA), resm3sp.f (for MUD3SP), resc2.f (for CUD2), resc2cr.f (for CUD2CR),

resc2sp.f (for CUD2SP), resc3.f (for CUD3), and resc3sp.f (for CUD3SP).

Obtaining MUDPACK Files:

For proprietary reasons, access to the Fortran source of each solver is not provided.

Instead, each solver is available in relocatable binary form. This saves considerable

computing resources by eliminating lengthy compile times at NCAR. Fortran source

for individual solvers will sometimes be distributed with the understanding that the

codes are not to be modified and/or distributed further and that feedback on their

performance will be provided to the author. All the MUDPACK software is copy-

righted. If you wish to use the codes on other machines, contact John Adams at:

(electronic mail: [email protected], telephone 303-497-1213).

Documentation, fine grid residual and test program files are available via "dsl" (the

distributed software libraries) at NCAR. The user document "Distributed Software

Libraries" has information about access under dsl. This can be obtained by contact-

ing the SCD consulting office. The file "README" in library "mudpack" under dsl

contains a directory for MUDPACK and current information on how to access the

relocatable binary for solvers.

Selecting Solvers

The following "flow chart" can be used in selecting the appropriate second-order

solver for the elliptic PDE to be approximated:


(1) If the PDE is complex go to (8) else go to (2)

(2) If the PDE is three-dimensional go to (6) else go to (3)

(3) If the PDE is separable use MUD2SP else go to (4)

(4) If the PDE has a cross derivative use MUD2CR or MUH2CR else go to (5)

(5) If the PDE is self-adjoint use MUD2SA else use MUD2 or MUH2

(6) If the PDE is separable use MUD3SP else go to (7)

(7) If the PDE is self-adjoint use MUD3SA else use MUD3

(8) If the PDE is three-dimensional go to (11) else go to (9)

(9) If the PDE is separable use CUD2SP else go to (10)

(10) If the PDE has a cross derivative use CUD2CR else use CUD2

(11) If the PDE is separable use CUD3SP else use CUD3

The corresponding fourth-order solvers can improve the approximation if the

second-order solver has reached discretization level error [2,6].
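The flow chart above can also be written as a small decision routine. The sketch below is illustrative only. The function name SELECT and its logical arguments are hypothetical, and where the chart offers a hybrid alternative (MUH2CR, MUH2) only the first name is returned.

```fortran
      PROGRAM TSELCT
C     ILLUSTRATIVE ONLY. SELECT IS A HYPOTHETICAL HELPER, NOT A
C     MUDPACK ROUTINE. ARGUMENTS: COMPLEX, 3-D, SEPARABLE,
C     CROSS DERIVATIVE, SELF-ADJOINT.
      CHARACTER*6 SELECT
      PRINT *, SELECT(.FALSE.,.FALSE.,.FALSE.,.TRUE.,.FALSE.)
      PRINT *, SELECT(.FALSE.,.TRUE.,.FALSE.,.FALSE.,.TRUE.)
      PRINT *, SELECT(.TRUE.,.TRUE.,.TRUE.,.FALSE.,.FALSE.)
      END

      CHARACTER*6 FUNCTION SELECT(LCPLX,L3D,LSEP,LCROSS,LSELF)
      LOGICAL LCPLX,L3D,LSEP,LCROSS,LSELF
      IF (.NOT.LCPLX) THEN
         IF (.NOT.L3D) THEN
C           STEPS (3)-(5): REAL TWO-DIMENSIONAL SOLVERS
            IF (LSEP) THEN
               SELECT = 'MUD2SP'
            ELSE IF (LCROSS) THEN
               SELECT = 'MUD2CR'
            ELSE IF (LSELF) THEN
               SELECT = 'MUD2SA'
            ELSE
               SELECT = 'MUD2'
            END IF
         ELSE
C           STEPS (6)-(7): REAL THREE-DIMENSIONAL SOLVERS
            IF (LSEP) THEN
               SELECT = 'MUD3SP'
            ELSE IF (LSELF) THEN
               SELECT = 'MUD3SA'
            ELSE
               SELECT = 'MUD3'
            END IF
         END IF
      ELSE
         IF (.NOT.L3D) THEN
C           STEPS (9)-(10): COMPLEX TWO-DIMENSIONAL SOLVERS
            IF (LSEP) THEN
               SELECT = 'CUD2SP'
            ELSE IF (LCROSS) THEN
               SELECT = 'CUD2CR'
            ELSE
               SELECT = 'CUD2'
            END IF
         ELSE
C           STEP (11): COMPLEX THREE-DIMENSIONAL SOLVERS
            IF (LSEP) THEN
               SELECT = 'CUD3SP'
            ELSE
               SELECT = 'CUD3'
            END IF
         END IF
      END IF
      END
```

The order of the tests follows the chart: separability is checked first because the separable solvers need the least work space.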


4. Examples

Use and efficiency of the MUDPACK software is illustrated in two- and three-

dimensional analytic examples which were run on the NCAR CRAY Y-MP8/864

computer. The sensitivity of multigrid iteration to the underlying relaxation method

is emphasized. None of the examples use an initial guess. The required work space

lengths (given in megawords) include storage of the approximation and right-hand

side arrays. The "once only" initialization time TO (which includes discretization and

matrix factorizations) is separated out from the solution time Ts. The times and

megaflop rates are the result of monitoring one full multigrid cycle (or more if indi-

cated) with the performance monitor PERFMON ([18]) at NCAR. The exact max-

imum error norm

$$ \| U - u \| $$

is tabulated where u and U are the exact solution and the approximation evaluated

on the fine grid. If V is the exact solution to the linear system arising from the

discretization then discretization level error is said to have been reached if

$$ \| U - V \| \le \| u - V \|. $$

Coefficients and boundary conditions are input via simple self-documenting subrou-

tines which must be declared "EXTERNAL" in the calling routines. Some of these

are listed in the examples. More examples demonstrating MUDPACK's applicability

to a wide range of problems in the atmospheric sciences are given in [5]. Efficiency is

examined more closely in [2].

Example 1. 2-D Separable Elliptic Equation (high resolution grids)

Solve

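$$ (\cos^2(s)+1)\,\frac{\partial^2 p}{\partial s^2} - 2\cos(s)\sin(s)\,\frac{\partial p}{\partial s} + \frac{\partial}{\partial r}\left((r+1)^2\,\frac{\partial p}{\partial r}\right) - r\,p(s,r) = f(r,s) $$

on the region $0 < s < 2\pi$, $0 < r < 1$. Assume p(s,r) is periodic in s and is
specified at r = 0, r = 1. For testing purposes, the exact solution

$$ p(s,r) = \sin^2(s)\,(r^2 - r^3 + 1) $$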

is used. The following subroutines will input the PDE coefficients to the solver

MUD2SP at any grid point (s,r).

      SUBROUTINE COFS(S,CSS,CS,CES)
      COSS = COS(S)
      CSS = COSS*COSS+1.
      CS = -2.*COSS*SIN(S)
      CES = 0.0
      RETURN
      END

      SUBROUTINE COFR(R,CRR,CR,CER)
      CRR = (R+1)**2
      CR = 2.*(R+1)
      CER = -R
      RETURN
      END

The grid size parameters p = 3, q = 2, i = j + 2 are used to keep spacing roughly

the same in the s and r direction. Experimentation shows that point relaxation with

V(2,1) cycling and cubic prolongation result in a near optimal algorithm. This choice

allows testing on very high resolution grids. The use of line relaxation greatly

increases storage requirements and should be avoided if it is not necessary. One full

multigrid cycle is executed for each grid size. The results are tabulated in Table 1.1.

The error reductions by a factor of four with each doubling of resolution indicate

discretization level error has been reached [2,6]. Improved vectorization accounts for

solution time increases of less than four with grid doubling. The default multigrid

options will produce slightly better error results but at more than twice the execution

times as the runs with V(2,1) cycles.
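The grid sizes used in this example follow from the grid size parameters through the relation n = p*2**(i-1) + 1 given in the solver documentation. The short program below is a sketch of that arithmetic for p = 3, q = 2 and i = j + 2. The variable names and the range of j are chosen here for illustration.

```fortran
      PROGRAM TGRID
C     ILLUSTRATIVE ONLY. GRID SIZES N = P*2**(I-1) + 1 FOR
C     P = 3, Q = 2 AND I = J + 2
      INTEGER IP,JQ,I,J,NS,NR
      IP = 3
      JQ = 2
      DO 10 J=5,9
         I = J + 2
C        NUMBER OF POINTS IN THE S AND R DIRECTIONS
         NS = IP*2**(I-1) + 1
         NR = JQ*2**(J-1) + 1
         PRINT *, NS, ' X ', NR
   10 CONTINUE
      END
```

With j = 5 this gives the 193 x 33 grid and with j = 9 the 3073 x 513 grid.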

Table 1.1: 2-D Separable Elliptic Equation (MUD2SP)

Grid Size      Storage          Ts          Mflop    Error
193 x 33       0.020 Mwords     0.01 sec      56     0.17e-3
385 x 65       0.072 Mwords     0.03 sec      98     0.44e-4
769 x 129      0.275 Mwords     0.08 sec     140     0.11e-4
1537 x 257     1.074 Mwords     0.26 sec     166     0.27e-5
3073 x 513     4.244 Mwords     0.97 sec     179     0.68e-6

In the next table the efficiency of MUD24SP in improving the second-order estimate
to fourth-order is measured. By default the fourth-order solvers use W(2,1) cycling
with fully weighted residual restrictions and cubic prolongation. This and the
calculation and storage of the truncation error estimate used in "deferred corrections"
[6,17] account for the additional expense.

Table 1.2: 2-D Separable Elliptic Equation (MUD24SP)

Grid Size      Storage          Ts          Mflop    Error
193 x 33       0.026 Mwords     0.05 sec      27     0.48e-7
385 x 65       0.097 Mwords     0.11 sec      45     0.30e-8
769 x 129      0.374 Mwords     0.27 sec      71     0.19e-9
1537 x 257     1.469 Mwords     0.74 sec     102     0.16e-10
3073 x 513     5.829 Mwords     2.23 sec     134     0.70e-10

If accuracy is measured as a function of computational cost then the advantages of
the fourth-order scheme are obvious. The error on the 193 x 33 grid with
MUD24SP is less than the error on the 3073 x 513 grid with MUD2SP. As would be
expected with a fourth-order method, there is an error reduction by a factor of 16
with grid doubling. This is true until the error is of order e-10. At this point further
grid refinement gains nothing. Pollution from numerical roundoff error is too great.
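The error reduction factors quoted for Example 1 can be checked directly from the errors tabulated for MUD2SP and MUD24SP. The sketch below computes the observed order, log2 of the ratio of errors on successive grids. The program and array names are chosen here for illustration.

```fortran
      PROGRAM TORDER
C     ILLUSTRATIVE ONLY. OBSERVED ORDER LOG2(E(H)/E(H/2)) FROM THE
C     ERRORS TABULATED FOR MUD2SP (E2) AND MUD24SP (E4)
      INTEGER K
      REAL E2(5),E4(5)
      DATA E2 /0.17E-3,0.44E-4,0.11E-4,0.27E-5,0.68E-6/
      DATA E4 /0.48E-7,0.30E-8,0.19E-9,0.16E-10,0.70E-10/
      DO 10 K=1,4
C        FIRST COLUMN: SECOND-ORDER SOLVER, SECOND: FOURTH-ORDER
         PRINT *, LOG(E2(K)/E2(K+1))/LOG(2.0),
     +            LOG(E4(K)/E4(K+1))/LOG(2.0)
   10 CONTINUE
      END
```

Values near 2 and 4 correspond to error reductions by factors of 4 and 16. The last fourth-order entry is negative because the error grows on the finest grid, where roundoff dominates.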

Example 2. 2-D Nonseparable Elliptic Equation

Solve

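$$ y^2\,\frac{\partial^2 u}{\partial x^2} + x^2\,\frac{\partial^2 u}{\partial y^2} - xy\,u(x,y) = r(x,y) $$

on the region 0.5 < x < 1.0, 1.0 < y < 2.0 with boundary conditions

$$ \partial u/\partial x - y\,u(0.5,y) = g(y) \quad \text{at } x = 0.5 $$

u(1,y) is specified for 1 < y < 2

u(x,1) is specified for 0.5 < x < 1

$$ \partial u/\partial y + x\,u(x,2) = h(x) \quad \text{at } y = 2 $$

The exact solution

$$ u(x,y) = (xy)^3 $$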

is used to set the right-hand side and boundary conditions and to compute the error.

Assume a solution is wanted on a grid as close to 100 by 200 as the MUDPACK
size constraints allow. Examination of the directory README and documentation
files suggests several options. If a 97 by 193 grid is adequate then the solver MUD2
can be used with p = q = 3, i = 6, j = 7. The hybrid solver MUH2 can provide
a much closer fit with p = q = 25, i = 3, j = 4. This yields a 101 by 201 fine
grid and a 26 by 26 coarse grid where Gaussian elimination is used. If an exact grid
fit is mandatory then MUH2 can be used with p = 99, q = 199, i = j = 1. In this
case MUH2 becomes a full direct method on the 100 by 200 grid. All of these are
compared in Table 2.1. Line relaxation in the x direction is used with the iterative
methods along with the default multigrid options. Line relaxation frozen at the finest
grid level (denoted LINEX) is also tested on the 97 by 193 grid. Coefficients and
boundary conditions can be input to MUD2 or MUH2 with the following two Fortran
subprograms.

      SUBROUTINE COF(X,Y,CXX,CYY,CX,CY,CE)
      CXX = Y*Y
      CYY = X*X
      CX = 0.0
      CY = 0.0
      CE = -X*Y
      RETURN
      END

      SUBROUTINE BC(KBDY,XORY,ABDY,GBDY)
      IF (KBDY.EQ.1) THEN
C     X = 0.5 BOUNDARY
         X = 0.5
         Y = XORY
         U = (X*Y)**3
         DUDX = 3.*X*X*Y**3
         ABDY = -Y
         GBDY = DUDX + ABDY*U
         RETURN
      END IF
      IF (KBDY.EQ.4) THEN
C     Y = 2 BOUNDARY
         X = XORY
         Y = 2.0
         U = (X*Y)**3
         DUDY = 3.*Y*Y*X**3
         ABDY = X
         GBDY = DUDY + ABDY*U
         RETURN
      END IF
      END
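In the same spirit, the right-hand side for this test problem can be generated by substituting the exact solution into the PDE, using the coefficients set in COF. The sketch below does this. The function name RHS and the sample point are chosen here for illustration and do not come from the MUDPACK test programs.

```fortran
      PROGRAM TRHS
C     ILLUSTRATIVE ONLY. RHS IS A HYPOTHETICAL HELPER.
      REAL RHS
C     EVALUATE THE RIGHT-HAND SIDE AT AN INTERIOR POINT
      PRINT *, RHS(0.75,1.5)
      END

      REAL FUNCTION RHS(X,Y)
C     R(X,Y) = Y*Y*UXX + X*X*UYY - X*Y*U FOR U = (X*Y)**3
      REAL X,Y,U,UXX,UYY
      U = (X*Y)**3
C     SECOND PARTIAL DERIVATIVES OF (X*Y)**3
      UXX = 6.*X*Y**3
      UYY = 6.*X**3*Y
      RHS = Y*Y*UXX + X*X*UYY - X*Y*U
      END
```

A driver would evaluate such a function at every fine grid point before calling the solver.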

The first three iterative methods in Table 2.1 are allowed to compute until discreti-

zation level error is reached. The required relaxation sweeps at the finest grid level

are recorded under Iter.

Table 2.1 (second-order): 2-D Nonseparable Elliptic Equation

Method (grid) Storage TO Ts Iter Error

MUD2 (97 x 193) 0.252 Mwords 0.08 sec 0.07 sec 3 0.47e-4

LINEX (97 x 193) 0.187 Mwords 0.07 sec 8.90 sec 5,400 0.47e-4

MUH2 (101 x 201) 0.311 Mwords 0.36 sec 0.08 sec 3 0.43e-4

MUH2 (100 x 200) 4.220 Mwords 33.91 sec 0.22 sec n.a. 0.45e-4

MUD2 reaches discretization level error in one full multigrid cycle. LINEX illus-

trates how multigrid can accelerate the convergence of traditional relaxation


schemes. Line relaxation frozen at the finest grid requires over 5,000 iterations to

reach the same accuracy achieved in only 3 when used within the multigrid algo-

rithm! The third hybrid method is a reasonably economical combination of Gaussian

elimination and line relaxation on a very close grid fit. The fourth direct method is

very expensive due to storage and factorization of the block tridiagonal coefficient

matrix coming from the discretization. The solution phase with the direct method

involves only a forward and backward matrix vector sweep. Table 2.2 compares the

performance of MUD24 and MUH24 (again used as both an iterative and direct

method) in obtaining fourth-order approximations. Roundoff error and changes in

grid size account for the slight differences in error.

Table 2.2 (fourth-order): 2-D Nonseparable Elliptic Equation

Method Storage Ts Error

MUD24 (97 x 193) 0.439 Mwords 0.08 sec 0.79e-8

MUH24 (101 x 201) 0.514 Mwords 0.09 sec 0.69e-8

MUH24 (100 x 200) 4.420 Mwords 0.26 sec 0.91e-8
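The grid fits used in this example can be recovered by moving every factor of two in n-1 into the exponent, which leaves the smallest admissible coarse grid parameter. The sketch below illustrates this, assuming the relation n = p*2**(i-1) + 1 from the solver documentation. The subroutine name GRDFAC is hypothetical and is not a MUDPACK routine.

```fortran
      PROGRAM TFACT
C     ILLUSTRATIVE ONLY. GRDFAC IS A HYPOTHETICAL HELPER, NOT A
C     MUDPACK ROUTINE.
      INTEGER N(3),IP,IE,K
      DATA N /97,101,100/
      DO 10 K=1,3
         CALL GRDFAC(N(K),IP,IE)
         PRINT *, N(K), IP, IE
   10 CONTINUE
      END

      SUBROUTINE GRDFAC(N,IP,IE)
C     WRITES N = IP*2**(IE-1) + 1 WITH IP ODD (OR IP = 2), I.E. WITH
C     ALL POSSIBLE FACTORS OF TWO MOVED INTO THE EXPONENT IE.
C     N MUST BE AT LEAST 3.
      INTEGER N,IP,IE
      IP = N - 1
      IE = 1
   10 IF (MOD(IP,2).EQ.0 .AND. IP.GT.2) THEN
         IP = IP/2
         IE = IE + 1
         GO TO 10
      END IF
      END
```

For n = 97, 101 and 100 this yields p = 3, 25 and 99, which is why the last case has no coarser grids and MUH2 reduces to a direct method.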

Example 3. Helmholtz Equation on the Sphere (one degree grid)

Solve the two-dimensional Helmholtz equation in spherical coordinates

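$$ \nabla\cdot\left(\sigma(\phi,\theta)\,\nabla u(\phi,\theta)\right) - \lambda(\phi,\theta)\,u(\phi,\theta) = f(\phi,\theta) $$

on a one degree grid on the full surface of a sphere of radius one ($\phi$ and $\theta$ are
longitude and colatitude). $u(\phi,\theta)$ is specified at the poles $\theta = 0, \pi$ and is
periodic in $\phi$. For testing purposes, we use the exact solution

$$ u(\phi,\theta) = \sin^2(\theta)\cos(\theta)\cos(\phi)\sin(\phi) $$

and the coefficient functions

$$ \sigma(\phi,\theta) = \lambda(\phi,\theta) = 3/2 + \sin^2(\theta)\cos^2(\phi). $$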

The exact solution chosen is the restriction of u(x,y,z) = (xyz) to the surface of the

sphere. The self-adjoint form of the PDE suggests using the solver MUD2SA. How-

ever the required one degree grid cannot be fit with small values for p, q. If p =45,

q = 45, i = 4 and j = 3 are selected then an exact grid fit is obtained and multigrid

is implemented on the ascending subgrids of size: 46 x 46; 91 x 46; 181 x 91; 361 x

181. The coarsest grid has too much resolution for efficient solution with MUD2SA

or MUD2. The hybrid code MUH2 circumvents this by using Gaussian elimination

whenever the 46 by 46 grid is encountered within multigrid cycling. The original

PDE must be expanded to input coefficients to MUH2 or MUD2. The following sub-

routine can be used for this:

      SUBROUTINE COEF(PHI,THETA,CPP,CTT,CP,CT,CE)
      SINT = SIN(THETA)
      IF (ABS(SINT) .GT. 1.E-5) THEN
C     NOT AT POLES
         COST = COS(THETA)
         COSP = COS(PHI)
         SINP = SIN(PHI)
         CTT = 1.5 + SINT**2*COSP**2
C     CT = D(SIGMA)/D(THETA) + SIGMA*COS(THETA)/SIN(THETA)
         CT = 2.*SINT*COST*COSP**2 + CTT*COST/SINT
         CPP = (1.5 + SINT**2*COSP**2)/SINT**2
         CP = -2.*COSP*SINP
         CE = -CTT
      ELSE
C     AT POLES
         CTT = 1.0
         CPP = 1.0
         CT = 0.
         CP = 0.
         CE = 0.
      END IF
      RETURN
      END

Notice that division by zero is avoided at the poles. The coefficient subroutine will

be called but the results will not be used at the poles where u is specified. One full

multigrid cycle with the default multigrid options and line-s relaxation is executed

using both MUD2 and MUH2. MUH24 is used to increase the accuracy of the

approximation generated by MUH2 from second to fourth-order.


Table 3: Helmholtz Equation on the Sphere (one degree grid)

Method Storage Mflops Ts Error

MUD2 0.618 Mwords 115 0.07 sec 0.52e-1

MUH2 1.256 Mwords 52 0.25 sec 0.24e-4

MUH24 1.909 Mwords 58 0.28 sec 0.22e-8

There is a reduction in the megaflop rate with the hybrid method. Nevertheless, the

results indicate relaxation only is ineffective in reducing error with the high resolution

46 x 46 coarse grid. In fact, if we use the most robust form of two-dimensional relaxation
(line in both the $\phi$ and $\theta$ directions) at all grid levels, then 36 multigrid cycles

and almost 4 seconds of computer time are required to reach discretization level

error with MUD2. This is accomplished in one cycle and 1/4 of a second, when the

coarse grid direct method is used in conjunction with line relaxation at the higher

resolution grids. The fourth-order hybrid method gains four decimal digits of accu-

racy.

Example 4. 3-D Separable Elliptic Equation

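Solve

$$ \frac{\partial}{\partial x}\left(e^{x}\,\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(e^{y}\,\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(e^{z}\,\frac{\partial u}{\partial z}\right) - (x+y+z)\,u(x,y,z) = r(x,y,z) $$

on the unit cube. Assume the solution is specified (Dirichlet) at all boundaries. The
exact solution

$$ u(x,y,z) = e^{xyz} $$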

is used for testing. Results are tabulated on three-dimensional grids with $2^k + 1$

points in each direction for increasing k . The code MUD3SP, which was created to

save work space for separable three-dimensional problems and only allows point

relaxation, is used. If a combination of line or planar relaxation is required then

one of the nonseparable solvers, MUD3 or MUD3SA, must be used (see Examples 5

and 6). W(1,1) cycles with fully weighted residual restriction and cubic prolongation

are selected.

Table 4.1 3-D Separable Elliptic Equation (MUD3SP)

k Storage Mflop Ts Error

4 0.013 Mwords 25 0.01 sec 0.83e-5

5 0.088 Mwords 49 0.05 sec 0.21e-5

6 0.651 Mwords 86 0.27 sec 0.54e-6

7 4.995 Mwords 131 1.46 sec 0.13e-6

One would expect the execution times to increase a factor of 8 with each doubling of

resolution. MUD3SP does better than this due to enhanced vectorization with higher

resolution, as indicated by the megaflop rates. Since discretization level error was

reached (errors are reduced by a factor of 4 with each doubling of resolution),

MUD34SP can be used to improve accuracy. The fourth-order solver generates an

approximation on the $33^3$ grid with the same error as the second-order approximation
on the $129^3$ grid.

Table 4.2 3-D Separable Elliptic Equation (MUD34SP)

k Storage Mflop Ts Error

4 0.020 Mwords 19 0.03 sec 0.11e-5

5 0.124 Mwords 35 0.15 sec 0.13e-6

6 0.963 Mwords 62 0.74 sec 0.11e-7

7 7.142 Mwords 97 3.82 sec 0.83e-9


Example 5. 3-D Helmholtz Equation in Spherical Coordinates

Solve

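$$ \nabla^2 u(\rho,\theta,\phi) - u(\rho,\theta,\phi) = f(\rho,\theta,\phi) $$

on the region $0.5 < \rho < 1$, $\pi/4 < \theta < 3\pi/4$, $0 < \phi < 2\pi$. Multiplying the left- and
right-hand sides by $\rho^2\sin(\theta)$ puts it in the following (expanded) form suitable for the
nonseparable self-adjoint solver MUD3SA:

$$ \frac{\partial}{\partial\rho}\left(\rho^2\sin(\theta)\,\frac{\partial u}{\partial\rho}\right) + \frac{\partial}{\partial\theta}\left(\sin(\theta)\,\frac{\partial u}{\partial\theta}\right) + \frac{\partial}{\partial\phi}\left(\frac{1}{\sin(\theta)}\,\frac{\partial u}{\partial\phi}\right) - \rho^2\sin(\theta)\,u(\rho,\theta,\phi) = \rho^2\sin(\theta)\,f(\rho,\theta,\phi) $$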

Assume $u(\rho,\theta,\phi)$ is specified at the $\rho$, $\theta$ boundaries and is periodic in $\phi$. The exact

solution

$$ u(\rho,\theta,\phi) = \sin(\theta)\cos(\phi)/\rho $$

is used for testing. Grid size parameters are p = 3, q = r = 2, i = j+1 = k.

MUD3SA executes one full multigrid cycle with the default options and with line

relaxation in the p direction. Discretization level error is reached for each grid size

tested.

Table 5: 3-D Helmholtz Equation in Spherical Coordinates (MUD3SA)

Grid Size Storage TO Ts Mflop Error

25 X 9 X 17 0.056 Mwords 0.13 sec 0.02 sec 16 0.17e-2




The growth of TO relative to Ts with increasing resolution (due to nonvectorized

scalar operations in the discretization) underlines the importance of using non-initial

calls when possible.


Example 6. 3-D Anisotropic Elliptic Equation

Solve the nonseparable equation

$$ 10yz\,\frac{\partial^2 u}{\partial x^2} + \frac{xz}{10}\,\frac{\partial^2 u}{\partial y^2} + 100xy\,\frac{\partial^2 u}{\partial z^2} - (xyz)\,u(x,y,z) = r(x,y,z) $$

on the region 0.5 < x,y,z < 1 with the derivative boundary conditions

$$ \partial u/\partial x + yz\,u(1,y,z) = h(y,z) \quad (\text{at } x = 1) $$

$$ \partial u/\partial z + xy\,u(x,y,1) = g(x,y) \quad (\text{at } z = 1) $$

Assume u(x,y,z) is specified at all other boundaries and use the exact solution

$$ u(x,y,z) = (xyz)^2 $$

for testing. Since the exact solution also satisfies the second-order discretization, a

very good approximation (within roundoff error) can be expected. A subroutine for

inputting the derivative boundary conditions is:

      SUBROUTINE BND(KBDY,XORY,YORZ,ABDY,GBDY)
      IF (KBDY.EQ.2) THEN
C     X = 1 BOUNDARY
         X = 1.0
         Y = XORY
         Z = YORZ
         U = (X*Y*Z)**2
         DUDX = 2.*X*(Y*Z)**2
         ABDY = 1.0
         GBDY = DUDX + ABDY*U
         RETURN
      END IF
      IF (KBDY.EQ.6) THEN
C     Z = 1 BOUNDARY
         X = XORY
         Y = YORZ
         Z = 1.0
         U = (X*Y*Z)**2
         DUDZ = 2.*Z*(X*Y)**2
         ABDY = 1.0
         GBDY = DUDZ + ABDY*U
         RETURN
      END IF
      END

The solver MUD3 is used on a 97 by 33 by 129 grid. The domination of the z and

(to a lesser extent) x derivative coefficients is amplified in the discretization by


choosing more grid points in these directions. The unbalanced coefficients suggest

point or line relaxation will not work well ([22]). To explore this, planar x-z relaxa-

tion, line relaxation in the x and z direction, and point relaxation are all compared

in Table 6. V(2,1) cycles with linear prolongation are used with each relaxation

scheme.

Table 6: 3-D Anisotropic Elliptic Equation (MUD3)

Method Storage Cycles Mflop Ts Error

planar x-z 5.184 Mwords 3 51 9.33 sec 0.16e-10

line x-z 7.150 Mwords 16 86 9.88 sec 0.21e-4

point 4.295 Mwords 38 119 9.54 sec 0.56e-2

Full two-dimensional multigrid cycling with line relaxation in the z direction is used

on each plane within 3-D planar relaxation. This is expensive to implement and

costs more (per three-dimensional cycle) than point or line relaxations. Nevertheless,

planar relaxation requires only three cycles to reach discretization level error for this

anisotropic PDE. The other relaxation methods are allowed to execute cycles until

they have used approximately the same amount of computer time. The results indi-

cate they are not nearly as effective in error reduction. Multigrid iteration should

converge in only a few cycles when used with the "correct" relaxation method.

Example 7. Asymmetric Grid Size

Typically, three-dimensional weather models use more grid points in the horizontal

direction, where the scales are larger, than in the vertical direction. To simulate

this, we use eight times more horizontal than vertical resolution in solving the equa-

tion

$$ (a(x)\,u_x)_x + (b(y)\,u_y)_y + (c(z)\,u_z)_z = r(x,y,z) $$

on the region 0 < x,y < 10000 km and 0 < z < 10 km. The exact solution

$$ u(x,y,z) = e^{-z/z_0}\,\sin(\pi x/x_0)\,\cos(\pi y/y_0) $$

where $z_0 = 10$ and $x_0 = y_0 = 2500$, is used to set the right-hand side and boundary
conditions and to compute the error. Here we assume u is periodic in x and y,
specified at the lower z boundary, and satisfies the mixed derivative condition

$$ u_z + u = h(x,y) \quad \text{at } z = 10. $$

The PDE coefficients are given by

$$ a(x) = 1 + \sin^2(\pi x/x_0), \qquad b(y) = 1 + \cos^2(\pi y/y_0), \qquad c(z) = e^{-z/z_0}. $$

Since the PDE is separable MUD3SP should be selected. The default multigrid

options and point relaxation are utilized. Grid size parameters are p = q = r = 3

and i =j = k+3 for increasing k. One full multigrid cycle reaches discretization

level error for each grid size tested.
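Coefficient input for this problem might be sketched as below, by expanding each self-adjoint term as (a u')' = a u'' + a' u'. The subroutine names COFX, COFY, COFZ and their argument lists are assumptions modelled on COFS and COFR in Example 1. The actual interface expected by MUD3SP should be taken from its documentation file.

```fortran
      PROGRAM TCOF7
C     ILLUSTRATIVE ONLY. THE NAMES COFX, COFY, COFZ AND THEIR ARGUMENT
C     LISTS ARE ASSUMPTIONS MODELLED ON COFS AND COFR IN EXAMPLE 1.
      REAL C2,C1,C0
      CALL COFX(625.0,C2,C1,C0)
      PRINT *, 'X: ', C2, C1, C0
      CALL COFY(625.0,C2,C1,C0)
      PRINT *, 'Y: ', C2, C1, C0
      CALL COFZ(5.0,C2,C1,C0)
      PRINT *, 'Z: ', C2, C1, C0
      END

      SUBROUTINE COFX(X,CXX,CX,CEX)
C     (A(X)*UX)X = A(X)*UXX + A'(X)*UX, A(X) = 1 + SIN(PI*X/X0)**2
      REAL X,CXX,CX,CEX,PI,X0,ARG
      PARAMETER (X0=2500.0)
      PI = 4.0*ATAN(1.0)
      ARG = PI*X/X0
      CXX = 1.0 + SIN(ARG)**2
      CX = 2.0*SIN(ARG)*COS(ARG)*PI/X0
      CEX = 0.0
      RETURN
      END

      SUBROUTINE COFY(Y,CYY,CY,CEY)
C     (B(Y)*UY)Y = B(Y)*UYY + B'(Y)*UY, B(Y) = 1 + COS(PI*Y/Y0)**2
      REAL Y,CYY,CY,CEY,PI,Y0,ARG
      PARAMETER (Y0=2500.0)
      PI = 4.0*ATAN(1.0)
      ARG = PI*Y/Y0
      CYY = 1.0 + COS(ARG)**2
      CY = -2.0*COS(ARG)*SIN(ARG)*PI/Y0
      CEY = 0.0
      RETURN
      END

      SUBROUTINE COFZ(Z,CZZ,CZ,CEZ)
C     (C(Z)*UZ)Z = C(Z)*UZZ + C'(Z)*UZ, C(Z) = EXP(-Z/Z0)
      REAL Z,CZZ,CZ,CEZ,Z0
      PARAMETER (Z0=10.0)
      CZZ = EXP(-Z/Z0)
      CZ = -CZZ/Z0
      CEZ = 0.0
      RETURN
      END
```

The zero-order coefficients are all zero because the PDE has no term in u itself.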

Table 7: Asymmetric Grid Size (MUD3SP)

Grid              Storage           Mflop    Ts           Error
49 x 49 x 7        0.057 Mwords       37     0.058 sec    0.46e-4
97 x 97 x 13       0.324 Mwords       76     0.189 sec    0.12e-4
193 x 193 x 25     2.299 Mwords      128     0.829 sec    0.29e-5
385 x 385 x 49    17.271 Mwords      193     4.633 sec    0.77e-6

Appendix

Major portions of the documentation file mud3.d for the MUDPACK solver MUD3

are listed in this appendix. Omitted sections are indicated with dots. Documenta-

tion files for the other solvers have a similar format.

C
C     SUBROUTINE MUD3(IPARM,FPARM,WORK,COEF,BNDYC,RHS,PHI,MGOPT,IERROR)
C
C
C     COMPLETE DOCUMENTATION FOR MUD3 IS GIVEN BELOW. A SAMPLE
C     FORTRAN DRIVER IS FILE "TMUD3" ON LIBRARY "MUDPACK".
C
C
C REQUIRED FILES
C
C     MUD3COM
C     MUD3LN, MUDFAC (IF LINE RELAXATION(S) IS (ARE) USED)
C     MUD3PN (IF PLANAR RELAXATION IS USED)
C     MUDFAC (IF LINE RELAXATION(S) IS (ARE) USED WITHIN MUD3PN)
C
C PURPOSE
C
C     SUBROUTINE MUD3 AUTOMATICALLY DISCRETIZES AND ATTEMPTS TO COMPUTE
C     THE SECOND ORDER FINITE DIFFERENCE APPROXIMATION TO A THREE-
C     DIMENSIONAL LINEAR NONSEPARABLE ELLIPTIC PARTIAL DIFFERENTIAL
C     EQUATION ON A BOX. THE APPROXIMATION IS GENERATED ON A UNIFORM
C     GRID COVERING THE BOX (SEE MESH DESCRIPTION BELOW). BOUNDARY
C     CONDITIONS MAY BE ANY COMBINATION OF MIXED, SPECIFIED (DIRICHLET)
C     OR PERIODIC. THE FORM OF THE PDE SOLVED IS ...
C
C     CXX(X,Y,Z)*PXX + CYY(X,Y,Z)*PYY + CZZ(X,Y,Z)*PZZ +
C
C     CX(X,Y,Z)*PX + CY(X,Y,Z)*PY + CZ(X,Y,Z)*PZ +
C
C     CE(X,Y,Z)*P(X,Y,Z) = R(X,Y,Z)
C
C     HERE CXX,CYY,CZZ,CX,CY,CZ,CE ARE THE KNOWN REAL COEFFICIENTS
C     OF THE PDE; PXX,PYY,PZZ,PX,PY,PZ ARE THE SECOND AND FIRST
C     PARTIAL DERIVATIVES OF THE UNKNOWN SOLUTION FUNCTION P(X,Y,Z)
C     WITH RESPECT TO THE INDEPENDENT VARIABLES X,Y,Z; R(X,Y,Z)
C     IS THE KNOWN REAL RIGHT HAND SIDE OF THE ELLIPTIC PDE.
C
C
C MESH DESCRIPTION ...
C
C     THE APPROXIMATION IS GENERATED ON A UNIFORM NX BY NY BY NZ GRID.
C     THE GRID IS SUPERIMPOSED ON THE RECTANGULAR SOLUTION REGION
C
C          [XA,XB] X [YC,YD] X [ZE,ZF].

C AUTHOR AND SPECIALIST
C
C     JOHN C. ADAMS (NCAR-1990)
C
C
C PORTABILITY
C
C     MUD3 ADHERES TO FORTRAN-77 STANDARDS

C PARAMETER DESCRIPTION
C
C
C***********************************************************************

C INPUT PARAMETERS
C
C
C
C IPARM
C
C     AN INTEGER VECTOR OF LENGTH 23 USED TO EFFICIENTLY PASS
C     INTEGER PARAMETERS. IPARM IS SET INTERNALLY IN MUD3
C     AND DEFINED AS FOLLOWS ...
C
C
C INTL=IPARM(1)
C
C     AN INITIALIZATION PARAMETER. INTL=0 MUST BE INPUT
C     ON AN INITIAL CALL TO MUD3. IN THIS CASE FULL DISCRETIZATION
C     OF THE PDE WILL BE PERFORMED. INTL.NE.0 SHOULD BE INPUT
C     IF MUD3 HAS BEEN CALLED PREVIOUSLY AND ONLY THE VALUES
C     IN RHS (SEE BELOW) OR GBDY (SEE BNDYC BELOW) OR PHI
C     (SEE BELOW) HAVE CHANGED. THIS WILL BYPASS DISCRETIZATION
C     AND SAVE TIME. MUD3 MUST BE CALLED WITH INTL=IPARM(1)=0 IF
C     ANY OTHER PARAMETERS HAVE CHANGED FROM THE PREVIOUS CALL.
C
C
C NXA=IPARM(2)
C
C     FLAGS BOUNDARY CONDITIONS ON THE (Y,Z) PLANE X=XA
C
C     = 0 IF P(X,Y,Z) IS PERIODIC IN X ON [XA,XB]
C         (I.E., P(X+XB-XA,Y,Z) = P(X,Y,Z) FOR ALL X,Y,Z)
C
C     = 1 IF P(XA,Y,Z) IS SPECIFIED (THIS MUST BE INPUT THRU PHI(1,J,K))
C
C     = 2 IF THERE ARE MIXED DERIVATIVE BOUNDARY CONDITIONS AT X=XA
C         (SEE "BNDYC" DESCRIPTION BELOW WHERE KBDY = 1)

      . . .

C NZF=IPARM(7)
C
C     FLAGS BOUNDARY CONDITIONS ON THE (X,Y) PLANE Z=ZF
C
C     = 0 IF P(X,Y,Z) IS PERIODIC IN Z ON [ZE,ZF]
C         (I.E., P(X,Y,Z+ZF-ZE) = P(X,Y,Z) FOR ALL X,Y,Z)
C
C     = 1 IF P(X,Y,ZF) IS SPECIFIED (THIS MUST BE INPUT THRU PHI(I,J,NZ))
C
C     = 2 IF THERE ARE MIXED DERIVATIVE BOUNDARY CONDITIONS AT Z=ZF
C         (SEE "BNDYC" DESCRIPTION BELOW WHERE KBDY = 6)
C
C
C GRID SIZE PARAMETERS
C
C
C IXP = IPARM(8)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE X DIRECTION (SEE NX = IPARM(14)). "IXP+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST X GRID VISITED DURING
C     MULTIGRID CYCLING. IXP SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE X DIRECTION IS NOT USED.
C     IF IXP > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF IXP CAN BE REMOVED BY INCREASING IEX = IPARM(11)
C     WITHOUT CHANGING NX = IPARM(14)
C
C
C JYQ = IPARM(9)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Y DIRECTION (SEE NY = IPARM(15)). "JYQ+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST Y GRID VISITED DURING
C     MULTIGRID CYCLING. JYQ SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE Y DIRECTION IS NOT USED.
C     IF JYQ > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF JYQ CAN BE REMOVED BY INCREASING JEY = IPARM(12)
C     WITHOUT CHANGING NY = IPARM(15)
C
C
C KZR = IPARM(10)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Z DIRECTION (SEE NZ = IPARM(16)). "KZR+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST Z GRID VISITED DURING
C     MULTIGRID CYCLING. KZR SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE Z DIRECTION IS NOT USED.
C     IF KZR > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF KZR CAN BE REMOVED BY INCREASING KEZ = IPARM(13)
C     WITHOUT CHANGING NZ = IPARM(16)

C IEX = IPARM(11)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE X DIRECTION (SEE NX = IPARM(14)).
C     IEX .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     IEX SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND IXP=IPARM(8)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NX = IPARM(14).
C
C
C JEY = IPARM(12)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Y DIRECTION (SEE NY = IPARM(15)).
C     JEY .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     JEY SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND JYQ=IPARM(9)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NY = IPARM(15).
C
C
C KEZ = IPARM(13)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Z DIRECTION (SEE NZ = IPARM(16)).
C     KEZ .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     KEZ SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND KZR=IPARM(10)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NZ = IPARM(16).
C
C
C NX = IPARM(14)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [XA,XB]
C     (INCLUDING THE BOUNDARIES). NX MUST HAVE THE FORM
C
C          NX = IXP*(2**(IEX-1)) + 1
C
C     WHERE IXP = IPARM(8), IEX = IPARM(11).
C
C
C NY = IPARM(15)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [YC,YD]
C     (INCLUDING THE BOUNDARIES). NY MUST HAVE THE FORM:
C
C          NY = JYQ*(2**(JEY-1)) + 1
C
C     WHERE JYQ = IPARM(9), JEY = IPARM(12).
C
C
C NZ = IPARM(16)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [ZE,ZF]
C     (INCLUDING THE BOUNDARIES). NZ MUST HAVE THE FORM
C
C          NZ = KZR*(2**(KEZ-1)) + 1
C
C     WHERE KZR = IPARM(10), KEZ = IPARM(13)
C
C
C *** EXAMPLE
C
C     SUPPOSE A SOLUTION IS WANTED ON A 33 BY 65 BY 97 GRID. THEN
C     IXP=2, JYQ=4, KZR=6 AND IEX=JEY=KEZ=5 COULD BE USED. A BETTER
C     CHOICE WOULD BE IXP=JYQ=2, KZR=3, AND IEX=5, JEY=KEZ=6.

C IGUESS=IPARM(17)CC = 0 IF NO INITIAL GUESS TO THE PDE IS PROVIDEDC AND/OR FULL MULTIGRID CYCLING BEGINNING AT THEC COARSEST GRID LEVEL IS DESIRED.CC = 1 IF AN INITIAL GUESS TO THE PDE AT THE FINEST GRIDC LEVEL IS PROVIDED IN PHI (SEE BELOW). IN THIS CASEC CYCLING BEGINNING OR RESTARTING AT THE FINEST GRIDC IS INITIATED.

C
C
C  MAXCY = IPARM(18)
C
C      THE EXACT NUMBER OF CYCLES EXECUTED BETWEEN THE FINEST
C      (NX BY NY BY NZ) AND THE COARSEST ((IXP+1) BY (JYQ+1) BY
C      (KZR+1)) GRID LEVELS WHEN TOLMAX=FPARM(7)=0.0 (NO ERROR
C      CONTROL). WHEN TOLMAX=FPARM(7).GT.0.0 IS INPUT (ERROR CONTROL)
C      THEN MAXCY IS A LIMIT ON THE NUMBER OF CYCLES BETWEEN THE
C      FINEST AND COARSEST GRID LEVELS. IN ANY CASE, AT MOST
C      MAXCY*(IPRER+IPOST) RELAXATION SWEEPS ARE PERFORMED AT THE
C      FINEST GRID LEVEL (SEE IPRER=MGOPT(2),IPOST=MGOPT(3) BELOW).
C      WHEN MULTIGRID ITERATION IS WORKING "CORRECTLY" ONLY A FEW
C      CYCLES ARE REQUIRED FOR CONVERGENCE. LARGE VALUES FOR MAXCY
C      SHOULD NOT BE REQUIRED.
C
C
C  METHOD = IPARM(19)
C
C      THIS SETS THE METHOD OF RELAXATION (ALL SCHEMES USE
C      RELAXATION ON ALTERNATING POINTS OR LINES OR PLANES)
C
C      = 0 FOR GAUSS-SEIDEL POINTWISE RELAXATION
C
C      = 1 FOR LINE RELAXATION IN THE X DIRECTION
C
C      = 2 FOR LINE RELAXATION IN THE Y DIRECTION
C
C      = 3 FOR LINE RELAXATION IN THE Z DIRECTION
C
C      = 4 FOR LINE RELAXATION IN THE X AND Y DIRECTION
C
C      = 5 FOR LINE RELAXATION IN THE X AND Z DIRECTION
C
C      = 6 FOR LINE RELAXATION IN THE Y AND Z DIRECTION
C
C      = 7 FOR LINE RELAXATION IN THE X,Y AND Z DIRECTION
C
C      = 8 FOR X,Y PLANAR RELAXATION
C
C      = 9 FOR X,Z PLANAR RELAXATION
C
C      =10 FOR Y,Z PLANAR RELAXATION

C
C
C  LENGTH = IPARM(21)
C
C      THE LENGTH OF THE WORK SPACE PROVIDED IN VECTOR WORK.

C
C
C  FPARM
C
C      A FLOATING POINT VECTOR OF LENGTH 8 USED TO EFFICIENTLY
C      PASS FLOATING POINT PARAMETERS. FPARM IS SET INTERNALLY
C      IN MUD3 AND DEFINED AS FOLLOWS ...
C
C
C  XA=FPARM(1), XB=FPARM(2)
C
C      THE RANGE OF THE X INDEPENDENT VARIABLE. XA MUST
C      BE LESS THAN XB.
C
C
C  YC=FPARM(3), YD=FPARM(4)
C
C      THE RANGE OF THE Y INDEPENDENT VARIABLE. YC MUST
C      BE LESS THAN YD.
C
C
C  ZE=FPARM(5), ZF=FPARM(6)
C
C      THE RANGE OF THE Z INDEPENDENT VARIABLE. ZE MUST
C      BE LESS THAN ZF.
C
C
C  TOLMAX = FPARM(7)
C
C      WHEN INPUT POSITIVE, TOLMAX IS A MAXIMUM RELATIVE ERROR TOLERANCE
C      USED TO TERMINATE THE RELAXATION ITERATIONS ...

C
C
C  WORK
C
C      A ONE DIMENSIONAL ARRAY THAT MUST BE PROVIDED FOR WORK SPACE.
C      SEE LENGTH = IPARM(21). THE VALUES IN WORK MUST BE PRESERVED
C      IF MUD3 IS CALLED AGAIN WITH INTL=IPARM(1).NE.0 OR IF MUD34
C      IS CALLED TO IMPROVE ACCURACY.
C
C
C  BNDYC
C
C      A SUBROUTINE WITH PARAMETERS (KBDY,XORY,YORZ,ALFA,GBDY)
C      WHICH ARE USED TO INPUT MIXED BOUNDARY CONDITIONS TO MUD3.
C      THE BOUNDARIES ARE NUMBERED ONE THRU SIX AND THE FORMS OF
C      THE CONDITIONS ARE DESCRIBED BELOW.
C
C
C      (1) THE KBDY=1 BOUNDARY
C
C      THIS IS THE (Y,Z) PLANE X=XA WHERE NXA=IPARM(2) = 2 FLAGS
C      A MIXED BOUNDARY CONDITION OF THE FORM
C
C          DP/DX + ALFXA(Y,Z)*P(XA,Y,Z) = GBDXA(Y,Z)
C
C      IN THIS CASE KBDY=1,XORY=Y,YORZ=Z WILL BE INPUT TO BNDYC AND
C      ALFA,GBDY CORRESPONDING TO ALFXA(Y,Z),GBDXA(Y,Z) MUST BE RETURNED.
C
C
C      (6) THE KBDY=6 BOUNDARY
C
C      THIS IS THE (X,Y) PLANE Z=ZF WHERE NZF=IPARM(7) = 2 FLAGS
C      A MIXED BOUNDARY CONDITION OF THE FORM
C
C          DP/DZ + ALFZF(X,Y)*P(X,Y,ZF) = GBDZF(X,Y)
C
C      IN THIS CASE KBDY=6,XORY=X,YORZ=Y WILL BE INPUT TO BNDYC AND
C      ALFA,GBDY CORRESPONDING TO ALFZF(X,Y),GBDZF(X,Y) MUST BE RETURNED.
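A boundary routine with the parameter list described above might be sketched as follows. This is an illustration only, written in free-form Fortran: the ALFA and GBDY expressions are made-up placeholders, default REAL arguments are assumed, and only the two faces documented above (KBDY=1 and KBDY=6) are handled.

```fortran
subroutine bndyc(kbdy, xory, yorz, alfa, gbdy)
  implicit none
  integer, intent(in) :: kbdy
  real, intent(in)    :: xory, yorz
  real, intent(out)   :: alfa, gbdy
  ! Placeholder values for faces this sketch does not treat as mixed.
  alfa = 0.0
  gbdy = 0.0
  select case (kbdy)
  case (1)
    ! Face X=XA: XORY holds Y and YORZ holds Z.
    alfa = 1.0            ! hypothetical ALFXA(Y,Z)
    gbdy = xory + yorz    ! hypothetical GBDXA(Y,Z)
  case (6)
    ! Face Z=ZF: XORY holds X and YORZ holds Y.
    alfa = -1.0           ! hypothetical ALFZF(X,Y)
    gbdy = xory*yorz      ! hypothetical GBDZF(X,Y)
  end select
end subroutine bndyc
```

The routine is written as an external subroutine so that its name can be passed to the solver from the calling program.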

C
C
C  COEF
C
C      A SUBROUTINE WITH PARAMETERS (X,Y,Z,CXX,CYY,CZZ,CX,CY,CZ,CE)
C      WHICH PROVIDES THE KNOWN REAL COEFFICIENTS FOR THE ELLIPTIC PDE
C      AT ANY GRID POINT (X,Y,Z). THE NAME CHOSEN IN THE CALLING ROUTINE
C      MAY BE DIFFERENT WHERE THE COEFFICIENT ROUTINE MUST BE DECLARED
C      EXTERNAL.
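A coefficient routine with this parameter list might look like the sketch below (free-form Fortran, default REAL assumed). The coefficient values are hypothetical. It is assumed here that CXX, CYY, CZZ multiply the second derivatives, CX, CY, CZ the first derivatives, and CE the unknown itself. The second-derivative coefficients are kept positive everywhere, in line with the ellipticity condition given under IERROR = -2.

```fortran
subroutine coef(x, y, z, cxx, cyy, czz, cx, cy, cz, ce)
  implicit none
  real, intent(in)  :: x, y, z
  real, intent(out) :: cxx, cyy, czz, cx, cy, cz, ce
  ! Hypothetical variable coefficients, positive at every point.
  cxx = 1.0 + x*x
  cyy = 1.0 + y*y
  czz = 1.0 + z*z
  ! No first-order terms in this illustration.
  cx = 0.0
  cy = 0.0
  cz = 0.0
  ! Hypothetical nonzero zeroth-order coefficient.
  ce = -1.0
end subroutine coef
```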


C
C
C  RHS
C
C      AN ARRAY DIMENSIONED NX BY NY BY NZ WHICH CONTAINS
C      THE GIVEN RIGHT HAND SIDE VALUES ON THE UNIFORM 3-D MESH.
C      RHS(I,J,K) = R(XI,YJ,ZK) FOR I=1,...,NX AND J=1,...,NY
C      AND K=1,...,NZ. RHS CAN BE EQUIVALENCED WITH THE "1+8*NX*NY*NZ"
C      WORD OF "WORK" IN THE PROGRAM CALLING MUD3 TO SAVE SPACE IF
C      AND ONLY IF POINT RELAXATION (METHOD=IPARM(19)=0) IS CHOSEN.
C      IF RHS IS EQUIVALENCED WITH ANY OTHER WORD OF WORK OR IF
C      EQUIVALENCING IS USED WHEN METHOD.NE.0 THEN AN UNDETECTABLE
C      ERROR WILL RESULT.
C
C      * WARNING
C
C      VALUES IN THE ARRAY RHS ARE DESTROYED BY MUD3 IF EQUIVALENCING
C      WITH WORK IS USED WHEN METHOD = 0 OR IF METHOD > 0. VALUES
C      IN RHS ARE PRESERVED ONLY IF METHOD = 0 AND EQUIVALENCING WITH
C      WORK IS NOT USED.
C
C
C  PHI
C
C      AN ARRAY DIMENSIONED NX BY NY BY NZ. ON INPUT PHI MUST
C      CONTAIN SPECIFIED BOUNDARY VALUES AND AN INITIAL GUESS
C      TO THE SOLUTION IF FLAGGED (SEE IGUESS=IPARM(17)=1). FOR
C      EXAMPLE, IF NYD=IPARM(5)=1 THEN PHI(I,NY,K) MUST BE SET
C      EQUAL TO P(XI,YD,ZK) FOR I=1,...,NX AND K=1,...,NZ PRIOR TO
C      CALLING MUD3. THE SPECIFIED VALUES ARE PRESERVED BY MUD3.
C      PHI CAN BE EQUIVALENCED WITH THE FIRST WORD OF WORK IN THE
C      PROGRAM CALLING MUD3 TO SAVE SPACE IF ERROR CONTROL IS NOT
C      SELECTED (TOLMAX=FPARM(7)=0.0 IS INPUT). EQUIVALENCING PHI
C      WITH WORK WILL CAUSE AN UNDETECTABLE ERROR IF ERROR CONTROL
C      IS REQUESTED. PHI MUST NOT BE EQUIVALENCED WITH WORK IF MUD34
C      WILL LATER BE CALLED TO IMPROVE ACCURACY.
C
C      IF NO INITIAL GUESS IS GIVEN (IGUESS=0) THEN PHI MUST STILL
C      BE INITIALIZED AT NON-DIRICHLET GRID POINTS (THIS IS NOT
C      CHECKED). THESE VALUES ARE PROJECTED DOWN AND SERVE AS AN INITIAL
C      GUESS TO THE PDE AT THE COARSEST GRID LEVEL. SET PHI TO 0.0 AT
C      NON-DIRICHLET GRID POINTS IF NOTHING BETTER IS AVAILABLE.
C
C
C  MGOPT
C
C      AN INTEGER VECTOR OF LENGTH 5 WHICH ALLOWS THE USER TO SELECT
C      AMONG VARIOUS MULTIGRID OPTIONS. IF MGOPT(1)=0 IS INPUT THEN
C      A DEFAULT SET OF MULTIGRID PARAMETERS (CHOSEN FOR ROBUSTNESS)
C      WILL BE INTERNALLY SELECTED AND THE REMAINING VALUES IN MGOPT
C      WILL BE IGNORED. IF MGOPT(1) IS NONZERO THEN THE PARAMETERS
C      IN MGOPT ARE SET INTERNALLY AND DEFINED AS FOLLOWS: (SEE THE
C      BASIC COARSE GRID CORRECTION ALGORITHM BELOW)


C
C
C  KCYCLE = MGOPT(1)
C
C      =-1 IF F CYCLING IS TO BE USED...
C
C      = 0 IF DEFAULT MULTIGRID OPTIONS ARE TO BE USED
C
C      = 1 IF V CYCLING IS TO BE USED (THE LEAST EXPENSIVE PER CYCLE)
C
C      = 2 IF W CYCLING IS TO BE USED (THE DEFAULT)
C
C      > 2 IF MORE GENERAL K CYCLING IS TO BE USED
C          (WARNING-VALUES LARGER THAN 2 INCREASE
C          THE EXECUTION TIME PER CYCLE CONSIDERABLY AND
C          RESULT IN THE NON-FATAL ERROR, IERROR = -5)
C
C
C  IPRER = MGOPT(2)
C
C      THE NUMBER OF "PRE-RELAXATION" SWEEPS EXECUTED BEFORE THE
C      RESIDUAL IS RESTRICTED AND CYCLING IS INVOKED AT THE NEXT
C      COARSER GRID LEVEL (DEFAULT VALUE IS 2 WHENEVER MGOPT(1)=0).
C
C
C  IPOST = MGOPT(3)
C
C      THE NUMBER OF "POST RELAXATION" SWEEPS EXECUTED AFTER CYCLING
C      HAS BEEN INVOKED AT THE NEXT COARSER GRID LEVEL AND THE RESIDUAL
C      CORRECTION HAS BEEN TRANSFERRED BACK (DEFAULT VALUE IS 1
C      WHENEVER MGOPT(1)=0).

C
C
C  IRESW = MGOPT(4)
C
C      = 0 IF UNWEIGHTED (IDENTITY) RESIDUAL RESTRICTIONS ARE USED.
C          * WARNING-ORDINARILY THIS OPTION GIVES VERY POOR RESULTS
C          WHEN USED WITHIN MULTIGRID ITERATION. IT IS INCLUDED AS
C          AN OPTION ONLY FOR ALGORITHM EXPERIMENTATION.
C
C      = 1 IF FULLY WEIGHTED RESIDUAL RESTRICTIONS ARE USED (THIS IS THE
C          DEFAULT VALUE WHENEVER MGOPT(1)=0 AND IS THE MOST ROBUST).
C
C      = 2 IF HALF WEIGHTING IS USED WITH RESIDUAL RESTRICTIONS. THIS
C          OPTION REQUIRES LESS COMPUTATION THAN FULL WEIGHTING AND,
C          WITH RED/BLACK POINT RELAXATION, SOMETIMES GIVES SIMILAR
C          CONVERGENCE RATES. EXPERIENCE HAS SHOWN IT IS NOT AS ROBUST
C          AS FULL WEIGHTING AND SHOULD BE USED WITH CAUTION.
C
C
C  INTPOL = MGOPT(5)
C
C      = 1 IF MULTILINEAR PROLONGATION (INTERPOLATION) IS USED TO
C          TRANSFER RESIDUAL CORRECTIONS AND THE PDE APPROXIMATION
C          FROM COARSE TO FINE GRIDS WITHIN FULL MULTIGRID CYCLING.
C
C      = 3 IF MULTICUBIC PROLONGATION (INTERPOLATION) IS USED TO
C          TRANSFER RESIDUAL CORRECTIONS AND THE PDE APPROXIMATION
C          FROM COARSE TO FINE GRIDS WITHIN FULL MULTIGRID CYCLING.
C          (THIS IS THE DEFAULT VALUE WHENEVER MGOPT(1)=0).
C
C      THE DEFAULT VALUES (2,2,1,1,3) IN THE VECTOR MGOPT WERE CHOSEN FOR
C      ROBUSTNESS. IN SOME CASES V(2,1) CYCLES WITH LINEAR PROLONGATION WILL
C      GIVE GOOD RESULTS WITH LESS COMPUTATION (ESPECIALLY IN TWO-DIMENSIONS).
C      THIS WAS THE DEFAULT AND ONLY CHOICE IN AN EARLIER VERSION OF MUDPACK
C      (SEE [1]) AND CAN BE SET WITH THE INTEGER VECTOR (1,2,1,1,1) IN MGOPT.
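The two option vectors named above can be set explicitly as in the following sketch (free-form Fortran; the module and procedure names are hypothetical and not part of MUDPACK). The second function evaluates the bound MAXCY*(IPRER+IPOST) on fine-grid relaxation sweeps stated under MAXCY. It assumes MGOPT(1) is nonzero, since otherwise the remaining entries of MGOPT are ignored.

```fortran
module mud_mgopt_sketch
  implicit none
contains
  ! Fill MGOPT = (KCYCLE, IPRER, IPOST, IRESW, INTPOL).
  subroutine set_mgopt(mgopt, use_v21)
    integer, intent(out) :: mgopt(5)
    logical, intent(in)  :: use_v21
    if (use_v21) then
      ! V(2,1) cycles, full weighting, multilinear prolongation.
      mgopt = (/ 1, 2, 1, 1, 1 /)
    else
      ! Documented defaults: W(2,1), full weighting, multicubic.
      mgopt = (/ 2, 2, 1, 1, 3 /)
    end if
  end subroutine set_mgopt

  ! Upper bound on relaxation sweeps at the finest grid level.
  pure integer function max_fine_sweeps(maxcy, mgopt)
    integer, intent(in) :: maxcy, mgopt(5)
    max_fine_sweeps = maxcy*(mgopt(2) + mgopt(3))
  end function max_fine_sweeps
end module mud_mgopt_sketch
```

Setting MGOPT(1)=0 instead leaves the choice of defaults to the solver, as described above.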

C
C********************************************************************
C*OUTPUT PARAMETERS**************************************************
C********************************************************************
C
C
C  IPARM(22)
C
C      ON OUTPUT IPARM(22) CONTAINS THE ACTUAL WORK SPACE LENGTH
C      REQUIRED FOR THE CURRENT GRID SIZES AND METHOD.
C
C
C  IPARM(23)
C
C      IF ERROR CONTROL IS SELECTED (TOLMAX = FPARM(7) .GT. 0.0) THEN
C      ON OUTPUT IPARM(23) CONTAINS THE ACTUAL NUMBER OF CYCLES EXECUTED
C      BETWEEN THE COARSEST AND FINEST GRID LEVELS IN OBTAINING THE
C      APPROXIMATION IN PHI. THE QUANTITY (IPRER+IPOST)*IPARM(23) IS
C      THE NUMBER OF RELAXATION SWEEPS PERFORMED AT THE FINEST GRID LEVEL.
C
C
C  FPARM(8)
C
C      ON OUTPUT FPARM(8) CONTAINS THE FINAL COMPUTED MAXIMUM RELATIVE
C      DIFFERENCE BETWEEN THE LAST TWO ITERATES AT THE FINEST GRID LEVEL.
C      FPARM(8) IS COMPUTED ONLY IF THERE IS ERROR CONTROL (TOLMAX.GT.0.).
C
C
C  WORK
C
C      ON OUTPUT WORK CONTAINS INTERMEDIATE VALUES THAT MUST NOT BE
C      DESTROYED IF MUD3 IS TO BE CALLED AGAIN WITH IPARM(1)=1 OR
C      IF MUD34 IS TO BE CALLED TO IMPROVE THE ESTIMATE TO FOURTH
C      ORDER.
C
C
C  PHI
C
C      ON OUTPUT PHI(I,J,K) CONTAINS THE APPROXIMATION TO
C      P(XI,YJ,ZK) FOR ALL MESH POINTS I=1,...,NX; J=1,...,NY;
C      K=1,...,NZ. THE LAST COMPUTED ITERATE IN PHI IS RETURNED
C      EVEN IF CONVERGENCE IS NOT OBTAINED (IERROR=-1).


C
C
C  IERROR
C
C      AN ERROR FLAG THAT INDICATES INVALID INPUT PARAMETERS
C      WHEN RETURNED POSITIVE.
C
C
C      NON-FATAL ERRORS * * *
C
C      =-5 IF KCYCLE=MGOPT(1) IS GREATER THAN 2. VALUES LARGER THAN 2 RESULT
C          IN AN ALGORITHM WHICH PROBABLY DOES FAR MORE COMPUTATION THAN
C          NECESSARY. KCYCLE = -1 (F CYCLES) OR KCYCLE = 1 (V CYCLES)
C          OR KCYCLE = 2 (W CYCLES) SHOULD SUFFICE FOR MOST PROBLEMS. THE
C          IERROR=-5 FLAG IS OVERRIDDEN BY ANY OTHER FATAL OR NON-FATAL ERROR.
C
C      =-4 IF THERE ARE DOMINANT NONZERO FIRST ORDER TERMS IN THE PDE WHICH
C          MAKE IT "HYPERBOLIC" AT THE FINEST GRID LEVEL ...
C
C      =-3 IF THE PDE IS SINGULAR (ALL BOUNDARY CONDITIONS ARE PERIODIC OR
C          PURE DERIVATIVE AND CE(X,Y,Z) = 0 FOR ALL (X,Y,Z)). A SOLUTION
C          IS ATTEMPTED BUT CONVERGENCE MAY NOT OCCUR DUE TO ILL CONDITIONING
C          OF THE DISCRETIZED LINEAR SYSTEM. THE IERROR = -3 FLAG OVERRIDES
C          THE IERROR = -2 AND IERROR = -1 FLAGS BELOW.
C
C      =-2 IF THE PDE IS NOT ELLIPTIC AT SOME MESH POINT(S). THIS MEANS
C          CXX,CYY,CZZ ARE NOT POSITIVE FOR ALL GRID POINTS (X,Y,Z).
C          IN THIS CASE A SOLUTION IS STILL ATTEMPTED BUT CONVERGENCE MAY
C          NOT OCCUR DUE TO ILL CONDITIONING OF THE DISCRETIZED LINEAR
C          SYSTEM. THE IERROR = -2 FLAG OVERRIDES THE IERROR = -1
C          FLAG BELOW.
C
C      =-1 IF CONVERGENCE TO THE TOLERANCE SPECIFIED IN TOLMAX=FPARM(7)
C          IS NOT OBTAINED IN MAXCY=IPARM(18) MULTIGRID CYCLES. IN THIS CASE
C          THE LAST COMPUTED ITERATE IS STILL RETURNED.
C
C
C      NO ERRORS * * *
C
C      = 0 IF THE SOLUTION IS OBTAINED
C
C
C      FATAL ERRORS * * *
C
C      = 1 IF INTL=IPARM(1) IS NONZERO ON AN INITIAL CALL TO MUD3
C
C      = 2 IF ANY OF THE BOUNDARY CONDITION FLAGS NXA,NXB,NYC,NYD,NZE,NZF
C          IN IPARM(2),IPARM(3),IPARM(4),IPARM(5),IPARM(6),IPARM(7)
C          ARE NOT 0, 1 OR 2. ALSO IF NXA,NXB OR NYC,NYD OR NZE,NZF
C          ARE NOT PAIRWISE ZERO FOR PERIODIC B.C. (E.G., NXA=0,NXB=2).
C
C      = 3 IF MIN0(IXP,JYQ,KZR) < 2 (IXP=IPARM(8),JYQ=IPARM(9),KZR=IPARM(10))
C
C      = 4 IF ANY OF THE EXPONENTS IEX,JEY,KEZ DO NOT LIE BETWEEN 1 AND 50
C          (IEX=IPARM(11),JEY=IPARM(12),KEZ=IPARM(13))
C
C      = 5 IF NX.NE.IXP*2**(IEX-1)+1 OR NY.NE.JYQ*2**(JEY-1)+1
C          OR NZ.NE.KZR*2**(KEZ-1)+1 (NX=IPARM(14),NY=IPARM(15),NZ=IPARM(16))
C
C      = 6 IF IGUESS = IPARM(17) IS NOT EQUAL TO 0 OR 1
C
C      = 7 IF MAXCY = IPARM(18) < 1
C
C      = 8 IF METHOD.LT.0 OR METHOD.GT.10
C          OR
C          METHOD = 8 AND METH2.LT.0 OR METH2.GT.3
C          OR
C          METHOD = 9 AND METH2.LT.0 OR METH2.GT.3
C          OR
C          METHOD =10 AND METH2.LT.0 OR METH2.GT.3
C          (METHOD = IPARM(19), METH2=IPARM(20))
C
C      = 9 IF LENGTH = IPARM(21) < IPARM(22) (INSUFFICIENT WORK SPACE)
C
C      =10 IF XA > XB OR YC > YD OR ZE > ZF
C          (XA=FPARM(1),XB=FPARM(2),YC=FPARM(3),YD=FPARM(4),
C          ZE=FPARM(5),ZF=FPARM(6))
C
C      =11 IF TOLMAX=FPARM(7) < 0.0
C
C      * * * ERRORS IN SETTING MULTIGRID OPTIONS * * *
C
C      =21 IF KCYCLE=MGOPT(1) < -1 (SEE ALSO IERROR = -5)
C
C      =22 IF IPRER=MGOPT(2) < 1 WHEN KCYCLE IS NONZERO
C
C      =23 IF IPOST=MGOPT(3) < 1 WHEN KCYCLE IS NONZERO
C
C      =24 IF IRESW=MGOPT(4) IS NOT 0,1 OR 2 WHEN KCYCLE IS NONZERO
C
C      =25 IF INTPOL=MGOPT(5) IS NOT 1 OR 3 WHEN KCYCLE IS NONZERO
C
C********************************************************************
C
C      END OF MUD3 DOCUMENTATION
C
C********************************************************************
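The grid-related fatal conditions (IERROR = 3, 4 and 5) can be screened in the calling program before MUD3 is invoked. The sketch below (free-form Fortran; module and function names are hypothetical) applies the three tests in the order listed above. Whether MUD3 itself tests them in that order is an assumption. A 64-bit integer is used so that exponents up to 50 do not overflow, on the further assumption that IXP, JYQ and KZR are of modest size.

```fortran
module mud_check_sketch
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Returns 0 if IPARM(8) through IPARM(16) are mutually consistent,
  ! otherwise the documented fatal code 3, 4 or 5.
  ! IPARM must hold at least 16 entries.
  integer function check_grid(iparm)
    integer, intent(in) :: iparm(:)
    integer :: i
    integer(int64) :: n
    check_grid = 0
    ! IERROR = 3: a coarse grid factor IXP, JYQ or KZR is below 2.
    if (minval(iparm(8:10)) < 2) then
      check_grid = 3
      return
    end if
    ! IERROR = 4: an exponent IEX, JEY or KEZ lies outside 1..50.
    if (any(iparm(11:13) < 1) .or. any(iparm(11:13) > 50)) then
      check_grid = 4
      return
    end if
    ! IERROR = 5: NX, NY or NZ does not match factor*2**(exponent-1)+1.
    do i = 1, 3
      n = int(iparm(7+i), int64)*2_int64**(iparm(10+i)-1) + 1_int64
      if (int(iparm(13+i), int64) /= n) then
        check_grid = 5
        return
      end if
    end do
  end function check_grid
end module mud_check_sketch
```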

- 49-

References

[1] J. Adams, "MUDPACK: Multigrid Fortran Software for the Efficient Solution of Linear Elliptic Partial Differential Equations," Applied Math. and Comp., Vol. 34, Nov. 1989, pp. 113-146.

[2] J. Adams, "FMG Results with the Multigrid Software Package MUDPACK," Proceedings of the Fourth Copper Mountain Conference on Multigrid, SIAM, 1989, pp. 1-12.

[3] J. Adams, "Fortran Subprograms for Finite Difference Formula," J. Comp. Phys., Vol. 26, Jan. 1978, pp. 113-116.

[4] J. Adams, P. Swarztrauber, R. Sweet, "Efficient Fortran Subprograms for the Solution of Elliptic Partial Differential Equations," Elliptic Problem Solvers, Academic Press, 1982, pp. 333-390.

[5] J. Adams, R. Garcia, B. Gross, J. Hack, D. Haidvogel, V. Pizzo, E. Ridley, "Applications of Multigrid Scientific Software in Atmospheric Science," (in preparation).

[6] J. Adams, "Recent Enhancements in MUDPACK, A Multigrid Software Package for Elliptic Partial Differential Equations," Applied Math. and Comp., Vol. 43, May 1991, pp. 79-94.

[7] A. Brandt, "Multi-level Adaptive Solutions to Boundary Value Problems," Math. Comp., 31, 1977, pp. 333-390.

[8] W. Briggs, "A Multigrid Tutorial," SIAM, Philadelphia, 1987.

[9] B. Buzbee, G. Golub, and C. Nielson, "On direct methods for solving Poisson's equations," SIAM J. Numer. Anal., 7, 1970, pp. 627-656.

[10] S. Fulton, R. Ciesielski, and W. Schubert, "Multigrid methods for elliptic problems: a review," Monthly Weather Review, 114, 1986, pp. 943-959.


[11] W. Gentzsch, "Vectorization of Computer Programs with Applications to Computational Fluid Dynamics," Vieweg & Sohn, 1984 (246 pages).

[12] W. Hackbusch and U. Trottenberg, "Multigrid Methods," Springer-Verlag, Berlin, 1982.

[13] Handbook of Mathematical Functions, National Bureau of Standards Applied Math. Series 55, p. 884.

[14] D. Jespersen, "Multigrid Methods for Partial Differential Equations," Studies in Numerical Analysis, Vol. 24, MAA, 1984.

[15] J. Mandel and S. Parter, "On the Multigrid F-Cycle," Applied Math. and Comp., 37, 1990, pp. 19-36.

[16] S. McCormick, "Multigrid Methods," Vol. 3 of SIAM Frontiers Series, SIAM, Philadelphia, 1987.

[17] V. Pereyra, "Highly Accurate Numerical Solution of Casilinear Elliptic Boundary-Value Problems in n Dimensions," Math. Comp., 24, 1970, pp. 771-783.

[18] D. Sato, "PERFMON: The Cray Performance Monitor Utility," SCD UserDoc, Version 2.0, NCAR, March 1989.

[19] S. Schaffer, "Higher Order Multigrid Methods," Math. Comp., Vol. 43, July 1984, pp. 89-115.

[20] P. Swarztrauber, "Fast Poisson Solvers," Studies in Numerical Analysis, G. Golub ed., Math. Assoc. America, 1985, pp. 319-369.

[21] R. Sweet, "A Parallel and Vector Variant of the Cyclic Reduction Algorithm," SIAM J. Sci. and Stat. Comp., Vol. 9, July 1988, pp. 761-766.

[22] C. Thole and U. Trottenberg, "Basic Smoothing Procedures for the Multigrid Treatment of Elliptic 3D Operators," Applied Math. and Comp., 19, 1986, pp. 333-345.


Acknowledgements

Steve McCormick introduced the author to the multigrid community and gave numerous helpful suggestions including the use of planar relaxation with the three-dimensional solvers. The importance of adjusting discretization coefficients at coarser grid levels for PDEs with nonzero first-order terms was pointed out by Klaus Steuben. A conversation with Achi Brandt affirmed that the default multigrid options in the latest version of MUDPACK are a good choice and that the use of deferred corrections in obtaining fourth-order approximations with multigrid is a reasonable strategy.
