NCAR/TN-357+STR
NCAR TECHNICAL NOTE
February 1991
MUltigriD Software for Elliptic Partial Differential Equations
MUDPACK
John C. Adams
SCIENTIFIC COMPUTING DIVISION
NATIONAL CENTER FOR ATMOSPHERIC RESEARCH
BOULDER, COLORADO
Preface
This Technical Note describes the multigrid package MUDPACK which includes
Fortran subroutines for efficiently approximating the solution to a variety of linear
elliptic partial differential equations. The software was developed over the past three
years by the author at NCAR. Its purpose is to make the complex collection of
integrated numerical procedures known as "multigrid iteration" available in a user-
friendly form to atmospheric scientists and others.
The package is introduced in the first section. The second section outlines some of
the special features of MUDPACK. The third section describes contents of the pack-
age. Section four contains seven examples which illustrate software use and super-
computer performance on a variety of problems. A sample documentation is given in
the appendix.
Foreword
Dr. John Adams has invested a very substantial amount of effort in the development of MUDPACK, a general purpose software package for multigrid solution of elliptic partial differential equations. It is already in serious demand as a software tool in many scientific disciplines. I understand that the methods used are very efficient and robust multigrid algorithms that were carefully implemented and extensively tested for reliability and accuracy.
MUDPACK is a very important contribution to the Multigrid field in particular, and computational sciences in general. Multigrid is often perceived as difficult to implement, so it has been slow to find its way into applications software. John's work is therefore an especially important advance in providing the scientific community with access to multigrid technology.
Achi Brandt
Professor, Weizmann Institute
Table of Contents
Preface
Foreword
1. Introduction
2. Special Features
   Solving Linear Elliptic PDEs in a Variety of Forms
   Handling of General Boundary Conditions
   Ease of Input of the Continuous Problem
   Automatic Discretization of the Continuous Problem
   Use of Multigrid Iteration to Approximate the Discretization
   Selection of Multigrid Options
   Selection of the Relaxation Method
   Generating Second- and Fourth-Order Approximations
   Flexibility in Choosing Grid Size
   Availability of "Hybrid" Multigrid/Direct Method Solvers
   Availability of Subroutines to Compute Residuals
   No Initial Guess Requirement
   Non-initialization Calls
   Error Control
   Flagging of Errors Involving Input Parameters
   Output of Exact Minimal Work Space Requirements
   Extensive Documentation and Test Programs
3. MUDPACK files
   File Description
   Obtaining Files
   Selecting Solvers
4. Examples
   Example 1. 2-D Separable Elliptic Equation
   Example 2. 2-D Nonseparable Elliptic Equation
   Example 3. Helmholtz Equation on the Sphere (One Degree Grid)
   Example 4. 3-D Separable Elliptic Equation
   Example 5. 3-D Helmholtz Equation in Spherical Coordinates
   Example 6. 3-D Anisotropic Elliptic Equation
   Example 7. Asymmetric Grid Size
Appendix
References
Acknowledgements
1. Introduction
MUDPACK is a collection of portable Fortran 77 subprograms, vectorized on Cray
computers, that efficiently solve linear elliptic partial differential equations (PDEs)
using multigrid iteration. The package was created to make multigrid methods
available in user friendly form. It is similar in format to the separable elliptic pack-
age FISHPAK ([4]). MUDPACK extends the domain of solvable problems to include
both separable and nonseparable PDEs. Detailed descriptions of the current and an
earlier version of MUDPACK are given in [1,6]. Some of this is repeated in the
technical note.
Multigrid iteration ([7,8,10,12,14,16]) combines classical iterative techniques, such as
Gauss-Seidel line or point relaxation, with subgrid refinement procedures to yield a
method superior to the iterative techniques alone. By iterating and transferring
approximations and corrections at subgrid levels, it can achieve a good initial guess
and rapid convergence at the fine grid level. Multigrid iteration requires less storage
and computation than direct methods for nonseparable elliptic PDEs and is competi-
tive with direct methods such as cyclic reduction ([4,9,20,21]) for separable equations.
In particular, three-dimensional problems can often be handled at reasonable compu-
tational cost.
The generality of the equations solved by MUDPACK may sometimes result in loss
of efficiency since hand-tailored coding is required to achieve optimal performance for
certain problems. It is hoped that this is compensated for by the package's ease of
use, applicability to a wide range of real problems (including those typically encountered
in geophysical fluid dynamics at NCAR [5]), and avoidance of repeated "reinventions of
the wheel." Savings in code development time can be at least as important as economical
use of machine cycles. With careful selection of relaxation and
multigrid parameters, near optimal performance can often be obtained using MUD-
PACK software. See the examples in this document and in [2,6] for a variety of
problems where discretization level error (i.e., the same error level that a direct
method will reach in approximating the continuous solution) is reached in only one
full multigrid cycle using MUDPACK solvers.
2. Special Features of MUDPACK
Some of the special features MUDPACK software brings to bear in approximating
the continuous linear elliptic PDE (in operator form)
$$l(u) = f \tag{a}$$
are listed and discussed in this section.
Solving Linear Elliptic PDEs in a Variety of Forms
These forms include real and complex, two- and three-dimensional, self-adjoint, and
separable and nonseparable. For example, the most general two dimensional PDE
solved can have the form:
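$$a(x,y)\,\partial^2 u/\partial x^2 + b(x,y)\,\partial^2 u/\partial x\,\partial y + c(x,y)\,\partial^2 u/\partial y^2 + d(x,y)\,\partial u/\partial x + e(x,y)\,\partial u/\partial y + f(x,y)\,u(x,y) = r(x,y) \tag{b}$$
on the region $A \le x \le B$, $C \le y \le D$. The coefficients a, b, c, d, e, f; the right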
hand side r; and u are all real or complex valued functions of the independent vari-
ables x, y.
The solution regions are rectangular in the sense that the domain of each indepen-
dent variable must be a bounded interval on the real line. This means that curvi-
linear coordinate systems such as spherical or cylindrical coordinates are acceptable.
The codes are not restricted to Cartesian coordinates (see Examples 3 and 5).
Handling of General Boundary Conditions
Any combination of periodic, specified (Dirichlet), and mixed derivative boundary
conditions is allowed. For example, in (b) u(x,y) can satisfy any one of the following
boundary conditions at x = A:
(1) u(x+B-A,y) = u(x,y) for all x, y (periodic in x)
(2) u(A,y) is specified for all y (Dirichlet)
(3) $\alpha(y)\,\partial u/\partial x + \beta(y)\,\partial u/\partial y + \gamma(y)\,u(A,y) = g(y)$ for all y
Similar boundary conditions, in any combination, are allowed at the other boundaries.
Pure tangential derivatives (e.g., $\alpha(y) = 0$ at any point in (3)) are not
allowed and result in a fatal error flag.
Ease of Input of the Continuous Problem
User defined input subroutines are the mechanisms for passing PDE coefficients and
boundary conditions. For example, a SUBROUTINE COF(X,Y,A,B,C,D,E,F) is used
to input the coefficients in (b) at any grid point (x,y). A SUBROUTINE
BNDY(KBDY,XORY,ALFA,BETA,GAMA,GBDY) is used to input oblique mixed
derivative conditions for (b). KBDY = 1, 2, 3, and 4 identify the x = A, x = B,
y = C, and y = D boundaries. For example, if BNDY is called with KBDY = 1 then
XORY will input the current y value and ALFA, BETA, GAMA, GBDY should output
values for $\alpha(y)$, $\beta(y)$, $\gamma(y)$, g(y).
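As a rough illustration of this callback style, the following Python sketch mirrors the roles of COF and BNDY. It is not MUDPACK code: the Python functions `cof` and `bndy`, their return-value conventions, and the sample coefficients are assumptions chosen only for illustration.

```python
import math

def cof(x, y):
    """Return (a, b, c, d, e, f), the coefficients of PDE (b) at (x, y).

    Hypothetical example: a nonseparable operator with no cross term.
    """
    a = 1.0 + y * y
    b = 0.0
    c = 1.0 + x * x
    d = 0.0
    e = 0.0
    f = -1.0
    return a, b, c, d, e, f

def bndy(kbdy, xory):
    """Return (alfa, beta, gama, gbdy) for the mixed condition on boundary kbdy.

    kbdy = 1, 2, 3, 4 selects x = A, x = B, y = C, y = D. xory is the
    coordinate that varies along that boundary (y for kbdy = 1 or 2).
    """
    if kbdy in (1, 2):
        # Normal (x) derivative plus a zero order term; no tangential part.
        return 1.0, 0.0, 1.0, math.sin(xory)
    if kbdy in (3, 4):
        # On the y boundaries the normal derivative is du/dy, so beta != 0.
        return 0.0, 1.0, 1.0, math.cos(xory)
    raise ValueError("kbdy must be 1, 2, 3, or 4")
```

A solver written in this style would call `cof` at every grid point and `bndy` at every boundary point, so the user never touches the discretization.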
Automatic Discretization of the Continuous Problem
The discretization is transparent to a user who only needs to supply the PDE, boundary
conditions, and grid size information. Standard second-order finite difference
formulas on a uniform grid G, superimposed on the solution region, are used to
approximate the partial derivatives in (a). The result is a linear system of equations
$$L U = F \tag{c}$$
where the coefficient matrix L is block tridiagonal. The coefficients multiplying the
second partial derivatives in the PDE are adjusted during discretization at coarser
grid levels if there are nonzero first-order coefficients which would destroy the diago-
nal dominance of L. This helps preserve convergence of the relaxation schemes.
The internal discretization process is illustrated by outlining part of it for the PDE in
(b). Assume that an equally spaced n by m grid is superimposed on the rectangle
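$[A,B] \times [C,D]$ and let
$$\Delta x = (B-A)/(n-1), \qquad \Delta y = (D-C)/(m-1)$$
be the grid increments in the x and y directions respectively. Then (x(i),y(j)) is the
solution grid where $x(i) = A + (i-1)\Delta x$ for $i = 1,\ldots,n$ and $y(j) = C + (j-1)\Delta y$
for $j = 1,\ldots,m$. Let U(i,j) approximate the continuous solution u(x(i),y(j)). The
standard second order finite difference formulas
$$\begin{aligned}
\partial^2 u/\partial x^2 &\approx (U(i+1,j) - 2U(i,j) + U(i-1,j))/\Delta x^2 \\
\partial^2 u/\partial x\,\partial y &\approx (U(i+1,j+1) + U(i-1,j-1) - U(i+1,j-1) - U(i-1,j+1))/4\Delta x\,\Delta y \\
\partial^2 u/\partial y^2 &\approx (U(i,j+1) - 2U(i,j) + U(i,j-1))/\Delta y^2 \\
\partial u/\partial x &\approx (U(i+1,j) - U(i-1,j))/2\Delta x \\
\partial u/\partial y &\approx (U(i,j+1) - U(i,j-1))/2\Delta y
\end{aligned}$$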
are substituted into (b) on the interior. This gives the 9 point stencil
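$$\begin{aligned}
&c_1(i,j)U(i+1,j) + c_2(i,j)U(i+1,j+1) + c_3(i,j)U(i,j+1) + {} \\
&c_4(i,j)U(i-1,j+1) + c_5(i,j)U(i-1,j) + c_6(i,j)U(i-1,j-1) + {} \\
&c_7(i,j)U(i,j-1) + c_8(i,j)U(i+1,j-1) + c_9(i,j)U(i,j) = r(i,j)
\end{aligned}$$
where
$$\begin{aligned}
(x,y) &= (x(i),y(j)), \qquad r(i,j) = r(x,y) \\
c_1(i,j) &= a(x,y)/\Delta x^2 + d(x,y)/2\Delta x \\
c_2(i,j) &= b(x,y)/4\Delta x\,\Delta y \\
c_3(i,j) &= c(x,y)/\Delta y^2 + e(x,y)/2\Delta y \\
c_4(i,j) &= -b(x,y)/4\Delta x\,\Delta y \\
c_5(i,j) &= a(x,y)/\Delta x^2 - d(x,y)/2\Delta x \\
c_6(i,j) &= b(x,y)/4\Delta x\,\Delta y \\
c_7(i,j) &= c(x,y)/\Delta y^2 - e(x,y)/2\Delta y \\
c_8(i,j) &= -b(x,y)/4\Delta x\,\Delta y \\
c_9(i,j) &= f(x,y) - 2\left(a(x,y)/\Delta x^2 + c(x,y)/\Delta y^2\right)
\end{aligned}$$
for $1 < i < n$ and $1 < j < m$. The "virtual" points
$$x(0) = A - \Delta x, \qquad y(0) = C - \Delta y$$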
are used to center difference formula along the boundaries x =A, y = C. For
example, if u(x,y) satisfies the corner mixed derivative boundary conditions:
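$$\alpha_A(y)\,\partial u/\partial x + \beta_A(y)\,\partial u/\partial y + \gamma_A(y)\,u(A,y) = g_A(y) \quad (\text{at } x = A)$$
$$\alpha_C(x)\,\partial u/\partial x + \beta_C(x)\,\partial u/\partial y + \gamma_C(x)\,u(x,C) = g_C(x) \quad (\text{at } y = C)$$
then the second order finite difference approximations
$$\partial u/\partial x \approx (U(2,j) - U(0,j))/2\Delta x \quad (x = A)$$
$$\partial u/\partial y \approx (U(i,2) - U(i,0))/2\Delta y \quad (y = C)$$
are used. This permits the elimination of the unknowns U(i,0), U(0,j) from the
discretization of (b) and gives a 2 by 2 system for U(1,0), U(0,1). The latter will
lead to a division by zero if the determinant quantity
$$\alpha_A(C)\,\beta_C(A) - \alpha_C(A)\,\beta_A(C) = 0.$$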
Consequently this is flagged as a fatal singular error condition. At the nonspecified
corner (A,C), the special second order formula
$$\partial^2 u/\partial x\,\partial y \approx (U(2,1) + U(0,1) + U(1,0) + U(1,2) - 2U(1,1) - U(0,2) - U(2,0))/2\Delta x\,\Delta y,$$
which eliminates the need to compute U(0,0), is used ([13]).
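The interior nine-point stencil described above can be sketched in a few lines of Python. This is an illustrative sketch, not the package's Fortran: the function name `stencil_coefficients`, the callback `cof` returning (a, b, c, d, e, f), and the sample Laplace operator are assumptions.

```python
def stencil_coefficients(cof, x, y, dx, dy):
    """Return [c1, ..., c9] for the interior nine-point stencil at (x, y)."""
    a, b, c, d, e, f = cof(x, y)
    cross = b / (4.0 * dx * dy)
    return [
        a / dx**2 + d / (2.0 * dx),          # c1 multiplies U(i+1,j)
        cross,                               # c2 multiplies U(i+1,j+1)
        c / dy**2 + e / (2.0 * dy),          # c3 multiplies U(i,j+1)
        -cross,                              # c4 multiplies U(i-1,j+1)
        a / dx**2 - d / (2.0 * dx),          # c5 multiplies U(i-1,j)
        cross,                               # c6 multiplies U(i-1,j-1)
        c / dy**2 - e / (2.0 * dy),          # c7 multiplies U(i,j-1)
        -cross,                              # c8 multiplies U(i+1,j-1)
        f - 2.0 * (a / dx**2 + c / dy**2),   # c9 multiplies U(i,j)
    ]

def laplace(x, y):
    """Hypothetical test operator: u_xx + u_yy."""
    return 1.0, 0.0, 1.0, 0.0, 0.0, 0.0

coeffs = stencil_coefficients(laplace, 0.5, 0.5, 0.5, 0.5)
```

For the Laplace operator with equal spacing this reduces to the familiar five-point stencil: the four edge neighbors carry weight 1/h^2, the center carries -4/h^2, and the corner weights vanish.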
If lexicographic ordering is employed and
$$U = (U(1,1),\ldots,U(n,1),\ldots,U(1,m),\ldots,U(n,m))$$
$$F = (r(1,1),\ldots,r(n,1),\ldots,r(1,m),\ldots,r(n,m))$$
then the substitutions above give the linear system of equations (c). The same pro-
cess is performed at all subgrid levels (see the next feature). The diagonal elements
of the coefficient matrix at any grid level contain the terms (see $c_9(i,j)$ above)
$$-2\left(a(x,y)/\Delta x^2 + c(x,y)/\Delta y^2\right)$$
while the off diagonal elements have the form
$$a(x,y)/\Delta x^2 \pm d(x,y)/2\Delta x$$
and
$$c(x,y)/\Delta y^2 \pm e(x,y)/2\Delta y.$$
Consequently, if
$$0.5\,\Delta x\,|d(x,y)| > a(x,y)$$
or
$$0.5\,\Delta y\,|e(x,y)| > c(x,y)$$
then diagonal dominance of the coefficient matrix may be lost. This is more likely at
coarser grid levels in the presence of nonzero first order terms. If corrective action is
not taken, this can inhibit or even prevent convergence of the multigrid iteration.
Consequently the replacements
$$a(x,y) = \max\left(0.5\,\Delta x\,|d(x,y)|,\ a(x,y)\right)$$
and
$$c(x,y) = \max\left(0.5\,\Delta y\,|e(x,y)|,\ c(x,y)\right)$$
are made. If this changes the diagonal element at the finest grid level, a nonfatal
error condition is flagged. It may be remedied by increasing resolution. If effected at
coarser grid levels, this is equivalent to replacing the PDE with a first order approximation.
Convergence and accuracy at the finest grid level are preserved. If the
zero order term has the wrong sign then this can also destroy diagonal dominance
(e.g., f(x,y) > 0 in (b), which reduces the magnitude of the diagonal term). There is
currently no correction for this in MUDPACK.
In principle, coarse grid coefficient adjustments similar to those described for nonzero
first order terms could also be made.
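The coefficient replacement described above can be sketched as follows. The function name and the returned `changed` flag are assumptions made for illustration; only the two max() replacements come from the text.

```python
def adjust_second_order_coefficients(a, c, d, e, dx, dy):
    """Apply the max() replacements that protect diagonal dominance.

    Returns (a_new, c_new, changed), where changed reports whether either
    second order coefficient had to be increased.
    """
    a_new = max(0.5 * dx * abs(d), a)
    c_new = max(0.5 * dy * abs(e), c)
    changed = (a_new != a) or (c_new != c)
    return a_new, c_new, changed

# Hypothetical coefficients a = c = 1, d = 10, e = 0 on two grid levels.
fine = adjust_second_order_coefficients(1.0, 1.0, 10.0, 0.0, 0.125, 0.125)
coarse = adjust_second_order_coefficients(1.0, 1.0, 10.0, 0.0, 0.5, 0.5)
```

With these sample values the fine grid coefficients are untouched, while on the coarse grid a is raised from 1 to 2.5, which matches the remark that the adjustment is more likely at coarser levels.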
Use of Multigrid Iteration to Approximate the Discretization Equations
This is the essential feature of the MUDPACK software. Introductions to multigrid
iteration are given in [8,10,14]. We give an abbreviated and simplified basic descrip-
tion here. Assume
$$G(0) \subset \cdots \subset G(s) \subset \cdots \subset G(t) = G$$
is an ascending chain of subgrids terminating in G and let
L(s) U(s) = F(s)
denote the discretization of the continuous PDE (a) on G(s) for each s=0,...,t. Ordi-
narily G(s) is an "every other point subset" of G(s+l) which includes boundaries.
Let I(s-1,s) denote a "prolongation" operator for transferring grid values from
G(s-1) to G(s) and let I(s,s-1) denote a "restriction" operator for transferring grid
values from G(s) to G(s-1). Linear or cubic interpolation is used in defining
I(s-1,s). The identity restriction mapping is the most obvious choice for I(s,s-1)
but this works poorly in practice. Some form of weighted averaging is necessary.
In steps 1-6 we describe how an initial guess U(s) on the grid G(s) can be inexpen-
sively improved using the subgrid G(s-1). This is called the "coarse grid correction"
algorithm and is the heart of multigrid iteration as implemented in MUDPACK.
(take f(s) = F(s))
Step 1. Perform relaxation sweeps (usually one or two in practice) on
L(s) U(s) = f(s)
Step 2. Compute the residual
R(s) = F(s)-L(s) U(s)
(Clearly, if we solve the residual equation L (s) E(s) = R (s) exactly then
U(s) = U(s) + E(s) is the exact solution. So we approximate E(s) economically
using the subgrid G(s-1)).
Step 3. Restrict the residual to G(s-1)
f (s-1) = I(s,s-1) R(s)
Step 4. "Solve"
L(s-1) E(s-1) = f(s-1)
Step 5. Prolong the correction E(s-1) to G(s) and add to U(s)
U(s) = U(s) + I(s-1,s) E(s-1)
(If I(s-1,s) E(s-1) is a good approximation to E(s) then the result should be an
improved approximation in U(s)).
Step 6. Perform more relaxation sweeps (usually one in practice) on the new
L(s) U(s) = F(s)
With recursion, steps 1-6 can be used within step 4 when "solving" for the correction
term on G(s-1). In this way, all the subgrids G(s-1), G(s-2),..., G(0) are brought
into play in improving the approximation on G(s). When s =0 the linear system
can be solved with a direct method or approximated with the relaxation sweeps in
steps 1 and 6. After step 6, the improved approximation in U(s) can be prolonged to
serve as an initial guess at G(s+1).
Step 7. Prolong U(s) to G(s+1)
U(s+1) = I(s,s+1) U(s)
Repeating steps 1-7 for s = 0,1,...,t-1 lifts the approximation to the finest grid level
and is called a Full Multigrid Cycle. Once the fine grid G(t) is reached, steps 1-6
can be repeated with s = t until the desired accuracy is achieved. In practice, con-
vergence to discretization level error should occur in a few (1-3) cycles. Achieving
this rapid convergence requires error free coding and selection of the "correct" under-
lying numerical schemes. For example, the relaxation method used and form of the
prolongation and restriction operators can be crucial in obtaining good algorithm per-
formance.
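Steps 1-6 can be sketched for a one-dimensional model problem, u'' = f on [0,1] with zero Dirichlet boundary values. This is a minimal illustration under stated assumptions, not MUDPACK's implementation: the function names, the lexicographic Gauss-Seidel point relaxation, the linear prolongation, the fully weighted restriction, and the test problem are all choices made here for brevity.

```python
import math

def relax(u, f, h, sweeps):
    """Gauss-Seidel point relaxation on u'' = f with fixed end values."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i])

def residual(u, f, h):
    """R = F - L U, taken as zero at the two Dirichlet end points."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Fully weighted averaging onto every other point."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc

def prolong(ec):
    """Linear interpolation from a coarse grid to the next finer grid."""
    e = [0.0] * (2 * (len(ec) - 1) + 1)
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(len(ec) - 1):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e

def coarse_grid_correction(u, f, h):
    """Steps 1-6 applied recursively, i.e. one V(2,1) cycle."""
    if len(u) == 3:
        # Coarsest grid: a single unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] - h * h * f[1])
        return
    relax(u, f, h, 2)                          # step 1
    r = residual(u, f, h)                      # step 2
    fc = restrict(r)                           # step 3
    ec = [0.0] * len(fc)
    coarse_grid_correction(ec, fc, 2.0 * h)    # step 4 ("solve" by recursion)
    e = prolong(ec)                            # step 5
    for i in range(1, len(u) - 1):
        u[i] += e[i]
    relax(u, f, h, 1)                          # step 6

n = 65
h = 1.0 / (n - 1)
x = [i * h for i in range(n)]
f = [-math.pi ** 2 * math.sin(math.pi * xi) for xi in x]  # exact u = sin(pi x)
u = [0.0] * n
norms = []
for cycle in range(5):
    coarse_grid_correction(u, f, h)
    norms.append(max(abs(v) for v in residual(u, f, h)))
error = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n))
```

The list `norms` records the maximum residual after each cycle, and `error` compares the result with the exact continuous solution. The sketch starts from a zero guess on the finest grid rather than performing the full multigrid lift of step 7.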
Optimal multigrid iteration will converge in $O(n^2)$ operations on n by n grids in
two dimensions. Much of the theoretical work has concentrated on proving this
[7,12,16]. By way of comparison, the direct solution of (c) on n by n grids requires
$O(n^2 \log(n))$ operations for separable PDEs with cyclic reduction [9,20,21] and $O(n^4)$
operations for nonseparable PDEs with banded Gaussian elimination. The potential
improvement with three-dimensional problems is even greater.
A heuristic description of why multigrid iteration works focuses on the Fourier components
of the error in the approximation at the finest grid level. The higher frequencies
in the error are rapidly damped by standard relaxation procedures. The
lower frequency components are damped slowly by the fixed fine grid relaxation thus
reducing the overall convergence rate. The transference to coarser grids with mul-
tigrid iteration effectively raises the relative frequency of the remaining fine grid
error components where they too are rapidly damped. The coarsest grid must be
sparse for removal of the lowest frequency fine grid error components with relaxa-
tion.
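The frequency argument can be made concrete with a small calculation. The sketch below uses weighted Jacobi relaxation on the one-dimensional model problem instead of Gauss-Seidel, because its per-sweep damping of each Fourier mode has a simple closed form; the relaxation weight 2/3 and the grid sizes are assumptions chosen for illustration.

```python
import math

def jacobi_damping(k, n, omega=2.0 / 3.0):
    """Factor by which Fourier mode k is multiplied per weighted Jacobi sweep.

    The grid has n intervals, so the modes are k = 1, ..., n-1.
    """
    return 1.0 - 2.0 * omega * math.sin(k * math.pi / (2.0 * n)) ** 2

n = 64
# Smoothest mode on the fine grid: almost untouched by a sweep.
low = abs(jacobi_damping(1, n))
# Worst damping among the high frequency half of the fine grid modes.
high = max(abs(jacobi_damping(k, n)) for k in range(n // 2, n))
# The same smooth mode on a two interval grid, where it is the only mode.
low_on_coarsest = abs(jacobi_damping(1, 2))
```

On the fine grid the smooth mode is reduced by well under one percent per sweep, while every high frequency mode is reduced by roughly a factor of three or better. Seen from the coarsest grid, the smooth mode itself is reduced by about a factor of three, which is why a sparse coarsest grid matters.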
Selection of Multigrid Options
The current version of MUDPACK [6] has options for implementing variants of mul-
tigrid iteration and default options for those preferring black box solvers. The default
options were chosen for robustness and set cubic prolongation, fully weighted residual
restriction, and W(2,1) cycling. The earlier version of MUDPACK described in [1,2]
only allowed V(2,1) cycling with linear prolongation. This is still available as a possi-
bly more efficient choice for certain problems. More general cycling including F
cycles [15] is also available. The description of the integer vector parameter
MGOPT in the appendix of this document contains more discussion of multigrid
options. The grid schedules in executing one full multigrid cycle (FMG) for both
V(2,1) and W(2,1) cycles are illustrated for a four level grid in figure 1. The number
of relaxation sweeps executed at each grid level is recorded.
Figure 1
[Figure: grid schedules on levels 1 through 4, with the number of relaxation sweeps at each level, for two panels: "One FMG with V(2,1) Cycles" and "One FMG with W(2,1) Cycles".]
Selection of the Relaxation Method
A relaxation menu is provided. It includes vectorized Gauss-Seidel schemes [11] on
alternating points (red/black), lines (in any combination of directions) and planes (for
three-dimensional anisotropic elliptic PDEs [22]). Choice of the correct method for a
particular problem can be crucial. It depends on the relative grid and PDE
coefficient size. Usually this can be pre-determined. Sometimes experimentation is
required. Advice on method selection is given in the documentation. For example,
suppose we wish to solve a three-dimensional problem of the form:
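$$a(x,y,z)\,\partial^2 u/\partial x^2 + b(x,y,z)\,\partial^2 u/\partial y^2 + c(x,y,z)\,\partial^2 u/\partial z^2 = r(x,y,z).$$
Let A, B, C denote the quantities $a(x,y,z)/\Delta x^2$, $b(x,y,z)/\Delta y^2$, $c(x,y,z)/\Delta z^2$ where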
$\Delta x$, $\Delta y$, $\Delta z$ are the x, y, z grid increments. Choice of the method proceeds as follows:
(0) If $A \approx B \approx C$ then choose point relaxation
(1) If $A \gg B \approx C$ then choose line relaxation in the x direction
(2) If $B \gg A \approx C$ then choose line relaxation in the y direction
(3) If $C \gg B \approx A$ then choose line relaxation in the z direction
(4) If $A \approx B \gg C$ then choose line relaxations in the x and y directions
(5) If $A \approx C \gg B$ then choose line relaxations in the x and z directions
(6) If $C \approx B \gg A$ then choose line relaxations in the y and z directions
(7) If $A \approx B \approx C$ and these quantities vary considerably then choose
line relaxations in the x and y and z directions
(8) If $A \gg B \gg C$ or $B \gg A \gg C$ then choose x-y planar relaxation
(9) If $A \gg C \gg B$ or $C \gg A \gg B$ then choose x-z planar relaxation
(10) If $C \gg B \gg A$ or $B \gg C \gg A$ then choose y-z planar relaxation
In (8),(9) and (10), point or line relaxation in the plane can be set. Point relaxation is
the least expensive of all methods in cost per multigrid cycle and coefficient storage
and should be used when appropriate. Line relaxation requires computation, factori-
zation and storage of tridiagonal matrices (pentadiagonal if boundary conditions are
periodic). Planar relaxation uses full two-dimensional multigrid cycling on each plane
and is expensive to implement. Nevertheless it solves three-dimensional anisotropic
PDEs which are not otherwise tractable (see Example 6 and [22]).
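The selection rules (0) through (6) and (8) through (10) can be sketched as a small decision function. The function name, the returned strings, and especially the threshold `ratio` used to interpret "much greater than" are assumptions; the text does not quantify that relation. Rule (7) depends on how the quantities vary over the region and is not covered by this pointwise sketch.

```python
def choose_relaxation(A, B, C, ratio=10.0):
    """Suggest a relaxation method from A = a/dx^2, B = b/dy^2, C = c/dz^2.

    'X >> Y' is interpreted as X >= ratio * Y. All inputs are assumed positive.
    """
    ranked = sorted([("x", A), ("y", B), ("z", C)],
                    key=lambda item: item[1], reverse=True)
    (d1, v1), (d2, v2), (d3, v3) = ranked
    top_dominates = v1 >= ratio * v2      # largest >> middle
    middle_dominates = v2 >= ratio * v3   # middle >> smallest
    if top_dominates and middle_dominates:
        # Rules (8)-(10): plane spanned by the two strongest directions.
        return "planar relaxation in the " + "-".join(sorted([d1, d2])) + " plane"
    if top_dominates:
        # Rules (1)-(3): one direction dominates the other two.
        return "line relaxation in the " + d1 + " direction"
    if middle_dominates:
        # Rules (4)-(6): two comparable directions dominate the third.
        return ("line relaxation in the "
                + " and ".join(sorted([d1, d2])) + " directions")
    return "point relaxation"             # rule (0)
```

For example, `choose_relaxation(100.0, 100.0, 1.0)` suggests line relaxation in the x and y directions, in line with rule (4).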
Generating Second- and Fourth-order Approximations
Second-order finite difference approximations are generated on uniform grids super-
imposed on the solution region. These can be improved to fourth-order estimates
using "deferred corrections" ([17]). We briefly review this technique in one dimension.
The extension to higher dimensions is straightforward. Suppose we wish to solve
the PDE in (a) and have obtained the linear system (c) on a one dimensional grid of
size h. We can solve this to discretization level error using multigrid iteration. The
truncation error
$$t = F - L u$$
measures how closely the exact continuous solution satisfies the discretization equations.
Simple Taylor series arguments show that t has the form
$$t = h^2 (c\,u_{xxx} + d\,u_{xxxx}) + O(h^4)$$
where c, d are known coefficients from the elliptic equation. If U satisfies the
discretization equations exactly, then it can be used to generate second-order approximations
to $u_{xxx}$ and $u_{xxxx}$. For example if the uniform grid on the solution region
[a,b] is
$$a = x(1) < \cdots < x(i) < \cdots < x(n) = b$$
and U(i) is the approximation at x(i) then the following difference formula can be
used to approximate the third and fourth partial derivatives of u(x):
at x = a
$$u_{xxx} = (-5U(1) + 18U(2) - 24U(3) + 14U(4) - 3U(5))/(2h^3) + O(h^2)$$
$$u_{xxxx} = (3U(1) - 14U(2) + 26U(3) - 24U(4) + 11U(5) - 2U(6))/h^4 + O(h^2)$$
at x = a+h
$$u_{xxx} = (-3U(1) + 10U(2) - 12U(3) + 6U(4) - U(5))/(2h^3) + O(h^2)$$
$$u_{xxxx} = (2U(1) - 9U(2) + 16U(3) - 14U(4) + 6U(5) - U(6))/h^4 + O(h^2)$$
at x = x(i) where a+h < x(i) < b-h
$$u_{xxx} = (-U(i-2) + 2U(i-1) - 2U(i+1) + U(i+2))/(2h^3) + O(h^2)$$
$$u_{xxxx} = (U(i-2) - 4U(i-1) + 6U(i) - 4U(i+1) + U(i+2))/h^4 + O(h^2).$$
Similar difference formulae are used at x = b-h and x = b. These can be obtained
by using the Fortran difference formula generators described in [3]. The necessary
difference equations are encoded in the MUDPACK fourth order solvers. If we
denote all of these by the difference operators
$$\delta_3(U) = u_{xxx} + O(h^2)$$
$$\delta_4(U) = u_{xxxx} + O(h^2)$$
and let
$$T = h^2 (c\,\delta_3(U) + d\,\delta_4(U)),$$
then
$$T = t + O(h^4).$$
The fourth-order truncation error estimate T is computed and passed down to
coarser grids using weighted averaging. Then one full multigrid cycle is used to solve
the correction equation
$$L E = -T.$$
Since E is an $O(h^4)$ approximation to the exact error
$$e = u - U = L^{-1}(-t)$$
it follows that
$$V = U + E$$
yields a fourth-order approximation in V. Another related and effective method for
generating higher-order approximations with multigrid is $\tau$-extrapolation (see [7]).
The use of higher-order stencils is investigated in [19].
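Second-order formulas of this kind for the third and fourth derivatives, with the coefficients written out in full, can be checked on grid values of a polynomial. A second-order formula for the third derivative has a leading error proportional to the fifth derivative, so all of the formulas below should reproduce the derivatives of x^4 exactly. The function names and the 0-based list indexing (U[0] holds U(1)) are conventions of this sketch.

```python
def d3_left(U, h):
    """Third derivative at x = a = x(1), one-sided."""
    return (-5*U[0] + 18*U[1] - 24*U[2] + 14*U[3] - 3*U[4]) / (2 * h**3)

def d4_left(U, h):
    """Fourth derivative at x = a = x(1), one-sided."""
    return (3*U[0] - 14*U[1] + 26*U[2] - 24*U[3] + 11*U[4] - 2*U[5]) / h**4

def d3_next(U, h):
    """Third derivative at x = a + h = x(2)."""
    return (-3*U[0] + 10*U[1] - 12*U[2] + 6*U[3] - U[4]) / (2 * h**3)

def d4_next(U, h):
    """Fourth derivative at x = a + h = x(2)."""
    return (2*U[0] - 9*U[1] + 16*U[2] - 14*U[3] + 6*U[4] - U[5]) / h**4

def d3_interior(U, i, h):
    """Centered third derivative at list index i."""
    return (-U[i-2] + 2*U[i-1] - 2*U[i+1] + U[i+2]) / (2 * h**3)

def d4_interior(U, i, h):
    """Centered fourth derivative at list index i."""
    return (U[i-2] - 4*U[i-1] + 6*U[i] - 4*U[i+1] + U[i+2]) / h**4

# Grid x = 0, 1, ..., 5 with h = 1 and u(x) = x**4, so that
# u_xxx = 24 x and u_xxxx = 24.
h = 1
xs = [0, 1, 2, 3, 4, 5]
U = [x**4 for x in xs]
```

With these grid values, each fourth derivative formula returns 24 and each third derivative formula returns 24 times the x value at which it is centered.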
Flexibility in Choosing Grid Size
Second- and fourth-order approximations are generated on uniform l by m by n
grids superimposed on boxes in three dimensions or l by m grids superimposed on
rectangles in two dimensions. The grid sizes have the form:
$$l = p\,2^{i} + 1$$
$$m = q\,2^{j} + 1$$
$$n = r\,2^{k} + 1$$
where p, q, and r are integers greater than 1 and i , j , and k are nonnegative
integers. In earlier versions of MUDPACK, i = j = k was required. Since p, q, r
should be small for effective error reduction with multigrid iteration, the old con-
straint was not well suited for asymmetric grid sizes (see Examples 1,5 and 7). An
earlier MUDPACK requirement that p (q, r) must be greater than 2 when line
relaxation in the x (y, z) direction is used and the x (y, z) boundary condition is
periodic has also been removed.
Let G denote the l by m by n fine grid. In MUDPACK, multigrid iteration is
implemented on the ascending chain of grids
$$G(0) \subset \cdots \subset G(s) \subset \cdots \subset G(t) = G,$$
where t = max(i,j,k) and each G(s) for s = 0,...,t has l(s) by m(s) by n(s) grid
points given by
$$l(s) = p\,2^{\max(s+i-t,\,0)} + 1$$
$$m(s) = q\,2^{\max(s+j-t,\,0)} + 1$$
$$n(s) = r\,2^{\max(s+k-t,\,0)} + 1.$$
The coarsest grid, G(0), has p+1 by q+1 by r+1 points and should be as small as
possible within grid size constraints for effective error reduction with multigrid iteration.
Each of p, q, and r should be 2 or a small odd value since even values
greater than 2 can be reduced by increasing i, j, or k . Large values for p (q, r)
can reduce the convergence rate even if line relaxation in the x (y, z) directions is
chosen (see Example 3). In two dimensions, larger values for p or q cause no problem
if one of the "hybrid" solvers (discussed below) is used (see Examples 2 and 3).
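The grid size rule and the resulting chain of subgrids can be sketched as below. The function names are assumptions; the formulas are the ones stated above, with each grid size written as a small integer times a power of two, plus one.

```python
def factor_grid_size(npts):
    """Split npts = p * 2**i + 1 with p as small as possible and p > 1."""
    p, i = npts - 1, 0
    while p % 2 == 0 and p > 2:
        p //= 2
        i += 1
    return p, i

def grid_chain(p, i, q, j, r, k):
    """Return [(l(s), m(s), n(s)) for s = 0, ..., t] with t = max(i, j, k)."""
    t = max(i, j, k)
    chain = []
    for s in range(t + 1):
        l_s = p * 2 ** max(s + i - t, 0) + 1
        m_s = q * 2 ** max(s + j - t, 0) + 1
        n_s = r * 2 ** max(s + k - t, 0) + 1
        chain.append((l_s, m_s, n_s))
    return chain

chain = grid_chain(2, 2, 3, 1, 2, 0)
```

Here `factor_grid_size(65)` gives p = 2 with five coarsenings, while a 361 point direction factors as 45 * 2**3 + 1, leaving a large p of the kind that the text warns can slow convergence unless a hybrid solver is used. The sample chain coarsens only the directions that still have factors of two to give up, ending at the p+1 by q+1 by r+1 grid.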
Availability of "Hybrid" Multigrid/direct Method Solvers
The certainty of direct methods is combined with the efficiency of multigrid iteration
in "hybrid" multigrid/direct method solvers in MUDPACK. This has been done for
two-dimensional nonseparable elliptic equations. Separable PDEs can be approxi-
mated with cyclic reduction [4,9,20,21] if a direct method is required. The hybrid
solvers use block Gaussian elimination whenever the coarsest grid level is encoun-
tered within multigrid cycling. This provides additional grid size flexibility by eliminating
the usual constraint that p, q be small (see Examples 2 and 3) and provides a
natural way to compare solutions from direct and iterative schemes. In the extreme
case of l = p+1 and m = q+1, the hybrid codes become direct method solvers. The
use of Gaussian elimination requires approximately 4(p+1)(p+1)(q+1) additional
words of storage if periodic boundary conditions are set in the y direction and
approximately 2(p+1)(p+1)(q+1) additional words of storage if periodic boundary
conditions are not set in the y direction. If there are l = m = n grid points in each
direction, then balance between multigrid iteration, which is an $O(n^2)$ algorithm, and
the direct method, which requires $O(p^4)$ operations for solution on the coarsest grid,
is roughly achieved when
$$p^4 = n^4/2^{4k} = n^2.$$
This holds when
$$k = \log_2(n)/2$$
grid levels are used before switching to the direct method. Choosing
$$p \approx n^{1/2}$$
will achieve rough parity between the direct and iterative parts of the hybrid algo-
rithm. Larger values for p mean the direct method will dominate the computation
while smaller values will only marginally increase the cost of multigrid iteration.
Availability of Subroutines to Compute Residuals
Subroutines to compute fine grid residual after calling any of the second-order solvers
are provided. The residual measures how well the current approximation satisfies the
linear system of equations coming from the discretization. If we consider the linear
system (c) and let
u be the exact solution to (a) evaluated on G,
U be the exact solution to (c),
V(n) be the approximation to U after n multigrid cycles
then the residual is given by
R(n)=F-L V(n).
The ratios
$$\|R(n+1)\| / \|R(n)\|$$
provide a convenient estimate of the convergence rate of multigrid iteration. A common
measure of multigrid efficiency is to achieve discretization level error, defined by
$$\|U - V(n)\| < \|U - u\|,$$
in one full multigrid cycle with no initial guess [2]. In such cases, one cannot expect
R(1) to be reduced to the level of roundoff error. Consequently, the norm of the
residual is a conservative measure of accuracy which can be wasteful if multigrid
cycles are executed until it reaches roundoff level error.
No Initial Guess Requirement
Unlike the case with classical iterative schemes, initial guesses are not necessary and
should not be supplied unless they are very good (as, for example, when restarting
multigrid iteration using an approximation generated earlier). Full multigrid cycling
[7], beginning at the coarsest grid level, should be used when there is no good initial
guess. Caution should be exercised with time-dependent marching problems where
one is tempted to use the previous time step solution u(t) as an initial guess to
u(t+dt). A better choice is to solve for the time correction term e(t,dt) = u(t+dt)-
u(t) using multigrid cycling which commences at the coarsest grid level (an initial
guess of zero or u(t)-u(t-dt) can be set there). If (in operator notation)
l(u(t)) = r(t), l(u(t+dt)) = r(t+dt)
then
l(e(t,dt)) = r(t+dt) -r(t).
and the discretization for e(t,dt) is the same as for u(t). Time dependent boundary
conditions for u(t) and u(t+dt) transfer similarly to e(t,dt). A few digits of accuracy
in e(t,dt), which is ordinarily much smaller than u(t+dt), yield several additional
digits of accuracy in the final approximation u(t+dt) = u(t) + e(t,dt). Using
this approach to integrate in time will give more accuracy than using u(t) as an initial
guess at the finest grid level.
Non-initialization Calls
Redundant discretization and matrix factorization processes can and should be
bypassed on recalls to the software. For example, this happens when only the right-
hand side array has changed from a previous call or when more multigrid cycles are
needed for additional accuracy.
Error Control
Maximum relative error can be used to monitor convergence. Use of error control is
optional and requires additional storage and computation. If u(n), u(n+1) are the
last two computed iterates and $\epsilon$ is the tolerance then
$$\|u(n+1) - u(n)\| < \epsilon\,\|u(n+1)\|$$
is the stopping criterion in error control. The number of relaxation sweeps at the
finest grid level should not be large if the multigrid iteration is working correctly.
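A relative-difference stopping test of this kind can be sketched as follows. The function name is hypothetical, and the use of the maximum norm is an assumption suggested by the phrase "maximum relative error"; the iterates are plain lists of grid values.

```python
def converged(u_new, u_old, eps):
    """True when max|u_new - u_old| < eps * max|u_new| (maximum norm)."""
    diff = max(abs(a - b) for a, b in zip(u_new, u_old))
    size = max(abs(a) for a in u_new)
    return diff < eps * size
```

The test needs the previous iterate as well as the current one, which is consistent with the remark that error control costs additional storage.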
Flagging of Errors involving Input Parameters
This includes detection of singular and/or nonelliptic PDEs. Fatal and nonfatal
errors are flagged. For example, consider the two-dimensional elliptic PDE with
cross derivative term in (b). If f(x,y) = 0 for all (x,y) and the boundary conditions
are either pure derivative or periodic then an arbitrary constant plus the solution is
also a solution. This means that the PDE is singular which is flagged as a nonfatal
error. The PDE is flagged as "nonelliptic" if
\[ b(x,y)^2 - 4\,a(x,y)\,c(x,y) > 0 \]
In either case, the matrix coming from the discretization might be ill-conditioned and
convergence of the iterative scheme is questionable.
Output of Exact Minimal Work Space Requirements
This is especially important with three-dimensional problems where central memory
is easily exhausted. In certain cases (e.g., if no error control is selected) equivalencing
between right hand side, solution, and work arrays is allowed to save storage. The
work length depends on the grid size and relaxation method. Simplified estimation
formulas are given in the documentation and exact requirements are output. For
example, if the PDE in (b) is approximated on an n by m grid then the appropriate
solver requires at most
\[ 4\,n\,m\,(13 + ix + jy)/3 \]
work space locations where
ix = 0 if point or line y relaxation only is used
ix = 3 if line x relaxation is used and u is not periodic in x
ix = 5 if line x relaxation is used and u is periodic in x
jy = 0 if point or line x relaxation only is used
jy = 3 if line y relaxation is used and u is not periodic in y
jy = 5 if line y relaxation is used and u is periodic in y
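The bound above is easy to evaluate before dimensioning the work array. The sketch below is illustrative only: the names WRKLEN and LENEST, the grid size, and the relaxation choice are assumptions, and rounding the quotient up to a whole word is a choice made here, not something the text specifies.

```fortran
      PROGRAM WRKLEN
C     Illustrative only: the grid size and flags are made-up inputs.
      INTEGER N, M, IX, JY, LWORK, LENEST
      N = 97
      M = 193
C     Line relaxation in x with u not periodic in x; no line-y sweeps.
      IX = 3
      JY = 0
      LWORK = LENEST(N, M, IX, JY)
      PRINT *, 'WORK SPACE BOUND (WORDS) =', LWORK
      END

      INTEGER FUNCTION LENEST(N, M, IX, JY)
C     Upper bound 4*N*M*(13+IX+JY)/3, rounded up to a whole word.
      INTEGER N, M, IX, JY
      LENEST = (4*N*M*(13 + IX + JY) + 2)/3
      RETURN
      END
```

This is only the simplified estimate; the exact requirement is the value the solver itself outputs.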
Extensive Documentation and Test Programs
See the next section and the appendix of this Technical Note. The test programs can
be used when installing the codes on new systems.
3. MUDPACK files
As of March 1, 1991 there are 98 MUDPACK documentation, solver, common sub-
routine, residual subroutine, and test program files. Collectively these contain over
100,000 lines of code and documentation. All the code is written in portable Fortran
77 which has been tested on a variety of computers and operating systems. Care was
taken to achieve vectorization on Cray machines.
File Description
The following seven real solvers are central to the software package:
1. MUD2 solves two-dimensional nonseparable elliptic PDEs.
2. MUD2CR solves two-dimensional elliptic PDEs with cross-derivatives.
3. MUD2SA solves two-dimensional nonseparable self-adjoint elliptic PDEs.
4. MUD2SP solves two-dimensional separable elliptic PDEs.
5. MUD3 solves three-dimensional nonseparable elliptic PDEs.
6. MUD3SA solves three-dimensional nonseparable self-adjoint elliptic PDEs.
7. MUD3SP solves three-dimensional separable elliptic PDEs.
Additional solvers include the hybrid multigrid/direct method codes MUH2 and
MUH2CR. MUD24, MUH24, MUD24CR, MUH24CR, MUD24SP, MUD34, and
MUD34SP are fourth-order solvers corresponding to MUD2, MUH2, MUD2CR,
MUH2CR, MUD2SP, MUD3 and MUD3SP. Second- and fourth-order complex
solvers are identified by replacing the "M" with a "C" in the real solvers' names. For
example, CUD24CR produces a fourth-order approximation to the two-dimensional
complex elliptic PDE with a cross-derivative term. More solvers may be added in
the future. A solver's name in lower case followed by ".d" identifies a documentation
file and the name in lower case preceded by a "t" and followed by a ".f" identifies a
Fortran test program file. For example, mud3.d contains complete documentation
for MUD3 and tmud3.f is a test program illustrating use of MUD3. Users are
encouraged to read the documentation carefully and to study and execute the sample
program associated with the solver wanted. Most of mud3.d is listed in the
appendix of this document.
Routines to compute the fine grid residual are resm2.f (for MUD2, MUH2, MUD2SA),
resm2cr.f (for MUD2CR, MUH2CR), resm2sp.f (for MUD2SP), resm3.f (for MUD3,
MUD3SA), resm3sp.f (for MUD3SP), resc2.f (for CUD2), resc2cr.f (for CUD2CR),
resc2sp.f (for CUD2SP), resc3.f (for CUD3), and resc3sp.f (for CUD3SP).
Obtaining MUDPACK Files:
For proprietary reasons, access to the Fortran source of each solver is not provided.
Instead, each solver is available in relocatable binary form. This saves considerable
computing resources by eliminating lengthy compile times at NCAR. Fortran source
for individual solvers will sometimes be distributed with the understanding that the
codes are not to be modified and/or distributed further and that feedback on their
performance will be provided to the author. All the MUDPACK software is copy-
righted. If you wish to use the codes on other machines, contact John Adams at:
(electronic mail: [email protected], telephone 303-497-1213).
Documentation, fine grid residual and test program files are available via "dsl" (the
distributed software libraries) at NCAR. The user document "Distributed Software
Libraries" has information about access under dsl. This can be obtained by contact-
ing the SCD consulting office. The file "README" in library "mudpack" under dsl
contains a directory for MUDPACK and current information on how to access the
relocatable binary for solvers.
Selecting Solvers
The following "flow chart" can be used in selecting the appropriate second-order
solver for the elliptic PDE to be approximated:
(1) If the PDE is complex go to (8) else go to (2)
(2) If the PDE is three-dimensional go to (6) else go to (3)
(3) If the PDE is separable use MUD2SP else go to (4)
(4) If the PDE has a cross derivative use MUD2CR or MUH2CR else go to (5)
(5) If the PDE is self-adjoint use MUD2SA else use MUD2 or MUH2
(6) If the PDE is separable use MUD3SP else go to (7)
(7) If the PDE is self-adjoint use MUD3SA else use MUD3
(8) If the PDE is three-dimensional go to (11) else go to (9)
(9) If the PDE is separable use CUD2SP else go to (10)
(10) If the PDE has a cross derivative use CUD2CR else use CUD2
(11) If the PDE is separable use CUD3SP else use CUD3
The corresponding fourth-order solvers can improve the approximation if the
second-order solver has reached discretization level error [2,6].
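The flow chart above can be sketched as a small Fortran 77 routine. The names PICKSV and PICK and the logical flags are hypothetical and are not part of MUDPACK; where the chart offers a hybrid alternative (MUH2 or MUH2CR), this sketch returns only the MUD name.

```fortran
      PROGRAM PICKSV
C     Example query: real, 3-D, nonseparable, self-adjoint PDE.
      CHARACTER*6 NAME
      CALL PICK(.FALSE., .TRUE., .FALSE., .FALSE., .TRUE., NAME)
      PRINT *, NAME
      END

      SUBROUTINE PICK(ISCPLX, IS3D, SEPAR, CROSS, SELFAD, NAME)
C     Follows steps (1)-(11) of the second-order solver flow chart.
      LOGICAL ISCPLX, IS3D, SEPAR, CROSS, SELFAD
      CHARACTER*(*) NAME
      IF (ISCPLX) THEN
         IF (IS3D) THEN
            IF (SEPAR) THEN
               NAME = 'CUD3SP'
            ELSE
               NAME = 'CUD3'
            END IF
         ELSE
            IF (SEPAR) THEN
               NAME = 'CUD2SP'
            ELSE IF (CROSS) THEN
               NAME = 'CUD2CR'
            ELSE
               NAME = 'CUD2'
            END IF
         END IF
      ELSE
         IF (IS3D) THEN
            IF (SEPAR) THEN
               NAME = 'MUD3SP'
            ELSE IF (SELFAD) THEN
               NAME = 'MUD3SA'
            ELSE
               NAME = 'MUD3'
            END IF
         ELSE
            IF (SEPAR) THEN
               NAME = 'MUD2SP'
            ELSE IF (CROSS) THEN
C              MUH2CR is the hybrid alternative at this step.
               NAME = 'MUD2CR'
            ELSE IF (SELFAD) THEN
               NAME = 'MUD2SA'
            ELSE
C              MUH2 is the hybrid alternative at this step.
               NAME = 'MUD2'
            END IF
         END IF
      END IF
      RETURN
      END
```

For the sample query in the driver, steps (1), (2), (6) and (7) lead to MUD3SA.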
4. Examples
Use and efficiency of the MUDPACK software is illustrated in two- and three-
dimensional analytic examples which were run on the NCAR CRAY Y-MP8/864
computer. The sensitivity of multigrid iteration to the underlying relaxation method
is emphasized. None of the examples use an initial guess. The required work space
lengths (given in megawords) includes storage of the approximation and right-hand
side arrays. The "once only" initialization time TO (which includes discretization and
matrix factorizations) is separated out from the solution time Ts. The times and
megaflop rates are the result of monitoring one full multigrid cycle (or more if indi-
cated) with the performance monitor PERFMON ([18]) at NCAR. The exact max-
imum error norm
\[ \| U - u \| \]
is tabulated where u and U are the exact solution and the approximation evaluated
on the fine grid. If V is the exact solution to the linear system arising from the
discretization then discretization level error is said to have been reached if
\[ \| U - V \| \le \| u - V \|. \]
Coefficients and boundary conditions are input via simple self-documenting subrou-
tines which must be declared "EXTERNAL" in the calling routines. Some of these
are listed in the examples. More examples demonstrating MUDPACK's applicability
to a wide range of problems in the atmospheric sciences are given in [5]. Efficiency is
examined more closely in [2].
Example 1. 2-D Separable Elliptic Equation (high resolution grids)
Solve
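\[ (\cos^2(s)+1)\,\frac{\partial^2 p}{\partial s^2} - 2\cos(s)\sin(s)\,\frac{\partial p}{\partial s} + \frac{\partial}{\partial r}\left((r+1)^2\,\frac{\partial p}{\partial r}\right) - r\,p(s,r) = f(r,s) \]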
on the region 0 < s < 2π, 0 < r < 1. Assume p(s,r) is periodic in s and is
specified at r = 0, r = 1. For testing purposes, the exact solution
\[ p(s,r) = \sin^2(s)\,(r^2 - r^3 + 1) \]
is used. The following subroutines will input the PDE coefficients to the solver
MUD2SP at any grid point (s,r).
      SUBROUTINE COFS(S,CSS,CS,CES)
      COSS = COS(S)
      CSS = COSS*COSS+1.
      CS = -2.*COSS*SIN(S)
      CES = 0.0
      RETURN
      END
      SUBROUTINE COFR(R,CRR,CR,CER)
      CRR = (R+1)**2
      CR = 2.*(R+1)
      CER = -R
      RETURN
      END
The grid size parameters p = 3, q = 2, i = j + 2 are used to keep spacing roughly
the same in the s and r direction. Experimentation shows that point relaxation with
V(2,1) cycling and cubic prolongation result in a near optimal algorithm. This choice
allows testing on very high resolution grids. The use of line relaxation greatly
increases storage requirements and should be avoided if it is not necessary. One full
multigrid cycle is executed for each grid size. The results are tabulated in Table 1.1.
The error reductions by a factor of four with each doubling of resolution indicate
discretization level error has been reached [2,6]. Improved vectorization accounts for
solution time increases of less than four with grid doubling. The default multigrid
options will produce slightly better error results but at more than twice the execution
times as the runs with V(2,1) cycles.
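The grid sizes used in this example follow from the grid size parameters through the form n = p*2**(i-1) + 1 given in the solver documentation (see NX = IPARM(14) in the appendix). The sketch below is illustrative: the program name GRIDSZ is hypothetical, and the range j = 5,...,9 is inferred here from the tabulated grid sizes rather than stated in the text.

```fortran
      PROGRAM GRIDSZ
C     Grid sizes for p = 3, q = 2, i = j + 2 (Example 1).
C     The range of J is an assumption inferred from Table 1.1.
      INTEGER P, Q, I, J, NS, NR
      P = 3
      Q = 2
      DO 10 J = 5, 9
         I = J + 2
C        Number of points in the s and r directions.
         NS = P*2**(I-1) + 1
         NR = Q*2**(J-1) + 1
         PRINT *, NS, ' BY ', NR
   10 CONTINUE
      END
```

The five sizes produced run from 193 by 33 up to 3073 by 513, matching the Grid Size column of Table 1.1.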
Table 1.1: 2-D Separable Elliptic Equation (MUD2SP)
Grid Size     Storage         Ts         Mflop   Error
193 x 33      0.020 Mwords    0.01 sec    56     0.17e-3
385 x 65      0.072 Mwords    0.03 sec    98     0.44e-4
769 x 129     0.275 Mwords    0.08 sec   140     0.11e-4
1537 x 257    1.074 Mwords    0.26 sec   166     0.27e-5
3073 x 513    4.244 Mwords    0.97 sec   179     0.68e-6
In the next table the efficiency of MUD24SP in improving the second-order estimate
to fourth-order is measured. By default the fourth-order solvers use W(2,1) cycling
with fully weighted residual restrictions and cubic prolongation. This and the
calculation and storage of the truncation error estimate used in "deferred corrections"
[6,17] account for the additional expense.
Table 1.2: 2-D Separable Elliptic Equation (MUD24SP)
Grid Size     Storage         Ts         Mflop   Error
193 x 33      0.026 Mwords    0.05 sec    27     0.48e-7
385 x 65      0.097 Mwords    0.11 sec    45     0.30e-8
769 x 129     0.374 Mwords    0.27 sec    71     0.19e-9
1537 x 257    1.469 Mwords    0.74 sec   102     0.16e-10
3073 x 513    5.829 Mwords    2.23 sec   134     0.70e-10
If accuracy is measured as a function of computational cost then the advantages of
the fourth-order scheme are obvious. The error on the 193 x 33 grid with
MUD24SP is less than the error on the 3073 x 513 grid with MUD2SP. As would be
expected with a fourth-order method, there is an error reduction by a factor of 16
with grid doubling. This is true until the error is of order e-10. At this point further
grid refinement gains nothing. Pollution from numerical roundoff error is too great.
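The claim that a factor-of-four error reduction per grid doubling signals second-order accuracy can be checked by computing the observed order log2(E(h)/E(h/2)) from the tabulated errors. The sketch below uses the second-order MUD2SP errors as read from Table 1.1 (the third entry is taken to be 0.11e-4); the program name ORDER is hypothetical.

```fortran
      PROGRAM ORDER
C     Observed convergence order from errors on successively
C     doubled grids: RATE = LOG2(ERR(K)/ERR(K+1)).
      INTEGER N, K
      PARAMETER (N = 5)
      REAL ERR(N), RATIO, RATE
C     MUD2SP errors as read from Table 1.1.
      DATA ERR /0.17E-3, 0.44E-4, 0.11E-4, 0.27E-5, 0.68E-6/
      DO 10 K = 1, N-1
         RATIO = ERR(K)/ERR(K+1)
         RATE = LOG(RATIO)/LOG(2.0)
         PRINT *, 'RATIO =', RATIO, '  ORDER =', RATE
   10 CONTINUE
      END
```

Ratios near 4 (order near 2) are what the text describes for the second-order solver; substituting the MUD24SP errors would be expected to give ratios near 16 until roundoff intervenes.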
Example 2. 2-D Nonseparable Elliptic Equation
Solve
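\[ y^2\,\frac{\partial^2 u}{\partial x^2} + x^2\,\frac{\partial^2 u}{\partial y^2} - xy\,u(x,y) = r(x,y) \]
on the region 0.5 < x < 1.0, 1.0 < y < 2.0 with boundary conditions
\[ \partial u/\partial x - y\,u(0.5,y) = g(y) \quad \text{at } x = 0.5 \]
u(1,y) is specified for 1 < y < 2
u(x,1) is specified for 0.5 < x < 1
\[ \partial u/\partial y + x\,u(x,2) = h(x) \quad \text{at } y = 2 \]
The exact solution
\[ u(x,y) = (xy)^3 \]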
is used to set the right-hand side and boundary conditions and to compute the error.
Assume a solution is wanted on a grid as close to 100 by 200 as the MUDPACK
size constraints allow. Examination of the directory README and documentation
files suggests several options. If a 97 by 193 grid is adequate then the solver MUD2
can be used with p = q = 3, i = 6, j = 7. The hybrid solver MUH2 can provide
a much closer fit with p = q = 25, i = 3, j = 4. This yields a 101 by 201 fine
grid and a 26 by 26 coarse grid where Gaussian elimination is used. If an exact grid
fit is mandatory then MUH2 can be used with p = 99, q = 199, i = j = 1. In this
case MUH2 becomes a full direct method on the 100 by 200 grid. All of these are
compared in Table 2.1. Line relaxation in the x direction is used with the iterative
methods along with the default multigrid options. Line relaxation frozen at the finest
grid level (denoted LINEX) is also tested on the 97 by 193 grid. Coefficients and
boundary conditions can be input to MUD2 or MUH2 with the following two Fortran
subprograms.
      SUBROUTINE COF(X,Y,CXX,CYY,CX,CY,CE)
      CXX = Y*Y
      CYY = X*X
      CX = 0.0
      CY = 0.0
      CE = -X*Y
      RETURN
      END
      SUBROUTINE BC(KBDY,XORY,ABDY,GBDY)
      IF (KBDY.EQ.1) THEN
         X = 0.5
         Y = XORY
         U = (X*Y)**3
         DUDX = 3.*X*X*Y**3
         ABDY = -Y
         GBDY = DUDX + ABDY*U
         RETURN
      END IF
      IF (KBDY.EQ.4) THEN
         X = XORY
         Y = 2.0
         U = (X*Y)**3
         DUDY = 3.*Y*Y*X**3
         ABDY = X
         GBDY = DUDY + ABDY*U
         RETURN
      END IF
      END
The first three iterative methods in Table 2.1 are allowed to compute until discreti-
zation level error is reached. The required relaxation sweeps at the finest grid level
are recorded under Iter.
Table 2.1 (second-order): 2-D Nonseparable Elliptic Equation
Method (grid) Storage TO Ts Iter Error
MUD2 (97 x 193) 0.252 Mwords 0.08 sec 0.07 sec 3 0.47e-4
LINEX (97 x 193) 0.187 Mwords 0.07 sec 8.90 sec 5,400 0.47e-4
MUH2 (101 x 201) 0.311 Mwords 0.36 sec 0.08 sec 3 0.43e-4
MUH2 (100 x 200) 4.220 Mwords 33.91 sec 0.22 sec n.a. 0.45e-4
MUD2 reaches discretization level error in one full multigrid cycle. LINEX illus-
trates how multigrid can accelerate the convergence of traditional relaxation
schemes. Line relaxation frozen at the finest grid requires over 5,000 iterations to
reach the same accuracy achieved in only 3 when used within the multigrid algo-
rithm! The third hybrid method is a reasonably economical combination of Gaussian
elimination and line relaxation on a very close grid fit. The fourth direct method is
very expensive due to storage and factorization of the block tridiagonal coefficient
matrix coming from the discretization. The solution phase with the direct method
involves only a forward and backward matrix vector sweep. Table 2.2 compares the
performance of MUD24 and MUH24 (again used as both an iterative and direct
method) in obtaining fourth-order approximations. Roundoff error and changes in
grid size account for the slight differences in error.
Table 2.2 (fourth-order): 2-D Nonseparable Elliptic Equation
Method Storage Ts Error
MUD24 (97 x 193) 0.439 Mwords 0.08 sec 0.79e-8
MUH24 (101 x 201) 0.514 Mwords 0.09 sec 0.69e-8
MUH24 (100 x 200) 4.420 Mwords 0.26 sec 0.91e-8
Example 3. Helmholtz Equation on the Sphere (one degree grid)
Solve the two-dimensional Helmholtz equation in spherical coordinates
\[ \nabla \cdot \bigl(\sigma(\phi,\theta)\,\nabla u(\phi,\theta)\bigr) - \lambda(\phi,\theta)\,u(\phi,\theta) = f(\phi,\theta) \]
on a one degree grid on the full surface of a sphere of radius one (φ and θ are
longitude and colatitude). u(φ,θ) is specified at the poles θ = 0, π and is periodic in φ.
For testing purposes, we use the exact solution
\[ u(\phi,\theta) = \sin^2(\theta)\cos(\theta)\cos(\phi)\sin(\phi) \]
and the coefficient functions
\[ \sigma(\phi,\theta) = \lambda(\phi,\theta) = 3/2 + \sin^2(\theta)\cos^2(\phi). \]
The exact solution chosen is the restriction of u(x,y,z) = (xyz) to the surface of the
sphere. The self-adjoint form of the PDE suggests using the solver MUD2SA. How-
ever the required one degree grid cannot be fit with small values for p, q. If p =45,
q = 45, i = 4 and j = 3 are selected then an exact grid fit is obtained and multigrid
is implemented on the ascending subgrids of size: 46 x 46; 91 x 46; 181 x 91; 361 x
181. The coarsest grid has too much resolution for efficient solution with MUD2SA
or MUD2. The hybrid code MUH2 circumvents this by using Gaussian elimination
whenever the 46 by 46 grid is encountered within multigrid cycling. The original
PDE must be expanded to input coefficients to MUH2 or MUD2. The following sub-
routine can be used for this:
      SUBROUTINE COEF(PHI,THETA,CPP,CTT,CP,CT,CE)
      SINT = SIN(THETA)
      IF (ABS(SINT) .GT. 1.E-5) THEN
C        NOT AT POLES
         COST = COS(THETA)
         COSP = COS(PHI)
         SINP = SIN(PHI)
         CTT = 1.5 + SINT**2*COSP**2
C        D(SIGMA)/D(THETA) + SIGMA*COT(THETA)
         CT = 2.*SINT*COST*COSP**2 + CTT*COST/SINT
         CPP = (1.5 + SINT**2*COSP**2)/SINT**2
C        (D(SIGMA)/D(PHI))/SIN(THETA)**2
         CP = -2.*COSP*SINP
         CE = -CTT
      ELSE
C        AT POLES
         CTT = 1.0
         CPP = 1.0
         CT = 0.
         CP = 0.
         CE = 0.
      END IF
      RETURN
      END
Notice that division by zero is avoided at the poles. The coefficient subroutine will
be called but the results will not be used at the poles where u is specified. One full
multigrid cycle with the default multigrid options and line-s relaxation is executed
using both MUD2 and MUH2. MUH24 is used to increase the accuracy of the
approximation generated by MUH2 from second to fourth-order.
Table 3: Helmholtz Equation on the Sphere (one degree grid)
Method Storage Mflops Ts Error
MUD2 0.618 Mwords 115 0.07 sec 0.52e-1
MUH2 1.256 Mwords 52 0.25 sec 0.24e-4
MUH24 1.909 Mwords 58 0.28 sec 0.22e-8
There is a reduction in the megaflop rate with the hybrid method. Nevertheless, the
results indicate relaxation only is ineffective in reducing error with the high resolution
46 x 46 coarse grid. In fact, if we use the most robust form of two-dimensional
relaxation (line in both the φ and θ directions) at all grid levels, then 36 multigrid cycles
and almost 4 seconds of computer time are required to reach discretization level
error with MUD2. This is accomplished in one cycle and 1/4 of a second, when the
coarse grid direct method is used in conjunction with line relaxation at the higher
resolution grids. The fourth-order hybrid method gains four decimal digits of accu-
racy.
Example 4. 3-D Separable Elliptic Equation
Solve
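\[ \frac{\partial}{\partial x}\left(e^{x}\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(e^{y}\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(e^{z}\frac{\partial u}{\partial z}\right) - (x+y+z)\,u(x,y,z) = r(x,y,z) \]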
on the unit cube. Assume the solution is specified (Dirichlet) at all boundaries. The
exact solution
\[ u(x,y,z) = e^{xyz} \]
is used for testing. Results are tabulated on three-dimensional grids with 2^k + 1
points in each direction for increasing k. The code MUD3SP, which was created to
save work space for separable three-dimensional problems and only allows point
relaxation, is used. If a combination of lines or planar relaxation is required then
one of the nonseparable solvers, MUD3 or MUD3SA, must be used (see Examples 5
and 6). W(1,1) cycles with fully weighted residual restriction and cubic prolongation
are selected.
Table 4.1 3-D Separable Elliptic Equation (MUD3SP)
k Storage Mflop Ts Error
4 0.013 Mwords 25 0.01 sec 0.83e-5
5 0.088 Mwords 49 0.05 sec 0.21e-5
6 0.651 Mwords 86 0.27 sec 0.54e-6
7 4.995 Mwords 131 1.46 sec 0.13e-6
One would expect the execution times to increase a factor of 8 with each doubling of
resolution. MUD3SP does better than this due to enhanced vectorization with higher
resolution, as indicated by the megaflop rates. Since discretization level error was
reached (errors are reduced by a factor of 4 with each doubling of resolution),
MUD34SP can be used to improve accuracy. The fourth-order solver generates an
approximation on the 33 x 33 x 33 grid with the same error as the second-order
approximation on the 129 x 129 x 129 grid.
Table 4.2 3-D Separable Elliptic Equation (MUD34SP)
k Storage Mflop Ts Error
4 0.020 Mwords 19 0.03 sec 0.11e-5
5 0.124 Mwords 35 0.15 sec 0.13e-6
6 0.963 Mwords 62 0.74 sec 0.11e-7
7 7.142 Mwords 97 3.82 sec 0.83e-9
Example 5. 3-D Helmholtz Equation in Spherical Coordinates
Solve
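\[ \nabla^2 u(\rho,\theta,\phi) - u(\rho,\theta,\phi) = f(\rho,\theta,\phi) \]
on the region 0.5 < ρ < 1, π/4 < θ < 3π/4, 0 < φ < 2π. Multiplying the left- and
right-hand sides by ρ² sin(θ) puts it in the following (expanded) form suitable for the
nonseparable self-adjoint solver MUD3SA:
\[ \frac{\partial}{\partial\rho}\left(\rho^2\sin\theta\,\frac{\partial u}{\partial\rho}\right) + \frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial u}{\partial\theta}\right) + \frac{\partial}{\partial\phi}\left(\frac{1}{\sin\theta}\,\frac{\partial u}{\partial\phi}\right) - \rho^2\sin\theta\,u(\rho,\theta,\phi) = \rho^2\sin\theta\,f(\rho,\theta,\phi) \]
Assume u(ρ,θ,φ) is specified at the ρ, θ boundaries and is periodic in φ. The exact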
solution
u (p,9,A) = sin(0)cos(0)/p
is used for testing. Grid size parameters are p = 3, q = r = 2, i = j+1 = k.
MUD3SA executes one full multigrid cycle with the default options and with line
relaxation in the p direction. Discretization level error is reached for each grid size
tested.
Table 5: 3-D Helmholtz Equation in Spherical Coordinates (MUD3SA)
Grid Size Storage TO Ts Mflop Error
25 X 9 X 17 0.056 Mwords 0.13 sec 0.02 sec 16 0.17e-2
49 X 17 X 33 0.388 Mwords 0.86 sec 0.09 sec 32 0.40e-3
97 X 33 X 65 2.895 Mwords 6.63 sec 0.39 sec 57 0.99e-4
193 X 65 X 129 22.351 Mwords 51.61 sec 1.91 sec 91 0.25e-4
The growth of TO relative to Ts with increasing resolution (due to nonvectorized
scalar operations in the discretization) underlines the importance of using non-initial
calls when possible.
Example 6. 3-D Anisotropic Elliptic Equation
Solve the nonseparable equation
\[ 10yz\,\frac{\partial^2 u}{\partial x^2} + \frac{xz}{10}\,\frac{\partial^2 u}{\partial y^2} + 100xy\,\frac{\partial^2 u}{\partial z^2} - (xyz)\,u(x,y,z) = r(x,y,z) \]
on the region 0.5 < x,y,z < 1 with the derivative boundary conditions
\[ \partial u/\partial x + yz\,u(1,y,z) = h(y,z) \quad (\text{at } x = 1) \]
\[ \partial u/\partial z + xy\,u(x,y,1) = g(x,y) \quad (\text{at } z = 1) \]
Assume u(x,y,z) is specified at all other boundaries and use the exact solution
\[ u(x,y,z) = (xyz)^2 \]
for testing. Since the exact solution also satisfies the second-order discretization, a
very good approximation (within roundoff error) can be expected. A subroutine for
inputting the derivative boundary conditions is:
      SUBROUTINE BND(KBDY,XORY,YORZ,ABDY,GBDY)
      IF (KBDY.EQ.2) THEN
         X = 1.0
         Y = XORY
         Z = YORZ
         U = (X*Y*Z)**2
         DUDX = 2.*X*(Y*Z)**2
C        COEFFICIENT YZ OF U IN THE CONDITION STATED AT X = 1
         ABDY = Y*Z
         GBDY = DUDX + ABDY*U
         RETURN
      END IF
      IF (KBDY.EQ.6) THEN
         X = XORY
         Y = YORZ
         Z = 1.0
         U = (X*Y*Z)**2
         DUDZ = 2.*Z*(X*Y)**2
C        COEFFICIENT XY OF U IN THE CONDITION STATED AT Z = 1
         ABDY = X*Y
         GBDY = DUDZ + ABDY*U
         RETURN
      END IF
      END
The solver MUD3 is used on a 97 by 33 by 129 grid. The domination of the z and
(to a lesser extent) x derivative coefficients is amplified in the discretization by
choosing more grid points in these directions. The unbalanced coefficients suggest
point or line relaxation will not work well ([22]). To explore this, planar x-z relaxa-
tion, line relaxation in the x and z direction, and point relaxation are all compared
in Table 6. V(2,1) cycles with linear prolongation are used with each relaxation
scheme.
Table 6: 3-D Anisotropic Elliptic Equation (MUD3)
Method Storage Cycles Mflop Ts Error
planar x-z 5.184 Mwords 3 51 9.33 sec 0.16e-10
line x-z 7.150 Mwords 16 86 9.88 sec 0.21e-4
point 4.295 Mwords 38 119 9.54 sec 0.56e-2
Full two-dimensional multigrid cycling with line relaxation in the z direction is used
on each plane within 3-D planar relaxation. This is expensive to implement and
costs more (per three-dimensional cycle) than point or line relaxations. Nevertheless,
planar relaxation requires only three cycles to reach discretization level error for this
anisotropic PDE. The other relaxation methods are allowed to execute cycles until
they have used approximately the same amount of computer time. The results indi-
cate they are not nearly as effective in error reduction. Multigrid iteration should
converge in only a few cycles when used with the "correct" relaxation method.
Example 7. Asymmetric Grid Size
Typically, three-dimensional weather models use more grid points in the horizontal
direction, where the scales are larger, than in the vertical direction. To simulate
this, we use eight times more horizontal than vertical resolution in solving the equa-
tion
\[ (a(x)\,u_x)_x + (b(y)\,u_y)_y + (c(z)\,u_z)_z = r(x,y,z) \]
on the region 0 < x,y < 10000 km and 0 < z < 10 km. The exact solution
\[ u(x,y,z) = e^{-z/z_0}\,\sin(\pi x/x_0)\,\cos(\pi y/y_0) \]
where z0 = 10 and x0 = y0 = 2500, is used to set the right-hand side and boundary
conditions and to compute the error. Here we assume u is periodic in x and y,
specified at the lower z boundary, and satisfies the mixed derivative condition
\[ u_z + u = h(x,y) \quad \text{at } z = 10. \]
The PDE coefficients are given by
\[ a(x) = 1 + \sin^2(\pi x/x_0) \]
\[ b(y) = 1 + \cos^2(\pi y/y_0) \]
\[ c(z) = e^{-z/z_0} \]
Since the PDE is separable MUD3SP should be selected. The default multigrid
options and point relaxation are utilized. Grid size parameters are p = q = r = 3
and i =j = k+3 for increasing k. One full multigrid cycle reaches discretization
level error for each grid size tested.
Table 7: Asymmetric Grid Size (MUD3SP)
Grid              Storage          Mflop   Ts          Error
49 x 49 x 7       0.057 Mwords      37     0.058 sec   0.46e-4
97 x 97 x 13      0.324 Mwords      76     0.189 sec   0.12e-4
193 x 193 x 25    2.299 Mwords     128     0.829 sec   0.29e-5
385 x 385 x 49    17.271 Mwords    193     4.633 sec   0.77e-6
Appendix
Major portions of the documentation file mud3.d for the MUDPACK solver MUD3
are listed in this appendix. Omitted sections are indicated with dots. Documenta-
tion files for the other solvers have a similar format.
C
C     SUBROUTINE MUD3(IPARM,FPARM,WORK,COEF,BNDYC,RHS,PHI,MGOPT,IERROR)
C
C
C     COMPLETE DOCUMENTATION FOR MUD3 IS GIVEN BELOW. A SAMPLE
C     FORTRAN DRIVER IS FILE "TMUD3" ON LIBRARY "MUDPACK".
C
C
C REQUIRED FILES
C
C     MUD3COM
C     MUD3LN, MUDFAC (IF LINE RELAXATION(S) IS (ARE) USED)
C     MUD3PN (IF PLANAR RELAXATION IS USED)
C     MUDFAC (IF LINE RELAXATION(S) IS (ARE) USED WITHIN MUD3PN)
C
C PURPOSE
C
C     SUBROUTINE MUD3 AUTOMATICALLY DISCRETIZES AND ATTEMPTS TO COMPUTE
C     THE SECOND ORDER FINITE DIFFERENCE APPROXIMATION TO A THREE-
C     DIMENSIONAL LINEAR NONSEPARABLE ELLIPTIC PARTIAL DIFFERENTIAL
C     EQUATION ON A BOX. THE APPROXIMATION IS GENERATED ON A UNIFORM
C     GRID COVERING THE BOX (SEE MESH DESCRIPTION BELOW). BOUNDARY
C     CONDITIONS MAY BE ANY COMBINATION OF MIXED, SPECIFIED (DIRICHLET)
C     OR PERIODIC. THE FORM OF THE PDE SOLVED IS ...
C
C          CXX(X,Y,Z)*PXX + CYY(X,Y,Z)*PYY + CZZ(X,Y,Z)*PZZ +
C
C          CX(X,Y,Z)*PX + CY(X,Y,Z)*PY + CZ(X,Y,Z)*PZ +
C
C          CE(X,Y,Z)*P(X,Y,Z) = R(X,Y,Z)
C
C     HERE CXX,CYY,CZZ,CX,CY,CZ,CE ARE THE KNOWN REAL COEFFICIENTS
C     OF THE PDE; PXX,PYY,PZZ,PX,PY,PZ ARE THE SECOND AND FIRST
C     PARTIAL DERIVATIVES OF THE UNKNOWN SOLUTION FUNCTION P(X,Y,Z)
C     WITH RESPECT TO THE INDEPENDENT VARIABLES X,Y,Z; R(X,Y,Z)
C     IS THE KNOWN REAL RIGHT HAND SIDE OF THE ELLIPTIC PDE.
C
C
C MESH DESCRIPTION ...
C
C     THE APPROXIMATION IS GENERATED ON A UNIFORM NX BY NY BY NZ GRID.
C     THE GRID IS SUPERIMPOSED ON THE RECTANGULAR SOLUTION REGION
C
C          [XA,XB] X [YC,YD] X [ZE,ZF].
C
C AUTHOR AND SPECIALIST
C
C     JOHN C. ADAMS (NCAR-1990)
C
C
C PORTABILITY
C
C     MUD3 ADHERES TO FORTRAN-77 STANDARDS
C
C PARAMETER DESCRIPTION
C
C
C***********************************************************************
C INPUT PARAMETERS
C
C
C IPARM
C
C     AN INTEGER VECTOR OF LENGTH 23 USED TO EFFICIENTLY PASS
C     INTEGER PARAMETERS. IPARM IS SET INTERNALLY IN MUD3
C     AND DEFINED AS FOLLOWS ...
C
C
C INTL=IPARM(1)
C
C     AN INITIALIZATION PARAMETER. INTL=0 MUST BE INPUT
C     ON AN INITIAL CALL TO MUD3. IN THIS CASE FULL DISCRETIZATION
C     OF THE PDE WILL BE PERFORMED. INTL.NE.0 SHOULD BE INPUT
C     IF MUD3 HAS BEEN CALLED PREVIOUSLY AND ONLY THE VALUES
C     IN RHS (SEE BELOW) OR GBDY (SEE BNDYC BELOW) OR PHI
C     (SEE BELOW) HAVE CHANGED. THIS WILL BYPASS DISCRETIZATION
C     AND SAVE TIME. MUD3 MUST BE CALLED WITH INTL=IPARM(1)=0 IF
C     ANY OTHER PARAMETERS HAVE CHANGED FROM THE PREVIOUS CALL.
C
C
C NXA=IPARM(2)
C
C     FLAGS BOUNDARY CONDITIONS ON THE (Y,Z) PLANE X=XA
C
C     = 0 IF P(X,Y,Z) IS PERIODIC IN X ON [XA,XB]
C         (I.E., P(X+XB-XA,Y,Z) = P(X,Y,Z) FOR ALL X,Y,Z)
C
C     = 1 IF P(XA,Y,Z) IS SPECIFIED (THIS MUST BE INPUT THRU PHI(1,J,K))
C
C     = 2 IF THERE ARE MIXED DERIVATIVE BOUNDARY CONDITIONS AT X=XA
C         (SEE "BNDYC" DESCRIPTION BELOW WHERE KBDY = 1)
C
C     .
C     .
C     .
C NZF=IPARM(7)
C
C     FLAGS BOUNDARY CONDITIONS ON THE (X,Y) PLANE Z=ZF
C
C     = 0 IF P(X,Y,Z) IS PERIODIC IN Z ON [ZE,ZF]
C         (I.E., P(X,Y,Z+ZF-ZE) = P(X,Y,Z) FOR ALL X,Y,Z)
C
C     = 1 IF P(X,Y,ZF) IS SPECIFIED (THIS MUST BE INPUT THRU PHI(I,J,NZ))
C
C     = 2 IF THERE ARE MIXED DERIVATIVE BOUNDARY CONDITIONS AT Z=ZF
C         (SEE "BNDYC" DESCRIPTION BELOW WHERE KBDY = 6)
C
C
C GRID SIZE PARAMETERS
C
C
C IXP = IPARM(8)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE X DIRECTION (SEE NX = IPARM(14)). "IXP+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST X GRID VISITED DURING
C     MULTIGRID CYCLING. IXP SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE X DIRECTION IS NOT USED.
C     IF IXP > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF IXP CAN BE REMOVED BY INCREASING IEX = IPARM(11)
C     WITHOUT CHANGING NX = IPARM(14)
C
C
C JYQ = IPARM(9)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Y DIRECTION (SEE NY = IPARM(15)). "JYQ+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST Y GRID VISITED DURING
C     MULTIGRID CYCLING. JYQ SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE Y DIRECTION IS NOT USED.
C     IF JYQ > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF JYQ CAN BE REMOVED BY INCREASING JEY = IPARM(12)
C     WITHOUT CHANGING NY = IPARM(15)
C
C
C KZR = IPARM(10)
C
C     AN INTEGER GREATER THAN ONE WHICH IS USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Z DIRECTION (SEE NZ = IPARM(16)). "KZR+1"
C     IS THE NUMBER OF POINTS ON THE COARSEST Z GRID VISITED DURING
C     MULTIGRID CYCLING. KZR SHOULD BE CHOSEN AS SMALL AS POSSIBLE.
C     RECOMMENDED VALUES ARE THE SMALL PRIMES 2 OR 3 OR (POSSIBLY) 5.
C     LARGER VALUES CAN REDUCE MULTIGRID CONVERGENCE RATES CONSIDERABLY,
C     ESPECIALLY IF LINE RELAXATION IN THE Z DIRECTION IS NOT USED.
C     IF KZR > 2 THEN IT SHOULD BE 2 OR A SMALL ODD VALUE SINCE A POWER
C     OF 2 FACTOR OF KZR CAN BE REMOVED BY INCREASING KEZ = IPARM(13)
C     WITHOUT CHANGING NZ = IPARM(16)
C
C
C IEX = IPARM(11)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE X DIRECTION (SEE NX = IPARM(14)).
C     IEX .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     IEX SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND IXP=IPARM(8)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NX = IPARM(14).
C
C
C JEY = IPARM(12)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Y DIRECTION (SEE NY = IPARM(15)).
C     JEY .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     JEY SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND JYQ=IPARM(9)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NY = IPARM(15).
C
C
C KEZ = IPARM(13)
C
C     A POSITIVE INTEGER EXPONENT OF 2 USED IN DEFINING THE NUMBER
C     OF GRID POINTS IN THE Z DIRECTION (SEE NZ = IPARM(16)).
C     KEZ .LE. 50 IS REQUIRED. FOR EFFICIENT MULTIGRID CYCLING,
C     KEZ SHOULD BE CHOSEN AS LARGE AS POSSIBLE AND KZR=IPARM(10)
C     AS SMALL AS POSSIBLE WITHIN GRID SIZE CONSTRAINTS WHEN
C     DEFINING NZ = IPARM(16).
C
C
C NX = IPARM(14)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [XA,XB]
C     (INCLUDING THE BOUNDARIES). NX MUST HAVE THE FORM
C
C          NX = IXP*(2**(IEX-1)) + 1
C
C     WHERE IXP = IPARM(8), IEX = IPARM(11).
C
C
C NY = IPARM(15)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [YC,YD]
C     (INCLUDING THE BOUNDARIES). NY MUST HAVE THE FORM:
C
C          NY = JYQ*(2**(JEY-1)) + 1
C
C     WHERE JYQ = IPARM(9), JEY = IPARM(12).
C
C
C NZ = IPARM(16)
C
C     THE NUMBER OF EQUALLY SPACED GRID POINTS IN THE INTERVAL [ZE,ZF]
C     (INCLUDING THE BOUNDARIES). NZ MUST HAVE THE FORM
C
C          NZ = KZR*(2**(KEZ-1)) + 1
C
C     WHERE KZR = IPARM(10), KEZ = IPARM(13)
C
C
C *** EXAMPLE
C
C     SUPPOSE A SOLUTION IS WANTED ON A 33 BY 65 BY 97 GRID. THEN
C     IXP=2, JYQ=4, KZR=6 AND IEX=JEY=KEZ=5 COULD BE USED. A BETTER
C     CHOICE WOULD BE IXP=JYQ=2, KZR=3, AND IEX=5, JEY=KEZ=6.
C
C IGUESS=IPARM(17)
C
C     = 0 IF NO INITIAL GUESS TO THE PDE IS PROVIDED
C         AND/OR FULL MULTIGRID CYCLING BEGINNING AT THE
C         COARSEST GRID LEVEL IS DESIRED.
C
C     = 1 IF AN INITIAL GUESS TO THE PDE AT THE FINEST GRID
C         LEVEL IS PROVIDED IN PHI (SEE BELOW). IN THIS CASE
C         CYCLING BEGINNING OR RESTARTING AT THE FINEST GRID
C         IS INITIATED.
C
C MAXCY = IPARM(18)
C
C     THE EXACT NUMBER OF CYCLES EXECUTED BETWEEN THE FINEST
C     (NX BY NY BY NZ) AND THE COARSEST ((IXP+1) BY (JYQ+1) BY
C     (KZR+1)) GRID LEVELS WHEN TOLMAX=FPARM(7)=0.0 (NO ERROR
C     CONTROL). WHEN TOLMAX=FPARM(7).GT.0.0 IS INPUT (ERROR CONTROL)
C     THEN MAXCY IS A LIMIT ON THE NUMBER OF CYCLES BETWEEN THE
C     FINEST AND COARSEST GRID LEVELS. IN ANY CASE, AT MOST
C     MAXCY*(IPRER+IPOST) RELAXATION SWEEPS ARE PERFORMED AT THE
C     FINEST GRID LEVEL (SEE IPRER=MGOPT(2),IPOST=MGOPT(3) BELOW).
C     WHEN MULTIGRID ITERATION IS WORKING "CORRECTLY" ONLY A FEW
C     CYCLES ARE REQUIRED FOR CONVERGENCE. LARGE VALUES FOR MAXCY
C     SHOULD NOT BE REQUIRED.
C
C
C METHOD = IPARM(19)
C
C     THIS SETS THE METHOD OF RELAXATION (ALL SCHEMES USE
C     RELAXATION ON ALTERNATING POINTS OR LINES OR PLANES)
C
C     = 0 FOR GAUSS-SEIDEL POINTWISE RELAXATION
CC = 1 FOR LINE RELAXATION IN THE X DIRECTIONCC = 2 FOR LINE RELAXATION IN THE Y DIRECTIONCC = 3 FOR LINE RELAXATION IN THE Z DIRECTIONCC = 4 FOR LINE RELAXATION IN THE X AND Y DIRECTIONCC = 5 FOR LINE RELAXATION IN THE X AND Z DIRECTIONCC =6 FOR LINE RELAXATION IN THE Y AND Z DIRECTIONCC = 7 FOR LINE RELAXATION IN THE X,Y AND Z DIRECTIONCC = 8 FOR X,Y PLANAR RELAXATIONCC = 9 FOR X,Z PLANAR RELAXATIONCC =10 FOR Y,Z PLANAR RELAXATION
C LENGTH = IPARM(21)
C
C     THE LENGTH OF THE WORK SPACE PROVIDED IN VECTOR WORK.
C FPARM
C
C     A FLOATING POINT VECTOR OF LENGTH 8 USED TO EFFICIENTLY
C     PASS FLOATING POINT PARAMETERS.  FPARM IS SET INTERNALLY
C     IN MUD3 AND DEFINED AS FOLLOWS ...
C
C
C XA=FPARM(1), XB=FPARM(2)
C
C     THE RANGE OF THE X INDEPENDENT VARIABLE.  XA MUST
C     BE LESS THAN XB.
C
C
C YC=FPARM(3), YD=FPARM(4)
C
C     THE RANGE OF THE Y INDEPENDENT VARIABLE.  YC MUST
C     BE LESS THAN YD.
C
C
C ZE=FPARM(5), ZF=FPARM(6)
C
C     THE RANGE OF THE Z INDEPENDENT VARIABLE.  ZE MUST
C     BE LESS THAN ZF.
C
C
C TOLMAX = FPARM(7)
C
C     WHEN INPUT POSITIVE, TOLMAX IS A MAXIMUM RELATIVE ERROR TOLERANCE
C     USED TO TERMINATE THE RELAXATION ITERATIONS ...
C
C
C WORK
C
C     A ONE DIMENSIONAL ARRAY THAT MUST BE PROVIDED FOR WORK SPACE.
C     SEE LENGTH = IPARM(21).  THE VALUES IN WORK MUST BE PRESERVED
C     IF MUD3 IS CALLED AGAIN WITH INTL=IPARM(1).NE.0 OR IF MUD34
C     IS CALLED TO IMPROVE ACCURACY.
C
C
C BNDYC
C
C     A SUBROUTINE WITH PARAMETERS (KBDY,XORY,YORZ,ALFA,GBDY)
C     WHICH ARE USED TO INPUT MIXED BOUNDARY CONDITIONS TO MUD3.
C     THE BOUNDARIES ARE NUMBERED ONE THRU SIX AND THE FORM OF
C     CONDITIONS ARE DESCRIBED BELOW.
C
C
C     (1) THE KBDY=1 BOUNDARY
C
C     THIS IS THE (Y,Z) PLANE X=XA WHERE NXA=IPARM(2) = 2 FLAGS
C     A MIXED BOUNDARY CONDITION OF THE FORM
C
C          DP/DX + ALFXA(Y,Z)*P(XA,Y,Z) = GBDXA(Y,Z)
C
C     IN THIS CASE KBDY=1,XORY=Y,YORZ=Z WILL BE INPUT TO BNDYC AND
C     ALFA,GBDY CORRESPONDING TO ALFXA(Y,Z),GBDXA(Y,Z) MUST BE RETURNED.
C
C     (6) THE KBDY=6 BOUNDARY
C
C     THIS IS THE (X,Y) PLANE Z=ZF WHERE NZF=IPARM(7) = 2 FLAGS
C     A MIXED BOUNDARY CONDITION OF THE FORM
C
C          DP/DZ + ALFZF(X,Y)*P(X,Y,ZF) = GBDZF(X,Y)
C
C     IN THIS CASE KBDY=6,XORY=X,YORZ=Y WILL BE INPUT TO BNDYC AND
C     ALFA,GBDY CORRESPONDING TO ALFZF(X,Y),GBDZF(X,Y) MUST BE RETURNED.
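The calling convention for the boundary routine can be illustrated with a small Python analogue. This is a hedged sketch, not the Fortran interface itself: the Python function returns (ALFA, GBDY) instead of setting output arguments, and the values XA = 0, ZF = 1, ALFA = 1 and the test solution P = X + Y + Z are hypothetical choices made so that GBDY can be written in closed form. Only the two faces documented above (KBDY = 1 and KBDY = 6) are handled.

```python
XA = 0.0     # hypothetical left end of the X range
ZF = 1.0     # hypothetical upper end of the Z range
ALFA = 1.0   # hypothetical constant coefficient on both faces


def p_exact(x, y, z):
    """Hypothetical test solution P = X + Y + Z, so DP/DX = DP/DZ = 1."""
    return x + y + z


def bndyc(kbdy, xory, yorz):
    """Return (alfa, gbdy) for the mixed condition on face kbdy."""
    if kbdy == 1:
        # (Y,Z) plane X = XA:  DP/DX + ALFXA(Y,Z)*P(XA,Y,Z) = GBDXA(Y,Z)
        y, z = xory, yorz
        return ALFA, 1.0 + ALFA * p_exact(XA, y, z)
    if kbdy == 6:
        # (X,Y) plane Z = ZF:  DP/DZ + ALFZF(X,Y)*P(X,Y,ZF) = GBDZF(X,Y)
        x, y = xory, yorz
        return ALFA, 1.0 + ALFA * p_exact(x, y, ZF)
    raise ValueError("only KBDY = 1 and KBDY = 6 are sketched here")


if __name__ == "__main__":
    print(bndyc(1, 0.5, 0.25))   # face X = XA at Y = 0.5, Z = 0.25
    print(bndyc(6, 0.5, 0.25))   # face Z = ZF at X = 0.5, Y = 0.25
```

The point of the sketch is the role of XORY and YORZ: their meaning changes with KBDY, so the routine must unpack them per face before evaluating the coefficient and the right-hand side.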
C
C
C COEF
C
C     A SUBROUTINE WITH PARAMETERS (X,Y,Z,CXX,CYY,CZZ,CX,CY,CZ,CE)
C     WHICH PROVIDES THE KNOWN REAL COEFFICIENTS FOR THE ELLIPTIC PDE
C     AT ANY GRID POINT (X,Y,Z).  THE NAME CHOSEN IN THE CALLING ROUTINE
C     MAY BE DIFFERENT WHERE THE COEFFICIENT ROUTINE MUST BE DECLARED
C     EXTERNAL.
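A coefficient routine of this shape can be sketched in Python as a function returning the seven values. The sketch is illustrative and not part of MUDPACK. The argument names suggest that CXX, CYY, CZZ multiply the second derivatives, CX, CY, CZ the first derivatives and CE the undifferentiated term; that reading, the particular coefficient values, and the helper names `uniform_points` and `is_elliptic` are assumptions. The positivity test mirrors the IERROR = -2 condition in this documentation (CXX, CYY, CZZ positive at every grid point).

```python
def coef(x, y, z):
    """Hypothetical variable coefficients; returns (cxx, cyy, czz, cx, cy, cz, ce)."""
    cxx = 1.0 + x * x
    cyy = 1.0 + y * y
    czz = 1.0 + z * z
    cx, cy, cz = 0.0, 0.0, 0.0
    ce = -(x + y + z)
    return cxx, cyy, czz, cx, cy, cz, ce


def uniform_points(a, b, n):
    """n equally spaced points on [a, b], end points included."""
    h = (b - a) / (n - 1)
    return [a + i * h for i in range(n)]


def is_elliptic(coef_fn, xs, ys, zs):
    """True if cxx, cyy and czz are all positive at every grid point."""
    return all(
        min(coef_fn(x, y, z)[:3]) > 0.0
        for x in xs for y in ys for z in zs
    )


if __name__ == "__main__":
    pts = uniform_points(0.0, 1.0, 5)
    print(is_elliptic(coef, pts, pts, pts))
```

Running such a check before calling the solver is optional; the documentation states that a solution is still attempted when the condition fails.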
C
C
C RHS
C
C     AN ARRAY DIMENSIONED NX BY NY BY NZ WHICH CONTAINS
C     THE GIVEN RIGHT HAND SIDE VALUES ON THE UNIFORM 3-D MESH.
C     RHS(I,J,K) = R(XI,YJ,ZK) FOR I=1,...,NX AND J=1,...,NY
C     AND K=1,...,NZ.  RHS CAN BE EQUIVALENCED WITH THE "1+8*NX*NY*NZ"
C     WORD OF "WORK" IN THE PROGRAM CALLING MUD3 TO SAVE SPACE IF
C     AND ONLY IF POINT RELAXATION (METHOD=IPARM(19)=0) IS CHOSEN.
C     IF RHS IS EQUIVALENCED WITH ANY OTHER WORD OF WORK OR IF
C     EQUIVALENCING IS USED WHEN METHOD.NE.0 THEN AN UNDETECTABLE
C     ERROR WILL RESULT.
C
C *   WARNING
C
C     VALUES IN THE ARRAY RHS ARE DESTROYED BY MUD3 IF EQUIVALENCING
C     WITH WORK IS USED WHEN METHOD = 0 OR IF METHOD > 0.  VALUES
C     IN RHS ARE PRESERVED ONLY IF METHOD = 0 AND EQUIVALENCING WITH
C     WORK IS NOT USED.
C
C
C PHI
C
C     AN ARRAY DIMENSIONED NX BY NY BY NZ.  ON INPUT PHI MUST
C     CONTAIN SPECIFIED BOUNDARY VALUES AND AN INITIAL GUESS
C     TO THE SOLUTION IF FLAGGED (SEE IGUESS=IPARM(17)=1).  FOR
C     EXAMPLE, IF NYD=IPARM(5)=1 THEN PHI(I,NY,K) MUST BE SET
C     EQUAL TO P(XI,YD,ZK) FOR I=1,...,NX AND K=1,...,NZ PRIOR TO
C     CALLING MUD3.  THE SPECIFIED VALUES ARE PRESERVED BY MUD3.
C     PHI CAN BE EQUIVALENCED WITH THE FIRST WORD OF WORK IN THE
C     PROGRAM CALLING MUD3 TO SAVE SPACE IF ERROR CONTROL IS NOT
C     SELECTED (TOLMAX=FPARM(7)=0.0 IS INPUT).  EQUIVALENCING PHI
C     WITH WORK WILL CAUSE AN UNDETECTABLE ERROR IF ERROR CONTROL
C     IS REQUESTED.  PHI MUST NOT BE EQUIVALENCED WITH WORK IF MUD34
C     WILL LATER BE CALLED TO IMPROVE ACCURACY.
C
C     IF NO INITIAL GUESS IS GIVEN (IGUESS=0) THEN PHI MUST STILL
C     BE INITIALIZED AT NON-DIRICHLET GRID POINTS (THIS IS NOT
C     CHECKED).  THESE VALUES ARE PROJECTED DOWN AND SERVE AS AN INITIAL
C     GUESS TO THE PDE AT THE COARSEST GRID LEVEL.  SET PHI TO 0.0 AT
C     NON-DIRICHLET GRID POINTS IF NOTHING BETTER IS AVAILABLE.
C
C
C MGOPT
C
C     AN INTEGER VECTOR OF LENGTH 5 WHICH ALLOWS THE USER TO SELECT
C     AMONG VARIOUS MULTIGRID OPTIONS.  IF MGOPT(1)=0 IS INPUT THEN
C     A DEFAULT SET OF MULTIGRID PARAMETERS (CHOSEN FOR ROBUSTNESS)
C     WILL BE INTERNALLY SELECTED AND THE REMAINING VALUES IN MGOPT
C     WILL BE IGNORED.  IF MGOPT(1) IS NONZERO THEN THE PARAMETERS
C     IN MGOPT ARE SET INTERNALLY AND DEFINED AS FOLLOWS: (SEE THE
C     BASIC COARSE GRID CORRECTION ALGORITHM BELOW)
C
C
C KCYCLE = MGOPT(1)
C
C     =-1 IF F CYCLING IS TO BE USED...
C
C     = 0 IF DEFAULT MULTIGRID OPTIONS ARE TO BE USED
C
C     = 1 IF V CYCLING IS TO BE USED (THE LEAST EXPENSIVE PER CYCLE)
C
C     = 2 IF W CYCLING IS TO BE USED (THE DEFAULT)
C
C     > 2 IF MORE GENERAL K CYCLING IS TO BE USED
C         (WARNING-VALUES LARGER THAN 2 INCREASE
C         THE EXECUTION TIME PER CYCLE CONSIDERABLY AND
C         RESULT IN THE NON-FATAL ERROR, IERROR = -5)
C
C
C IPRER = MGOPT(2)
C
C     THE NUMBER OF "PRE-RELAXATION" SWEEPS EXECUTED BEFORE THE
C     RESIDUAL IS RESTRICTED AND CYCLING IS INVOKED AT THE NEXT
C     COARSER GRID LEVEL (DEFAULT VALUE IS 2 WHENEVER MGOPT(1)=0).
C
C
C IPOST = MGOPT(3)
C
C     THE NUMBER OF "POST RELAXATION" SWEEPS EXECUTED AFTER CYCLING
C     HAS BEEN INVOKED AT THE NEXT COARSER GRID LEVEL AND THE RESIDUAL
C     CORRECTION HAS BEEN TRANSFERRED BACK (DEFAULT VALUE IS 1
C     WHENEVER MGOPT(1)=0).
C
C
C IRESW = MGOPT(4)
C
C     = 0 IF UNWEIGHTED (IDENTITY) RESIDUAL RESTRICTIONS ARE USED.
C         * WARNING-ORDINARILY THIS OPTION GIVES VERY POOR RESULTS
C         WHEN USED WITHIN MULTIGRID ITERATION.  IT IS INCLUDED AS
C         AN OPTION ONLY FOR ALGORITHM EXPERIMENTATION.
C
C     = 1 IF FULLY WEIGHTED RESIDUAL RESTRICTIONS ARE USED (THIS IS THE
C         DEFAULT VALUE WHENEVER MGOPT(1)=0 AND IS THE MOST ROBUST).
C
C     = 2 IF HALF WEIGHTING IS USED WITH RESIDUAL RESTRICTIONS.  THIS
C         OPTION REQUIRES LESS COMPUTATION THAN FULL WEIGHTING AND,
C         WITH RED/BLACK POINT RELAXATION, SOMETIMES GIVES SIMILAR
C         CONVERGENCE RATES.  EXPERIENCE HAS SHOWN IT IS NOT AS ROBUST
C         AS FULL WEIGHTING AND SHOULD BE USED WITH CAUTION.
C
C
C INTPOL = MGOPT(5)
C
C     = 1 IF MULTILINEAR PROLONGATION (INTERPOLATION) IS USED TO
C         TRANSFER RESIDUAL CORRECTIONS AND THE PDE APPROXIMATION
C         FROM COARSE TO FINE GRIDS WITHIN FULL MULTIGRID CYCLING.
C
C     = 3 IF MULTICUBIC PROLONGATION (INTERPOLATION) IS USED TO
C         TRANSFER RESIDUAL CORRECTIONS AND THE PDE APPROXIMATION
C         FROM COARSE TO FINE GRIDS WITHIN FULL MULTIGRID CYCLING.
C         (THIS IS THE DEFAULT VALUE WHENEVER MGOPT(1)=0).
C
C     THE DEFAULT VALUES (2,2,1,1,3) IN THE VECTOR MGOPT WERE CHOSEN FOR
C     ROBUSTNESS.  IN SOME CASES V(2,1) CYCLES WITH LINEAR PROLONGATION WILL
C     GIVE GOOD RESULTS WITH LESS COMPUTATION (ESPECIALLY IN TWO-DIMENSIONS).
C     THIS WAS THE DEFAULT AND ONLY CHOICE IN AN EARLIER VERSION OF MUDPACK
C     (SEE [1]) AND CAN BE SET WITH THE INTEGER VECTOR (1,2,1,1,1) IN MGOPT.
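The way MGOPT(1) = 0 selects the defaults can be summarized in a few lines of Python. This is an illustrative sketch, not MUDPACK code; `resolve_mgopt` and `describe` are hypothetical helpers, and the two option vectors are the ones quoted in the text above.

```python
DEFAULT_MGOPT = (2, 2, 1, 1, 3)   # W(2,1) cycles, full weighting, multicubic
EARLIER_MGOPT = (1, 2, 1, 1, 1)   # V(2,1) cycles, full weighting, multilinear


def resolve_mgopt(mgopt):
    """Return the effective (kcycle, iprer, ipost, iresw, intpol)."""
    if len(mgopt) != 5:
        raise ValueError("MGOPT must have length 5")
    if mgopt[0] == 0:
        # MGOPT(1)=0: the remaining entries are ignored and defaults are used
        return DEFAULT_MGOPT
    return tuple(mgopt)


def describe(mgopt):
    """Human-readable summary of a resolved option vector."""
    kcycle, iprer, ipost, iresw, intpol = resolve_mgopt(mgopt)
    cycle = {-1: "F", 1: "V", 2: "W"}.get(kcycle, "K=%d" % kcycle)
    restriction = {0: "unweighted", 1: "full weighting", 2: "half weighting"}[iresw]
    prolongation = {1: "multilinear", 3: "multicubic"}[intpol]
    return "%s(%d,%d), %s, %s" % (cycle, iprer, ipost, restriction, prolongation)


if __name__ == "__main__":
    print(describe((0, 9, 9, 9, 9)))   # entries after the leading 0 are ignored
    print(describe(EARLIER_MGOPT))
```

The first call prints `W(2,1), full weighting, multicubic` and the second prints `V(2,1), full weighting, multilinear`. Values of IRESW or INTPOL outside the documented sets raise a `KeyError` in this sketch, whereas MUD3 reports them through IERROR = 24 and 25.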
C********************************************************************
C*OUTPUT PARAMETERS**************************************************
C********************************************************************
C
C
C IPARM(22)
C
C     ON OUTPUT IPARM(22) CONTAINS THE ACTUAL WORK SPACE LENGTH
C     REQUIRED FOR THE CURRENT GRID SIZES AND METHOD.
C
C
C IPARM(23)
C
C     IF ERROR CONTROL IS SELECTED (TOLMAX = FPARM(7) .GT. 0.0) THEN
C     ON OUTPUT IPARM(23) CONTAINS THE ACTUAL NUMBER OF CYCLES EXECUTED
C     BETWEEN THE COARSEST AND FINEST GRID LEVELS IN OBTAINING THE
C     APPROXIMATION IN PHI.  THE QUANTITY (IPRER+IPOST)*IPARM(23) IS
C     THE NUMBER OF RELAXATION SWEEPS PERFORMED AT THE FINEST GRID LEVEL.
C
C
C FPARM(8)
C
C     ON OUTPUT FPARM(8) CONTAINS THE FINAL COMPUTED MAXIMUM RELATIVE
C     DIFFERENCE BETWEEN THE LAST TWO ITERATES AT THE FINEST GRID LEVEL.
C     FPARM(8) IS COMPUTED ONLY IF THERE IS ERROR CONTROL (TOLMAX.GT.0.)
C
C
C WORK
C
C     ON OUTPUT WORK CONTAINS INTERMEDIATE VALUES THAT MUST NOT BE
C     DESTROYED IF MUD3 IS TO BE CALLED AGAIN WITH IPARM(1)=1 OR
C     IF MUD34 IS TO BE CALLED TO IMPROVE THE ESTIMATE TO FOURTH
C     ORDER.
C
C
C PHI
C
C     ON OUTPUT PHI(I,J,K) CONTAINS THE APPROXIMATION TO
C     P(XI,YJ,ZK) FOR ALL MESH POINTS I=1,...,NX; J=1,...,NY;
C     K=1,...,NZ.  THE LAST COMPUTED ITERATE IN PHI IS RETURNED
C     EVEN IF CONVERGENCE IS NOT OBTAINED (IERROR=-1).
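The interplay of MAXCY, TOLMAX, the cycle count in IPARM(23) and the relative difference in FPARM(8) can be sketched as a driver loop. The Python below is a hedged illustration and not the MUD3 implementation: `toy_cycle` is a simple contraction standing in for a multigrid cycle, and the relative difference is assumed to be max|new - old| / max|new|, which the documentation does not spell out. Whether MUD3 tests with "less than" or "less than or equal" is likewise an assumption.

```python
def toy_cycle(phi):
    """Stand-in for one multigrid cycle: a contraction with fixed point 2.0."""
    return [0.5 * v + 1.0 for v in phi]


def run_cycles(cycle, phi, maxcy, tolmax):
    """Apply cycle() under the MAXCY/TOLMAX rules described above.

    Returns (phi, ncycles, reldif, ierror).  reldif stays None when
    tolmax == 0.0 because no error control is performed in that case.
    """
    reldif = None
    for ncycles in range(1, maxcy + 1):
        new = cycle(phi)
        if tolmax > 0.0:
            diff = max(abs(a - b) for a, b in zip(new, phi))
            size = max(abs(a) for a in new)
            reldif = diff / size if size > 0.0 else diff
        phi = new
        if tolmax > 0.0 and reldif < tolmax:
            return phi, ncycles, reldif, 0       # converged within MAXCY cycles
    # Without error control exactly MAXCY cycles are run and there is no error;
    # with error control, running out of cycles corresponds to IERROR = -1.
    return phi, maxcy, reldif, (-1 if tolmax > 0.0 else 0)


if __name__ == "__main__":
    phi, ncycles, reldif, ierror = run_cycles(toy_cycle, [0.0, 0.0], 20, 0.01)
    iprer, ipost = 2, 1                           # default MGOPT(2), MGOPT(3)
    print(ncycles, reldif, ierror, (iprer + ipost) * ncycles)
```

With the toy contraction and TOLMAX = 0.01 the loop stops after 7 cycles, which with the default IPRER = 2 and IPOST = 1 would correspond to 21 fine-grid relaxation sweeps. In every case the last iterate is returned, as the PHI entry above states.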
C
C
C IERROR
C
C     AN ERROR FLAG THAT INDICATES INVALID INPUT PARAMETERS
C     WHEN RETURNED POSITIVE.
C
C
C NON-FATAL ERRORS * * *
C
C     =-5 IF KCYCLE=MGOPT(1) IS GREATER THAN 2.  VALUES LARGER THAN 2 RESULT
C         IN AN ALGORITHM WHICH PROBABLY DOES FAR MORE COMPUTATION THAN
C         NECESSARY.  KCYCLE = -1 (F CYCLES) OR KCYCLE = 1 (V CYCLES)
C         OR KCYCLE = 2 (W CYCLES) SHOULD SUFFICE FOR MOST PROBLEMS.  THE
C         IERROR=-5 FLAG IS OVERRIDDEN BY ANY OTHER FATAL OR NON-FATAL ERROR.
C
C     =-4 IF THERE ARE DOMINANT NONZERO FIRST ORDER TERMS IN THE PDE WHICH
C         MAKE IT "HYPERBOLIC" AT THE FINEST GRID LEVEL ...
C
C     =-3 IF THE PDE IS SINGULAR (ALL BOUNDARY CONDITIONS ARE PERIODIC OR
C         PURE DERIVATIVE AND CE(X,Y,Z) = 0 FOR ALL (X,Y,Z)).  A SOLUTION
C         IS ATTEMPTED BUT CONVERGENCE MAY NOT OCCUR DUE TO ILL CONDITIONING
C         OF THE DISCRETIZED LINEAR SYSTEM.  THE IERROR = -3 FLAG OVERRIDES
C         THE IERROR = -2 AND IERROR = -1 FLAGS BELOW.
C
C
C     =-2 IF THE PDE IS NOT ELLIPTIC AT SOME MESH POINT(S).  THIS MEANS
C         CXX,CYY,CZZ ARE NOT POSITIVE FOR ALL GRID POINTS (X,Y,Z).
C         IN THIS CASE A SOLUTION IS STILL ATTEMPTED BUT CONVERGENCE MAY
C         NOT OCCUR DUE TO ILL CONDITIONING OF THE DISCRETIZED LINEAR
C         SYSTEM.  THE IERROR = -2 FLAG OVERRIDES THE IERROR = -1
C         FLAG BELOW.
C
C     =-1 IF CONVERGENCE TO THE TOLERANCE SPECIFIED IN TOLMAX=FPARM(7)
C         IS NOT OBTAINED IN MAXCY=IPARM(18) MULTIGRID CYCLES.  IN THIS CASE
C         THE LAST COMPUTED ITERATE IS STILL RETURNED.
C
C
C NO ERRORS * * *
C
C     = 0 IF THE SOLUTION IS OBTAINED
C
C
C FATAL ERRORS * * *
C
C     = 1 IF INTL=IPARM(1) IS NONZERO ON AN INITIAL CALL TO MUD3
C
C     = 2 IF ANY OF THE BOUNDARY CONDITION FLAGS NXA,NXB,NYC,NYD,NZE,NZF
C         IN IPARM(2),IPARM(3),IPARM(4),IPARM(5),IPARM(6),IPARM(7)
C         ARE NOT 0, 1 OR 2.  ALSO IF NXA,NXB OR NYC,NYD OR NZE,NZF
C         ARE NOT PAIRWISE ZERO FOR PERIODIC B.C. (E.G., NXA=0,NXB=2).
C
C     = 3 IF MIN0(IXP,JYQ,KZR) < 2 (IXP=IPARM(8),JYQ=IPARM(9),KZR=IPARM(10))
C
C     = 4 IF ANY OF THE EXPONENTS IEX,JEY,KEZ DO NOT LIE BETWEEN 1 AND 50
C         (IEX=IPARM(11),JEY=IPARM(12),KEZ=IPARM(13))
C
C     = 5 IF NX.NE.IXP*2**(IEX-1)+1 OR NY.NE.JYQ*2**(JEY-1)+1
C         OR NZ.NE.KZR*2**(KEZ-1)+1 (NX=IPARM(14),NY=IPARM(15),NZ=IPARM(16))
C
C     = 6 IF IGUESS = IPARM(17) IS NOT EQUAL TO 0 OR 1
C
C     = 7 IF MAXCY = IPARM(18) < 1
C
C     = 8 IF METHOD.LT.0 OR METHOD.GT.10
C         OR
C         METHOD = 8 AND METH2.LT.0 OR METH2.GT.3
C         OR
C         METHOD = 9 AND METH2.LT.0 OR METH2.GT.3
C         OR
C         METHOD =10 AND METH2.LT.0 OR METH2.GT.3
C         (METHOD = IPARM(19), METH2=IPARM(20))
C
C     = 9 IF LENGTH = IPARM(21) < IPARM(22) (INSUFFICIENT WORK SPACE)
C
C     =10 IF XA > XB OR YC > YD OR ZE > ZF
C         (XA=FPARM(1),XB=FPARM(2),YC=FPARM(3),YD=FPARM(4),
C         ZE=FPARM(5),ZF=FPARM(6))
C
C     =11 IF TOLMAX=FPARM(7) < 0.0
C
C * * * ERRORS IN SETTING MULTIGRID OPTIONS * * *
C
C     =21 IF KCYCLE=MGOPT(1) < -1 (SEE ALSO IERROR = -5)
C
C     =22 IF IPRER=MGOPT(2) < 1 WHEN KCYCLE IS NONZERO
C
C     =23 IF IPOST=MGOPT(3) < 1 WHEN KCYCLE IS NONZERO
C
C     =24 IF IRESW=MGOPT(4) IS NOT 0,1 OR 2 WHEN KCYCLE IS NONZERO
C
C     =25 IF INTPOL=MGOPT(5) IS NOT 1 OR 3 WHEN KCYCLE IS NONZERO
C
C********************************************************************
C
C     END OF MUD3 DOCUMENTATION
C
C********************************************************************
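Several of the fatal input checks listed in the documentation can be mirrored in a short pre-flight routine. The Python sketch below is illustrative and not MUDPACK code: `check_input` and `example_params` are hypothetical names, the dictionaries are keyed by the 1-based Fortran index, and the order in which conditions are tested (ascending IERROR) is an assumption. IERROR = 1 and 9 are omitted because they depend on call history and on the work space length computed inside MUD3.

```python
def check_input(iparm, fparm):
    """Return the first matching fatal IERROR code (2-8, 10, 11), or 0."""
    flags = tuple(iparm[i] for i in range(2, 8))      # NXA,NXB,NYC,NYD,NZE,NZF
    if any(f not in (0, 1, 2) for f in flags):
        return 2
    for lo, hi in zip(flags[0::2], flags[1::2]):
        if (lo == 0) != (hi == 0):                    # periodic flags must pair up
            return 2
    factors = (iparm[8], iparm[9], iparm[10])         # IXP, JYQ, KZR
    exponents = (iparm[11], iparm[12], iparm[13])     # IEX, JEY, KEZ
    sizes = (iparm[14], iparm[15], iparm[16])         # NX, NY, NZ
    if min(factors) < 2:
        return 3
    if any(not 1 <= e <= 50 for e in exponents):
        return 4
    if sizes != tuple(p * 2 ** (e - 1) + 1 for p, e in zip(factors, exponents)):
        return 5
    if iparm[17] not in (0, 1):                       # IGUESS
        return 6
    if iparm[18] < 1:                                 # MAXCY
        return 7
    method, meth2 = iparm[19], iparm[20]
    if not 0 <= method <= 10:
        return 8
    if method in (8, 9, 10) and not 0 <= meth2 <= 3:
        return 8
    xa, xb, yc, yd, ze, zf = (fparm[i] for i in range(1, 7))
    if xa > xb or yc > yd or ze > zf:
        return 10
    if fparm[7] < 0.0:                                # TOLMAX
        return 11
    return 0


def example_params():
    """Hypothetical valid input: 33 x 65 x 97 grid on the unit cube."""
    iparm = {i: 0 for i in range(1, 24)}
    iparm.update({2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1,   # specified boundary values
                  8: 2, 9: 2, 10: 3, 11: 5, 12: 6, 13: 6,
                  14: 33, 15: 65, 16: 97, 17: 0, 18: 5, 19: 0, 20: 0})
    fparm = {1: 0.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 1.0, 7: 0.0, 8: 0.0}
    return iparm, fparm


if __name__ == "__main__":
    print(check_input(*example_params()))
```

Such a check only anticipates what MUD3 reports through IERROR; the value returned by the solver remains the authoritative result.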
References
[1] J. Adams, "MUDPACK: Multigrid Fortran Software for the Efficient Solution of
Linear Elliptic Partial Differential Equations," Applied Math. and Comp.,
Vol. 34, Nov 1989, pp. 113-146.
[2] J. Adams, "FMG Results with the Multigrid Software Package MUDPACK,"
Proceedings of the Fourth Copper Mountain Conference on Multigrid, SIAM,
1989, pp. 1-12.
[3] J. Adams, "Fortran Subprograms for Finite Difference Formula," J. Comp.
Phys., Vol. 26, Jan 1978, pp. 113-116.
[4] J. Adams, P. Swarztrauber, R. Sweet, "Efficient Fortran Subprograms for the
Solution of Elliptic Partial Differential Equations," Elliptic Problem Solvers,
Academic Press, 1982, pp. 333-390.
[5] J. Adams, R. Garcia, B. Gross, J. Hack, D. Haidvogel, V. Pizzo, E. Ridley,
"Applications of Multigrid Scientific Software in Atmospheric Science," (in
preparation).
[6] J. Adams, "Recent Enhancements in MUDPACK, A Multigrid Software Package
for Elliptic Partial Differential Equations," Applied Math. and Comp., Vol. 43,
May 1991, pp.79-94.
[7] A. Brandt, "Multi-level Adaptive Solutions to Boundary Value Problems," Math.
Comp., 31, 1977, pp. 333-390.
[8] W. Briggs, "A Multigrid Tutorial," SIAM, Philadelphia, 1987.
[9] B. Buzbee, G. Golub, and C. Nielson, "On direct methods for solving Poisson's
equations," SIAM J. Numer. Anal., 7, 1970, pp. 627-656.
[10] S. Fulton, R. Ciesielski, and W. Schubert, "Multigrid methods for elliptic
problems: a review," Monthly Weather Review, 114, 1986, pp. 943-959.
[11] W. Gentzsch, "Vectorization of Computer Programs with Applications to
Computational Fluid Dynamics," Vieweg & Sohn, 1984 (246 pages).
[12] W. Hackbusch and U. Trottenberg, "Multigrid Methods," Springer-Verlag,
Berlin, 1982.
[13] Handbook of Mathematical Functions, National Bureau of Standards Applied
Math. Series 55, p. 884.
[14] D. Jespersen, "Multigrid Methods for Partial Differential Equations," Studies in
Numerical Analysis, Vol. 24, MAA, 1984.
[15] J. Mandel and S. Parter, "On the Multigrid F-Cycle," Applied Math. and Comp.,
37, 1990, pp. 19-36.
[16] S. McCormick, "Multigrid Methods," Vol. 3 of SIAM Frontiers Series, SIAM,
Philadelphia, 1987.
[17] V. Pereyra, "Highly Accurate Numerical Solution of Casilinear Elliptic
Boundary-Value Problems in n Dimensions," Math. Comp., 24, 1970, pp. 771-783.
[18] D. Sato, "PERFMON: The Cray Performance Monitor Utility," SCD UserDoc,
Version 2.0, NCAR, March 1989.
[19] S. Schaffer, "Higher Order Multigrid Methods," Math. Comp., Vol. 43, July 1984,
pp. 89-115.
[20] P. Swarztrauber, "Fast Poisson Solvers," Studies in Numerical Analysis, G.
Golub ed., Math. Assoc. America, 1985, pp. 319-369.
[21] R. Sweet, "A Parallel and Vector Variant of the Cyclic Reduction Algorithm,"
SIAM J. Sci. and Stat. Comp., Vol. 9, July 1988, pp. 761-766.
[22] C. Thole and U. Trottenberg, "Basic Smoothing Procedures for the Multigrid
Treatment of Elliptic 3D Operators," Applied Math. and Comp., 19, 1986,
pp. 333-345.
Acknowledgements
Steve McCormick introduced the author to the multigrid community and gave
numerous helpful suggestions including the use of planar relaxation with the
three-dimensional solvers. The importance of adjusting discretization
coefficients at coarser grid levels for PDEs with nonzero first-order terms was
pointed out by Klaus Steuben. A conversation with Achi Brandt affirmed that
the default multigrid options in the latest version of MUDPACK are a good
choice and that the use of deferred corrections in obtaining fourth-order
approximations with multigrid is a reasonable strategy.