Parallel preconditioning based on h-hierarchical finite elements with application to acoustics



Computer Methods in Applied Mechanics and Engineering

ELSEVIER

Manish Malhotra, Peter M. Pinsky*

Department of Mechanical Engineering, Stanford University, CA 94305, USA

Comput. Methods Appl. Mech. Engrg. 155 (1998) 97-117

Received 27 March 1995; revised 15 October 1996

Abstract

In this paper, we consider the hierarchical basis preconditioning approach, which is closely related to the h-version of the hierarchical finite element method. In general, finite element formulations that employ hierarchical shape functions yield better conditioned matrix problems than those based on the Lagrange nodal basis. These matrix problems also lend themselves to faster convergence of Krylov-subspace iterative methods. We consider a purely algebraic approach to describe projections between the nodal and the h-hierarchical basis functions, which are then used to construct the preconditioning operator. Implementation details of the preconditioner are provided for solving finite element problems on unstructured grids.

We employ the preconditioning approach in the iterative solution of linear systems that arise from finite element discretization of the exterior Helmholtz problem. Numerical results are presented to examine convergence rates for practical discretizations in acoustics, and to illustrate the computational performance of the preconditioning algorithm on serial and data-parallel computers. © 1998 Elsevier Science S.A.

1. Introduction

Due to excessive growth in computational and storage requirements, direct solution techniques often become expensive for solving large sparse systems of linear equations such as those arising from finite element discretization of partial differential equations. Gradient-type iterative methods (or more precisely, Krylov-subspace iterative methods), which are based on working with sequences of orthogonal vectors, are an attractive alternative in these situations. The conjugate gradient method for solving symmetric linear systems is among the best-known Krylov-subspace iterative methods; for a review of various other Krylov-subspace methods, we refer the reader to [10]. The convergence of Krylov-subspace methods is related to the spectral properties of the coefficient matrix, and preconditioners are frequently required to accelerate iterative convergence. When considering the solution of finite element equations that arise from second-order, elliptic, partial differential equations, it becomes important to assess the effect of mesh refinement and the choice of basis functions on spectral properties and, therefore, on iterative convergence.

In contrast to finite element formulations that employ nodal basis functions, hierarchical finite element approaches are known to yield better conditioned matrix problems [27,5]. In recent years, several preconditioning techniques that are based on two-level p-hierarchical formulations have been proposed [18,3,6]. The generalization of these techniques to the multilevel case is a non-trivial task; see e.g. [2] for a survey of various two-level and multilevel preconditioning techniques. In this paper we examine a multilevel preconditioning approach proposed in [26] that is related to the h-hierarchical formulation of the finite element method. In this approach, a multilevel splitting of the given finite element mesh is created in order to obtain the preconditioner. Multiple grid levels inherent in such a splitting arise quite naturally in h-adaptive finite element computations if nested mesh refinement is employed. Using these multiple levels, a hierarchical set of linear basis functions is defined that is equivalent to the usual nodal basis functions on the given mesh. The preconditioner is then obtained by combining transformations between the nodal and the hierarchical basis functions. Preconditioners based on this procedure do not require the inversion of any operators and, therefore, can be applied very efficiently with low storage and computational overheads. We describe algorithms, suitable for computations involving unstructured finite element grids, for implementing the preconditioners on serial and distributed memory parallel computers. Algorithmically, the hierarchical basis preconditioner involves operations that are similar to the well-known V-cycle multigrid scheme. The idea of using a multilevel splitting of the given finite element mesh has also been considered recently to improve the performance of certain element-by-element (EBE) preconditioners [16] (using so-called companion meshes); see [21] for details. These methods, while algorithmically quite different, share qualitative similarities in that they construct multigrid-like preconditioners for finite element computations of partial differential equations.

We use the hierarchical basis preconditioner for solving linear systems that arise in structural acoustics. The finite element discretization of the Helmholtz equation, which is the governing equation for problems in acoustics, results in linear systems with a non-Hermitian and indefinite coefficient matrix. Additional considerations, which are required to apply the preconditioner for the solution of indefinite problems, are also discussed. Finally, numerical results are presented to examine convergence rates obtained for practical discretizations in acoustics, and to illustrate the computational performance of the preconditioning algorithm on serial and parallel computers. The parallel performance of similar multilevel and domain decomposition preconditioners has previously been studied for problems in computational fluid dynamics [8,12,13].

* Corresponding author.

0045-7825/98/$19.00 © 1998 Elsevier Science S.A. All rights reserved

PII S0045-7825(97)00142-4

2. Hierarchical finite element approaches

The standard choice in selecting a finite element basis is to employ nodal basis functions consisting of Lagrange polynomials in one dimension and their tensor products in higher dimensions. An alternative to the nodal basis is to employ a hierarchic set of basis functions. Table 1 illustrates basis functions associated with h- and p-hierarchical approaches for the simplest case of a one-dimensional domain. The following differences in the two approaches can be observed:

- In the h-hierarchical approach, basis functions of the same polynomial order are introduced in a hierarchic fashion corresponding to nodes in the interior of an element (see column (a) of Table 1). Although not necessary, it may be useful to consider such interior nodes as those arising from refinement of an initial mesh. The h-hierarchical approach can also be generalized to higher polynomial orders, as illustrated for the case of quadratic functions in column (b) of Table 1.

- In contrast to the h-version, the p-hierarchical approach involves the introduction of basis functions with increasingly higher polynomial order (see column (c) of Table 1).

Table 1
h-Hierarchical (a) linear and (b) quadratic shape functions introduced with h-refinement of two and one levels, respectively; (c) p-hierarchic enrichment of the linear nodal basis to cubic order. Columns: h-hierarchical shape functions, (a) p = 1 and (b) p = 2; (c) p-hierarchical shape functions. [Shape-function plots not reproduced.]

Fig. 1. (a) Nodal basis consisting of linear Lagrange polynomials for a 1-D mesh; (b) h-hierarchic linear basis functions defined on a two-level splitting of the mesh. Node labels in the figure: 1, 4, 2, 5, 3 in panel (a); 4, 5 in panel (b).

In this paper, a preconditioning approach that is based on the properties of h-hierarchic, linear finite-element basis functions is considered. For the purpose of describing the preconditioning method, it is useful to consider h-hierarchical basis functions as being associated with the splitting of a given finite element mesh into a sequence of nested hierarchic grid levels. Fig. 1 illustrates a given one-dimensional mesh and its splitting into two levels. Based on such a splitting into $n_h$ levels, the definition of h-hierarchic basis functions can be formalized as follows. First, let us introduce the set of nodes $\mathcal{N}_k$, $k = 1, \ldots, n_h$, which contains those vertices in level $k$ that are not present in any of the coarser levels. Further, for each $k = 1, \ldots, n_h$, define

$$\mathcal{T}_k = \bigcup_{i=1}^{k} \mathcal{N}_i \tag{1}$$

to be the set containing all vertices in level $k$. For example, corresponding to the case shown in Fig. 1, $n_h = 2$, $\mathcal{N}_1 = \{1,2,3\}$ and $\mathcal{N}_2 = \{4,5\}$. Also, $\mathcal{T}_1 = \{1,2,3\}$ and $\mathcal{T}_2 = \{1,2,3,4,5\}$. Observe that (1) also implies that for each $k = 2, \ldots, n_h$, $\mathcal{T}_k = \mathcal{N}_k \cup \mathcal{T}_{k-1}$; this recursive relation is crucial to the following definition.

Using the sets $\mathcal{N}_k$ and $\mathcal{T}_k$ defined on a splitting of the finite element mesh, an h-hierarchical basis can be defined as follows: (i) basis functions corresponding to nodes in $\mathcal{N}_1$ consist of the usual nodal basis functions, and (ii) for any level $1 < k \le n_h$, the hierarchical basis consists of nodal basis functions corresponding to nodes in $\mathcal{N}_k$ together with the hierarchical basis for nodes in $\mathcal{T}_{k-1}$.

This recursive definition leads to a set of basis functions that are equivalent to the nodal basis. This equivalence is demonstrated by the invertibility of the transformation matrix between nodal and hierarchical bases (as will be shown in Section 4.1). We note that, due to their equivalence, hierarchical basis functions also satisfy the finite-element completeness property (see e.g. [15, Chapter 3]) characteristic of nodal basis functions.
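The recursive relation between the level sets can be made concrete with a minimal Python sketch. It assumes a one-dimensional mesh refined by repeated bisection and vertices numbered level by level starting from 1, as in Fig. 1; the function name `level_sets` and its arguments are hypothetical and serve only this illustration.

```python
def level_sets(n_coarse, n_levels):
    """Vertex sets for a 1-D mesh refined by repeated bisection (assumed setup).

    Returns (N, T): N[k-1] holds the vertices that are new at level k, and
    T[k-1] holds all vertices present at level k, following Eq. (1).
    """
    N = [set(range(1, n_coarse + 1))]
    count = n_coarse
    for _ in range(1, n_levels):
        new = count - 1                      # one midpoint per existing interval
        N.append(set(range(count + 1, count + new + 1)))
        count += new
    T, seen = [], set()
    for Nk in N:
        seen = seen | Nk                     # T_k = N_k united with T_{k-1}
        T.append(seen)
    return N, T


N, T = level_sets(n_coarse=3, n_levels=2)
print(N)   # vertices introduced at each level
print(T)   # cumulative vertex sets
```

For `n_coarse=3` and `n_levels=2` this reproduces the sets quoted for Fig. 1; adding a third level introduces one new vertex per interval of the five-node mesh.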

3. Model boundary value problem

We consider a linear, second-order, elliptic partial differential equation in two dimensions:

$$\mathcal{L}u = -\nabla \cdot (\kappa \nabla u) + b \cdot \nabla u + c\,u = f \quad \text{in } \Omega, \tag{2}$$

$$u = 0 \quad \text{on } \Gamma_g, \tag{3}$$

$$\kappa \frac{\partial u}{\partial n} = h \quad \text{on } \Gamma_h, \tag{4}$$

with the boundary of the domain $\partial\Omega = \Gamma_g \cup \Gamma_h$. The weak or variational form of the problem is expressed as: Find $u \in \mathcal{S}$, where $\mathcal{S} = \{u \in H^1(\Omega) \mid u = 0 \text{ on } \Gamma_g\}$, such that

$$a(v, u) = f(v), \quad \forall\, v \in \mathcal{S}, \tag{5}$$

where

$$a(v, u) := \int_\Omega \left( \kappa \nabla v \cdot \nabla u + v\, b \cdot \nabla u + c\, v\, u \right) dx, \tag{6}$$

$$f(v) := \int_\Omega v f \, dx + \int_{\Gamma_h} v h \, ds. \tag{7}$$

After introducing a finite element discretization on the domain $\Omega$, the approximate form of the variational problem becomes: Find $u^h \in \mathcal{S}^h$, where $\mathcal{S}^h \subset \mathcal{S}$, such that

$$a(v^h, u^h) = f(v^h), \quad \forall\, v^h \in \mathcal{S}^h. \tag{8}$$

By introducing basis functions $\{\phi_i\}$ that span $\mathcal{S}^h$ in (8), we obtain the discrete system of equations

$$K d = f, \tag{9}$$

where $K_{ij} = a(\phi_i, \phi_j)$ and $f_i = f(\phi_i)$. The key observation here is that the set of basis functions $\{\phi_i\}$ can be chosen in several different ways. For Krylov-subspace methods, the choice of basis functions becomes important since it affects the eigenvalue distribution of the discrete operator $K$, and hence the convergence behavior. In fact, for a self-adjoint and positive operator $\mathcal{L}$, it is well known that the spectral condition number of $K$ grows as $O(h^{-2})$ when nodal basis functions are used (see e.g. [1]), while with a hierarchical basis the condition number is $O((\log h^{-1})^2)$ in two dimensions and $O(h^{-1})$ in three dimensions (see e.g. [26]); in these estimates $h$ denotes the mesh size. Since the number of conjugate gradient iterations [14] needed to reach a given accuracy is bounded in proportion to the square root of the spectral condition number of the coefficient matrix, the above estimates suggest that a significant reduction in iteration counts may be achieved by using a hierarchical basis for two-dimensional problems. While the situation is more complex and less well understood from a theoretical standpoint for non-self-adjoint and indefinite operators $\mathcal{L}$, hierarchical bases still appear to be advantageous in those cases.

Unlike nodal shape functions, h-hierarchical shape functions do not possess compact support. Therefore, interactions between shape functions in the multilevel h-hierarchic basis result in significantly reduced sparsity of the finite element matrices and in related computational complexities. Nevertheless, transformation matrices that relate nodal to hierarchical bases can readily be constructed, and in the next section we describe an approach, due to Yserentant [26], to exploit these transformations to create efficient multilevel preconditioners.
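As a rough illustration of what the condition-number estimates quoted in this section imply, consider a back-of-the-envelope comparison for a two-dimensional, self-adjoint, positive-definite problem with mesh size 1/64. All constants are ignored, and the symbols $\kappa_{\text{nodal}}$ and $\kappa_{\text{hier}}$ are introduced only for this sketch:

$$\sqrt{\kappa_{\text{nodal}}} \sim h^{-1} = 64, \qquad \sqrt{\kappa_{\text{hier}}} \sim \log h^{-1} = \ln 64 \approx 4.16 .$$

Halving the mesh size doubles the first quantity but adds only $\ln 2 \approx 0.69$ to the second, which is why the iteration-count bound grows so much more slowly in the hierarchical basis. Actual iteration counts depend on the omitted constants and on the problem.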

4. h-Hierarchical basis preconditioner

4.1. Transformations from nodal to hierarchical basis

Let the function $u(x) \in H^1$ be approximated on a given finite-element mesh with $N$ unknowns using the standard continuous, piecewise-linear nodal basis functions, as well as an equivalent set of hierarchical basis functions defined earlier. Let us denote the column vectors containing nodal and hierarchical basis functions, respectively, by

$$\Phi(x) = [\phi_1(x)\ \phi_2(x)\ \cdots\ \phi_N(x)]^T \quad \text{and} \quad \Psi(x) = [\psi_1(x)\ \psi_2(x)\ \cdots\ \psi_N(x)]^T .$$

Then, the approximated function $u^h$ can be represented as

$$u^h(x) = \sum_{i=1}^{N} \alpha_i \phi_i(x) \quad \text{or} \quad u^h(x) = \sum_{i=1}^{N} \beta_i \psi_i(x), \tag{10}$$

where $\alpha_i$ and $\beta_i$, $i = 1, \ldots, N$, are unknown coefficients (or degrees of freedom) in the respective bases. We are interested in the linear transformation matrix $P \in \mathbb{R}^{N \times N}$, such that

$$\Psi(x) = P\,\Phi(x). \tag{11}$$

We note that the elements of $P$ are independent of $x$, but do depend on the topology of the mesh and, in particular, on the associated splitting on which the hierarchical basis functions are defined.

To motivate the construction of $P$, we first consider the simplest case of a two-level splitting such as in Fig. 1, and examine the inverse relationship $\Phi(x) = R_1 \Psi(x)$. For the case of a two-level splitting, it is easy to show that the nodal and hierarchical basis functions are related as

$$\phi_i(x) = \psi_i(x) - \sum_{k \in \mathcal{N}_2} \psi_i(x_k)\, \psi_k(x), \quad i \in \mathcal{T}_1; \qquad \phi_k(x) = \psi_k(x), \quad k \in \mathcal{N}_2. \tag{12}$$

If we choose a permutation such that all basis functions corresponding to nodes in $\mathcal{T}_1$ are ordered before those in $\mathcal{N}_2$, then the matrix $R_1$ assumes a block upper-triangular structure, and (12) can be written compactly as

$$\Phi = \begin{bmatrix} I_{m_1} & -F \\ 0 & I_{n_2} \end{bmatrix} \Psi. \tag{13}$$

Here, $I_{m_1}$ and $I_{n_2}$ are square identity matrices of sizes $m_1 = \dim \mathcal{T}_1$ and $n_2 = \dim \mathcal{N}_2$, respectively. Further, $F$ is a sparse matrix containing the interaction of hierarchical shape functions between the two levels. More specifically, $F_{ij} = \psi_i(x_k)$ with $i \in \mathcal{T}_1$ and $k \in \mathcal{N}_2$, where $j = (k - m_1)$. Although we have used a simple one-dimensional setting to introduce the structure of $R_1$, (12)-(13) also hold for two- and three-dimensional cases if one replaces $x$ with $x \in \mathbb{R}^d$, $d = 2$ or $3$, as appropriate.

For generalizing the relation in (13) to the case of multiple levels, recall the definition of the hierarchical basis functions. Because of the recursive construction of the hierarchical basis, the transformation matrix in the case of $n_h > 2$ levels can be stated as

$$\Phi = R_{n_h - 1} \cdots R_2 R_1 \Psi. \tag{14}$$

By (11) and (14) we get

$$P = R_1^{-1} R_2^{-1} \cdots R_{n_h - 1}^{-1}, \tag{15}$$

where $R_k$, $k = 1, \ldots, n_h - 1$, is the transformation matrix between the bases defined on nodes in $\mathcal{T}_k$ and $\mathcal{N}_{k+1}$. Further, each $R_k$ has the block structure

$$R_k = \begin{bmatrix} I_{m_k} & -F_k & 0 \\ 0 & I_{n_{k+1}} & 0 \\ 0 & 0 & I_{m'_{k+1}} \end{bmatrix}, \tag{16}$$

where $m_k = \dim \mathcal{T}_k$, $n_{k+1} = \dim \mathcal{N}_{k+1}$, and $m'_{k+1} = N - m_{k+1}$. We note here that, as a consequence of its block upper-triangular structure, the inverse of $R_k$ can be determined quite easily. In particular, it is easy to verify from (16):

$$R_k^{-1} = \begin{bmatrix} I_{m_k} & F_k & 0 \\ 0 & I_{n_{k+1}} & 0 \\ 0 & 0 & I_{m'_{k+1}} \end{bmatrix}.$$
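The two-level construction can be checked numerically with a short Python sketch. It assumes a uniformly spaced version of the Fig. 1 mesh on the interval [0, 2], with the level-1 nodes 1, 2, 3 ordered before the level-2 nodes 4, 5; the coordinates and the helper names (`hat`, `block_upper`, `matvec`) are assumptions made for this illustration and are not part of the paper.

```python
def hat(x, center, half_width):
    """Linear hat function: 1 at center, 0 at distance >= half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)

# Assumed geometry: nodes 1, 2, 3 (level 1) at 0, 1, 2; nodes 4, 5 (level 2) at 0.5, 1.5.
coords = [0.0, 1.0, 2.0, 0.5, 1.5]
m1, n2 = 3, 2

def nodal(x):
    """Nodal basis on the fine mesh (spacing 0.5), in the ordering above."""
    return [hat(x, c, 0.5) for c in coords]

def hierarchical(x):
    """Coarse hats for level-1 nodes, fine hats for level-2 nodes."""
    coarse = [hat(x, c, 1.0) for c in coords[:m1]]
    fine = [hat(x, c, 0.5) for c in coords[m1:]]
    return coarse + fine

# F[i][j] = psi_i(x_k) with k = m1 + j (0-based version of the index shift)
F = [[hat(coords[m1 + j], coords[i], 1.0) for j in range(n2)] for i in range(m1)]

def block_upper(F, sign):
    """Dense matrix [[I, sign*F], [0, I]] as a list of rows."""
    m, n = len(F), len(F[0])
    A = [[1.0 if r == c else 0.0 for c in range(m + n)] for r in range(m + n)]
    for i in range(m):
        for j in range(n):
            A[i][m + j] = sign * F[i][j]
    return A

R1 = block_upper(F, -1.0)   # Phi = R1 Psi
P = block_upper(F, +1.0)    # P = R1^{-1}, so that Psi = P Phi

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

samples = [i / 8 for i in range(17)]   # sample points in [0, 2]
max_err = max(
    abs(a - b)
    for x in samples
    for a, b in zip(hierarchical(x), matvec(P, nodal(x)))
)
print(F)
print(max_err)
```

Each nonzero entry of `F` is 1/2 here because every level-2 node is assumed to sit at the midpoint of a coarse element; on a non-uniform refinement the entries would be the corresponding coarse shape-function values.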

4.2. Formulation of the preconditioner

In order to fix ideas on preconditioning, consider the solution of a given system of equations $Ax = b$ using a Krylov-subspace iterative algorithm. This system can be preconditioned using matrices $M_L$ and $M_R$, the so-called left and right preconditioners, respectively, in the following way:

$$M_L^{-1} A M_R^{-1} M_R x = M_L^{-1} b. \tag{17}$$

Now consider the solution of our nodal basis finite element equations, $Kd = f$. We start by using hierarchical basis functions in the bilinear form $a(\cdot, \cdot)$ of (8). Then, using (15), we introduce nodal basis functions $\Phi$ in place of $\Psi$ to obtain the relation between $K$ and its equivalent representation in the hierarchical basis, $\hat{K}$, as follows:

$$\hat{K} = a(\Psi, \Psi^T) = P\, a(\Phi, \Phi^T)\, P^T = P K P^T. \tag{18}$$

Eqs. (17) and (18) suggest that choosing $M_L^{-1} = P$ and $M_R^{-1} = P^T$ as the left and right preconditioners for $K$ is equivalent to solving the unpreconditioned system of equations with $\hat{K}$ as the coefficient matrix. Since linear systems obtained from hierarchical basis functions are better suited to solution with Krylov-subspace methods, the hierarchical basis preconditioner can be formulated as

$$M_{HB} = (P^T P)^{-1}. \tag{19}$$

We remark that construction of the hierarchical basis preconditioner in this form is essentially similar to that in [26]. From (18) and (19), observe that $\hat{K}$ is never formed explicitly, and all computations are performed in the usual nodal basis. Therefore, the preconditioner should be easy to integrate within most finite element programs, which typically employ a nodal basis.
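The relation between the nodal and hierarchical matrices can be illustrated on a small example. The Python sketch below assumes the one-dimensional operator $-u''$ on a uniformly spaced version of the Fig. 1 mesh (fine spacing 0.5, no boundary conditions imposed), with level-1 nodes ordered first; the matrices `K` and `P` are written out by hand under these assumptions and the helper names are hypothetical.

```python
def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Nodal stiffness matrix for -u'' (assumed model problem), node order 1, 2, 3, 4, 5,
# where nodes 4 and 5 are the midpoints between nodes 1-2 and 2-3.
K = [
    [ 2.0,  0.0,  0.0, -2.0,  0.0],
    [ 0.0,  4.0,  0.0, -2.0, -2.0],
    [ 0.0,  0.0,  2.0,  0.0, -2.0],
    [-2.0, -2.0,  0.0,  4.0,  0.0],
    [ 0.0, -2.0, -2.0,  0.0,  4.0],
]

# Two-level transformation P = [[I, F], [0, I]] with F holding coarse hat values 1/2.
P = [
    [1.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# Hierarchical-basis matrix, formed explicitly here only for illustration.
K_hat = matmul(matmul(P, K), transpose(P))
for row in K_hat:
    print(row)
```

In this example the leading 3-by-3 block of `K_hat` coincides with the stiffness matrix of the coarse mesh (spacing 1). The coupling between the two levels also vanishes, which is a feature of this one-dimensional model problem and should not be expected in general.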

The matrix entries in $M_{HB}$ depend on the topology of the mesh and the choice of the multilevel splitting, but do not have an explicit dependence on coefficients appearing in the differential operator $\mathcal{L}$. In order to introduce such a dependence, we consider two approaches. The simplest approach is to combine diagonal scaling with the preconditioner $M_{HB}$. Since $\hat{K}$ is never computed directly, $\mathrm{diag}(\hat{K})$ is not readily obtained. Instead, it is simpler, and often useful [12], to employ $\mathrm{diag}(K)$ for diagonal scaling in the following way:

$$M_{HBDS} = \left(P^T\, \mathrm{diag}(K)^{-1} P\right)^{-1}. \tag{20}$$

Another approach to enhance the preconditioner in (19) becomes evident if we consider the block form of $\hat{K}$ given by

$$\hat{K} = \begin{bmatrix} \hat{K}_{11} & \hat{K}_{12} \\ \hat{K}_{21} & \hat{K}_{22} \end{bmatrix}, \tag{21}$$

where an appropriate permutation was used to order all nodes in the coarsest level first. In this form, the block matrix $\hat{K}_{11} = a(\psi_i, \psi_j)$, where $\psi_i$ are the basis functions corresponding to nodes $i \in \mathcal{N}_1$. By (1), $\mathcal{T}_1 = \mathcal{N}_1$; so $\hat{K}_{11}$ is just the nodal basis finite element matrix $K_{11}$ for the coarsest level, i.e. $\hat{K}_{11} = K_{11}$. Based on this observation, we form the preconditioner

$$M_{HBCS} = \left( P^T \begin{bmatrix} \{\mathrm{ILU}(K_{11})\}^{-1} & 0 \\ 0 & I \end{bmatrix} P \right)^{-1}, \tag{22}$$

where the operator $\{\mathrm{ILU}(K_{11})\}^{-1}$ represents an approximate solution of unknowns associated with nodes in the coarsest level.

REMARK 1. It is noteworthy that the preconditioners $M_{HB}$, $M_{HBDS}$ and $M_{HBCS}$ are not based on a 'splitting' or 'factorization' of the large sparse coefficient matrix $K$. This feature can be exploited to obtain highly storage-efficient matrix-free implementations of a preconditioned Krylov-subspace iterative method. We compare the performance of preconditioners (20) and (22) in the context of such an implementation in our numerical examples.
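The matrix-free idea in Remark 1 can be sketched in a few lines of Python. The example below is an illustration under stated assumptions rather than the implementation used in the paper: it takes the one-dimensional Poisson problem $-u'' = 1$ on (0, 1) with homogeneous Dirichlet conditions, a dyadic mesh with $2^L - 1$ interior nodes in natural left-to-right order, and the unscaled preconditioner of (19). All function names are hypothetical.

```python
def apply_P(v):
    """y = P v: sweep from the finest level to the coarsest (nodes numbered 1..n)."""
    y = list(v)
    n = len(y)
    s = 1                                    # stride 1 = finest level
    while s <= n:
        for i in range(s, n + 1, 2 * s):     # nodes introduced at this level
            for m in (i - s, i + s):         # hierarchical parents (if interior)
                if 1 <= m <= n:
                    y[m - 1] += 0.5 * y[i - 1]
        s *= 2
    return y

def apply_PT(v):
    """z = P^T v: sweep from the coarsest level to the finest."""
    z = list(v)
    n = len(z)
    s = (n + 1) // 2
    while s >= 1:
        for i in range(s, n + 1, 2 * s):
            for m in (i - s, i + s):
                if 1 <= m <= n:
                    z[i - 1] += 0.5 * z[m - 1]
        s //= 2
    return z

def matvec_K(d, h):
    """Matrix-free product with the 1-D stiffness matrix (1/h) tridiag(-1, 2, -1)."""
    n = len(d)
    return [
        (2.0 * d[i] - (d[i - 1] if i > 0 else 0.0) - (d[i + 1] if i < n - 1 else 0.0)) / h
        for i in range(n)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(f, h, tol=1e-12, maxit=200):
    """Preconditioned conjugate gradients with M^{-1} r = P^T (P r)."""
    d = [0.0] * len(f)
    r = list(f)
    z = apply_PT(apply_P(r))
    p = list(z)
    rz = dot(r, z)
    stop = tol * dot(f, f) ** 0.5
    for _ in range(maxit):
        if dot(r, r) ** 0.5 <= stop:
            break
        Kp = matvec_K(p, h)
        alpha = rz / dot(p, Kp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, Kp)]
        z = apply_PT(apply_P(r))
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return d

L = 5
n = 2 ** L - 1
h = 1.0 / (n + 1)
d = pcg([h] * n, h)                          # load vector for f = 1
exact = [(i + 1) * h * (1.0 - (i + 1) * h) / 2.0 for i in range(n)]
print(max(abs(a - b) for a, b in zip(d, exact)))
```

Neither the stiffness matrix nor $P$ is stored; the only data the preconditioner needs is the parent relation between levels, which on this dyadic mesh follows from the node index. The comparison against $x(1-x)/2$ relies on linear elements being nodally exact for this particular model problem.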

5. Serial and parallel implementation

The use of a preconditioner in conjunction with a Krylov-subspace iterative method requires solving a secondary system of equations, of the form $Mz = r$, at each step of the iteration. An attractive feature of choosing $M$ as one of the preconditioners $M_{HB}$, $M_{HBDS}$ or $M_{HBCS}$ is that solving for $z$ does not require a matrix inversion, but only matrix-vector products of the form

$$z = P^T y \quad \text{and} \quad y = P r. \tag{23}$$

As noted by Yserentant [26], these products can be computed directly without the explicit calculation of $P$ and $P^T$, whereby significant savings in storage and computational costs are achieved. This is a key algorithmic feature that enables efficient application of the hierarchical basis transformations. Next, we describe algorithms required for the efficient application of the preconditioning step in (23) on serial and distributed-memory parallel machines. The data structure employed in these algorithms is suitable for computations on unstructured grids. For the simpler case of two-dimensional meshes with uniformly refined triangular elements, see [26].


5.1. Multilevel connectivity arrays

The first step is to define an array which contains an efficient representation of the hierarchical levels forming the multilevel splitting of the given finite element mesh. This is done through an $n_{np} \times n_{en}$ array ivdual that contains multilevel h-hierarchic connectivity data. More precisely, the ith row of the integer array ivdual contains the list of $n_{en}$ mesh vertices in the set $\mathcal{T}_{k-1}$ that are h-hierarchical to a given vertex $i \in \mathcal{N}_k$. As usual, $n_{np}$ is the total number of vertices in the finite element mesh, and $n_{en}$ is the number of element nodes (e.g. $n_{en} = 3$ for a triangle and $n_{en} = 4$ for a quadrilateral or a tetrahedron). A vertex belonging to $\mathcal{N}_1$, or a vertex that has fewer than $n_{en}$ nearest neighbors (e.g. vertices on the mesh boundary), has its corresponding entries in ivdual set to zero. In order to avoid expensive list-searches in the recursive application of transformations that comprise $P$, it is useful to order vertices along the longer axis of ivdual in the increasing order of level-sets to which they belong. Accordingly, let the $n_{np} \times 1$ array ivperm contain the original location, ivdual(i, ·), of the list ivdual(j, ·), i.e. ivperm(j) = i. We also define an $n_h \times 1$ array nvlevl whose kth element nvlevl(k) = $\dim \mathcal{T}_k$.

The second step is to construct an efficient representation of the transformation matrix $P$. This is done through an $n_{np} \times n_{en}$ single-precision real array htrans, whose jth row contains $n_{en}$ values corresponding to nodal basis functions associated with the list of vertices in ivdual(ivperm(j), ·) evaluated at vertex ivperm(j). Note that the evaluation of htrans involves $O(n_{np} \times n_{en})$ flops. However, since htrans is computed only once, this cost can be amortized over the total number of iterations and is insignificant in practice.

5.2. Implementation on serial computers

The procedure for applying transformations in $P$ and $P^T$ to a vector on a sequential computer is described in Algorithm 5.1. The preconditioning step involves $O(n_{np} \times n_{en})$ operations, and requires an additional memory of $n_{np} \times (n_{en} + 1)$ integer words and $n_{np} \times n_{en}$ single-precision real words for storing arrays ivdual, ivperm, and htrans.

ALGORITHM 5.1

Sketch of procedure to evaluate z = P^T y and y = P r

Given n_en, n_h, ivdual, ivperm, htrans and nvlevl, proceed as follows:

/* compute y = P r */
/* loop over hierarchical levels */
do l = n_h, ..., 2
    /* loop over vertices i in N_l */
    do j = nvlevl(l - 1) + 1, ..., nvlevl(l)
        i = ivperm(j)
        /* update y(m), where vertex m in T_{l-1} and m is hierarchic to i */
        do k = 1, ..., n_en
            y(ivdual(j, k)) = y(ivdual(j, k)) + htrans(i, k) * r(i)
        end do
    end do
end do

/* compute z = P^T y */
/* loop over hierarchical levels */
do l = 2, ..., n_h
    /* loop over vertices i in N_l */
    do j = nvlevl(l - 1) + 1, ..., nvlevl(l)
        i = ivperm(j)
        /* update z(i) from vertices m in T_{l-1}, where m is hierarchic to i */
        do k = 1, ..., n_en
            z(i) = z(i) + htrans(i, k) * y(ivdual(j, k))
        end do
    end do
end do
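A runnable counterpart of the serial sketch can be written in a few lines of Python. It follows the loop structure of Algorithm 5.1 but makes several assumptions explicit: indices are 0-based, a missing hierarchical parent is marked by -1 instead of zero, and each output vector starts as a copy of the input and is updated in place, so that contributions propagate through intermediate levels when there are more than two of them. The three-level, one-dimensional data set at the end is hypothetical test data.

```python
def apply_P(r, ivdual, htrans, ivperm, nvlevl):
    """y = P r, sweeping from the finest level down to level 2."""
    y = list(r)                                   # accumulate in place on a copy
    for l in range(len(nvlevl) - 1, 0, -1):
        for j in range(nvlevl[l - 1], nvlevl[l]):
            i = ivperm[j]
            for k, m in enumerate(ivdual[j]):
                if m >= 0:                        # skip missing parents
                    y[m] += htrans[i][k] * y[i]
    return y

def apply_PT(y, ivdual, htrans, ivperm, nvlevl):
    """z = P^T y, sweeping from level 2 up to the finest level."""
    z = list(y)
    for l in range(1, len(nvlevl)):
        for j in range(nvlevl[l - 1], nvlevl[l]):
            i = ivperm[j]
            for k, m in enumerate(ivdual[j]):
                if m >= 0:
                    z[i] += htrans[i][k] * z[m]
    return z

# Hypothetical data: 1-D mesh on [0, 2] with three levels, vertices numbered by level.
#   level 1: vertices 0, 1, 2 at x = 0, 1, 2
#   level 2: vertices 3, 4    at x = 0.5, 1.5
#   level 3: vertices 5..8    at x = 0.25, 0.75, 1.25, 1.75
ivdual = [[-1, -1]] * 3 + [[0, 1], [1, 2]] + [[0, 3], [3, 1], [1, 4], [4, 2]]
htrans = [[0.0, 0.0]] * 3 + [[0.5, 0.5]] * 6      # parent shape-function values
ivperm = list(range(9))                           # already ordered by level
nvlevl = [3, 5, 9]                                # cumulative vertex counts per level

e5 = [0.0] * 9
e5[5] = 1.0
print(apply_P(e5, ivdual, htrans, ivperm, nvlevl))

e0 = [0.0] * 9
e0[0] = 1.0
print(apply_PT(e0, ivdual, htrans, ivperm, nvlevl))
```

Applying `apply_P` to a unit vector returns one column of $P$, and applying `apply_PT` to a unit vector returns one row; for the coarse vertex 0 that row lists the values of its coarse hat function at all nine vertices.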


5.3. Implementation on distributed memory computers

ALGORITHM 5.2

Algorithm for z = P^T y and y = P r on distributed memory computers

Given n_en, n_h, ivdual, ivperm, htrans and nvlevl, proceed as follows:

/* compute y = P r */
/* loop over hierarchical levels */
do l = n_h, ..., 2
    /* select vertices i in N_l */
    n1 = nvlevl(l - 1) + 1
    n2 = nvlevl(l)
    /* update y(m), where m in T_{l-1} and m is hierarchic to i */
    /* step 1: embarrassingly parallel scaling of r(i) */
    temp(ivperm(n1:n2), 1:n_en) = htrans(ivperm(n1:n2), 1:n_en) * r(ivperm(n1:n2))
    /* step 2: scatter scaled values to destination vertices m */
    scatter temp(ivperm(n1:n2), 1:n_en) -> y(ivdual(n1:n2, 1:n_en))
end do

/* compute z = P^T y */
/* loop over hierarchical levels */
do l = 2, ..., n_h
    /* select vertices i in N_l */
    n1 = nvlevl(l - 1) + 1
    n2 = nvlevl(l)
    /* update z(i) from vertices m in T_{l-1}, where m is hierarchic to i */
    /* step 1: gather values from vertices m */
    gather y(ivdual(n1:n2, 1:n_en)) -> temp(ivperm(n1:n2), 1:n_en)
    /* step 2: embarrassingly parallel update of z(i) */
    do k = 1, ..., n_en
        z(ivperm(n1:n2)) = z(ivperm(n1:n2)) + htrans(ivperm(n1:n2), k) * temp(ivperm(n1:n2), k)
    end do
end do

The implementation of unstructured finite element codes on distributed-memory parallel computers needs to contend with communication overheads due to inter-processor data transfers. As a result, highly optimized communication libraries have been developed on several parallel computers, such as the CM-5, that provide global-to-local (scatter) and local-to-global (gather) communication primitives typical of computations on unstructured meshes [17]. In Algorithm 5.2, a parallel implementation of the serial algorithm described earlier is presented in a form that exploits efficient gather and scatter communication primitives; see [22] for a detailed description of these primitives on the CM-5.

Note that Algorithm 5.2 employs the same arrays and data described in Section 5.1. However, it is important to specify that the arrays ivdual, ivperm and htrans are now stored in the distributed memory of the machine. An efficient mapping of these arrays is achieved by partitioning the arrays along their longer axis, while retaining the second axis in the local memory of each processor. The overall storage required for the parallel implementation of the preconditioning steps is identical to the serial algorithm. Also, note that Algorithm 5.2 requires $n_h$ sequential steps for computing each of the two products in (23), and therefore exploits the maximum parallelism inherent in the hierarchical approach.
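The two-step structure of each level in Algorithm 5.2 can be mimicked in plain Python to show which operations are independent and which require communication. This is a serial sketch only: it assumes vertices are already numbered level by level (so the permutation array is the identity and is omitted), uses 0-based indices with -1 marking a missing parent, and the function names and test data are hypothetical. On a parallel machine the list comprehensions would correspond to elementwise array operations and the remaining loops to the scatter-add and gather primitives.

```python
def scatter_level(y, ivdual, htrans, n1, n2):
    """One level of y = P r: independent scaling, then scatter-add to parents."""
    # step 1: every product is independent of all the others
    temp = [[htrans[j][k] * y[j] for k in range(len(ivdual[j]))] for j in range(n1, n2)]
    # step 2: scatter; values that collide on the same parent are summed
    for j, row in zip(range(n1, n2), temp):
        for m, t in zip(ivdual[j], row):
            if m >= 0:
                y[m] += t

def gather_level(z, ivdual, htrans, n1, n2):
    """One level of z = P^T y: gather parent values, then independent updates."""
    # step 1: gather the parent values needed by each vertex of this level
    temp = [[z[m] if m >= 0 else 0.0 for m in ivdual[j]] for j in range(n1, n2)]
    # step 2: each vertex is updated independently from its gathered values
    for j, row in zip(range(n1, n2), temp):
        z[j] += sum(h * t for h, t in zip(htrans[j], row))

def apply_P(r, ivdual, htrans, nvlevl):
    y = list(r)
    for l in range(len(nvlevl) - 1, 0, -1):       # finest level first
        scatter_level(y, ivdual, htrans, nvlevl[l - 1], nvlevl[l])
    return y

def apply_PT(y, ivdual, htrans, nvlevl):
    z = list(y)
    for l in range(1, len(nvlevl)):               # coarsest refinement level first
        gather_level(z, ivdual, htrans, nvlevl[l - 1], nvlevl[l])
    return z

# Hypothetical three-level 1-D data set: 3 coarse vertices, then 2, then 4 new vertices.
ivdual = [[-1, -1]] * 3 + [[0, 1], [1, 2]] + [[0, 3], [3, 1], [1, 4], [4, 2]]
htrans = [[0.0, 0.0]] * 3 + [[0.5, 0.5]] * 6
nvlevl = [3, 5, 9]

r = [0.0] * 9
r[5] = 1.0
print(apply_P(r, ivdual, htrans, nvlevl))
```

Levels must still be processed one after another, because the parents written at one level are read at the next; only the work within a level is free of dependencies.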


6. Application to the Helmholtz equation

For illustration of the proposed preconditioning approach, we consider the iterative solution of linear systems arising from finite element discretization of the Helmholtz equation. The Helmholtz equation governs the propagation of waves in a homogeneous, isotropic, inviscid fluid, and is used for modeling problems in underwater structural acoustics. Besides acoustics, some problems in electromagnetic and optical scattering are also modeled using the Helmholtz equation. In particular, we are interested in solving exterior problems in structural acoustics that involve infinite fluid domains. These problems are treated by introducing an artificial boundary in order to define a finite computational fluid domain, which can then be discretized using finite elements. The artificial boundary has imposed on it a radiation condition that eliminates reflected waves, thereby making the boundary value problem equivalent to solving the exterior problem.

The acoustics problem outlined above presents a number of features that are challenging for solvers. The recently developed DtN radiation boundary condition [19] is a very accurate boundary condition, but involves integrals that couple all degrees of freedom on the artificial boundary. This results in a fully populated matrix whose storage and factorization often become expensive. Iterative methods have an advantage in treating this due to their reduced storage requirements. Recently, there has been increasing interest in solving acoustics problems at high frequencies. Since meshes for high frequencies must adequately resolve the wave forms present, this leads to rapid growth in mesh density and represents an important challenge for iterative solvers. Also, matrix problems arising in structural acoustics are indefinite and typically become more difficult to solve with increasing frequency in the absence of an effective preconditioner.

6.1. Finite element discretization

The boundary value problem for the radiation and scattering of pressure waves from a rigid body submerged in an infinite acoustic medium is given as follows:

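$$ -\nabla^2 p - k^2 p = f \quad \text{in } \Omega, \tag{24} $$

$$ \nabla p \cdot n = h \quad \text{on } \Gamma, \tag{25} $$

$$ \nabla p \cdot n = -S(p) \quad \text{on } \Gamma_R. \tag{26} $$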

In the above equations, $p$ is the unknown acoustic pressure in the computational fluid domain $\Omega$, $k$ is the acoustic wave number, $f$ is the acoustic source, $h$ is the prescribed Neumann data, and $n$ denotes the unit outward normal from boundaries; see Fig. 2. Eq. (26) is the radiation boundary condition applied on the boundary $\Gamma_R$, which encloses the rigid body and all acoustic sources. We employ an exact representation of the radiation impedance by choosing $S(p)$ to be the Dirichlet-to-Neumann (DtN) map proposed in [19]. In two dimensions, the DtN map is given as

$$ S(p) = \sum_{n=0}^{\infty}{}' \; -\frac{k}{\pi}\,\frac{H_n'(kR)}{H_n(kR)} \int_0^{2\pi} \cos n(\theta - \theta')\, p(R, \theta')\, d\theta' , \tag{27} $$

where $R$ denotes the radius of the circular radiation boundary $\Gamma_R$, a prime on the sum indicates that a factor of one half multiplies the $n = 0$ term, $H_n(\cdot)$ is the $n$th order Hankel function of the first kind and $H_n'(\cdot)$ its derivative with respect to the argument. The weak form of the problem is expressed as: Find $p \in \mathcal{S}$ such that

$$ a(v, p) = f(v) , \quad \text{for all } v \in \mathcal{V} , \tag{28} $$

where

$$ a(v, p) := (\nabla v, \nabla p) - k^2 (v, p) + b(v, p) , \tag{29} $$

$$ b(v, p) := (v, S(p))_{\Gamma_R} , \tag{30} $$

$$ f(v) := (v, f) + (v, h)_{\Gamma} . \tag{31} $$

Here, the function spaces $\mathcal{S} \subset H^1(\Omega)$ and $\mathcal{V} \subset H^1(\Omega)$. Also, $(\cdot, \cdot) : \mathcal{V} \times \mathcal{S} \to \mathbb{C}$ is the $L^2(\Omega)$ inner product. Note that the functional $a(\cdot, \cdot)$ is a sesquilinear form, i.e. conjugate linear in the first argument and linear in the second, and $f(\cdot)$ is conjugate linear.

Fig. 2. Computational domain for the exterior acoustics problem.

Next, we introduce a finite element discretization on the domain $\Omega$, and let $\mathcal{S}^h \subset \mathcal{S}$ and $\mathcal{V}^h \subset \mathcal{V}$ be finite element spaces consisting of continuous piecewise linear polynomials. The approximate statement of the weak form can now be stated as: Find $p^h \in \mathcal{S}^h$ such that

$$ a(v^h, p^h) = f(v^h) , \quad \text{for all } v^h \in \mathcal{V}^h . \tag{32} $$

By introducing the finite element nodal basis functions in (32) we obtain a complex symmetric and indefinite matrix problem given by

$$ K d = f , \tag{33} $$

where

$$ K := [A_1 - k^2 A_2 + A_3] . \tag{34} $$

The matrices $A_1, A_2 \in \mathbb{R}^{N \times N}$ and $A_3 \in \mathbb{C}^{N \times N}$ arise from the respective inner products in (29). If nodes on the radiation boundary are assumed to be numbered last, then $A_3$ has the block form:

$$ A_3 = \begin{bmatrix} 0 & 0 \\ 0 & A_{dtn} \end{bmatrix} , \tag{35} $$

where $A_{dtn} \in \mathbb{C}^{N_{dtn} \times N_{dtn}}$. Here, $N$ denotes the total number of unknowns and $N_{dtn}$ the number of mesh points on the DtN boundary. Note that while $A_1$ and $A_2$ are real symmetric, $A_{dtn}$ is non-Hermitian but complex symmetric.

Due to the non-Hermitian and indefinite nature of matrix problems arising in acoustics, their iterative solution is difficult. In particular, as the acoustic wave number $k$ increases, the eigenspectrum of $K$ shifts towards the left half plane, and under these conditions the convergence of iterative methods deteriorates significantly. Moreover, with increasing wave number, the shorter wavelength of the oscillatory solution requires a progressively finer mesh in order to resolve the solution accurately. The need for refinement leads to large matrix problems which have poor numerical conditioning, and this further deteriorates iterative convergence. Therefore, the convergence of iterative solution methods for acoustics needs to be improved under two conditions: (i) decreasing mesh size, and (ii) increasing acoustic wave number or frequency.

While hierarchical basis functions can always be used for discretization of the variational problem (32), it turns out that for indefinite problems, such as steady state acoustics, certain restrictions on the choice of the coarse grid size must be respected. This is discussed next.


6.2. Choice of the coarse grid size

Multiple grid levels inherent in a splitting of the finite element mesh, on which the hierarchical basis is defined, arise quite naturally in h-adaptive finite element computations if nested mesh refinement is employed. However, unlike positive-definite bilinear forms, quasi-optimal error estimates for the Galerkin finite element approximation of an indefinite bilinear form are obtained only under the condition that the mesh size is sufficiently small (see [20] for details). Therefore, a more careful selection of grid levels, in particular the coarsest grid, is required when the preconditioning approach is applied to indefinite problems.

We propose to arrive at a quantitative empirical guideline for choosing the coarse grid for multilevel methods by examining the dispersion and attenuation characteristics of finite element approximations on a simple model problem. In particular, following Thompson and Pinsky [24], we conduct a one-dimensional dispersion analysis to obtain the limit of resolution of a finite element mesh (the minimum mesh resolution needed to resolve waves of a given wavelength) for a given choice of the interpolation functions. We show via extensive numerical tests (cf. Section 7) that for best convergence results, the coarse grid size should be no greater than this limit of resolution.

6.2.1. Dispersion analysis for the Helmholtz equation

Let us consider a one-dimensional setting in which the domain is discretized using equi-spaced grid points, with $h$ being the distance between two consecutive points. If we employ linear Lagrange polynomials to discretize the variational equation corresponding to the Helmholtz operator, the resulting 3-point stencil for an interior node can be stated as

$$ \left(-\frac{1}{h} - \frac{k^2 h}{6}\right)\phi_{n-1} + \left(\frac{2}{h} - \frac{2 k^2 h}{3}\right)\phi_n + \left(-\frac{1}{h} - \frac{k^2 h}{6}\right)\phi_{n+1} = 0 . \tag{36} $$

Now consider the representation of the eigenfunction $\phi = A \exp(ikx)$, which represents a purely propagating wave with wave number $k$, on the discrete grid points, i.e.

$$ \phi_n = A \exp(i k^h x_n) , \tag{37} $$

where $x_n = nh$ is the coordinate of the $n$th grid point, and $k^h$ is the numerical wave number due to approximation on the given grid. Inserting (37) in (36), we obtain

$$ \cos(k^h h) = \xi , \qquad \xi := \frac{1 - (kh)^2/3}{1 + (kh)^2/6} . \tag{38} $$

From (38), observe that if $kh \le \sqrt{12}$, then $|\xi| \le 1$ and consequently $k^h$ is purely real. However, for $kh > \sqrt{12}$, $|\xi| > 1$ and in this case $k^h$ is complex even though the exact function is purely propagating. Therefore, from this simple analysis, we see that if the wave number normalized with the grid-spacing parameter $h$ exceeds a certain limit, the numerical solution (even for a purely propagating wave) undergoes attenuation. Based on this observation, we require that the characteristic mesh size, $H$, of the coarsest grid in the multilevel splitting of a finite element mesh should at least satisfy the minimum resolution requirement, i.e. for linear finite elements,

$$ H < \sqrt{12}/k . $$
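The resolution limit can be checked numerically. The sketch below evaluates the discrete dispersion relation for linear elements on a uniform one-dimensional grid, $\cos(k^h h) = (1 - (kh)^2/3)/(1 + (kh)^2/6)$, which follows from substituting the discrete wave (37) into the 3-point stencil. The function name `numerical_wavenumber` is illustrative and not taken from the paper; the branch of the complex arccosine is whatever Python's `cmath.acos` returns, so only the presence of an imaginary part (not its sign) should be read from the result.

```python
import cmath


def numerical_wavenumber(k, h):
    """Numerical wave number k^h for linear elements on a uniform 1D grid.

    Solves cos(k^h h) = (1 - (kh)^2/3) / (1 + (kh)^2/6) for k^h.
    The right-hand side leaves [-1, 1] when kh > sqrt(12), in which case
    the returned value has a nonzero imaginary part (attenuation).
    """
    x = (k * h) ** 2
    xi = (1.0 - x / 3.0) / (1.0 + x / 6.0)
    return cmath.acos(xi) / h


# Sweep the normalized wave number kh across the limit sqrt(12) ~ 3.464.
k = 1.0
for kh in (0.1, 1.0, 3.0, 3.4, 3.6, 4.0):
    kh_num = numerical_wavenumber(k, kh / k)
    kind = "real" if abs(kh_num.imag) < 1e-12 else "complex"
    print(f"kh = {kh:3.1f}: k^h is {kind}")
```

For small $kh$ the computed $k^h$ is close to $k$, which is the usual statement that a well-resolved wave is propagated with little dispersion error.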

6.3. Matrix-free computations

For exterior problems in acoustics, the discretization of the nonlocal Dirichlet-to-Neumann map results in a complex-valued and fully populated matrix. The amount of memory required for storing such a matrix depends quadratically on the number of mesh points on the radiation boundary, and it is often prohibitively expensive to store the dense global matrix.

One advantage of iterative methods is that global matrices need not be explicitly formed and stored; instead, a direct procedure for computing the matrix-vector product required in the iterative algorithm may be used. Such matrix-free computations can be performed without the explicit assembly of any global matrices, and therefore result in substantial storage reductions. In the case of typical finite element stiffness and mass matrices (e.g. $A_1$ and $A_2$), a matrix-vector product of the form $u = Ap$ can be computed in the following three steps:

(i) Construct element vectors $p^e$, $e = 1, \ldots, n_{el}$, from the global vector $p$ using element connectivity data. Here, $n_{el}$ denotes the total number of elements.

(ii) Evaluate element-level matrix-vector products of the form $u^e = A^e p^e$.

(iii) Obtain the global product-vector $u$ from element vectors $u^e$ using the assembly operator:

$$ u = \mathop{\mathsf{A}}_{e=1}^{n_{el}} u^e . \tag{39} $$
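The three steps above can be sketched for a one-dimensional mesh of linear elements. This is a minimal illustration rather than the authors' implementation: the helper names (`element_stiffness`, `matvec_element_by_element`, `assemble`) and the dense reference matrix are assumptions introduced only so the matrix-free product can be checked against an assembled one.

```python
def element_stiffness(h):
    """2x2 stiffness matrix of a 1D linear element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def matvec_element_by_element(connectivity, element_matrices, p):
    """Compute u = A p without assembling the global matrix A."""
    u = [0.0] * len(p)
    for nodes, Ae in zip(connectivity, element_matrices):
        # Step (i): gather the element vector p^e from the global vector.
        pe = [p[i] for i in nodes]
        # Step (ii): element-level product u^e = A^e p^e.
        ue = [sum(Ae[a][b] * pe[b] for b in range(len(nodes)))
              for a in range(len(nodes))]
        # Step (iii): assembly (scatter-add) of u^e into the global vector.
        for a, i in enumerate(nodes):
            u[i] += ue[a]
    return u


def assemble(connectivity, element_matrices, n):
    """Reference only: explicitly assembled dense global matrix."""
    A = [[0.0] * n for _ in range(n)]
    for nodes, Ae in zip(connectivity, element_matrices):
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                A[i][j] += Ae[a][b]
    return A


# Four elements of length 0.25 on five nodes.
n_el, h = 4, 0.25
connectivity = [(e, e + 1) for e in range(n_el)]
element_matrices = [element_stiffness(h)] * n_el
p = [0.0, 1.0, 4.0, 9.0, 16.0]
u = matvec_element_by_element(connectivity, element_matrices, p)
```

Only the element matrices and the connectivity are stored; the gather and scatter steps are the only places where global indexing appears, which is what makes the procedure suitable for unstructured meshes.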

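A similar matrix-free implementation of the product with the full matrix $A_{dtn}$ can be achieved as shown next. In two dimensions, the variational term containing the DtN map in (30) and its resulting matrix contribution $A_{dtn}$ are as follows:

$$ (v, S(p))_{\Gamma_R} = \sum_{n=0}^{\infty}{}' \; -\frac{kR}{\pi}\,\frac{H_n'(kR)}{H_n(kR)} \int_0^{2\pi} \bar{v}(R, \theta) \int_0^{2\pi} \cos n(\theta - \theta')\, p(R, \theta')\, d\theta'\, d\theta , \tag{40} $$

$$ A_{dtn} = \sum_{n=0}^{\infty}{}' \; -\frac{kR}{\pi}\,\frac{H_n'(kR)}{H_n(kR)} \left[ \int_0^{2\pi} N(\theta) \cos n\theta \, d\theta \int_0^{2\pi} N^T(\theta') \cos n\theta' \, d\theta' + \int_0^{2\pi} N(\theta) \sin n\theta \, d\theta \int_0^{2\pi} N^T(\theta') \sin n\theta' \, d\theta' \right] . \tag{41} $$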

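Here, $N(\theta)$ contains nodal basis shape functions corresponding to all nodes on the radiation boundary. The evaluation of the product $A_{dtn} p_{dtn}$, where $p_{dtn}$ is a given vector with components corresponding to nodes on the radiation boundary, can be achieved without the explicit storage of $A_{dtn}$ by expressing the overall product in terms of element-level operations in the following way:

(i) Construct element vectors $p^e_{dtn}$, $e = 1, \ldots, n_{el}^R$, from the global vector $p_{dtn}$ using element connectivity data. Here, $n_{el}^R$ denotes the total number of elements on the radiation boundary.

(ii) Evaluate element-level matrix-vector products of the form:

$$ u^e_{dtn} = \sum_{n=0}^{\infty}{}' \; -\frac{kR}{\pi}\,\frac{H_n'(kR)}{H_n(kR)} \left[ c_n^e \sum_{e'=1}^{n_{el}^R} c_n^{e'} \cdot p^{e'}_{dtn} + s_n^e \sum_{e'=1}^{n_{el}^R} s_n^{e'} \cdot p^{e'}_{dtn} \right] , \tag{42} $$

where

$$ c_n^e = \int_{\partial\Omega^e_R} N_e \cos n\theta \, d\theta , \quad \text{and} \quad s_n^e = \int_{\partial\Omega^e_R} N_e \sin n\theta \, d\theta . \tag{43} $$

Note that the integrals in (43) are evaluated on $\partial\Omega^e_R = \bar{\Omega}^e \cap \Gamma_R$, where $\Omega^e$ denotes an element on the radiation boundary, and $\bigcup_{e=1}^{n_{el}^R} \partial\Omega^e_R = \Gamma_R$.

(iii) Obtain the global product-vector $u_{dtn}$ from element product-vectors $u^e_{dtn}$ using the assembly operator.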
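The structure of this product can be illustrated with a small sketch. It rests on several assumptions that are not in the paper: nodes are uniformly spaced on the circle, shape functions are piecewise-linear hat functions (so the integrals against $\cos n\theta$ and $\sin n\theta$ have a closed form), the sum is truncated to `len(z)` harmonics, and the complex harmonic coefficients `z[n]` (which in the DtN setting would involve the Hankel-function ratio and the factor of one half on the $n = 0$ term) are simply passed in by the caller. The code works with already-assembled nodal vectors rather than element vectors; all function names are hypothetical.

```python
import math


def nodal_harmonic_vectors(n, thetas, dtheta):
    """Integrals of hat functions against cos(n*theta) and sin(n*theta).

    For a hat function of half-width dtheta centred at theta_j, the integral
    against cos(n*theta) equals w_n * cos(n*theta_j), with
    w_0 = dtheta and w_n = 2 (1 - cos(n*dtheta)) / (n^2 dtheta) for n >= 1;
    the sine integral is analogous.
    """
    if n == 0:
        w = dtheta
    else:
        w = 2.0 * (1.0 - math.cos(n * dtheta)) / (n * n * dtheta)
    c = [w * math.cos(n * t) for t in thetas]
    s = [w * math.sin(n * t) for t in thetas]
    return c, s


def dtn_matvec(z, thetas, dtheta, p):
    """Matrix-free u = A p with A = sum_n z[n] (c_n c_n^T + s_n s_n^T).

    Uses only inner products, vector updates and a sum over harmonics;
    cost is O(len(p) * len(z)) and no len(p) x len(p) matrix is stored.
    """
    u = [0j] * len(p)
    for n, zn in enumerate(z):
        c, s = nodal_harmonic_vectors(n, thetas, dtheta)
        alpha = zn * sum(ci * pi for ci, pi in zip(c, p))  # unconjugated
        beta = zn * sum(si * pi for si, pi in zip(s, p))   # unconjugated
        for i in range(len(p)):
            u[i] += alpha * c[i] + beta * s[i]
    return u


def dtn_dense(z, thetas, dtheta):
    """Reference only: explicitly formed dense matrix, O(m^2) storage."""
    m = len(thetas)
    A = [[0j] * m for _ in range(m)]
    for n, zn in enumerate(z):
        c, s = nodal_harmonic_vectors(n, thetas, dtheta)
        for i in range(m):
            for j in range(m):
                A[i][j] += zn * (c[i] * c[j] + s[i] * s[j])
    return A
```

Because each harmonic contributes a sum of two rank-one terms with real vectors and a complex scalar, the resulting matrix is complex symmetric but not Hermitian, in line with the properties stated for $A_{dtn}$.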

In the above form, the computation of $u_{dtn}$ proceeds locally within elements on the radiation boundary, and involves only element-level vector inner-products, element-level vector updates and a global sum. A comparison of memory requirements for the assembled form of $A_{dtn}$ and the matrix-free implementation is illustrated in Fig. 3. The comparison indicates that for large-scale problems, with sufficient boundary refinement in two or three dimensions, memory savings of up to three orders of magnitude can be obtained with the matrix-free approach.

From the standpoint of operation counts, while a matrix-vector product with the assembled DtN matrix requires $O(N_{dtn}^2)$ operations, the matrix-free approach involves only $O(N_{dtn} \times (N + 1))$ operations. Therefore, the matrix-free procedure is significantly less expensive since typically $(N + 1) \ll N_{dtn}$. Here, $N_{dtn}$ denotes the number of mesh points on the DtN boundary and $N$ denotes the number of terms in the truncated DtN map. Due to their simple form, the integrals $c_n^e$ and $s_n^e$ can be analytically evaluated a priori, thereby avoiding the overhead of numerical quadrature. This rearrangement of computations provides a computationally efficient matrix-free interpretation of the nonlocal DtN, and allows the use of this exact boundary condition without any storage penalties related to its non-local nature.

Fig. 3. Memory requirements for performing matrix-vector products with nonlocal DtN operator using the full matrix and matrix-free schemes (horizontal axis: number of points on DtN boundary).

7. Numerical examples

A canonical problem in structural acoustics is the scattering of plane waves from a rigid body submerged in a fluid of infinite extent. We consider the case of two-dimensional plane-wave scattering from a rigid cylinder with conical-to-spherical end caps, and a large length to diameter ratio $L/d = 8.0$, as shown in Fig. 4. Representing the total pressure field due to scattering by the sum decomposition $p = p_{sc} + p_{inc}$, where $p_{inc}$ is the prescribed incident plane wave and $p_{sc}$ is the unknown scattered field, the boundary value problem (24)-(26) can be formulated for just the unknown part $p_{sc}$ by taking $f = 0$ in (24) and imposing the boundary condition (25) as

$$ \nabla p_{sc} \cdot n = -\nabla p_{inc} \cdot n \quad \text{on } \Gamma . \tag{44} $$

We consider a plane wave incident along the length of the cylinder such that $p_{inc} = e^{ikx}$, where $i = \sqrt{-1}$ and $x$ is the coordinate along the length of the cylinder. The DtN map (27) is applied on a circular boundary of radius $R = 1$.

Fig. 4. Two-dimensional trace of a cylinder with conical-to-spherical end caps.


The preconditioned quasi-minimal residual (QMR) algorithm [9,11] is employed for iterative solution of the linear systems. In particular, we use a complex-symmetric version of QMR [11] that reduces the amount of computational work and storage requirements by approximately half compared to the general non-Hermitian case. In the results that follow, iterations were stopped when the true residual at iteration $n$, $r_n = f - K d_n$, satisfied $\|r_n\| < 10^{-’} \cdot \|f\|$. The performance of the following preconditioners was studied with increasing (i) mesh refinement and (ii) frequency of analysis:

- $M_D = \mathrm{diag}(K)$, which is the diagonal preconditioner.

- $M_{SSOR} = (D + \omega L) D^{-1} (D + \omega U)$, which is the SSOR preconditioner. Here, $K = L + D + U$, with $L$ and $U$ strictly lower and upper triangular, and $D = \mathrm{diag}(K)$ is nonsingular; also, we fix $\omega = 1.2$. While an efficient serial implementation of $M_{SSOR}$ can be done using the Eisenstat trick [7], the numerical results described below do not exploit that trick.

- $M_{HBDS} = (P^T \mathrm{diag}(K)^{-1} P)^{-1}$, which combines hierarchical basis transformations with diagonal scaling.

- M,,,, =(PTIK;l ;I-‘+‘, which combines hierarchical basis transformations with the solution of the coarse level unknowns. A direct profile solver was employed to perform the coarse level solution.
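As an illustration of what applying one of these preconditioners involves, the following sketch solves $M_{SSOR}\, z = r$ by one forward and one backward triangular sweep, using the factored form $(D + \omega L) D^{-1} (D + \omega U)$ directly. It is a plain reference version (without the Eisenstat trick) written for a small dense matrix stored as a list of lists; the function name and the dense storage are assumptions made for illustration, whereas a practical code would use a sparse or profile format.

```python
def apply_ssor_inverse(K, r, omega=1.2):
    """Solve M z = r with M = (D + omega*L) D^{-1} (D + omega*U).

    K = L + D + U with L, U strictly lower/upper triangular and D = diag(K).
    Step 1: forward sweep   (D + omega*L) y = r.
    Step 2: backward sweep  (D + omega*U) z = D y.
    """
    n = len(r)
    y = [0j] * n
    for i in range(n):
        lower = sum(K[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - omega * lower) / K[i][i]
    z = [0j] * n
    for i in reversed(range(n)):
        upper = sum(K[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (K[i][i] * y[i] - omega * upper) / K[i][i]
    return z


# Small complex symmetric example.
K = [[4 + 1j, 1.0, 0.5j],
     [1.0, 3 - 2j, -1.0],
     [0.5j, -1.0, 5.0]]
r = [1 + 0j, 2 - 1j, 0.5j]
z = apply_ssor_inverse(K, r)
```

The two sweeps are inherently sequential in the row index, which is the reason triangular-solve preconditioners are awkward on vector-parallel machines, whereas diagonal scaling and the hierarchical basis transformations are not subject to this dependency.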

In order to illustrate the character of the solution field, we plot the pressure contours in the fluid domain $\Omega$ for plane-wave scattering at $kd = \pi$ in Fig. 5.

7.1. Convergence with mesh refinement

The effect of mesh refinement is examined first by fixing the frequency of analysis and solving the problem on successively refined meshes. In order to obtain hierarchical basis preconditioners for a given mesh, a multilevel splitting which consists of nested grid levels is constructed. Fig. 6 illustrates such a splitting for a mesh with 24 × 128 (radial × circumferential) elements and 3200 nodes; the coarse mesh in this case consists of 3 × 16 elements and 64 nodes, and intermediate mesh levels are obtained by successive uniform refinement. We denote by 'HB Levels' the total number of hierarchical levels in such a splitting.

Table 2 summarizes the number of iterations required for convergence with various preconditioners at $kd = \pi/6$. We observe that both $M_I$, which denotes the unpreconditioned algorithm, and $M_D$ suffer substantial deterioration in iteration count as the mesh size decreases. In contrast, the performance of preconditioners employing hierarchical basis transformations is least sensitive to decreasing mesh size, and consequently these are very effective in reducing the total number of iterations. Although $M_{SSOR}$ provides some reduction in iteration counts, its performance does not scale as well with decreasing mesh size.

Fig. 7 shows the effect of increasing mesh refinement on the number of iterations for convergence at $kd = \pi/2$ and $\pi$. A comparison of these iteration counts again suggests that both $M_{HBDS}$ and $M_{HBCS}$ provide a convergence rate that scales very favorably with decreasing mesh size.

Fig. 5. Pressure (real part) contours in the fluid domain for scattering at $kd = \pi$.

Fig. 6. Hierarchical splitting of the mesh with 24 × 128 elements that is used to form $M_{HBDS}$ and $M_{HBCS}$ at $kd = \pi/6$ (levels shown: 24 × 128, 12 × 64, 6 × 32, 3 × 16).

Table 2
Iteration counts for the scattering problem at $kd = \pi/6$

| Mesh | kd | $M_I$ | $M_D$ | $M_{SSOR}$ | HB Levels | $M_{HBDS}$ | $M_{HBCS}$ |
|---|---|---|---|---|---|---|---|
| 12 × 64 | π/6 | 124 | 116 | 57 | 3 | 56 | 47 |
| 24 × 128 | π/6 | 286 | 244 | 105 | 4 | 70 | 62 |
| 48 × 256 | π/6 | 592 | 468 | 197 | 5 | 83 | 78 |
| 96 × 512 | π/6 | 1372 | 1083 | a | 6 | 102 | 96 |

a Iteration count not available.

7.2. Convergence with increase in frequency

Next, we examine the performance of various preconditioners on a fixed mesh under increasing frequency of analysis. Recall that as the frequency of analysis is increased, it is necessary to adapt the hierarchical levels in the multilevel splitting such that the coarsest level satisfies the limit of resolution, i.e. $H < \sqrt{12}/k$, where $H$ is the coarse mesh size. In order to satisfy this limit, meshes with 6 × 32 and 12 × 64 elements were used as coarse levels at $kd = \pi/2$ and $\pi$, respectively.

To further study the effect of the coarse-grid size $H$ on the performance of hierarchical basis preconditioners, we consider solving the problem at a given frequency on a fixed mesh (one with 12 544 unknowns) but with different coarse grid sizes. The characteristic mesh size, $h$, is taken to be determined by $h^e_{max}$, the maximum of the two diagonal lengths in a quadrilateral element. Then, for the various meshes considered here, $h$ approximately evaluates to the values shown in Table 3. Table 4 shows iteration counts obtained using different hierarchical splittings of the given mesh (with 12 544 unknowns), with the hierarchical basis preconditioner $M_{HBDS}$. These results show that the lowest iteration counts are achieved for hierarchical splittings that have a coarse grid size such that $kH$ is less than, but close to, $\sqrt{12}$ (= 3.464). In fact, we have always observed this result, and therefore the HB preconditioners $M_{HBDS}$ and $M_{HBCS}$ were always obtained with this guideline in view.

Fig. 8 illustrates the growth in iteration counts for different preconditioners on a fixed mesh with $kd = \pi/6$, $\pi/2$ and $\pi$. A comparison of the performance of $M_D$ and $M_{HBDS}$ indicates that iteration counts obtained with $M_{HBDS}$ are 4 to 8 times lower than with diagonal scaling at $kd = \pi/6$ and $\pi/2$. However, the iteration counts for $M_{HBDS}$ do not remain as favorable when the wavenumber increases to $kd = \pi$. This can be attributed to the difference in choice of coarse levels as frequency increases. The preconditioner $M_{HBCS}$ accounts for this dependence on the choice of the coarse mesh and provides a rate of convergence which is almost independent of mesh and frequency.

Fig. 7. Convergence with mesh refinement for: (a) $kd = \pi/2$; (b) $kd = \pi$ (horizontal axis: number of unknowns).
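The coarse-grid guideline can be turned into a small selection routine. The sketch below is an illustration under stated assumptions: the level labels and characteristic sizes are the approximate values listed in Table 3, the wave numbers are the values $\pi/6$, $\pi/2$ and $\pi$ used in the experiments, and the function name `coarsest_admissible_level` is hypothetical. It simply returns the coarsest level whose size $H$ satisfies $kH < \sqrt{12}$.

```python
import math


def coarsest_admissible_level(k, level_sizes):
    """Pick the coarsest level whose mesh size H satisfies k*H < sqrt(12).

    level_sizes maps a level label to its characteristic mesh size.
    Returns (label, H, k*H), or None if no listed level is fine enough.
    """
    limit = math.sqrt(12.0)
    admissible = [(H, name) for name, H in level_sizes.items() if k * H < limit]
    if not admissible:
        return None
    H, name = max(admissible)  # largest admissible H is the coarsest level
    return name, H, k * H


# Approximate characteristic sizes from Table 3.
levels = {
    "3 x 16": 4.0,
    "6 x 32": 2.0,
    "12 x 64": 1.0,
    "24 x 128": 0.5,
    "48 x 256": 0.25,
}

for k in (math.pi / 6, math.pi / 2, math.pi):
    name, H, kH = coarsest_admissible_level(k, levels)
    print(f"k = {k:.3f}: coarse mesh {name}, kH = {kH:.2f}")
```

With these inputs the routine selects the 3 × 16, 6 × 32 and 12 × 64 meshes for the three wave numbers, which are the coarse levels reported for the corresponding experiments.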

7.3. Computational performance

A cost evaluation of preconditioning algorithms is presented for implementations on a single workstation and on a distributed-memory parallel computer. Although we consider uniform nested meshes, computational times presented here do not exploit their special structure but are based on general data structures and algorithms for unstructured meshes as described in Section 5.

The computational effort required within each iteration of the complex-symmetric QMR algorithm [11] involves evaluation of: (i) a matrix-vector product, (ii) preconditioning steps, (iii) 2 dot products, and (iv) 4 saxpy operations. In order to compare the computational performance of various preconditioners, we summarize elapsed times required for performing the matrix-vector product, the preconditioning steps and the overall time spent within each iteration of QMR.

Table 5 presents such a summary of elapsed times for various preconditioners on a Sun SPARCstation 10 workstation. The computation of matrix-vector products was performed using the element-by-element procedure described in Section 6.3. The preconditioner $M_{SSOR}$ was stored using a profile storage scheme. A comparison of time per iteration required for various preconditioners, together with the iteration counts for convergence at $kd = \pi/6$, indicates that significant savings in overall solution times are obtained with $M_{HBDS}$ and $M_{HBCS}$ as problem size increases. Note that although $M_{SSOR}$ provides acceptable iteration counts, without using Eisenstat's trick it is computationally ineffective due to excessive computational and storage requirements. The elapsed times for solution at $kd = \pi/2$ and $\pi$ are presented in Table 6. An examination of these results suggests that as frequency increases, the savings in total solution times with $M_{HBCS}$ become increasingly significant, with observed factors of improvement between 6 and 13 times over $M_D$ and up to 3 times over $M_{HBDS}$.

Table 3
Characteristic mesh size, h, for various finite element meshes used here

| Mesh | 3 × 16 | 6 × 32 | 12 × 64 | 24 × 128 | 48 × 256 |
|---|---|---|---|---|---|
| Characteristic size h | 4 | 2 | 1 | 0.5 | 0.25 |

Table 4
Effect of varying H on the performance of $M_{HBDS}$ with n = 12 544

| k | kh | Coarse mesh | kH | Number of iterations |
|---|---|---|---|---|
| π/2 | 0.39 | 3 × 16 | 6.28 | 224 |
| π/2 | 0.39 | 6 × 32 | 3.14 | 187 |
| π/2 | 0.39 | 12 × 64 | 1.57 | 271 |
| π | 0.78 | 3 × 16 | 12.56 | 1272 |
| π | 0.78 | 6 × 32 | 6.28 | 723 |
| π | 0.78 | 12 × 64 | 3.14 | 506 |

Fig. 8. Convergence with frequency increase for meshes with: (a) n = 12 544; (b) n = 49 664 unknowns (horizontal axis: kd).

Table 5
Elapsed time for performing various operations in each iteration of complex-symmetric QMR on a Sun SPARCstation 10, and total solution time at $kd = \pi/6$

| Mesh (unknowns) | Quantity | $M_D$ | $M_{SSOR}$ | $M_{HBDS}$ | $M_{HBCS}$ |
|---|---|---|---|---|---|
| 24 × 128 (3200) | Matrix-vector product (s) | 0.44 | 0.44 | 0.44 | 0.44 |
| | Preconditioning (s) | 0.009 | 5.47 | 0.094 | 0.096 |
| | Time/iteration (s) | 0.48 | 5.95 | 0.568 | 0.570 |
| | Total iterations | 244 | 105 | 70 | 62 |
| | Solution time (min:s) | 1:58 | 10:25 | 0:40 | 0:35 |
| 48 × 256 (12 544) | Matrix-vector product (s) | 1.24 | 1.24 | 1.24 | 1.24 |
| | Preconditioning (s) | 0.034 | 55.49 | 0.340 | 0.342 |
| | Time/iteration (s) | 1.39 | 56.85 | 1.69 | 1.70 |
| | Total iterations | 468 | 197 | 83 | 78 |
| | Solution time (min:s) | 10:52 | 186:39 | 2:21 | 2:13 |
| 96 × 512 (49 664) | Matrix-vector product (s) | 5.69 | a | 5.69 | 5.69 |
| | Preconditioning (s) | 0.083 | a | 1.93 | 1.88 |
| | Time/iteration (s) | 6.22 | a | 8.06 | 8.01 |
| | Total iterations | 1083 | a | 102 | 96 |
| | Solution time (min:s) | 112:16 | a | 13:42 | 12:49 |

a Profile storage of $M_{SSOR}$ exceeded available disk space.

Table 6
Iterative solution times on a Sun SPARCstation 10 for increasing kd

| Unknowns (coarse mesh) | kd | Quantity | $M_D$ | $M_{SSOR}$ | $M_{HBDS}$ | $M_{HBCS}$ |
|---|---|---|---|---|---|---|
| 12 544 (6 × 32) | π/2 | Time/iteration (s) | 1.39 | 56.85 | 1.69 | 1.71 |
| | | Total iterations | 876 | 285 | 187 | 109 |
| | | Solution time (min:s) | 20:29 | 270:05 | 5:18 | 3:06 |
| 12 544 (12 × 64) | π | Time/iteration (s) | 1.39 | 56.85 | 1.69 | 1.89 |
| | | Total iterations | 1377 | 473 | 506 | 143 |
| | | Solution time (min:s) | 31:54 | 448:10 | 14:15 | 4:31 |
| 49 664 (6 × 32) | π/2 | Time/iteration (s) | 6.22 | a | 8.06 | 8.09 |
| | | Total iterations | 1844 | a | 214 | 134 |
| | | Solution time (min:s) | 191:10 | a | 28:45 | 18:04 |
| 49 664 (12 × 64) | π | Time/iteration (s) | 6.22 | a | 8.06 | 8.25 |
| | | Total iterations | 2972 | a | 564 | 173 |
| | | Solution time (min:s) | 308:30 | a | 75:46 | 23:47 |

a Profile storage of $M_{SSOR}$ exceeded available disk space.
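The quoted factors of improvement can be reproduced from the solution times listed in Table 6. The short sketch below does this arithmetic; the dictionary layout and helper names are illustrative choices, and the times are copied from the table for the diagonal, hierarchical-basis diagonal-scaling and hierarchical-basis coarse-solve preconditioners.

```python
def to_seconds(min_s):
    """Convert a 'min:s' string such as '20:29' to seconds."""
    minutes, seconds = min_s.split(":")
    return 60 * int(minutes) + int(seconds)


# Solution times (min:s) from Table 6, ordered as (M_D, M_HBDS, M_HBCS).
table6 = {
    ("12 544", "pi/2"): ("20:29", "5:18", "3:06"),
    ("12 544", "pi"): ("31:54", "14:15", "4:31"),
    ("49 664", "pi/2"): ("191:10", "28:45", "18:04"),
    ("49 664", "pi"): ("308:30", "75:46", "23:47"),
}


def speedups(times):
    """Return (M_D / M_HBCS, M_HBDS / M_HBCS) solution-time ratios."""
    t_d, t_hbds, t_hbcs = (to_seconds(t) for t in times)
    return t_d / t_hbcs, t_hbds / t_hbcs


ratios = {case: speedups(times) for case, times in table6.items()}
for (unknowns, kd), (over_d, over_hbds) in ratios.items():
    print(f"n = {unknowns}, kd = {kd}: "
          f"{over_d:.1f}x over M_D, {over_hbds:.1f}x over M_HBDS")
```

The ratios over diagonal scaling grow with both problem size and frequency, which is the trend the discussion above describes.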

Table 7 presents the storage requirements and elapsed times for performing direct solution of various problem sizes using a complex-symmetric profile solver. A comparison of these direct solution times with the iterative solution times presented in Tables 5 and 6 demonstrates that even for large two-dimensional problems, iterative solution using hierarchical basis preconditioners is faster by about 3 to 6 times, depending on the frequency of analysis. Moreover, the low storage requirements of both of these preconditioners, together with the matrix-free implementation of QMR iterations, make it possible to solve substantially larger problems even on workstations.

Next, we examine the performance of hierarchical basis preconditioners on the CM-5, which is a distributed-memory parallel computer. The parallel code used for these numerical studies was implemented in CM Fortran using the global vector-units execution model of the CM-5 [23]. On vector-parallel platforms such as the CM-5, the application of preconditioners which require solution of triangular systems is often quite expensive because of the difficulty in vectorizing the computations involved in the solution of such systems. In fact, very often data-parallel implementations only employ diagonal preconditioning [4] due to its potential for high parallel performance. A summary of parallel solution times, using diagonal and hierarchical basis preconditioners on a 64-node CM-5, is shown in Table 8. An examination of these solution times demonstrates that savings of about 2.3 to 5 or more times in the total solution time can be obtained with $M_{HBDS}$ over diagonal scaling. Perhaps even more significantly, the preconditioner $M_{HBCS}$ provides savings by factors of 7 to over 12 compared with diagonal scaling.

Table 7
Storage required and solution time for various problem sizes using a complex-symmetric profile solver

| Mesh (unknowns) | Storage (Mbytes) | Solution time (min:s) |
|---|---|---|
| 24 × 128 (3200) | 6.15 | 0:51 |
| 48 × 256 (12 544) | 48.57 | 13:05 |
| 96 × 512 (49 664) | 386.27 | a |

a Profile storage of the coefficient matrix exceeded available disk space.


Table 8
Elapsed time for various operations in each QMR iteration and the total solution time at $kd = \pi$ on a 64-node CM-5 with vector units

| Mesh (unknowns) | Quantity | $M_D$ | $M_{HBDS}$ | $M_{HBCS}$ |
|---|---|---|---|---|
| 48 × 256 (12 544) | Matrix-vector product (s) | 0.17 | 0.17 | 0.17 |
| | Preconditioning (s) | 0.001 | 0.02 | 0.04 |
| | Time/iteration (s) | 0.18 | 0.20 | 0.22 |
| | Total iterations | 1377 | 506 | 143 |
| | Solution time (min:s) | 4:09 | 1:45 | 0:32 |
| 96 × 512 (49 664) | Matrix-vector product (s) | 0.32 | 0.32 | 0.32 |
| | Preconditioning (s) | 0.003 | 0.091 | 0.26 |
| | Time/iteration (s) | 0.34 | 0.42 | 0.60 |
| | Total iterations | 2972 | 564 | 175 |
| | Solution time (min:s) | 16:42 | 4:00 | 1:45 |
| 192 × 1024 (197 632) | Matrix-vector product (s) | 0.57 | 0.57 | 0.57 |
| | Preconditioning (s) | 0.005 | 0.44 | 0.69 |
| | Time/iteration (s) | 0.60 | 1.03 | 1.29 |
| | Total iterations | >5000 | 648 | 219 |
| | Solution time (min:s) | >50:00 | 11:07 | 4:43 |

It is important to note here that preconditioning with $M_{HBCS}$ requires an approximate solution of unknowns corresponding to the coarse level. For the coarse mesh size considered here (832 unknowns) this computational stage lacked data-parallelism, and therefore required special attention to implement in a synchronous data-parallel environment. The results shown in Table 8 are based on performing the coarse solution in serial mode on the control processor using a profile solver. It is noteworthy that despite communication overheads inherent in such a procedure, the preconditioner $M_{HBCS}$ provides substantial reduction in overall iterative solution times. However, the scalability of such an implementation with increasing size of the coarse grid problem may not be acceptable for overall efficiency of the parallel algorithm. Nevertheless, a greater amount of parallelism can be exploited in the coarse solver as the size of the coarse grid problem increases, thereby providing greater flexibility in choosing alternative procedures to perform an approximate solution of coarse unknowns.

8. Conclusions

In this paper we have considered a preconditioning approach which is based on exploiting the properties of h-hierarchical basis functions associated with a multilevel splitting of a given finite element mesh. The formulation of the hierarchical basis preconditioner, introduced in [26], was extended to include an approximate solution of unknowns corresponding to the coarse level in the multilevel splitting in conjunction with transformation from the nodal to hierarchical bases. The data structure and algorithms required for performing the multilevel preconditioning were described for implementations on a workstation and on a distributed-memory parallel computer.

The solution of two-dimensional scattering problems in acoustics was performed with preconditioned QMR in order to examine convergence rates obtained with various preconditioners and illustrate their computational performance. These numerical tests indicate that convergence rates predicted by theoretical bounds, which motivate the formation of h-hierarchical basis preconditioners, are realized on practical discretizations for problems in acoustics. Moreover, by combining hierarchical basis transformations with the solution of a coarse problem, we achieve iterative convergence rates for the Helmholtz equation that depend relatively mildly on both frequency and discretization. Since the preconditioners can be implemented efficiently, the reduction in iteration counts also leads to substantial savings in overall solution times on a workstation and on the CM-5. The results presented in this paper demonstrate that savings of up to an order of magnitude or more in total solution times can be achieved on both serial and parallel platforms, depending on the mesh size and frequency of analysis.


The transformations inherent in hierarchical basis preconditioners can also be applied efficiently in conjunction with matrix-free implementations of a gradient-type iterative algorithm. This is particularly useful in the context of acoustics, where a matrix-free representation of the non-local DtN contribution enables the use of this highly accurate boundary condition without any storage penalties associated with its non-local nature.

In the case of the hierarchical basis preconditioner involving a coarse-grid solve, the choice of a solution algorithm for the approximate solution of coarse-level unknowns depends on the size of the coarse grid and also on the type of computing platform. Since the coarse mesh sizes considered here were reasonably small, this step was performed using a profile solver. Alternative coarse-grid solution methods may need to be examined with increasing size of the coarse grid problem.

In this paper multilevel splittings required for defining hierarchical basis transformations were constructed using uniform mesh refinement. For the case of nonuniform meshes, the construction of a multilevel splitting becomes a difficult task. However, hierarchic grid levels which arise in h-adaptive finite element computations offer a natural framework to extend this approach to nonuniform adaptively refined meshes, and this will be investigated in a separate study.

Acknowledgments

Our thanks are due to Dr Roland Freund of Bell Laboratories for his helpful comments, and to Dr Arthur Raefsky of Centric Engineering Systems, Inc., for several helpful discussions on the nuances of programming the CM-5. We also gratefully acknowledge the support of ONR grant N00014-92-J-1774. This work was supported in part by a grant of HPC time from the DOD HPC Shared Resource Center, AHPCRC, Minnesota, which provided access to the CM-5.

References

[1] O. Axelsson and V.A. Barker, Finite Element Solution of Boundary Value Problems (Academic Press, 1984).

[2] O. Axelsson and P.S. Vassilevski, A survey of multilevel preconditioned iterative methods, BIT 29 (1989) 769-793.

[3] E. Barragy and G.F. Carey, Preconditioners for high degree elements, Comput. Methods Appl. Mech. Engrg. 93 (1991) 97-110.

[4] H. Berryman, J. Saltz, W. Gropp and R. Mirchandaney, Krylov methods preconditioned with incompletely factored matrices on the CM-2, Technical Report 89-54, NASA Langley Research Center, ICASE, Hampton, VA, 1989.

[5] G.F. Carey and E. Barragy, Basis function selection and preconditioning high degree finite element and spectral methods, BIT 29 (1989) 794-804.

[6] J.K. Dickinson and P.A. Forsyth, Preconditioned conjugate gradient methods for three-dimensional linear elasticity, Int. J. Numer. Methods Engrg. 37 (1994) 2211-2234.

[7] S.C. Eisenstat, Efficient implementation of a class of preconditioned conjugate gradient methods, SIAM J. Sci. Statist. Comput. 2 (1981) 1-4.

[8] H.C. Elman and X.Z. Guo, Performance enhancements and parallel algorithms for two multilevel preconditioners, SIAM J. Sci. Statist. Comput. 14(4) (1993) 890-913.

[9] R.W. Freund, Conjugate gradient-type methods for linear systems with complex symmetric coefficient matrices, SIAM J. Sci. Statist. Comput. 13 (1992) 425-448.

[10] R.W. Freund, G.H. Golub and N.M. Nachtigal, Iterative solution of linear systems, Acta Numer. 1 (1992) 57-100.

[11] R.W. Freund and N.M. Nachtigal, An implementation of the QMR method based on coupled two-term recurrences, SIAM J. Sci. Comput. 15 (1994) 313-337.

[12] A. Greenbaum, C. Li and H.Z. Chao, Parallelizing preconditioned conjugate gradient algorithms, Comput. Phys. Comm. 53 (1989) 295-309.

[13] W.D. Gropp and D.E. Keyes, Complexity of parallel implementation of domain decomposition techniques for elliptic partial differential equations, SIAM J. Sci. Statist. Comput. 9(2) (1988) 312-326.

[14] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49 (1952) 409-436.

[15] T.J.R. Hughes, The Finite Element Method (Prentice-Hall, Inc., Englewood Cliffs, NJ, 1987).

[16] T.J.R. Hughes, I. Levit and J. Winget, An element-by-element solution algorithm for problems of structural and solid mechanics, Comput. Methods Appl. Mech. Engrg. 36 (1983) 241-254.

[17] Z. Johan, T.J.R. Hughes, K.K. Mathur and S.L. Johnsson, A data parallel finite element method for computational fluid dynamics on the Connection Machine system, Comput. Methods Appl. Mech. Engrg. 99 (1992) 113-134.

[18] M. Jung, U. Langer and U. Semmler, Two-level hierarchically preconditioned conjugate gradient methods for solving linear elasticity finite element equations, BIT 29 (1989) 748-768.

[19] J.B. Keller and D. Givoli, Exact non-reflecting boundary conditions, J. Comput. Phys. 82(1) (1989) 172-192.

[20] A.H. Schatz, An observation concerning Ritz-Galerkin methods with indefinite bilinear forms, Math. Comput. 28 (1974) 959-962.

[21] T.E. Tezduyar, M. Behr, S.K. Aliabadi, S. Mittal and S.E. Ray, A new mixed preconditioning method for finite element computations, Comput. Methods Appl. Mech. Engrg. 99 (1992) 27-42.

[22] Thinking Machines Corporation, Cambridge, MA, CMSSL for CM Fortran: CM-5 Edition, Version 3.1, June 1993.

[23] Thinking Machines Corporation, Cambridge, MA, CM Fortran Language Reference Manual: CM-5 Edition, Version 2.1, January 1994.

[24] L.L. Thompson and P.M. Pinsky, Complex wavenumber Fourier analysis of the p-version finite element method, Comput. Mech. 13(4) (1993) 255-275.

[25] H. Yserentant, Preconditioning indefinite discretization matrices, Numer. Math. 54 (1990) 719-734.

[26] H. Yserentant, On the multi-level splitting of finite element spaces, Numer. Math. 49 (1986) 379-412.

[27] O.C. Zienkiewicz, D.W. Kelly, J. Gago and I. Babuska, Hierarchical finite element approaches, error estimation and adaptive refinement, in: J.R. Whiteman, ed., The Mathematics of Finite Elements and Applications IV (1981) 314-346.