
ELSEVIER PII: S0965-9978(97)00001-X

Advances in Engineering Software 28 (1997) 239–245
© 1997 Elsevier Science Limited. All rights reserved. Printed in Great Britain
0965-9978/97/$17.00

Automatic generation of efficient routines for evaluating multivariate polynomials arising in finite element computations

Frank Cameron

Laboratory of Electricity and Magnetism, Tampere University of Technology, P.O. Box 692, 33101 Tampere, Finland

(Received 23 March 1995; revised version received 27 November 1996; accepted 23 December 1996)

Multivariate polynomials commonly occur in finite element computations. Horner's method is an efficient way of evaluating multivariate polynomials and their derivatives. We have developed MAPLE procedures that generate efficient C or FORTRAN routines based on Horner's method. Operation counts for some common finite element computations are shown to be significantly lower with these routines than with some other codes from the literature. © 1997 Elsevier Science Limited

Key words: automatic code generation, Horner's method, multivariate polynomial, finite element, shape functions.

1 INTRODUCTION

Multivariate polynomials arise frequently in finite element (FE) computations. A typical coordinate transformation in 3D FE calculations can be expressed as

$$x = \sum_{i=1}^{n} N_i(u,v,w)\,x_i$$

where the $x_i$ are known and the $N_i$ are shape functions or interpolation functions. These $N_i$ are polynomials in $u$, $v$ and $w$, so the summation is simply a multivariate polynomial in $u$, $v$ and $w$.

Polynomials and their derivatives can be evaluated efficiently using Horner's method. This fact is well known, and Horner's method is standard material in numerical analysis textbooks.¹ Perhaps less well known is that Horner's method can be applied to multivariate polynomials.² In spite of the central role played by multivariate polynomials in FE analysis, Horner's method does not appear to have been used in their evaluation.

It would be tedious to write code manually using Horner's method for all the different element types used in FE analysis. We have therefore written procedures in the symbolic computation language MAPLE³ that automatically generate C or FORTRAN routines tailored to a specific element. These C or FORTRAN routines can then be used to evaluate the multivariate polynomials and their derivatives using Horner's method. (Other uses of symbolic computation software in code generation for FE analysis are described in Refs 4–7.)
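For reference, the following minimal C sketch shows univariate Horner's method computing a polynomial and its first derivative in a single pass; the multivariate loop algorithms described later nest this same pair of updates. The function name horner_with_derivative and the highest-degree-first coefficient layout are illustrative choices, not taken from the generated routines.

```c
/* Evaluate p(t) = c[0]*t^(n-1) + c[1]*t^(n-2) + ... + c[n-1] and p'(t)
 * with Horner's method. Coefficients are stored highest degree first
 * (an illustrative convention). Requires n >= 1. */
double horner_with_derivative(const double *c, int n, double t, double *dp)
{
    double p = c[0];   /* running value of the polynomial */
    double d = 0.0;    /* running value of the derivative */
    for (int k = 1; k < n; ++k) {
        d = d * t + p;     /* derivative update uses the old value of p */
        p = p * t + c[k];  /* ordinary Horner step */
    }
    *dp = d;
    return p;
}
```

For example, with c = {2, -6, 2, -1}, i.e. p(t) = 2t^3 - 6t^2 + 2t - 1, and t = 3 the function returns p = 5 and sets *dp = 20.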

In Section 2 we discuss where multivariate polynomials appear in FE analysis and how they can be represented. We describe two different implementations of Horner's method in Section 3. Section 4 contains a short example of how to use our MAPLE procedures to make C or FORTRAN routines that use Horner's method. In Section 5 we compare the operation counts of these Horner-based routines with those of other routines found in the literature.

2 MULTIVARIATE POLYNOMIALS

2.1 FE computations

The multivariate polynomials we are concerned with arise from coordinate transformations commonly used in FE calculations. Let us consider a 3D element whose nodal coordinates are $(x_i, y_i, z_i)$, $i = 1, 2, \ldots, n$. The global $x$ coordinate at any point in this element is then typically defined by⁸

$$x = \sum_{i=1}^{n} N_i(u,v,w)\,x_i \equiv f_x(\mathbf{x}, u, v, w) \tag{1}$$

where the shape functions $N_i$ are expressed in terms of the reference coordinates $(u, v, w)$. The vector $\mathbf{x}$ contains


240 F. Cameron

Procedure DHORN3, evaluates a trivariate polynomial and its three derivatives using Horner's method
input: the polynomial coefficients a_l, the scalar n_1, the vector n_2 and the matrix n_3
begin
 1.  p = dp1 = dp2 = dp3 = 0
 2.  l = 1
 3.  for i = n_1 to 1 by -1 do
 4.    begin pa = dp2a = dp3a = 0
 5.    for j = n_2(i) to 1 by -1 do
 6.      begin dp3b = 0
 7.      pb = a_l
 8.      l = l + 1
 9.      for k = n_3(i, j) to 2 by -1 do
10.        begin dp3b = dp3b · t_3 + pb
11.        pb = pb · t_3 + a_l
12.        l = l + 1
           end
13.      dp3a = dp3a · t_2 + dp3b
14.      dp2a = dp2a · t_2 + pa
15.      pa = pa · t_2 + pb
         end
16.    dp3 = dp3 · t_1 + dp3a
17.    dp2 = dp2 · t_1 + dp2a
18.    dp1 = dp1 · t_1 + p
19.    p = p · t_1 + pa
       end
end

Fig. 1. The DHORN3 algorithm.

$[x_1, x_2, \ldots, x_n]$. Let the analogous expressions for the $y$ and $z$ coordinates be $f_y$ and $f_z$. Transformation (1) may be rewritten as follows:

$$x = \sum_{i=1}^{n} a_i(\mathbf{x})\,u^{\alpha_i} v^{\beta_i} w^{\gamma_i} \tag{2}$$

where no two of the product terms are the same, i.e. $u^{\alpha_i} v^{\beta_i} w^{\gamma_i} \neq u^{\alpha_j} v^{\beta_j} w^{\gamma_j}$ for $i \neq j$. The point in rewriting (1) as (2) is to emphasize the fact that this is a multivariate polynomial in $(u, v, w)$.
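To make the step from (1) to (2) concrete, the C sketch below uses a 4-node bilinear quadrilateral, chosen only as a small illustration. It assumes the reference square $[-1,1]^2$ with nodes ordered $(-1,-1)$, $(1,-1)$, $(1,1)$, $(-1,1)$ and $N_i = (1 + u\xi_i)(1 + v\eta_i)/4$; collecting terms gives $x = a_1 + a_2 u + a_3 v + a_4 uv$, and the code evaluates both forms. All function names are hypothetical.

```c
/* Reference-node coordinates of the 4-node bilinear quadrilateral
 * (assumed ordering: (-1,-1), (1,-1), (1,1), (-1,1)). */
static const double XI[4]  = {-1.0,  1.0, 1.0, -1.0};
static const double ETA[4] = {-1.0, -1.0, 1.0,  1.0};

/* Form (1): x = sum_i N_i(u,v) x_i with N_i = (1 + u*xi_i)(1 + v*eta_i)/4. */
double x_shape_sum(const double xn[4], double u, double v)
{
    double x = 0.0;
    for (int i = 0; i < 4; ++i)
        x += 0.25 * (1.0 + u * XI[i]) * (1.0 + v * ETA[i]) * xn[i];
    return x;
}

/* Coefficients of form (2): x = a[0] + a[1]*u + a[2]*v + a[3]*u*v. */
void bilinear_coefficients(const double xn[4], double a[4])
{
    a[0] = a[1] = a[2] = a[3] = 0.0;
    for (int i = 0; i < 4; ++i) {
        a[0] += 0.25 * xn[i];
        a[1] += 0.25 * XI[i] * xn[i];
        a[2] += 0.25 * ETA[i] * xn[i];
        a[3] += 0.25 * XI[i] * ETA[i] * xn[i];
    }
}

/* Evaluate form (2) in nested Horner form: (a0 + a2*v) + (a1 + a3*v)*u. */
double x_monomial(const double a[4], double u, double v)
{
    return (a[0] + a[2] * v) + (a[1] + a[3] * v) * u;
}
```

For nodal values $x_i$ = {0, 4, 6, 2} the coefficients are a = {3, 2, 1, 0}, and both forms give x = 3.5 at (u, v) = (0.5, -0.5).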

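In addition to computing the transformation (1), another common task in FE analysis is to compute the Jacobian of the transformation:

$$J = \begin{bmatrix} \partial f_x/\partial u & \partial f_x/\partial v & \partial f_x/\partial w \\ \partial f_y/\partial u & \partial f_y/\partial v & \partial f_y/\partial w \\ \partial f_z/\partial u & \partial f_z/\partial v & \partial f_z/\partial w \end{bmatrix} \tag{3}$$

In general, each element of $J$ will be a multivariate polynomial in $u$, $v$ and $w$. We note that the three rows of $J$ are the gradients of $f_x$, $f_y$ and $f_z$. We will form $J$ using these gradients.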

We are concerned with the efficient computation of(2) and its Jacobian matrix (3).
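Because the rows of J are gradients, forming J reduces to evaluating one gradient per coordinate function. The C sketch below shows this for the 2D analogue of (3), assuming purely for illustration that $f_x$ and $f_y$ are bilinear, $f = a_1 + a_2 u + a_3 v + a_4 uv$ (stored 0-based). The function names are hypothetical.

```c
/* Gradient of f = a[0] + a[1]*u + a[2]*v + a[3]*u*v. */
static void bilinear_gradient(const double a[4], double u, double v, double g[2])
{
    g[0] = a[1] + a[3] * v;   /* df/du */
    g[1] = a[2] + a[3] * u;   /* df/dv */
}

/* 2x2 analogue of the Jacobian (3): row 0 = grad f_x, row 1 = grad f_y.
 * Returns det J. */
double jacobian2d(const double ax[4], const double ay[4], double u, double v,
                  double J[2][2])
{
    bilinear_gradient(ax, u, v, J[0]);
    bilinear_gradient(ay, u, v, J[1]);
    return J[0][0] * J[1][1] - J[0][1] * J[1][0];
}
```

For example, with ax = {3, 2, 1, 0} and ay = {1, 0, 2, 1} at (u, v) = (0.5, -0.5), J = [[2, 1], [-0.5, 2.5]] and det J = 5.5.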

2.2 Representation

The polynomials in (1)–(3) are trivariate because they arise from a 3D element. For 2D elements we get bivariate polynomials. We now consider how to represent trivariate and bivariate polynomials efficiently.

One way to express trivariate polynomials is to use the highest occurring powers of the three variables. With this approach a trivariate polynomial in $t_1$, $t_2$ and $t_3$ can be represented by

$$p = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \sum_{k=1}^{m_3} a_l\, t_1^{i-1} t_2^{j-1} t_3^{k-1} \tag{4}$$

where

$$l = k + (j-1)m_3 + (i-1)m_2 m_3 \tag{5}$$

In (4) and (5), $m_r - 1$ is the highest occurring power of $t_r$, $r = 1, 2, 3$. Suppose we used this representation for the coordinate transformation of a 3D quadratic serendipity element. For such an element $m_1 = m_2 = m_3 = 3$, so $p$ in (4) has 27 terms. However, the trivariate polynomial of a 3D quadratic serendipity element has only 20 terms, i.e. the same as the number of nodes. This in turn means that seven of the $a_l$ coefficients in (4) must be 0. We conclude that (4) is not, in general, a very efficient representation for a trivariate polynomial.
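The count above can be checked mechanically. The C sketch below assumes the standard monomial set of the 20-node quadratic serendipity brick, namely all $u^a v^b w^c$ with $a, b, c \le 2$ in which at most one exponent equals 2. It walks the 27 slots of (4) in the order given by (5), confirms that (5) numbers them 1 to 27, and counts the slots whose coefficient must be zero. The helper names are hypothetical.

```c
/* Index l of (5), 1-based: l = k + (j-1)*m3 + (i-1)*m2*m3. */
int index4(int i, int j, int k, int m2, int m3)
{
    return k + (j - 1) * m3 + (i - 1) * m2 * m3;
}

/* Is u^a v^b w^c in the 20-node serendipity set (at most one exponent is 2)? */
static int in_serendipity20(int a, int b, int c)
{
    return (a == 2) + (b == 2) + (c == 2) <= 1;
}

/* Count the zero coefficients representation (4) must store for this element
 * (m1 = m2 = m3 = 3). Returns -1 if (5) does not number the slots 1, 2, ... */
int zero_slots_in_rep4(int *total)
{
    const int m = 3;
    int zeros = 0, expected = 1;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= m; ++j)
            for (int k = 1; k <= m; ++k) {
                if (index4(i, j, k, m, m) != expected++) return -1;
                if (!in_serendipity20(i - 1, j - 1, k - 1)) ++zeros;
            }
    *total = expected - 1;
    return zeros;
}
```

It reports 27 slots, 7 of which hold zero coefficients, in agreement with the count in the text.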

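We propose the following representation for a trivariate polynomial:

$$p = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2(i)} \sum_{k=1}^{n_3(i,j)} a_l\, t_1^{i-1} t_2^{j-1} t_3^{k-1} \tag{6}$$

where

$$l = \sum_{r=1}^{i-1} \sum_{s=1}^{n_2(r)} n_3(r,s) + \sum_{s=1}^{j-1} n_3(i,s) + k \tag{7}$$

In (6) and (7), $n_1$, $n_2(i)$ and $n_3(i,j)$ have the following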

Automatic generation of efficient FE routines

Table 1. Operation counts for the different loop Horner algorithms. $K_{add}$, $K_{mul}$ and $K_=$ are the numbers of additions, multiplications and assignments needed

Algorithm | $K_{add}$ | $K_{mul}$ | $K_=$
DHORN3 | $3K + \sum_{s=1}^{n_1} n_2(s) + 4n_1$ | $2K + \sum_{s=1}^{n_1} n_2(s) + 4n_1$ | $3K + 3\sum_{s=1}^{n_1} n_2(s) + 7n_1 + 5$
HORN3 | $2K + n_1$ | $K + n_1$ | $2K + \sum_{s=1}^{n_1} n_2(s) + 2n_1 + 2$
DHORN2 | $3K + n_1$ | $2K + n_1$ | $3K + 3n_1 + 4$
HORN2 | $2K$ | $K$ | $2K + n_1 + 2$

interpretation: $n_1 - 1$ is the highest occurring power of $t_1$, $n_2(i) - 1$ is the highest power of $t_2$ that appears in conjunction with $t_1^{i-1}$, and $n_3(i,j) - 1$ is the highest power of $t_3$ that appears in conjunction with $t_1^{i-1} t_2^{j-1}$.

Although (6) requires storage space for the vector $n_2$ and the matrix $n_3$, which is not needed in (4), for a given polynomial the number of zero $a_l$ in (6) is never greater than in (4). For example, the 3D quadratic serendipity element can be represented by (6) with no $a_l$ being zero when we set

$$n_1 = 3, \quad n_2 = [3, 3, 2], \quad n_3 = \begin{bmatrix} 3 & 3 & 2 \\ 3 & 3 & 2 \\ 2 & 2 & - \end{bmatrix}$$

Assuming that none of the $a_l$ is zero, the total number of terms in $p$ of (6) is given by $K$:

$$K = \sum_{r=1}^{n_1} \sum_{s=1}^{n_2(r)} n_3(r,s) \tag{8}$$
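As a sketch of how (7) and (8) work in practice, the C code below stores $n_3$ as a row-major array with a fixed leading dimension (an assumption; unused entries are set to 0), computes $K$, and evaluates $l$ from (7). Because (6) lists the coefficients with $k$ varying fastest, then $j$, then $i$, the index $l$ of (7) is just a running counter, which the code also checks. The function names are hypothetical.

```c
#define LD 3  /* assumed leading dimension of n3 (largest n2(i) used here) */

/* K of (8): total number of coefficients in representation (6). */
int count_terms(int n1, const int *n2, const int n3[][LD])
{
    int K = 0;
    for (int r = 0; r < n1; ++r)
        for (int s = 0; s < n2[r]; ++s)
            K += n3[r][s];
    return K;
}

/* l of (7) for 1-based (i, j, k). */
int index7(int i, int j, int k, const int *n2, const int n3[][LD])
{
    int l = k;
    for (int r = 1; r <= i - 1; ++r)          /* all complete t1 blocks */
        for (int s = 1; s <= n2[r - 1]; ++s)
            l += n3[r - 1][s - 1];
    for (int s = 1; s <= j - 1; ++s)          /* complete t2 blocks in row i */
        l += n3[i - 1][s - 1];
    return l;
}

/* Returns 1 if index7 equals a running counter over (i, j, k) in the order of (6). */
int index7_is_sequential(int n1, const int *n2, const int n3[][LD])
{
    int expected = 1;
    for (int i = 1; i <= n1; ++i)
        for (int j = 1; j <= n2[i - 1]; ++j)
            for (int k = 1; k <= n3[i - 1][j - 1]; ++k)
                if (index7(i, j, k, n2, n3) != expected++) return 0;
    return 1;
}
```

For the quadratic serendipity structure $n_1 = 3$, $n_2 = [3, 3, 2]$ and the $n_3$ given above, count_terms returns K = 20, and index7(3, 2, 2, ...) returns 20, the index of the last coefficient.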

We will say that two trivariate polynomials $p$ and $q$ have the same structure if $n_1^p = n_1^q$, $n_2^p = n_2^q$ and $n_3^p = n_3^q$.

In using (6) and (7) to represent a trivariate polynomial we may still have to set some coefficients to 0. For example, the polynomial

$$p = 1 + 2t_1 + 3t_2 + 4t_1^2 t_3 + 5t_3 + 6t_1 t_2 t_3 \tag{9}$$

would have coefficient vector $a = [1, 5, 3, 2, 0, 6, 0, 4]$. Hence two of the eight coefficients are zero. In comparison, the representation of (9) provided by (4) would have 18 coefficients, of which 12 would be zero.
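A direct, non-Horner evaluation of representation (6) is handy as a reference when testing faster routines. The C sketch below assumes coefficients stored in the order of (7) and $n_3$ stored row-major with a fixed leading dimension. For the coefficient vector $a = [1, 5, 3, 2, 0, 6, 0, 4]$ it uses $n_1 = 3$, $n_2 = [2, 2, 1]$ and $n_3 = [[2, 1], [1, 2], [2, -]]$, a structure consistent with that vector. The names are hypothetical.

```c
#define LD3 2  /* assumed leading dimension of n3 for this example */

/* Direct evaluation of (6): sum of a_l * t1^(i-1) * t2^(j-1) * t3^(k-1),
 * with l advancing exactly as in (7). Powers are built incrementally. */
double eval_rep6(const double *a, int n1, const int *n2,
                 const int n3[][LD3], const double t[3])
{
    double p = 0.0;
    double p1 = 1.0;                          /* t1^(i-1) */
    int l = 0;
    for (int i = 0; i < n1; ++i, p1 *= t[0]) {
        double p2 = 1.0;                      /* t2^(j-1) */
        for (int j = 0; j < n2[i]; ++j, p2 *= t[1]) {
            double p3 = 1.0;                  /* t3^(k-1) */
            for (int k = 0; k < n3[i][j]; ++k, p3 *= t[2])
                p += a[l++] * p1 * p2 * p3;
        }
    }
    return p;
}
```

At $t = (2, 2, 3)$ it returns 146, the value of $1 + 2t_1 + 3t_2 + 4t_1^2 t_3 + 5t_3 + 6t_1 t_2 t_3$ at that point.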

For bivariate polynomials arising from 2D elements, $n_3$ and the third summation in (6) are removed. The total number of terms in a bivariate polynomial is then

$$K = \sum_{r=1}^{n_1} n_2(r) \tag{10}$$

3 HORNER'S METHOD FOR MULTIVARIATE POLYNOMIALS

There are two ways Horner's method can be implemented for the evaluation of a multivariate polynomial and its gradient: (1) with do or for loops or (2) with nested brackets. We shall discuss these separately.

3.1 Loop versions

Figure 1 contains the DHORN3 algorithm for evaluating a trivariate polynomial along with its derivatives using Horner's method. DHORN3 uses the trivariate polynomial representation of (6). We could in principle use DHORN3 to evaluate a bivariate polynomial and its derivatives. However, the reduction of DHORN3 to the case of bivariate polynomials is straightforward: remove from DHORN3 the innermost for loop and all variables having a '3' in their name. We do not present this reduction, but we will refer to it as DHORN2. For a bivariate polynomial DHORN2 needs fewer operations than DHORN3.
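The C function below is a line-by-line transcription sketch of Fig. 1, not the paper's generated code. Its conventions are assumptions: n2[i] holds $n_2(i+1)$, n3[i*ld + j] holds $n_3(i+1, j+1)$, and, as in Fig. 1, the coefficients are read sequentially (l = 1, 2, ...) while i, j and k run downwards, so a[0] is the coefficient of the highest-order term, i.e. the array is the coefficient vector of (6) in reverse order. The name dhorn3_sketch is hypothetical.

```c
/* Value and gradient of a trivariate polynomial in representation (6),
 * following the steps of Fig. 1 (step numbers in the comments). */
void dhorn3_sketch(const double *a, int n1, const int *n2, const int *n3,
                   int ld, const double t[3], double *p_out, double dp[3])
{
    double p = 0.0, dp1 = 0.0, dp2 = 0.0, dp3 = 0.0;   /* step 1 */
    int l = 0;                                          /* step 2 (0-based) */
    for (int i = n1 - 1; i >= 0; --i) {                 /* step 3 */
        double pa = 0.0, dp2a = 0.0, dp3a = 0.0;        /* step 4 */
        for (int j = n2[i] - 1; j >= 0; --j) {          /* step 5 */
            double dp3b = 0.0;                          /* step 6 */
            double pb = a[l++];                         /* steps 7-8 */
            for (int k = n3[i * ld + j]; k >= 2; --k) { /* step 9 */
                dp3b = dp3b * t[2] + pb;                /* step 10 */
                pb = pb * t[2] + a[l++];                /* steps 11-12 */
            }
            dp3a = dp3a * t[1] + dp3b;                  /* step 13 */
            dp2a = dp2a * t[1] + pa;                    /* step 14 */
            pa = pa * t[1] + pb;                        /* step 15 */
        }
        dp3 = dp3 * t[0] + dp3a;                        /* step 16 */
        dp2 = dp2 * t[0] + dp2a;                        /* step 17 */
        dp1 = dp1 * t[0] + p;                           /* step 18 */
        p = p * t[0] + pa;                              /* step 19 */
    }
    *p_out = p;
    dp[0] = dp1;   /* dp/dt1 */
    dp[1] = dp2;   /* dp/dt2 */
    dp[2] = dp3;   /* dp/dt3 */
}
```

For example, with n1 = 3, n2 = {2, 2, 1}, n3 = {2, 1, 1, 2, 2, 0} (ld = 2) and a = {4, 0, 6, 0, 2, 3, 5, 1}, the routine evaluates $p = 1 + 2t_1 + 3t_2 + 4t_1^2 t_3 + 5t_3 + 6t_1 t_2 t_3$; at $t = (2, 2, 3)$ it gives p = 146 and gradient (86, 39, 45). Note that each derivative update uses the partial value before that value is itself updated, which is what makes the single pass correct.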

To obtain an algorithm that evaluates only a polynomial and not its derivatives, one should remove all occurrences of variables in DHORN3 and DHORN2 that start with the letters 'dp'. We call the resulting algorithms HORN3 and HORN2. The numbers of additions, multiplications and assignments required by these algorithms are shown in Table 1. The $K$ in Table 1 is given by (8) for trivariate polynomials and by (10) for bivariate polynomials.
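Applying both reductions (drop the innermost loop, the '3' variables and the 'dp' variables) leaves a very small bivariate routine. The C sketch below is one possible reading of HORN2; it keeps the Fig. 1 convention of reading the coefficients from the highest-order term downwards and folds the temporary pb into the update. The name horn2_sketch and its signature are hypothetical and are not the signature of the horn2 routine used in Fig. 2.

```c
/* Value of a bivariate polynomial in the 2D form of representation (6).
 * Assumed conventions: n2[i] holds n_2(i+1); a[] starts with the
 * highest-order coefficient, as in Fig. 1. */
double horn2_sketch(const double *a, int n1, const int *n2, const double t[2])
{
    double p = 0.0;
    int l = 0;
    for (int i = n1 - 1; i >= 0; --i) {
        double pa = 0.0;
        for (int j = n2[i] - 1; j >= 0; --j)
            pa = pa * t[1] + a[l++];   /* Horner step in t2 */
        p = p * t[0] + pa;             /* Horner step in t1 */
    }
    return p;
}
```

For $p = 1 + 2u + 3v + 4uv + 5v^2$ with $u = t_1$ and $v = t_2$, this ordering gives a = {4, 2, 5, 3, 1} with n2 = {3, 2}, and the function returns 83 at (u, v) = (2, 3).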

3.2 Nested version

The nested implementation of Horner's method is characterized by nested brackets. For example, the polynomial

$$p = a_1 + a_2 t_2 + a_3 t_2^2 + a_4 t_1 + a_5 t_1 t_2 + a_6 t_1 t_2^2 + a_7 t_3 + a_8 t_1 t_3 + a_9 t_1 t_3^2 \tag{11}$$

written in nested Horner form is

$$p = a_1 + a_7 t_3 + (a_2 + a_3 t_2)t_2 + \big(a_4 + (a_8 + a_9 t_3)t_3 + (a_5 + a_6 t_2)t_2\big)t_1 \tag{12}$$

The nested form is more efficient than the loop form because it has less overhead. However, the nested form can only be written for a multivariate polynomial of one particular structure (see Section 2.2). If we use a


nested implementation, then we need a separate polynomial evaluation routine for every different type of element. By contrast, the loop implementation is general; e.g. in principle any trivariate polynomial can be evaluated with HORN3.
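The trade-off is easy to see in code. The sketch below writes the nested form (12) as straight-line C, hand-written in the spirit of what a nested Horner generator would emit (it is not output of the paper's poly procedure), next to the expanded form (11) for checking. Coefficients are stored 0-based, a[0] = $a_1$, ..., a[8] = $a_9$; the function names are hypothetical. The nested routine has no loops or index bookkeeping, but it is only valid for polynomials with exactly this structure.

```c
/* Expanded form (11), evaluated term by term. */
double poly11_expanded(const double a[9], const double t[3])
{
    double t1 = t[0], t2 = t[1], t3 = t[2];
    return a[0] + a[1] * t2 + a[2] * t2 * t2 + a[3] * t1 + a[4] * t1 * t2
         + a[5] * t1 * t2 * t2 + a[6] * t3 + a[7] * t1 * t3
         + a[8] * t1 * t3 * t3;
}

/* Nested Horner form (12): t1 outermost, then t2 and t3. */
double poly11_nested(const double a[9], const double t[3])
{
    double t1 = t[0], t2 = t[1], t3 = t[2];
    return a[0] + a[6] * t3 + (a[1] + a[2] * t2) * t2
         + (a[3] + (a[7] + a[8] * t3) * t3 + (a[4] + a[5] * t2) * t2) * t1;
}
```

With $a_1, \ldots, a_9 = 1, \ldots, 9$ and $t = (2, 2, 3)$ both functions return 324.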

The structure of the polynomials in the Jacobian (3) must also be taken into account when using the nested form. Of the four polynomials $f_x(\mathbf{x}, u, v, w)$ of (1), $\partial f_x/\partial u$, $\partial f_x/\partial v$ and $\partial f_x/\partial w$, no two have the same structure; hence no two could be evaluated using the same nested Horner routine. We could force two polynomials to have the same structure by adding zero terms, but this runs counter to the efficient evaluation we are seeking. We show next that the partial derivatives can be evaluated efficiently if they are computed together. We consider the 2D and 3D cases separately.

2D gradients

For the 2D case, we use one subtlety in evaluating the gradient of (2). Let the highest degrees of $u$ and $v$ be $n_u - 1$ and $n_v - 1$, and let $u$ and $v$ be the 'outer' and 'inner' variables, respectively. We may rewrite (2) as

$$x = \sum_{i=0}^{n_u-1} u^i g_i(v) \tag{13}$$

where the $g_i$ are polynomials of degree at most $n_v - 1$. The derivative with respect to $u$ is

$$\frac{\partial x}{\partial u} = \sum_{i=1}^{n_u-1} i\,u^{i-1} g_i(v) \tag{14}$$

where the integer $i$ appears explicitly. However, if we take the derivative with respect to $v$,

$$\frac{\partial x}{\partial v} = \sum_{i=0}^{n_u-1} u^i \frac{dg_i(v)}{dv} \tag{15}$$

the integer factors arising from differentiation are distributed among the $dg_i(v)/dv$ terms. Hence we incur extra multiplications by differentiating with respect to the 'inner' variable. To avoid this, we should rewrite (2) with $v$ as the outer variable and $u$ as the inner variable,

$$x = \sum_{i=0}^{n_v-1} v^i h_i(u) \tag{16}$$

and evaluate $\partial x/\partial v$ based on this.

Let us consider the 2D eight-node serendipity rectangle as an example. The bivariate polynomial (2) for this element has the following form:

$$x = a_1 + a_2 v + a_3 v^2 + a_4 u + a_5 uv + a_6 uv^2 + a_7 u^2 + a_8 u^2 v$$

With $v$ and $u$ as the inner and outer variables, respectively, the partial derivatives in nested Horner form are as follows:

$$\frac{\partial x}{\partial u} = 2(v a_8 + a_7)u + (v a_6 + a_5)v + a_4$$

$$\frac{\partial x}{\partial v} = \big(2(v a_6) + u a_8 + a_5\big)u + 2(v a_3) + a_2$$

Evaluating $\partial x/\partial v$ in this way involves one more multiplication than $\partial x/\partial u$. However, using $v$ as the outer variable when forming $\partial x/\partial v$ gives $\partial x/\partial v = 2(u a_6 + a_3)v + (u a_8 + a_5)u + a_2$, an expression needing only five multiplications. The savings in this example are small, but for higher-order elements they can be more substantial.
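The two five-multiplication expressions translate directly into C. In the sketch below the coefficients are stored 0-based, a[0] = $a_1$, ..., a[7] = $a_8$, following the eight-node polynomial above, and a direct term-by-term derivative is included for checking. The function names are hypothetical.

```c
/* dx/du with u as the outer variable (5 multiplications). */
double dxdu_8node(const double a[8], double u, double v)
{
    return 2.0 * (v * a[7] + a[6]) * u + (v * a[5] + a[4]) * v + a[3];
}

/* dx/dv with v as the outer variable (5 multiplications). */
double dxdv_8node(const double a[8], double u, double v)
{
    return 2.0 * (u * a[5] + a[2]) * v + (u * a[7] + a[4]) * u + a[1];
}

/* Direct differentiation of
 * x = a1 + a2 v + a3 v^2 + a4 u + a5 uv + a6 uv^2 + a7 u^2 + a8 u^2 v. */
void grad_8node_direct(const double a[8], double u, double v, double g[2])
{
    g[0] = a[3] + a[4] * v + a[5] * v * v + 2.0 * a[6] * u + 2.0 * a[7] * u * v;
    g[1] = a[1] + 2.0 * a[2] * v + a[4] * u + 2.0 * a[5] * u * v + a[7] * u * u;
}
```

With $a_1, \ldots, a_8 = 1, \ldots, 8$ at (u, v) = (2, 3), both nested forms give (197, 134), matching the direct derivative.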

3D gradients

In the 3D case we can use common subexpressions to make the evaluation of the gradient of (2) more efficient. We may rewrite (2) as

$$x = \sum_{i=0}^{n_u-1} \sum_{j=0}^{n_v-1} u^i v^j g_{ij}(w) \tag{17}$$

where the $g_{ij}$ are polynomials of degree at most $n_w - 1$. Taking the derivatives with respect to $u$ and $v$ we obtain

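$$\frac{\partial x}{\partial u} = \sum_{i=1}^{n_u-1} \sum_{j=0}^{n_v-1} i\,u^{i-1} v^j g_{ij}(w) \tag{18}$$

$$\frac{\partial x}{\partial v} = \sum_{i=0}^{n_u-1} \sum_{j=1}^{n_v-1} j\,u^i v^{j-1} g_{ij}(w) \tag{19}$$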

Some of the $g_{ij}$ terms that appear in (18) also appear in (19). To exploit this, at least the commonly occurring $g_{ij}$ terms must be evaluated beforehand and stored.

Let us assume that all $g_{ij}$ in (17) have been evaluated beforehand. Then (17) is essentially a bivariate polynomial in $u$ and $v$, and we may use the technique presented for the 2D case above to account efficiently for the $i$ and $j$ factors arising from differentiation in (18) and (19). Of course, we need not evaluate all $g_{ij}$ beforehand; we only need those $g_{ij}$ that are common to both (18) and (19).

The third component of the gradient must be

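computed by itself. By expressing the multivariate polynomial as

$$x = \sum_{i=0}^{n_w-1} \sum_{j=0}^{n_v-1} w^i v^j h_{ij}(u) \tag{20}$$

we may write the derivative with respect to $w$ as

$$\frac{\partial x}{\partial w} = \sum_{i=1}^{n_w-1} \sum_{j=0}^{n_v-1} i\,w^{i-1} v^j h_{ij}(u) \tag{21}$$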

Again, if the $h_{ij}$ appearing in (21) are computed beforehand, we may use the technique presented for the 2D case to account for the $i$ factor arising from differentiation.
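The smallest case that shows the sharing is a trilinear polynomial, used here purely as an illustrative assumption: $x = \sum_{i,j,k \in \{0,1\}} c_{ijk} u^i v^j w^k$. The C sketch below evaluates the $g_{ij}(w)$ of (17) once; $g_{11}$ is then used by both (18) and (19), and the $w$-derivative is formed from the $h_{1j}(u)$ of (20) and (21). The function name is hypothetical.

```c
/* Trilinear example: x = sum over i,j,k in {0,1} of c[i][j][k] u^i v^j w^k. */
void trilinear_value_grad(const double c[2][2][2], double u, double v, double w,
                          double *x, double g[3])
{
    /* g_ij(w) of (17): coefficient of u^i v^j, evaluated once and stored. */
    double g00 = c[0][0][0] + c[0][0][1] * w;
    double g01 = c[0][1][0] + c[0][1][1] * w;
    double g10 = c[1][0][0] + c[1][0][1] * w;
    double g11 = c[1][1][0] + c[1][1][1] * w;   /* shared by (18) and (19) */

    g[0] = g10 + g11 * v;              /* dx/du, (18) */
    g[1] = g01 + g11 * u;              /* dx/dv, (19) */
    *x = g00 + g01 * v + g[0] * u;     /* x is linear in u, so reuse dx/du */

    /* h_1j(u) of (20): coefficient of w^1 v^j as a polynomial in u. */
    double h10 = c[0][0][1] + c[1][0][1] * u;
    double h11 = c[0][1][1] + c[1][1][1] * u;
    g[2] = h10 + h11 * v;              /* dx/dw, (21) */
}
```

With c[i][j][k] = 1 + i + 2j + 4k at (u, v, w) = (2, 3, 5) it gives x = 468 and gradient (164, 126, 86).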


Table 2. MAPLE procedures

Name | Description
shapelagr | Makes the $N_i$ in (1) for any 2D or 3D Lagrangian element
shapeser2d, shapeser3d | Makes the $N_i$ in (1) for any 2D or 3D serendipity element
plotlagr | Plots a reference element for 2D or 3D Lagrangian elements
plotser2d, plotser3d | Plots a reference element for 2D or 3D serendipity elements
mvcof | Makes a C or FORTRAN routine that computes the $a_i$ in (2)
poly | Makes a C or FORTRAN routine for evaluating (2) using the nested version of Horner's method
grad2d | Makes a C or FORTRAN routine that uses the nested Horner's method to evaluate the gradient of (2) for a bivariate polynomial
grad3d | Makes a C or FORTRAN routine that uses the nested Horner's method to evaluate the gradient of (2) for a trivariate polynomial

4 MAPLE PROCEDURES

Our MAPLE procedures are given in Table 2.† With these procedures one can generate C or FORTRAN routines that evaluate bivariate or trivariate polynomials and their gradients using Horner's method. At present we have procedures for only two classes of element, serendipity and Lagrangian, but other element classes can easily be added.

We now present a sample MAPLE session in which we generate C routines that use the nested Horner's method for evaluating the coordinate transformation (2) and its gradient. Our element in this session is a 2D quadratic serendipity element. This element has four sides, and each side has three interpolation points. We first put this information into a MAPLE array:

npt := [3,3,3,3]:

The command

Nser2 := shapeser2d(npt, u, v, 'intpts'):

produces in Nser2 the eight shape functions and in intpts the reference coordinates {u, v} of the eight interpolation points. For example, the first shape function is obtained with

simplify(Nser2[1]);

Next we set up a dummy array corresponding to $\mathbf{x}$ in (1):

x := array(1..8):

The command

mvcof(Nser2, x, u, v, w, C):

generates a C routine, mvcof2.c, that computes the $a_i$ coefficients in (2) for our element when given the $x_i$ values in (1). (The w argument of mvcof is used for trivariate polynomials; in our 2D case w must be present, but it has no meaning.) To produce a C routine

†These procedures and some related material can be obtained via anonymous ftp from ftp://ftp.cc.tut.fi/pub/frank/elemhorn.

called poly.c that evaluates (2), we execute

poly(Nser2, x, u, v, w, C):

The command

grad2d(Nser2, x, u, v, C):

generates a C routine, grad2.c, for evaluating the gradient of (2) using the ideas given in Section 3.2.

5 EXAMPLES

In this section we use the C routines generated by our MAPLE procedures for two tasks:

Task 1: compute transformation (1) for all coordinates.
Task 2: compute transformation (1) and its Jacobian for all coordinates.

For these two tasks we compare the operation counts of our C routines with the corresponding counts for the code of Zavarise et al.⁹ and the code of Zienkiewicz & Taylor.⁸ Zavarise et al. have written an elegant routine for calculating the coordinate transformation (1), and if desired its derivatives, for any 2D or 3D rectangular serendipity element. The code in Zienkiewicz & Taylor's book is for 2D rectangular elements with four to nine nodes. The nine-node case is the quadratic Lagrange element, and all other cases can be said to belong to the 2D serendipity family. The elements for which we determine operation counts are given in Table 3.

5.1 Task 1

Figure 2 contains two fragments of MAPLE-generated C code, both of which perform Task 1 for a 2D element. Arrays x and y contain the $x_i$ and $y_i$ coordinates of $f_x$

Table 3. Test elements

Element number | Description
1 | 8-node serendipity 2D rectangle
2 | 12-node serendipity 2D rectangle
3 | 9-node Lagrange 2D rectangle
4 | 20-node serendipity 3D rectangle
5 | 32-node serendipity 3D rectangle


Fragment A (loop Horner)

    mvcof(x, ax, n2x, &n1x, px);
    horn2(ax, n1x, n2x, t, px, &x0);
    mvcof(y, ay, n2y, &n1y, py);
    horn2(ay, n1y, n2y, t, py, &y0);

Fragment B (nested Horner)

    mvcof(x, ax, n2x, &n1x, px);
    poly(ax, t, &x0);
    mvcof(y, ay, n2y, &n1y, py);
    poly(ay, t, &y0);

Fig. 2. Two fragments of C code for performing Task 1 for a 2D element.

Fragment A (loop Horner)

    mvcof2(x, ax, n2x, &n1x, px);
    dhorn2(ax, n1x, n2x, t, px, &x0, dx0);
    mvcof2(y, ay, n2y, &n1y, py);
    dhorn2(ay, n1y, n2y, t, py, &y0, dy0);

Fragment B (nested Horner)

    mvcof2(x, ax, n2x, &n1x, px);
    poly(ax, t, &x0);
    grad2(ax, t, dx0);
    mvcof2(y, ay, n2y, &n1y, py);
    poly(ay, t, &y0);
    grad2(ay, t, dy0);

Fig. 3. Two fragments of C code for performing Task 2 for a 2D element.

and $f_y$ of (1). The values of the reference coordinates $u$ and $v$ are in array t. Both fragments of code compute the global coordinates in x0 and y0. Fragment A uses the loop Horner routine horn2 described in Section 3.1. Routine poly in Fragment B uses the nested Horner's method described in Section 3.2.

Table 4 contains operation counts for three code alternatives for the test elements of Table 3. The numbers of additions, multiplications and assignments needed are given by $K_{add}$, $K_{mul}$ and $K_=$, respectively. Comparing the alternatives in Table 4, we see there is a price to be paid for generality: the dershafn code is the most generally applicable, but it also has the largest operation counts. The most extreme case is element number 1, where the operation count of dershafn is more than twice that of the nested Horner code. As expected, the loop Horner code is not as efficient as the nested Horner code, but it is still more efficient than dershafn.

5.2 Task 2

Figure 3 contains two fragments of C code that use MAPLE-generated routines to compute the coordinate transformation (1) and its Jacobian matrix for a 2D element. The first row of the Jacobian matrix is in array dx0 and the second row is in array dy0. All other variables have the same interpretation as in Fig. 2.

The differences in operation counts for Task 2 are even greater than those for Task 1 (see Table 5). The count of dershafn is as much as three times that of

Table 4. Operation counts for Task 1. The values in brackets correspond to ($K_{add}$, $K_{mul}$, $K_=$)

Alternative | Element 1 | Element 2 | Element 3 | Element 4 | Element 5
1 | (58, 40, 66) | (140, 90, 104) | (58, 46, 66) | (369, 162, 243) | (984, 363, 417)
2 | (76, 42, 106) | (166, 92, 162) | (78, 48, 110) | (441, 174, 408) | (1095, 378, 672)
3 | (140, 116, 148) | (268, 220, 244) | – | (432, 368, 428) | (840, 704, 736)

Alternative | Description
1 | Nested Horner: (mvcof2.c and poly.c) or (mvcof3.c and poly.c)
2 | Loop Horner: (mvcof2.c and horn2.c) or (mvcof3.c and horn3.c)
3 | dershafn⁹

Table 5. Operation counts for Task 2. The values in brackets correspond to ($K_{add}$, $K_{mul}$, $K_=$)

Alternative | Element 1 | Element 2 | Element 3 | Element 4 | Element 5
1 | (74, 60, 70) | (168, 130, 108) | (78, 70, 70) | (456, 258, 288) | (1134, 540, 483)
2 | (98, 64, 138) | (198, 124, 206) | (102, 72, 144) | (552, 285, 570) | (1263, 546, 909)
3 | (248, 224, 288) | (524, 460, 516) | – | (912, 884, 980) | (1896, 1736, 1840)
4 | (100, 88, 97) | – | (130, 122, 128) | – | –

Alternative | Description
1 | Nested Horner: (mvcof2.c, poly.c and grad2.c) or (mvcof3.c, poly.c and grad3.c)
2 | Loop Horner: (mvcof2.c and dhorn2.c) or (mvcof3.c and dhorn3.c)
3 | dershafn⁹
4 | shape and shap2 (Ref. 8, pp. 539–541)


the nested Horner code. The shape/shap2 code is also more efficient than dershafn. In comparison with the nested Horner code, though, shape/shap2 needs about 40% more operations for element 1 and roughly 70% more for element 3.

REFERENCES

1. Kincaid, D. & Cheney, W. Numerical Analysis. Brooks/Cole, Pacific Grove, CA, 1991.
2. Knuth, D. E. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA, 1971.
3. Char, B. W., Geddes, K. O., Gonnet, G. H., Leong, B. L., Monagan, M. B. & Watt, S. M. Maple V Language Reference Manual. Springer, New York, 1991.
4. Aladjem, M. A. & Mikhailov, M. D. A shape function codewriter. Communications in Applied Numerical Methods, 1988, 4, 807–14.
5. Barbier, C., Clark, P. J., Bettess, P. & Bettess, J. A. Automatic generation of shape functions for finite element analysis using REDUCE. Engineering Computations, 1990, 7, 349–58.
6. Ioakimidis, N. I. Elementary applications of MATHEMATICA to the solution of elasticity problems by the finite element method. Computer Methods in Applied Mechanics and Engineering, 1993, 102, 29–40.
7. Wang, P. S. FINGER: a symbolic system for automatic generation of numerical programs in finite element analysis. Journal of Symbolic Computation, 1986, 2, 305–16.
8. Zienkiewicz, O. C. & Taylor, R. L. The Finite Element Method, Vol. 1, 4th edn. McGraw-Hill, London, 1989.
9. Zavarise, G., Vitaliani, R. & Schrefler, B. An algorithm for generation of shape functions in serendipity elements. Engineering Computations, 1991, 8, 19–31.
