
Algorithm 867: QUADLOG—A Package of Routines for Generating Gauss-Related Quadrature for Two Classes of Logarithmic Weight Functions

NELSON H. F. BEEBE

and

JAMES S. BALL

University of Utah

A collection of subroutines and examples of their uses are described for the quadrature method developed in the companion article. These allow the exact evaluation (up to computer truncation and rounding errors) of integrals of polynomials with two general types of logarithmic weights, and also with the corresponding nonlogarithmic weights. The recurrence coefficients for the related nonclassical orthogonal polynomials with logarithmic weight functions can also be obtained. Tests of accuracy on various platforms are presented.

The routines are usable from Fortran, C, and C++ programs conforming to any of at least six international programming-language standards.

Categories and Subject Descriptors: G.1.2 [Numerical Analysis]: Approximation—Special function approximations; G.1.4 [Numerical Analysis]: Quadrature and Numerical Differentiation (F.2.1)—Gaussian quadrature; G.4 [Mathematical Software]: Mathematical Software—Algorithm design and analysis, certification and testing, documentation, reliability and robustness, user interfaces

General Terms: Algorithms

Additional Key Words and Phrases: EISPACK pythag() function, gamma-function testing, Gauss-Chebyshev quadrature, Gauss-Jacobi quadrature, Gauss-Laguerre quadrature, Gauss-Legendre quadrature, Gauss-type quadrature, logarithmic integrals, machine-epsilon testing, Maple symbolic algebra system, Mehler quadrature, orthogonal polynomials, psi-function testing, software portability, software testing

ACM Reference Format:
Beebe, N. H. F. and Ball, J. S. 2007. Algorithm 867: QUADLOG—A package of routines for generating Gauss-related quadrature for two classes of logarithmic weight functions. ACM Trans. Math. Softw. 33, 3, Article 20 (August 2007), 30 pages. DOI = 10.1145/1268769.1268774 http://doi.acm.org/10.1145/1268769.1268774

Authors’ addresses: N. H. F. Beebe, University of Utah, Department of Mathematics, 155 S 1400 E, Rm 233, Salt Lake City, UT 84112-0090; email: [email protected]; J. S. Ball, University of Utah, Department of Physics, Salt Lake City, UT 84112-0830.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2007 ACM 0098-3500/2007/08-ART20 $5.00 DOI 10.1145/1268769.1268774 http://doi.acm.org/10.1145/1268769.1268774

ACM Transactions on Mathematical Software, Vol. 33, No. 3, Article 20, Publication date: August 2007.

1. INTRODUCTION

In the previous article [Ball and Beebe 2007], high-accuracy quadrature methods were derived for evaluating integrals of the forms

$$\int_0^\infty dx\, x^\alpha e^{-x} \ln x\, f(x) \qquad (\alpha > -1)$$

and

$$\int_{-1}^{1} dx\, (1-x)^\alpha (1+x)^\beta \ln(1+x)\, f(x) \qquad (\alpha > -1,\ \beta > -1),$$

and explicit methods for computing the necessary coefficients were given. Because the treatment of these two integrals differs somewhat, the routines for each type of quadrature are discussed separately.

2. IMPLEMENTATION AND PORTABILITY ISSUES

The algorithms required for the quadratures discussed later in Sections 4 and 5, and their testing, have been implemented in ANSI/ISO Standard Fortran 77 [ANSI 1978] code, with two exceptions to the Standard discussed below.

The code compiles equally well with compilers conforming to the Fortran 90 [ISO and IEC 1991] and Fortran 95 [ISO 1997; Adams et al. 1997] Standards, and to High Performance Fortran [High Performance Fortran Forum 1992], although some compilers may require an additional option to specify that the code is in fixed, rather than free, format.

All files are encoded with the 128-character ASCII/ISO 646 Standards character set [ANSI 1986; ISO 1983, 1991]. This character set is a subset of all current computer character sets, other than the IBM mainframe EBCDIC set.

Filenames conform to the least-common denominator of the IBM PC DOS file system and the ISO 9660 standard for filenames on CD-ROMs [ISO 1988]. They should be portable to all current commercially available computing systems without change.

Variable and routine names are restricted to no more than six characters, with lettercase not significant. This satisfies restrictions of Fortran 77 and C 89 [ISO 1990; Schildt et al. 1990], even though later Standards for these languages are more liberal, even eliminating arbitrary limits on identifier lengths.

There are two exceptions to the Fortran 77 Standard in our Fortran code:

(1) Mixed lettercase is used for improved readability: Fortran keywords and parameter names are spelled in uppercase, and all other names in lowercase. Comments and character strings are written in mixed case, and include standard TeX [Knuth 1984] markup where mathematical material is required.

(2) Shared symbolic constants are defined in PARAMETER statements and placed in separate files to avoid code duplication. Non-Standard, but now almost universally implemented, INCLUDE statements are used to incorporate the shared definitions into the various Fortran files at compile time. These files are never required by user code.

We know of no current Fortran compilers that lack support for these extensions. The Fortran 90 and 95 Standards permit, but do not require support for, lowercase letters, and both require the INCLUDE statement.

All Fortran code has been prettyprinted, using tools developed by the first author (NHFB). All variables and routines are explicitly declared, using declarations generated by the very useful Fordham University Fortran portability checker and static analyzer, ftnchek [Moniot 1991]. Except for the test programs, the package is entirely free of I/O statements, and there are no STOP or CALL EXIT statements to terminate processing prematurely.

No features unique to Fortran 90 or 95 have been used in the code, and no source code changes were required to use compilers for these languages, although some compilers required notification of the fixed source format.

In order to make the code acceptable to High Performance Fortran compilers, it was necessary to change argument array dimensions of the form A(N) and A(1) to A(*) in two low-level routines borrowed from EISPACK [Smith et al. 1976; Garbow et al. 1977] and LINPACK [Dongarra et al. 1979]. No other code changes were required to accommodate this extended dialect of Fortran. We chose those two routines over more modern LAPACK [Anderson et al. 1992] equivalents, because the latter cascade into several other low-level routines that would also have to be included in our package.

We have not implemented the code in the popular C programming language [ISO 1990, 1999], for two reasons. First, many computing environments permit interlanguage calling, so the Fortran code can be used, albeit with some portability loss, from C and C++ programs. Second, the freely available Fortran-to-C converter, f2c [Gay et al. 1989], and at least one commercial product, the excellent Cobalt Blue translator [Cobalt Blue, Inc. 1988], could be used to generate C equivalents of all of the code automatically.

However, for user convenience, a header file, gjl.h, is provided for use from programs written in Standard C 89 [ISO 1990; Schildt et al. 1990], Standard C 99 [ISO 1999], or Standard C++ 98 [ISO 1998]. That header file specifies mappings from C/C++-style compiled names and datatypes to their Fortran equivalents, and defines suitable function prototypes for the public subroutines and functions. C and C++ programmers should therefore be able to use the routine names as defined in this article, provided that they remember that Fortran arguments are always passed by reference, so scalar arguments require a leading ampersand to pass the address instead of the value.

The header file should properly handle the mapping of three common compiled-name conventions: Fortran routine foo may be known as FOO, foo, or foo_ in various C implementations. The user may also predefine macros for three data types, fortran_double_precision, fortran_quadruple_precision, and fortran_integer, on systems where the default C data types double, long double, and int are unsuitable.
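To make the by-reference rule concrete, here is a minimal, self-contained sketch. The routine dscal2 is hypothetical: it is not part of QUADLOG and is not taken from gjl.h. It is simply a C function written with the calling sequence that a Fortran compiler using the trailing-underscore naming convention would expect for a subroutine with an INTEGER and a DOUBLE PRECISION scalar argument.

```c
/* Hypothetical C stand-in for the Fortran subroutine
 *       SUBROUTINE dscal2(n, x, s)
 *       INTEGER n
 *       DOUBLE PRECISION x(*), s
 * Fortran passes every argument by reference, so even the scalars
 * n and s arrive as pointers.  The trailing underscore mimics one of
 * the common compiled-name conventions. */
void dscal2_(const int *n, double x[], const double *s)
{
    for (int i = 0; i < *n; ++i)   /* x(1..n) in Fortran is x[0..n-1] here */
        x[i] *= *s;
}

/* A C caller must take the address of each scalar argument. */
void scale_by_half(double x[], int len)
{
    double half = 0.5;
    dscal2_(&len, x, &half);       /* &len and &half, not len and half */
}
```

On a system whose Fortran compiler uses a different compiled-name convention, only the spelling of dscal2_ would change, which is the kind of mapping the header file is meant to hide from the caller.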


Table I. Test Systems and Compilers

Compilers are native, except for the Free Software Foundation’s GNU compilers (gcc, g77), the Lahey/Fujitsu compiler (lf95), the Lucent Technologies’ Bell Laboratories’ Fortran-to-C translator (f2c), the Portland Group, Inc.’s compilers (prefixed pg), the NAG, Inc.’s compilers (prefixed nag), and the N. A. Software compiler on GNU/Linux IA-32 (f95).

| Vendor and Model | CPU | Operating System | Compilers |
|---|---|---|---|
| Apple Power Macintosh | PowerPC | Rhapsody 5.5 | f2c + gcc |
| Apple Power Macintosh | PowerPC | GNU/Linux 2.2.15pre7 | f77 |
| Compaq/DEC Alpha 4100-5/266 | Alpha 21164 | OSF/1 4.0F | f77, f90, f95, g77, nagf95 |
| HP 9000/735 | PA-RISC 1.1 | HP-UX 10.01 | f77, fort77 |
| IBM RS/6000 43P | PowerPC | AIX 4.2 | f77, g77, xlf, xlf90 |
| IBM RS/6000 44P-270 | Power3-II | AIX 4.3 | f77, g77, xlf, xlf90, xlf95 |
| Intel IA-32 | Pentium III | GNU/Linux 2.2.12-20smp (Redhat 6.1) | f77, f95, g77, lf95, nagf95, pgf77, pgf90, pghpf |
| NeXT Turbostation | Motorola 68020 | Mach 3.3 | f2c + gcc |
| SGI Indigo/2 | MIPS R4000 | IRIX 5.3 | f77, g77 |
| SGI Origin 200 | MIPS R10000 | IRIX 6.5 | f77, f90, fort77, g77, nagf95 |
| Sun SPARC 10/512 | SuperSPARC | GNU/Linux 2.2.12-42smp (Redhat 6.1) | f77, g77 |
| Sun SPARC 20/512 | SuperSPARC | Solaris 2.6 | f77, g77, f90, nagf95 |
| Sun Enterprise 5500 | UltraSPARC | Solaris 2.7 | f77, g77, f90, nagf95 |

To check this claim of usability from C and C++, we manually translated one of the test programs to clean Standard C, as program cglfd1, and compiled and ran it with 55 C and C++ compilers on the test systems listed in Table I. We have also checked the translation with LCLint [Evans 1998; Santo Orcero 2000], an excellent, and finicky, C code checker. The results were comparable to those from a Fortran test program, and the mixed-language builds were automatic once the necessary underlying configuration support had been supplied.

Notable among the entries of Table I is the Fortran-to-C translator, f2c, coupled with GNU gcc on Apple Rhapsody and NeXT Mach, which do not have a vendor-provided Fortran compiler. The interface between the two is hidden inside a script called f77, which behaves like conventional UNIX Fortran compilers. The translation to C proved flawless, and the resulting C code could have been captured and used in isolation, as long as the associated f2c runtime library was available.

We have not been able to find a reliable Fortran-to-Java translator: although there are at least two such projects [Doolin et al. 1999; Zheng et al. 1998; Seymour and Dongarra 2003], neither is yet robust enough, or complete enough, for routine use. We have not been willing to do the translation manually, so the Java language remains unsupported by this package, although, with some system dependence, and loss of cross-platform byte-code portability, the standard Java Native Interface [Gordon and McClellan 1998; Liang 1999] could be used to call our package from Java programs. See Casanova et al. [1997] for further details on the issues of communication between Java and Fortran libraries.

Together with f2c, a C-to-Java translator should be workable, but we have not yet found a satisfactory one.

3. PACKAGING ISSUES

For packaging and testing convenience, the software has been separated into multiple file directories. However, all filenames of Fortran code, and all Fortran routine names, are unique, so that the compiled code from all of the routines can be placed in a single object library.

In order to build the quadrature code from the software distribution archive file, the companion distribution for the gamma and psi functions [Beebe and Ball, To appear] must also be unpacked into the same location. In a directory containing just the archive files, suitable commands on most UNIX systems are:

gunzip < gampsi.tar.gz | tar xf -
gunzip < quadlog.tar.gz | tar xf -

or more compactly with GNU utilities:

tar xfz gampsi.tar.gz
tar xfz quadlog.tar.gz

These unbundle the software under a subdirectory named dist.

Building and testing of the programs is controlled by UNIX Makefiles, written to conform to the Free Software Foundation’s GNU Project conventions [Free Software Foundation 1998]. In particular, the package can be built, tested, and installed on almost any UNIX-like or POSIX-conformant system using the conventional GNUware incantation

./configure && make all check install

To build with a specific compiler and optimization options, the user can do something like this (here, for an IBM RS/6000 AIX 4.3 system):

env F77=xlf95 ./configure && \
make all check install FOPT='-O4 -qfixed -qarch=com'

A top-level invocation of make will automatically step into each subdirectory, executing the requested targets in each of them, with the same settings of make variables. This procedure is recursive, so that a one-line command at top level can build an arbitrarily large, complex, and deeply nested software package.

A subsequent make clean removes intermediate files, and make distclean removes everything that make built, reducing each directory to its original distribution state, in preparation for a build on a new system, or for archiving.

A make uninstall can be used to back out of an installation. It removes the file(s) installed by make install, but otherwise leaves the build directories intact.

All of the routines described later are documented in UNIX manual pages in the package. These are written using standard nroff/troff markup in -man format, and translations to plain ASCII text, validated HTML, PDF¹, and PostScript² are included.

Although the configure scripts are large (about 1600–3800 lines of Bourne shell code), they are generated automatically from corresponding short configure.in files by the GNU autoconf program [MacKenzie 1992; Vaughan et al. 2000]. These files are very simple, since there are few system dependencies to handle, apart from the directory containing a C version of one of the test programs, where additional configuration work is needed to handle the messy job of interlanguage calling and linking on the test systems.

The configure scripts are distributed as part of the package, so that user sites do not require prior installation of the autoconf system.

Each Makefile is similarly generated by configure from a corresponding file, Makefile.in, of similar size. The installation location can be specified at configure time like this:

./configure --prefix=/home/jones/local

or at make time like this:

make prefix=/home/jones/local install

The default prefix is the GNU standard /usr/local.

For make install to succeed, the user must have write access to directories under the prefix directory. The --prefix option allows unprivileged users to make private installations. System managers will normally install software in more widely accessible standard places.

After a successful make install, the package can be linked with user programs by using the compiler options -L/usr/local/lib -lgjl on UNIX and POSIX-conformant systems, assuming the default installation location.

On systems where additional load library directories can be made known to the linker via a system configuration file (e.g., /etc/ld.so.conf on GNU/Linux systems), only the -lgjl option may be needed.

4. LOG QUADRATURE BASED ON GENERALIZED GAUSS-LAGUERRE QUADRATURE

The results of the previous article by Ball and Beebe [2007] for this quadrature can be summarized by the following formulas:

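$$
\begin{aligned}
\int_0^\infty dx\, x^\alpha e^{-x} \ln x\, f(x)
 &\approx \sum_{i=1}^{N} \bigl[ dW_i(\alpha)\, f(x_i(\alpha)) + \delta x_i(\alpha)\, f'(x_i(\alpha)) \bigr] && (1)\\
 &\approx \sum_{i=1}^{N} \bigl[ W_i(\alpha)\,(x_i(\alpha) - 1)\, f(x_i(\alpha)) - Z_i(\alpha)\, f(y_i(\alpha)) \bigr] && (2)\\
\int_0^\infty dx\, x^\alpha e^{-x} f(x)
 &\approx \sum_{i=1}^{N} W_i(\alpha)\, f(x_i(\alpha)). && (3)
\end{aligned}
$$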

¹PDF (Portable Document Format) is a registered trademark of Adobe Systems, Inc.
²PostScript is a registered trademark of Adobe Systems, Inc.


These are all subject to the condition α > −1. Each of the sums should give exact results if f(x) is a polynomial of degree 2N − 1 or less.

$x_i(\alpha)$ and $W_i(\alpha)$ are the nodes and weights of the generalized Gauss-Laguerre quadrature, $y_i(\alpha)$ and $Z_i(\alpha)$ are the nodes and weights of the quadrature based on the new polynomials, and $\delta x_i(\alpha)$ and $dW_i(\alpha)$ are the coefficients needed for the first sum, which involves the derivative of $f(x)$.

Notice that Equation (1) has N nodes common to both terms, while Equation (2) has 2N nodes, with separate sets of N for each of the two terms. The nonlogarithmic case, Equation (3), requires only N nodes.

In the following descriptions, all arguments, and external functions, beginning with the letters [i-n] are of Fortran type INTEGER, and those beginning with the letters [a-ho-z] are of Fortran type DOUBLE PRECISION. With the sole exception of Cray vector supercomputers, on all currently marketed computing systems this corresponds to a 64-bit word offering about sixteen figures of precision.

The routine glqfd(x, w, deltaw, deltax, alpha, nquad, ierr) produces the quantities needed for the first form of the quadrature, with function and derivative values. The cryptic routine name stands for Gauss-Laguerre quadrature with functions and derivatives. Other routine names below are chosen similarly, as abbreviations of obvious descriptive phrases that are recorded in initial comment statements in each file.

On entry to glqfd(), the caller must have defined:

nquad Number of nodes for the quadrature.

alpha Value of α for the Gauss-Laguerre quadrature, α > −1.

On return from glqfd(), the results are:

x(*), w(*) Arrays of dimension at least nquad: x(*) contains the nodes, and w(*) the weights, of the nonlogarithmic Gauss-Laguerre quadrature, Equation (3).

deltaw(*), deltax(*) Arrays of dimension at least nquad: deltaw(*) and deltax(*) contain the weights needed in Equation (1).

ierr Scalar status code which should always be checked by the caller:
0 success,
1 eigensolution could not be obtained,
2 destructive overflow,
3 nquad out of range,
4 alpha out of range.

The output arrays are indeterminate when ierr is nonzero.
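Because the output arrays must not be used when ierr is nonzero, a C caller will usually want to report the failure. The helper below is a hypothetical convenience, not part of the package; it only maps the codes listed above to the descriptions given for them.

```c
/* Hypothetical helper: translate the glqfd()/glqf() status code into
   the description given in the text.  Callers should test ierr != 0
   before touching the output arrays. */
const char *quad_status(int ierr)
{
    switch (ierr) {
    case 0:  return "success";
    case 1:  return "eigensolution could not be obtained";
    case 2:  return "destructive overflow";
    case 3:  return "nquad out of range";
    case 4:  return "alpha out of range";
    default: return "unknown status code";
    }
}
```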

The error case ierr = 3 can occur if a quadrature of exceptionally high degree is called for. Because Fortran 77 lacks dynamic storage allocation, and because we did not wish to burden the user interface with several additional arguments for working storage, we have instead used statically allocated internal arrays. The array sizes are determined by an included PARAMETER definition for MAXPTS, set to at least 1024 in the distributed package, and nquad may not then exceed MAXPTS. Numerical stability, and limitations of floating-point range, are likely to be much more limiting in practice than our choice of MAXPTS.

As is the case with generalized Gauss-Laguerre quadrature, machine underflows will occur for large values of nquad. Exactly how large is machine and precision dependent. This problem can be fixed (by the user) by standard rescaling of weights and the definition of f(x).

While it is not necessary for performing the quadrature, the nodes produced by this and subsequent routines are arranged in order of increasing size (useful for producing tables, for example), as is conventional in quadrature routines.

In all following code examples, we assume that this small fragment has been provided:

      DOUBLE PRECISION ZERO
      PARAMETER (ZERO = 0.0d+00)

The returned nodes and weights are used like this to compute the quadrature in Equation (1), assuming that f(x) is a function for evaluating $f(x)$, and fprime(x) one for $f'(x)$:

      sum = ZERO
      DO 10 i = 1,nquad
         sum = sum + deltaw(i) * f(x(i)) + deltax(i) * fprime(x(i))
   10 CONTINUE

A slower, but usually more accurate, version of this code uses a convenient vector-sum primitive, dvsum(), in this package. The default version uses Kahan’s compensated sum, but several other implementations are possible.

      DOUBLE PRECISION dvsum
      DOUBLE PRECISION temp(2,MAXPTS)
      ...
      DO 10 i = 1,nquad
         temp(1,i) = deltaw(i) * f(x(i))
         temp(2,i) = deltax(i) * fprime(x(i))
   10 CONTINUE
      sum = dvsum(temp, 2*nquad)

We use dvsum() scores of times in this package to reduce loss of accuracy in summation. For brevity, we henceforth show only the simpler code.
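For readers working in C, the following sketch shows Kahan’s compensated summation, the technique the text names for the default dvsum(), and applies it to Equation (1) term by term, so no temporary array is needed. It is our own illustration, not the package’s dvsum(); the names kahan_acc, kahan_add, kahan_sum, glq_log_sum, cube, and cube_prime are hypothetical, and x, deltaw, and deltax are assumed to hold the nquad values returned by glqfd().

```c
/* Running state for Kahan compensated summation (hypothetical type). */
typedef struct {
    double sum;   /* running (rounded) total */
    double c;     /* rounding error committed by the previous addition */
} kahan_acc;

/* Add one term: y is the term corrected by the previous error, and
   (t - sum) - y recovers the error committed when t was rounded. */
static void kahan_add(kahan_acc *k, double term)
{
    double y = term - k->c;
    double t = k->sum + y;
    k->c = (t - k->sum) - y;
    k->sum = t;
}

/* Compensated sum of x[0..n-1], the idea behind the default dvsum(). */
double kahan_sum(const double x[], int n)
{
    kahan_acc k = {0.0, 0.0};
    for (int i = 0; i < n; ++i)
        kahan_add(&k, x[i]);
    return k.sum;
}

/* Equation (1) with compensated summation: both terms for each node
   are fed to the accumulator as soon as they are computed. */
double glq_log_sum(int nquad, const double x[], const double deltaw[],
                   const double deltax[],
                   double (*f)(double), double (*fprime)(double))
{
    kahan_acc k = {0.0, 0.0};
    for (int i = 0; i < nquad; ++i) {        /* C indexes from 0 */
        kahan_add(&k, deltaw[i] * f(x[i]));
        kahan_add(&k, deltax[i] * fprime(x[i]));
    }
    return k.sum;
}

/* Example integrand f(x) = x^3 and its derivative, for illustration. */
double cube(double x) { return x * x * x; }
double cube_prime(double x) { return 3.0 * x * x; }
```

The compensation term relies on strict IEEE evaluation order, so compiler options that permit reassociation, such as GCC’s -ffast-math, may simplify it away.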

Although vector summation is not part of the Basic Linear Algebra Subroutines (BLAS) [Lawson et al. 1979b, 1979a; Dongarra et al. 1988c, 1988a, 1990b, 1990a] infrastructure upon which modern numerical software is often based, good algorithms for accurate summation are now available [Higham 1996, Chapter 4], [Espelid 1995; Anderson 1999; Demmel and Hida 2003; Nievergelt 2003; McNamee 2004], and our use of dvsum() is a desirable step toward enhanced accuracy.

Some architectures offer a multiply-add instruction that first computes an exact double-length product, followed by an add with a single rounding (IBM Power [Apple Computer, Inc. 1995; IBM Corporation 1994; Weiss and Smith 1994], IBM S/390 G5 [Abbott et al. 1999; Schwarz and Krygowski 1999; Slegel et al. 1999], Intel IA-64 [Intel Corporation 2000, p. 12-6], SGI MIPS IV (R5000, R8000, R10000, and R12000) [Kane and Heinrich 1992; Yeager 1996], and SPARC64-GP [Hal Computer Systems, Inc. 2000]), or with two roundings (HP PA-RISC [Kane 1996]). The recent 1999 C Standard [ISO 1999] provides a standard library routine, fma(x,y,z), for this operation, with a single rounding. When all terms are known to be of the same sign, and the compiler generates such instructions, the first form will be more accurate, and significantly faster, than the second form using dvsum(). However, when the summation includes terms of unlike signs, accuracy loss from cancellation of common leading bits is likely to be a larger source of error than rounding.

Double-length sum and product accumulation would be even better, but apart from the IBM S/390 mainframe architecture, no vendor implements it in hardware for double-precision operands, though several do provide it in software.

If the evaluation of the derivative is difficult or not possible, the second form of the quadrature must be used. This is implemented by the routine glqf(x, w, wxm1, y, z, alpha, nquad, ierr).

On entry to glqf(), the input arguments are as given above for glqfd(). On return from glqf(), the results are:

x(*), w(*) Arrays of dimension at least nquad: x(*) contains the nodes, and w(*) the weights, of the nonlogarithmic Gauss-Laguerre quadrature, Equation (3).

y(*), z(*) Arrays of dimension at least nquad: y(*) contains the nodes, and z(*) the weights, needed for the second term in Equation (2).

wxm1(*) Array of dimension at least nquad, containing the scaled weights w(i) * (x(i) - 1), denoted $W_i(\alpha)(x_i(\alpha) - 1)$ in Equation (2).

ierr Scalar status code as for glqfd(). The output arrays are indeterminate when ierr is nonzero.

The quadrature for Equation (2) can then be computed by

      sum = ZERO
      DO 10 i = 1,nquad
         sum = sum + wxm1(i)*f(x(i)) - z(i)*f(y(i))
   10 CONTINUE

The quadrature for the nonlogarithmic case, Equation (3), can be computed by

      sum = ZERO
      DO 10 i = 1,nquad
         sum = sum + w(i)*f(x(i))
   10 CONTINUE
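As a small check of the exactness claim for Equation (3), the two-point generalized Gauss-Laguerre rule for α = 0 is known in closed form: nodes $2 \mp \sqrt{2}$ and weights $(2 \pm \sqrt{2})/4$. The C sketch below uses these values in place of the x(*) and w(*) that glqf() or glqfd() would return (the function names are hypothetical), and evaluates the sum for $f(x) = x^k$, whose integral is $\Gamma(k+1) = k!$.

```c
/* Two-point generalized Gauss-Laguerre rule for alpha = 0, in closed
   form; these values stand in for the x(*), w(*) returned by glqf(). */
static void laguerre2(double x[2], double w[2])
{
    const double r = 1.4142135623730951;   /* sqrt(2) to double precision */
    x[0] = 2.0 - r;  w[0] = (2.0 + r) / 4.0;
    x[1] = 2.0 + r;  w[1] = (2.0 - r) / 4.0;
}

/* Equation (3) with f(x) = x^k: sum_i W_i x_i^k.  The exact integral is
   Gamma(k + 1) = k!, reproduced (up to rounding) only for k <= 2N-1 = 3. */
double laguerre2_moment(int k)
{
    double x[2], w[2], sum = 0.0;
    laguerre2(x, w);
    for (int i = 0; i < 2; ++i) {
        double p = 1.0;
        for (int j = 0; j < k; ++j)
            p *= x[i];                      /* p = x_i^k */
        sum += w[i] * p;
    }
    return sum;
}
```

With N = 2, the sums reproduce 1, 1, 2, and 6 for k = 0 through 3 (up to rounding), but k = 4 gives 20 rather than 4! = 24, because the degree then exceeds 2N − 1 = 3.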

Either glqf() or glqfd() can be used to compute the nodes and weights for the nonlogarithmic case.

The underlying routine glqrc(a, b, s, t, alpha, nquad, ierr) computes the recursion coefficients and zeroth and first moments of the monic polynomials corresponding to the positive weight function

$$w(x, \alpha) = (x - 1 - \ln x)\, e^{-x} x^\alpha.$$

On entry to glqrc(), the input arguments are as given above for glqfd(). On return from glqrc(), the results are:

a(0..nquad) Recursion coefficients: a(n) = $a_n^\alpha$.

b(0..nquad) Recursion coefficients: b(n) = $b_n^\alpha$.

s(0..nquad) First moments: s(n) = $s_n^\alpha$.

t(0..nquad) Zeroth moments: t(n) = $t_n^\alpha$.

ierr Scalar status code as for glqfd() and glqf(). The output arrays are indeterminate when ierr is nonzero.

where the right-hand side quantities are defined in Ball and Beebe [2007], and the array indexes start at zero.

Completely analogous quadruple-precision routines qglqf(), qglqfd(), qglqrc(), and qvsum() are also available when that precision is supported by the local Fortran system.
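An elementary identity (our gloss; see Ball and Beebe [2007] for the actual derivation) shows why the weight function $w(x, \alpha)$ used by glqrc() appears. The factor $x - 1 - \ln x$ is nonnegative for $x > 0$: its derivative $1 - 1/x$ vanishes only at $x = 1$, where the factor attains its minimum value 0. Writing $\ln x = (x - 1) - (x - 1 - \ln x)$ then splits the logarithmic integral into a nonlogarithmic part and a part with the positive weight $w(x, \alpha)$:

$$\int_0^\infty dx\, x^\alpha e^{-x} \ln x\, f(x) = \int_0^\infty dx\, x^\alpha e^{-x} (x - 1)\, f(x) - \int_0^\infty dx\, w(x, \alpha)\, f(x).$$

Applying the generalized Gauss-Laguerre rule of Equation (3) to the first integral, and a Gauss rule for $w(x, \alpha)$ with nodes $y_i(\alpha)$ and weights $Z_i(\alpha)$ to the second, gives the form of Equation (2).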

5. LOG QUADRATURE BASED ON GAUSS-JACOBI QUADRATURE

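The results of the previous paper for this quadrature can be summarized by the following formulas [Ball and Beebe 2007]:

$$
\begin{aligned}
\int_{-1}^{1} dx\, (1-x)^\alpha (1+x)^\beta \ln\Bigl(\frac{1+x}{2}\Bigr) f(x)
 &\approx \sum_{i=1}^{N} \Bigl[ \bigl(dW_i(\alpha,\beta) - \ln(2)\, W_i(\alpha,\beta)\bigr) f(x_i(\alpha,\beta)) + \delta x_i(\alpha,\beta)\, f'(x_i(\alpha,\beta)) \Bigr] && (4)\\
 &= -\sum_{i=1}^{N} Z_i(\alpha,\beta)\, f(y_i(\alpha,\beta)), && (5)\\
\int_{-1}^{1} dx\, (1-x)^\alpha (1+x)^\beta \ln(1+x)\, f(x)
 &= \sum_{i=1}^{N} \bigl[ dW_i(\alpha,\beta)\, f(x_i(\alpha,\beta)) + \delta x_i(\alpha,\beta)\, f'(x_i(\alpha,\beta)) \bigr] && (6)\\
 &= \sum_{i=1}^{N} \bigl[ (\ln 2)\, W_i(\alpha,\beta)\, f(x_i(\alpha,\beta)) - Z_i(\alpha,\beta)\, f(y_i(\alpha,\beta)) \bigr], && (7)\\
\int_{-1}^{1} dx\, (1-x)^\alpha (1+x)^\beta f(x)
 &= \sum_{i=1}^{N} W_i(\alpha,\beta)\, f(x_i(\alpha,\beta)). && (8)
\end{aligned}
$$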

These are all subject to the conditions α > −1, β > −1. These expressions should give exact results if f(x) is a polynomial of degree 2N − 1 or less.

$x_i(\alpha,\beta)$ are the nodes, and $W_i(\alpha,\beta)$ the weights, of the ordinary Gauss-Jacobi quadrature, Equation (8). $\delta x_i(\alpha,\beta)$ and $dW_i(\alpha,\beta)$ are the weights of the Gauss-Jacobi log quadrature, Equation (6), involving the derivative of $f(x)$. $y_i(\alpha,\beta)$ are the nodes, and $Z_i(\alpha,\beta)$ the weights, of the second term in the derivative-free Gauss-Jacobi log quadrature, Equation (7).

Notice that, as in Equation (1), two of the log quadratures have N nodes common to both terms, while the derivative-free one has 2N nodes, with separate sets of N for each of the two terms.

The subroutine gjqfd(x, w, deltaw, deltax, alpha, beta, nquad, ierr) produces the quantities needed for the derivative forms, Equation (4) and Equation (6).

On entry to gjqfd(), the caller must have defined:

nquad Number of nodes for the quadrature.

alpha, beta Values of α, β for the Gauss-Jacobi quadrature, where α > −1, β > −1.

On return from gjqfd(), the results are:

x(*), w(*) Arrays of dimension at least nquad: x(*) contains the nodes, and w(*) the weights, for the nonlogarithmic Gauss-Jacobi quadrature, Equation (8).

deltaw(*), deltax(*) Arrays of dimension at least nquad: deltaw(*) and deltax(*) contain the weights needed in Equation (6).

ierr Scalar status code which should always be checked by the caller:
0 success,
1 eigensolution could not be obtained,
2 destructive overflow,
3 nquad out of range,
4 alpha out of range,
5 beta out of range.

The output arrays are indeterminate when ierr is nonzero.

The quadrature for Equation (6) can then be computed by

      sum = ZERO
      DO 10 i = 1,nquad
         sum = sum + deltaw(i)*f(x(i)) + deltax(i)*w(i)*fprime(x(i))
   10 CONTINUE

If the evaluation of the derivative is difficult or not possible, the second form of the quadrature, Equation (7), must be used. This is implemented by the routine gjqf(x, w, y, z, alpha, beta, nquad, ierr).

On entry to gjqf(), the input arguments are as given above for gjqfd(). On return from gjqf(), the results are:

x(*), w(*) Arrays of dimension at least nquad: x(*) contains the nodes, and w(*) the weights, of the nonlogarithmic Gauss-Jacobi quadrature, Equation (8).

y(*), z(*) Arrays of dimension at least nquad: y(*) contains the nodes, andz(*) the weights, in the second term of Equation (7).


ierr: Scalar status code as for gjqfd(). The output arrays are indeterminate when ierr is nonzero.

The quadrature for Equation (7) can then be computed by

      sum = ZERO
      dlgtwo = dlog(2.0d+00)
      DO 10 i = 1,nquad
         sum = sum + dlgtwo*w(i)*f(x(i)) - z(i)*f(y(i))
   10 CONTINUE

The quadrature for the nonlogarithmic case, Equation (8), can be computed by

      sum = ZERO
      DO 10 i = 1,nquad
         sum = sum + w(i)*f(x(i))
   10 CONTINUE

Either gjqf() or gjqfd() can be used to compute the nodes and weights for the nonlogarithmic case.
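Since the package is also usable from C, the three Fortran loops above can be sketched in C as follows. This is a minimal sketch that assumes the arrays have already been filled by gjqfd() or gjqf(); how a C program calls those Fortran routines is platform-dependent and is not shown. The helper names quad_deriv(), quad_deriv_free(), quad_nonlog(), example_f(), and example_fprime() are hypothetical, and indexing is 0-based rather than Fortran's 1-based.

```c
#include <stddef.h>

typedef double (*fn1)(double);

/* Derivative form, Equation (6): sum of deltaw*f(x) + deltax*w*f'(x). */
double quad_deriv(size_t n, const double x[], const double w[],
                  const double deltaw[], const double deltax[],
                  fn1 f, fn1 fprime)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += deltaw[i] * f(x[i]) + deltax[i] * w[i] * fprime(x[i]);
    return sum;
}

/* Derivative-free form, Equation (7): sum of ln(2)*w*f(x) - z*f(y). */
double quad_deriv_free(size_t n, const double x[], const double w[],
                       const double y[], const double z[], fn1 f)
{
    const double ln2 = 0.69314718055994530942;   /* dlog(2.0d+00) */
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += ln2 * w[i] * f(x[i]) - z[i] * f(y[i]);
    return sum;
}

/* Nonlogarithmic form, Equation (8): sum of w*f(x). */
double quad_nonlog(size_t n, const double x[], const double w[], fn1 f)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += w[i] * f(x[i]);
    return sum;
}

/* Example integrand f(x) = x^2 and its derivative, for trying the sums. */
double example_f(double t) { return t * t; }
double example_fprime(double t) { return 2.0 * t; }
```

As a quick check of quad_nonlog(), the classical two-point Gauss-Legendre rule (α = β = 0, nodes ±1/√3, unit weights) integrates x² over [−1, 1] to 2/3.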

The underlying routine gjqrc(a, b, s, t, alpha, beta, nquad, ierr) computes the recursion coefficients and zeroth and first moments of the monic polynomials corresponding to the positive weight function

$$w(x, \alpha, \beta) = (1 - x)^{\alpha}\,(1 + x)^{\beta}\left(-\ln\frac{1 + x}{2}\right).$$

On entry to gjqrc(), the input arguments are as given above for gjqfd(). On return from gjqrc(), the results are:

a(0..nquad): Recursion coefficients: a(n) = $a_n^{\alpha}$.

b(0..nquad): Recursion coefficients: b(n) = $b_n^{\alpha}$.

s(0..nquad): First moments: s(n) = $s_n^{\alpha}$.

t(0..nquad): Zeroth moments: t(n) = $t_n^{\alpha}$.

ierr: Scalar status code as for gjqfd() and gjqf(). The output arrays are indeterminate when ierr is nonzero.

where the right-hand side quantities are defined in Ball and Beebe [2007], and the array indexes start at zero.

Completely analogous quadruple-precision routines qgjqf(), qgjqfd(), qgjqrc(), and qvsum() are also available when that precision is supported by the local Fortran system.
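The recursion coefficients returned by gjqrc() define the monic orthogonal polynomials through a three-term recurrence. The sketch below assumes the common convention $p_{k+1}(x) = (x - a_k)\,p_k(x) - b_k\,p_{k-1}(x)$ with $p_{-1} = 0$ and $p_0 = 1$; the precise convention of gjqrc() is the one defined in Ball and Beebe [2007], which this sketch does not reproduce. For illustration it uses the classical Legendre coefficients $a_k = 0$, $b_k = k^2/(4k^2 - 1)$ rather than logarithmic-weight coefficients, and the names monic_eval() and legendre_coeffs() are hypothetical. The zeros of $p_N$ are the N quadrature nodes; in the standard Golub-Welsch approach they are computed as eigenvalues of the symmetric tridiagonal matrix with diagonal $a_k$ and off-diagonal $\sqrt{b_k}$, which is presumably why a tridiagonal eigenvalue solver (Section 10) is needed.

```c
#include <stddef.h>

/* Evaluate the monic polynomial p_n(x) defined by
     p_{-1}(x) = 0,  p_0(x) = 1,
     p_{k+1}(x) = (x - a[k]) * p_k(x) - b[k] * p_{k-1}(x).
   b[0] multiplies p_{-1} = 0, so its value does not matter.
   a[] and b[] must hold at least n entries. */
double monic_eval(size_t n, const double a[], const double b[], double x)
{
    double pkm1 = 0.0, pk = 1.0;          /* p_{k-1} and p_k, starting at k = 0 */
    for (size_t k = 0; k < n; ++k) {
        double pkp1 = (x - a[k]) * pk - b[k] * pkm1;
        pkm1 = pk;
        pk = pkp1;
    }
    return pk;
}

/* Example coefficients: monic Legendre polynomials (weight 1 on [-1, 1]),
   a_k = 0 and b_k = k^2 / (4k^2 - 1), for k = 0 .. n-1. */
void legendre_coeffs(size_t n, double a[], double b[])
{
    for (size_t k = 0; k < n; ++k) {
        double kk = (double) k;
        a[k] = 0.0;
        b[k] = kk * kk / (4.0 * kk * kk - 1.0);
    }
}
```

With these coefficients, monic_eval() reproduces $p_2(x) = x^2 - 1/3$ and $p_3(x) = x^3 - 3x/5$, whose zeros are the two- and three-point Gauss-Legendre nodes.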

6. OVERVIEW OF TESTING AND VALIDATION

Once automatic configuration has been carried out as in Section 3, the software is ready for testing. When quadruple-precision support is available and desired, make all-qp will compile and build the required library support.

For a simple verification that the code is working as intended, make demo runs demonstration programs whose output can be manually compared with standard published sources cited in that output. Target demo-qp runs quadruple-precision versions of those programs. The output includes reports of the recursion coefficients, moments, nodes, and weights, and computation of suitable sets of test integrals that can be evaluated analytically, so that relative errors can be determined.

Manual checking of output is tedious, however, so automated testing is desirable. The check (and check-qp when quadruple-precision support is available) targets in the Makefiles should always be exercised after building the software. These targets run an extensive validation suite for the programs in the current directory, and similar suites in all nested subdirectories.

Testing is essential, even for software that has been in long use without changes. Modern compilers are large³ and complex, and are certain to have bugs that might be exposed by user code. Changes in operating-system releases, run-time library versions, compilers, compilation options, and host platforms can all potentially break previously working code. In the case of the first two of these, an unchanged, previously working binary executable may even be found to fail.

During development, and immediately prior to submission of this article, the code has been built and validated on at least the systems shown in Table I, with the indicated native and third-party Fortran compilers: more than forty of them. Where available, compiler options that diagnose portability problems were used. We expect that the code should work without source code changes in almost any environment with a Standard-conforming Fortran compiler.

Testing of code that uses floating-point arithmetic is considerably more difficult than for code that uses integer instructions, because finite-precision floating-point arithmetic on computers is both inexact and not associative. On one important historic supercomputer, such arithmetic was not even commutative!

While the IEEE 754 Standard [IEEE 1985; IEC 1989] for binary floating-point arithmetic has helped enormously to rectify floating-point hardware design deficiencies, and to make numerical computer programs more predictable, there are still differences that arise because the evaluation order in Fortran expressions is not completely specified by the Standards, and because some architectures do intermediate computations in higher precision. Thus, even changing the compiler optimization level on a single system can produce different results.
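A small experiment makes the lack of associativity concrete. In IEEE 754 double precision, summing 0.1, 0.2, and 0.3 in two different groupings gives results that differ in the last bit, so a compiler that is free to choose the grouping (or to keep intermediates in longer registers) can change the answer. The function names are hypothetical, and volatile is used only to keep each intermediate in a 64-bit memory word.

```c
/* (0.1 + 0.2) + 0.3: the rounded value of 0.1 + 0.2 lies slightly above
   0.3, and that rounding error survives the second addition. */
double sum_left(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;
    volatile double t = a + b;
    volatile double s = t + c;
    return s;
}

/* 0.1 + (0.2 + 0.3): here 0.2 + 0.3 rounds to exactly 0.5, and the final
   sum rounds to the double nearest 0.6. */
double sum_right(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;
    volatile double t = b + c;
    volatile double s = a + t;
    return s;
}
```

On an IEEE 754 system, sum_left() comes out one unit in the last place larger than sum_right(), although both compute "0.1 + 0.2 + 0.3".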

Also, some floating-point architectures do not provide the guarantees required by the IEEE 754 Standard of correct rounding of the last bit in all primitive instructions. For example, Story and Tang [1999] write about the Intel IA-64 and Intel IA-32 (formerly, x86) architectures:

While correctly rounded implementations are ideal, they are unattainable at present within practical speed and resource limits. . . . Therefore, a worst-case error below 0.6 ulps⁴ is an improvement when compared to that of 1 ulp in the Pentium™ generation.

³Recent GNU C compiler releases exceed 1,500,000 lines of code at version 3.3.2, and the corresponding C runtime library has more than 866,000 lines of code at version 2.2.

⁴ulp means unit in the last place.


Many current architectures implement part of the Standard in software, and thus changes in software releases could potentially introduce changes in the floating-point arithmetic.

The traditional approach used in the validation suites of most GNUware (test input and correct test output that is automatically compared with newly generated test output by a file difference utility, such as UNIX diff) breaks down for numerical code. Bit-for-bit identical floating-point output across multiple platforms is simply infeasible, unless all such arithmetic is done entirely in the same software. Apple’s Standard Apple Numeric Environment (SANE) is one example of this, though it encompasses only a single vendor’s product line, and for reasons of speed, is usually replaced by hardware arithmetic.

The test output in the validation suites in this package is several tens of thousands of lines of numerical data: application of diff could produce up to four times that many lines, and that is infeasible for a human software installer to examine and validate.

Consequently, the first author (NHFB) developed a special-purpose numeric file differencing utility, ndiff [Beebe, To appear], that can be used like UNIX diff, but which provides user control over what constitutes a ‘difference’ between two numbers. This software is freely usable and distributable under the GNU General Public License, and has undergone extensive portability testing. Of course, it also has its own validation suite, using itself for the output file comparisons. ndiff has been written for substantial generality and applicability: in particular, it can be compiled to use normal double-precision floating-point arithmetic, extended-precision arithmetic, or software multiple-precision arithmetic. This means that it could be used, for example, to compare 100,000-digit numerical values produced by different symbolic-algebra packages.

Code in the distributed Makefiles will automatically select ndiff if it is found in the program search path, and will otherwise fall back to using diff. Installers of this package are therefore strongly urged to fetch and install ndiff first!

ndiff offers a particularly convenient feature: while reporting lines that differ by more than the user-specified tolerances, it also tracks the maximum relative and absolute errors in lines that “match” because their numeric differences are smaller than the tolerances. On completion, it prints a report of those two maximum errors, and their locations. Thus, by setting the tolerances high, the only output is that two-line maximum-error report.
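The tolerance idea, together with the tracking of maximum errors among matching numbers, can be sketched in a few lines of C. This is not ndiff's code or option syntax: the names cmp_stats and compare_number() are hypothetical, and the rule used here (report a difference only when both the absolute and the relative difference exceed their tolerances) is one plausible convention chosen for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Running maxima over number pairs that "match" (hypothetical names). */
struct cmp_stats {
    double max_abs, max_rel;   /* largest errors seen among matches */
    size_t at_abs, at_rel;     /* position of the pair where each occurred */
};

/* Compare one pair of numbers. Returns 1 (a difference to report) when
   both the absolute and the relative difference exceed their tolerances;
   otherwise updates the running maxima and returns 0. A NaN matches only
   another NaN, and an infinity matches only an equal infinity. */
int compare_number(double a, double b, double abstol, double reltol,
                   size_t where, struct cmp_stats *st)
{
    int a_nan = (a != a), b_nan = (b != b);
    if (a_nan || b_nan)
        return a_nan != b_nan;
    if (a == b)                        /* exact match, zero error */
        return 0;
    if (isinf(a) || isinf(b))          /* unequal, at least one infinite */
        return 1;

    double abs_err = fabs(a - b);
    double fa = fabs(a), fb = fabs(b);
    double scale = (fa > fb) ? fa : fb;
    double rel_err = abs_err / scale;  /* scale > 0 because a != b */

    if (abs_err > abstol && rel_err > reltol)
        return 1;
    if (abs_err > st->max_abs) { st->max_abs = abs_err; st->at_abs = where; }
    if (rel_err > st->max_rel) { st->max_rel = rel_err; st->at_rel = where; }
    return 0;
}
```

Calling compare_number() for every numeric field and printing the two maxima at the end gives, in outline, the two-line report described above; raising both tolerances suppresses everything except that report.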

7. TEST-COVERAGE ANALYSIS

As part of the numerical testing of our package, we also carried out a test-coverage analysis, using the coverage support of the Sun Solaris C and Fortran compilers and the associated tcov utility. This instruments the code such that successive runs accumulate the number of times each statement is executed. The source listings produced by tcov prefix each program line with its execution count, and the listing ends with a report of how much of the code was actually executed.

By addition of suitable test data (as the last data file in each of the lists of test files in the validation suite), we were able to ensure that almost all of the code, including the test program code, has been executed at least once. The only code blocks for which we could not supply test data that forces their execution without compromising the portability of the test suite were the exceptional cases of:

—nonconvergence of the eigenvalue solver;

—destructive overflow in the quadrature determination;

—a few test-program output statements that are executed only when unexpectedly large relative errors are detected;

—special arguments of the Γ(x) and ψ(x) functions (see the following section) that precipitate overflow, but are not otherwise generated by the quadrature routines or the test programs.

Certainly, every statement that includes floating-point arithmetic has been exercised at least once by the test data.

8. TESTING THE Γ(x) AND ψ(x) PRIMITIVES

The quadrature routines require two low-level primitives to compute the functions Γ(x) and ψ(x) = d ln Γ(x)/dx = Γ′(x)/Γ(x) [Abramowitz and Stegun 1964, p. 258, 6.3.1]. ψ(x) is also called the digamma function. These functions are implemented in the Fortran functions dgamma(x) and dpsi(x). The accuracy of these primitives limits the accuracy obtainable for the quadrature sums.

None of the four ANSI/ISO Fortran language Standards includes these functions. However, many vendor Fortran, C, and C++ implementations offer either Γ(x) or ln Γ(x), sometimes also in longer precision (long double in C), but none provides the ψ(x) function.

We include routines for both Γ(x) and ψ(x) in our package, in order to eliminate dependence on the quality, and availability, of vendor implementations.

The Γ(x) function routine that we chose was extracted from the Netlib repository for ACM Algorithm 715 [Cody 1993]. Its internal constants have been adjusted for IEEE 754 double-precision arithmetic.

In order to assess its accuracy across various systems, we prepared a test file containing 8000 pseudo-random integers in the range 0 .. $2^{31} - 1$, logarithmically distributed over the range of x for which Γ(x) is representable, each accompanied by a suitable integer power of two. Thus, the argument $x = n \cdot 2^{p}$ can be reconstructed exactly on input, without the loss of accuracy attributed to deficient implementations of decimal-to-binary conversion. Those values were then used to create a simple program in the MAPLE V5.1⁵ symbolic algebra language to compute accurate values of Γ(x) to 50 decimal digits. The output of that program serves as the “exact” output against which any implementation of Γ(x) can be compared.

For our quadrature methods, the code does not require negative arguments for Γ(x), so for this paper we have not tested implementations with negative x. Neither did we carry out tests with IEEE 754 denormalized numbers, NaN, or Infinity, or with arguments that might provoke returned values of those forms.

⁵MAPLE and MAPLE V are registered trademarks of Waterloo Maple Inc.


For more extensive test procedures for these functions, see Cody [1991]. More detailed tests are provided in a companion article [Beebe and Ball, To appear] about high-precision computation of Γ(x) and ψ(x).

We also ran our test programs using a venerable Γ(x) function implementation from L. W. Fullerton’s FNLIB library [Fullerton 1978].

In any program to evaluate a function of a single argument, it is always worthwhile to recall the relationship [Cody 1991, p. 49] between the relative error (ε) in the argument and that in the function:

$$\varepsilon(f) = \frac{x\,f'(x)}{f(x)}\,\varepsilon(x).$$

For the Γ(x) function, this can be written as

$$\varepsilon(\Gamma) = x\,\psi(x)\,\varepsilon(x).$$

The factor xψ(x) grows without bound as x → +∞, so we need only consider its behavior for large x. The largest argument for which Γ(x) is representable in IEEE 754 double-precision arithmetic is about x ≈ 171.6243 . . . , and ψ(171.6243 . . . ) ≈ 5.142392 . . . . Thus, an error in x is magnified by a factor of the product of these numbers, 882.5598 . . . ; since $\log_2 882.56 \approx 9.8$, near the overflow limit a 1-bit error in x corresponds to a 10-bit error in Γ(x)!

ψ(x) is representable for all representable x: at the limit x ≈ 1.797. . . e+308, the magnification factor is xψ′(x)/ψ(x) ≈ 1.408e-3. Thus, for x near the overflow limit, ψ(x) is very insensitive to errors in x. Near the underflow limit for normalized numbers, x = $2^{-1022}$ ≈ 2.22e-308, the magnification factor is about −1.00. A plot of the magnification factor over the range of double-precision floating-point values shows that it remains less than ten in absolute value, except in the interval (1.3201, 1.6128), bracketing the zero of ψ(x) at x ≈ 1.4616 . . . , with asymptotes to ±∞ at that zero. Thus, computation of ψ(x) away from that single zero experiences negligible error magnification.
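The two limiting values quoted above follow from the leading asymptotic behavior of ψ and its derivative (standard asymptotic results): ψ(x) ≈ ln x and ψ′(x) ≈ 1/x as x → +∞, and ψ(x) ≈ −1/x and ψ′(x) ≈ 1/x² as x → 0⁺. Substituting these gives

$$
\frac{x\,\psi'(x)}{\psi(x)} \approx \frac{x \cdot x^{-1}}{\ln x} = \frac{1}{\ln x}
\approx \frac{1}{1024 \ln 2} \approx \frac{1}{709.78} \approx 1.4089 \times 10^{-3}
\qquad (x \approx 1.797 \times 10^{308} \approx 2^{1024}),
$$

$$
\frac{x\,\psi'(x)}{\psi(x)} \approx \frac{x \cdot x^{-2}}{-x^{-1}} = -1
\qquad (x \to 0^{+}).
$$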

Table II collects the accuracy-test results for Γ(x). Several points are to be noted from this table:

—The Cody Γ(x), which we use, has a smaller (by 8%) maximum relative error than the older Fullerton one.

—Use of an extended-precision implementation (labeled “extended native”) produces essentially exact double-precision results, adding two to three decimal places compared to other implementations.

—Suppression of the generation of fused multiply/add instructions on the IBM RS/6000 systems increases the maximum relative error by a factor of 1.61, demonstrating the beneficial effect of such instructions.

—The Intel x86 80-bit precision for intermediate results reduces the error by a factor of 1.65, relative to the 1.32e-13 typical of the Cody code on other systems. Notice how close this is to the improvement factor from IBM fused multiply/add instructions.

—There are no significant differences among Fortran 77, 90, and 95 compilers. Because the SGI and Sun f90 compilers are derived from Cray’s, rather than from the vendors’ earlier f77 compilers, we had expected differences.


Table II. Accuracy of Γ(x) Implementations on Various Systems

For brevity, the CPU and O/S columns of Table I are omitted. ‘Extended native’ means use of an interface to the C library routine long double lgammal(). That data type has 128 bits, except on Intel IA-32, where it has 80 bits. Compiler names are made uniform, although HP and IBM use different names.

Vendor and Model             Γ(x) implementation  Compiler(s)            Max. rel. error   ulps
Apple Power Macintosh        Cody                 f77                    8.00e-14           360
                             gsl                  f77                    3.43e-15            15
Compaq/DEC Alpha 4100-5/266  Cody                 f77                    1.32e-13           594
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77, f90, f95          3.43e-15            15
                             native               f77, f90, f95          1.17e-13           526
HP 9000/735                  Cody                 f77                    1.32e-13           594
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77                    3.35e-15            15
                             native               f77                    1.43e-13           644
IBM RS/6000 44P-270          Cody                 f77                    8.20e-14           369
                             Cody                 f77 (no multiply/add)  1.32e-13           594
                             gsl                  f77, f90, f95          3.43e-15            15
                             native               f77, f90, f95          1.12e-13           504
                             extended native      f77, f90, f95          1.08e-16             0
Intel x86                    Cody                 f77                    8.00e-14           360
                             gsl                  f77                    1.76e-15             7
                             native               f77                    8.00e-14           360
SGI Indigo/2                 Cody                 f77                    1.32e-13           594
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77                    3.41e-15            15
SGI Origin 200               Cody                 f77, f90               1.33e-13           598
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77                    2.58e-15            11
                             gsl                  f90                    2.59e-15            11
                             native               f77, f90               1.43e-13           644
                             extended native      f77, f90               1.10e-16             0
Sun SPARC 10/512             Cody                 f77                    1.32e-13           594
                             gsl                  f77                    3.43e-15            15
                             native               f77                    9.08e-14           408
Sun SPARC 20/512             Cody                 f77                    1.32e-13           594
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77, f90               3.43e-15            15
                             native               f77                    9.96e-14           448
                             extended native      f77                    1.08e-16             0
Sun Enterprise 5500          Cody                 f77                    1.32e-13           594
                             Fullerton            f77                    1.43e-13           644
                             gsl                  f77, f90               3.43e-15            15
                             native               f77, f90               9.96e-14           448
                             extended native      f77, f90               1.08e-16             0

—Only on the Compaq/DEC and Sun systems is the vendor version more accurate than the Cody one.

—Sun’s version has a 30% smaller relative error than those from other vendors with 64-bit registers.

—Sun generously made its high-quality math library, fdlibm, freely available [Sun Microsystems, Inc. 1995], and used it in the Java language runtime library. In 1996, the Free Software Foundation incorporated it in their C run-time library, glibc, thereby making it available on millions of additional systems running GNU/Linux and various BSD UNIX variants. Unfortunately, glibc is not yet supported on most of the architectures that the GNU compilers run on [Free Software Foundation 1996]. IBM OS/390 also uses fdlibm for its C/C++ run-time library [Abbott et al. 1999, p. 753].

—The GNU Scientific Library (gsl) [Galassi et al. 1999] version is uniformly more accurate than all but the extended-precision versions.

We conclude that our choice of Cody’s Γ(x) implementation is certainly justified at present, although when extended-precision versions, or gsl, are available, we must recommend their code over Cody’s. gsl is a large and complex library,⁶ and code interdependencies preclude easy extraction of just the Γ(x) and ψ(x) code from it. However, gsl can be built and installed using the one-line GNUware incantation (./configure && make && make install) on most UNIX and POSIX-conformant systems with IEEE 754 arithmetic.

The only portable avenue for significant improvement would be to replace Cody’s Γ(x) code with an extended-precision version. To our knowledge, no such code is freely available, although the necessary algorithmic work has been published [Char 1980; Fransen and Wrigge 1980; Fransen 1981; Carmignani et al. 1980; Carmignani and Macaluso 1981].

We therefore extended Cody’s work to higher degree and precision, producing routines qgamma() and qpsi(), described in detail in a companion article [Beebe and Ball, To appear]. Once they were available, it was then a simple matter to provide a pair of small interface routines to call them, converting their returned results to double precision.

These routines use the nonstandard quadruple-precision Fortran data type, REAL*16, introduced by IBM in 1967 [Abbott et al. 1999, p. 726]. Most major vendors have followed IBM in supporting this extension, although IBM S/360 mainframe architectures even today remain the only ones with full quadruple-precision support in hardware [Abbott et al. 1999, p. 724].

Fortran 90 extended the Fortran type declaration syntax, so that statements of the form

      INTEGER digits, erange
      PARAMETER (digits = 30)
      PARAMETER (erange = 300)
      REAL (SELECTED_REAL_KIND(digits,erange)) x

could be used to request a type with at least 30 decimal digits, and an exponent range of ±300. Of the Fortran 90 and 95 compilers tested, some did not provide quadruple precision with this syntax, although they did so with the older REAL*16 type declaration! Only one provided quadruple precision on the GNU/Linux systems. Also, when they did not support the requested precision, they refused to compile the code. In such a case, most Fortran 77 compilers warn that the requested precision is not supported, and fall back to a lower precision. Thus, current Fortran 90 and 95 implementations offer no new support here.

⁶Almost 220,000 lines of code at gsl version 1.4.

For the ψ(x) function, things are more difficult. We know of no programming language standards that include this function, and it is not part of fdlibm [Sun Microsystems, Inc. 1995]. Amos [1983] published code for this function and its derivatives, reported to be accurate to about 18 decimal digits (in a Univac 72-bit floating-point system). His code, Fullerton’s FNLIB version for ψ(x) alone, and Cody’s ACM Algorithm 715 [Cody 1993] code are the only ones that we could find in the Netlib archive. We chose Cody’s implementation.

Fortunately, tests of the implementation of ψ(x) against accurate values from MAPLE show better accuracy than those for the Γ(x) function. As before, we prepared a test file containing 10250 pseudo-random integers in the range 0 .. $2^{31} - 1$, logarithmically distributed over the range of x for which ψ(x) is representable, each accompanied by a suitable integer power of two. Across all of the systems shown in Table II, the worst-case maximum relative error was 1.15e-15 (IBM), corresponding to five ulps; most others had maximum relative errors of less than two ulps, and the Intel x86 systems less than one ulp (a result of their longer intermediate precision). The gsl version was even better: its worst-case maximum relative error was 5.88e-16 (HP), or 2.6 ulps, and on all other systems its errors were less than 1.6 ulps.

A second test with 10240 randomly distributed arguments in the interval (1.3201, 1.6128), where the relative error magnification is unavoidably large, shows worse results: maximum relative errors of up to 3.95e-15 (17.8 ulps) on all systems except the IBM and Intel ones, where the errors reached 2.09e-11 (94125 ulps) and 9.72e-13 (4378 ulps), respectively.

9. TESTING THE MACHINE-EPSILON PRIMITIVES

Although the quadrature routines do not require knowledge of the machine epsilon (the smallest power of two that can be added to one to give a result that differs from one), the test programs use it in their reports of relative errors.

In principle, assuming a number system with a base that is a power of two (true on all commercially significant desktop and mainframe systems today), this should be straightforwardly computable from the pseudocode

      epsilon = 1
      while ((1 + epsilon/2) > 1)
          epsilon = epsilon/2

Unfortunately, this simple code fails on several systems.

The Honeywell Series 6000, Intel x86, and Motorola 68K architectures all have floating-point registers that are longer than memory words. On such systems, if the compiler retains the loop variable in a register, an abnormally small value will be produced. The functions deps() and qeps() in our package use this technique, but embed a call to a do-nothing external subroutine to force the loop variable into memory before the call, and reload it after the call. This produces working code on all but one of the test systems, to be discussed next.

IBM RS/6000 and SGI MIPS systems have an unusual quadruple-precision format that uses two double-precision words. A few older systems, such as the DEC PDP-10, also did this. However, the IBM RS/6000 systems depart from past practice in not constraining the exponents of the two words to differ by a constant value. The result is that epsilon is reduced to the smallest representable nonzero value, the machine underflow limit. We have not found a portable way to handle this problem. IBM’s other IEEE 754 implementation, on the IBM S/390 G5 (and later) processors, will not exhibit any such anomaly.

EISPACK has a similar function, epslon(), which works differently. In double precision, it computes

      a = 4.0d+00 / 3.0d+00
      b = a - 1.0d+00
      c = b + b + b
      epslon = dabs(c - 1.0d+00)

This code is easier to understand if we walk through it in fixed-point binary arithmetic:

      a      = 1.01010101...
      b      = 1.01010101... - 1.00000000...
             = 0.01010101...
      c      = 0.01010101... + 0.01010101... + 0.01010101...
             = 0.11111111...
      epslon = abs(0.11111111... - 1.00000000...)
             = 0.00000000...00001

With a p-bit significand, the computation effectively produces epslon = $2^{-(p-1)}$, which is precisely the machine epsilon.

This algorithm could be foiled by compile-time optimization that used longer precision. However, in tests across all of the compilers and systems in Table I, with a range of optimization levels, we found that the EISPACK algorithm appears to work correctly, with two exceptions. On the IBM RS/6000, the quadruple-precision version produces a value half the expected size, and the double-precision version produced two different answers, depending on the optimization level. On Sun Solaris 2.7, f90 version 5.0 produced a quadruple-precision answer four times the correct value; version 6.0 of that compiler, released in mid-2000, corrected that flaw.

We have therefore retained our algorithm using the reduction loop, with a warning that the quadruple-precision version, qeps(), will misbehave on IBM RS/6000 and some Sun systems.

10. TESTING THE EIGENVALUE SOLVER

There are at least nine publicly available, and well-tested, tridiagonal matrix eigenvalue solvers: imtql1(), tql1(), and tqlrat() from EISPACK [Smith et al. 1976; Garbow et al. 1977], and dsteqr(), dsterf(), dstev(), dstevd(), dstevr(), and dstevx() from LAPACK [Anderson et al. 1992; Anderson et al. 1995; Anderson et al. 1999].

We chose tql1(), primarily because it is compact (135 lines) and requires only a single supporting routine (55 lines). By contrast, the LAPACK routines are considerably longer: dstev() cascades into 28 other LAPACK routines, for a total of 6575 lines of LAPACK code. Also, LAPACK uses Fortran CHARACTER*n arguments. These are a significant barrier to interfacing to other languages, because there are several incompatible ways of implementing such arguments. Even though LAPACK is widely, and freely, available, it is not universally installed on all computing systems, and we do not wish to burden the installer of our package with the task of first installing a much larger one.

tql1() is based on an Algol procedure from more than three decades ago [Bowdler et al. 1968]. The only significant change made since the second edition of EISPACK has been to replace three expressions of the form $\sqrt{1 + x^2}$ with calls to a new routine, pythag(a,b), which computes sqrt(a**2 + b**2) using a clever square-root-free iterative algorithm [Dubrulle 1983; Moler and Morrison 1983] that avoids premature destructive underflow and overflow.

During the course of testing our package with extreme values, we found that pythag() would go into an infinite loop when passed Infinity or NaN argument values. This happens because its loop convergence test is of the form

      IF ((4.0d+00 + r) .EQ. 4.0d+00) RETURN

With normal arguments, r converges cubically to zero, so rarely are more than four iterations needed (tcov shows an average of 2.02 iterations). However, should r become Infinity or NaN, the test is never satisfied, and the loop never terminates.

Fortunately, the fix is both easy and portable: before the loop start, insert the equivalent of

      IF (pythag .NE. pythag) RETURN
      IF (pythag .EQ. (pythag + pythag)) RETURN

The first tests for a NaN (the only floating-point value never equal to itself), and the second tests for ±Infinity.

Although these simple tests work on almost all systems, there are compilers that fail to handle them correctly; solutions to this problem are given in Beebe and Ball [To appear].

While convergence tests like that in pythag() are superior to the old machine-dependent programming style of comparing a computed small value against a small magic constant, they do require careful handling of special values in IEEE 754 arithmetic.

Although pythag() is a clever algorithm, profiling shows that it takes about 7.3 flops per result in double precision, and 11.7 in quadruple precision. Comparisons with an alternative function, d2norm(), that uses a suitably scaled call to dsqrt() showed that the average error from d2norm() was about half that from pythag(), and quadrature accuracy improved slightly with the replacement, so we ultimately used d2norm() in our version of tql1().

We also experimented with alternative eigenvalue solvers, using tql1()-like wrappers: there is not a clear winner, although in several tests, the LAPACK routine dstevx() produced notable improvements in quadrature accuracy. The make target demo-lapack in the jacobi and laguerre directories runs tests with the demonstration programs using wrapped LAPACK routines. Because of the LAPACK cascade noted above, we have not attempted to provide quadruple-precision versions of all of the LAPACK routines that would be needed for our quadruple-precision code.

11. TESTING THE GAUSS-LAGUERRE LOG QUADRATURE

From standard integral tables [Gradshteyn and Ryzhik 1965, p. 576, §4.352], we find

$$\int_0^\infty dx\, x^{\nu-1} e^{-\mu x} \ln x = \mu^{-\nu}\,\Gamma(\nu)\left[\psi(\nu) - \ln\mu\right] \qquad (\Re\mu > 0,\ \Re\nu > 0).$$

This provides an analytic result for the Gauss-Laguerre quadrature, Equation (2), with $f(x) = x^{p}$ ($p \ge 0$): substitute μ = 1 and ν = p + α + 1 to obtain

$$\int_0^\infty dx\, x^{\alpha} e^{-x} \ln x \; x^{p} = \Gamma(p + \alpha + 1)\,\psi(p + \alpha + 1). \qquad (9)$$

Test programs tglf1 and tglfd1 read test files with triples (nquad, alpha, pmax), and compute the integral in Equation (9) with a quadrature of order nquad, and also exactly, for p = 0, 1, 2, . . . , pmax. Since the N-point quadrature is, in principle, exact for p ≤ 2N − 1, input data can be used to probe the accuracy for larger p: with an 8-point quadrature and small α, where the effect is most noticeable, ten decimals are precipitously lost when p exceeds this limit.
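The exactness limit p ≤ 2N − 1 can be seen in the smallest nontrivial case. The sketch below uses the classical (nonlogarithmic) two-point Gauss-Laguerre rule with α = 0, whose nodes are $2 \mp \sqrt{2}$ and weights $(2 \pm \sqrt{2})/4$, rather than one of this package's logarithmic rules; the exact moments are $\int_0^\infty x^{p} e^{-x}\,dx = p!$. The function names are hypothetical.

```c
/* Two-point Gauss-Laguerre rule (alpha = 0): nodes 2 -/+ sqrt(2),
   weights (2 +/- sqrt(2))/4. Exact for polynomials of degree <= 3. */
double laguerre2_moment(int p)
{
    const double r2 = 1.4142135623730951;           /* sqrt(2) */
    const double x[2] = { 2.0 - r2, 2.0 + r2 };
    const double w[2] = { (2.0 + r2) / 4.0, (2.0 - r2) / 4.0 };
    double sum = 0.0;
    for (int i = 0; i < 2; ++i) {
        double xp = 1.0;
        for (int k = 0; k < p; ++k)
            xp *= x[i];                              /* x_i^p */
        sum += w[i] * xp;
    }
    return sum;
}

/* Exact moment p! for comparison. */
double factorial(int p)
{
    double f = 1.0;
    for (int k = 2; k <= p; ++k)
        f *= k;
    return f;
}
```

For p = 0 through 3 (that is, up to 2N − 1 with N = 2) the sum reproduces p! to rounding error; for p = 4 it returns 20 instead of 24.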

tglf1 and tglfd1 print the quadrature nodes and weights, followed by the computed approximate and exact integral, and the relative error expressed as a raw number, as a number of ulps, and as a number of bits in error. The Makefile targets check-glf1 and check-glfd1 run the test programs for several test input files, and compare the output with master output files. As long as ndiff is available, the short relative error fields are ignored, and the check output is very short. All of our test programs produce similar output, and have similar check-xxx targets in the Makefile for running the tests and checking their output.

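A second pair of test programs, tglf2 and tglfd2, exploits the relation [Gradshteyn and Ryzhik 1965, p. 576, §4.352]

$$
\begin{aligned}
Q_p &= \int_0^\infty dx\, x^{p} e^{-\mu x} \ln x \qquad (\Re\mu > 0,\ p = 0, 1, 2, \ldots) \qquad (10)\\
    &= \mu^{-(p+1)}\, p! \left[1 + \frac{1}{2} + \cdots + \frac{1}{p} - \gamma - \ln\mu\right]
\end{aligned}
$$

where γ = 0.577 215 664 . . . is the Euler-Mascheroni constant. With μ = 1 and p = 0, and using Equation (2), we have

$$
\int_0^\infty dx\, e^{-x} \ln x = -\gamma = \sum_{i=1}^{N} \delta W_i(\alpha) = \sum_{i=1}^{N} \left[ W_i(\alpha)\,\big(x_i(\alpha) - 1\big) - Z_i(\alpha) \right].
$$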


Thus, simple sums of the weights should evaluate to a well-known constant, although the quadrature nodes are not tested in the first of these sums. The test programs read test files with values of nquad and carry out the quadrature at that order. Neither Γ(x) nor ψ(x) is required in the test code.

In the general case,

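$$
\begin{aligned}
Q_0 &= \mu^{-1}\left[-\gamma - \ln\mu\right] \qquad (\Re\mu > 0)\\
Q_{p+1} &= \mu^{-(p+2)}\,(p+1)!\left[1 + \frac{1}{2} + \cdots + \frac{1}{p+1} - \gamma - \ln\mu\right] \qquad (11)\\
&= \mu^{-1}(p+1)\,\mu^{-(p+1)}\,p!\left[1 + \frac{1}{2} + \cdots + \frac{1}{p} - \gamma - \ln\mu + \frac{1}{p+1}\right]\\
&= \mu^{-1}(p+1)\,Q_p + \mu^{-(p+2)}\,p!.
\end{aligned}
$$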

This recursion provides a fast way to compute the exact integrals, $Q_p$, and, importantly, does not require either the Γ(x) or the ψ(x) function.

With a change of variable, t = μx, we can rewrite Equation (10) as

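\begin{align*}
Q_p &= \int_0^\infty dt\, \frac{1}{\mu} \left( \frac{t}{\mu} \right)^p e^{-t} \ln\left( \frac{t}{\mu} \right) \\
&= \mu^{-(p+1)} \left[ \int_0^\infty dt\, t^p e^{-t} \ln t - \ln\mu \int_0^\infty dt\, t^p e^{-t} \right] \\
&= \mu^{-(p+1)} \left[ \int_0^\infty dt\, t^p e^{-t} \ln t - \ln\mu\, \Gamma(p+1) \right]
\end{align*}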

where the Γ function arises from [Abramowitz and Stegun 1964, p. 255, §6.1.1]. The remaining integral is our Equation (2) with α = p and f(x) = 1.

Test programs tglf3 and tglfd3 read test files with triples (nquad, mu, pmax) and use this relation and our quadrature routines to compute, for p = 0, 1, 2, ..., pmax, integrals that are compared with exact integrals obtained from the recursion, Equation (11). Neither Γ(x) nor ψ(x) is required in the test code.

The test programs described so far use polynomial functions for which the quadrature, in exact arithmetic and for sufficiently large quadrature order, is exact.

We supply a pair of test programs, tglf4 and tglfd4, for a nonpolynomial function, f(x) = sin(σx), for which the quadrature can only be approximate, even in exact arithmetic. The required exact integral is known for the nonlogarithmic case [Gradshteyn and Ryzhik 1965, p. 490]:

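\begin{align}
S(\alpha, \sigma) &= \int_0^\infty dx\, x^\alpha e^{-x} \sin(\sigma x) \tag{12} \\
&= \Gamma(1+\alpha)\, (1+\sigma^2)^{-(1+\alpha)/2} \sin\bigl((1+\alpha)\theta\bigr) \tag{13}
\end{align}

where θ = arctan σ.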


To obtain the exact integral in the logarithmic case, first recall the relation [Abramowitz and Stegun 1964, p. 258]

\begin{equation}
\frac{d\Gamma(z+c)}{dz} = \Gamma(z+c)\, \psi(z+c), \tag{14}
\end{equation}

and the standard differential formula [Beyer 1978, p. 379]

\begin{equation}
\frac{dx^\beta}{d\beta} = x^\beta \ln x, \tag{15}
\end{equation}

then differentiate Equation (12) and Equation (13) with respect to α:

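\begin{align}
\frac{dS(\alpha, \sigma)}{d\alpha} &= \int_0^\infty dx\, x^\alpha e^{-x} \ln(x) \sin(\sigma x) \tag{16} \\
&= S(\alpha, \sigma) \left[ \psi(1+\alpha) + \frac{\theta}{\tan\bigl((1+\alpha)\theta\bigr)} - \frac{1}{2} \ln(1+\sigma^2) \right]. \tag{17}
\end{align}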

Because sin x ≈ x for small x, the integrals for small σ should approach σ times the integrals for f(x) = x, which we have already considered. For large σ, the integrands become so oscillatory that we cannot hope for accurate quadrature.

The test programs tglf4 and tglfd4 read test files with triples (nquad, sigma, alpha) and evaluate the integral of Equation (16) both with our quadrature and with Equation (17). The attainable accuracy deteriorates with increasing σ, but for favorable σ values it is about 14 decimal digits at modest quadrature order.

The test programs tglf5 and tglfd5 read test files with triples (nquad, alpha, pmax) and compute the nonlogarithmic integral in Equation (3) for f(x) = x^p, both with a quadrature of order nquad and exactly, for p = 0, 1, 2, ..., pmax.

The test programs tglf6 and tglfd6 read test files with triples (nquad, sigma, alpha) and evaluate the nonlogarithmic integral, Equation (12), for f(x) = sin(σx), both with our quadrature and with the exact result, Equation (13).

The companion programs tqglf1, tqglfd1, ..., tqglf6, and tqglfd6 provide similar testing of the quadruple-precision quadratures.

Because the derivative-free quadrature formula, Equation (2), involves subtractions, it is reasonable to ask whether they are a significant source of inaccuracy. Instrumenting one of the test programs shows that one can readily find individual terms in the summation where up to 20 bits are lost, yet the computed quadrature remains quite accurate. As a further experiment, for α in the range [−1 + 1/256, 10] in steps of 1/256, quadrature order n = 50, and f(x) = x^p for p = 0, 1, 2, ..., 99, we compared the relative errors from Equation (2) with those from the subtraction-free quadrature, Equation (1). The largest ratio of relative errors was about 1450, corresponding to a loss of about three decimal digits; such large errors were found only for high p values.

12. TESTING THE GAUSS-JACOBI LOG QUADRATURE

MAPLE is unable to integrate Equation (6) analytically, and when either α < 1 or β < 1, its adaptive high-precision numerical quadrature also fails. When it succeeds in producing a numeric result, it does so in a few seconds.

We require a simple test function, f(x), for which the integral in Equation (6) can be done analytically. Standard integral tables [Beyer 1978; Gradshteyn and Ryzhik 1965] do not contain any obvious candidates, but we can derive one as follows.

We start with the analytic result [Beyer 1978, p. 437]

\begin{align}
J(\alpha, \beta) &= \int_0^1 dy\, (1-y)^\alpha y^\beta \tag{18} \\
&= \frac{\Gamma(\alpha+1)\, \Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}. \tag{19}
\end{align}

We now apply the two derivative formulas, Equation (14) and Equation (15), to Equation (18) and Equation (19), producing

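\begin{align}
\frac{dJ(\alpha, \beta)}{d\beta} &= \int_0^1 dy\, (1-y)^\alpha y^\beta \ln y \tag{20} \\
&= J(\alpha, \beta) \left[ \psi(\beta+1) - \psi(\alpha+\beta+2) \right]. \tag{21}
\end{align}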

Next, we change variables in Equation (18) via y = (1 + x)/2 and simplify to get

\begin{equation}
J(\alpha, \beta) = 2^{-(\alpha+\beta+1)} \int_{-1}^{+1} dx\, (1-x)^\alpha (1+x)^\beta. \tag{22}
\end{equation}

Differentiating this with respect to β, we find

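\begin{equation}
\frac{dJ(\alpha, \beta)}{d\beta} = 2^{-(\alpha+\beta+1)} \int_{-1}^{+1} dx\, (1-x)^\alpha (1+x)^\beta \ln\left( \frac{1+x}{2} \right). \tag{23}
\end{equation}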

This integral is one for which we have quadrature formulas, Equation (4) and Equation (5), and the choice f(x) = (1 + x)^p, where p ≥ 0, gives us a range of test functions whose exact integrals are

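\begin{align}
I(p, \alpha, \beta) &= \int_{-1}^{+1} dx\, (1-x)^\alpha (1+x)^\beta \ln\left( \frac{1+x}{2} \right) (1+x)^p \tag{24} \\
&= 2^{\alpha+\beta+p+1}\, J(\alpha, \beta+p) \left[ \psi(\beta+p+1) - \psi(\alpha+\beta+p+2) \right]. \tag{25}
\end{align}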

Test programs tgjf1 and tgjfd1 read test files with quadruples (nquad, alpha, beta, pmax) and compute the integrals in Equation (24) for p = 0, 1, 2, ..., pmax, using our two forms of Gauss-Jacobi quadrature, and compare the results with the exact analytic ones.

Test programs tgjf2 and tgjfd2 are similar to these, but for the nonlogarithmic integrals in Equation (22).

The companion programs tqgjf1, tqgjfd1, tqgjf2, and tqgjfd2 provide similar testing of the quadruple-precision quadratures.

13. CONCLUSION

The new types of orthogonal polynomials introduced in the previous paper [Ball and Beebe 2007] have been implemented in software usable from several popular programming languages for the numerical evaluation of integrals with weight functions of logarithmic and nonlogarithmic type for the Gauss-Legendre, Gauss-Chebyshev, Gauss-Jacobi (or Mehler), and Gauss-Laguerre cases.


In exact arithmetic, all such quadratures of order N are exact for integration of polynomial functions of degree up to 2N − 1.

Generation of the nodes and weights takes O(N²) operations; subsequent quadratures require O(2N) (logarithmic) or O(N) (nonlogarithmic) function evaluations.

The exponent range of the IEEE 754 64-bit floating-point format limits the quadrature order for logarithmic weights to about 100 in practice, although with additional scaling, that order can be approximately doubled. The Gauss-Legendre case can be taken up to about order 530 before Infinities and NaNs are produced.

Extensive testing compared several implementations of the Γ(x) and ψ(x) functions and produced recommendations for choosing among them. Numerical experiments suggest that little improvement in these functions is possible without going beyond double precision.

Quadruple-precision code for the Γ(x) and ψ(x) functions, accurate to 29 or more digits, has been developed and is described in a companion article [Beebe and Ball, To appear].

Testing exposed a serious bug (an infinite loop) in the EISPACK primitive pythag() that is triggered by Infinity or NaN arguments. A simple and portable fix for both cases has been implemented in the version distributed with our package, although our tql1() and qtql1() routines now use functionally equivalent replacements, d2norm() and q2norm().

Finally, vagaries of the software implementation of quadruple-precision arithmetic on IBM RS/6000 Power and PowerPC systems have been exposed and identified, although we cannot yet offer a portable solution to the problem of robustly determining the machine epsilon. This is the only place in the entire package where machine-specific code appears to be necessary; the rest of the package is completely portable across systems that provide IEEE 754 arithmetic.
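For context, the usual run-time way to find the machine epsilon is a halving loop like the C sketch below, shown here for double precision only; the function name is hypothetical. On an IEEE 754 binary64 system it returns 2^−52. We assume, as is commonly the case for IBM's 128-bit long double, that the software quadruple format is a pair of doubles. In such a format 1 + ε/2 can remain exactly representable far below the nominal precision, which is one reason a loop of this shape is not a dependable probe there.

```c
/* Hypothetical helper: classic run-time estimate of the double-precision
   machine epsilon.  Halve eps until 1 + eps/2 no longer exceeds 1.  The
   volatile store forces each sum to be rounded to double, which guards
   against excess precision in registers (e.g., x87 extended). */
double machine_epsilon(void)
{
    double eps = 1.0;
    volatile double sum;
    for (;;) {
        sum = 1.0 + 0.5 * eps;
        if (sum <= 1.0)
            break;                  /* 1 + eps/2 rounded back to 1 */
        eps *= 0.5;
    }
    return eps;
}
```

With round-to-nearest-even, 1 + 2^−53 is a tie that rounds to 1, so the loop stops with eps = 2^−52.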

REFERENCES

ABBOTT, P. H., BRUSH, D. G., CLARK III, C. W., CRONE, C. J., EHRMAN, J. R., EWART, G. W., GOODRICH, C. A., HACK, M., KAPERNICK, J. S., MINCHAU, B. J., SHEPARD, W. C., SMITH, R. M., SR., TALLMAN, R., WALKOWIAK, S., WATANABE, A., AND WHITE, W. R. 1999. Architecture and software support in IBM S/390 Parallel Enterprise Servers for IEEE floating-point arithmetic. IBM J. Resear. Develop. 43, 5/6, 723–760.

ABRAMOWITZ, M. AND STEGUN, I. A., EDS. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series, vol. 55. U.S. Department of Commerce, Washington, DC.

ADAMS, J. C., BRAINERD, W. S., MARTIN, J. T., SMITH, B. T., AND WAGENER, J. L. 1997. Fortran 95 Handbook: Complete ISO/ANSI Reference. MIT Press, Cambridge, MA.

AMOS, D. E. 1983. Algorithm 610: A portable FORTRAN subroutine for derivatives of the psi function. ACM Trans. Math. Softw. 9, 4 (Dec.), 494–502.

ANDERSON, E., BAI, Z., BISCHOF, C., BLACKFORD, S., DEMMEL, J., DONGARRA, J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., AND SORENSEN, D. 1999. LAPACK Users' Guide, 3rd ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.

ANDERSON, E., BAI, Z., BISCHOF, C., DEMMEL, J., DONGARRA, J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., OSTROUCHOV, S., AND SORENSEN, D. 1995. LAPACK Users' Guide, 2nd ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.

ANDERSON, E., BAI, Z., BISCHOF, C., DEMMEL, J., DONGARRA, J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., OSTROUCHOV, S., AND SORENSEN, D. 1992. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA.

ANDERSON, I. J. 1999. A distillation algorithm for floating-point summation. SIAM J. Sci. Comput. 20, 5 (Sept.), 1797–1806.

ANSI. 1978. American National Standard Programming Language FORTRAN: approved April 3, 1978, American National Standards Institute, Inc., ANSI X3.9-1978. Revision of ANSI X3.9-1966, Rev. Ed. American National Standards Institute, 1430 Broadway, New York, NY. http://observer.gsfc.nasa.gov/iteams/doc/ansi f77.ps, http://observer.gsfc.nasa.gov/iteams/doc/f77.doc, http://observer.gsfc.nasa.gov/iteams/doc/f77 cov.pdf, and http://observer.gsfc.nasa.gov/iteams/doc/f77 doc.pdf.

ANSI. 1986. ANSI X3.4-1986, Code for Information Interchange. American National Standards Institute, 1430 Broadway, New York, NY 10018.

APPLE COMPUTER, INC., IBM CORPORATION, AND MOTOROLA, INC. 1995. PowerPC Microprocessor Common Hardware Reference Platform: A System Architecture. Morgan Kaufmann Publishers.

BALL, J. S. AND BEEBE, N. H. F. 2007. Efficient Gauss-related quadrature for two classes of logarithmic weight functions. ACM Trans. Math. Softw. 33, 3, Article 9.

BEEBE, N. H. F. Algorithm ndiff—A numeric file difference utility. ACM Trans. Math. Softw. To appear. http://www.math.utah.edu/~beebe/software/ndiff.

BEEBE, N. H. F. AND BALL, J. S. Algorithm Quadruple-precision Γ(x) and ψ(x) functions for real arguments. ACM Trans. Math. Softw. To appear.

BEYER, W. H., ED. 1978. CRC Handbook of Mathematical Sciences, 5th ed. CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, FL.

BOWDLER, H. J., MARTIN, R. S., REINSCH, C., AND WILKINSON, J. H. 1968. The QR and QL algorithms for symmetric matrices. Numerische Mathematik 11, 293–306.

CARMIGNANI, M. AND MACALUSO, A. T. 1981. Computation of the special functions Γ(x), log Γ(x), β(x, y), erf(x), erfc(x) to a high degree of precision. Atti Accad. Sci. Lett. Arti Palermo Ser. (5) 2, 1, 7–25 (1985).

CARMIGNANI, M., PULEO, G., AND MACALUSO, A. T. 1980. Calculating to high precision the Euler–Mascheroni constant and generalized harmonic series. First applications to the calculation of the function Γ(x). Atti Accad. Sci. Lett. Arti Palermo Parte I (4) 40, 2, 211–223 (1984).

CASANOVA, H., DONGARRA, J., AND DOOLIN, D. M. 1997. Java access to numerical libraries. Concurrency: Prac. Exper. 9, 11 (Nov.), Special Issue: Java for Computational Science and Engineering—Simulation and Modeling II. 1279–1291.

CHAR, B. W. 1980. On Stieltjes' continued fraction for the gamma function. Math. Comput. 34, 150 (April), 547–551.

COBALT BLUE, INC. 1988. FOR C: Fortran 77 to C translator. 11585 Jones Bridge Rd, Ste 420-306, Alpharetta, GA. http://www.cobalt-blue.com.

CODY, W. J. 1991. Performance evaluation of programs related to the real gamma function. ACM Trans. Math. Softw. 17, 1 (March), 46–54.

CODY, W. J. 1993. Algorithm 715: SPECFUN—A portable FORTRAN package of special function routines and test drivers. ACM Trans. Math. Softw. 19, 1 (March), 22–32.

DEMMEL, J. AND HIDA, Y. 2003. Accurate and efficient floating point summation. SIAM J. Sci. Comput. 25, 4 (Dec.), 1214–1248.

DODSON, D. S. 1983. Corrigendum: Remark on Algorithm 539: Basic Linear Algebra Subroutines for FORTRAN Usage. ACM Trans. Math. Softw. 9, 1 (March), 140–140. See [Lawson et al. 1979a; Dodson and Grimes 1982; Hanson and Krogh 1987; Louter-Nool 1988].

DODSON, D. S. AND GRIMES, R. G. 1982. Remark on Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage [F1]. ACM Trans. Math. Softw. 8, 4 (Dec.), 403–404.

DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND DUFF, I. 1990a. Algorithm 679: A set of Level 3 Basic Linear Algebra Subprograms: Model implementation and test programs. ACM Trans. Math. Softw. 16, 1 (March), 18–28.

DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND DUFF, I. 1990b. A set of Level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Softw. 16, 1 (March), 1–17.


DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND HANSON, R. J. 1988a. Algorithm 656: An extended set of Basic Linear Algebra Subprograms: Model implementation and test programs. ACM Trans. Math. Softw. 14, 1 (March), 18–32.

DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND HANSON, R. J. 1988b. Corrigenda: An Extended Set of FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Softw. 14, 4 (Dec.), 399–399.

DONGARRA, J. J., DU CROZ, J., HAMMARLING, S., AND HANSON, R. J. 1988c. An extended set of FORTRAN Basic Linear Algebra Subprograms. ACM Trans. Math. Softw. 14, 1 (March), 1–17.

DONGARRA, J. J., MOLER, C. B., BUNCH, J. R., AND STEWART, G. W. 1979. LINPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA.

DOOLIN, D., DONGARRA, J., AND SEYMOUR, K. 1999. JLAPACK—compiling LAPACK Fortran to Java. Scient. Program. 7, 2, 111–138. http://www.cs.utk.edu/f2j/.

DUBRULLE, A. A. 1983. Class of numerical methods for the computation of Pythagorean sums. IBM J. Res. Develop. 27, 6 (Nov.), 582–589.

ESPELID, T. O. 1995. On floating-point summation. SIAM Rev. 37, 4 (Dec.), 603–607.

EVANS, D. 1998. LCLint: a tool for statically checking C programs. World-Wide Web document and source code. http://lclint.cs.virginia.edu/.

FRANSEN, A. 1981. Addendum and corrigendum to: High-precision values of the gamma function and of some related coefficients [Math. Comp. 34 (1980), no. 150, 553–566, MR 81f:65004] by Fransen and S. Wrigge. Mathem. Computat. 37, 155 (July), 233–235.

FRANSEN, A. AND WRIGGE, S. 1980. High-precision values of the gamma function and of some related coefficients. Mathem. Computat. 34, 150 (Apr.), 553–566.

FREE SOFTWARE FOUNDATION. 1996. Porting the GNU C library. http://www.gnu.org/software/libc/porting.html.

FREE SOFTWARE FOUNDATION. 1998. Makefile conventions. ftp://ftp.gnu.org/gnu/GNUinfo/standards.*.

FULLERTON, L. W. 1978. FNLIB: Special function library. Developed at Los Alamos National Laboratory. ftp://ftp.netlib.org/fn/.

GALASSI, M., DAVIES, J., THEILER, J., GOUGH, B., PRIEDHORSKY, R., JUNGMAN, G., AND BOOTH, M. 1999. GNU Scientific Library—Reference Manual. Free Software Foundation, 675 Mass Ave, Cambridge, MA. Edition 0.5+, for gsl-0.5+. ftp://sourceware.cygnus.com/pub/gsl and ftp://alpha.gnu.org/gnu/.

GARBOW, B. S., BOYLE, J. M., DONGARRA, J. J., AND MOLER, C. B. 1977. Matrix eigensystem routines—EISPACK guide extension. Lecture Notes in Computer Science, G. Goos and J. Hartmanis, Eds. vol. 51. Springer-Verlag, Berlin, Germany.

GAY, D. M., FELDMAN, S., MAIMONE, M., AND SCHRYER, N. 1989. f2c: A Fortran to C converter. ftp://netlib.bell-labs.com/netlib/f2c/.

GORDON, R. AND MCCLELLAN, A. 1998. Essential JNI: Java Native Interface. Prentice-Hall, Englewood Cliffs, NJ.

GRADSHTEYN, I. S. AND RYZHIK, I. M. 1965. Table of Integrals, Series, and Products, 4th ed. Academic Press, New York, NY.

HAL COMPUTER SYSTEMS, INC. 2000. SPARC64-GP processor. http://mpd.hal.com/products/SPARC64-GP.html.

HANSON, R. J. AND KROGH, F. T. 1987. Algorithm 653: Translation of Algorithm 539: PC-BLAS Basic Linear Algebra Subprograms for FORTRAN usage with the INTEL 8087, 80287 Numeric Data Processor. ACM Trans. Math. Softw. 13, 3 (Sept.), 311–317.

HIGH PERFORMANCE FORTRAN FORUM. 1992. High Performance Fortran Language Specification, V 0.4. High Performance Fortran Forum. http://www.math.utah.edu/pub/tex/bib/index-table-h.html#hpfortran (extensive bibliography).

HIGHAM, N. J. 1996. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

IBM CORPORATION. 1994. The PowerPC Architecture: A Specification for a New Family of RISC Processors, 2nd ed. Morgan Kaufmann Publishers.

IEC. 1989. IEC 60559 (1989-01): Binary Floating-Point Arithmetic for Microprocessor Systems. International Electrotechnical Commission, 3, rue de Varembe, PO Box 131, CH-1211 Geneva 20, Switzerland.


IEEE. 1985. ANSI/IEEE 754-1985, Standard for Binary Floating-Point Arithmetic. IEEE, New York, NY.

INTEL CORPORATION. 2000. Intel IA-64 Architecture Software Developer's Manual. Volume 1: IA-64 Application Architecture. Intel Corporation, Santa Clara, CA. http://developer.intel.com/design/ia-64/downloads/245317.htm.

ISO. 1983. ISO Standard 646, 7-Bit Coded Character Set for Information Processing Interchange, 2nd ed. International Organization for Standardization, Geneva, Switzerland.

ISO. 1988. ISO 9660:1988: Information processing: volume and file structure of CD-ROM for information interchange = Traitement de l'information: structure de volume et de fichier des disques optiques compacts a memoire fixe (CD-ROM) destines a l'echange d'information. International Organization for Standardization, Geneva, Switzerland.

ISO. 1990. ISO/IEC 9899:1990: Programming Languages—C. International Organization for Standardization, Geneva, Switzerland.

ISO. 1991. ISO/IEC 646:1991 Information Technology—ISO 7-Bit Coded Character Set for Information Interchange. International Organization for Standardization, Geneva, Switzerland.

ISO. 1997. ISO/IEC 1539-1:1997: Information technology—Programming languages—Fortran—Part 1: Base language. International Organization for Standardization, Geneva, Switzerland.

ISO. 1998. ISO/IEC 14882:1998: Programming Languages—C++. International Organization for Standardization, Geneva, Switzerland. http://webstore.ansi.org/ and http://www.cssinfo.com/.

ISO. 1999. ISO/IEC 9899:1999: Programming Languages—C. International Organization for Standardization, Geneva, Switzerland. http://webstore.ansi.org/ and http://www.cssinfo.com/.

ISO AND IEC. 1991. International Standard: Information, Technology, Programming Languages, Fortran, 2nd ed. International Organization for Standardization, Geneva, Switzerland.

KANE, G. 1996. PA-RISC 2.0 Architecture. Prentice-Hall, Englewood Cliffs, NJ.

KANE, G. AND HEINRICH, J. 1992. MIPS RISC Architecture. Prentice-Hall, Englewood Cliffs, NJ.

KNUTH, D. E. 1984. The TeXbook. Addison-Wesley, Reading, MA.

LAWSON, C. L., HANSON, R. J., KINCAID, D. R., AND KROGH, F. T. 1979a. Algorithm 539: Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 3 (Sept.), 324–325.

LAWSON, C. L., HANSON, R. J., KINCAID, D. R., AND KROGH, F. T. 1979b. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 3 (Sept.), 308–323.

LIANG, S. 1999. Java Native Interface: Programmer's Guide and Specification. Addison-Wesley, Reading, MA.

LOUTER-NOOL, M. 1988. Algorithm 663: Translation of Algorithm 539: Basic linear algebra subprograms for FORTRAN usage in FORTRAN 200 for the Cyber 205. ACM Trans. Math. Softw. 14, 2 (June), 177–195.

MACKENZIE, D. 1992. GNU Autoconf: A package for creating scripts to configure source code packages using templates and an m4 macro package. ftp://ftp.gnu.org/gnu/autoconf/.

MCNAMEE, J. M. 2004. A comparison of methods for accurate summation. SIGSAM Bull. 38, 1 (March), 1–7.

MOLER, C. AND MORRISON, D. 1983. Replacing square roots by Pythagorean sums. IBM J. Resear. Develop. 27, 6 (Nov.), 577–581.

MONIOT, R. K. 1991. ftnchek: a static analyzer for Fortran 77 programs. http://dsm.dsm.fordham.edu/~ftnchek/.

NIEVERGELT, Y. 2003. Scalar fused multiply-add instructions produce floating-point matrix arithmetic provably accurate to the penultimate digit. ACM Trans. Math. Softw. 29, 1 (March), 27–48.

PRICE, D. T. 1996. Remark on Algorithm 715. ACM Trans. Math. Softw. 22, 2 (June), 258–258.

SANTO ORCERO, D. 2000. The code analyser LCLint. Linux J. 73, 100, 102–104. http://www2.linuxjournal.com/lj-issues/issue73/3599.html.

SCHILDT, H., ANSI, ISO, IEC, AND ISO/IEC JTC 1. 1990. The Annotated ANSI C Standard: American National Standard for Programming Languages C: ANSI/ISO 9899-1990. Osborne/McGraw-Hill.


SCHWARZ, E. M. AND KRYGOWSKI, C. A. 1999. The S/390 G5 floating-point unit. IBM J. Resear. Develop. 43, 5/6, 707–721.

SEYMOUR, K. AND DONGARRA, J. 2003. Automatic translation of Fortran to JVM bytecode. Concurrency Comput.: Prac. Exper. 15, 3–5 (Mar./Apr.), 207–222.

SLEGEL, T. J., AVERILL III, R. M., CHECK, M. A., GIAMEI, B. C., KRUMM, B. W., KRYGOWSKI, C. A., LI, W. H., LIPTAY, J. S., MACDOUGALL, J. D., MCPHERSON, T. J., NAVARRO, J. A., SCHWARZ, E. M., SHUM, K., AND WEBB, C. F. 1999. IBM's S/390 G5 microprocessor. IEEE Micro 19, 2 (Mar./Apr.), 12–23.

SMITH, B. T., BOYLE, J. M., DONGARRA, J. J., GARBOW, B. S., IKEBE, Y., KLEMA, V. C., AND MOLER, C. B. 1976. Matrix Eigensystem Routines—EISPACK Guide. Lecture Notes in Computer Science, G. Goos and J. Hartmanis, Eds. vol. 6. Springer-Verlag, Berlin, Germany.

STORY, S. AND TANG, P. T. P. 1999. New algorithms for improved transcendental functions on IA-64. In Proceedings of the 14th IEEE Symposium on Computer Arithmetic. Adelaide, Australia, I. Koren and P. Kornerup, Eds. IEEE Computer Society Press, Silver Spring, MD, 4–11.

SUN MICROSYSTEMS, INC. 1995. fdlibm: A freely distributable math library. ftp://ftp.netlib.org/fdlibm/.

VAUGHAN, G. V., ELLISTON, B., TROMEY, T., AND TAYLOR, I. L. 2000. GNU Autoconf, Automake and Libtool. New Riders Publishing, Carmel, IN.

WEISS, S. AND SMITH, J. E. 1994. Power and PowerPC: Principles, Architecture, Implementation. Morgan Kaufmann Publishers.

WILKINSON, J. H. AND REINSCH, C., EDS. 1971. Linear Algebra: Handbook for Automatic Computation, vol. II, F. L. Bauer, A. S. Householder, F. W. J. Olver, H. Rutishauser, K. Samelson and E. Stiefel, Eds. Springer-Verlag, Berlin, Germany.

YEAGER, K. C. 1996. The MIPS R10000 superscalar microprocessor—emphasizing concurrency and latency-hiding techniques to efficiently run large, real-world applications. IEEE Micro 16, 2 (Apr.), 28–40.

ZHENG, Q., WU, Z., FOX, G., AND LI, X. 1998. F2j: A prototype of Fortran-to-Java converter. http://www.npac.syr.edu/projects/pcrc/f2j.html.

Received November 2004; revised January 2006; accepted April 2006
