Photogrammetric Engineering & Remote Sensing, Vol. 60, No. 9, September 1994, pp. 1097-1103.
0099-1112/94/0009-000$3.00/0
© 1994 American Society for Photogrammetry and Remote Sensing

Inverse-Distance-Weighted Spatial Interpolation Using Parallel Supercomputers

Marc P. Armstrong and Richard Marciano*

Departments of Geography and Computer Science and Program in Applied Mathematical and Computational Sciences; and Gerard Weeg Computer Center and Department of Computer Science, respectively; 316 Jessup Hall, The University of Iowa, Iowa City, IA 52242.
*R. Marciano is presently with the National Supercomputer Center for Energy and the Environment, University of Nevada, Las Vegas, Las Vegas, NV 89154.

Abstract
Interpolation is a computationally intensive activity that may require hours of execution time to produce results when large problems are considered. In this paper we describe a strategy to reduce computation times through the use of parallel processing. To achieve this goal, a serial algorithm that performs two-dimensional inverse-distance-weighted interpolation is decomposed into a form suitable for parallel processing in two shared-memory computing environments. The first uses a conventional architecture with a single monolithic memory, while the second uses a hierarchically organized collection of local caches to implement a large shared virtual address space. A series of computational experiments was conducted in which the number of processors used in parallel is systematically increased. The results show a substantial reduction in total processing time and speedups that are close to linear when additional processors are used. The general approach described in this paper can be used to improve the performance of other types of computationally intensive interpolation problems.

Introduction
The computation of a two-dimensional gridded surface from a dispersed set of control points with known values is a fundamental operation in automated cartography. Though several methods have been developed to accomplish this task (Lam, 1983; Burrough, 1986), inverse-distance-weighted interpolation is widely applied and is available in many commercial GIS software environments. For large problems, however, inverse-distance-weighted interpolation can require substantial amounts of computation. The purpose of this paper is to demonstrate how parallel processing can be used to improve the computational performance of an inverse-distance-weighted interpolation algorithm when it is applied to a large problem.

Background
The underlying assumption of inverse-distance-weighted interpolation is that of positive spatial autocorrelation (Cromley, 1992): the contribution of control points near to a grid location with an unknown value is greater than that of distant control points. This assumption is embedded in the following equation:

$$z_j = \frac{\sum_{i=1}^{N} w_{ij}\, z_i}{\sum_{i=1}^{N} w_{ij}}$$

where
$z_j$ is the estimated value at grid location $j$,
$z_i$ is the known value at control point location $i$, and
$w_{ij}$ is the weight that controls the effect of control point $i$ on the calculation of $z_j$.

Often $w_{ij}$ is set equal to $d_{ij}^{-a}$, where $d_{ij}$ is some measure of distance and $a$ is often 1.0 or 2.0 (Hodgson, 1992). As the value of the exponent ($a$) increases, close control points contribute a greater proportion to the value of each interpolated grid location (Mitchell, 1977, p. 256).

In this formulation, all control points ($z_i$) would contribute to the calculation of an interpolated value at cell $z_j$. Given the assumption of positive spatial autocorrelation, however, it is common to restrict computation to some neighborhood ($k$) of $z_j$. This is often done by setting a limit on the number of points used to compute the $z_j$ values. The choice of an appropriate value for $k$ has been the source of some discussion in the literature (Hodgson, 1992). The now-ancient SYMAP algorithm (Shepard, 1968), for example, attempts to ensure that between four and ten data points are used (Monmonier, 1982) by varying the search radius about each $z_j$. If fewer than four points are found within an initial radius, the radius is expanded; if too many (e.g., more than ten) points are found, the radius is contracted. MacDougall (1976) also implements a similar approach to neighborhood specification.
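To make the computation concrete, the following C sketch estimates a single grid cell with the equation above, selecting the k nearest control points by brute force. The function name and argument layout are our own illustration (assuming k <= n), not code from the paper.

#include <math.h>
#include <float.h>
#include <stdlib.h>

/* Estimate the value at grid location (gx, gy) from n control points
   (x[i], y[i], z[i]) using the k nearest points and weights d^(-a).
   Brute force: every control point is examined for every grid cell. */
double idw_cell(double gx, double gy, const double *x, const double *y,
                const double *z, int n, int k, double a)
{
    char *used = calloc(n, 1);        /* marks control points already selected */
    double num = 0.0, den = 0.0;

    for (int m = 0; m < k; m++) {     /* pick the k nearest, one per pass */
        int best = -1;
        double bestd = DBL_MAX;
        for (int i = 0; i < n; i++) {
            if (used[i]) continue;
            double d = hypot(gx - x[i], gy - y[i]);
            if (d < bestd) { bestd = d; best = i; }
        }
        used[best] = 1;
        if (bestd == 0.0) {           /* cell coincides with a control point */
            free(used);
            return z[best];
        }
        double w = pow(bestd, -a);    /* w = d^(-a) */
        num += w * z[best];           /* accumulate numerator: sum of w*z */
        den += w;                     /* accumulate denominator: sum of w */
    }
    free(used);
    return num / den;
}

Each cell costs O(nk) distance evaluations here, which is what makes the method "brute force" and why the timings discussed below grow so quickly with the number of control points.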

While this process is conceptually rational, it involves considerable computation because, for each grid point, the distance between it and all control points must be evaluated to determine those that are closest. MacDougall (1984), for example, demonstrated that computation times increased dramatically with the number of points used to interpolate a 24 by 80 map grid; this grid size was selected because it was the resolution of alphanumeric CRT displays popular at that time. Using an 8-bit, 2-MHz microcomputer and interpreted BASIC, he found that computation times increased from a little more than 30 minutes when three control points were used, to almost 13 hours when calculations were based on 100 control points.



Though most workstations and mainframes would now compute this small problem in a few seconds, the need for high performance computing can be established by increasing the problem size from 24 by 80 to a larger 240 by 800 grid with an increased number of control points (10,000); this roughly maintains the same ratio of control points to grid points used by MacDougall (1984). When the problem is enlarged in this way, a similar computational barrier is encountered: using a fast workstation, an RS/6000-550, and an interpolation algorithm that uses the three closest control points, this larger and more realistic problem required 7 hours and 27 minutes execution time. This is consistent with many historical trends in computing; as machine speeds increase, so does the size of the problems we wish to solve, and, consequently, there is a continuing need to reduce computation times (Freund and Siegel, 1993).

The interpolation process described above has been called a "brute-force" approach (Hodgson, 1989), and researchers have worked to overcome the computational intractabilities associated with this method. White (1984) demonstrated the performance advantage of using integer (as opposed to floating point) distance calculations. MacDougall (1984) showed how computation times could be reduced by using a distance estimation technique that does not require the calculation of square roots. This estimation technique, however, introduces small errors into the interpolation process because distances along diagonals are underestimated by approximately 10 percent.
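Square-root-free estimators of this kind typically replace the Euclidean form with a weighted combination of the axis offsets. The C sketch below uses the classic max-plus-quarter-min rule as a stand-in; it is illustrative only and is not claimed to be MacDougall's exact formula, though its error behavior is similar (exact along the axes, pure diagonals underestimated by roughly 12 percent).

#include <math.h>

/* Estimate sqrt(dx*dx + dy*dy) without a square root.
   With coefficients 1 and 0.25, the estimate is exact along the axes
   and returns 1.25*d instead of 1.414*d on a pure diagonal,
   an underestimate of about 12 percent. */
double approx_dist(double dx, double dy)
{
    double ax = fabs(dx), ay = fabs(dy);
    double hi = ax > ay ? ax : ay;
    double lo = ax > ay ? ay : ax;
    return hi + 0.25 * lo;   /* no sqrt needed */
}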

Other efforts have focused on reducing the number of computations made. Originally, the neighborhood for interpolation was established by computing distances between each grid point and all control points and then ordering these distances to find the k-nearest. According to Hodgson (1989), ERDAS adopted a strategy to reduce computation that uses a local approach to interpolation; a 3 by 3 window is located over each grid cell for which a value must be interpolated, and only those control points that are in the window are candidates for inclusion in the computation of the interpolated cell value. Hodgson (1989), however, states that, while this approach reduces computation, it is not guaranteed to find the k-nearest neighbors. Clarke (1990, p. 202) discusses this problem further and provides C code to implement this local approach to interpolation.
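One way to realize such a local window, sketched below under our own assumptions (control points pre-binned into the grid cells that contain them; the array names are hypothetical, and this is neither Clarke's nor ERDAS's code), is to gather candidates only from the nine cells around the target cell:

#define ROWS 240
#define COLS 800
#define MAXBIN 16   /* assumed cap on control points per cell */

int bin[ROWS][COLS][MAXBIN];  /* indices of the control points in each cell */
int cnt[ROWS][COLS];          /* how many control points fell in each cell  */

/* Collect candidate control points for cell (r, c) from the 3 by 3 window.
   Returns the number of candidate indices written to out[]. */
int gather_candidates(int r, int c, int out[9 * MAXBIN])
{
    int m = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;
            if (rr < 0 || rr >= ROWS || cc < 0 || cc >= COLS)
                continue;                     /* stay inside the grid */
            for (int i = 0; i < cnt[rr][cc]; i++)
                out[m++] = bin[rr][cc][i];
        }
    }
    return m;   /* may be fewer than k; the true k-nearest are not guaranteed */
}

Because only the nine surrounding bins are scanned, the cost per cell no longer depends on the total number of control points; but, as Hodgson notes, a true k-nearest neighbor may lie just outside the window and be missed.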

Hodgson (1989) developed an efficient interpolation strategy that is guaranteed to find the k-nearest points of a grid location using a "learned search" method. This method exploits the spatial structure inherent in the arrangement of control points; for any given cell, its k-nearest control points are likely to be the same as those of an adjoining cell. Therefore, rather than discard information about control points obtained during a previous search for neighbors from adjacent cells, his method retains these values, bases the new search on them, and reduces the total number of calculations required.

Despite these improvements, substantial amounts of computation time are still required for extremely large problems. Large interpolation problems are destined to become commonplace given the increasing diversity, size, and disaggregation of digital spatial databases. Parallel algorithms are often developed specifically to overcome the computational intractabilities that are associated with such large problems. It is interesting to note, however, that the efficient approach described by Hodgson (1989) was developed for implementation on a sequential (von Neumann) machine; it does not appear to be well-suited for implementation in parallel computing environments because it relies on the presence of serial dependencies in the data to reduce computation.


New approaches must be developed to exploit the latent parallelism in spatial problems because it is likely that future high performance computing environments will be constructed using parallel architectures and methods of computation.

The parallel algorithms described in this paper are based on an existing serial brute-force algorithm. Specifically, we demonstrate how a serial FORTRAN implementation of code that performs two-dimensional interpolation (MacDougall, 1984) is translated, using parallel programming extensions to FORTRAN 77, into versions that run in two parallel computing environments: a traditional shared-memory computer and a newer, more massively parallel hybrid architecture that uses a distributed hierarchical cache structure to implement shared-memory access. In translating a serial program into a form suitable for parallel processing, the characteristics of the problem and the architecture of the computer to be used must be considered. We first briefly discuss architectural factors and then turn to a specific discussion of our parallel implementations of the interpolation algorithm. Note that these "brute-force" implementations represent worst-case scenarios against which other approaches to parallel algorithm enhancement can be compared.

Architectural Considerations
During the past several years, many computer architects and manufacturers have turned from pipelined architectures toward parallel processing as a means of providing cost-effective high performance computing environments (El-Rewini et al., 1993; Myers, 1993; Pancake, 1991). Though parallel architectures take many forms, a basic distinction can be drawn on the basis of the number of instructions that are executed in parallel. A single instruction, multiple data (SIMD) stream computer simultaneously executes the same instruction on several (often thousands of) data items. Consequently, programs developed for such architectures normally use a synchronous, fine-grained approach to parallelism. A multiple instruction, multiple data (MIMD) stream computer, on the other hand, handles the partitioning of work among processors in a more flexible way, because processors can be allocated tasks that vary in size. For example, programmers might assign either portions of a large loop or a copy of an entire procedure to each processor. This asynchronous approach to parallelism is often implemented using a shared-memory model in which each processor can independently read from and write to a common address space. This general approach was adopted for the computational experiments reported in this paper.

The allocation and execution of asynchronous parallel processes can occur using either an architecture explicitly designed for parallel processing, such as the computers used in our experiments, or a loosely confederated set of networked workstations using software such as LINDA (Carriero and Gelernter, 1990) or PVM (Parallel Virtual Machine) (Beguelin et al., 1991; Beguelin et al., 1993). LINDA and PVM are coordination languages that enable parallel processing on these networked computers. Because tasks can be assigned with different amounts of computation when this coarse-grained MIMD approach is used, programmers must be concerned with balancing workloads across processors. If a given processor finishes with its assigned task and it requires data being computed by another processor in order to continue, then a dependency is said to exist, the pool of available processors is being used inefficiently, and parallel performance will be degraded (Armstrong and Densham, 1992).


Code Fragment 1. EPF-based pseudocode for parallel interpolation.

DECLARATION STATEMENTS
FILE MANAGEMENT
READ DATA POINTS
CALCULATE INTERPOLATION PARAMETERS
t_start = etime(tmp)
FOR EACH CELL IN THE MAP:
PARALLEL
  INTEGER I, J, K, RAD
  REAL TEMP, T, B
  REAL distance(Maxcoords)
  INTEGER L(Maxcoords)
  PRIVATE I, J, K, RAD, TEMP, T, B, distance, L
  DOALL (J = 1:Columns)
    DO 710 I = 1, Rows
      FOR EACH DATA POINT:
        COMPUTE DISTANCE FROM POINT TO GRID CELL
      CHECK NUMBER OF POINTS WITHIN RADIUS
      COMPUTE INTERPOLATED VALUE
710 CONTINUE
  END DOALL
END PARALLEL
t_end = etime(tmp)
t1 = (t_end - t_start)
END

Our parallel implementations of MacDougall's serial interpolation algorithm use parallel programming supersets of FORTRAN 77 that support parallel task creation and control, including memory access and task synchronization (e.g., Brawer, 1989). Code Fragments 1 and 2, based on MacDougall (1984), present two simplified (pseudocode) views of the brute-force interpolation algorithm. Code Fragment 1 was developed for the Encore environment and uses Encore Parallel FORTRAN (EPF) constructs to implement parallelism (Encore Computer, 1988). Code Fragment 2 was implemented for the KSR1 supercomputer and uses extensions that are similar in function to those provided by EPF. In general, the implementations vary primarily in syntax and are briefly described in the following sections.

Parallel Task Creation
Each parallel implementation is a conventional FORTRAN 77 program in which parallel constructs are inserted. In the Encore environment, EPF constructs can only be used within parallel regions, which are defined by enclosing code blocks with the keywords PARALLEL and END PARALLEL. Within a parallel region, the program initiates additional tasks and executes them in parallel. For example, the DOALL ... END DOALL construct partitions loop iterations among the available processors (Code Fragment 1).

In the KSR1 programming environment, a TILE directive is used to control the partitioning of tasks among processors (Code Fragment 2). This directive automatically allocates tasks among the available set of processors using, in this case, the loop iterations controlled by the J counter, which are assigned to different processors that are used to compute independently each element of the shared interpolated matrix.
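DOALL and TILE are vendor extensions that have since disappeared, but the same coarse-grained decomposition can be expressed today with OpenMP. The C sketch below is our modern analogue, not the authors' FORTRAN: the output grid is shared, each iteration writes a distinct cell, and the loop indices and any scratch variables are private to each thread, mirroring the PRIVATE declarations in the code fragments. It reuses the hypothetical idw_cell() routine sketched earlier.

#include <omp.h>

#define ROWS 240
#define COLS 800

/* idw_cell() is the brute-force single-cell routine sketched above. */
extern double idw_cell(double gx, double gy, const double *x,
                       const double *y, const double *z,
                       int n, int k, double a);

/* The grid is shared among threads; no locking is needed because every
   (i, j) iteration writes a different cell.  This parallels EPF's
   DOALL (J = 1:Columns) over the columns of the interpolated matrix. */
void interpolate(double grid[ROWS][COLS], const double *x, const double *y,
                 const double *z, int n)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            grid[i][j] = idw_cell((double)j, (double)i, x, y, z, n, 3, 2.0);
}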

Shared Memory
In the EPF environment, all variables are shared among the set of parallel tasks unless they are explicitly re-declared inside a parallel region. A variable declared, or re-declared, inside a parallel region cannot be accessed outside that parallel region; therefore, each task has its own private copy of that variable. This behavior is explicitly enforced in both environments by using the PRIVATE keyword when declaring variables. This approach could be used, for example, in a case when each of several parallel tasks needs its own copy of data to modify specific elements in a shared data structure. In the KSR1 environment, however, the PRIVATE construct does not support multidimensional variables. Consequently, we used an extension, called partially shared common, that enabled us to implement the independent calculation of interpolated grid values (Code Fragment 2).

Code Fragment 2. KSR1 pseudocode for the interpolation problem. The emphasis in this example is on the main loop in the interpolation code.

integer L(Maxcoords)
REAL distance(Maxcoords)
common /myblock/ distance, L
psc /myblock/
TILE (J, PRIVATE=(I,K,K1,K2,L1,rad,temp,T,B,/myblock/))
DO 700 J=1, Columns
  DO 710 I=1, Rows
    ...
710 CONTINUE
700 CONTINUE

Notes:
1) psc stands for partially shared common.
2) A c*ksr* psc declaration allows each parallel thread in a TILE directive to have its own copy of the common block. This is required because the PRIVATE declaration does not allow the use of multidimensional variables.

Implementation
The computational experiments described in this paper were performed using two parallel computers: an Encore Multimax and a KSR1 supercomputer. The Encore is a modular MIMD computing environment that uses a conventional shared memory architecture (Brawer, 1989, pp. 28-29). Users can access between 1 and 14 NS32332 processors that are connected by a Nanobus with a sustained bandwidth of 100 megabytes per second. Each processor has its own small (32-KB) cache of fast static RAM and can access 32 MB of fast, global, shared memory over the Nanobus.

Unlike the Encore, the KSR1 supercomputer does not have a single, monolithic shared memory. Instead, it uses a distributed architecture. This distributed architecture can be programmed like a conventional shared memory machine, but it is actually built from a collection of local caches. The basic configuration of processing units of the KSR1 that we used at the Cornell Theory Center consists of 64 separate RISC processors that are organized into two rings of 32 processors; each processor is connected to a high-bandwidth network. Memory access in this distributed structure, known as ALLCACHE, is organized hierarchically. Each processor in the architecture has a 256-KB subcache and a 32-MB cache. Very fast access is provided for local memory requests (e.g., from the subcache), and increasing latency penalties are incurred when memory must be accessed from remote locations. Specifically, data can be accessed from the subcache of each processor in only two clock cycles. If a process requires data that are not found in the subcache, they might be accessed from the 32-MB cache, and a penalty of 18 clock cycles would be incurred. However, if data must be accessed from an entirely different processor on the network, 175 clock cycles are required. Even though the machine can be programmed as a single large shared virtual space, because of these latency penalties, workload distribution and dependency relations must be carefully evaluated during the algorithm design stage.
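These published latencies can be folded into a standard average-memory-access-time estimate. The hit ratios below are invented for illustration (they were not measured on the KSR1), but they show why locality dominates performance on this architecture:

$$T_{avg} = 2\,h_{sub} + 18\,h_{cache} + 175\,(1 - h_{sub} - h_{cache})$$

With $h_{sub} = 0.90$ and $h_{cache} = 0.09$, $T_{avg} = 1.8 + 1.62 + 1.75 = 5.17$ cycles; letting just five percent of references go remote ($h_{sub} = 0.90$, $h_{cache} = 0.05$) raises this to $1.8 + 0.9 + 8.75 = 11.45$ cycles, more than doubling the average cost.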




General Strategy for Implementation Using Shared Memory
The process of converting a serial program to a parallel version often takes place in a series of steps, beginning with an analysis of dependencies among program components; such dependencies may preclude efficient implementation (Armstrong and Densham, 1992). As the code is decomposed into discrete parts, each is treated as an independent process that can be executed concurrently on different processors. While some problems may need to be completely restructured to achieve an efficient parallel implementation, in many instances conversion is straightforward. In this case, the interpolation algorithm can be cleanly divided into a set of independent processes using a coarse-grained approach to parallelism in which the computation of an interpolated value for each grid cell in the lattice is treated independently from the computation of values for all other cells. Though some variables are declared private to the computation of each cell value, the matrix that contains the interpolated grid values is shared. Thus, each process contributes to the computation of the grid by independently writing its results to the matrix held in shared memory. In principle, therefore, as larger numbers of processes execute concurrently, the total time required to calculate results should decrease. This general programming strategy is used for both implementations of the parallel interpolation algorithm, which are tested using the following problem: 10,000 control points are used to interpolate a 240 by 800 grid (192,000 cells) using the three closest control points to perform each calculation.

Encore Results
Because the Encore is a multi-user system, small irregularities in execution times can arise from system load variability. To control the effect that multiple users have on overall system performance, we attempted to make timing runs during low-use periods. Given the amount of computation time required for the interpolation experiments that used one and two processors, however, some periods of heavier system use were encountered.

The 10,000-point interpolation problem presents a formidable computational task that is well illustrated by the results contained in Table 1. When a single Encore processor is used, 33.3 hours are required to compute a solution to the problem. The run time is reduced to 2.5 hours, however, when all 14 processors are used. Each Encore processor is not especially fast by today's standards, and, in fact, when a superscalar workstation (RS/6000-550) was used to compute results for the same problem, it was more than four times faster than a single Encore processor. When the full complement of 14 processors is used, however, the advantage of the parallel approach is clearly demonstrated: the Encore is three times faster than the workstation.

TABLE 1. Computation times when different numbers of Encore processors are used to compute the 240 by 800 grid using 10,000 control points and three proximal points.

Processors:  1     2     ...   14
Hours:       33.3  17.0  ...   2.5

Figure 1 shows a monotonic decrease of run times as additional processors are used to compute the interpolated grid. Though the slope of the curve begins to flatten, a greater than order-of-magnitude decrease in computation time is achieved. This indicates that the problem is amenable to parallelism and that further investigation of the problem is warranted.

The results of parallel implementations are often evaluated by comparing parallel timing results to those obtained using a version of the code that uses a single processor. Speedup (Brawer, 1989, p. 25) is the ratio of these two execution times:

$$\text{Speedup} = \frac{\text{Execution time for one processor}}{\text{Execution time for } n \text{ processors}}$$

Measures of speedup are often used to determine whether additional processors are used effectively by a program. Though the maximum speedup attainable is equal to the number of processors (n) used, speedups are normally less than n because of inefficiencies that result from computational overhead, such as the establishment of parallel processes, and because of inter-processor communication and dependency bottlenecks. Figure 2 shows the speedups obtained for the 10,000-point interpolation problem. The near-linear increase indicates that the problem scales well in the Encore shared-memory MIMD environment.

A measure of efficiency is also sometimes used to evaluate the way in which processors are used by a program. This measure simply controls for the size of a speedup by dividing it by the number of processors used. If values remain near 1.0 as additional processors are used to compute solutions, then the program scales well. On the other hand, if efficiency values begin to decrease as additional processors are used, then they are being used ineffectively and an alternative approach to decomposing the problem might be pursued. Table 2 shows the computational efficiency of this set of interpolation experiments. The results demonstrate that the processors are being used efficiently across their entire range (2 to 14 processors) because no marked decreases are exhibited. The fluctuations observed can be attributed to the lack of precision of the timing utility (etime) and random variations in the performance of a multi-processor, multi-user system. It is interesting to note, however, that the largest decrease in efficiency occurs as the number of processors is increased from 12 to 14 (the maximum for our Encore environment). It is possible that this decrease is caused, at least in part, by the effect of system-related processes that are factored into the computation of results when all 14 processors are used. Because additional processors are unavailable, however, it cannot be determined if this decrease is caused by the presence of this overhead or by problems associated with scalability that are only encountered as larger numbers of processors are added to the computation of results. We therefore wished to determine whether this trend would persist when a larger-scale KSR1 supercomputer with 64 processors was applied to the same problem. We were also interested in using a larger, faster supercomputer because, even when all 14 Encore processors were used, 2.5 hours of computation time were required.
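As a concrete check on these figures, the endpoints of Table 1 give a speedup of $33.3 / 2.5 \approx 13.3$ for 14 processors, and hence an efficiency of $13.3 / 14 \approx 0.95$, in line with the near-linear behavior shown in Figure 2 and the values in Table 2.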



Figure 1. Run times (hours) for the Encore interpolation experiments.

Page 5: Inverse-Distance -Weighted Spatial Interpolation …Vol. Oo, No. 9, September 1994, pp. 1097-1103. 0099-1 1 12l94/0009-000$3.00/0 @ 1994 American Society for Photogrammetry and Remote

Figure 2. Speedups for the Encore interpolation runs.

TABLE 2. Efficiency of the Encore interpolation experiments.

Processors:  ...
Efficiency:  0.99  0.98  0.97  0.99  0.98  0.99

KSR1 Results
The KSR1 is a dedicated parallel supercomputer. At the time our experiments were conducted, the KSR1 computer at the Cornell Theory Center was configured with 64 processors. However, these processors could only be accessed in a limited number of queues: 1, 7, 15, and 58 processors.

As shown by comparing the results in Table 1 with those in Table 3, each processor in the KSR1 configuration is far faster than those employed by the Encore. When a single processor on each machine was used to compute the same 10,000-point interpolation problem, what took 33 hours on the Encore was reduced to approximately two hours on the KSR1; this decrease is more than an order of magnitude in size. As additional processors are added, computation times continue to decrease dramatically, and when all 58 processors are used, total run time is reduced from more than two hours to two and one-half minutes (Figure 3). The speedup and efficiency measures for the KSR1 implementation also demonstrate that the additional processors are being used effectively. Near-linear speedups are observed (Table 3 and Figure 4) and efficiency remains high (near 0.9) for each of the available configurations of the machine.

TABLE 3. KSR1 interpolation using the 240 by 800 grid with 10,000 points and three proximal points.

Processors   Seconds   H:M:S     Speedup   Efficiency
1            7719      2:08:39   1.0       1.00
7            1185      0:19:45   6.5       0.93
15           591       0:09:51   13.1      0.87
58           151       0:02:31   51.1      0.88

Figure 3. Run times (minutes) for the KSR1 interpolation experiments.


Page 6: Inverse-Distance -Weighted Spatial Interpolation …Vol. Oo, No. 9, September 1994, pp. 1097-1103. 0099-1 1 12l94/0009-000$3.00/0 @ 1994 American Society for Photogrammetry and Remote

Figure 4. Speedups for the KSR1 interpolation runs.

Conclusions
Several different approaches to improving the performance of inverse-distance-weighted interpolation algorithms have been developed. One successful approach (Hodgson, 1989) focused on reducing the total number of computations made by efficiently determining those control points that are in the neighborhood of each interpolated cell. When traditional serial computers are used, this approach yields substantial reductions in computation times. The method developed in this paper takes a different tack by decomposing the interpolation problem into a set of sub-problems that can be executed concurrently on shared memory parallel computers. This general approach to reducing computation times will become increasingly commonplace because many computer manufacturers have begun to use parallelism to provide users with cost-effective, high performance computing environments. The approach described here should also work when applied to other computationally intensive methods of interpolation such as kriging (Oliver et al., 1989a; Oliver et al., 1989b) that may not be directly amenable to the neighborhood-search method developed by Hodgson.

The results of our computational experiments show that a parallel, coarse-grained, shared-memory approach to parallel inverse-distance-weighted interpolation yields results that scale well. The measures of efficiency obtained for the Encore implementation are uniformly high: they are greater than or equal to 0.95. The small drop-off in efficiency that we observed in the Encore environment when 14 processors were used did not become substantially greater in the more massively parallel KSR1 environment. In fact, efficiency remained high and fluctuated only slightly when we increased from 15 to 58 KSR1 processors. Our parallel approach to interpolation reduced overall computation times from over 33 hours (2000 minutes) when one Encore processor was used, to approximately two and one-half minutes when the full complement of 58 KSR1 processors was applied to the problem; this represents a speedup of 800. If the RS/6000-550 workstation time is used in place of the Encore as the base for the calculation, the speedup, while more modest at 179, is still quite substantial.

Future research in this area should take place in two veins. The first is to use a more massive approach to parallelism. Alternative architectures such as a heterogeneous network of workstations (Carriero and Gelernter, 1990) or an SIMD machine could be fruitfully investigated. It may be that a more highly parallel brute-force approach can yield performance that is superior to the search-based approaches suggested by Hodgson. The second, and probably ultimately more productive, line of inquiry would meld the work begun here with that of the neighborhood search-based work. The combination of both approaches should yield highly effective results that will transform large, nearly intractable spatial interpolation problems into those that can be solved in seconds.

Acknowledgments
Partial support for this research was provided by the National Science Foundation (SES-soz+zia). We would like to thank The University of Iowa Weeg Computing Center for providing access to the Encore Multimax computer, and the Cornell National Supercomputing Facility for providing access to the RS/6000-550 workstation and the KSR1 supercomputer. Dan Dwyer, Smart Node Program Coordinator at CNSF, was especially helpful. Michael E. Hodgson, Claire E. Pavlik, Demetrius Rokos, and Gerry Rushton provided helpful comments on an earlier draft of this paper. Amy I. Ruggles created the graphs. This is a revised and expanded version of a paper that appeared in the Proceedings of Auto-Carto 11.

References
Armstrong, M.P., and P.J. Densham, 1992. Domain decomposition for parallel processing of spatial problems, Computers, Environment and Urban Systems, 16(6):497-513.

Beguelin, A., J. Dongarra, A. Geist, B. Manchek, and V. Sunderam, 1991. A User's Guide to PVM: Parallel Virtual Machine, Technical Report ORNL/TM-11826, Oak Ridge National Laboratory, Oak Ridge, Tennessee.

KSR1 ResultsThe KSRI is a dedicated parallel supercomputer. At the timeour experiments were conducted, the KSRI computer at theCornell Theory Center was configured with 64 processors.However, these processors could only be accessed in a lim-ited number of queues: 'J-., 7, 75, and 58 processors.

As shown by comparing the results in Table i. withthose in Table 3, each processor in the KSRI configuration isfar faster than those employed by the Encore. When a singleprocessor on each machine was used to compute the same10,000-point interpolation problem, what took 33 hours onthe Encore was reduced to approximately two hours on theKSRr; this decrease is more than an order of magnitude insize. As additional processors are added, computation timescontinue to decrease dramatically, and when all 58 proces-sors are used, total run time is reduced from more than twohours to two and one-half minutes (Figure 3). The speedupand efEciency measures for the KSRr implementation alsodemonstrate that the additional processors are being used ef-fectively. Near-linear speedups are observed (Table 3 andFigure 4) and efficiency remains high (near 0.9) for each ofthe available configurations of the machine.

ConclusionsSeveral different approaches to improving the performance ofinverse-distance-weighted interpolation algorithms have beendeveloped. One successful approach (Hodgson, 1989) fo-cused on reducing the total number of computations madeby efficiently determining those control points that are in theneighborhood of each interpolated cell. When traditional se-rial computers are used, this approach yields substantial re-ductions in computation times. The method developed inthis paper takes a different tack by decomposing the interpo-lation problem into a set of sub-problems that can be exej



Beguelin, A., J. Dongarra, A. Geist, and V. Sunderam, 1993. Visualization and debugging in a heterogeneous environment, IEEE Computer, 26(6):88-95.

Brawer, S., 1989. Introduction to Parallel Programming, Academic Press, San Diego, California.

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Oxford University Press, New York, N.Y.

Carriero, N., and D. Gelernter, 1990. How to Write Parallel Programs: A First Course, The MIT Press, Cambridge, Massachusetts.

Clarke, K.C., 1990. Analytical and Computer Cartography, Prentice-Hall, Englewood Cliffs, New Jersey.

Cromley, R.G., 1992. Digital Cartography, Prentice-Hall, Englewood Cliffs, New Jersey.

El-Rewini, H., T. Lewis, and B. Shriver, 1993. Parallel and distributed systems: From theory to practice, IEEE Parallel and Distributed Technology, 1(3):7-11.

Encore Computer, 1988. Encore Parallel Fortran, EPF 724-06785, Revision A, Encore Computer Corporation, Marlborough, Massachusetts.

Freund, R.F., and H.J. Siegel, 1993. Heterogeneous processing, IEEE Computer, 26(6):73-77.

Hodgson, M.E., 1989. Searching methods for rapid grid interpolation, The Professional Geographer, 41(1):51-61.

Hodgson, M.E., 1992. Sensitivity of spatial interpolation models to parameter variation, Technical Papers of the ACSM Annual Convention, American Congress on Surveying and Mapping, Bethesda, Maryland, 2:113-122.

Lam, N-S., 1983. Spatial interpolation methods: A review, The American Cartographer, 10(2):129-149.

MacDougall, E.B., 1976. Computer Programming for Spatial Problems, Edward Arnold, London.

MacDougall, E.B., 1984. Surface mapping with weighted averages in a microcomputer, Spatial Algorithms for Processing Land Data with a Microcomputer, Lincoln Institute Monograph #84-2, Lincoln Institute of Land Policy, Cambridge, Massachusetts.

Mitchell, W.J., 1977. Computer-Aided Architectural Design, Van Nostrand Reinhold, New York, N.Y.

Monmonier, M.S., 1982. Computer-Assisted Cartography: Principles and Prospects, Prentice-Hall, Englewood Cliffs, New Jersey.

Myers, W., 1993. Supercomputing 92 reaches down to the workstation, IEEE Computer, 26(7):113-117.

Oliver, M., R. Webster, and J. Gerrard, 1989a. Geostatistics in physical geography. Part I: Theory, Transactions of the Institute of British Geographers, 14:259-269.

Oliver, M., R. Webster, and J. Gerrard, 1989b. Geostatistics in physical geography. Part II: Applications, Transactions of the Institute of British Geographers, 14:270-286.

Pancake, C.M., 1991. Software support for parallel computing: Where are we headed?, Communications of the Association for Computing Machinery, 34(11):52-64.

Shepard, D., 1968. A two dimensional interpolation function for irregularly spaced data, Harvard Papers in Theoretical Geography, Geography and the Property of Surfaces Series No. 15, Harvard University Laboratory for Computer Graphics and Spatial Analysis, Cambridge, Massachusetts.

White, D., 1984. Comments on surface mapping with weighted averages in a microcomputer, Spatial Algorithms for Processing Land Data with a Microcomputer, Lincoln Institute Monograph #84-2, Lincoln Institute of Land Policy, Cambridge, Massachusetts.

Marc Armstrong
Marc Armstrong has a B.A. degree from SUNY-Plattsburgh, an M.A. degree from UNC-Charlotte, and a Ph.D. degree from the University of Illinois at Urbana-Champaign. He is currently an Associate Professor at The University of Iowa, where he holds appointments in the Departments of Geography and Computer Science and in the Program in Applied Mathematical and Computational Sciences. His research focuses on the design of spatial decision support systems and methods of spatial analysis.

Richard Marciano
Richard Marciano received the M.S. and Ph.D. degrees in computer science from the University of Iowa in 1989 and 1992, respectively. For the … His research interests include computational science, parallelizing compilers, algebraic compiler environments, and high-speed computing. He is currently a Computational Scientist at the National Supercomputer Center for Energy and the Environment, University of Nevada, Las Vegas.

Call for Nominations
ASPRS Fellow and Honorary Member Awards

ASPRS would like to encourage members to submit nominations for its Fellow and Honorary Member awards.

The Fellow Award is relatively new. It is conferred on active Society members who have performed exceptional service in advancing the science and use of the mapping sciences and for service to the Society. Nominees must have been active members of the Society for at least ten years at the time of their nomination.

The Honorary Member Award is ASPRS's highest honor. It recognizes an individual who has rendered distinguished service to the Society and/or who has attained distinction in advancing the science and use of the mapping sciences. It is awarded for professional excellence and service to the Society. Nominees must have been active members of the Society for at least fifteen years at the time of their nomination.

For more information, or to obtain a nomination form for these very special awards, please call Diana Ripple at 301-493-0290.