
PARALLEL IMPLEMENTATION OF THE EFG METHOD FOR HEAT TRANSFER AND FLUID FLOW PROBLEMS

    Journal: Computational Mechanics

    Manuscript ID: CM-03-0003

    Manuscript Type: Original Paper

    Date Submitted by the Author: 29-Nov-2003

    Keywords: finite element, galerkin, variational method


PARALLEL IMPLEMENTATION OF THE EFG METHOD FOR HEAT TRANSFER AND FLUID FLOW PROBLEMS

    I. V. Singh

    Mechanical Engineering Group

    Birla Institute of Technology and Science

    Pilani, 333 031, Rajasthan, India

    E-mail: [email protected], [email protected]

    ABSTRACT

The parallel implementation of the element free Galerkin (EFG) method for heat transfer and fluid flow problems on an MIMD-type parallel computer is presented. A new parallel algorithm is proposed in which parallelization is performed by row-wise data distribution among the processors. The codes have been developed in FORTRAN using the MPI message passing library. Two model problems (heat transfer and fluid flow) have been solved to validate the proposed algorithm. The total time, communication time, user time, speedup and efficiency have been estimated for the heat transfer and fluid flow problems. For 8 processors, the speedup and efficiency obtained are 6.86 and 85.81% respectively for the heat transfer problem with a data size of N = 1229, and 7.20 and 90.00% respectively for the fluid flow problem with a data size of N = 1462.

    Keywords: meshless method; EFG method; parallel computing; heat transfer; fluid flow

    1 INTRODUCTION

In the last two decades, meshless methods have been developed as an effective tool for solving boundary value problems. The essential feature of these meshless methods is that they require only a set of nodes to construct the interpolation functions. In contrast to the conventional finite element method, these techniques avoid the tedious job of mesh generation, as no element is required anywhere in the model. Furthermore, re-meshing becomes easier because nodes can easily be added or removed in the analysis domain. A large variety of meshless methods have been developed so far, including: smooth particle hydrodynamics (SPH) [1], the diffuse element method (DEM) [2], the element free Galerkin (EFG) method [3], the reproducing kernel particle method (RKPM) [4], the partition of unity method (PUM) [5], the H-p cloud method [6], the free mesh method (FMM) [7], the natural element method (NEM) [8], the local boundary integral equation (LBIE) method [9], the meshless local Petrov-Galerkin (MLPG) method [10], the method of finite spheres [11], the local radial point interpolation method (LRPIM) [12] and the regular hybrid boundary node method (RHBNM) [13]. It has been observed that the results obtained by most of these meshless methods are competitive with FEM in different areas of engineering.


At present, the main barrier to the wide adoption of these meshless methods is their high computational cost. To reduce this cost, a few researchers have parallelized the free mesh method (FMM) [14], smooth particle hydrodynamics (SPH) [15] and the partition of unity method (PUM) [16]. Continuing this line of work on parallelized meshless methods, a parallel algorithm has been proposed here to reduce the computational cost of the EFG method. The parallel code has been written in FORTRAN using the MPI message passing library and executed on a PARAM 10000 supercomputer. The code has been validated by solving two model problems. Total time, communication time, user time, speedup and efficiency have been calculated for the heat transfer and fluid flow problems.

2 REVIEW OF THE EFG METHOD

The discretization of the governing equations by the EFG method requires moving least square (MLS) approximants, which are made up of three components: a weight function associated with each node, a basis function, and a set of non-constant coefficients. Using the MLS approximation, the unknown function $T(\mathbf{x})$ or $u(\mathbf{x})$ is approximated by $T^{h}(\mathbf{x})$ or $u^{h}(\mathbf{x})$ over the solution domain [3, 16] as

$$T^{h}(\mathbf{x}) = \sum_{I=1}^{n} \Phi_{I}(\mathbf{x})\, T_{I} \qquad (1a)$$

or

$$u^{h}(\mathbf{x}) = \sum_{I=1}^{n} \Phi_{I}(\mathbf{x})\, u_{I} \qquad (1b)$$

where $\Phi_{I}(\mathbf{x})$ is the shape function and $T_{I}$ (or $u_{I}$) is the nodal parameter.
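The paper defers the construction of $\Phi_{I}(\mathbf{x})$ to [3, 16]; for orientation, a sketch of the standard MLS construction (assuming the usual formulation of Belytschko et al. [3]) is:

$$\Phi_{I}(\mathbf{x}) = \mathbf{p}^{T}(\mathbf{x})\,\mathbf{A}^{-1}(\mathbf{x})\,\mathbf{B}_{I}(\mathbf{x}), \qquad \mathbf{A}(\mathbf{x}) = \sum_{I=1}^{n} w(\mathbf{x}-\mathbf{x}_{I})\,\mathbf{p}(\mathbf{x}_{I})\,\mathbf{p}^{T}(\mathbf{x}_{I}), \qquad \mathbf{B}_{I}(\mathbf{x}) = w(\mathbf{x}-\mathbf{x}_{I})\,\mathbf{p}(\mathbf{x}_{I})$$

where $\mathbf{p}(\mathbf{x})$ is the basis vector (e.g. $\mathbf{p}^{T}(\mathbf{x}) = [1,\; x,\; y]$ for a linear basis in two dimensions), $w(\mathbf{x}-\mathbf{x}_{I})$ is the weight function, and $\mathbf{A}(\mathbf{x})$ here denotes the MLS moment matrix, not the system matrix of Eq. (5a).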

In the present analysis, the cubic spline weight function [16] is used.
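The exact expression follows [16]; a minimal sketch, assuming the standard normalized cubic spline weight commonly used in EFG analyses, is:

$$w(\mathbf{x}-\mathbf{x}_{I}) = \begin{cases} \dfrac{2}{3} - 4r^{2} + 4r^{3}, & r \le \dfrac{1}{2} \\[4pt] \dfrac{4}{3} - 4r + 4r^{2} - \dfrac{4}{3}r^{3}, & \dfrac{1}{2} < r \le 1 \\[4pt] 0, & r > 1 \end{cases} \qquad \text{with } r = \frac{\lVert \mathbf{x}-\mathbf{x}_{I} \rVert}{d_{\max I}}$$

where $d_{\max I}$ is the scaling parameter that fixes the size of the domain of influence of node $I$.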


    The boundary conditions are given as:

at edge $\Gamma_{1}$: $\; T = T_{e}$ (3b)

at edge $\Gamma_{2}$: $\; k\,\dfrac{\partial T}{\partial y} = 0$ (3c)

at edge $\Gamma_{3}$: $\; -k\,\dfrac{\partial T}{\partial x} = h\,(T - T_{\infty})$ (3d)

at edge $\Gamma_{4}$: $\; -k\,\dfrac{\partial T}{\partial y} = h\,(T - T_{\infty})$ (3e)

Enforcing the essential boundary conditions using the Lagrange multiplier method, a set of linear equations is obtained using Eq. (1a):

$$\begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix} \qquad (4a)$$

where

$$K_{IJ} = \int_{\Omega} \left( \Phi_{I,x}\, k\, \Phi_{J,x} + \Phi_{I,y}\, k\, \Phi_{J,y} \right) \mathrm{d}\Omega + \int_{\Gamma_{3}} h\, \Phi_{I}\, \Phi_{J}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} h\, \Phi_{I}\, \Phi_{J}\, \mathrm{d}\Gamma \qquad (4b)$$

$$f_{I} = \int_{\Omega} \Phi_{I}\, Q\, \mathrm{d}\Omega + \int_{\Gamma_{3}} h\, T_{\infty}\, \Phi_{I}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} h\, T_{\infty}\, \Phi_{I}\, \mathrm{d}\Gamma \qquad (4c)$$

$$G_{IK} = \int_{\Gamma_{1}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma \qquad (4d)$$

$$q_{K} = \int_{\Gamma_{1}} N_{K}\, T_{e}\, \mathrm{d}\Gamma \qquad (4e)$$

Eq. (4a) can be further written as:

$$[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (5a)$$

where

$$[\mathbf{A}]_{N \times N} = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \qquad (5b)$$

$$\{\mathbf{U}\}_{N \times 1} = \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix} \qquad (5c)$$

$$\{\mathbf{F}\}_{N \times 1} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix} \qquad (5d)$$

    2.2 The EFG for Fluid Flow Problems (Example-II)

The momentum equation for a viscous incompressible fluid flowing through a long, uniform duct is given as


$$\mu \left( \frac{\partial^{2} u}{\partial x^{2}} + \frac{\partial^{2} u}{\partial y^{2}} \right) + \frac{\partial p}{\partial z} = 0 \qquad (6a)$$

The essential boundary conditions are

at surface $\Gamma_{1}$: $\; u = u_{1} = 0$ (6b)

at surface $\Gamma_{2}$: $\; u = u_{2} = 0$ (6c)

at surface $\Gamma_{3}$: $\; u = u_{3} = 0$ (6d)

at surface $\Gamma_{4}$: $\; u = u_{4} = 0$ (6e)

Enforcing the essential boundary conditions using the Lagrange multiplier method, the following set of linear equations is obtained using Eq. (1b):

$$\begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (7a)$$

where

$$K_{IJ} = \int_{\Omega} \left( \Phi_{I,x}\, \mu\, \Phi_{J,x} + \Phi_{I,y}\, \mu\, \Phi_{J,y} \right) \mathrm{d}\Omega \qquad (7b)$$

$$f_{I} = \int_{\Omega} \Phi_{I}\, M\, \mathrm{d}\Omega \qquad (7c)$$

$$G_{IK} = \int_{\Gamma_{1}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{2}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{3}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma \qquad (7d)$$

$$q_{K} = \int_{\Gamma_{1}} N_{K}\, u_{S1}\, \mathrm{d}\Gamma + \int_{\Gamma_{2}} N_{K}\, u_{S2}\, \mathrm{d}\Gamma + \int_{\Gamma_{3}} N_{K}\, u_{S3}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} N_{K}\, u_{S4}\, \mathrm{d}\Gamma \qquad (7e)$$

Eq. (7a) can be written as:

$$[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (8a)$$

where

$$[\mathbf{A}]_{N \times N} = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \qquad (8b)$$

$$\{\mathbf{U}\}_{N \times 1} = \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} \qquad (8c)$$

$$\{\mathbf{F}\}_{N \times 1} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (8d)$$


    3 PARALLEL IMPLEMENTATION

Parallel implementation on distributed memory systems differs from parallel implementation on shared memory systems. On distributed memory parallel computers, each processor has only its own local memory. Data exchange between processors is done by message passing, and the time needed for this interprocessor communication must be taken into account. For parallelization, the data are distributed among the processors in such a way that the communication cost is as low as possible. One approach is the domain decomposition method, in which the whole domain is divided into small subdomains and each processor performs the work on one subdomain. This gives good results if the domain can be divided into subdomains such that each processor gets a nearly equal amount of work. In this paper, the authors have utilized a data decomposition approach in the parallel algorithm. The EFG code for solving heat transfer and fluid flow problems consists of two parts: (i) node generation and assembly of the system matrices, and (ii) solution of the linear system of equations.

Basically, there are two approaches to parallelizing the EFG sequential code. The first emphasizes parallel implementation of the whole sequential code, while the second is directed towards a careful analysis of the whole sequential code, selecting the portions where parallel programming will reduce the computational cost both in terms of time and complexity. Therefore, a careful analysis of the EFG code was first performed, and it was found that the time required to solve the linear system of equations (i.e. the inversion time) increases with the data size (number of equations), as shown in Table 1 and Fig. 1 for the heat transfer problem and in Table 2 and Fig. 2 for the fluid flow problem. In other words, the major part of the total computational time is spent in solving the linear system of equations. Therefore, the parallel code has been developed only for the solution of the system of linear equations, not for the whole EFG sequential code.

Table 1: Variation of total time and solution (inversion) time with data size (no. of equations) for heat transfer problems

Data size (no. of equations)   Total time, tt (sec)   Solution (inverse) time, ts (sec)   (ts/tt) x 100 (%)
89                             0.6431                 0.3641                              56.62
131                            1.6158                 1.1312                              70.00
281                            11.3075                9.8999                              87.55
461                            55.9784                52.740                              94.21
701                            194.5410               188.2192                            96.75
991                            598.8565               587.2674                            98.06
1229                           1179.5400              1161.9026                           98.50



    Fig. 1: Percentage variation of solution (inverse) time with data size (no. of equations) for heat transfer problem

Table 2: Variation of total time and solution (inversion) time with data size (no. of equations) for fluid flow problem

Data size (no. of equations)   Total time, tt (sec)   Solution (inverse) time, ts (sec)   (ts/tt) x 100 (%)
113                            0.8595                 0.7168                              83.39
161                            2.3036                 2.0446                              88.76
316                            16.6898                15.8126                             94.75
521                            78.2076                75.9189                             97.07
776                            266.1130               260.6112                            97.95
1126                           838.3200               826.9423                            98.64
1462                           1973.200               1953.8189                           99.02



    Fig. 2: Percentage variation of solution (inverse) time with data size (no. of equations) for fluid flow problem

3.1 Parallel Algorithm for the Solution of the Linear System of Equations

The matrix inversion method is one of the common methods adopted for obtaining the solution of the system of linear equations $[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\}$. In this method, the inverse of matrix $[\mathbf{A}]$ is first calculated and the solution is then computed using $\{\mathbf{U}\} = [\mathbf{A}]^{-1}\{\mathbf{F}\}$. In the present work, a parallel algorithm based on the matrix inversion technique is proposed to reduce the computational cost of the EFG method. During implementation of this algorithm on the supercomputer (PARAM 10000), row-wise data distribution is first carried out. After the data have been distributed among the processors, each processor generates an identity matrix of the same size as $[\mathbf{A}]$. In the process of matrix inversion, row-wise operations are carried out: every non-diagonal element of matrix $[\mathbf{A}]$ is converted to zero and every diagonal element of matrix $[\mathbf{A}]$ is converted to unity. Whatever operations are carried out on matrix $[\mathbf{A}]$, the same operations are also carried out on matrix $[\mathbf{I}]$. Each processor operates on its own rows to reduce the computational time. After the inverse of matrix $[\mathbf{A}]$ is found, the unknown $\{\mathbf{U}\}$ is calculated using $\{\mathbf{U}\} = [\mathbf{A}]^{-1}\{\mathbf{F}\}$.
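The paper does not list the row-distribution code itself; the following is a minimal FORTRAN sketch (not the authors' implementation) of how the starting and ending row numbers of the block owned by each processor might be computed, including the case where N is not an exact multiple of Numprocs:

! Illustrative sketch: block row-wise distribution of an N x N system among
! numprocs processes. The first mod(n, numprocs) processes receive one extra row.
subroutine row_range(n, numprocs, myrank, istart, iend)
   implicit none
   integer, intent(in)  :: n, numprocs, myrank   ! myrank runs from 0 to numprocs-1
   integer, intent(out) :: istart, iend          ! global row numbers owned by myrank
   integer :: base, extra
   base  = n / numprocs                          ! minimum number of rows per process
   extra = mod(n, numprocs)                      ! leftover rows handed out one by one
   if (myrank < extra) then
      istart = myrank*(base + 1) + 1
      iend   = istart + base
   else
      istart = extra*(base + 1) + (myrank - extra)*base + 1
      iend   = istart + base - 1
   end if
end subroutine row_range

With such a partitioning, each processor stores and operates on only its own iend - istart + 1 rows of [A] and [I], which keeps the per-processor memory and work nearly equal, as required by the data decomposition approach of Section 3.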


Parallel Algorithm

Global:
    Numprocs   Number of processors
    N          Number of equations
    MyRank     Rank of each processor
    Rank       Rank of the processor holding the current row
    [A]        Input matrix
    {F}        Input column vector
    [I]        Inverse matrix of [A]
    i          Variable indicating the current row
    start      Starting row number for each processor
    end        Ending row number for each processor

do i = 0 to Numprocs - 1
    Set start
    Set end
end do

do i = 1 to N
    Find the Rank of the processor holding the current row
    if (MyRank = Rank) then
        Set the diagonal element of row i of [A] to 1.0
        Change the non-diagonal elements of row i of [A]
        Change the elements of row i of matrix [I]
        Broadcast the current row
    end if
    do j = start to end
        Change the non-diagonal elements of column i of [A] to 0.0
        Change the elements of matrix [I]
    end do
end do

do i = start to end
    Compute {U}i
end do

do j = 1 to Numprocs - 1
    Send {U}i to the master processor
end do
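The following is a minimal FORTRAN 90 / MPI sketch of the above algorithm, written here as an illustration rather than as the authors' code; for brevity it assumes that N is an exact multiple of the number of processes and uses a small internally generated test matrix in place of the assembled EFG system:

! Illustrative sketch of the row-distributed Gauss-Jordan inversion and the
! back-multiplication {U} = [A]^-1 {F}, following the pseudocode above.
program gauss_jordan_mpi
   use mpi
   implicit none
   integer, parameter :: n = 8                   ! number of equations (demo size)
   integer :: ierr, myrank, numprocs, nlocal, istart, iend
   integer :: i, j, iloc, owner
   double precision :: arow(n), irow(n), pivot, fac
   double precision, allocatable :: a(:,:), ainv(:,:), f(:), u(:), uglob(:)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)

   nlocal = n / numprocs                         ! rows per process (n assumed divisible)
   istart = myrank*nlocal + 1
   iend   = istart + nlocal - 1
   allocate(a(nlocal,n), ainv(nlocal,n), f(n), u(nlocal), uglob(n))

   ! Local block of a diagonally dominant test matrix, local block of [I], and {F}.
   do iloc = 1, nlocal
      i = istart + iloc - 1
      do j = 1, n
         a(iloc,j)    = 1.0d0 / dble(i + j)
         ainv(iloc,j) = 0.0d0
      end do
      a(iloc,i)    = a(iloc,i) + dble(n)
      ainv(iloc,i) = 1.0d0
   end do
   f = 1.0d0

   ! Row-wise distributed Gauss-Jordan elimination applied to [A] and [I] together.
   do i = 1, n
      owner = (i - 1) / nlocal                   ! rank holding the current (pivot) row
      if (myrank == owner) then
         iloc  = i - istart + 1
         pivot = a(iloc,i)
         a(iloc,:)    = a(iloc,:)    / pivot     ! make the diagonal element unity
         ainv(iloc,:) = ainv(iloc,:) / pivot
         arow = a(iloc,:)
         irow = ainv(iloc,:)
      end if
      call MPI_Bcast(arow, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
      call MPI_Bcast(irow, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
      do iloc = 1, nlocal                        ! eliminate column i from local rows
         if (istart + iloc - 1 /= i) then
            fac = a(iloc,i)
            a(iloc,:)    = a(iloc,:)    - fac*arow
            ainv(iloc,:) = ainv(iloc,:) - fac*irow
         end if
      end do
   end do

   ! Each process computes its share of {U} = [A]^-1 {F}.
   do iloc = 1, nlocal
      u(iloc) = dot_product(ainv(iloc,:), f)
   end do

   ! Gather the distributed pieces of {U} on the master process.
   call MPI_Gather(u, nlocal, MPI_DOUBLE_PRECISION, uglob, nlocal, &
                   MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
   if (myrank == 0) print *, 'U = ', uglob

   call MPI_Finalize(ierr)
end program gauss_jordan_mpi

In an actual EFG run the local rows of [A] and the vector {F} would instead come from the assembled system of Eq. (5a) or Eq. (8a); the broadcast of each pivot row is the main interprocessor communication inside the elimination loop, which is consistent with the small communication times reported in Tables 5-7 and 10-12.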

3.2 Hardware and Software Used

The hardware used for the numerical solution is a PARAM 10000 supercomputer, developed by C-DAC, Pune, India. The PARAM 10000 is a 6.4 GFLOPS, RISC-based, distributed memory multiprocessor system categorized as a multiple instruction multiple data (MIMD) type computer. It has four nodes in total (three compute nodes and one server node). Each compute node has two UltraSPARC II 64-bit RISC CPUs of 400 MHz, 512 MB main memory, two Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card, while the server node has two UltraSPARC II 64-bit RISC CPUs of 400 MHz, 1 GB of main memory, four Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card. The PARAM 10000 machine thus has a total of 8 processors (two per node), Sun SPARC compilers (F90 Compiler Version 2.0, F77 Compiler Version 5.0, C Compiler Version 5.0, C++ Compiler Version 5.0) and supports both the MPI and PVM message passing environments.


    4 NUMERICAL RESULTS AND DISCUSSION

The parallel code for the proposed algorithm has been developed in FORTRAN. The EFG results have been obtained for the model heat transfer and fluid flow problems. The computational time components (total time, user time and communication time), speedup and efficiency (see the Appendix) have been calculated for the whole code using the PARAM 10000 supercomputer.

    4.1 Example-I: Heat Transfer Problem

The parallel EFG results have been obtained for a model heat transfer problem. The different parameters used for the analysis of the model shown in Fig. 3 are tabulated in Table 3. Table 4 shows a comparison of the temperature values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 4, it is clear that the temperature values obtained by the EFG method are in good agreement with those obtained by FEM.

Table 5 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 701. The same quantities are also presented in Table 6 for N = 991 and in Table 7 for N = 1229. Fig. 4 and Fig. 5 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency have been obtained as 6.86 and 85.81% respectively for the data size of N = 1229.

From the above analysis, it is observed that as the data size (number of equations) increases, the results improve both in terms of efficiency and speedup. The contribution of communication time to the total time is almost negligible. Moreover, it is also clear that with the increase in data size (number of equations), the speedup and efficiency improve with the increase in the number of processors.

    Fig. 3: Model for heat transfer problem



Table 3: Data for the model shown in Fig. 3

Parameter                                Value
Length (L)                               1 m
Width (W)                                1 m
Thermal conductivity (k)                 400 W/m °C
Rate of internal heat generation (Q)     0 W/m3
Heat transfer coefficient (h)            100 W/m2 °C
Surrounding fluid temperature (T∞)       20 °C
Temperature at the edge (Te)             200 °C

Table 4: Comparison of EFG results with FEM at a few typical locations for 121 nodes

Location (m)        Temperature (°C)
x       y           EFG          FEM
0.5     1.0         160.8224     160.8950
0.5     0.5         172.2274     172.2670
0.5     0.0         175.0713     175.1240
1.0     1.0         140.9820     141.0610
1.0     0.5         151.8434     151.8510
1.0     0.0         155.0760     155.1100

Table 5: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 701

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      212.2315           0.0000                     211.6500          1.00      100.00
2                      111.4260           0.0671                     110.4400          1.92      95.82
3                      78.0770            0.1053                     76.3050           2.77      92.45
4                      60.6514            0.5706                     58.2800           3.63      90.75
5                      55.0408            0.8323                     48.6950           4.34      86.92
6                      54.5207            6.0414                     42.8500           4.94      82.32
7                      46.3948            1.9330                     36.8950           5.74      81.95
8                      50.2148            5.5403                     33.4350           6.33      79.13

Table 6: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 991

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      598.8565           0.0000                     597.2500          1.00      100.00
2                      310.1355           0.1924                     307.7700          1.94      97.03
3                      214.2525           0.2256                     211.3900          2.82      94.18
4                      164.6720           0.9013                     161.3050          3.70      92.56
5                      141.6955           1.7398                     133.4550          4.47      89.50
6                      134.5425           6.8044                     114.3500          5.22      87.05
7                      114.9740           3.0855                     98.7850           6.05      86.37
8                      122.0950           6.2373                     89.4100           6.68      83.50


Table 7: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1229

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      1179.5400          0.0000                     1176.5700         1.00      100.00
2                      606.6460           0.2567                     600.4800          1.95      97.97
3                      417.6550           0.7822                     411.4100          2.86      95.32
4                      320.2650           0.5918                     315.1500          3.73      93.33
5                      282.8400           2.6019                     261.6100          4.50      89.95
6                      254.3290           4.7861                     221.0900          5.32      88.69
7                      222.0310           4.9631                     191.8700          6.13      87.60
8                      216.9900           3.8176                     171.3900          6.86      85.81


    Fig. 4: Variation of speedup with number of processors and data size (no. of equations)



    Fig. 5: Variation of efficiency with number of processors and data size (no. of equations)

    4.2 Example-II: Fluid Flow Problem

The parallel EFG results have been obtained for a model fluid flow problem. The different parameters used for the analysis of the model shown in Fig. 6 are tabulated in Table 8. Table 9 shows a comparison of the velocity values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 9, it is clear that the velocity values obtained by the EFG method are in good agreement with those obtained by FEM.

Table 10 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 776. Table 11 and Table 12 show the same quantities for N = 1126 and N = 1462 respectively. Fig. 7 and Fig. 8 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency have been obtained as 7.20 and 90.00% respectively for the data size of N = 1462.

From the above analysis, it is observed that with the increase in data size (number of equations), the results improve both in terms of efficiency and speedup. The contribution of communication time to the total time is almost negligible. Moreover, it is also clear that with the increase in data size (number of equations), the speedup and efficiency improve with the increase in the number of processors.


    Fig. 6: Model cross-section of the fluid flowing through a duct

Table 8: Data for the model shown in Fig. 6

Parameter                          Value
Depth (D)                          0.25 m
Length (L)                         0.25 m
Pressure gradient (∂P/∂z)          5000 N/m2/m
Dynamic viscosity (μ)              5 Ns/m2
All surface velocities (uS)        0 m/sec

Table 9: Comparison of EFG results with FEM at a few typical locations for 121 nodes

Location (m)        Velocity (m/sec)
x       y           EFG         FEM
0       0.125       0.0000      0.0000
0       0.100       1.8670      1.8280
0       0.075       3.1882      3.1304
0       0.050       3.9319      3.9929
0       0.025       4.4598      4.4827
0       0.000       4.6417      4.6412



Table 10: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 776

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      266.1130           0.0000                     265.4600          1.00      100.00
2                      141.0705           0.0445                     138.8950          1.91      95.56
3                      99.6623            0.1466                     95.5150           2.77      92.64
4                      76.6645            0.3459                     72.6950           3.65      91.29
5                      66.8084            0.8534                     59.8550           4.43      88.70
6                      67.3349            6.1760                     52.2250           5.08      84.72
7                      59.5002            1.7815                     44.4450           5.97      85.32
8                      65.2026            5.3795                     40.9100           6.48      81.11

Table 11: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1126

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      838.3200           0.0000                     836.4050          1.00      100.00
2                      433.8130           0.1940                     427.2300          1.95      97.88
3                      299.6620           0.2509                     291.9100          2.86      95.50
4                      230.6550           1.1592                     221.5500          3.77      94.38
5                      201.4360           1.6366                     182.4800          4.58      91.67
6                      181.6840           4.9364                     155.1100          5.39      89.87
7                      163.2360           3.4810                     134.4500          6.22      88.87
8                      163.3560           5.1744                     121.9800          6.85      85.71

Table 12: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1462

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      1973.2000          0.0000                     1968.5500         1.00      100.00
2                      994.7770           0.2786                     983.7400          2.00      100.00
3                      697.1760           0.9614                     664.7600          2.96      98.71
4                      534.6850           0.3451                     502.8700          3.91      97.86
5                      449.4020           2.6545                     422.2300          4.66      93.24
6                      400.0850           5.8641                     354.2450          5.55      92.61
7                      357.6345           4.3058                     308.5900          6.39      91.35
8                      335.0405           6.8723                     273.5350          7.20      90.00



    Fig. 7: Variation of speedup with number of processors and data size (no. of equations)


    Fig. 8: Variation of efficiency with number of processors and data size (no. of equations)


    5 CONCLUSIONS

In this paper, a new parallel algorithm has been proposed for the EFG method. The parallel EFG code has been written in FORTRAN using the MPI message passing library and validated by solving two model problems. The analysis shows that with the increase in data size (number of equations), both speedup and efficiency improve. Moreover, it is also observed that with the increase in data size, the results (total time, communication time, user time, efficiency and speedup) improve with the increase in the number of processors. From the parallel EFG results presented in this paper, it can be concluded that the proposed algorithm works well for the EFG method.

    NOTATIONS

$d_{\max}$   Scaling parameter

$Q$   Rate of internal heat generation per unit volume

$h$   Convective heat transfer coefficient

$k$   Coefficient of thermal conductivity

$T_{e}$   Edge temperature

$T_{\infty}$   Surrounding fluid temperature

$w(\mathbf{x}-\mathbf{x}_{I})$   Weight function

$\Gamma$   Boundary of the domain

$M$   Pressure gradient ($\partial P/\partial z$)

$n$   Number of nodes in the domain of influence

$N$   Number of equations

$N_{K}$   Lagrange interpolant

$T^{h}(\mathbf{x})$ or $u^{h}(\mathbf{x})$   Moving least square approximant

$\lambda$   Lagrange multiplier

$\mu$   Dynamic viscosity

$\Omega$   Domain of the problem

$\Phi(\mathbf{x})$   Shape function


    REFERENCES

    1. J. J. Monaghan, An introduction to SPH, Computer Physics Communications, Vol. 48, pp. 89-96, 1988.

    2. B. Nayroles, G. Touzot and P. Villon, Generalizing the finite element method: diffuse approximation and

    diffuse elements, Computational Mechanics, Vol. 10, pp. 307-318, 1992.

3. T. Belytschko, Y. Y. Lu and L. Gu, Element free Galerkin methods, International Journal for Numerical Methods in Engineering, Vol. 37, pp. 229-256, 1994.

4. W. K. Liu, S. Jun and Y. F. Zhang, Reproducing kernel particle methods, International Journal for Numerical Methods in Fluids, Vol. 20, pp. 1081-1106, 1995.

5. I. Babuska and J. M. Melenk, The partition of unity method, International Journal for Numerical Methods in Engineering, Vol. 40, pp. 727-758, 1997.

6. C. A. Duarte and J. T. Oden, An H-p adaptive method using clouds, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 237-262, 1996.

    7. G. Yagawa and T. Yamada, Free mesh method, a new meshless finite element method, Computational

    Mechanics, Vol. 18, pp. 383-386, 1996.

8. N. Sukumar, B. Moran and T. Belytschko, The natural element method in solid mechanics, International Journal for Numerical Methods in Engineering, Vol. 43, pp. 839-887, 1998.

    9. T. Zhu, J. D. Zhang and S. N. Atluri, A meshless local boundary integral equation (LBIE) method for

    solving nonlinear problems, Computational Mechanics, Vol. 22, pp. 174-186, 1998.

    10. S. N. Atluri and T. Zhu, A new Meshless Local Petrov-Galerkin (MLPG) approach in computational

    mechanics, Computational Mechanics, Vol. 22, pp. 117-127, 1998.

    11. S. De and K. J. Bathe, The method of finite spheres, Computational Mechanics, Vol. 25, pp. 329-345, 2000.

12. G. R. Liu and Y. T. Gu, A local radial point interpolation method (LRPIM) for free vibration analysis of 2-D solids, Journal of Sound and Vibration, Vol. 246(1), pp. 29-46, 2001.

13. J. Zhang, Z. Yao and M. Tanaka, The meshless regular hybrid boundary node method for 2-D linear elasticity, Engineering Analysis with Boundary Elements, Vol. 27, pp. 259-268, 2003.

    14. M. Shirazaki and G. Yagawa, Large-scale parallel flow analysis based on free mesh method: a virtually

    meshless method, Computer Methods in Applied Mechanics and Engineering, Vol. 174, pp. 419-431, 1999.

    15. D. F. Medina and J. K. Chen, Three-dimensional simulations of impact induced damage in composite

    structures using the parallelized SPH method, Composites: Part-A, Vol. 31, pp. 853-860, 2000.


16. I. V. Singh, K. Sandeep and R. Prakash, Heat transfer analysis of two-dimensional fins using meshless element-free Galerkin method, Numerical Heat Transfer, Part A, Vol. 44, pp. 73-84, 2003.

    APPENDIX

    1 COMPUTATIONAL TIME COMPONENTS

The different components of computational time include real time, system time, user time, CPU time, total time and communication time. Among all these components, emphasis has been given to the total time, communication time and user time.

1.1 Total Time

The total time (run time) is the time from the moment the parallel computation starts to the moment at which the last processor finishes its execution. The total time is measured by the MPI timing routines built into the program itself.

1.2 Communication Time

The communication time is the time required to transfer data from one processor to another processor or processors.

1.3 User Time

The user time is the time spent by the program in its own execution.
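As a minimal sketch (not the authors' instrumentation), the total time and communication time could be collected with MPI wall-clock timers as follows:

! Illustrative sketch: measuring total time and communication time with MPI_Wtime.
program timing_sketch
   use mpi
   implicit none
   integer :: ierr, myrank
   double precision :: t_start, t_comm0, total_time, comm_time
   double precision :: buf(1000)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   buf = dble(myrank)
   comm_time = 0.0d0

   t_start = MPI_Wtime()                         ! start of the timed parallel section

   t_comm0 = MPI_Wtime()                         ! bracket every message passing call
   call MPI_Bcast(buf, 1000, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
   comm_time = comm_time + (MPI_Wtime() - t_comm0)

   ! ... local computation on the distributed rows would go here ...

   total_time = MPI_Wtime() - t_start
   if (myrank == 0) print *, 'total time (s) =', total_time, &
                             '  communication time (s) =', comm_time
   call MPI_Finalize(ierr)
end program timing_sketch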

2 PERFORMANCE METRICS

2.1 Speedup

A measure of the relative performance between a multiprocessor system and a single processor system is the speedup factor, defined as:

$$\text{Speedup} = \frac{\text{User time (execution time) using one processor (single processor system)}}{\text{User time (execution time) using a number of processors (multiprocessor system)}}$$

2.2 Efficiency

$$\text{Efficiency} = \frac{\text{User time using one processor (single processor system)}}{\text{User time using a number of processors} \times \text{number of processors}}$$
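For example, using the values of Table 7 for N = 1229, the user time is 1176.5700 sec on one processor and 171.3900 sec on eight processors, giving a speedup of 1176.5700/171.3900 ≈ 6.86 and an efficiency of (1176.5700/171.3900)/8 ≈ 0.858, i.e. the 85.81% reported in Section 4.1.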
