
PARALLEL IMPLEMENTATION OF THE EFG METHOD FOR HEAT TRANSFER AND FLUID FLOW PROBLEMS

    Journal: Computational Mechanics

    Manuscript ID: CM-03-0003

    Manuscript Type: Original Paper

    Date Submitted by the Author: 29-Nov-2003

    Keywords: finite element, galerkin, variational method


PARALLEL IMPLEMENTATION OF THE EFG METHOD FOR HEAT TRANSFER AND FLUID FLOW PROBLEMS

    I. V. Singh

    Mechanical Engineering Group

    Birla Institute of Technology and Science

    Pilani, 333 031, Rajasthan, India

    E-mail: [email protected], [email protected]

    ABSTRACT

The parallel implementation of the element free Galerkin (EFG) method for heat transfer and fluid flow problems on an MIMD-type parallel computer is presented. A new parallel algorithm is proposed in which parallelization is performed by row-wise data distribution among the processors. The codes have been developed in FORTRAN using the MPI message passing library. Two model problems (heat transfer and fluid flow) have been solved to validate the proposed algorithm. The total time, communication time, user time, speedup and efficiency have been estimated for the heat transfer and fluid flow problems. For 8 processors, the speedup and efficiency obtained are 6.86 and 85.81% respectively for the heat transfer problem with a data size of N = 1229, and 7.20 and 90.00% respectively for the fluid flow problem with a data size of N = 1462.

    Keywords: meshless method; EFG method; parallel computing; heat transfer; fluid flow

    1 INTRODUCTION

In the last two decades, meshless methods have been developed as an effective tool for solving boundary value problems. The essential feature of these meshless methods is that they require only a set of nodes to construct the interpolation functions. In contrast to the conventional finite element method, these techniques avoid the tedious job of mesh generation, as no element is required anywhere in the model. Furthermore, re-meshing becomes easier because nodes can easily be added or removed in the analysis domain. A large variety of meshless methods have been developed so far, including: smooth particle hydrodynamics (SPH) [1], the diffuse element method (DEM) [2], the element free Galerkin (EFG) method [3], the reproducing kernel particle method (RKPM) [4], the partition of unity method (PUM) [5], the H-p cloud method [6], the free mesh method (FMM) [7], the natural element method (NEM) [8], the local boundary integral equation (LBIE) method [9], the meshless local Petrov-Galerkin (MLPG) method [10], the method of finite spheres [11], the local radial point interpolation method (LRPIM) [12] and the regular hybrid boundary node method (RHBNM) [13]. It has been observed that the results obtained by most of these meshless methods are competitive with FEM in different areas of engineering.


At present, the main barrier to the wide adoption of these meshless methods is their high computational cost. To reduce this cost, a few researchers have parallelized the free mesh method (FMM) [14], smooth particle hydrodynamics (SPH) [15] and the partition of unity method (PUM) [16]. Continuing this line of work on parallelized meshless methods, a parallel algorithm has been proposed here to reduce the computational cost of the EFG method. The parallel code has been written in FORTRAN using the MPI message passing library and executed on a PARAM 10000 supercomputer. The code has been validated by solving two model problems. Total time, communication time, user time, speedup and efficiency have been calculated for the heat transfer and fluid flow problems.

2 REVIEW OF THE EFG METHOD

The discretization of the governing equations by the EFG method requires moving least square (MLS) approximants, which are made up of three components: a weight function associated with each node, a basis function, and a set of non-constant coefficients. Using the MLS approximation, the unknown function $T(\mathbf{x})$ or $u(\mathbf{x})$ is approximated by $T^{h}(\mathbf{x})$ or $u^{h}(\mathbf{x})$ over the solution domain [3, 16] as

$$T^{h}(\mathbf{x}) = \sum_{I=1}^{n} \Phi_{I}(\mathbf{x})\, T_{I} \qquad (1a)$$

or

$$u^{h}(\mathbf{x}) = \sum_{I=1}^{n} \Phi_{I}(\mathbf{x})\, u_{I} \qquad (1b)$$

where $\Phi_{I}(\mathbf{x})$ is the shape function and $T_{I}$ (or $u_{I}$) is the nodal parameter.
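The paper defers the construction of $\Phi_{I}(\mathbf{x})$ to [3, 16]; for orientation, a sketch of the standard MLS construction (assuming the usual formulation of Belytschko et al. [3]) is:

$$\Phi_{I}(\mathbf{x}) = \mathbf{p}^{T}(\mathbf{x})\,\mathbf{A}^{-1}(\mathbf{x})\,\mathbf{B}_{I}(\mathbf{x}), \qquad \mathbf{A}(\mathbf{x}) = \sum_{I=1}^{n} w(\mathbf{x}-\mathbf{x}_{I})\,\mathbf{p}(\mathbf{x}_{I})\,\mathbf{p}^{T}(\mathbf{x}_{I}), \qquad \mathbf{B}_{I}(\mathbf{x}) = w(\mathbf{x}-\mathbf{x}_{I})\,\mathbf{p}(\mathbf{x}_{I})$$

where $\mathbf{p}(\mathbf{x})$ is the basis vector (e.g. $\mathbf{p}^{T}(\mathbf{x}) = [1,\; x,\; y]$ for a linear basis in two dimensions), $w(\mathbf{x}-\mathbf{x}_{I})$ is the weight function, and $\mathbf{A}(\mathbf{x})$ here denotes the MLS moment matrix, not the system matrix of Eq. (5a).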

In the present analysis, the cubic spline weight function [16] is used.
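The exact expression follows [16]; a minimal sketch, assuming the standard normalized cubic spline weight commonly used in EFG analyses, is:

$$w(\mathbf{x}-\mathbf{x}_{I}) = \begin{cases} \dfrac{2}{3} - 4r^{2} + 4r^{3}, & r \le \dfrac{1}{2} \\[4pt] \dfrac{4}{3} - 4r + 4r^{2} - \dfrac{4}{3}r^{3}, & \dfrac{1}{2} < r \le 1 \\[4pt] 0, & r > 1 \end{cases} \qquad \text{with } r = \frac{\lVert \mathbf{x}-\mathbf{x}_{I} \rVert}{d_{\max I}}$$

where $d_{\max I}$ is the scaling parameter that fixes the size of the domain of influence of node $I$.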


    The boundary conditions are given as:

at edge $\Gamma_{1}$: $\; T = T_{e}$ (3b)

at edge $\Gamma_{2}$: $\; k\,\dfrac{\partial T}{\partial y} = 0$ (3c)

at edge $\Gamma_{3}$: $\; -k\,\dfrac{\partial T}{\partial x} = h\,(T - T_{\infty})$ (3d)

at edge $\Gamma_{4}$: $\; -k\,\dfrac{\partial T}{\partial y} = h\,(T - T_{\infty})$ (3e)

Enforcing the essential boundary conditions using the Lagrange multiplier method, a set of linear equations is obtained using Eq. (1a):

$$\begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix} \qquad (4a)$$

where

$$K_{IJ} = \int_{\Omega} \left( \Phi_{I,x}\, k\, \Phi_{J,x} + \Phi_{I,y}\, k\, \Phi_{J,y} \right) \mathrm{d}\Omega + \int_{\Gamma_{3}} h\, \Phi_{I}\, \Phi_{J}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} h\, \Phi_{I}\, \Phi_{J}\, \mathrm{d}\Gamma \qquad (4b)$$

$$f_{I} = \int_{\Omega} \Phi_{I}\, Q\, \mathrm{d}\Omega + \int_{\Gamma_{3}} h\, T_{\infty}\, \Phi_{I}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} h\, T_{\infty}\, \Phi_{I}\, \mathrm{d}\Gamma \qquad (4c)$$

$$G_{IK} = \int_{\Gamma_{1}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma \qquad (4d)$$

$$q_{K} = \int_{\Gamma_{1}} N_{K}\, T_{e}\, \mathrm{d}\Gamma \qquad (4e)$$

Eq. (4a) can be further written as:

$$[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (5a)$$

where

$$[\mathbf{A}]_{N \times N} = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \qquad (5b)$$

$$\{\mathbf{U}\}_{N \times 1} = \begin{Bmatrix} \mathbf{T} \\ \boldsymbol{\lambda} \end{Bmatrix} \qquad (5c)$$

$$\{\mathbf{F}\}_{N \times 1} = \begin{Bmatrix} \mathbf{R} \\ \mathbf{q} \end{Bmatrix} \qquad (5d)$$

    2.2 The EFG for Fluid Flow Problems (Example-II)

The momentum equation for a viscous incompressible fluid flowing through a long, uniform duct is given as


$$\mu \left( \frac{\partial^{2} u}{\partial x^{2}} + \frac{\partial^{2} u}{\partial y^{2}} \right) + \frac{\partial p}{\partial z} = 0 \qquad (6a)$$

The essential boundary conditions are

at surface $\Gamma_{1}$: $\; u = u_{1} = 0$ (6b)

at surface $\Gamma_{2}$: $\; u = u_{2} = 0$ (6c)

at surface $\Gamma_{3}$: $\; u = u_{3} = 0$ (6d)

at surface $\Gamma_{4}$: $\; u = u_{4} = 0$ (6e)

Enforcing the essential boundary conditions using the Lagrange multiplier method, the following set of linear equations is obtained using Eq. (1b):

$$\begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (7a)$$

where

$$K_{IJ} = \int_{\Omega} \left( \Phi_{I,x}\, \mu\, \Phi_{J,x} + \Phi_{I,y}\, \mu\, \Phi_{J,y} \right) \mathrm{d}\Omega \qquad (7b)$$

$$f_{I} = \int_{\Omega} \Phi_{I}\, M\, \mathrm{d}\Omega \qquad (7c)$$

$$G_{IK} = \int_{\Gamma_{1}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{2}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{3}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} \Phi_{I}\, N_{K}\, \mathrm{d}\Gamma \qquad (7d)$$

$$q_{K} = \int_{\Gamma_{1}} N_{K}\, u_{S1}\, \mathrm{d}\Gamma + \int_{\Gamma_{2}} N_{K}\, u_{S2}\, \mathrm{d}\Gamma + \int_{\Gamma_{3}} N_{K}\, u_{S3}\, \mathrm{d}\Gamma + \int_{\Gamma_{4}} N_{K}\, u_{S4}\, \mathrm{d}\Gamma \qquad (7e)$$

Eq. (7a) can be written as:

$$[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\} \qquad (8a)$$

where

$$[\mathbf{A}]_{N \times N} = \begin{bmatrix} \mathbf{K} & \mathbf{G} \\ \mathbf{G}^{T} & \mathbf{0} \end{bmatrix} \qquad (8b)$$

$$\{\mathbf{U}\}_{N \times 1} = \begin{Bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{Bmatrix} \qquad (8c)$$

$$\{\mathbf{F}\}_{N \times 1} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{q} \end{Bmatrix} \qquad (8d)$$


    3 PARALLEL IMPLEMENTATION

Parallel implementation on distributed memory systems differs from parallel implementation on shared memory systems. On distributed memory parallel computers, each processor has only its own local memory. Data exchange between processors is done by message passing, and the time needed for this interprocessor communication must be taken into account. For parallelization, the data are distributed among the processors in such a way that the communication cost is as low as possible. One approach is the domain decomposition method, in which the whole domain is divided into small subdomains and each processor performs the work on one subdomain. This gives good results if the domain can be divided into subdomains such that each processor gets a nearly equal amount of work. In this paper, the authors have utilized a data decomposition approach in the parallel algorithm. The EFG code for solving heat transfer and fluid flow problems consists of two parts: (i) node generation and assembly of the system matrices, and (ii) solution of the linear system of equations.

Basically, there are two approaches to parallelizing the EFG sequential code. The first emphasizes parallel implementation of the whole sequential code, while the second is directed towards a careful analysis of the whole sequential code, selecting the portions where parallel programming will reduce the computational cost both in terms of time and complexity. Therefore, a careful analysis of the EFG code was first performed, and it was found that the time required to solve the linear system of equations (i.e. the inversion time) increases with the data size (number of equations), as shown in Table 1 and Fig. 1 for the heat transfer problem and in Table 2 and Fig. 2 for the fluid flow problem. In other words, the major part of the total computational time is spent in solving the linear system of equations. Therefore, the parallel code has been developed only for the solution of the system of linear equations, not for the whole EFG sequential code.

Table 1: Variation of total time and solution (inversion) time with data size (no. of equations) for heat transfer problems

Data size (no. of equations)   Total time, tt (sec)   Solution (inverse) time, ts (sec)   (ts/tt) x 100 (%)
89                             0.6431                 0.3641                              56.62
131                            1.6158                 1.1312                              70.00
281                            11.3075                9.8999                              87.55
461                            55.9784                52.740                              94.21
701                            194.5410               188.2192                            96.75
991                            598.8565               587.2674                            98.06
1229                           1179.5400              1161.9026                           98.50



    Fig. 1: Percentage variation of solution (inverse) time with data size (no. of equations) for heat transfer problem

Table 2: Variation of total time and solution (inversion) time with data size (no. of equations) for fluid flow problem

Data size (no. of equations)   Total time, tt (sec)   Solution (inverse) time, ts (sec)   (ts/tt) x 100 (%)
113                            0.8595                 0.7168                              83.39
161                            2.3036                 2.0446                              88.76
316                            16.6898                15.8126                             94.75
521                            78.2076                75.9189                             97.07
776                            266.1130               260.6112                            97.95
1126                           838.3200               826.9423                            98.64
1462                           1973.200               1953.8189                           99.02



    Fig. 2: Percentage variation of solution (inverse) time with data size (no. of equations) for fluid flow problem

3.1 Parallel Algorithm for the Solution of the Linear System of Equations

The matrix inversion method is one of the common methods adopted for obtaining the solution of the system of linear equations $[\mathbf{A}]\{\mathbf{U}\} = \{\mathbf{F}\}$. In this method, the inverse of matrix $[\mathbf{A}]$ is first calculated and the solution is then computed using $\{\mathbf{U}\} = [\mathbf{A}]^{-1}\{\mathbf{F}\}$. In the present work, a parallel algorithm based on the matrix inversion technique is proposed to reduce the computational cost of the EFG method. During implementation of this algorithm on the supercomputer (PARAM 10000), row-wise data distribution is first carried out. After the data have been distributed among the processors, each processor generates an identity matrix of the same size as $[\mathbf{A}]$. In the process of matrix inversion, row-wise operations are carried out: every non-diagonal element of matrix $[\mathbf{A}]$ is converted to zero and every diagonal element of matrix $[\mathbf{A}]$ is converted to unity. Whatever operations are carried out on matrix $[\mathbf{A}]$, the same operations are also carried out on matrix $[\mathbf{I}]$. Each processor operates on its own rows to reduce the computational time. After the inverse of matrix $[\mathbf{A}]$ is found, the unknown $\{\mathbf{U}\}$ is calculated using $\{\mathbf{U}\} = [\mathbf{A}]^{-1}\{\mathbf{F}\}$.
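The paper does not list the row-distribution code itself; the following is a minimal FORTRAN sketch (not the authors' implementation) of how the starting and ending row numbers of the block owned by each processor might be computed, including the case where N is not an exact multiple of Numprocs:

! Illustrative sketch: block row-wise distribution of an N x N system among
! numprocs processes. The first mod(n, numprocs) processes receive one extra row.
subroutine row_range(n, numprocs, myrank, istart, iend)
   implicit none
   integer, intent(in)  :: n, numprocs, myrank   ! myrank runs from 0 to numprocs-1
   integer, intent(out) :: istart, iend          ! global row numbers owned by myrank
   integer :: base, extra
   base  = n / numprocs                          ! minimum number of rows per process
   extra = mod(n, numprocs)                      ! leftover rows handed out one by one
   if (myrank < extra) then
      istart = myrank*(base + 1) + 1
      iend   = istart + base
   else
      istart = extra*(base + 1) + (myrank - extra)*base + 1
      iend   = istart + base - 1
   end if
end subroutine row_range

With such a partitioning, each processor stores and operates on only its own iend - istart + 1 rows of [A] and [I], which keeps the per-processor memory and work nearly equal, as required by the data decomposition approach of Section 3.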


Parallel Algorithm

Global:
    Numprocs   Number of processors
    N          Number of equations
    MyRank     Rank of each processor
    Rank       Rank of the processor holding the current row
    [A]        Input matrix
    {F}        Input column vector
    [I]        Inverse matrix of [A]
    i          Variable indicating the current row
    start      Starting row number for each processor
    end        Ending row number for each processor

do i = 0 to Numprocs - 1
    Set start
    Set end
end do

do i = 1 to N
    Find the Rank of the processor holding the current row
    if (MyRank = Rank) then
        Set the diagonal element of row i of [A] to 1.0
        Change the non-diagonal elements of row i of [A]
        Change the elements of row i of matrix [I]
        Broadcast the current row
    end if
    do j = start to end
        Change the non-diagonal elements of column i of [A] to 0.0
        Change the elements of matrix [I]
    end do
end do

do i = start to end
    Compute {U}i
end do

do j = 1 to Numprocs - 1
    Send {U}i to the master processor
end do
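The following is a minimal FORTRAN 90 / MPI sketch of the above algorithm, written here as an illustration rather than as the authors' code; for brevity it assumes that N is an exact multiple of the number of processes and uses a small internally generated test matrix in place of the assembled EFG system:

! Illustrative sketch of the row-distributed Gauss-Jordan inversion and the
! back-multiplication {U} = [A]^-1 {F}, following the pseudocode above.
program gauss_jordan_mpi
   use mpi
   implicit none
   integer, parameter :: n = 8                   ! number of equations (demo size)
   integer :: ierr, myrank, numprocs, nlocal, istart, iend
   integer :: i, j, iloc, owner
   double precision :: arow(n), irow(n), pivot, fac
   double precision, allocatable :: a(:,:), ainv(:,:), f(:), u(:), uglob(:)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)

   nlocal = n / numprocs                         ! rows per process (n assumed divisible)
   istart = myrank*nlocal + 1
   iend   = istart + nlocal - 1
   allocate(a(nlocal,n), ainv(nlocal,n), f(n), u(nlocal), uglob(n))

   ! Local block of a diagonally dominant test matrix, local block of [I], and {F}.
   do iloc = 1, nlocal
      i = istart + iloc - 1
      do j = 1, n
         a(iloc,j)    = 1.0d0 / dble(i + j)
         ainv(iloc,j) = 0.0d0
      end do
      a(iloc,i)    = a(iloc,i) + dble(n)
      ainv(iloc,i) = 1.0d0
   end do
   f = 1.0d0

   ! Row-wise distributed Gauss-Jordan elimination applied to [A] and [I] together.
   do i = 1, n
      owner = (i - 1) / nlocal                   ! rank holding the current (pivot) row
      if (myrank == owner) then
         iloc  = i - istart + 1
         pivot = a(iloc,i)
         a(iloc,:)    = a(iloc,:)    / pivot     ! make the diagonal element unity
         ainv(iloc,:) = ainv(iloc,:) / pivot
         arow = a(iloc,:)
         irow = ainv(iloc,:)
      end if
      call MPI_Bcast(arow, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
      call MPI_Bcast(irow, n, MPI_DOUBLE_PRECISION, owner, MPI_COMM_WORLD, ierr)
      do iloc = 1, nlocal                        ! eliminate column i from local rows
         if (istart + iloc - 1 /= i) then
            fac = a(iloc,i)
            a(iloc,:)    = a(iloc,:)    - fac*arow
            ainv(iloc,:) = ainv(iloc,:) - fac*irow
         end if
      end do
   end do

   ! Each process computes its share of {U} = [A]^-1 {F}.
   do iloc = 1, nlocal
      u(iloc) = dot_product(ainv(iloc,:), f)
   end do

   ! Gather the distributed pieces of {U} on the master process.
   call MPI_Gather(u, nlocal, MPI_DOUBLE_PRECISION, uglob, nlocal, &
                   MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
   if (myrank == 0) print *, 'U = ', uglob

   call MPI_Finalize(ierr)
end program gauss_jordan_mpi

In an actual EFG run the local rows of [A] and the vector {F} would instead come from the assembled system of Eq. (5a) or Eq. (8a); the broadcast of each pivot row is the main interprocessor communication inside the elimination loop, which is consistent with the small communication times reported in Tables 5-7 and 10-12.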

3.2 Hardware and Software Used

The hardware used for the numerical solution is a PARAM 10000 supercomputer, developed by C-DAC, Pune, India. The PARAM 10000 is a 6.4 GFLOPS, RISC-based, distributed memory multiprocessor system categorized as a multiple instruction multiple data (MIMD) type computer. It has four nodes in total (three compute nodes and one server node). Each compute node has two UltraSPARC II 64-bit RISC CPUs of 400 MHz, 512 MB main memory, two Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card, while the server node has two UltraSPARC II 64-bit RISC CPUs of 400 MHz, 1 GB of main memory, four Ultra SCSI HDDs of 9.1 GB each and one 10/100 Fast Ethernet card. The PARAM 10000 machine thus has a total of 8 processors (two per node), Sun SPARC compilers (F90 Compiler Version 2.0, F77 Compiler Version 5.0, C Compiler Version 5.0, C++ Compiler Version 5.0) and supports both the MPI and PVM message passing environments.


    4 NUMERICAL RESULTS AND DISCUSSION

The parallel code for the proposed algorithm has been developed in FORTRAN. The EFG results have been obtained for the model heat transfer and fluid flow problems. The computational time components (total time, user time and communication time), speedup and efficiency (see the Appendix) have been calculated for the whole code using the PARAM 10000 supercomputer.

    4.1 Example-I: Heat Transfer Problem

The parallel EFG results have been obtained for a model heat transfer problem. The different parameters used for the analysis of the model shown in Fig. 3 are tabulated in Table 3. Table 4 shows a comparison of the temperature values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 4, it is clear that the temperature values obtained by the EFG method are in good agreement with those obtained by FEM.

Table 5 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 701. The same quantities are also presented in Table 6 for N = 991 and in Table 7 for N = 1229. Fig. 4 and Fig. 5 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency have been obtained as 6.86 and 85.81% respectively for the data size of N = 1229.

From the above analysis, it is observed that as the data size (number of equations) increases, the results improve both in terms of efficiency and speedup. The contribution of communication time to the total time is almost negligible. Moreover, it is also clear that with the increase in data size (number of equations), the speedup and efficiency improve with the increase in the number of processors.

    Fig. 3: Model for heat transfer problem



Table 3: Data for the model shown in Fig. 3

Parameter                                Value
Length (L)                               1 m
Width (W)                                1 m
Thermal conductivity (k)                 400 W/m °C
Rate of internal heat generation (Q)     0 W/m3
Heat transfer coefficient (h)            100 W/m2 °C
Surrounding fluid temperature (T∞)       20 °C
Temperature at the edge (Te)             200 °C

Table 4: Comparison of EFG results with FEM at a few typical locations for 121 nodes

Location (m)        Temperature (°C)
x       y           EFG          FEM
0.5     1.0         160.8224     160.8950
0.5     0.5         172.2274     172.2670
0.5     0.0         175.0713     175.1240
1.0     1.0         140.9820     141.0610
1.0     0.5         151.8434     151.8510
1.0     0.0         155.0760     155.1100

Table 5: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 701

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      212.2315           0.0000                     211.6500          1.00      100.00
2                      111.4260           0.0671                     110.4400          1.92      95.82
3                      78.0770            0.1053                     76.3050           2.77      92.45
4                      60.6514            0.5706                     58.2800           3.63      90.75
5                      55.0408            0.8323                     48.6950           4.34      86.92
6                      54.5207            6.0414                     42.8500           4.94      82.32
7                      46.3948            1.9330                     36.8950           5.74      81.95
8                      50.2148            5.5403                     33.4350           6.33      79.13

Table 6: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 991

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      598.8565           0.0000                     597.2500          1.00      100.00
2                      310.1355           0.1924                     307.7700          1.94      97.03
3                      214.2525           0.2256                     211.3900          2.82      94.18
4                      164.6720           0.9013                     161.3050          3.70      92.56
5                      141.6955           1.7398                     133.4550          4.47      89.50
6                      134.5425           6.8044                     114.3500          5.22      87.05
7                      114.9740           3.0855                     98.7850           6.05      86.37
8                      122.0950           6.2373                     89.4100           6.68      83.50


Table 7: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1229

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      1179.5400          0.0000                     1176.5700         1.00      100.00
2                      606.6460           0.2567                     600.4800          1.95      97.97
3                      417.6550           0.7822                     411.4100          2.86      95.32
4                      320.2650           0.5918                     315.1500          3.73      93.33
5                      282.8400           2.6019                     261.6100          4.50      89.95
6                      254.3290           4.7861                     221.0900          5.32      88.69
7                      222.0310           4.9631                     191.8700          6.13      87.60
8                      216.9900           3.8176                     171.3900          6.86      85.81


    Fig. 4: Variation of speedup with number of processors and data size (no. of equations)



    Fig. 5: Variation of efficiency with number of processors and data size (no. of equations)

    4.2 Example-II: Fluid Flow Problem

The parallel EFG results have been obtained for a model fluid flow problem. The different parameters used for the analysis of the model shown in Fig. 6 are tabulated in Table 8. Table 9 shows a comparison of the velocity values obtained by the EFG method with those obtained by FEM for 121 nodes. From Table 9, it is clear that the velocity values obtained by the EFG method are in good agreement with those obtained by FEM.

Table 10 shows the variation of total time, communication time, user time, speedup and efficiency with the number of processors for N = 776. Table 11 and Table 12 show the same quantities for N = 1126 and N = 1462 respectively. Fig. 7 and Fig. 8 show the variation of speedup and efficiency with the number of processors and the data size (number of equations). Using 8 processors, the maximum speedup and efficiency have been obtained as 7.20 and 90.00% respectively for the data size of N = 1462.

From the above analysis, it is observed that with the increase in data size (number of equations), the results improve both in terms of efficiency and speedup. The contribution of communication time to the total time is almost negligible. Moreover, it is also clear that with the increase in data size (number of equations), the speedup and efficiency improve with the increase in the number of processors.


    Fig. 6: Model cross-section of the fluid flowing through a duct

Table 8: Data for the model shown in Fig. 6

Parameter                          Value
Depth (D)                          0.25 m
Length (L)                         0.25 m
Pressure gradient (∂P/∂z)          5000 N/m2/m
Dynamic viscosity (μ)              5 Ns/m2
All surface velocities (uS)        0 m/sec

Table 9: Comparison of EFG results with FEM at a few typical locations for 121 nodes

Location (m)        Velocity (m/sec)
x       y           EFG         FEM
0       0.125       0.0000      0.0000
0       0.100       1.8670      1.8280
0       0.075       3.1882      3.1304
0       0.050       3.9319      3.9929
0       0.025       4.4598      4.4827
0       0.000       4.6417      4.6412



Table 10: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 776

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      266.1130           0.0000                     265.4600          1.00      100.00
2                      141.0705           0.0445                     138.8950          1.91      95.56
3                      99.6623            0.1466                     95.5150           2.77      92.64
4                      76.6645            0.3459                     72.6950           3.65      91.29
5                      66.8084            0.8534                     59.8550           4.43      88.70
6                      67.3349            6.1760                     52.2250           5.08      84.72
7                      59.5002            1.7815                     44.4450           5.97      85.32
8                      65.2026            5.3795                     40.9100           6.48      81.11

Table 11: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1126

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      838.3200           0.0000                     836.4050          1.00      100.00
2                      433.8130           0.1940                     427.2300          1.95      97.88
3                      299.6620           0.2509                     291.9100          2.86      95.50
4                      230.6550           1.1592                     221.5500          3.77      94.38
5                      201.4360           1.6366                     182.4800          4.58      91.67
6                      181.6840           4.9364                     155.1100          5.39      89.87
7                      163.2360           3.4810                     134.4500          6.22      88.87
8                      163.3560           5.1744                     121.9800          6.85      85.71

Table 12: Variation of total time, communication time, user time, speedup and efficiency with number of processors for N = 1462

Number of processors   Total time (sec)   Communication time (sec)   User time (sec)   Speedup   Efficiency (%)
1                      1973.2000          0.0000                     1968.5500         1.00      100.00
2                      994.7770           0.2786                     983.7400          2.00      100.00
3                      697.1760           0.9614                     664.7600          2.96      98.71
4                      534.6850           0.3451                     502.8700          3.91      97.86
5                      449.4020           2.6545                     422.2300          4.66      93.24
6                      400.0850           5.8641                     354.2450          5.55      92.61
7                      357.6345           4.3058                     308.5900          6.39      91.35
8                      335.0405           6.8723                     273.5350          7.20      90.00



    Fig. 7: Variation of speedup with number of processors and data size (no. of equations)


    Fig. 8: Variation of efficiency with number of processors and data size (no. of equations)


    5 CONCLUSIONS

In this paper, a new parallel algorithm has been proposed for the EFG method. The parallel EFG code has been written in FORTRAN using the MPI message passing library and validated by solving two model problems. The analysis shows that with the increase in data size (number of equations), both speedup and efficiency improve. Moreover, it is also observed that with the increase in data size, the results (total time, communication time, user time, efficiency and speedup) improve with the increase in the number of processors. From the parallel EFG results presented in this paper, it can be concluded that the proposed algorithm works well for the EFG method.

    NOTATIONS

$d_{\max}$   Scaling parameter

$Q$   Rate of internal heat generation per unit volume

$h$   Convective heat transfer coefficient

$k$   Coefficient of thermal conductivity

$T_{e}$   Edge temperature

$T_{\infty}$   Surrounding fluid temperature

$w(\mathbf{x}-\mathbf{x}_{I})$   Weight function

$\Gamma$   Boundary of the domain

$M$   Pressure gradient ($\partial P/\partial z$)

$n$   Number of nodes in the domain of influence

$N$   Number of equations

$N_{K}$   Lagrange interpolant

$T^{h}(\mathbf{x})$ or $u^{h}(\mathbf{x})$   Moving least square approximant

$\lambda$   Lagrange multiplier

$\mu$   Dynamic viscosity

$\Omega$   Domain of the problem

$\Phi(\mathbf{x})$   Shape function


    REFERENCES

    1. J. J. Monaghan, An introduction to SPH, Computer Physics Communications, Vol. 48, pp. 89-96, 1988.

    2. B. Nayroles, G. Touzot and P. Villon, Generalizing the finite element method: diffuse approximation and

    diffuse elements, Computational Mechanics, Vol. 10, pp. 307-318, 1992.

3. T. Belytschko, Y. Y. Lu and L. Gu, Element free Galerkin methods, International Journal for Numerical Methods in Engineering, Vol. 37, pp. 229-256, 1994.

4. W. K. Liu, S. Jun and Y. F. Zhang, Reproducing kernel particle methods, International Journal for Numerical Methods in Fluids, Vol. 20, pp. 1081-1106, 1995.

5. I. Babuska and J. M. Melenk, The partition of unity method, International Journal for Numerical Methods in Engineering, Vol. 40, pp. 727-758, 1997.

6. C. A. Duarte and J. T. Oden, An H-p adaptive method using clouds, Computer Methods in Applied Mechanics and Engineering, Vol. 139, pp. 237-262, 1996.

    7. G. Yagawa and T. Yamada, Free mesh method, a new meshless finite element method, Computational

    Mechanics, Vol. 18, pp. 383-386, 1996.

8. N. Sukumar, B. Moran and T. Belytschko, The natural element method in solid mechanics, International Journal for Numerical Methods in Engineering, Vol. 43, pp. 839-887, 1998.

    9. T. Zhu, J. D. Zhang and S. N. Atluri, A meshless local boundary integral equation (LBIE) method for

    solving nonlinear problems, Computational Mechanics, Vol. 22, pp. 174-186, 1998.

    10. S. N. Atluri and T. Zhu, A new Meshless Local Petrov-Galerkin (MLPG) approach in computational

    mechanics, Computational Mechanics, Vol. 22, pp. 117-127, 1998.

    11. S. De and K. J. Bathe, The method of finite spheres, Computational Mechanics, Vol. 25, pp. 329-345, 2000.

12. G. R. Liu and Y. T. Gu, A local radial point interpolation method (LRPIM) for free vibration analysis of 2-D solids, Journal of Sound and Vibration, Vol. 246(1), pp. 29-46, 2001.

13. J. Zhang, Z. Yao and M. Tanaka, The meshless regular hybrid boundary node method for 2-D linear elasticity, Engineering Analysis with Boundary Elements, Vol. 27, pp. 259-268, 2003.

    14. M. Shirazaki and G. Yagawa, Large-scale parallel flow analysis based on free mesh method: a virtually

    meshless method, Computer Methods in Applied Mechanics and Engineering, Vol. 174, pp. 419-431, 1999.

    15. D. F. Medina and J. K. Chen, Three-dimensional simulations of impact induced damage in composite

    structures using the parallelized SPH method, Composites: Part-A, Vol. 31, pp. 853-860, 2000.


16. I. V. Singh, K. Sandeep and R. Prakash, Heat transfer analysis of two-dimensional fins using meshless element-free Galerkin method, Numerical Heat Transfer, Part A, Vol. 44, pp. 73-84, 2003.

    APPENDIX

    1 COMPUTATIONAL TIME COMPONENTS

The different components of computational time include real time, system time, user time, CPU time, total time and communication time. Among all these components, emphasis has been given to the total time, communication time and user time.

1.1 Total Time

The total time (run time) is the time from the moment the parallel computation starts to the moment at which the last processor finishes its execution. The total time is measured by the MPI timing routines built into the program itself.

1.2 Communication Time

The communication time is the time required to transfer data from one processor to another processor or processors.

1.3 User Time

The user time is the time spent by the program in its own execution.
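As a minimal sketch (not the authors' instrumentation), the total time and communication time could be collected with MPI wall-clock timers as follows:

! Illustrative sketch: measuring total time and communication time with MPI_Wtime.
program timing_sketch
   use mpi
   implicit none
   integer :: ierr, myrank
   double precision :: t_start, t_comm0, total_time, comm_time
   double precision :: buf(1000)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   buf = dble(myrank)
   comm_time = 0.0d0

   t_start = MPI_Wtime()                         ! start of the timed parallel section

   t_comm0 = MPI_Wtime()                         ! bracket every message passing call
   call MPI_Bcast(buf, 1000, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
   comm_time = comm_time + (MPI_Wtime() - t_comm0)

   ! ... local computation on the distributed rows would go here ...

   total_time = MPI_Wtime() - t_start
   if (myrank == 0) print *, 'total time (s) =', total_time, &
                             '  communication time (s) =', comm_time
   call MPI_Finalize(ierr)
end program timing_sketch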

2 PERFORMANCE METRICS

2.1 Speedup

A measure of the relative performance between a multiprocessor system and a single processor system is the speedup factor, defined as:

$$\text{Speedup} = \frac{\text{User time (execution time) using one processor (single processor system)}}{\text{User time (execution time) using a number of processors (multiprocessor system)}}$$

2.2 Efficiency

$$\text{Efficiency} = \frac{\text{User time using one processor (single processor system)}}{\text{User time using a number of processors} \times \text{number of processors}}$$
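For example, using the values of Table 7 for N = 1229, the user time is 1176.5700 sec on one processor and 171.3900 sec on eight processors, giving a speedup of 1176.5700/171.3900 ≈ 6.86 and an efficiency of (1176.5700/171.3900)/8 ≈ 0.858, i.e. the 85.81% reported in Section 4.1.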
