
International Journal of Distributed and Parallel Systems (IJDPS) Vol.6, No.5, September 2015

DOI:10.5121/ijdps.2015.6501

A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN METHOD ON SINGLE-NODE MULTI-GPU ARCHITECTURES

Julien Duchateau1, François Rousselle1, Nicolas Maquignon1, Gilles Roussel1, Christophe Renaud1

1Laboratoire d’Informatique, Signal, Image de la Côte d’Opale, Université du Littoral Côte d’Opale, Calais, France

ABSTRACT

In this paper, a new progressive mesh algorithm is introduced in order to perform fast physical simulations with a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. This algorithm automatically meshes the simulation domain according to the propagation of fluids. The method can also be useful for several other types of physical simulations. In this paper, we associate this algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC-LBM) because it is able to perform various types of simulations on complex geometries. Combined with the massive parallelism of GPUs [5], this algorithm yields very good performance in comparison with the static mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.

KEYWORDS

Progressive mesh, Lattice Boltzmann method, single-node multi-GPU, parallel computing.

1. INTRODUCTION

The lattice Boltzmann method (LBM) is a computational fluid dynamics (CFD) method. It is a relatively recent technique which is able to approximate the Navier-Stokes equations by a collision-propagation scheme [1]. The lattice Boltzmann method however differs from standard approaches such as the finite element method (FEM) or the finite volume method (FVM) by its mesoscopic approach. It is an interesting alternative which is able to simulate complex phenomena on complex geometries. Its high degree of parallelism also makes this method attractive for simulations on parallel hardware. Moreover, the emergence of high-performance computing (HPC) architectures using GPUs [5] is also of great interest for many researchers.

Parallelization is indeed an important asset of the lattice Boltzmann method. However, performing simulations on large complex geometries can be very costly in computational resources. This paper introduces a new progressive mesh algorithm in order to perform physical simulations on complex geometries by the use of a multiphase and multicomponent lattice Boltzmann method. The algorithm is able to automatically mesh the simulation domain according to the propagation of fluids. Moreover, the integration of this algorithm on a single-node multi-GPU architecture is also an important matter which is studied in this paper. This method is an interesting alternative which has never been exploited, to the best of our knowledge.


Section 2 first describes the multiphase and multicomponent lattice Boltzmann method. It is able to simulate the behavior of fluids with several physical states (phases) and it is also able to model several fluids (components) interacting with each other. Section 3 then presents several recent works involving the lattice Boltzmann method on GPUs. Section 4 concerns the main contribution of this paper: the inclusion of a progressive mesh method in the simulation code. The principles of the method and the definition of an adapted criterion are first introduced. The integration on a single-node multi-GPU architecture is then described. Performance is analyzed in section 5. The conclusion and future works are finally presented in the last section.

2. THE LATTICE BOLTZMANN METHOD

2.1. The Single Relaxation Time Bhatnagar-Gross-Krook (SRT-BGK) Boltzmann equation

The lattice Boltzmann method is based on three main discretizations: space, time and velocities. Velocity space is reduced to a finite number of well-defined vectors. Figures 1(a) and 1(b) illustrate this discrete scheme for the D2Q9 and D3Q19 models.

The simulation domain is therefore discretized as a Cartesian grid and calculation steps are performed on this entire grid. The discrete Boltzmann equation [1] with a single relaxation time Bhatnagar-Gross-Krook (SRT-BGK) collision term is defined by the following equations:

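\[ f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\, t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau} \left( f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right) \tag{1} \]

\[ f_i^{eq}(\mathbf{x}, t) = \omega_i \rho \left( 1 + \frac{\mathbf{e}_i \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right) \tag{2} \]

\[ c_s^2 = \frac{1}{3} \frac{\Delta x^2}{\Delta t^2} \tag{3} \]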

The function $f_i(\mathbf{x}, t)$ corresponds to the discrete density distribution function along velocity vector $\mathbf{e}_i$ at a position $\mathbf{x}$ and a time $t$. The parameter $\tau$ corresponds to the relaxation time of the simulation. The value $\rho$ is the fluid density and $\mathbf{u}$ corresponds to the fluid velocity. $\Delta x$ and $\Delta t$ are the spatial and temporal steps of the simulation respectively. Parameters $\omega_i$ are weighting values defined according to the lattice Boltzmann scheme and can be found in [1]. Macroscopic quantities such as density $\rho$ and velocity $\mathbf{u}$ are finally computed as follows:

Figure 1: Example of Lattice Boltzmann schemes: (a) D2Q9 scheme, (b) D3Q19 scheme.


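\[ \rho(\mathbf{x}, t) = \sum_i f_i(\mathbf{x}, t) \tag{4} \]

\[ \rho(\mathbf{x}, t)\, \mathbf{u}(\mathbf{x}, t) = \sum_i \mathbf{e}_i f_i(\mathbf{x}, t) \tag{5} \]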
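As an illustration of Equations (1) to (5), the collision-propagation scheme can be sketched in a few lines of NumPy for the D2Q9 model. This is only a minimal sketch under stated assumptions, not the authors' GPU implementation: it works in lattice units (spatial and temporal steps equal to 1), uses periodic boundaries, and the names `E`, `W`, `macroscopic`, `equilibrium` and `step` are hypothetical.

```python
import numpy as np

# D2Q9 lattice: discrete velocities e_i and weights w_i (assumed lattice units, dx = dt = 1).
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
CS2 = 1.0 / 3.0  # squared lattice speed of sound, Equation (3)


def macroscopic(f):
    """Density and velocity from the distributions, Equations (4) and (5)."""
    rho = f.sum(axis=0)
    u = np.einsum("ia,ixy->axy", E, f) / rho
    return rho, u


def equilibrium(rho, u):
    """Equilibrium distribution, Equation (2)."""
    eu = np.einsum("ia,axy->ixy", E, u)   # e_i . u for every direction and cell
    uu = (u ** 2).sum(axis=0)             # u . u for every cell
    return W[:, None, None] * rho * (1 + eu / CS2 + eu ** 2 / (2 * CS2 ** 2) - uu / (2 * CS2))


def step(f, tau):
    """One collision-propagation step, Equation (1), with periodic boundaries."""
    rho, u = macroscopic(f)
    f_post = f - (f - equilibrium(rho, u)) / tau   # BGK collision
    # Propagation: shift each population along its own lattice velocity.
    return np.stack([
        np.roll(f_post[i], shift=(int(E[i, 0]), int(E[i, 1])), axis=(0, 1))
        for i in range(9)
    ])
```

A uniform fluid at rest is a fixed point of `step`, and the total mass `f.sum()` is conserved from one step to the next, which gives two quick sanity checks for this kind of kernel.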

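2.2. Multiphase and Multicomponent Lattice Boltzmann Model

Multiphase and multicomponent (MPMC) models allow performing complex simulations involving several physical components. In this section, an MPMC-LBM model based on the work of Bao & Schaeffer [4] is presented. It includes several interaction forces based on a pseudo-potential, which is calculated as follows for a component $\alpha$:

\[ \psi_\alpha = \sqrt{\frac{2\,(p_\alpha - c_s^2 \rho_\alpha)}{g_{\alpha\alpha}}} \tag{6} \]

The term $p_\alpha$ is the pressure term. It is calculated by the use of an equation of state such as the Peng-Robinson equation:

\[ p_\alpha = \frac{\rho_\alpha R_\alpha T_\alpha}{1 - b_\alpha \rho_\alpha} - \frac{a_\alpha\, \alpha(T_\alpha)\, \rho_\alpha^2}{1 + 2 b_\alpha \rho_\alpha - b_\alpha^2 \rho_\alpha^2} \tag{7} \]

Internal forces are then computed. The internal fluid interaction force is expressed as follows [2] [3]:

\[ \mathbf{F}_{\alpha\alpha}(\mathbf{x}) = -\beta\, \frac{g_{\alpha\alpha}}{2}\, \psi_\alpha(\mathbf{x}) \sum_i \omega_i\, \psi_\alpha(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i - \frac{1 - \beta}{2}\, \frac{g_{\alpha\alpha}}{2} \sum_i \omega_i\, \psi_\alpha^2(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i \tag{8} \]

The value $\beta$ is a weighting term generally fixed to 114 according to [2] [3]. The inter-component force between components $\alpha$ and $\alpha'$ is also introduced as follows [4]:

\[ \mathbf{F}_{\alpha\alpha'}(\mathbf{x}) = -\frac{g_{\alpha\alpha'}}{2}\, \psi_\alpha(\mathbf{x}) \sum_i \omega_i\, \psi_{\alpha'}(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i \tag{9} \]

Additional forces can be added into the simulation code, such as the gravity force or a fluid-structure interaction [3]. The incorporation of the force term is then achieved by a modified collision operator expressed as follows:

\[ f_{\alpha,i}(\mathbf{x} + \mathbf{e}_i \Delta t,\, t + \Delta t) = f_{\alpha,i}(\mathbf{x}, t) - \frac{1}{\tau_\alpha} \left( f_{\alpha,i}(\mathbf{x}, t) - f_{\alpha,i}^{eq}(\mathbf{x}, t) \right) + \Delta f_{\alpha,i}(\mathbf{x}, t) \tag{10} \]

\[ \Delta f_{\alpha,i}(\mathbf{x}, t) = f_{\alpha,i}^{eq}(\rho_\alpha, \mathbf{u}_\alpha + \Delta \mathbf{u}_\alpha) - f_{\alpha,i}^{eq}(\rho_\alpha, \mathbf{u}_\alpha) \tag{11} \]

\[ \Delta \mathbf{u}_\alpha = \frac{\mathbf{F}_\alpha \Delta t}{\rho_\alpha} \tag{12} \]

Macroscopic quantities for each component are finally computed by the use of equations (4) and (5).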
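To make the role of the equation of state more concrete, the following sketch evaluates a Peng-Robinson pressure and the resulting pseudo-potential for a single component. Everything numeric here is an assumption chosen for illustration and is not taken from the paper: the lattice parameters `a`, `b`, `R`, the acentric factor, the coupling `g = -1` and the function names are all hypothetical, and the temperature function is the standard Peng-Robinson form.

```python
import math

CS2 = 1.0 / 3.0  # squared lattice speed of sound (lattice units, assumed)


def pr_pressure(rho, T, a=2 / 49, b=2 / 21, R=1.0, acentric=0.344):
    """Peng-Robinson pressure for one component (illustrative parameter values)."""
    Tc = 0.0778 / 0.45724 * a / (b * R)   # critical temperature implied by a and b
    kappa = 0.37464 + 1.54226 * acentric - 0.26992 * acentric ** 2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(T / Tc))) ** 2
    attraction = a * alpha * rho ** 2 / (1.0 + 2.0 * b * rho - (b * rho) ** 2)
    return rho * R * T / (1.0 - b * rho) - attraction


def pseudo_potential(rho, T, g=-1.0):
    """Pseudo-potential built from the pressure; g < 0 keeps the square root real here."""
    return math.sqrt(2.0 * (pr_pressure(rho, T) - CS2 * rho) / g)
```

In the dilute limit the attraction term vanishes and `pr_pressure` tends to the ideal-gas value `rho * R * T`, which is a convenient check when changing the parameters.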

    3. LATTICE BOLTZMANN METHODS AND GPUS 

The massive parallelism of GPUs has been quickly exploited in order to perform fast simulations [7] [8] using the lattice Boltzmann method. Recent works have shown that GPUs are also used with multiphase and multicomponent models [16] [14]. The main aspects of GPU optimizations are decomposed into several categories [10] [9] such as thread level parallelism, GPU memory access and overlap of memory transfers with computations. Data coalescence is needed in order to optimize global memory bandwidth. This implies several conditions as described in [9]. Concerning LBM, an adapted data structure such as the Structure of Arrays (SoA) has been well studied and has proven to be efficient on GPU [7].

Several access patterns are also described in the literature. The first one, named the A-B access pattern, consists of using two calculation grids in GPU global memory in order to manage the temporal and spatial dependency of the data (Equation (10)). Simulation steps alternate between reading distribution functions from A and writing them to B, and reading from B and writing to A reciprocally. This pattern is commonly used and offers very good performance [10] [11] [9] on a single GPU. Several techniques are however presented in the literature in order to significantly reduce the memory cost without loss of information, such as grid compression [6], the Swap algorithm [6] or the A-A pattern technique [12]. In this paper, the A-A pattern technique is used in order to save memory due to spatial and temporal data dependency.

Recent works involving implementations of the lattice Boltzmann method on a single node composed of several GPUs are also available. A first solution, proposed in [13] [17], consists in dividing the entire simulation domain into subdomains according to the number of GPUs and performing LBM kernels on each subdomain in parallel. CPU threads are used to handle each CUDA context. Communications between subdomains are performed using zero-copy memory transfers. The zero-copy feature allows efficient communications by a mapping between CPU and GPU pointers. Data must however be read and written only once in order to obtain good performance.

Some approaches have finally been proposed recently to perform simulations on several nodes constituted of multiple GPUs by the use of MPI in combination with CUDA [19][1…][21] [15]. In our case, we only have one computing node with multiple GPUs, thus we do not focus on these architectures in this paper.

4. A PROGRESSIVE MESH ALGORITHM FOR LATTICE BOLTZMANN METHODS ON SINGLE-NODE MULTI-GPU ARCHITECTURES

4.1. Motivation

Works described in the previous section consider that the entire simulation domain is meshed and divided into subdomains according to the number of GPUs, as shown on Figure 2. All subdomains are therefore calculated in parallel.

Figure 2: Division of the simulation domain: the entire domain is decomposed into subdomains according to the number of GPUs.
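The static decomposition of Figure 2 can be sketched as follows. This is a minimal illustration assuming a one-dimensional split along the x axis into contiguous slabs of near-equal size; the function name `split_domain` is hypothetical and the actual partitioning used in the cited works may differ.

```python
def split_domain(nx, n_gpus):
    """Split nx cells along x into n_gpus contiguous slabs of near-equal size.

    Returns a list of (start, stop) index pairs, one per GPU.
    """
    base, extra = divmod(nx, n_gpus)
    bounds, start = [], 0
    for gpu in range(n_gpus):
        # The first `extra` GPUs receive one additional layer of cells.
        stop = start + base + (1 if gpu < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds
```

With this scheme every slab is allocated up front, whether or not the fluid ever reaches it, which is the cost the progressive approach tries to avoid.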

In this paper, a new approach is considered. For most simulations, the entire domain generally does not need to be fully meshed at the beginning of the simulation. We therefore propose a new progressive mesh method in order to dynamically create the mesh according to the propagation of the simulated fluid. The idea consists in defining a first subdomain at the beginning of the simulation (Figure 3(a)). Several subdomains can then be created following the propagation of the fluid, as can be seen on Figure 3(b). This method finally adapts automatically to the simulation geometry (Figure 3(c)). This method is therefore applicable to any geometry and simulation. It is also a real advantage for applications on industrial structures mostly composed of pipes or channels. It can indeed save a lot of memory and calculations according to the geometry used for the simulation.

Figure 3: Example of a 3D simulation using the progressive mesh algorithm: (a) a first subdomain is created at the beginning of the simulation, (b) several subdomains are created following the propagation of fluid, (c) all subdomains are created and completely adapt to the simulation geometry.

The progressive mesh algorithm first needs the introduction of an adapted criterion in order to create a new subdomain in the simulation. This new subdomain then needs to be connected to existing subdomains. Calculations on single-node multi-GPU architecture are finally an important optimization factor.

4.2. Definition of a Criterion for the Progressive Mesh

The definition of a criterion is an important aspect in order to efficiently create new subdomains for the simulation. This criterion needs to efficiently represent the propagation of fluid. The fluid velocity seems like a good choice in order to define an efficient criterion. The difference of the fluid velocity between two iterations is considered in order to efficiently observe the fluid dispersion. Our criterion is therefore defined as follows for the component $\alpha$:

\[ \| C_\alpha(\mathbf{x}) \|_2 = \| \mathbf{u}_\alpha(\mathbf{x}, t + \Delta t) - \mathbf{u}_\alpha(\mathbf{x}, t) \|_2 \tag{13} \]

The symbol $\| \cdot \|_2$ stands for the Euclidean norm in this paper. This criterion needs to be calculated for all active subdomains on the boundaries. If the criterion exceeds an arbitrary threshold $S$ on a boundary, a new subdomain is created next to this boundary as shown on Figure 4. The value $S$ is generally fixed to 0 in this paper in order to detect any change of velocity on the boundaries of each subdomain.
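The criterion of Equation (13) can be illustrated for one cubic subdomain as follows. This is a sketch under stated assumptions rather than the authors' kernel: the velocity field of a component is stored as an array of shape `(3, nx, ny, nz)`, the face names and the functions `criterion` and `faces_to_extend` are hypothetical, and the default threshold of `0.0` is an illustrative choice.

```python
import numpy as np


def criterion(u_new, u_old):
    """Euclidean norm of the velocity change between two iterations, per cell.

    u_new and u_old have shape (3, nx, ny, nz).
    """
    return np.linalg.norm(u_new - u_old, axis=0)


def faces_to_extend(u_new, u_old, threshold=0.0):
    """Names of the subdomain faces where the criterion exceeds the threshold."""
    c = criterion(u_new, u_old)
    faces = {
        "x-": c[0], "x+": c[-1],
        "y-": c[:, 0], "y+": c[:, -1],
        "z-": c[:, :, 0], "z+": c[:, :, -1],
    }
    return [name for name, values in faces.items() if values.max() > threshold]
```

Each returned face name would correspond to one neighboring subdomain to create; only boundary layers are inspected, so the cost of the test stays small compared with a full LBM step.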

Figure 4: The criterion $\| C_\alpha(\mathbf{x}) \|_2$ is calculated on the boundary. If the criterion exceeds the threshold S then a new subdomain is created next to the boundary.

4.3. Algorithm

This section describes the algorithm for the multiphase and multicomponent lattice Boltzmann model with the inclusion of our progressive mesh algorithm. It is also useful in order to summarize the previous sections. The calculation of the criterion and the creation of new subdomains are performed at the last step of the algorithm in order not to disturb the simulation process. Figure 5 describes our resulting algorithm.

Figure 5: Algorithm for the multiphase and multicomponent Lattice Boltzmann model with the inclusion of our progressive mesh method. For colors, please refer to the PDF version of this paper.

4.4. Integration on Single-Node Multi-GPU Architecture

Efficiency of inter-GPU communications is surely the most difficult task in order to obtain good performance. Indeed, our simulations are composed of numerous subdomains which are added dynamically. The repartition of GPUs to the different subdomains is an important factor of optimization. An efficient assignment can have an important impact on the performance of the simulation. Indeed, it can reduce the communication time between subdomains and so reduce the simulation time.

4.4.1. Overlap Communications with Computations

Several data exchanges are needed for this type of model. The computation of the internal interaction force (Equation (8)) and of the inter-component force (Equation (9)) requires access to neighboring values of the pseudo-potential. The propagation step of LBM also requires communicating several distribution functions $f_i$ between GPUs (Figure 6). Aligned buffers may be used for data transactions.

Figure 6: Schematic example for communication of distribution functions in 2D: red arrows correspond to $f_i$ values to communicate between subdomains. For colors, please refer to the PDF version of this paper.

In order to obtain a simulation time as short as possible, it is necessary to overlap data transfers with algorithm calculations. Indeed, overlapping computations and communications yields a significant performance gain by reducing the waiting time for data. The idea is to separate the computation process into 2 steps: boundary calculations and interior calculations. Computations on the needed boundaries are done first. Communications between neighboring subdomains are then done while computing the interior. The different communications are thus performed simultaneously with calculations, which allows good efficiency.

In most cases for the lattice Boltzmann method, memory is transferred via zero-copy transactions to page-locked memory, which allows good overlapping between communications and computations [17] [13] [15]. A different approach is studied in this paper concerning inter-GPU communications. In most recent HPC architectures, several GPUs can be connected to the same PCIe bus. To improve performance, Nvidia launched GPUDirect with CUDA 4.0. This technology allows to perform


Peer-to-Peer transfers and me…
perform data transfer using Peer…
zero-copy transactions for other…
of the CPU and therefore to acc…
improves performance and the e…

Figure…

4.4.2. Optimization of Data Tra…

The repartition of GPUs is an…
Communications cost is general…
exchanges between subdomains…
associated with one GPU. The…
belonging to the same GPU. I…
communications are performed…
concern communications betwe…
however made between Peer-to…
goal to optimize dynamically the…

For a new subdomain …, the fun…

Where …
The function … compar…
subdomain and its neighbors. A…
to-Peer communications. The f…

The function … needs theref…
cost. This function is calculated…
assigned to this subdomain. In…
dynamically and the same GPU…
assigned. Figure 8 explains via…

Figure 8: Schematic example in 2D for the optimization of the repartition of GPUs. The function F is calculated for all available GPUs and the GPU which has the minimum value is chosen. For colors, please refer to the PDF version of this paper.

5. RESULTS AND PERFORMANCE

5.1. Hardware

A machine with 8 NVIDIA Tesla C2050 graphics cards based on the Fermi architecture is used to perform the simulations. Table 1 describes some Tesla C2050 hardware specifications. Peer-to-Peer communications for our architecture are also described in Figure 9.

Table 1: Tesla C2050 Hardware specifications

CUDA compute capability                               2.0
Total amount of global memory                         2687 MBytes
(14) Multiprocessors, (32) scalar processors/MP       448 CUDA cores
GPU clock rate                                        1147 MHz
L2 cache size                                         786432 bytes
Total amount of shared memory per block               49152 bytes
Total number of registers available per block         32768

Figure 9: Peer-to-Peer communications accessibility for our architecture.


5.2. Simulations

Two simulations are considered on large simulation domains in order to evaluate the performance of our contribution. Both simulations include the use of two physical components. The geometry however differs between these simulations. The first simulation is based on a simple geometry composed of 1024*256*256 calculation cells where a fluid fills the entire simulation domain during the simulation (Figure 10). The second simulation is based on a complex geometry composed of 1024*1024*128 calculation cells where the fluid moves within channels (Figure 11).

5.3. Performance

This section deals with the performance obtained by our method. A comparison between the progressive mesh algorithm and the static mesh method generally used in the literature is shown. The optimization of the repartition of GPUs on subdomains is also studied. The performance metric generally used for the lattice Boltzmann method is the Million Lattice nodes Updates Per Second (MLUPS). It is calculated as follows, with the simulation time expressed in seconds:

\[ \mathrm{MLUPS} = \frac{\text{domain size} \times \text{number of iterations}}{\text{simulation time} \times 10^6} \tag{16} \]

The classical approach generally used in the literature in order to perform simulations consists in equally dividing the simulation domain according to the number of GPUs. It generally offers good performance as communications can be overlapped with calculations. The use of Peer-to-Peer communications also has a beneficial effect on the performance, as shown on Figure 13. Peer-to-Peer communications yield a performance gain between 8 and 12% according to the number of GPUs used for the simulation described in Figure 10. Zero-copy communications offer a good scaling but an almost perfect scaling is obtained with the inclusion of Peer-to-Peer communications, as shown on Figure 12.

The inclusion of the progressive mesh also has an important beneficial effect on the simulation performance. Subdomains of size 128*128*128 are considered for these simulations. Figures 13 and 14 describe performance in terms of calculations and memory consumption for the simulation presented on Figure 10. Note that the progressive mesh algorithm obtains excellent performance at the beginning of the simulation. The addition of subdomains during the simulation causes a decrease of performance until the convergence of the simulation. In this particular case, the whole simulation domain is meshed at the end of the simulation, as shown on Figure 14, which leads to a very slight decrease of performance compared to the static mesh. In terms of memory consumption, new subdomains appear quickly, so the entire simulation domain is in memory after a few iterations.

Figure 10: A two-component leakage simulation on a simple geometry with a domain size of 1024*256*256 cells.

Figure 11: A two-component leakage simulation on a complex geometry composed of channels with a domain size of 1024*1024*128 cells.

Figure 12: Comparison of performance between Peer-to-Peer communications with zero-copy communications for the simulation shown on Figure 10.

Figure 13: Comparison of performance between the progressive mesh method and the static mesh method for the simulation shown on Figure 10. The inclusion of the optimization for GPU assignment is also presented.
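The MLUPS metric of Equation (16) is straightforward to evaluate. The helper below is a small sketch; the function name `mlups` and the timing values in the usage note are hypothetical and are not measurements from this paper.

```python
def mlups(domain_size, iterations, seconds):
    """Million lattice node updates per second for a 3D domain.

    domain_size is (nx, ny, nz) in cells, seconds is the wall-clock simulation time.
    """
    nx, ny, nz = domain_size
    return nx * ny * nz * iterations / (seconds * 1e6)
```

For instance, a purely hypothetical run of 100 iterations on the 1024*256*256 domain completed in 10 seconds would correspond to `mlups((1024, 256, 256), 100, 10.0)`, that is about 671 MLUPS.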

Figure 13 also compares performance between two different assignments for GPUs. The first one is a simple assignment which assigns to a new subdomain the first available GPU. The second one uses the optimization method presented in section 4.4.2. The comparison of these two methods shows an important difference of performance. Indeed, a difference of approximately 30% is noted at the convergence of this simulation between the two approaches. This difference is mostly due to the fact that the communication cost is higher for a simple assignment than for an optimized assignment. Since subdomains are added dynamically and connected to each other, it is therefore important to optimize these communications in order to reduce the simulation time.

The same comparison is also done for the simulation presented on Figure 11, as shown on Figures 15 and 16. The main difference in this situation is the geometry of the simulation, which is more complex and channelized. Physical simulations on channelized geometry are especially present on industrial structures.

In this case, the progressive mesh method shows excellent results. In terms of memory, this method is easily able to simulate on a global simulation domain of size 1024*1024*128 and more, while the static mesh method is unable to perform the simulation. The amount of needed memory is indeed too large for this simulation. Figure 15 shows the evolution of memory consumption during the simulation. The memory cost at the convergence of the simulation is far lower than with the static mesh method. A gain of approximately 50% of memory is noted for this particular simulation. This is due to the fact that the progressive mesh method automatically adapts to the evolution of the simulation, so only the needed zones of the global simulation domain are meshed.

Figure 14: Comparison of memory consumption between the progressive mesh method and the static mesh method for the simulation shown on Figure 10.

Figure 15: Comparison of memory consumption between the progressive mesh method and the static mesh method for the simulation shown on Figure 11.

The comparison of the repartition of GPUs is also described in Figure 16. An important performance gain (19%) is still noted for this simulation. This shows that a dynamic optimization method is important in order to obtain good performance. Moreover, the fact that the domain does not need to be fully meshed brings an important gain in performance. The geometry therefore has an important impact on the performance of the progressive mesh method.

Figure 16: Comparison of performance between a simple repartition of GPUs with an optimized assignment of GPUs for the simulation shown on Figure 11.

6. CONCLUSION

In this paper, an efficient progressive mesh algorithm for physical simulations using the lattice Boltzmann method is presented. This progressive mesh method can be a useful tool in order to perform several types of physical simulations. Its main advantage is that subdomains are automatically added to the simulation by the use of an adapted criterion. This method is also able to save a lot of memory and calculations in order to perform simulations on large installations.


The integration of the progressive mesh method on single-node multi-GPU architecture is also treated. A dynamic optimization of the repartition of GPUs to subdomains is an important factor in order to obtain good performance. The combination of all these contributions therefore allows performing fast physical simulations on all types of geometry. The progressive mesh method is therefore an interesting alternative because it allows obtaining similar or better performance than the usual static mesh method.

The progressive mesh algorithm is however limited by the memory of the GPUs, which is generally far smaller than the CPU RAM. The creation of new subdomains is indeed possible only while there is a sufficient amount of memory on the GPUs. Extensions of this work to cases that require more memory than all GPUs can handle are now under investigation. Data transfer optimizations with the CPU host will therefore be essential to keep good performance.

ACKNOWLEDGEMENTS

This work has been made possible thanks to collaboration between academic and industrial groups, gathered by the INNOCOLD association.

    REFERENCES 

[1] B. Chopard, J.L. Falcone, J. Latt, The Lattice Boltzmann advection diffusion model revisited, The European Physical Journal - Special Topics, Vol. 171, pp. 245-249, 2009.

[2] S. Gong, P. Cheng, Numerical investigation of droplet motion and coalescence by an improved lattice Boltzmann model for phase transitions and multiphase flows, Computers & Fluids, Vol. 53, pp. 93-104, 2012.

[3] S. Gong, P. Cheng, A lattice Boltzmann method for liquid vapor phase change heat transfer, Computers & Fluids, Vol. 54, pp. 93-104, 2012.

[4] J. Bao, L. Schaeffer, Lattice Boltzmann equation model for multicomponent multi-phase flow with high density ratios, Applied Mathematical Modelling, 2012.

[5] NVIDIA, NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2011.

[6] M. Wittmann, T. Zeiser, G. Hager, G. Wellein, Comparison of different propagation steps for Lattice Boltzmann methods, Computers and Mathematics with Applications, Vol. 65, pp. 924-935, 2013.

[7] J. Tölke, Implementation of a Lattice Boltzmann kernel using the compute unified device architecture developed by nVIDIA, Computing and Visualization in Science, 1-11, 2008.

[8] J. Tölke, M. Krafczyk, TeraFLOP computing on a desktop PC with GPUs for 3D CFD, International Journal of Computational Fluid Dynamics 22(7), pp. 443-456, 2008.

[9] F. Kuznik, C. Obrecht, G. Rusaouën, J-J. Roux, LBM based flow simulation using GPU computing processor, Computers and Mathematics with Applications 27, 2009.

[10] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, A new approach to the lattice Boltzmann method for graphics processing units, Computers and Mathematics with Applications 61, pp. 3628-3638, 2011.

[11] P.R. Rinaldi, E.A. Dari, M.J. Vénere, A. Clausse, A Lattice-Boltzmann solver for 3D fluid on GPU, Simulation Modelling Practice and Theory 25, pp. 163-171, 2012.

[12] P. Bailey, J. Myre, S. Walsh, D. Lilja, M. Saar, Accelerating lattice Boltzmann fluid flows using graphics processors, International Conference on Parallel Processing, pp. 550-557, 2009.

[13] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, Multi-GPU implementation of the lattice Boltzmann method, Computers and Mathematics with Applications, 80, pp. 269-275, 2013.

[14] X. Li, Y. Zhang, X. Wang, W. Ge, GPU-based numerical simulation of multi-phase flow in porous media using multiple-relaxation-time lattice Boltzmann method, Chemical Engineering Science, Vol. 102, pp. 209-219, 2013.

[15] M. Januszewski, M. Kostur, Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method, Computer Physics Communications, Vol. 185, pp. 2350-2368, 2014.

[16] F. Jiang, C. Hu, Numerical simulation of a rising CO2 droplet in the initial accelerating stage by a multiphase lattice Boltzmann method, Applied Ocean Research, Vol. 45, pp. 1-9, 2014.

[17] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, Multi-GPU Implementation of a Hybrid Thermal Lattice Boltzmann Solver using the TheLMA Framework, Computers and Fluids, Vol. 80, pp. 269-275, 2013.

[18] C. Rosales, Multiphase LBM Distributed over Multiple GPUs, CLUSTER’11 Proceedings of the 2011 IEEE International Conference on Cluster Computing, pp. 1-7, 2011.

[19] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, Scalable lattice Boltzmann solvers for CUDA GPU clusters, Parallel Computing, Vol. 39, pp. 259-270, 2013.

[20] J. Habich, C. Feichtinger, H. Köstler, G. Hager, G. Wellein, Performance engineering for the lattice Boltzmann method on GPGPUs: Architectural requirements and performance results, Computers & Fluids, Vol. 80, pp. 276-282, 2013.

[21] C. Feichtinger, J. Habich, H. Köstler, U. Rüde, T. Aoki, Performance Modeling and Analysis of Heterogeneous Lattice Boltzmann Simulations on CPU-GPU Clusters, Parallel Computing, 2014.

AUTHORS

Julien Duchateau is a PhD student in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are massive parallelism on CPUs and GPUs, physical simulations and computer graphics.

François Rousselle is an associate professor in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are computer graphics, physical simulations, virtual reality and massive parallelism.

Nicolas Maquignon is a PhD student in simulation and numerical physics at the Université du Littoral Côte d’Opale. His main research interests are numerical physics, numerical mathematics and numerical modeling.

Christophe Renaud is a professor in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are computer graphics, virtual reality, physical simulations and massive parallelism.

Gilles Roussel is an associate professor in automatic control at the Université du Littoral Côte d’Opale in France. His main research interests are automatic control, signal processing, physical simulations and industrial computing.