
International Journal of Distributed and Parallel Systems (IJDPS) Vol.6, No.5, September 2015

DOI: 10.5121/ijdps.2015.6501

A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN METHOD ON SINGLE-NODE MULTI-GPU ARCHITECTURES

Julien Duchateau¹, François Rousselle¹, Nicolas Maquignon¹, Gilles Roussel¹, Christophe Renaud¹

¹ Laboratoire d’Informatique, Signal, Image de la Côte d’Opale, Université du Littoral Côte d’Opale, Calais, France

ABSTRACT

In this paper, a new progressive mesh algorithm is introduced in order to perform fast physical simulations with a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. This algorithm automatically meshes the simulation domain according to the propagation of fluids. The method can also be useful for several other types of physical simulations. In this paper, we associate this algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC-LBM) because it is able to perform various types of simulations on complex geometries. Combined with the massive parallelism of GPUs [5], this algorithm achieves very good performance in comparison with the static mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.

KEYWORDS

Progressive mesh, Lattice Boltzmann method, single-node multi-GPU, parallel computing.

1. INTRODUCTION

The lattice Boltzmann method (LBM) is a computational fluid dynamics (CFD) method. It is a relatively recent technique which approximates the Navier-Stokes equations with a collision-propagation scheme [1]. The lattice Boltzmann method differs from standard approaches such as the finite element method (FEM) or the finite volume method (FVM) by its mesoscopic approach. It is an interesting alternative which is able to simulate complex phenomena on complex geometries. Its high degree of parallelism also makes this method attractive for simulations on parallel hardware. Moreover, the emergence of high-performance computing (HPC) architectures using GPUs [5] is of great interest to many researchers.

Parallelization is indeed an important asset of the lattice Boltzmann method. However, performing simulations on large complex geometries can be very costly in computational resources. This paper introduces a new progressive mesh algorithm in order to perform physical simulations on complex geometries with a multiphase and multicomponent lattice Boltzmann method. The algorithm automatically meshes the simulation domain according to the propagation of fluids. The integration of this algorithm on a single-node multi-GPU architecture is also an important matter which is studied in this paper. To the best of our knowledge, this approach has not been exploited before.



Section 2 first describes the multiphase and multicomponent lattice Boltzmann method. It is able to simulate the behavior of fluids with several physical states (phases) and to model several fluids (components) interacting with each other. Section 3 then presents several recent works involving the lattice Boltzmann method on GPUs. Section 4 concerns the main contribution of this paper: the inclusion of a progressive mesh method in the simulation code. The principles of the method and the definition of an adapted criterion are introduced first. The integration on a single-node multi-GPU architecture is then described. Performance is analyzed in Section 5. The conclusion and future works are finally presented in the last section.

2. THE LATTICE BOLTZMANN METHOD

2.1. The Single Relaxation Time Bhatnagar-Gross-Krook (SRT-BGK) Boltzmann Equation

The lattice Boltzmann method is based on three main discretizations: space, time and velocities. Velocity space is reduced to a finite number of well-defined vectors. Figures 1(a) and 1(b) illustrate this discrete scheme for the D2Q9 and D3Q19 models.

The simulation domain is therefore discretized as a Cartesian grid and calculation steps are performed on this entire grid. The discrete Boltzmann equation [1] with a single relaxation time Bhatnagar-Gross-Krook (SRT-BGK) collision term is defined by the following equations:

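$$ f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\, t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left( f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right) \tag{1} $$

$$ f_i^{eq}(\mathbf{x}, t) = \omega_i \rho \left[ 1 + \frac{\mathbf{e}_i \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right] \tag{2} $$

$$ c_s^2 = \frac{1}{3} \frac{\Delta x^2}{\Delta t^2} \tag{3} $$

The function $f_i(\mathbf{x}, t)$ corresponds to the discrete density distribution function along the velocity vector $\mathbf{e}_i$ at position $\mathbf{x}$ and time $t$. The parameter $\tau$ is the relaxation time of the simulation. The value $\rho$ is the fluid density and $\mathbf{u}$ is the fluid velocity. $\Delta x$ and $\Delta t$ are the spatial and temporal steps of the simulation respectively. The parameters $\omega_i$ are weighting values defined according to the lattice Boltzmann scheme and can be found in [1]. Macroscopic quantities such as density and velocity are finally computed as follows:

$$ \rho(\mathbf{x}, t) = \sum_i f_i(\mathbf{x}, t) \tag{4} $$

$$ \rho(\mathbf{x}, t)\, \mathbf{u}(\mathbf{x}, t) = \sum_i f_i(\mathbf{x}, t)\, \mathbf{e}_i \tag{5} $$

Figure 1: Example of lattice Boltzmann schemes: (a) D2Q9 scheme, (b) D3Q19 scheme.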
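As an illustration of Equations (1) to (5), the following sketch evaluates the equilibrium distribution, the macroscopic moments and the BGK collision for a single D2Q9 cell. It is a minimal pure-Python example in lattice units ($\Delta x = \Delta t = 1$), not the GPU code used in this paper. The velocity ordering and the function names `equilibrium`, `macroscopic` and `collide` are assumptions made for the example, and the weights are the standard D2Q9 values.

```python
# D2Q9 lattice: discrete velocities e_i and weights w_i (lattice units, dx = dt = 1).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0  # Equation (3) with dx = dt = 1


def equilibrium(rho, u):
    """Equation (2): equilibrium distributions for density rho and velocity u."""
    uu = u[0] * u[0] + u[1] * u[1]
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * u[0] + ey * u[1]
        feq.append(w * rho * (1 + eu / CS2 + eu * eu / (2 * CS2 * CS2) - uu / (2 * CS2)))
    return feq


def macroscopic(f):
    """Equations (4) and (5): density and velocity from the distributions."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, (ux, uy)


def collide(f, tau):
    """Collision part of Equation (1) for one cell (propagation is not shown)."""
    rho, u = macroscopic(f)
    feq = equilibrium(rho, u)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

Because the equilibrium of Equation (2) carries the same density and momentum as the distributions it is built from, the collision step leaves the moments of Equations (4) and (5) unchanged, which is a convenient sanity check for an implementation.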

2.2. Multiphase and Multi Component Lattice Boltzmann Model

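Multiphase and multicomponent (MPMC) models make it possible to perform complex simulations involving several physical components. In this section, an MPMC-LBM model based on the work of Bao & Schaeffer [4] is presented. It includes several interaction forces based on a pseudo-potential, which is calculated as follows:

$$ \psi_\alpha = \sqrt{\frac{2\left(p_\alpha - c_s^2 \rho_\alpha\right)}{c_0\, g_{\alpha\alpha}}} \tag{6} $$

The term $p_\alpha$ is the pressure term, $g_{\alpha\alpha}$ is the interaction strength and $c_0$ is a lattice-dependent constant [4]. The pressure is calculated by the use of an equation of state such as the Peng-Robinson equation:

$$ p_\alpha = \frac{\rho_\alpha R_\alpha T_\alpha}{1 - b_\alpha \rho_\alpha} - \frac{a_\alpha\, \alpha(T_\alpha)\, \rho_\alpha^2}{1 + 2 b_\alpha \rho_\alpha - b_\alpha^2 \rho_\alpha^2} \tag{7} $$

Internal forces are then computed. The internal fluid interaction force is expressed as follows [2] [3]:

$$ \mathbf{F}_{\alpha\alpha}(\mathbf{x}) = -\beta\, \frac{g_{\alpha\alpha}}{2}\, \psi_\alpha(\mathbf{x}) \sum_i \omega_i\, \psi_\alpha(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i - \frac{1 - \beta}{2}\, \frac{g_{\alpha\alpha}}{2} \sum_i \omega_i\, \psi_\alpha^2(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i \tag{8} $$

The value $\beta$ is a weighting term generally fixed to 114 according to [2] [3]. The inter-component force is also introduced as follows [4]:

$$ \mathbf{F}_{\alpha\alpha'}(\mathbf{x}) = -\frac{g_{\alpha\alpha'}}{2}\, \psi_\alpha(\mathbf{x}) \sum_i \omega_i\, \psi_{\alpha'}(\mathbf{x} + \mathbf{e}_i \Delta t)\, \mathbf{e}_i \tag{9} $$

Additional forces can be added into the simulation code, such as gravity or a fluid-structure interaction [3]. The force term is then incorporated through a modified collision operator expressed as follows:

$$ f_{\alpha,i}(\mathbf{x} + \mathbf{e}_i \Delta t,\, t + \Delta t) = f_{\alpha,i}(\mathbf{x}, t) - \frac{1}{\tau_\alpha}\left( f_{\alpha,i}(\mathbf{x}, t) - f_{\alpha,i}^{eq}(\mathbf{x}, t) \right) + \Delta f_{\alpha,i}(\mathbf{x}, t) \tag{10} $$

$$ \Delta f_{\alpha,i}(\mathbf{x}, t) = f_{\alpha,i}^{eq}\left(\rho_\alpha,\, \mathbf{u}_\alpha + \Delta \mathbf{u}_\alpha\right) - f_{\alpha,i}^{eq}\left(\rho_\alpha,\, \mathbf{u}_\alpha\right) \tag{11} $$

$$ \Delta \mathbf{u}_\alpha = \frac{\mathbf{F}_\alpha \Delta t}{\rho_\alpha} \tag{12} $$

Macroscopic quantities for each component are finally computed by the use of Equations (4) and (5).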
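The forcing scheme of Equations (11) and (12) can be sketched for one cell of one component as follows. This is a minimal pure-Python illustration on a D2Q9 lattice in lattice units, assuming the standard D2Q9 weights; the name `edm_source` and the sample force values are hypothetical and are not taken from the simulation code of this paper.

```python
# D2Q9 lattice in lattice units (dx = dt = 1); the ordering is an assumption.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0


def equilibrium(rho, u):
    """Equilibrium distributions for density rho and velocity u."""
    uu = u[0] * u[0] + u[1] * u[1]
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * u[0] + ey * u[1]
        feq.append(w * rho * (1 + eu / CS2 + eu * eu / (2 * CS2 * CS2) - uu / (2 * CS2)))
    return feq


def edm_source(rho, u, force, dt=1.0):
    """Source term of Equation (11), using the velocity shift of Equation (12)."""
    du = (force[0] * dt / rho, force[1] * dt / rho)  # Equation (12)
    shifted = equilibrium(rho, (u[0] + du[0], u[1] + du[1]))
    base = equilibrium(rho, u)
    return [s - b for s, b in zip(shifted, base)]


# Hypothetical example: a small downward force on a cell of density 1.2.
delta_f = edm_source(1.2, (0.05, 0.0), (0.0, -1e-3))
```

Summing `delta_f` gives zero, and its first moment equals the force multiplied by the time step: the source term adds momentum to the cell without changing its mass.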

3. LATTICE BOLTZMANN METHODS AND GPUS

The massive parallelism of GPUs was quickly exploited in order to perform fast simulations [7] [8] using the lattice Boltzmann method. Recent works have shown that GPUs are also used with multiphase and multicomponent models [16] [14]. The main aspects of GPU optimization fall into several categories [10] [9], such as thread-level parallelism, GPU memory access and the overlap of memory transfers with computations. Data coalescence is needed in order to optimize global memory bandwidth. This implies several conditions as described in [9]. Concerning LBM, an adapted data structure such as the Structure of Arrays (SoA) has been well studied and has proven to be efficient on GPUs [7].

Several access patterns are also described in the literature. The first one, named the A-B access pattern, consists of using two calculation grids in GPU global memory in order to manage the temporal and spatial dependency of the data (Equation (10)). Simulation steps alternate between reading distribution functions from A and writing them to B, and reading from B and writing to A. This pattern is commonly used and offers very good performance [10] [11] [9] on a single GPU. Several techniques are however presented in the literature in order to significantly reduce the memory cost without loss of information, such as grid compression [6], the Swap algorithm [6] or the A-A pattern technique [12]. In this paper, the A-A pattern technique is used in order to save memory in the presence of spatial and temporal data dependency.

Recent works involving implementations of the lattice Boltzmann method on a single node composed of several GPUs are also available. A first solution, proposed in [13] [17], consists in dividing the entire simulation domain into subdomains according to the number of GPUs and performing LBM kernels on each subdomain in parallel. CPU threads are used to handle each CUDA context. Communications between subdomains are performed using zero-copy memory transfers. The zero-copy feature makes efficient communications possible through a mapping between CPU and GPU pointers. Data must however be read and written only once in order to obtain good performance.

Some approaches have finally been proposed recently to perform simulations on several nodes made up of multiple GPUs by the use of MPI in combination with CUDA [19] [1…] [21] [15]. In our case, we only have one computing node with multiple GPUs, so we do not focus on these architectures in this paper.

4. A PROGRESSIVE MESH ALGORITHM FOR LATTICE BOLTZMANN METHODS ON SINGLE-NODE MULTI-GPU ARCHITECTURES

4.1. Motivation

Works described in the previous section consider that the entire simulation domain is meshed and divided into subdomains according to the number of GPUs, as shown in Figure 2. All subdomains are therefore calculated in parallel.

Figure 2: Division of the simulation domain: the entire domain is decomposed into subdomains according to the number of GPUs.
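The static decomposition of Figure 2 can be sketched as an even split of the grid along one axis, with one contiguous slab per GPU. The helper below is a minimal illustration in Python; the name `split_evenly` and the choice of splitting along a single axis are assumptions made for the example, not a description of the codes cited above.

```python
def split_evenly(n_cells, n_parts):
    """Cut n_cells into n_parts contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for part in range(n_parts):
        # The first `extra` parts receive one additional cell.
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical example: 1024 cells along x shared between 8 GPUs.
slabs = split_evenly(1024, 8)
```

With such a split every GPU holds its slab for the whole run, whether or not the fluid has reached it yet. This is the cost that the progressive mesh introduced next tries to avoid.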


In this paper, a new approach is considered. For most simulations, the entire domain generally does not need to be fully meshed at the beginning of the simulation. We therefore propose a new progressive mesh method in order to dynamically create the mesh according to the propagation of the simulated fluid. The idea consists in defining a first subdomain at the beginning of the simulation (Figure 3(a)). Several subdomains can then be created following the propagation of the fluid, as can be seen in Figure 3(b). This method finally adapts automatically to the simulation geometry (Figure 3(c)). This method is therefore applicable to any geometry and simulation. It is also a real advantage for applications on industrial structures mostly composed of pipes or channels. It can indeed save a lot of memory and calculations, depending on the geometry used for the simulation.

Figure 3: Example of a 3D simulation using the progressive mesh algorithm: (a) a first subdomain is created at the beginning of the simulation, (b) several subdomains are created following the propagation of fluid, (c) all subdomains are created and completely adapt to the simulation geometry.

The progressive mesh algorithm first needs the introduction of an adapted criterion in order to add a new subdomain to the simulation. This new subdomain then needs to be connected to existing subdomains. Calculations on a single-node multi-GPU architecture are finally an important optimization factor.

4.2. Definition of a Criterion for the Progressive Mesh

The definition of a criterion is an important aspect in order to efficiently create new subdomains for the simulation. This criterion needs to represent the propagation of fluid efficiently. The fluid velocity seems like a good choice to define such a criterion. The difference of the fluid velocity between two iterations is considered in order to observe the fluid dispersion. Our criterion is therefore defined as follows for the component $\alpha$:

$$ \left\| C_\alpha(\mathbf{x}) \right\| = \left\| \mathbf{u}_\alpha(\mathbf{x}, t + \Delta t) - \mathbf{u}_\alpha(\mathbf{x}, t) \right\| \tag{13} $$

The symbol $\| \cdot \|$ stands for the Euclidean norm in this paper. This criterion needs to be calculated for all active subdomains on the boundaries. If the criterion exceeds an arbitrary threshold $S$ on a boundary, a new subdomain is created next to this boundary, as shown in Figure 4. The value $S$ is generally fixed to 7 in this paper in order to detect any change of velocity on the boundaries of each subdomain.
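The test of Equation (13) on one boundary face can be sketched as follows. This is a minimal Python illustration, assuming that the velocities of the boundary cells at two successive iterations are available as lists of tuples; the function names and the sample threshold are hypothetical.

```python
import math


def criterion(u_new, u_old):
    """Equation (13): Euclidean norm of the velocity change at one cell."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_new, u_old)))


def boundary_triggers(u_new_face, u_old_face, threshold):
    """Return True if any cell of a boundary face exceeds the threshold S."""
    return any(criterion(a, b) > threshold for a, b in zip(u_new_face, u_old_face))


# Hypothetical face of three cells: only the last cell has started to move.
old_face = [(0.0, 0.0, 0.0)] * 3
new_face = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.01, 0.0, 0.0)]
needs_neighbour = boundary_triggers(new_face, old_face, threshold=1e-6)
```

Only the cells lying on the faces of active subdomains need to be tested, so the cost of the criterion stays small compared with the collision and propagation steps.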


Figure 4: The criterion $\| C_\alpha(\mathbf{x}) \|_2$ is calculated on the boundary. If the criterion exceeds the threshold $S$, then a new subdomain is created next to the boundary.

4.3. Algorithm

This section describes the algorithm for the multiphase and multicomponent lattice Boltzmann model with the inclusion of our progressive mesh algorithm. It is also useful in order to summarize the previous sections. The calculation of the criterion and the creation of new subdomains are performed at the last step of the algorithm in order not to disturb the simulation process. Figure 5 describes our resulting algorithm.

Figure 5: Algorithm for the multiphase and multicomponent lattice Boltzmann model with the inclusion of our progressive mesh method. For colors, please refer to the PDF version of this paper.
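The ordering described in this section, where the criterion is evaluated and subdomains are created only after the LBM steps of an iteration, can be sketched as follows. This is a minimal Python illustration on a grid of subdomain coordinates; `step_fn` (standing for the collision, force and propagation steps) and `faces_fn` (returning, for each active subdomain, the face directions on which the criterion exceeded the threshold) are hypothetical callbacks, not functions of the authors' code.

```python
def neighbours_to_create(active, triggered, shape):
    """Subdomains to add, given the faces on which the criterion fired.

    active    : set of (i, j, k) coordinates of existing subdomains
    triggered : dict mapping a subdomain to a list of face directions
    shape     : number of subdomains along each axis of the global domain
    """
    new = set()
    for block, faces in triggered.items():
        if block not in active:
            continue
        for direction in faces:
            cand = tuple(b + d for b, d in zip(block, direction))
            inside = all(0 <= c < n for c, n in zip(cand, shape))
            if inside and cand not in active:
                new.add(cand)
    return new


def run(steps, active, shape, step_fn, faces_fn):
    """Advance the simulation; mesh growth happens at the end of each iteration."""
    for _ in range(steps):
        step_fn(active)                      # LBM steps on existing subdomains
        triggered = faces_fn(active)         # criterion on their boundaries
        active |= neighbours_to_create(active, triggered, shape)
    return active


# Hypothetical run: the fluid always pushes towards +x in a 4 x 1 x 1 grid.
always_plus_x = lambda blocks: {b: [(1, 0, 0)] for b in blocks}
final = run(5, {(0, 0, 0)}, (4, 1, 1), lambda blocks: None, always_plus_x)
```

In this toy run one subdomain is added per iteration until the edge of the global domain is reached, which mirrors the growth shown in Figure 3.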


4.4. Integration on Single-Node Multi-GPU Architecture

The efficiency of inter-GPU communications is surely the most difficult task in order to obtain good performance. Indeed, our simulations are composed of numerous subdomains which are added dynamically. The repartition of GPUs to the different subdomains is an important optimization factor. An efficient assignment can have an important impact on the performance of the simulation. Indeed, it can reduce the communication time between subdomains and so reduce the simulation time.

4.4.1. Overlap Communications with Computations

Several data exchanges are needed for this type of model. The computation of the interaction force (Equation (8)) and of the inter-component force (Equation (9)) requires access to neighboring values of the pseudo-potential. The propagation step of LBM also requires communicating several distribution functions between GPUs (Figure 6). Aligned buffers may be used for data transactions.

Figure 6: Schematic example for communication of distribution functions in 2D: red arrows correspond to values to communicate between subdomains. For colors, please refer to the PDF version of this paper.

In order to obtain a simulation time as short as possible, it is necessary to overlap data transfers with algorithm calculations. Indeed, overlapping computations and communications yields a significant performance gain by reducing the time spent waiting for data. The idea is to separate the computation process into two steps: boundary calculations and interior calculations. Computations on the needed boundaries are done first. Communications between neighboring subdomains are then done while computing the interior. The different communications are thus performed simultaneously with calculations, which allows good efficiency.

In most cases for the lattice Boltzmann method, memory is transferred via zero-copy transactions to page-locked memory, which allows good overlapping between communications and computations [17] [13] [15]. A different approach is studied in this paper concerning inter-GPU communications. In most recent HPC architectures, several GPUs can be connected to the same PCIe. To improve performance, Nvidia launched GPUDirect with CUDA 4.0. This technology makes it possible to perform Peer-to-Peer transfers and me[…] perform data transfer using Peer[…] zero-copy transactions for other[…] of the CPU and therefore to acc[…] improves performance and the e[…]

Figure […]

4.4.2 Optimization of Data Tra[…]

The repartition of GPUs is an[…] Communications cost is general[…] exchanges between sub domains[…] associated with one GPU. The[…] belonging to the same GPU. I[…] communications are performed[…] concern communications betwe[…] however made between Peer-to[…] goal to optimize dynamically the[…]

For a new sub domain […], the fun[…]

Where […] The function […] compar[…] subdomain and its neighbors. A[…] to-Peer communications. The f[…]

The function […] needs theref[…] cost. This function is calculated[…] assigned to this subdomain. In[…] dynamically and the same GPU[…] assigned. Figure 8 explains via[…]
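The assignment rule summarized in the caption of Figure 8 (a function is evaluated for every available GPU and the GPU with the minimum value is chosen) can be sketched as follows. The cost function below is a stand-in written for illustration only: the relative costs `p2p_cost` and `zero_copy_cost`, the tie-breaking rule and the names `link_cost` and `choose_gpu` are all assumptions, and the sketch ignores load balancing and memory capacity.

```python
def link_cost(gpu_a, gpu_b, peer_ok, p2p_cost=1.0, zero_copy_cost=2.0):
    """Assumed cost of one exchange between subdomains held by gpu_a and gpu_b."""
    if gpu_a == gpu_b:
        return 0.0  # same GPU: no inter-GPU transfer
    if frozenset((gpu_a, gpu_b)) in peer_ok:
        return p2p_cost  # Peer-to-Peer access is possible
    return zero_copy_cost  # fall back to zero-copy through the host


def choose_gpu(neighbour_gpus, gpus, peer_ok):
    """Pick the GPU minimizing the summed cost to the neighbours of a new subdomain."""
    def total(gpu):
        return sum(link_cost(gpu, n, peer_ok) for n in neighbour_gpus)
    # Ties are broken by the lowest GPU index.
    return min(gpus, key=lambda g: (total(g), g))


# Hypothetical topology: GPUs 0-1 and 2-3 can reach each other directly.
peers = {frozenset((0, 1)), frozenset((2, 3))}
chosen = choose_gpu([0, 0, 1], [0, 1, 2, 3], peers)
```

In this hypothetical topology a new subdomain whose neighbours mostly live on GPU 0 is also placed on GPU 0, so most of its exchanges stay inside a single device.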


Figure 8: Schematic example in 2D for the optimization of the repartition of GPUs. The function is calculated for all available GPUs and the GPU which has the minimum value is chosen. For colors, please refer to the PDF version of this paper.

5. RESULTS AND PERFORMANCE

5.1. Hardware

A machine based on 8 NVIDIA Tesla C2050 graphics cards (Fermi architecture) is used to perform simulations. Table 1 describes some Tesla C2050 hardware specifications. Peer-to-Peer communications for our architecture are also described in Figure 9.

Table 1: Tesla C2050 Hardware specifications

CUDA compute capability                            2.0
Total amount of global memory                      2687 MBytes
(14) Multiprocessors, (32) scalar processors/MP    448 CUDA cores
GPU clock rate                                     1147 MHz
L2 cache size                                      786432 bytes
Total amount of shared memory per block            49152 bytes
Total number of registers available per block      32768

Figure 9: Peer-to-Peer communications accessibility for our architecture.



5.2. Simulations

Two simulations are considered on large simulation domains in order to evaluate the performance of our contribution. Both simulations include two physical components. The geometry however differs between these simulations. The first simulation is based on a simple geometry composed of 1024*256*256 calculation cells where a fluid fills the whole simulation domain during the simulation (Figure 10). The second simulation is based on a complex geometry composed of 1024*1024*128 calculation cells where the fluid moves within channels (Figure 11).

5.3. Performance

This section deals with the performance obtained by our method. A comparison between the progressive mesh algorithm and the static mesh method generally used in the literature is shown. The optimization of the repartition of GPUs on subdomains is also studied. The performance metric generally used for the lattice Boltzmann method is the Million Lattice nodes Updates Per Second (MLUPS). It is calculated as follows:

$$ \text{MLUPS} = \frac{\text{domain size} \times \text{number of iterations}}{\text{simulation time} \times 10^{6}} \tag{16} $$

The classical approach generally used in the literature to perform simulations consists in equally dividing the simulation domain according to the number of GPUs. It generally offers good performance as communications can be overlapped with calculations. The use of Peer-to-Peer communications also has a beneficial effect on the performance, as shown in Figure 13. Peer-to-Peer communications provide a performance gain between 8 and 12% according to the number of GPUs used for the simulation described in Figure 10. Zero-copy communications offer good scaling, but an almost perfect scaling is obtained with the inclusion of Peer-to-Peer communications, as shown in Figure 12.

Figure 10: A two-component leakage simulation on a simple geometry with a domain size of 1024*256*256 cells.

Figure 11: A two-component leakage simulation on a complex geometry composed of channels with a domain size of 1024*1024*128 cells.

Figure 12: Comparison of performance between Peer-to-Peer communications and zero-copy communications for the simulation shown in Figure 10.

The inclusion of the progressive mesh also has an important beneficial effect on the simulation performance. Subdomains of size 128*128*128 are considered for these simulations. Figures 13 and 14 describe performance in terms of calculations and memory consumption for the simulation presented in Figure 10. Note that the progressive mesh algorithm obtains excellent performance at the beginning of the simulation. The addition of subdomains during the simulation causes a decrease in performance until the convergence of the simulation. In this particular case, the whole simulation domain is meshed at the end of the simulation, as shown in Figure 14, which leads to a very slight decrease in performance compared to the static mesh. In terms of memory consumption, new subdomains appear quickly, so that the entire simulation domain is in memory after a few iterations.

Figure 13: Comparison of performance between the progressive mesh method and the static mesh method for the simulation shown in Figure 10. The inclusion of the optimization for GPU assignment is also presented.
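The metric of Equation (16) can be evaluated with a few lines of Python. This is a minimal sketch that assumes the simulation time is measured in seconds and that the domain size is the total number of lattice cells; the function name `mlups` and the sample numbers are illustrative and are not measurements from this paper.

```python
def mlups(domain_size, iterations, seconds):
    """Equation (16): million lattice node updates per second."""
    nx, ny, nz = domain_size
    return nx * ny * nz * iterations / (seconds * 1e6)


# Hypothetical run: a 100*100*100 domain advanced for 10 iterations in 2 seconds.
rate = mlups((100, 100, 100), 10, 2.0)
```

For a progressive mesh the number of cells actually updated changes during the run, so the choice of domain size used in the numerator (global domain or currently meshed subdomains) should be stated when reporting the metric.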


Figure 13 also compares performance between two different assignments for GPUs. The first one is a simple assignment which gives a new subdomain the first available GPU. The second one uses the optimization method presented in Section 4.4.2. The comparison of these two methods shows an important difference in performance. Indeed, a difference of approximately 30% is noted at the convergence of this simulation between the two approaches. This difference is mostly due to the fact that the communication cost is higher for a simple assignment than for an optimized assignment. Since subdomains are added dynamically and connected to each other, it is therefore important to optimize these communications in order to reduce the simulation time.

The same comparison is also done for the simulation presented in Figure 11, as shown in Figures 15 and 16. The main difference in this situation is the geometry of the simulation, which is more complex and channelized. Physical simulations on channelized geometries are especially common in industrial structures.

In this case, the progressive mesh method shows excellent results. In terms of memory, this method is easily able to simulate a global simulation domain of size 1024*1024*128 and more, while the static mesh method is unable to perform the simulation. The amount of memory needed is indeed too large for this simulation. Figure 15 shows the evolution of memory consumption during the simulation. The memory cost at the convergence of the simulation is far lower than with the static mesh method. A gain of approximately 50% of memory is noted for this particular simulation. This is due to the fact that the progressive mesh method automatically adapts to the evolution of the simulation, so only the needed zones of the global simulation domain are meshed.

Figure 14: Comparison of memory consumption between the progressive mesh method and the static mesh method for the simulation shown in Figure 10.
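The memory argument can be made concrete with a rough estimate of the storage taken by the distribution functions alone. The sketch below is an illustration under stated assumptions, none of which are confirmed by the text: a D3Q19 lattice, single-precision values, one copy of the distributions per component (as an A-A pattern would allow) and no auxiliary arrays. The function names are hypothetical.

```python
import math


def subdomain_bytes(size=(128, 128, 128), q=19, components=2, bytes_per_value=4):
    """Bytes needed for the distribution functions of one subdomain (assumed layout)."""
    nx, ny, nz = size
    return nx * ny * nz * q * components * bytes_per_value


def count_subdomains(domain, size=(128, 128, 128)):
    """Number of subdomains needed to tile the global domain (ceiling division)."""
    return math.prod(-(-d // s) for d, s in zip(domain, size))


# Static mesh: every subdomain of the 1024*1024*128 domain is allocated up front.
static_total = count_subdomains((1024, 1024, 128)) * subdomain_bytes()
```

Under these assumptions the static mesh must hold all subdomains at once, while the progressive mesh only pays for the subdomains that the fluid has reached. The result can be compared with the global memory listed in Table 1 multiplied by the number of GPUs.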


Figure 15: Comparison of memory consumption between the progressive mesh method and the static mesh method for the simulation shown in Figure 11.

The comparison of the repartition of GPUs is also described in Figure 16. An important performance gain (19%) is still noted for this simulation. This shows that a dynamic optimization method is important in order to obtain good performance. Moreover, the fact that the domain does not need to be fully meshed brings an important gain in performance. The geometry therefore has an important impact on the performance of the progressive mesh method.

Figure 16: Comparison of performance between a simple repartition of GPUs and an optimized assignment of GPUs for the simulation shown in Figure 11.

6. CONCLUSION

In this paper, an efficient progressive mesh algorithm for physical simulations using the lattice Boltzmann method is presented. This progressive mesh method can be a useful tool for performing several types of physical simulations. Its main advantage is that subdomains are automatically added to the simulation by the use of an adapted criterion. This method is also able to save a lot of memory and calculations when performing simulations on large installations.


The integration of the progressive mesh method on a single-node multi-GPU architecture is also treated. A dynamic optimization of the repartition of GPUs to subdomains is an important factor in obtaining good performance. The combination of these contributions therefore makes it possible to perform fast physical simulations on all types of geometry. The progressive mesh method is therefore an interesting alternative because it achieves similar or better performance than the usual static mesh method.

The progressive mesh algorithm is however limited by the memory of the GPUs, which is generally far smaller than the CPU RAM. The creation of new subdomains is indeed possible only while there is a sufficient amount of memory on the GPUs. Extensions of this work to cases that require more memory than all GPUs can handle are now under investigation. Data transfer optimizations with the CPU host will therefore be essential to keep good performance.

ACKNOWLEDGEMENTS

This work has been made possible thanks to collaboration between academic and industrial

groups, gathered by the INNOCOLD association.

REFERENCES

[1] B. Chopard, J.L. Falcone, J. Latt, The Lattice Boltzmann advection diffusion model revisited, The European Physical Journal - Special Topics, Vol. 171, pp. 245-249, 2009.
[2] S. Gong, P. Cheng, Numerical investigation of droplet motion and coalescence by an improved lattice Boltzmann model for phase transitions and multiphase flows, Computers & Fluids, Vol. 53, pp. 93-104, 2012.
[3] S. Gong, P. Cheng, A lattice Boltzmann method for liquid vapor phase change heat transfer, Computers & Fluids, Vol. 54, pp. 93-104, 2012.
[4] J. Bao, L. Schaeffer, Lattice Boltzmann equation model for multicomponent multi-phase flow with high density ratios, Applied Mathematical Modelling, 2012.
[5] NVIDIA, NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2011.
[6] M. Wittmann, T. Zeiser, G. Hager, G. Wellein, Comparison of different propagation steps for Lattice Boltzmann methods, Computers and Mathematics with Applications, Vol. 65, pp. 924-935, 2013.
[7] J. Tölke, Implementation of a Lattice Boltzmann kernel using the compute unified device architecture developed by nVIDIA, Computing and Visualization in Science, pp. 1-11, 2008.
[8] J. Tölke, M. Krafczyk, TeraFLOP computing on a desktop PC with GPUs for 3D CFD, International Journal of Computational Fluid Dynamics 22(7), pp. 443-456, 2008.
[9] F. Kuznik, C. Obrecht, G. Rusaouën, J-J. Roux, LBM based flow simulation using GPU computing processor, Computers and Mathematics with Applications 27, 2009.
[10] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, A new approach to the lattice Boltzmann method for graphics processing units, Computers and Mathematics with Applications 61, pp. 3628-3638, 2011.
[11] P.R. Rinaldi, E.A. Dari, M.J. Vénere, A. Clausse, A Lattice-Boltzmann solver for 3D fluid on GPU, Simulation Modelling Practice and Theory 25, pp. 163-171, 2012.
[12] P. Bailey, J. Myre, S. Walsh, D. Lilja, M. Saar, Accelerating lattice Boltzmann fluid flows using graphics processors, International Conference on Parallel Processing, pp. 550-557, 2009.
[13] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, Multi-GPU implementation of the lattice Boltzmann method, Computers and Mathematics with Applications, 80, pp. 269-275, 2013.
[14] X. Li, Y. Zhang, X. Wang, W. Ge, GPU-based numerical simulation of multi-phase flow in porous media using multiple-relaxation-time lattice Boltzmann method, Chemical Engineering Science, Vol. 102, pp. 209-219, 2013.
[15] M. Januszewski, M. Kostur, Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method, Computer Physics Communications, Vol. 185, pp. 2350-2368, 2014.
[16] F. Jiang, C. Hu, Numerical simulation of a rising CO2 droplet in the initial accelerating stage by a multiphase lattice Boltzmann method, Applied Ocean Research, Vol. 45, pp. 1-9, 2014.
[17] C. Obrecht, F. Kuznik, B. Tourancheau, J.-J. Roux, Multi-GPU Implementation of a Hybrid Thermal Lattice Boltzmann Solver using the TheLMA Framework, Computers and Fluids, Vol. 80, pp. 269-275, 2013.
[18] C. Rosales, Multiphase LBM Distributed over Multiple GPUs, CLUSTER’11 Proceedings of the 2011 IEEE International Conference on Cluster Computing, pp. 1-7, 2011.
[19] C. Obrecht, F. Kuznik, B. Tourancheau, J-J. Roux, Scalable lattice Boltzmann solvers for CUDA GPU clusters, Parallel Computing, Vol. 39, pp. 259-270, 2013.
[20] J. Habich, C. Feichtinger, H. Köstler, G. Hager, G. Wellein, Performance engineering for the lattice Boltzmann method on GPGPUs: Architectural requirements and performance results, Computers & Fluids, Vol. 80, pp. 276-282, 2013.
[21] C. Feichtinger, J. Habich, H. Köstler, U. Rüde, T. Aoki, Performance Modeling and Analysis of Heterogeneous Lattice Boltzmann Simulations on CPU-GPU Clusters, Parallel Computing, 2014.

AUTHORS

Julien Duchateau is a PhD student in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are massive parallelism on CPUs and GPUs, physical simulations and computer graphics.

François Rousselle is an associate professor in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are computer graphics, physical simulations, virtual reality and massive parallelism.

Nicolas Maquignon is a PhD student in simulation and numerical physics at the Université du Littoral Côte d’Opale. His main research interests are numerical physics, numerical mathematics and numerical modeling.

Christophe Renaud is a professor in computer science at the Université du Littoral Côte d’Opale in France. His main research interests are computer graphics, virtual reality, physical simulations and massive parallelism.

Gilles Roussel is an associate professor in automatic control at the Université du Littoral Côte d’Opale in France. His main research interests are automatic control, signal processing, physical simulations and industrial computing.
