Friedrich-Alexander-Universität Erlangen-Nürnberg, Hardware-Software-Co-Design
27.01.2009
Lattice Boltzmann with CUDA
Lan Shi, Li Yi & Liyuan Zhang
Advanced seminar (Hauptseminar): Multicore Architectures and Programming
Outline
- Overview of LBM
- A use case of LBM
- Algorithm
- Implementation in CUDA and Optimization
- Performance
- Demo
Overview of LBM
The Lattice Boltzmann Method (LBM) is a class of computational fluid dynamics (CFD) methods for fluid simulation.
CFD methods:
- Volume mesh (irregular/regular):
  - Euler equations
  - Navier-Stokes equations
- Smoothed particle hydrodynamics (SPH):
  - Lagrangian method
- Spectral methods:
  - spherical harmonics
  - Chebyshev polynomials
- LBM: simulates an equivalent mesoscopic system on a Cartesian grid
Overview of LBM
From macroscopic to mesoscopic to microscopic
[Figure: the three levels of description; symbols shown include ρ, u, T, e_i, r and v]
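In standard LBM notation (the symbols f_i for the particle distribution along the discrete velocity e_i are chosen here, not taken from the slide), the macroscopic density and velocity are recovered as moments of the mesoscopic distributions:

```latex
\rho(\mathbf{x},t) = \sum_{i} f_i(\mathbf{x},t), \qquad
\rho(\mathbf{x},t)\,\mathbf{u}(\mathbf{x},t) = \sum_{i} f_i(\mathbf{x},t)\,\mathbf{e}_i
```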
Overview of LBM
Lattice structures: D2Q9, D3Q19, ... (DdQq: d spatial dimensions, q discrete velocities)
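A sketch of the D2Q9 lattice as C tables, using the direction order C, N, S, W, E, NW, NE, SW, SE that the talk uses for its cell layout and the standard weights 4/9, 1/9 and 1/36. The array names EX, EY, W and OPP are hypothetical.

```c
/* D2Q9: 2 dimensions, 9 discrete velocities, in the order
   C, N, S, W, E, NW, NE, SW, SE (x points east, y points north). */
enum { Q = 9 };

static const int EX[Q] = { 0, 0,  0, -1, 1, -1, 1, -1,  1 };
static const int EY[Q] = { 0, 1, -1,  0, 0,  1, 1, -1, -1 };

/* Lattice weights: 4/9 for the rest direction, 1/9 axis-aligned, 1/36 diagonal. */
static const double W[Q] = {
    4.0 / 9.0,
    1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0
};

/* Index of the direction pointing the opposite way (needed for bounce-back). */
static const int OPP[Q] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };
```

As a sanity check, the weights sum to 1, the first moment (sum of w_i e_i) vanishes, and the sum of w_i e_ix^2 equals 1/3, the squared lattice speed of sound.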
Overview of LBM
Boundary conditions:
- Domain boundary: the outermost surrounding lattice nodes
- Obstacle boundary: objects inside the lattice grid that block the fluid flow
Solutions:
- no change
- bounce-back
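A minimal sketch of bounce-back at a single obstacle node, assuming the direction order C, N, S, W, E, NW, NE, SW, SE used for the cell layout in this talk. The "full-way" variant shown (every population that arrived is sent back the way it came) is only one of several bounce-back schemes, and bounce_back_cell is a hypothetical name.

```c
enum { Q = 9 };

/* Opposite direction for C, N, S, W, E, NW, NE, SW, SE. */
static const int OPP[Q] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };

/* Full-way bounce-back at an obstacle node: population i takes the value
   that arrived along the opposite direction, reversing every velocity. */
void bounce_back_cell(double f[Q])
{
    double tmp[Q];
    for (int i = 0; i < Q; ++i)
        tmp[i] = f[OPP[i]];
    for (int i = 0; i < Q; ++i)
        f[i] = tmp[i];
}
```

Applying it twice restores the original populations, since reversing a direction twice is the identity.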
Overview of LBM
LBM is resource intensive: > 100x100x100 grid points
- not practical due to slow memory access and long processing times
- but explicit in nature, requiring only nearest-neighbor interaction
- hence very suitable for implementation on GPUs:
  - parallel computing
  - Single-Program Multiple-Data (SPMD) model
  - within-processor memory
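A back-of-the-envelope memory estimate makes "resource intensive" concrete. It assumes a D3Q19 lattice, double-precision values (8 bytes), and two copies of the populations (the Stream kernel in this talk writes into a separate d_temp_cell array):

```latex
\begin{aligned}
N  &= 100^3 = 10^6 \text{ cells} \\
M  &= N \cdot 19 \cdot 8\,\text{B} = 1.52 \times 10^{8}\,\text{B} \approx 152\,\text{MB per copy} \\
2M &\approx 304\,\text{MB}
\end{aligned}
```

Single precision halves these figures; flags, density and velocity fields come on top.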
Target Model
Lid Driven Cavity
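A sketch of the geometry setup for the lid-driven cavity: sizex x sizey fluid cells surrounded by a one-cell wall layer (matching the sizex+2 grid extent used in the kernel launch), no-slip walls on the left, right and bottom, and a moving lid on top. The flag names and values are hypothetical, and assigning the two top corner cells to the lid is a choice the slides do not specify.

```c
#include <stdlib.h>

/* Hypothetical flag values; the talk stores a flag as element 9 of each cell. */
enum { FLUID = 0, NO_SLIP = 1, MOVING_WALL = 2 };

/* Flags for a lid-driven cavity of sizex x sizey fluid cells plus a
   one-cell wall layer, row-major with y = 0 at the bottom. The whole top
   row, corners included, is the moving lid. Caller frees the result. */
int *init_cavity_flags(int sizex, int sizey)
{
    int nx = sizex + 2, ny = sizey + 2;
    int *flag = malloc((size_t)nx * ny * sizeof *flag);
    if (!flag)
        return NULL;
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x) {
            int side_or_bottom = x == 0 || x == nx - 1 || y == 0;
            if (y == ny - 1)
                flag[y * nx + x] = MOVING_WALL;          /* the lid */
            else
                flag[y * nx + x] = side_or_bottom ? NO_SLIP : FLUID;
        }
    return flag;
}
```

For example, init_cavity_flags(4, 3) yields a 6 x 5 grid with 12 fluid cells, 12 no-slip cells and 6 lid cells.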
Reformulation of the LBM Equation
Discrete lattice Boltzmann equation:
Collide Step:
Stream Step:
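Assuming the common single-relaxation-time (BGK) form, which matches the relaxation parameter d_omega passed to the Collide kernel, the two steps read as follows in lattice units (Δx = Δt = 1) with D2Q9 weights w_i:

```latex
\begin{aligned}
\text{Collide:}\quad & f_i^{*}(\mathbf{x},t) = f_i(\mathbf{x},t) - \omega\left[f_i(\mathbf{x},t) - f_i^{\mathrm{eq}}(\rho,\mathbf{u})\right] \\
\text{Stream:}\quad & f_i(\mathbf{x}+\mathbf{e}_i,\,t+1) = f_i^{*}(\mathbf{x},t) \\
& f_i^{\mathrm{eq}}(\rho,\mathbf{u}) = w_i\,\rho\left[1 + 3\,\mathbf{e}_i\cdot\mathbf{u} + \tfrac{9}{2}\,(\mathbf{e}_i\cdot\mathbf{u})^2 - \tfrac{3}{2}\,\mathbf{u}\cdot\mathbf{u}\right]
\end{aligned}
```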
Stream Step
Fluid particles propagate to neighboring cells.
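A CPU reference of the stream step in C; a CUDA kernel body would follow the same pattern with one thread per cell. It is written in the equivalent pull form: instead of pushing f_i to the neighbour x + e_i, each interior cell fetches f_i from x - e_i. The structure-of-arrays layout, the one-cell wall layer and the names idx and stream are assumptions for illustration.

```c
#include <stddef.h>

enum { Q = 9 };
/* Directions in the order C, N, S, W, E, NW, NE, SW, SE. */
static const int EX[Q] = { 0, 0,  0, -1, 1, -1, 1, -1,  1 };
static const int EY[Q] = { 0, 1, -1,  0, 0,  1, 1, -1, -1 };

/* Structure-of-arrays index: all cells of direction i are contiguous. */
static size_t idx(int i, int x, int y, int nx, int ny)
{
    return ((size_t)i * ny + y) * nx + x;
}

/* Pull-style streaming into a second array (like d_cell -> d_temp_cell):
   every interior cell fetches population i from its upstream neighbour
   x - e_i. nx and ny include the one-cell wall layer on each side. */
void stream(const double *src, double *dst, int nx, int ny)
{
    for (int i = 0; i < Q; ++i)
        for (int y = 1; y < ny - 1; ++y)
            for (int x = 1; x < nx - 1; ++x)
                dst[idx(i, x, y, nx, ny)] =
                    src[idx(i, x - EX[i], y - EY[i], nx, ny)];
}
```

Writing into a separate destination array avoids overwriting populations that a neighbouring cell still has to read.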
Collide Step
[Figure: D2Q9 cell with weights 4/9 (centre), 1/9 (N, S, W, E) and 1/36 (diagonals); velocity components in {-1, 0, 1}]
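A per-cell sketch of the collide step, assuming the BGK equilibrium for D2Q9 in lattice units and the direction order C, N, S, W, E, NW, NE, SW, SE. The names feq and collide_cell are hypothetical; omega plays the role of d_omega.

```c
enum { Q = 9 };
/* Directions in the order C, N, S, W, E, NW, NE, SW, SE. */
static const int EX[Q] = { 0, 0,  0, -1, 1, -1, 1, -1,  1 };
static const int EY[Q] = { 0, 1, -1,  0, 0,  1, 1, -1, -1 };
static const double W[Q] = { 4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                             1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36 };

/* D2Q9 equilibrium in lattice units (speed of sound squared = 1/3). */
static double feq(int i, double rho, double ux, double uy)
{
    double eu = EX[i] * ux + EY[i] * uy;
    double uu = ux * ux + uy * uy;
    return W[i] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

/* Collide one cell in place: density and velocity are moments of f, then
   every population relaxes towards equilibrium by the factor omega.
   rho_out and u_out may be NULL. */
void collide_cell(double f[Q], double omega, double *rho_out, double u_out[2])
{
    double rho = 0.0, jx = 0.0, jy = 0.0;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        jx += f[i] * EX[i];
        jy += f[i] * EY[i];
    }
    double ux = jx / rho, uy = jy / rho;
    for (int i = 0; i < Q; ++i)
        f[i] -= omega * (f[i] - feq(i, rho, ux, uy));
    if (rho_out)
        *rho_out = rho;
    if (u_out) {
        u_out[0] = ux;
        u_out[1] = uy;
    }
}
```

Because the equilibrium has the same density and momentum as f, collision conserves mass and momentum, and a cell already at equilibrium is left unchanged.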
Boundary Condition (BC) Treatment
For non-moving walls: bounce-back
For moving walls: bounce-back plus a term depending on the velocity of the moving wall
[Figure: D2Q9 direction components in {-1, 0, 1}]
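A sketch of one common way to implement both cases. It assumes the treatment runs on wall cells before a pull-style stream step (matching the BC, Stream, Collide order of the kernels) and that the moving-wall term is the usual momentum correction 6 w_i ρ (e_i · u_w), i.e. 2 w_i ρ (e_i · u_w) / c_s^2 with c_s^2 = 1/3. Flag values, the array layout and apply_bc are hypothetical; rho stands in for d_rho and (uwx, uwy) for d_wall_velocity.

```c
#include <stddef.h>

enum { Q = 9 };
enum { FLUID = 0, NO_SLIP = 1, MOVING_WALL = 2 };   /* hypothetical flags */
/* Directions in the order C, N, S, W, E, NW, NE, SW, SE. */
static const int EX[Q] = { 0, 0,  0, -1, 1, -1, 1, -1,  1 };
static const int EY[Q] = { 0, 1, -1,  0, 0,  1, 1, -1, -1 };
static const double W[Q] = { 4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                             1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36 };
static const int OPP[Q] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };

/* Structure-of-arrays index: all cells of direction i are contiguous. */
static size_t idx(int i, int x, int y, int nx, int ny)
{
    return ((size_t)i * ny + y) * nx + x;
}

/* For every wall cell and every direction i pointing into a fluid cell,
   store the population the next pull-stream will carry into that cell:
   the fluid cell's population heading into the wall (direction OPP[i]),
   plus 6 w_i rho (e_i . u_w) if the wall moves. */
void apply_bc(double *f, const double *rho, const int *flag,
              double uwx, double uwy, int nx, int ny)
{
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x) {
            int fl = flag[y * nx + x];
            if (fl == FLUID)
                continue;
            for (int i = 0; i < Q; ++i) {
                int xf = x + EX[i], yf = y + EY[i];
                if (xf < 0 || xf >= nx || yf < 0 || yf >= ny)
                    continue;
                if (flag[yf * nx + xf] != FLUID)
                    continue;
                double v = f[idx(OPP[i], xf, yf, nx, ny)];
                if (fl == MOVING_WALL)
                    v += 6.0 * W[i] * rho[yf * nx + xf]
                         * (EX[i] * uwx + EY[i] * uwy);
                f[idx(i, x, y, nx, ny)] = v;
            }
        }
}
```

For a lid moving in +x, the population entering the fluid diagonally along +x gains the correction and the one along -x loses it, which transfers x-momentum from the lid to the fluid.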
Algorithm
[Figure: flowchart of the loop: Initialization, Boundary Condition Treatment, Stream, Collide, then "End time reached?" (False: time incremented by time step, back to BC treatment; True: End)]
1. Initialize the distribution functions, density, and velocity for each cell
2. Set the initial time (t0)
3. Treat boundary cells
4. Perform the Stream operation
5. Perform the Collide operation
6. Increment the time by one time step
7. Go to step 3 unless the end time is reached
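The steps above can be sketched as a compact single-threaded C program, which is also useful as a CPU reference against which GPU results can be checked. Everything here is an assumption for illustration: BGK collision, bounce-back with the moving-wall correction, pull streaming into a second buffer, the whole top row (corners included) acting as the lid, and all names. The end time is expressed as a number of steps.

```c
#include <stdlib.h>

/* Hypothetical CPU reference: D2Q9 (order C, N, S, W, E, NW, NE, SW, SE),
   lattice units, BGK collision, populations stored as structure of arrays. */
enum { Q = 9, FLUID = 0, NO_SLIP = 1, MOVING_WALL = 2 };
static const int EX[Q] = { 0, 0,  0, -1, 1, -1, 1, -1,  1 };
static const int EY[Q] = { 0, 1, -1,  0, 0,  1, 1, -1, -1 };
static const double W[Q] = { 4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                             1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36 };
static const int OPP[Q] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };

typedef struct {
    int nx, ny;            /* grid size including the one-cell wall layer */
    double *f, *tmp;       /* populations and the streaming target buffer */
    double *rho, *ux;      /* density and x-velocity per cell */
    int *flag;
} Lbm;

static size_t at(const Lbm *s, int i, int x, int y) { return ((size_t)i * s->ny + y) * s->nx + x; }

/* Steps 1-2: walls, lid (whole top row), fluid at rest with rho = 1.
   Returns -1 on allocation failure; lbm_free is safe to call either way. */
int lbm_init(Lbm *s, int sizex, int sizey)
{
    s->nx = sizex + 2; s->ny = sizey + 2;
    size_t n = (size_t)s->nx * s->ny;
    s->f = calloc(Q * n, sizeof(double)); s->tmp = calloc(Q * n, sizeof(double));
    s->rho = calloc(n, sizeof(double)); s->ux = calloc(n, sizeof(double));
    s->flag = calloc(n, sizeof(int));
    if (!s->f || !s->tmp || !s->rho || !s->ux || !s->flag) return -1;
    for (int y = 0; y < s->ny; ++y)
        for (int x = 0; x < s->nx; ++x) {
            size_t c = (size_t)y * s->nx + x;
            int wall = x == 0 || y == 0 || x == s->nx - 1;
            s->flag[c] = y == s->ny - 1 ? MOVING_WALL : wall ? NO_SLIP : FLUID;
            s->rho[c] = 1.0;
            for (int i = 0; i < Q; ++i) s->f[at(s, i, x, y)] = W[i];
        }
    return 0;
}

/* Step 3: bounce-back into wall cells; the lid adds 6 w_i rho e_ix u_w. */
void lbm_bc(Lbm *s, double uw)
{
    for (int y = 0; y < s->ny; ++y)
        for (int x = 0; x < s->nx; ++x) {
            int fl = s->flag[(size_t)y * s->nx + x];
            if (fl == FLUID) continue;
            for (int i = 0; i < Q; ++i) {
                int xf = x + EX[i], yf = y + EY[i];
                if (xf < 0 || yf < 0 || xf >= s->nx || yf >= s->ny) continue;
                size_t cf = (size_t)yf * s->nx + xf;
                if (s->flag[cf] != FLUID) continue;
                double v = s->f[at(s, OPP[i], xf, yf)];
                if (fl == MOVING_WALL) v += 6.0 * W[i] * s->rho[cf] * EX[i] * uw;
                s->f[at(s, i, x, y)] = v;
            }
        }
}

/* Step 4: pull streaming of fluid cells into tmp, then swap the buffers. */
void lbm_stream(Lbm *s)
{
    for (int i = 0; i < Q; ++i)
        for (int y = 1; y < s->ny - 1; ++y)
            for (int x = 1; x < s->nx - 1; ++x)
                s->tmp[at(s, i, x, y)] = s->f[at(s, i, x - EX[i], y - EY[i])];
    double *t = s->f; s->f = s->tmp; s->tmp = t;
}

/* Step 5: BGK collision of every fluid cell towards the D2Q9 equilibrium. */
void lbm_collide(Lbm *s, double omega)
{
    for (int y = 1; y < s->ny - 1; ++y)
        for (int x = 1; x < s->nx - 1; ++x) {
            double rho = 0, jx = 0, jy = 0;
            for (int i = 0; i < Q; ++i) {
                double fi = s->f[at(s, i, x, y)];
                rho += fi; jx += fi * EX[i]; jy += fi * EY[i];
            }
            double ux = jx / rho, uy = jy / rho, uu = ux * ux + uy * uy;
            for (int i = 0; i < Q; ++i) {
                double eu = EX[i] * ux + EY[i] * uy;
                double feq = W[i] * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * uu);
                s->f[at(s, i, x, y)] -= omega * (s->f[at(s, i, x, y)] - feq);
            }
            s->rho[(size_t)y * s->nx + x] = rho;
            s->ux[(size_t)y * s->nx + x] = ux;
        }
}

/* Steps 3-7 as a loop; the end time is given as a number of steps. */
void lbm_run(Lbm *s, int steps, double omega, double uw)
{
    for (int t = 0; t < steps; ++t) { lbm_bc(s, uw); lbm_stream(s); lbm_collide(s, omega); }
}

void lbm_free(Lbm *s) { free(s->f); free(s->tmp); free(s->rho); free(s->ux); free(s->flag); }
```

Usage: after lbm_init(&s, 16, 16), the call lbm_run(&s, 500, 1.0, 0.1) advances a 16x16 cavity with lid velocity 0.1 for 500 steps. With the whole top row as the lid, the moving-wall terms entering each fluid cell cancel pairwise, so the total mass stays constant up to rounding.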
Implementation in CUDA and Optimization
Kernels
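#define BLOCK_SIZE 16
…
// one thread per lattice cell, 16x16 threads per block
dim3 dimBlock( BLOCK_SIZE, BLOCK_SIZE );
// one block per 16x16 tile of the (sizex+2) x (sizey+2) grid (fluid cells plus
// boundary layer); assumes both extents are multiples of BLOCK_SIZE, otherwise
// the remaining cells would get no thread
dim3 dimGrid( (cmd.sizex+2) / BLOCK_SIZE, (cmd.sizey+2) / BLOCK_SIZE );
…
// one time step: boundary treatment, then streaming, then collision
// (launches on the same default stream execute one after another)
BC<<<dimGrid,dimBlock>>>(d_cell, d_rho, d_wall_velocity, d_sizex, d_sizey);
Stream<<<dimGrid,dimBlock>>>( d_cell, d_temp_cell, d_sizex );
Collide<<<dimGrid,dimBlock>>>( d_cell, d_rho, d_u, d_omega, d_sizex, d_sizey );
…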
Implementation in CUDA and Optimization: Coalescing
Block: 16x16 = 256 cells
Cell: elements 0..9 are (C, N, S, W, E, NW, NE, SW, SE, Flag)
Uncoalesced access (256 10-vectors, one per cell):
0..9 | 0..9 | 0..9 | 0..9 | 0..9 | 0..9 | … (all 256 cells)
Coalesced access (10 256-vectors, one per element):
0,0,…,0 | 1,1,…,1 | 2,2,…,2 | 3,3,…,3 | 4,4,…,4 | … | 9,9,…,9 (all 10 elements)
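The two layouts differ only in their index arithmetic. A sketch for one 16x16 block with 10 values per cell; aos_index and soa_index are hypothetical helper names. With the structure-of-arrays layout, threads t and t+1 reading the same element touch adjacent addresses, which is what allows the hardware to combine their accesses into few memory transactions.

```c
#include <stddef.h>

enum { CELLS = 256, ELEMS = 10 };   /* 16x16 block, 10 values per cell */

/* Array of structures ("256 10-vectors"): the 10 values of one cell are
   contiguous, so thread t reading element e touches t * 10 + e. */
size_t aos_index(int cell, int elem)
{
    return (size_t)cell * ELEMS + elem;
}

/* Structure of arrays ("10 256-vectors"): element e of all 256 cells is
   contiguous, so thread t reading element e touches e * 256 + t. */
size_t soa_index(int cell, int elem)
{
    return (size_t)elem * CELLS + cell;
}
```

Neighbouring threads are 10 elements apart in the first layout and 1 element apart in the second.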
Implementation in CUDA and Optimization: Ghost Cells
[Figure: Block(i, j) holds cells (0,0), (1,0), (2,0) … (15,0) plus a ghost cell (16,0); Block(i+1, j) holds cells (0,0), (1,0), (2,0) … (15,0); the next rows start at (0,1)]
Implementation in CUDA and Optimization: Ghost Cells
[Figure: How it works]
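A sketch of how a block can gather its 16x16 cells plus their ghost cells into a local tile, as a CUDA block might do into shared memory. It assumes a one-cell ghost layer on every side (the slide shows the ghost column on the right), a row-major grid, and hypothetical names; ghost cells outside the grid are set to 0.

```c
#include <stddef.h>

enum { B = 16, T = B + 2 };   /* block edge and tile edge incl. ghost layer */

/* Copy block (bx, by) of an nx-by-ny row-major grid into tile[T][T], with
   one ghost cell on each side taken from the neighbouring blocks. Tile
   position (tx, ty) maps to grid cell (bx*B + tx - 1, by*B + ty - 1). */
void load_tile(const float *grid, int nx, int ny, int bx, int by,
               float tile[T][T])
{
    for (int ty = 0; ty < T; ++ty)
        for (int tx = 0; tx < T; ++tx) {
            int gx = bx * B + tx - 1, gy = by * B + ty - 1;
            int inside = gx >= 0 && gx < nx && gy >= 0 && gy < ny;
            tile[ty][tx] = inside ? grid[(size_t)gy * nx + gx] : 0.0f;
        }
}
```

In this mapping, the ghost cell at local x = 16 of Block(i, j) holds the same value as cell (0, 0) of Block(i+1, j), so a cell at the block edge can read its neighbour without leaving the tile.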
Implementation in CUDA and Optimization: Matrix vs. Standard Block
Matrix completion:
- the matrix is decomposed into blocks
- every block must be 16x16 cells
- if a block on the edge is smaller than 16x16, it is completed with “0”
[Figure: original matrix (sides a, b) and standard matrix (sides x, y)]
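A sketch of the completion step in C, assuming a is the number of columns and b the number of rows of the original matrix, stored row-major, and x, y are the padded sides; pad_to_block and pad_matrix are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

enum { BLOCK_SIZE = 16 };

/* Round n up to the next multiple of BLOCK_SIZE. */
int pad_to_block(int n)
{
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

/* Copy an a-column, b-row matrix (row-major) into a zero-filled "standard"
   matrix of x columns and y rows, both multiples of BLOCK_SIZE.
   Returns NULL on allocation failure; the caller frees the result. */
float *pad_matrix(const float *m, int a, int b, int *x, int *y)
{
    *x = pad_to_block(a);
    *y = pad_to_block(b);
    float *p = calloc((size_t)(*x) * (*y), sizeof *p);
    if (!p)
        return NULL;
    for (int r = 0; r < b; ++r)
        memcpy(p + (size_t)r * (*x), m + (size_t)r * a, (size_t)a * sizeof *p);
    return p;
}
```

For example, a row of 100 cells plus two boundary cells (102) is completed to 112 = 7 x 16 cells.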
Chart: optimization
Chart: GPU vs GPU