
Friedrich-Alexander-Universität Erlangen-Nürnberg, Hardware-Software-Co-Design

27.01.2009

Lattice Boltzmann with CUDA

Lan Shi, Li Yi & Liyuan Zhang

Hauptseminar: Multicore Architectures and Programming

Outline

- Overview of LBM
- An application of the LBM algorithm
- Implementation in CUDA and Optimization
- Performance
- Demo

Overview of LBM

The Lattice Boltzmann Method (LBM) is a class of computational fluid dynamics (CFD) methods for fluid simulation.

CFD methods:
- Volume mesh (irregular/regular):
  - Euler equations
  - Navier-Stokes equations
- Smoothed particle hydrodynamics (SPH):
  - Lagrangian method
- Spectral methods:
  - spherical harmonics
  - Chebyshev polynomials
- LBM: simulate an equivalent mesoscopic system on a Cartesian grid

Overview of LBM

From macroscopic to mesoscopic to microscopic

[Figure: the three levels of description; labels include ρ, u and T]

Overview of LBM

Lattice structure: D2Q9, D3Q19, ...
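Below is a minimal sketch of the D2Q9 velocity set and its weights, written as __host__ __device__ helpers so that the same definitions can serve both kernels and host code. The direction order (C, N, S, W, E, NW, NE, SW, SE) follows the cell layout listed on the coalescing slide. The function names ex, ey, weight and opposite are hypothetical and do not come from the authors' code.

```cuda
#include <cassert>

// D2Q9 velocity set, ordered like the cell layout on the coalescing slide:
// 0=C, 1=N, 2=S, 3=W, 4=E, 5=NW, 6=NE, 7=SW, 8=SE
__host__ __device__ inline int ex(int i) {
    assert(i >= 0 && i < 9);
    const int e[9] = {0, 0, 0, -1, 1, -1, 1, -1, 1};
    return e[i];
}

__host__ __device__ inline int ey(int i) {
    assert(i >= 0 && i < 9);
    const int e[9] = {0, 1, -1, 0, 0, 1, 1, -1, -1};
    return e[i];
}

// Weights: 4/9 for the rest direction, 1/9 for N/S/W/E, 1/36 for the diagonals
__host__ __device__ inline float weight(int i) {
    assert(i >= 0 && i < 9);
    return i == 0 ? 4.0f / 9.0f : (i < 5 ? 1.0f / 9.0f : 1.0f / 36.0f);
}

// Opposite direction (N<->S, W<->E, NW<->SE, NE<->SW), used by bounce-back
__host__ __device__ inline int opposite(int i) {
    assert(i >= 0 && i < 9);
    const int o[9] = {0, 2, 1, 4, 3, 8, 7, 6, 5};
    return o[i];
}
```

The weights have the moments the equilibrium relies on: they sum to 1, their first moment is zero, and their second moment gives the lattice speed of sound c_s² = 1/3.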

Overview of LBM

Boundary conditions:
- Domain boundary: the outermost surrounding lattice nodes
- Obstacle boundary: objects inside the lattice grid that block the fluid flow

Solution:
- no change
- bounce-back

Overview of LBM

LBM is resource intensive: > 100x100x100 grid points
- not practical due to the slow speed of memory access and the long processing time
- explicit in nature and requires only nearest-neighbor interaction

Very suitable for implementation on GPUs:
- parallel computing
- Single-Program Multiple-Data (SPMD) model
- within-processor memory
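The following rough memory estimate puts "resource intensive" in numbers for a 100x100x100 grid. It assumes the D3Q19 lattice, single-precision (4-byte) values, and two copies of the lattice (source and destination, as the d_cell / d_temp_cell pair in the later kernel launches suggests):

```latex
\begin{aligned}
N &= 100^3 = 10^{6}\ \text{cells} \\
M_{\text{one lattice}} &= N \cdot 19 \cdot 4\,\text{B} = 7.6 \times 10^{7}\,\text{B} \approx 76\,\text{MB} \\
M_{\text{two lattices}} &= 2 \cdot 76\,\text{MB} \approx 152\,\text{MB}
\end{aligned}
```

Every one of these values is read and written at least once per time step. This is why LBM performance is typically limited by memory bandwidth rather than arithmetic.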

Target Model

Lid Driven Cavity

Reformulation of the LBM Equation

Discrete lattice Boltzmann equation:

Collide Step:

Stream Step:
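For orientation, the discrete equation and its split into the two steps can be written in the standard single-relaxation-time (BGK) form below. Lattice units are assumed (Δx = Δt = 1, c_s² = 1/3). That the slides use exactly this collision operator is an assumption, suggested by the omega parameter of the Collide kernel.

```latex
\begin{aligned}
f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\ t + \Delta t) &= f_i(\mathbf{x}, t) - \omega \left( f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right) \\
\rho &= \sum_i f_i, \qquad \rho\,\mathbf{u} = \sum_i \mathbf{e}_i f_i \\
f_i^{\mathrm{eq}} &= w_i\,\rho \left( 1 + 3\,\mathbf{e}_i \cdot \mathbf{u} + \tfrac{9}{2} (\mathbf{e}_i \cdot \mathbf{u})^2 - \tfrac{3}{2}\,\mathbf{u} \cdot \mathbf{u} \right) \\
\text{Collide:}\quad f_i^{*}(\mathbf{x}, t) &= f_i(\mathbf{x}, t) - \omega \left( f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right) \\
\text{Stream:}\quad f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\ t + \Delta t) &= f_i^{*}(\mathbf{x}, t)
\end{aligned}
```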

Stream Step

Fluid particles propagate to neighboring cells.
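Below is a minimal sketch of the stream step as a pull scheme. Each interior cell gathers f_i from the neighbour at (x, y) - e_i and writes into a second array, so no thread overwrites a value that another thread still needs. The slides' own Stream kernel also takes a d_cell and a d_temp_cell array, but its body is not shown. The plane-per-direction layout, the one-cell boundary layer, and the names idx, streamCell and StreamPull are assumptions made for illustration.

```cuda
#include <cassert>

// D2Q9 directions in the order C, N, S, W, E, NW, NE, SW, SE
__host__ __device__ inline int ex(int i) { const int e[9] = {0, 0, 0, -1, 1, -1, 1, -1, 1}; return e[i]; }
__host__ __device__ inline int ey(int i) { const int e[9] = {0, 1, -1, 0, 0, 1, 1, -1, -1}; return e[i]; }

// Assumed layout: plane i holds f_i for all nx*ny cells (boundary layer included)
__host__ __device__ inline int idx(int i, int x, int y, int nx, int ny) {
    return (i * ny + y) * nx + x;
}

// Pull streaming for one interior cell: f_i arrives from the neighbour at (x, y) - e_i
__host__ __device__ inline void streamCell(const float* src, float* dst,
                                           int x, int y, int nx, int ny) {
    assert(x > 0 && x < nx - 1 && y > 0 && y < ny - 1);
    for (int i = 0; i < 9; ++i)
        dst[idx(i, x, y, nx, ny)] = src[idx(i, x - ex(i), y - ey(i), nx, ny)];
}

// One thread per cell; the boundary layer is left to the BC kernel
__global__ void StreamPull(const float* src, float* dst, int nx, int ny) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > 0 && x < nx - 1 && y > 0 && y < ny - 1)
        streamCell(src, dst, x, y, nx, ny);
}
```

Pulling means that each thread writes only its own cell, so no atomics are needed. The reads from neighbouring cells are the accesses that the ghost-cell and coalescing optimizations target.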

Collide Step

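[Figure: D2Q9 lattice with weights 4/9 (rest direction), 1/9 (axis directions) and 1/36 (diagonal directions); directions indexed by x, y in {-1, 0, 1}]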
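Below is a sketch of the collide step for one cell. It assumes the BGK operator and the D2Q9 equilibrium in lattice units (c_s² = 1/3), using the 4/9, 1/9 and 1/36 weights from the figure. The names feq, collideCell and CollideBGK are hypothetical. The slides' Collide kernel receives d_rho, d_u and d_omega; this sketch mirrors them with plain arrays and a scalar omega.

```cuda
#include <cassert>

// D2Q9 directions in the order C, N, S, W, E, NW, NE, SW, SE
__host__ __device__ inline int ex(int i) { const int e[9] = {0, 0, 0, -1, 1, -1, 1, -1, 1}; return e[i]; }
__host__ __device__ inline int ey(int i) { const int e[9] = {0, 1, -1, 0, 0, 1, 1, -1, -1}; return e[i]; }
__host__ __device__ inline float weight(int i) { return i == 0 ? 4.0f / 9.0f : (i < 5 ? 1.0f / 9.0f : 1.0f / 36.0f); }

// Equilibrium distribution in lattice units (c_s^2 = 1/3)
__host__ __device__ inline float feq(int i, float rho, float ux, float uy) {
    float eu = ex(i) * ux + ey(i) * uy;
    float uu = ux * ux + uy * uy;
    return weight(i) * rho * (1.0f + 3.0f * eu + 4.5f * eu * eu - 1.5f * uu);
}

// BGK collision of one cell in place; also returns density and velocity
__host__ __device__ inline void collideCell(float f[9], float omega,
                                            float* rho, float* ux, float* uy) {
    assert(omega > 0.0f && omega < 2.0f);  // positive viscosity needs 0 < omega < 2
    float r = 0.0f, mx = 0.0f, my = 0.0f;
    for (int i = 0; i < 9; ++i) {
        r += f[i];
        mx += ex(i) * f[i];
        my += ey(i) * f[i];
    }
    float u = mx / r, v = my / r;
    for (int i = 0; i < 9; ++i)
        f[i] -= omega * (f[i] - feq(i, r, u, v));
    *rho = r; *ux = u; *uy = v;
}

// One thread per interior cell; f uses one plane of nx*ny values per direction
__global__ void CollideBGK(float* f, float* rho, float* ux, float* uy,
                           float omega, int nx, int ny) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || x >= nx - 1 || y <= 0 || y >= ny - 1) return;
    int c = y * nx + x, plane = nx * ny;
    float local[9];  // the cell's 9 values are kept in registers during collision
    for (int i = 0; i < 9; ++i) local[i] = f[i * plane + c];
    collideCell(local, omega, &rho[c], &ux[c], &uy[c]);
    for (int i = 0; i < 9; ++i) f[i * plane + c] = local[i];
}
```

The equilibrium carries the same density and momentum as the cell it is computed from, so the collision conserves mass and momentum. Only the higher moments relax toward equilibrium, at rate omega.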

Boundary Condition (BC) Treatment

For non-moving walls:

For moving wall:

: velocity of the moving wall

[Figure: D2Q9 directions indexed by x, y in {-1, 0, 1}]
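A common way to realize both wall types in the lid-driven cavity is link bounce-back with a momentum correction for the moving wall. For a boundary cell x_b and a direction i that points into the fluid, f_i(x_b) = f_opp(i)(x_b + e_i) + 6 w_i ρ (e_i · u_w), with u_w = 0 for non-moving walls. The sketch below assumes that form, a plane-per-direction layout with a one-cell boundary layer, and a constant reference density rho0. The slides' BC kernel receives d_rho, so the original may use the local density instead. The names bounceBackCell and BounceBackBC are hypothetical.

```cuda
#include <cassert>

// D2Q9 directions in the order C, N, S, W, E, NW, NE, SW, SE
__host__ __device__ inline int ex(int i) { const int e[9] = {0, 0, 0, -1, 1, -1, 1, -1, 1}; return e[i]; }
__host__ __device__ inline int ey(int i) { const int e[9] = {0, 1, -1, 0, 0, 1, 1, -1, -1}; return e[i]; }
__host__ __device__ inline float weight(int i) { return i == 0 ? 4.0f / 9.0f : (i < 5 ? 1.0f / 9.0f : 1.0f / 36.0f); }
__host__ __device__ inline int opposite(int i) { const int o[9] = {0, 2, 1, 4, 3, 8, 7, 6, 5}; return o[i]; }

// Assumed layout: plane i holds f_i for all nx*ny cells, boundary layer included
__host__ __device__ inline int idx(int i, int x, int y, int nx, int ny) { return (i * ny + y) * nx + x; }

// Bounce-back for boundary cell (x, y): every direction i that points into the
// fluid receives what the fluid neighbour sends towards the wall, plus the
// moving-wall term 6 w_i rho (e_i . u_w). Use uwx = uwy = 0 for resting walls.
__host__ __device__ inline void bounceBackCell(float* f, int x, int y, int nx, int ny,
                                               float rho, float uwx, float uwy) {
    assert(x == 0 || x == nx - 1 || y == 0 || y == ny - 1);
    for (int i = 1; i < 9; ++i) {
        int xn = x + ex(i), yn = y + ey(i);
        if (xn <= 0 || xn >= nx - 1 || yn <= 0 || yn >= ny - 1) continue;  // neighbour is not fluid
        float eu = ex(i) * uwx + ey(i) * uwy;
        f[idx(i, x, y, nx, ny)] = f[idx(opposite(i), xn, yn, nx, ny)] + 6.0f * weight(i) * rho * eu;
    }
}

// Lid-driven cavity: top row (y = ny - 1) moves with (uLid, 0); other walls rest
__global__ void BounceBackBC(float* f, int nx, int ny, float rho0, float uLid) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;
    if (x != 0 && x != nx - 1 && y != 0 && y != ny - 1) return;  // fluid cell
    bounceBackCell(f, x, y, nx, ny, rho0, (y == ny - 1) ? uLid : 0.0f, 0.0f);
}
```

Running this before the stream step, as in the algorithm, places the reflected populations in the boundary layer. A pull-style stream then carries them back into the fluid. Threads only write boundary cells and only read fluid cells, so the kernel has no read/write races.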

Algorithm

[Flowchart: Initialization → Boundary Condition Treatment → Perform Stream operation → Perform Collide operation → End time is reached? True: End; False: increment by time step and return to Boundary Condition Treatment]

1. Initialize distribution functions, density, and velocity for each cell

2. Set initial time (t0)

3. Treat boundary cells

4. Perform Stream operation

5. Perform Collide operation

6. Increment time by one time step

7. Go to step 3 unless the end time is reached
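The loop in steps 1 to 7 can be sketched as a host-side driver. To keep it independent of any particular kernel signature, each step is passed in as a callable; on the GPU, each callable would wrap one kernel launch, such as the BC, Stream and Collide launches on the kernels slide. The function runLBM and its parameters are hypothetical. Time is counted in whole steps rather than as a floating-point end time.

```cuda
#include <cassert>

// Control flow of the flowchart. Each step is a callable; on the GPU each one
// would wrap a kernel launch. Time is counted in whole steps.
template <class InitFn, class BCFn, class StreamFn, class CollideFn>
int runLBM(int numSteps, InitFn init, BCFn bc, StreamFn stream, CollideFn collide) {
    assert(numSteps >= 0);
    init();                  // steps 1-2: initialize cells, t = t0
    int t = 0;
    while (t < numSteps) {   // step 7: stop once the end time is reached
        bc();                // step 3: treat boundary cells
        stream();            // step 4: stream
        collide();           // step 5: collide
        ++t;                 // step 6: advance by one time step
    }
    return t;
}
```

If streaming writes into a second array (as the d_temp_cell argument suggests), the stream callable would also need to swap the two device pointers, for example with std::swap, so that the next collision works on the streamed values. Whether the original code swaps or copies is not shown.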

Implementation in CUDA and Optimization

Kernels

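#define BLOCK_SIZE 16

// one thread per lattice cell, 16x16 threads per block
dim3 dimBlock( BLOCK_SIZE, BLOCK_SIZE );

// the +2 presumably adds one boundary layer on each side; the integer division
// assumes (sizex+2) and (sizey+2) are multiples of BLOCK_SIZE
dim3 dimGrid( (cmd.sizex+2) / BLOCK_SIZE, (cmd.sizey+2) / BLOCK_SIZE );
…

// one time step: boundary treatment, streaming, collision
BC<<<dimGrid,dimBlock>>>(d_cell, d_rho, d_wall_velocity, d_sizex, d_sizey);

Stream<<<dimGrid,dimBlock>>>( d_cell, d_temp_cell, d_sizex );

Collide<<<dimGrid,dimBlock>>>( d_cell, d_rho, d_u, d_omega, d_sizex, d_sizey );
…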
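The dimGrid expression above uses integer division. When sizex + 2 or sizey + 2 is not a multiple of BLOCK_SIZE, this silently drops the last, partial block. Padding the domain to full 16x16 blocks, as on the "Matrix vs. Standard Block" slide, avoids the problem. The more general alternative, sketched below, rounds up instead. The helper names blocksFor and makeGrid are hypothetical, and the one-layer-per-side reading of "+2" is an assumption.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// Blocks needed to cover n cells, rounded up so no partial block is dropped
inline unsigned int blocksFor(int n, int blockSize) {
    assert(n > 0 && blockSize > 0);
    return static_cast<unsigned int>((n + blockSize - 1) / blockSize);
}

// Grid for sizex x sizey cells plus, as assumed here, one boundary layer per side
inline dim3 makeGrid(int sizex, int sizey) {
    return dim3(blocksFor(sizex + 2, BLOCK_SIZE), blocksFor(sizey + 2, BLOCK_SIZE));
}
```

With a rounded-up grid, every kernel needs a guard such as if (x >= nx || y >= ny) return; for threads that fall outside the domain.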

Implementation in CUDA and Optimization: Coalescing

Block: 16x16 = 256 cells
Cell: values 0..9 are (C, N, S, W, E, NW, NE, SW, SE, Flag)

Uncoalesced access (10-vectors, one per cell):
0..9  0..9  0..9  0..9  0..9  0..9  (all 256 cells)

Coalesced access (256-vectors, one per value):
0,0,…,0  1,1,…,1  2,2,…,2  3,3,…,3  4,4,…,4  9,9,…,9  (all 10 elements)
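The two layouts in the figure differ only in how a (cell, value) pair maps to an address. The sketch below assumes that each 16x16 block's 2560 values are stored contiguously; aosIndex, soaIndex and readValue are hypothetical names. With the 10-vector layout, consecutive threads reading the same value k are 10 floats apart. With the 256-vector layout they are adjacent, which lets the hardware combine a warp's loads into few memory transactions.

```cuda
#include <cassert>

// One 16x16 block = 256 cells; each cell has 10 values (C, N, S, W, E, NW, NE, SW, SE, Flag)
constexpr int CELLS_PER_BLOCK = 256;
constexpr int VALUES_PER_CELL = 10;

// Uncoalesced layout: one 10-vector per cell
__host__ __device__ inline int aosIndex(int block, int cell, int k) {
    assert(cell >= 0 && cell < CELLS_PER_BLOCK && k >= 0 && k < VALUES_PER_CELL);
    return (block * CELLS_PER_BLOCK + cell) * VALUES_PER_CELL + k;
}

// Coalesced layout: one 256-vector per value k inside each block
__host__ __device__ inline int soaIndex(int block, int cell, int k) {
    assert(cell >= 0 && cell < CELLS_PER_BLOCK && k >= 0 && k < VALUES_PER_CELL);
    return block * CELLS_PER_BLOCK * VALUES_PER_CELL + k * CELLS_PER_BLOCK + cell;
}

// Thread (tx, ty) of a 16x16 block reads value k of its own cell: with soaIndex,
// consecutive threads touch consecutive floats.
__global__ void readValue(const float* cells, float* out, int k) {
    int cell = threadIdx.y * blockDim.x + threadIdx.x;
    int block = blockIdx.y * gridDim.x + blockIdx.x;
    out[block * CELLS_PER_BLOCK + cell] = cells[soaIndex(block, cell, k)];
}
```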

Implementation in CUDA and Optimization: Ghost Cell

[Figure: Block(i, j) with cells 0,0  1,0  2,0 … 15,0 plus an extra cell 16,0, next row starting at 0,1; Block(i+1, j) with cells 0,0  1,0  2,0 … 15,0, next row starting at 0,1]

Implementation in CUDA and Optimization: Ghost Cell

How it works

……
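One common way to realize ghost cells on the GPU is a shared-memory tile of (16 + 2) x (16 + 2) values; this is sketched below as an assumption, not as the authors' code. Each block loads its own 16x16 cells plus a one-cell ring from the neighbouring blocks once, synchronizes, and then serves every neighbour access during streaming from shared memory. The figure shows the extra cell 16,0 on one side only. This sketch uses a ring on all four sides because D2Q9 streaming pulls from every direction. The names TILE, tileToGlobal and shiftPlaneShared are hypothetical.

```cuda
#include <cassert>

#define TILE 16

// Global coordinate of tile position t (0 .. TILE+1) for a block starting at
// blockStart; positions 0 and TILE+1 are the ghost cells owned by the
// neighbouring blocks. Clamped so that edge blocks never read out of bounds.
__host__ __device__ inline int tileToGlobal(int blockStart, int t, int n) {
    assert(n > 0);
    int g = blockStart + t - 1;
    return g < 0 ? 0 : (g >= n ? n - 1 : g);
}

// Pull one distribution plane by (dx, dy), each in {-1, 0, 1}, through a
// shared-memory tile with a one-cell ghost ring. Launch with TILE x TILE threads.
__global__ void shiftPlaneShared(const float* src, float* dst, int nx, int ny, int dx, int dy) {
    __shared__ float tile[TILE + 2][TILE + 2];
    int tx = threadIdx.x, ty = threadIdx.y;
    int x0 = blockIdx.x * TILE, y0 = blockIdx.y * TILE;
    // Every tile entry, ghost ring included, is loaded exactly once per block
    for (int j = ty; j < TILE + 2; j += TILE)
        for (int i = tx; i < TILE + 2; i += TILE)
            tile[j][i] = src[tileToGlobal(y0, j, ny) * nx + tileToGlobal(x0, i, nx)];
    __syncthreads();  // all loads must finish before any thread reads a neighbour
    int x = x0 + tx, y = y0 + ty;
    if (x < nx && y < ny)
        dst[y * nx + x] = tile[ty + 1 - dy][tx + 1 - dx];
}
```

Clamping only keeps the edge blocks in bounds. The values it produces in the outer boundary layer are overwritten by the boundary treatment anyway.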

Implementation in CUDA and Optimization: Matrix vs. Standard Block

Matrix padding:
- the matrix is decomposed into blocks
- every block must be 16x16 cells
- if a block on the edge is smaller than 16x16, it is filled up with “0”

[Figure: original matrix and standard matrix, with dimension labels a, b, x, y]
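Below is a host-side sketch of the padding step, assuming a row-major matrix. Both dimensions are rounded up to the next multiple of 16 and the new entries are filled with 0, so the grid consists only of full 16x16 blocks. The names roundUp and padToBlocks are hypothetical.

```cuda
#include <cassert>
#include <vector>

// Round n up to the next multiple of the block edge b
inline int roundUp(int n, int b) {
    assert(n > 0 && b > 0);
    return (n + b - 1) / b * b;
}

// Copy a rows x cols row-major matrix into a zero-filled matrix whose sizes are
// multiples of b, so that it decomposes into full b x b blocks only.
std::vector<float> padToBlocks(const std::vector<float>& a, int rows, int cols, int b,
                               int* paddedRows, int* paddedCols) {
    assert(static_cast<int>(a.size()) == rows * cols);
    const int pr = roundUp(rows, b), pc = roundUp(cols, b);
    std::vector<float> out(pr * pc, 0.0f);  // padding entries stay 0
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[r * pc + c] = a[r * cols + c];
    *paddedRows = pr;
    *paddedCols = pc;
    return out;
}
```

For the LBM grid, the padded cells hold no fluid, so the kernels still have to skip them. Marking them through the Flag value of each cell would be one option, though the slides do not say how this is handled.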

Chart: optimization

Chart: GPU vs. GPU
