Local Adaptive Mesh Refinement on the GPU
Source: martinsa.at.ifi.uio.no/files/siam_gs13_amr_on_gpu.pdf
Technology for a better society
Martin Lilleeng Sætra
University of Oslo
18-06-2013
Local Adaptive Mesh Refinement
on the GPU
Outline
• The starting point: our GPU-based shallow water simulator and the original local adaptive mesh refinement paper
• Novel work: Adaptive mesh refinement fully implemented on the GPU
• Performance results
• Video: Malpasset dam break case with adaptive mesh refinement
The Shallow Water Equations (SWE)
Equation terms labeled on the slide: vector of conserved variables, flux functions, bed slope source term, bed friction source term.
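The equation itself did not survive in the transcript. As a hedged reconstruction, the standard 2D shallow water system with the four labeled terms can be written as below. The symbols are assumptions not taken from the slide: h is water depth, hu and hv are discharges, B is bathymetry, g is gravity, and the friction term uses a Chézy coefficient C_z, which is one common choice and may differ from the slide's friction law.

```latex
\underbrace{\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}_t}_{\text{conserved variables}}
+ \underbrace{\begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{bmatrix}_x
+ \begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{bmatrix}_y}_{\text{flux functions}}
= \underbrace{\begin{bmatrix} 0 \\ -g h B_x \\ -g h B_y \end{bmatrix}}_{\text{bed slope}}
+ \underbrace{\begin{bmatrix} 0 \\ -g u \sqrt{u^2 + v^2} / C_z^2 \\ -g v \sqrt{u^2 + v^2} / C_z^2 \end{bmatrix}}_{\text{bed friction}}
```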
Numerical Simulation of the SWE:
• Hyperbolic partial differential equation
– Enables explicit schemes.
• Solutions form discontinuities / shocks
– Require high accuracy in smooth parts without oscillations near discontinuities.
• Solutions include dry areas
– Avoid negative water depths in simulations.
• Accuracy
– 2nd order spatial/temporal discretization.
Scheme of choice: A. Kurganov and G. Petrova, A Second-Order Well-Balanced Positivity Preserving Central-Upwind Scheme for the Saint-Venant System, Communications in Mathematical Sciences, 5 (2007), 133-160.
Spatial Discretization
Write in vector form:
Impose finite-volume grid with discrete fluxes:
Figure panels: continuous equation, discrete spatial grid, discrete flux calculation.
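The two formulas on this slide are missing from the transcript. A hedged sketch of what the vector form and the finite-volume semi-discretization usually look like for this kind of scheme follows. The notation is an assumption: Q = [h, hu, hv]^T is the vector of conserved variables, F and G are the flux functions, H_B and H_f are the bed slope and bed friction source terms, and Q_{i,j} is the cell average in cell (i, j).

```latex
% Vector form of the continuous equation
Q_t + F(Q)_x + G(Q)_y = H_B(Q, \nabla B) + H_f(Q)

% Semi-discrete finite-volume form with fluxes evaluated at cell interfaces
\frac{dQ_{i,j}}{dt} = H_f(Q_{i,j}) + H_B(Q_{i,j}, \nabla B)
  - \frac{1}{\Delta x}\left[ F_{i+1/2,j} - F_{i-1/2,j} \right]
  - \frac{1}{\Delta y}\left[ G_{i,j+1/2} - G_{i,j-1/2} \right]
```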
Calculate Fluxes
Figure panels: continuous variables, discrete variables, dry states fix, slope reconstruction, evaluate integration points, flux calculation.
Equation terms labeled on the slide: vector of conserved variables, flux functions, bed slope source term, bed friction source term.
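The slide names slope reconstruction without giving details. As a minimal sketch, the Kurganov-Petrova scheme limits slopes with a generalized minmod limiter; the 1D Python below illustrates that idea. The function names, the parameter value `theta=1.3`, and the choice to leave boundary cells with zero slope are assumptions for illustration only.

```python
def minmod(a, b, c):
    """Return the smallest-magnitude argument if all three share a sign, else 0."""
    if a > 0 and b > 0 and c > 0:
        return min(a, b, c)
    if a < 0 and b < 0 and c < 0:
        return max(a, b, c)
    return 0.0


def limited_slopes(q, dx, theta=1.3):
    """Generalized minmod slopes for interior cells; boundary cells keep slope 0."""
    slopes = [0.0] * len(q)
    for i in range(1, len(q) - 1):
        backward = (q[i] - q[i - 1]) / dx
        central = (q[i + 1] - q[i - 1]) / (2 * dx)
        forward = (q[i + 1] - q[i]) / dx
        slopes[i] = minmod(theta * backward, central, theta * forward)
    return slopes


def face_values(q, slopes, dx):
    """Reconstructed values at the left and right face of every cell."""
    left = [qi - 0.5 * dx * s for qi, s in zip(q, slopes)]
    right = [qi + 0.5 * dx * s for qi, s in zip(q, slopes)]
    return left, right
```

On smooth data the limiter returns the central slope; next to a jump it returns zero, which is what keeps the reconstruction free of new oscillations.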
Evolve in Time
One Full Time Step Using the Kurganov-Petrova-scheme on the GPU
1. Calculate fluxes
2. Calculate Δt
3. Halfstep
4. Calculate fluxes
5. Evolve in time
6. Apply boundary conditions
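The six steps can be sketched as a two-stage second-order Runge-Kutta update. This is a hedged illustration, not the simulator's code: the slide does not state the integrator, so the two-stage form is an assumption, and `residual`, `max_wave_speed`, `apply_bc`, and the default `cfl=0.25` are hypothetical names and values.

```python
def full_time_step(q, residual, max_wave_speed, dx, cfl=0.25, apply_bc=None):
    """One full time step on a list of cell values q.

    residual(q) returns dq/dt per cell (flux differences plus source terms).
    max_wave_speed(q) returns the largest wave speed, used to pick dt.
    """
    r = residual(q)                                    # 1. calculate fluxes
    dt = cfl * dx / max_wave_speed(q)                  # 2. calculate dt
    q_star = [qi + dt * ri for qi, ri in zip(q, r)]    # 3. half step
    r_star = residual(q_star)                          # 4. calculate fluxes
    q_new = [0.5 * qi + 0.5 * (qs + dt * rs)           # 5. evolve in time
             for qi, qs, rs in zip(q, q_star, r_star)]
    if apply_bc is not None:                           # 6. apply boundary conditions
        q_new = apply_bc(q_new)
    return q_new, dt
```

For the test problem dq/dt = -q this update reproduces the second-order Taylor expansion 1 - Δt + Δt²/2 per step, which is a quick way to check the stage weights.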
Domain Decomposition
• “Traditional” CUDA block decomposition.
• Each Streaming Multiprocessor of the GPU computes on a small 2D block.
• Neighboring blocks use overlap to exchange information.
• Global ghost cells for boundary conditions (wall, open, fixed depth, etc.).
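A minimal sketch of the block decomposition along one axis, assuming each block writes its own cells and reads `halo` extra cells on each side. The function name `block_ranges` and the dictionary layout are hypothetical; read indices below 0 or at and above `n_cells` stand for the global ghost cells.

```python
def block_ranges(n_cells, block_size, halo):
    """Split one axis of n_cells into blocks with an overlap of `halo` cells per side."""
    blocks = []
    for start in range(0, n_cells, block_size):
        end = min(start + block_size, n_cells)
        blocks.append({
            "write": (start, end),               # cells this block updates
            "read": (start - halo, end + halo),  # cells it needs, including overlap
        })
    return blocks
```

A 2D decomposition would apply the same split independently along x and y.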
Local Adaptive Mesh Refinement (AMR)
• Goal: Increase accuracy, minimize cost.
Figure: cost versus accuracy under grid refinement.
Capture Local Features in the Solution
“AMR Grid” Hierarchy
• “Traditional” CUDA block decomposition within each grid.
• Initialize new subgrids at level ℓ+1, with twice the resolution, using reconstructed cell values from current grids at level ℓ.
• The boundary interfaces of grid ℓ+1 must be aligned with grid cell boundaries in grid ℓ.
• Each new subgrid can be viewed as a stand-alone simulator.
Figure: example hierarchy with Grid 0 (ℓ=0), Grid 1 (ℓ=1), Grid 2 (ℓ=1), and Grid 3 (ℓ=2).
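One way to picture the hierarchy is a tree of grids in which each subgrid is placed by integer cell indices of its parent, so its boundary is aligned with parent cell boundaries by construction. The sketch below is an assumption about a possible data layout; the class `Grid`, its fields, and `add_subgrid` are hypothetical names, not the simulator's API.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    level: int
    nx: int
    ny: int
    dx: float
    dy: float
    x0: float = 0.0   # physical position of the lower-left corner
    y0: float = 0.0
    children: list = field(default_factory=list)

    def add_subgrid(self, i0, j0, ni, nj):
        """Refine parent cells [i0, i0+ni) x [j0, j0+nj) with twice the resolution."""
        inside = 0 <= i0 and i0 + ni <= self.nx and 0 <= j0 and j0 + nj <= self.ny
        if not inside:
            raise ValueError("subgrid must lie inside its parent")
        child = Grid(level=self.level + 1, nx=2 * ni, ny=2 * nj,
                     dx=self.dx / 2, dy=self.dy / 2,
                     x0=self.x0 + i0 * self.dx, y0=self.y0 + j0 * self.dy)
        self.children.append(child)
        return child
```

Because every `Grid` carries its own size, spacing, and origin, each one can be stepped like a stand-alone simulator.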
Keep Data on the GPU
AMR: Time Integration
AMR: Refine
Extending the Simulation Cycle
• Pre-step
• “Regular” time stepping
• Post-step
Diagram stages: BC, compute boundaries, reset AMR data (not a CUDA kernel), flux correction, coarsen / average.
Boundary Values – Space
• Save the solution at the beginning and at the end of each time step from the level ℓ grid at all boundaries to level ℓ+1 subgrids.
• Each subgrid has data structures for saving these values.
• The solution is reconstructed in space at both the beginning and the end of the time step.
Figure: a level ℓ grid and a level ℓ+1 subgrid.
Boundary Values – Time
• The saved boundary values are then linearly interpolated in time, by a special boundary condition kernel used on all grids except the root grid.
Figure: time axis from t to t+Δt0. Grid 0 (ℓ) takes one step Δt0; Grid 1 (ℓ+1) takes two steps, Δt1 and Δt2.
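The time interpolation of the saved boundary values amounts to a weighted blend of the two saved states. A minimal sketch, assuming the values saved at the start and end of the coarse step are plain lists; the function and parameter names are hypothetical.

```python
def boundary_value(q_start, q_end, t_start, dt_coarse, t):
    """Linearly interpolate saved boundary values to time t in [t_start, t_start + dt_coarse]."""
    w = (t - t_start) / dt_coarse   # 0 at the start of the coarse step, 1 at its end
    return [(1.0 - w) * a + w * b for a, b in zip(q_start, q_end)]
```

A subgrid taking the steps Δt1 and Δt2 would call this with t = t + Δt1 to get its boundary state between the two saved coarse states.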
Time Step Size
• Since the time step size, Δt, depends on the eigenvalues of the system (the local wave speeds), each grid has a different Δt.
• After each time step on the level ℓ grid, the time step size is given to all level ℓ+1 grids.
• The last time step in Grid 1 and 2 (Δt2 and Δt4) is reduced so that all grids reach t+Δt0 after one full simulation cycle.
Figure: time axis from t to t+Δt0. Grid 0 (ℓ) takes one step Δt0; Grid 1 (ℓ+1) takes steps Δt1 and Δt2; Grid 2 (ℓ+1) takes steps Δt3 and Δt4.
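A hedged 1D sketch of how the step size and the reduced last step might be computed. The eigenvalues of the 1D shallow water system are u ± √(gh), so the stable step shrinks as the fastest wave speeds up. The names `stable_dt` and `fine_steps`, the default `cfl=0.25`, and the use of one fixed stable step per subgrid are assumptions; in a real run the stable step would be recomputed after every substep, and at least one wet cell is assumed so the maximum speed is nonzero.

```python
import math


def stable_dt(h, u, dx, g=9.81, cfl=0.25):
    """Time step limited by the fastest wave speed |u| + sqrt(g*h) over all cells."""
    max_speed = max(abs(ui) + math.sqrt(g * hi) for hi, ui in zip(h, u))
    return cfl * dx / max_speed


def fine_steps(dt_coarse, dt_stable):
    """Steps a subgrid takes to cover dt_coarse; the last one is reduced to land exactly."""
    steps = []
    remaining = dt_coarse
    while remaining > 0.0:
        dt = min(dt_stable, remaining)
        steps.append(dt)
        remaining -= dt
    return steps
```

With a coarse step of 1.0 and a stable fine step of 0.4, the subgrid takes two full steps and a shortened third one.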
Averaging From Level ℓ +1 to ℓ
• After all level ℓ+1 subgrids are advanced to the same time as the level ℓ grid, the distribution of the solution will be more accurate in the level ℓ+1 subgrids.
• The solution is “moved up” in the AMR hierarchy by replacing the values in level ℓ by the values from an overlying subgrid at level ℓ+1.
• A simple average is used.
Figure: level ℓ+1 cells averaged into a level ℓ cell.
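With twice the resolution in each direction, one level ℓ cell is covered by a 2×2 patch of level ℓ+1 cells, so the simple average is the mean of four values. A minimal sketch, assuming the subgrid is stored as a list of rows with even dimensions; `average_to_parent` is a hypothetical name.

```python
def average_to_parent(fine):
    """Average each 2x2 patch of fine cells into one coarse cell."""
    ny, nx = len(fine), len(fine[0])
    return [[0.25 * (fine[2 * j][2 * i] + fine[2 * j][2 * i + 1]
                     + fine[2 * j + 1][2 * i] + fine[2 * j + 1][2 * i + 1])
             for i in range(nx // 2)]
            for j in range(ny // 2)]
```

Since the four fine cells have equal area, this average preserves the total of the conserved quantity over the coarse cell.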
Flux Correction Step – Fine-coarse
• Fluxes across all fine-coarse (ℓ+1 to ℓ) interfaces are accumulated for each time step in the level ℓ+1 subgrids.
• After all the level ℓ+1 subgrids are at the same advanced time as the level ℓ grid, a flux correction step is performed on the level ℓ grid.
• The correction step uses the difference between the flux from the level ℓ grid and the accumulated fluxes from the level ℓ+1 subgrids, and corrects the values in the level ℓ grid.
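Figure: fine-coarse interface between a level ℓ+1 subgrid and the level ℓ grid, with time levels t0, t1, and t2 marked on both sides (Δt1 + Δt2 = Δt0).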
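The correction can be sketched in one dimension for a coarse cell whose right face borders a subgrid. The coarse update already removed Δt0·F_coarse/Δx through that face; the correction puts it back and removes the time-accumulated fine flux instead. This is an illustration under assumptions: the function name, the sign convention (flux positive in the +x direction), and the 1D setting are not from the slides, and in 2D the fine fluxes would also be averaged over the fine faces that share the coarse face.

```python
def correct_coarse_cell(q_coarse, dx, dt_coarse, flux_coarse, fine_steps):
    """Correct a coarse cell whose right face is a fine-coarse interface.

    fine_steps is a list of (dt_k, flux_fine_k) pairs accumulated on the subgrid,
    with the dt_k summing to dt_coarse.
    """
    accumulated = sum(dt_k * f_k for dt_k, f_k in fine_steps)
    return q_coarse + (dt_coarse * flux_coarse - accumulated) / dx
```

After the correction, the amount leaving the coarse cell through the interface equals the amount the subgrid received, which is the property that keeps mass conserved.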
Flux Correction Step – Fine-fine
• Fine grid cells adjacent to other fine grids also need to be corrected.
• Again, we use the difference between the flux from the level ℓ grid and the accumulated fluxes from the level ℓ+1 subgrids, but now we write the correction to the adjacent ℓ+1 grid instead of the level ℓ grid.
• This, together with the fine-coarse flux correction, maintains conservation of mass.
Figure: interface between two adjacent level ℓ+1 grids, with time levels t0, t1, t2, and t3 marked (Δt0 + Δt1 = Δt2 + Δt3).
Extending the Simulation Cycle
• Pre-step
• “Regular” time stepping
• Accumulate fine fluxes
• Reduce last time step
• Special BC
• Post-step
Diagram stages: BC, compute boundaries, reset AMR data (not a CUDA kernel), flux correction, coarsen / average.
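The extended cycle can be pictured as a recursion over the grid hierarchy: every grid runs a pre-step, takes a regular time step, lets its subgrids catch up to the same time, and then runs a post-step. The sketch below only records the order of operations. It is an assumption about control flow, not the simulator's code; `Grid`, `advance`, and the fixed `stable_dt` per grid are hypothetical, and the assignment of flux correction and coarsening to the post-step is inferred from the diagram labels.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    name: str
    stable_dt: float                      # hypothetical fixed stable step for this grid
    children: list = field(default_factory=list)


def advance(grid, dt_target, log):
    """Advance grid by dt_target, letting subgrids catch up after every step."""
    remaining = dt_target
    while remaining > 0.0:
        dt = min(grid.stable_dt, remaining)       # reduce last time step
        remaining -= dt
        log.append((grid.name, "pre-step"))       # boundaries, reset AMR data
        log.append((grid.name, "step", dt))       # regular time stepping
        for child in grid.children:
            advance(child, dt, log)               # subgrids reach the same time
        if grid.children:
            log.append((grid.name, "post-step"))  # flux correction, coarsen / average
```

Calling `advance` on a root grid with one subgrid shows the subgrid taking several short steps inside each root step, followed by the root's post-step.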
AMR: Refine
AMR: Time Integration
“Correct” Refinement Criteria
Refinement Test
• After a given number of time steps, each cell in every grid is checked against a given refinement criterion, and potentially flagged for refinement. This test is performed by a dedicated kernel.
• The same kernel performs a reduction to tiles (each tile is a collection of cells) and writes the number of cells marked for refinement per tile to a map.
• This map is used to generate new proposed grids.
• The new proposed grids are checked for overlaps with existing grids. In case of overlap, the proposed grids are broken down into smaller grids.
• The last two steps (generating the proposed grids and checking them for overlaps) are the only work done by the CPU.
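The flag-and-reduce part of the refinement test can be sketched on the CPU as follows. The refinement criterion used here (centered x-gradient of the water surface above a threshold) is purely a placeholder, since the slides do not state the criterion; `flag_cells`, `tile_counts`, and their parameters are hypothetical names.

```python
def flag_cells(w, dx, threshold):
    """Flag interior cells whose centered x-gradient of w exceeds threshold."""
    ny, nx = len(w), len(w[0])
    flags = [[0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(1, nx - 1):
            if abs(w[j][i + 1] - w[j][i - 1]) / (2 * dx) > threshold:
                flags[j][i] = 1
    return flags


def tile_counts(flags, tile):
    """Reduce per-cell flags to a map holding the number of flagged cells per tile."""
    ny, nx = len(flags), len(flags[0])
    return [[sum(flags[j][i]
                 for j in range(tj, min(tj + tile, ny))
                 for i in range(ti, min(ti + tile, nx)))
             for ti in range(0, nx, tile)]
            for tj in range(0, ny, tile)]
```

The resulting map is small compared with the grid, which is what makes it cheap to hand to the CPU for proposing new grids.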
Initializing New Subgrids
• The bathymetry of new subgrids is bilinearly interpolated from the parent grid using efficient, dedicated GPU hardware (texture memory).
• The initialization of the variables is done by reconstructing the solution from the last time step on the parent grid, and evaluating it in the cell centers of the new grid.
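Both initialization steps can be sketched in plain Python. `bilinear` does in software what the texture hardware does for the bathymetry, and `child_values` evaluates a planar reconstruction of one parent cell at the centers of its four children, which sit a quarter of a parent cell width from the parent center. All names are hypothetical, and sample coordinates are assumed to be non-negative and inside a field of at least 2×2 values.

```python
def bilinear(values, x, y):
    """Sample a 2D list (values at integer coordinates) at the real point (x, y)."""
    i = min(int(x), len(values[0]) - 2)
    j = min(int(y), len(values) - 2)
    fx, fy = x - i, y - j
    bottom = (1 - fx) * values[j][i] + fx * values[j][i + 1]
    top = (1 - fx) * values[j + 1][i] + fx * values[j + 1][i + 1]
    return (1 - fy) * bottom + fy * top


def child_values(q, slope_x, slope_y, dx, dy):
    """Evaluate the parent's planar reconstruction at the four child cell centers."""
    return [[q + sx * slope_x * dx / 4 + sy * slope_y * dy / 4
             for sx in (-1, 1)]
            for sy in (-1, 1)]
```

The four child values average back to the parent value, so initializing a subgrid this way does not change the total of the conserved quantity.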
Performance Results – N Subgrids (2D)
Performance Results – 1 Subgrid
Video: Malpasset Dam Break Case w/AMR
./malpasset_amr.avi
Thank you for your attention
Contact: Martin Lilleeng Sætra, [email protected]
Homepage: http://martinsa.at.ifi.uio.no/
SINTEF homepage: http://www.sintef.no/heterocomp
Acknowledgements: Mustafa Altinakar, André R. Brodtkorb, Christopher Dyken, Trond R. Hagen, Knut-Andreas Lie, and Jostein R. Natvig.
References
• A. R. Brodtkorb, M. L. Sætra, and M. Altinakar, Efficient Shallow Water Simulations on GPUs: Implementation, Visualization, Verification, and Validation, Computers & Fluids 55(0):1–12, 2012.