A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware Nolan Goodnight Cliff Woolley Gregory Lewin David Luebke Greg Humphreys University of Virginia Graphics Hardware 2003 July 26-27 – San Diego, CA


Page 1

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys

University of Virginia

Graphics Hardware 2003, July 26-27, San Diego, CA

Page 2

General-Purpose GPU Programming

Why do we port algorithms to the GPU?

How much faster can we expect it to be, really?

What is the challenge in porting?

Page 3

Case Study

Problem: Implement a Boundary Value Problem (BVP) solver using the GPU

Could benefit an entire class of scientific and engineering applications, e.g.:

Heat transfer

Fluid flow

Page 4

Related Work

Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms

Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid

Very similar to our system

Developed concurrently

Complementary approach

Page 5

Driving problem: Fluid mechanics sim

Problem domain is a warped disc:

[Figure: warped-disc domain; panels labeled "regular grid"]

Page 6

BVPs: Background

Boundary value problems are sometimes governed by PDEs of the form:

$\mathcal{L}u = f$ in $\Omega$

$\mathcal{L}$ is some operator

$\Omega$ is the problem domain

$f$ is a forcing function (source term)

Given $\mathcal{L}$ and $f$, solve for $u$.

Page 7

BVPs: Example

Heat transfer: find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$

This requires solving a Poisson equation of the form:

$k\nabla^2 T = -S$

This is a BVP where $\mathcal{L}$ is the Laplacian operator $\nabla^2$ (for constant $k$, dividing through gives $\nabla^2 T = -S/k$)

All our applications require a Poisson solver.
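As a worked example of the discretization step, assuming a uniform grid with spacing $h$, constant $k$, and the standard 5-point approximation to the Laplacian, the heat equation at each interior grid point $(i, j)$ becomes one linear equation per unknown:

$$\frac{T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4T_{i,j}}{h^2} = -\frac{S_{i,j}}{k}$$

Collecting these equations over all interior points gives a sparse linear system with about five nonzeros per row, which is what the solvers listed for BVPs operate on.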

Page 8

BVPs: Solving

Most such problems cannot be solved analytically

Instead, discretize onto a grid to form a set of linear equations, then solve:

Direct elimination

Gauss-Seidel iteration

Conjugate-gradient

Strongly implicit procedures

Multigrid method
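A minimal CPU sketch of the "discretize, then iterate" route, using Gauss-Seidel from the list above. It assumes a square grid with spacing h, fixed (Dirichlet) boundary values, and the standard 5-point Laplacian; the function names are illustrative, not taken from the authors' system.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for the 5-point discretization of
    laplacian(u) = f on a square grid; boundary entries of u stay fixed."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Solve the (i, j) equation for u[i][j], reusing values already
                # updated in this sweep (what distinguishes Gauss-Seidel from Jacobi).
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                  + u[i][j + 1] - h * h * f[i][j])
    return u


def max_residual(u, f, h):
    """Largest |laplacian(u) - f| over the interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j]) / (h * h)
            worst = max(worst, abs(lap - f[i][j]))
    return worst


if __name__ == "__main__":
    n = 17
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]   # zero boundary values, zero initial guess
    f = [[-1.0] * n for _ in range(n)]  # constant right-hand side, e.g. -S/k
    gauss_seidel(u, f, h, sweeps=500)
    print(max_residual(u, f, h))
```

Each sweep is cheap, but the number of sweeps needed for a given accuracy grows as the grid is refined, which is the motivation for the multigrid method.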

Page 9

Multigrid method

Iteratively corrects an approximation to the solution

Operates at multiple grid resolutions

Low-resolution grids are used to correct higher-resolution grids recursively

Very fast, especially for large grids: O(n) for a grid of n points
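A rough work count behind the O(n) claim, assuming a 2D grid where each coarser level has about a quarter as many points as the one above it and each step costs a constant $c$ per point:

$$c\,n\left(1 + \tfrac{1}{4} + \tfrac{1}{16} + \cdots\right) \le \tfrac{4}{3}\,c\,n = O(n) \text{ per cycle}$$

Under the usual multigrid assumptions (for example, a well-behaved Poisson problem), the number of cycles needed for a fixed error reduction does not grow with $n$, so the total cost stays O(n).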

Page 10

Multigrid method

Use coarser grid levels to recursively correct an approximation to the solution

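Algorithm:

smooth

residual: $r = \mathcal{L}u_i - f$, where $u_i$ is the current approximation

restrict

recurse

interpolate

Stencils:

$$\text{discrete Laplacian } \mathcal{L}: \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

$$\text{restrict (full weighting): } \begin{bmatrix} 1/16 & 1/8 & 1/16 \\ 1/8 & 1/4 & 1/8 \\ 1/16 & 1/8 & 1/16 \end{bmatrix}$$

$$\text{interpolate (bilinear): } \begin{bmatrix} 1/4 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 1/4 \end{bmatrix}$$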
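The five steps can be sketched recursively on the CPU. This is a minimal sketch under our own assumptions, not the authors' code: a square grid with n = 2^k + 1 points per side, fixed boundary values, weighted-Jacobi smoothing with omega = 0.8 (chosen because every point updates independently, as in a fragment program), the full-weighting and bilinear weights above, and the residual sign r = Lu_i - f, so the coarse-grid correction is subtracted.

```python
def laplacian(u, i, j, h):
    """5-point discrete Laplacian at interior point (i, j)."""
    return (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j]) / (h * h)


def smooth(u, f, h, sweeps=2, omega=0.8):
    """Weighted-Jacobi sweeps for L u = f; each point reads only the previous iterate."""
    n = len(u)
    for _ in range(sweeps):
        old = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                jacobi = 0.25 * (old[i - 1][j] + old[i + 1][j] + old[i][j - 1]
                                 + old[i][j + 1] - h * h * f[i][j])
                u[i][j] = (1.0 - omega) * old[i][j] + omega * jacobi


def residual(u, f, h):
    """r = L u - f (the slide's sign convention); zero on the boundary."""
    n = len(u)
    r = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            r[i][j] = laplacian(u, i, j, h) - f[i][j]
    return r


def restrict(r):
    """Full weighting (1/4 center, 1/8 edges, 1/16 corners) onto every other point."""
    nc = (len(r) - 1) // 2 + 1
    rc = [[0.0] * nc for _ in range(nc)]
    for I in range(1, nc - 1):
        for J in range(1, nc - 1):
            i, j = 2 * I, 2 * J
            rc[I][J] = (0.25 * r[i][j]
                        + 0.125 * (r[i - 1][j] + r[i + 1][j] + r[i][j - 1] + r[i][j + 1])
                        + 0.0625 * (r[i - 1][j - 1] + r[i - 1][j + 1]
                                    + r[i + 1][j - 1] + r[i + 1][j + 1]))
    return rc


def interpolate(ec, n):
    """Bilinear prolongation (weights 1, 1/2, 1/4) back to the n x n fine grid."""
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            I, J = i // 2, j // 2
            if i % 2 == 0 and j % 2 == 0:
                e[i][j] = ec[I][J]
            elif j % 2 == 0:  # i odd: between two coarse rows
                e[i][j] = 0.5 * (ec[I][J] + ec[I + 1][J])
            elif i % 2 == 0:  # j odd: between two coarse columns
                e[i][j] = 0.5 * (ec[I][J] + ec[I][J + 1])
            else:             # cell center: average of four coarse points
                e[i][j] = 0.25 * (ec[I][J] + ec[I + 1][J] + ec[I][J + 1] + ec[I + 1][J + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle on an n x n grid with n = 2**k + 1 and fixed boundary values."""
    n = len(u)
    if n == 3:  # coarsest grid: a single unknown, solved exactly
        u[1][1] = 0.25 * (u[0][1] + u[2][1] + u[1][0] + u[1][2] - h * h * f[1][1])
        return u
    smooth(u, f, h)                      # smooth
    rc = restrict(residual(u, f, h))     # residual, then restrict
    ec = [[0.0] * len(rc) for _ in rc]   # coarse error starts at zero, zero boundary
    v_cycle(ec, rc, 2 * h)               # recurse: solve L e = r on the coarse grid
    e = interpolate(ec, n)               # interpolate the correction
    for i in range(n):
        for j in range(n):
            u[i][j] -= e[i][j]           # r = L u - f, so the correction is subtracted
    smooth(u, f, h)                      # post-smooth
    return u
```

Calling `v_cycle(u, f, h)` repeatedly on a 33 x 33 grid should shrink the residual by a roughly constant factor per cycle, independent of grid size.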

Page 11

Implementation

For each step of the algorithm:

Bind the buffers that contain the necessary data as texture maps

Set the target buffer for rendering

Activate a fragment program that performs the necessary kernel computation

Render a grid-sized quad with multitexturing

[Figure: source buffers bound as textures feed the fragment program, which writes to the render target buffer]
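The four steps amount to running one function at every grid point while reading only from the bound source buffers and writing a separate target. A CPU stand-in, with hypothetical names (`render_pass`, `residual_kernel`) and the kernel written as a Python function rather than a real fragment program:

```python
def render_pass(kernel, textures, width, height):
    """CPU stand-in for one GPU pass: the source buffers in `textures` are only
    read, `kernel` plays the fragment program, and drawing a grid-sized quad
    means running the kernel once per (x, y) to fill a separate target buffer."""
    target = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            target[y][x] = kernel(x, y, *textures)
    return target


def residual_kernel(x, y, u, f, h=1.0):
    """Hypothetical residual fragment program: L u - f inside, 0 on the border."""
    height, width = len(u), len(u[0])
    if x in (0, width - 1) or y in (0, height - 1):
        return 0.0
    lap = u[y][x - 1] + u[y][x + 1] + u[y - 1][x] + u[y + 1][x] - 4.0 * u[y][x]
    return lap / (h * h) - f[y][x]
```

For example, `render_pass(residual_kernel, (u, f), width, height)` corresponds to binding `u` and `f` as textures and rendering the residual into a new buffer.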

Page 12

Optimizing the Solver

Detect steady-state natively on GPU

Minimize shader length

Special-case whenever possible

Avoid context-switching

Page 13

Optimizing the Solver: Steady-state

How to detect convergence?

L1 norm: average error

L2 norm: RMS error (common in visual sim)

L∞ norm: max error (common in sci/eng apps); can use occlusion query!

[Plot: seconds to steady state vs. grid size]
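The three norms, and the occlusion-query test for L∞, can be emulated on the CPU. A minimal sketch, assuming the per-point error (or residual) is already available as a grid; the function names are hypothetical. On the GPU, a fragment program would discard fragments whose error is within the tolerance, and the occlusion query would count the fragments that survive, so a count of zero means the maximum error is within tolerance.

```python
import math


def norms(err):
    """Return (L1 as mean absolute error, L2 as RMS error, L-infinity as max error)."""
    vals = [abs(e) for row in err for e in row]
    count = len(vals)
    l1 = sum(vals) / count
    l2 = math.sqrt(sum(v * v for v in vals) / count)
    linf = max(vals)
    return l1, l2, linf


def occlusion_count(err, tol):
    """Emulate an occlusion query: fragments with |err| <= tol are discarded,
    and the query reports how many fragments were not discarded."""
    return sum(1 for row in err for e in row if abs(e) > tol)


def converged_linf(err, tol):
    """Steady state under the L-infinity norm: no fragment survives the test."""
    return occlusion_count(err, tol) == 0
```

The occlusion query only returns a count, so it answers the yes/no L∞ question directly; L1 and L2 need the actual sums.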

Page 14

Optimizing the Solver: Shader length

Minimize number of registers used

Vectorize as much as possible

Use the rasterizer to compute linearly varying values

Precompute invariants on the CPU

Shader statistics (fp = fragment program, vp = vertex program):

shader        original fp   fastpath fp   fastpath vp
smooth        79-6-1        20-4-1        12-2
residual      45-7-0        16-4-0        11-1
restrict      66-6-1        21-3-0        11-1
interpolate   93-6-1        25-3-0        13-2
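Using the rasterizer works because some per-fragment quantities, such as the texture coordinate of a point's neighbor, are linear in screen position. A vertex program can evaluate them once per corner of the grid-sized quad, and interpolation delivers the exact per-fragment values. A CPU emulation of that interpolation, assuming sampling at pixel centers and, as a hypothetical example, the east-neighbor coordinate x + 1:

```python
def rasterize_attribute(corners, width, height):
    """Interpolate a per-vertex attribute across a width x height quad at pixel
    centers, as the rasterizer does for values written by a vertex program.
    corners = values at (0, 0), (width, 0), (0, height), (width, height)."""
    v00, v10, v01, v11 = corners
    out = []
    for y in range(height):
        t = (y + 0.5) / height
        row = []
        for x in range(width):
            s = (x + 0.5) / width
            row.append((1 - s) * (1 - t) * v00 + s * (1 - t) * v10
                       + (1 - s) * t * v01 + s * t * v11)
        out.append(row)
    return out


# Vertex-program side: evaluate "x coordinate of the east neighbor" (x + 1) at
# the four corners only; the fragment program then does no arithmetic for it.
W, H = 4, 3
east = rasterize_attribute((0 + 1, W + 1, 0 + 1, W + 1), W, H)
# Every fragment receives x + 1.5, the center of texel x + 1.
```

Moving such work out of the fragment program is one plausible reading of the shrinking fragment-program counts once a vertex program is added.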

Page 15

Optimizing the Solver: Special-case

Fast-path vs. slow-path

write several variants of each fragment program to handle boundary cases

eliminates conditionals in the fragment program

equivalent to avoiding CPU inner-loop branching

[Figure: slow path, with boundaries; fast path, no boundaries]
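The fast-path/slow-path split can be sketched on the CPU, assuming for illustration only that boundary points use clamp-to-edge neighbor fetches. On the GPU the interior would be drawn as one quad with the fast fragment program and the boundary as thin strips with the slow one; here the two variants are plain functions (hypothetical names), and the split produces exactly the same values as running the slow variant everywhere.

```python
def lap_slow(u, i, j, h):
    """Slow path: clamp every neighbor index to the grid (clamp-to-edge fetch)."""
    n = len(u)

    def at(a, b):
        return u[min(max(a, 0), n - 1)][min(max(b, 0), n - 1)]

    return (at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1) - 4.0 * u[i][j]) / (h * h)


def lap_fast(u, i, j, h):
    """Fast path: valid only for interior points, so no index checks at all."""
    return (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j]) / (h * h)


def laplacian_all_slow(u, h):
    """Reference: the slow, fully checked variant at every point."""
    n = len(u)
    return [[lap_slow(u, i, j, h) for j in range(n)] for i in range(n)]


def laplacian_split(u, h):
    """Interior with the fast variant, one-point boundary ring with the slow one."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = lap_fast(u, i, j, h)
    for k in range(n):
        for i, j in ((0, k), (n - 1, k), (k, 0), (k, n - 1)):
            out[i][j] = lap_slow(u, i, j, h)
    return out
```

Only the boundary ring, which shrinks relative to the interior as the grid grows, pays for the checks.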

Page 16

Optimizing the Solver: Special-case


[Plot: seconds per V-cycle vs. grid size]

Page 17

Optimizing the Solver: Context-switching

Find the best packing of data from multiple grid levels into the pbuffer surfaces
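One hypothetical way to pack a hierarchy of square levels (side n, n/2, n/4, ...) into a single surface, so that moving between levels changes texture coordinates rather than the current pbuffer, is sketched below. The layout is an assumption for illustration; the slide does not say which packing the authors found best.

```python
def pack_levels(n, num_levels):
    """Place square grid levels of side n, n/2, n/4, ... on one surface.

    Hypothetical layout: the finest level sits at the origin and every coarser
    level is stacked down a single column to its right, so the surface is
    (n + n/2) x n texels. Returns ([(x, y, size), ...], (width, height))."""
    rects = [(0, 0, n)]
    y, size = 0, n // 2
    for _ in range(1, num_levels):
        rects.append((n, y, size))
        y += size  # next, smaller level goes directly below
        size //= 2
    width = n + n // 2 if num_levels > 1 else n
    return rects, (width, n)
```

Because the coarse levels take n/2 + n/4 + ... < n rows, the whole hierarchy fits in a surface 1.5 times the area of the finest level.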


Page 20

Optimizing the Solver: Context-switching

Removing context switches can introduce operations with undefined results: reading from and writing to the same surface

Why do we need to do this?

Can we get away with it?

What about superbuffers?
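Reading and writing the same surface is undefined because fragments may be processed in any order. A CPU sketch (hypothetical function names) contrasts the safe "ping-pong" pattern, which reads one buffer, writes another, and swaps them each pass, with an in-place update whose result depends on traversal order:

```python
def jacobi_pass(src, dst, f, h):
    """One Jacobi relaxation pass for L u = f: read only src, write only dst."""
    n = len(src)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j] + src[i][j - 1]
                                + src[i][j + 1] - h * h * f[i][j])


def ping_pong(u, f, h, passes):
    """Alternate between two buffers so no pass reads the surface it writes."""
    a = [row[:] for row in u]
    b = [row[:] for row in u]
    for _ in range(passes):
        jacobi_pass(a, b, f, h)
        a, b = b, a  # the freshly written buffer becomes the next source
    return a


def in_place_pass(u, f, h, reverse=False):
    """The same update with source == target: each point may see neighbors that
    were already overwritten, so the result depends on the visiting order."""
    n = len(u)
    order = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]
    if reverse:
        order.reverse()
    for i, j in order:
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                          + u[i][j + 1] - h * h * f[i][j])
    return u
```

On a CPU the in-place version is just Gauss-Seidel in a fixed order; on a GPU the order is unspecified, which is why the result is undefined.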

Page 21

Data Layout

Performance:

[Plot: seconds to steady state vs. grid size]

Page 22

Data Layout

Possible additional vectorization:

Compute 4 values at a time

Requires source, residual, solution values to be in different buffers

Complicates boundary calculations

Adds setup and teardown overhead

[Figure: stacked domain]
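One hypothetical "stacked" layout stores the four quadrants of a grid in the R, G, B, A channels of a half-size texture, so a single fragment updates four grid values at once. This sketch assumes an even-sized square grid and uses illustrative function names; it is not necessarily the layout the authors evaluated.

```python
def pack_rgba(grid):
    """Stack the four quadrants of an even-sized square grid into the R, G, B, A
    channels of a half-size texture (one hypothetical 'stacked' layout)."""
    m = len(grid) // 2
    return [[(grid[y][x], grid[y][x + m], grid[y + m][x], grid[y + m][x + m])
             for x in range(m)] for y in range(m)]


def unpack_rgba(tex):
    """Inverse of pack_rgba."""
    m = len(tex)
    grid = [[0.0] * (2 * m) for _ in range(2 * m)]
    for y in range(m):
        for x in range(m):
            r, g, b, a = tex[y][x]
            grid[y][x], grid[y][x + m] = r, g
            grid[y + m][x], grid[y + m][x + m] = b, a
    return grid


def correct_packed(u_tex, e_tex):
    """Pointwise u - e: each texel (one fragment) updates four grid values at once."""
    return [[tuple(p - q for p, q in zip(ut, et)) for ut, et in zip(urow, erow)]
            for urow, erow in zip(u_tex, e_tex)]
```

Pointwise steps vectorize cleanly, but a stencil that reaches across a quadrant seam must fetch from a different channel and position, which is one way boundary calculations become more complicated.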

Page 23

Results: CPU vs. GPU

Performance:

[Plot: seconds to steady state vs. grid size]

Page 24

Conclusions

What we need going forward:

Superbuffers, or universal support for multiple-surface pbuffers, or cheap context switching

Developer tools: debugging tools, documentation

Global accumulator

Ever-increasing amounts of precision and memory, e.g., textures bigger than 2048 on a side
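Without a global accumulator, a whole-grid quantity such as the maximum error has to be gathered over several passes. A minimal sketch of one common workaround, repeated 2x2 max reductions (an assumption here, not a method described on the slides), shows the cost: an n x n grid needs log2(n) passes. The function names are hypothetical.

```python
def reduce_max_pass(grid):
    """One render pass: each output texel is the max of a 2x2 block of the input."""
    m = len(grid) // 2
    return [[max(grid[2 * y][2 * x], grid[2 * y][2 * x + 1],
                 grid[2 * y + 1][2 * x], grid[2 * y + 1][2 * x + 1])
             for x in range(m)] for y in range(m)]


def reduce_max(grid):
    """Repeat passes until one texel remains; assumes a power-of-two square grid.
    Returns (maximum, number of passes)."""
    passes = 0
    while len(grid) > 1:
        grid = reduce_max_pass(grid)
        passes += 1
    return grid[0][0], passes
```

Each pass is a separate render with its own setup, so a hardware accumulator would collapse this chain into a single pass.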

Page 25

Acknowledgements

Hardware

David Kirk

Matt Papakipos

Driver Support

Nick Triantos

Pat Brown

Stephen Ehmann

Fragment Programming

James Percy

Matt Pharr

General-purpose GPU

Mark Harris

Aaron Lefohn

Ian Buck

Funding

NSF Award #0092793