Examples of Parallel Algorithms

Ioan Lucian Muntean
Department of Computer Science – Chair V
Technische Universität München, Germany

Fifth SimLab Short Course on Parallel Numerical Simulation
Belgrade, October 1-7, 2006
October 5, 2006
6.1. The Jacobi and Gauß-Seidel Iterations

• scenario:
  – solve an elliptic partial differential equation (PDE) with Dirichlet boundary conditions on a given domain Ω
  – simple example: Poisson's equation ∆u = f on the unit square Ω = ]0, 1[², with u given on Ω's boundary:

        ∆u(x, y) = ∂²u(x, y)/∂x² + ∂²u(x, y)/∂y² = f(x, y)   for (x, y) ∈ Ω,
        u(x, y) = g(x, y)   for (x, y) ∈ ∂Ω

  – the function u(x, y) (or an approximation to it) has to be found
  – occurrences: a fitted membrane, the stationary heat equation, ...
• discretization:
  – for its solution, the PDE has to be discretized
  – again a simple example: the finite difference discretization for mesh width h:

        ∂²u(x, y)/∂x² ≈ (u(x−h, y) − 2u(x, y) + u(x+h, y)) / h²,
        ∂²u(x, y)/∂y² ≈ (u(x, y−h) − 2u(x, y) + u(x, y+h)) / h²
The Jacobi and Gauß-Seidel Iterations (cont'd)

• discretization (cont'd):
  – introduce an equidistant grid of (N + 1)² grid points u_{i,j} ≈ u(ih, jh), i = 0, ..., N, j = 0, ..., N, N = 1/h
  – resulting discrete equation in the interior:

        u_{i,j−1} + u_{i−1,j} − 4u_{i,j} + u_{i+1,j} + u_{i,j+1} = h² f_{i,j},   0 < i, j < N

    this scheme is called the five-point difference star
  – resulting equation on the boundary:

        u_{i,j} = g(ih, jh)   for i = 0, i = N, j = 0, or j = N
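The five-point star above can be checked on a function with known Laplacian. A minimal sketch in Python: for u(x, y) = x² + y² we have ∆u = 4, and the stencil reproduces this exactly, since the truncation error vanishes for quadratics (the grid size is an illustrative choice):

```python
# Apply the five-point difference star to u(x, y) = x^2 + y^2,
# whose Laplacian is exactly 4 everywhere.

def five_point_laplacian(u, i, j, h):
    """Discrete Laplacian at interior grid point (i, j) via the five-point star."""
    return (u[i][j - 1] + u[i - 1][j] - 4 * u[i][j]
            + u[i + 1][j] + u[i][j + 1]) / h**2

N = 8
h = 1.0 / N
# grid values u_ij = u(ih, jh)
u = [[(i * h)**2 + (j * h)**2 for j in range(N + 1)] for i in range(N + 1)]

print(five_point_laplacian(u, 3, 5, h))  # -> 4.0 up to round-off
```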
The Resulting System of Linear Equations

• for each inner point, one linear equation in the unknowns u_{i,j}
• equations in points next to the boundary (i.e. i = 1 or i = N−1 or j = 1 or j = N−1) access the boundary values
  – these are shifted to the right-hand side of the equation
  – hence, all unknowns are located to the left of the '=' sign, all known quantities to its right
• assemble the overall vector of unknowns by lexicographic row-wise ordering
• result: a system Ax = b of (N − 1)² linear equations in (N − 1)² unknowns
• the matrix A is block-tridiagonal with identity or tridiagonal blocks I and T, respectively:
  A =
      ( T  I           )
      ( I  T  I        )
      (    I  ⋱  ⋱     )
      (       ⋱  ⋱  I  )
      (          I  T  )

  T =
      ( −4   1            )
      (  1  −4   1        )
      (      1   ⋱   ⋱    )
      (          ⋱   ⋱  1 )
      (             1 −4  )   ∈ R^(N−1)×(N−1)
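The block-tridiagonal structure can be made concrete by assembling A for a small grid. A minimal sketch in Python, using a dense list of lists purely to make the pattern visible (real codes would of course use a sparse format); the grid size N = 4 is an illustrative choice:

```python
# Assemble the block-tridiagonal Poisson matrix A for the unknowns
# u_{i,j}, 0 < i, j < N, in lexicographic row-wise ordering.

def assemble_poisson_matrix(N):
    n = N - 1                      # unknowns per grid row (block size)
    M = n * n                      # total number of unknowns
    A = [[0] * M for _ in range(M)]
    for j in range(n):
        for i in range(n):
            k = j * n + i          # lexicographic index of grid point (i+1, j+1)
            A[k][k] = -4           # diagonal of the T blocks
            if i > 0:
                A[k][k - 1] = 1    # left neighbour (within T)
            if i < n - 1:
                A[k][k + 1] = 1    # right neighbour (within T)
            if j > 0:
                A[k][k - n] = 1    # lower neighbour (identity block I)
            if j < n - 1:
                A[k][k + n] = 1    # upper neighbour (identity block I)
    return A

A = assemble_poisson_matrix(4)
print(len(A))   # 9 equations in 9 unknowns for N = 4
print(A[4])     # the middle grid point couples to all four neighbours
```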
Solving Large Sparse Systems of Linear Equations

• the standard textbook method is Gaussian elimination
• this is a so-called direct solver, which provides the exact solution of the system (apart from round-off errors)
• drawbacks of Gaussian elimination:
  – for M unknowns, one needs O(M³) arithmetic operations (not acceptable for really large M, as they are standard in modern simulation problems)
  – the algorithm does not exploit the sparsity of the matrix: existing zeroes are "destroyed" (turned into non-zeroes), which produces more computational work and more storage requirements
• therefore: use iterative methods instead
  – they approach the exact solution and approximate it, but typically do not reach it
  – one step of the iteration costs O(M) operations
  – typically much fewer than O(M²) steps are needed (the gain)
  – ideal case (multigrid or multilevel methods): only O(1) steps needed
  – basic (and not that sophisticated) methods (number of steps still depending on M):
    * relaxation methods: Jacobi, Gauß-Seidel, SOR
    * minimization methods: steepest descent, conjugate gradients
The Jacobi Iteration

• decompose A into its diagonal part D_A, its upper triangular part U_A, and its lower triangular part L_A:

        A = L_A + D_A + U_A

• starting point: b = Ax = D_A x + (L_A + U_A) x
• writing b = D_A x^(it+1) + (L_A + U_A) x^(it), with x^(it) denoting the approximation to x after it steps of the iteration, leads to the following iterative scheme:

        x^(it+1) := −D_A⁻¹ (L_A + U_A) x^(it) + D_A⁻¹ b = x^(it) + D_A⁻¹ r^(it),

  where the residual is defined as r^(it) = b − A x^(it)
• or in a more explicit algorithmic form:

        for it = 0, 1, 2, ...:
            for k = 1, ..., M:
                x_k^(it+1) = (1 / a_{k,k}) (b_k − Σ_{j≠k} a_{k,j} x_j^(it))

• for our special A resulting from the finite difference discretization of the Poisson equation, this means (pay attention to the indices!):

        for it = 0, 1, 2, ...:
            for j = 1, ..., N−1:
                for i = 1, ..., N−1:
                    u_{i,j}^(it+1) = (1/4) (u_{i,j−1}^(it) + u_{i−1,j}^(it) + u_{i,j+1}^(it) + u_{i+1,j}^(it) − h² f_{i,j})

• remember that the boundary values are fixed
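The Jacobi sweep for the Poisson problem can be sketched directly in Python. The model problem (f = 0, boundary values g ≡ 1, so the exact solution is u ≡ 1 everywhere) and the grid size and step count are illustrative choices:

```python
# Jacobi iteration for the discrete Poisson equation on the unit square
# with f = 0 and Dirichlet boundary values 1; the iterates approach the
# exact solution u = 1.

N = 8
h = 1.0 / N
f = [[0.0] * (N + 1) for _ in range(N + 1)]
u = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):             # fixed Dirichlet boundary values g = 1
    u[k][0] = u[k][N] = u[0][k] = u[N][k] = 1.0

for it in range(200):
    u_new = [row[:] for row in u]  # Jacobi: update from the old iterate only
    for j in range(1, N):
        for i in range(1, N):
            u_new[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                                  + u[i][j + 1] + u[i + 1][j]
                                  - h * h * f[i][j])
    u = u_new

print(abs(u[N // 2][N // 2] - 1.0))  # small: iterates converge towards 1
```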
The Gauß-Seidel Iteration

• take the same decomposition A = L_A + D_A + U_A
• new starting point: b = Ax = (D_A + L_A) x + U_A x
• writing b = (D_A + L_A) x^(it+1) + U_A x^(it) leads to the following iterative scheme:

        x^(it+1) := −(D_A + L_A)⁻¹ U_A x^(it) + (D_A + L_A)⁻¹ b = x^(it) + (D_A + L_A)⁻¹ r^(it)

• or in a more explicit algorithmic form:

        for it = 0, 1, 2, ...:
            for k = 1, ..., M:
                x_k^(it+1) = (1 / a_{k,k}) (b_k − Σ_{j=1}^{k−1} a_{k,j} x_j^(it+1) − Σ_{j=k+1}^{M} a_{k,j} x_j^(it))

• for our special A resulting from the finite difference discretization of the Poisson equation, this means (pay attention to the indices!):

        for it = 0, 1, 2, ...:
            for j = 1, ..., N−1:
                for i = 1, ..., N−1:
                    u_{i,j}^(it+1) = (1/4) (u_{i,j−1}^(it+1) + u_{i−1,j}^(it+1) + u_{i,j+1}^(it) + u_{i+1,j}^(it) − h² f_{i,j})

• remember again that the boundary values are fixed
• there is no general superiority of Gauß-Seidel over Jacobi; in the case discussed here, however, Gauß-Seidel converges twice as fast as Jacobi
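The Gauß-Seidel sweep can be sketched for the same model problem (f = 0, boundary values 1, exact solution u ≡ 1). Note that a single array suffices, since new values are used as soon as they are available; the step count is an illustrative choice, roughly half of what the Jacobi sketch above needed:

```python
# Gauss-Seidel iteration for the discrete Poisson equation: updated
# values u^(it+1) are used immediately, so no copy of the grid is kept.

N = 8
h = 1.0 / N
f = [[0.0] * (N + 1) for _ in range(N + 1)]
u = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):             # fixed Dirichlet boundary values g = 1
    u[k][0] = u[k][N] = u[0][k] = u[N][k] = 1.0

for it in range(100):
    for j in range(1, N):
        for i in range(1, N):
            # u[i][j-1] and u[i-1][j] already hold the new iterate here
            u[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                              + u[i][j + 1] + u[i + 1][j]
                              - h * h * f[i][j])

print(abs(u[N // 2][N // 2] - 1.0))
```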
Parallelizing Jacobi

• note that neither Jacobi nor Gauß-Seidel is used today any more – they are too slow; nevertheless, the algorithmic aspects are still of interest
• a parallel Jacobi algorithm is quite straightforward:
  – in the current iteration step, only values from the previous step are used
  – hence, all updates of one iteration step can be made in parallel (if that many processors are available)
  – more realistic scenario: subdivide the domain into strips or squares, for example (which is better with respect to a good communication-computation ratio?)
Parallelizing Jacobi (cont'd)

• each processor needs for its calculations:
  – if adjacent to the boundary: a subset of the boundary values
  – one row or one column of values from the processors dealing with the neighbouring subdomains
  – some hint when to stop
• the above considerations lead to the following algorithm, which each processor has to execute:
  1. update all local approximate values u_{i,j}^(it) to u_{i,j}^(it+1)
  2. send all updates in points next to interior boundaries to the respective processors
  3. receive all necessary updates from the "neighbouring" processors
  4. compute the local residual values and provide them via a reduce operation
  5. receive the overall residual as the reduce operation's result and go back to 1. if this value is larger than some given threshold
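The five steps above can be simulated sequentially in Python: the grid is cut into horizontal strips of interior rows, each "processor" updates its strip from the previous iterate (the rows just outside a strip play the role of the ghost rows received from the neighbours), and summing the local residuals stands in for the reduce operation. P, N, and the threshold are illustrative choices; a real implementation would use message passing (e.g. MPI) instead of shared lists:

```python
import math

N = 8
h = 1.0 / N
P = 2                                # number of "processors" (strips)
f = [[0.0] * (N + 1) for _ in range(N + 1)]
u = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):               # Dirichlet boundary g = 1, solution u = 1
    u[k][0] = u[k][N] = u[0][k] = u[N][k] = 1.0

interior = list(range(1, N))
strips = [interior[p * len(interior) // P:(p + 1) * len(interior) // P]
          for p in range(P)]         # strip decomposition of the interior rows

steps = 0
while True:
    u_new = [row[:] for row in u]    # step 1: local Jacobi updates
    local_res = []                   # (the strip loops would run in parallel)
    for my_rows in strips:
        r2 = 0.0
        for i in my_rows:
            for j in range(1, N):
                u_new[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                                      + u[i][j + 1] + u[i + 1][j]
                                      - h * h * f[i][j])
                # residual of the old iterate: r_ij = -4 (u_new - u)
                r2 += (4.0 * (u_new[i][j] - u[i][j])) ** 2
        local_res.append(r2)         # step 4: local residual contribution
    u = u_new                        # steps 2/3: updates become visible
    steps += 1
    res = math.sqrt(sum(local_res))  # step 5: the reduce operation's result
    if res < 1e-6:
        break

print(steps, abs(u[N // 2][N // 2] - 1.0))
```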
Parallelizing Gauß-Seidel

• at first glance, there seems to be an enforced sequential order, since the updated values are used immediately where available
• remedy: change the order of visiting and updating the grid points
• first possibility: wavefront ordering
  – diagonal order of updating
  – all values along a diagonal line can be updated in parallel
  – the single diagonal lines have to be processed sequentially, however
  – problem: suppose we have P = N − 1 processors; then there are P² overall updates that can be organized in 2P − 1 sequential steps (diagonals), which restricts the speed-up to roughly P/2
  – better: use only P = (N − 1)/k processors; then we get k sequential strips of kP² updates each and kP + P − 1 sequential internal steps; now, the speed-up is given by k·kP²/(k(kP + P − 1)), which is roughly kP/(k + 1)
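The wavefront ordering can be sketched in Python: the interior points are visited diagonal by diagonal. Within one diagonal d, the points (i, j) with i + j = d depend only on already-updated values with smaller i + j, so they could be updated concurrently; only the diagonals themselves are sequential. Grid size and step count are illustrative:

```python
# Gauss-Seidel sweep in wavefront (diagonal) order for the discrete
# Poisson equation with f = 0 and boundary values 1 (solution u = 1).

N = 8
h = 1.0 / N
f = [[0.0] * (N + 1) for _ in range(N + 1)]
u = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    u[k][0] = u[k][N] = u[0][k] = u[N][k] = 1.0

for it in range(100):
    for d in range(2, 2 * N - 1):            # the 2P - 1 sequential diagonals
        # all points (i, j) with i + j = d are independent of each other
        for i in range(max(1, d - N + 1), min(N, d)):
            j = d - i
            u[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                              + u[i][j + 1] + u[i + 1][j]
                              - h * h * f[i][j])

print(abs(u[N // 2][N // 2] - 1.0))
```

For the five-point star this ordering produces exactly the same iterates as the lexicographic sweep, since each point still sees new values from its south and west neighbours and old values from the other two.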
Parallelizing Gauß-Seidel (cont'd)

• second possibility: red-black or checkerboard ordering
  – give the grid points a checkerboard colouring of red (!) and black
  – order of visiting and updating: first lexicographically the red ones, then lexicographically the black ones
  – no dependencies within the red set nor within the black set
  – subdivide the grid such that each processor has some red and some black points (roughly the same number)
  – the result: two necessarily sequential steps (red and black), but perfect parallelism within each of them
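The red-black ordering can be sketched in Python: points with even i + j form one colour, odd i + j the other. Every red update reads only black values and vice versa, so each half-sweep is perfectly parallel; only the two half-sweeps are sequential. Same model problem as above (f = 0, boundary values 1, exact solution u ≡ 1); the parity choice and step count are illustrative:

```python
# Red-black Gauss-Seidel sweep for the discrete Poisson equation.

N = 8
h = 1.0 / N
f = [[0.0] * (N + 1) for _ in range(N + 1)]
u = [[0.0] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    u[k][0] = u[k][N] = u[0][k] = u[N][k] = 1.0

for it in range(100):
    for colour in (0, 1):                   # red half-sweep, then black
        for i in range(1, N):
            for j in range(1, N):
                if (i + j) % 2 == colour:   # updates within one colour are
                    u[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]    # independent
                                      + u[i][j + 1] + u[i + 1][j]
                                      - h * h * f[i][j])

print(abs(u[N // 2][N // 2] - 1.0))
```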
6.2. CG Algorithm

Conjugate Gradients

• above method + efficient construction of the conjugate directions
• principle of construction: Gram-Schmidt conjugation of the residuals r^(i)
• no detailed derivation here, just the algorithm:

        repeat for i = 0, 1, 2, ...:
            α_i = (d^(i)T r^(i)) / (d^(i)T A d^(i))
            x^(i+1) = x^(i) + α_i d^(i)
            r^(i+1) = r^(i) − α_i A d^(i)
            β_{i+1} = (r^(i+1)T r^(i+1)) / (r^(i)T r^(i))
            d^(i+1) = r^(i+1) + β_{i+1} d^(i)

• faster than steepest descent, but the number of steps still depends on the number of unknowns!
• the search spaces form a so-called Krylov sequence:

        span{d^(0), ..., d^(i−1)} = span{d^(0), A d^(0), ..., A^(i−1) d^(0)}
                                  = span{r^(0), A r^(0), ..., A^(i−1) r^(0)}

• other famous Krylov methods: GMRES, Bi-CGSTAB
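The algorithm above can be sketched in pure Python. The small test system (a symmetric positive definite tridiagonal matrix with diagonal 4 and off-diagonals −1) is an illustrative choice, not the Poisson matrix from the earlier slides; in exact arithmetic CG reaches the exact solution after at most n steps:

```python
# Conjugate gradients for a symmetric positive definite system A x = b,
# following the update formulas above (x^(0) = 0, d^(0) = r^(0) = b).

def cg(A, b, tol=1e-12):
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # r^(0) = b - A x^(0) = b
    d = r[:]                                  # d^(0) = r^(0)
    rr = sum(ri * ri for ri in r)
    for _ in range(n):                        # exact after at most n steps
        if rr <= tol * tol:
            break
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = (sum(di * ri for di, ri in zip(d, r))
                 / sum(di * adi for di, adi in zip(d, Ad)))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = sum(ri * ri for ri in r)
        beta = rr_new / rr                    # Gram-Schmidt conjugation of r's
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

n = 5
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x = cg(A, b)
residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))
print(residual)   # essentially zero
```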
6.3. Other Algorithms

• just some name dropping, due to lack of time
• graph partitioning:
  – take a graph and try to define P subsets of points such that the number of connections (edges) between the subsets becomes as small as possible
  – example 1: an arbitrary sparse matrix; the unknowns are the points of the graph, non-zero matrix entries are the edges; how to parallelize an iterative algorithm?
  – example 2 (and closely related): a finite element mesh; grid points are the points of the graph, neighbourship relations are the edges; how to define subdomains in an optimal way?
• domain decomposition methods
• ...