Parallel Numerical Simulation · Ioan Lucian Muntean Fifth SimLab Short Course on Parallel...

Examples of Parallel Algorithms

Ioan Lucian Muntean

Department of Computer Science – Chair V, Technische Universität München, Germany

Fifth SimLab Short Course on Parallel Numerical Simulation, Belgrade, October 1-7, 2006

October 5, 2006



6.1. The Jacobi and Gauß-Seidel Iterations

• scenario:

– solve an elliptic partial differential equation (PDE) with Dirichlet boundary conditions on a given domain Ω

– simple example: Poisson's equation ∆u = f on the unit square Ω = ]0, 1[² with u given on Ω's boundary

$$\Delta u(x, y) = \frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = f(x, y) \quad \text{for } (x, y) \in \Omega,$$

$$u(x, y) = g(x, y) \quad \text{for } (x, y) \in \partial\Omega$$

– the function u(x, y) (or an approximation to it) has to be found

– occurrences: a fitted membrane, the stationary heat equation, ...

• discretization:

– for its solution, the PDE has to be discretized

– again a simple example: the finite difference discretization for mesh width h

$$\frac{\partial^2 u(x, y)}{\partial x^2} \approx \frac{u(x - h, y) - 2u(x, y) + u(x + h, y)}{h^2},$$

$$\frac{\partial^2 u(x, y)}{\partial y^2} \approx \frac{u(x, y - h) - 2u(x, y) + u(x, y + h)}{h^2}$$
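The two difference quotients above can be checked numerically. The following is a minimal sketch, assuming a test function u(x, y) = sin(πx) sin(πy) that is not part of the slides and was chosen only because its Laplacian is known in closed form; all function names are illustrative.

```python
import math

def second_difference_x(u, x, y, h):
    """Central second difference in x-direction with mesh width h."""
    return (u(x - h, y) - 2.0 * u(x, y) + u(x + h, y)) / (h * h)

def second_difference_y(u, x, y, h):
    """Central second difference in y-direction with mesh width h."""
    return (u(x, y - h) - 2.0 * u(x, y) + u(x, y + h)) / (h * h)

def u_test(x, y):
    # Assumed test function (not from the slides).
    return math.sin(math.pi * x) * math.sin(math.pi * y)

def laplacian_exact(x, y):
    # Exact Laplacian of u_test: -2 * pi^2 * u_test.
    return -2.0 * math.pi ** 2 * u_test(x, y)

def discretization_error(h, x=0.3, y=0.4):
    """Absolute error of the discrete Laplacian at the point (x, y)."""
    approx = (second_difference_x(u_test, x, y, h)
              + second_difference_y(u_test, x, y, h))
    return abs(approx - laplacian_exact(x, y))
```

Halving h should reduce the error by a factor of about four, which reflects the second-order accuracy of the central differences.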


The Jacobi and Gauß-Seidel Iterations (cont’d)

• discretization (cont’d):

– introduce an equidistant grid of $(N+1)^2$ grid points $u_{i,j} \approx u(ih, jh)$, $i = 0, \dots, N$, $j = 0, \dots, N$, $N = 1/h$

– resulting discrete equation in the interior:

$$u_{i,j-1} + u_{i-1,j} - 4u_{i,j} + u_{i+1,j} + u_{i,j+1} = h^2 f_{i,j}, \quad 0 < i, j < N$$

this scheme is called the five-point difference star

– resulting equation on the boundary:

$$u_{i,j} = g(ih, jh), \quad i = 0 \text{ or } i = N \text{ or } j = 0 \text{ or } j = N$$


The Resulting System of Linear Equations

• for each inner point, one linear equation in the unknowns $u_{i,j}$

• equations in points next to the boundary (i.e. i = 1 or i = N−1 or j = 1 or j = N−1) access the boundary values

– these are shifted to the right-hand side of the equation

– hence, all unknowns are located to the left of the '=' sign, all known quantities to its right

• assemble the overall vector of unknowns by lexicographic row-wise ordering

• result: system Ax = b of $(N-1)^2$ linear equations in $(N-1)^2$ unknowns

• matrix A is block-tridiagonal with identity or tridiagonal blocks I or T, resp.

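$$A = \begin{pmatrix} T & I & & & \\ I & T & I & & \\ & I & \ddots & \ddots & \\ & & \ddots & \ddots & I \\ & & & I & T \end{pmatrix}, \qquad T = \begin{pmatrix} -4 & 1 & & & \\ 1 & -4 & 1 & & \\ & 1 & \ddots & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & 1 & -4 \end{pmatrix} \in \mathbb{R}^{(N-1) \times (N-1)}$$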
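The block structure can be made visible by assembling A for a small N. This is a sketch only: it stores A as a dense list of lists, which is fine for illustration but defeats the purpose of sparsity for realistic N; the helper names `poisson_matrix` and `block` are hypothetical.

```python
def poisson_matrix(N):
    """Dense (N-1)^2 x (N-1)^2 matrix of the five-point star,
    unknowns numbered row by row (lexicographic ordering)."""
    n = N - 1                      # interior points per grid row
    M = n * n                      # total number of unknowns
    A = [[0.0] * M for _ in range(M)]
    for j in range(n):             # grid row (0-based among interior rows)
        for i in range(n):         # position within the grid row
            k = j * n + i          # lexicographic index of u_{i+1,j+1}
            A[k][k] = -4.0
            if i > 0:
                A[k][k - 1] = 1.0  # left neighbour, same block T
            if i < n - 1:
                A[k][k + 1] = 1.0  # right neighbour, same block T
            if j > 0:
                A[k][k - n] = 1.0  # neighbour in previous row, block I
            if j < n - 1:
                A[k][k + n] = 1.0  # neighbour in next row, block I
    return A

def block(A, n, p, q):
    """Return the n x n block in block row p and block column q of A."""
    return [row[q * n:(q + 1) * n] for row in A[p * n:(p + 1) * n]]
```

For N = 4, `block(A, 3, 0, 0)` is the tridiagonal block T, `block(A, 3, 0, 1)` is the identity, and `block(A, 3, 0, 2)` is zero. Boundary neighbours do not appear in A because their values have been moved to the right-hand side.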


Solving Large Sparse Systems of Linear Equations

• the standard textbook method is Gaussian elimination

• this is a so-called direct solver, which provides the exact solution of the system (apart from round-off errors)

• drawbacks of Gaussian elimination:

– for M unknowns, one needs $O(M^3)$ arithmetic operations (not acceptable for the really large M that are standard in modern simulation problems)

– the algorithm does not exploit the sparsity of the matrix: existing zeroes are "destroyed" (turned into non-zeroes), which produces more computational work and more storage requirements

• therefore: use iterative methods instead

– they approach the exact solution and approximate it, but typically don't reach it

– one step of iteration costs $O(M)$ operations

– typically far fewer than $O(M^2)$ steps are needed, so the total cost stays below the $O(M^3)$ of Gaussian elimination (the gain)

– ideal case (multigrid or multilevel methods): only $O(1)$ steps needed

– basic (and not that sophisticated) methods (number of steps still depending on M):

* relaxation methods: Jacobi, Gauß-Seidel, SOR

* minimization methods: steepest descent, conjugate gradients


The Jacobi Iteration

• decompose A into its diagonal part $D_A$, its upper triangular part $U_A$, and its lower triangular part $L_A$: $A = L_A + D_A + U_A$

• starting point: $b = Ax = D_A x + (L_A + U_A)x$

• writing $b = D_A x^{(it+1)} + (L_A + U_A)x^{(it)}$ with $x^{(it)}$ denoting the approximation to x after it steps of the iteration leads to the following iterative scheme:

$$x^{(it+1)} := -D_A^{-1}(L_A + U_A)x^{(it)} + D_A^{-1}b = x^{(it)} + D_A^{-1}r^{(it)}$$

where the residual is defined as $r^{(it)} = b - Ax^{(it)}$

• or in a more explicit algorithmic form:

for it=0,1,2,...:
    for k=1,...,M:
        $x_k^{(it+1)} = \frac{1}{a_{k,k}} \Bigl( b_k - \sum_{j \neq k} a_{k,j} x_j^{(it)} \Bigr)$

• for our special A resulting from the finite difference discretization of the Poisson equation, this means (pay attention to the indices!):

for it=0,1,2,...:
    for j=1,...,N-1:
        for i=1,...,N-1:
            $u_{i,j}^{(it+1)} = \frac{1}{4} \Bigl( u_{i,j-1}^{(it)} + u_{i-1,j}^{(it)} + u_{i,j+1}^{(it)} + u_{i+1,j}^{(it)} - h^2 f_{i,j} \Bigr)$

• remember that the boundary values are fixed
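The grid form of the Jacobi iteration translates almost line by line into code. The following is a minimal sequential sketch, assuming the grid is stored as a nested list `u[i][j]` of size (N+1) × (N+1) with the boundary values already filled in; the stopping rule (maximum norm of the residual below `tol`) and all names are illustrative choices, not prescribed by the slides.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi step; reads only old values and returns a new grid."""
    N = len(u) - 1
    new = [row[:] for row in u]          # boundary values are copied unchanged
    for j in range(1, N):
        for i in range(1, N):
            new[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                                + u[i][j + 1] + u[i + 1][j]
                                - h * h * f[i][j])
    return new

def max_residual(u, f, h):
    """Maximum norm of h^2 f minus the five-point star applied to u."""
    N = len(u) - 1
    return max(abs(h * h * f[i][j]
                   - (u[i][j - 1] + u[i - 1][j] - 4.0 * u[i][j]
                      + u[i + 1][j] + u[i][j + 1]))
               for j in range(1, N) for i in range(1, N))

def solve_jacobi(u, f, h, tol=1e-8, max_it=10000):
    """Repeat Jacobi sweeps until the residual drops below tol."""
    it = 0
    while max_residual(u, f, h) > tol and it < max_it:
        u = jacobi_sweep(u, f, h)
        it += 1
    return u, it
```

Because every update reads only values of the previous step, the two inner loops may be executed in any order.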


The Gauß-Seidel Iteration

• take the same decomposition $A = L_A + D_A + U_A$

• new starting point: $b = Ax = (D_A + L_A)x + U_A x$

• writing $b = (D_A + L_A)x^{(it+1)} + U_A x^{(it)}$ leads to the following iterative scheme:

$$x^{(it+1)} := -(D_A + L_A)^{-1} U_A x^{(it)} + (D_A + L_A)^{-1} b = x^{(it)} + (D_A + L_A)^{-1} r^{(it)}$$

• or in a more explicit algorithmic form:

for it=0,1,2,...:
    for k=1,...,M:
        $x_k^{(it+1)} = \frac{1}{a_{k,k}} \Bigl( b_k - \sum_{j=1}^{k-1} a_{k,j} x_j^{(it+1)} - \sum_{j=k+1}^{M} a_{k,j} x_j^{(it)} \Bigr)$

• for our special A resulting from the finite difference discretization of the Poisson equation, this means (pay attention to the indices!):

for it=0,1,2,...:
    for j=1,...,N-1:
        for i=1,...,N-1:
            $u_{i,j}^{(it+1)} = \frac{1}{4} \Bigl( u_{i,j-1}^{(it+1)} + u_{i-1,j}^{(it+1)} + u_{i,j+1}^{(it)} + u_{i+1,j}^{(it)} - h^2 f_{i,j} \Bigr)$

• remember again that the boundary values are fixed

• Gauß-Seidel is not superior to Jacobi in general; for the model problem discussed here, however, Gauß-Seidel converges twice as fast as Jacobi, i.e. it needs roughly half as many iterations
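The difference between the two schemes is small in code: Gauß-Seidel overwrites the grid in place, so the values with indices (i, j−1) and (i−1, j) are already the new ones when point (i, j) is visited. The sketch below counts the iterations both methods need on a small model problem; the test data (f = 0, boundary values g(x, y) = x + y), the tolerance, and all names are assumptions made for illustration.

```python
def residual_norm(u, f, h):
    """Maximum norm of h^2 f minus the five-point star applied to u."""
    N = len(u) - 1
    return max(abs(h * h * f[i][j]
                   - (u[i][j - 1] + u[i - 1][j] - 4.0 * u[i][j]
                      + u[i + 1][j] + u[i][j + 1]))
               for j in range(1, N) for i in range(1, N))

def jacobi_sweep(u, f, h):
    """Jacobi: all updates read the old grid u and go into a copy."""
    N = len(u) - 1
    new = [row[:] for row in u]
    for j in range(1, N):
        for i in range(1, N):
            new[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                                + u[i][j + 1] + u[i + 1][j] - h * h * f[i][j])
    return new

def gauss_seidel_sweep(u, f, h):
    """Gauss-Seidel: in-place update, new values are used immediately."""
    N = len(u) - 1
    for j in range(1, N):
        for i in range(1, N):
            u[i][j] = 0.25 * (u[i][j - 1] + u[i - 1][j]
                              + u[i][j + 1] + u[i + 1][j] - h * h * f[i][j])
    return u

def count_iterations(sweep, u, f, h, tol=1e-8, max_it=100000):
    """Number of sweeps until the residual norm drops below tol."""
    u = [row[:] for row in u]            # work on a copy of the start grid
    it = 0
    while residual_norm(u, f, h) > tol and it < max_it:
        u = sweep(u, f, h)
        it += 1
    return it

def model_problem(N):
    """Assumed test data: f = 0, boundary values g(x, y) = x + y."""
    h = 1.0 / N
    u = [[(i + j) * h if i in (0, N) or j in (0, N) else 0.0
          for j in range(N + 1)] for i in range(N + 1)]
    f = [[0.0] * (N + 1) for _ in range(N + 1)]
    return u, f, h
```

Calling `count_iterations(jacobi_sweep, *model_problem(16))` and `count_iterations(gauss_seidel_sweep, *model_problem(16))` should give a ratio close to two, in line with the statement above.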


Parallelizing Jacobi

• note that neither Jacobi nor Gauß-Seidel is used any more today because they are too slow; nevertheless, the algorithmic aspects are still of interest

• a parallel Jacobi algorithm is quite straightforward:

– in the current iteration step, only values from the previous step are used

– hence, all updates of one iteration step can be made in parallel (if that many processors are available)

– more realistic scenario: subdivide the domain into strips or squares, for example (what is better with respect to a good communication-computation ratio?)
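The question about strips versus squares can be explored with a simple count. The sketch below assumes an n × n grid of unknowns distributed over P processors, looks at a subdomain in the interior of the domain (two neighbours for a strip, four for a square), and assumes that P is a perfect square whose root divides n in the square case; the function names are illustrative.

```python
import math

def halo_values_strip(n):
    """Values an interior strip receives per step: one full row of n
    values from each of its two neighbours."""
    return 2 * n

def halo_values_square(n, P):
    """Values an interior square receives per step: one edge of length
    n / sqrt(P) from each of its four neighbours."""
    q = math.isqrt(P)              # assumes P = q * q and q divides n
    return 4 * (n // q)

def comm_to_comp_ratio(halo, n, P):
    """Received values per locally updated grid point."""
    return halo / (n * n / P)
```

Under these assumptions both layouts tie at P = 4, and for larger P the squares receive fewer values per updated point, since the strip halo stays at 2n while the square halo shrinks like 4n/√P.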


Parallelizing Jacobi (cont’d)

• each processor needs for its calculations:

– if adjacent to the boundary: a subset of the boundary values

– one row or one column of values from the processors dealing with the neighbouring subdomains

– some hint when to stop

• the above considerations lead to the following algorithm each processor has to execute:

1. update all local approximate values $u_{i,j}^{(it)}$ to $u_{i,j}^{(it+1)}$

2. send all updates in points next to interior boundaries to the respective processors

3. receive all necessary updates from the "neighbouring" processors

4. compute the local residual values and provide them via a reduce operation

5. receive the overall residual as the reduce operation's result and go back to 1. if this value is larger than some given threshold
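The five steps can be tried out without a message-passing library by simulating the processors inside one program. The sketch below is such a simulation and not a real parallel code: it assumes a strip decomposition by rows with P ≤ N − 1, models each processor as a local list of rows with one ghost row above and below, replaces send/receive by copying rows, and replaces the reduce operation by `max`. All names are hypothetical.

```python
def split_rows(N, P):
    """Owned interior row ranges [lo, hi) of P strips (assumes P <= N - 1)."""
    n = N - 1
    bounds = [1 + (n * p) // P for p in range(P + 1)]
    return [(bounds[p], bounds[p + 1]) for p in range(P)]

def local_update(loc, f_loc, h):
    """Step 1: Jacobi update of the owned rows; loc[0] and loc[-1] are
    ghost rows (or rows of the physical boundary) and stay untouched."""
    new = [row[:] for row in loc]
    for r in range(1, len(loc) - 1):
        for j in range(1, len(loc[r]) - 1):
            new[r][j] = 0.25 * (loc[r][j - 1] + loc[r - 1][j]
                                + loc[r][j + 1] + loc[r + 1][j]
                                - h * h * f_loc[r][j])
    return new

def exchange(parts):
    """Steps 2 and 3: rows next to interior boundaries are copied into
    the ghost rows of the neighbouring strips."""
    for p in range(len(parts) - 1):
        parts[p][-1] = parts[p + 1][1][:]   # first owned row of strip p+1
        parts[p + 1][0] = parts[p][-2][:]   # last owned row of strip p

def local_residual(loc, f_loc, h):
    """Step 4, local part: maximum residual over the owned rows."""
    return max(abs(h * h * f_loc[r][j]
                   - (loc[r][j - 1] + loc[r - 1][j] - 4.0 * loc[r][j]
                      + loc[r + 1][j] + loc[r][j + 1]))
               for r in range(1, len(loc) - 1)
               for j in range(1, len(loc[r]) - 1))

def parallel_jacobi(u, f, h, P, tol=1e-8, max_it=10000):
    N = len(u) - 1
    ranges = split_rows(N, P)
    parts = [[u[i][:] for i in range(lo - 1, hi + 1)] for lo, hi in ranges]
    f_parts = [[f[i][:] for i in range(lo - 1, hi + 1)] for lo, hi in ranges]
    for it in range(1, max_it + 1):
        # Step 1: the strips are independent and could run concurrently.
        parts = [local_update(loc, fl, h) for loc, fl in zip(parts, f_parts)]
        exchange(parts)                                    # steps 2 and 3
        residual = max(local_residual(loc, fl, h)          # step 4: reduce
                       for loc, fl in zip(parts, f_parts))
        if residual <= tol:                                # step 5
            break
    result = [row[:] for row in u]         # gather the strips for inspection
    for (lo, hi), loc in zip(ranges, parts):
        for r, i in enumerate(range(lo, hi), start=1):
            result[i] = loc[r][:]
    return result, it
```

Since Jacobi only reads values of the previous step, the simulated run with several strips should reproduce the single-strip run exactly, with the same number of iterations. In a message-passing setting, the row copies in `exchange` would become point-to-point messages and the `max` a global reduction.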


Parallelizing Gauß-Seidel

• at first glance, there seems to be an enforced sequential order, since the updated values are immediately used where available

• remedy: change the order of visiting and updating the grid points

• first possibility: wavefront ordering

– diagonal order of updating

– all values along a diagonal line can be updated in parallel

– the single diagonal lines have to be processed sequentially, however

– problem: suppose we have P = N − 1 processors; then there are $P^2$ updates overall that can be organized in 2P − 1 sequential steps (diagonals), which restricts the speed-up to $P^2/(2P - 1)$, i.e. roughly P/2

– better: use P = (N − 1)/k processors only; then we get k sequential strips of $kP^2$ updates each and $kP + P - 1$ sequential internal steps per strip; now, the speed-up is given by $k \cdot kP^2 / (k(kP + P - 1))$, which is roughly $kP/(k + 1)$
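The wavefront ordering and the two speed-up estimates can be written down directly. This is a small sketch with hypothetical function names; it enumerates the diagonals of an n × n block of interior points (1-based indices) and evaluates the two formulas from the bullets above.

```python
def wavefronts(n):
    """Group the n x n interior points by anti-diagonal i + j = const.
    All points of one group can be updated in parallel."""
    fronts = []
    for s in range(2, 2 * n + 1):
        fronts.append([(i, s - i)
                       for i in range(max(1, s - n), min(n, s - 1) + 1)])
    return fronts

def speedup_full(P):
    """P = N - 1 processors: P^2 updates in 2P - 1 sequential diagonals."""
    return P * P / (2 * P - 1)

def speedup_strips(P, k):
    """P = (N - 1) / k processors: k strips with k * P^2 updates each,
    processed in k * (k * P + P - 1) sequential steps."""
    return (k * k * P * P) / (k * (k * P + P - 1))
```

With P = 8, for example, `speedup_full(8)` is 64/15 (about P/2), while `speedup_strips(8, 4)` is 1024/156, close to the estimate kP/(k + 1) = 6.4. For k = 1 both formulas coincide.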


Parallelizing Gauß-Seidel (cont’d)

• second possibility: red-black or checkerboard ordering

– give the grid points a checkerboard colouring of red (!) and black

– order of visiting and updating: first lexicographically the red ones, then lexicographically the black ones

– no dependences within the red set nor within the black set, since with the five-point star all neighbours of a red point are black and vice versa

– subdivide the grid such that each processor has some red and some black points (roughly the same number)

– the result: two necessarily sequential steps (red and black), but perfect parallelism within each of them
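The independence within one colour can be demonstrated in code. In the sketch below, all new values of one colour are first computed from the current grid and only then written back, which is what processors working simultaneously would do; calling points with even i + j "red" is an arbitrary convention, and the names are illustrative.

```python
def colour_update(u, f, h, colour):
    """Compute the new values of all points of one colour from the
    current grid; no point of that colour reads another one."""
    N = len(u) - 1
    return {(i, j): 0.25 * (u[i][j - 1] + u[i - 1][j]
                            + u[i][j + 1] + u[i + 1][j]
                            - h * h * f[i][j])
            for j in range(1, N) for i in range(1, N)
            if (i + j) % 2 == colour}

def red_black_sweep(u, f, h):
    """One Gauss-Seidel step in red-black ordering (in place)."""
    for colour in (0, 1):          # 0: red (i + j even), 1: black
        for (i, j), value in colour_update(u, f, h, colour).items():
            u[i][j] = value
    return u
```

The result is identical to a sequential in-place sweep that visits first all red and then all black points, because each update only reads points of the other colour.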


6.2. CG Algorithm

Conjugate Gradients

• the method of conjugate directions + efficient construction of the conjugate directions

• principle of construction: Gram-Schmidt conjugation of r’s

• no detailed derivation here, just the algorithm:

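repeat (i):

    $\alpha_i = \frac{d^{(i)T} r^{(i)}}{d^{(i)T} A d^{(i)}}$;

    $x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$;

    $r^{(i+1)} = r^{(i)} - \alpha_i A d^{(i)}$;

    $\beta_{i+1} = \frac{r^{(i+1)T} r^{(i+1)}}{r^{(i)T} r^{(i)}}$;

    $d^{(i+1)} = r^{(i+1)} + \beta_{i+1} d^{(i)}$;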

• faster than steepest descent, but the number of steps still depends on n!

• search spaces form a so-called Krylov sequence:

$$\mathrm{span}\{d^{(0)}, \dots, d^{(i-1)}\} = \mathrm{span}\{d^{(0)}, Ad^{(0)}, \dots, A^{i-1}d^{(0)}\} = \mathrm{span}\{r^{(0)}, Ar^{(0)}, \dots, A^{i-1}r^{(0)}\}$$

• other famous Krylov methods: GMRES, Bi-CGSTAB
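The CG loop above can be sketched in a few lines. The slide does not state the initialization or a stopping rule; the sketch assumes the usual choice d(0) = r(0) = b − Ax(0) and stops when the Euclidean norm of the residual falls below a tolerance. It also assumes a symmetric positive definite matrix stored densely as a list of rows, so the matrix A of section 6.1, which is negative definite, would have to be used as −Ax = −b. All names are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradients(A, b, x0=None, tol=1e-10, max_it=None):
    """CG for a symmetric positive definite A (assumed, not checked)."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # r(0) = b - A x(0)
    d = r[:]                                           # d(0) = r(0)
    rr = dot(r, r)
    steps = 0
    if max_it is None:
        max_it = 10 * n
    while rr ** 0.5 > tol and steps < max_it:
        Ad = matvec(A, d)
        alpha = dot(d, r) / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
        steps += 1
    return x, steps
```

From the point of view of parallelization, each step consists of one matrix-vector product, which for a grid-based A needs the same neighbour exchange as Jacobi, plus inner products, which are reduce operations, and purely local vector updates.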


6.3. Other Algorithms

• just some name dropping, due to lack of time

• graph partitioning:

– take a graph and try to define P subsets of points such that the number of connections (edges) between the subsets becomes as small as possible

– example 1: an arbitrary sparse matrix; unknowns are points of the graph, non-zero matrix entries are edges; how to parallelize an iterative algorithm?

– example 2 (and closely related): a finite element mesh; grid points are points of the graph, neighbourship relations are edges; how to define subdomains in an optimal way?

• domain decomposition methods

• ...
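The quantity that graph partitioning tries to minimize, the number of edges between subsets, is easy to compute for a given partition. The sketch below does this for the graph of an n × n grid with five-point neighbourship and compares two hand-made partitions (strips and square blocks); it does not implement a partitioning algorithm, and all names are hypothetical.

```python
def grid_graph_edges(n):
    """Edges of the n x n grid graph: each point is connected to its
    right and lower neighbour (five-point neighbourship)."""
    edges = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                edges.append(((i, j), (i, j + 1)))
    return edges

def edge_cut(edges, part):
    """Number of edges whose end points lie in different subsets."""
    return sum(1 for a, b in edges if part[a] != part[b])

def strip_partition(n, P):
    """P horizontal strips (assumes P divides n)."""
    return {(i, j): (i * P) // n for i in range(n) for j in range(n)}

def block_partition(n, q):
    """q x q square blocks, i.e. P = q * q subsets (assumes q divides n)."""
    return {(i, j): ((i * q) // n) * q + (j * q) // n
            for i in range(n) for j in range(n)}
```

For n = 8 and P = 4, the strips cut 24 edges and the 2 × 2 blocks cut 16.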