3D bubbly flow simulation on the GPU - Iterative Solution of a linear system using sub-domain and level-set deflation

Rohit Gupta, Martin van Gijzen, Kees Vuik

PDP 2013, Belfast, UK. February 27, 2013


Outline

Problem Description

Preconditioning

Deflation

Deflation Vectors Implementation

Results and Observations

Conclusions


Problem Description

Bubbly Flow

The Mass-Conserving Level-Set method [1] is used to solve the Navier-Stokes equations. The marker function φ changes sign at the interface,

$S(t) = \{ x \mid \phi(x, t) = 0 \}$. (1)

The interface is evolved by advecting the Level-Set function:

$\frac{\partial \phi}{\partial t} + u \cdot \nabla \phi = 0$ (2)

[1] S.P. van der Pijl, A. Segal and C. Vuik. A mass-conserving Level-Set method for modelling of multi-phase flows. International Journal for Numerical Methods in Fluids 2005; 47:339–361.
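As a rough illustration of the advection equation above, the following one-dimensional sketch moves an interface with a constant velocity. The first-order upwind scheme, the grid, the velocity and the helper names are all assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def advect_level_set(phi, u, dx, dt, steps):
    """Advance d(phi)/dt + u * d(phi)/dx = 0 with first-order upwinding (assumes constant u > 0)."""
    phi = phi.copy()
    for _ in range(steps):
        slope = np.empty_like(phi)
        slope[1:] = (phi[1:] - phi[:-1]) / dx  # backward difference is upwind for u > 0
        slope[0] = slope[1]                    # extrapolate at the inflow boundary
        phi -= dt * u * slope
    return phi

def interface_position(x, phi):
    """Return the first zero crossing of phi, located by linear interpolation."""
    i = np.flatnonzero(np.sign(phi[:-1]) != np.sign(phi[1:]))[0]
    return x[i] - phi[i] * (x[i + 1] - x[i]) / (phi[i + 1] - phi[i])

x = np.linspace(0.0, 1.0, 101)
phi0 = x - 0.3                                 # signed distance to an interface at x = 0.3
phi1 = advect_level_set(phi0, u=0.5, dx=x[1] - x[0], dt=0.01, steps=40)
print(interface_position(x, phi1))             # interface moves by u * t = 0.2
```

The interface is never tracked explicitly: it is recovered afterwards as the set where φ changes sign.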


$-\nabla \cdot \left( \frac{1}{\rho(x)} \nabla p(x) \right) = f(x), \quad x \in \Omega$ (1)

$\partial_n p(x) = 0, \quad x \in \partial\Omega$ (2)

The pressure-correction equation above is discretized to a linear system Ax = b. Solving this linear system is the most time-consuming part of the simulation. A is Symmetric Positive Semi-Definite (SPSD), so Conjugate Gradient is the method of choice.
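To see where the SPSD property comes from, here is a minimal one-dimensional sketch of such a discretization with zero-flux boundaries. The cell-centred layout, the averaging of the density at cell faces and the function name are assumptions for illustration only; the talk itself concerns a 3D problem.

```python
import numpy as np

def pressure_matrix_1d(rho, h):
    """Discretize -d/dx((1/rho) dp/dx) on cell centres with zero-flux (Neumann) ends."""
    n = len(rho)
    coeff = 2.0 / (rho[:-1] + rho[1:])   # 1/rho at each interior face (assumed averaging)
    A = np.zeros((n, n))
    for i, c in enumerate(coeff):        # each face couples cells i and i + 1
        A[i, i] += c
        A[i + 1, i + 1] += c
        A[i, i + 1] -= c
        A[i + 1, i] -= c
    return A / h**2

rho = np.ones(20)
rho[8:12] = 1e-3                         # a low-density "bubble" (density contrast 1e-3)
A = pressure_matrix_1d(rho, h=1.0 / 20)
eigenvalues = np.linalg.eigvalsh(A)      # ascending; the smallest is zero up to rounding
```

With pure Neumann boundaries every row of A sums to zero, so the constant vector lies in the null space. That single zero eigenvalue is why the matrix is semi-definite rather than definite.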


Figure: the two bubble configurations, 8 bubbles and 9 bubbles.


Preconditioning

Figure: Spectrum of preconditioned matrices. $16^3$ unknowns. Axes: index and magnitude (logarithmic). Curves: $A$, $D^{-1}A$, $M^{-1}A$ (ic), $M^{-1}A$ (neu2).


Truncated Neumann Series Preconditioning [1, 2]

$M^{-1} = K^T D^{-1} K$, where $K = (I - LD^{-1} + (LD^{-1})^2 + \cdots)$. (1)

1. More terms give a better approximation.

2. In general the series converges if $\| LD^{-1} \|_\infty < 1$.

3. It offers as much parallelism as a Sparse Matrix-Vector Product.

Here L is the strictly lower triangular part of A and D = diag(A).

[1] Henk A. van der Vorst. A vectorizable variant of some ICCG methods. SIAM Journal on Scientific and Statistical Computing, Vol. 3, No. 3, September 1982.

[2] P.F. Dubois. Approximating the Inverse of a Matrix for use in Iterative Algorithms on Vector Processors. Computing (22), 1979.


Deflation

Background

Deflation removes small eigenvalues from the eigenvalue spectrum of $M^{-1}A$. The linear system Ax = b can then be solved by employing the splitting

$x = (I - P^T)x + P^T x$, where $P = I - AQ$, (2)

$\Leftrightarrow Pb = PAx$. (3)

Here $Q = Z E^{-1} Z^T$ and $E = Z^T A Z$. E is the coarse system that is solved every iteration. Z is the deflation sub-space matrix; it contains an approximation of the eigenvectors of $M^{-1}A$.

For our experiments Z consists of piecewise constant vectors.
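The definitions above can be checked numerically on a small example. In this sketch the dense matrices, the helper names and the SPD stand-in for A are assumptions; with the singular pressure matrix and a Z whose columns sum to the constant vector, E would be singular.

```python
import numpy as np

def subdomain_vectors(n, k):
    """Z with k piecewise-constant columns: column j is 1 on the j-th block of unknowns."""
    Z = np.zeros((n, k))
    for j, idx in enumerate(np.array_split(np.arange(n), k)):
        Z[idx, j] = 1.0
    return Z

def deflation_operators(A, Z):
    """Return (P, Q, E) with E = Z^T A Z, Q = Z E^{-1} Z^T and P = I - A Q."""
    E = Z.T @ A @ Z
    Q = Z @ np.linalg.solve(E, Z.T)
    P = np.eye(A.shape[0]) - A @ Q
    return P, Q, E

n = 12
A = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD stand-in matrix
Z = subdomain_vectors(n, 3)
P, Q, E = deflation_operators(A, Z)
print(np.abs(P @ A @ Z).max())           # P A Z vanishes up to rounding
```

P is a projector, and PA maps every column of Z to zero. This is how the components spanned by Z are removed from the system that CG sees.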


Deflated Preconditioned Conjugate Gradient Algorithm

1: Select $x_0$. Compute $r_0 := b - Ax_0$ and $\hat{r}_0 = P r_0$.
2: Solve $M y_0 = \hat{r}_0$ and set $p_0 := y_0$.
3: for j := 0, ..., until convergence do
4:   $w_j := P A p_j$
5:   $\alpha_j := (\hat{r}_j, y_j) / (p_j, w_j)$
6:   $x_{j+1} := x_j + \alpha_j p_j$
7:   $\hat{r}_{j+1} := \hat{r}_j - \alpha_j w_j$
8:   Solve $M y_{j+1} = \hat{r}_{j+1}$
9:   $\beta_j := (\hat{r}_{j+1}, y_{j+1}) / (\hat{r}_j, y_j)$
10:  $p_{j+1} := y_{j+1} + \beta_j p_j$
11: end for
12: $x_{it} := Qb + P^T x_{j+1}$


Operations involved in deflation [1, 2]

$a_1 = Z^T p$, $m = E^{-1} a_1$, $a_2 = A Z m$, $w = p - a_2$,

where $E = Z^T A Z$ is the Galerkin matrix and Z is the matrix of deflation vectors.

[1] J.M. Tang, C. Vuik. Efficient deflation methods applied to 3-D bubbly flow problems. Elec. Trans. Numer. Anal. 2007.

[2] C. Vuik, A. Segal, J.A. Meijerink. An efficient preconditioned CG method for the solution of a class of layered problems with extreme contrasts in the coefficients. J. Comput. Phys. 1999.


Choices for Deflation Vectors

Figure: Stripe-wise, Block-wise and Level-Set deflation vectors.


Stripe-wise vectors are effective for simpler problems [1].

Figure: Piece-wise constant Stripe vectors.

With piece-wise constant stripe-wise vectors, the first deflation operation $Z^T p$ reduces to a reduction, and the matrix-vector product $AZm$ can be optimized as much as $Ax$.

[1] R. Gupta, Martin B. van Gijzen and Kees Vuik. Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU. Proceedings of VECPAR 2012, Springer LNCS.
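A small sketch of why Z never needs to be stored for stripe-wise vectors: $Z^T p$ is a sum over each stripe and $Zm$ is a repeat. The function names and the assumption that the n unknowns split into k equal, contiguous stripes are illustrative only.

```python
import numpy as np

def stripe_restrict(p, k):
    """Z^T p for k equal-length stripes: sum p over each contiguous stripe."""
    return p.reshape(k, -1).sum(axis=1)

def stripe_prolong(m, n):
    """Z m: repeat each coarse value over its stripe, giving a vector of length n."""
    return np.repeat(m, n // len(m))

n, k = 12, 3
p = np.arange(n, dtype=float)
m = np.array([1.0, -2.0, 0.5])
Z = np.kron(np.eye(k), np.ones((n // k, 1)))   # explicit Z, built only for comparison
print(stripe_restrict(p, k))                   # same values as Z.T @ p
```

Once `stripe_prolong(m, n)` has expanded m to full length, $AZm$ is an ordinary product of A with a vector, which is why it can reuse the optimizations of $Ax$.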


Deflation Vectors contd...

1. Sub-domain and Stripe-wise Vectors are simple to create.

2. Level-Set vectors require information from the level-set function. One way of defining them is one vector per bubble.

3. Level-Set Sub-domain vectors aim at capturing parts of bubbles cut by sub-domains. Furthermore, they combine this information with bubble interfaces within sub-domains.
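One possible reading of item 3 is sketched below: every distinct combination of sub-domain and bubble that occurs in the grid gets its own indicator vector. This construction, the function name and the integer labelling (0 for the surrounding liquid) are assumptions for illustration; the talk does not spell out the exact definition, and the vector counts it reports may follow a different rule.

```python
import numpy as np

def lssd_vectors(subdomain, bubble):
    """One indicator column per distinct (sub-domain, bubble) pair that actually occurs."""
    key = subdomain * (bubble.max() + 1) + bubble      # unique integer per pair
    _, column = np.unique(key, return_inverse=True)
    Z = np.zeros((len(key), column.max() + 1))
    Z[np.arange(len(key)), column] = 1.0
    return Z

subdomain = np.repeat(np.arange(3), 4)                 # 12 cells, 3 sub-domains of 4 cells
bubble = np.zeros(12, dtype=int)                       # 0 marks the surrounding liquid
bubble[3:6] = 1                                        # bubble 1 straddles sub-domains 0 and 1
Z = lssd_vectors(subdomain, bubble)
print(Z.shape)
```

In this toy case the bubble is cut by a sub-domain boundary, so it contributes two vectors, one for each piece.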


Experiments and Results

Stopping Criteria: $\frac{\| b - A x_k \|_2}{\| r_0 \|_2} \le \epsilon$

1. $\epsilon$ is the tolerance we set for the solution.

2. $x_k$ is the solution vector after k iterations of (P)CG.

3. $r_0$ is the initial residual.
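The criterion above translates directly into code. The function name and the tiny diagonal system are assumptions for this example.

```python
import numpy as np

def has_converged(A, b, x_k, r0, eps):
    """True when ||b - A x_k||_2 / ||r0||_2 <= eps (written without the division)."""
    return np.linalg.norm(b - A @ x_k) <= eps * np.linalg.norm(r0)

A = np.diag([2.0, 4.0])
b = np.array([2.0, 4.0])
x0 = np.zeros(2)
r0 = b - A @ x0                                  # initial residual
print(has_converged(A, b, x0, r0, 1e-6))         # starting guess: not converged
print(has_converged(A, b, np.ones(2), r0, 1e-6)) # exact solution: converged
```

Multiplying through by the norm of $r_0$ avoids a division by zero when the initial guess already solves the system.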


Hardware

1. CPU - single core of an E8500 (3.16 GHz).

2. GPU - Tesla C2070.

Software

1. The inner system solve on the CPU is with CG. On the GPU it is an explicit-inverse-based solution.

2. First-level preconditioning on the CPU is Incomplete Cholesky (IC). On the GPU it is Truncated Neumann Series based.

3. The deflation operation is highly optimized on the CPU.

4. All deflation vectors are piece-wise constant.


Timing and Speedup Definition

Speedup is measured as the ratio of the time taken (T) to complete k iterations (of the DPCG method) on the two different architectures,

$\text{Speedup} = \frac{T_{CPU}}{T_{GPU}}$ (2)

Number of Unknowns = $128^3$. Tolerance set to $10^{-6}$. Density Contrast is $10^{-3}$.

Naming of deflation vectors:

SD-i -> Sub-domain deflation with i vectors.
LS-i -> Level-Set deflation with i vectors.
LSSD-i -> Level-Set Sub-domain deflation with i vectors.


8 bubbles - 8 Sub-domains

| | CPU, DICCG(0), SD-8 | GPU-CUSP, DPCG(neu2), SD-7 | GPU-CUSP, DPCG(neu2), LS-7 | GPU-CUSP, DPCG(neu2), LSSD-15 |
|---|---|---|---|---|
| Number of Iterations | 197 | 245 | 381 | 203 |
| Total Time | 33.79 | 7.4 | 9.5 | 6.6 |
| Iteration Time | 33.49 | 4.4 | 6.5 | 3.6 |
| Speedup | - | 7.6 | 5.1 | 9.3 |

Table: 8 bubbles. Comparison of deflation vector choices on the GPU (CUSP & CUSPARSE based implementation).
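The Speedup row in the 8-bubble table above is consistent with dividing the CPU iteration time by each GPU iteration time, rather than using the total times. This is an observation from the numbers, not something stated on the slide; the short check below uses only values from that table, and the variable names are illustrative.

```python
cpu_iteration_time = 33.49                       # DICCG(0), SD-8
gpu_iteration_time = {"SD-7": 4.4, "LS-7": 6.5, "LSSD-15": 3.6}
reported_speedup = {"SD-7": 7.6, "LS-7": 5.1, "LSSD-15": 9.3}

# Speedup = T_CPU / T_GPU, evaluated on the iteration times.
speedup = {name: cpu_iteration_time / t for name, t in gpu_iteration_time.items()}
for name, value in speedup.items():
    print(f"{name}: computed {value:.2f}, reported {reported_speedup[name]}")
```

The computed ratios agree with the reported ones to within the rounding of the tabulated times.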


9 bubbles - 8 Sub-domains

| | CPU, DICCG(0), SD-8 | GPU-CUSP, DPCG(neu2), SD-7 | GPU-CUSP, DPCG(neu2), LS-7 | GPU-CUSP, DPCG(neu2), LSSD-23 |
|---|---|---|---|---|
| Number of Iterations | 508 | 632 | 381 | 206 |
| Total Time | 85.9 | 14.4 | 9.3 | 6.8 |
| Iteration Time | 85.6 | 11.3 | 6.5 | 3.8 |
| Speedup | - | 7.57 | 13.1 | 22.5 |

Table: 9 bubbles. Comparison of deflation vector choices for deflation on the GPU (CUSP based implementation) vs. CPU.


9 bubbles - 64 Sub-domains

| | CPU, DICCG(0), SD-64 | GPU-CUSP, DPCG(neu2), SD-63 | GPU-CUSP, DPCG(neu2), LSSD-135 |
|---|---|---|---|
| Number of Iterations | 472 | 603 | 136 |
| Total Time | 81.39 | 13.61 | 5.58 |
| Iteration Time | 81.1 | 10.61 | 2.48 |
| Speedup | - | 7.64 | 32.7 |

Table: 9 bubbles. Two deflation variants. GPU and CPU Execution Times and Speedup. 64 sub-domains.


9 bubbles - 512 Sub-domains

| | CPU, DICCG(0), SD-512 | GPU-CUSP, DPCG(neu2), SD-511 | GPU-CUSP, DPCG(neu2), LSSD-583 |
|---|---|---|---|
| Number of Iterations | 67 | 81 | 81 |
| Total Time | 12.51 | 4.56 | 4.62 |
| Iteration Time | 12.18 | 1.56 | 1.62 |
| Speedup | - | 7.81 | 7.52 |

Table: 9 bubbles. Two deflation variants. GPU and CPU Execution Times and Speedup. 512 sub-domains.


Figure: Comparison of Speedup on GPU with OpenMP parallelization of CPU code


Conclusions and Outlook

1. Sub-domain deflation can be an effective and computationally advantageous deflation vector choice on the GPU.

2. Level-Set Sub-domain deflation can handle cases when bubbles cut sub-domains.

3. OpenMP-based parallelization improves the CPU implementation; however, sequential IC preconditioning dominates the CPU code.

More work is needed to scale this approach of solving an ill-conditioned linear system on multi-level parallel systems (e.g. with MPI).


Further References

1. R. Gupta, Martin B. van Gijzen and Kees Vuik, Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU, Proceedings of VECPAR 2012, Springer LNCS.

2. R. Gupta, Master's Thesis, Implementation of the Deflated Preconditioned Conjugate Gradient Method for Bubbly Flow on the Graphical Processing Unit (GPU), 2010.

3. Jok M. Tang, PhD Thesis, Two-level Preconditioned Conjugate Gradient Methods with Applications to Bubbly Flows, TU Delft, 2008.


Questions/Suggestions/Comments
