Recent advances in HPC with FreeFem+++/uploads/Main/P-Jolivet.pdf · Recent advances in HPC with...

42
Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud’homme.

Transcript of Recent advances in HPC with FreeFem+++/uploads/Main/P-Jolivet.pdf · Recent advances in HPC with...

  • Recent advances in HPC with FreeFem++

    Pierre Jolivet

    Laboratoire Jacques-Louis LionsLaboratoire Jean Kuntzmann

    Fourth workshop on FreeFem++

    December 6, 2012

    With F. Hecht, F. Nataf, C. Prud’homme.

  • Introduction New solvers Domain decomposition methods Conclusion

    Outline

    1 IntroductionFlashback to last yearA new machine

    2 New solversMUMPSPARDISO

    3 Domain decomposition methodsThe necessary toolsPreconditioning Krylov methodsNonlinear problems

    4 Conclusion

    2 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Where were we one year ago ?

    What we had:• FreeFem++ version 3.17,• a “simple” toolbox for domain decomposition methods,• 3 supercomputers (SGI, Bull, IBM).

    What we achieved:• linear problems up to 120 million unkowns in few minutes,• good scaling up to ∼2048 processes on a BlueGene/P.

    3 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ is working on the following parallel architectures:N◦ of cores Memory Peak performance

    hpc1@LJLL [email protected] GHz 640 Go ∼ 10 TFLOP/stitane@CEA [email protected] GHz 37 To 140 TFLOP/sbabel@IDRIS 40960@850 MHz 20 To 139 TFLOP/s

    curie@CEA [email protected] GHz 320 To 1.6 PFLOP/s

    http://www-hpc.cea.fr, Bruyères-le-Châtel, France.http://www.idris.fr, Orsay, France (grant PETALh).

    http://www.prace-project.eu (grant HPC-PDE).

    4 / 26 Pierre Jolivet HPC and FreeFem++

    http://www-hpc.cea.frhttp://www.idris.frhttp://www.prace-project.eu

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ is working on the following parallel architectures:N◦ of cores Memory Peak performance

    hpc1@LJLL [email protected] GHz 640 Go ∼ 10 TFLOP/stitane@CEA [email protected] GHz 37 To 140 TFLOP/sbabel@IDRIS 40960@850 MHz 20 To 139 TFLOP/scurie@CEA [email protected] GHz 320 To 1.6 PFLOP/s

    http://www-hpc.cea.fr, Bruyères-le-Châtel, France.http://www.idris.fr, Orsay, France (grant PETALh).http://www.prace-project.eu (grant HPC-PDE).

    4 / 26 Pierre Jolivet HPC and FreeFem++

    http://www-hpc.cea.frhttp://www.idris.frhttp://www.prace-project.eu

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ and the linear solvers

    set(A, solver = sparsesolver) in FreeFem++⇓

    instance of VirtualSolver from which inherits a solver.

    Interfacing a new solver basically consists in implementing:1 the constructor MySolver(const MatriceMorse&)2 the pure virtual method

    void Solver(const MatriceMorse&, KN_&,const KN_&) const

    3 the destructor MySolver(const MatriceMorse&)

    real[int] x = Aˆ-1 * b will call MySolver::Solver.

    5 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ and the linear solvers

    set(A, solver = sparsesolver) in FreeFem++⇓

    instance of VirtualSolver from which inherits a solver.

    Interfacing a new solver basically consists in implementing:1 the constructor MySolver(const MatriceMorse&)

    2 the pure virtual methodvoid Solver(const MatriceMorse&, KN_&,

    const KN_&) const3 the destructor MySolver(const MatriceMorse&)

    real[int] x = Aˆ-1 * b will call MySolver::Solver.

    5 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ and the linear solvers

    set(A, solver = sparsesolver) in FreeFem++⇓

    instance of VirtualSolver from which inherits a solver.

    Interfacing a new solver basically consists in implementing:1 the constructor MySolver(const MatriceMorse&)2 the pure virtual method

    void Solver(const MatriceMorse&, KN_&,const KN_&) const

    3 the destructor MySolver(const MatriceMorse&)

    real[int] x = Aˆ-1 * b will call MySolver::Solver.

    5 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ and the linear solvers

    set(A, solver = sparsesolver) in FreeFem++⇓

    instance of VirtualSolver from which inherits a solver.

    Interfacing a new solver basically consists in implementing:1 the constructor MySolver(const MatriceMorse&)2 the pure virtual method

    void Solver(const MatriceMorse&, KN_&,const KN_&) const

    3 the destructor MySolver(const MatriceMorse&)

    real[int] x = Aˆ-1 * b will call MySolver::Solver.

    5 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    FreeFem++ and the linear solvers

    set(A, solver = sparsesolver) in FreeFem++⇓

    instance of VirtualSolver from which inherits a solver.

    Interfacing a new solver basically consists in implementing:1 the constructor MySolver(const MatriceMorse&)2 the pure virtual method

    void Solver(const MatriceMorse&, KN_&,const KN_&) const

    3 the destructor MySolver(const MatriceMorse&)

    real[int] x = Aˆ-1 * b will call MySolver::Solver.

    5 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    MUltifrontal Massively Parallel Sparse direct Solver

    http://graal.ens-lyon.fr/MUMPS

    Distributed memory direct solver.Compiled by FreeFem++ with --enable-download.Renumbering via AMD, QAMD, AMF, PORD, (Par)METIS,

    (PT-)SCOTCH.

    Solves unsymmetric and symmetric linear systems !

    6 / 26 Pierre Jolivet HPC and FreeFem++

    http://graal.ens-lyon.fr/MUMPS

  • Introduction New solvers Domain decomposition methods Conclusion

    MUMPS and FreeFem++

    1 load "MUMPS"int[int] l = [1, 1, 2, 2];mesh Th;if(mpirank != 0) // no need to store the matrix on ranks other than 0

    5 Th = square(1, 1, label = l);else

    Th = square(150, 150, label = l);fespace Vh(Th, P2);

    9 varf lap(u, v) = int2d(Th)(dx(u)∗dx(v) + dy(u)∗dy(v)) + int2d(Th)(v)+ on(1, u = 1);

    real[int] b = lap(0, Vh);matrix A = lap(Vh, Vh);set(A, solver = sparsesolver);

    13 Vh x; x[] = A−1 ∗ b;plot(Th, x, wait = 1, dim = 3, fill = 1, cmm = "sparsesolver", value =

    1);

    7 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Intel MKL PARDISO

    http://software.intel.com/en-us/intel-mkl

    Shared memory direct solver.Part of the Intel Math Kernel Library.Renumbering via Metis or threaded nested dissection.

    Solves unsymmetric and symmetric linear systems !

    8 / 26 Pierre Jolivet HPC and FreeFem++

    http://software.intel.com/en-us/intel-mkl

  • Introduction New solvers Domain decomposition methods Conclusion

    PARDISO and FreeFem++

    load "PARDISO"2 include "cube.idp"int n = 35; int[int] NN = [n, n, n]; real[int, int] BB = [[0,1], [0,1],

    [0,1]]; int[int, int] L = [[1,1], [3,4], [5,6]];mesh3 Th = Cube(NN, BB, L);fespace Vh(Th, P2);

    6 varf lap(u, v) = int3d(Th)(dx(u)∗dx(v) + dz(u)∗dz(v) + dy(u)∗dy(v))+ int3d(Th)(v) + on(1, u = 1);

    real[int] b = lap(0, Vh);matrix A = lap(Vh, Vh);real timer = mpiWtime();

    10 set(A, solver = sparsesolver);cout

  • Introduction New solvers Domain decomposition methods Conclusion

    Is it really necessary ?

    If you are using direct methods within FreeFem++: yes !

    Why ? Sooner or later, UMFPACK will blow up:

    UMFPACK V5.5.1 (Jan 25, 2011): ERROR: out of memory

    umfpack_di_numeric failed

    10 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Motivation

    One of the most straightforward way to solve BVP in parallel.

    Based on the “divide and conquer” paradigm:1 assemble,2 factorize and3 solve smaller problems.

    11 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Motivation

    One of the most straightforward way to solve BVP in parallel.

    Based on the “divide and conquer” paradigm:1 assemble,2 factorize and3 solve smaller problems.

    11 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    A short introduction I

    Consider the following BVP in R2:

    ∇ · (κ∇u) = F in ΩB(u) = 0 on ∂Ω

    Then, solve in parallel:

    ∇ · (κ∇um+11 ) = F1 in Ω1B(um+11 ) = 0 on ∂Ω1 ∩ ∂Ω

    um+11 = um2 on ∂Ω1 ∩ Ω2

    ∇ · (κ∇um+12 ) = F2 in Ω2B(um+12 ) = 0 on ∂Ω2 ∩ ∂Ω

    um+12 = um1 on ∂Ω2 ∩ Ω1

    12 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    A short introduction I

    Consider the following BVP in R2:

    ∇ · (κ∇u) = F in ΩB(u) = 0 on ∂Ω

    Ω2Ω1

    Then, solve in parallel:

    ∇ · (κ∇um+11 ) = F1 in Ω1B(um+11 ) = 0 on ∂Ω1 ∩ ∂Ω

    um+11 = um2 on ∂Ω1 ∩ Ω2

    ∇ · (κ∇um+12 ) = F2 in Ω2B(um+12 ) = 0 on ∂Ω2 ∩ ∂Ω

    um+12 = um1 on ∂Ω2 ∩ Ω1

    12 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    A short introduction II

    We have effectively divided, but we have yet to conquer.

    Duplicated unknowns coupled via a partition of unity:

    I =N∑

    i=1RTi DiRi ,

    where RTi is the prolongation from Vi to V .

    Then,

    um+1 =N∑

    i=1RTi Dium+1i .

    13 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    A short introduction II

    We have effectively divided, but we have yet to conquer.

    Duplicated unknowns coupled via a partition of unity:

    I =N∑

    i=1RTi DiRi ,

    where RTi is the prolongation from Vi to V .Then,

    um+1 =N∑

    i=1RTi Dium+1i .

    13 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    One-level preconditioners

    A common preconditioner is:

    M−1RAS :=N∑

    i=1RTi Di(RiARTi )−1Ri .

    For future references, let Ãij := RiARTj .

    These preconditioners don’t scale as N increases:

    κ(M−1A) 6 C 1H2

    (1 + H

    δh

    )

    They act as low-pass filters (think about mulgrid methods).

    14 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    One-level preconditioners

    A common preconditioner is:

    M−1RAS :=N∑

    i=1RTi Di(RiARTi )−1Ri .

    For future references, let Ãij := RiARTj .

    These preconditioners don’t scale as N increases:

    κ(M−1A) 6 C 1H2

    (1 + H

    δh

    )

    They act as low-pass filters (think about mulgrid methods).

    14 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Two-level preconditionersA common technique in the field of DDM, MG, deflation:

    introduce an auxiliary “coarse” problem.

    Let Z be a rectangular matrix so that the “bad eigenvectors”of M−1A belong to the space spanned by its columns. Define

    E := Z T AZ Q := ZE−1Z T .

    Z has O(N) columns, hence E is relatively smaller than A.The following preconditioner can scale theoretically:

    P−1A-DEF1 := M−1(I − AQ) + Q .

    15 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Two-level preconditionersA common technique in the field of DDM, MG, deflation:

    introduce an auxiliary “coarse” problem.

    Let Z be a rectangular matrix so that the “bad eigenvectors”of M−1A belong to the space spanned by its columns. Define

    E := Z T AZ Q := ZE−1Z T .

    Z has O(N) columns, hence E is relatively smaller than A.

    The following preconditioner can scale theoretically:

    P−1A-DEF1 := M−1(I − AQ) + Q .

    15 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Two-level preconditionersA common technique in the field of DDM, MG, deflation:

    introduce an auxiliary “coarse” problem.

    Let Z be a rectangular matrix so that the “bad eigenvectors”of M−1A belong to the space spanned by its columns. Define

    E := Z T AZ Q := ZE−1Z T .

    Z has O(N) columns, hence E is relatively smaller than A.The following preconditioner can scale theoretically:

    P−1A-DEF1 := M−1(I − AQ) + Q .

    15 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Preconditioners in actionBack to our Krylov method of choice: the GMRES.

    procedure GMRES(input vector x0, right-hand side b)r0 ← P−1A-DEF1(b−Ax0)v0 ← r0/||r0||2for i = 0, . . . ,m − 1 do

    w ← P−1A-DEF1Avifor j = 0, . . . , i do

    hj,i ← 〈w , vj〉end forṽi+1 ← w −

    ∑ij=1 hj,i vj

    hi+1,i ← ||ṽi+1||2vi+1 ← ṽi+1/hi+1,iapply Givens rotations to h:,i

    end forym ← arg min||Hmym − βe1||2 with β = ||r0||2return x0 + Vmym

    end procedure

    global sparse matrix-vector multiplication

    global dot products

    global preconditioner-vector computation

    16 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Preconditioners in actionBack to our Krylov method of choice: the GMRES.

    procedure GMRES(input vector x0, right-hand side b)r0 ← P−1A-DEF1(b−Ax0)v0 ← r0/||r0||2for i = 0, . . . ,m − 1 do

    w ← P−1A-DEF1Avifor j = 0, . . . , i do

    hj,i ← 〈w , vj〉end forṽi+1 ← w −

    ∑ij=1 hj,i vj

    hi+1,i ← ||ṽi+1||2vi+1 ← ṽi+1/hi+1,iapply Givens rotations to h:,i

    end forym ← arg min||Hmym − βe1||2 with β = ||r0||2return x0 + Vmym

    end procedure

    global sparse matrix-vector multiplication

    global dot products

    global preconditioner-vector computation

    16 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Preconditioners in actionBack to our Krylov method of choice: the GMRES.

    procedure GMRES(input vector x0, right-hand side b)r0 ← P−1A-DEF1(b−Ax0)v0 ← r0/||r0||2for i = 0, . . . ,m − 1 do

    w ← P−1A-DEF1Avifor j = 0, . . . , i do

    hj,i ← 〈w , vj〉end forṽi+1 ← w −

    ∑ij=1 hj,i vj

    hi+1,i ← ||ṽi+1||2vi+1 ← ṽi+1/hi+1,iapply Givens rotations to h:,i

    end forym ← arg min||Hmym − βe1||2 with β = ||r0||2return x0 + Vmym

    end procedure

    global sparse matrix-vector multiplication

    global dot products

    global preconditioner-vector computation

    16 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Preconditioners in actionBack to our Krylov method of choice: the GMRES.

    procedure GMRES(input vector x0, right-hand side b)r0 ← P−1A-DEF1(b−Ax0)v0 ← r0/||r0||2for i = 0, . . . ,m − 1 do

    w ← P−1A-DEF1Avifor j = 0, . . . , i do

    hj,i ← 〈w , vj〉end forṽi+1 ← w −

    ∑ij=1 hj,i vj

    hi+1,i ← ||ṽi+1||2vi+1 ← ṽi+1/hi+1,iapply Givens rotations to h:,i

    end forym ← arg min||Hmym − βe1||2 with β = ||r0||2return x0 + Vmym

    end procedure

    global sparse matrix-vector multiplication

    global dot products

    global preconditioner-vector computation

    16 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    In practice

    A scalable DDM framework must include routines for:• p2p communications to compute spmv,• p2p communications to apply a one-level preconditioner,• global transfer between master and slaves processes toapply “coarse” corrections,

    • direct solvers for the “coarse” problem and local problems.

    17 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    In FreeFem++

    subdomain A(S, D, restrictionIntersection, arrayIntersection,communicator = mpiCommWorld); // build buffers for spmv

    2 preconditioner E(A); // build M−1RASif(CoarseOperator) {

    ...attachCoarseOperator(E, A, A = GEVPA, B = GEVPB, parameters

    = parm); // build P−1A-DEF16 }GMRESDDM(A, E, u[], rhs, dim = 100, iter = 100, eps = eps);

    18 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Nonlinear elasticity

    We solve an evolution problem of hyperelasticity on a beam,find u such that: ∀t ∈ [0; T ], ∀v ∈ [H1(Ω)]3 ,∫

    Ωρü · v +

    ∫Ω

    (I +∇u) Σ︸ ︷︷ ︸=:F(u)

    : ∇v =∫

    Ωf · v +

    ∫∂ΩS

    g · v

    u = 0 on ∂ΩDσ(u) · n = 0 on ∂ΩN

    whereΣ = λtr(E )I + 2µE

    19 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    In FreeFem++

    Add another operator rebuild (currently only rebuilds Ãii−1).

    The rest is (almost) the same.1 for(int k = 0; k < 200; ++k) {

    SchemeInit(u)if(k > 0) {

    S = vPbNonlinear(Wh, Wh, solver = CG);5 rhs = vPbLinear(0, Wh);

    rebuild(S, A, E);}GMRESDDM(A, E, h[], rhs, dim = 100, iter = 100, eps = eps);

    9 SchemeAdvance(u, h)}

    20 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Outline

    1 IntroductionFlashback to last yearA new machine

    2 New solversMUMPSPARDISO

    3 Domain decomposition methodsThe necessary toolsPreconditioning Krylov methodsNonlinear problems

    4 Conclusion

    21 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Amuses bouche before lunch I

    1 0242 048

    4 0966 144

    8 192

    1

    2

    4

    6

    8

    #processes

    Tim

    ingrelative

    to102

    4processes

    Linear speedup 10

    15

    20

    25

    #iterations

    Linear elasticity in 2D with P3 FE.1 billion unknowns solved in 31 seconds at peak performance.

    22 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Amuses bouche before lunch II

    1 0242 048

    4 0966 144

    8 192

    1

    2

    4

    6

    8

    #processes

    Tim

    ingrelative

    to102

    4processes

    Linear speedup 10

    15

    20

    25

    #iterations

    Linear elasticity in 3D with P2 FE.80 million unknowns solved in 35 seconds at peak performance.

    23 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Amuses bouche before lunch III

    1 0242 048

    4 0966 144

    8 19212 288

    0%

    20%

    40%

    60%

    80%

    100%

    #processes

    Weakeffi

    ciency

    relativeto

    1024processes

    844

    10 176

    #d.o.f.(inmillion

    s)

    Scalar diffusivity in 2D with P3 FE.10 billion unknowns solved in 160 seconds on 12 228 processes.

    24 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    Amuses bouche before lunch IV

    1 0242 048

    4 0966 144

    8 1920%

    20%

    40%

    60%

    80%

    100%

    #processes

    Weakeffi

    ciency

    relativeto

    1024processes

    130

    1 051

    #d.o.f.(inmillion

    s)

    Scalar diffusivity in 3D with P2 FE.1 billion unknowns solved in 150 seconds on 8 192 processes.

    25 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    To sum up

    1 FreeFem+++ = easy framework to solve large systems.

    Two-level DDM2 New solvers give performance boost on smaller systems.

    New problems being tackled:• more complex nonlinearities,• reuse of Krylov and deflation subspaces,• BNN and FETI methods.

    Thank you for your attention.

    26 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    To sum up

    1 FreeFem+++ = easy framework to solve large systems.

    Two-level DDM2 New solvers give performance boost on smaller systems.

    New problems being tackled:• more complex nonlinearities,• reuse of Krylov and deflation subspaces,• BNN and FETI methods.

    Thank you for your attention.

    26 / 26 Pierre Jolivet HPC and FreeFem++

  • Introduction New solvers Domain decomposition methods Conclusion

    To sum up

    1 FreeFem+++ = easy framework to solve large systems.

    Two-level DDM2 New solvers give performance boost on smaller systems.

    New problems being tackled:• more complex nonlinearities,• reuse of Krylov and deflation subspaces,• BNN and FETI methods.

    Thank you for your attention.26 / 26 Pierre Jolivet HPC and FreeFem++

    IntroductionFlashback to last yearA new machine

    New solversMUMPSPARDISO

    Domain decomposition methodsThe necessary toolsPreconditioning Krylov methodsNonlinear problems

    Conclusion