A Hybrid Parallel Iterative Solver for Indefinite Systems ... · Chart 8 > MC-CARP-CG > J. Thies et...

22

Transcript of A Hybrid Parallel Iterative Solver for Indefinite Systems ... · Chart 8 > MC-CARP-CG > J. Thies et...

  • www.DLR.de • Chart 1 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    A Hybrid Parallel Iterative Solver forInde�nite Systems in Interior EigenvalueComputations

    Jonas Thies1 Lukas Krämer2 and Andreas Pieper3

    PMAA, Lugano, July 4, 2014

    1German Aerospace Center (DLR)Simulation and Software [email protected]

    2University of WuppertalApplied Computer [email protected]

    3University of GreifswaldTheoretical [email protected]

  • www.DLR.de • Chart 2 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Outline

    Graphene simulation and the FEAST method

    The CGMN algorithm

    Parallelization

    Experiments

  • www.DLR.de • Chart 3 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Graphene simulation and the FEASTmethod

  • www.DLR.de • Chart 4 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Graphene

    a1

    a

    a2

    Physical space: carbon atoms in2D hexagonal mesh

    Fourier space (`reciprocal mesh')

    Tight-binding Hamiltonian

    H = −t∑〈ij〉

    (c†i cj + c†j ci )

  • www.DLR.de • Chart 5 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Graphene (2)

    −3 −2 −1 0 1 2 3−2

    −1 0

    1 2

    −3

    −2

    −1

    0

    1

    2

    3

    E /

    t

    kx / a

    ky / a

    E /

    t

    KK’

    K

    −0.1 0

    0.1−0.1

    0

    0.1−0.2

    0

    0.2

    E /

    t

    (kx − Kx) / a(ky − Ky) / a

    E /

    t Analytical solution for in�nite Graphene sheet

    Dirac cones: graphene between conducter and semi-conducter

  • www.DLR.de • Chart 6 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Graphene modelling

    disorder

    long range stencil

    bilayer

    gate-de�ned quantum dots

    spin-orbit coupling

    ...

    Long range Hamiltonian:

    H =∑

    i

    Vic†i ci−t

    ∑〈ij〉

    (c†i cj + c†j ci )−t

    ′∑〈〈ij〉〉

    (c†i cj + c†j ci )−t

    ′′∑

    〈〈〈ij〉〉〉

    (c†i cj + c†j ci )

  • www.DLR.de • Chart 7 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    The FEAST algorithm at a glance

    Input: Iλ :=[λ, λ

    ], an estimate m̃ of the number of eigenvalues in Iλ.

    Output m̂ ≤ m̃ eigenpairs with eigenvalue in Iλ.Perform:

    1 Choose Y ∈ Cn×m̃ of full rank and computeU :=

    1

    2πi

    ∫C(zB− A)−1Bdz Y,

    2 Form AU := U?AU, BU := U

    ?BU,

    3 Solve size-m̃ eigenproblem AUW̃ = BUW̃�̃,

    4 Compute (�̃, X̃ := U · W̃),5 If no convergence: go to Step 1 with Y := X̃.

  • www.DLR.de • Chart 8 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Linear systems for FEAST/graphene

    Tough:

    very large (N = 108 − 1014) complex symmetric and completely inde�nite

    random numbers on and around the diagonal

    spectrum essentially continuous

    shifts get very close to the spectrum

    But also nice in some ways:

    2D mesh, very sparse (∼ 10 entries/row) multiple RHS/shift (block methods, recycling, ...)

    We need O(100) Eigenpairs =⇒ very coputationally heavy...

  • www.DLR.de • Chart 9 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    The CGMN algorithm

  • www.DLR.de • Chart 10 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    An ancient row projection method

    Björck and Elfving, 1979

    CG on the `minimum norm' problem, AAT x = b

    preconditioned by SSOR

    e�cient row-wise formulation

    extremely robust: A may be singular, non-square etc.

    row scaling aleviates issue of `squared condition number'

  • www.DLR.de • Chart 11 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Kernel operation: KACZ sweep

    Interpretations:

    Kaczmarz algorithm

    SOR(ω) on the normal equations AAT x = b

    successive projections onto the hyperplanesde�ned by the rows of AIn CRS (rptr,val,col):

    1: compute nrms=||ai,:||222: for (i=0; i

  • www.DLR.de • Chart 12 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Parallelization

  • www.DLR.de • Chart 13 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Multi-Coloring (MC)

    requires �distance 2� coloring

    software: ColPackhttp://cscapes.cs.purdue.edu/coloringpage/software.htm

    http://cscapes.cs.purdue.edu/coloringpage/software.htm

  • www.DLR.de • Chart 14 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Component-Averaged Row Projection (CARP)

    Gordon & Gordon, 2005

    Kaczmarz locally

    write to halo

    exchange and average

    Equivalent to Kaczmarz on a superspace of Rn

  • www.DLR.de • Chart 15 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Hybrid method: MC_CARP-CG

    global MC would require...

    an extremely scalable coloring method very well-balanced colors many global sync-points (> 20 colors in our examples)

    global CARP would require...

    huge number of MPI procs increasing amount of `interior halo elements' non-trivial implemention on GPU and Xeon Phi increasing number of iterations

    Idea: node-local MC with MPI-based CARP between the nodes

  • www.DLR.de • Chart 16 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Experiments

  • www.DLR.de • Chart 17 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Experimental setup

    Machine: Intel Xeon �Ivy Bridge�

    10 cores/socket, 2 sockets/node

    In�niBand between nodes

    Here's what we do:

    pick some shifts that may occur in FEAST

    handle 8 rhs at once (for good performance)

    conv tol 10−12

    solve linear systems using CGMN variants

  • www.DLR.de • Chart 18 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Sequential CGMN for various shifts

    0

    0.2

    0.4

    0.6

    0.8

    -1 -0.5 0 0.5 1

    imag

    part

    real part

    10−12

    10−10

    10−8

    10−6

    10−4

    10−2

    100

    50 100 150 200 250 300 350 400

    ||r|| 2/||r

    0|| 2

    iteration

  • www.DLR.de • Chart 19 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Coloring vs. CARP: single socket (10242 dof)

    0

    100

    200

    300

    400

    500

    1 2 3 4 5 6 7 8

    iterations

    chosen shift

    MCLEXCA 2CA 4CA 8CA 10

    0

    200

    400

    600

    800

    1000

    1200

    1400

    1 2 4 8 10

    runtimeforallshifts

    cores

    CARP

    Multi-Coloring

  • www.DLR.de • Chart 20 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    Weak scaling of Hybrid vs. CARP (40962 dof/node)

    0

    1000

    2000

    3000

    4000

    5000

    6000

    20 80 160 320

    timefor�rstshift

    cores

    MC-CARP

    CARP

  • www.DLR.de • Chart 21 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    The (almost) �nal slide

    Graphene gives nice and challenging test cases for Lin. Alg.

    FEAST requires fast linear solvers for indef. systems

    row projection methods work very well here

    hybrid is a natural choice here - and works

    Future work:

    integration in FEAST loop

    stencil-based implementation

    GPU and Xeon PhiAcknoledgement: DFG SPPEXA project ESSEX

    (Equipping Sparse Solvers for the EXa-scale)

  • www.DLR.de • Chart 22 > MC-CARP-CG > J. Thies et al. • > PMAA'14

    References

    Pieper et. al.: E�ects of disorder and contacts on transport throughgraphene nanoribbonsPhys. Rev. B 88, 195409, (2013).

    Polizzi: Density-Matrix-Based Algorithms for Solving EingenvalueProblems. Phys. Rev. B. 79 - 115112 (2009)

    Björck & Elfving: Accelerated projection methods for computingpseudoinverse solutions of systems of linear equations.BIT 19, pages 145�163, 1979.

    Gordon & Gordon: Component-averaged row projections: A robust,block-parallel scheme for sparse linear systems.SISC 27 (3), 2005, pages 1092�1117.