Multilevel Preconditioning Packageversion 3.1
Trilinos Users Group MeetingTrilinos Users Group MeetingNovember ’04November ’04
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energy under contract DE-AC04-94AL85000.
Marzio Sala
Outline
1. Basic concepts of multigrid/multilevel methods2. Configuring and building ML3. Using ML4. Interoperability with Trilinos packages5. Example of usage6. Numerical results7. Continuing Research8. Documentation, mailing lists, and getting help
ML Package
§ Provides parallel multilevel preconditioning for linear solver methods
§ Developers: – Ray Tuminaro, Jonathan Hu, Marzio Sala, Micheal Gee
§ Main methods– Geometric:
• Grid refinement hierarchy• 2-level FE basis function domain decomposition
– Algebraic (smoothed aggregation)• Classical smoothed aggregation• Extensions for non-symmetric systems• Edge-element for Maxwell’s equations
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Multilevel Preconditioners§ We are interesting in solving
A x = b, A nxn, x, b n
arising from FE discretizations on 2D/3D unstructured grids
§ The linear system is solved using Krylov accelerators (e.g., CG or GMRES)
§ A preconditioner is required:– Applicable in massively parallel environments– For large scale computations
§ Multilevel approaches furnish a very effective way to define scalable preconditioners
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Multilevel Preconditioners (2)
Main idea:1. Define a set of approximate solutions, on coarser
“grids” 2. Need a projection (P) and a restriction (R) to
move from one “grid” to the finer/coarser “grid”3. On each “grid”, approximate solvers (smoothers)
will dump the high frequencies of the error4. On the coarser “grid”, a direct solver is used
Two families of multilevel methods: geometric and algebraic methods
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Geometric Multigrid
§ Need to define a hierarchy of grids (and the corresponding restriction/prolongation operators)– 1D example:
§ However, for unstructured grids on complex geometries, the definition of the coarse grids can be problematic:– Computation of restriction/prolongator may be
computational expensive for completely uncorrelated grids
– Difficult to define boundary conditions on coarser grids
fine:
coarse:
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Algebraic Multigrid
§ Build multilevel operators (Ak, Pk, Rk, Sk’s) automatically to define a hierarchy:Akxk=bk, k=1,…,L
§ Once Pk’s are defined, the rest follows “easily”:q Rk = Pk
T (usually)q Ak-1 = Rk-1 Ak Pk-1 (triple matrix product) q Smoother (iterative method) Sk
+ Gauss-Seidel, polynomial, conjugate gradient, etc.
§ Smoothed aggreation (SA) algebraic multigrid– interpolation operators Pk’s are built automatically by aggregation of fine grid nodes– also main difficulty
Same Goal: Solve A x = b
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Smoothed Aggregation
§ Builds P by creating aggregates (set of contiguous nodes)
§ Works with structured and unstructured grids§ No need for nodal coordinates, based on matrix
entries (graph) only
Multilevel (V) Cycle
function xk = MLV(Ak,rk) if (k == 1) xk = Ak
-1 rk
else xk ← Spre,k(Ak,rk,0); qk ← rk – Ak xk
rk-1 ← Rk qk
xk-1 ← MLV(Ak-1,rk-1) xk ← xk + Pk xk-1
xk ← Spost,k(Ak,qk,xk); end
⇦ Direct (parallel) solver
⇦ presmoothing
⇦ postsmoothing
⇦ Ak-1 = Rk Ak Pk
⇦ Usually, RK = PKT
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Overview of ML capabilities
multilevel cyclingV, W, full V, full WAdditive and hybrid domain decomposition methods
Coarser level construction Uncoupled, MIS, METIS, ParMETIS, ZOLTAN
Smoothers
Jacobi, (symmetric) Gauss-Seidel, any AztecOO and IFPACK preconditioner (like ILU), SPAI, polynomial
Coarse solverSuperLU, SuperLU_DIST,Any Amesos solver
Kernels Sparse matrix-matrix
Other tools Visualization and analysis
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Common Decisions for ML Users
How to configure and build ML Choose among supported packagesMatrix and vector formats Aztec/ML/Epetra matrices
Coarsening strategy - How many levels?- Aggressive coarsening?
What smoother to use - # of smoothing steps- Smoothers can be different on each level- Finest level may require light-weight smoothers
Coarse solver - Serial, distributed- How many processors use for the coarse problem
Cycling strategy V, W, domain decomposition…Analyzing ML results Some visualization capabilities
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Configuring and Building ML
§ Builds by default when Trilinos is configured/built§ Configure help
$ Trilinos/packages/ml/configure --help§ ML interfaces with a variety of packages
– within Trilinos– outside Trilinos
§ Check out the supported packages you may need!
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Configuring and Building ML (2)
Configure option Used for--enable-epetra (def) Epetra_RowMatrix interface--enable-teuchos (def) Teuchos::ParameterList--enable-amesos Coarse Solver--enable-anasazi Eigenanalysis--enable-ifpack (def) Smoothers--enable-aztecoo (def) Examples--enable-triutils (def) Examples--with-ml_metis Local aggregation scheme--with-ml_parmetis3x Global aggregation scheme--with-ml_zoltan RCB partitioning--with-ml_superlu Coarse solver (serial)--with-ml_superludist Coarse solver (distributed)--with-ml_parasails SPAI smoother
Matrix Format
§ ML is not based on any matrix format, you just need– matvec() – getrow()
• returns the nonzero indices and values for any local row
§ Wrappers exist for– Aztec matrices (DMSR, DVBR)– Epetra matrices
• Any ML object can be (cheaply) wrapped as an Epetra object and vice-versa
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML Aggregation Choices
§ MIS– Expensive, usually best with many processors
§ Uncoupled (UC)– Cheap, usually works well.
§ Hybrid = UC + MIS– Uncoupled on fine grids, MIS on coarser grids
§ Local graph partitioning– At least one aggregate per processor– Requires METIS
§ Global graph partitioning– Aggregates can span processors– Requires ParMETIS
Fixe
d A
ggre
gate
size
Var
iabl
e A
ggre
gate
si
ze
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Large aggregates(METIS/ParMETIS)
Small aggregates(Uncoupled/MIS)
ML Aggregation Choices (2)
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
4 Aggregates on second levelFine level mesh: 32 x 32 nodes16 aggregates
ML Aggregation Choices: METIS
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML Aggregation Choices: Uncoupled/MIS
§ Uncoupled/MIS aggregation (Greedy algorithm)– parallel can be complicated
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML Smoother Choices
§ A smoother is an approximate solver that effectively dumps the high-frequencies of the error
Before smoothing
After smoothing
)( )(1)()1( iii AxbDxx −+= −+ ω
ML Smoother Choices (2)
§ Jacobi– Simplest, cheapest, usually least effective.
– Damping parameter (ω) needed
§ Point Gauss-Seidel– Equation satisfied one unknown at a time– Can be problematic in parallel (may need damping)
• processor-based (stale off-proc values)
§ Block Gauss-Seidel– Satisfy several equations simultaneously by modifying several
DOFs (inverting subblock associated with DOFs).– Blocks can correspond to aggregates
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML Smoother Choices (3)
§ AztecOO– Any Aztec preconditioner can be used as smoother. – Most interesting are ILU & ILUT: may be more robust than Gauss-
Seidel§ IFPACK:
– Block Jacobi and (symmetric) Gauss-Seidel– Incomplete factorizations– Exact LU solvers (through Amesos)
§ Hiptmair– Specialized 2-stage smoother for Maxwell’s Eqns.
§ MLS– Approximation to inverse based on Chebyshev polynomials of
smoothed operator.– In serial, competitive with true Gauss-Seidel– Doesn’t degrade with # processors (unlike processor-based GS)
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Coarsest-level Solvers
§ Any smoother can be used at the coarse level…
§ … but a direct method should be preferred:– Incomplete factorizations (via Aztec)– Serial or distributed SuperLU– Amesos interface:
• LAPACK, KLU, UMFPACK, SuperLU• SuperLU_DIST, MUMPS• Using Amesos with SuperLU_DIST or MUMPS, one can
specify the number of processors
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML Cycling Choices
§ “Multigrid” options:– V is default (usually works best)– W more expensive, may be more robust– Full MG (V cycle) more expensive (less
conventional within preconditioners)
§ “Domain decomposition” options (for 2-level methods):– Additive methods (requires no matvec in the
preconditioning step and only one application of the smoother)
– Various hybrid methods (generally more effective, but more expensive)
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
W Cycle
FMV Cycle
Multilevel Methods: Issues to be aware of
§ Severely stretched grids or anisotropies§ Loss of diagonal dominance§ Atypical stencils§ Jumps in material properties§ Non-symmetric matrices§ Boundary conditions§ Systems of PDEs§ Non-trivial null space (Maxwell’s equations, elasticity)
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
What can go wrong?
§ Small aggregates high complexity§ Large aggregates poor convergence§ Different size aggregates both
– Try different aggregation methods, drop tolerances
§ Stretched grids poor convergence§ Variable regions poor convergence
– Try different smoothers, drop tolerances
§ Ineffective smoothing poor convergence (perhaps due to non-diagonal dominance or non-symmetry in
operator)– Try different smoothers
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Things to Try if Multigrid Isn’t Working
§ Smoothers– Vary number of smoothing steps.– More `robust’ smoothers: block Gauss-Seidel, ILU, MLS polynomial
(especially if degradation in parallel).– Vary damping parameters (smaller is more conservative).
§ Try different aggregation schemes§ Try fewer levels§ Try drop tolerances, if …
– high complexity (printed out by ML).– Severely stretched grids– Anisotropic problems– Variable regions
§ Reduce prolongator damping parameter– Concerned about operators properties (e.g. highly non-symmetric).
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Analyzing ML matrices*** Analysis of ML_Operator `A matrix level 0' ***
Number of global rows = 256 Number of equations = 1 Number of stored elements = 1216 Number of nonzero elements = 1216 Mininum number of nonzero elements/row = 3 Maximum number of nonzero elements/row = 5 Average number of nonzero elements/rows = 4.750000 Nonzero elements in strict lower part = 480 Nonzero elements in strict upper part = 480 Max |i-j|, a(i,j) != 0 = 16 Number of diagonally dominant rows = 86 (= 33.59%) Number of weakly diagonally dominant rows = 67 (= 26.17%) Number of Dirichlet rows = 0 (= 0.00%) ||A||_F = 244.066240 Min_{i,j} ( a(i,j) ) = -14.950987 Max_{i,j} ( a(i,j) ) = 15.208792 Min_{i,j} ( abs(a(i,j)) ) = 0.002890 Max_{i,j} ( abs(a(i,j)) ) = 15.208792 Min_i ( abs(a(i,i)) ) = 2.004640 Max_i ( abs(a(i,i)) ) = 15.208792 Min_i ( \sum_{j!=i} abs(a(i,j)) ) = 2.004640 Max_i ( \sum_{j!=i} abs(a(i,j)) ) = 15.205902 max eig(A) (using power method) = 27.645954 max eig(D^{-1}A) (using power method) = 1.878674
Total time for analysis = 3.147979e-03 (s)
Analyzing ML Preconditioner*** ************************************************ ****** Analysis of the spectral properties using LAPACK ****** ************************************************ ****** Operator = A *** Computing eigenvalues of finest-level matrix *** using LAPACK. This may take some time... *** results are on file `eig_A.m'.
min |lambda_i(A)| = 0.155462 max |lambda_i(A)| = 26.3802 spectral condition number = 169.689
*** ************************************************ *** *** Analysis of the spectral properties using LAPACK *** *** ************************************************ *** *** Operator = P^{-1}A *** Computing eigenvalues of ML^{-1}A *** using LAPACK. This may take some time... *** results are on file `eig_PA.m'.
min |lambda_i(ML^{-1}A)| = 0.776591 max |lambda_i(ML^{-1}A)| = 1.09315 spectral condition number = 1.40762
ML Interoperability with Trilinos Packages
ML
Epetra
Accepts user data as Epetra objects
Can be wrapped as Epetra_Operator
TSF
TSF interface exists
Othermatvecs
Othersolvers
Accepts other solversand MatVecs
AmesosIFPACKAnasazi
Direct solvers, Smoothers, eigensolver
Aztecoo
Meros
Via Epetra & TSF
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML with Epetra
Two classes wrap ML preconditioners as Epetra preconditioners:
1) ML_Epetra::MultiLevelPreconditioner– A very easy, black-box interface to ML – Can be used as AztecOO preconditioner– All parameters specified using ParameterList– Still not ready for Maxwell users
2) ML_Epetra::MultilLevelOperator– Users need to build the preconditioner step-by-step– Ready for Maxwell users
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
ML with Epetra: Example of useTrilinos_Util_CrsMatrixGallery Gallery(“laplace_3d", Comm);Gallery.Set("problem_size", 100*100*100); // linear system matrix & linear problemEpetra_RowMatrix * A = Gallery.GetMatrix();Epetra_LinearProblem * Problem = Gallery.GetLinearProblem();
// Construct outer solver objectAztecOO solver(*Problem);solver.SetAztecOption(AZ_solver, AZ_cg);
// Set up multilevel precond. with smoothed aggr. defaultsParameterList MLList; // parameter list for ML optionsML_Epetra::SetDefaults(“SA”,MLList);MLList.set(“aggregation: type”, “METIS”);// create preconditioner ML_Epetra::MultiLevelPreconditioner * MLPrec = new ML_Epetra::MultiLevelPreconditioner(*A, MLList, true);
solver.SetPrecOperator(MLPrec); // set preconditionersolver.Iterate(500, 1e-12); // iterate at most 500 times
delete MLPrec;
Trilinos/packages/ml/examples/ml_example_MultiLevelPreconditioner.cpp
Triu
tils
Azt
ecO
OM
LA
ztec
OO
List.set(“max levels”, 4); List.set(“smoother (level 0)”, “Jacobi”); List.set(“damping factor”, 0.5); List.set(“smoother (level 1)”, “IFPACK”); List.set(“coarse: type”, “Amesos-MUMPS”); List.set(“coarse: max procs”, 4); List.set(“output level”, 10);
• 3D Thermal convection in a cube• Incompressible Navier-Stokes + energy • Smoother: processor-based ILU• Coarse solver: SuperLU• Sandia ASCI Red machine
Numerical Experiments: MPSalsa
Procs geom aggr
4 135 120
32 625 480
256 3645 2560
proc fine gridunknowns
avg its per time avg its per time avg its per time avg its per timeNewt step (sec) Newt step (sec) Newt step (sec) Newt step (sec)
4 24,565 28 106 20 88 33 78 30 71
32 179,685 32 163 34 140 44 94 50 109
256 1,373,125 35 273 38 187 47 219 58 152
Gauss-SeidelILUgeom aggr geom aggr
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Numerical Experiment: Zpinch simulation
procs # elements its complexity16 155,904 41 1.1450 492, 912 42 1.15
128 1,259,976 47 1.16240 ~ 2,400,000 52 1.18
CG preconditioned with V(1,1) AMG cycle
||r||2/||r0||2 < 10-8
2-stage Hiptmair smoother (4th order polynomials in each stage)Edge element interpolationPlatform : Sandia’s Cplant
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
On-going Research
§ Smoothed aggregation for non-symmetric systems§ Aggregation schemes for stretched elements§ New smoothers (via IFPACK)§ Compatible relaxation for smoothed aggregation§ Adaptive smoothed aggregation
– Starts with existing AMG method– Iteratively detects slow-to-converge modes (e.g., rigid
body modes)– Goal: Improve prolongator during simulation
§ Variable-block aggregation§ Nonlinear preconditioning
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Getting Help
§ See Trilinos/packages/ml/examples§ See guide:
– ML User’s Guide, ver. 3.1 (in ml/doc/mlguide.ps)– ML Developer’s Guide, ver. 3.1 (in ml/doc/DevelopersGuide)
§ Mailing lists– [email protected]– [email protected]
§ Bug reporting, enhancement requests via bugzilla:– http://software.sandia.gov/bugzilla
§ Email us directly– [email protected]– [email protected]– [email protected]– [email protected]
>overview >configuring and building >using ML >ML with Trilinos >example >numerical results >doc and help
Top Related