8/22/2006
1
Trilinos Tutorial: Overview and Basic Concepts
Michael A. Heroux, Sandia National Laboratories
ACTS Toolkit Workshop 2006
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Trilinos Development Team
Ross Bartlett: Lead Developer of Thyra; Developer of Rythmos
Paul Boggs: Developer of Thyra
Todd Coffey: Lead Developer of Rythmos
Jason Cross: Developer of Jpetra
David Day: Developer of Komplex
Clark Dohrmann: Developer of CLAPS
Michael Gee: Developer of ML, NOX
Bob Heaphy: Lead Developer of Trilinos SQA
Mike Heroux: Trilinos Project Leader; Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra; Developer of Amesos, Belos, EpetraExt, Jpetra
Ulrich Hetmaniuk: Developer of Anasazi
Robert Hoekstra: Lead Developer of EpetraExt; Developer of Epetra, Thyra, Tpetra
Russell Hooper: Developer of NOX
Vicki Howle: Lead Developer of Meros; Developer of Belos and Thyra
Jonathan Hu: Developer of ML
Sarah Knepper: Developer of Komplex
Tammy Kolda: Lead Developer of NOX
Joe Kotulski: Lead Developer of Pliris
Rich Lehoucq: Developer of Anasazi and Belos
Kevin Long: Lead Developer of Thyra; Developer of Belos and Teuchos
Roger Pawlowski: Lead Developer of NOX
Michael Phenow: Trilinos Webmaster; Lead Developer of New_Package
Eric Phipps: Developer of LOCA and NOX
Marzio Sala: Lead Developer of Didasko and IFPACK; Developer of ML, Amesos
Andrew Salinger: Lead Developer of LOCA
Paul Sexton: Developer of Epetra and Tpetra
Bill Spotz: Lead Developer of PyTrilinos; Developer of Epetra, New_Package
Ken Stanley: Lead Developer of Amesos and New_Package
Heidi Thornquist: Lead Developer of Anasazi, Belos and Teuchos
Ray Tuminaro: Lead Developer of ML and Meros
Jim Willenbring: Developer of Epetra and New_Package; Trilinos library manager
Alan Williams: Developer of Epetra, EpetraExt, AztecOO, Tpetra
Outline of Talk
Background/Motivation. Trilinos Package Concepts. ANAs, LALs and APPs. Managing Parallel Data: Petra Object Model. Whirlwind Tour of Some Trilinos Packages. Generic Programming. SQA/SQE Issues. Concluding remarks.
Target Problems: PDEs and more…
PDEs
Circuits
Inhomogeneous Fluids
And More…
Motivation For Trilinos
Sandia does LOTS of solver work. When I started at Sandia in May 1998:
• Aztec was a mature package, used in many codes.
• FETI, PETSc, DSCPack, Spooles, ARPACK, DASPK, and many other codes were (and are) in use.
• New projects were underway or planned in multi-level preconditioners, eigensolvers, non-linear solvers, etc.
The challenges:
• Little or no coordination was in place to:
  • Efficiently reuse existing solver technology.
  • Leverage new development across various projects.
  • Support solver software processes.
  • Provide consistent solver APIs for applications.
• ASCI was forming software quality assurance/engineering (SQA/SQE) requirements:
  • Daunting requirements for any single solver effort to address alone.
Evolving Trilinos Solution
Trilinos [1] is an evolving framework to address these challenges:
• Fundamental atomic unit is a package.
• Includes core set of vector, graph and matrix classes (Epetra/Tpetra packages).
• Provides a common abstract solver API (Thyra package).
• Provides a ready-made package infrastructure (new_package package):
  • Source code management (cvs, bonsai).
  • Build tools (autotools).
  • Automated regression testing (queue directories within repository).
  • Communication tools (mailman mail lists).
• Specifies requirements and suggested practices for package SQA.
In general allows us to categorize efforts:
• Efforts best done at the Trilinos level (useful to most or all packages).
• Efforts best done at a package level (peculiar or important to a package).
• Allows package developers to focus only on things that are unique to their package.
[1] Trilinos loose translation: “A string of pearls”
Full “Vertical” Solver Coverage
· Transient Problems. Packages: Rythmos
  · DAEs/ODEs: Solve $f(\dot{x}(t), x(t), t) = 0$, $t \in [0, T]$, $x(0) = x_0$, $\dot{x}(0) = x_0'$, for $x(t) \in \Re^n$, $t \in [0, T]$.
· Nonlinear Problems: Given nonlinear op $c(x, u) \in \Re^{n+m} \to \Re^n$.
  · Nonlinear equations: Solve $c(x) = 0$ for $x \in \Re^n$. Packages: NOX
  · Stability analysis: For $c(x, u) = 0$ find space $u \in U$ such that $\partial c / \partial x$ is singular. Packages: LOCA
· Explicit Linear Problems. Packages: Epetra, Tpetra
  · Matrix/graph equations: Compute $y = Ax$; $A = A(G)$; $A \in \Re^{m \times n}$, $G \in \Im^{m \times n}$.
  · Vector problems: Compute $y = \alpha x + \beta w$; $\alpha = \langle x, y \rangle$; $x, y \in \Re^n$.
· Implicit Linear Problems: Given linear ops (matrices) $A, B \in \Re^{n \times n}$.
  · Linear equations: Solve $Ax = b$ for $x \in \Re^n$. Packages: AztecOO, Belos, Ifpack, ML, etc.
  · Eigen problems: Solve $Av = \lambda Bv$ for (all) $v \in \Re^n$, $\lambda$. Packages: Anasazi
· Optimization Problems. Packages: MOOCHO
  · Unconstrained: Find $u \in \Re^n$ that minimizes $f(u)$.
  · Constrained: Find $y \in \Re^m$ and $u \in \Re^n$ that minimize $f(y, u)$ s.t. $c(y, u) = 0$.
Trilinos Statistics
[Bar chart: Trilinos Statistics by Release. Axis: Counts (0 to 40). Categories: Packages in repository; Limited release packages; General release packages; Source lines (100K); Downloads (100s); Automated Regression Tested packages; Developers. Series: Release 3.1 (9/03), Release 4.0 (6/04), Release 5.0 (3/05), Release 6.0 (9/05). Bar labels in extraction order: 19, 13, 8, 2.23, 2.70, 5, 22; 22, 22, 16, 5.48, 4.40, 9, 27; 26, 26, 17, 7.16, 7.36, 11, 30; 29, 27, 18, 8.95, 14.71, 19, 33.]
Registered Users by Type (1045 Total)
University, 568
Government, 239
Personal, 114
Industry, 99
Other, 25
Registered Users by Region (1045 Total)
Europe, 359
US (except Sandia), 319
Sandia (includes unregistered), 176
Asia, 126
Americas (except US), 42
Australia/NZ, 13
Africa, 10
Stats: Trilinos Download Page 8/18/2006.
Trilinos Package Concepts
Package: The Atomic Unit
Trilinos Packages
Trilinos is a collection of Packages. Each package is:
• Focused on important, state-of-the-art algorithms in its problem regime.
• Developed by a small team of domain experts.
• Self-contained: No explicit dependencies on any other software packages (with some special exceptions).
• Configurable/buildable/documented on its own.
Sample packages: NOX, AztecOO, ML, IFPACK, Meros.
Special package collections:
• Petra (Epetra, Tpetra, Jpetra): Concrete Data Objects.
• Thyra: Abstract Conceptual Interfaces.
• Teuchos: Common Tools.
• New_package: Jumpstart prototype.
Greek Names
Objective | Package(s)
Linear algebra objects | Epetra, Jpetra, Tpetra
Krylov solvers | AztecOO, Belos, Komplex
ILU-type preconditioners | AztecOO, IFPACK
Multilevel preconditioners | ML, CLAPS
Eigenvalue problems | Anasazi
Block preconditioners | Meros
Direct sparse linear solvers | Amesos
Direct dense solvers | Epetra, Teuchos, Pliris
Abstract interfaces | Thyra
Nonlinear system solvers | NOX, LOCA
Time Integrators/DAEs | Rythmos
C++ utilities, (some) I/O | Teuchos, EpetraExt, Kokkos
Trilinos Tutorial | Didasko
“Skins” | PyTrilinos, WebTrilinos, Star-P, Stratimikos, ForTrilinos
Optimization (SAND) | MOOCHO
Archetype package | NewPackage
Other new in 7.0 release | Galeri, Isorropia, Moertel, RTOp
Basic Linear Algebra classes
Block Krylov Methods
Scalable Preconditioners
3rd Party Direct Solver Package.
Abstract Interfaces.
Trilinos Package Summary
Why Packages?
Let’s Do Some Matrix Analysis!
Trilinos Development Team (Again)
(Same roster as on the Trilinos Development Team slide.)
Developer and Package Node IDs
Developer IDs:
RossBartlett = 1; PaulBoggs = 2; ToddCoffey = 3; JasonCross = 4;
DavidDay = 5; ClarkDohrmann = 6; MichaelGee = 7; BobHeaphy = 8;
MikeHeroux = 9; UlrichHetmaniuk = 10; RobHoekstra = 11; RussellHooper = 12;
VickiHowle = 13; JonathanHu = 14; SarahKnepper = 15; TammyKolda = 16;
JoeKotulski = 17; RichLehoucq = 18; KevinLong = 19; RogerPawlowski = 20;
MichaelPhenow = 21; EricPhipps = 22; MarzioSala = 23; AndrewSalinger = 24;
PaulSexton = 25; BillSpotz = 26; KenStanley = 27; HeidiThornquist = 28;
RayTuminaro = 29; JimWillenbring = 30; AlanWilliams = 31;
Package IDs:
PyTrilinos = 1; amesos = 2; anasazi = 3; aztecoo = 4;
belos = 5; claps = 6; didasko = 7; epetra = 8;
epetraext = 9; ifpack = 10; jpetra = 11; kokkos = 12;
komplex = 13; meros = 14; loca = 15; ml = 16;
new_package = 17; nox = 18; pliris = 19; rythmos = 20;
teuchos = 21; thyra = 22; tpetra = 23; trilinosframework = 24;
Developer-Package Edges
A(i,j) = 1 if developer i contributes to package j
A = sparse(31,24);
A(RossBartlett,rythmos) = 1; A(RossBartlett,thyra) = 1;
A(PaulBoggs,thyra) = 1;
A(ToddCoffey,rythmos) = 1;
A(JasonCross,jpetra) = 1;
A(DavidDay,komplex) = 1;
A(ClarkDohrmann,claps) = 1;
A(MichaelGee,ml) = 1; A(MichaelGee,nox) = 1;
A(BobHeaphy,trilinosframework) = 1;
A(MikeHeroux,trilinosframework) = 1; A(MikeHeroux,epetra) = 1; A(MikeHeroux,aztecoo) = 1; A(MikeHeroux,kokkos) = 1;
A(MikeHeroux,komplex) = 1; A(MikeHeroux,ifpack) = 1; A(MikeHeroux,thyra) = 1; A(MikeHeroux,tpetra) = 1;
A(MikeHeroux,amesos) = 1; A(MikeHeroux,belos) = 1; A(MikeHeroux,epetraext) = 1; A(MikeHeroux,jpetra) = 1;
A(UlrichHetmaniuk,anasazi) = 1;
A(RobHoekstra,epetra) = 1; A(RobHoekstra,thyra) = 1; A(RobHoekstra,tpetra) = 1; A(RobHoekstra,epetraext) = 1;
A(RussellHooper,nox) = 1;
A(VickiHowle,meros) = 1; A(VickiHowle,belos) = 1; A(VickiHowle,thyra) = 1;
A(JonathanHu,ml) = 1;
A(SarahKnepper,komplex) = 1;
A(TammyKolda,nox) = 1; A(TammyKolda,trilinosframework) = 1;
A(JoeKotulski,pliris) = 1;
A(RichLehoucq,anasazi) = 1; A(RichLehoucq,belos) = 1;
A(KevinLong,thyra) = 1; A(KevinLong,belos) = 1; A(KevinLong,teuchos) = 1;
A(RogerPawlowski,nox) = 1;
A(MichaelPhenow,trilinosframework) = 1; A(MichaelPhenow,trilinosframework) = 1;
A(EricPhipps,loca) = 1; A(EricPhipps,nox) = 1;
A(MarzioSala,didasko) = 1; A(MarzioSala,ifpack) = 1; A(MarzioSala,ml) = 1; A(MarzioSala,amesos) = 1;
A(AndrewSalinger,loca) = 1;
A(PaulSexton,epetra) = 1; A(PaulSexton,tpetra) = 1;
A(BillSpotz,PyTrilinos) = 1; A(BillSpotz,epetra) = 1; A(BillSpotz,new_package) = 1;
A(KenStanley,amesos) = 1; A(KenStanley,new_package) = 1;
A(HeidiThornquist,anasazi) = 1; A(HeidiThornquist,belos) = 1; A(HeidiThornquist,teuchos) = 1;
A(RayTuminaro,ml) = 1; A(RayTuminaro,meros) = 1;
A(JimWillenbring,epetra) = 1; A(JimWillenbring,new_package) = 1; A(JimWillenbring,trilinosframework) = 1;
A(AlanWilliams,epetra) = 1; A(AlanWilliams,epetraext) = 1; A(AlanWilliams,aztecoo) = 1; A(AlanWilliams,tpetra) = 1;
Developer-Package Matrix
Number of developers per package:
• Maximum: max(sum(A)) = 6
• Average: sum(sum(A))/24 = 2.875
Number of package affiliations per developer:
• Maximum: max(sum(A')) = 12 (me).
  • Minus outlier: = 4.
• Average: sum(sum(A'))/31 = 2.26
Observations:
• Small package teams.
• Multiple packages per developer.
spy(A);
Developer-Developer Matrix (A*A')
• B = A*A'. Apply Symmetric AMD reordering.
• Two developers connected if co-authors of a package.
• Only two developers “disconnected”:
  • Clark Dohrmann: CLAPS.
  • Joe Kotulski: Pliris.
[Figure: spy plot of the reordered matrix with labeled blocks: Komplex Subteam, ML Subteam, Anasazi Subteam, Epetra Subteam, NOX Subteam, Thyra Subteam.]
Package-Package Matrix (A'*A) (Not sure what to say about this)
• B = A'*A. Apply symamd.
• Two packages connected if they share a developer.
• Only two packages “disconnected”:
  • Clark Dohrmann: CLAPS.
  • Joe Kotulski: Pliris.
[Figure: two spy plots. Panel “Me”, labeled: Epetra, Amesos, Didasko, Ifpack. Panel “Without Me”, labeled: Meros, AztecOO, Tpetra, EpetraExt.]
Why Packages?
Partial Answer: Small Inter-related teams.
Package Interoperability
Interoperability (“Can Use”) vs. Dependence (“Depends On”)
• Although most Trilinos packages have no explicit dependence, each package must interact with some other packages:
  • NOX needs operator, vector and solver objects.
  • AztecOO needs preconditioner, matrix, operator and vector objects.
  • Interoperability is enabled at configure time. For example, NOX:
    --enable-nox-lapack   compile NOX lapack interface libraries
    --enable-nox-epetra   compile NOX epetra interface libraries
    --enable-nox-petsc    compile NOX petsc interface libraries
• Trilinos configure script is vehicle for:
  • Establishing interoperability of Trilinos components…
  • Without compromising individual package autonomy.
• Trilinos offers seven basic interoperability mechanisms.
../configure --enable-python
Trilinos Interoperability Mechanisms (Acquired as Package Matures)
1. Package builds under Trilinos configure scripts. => Package can be built as part of a suite of packages; cross-package interfaces enable/disable automatically.
2. Package accepts user data as Epetra or Thyra objects. => Applications using Epetra/Thyra can use package.
3. Package accepts parameters from Teuchos ParameterLists. => Applications using Teuchos ParameterLists can drive package.
4. Package can be used via Thyra abstract solver classes. => Applications or other packages using Thyra can use package.
5. Package can use Epetra for private data. => Package can then use other packages that understand Epetra.
6. Package accesses solver services via Thyra interfaces. => Package can then use other packages that implement Thyra interfaces.
7. Package available via PyTrilinos. => Package can be used with other Trilinos packages via Python.
“Can Use” vs. “Depends On”
“Can Use”:
• Interoperable without dependence.
• Dense is Good.
• Encouraged.
“Depends On”:
• OK, if essential.
• Epetra: 9 clients.
• Thyra, Teuchos, NOX: 2 clients.
• Discouraged.
Interoperability Example: ML
• ML: Multi-level Preconditioner Package.
• Primary Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala.
• No explicit, essential dependence on other Trilinos packages:
  • Uses abstract interfaces to matrix/operator objects.
  • Has independent configure/build process (but can be invoked at Trilinos level).
• Interoperable with other Trilinos packages and other libraries:
  • Accepts user data as Epetra matrices/vectors.
  • Can use Epetra for internal matrices/vectors.
  • Can be used via Thyra abstract interfaces.
  • Can be built via Trilinos configure/build process.
  • Available as preconditioner to all other Trilinos packages.
  • Can use IFPACK, Amesos, AztecOO objects as smoothers, coarse solvers.
  • Can be driven via Teuchos ParameterLists.
  • Available to PETSc users without dependence on any other Trilinos packages.
Package Maturation Process
Asynchronicity
Day 1 of Package Life
• CVS: Each package is self-contained in Trilinos/package/ directory.
• Bugzilla: Each package has its own Bugzilla product.
• Bonsai: Each package is browsable via Bonsai interface.
• Mailman: Each Trilinos package, including Trilinos itself, has four mail lists:
  • [email protected]: CVS commit emails. “Finger on the pulse” list.
  • [email protected]: Mailing list for developers.
  • [email protected]: Issues for package users.
  • [email protected]: Releases and other announcements specific to the package.
• New_package (optional): Customizable boilerplate for Autoconf/Automake/Doxygen/Python/Thyra/Epetra/TestHarness/Website.
Sample Package Maturation Process
Step | Example
Startup Steps:
Package added to CVS: Import existing code or start with new_package. | ML CVS repository migrated into Trilinos (July 2002).
Mail lists, Bugzilla Product, Bonsai database created. | ml-announce, ml-users, ml-developers, ml-checkins, ml-regression @software.sandia.gov created, linked to CVS (July 2002).
Maturation Steps:
Package builds with configure/make, Trilinos-compatible. | ML adopts Autoconf, Automake starting from new_package (June 2003).
Epetra objects recognized by package. | ML accepts user data as Epetra matrices and vectors (October 2002).
Package accessible via Thyra interfaces. | ML adaptors written for TSFCore_LinOp (Thyra) interface (May 2003).
Package uses Epetra for internal data. | ML able to generate Epetra matrices. Allows use of AztecOO, Amesos, Ifpack, etc. as smoothers and coarse grid solvers (Feb-June 2004).
Package parameters settable via Teuchos ParameterList. | ML gets manager class, driven via ParameterLists (June 2004).
Package usable from Python (PyTrilinos). | ML Python wrappers written using new_package template (April 2005).
Maturation Jumpstart: NewPackage
• NewPackage provides jump start to develop/integrate a new package.
• NewPackage is a “Hello World” program and website:
  • Simple but it does work with autotools.
  • Compiles and builds.
• NewPackage directory contains:
  • Commonly used directory structure: src, test, doc, example, config.
  • Working Autoconf/Automake files.
  • Documentation templates (doxygen).
  • Working regression test setup.
  • Working Python and Thyra adaptors.
• Substantially cuts down on:
  • Time to integrate new package.
  • Variation in package integration details.
  • Development of website.
NOTE: NewPackage can be used independent from Trilinos.
What Trilinos is not
• Trilinos is not a single monolithic piece of software. Each package:
  • Can be built independent of Trilinos.
  • Has its own self-contained CVS structure.
  • Has its own Bugzilla product and mail lists.
  • Development team is free to make its own decisions about algorithms, coding style, release contents, testing process, etc.
• Trilinos top layer is not a large amount of source code:
  • Trilinos repository (6.0 branch) contains: 660,378 source lines of code (SLOC).
  • Sum of the packages SLOC counts: 648,993.
  • Trilinos top layer SLOC count: 11,385 (1.7%).
• Trilinos is not “indivisible”:
  • You don’t need all of Trilinos to get things done.
  • Any collection of packages can be combined and distributed.
  • Current public release contains only 16 of the 20+ Trilinos packages.
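The top-layer figures quoted on this slide follow directly from the two SLOC totals. As a worked check, using only the numbers given above:

```latex
\begin{align*}
\text{top layer SLOC} &= 660{,}378 - 648{,}993 = 11{,}385 \\
\text{top layer share} &= \frac{11{,}385}{660{,}378} \approx 0.0172 \approx 1.7\%
\end{align*}
```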
Insight from History: A Philosophy for Future Directions
• In the early 1800’s U.S. had many new territories.
• Question: How to incorporate into U.S.?
  • Colonies? No.
  • Expand boundaries of existing states? No.
  • Create process for self-governing regions. Yes.
  • Theme: Local control drawing on national resources.
• Trilinos package architecture has some similarities:
  • Asynchronous maturation.
  • Packages decide degree of interoperations, use of Trilinos facilities.
• Strength of each: Scalable growth with local control.
Solver Collaboration: The Big Picture
Solver Software Components and Interfaces
Types of Software Components:
1) ANA : Abstract Numerical Algorithm (e.g. linear solvers, eigen solvers, nonlinear solvers, stability analysis, uncertainty quantification, transient solvers, optimization etc.)
   Example Trilinos Packages: Belos (linear solvers), Anasazi (eigen solvers), NOX (nonlinear equations), Rythmos (ODEs, DAEs), MOOCHO (Optimization), …
2) LAL : Linear Algebra Library (e.g. vectors, sparse matrices, sparse factorizations, preconditioners)
   Example Trilinos Packages: Epetra/Tpetra (Mat, Vec), Ifpack, AztecOO, ML (Preconditioners), Meros (Preconditioners), Pliris (Interface to direct solvers), Amesos (Direct solvers), Komplex (Complex/Real forms), …
3) APP : Application (the model: physics, discretization method etc.)
   Examples: SIERRA, NEVADA, Xyce, Sundance, …
Interfaces:
• ANA Interfaces to Linear Algebra (ANA Vector Interface, ANA Linear Operator Interface): Thyra
• ANA/APP Interface: Thyra::Nonlin
• APP to LAL Interfaces: FEI/Thyra
• LAL to LAL Interfaces: Custom/Thyra
[Figure: block diagram connecting APP, ANA and LAL (Matrix, Preconditioner, Vector) through these interfaces.]
LAL Foundation: Petra
• Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector).
• Petra provides distributed matrix and vector services.
• Has 3 implementations under development.
Petra Object Model
Abstract Interface to Parallel Machine:
• Shameless mimic of MPI interface.
• Keeps MPI dependence to a single class (through all of Trilinos!).
• Allows trivial serial implementation.
• Opens door to novel parallel libraries (shmem, UPC, etc…).
Abstract Interface for Sparse All-to-All Communication:
• Supports construction of pre-recorded “plan” for data-driven communications.
• Examples:
  • Supports gathering/scatter of off-processor x/y values when computing y = Ax.
  • Gathering overlap rows for Overlapping Schwarz.
  • Redistribution of matrices, vectors, etc…
Describes layout of distributed objects:
• Vectors: Number of vector entries on each processor and global ID.
• Matrices/graphs: Rows/Columns managed by a processor.
• Called “Maps” in Epetra.
Perform redistribution of distributed objects:
• Parallel permutations.
• “Ghosting” of values for local computations.
• Collection of partial results from remote processors.
Dense Distributed Vector and Matrices:
• Simple local data structure.
• BLAS-able, LAPACK-able.
• Ghostable, redistributable.
• RTOp-able.
Base Class for All Distributed Objects:
• Performs all communication.
• Requires Check, Pack, Unpack methods from derived class.
Graph class for structure-only computations:
• Reusable matrix structure.
• Pattern-based preconditioners.
• Pattern-based load balancing tools.
Basic sparse matrix class:
• Flexible construction process.
• Arbitrary entry placement on parallel machine.
Petra Implementations
Three versions under development:
• Epetra (Essential Petra):
  • Current production version.
  • Restricted to real, double precision arithmetic.
  • Uses stable core subset of C++ (circa 2000).
  • Interfaces accessible to C and Fortran users.
• Tpetra (Templated Petra):
  • Next generation C++ version.
  • Templated scalar and ordinal fields.
  • Uses namespaces, and STL: Improved usability/efficiency.
• Jpetra (Java Petra):
  • Pure Java. Portable to any JVM.
  • Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces.
Data Model Wrap-Up
Our target apps require flexible data model. Petra Object Model (POM) supports:
Arbitrary placement of vector, graph, matrix entries on parallel machine.
Arbitrary redistribution, ghosting and collection of distributed data.
This flexibility is needed by LALs: Algebraic and multi-level preconditioners. Concrete distributed matrix kernels. Direct methods.
POM is complex: Non-LALs (ANAs) do not rely on it.
Abstract Numerical Algorithms (ANAs)
Full “Vertical” Solver Coverage
· Transient Problems (DAEs/ODEs): Rythmos
· Nonlinear Problems: NOX (nonlinear equations), LOCA (stability analysis)
· Explicit Linear Problems (matrix/graph equations, vector problems): Epetra, Tpetra
· Implicit Linear Problems: AztecOO, Belos, Ifpack, ML, etc. (linear equations); Anasazi (eigen problems)
· Optimization Problems (unconstrained, constrained): MOOCHO
Goal of Abstraction: Flexibility
Linear solvers (Direct, Iterative, Preconditioners) Abstraction of basic vector/matrix operations (dot, axpy, mv). Can use any concrete linear algebra library (Epetra, PETSc, BLAS).
Nonlinear solvers (Newton, etc.) + Abstraction of linear solve (solve Ax=b). Can use any concrete linear solver library (AztecOO, ML, PETSc, LAPACK).
Transient/DAE solvers (implicit) + Abstraction of nonlinear solve. … and so on.
But how do we really make it work, and retain high performance? Our answer: Thyra.
Introducing Abstract Numerical Algorithms
What is an abstract numerical algorithm (ANA)?
An ANA is a numerical algorithm that can be expressed abstractly solely in terms of vectors, vector spaces, and linear operators (i.e. not with direct element access).
Example Linear ANA (LANA): Linear Conjugate Gradients
[Figure: Linear Conjugate Gradient Algorithm, annotated with types of operations and types of objects.]
Types of operations:
• linear operator applications
• vector-vector operations
• scalar operations
• scalar product <x,y> defined by vector space
Examples of Nonlinear Abstract Numerical Algorithms
Class of Numerical Problem | Example ANA
Nonlinear equations | Newton’s method (e.g. NOX, MOOCHO)
Initial Value DAE/ODE | Backward Euler method (e.g. Rythmos)
Linear equations | GMRES (e.g. Belos)
Requires: the Initial Value DAE/ODE method requires nonlinear equation solves; Newton’s method requires linear equation solves.
Basic Thyra Vector and Operator Interfaces
[Figure: UML class diagram of OpBase, LinearOp, VectorSpace, Vector, MultiVector and RTOpT, with associations range, domain, space, columns (1..*) and <<create>>.]
• LinearOps apply to Vectors.
• An operator knows its domain and range spaces.
• A linear operator is a kind of operator.
• A Vector knows its VectorSpace.
• VectorSpaces create Vectors!
Basic Thyra Vector and Operator Interfaces
[Figure: Thyra UML class diagram (OpBase, LinearOp, VectorSpace, Vector, MultiVector, RTOpT).]
• Basically compatible with HCL (Symes et al.) and many other operator/vector interfaces.
• Near optimal for many but not all abstract numerical algorithms (ANAs).
• What’s missing? => Multi-vectors!
Introducing Multi-Vectors
What is a multi-vector?
• An m multi-vector V is a tall thin dense matrix composed of m column vectors $v_j$.
• Example: m = 4 columns: $V = [\, v_1 \; v_2 \; v_3 \; v_4 \,]$
What ANAs can exploit multi-vectors?
• Block linear solvers (e.g. block GMRES)
• Block eigen solvers (i.e. block Arnoldi)
• Compact limited memory quasi-Newton
• Tensor methods for nonlinear equations
Why are multi-vectors important?
• Cache performance
• Reduce global communication
Examples of multi-vector operations:
• Operator applications (i.e. mat-vecs): $Y = A X$
• Block dot products ($m^2$): $Q = X^T Y$
• Block update: $Y = Y + X Q$
Basic Thyra Vector and Operator Interfaces
[Figure: Thyra UML class diagram (OpBase, LinearOp, VectorSpace, Vector, MultiVector, RTOpT).]
Where do multi-vectors fit in?
Basic Thyra Vector and Operator Interfaces
[Figure: Thyra UML class diagram (OpBase, LinearOp, VectorSpace, Vector, MultiVector, RTOpT).]
• A MultiVector is a linear operator.
• A MultiVector has a collection of column vectors.
• A MultiVector is a tall thin dense matrix.
• VectorSpaces create MultiVectors.
• LinearOps apply to MultiVectors.
• A Vector is a MultiVector.
Basic Thyra Vector and Operator Interfaces
[Figure: Thyra UML class diagram (OpBase, LinearOp, VectorSpace, Vector, MultiVector, RTOpT).]
• What about standard vector ops?
  • Reductions (norm, dot etc.)?
  • Transformations (axpy, scaling etc.)?
• What about specialized vector ops?
  • e.g. Interior point methods for opt
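What’s the Big Deal about Vector-Vector Operations?
Examples from OOQP (Gertz, Wright):
$y_i \leftarrow y_i / x_i, \quad i = 1 \ldots n$
$y_i \leftarrow y_i x_i z_i, \quad i = 1 \ldots n$
$y_i \leftarrow \begin{cases} y^{\min} - y_i & \text{if } y_i < y^{\min} \\ y^{\max} - y_i & \text{if } y_i > y^{\max} \\ 0 & \text{if } y^{\min} \le y_i \le y^{\max} \end{cases}, \quad i = 1 \ldots n$
$\alpha \leftarrow \max \{ \alpha : x + \alpha d \ge \beta \}$
Example from TRICE (Dennis, Heinkenschloss, Vicente):
$d_i \leftarrow \begin{cases} (b - u)_i^{1/2} & \text{if } w_i < 0 \text{ and } b_i < +\infty \\ 1 & \text{if } w_i < 0 \text{ and } b_i = +\infty \\ (u - a)_i^{1/2} & \text{if } w_i \ge 0 \text{ and } a_i > -\infty \\ 1 & \text{if } w_i \ge 0 \text{ and } a_i = -\infty \end{cases}, \quad i = 1 \ldots n$
Example from IPOPT (Waechter):
[Equation: three-case element-wise update of $x_i$, $i = 1 \ldots n$, in terms of the bounds $x_i^L$, $x_i^U$ and the shifted bounds $\hat{x}_i^L$, $\hat{x}_i^U$, which are defined using max and min.]
Currently in MOOCHO: > 40 vector operations!
Many different and unusual vector operations are needed by interior point methods for optimization!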
Vector Reduction/Transformation Operators Defined
Reduction/Transformation Operators (RTOp) Defined
• $z_i^1 \ldots z_i^q \leftarrow \mathrm{op}_t(i, v_i^1 \ldots v_i^p, z_i^1 \ldots z_i^q)$ : element-wise transformation
• $\beta \leftarrow \mathrm{op}_r(i, v_i^1 \ldots v_i^p, z_i^1 \ldots z_i^q)$ : element-wise reduction
• $\beta_2 \leftarrow \mathrm{op}_{rr}(\beta_1, \beta_2)$ : reduction of intermediate reduction objects
• $v^1 \ldots v^p \in \Re^n$ : p non-mutable (constant) input vectors
• $z^1 \ldots z^q \in \Re^n$ : q mutable (non-constant) input/output vectors
• $\beta$ : reduction target object (may be non-scalar (e.g. $\{y_k, k\}$), or NULL)
Key to Optimal Performance
• $\mathrm{op}_t(\ldots)$ and $\mathrm{op}_r(\ldots)$ applied to entire sets of subvectors ($i = a \ldots b$) independently: $z_{a:b}^1 \ldots z_{a:b}^q, \beta \leftarrow \mathrm{op}(a, b, v_{a:b}^1 \ldots v_{a:b}^p, z_{a:b}^1 \ldots z_{a:b}^q, \beta)$
• Communication between sets of subvectors only for $\beta \ne$ NULL: $\beta_2 \leftarrow \mathrm{op}_{rr}(\beta_1, \beta_2)$
Software Implementation (see RTOp paper on Trilinos website “Publications”)
• “Visitor” design pattern (a.k.a. function forwarding)
• RTOpT: apply_op(in sv[1..p], inout sz[1…q], inout beta)
  • Subclasses: ROpDotProd: apply_op(…); TOpAssignVectors: apply_op(…); …
• Vector: applyOp(in : RTOpT, in v[1..p], inout z[1…q], inout beta)
  • Subclasses: SerialVectorStd: applyOp(…); MPIVectorStd: applyOp(…); …
Basic Thyra Vector and Operator Interfaces
[Figure: Thyra UML class diagram (OpBase, LinearOp, VectorSpace, Vector, MultiVector, RTOpT).]
The missing piece: Reduction/Transformation Operators
• Supports all needed vector operations
• Data/parallel independence
• Optimal performance
R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, ACM TOMS, March 2004
Managing Parallel Data
The Petra Object Model
A Simple Epetra/AztecOO Program
// Header files omitted…
int main(int argc, char *argv[]) {
  MPI_Init(&argc,&argv); // Initialize MPI, MpiComm
  Epetra_MpiComm Comm( MPI_COMM_WORLD );

  // ***** Map puts same number of equations on each pe *****
  int NumMyElements = 1000 ;
  Epetra_Map Map(-1, NumMyElements, 0, Comm);
  int NumGlobalElements = Map.NumGlobalElements();

  // ***** Create an Epetra_Matrix tridiag(-1,2,-1) *****
  Epetra_CrsMatrix A(Copy, Map, 3);
  double negOne = -1.0;
  double posTwo = 2.0;
  for (int i=0; i<NumMyElements; i++) {
    int GlobalRow = A.GRID(i);
    int RowLess1 = GlobalRow - 1;
    int RowPlus1 = GlobalRow + 1;
    if (RowLess1!=-1)
      A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1);
    if (RowPlus1!=NumGlobalElements)
      A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1);
    A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow);
  }
  A.FillComplete(); // Transform from GIDs to LIDs

  // ***** Create x and b vectors *****
  Epetra_Vector x(Map);
  Epetra_Vector b(Map);
  b.Random(); // Fill RHS with random #s

  // ***** Create Linear Problem *****
  Epetra_LinearProblem problem(&A, &x, &b);

  // ***** Create/define AztecOO instance, solve *****
  AztecOO solver(problem);
  solver.SetAztecOption(AZ_precond, AZ_Jacobi);
  solver.Iterate(1000, 1.0E-8);

  // ***** Report results, finish ***********************
  cout << "Solver performed " << solver.NumIters()
       << " iterations." << endl
       << "Norm of true residual = " << solver.TrueResidual() << endl;

  MPI_Finalize() ;
  return 0;
}
Typical Flow of Epetra Object Construction
1. Construct Comm
   • Any number of Comm objects can exist.
   • Comms can be nested (e.g., serial within MPI).
2. Construct Map
   • Maps describe parallel layout.
   • Maps typically associated with more than one comp object.
   • Two maps (source and target) define an export/import object.
3. Construct x, Construct b, Construct A
   • Computational objects.
   • Compatibility assured via common map.
Parallel Data Redistribution
• Epetra vectors, multivectors, graphs and matrices are distributed via one of the map objects.
• A map is basically a partitioning of a list of global IDs:
  • IDs are simply labels, no need to use contiguous values (Directory class handles details for general ID lists).
  • No a priori restriction on replicated IDs.
• If we are given:
  • A source map and
  • A set of vectors, multivectors, graphs and matrices (or other distributable objects) based on source map.
• Redistribution is performed by:
  1. Specifying a target map with a new distribution of the global IDs.
  2. Creating Import or Export object using the source and target maps.
  3. Creating vectors, multivectors, graphs and matrices that are redistributed (to target map layout) using the Import/Export object.
Example: epetra/ex9.cpp
#include <cstdlib>
#include <iostream>
#include "mpi.h"
#include "Epetra_MpiComm.h"
#include "Epetra_Map.h"
#include "Epetra_IntSerialDenseVector.h"
#include "Epetra_Vector.h"
#include "Epetra_Export.h"
using namespace std;

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
  int NumGlobalElements = 4; // global dimension of the problem
  int NumMyElements; // local nodes
  Epetra_IntSerialDenseVector MyGlobalElements;

  if( Comm.MyPID() == 0 ) {
    NumMyElements = 3;
    MyGlobalElements.Size(NumMyElements);
    MyGlobalElements[0] = 0;
    MyGlobalElements[1] = 1;
    MyGlobalElements[2] = 2;
  } else {
    NumMyElements = 3;
    MyGlobalElements.Size(NumMyElements);
    MyGlobalElements[0] = 1;
    MyGlobalElements[1] = 2;
    MyGlobalElements[2] = 3;
  }
  // create a map
  Epetra_Map Map(-1,MyGlobalElements.Length(),
                 MyGlobalElements.Values(),0, Comm);

  // create a vector based on map
  Epetra_Vector xxx(Map);
  for( int i=0 ; i<NumMyElements ; ++i )
    xxx[i] = 10*( Comm.MyPID()+1 );
  if( Comm.MyPID() == 0 ){
    double val = 12;
    int pos = 3;
    xxx.SumIntoGlobalValues(1,0,&val,&pos);
  }
  cout << xxx;

  // create a target map, in which all elements are on proc 0
  int NumMyElements_target;
  if( Comm.MyPID() == 0 )
    NumMyElements_target = NumGlobalElements;
  else
    NumMyElements_target = 0;
  Epetra_Map TargetMap(-1,NumMyElements_target,0,Comm);
  Epetra_Export Exporter(Map,TargetMap);

  // work on vectors
  Epetra_Vector yyy(TargetMap);
  yyy.Export(xxx,Exporter,Add);
  cout << yyy;

  MPI_Finalize();
  return( EXIT_SUCCESS );
}
Output: epetra/ex9.cpp
> mpirun -np 2 ./ex9.exe
Epetra::Vector
  MyPID  GID  Value
  0      0    10
  0      1    10
  0      2    10
Epetra::Vector
  1      1    20
  1      2    20
  1      3    20
Epetra::Vector
  MyPID  GID  Value
  0      0    10
  0      1    30
  0      2    30
  0      3    20
Epetra::Vector

Before Export:
  PE 0: xxx(0)=10, xxx(1)=10, xxx(2)=10
  PE 1: xxx(1)=20, xxx(2)=20, xxx(3)=20
Export/Add
After Export:
  PE 0: yyy(0)=10, yyy(1)=30, yyy(2)=30, yyy(3)=20
  PE 1: (no entries)
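The Export/Add step in this example can be mimicked without MPI or Epetra. The following is a minimal single-process sketch, assuming a hypothetical `LocalVector` type (a GID-to-value table per processor) and a hypothetical `exportAdd` helper; neither is part of Epetra. It only illustrates the combine rule: each entry is sent to the processor that owns its GID in the target map, and entries that land on the same GID are summed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one processor's piece of a distributed vector:
// GID -> value. Not an Epetra type.
using LocalVector = std::map<int, double>;

// Export with the Add combine mode, simulated in one address space.
// source[p] holds the entries stored on processor p under the source map;
// targetGids[p] lists the GIDs owned by processor p under the target map
// (assumed 1-to-1).
std::vector<LocalVector> exportAdd(
    const std::vector<LocalVector>& source,
    const std::vector<std::vector<int>>& targetGids) {
  assert(source.size() == targetGids.size());

  // Record the owner of each GID in the target layout and start at zero.
  std::map<int, std::size_t> owner;
  std::vector<LocalVector> target(targetGids.size());
  for (std::size_t p = 0; p < targetGids.size(); ++p) {
    for (int gid : targetGids[p]) {
      owner[gid] = p;
      target[p][gid] = 0.0;
    }
  }

  // Every sender pushes its entries to the owner; duplicates are summed.
  for (const LocalVector& piece : source) {
    for (const auto& entry : piece) {
      auto it = owner.find(entry.first);
      if (it != owner.end()) target[it->second][entry.first] += entry.second;
    }
  }
  return target;
}

// The layout of epetra/ex9.cpp: PE 0 holds GIDs 0..2 with value 10,
// PE 1 holds GIDs 1..3 with value 20, and the target map puts all on PE 0.
std::vector<LocalVector> ex9Demo() {
  std::vector<LocalVector> xxx(2);
  for (int gid = 0; gid <= 2; ++gid) xxx[0][gid] = 10.0;
  for (int gid = 1; gid <= 3; ++gid) xxx[1][gid] = 20.0;
  std::vector<std::vector<int>> targetGids(2);
  targetGids[0] = {0, 1, 2, 3};
  return exportAdd(xxx, targetGids);
}
```

Calling `ex9Demo()` reproduces the values shown for yyy above: GIDs 1 and 2 exist on both processors, so their contributions 10 and 20 are added.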
Import vs. Export
Import (Export) means calling processor knows what it wants to receive (send).
Distinction between Import/Export is important to user, almost identical in implementation.
Import (Export) objects can be used to do an Export (Import) as a reverse operation.
When mapping is bijective (1-to-1 and onto), either Import or Export is appropriate.
Example: 1D Matrix Assembly
$$ -u_{xx} = f, \qquad u(a) = 0, \qquad u(b) = 1 $$
[Figure: the interval from a to b with nodes x1, x2, x3, split between PE 0 and PE 1.]
• 3 Equations: Find u at x1, x2 and x3
• Equation for u at x2 gets a contribution from PE 0 and PE 1.
• Would like to compute partial contributions independently.
• Then combine partial results.
Two Maps
We need two maps:
Assembly map:
• PE 0: { 1, 2 }.
• PE 1: { 2, 3 }.
Solver map:
• PE 0: { 1, 2 } (we arbitrate ownership of 2).
• PE 1: { 3 }.
End of Assembly Phase
At the end of the assembly phase we have AssemblyMatrix:
On PE 0:
  Equation 1:  2 1 0
  Equation 2:  1 1 0
On PE 1:
  Equation 2:  0 1 1
  Equation 3:  0 1 2
Row 2 is shared.
We want to assign all of Equation 2 to PE 0 for use with the solver.
NOTE: For a class of Neumann-Neumann preconditioners, the above layout is exactly what we want.
Export Assembly Matrix to Solver Matrix
Epetra_Export Exporter(AssemblyMap, SolverMap);
Epetra_CrsMatrix SolverMatrix (Copy, SolverMap, 0);
SolverMatrix.Export(AssemblyMatrix, Exporter, Add);
SolverMatrix.FillComplete();
Matrix Export
Before Export:
  PE 0:
    Equation 1:  2 1 0
    Equation 2:  1 1 0
  PE 1:
    Equation 2:  0 1 1
    Equation 3:  0 1 2
Export/Add
After Export:
  PE 0:
    Equation 1:  2 1 0
    Equation 2:  1 2 1
  PE 1:
    Equation 3:  0 1 2
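The row-wise effect of exporting the assembly matrix with Add can be sketched in plain C++ as well. This is an illustration under assumptions, not Epetra code: `LocalRows` and `exportRowsAdd` are hypothetical names, and the matrix values are the ones printed on the slides (zero entries omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container for one processor's rows of a sparse matrix:
// row GID -> (column GID -> value). Not an Epetra type.
using LocalRows = std::map<int, std::map<int, double>>;

// Simulated Export with Add: every assembled row is sent to the processor
// that owns that row in the solver map; colliding entries are summed.
std::vector<LocalRows> exportRowsAdd(
    const std::vector<LocalRows>& assembly,
    const std::vector<std::vector<int>>& solverRows) {
  assert(assembly.size() == solverRows.size());
  std::map<int, std::size_t> owner;
  for (std::size_t p = 0; p < solverRows.size(); ++p)
    for (int gid : solverRows[p]) owner[gid] = p;

  std::vector<LocalRows> solver(solverRows.size());
  for (const LocalRows& piece : assembly) {
    for (const auto& row : piece) {
      auto it = owner.find(row.first);
      if (it == owner.end()) continue;  // row absent from the solver map
      for (const auto& col : row.second)
        solver[it->second][row.first][col.first] += col.second;
    }
  }
  return solver;
}

// Assembly map {1,2} / {2,3}; solver map {1,2} / {3}.
std::vector<LocalRows> assemblyDemo() {
  std::vector<LocalRows> assembly(2);
  assembly[0][1][1] = 2.0; assembly[0][1][2] = 1.0;  // Equation 1 on PE 0
  assembly[0][2][1] = 1.0; assembly[0][2][2] = 1.0;  // Equation 2 on PE 0
  assembly[1][2][2] = 1.0; assembly[1][2][3] = 1.0;  // Equation 2 on PE 1
  assembly[1][3][2] = 1.0; assembly[1][3][3] = 2.0;  // Equation 3 on PE 1
  std::vector<std::vector<int>> solverRows(2);
  solverRows[0] = {1, 2};
  solverRows[1] = {3};
  return exportRowsAdd(assembly, solverRows);
}
```

After the call, PE 0 holds Equation 2 as (1, 2, 1): the two partial rows (1 1 0) and (0 1 1) have been summed, and PE 1 keeps only Equation 3.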
Example: epetraext/ex2.cpp
int main(int argc, char *argv[]) {
MPI_Init(&argc,&argv);
Epetra_MpiComm Comm (MPI_COMM_WORLD);
int MyPID = Comm.MyPID();
int n=4;
// Generate Laplacian2d gallery matrix
Trilinos_Util::CrsMatrixGallery G("laplace_2d", Comm);
G.Set("problem_size", n*n);
G.Set("map_type", "linear"); // Linear map initially
// Get the LinearProblem.
Epetra_LinearProblem *Prob = G.GetLinearProblem();
// Get the exact solution.
Epetra_MultiVector *sol = G.GetExactSolution();
// Get the rhs (b) and lhs (x)
Epetra_MultiVector *b = Prob->GetRHS();
Epetra_MultiVector *x = Prob->GetLHS();
// Repartition graph using Zoltan
EpetraExt::Zoltan_CrsGraph * ZoltanTrans = new EpetraExt::Zoltan_CrsGraph();
EpetraExt::LinearProblem_GraphTrans * ZoltanLPTrans =
new EpetraExt::LinearProblem_GraphTrans( *(dynamic_cast<EpetraExt::StructuralSameTypeTransform<Epetra_CrsGraph>*>(ZoltanTrans)) );
cout << "Creating Load Balanced Linear Problem\n";
Epetra_LinearProblem &BalancedProb = (*ZoltanLPTrans)(*Prob);
// Get the rhs (b) and lhs (x) of the balanced problem
Epetra_MultiVector *Balancedb = BalancedProb.GetRHS();
Epetra_MultiVector *Balancedx = BalancedProb.GetLHS();
cout << "Balanced b: " << *Balancedb << endl;
cout << "Balanced x: " << *Balancedx << endl;
MPI_Finalize() ;
return 0 ;
}
Need for Import/Export
Solvers for complex engineering applications need expressive, easy-to-use parallel data redistribution:
• Allows better scaling for non-uniform overlapping Schwarz.
• Necessary for robust solution of multiphysics problems.
We have found import and export facilities to be a very natural and powerful technique to address these issues.
Details about Epetra Maps
Getting beyond standard use case…
1-to-1 Maps
1-to-1 map (defn): A map is 1-to-1 if each GID appears only once in the map (and is therefore associated with only a single processor).
Certain operations in parallel data repartitioning require 1-to-1 maps. Specifically:
• The source map of an import must be 1-to-1.
• The target map of an export must be 1-to-1.
• The domain map of a 2D object must be 1-to-1.
• The range map of a 2D object must be 1-to-1.
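The definition can be checked mechanically. The sketch below models a map as one GID list per processor (a hypothetical representation chosen for illustration, not how Epetra stores maps) and tests whether any GID occurs more than once.

```cpp
#include <cassert>
#include <set>
#include <vector>

// A map modeled as one GID list per processor (illustrative only).
using GidLists = std::vector<std::vector<int>>;

// Returns true if every GID appears exactly once across all processors.
bool isOneToOne(const GidLists& layout) {
  std::set<int> seen;
  for (const auto& gids : layout)
    for (int gid : gids)
      if (!seen.insert(gid).second) return false;  // GID was already seen
  return true;
}
```

With the maps from the 1D assembly example, the assembly map { {1, 2}, {2, 3} } is not 1-to-1 because GID 2 is replicated, while the solver map { {1, 2}, {3} } is 1-to-1 and is therefore a valid export target.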
2D Objects: Four Maps
Epetra 2D objects:
• CrsMatrix
• CrsGraph
• VbrMatrix
Have four maps:
• RowMap: On each processor, the GIDs of the rows that processor will “manage”. (Typically a 1-to-1 map.)
• ColMap: On each processor, the GIDs of the columns that processor will “manage”. (Typically NOT a 1-to-1 map.)
• DomainMap: The layout of domain objects (the x vector/multivector in y=Ax). (Must be a 1-to-1 map!)
• RangeMap: The layout of range objects (the y vector/multivector in y=Ax). (Must be a 1-to-1 map!)
Sample Problem
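$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
=
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},
\qquad y = A x
$$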
Case 1: Standard Approach
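First 2 rows of A, elements of y and elements of x, kept on PE 0. Last row of A, element of y and element of x, kept on PE 1.

PE 0 Contents:
$$ y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} $$
RowMap = {0, 1}, ColMap = {0, 1, 2}, DomainMap = {0, 1}, RangeMap = {0, 1}

PE 1 Contents:
$$ y = \begin{pmatrix} y_3 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 0 & 1 & 2 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_3 \end{pmatrix} $$
RowMap = {2}, ColMap = {1, 2}, DomainMap = {2}, RangeMap = {2}

Notes:
• Rows are wholly owned.
• RowMap=DomainMap=RangeMap (all 1-to-1).
• ColMap is NOT 1-to-1.
• Call to FillComplete: A.FillComplete(); // Assumes

Original Problem: $y = A x$ with
$$ A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} $$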
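Case 2: Twist 1

First 2 rows of A, first element of y and last 2 elements of x, kept on PE 0. Last row of A, last 2 elements of y and first element of x, kept on PE 1.

PE 0 Contents:
$$ y = \begin{pmatrix} y_1 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_2 \\ x_3 \end{pmatrix} $$
RowMap = {0, 1}, ColMap = {0, 1, 2}, DomainMap = {1, 2}, RangeMap = {0}

PE 1 Contents:
$$ y = \begin{pmatrix} y_2 \\ y_3 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 0 & 1 & 2 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_1 \end{pmatrix} $$
RowMap = {2}, ColMap = {1, 2}, DomainMap = {0}, RangeMap = {1, 2}

Notes:
• Rows are wholly owned.
• RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1).
• ColMap is NOT 1-to-1.
• Call to FillComplete: A.FillComplete(DomainMap, RangeMap);

Original Problem: $y = A x$ with
$$ A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} $$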
Case 2: Twist 2

First row of A, part of second row of A, first element of y and last 2 elements of x, kept on PE 0.
Last row, part of second row of A, last 2 elements of y and first element of x, kept on PE 1.

PE 0 Contents:
$$ y = \begin{pmatrix} y_1 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_2 \\ x_3 \end{pmatrix} $$
RowMap = {0, 1}, ColMap = {0, 1}, DomainMap = {1, 2}, RangeMap = {0}

PE 1 Contents:
$$ y = \begin{pmatrix} y_2 \\ y_3 \end{pmatrix}, \ldots \quad A = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}, \ldots \quad x = \begin{pmatrix} x_1 \end{pmatrix} $$
RowMap = {1, 2}, ColMap = {1, 2}, DomainMap = {0}, RangeMap = {1, 2}

Notes:
• Rows are NOT wholly owned.
• RowMap is NOT = DomainMap is NOT = RangeMap (DomainMap and RangeMap are 1-to-1).
• RowMap and ColMap are NOT 1-to-1.
• Call to FillComplete: A.FillComplete(DomainMap, RangeMap);

Original Problem: $y = A x$ with
$$ A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix} $$
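How the four maps cooperate in y = Ax for the Twist 2 layout can be traced with a single-process sketch. The types and function names below are hypothetical, and the three steps (gather x from its DomainMap owners, multiply locally, sum partial row results onto the RangeMap owners) are a simplified reading of what a distributed multiply has to do, not a transcription of Epetra's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-processor data for this illustration (not Epetra types).
struct ProcessorData {
  std::map<int, std::map<int, double>> rows;  // row GID -> (col GID -> value)
  std::map<int, double> xOwned;               // x entries owned (DomainMap)
  std::vector<int> rangeGids;                 // y entries owned (RangeMap)
};

std::vector<std::map<int, double>> distributedMultiply(
    const std::vector<ProcessorData>& pes) {
  // Step 1 (import): the DomainMap is 1-to-1, so each x entry has one owner.
  std::map<int, double> xAll;
  for (const ProcessorData& pe : pes)
    for (const auto& e : pe.xOwned) xAll[e.first] = e.second;

  // Lay out y according to the RangeMap, initialized to zero.
  std::map<int, std::size_t> yOwner;
  std::vector<std::map<int, double>> y(pes.size());
  for (std::size_t p = 0; p < pes.size(); ++p)
    for (int gid : pes[p].rangeGids) { yOwner[gid] = p; y[p][gid] = 0.0; }

  for (const ProcessorData& pe : pes) {
    for (const auto& row : pe.rows) {
      // Step 2: local product over the column GIDs this PE stores (ColMap).
      double partial = 0.0;
      for (const auto& col : row.second)
        partial += col.second * xAll.at(col.first);
      // Step 3 (export with Add): partial sums go to the RangeMap owner.
      y[yOwner.at(row.first)][row.first] += partial;
    }
  }
  return y;
}

// Twist 2 layout with x = (1, 2, 3), using 0-based GIDs as in the maps.
std::vector<std::map<int, double>> twist2Demo() {
  std::vector<ProcessorData> pes(2);
  pes[0].rows[0][0] = 2.0; pes[0].rows[0][1] = 1.0;  // row 0: 2 1 0
  pes[0].rows[1][0] = 1.0; pes[0].rows[1][1] = 1.0;  // part of row 1: 1 1 0
  pes[0].xOwned[1] = 2.0;  pes[0].xOwned[2] = 3.0;   // DomainMap {1, 2}
  pes[0].rangeGids.push_back(0);                     // RangeMap {0}
  pes[1].rows[1][1] = 1.0; pes[1].rows[1][2] = 1.0;  // part of row 1: 0 1 1
  pes[1].rows[2][1] = 1.0; pes[1].rows[2][2] = 2.0;  // row 2: 0 1 2
  pes[1].xOwned[0] = 1.0;                            // DomainMap {0}
  pes[1].rangeGids.push_back(1);                     // RangeMap {1, 2}
  pes[1].rangeGids.push_back(2);
  return distributedMultiply(pes);
}
```

For x = (1, 2, 3) the shared row 1 receives 3 from PE 0 and 5 from PE 1, which sum to 8, the same value the undistributed product gives.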
Overview of Trilinos Packages
Trilinos Common Language: Petra
• Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector).
• Petra¹ provides distributed matrix and vector services.
• Exists in basic form as an object model:
  • Describes basic user and support classes in UML, independent of language/implementation.
  • Describes objects and relationships to build and use matrices, vectors and graphs.
  • Has 3 implementations under development.
¹ Petra is Greek for “foundation”.
Petra Implementations
Three versions under development:
• Epetra (Essential Petra):
  • Current production version.
  • Restricted to real, double precision arithmetic.
  • Uses stable core subset of C++ (circa 2000).
  • Interfaces accessible to C and Fortran users.
• Tpetra (Templated Petra):
  • Next generation C++ version.
  • Templated scalar and ordinal fields.
  • Uses namespaces and STL: Improved usability/efficiency.
• Jpetra (Java Petra):
  • Pure Java. Portable to any JVM.
  • Interfaces to Java versions of MPI, LAPACK and BLAS.
Developers: Mike Heroux, Rob Hoekstra, Alan Williams, Paul Sexton
Epetra
• Basic Stuff: What you would expect.
• Variable block matrix data structures.
• Multivectors.
• Arbitrary index labeling.
• Flexible, versatile parallel data redistribution.
• Language support for inheritance, polymorphism and extensions.
• View vs. Copy.
AztecOO
• Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,…
• Incomplete factorization preconditioners.
Aztec is the workhorse solver at Sandia:
• Extracted from the MPSalsa reacting flow code.
• Installed in dozens of Sandia apps.
• 1900+ external licenses.
AztecOO improves on Aztec by:
• Using Epetra objects for defining matrix and RHS.
• Providing more preconditioners/scalings.
• Using C++ class design to enable more sophisticated use.
AztecOO interface allows:
• Continued use of Aztec for functionality.
• Introduction of new solver capabilities outside of Aztec.
Developers: Mike Heroux, Alan Williams, Ray Tuminaro
IFPACK: Algebraic Preconditioners
• Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves.
• Accepts user matrix via abstract matrix interface (Epetra versions).
• Uses Epetra for basic matrix/vector calculations.
• Supports simple perturbation stabilizations and condition estimation.
• Separates graph construction from factorization, which improves performance substantially.
• Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML.
Developers: Marzio Sala, Mike Heroux
Amesos
Interface to direct solvers for distributed sparse linear systems.
Challenges:
• No single solver dominates.
• Different interfaces and data formats, serial and parallel.
• Interfaces often change between revisions.
Amesos offers:
• A single, clear, consistent interface to various packages.
• Common look-and-feel for all classes.
• Separation from specific solver details.
• Use of serial and distributed solvers; Amesos takes care of data redistribution.
Developers: Ken Stanley, Marzio Sala, Tim Davis
Calling Amesos
Epetra_LinearProblem Problem( A, x, b);
Amesos Afact;
Amesos_BaseSolver* Solver = Afact.Create( "Klu", Problem ) ;
Solver->SymbolicFactorization( ) ;
Solver->NumericFactorization( ) ;
Solver->Solve( ) ;
Amesos: Supported Libraries
Library Name      Language  Communicator  # procs for solution¹
KLU               C         Serial        1
Paraklete         C         MPI           any
UMFPACK 4.4       C         Serial        1
SuperLU 3.0       C         Serial        1
SuperLU_DIST 2.0  C         MPI           any
MUMPS 4.6.2       F90       MPI           any
ScaLAPACK 1.7     F77       MPI           any
Others: DSCPACK, PARDISO, TAUCS
¹ Matrices can be distributed over any number of processes.
ML: Multi-level Preconditioners
• Smoothed aggregation, multigrid and domain decomposition preconditioning package.
• Critical technology for scalable performance of some key apps.
• ML is compatible with other Trilinos packages:
  • Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works.
  • Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO and TSF.
• Can also be used completely independent of other Trilinos packages.
Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala
NOX: Nonlinear Solvers
Suite of nonlinear solution methods
NOX uses abstract vector and “group” interfaces:
• Allows flexible selection and tuning of various strategies:
  • Directions.
  • Line searches.
• Epetra/AztecOO/ML, LAPACK, PETSc implementations of abstract vector/group interfaces.
Designed to be easily integrated into existing applications.
Developers: Tammy Kolda, Roger Pawlowski
LOCA
Library of continuation algorithms
Provides:
• Zero order continuation
• First order continuation
• Arc length continuation
• Multi-parameter continuation (via Henderson's MF Library)
• Turning point continuation
• Pitchfork bifurcation continuation
• Hopf bifurcation continuation
• Phase transition continuation
• Eigenvalue approximation (via ARPACK or Anasazi)
Developers: Andy Salinger, Eric Phipps
EpetraExt: Extensions to Epetra
Library of useful classes not needed by everyone
Most classes are types of “transforms”. Examples:
• Graph/matrix view extraction.
• Epetra/Zoltan interface.
• Explicit sparse transpose.
• Singleton removal filter, static condensation filter.
• Overlapped graph constructor, graph colorings.
• Permutations.
• Sparse matrix-matrix multiply.
• Matlab, MatrixMarket I/O functions.
• …
Most classes are small, useful, but non-trivial to write.
Developers: Robert Hoekstra, Alan Williams, Mike Heroux
Teuchos
Utility package of commonly useful tools:
• ParameterList class: key/value pair database, recursive capabilities.
• LAPACK, BLAS wrappers (templated on ordinal and scalar type).
• Dense matrix and vector classes (compatible with BLAS/LAPACK).
• FLOP counters, Timers.
• Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc.
• Reference counted pointers, and more…
Takes advantage of advanced features of C++:
• Templates
• Standard Template Library (STL)
ParameterList:
• Allows easy control of solver parameters.
Developers: Roscoe Bartlett, Kevin Long, Heidi Thornquist, Mike Heroux, Paul Sexton, Kris Kampshoff
Belos and Anasazi
Next generation linear solvers (Belos) and eigensolvers (Anasazi) libraries, written in templated C++.
Provide a generic interface to a collection of algorithms for solving large-scale linear problems and eigenproblems.
Algorithm implementation is accomplished through the use of abstract base classes (mini interface). Interfaces are derived from these base classes to operator-vector products, status tests, and any arbitrary linear algebra library.
Includes block linear solvers and eigensolvers.
Developers: Heidi Thornquist, Mike Heroux, Chris Baker, Rich Lehoucq
Kokkos
Goal: Isolate key non-BLAS kernels for the purposes of optimization.
Kernels:
• Dense vector/multivector updates and collective ops (not in BLAS/Teuchos).
• Sparse MV, MM, SV, SM.
Serial-only for now.
Reference implementation provided (templated).
Mechanism for improving performance:
• Default is aggressive compilation of reference source.
• BeBOP: Jim Demmel, Kathy Yelick, Rich Vuduc, UC Berkeley.
• Vector version: Cray.
Developer: Mike Heroux
NewPackage Package
NewPackage provides a jump start to develop/integrate a new package.
NewPackage is a “Hello World” program and website:
• Simple, but it does work with autotools.
• Compiles and builds.
NewPackage directory contains:
• Commonly used directory structure: src, test, doc, example, config.
• Working autotools files.
• Documentation templates (doxygen).
• Working regression test setup.
Substantially cuts down on:
• Time to integrate new package.
• Variation in package integration details.
• Development of website.
CLAPS Package
CLAPS = Clark’s Linear Algebra Suite
Clark Dohrmann:
• Expert in computational structural mechanics.
• Unfamiliar with parallel computing.
• Limited C++ experience.
Epetra:
• Advanced parallel basic linear algebra package.
• Supports sparse, dense linear algebra and parallel data redistribution.
Clark+Epetra:
• CLOP: Overlapping Schwarz preconditioner.
• Scales with increasing number of constraints.
• sleg[1,2,4]x joint models, 12 modes, vplant cluster (previously unsolvable).

# DOF      # Constraints  # procs  Time
33,145     4,956          4        11m 36s
227,170    18,366         24       29m 22s
1,674,502  70,566         144      29m 22s

• 3 months to develop.
• Depends only on Epetra.
• Compatible with rest of Trilinos.
Pliris Package
Migration of Linpack benchmark code to supported environment.
• Used new_package as starting point.
• Put under source control for the first time.
• Portable build process for the first time.
• Documented distributed data model (Epetra).
• Has its own Bugzilla product, mail lists, regression tests.
• Source code browsable via Bonsai.
• And more…
But Pliris is still an independent piece of code… and better off after the process.
Interface Packages
Thyra and PyTrilinos
Thyra: sillyCGSolve
• Generic CG code.
• To use, provide: Thyra::LinearOp, Thyra::Vector.
• Works on any parallel machine.
• Allows mixing of vectors and matrices: PETSc & Epetra.
• Works with any scalar datatype.

template<class Scalar>
bool sillyCgSolve(
  const Thyra::LinearOpBase<Scalar> &A
  ,const Thyra::VectorBase<Scalar> &b
  ,const int maxNumIters
  ,const typename Teuchos::ScalarTraits<Scalar>::magnitudeType tolerance
  ,Thyra::VectorBase<Scalar> *x
  ,std::ostream *out = NULL
  )
{
  using Teuchos::RefCountPtr;
  typedef Teuchos::ScalarTraits<Scalar> ST; // We need to use ScalarTraits to support arbitrary types.
  typedef typename ST::magnitudeType ScalarMag; // This is the type returned from a vector norm.
  const Scalar one = ST::one();
  // Get the vector space (domain and range spaces should be the same)
  RefCountPtr<const Thyra::VectorSpaceBase<Scalar> > space = A.domain();
  // Compute initial residual : r = b - A*x
  RefCountPtr<Thyra::VectorBase<Scalar> > r = createMember(space);
  Thyra::assign(&*r,b); // r = b
  Thyra::apply(A,Thyra::NOTRANS,*x,&*r,Scalar(-one),one); // r = -A*x + r
  const ScalarMag r0_nrm = Thyra::norm(*r); // Compute ||r0|| = sqrt(<r0,r0>) for convergence test
  if(r0_nrm == ST::zero()) return true; // Trivial RHS and initial LHS guess?
  // Create workspace vectors and scalars
  RefCountPtr<Thyra::VectorBase<Scalar> > p = createMember(space), q = createMember(space);
  Scalar rho_old;
  // Perform the iterations
  for( int iter = 0; iter <= maxNumIters; ++iter ) {
    // Check convergence and output iteration
    const ScalarMag r_nrm = Thyra::norm(*r); // Compute ||r|| = sqrt(<r,r>)
    if( r_nrm/r0_nrm < tolerance ) return true; // Converged to tolerance, Success!
    // Compute iteration
    const Scalar rho = space->scalarProd(*r,*r); // <r,r> -> rho
    if(iter==0) Thyra::assign(&*p,*r); // r -> p (iter == 0)
    else Thyra::Vp_V( &*p, *r, Scalar(rho/rho_old) ); // r+(rho/rho_old)*p -> p (iter > 0)
    Thyra::apply(A,Thyra::NOTRANS,*p,&*q); // A*p -> q
    const Scalar alpha = rho/space->scalarProd(*p,*q); // rho/<p,q> -> alpha
    Thyra::Vp_StV( x, Scalar(+alpha), *p ); // +alpha*p + x -> x
    Thyra::Vp_StV( &*r, Scalar(-alpha), *q ); // -alpha*q + r -> r
    rho_old = rho; // rho -> rho_old (remember rho for next iter)
  }
  return false; // Failure
} // end sillyCgSolve
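The same algorithm can be tried without Thyra. The sketch below keeps the loop structure of sillyCgSolve but substitutes `std::vector` for the abstract vector and a hypothetical `Tridiag` operator (tridiag(-1,2,-1)) for the linear operator; these names are assumptions made for illustration. The scalar type stays a template parameter, so the identical code runs in float and double.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

template <class Scalar>
Scalar dot(const std::vector<Scalar>& a, const std::vector<Scalar>& b) {
  Scalar sum = Scalar(0);
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Hypothetical stand-in operator: y = tridiag(-1,2,-1) * x.
template <class Scalar>
struct Tridiag {
  void apply(const std::vector<Scalar>& x, std::vector<Scalar>& y) const {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
      y[i] = Scalar(2) * x[i];
      if (i > 0) y[i] -= x[i - 1];
      if (i + 1 < n) y[i] -= x[i + 1];
    }
  }
};

// Unpreconditioned CG; x holds the initial guess on entry, the result on exit.
template <class Scalar, class Op>
bool cgSolve(const Op& A, const std::vector<Scalar>& b, int maxNumIters,
             Scalar tolerance, std::vector<Scalar>& x) {
  assert(x.size() == b.size());
  const std::size_t n = b.size();
  std::vector<Scalar> r(n), p(n), q(n);
  A.apply(x, q);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];  // r = b - A*x
  const Scalar r0_nrm = std::sqrt(dot(r, r));
  if (r0_nrm == Scalar(0)) return true;  // initial guess already solves it
  Scalar rho_old = Scalar(0);
  for (int iter = 0; iter <= maxNumIters; ++iter) {
    const Scalar rho = dot(r, r);
    if (std::sqrt(rho) / r0_nrm < tolerance) return true;  // converged
    if (iter == 0) {
      p = r;
    } else {
      const Scalar beta = rho / rho_old;
      for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    }
    A.apply(p, q);                                  // q = A*p
    const Scalar alpha = rho / dot(p, q);
    for (std::size_t i = 0; i < n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * q[i];
    }
    rho_old = rho;
  }
  return false;  // no convergence within maxNumIters
}
```

For example, with b = (1, 0, 1) and a zero initial guess, `cgSolve(Tridiag<double>(), b, 10, 1e-10, x)` converges to x = (1, 1, 1); replacing `double` by `float` (and the tolerance by a float value) requires no other change.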
PyTrilinos
PyTrilinos provides Python access to Trilinos packages.
Uses SWIG to generate bindings.
Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are supported.
Possible to:
• Define RowMatrix implementation in Python.
• Use from Trilinos C++ code.
Performance for large-grain code is equivalent to C++.
Slowdown of several times for very fine-grain code.
C++ vs. Python: Equivalent Code

C++:
#include <cassert>
#include <cstdlib>
#include <iostream>
#include "mpi.h"
#include "Epetra_MpiComm.h"
#include "Epetra_Vector.h"
#include "Epetra_Time.h"
#include "Epetra_RowMatrix.h"
#include "Epetra_CrsMatrix.h"
#include "Epetra_LinearProblem.h"
#include "Trilinos_Util_CrsMatrixGallery.h"
using namespace std;
using namespace Trilinos_Util;
int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
  int nx = 1000;
  int ny = 1000 * Comm.NumProc();
  CrsMatrixGallery Gallery("laplace_2d", Comm);
  Gallery.Set("ny", ny);
  Gallery.Set("nx", nx);
  Gallery.Set("problem_size",nx*ny);
  Gallery.Set("map_type", "linear");
  Epetra_LinearProblem* Problem = Gallery.GetLinearProblem();
  assert (Problem != 0);
  // retrieve pointers to solution (lhs), right-hand side (rhs)
  // and matrix itself (A)
  Epetra_MultiVector* lhs = Problem->GetLHS();
  Epetra_MultiVector* rhs = Problem->GetRHS();
  Epetra_RowMatrix* A = Problem->GetMatrix();
  Epetra_Time Time(Comm);
  for (int i = 0 ; i < 10 ; ++i)
    A->Multiply(false, *lhs, *rhs);
  cout << Time.ElapsedTime() << endl;
  MPI_Finalize();
  return(EXIT_SUCCESS);
} // end of main()

Python:
#! /usr/bin/env python
from PyTrilinos import Epetra, Triutils
Epetra.Init()
Comm = Epetra.PyComm()
nx = 1000
ny = 1000 * Comm.NumProc()
Gallery = Triutils.CrsMatrixGallery("laplace_2d", Comm)
Gallery.Set("nx", nx)
Gallery.Set("ny", ny)
Gallery.Set("problem_size", nx * ny)
Gallery.Set("map_type", "linear")
Matrix = Gallery.GetMatrix()
LHS = Gallery.GetStartingSolution()
RHS = Gallery.GetRHS()
Time = Epetra.Time(Comm)
for i in xrange(10):
    Matrix.Multiply(False, LHS, RHS)
print Time.ElapsedTime()
Epetra.Finalize()
Templates and Generic Programming
Investment in Generic Programming
2nd Generation Trilinos packages are templated on:
• OrdinalType (think int).
• ScalarType (think double).
Examples:
Teuchos::SerialDenseMatrix<int, double> A;
Teuchos::SerialDenseMatrix<short, float> B;
Scalar/Ordinal Traits mechanism completes support for genericity.
The following packages support templates:
• Teuchos (Basic Tools)
• Thyra (Abstract Interfaces)
• Tpetra (including MPI support)
• Belos (Krylov and Block Krylov Linear)
• IFPACK (algebraic preconditioners, next version)
• Anasazi (Eigensolvers)
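What a traits mechanism contributes can be seen in a small stand-alone sketch. `MyScalarTraits` and `norm1` below are hypothetical names written for this illustration; they imitate the idea of a scalar traits class (a per-type definition of zero, one and magnitude) and are not the Teuchos interface.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical traits class: tells generic code what 'zero', 'one' and
// 'magnitude' mean for a given scalar type.
template <class T>
struct MyScalarTraits {
  typedef T magnitudeType;
  static T zero() { return T(0); }
  static T one() { return T(1); }
  static magnitudeType magnitude(T a) { return std::fabs(a); }
};

// Partial specialization: the magnitude of a complex number is real.
template <class R>
struct MyScalarTraits<std::complex<R> > {
  typedef R magnitudeType;
  static std::complex<R> zero() { return std::complex<R>(R(0), R(0)); }
  static std::complex<R> one() { return std::complex<R>(R(1), R(0)); }
  static magnitudeType magnitude(std::complex<R> a) { return std::abs(a); }
};

// Generic 1-norm written once against the traits, usable for any scalar
// type that has a MyScalarTraits definition.
template <class Scalar>
typename MyScalarTraits<Scalar>::magnitudeType norm1(const Scalar* v, int n) {
  typedef MyScalarTraits<Scalar> ST;
  typename ST::magnitudeType sum = typename ST::magnitudeType(0);
  for (int i = 0; i < n; ++i) sum += ST::magnitude(v[i]);
  return sum;
}
```

The algorithm never mentions `double` or `std::complex`; the return type and the meaning of magnitude are looked up through the traits, which is what lets one templated solver serve several scalar types.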
Potential Benefits of Templated Types
Templated scalar (and ordinal) types have great potential:
• Generic: Algorithms expressed over abstract field.
• High Performance: Partial template specialization + Fortran.
• Facilitate variety of algorithmic studies.
• Allow study of asymptotic behavior of discretizations.
• Facilitate debugging: Reduces FP error as a source of error.
• Use your imagination…
All new Trilinos packages are templated.
Kokkos Results: Sparse MV
Pentium M 1.6GHz, Cygwin/GCC 3.2 (WinXP Laptop)

Data Set  Dimension  Nonzeros  # RHS  MFLOPS
DIE3D     9873       1733371   1      247.62
                               2      418.94
                               3      577.79
                               4      691.62
                               5      787.90
dday01    21180      923006    1      230.75
                               2      407.96
                               3      553.80
                               4      668.24
                               5      738.40
FIDAP035  19716      218308    1      171.22
                               2      349.29
                               3      374.24
                               4      498.99
                               5      545.77
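The table shows the flop rate rising with the number of right-hand sides. A likely reason, not stated on the slide, is that each matrix entry is loaded once and then reused for every right-hand side. The sketch below shows a sparse matrix times multivector kernel in compressed row storage; `CrsMatrix`, `sparseMultiply` and the row-major multivector layout are assumptions made for this example, not the Kokkos interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal compressed-row-storage matrix (illustrative, not a Kokkos type).
struct CrsMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Y = A * X for numRhs right-hand sides. X and Y are stored row-major,
// i.e. the numRhs values belonging to one row are contiguous.
void sparseMultiply(const CrsMatrix& A, const std::vector<double>& X,
                    std::size_t numRhs, std::vector<double>& Y) {
  const std::size_t numRows = A.rowPtr.size() - 1;
  assert(Y.size() == numRows * numRhs);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = 0; k < numRhs; ++k) Y[i * numRhs + k] = 0.0;
    for (std::size_t j = A.rowPtr[i]; j < A.rowPtr[i + 1]; ++j) {
      const double a = A.values[j];  // loaded once, reused for every RHS
      const std::size_t col = static_cast<std::size_t>(A.colInd[j]);
      for (std::size_t k = 0; k < numRhs; ++k)
        Y[i * numRhs + k] += a * X[col * numRhs + k];
    }
  }
}

// The 3-by-3 sample matrix with rows (2 1 0), (1 2 1), (0 1 2).
CrsMatrix sampleMatrix() {
  CrsMatrix A;
  A.rowPtr = {0, 2, 5, 7};
  A.colInd = {0, 1, 0, 1, 2, 1, 2};
  A.values = {2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0};
  return A;
}
```

With one right-hand side the inner loop degenerates to the usual sparse matrix-vector product; with several, the same index and value traffic is amortized over more floating-point work.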
ARPREC
The ARPREC library uses arrays of 64-bit floating-point numbers to represent high-precision floating-point numbers.
ARPREC values behave just like any other floating-point datatype, except the maximum working precision (in decimal digits) must be specified before any calculations are done:
mp::mp_init(200);
Illustrate the use of ARPREC with an example using Hilbert matrices.
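The idea of representing one high-precision value by several 64-bit floating-point numbers can be shown in its smallest form, a two-term "double-double" value. The sketch below uses Knuth's TwoSum error-free addition; it is a generic illustration of the technique and makes no claim about ARPREC's actual data layout or algorithms. The names `DoubleDouble`, `twoSum` and `addDouble` are hypothetical, and the code assumes strict IEEE double arithmetic (no fast-math style reordering).

```cpp
#include <cassert>

// A value represented as the unevaluated sum hi + lo of two doubles.
struct DoubleDouble {
  double hi;
  double lo;
};

// Knuth's TwoSum: returns s = fl(a + b) together with the rounding error,
// so that hi + lo equals a + b exactly.
DoubleDouble twoSum(double a, double b) {
  const double s = a + b;
  const double bb = s - a;
  const double err = (a - (s - bb)) + (b - bb);
  DoubleDouble result = {s, err};
  return result;
}

// Adds an ordinary double to a double-double value.
DoubleDouble addDouble(DoubleDouble x, double y) {
  DoubleDouble t = twoSum(x.hi, y);
  t.lo += x.lo;
  return twoSum(t.hi, t.lo);
}
```

In plain double arithmetic 1.0 + 1e-20 rounds back to 1.0 and the small term is lost; `twoSum(1.0, 1e-20)` keeps it in the `lo` component instead. Longer arrays of components extend the same principle to higher working precision.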
Hilbert Matrices
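A Hilbert matrix $H_N$ is a square N-by-N matrix such that:
$$ (H_N)_{ij} = \frac{1}{i + j - 1} $$
For example:
$$ H_3 = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{pmatrix} $$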
Hilbert Matrices
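Notoriously ill-conditioned:
• $\kappa(H_3) \approx 524$
• $\kappa(H_5) \approx 476610$
• $\kappa(H_{10}) \approx 1.6025 \times 10^{13}$
• $\kappa(H_{20}) \approx 7.8413 \times 10^{17}$
• $\kappa(H_{100}) \approx 1.7232 \times 10^{20}$
Hilbert matrices introduce large amounts of error.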
Hilbert Matrices and Cholesky Factorization
With double-precision arithmetic, Cholesky factorization will fail for $H_N$ for all N > 13.
Can we improve on this using arbitrary-precision floating-point numbers?

Precision                  Largest N for which Cholesky factorization is successful
Single Precision           8
Double Precision           13
Arbitrary Precision (20)   29
Arbitrary Precision (40)   40
Arbitrary Precision (200)  145
Arbitrary Precision (400)  233
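The experiment behind this table is easy to repeat for the built-in types with one templated routine. The sketch below is an assumption-laden illustration: the function names are hypothetical, "failure" is taken to mean a non-positive pivot, and the thresholds it finds may differ from the table depending on that criterion, the compiler and the hardware. Any scalar type that supports the arithmetic operators and `std::sqrt` could be substituted for `Scalar`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Attempts an in-place Cholesky factorization of the n-by-n Hilbert matrix
// in the given precision. Returns false as soon as a pivot is not positive.
template <class Scalar>
bool choleskyOfHilbertSucceeds(int n) {
  std::vector<std::vector<Scalar> > a(n, std::vector<Scalar>(n));
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      a[i][j] = Scalar(1) / Scalar(i + j + 1);  // 0-based form of 1/(i+j-1)

  for (int j = 0; j < n; ++j) {
    Scalar d = a[j][j];
    for (int k = 0; k < j; ++k) d -= a[j][k] * a[j][k];
    if (!(d > Scalar(0))) return false;  // breakdown (also catches NaN)
    a[j][j] = std::sqrt(d);
    for (int i = j + 1; i < n; ++i) {
      Scalar s = a[i][j];
      for (int k = 0; k < j; ++k) s -= a[i][k] * a[j][k];
      a[i][j] = s / a[j][j];
    }
  }
  return true;
}

// Largest N (up to maxN) such that the factorization succeeds for 1..N.
template <class Scalar>
int largestSuccessfulN(int maxN) {
  int n = 0;
  while (n < maxN && choleskyOfHilbertSucceeds<Scalar>(n + 1)) ++n;
  return n;
}
```

Comparing `largestSuccessfulN<float>(40)` with `largestSuccessfulN<double>(40)` shows the pattern of the table: more working precision pushes the breakdown to a larger N.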
New Trilinos 7.0 Packages
Galeri & Isorropia
Galeri: Generates Epetra maps and matrices for testing.
• A set of functionalities that are very close to that of MATLAB's gallery() function.
• Capabilities to create several well-known finite difference matrices.
• A simple finite element code that can be used to discretize scalar second order elliptic problems using Galerkin and SUPG techniques, on both 2D and 3D unstructured grids.
Isorropia: Repartitioning/rebalancing.
• Assists with redistributing objects such as matrices and matrix-graphs in a parallel execution setting.
• Allows for more efficient computations.
• Primarily an interface to the Zoltan library.
• Can be built and used with minimal capability without Zoltan.
Moertel & Moocho
Moertel:
• Capabilities for nonconforming mesh tying and contact formulations in 2 and 3 dimensions using Mortar methods.
• Mortar methods are types of Lagrange Multiplier constraints that can be used in contact formulations and in non-conforming or conforming mesh tying as well as in domain decomposition techniques.
• Used in a large class of nonconforming situations such as the surface coupling of different physical models, discretization schemes or non-matching triangulations along interior interfaces of a domain.
MOOCHO: (Multifunctional Object-Oriented arCHitecture for Optimization)
• Large-scale invasive simultaneous analysis and design (SAND) using primarily SQP-related methods.
RTOp & Rythmos
RTOp: (reduction/transformation operators)
• Provides basic mechanism for implementing vector operations in a flexible and efficient manner.
Rythmos: ODE/DAE.
• Includes: backward Euler, forward Euler, explicit Runge-Kutta, and implicit BDF at this time.
• Native support for operator split methods.
• Highly modular.
• Forward sensitivity computations will be included in the first release, with adjoint sensitivities coming in the near future.
WebTrilinos
WebTrilinos: Web interface to Trilinos
• Generate test problems or read from file.
• Generate C++ or Python code fragments and click-run.
• Hand modify code fragments and re-run.
• Will use during hands-on.
Dynamic External Package Support
• New directory Trilinos/packages/external.
• Supports seamless integration of externally developed packages via package registration.
• Completes the goal of Trilinos-compatibility.
Software Quality
SQA/SQE
• Software Quality Assurance/Engineering is important.
• No longer sufficient to say “We do a good job”.
• Trilinos facilitates SQA/SQE development/processes for packages:
  • 32 of 47 ASCI SQE practices are directly handled by Trilinos (no requirements on packages).
  • Trilinos provides significant support for the remaining 15.
  • Trilinos Dev Guide Part II: Specific to ASCI requirements.
  • Trilinos software engineering policies provide a ready-made infrastructure for new packages.
• Trilinos philosophy:
  • Few requirements. Instead mostly suggested practices.
  • Provides package with option to provide alternate process.
Trilinos Services and Their SQE Practices Impact

Yearly Trilinos User Group Meeting (TUG) and Developer Forum:
Once a year gathering for tutorials, package feature updates, user/developer requirements discussion and developer training.
SQE practices impact:
• All Requirements steps: gathering, derivation, documentation, feasibility, etc.
• User and Developer training.

Monthly Trilinos leaders meetings:
Trilinos leaders, including package development leaders, key managers, funding sources and other stakeholders participate in monthly phone meetings to discuss any timely issues related to the Trilinos Project.
SQE practices impact:
• Developer Training.
• Design reviews.
• Policy decisions across all development phases.

Trilinos and package mail lists:
Trilinos lists for leaders, announcements, developers, users, checkins and similar lists at the package level support a variety of communication. All lists are archived, providing critical artifacts for assessments and audits.
SQE practices impact:
• Developer/user/client communication.
• Requirements/design/testing artifacts.
• Announcement/documenting of releases.

Trilinos and Trilinos3PL source repositories:
All source code, development and user documentation is retained and tracked. In addition, reference versions of all external software, including BLAS, LAPACK, Umfpack, etc. are retained in Trilinos3PL.
SQE practices impact:
• Source management.
• Versioning.
• Third-party software management.

Bugzilla Products:
Each package has its own Bugzilla Product with standard components.
SQE practices impact:
• Requirements/faults capturing and tracking.

Trilinos configure script and M4 macros:
The Trilinos configure script and related macros support portable installation of Trilinos and its packages.
SQE practices impact:
• Portability.
• Software release.

Trilinos test harness:
Trilinos provides a base testing plan and automated testing across multiple platforms, plus creation of testing artifacts. Test harness results are used to derive a variety of metrics for SQE.
SQE practices impact:
• Pre-checkin and regression testing.
• Software metrics.
The Trilinos Tutorial Package
Trilinos is large (and still growing…):
• More than 600K code lines.
• Many packages:
  • a lot of functionalities…
  • … and also a lot to learn.
What is the best way for a new user to learn Trilinos? Using a Trilinos package of course: Didasko!
What DIDASKO is not
• DIDASKO, as a tutorial, cannot cover the most advanced features of Trilinos: Only the “stable” features are covered.
• DIDASKO is not a substitute for each package’s documentation and examples, which remain of fundamental importance.
Covered Packages
At present, we cover
Epetra    Triutils  IFPACK
Teuchos   AztecOO   ML
NOX       Amesos    EpetraExt
Anasazi   Tpetra
Trilinos Availability/Information
• Trilinos and related packages are available via LGPL.
• Current release (6.0) is “click release”. Unlimited availability.
• Trilinos Release 7: September 2006.
• Trilinos Awards:
  • 2004 R&D 100 Award.
  • SC2004 HPC Software Challenge Award.
  • Sandia Team Employee Recognition Award.
  • Lockheed-Martin Nova Award Nominee.
• More information:
  • http://software.sandia.gov
  • http://software.sandia.gov/trilinos
  • Additional documentation at my website: http://www.cs.sandia.gov/~mheroux.
• 4th Annual Trilinos User Group Meeting: November 7-9, 2006 at SNL.
Summary
Trilinos package model provides a welcoming environment for solver developers:
• Extremely flexible.
• Building up package, not making it conform.
• Scalable growth model for complex multi-physics solver suites.
• Working toward full vertical coverage of capabilities.
Basic principles can be applied to any similar “project of projects”:
• Modularity.
• Interfaces.
• Common infrastructure.
Trilinos new_package package is a self-contained software environment:
• Package and website useful to any project wanting to ramp up in use of standard tools.
• Available separately, Open Source.
• Used by groups outside of Trilinos as well as within.
Trilinos Might be of Interest to you if…
• Your application includes non-PDE equations.
• Multiple RHS, Block Krylov linear, eigen solvers, SAND needed.
• You develop an independent Trilinos-compatible solver.
• Interested in multi-physics.
• App is in C++.
• Interested in collaborating on Fortran interface.
• Unstructured data redistribution is important.
• Multi-precision useful.
• Parallel Python interest.
Extras
Algorithms Research: Truly Useful Multi-level Methods
Fly-through of next 4 slides.
Theme: Multi-level preconditioning has come of age across a broad spectrum of problems.
Nonlinear AMG/ADAGIO
1542/1414 (Gee, Heinstein, Key, Pierson, Tuminaro)
• Novel nonlinear AMG.
• ML, NOX, Epetra, Amesos, AztecOO.
• Constraints projected out in F(x).
• Improved coloring performance by 10x.
• Initial testing: excellent convergence.
Pressurize tire with 5 loadsteps:

Method  its
Jacobi  22
MG      13

# elmts  its
91       13
559      14
1241     25
2331     23
3925     30
6119     35

• Large reduction in iteration count.
• Slow growth as size increases.
Helium plume V&V Project
260K, 16 processor run
ML/GMRES is ~25% faster than without ML
As problem size increases, ML is expected to be more beneficial
ASC SIERRA Applications
SIERRA/Fuego/Syrinx
SIERRA/Aria Multiphysics
Coupled potential/thermal/displacement DP MEMS problem
ML reduced solve time (40%) from ~20 minutes to ~12 minutes
compared to actual ANSYS runs & ARIA re-creation ANSYS scheme
Premo
premo (Latin) – to squeeze (compress)
Compressible flow to determine aerodynamic characteristics for the Nuclear Weapons Complex
Additional issues that have been addressed:
  FEI/NOX/Trilinos interface development
  Block algorithm improvements: Block Gauss-Seidel, Block Grid Transfers
  Numerical issues: nonlinear path, time step path, setup time, linear sys. difficulty
B61 problem (6.5 million dofs, 64 procs)
  Pseudo-transient + Newton
  Euler flow, Mach .8
  Linear solver: 173 solves, tolerance = 10^-4
  GMRES/ILU   7494 sec
  GMRES/MG    3704 sec
Premo
premo (Latin) – to squeeze (compress)
Falcon problem (13+ Million dofs, 150 procs)
  Pseudo-transient + Newton
  Euler flow, Mach .75
  Linear solver: 109 solves, tolerance = 10^-4
  GMRES/ILU   7620 sec
  GMRES/MG    3787 sec
  10x improvement on final linear solve.
  > 5x gains on some problems over entire sequence.
New Grid Transfers (220+K Falcon, 1 proc)
  Pseudo-transient + Newton, Euler flow @ Mach .75
  Last linear solve, tolerance = 10^-9
  GMRES/MG/old transfers   47 its, 49 sec
  GMRES/MG/new transfers   24 its, 26 sec
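The timings quoted for the B61 and Falcon problems can be turned into speedup factors with a few lines of arithmetic. The sketch below is illustrative only: the `speedup` helper and the dictionary labels are names introduced here, and the numbers are copied from the Premo results above.

```python
# Speedup factors computed from the Premo timings quoted on the slides.
# The helper name and labels are illustrative, not part of Premo or Trilinos.

def speedup(old, new):
    """Return how many times faster `new` is than `old`."""
    return old / new


# (old, new) pairs copied from the slides.
results = {
    "B61, GMRES/ILU -> GMRES/MG (sec)": (7494, 3704),
    "Falcon, GMRES/ILU -> GMRES/MG (sec)": (7620, 3787),
    "Falcon grid transfers, old -> new (its)": (47, 24),
    "Falcon grid transfers, old -> new (sec)": (49, 26),
}

for label, (old, new) in results.items():
    print(f"{label}: {speedup(old, new):.2f}x")
```

On these figures, multigrid preconditioning roughly halves the total linear-solve time on both problems, and the new grid transfers roughly halve both the iteration count and the time of the last linear solve.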
Multi-level Methods Summary
Solving hard, real problems fast, scalably.
Still need more…
Algorithms Research: Specialized Solvers
Next wave of capabilities: Specialized solvers.
Examples in Trilinos:
  Optimal domain decomposition preconditioners for structures: CLAPS.
  Mortar methods for interface coupling: Moertel.
  Segregated Preconditioners for Navier-Stokes: Meros.
Examples in Applications:
  EMU (with Boeing).
  Tramonto.
Lipid Bi-Layer Problem
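[Figure: block sparsity structure of the system matrix, partitioned into A11, A12, A21 and A22, with block sizes labeled 19n, 3n and 2n, a block labeled F, and zero blocks elsewhere.]
• A11 solve easily applied in parallel.
• Apply GMRES to S = A22 - A21*inv(A11)*A12.
• Need a preconditioner for S.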
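The block elimination behind these bullets can be sketched on a tiny dense example. This is a minimal illustration under stated assumptions, not the Tramonto implementation: A11 is taken to be diagonal (one simple case in which its solve is trivially parallel), S is formed explicitly, and a dense Gaussian elimination stands in for GMRES. The names `schur_solve`, `dense_solve` and `matvec` are hypothetical.

```python
from fractions import Fraction  # exact arithmetic keeps the check exact


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dense_solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for GMRES)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def schur_solve(a11_diag, A12, A21, A22, b1, b2):
    """Solve [A11 A12; A21 A22][x1; x2] = [b1; b2] via S (A11 diagonal)."""
    def inv_a11(v):
        # Assumption: A11 is diagonal, so its inverse is elementwise division.
        return [Fraction(vi) / d for vi, d in zip(v, a11_diag)]

    n2 = len(A22)
    # S = A22 - A21*inv(A11)*A12, built one column at a time.
    S = [[Fraction(v) for v in row] for row in A22]
    for j in range(n2):
        col = matvec(A21, inv_a11([row[j] for row in A12]))
        for i in range(n2):
            S[i][j] -= col[i]
    # Reduced right-hand side: b2 - A21*inv(A11)*b1.
    rhs = [b - c for b, c in zip(b2, matvec(A21, inv_a11(b1)))]
    x2 = dense_solve(S, rhs)
    # Back-substitute: x1 = inv(A11)*(b1 - A12*x2).
    x1 = inv_a11([b - c for b, c in zip(b1, matvec(A12, x2))])
    return x1, x2


x1, x2 = schur_solve([2, 4], [[1, 0], [0, 2]], [[1, 1], [0, 1]],
                     [[5, 1], [1, 6]], [5, 16], [22, 29])
print([int(v) for v in x1], [int(v) for v in x2])  # [1, 2] [3, 4]
```

A Krylov method such as GMRES needs only the action of S on a vector, so in a large distributed setting S would typically be applied as an operator rather than formed explicitly as it is here.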
Preconditioner for S
A22 = [ D11  F   ]
      [ D21  D22 ]

A22 ≈ [ D11  F   ]
      [ 0    D22 ]
• D11, D22 = O(1), D21 = O(1e-10).
• Ignore D21 for preconditioning.
• P(S) requires:
  • 2 diagonal scalings,
  • matvec with F.
• All distributed operations.
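One reading of these bullets is that the preconditioner is a block upper-triangular solve with the system [D11 F; 0 D22], where D11 and D22 are diagonal. Under that assumption (the block layout is inferred from the slide, and the function name is hypothetical), applying it costs exactly two diagonal scalings and one matvec with F:

```python
def apply_block_preconditioner(d11, d22, F, r1, r2):
    """Solve [D11 F; 0 D22][z1; z2] = [r1; r2] for diagonal D11, D22.

    d11 and d22 hold the diagonal entries; F is a dense list of rows.
    The block layout is an assumption inferred from the slide.
    """
    # Diagonal scaling 1: z2 = inv(D22) * r2.
    z2 = [r / d for r, d in zip(r2, d22)]
    # Matvec with F.
    Fz2 = [sum(f * z for f, z in zip(row, z2)) for row in F]
    # Diagonal scaling 2: z1 = inv(D11) * (r1 - F*z2).
    z1 = [(r - w) / d for r, w, d in zip(r1, Fz2, d11)]
    return z1, z2


z1, z2 = apply_block_preconditioner(
    d11=[2.0, 4.0], d22=[5.0, 10.0],
    F=[[1.0, 2.0], [3.0, 4.0]],
    r1=[7.0, 15.0], r2=[5.0, 20.0],
)
print(z1, z2)  # [1.0, 1.0] [1.0, 2.0]
```

Each step acts elementwise or row by row, which is consistent with the slide's point that every operation is distributed.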
Tramonto Solver Summary
3-level linear operator structure.
Matlab to fully parallel: 1 month.
Complex orchestration:
  Preconditioner: 100+ distributed Epetra matrices used in sequence.
  ML, IFPACK, Amesos used on subproblems.
Utilizes 8 Trilinos packages in total.
566 Lines of Code (Polymer Solver).
Polymorphic Design.
3D Polymer Results
Same Iteration Counts as Processor Count Increases
Approximately Linear Performance Improvement For Fixed Size Problem
Properties of New Solver
Uniform distributed mesh → uniform distributed work.
Preconditioner sub-steps naturally parallel:
  → Results invariant to processor count up to round-off.
Preconditioner requires almost no extra memory:
  → 4-10X reduction over previous approach.
GMRES subspace and storage reduced 6X-10X or more.
Speedup 20-2X.
Solver has:
  No tuning parameters.
  Near linear scaling.
Laurie’s Favorite Properties!
Michael A. Heroux, Laura J. D. Frink, and Andrew G. Salinger. Schur complement based approaches to solving density functional theories for inhomogeneous fluids on parallel computers. SIAM J. Sci. Comput. Submitted.
Specialized Solver API: EMU
EMU: Peridynamics modeling code.
Boeing: Modeling of composite material failure.
Onset to failure: Explicit time stepping.
Prior to onset: Desire implicit (large time step).
Previous capability: Serial Y12M.
Present: Trilinos serial and (almost) parallel.
Status:
  Solver API production-ready (290 LOC).
  Serial results complete.
  Parallel soon (requires work in EMU also).
First results (smallest problem): 3.5X faster (complete run).
Specialized Solvers Summary
Trilinos: Rich, configurable foundation
  Parallel data model.
  Large and growing collection of solvers in scalable package framework.
Specialized solvers: Next-level advances
  Exploit inherent problem properties.
  Few Lines of Code: Leverage Trilinos.
  Robust, scalable, efficient.
Michael A. Heroux. A Solver-Independent API for multi-DOF Applications using Trilinos. Int. J. of Computational Science and Engineering. Submitted.