There and back again.
Zdenek Strakoš
Necas Center for Mathematical Modeling Charles University in Prague
and Czech Academy of Sciences
http://www.karlin.mff.cuni.cz/~strakos
Z. Strakoš 2
Ivo Marek (K. Žitný), (1974, later Teubner):
Functional theory of matrices should be built up as a theory of
linear operators in Banach spaces. In this way, matrix theory
can be seen as finite dimensional functional analysis, with the
aid of complex analysis. The finite dimension does not necessarily
make things simpler.
Vlastimil Pták, IMA Preprint #1405 (1996):
A simple identity for Krylov sequences is used to study the
relationship between spectral decompositions, orthogonal
polynomials and the Lanczos algorithm.
ZS, Summer School SW and Algorithms of Numerical Math., Plzen,
1993:
Lanczos algorithm, orthogonal polynomials and continued fractions.
A self-study perspective. Motivated by the work of G. H. Golub and
ZS, Estimates in quadratic formulas, Numer. Alg. (1994).
Functional analysis and iterative methods
R. C. Kirby, SIREV (2010):
“We examine condition numbers, preconditioners and iterative
methods for FEM discretization of coercive PDEs in the context of
the solvability result, the Lax-Milgram lemma.
Moreover, useful insight is gained as to the relationship between
Hilbert space and matrix condition numbers, and translating Hilbert
space fixed point iterations into matrix computations provides new
ways of motivating and explaining some classic iteration schemes. [
... ] This paper is [ ... ] intending to bridge the functional
analysis techniques common in finite elements and the linear
algebra community.”
K. A. Mardal and R. Winther, NLAA (2011):
“The main focus will be on an abstract approach to the construction
of preconditioners for symmetric linear systems in a Hilbert space
setting [ ... ] The discussion of preconditioned Krylov space
methods for the continuous systems will be a starting point for a
corresponding discrete theory.
By using this characterization it can be established that the
conjugate gradient method converges [ ... ] with a rate which can
be bounded by the condition number [ ... ] However, if the operator
has a few eigenvalues far away from the rest of the spectrum, then
the estimate is not sharp. In fact, a few ‘bad eigenvalues’ will
have almost no effect on the asymptotic convergence of the
method.”
O. Axelsson and J. Karátson, Numer. Alg. (2009):
“To preserve sparsity, the arising system is normally solved using
an iterative solution method, commonly a preconditioned conjugate
gradient method [ ... ] the rate of convergence depends in general
on a generalized condition number of the preconditioned operator [
... ]
if the two operators (original and preconditioner) are equivalent
then the corresponding PCG method provides mesh independent linear
convergence [ ... ]
if the two operators (original and preconditioner) are
compact-equivalent then the corresponding PCG method provides mesh
independent superlinear convergence.”
R. Hiptmair, CMA (2006):
“There is a continuous operator equation posed in
infinite-dimensional spaces that underlies the linear system of
equations [ ... ] awareness of this connection is key to devising
efficient solution strategies for the linear systems.
“Operator preconditioning is a very general recipe [ ... ]. It is
simple to apply, but may not be particularly efficient, because in
case the [ condition number ] bound of Theorem 2.1 is too large,
operator preconditioning offers no hint how to improve the
preconditioner. Hence, operator preconditioning may often achieve [
... ] the much-vaunted mesh independence of the preconditioner, but
it may not perform satisfactorily on a given mesh.”
Functional analysis and iterative methods
V. Faber, T. Manteuffel and S. V. Parter, Adv. in Appl. Math.
(1990):
“For a fixed h, using a preconditioning strategy based on an
equivalent operator may not be superior to classical methods [ ...
] Equivalence alone is not sufficient for a good preconditioning
strategy. One must also choose an equivalent operator for which the
bound is small.
There is no flaw in the analysis, only a flaw in the conclusions
drawn from the analysis [ ... ] asymptotic estimates ignore the
constant multiplier. Methods with similar asymptotic work estimates
may behave quite differently in practice.”
Operator preconditioning:
Arnold, Falk, and Winther (1997, 1997); Steinbach and Wendland
(1998); McLean and Tran (1997); Christiansen and Nédélec (2000,
2000); Powell and Silvester (2003); Elman, Silvester, and Wathen
(2005); ...
CG in Hilbert spaces:
Hayes (1954); Daniel (1967, 1967); ...; Fortuna (1979); Ernst
(2000); Axelsson and Karátson (2002); Glowinski (2003); ...;
Zulehner (2011); Günnel, Herzog, and Sachs (2012); ...
Vorobyev (1958, 1965)
Algebraic iterative computations
The computational cost of finding a sufficiently accurate
approximation to the exact solution depends heavily on the
underlying real-world problem, on the mathematical model, and on
its discretization.
Construction and analysis of computational algorithms should
respect that.
Evaluation of accuracy and of the computational cost must take into
account algebraic errors, including rounding errors.
From Fermat, Descartes, Euler (principal axis theorem) ... Cauchy,
Jacobi ... Fredholm ... through Stieltjes to Hilbert, Schmidt,
Lebesgue, Hellinger and Toeplitz, Wintner, Stone, von Neumann,
...
L. A. Steen, Highlights in the history of spectral theory, Amer.
Math. Monthly (1973)
In connection with Krylov subspace methods (CG) and the
(Chebyshev-Markov-) Stieltjes moment problem (see Stieltjes
(1894)):
Y. V. Vorobyev, Method of Moments in Applied Mathematics
(1958, 1965)
Outline
1. Basic setting
2. CG in Hilbert spaces
3. Spectral theory and the moment problem formulation
4. Galerkin discretization and matrix CG
5. Algebraic preconditioning as the transformation of the
discretization basis
6. CG computations, outliers and condition numbers
7. Distribution of errors, a-posteriori analysis and stopping
criteria
8. Conclusions
1 Basic setting
Let V be a real infinite dimensional Hilbert space with the inner
product (·, ·)_V : V × V → R and the associated norm ‖·‖_V, and let
V# be the dual space of bounded (continuous) linear functionals on
V with the duality pairing
⟨·, ·⟩ : V# × V → R .
For each f ∈ V# there exists a unique τf ∈ V such that
⟨f, v⟩ = (τf, v)_V for all v ∈ V .
In this way the inner product (·, ·)_V determines the Riesz map
τ : V# → V .
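In coordinates with respect to a basis of a finite dimensional subspace, the Riesz map is realized by the inverse of the Gram (mass) matrix of the inner product. A minimal numpy sketch of the defining identity ⟨f, v⟩ = (τf, v)_V; the matrix M and the functional f below are illustrative assumptions:

```python
import numpy as np

# Gram matrix of an assumed inner product on a 3-dimensional space:
# (u, v)_V = u^T M v with M symmetric positive definite.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

f = np.array([1.0, 0.0, -1.0])   # coefficients of a functional: <f, v> = f^T v

tau_f = np.linalg.solve(M, f)    # Riesz representative: tau f = M^{-1} f

# Verify the defining identity <f, v> = (tau f, v)_V on the basis vectors.
for v in np.eye(3):
    assert abs(f @ v - tau_f @ M @ v) < 1e-12
```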
1 Infinite dimensional problem
Let a(·, ·) : V × V → R be a bounded and coercive bilinear form.
For a fixed u ∈ V we can view a(u, ·) as a bounded linear
functional on V and write it as
Au ≡ a(u, ·) ∈ V# .
This defines the bounded and coercive operator
A : V → V#,  inf { ⟨Au, u⟩ : u ∈ V, ‖u‖_V = 1 } = α > 0,  ‖A‖ = C .
The Lax-Milgram theorem ensures that for any b ∈ V# there exists a
unique solution x ∈ V of the problem
a(x, v) = ⟨b, v⟩ for all v ∈ V .
Using the Riesz map,
(τAx − τb, v)_V = 0 for all v ∈ V .
Clearly, the Riesz map τ can be interpreted as a transformation of
the original problem Ax = b in V# into the equation in V
τAx = τb,  τA : V → V,  x ∈ V,  τb ∈ V ,
which is commonly (and inaccurately) called preconditioning.
1 Optimal preconditioning for a(·, ·) symmetric
If a(·, ·) is (in addition) symmetric, then A is self-adjoint with
respect to the duality pairing. Consider the special choice of the
inner product
(u, v)_V = (u, v)_a ≡ a(u, v) = ⟨Au, v⟩ for all u, v ∈ V .
Then the definition of the associated Riesz map τ_a gives
(u, v)_a = ⟨Au, v⟩ = (τ_a(Au), v)_a for all u, v ∈ V .
Therefore
τ_a(Au) = u for all u ∈ V, i.e., τ_a = A^{-1} : V# → V ,
and τAx = τb reduces to x = A^{-1}b. The motivation of
preconditioning is therefore to take, in general, the inner product
(·, ·)_V as close as possible to (·, ·)_a, subject to a
computational cost tradeoff.
2 CG in Hilbert spaces
As τA : V → V , we can form for g ∈ V the Krylov sequence
g, τAg, (τA)^2 g, . . . in V
and define Krylov subspace methods in the Hilbert space operator
setting. Here we will do CG. Our goal is to construct a method for
solving the (operator) equation
Ax = b,  x ∈ V,  b ∈ V#
such that with r0 = b − Ax0 ∈ V# the approximations xn to the
solution x , n = 1, 2, . . . , belong to the Krylov manifolds in V
xn ∈ x0 + Kn(τA, τr0) ≡ x0 + span{τr0, (τA)τr0, . . . , (τA)^{n−1} τr0} .
2 Optimality conditions
The CG approximation xn minimizes the energy norm of the error,
‖x − xn‖_a = min over z ∈ x0 + Kn of ‖x − z‖_a .
That is equivalent to the Galerkin orthogonality condition
⟨b − Axn, w⟩ = ⟨rn, w⟩ = 0 for all w ∈ Kn ≡ Kn(τA, τr0) ,
and to finding the minimum
F(xn) = min over z ∈ x0 + Kn of F(z) ,  F(z) = (1/2) ⟨Az, z⟩ − ⟨b, z⟩,  z ∈ V .
Starting from p0 = τr0 ∈ V , we will construct a sequence of
direction vectors p0, p1, . . . , and a sequence of scalars α0, α1,
. . . such that
xn = xn−1 + αn−1pn−1, n = 1, 2, . . .
2 Local and global minimization
Here αn−1 ensures the minimization of F (z) along the line z(α) =
xn−1 + αpn−1. If we moreover satisfy
Kn = span{p0, p1, . . . , pn−1} and pi ⊥a pj , i ≠ j ,
then the one-dimensional line minimizations along the given n
individual lines will be equivalent to minimization of the
functional F (z) over the whole n dimensional manifold x0 + Kn .
Then
x − x0 = α0 p0 + α1 p1 + · · · + αn−1 pn−1 + (x − xn)
represents the orthogonal expansion of the initial error x − x0
along the direction vectors p0, . . . , pn−1 with (x − xn) ⊥a Kn .
2 Search vectors (Hackbusch (1994))
It is important to realize that the steepest descent direction in
minimizing F (z) depends on the inner product (·, ·)V . It
minimizes the directional derivative,
δF(x; d) = lim over ν → 0 of [ F(x + νd) − F(x) ] / ν = −⟨r, d⟩ .
Here r = b −Ax ∈ V # while d ∈ V . In order to determine d, we use
the Riesz map τ associated with the inner product (·, ·)V , which
immediately gives
−⟨r, d⟩ = − (τr, d)_V and d = τr .
The choice pn = τrn + βnpn−1
with the condition pn ⊥a pn−1 will naturally deliver the global
orthogonality (and, consequently, the global minimization).
Given x0 , set r0 = b − Ax0 ∈ V#, p0 = τr0 ∈ V .
For n = 1, 2, . . . , nmax :
αn−1 = ⟨rn−1, τrn−1⟩ / ⟨Apn−1, pn−1⟩ = (τrn−1, τrn−1)_V / (τApn−1, pn−1)_V
xn = xn−1 + αn−1 pn−1 , stop when the stopping criterion is
satisfied
rn = rn−1 − αn−1 Apn−1
βn = ⟨rn, τrn⟩ / ⟨rn−1, τrn−1⟩
pn = τrn + βn pn−1
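In coordinates, applying τ amounts to solving a system with the Gram matrix of (·, ·)_V, and the algorithm above becomes preconditioned CG. A minimal sketch, assuming the duality pairing is the Euclidean product and τ is application of M^{-1}; the test matrices are illustrative:

```python
import numpy as np

def cg_with_riesz(A, M, b, x0, steps):
    """CG in the coordinates of a basis: the duality pairing is the
    Euclidean product and the Riesz map tau is application of M^{-1}.
    A, M are symmetric positive definite; a small illustrative sketch."""
    x = x0.copy()
    r = b - A @ x                    # residual, lives in the dual space
    z = np.linalg.solve(M, r)        # tau r, lives in V
    p = z.copy()
    for _ in range(steps):
        alpha = (r @ z) / (p @ (A @ p))
        x = x + alpha * p
        r_new = r - alpha * (A @ p)
        z_new = np.linalg.solve(M, r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# Small test problem (matrices are illustrative assumptions).
A = np.diag([1.0, 2.0, 3.0, 4.0])
M = np.eye(4)                        # Euclidean inner product: tau = identity
b = np.ones(4)
x = cg_with_riesz(A, M, b, np.zeros(4), 4)
assert np.allclose(A @ x, b)         # CG solves a 4x4 SPD system in <= 4 steps
```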
3 Spectral theory and the moment problem
Using the orthogonal projection En onto Kn with respect to the
inner product (·, ·)V , consider the orthogonally restricted
operator
τAn : Kn → Kn ,  τAn ≡ En (τA) En ,
which satisfies the following equalities:
τAn (τr0) ≡ τA (τr0) ,
...
(τAn)^{n−1} τr0 = τAn ((τA)^{n−2} τr0) ≡ (τA)^{n−1} τr0 ,
(τAn)^n τr0 = τAn ((τA)^{n−1} τr0) ≡ En (τA)^n τr0 .
3 Spectral theory and the moment problem
The n-dimensional approximation τAn of τA matches the first 2n
moments:
((τAn)^ℓ τr0, τr0)_V = ((τA)^ℓ τr0, τr0)_V ,  ℓ = 0, 1, . . . , 2n − 1 .
Denote by Qn = (q1, . . . , qn) the matrix composed of the columns
q1, . . . , qn forming an orthonormal basis of Kn determined by the
Lanczos process
τA Qn = Qn Tn + δn+1 qn+1 e_n^T
with q1 = τr0 / ‖τr0‖_V . We get (τAn)^ℓ = Qn Tn^ℓ Qn^* ,
ℓ = 0, 1, . . . , and the matching moments condition
e1^* Tn^ℓ e1 = q1^* (τA)^ℓ q1 ,  ℓ = 0, 1, . . . , 2n − 1 ,
where Tn is the (symmetric tridiagonal) Jacobi matrix of the
orthogonalization coefficients, and the CG method is formulated by
Tn yn = ‖τr0‖_V e1 ,  xn = x0 + Qn yn ,  xn ∈ V .
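The equivalence of CG with the Jacobi matrix formulation can be checked numerically: run the (Euclidean, τ = identity) Lanczos process and recover the approximation from Tn yn = ‖r0‖ e1. A sketch under these assumptions; the test problem is illustrative:

```python
import numpy as np

def lanczos(A, q1, n):
    """Plain Lanczos: returns Q (columns q1..qn) and the Jacobi matrix T."""
    N = len(q1)
    Q = np.zeros((N, n))
    alpha = np.zeros(n)
    beta = np.zeros(n - 1)
    q_prev = np.zeros(N)
    q = q1 / np.linalg.norm(q1)
    b_prev = 0.0
    for k in range(n):
        Q[:, k] = q
        w = A @ q - b_prev * q_prev
        alpha[k] = q @ w
        w = w - alpha[k] * q
        if k < n - 1:
            beta[k] = np.linalg.norm(w)
            q_prev, q, b_prev = q, w / beta[k], beta[k]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

# Illustrative SPD system; compare the Jacobi-matrix formulation of CG
# with a direct solve after the full n = N steps.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)
b = rng.standard_normal(6)
x0 = np.zeros(6)
r0 = b - A @ x0

n = 6
Q, T = lanczos(A, r0, n)
y = np.linalg.solve(T, np.linalg.norm(r0) * np.eye(n)[:, 0])
x_n = x0 + Q @ y
assert np.allclose(A @ x_n, b)
```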
3 Spectral moment problem
Since τA is bounded and self-adjoint, its spectral decomposition is
written using the Riemann-Stieltjes integral as
τA = ∫ from λL to λU of λ dEλ .
The spectral function Eλ of τA represents a family of orthogonal
projections which is non-decreasing, i.e., if µ > ν , then the
subspace onto which Eµ projects contains the subspace onto which
Eν projects; EλL = 0, EλU = I .
The values of λ where Eλ increases by jumps represent the
eigenvalues of τA , with the eigenvectors satisfying
τAz = λz, z ∈ V .
For the (finite) Jacobi matrix Tn we can analogously write its
spectral decomposition, with eigenvalues θj^(n) and weights ωj^(n)
given by the squared first components of the normalized
eigenvectors, and the matching moments condition becomes the
quadrature relation
sum over j = 1, . . . , n of ωj^(n) (θj^(n))^ℓ = ∫ from λL to λU of λ^ℓ dω(λ) ,  ℓ = 0, 1, . . . , 2n − 1 ,
where dω(λ) = q1^* dEλ q1 represents the Riemann-Stieltjes
distribution function associated with τA and q1 . The distribution
function ω^(n)(λ) approximates ω(λ) in the sense of the nth
Gauss-Christoffel quadrature; Gauss (1814), Jacobi (1826),
Christoffel (1858).
Any condition number approach should be checked against this.
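The quadrature connection can be made concrete: the nodes of the nth Gauss-Christoffel quadrature are the eigenvalues of Tn and the weights are the squared first components of its normalized eigenvectors (the Golub-Welsch observation). A sketch with an assumed small Jacobi matrix:

```python
import numpy as np

# Gauss-Christoffel quadrature from a Jacobi matrix (Golub-Welsch idea):
# nodes are the eigenvalues of T_n, weights the squared first components
# of its normalized eigenvectors. Illustrative 3-point rule.
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])     # an assumed Jacobi matrix (n = 3)

theta, Z = np.linalg.eigh(T)        # nodes theta_j
weights = Z[0, :] ** 2              # weights omega_j (they sum to 1)

# The quadrature reproduces the moments e1^T T^l e1 exactly:
e1 = np.eye(3)[:, 0]
for l in range(6):                  # l = 0, ..., 2n - 1
    moment = e1 @ np.linalg.matrix_power(T, l) @ e1
    assert abs(weights @ theta**l - moment) < 1e-8
```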
4 Galerkin discretization and matrix CG
Consider an N-dimensional subspace Vh ⊂ V with the duality
pairing, the inner product and the Riesz map as above. Then the
restriction to Vh gives an approximation xh ∈ Vh to x ∈ V ,
a(xh, v) = ⟨b, v⟩ for all v ∈ Vh .
As above, the bilinear form a(·, ·) : Vh × Vh → R defines the
operator Ah : Vh → Vh# such that
⟨Ah xh − b, v⟩ = 0 for all v ∈ Vh .
Restricting b to Vh , i.e. ⟨bh, v⟩ ≡ ⟨b, v⟩ for all v ∈ Vh , we get
the operator form
Ah xh = bh ,  xh ∈ Vh ,  bh ∈ Vh# ,  Ah : Vh → Vh# .
Let Φh = (φ1^(h), . . . , φN^(h)) be the basis of Vh and
Φh# = (φ1^(h)#, . . . , φN^(h)#) the basis of its dual Vh# .
Using the coordinates in Φh and Φh# ,
Ah → A ,  xh → x ,  bh → b ,
we get, with xn = Φh xn , pn = Φh pn , rn = Φh# rn , the matrix
formulation of CG:
r0 = b − Ax0 , solve Mz0 = r0 , p0 = z0
For n = 1, . . . , nmax :
αn−1 = z*n−1 rn−1 / p*n−1 A pn−1
xn = xn−1 + αn−1 pn−1 , stop when the stopping criterion is
satisfied
rn = rn−1 − αn−1 Apn−1
zn = M^{-1} rn , solve for zn
βn = z*n rn / z*n−1 rn−1
pn = zn + βn pn−1
5 Algebraic preconditioning - main point
Algebraic preconditioning can be viewed as CG (with M = I) applied
to
Bw = c ,  B = Lh^{-1} A Lh^{-*} ,  c = Lh^{-1} b ,  x = Lh^{-*} w ,  Mh = Lh Lh^* .
Observation:
The associated Hilbert space formulation of CG in Vh corresponds to
the transformation of the bases
Φt = Φh Lh^{-*} ,  Φt# = Φh# Lh^* ,
with the matrix B ≡ (Bij) representing the bilinear form a(·, ·) in
the transformed basis Φt , and with the transformed right hand side
c .
The locality of supports, and with it sparsity, is deliberately
given up.
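The change-of-variables view can be verified on a small example: with Mh = Lh Lh^*, solving Bw = c with B = Lh^{-1} A Lh^{-*} and mapping back x = Lh^{-*} w recovers the solution of Ax = b. A sketch; the matrices and the Jacobi-type preconditioner are illustrative assumptions, and a direct solve stands in for CG on the transformed system:

```python
import numpy as np

# Algebraic preconditioning viewed as a change of variables:
# with M = L L^T, CG applied to B w = c, B = L^{-1} A L^{-T}, c = L^{-1} b,
# and x = L^{-T} w solves the original system A x = b.
rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
A = G @ G.T + 5 * np.eye(5)          # illustrative SPD matrix
b = rng.standard_normal(5)

M = np.diag(np.diag(A))              # an assumed (Jacobi) preconditioner
L = np.linalg.cholesky(M)

B = np.linalg.solve(L, np.linalg.solve(L, A.T).T)   # L^{-1} A L^{-T}
c = np.linalg.solve(L, b)

w = np.linalg.solve(B, c)            # stand-in for CG on the transformed system
x = np.linalg.solve(L.T, w)
assert np.allclose(A @ x, b)
```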
Theorem
Let the spectrum of A consist of eigenvalues λN ≤ · · · ≤ λs+1
together with s large outlying eigenvalues λs ≤ · · · ≤ λ1 . Then
for
k = s + ⌈ (1/2) (λs+1/λN)^{1/2} ln(2/ε) ⌉
steps of CG, ‖x − xk‖_A ≤ ε ‖x − x0‖_A .
Assuming exact arithmetic, this statement is correct. In the
context of CG computations it makes, however, no sense.
[Figure: CG convergence history; the error norm (logarithmic
scale, from 10^0 down to 10^{-15}) over the first 100 iterations.]
6 Clusters = fast convergence ?
In exact arithmetic, CG applied to a matrix with a spectrum
consisting of t tight clusters of eigenvalues does not, in general,
find a reasonably close approximation to the solution within t
steps.
A finite precision CG computation can be viewed as exact CG applied
to a larger matrix with the individual original eigenvalues
replaced by tight clusters.
A finite precision CG computation with a matrix having t isolated,
well separated eigenvalues may therefore require significantly more
than t steps to reach a reasonable approximate solution.
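The delay can be observed numerically with a diagonal matrix whose eigenvalues accumulate towards the lower end of the spectrum while the large ones are well separated, an eigenvalue distribution of the type used in Strakoš (1991); the concrete parameters below are illustrative. In exact arithmetic CG would terminate within n steps:

```python
import numpy as np

def cg_relres_history(A_diag, b, maxit):
    """Plain CG on a diagonal SPD matrix; returns relative residual norms."""
    n = len(b)
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    hist = []
    rr = r @ r
    for _ in range(maxit):
        Ap = A_diag * p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        hist.append(np.sqrt(rr_new) / np.linalg.norm(b))
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return hist

# Eigenvalues accumulating near lambda_1, with the large ones well separated.
n, lam1, lamn, rho = 24, 0.001, 100.0, 0.8
i = np.arange(1, n + 1)
lam = lam1 + (i - 1) / (n - 1) * (lamn - lam1) * rho ** (n - i)

hist = cg_relres_history(lam, np.ones(n), 2 * n)
# In exact arithmetic CG terminates within n = 24 steps; in floating point
# the residual after n steps is still far from the machine precision level.
assert hist[n - 1] > 1e-10
```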
6 Analysis of FP CG behaviour
The first k steps of finite precision CG (Lanczos) are analyzed as
exact CG (Lanczos) applied to a different, possibly much larger
problem. The central point is the computed Jacobi matrix.
Nevanlinna, 1993, Section 1.8
“... in the early sweeps the convergence is very rapid but slows
down, this is the sublinear behavior. The convergence then settles
down to a roughly constant linear rate ... Towards the end new
speed may be picked up again, corresponding to the superlinear
behavior. ...
In practice all phases need not be identifiable, nor need they
appear only once and in this order.”
6 An example
Consider a fixed point iteration in a Banach space with a bounded
operator B ,
u = Bu + f ,  u^(ℓ+1) = B u^(ℓ) + f .
Using polynomial acceleration we can do better,
u − u^(ℓ) = pℓ(B) (u − u^(0)) .
Separating the operator polynomial from the initial error, it seems
natural to minimize the appropriate norm of the operator
polynomial,
‖p(B)‖ subject to p(0) = 1 .
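For the basic (unaccelerated) sweep the error propagates as u − u^(ℓ) = B^ℓ (u − u^(0)), the simplest instance of the polynomial error form above. A numerical check of this identity; the operator B and the right hand side f are illustrative assumptions:

```python
import numpy as np

# Fixed point iteration u^{(l+1)} = B u^{(l)} + f: the error propagates as
# u - u^{(l)} = B^l (u - u^{(0)}).  Small contractive example.
rng = np.random.default_rng(2)
B = 0.2 * rng.standard_normal((4, 4))          # spectral radius safely < 1
f = rng.standard_normal(4)

u_exact = np.linalg.solve(np.eye(4) - B, f)    # fixed point u = B u + f
u = np.zeros(4)
e0 = u_exact - u
for l in range(1, 6):
    u = B @ u + f
    # error after l sweeps equals B^l applied to the initial error
    assert np.allclose(u_exact - u, np.linalg.matrix_power(B, l) @ e0)
```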
6 A drawback of oversimplification
Consider now a numerical (finite dimensional) approximation Bh of
the bounded operator B . Then
p(B) − p(Bh) = (1 / 2πi) ∮ p(λ) [ (λI − B)^{-1} − (λI − Bh)^{-1} ] dλ .
This is considered a sufficient argument for studying algebraic
iterations directly in abstract (infinite dimensional) Banach
spaces.
At this level of abstraction, many challenges which one must deal
with in studying finite computational processes in finite
dimensional spaces are simply not visible. Abstract Banach space
settings certainly make things easier. They may, however, miss the
trouble, and they do not answer questions about the cost of
algebraic computations (and the cost of the whole solution
process).
Restrictive assumptions must be admitted.
Interpretations should be rigorous.
Challenging common views should be accepted as a crucial part of
scientific methodology.
Analysis of the finite dimensional algebraic problem cannot be
done on the model problem level using functional analysis in
infinite dimensional Banach or Hilbert spaces. Such approaches do
not see discretisation and computation in an adequate way. On the
other hand, analysis of the finite dimensional algebraic problem
must “do justice” to the original (non-algebraic) problem as much
as possible.
Knupp and Salari, 2003:
“There may be incomplete iterative convergence (IICE) or
round-off-error that is polluting the results. If the code uses an
iterative solver, then one must be sure that the iterative stopping
criteria is sufficiently tight so that the numerical and discrete
solutions are close to one another. Usually in order-verification
tests, one sets the iterative stopping criterion to just above the
level of machine precision to circumvent this possibility.”
In solving tough problems this cannot be afforded.
How to measure the algebraic error ?
Discrete (piecewise polynomial) FEM approximation xh^(n) = Φh xn .
If xn is known exactly, then the approximation is expressed over
the given domain as the (exact) linear combination of the local
basis functions.
However, apart from trivial cases, xn , which supplies the global
information, is not known exactly. Then the total error
x − xh^(n) contains both a discretisation and an algebraic part.
7 Local discretisation
[Figure: triangulation of the computational domain in
[−1, 1] × [−1, 1].]
Theorem (Galerkin orthogonality)
‖∇(x − xh^(n))‖^2 = ‖∇(x − xh)‖^2 + ‖∇(xh − xh^(n))‖^2 .
What is the distribution of the algebraic error in the functional
space ?
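The splitting can be checked on the simplest model: −u'' = 1 on (0, 1) with zero boundary values, linear finite elements, and a perturbed coefficient vector standing in for the result of an iterative solver. All concrete choices below (mesh size, perturbation) are illustrative assumptions; the assertion verifies the Galerkin orthogonality identity in the energy norm:

```python
import numpy as np

# 1D Poisson -u'' = 1 on (0,1), u(0) = u(1) = 0, linear finite elements.
# Checks the orthogonal splitting of the total error into discretisation
# and algebraic parts in the energy norm.
N = 7                                   # interior nodes
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h        # stiffness matrix
b = h * np.ones(N)                              # load vector

x = np.linalg.solve(A, b)               # exact FEM coefficients (x_h)
x_n = x + 1e-3 * np.sin(np.arange(1, N + 1))   # perturbed "computed" x_h^(n)

energy_u = 1.0 / 12.0                   # int (u')^2 for u = x(1-x)/2
disc2 = energy_u - x @ A @ x            # |u - u_h|^2 by Galerkin orthogonality
alg2 = (x - x_n) @ A @ (x - x_n)        # |u_h - u_h^(n)|^2
# |u - u_h^(n)|^2, using int u' v' = int v = b^T v for every v in V_h:
total2 = energy_u - 2 * b @ x_n + x_n @ A @ x_n

assert abs(total2 - (disc2 + alg2)) < 1e-12
```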
[Figure] Exact solution x (left) and the discretisation error
x − xh (right); the error values are of order 10^{-4}.
[Figure] Algebraic error xh − xh^(n) (left) and the total error
x − xh^(n) (right).
8 Conclusions
Patrick J. Roache's book Verification and Validation in
Computational Science and Engineering, 1998, p. 387:
“With the often noted tremendous increases in computer speed and
memory, and with the less often acknowledged but equally powerful
increases in algorithmic accuracy and efficiency, a natural
question suggests itself. What are we doing with the new computer
power? with the new GUI and other set-up advances? with the new
algorithms? What should we do? ... Get the right answer.”
This requires considering modelling, discretisation, analysis, and
computation as tightly coupled parts of a single solution process,
and avoiding unjustified simplifications.
Analysis is much needed.
References
J. Liesen and ZS, Krylov Subspace Methods: Principles and Analysis,
Oxford University Press (2012)
T. Gergelits and ZS, Composite convergence bounds based on
Chebyshev polynomials and finite precision conjugate gradient
computations, Numerical Algorithms (2013), DOI
10.1007/s11075-013-9713-z
J. Papež, J. Liesen and ZS, On distribution of the discretization
and algebraic error in numerical solution of partial differential
equations, Preprint MORE/2012/03 (2013)
M. Arioli, J. Liesen, A. Miedlar and ZS, Interplay between
discretization and algebraic computation in adaptive numerical
solution of elliptic PDE problems, to appear in GAMM-Mitteilungen
(2013)
J. Málek and ZS, From functional analysis through finite elements
to iterative methods, or there and back again, in preparation.