There and back again.
Zdenek Strakoš
Necas Center for Mathematical Modeling Charles University in Prague
and Czech Academy of Sciences
http://www.karlin.mff.cuni.cz/~strakos
Z. Strakoš 2
Ivo Marek (K. Žitný), (1974, later Teubner):
Functional theory of matrices should be built up as a theory of
linear operators in Banach spaces. In this way, matrix theory
can be seen as finite dimensional functional analysis, with the
aid of complex analysis. The finite dimension does not necessarily
make things simpler.
Vlastimil Pták, IMA Preprint #1405 (1996):
A simple identity for Krylov sequences is used to study the
relationship between spectral decompositions, orthogonal
polynomials and the Lanczos algorithm.
ZS, Summer School SW and Algorithms of Numerical Math., Plzen,
1993:
Lanczos algorithm, orthogonal polynomials and continued fractions.
A self-study perspective. Motivated by the work of G. H. Golub and
ZS, Estimates in quadratic formulas, Numer. Alg. (1994).
Functional analysis and iterative methods
R. C. Kirby, SIREV (2010):
“We examine condition numbers, preconditioners and iterative
methods for FEM discretization of coercive PDEs in the context of
the solvability result, the Lax-Milgram lemma.
Moreover, useful insight is gained as to the relationship between
Hilbert space and matrix condition numbers, and translating Hilbert
space fixed point iterations into matrix computations provides new
ways of motivating and explaining some classic iteration schemes. [
... ] This paper is [ ... ] intending to bridge the functional
analysis techniques common in finite elements and the linear
algebra community.”
K. A. Mardal and R. Winther, NLAA (2011):
“The main focus will be on an abstract approach to the construction
of preconditioners for symmetric linear systems in a Hilbert space
setting [ ... ] The discussion of preconditioned Krylov space
methods for the continuous systems will be a starting point for a
corresponding discrete theory.
By using this characterization it can be established that the
conjugate gradient method converges [ ... ] with a rate which can
be bounded by the condition number [ ... ] However, if the operator
has a few eigenvalues far away from the rest of the spectrum, then
the estimate is not sharp. In fact, a few ‘bad eigenvalues’ will
have almost no effect on the asymptotic convergence of the
method.”
O. Axelsson and J. Karátson, Numer. Alg. (2009):
“To preserve sparsity, the arising system is normally solved using
an iterative solution method, commonly a preconditioned conjugate
gradient method [ ... ] the rate of convergence depends in general
on a generalized condition number of the preconditioned operator [
... ]
if the two operators (original and preconditioner) are equivalent
then the corresponding PCG method provides mesh independent linear
convergence [ ... ]
if the two operators (original and preconditioner) are
compact-equivalent then the corresponding PCG method provides mesh
independent superlinear convergence.”
R. Hiptmair, CMA (2006):
“There is a continuous operator equation posed in
infinite-dimensional spaces that underlies the linear system of
equations [ ... ] awareness of this connection is key to devising
efficient solution strategies for the linear systems.
“Operator preconditioning is a very general recipe [ ... ]. It is
simple to apply, but may not be particularly efficient, because in
case the [ condition number ] bound of Theorem 2.1 is too large,
operator preconditioning offers no hint how to improve the
preconditioner. Hence, operator preconditioning may often achieve [
... ] the much-vaunted mesh independence of the preconditioner, but
it may not perform satisfactorily on a given mesh.”
Functional analysis and iterative methods
V. Faber, T. Manteuffel and S. V. Parter, Adv. in Appl. Math.
(1990):
“For a fixed h, using a preconditioning strategy based on an
equivalent operator may not be superior to classical methods [ ...
] Equivalence alone is not sufficient for a good preconditioning
strategy. One must also choose an equivalent operator for which the
bound is small.
There is no flaw in the analysis, only a flaw in the conclusions
drawn from the analysis [ ... ] asymptotic estimates ignore the
constant multiplier. Methods with similar asymptotic work estimates
may behave quite differently in practice.”
Operator preconditioning:
Arnold, Falk, and Winther (1997, 1997); Steinbach and Wendland
(1998); McLean and Tran (1997); Christiansen and Nédélec (2000,
2000); Powell and Silvester (2003); Elman, Silvester, and Wathen
(2005); ...
CG in Hilbert spaces:
Hayes (1954); Daniel (1967, 1967); ...; Fortuna (1979); Ernst
(2000); Axelsson and Karátson (2002); Glowinski (2003); ...;
Zulehner (2011); Günnel, Herzog, and Sachs (2012); ...
Vorobyev (1958, 1965)
Algebraic iterative computations
The computational cost of finding a sufficiently accurate
approximation to the exact solution depends heavily on the
underlying real-world problem, on the mathematical model, and on
its discretization.
Construction and analysis of computational algorithms should
respect that.
Evaluation of accuracy and of the computational cost must take into
account algebraic errors, including rounding errors.
From Fermat, Descartes, Euler (principal axis theorem) ... Cauchy,
Jacobi ... Fredholm ... through Stieltjes to Hilbert, Schmidt,
Lebesgue, Hellinger and Toeplitz, Wintner, Stone, von Neumann,
...
L. A. Steen, Highlights in the history of spectral theory, Amer.
Math. Monthly (1973)
In connection with Krylov subspace methods (CG) and the
(Chebyshev-Markov-) Stieltjes moment problem (see Stieltjes
(1894)):
Y. V. Vorobyev, Method of Moments in Applied Mathematics
(1958, 1965)
Outline
1. Basic setting
2. CG in Hilbert spaces
3. Spectral theory and the moment problem formulation
4. Galerkin discretization and matrix CG
5. Algebraic preconditioning as the transformation of the
discretization basis
6. CG computations, outliers and condition numbers
7. Distribution of errors, a-posteriori analysis and stopping
criteria
8. Conclusions
1 Basic setting
Let V be a real infinite dimensional Hilbert space with the inner
product (·, ·)_V : V × V → R and the associated norm ‖·‖_V, and let
V# be the dual space of bounded (continuous) linear functionals on
V with the duality pairing
⟨·, ·⟩ : V# × V → R .
For each f ∈ V# there exists a unique τf ∈ V such that
⟨f, v⟩ = (τf, v)_V for all v ∈ V .
In this way the inner product (·, ·)_V determines the Riesz map
τ : V# → V .
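In coordinates with respect to a basis of a finite dimensional subspace, the Riesz map is realized by the inverse of the Gram (mass) matrix of the inner product. A minimal numpy sketch of the defining identity ⟨f, v⟩ = (τf, v)_V; the matrix M and the functional f below are illustrative assumptions:

```python
import numpy as np

# Gram matrix of an assumed inner product on a 3-dimensional space:
# (u, v)_V = u^T M v with M symmetric positive definite.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

f = np.array([1.0, 0.0, -1.0])   # coefficients of a functional: <f, v> = f^T v

tau_f = np.linalg.solve(M, f)    # Riesz representative: tau f = M^{-1} f

# Verify the defining identity <f, v> = (tau f, v)_V on the basis vectors.
for v in np.eye(3):
    assert abs(f @ v - tau_f @ M @ v) < 1e-12
```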
1 Infinite dimensional problem
Let a(·, ·) : V × V → R be a bounded and coercive bilinear form.
For a fixed u ∈ V we can view a(u, ·) as a bounded linear
functional on V and write it as
Au ≡ a(u, ·) ∈ V# .
This defines the bounded and coercive operator
A : V → V#,  inf { ⟨Au, u⟩ : u ∈ V, ‖u‖_V = 1 } = α > 0,  ‖A‖ = C .
The Lax-Milgram theorem ensures that for any b ∈ V# there exists a
unique solution x ∈ V of the problem
a(x, v) = ⟨b, v⟩ for all v ∈ V .
Using the Riesz map,
(τAx − τb, v)_V = 0 for all v ∈ V .
Clearly, the Riesz map τ can be interpreted as a transformation of
the original problem Ax = b in V# into the equation in V
τAx = τb,  τA : V → V,  x ∈ V,  τb ∈ V ,
which is commonly (and inaccurately) called preconditioning.
1 Optimal preconditioning for a(·, ·) symmetric
If a(·, ·) is (in addition) symmetric, then A is self-adjoint with
respect to the duality pairing. Consider the special choice of the
inner product
(u, v)_V = (u, v)_a ≡ a(u, v) = ⟨Au, v⟩ for all u, v ∈ V .
Then the definition of the associated Riesz map τ_a gives
(u, v)_a = ⟨Au, v⟩ = (τ_a(Au), v)_a for all u, v ∈ V .
Therefore
τ_a(Au) = u for all u ∈ V, i.e., τ_a = A^{-1} : V# → V ,
and τAx = τb reduces to x = A^{-1}b. The motivation of
preconditioning is therefore to take, in general, the inner product
(·, ·)_V as close as possible to (·, ·)_a, subject to a
computational cost tradeoff.
2 CG in Hilbert spaces
As τA : V → V , we can form for g ∈ V the Krylov sequence
g, τAg, (τA)^2 g, . . . in V
and define Krylov subspace methods in the Hilbert space operator
setting. Here we will do CG. Our goal is to construct a method for
solving the (operator) equation
Ax = b,  x ∈ V,  b ∈ V#
such that with r0 = b − Ax0 ∈ V# the approximations xn to the
solution x , n = 1, 2, . . . , belong to the Krylov manifolds in V
xn ∈ x0 + Kn(τA, τr0) ≡ x0 + span{τr0, (τA)τr0, . . . , (τA)^{n−1} τr0} .
2 Optimality conditions
The CG approximation xn minimizes the energy norm of the error,
‖x − xn‖_a = min over z ∈ x0 + Kn of ‖x − z‖_a .
That is equivalent to the Galerkin orthogonality condition
⟨b − Axn, w⟩ = ⟨rn, w⟩ = 0 for all w ∈ Kn ≡ Kn(τA, τr0) ,
and to finding the minimum
F(xn) = min over z ∈ x0 + Kn of F(z) ,  F(z) = (1/2) ⟨Az, z⟩ − ⟨b, z⟩,  z ∈ V .
Starting from p0 = τr0 ∈ V , we will construct a sequence of
direction vectors p0, p1, . . . , and a sequence of scalars α0, α1,
. . . such that
xn = xn−1 + αn−1pn−1, n = 1, 2, . . .
2 Local and global minimization
Here αn−1 ensures the minimization of F (z) along the line z(α) =
xn−1 + αpn−1. If we moreover satisfy
Kn = span{p0, p1, . . . , pn−1} and pi ⊥a pj , i ≠ j ,
then the one-dimensional line minimizations along the given n
individual lines will be equivalent to minimization of the
functional F (z) over the whole n dimensional manifold x0 + Kn .
Then
x − x0 = α0 p0 + α1 p1 + · · · + αn−1 pn−1 + (x − xn)
represents the orthogonal expansion of the initial error x − x0
along the direction vectors p0, . . . , pn−1 with (x − xn) ⊥a Kn .
2 Search vectors (Hackbusch (1994))
It is important to realize that the steepest descent direction in
minimizing F (z) depends on the inner product (·, ·)V . It
minimizes the directional derivative,
δF(x; d) = lim over ν → 0 of [ F(x + νd) − F(x) ] / ν = −⟨r, d⟩ .
Here r = b −Ax ∈ V # while d ∈ V . In order to determine d, we use
the Riesz map τ associated with the inner product (·, ·)V , which
immediately gives
−⟨r, d⟩ = − (τr, d)_V and d = τr .
The choice pn = τrn + βnpn−1
with the condition pn ⊥a pn−1 will naturally deliver the global
orthogonality (and, consequently, the global minimization).
Given x0 , set r0 = b − Ax0 ∈ V#, p0 = τr0 ∈ V .
For n = 1, 2, . . . , nmax :
αn−1 = ⟨rn−1, τrn−1⟩ / ⟨Apn−1, pn−1⟩ = (τrn−1, τrn−1)_V / (τApn−1, pn−1)_V
xn = xn−1 + αn−1 pn−1 , stop when the stopping criterion is
satisfied
rn = rn−1 − αn−1 Apn−1
βn = ⟨rn, τrn⟩ / ⟨rn−1, τrn−1⟩
pn = τrn + βn pn−1
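In coordinates, applying τ amounts to solving a system with the Gram matrix of (·, ·)_V, and the algorithm above becomes preconditioned CG. A minimal sketch, assuming the duality pairing is the Euclidean product and τ is application of M^{-1}; the test matrices are illustrative:

```python
import numpy as np

def cg_with_riesz(A, M, b, x0, steps):
    """CG in the coordinates of a basis: the duality pairing is the
    Euclidean product and the Riesz map tau is application of M^{-1}.
    A, M are symmetric positive definite; a small illustrative sketch."""
    x = x0.copy()
    r = b - A @ x                    # residual, lives in the dual space
    z = np.linalg.solve(M, r)        # tau r, lives in V
    p = z.copy()
    for _ in range(steps):
        alpha = (r @ z) / (p @ (A @ p))
        x = x + alpha * p
        r_new = r - alpha * (A @ p)
        z_new = np.linalg.solve(M, r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# Small test problem (matrices are illustrative assumptions).
A = np.diag([1.0, 2.0, 3.0, 4.0])
M = np.eye(4)                        # Euclidean inner product: tau = identity
b = np.ones(4)
x = cg_with_riesz(A, M, b, np.zeros(4), 4)
assert np.allclose(A @ x, b)         # CG solves a 4x4 SPD system in <= 4 steps
```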
3 Spectral theory and the moment problem
Using the orthogonal projection En onto Kn with respect to the
inner product (·, ·)V , consider the orthogonally restricted
operator
τAn : Kn → Kn ,  τAn ≡ En (τA) En ,
which satisfies the following equalities:
τAn (τr0) ≡ τA (τr0) ,
...
(τAn)^{n−1} τr0 = τAn ((τA)^{n−2} τr0) ≡ (τA)^{n−1} τr0 ,
(τAn)^n τr0 = τAn ((τA)^{n−1} τr0) ≡ En (τA)^n τr0 .
3 Spectral theory and the moment problem
The n-dimensional approximation τAn of τA matches the first 2n
moments:
((τAn)^ℓ τr0, τr0)_V = ((τA)^ℓ τr0, τr0)_V ,  ℓ = 0, 1, . . . , 2n − 1 .
Denote by Qn = (q1, . . . , qn) the matrix composed of the columns
q1, . . . , qn forming an orthonormal basis of Kn determined by the
Lanczos process
τA Qn = Qn Tn + δn+1 qn+1 e_n^T
with q1 = τr0 / ‖τr0‖_V . We get (τAn)^ℓ = Qn Tn^ℓ Qn^* ,
ℓ = 0, 1, . . . , and the matching moments condition
e1^* Tn^ℓ e1 = q1^* (τA)^ℓ q1 ,  ℓ = 0, 1, . . . , 2n − 1 ,
where Tn is the (symmetric tridiagonal) Jacobi matrix of the
orthogonalization coefficients, and the CG method is formulated by
Tn yn = ‖τr0‖_V e1 ,  xn = x0 + Qn yn ,  xn ∈ V .
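The equivalence of CG with the Jacobi matrix formulation can be checked numerically: run the (Euclidean, τ = identity) Lanczos process and recover the approximation from Tn yn = ‖r0‖ e1. A sketch under these assumptions; the test problem is illustrative:

```python
import numpy as np

def lanczos(A, q1, n):
    """Plain Lanczos: returns Q (columns q1..qn) and the Jacobi matrix T."""
    N = len(q1)
    Q = np.zeros((N, n))
    alpha = np.zeros(n)
    beta = np.zeros(n - 1)
    q_prev = np.zeros(N)
    q = q1 / np.linalg.norm(q1)
    b_prev = 0.0
    for k in range(n):
        Q[:, k] = q
        w = A @ q - b_prev * q_prev
        alpha[k] = q @ w
        w = w - alpha[k] * q
        if k < n - 1:
            beta[k] = np.linalg.norm(w)
            q_prev, q, b_prev = q, w / beta[k], beta[k]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

# Illustrative SPD system; compare the Jacobi-matrix formulation of CG
# with a direct solve after the full n = N steps.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)
b = rng.standard_normal(6)
x0 = np.zeros(6)
r0 = b - A @ x0

n = 6
Q, T = lanczos(A, r0, n)
y = np.linalg.solve(T, np.linalg.norm(r0) * np.eye(n)[:, 0])
x_n = x0 + Q @ y
assert np.allclose(A @ x_n, b)
```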
3 Spectral moment problem
Since τA is bounded and self-adjoint, its spectral decomposition is
written using the Riemann-Stieltjes integral as
τA = ∫ from λL to λU of λ dEλ .
The spectral function Eλ of τA represents a family of orthogonal
projections which is non-decreasing, i.e., if µ > ν , then the
subspace onto which Eµ projects contains the subspace onto which
Eν projects; EλL = 0, EλU = I .
The values of λ where Eλ increases by jumps represent the
eigenvalues of τA , with the eigenvectors satisfying
τAz = λz, z ∈ V .
For the (finite) Jacobi matrix Tn we can analogously write its
spectral decomposition, with eigenvalues θj^(n) and weights ωj^(n)
given by the squared first components of the normalized
eigenvectors, and the matching moments condition becomes the
quadrature relation
sum over j = 1, . . . , n of ωj^(n) (θj^(n))^ℓ = ∫ from λL to λU of λ^ℓ dω(λ) ,  ℓ = 0, 1, . . . , 2n − 1 ,
where dω(λ) = q1^* dEλ q1 represents the Riemann-Stieltjes
distribution function associated with τA and q1 . The distribution
function ω^(n)(λ) approximates ω(λ) in the sense of the nth
Gauss-Christoffel quadrature; Gauss (1814), Jacobi (1826),
Christoffel (1858).
Any condition number approach should be checked against this.
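The quadrature connection can be made concrete: the nodes of the nth Gauss-Christoffel quadrature are the eigenvalues of Tn and the weights are the squared first components of its normalized eigenvectors (the Golub-Welsch observation). A sketch with an assumed small Jacobi matrix:

```python
import numpy as np

# Gauss-Christoffel quadrature from a Jacobi matrix (Golub-Welsch idea):
# nodes are the eigenvalues of T_n, weights the squared first components
# of its normalized eigenvectors. Illustrative 3-point rule.
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])     # an assumed Jacobi matrix (n = 3)

theta, Z = np.linalg.eigh(T)        # nodes theta_j
weights = Z[0, :] ** 2              # weights omega_j (they sum to 1)

# The quadrature reproduces the moments e1^T T^l e1 exactly:
e1 = np.eye(3)[:, 0]
for l in range(6):                  # l = 0, ..., 2n - 1
    moment = e1 @ np.linalg.matrix_power(T, l) @ e1
    assert abs(weights @ theta**l - moment) < 1e-8
```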
4 Galerkin discretization and matrix CG
Consider an N-dimensional subspace Vh ⊂ V with the duality
pairing, the inner product and the Riesz map as above. Then the
restriction to Vh gives an approximation xh ∈ Vh to x ∈ V ,
a(xh, v) = ⟨b, v⟩ for all v ∈ Vh .
As above, the bilinear form a(·, ·) : Vh × Vh → R defines the
operator Ah : Vh → Vh# such that
⟨Ah xh − b, v⟩ = 0 for all v ∈ Vh .
Restricting b to Vh , i.e. ⟨bh, v⟩ ≡ ⟨b, v⟩ for all v ∈ Vh , we get
the operator form
Ah xh = bh ,  xh ∈ Vh ,  bh ∈ Vh# ,  Ah : Vh → Vh# .
Let Φh = (φ1^(h), . . . , φN^(h)) be the basis of Vh and
Φh# = (φ1^(h)#, . . . , φN^(h)#) the basis of its dual Vh# .
Using the coordinates in Φh and Φh# ,
Ah → A ,  xh → x ,  bh → b ,
we get, with xn = Φh xn , pn = Φh pn , rn = Φh# rn , the matrix
formulation of CG:
r0 = b − Ax0 , solve Mz0 = r0 , p0 = z0
For n = 1, . . . , nmax :
αn−1 = z*n−1 rn−1 / p*n−1 A pn−1
xn = xn−1 + αn−1 pn−1 , stop when the stopping criterion is
satisfied
rn = rn−1 − αn−1 Apn−1
zn = M^{-1} rn , solve for zn
βn = z*n rn / z*n−1 rn−1
pn = zn + βn pn−1
5 Algebraic preconditioning - main point
Algebraic preconditioning can be viewed as CG (with M = I) applied
to
Bw = c ,  B = Lh^{-1} A Lh^{-*} ,  c = Lh^{-1} b ,  x = Lh^{-*} w ,  Mh = Lh Lh^* .
Observation:
The associated Hilbert space formulation of CG in Vh corresponds to
the transformation of the bases
Φt = Φh Lh^{-*} ,  Φt# = Φh# Lh^* ,
with the matrix B ≡ (Bij) representing the bilinear form a(·, ·) in
the transformed basis Φt , and with the transformed right hand side
c .
The locality of supports, and with it sparsity, is deliberately
given up.
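The change-of-variables view can be verified on a small example: with Mh = Lh Lh^*, solving Bw = c with B = Lh^{-1} A Lh^{-*} and mapping back x = Lh^{-*} w recovers the solution of Ax = b. A sketch; the matrices and the Jacobi-type preconditioner are illustrative assumptions, and a direct solve stands in for CG on the transformed system:

```python
import numpy as np

# Algebraic preconditioning viewed as a change of variables:
# with M = L L^T, CG applied to B w = c, B = L^{-1} A L^{-T}, c = L^{-1} b,
# and x = L^{-T} w solves the original system A x = b.
rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
A = G @ G.T + 5 * np.eye(5)          # illustrative SPD matrix
b = rng.standard_normal(5)

M = np.diag(np.diag(A))              # an assumed (Jacobi) preconditioner
L = np.linalg.cholesky(M)

B = np.linalg.solve(L, np.linalg.solve(L, A.T).T)   # L^{-1} A L^{-T}
c = np.linalg.solve(L, b)

w = np.linalg.solve(B, c)            # stand-in for CG on the transformed system
x = np.linalg.solve(L.T, w)
assert np.allclose(A @ x, b)
```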
Theorem
Let the spectrum of A consist of eigenvalues λN ≤ · · · ≤ λs+1
together with s large outlying eigenvalues λs ≤ · · · ≤ λ1 . Then
for
k = s + ⌈ (1/2) (λs+1/λN)^{1/2} ln(2/ε) ⌉
steps of CG, ‖x − xk‖_A ≤ ε ‖x − x0‖_A .
Assuming exact arithmetic, this statement is correct. In the
context of CG computations it makes, however, no sense.
[Figure: CG convergence history; the error norm (logarithmic
scale, from 10^0 down to 10^{-15}) over the first 100 iterations.]
6 Clusters = fast convergence ?
In exact arithmetic, CG applied to a matrix with a spectrum
consisting of t tight clusters of eigenvalues does not, in general,
find a reasonably close approximation to the solution within t
steps.
A finite precision CG computation can be viewed as exact CG applied
to a larger matrix with the individual original eigenvalues
replaced by tight clusters.
A finite precision CG computation with a matrix having t isolated,
well separated eigenvalues may therefore require significantly more
than t steps to reach a reasonable approximate solution.
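The delay can be observed numerically with a diagonal matrix whose eigenvalues accumulate towards the lower end of the spectrum while the large ones are well separated, an eigenvalue distribution of the type used in Strakoš (1991); the concrete parameters below are illustrative. In exact arithmetic CG would terminate within n steps:

```python
import numpy as np

def cg_relres_history(A_diag, b, maxit):
    """Plain CG on a diagonal SPD matrix; returns relative residual norms."""
    n = len(b)
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    hist = []
    rr = r @ r
    for _ in range(maxit):
        Ap = A_diag * p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        hist.append(np.sqrt(rr_new) / np.linalg.norm(b))
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return hist

# Eigenvalues accumulating near lambda_1, with the large ones well separated.
n, lam1, lamn, rho = 24, 0.001, 100.0, 0.8
i = np.arange(1, n + 1)
lam = lam1 + (i - 1) / (n - 1) * (lamn - lam1) * rho ** (n - i)

hist = cg_relres_history(lam, np.ones(n), 2 * n)
# In exact arithmetic CG terminates within n = 24 steps; in floating point
# the residual after n steps is still far from the machine precision level.
assert hist[n - 1] > 1e-10
```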
6 Analysis of FP CG behaviour
The first k steps of finite precision CG (Lanczos) are analyzed as
exact CG (Lanczos) applied to a different, possibly much larger
problem. The central point is the computed Jacobi matrix.
Nevanlinna, 1993, Section 1.8
“... in the early sweeps the convergence is very rapid but slows
down, this is the sublinear behavior. The convergence then settles
down to a roughly constant linear rate ... Towards the end new
speed may be picked up again, corresponding to the superlinear
behavior. ...
In practice all phases need not be identifiable, nor need they
appear only once and in this order.”
6 An example
Consider a fixed point iteration in a Banach space with a bounded
operator B ,
u = Bu + f ,  u^(ℓ+1) = B u^(ℓ) + f .
Using polynomial acceleration we can do better,
u − u^(ℓ) = pℓ(B) (u − u^(0)) .
Separating the operator polynomial from the initial error, it seems
natural to minimize the appropriate norm of the operator
polynomial,
‖p(B)‖ subject to p(0) = 1 .
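For the basic (unaccelerated) sweep the error propagates as u − u^(ℓ) = B^ℓ (u − u^(0)), the simplest instance of the polynomial error form above. A numerical check of this identity; the operator B and the right hand side f are illustrative assumptions:

```python
import numpy as np

# Fixed point iteration u^{(l+1)} = B u^{(l)} + f: the error propagates as
# u - u^{(l)} = B^l (u - u^{(0)}).  Small contractive example.
rng = np.random.default_rng(2)
B = 0.2 * rng.standard_normal((4, 4))          # spectral radius safely < 1
f = rng.standard_normal(4)

u_exact = np.linalg.solve(np.eye(4) - B, f)    # fixed point u = B u + f
u = np.zeros(4)
e0 = u_exact - u
for l in range(1, 6):
    u = B @ u + f
    # error after l sweeps equals B^l applied to the initial error
    assert np.allclose(u_exact - u, np.linalg.matrix_power(B, l) @ e0)
```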
6 A drawback of oversimplification
Consider now a numerical (finite dimensional) approximation Bh of
the bounded operator B . Then
p(B) − p(Bh) = (1 / 2πi) ∮ p(λ) [ (λI − B)^{-1} − (λI − Bh)^{-1} ] dλ .
This is considered a sufficient argument for studying algebraic
iterations directly in abstract (infinite dimensional) Banach
spaces.
At this level of abstraction, many challenges which one must deal
with in studying finite computational processes in finite
dimensional spaces are simply not visible. Abstract Banach space
settings certainly make things easier. They may, however, miss the
trouble, and they do not answer questions about the cost of
algebraic computations (and the cost of the whole solution
process).
Restrictive assumptions must be admitted.
Interpretations should be rigorous.
Challenging common views should be accepted as a crucial part of
scientific methodology.
Analysis of the finite dimensional algebraic problem cannot be
done on the model problem level using functional analysis in
infinite dimensional Banach or Hilbert spaces. Such approaches do
not see discretisation and computation in an adequate way. On the
other hand, analysis of the finite dimensional algebraic problem
must “do justice” to the original (non-algebraic) problem as much
as possible.
Knupp and Salari, 2003:
“There may be incomplete iterative convergence (IICE) or
round-off-error that is polluting the results. If the code uses an
iterative solver, then one must be sure that the iterative stopping
criteria is sufficiently tight so that the numerical and discrete
solutions are close to one another. Usually in order-verification
tests, one sets the iterative stopping criterion to just above the
level of machine precision to circumvent this possibility.”
In solving tough problems this cannot be afforded.
How to measure the algebraic error ?
Discrete (piecewise polynomial) FEM approximation xh^(n) = Φh xn .
If xn is known exactly, then the approximation is expressed over
the given domain as the (exact) linear combination of the local
basis functions.
However, apart from trivial cases, xn , which supplies the global
information, is not known exactly. Then the total error
x − xh^(n) contains both a discretisation and an algebraic part.
7 Local discretisation
[Figure: triangulation of the computational domain in
[−1, 1] × [−1, 1].]
Theorem (Galerkin orthogonality)
‖∇(x − xh^(n))‖^2 = ‖∇(x − xh)‖^2 + ‖∇(xh − xh^(n))‖^2 .
What is the distribution of the algebraic error in the functional
space ?
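The splitting can be checked on the simplest model: −u'' = 1 on (0, 1) with zero boundary values, linear finite elements, and a perturbed coefficient vector standing in for the result of an iterative solver. All concrete choices below (mesh size, perturbation) are illustrative assumptions; the assertion verifies the Galerkin orthogonality identity in the energy norm:

```python
import numpy as np

# 1D Poisson -u'' = 1 on (0,1), u(0) = u(1) = 0, linear finite elements.
# Checks the orthogonal splitting of the total error into discretisation
# and algebraic parts in the energy norm.
N = 7                                   # interior nodes
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h        # stiffness matrix
b = h * np.ones(N)                              # load vector

x = np.linalg.solve(A, b)               # exact FEM coefficients (x_h)
x_n = x + 1e-3 * np.sin(np.arange(1, N + 1))   # perturbed "computed" x_h^(n)

energy_u = 1.0 / 12.0                   # int (u')^2 for u = x(1-x)/2
disc2 = energy_u - x @ A @ x            # |u - u_h|^2 by Galerkin orthogonality
alg2 = (x - x_n) @ A @ (x - x_n)        # |u_h - u_h^(n)|^2
# |u - u_h^(n)|^2, using int u' v' = int v = b^T v for every v in V_h:
total2 = energy_u - 2 * b @ x_n + x_n @ A @ x_n

assert abs(total2 - (disc2 + alg2)) < 1e-12
```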
[Figure] Exact solution x (left) and the discretisation error
x − xh (right); the error values are of order 10^{-4}.
[Figure] Algebraic error xh − xh^(n) (left) and the total error
x − xh^(n) (right).
8 Conclusions
Patrick J. Roache's book Verification and Validation in
Computational Science and Engineering, 1998, p. 387:
“With the often noted tremendous increases in computer speed and
memory, and with the less often acknowledged but equally powerful
increases in algorithmic accuracy and efficiency, a natural
question suggests itself. What are we doing with the new computer
power? with the new GUI and other set-up advances? with the new
algorithms? What should we do? ... Get the right answer.”
This requires considering modelling, discretisation, analysis, and
computation as tightly coupled parts of a single solution process,
and avoiding unjustified simplifications.
Analysis is much needed.
References
J. Liesen and ZS, Krylov Subspace Methods: Principles and Analysis,
Oxford University Press (2012)
T. Gergelits and ZS, Composite convergence bounds based on
Chebyshev polynomials and finite precision conjugate gradient
computations, Numerical Algorithms (2013), DOI
10.1007/s11075-013-9713-z
J. Papež, J. Liesen and ZS, On distribution of the discretization
and algebraic error in numerical solution of partial differential
equations, Preprint MORE/2012/03 (2013)
M. Arioli, J. Liesen, A. Miedlar and ZS, Interplay between
discretization and algebraic computation in adaptive numerical
solution of elliptic PDE problems, to appear in GAMM-Mitteilungen
(2013)
J. Málek and ZS, From functional analysis through finite elements
to iterative methods, or there and back again, in preparation.