Bi-CG, CGS, Bi-CGSTAB and implementation aspects
Henk van der Vorst
January 8, 2007, Francqui Masterclass – p. 1/??
Krylov subspace
Standard iteration: x(i) = x(i−1) + r(i−1)
Take x(0) = 0, then x(i) = r(0) + r(1) + . . . + r(i−1)
Hence x(i) = ∑_{k=0}^{i−1} (I − A)^k r(0)
This shows that x(i) can be expressed as a sum of powers of A times r(0)
x(i) ∈ span{r(0), Ar(0), . . . , A^{i−1}r(0)} ≡ Ki(A; r(0))
Krylov subspace of dimension i generated by A and r(0)
general x(i) ∈ Ki(A; r(0)) can be written as x(i) = Qi−1(A)r(0)
corresponding residual r(i) = b − Ax(i) = (I − AQi−1(A))r(0)
Hence r(i) = Pi(A)r(0), with Pi(0) = 1
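This power-sum expression is easy to check numerically; a minimal NumPy sketch (the test matrix, its size, and the number of steps are arbitrary illustrations, not from the slides):

```python
import numpy as np

# Check: i steps of the standard iteration x_i = x_{i-1} + r_{i-1},
# starting from x_0 = 0, give x_i = sum_{k=0}^{i-1} (I - A)^k r_0, r_0 = b.
rng = np.random.default_rng(0)
n = 6
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # arbitrary test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
for _ in range(4):                  # four standard-iteration steps
    x = x + (b - A @ x)

M = np.eye(n) - A
x_sum = sum(np.linalg.matrix_power(M, k) @ b for k in range(4))
assert np.allclose(x, x_sum)        # same vector, so x_4 lies in K_4(A; r_0)
```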
The Petrov-Galerkin approach
The usual approach is to construct an xi, such that
ri ⊥ Ki(AT ; s0), s0 = r0, or AT r0, or random, or ...
It can be shown that this can be done by constructing biorthogonal
bases {vj} for Ki(A; r0) and {wj} for Ki(AT ; s0), with vjT wk = 0 for j ≠ k
These two sets of basis vectors can be generated by three-term recurrences
Leads to AVi = Vi+1Ti+1,i and ATWi = Wi+1Ti+1,i, with WiT Vi = Di
We look for xi ∈ Ki(A; r0), which means xi = Viyi, such that
WiT (b − AViyi) = 0, and hence WiT ViTi,iyi = WiT b, or:
DiTi,iyi = WiT b, solved as for CG: Bi-CG
many practical problems: breakdowns; irregular convergence
Convergence behavior Bi-CG
[Figure: comparison of Bi-CG and CGS for definite A; x-axis: iteration number, y-axis: 10log(residual); dots: CGS ri, line: Bi-CG]
Convergence behavior Bi-CG (2)
[Figure: comparison of Bi-CG and CGS for indefinite A; x-axis: iteration number, y-axis: 10log(residual); dots: CGS ri, line: Bi-CG]
Bi-CG and variants
with short recurrences we can construct
xi such that ri ⊥ Ki(AT ; s0)
• 2 MV’s per iteration ( A and AT )
• CG-like computational overhead (twice!)
• CG-like memory requirements (twice!)
• not optimal in Ki(A; r0)
• more iterations than GMRES: ‖A xi^BiCG − b‖2 ≥ ‖A xi^GMRES − b‖2
The choice of s0 gives freedom, e.g., r0, AT r0, random
variants of Bi-CG: QMR
Bi-orthogonalization leads to:
AVi = Vi+1Ti+1,i and ATWi = Wi+1Ti+1,i, with WiT Vi = Di
Try GMRES idea, that is try to minimize ‖b − Axi‖2 for xi ∈ Ki(A; r0)
Since xi = Viy and b = µv1,
we have ‖b − Axi‖2 = ‖b − AViy‖2 = ‖µVi+1e1 − Vi+1Ti+1,iy‖2
In the case of GMRES, we had Vi+1 orthogonal, so could skip Vi+1
Now we pretend as if the Bi-CG Vi+1 is orthogonal, and we minimize:
‖µe1 − Ti+1,iy‖2 (NOT minimum residual: Quasi-Minimal Residual)
Solve small system as in GMRES (with Givens rotations)
QMR (Freund & Nachtigal, 1991)
- slightly better than Bi-CG
- smoother convergence
- more iterations than GMRES
variants of Bi-CG: CGS
Basis for Km(A; r0) and Km(AT ; s0) with the same 3-term recursions:
ri = Ri(A)r0, and also: r̃i = Ri(AT )s0
Bi-CG coefficients through inner products like (r̃i, ri)
Sonneveld (1984):
(r̃i, ri) = (Ri(A)r0, Ri(AT )s0) = (Ri^2(A)r0, s0)
r̃j not necessary!!; no operations with AT
However: now we need recursions for Ri^2(A)r0 and other vectors
By the way: it would be nice to have ri = Ri^2(A)r0 and a corresponding xi; why?
Bi-CG Algorithm
r0 = b − Ax0; r̃0 arbitrary
for i = 1, 2, 3, ...
    ρi−1 = (r̃i−1, ri−1)
    if i = 1
        p1 = r0; p̃1 = r̃0
    else
        βi−1 = ρi−1/ρi−2
        pi = ri−1 + βi−1pi−1
        p̃i = r̃i−1 + βi−1p̃i−1
    end
    qi = Api; q̃i = AT p̃i
    αi = ρi−1/(p̃i, qi)
    xi = xi−1 + αipi
    ri = ri−1 − αiqi
    r̃i = r̃i−1 − αiq̃i
end
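A minimal unpreconditioned NumPy sketch of these recurrences (the function name, the stopping test, and the choice r̃0 = r0 are illustrative assumptions; the test problem is arbitrary):

```python
import numpy as np

def bicg(A, b, x0, maxit=200, tol=1e-10):
    # Unpreconditioned Bi-CG; the shadow residual rt (r-tilde) is taken = r0.
    x = x0.copy()
    r = b - A @ x
    rt = r.copy()
    p = pt = None
    rho_old = 1.0
    for i in range(maxit):
        rho = rt @ r                    # rho_{i-1} = (r~_{i-1}, r_{i-1})
        if i == 0:
            p, pt = r.copy(), rt.copy()
        else:
            beta = rho / rho_old
            p = r + beta * p
            pt = rt + beta * pt
        q = A @ p                       # one MV with A ...
        qt = A.T @ pt                   # ... and one with A^T per iteration
        alpha = rho / (pt @ q)
        x = x + alpha * p
        r = r - alpha * q
        rt = rt - alpha * qt
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

# arbitrary diagonally dominant, nonsymmetric test problem
rng = np.random.default_rng(0)
n = 30
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = bicg(A, b, np.zeros(n))
assert np.linalg.norm(b - A @ x) < 1e-6 * np.linalg.norm(b)
```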
Bi-CG recursions
Focus on recursions in Bi-CG
pi = ri−1 + βi−1pi−1 and ri = ri−1 − αiqi = ri−1 − αiApi
pi and ri can be expressed as: ri = Ri(A)r0 and pi = Pi−1(A)r0
We are interested in ri = Ri^2(A)r0
From recursion for ri: Ri(A) = Ri−1(A) − αiAPi−1(A)
and from pi we have: Pi−1(A) = Ri−1(A) + βi−1Pi−2(A)
Squaring the expression for Ri(A) gives:
Ri^2(A) = Ri−1^2(A) + αi^2 A^2 Pi−1^2(A) − 2αi A Ri−1(A)Pi−1(A)
Now we also need recursions for Pi−1^2(A) and Ri−1(A)Pi−1(A)
Bi-CG recursions (2)
recursions for Pi−1^2(A) and Ri−1(A)Pi−1(A):
pi = ri−1 + βi−1pi−1 and ri = ri−1 − αiApi, with ri = Ri(A)r0 and pi = Pi−1(A)r0
Squaring the expression for pi gives:
Pi−1^2(A) = Ri−1^2(A) + βi−1^2 Pi−2^2(A) + 2βi−1 Ri−1(A)Pi−2(A)
Continuing in this fashion leads to recursions for:
ri ≡ Ri^2(A)r0 (and for the corresponding xi)
pi ≡ Pi−1^2(A)r0
ui ≡ Ri−1(A)Pi−1(A)r0 and
qi−1 ≡ Ri−1(A)Pi−2(A)r0
CGS
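The squaring identities above can be sanity-checked with polynomial arithmetic; in this sketch the coefficient arrays and the value standing in for αi are arbitrary:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Given R_i = R_{i-1} - a*A*P_{i-1} (as polynomials in a scalar "A"), verify
# R_i^2 = R_{i-1}^2 + a^2 A^2 P_{i-1}^2 - 2 a A R_{i-1} P_{i-1}.
a = 0.7                            # stands in for alpha_i
Rm = np.array([1.0, -0.3])         # arbitrary coefficients of R_{i-1}
Pm = np.array([1.0, 0.5, 0.2])     # arbitrary coefficients of P_{i-1}

Ri = P.polysub(Rm, a * P.polymulx(Pm))   # R_i = R_{i-1} - a*A*P_{i-1}
lhs = P.polymul(Ri, Ri)                  # R_i^2
rhs = P.polyadd(P.polymul(Rm, Rm),
                a * a * P.polymulx(P.polymulx(P.polymul(Pm, Pm))))
rhs = P.polyadd(rhs, -2.0 * a * P.polymulx(P.polymul(Rm, Pm)))
assert np.allclose(lhs, rhs)
```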
CGS Algorithm
r0 = b − Ax0; r̃ arbitrary
for i = 1, 2, 3, ...
    ρi−1 = (r̃, ri−1)
    if i = 1
        u1 = r0; p1 = u1
    else
        βi−1 = ρi−1/ρi−2
        ui = ri−1 + βi−1qi−1
        pi = ui + βi−1(qi−1 + βi−1pi−1)
    end
    Solve p̂ from Kp̂ = pi
    vi = Ap̂
    αi = ρi−1/(r̃, vi)
    qi = ui − αivi
    Solve z from Kz = ui + qi
    xi = xi−1 + αiz
    ri = ri−1 − αiAz
end
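A minimal NumPy sketch of this algorithm with K = I (the function name, the stopping test, and the choice r̃ = r0 are illustrative assumptions; the test problem is arbitrary):

```python
import numpy as np

def cgs(A, b, x0, maxit=200, tol=1e-10):
    # Unpreconditioned CGS (K = I); rt is the fixed shadow vector r-tilde.
    x = x0.copy()
    r = b - A @ x
    rt = r.copy()
    p = np.zeros_like(b)
    q = np.zeros_like(b)
    rho_old = 1.0
    for i in range(maxit):
        rho = rt @ r
        if i == 0:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_old
            u = r + beta * q
            p = u + beta * (q + beta * p)
        v = A @ p
        alpha = rho / (rt @ v)
        q = u - alpha * v
        z = u + q                       # K z = u + q with K = I
        x = x + alpha * z
        r = r - alpha * (A @ z)         # note: no operation with A^T anywhere
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(0)
n = 30
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = cgs(A, b, np.zeros(n))
assert np.linalg.norm(b - A @ x) < 1e-6 * np.linalg.norm(b)
```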
Convergence behavior Bi-CG (2)
[Figure: comparison of Bi-CG and CGS for indefinite A; x-axis: iteration number, y-axis: 10log(residual); dots: CGS ri, line: Bi-CG]
variants of Bi-CG: CGS
• CGS (Sonneveld, 1989)
- 2 MV’s in BiCG can be used to apply BiCG twice: CGS
- same costs as BiCG
- often twice as fast
- very irregular convergence
- often faster than GMRES
- more MV’s than GMRES ( but far less overhead)
Convergence behavior CGS
[Figure: comparison of exact error and CGS for indefinite A; x-axis: iteration number, y-axis: 10log(residual); dots: CGS ri, line: true residuals]
Computed and true residuals
Algorithm Template for Krylov Method:
Input: x0; r0 = b − Ax0
For i = 1, 2, · · · until convergence
    Generate pi by the method
    xi = xi−1 + pi
    ri = ri−1 − Api
End for
rn is the computed residual
b − Axn is the true residual
in exact arithmetic they are equal
Examples: CG, Bi-CG, CGS, and BiCGSTAB
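The exact-arithmetic claim can be illustrated with rational arithmetic; here the update direction pi is an arbitrary stand-in (Richardson-like steps, pi = ri−1), since the template holds for any choice of pi:

```python
from fractions import Fraction

# In exact (rational) arithmetic the recursively updated residual equals
# the true residual b - A x, step for step.
A = [[Fraction(4), Fraction(1)], [Fraction(1), Fraction(3)]]
b = [Fraction(1), Fraction(2)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

x = [Fraction(0), Fraction(0)]
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]

for _ in range(5):
    p = r[:]                            # "generate p_i by the method"
    Ap = matvec(A, p)
    x = [xi + pi for xi, pi in zip(x, p)]
    r = [ri - api for ri, api in zip(r, Ap)]
    true_r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    assert r == true_r                  # holds exactly, no rounding
```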
Are peaks bad?
Bi-CG type processes (Bi-CG, CGS, ...):
xi = xi−1 + αipi
ri = ri−1 − αiApi
errors in xi have no effect on ri
In finite precision:
ri = ri−1 − αiApi − αi∆Api
|∆A| ≤ nAξ|A|
ri − (b − Axi) = −∑_{j=1}^{i} αj ∆A pj
|‖ri‖2 − ‖b − Axi‖2| ≤ 2 i nA ξ ‖|A|‖ ‖A−1‖ maxj ‖rj‖
Cure: reliable updating
from a suggestion by Neumaier '94, made for CGS
x = x0; r = r0; xu = 0
. . .
for i = 0, 1, 2, . . .
    . . .
    xu = xu + αipi
    r = r − αiApi
    . . .
    if (‖r‖ < ‖riprev‖ ∧ i − iprev < mi)
        x = x + xu
        r = b − Ax
        xu = 0
    endif
if ‖r‖ ≈ ξ‖r0‖: ri ≈ b − Axi
for analysis: Sleijpen and VDV ’94, simple criteria for mi: Ye and VDV ’99
motivation to improve CGS
GOALS:
smoother convergence
faster convergence
POSSIBILITIES:
clever choice of s0
instead of ri = Ri^2(A)r0: ri = Qi(A)Ri(A)r0
with a "damping" polynomial Qi
Variants of Bi-CG: Bi-CGSTAB
Construct ri = Qi(A)Ri(A)r0
Idea: take a simple polynomial Qi(A):
Qi(A) = (I − ω1A)(I − ω2A) · · · (I − ωiA)
Leads to simple recursions, but how to select the ωj?
Take ωj such that it minimizes ‖rj‖2 with respect to ωj , for
residuals that are expressed as rj = Qj(A)Rj(A)r0
Leads directly to Bi-CGSTAB (van der Vorst, 1992)
in fact combination of Bi-CG with product of GMRES(1) steps
≈ same costs as BiCG, often much faster than BiCG
much smoother than CGS, often faster than CGS
breakdown when GMRES(1) stagnates, poor when GMRES(1) is very poor
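The GMRES(1) ingredient is the one-dimensional minimization of ‖s − ωt‖2, whose closed-form minimizer is ω = (t, s)/(t, t); a quick NumPy check with random stand-in vectors for the Bi-CGSTAB quantities s and t = As:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.standard_normal(8)
t = rng.standard_normal(8)

omega = (t @ s) / (t @ t)        # closed-form minimizer of ||s - omega*t||_2

# no omega' on a fine grid around the minimizer does better
grid = np.linspace(omega - 1.0, omega + 1.0, 2001)
best = min(np.linalg.norm(s - w * t) for w in grid)
assert np.linalg.norm(s - omega * t) <= best + 1e-12
```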
Bi-CGSTAB Algorithm (with prec.)
r0 = b − Ax0
ρ−1 = α−1 = ω−1 = 1; v−1 = p−1 = 0
for i = 0, 1, 2, ...
    ρi = (r0, ri); βi−1 = (ρi/ρi−1)(αi−1/ωi−1)
    pi = ri + βi−1(pi−1 − ωi−1vi−1)
    Solve p̂ from Kp̂ = pi
    vi = Ap̂
    αi = ρi/(r0, vi)
    s = ri − αivi
    Solve z from Kz = s
    t = Az
    ωi = (t, s)/(t, t)
    x(i+1) = x(i) + αip̂ + ωiz
    if x(i+1) is accurate enough then stop
    ri+1 = s − ωit
end
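A minimal NumPy sketch of the algorithm with K = I (the function name and stopping test are illustrative assumptions; the shadow vector is taken equal to the initial residual, one common choice; the test problem is arbitrary):

```python
import numpy as np

def bicgstab(A, b, x0, maxit=200, tol=1e-10):
    # Unpreconditioned Bi-CGSTAB (K = I); r0t is the fixed shadow vector.
    x = x0.copy()
    r = b - A @ x
    r0t = r.copy()
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    for i in range(maxit):
        rho = r0t @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r0t @ v)
        s = r - alpha * v
        t = A @ s                        # z = s, t = Az with K = I
        omega = (t @ s) / (t @ t)        # minimizes ||s - omega*t||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(0)
n = 30
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = bicgstab(A, b, np.zeros(n))
assert np.linalg.norm(b - A @ x) < 1e-6 * np.linalg.norm(b)
```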
variants of Bi-CG (3)
Bi-CGSTAB2: Gutknecht 1993: recombine successive Bi-CGSTAB iterations
BiCGSTAB(2) (Sleijpen, Fokkema, vdVorst ’94) :
- after each two BiCG steps: GMRES(2)
- often faster than BiCGSTAB
- also for near skew-symm. matrices
- ≈ same costs as BiCG (and CGS)
- can be further generalized: BiCGSTAB( ℓ)
- BiCGSTAB(4): fast and rather robust
- but, of course, breakdown when GMRES( ℓ) stagnates
avoiding breakdown
two reasons for breakdown in Bi-CGSTAB methods
(1) Bi-CG part may break down: look-ahead techniques (complicated)
(2) GMRES part gives no reduction: no expansion of the Krylov subspace
In that case, use a combination of GMRES and FOM
Gives locally larger ‖ri‖, but often helps to restore
global convergence (Sleijpen and VDV ’95)
how to select?
For Ax = b, with A ≠ AT , A ∈ IRn×n
1. If overhead no problem: GMRES
2. if too much overhead:
QMR, BiCGSTAB,
TFQMR, CGS, Bi-CGSTAB( ℓ)
3. Variable preconditioning: GMRESR, FGMRES
Often preconditioning required
Convergence behavior depends on spectral properties
Iterative methods often applied to
ℓ left-preconditioned system: K−1Ax = K−1b
r right-preconditioned system: AK−1z = b, with x = K−1z
c central-preconditioned system: L−1AU−1w = L−1b
If K (= LU) is a good approximation to A, then all iterative methods are robust
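That left and right preconditioning leave the solution unchanged can be checked directly; here K = diag(A) is an assumed Jacobi-style approximation, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = 4.0 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
K = np.diag(np.diag(A))          # illustrative approximation to A

x = np.linalg.solve(A, b)        # reference solution

# left:  K^{-1} A x = K^{-1} b has the same solution x
x_left = np.linalg.solve(np.linalg.solve(K, A), np.linalg.solve(K, b))
assert np.allclose(x_left, x)

# right: A K^{-1} z = b, then x = K^{-1} z
z = np.linalg.solve(A @ np.linalg.inv(K), b)
assert np.allclose(np.linalg.solve(K, z), x)
```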