WEIGHTED INNER PRODUCTS FOR GMRES AND ARNOLDI ITERATIONS

MARK EMBREE∗, RONALD B. MORGAN† , AND HUY V. NGUYEN‡

30 June 2016

Abstract. The convergence of the restarted GMRES method can be significantly improved, for some problems, by using a weighted inner product that changes at each restart. How does this weighting affect convergence, and when is it useful? We show that weighted inner products can help in two distinct ways: when the coefficient matrix has localized eigenvectors, weighting can allow restarted GMRES to focus on eigenvalues that otherwise slow convergence; for general problems, weighting can break the cyclic convergence pattern into which restarted GMRES often settles. The eigenvectors of matrices derived from differential equations are often not localized, thus limiting the impact of weighting. For such problems, incorporating the discrete cosine transform into the inner product can significantly improve GMRES convergence, giving a method we call W-GMRES-DCT. Integrating weighting with eigenvalue deflation via GMRES-DR also can give effective solutions. Similarly, weighted inner products can be combined with the restarted Arnoldi algorithm for eigenvalue computations: weighting can enhance convergence of all eigenvalues simultaneously.

Key words. linear equations, eigenvalues, restarted GMRES, restarted Arnoldi, weighted inner product, deflation, localized eigenvectors

AMS subject classifications. 65F10, 15A06

1. Introduction. We seek to solve large nonsymmetric systems of linear equations and eigenvalue problems using variants of the GMRES [44] and Arnoldi [41] methods that change the inner product each time the method is restarted.

To solve Ax = b for the unknown x, GMRES computes the approximate solution xk from the Krylov subspace Kk(A, b) = span{b, Ab, . . . , A^{k−1}b} that minimizes the norm of the residual rk := b − Axk. In practice this residual is usually optimized in the (Euclidean) 2-norm, but the algorithm can be implemented in any norm induced by an inner product. This generality has been studied in theoretical settings (e.g., [16, 17]) and occasionally invoked in practical computations. The inner product does not affect the spectrum of A, nor the approximation subspace Kk(A, b) (assuming the method is not restarted), but it can significantly alter the departure of A from normality and thus the conditioning of the eigenvalues. Numerous authors have proposed inner products in which certain preconditioned saddle point systems are self-adjoint, thus permitting the use of short-recurrence Krylov methods in place of the long recurrences required in the 2-norm case [7, 18, 26, 28, 50]. Recently Pestana and Wathen have used non-standard inner products to inform the design of preconditioners [39].

For any inner product 〈·, ·〉W on Rn, there exists a matrix W ∈ Rn×n for which 〈u, v〉W = v∗Wu for all u, v ∈ Rn, and W is Hermitian positive definite in the Euclidean inner product. With the W-inner product is associated the norm ‖u‖W = √〈u, u〉W = √(u∗Wu). GMRES can be implemented in the W-inner product by simply replacing all norm and inner product computations in the standard GMRES algorithm.

∗ Department of Mathematics and Computational Modeling and Data Analytics Division, Academy of Integrated Science, Virginia Tech, Blacksburg, VA 24061 ([email protected]). Supported in part by National Science Foundation grant DGE-1545362.

† Department of Mathematics, Baylor University, Waco, TX 76798-7328 (Ronald [email protected]). Supported in part by the National Science Foundation Computational Mathematics Program under grant DMS-1418677.

‡ Department of Mathematics, Computer Science, and Cooperative Engineering, University of St. Thomas, 3800 Montrose, Houston, TX 77006 ([email protected]).


Any advantage gained by optimizing in this new geometry must be balanced against the extra expense of computing the inner products: essentially one matrix-vector product with W for each inner product and norm evaluation.

In 1998 Essai proposed an intriguing way to utilize the inner product in restarted GMRES [14]. In his algorithm the inner product is very simple (W is diagonal) and the inner product changes each time GMRES is restarted. His computational experiments show that this “weighting” can significantly improve performance for restarted GMRES, especially for difficult problems and frequent restarts. Subsequent work on weighted GMRES [9, 23, 30, 36, 46] and the related Arnoldi method [45] has focused on how the algorithms perform on large-scale examples. In this work we offer a new perspective on why weighting helps GMRES (or not), justified by some simple analysis and illustrated by careful computational experiments.

Section 2 describes Essai’s weighted GMRES algorithm. In section 3, we provide some basic analysis for weighted GMRES and illustrative experiments, arguing that the algorithm performs well when the eigenvectors are localized. In section 4 we propose a new variant that combines weighting with the discrete cosine transform, which localizes eigenvectors in many common situations. A weighted version of GMRES with deflated restarting (GMRES-DR [33]) is developed in section 5. This algorithm both solves linear equations and computes eigenvalues and eigenvectors. The eigenvectors improve convergence of the linear equation solver, and a weighted inner product can improve convergence further. Finally, in section 6, we apply weighted inner products to improve convergence of the thick-restart Arnoldi method for computing eigenvalues and eigenvectors.

2. Weighted GMRES. The standard restarted GMRES algorithm [44] computes the residual-minimizing approximation from a degree-m Krylov subspace, then uses the residual associated with this solution estimate to generate a new Krylov subspace. We call this algorithm GMRES(m), and every set of m iterations after a restart a cycle. Essai’s variant [14], which we call W-GMRES(m), changes the inner product at the beginning of each cycle of restarted GMRES, with weights determined by the residual vector obtained from the last cycle. Let rkm = b − Axkm denote the GMRES(m) residual after k cycles. Then the weight matrix defining the inner product for the next cycle is W = diag(w1, . . . , wn), where

    (1)  wj = max{ |(rkm)j| / ‖rkm‖∞ , 10⁻¹⁰ },

where (rkm)j denotes the jth entry of rkm.¹ The weighted inner product is then 〈u, v〉W = v∗Wu and the associated norm is ‖u‖W = √〈u, u〉W. At the end of a cycle, the residual vector rkm is computed and from it the inner product for the next cycle is built. We have found it helpful to restrict the diagonal entries in (1) to be at least 10⁻¹⁰, thus limiting the condition number of W to ‖W‖2‖W⁻¹‖2 ≤ 10¹⁰. Finite precision arithmetic influences GMRES in nontrivial ways; see, e.g., [29, sect. 5.7]. Weighted inner products can further complicate this behavior; a careful study of this interesting phenomenon is beyond the scope of the present investigation, but we note that Rozložník et al. [40] have studied orthogonalization algorithms in non-standard inner products. (Although reorthogonalization is generally not needed for GMRES [37], all tests here use one step of full reorthogonalization for extra stability.

¹ Essai scales these entries differently, but, as Cao and Yu [9] note, scaling the inner product by a constant does not affect the residuals produced by the algorithm.


Running the experiments in section 2 without this step yields qualitatively similar results, but the iteration counts can differ.)

Since W is diagonal, the inner product requires 3n operations (additions and multiplications), instead of 2n for the Euclidean inner product, increasing the expense of the Gram–Schmidt orthogonalization in GMRES by roughly 25%. For very sparse A this can be significant, but if the matrix-vector product is expensive, the relative cost of weighting is negligible.
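As a minimal illustration (a sketch, not the implementation used for the experiments below), the weight update (1) and the resulting inner product and norm can be realized in MATLAB as follows, assuming A, b, and the current iterate x are available; all other names are illustrative only.

    r = b - A*x;                          % residual at the end of a cycle
    w = max(abs(r)/norm(r, inf), 1e-10);  % diagonal of W, as in (1)
    wip   = @(u, v) v' * (w .* u);        % <u,v>_W = v'*W*u with W = diag(w)
    wnorm = @(u) sqrt(real(wip(u, u)));   % ||u||_W

Each evaluation of wip or wnorm costs roughly 3n operations, which is the source of the 25% overhead in the Gram–Schmidt loop noted above.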

How can weighting help? Suppose we restart GMRES after every m iterations. GMRES minimizes a residual of the form ϕ(A)r, where ϕ is a polynomial of degree no greater than m, normalized so that ϕ(0) = 1. When A is nearly a normal matrix, one learns much from the magnitude of ϕ on the spectrum of A. Loosely speaking, since GMRES must minimize the norm of the residual, the optimal ϕ cannot target isolated eigenvalues near the origin, since ϕ would then be large at other eigenvalues. This constraint often causes uniformly slow convergence of restarted GMRES from cycle to cycle; the restarted algorithm fails to match the “superlinear” convergence enabled by higher degree polynomials in full GMRES [53]. (These higher degree polynomials can target a few critical eigenvalues with a few roots, while having sufficiently many roots remaining to be uniformly small on the rest of the spectrum.) Weighting can potentially skew the geometry to favor troublesome eigenvalues that otherwise impede convergence, leaving easy components to be eliminated quickly in subsequent cycles. In aggregate, weighting can produce a string of iterates, each sub-optimal in the 2-norm, that collectively give faster convergence than standard restarted GMRES, with its locally optimal (but globally sub-optimal) iterates.

Essai’s strategy for the inner product relies on the intuition that one should emphasize those components of the residual vector that have the largest magnitude. Intuitively speaking, these components have thus far been neglected by the method, and may benefit from some preferential treatment. By analyzing the spectral properties of the weighted GMRES algorithm in section 3, we explain when this intuition is valid, show how it can go wrong, and offer a possible remedy.

Essai showed that his weighting can give significantly faster convergence for some problems, but did not provide guidance on matrix properties for which this was the case. Later, Saberi Najafi and Zareamoghaddam [46] showed that taking W to be diagonal with entries from a random uniform distribution on [0.5, 1.5] can sometimes be effective, too. Weighting appears to be particularly useful when the GMRES(m) restart parameter m is small, though it can help for larger m when the problem is difficult. We seek insight into this behavior through experiment and analysis.

Example 1. We first test W-GMRES on an example used by Essai [14], the matrix Add20 from the Matrix Market collection [6]. (Essai apparently uses the term “iterations” for what we call “cycles” here.) This nonsymmetric matrix has dimension n = 2395; for the right-hand side b we use a random Normal(0,1) vector. Figure 1 compares GMRES(m) and W-GMRES(m) for m = 6 and m = 20. In terms of matrix-vector products (with A), W-GMRES(6) converges about 10 times faster than GMRES(6); the relative improvement for m = 20 is less extreme, but W-GMRES is still better by more than a factor of two. (Full GMRES converges to ‖rk‖2/‖r0‖2 ≤ 10⁻¹⁰ in about 509 iterations.) Note that the ‖rk‖2 convergence curves for W-GMRES do not decrease monotonically, since the method generates residuals that are only minimized in the weighted norm, not the 2-norm.

Is weighting typically this effective? Essai shows several other tests where weighting helps, though not as much as for the Add20 problem. Cao and Yu [9] give evidence


Fig. 1. Convergence of full GMRES, GMRES(m), and W-GMRES(m) with m = 6 and m = 20 for the Add20 matrix. In this and all subsequent illustrations, the residual norm is measured in the standard 2-norm for all algorithms. Full GMRES converges to this tolerance in about 509 iterations.

that suggests weighting is not as effective for preconditioned problems, though it can still help. In their recent study of harmonic Ritz values for W-GMRES, Güttel and Pestana arrive at a similar conclusion after conducting extensive numerical experiments on 109 ILU-preconditioned test matrices: “weighted GMRES may outperform unweighted GMRES for some problems, but more often this method is not competitive with other Krylov subspace methods. . . ” [23, p. 733] and “we believe that WGMRES should not be used in combination with preconditioners, although we are aware that for some examples it may perform satisfactorily” [23, p. 750]. Yet weighting is fairly inexpensive and can improve convergence for some difficult problems. When weighting works, what makes it work?

3. Analysis of Weighted GMRES. Analyzing W-GMRES(m) must be at least as challenging as understanding standard restarted GMRES, an algorithm for which results are quite limited. (One knows conditions under which GMRES(m) converges for all initial residuals [15], and about cyclic behavior for GMRES(n−1) for normal A [4]. Few other rigorous results are available.) Restarted GMRES is a nonlinear dynamical system involving the entries of A and the initial residual; experiments suggest its convergence can depend sensitively on the initial residual [12]. With these challenges in mind, we seek to gain some insight into the performance of W-GMRES by first studying a setting in which the algorithm is ideally suited.

Let W = S∗S denote a positive definite matrix that induces the inner product 〈u, v〉W = v∗Wu on Rn, with induced norm

    ‖u‖W = √〈u, u〉W = √(u∗Wu) = √(u∗S∗Su) = ‖Su‖2.

Given the initial guess x0 and residual r0 = b − Ax0, at the kth step GMRES in the W-inner product computes the residual rk that satisfies

    ‖rk‖W = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(A)r0‖W = min_{ϕ∈Pk, ϕ(0)=1} ‖Sϕ(A)S⁻¹Sr0‖2 = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(SAS⁻¹)Sr0‖2.


Thus, before any restarts are performed, W-GMRES applied to (A, r0) is equivalent to 2-norm GMRES applied to (SAS⁻¹, Sr0). (From this observation Güttel and Pestana propose an alternative implementation of W-GMRES, their Algorithm 2 [23, p. 744].) This connection between W-GMRES and standard GMRES via the similarity transformation SAS⁻¹ was first noted in an abstract by Gutknecht and Loher [22], who considered “preconditioning by similarity transformation;” further details have not been published. Since A and SAS⁻¹ have a common spectrum, one expects full GMRES in the 2-norm and W-norm to have similar asymptotic convergence behavior. However, the choice of S can significantly affect the departure of A from normality. For example, if A is diagonalizable with A = V ΛV⁻¹, then taking W = V⁻∗V⁻¹ with S = V⁻¹ renders SAS⁻¹ = Λ a diagonal (and hence normal) matrix.
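This equivalence can be checked with MATLAB’s built-in gmres: over one cycle (no restarts), 2-norm GMRES applied to (SAS⁻¹, Sr0) reports residual norms equal to the W-norms of the W-GMRES residuals for (A, r0). The sketch below assumes A, b, an initial guess x0 with r0 = b − A*x0, a positive weight vector w, and a cycle length m are available; it only illustrates the idea behind such an implementation.

    n    = length(r0);
    S    = spdiags(sqrt(w), 0, n, n);      % W = S'*S with S = diag(sqrt(w))
    Afun = @(v) S * (A * (S \ v));         % applies S*A*inv(S) without forming it
    [y, ~, ~, ~, resvec] = gmres(Afun, S*r0, [], 1e-10, m);   % one cycle of 2-norm GMRES
    xk = x0 + S \ y;                       % corresponding W-GMRES iterate
    % resvec(k+1) = ||S*r_k||_2 = ||r_k||_W, the W-norm of the W-GMRES residual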

The W-norms of the residuals produced by a cycle of W-GMRES will decrease monotonically, but this need not be true of the 2-norms, as seen in Figure 1. In fact,

‖rk‖W = ‖Srk‖2 ≤ ‖S‖2‖rk‖2,

and similarly

‖rk‖2 = ‖S−1Srk‖2 ≤ ‖S−1‖2‖Srk‖2 = ‖S−1‖2‖rk‖W ,

so

    (1/‖S‖2) ‖rk‖W ≤ ‖rk‖2 ≤ ‖S⁻¹‖2 ‖rk‖W.

Moreover, a result of Pestana and Wathen [39, Thm. 4] concerning GMRES in general inner products implies that the same bound holds when the residual in the middle of this inequality is replaced by the optimal 2-norm GMRES residual. That is, if rk^(W) and rk^(2) denote the kth residuals from GMRES in the W-norm and 2-norm (both before any restart is performed, k ≤ m), then

    (1/‖S‖2) ‖rk^(W)‖W ≤ ‖rk^(2)‖2 ≤ ‖S⁻¹‖2 ‖rk^(W)‖W.

Thus for W-norm GMRES to depart significantly from 2-norm GMRES over the course of one cycle, S must be (at least somewhat) ill-conditioned. However, restarting with a new inner product can change the dynamics across cycles significantly.

3.1. The ideal setting for W-GMRES. For diagonal W, as in [14], let

S = diag(s1, . . . , sn) := W^{1/2}.

When the coefficient matrix A is diagonal,

A = diag(λ1, . . . , λn),

Essai’s weighting is well motivated and its effects can be readily understood.² In this setting, the eigenvectors are columns of the identity matrix, V = I, and SA = AS. Consider one cycle of W-GMRES(m), which we presume starts with the initial residual

² Examples 1 and 2 of Güttel and Pestana [23] use diagonal A, and the analysis here can help explain their experimental results.


r0 = b = [b1, . . . , bn]T. Then for all k ≤ m,

    (2a)  ‖rk‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(A)b‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} ‖Sϕ(A)b‖2^2
    (2b)          = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(SAS⁻¹)Sb‖2^2
    (2c)          = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(A)Sb‖2^2 = min_{ϕ∈Pk, ϕ(0)=1} Σ_{j=1}^n |ϕ(λj)|² |sjbj|²,

using the commutativity of diagonal S and A. This last equation reveals that the weight sj affects GMRES in precisely the same way as the component bj of the right-hand side (which is just the component of b in the jth eigenvector direction for this A), and thus the weights tune how much GMRES will emphasize any given eigenvalue. Up to scaling, Essai’s proposal amounts to sj = √|bj|, so

    (3)  ‖rk‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} Σ_{j=1}^n |ϕ(λj)|² |bj|³.

This approach favors large components in b, while underemphasizing smaller ones. Consider this intuition: small entries in b (reduced at a previous cycle of restarted GMRES) were “easy” for GMRES to reduce because, for example, they correspond to eigenvalues far from the origin. Large entries in b have been neglected by previous cycles, most likely because they correspond to eigenvalues close to the origin. A low-degree polynomial that is small at these points is likely to be large at large magnitude eigenvalues, and so will not be picked as the optimal polynomial by the restarted GMRES process. Weighting tips the scales in favor of these neglected eigenvalues near the origin for one cycle; this preference will usually increase the residual in “easy” components, but such an increase can be quickly remedied at the next cycle. Consider, for example, how standard GMRES(1) handles the scenario

    0 < λ1 ≪ 1 ≤ λ2 ≈ · · · ≈ λn.

Putting the root ζ of the residual polynomial ϕ(z) = 1 − z/ζ near the cluster of eigenvalues λ2 ≈ · · · ≈ λn will significantly diminish the residual vector in components 2, . . . , n, while not increasing the first component, since |ϕ(λ1)| < 1. On the other hand, placing ζ near λ1 would give |ϕ(λj)| ≫ 1 for j = 2, . . . , n, significantly increasing the overall residual norm, but in a manner that could be remedied at the next cycle, if only GMRES(1) had the foresight to take a locally sub-optimal cycle to accelerate the overall convergence. Essai’s weighting gives GMRES the ability to target that small magnitude eigenvalue for one cycle, potentially at an increase to the 2-norm of the residual that can easily be corrected at the next step without undoing the reduced component associated with λ1. We illustrate this idea with a detailed example.

3.1.1. The diagonal case: targeting small eigenvalues.

Example 2. Consider the diagonal matrix

A = diag(0.01, 0.1, 3, 4, 5, . . . , 9, 10).

The right-hand side is a Normal(0,1) vector (MATLAB’s randn with rng(1319)), normalized so that ‖r0‖2 = 1. Figure 2 compares W-GMRES(m) to GMRES(m) for


Fig. 2. GMRES(m) and W-GMRES(m) with m = 3, 6 for the diagonal test case, Example 2.


Fig. 3. Histograms of GMRES(3) and W-GMRES(3) convergence for the matrix in Example 2 with 5000 random unit vectors b = r0 (the same for both methods). For each iteration, these plots show the distribution of the 2-norm of the residual; while W-GMRES(3) converges more rapidly in nearly all cases, the convergence is more variable.

m = 3 and m = 6: in both cases, weighting gives a big improvement. Figure 3 investigates this behavior further, comparing convergence of GMRES(3) and W-GMRES(3) for 5000 Normal(0,1) right-hand side vectors (also normalized so ‖r0‖2 = 1). As in Figure 2, W-GMRES(3) usually substantially improves convergence, but it also adds considerable variation in performance.

We will focus on the case of m = 3. GMRES(3) converges quite slowly: the small eigenvalues of A make this problem difficult. Figure 4 shows the residual components in each eigenvector after cycles 2, 3 and 4: GMRES(3) makes little progress in the smallest eigencomponent. Contrast this with W-GMRES(3): after cycles 2 and 3 all eigencomponents have been reduced except the first one, corresponding to λ = 0.01. Thus cycle 4 places a significant emphasis on this previously neglected eigenvalue, reducing the corresponding eigencomponent in the residual by two orders of magnitude (while increasing some of the other eigencomponents). Without weighting, GMRES(3) does not target this eigenvalue.

Figure 5 examines the GMRES residual polynomials ϕℓ that advance GMRES(m) and W-GMRES(m) from cycle ℓ to cycle ℓ + 1 (i.e., r(ℓ+1)m = ϕℓ(A)rℓm with ϕℓ(0) = 1) for ℓ = 2, . . . , 5. GMRES(3) shows a repeating pattern, with every other


Fig. 4. Eigencomponents of the residual at the end of cycles ℓ = 2, 3, and 4 for standard GMRES(3) (top) and W-GMRES(3) (bottom) applied to the matrix in Example 2. Cycle 4 of W-GMRES(3) reduces the eigencomponent corresponding to λ1 = 0.01 by two orders of magnitude.

polynomial very similar; such behavior has been noted before [4, 56]. None of these four polynomials is small at the eigenvalue λ = 0.01, as can be seen in the zoomed-in image on the right of Figure 5. (At cycle 4, ϕ4(0.01) ≈ 0.908.) The W-GMRES(3) polynomials at cycles 2, 3, and 5 are much like those for GMRES(3); however, the polynomial for cycle 4 breaks the pattern, causing a strong reduction in the first eigencomponent. Specifically, ϕ4(0.01) ≈ 0.00461. Note that ϕ4 is also small at the eigenvalues λ = 5 and λ = 10. These are the other eigencomponents that are reduced at this cycle; the others all increase, as seen in Figure 4. While the weighted norm decreases from 0.0274 to 0.0018 during cycle 4, the 2-norm shows a more modest reduction, from 0.0913 to 0.0734. Indeed, experiments with different initial residuals often produce similar results, though the cycle that targets λ = 0.01 can vary, and the 2-norm of the W-GMRES residual can increase at that cycle.

The next result makes such observations precise for 2 × 2 matrices.

Theorem 3. Let A = diag(λ, 1) with real λ ≠ 0, and suppose the residual vector at the start of a cycle of GMRES is b = [b1 b2]T for b1, b2 ∈ R with b1 ≠ 0. Define


Fig. 5. Residual polynomials for cycles 2–5 for standard GMRES(3) (top) and W-GMRES(3) (bottom) for Example 2. The solid gray vertical lines denote the eigenvalues of A. At cycle 4, the weighted inner product allows W-GMRES(3) to make progress on the first eigencomponent, at the expense of increasing other eigencomponents.

β := b2/b1. Then the GMRES(1) polynomial has the root

    (λ² + β²)/(λ + β²),

while the W-GMRES(1) polynomial has the root

    (λ² + |β|³)/(λ + |β|³).

The proof is a straightforward calculation. To appreciate this theorem, take λ = β = 0.1. Then the roots are 0.1818 for GMRES(1) and 0.1089 for W-GMRES(1). These polynomials cut the residual component corresponding to the small eigenvalue by a factor 0.450 for GMRES(1) and by 0.0818 for W-GMRES(1), the latter at the cost that W-GMRES(1) increases the magnitude of the other eigencomponent by a factor of 8.18. This increase is worthwhile, given the significant progress in the tough small component; later cycles will handle the easier component. For another example, take λ = β = 0.01: GMRES(1) reduces the small eigencomponent by 0.495, while W-GMRES(1) reduces it by 0.0098, two orders of magnitude.
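These numbers are easy to reproduce. The following sketch (illustrative names only) uses the closed form for the GMRES(1) root in a diagonal inner product, ζ = 〈Ab, Ab〉W/〈Ab, b〉W, which reduces to the expressions in Theorem 3 for A = diag(λ, 1).

    lambda = 0.1;  beta = 0.1;
    A = diag([lambda, 1]);   b = [1; beta];
    root = @(w) ((A*b)'*diag(w)*(A*b)) / ((A*b)'*diag(w)*b);
    zeta2 = root([1; 1])                 % 0.1818..., the GMRES(1) root
    zetaW = root(abs(b)/max(abs(b)))     % 0.1089..., the W-GMRES(1) root
    damp  = @(z) abs(1 - lambda/z);      % reduction of the small eigencomponent
    [damp(zeta2), damp(zetaW)]           % approximately [0.450, 0.0818]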

3.1.2. The diagonal case: breaking patterns in residual polynomials. Beyond the potential to make progress on small magnitude eigenvalues, W-GMRES has the added advantage of breaking the cyclic pattern into which GMRES(m) residual polynomials often fall. Baker, Jessup and Manteuffel [4, Thm. 2] prove that if A


is an n × n symmetric (or skew-symmetric) matrix, then GMRES(n − 1) produces residual vectors that exactly alternate in direction, i.e., the residual vector at the end of a cycle is an exact multiple of the residual vector two cycles before; thus the GMRES residual polynomial at the end of each cycle repeats the same polynomial found two cycles before. Baker et al. observe the same qualitative behavior for smaller restart parameters, and suggest that disrupting this cyclic pattern can accelerate GMRES convergence. (Longer cyclic patterns can emerge for nonnormal matrices [56].) To break the pattern, Baker et al. propose changing the restart parameter between cycles of standard restarted GMRES. Changing inner products can have a similar effect, with the added advantage of targeting difficult eigenvalues if the corresponding eigenvectors are well-disposed. While Essai’s residual-based weighting scheme can break the cyclic pattern in restarted GMRES, other schemes (such as the random weighting scheme used in the experiments of [23, 46]; see subsection 3.5) can have a similar effect; however, such arbitrary inner products take no advantage of eigenvector structure. The next example cleanly illustrates how W-GMRES can break patterns.

Example 4. Let A = diag(2, 1), and apply GMRES(1) with starting vector b = [1 1]T. The roots of the (linear) GMRES(1) residual polynomials alternate between 5/3 and 4/3 (exactly), values given by Theorem 3. Therefore the GMRES(1) residual polynomials for the cycles alternate between ϕk(z) = 1 − (3/5)z and ϕk(z) = 1 − (3/4)z. GMRES(1) converges (‖rk‖2/‖r0‖2 ≤ 10⁻⁸) in 17 iterations. In contrast, W-GMRES(1) requires only 7 iterations. As Figure 6 shows, weighting breaks the cycle of polynomial roots, giving 1.667, 1.200, 1.941, 1.0039, 1.999985, 1.0000000002, and ≈ 2: the roots move out toward the eigenvalues, alternating which eigenvalue they favor and reducing the residual much faster in the process.
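A few lines of MATLAB (again with purely illustrative names) reproduce the roots listed above by applying the residual polynomial ϕ(z) = 1 − z/ζ cycle by cycle, with ζ computed from the current weights via the same formula used in the check after Theorem 3; replacing the weight update by w = [1; 1] recovers the alternating GMRES(1) roots 5/3 and 4/3.

    A = diag([2, 1]);   r = [1; 1];
    for k = 1:7
        w = max(abs(r)/max(abs(r)), 1e-10);                  % Essai's weights for this cycle
        zeta = ((A*r)'*diag(w)*(A*r)) / ((A*r)'*diag(w)*r);  % W-GMRES(1) root
        fprintf('cycle %d: root = %.10f\n', k, zeta);
        r = r - (A*r)/zeta;                                  % r <- phi(A)*r, phi(z) = 1 - z/zeta
    end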

3.2. Measuring eigenvector localization. The justification for W-GMRES in (2c) relied on the fact that A was diagonal, i.e., all its eigenvectors were columns of the identity matrix. When this is not the case, the motivation for W-GMRES becomes more tenuous, as explored in the next subsection. (This is true even if the eigenvectors are orthogonal, i.e., A is normal but not diagonal.) Yet in many important applications A is not close to diagonal, but still has eigenvectors that are localized, meaning that only a few entries in the eigenvector are large in magnitude. In this case computational evidence suggests that W-GMRES can still be very effective, particularly when the eigenvectors associated with the small magnitude eigenvalues are localized. Indeed, one such example is the Add20 matrix in Example 1.

To inform the discussion in the next subsection, we propose to roughly gauge the localization of the p smallest magnitude eigenvectors, measuring how much these


Fig. 6. Roots of the GMRES(1) and W-GMRES(1) residual polynomials for A = diag(2, 1) with b = [1, 1]T, as a function of the restarted GMRES cycle, k. The GMRES(1) roots occur in a repeating pair, and the method takes 17 iterations to converge to ‖rk‖2/‖r0‖2 ≤ 10⁻⁸. In contrast, the W-GMRES(1) roots are quickly attracted to the eigenvalues, giving convergence in just 7 steps.


Fig. 7. The localization measures locp(A) for p = 1, . . . , 40 for four matrices that are used in our W-GMRES experiments. If locp(A) is close to 1, the eigenvectors associated with smallest magnitude eigenvalues are highly localized.

eigenvectors are concentrated in their p largest magnitude entries.³ In particular, for diagonalizable A label the eigenvalues in increasing magnitude, |λ1| ≤ |λ2| ≤ · · · ≤ |λn|, with associated unit 2-norm eigenvectors v1, . . . , vn. For any vector y ∈ Cn, let y(p) ∈ Cp denote the subvector containing the p largest magnitude entries of y (in any order). Then

    (4)  locp(A) := (1/√p) ( Σ_{j=1}^p ‖vj(p)‖2^2 )^{1/2}.

Notice that locp(A) ∈ (0, 1], and locn(A) = 1. If locp(A) ≈ 1 for some p ≪ n, the eigenvectors associated with the p smallest magnitude eigenvalues are localized within p positions. This measure is imperfect, for it does not reflect the specific location of the smallest magnitude eigenvalues, nor does it reflect potential eigenvector alignment (i.e., ill-conditioned eigenvector matrix) for nonnormal A. Nevertheless, we find this measure to be a helpful instrument for assessing localization of eigenvectors. Figure 7 shows locp(A) for p = 1, . . . , 40 for four matrices that are used in experiments in this section.⁴ The Add20 matrix has strongly localized eigenvectors; the discretization of the 2d Laplacian has eigenvectors that are far from being localized, i.e., they are global. The Orsirr 1 and Sherman5 matrices have eigenvectors with an intermediate degree of localization. We shall refer back to these measures in our future experiments.
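For a modest-sized matrix, (4) can be evaluated with a few lines of MATLAB; the sketch below is only an illustration (for dense matrices, eig returns eigenvectors with unit 2-norm, as assumed here), and it does not handle the derogatory eigenvalues discussed in the footnote.

    [V, D] = eig(full(A));
    [~, idx] = sort(abs(diag(D)));               % order eigenvalues by increasing magnitude
    V = V(:, idx);
    p  = 40;
    Sp = sort(abs(V(:, 1:p)).^2, 1, 'descend');  % squared entries of each eigenvector, largest first
    loc_p = sqrt(sum(sum(Sp(1:p, :))) / p)       % loc_p(A) as defined in (4)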

3.3. The nondiagonal case. Essai proposed W-GMRES for general nonsingular matrices. When A is not diagonal, the justification for the algorithm in equation (2c) is lost, and the iteration becomes more subtle and less compelling. The weights can still break cyclic patterns in restarted GMRES, but the interplay between the weights and the eigenvectors is more difficult to understand. Suppose A is

³ Certain applications motivate more specialized measures of localization based on the rate of exponential decay of eigenvector entries about some central entry.

⁴ Two of these matrices, Sherman5 and the 2d Laplacian, have repeated (derogatory) eigenvalues; according to the principle of reachable invariant subspaces (see, e.g., [5, 29]), the GMRES process only acts upon one eigenvector in the invariant subspace, given by the component of the initial residual in that space. Thus the localization measures in Figure 7 include only one eigenvector for the derogatory eigenvalues, specified by the initial residual vectors used in the experiments here.


diagonalizable, A = V ΛV⁻¹, again with weight matrix W = S∗S. Unlike the diagonal case, S will not generally commute with V and Λ, so the analysis in (2) becomes

    (5a)  ‖rk‖W = min_{ϕ∈Pk, ϕ(0)=1} ‖ϕ(SAS⁻¹)Sb‖2
    (5b)         = min_{ϕ∈Pk, ϕ(0)=1} ‖(SV)ϕ(Λ)(SV)⁻¹Sb‖2.

The matrix S transforms the eigenvectors of A [22]. Suppose, impractically, that V⁻¹ were known, allowing the nondiagonal weight

    S = diag(s1, . . . , sn) V⁻¹.

In this case (5b) reduces to

    (6)  ‖rk‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} Σ_{j=1}^n |ϕ(λj)|² |sjbj|²,    (eigenvector weighting)

a perfect analogue of the diagonal case (2c) that could appropriately target the small magnitude eigenvalues that delay convergence. One might imagine approximating this eigenvector weighting by using, as a proxy for V⁻¹, some estimate of the left eigenvectors of A associated with the smallest magnitude eigenvalues. (Recall that the rows of V⁻¹ are the left eigenvectors of A, since V⁻¹A = ΛV⁻¹.) We do not pursue this idea here, instead preferring to incorporate eigenvector information in the deflated restarting technique described in section 5.

The conventional W-GMRES(m) algorithm instead uses diagonal W (and S). In this case, SV scales the rows of V, effectively emphasizing certain entries of the eigenvectors at the expense of others. Write out the right and left eigenvectors,

    V = [v1 v2 · · · vn] ∈ Cn×n,    V⁻¹ = [v̂1∗; v̂2∗; · · · ; v̂n∗] ∈ Cn×n.

Let c := V⁻¹b denote the expansion coefficients for b in the eigenvectors of A: b = Vc. Then one can also render (5b) in the form

    (7)  ‖rk‖W = min_{ϕ∈Pk, ϕ(0)=1} ‖ Σ_{j=1}^n cj ϕ(λj) Svj ‖2,    (diagonal weighting)

a different analogue of (2c). Denote the ℓth entry of vj by (vj)ℓ. In most cases V will be a dense matrix. Suppose that only the qth entry of rk is large, and that (vj)q ≠ 0 for all j = 1, . . . , n. Then Essai’s weighting strategy will make |(Svj)q| ≫ |(Svj)ℓ| for all ℓ ≠ q: all the vectors Svj form a small angle with eq, the qth column of the identity, making SV ill-conditioned. Thus the transformed coefficient matrix SAS⁻¹ will have a large departure from normality, often a troublesome case for GMRES; see, e.g., [52, chap. 26]. It is not evident how such weighting could cause W-GMRES(m) to focus on small magnitude eigenvalues, as it does for diagonal A, suggesting why W-GMRES(m) has shown mixed results in past experiments, e.g., [9, 23].


We show that by transforming the eigenvectors, diagonal weighting can even prevent W-GMRES(m) from converging at all. Faber et al. [15] prove that GMRES(m) converges for all initial residuals provided there exists no v ∈ Cn such that

    v∗A^k v = 0,  for all k = 1, . . . , m.

A condition sufficient to guarantee convergence of GMRES(m) for all m ≥ 1 is that the field of values of A,

    F(A) := {v∗Av : v ∈ Cn, ‖v‖2 = 1},

does not contain the origin. (Often some aspect of the motivating problem ensures this condition holds, e.g., coercivity in finite element methods.) Weighting effectively applies GMRES to the transformed matrix SAS⁻¹, and now it is possible that 0 ∈ F(SAS⁻¹) even though 0 ∉ F(A). In extreme cases this means that W-GMRES(m) can stagnate, even when GMRES(m) converges, as shown by the next example. (In contrast, for A and S diagonal, SAS⁻¹ = A, so F(A) = F(SAS⁻¹).)

Example 5 (W-GMRES(1) stagnates while GMRES(1) converges). Consider the matrix and initial residual

    (8)  A = [1  −4; 0  5],    r0 = [1; (5 + √5)/10] = [1; 0.72360 . . .].

Figure 8 shows F(A), an ellipse in the complex plane. From the extreme eigenvalues of the Hermitian and skew-Hermitian parts of A, one can bound F(A) within a rectangle in C that does not contain the origin,

    Re(F(A)) = [3 − 2√2, 3 + 2√2] ≈ [0.17157, 5.82843],    Im(F(A)) = [−2i, 2i],

and hence GMRES(1) converges for all initial residuals. Now consider the specific r0 given in (8). Using the scaling S = diag(√(|(r0)1|/‖r0‖∞), √(|(r0)2|/‖r0‖∞)) at the first step gives the W-GMRES(1) polynomial (see, e.g., [12, eq. (2.2)])

    r1 = ϕ(A)r0 = r0 − [(Sr0)∗(SAS⁻¹)(Sr0) / ‖(SAS⁻¹)(Sr0)‖2^2] Ar0 = r0 − [r0∗S²Ar0 / ‖SAr0‖2^2] Ar0.


Fig. 8. The fields of values F(A) (gray region) and F(SAS⁻¹) (boundary is a black line) in C for A and r0 in (8); the cross (+) marks the origin. Since 0 ∉ F(A), GMRES(m) converges for any r0; however, 0 ∈ F(SAS⁻¹) and (Sr0)∗(SAS⁻¹)(Sr0) = 0, so W-GMRES(1) completely stagnates.


This example has been engineered so that 0 ∈ F(SAS⁻¹); indeed,

    Re(F(SAS⁻¹)) = [3 − √(14 − 2√5), 3 + √(14 − 2√5)] ≈ [−0.086724, 6.08672],

as seen in Figure 8. Worse still, this r0 gives r0∗S²Ar0 = 0, so r1 = r0: W-GMRES(1) makes no progress, and the same weight is chosen for the next cycle. The weighted algorithm completely stagnates, even though GMRES(1) converges for any r0.

This last example was designed to accentuate the difference between GMRES(m) and W-GMRES(m) when A is not diagonal. Next we show that global eigenvectors associated with small magnitude eigenvalues can also contribute to poor W-GMRES(m) convergence.

Example 6 (W-GMRES(m) worse than GMRES(m): global eigenvectors). We revisit Example 2, again with dimension n = 10 and eigenvalues

Λ = diag(0.01, 0.1, 3, 4, 5, . . . , 9, 10),

but now building the eigenvector matrix V from the orthonormal eigenvectors of the symmetric tridiagonal matrix tridiag(−1, 0, −1), ordering the eigenvalues from smallest to largest. (The jth eigenvector has entries (vj)ℓ = sin(jℓπ/(n + 1))/‖vj‖2; see, e.g., [48].) The resulting A = V ΛV∗ is a symmetric matrix, a unitary similarity transformation of Λ. Let b be the same Normal(0,1) vector used in Example 2. Since V is unitary, GMRES(m) applied to (Λ, b) produces the same residual norms as GMRES(m) applied to (V ΛV∗, V b). However, weighted GMRES can behave very differently for the two problems: when applied to Λ (as in Example 2), W-GMRES(6) significantly outperforms GMRES(6); when applied to A with its non-localized eigenvectors, W-GMRES(6) converges much more slowly, as seen in Figure 9.
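The construction is easy to reproduce; the sketch below (illustrative names, using the eigenvector formula quoted above) builds V and A for this example.

    n   = 10;
    Lam = diag([0.01, 0.1, 3:10]);           % the spectrum of Example 2
    [J, L] = meshgrid(1:n, 1:n);
    V = sqrt(2/(n+1)) * sin(pi*J.*L/(n+1));  % unit eigenvectors of tridiag(-1,0,-1)
    A = V * Lam * V';                        % symmetric, with global eigenvectors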

The next example shows similar behavior for a larger matrix: eigenvalues on both sides of the origin with global eigenvectors cause poor W-GMRES(m) performance; convergence improves for the same spectrum with localized eigenvectors.


Fig. 9. GMRES(6) and W-GMRES(6) for Example 6. Applying GMRES(6) to (Λ, b) and (A, V b) produces identical residual 2-norms. When applied to (Λ, b), W-GMRES(6) converges faster than GMRES(6) (seen previously in Figure 2); when applied to (A, V b) the convergence is much slower (A having non-localized eigenvectors).


Fig. 10. GMRES(m) and W-GMRES(m) for the nonsymmetric Sherman5 matrix. In the top plot, the methods are applied to Ax = b, and weighting slows convergence. In the bottom plot, the same methods are applied to Λx = b, where Λ is diagonal with the same eigenvalues of A; now W-GMRES converges more rapidly. (The GMRES(100) curve is magenta; the W-GMRES(30) curve is blue.) For both Ax = b and Λx = b, taking m = 100 steps with each weighted inner product leads to significant growth in the 2-norm of the residual between restarts.

Example 7 (W-GMRES(m) slower than GMRES(m): global eigenvectors). We present one final example showing how weighting can slow convergence. The Sherman5 matrix from Matrix Market [6] has dimension n = 3312. Figure 10 shows that weighting slows down restarted GMRES, particularly for the larger restart value of m = 100. (The initial residual is a Normal(0,1) vector.) Figure 7 shows that the eigenvectors are poorly localized. For example, in the eigenvector v1 corresponding to the smallest eigenvalue, the 100 largest components range only from 0.051 down to 0.041 (with ‖v1‖2 = 1). Moreover, this matrix has eigenvalues to the left and right of the origin, and the eigenvectors are not orthogonal.⁵ The second plot in Figure 10 shows how W-GMRES performance improves when A is replaced by a diagonal matrix with the same spectrum.

⁵ Note λ = 1 is an eigenvalue of multiplicity 1674 for Sherman5, and the corresponding eigenvectors are columns of the identity matrix. This high multiplicity eigenvalue has little effect on GMRES convergence. The eigenvalues closest to the origin are well-conditioned, while some eigenvalues far from the origin are relatively ill-conditioned. (For the diagonalization computed by MATLAB, ‖V‖2‖V⁻¹‖2 ≈ 3.237 × 10⁶.) The pseudospectra of this matrix are well-behaved near the origin; the effect of the eigenvalue conditioning on GMRES convergence is mild [52, chap. 26].


Fig. 11. GMRES(m) and W-GMRES(m) for a discretization of the Dirichlet Laplacian on the unit square. The eigenvectors are not localized, but weighting still accelerates convergence.

The past three examples suggest that W-GMRES performs poorly for matrices whose eigenvectors are not localized, but the algorithm is more nuanced than that. The next example is characteristic of behavior we have often observed in our experiments. We attribute the improvement over standard restarted GMRES to the tendency of weighting to break cyclic patterns in the GMRES residual polynomials.

Example 8 (W-GMRES(m) better than GMRES(m): global eigenvectors). Consider the standard finite difference discretization of the Laplacian on the unit square in two dimensions with Dirichlet boundary conditions. The uniform grid spacing h = 1/100 in both directions gives a matrix of order n = 99² = 9801; the initial residual is a random Normal(0,1) vector. The small locp(A) values shown in Figure 7 confirm that the eigenvectors are not localized; in fact, they are less localized than for the Sherman5 matrix in the last example. Despite this, Figure 11 shows that W-GMRES(m) outperforms GMRES(m) for m = 10 and m = 20, though not as much as for the Add20 matrix in Example 1. In this case, all the eigenvalues are positive, and none is particularly close to the origin.
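For readers who wish to reproduce this test problem, one standard construction of such a coefficient matrix is sketched below (illustrative code; the 1/h² scaling is an assumption about the setup, and rescaling A by a constant does not change the relative residual history).

    N = 99;  h = 1/(N + 1);                        % interior grid, n = N^2 = 9801
    e = ones(N, 1);
    T = spdiags([-e, 2*e, -e], -1:1, N, N) / h^2;  % 1d second-difference matrix
    A = kron(speye(N), T) + kron(T, speye(N));     % 5-point 2d Dirichlet Laplacian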

3.4. Extra weighting. If weighting often helps GMRES convergence, particularly when the smallest eigenvalues of A have localized eigenvectors, might convergence be further improved if the weights exaggerate the differences in the residual vector entries, thus accentuating the small-magnitude eigenvalues? Recall from equation (2) that for diagonal A,

    (9)  ‖rk‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} Σ_{j=1}^n |ϕ(λj)|² |sjbj|².

Essai’s standard weighting scheme uses sj² = wj = |bj|. Suppose one replaces this weight with sj² = wj = |bj|^p for some p ≥ 0 (where p = 0 is GMRES(m), p = 1 is W-GMRES(m), and p > 1 gives extra weighting). For large p we find it particularly important to limit the minimal weight, as in (1); we require wj ∈ [10⁻¹⁰, 1].

For most cases we have studied the extra weighting makes little qualitative difference in convergence. For diagonal A, where the motivation from (9) holds, p > 1 can notably accelerate convergence. When A is not diagonal but has localized


Table 1
W-GMRES(m) performance for the Orsirr 1 matrix, with extra weighting of the residual entries by the power p. The table entries show the number of matrix-vector products with A required to reduce ‖rk‖2/‖r0‖2 by a factor of 10⁻⁸.

    p         0        1        2      3      4      5      6      8      10
    m = 5     >20000   >20000   16190  9055   8505   7715   4750   5475   5140
    m = 10    17820    16780    6270   6240   3950   3850   3690   3600   3610
    m = 20    16400    3480     3020   2600   2680   2640   2740   2620   2800
    m = 30    4980     3240     2490   2610   2640   2370   2400   2430   2310

eigenvectors, extra weighting sometimes helps. It does little for the Add20 matrix from Example 1, but has a greater effect for the Orsirr 1 matrix studied next.

Example 9 (Extra weighting, localized eigenvectors). The Orsirr 1 matrix from Matrix Market [6] is a nonsymmetric matrix of dimension n = 1030. Figure 7 shows the eigenvectors to be fairly well localized; all the eigenvalues are real and negative. Table 1 reports the matrix-vector products (with A) required by W-GMRES(m) with different weighting powers p to satisfy the convergence criterion ‖rk‖2/‖r0‖2 ≤ 10⁻⁸. (Here r0 is a random Normal(0,1) vector.) When m is small (e.g., m = 5 and 10), extra weighting improves convergence significantly. For example, when m = 10, the extreme value p = 8 gives the best result among the reported p. For larger m, extra weighting has less impact.

3.5. Random weighting. The discussion throughout this section has argued that when A has local eigenvectors, residual-based weighting helps in two ways: it can encourage convergence in certain eigenvector directions, as described in subsection 3.1, in addition to breaking repetitive patterns in GMRES(m). In contrast, when the eigenvectors are predominantly global, weighting can still sometimes help by breaking patterns in GMRES(m), but there is no compelling connection to the eigenvectors.

This explanation suggests that if the eigenvectors of A are global, a residual-based weighting scheme should fare no better than the random diagonal weighting matrix proposed by Saberi Najafi and Zareamoghaddam [46]. In contrast, for localized eigenvectors, residual-based weighting should have a significant advantage over random weighting. Indeed, this behavior is observed in the next two examples.

Example 10 (Random weights for a matrix with local eigenvectors). The matrix Add20 from Example 1 has localized eigenvectors, and W-GMRES(m) significantly improved convergence of GMRES(m) for m = 3 and m = 6. Table 2 compares this performance to two random weighting schemes: W is a diagonal matrix with uniform random entries chosen in the interval [0.5, 1.5] (first method) and [0, 1] (second method), with fresh weights generated at each restart. All experiments use the same initial residual as in Figure 1; since performance changes for each run with random weights, we average the matrix-vector products with A (iteration counts) over ten trials each. All three weighted methods beat GMRES(m), but the residual-based weighting has a significant advantage over the random methods.

Example 11 (Random weights for a matrix with global eigenvectors). The results are very different for the discretization of the Dirichlet Laplacian on the unit square, from Example 8. Indeed, Table 3 shows that residual-based weighting is inferior to most of the random weighting experiments, with particularly extreme


Table 2
For the Add20 matrix, the matrix-vector products (with A) required to converge to ‖rk‖2/‖r0‖2 ≤ 10⁻⁸ for GMRES(m), W-GMRES(m), and two random schemes: weights chosen uniformly in [0.5, 1.5] and [0, 1]. The random counts are averaged over 10 trials, all with the same r0. The residual-based weighting of W-GMRES(m) gives significantly faster convergence for this matrix with local eigenvectors.

    m     GMRES(m)   W-GMRES(m)   rand(.5,1.5)   rand(0,1)
    1     > 20000    2152         19451.0        4436.3
    2     > 20000    5162         12900.0        10429.8
    3     > 20000    6108         11384.1        9388.2
    6     12462      996          5275.8         4227.6
    10    5240       920          2414.0         1484.0
    15    3075       975          1783.5         1273.5
    20    2000       840          1560.0         1308.0

Table 3
For the discrete Laplacian on the unit square, the matrix-vector products (with A) required to converge to ‖rk‖2/‖r0‖2 ≤ 10⁻⁸ for GMRES(m), W-GMRES(m), and two random schemes: weights chosen uniformly in [0.5, 1.5] and [0, 1]. The random counts are averaged over 10 trials, all with the same r0. For smaller values of m, the residual-based weighting of W-GMRES(m) performs significantly worse than the random weighting for this matrix with global eigenvectors.

    m     GMRES(m)   W-GMRES(m)   rand(.5,1.5)   rand(0,1)
    1     > 20000    > 20000      1671.8         1429.5
    2     19082      13194        2946.6         3142.4
    3     12792      8172         3080.4         2797.5
    6     6426       4848         2801.4         2350.8
    10    3850       2790         2079.0         1949.0
    15    2670       2115         2148.0         1983.0
    20    2060       1580         1900.0         1662.0

differences for small restart parameters, m. There appears to be no advantage to correlating weighting to the residual components. (This example shows an interesting “tortoise and hare” effect [12], where the random weighting schemes with m = 1 converge in fewer iterations than the larger m values shown here.)

4. W-GMRES-DCT: Localizing eigenvectors with Fourier transforms. Equation (6) suggests that, for a diagonalizable matrix A = V ΛV⁻¹, the ideal weighting scheme should incorporate eigenvector information. It will be convenient to now express W in the form W = Z∗Z with Z = SV⁻¹ for S = diag(s1, . . . , sn), giving

    ‖rk‖W^2 = min_{ϕ∈Pk, ϕ(0)=1} Σ_{j=1}^n |ϕ(λj)|² |sjbj|².

Incorporating V⁻¹ into the weighting diagonalizes the problem and makes residual weighting compelling, but such an inner product is clearly impractical. However, we have seen that weighting can improve GMRES convergence if the eigenvectors are merely localized. This suggests a new weighting strategy that uses W = Z∗Z with

Z = SQ∗, S = diag(s1, . . . , sn),


Fig. 12. Eigenvector localization measures locp(Q∗AQ) for the coefficient matrices transformed by the DCT matrix Q∗ = dct(eye(n)). The transformation reduces the localization of the Add20 eigenvectors, while significantly enhancing localization for the three other matrices.

where Q is unitary and the weights s1, . . . , sn are derived from the residual just as before. Since GMRES on (A, b) in this W-inner product amounts to standard 2-norm GMRES on (ZAZ⁻¹, Zb) = (SQ∗AQS⁻¹, SQ∗b), we seek Q in which the eigenvectors of Q∗AQ are localized. When A comes from the discretization of a constant-coefficient differential operator, a natural choice for Q∗ is the discrete cosine transform matrix [51],⁶ since for many such problems the eigenvectors associated with the smallest-magnitude eigenvalues are low in frequency. We refer to the resulting algorithm as W-GMRES-DCT(m). In MATLAB notation, one could compute Q∗ = dct(eye(n)), a unitary but dense matrix that should never be formed; instead, it can be applied to vectors in O(n log n) time using the Fast Fourier Transform. In MATLAB code, the inner product becomes

〈x, y〉W = y∗QS∗SQ∗x = (SQ∗y)∗(SQ∗x) = (s.*dct(y))’*(s.*dct(x)),

where s is a column vector containing s1, . . . , sn. Alternatively, Q∗ could be the discrete Fourier transform matrix (fft(eye(n)), which would introduce complex arithmetic), a discrete wavelet transform, or any other transformation that localizes (i.e., sparsifies) the eigenvectors of A (informed by properties of the motivating application). Figure 12 shows how the DCT completely alters the localization measure locp defined in (4) for the four test matrices used in Figure 7.

Presuming Q is unitary, one can expedite the Arnoldi process by only storing the vectors Q∗vj, rather than the usual Arnoldi vectors vj. For example, in the W-inner product, the kth step of the conventional modified Gram–Schmidt Arnoldi process,

    v = Avk
    for j = 1, . . . , k
        hj,k = 〈v, vj〉W = vj∗ QS∗SQ∗ v
        v = v − hj,k vj
    end,

can be sped up by storing only the vectors uj := Q∗vj, giving

⁶ In Strang’s parlance [51], MATLAB’s dct routine gives a DCT-3, scaled so the matrix is unitary.


    u = Q∗Avk = Q∗AQ uk
    for j = 1, . . . , k
        hj,k = 〈v, vj〉W = uj∗ S∗S u
        u = u − hj,k uj
    end.

This latter formulation, which we use in our numerical experiments, removes all applications of Q from the innermost loop. It is mathematically equivalent to the conventional version and stable (using only unitary transformations), but over many cycles one can observe small drift in the numerical behavior of the two implementations.
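In MATLAB, one step of this transformed process might look as follows (a minimal sketch with illustrative names, not the implementation used for the experiments); it assumes A, the weight vector s, the current column count k, and a matrix U whose columns are u1 = dct(v1), . . . , uk = dct(vk) are available, and it appends the normalization that produces uk+1.

    u = dct(A * idct(U(:, k)));               % u = Q'*A*Q*u_k, i.e., the DCT of A*v_k
    for j = 1:k
        H(j, k) = U(:, j)' * ((s.^2) .* u);   % h_{j,k} = u_j' * S'*S * u
        u = u - H(j, k) * U(:, j);
    end
    H(k+1, k) = sqrt(real(u' * ((s.^2) .* u)));   % W-norm of the new direction
    U(:, k+1) = u / H(k+1, k);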

Example 12 (Sherman5, revisited with discrete cosine transform). Consider the Sherman5 matrix from Example 7, but now incorporate the DCT into the inner product. Compare the results in Figure 13 to the top plot in Figure 10; the DCT localizes the eigenvectors substantially, and now W-GMRES-DCT(30) slightly outperforms GMRES(30), while W-GMRES-DCT(100) converges in significantly fewer iterations than GMRES(100). The DCT approach does not localize the eigenvectors as effectively as for the diagonal matrix of eigenvalues (bottom plot in Figure 10), but it is much more practical.

Example 13 (Discrete Laplacian and convection–diffusion). In Example 8 we argued that W-GMRES(m) converged better than GMRES(m) because, although the eigenvectors of A are global, the weighted inner product disrupts the cyclic pattern into which GMRES(m) can settle. Now adding the DCT to the weighting additionally localizes the eigenvectors, and, as Figure 14 shows, convergence is further improved. Indeed, W-GMRES-DCT(20) requires roughly half as many iterations to converge as W-GMRES(20), and both of those are better than GMRES(20).

The eigenvectors of the discrete Laplacian are two-dimensional sinusoids, so the efficacy of the DCT is no surprise. The success of the method holds up when some nonsymmetry is added. Figure 15 compares GMRES(10), W-GMRES(10), and W-GMRES-DCT(10) for finite difference discretizations of the convection–diffusion operator Lu = −(uxx + uyy) + ux, again with Dirichlet boundary conditions and mesh size h = 1/100 (n = 99² = 9801). Due to the convection term ux the eigenvectors are no longer orthogonal, yet the DCT-enhanced weighting remains quite effective.

[Figure 13: relative residual norm ‖r_k‖_2/‖r_0‖_2 versus matrix-vector products with A, for GMRES(30), W-GMRES-DCT(30), GMRES(100), and W-GMRES-DCT(100).]

Fig. 13. GMRES(m) and W-GMRES-DCT(m) for the nonsymmetric Sherman5 matrix. The DCT helps to localize the eigenvectors, resulting in significantly better convergence than seen for W-GMRES(m) in the top plot of Figure 10.


[Figure 14: relative residual norm ‖r_k‖_2/‖r_0‖_2 versus matrix-vector products with A, for GMRES(10), W-GMRES-DCT(10), GMRES(20), and W-GMRES-DCT(20).]

Fig. 14. GMRES(m) and W-GMRES-DCT(m) for the discretization of the 2d Dirichlet Laplacian studied in Example 8. Again, the DCT localizes the eigenvectors, significantly improving the convergence of W-GMRES(m) shown in Figure 11.

[Figure 15: relative residual norm ‖r_k‖_2/‖r_0‖_2 versus matrix-vector products with A, for GMRES(10), W-GMRES(10), and W-GMRES-DCT(10).]

Fig. 15. GMRES(10), W-GMRES(10), and W-GMRES-DCT(10) for the finite difference discretization of a 2d convection–diffusion problem with Dirichlet boundary conditions.

5. Weighted GMRES-DR. To improve convergence, many modifications to restarted GMRES have been proposed; see, e.g., [3, 4, 11, 24, 42, 47, 54]. Deflated GMRES methods compute and use approximate eigenvectors in an attempt to remove some troublesome eigenvalues from the spectrum; see [1, 2, 8, 10, 13, 19, 20, 25, 31, 33, 38, 43] and references in [21]. Inner-product weighting has been incorporated into deflated GMRES in [30, 36]. Here we apply weighting to the efficient deflation method GMRES-DR [33].

As detailed in [33], GMRES-DR computes approximate eigenvectors while solving linear equations, using the eigenvectors to improve convergence of the linear equation solver. We show how to restart the method with a change of inner product; cf. [14, Prop. 1]. The end of a GMRES-DR cycle in the W-inner product gives the Arnoldi-like recurrence

(10)    A V_k = V_{k+1} H_{k+1,k},

where V_k is an n × k matrix whose columns span a subspace of approximate eigenvectors, V_{k+1} is the same but with an extra column appended, and H_{k+1,k} is a full (k + 1) × k matrix. Since this recurrence is generated with the W-inner product, the columns of V_{k+1} are W-orthogonal, V_{k+1}^* W V_{k+1} = I, and H_{k+1,k} = V_{k+1}^* W A V_k. We wish to convert (10) to a new inner product, say defined by a new weight matrix W̃. First, compute the Cholesky factorization

V_{k+1}^* W̃ V_{k+1} = R^*R,

where R is a (k + 1) × (k + 1) upper triangular matrix. Let R_k denote the leading k × k submatrix of R, and set

V_{k+1} := V_{k+1} R^{-1}    and    H_{k+1,k} := R H_{k+1,k} R_k^{-1}.

Then we have the new Arnoldi-like recurrence in the W̃-norm:

A V_k = V_{k+1} H_{k+1,k}

with V_{k+1}^* W̃ V_{k+1} = I and H_{k+1,k} = V_{k+1}^* W̃ A V_k. Building on this decomposition, W-GMRES-DR extends the approximation subspace via the Arnoldi process and computes from this subspace the iterate that minimizes the W̃-norm of the residual.
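In MATLAB, this change of inner product costs only a few operations on small matrices. The sketch below uses our notation: V is n × (k+1), H is (k+1) × k, and wnew is a vector holding the diagonal of the new weight matrix W̃.

    % Convert A*V(:,1:k) = V*H (V having W-orthonormal columns) to the new
    % diagonal inner product whose weights are stored in the vector wnew.
    T = V' * (wnew .* V);      % T = V'*Wnew*V  (wnew.*V scales the rows of V)
    R = chol(T);               % upper triangular Cholesky factor, T = R'*R
    V = V / R;                 % V <- V*inv(R)
    H = R * H / R(1:k,1:k);    % H <- R*H*inv(R_k)
    % A*V(:,1:k) = V*H still holds, and now V'*diag(wnew)*V = I.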

Example 14. We apply W-GMRES-DR to the matrix Add20 from Example 1. Figure 16 shows convergence for GMRES(20) and GMRES-DR(20,5), with and without weighting. Weighting improves convergence more than deflating eigenvalues; combining the two strategies in W-GMRES-DR improves convergence modestly further.

Example 15. For the matrix Sherman5, weighting did not help restarted GMRES in Example 7. Figure 17 adds GMRES-DR to the picture: deflating eigenvalues is very important, and now adding weighting to GMRES-DR gives convergence in 534 fewer matrix-vector products with A.

6. Weighted restarted Arnoldi algorithm. Sorensen's implicitly restarted Arnoldi method [49] is a leading technique for computing eigenvalues and eigenvectors of large nonsymmetric matrices. Here we will use a mathematically equivalent restarted Arnoldi method given in [35] and, for the symmetric case, in [1, 55]. (For an earlier method that is mathematically equivalent at the end of each cycle, see [32].)

[Figure 16: relative residual norm ‖r_k‖_2/‖r_0‖_2 versus matrix-vector products with A, for GMRES(20), GMRES-DR(20,5), W-GMRES(20), and W-GMRES-DR(20,5).]

Fig. 16. Comparison of restarted GMRES, with and without weighting and deflation, for Add20.


[Figure 17: relative residual norm ‖r_k‖_2/‖r_0‖_2 versus matrix-vector products with A, for GMRES(30), W-GMRES(30), GMRES-DR(30,5), and W-GMRES-DR(30,5).]

Fig. 17. Comparison of methods with and without weighting and deflation for Sherman5.

The restarted Arnoldi method from [35] is also equivalent to the eigenvalue portion of GMRES-DR, except that it uses regular Rayleigh–Ritz eigenvalue approximation, rather than the harmonic Rayleigh–Ritz estimates used in GMRES-DR. So we again have the Arnoldi-like recurrence (10), although the V_{k+1} and H_{k+1,k} matrices differ from those in GMRES-DR.

Zhong and Wu proposed a weighted thick-restart Arnoldi method with the weighting fixed for the entire run [57], while Saberi Najafi and Ghazvini proposed an explicitly restarted weighted Arnoldi method where the weighting depends on a Ritz vector residual at each cycle [45]. We show how to incorporate variable weighting in a thick-restart Arnoldi method, and point out that such a method can target many eigenvalues at the same time, since at the end of each cycle all the Ritz vector residuals are collinear.

Suppose we have just executed a cycle of length m in the W-norm, giving

A V_m = V_{m+1} H_{m+1,m},

where V_{m+1} = [V_m  v_{m+1}]. The Arnoldi algorithm then approximates eigenvalues of A with those of H_{m,m}, the m × m upper portion of H_{m+1,m}. Call these eigenpairs (θ_i, g_i) for i = 1, . . . , m. The corresponding Ritz eigenvector estimates are y_i := V_m g_i. How close is (θ_i, y_i) to being an eigenpair of A? Compute the residual

(11)    A y_i − θ_i y_i = A V_m g_i − θ_i V_m g_i
                        = V_m (H_{m,m} g_i − θ_i g_i) + h_{m+1,m} v_{m+1} e_m^* g_i
                        = h_{m+1,m} (g_i)_m v_{m+1},

where e_m denotes the last column of the m × m identity matrix and (g_i)_m is the mth entry of g_i ∈ C^m. Notice from (11) that all the residuals A y_i − θ_i y_i are parallel to v_{m+1}. This suggests restarting the Arnoldi process with a new inner product defined by a diagonal matrix W whose diagonal entries are derived from v_{m+1}, so this one weighting targets all eigenpairs being computed.
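This collinearity is easy to check numerically. The MATLAB sketch below uses our variable names, with V = V_{m+1} and H = H_{m+1,m} taken from a weighted Arnoldi cycle.

    % Verify (11): each Ritz residual A*y_i - theta_i*y_i is a multiple of v_{m+1}.
    m = size(H, 2);
    [G, Theta] = eig(H(1:m, 1:m));           % Ritz pairs (theta_i, g_i)
    for i = 1:m
        yi = V(:, 1:m) * G(:, i);            % Ritz vector y_i = V_m*g_i
        ri = A*yi - Theta(i,i)*yi;           % eigenpair residual
        ci = H(m+1, m) * G(m, i);            % predicted coefficient h_{m+1,m}*(g_i)_m
        fprintf('%2d: %.2e\n', i, norm(ri - ci*V(:, m+1)));  % should be near zero
    end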

W-Arnoldi(m, k): Weighted Restarted Arnoldi Algorithm

1. Start: Choose m, the maximum size of the subspace; k, the number of approximate eigenvectors that are retained from one cycle to the next; and numev, the desired number of eigenpairs. It is standard to keep more approximate eigenvectors than the desired number to help the convergence, so k > numev. Choose an initial vector w, and set the entries of the diagonal weighting matrix W equal to max(|w|/‖w‖_∞, 10^{-8}). Let v_1 be w scaled to be a unit vector in the W-norm.

2. Arnoldi iteration: Apply the weighted Arnoldi iteration from the current point to generate the Arnoldi-like recurrence A V_m = V_{m+1} H_{m+1,m}. The "current point" is either from v_1 (if it is the first cycle) or from v_{k+1} (for subsequent cycles). Here H_{m+1,m} is upper Hessenberg for the first cycle; subsequently, it is upper Hessenberg, aside from a full leading (k+1) × (k+1) portion. Note that V_{m+1}^* W V_{m+1} = I and H_{m+1,m} = V_{m+1}^* W A V_m.

3. Small eigenvalue problem: Compute the k appropriate eigenpairs (θ_i, g_i) of H_{m,m}, with the g_i vectors normalized (in the 2-norm on C^m). (One picks the k eigenvalues of H_{m,m} that most resemble those being sought, e.g., rightmost, largest magnitude, or smallest magnitude; all our examples seek the smallest magnitude eigenvalues.)

4. Check convergence: Compute residual norms and check convergence. (See below for a discussion of computing the residual norm.) If all desired numev eigenpairs have acceptable residual norm, then stop, first computing eigenvectors y_i = V_m g_i; otherwise continue. The next step begins the restart.

5. Orthonormalize the first k short vectors: Orthonormalize the g_i vectors, 1 ≤ i ≤ k, first separating into real and imaginary parts if complex, to form a real m × k matrix P_{m,k}. Both parts of complex vectors need to be included, so temporarily reduce k by one if necessary (or k can be increased by one).

6. Extend P_{m,k}: Extend p_1, . . . , p_k to length m + 1 by appending a zero to each, then set p_{k+1} = e_{m+1}, the (m + 1)st coordinate vector of length m + 1. Let P_{m+1,k+1} be the (m + 1) × (k + 1) matrix with the p_i vectors as columns.

7. Form the new H and V from the old H and V: To use later, save the last column of V_{m+1}: let w := v_{m+1}. Then update V_{k+1} ← V_{m+1} P_{m+1,k+1} and H_{k+1,k} ← P_{m+1,k+1}^* H_{m+1,m} P_{m,k}.

8. Re-orthogonalize v_{k+1}: Re-orthogonalize v_{k+1} in the W-inner product against the earlier columns of the new V_{k+1}.

9. Form the new weight matrix: Set the diagonal components of the weight matrix W equal to max(|w|/‖w‖_∞, 10^{-8}), for w defined in step 7 (see the MATLAB sketch after this algorithm).

10. Convert the new Arnoldi-like recurrence to the new inner product: Compute T = V_{k+1}^* W V_{k+1} and its Cholesky factorization T = R^*R, where R is a (k + 1) × (k + 1) upper triangular matrix. Let R_k denote the leading k × k portion of R. Next update V_{k+1} ← V_{k+1} R^{-1} and H_{k+1,k} ← R H_{k+1,k} R_k^{-1}, giving the new Arnoldi-like recurrence A V_k = V_{k+1} H_{k+1,k}. Note that V_{k+1}^* W V_{k+1} = I and H_{k+1,k} = V_{k+1}^* W A V_k. Go to step 2.
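For concreteness, steps 1 and 9 build the diagonal weight from a vector w as in the following MATLAB sketch (our variable names; only the diagonal dW is stored):

    % Diagonal weighting from the vector w (steps 1 and 9):
    % W = diag(max(|w|/||w||_inf, 1e-8)), stored as the vector dW.
    dW   = max(abs(w) / norm(w, inf), 1e-8);
    wip  = @(x, y) y' * (dW .* x);           % <x, y>_W = y'*W*x
    wnrm = @(x) sqrt(real(wip(x, x)));       % ||x||_W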

Full reorthogonalization is generally used with Arnoldi eigenvalue methods; see, e.g., [27, p. 50]. All experiments here use one step of such reorthogonalization in Step 2 of the algorithm. (In Step 3 of this implementation of Arnoldi, one should specify the 'nobalance' option to eig in older versions of MATLAB.) To monitor convergence, one naturally checks the norm of eigenpair residuals. However, the weighted Arnoldi algorithm leads to several different options for both the residual and the norm.

• The eigenvalue problem in Step 3 gives the approximate eigenpairs (θ_i, y_i), where y_i = V_m g_i, and residuals r_i = A y_i − θ_i y_i. Since y_i and v_{m+1} are W-norm unit vectors, the W-norm of the normalized residual r_i := (A y_i − θ_i y_i)/‖y_i‖_W can be computed via a shortcut that avoids the need to compute y_i = V_m g_i:

(12)    ‖r_i‖_W = ‖A y_i − θ_i y_i‖_W = h_{m+1,m} |(g_i)_m|.

• The eigenvalues θ_i of H_{m,m} are Rayleigh quotients of A with respect to the W-inner product (and so they fall in the field of values of A with respect to that inner product). Alternatively, one can use the approximate eigenvectors y_i to compute Rayleigh quotients ρ_i := y_i^* A y_i / ‖y_i‖_2^2 in the Euclidean inner product. One could then monitor the 2-norm of the residual r_i := (A y_i − ρ_i y_i)/‖y_i‖_2. To do so, one must compute y_i = V_m g_i.

• A cheaper alternative (that avoids forming A y_i) is to use the Ritz value from the W-inner product, but measure the residual in the 2-norm. Using (11), the (normalized) residual r_i := (A y_i − θ_i y_i)/‖y_i‖_2 has 2-norm

    ‖r_i‖_2 = h_{m+1,m} |(g_i)_m| ‖v_{m+1}‖_2 / ‖y_i‖_2.

This is the residual norm reported in our subsequent experiments. This formula does require formation of the Ritz vectors y_i = V_m g_i at the end of the cycle, at the cost of m × numev length-n vector operations (daxpys). To reduce this expense, the residual norms could be checked occasionally, but not after every cycle, or checked once the weighted residual norms (first option described above) have converged to a certain level. These computations are sketched below.
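The sketch below computes these quantities in MATLAB for one Ritz pair; V, H, and G come from the current weighted cycle, m is the cycle length, i indexes the Ritz pair of interest, and the variable names are ours.

    % Residual measures for the Ritz pair (theta_i, y_i), with g_i = G(:,i).
    rW  = H(m+1,m) * abs(G(m,i));              % (12): ||r_i||_W, essentially free
    yi  = V(:,1:m) * G(:,i);                   % Ritz vector (m daxpys)
    r2  = rW * norm(V(:,m+1)) / norm(yi);      % 2-norm residual via (11)
    rho = (yi' * (A*yi)) / (yi' * yi);         % Euclidean Rayleigh quotient
    r2e = norm(A*yi - rho*yi) / norm(yi);      % 2-norm residual using rho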

Example 16. The smallest five eigenvalues λ_1, . . . , λ_5 of Add20 are approximately

5.995 × 10^{-5},   6.007 × 10^{-5},   6.062 × 10^{-5},   6.110 × 10^{-5},   6.127 × 10^{-5}.

As they are not well separated, these eigenvalues are difficult to compute. We attempt to find the first three of them with Arnoldi(20,8), which generates subspaces of dimension m = 20 and saves k = 8 approximate eigenvectors at each restart. Figure 18 shows the convergence for λ_1 and λ_3, with and without weighting. As with solving linear equations, weighting significantly improves the method for this matrix.

[Figure 18: residual norm ‖r_i‖_2 for the eigenpair versus matrix-vector products with A, for Arnoldi(20,8) and W-Arnoldi(20,8), eigenvalues λ_1 and λ_3.]

Fig. 18. Arnoldi(20,8) and W-Arnoldi(20,8) performance for Add20, showing convergence of the residual norms for approximations to the first and third smallest magnitude eigenvalues.


[Figure 19: residual norm ‖r_5‖_2 for the fifth eigenpair versus matrix-vector products with A, for Arnoldi(20,8), W-Arnoldi(20,8), Arnoldi(40,8), and W-Arnoldi(40,8).]

Fig. 19. Results for computing the fifth smallest magnitude eigenvalue of the Orsirr 1 matrix with the standard and weighted Arnoldi methods. Weighting makes a particularly significant impact for the smaller subspace dimension.

Example 17. Now consider the matrix Orsirr 1 from Example 9. As with the Add20 matrix, the eigenvalue problem is more difficult than solving linear equations. First, Arnoldi(20,8) is applied with and without weighting, and we monitor convergence of the fifth smallest magnitude eigenvalue, λ_5; see Figure 19. Standard Arnoldi has trouble (it converges after 39,297 matrix-vector products); W-Arnoldi(20,8) performs much better. Arnoldi(40,8) and W-Arnoldi(40,8) give similar results: weighting is better, though by a less dramatic margin than for Arnoldi(20,8). As when solving linear equations, weighting tends to be most helpful for smaller subspaces.

As with linear equation solving, one might wonder if accentuating the weight might improve convergence. To test this idea, pick some power p ≥ 0, and in steps 1 and 9 of the algorithm set the diagonal weight matrix W to have (j, j) entry

max{ (|w_j| / ‖w‖_∞)^p, 10^{-8} }.
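In MATLAB this is a one-line change to the weight construction of steps 1 and 9 (a sketch, in our notation):

    % Accentuated weighting: p = 0 gives no weighting, p = 1 the standard
    % weighting, and p = 2 "double" weighting.
    dW = max((abs(w) / norm(w, inf)).^p, 1e-8);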

Example 18. Table 4 shows a more extensive investigation of whether weighting is typically helpful for computing eigenvalues, applying W-Arnoldi to 15 matrices. The first four have been considered earlier in this paper. The next 11 are from the Nonsymmetric Eigenvalue Problem collection in Matrix Market [6]. For each set in the collection one matrix is chosen, provided at least one of the Arnoldi(20,8) methods converged. We pick the smallest matrix from the set with n ≥ 1000, or the largest matrix if n < 1000 for all matrices. If there is more than one matrix of the selected size, we pick the tougher problem. In all cases we compute the smallest magnitude eigenvalues. The test of convergence for the ith eigenvalue is ‖r_i‖_2/‖A‖_1 < 10^{-8}. Arnoldi(20,8) is run without weighting (p = 0), with weighting (p = 1), and with double weighting (p = 2) until the five smallest magnitude eigenvalues have all converged. The second column of Table 4 records the value of loc_{10}(A), defined in equation (4), the level of local support of the eigenvectors. The next three columns show the number of cycles needed for convergence. Weighting improves Arnoldi for most of these matrices, often significantly. However, there are also problems for which the standard unweighted method is better. Weighting with p = 2 is sometimes better than p = 1, but generally not by a large margin. The localization of the eigenvectors, as measured by loc_{10}(A), appears to play a less significant role than it did for the W-GMRES method.


Table 4. Comparison of Arnoldi(20,8) without weighting, with weighting, and with double weighting for several matrices. The number of cycles is given for each of the three methods, averaged over 10 trials. In the last two rows, some (but not all) of the runs required more than 3000 cycles. For such cases, the parenthetical quantities show the number of trials that completed, followed by the average cycles only of those successful runs. For example, for Tols1090 with p = 1, only one of ten runs completed, and it took 289 cycles.

matrix       loc_{10}(A)   p = 0        p = 1        p = 2
Add20        0.9999        > 3000       603.1        345.2
Sherman5     0.2531        597.1        543.3        453.9
Laplacian    0.0712        58.7         51.5         47.7
Orsirr 1     0.4439        2008.1       631.3        576.5
Af23560      0.1702        1078.8       805.4        698.4
Rdb1250l     0.2117        201.3        172.5        168.5
Ck656        0.5872        98.7         67.4         65.1
Dw2048       0.4000        396.8        294.9        303.2
Olm1000      0.1925        640.7        572.9        622.1
Lop163       0.9964        56.8         51.5         48.7
Tub1000      0.3422        408.2        171.2        174.4
Pde2961      0.3233        52.7         48.3         47.6
Bwm2000      0.1073        1626.6       470.6        477.1
Cdde6        0.9974        (6) 1106.8   (3) 907.7    1059.7
Tols1090     0.9471        297.4        (1) 289.0    1583.3

Local support seems less important for eigenvalues than for linear equations, but we suggest another possible reason why weighting can help, a reason that applies to eigenvalue computations but not to solving linear equations. Increasing nonnormality is generally not helpful for linear equations, but can assist eigenvalue computations. If several eigenvectors are skewed toward the same direction, approximating any of them helps convergence toward the others; see the related discussion in [52, chap. 28] and [5, Fig. 2.1]. An example showing this effect is next.

Example 19. Here we demonstrate that weighting can skew eigenvectors in a way that assists convergence. While this example shows how this effect can be helpful, we do not claim it is the only mechanism responsible for the improved convergence of W-Arnoldi. Consider the matrix and initial vector

A = [1 2 2; 0 2 2; 0 0 3],        v_1 = [1.165; 0.631; 0.075].

For solving linear equations, GMRES(1) and W-GMRES(1) converge similarly. However, for finding one eigenvalue, W-Arnoldi(2,1) is much faster than Arnoldi(2,1), converging in 7 cycles instead of 25. W-Arnoldi(2,1) makes no progress in the first four cycles, but drops three orders of magnitude in cycle 5. After cycle 4, the approximate eigenvector is y_1 ≈ [0.994, 0.114, 0.0000238]^T, so the last component is accurate, but the second is too large. The residual vector is r_1 ≈ [−0.0305, 0.0843, 0.0000414]^T, and the diagonal of the weighting matrix W = S^*S is a scaled version of this. The matrix SAS^{-1} (discussed in section 3) and its eigenvectors are

SAS^{-1} ≈ [1  1.204  54.33;  0  2  90.27;  0  0  3],

[1; 0; 0],     [0.7692; 0.6391; 0],     [0.6700; 0.7423; 0.0082].

This transformed matrix has a much larger departure from normality than A, and its eigenvectors are skewed to be large in only the first two components. This leads to the needed improvement in the second component of the approximate eigenvector. The Ritz values for cycle 5 are 1.00006 and 2.0008, so the method is able to focus on the first two components and remove much of the second component from the desired approximate eigenvector. The added departure from normality improves convergence.
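The transformed matrix above can be reproduced (up to the rounding of the printed residual) with a few lines of MATLAB; forming S from the residual as in step 9 of W-Arnoldi is our reading of the example.

    A  = [1 2 2; 0 2 2; 0 0 3];
    r1 = [-0.0305; 0.0843; 0.0000414];        % residual after cycle 4 (from the text)
    dW = max(abs(r1) / norm(r1, inf), 1e-8);  % diagonal of W = S'*S
    S  = diag(sqrt(dW));
    SAS = S * A / S;                          % the transformed matrix S*A*inv(S)
    [X, D] = eig(SAS);                        % its strongly skewed eigenvectors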

7. Conclusions. Since its introduction by Essai [14], the Weighted GMRES algorithm has been an intriguing method whose adoption has been impeded by an incomplete understanding of its convergence properties. In this manuscript we have argued, through some simple analysis and numerous experiments, that its advantage is due to (1) its ability to break repetitive cycles in restarted GMRES, and (2) its ability to target small-magnitude eigenvalues that slow GMRES convergence, provided the corresponding eigenvectors are localized. In cases where those vectors are not localized, the proposed W-GMRES-DCT algorithm, which combines residual weighting with a fast transform to sparsify eigenvectors, can significantly improve convergence.

Weighted inner products can also be applied to deflated GMRES methods, such as GMRES-DR. While the improvement appears to be less significant than for standard GMRES, weighting can sometimes reduce the number of matrix-vector products even further than deflation alone or weighting alone. We also show how weighting can be incorporated into a thick-restart Arnoldi method for eigenvalue computations, where residual-based weighting can improve all eigenvector estimates simultaneously. Here it appears that in some situations the tendency of weighting to increase nonnormality can improve convergence.

Many topics related to weighted inner products merit further investigation, such as additional sparsifying transformations or more general non-diagonal weighting matrices, and the potential of using weighting to avoid near-breakdown cases in the nonsymmetric Lanczos method [34].

Acknowledgments. We thank Jörg Liesen, Jennifer Pestana, and Andy Wathen for helpful conversations about this work. ME is grateful for the support of the Einstein Stiftung Berlin, which allowed him to visit the Technical University of Berlin while working on this project. RBM appreciates the support of the Baylor University Sabbatical Program.

REFERENCES

[1] A. M. Abdel-Rehim, R. B. Morgan, D. A. Nicely, and W. Wilcox, Deflated and restarted symmetric Lanczos methods for eigenvalues and linear equations with multiple right-hand sides, SIAM J. Sci. Comput., 32 (2010), pp. 129–149.
[2] J. Baglama, D. Calvetti, G. H. Golub, and L. Reichel, Adaptively preconditioned GMRES algorithms, SIAM J. Sci. Comput., 20 (1998), pp. 243–269.
[3] A. H. Baker, E. R. Jessup, and T. V. Kolev, A simple strategy for varying the restart parameter in GMRES(m), J. Comput. Appl. Math., 230 (2009), pp. 751–761.
[4] A. H. Baker, E. R. Jessup, and T. Manteuffel, A technique for accelerating the convergence of restarted GMRES, SIAM J. Matrix Anal. Appl., 26 (2005), pp. 962–984.
[5] C. Beattie, M. Embree, and J. Rossi, Convergence of restarted Krylov subspaces to invariant subspaces, SIAM J. Matrix Anal. Appl., 25 (2004), pp. 1074–1109.
[6] R. Boisvert, R. Pozo, K. Remington, B. Miller, and R. Lipman, Matrix Market, 2004. Web site: http://math.nist.gov/MatrixMarket.
[7] J. H. Bramble and J. E. Pasciak, A preconditioning technique for indefinite systems resulting from mixed approximations of elliptic problems, Math. Comp., 50 (1988), pp. 1–17.
[8] K. Burrage and J. Erhel, On the performance of various adaptive preconditioned GMRES strategies, Numer. Linear Algebra Appl., 5 (1998), pp. 101–121.
[9] Z. Cao and X. Yu, A note on weighted FOM and GMRES for solving nonsymmetric linear systems, Appl. Math. Comput., 151 (2004), pp. 719–727.
[10] A. Chapman and Y. Saad, Deflated and augmented Krylov subspace techniques, Numer. Linear Algebra Appl., 4 (1997), pp. 43–66.
[11] E. De Sturler, Truncation strategies for optimal Krylov subspace methods, SIAM J. Numer. Anal., 36 (1999), pp. 864–889.
[12] M. Embree, The tortoise and the hare restart GMRES, SIAM Review, 45 (2003), pp. 259–266.
[13] J. Erhel, K. Burrage, and B. Pohl, Restarted GMRES preconditioned by deflation, J. Comp. Appl. Math., 69 (1996), pp. 303–318.
[14] A. Essai, Weighted FOM and GMRES for solving nonsymmetric linear systems, Numer. Algorithms, 18 (1998), pp. 277–292.
[15] V. Faber, W. Joubert, E. Knill, and T. A. Manteuffel, Minimal residual method stronger than polynomial preconditioning, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 707–729.
[16] V. Faber, J. Liesen, and P. Tichy, The Faber–Manteuffel theorem for linear operators, SIAM J. Numer. Anal., 46 (2008), pp. 1323–1337.
[17] V. Faber and T. A. Manteuffel, Necessary and sufficient conditions for the existence of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 352–362.
[18] B. Fischer, A. Ramage, D. J. Silvester, and A. J. Wathen, Minimum residual methods for augmented systems, BIT, 38 (1998), pp. 527–543.
[19] J. Frank and C. Vuik, On the construction of deflation-based preconditioners, SIAM J. Sci. Statist. Comput., 23 (2001), pp. 442–462.
[20] L. Giraud, S. Gratton, X. Pinel, and X. Vasseur, Numerical experiments on a flexible variant of GMRES-DR, Proc. Appl. Math. Mech., 7 (2008), pp. 1020701–1020702.
[21] M. H. Gutknecht, Spectral deflation in Krylov solvers: a theory of coordinate space based methods, Elect. Trans. Numer. Anal., 39 (2012), pp. 156–185.
[22] M. H. Gutknecht and D. Loher, Preconditioning by similarity transformations: another valid option? Abstract, GAMM Workshop on Numerical Linear Algebra, 2001.
[23] S. Guttel and J. Pestana, Some observations on weighted GMRES, Numer. Alg., 67 (2014), pp. 733–752.
[24] W. Joubert, On the convergence behavior of the restarted GMRES algorithm for solving nonsymmetric linear systems, Numer. Linear Algebra Appl., 1 (1994), pp. 427–447.
[25] S. A. Kharchenko and A. Y. Yeremin, Eigenvalue translation based preconditioners for the GMRES(k) method, Numer. Linear Algebra Appl., 2 (1995), pp. 51–77.
[26] A. Klawonn, Block-triangular preconditioners for saddle point problems with a penalty term, SIAM J. Sci. Comput., 19 (1998), pp. 172–184.
[27] R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[28] J. Liesen and B. N. Parlett, On nonsymmetric saddle point matrices that allow conjugate gradient iterations, Numer. Math., 108 (2008), pp. 605–624.
[29] J. Liesen and Z. Strakos, Krylov Subspace Methods: Principles and Analysis, Oxford University Press, Oxford, 2013.
[30] B. Meng, A weighted GMRES method augmented with eigenvectors, Far East J. Math. Sci., 34 (2009), pp. 165–176.
[31] R. B. Morgan, A restarted GMRES method augmented with eigenvectors, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1154–1171.
[32] R. B. Morgan, On restarting the Arnoldi method for large nonsymmetric eigenvalue problems, Math. Comp., 65 (1996), pp. 1213–1230.
[33] R. B. Morgan, GMRES with deflated restarting, SIAM J. Sci. Comput., 24 (2002), pp. 20–37.
[34] R. B. Morgan and D. A. Nicely, Restarting the nonsymmetric Lanczos algorithm for eigenvalues and linear equations including multiple right-hand sides, SIAM J. Sci. Comput., 33 (2011), pp. 3037–3056.
[35] R. B. Morgan and M. Zeng, A harmonic restarted Arnoldi algorithm for calculating eigenvalues and determining multiplicity, Linear Algebra Appl., 415 (2006), pp. 96–113.
[36] Q. Niu, L. Lu, and J. Zhou, Accelerate weighted GMRES by augmenting error approximations, Inter. J. Comput. Math., 87 (2010), pp. 2101–2112.
[37] C. C. Paige, M. Rozloznik, and Z. Strakos, Modified Gram–Schmidt (MGS), least squares, and backward stability of MGS-GMRES, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 264–284.
[38] M. L. Parks, E. de Sturler, G. Mackey, D. D. Johnson, and S. Maiti, Recycling Krylov subspaces for sequences of linear systems, SIAM J. Sci. Comput., 28 (2006), pp. 1651–1674.
[39] J. Pestana and A. J. Wathen, On choice of preconditioner for minimum residual methods for non-Hermitian matrices, J. Comp. Appl. Math., 249 (2013), pp. 57–68.
[40] M. Rozloznik, M. Tuma, A. Smoktunowicz, and J. Kopal, Numerical stability of orthogonalization methods with a non-standard inner product, BIT, 52 (2012), pp. 1035–1058.
[41] Y. Saad, Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices, Linear Algebra Appl., 34 (1980), pp. 269–295.
[42] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Statist. Comput., 14 (1993), pp. 461–469.
[43] Y. Saad, Analysis of augmented Krylov subspace techniques, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 435–449.
[44] Y. Saad and M. H. Schultz, GMRES: a generalized minimum residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856–869.
[45] H. Saberi Najafi and H. Ghazvini, Weighted restarting method in the weighted Arnoldi algorithm for computing eigenvalues of a nonsymmetric matrix, Appl. Math. Comp., 175 (2006), pp. 1276–1287.
[46] H. Saberi Najafi and H. Zareamoghaddam, A new computational GMRES method, Appl. Math. Comput., 199 (2008), pp. 527–534.
[47] V. Simoncini, A new variant of restarted GMRES, Numer. Linear Algebra Appl., 6 (1999), pp. 61–77.
[48] G. D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference Methods, Oxford University Press, Oxford, third ed., 1985.
[49] D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 357–385.
[50] M. Stoll and A. Wathen, Combination preconditioning and the Bramble–Pasciak+ preconditioner, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 582–608.
[51] G. Strang, The discrete cosine transform, SIAM Review, 41 (1999), pp. 135–147.
[52] L. N. Trefethen and M. Embree, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, Princeton, NJ, 2005.
[53] H. A. van der Vorst and C. Vuik, The superlinear convergence of GMRES, J. Comp. Appl. Math., 48 (1993), pp. 327–341.
[54] H. A. van der Vorst and C. Vuik, GMRESR: A family of nested GMRES methods, Numer. Linear Algebra Appl., 1 (1994), pp. 369–386.
[55] K. Wu and H. Simon, Thick-restart Lanczos method for symmetric eigenvalue problems, SIAM J. Matrix Anal. Appl., 22 (2000), pp. 602–616.
[56] B. Zhong and R. B. Morgan, Complementary cycles of restarted GMRES, Numer. Linear Algebra Appl., 15 (2008), pp. 559–571.
[57] H. Zhong and G. Wu, Thick restarting the weighted harmonic Arnoldi algorithm for large interior eigenproblems, Inter. J. Comp. Math., 88 (2011), pp. 994–1012.