Alma Mater Studiorum · Università di Bologna
SCUOLA DI SCIENZE
Corso di Laurea in Informatica

WEIGHTED GEOMETRIC MEAN OF LARGE-SCALE MATRICES:
NUMERICAL ANALYSIS AND ALGORITHMS

Relatore: Chiar.mo Prof. Valeria Simoncini
Correlatore: Chiar.mo Prof. Bruno Iannazzo
Laureando: Massimiliano Fasi

Sessione III
Anno Accademico 2013–2014


Table of contents

Table of contents
List of Illustrations
Abstract
Sommario

1 Preliminaries
  1.1 Linear algebra notation
  1.2 Positive and non-negative definite matrices
  1.3 Numerical integration
      1.3.1 Gauss–Jacobi quadrature
  1.4 Gram–Schmidt orthogonalization

2 Matrix weighted geometric mean
  2.1 Definition
  2.2 Properties

3 Quadrature formulae
  3.1 The Cauchy formula
  3.2 Gauss–Jacobi quadrature formulae
  3.3 Contour integrals

4 Krylov subspace methods
  4.1 Polynomial Krylov methods
  4.2 Extended Krylov methods
  4.3 Rational Krylov methods

5 Experimental results
  5.1 Implementation details
      5.1.1 Computing the weighted geometric mean of matrices


      5.1.2 Test matrices
  5.2 Quadrature methods
  5.3 Krylov subspace methods

6 Conclusion

A MATLAB implementations
  A.1 Polynomial Krylov method
  A.2 Extended Krylov method

Bibliography


List of Illustrations

Algorithms
  1.1 Classical Gram–Schmidt procedure
  1.2 Modified Gram–Schmidt procedure
  4.1 Arnoldi algorithm for the computation of f(Z)b
  4.2 Extended Arnoldi algorithm for the computation of f(Z)b

Figures
  4.1 Incremental update of Hi
  4.2 Block structure of Ti
  5.1 Convergence profiles – quadrature formulae
  5.2 Convergence profiles – Krylov methods I
  5.3 Convergence profiles – Krylov methods II

Tables
  5.1 Implemented methods
  5.2 Test matrices
  5.3 Execution time – quadrature formulae
  5.4 Execution time – Krylov methods


Abstract

Computing the weighted geometric mean of large sparse matrices is an operation that tends to become rapidly intractable as the size of the matrices involved grows. However, if we are not interested in the matrix function itself, but just in its product with a vector, the problem becomes simpler and there is a chance to solve it even when the matrix mean itself would be impossible to compute. Our interest is motivated by the fact that this calculation has practical applications, related to the preconditioning of some operators arising in the domain decomposition of elliptic problems.

In this thesis, we explore how such a computation can be performed efficiently. First, we exploit the properties of the weighted geometric mean and find several equivalent ways to express it through real powers of a matrix. We then focus our attention on matrix powers and examine how well-known techniques can be adapted to the problem at hand. In particular, we consider two broad families of approaches for the computation of f(A)v, namely quadrature formulae and Krylov subspace methods, and generalize them to the pencil case f(A^{−1}B)v.

Finally, we provide an extensive experimental evaluation of the proposed algorithms and also try to assess how convergence speed and execution time are influenced by some characteristics of the input matrices. Our results suggest that a few factors have a clear bearing on the performance and that, although there is no best choice in general, knowing the conditioning and the sparsity of the arguments beforehand can considerably help in choosing the best strategy to tackle the problem.


Sommario

Computing the weighted geometric mean of sparse matrices is a problem that quickly becomes intractable as the size of the matrices involved grows. However, when one is interested not in the mean itself but only in the product of that quantity with a vector, the problem simplifies considerably, and it becomes possible to consider matrices that would otherwise be out of reach. The interest in this operation is motivated by some practical applications. For instance, it can be useful in the preconditioning of certain operators arising in domain decomposition for elliptic problems, such as the biharmonic operator and the Steklov–Poincaré operator.

In this thesis a twofold approach has been followed. On the one hand, we exploit the properties of the weighted geometric mean to express the problem in terms of matrix powers with real exponents. On the other hand, we examine known methods for the computation of f(A)v and adapt them to the weighted geometric mean. In particular, we generalize quadrature formulae and Krylov subspace methods to the computation of the product f(A^{−1}B)v.

Finally, we provide an experimental analysis of the performance of the presented algorithms, both in terms of convergence speed and in terms of actual execution time. Our experiments, aimed at assessing the impact of the input matrices on the accuracy of the solution and on the performance of the algorithms, suggest that, although no strategy is best in general, a choice based on knowledge of the conditioning and sparsity of the matrices involved yields the best performance.


Chapter 1

Preliminaries

1.1 Linear algebra notation

In order to establish the notation used throughout the thesis, we provide here a short reference section for the convenience of the reader.

Let us denote by R and C the sets of real and complex scalars, respectively. We extend this notation to matrices, so that R^{m×n} indicates the space of real matrices with m rows and n columns, and C^{m×n} the space of complex matrices with the same dimensions. When n = 1 we drop the superscript and use the shorter notation R^m for the real vector space of dimension m, and similarly C^m for the complex one. Let us stress that, unless otherwise specified, we consider column vectors.

To denote the elements of these sets, we use lower-case italic Latin or Greek letters for scalars, lower-case bold Latin letters for vectors in R^m and C^m, and upper-case bold Latin letters for matrices. We use the notation zi,j to denote the element at row i and column j of Z ∈ C^{m×n}, implicitly assuming 1 ≤ i ≤ m and 1 ≤ j ≤ n.

When dealing with matrices, two very common unary operations are transposition and conjugate transposition. The transpose of a matrix Z ∈ C^{n×n} is the matrix X = Zᵀ such that xi,j = zj,i, whereas its conjugate transpose, or adjoint, is Y = Z∗ such that yi,j is the complex conjugate of zj,i. We call Z symmetric if Zᵀ = Z and Hermitian if Z∗ = Z. For the sake of brevity, we denote the set of Hermitian matrices of size n by Hn.

The identity matrix, that is, the diagonal matrix D ∈ C^{n×n} such that di,j = 1 if i = j and di,j = 0 otherwise, for 1 ≤ i, j ≤ n, is denoted by In. A square matrix Z ∈ C^{n×n} is invertible if there exists a matrix Z^{−1} such that Z^{−1}Z = ZZ^{−1} = In. The set of invertible matrices of size n × n is a subset of C^{n×n} usually denoted by GL(n). Two important classes of invertible matrices are unitary and orthogonal matrices.


A matrix Q ∈ C^{n×n} is unitary if Q∗Q = QQ∗ = In, and orthogonal when QᵀQ = QQᵀ = In. Clearly, if Q ∈ R^{n×n}, then it is orthogonal if and only if it is unitary, since Qᵀ = Q∗ in that case.

A matrix Z ∈ C^{n×n} is normal if Z∗Z = ZZ∗. Let us note that Hermitian and unitary matrices are normal, since they trivially satisfy the definition. Moreover, it is easy to observe that, if we restrict ourselves to the real case, then symmetric and orthogonal matrices are also normal.

Let Z ∈ C^{n×n} be a square matrix. We use λi, for 1 ≤ i ≤ n, to indicate the eigenvalues of Z, i.e. the scalars such that Zb = λi b for some eigenvector b ∈ C^n \ {0}. The set of all the distinct eigenvalues of Z, usually called the spectrum of Z, is denoted by σ(Z).

1.2 Positive and non-negative definite matrices

Positive definite matrices are well known and widely used in a variety of scientific fields, mainly because of the nice properties they enjoy. These matrices arise in a broad range of applications: discretization matrices for partial differential equations in engineering and physics, covariance matrices in statistics, elements of the search space in convex and semidefinite programming, kernels in machine learning, density matrices in quantum information, data points in radar imaging and diffusion tensors in medical imaging [51]. In this section we define them, establish some notation and give a few useful characterizations.

A matrix Z ∈ Hn is Hermitian positive semidefinite if for any x ∈ C^n \ {0} it holds that

    x∗Z x ≥ 0,

while it is Hermitian positive definite when the inequality holds strictly, i.e. when

    x∗Z x > 0.

We denote the set of Hermitian positive definite matrices of size n × n by Pn and adopt the symbol P⁰n for the semidefinite case. Clearly we have that Pn ⊂ P⁰n, and it follows immediately from the definitions that a Hermitian positive semidefinite matrix is positive definite if and only if it is invertible.

For the sake of brevity, we often drop the word Hermitian, so that positive semidefinite stands for Hermitian positive semidefinite and positive definite for Hermitian positive definite.

Positive definite matrices can be characterized in several ways. We present here just a few conditions which are equivalent to the above definition, referring the reader to the comprehensive work of Bhatia [35, Sec. 1.1] for a larger list.

Theorem 1. Let Z ∈ Cn×n be Hermitian. Then the following are equivalent:


i. Z ∈ Pn;

ii. σ (Z) ⊂ R+;

iii. there exists B ∈ GL (n) such that B∗B = Z.

Let us conclude by pointing out that it is possible to define on Pn a partial order relation, which is especially used in operator theory and can be of use when dealing with monotonicity properties of matrix operators. For any pair X, Y ∈ Pn we say that X ≤ Y if for all vectors z ∈ C^n we have that z∗Xz ≤ z∗Y z.
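As a quick illustration of these characterizations, the MATLAB sketch below (illustration only; the matrix is generated solely for the example) builds a Hermitian positive definite matrix in the form B∗B + In and checks conditions (ii) and (iii) of Theorem 1 numerically, using the fact that a Cholesky factorization exists exactly when the matrix is positive definite.

% Sanity check of Theorem 1 on a randomly generated HPD matrix (illustration only).
n = 6;
B = randn(n) + 1i*randn(n);
Z = B'*B + eye(n);          % Hermitian positive definite by construction (iii)
disp(all(eig(Z) > 0));      % (ii): the spectrum lies in the positive reals
[~, p] = chol(Z);           % Cholesky succeeds (p == 0) iff Z is positive definite
disp(p == 0);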

1.3 Numerical integration

In this section, we briefly recall some notions regarding the numerical evaluation of definite integrals over a bounded interval. In particular, we consider the so-called approximate quadrature techniques, illustrate how weighted Gauss formulae can be derived and present in some detail the Gauss–Jacobi quadrature method.

Let a, b ∈ R and let f : [a, b] → R be integrable on the closed interval [a, b]. We can approximate the integral of f between a and b using a linear combination of values of the integrand,

    ∫_a^b f(x) dx ≈ Σ_{i=1}^p w_i f(x_i),    (1.1)

for any p ∈ N [13, Chap. 1]. The points x_1, . . . , x_p, usually chosen inside the interval of integration, are called nodes, whereas the (positive) coefficients w_1, . . . , w_p are called weights.

A natural question about these approximations is whether there exists a class C of functions for which equation (1.1) is exact, i.e. such that for any f ∈ C it holds that

    ∫_a^b f(x) dx − Σ_{i=1}^p w_i f(x_i) = 0.

It can be easily shown [12, Sect. 4.3] that, if we assume the x_i to be fixed, then there exists a choice of the w_i such that the quadrature formula is exact for any polynomial in R_{p−1}[x], that is, the space of polynomials of real variable with degree at most p − 1. This restriction makes sense when dealing with tabulated functions, whose values are known only at a fixed set of predetermined points. However, if we remove this limitation, then it can be shown that with a suitable choice of nodes


and weights, the formula (1.1) becomes exact for any polynomial in R_{2p−1}[x], since the free parameters we can choose are now 2p [12, Sect. 4.3].

To build such quadrature rules, let us consider two functions f, g : [a, b] → C and their inner product with respect to a positive weight function w : R → R⁺, defined as

    ⟨f, g⟩ = ∫_a^b w(x) f(x) g(x) dx.

Two functions f and g with f ≠ g are said to be orthogonal with respect to the weight w(x) over the interval [a, b] if

    ⟨f, g⟩ = ∫_a^b w(x) f(x) g(x) dx = 0.

It is well known [7] that for any weight function there exists a sequence of polynomials {p_i}_{i∈N} such that each polynomial p_i is exactly of degree i and ⟨p_i, p_j⟩ = 0 for i ≠ j. Moreover, each of the p_i can be normalized, i.e. multiplied by an appropriate constant so that ⟨p_i, p_j⟩ = 1 for i = j and ⟨p_i, p_j⟩ = 0 otherwise. A sequence {p_i}_{i∈N} of orthogonal normalized polynomials is called orthonormal. For a thorough introduction to the theoretical aspects of orthogonal (and orthonormal) polynomials we refer the interested reader to the classical book by Szegő [2].

Let us now denote by θ_i the leading coefficient of p_i, i.e. the coefficient of x^i, which we can assume without restriction to be positive. The following classical result [13, Sect. 2.7] gives a simple rule to obtain the weights and nodes of a quadrature rule, given the integration interval and the weight function.

Theorem 2 (Davis, Rabinowitz). Let w : [a, b] → R be an admissible weight function and let {p_i}_{i∈N} be the sequence of polynomials orthonormal on [a, b] with respect to w. Let the zeros of p_i be x_1, . . . , x_i, ordered so that

    a < x_1 < · · · < x_j < · · · < x_i < b,

and let the weights, for 1 ≤ j ≤ i, be defined by

    w_j = − (θ_{i+1} / θ_i) · 1 / (p_{i+1}(x_j) p′_i(x_j)).

Then for any polynomial p ∈ R_{2i−1}[x] it holds that

    ∫_a^b w(x) p(x) dx = Σ_{j=1}^i w_j p(x_j).

When nodes and weights are determined as in Theorem 2, the resulting integration rule is said to be of Gauss type. It is customary to add the name of the family of orthogonal polynomials after the name of Gauss, so that it is common to find expressions like Gauss–Chebyshev, Gauss–Jacobi, Gauss–Gegenbauer, and so on.

The most common weight functions correspond to well-established families of orthogonal polynomials, for which simple expressions to determine the values of θ_i as well as the roots of p_i for any i are known [9, Chap. 22]. For less common choices of weight functions, there exist methods to derive such data [7].

The coefficients of the orthogonal polynomials we are discussing usually grow very quickly, which makes the computation of their roots ill-conditioned even for small values of i. In order to cope with these numerical issues, a great variety of methods have been developed. The oldest technique we are aware of is the so-called Golub–Welsch algorithm, which exploits the fact that for any sequence {p_i}_{i∈N} of orthogonal polynomials there exists a three-term recurrence of the form

    p_j(x) = (x − β_j) p_{j−1}(x) − γ_j p_{j−2}(x),   j = 2, 3, 4, . . .
    p_1(x) = (x − β_1) p_0(x),
    p_0(x) = 1,
                                                       (1.2)

where β_j and γ_j can be found for any j by simply imposing the orthogonality condition [6, Sect. 1.2]. Golub and Welsch [7] show how to use the coefficients of the recursion (1.2) to build a symmetric tridiagonal matrix whose eigenvalues are the quadrature nodes, while the weights are obtained from the first components of the corresponding normalized eigenvectors. In addition, they give a method to find both nodes and weights with a computational cost which is quadratic in the number of nodes. Another approach with the same complexity is the one developed by Petras [26], who derives from the recurrence (1.2) a Newton iteration that converges to the roots of the orthogonal polynomials. An improved version of this method can be found in the work of Glaser, Liu and Rokhlin [36], where a Newton-based algorithm with linear complexity is developed.

For our implementations, we use another linear procedure, also based on the Newton iteration, developed by Hale and Townsend [50], whose efficient implementation is readily available as part of the Chebfun package [52].
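To make the Golub–Welsch idea concrete, the following MATLAB sketch handles the simplest case, the Legendre weight w(x) = 1 on [−1, 1] (α = β = 0), for which the recurrence coefficients of (1.2) are known in closed form. It is only meant to show the eigenvalue mechanism and is not the routine used in our experiments.

% Golub-Welsch for Gauss-Legendre nodes and weights (illustration only).
function [x, w] = gauss_legendre_gw(p)
  j   = 1:p-1;
  off = j ./ sqrt(4*j.^2 - 1);        % off-diagonal recurrence coefficients
  J   = diag(off, 1) + diag(off, -1); % symmetric tridiagonal Jacobi matrix
  [V, D]   = eig(J);
  [x, idx] = sort(diag(D));           % nodes are the eigenvalues
  w = 2 * V(1, idx)'.^2;              % weights from the first eigenvector components
end

For instance, [x, w] = gauss_legendre_gw(5) integrates polynomials of degree up to 9 exactly on [−1, 1].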

1.3.1 Gauss–Jacobi quadrature

Gauss–Jacobi integration relies on the so-called Jacobi polynomials, which are orthogonal with respect to the weight function w(x) = (1 − x)^α (1 + x)^β over [−1, 1], provided that α, β > −1.

Given the two parameters α and β, there exist several equivalent ways to define the Jacobi polynomial of degree p, which we denote by J_p^{(α,β)}. Since we do not actually need any of these representations, we do not report them here, but refer the reader to the extended discussion of Szegő [2, Chap. IV].

The Gauss–Jacobi quadrature formula we are interested in is thus of the form [12, Sec. 4.8-1]

    ∫_{−1}^{1} (1 − x)^α (1 + x)^β f(x) dx ≈ Σ_{i=1}^p w_i f(x_i),

where x_1, . . . , x_p are the roots of the Jacobi polynomial of order p and, according to Theorem 2, the weights are

    w_i = − (2p + α + β + 2)/(p + α + β + 1) · [Γ(p + α + 1) Γ(p + β + 1)] / [Γ(p + α + β + 1) (p + 1)!] · 2^{α+β} / [J′_p^{(α,β)}(x_i) J_{p+1}^{(α,β)}(x_i)],

where Γ is the Euler Gamma function [3]. The derivative of J_p^{(α,β)}, required to compute w_i, is easy to obtain and can be expressed by means of a Jacobi polynomial of smaller degree [2, Eq. (4.21.7)]:

    d/dz J_p^{(α,β)}(z) = (p + α + β + 1)/2 · J_{p−1}^{(α+1,β+1)}(z).
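In practice we do not build these quantities by hand: assuming Chebfun is on the MATLAB path, its jacpts routine returns nodes and weights directly. The snippet below is a usage sketch on a scalar integrand; the values of ρ, p and the integrand are arbitrary choices made only for the example.

% Gauss-Jacobi nodes and weights via Chebfun's jacpts (usage sketch).
rho   = 0.3;                     % exponent appearing later in Z^rho
alpha = rho - 1;  beta = -rho;   % Jacobi parameters used in Section 3.2
p     = 20;
[x, w] = jacpts(p, alpha, beta); % nodes in (-1, 1) and corresponding weights
f = @(s) 1 ./ (3 - s);           % a smooth test integrand
approx = w * f(x);               % approximates int_{-1}^{1} (1-s)^alpha (1+s)^beta f(s) ds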

1.4 Gram–Schmidt orthogonalization

We briefly present a well-known method to compute an orthonormal basis of the vector space spanned by a set of linearly independent vectors. Then we adapt it to the standard situation arising when implementing Krylov subspace methods, i.e. orthonormalizing a few new vectors with respect to an already orthonormal set. The picture is complicated by the fact that weaker assumptions must be allowed, since we have no guarantee that the new vectors are not linear combinations of vectors of the original set.

Thus, let Q = {q_1, . . . , q_m}, with q_i ∈ C^n for 1 ≤ i ≤ m. The set Q is linearly independent if the null vector (0, . . . , 0)ᵀ ∈ C^n cannot be written as a non-trivial linear combination of its elements, that is, there do not exist α_1, . . . , α_m ∈ C, not all zero, such that

    0 = Σ_{i=1}^m α_i q_i.    (1.3)

Clearly, for m > 1, this is equivalent to saying that none of the q_i can be written as a linear combination of the other elements of Q.

Let us now define the inner product ⟨x, y⟩M = x∗My, for x, y ∈ C^n and M ∈ Pn, and its induced norm ‖x‖M = √⟨x, x⟩M.


Algorithm 1.1: Classical Gram–Schmidt procedure
Input: q_1, . . . , q_m ∈ C^n, M ∈ Pn
Output: v_1, . . . , v_m ∈ C^n M-orthonormal, or an error

 1  v_1 ← q_1 / ‖q_1‖M
 2  for i ← 1 to m − 1 do
 3      y_i ← 0
 4      for j ← 1 to i do
 5          y_i ← y_i + ⟨v_j, q_{i+1}⟩M v_j
 6      end
 7      y_i ← q_{i+1} − y_i
 8      α_i ← ‖y_i‖M
 9      if α_i ≠ 0 then
10          v_{i+1} ← y_i / α_i
11      else
12          error: q_1, . . . , q_m are not linearly independent
13      end
14  end

We retain the most natural definition of orthonormality, and thus we say that two vectors x and y are M-orthonormal if

    ⟨x, y⟩M = 0 if x ≠ y,  and  ⟨x, y⟩M = 1 otherwise,    (1.4)

and that a set is M-orthonormal if its vectors are pairwise M-orthonormal. Let us point out that this condition is equivalent to saying that the vectors are pairwise M-orthogonal, i.e. they satisfy the first case of definition (1.4), and have unit M-norm.

Now, we want to compute an orthonormal set V = {v_1, . . . , v_m} such that span(V) = span(Q). To achieve this, we proceed inductively. The first step of the algorithm just requires the M-normalization of q_1, and thus

    v_1 = q_1 / ‖q_1‖M.

At step i + 1, we orthogonalize q_{i+1} with respect to v_1, . . . , v_i by subtracting from q_{i+1} its projections onto v_1, . . . , v_i and then M-normalizing the remainder.


Algorithm 1.2: Modified Gram–Schmidt procedure
Input: q_1, . . . , q_m ∈ C^n, M ∈ Pn
Output: v_1, . . . , v_m ∈ C^n M-orthonormal

 1  for i ← 1 to m do
 2      α_i ← ‖q_i‖M
 3      if α_i ≠ 0 then
 4          v_i ← q_i / α_i
 5      else
 6          error: q_1, . . . , q_m are not linearly independent
 7      end
 8      for j ← i + 1 to m do
 9          q_j ← q_j − ⟨v_i, q_j⟩M v_i
10      end
11  end

In other words, we compute

    y_i = q_{i+1} − Σ_{j=1}^{i} (⟨v_j, q_{i+1}⟩M / ⟨v_j, v_j⟩M) v_j,

and then add to the M-orthonormal set the vector

    v_{i+1} = y_i / ‖y_i‖M.    (1.5)

This procedure, called the classical Gram–Schmidt (CGS) method, is summarized in Algorithm 1.1. Since the assumption that Q is a linearly independent set cannot be guaranteed in the setting we are interested in, we replace it with the possibility of raising an error. In fact, let us assume that a vector q_i is a linear combination of other vectors of Q having smaller index. Then at line 7 we get that y_{i−1} = 0, since q_i belongs to the subspace generated by some of the v_j for 1 ≤ j < i and can be written as the sum of its projections. The null vector cannot be normalized and thus the Gram–Schmidt procedure fails. Algorithm 1.1 explicitly checks this condition and warns the user when this happens.

It is well known [5] [22, Sect. 5.2.8], though, that the CGS method tends to behave poorly from a numerical point of view, and a severe loss of orthogonality arises even when the cardinality of Q is limited. A way to reduce the impact of this numerical issue is the modified Gram–Schmidt (MGS) process, whose pseudocode is presented in Algorithm 1.2. Let us point out that although these two formulations are mathematically equivalent [29, Sect. 18.7], there are two aspects that make this variation preferable from a computational point of view.

A minor difference is the order in which the operations are performed. Indeed, while the CGS method updates each vector just once, the MGS procedure modifies the i-th vector exactly i − 1 times and all the projections onto a certain vector are performed together. Moreover, in the latter case the orthogonal projections are computed using partially orthogonalized vectors, i.e. the q_i are used to compute the inner products even after having been modified.

Nevertheless, the MGS procedure does not always guarantee numerical orthogonality, and can suffer as much as the CGS method when the number of vectors to be considered is rather large. A more effective solution is the so-called reorthogonalization, introduced by Abdelmalek [8], in which the orthogonalization step of the CGS or MGS process is repeated several times. For our implementations, we use the MGS approach, performing each outer iteration of Algorithm 1.2 twice, as prescribed by Giraud and Langou [28].

In the pseudocode of the following chapters, we denote the procedure that performs the M-orthogonalization of a vector v with respect to an M-orthonormal set V by the function orthogonalize(v, V, M). We assume that such an instruction exploits the MGS procedure with single reorthogonalization.
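A minimal MATLAB sketch of such a primitive is shown below, assuming the columns of V are already M-orthonormal and M is Hermitian positive definite; variable names are ours, and the routine only mirrors the description above (MGS with a single reorthogonalization), not the code of Appendix A.

% orthogonalize(v, V, M): MGS in the M-inner product with one reorthogonalization.
function v = orthogonalize(v, V, M)
  for pass = 1:2                       % two sweeps = single reorthogonalization
    for j = 1:size(V, 2)
      c = V(:, j)' * (M * v);          % <v_j, v>_M
      v = v - c * V(:, j);             % remove the projection onto v_j
    end
  end
  nrm = sqrt(real(v' * (M * v)));      % M-norm of the remainder
  if nrm > 0
    v = v / nrm;                       % M-normalize the new basis vector
  else
    error('The new vector is linearly dependent on the current basis.');
  end
end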


Chapter 2

Matrix weighted geometric mean

Relying on a parallelism between Hermitian positive definite matrices and positive real numbers, it is possible to define the weighted geometric mean of a pair of matrices as a non-trivial extension of the scalar case. We discuss here a natural way to define such a mean and show that it is indeed a good choice, since it enjoys many of the properties we would expect. Therefore, we conclude that it truly behaves as a generalization of its scalar counterpart.

2.1 Definition

Given a set of values X = {x_1, . . . , x_k} and a set of weights W = {w_1, . . . , w_k} such that Σ_{i=1}^k w_i = 1, the weighted geometric mean can be defined as

    G_W(X) = ( Π_{i=1}^k x_i^{w_i} )^{1 / Σ_{i=1}^k w_i}.    (2.1)

For our purposes, we are just interested in extending to matrices the case k = 2 and can rewrite equation (2.1) explicitly as

    G_W(a, b) = (a^{w_1} b^{w_2})^{1/(w_1 + w_2)}.    (2.2)

Moreover, we assume the fixed set of weights {1 − t, t} for t ∈ [0, 1], so we can further simplify equation (2.2) and get

    G_t(a, b) = a^{1−t} b^t.    (2.3)

When dealing with matrices, one could be tempted to consider a direct extension of equation (2.3) and to define the weighted geometric mean of A, B ∈ Pn as

    G_t(A, B) = A^{1−t} B^t,    (2.4)

but such a definition is not adequate, since in general G_t(A, B) is not even Hermitian. Let us note that if A commutes with B, then the expression in equation (2.4) is equivalent to the symmetric definition

    A #_t B = A^{1/2} (A^{−1/2} B A^{−1/2})^t A^{1/2},    (2.5)

where A^{1/2} and A^{−1/2} are the principal square roots of A and of its inverse A^{−1}, respectively [40, Chap. 6]. Since A is positive definite, the principal square root, i.e. the matrix X ∈ C^{n×n} such that X² = A and all of whose eigenvalues have positive real part, is positive definite and unique [23]. The same is true for A^{−1}, which is positive definite being the inverse of a positive definite matrix. Thus A #_t B is well defined.

tively [40, Chap. 6]. Since A is positive definite, the principal square root, i.e.the matrix X ∈ Cn×n such that X2 = A and all the eigenvalues have positivereal part, is positive definite and unique [23]. The same is true for A−1, whichis positive definite being the inverse of a positive definite matrix. Thus A#tB iswell defined.

Moreover, A #_t B is clearly Hermitian, and it can be shown that all its eigenvalues are strictly positive. Therefore, we can define a bivariate function, the weighted geometric mean · #_t · : Pn × Pn → Pn, which is an expected step towards the parallelism between positive numbers and positive definite matrices stated at the beginning of this chapter.

Although satisfactory from a theoretical point of view, definition (2.5) is computationally impractical, since it requires the inversion of A and the extraction of two matrix square roots. Also, algebraic manipulations of this expression are made quite difficult by the presence of the square roots, and we would like to remove them without losing the positive definiteness of the result. In order to find a good alternative definition, let us note that if A commutes with B, then B commutes with any function of A [40, Th. 1.13] and thus A #_t B = A (A^{−1}B)^t. We now claim that, provided A and B are positive definite, this formula holds even when A and B do not commute. Indeed, let us note first that

    A^{−1}B = A^{−1/2} (A^{−1/2} B A^{−1/2}) A^{1/2},    (2.6)

where A^{−1/2} B A^{−1/2} is positive definite. As a consequence of the Spectral Theorem [11, Th. 32.16], there exist Q, D ∈ C^{n×n} such that Q is unitary and D is diagonal with positive real elements, which eigendecompose the matrix as

    A^{−1/2} B A^{−1/2} = Q D Q∗.    (2.7)

)= QDQ∗. (2.7)


Substituting equation (2.7) into equation (2.6) leads to

    A^{−1}B = A^{−1/2} Q D Q∗ A^{1/2} = (A^{−1/2}Q) D (A^{−1/2}Q)^{−1},

that is, an eigendecomposition of A^{−1}B. Finally, recalling that matrix functions are invariant under similarity transformations [40, Th. 1.13], we can write

    A (A^{−1}B)^t = A (A^{−1/2}Q) D^t (A^{−1/2}Q)^{−1}
                  = A^{1/2} (Q D Q∗)^t A^{1/2}
                  = A^{1/2} (A^{−1/2} B A^{−1/2})^t A^{1/2},

which proves that

    A #_t B = A (A^{−1}B)^t.    (2.8)

For t = 1/2, this result is known, and the proof is very similar to the one we present here [35].
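For small dense matrices, identity (2.8) is easy to check numerically. The following MATLAB sketch (illustration only; it assumes dense matrices small enough for sqrtm and eig) compares definition (2.5) with the equivalent form (2.8) on randomly generated positive definite arguments.

% Numerical check that (2.5) and (2.8) agree on random HPD matrices.
n = 50;  t = 0.3;  rng(0);
A = randn(n);  A = A*A' + n*eye(n);          % HPD test matrices
B = randn(n);  B = B*B' + n*eye(n);
As = sqrtm(A);  Asi = inv(As);               % A^{1/2} and A^{-1/2}
C  = Asi*B*Asi;
[Q, E] = eig((C + C')/2);                    % Hermitian eigendecomposition
G1 = As * (Q*diag(diag(E).^t)*Q') * As;      % definition (2.5)
[V, D] = eig(A\B);                           % eigendecomposition of A^{-1}B
G2 = A * real(V * diag(diag(D).^t) / V);     % form (2.8), A (A^{-1}B)^t
disp(norm(G1 - G2, 'fro') / norm(G1, 'fro')) % close to machine precision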

2.2 Properties

The following proposition collects a few well-known properties of the matrix weighted geometric mean [51]. Most of them are closely related to similar results which hold in the scalar case.

Proposition 1. Let A, B ∈ Pn, let diag(d_1, . . . , d_n) = D ∈ Pn be diagonal and let In ∈ Pn be the identity matrix of size n. Then

(a) A #_t B is positive definite;

(b) A #_t B = G_t(A, B) if A and B commute;

(c) diag(D #_t In) = diag(G_t(d_1, 1), . . . , G_t(d_n, 1));

(d) (αA) #_t (βB) = α^{1−t} β^t (A #_t B) for any α, β > 0;

(e) A #_t B = B #_{1−t} A;

(f) (Q∗AQ) #_t (Q∗BQ) = Q∗ (A #_t B) Q for any Q ∈ GL(n);

(g) (A #_t B)^{−1} = A^{−1} #_t B^{−1};

(h) det(A #_t B) = det(A)^{1−t} det(B)^t.


Proof. A proof is given for each item.

(a) Positivity is immediate from (2.8).

(b) Since A commutes with B, any function of A commutes with B. Therefore a straightforward calculation shows that A #_t B = G_t(A, B).

(c) The result follows directly from (b). It suffices to put A = D, B = In and take the diagonal.

(d) Because of the commutativity of matrices with scalars, we have

    (αA) #_t (βB) = αA (α^{−1} A^{−1} βB)^t = α^{1−t} β^t (A #_t B),

which proves the property.

(e) A few simple manipulations show that

    A #_t B = A (A^{−1}B)^t = B (B^{−1}A)(B^{−1}A)^{−t} = B (B^{−1}A)^{1−t} = B #_{1−t} A.

(f) Because of the invariance of matrix functions under similarity, we can write

    (Q∗AQ) #_t (Q∗BQ) = (Q∗AQ) ((Q∗AQ)^{−1} (Q∗BQ))^t
                      = Q∗AQ (Q^{−1} A^{−1} Q^{−∗} Q∗ B Q)^t
                      = Q∗AQ Q^{−1} (A^{−1}B)^t Q
                      = Q∗ (A #_t B) Q,

which proves the claim.

(g) The proof is obvious using definition (2.5).

(h) Let us prove first that for any positive definite matrix Z ∈ C^{n×n} it holds that det(Z^t) = det(Z)^t for any t ∈ R⁺.

In fact, there exists an eigendecomposition Z = Q D Q^{−1} with Q ∈ GL(n) and diag(d_1, . . . , d_n) = D ∈ C^{n×n} diagonal. Since the determinant is invariant under similarity we have that det(Z) = det(D), and because of the invariance of any matrix function under similarity we have that det(Z^t) = det(D^t). Thus, we can write

    det(D^t) = d_1^t · · · d_n^t = (d_1 · · · d_n)^t = det(D)^t,

which implies det(Z^t) = det(Z)^t.

Exploiting the fact that for any pair of square matrices X, Y ∈ C^{n×n} it holds that det(XY) = det(X) det(Y), one has that

    det(A #_t B) = det(A (A^{−1}B)^t) = det(A) det(A^{−1}B)^t
                 = det(A) det(A^{−1})^t det(B)^t = det(A)^{1−t} det(B)^t.


Chapter 3

Quadrature formulae

The aim we pursue in this chapter is twofold. On the one hand, we discuss how to write the matrix power Z^ρ with ρ ∈ (0, 1) as a contour integral. This requires concepts of complex analysis and non-trivial procedures of complex calculus, but we are convinced that an exhaustive discussion of such a topic would be quite burdensome and far beyond the scope of this work. Therefore, we keep the theoretical aspects to a minimum, pointing the interested reader to classical reference books whenever needed. On the other hand, we illustrate how to apply known quadrature techniques to compute matrix functions of the form (X^{−1}Y)^ρ or (XY^{−1})^ρ for ρ as above. Finally, we show how the product of these objects with a vector can be implemented avoiding expensive operations such as matrix-matrix multiplications and matrix inversions.

3.1 The Cauchy formula

Let γ : [0, 1] → C be a simple closed, continuously differentiable curve, i.e. a curve which does not intersect itself anywhere and is such that γ(0) = γ(1). Let us assume that this curve has counterclockwise orientation and divides the complex plane into two open connected sets with γ as common boundary. Let us call U the interior of the curve and let us consider a function f : V ⊂ C → C holomorphic in a neighbourhood of U. Then, for any z ∈ U we can write

    f(z) = (1/(2πi)) ∮_γ f(ζ)/(ζ − z) dζ.    (3.1)

Equation (3.1) is the so-called Cauchy integral formula, whose correctness is a direct consequence of Cauchy's integral theorem [33, Sect 2.4].

Being able to write complex functions by means of integrals is related to an abstract yet elegant way to define matrix functions [40, Sect 1.2.3], which turns out to be particularly useful in our context. Indeed, for any Z ∈ C^{n×n} we can define

    f(Z) = (1/(2πi)) ∮_γ f(ζ) (ζI − Z)^{−1} dζ,    (3.2)

provided that f : V ⊂ C → C is holomorphic on U and σ(Z) ⊂ U.

3.2 Gauss–Jacobi quadrature formulae

It is well known that for f(Z) = Z^ρ with 0 < ρ < 1, the contour integral (3.2) admits a representation on the real line. In fact, by choosing as curve γ the Hankel contour [18, Chap. XV], one has [34, Prop. 3.1.12]

    Z^ρ = (sin(ρπ)/π) ∫_0^∞ t^{ρ−1} (t + Z)^{−1} Z dt.    (3.3)

This integral presents a major issue, namely the fact that the integration interval is not bounded. The Gauss–Laguerre quadrature formula cannot be used in this case, because it requires a weight function of the form e^{−x}. Therefore, we have to rely on a variable substitution.

Let us resort to the Cayley transform, a conformal mapping [16, Chap. 14] defined by C(z) = (1 − z)/(1 + z), which sends the upper half of the complex plane onto the unit disk [17, Sect. 2.2-2.3]. With the substitution t = C(s) we get

    dt/ds = −2/(1 + s)²,

and thus

    Z^ρ = (sin(ρπ)/π) ∫_1^{−1} −2 [ (1 + s)² ((1 − s)/(1 + s))^{1−ρ} ((1 − s)/(1 + s) + Z) ]^{−1} Z ds
        = (2 sin(ρπ)/π) ∫_{−1}^{1} [ (1 − s)(1 − s)^{−ρ}(1 + s)^ρ (1 − s + (1 + s)Z) ]^{−1} Z ds
        = (2 sin(ρπ)/π) ∫_{−1}^{1} (1 − s)^{ρ−1}(1 + s)^{−ρ} (1 − s + (1 + s)Z)^{−1} Z ds.

It is easy to see that the value of this integral can be determined by means of the Gauss–Jacobi quadrature described in Section 1.3.1, if we put α = ρ − 1, β = −ρ and

    f(Z) = f_D(Z) = (1 − s + (1 + s)Z)^{−1} Z.    (3.4)


Another integral expansion which can be of interest is the one for Z^{−ρ}. Indeed, it can be shown, by considering the same contour used to derive equation (3.3), that

    Z^{−ρ} = (sin(ρπ)/π) ∫_0^∞ (t^ρ (t + Z))^{−1} dt.    (3.5)

The same substitution as above, i.e. t = C(s), gives

    Z^{−ρ} = (sin(ρπ)/π) ∫_1^{−1} −2 [ (1 + s)² ((1 − s)/(1 + s))^ρ ((1 − s)/(1 + s) + Z) ]^{−1} ds
           = (2 sin(ρπ)/π) ∫_{−1}^{1} [ (1 − s)^ρ (1 + s)(1 + s)^{−ρ} (1 − s + (1 + s)Z) ]^{−1} ds
           = (2 sin(ρπ)/π) ∫_{−1}^{1} (1 − s)^{−ρ}(1 + s)^{ρ−1} (1 − s + (1 + s)Z)^{−1} ds.

Again this expression can be treated by means of a Gauss–Jacobi formula, where α = −ρ, β = ρ − 1 and

    f(Z) = f_I(Z) = (1 − s + (1 + s)Z)^{−1}.    (3.6)

Let us note that f_D(Z) and f_I(Z) are quite similar, but the former requires one more matrix-matrix multiplication than the latter, which becomes a matrix-vector product when dealing with f(Z)v. However, if we consider Z = X^{−1}Y, this advantage of the second form over the first is no longer significant. Indeed, rearranging and simplifying we get that

    f_D(X^{−1}Y) = ((1 − s)X + (1 + s)Y)^{−1} Y,    (3.7)
    f_I(X^{−1}Y) = ((1 − s)X + (1 + s)Y)^{−1} X.    (3.8)

The same considerations hold for Z = XY^{−1}, which can also be employed to compute the weighted geometric mean after applying a well-known matrix function identity (see Chapter 5). By substituting Z = XY^{−1} into equations (3.4) and (3.6) we get

    f_D(XY^{−1}) = Y ((1 − s)Y + (1 + s)X)^{−1} XY^{−1},    (3.9)
    f_I(XY^{−1}) = Y ((1 − s)Y + (1 + s)X)^{−1}.    (3.10)

In this second case, the complexity of the two formulations is no longer the same. Indeed, the computation of the latter expression requires just the solution of a linear system and one matrix-vector multiplication, much as in the previous case. On the other hand, f_D(XY^{−1}) seems to require almost twice as much time, since it entails two linear solves and two matrix-vector multiplications for each value of s. Nevertheless, let us note that the rightmost XY^{−1} does not depend on s and thus can be brought out of the integral to update the vector. This way, evaluating f_D(XY^{−1}) requires the same amount of operations as the other three cases, save for the small preprocessing cost of computing b = XY^{−1}b.
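The resulting procedure is straightforward to sketch in MATLAB. The function below (our names; it assumes the Gauss–Jacobi nodes and weights are obtained from Chebfun's jacpts, and that X and Y are Hermitian positive definite and sparse) approximates (X^{−1}Y)^ρ b using formula (3.7), with one sparse solve per quadrature node.

% Gauss-Jacobi approximation of (X^{-1} Y)^rho * b based on (3.7) (sketch).
function g = power_gauss_jacobi(X, Y, b, rho, p)
  [s, w] = jacpts(p, rho - 1, -rho);  % nodes and weights for alpha = rho-1, beta = -rho
  Yb = Y * b;                         % does not depend on the node
  g  = zeros(size(b));
  for i = 1:p
    % f_D(X^{-1}Y) b = ((1-s)X + (1+s)Y)^{-1} Y b: one sparse solve per node
    g = g + w(i) * (((1 - s(i))*X + (1 + s(i))*Y) \ Yb);
  end
  g = (2 * sin(rho*pi) / pi) * g;
end

With this building block, (A #_t B) b can be obtained, following (2.8), as A times the result of power_gauss_jacobi(A, B, b, t, p).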

3.3 Contour integrals

It is possible to address the numerical quadrature of equation (3.2) in other ways. In particular, Hale, Higham and Trefethen [39] propose to apply a few elliptic variable substitutions after having transplanted the region of analyticity of f and (tIn − Z)^{−1} to an annulus

    A = {z ∈ C : r < |z| < R},    (3.11)

where r, R ∈ R are the radii of the inner and outer circles of the annulus and depend on the spectrum of Z. At this point, a sequence of substitutions involving elliptic functions is performed and a quadrature formula is computed.

Without giving any further details, we report the results of interest from the original paper [39], referring the reader to it for a complete explanation. Let f : C → C be analytic in C \ (−∞, 0] and let us assume that (−∞, 0) is a branch cut for f and that at most 0 is a singularity. Under these assumptions, the approximation using a quadrature formula with p nodes is given by

    f(Z) ≈ −(8K ⁴√(λm λM))/(π p k) Z Σ_{j=1}^{p} [ f(w(t_j)²) cn(t_j) dn(t_j) ] / [ w(t_j) (k^{−1} − sn(t_j))² ] (w(t_j)² In − Z)^{−1},    (3.12)

where λm and λM are the minimum and the maximum of the spectrum, respectively, k = −C(⁴√(λM/λm)), K and K′ are the complete elliptic integrals associated with k [39], w(t) = ⁴√(λm λM) (k^{−1} + sn(t))/(k^{−1} − sn(t)), t_j = −K + iK′/2 + 2(j − 1/2)K/p for 1 ≤ j ≤ p, and cn, dn and sn are Jacobi elliptic functions in standard notation [9]. The theoretical aspects of these exotic functions can be found in the book of Driscoll and Trefethen [27]. For our implementations we use the functions ellipjc and ellipkkp of Driscoll's Schwarz–Christoffel Toolbox [21][31].

Let us note that, although the computation of (3.12) requires matrix-matrix products, when the problem to solve is f(Z)b an evaluation from right to left lets us exchange this expensive operation for just a pair of matrix-vector products. Moreover, by substituting Z = X^{−1}Y we add just a third matrix-vector operation, since equation (3.12) becomes

    f(X^{−1}Y) b ≈ −(8K ⁴√(λm λM))/(π p k) X^{−1}Y ( Σ_{j=1}^{p} [ f(w(t_j)²) cn(t_j) dn(t_j) ] / [ w(t_j) (k^{−1} − sn(t_j))² ] (w(t_j)² X − Y)^{−1} X b ),    (3.13)

which again does not require any matrix-matrix operation if evaluated from right to left. On the other hand, the substitution Z = XY^{−1} in equation (3.12) gives, after a few simple manipulations,

    f(XY^{−1}) b ≈ −(8K ⁴√(λm λM))/(π p k) XY^{−1} ( Y Σ_{j=1}^{p} [ f(w(t_j)²) cn(t_j) dn(t_j) ] / [ w(t_j) (k^{−1} − sn(t_j))² ] (w(t_j)² Y − X)^{−1} b ).

Let us note that the method presented in this section, unlike the Gauss–Jacobi rule and the Krylov subspace methods of the following chapter, is the only one that requires some information about the spectrum of X^{−1}Y (or XY^{−1}), and thus a non-negligible amount of preprocessing.


Chapter 4

Krylov subspace methods

Let us assume we want to compute f(Z)b, where Z ∈ C^{n×n} and b ∈ C^n. It is clear that the most straightforward algorithm, i.e. to compute F = f(Z) and then y = Fb, is not a viable choice when n is large. Krylov subspace methods avoid this kind of approach and approximate the value of f(Z)b by solving much smaller problems, in a way that can often save a lot of computation. The general idea is quite simple and can be roughly subdivided into four steps:

1. Find a basis of a suitable subspace of Cn;

2. Project the matrix Z onto this subspace and restrict the projection to thesubspace itself;

3. Solve a smaller version of the original problem in the subspace;

4. Bring the solution back to Cn.

The choice of the subspace is crucial, and the main difference among the methods presented here concerns precisely how it is generated. Since the subspace should ideally capture, in some sense, properties of both Z and b, we would expect it to depend on both parameters. In fact, that is the case, and hence a Krylov subspace is usually denoted by an expression of the form

    Km(Z, b),

where the calligraphic upper-case letter indicates how the elements of the subspace are chosen, the subscript index signifies the number of vectors which generate the subspace, and the terms in parentheses declare for which matrix-vector pair the method is expected to work. Let us note, on the other hand, that the subspace does not depend on the function one is trying to approximate. Since the generating


vectors are not necessarily linearly independent, m is also an upper bound for the dimension of Km(Z, b).

The elements are chosen so that the resulting subspace exhibits some interesting properties, notably exactness of the approximation for certain families of functions. Nevertheless, the bases built by the algorithms are usually ill-conditioned, and this would severely hinder the effectiveness of the whole procedure. Thus, the basis of Km(Z, b) undergoes a pass of the Gram–Schmidt process, which generates an orthonormal basis Vm of dimension m.

At that point, one can compute the reduced matrix Zm = V∗m Z Vm ∈ C^{m×m}, i.e. the restricted projection of Z onto the subspace Km(Z, b), and then solve the original problem on Zm. In other words, for a given m we get ym = Vm f(Zm) V∗m b, which is an approximation of y that is exact on the subspace Km(Z, b).

The remainder of the chapter analyses a few of these methods, justifying them and describing how they can be generalized to the pencil (X, Y) for the computation of f(X^{−1}Y)b and f(XY^{−1})b.

4.1 Polynomial Krylov methods

The simplest Krylov subspace one could imagine is the polynomial one. Let Z ∈ C^{n×n} and b ∈ C^n \ {0}; then the polynomial Krylov space of dimension m associated with (Z, b) is defined as the subspace of C^n spanned by the vectors

    Bm = {b, Zb, Z²b, . . . , Z^{m−1}b}.

In other words, the subspace is the set of all the linear combinations of vectors of the form Z^i b. We write it more conveniently as

    Pm(Z, b) = { Σ_{i=0}^{m−1} α_i Z^i b : α_i ∈ C },    (4.1)

which shows that the dimension of the subspace is at most m.

These spaces have a few remarkable properties. First, it is clear that for any m ∈ N we have the relation Pm(Z, b) ⊆ P_{m+1}(Z, b), that is, the subspaces are nested and of non-decreasing dimension. Moreover, as a consequence of the Cayley–Hamilton theorem [15, Th. 5], there exists an index s ∈ N such that Ps(Z, b) = Pt(Z, b) for any t ≥ s [24, Lect. 34].

Since the sequence of directions given by {Z^i b}_{i∈N} converges to a direction of the eigenspace corresponding to the dominant eigenvalue for almost every choice of b [10, Sect 4.2], the natural Krylov basis Bm tends to become ill-conditioned even when m is very small. A standard solution to this issue is the M-orthonormalization of Bm, which can be achieved by applying the Gram–Schmidt procedure described in Section 1.4. This operation produces an M-orthonormal basis Vm, which is called the Arnoldi basis if Z is not Hermitian and the Lanczos basis otherwise.

Let us further analyse how this basis Vm = {v_1, . . . , v_m} is generated. Our goal is to derive an efficient iterative algorithm which at each step adds a new vector of the natural Krylov basis and M-orthonormalizes it. The first vector is clearly

    v_1 = b / ‖b‖M,

and then we can proceed by induction. Let us assume that Vi is an M-orthonormal basis of Pi(Z, b), and that we want to add Z^i b while preserving the M-orthonormality of the basis. The new vector is nothing but the M-normalization of

    y_i = Z^i b − Σ_{j=1}^{i} (⟨v_j, Z^i b⟩M / ⟨v_j, v_j⟩M) v_j,

which can be rewritten as

    y_i = Z^i b − Σ_{j=1}^{i} v_j ⟨v_j, Z^i b⟩M,

since v_j is M-normalized and ⟨v_j, Z^i b⟩M is a scalar and commutes with v_j. The space spanned by the vectors before and after the M-orthogonalization is the same, provided that the Gram–Schmidt process is used, and thus we can further simplify the computation by considering Z v_i instead of Z^i b. The M-orthogonalization of Z v_i with respect to {v_1, . . . , v_i} gives

    y_i = Z v_i − Σ_{j=1}^{i} ⟨v_j, Z v_i⟩M v_j,    (4.2)

which, once M-normalized, gives the last element of V_{i+1}, that is

    v_{i+1} = y_i / ‖y_i‖M.    (4.3)

Equation (4.3) can be seen as a way to express either y_i, i.e.

    y_i = v_{i+1} ‖y_i‖M,    (4.4)

or its M-norm,

    ‖y_i‖M = v∗_{i+1} M y_i = v∗_{i+1} M Z v_i.    (4.5)

It is convenient to write the inner product in equation (4.2) explicitly,

    y_i = Z v_i − Σ_{j=1}^{i} v_j v∗_j M Z v_i,

which rearranged gives

    Z v_i = y_i + Σ_{j=1}^{i} v_j v∗_j M Z v_i.    (4.6)

At this point, combining equations (4.4) and (4.5), we get that

    y_i = v_{i+1} (v∗_{i+1} M Z v_i),

which substituted into (4.6) yields

    Z v_i = Σ_{j=1}^{i+1} v_j h_{j,i},    (4.7)

where we put h_{i,j} = v∗_i M Z v_j to make the expression more compact.

Finally, for 1 ≤ i ≤ m, equations (4.7) can be collected and written in matrix form as

    Z Vm = Vm Hm + v_{m+1} h_{m+1,m} eᵀ_m,    (4.8)

where Vm = [v_1 | . . . | v_m] ∈ C^{n×m} and Hm ∈ C^{m×m}. Equation (4.8) is usually called the Arnoldi relation. It is important to stress that the form of this decomposition does not depend on the inner product chosen for the orthonormalization, i.e. M does not appear in the formula, although different M matrices lead, in general, to different Vm.

It is well known [40, Chap. 13] that equation (4.8) can be readily exploited to compute an approximation of f(Z)b. In fact, by imposing Z Vm ≈ Vm Hm, we can write

    f(Z) Vm ≈ Vm f(Hm),

and by observing that b = v_1 ‖b‖M = Vm e_1 ‖b‖M, we can conclude that

    f(Z) b = f(Z) Vm e_1 ‖b‖M ≈ Vm f(Hm) e_1 ‖b‖M.


Algorithm 4.1: Arnoldi algorithm for the computation of f(Z)b
Input: Z ∈ C^{n×n}, b ∈ C^n, f : C^{n×n} → C^{n×n}, m ∈ N
Output: f^A_k with prescribed accuracy, or k = m

 1  v_1 ← b / ‖b‖M
 2  V_0 ← [ ]
 3  for i ← 1 to m − 1 do
 4      Vi ← [V_{i−1} v_i]
 5      update H_{i−1} with a new row and column so that Hi = V∗i M Z Vi
 6      f^A_k ← Vi f(Hi) e_1 ‖b‖M
 7      if converged then
 8          return f^A_k
 9      end
10      q_i ← Z v_i
11      v_{i+1} ← orthogonalize(q_i, Vi, M)
12  end
13  return f^A_m

For the sake of convenience, we define the Arnoldi approximation of f(Z)b from Pm(Z, b) as

    f^A_m := Vm f(Hm) e_1 ‖b‖M.    (4.9)

Algorithm 4.1 illustrates the general structure of an algorithm for the computation of a sequence of Arnoldi approximations of f(Z)b, where f is a general matrix function.

Looking at the pseudocode, it is natural to ask what the reason is for using an orthogonalization matrix M ≠ In, since putting M = In seems to be the only reasonable choice. The explanation is that we are interested in expressions of the form f(X^{−1}Y)b rather than f(Z)b. In fact, letting M = X and Z = X^{−1}Y, we can rewrite equation (4.8) as

    Y Vm = X Vm Hm + X v_{m+1} h_{m+1,m} eᵀ_m,

which left-multiplied by V∗m gives

    V∗m Y Vm = Hm.    (4.10)

The importance of equation (4.10) is twofold. From a strictly computational point of view, it shows that if M = X the computation of Hm at line 5 of Algorithm 4.1 is much easier. Moreover, it clearly shows that Hm is symmetric, provided that Y is symmetric too. This is a great advantage, because it implies that the matrix function f(Hm) can be accurately computed in the simplest possible way, that is, by diagonalization [40, Sect. 4.5].
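Since Hm is then a small symmetric (in fact tridiagonal) matrix, the evaluation of f(Hm) can indeed be carried out by a plain eigendecomposition. A minimal MATLAB sketch, with f(x) = x^t as needed for the weighted geometric mean and a sample matrix standing in for Hm, is the following.

% Evaluating f(Hm) by diagonalization for a symmetric Hm (sketch).
t  = 0.5;
Hm = full(gallery('tridiag', 10));   % sample symmetric tridiagonal matrix
[Q, D] = eig((Hm + Hm')/2);          % symmetrize to guard against round-off
fHm = Q * diag(diag(D).^t) * Q';     % f(Hm) = Q f(D) Q^*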

Let us note that the substitution Z = XY^{−1} does not give an equally nice result. Indeed, for M = X^{−1} equation (4.8) becomes

    Y^{−1} Vm = X^{−1} Vm Hm + X^{−1} v_{m+1} h_{m+1,m} eᵀ_m,

where all the matrices appear inverted. Clearly, a left multiplication by V∗m cannot solve the issue, and in fact gives

    V∗m Y^{−1} Vm = Hm,

which is harder to compute than equation (4.10), since its evaluation requires the solution of m linear systems instead of the same number of matrix-vector products. For the other choices of Krylov subspaces, similar difficulties arise.

Let us now clarify a few technical points about the implementation in Algorithm 4.1. The first issue to address is the choice of the stopping criterion at line 7. A worthy choice is the one investigated for the matrix exponential by van den Eshof and Hochbruck [32] and by Knizhnerman and Simoncini [46] in the context of extended Krylov subspace methods (see Section 4.2). The criterion we propose below is a minor variation of a result stated for the extended Krylov method [46, Prop. 2.2].

Proposition 2 (Knizhnerman, Simoncini). With the notation above, let us define the two quantities δ_{m+j} and η_{m+j} as

    δ_{m+j} = ‖f^A_{m+j} − f^A_m‖ / ‖f^A_m‖,    η_{m+j} = ‖f(Z)b − f^A_{m+j}‖,

and assume that m + j iterations of Algorithm 4.1 have been performed. Then the relative error at step m can be bounded by

    ‖f^A_m − f(Z)b‖ / ‖f(Z)b‖ ≤ (δ_{m+j} + η_{m+j}/‖f^A_m‖) / (1 − δ_{m+j} − η_{m+j}/‖f^A_m‖).    (4.11)

Let us note that for η_{m+j} ≪ η_m equation (4.11) becomes

    ‖f^A_m − f(Z)b‖ / ‖f(Z)b‖ ≤ δ_{m+j} / (1 − δ_{m+j}),    (4.12)

which can be effectively used to check the convergence of the Arnoldi procedure in Algorithm 4.1, for a sufficiently large choice of j. Experiments show that in practice j = 5 gives a very reliable stopping criterion, whilst j = 2 or even j = 1 seems to be sufficiently accurate for the class of matrix functions we are considering.

    [ H_{i−1}  h12 ]   [ V∗_{i−1} ]
    [ h21      h22 ] = [ v∗_i     ] Y [ V_{i−1}  v_i ]

    Figure 4.1: Incremental update of Hi.
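In code, the test based on (4.12) amounts to a couple of lines. The sketch below uses our own variable names for the Arnoldi approximations f^A_m and f^A_{m+j} and a user-chosen relative tolerance.

% Stopping test based on the bound (4.12) (sketch).
function stop = arnoldi_converged(fAm, fAmj, tol)
  delta = norm(fAmj - fAm) / norm(fAm);            % delta_{m+j} of Proposition 2
  stop  = delta < 1 && delta / (1 - delta) <= tol; % bound (4.12) on the relative error
end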

Carefully assessing the computational cost of such an algorithm is not trivial, and falls beyond the scope of our investigation, but it is quite easy to see that the cost of an iteration is not constant. In fact, the size of Vi influences the number of operations needed to calculate Hi and f^A_i, and has a dramatic impact on the cost of the Gram–Schmidt orthogonalization at line 11. It is obvious that nothing can be done about the size of Vi, which has to be stored in memory from the beginning to the end of the execution. Nonetheless, with a few assumptions on Z, both the computation of Hi and the orthogonalization of y_i can be made considerably faster.

A key observation is that, since its elements are defined as in equation (4.7), Hi is an upper Hessenberg matrix, i.e. it has zero entries below the first subdiagonal. Moreover, let us recall that, for the generalized problem, line 5 of Algorithm 4.1 requires computing Hi = V∗i Y Vi with Y symmetric, which implies that Hi is symmetric as well. Being at the same time upper Hessenberg and symmetric, Hi is tridiagonal, and thus all the h_{i,j} = v∗_i Y v_j with j < i − 1 would be zero in exact arithmetic. Therefore, instead of X-orthonormalizing the new vector q_i with respect to all the columns of Vi, it makes sense to ignore the first i − 2 vectors of the basis and use a three-term recurrence which involves just the last two added vectors.

This variation of the Arnoldi method for the construction of an orthogonal basis of Pm(Z, b) goes under the name of Lanczos recurrence. In finite precision arithmetic, the convergence of Lanczos's algorithm is slower than that of Arnoldi's, but for large m the larger number of iterations required to achieve convergence is often more than compensated by the decrease in execution time per iteration.

Let us now focus our attention on the update of Hi. The naive approach would be to compute V∗i M Z Vi from scratch at each step, which would also be the least efficient way to obtain Hi: the computational cost of the operation would depend not only on the size of the matrix, but also on the size of Vi. Nevertheless, writing this operation in terms of block matrices, we get the equation in Figure 4.1. By equating the two sides, we can get explicit expressions for the border of Hi, which are

    h12 = V∗_{i−1} Y v_i,    h21 = v∗_i Y V_{i−1},    h22 = v∗_i Y v_i.

Thus, there is no need to compute Hi from scratch at each step, and a lot of computation can be saved by just adding the last column and row to H_{i−1}. In particular, recalling that the matrix is upper Hessenberg, we can save even more operations by noting that

    h21 = [0 · · · 0  v∗_i Y v_{i−1}].

Finally, since in our case Hi is tridiagonal, we have that h12 = h21∗, and therefore the update of Hi requires just the computation of two inner products,

    h_{i,i} = ⟨v_i, v_i⟩_Y,    h_{i,i−1} = h_{i−1,i} = ⟨v_i, v_{i−1}⟩_Y,

which require two matrix-vector and two vector-vector products. In conclusion, we have seen that if Y is symmetric then the only instruction whose cost depends not only on n but also on i is the one at line 6. Indeed, there is no known way to determine how adding a border to H_{i−1} influences the result of f(Hi) for a general f.

A MATLAB/Octave implementation of the polynomial Krylov subspace method for general f described in this section can be found in Section A.1 of the Appendix.

4.2 Extended Krylov methods

The germinal idea of extended Krylov methods can be ascribed to the work of Druskin and Knizhnerman [25], who suggest extending the basis of Pm(Z, b) with a few vectors of the form Z^{−i}b. Simoncini [37] proposes to further extend the basis and add at each step a power of Z and a power of Z^{−1}, showing that a careful analysis of the algorithm can lead to a considerable reduction of the execution time. A similar analysis is conducted by Jagels and Reichel [42] [43], who derive short recursion formulae exploiting the connection between extended Krylov methods and Laurent polynomials.

As usual, let Z ∈ C^{n×n} and b ∈ C^n \ {0}. The extended Krylov space of dimension m associated with (Z, b) is the subspace of C^n spanned by the vectors

    Bm = {b, Z^{−1}b, Zb, Z^{−2}b, . . . , Z^{m−1}b, Z^{−m}b}.


Even in this case the subspace is the set of all the linear combinations of the vectors of Bm, that is

    Em(Z, b) = { Σ_{i=−m}^{m−1} α_i Z^i b : α_i ∈ C }.    (4.13)

It is clear that the largest possible dimension of the subspace Em(Z, b) is 2m, which occurs when the vectors of Bm are linearly independent and thus Bm is a basis of Em(Z, b).

As for the polynomial Krylov subspaces, it can be shown that for any m ∈ N the relation Em(Z, b) ⊆ E_{m+1}(Z, b) holds, and also that there exists an index s ∈ N such that Es(Z, b) = Et(Z, b) for any t ≥ s [46].

Let us now turn our attention to the development of Arnoldi-like and Lanczos-like algorithms for the computation of an M-orthonormal basis of Em(Z, b). In this case, we add two vectors at a time, so that at the i-th step both Z^{i−1}b and Z^{−i}b are added. Proceeding much like we did in the previous section, it is possible to derive the relation

    Z Vm = Vm Tm + [v_{2m+1}  v_{2(m+1)}] T_{m+1,m} Eᵀ_m,    (4.14)

where the reduced matrix Tm = V∗m M Z Vm ∈ C^{2m×2m} is block upper Hessenberg with 2 × 2 blocks, T_{m+1,m} = [v_{2m+1}  v_{2(m+1)}]∗ M Z [v_{2m−1}  v_{2m}] ∈ C^{2×2}, Eᵀ_m ∈ C^{2×2m} contains the last two rows of the identity matrix I_{2m}, and Vm ∈ C^{n×2m} is the M-orthonormal basis at step m. We call equation (4.14) the extended Arnoldi decomposition of Z. By means of the same considerations we used in Section 4.1, we can derive the Arnoldi approximation of f(Z)b from Em(Z, b),

    f^E_m := Vm f(Tm) e_1 ‖b‖M.    (4.15)

The pseudocode of the method is given in Algorithm 4.2. Before discussing how the algorithm can be optimized, let us justify the apparently inappropriate choice of using the inverse of a large sparse matrix. In fact, the computation of Z^{−1}y is interpreted more correctly as the solution of the linear system whose coefficient matrix is Z and whose right-hand side is y. There exists a wide variety of methods for the numerical solution of these systems. Since here the focus is on large-scale problems, and thus iterative methods are the most reasonable choice, we refer the reader to the handbook by Barrett et al. [19] and the monograph by Saad [30], which provide a thorough introduction.

The solution of large sparse linear system is a crucial operation for our problem,since it let us avoid the explicit computation of the matrix X−1Y , which is, ingeneral, dense and non-Hermitian. In fact, we are consideringZ = X−1Y and thusthe vectors which have to be M -orthonormalized at step i are X−1Y v2i−1 and


Algorithm 4.2: Extended Arnoldi algorithm for the computation of f(Z)b

Input: Z ∈ C^{n×n}, b ∈ C^n, f : C^{n×n} → C^{n×n}, m ∈ N
Output: f^E_k with prescribed accuracy, or k = m

 1  v_1 = b / ‖b‖_M
 2  v_2 = orthogonalize(Z^{-1}b, v_1, M)
 3  V_0 = [ ]
 4  for i ← 1 to m - 1 do
 5      V_i ← [V_{i-1} v_{2i-1} v_{2i}]
 6      Update T_{i-1} with two rows and columns so that T_i = V_i^* M Z V_i
 7      f^E_k ← V_i f(T_i) e_1 ‖b‖_M
 8      if converged then
 9          return f^E_k
10      end
11      q_{2i-1} ← Z v_{2i-1}
12      v_{2i+1} ← orthonormalize(q_{2i-1}, V_i, M)
13      q_{2i} ← Z^{-1} v_{2i}
14      v_{2(i+1)} ← orthonormalize(q_{2i}, [V_i v_{2i+1}], M)
15  end
16  return f^E_m

It is easy to see that both these vectors are the result of a sparse matrix-vector product followed by the solution of a sparse linear system, assuming that both X and Y are sparse.
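The following fragment is a minimal illustration of this evaluation order (toy matrices of our own, not part of the thesis code): the LU factors of X and Y are computed once, and each new direction then costs one sparse matrix-vector product and one pair of sparse triangular solves, as in the implementation of Appendix A.2.

m  = 30;
X  = gallery ('poisson', m);         % sparse SPD test matrix
Y  = X + speye (size (X, 1));        % a second sparse SPD matrix (illustrative)
v1 = ones (size (X, 1), 1);          % stand-ins for v_{2i-1} and v_{2i}
v2 = v1;
[LX, UX] = lu (X);                   % factorizations computed once and reused
[LY, UY] = lu (Y);
w1 = UX \ (LX \ (Y * v1));           % X^{-1} (Y v_{2i-1}), X^{-1}Y never formed
w2 = UY \ (LY \ (X * v2));           % Y^{-1} (X v_{2i}),   Y^{-1}X never formed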

In our implementation, we use the same stopping criterion described in the previous section for the polynomial Arnoldi algorithm.

Let us now turn our attention to the matrix T_i. It can be shown [37, Sect. 3] that, for any matrix Z, the reduced matrix T_i has a very particular structure. In fact, it is a block upper Hessenberg matrix whose blocks have size 2×2, and it can be proven that the lower diagonal blocks have zero second rows (see Figure 4.2a) [37]. Using the matrix X^{-1}Y, and putting M = X, we get again that T_i = V_i^* Y V_i and thus that T_i is Hermitian if and only if Y is. Hence, it has the block structure in Figure 4.2b, i.e. pentadiagonal with alternating zeros in the outermost diagonals. Therefore, updating T_i requires the computation of just a diagonal block t_{22} ∈ C^{2×2} and of a row block [t_{2i+1,2i-1} t_{2i+1,2i}] ∈ C^{1×2} belonging to the first subdiagonal.


Figure 4.2: Block structure of T_i = V_i^* B V_i for a general matrix B (4.2a) and a Hermitian matrix B (4.2b).

By means of a block decomposition similar to the one in Figure 4.1, it is easy to see that

[t_{2i+1,2i-1}  t_{2i+1,2i}] = v_{2i+1}^* Y [v_{2i-1} v_{2i}],      t_{22} = [v_{2i+1} v_{2(i+1)}]^* Y [v_{2i+1} v_{2(i+1)}].

Therefore, the incremental update of T_i requires a number of operations which depends on n but not on i.
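In MATLAB-style indexing, the update reads as follows (a sketch with toy data and variable names of our own, for the Hermitian case M = X, so that T_i = V_i^* Y V_i):

n = 20;  i = 3;                          % toy dimensions, for illustration only
Y = gallery ('moler', n);                % symmetric stand-in for Y
V = orth (rand (n, 2*i + 2));            % stand-in for the first 2(i+1) basis vectors
T = zeros (2*i + 2);                     % blocks computed at earlier steps are omitted here
row_blk  = V(:, 2*i+1)' * (Y * V(:, 2*i-1 : 2*i));            % [t_{2i+1,2i-1} t_{2i+1,2i}]
diag_blk = V(:, 2*i+1 : 2*i+2)' * (Y * V(:, 2*i+1 : 2*i+2));  % new 2x2 diagonal block
T(2*i+1, 2*i-1 : 2*i)           = row_blk;
T(2*i-1 : 2*i, 2*i+1)           = row_blk';                   % Hermitian mirror
T(2*i+1 : 2*i+2, 2*i+1 : 2*i+2) = diag_blk;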

Section A.2 of the Appendix reports a MATLAB/Octave implementation of the extended Krylov subspace method described in this section.

4.3 Rational Krylov methods

A yet more powerful approach to the problem of computing f(Z)b by means of Krylov subspaces is due to Ruhe. He is the first to contemplate the possibility of using a sequence of rational functions {ϕ_i}_{i∈N} applied to Z to build the subspace

F_j(Z, x_1) = span {ϕ_1(Z)x_1, . . . , ϕ_j(Z)x_1},

and suggests using it to solve large eigenvalue problems [14]. Ten years later, the same author [20] proposes an algorithm for the computation of the basis of the rational Krylov subspace method, based on the combination of rational functions and matrix shifting. Our implementation is based on this variant.

The definition of the rational Krylov subspace is slightly more subtle than the two previous cases, and requires a few preliminary definitions. Let Z ∈ C^{n×n} and b ∈ C^n \ {0}. Let us define, for a choice of poles ξ_1, ξ_2, · · · ∈ C \ (σ(Z) ∪ {0}), the


sequence of polynomials {q_i}_{i∈N} such that any q_i ∈ R_i[x] can be factorized as

q_i(z) = ∏_{j=1}^{i} (1 - z/ξ_j).      (4.16)

It is clear that ξ_j ≠ 0 is required in order to ensure that no division by zero occurs during the evaluation of equation (4.16). Nonetheless, this restriction is not essential, and any other s ∈ C could be excluded instead, provided that both z and the poles of q_i are shifted accordingly.

These polynomials can be used to define the rational Krylov subspace as a straightforward generalization of its polynomial counterpart discussed in Section 4.1. In particular, the rational Krylov space of order m associated with (Z, b) is defined as the subspace of C^n spanned by the vectors of

B_m = q_{m-1}(Z)^{-1} {b, Zb, Z^2 b, . . . , Z^{m-1}b}.

Again, the subspace is the set of all the linear combinations of the vectors in B_m and can be written as

R_m(Z, b) = { q_{m-1}(Z)^{-1} Σ_{i=0}^{m-1} α_i Z^i b : α_i ∈ C }.      (4.17)

Before considering how this subspace can be employed to actually compute an approximation of f(Z)b, a few observations are in order. First, let us note that if we set ξ_i = ∞ for all i = 1, . . . , m - 1, then q_{m-1} is the constant polynomial p(z) = 1. Thus, we have that R_m(Z, b) = P_m(Z, b), and hence polynomial Krylov subspaces are nothing but a special case of rational Krylov subspaces for a trivial choice of the poles. If the denominator polynomial is chosen as in (4.16), rational Krylov spaces are nested and of strictly increasing dimension until an invariant space is reached [49, Sect. 2].

By means of considerations analogous to the ones in Section 4.1, it is easy to see that the basis B_m is ill-conditioned, and it is customary to adapt the Arnoldi and Lanczos approaches to construct an M-orthonormal basis of R_m(Z, b). Following the procedure in Section 4.1, let the first vector of our orthonormal basis be

v_1 = b / ‖b‖_M.

Let us assume that V_i is an M-orthonormal basis of R_i(Z, b), and that we want to obtain an M-orthonormal basis V_{i+1} of R_{i+1}(Z, b), which in general is different from R_i(Z, b).


According to Ruhe's algorithm, the vector to be added is

y_i = (I_n - Z/ξ_i)^{-1} Z v_i.      (4.18)

We can M-orthonormalize it, by exploiting the generalized Gram–Schmidt process presented in Section 1.4, and obtain

ŷ_i = y_i - Σ_{j=1}^{i} ( ⟨v_j, y_i⟩_M / ⟨v_j, v_j⟩_M ) v_j.

Expanding the inner products, applying the M-orthonormality of the v_j's for j = 1, . . . , i and the substitution h_{j,i} = v_j^* M y_i yields

ŷ_i = y_i - Σ_{j=1}^{i} v_j h_{j,i}.      (4.19)

The M-normalization of ŷ_i gives the relation

‖ŷ_i‖_M = v_{i+1}^* M y_i = h_{i+1,i},      (4.20)

which is similar to the one in (4.5). At this stage, the set V_{i+1} = {v_1, . . . , v_{i+1}} is an M-orthonormal basis of R_{i+1}(Z, b). Combining equations (4.4) and (4.20) into equation (4.19) we can write

(I_n - Z/ξ_i)^{-1} Z v_i = Σ_{j=1}^{i+1} v_j h_{j,i}.      (4.21)

To derive a matrix decomposition analogous to the Arnoldi relation in (4.8), we need to remove the inverted factor on the left-hand side of equation (4.21). By left-multiplying both sides by (I_n - Z/ξ_i) and rearranging so as to group the terms containing Z, we get

Z ( v_i + Σ_{j=1}^{i+1} v_j h_{j,i} ξ_i^{-1} ) = Σ_{j=1}^{i+1} v_j h_{j,i}.      (4.22)

For 1 < i < m, equations (4.22) can be collected in matrix form and give the so-called rational Arnoldi relation

Z V_m (I_m + H_m D_m) + Z v_{m+1} h_{m+1,m} ξ_m^{-1} e_m^T = V_m H_m + v_{m+1} h_{m+1,m} e_m^T,      (4.23)

where D_m is the diagonal matrix diag(ξ_1^{-1}, . . . , ξ_m^{-1}) ∈ C^{m×m} and e_m^T is the last row of the identity matrix I_m ∈ C^{m×m}.


The matrix decomposition in equation (4.23) is not very handy because of the presence of a second residual term, depending on Z and ξ_m, on the left-hand side. To get rid of it, it is convenient to assume that the last pole is infinite, i.e. that ξ_m = ∞, which readily gives the shorter decomposition

Z V_m K_m = V_m H_m + v_{m+1} h_{m+1,m} e_m^T,      (4.24)

where we put K_m = I_m + H_m D_m for the sake of brevity.

It follows from equation (4.24) that the approximation of Z on the subspace is R_m = H_m K_m^{-1} ∈ C^{m×m}, whose computation requires the solution of m linear systems of size m×m and can be very cheap when m is small. Nevertheless, it is easy to convince oneself that R_m is not Hermitian, which is a serious drawback as far as the computation of f(R_m) is concerned.
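The following function is a minimal sketch of the process just described (our own simplified variant: standard inner product instead of the M-inner product, poles given in advance, last pole assumed infinite); it builds the basis with the shift-and-invert steps of equation (4.18) and returns R_m = H_m K_m^{-1}.

function [V, R] = rat_arnoldi (Z, b, xi)
% Minimal rational Arnoldi sketch: standard inner product, poles xi given in
% advance, the last of which is assumed to be Inf so that (4.24) holds.
  n = size (Z, 1);
  m = numel (xi);
  V = zeros (n, m + 1);
  H = zeros (m + 1, m);
  V(:, 1) = b / norm (b);
  I = speye (n);
  for i = 1 : m
    if isinf (xi(i))
      y = Z * V(:, i);                        % infinite pole: polynomial step
    else
      y = (I - Z / xi(i)) \ (Z * V(:, i));    % shift-and-invert step, cf. (4.18)
    end
    for j = 1 : i                             % Gram-Schmidt orthogonalization
      H(j, i) = V(:, j)' * y;
      y = y - V(:, j) * H(j, i);
    end
    H(i + 1, i) = norm (y);
    V(:, i + 1) = y / H(i + 1, i);
  end
  K = eye (m) + H(1 : m, :) * diag (1 ./ xi(:));   % K_m = I_m + H_m D_m
  R = H(1 : m, :) / K;                             % R_m = H_m K_m^{-1}
end

A call such as rat_arnoldi (Z, b, [2.5, 4.0, Inf]) would then return the quantities from which V_m f(R_m) e_1 ‖b‖ approximates f(Z)b; the poles in the example are purely illustrative.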

When dealing with rational Krylov subspaces, the main issue is clearly the choice of the poles and of the matrix shift in order to achieve fast convergence. Here we consider the adaptive pole selection method proposed by Güttel and Knizhnerman [48], which is a minor variation of a well-established strategy, originally explored by Druskin, Lieberman and Zaslavsky [44] and by Druskin and Simoncini [47] in the context of shift-based rational Krylov subspace methods.

From the definition of the rational Krylov subspace of order m it follows that there exists a rational function r_m such that

f^R_m = r_m(Z) b = p_{m-1}(Z) q_{m-1}(Z)^{-1} b,

for some polynomial p_{m-1} ∈ R_{m-1}[x]. It can be shown that r_m is in fact a rational interpolant for f, with prescribed denominator, whose interpolation nodes are the eigenvalues of the reduced matrix R_m.

By the Hermite–Walsh formula for rational interpolants [4, Th. VIII.2] it follows that

‖f(Z)v - r_m(Z)v‖ ≤ ‖s_m(Z)v‖ ‖ ∮_Γ (z I_n - Z)^{-1} / s_m(z) dγ(z) ‖,      (4.25)

where Γ ⊂ C is a closed set, γ : Γ → C is a complex measure and s_m is the rational nodal function, defined by

s_m(z) = ∏_{i=1}^{m} (z - λ_i) / q_{m-1}(z),

with λ_1, . . . , λ_m the interpolation nodes, which minimizes the quantity ‖s(Z)b‖ over s ∈ R_m[x]/q_{m-1} [45, Lemma 4.5].


Because of how it is defined, it is clear that at the beginning of the m-th iteration of the rational Arnoldi method the rational nodal function s_m is known, and this suggests a simple heuristic pole selection strategy. In fact, if the aim is to reduce the bound (4.25) as much as possible at each iteration of the rational Arnoldi method, it suffices to make |s_m| uniformly large on Γ, which can be achieved by choosing [48]

ξ_m = arg min_{z ∈ Γ} |s_m(z)|.
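A literal MATLAB transcription of this heuristic is straightforward (a sketch with illustrative data; ritz, xi and gamma_pts are our own names for the current Ritz values, the poles used so far and a discretization of Γ):

ritz      = [0.5; 1.2; 3.0];              % eigenvalues of the reduced matrix (illustrative)
xi        = [Inf; 2.5];                   % poles used so far (illustrative)
gamma_pts = linspace (0.1, 10, 500);      % discretization of Gamma (here a real interval)
sm   = @(z) prod (z - ritz) ./ prod (1 - z ./ xi);    % nodal function s_m(z)
vals = arrayfun (@(z) abs (sm (z)), gamma_pts);       % |s_m| on the discretized Gamma
[~, idx] = min (vals);                    % the next pole goes where |s_m| is smallest
xi_next  = gamma_pts(idx);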


Chapter 5

Experimental results

5.1 Implementation details

In order to assess the performance of the methods presented so far, we report here the results of some numerical experiments. All of them were implemented in MATLAB 7.14, and the tests were run on a machine equipped with an AMD A8-6410 CPU (2.00 GHz) and 6.7 GiB of RAM.

The first section of this chapter explains how the weighted geometric mean can be computed and describes the matrices and vectors used in our experiments. Sections 5.2 and 5.3 discuss in some detail how each of the methods can be implemented, together with the open challenges which could lead to further improvements.

5.1.1 Computing the weighted geometric mean of matrices

In Chapter 3 and Chapter 4 we addressed the problem of computing f(X^{-1}Y)b and f(XY^{-1})b for some X, Y ∈ H_n and b ∈ C^n. Now we want to apply these approaches to the computation of (A#tB)v, where A, B ∈ P_n and v ∈ C^n. Because of the numerous properties of the weighted geometric mean, there exist several ways to express it in terms of real powers of a matrix. These formulations, theoretically equivalent, can potentially lead to substantial differences in terms of performance, and thus we present several of them, discussing their merits and weaknesses.

Since the only matrix functions we need to compute are matrix powers Z^ρ, with ρ ∈ [0, 1], we simplify the notation by defining the function

g(Z, X, Y, b, ρ) = Z (X^{-1}Y)^ρ b.
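As an example of how the routines of Chapter 4 fit this notation, the following wrapper (a sketch of ours; the name g_krylov and the choices delta = 2 and max_iter = 200 are illustrative) evaluates g by means of the extended Krylov routine gen_kpik of Appendix A.2.

function y = g_krylov (Z, X, Y, b, rho, tol)
% Sketch: evaluate g(Z, X, Y, b, rho) = Z (X^{-1} Y)^rho b with gen_kpik.
  delta    = 2;                     % compare iterates delta steps apart
  max_iter = 200;                   % safeguard on the number of iterations
  f = @(T) T ^ rho;                 % real power of the small reduced matrix
  w = gen_kpik (X, Y, X, b, f, delta, tol, max_iter, 0);   % w ~ (X^{-1} Y)^rho b
  y = Z * w;                        % left factor of the formulation
end

With this notation, method 1 of Table 5.1 below amounts, for instance, to the call g_krylov(A, A, B, v, t, 1e-10).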

Let us note that, especially when M-orthogonalization is an option, the number of possible variants is huge, and it would be unreasonable to try to compare them all.


ID   formulation            M   function
 1   A (A^{-1}B)^t v        A   g(A, A, B, v, t)
 2   A (A^{-1}B)^t v        B   g(A, A, B, v, t)
 3   A (B^{-1}A)^{-t} v     A   g(A, B, A, v, -t)
 4   A (B^{-1}A)^{-t} v     B   g(A, B, A, v, -t)
 5   B (B^{-1}A)^{1-t} v    A   g(B, B, A, v, 1-t)
 6   B (B^{-1}A)^{1-t} v    B   g(B, B, A, v, 1-t)
 7   B (A^{-1}B)^{t-1} v    A   g(B, A, B, v, t-1)
 8   B (A^{-1}B)^{t-1} v    B   g(B, A, B, v, t-1)

Table 5.1: Synoptic table of the implemented methods. The colours used here are the same as in the graphs.

In particular, the numerical simulations of these sections intend to assess how the conditioning and the sparsity pattern of A and B influence the performance. For the sake of completeness, we discuss in the following paragraphs a large number of possible solutions, but limit ourselves to the choices listed in Table 5.1 for the implementations.

Direct methods   Definition (2.5) readily suggests a natural way to apply the algorithms for f(X^{-1}Y)b to the solution of our problem. In fact, by letting X = A, Y = B and b = v, we get that

(A#tB)v = g(A,A,B,v, t). (5.1)

A straightforward variation of this approach, which entails just a few algebraic manipulations, comes from the double inversion of the argument of the matrix power. In fact, if B ∈ GL(n), then one has that A#tB = A (A^{-1}B)^t = A (B^{-1}A)^{-t}, and thus a second way to compute the weighted matrix mean times a vector is

(A#tB)v = g(A,B,A,v,-t).      (5.2)
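The identity used above follows from one line of algebra (a quick check of ours, using only that the inverse of a matrix power is the power of the inverse):

A (A^{-1}B)^t = A ((A^{-1}B)^{-1})^{-t} = A (B^{-1}A)^{-t}.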

Such a modification is clearly favourable when the matrix B is substantially less dense or better conditioned than A, which makes the solution of the linear systems easier. However, it is by no means an advantage for methods which require the solution of linear systems with both A and B as coefficient matrices, as is the case for extended Krylov methods.

Let us note that, when applying any of the Krylov subspace methods with


generalized inner product, in principle we can always choose to build either an A-orthogonal or a B-orthogonal basis. We assess the characteristics of both choices in our experiments.

Commuted methods   Item (e) of Proposition 1 establishes a sort of commutativity of the weighted geometric mean. This property can be effectively exploited in order to devise variations of the methods presented above. Indeed, the commutativity gives that (A#tB)v = (B#1-tA)v and thus that

(A#tB)v = g(B,B,A,v, 1− t).

Again, we can invert the argument of the matrix power to get an equivalent definition, that is

(A#tB)v = g(B,A,B,v, t-1).

This case mirrors the previous one, and hence there is no restriction in considering A-orthonormalization as well as B-orthonormalization.

Other methods   Using the identity Af(A^{-1}B) = f(BA^{-1})A [40, Sect. 1.8], we can develop another pair of methods. A straightforward substitution of this relation into the definition of the weighted matrix mean gives

(A#tB)v = g(I_n, B^{-1}, A^{-1}, Av, t).

Such a formulation is not very helpful, since it contains the inverses of A and B, which in general we cannot afford to compute. The same problem holds if we invert the argument of the real power, since we get

(A#tB)v = g(In,A−1,B−1,Av, 1− t).

5.1.2 Test matrices

Although our investigation is based on a problem originally proposed by Arioli and Loghin [38, 41], we choose not to use the test matrices they consider [53]. In fact, in their case A and B are both 2D Poisson matrices, and thus have the same sparsity pattern and possibly very similar condition numbers, which would make it impossible to understand whether these two characteristics of the input have any bearing upon the performance of the different approaches.


  m      n     nnz(PW)   χ(PW)   κ(PW)      nnz(PI)   χ(PI)   κ(PI)
 30     900     4380     0.005   5.65e+02    11104    0.014   2.40e+05
 40    1600     7840     0.003   9.89e+02    20004    0.008   7.34e+05
 50    2500    12300     0.002   1.53e+03    31504    0.005   1.76e+06
 60    3600    17760     0.001   2.19e+03    45604    0.004   3.60e+06
 70    4900    24220     0.001   2.97e+03    62304    0.003   6.60e+06
 80    6400    31680     0.001   3.87e+03    81604    0.002   1.12e+07
 90    8100    40140     0.001   4.88e+03   103504    0.002   1.78e+07
100   10000    49600     0.000   6.01e+03   128004    0.001   2.70e+07

Table 5.2: Synoptic table of the test matrices. Here nnz(Z) is the number of non-zero entries of Z, χ(Z) is its sparsity, defined as the ratio of nnz(Z) to n^2, and κ(Z) is its 1-norm condition number.

Let us consider two tridiagonal matrices U_+, U_- ∈ C^{m×m} defined by

U_+ = tridiag(1, 2, 1),      U_- = tridiag(-1, 2, -1),

that is, both have the value 2 on the main diagonal, while the first and the second have 1 and -1, respectively, on the two adjacent diagonals.

For our experiments, we consider the standard Poisson discretization matrix on a two-dimensional grid of size m×m, that is PW = U_- ⊗ I_m + I_m ⊗ U_-, and another matrix built in a similar fashion, PI = (U_+ ⊗ I_m + I_m ⊗ U_+)^2.

Both these matrices are of size n = m^2, symmetric and positive definite. Table 5.2 shows how their characteristics vary as m increases. In the table, as well as in our experiments, we stop at m = 100, which gives a matrix of size 10 000 × 10 000. This dimension is within reach for all of the methods we test, but not for our accuracy benchmark, which is the function sharp of Bini and Iannazzo's Matrix Means Toolbox [1]. This function computes the Schur decomposition of one of the two arguments, which is, in general, dense. Hence, its memory requirements are too high for our testing environment, and we limit ourselves to much smaller matrices when an accurate solution is needed.
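The two matrices are easy to reproduce; the following fragment (our own helper, with the name build_test_matrices chosen for illustration) assembles them with Kronecker products, following the definitions above.

function [PW, PI] = build_test_matrices (m)
% Assemble the test matrices PW and PI of size m^2 x m^2.
  e  = ones (m, 1);
  Um = spdiags ([-e 2*e -e], -1:1, m, m);   % U_-: 2 on the diagonal, -1 off it
  Up = spdiags ([ e 2*e  e], -1:1, m, m);   % U_+: 2 on the diagonal, +1 off it
  Im = speye (m);
  PW = kron (Um, Im) + kron (Im, Um);       % 2D Poisson discretization matrix
  PI = (kron (Up, Im) + kron (Im, Up))^2;   % squared analogue, ill-conditioned
end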

5.2 Quadrature methods

We compare here the performance of two of the methods presented in Chapter 3, namely the Gauss–Jacobi (GJ) quadrature formula for (X^{-1}Y)^ρ based on the function (3.7), and the contour integral quadrature of Hale, Higham and Trefethen (HHT) given by equation (3.13).


Figure 5.1: Convergence profiles for the computation of (PW#tPI)v (top row) and (PI#tPW)v (bottom row), for m = 30 and t = 0.8. Panels (a) and (c): Gauss–Jacobi; panels (b) and (d): Hale–Higham–Trefethen.

The Gauss–Jacobi quadrature formulae for the approximation of (X^{-1}Y)^{-ρ} cannot be applied in this case, since they require the inversion of a sum of inverted matrices, an operation which cannot be interpreted as the solution of a set of linear systems.

Let us note that, since there is no use for the matrix M, the methods of Table 5.1 reduce to four, namely 1, 3, 5 and 7. Moreover, not all the remaining methods are suitable for a GJ implementation. In fact, the definition of the Jacobi polynomial J_i^{(α,β)} requires α and β to be greater than -1. Since for the computation of Z^ρ we need to put α = ρ - 1 and β = -ρ, we can consider only exponents belonging to the open interval ]0, 1[. We can further extend this interval by noting that the computation of Z^0 = I_n and Z^1 = Z does not actually require any kind of quadrature technique, and we can conclude that the method works if and only if ρ ∈ [0, 1].


                 (PW#tPI)v                          (PI#tPW)v
    n     pGJ      TGJ        pHHT    THHT      pGJ      TGJ        pHHT    THHT
  900     512     12.390 s      32    1.874 s    512     12.425 s     32    1.939 s
 1600     512     27.645 s      32    3.712 s    512     26.722 s     32    3.723 s
 2500    1024     92.819 s      64   11.718 s    512     45.838 s     32    6.251 s
 3600    1024    161.772 s      64   17.590 s   1024    161.632 s     32    9.456 s
 4900    1024    200.556 s      64   25.956 s   1024    201.365 s     64   27.198 s
 6400    1024    295.545 s      64   35.375 s   1024    293.958 s     64   35.311 s
 8100    1024    417.154 s      64   55.658 s   1024    418.346 s     64   54.240 s
10000    2048    921.537 s      64   58.686 s   1024    460.332 s     64   60.525 s

Table 5.3: Number of nodes (p) and seconds (T) required to compute a solution of prescribed accuracy with GJ and HHT, for t = 0.8.

Let us recall that, by the definition of the weighted geometric mean, t has to be chosen in the same interval [0, 1], and thus no problem arises for method 1. An analogous observation establishes the applicability of method 5 and subsequently shows that the remaining methods would lead either α or β to be smaller than -1, and are thus not valid in this context.

The convergence profiles for GJ and HHT are illustrated in Figure 5.1. The graphs report the evolution of the relative error with respect to the result obtained by invoking the sharp function, which is considered exact. Comparing the two graphs, the most striking feature is the huge difference in convergence speed, which is much greater for HHT than for GJ. In fact, even for a small matrix of size 900 × 900 the latter method requires more than 200 iterations to reach single precision accuracy, whereas the former achieves it in slightly more than 20.

Moreover, let us note that for GJ the choice of the method (1 or 5) does not influence the convergence speed in any way, even though it has some bearing on the execution time: the sparser the matrix, the faster the many linear system solutions and thus the overall execution. On the other hand, the exponent of the matrix power seems to be important not only for the convergence speed but also for the accuracy of the method. Indeed, regarding HHT, we can readily notice that, although there is no difference between methods 1 and 5, as was the case for GJ, method 7 achieves the same accuracy with a few more iterations, whereas method 3, besides being even slower, cannot reach the square root of the machine precision.

To assess the execution time, it is mandatory to choose a stopping criterion. In our examples we simply check the difference between two consecutive iterations


and stop when it goes below a certain tolerance which, for all the experiments of this chapter, we set to 10^{-10}. An iteration of any of the quadrature methods, i.e. the evaluation of the quadrature formula for a given p, is rather expensive, and increasing the number of nodes by one at each step makes the algorithm utterly slow and useless even for matrices much smaller than the ones we are considering. Thus, observing that there is no intermediate result that can be reused between consecutive iterations, we choose not to increase the number of nodes linearly, but to double it at each iteration. This choice, experimentally sound, has one serious drawback, since we double the number of nodes, and thus of matrix-vector operations involved, even when just a few more nodes would have been enough. This is not an impeding problem in our context, since we are mainly interested in assessing the relative performance of the two quadrature techniques for the computation of the weighted matrix mean times a vector, but it is not very handy for real-world implementations. An optimization of the strategy to select the number of nodes for the next iteration is currently under study and might be the subject of future work.
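Schematically, the strategy can be written as follows (a sketch of ours; quadrature_eval stands for a hypothetical handle that evaluates the chosen quadrature formula, GJ or HHT, with p nodes).

function [y, p] = double_nodes (quadrature_eval, p0, tol, p_max)
% Double the number of quadrature nodes until two consecutive evaluations
% differ, in relative terms, by less than tol.
  p = p0;
  y = quadrature_eval (p);
  while (p < p_max)
    p = 2 * p;                                     % double the number of nodes
    y_new = quadrature_eval (p);
    if (norm (y_new - y) / norm (y_new) <= tol)    % relative difference test
      y = y_new;
      return
    end
    y = y_new;
  end
end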

Table 5.3 compares the execution times of the implementations of GJ and HHT tuned as described above. The difference in terms of performance between the two algorithms is remarkable, since HHT never requires more than one fourth of the iterations of GJ, and this reflects on the overall execution time of the algorithm, in line with what is suggested by the results in Figure 5.1. The position of the ill-conditioned matrix seems to have some bearing on the time required by the computation. Indeed, when the ill-conditioned matrix is the second argument, the convergence is slower than in the mirrored case, and the execution time is thus greater.

Therefore, our results do not indicate any use case where GJ is better than HHT, but on the contrary show that the former is impractical even when the matrices involved in the computation are of modest size.

5.3 Krylov subspace methods

This section is devoted to an experimental evaluation of the three Krylov subspace approaches described in Chapter 4. For the sake of brevity, in the remainder we adopt the notation PKSM, EKSM and RKSM for the polynomial, extended and rational Krylov approximations, respectively. Figure 5.2 and Figure 5.3 show the convergence profiles of the 48 variants we examine, grouped by kind of Krylov subspace. The problems considered are the computation of (PW#tPI)v and (PI#tPW)v, for m = 30 and thus n = 900.

Let us start by examining Figure 5.2. For the three cases on the left, the eight methods exhibit different behaviours, but are always convergent and apparently stable once the solution is reached.


             Polynomial             Extended              Rational
    n       m         T           m          T          m         T
  900     272      24.396 s      39      1.165 s       27      0.530 s
 1600     436      98.789 s      54      3.473 s       29      1.092 s
 2500     539     200.106 s      68      8.860 s       30      1.927 s
 3600     597     296.708 s      81     19.240 s       33      3.366 s
 4900     672     459.632 s      93     38.056 s       32      4.124 s
 6400     704     586.150 s     106     71.763 s       36      6.714 s
 8100     756     940.443 s     117    119.562 s       34     10.022 s
10000     800    1358.198 s     128    192.617 s       33     11.559 s

Table 5.4: Number of iterations (m) and seconds (T) required to compute a solution of prescribed accuracy with PKSM, EKSM and RKSM, for t = 0.8.

Moreover, for each kind of Krylov subspace, it is easy to group the methods into two classes which show different convergence speeds (Figure 5.2a) or different accuracies (Figure 5.2c and Figure 5.2e). In fact, let us note that the four worst methods are the ones which require inverting the ill-conditioned matrix PI. The remaining four have comparable profiles, with the only exception of method 2 in Figure 5.2a. This seems to indicate that inverting the better conditioned matrix has benefits not only in terms of final accuracy, but also in terms of stability and convergence properties. A comparison with the first column of Figure 5.3 strengthens this impression, since the less satisfying methods become in general the most promising by just switching the first and the second operand of the weighted mean.

Let us now focus our attention on the right column of the two illustrations, which shows the Lanczos-like version of the corresponding method on the left. It is immediate to see that the behaviour of these implementations is generally worse than that of their counterparts. In fact, no extended Krylov subspace method converges, four of the rational ones start diverging after having reached the solution, and none of them achieves the accuracy of the Arnoldi version. Only the performances of the algorithms in Figure 5.2b and Figure 5.3b are actually comparable, but the extremely slow convergence rate makes this Lanczos-like algorithm of limited practical interest. Hence, we choose to consider just the algorithms with full orthogonalization, i.e. the ones on the left, to try to determine which one is the best from a performance standpoint.

In Table 5.4, we report the execution times of the three implementations of method 8, which seems to be the best overall. We examine the computation of (PW#tPI)v on matrices whose size ranges from 900 to 10 000. For the sake of


comparison, we also put in the table the execution time of the dense method, which solves the problem by means of the trivial MATLAB line

sharp (full (A), full(B), rho) * v

where A = PW, B = PI and v is a vector of size n whose entries are all ones.

The picture that the table gives is clear. The rational Krylov method with adaptive pole selection is by far the best choice for our test matrices, and is the fastest in terms of both number of iterations and execution time. EKSM is considerably slower than RKSM on our test set. Let us stress that at each iteration the algorithm adds two new vectors, so that the dimension of the space generated in m steps is 2m, which implies that the space needed by EKSM is up to almost eight times larger than that required by RKSM. Finally, PKSM is of no use for our problem, since it requires considerably more time than the dense approach of sharp.


Figure 5.2: Convergence profiles of our implementations for the computation of (PW#tPI)v. Panels: (a) Polynomial Krylov, (b) Polynomial Krylov (short), (c) Extended Krylov, (d) Extended Krylov (short), (e) Rational Krylov, (f) Rational Krylov (short). The implementations on the left use full reorthogonalization, whereas the ones on the right exploit Lanczos-like short recurrences.


Figure 5.3: Convergence profiles of our implementations for the computation of (PI#tPW)v. Panels: (a) Polynomial Krylov, (b) Polynomial Krylov (short), (c) Extended Krylov, (d) Extended Krylov (short), (e) Rational Krylov, (f) Rational Krylov (short). The implementations on the left use full reorthogonalization, whereas the ones on the right exploit Lanczos-like short recurrences.


Chapter 6

Conclusion

In this thesis, we consider a matrix operation which arises in the domain decomposition of elliptic problems, namely the product of the weighted geometric mean of two large positive definite matrices and a vector. First, we derive several algorithms, based on quadrature formulae and Krylov subspace methods, and illustrate their merits and weaknesses, pinpointing a few straightforward optimizations. Then, we face the challenges posed by the implementation and compare their performance in terms of convergence speed and execution time.

According to our results, although both families of techniques are well suited for the solution of the problem, the Krylov subspace approach appears to be superior. The rational Krylov method is the best choice regardless of the size of the input matrices. It is more than five times faster than the best quadrature formula, based on contour integrals, and seventeen times faster than the third best solution, which exploits the extended Krylov subspace. The remaining methods we analyse are considerably slower and thus of little interest for the computation we are concerned with.

Nevertheless, we are convinced that optimized implementations of the quadrature techniques are likely to make them much more attractive. In fact, these methods have the great advantage that the number of nodes does not need to be increased little by little, as is the case for the Krylov subspace methods, and a heuristic to determine, in a single attempt, the number of nodes which gives the required precision could lead to a considerable speed-up. In particular, preliminary evaluations show that the GJ approach with such an optimization could achieve an execution time comparable with that of the extended Krylov subspace method, but with considerably lower memory requirements.

Finally, an aspect which could be of interest but has not been addressed in this thesis is a theoretical analysis of the convergence of the methods we propose. Clearly, the convergence rate alone does not determine the performance of the implementations, since the cost per iteration, which is just as important


in determining the final execution time, varies greatly among the considered methods. Nevertheless, this latter value can be estimated a priori, and this knowledge can help in foreseeing the expected runtime of an algorithm, and thus in determining without simulations the fastest method for a given problem.


Appendix A

MATLAB implementations

A.1 Polynomial Krylov method

function y = gen_arnoldi (X, Y, M, v, f, delta, tol, max_iter, symm)
% function y = gen_arnoldi (X, Y, M, v, f, delta, tol, max_iter, symm)
%
% Compute an M-orthogonal basis of the generalized Krylov subspace
%   K_k(X, Y, v) = {v, X \ (Y v), ..., (X \ Y)^k v}
% as well as the generalized Arnoldi decomposition of X and Y.
% The argument symm can be either 0 (use the Arnoldi algorithm)
% or 1 (use the Lanczos-like recursion).

n = size (X, 1);
n_reorth = 2;
itol = 1e-15;

% Choose between Arnoldi and Lanczos
if (symm == 1)
  k_max = 4;
else
  k_max = n;
end

k1 = max_iter + 1;
V = zeros (n, k1);
H = zeros (k1, max_iter);
D = zeros (n, max_iter);
[LX, UX] = lu (X);

% First column of V
V(:, 1) = v / sqrt (v' * M * v);

for j = 1 : max_iter
  j1 = j + 1;
  it = 0;
  t = 0.;
  V(:, j1) = UX \ (LX \ (Y * V(:, j)));
  i_min = max (1, j - k_max);
  while (it < n_reorth)
    it = it + 1;
    for i = i_min : j
      t = V(:, i)' * M * V(:, j1);
      H(i, j) = H(i, j) + t;
      V(:, j1) = V(:, j1) - t * V(:, i);
    end
    t = sqrt (V(:, j1)' * M * V(:, j1));
  end
  H(j1, j) = t;

  % Normalize
  if (t > itol)
    V(:, j1) = V(:, j1) / t;
  end

  % Compute f(H)
  fH = f (H(1 : j, 1 : j));
  D(:, j) = V(:, 1 : j) * (fH(:, 1) * sqrt (v' * M * v));

  % Estimate the residual during the last step and return the current y
  if (j >= delta + 1)
    dt = norm (D(:, j) - D(:, j - delta)) / norm (D(:, j - delta));
    if (abs (dt / (1 - dt)) <= tol)
      y = D(:, j);
      return
    end
  end
end

% Return the last approximation if the tolerance was not met
y = D(:, max_iter);


A.2 Extended Krylov method

function y = gen_kpik (X, Y, M, v, f, delta, tol, max_iter, symm)
% function y = gen_kpik (X, Y, M, v, f, delta, tol, max_iter, symm)
%
% Compute an M-orthogonal basis of the generalized Krylov plus
% inverted Krylov subspace
%   K_k(X, Y, v) = {v, Y \ (X v), X \ (Y v), ..., (Y \ X)^(k + 1) v}
% as well as the generalized extended Arnoldi decomposition of X and Y.
% The argument symm can be either 0 (use the Arnoldi algorithm)
% or 1 (use the Lanczos-like recursion).

n = size (v, 1);
n_reorth = 2;

% Compute the maximum number of steps
m_max = min (floor (n / 2), max_iter);
m_max2 = 2 * m_max;
m_max22 = m_max2 + 2;

if (symm == 1)
  k_max = 4;
else
  k_max = m_max2;
end

[LP, UP] = lu (X);
[LQ, UQ] = lu (Y);

v1 = v;
v2 = UQ \ (LQ \ (X * v));
Vp = [v1 v2];
Vp = Vp / chol (Vp' * M * Vp);
V(:, 1 : 2) = Vp;

T = zeros (m_max22, m_max2);
D = zeros (n, m_max);

% Choose the correct term for the computation of T
MmVm = zeros (n, m_max2);
if (M == X)
  MmVm(:, 1 : 2) = Y * V(:, 1 : 2);
else
  MmVm(:, 1 : 2) = M * (UP \ (LP \ (Y * V(:, 1 : 2))));
end
T(1 : 2, 1 : 2) = V(:, 1 : 2)' * MmVm(:, 1 : 2);

for j = 1 : m_max
  j2 = j * 2;
  j22 = j2 + 2;
  j2m1 = j2 - 1;
  j21 = j2 + 1;

  Vp(:, 1) = UP \ (LP \ (Y * V(:, j2m1)));
  Vp(:, 2) = UQ \ (LQ \ (X * V(:, j2)));

  % Extend the orthogonal basis
  for l = 1 : n_reorth
    k_min = max (1, j - k_max);
    for kk = k_min : j
      k1 = (kk - 1) * 2 + 1;
      k2 = kk * 2;
      coef = V(1 : n, k1 : k2)' * M * Vp;
      Vp = Vp - V(:, k1 : k2) * coef;
    end
  end

  Vp = Vp / chol (Vp' * M * Vp);
  V(:, j21 : j22) = Vp;

  % Incremental update of T
  if (j ~= m_max)
    if (M == X)
      Vc = Y * V(:, j21 : j22);
    else
      Vc = M * (UP \ (LP \ (Y * V(:, j21 : j22))));
    end
    T(1 : j2, j21 : j22) = V(:, 1 : j2)' * Vc;
    T(j21 : j22, 1 : j2) = V(:, j21 : j22)' * MmVm(:, 1 : j2);
    T(j21 : j22, j21 : j22) = V(:, j21 : j22)' * Vc;
    MmVm(:, j21 : j22) = Vc;
  end

  fT = f (T(1 : j2, 1 : j2));
  D(:, j) = V(:, 1 : j2) * (fT(:, 1) * sqrt (v' * M * v));

  % Estimate the residual during the last step and return the current y
  if (j >= delta + 1)
    dt = norm (D(:, j) - D(:, j - delta)) / norm (D(:, j - delta));
    if (abs (dt / (1 - dt)) <= tol)
      y = D(:, j);
      return
    end
  end
end

% Return the last approximation if the tolerance was not met
y = D(:, m_max);


Bibliography

[1] D. Bini and B. Iannazzo. The Matrix Means Toolbox. http://bezout.dm.unipi.it/software/mmtoolbox/. Last accessed: 08-03-2015.

[2] G. Szegő. Orthogonal Polynomials. American Mathematical Society Colloquium Publications, vol. 23. American Mathematical Society, 1939.

[3] E. Artin. The Gamma Function. Athena Series: Selected Topics in Mathematics. Holt, Rinehart and Winston, 1964.

[4] J. L. Walsh. Interpolation and Approximation by Rational Functions in the Complex Domain. Providence: American Mathematical Society, 1965.

[5] J. R. Rice. "Experiments on Gram–Schmidt Orthogonalization". Mathematics of Computation 20 (1966), pp. 325–328.

[6] A. Stroud and D. Secrest. Gaussian Quadrature Formulas. Prentice-Hall Series in Automatic Computation. Englewood Cliffs, NJ: Prentice-Hall, 1966.

[7] G. H. Golub and J. H. Welsch. "Calculation of Gauss Quadrature Rules". Mathematics of Computation 23.106 (1969), pp. 221–230.

[8] N. N. Abdelmalek. "Round off Error Analysis for Gram–Schmidt Method and Solution of Linear Least Squares Problems". BIT Numerical Mathematics 11.4 (1971), pp. 345–367.

[9] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. Tenth Edition. New York: Dover Publications Inc., 1972.

[10] A. Gourlay and G. Watson. Computational Methods for Matrix Eigenproblems. Wiley, 1973.

[11] C. Curtis. Linear Algebra: An Introductory Approach. Allyn and Bacon, 1974.

[12] A. Ralston and P. Rabinowitz. A First Course in Numerical Analysis. Second Edition. Mineola, New York: Dover Publications, 1978.

[13] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration. Second Edition. Dover Publications, 1984.


[14] A. Ruhe. "Rational Krylov Sequence Methods for Eigenvalue Computation". Linear Algebra and its Applications 58 (1984), pp. 391–405.

[15] P. Lancaster and M. Tismenetsky. The Theory of Matrices: With Applications. Computer Science and Scientific Computing Series. Academic Press, 1985.

[16] W. Rudin. Real and Complex Analysis. Third Edition. New York, USA: McGraw-Hill, Inc., 1987.

[17] R. Remmert and R. B. Burckel. Theory of Complex Functions. Graduate Texts in Mathematics. Translation of: Funktionentheorie I. 2nd ed. New York, Berlin: Springer-Verlag, 1991.

[18] S. Lang. Complex Analysis. Graduate Texts in Mathematics. Springer, 1993.

[19] R. Barrett et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. 1994.

[20] A. Ruhe. "Rational Krylov Algorithms for Nonsymmetric Eigenvalue Problems. II. Matrix Pairs". Linear Algebra and its Applications 197–198 (1994), pp. 283–295.

[21] T. A. Driscoll. "Algorithm 756: A MATLAB Toolbox for Schwarz–Christoffel Mapping". ACM Trans. Math. Softw. 22.2 (1996), pp. 168–186.

[22] G. H. Golub and C. F. Van Loan. Matrix Computations. Third Edition. Baltimore, MD, USA: Johns Hopkins University Press, 1996.

[23] N. J. Higham. "Stable Iterations for the Matrix Square Root". Numerical Algorithms 15.2 (1997), pp. 227–242.

[24] L. Trefethen and D. Bau. Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.

[25] V. Druskin and L. Knizhnerman. "Extended Krylov Subspaces: Approximation of the Matrix Square Root and Related Functions". SIAM Journal on Matrix Analysis and Applications 19.3 (1998), pp. 755–771.

[26] K. Petras. "On the Computation of the Gauss–Legendre Quadrature Formula with a Given Precision". Journal of Computational and Applied Mathematics 112.1–2 (1999), pp. 253–267.

[27] T. Driscoll and L. Trefethen. Schwarz–Christoffel Mapping. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2002.

[28] L. Giraud and J. Langou. "When Modified Gram–Schmidt Generates a Well-conditioned Set of Vectors". IMA Journal of Numerical Analysis 22.4 (2002), pp. 521–528.


[29] N. Higham. Accuracy and Stability of Numerical Algorithms. Second Edition. Society for Industrial and Applied Mathematics, 2002.

[30] Y. Saad. Iterative Methods for Sparse Linear Systems. Second Edition. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2003.

[31] T. A. Driscoll. "Algorithm 843: Improvements to the Schwarz–Christoffel Toolbox for MATLAB". ACM Trans. Math. Softw. 31.2 (June 2005), pp. 239–251.

[32] J. van den Eshof and M. Hochbruck. "Preconditioning Lanczos Approximations to the Matrix Exponential". SIAM Journal on Scientific Computing 27.4 (2006), pp. 1438–1457.

[33] R. E. Greene and S. G. Krantz. Function Theory of One Complex Variable. Third Edition. Vol. 40. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2006.

[34] M. Haase. The Functional Calculus for Sectorial Operators. Vol. 169. Operator Theory: Advances and Applications. Birkhäuser Basel, 2006.

[35] R. Bhatia. Positive Definite Matrices. Princeton Series in Applied Mathematics. Princeton, NJ, USA: Princeton University Press, 2007.

[36] A. Glaser, X. Liu, and V. Rokhlin. "A Fast Algorithm for the Calculation of the Roots of Special Functions". SIAM Journal on Scientific Computing 29.4 (June 2007), pp. 1420–1438.

[37] V. Simoncini. "A New Iterative Method for Solving Large-Scale Lyapunov Matrix Equations". SIAM Journal on Scientific Computing 29.3 (2007), pp. 1268–1288.

[38] M. Arioli and D. Loghin. Discrete Fractional Sobolev Norms for Domain Decomposition Preconditioning. Tech. rep. RAL-TR-2008-031. Science and Technology Facilities Council, 2008.

[39] N. Hale, N. J. Higham, and L. N. Trefethen. "Computing A^α, log(A) and Related Matrix Functions by Contour Integrals". SIAM Journal on Numerical Analysis 46.5 (2008), pp. 2505–2523.

[40] N. J. Higham. Functions of Matrices: Theory and Computation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2008.

[41] M. Arioli and D. Loghin. "Discrete Interpolation Norms with Applications". SIAM Journal on Numerical Analysis 47.4 (2009), pp. 2924–2951.

[42] C. Jagels and L. Reichel. "Recursion Relations for the Extended Krylov Subspace Method". Linear Algebra and its Applications 431.3–4 (July 2009), pp. 441–458.


[43] C. Jagels and L. Reichel. "The Extended Krylov Subspace Method and Orthogonal Laurent Polynomials". Linear Algebra and its Applications 431.3–4 (July 2009), pp. 441–458.

[44] V. Druskin, C. Lieberman, and M. Zaslavsky. "On Adaptive Choice of Shifts in Rational Krylov Subspace Reduction of Evolutionary Problems". SIAM Journal on Scientific Computing 32.5 (2010), pp. 2485–2496.

[45] S. Güttel. "Rational Krylov Methods for Operator Functions". PhD thesis. Technische Universität Bergakademie Freiberg, 2010.

[46] L. Knizhnerman and V. Simoncini. "A New Investigation of the Extended Krylov Subspace Method for Matrix Function Evaluations". Numerical Linear Algebra with Applications 17.4 (2010), pp. 615–638.

[47] V. Druskin and V. Simoncini. "Adaptive Rational Krylov Subspaces for Large-Scale Dynamical Systems". Systems & Control Letters 60.8 (2011), pp. 546–560.

[48] S. Güttel and L. Knizhnerman. "Automated Parameter Selection for Rational Arnoldi Approximation of Markov Functions". PAMM 11.1 (2011), pp. 15–18.

[49] S. Güttel. "Rational Krylov Approximation of Matrix Functions: Numerical Methods and Optimal Pole Selection". GAMM-Mitteilungen 36.1 (2013), pp. 8–31.

[50] N. Hale and A. Townsend. "Fast and Accurate Computation of Gauss–Legendre and Gauss–Jacobi Quadrature Nodes and Weights". SIAM Journal on Scientific Computing 35.2 (2013), A652–A674.

[51] J. Lawson and Y. Lim. "Weighted Means and Karcher Equations of Positive Operators". Proceedings of the National Academy of Sciences USA 110.39 (2013), pp. 15626–15632.

[52] T. A. Driscoll, N. Hale, and L. N. Trefethen. Chebfun Guide. Pafnuty Publications, 2014.

[53] D. Loghin. Personal communication.