
Eindhoven University of Technology

MASTER

Approximation of inverses of BTTB matrices for preconditioning applications

Schneider, F.S.

Award date: 2017

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.


APPROXIMATION OF INVERSES OF BTTB

MATRICES

for Preconditioning Applications

M A S T E R T H E S I S

by

Frank Schneider

December 2016

Dr. Maxim Pisarenco
Department of Research

ASML Netherlands B.V., Veldhoven

Dr. Michiel Hochstenbach
Department of Mathematics and Computer Science

Technische Universiteit Eindhoven (TU/e)

Prof. Dr. Bernard Haasdonk
Institute of Applied Analysis and Numerical Simulation

Universität Stuttgart


APPROXIMATION OF INVERSES

of

BTTB MATRICES


for Preconditioning Applications

Frank Schneider

December 2016

Submitted in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Industrial and Applied Mathematics (IAM)

to the

Department of Mathematics and Computer Science
Technische Universiteit Eindhoven (TU/e)

as well as for the degree of Master of Science (M.Sc.) in Simulation Technology

to the

Institute of Applied Analysis and Numerical Simulation
Universität Stuttgart

The work described in this thesis has been carried out under the auspices of ASML Netherlands B.V. - Veldhoven, The Netherlands.


A B S T R A C T

The metrology of integrated circuits (ICs) requires multiple solutions of a large-scale linear system. The time needed for solving this system largely determines the number of chips that can be processed per unit of time.

Since the coefficient matrix is partly composed of block-Toeplitz-Toeplitz-block (BTTB) matrices, approximations of its inverse are interesting candidates for a preconditioner.

In this work, different approximation techniques, such as approximation by sums of Kronecker products or approximation by inverting the corresponding generating function, are examined and, where necessary, generalized for BTTB and BTTB-block matrices. The computational complexity of each approach is assessed and its utilization as a preconditioner evaluated.

The performance of the discussed preconditioners is investigated for a number of test cases stemming from real-life applications.



A C K N O W L E D G E M E N T

First and foremost, I wish to thank my supervisor from ASML, Maxim Pisarenco. Maxim has supported me not only by providing valuable feedback over the course of the thesis, but also by always being there to answer all my questions. He guided the thesis while allowing me the freedom to explore the areas that tempted me the most.

I also want to thank my supervisor from the TU/e, Michiel Hochstenbach, who was an excellent resource of knowledge, academically and emotionally. Thank you for all the helpful feedback, not only regarding the work and the thesis, but also regarding future plans.

I owe thanks to the members of my thesis committee: professors Barry Koren and Martijn van Beurden from the TU/e and professor Bernard Haasdonk from the University of Stuttgart. Thank you for your valuable guidance and insightful comments.

Thank you very much, everyone!

Frank Schneider

Eindhoven, December 28, 2016.



C O N T E N T S

i introduction 1

1 motivation 3

1.1 Photolithography . . . . . . . . . . . . . . . . . . . . . . 4

1.1.1 Metrology . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Other Applications . . . . . . . . . . . . . . . . . . . . . 8

1.2.1 Deblurring Images . . . . . . . . . . . . . . . . . 9

1.2.2 Further Applications . . . . . . . . . . . . . . . . 11

2 linear systems 13

2.1 Iterative Solvers . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.1 CG Method . . . . . . . . . . . . . . . . . . . . . 13

2.1.2 Other Methods . . . . . . . . . . . . . . . . . . . 16

2.2 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . 17

3 toeplitz systems 19

3.1 Multi-level Toeplitz Matrices . . . . . . . . . . . . . . . 21

3.2 Circulant Matrices . . . . . . . . . . . . . . . . . . . . . 23

3.3 Hankel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 problem description 25

4.1 Full Problem . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.2 BTTB-Block System . . . . . . . . . . . . . . . . . . . . . 28

4.3 BTTB System . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 thesis overview 31

ii preconditioners 33

6 overview of the preconditioning techniques 35

7 full c preconditioner 37

7.1 Application to Full Problem . . . . . . . . . . . . . . . . 37

7.1.1 Inversion . . . . . . . . . . . . . . . . . . . . . . . 37

7.1.2 MVP . . . . . . . . . . . . . . . . . . . . . . . . . 37

8 circulant approximation 39

8.1 Circulant Approximation for Toeplitz Matrices . . . . 39

8.1.1 Circulant Preconditioners . . . . . . . . . . . . . 40

8.2 Circulant Approximation for BTTB Matrices . . . . . . 42

8.2.1 Toeplitz-block Matrices . . . . . . . . . . . . . . 43

8.2.2 Block-Toeplitz Matrices . . . . . . . . . . . . . . 43

8.3 Application to BTTB-block Matrices . . . . . . . . . . . 45

8.3.1 Inversion . . . . . . . . . . . . . . . . . . . . . . . 45

8.3.2 MVP . . . . . . . . . . . . . . . . . . . . . . . . . 46

9 inverse generating function approach 47

9.1 Inverse Generating Function for Toeplitz and BTTB Matrices . . . . . . . . . . . . . . . . . . . . . . . 47



9.1.1 Unknown Generating Function . . . . . . . . . . 48

9.1.2 Numerical Integration for Computing the Fourier Coefficients . . . . . . . . . . . . . . . . 49

9.1.3 Numerical Inversion of the Generating Function 50

9.1.4 Example . . . . . . . . . . . . . . . . . . . . . . . 51

9.1.5 Efficient Inversion and MVP . . . . . . . . . . . 51

9.2 Inverse Generating Function for BTTB-block Matrices . 52

9.2.1 General Approach . . . . . . . . . . . . . . . . . 53

9.2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . 53

9.2.3 Proof of Clustering of the Eigenvalues . . . . . . 57

9.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . 63

9.3 Regularizing Functions . . . . . . . . . . . . . . . . . . . 63

9.4 Numerical Experiments . . . . . . . . . . . . . . . . . . 65

9.4.1 Convergence of the IGF . . . . . . . . . . . . . . 65

9.4.2 IGF for a BTTB-block Matrix . . . . . . . . . . 66

10 kronecker product approximation 69

10.1 Optimal Approximation for BTTB Matrices . . . . . . . 69

10.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . 71

10.1.2 Inverse and MVP . . . . . . . . . . . . . . . . . . 71

10.2 BTTB-block Matrices . . . . . . . . . . . . . . . . . . . . 73

10.2.1 One Term Approximation . . . . . . . . . . . . . 75

10.2.2 Multiple Terms Approximation . . . . . . . . . . 77

10.3 Numerical Experiments . . . . . . . . . . . . . . . . . . 79

10.3.1 Convergence of the Kronecker Product Approximation . . . . . . . . . . . . . . . . . . . . 79

10.3.2 Decay of Singular Values . . . . . . . . . . . . . 80

10.3.3 Relation to the Generating Function . . . . . . . 80

11 more ideas 83

11.1 Transformation Based Preconditioners . . . . . . . . . . 83

11.1.1 Discrete Sine and Cosine Transform . . . . . . . 83

11.1.2 Hartley Transform . . . . . . . . . . . . . . . . 84

11.2 Banded Approximations . . . . . . . . . . . . . . . . . . 85

11.3 Koyuncu Factorization . . . . . . . . . . . . . . . . . . . 86

11.4 Low-Rank Update . . . . . . . . . . . . . . . . . . . . . . 88

iii benchmarks 91

12 benchmarks 93

12.1 Transformation-based Preconditioner . . . . . . . . . . 99

12.2 Kronecker Product Approximation . . . . . . . . . . . 100

12.3 Inverse Generating Function . . . . . . . . . . . . . . . . 101

12.4 Banded Approximation . . . . . . . . . . . . . . . . . . 103

iv conclusion 105

13 future work 107

13.1 Inverse Generating Function . . . . . . . . . . . . . . . . 107

13.1.1 Regularization . . . . . . . . . . . . . . . . . . . . 107

13.1.2 Other Kernels . . . . . . . . . . . . . . . . . . . . 107



13.2 Kronecker Product Approximation . . . . . . . . . . . 108

13.2.1 Using a Common Basis . . . . . . . . . . . . . . 108

13.3 Preconditioner Selection . . . . . . . . . . . . . . . . . . 108

14 conclusion 111

v appendix 113

a inversion formulas for kronecker product approximation 115

a.1 One Term Approximation . . . . . . . . . . . . . . . . . 115

a.1.1 Sum Approximation . . . . . . . . . . . . . . . . 115

a.2 Multiple Terms Approximation . . . . . . . . . . . . . . 119

a.2.1 Sum Approximation . . . . . . . . . . . . . . . . 119

bibliography 123



L I S T O F F I G U R E S

Figure 1.1 Moore’s law. . . . . . . . . . . . . . . . . . . . . 3

Figure 1.2 Photolithographic process. . . . . . . . . . . . . 4

Figure 1.3 Close-up of a wafer. . . . . . . . . . . . . . . . . 5

Figure 1.4 Effect of focus on the gratings. . . . . . . . . . 6

Figure 1.5 Indirect grating measurement. . . . . . . . . . . 6

Figure 1.6 Shape parameters for a trapezoidal grating. . . 7

Figure 1.7 Example for a PSF. . . . . . . . . . . . . . . . . 10

Figure 1.8 Blurring problem. . . . . . . . . . . . . . . . . . 10

Figure 2.1 Minimization function. . . . . . . . . . . . . . . 14

Figure 2.2 Convergence of gradient descent and conjugate gradient (CG) method for different functions φ. 15

Figure 2.3 Preconditioner trade off. . . . . . . . . . . . . . 18

Figure 4.1 Sparsity patterns of the matrices C, G and G as well as the resulting matrix A. . . . . . . . . 25

Figure 4.2 Sparsity pattern of C. . . . . . . . . . . . . . . . 26

Figure 4.3 Color plots of all levels of C. . . . . . . . . . . 27

Figure 8.1 Color plots for a Toeplitz-block matrix and its circulant-block approximation. . . . . . . . . . 43

Figure 8.2 Color plots for a block-Toeplitz matrix and its block-circulant approximation. . . . . . . . . . 44

Figure 9.1 Illustration of the inverse generating function approach (marked in red). . . . . . . . . . . . . 48

Figure 9.2 Illustration of the inverse generating function approach for unknown generating functions, with the changes marked in red. . . . . . . . . 49

Figure 9.3 Illustration of the inverse generating function approach with numerical integration (highlighted in red). . . . . . . . . . . . . . . . . . . . . . . 50

Figure 9.4 Illustration of the inverse generating functionapproach using a sampled generating function. 51

Figure 9.5 Color plots for the inverse of the original BTTB matrix, T[f]^{-1}, the result of the inverse generating function method T[1/f] and the difference between those two. . . . . . . . . . . . . . . . 52

Figure 9.6 Illustration of the inverse generating functionfor Toeplitz-block matrices. . . . . . . . . . . . 53

Figure 9.7 Color plots for the inverse of the original 2×2 BTTB-block matrix, T[F(x,y)]^{-1}, the result of the inverse generating function method T[1/F(x,y)] and the difference of those two. . . . . . . . . . 64



Figure 9.8 Degrees of regularization. . . . . . . . . . . . . 65

Figure 9.9 Convergence of the inverse generating function (IGF) method towards the exact inverse. . 66

Figure 9.10 Distribution of eigenvalues for the IGF. . . . . 68

Figure 10.1 Relative difference of the Kronecker product approximation (using all terms) and the original BTTB matrix, for 500 randomly created test cases. . . . . . . . . . . . . . . . . . . . . . . . 80

Figure 10.2 Decay of the singular values of a sample test case. 81

Figure 10.3 Relation of the Kronecker product approximation and the generating functions (taken from test case 1b). . . . . . . . . . . . . . . . . . . . . 82

Figure 10.4 Convergence of the Generating Function. . . . 82

Figure 11.1 Color plots for a BTTB matrix and the approximation resulting from discrete sine transform (DST) II. . . . . . . . . . . . . . . . . . . . . . . 84

Figure 11.2 Color plots for a BTTB matrix and the approximation resulting from discrete cosine transform (DCT) II. . . . . . . . . . . . . . . . . . . . . . 84

Figure 11.3 Color plots for a BTTB matrix and the approximation resulting from a Hartley transformation. . 85

Figure 11.4 Color plots for a BTTB matrix and a tridiagonal approximation on both levels. . . . . . . . . . . 85

Figure 11.5 Relative difference of G_M and U_k S_k V_k^H for different values of k and four different test cases. 89

Figure 12.1 Box plots for the relative speed-up of each preconditioner compared to the circulant preconditioner. . . . . . . . . . . . . . . . . . . . . . . . . 98



L I S T O F TA B L E S

Table 4.1 Structure of C on each level, from highest (level Z) to lowest (level X). . . . . . . . . . . . . . . 27

Table 4.2 Convergence rates (number of iterations) for (4.1) using induced dimension reduction (IDR(6)). 28

Table 6.1 Applicability of different preconditioning methods. . . . . . . . . . . . . . . . . . . . . . . . . 35

Table 9.1 The average number of iterations for a 3×3 BTTB matrix. . . . . . . . . . . . . . . . . . . . . . . 68

Table 12.1 Color-code for the tables in the benchmark chapter. . . . . . . . . . . . . . . . . . . . . . . . . . 96

Table 12.2 Number of iterations if a selected preconditioner is used on a certain test case. . . . . . . . 97

Table 12.3 Number of iterations for transformation-based preconditioners. . . . . . . . . . . . . . . . . . . 99

Table 12.4 Number of iterations for preconditioners based on the Kronecker product approximation. . . 100

Table 12.5 Number of iterations for preconditioners based on the Kronecker product approximation with approximate singular value decomposition (SVD). 102

Table 12.6 Number of iterations for preconditioners based on the IGF. . . . . . . . . . . . . . . . . . . . . . 103

Table 12.7 Number of iterations for preconditioners based on banded approximations. . . . . . . . . . . . 104



N O M E N C L AT U R E

vectors

x: The vector x = (x_1, x_2, . . . , x_n)^T with the elements x_1, x_2, · · · ∈ C.

Vector Norms

||x||_p = (∑_{i=1}^{n} |x_i|^p)^{1/p} is the p-norm of x for 1 ≤ p ≤ ∞.

||x||_∞ = max_{1≤i≤n} |x_i| is the ∞-norm.

matrices

A(n): is a matrix of size n × n, with the entries (A)_{i,j} = A_{i,j} = a_{i,j} for i, j = 1, . . . , n.

A(n1;n2)_{i,j;k,l}: is referring to the (k, l)-th entry of the (i, j)-th block of matrix A, which has the size (n1 · n2) × (n1 · n2). A_{i,j;:,:} or A_{i,j;} is consequently referring to the (i, j)-th block matrix of A.

A(name): endows the matrix with a certain name that helps in understanding its purpose.

T_i: A matrix with just one index is referring to an entry of a Toeplitz, circulant or similar matrix that can be described with just a few entries. This is also true, for example, for two-level Toeplitz matrices, where one entry can be referred to as T_{i;j}. Note that i and j can be negative, following the specific nomenclature of the class of matrix.
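This single-index convention can be illustrated in code (a sketch assuming the common convention that entry (T)_{i,j} of a Toeplitz matrix depends only on the difference i − j; the helper name is hypothetical):

```python
import numpy as np

def toeplitz_from_entries(t, n):
    """Build an n-by-n Toeplitz matrix T with (T)_{i,j} = t_{i-j},
    where t maps the indices -(n-1), ..., n-1 to scalar entries."""
    T = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            T[i, j] = t[i - j]
    return T

# Entries t_k for k = -2, ..., 2 describe the whole 3x3 matrix.
t = {-2: 5.0, -1: 4.0, 0: 1.0, 1: 2.0, 2: 3.0}
T = toeplitz_from_entries(t, 3)
# Every diagonal of T is constant: t_0 on the main diagonal,
# t_1 and t_2 below it, t_{-1} and t_{-2} above it.
```

Only 2n − 1 numbers thus determine the full n × n matrix, which is what makes the single-index notation T_i well defined.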

Furthermore:

I is the identity matrix, where Ii,j = δij.

A^T is the transposed matrix of A, where (A^T)_{i,j} = A_{j,i}.

A^{-1} is the inverse matrix of A, where A A^{-1} = A^{-1} A = I.



A^H is the conjugate transposed matrix of A, where (A^H)_{i,j} is the complex conjugate of A_{j,i}; the complex conjugate of a + bi is a − bi.

A square matrix is called symmetric if A = A^T.

A matrix A is called positive definite if x^H A x is real and positive for all non-zero vectors x.

A matrix A is called Hermitian or self-adjoint if A = AH.

λ_k denotes the k-th eigenvalue of A, where A v = λ_k v for a corresponding eigenvector v.

κ(A) is the condition number of A, which is defined as κ(A) = ||A^{-1}|| · ||A|| (usually using the 2-norm).

Matrix Norms

||A||_p = sup_{x ≠ 0} ||Ax||_p / ||x||_p, the matrix norm induced by the vector norm ||x||_p. In particular:

||A||_1 = max_{1≤j≤n} ∑_{i=1}^{m} |a_{i,j}|, which is simply the maximum absolute column sum of the matrix.

||A||_∞ = max_{1≤i≤m} ∑_{j=1}^{n} |a_{i,j}|, which is simply the maximum absolute row sum of the matrix.

||A||_2 = √(λ_max(A^H A)).

||A||_F = (∑_{i=1}^{m} ∑_{j=1}^{n} |a_{i,j}|^2)^{1/2}, called the Frobenius norm.

miscellaneous

δ_{ij}: the Kronecker delta, with δ_{ij} = 0 if i ≠ j and δ_{ij} = 1 if i = j.

f ∈ O(g) is equivalent to: for x → a, there exist C > 0 and ε > 0 such that |f(x)| ≤ C · |g(x)| for all x with d(x, a) < ε; known as big O notation.

i: the imaginary unit, where i^2 = −1.

acronyms

BCCB block-circulant-circulant-block

BiCG biconjugate gradient

BiCGSTAB biconjugate gradient stabilized

BTTB block-Toeplitz-Toeplitz-block



CCD charge-coupled device

CG conjugate gradient

DCT discrete cosine transform

DFT discrete Fourier transform

DST discrete sine transform

DTT discrete trigonometric transform

FFT Fast Fourier transform

GMRES generalized minimal residual

HPD Hermitian positive definite

IC integrated circuit

IDR induced dimension reduction

IGF inverse generating function

MVP matrix-vector product

ODE ordinary differential equation

PDE partial differential equation

PSF point spread function

SVD singular value decomposition



Part I

I N T R O D U C T I O N

This part introduces the main application that motivated this master thesis. It explains the process of fabrication of ICs via photolithography and also includes a description of a method of inspecting and monitoring the production quality of the fabricated ICs. This metrology process requires the solution of large linear systems of equations. Before the structure of this particular linear system is further described along with two related (reduced) linear systems, the basic terms concerning linear systems and their iterative solvers are introduced. Furthermore, the idea of preconditioning is described along with the definition of Toeplitz systems.

In the last chapter of this part, the main objectives of this master thesis are discussed along with the main results. This chapter concludes with an outline of the following parts and chapters.


1 M O T I V A T I O N

Following Moore's Law (see Figure 1.1), the performance of integrated circuits (ICs) has steadily increased and fueled what is known as the "Digital Revolution". Due to the ever increasing complexity and availability of digital electronics, influential developments such as the personal computer, the internet or the cellular phone have been made possible and affected almost every area of our lives.

(Margin note: In 1965, Gordon E. Moore proposed in an article that the number of transistors that can be packed into a given unit of space will double roughly every two years [40].)

Figure 1.1: Moore's law.
This figure shows the number of transistors of landmark microprocessors (from the Intel 4004 to the SPARC M7) against their year of introduction (1970-2020, logarithmic transistor-count axis). The line shows the proposed doubling in transistor count every two years.

An integrated circuit (IC) can be thought of as a very advanced, miniaturized electric circuit. Using transistors, resistors and capacitors as building blocks, one can implement the basic logical operations: not, and, or, etc. On a higher level, this allows the construction of complex circuits such as microprocessors or flash memories [42, 46].

Today, the world around us is full of integrated circuits and microprocessors. One can find them in computers, smartphones, televisions, cars and almost every modern electrical device [42]. But the need for more complex and powerful electronic devices is everlasting, with new computationally expensive areas such as computer simulations rising in importance. This motivates advances in photolithography, the main process of fabricating these integrated circuits (ICs).




1.1 photolithography

Figure 1.2: Photolithographic process.
(The figure shows the steps (1)-(7) listed below applied to a silicon wafer: the Si substrate, the SiO2 barrier layer, the photoresist coating and the exposure through a mask with UV light.)

The fabrication of ICs is a multi-billion dollar industry that requires a tightly controlled production environment. Hundreds of these ICs are produced at the same time on a thin slice of silicon, called a (silicon) wafer, and they are later cut apart into single IC chips. The often complex and interconnected designs of the ICs are copied onto a silicon wafer in a process known as photolithography.

The steps of printing one layer of an IC onto the wafer are visualized in Figure 1.2 (compare [27, 42, 46]):

(1) Prepare wafer: Prior to use, the silicon wafer has to be cleaned chemically.

(2) Deposit barrier layer: In the next step, the wafer is covered with a thin barrier layer, which is usually silicon dioxide (SiO2).

(3) Application of photoresist: After this, the wafer is coated with a light-sensitive material called photoresist.

(4) Mask alignment and exposure to UV light: The mask carrying the complex pattern of the IC is carefully aligned and the whole wafer is exposed to high-intensity ultraviolet light. The photoresist is only exposed to the UV light in areas where the mask is transparent, and the pattern of the mask gets “copied” onto the photoresist.

(5) Development: After developing the photoresist (similar to the development of photographic films), it washes away in areas where it has been exposed to the UV light (or vice versa for negative photoresists), making the desired pattern visible on the wafer.

(6) Etching: Chemical etching is used to remove any barrier material (SiO2) not protected by the coating photoresist.

(7) Photoresist removal: In the last step, the photoresist is removed from the wafer, leaving just the barrier layer with the desired pattern.

This process is repeated for each layer of the IC. The number of layers varies greatly, but usually lies between 20 and 40 [33]. Each layer is processed one after the other.

In order to produce a working IC, the mask as well as each layer needs to be aligned with high precision relative to the wafer and the underlying layers. Since the sizes of the structures on the wafer are on the order of nanometers, this requires a complex



process called metrology.

Metrology can also be used to extract information on the quality of the photolithographic process, by measuring metrology targets or gratings that were printed between the actual chips.

1.1.1 Metrology

Step (4) in the photolithographic process requires not only a careful and precise alignment of the mask, but also the correct focus for the exposure, both of which are highly non-trivial tasks performed by high-tech lithography systems.

(Margin note: ASML is the largest supplier in the world of photolithographic systems for the semiconductor industry.)

Because of that, small gratings between the chips on the wafer are included as test structures for quality control and high-precision alignment, as seen in Figure 1.3.

Figure 1.3: Close-up of a wafer.
The wafer (large) contains several chips, each with a complicated structure (top right). Between the chips, gratings have been printed (bottom right) for the purpose of quality control and alignment (source: [46]).

Since these gratings pass through the exact same production cycle as the actual chips, they show the same production biases or shortcomings. However, it is easier to use the gratings as metrology targets because of their simple periodic structure compared to the chip's complex architecture. The exact shape of the gratings contains information such as an incorrect focus (see Figure 1.4), over- or underexposure, etc.

Because of the small size of the gratings (on the order of 100 nm), classical optical microscopy is not usable. In 1873, Abbe [1] found



Figure 1.4: Effect of focus on the gratings.
While the central grating is the product of a wafer in focus, the other two gratings were the result of a lithographic process with an incorrect focus. They would both be considered as not reaching the quality standard and therefore be sorted out (source: [46]).

that for light with wavelength λ, the resolution of the resulting picture is at best

d = λ / (2 n sin Θ),

where n is the refractive index of the medium being imaged in and Θ is the half-angle subtended by the optical objective lens. This means that the maximal resolution is bounded by c · λ with a c close to 1. To increase the resolution, shorter wavelengths such as UV light and X-rays can be used.
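To get a feel for the numbers, Abbe's limit can be evaluated for assumed, illustrative values (not taken from the thesis): even at the violet end of visible light, the resolution stays well above the ~100 nm grating size.

```python
import math

lam = 400e-9              # wavelength in metres (violet end of visible light)
n = 1.0                   # refractive index of air (assumed imaging medium)
theta = math.radians(64)  # assumed half-angle of the objective lens

# Abbe's resolution limit d = lambda / (2 n sin(theta)).
d = lam / (2 * n * math.sin(theta))
print(f"resolution limit: {d * 1e9:.0f} nm")  # about 223 nm, far above 100 nm
```

With sin Θ ≈ 0.9, the prefactor c = 1 / (2 n sin Θ) is indeed close to 1 (about 0.56 here), so the gratings cannot be resolved with visible light.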

Figure 1.5: Indirect grating measurement.
Subfigure (a) depicts the process of indirect grating measurement using the scattering of light (source: [46]). A (simulated) output of the CCD is shown in subfigure (b) (source: [33]).

On the other hand, electron microscopy has its own drawbacks, such as being slow and potentially destructive [46]. Therefore, indirect measurements are preferred (see Figure 1.5a).



For the indirect grating measurement, light is directed (through filters and an optical system) at the gratings. Depending on the grating's geometry, the light is scattered in a certain way. Part of the scattered light is captured by a CCD. Figure 1.5 illustrates the method of indirect grating measurement and the light intensity measured by the CCD.

This, however, does not give direct access to the geometrical shape of the gratings. The actual interest of the metrology step is to find out the geometrical parameters p of the gratings.

Figure 1.6: Shape parameters (width, height, angle).

For a trapezoidal grating, for example, three shape parameters pk can be used to describe the grating's geometry: the height p1 of the gratings, the average width p2 of a grating and the angle of the side wall p3 (see Figure 1.6).

To extract the geometrical parameters p of the gratings, an inverse model of the scattering process is required:

• Forward problem - Scattering simulation: Given a certain shape p of the gratings, simulate the light intensities measured by the sensor I(p).

• Inverse problem - Profile reconstruction: Given a measured light intensity at the sensor ICCD (see Figure 1.5b), reconstruct the geometrical parameters p. This is done by computing

min_p ||ICCD − I(p)|| ,

where I(p) is the result of the forward problem given the parameter p. This minimization is realized using the Gauss-Newton algorithm, which requires the computation of the first order derivatives. They are approximated using finite differences, requiring O(n) computations of I(p).

Since visible light is an electromagnetic wave with a wavelength between 400 and 700 nm, its diffraction is described by Maxwell's equations. Using the time-harmonic assumption E(x, y, z, t) = E(x, y, z) e^{−iωt}, the integral form of the equations is as follows (see [6, 18, 31]):


∯_{∂Ω} E · ds = (1/ε0) ∭_{Ω} ρ dV    (Gauss's law)

∯_{∂Ω} B · ds = 0    (Gauss's law for magnetism)

∮_{∂Σ} E · dl = −iω ∬_{Σ} B · ds    (Faraday's law)

∮_{∂Σ} B · dl = μ0 ∬_{Σ} J · ds − μ0 ε0 iω ∬_{Σ} E · ds    (Ampère-Maxwell law)

where:

E: the electric field,
B: the magnetic field,
ρ: the electric charge density,
ε0: the vacuum permittivity or electric constant,
ω: the frequency,
J: the electric current density,
Ω: a fixed volume with boundary surface ∂Ω,
Σ: a fixed open surface with boundary curve ∂Σ,
∮: denotes a closed line integral,
∯: denotes a closed surface integral.

Solving the discretized Maxwell's equations in the case of light scattering at gratings requires the solution of a linear system

Ax = b ,

which is the most expensive step of the forward problem. The exact structure and characteristics of this linear system are further described in Chapter 4.

To solve the inverse problem, multiple instances of the forward problem have to be solved in the optimization process. This further motivates the search for an efficient solution of the forward problem in general and the resulting linear system in particular.

1.2 other applications

Besides the mentioned application, Toeplitz, Toeplitz-like and multilevel Toeplitz systems arise in a variety of mathematics, scientific computing and engineering applications (see [10, 41]).

Some of these applications arise from the fact that a discrete convolution can be written as a matrix-vector product (MVP) between a Toeplitz matrix and a vector:

Lemma 1.2.1: Discrete Convolution

Let h and x be two vectors of size m and n respectively. The convolution h ∗ x can be computed by the MVP with an (m+n−1) × n matrix:

h ∗ x =
[ h1    0     ...   0   ]
[ h2    h1    ...   ... ]
[ ...   h2    ...   0   ]
[ hm    ...   ...   h1  ]  ·  (x1, x2, ..., xn)^T
[ 0     hm    ...   h2  ]
[ ...   ...   ...   ... ]
[ 0     0     ...   hm  ]
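The lemma can be checked directly in a few lines. The sketch below (function names are illustrative) builds the (m+n−1) × n convolution matrix with SciPy's `toeplitz` helper and compares the MVP against NumPy's `convolve`:

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, n):
    """(m+n-1) x n Toeplitz matrix whose MVP with x equals the convolution h * x."""
    m = len(h)
    col = np.concatenate([h, np.zeros(n - 1)])        # first column: h padded with zeros
    row = np.concatenate([[h[0]], np.zeros(n - 1)])   # first row: h1, then zeros
    return toeplitz(col, row)

h = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0, 7.0])
T = conv_matrix(h, len(x))
assert np.allclose(T @ x, np.convolve(h, x))
```

The same check works for any sizes m and n, since `np.convolve` in its default ("full") mode returns exactly the m+n−1 entries of the discrete convolution.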

The two-dimensional case will result in a two-level Toeplitz ma-trix, also called a BTTB matrix.

The problem of deblurring images is an example of such an application that stems from a discrete convolution.

1.2.1 Deblurring Images

A model for a blurred image b is usually written as:

Ax = b , (1.1)

where x is the original image, A is the blurring matrix and b is theblurred image (see [23]).

The blurring can be described by a point spread function (PSF), such as the one in Figure 1.7. A PSF describes the response of the optical system, e.g. the camera or lens, to a point source. Due to imperfections in the camera or lens system, the intensity of the point source will be spread over multiple pixels, and the image gets blurred.


Figure 1.7: Example of a PSF.
A typical (Gaussian) point spread function (PSF) for a blurring problem.

The full blurring of an image with a given PSF is consequently described by a convolution of the image with the PSF. In the discrete case, this leads to the fact that the blurring matrix A has a two-level Toeplitz structure (in the two-dimensional case). Depending on the boundary condition, Toeplitz-like matrices are also possible (such as block-circulant-circulant-block (BCCB) matrices or matrices with a Hankel structure [23, VIP 9.]). The result of such a blurring model can be seen in Figure 1.8.
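With periodic boundary conditions, the blurring matrix is BCCB, so applying the blur is exactly a 2D circular convolution and can be done with FFTs. The sketch below (toy image and PSF; all names are illustrative) checks the FFT-based blur against the explicit double sum:

```python
import numpy as np

def blur_periodic(img, psf):
    """Blur with periodic boundary conditions: an MVP with a BCCB matrix, done via 2D FFTs."""
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)).real

def blur_direct(img, psf):
    """The same operation as an explicit circular-convolution sum, for checking."""
    m, n = img.shape
    P = np.zeros((m, n))
    P[:psf.shape[0], :psf.shape[1]] = psf  # zero-pad the PSF to the image size
    out = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for k in range(m):
                for l in range(n):
                    out[i, j] += P[k, l] * img[(i - k) % m, (j - l) % n]
    return out

rng = np.random.default_rng(5)
img = rng.random((6, 6))
psf = np.array([[0.25, 0.25], [0.25, 0.25]])  # toy 2x2 averaging PSF
assert np.allclose(blur_periodic(img, psf), blur_direct(img, psf))
```

Since the PSF sums to one, the blur preserves the total image intensity, which gives a quick sanity check on any implementation.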

(a) Original Image (b) Blurred Image

Figure 1.8: Blurring problem.
Subfigure (b) is the result of a blurring with the PSF of Figure 1.7 (source of original image: http://sipi.usc.edu/database/database.php?volume=misc&image=12#top).


To extract the original (sharp) image, the BTTB system (1.1) has to be solved. For applications in the field of image processing and restoration and their relation to preconditioning, see also Kamm and Nagy [29], Koyuncu [32], Lin and Zhang [35].

1.2.2 Further Applications

Other applications that include a Toeplitz, Toeplitz-like or multi-level Toeplitz system occur in areas such as (see for example [10, 22, 23, 41, 45]):

• Numerical ordinary differential equations (ODEs) and partial differential equations (PDEs)
• Statistics
• Signal processing and filtering
• Control theory
• Stochastic automata and neural networks


2 LINEAR SYSTEMS

This chapter aims at introducing the basic definitions and methodsregarding systems of linear equations. These concepts will be used insubsequent chapters.

A system of linear equations (or linear system) is given by

Ax = b , (2.1)

with the coefficient matrix A ∈ Cn×n, the right-hand side vector b ∈ Cn

and the unknown solution x ∈ Cn. This system has a unique solution if the coefficient matrix A is invertible (also called nonsingular). Consequently, the solution x∗ is then

x∗ = A−1b .

The solution can be calculated using direct methods such as Gaussian elimination, which has a complexity of O(n³). However, for larger systems, this is computationally expensive and iterative solvers are preferred [2].

2.1 iterative solvers

Given an initial guess x0, an iterative solver computes a sequence of approximations xk of the true solution x∗ until the residual rk = Axk − b satisfies ||rk|| / ||b|| < εtol.

2.1.1 CG Method

The CG method was proposed in the 1950s for symmetric and positive definite coefficient matrices A [25]. It is the most prominent iterative solver for large sparse systems [52] and the basis for many more advanced and specialized algorithms.

Solving (2.1) is equivalent to the following minimization problem (a proof of the equivalence can be found in [52]):

min_x φ(x) := (1/2) x^T A x − b^T x    (2.2)

which means that if φ(x) becomes smaller with each iteration, we also get closer to the solution x∗.


φ(x) in (2.2) describes a quadratic function, where the exact form is described by the matrix A and the vector b (see Figure 2.1).

Figure 2.1: Minimization function.
Example of a function φ(x) with the corresponding contour lines.

The gradient of the minimization problem is the residual of the linear system:

∇φ(x) = Ax − b = r(x) .

The solution x∗ of the minimization problem fulfills the necessary condition ∇φ(x∗) = 0 ⟺ Ax∗ = b.

The basic idea of the CG method is that the minimum of φ(x) can be found by taking steps in the direction of the negative gradient at the current point, i.e. xk+1 = xk − γ · ∇φ(xk) = xk − γ · Axk + γ · b. The step size γ can be chosen to minimize φ(xk+1) along the direction of the gradient.

In contrast to the gradient descent method (also known as the steepest descent method), the CG method does not directly use the gradient as the descent direction, but insists that each new descent direction is conjugate to all directions used before. (Two vectors pi and pk are conjugate with respect to A if pi^T A pk = 0.)

Figure 2.2 compares the convergence of the gradient descent method with the CG method. It is important to note that the form of the function φ - and therefore the matrix A - plays a vital role in the convergence speed of the methods (as illustrated by Figure 2.2b).
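The recurrences above can be written out in a few lines of NumPy. The following is a minimal textbook sketch (not a production implementation), verified against a direct solve:

```python
import numpy as np

def cg(A, b, tol=1e-10, maxit=1000):
    """Minimal conjugate gradient method for an SPD matrix A (illustrative sketch)."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual = negative gradient of phi at x
    p = r.copy()             # first search direction is the steepest descent direction
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # step size minimizing phi along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p        # new direction, A-conjugate to the previous ones
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)            # well-conditioned SPD test matrix
b = rng.standard_normal(50)
x = cg(A, b)
assert np.allclose(A @ x, b, atol=1e-6)
```

Note that each iteration costs essentially one MVP with A; this is the quantity used later (Chapter 12) to compare solver configurations.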


(a) Contour plot for the same function φ as in Figure 2.1 with the convergence of the gradient descent (blue) and the CG method (green).

(b) Contour plot for a function φ resulting from a scaled identity matrix. Both methods would converge within a single step, regardless of the starting point.

Figure 2.2: Convergence of gradient descent and CG method for different functions φ.
The closer the contours resemble circles, the faster the methods converge.

2.1.1.1 Convergence Rate

Definition 2.1.1: Order of Convergence

Let a sequence xk converge to x∗, and let ek = xk − x∗ be the error at step k. If there exists a constant C > 0 such that for a p > 0:

lim_{k→∞} ||e_{k+1}|| / ||e_k||^p = C

is satisfied, then we say the order of convergence of xk is (at least) p. If p = 1 and C < 1, then the convergence rate is said to be linear.

The order of convergence allows a prediction of the number of iterations a solver needs until it reaches a satisfactory solution. A higher order of convergence is preferred, since it means less computation time. (A sequence xk converges to x∗ if lim_{k→∞} ||xk − x∗|| = 0.)

The performance of the CG method for solving a linear system depends on the condition number κ(A) of its coefficient matrix A [41, 43]. For example:


(Here ek denotes the error vector ek = xk − x∗, where xk is the k-th iterate of the CG method and x∗ is the exact solution of the linear system.)

Theorem 2.1.2: Convergence Rate of CG Method

If the coefficient matrix A of the linear system (2.1) is Hermitian positive definite with condition number κ(A), then the CG method converges in the following way:

||ek|| / ||e0|| ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k

This theorem implies linear convergence for the CG method. At the same time, the CG method converges faster for linear systems with a smaller condition number. More precise convergence rates can be stated if the distribution of the eigenvalues of A is known (see among others [41]). In the special case of clustered eigenvalues, we get:

Lemma 2.1.3: Convergence Rate for Clustered Spectrum

If the eigenvalues λk of A are such that

0 < δ ≤ λ1 ≤ ... ≤ λi ≤ 1 − ε ≤ λ_{i+1} ≤ ... ≤ λ_{n−j} ≤ 1 + ε ≤ λ_{n−j+1} ≤ ... ≤ λn

for a δ > 0 and 1 > ε > 0, then the CG method converges in the following way:

||ek|| / ||e0|| ≤ 2 ( (1 + ε) / δ )^i ε^{k−i−j} ,  k > i + j

This implies that a matrix A with eigenvalues that are tightly clustered around 1 and away from 0, with only as few exceptions as possible, is desirable. This relates to a function φ with almost circular contours (as shown in Figure 2.2b).
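The effect of spectral clustering on CG is easy to observe numerically. The experiment below is an illustrative sketch: diagonal test matrices are used so the eigenvalues are explicit, and the plain CG loop is the textbook recurrence with an iteration counter.

```python
import numpy as np

def cg_iters(A, b, tol=1e-8, maxit=10000):
    """Run plain CG and return the number of iterations until the residual test is met."""
    x = np.zeros_like(b); r = b.copy(); p = r.copy(); rs = r @ r
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p; r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return k
        p = r + (rs_new / rs) * p; rs = rs_new
    return maxit

n = 200
rng = np.random.default_rng(1)
b = rng.standard_normal(n)
# Eigenvalues tightly clustered around 1, with a single small outlier
clustered = np.diag(np.concatenate([[0.01], np.linspace(0.99, 1.01, n - 1)]))
# Eigenvalues spread over three orders of magnitude
spread = np.diag(np.linspace(0.01, 10.0, n))
assert cg_iters(clustered, b) < cg_iters(spread, b)
```

The clustered spectrum needs only a handful of iterations (roughly one extra iteration per outlier), while the spread spectrum needs far more, in line with the lemma.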

2.1.2 Other Methods

Besides the basic CG method, three other iterative solvers are important to this work. All three do not impose any restriction on the coefficient matrix A, such as symmetry, and are therefore viable methods for solving the linear system described in Chapter 4. (In general, no convergence rates are known for these advanced methods. In Chapter 12 we will compare different preconditioners by the number of MVPs they need until convergence.)

• The biconjugate gradient stabilized (BiCGSTAB) method is an improved version of the biconjugate gradient (BiCG) method, a generalization of the CG method for nonsymmetric coefficient matrices [53]. The method was described by van der Vorst [58].


• The induced dimension reduction (IDR) method was invented by Wesseling and Sonneveld [59].

• The generalized minimal residual (GMRES) method was proposed by Saad and Schultz [49]. The main drawback of this method is its relatively high storage requirements [3].

2.2 preconditioning

The goal of preconditioning is to transform the original linear system (i.e. (2.1)) into a different linear system that has the same solution x∗, but a better convergence rate.

This can be done by multiplying the whole linear system (2.1) with the inverse of a preconditioner P, thus getting a left-preconditioned system:

P−1Ax = P−1b (2.3)

Since the linear system was multiplied by P−1 on both sides, the solution x∗ is still the same.

The right-preconditioned system:

AP−1y = b with P−1y = x (2.4)

has the same solution as well, which can be seen easily.

In order to have a better convergence rate, the preconditioner P should be chosen so that the new linear system can be solved easily. In general, this means that the left-hand sides of the preconditioned linear systems (P−1A for (2.3) and AP−1 for (2.4)) need to have a smaller condition number κ than the original linear system [24].

For CG-like methods, the distribution of the eigenvalues of P−1A or AP−1 is also highly important, because it influences the convergence rate (see Section 2.1.1.1).

Note that in practice it is not necessary to compute the matrix-matrix product P−1A or AP−1; instead, the iterative solvers apply the preconditioner in each iteration in an MVP. This results in an extra MVP of a vector with P−1 compared to the original system. The additional cost of the extra MVP in each iteration needs to be compensated by faster convergence, i.e. fewer iterations.
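This way of using P only through an MVP per iteration can be sketched with a minimal preconditioned CG loop. The Jacobi (diagonal) preconditioner below is just the simplest illustrative stand-in for the BTTB approximations developed later; the matrices and names are this example's assumptions.

```python
import numpy as np

def pcg(A, b, apply_Pinv, tol=1e-10, maxit=1000):
    """Preconditioned CG: P enters only through one extra MVP z = P^{-1} r per iteration."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

rng = np.random.default_rng(2)
n = 100
A = np.diag(np.linspace(1.0, 1e4, n)) + 0.1 * np.ones((n, n))  # badly scaled SPD matrix
b = rng.standard_normal(n)
d = np.diag(A)  # Jacobi preconditioner P = diag(A): trivially inverted and applied

x_id, k_id = pcg(A, b, lambda r: r)        # P = I (no preconditioning)
x_jac, k_jac = pcg(A, b, lambda r: r / d)  # P = diag(A)
assert np.allclose(A @ x_jac, b) and k_jac < k_id
```

Even this crude P already cuts the iteration count sharply, because the cheap extra MVP moves the spectrum of P−1A close to 1.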

Loosely speaking, if P is a good approximation of A, the solvers will converge fast. If P = A, then the linear system can be solved in one step. However, the inversion of A in order to construct the preconditioner is equivalent to solving the problem and computationally expensive. Therefore, easily invertible approximations of A are interesting choices for P.

In conclusion, a perfect preconditioner should satisfy the followingconditions:

C1 P should be a good approximation of A

C2 P−1 should be easy to compute

C3 P−1 should be easily applied to a vector, i.e. the MVP P−1x should be cheap to compute

Figure 2.3: Preconditioner trade off.
(Three panels, each ranging from P = I to P = A: a) Hard Problem, b) Medium Problem, c) Easy Problem.)

While C1 influences the total number of iterations needed, C2 determines the initial computation cost of the solver (the inverse is calculated exactly once at the beginning). Additionally, C3 decides the computational cost of each iteration.

In general, the closer P approximates A, the more complex it is to compute its inverse and an MVP with it. Figure 2.3 illustrates this trade off between a heavily preconditioned system (close to P = A) and a system without preconditioning (P = I). The blue line symbolizes the number of iterations needed to solve the system, the green line the time needed in each iteration and the black line is the product of both and represents the total time needed. The optimal preconditioner (marked with the dashed red line) that minimizes the total complexity of solving the system depends on the complexity of the problem.


3 TOEPLITZ SYSTEMS

Definition 3.0.1: Toeplitz Matrix

A matrix T(n) ∈ Cn×n with constant entries along each diagonal, i.e. of the form:

T(n) =
[ t0      t−1     ...   t−n+2   t−n+1 ]
[ t1      t0      t−1   ...     t−n+2 ]
[ ...     t1      t0    ...     ...   ]
[ tn−2    ...     ...   ...     t−1   ]
[ tn−1    tn−2    ...   t1      t0    ]

is called a Toeplitz matrix.

Each entry tij of a Toeplitz matrix only depends on the difference of its indices i and j:

t_{i,j} = t_{i+1,j+1} =: t_{i−j} = t_k ,  k = −n+1, ..., 0, ..., n−1

(In contrast to general n×n matrices, a Toeplitz matrix is well defined by only 2·n − 1 (rather than n²) entries tk.)

Example 3.0.2: Toeplitz matrix

T(4) =
[ 9 1 3 6 ]
[ 2 9 1 3 ]
[ 4 2 9 1 ]
[ 1 4 2 9 ]

is a Toeplitz matrix.
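Such a matrix can be assembled from its first column and first row alone, for instance with SciPy's `toeplitz` constructor; the snippet below rebuilds the matrix of Example 3.0.2:

```python
import numpy as np
from scipy.linalg import toeplitz

# First column (t_0, t_1, ..., t_{n-1}) and first row (t_0, t_{-1}, ..., t_{-n+1})
col = [9, 2, 4, 1]
row = [9, 1, 3, 6]
T4 = toeplitz(col, row)
expected = np.array([[9, 1, 3, 6],
                     [2, 9, 1, 3],
                     [4, 2, 9, 1],
                     [1, 4, 2, 9]])
assert np.array_equal(T4, expected)
```

Note that only the 2·4 − 1 = 7 distinct coefficients are stored as input, matching the remark above.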

Definition 3.0.3: Toeplitz System

A Toeplitz system is a linear system

Tx = b ,

where T is a Toeplitz matrix as defined in Definition 3.0.1.

We can interpret T(n) as a principal submatrix of an ∞×∞ matrix T(∞). (A principal submatrix is obtained by removing rows and columns with the same indices from a larger matrix.)


T(∞) =
[ ...  ...  ...  ...  ...  ...  ... ]
[ ...  a0   a−1  a−2  a−3  a−4  ... ]
[ ...  a1   a0   a−1  a−2  a−3  ... ]
[ ...  a2   a1   a0   a−1  a−2  ... ]
[ ...  a3   a2   a1   a0   a−1  ... ]
[ ...  a4   a3   a2   a1   a0   ... ]
[ ...  ...  ...  ...  ...  ...  ... ]

We can further assume that the diagonal coefficients (tk)_{k=−∞}^{∞} of this T(∞) matrix are the Fourier coefficients of a function f(x):

t_k = (1/2π) ∫_{−π}^{π} f(x) e^{−ikx} dx    (3.1)

hence:

f(x) = Σ_{k=−∞}^{∞} t_k e^{ikx}    (3.2)

We call f(x) the generating function of T(∞), as well as of any principal submatrix T(n) (of size n×n). In the same way, T(n)[f] is the Toeplitz matrix T(n) ∈ Cn×n induced by the generating function f(x).

In many practical problems, the generating function is usually given first, not the corresponding Toeplitz matrices [10, 41]. This is for example true for:

• Numerical differential equations, where the equation gives f
• Filter design, where the transfer function gives f
• Image restoration, where the blurring function gives f

In the main application of this thesis (see Section 1.1) this is also true. The coefficient matrix is a result of the grating's geometry and the refractive index of the grating's materials (see also Chapter 4).

One of the special properties of Toeplitz matrices is that we can compute an MVP with T(n) in only O(2n log 2n) operations, using the Fast Fourier transform (FFT). This works by embedding the Toeplitz matrix in a circulant matrix of twice the size, and then computing the MVP (see Section 8.1 for how this is done).
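Anticipating Section 8.1, the embedding can be sketched in a few lines (real-valued data is assumed here; for complex data the final `.real` would be dropped):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_mvp(col, row, x):
    """T @ x in O(n log n): embed T(n) into a circulant of size 2n and multiply via FFT."""
    n = len(x)
    # First column of the circulant: the column of T, one zero, then the reversed row tail
    c = np.concatenate([col, [0.0], row[:0:-1]])
    v = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v))  # circulant MVP = FFT-based convolution
    return y[:n].real

col = np.array([9.0, 2.0, 4.0, 1.0])  # t_0, t_1, t_2, t_3 (first column)
row = np.array([9.0, 1.0, 3.0, 6.0])  # t_0, t_-1, t_-2, t_-3 (first row)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(toeplitz_mvp(col, row, x), toeplitz(col, row) @ x)
```

The first n entries of the length-2n circulant product reproduce T x exactly, because the zero padding of x prevents any wrap-around contributions.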

Another important aspect of a Toeplitz matrix is that, while its inverse is not Toeplitz, it can be factorized by Toeplitz matrices, according to the Gohberg–Semencul formula [20].


Theorem 3.0.4: Gohberg–Semencul Formula

If the Toeplitz matrix T ∈ Rn×n is such that each of the systems of equations

Tx = e1 ,
Ty = en

is solvable and the condition x1 ≠ 0 is fulfilled, then the matrix T is invertible, and its inverse is formed according to the formula

T−1 = x1−1 ( Lower(x) Lower(Jy)^T − Lower(Z0 y) Lower(Z0 J x)^T )

where Lower(x) denotes a lower triangular Toeplitz matrix with x as the first column, J is the anti-diagonal matrix with ones on the anti-diagonal and zeros everywhere else, and Z0 = Lower(e2).
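The formula can be checked numerically. The sketch below uses a small diagonally dominant Toeplitz matrix (so that invertibility and x1 ≠ 0 are guaranteed) and assembles T−1 from the two solves exactly as in the theorem:

```python
import numpy as np
from scipy.linalg import toeplitz

def lower(v):
    """Lower triangular Toeplitz matrix with v as its first column."""
    v = np.asarray(v, dtype=float)
    return toeplitz(v, np.concatenate([[v[0]], np.zeros(len(v) - 1)]))

n = 5
rng = np.random.default_rng(3)
# Strictly diagonally dominant Toeplitz matrix: invertible, with x1 != 0
c = np.concatenate([[3.0], 0.2 * rng.standard_normal(n - 1)])  # first column
r = np.concatenate([[3.0], 0.2 * rng.standard_normal(n - 1)])  # first row
T = toeplitz(c, r)

e = np.eye(n)
x = np.linalg.solve(T, e[:, 0])   # first column of T^{-1}
y = np.linalg.solve(T, e[:, -1])  # last column of T^{-1}
J = np.fliplr(np.eye(n))          # anti-diagonal (exchange) matrix
Z0 = lower(e[:, 1])               # Z0 = Lower(e2): the lower shift matrix

Tinv = (lower(x) @ lower(J @ y).T - lower(Z0 @ y) @ lower(Z0 @ J @ x).T) / x[0]
assert np.allclose(Tinv, np.linalg.inv(T))
```

Note that applying T−1 in this factored form needs only MVPs with triangular Toeplitz matrices, which is what makes the formula attractive for fast algorithms.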

3.1 multi-level toeplitz matrices

We can also define matrices that possess a Toeplitz structure on one or more levels of a matrix, therefore defining multi-level Toeplitz matrices. We can first define:

Definition 3.1.1: Toeplitz-Block Matrix

A block matrix T(m;n)(TB) ∈ C^{m·n×m·n}, where each of the m×m blocks is an n×n Toeplitz matrix, is called a Toeplitz-block matrix. It has the form:

T(m;n)(TB) =
[ T1,1;   T1,2;   ...   T1,m; ]
[ T2,1;   T2,2;   ...   T2,m; ]
[ ...     ...     ...   ...   ]
[ Tm,1;   Tm,2;   ...   Tm,m; ]

where each Ti,j; is Toeplitz,

not to be confused with:


Definition 3.1.2: Block-Toeplitz Matrix

A block matrix T(m;n)(BT) ∈ C^{m·n×m·n} of the form:

T(m;n)(BT) =
[ A0;     A−1;    ...   A1−m; ]
[ A1;     A0;     ...   A2−m; ]
[ ...     ...     ...   ...   ]
[ Am−1;   Am−2;   ...   A0;   ]

where the blocks Ak; are arbitrary, is called a block-Toeplitz matrix.

A combination of Definitions 3.1.1 and 3.1.2 is the following. (A BTTB matrix is sometimes also called a two-level Toeplitz matrix.)

Definition 3.1.3: BTTB Matrix

A block-Toeplitz matrix, where the blocks Tk; are themselves Toeplitz matrices, is called a block-Toeplitz-Toeplitz-block (BTTB) matrix:

T(m;n)(BTTB) =
[ T0;     T−1;    ...   T1−m; ]
[ T1;     T0;     ...   T2−m; ]
[ ...     ...     ...   ...   ]
[ Tm−1;   Tm−2;   ...   T0;   ]

where each Tk; is Toeplitz.

Example 3.1.4: BTTB matrix

T(4;3)(BTTB) =
[ 4 2 7 | 8 6 5 | 1 0 6 | 9 2 1 ]
[ 6 4 2 | 3 8 6 | 3 1 0 | 3 9 2 ]
[ 8 6 4 | 8 3 8 | 6 3 1 | 4 3 9 ]
[ 7 3 1 | 4 2 7 | 8 6 5 | 1 0 6 ]
[ 1 7 3 | 6 4 2 | 3 8 6 | 3 1 0 ]
[ 9 1 7 | 8 6 4 | 8 3 8 | 6 3 1 ]
[ 6 0 9 | 7 3 1 | 4 2 7 | 8 6 5 ]
[ 3 6 0 | 1 7 3 | 6 4 2 | 3 8 6 ]
[ 0 3 6 | 9 1 7 | 8 6 4 | 8 3 8 ]
[ 4 3 2 | 6 0 9 | 7 3 1 | 4 2 7 ]
[ 2 4 3 | 3 6 0 | 1 7 3 | 6 4 2 ]
[ 3 2 4 | 0 3 6 | 9 1 7 | 8 6 4 ]
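A BTTB matrix is fully determined by its coefficients t_{k;l}, so it can be assembled block by block. The helper below is an illustrative sketch; the layout of the coefficient array t (shape (2m−1) × (2n−1), with t_{k;l} stored at t[m−1+k, n−1+l]) is an assumption of this example:

```python
import numpy as np
from scipy.linalg import toeplitz

def bttb(t, m, n):
    """Assemble the (m*n) x (m*n) BTTB matrix from the coefficient array t."""
    A = np.empty((m * n, m * n))
    for i in range(m):
        for j in range(m):
            k = i - j                       # block diagonal index
            col = t[m - 1 + k, n - 1:]      # t_{k;0}, ..., t_{k;n-1}
            row = t[m - 1 + k, n - 1::-1]   # t_{k;0}, t_{k;-1}, ..., t_{k;-(n-1)}
            A[i * n:(i + 1) * n, j * n:(j + 1) * n] = toeplitz(col, row)
    return A

rng = np.random.default_rng(6)
m, n = 3, 4
t = rng.standard_normal((2 * m - 1, 2 * n - 1))
A = bttb(t, m, n)
# Blocks along each block diagonal coincide (block-Toeplitz) ...
assert np.array_equal(A[:n, :n], A[n:2 * n, n:2 * n])
# ... and each block is itself Toeplitz
assert A[0, 0] == A[1, 1] == t[m - 1, n - 1]
```

As the remark above notes, only (2m − 1)(2n − 1) coefficients are stored, even though the assembled matrix has (mn)² entries.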

A BTTB matrix corresponds to a generating function with two variables, i.e.

t_{k;l} = (1/2π)² ∫_{−π}^{π} ∫_{−π}^{π} f(x,y) e^{−i(kx+ly)} dx dy

hence:

f(x,y) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} t_{k;l} e^{i(kx+ly)}

instead of (3.1) and (3.2) respectively. (The elements of a BTTB matrix are the Fourier coefficients of a bivariate function.)

3.2 circulant matrices

A special kind of Toeplitz matrix is the circulant matrix, where each column is a circular shift of its preceding column:

C =
[ c0      c1      ...   cn−2   cn−1 ]
[ cn−1    c0      c1    ...    cn−2 ]
[ ...     cn−1    c0    ...    ...  ]
[ c2      ...     ...   ...    c1   ]
[ c1      c2      ...   cn−1   c0   ]

(Block-circulant, circulant-block and BCCB matrices can be defined similarly to Definitions 3.1.1, 3.1.2 and 3.1.3.)

A circulant matrix is fully defined by just n coefficients. In Chapter 8, further useful properties are described and used.
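One such property (treated in Chapter 8) is that circulant matrices are diagonalized by the DFT, so systems with them are solved by FFTs. A small sketch (note that SciPy's `circulant` takes the first column, i.e. the transpose convention of the display above):

```python
import numpy as np
from scipy.linalg import circulant

c = np.array([4.0, 1.0, 0.5, 0.25])  # first column of C
C = circulant(c)
b = np.array([1.0, 2.0, 3.0, 4.0])

# Eigenvalues of C are fft(c), so Cx = b is solved with two FFTs and a pointwise division
x = np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real
assert np.allclose(C @ x, b)

# The MVP goes the same way: C @ v == ifft(fft(c) * fft(v))
v = np.array([1.0, 0.0, -1.0, 2.0])
assert np.allclose(C @ v, np.fft.ifft(np.fft.fft(c) * np.fft.fft(v)).real)
```

Both the solve and the MVP cost O(n log n), which is exactly why circulant approximations are attractive preconditioners for Toeplitz and BTTB systems.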

3.3 hankel

A related matrix type is the Hankel matrix. It is basically an upside-down Toeplitz matrix, which means that it has constant values along its anti-diagonals:

H =
[ h−n+1   h−n+2   ...   h−1    h0   ]
[ h−n+2   ...     h−1   h0     h1   ]
[ ...     ...     h0    h1     ...  ]
[ h−1     h0      h1    ...    hn−2 ]
[ h0      h1      ...   hn−2   hn−1 ]

Analogously to the Toeplitz matrices, a Hankel matrix is described by 2n − 1 coefficients. (We can again define block-Hankel, Hankel-block and block-Hankel-Hankel-block (BHHB) matrices, and even combinations of all the matrix types above, such as a block-Hankel-Toeplitz-block (BHTB) matrix.)


4 PROBLEM DESCRIPTION

This chapter describes the full problem arising from the motivation illustrated in Chapter 1. Then two closely related, reduced problems are derived that will be the main topic of this work.

4.1 full problem

As described in Section 1.1.1, a numerical solution of Maxwell's equations describing light scattering on a 2D-periodic structure (the gratings) requires the solution of the linear system

Ax = b , (4.1)

with A ∈ Cn×n the coefficient matrix, b ∈ Cn the right-hand side and x ∈ Cn the unknown solution.

It is known that A can be decomposed via:

A = C −GM ,

where C, G, M ∈ Cn×n are sparse matrices. (In a sparse matrix most elements are zero; if most elements are non-zero, the matrix is called dense.) Since the sparsity patterns of G and M are complementary, A is a dense matrix (see Figure 4.1). Note that the matrix M has an identical structure to C.

Figure 4.1: Sparsity patterns of the matrices C, G and M, as well as the resulting matrix A (A = C − G · M).


The sparsity pattern of C is shown in more detail in Figure 4.2.

Figure 4.2: Sparsity pattern of C.
(Block-diagonal with diagonal blocks C1;, C2;, C3;, ..., CNz;.)

Each block Cl; of C is of the form (a matrix of the form of Cl; can be called a BTTB-block matrix):

Cl; =
[ Cl;1,1;   Cl;1,2;   Cl;1,3; ]
[ Cl;2,1;   Cl;2,2;   Cl;2,3; ]    (4.2)
[ Cl;3,1;   Cl;3,2;   Cl;3,3; ]

where each block Cl;i,j is a BTTB matrix with the following generating function FCl;i,j:

FCl;i,j(x,y) = ni(x,y) nj(x,y) ( εb / ε(x,y) − 1 ) + δij  for i, j ∈ {x, y, z} ,

where ε(x,y) are the piecewise constant material properties and ni(x,y), nj(x,y) are the normal-vector fields, which are piecewise constant for polygonal shapes.

Additionally, each Cl; is symmetric on a block level, i.e. Cl;i,j = Cl;j,i.


The structure of matrix C on each level is shown in color plots in Figure 4.3 and is summarized in Table 4.1.

Figure 4.3: Color plots of all levels of C.
The image shows color plots of each level of C. On the top level, the block-diagonal structure can be seen easily. The next image shows the 3×3 symmetry that the matrix possesses on the next level. The bottom two levels each possess a Toeplitz structure, which is clearly visible in the color plots.

Table 4.1: Structure of C on each level, from highest (level Z) to lowest (level X).

Level | Structure
------|----------------
Z     | Diagonal
A     | 3×3 Symmetric
Y     | Toeplitz
X     | Toeplitz

Because C is sparse and an approximation of A, C−1 is a good candidate for a preconditioner. In fact, preliminary investigations have shown that by choosing C−1 as a preconditioner, the number of iterations of the solver can be drastically reduced in comparison to choosing no preconditioner at all (which is equivalent to choosing the identity I as a preconditioner), see Table 4.2.

However, computing the inverse C−1, as well as an MVP with it, is quite expensive. Since the inverse of a BTTB matrix is not BTTB, the MVP with C−1 has a cost of O(n²). This is computationally expensive in comparison to the cost of an MVP of the unpreconditioned system (here it is an MVP with C and therefore a cost of O(n log n)).

Therefore, approximations of C with a cheaper inverse and MVP are wanted.

Table 4.2: Convergence rates (number of iterations) for (4.1) using IDR(6).

Case    | P = I   | P = C
--------|---------|------
Case 1a | 732     | 225
Case 1b | 297     | 102
Case 2a | > 99998 | 381
Case 2b | > 99998 | 3288
Case 3a | 275     | 130
Case 3b | 1817    | 212
Case 4  | 85135   | 310

Nevertheless, using C without any changes is still a possible preconditioner and an option for harder problems, see Section 2.2.

So the full problem P1 can be defined as follows. (Note that our full problem only uses C as the coefficient matrix. In Section 11.4, we will discuss possible ways to include G and M into the preconditioner, for a better adaptation to the complete matrix A.)

P1 Find a good preconditioner (fulfilling C1 to C3) for a system Cx = b, with C as defined in (4.2).

4.2 bttb-block system

Using C as a preconditioner requires computing the inverse C−1. Since C is a block-diagonal matrix, the inverse is given by

C−1 =
[ C1;−1   0       ...   ...   0      ]
[ 0       C2;−1   ...   ...   ...    ]
[ ...     ...     C3;−1 ...   ...    ]
[ ...     ...     ...   ...   0      ]
[ 0       ...     ...   0     CNz;−1 ]
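This block-wise inversion is easy to exploit in code; each diagonal block can be inverted independently. A toy sketch with small deterministic stand-in blocks:

```python
import numpy as np
from scipy.linalg import block_diag

# Four small invertible (lower triangular) blocks standing in for C1;, ..., CNz;
blocks = [k * np.eye(3) + 0.5 * np.tri(3) for k in (2.0, 3.0, 4.0, 5.0)]
C = block_diag(*blocks)

# The inverse of a block-diagonal matrix is block-diagonal with the inverted blocks
C_inv = block_diag(*[np.linalg.inv(B) for B in blocks])
assert np.allclose(C @ C_inv, np.eye(12))
```

Since the blocks are independent, the inversions can also be carried out in parallel, and the cost scales with the block size rather than with the full dimension of C.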

This means that if we solve the following reduced problem P2, we will also solve the full problem.

P2 Find a good preconditioner (fulfilling C1 to C3) for a system

Dx = b ,

where

D =
[ D1,1;   D1,2;   D1,3; ]
[ D2,1;   D2,2;   D2,3; ]
[ D3,1;   D3,2;   D3,3; ]


and each Di,j; ∈ Cn×n is a BTTB matrix.

Furthermore, the matrix D is symmetric on the block level, i.e. Di,j; = Dj,i;.

A solution to this reduced problem P2 can be used for other applications, such as the one described in Chapter 1.

4.3 bttb system

A simplification of the previous problem P2 is the following:

P3 Find a good preconditioner (fulfilling C1 to C3) for a system

Ex = b ,

where E ∈ Cn×n is a BTTB matrix with a piecewise constant generating function.

Solving this problem can be an important step on the path to solving the full problem P1 and can be used for different applications, some of which are mentioned in Section 1.2.


5 THESIS OVERVIEW

Chapter 1 introduces the main application which motivates the research subject of this thesis. The metrology of integrated circuits (ICs) requires the solution of a high-dimensional dense linear system. The time needed to solve such a system with an iterative solver can be reduced by applying a preconditioner to the linear system. Besides this main application, further applications of this thesis are described, such as the deblurring of images.

Chapter 2 gives an introduction to the mathematics of linear systems, iterative solvers and the field of preconditioning. Additionally, three conditions for a good preconditioner are derived.

In Chapter 3, the special structure of Toeplitz systems is described.The concept of an associated generating function is explained andspecial properties of Toeplitz systems are mentioned.

Chapter 4 focuses on the mathematical description of the main application mentioned in the first chapter. The special structure of the linear system is illustrated. Two linear systems with a simpler structure are introduced that will be considered as intermediate problems.

After this short outline of the thesis and its content, Chapter 6 will start off the part about different preconditioning techniques. In this chapter, a quick overview of the preconditioners described in the following chapters is given.

The first preconditioner is described in Chapter 7. The complete matrix C is considered as a preconditioner and evaluated.

In Chapter 8, different circulant approximations of the Toeplitz structures are considered. This approximation is described in the context of Toeplitz, multi-level Toeplitz and Toeplitz-block matrices.

Chapter 9 considers T[1/f] as a preconditioner for T[f]. This preconditioner is then generalized to the Toeplitz-block and the BTTB-block case. A proof is provided that, if F is HPD, the spectrum of T[F^{-1}] T[F] is clustered. The chapter closes by proposing a regularization for cases where F is not HPD.


Chapter 10 describes the Kronecker product approximation that approximates a matrix by a sum of Kronecker products. Several options to adapt this method to the BTTB-block case are suggested. Additionally, the relationship between this method and the generating function is illustrated.

Chapter 11 collects several additional preconditioning ideas and describes them briefly.

Chapter 12 compares the performance of the suggested preconditioners for several test cases by comparing the required number of iterations until convergence.

Suggestions for future investigations are presented in Chapter 13. These include aspects of different topics that could not be analyzed in the limited time available for this thesis.

Finally, Chapter 14 summarizes the main results of this work and presents conclusions.


Part II

PRECONDITIONERS

The following part describes various techniques for approximating (multi-level) Toeplitz matrices. Each approximation method will be explained and discussed in terms of its applicability to the presented case. If necessary, changes and generalizations are made to adapt it to the main application of this work. These are explained along with an examination of the complexity of each preconditioner, in terms of the inversion and of the MVP with its inverse.


6 OVERVIEW OF THE PRECONDITIONING TECHNIQUES

This whole part will discuss and propose several preconditioners that can be applied to the full problem described in Section 4.1. The preconditioners in this part are proposed for general Toeplitz, BTTB and BTTB-block systems, to make sure the results obtained in this part can be used in general.

Table 6.1 shows whether the different preconditioning techniques can be applied to a simple Toeplitz matrix, a BTTB matrix and a BTTB-block matrix. A check mark (✓) denotes that the method has been used in the literature before, a light bulb symbol denotes that the application of the method to this case was done in this work, and a cross (✗) means that during this work no way of applying the method to this case could be found, or that it was not considered.

Table 6.1: Applicability of different preconditioning methods.

Preconditioning Technique          Toeplitz matrix   BTTB matrix   BTTB-block matrix
Circulant                                 ✓                ✓                ✓
DST I - IV                                ✓                ✓                ✓
DCT I - IV                                ✓                ✓                ✓
Hartley                                   ✓                ✓                ✓
Diagonal                                  ✓                ✓                ✓
Banded with bandwidth > 2                 ✗                ✗                ✗
Inverse Generating Function               ✓                ✓
Kronecker, s = 1                          -                ✓
Kronecker, s > 2                          -                ✗                ✗
Kronecker, with approximate SVD                            ✓                ✓
Koyuncu                                   -                ✓                ✗

To simplify the notation of the subsequent chapters, T(Toep) will denote a simple Toeplitz matrix of size Nx × Nx. T(BTTB) will denote a BTTB matrix with Ny × Ny blocks, each of size Nx × Nx. Additionally, T(Block) denotes a 3 × 3 block matrix whose blocks are BTTB matrices with Ny × Ny blocks, each of size Nx × Nx.


7 FULL C PRECONDITIONER

The first and most obvious choice is to use the complete matrix C as a preconditioner (P = C). Compared to the subsequent preconditioners, which are based on C but are approximations of it, this choice is the best in terms of reducing the number of iterations.

However, each iteration is computationally expensive, as is the inversion of C. Nevertheless, using the complete matrix C can be a good choice for hard problems and is additionally an interesting reference point for the subsequent preconditioners.

7.1 application to full problem

Since C is used as a full matrix, without any changes, this approach is directly applicable to the full problem.

7.1.1 Inversion

An exact inversion of a BTTB-block matrix can be done using the Gaussian elimination algorithm. However, the complexity is O((NxNy)^3 Nz). So far, no exact inversion formulas with smaller complexity are known for BTTB-block matrices.

7.1.2 MVP

Since the inverse of a BTTB matrix is not BTTB, C^{-1} does not possess any structure that can be used for an optimized MVP. This means that the complexity of the MVP with C^{-1} is O((NxNy)^2 Nz).


8 CIRCULANT APPROXIMATION

8.1 circulant approximation for toeplitz matrices

It is known that a Toeplitz matrix T can be approximated well by a circulant matrix C [10, 12, 14, 54, 57]. This relates to condition C1 of a good preconditioner.

Additionally, it is well known [10, 17] that any circulant matrix C ∈ C^{n×n} can be diagonalized, such that

C = (F^{(n)})^H Λ^{(n)} F^{(n)} ,    (8.1)

where F^{(n)} is the Fourier matrix of order n, i.e., (F^{(n)})_{j,k} = (1/√n) e^{2πijk/n}, and Λ^{(n)} = diag(F^{(n)} c) is a diagonal matrix holding the eigenvalues of C, with c the first column of C. (Here diag(A) denotes a diagonal matrix whose diagonal is equal to the diagonal of A.)

Via this decomposition it is easy to compute the inverse of a circulant matrix as follows:

C^{-1} = (F^H Λ F)^{-1} = F^{-1} Λ^{-1} (F^H)^{-1} = F^H Λ^{-1} F .

Therefore, the inversion of an n × n circulant matrix can be done efficiently in O(n log n), which relates to condition C2 of the preconditioner. In the same way, the MVP can be computed in O(n log n), which satisfies condition C3:

C^{-1} x = F^H Λ^{-1} F x = F^H diag(F c)^{-1} F x .

(Since most FFT algorithms rely upon the factorization of n, the complexity increases if n is prime. However, specialized algorithms have been developed that guarantee a complexity of O(n log n) even when n is prime, e.g., [47].)
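The FFT-based solve described above fits in a few lines. The following is a minimal NumPy sketch (not code from this thesis; the function name is ours), checked against an explicitly assembled circulant matrix:

```python
import numpy as np

def circulant_solve(c, x):
    """Apply C^{-1} to x, where C is the circulant matrix whose first
    column is c.  Uses the diagonalization C = F^H diag(F c) F, so the
    whole solve is two FFTs plus a pointwise division: O(n log n)."""
    lam = np.fft.fft(c)                       # eigenvalues of C
    return np.fft.ifft(np.fft.fft(x) / lam)

# sanity check against an explicitly built circulant matrix
c = np.array([4.0, 1.0, 0.5, 1.0])
n = len(c)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
x = np.arange(1.0, n + 1)
assert np.allclose(C @ circulant_solve(c, x), x)
```

The MVP with C (rather than C^{-1}) is obtained by multiplying with `lam` instead of dividing by it.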

The mentioned properties of circulant matrices make them a suitable choice as preconditioners. There are different methods of approximating a Toeplitz matrix with a circulant one, which will be described in the following section.


8.1.1 Circulant Preconditioners

In this work, we will look at three different circulant preconditioners, each minimizing a certain norm:

• Strang's preconditioner C_S(T) [54]: minimizes ‖C − T‖_1 over Hermitian circulant matrices C, if T is Hermitian.

• T. Chan's (optimal) preconditioner C_C(T) [14]: minimizes ‖C − T‖_F.

• Tyrtyshnikov's (superoptimal) preconditioner C_T(T) [57]: minimizes ‖I − C^{-1} T‖_F.

Each preconditioner will be discussed in the following sections.

8.1.1.1 Strang’s Preconditioner

Strang's preconditioner [54] basically uses the first half of the first row of the original Hermitian Toeplitz matrix T and completes the rest of the first row with a flipped version of it, to make it circulant. In other words,

C_S(T) = circ([ (T_{1j})_{j=1}^{⌊n/2⌋+1}  flip((T_{i1})_{i=2}^{n−k+1})^T ]) ,

where C_S(T) denotes Strang's preconditioner for the Hermitian Toeplitz matrix T, and circ(x) the circulant matrix defined by x as its first row.

Example 8.1.1: Strang's preconditioner

Take T from Example 3.0.2,

    T = [ 9 1 3 6
          2 9 1 3
          4 2 9 1
          1 4 2 9 ] ,

then

    C_S(T) = circ((9 1 3 2)) = [ 9 1 3 2
                                 2 9 1 3
                                 3 2 9 1
                                 1 3 2 9 ] .

Incidentally, C_S(T) minimizes ‖C − T‖_1 and ‖C − T‖_∞ over all Hermitian circulant matrices C, for Hermitian matrices T [12].

It can be shown that C_S(T)^{-1} T has a clustered spectrum (see for example Chan and Jin [10]), thus resulting in a fast convergence of the preconditioned CG method.
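The first-row construction can be sketched as follows (a NumPy sketch, verified against Example 8.1.1; the function name is ours):

```python
import numpy as np

def strang(T):
    """Strang-type circulant preconditioner, following the first-row
    construction above: keep the first floor(n/2)+1 entries of the first
    row of T and fill the remainder with flipped first-column entries."""
    n = T.shape[0]
    k = n // 2 + 1                                    # entries kept from the first row
    r = np.concatenate([T[0, :k], T[1:n - k + 1, 0][::-1]])
    # circulant matrix with first row r: row i is r shifted right i times
    return np.array([np.roll(r, i) for i in range(n)])

T = np.array([[9, 1, 3, 6],
              [2, 9, 1, 3],
              [4, 2, 9, 1],
              [1, 4, 2, 9]])
assert (strang(T)[0] == [9, 1, 3, 2]).all()           # cf. Example 8.1.1
```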


8.1.1.2 T. Chan’s Optimal Preconditioner

Another very popular choice is Chan's optimal preconditioner C_C [14], which minimizes ‖A − C‖_F over all circulant matrices C for an arbitrary matrix A. (Chan's preconditioner is sometimes denoted c_F(T), highlighting its relation to the discrete Fourier transform.)

Using the decomposition from (8.1), we get

‖A − C‖_F = ‖A − F^H Λ F‖_F = ‖F A F^H − Λ‖_F ,

which is minimized for Λ = diag(F A F^H), and therefore, by (8.1),

C_C(A) = F^H diag(F A F^H) F .

If A is a Toeplitz matrix, the entries of the first row of C_C(A) can be given explicitly by

c_i = ( i a_{−n+i} + (n − i) a_i ) / n ,    i = 0, . . . , n − 1 ,

which is equal to averaging the corresponding diagonals of A [10, Eq. (2.6)].

Important properties of C_C(A) are that it inherits the positive definiteness of A and that, again, the spectrum of C_C(T)^{-1} T is clustered [10, 57].

Example 8.1.2: Chan's (optimal) preconditioner

Take again T from Example 3.0.2,

    T = [ 9 1 3 6
          2 9 1 3
          4 2 9 1
          1 4 2 9 ] ,

then

    C_C(T) = [ 9   1   3.5 3
               3   9   1   3.5
               3.5 3   9   1
               1   3.5 3   9 ] .
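The diagonal-averaging formula can be sketched in a few lines (a NumPy sketch using the first-column convention t_i = T[i,0]; checked against Example 8.1.2; the function name is ours):

```python
import numpy as np

def chan_optimal(T):
    """T. Chan's optimal circulant preconditioner via diagonal averaging.
    The first column is c_i = (i * t_{i-n} + (n - i) * t_i) / n, where
    t_i = T[i, 0] is the i-th subdiagonal entry and t_{i-n} = T[0, n-i]
    the superdiagonal entry it gets wrapped together with."""
    n = T.shape[0]
    c = np.array([(i * (T[0, n - i] if i else 0.0) + (n - i) * T[i, 0]) / n
                  for i in range(n)])
    # circulant matrix with first column c: column j is c shifted down j times
    return np.stack([np.roll(c, j) for j in range(n)], axis=1)

T = np.array([[9.0, 1, 3, 6],
              [2, 9, 1, 3],
              [4, 2, 9, 1],
              [1, 4, 2, 9]])
assert np.allclose(chan_optimal(T)[:, 0], [9, 3, 3.5, 1])   # cf. Example 8.1.2
```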

8.1.1.3 Tyrtyshnikov’s Superoptimal Preconditioner

Similar to Chan's preconditioner, we can try to minimize ‖I − C^{-1} A‖_F over all circulant matrices C for an arbitrary matrix A.


It can be shown [10] that such a preconditioner is related to Chan's preconditioner by

C_T(A) = C_C(A A^H) C_C(A^H)^{-1} .

Example 8.1.3: Tyrtyshnikov's (superoptimal) preconditioner

Take again T from Example 3.0.2,

    T = [ 9 1 3 6
          2 9 1 3
          4 2 9 1
          1 4 2 9 ] ,

then

    C_T(T) = [ 9.42   0.8142 3.3981 3.0040
               3.0040 9.42   0.8142 3.3981
               3.3981 3.0040 9.42   0.8142
               0.8142 3.3981 3.0040 9.42 ] .

Similar to the previous preconditioners, C_T(T)^{-1} T proves to have a clustered spectrum [10].
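Since both Chan factors in C_T(A) = C_C(AA^H) C_C(A^H)^{-1} share the Fourier eigenbasis, the superoptimal preconditioner can be assembled from two eigenvalue arrays. A NumPy sketch (function names are ours; as a sanity check, Chan's approximation is exact for a circulant input, so C_T then reproduces the input itself):

```python
import numpy as np

def chan_eigs(A):
    """Eigenvalues of T. Chan's optimal approximation: diag(F A F^H),
    with F the unitary DFT matrix."""
    n = A.shape[0]
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)
    return np.diag(F @ A @ F.conj().T)

def tyrtyshnikov(A):
    """C_T(A) = C_C(A A^H) C_C(A^H)^{-1}, combining the eigenvalue
    arrays of the two Chan approximations in the common Fourier basis."""
    n = A.shape[0]
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)
    lam = chan_eigs(A @ A.conj().T) / chan_eigs(A.conj().T)
    return F.conj().T @ np.diag(lam) @ F

# for a circulant A, C_C is exact, hence C_T(A) = A A^H (A^H)^{-1} = A
c = np.array([5.0, 1.0, 2.0, 1.0])
n = len(c)
A = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(tyrtyshnikov(A), A)
```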

8.2 circulant approximation for bttb matrices

The goal of this section is to generalize the method of circulant approximations to BTTB matrices. Each level will be handled separately, starting with the upper level, which is equivalent to a block-Toeplitz matrix.

A BTTB matrix can then be approximated with a BCCB matrix by applying the Toeplitz-block and the block-Toeplitz approximations consecutively. Chan and Jin [9] showed that approximating both levels separately with Chan's optimal preconditioner is equivalent to solving

min_{C(BCCB)} ‖T(BTTB) − C(BCCB)‖_F ,

where C(BCCB) ranges over all BCCB matrices.


8.2.1 Toeplitz-block Matrices

A Toeplitz-block matrix is of the form

    T_{(TB)}^{(m;n)} = [ T_{1,1;} T_{1,2;} ... T_{1,m;}
                         T_{2,1;} T_{2,2;} ... T_{2,m;}
                         ...      ...      ... ...
                         T_{m,1;} T_{m,2;} ... T_{m,m;} ] ,

where T_{i,j;} is Toeplitz for i, j = 1, 2, . . . , m.

Therefore, a natural choice is to approximate each Toeplitz matrix T_{i,j;} with its circulant approximation C(T_{i,j;}) [15].

The resulting matrix is of the form

    C(T_{(TB)}^{(m;n)}) = [ C(T_{1,1;}) C(T_{1,2;}) ... C(T_{1,m;})
                            C(T_{2,1;}) C(T_{2,2;}) ... C(T_{2,m;})
                            ...         ...         ... ...
                            C(T_{m,1;}) C(T_{m,2;}) ... C(T_{m,m;}) ] .

Figure 8.1 shows an example of a Toeplitz-block matrix and its approximation with a circulant-block matrix.

Figure 8.1: Color plots for a Toeplitz-block matrix and its circulant-block approximation. (a) A sample Toeplitz-block matrix, with 6 × 6 blocks each of size 4 × 4; the Toeplitz structure is clearly visible. (b) The circulant approximation of the matrix in (a); the Toeplitz structure of each block has been approximated by a circulant one.

8.2.2 Block-Toeplitz Matrices

The case of a block-Toeplitz matrix can be handled identically, after first transforming it into a Toeplitz-block matrix.


A block-Toeplitz matrix has the form

    T_{(BT)}^{(m;n)} = [ A_{0;}   A_{−1;}  ... A_{1−m;}
                         A_{1;}   A_{0;}   ... A_{2−m;}
                         ...      ...      ... ...
                         A_{m−1;} A_{m−2;} ... A_{0;} ] ,

where the blocks A_{k;} are arbitrary.

We can now define a permutation matrix P such that

(T_{(TB)}^{(n;m)})_{k,l;i,j} := (P T_{(BT)}^{(m;n)} P^H)_{k,l;i,j} = (T_{(BT)}^{(m;n)})_{i,j;k,l} .

That means that, after applying the permutation

T_{(TB)} = P T_{(BT)} P^H ,    (8.2)

a block-Toeplitz matrix T_{(BT)} will become a Toeplitz-block matrix T_{(TB)}. Note that the number of blocks and the size of the blocks will be swapped between those two matrices.

This means that a suitable approximation method for block-Toeplitz matrices T_{(BT)} can be realized via the following steps:

1. Transformation: Transform the block-Toeplitz matrix into a Toeplitz-block matrix by applying the permutation defined in (8.2): T_{(TB)} = P T_{(BT)} P^H.

2. Toeplitz-block approximation: Apply the approximation C(T_{(TB)}) as defined in Section 8.2.1.

3. Back transformation: Transform the result of the previous step back by applying C(T_{(BT)}) = P^H C(T_{(TB)}) P.
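In code, the permutation P is a perfect shuffle, and conjugation with it amounts to a reshape-transpose (a NumPy sketch; `swap_levels` is our name, and since P is real, P^H reduces to the same index swap):

```python
import numpy as np

def swap_levels(X, m, n):
    """Conjugation by the perfect-shuffle permutation of (8.2): X is an
    m x m grid of n x n blocks; the result is an n x n grid of m x m
    blocks with the two index levels swapped, i.e. Y[k*m+i, l*m+j] =
    X[i*n+k, j*n+l]."""
    return X.reshape(m, n, m, n).transpose(1, 0, 3, 2).reshape(m * n, m * n)

# a block-Toeplitz matrix with arbitrary blocks A_{1-m}, ..., A_{m-1}
rng = np.random.default_rng(0)
m, n = 3, 2
A = {k: rng.standard_normal((n, n)) for k in range(1 - m, m)}
X = np.block([[A[i - j] for j in range(m)] for i in range(m)])

Y = swap_levels(X, m, n)                       # now Toeplitz-block
for k in range(n):                             # every m x m block of Y is Toeplitz
    for l in range(n):
        B = Y[k * m:(k + 1) * m, l * m:(l + 1) * m]
        assert np.allclose(B[1:, 1:], B[:-1, :-1])
assert np.allclose(swap_levels(Y, n, m), X)    # back transformation (step 3)
```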

The result of such an approximation is illustrated in Figure 8.2.

Figure 8.2: Color plots for a block-Toeplitz matrix and its block-circulant approximation. (a) A sample block-Toeplitz matrix, with 4 × 4 blocks each of size 6 × 6. (b) The circulant approximation of the matrix in (a).


8.3 application to bttb-block matrices

As stated in the previous sections, we can approximate each BTTB matrix by a BCCB one. The resulting matrix is therefore BCCB-block. As the following two sections will describe, no further approximation on the 3 × 3 symmetric level is required to achieve an efficient inversion and MVP in that case.

8.3.1 Inversion

To compute the inverse, we can use the block-inversion formulas. Let A be a 3 × 3 block matrix whose blocks are BCCB matrices of size NxNy × NxNy, i.e.,

    A = [ A_{1,1;} A_{1,2;} A_{1,3;}
          A_{2,1;} A_{2,2;} A_{2,3;}
          A_{3,1;} A_{3,2;} A_{3,3;} ] ,    where each A_{i,j;} is a BCCB matrix.

This 3 × 3 block matrix can be condensed into a 2 × 2 block matrix, on which the known block inversion formula (see for example (2.8.25) in [5]) can be used. (For the full problem discussed in Section 4.1, the matrix A described here is equal to one of the blocks of C.)

    A = [ W Q
          R S ] ,

where

    W = [ A_{1,1;} A_{1,2;}
          A_{2,1;} A_{2,2;} ] ,

    Q = [ A_{1,3;}
          A_{2,3;} ] ,

    R = [ A_{3,1;} A_{3,2;} ] ,

    S = A_{3,3;} .

Then, by applying the block inversion formula, one gets

    A^{-1} = [ W Q
               R S ]^{-1}
           = [ W^{-1} + W^{-1} Q Z R W^{-1}   −W^{-1} Q Z
               −Z R W^{-1}                    Z          ] ,

where

    Z = (S − R W^{-1} Q)^{-1}


is the Schur complement.

The computation requires W^{-1}, which can be computed by applying the block inversion formula again, this time to W, thus

    W^{-1} = [ A_{1,1;} A_{1,2;}
               A_{2,1;} A_{2,2;} ]^{-1}
           = [ A_{1,1;}^{-1} + A_{1,1;}^{-1} A_{1,2;} Z′ A_{2,1;} A_{1,1;}^{-1}   −A_{1,1;}^{-1} A_{1,2;} Z′
               −Z′ A_{2,1;} A_{1,1;}^{-1}                                         Z′                        ] ,

where

    Z′ = (A_{2,2;} − A_{2,1;} A_{1,1;}^{-1} A_{1,2;})^{-1}

is the Schur complement.

Thus, computing A^{-1} requires the computation of Z and Z′. Since BCCB matrices form an algebra, Z and Z′ are BCCB matrices as well, and their computation can be done efficiently in the preprocessing phase. (The fact that BCCB matrices form an algebra can be checked easily by using the decomposition via the Fourier transformation, see (8.1).)

The total computation of A^{-1} requires multiple multiplications, inversions and additions of BCCB matrices, so the total costs are O((NxNy) log(NxNy)).
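The reason all these BCCB operations are cheap is that every BCCB matrix is diagonalized by the 2D FFT, so products, sums, inverses, and hence the Schur complements Z and Z′ reduce to pointwise operations on eigenvalue arrays. A minimal NumPy sketch of the basic solve (function name and setup are ours):

```python
import numpy as np

def bccb_solve(a, x):
    """Solve C y = x, where C is the BCCB matrix generated by the
    Ny x Nx array a (its first column, reshaped).  C acts as a 2D
    circular convolution, so fft2 diagonalizes it: O(N log N)."""
    lam = np.fft.fft2(a)                      # eigenvalues of C
    return np.fft.ifft2(np.fft.fft2(x.reshape(a.shape)) / lam).ravel()

# sanity check against an explicitly assembled BCCB matrix
Ny, Nx = 3, 4
rng = np.random.default_rng(1)
a = rng.standard_normal((Ny, Nx))
a[0, 0] = 1.0 + np.abs(a).sum()               # keep C safely invertible
C = np.array([[a[(i1 - j1) % Ny, (i2 - j2) % Nx]
               for j1 in range(Ny) for j2 in range(Nx)]
              for i1 in range(Ny) for i2 in range(Nx)])
x = rng.standard_normal(Ny * Nx)
assert np.allclose(C @ bccb_solve(a, x), x)
```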

8.3.2 MVP

As seen in the previous section, if A is a BCCB-block matrix, so is A^{-1}. The MVP with a BCCB matrix can be computed in O((NxNy) log(NxNy)).


9 INVERSE GENERATING FUNCTION APPROACH

As described in Chapter 3, a Toeplitz matrix can be associated with a generating function (see (3.2)). It has been shown [11, 34] that, under certain assumptions on f, T[1/f] is a good preconditioner for T[f].

The first section will describe the approach using the inverse generating function (IGF) for Toeplitz matrices. However, the method works almost identically for multi-level Toeplitz matrices, such as BTTB matrices. For the BTTB case, the main difference is that the generating function will be bivariate.

The subsequent section will focus on the generalization for the Toeplitz-block and BTTB-block case. It will also include a proof that the eigenvalues of the preconditioned system are clustered in that case as well.

9.1 inverse generating function for toeplitz and bttb matrices

In many applications, a generating function is given and, from it, the corresponding Toeplitz or BTTB matrix is computed (see the upper arrow a) in Figure 9.1). (In the case of a BTTB matrix, the generating function is bivariate, as shown in Figure 9.1; the procedure, however, is almost identical whether a Toeplitz or a BTTB matrix is used.) Computing the inverse of the two-level Toeplitz matrix generated by f(x,y) is expensive and therefore a step that should be avoided (see the downward arrow on the right side in Figure 9.1).

Instead, the inverse of the generating function can be computed (step b)) and the Toeplitz matrix generated by 1/f can be computed. As shown by Chan and Ng [11] and Lin and Wang [34], this matrix is a good approximation of T[f]^{-1} if certain conditions are met. (Under the assumption that f ∈ C_{2π} is positive, Chan and Ng [11] have shown that the eigenvalues of T[1/f] T[f] are clustered around one, which (indirectly) satisfies condition C1 for a good preconditioner. Lin and Wang [34] showed the same for BTTB matrices.)

The IGF approach works in three steps, if the computation of T[f] from f is included. The steps are illustrated in Figure 9.1 and are:

A Building T[f] by computing the Fourier coefficients of f (see (3.1) or 3.1).

B Computing the inverse of f.


C Computing the matrix generated by 1/f, i.e., T[1/f].

Figure 9.1: Illustration of the inverse generating function approach (marked in red). A Fourier transformation (A) maps f(x,y) to T[f(x,y)]; an inversion (B) maps f(x,y) to 1/f(x,y); a Fourier transformation (C) maps 1/f(x,y) to T[1/f(x,y)], avoiding the expensive direct inversion of T[f(x,y)].

In the next sections, the required steps A to C of the IGF approach will be replaced with numerical alternatives (Ã to C̃). This is necessary in cases where the analytical way cannot be used or is too computationally expensive. In the cases relevant to this work, all three alternatives have to be applied in order to implement this approach (see Figure 9.4). Note that all the suggested alternatives can be computed efficiently with the use of the FFT.

9.1.1 Unknown Generating Function

In some cases, the generating function is not (explicitly) known, but only the (multilevel) Toeplitz matrix is. In these cases, the starting point of the IGF method is the matrix itself. Therefore, the direction of step A has to be reversed (see Figure 9.2) and, consequently, the first step has to be

Ã Approximating the (actual) generating function with an approximation f̃, using the matrix elements.

This can be done in a variety of ways, the most straightforward being

f̃ = Σ_{k=−(Nx−1)}^{Nx−1} t_k e^{ikx} ,    (9.1)

for an Nx × Nx (one-level) Toeplitz matrix. This approach can be easily generalized to BTTB and other multilevel Toeplitz matrices. Computing the approximation f̃ as in (9.1) is equivalent to

f̃ = D_{Nx−1} ∗ f ,


where D_{Nx−1} denotes the Dirichlet kernel of order Nx − 1 and f the exact generating function [7, pp. 1011–1016].

Figure 9.2: Illustration of the inverse generating function approach for unknown generating functions, with the changes marked in red. Step A is replaced by an approximation step Ã, which recovers f̃ from the matrix elements of T[f(x,y)]; inversion (B) and Fourier transformation (C) then proceed as before.

Using alternative kernels of the specific form f̃ = K ∗ f = Σ_{k=−(Nx−1)}^{Nx−1} b_k e^{ikx}, for some coefficients b_k, such as the Fejér kernel [7, pp. 1016–1020], is an alternative choice and worth investigating in the future.

If the used kernel is of the form above, then f̃ can be computed using an FFT in O(Nx log Nx) operations.

9.1.2 Numerical Integration for Computing the Fourier Coefficients

In general, the Fourier coefficients of 1/f cannot be computed analytically, which is why a numerical alternative is described in this section (see Figure 9.3).


Figure 9.3: Illustration of the inverse generating function approach with numerical integration (highlighted in red). The Fourier transformation in step C is replaced by a numerical integration mapping 1/f(x,y) to T[1/f(x,y)].

In order to get the elements of T[1/f], the following integral (in the simple one-level Toeplitz case) has to be computed:

(1/2π) ∫_{−π}^{π} (1/f(x)) e^{−ikx} dx .

Instead of an analytical integration, this can be transformed, using the rectangular (also called mid-point) rule, into a numerical integration that requires only point evaluations of f(x), i.e.,

(1/(sNx)) Σ_{j=0}^{sNx−1} (1/f(2πj/(sNx) − π)) e^{−ik(2πj/(sNx)−π)} ,    (9.2)

where s is the sampling rate of the rectangular rule. In general, more accurate (and more complicated) methods of numerical integration could be applied, such as higher-order Newton–Cotes formulae [8, Sec. 4.3].

For the rectangular rule, the numerical integration step can again be computed using an FFT of order sNx.

9.1.3 Numerical Inversion of the Generating Function

Sometimes the generating function f is not given explicitly, but only allows for computationally efficient function evaluations. However, if we use both numerical alternatives from the previous sections, function evaluations are sufficient to compute T[1/f] (see Figure 9.4). Therefore, we can change (9.1) to

f̃_grid(x_j) = Σ_{k=−(n−1)}^{n−1} t_k e^{ikx_j} ,

where x_j = 2πj/(sn) − π is one of the sampling points. Thus, 1/f̃_grid can be computed by inverting pointwise, and be used in (9.2).
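Putting the three numerical steps together, the whole pipeline fits in a few lines. The following is a sketch for the one-level case (all names are ours; the closed-form coefficients of 1/(3 + 2 cos x) serve as a check):

```python
import numpy as np

def igf_coeffs(t, s=16):
    """Numerical IGF: t holds the Toeplitz coefficients t_k for
    k = -(n-1)..(n-1), stored as t[k + n - 1] = t_k.  Returns the
    approximate Fourier coefficients of 1/f on the same index range,
    i.e. the coefficients generating T[1/f]."""
    n = (len(t) + 1) // 2
    ks = np.arange(-(n - 1), n)
    x = 2 * np.pi * np.arange(s * n) / (s * n) - np.pi   # sampling grid
    f = np.exp(1j * np.outer(x, ks)) @ t                 # sample f on the grid
    g = 1.0 / f                                          # pointwise inverse
    # rectangular rule for (1/2pi) * int g(x) e^{-ikx} dx, cf. (9.2)
    return np.array([(g * np.exp(-1j * k * x)).mean() for k in ks])

# f(x) = 3 + 2 cos(x): the zeroth coefficient of 1/f is 1/sqrt(9 - 4)
b = igf_coeffs(np.array([1.0, 3.0, 1.0]), s=32)
assert abs(b[1] - 1 / np.sqrt(5)) < 1e-8                 # b_0 = 1/sqrt(5)
assert abs(b[0] - b[2]) < 1e-10                          # symmetry b_{-1} = b_1
```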

Figure 9.4: Illustration of the inverse generating function approach using a sampled generating function. The approximation step Ã, a pointwise inversion (B) and a numerical integration (C) replace the analytical steps.

9.1.4 Example

Figure 9.5 shows an example of the result of the IGF method. The top image shows the inverse of the original BTTB matrix, while the bottom figure shows the BTTB matrix generated by the inverse generating function. It is easily visible that, in this case, both matrices are quite similar.

9.1.5 Efficient Inversion and MVP

Besides being a good approximation, which was shown in [11] and [34], the preconditioner also needs to fulfill the conditions regarding the inversion and its MVP. Condition C2 is satisfied, since no inverse needs to be computed. As shown in the previous sections, all the steps of the inverse generating function method can be computed efficiently using FFTs.

Figure 9.5: Color plots for the inverse of the original BTTB matrix, T[f]^{-1}, the result of the inverse generating function method, T[1/f], and the difference between those two.

The last condition C3 requires an efficient MVP with T[1/f], which is satisfied trivially, since T[1/f] has the same structure as T[f] and therefore the same costs in an MVP. For a BTTB matrix with Ny × Ny blocks, each of size Nx × Nx, the costs for an MVP are O(NxNy log(NxNy)).
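The fast MVP rests on circulant embedding; in the one-level case it looks as follows (a standard sketch, not code from this thesis: embed the n × n Toeplitz matrix into a 2n × 2n circulant and multiply via FFT):

```python
import numpy as np

def toeplitz_mvp(col, row, x):
    """Compute T x in O(n log n), where T is the n x n Toeplitz matrix
    with first column `col` and first row `row` (col[0] == row[0])."""
    n = len(x)
    # first column of the 2n x 2n circulant embedding of T
    c = np.concatenate([col, [0.0], row[:0:-1]])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n]

col = np.array([9.0, 2.0, 4.0, 1.0])     # T from Example 3.0.2
row = np.array([9.0, 1.0, 3.0, 6.0])
n = len(col)
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
x = np.arange(1.0, n + 1)
assert np.allclose(toeplitz_mvp(col, row, x), T @ x)
```

For a BTTB matrix, the same embedding is applied on both levels, yielding the O(NxNy log NxNy) cost stated above.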

9.2 inverse generating function for bttb-block matrices

In this section, the IGF approach will be generalized to (multi-level) Toeplitz-block matrices. First, the general approach of the IGF method for block matrices will be described. Subsequently, a proof is provided showing that the eigenvalues of the preconditioned system are clustered around one. This proof is currently in preparation for publication [50]; it is similar to the proofs provided by Chan and Ng [11] and Lin and Wang [34], and a generalization of them.


9.2.1 General Approach

Let us introduce a matrix-valued generating function

    F(s) = [ f_{11}(s) f_{12}(s) ... f_{1M}(s)
             f_{21}(s) f_{22}(s) ... ...
             ...       ...       ... ...
             f_{M1}(s) f_{M2}(s) ... f_{MM}(s) ] ,    (9.3)

and associate it with the corresponding Toeplitz-block matrix generated by F(s),

    T[F(s)] = [ T[f_{11}(s)] T[f_{12}(s)] ... T[f_{1M}(s)]
                T[f_{21}(s)] T[f_{22}(s)] ... ...
                ...          ...          ... ...
                T[f_{M1}(s)] T[f_{M2}(s)] ... T[f_{MM}(s)] ] .    (9.4)

(Note that in definition (9.4), T[F] denotes a Toeplitz-block matrix, while T[F] could also be referring to a block-Toeplitz matrix, see for example [51, Eq. (1) and (2)]. However, both definitions are similar, since they are the result of a permutation (see (8.2)). Therefore, the results described here also apply to standard block-Toeplitz matrices and, vice versa, the analysis of block-Toeplitz matrices can be used here.)

We can define the matrix-valued inverse generating function F^{-1}(s) (if the inverse exists) and use T[F^{-1}] as a preconditioner, analogously to before. This method is illustrated in Figure 9.6.

Figure 9.6: Illustration of the inverse generating function approach for Toeplitz-block matrices. The approximation step Ã, a pointwise matrix inversion (B) and a numerical integration (C) lead from T[F(x,y)] to T[F(x,y)^{-1}], avoiding the direct inversion of T[F(x,y)].
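Compared to the scalar case, only step B changes: the pointwise inverse becomes a pointwise matrix inverse, applied over the M × M components, followed by an entrywise numerical integration. A NumPy sketch (all names are ours; a diagonal F(x) with known inverse coefficients serves as a check):

```python
import numpy as np

def igf_block_coeffs(F_samples, n):
    """Matrix-valued numerical IGF: F_samples has shape (N, M, M),
    holding F(x_j) on the grid x_j = 2*pi*j/N - pi.  Returns the
    coefficients generating T[F^{-1}], shape (2n-1, M, M), for
    indices k = -(n-1)..(n-1)."""
    N = F_samples.shape[0]
    x = 2 * np.pi * np.arange(N) / N - np.pi
    G = np.linalg.inv(F_samples)                   # pointwise matrix inverse
    ks = np.arange(-(n - 1), n)
    # rectangular rule, applied entrywise over the M x M components
    return np.array([(G * np.exp(-1j * k * x)[:, None, None]).mean(axis=0)
                     for k in ks])

# F(x) = diag(3 + 2 cos x, 5): F^{-1} is diagonal with known coefficients
N, n = 64, 2
x = 2 * np.pi * np.arange(N) / N - np.pi
F = np.zeros((N, 2, 2))
F[:, 0, 0] = 3 + 2 * np.cos(x)
F[:, 1, 1] = 5.0
B = igf_block_coeffs(F, n)
assert abs(B[1, 0, 0] - 1 / np.sqrt(5)) < 1e-8   # b_0 of 1/(3 + 2 cos x)
assert abs(B[1, 1, 1] - 0.2) < 1e-9              # b_0 of the constant entry
```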

9.2.2 Preliminaries

In this section, smaller lemmas and their proofs are reproduced that are needed in Section 9.2.3, starting with some definitions that will simplify the nomenclature later on. If not stated otherwise, ‖·‖ denotes the 2-norm.


Definition 9.2.1: min_s λ_min(F(s))

For any matrix-valued function F ∈ L^1 with F(s) = F(s)^H, we define

min_s λ_min(F(s)) ≡ sup { y ∈ R : λ_1(F(s)) > y for a.e. s ∈ [−π, π] } ,

where λ_j(F(s)), j = 1, . . . , n, are the eigenvalues of F(s), sorted in non-decreasing order.

Roughly speaking, this denotes the smallest eigenvalue of F over all s ∈ [−π, π].

Definition 9.2.2: max_s λ_max(F(s))

For any matrix-valued function F ∈ L^1 with F(s) = F(s)^H, we define

max_s λ_max(F(s)) ≡ inf { y ∈ R : λ_n(F(s)) ≤ y for a.e. s ∈ [−π, π] } ,

where λ_j(F(s)), j = 1, . . . , n, are the eigenvalues of F(s), sorted in non-decreasing order.

Roughly speaking, this denotes the largest eigenvalue of F over all s ∈ [−π, π].

The next two lemmas refer to the linearity of T[·] and the fact that the Hermitian structure is preserved under T[·]. Both lemmas can be checked easily by utilizing (3.1).

Lemma 9.2.3: Linearity of T[·]

T[·] is a linear mapping, such that T[a·A + b·B] = a·T[A] + b·T[B].

Lemma 9.2.4: Hermitian structure of T[F]

Let F be a matrix-valued Hermitian function, i.e., f_{uv} = (f_{vu})^*. Then T[F] will be Hermitian, i.e.,

(T[f_{uv}])_w = ((T[f_{vu}])_{−w})^* .

In the case of a scalar-valued function f, the well-known Grenander and Szegő theorem (see, for example, [10, p. 13]) provides much information on the distribution of the eigenvalues of T[f]. An extension for block-Toeplitz matrices (see [55]) provides a similar result for our case.

Lemma 9.2.5: Distribution of Eigenvalues

Let F be Hermitian and λ be an eigenvalue of T[F]. Then it holds that

min_s λ_min(F(s)) ≤ λ ≤ max_s λ_max(F(s)) .

Proof. We can directly use [39, Thm. 3.1] or [51, Sec. 2] in combination with the fact that the block-Toeplitz matrix in those papers can be transformed by a similarity transformation into a Toeplitz-block matrix (following definition (9.4)).

We will use the following result in Section 9.2.3 to show the clustering of the eigenvalues of the preconditioned system, in particular to quantify the impact of a small-norm perturbation on the spectrum.

Lemma 9.2.6: Bauer–Fike Theorem for HPD matrices

Let A be Hermitian positive definite (HPD) and let µ be an eigenvalue of A + E. Then there exists a λ, which is an eigenvalue of A, such that

|λ − µ| ≤ ‖E‖ .

Proof. For a proof, see [48, pp. 59–60] in combination with [48, Thm. 1.8].

This section closes with some lemmas that will be used numerous times during the proof of the clustering of the eigenvalues.

Lemma 9.2.7: Sum of Hermitian matrices

If A and B are Hermitian, so is A + B.

Proof. (A + B)^H = A^H + B^H = A + B

The next lemma is a general inequality between the spectral norm and the Frobenius norm.

Lemma 9.2.8: 2-Norm and Frobenius norm

‖A‖_2 ≤ ‖A‖_F


Proof. We can write the Frobenius norm as $\|A\|_F^2 = \sum_{j=1}^{n} \|Ae_j\|_2^2$. At the same time, for an arbitrary $x$ with $\|x\|_2 = 1$, i.e. $\sum_{j=1}^{n} |x_j|^2 = 1$, we have

$$\|Ax\|_2^2 = \Big\|\sum_{j=1}^{n} x_j A e_j\Big\|_2^2 \le \Big(\sum_{j=1}^{n} |x_j|\,\|Ae_j\|_2\Big)^2 \le \sum_{j=1}^{n} |x_j|^2 \sum_{j=1}^{n} \|Ae_j\|_2^2 = \sum_{j=1}^{n} \|Ae_j\|_2^2 = \|A\|_F^2,$$

where the first inequality is the triangle inequality and for the second one we used the Cauchy–Schwarz inequality. Since this is true for arbitrary $x$, it is also true for $\max_{\|x\|_2=1} \|Ax\|_2^2 = \|A\|_2^2$.
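As a quick numerical sanity check of this inequality, the following sketch compares the two norms for an arbitrary random complex matrix (the matrix and its size are purely illustrative):

```python
import numpy as np

# Check ||A||_2 <= ||A||_F for a random complex matrix.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 7)) + 1j * rng.standard_normal((5, 7))
spec = np.linalg.norm(A, 2)      # spectral norm = largest singular value
frob = np.linalg.norm(A, "fro")  # Frobenius norm = sqrt of sum of |a_ij|^2
assert spec <= frob + 1e-12
```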

Lemma 9.2.9: Sub-multiplicativity

All induced matrix norms are sub-multiplicative, i.e. $\|AB\| \le \|A\|\,\|B\|$.

Proof.

$$\|AB\| = \max_{\|x\|=1}\|(AB)x\| = \max_{\|x\|=1}\|A(Bx)\| \le \max_{\|x\|=1}\|A\|\,\|Bx\| = \|A\|\max_{\|x\|=1}\|Bx\| = \|A\|\,\|B\|.$$

Lemma 9.2.10: Eigenvalues of the inverse

If the matrix $A$ has the eigenvalue $\lambda$, then $A^{-1}$ has the eigenvalue $\lambda^{-1}$.

Proof. If $\lambda$ is an eigenvalue of $A$, then $Av = \lambda v$. Multiplying both sides with $A^{-1}$ gives $A^{-1}Av = \lambda A^{-1}v$, from which it directly follows that $A^{-1}v = \frac{1}{\lambda}v$.

Lemma 9.2.11: Rank

$\operatorname{rank}(AB) \le \min\big(\operatorname{rank}(A), \operatorname{rank}(B)\big)$


Proof. We can identify the matrix-vector product $Ax$ with the linear transform $A(x)$. Then $\operatorname{rank}(AB) = \operatorname{rank}(A(B(x))) \le \operatorname{rank}(A)$. By the same argument, $\operatorname{rank}(DC) \le \operatorname{rank}(D)$. Setting $C = A^T$ and $D = B^T$, we get $\operatorname{rank}(DC) = \operatorname{rank}\big((AB)^T\big) \le \operatorname{rank}(B^T) = \operatorname{rank}(B)$.

We later need the fact that a similarity transformation $A \mapsto P^{-1}AP$ does not change the eigenvalues.

Lemma 9.2.12: Similarity transform

If $\lambda$ is an eigenvalue of $A$, then it is also an eigenvalue of $\tilde{A} = P^{-1}AP$ for any invertible matrix $P$.

Proof. Let $Av = \lambda v$. Then for $\tilde{v} = P^{-1}v$, $\tilde{A}\tilde{v} = (P^{-1}AP)P^{-1}v = P^{-1}Av = \lambda P^{-1}v = \lambda\tilde{v}$.

9.2.3 Proof of Clustering of the Eigenvalues

We first show that the rank of $T[P^{-1}]\,T[P] - I$ is bounded by $2KM$ if all entries of $P(s)$ are trigonometric polynomials of degree $K$ or smaller. We follow the proof of Lemma 2 from [11] and generalize it to the Toeplitz-block case.

Lemma 9.2.1

Let $p_{uv}$, $1 \le u,v \le M$, be trigonometric polynomials of degree $K$ in $C_{2\pi}$, i.e.,

$$p_{uv}(s) = \sum_{k=-K}^{K} p_k^{uv}\, e^{iks}.$$

Define

$$P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1M} \\ p_{21} & p_{22} & \dots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1} & p_{M2} & \dots & p_{MM} \end{pmatrix},$$

and assume its invertibility. Then for $n > 2K$, $\operatorname{rank}(T[P^{-1}]\,T[P] - I) \le 2KM$, where $T[P^{-1}], T[P] \in \mathbb{C}^{nM \times nM}$ and $I$ denotes the identity matrix of appropriate size.


Proof. Let

$$R(s) = P(s)^{-1}, \qquad (9.5)$$

with its entries

$$r_{uv}(s) = \sum_{k=-\infty}^{\infty} r_k^{uv}\, e^{iks}.$$

Equation (9.5) implies

$$R(s)\,P(s) = I.$$

Therefore

$$\sum_{m=1}^{M} r_{um}(s)\,p_{mv}(s) = \delta_{u-v} = \sum_{l=-\infty}^{\infty} \delta_{u-v}\,\delta_l\, e^{ils}, \qquad (9.6)$$

where $\delta_i$ is the Kronecker delta, which is $1$ if $i = 0$ and $0$ otherwise. On the other hand,

$$\begin{aligned}
\sum_{m=1}^{M} r_{um}(s)\,p_{mv}(s) &= \sum_{m=1}^{M}\Bigg(\sum_{k'=-\infty}^{\infty} r_{k'}^{um}\, e^{ik's}\Bigg)\Bigg(\sum_{k=-K}^{K} p_{k}^{mv}\, e^{iks}\Bigg)\\
&= \sum_{m=1}^{M}\Bigg(\sum_{k'=-\infty}^{\infty}\sum_{k=-K}^{K} r_{k'}^{um}\, p_{k}^{mv}\, e^{i(k'+k)s}\Bigg)\\
[\,k' = l-k\,]\quad &= \sum_{m=1}^{M}\Bigg(\sum_{l=-\infty}^{\infty}\sum_{k=-K}^{K} r_{l-k}^{um}\, p_{k}^{mv}\, e^{ils}\Bigg)\\
&= \sum_{l=-\infty}^{\infty}\Bigg(\sum_{m=1}^{M}\sum_{k=-K}^{K} r_{l-k}^{um}\, p_{k}^{mv}\Bigg) e^{ils}. \qquad (9.7)
\end{aligned}$$

Comparing the coefficients of $e^{ils}$ on the right-hand sides of (9.6) and (9.7), we see that

$$\sum_{m=1}^{M}\sum_{k=-K}^{K} r_{l-k}^{um}\, p_{k}^{mv} = \delta_{u-v}\,\delta_l = \begin{cases} 1, & \text{if } u = v \text{ and } l = 0,\\ 0, & \text{otherwise.} \end{cases}$$

Hence for $n > 2K$, the entries of $T[P^{-1}]\,T[P] - I$ are all zero, except for entries in the first and last $K$ columns of each Toeplitz block. Thus $\operatorname{rank}(T[P^{-1}]\,T[P] - I) \le 2KM$.
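The scalar case ($M = 1$) of this lemma can be illustrated numerically. The sketch below assumes the Fourier coefficients of $1/p$ are computed accurately enough by an FFT on a fine grid; the polynomial coefficients are chosen arbitrarily so that $p$ has no zeros. With $K = 2$ and $n = 16$, $T[1/p]\,T[p] - I$ should have numerical rank at most $2K$:

```python
import numpy as np

# Scalar case (M = 1): for a trig polynomial p of degree K with no zeros,
# T[1/p] T[p] - I has rank at most 2K.
K, n, N = 2, 16, 512
pk = {0: 3.0, 1: 0.5, -1: 0.5, 2: 0.2, -2: 0.2}  # illustrative coefficients

s = 2 * np.pi * np.arange(N) / N
p_vals = sum(c * np.exp(1j * k * s) for k, c in pk.items())
r_hat = np.fft.fft(1.0 / p_vals) / N             # Fourier coefficients of 1/p

def toeplitz_from(coef, n):
    # (T[f])_{j,l} = f_{j-l}
    return np.array([[coef(j - l) for l in range(n)] for j in range(n)])

Tp = toeplitz_from(lambda k: pk.get(k, 0.0), n)
Tr = toeplitz_from(lambda k: r_hat[k % N], n)

sv = np.linalg.svd(Tr @ Tp - np.eye(n), compute_uv=False)
assert np.sum(sv > 1e-6) <= 2 * K                # at most 2K "large" singular values
```

Only the first and last $K$ columns of $T[1/p]\,T[p] - I$ carry nonzero entries, exactly as the coefficient comparison above predicts.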

We now slightly deviate from [11, Lem. 3] and instead follow [34, Lem. 2] to show that $T[F^{-1}] - T^{-1}[F]$ can be written as the sum of a low-rank matrix $G_n$ and a matrix $H_n$ of small norm.


Lemma 9.2.2

Define $F(s)$ as in (9.3), where each $f_{uv} \in C_{2\pi}$, $1 \le u,v \le M$, and $F_{\min} > 0$. Then for all $\varepsilon > 0$, there exist positive integers $N$ and $K$ such that for all $n > N$,

$$T[F^{-1}] - T^{-1}[F] = G + H,$$

where $\operatorname{rank}(G) \le 2KM$ and $\|H\| < \varepsilon$.

Proof. 1. Since $f_{uv} \in C_{2\pi}$, $1 \le u,v \le M$, by the Weierstrass approximation theorem (see [16, pp. 4–6]), given $\varepsilon > 0$ there exists a trigonometric polynomial

$$p_{uv}(s) = \sum_{k=-K}^{K} p_k^{uv}\, e^{iks},$$

such that

$$\|f_{uv}(s) - p_{uv}(s)\| \equiv \max_s |f_{uv}(s) - p_{uv}(s)| \le \varepsilon, \qquad 1 \le u,v \le M.$$

2. Since $F$ and $P$ are Hermitian, $F - P$ is also Hermitian. We can use the linearity, the Hermitian structure and the distribution of the eigenvalues (see Lemma 9.2.5) to show that the matrices $T[F]$ and $T[P]$ can be made arbitrarily close:

$$\|T[F] - T[P]\| = \|T[F-P]\| = \max_k \big|\lambda_k\big(T[F-P]\big)\big| \le \max\Big(\big|\min_s \lambda_{\min}(F(s)-P(s))\big|,\ \big|\max_s \lambda_{\max}(F(s)-P(s))\big|\Big).$$

We first derive an upper bound for the second element of the maximum above:

$$\begin{aligned}
\big|\max_s \lambda_{\max}(F(s)-P(s))\big|^2 &\le \max_{s,k} \big|\lambda_k(F(s)-P(s))\big|^2 \le \max_s \|F(s)-P(s)\|^2 \le \max_s \|F(s)-P(s)\|_F^2\\
&= \max_s \sum_{uv} |f_{uv}(s)-p_{uv}(s)|^2 \le \sum_{uv} \max_s |f_{uv}(s)-p_{uv}(s)|^2 \le M\varepsilon^2.
\end{aligned}$$


For the first element we get

$$\big|\min_s \lambda_{\min}(F(s)-P(s))\big|^2 \le \max_{s,k} \big|\lambda_k(F(s)-P(s))\big|^2 \le M\varepsilon^2,$$

by using the same steps as for $\big|\max_s \lambda_{\max}(F(s)-P(s))\big|^2$. Thus $\|T[F] - T[P]\|^2 \le M\varepsilon^2$.

3. Since $T[F]$ is invertible, $T[P]$ is also invertible for sufficiently small $\varepsilon$. We can also derive that $P$ is HPD if $F$ is HPD and $\|F - P\| \le c\varepsilon$ for some $c$.

We can now write

$$T[F^{-1}] - T^{-1}[F] = \big(T[F^{-1}] - T[P^{-1}]\big) + \big[(T[P^{-1}] - T^{-1}[P]) + (T^{-1}[P] - T^{-1}[F])\big] := G + H,$$

where

$$G = T[P^{-1}] - T^{-1}[P] = \big(T[P^{-1}]\,T[P] - I\big)\,T^{-1}[P],$$

$$H = \big(T[F^{-1}] - T[P^{-1}]\big) + \big(T^{-1}[P] - T^{-1}[F]\big).$$

4. Since $P$ consists of trigonometric polynomials of degree $K$, we can use Lemma 9.2.1 to show that $G$ has low rank:

$$\operatorname{rank}(G) \le \operatorname{rank}\big(T[P^{-1}]\,T[P] - I\big) \le 2KM.$$

5. Now we address the small-norm term $H$:

$$\|H\| \le \|T[F^{-1}] - T[P^{-1}]\| + \|T^{-1}[P] - T^{-1}[F]\| = \|T[P^{-1}(F-P)F^{-1}]\| + \|T^{-1}[P]\,(T[F]-T[P])\,T^{-1}[F]\|.$$

Using the Hermitian structure, we now show bounds for each of those two terms separately, starting with the first:

$$\begin{aligned}
\|T[P^{-1}(F-P)F^{-1}]\| &\le \max_k \big|\lambda_k\big(T[P^{-1}(F-P)F^{-1}]\big)\big| \\
&\le \max_{s,k} \big|\lambda_k\big(P(s)^{-1}(F(s)-P(s))F(s)^{-1}\big)\big| \\
&\le \max_s \|P(s)^{-1}(F(s)-P(s))F(s)^{-1}\| \\
&\le \max_s \|P(s)^{-1}\|\,\|F(s)-P(s)\|\,\|F(s)^{-1}\| \\
&\le \frac{1}{\min_s\lambda_{\min}(P(s))}\,\max_{s,k}\big|\lambda_k(F(s)-P(s))\big|\,\frac{1}{\min_s\lambda_{\min}(F(s))}.
\end{aligned}$$

For the first inequality, we use the fact that $T[P^{-1}(F-P)F^{-1}] = T[F^{-1}] - T[P^{-1}]$ and therefore Hermitian, as the sum of two Hermitian matrices. In the last step, we use that the following equality holds true:

$$\max_s \lambda_{\max}\big(F(s)^{-1}\big) = \max_s \frac{1}{\lambda_{\min}(F(s))} = \frac{1}{\min_s \lambda_{\min}(F(s))},$$

if $F(s)$ is HPD for all $s \in [-\pi,\pi]$.

Now for the second term, we can bound it in the following way:

$$\begin{aligned}
\|T^{-1}[P]\,(T[F]-T[P])\,T^{-1}[F]\| &\le \|T^{-1}[P]\|\,\|T[F]-T[P]\|\,\|T^{-1}[F]\| \\
&= \lambda_{\max}\big(T^{-1}[P]\big)\,\max_k\big|\lambda_k(T[F]-T[P])\big|\,\lambda_{\max}\big(T^{-1}[F]\big) \\
&\le \max_s\frac{1}{\lambda_{\min}(P(s))}\,\max_{s,k}\big|\lambda_k(F(s)-P(s))\big|\,\max_s\frac{1}{\lambda_{\min}(F(s))} \\
&\le \frac{1}{\min_s\lambda_{\min}(P(s))}\,\max_{s,k}\big|\lambda_k(F(s)-P(s))\big|\,\frac{1}{\min_s\lambda_{\min}(F(s))}.
\end{aligned}$$

Putting it all together (with Step 2 of this proof), we get

$$\|H\|_2 \le 2\cdot\frac{1}{\min_s\lambda_{\min}(P(s))}\,\max_{s,k}\big|\lambda_k(F(s)-P(s))\big|\,\frac{1}{\min_s\lambda_{\min}(F(s))} \le \underbrace{2\cdot\frac{1}{\min_s\lambda_{\min}(P(s))}\,\frac{1}{\min_s\lambda_{\min}(F(s))}\,\sqrt{M}\,c}_{:=C}\ \varepsilon = C\varepsilon.$$

We now use these lemmas to show that all but at most $2KM$ eigenvalues of the preconditioned matrix $T[F^{-1}]\,T[F]$ are clustered around one, following the proof and generalizing the result of [34, Thm. 3].

Theorem 9.2.3

Let $f_{uv} \in C_{2\pi}$, $1 \le u,v \le M$, and let $F(s)$ be HPD for all $s \in [-\pi,\pi]$.

1. All eigenvalues of $T[F^{-1}]\,T[F]$ lie in the interval

$$\left[\frac{\min_s\lambda_{\min}(F(s))}{\max_s\lambda_{\max}(F(s))},\ \frac{\max_s\lambda_{\max}(F(s))}{\min_s\lambda_{\min}(F(s))}\right].$$

2. For all $\varepsilon > 0$, there exist positive integers $K$ and $N$ such that for all $n > N$, at most $2KM$ eigenvalues of $T[F^{-1}]\,T[F] - I$ have absolute values greater than $\varepsilon$.

Proof. 1. Since $F(s)$ is HPD for all $s \in [-\pi,\pi]$, both matrices $T[F]$ and $T[F^{-1}]$ are HPD (using Lemmas 9.2.4 and 9.2.5). The eigenvalues of $T[F^{-1}]\,T[F]$ and $T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]$ coincide (since the latter results from a similarity transform of the former). Let $\lambda$ be an eigenvalue of $T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]$. We have

$$\begin{aligned}
\lambda &\le \max_{x\ne 0}\frac{x^*\,T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]\,x}{x^* x} = \max_{y\ne 0}\frac{y^*\,T[F^{-1}]\,y}{y^*\,T^{-1}[F]\,y}\\
&\le \max_{y\ne 0}\frac{y^*\,T[F^{-1}]\,y}{y^* y}\,\max_{y\ne 0}\frac{y^* y}{y^*\,T^{-1}[F]\,y}\\
&\le \max_{s,k}\lambda_k\big(F^{-1}\big)\,\max_{z\ne 0}\frac{z^*\,T[F]\,z}{z^* z} \le \frac{1}{\min_s\lambda_{\min}(F)}\,\max_s\lambda_{\max}(F).
\end{aligned}$$

Similarly, we obtain $\lambda \ge \frac{\min_s\lambda_{\min}(F(s))}{\max_s\lambda_{\max}(F(s))}$. Therefore,

$$\frac{\min_s\lambda_{\min}(F(s))}{\max_s\lambda_{\max}(F(s))} \le \lambda \le \frac{\max_s\lambda_{\max}(F(s))}{\min_s\lambda_{\min}(F(s))}.$$

2. From Lemma 9.2.2 it follows that for a given $\varepsilon > 0$, there exist positive integers $K$ and $N$ such that for all $n > N$,

$$T[F^{-1}] - T^{-1}[F] = G + H,$$

where $\operatorname{rank}(G) \le 2KM$ and $\|H\| < C\varepsilon$. Since $F(s)$ is HPD, the matrices $T[F]$, $T[F^{-1}]$, $T[P]$, $T[P^{-1}]$, $T^{-1}[F]$ and $T^{-1}[P]$ are Hermitian. Therefore, $G$ and $H$ are Hermitian. From the last equation we get

$$T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F] = I + \tilde{G} + \tilde{H},$$

where

$$\tilde{G} = T^{1/2}[F]\,G\,T^{1/2}[F], \qquad \tilde{H} = T^{1/2}[F]\,H\,T^{1/2}[F].$$

It follows that

$$\operatorname{rank}(\tilde{G}) \le \operatorname{rank}(G) \le 2KM,$$

and

$$\|\tilde{H}\| \le \|T^{1/2}[F]\|^2\,\|H\|.$$

Therefore, at most $2KM$ eigenvalues of

$$T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F] - \tilde{H} = I + \tilde{G}$$

are different from 1.


We also know that

$$T[F^{-1}]\,T[F] = T^{-1/2}[F]\,\big(T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]\big)\,T^{1/2}[F] = T^{-1/2}[F]\,\big((I+\tilde{G})+\tilde{H}\big)\,T^{1/2}[F],$$

which means that the eigenvalues of $T[F^{-1}]\,T[F]$ are the same as the eigenvalues of $(I+\tilde{G})+\tilde{H}$ (since we applied a similarity transform).

To get the eigenvalues of $(I+\tilde{G})+\tilde{H}$, we use the Bauer–Fike Theorem (see Lemma 9.2.6). We know that at most $2KM$ eigenvalues of $I+\tilde{G}$ are not equal to one. If we add $\tilde{H}$, the change in the eigenvalues (compared to $I+\tilde{G}$) is bounded by $\|\tilde{H}\|$.

In conclusion, the differences between the eigenvalues of $I+\tilde{G}$ and those of $(I+\tilde{G})+\tilde{H}$ will be smaller than $\|\tilde{H}\| \le \|T^{1/2}[F]\|^2\,\|H\| \le \underbrace{\|T^{1/2}[F]\|^2\,C\varepsilon}_{:=\tilde{\varepsilon}}$, and therefore at most $2KM$ eigenvalues of $T[F^{-1}]\,T[F]$ lie outside the interval $[1-\tilde{\varepsilon}, 1+\tilde{\varepsilon}]$.

9.2.4 Example

Figure 9.7 shows an example of the result of the IGF method in the case of a 2×2 BTTB-block matrix. The top image shows the inverse of the original matrix, while the bottom image is the result of the inverse generating function method described in this section. The plots illustrate that under certain circumstances, both matrices are very close to each other.

9.3 regularizing functions

Applying the inversion formulas for the matrix-valued generating function $F$ requires the inversion of some of its elements. As discussed in the last section, the inversion of the elements $f_{uv}$ can happen pointwise after approximating them with $\tilde{f}_{uv}$, using the Fourier coefficients extracted from the original matrix.

However, if $f_{uv}$ is close to zero at some point $x$, the inverse $(f_{uv})^{-1}$ will be very large at this point. The fact that the generating function can only be approximated can reinforce this effect, producing unrealistically high values for the inverse generating function.


Figure 9.7: Color plots of the inverse of the original 2×2 BTTB-block matrix, $T[F(x,y)]^{-1}$, the result of the inverse generating function method, $T[1/F(x,y)]$, and the difference of those two, $\big|T[F]^{-1} - T[F^{-1}]\big|$.

To counteract this effect, we can apply the inverse generating function method to a regularized function, i.e. apply it to $T[F + \alpha I_M]$. This is equivalent to applying it to $T[F] + \alpha I$, with the identity of matching size.

The optimal value for $\alpha$ corresponds to a trade-off that can be different for each linear system. Figure 9.8 shows the number of iterations until convergence (relative to the number of iterations for $\alpha = 0$) for several test cases, using the inverse generating function method with different strengths of regularization, i.e. different values of $\alpha$. In other words, the y-axis describes the ratio of the number of iterations compared to the unregularized system, i.e.

$$y = \frac{\text{number of iterations of IGF with a regularization of strength } \alpha}{\text{number of iterations of IGF without regularization}}.$$

Figure 9.8 indicates that choosing the right value for $\alpha$ can reduce the number of iterations. It also indicates that the optimal value varies


Figure 9.8: Degrees of regularization. Relative convergence rates for several test cases without and with several degrees of regularization for the inverse generating function method.

between the several test cases. Finding the optimal value, or at least a value that decreases the iterations, appears to be a non-trivial problem requiring more analysis.

9.4 numerical experiments

9.4.1 Convergence of the IGF

Figure 9.9 shows, for a sample test case, the convergence of $\tilde{T}[f]$ (the result of the IGF with approximations) towards $T[f^{-1}]$ as the sampling rate of the numerical integration (step c)) and the number of Fourier coefficients in the approximation of the generating function (step ã)) are increased.

This sample test case is the result of a piecewise constant generating function, for which the Fourier coefficients can be computed analytically. The inverse generating function is again a piecewise constant function. Therefore, $\tilde{T}[f]$ and $T[f^{-1}]$ can be computed. Figure 9.9 plots the relative difference between the result of the IGF method and $T[f^{-1}]$, i.e. $\frac{\|T[f^{-1}]-\tilde{T}[f]\|_F}{\|T[f^{-1}]\|_F}$. It can be seen that increasing the number of Fourier terms does not necessarily decrease the relative difference if the sampling rate is held constant.


Figure 9.9: Convergence of the IGF method towards the exact inverse.

9.4.2 IGF for a BTTB-block Matrix

This section will analyze the IGF method for a matrix of the form

$$\begin{pmatrix} T_1 & 0.15\cdot T_2 & 0.01\cdot T_3 \\ 0.15\cdot T_2 & T_1 & 0.05\cdot 0.15\cdot T_2 \\ 0.01\cdot T_3 & 0.05\cdot 0.15\cdot T_2 & T_1 \end{pmatrix},$$

where $T_l$, with $l \in \{1,2,3\}$, is a BTTB matrix with $n_1 \times n_1$ blocks, each of size $n_2 \times n_2$, generated by the corresponding sequence of Fourier coefficients.

$T_1$ is generated by

$$t_{j,k} = \Big((|j|+1)\,(|k|+1)^{1+0.1(|j|+1)}\Big)^{-1}.$$

$T_2$ is generated by

$$t_{j,k} = \Big((|j|+1)^{3.1} + (|k|+1)^{3.1}\Big)^{-1}.$$


$T_3$ is generated by

$$t_{j,k} = \begin{cases} 2.5, & j = 0,\ k = 0,\\[2pt] \big(1 + (|k|+1)^{1.1}\big)^{-1}, & j = 0,\ k = \pm 1, \pm 2, \dots,\\[2pt] \big((|j|+1)^{2.5} + |jk+1|^{2.5}\big)^{-1}, & j = \pm 1, \pm 2, \dots,\ k = 0, \pm 1, \pm 2, \dots \end{cases}$$

It can be verified that this generates an HPD matrix, thus fulfilling the assumptions of Theorem 9.2.3.

Table 9.1 shows the average number of required iterations of the preconditioned conjugate gradient method until the tolerance ($10^{-7}$) is reached, for ten randomly created right-hand-side vectors and different sizes of $n_1$ and $n_2$. Here, the midpoint rule is used to compute the Fourier integrals numerically, using $n_1$ and $n_2$ intervals in the corresponding directions.

Besides comparing the results of the IGF preconditioner to the original system ($P = I$), the results of a 3×3 BCCB-block preconditioner ($P = C(T[F])$) are also provided as a reference (see Chapter 8).

Compared to the original system, the number of iterations is reduced by a factor of up to ten when using $T[F^{-1}]$ as a preconditioner. Compared to the BCCB-block preconditioner, $T[F^{-1}]$ also reduces the number of iterations, by up to a factor of two.

Figure 9.10 illustrates the distribution of the eigenvalues in the case of $n_1 = n_2 = 30$: first for the original system (Figure 9.10a), then when using the BCCB-block preconditioner (Figure 9.10b), and lastly when using the IGF preconditioner (Figure 9.10c). Both preconditioners cluster the eigenvalues around one. It can be seen that the clustering is tighter for our preconditioner compared to the BCCB-block preconditioner. All eigenvalues of $T[F^{-1}]\,T[F]$ are real, since $T[F^{-1}]\,T[F]$ is similar to $T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]$ (see the proof of Theorem 9.2.3). $T^{1/2}[F]\,T[F^{-1}]\,T^{1/2}[F]$ is Hermitian and therefore all its eigenvalues are real. Consequently, the eigenvalues of $T[F^{-1}]\,T[F]$ are also real.


Table 9.1: The average number of iterations for the 3×3 BTTB-block matrix.

n1   n2   Matrix size   P = I   P = C(T[F])   P^{-1} = T[F^{-1}]
10   10       300        56.7      16.6            16.0
20   20      1200        79.3      17.9            11.0
30   30      2700        92.2      19.0            11.0
40   40      4800       100.5      18.9            10.8

(a) Original System

(b) Circulant Preconditioned System

(c) IGF Preconditioned System

Figure 9.10: Distribution of eigenvalues for the IGF. Distribution of the eigenvalues $\lambda_j$ of the original system (a), using the 3×3 BCCB-block preconditioner (b), and using $T[F^{-1}]$ as a preconditioner (c).


10 KRONECKER PRODUCT APPROXIMATION

The Kronecker product approximation refers to finding the optimal matrices $A_k$ and $B_k$ such that the sum of their Kronecker products is as close as possible to the desired matrix. This is equivalent to solving

$$\min_{A_k \in \mathbb{C}^{N_x\times N_x},\, B_k \in \mathbb{C}^{N_y\times N_y}} \Big\|M - \sum_{k=1}^{s} A_k \otimes B_k\Big\|_F, \qquad (10.1)$$

where $\otimes$ denotes the Kronecker product of two matrices and $M$ is any matrix of size $N_xN_y \times N_xN_y$.

Using this (approximate) decomposition into Kronecker products, the MVP of its inverse with a vector can be computed more efficiently if additional approximations are applied. The preconditioner obtained by computing the Kronecker product approximation will be denoted by $\mathrm{Kron}_s(T)$, where $s$ indicates the number of Kronecker product terms used in the approximation and $T$ refers to the original matrix.

The next section describes an efficient way to obtain the Kronecker product approximation for BTTB matrices and how this decomposition can be used to compute the inverse efficiently from a computational point of view.

Section 10.2 will discuss possible ways to use this approach in the case of a 3×3 BTTB-block matrix, thus generalizing the approach to make it applicable to the full problem.

10.1 optimal approximation for bttb matrices

Olshevsky et al. [44] proved that if the approximated matrix is BTTB ($M = T_{\text{(BTTB)}}$), then the problem in (10.1) is equivalent to

$$\min_{A_k \in \mathcal{T}(N_x),\, B_k \in \mathcal{T}(N_y)} \Big\|T_{\text{(BTTB)}} - \sum_{k=1}^{s} A_k \otimes B_k\Big\|_F,$$

(Theorem 3.2 in [44] proves this equality in the general case. This means that the optimal $A_k$ and $B_k$ for, e.g., a block-Toeplitz-Hankel-block matrix would have a Toeplitz and a Hankel structure.)


where $\mathcal{T}(n)$ denotes the class of Toeplitz matrices of size $n\times n$. In other words, the optimal $A_k$ and $B_k$ have the same structure as $T$, just reduced to one level.

From [36] we know that if we define a tilde-transformation that rearranges block matrices in the following way:

$$\tilde{T} = \operatorname{tilde}(T) = \operatorname{tilde}\begin{pmatrix} T_{1,1} & T_{1,2} & \dots & T_{1,n} \\ T_{2,1} & T_{2,2} & \dots & T_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ T_{n,1} & T_{n,2} & \dots & T_{n,n} \end{pmatrix} = \begin{pmatrix} \operatorname{vec}(T_{1,1})^T \\ \vdots \\ \operatorname{vec}(T_{2,1})^T \\ \vdots \\ \operatorname{vec}(T_{1,n})^T \\ \vdots \\ \operatorname{vec}(T_{n,n})^T \end{pmatrix},$$

where the operator $\operatorname{vec}(A)$ unrolls a matrix into a vector by "stacking" its columns from left to right, then we can reformulate (10.1) into

$$\min_{a_k, b_k} \Big\|\tilde{T}_{\text{(BTTB)}} - \sum_{k=1}^{s} a_k b_k^T\Big\|_F. \qquad (10.2)$$

This problem can be solved by computing the SVD $\tilde{T} = \sum_{k=1}^{r} \sigma_k u_k v_k^H$ and taking the first $s$ terms [21]. However, the cost of an SVD of a matrix of size $N_x^2 \times N_y^2$ is, at $O\big((N_x^2)^2 N_y^2 + (N_y^2)^3\big)$, too expensive.

In [29] and [30], algorithms were proposed to solve (10.2) with an SVD of a much smaller size. While Kamm and Nagy [29] focused on banded BTTB matrices (in [29], a banded BTTB matrix is defined as a matrix that can be fully defined by a single column), Kilmer and Nagy [30] expanded the idea to dense block-Toeplitz-plus-Hankel matrices, with adjustments for similar cases such as BTTB matrices.

Note that [29] and [30] consider real-valued matrices, while in this work we consider complex-valued matrices.

Following [29] and [30], the problem in (10.2) can be condensed into a smaller problem:

$$\min_{a_k,b_k}\Big\|R_{L1}\,P\,R_{R1}^T - \sum_{k=1}^{s}(R_{L1}a_k)(b_k^T R_{R1}^T)\Big\|_F = \min_{\hat{a}_k,\hat{b}_k}\Big\|\hat{P} - \sum_{k=1}^{s}\hat{a}_k\hat{b}_k^T\Big\|_F,$$

where $P$ is a $(2N_x-1)\times(2N_y-1)$ matrix containing all the Fourier coefficients present in the BTTB matrix $T_{\text{(BTTB)}}$, with $\hat{P} = R_{L1}PR_{R1}^T$, $\hat{a}_k = R_{L1}a_k$, $\hat{b}_k = R_{R1}b_k$, and


$$R_{L1} = \frac{1}{\sqrt{N_x}}\operatorname{diag}\big(\sqrt{1},\sqrt{2},\dots,\sqrt{N_x-1},\sqrt{N_x},\sqrt{N_x-1},\dots,\sqrt{2},\sqrt{1}\big),$$

$$R_{R1} = \frac{1}{\sqrt{N_y}}\operatorname{diag}\big(1,2,\dots,N_y-1,N_y,N_y-1,\dots,2,1\big)^{1/2}.$$

10.1.1 Algorithm

To summarize, solving (10.1) can be done efficiently with the following steps:

1. Compute $R_{L1}$ and $R_{R1}$.

2. Compute $\hat{P} = R_{L1}PR_{R1}^T$.

3. Calculate the SVD $\hat{P} \approx \sum_{k=1}^{s} \sigma_k \hat{u}_k \hat{v}_k^H$.

4. Set $\hat{a}_k = \sqrt{\sigma_k}\,\hat{u}_k$ and $\hat{b}_k = \sqrt{\sigma_k}\,\hat{v}_k$.

5. Solve $R_{L1}a_k = \hat{a}_k$ and $R_{R1}b_k = \hat{b}_k$.

6. Build matrices $A_k$ and $B_k$ from $a_k$ and $b_k$.

Step 6 is done by creating a Toeplitz matrix $A_k$ whose first row consists of the first $N_x$ elements of $a_k$ and whose first column consists of the last $N_x$ elements of $a_k$; analogously for $B_k$.
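The six steps above can be sketched in NumPy/SciPy as follows. This is a sketch under the assumption that the coefficient array `Pmat` is indexed $j = -(N_x-1),\dots,N_x-1$ along rows ($k$ likewise along columns); choosing $s$ as the full rank of $\hat{P}$ makes the decomposition exact, which the check at the end exploits:

```python
import numpy as np
from scipy.linalg import toeplitz

def kron_approx_bttb(Pmat, Nx, Ny, s):
    """Steps 1-6: Pmat is the (2Nx-1) x (2Ny-1) array of Fourier
    coefficients t_{j,k}, rows indexed j = -(Nx-1)..Nx-1."""
    wx = np.sqrt(np.concatenate([np.arange(1, Nx + 1), np.arange(Nx - 1, 0, -1)]) / Nx)
    wy = np.sqrt(np.concatenate([np.arange(1, Ny + 1), np.arange(Ny - 1, 0, -1)]) / Ny)
    Phat = wx[:, None] * Pmat * wy[None, :]          # step 2: R_L1 P R_R1^T
    U, sig, Vh = np.linalg.svd(Phat)                 # step 3
    As, Bs = [], []
    for k in range(s):
        ak = np.sqrt(sig[k]) * U[:, k] / wx          # steps 4-5
        bk = np.sqrt(sig[k]) * Vh[k].conj() / wy
        # Step 6: first column = last Nx entries, first row = first Nx entries.
        As.append(toeplitz(ak[Nx - 1:], ak[Nx - 1::-1]))
        Bs.append(toeplitz(bk[Ny - 1:], bk[Ny - 1::-1]))
    return As, Bs

# Check: with s equal to the full rank of Phat, the BTTB matrix is recovered.
rng = np.random.default_rng(0)
Nx, Ny = 3, 4
Pmat = rng.standard_normal((2 * Nx - 1, 2 * Ny - 1))
T = np.block([[toeplitz(Pmat[i - l + Nx - 1, Ny - 1:], Pmat[i - l + Nx - 1, Ny - 1::-1])
               for l in range(Nx)] for i in range(Nx)])
As, Bs = kron_approx_bttb(Pmat, Nx, Ny, s=2 * Nx - 1)
assert np.allclose(sum(np.kron(A, B) for A, B in zip(As, Bs)), T)
```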

The total cost of this algorithm is dominated by step 3, computing the SVD of a $(2N_x-1)\times(2N_y-1)$ matrix, whose computational cost is $O(N_xN_y^2)$ if $N_x > N_y$ [56, Lecture 31].

10.1.2 Inverse and MVP

To use the Kronecker product approximation as a preconditioner, the MVP with its inverse needs to be computable efficiently. In this section we first discuss the case $s = 1$, i.e. an approximation with just one Kronecker product. Next, the case of $s \ge 2$ terms is discussed, which requires further approximations for an efficient computation.

10.1.2.1 One Term Approximation

In this case, the BTTB matrix $T_{\text{(BTTB)}}$ is approximated with just a single term of the Kronecker product approximation, thus

$$T_{\text{(BTTB)}} \approx A \otimes B.$$

To compute

$$\big(T_{\text{(BTTB)}}\big)^{-1}x \approx (A\otimes B)^{-1}x,$$


we use the fact that $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$ [21, Sec. 12.3.1]. Additionally, it is known that $(D^T \otimes E)\operatorname{vec}(S) = \operatorname{vec}(F) \iff ESD = F$ for matrices $D$, $E$, $F$ and $S$ [26, Lem. 4.3.1]. Using this, the MVP with the inverse can be expressed as

$$(A\otimes B)^{-1}x = \big(A^{-1}\otimes B^{-1}\big)x = \operatorname{vec}\big(B^{-1}\operatorname{vec}^{-1}(x)\,A^{-T}\big). \qquad (10.3)$$

Consequently, the computation of an MVP with $(A\otimes B)^{-1}$ requires only the inverses of $A$ and $B$ separately. They can be computed utilizing the Gohberg–Semencul formula (see Theorem 3.0.4). This way, the inverses are given as four Toeplitz matrices, i.e. $T^{-1} = AB + CD$, where $A$, $B$, $C$ and $D$ are Toeplitz.

The two matrix-matrix products in (10.3) can then be computed as multiple MVPs with Toeplitz matrices. Thus, the total cost is $O(N_xN_y\log N_y + N_yN_x\log N_x) = O(N_xN_y\log(N_xN_y))$.
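The identity (10.3) is easy to check numerically. The sketch below uses small dense random matrices in place of the Toeplitz factors and a direct solve in place of the Gohberg–Semencul formula, with vec taken column-major:

```python
import numpy as np

rng = np.random.default_rng(5)
Nx, Ny = 3, 4
# Illustrative well-conditioned stand-ins for the Toeplitz factors A and B.
A = rng.standard_normal((Nx, Nx)) + 4 * np.eye(Nx)
B = rng.standard_normal((Ny, Ny)) + 4 * np.eye(Ny)
x = rng.standard_normal(Nx * Ny)

# (A (x) B)^{-1} x = vec(B^{-1} vec^{-1}(x) A^{-T}), column-major vec.
X = x.reshape(Ny, Nx, order="F")                         # vec^{-1}(x)
y = (np.linalg.solve(B, X) @ np.linalg.inv(A).T).flatten(order="F")

assert np.allclose(np.kron(A, B) @ y, x)
```

The point of the design is that the $N_xN_y \times N_xN_y$ Kronecker matrix is never formed; only $N_x\times N_x$ and $N_y\times N_y$ solves are needed.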

10.1.2.2 Two-Terms Approximation

In the case of $s = 2$, we can rewrite

$$\begin{aligned}
(A_1\otimes B_1 + A_2\otimes B_2)^{-1}x = c
&\iff \text{solve for } c:\ (A_1\otimes B_1 + A_2\otimes B_2)\,c = x\\
&\iff \text{solve for } C:\ (A_1\otimes B_1 + A_2\otimes B_2)\operatorname{vec}(C) = \operatorname{vec}(X)\\
&\iff \text{solve for } C:\ B_1 C A_1^T + B_2 C A_2^T = X\\
&\overset{(*)}{\iff} \text{solve for } C:\ B_1 C + B_2 C A_2^T A_1^{-T} = X A_1^{-T}\\
&\overset{(**)}{\iff} \text{solve for } C:\ B_2^{-1}B_1 C + C A_2^T A_1^{-T} = B_2^{-1} X A_1^{-T},
\end{aligned}$$

where at the equivalence marked with (∗) the previous equation was multiplied by $A_1^{-T}$ from the right, and analogously at step (∗∗) with $B_2^{-1}$ from the left (see also [38, p. 1135]). This results in an equation of the form $\hat{A}C + C\hat{B} = \hat{C}$, which is called a Sylvester equation and can be solved numerically, for example with the Bartels–Stewart algorithm, in $O(N_x^3 + N_y^3)$ [4]. However, as this has to be computed in each iteration, it is considered too computationally expensive, and other methods are preferred.

10.1.2.3 Multiple Terms Approximation

In general, no exact inversion formulas exist for $\sum_{k=1}^{s} A_k \otimes B_k$ with $s > 2$.


Kamm and Nagy [28] suggested using an approximate SVD if $s \ge 2$. Given the SVDs $A_1 = U_A S_A V_A^H$ and $B_1 = U_B S_B V_B^H$, construct

$$U = U_A \otimes U_B, \qquad V = V_A \otimes V_B, \qquad S = \operatorname{diag}\Big(U^H\Big(\sum_{k=1}^{s} A_k\otimes B_k\Big)V\Big),$$

then $\sum_{k=1}^{s} A_k\otimes B_k \approx USV^H$. It can be seen that $S$ satisfies

$$\min_{\Sigma\in\mathcal{D}}\Big\|\Sigma - U^H\Big(\sum_{k=1}^{s}A_k\otimes B_k\Big)V\Big\| = \min_{\Sigma\in\mathcal{D}}\Big\|U\Sigma V^H - \Big(\sum_{k=1}^{s}A_k\otimes B_k\Big)\Big\|,$$

where $\mathcal{D}$ denotes the class of all diagonal matrices. This shows that the described method produces an optimal SVD approximation given the fixed bases $U$ and $V$. Additionally, for $s = 1$, it returns the regular SVD, without any further approximation.

The MVP with its inverse can then be computed as

$$T_{\text{(BTTB)}}^{-1}x \approx \Big(\sum_{k=1}^{s}A_k\otimes B_k\Big)^{-1}x \approx (USV^H)^{-1}x = \big(V^H\big)^{-1}S^{-1}U^{-1}x = \big(VS^{-1}U^H\big)x = (V_A\otimes V_B)\,S^{-1}\,\big(U_A^H\otimes U_B^H\big)\,x,$$

which can be applied using the same strategy and with the same computational cost as described in the previous section.
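A minimal sketch of this construction follows, with small dense stand-ins for the factors. For $s = 1$ the construction reproduces the exact SVD (per the remark above), which the check at the end uses:

```python
import numpy as np

def kron_svd_inverse_mvp(As, Bs, x):
    """Approximate (sum_k A_k (x) B_k)^{-1} x via the fixed-basis SVD of
    the leading term, following the construction sketched above."""
    UA, _, VAh = np.linalg.svd(As[0])
    UB, _, VBh = np.linalg.svd(Bs[0])
    U = np.kron(UA, UB)                       # U = U_A (x) U_B
    V = np.kron(VAh.conj().T, VBh.conj().T)   # V = V_A (x) V_B
    T = sum(np.kron(A, B) for A, B in zip(As, Bs))
    S = np.diag(np.diag(U.conj().T @ T @ V))  # optimal diagonal for U, V
    return V @ np.linalg.solve(S, U.conj().T @ x)

rng = np.random.default_rng(1)
Nx, Ny = 3, 4
A1 = rng.standard_normal((Nx, Nx)) + 4 * np.eye(Nx)
B1 = rng.standard_normal((Ny, Ny)) + 4 * np.eye(Ny)
x = rng.standard_normal(Nx * Ny)

# For s = 1 the construction is exact, so the inverse MVP is exact too.
y = kron_svd_inverse_mvp([A1], [B1], x)
assert np.allclose(np.kron(A1, B1) @ y, x)
```

For efficiency one would of course never form the Kronecker products explicitly; this dense version only illustrates the algebra.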

10.2 bttb-block matrices

If $T_{\text{(block)}}$ consists of 3×3 BTTB blocks that have each been approximated by a sum of Kronecker products,

$$T_{\text{(block)}} = \begin{pmatrix} T_{1,1} & T_{1,2} & T_{1,3} \\ T_{2,1} & T_{2,2} & T_{2,3} \\ T_{3,1} & T_{3,2} & T_{3,3} \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{s} A_{1,1;k}\otimes B_{1,1;k} & \sum_{k=1}^{s} A_{1,2;k}\otimes B_{1,2;k} & \sum_{k=1}^{s} A_{1,3;k}\otimes B_{1,3;k} \\ \sum_{k=1}^{s} A_{2,1;k}\otimes B_{2,1;k} & \sum_{k=1}^{s} A_{2,2;k}\otimes B_{2,2;k} & \sum_{k=1}^{s} A_{2,3;k}\otimes B_{2,3;k} \\ \sum_{k=1}^{s} A_{3,1;k}\otimes B_{3,1;k} & \sum_{k=1}^{s} A_{3,2;k}\otimes B_{3,2;k} & \sum_{k=1}^{s} A_{3,3;k}\otimes B_{3,3;k} \end{pmatrix},$$


then the block inversion formula can be used to compute the inverse. Let

$$T_{\text{(block)}} = \begin{pmatrix} W & Q \\ R & S \end{pmatrix},$$

where

$$W = \begin{pmatrix} T_{1,1} & T_{1,2} \\ T_{2,1} & T_{2,2} \end{pmatrix}, \qquad Q = \begin{pmatrix} T_{1,3} \\ T_{2,3} \end{pmatrix}, \qquad R = \begin{pmatrix} T_{3,1} & T_{3,2} \end{pmatrix}, \qquad S = \begin{pmatrix} T_{3,3} \end{pmatrix}.$$

Then by applying the block inversion formula, one gets

$$T_{\text{(block)}}^{-1} = \begin{pmatrix} W & Q \\ R & S \end{pmatrix}^{-1} = \begin{pmatrix} W^{-1} + W^{-1}QZRW^{-1} & -W^{-1}QZ \\ -ZRW^{-1} & Z \end{pmatrix},$$

where

$$Z = \big(S - RW^{-1}Q\big)^{-1}.$$

The computation requires $W^{-1}$, which can be computed by applying the block inversion formula again, this time to $W$, thus

$$W^{-1} = \begin{pmatrix} T_{1,1} & T_{1,2} \\ T_{2,1} & T_{2,2} \end{pmatrix}^{-1} = \begin{pmatrix} T_{1,1}^{-1} + T_{1,1}^{-1}T_{1,2}Z'T_{2,1}T_{1,1}^{-1} & -T_{1,1}^{-1}T_{1,2}Z' \\ -Z'T_{2,1}T_{1,1}^{-1} & Z' \end{pmatrix},$$

where

$$Z' = \big(T_{2,2} - T_{2,1}T_{1,1}^{-1}T_{1,2}\big)^{-1}.$$

This means that $Z$ and $Z'$ have to be computed in order to compute the inverse of a BTTB-block matrix (besides $Z$ and $Z'$, the inverse of $T_{1,1}$ has to be computed; how this can be done efficiently is discussed in the previous section). However, this cannot be done efficiently without further approximations.

The problem of computing $Z$ and $Z'$ is separated into two different cases. In the first case, $s = 1$, the approximation is just a single Kronecker product, while in the second case, $s \ge 2$, the approximation consists of a sum of Kronecker products.


10.2.1 One Term Approximation

10.2.1.1 Sum Approximation

If $s = 1$, then

$$Z' = \big(A_{2,2}\otimes B_{2,2} - A_{2,1}\otimes B_{2,1}\cdot A_{1,1}^{-1}\otimes B_{1,1}^{-1}\cdot A_{1,2}\otimes B_{1,2}\big)^{-1} = \big(A_{2,2}\otimes B_{2,2} - (A_{2,1}A_{1,1}^{-1}A_{1,2})\otimes(B_{2,1}B_{1,1}^{-1}B_{1,2})\big)^{-1}.$$

However, this is the inverse of a sum of Kronecker products which, as described in Section 10.1.2.3, cannot be computed efficiently. Therefore, a possible approach is to approximate the sum occurring in $Z'$ by just its first term, thus

$$Z' \approx A_{2,2}^{-1}\otimes B_{2,2}^{-1}. \qquad (10.4)$$

Analogously, the same approximation is needed for $Z$, and therefore

$$Z \approx S^{-1} = A_{3,3}^{-1}\otimes B_{3,3}^{-1}. \qquad (10.5)$$

Using this approximation, the elements of $W^{-1}$ are

$$\begin{aligned}
\big(W^{-1}\big)_{1,1} &= A_{1,1}^{-1}\otimes B_{1,1}^{-1} + \big(A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1}\big)\otimes\big(B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}\big),\\
\big(W^{-1}\big)_{1,2} &= \big(-A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1}\big)\otimes\big(B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}\big),\\
\big(W^{-1}\big)_{2,1} &= \big(-A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1}\big)\otimes\big(B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}\big),\\
\big(W^{-1}\big)_{2,2} &= A_{2,2}^{-1}\otimes B_{2,2}^{-1}.
\end{aligned}$$

The same approximation and inversion formula can be applied again to get the inverse of $T_{\text{(block)}}$. The elements of $T_{\text{(block)}}^{-1}$ can be found in the appendix, see Section A.1.1.

Using the approximations (10.4) and (10.5) for $Z'$ and $Z$, the inverse of $T_{\text{(block)}}$ can be computed using only inverses of BTTB matrices.

The MVP can be computed using one of two strategies. The first option is to precompute the matrix-matrix products in the equations above. For the second-to-last equation of Section A.1.1 we get, for example:


$$\big(T_{\text{(block)}}^{-1}\big)_{3,2} = \underbrace{\big(-A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1}\big)}_{\hat{A}_1}\otimes\underbrace{\big(B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}\big)}_{\hat{B}_1} + \underbrace{\big(-A_{3,3}^{-1}A_{3,2}A_{2,2}^{-1}\big)}_{\hat{A}_2}\otimes\underbrace{\big(B_{3,3}^{-1}B_{3,2}B_{2,2}^{-1}\big)}_{\hat{B}_2} = \hat{A}_1\otimes\hat{B}_1 + \hat{A}_2\otimes\hat{B}_2. \qquad (10.6)$$

If this strategy is used, it is necessary to use the detailed formula described in the appendix. This option reduces the number of matrix-matrix products that have to be computed. However, the Toeplitz structure of the matrices in the original equation will be lost. Therefore the complexity will be $O\big(N_x^2N_y + N_xN_y^2\big)$.

For the second option, the equations are left untouched, preserving the Toeplitz structure of the matrices involved. This requires a fairly large number of matrix-matrix products, but each can be sped up using the FFT in combination with the Gohberg–Semencul formula (see Theorem 3.0.4). The complexity is in this case $O(N_xN_y\log N_y + N_yN_x\log N_x)$, but with a fairly large constant depending on the number of matrices in the expression.
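The FFT speed-up mentioned here rests on embedding each Toeplitz factor in a circulant matrix, which the FFT diagonalizes. The following sketch shows the standard circulant-embedding trick (not the thesis's exact implementation) for a single Toeplitz MVP in $O(n\log n)$:

```python
import numpy as np

def toeplitz_mvp(c, r, x):
    """Multiply a Toeplitz matrix by a vector in O(n log n) by embedding
    it in a 2n x 2n circulant matrix. `c` is the first column, `r` the
    first row (c[0] == r[0])."""
    n = len(x)
    # First column of the circulant embedding: [c, pad, reversed tail of r].
    circ = np.concatenate([c, [0.0], r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x, 2 * n))
    return y[:n]

# Check against a dense Toeplitz product.
rng = np.random.default_rng(3)
n = 8
c = rng.standard_normal(n)                                # first column
r = np.concatenate([[c[0]], rng.standard_normal(n - 1)])  # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(toeplitz_mvp(c, r, x), T @ x)
```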

10.2.1.2 Diagonal Approximation

Another approach would be to approximate the highest level (the 3×3-block level) with a block diagonal matrix, i.e.

$$T_{\text{(block)}} = \begin{pmatrix} T_{1,1} & T_{1,2} & T_{1,3} \\ T_{2,1} & T_{2,2} & T_{2,3} \\ T_{3,1} & T_{3,2} & T_{3,3} \end{pmatrix} \approx \begin{pmatrix} T_{1,1} & & \\ & T_{2,2} & \\ & & T_{3,3} \end{pmatrix}.$$

The inverse is then simply

$$T_{\text{(block)}}^{-1} \approx \begin{pmatrix} T_{1,1}^{-1} & & \\ & T_{2,2}^{-1} & \\ & & T_{3,3}^{-1} \end{pmatrix}.$$


The MVP with $T_{\text{(block)}}^{-1}$ can be computed efficiently if the Gohberg–Semencul formula is utilized (see Theorem 3.0.4). The cost of an MVP is then $O(N_xN_y\log N_y + N_yN_x\log N_x)$, with a smaller constant compared to the previous method.

10.2.2 Multiple Terms Approximation

In the case of $s > 2$, each BTTB matrix will be approximated via an approximate SVD, so $T_{i,j} \approx U_{i,j} S_{i,j} V_{i,j}^H$.

10.2.2.1 Sum Approximation

Using the same approximations for $Z'$ and $Z$ as the ones described in Section 10.2.1.1, we get

\begin{align*}
\left(W^{-1}\right)_{1,1} &= V_{11}S_{11}^{-1}U_{11}^H + V_{11}S_{11}^{-1}U_{11}^H\,U_{12}S_{12}V_{12}^H\,V_{22}S_{22}^{-1}U_{22}^H\,U_{21}S_{21}V_{21}^H\,V_{11}S_{11}^{-1}U_{11}^H ,\\
\left(W^{-1}\right)_{1,2} &= -V_{11}S_{11}^{-1}U_{11}^H\,U_{12}S_{12}V_{12}^H\,V_{22}S_{22}^{-1}U_{22}^H ,\\
\left(W^{-1}\right)_{2,1} &= -V_{22}S_{22}^{-1}U_{22}^H\,U_{21}S_{21}V_{21}^H\,V_{11}S_{11}^{-1}U_{11}^H ,\\
\left(W^{-1}\right)_{2,2} &= V_{22}S_{22}^{-1}U_{22}^H ,
\end{align*}

for the elements of $W^{-1}$. The elements of $T^{-1}_{(\mathrm{block})}$ are given in the appendix.

The costs of an MVP with $T_{(\mathrm{block})}$ are in this case $O(N_x^2 N_y + N_y^2 N_x + N_x N_y) = O(N_x N_y (N_x + N_y + 1))$. This can be achieved since all the matrices $U_{i,j}$ are the result of a Kronecker product, $U = U_A \otimes U_B$, and the same is true for the matrices $V_{i,j}$. A matrix-vector product with $U_{i,j}$ or $V_{i,j}$ can therefore be computed in $O(N_x^2 N_y + N_y^2 N_x) = O(N_x N_y (N_x + N_y))$. A matrix-vector product with a matrix $S_{i,j}$ can be computed in $O(N_x N_y)$, since $S_{i,j}$ is diagonal.

10.2.2.2 Common Basis

As another method, all BTTB matrices $T_{i,j}$ could be diagonalized by common basis matrices $U$ and $V$. Then $T_{i,j} = U S_{i,j} V^H$ and no approximations for $Z$ or $Z'$ are needed, since

\[
Z' = \left(U\left(S_{22} - S_{21}S_{11}^{-1}S_{12}\right)V^H\right)^{-1} := V S_{Z'}^{-1} U^H
\]

can be computed easily. Similarly, for $Z$,

\[
Z = \left(U S_Z V^H\right)^{-1} := V S_{Z}^{-1} U^H ,
\qquad
S_Z = S_{33} -
\begin{pmatrix} S_{31} & S_{32} \end{pmatrix}
\begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1}
\begin{pmatrix} S_{13} \\ S_{23} \end{pmatrix} ,
\]

the corresponding Schur complement.


Using a common basis, the elements of $W^{-1}$ are

\begin{align*}
\left(W^{-1}\right)_{1,1} &= V\left(S_{11}^{-1} + S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\right)U^H ,\\
\left(W^{-1}\right)_{1,2} &= -V\left(S_{11}^{-1}S_{12}S_{Z'}^{-1}\right)U^H ,\\
\left(W^{-1}\right)_{2,1} &= -V\left(S_{Z'}^{-1}S_{21}S_{11}^{-1}\right)U^H ,\\
\left(W^{-1}\right)_{2,2} &= V S_{Z'}^{-1} U^H ,
\end{align*}
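These common-basis formulas can be verified numerically. The sketch below is purely illustrative (random unitary bases $U$, $V$ from a QR factorization, random diagonal $S_{i,j}$, and $S_{Z'}$ the exact Schur complement); it assembles $W^{-1}$ block by block from the diagonal arithmetic alone:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
# Random common unitary bases U, V (via QR) and diagonal S_ij.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = {ij: np.diag(rng.uniform(1.0, 2.0, n)) for ij in ["11", "12", "21", "22"]}
S["12"] *= 0.1   # keep the Schur complement well-conditioned
S["21"] *= 0.1

T = lambda ij: U @ S[ij] @ V.conj().T
W = np.block([[T("11"), T("12")], [T("21"), T("22")]])

inv = np.linalg.inv
SZp = S["22"] - S["21"] @ inv(S["11"]) @ S["12"]   # diagonal Schur complement
blocks = {
    "11": inv(S["11"]) + inv(S["11"]) @ S["12"] @ inv(SZp) @ S["21"] @ inv(S["11"]),
    "12": -inv(S["11"]) @ S["12"] @ inv(SZp),
    "21": -inv(SZp) @ S["21"] @ inv(S["11"]),
    "22": inv(SZp),
}
W_inv = np.block([[V @ blocks["11"] @ U.conj().T, V @ blocks["12"] @ U.conj().T],
                  [V @ blocks["21"] @ U.conj().T, V @ blocks["22"] @ U.conj().T]])
assert np.allclose(W_inv @ W, np.eye(2 * n))
```

Since all $S_{i,j}$ are diagonal, every operation above is $O(n)$ apart from the final basis transformations.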

and ultimately, the elements of $T^{-1}_{(\mathrm{block})}$ are

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{1,1} = V\big(
&\,S_{11}^{-1} + S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&+ S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}
 + S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&- S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&+ S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}\\
&+ S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&- S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&- S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}\\
&- S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&+ S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{1,2} = V\big(
&-S_{11}^{-1}S_{12}S_{Z'}^{-1}
 - S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}\\
&- S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}\\
&+ S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}
 + S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}\\
&+ S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}
 - S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{32}S_{Z'}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{1,3} = V\big(
-S_{11}^{-1}S_{13}S_{Z}^{-1}
- S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}
+ S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{23}S_{Z}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{2,1} = V\big(
&-S_{Z'}^{-1}S_{21}S_{11}^{-1}
 - S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}\\
&- S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&+ S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&+ S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}
 + S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}\\
&- S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{2,2} = V\big(
&\,S_{Z'}^{-1}
 + S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}\\
&- S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1}S_{32}S_{Z'}^{-1}\\
&- S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}
 + S_{Z'}^{-1}S_{23}S_{Z}^{-1}S_{32}S_{Z'}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{2,3} = V\big(
S_{Z'}^{-1}S_{21}S_{11}^{-1}S_{13}S_{Z}^{-1} - S_{Z'}^{-1}S_{23}S_{Z}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{3,1} = -V\big(
S_{Z}^{-1}S_{31}S_{11}^{-1}
+ S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1}S_{21}S_{11}^{-1}
- S_{Z}^{-1}S_{32}S_{Z'}^{-1}S_{21}S_{11}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{3,2} = V\big(
S_{Z}^{-1}S_{31}S_{11}^{-1}S_{12}S_{Z'}^{-1} - S_{Z}^{-1}S_{32}S_{Z'}^{-1}
\big)U^H .
\end{align*}

\begin{align*}
\left(T^{-1}_{(\mathrm{block})}\right)_{3,3} = V S_{Z}^{-1} U^H .
\end{align*}

The complexity of this approach depends heavily on the basis matrices $U$ and $V$.

10.2.2.3 Diagonal Approximation

Analogously to the diagonal approximation in the one term case, $T_{(\mathrm{block})}$ can be approximated with a block diagonal matrix on the highest level. The costs of the MVP with $T^{-1}_{(\mathrm{block})}$ are in this case $O(N_x N_y (N_x + N_y + 1))$.

10.3 numerical experiments

10.3.1 Convergence of the Kronecker Product Approximation

The Kronecker product approximation is only an approximation if terms are omitted. If the complete number of terms is used ($s = \operatorname{rank}(P)$), then it recreates the original matrix $T_{(\mathrm{BTTB})}$.

Figure 10.1 illustrates this fact by plotting the relative differences of 500 randomly created BTTB matrices $T_{(\mathrm{BTTB})}$ and the result of the Kronecker product approximation if all terms are used. The result of the Kronecker product approximation is computed using the algorithm described in Section 10.1.1 and is a decomposition of the form $\sum_k A_k \otimes B_k$.
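The rearrangement step behind this algorithm is, in essence, the Van Loan–Pitsianis construction: the SVD of a rearranged version of $T$ yields the Kronecker factors, and keeping all terms reproduces $T$ exactly, which is what Figure 10.1 confirms. A small NumPy sketch (for a general, not necessarily BTTB, matrix; the function name is illustrative):

```python
import numpy as np

def kron_approx(T, m, n, terms=None):
    """Decomposition T ~ sum_k A_k (x) B_k via the SVD of the
    Van Loan-Pitsianis rearrangement of T.  T is (m*n) x (m*n);
    with all terms kept, the sum reproduces T exactly."""
    # Rearranged matrix R: row (i, j) holds vec(T_ij) of block T_ij.
    blocks = T.reshape(m, n, m, n).transpose(0, 2, 1, 3)
    R = blocks.reshape(m * m, n * n)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = len(s) if terms is None else terms
    return [(np.sqrt(s[i]) * U[:, i].reshape(m, m),
             np.sqrt(s[i]) * Vt[i].reshape(n, n)) for i in range(k)]

rng = np.random.default_rng(1)
m, n = 3, 4
T = rng.standard_normal((m * n, m * n))
factors = kron_approx(T, m, n)                       # all terms
T_rebuilt = sum(np.kron(A, B) for A, B in factors)
assert np.allclose(T, T_rebuilt)                     # exact reconstruction
```

Truncating `terms` gives the one-, two-, or three-term approximations used throughout this chapter.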


Figure 10.1: Relative difference of the Kronecker product approximation (using all terms) and the original BTTB matrix, for 500 randomly created test cases.

10.3.2 Decay of Singular Values

The core part of the Kronecker product approximation is the SVD that is calculated in step 3 of the algorithm described in Section 10.1.1. The decay of the singular values of this SVD can give us an idea of the quality of the approximation and also of how it relates to the number of terms used.

Figure 10.2 illustrates the decay of the singular values of this SVD for a sample case. Each line of the plot corresponds to one of the BTTB matrices of C. If a line stops, it means that the next singular value will be exactly zero.

The illustration shows a fast decay of the singular values, indicating that each additional term of the Kronecker product approximation will significantly improve the approximation. While Figure 10.2 corresponds to only a single test case, a similar behavior has been observed for other test cases as well.

10.3.3 Relation to the Generating Function

As described in Chapter 3, a matrix with a Toeplitz structure can be associated with a generating function. The Kronecker product approximation can also be interpreted in terms of the generating function (see Figure 10.3).

Figure 10.2: Decay of the singular values of a sample test case.

The generating function of $T_{(\mathrm{BTTB})}$ is a bivariate function $f(x,y)$. The first term of the Kronecker product approximation consists of two Toeplitz matrices $A_1$ and $B_1$. Their generating functions are univariate and we can denote them as $g(x)$ and $h(y)$. The Kronecker product $A_1 \otimes B_1$ will result in a BTTB matrix. Its generating function $f_1(x,y)$ can be written as

\[
f_1(x,y) = g(x) \cdot h(y) .
\]

In other words, the first term of the Kronecker product approximation $A_1 \otimes B_1$ corresponds to an approximation of the original generating function by a separable function $f_1(x,y)$.
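This separability statement can be checked directly: for a matrix built as $T_A \otimes T_B$ (generating function $g(x)\,h(y)$), the rearranged matrix used by the Kronecker product approximation has rank one, so a single term is exact. A sketch with random Toeplitz factors standing in for matrices generated by $g$ and $h$:

```python
import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(6)

def random_toeplitz(n):
    """Random Toeplitz matrix, standing in for the matrix generated
    by a univariate symbol g or h."""
    c = rng.standard_normal(2 * n - 1)   # diagonals t_{-(n-1)} .. t_{n-1}
    return np.array([[c[n - 1 + i - j] for j in range(n)] for i in range(n)])

m, n = 4, 5
T_A, T_B = random_toeplitz(m), random_toeplitz(n)
T = np.kron(T_A, T_B)   # BTTB matrix with separable symbol f = g * h

# Van Loan-Pitsianis rearrangement: row (i, j) holds vec(T_ij).
R = T.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)
assert matrix_rank(R) == 1   # one Kronecker term is exact
```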


Figure 10.3: Relation of the Kronecker product approximation and the generating functions, $f(x,y) \approx g(x) \cdot h(y) = f_1(x,y)$ (taken from test case 1b).

Figure 10.4: (a) Original generating function; (b) one term approximation; (c) two term approximation; (d) three term approximation.

This also means that if the original generating function $f(x,y)$ of $T_{(\mathrm{BTTB})}$ is a separable function, a single term of the Kronecker product approximation is a perfect decomposition. Therefore, it can be expected that the Kronecker product approximation works very well for separable or almost separable generating functions.

Figure 10.4 illustrates the convergence of the Kronecker product approximation in terms of the generating function. Figure 10.4a shows the top view of the generating function corresponding to the original BTTB matrix (taken from test case 2b). Figure 10.4b is the generating function corresponding to $A_1 \otimes B_1$, the first term of the Kronecker product approximation. This function is clearly separable in the x and y directions.

Figure 10.4c and Figure 10.4d add one and two more terms to the approximation, respectively. It is visible that in this example, the three term approximation gives a very good approximation of the original generating function.


11 more ideas

11.1 transformation based preconditioners

In Chapter 8, circulant matrices have been discussed as possible preconditioners, because they are diagonalizable by the Fourier transformation. This fact allows for an efficient inversion and an efficient computation of the MVP.

Similar to Chapter 8, other transformations can be used to derive a different set of preconditioners $P$ where

\[
P = T_{(\mathrm{Transf.})}^H \, \Lambda \, T_{(\mathrm{Transf.})} ,
\]

for $\Lambda$ a diagonal matrix and $T_{(\mathrm{Transf.})}$ the associated transformation.

If the transformation $T_{(\mathrm{Transf.})}$ can be computed efficiently, for example by being a Fourier-related transform, the efficient algorithms for the inversion and the MVP still hold true.

In the following sections a series of Fourier-related transforms are described briefly.

11.1.1 Discrete Sine and Cosine Transform

The discrete sine transforms (DSTs) and discrete cosine transforms (DCTs) are a series of Fourier-related transforms and are used in applications such as signal processing or statistics. Martucci [37] described in total 16 transformations (four even and four odd versions of each) and generalized them as the discrete trigonometric transform (DTT).

Figure 11.1 and Figure 11.2 show the result of the DST II and DCT II transformation on a BTTB matrix. The transformation matrices of DST II and DCT II of size $n$ are defined as

\begin{align*}
\left(T_{(\mathrm{DST\,II})}\right)_{i,j} &= c_{i,j} \sin\!\left(\frac{\pi(i+1)(2j+1)}{2n}\right) , & i,j &= 0,1,\dots,n-1 ,\\
\left(T_{(\mathrm{DCT\,II})}\right)_{i,j} &= c_{i,j} \cos\!\left(\frac{\pi i(2j+1)}{2n}\right) , & i,j &= 0,1,\dots,n-1 ,
\end{align*}

where $c_{i,j}$ is a scaling factor.
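With the common orthonormal choice of the scaling factors $c_{i,j}$ (an assumption here; the text above leaves them unspecified), both matrices are orthogonal, so their inverses are simply their transposes. A sketch:

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II transformation matrix; c_{i,j} is the usual
    sqrt(1/n) (first row) / sqrt(2/n) (other rows) normalization."""
    i = np.arange(n)[:, None]   # row index
    j = np.arange(n)[None, :]   # column index
    c = np.full((n, n), np.sqrt(2.0 / n))
    c[0, :] = np.sqrt(1.0 / n)
    return c * np.cos(np.pi * i * (2 * j + 1) / (2 * n))

def dst2_matrix(n):
    """Orthonormal DST-II transformation matrix (last row scaled
    by sqrt(1/n) instead of sqrt(2/n))."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.full((n, n), np.sqrt(2.0 / n))
    c[-1, :] = np.sqrt(1.0 / n)
    return c * np.sin(np.pi * (i + 1) * (2 * j + 1) / (2 * n))

# Both transforms are orthogonal, so inversion is just a transpose.
for M in (dct2_matrix(8), dst2_matrix(8)):
    assert np.allclose(M @ M.T, np.eye(8))
```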


(a) Color plot of the original BTTB matrix. (b) Color plot of the approximation via the DST II.

Figure 11.1: Color plots for a BTTB matrix and the approximation resulting from DST II.

(a) Color plot of the original BTTB matrix. (b) Color plot of the approximation via the DCT II.

Figure 11.2: Color plots for a BTTB matrix and the approximation resulting from DCT II.

11.1.2 Hartley Transform

Another Fourier-related transform is the Hartley transform. The discrete transformation matrix is defined as

\[
\left(T_{(\mathrm{Hartley})}\right)_{i,j} = \frac{1}{\sqrt{n}}\left(\cos\!\left(\frac{2\pi i j}{n}\right) + \sin\!\left(\frac{2\pi i j}{n}\right)\right) , \qquad i,j = 0,1,\dots,n-1 .
\]

Figure 11.3 illustrates the result of an approximation with the Hartley transformation.
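The Hartley matrix is symmetric and orthogonal, and it is closely tied to the FFT, which is what makes it attractive here. A small sketch with the unitary $1/\sqrt{n}$ normalization:

```python
import numpy as np

def hartley_matrix(n):
    """Discrete Hartley transformation matrix with entries
    (1/sqrt(n)) * (cos(2*pi*i*j/n) + sin(2*pi*i*j/n))."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    a = 2 * np.pi * i * j / n
    return (np.cos(a) + np.sin(a)) / np.sqrt(n)

n = 8
H = hartley_matrix(n)
# Symmetric and orthogonal: H is its own inverse.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, np.eye(n))
# Relation to the FFT: H x = Re(F x) - Im(F x) for the unitary DFT.
x = np.random.default_rng(2).standard_normal(n)
F = np.fft.fft(x) / np.sqrt(n)
assert np.allclose(H @ x, F.real - F.imag)
```

The last assertion shows how an MVP with the Hartley matrix can be computed in $O(n \log n)$ via a single FFT.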


(a) Color plot of the original BTTB matrix. (b) Color plot of the approximation via the Hartley transformation.

Figure 11.3: Color plots for a BTTB matrix and the approximation resulting from a Hartley transformation.

11.2 banded approximations

A banded matrix is a matrix whose non-zero elements are limited to the main diagonal plus some diagonals on either side. For example, a tridiagonal matrix has only non-zero elements on the main diagonal as well as on the diagonals directly above and below it. Figure 11.4 illustrates the result of a tridiagonal approximation on both levels.

(a) Color plot of the original BTTB matrix. (b) Color plot of a two-level tridiagonal approximation.

Figure 11.4: Color plots for a BTTB matrix and a tridiagonal approximation on both levels.

Banded approximations can be used on both levels of a BTTB matrix, as described in Section 8.2. However, efficient inversion and MVP routines are only known if a diagonal approximation is applied.


It is possible that efficient formulas for the inversion and the MVP exist as well, for example with the help of specialized banded solvers. However, due to the restricted time of this thesis, they have not been considered in this work. Nevertheless, the performance of banded approximations with different bandwidths is shown in Chapter 12. These results can help in evaluating whether it is worth investing more time in banded approximations.
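As a pointer to what such specialized banded solvers do, a tridiagonal system can be solved in $O(n)$ with the Thomas algorithm (a generic sketch, not part of the thesis code; it assumes no pivoting is required, e.g. a diagonally dominant matrix):

```python
import numpy as np

def thomas_solve(lower, main, upper, b):
    """Solve a tridiagonal system in O(n) with the Thomas algorithm.
    `lower`/`upper` hold the sub-/superdiagonal (length n-1), `main`
    the diagonal (length n).  No pivoting: assumes e.g. diagonal
    dominance, as in the test below."""
    n = len(main)
    c, d = np.empty(n - 1), np.empty(n)
    c[0] = upper[0] / main[0]
    d[0] = b[0] / main[0]
    for i in range(1, n):                      # forward elimination
        denom = main[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (b[i] - lower[i - 1] * d[i - 1]) / denom
    x = np.empty(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Tridiagonal Toeplitz test case: diagonally dominant.
n = 7
lower = np.full(n - 1, -1.0)
upper = np.full(n - 1, -1.0)
main = np.full(n, 4.0)
b = np.ones(n)
x = thomas_solve(lower, main, upper, b)
T = np.diag(main) + np.diag(upper, 1) + np.diag(lower, -1)
assert np.allclose(T @ x, b)
```

Wider bands (penta-, heptadiagonal) admit analogous $O(n)$ banded LU solvers, which is why the banded preconditioners of Chapter 12 might still become practical.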

11.3 koyuncu factorization

In his PhD thesis, Koyuncu [32] provides analytical inversion formulas for BTTB matrices. They can be seen as generalizations of the Gohberg–Semencul formula for one-level Toeplitz matrices. The main result [32, Thm. 3.0.1] of the thesis is the following:

Theorem 11.3.1: Inversion Formula for BTTB Matrices

Let

\begin{align*}
P(z_1,z_2) &= \sum_{k=0}^{n_1}\sum_{l=0}^{n_2} P_{kl}\, z_1^k z_2^l , &
Q(z_1,z_2) &= \sum_{k=0}^{n_1}\sum_{l=0}^{n_2} Q_{kl}\, z_1^k z_2^l ,\\
R(z_1,z_2) &= \sum_{k=0}^{n_1}\sum_{l=0}^{n_2} R_{kl}\, z_1^k z_2^l , &
S(z_1,z_2) &= \sum_{k=0}^{n_1}\sum_{l=0}^{n_2} S_{kl}\, z_1^k z_2^l ,
\end{align*}

be stable operator valued polynomials, and suppose that

\[
Q(z_1,z_2)\, P(z_1,z_2)^H = S(z_1,z_2)^H R(z_1,z_2)
\]

on $\mathbb{T}^2$, where $\mathbb{T} = \{\, z \in \mathbb{C} : |z| = 1 \,\}$. Put

\[
f(z_1,z_2) = \left(P(z_1,z_2)^H\right)^{-1} Q(z_1,z_2)^{-1}
= R(z_1,z_2)^{-1} \left(S(z_1,z_2)^H\right)^{-1} ,
\]

for $z_1, z_2 \in \mathbb{T}$. Put $\Lambda = \{0,\dots,n_1\} \times \{0,\dots,n_2\} \setminus \{n\}$, where $n = (n_1,n_2)$, and write the Fourier coefficients of $f(z_1,z_2)$ as $f_{k,l}$, $(k,l) \in \mathbb{Z}^2$. Consider

\[
T = \left(f_{k_1-k_2,\,l_1-l_2}\right)_{(k_1,l_1),(k_2,l_2)\in\Lambda} .
\]

Define $A, A_1, B, B_1, C_1, \tilde C_1, C_2, \tilde C_2, D_1, D_2$ as below. If $\operatorname{Range}(C_i) \subset \operatorname{Range}(D_i)$ or $\operatorname{Range}(C_i^H) \subset \operatorname{Range}(D_i^H)$ for $i = 1,2$, then

\[
T^{-1} = A_1 A^H - B^H B_1 - \tilde C_1^H D_1^{(-1)} C_1 - \tilde C_2^H D_2^{(-1)} C_2 ,
\]

where $D_1^{(-1)}$ and $D_2^{(-1)}$ denote generalized inverses of $D_1$ and $D_2$.


The matrices $A, A_1, B, B_1, C_1, \tilde C_1, C_2, \tilde C_2, D_1, D_2$ are defined as follows:

\begin{align*}
A &= (P_{k-l})_{k,l\in\Lambda} , & A_1 &= (Q_{k-l})_{k,l\in\Lambda} ,\\
B &= (S_{k-l})_{k\in n+\Lambda,\, l\in\Lambda} , & B_1 &= (R_{k-l})_{k\in n+\Lambda,\, l\in\Lambda} ,
\end{align*}

\begin{align*}
(C_1)_{ij} &= \sum_{k_1=i_1-n_1}^{j_1} \sum_{k_2=0}^{\min\{i_2,j_2\}} Q_{k-i} P^H_{j-k}
- \sum_{l_1=i_1}^{j_1+n_1} \sum_{l_2=n_2}^{\min\{i_2+n_2,\,j_2+n_2\}} S^H_{l-i} R_{l-j} ,\\
(\tilde C_1)_{ij} &= \sum_{k_1=i_1-n_1}^{j_1} \sum_{k_2=0}^{\min\{i_2,j_2\}} P_{k-i} Q^H_{j-k}
- \sum_{l_1=i_1}^{j_1+n_1} \sum_{l_2=n_2}^{\min\{i_2+n_2,\,j_2+n_2\}} R^H_{l-i} S_{l-j} ,
\end{align*}

where $i \in \Theta_1 = \{n_1+1, n_1+2, \dots\} \times \{0, 1, \dots, n_2-1\}$ and $j \in \{0,\dots,n_1\} \times \{0,\dots,n_2\} \setminus \{(n_1,n_2)\}$,

\begin{align*}
(C_2)_{ij} &= \sum_{k_1=0}^{\min\{i_1,j_1\}} \sum_{k_2=i_2-n_2}^{j_2} Q_{k-i} P^H_{j-k}
- \sum_{l_1=n_1}^{\min\{i_1+n_1,\,j_1+n_1\}} \sum_{l_2=i_2}^{j_2+n_2} S^H_{l-i} R_{l-j} ,\\
(\tilde C_2)_{ij} &= \sum_{k_1=0}^{\min\{i_1,j_1\}} \sum_{k_2=i_2-n_2}^{j_2} P_{k-i} Q^H_{j-k}
- \sum_{l_1=n_1}^{\min\{i_1+n_1,\,j_1+n_1\}} \sum_{l_2=i_2}^{j_2+n_2} R^H_{l-i} S_{l-j} ,
\end{align*}

where $i \in \Theta_2 = \{0, 1, \dots, n_1-1\} \times \{n_2+1, n_2+2, \dots\}$ and $j \in \{0,\dots,n_1\} \times \{0,\dots,n_2\} \setminus \{(n_1,n_2)\}$,

\begin{align*}
(D_1)_{k,\tilde k} &= \sum_{l_1=\max\{k_1,\tilde k_1\}-n_1}^{\min\{k_1,\tilde k_1\}} \sum_{l_2=0}^{\min\{k_2,\tilde k_2\}} Q_{k-l} P^H_{\tilde k-l}
- \sum_{s_1=\max\{k_1,\tilde k_1\}}^{\min\{k_1,\tilde k_1\}+n_1} \sum_{s_2=n_2}^{\min\{k_2,\tilde k_2\}+n_2} S^H_{s-k} R_{s-\tilde k} ,
\end{align*}

where $k, \tilde k \in \Theta_1$, and

\begin{align*}
(D_2)_{k,\tilde k} &= \sum_{l_1=0}^{\min\{k_1,\tilde k_1\}} \sum_{l_2=\max\{k_2,\tilde k_2\}-n_2}^{\min\{k_2,\tilde k_2\}} Q_{k-l} P^H_{\tilde k-l}
- \sum_{s_1=n_1}^{\min\{k_1,\tilde k_1\}+n_1} \sum_{s_2=\max\{k_2,\tilde k_2\}}^{\min\{k_2,\tilde k_2\}+n_2} S^H_{s-k} R_{s-\tilde k} ,
\end{align*}

where $k, \tilde k \in \Theta_2$, and $P_k = R_k = Q_k = S_k = 0$ whenever $k \notin \{0,\dots,n_1\} \times \{0,\dots,n_2\}$.

Although the inversion formula is something that could potentially benefit the construction of a preconditioner heavily, this method is not pursued further in this work. This is mainly due to the fact that the prerequisites of Theorem 11.3.1 are very strict. It is considered highly unlikely that such a decomposition of the generating function is possible, or that it can be closely approximated by one of this form.

11.4 low-rank update

In addition to the preconditioning methods based on approximating the inverse of C, an approach involving the other two matrices G and M that make up A has been analyzed.

The matrix-matrix product $GM$ can be approximated with an SVD of rank $k$, i.e.

\[
GM \approx U_k S_k V_k^H , \tag{11.1}
\]

where $GM = U S V^H$ is the result of an SVD of $GM$, $U_k$ is the matrix consisting of the first $k$ columns of $U$, $V_k$ is the matrix consisting of the first $k$ columns of $V$, and $S_k$ is the matrix that consists of the first $k$ rows and $k$ columns of $S$.

Instead of using only the preconditioner based on C, we can then use $P(C) + U_k S_k V_k^H$ as a new preconditioner. Here $P(C)$ denotes the preconditioner based on C (it can be any of the preconditioners suggested in Chapters 7 to 11) and $U_k S_k V_k^H$ is the low-rank update.

The inverse of this new preconditioner can be computed using the Woodbury matrix identity [5, Cor. 2.8.8],

\[
\left(P(C) + U_k S_k V_k^H\right)^{-1} = P(C)^{-1}
- P(C)^{-1} U_k \left(S_k^{-1} + V_k^H P(C)^{-1} U_k\right)^{-1} V_k^H P(C)^{-1} ,
\]

which only requires the inversion of $P(C)$ and of a matrix of size $k \times k$. The inversion of $P(C)$ can be done efficiently if it is one of the suggested preconditioners. Thus, the low-rank update is only efficient if $k \ll \operatorname{size}(C)$.
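A sketch of the Woodbury-based application of the updated preconditioner (illustrative only; `solve_P` stands for the efficient inversion of $P(C)$, here a simple diagonal stand-in):

```python
import numpy as np

def woodbury_apply(solve_P, U, S, V, b):
    """Apply (P + U S V^H)^{-1} to b via the Woodbury identity.
    `solve_P(Y)` must return P^{-1} Y for a vector or matrix Y.
    Only a k x k system is solved explicitly, so this is cheap
    when k << size(P)."""
    Pinv_b = solve_P(b)
    Pinv_U = solve_P(U)                                  # P^{-1} U, column-wise
    small = np.linalg.inv(S) + V.conj().T @ Pinv_U       # k x k matrix
    return Pinv_b - Pinv_U @ np.linalg.solve(small, V.conj().T @ Pinv_b)

rng = np.random.default_rng(3)
n, k = 50, 3
d = rng.uniform(1.0, 2.0, n)     # P(C): a cheaply invertible (diagonal) stand-in
solve_P = lambda Y: (Y.T / d).T
U = rng.standard_normal((n, k))
S = np.diag(rng.uniform(1.0, 2.0, k))
V = rng.standard_normal((n, k))
b = rng.standard_normal(n)
x = woodbury_apply(solve_P, U, S, V, b)
assert np.allclose((np.diag(d) + U @ S @ V.T) @ x, b)
```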

Figure 11.5 illustrates the relative difference of the left and the right side of (11.1) for different values of $k$ and several test cases. It is visible that for $k \ll \operatorname{size}(C)$, the quality of the approximation is rather low. Some sample performance benchmarks also confirm that the improvement of the preconditioner with a low-rank update is not relevant for small values of $k$.


Figure 11.5: Relative difference of $GM$ and $U_k S_k V_k^H$ for different values of $k$ and four different test cases.


Part III

BENCHMARKS

A variety of real-life test scenarios is used to test the performance of the preconditioners proposed in the last part. They will be compared in terms of the number of iterations needed until convergence is reached.


12 benchmarks

In this chapter, several test cases are used to analyze the performance of the proposed preconditioners. The test cases stem from the application described in Chapter 1.

In the benchmarks, IDR(s) is chosen as the iterative solver, using the MATLAB implementation described by van Gijzen and Sonneveld [19], which can be found online (http://ta.twi.tudelft.nl/nw/users/gijzen/IDR.html). The solver is used on a left preconditioned system, as described in (2.3), with the following parameters:

• s = 6: Dimension of the shadow space.

• tol = 10^{-7}: Tolerance of the method.

• maxit = 5000: Maximum number of iterations.

The following preconditioners have been tested:

• No Preconditioner: This is the unpreconditioned system Ax = b and is useful as a reference.

• Exact C: This preconditioner uses the complete matrix C, as described in Chapter 7. This will (usually) describe the upper limit of the performance of any preconditioner that is based on C. While it reduces the number of iterations drastically, it is also important to note that each iteration will take relatively long, so it will not be optimal in terms of time.

• Circulant: This preconditioner replaces both Toeplitz levels with Chan's optimal preconditioner, described in Section 8.1.1.2. It is the current default choice and will therefore be used as the main reference in this benchmark.

• DCT II: Similar to the circulant preconditioner, but instead of the Fourier transformation, the DCT II is used, as described in Section 11.1.1.

• DST II: Similar to the circulant preconditioner, but instead of the Fourier transformation, the DST II is used, as described in Section 11.1.1.

• Hartley: Similar to the circulant preconditioner, but instead of the Fourier transformation, the Hartley transformation is used, as described in Section 11.1.2.


• Kron 1: Each BTTB block of C is approximated with a one term Kronecker product approximation. No further approximations are done on the level that is 3 × 3 symmetrical.

• Kron 2: Each BTTB block of C is approximated with a two term Kronecker product approximation. Although no efficient inversion and MVP could be found, the results of this benchmark might still provide further insight.

• Kron 3: Each BTTB block of C is approximated with a three term Kronecker product approximation. Although no efficient inversion and MVP could be found, the results of this benchmark might still provide further insight.

• Kron Full: Each BTTB block of C is described using all the terms resulting from the Kronecker product approximation. Since it replicates the matrix C exactly, the results should be very close to the ones from Exact C. Although no efficient inversion and MVP could be found, the results of this benchmark might still provide further insight.

• Kron SVD 1: Each BTTB block of C is approximated with the approximate SVD described in Section 10.1.2.3, using one term of the Kronecker product approximation. No further approximations are done on the level that is 3 × 3 symmetrical.

• Kron SVD 2: Each BTTB block of C is approximated with the approximate SVD described in Section 10.1.2.3, using two terms of the Kronecker product approximation. No further approximations are done on the level that is 3 × 3 symmetrical.

• Kron SVD 3: Each BTTB block of C is approximated with the approximate SVD described in Section 10.1.2.3, using three terms of the Kronecker product approximation. No further approximations are done on the level that is 3 × 3 symmetrical.

• Kron SVD Full: Each BTTB block of C is approximated with the approximate SVD described in Section 10.1.2.3, using all terms of the Kronecker product approximation. No further approximations are done on the level that is 3 × 3 symmetrical.

• Kron 1 Approx: Same preconditioner as Kron 1, but with the sum approximation on the 3 × 3 symmetric level, mentioned in Section 10.2.1.

• Kron 1 Approx Matrix: Same preconditioner as Kron 1 Approx, but implemented without the use of the Kronecker decomposition. The results should be close to Kron 1 Approx and can be used as a validation.


• Kron 1 Diagonal: Same preconditioner as Kron 1, but with the diagonal approximation on the 3 × 3 symmetric level, mentioned in Section 10.2.1.

• Kron SVD Diagonal: Same preconditioner as Kron SVD Full, but with the diagonal approximation on the 3 × 3 symmetric level, mentioned in Section 10.2.2.

• IGF 1: Preconditioner described in Section 9.2, with a sampling rate of 1, as defined in (9.2).

• IGF 3: Preconditioner described in Section 9.2, with a sampling rate of 3, as defined in (9.2).

• IGF 5: Preconditioner described in Section 9.2, with a sampling rate of 5, as defined in (9.2).

• IGF 7: Preconditioner described in Section 9.2, with a sampling rate of 7, as defined in (9.2).

• IGF 5 (α = 0.3): Same preconditioner as IGF 5, but with an additional regularization of α = 0.3.

• Diagonal: Each Toeplitz level of C has been approximated by a diagonal matrix, as described in Section 11.2. No further approximations are done on the level that is 3 × 3 symmetrical.

• Tridiagonal: Each Toeplitz level of C has been approximated by a tridiagonal matrix, as described in Section 11.2. No further approximations are done on the level that is 3 × 3 symmetrical.

• Pentadiagonal: Each Toeplitz level of C has been approximated by a pentadiagonal matrix, as described in Section 11.2. No further approximations are done on the level that is 3 × 3 symmetrical.

• Heptadiagonal: Each Toeplitz level of C has been approximated by a heptadiagonal matrix, as described in Section 11.2. No further approximations are done on the level that is 3 × 3 symmetrical.

The computation was done in MATLAB R2016b on a system with an Intel Core i5-6200U.

Table 12.2 contains all the results from the benchmark. It shows the number of iterations until convergence is reached if one of the above mentioned preconditioners is used. The results are color-coded in the way shown in Table 12.1.

In addition to the results per test case, the table also provides the sum, the average and the median of all the test cases, per preconditioner, rounded to the nearest integer. While the median value tends to weight all test cases equally, the average weights the test cases depending on the number of iterations needed. Figure 12.1 illustrates the results in terms of the speed up each preconditioner produces relative to the circulant preconditioner.

Table 12.1: Color-code for the tables in the benchmark chapter.

far better (> 10% fewer iterations vs. circulant)
better (< 10% fewer iterations vs. circulant)
worse (< 10% more iterations vs. circulant)
far worse (> 10% more iterations vs. circulant)

The test cases are categorized in three groups: Group 1, Group 2 and Group 3. It can be expected that the test cases in each group share common properties.

The following sections will briefly describe the results per preconditioning method.


Table 12.2: Number of iterations if a selected preconditioner is used on a certain test case.

Columns: Group, Test Case, No Preconditioning, Exact C, Circulant, DCT II, DST II, Hartley, Kron 1, Kron 2, Kron 3, Kron Full, Kron SVD 1, Kron SVD 2, Kron SVD 3, Kron SVD Full, Kron 1 Approx, Kron 1 Approx Matrix, Kron 1 Diagonal, Kron SVD Diagonal, IGF 1, IGF 3, IGF 5, IGF 7, IGF 5 (α = 0.3), Diagonal, Tridiagonal, Pentadiagonal, Heptadiagonal.

Group 1 Test Case 1a 1049 210 682 842 755 878 623 537 339 211 767 801 759 833 632 652 878 804 >5000 970 600 >5000 >5000 759 663 423 430

Group 1 Test Case 1b >5000 254 3905 4370 3629 >5000 3263 2821 966 261 2783 3442 3114 3401 2933 3194 2881 3154 >5000 2044 902 >5000 >5000 3450 3206 1816 1889

Group 1 Test Case 1c 2488 1178 2211 2399 2332 2858 1387 1454 1371 1218 1439 1497 1450 1484 1615 1585 1515 1583 >5000 >5000 >5000 >5000 >5000 1565 1560 1320 1407

Group 1 Test Case 2a 383 220 380 409 396 398 262 250 223 214 254 329 327 333 263 262 263 331 4030 1065 692 >5000 >5000 276 271 226 224

Group 1 Test Case 2b 71 54 62 62 65 66 58 55 54 54 55 59 59 59 57 59 58 57 1876 2887 1112 >5000 >5000 57 56 53 54

Group 1 Test Case 3a 373 176 338 350 346 359 228 224 192 177 223 303 299 300 230 229 224 296 3872 859 579 4784 4730 238 240 210 211

Group 1 Test Case 3b 69 47 60 61 63 65 52 51 47 47 52 52 53 53 54 54 56 58 1521 2971 1060 >5000 4492 56 56 50 49

Group 1 Test Case 4a 322 184 290 297 289 290 199 229 189 185 203 205 201 207 198 206 212 217 3854 1470 775 >5000 >5000 221 220 195 199

Group 1 Test Case 4b 74 39 46 43 43 44 38 40 40 39 38 40 41 42 39 39 41 43 590 2708 1224 >5000 2852 45 42 38 39

Group 1 Test Case 5a 380 224 327 328 352 331 235 233 226 229 239 248 255 256 241 240 235 244 >5000 1335 920 >5000 >5000 252 248 237 230

Group 1 Test Case 5b 77 53 58 63 60 59 52 55 53 53 52 54 54 54 54 54 55 56 1112 3953 1363 >5000 >5000 55 55 52 52

Group 1 Test Case 6a 584 267 450 466 457 460 336 341 528 279 342 438 428 420 357 340 350 406 >5000 1948 1028 >5000 >5000 361 361 516 507

Group 1 Test Case 6b 113 51 60 61 59 62 64 57 94 51 66 58 55 55 60 60 62 60 1986 >5000 2151 4809 >5000 63 62 93 91

Group 1 Test Case 7a 402 195 231 275 247 267 266 762 2372 198 269 326 391 254 259 259 267 249 >5000 >5000 >5000 >5000 767 331 3125 4927 2881

Group 1 Test Case 7b 172 74 93 125 112 125 105 191 311 74 105 118 128 115 109 107 108 117 779 >5000 >5000 >5000 289 133 1444 1252 205

Group 2 Test Case 8a 429 122 197 211 146 191 242 481 161 121 243 356 167 156 237 238 242 156 1437 1035 529 >5000 944 361 2235 3547 494

Group 2 Test Case 8b 166 56 100 100 79 100 121 331 69 56 121 229 80 77 121 118 121 77 1021 3909 1707 >5000 296 158 1134 629 331

Group 2 Test Case 9a 384 119 147 161 125 146 116 119 119 118 116 118 117 120 116 116 116 120 1442 1247 1516 >5000 754 297 651 122 219

Group 2 Test Case 9b 147 48 68 71 65 69 55 50 48 48 55 51 52 51 55 55 55 51 571 2947 930 >5000 263 121 251 68 188

Group 2 Test Case 10a 92 41 63 59 70 65 51 42 42 41 51 43 42 42 55 55 50 47 754 163 115 233 176 83 63 83 124

Group 2 Test Case 10b 50 24 33 30 33 35 26 24 24 24 26 24 24 24 28 28 28 26 185 599 352 476 67 49 37 33 37

Group 3 Test Case 11a 504 76 96 95 114 98 81 78 77 76 81 79 79 77 176 180 81 77 1148 >5000 1806 >5000 >5000 179 103 90 86

Group 3 Test Case 11b 2991 100 107 119 132 111 100 101 105 104 111 105 111 107 351 326 100 107 1321 3631 1710 >5000 >5000 274 348 166 123

Group 3 Test Case 11c >5000 803 856 1481 >5000 1756 798 809 806 788 789 813 851 790 >5000 >5000 798 790 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 2780

Group 3 Test Case 12a 549 79 227 128 168 191 262 134 90 79 262 153 137 131 482 472 262 131 >5000 2526 >5000 >5000 >5000 434 598 160 142

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000 >5000 >5000 >5000 789 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 287 526 1415 192 192 192 192 188 188 188 188 188 188 192 188 687 1712 719 >5000 >5000 >5000 1250 1223 1406

Group 3 Test Case 13b >5000 1700 >5000 >5000 >5000 >5000 1670 1670 1670 1670 1793 1793 1793 1793 1588 1701 1670 1793 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 14a 129 63 70 86 86 68 62 62 62 62 62 62 62 62 62 62 62 62 87 65 63 282 280 121 111 112 105

Group 3 Test Case 14b 790 108 418 727 523 449 109 109 109 109 109 109 109 109 110 108 109 109 2089 553 200 3178 1960 762 637 743 652

Sum 37793 7526 23095 23708 26275 25959 16054 16503 15580 7567 15895 17094 16427 16594 20672 20989 16092 16410 80372 80605 57060 133786 100647 30705 34030 33387 25157

Average 1260 251 770 790 876 865 535 550 519 252 530 570 548 553 689 700 536 547 2679 2687 1902 4460 3355 1024 1134 1113 839

Median 384 114 212 186 157 191 157 192 140 114 155 171 133 126 193 197 157 126 1931 2617 1086 >5000 >5000 263 355 218 215


Figure 12.1: Box plots for the relative speed up of each preconditioner compared to the circulant preconditioner.

Page 118: Approximation of Inverses of BTTB Matrices › ws › files › 72339340 › MasterThesis_FrankSchneider.pdfAPPROXIMATION OF INVERSES of BTTB MATRICES m for Preconditioning Applications


12.1 transformation-based preconditioner

Table 12.3 shows the results for transformation-based preconditioners, in comparison to no preconditioning and the exact C preconditioner. The exact C preconditioner reduces the iterations drastically, but it is important to note that the time per iteration for this preconditioner is relatively large.

Table 12.3: Number of iterations for transformation based preconditioners.

Group Test Case No Prec. Exact C Circulant DCT II DST II Hartley

Group 1 Test Case 1a 1049 210 682 842 755 878

Group 1 Test Case 1b >5000 254 3905 4370 3629 >5000

Group 1 Test Case 1c 2488 1178 2211 2399 2332 2858

Group 1 Test Case 2a 383 220 380 409 396 398

Group 1 Test Case 2b 71 54 62 62 65 66

Group 1 Test Case 3a 373 176 338 350 346 359

Group 1 Test Case 3b 69 47 60 61 63 65

Group 1 Test Case 4a 322 184 290 297 289 290

Group 1 Test Case 4b 74 39 46 43 43 44

Group 1 Test Case 5a 380 224 327 328 352 331

Group 1 Test Case 5b 77 53 58 63 60 59

Group 1 Test Case 6a 584 267 450 466 457 460

Group 1 Test Case 6b 113 51 60 61 59 62

Group 1 Test Case 7a 402 195 231 275 247 267

Group 1 Test Case 7b 172 74 93 125 112 125

Group 2 Test Case 8a 429 122 197 211 146 191

Group 2 Test Case 8b 166 56 100 100 79 100

Group 2 Test Case 9a 384 119 147 161 125 146

Group 2 Test Case 9b 147 48 68 71 65 69

Group 2 Test Case 10a 92 41 63 59 70 65

Group 2 Test Case 10b 50 24 33 30 33 35

Group 3 Test Case 11a 504 76 96 95 114 98

Group 3 Test Case 11b 2991 100 107 119 132 111

Group 3 Test Case 11c >5000 803 856 1481 >5000 1756

Group 3 Test Case 12a 549 79 227 128 168 191

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 287 526 1415

Group 3 Test Case 13b >5000 1700 >5000 >5000 >5000 >5000

Group 3 Test Case 14a 129 63 70 86 86 68

Group 3 Test Case 14b 790 108 418 727 523 449

Sum 37793 7526 23095 23708 26275 25959

Average 1260 251 770 790 876 865

Median 384 114 212 186 157 191


100 benchmarks

In comparison to that, the circulant preconditioner also reduces the iterations compared to no preconditioning, but with less additional time per iteration. On average, the circulant preconditioner requires only 61% of the iterations the original system requires.

The other three transformations, DCT II, DST II and Hartley, do not show a clear pattern. While the median of all three is smaller than the median of the circulant preconditioner, their average is not. In many test cases the results are about the same as for the circulant one, but there are some cases where the number of iterations is much higher; see, for example, test case 11c for all three transformations.

12.2 kronecker product approximation

Table 12.4 and Table 12.5 show the results for preconditioners based on the Kronecker product approximation.

Table 12.4: Number of iterations for preconditioners based on the Kronecker product approximation.

Group Test Case No Prec. Exact C Circulant Kron 1 Kron 2 Kron 3 Kron Full Kron 1 Approx Kron 1 Matrix Kron 1 Diagonal

Group 1 Test Case 1a 1049 210 682 623 537 339 211 632 652 878

Group 1 Test Case 1b >5000 254 3905 3263 2821 966 261 2933 3194 2881

Group 1 Test Case 1c 2488 1178 2211 1387 1454 1371 1218 1615 1585 1515

Group 1 Test Case 2a 383 220 380 262 250 223 214 263 262 263

Group 1 Test Case 2b 71 54 62 58 55 54 54 57 59 58

Group 1 Test Case 3a 373 176 338 228 224 192 177 230 229 224

Group 1 Test Case 3b 69 47 60 52 51 47 47 54 54 56

Group 1 Test Case 4a 322 184 290 199 229 189 185 198 206 212

Group 1 Test Case 4b 74 39 46 38 40 40 39 39 39 41

Group 1 Test Case 5a 380 224 327 235 233 226 229 241 240 235

Group 1 Test Case 5b 77 53 58 52 55 53 53 54 54 55

Group 1 Test Case 6a 584 267 450 336 341 528 279 357 340 350

Group 1 Test Case 6b 113 51 60 64 57 94 51 60 60 62

Group 1 Test Case 7a 402 195 231 266 762 2372 198 259 259 267

Group 1 Test Case 7b 172 74 93 105 191 311 74 109 107 108

Group 2 Test Case 8a 429 122 197 242 481 161 121 237 238 242

Group 2 Test Case 8b 166 56 100 121 331 69 56 121 118 121

Group 2 Test Case 9a 384 119 147 116 119 119 118 116 116 116

Group 2 Test Case 9b 147 48 68 55 50 48 48 55 55 55

Group 2 Test Case 10a 92 41 63 51 42 42 41 55 55 50

Group 2 Test Case 10b 50 24 33 26 24 24 24 28 28 28

Group 3 Test Case 11a 504 76 96 81 78 77 76 176 180 81

Group 3 Test Case 11b 2991 100 107 100 101 105 104 351 326 100

Group 3 Test Case 11c >5000 803 856 798 809 806 788 >5000 >5000 798

Group 3 Test Case 12a 549 79 227 262 134 90 79 482 472 262

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000 789 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 192 192 192 192 188 188 192

Group 3 Test Case 13b >5000 1700 >5000 1670 1670 1670 1670 1588 1701 1670

Group 3 Test Case 14a 129 63 70 62 62 62 62 62 62 62

Group 3 Test Case 14b 790 108 418 109 109 109 109 110 108 109

Sum 37793 7526 23095 16054 16503 15580 7567 20672 20989 16092

Average 1260 251 770 535 550 519 252 689 700 536

Median 384 114 212 157 192 140 114 193 197 157

While Kron 1 is a valid preconditioner on the BTTB level, the results for Kron 2, Kron 3 and Kron Full are only there for the purpose of



providing further insights, since for these preconditioners no efficient inversion and MVP is known. Kron Full, for example, should reproduce the results of the exact C preconditioner closely, which can be verified with the results from the table.

Although Kron 2 and Kron 3 should both provide a better approximation of C than Kron 1, only Kron 3 shows better performance in both the average and the median. This suggests that it is not worthwhile to further investigate using a two-term Kronecker approximation as a preconditioner.

There are some test cases where the number of iterations is (almost) identical for Kron 1, Kron 2, Kron 3 and Kron Full. These test cases correspond to a separable generating function.

The last three columns correspond to additional approximations on the 3 × 3 symmetrical level of C, compared to Kron 1. While all three show an increase in iterations (on average and in the median) compared to Kron 1, the increase is very small for Kron 1 Diagonal. It is quite surprising that Kron 1 Approx and Kron 1 Matrix produce worse results than Kron 1 Diagonal, which could be evidence of an incorrect implementation.

It is very important to note that all of the preconditioners suggested in this section perform better or far better than the circulant preconditioner. This is true for the preconditioners in Table 12.5 as well. These preconditioners result from an approximate SVD. Again, no strictly monotonic trend is visible when more terms of the Kronecker product approximation are used.

12.3 inverse generating function

Table 12.6 shows the results of the benchmark for preconditioners based on the IGF method. It is easily visible that the method is far worse in almost all test cases. There are a few exceptions where it is better than the circulant approximation.

It is worth noting that in none of the test cases is the matrix C expected to fulfill the assumptions mentioned in Theorem 9.2.3. To be more specific, if the matrix C is complex valued, it is never Hermitian. Additionally, the matrix is typically not positive definite, although it can be in special cases.

The last column shows the result with an additional regularization as described in Section 9.3. While the result can be better than without regularization, it still was not enough to provide a better result than



Table 12.5: Number of iterations for preconditioners based on the Kronecker product approximation with approximate SVD.

Group Test Case No Prec. Exact C Circulant Kron SVD 1 Kron SVD 2 Kron SVD 3 Kron SVD Full Kron SVD Diagonal

Group 1 Test Case 1a 1049 210 682 767 801 759 833 804

Group 1 Test Case 1b >5000 254 3905 2783 3442 3114 3401 3154

Group 1 Test Case 1c 2488 1178 2211 1439 1497 1450 1484 1583

Group 1 Test Case 2a 383 220 380 254 329 327 333 331

Group 1 Test Case 2b 71 54 62 55 59 59 59 57

Group 1 Test Case 3a 373 176 338 223 303 299 300 296

Group 1 Test Case 3b 69 47 60 52 52 53 53 58

Group 1 Test Case 4a 322 184 290 203 205 201 207 217

Group 1 Test Case 4b 74 39 46 38 40 41 42 43

Group 1 Test Case 5a 380 224 327 239 248 255 256 244

Group 1 Test Case 5b 77 53 58 52 54 54 54 56

Group 1 Test Case 6a 584 267 450 342 438 428 420 406

Group 1 Test Case 6b 113 51 60 66 58 55 55 60

Group 1 Test Case 7a 402 195 231 269 326 391 254 249

Group 1 Test Case 7b 172 74 93 105 118 128 115 117

Group 2 Test Case 8a 429 122 197 243 356 167 156 156

Group 2 Test Case 8b 166 56 100 121 229 80 77 77

Group 2 Test Case 9a 384 119 147 116 118 117 120 120

Group 2 Test Case 9b 147 48 68 55 51 52 51 51

Group 2 Test Case 10a 92 41 63 51 43 42 42 47

Group 2 Test Case 10b 50 24 33 26 24 24 24 26

Group 3 Test Case 11a 504 76 96 81 79 79 77 77

Group 3 Test Case 11b 2991 100 107 111 105 111 107 107

Group 3 Test Case 11c >5000 803 856 789 813 851 790 790

Group 3 Test Case 12a 549 79 227 262 153 137 131 131

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 188 188 188 188 188

Group 3 Test Case 13b >5000 1700 >5000 1793 1793 1793 1793 1793

Group 3 Test Case 14a 129 63 70 62 62 62 62 62

Group 3 Test Case 14b 790 108 418 109 109 109 109 109

Sum 37793 7526 23095 15895 17094 16427 16594 16410

Average 1260 251 770 530 570 548 553 547

Median 384 114 212 155 171 133 126 126



the circulant preconditioner. However, the value of α in this case was just a guess, based on very few previous tests.

Table 12.6: Number of iterations for preconditioners based on the IGF.

Group Test Case No Prec. Exact C Circulant IGF 1 IGF 3 IGF 5 IGF 7 IGF 5 (α = 0.3)

Group 1 Test Case 1a 1049 210 682 >5000 970 600 >5000 >5000

Group 1 Test Case 1b >5000 254 3905 >5000 2044 902 >5000 >5000

Group 1 Test Case 1c 2488 1178 2211 >5000 >5000 >5000 >5000 >5000

Group 1 Test Case 2a 383 220 380 4030 1065 692 >5000 >5000

Group 1 Test Case 2b 71 54 62 1876 2887 1112 >5000 >5000

Group 1 Test Case 3a 373 176 338 3872 859 579 4784 4730

Group 1 Test Case 3b 69 47 60 1521 2971 1060 >5000 4492

Group 1 Test Case 4a 322 184 290 3854 1470 775 >5000 >5000

Group 1 Test Case 4b 74 39 46 590 2708 1224 >5000 2852

Group 1 Test Case 5a 380 224 327 >5000 1335 920 >5000 >5000

Group 1 Test Case 5b 77 53 58 1112 3953 1363 >5000 >5000

Group 1 Test Case 6a 584 267 450 >5000 1948 1028 >5000 >5000

Group 1 Test Case 6b 113 51 60 1986 >5000 2151 4809 >5000

Group 1 Test Case 7a 402 195 231 >5000 >5000 >5000 >5000 767

Group 1 Test Case 7b 172 74 93 779 >5000 >5000 >5000 289

Group 2 Test Case 8a 429 122 197 1437 1035 529 >5000 944

Group 2 Test Case 8b 166 56 100 1021 3909 1707 >5000 296

Group 2 Test Case 9a 384 119 147 1442 1247 1516 >5000 754

Group 2 Test Case 9b 147 48 68 571 2947 930 >5000 263

Group 2 Test Case 10a 92 41 63 754 163 115 233 176

Group 2 Test Case 10b 50 24 33 185 599 352 476 67

Group 3 Test Case 11a 504 76 96 1148 >5000 1806 >5000 >5000

Group 3 Test Case 11b 2991 100 107 1321 3631 1710 >5000 >5000

Group 3 Test Case 11c >5000 803 856 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 12a 549 79 227 >5000 2526 >5000 >5000 >5000

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 687 1712 719 >5000 >5000

Group 3 Test Case 13b >5000 1700 >5000 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 14a 129 63 70 87 65 63 282 280

Group 3 Test Case 14b 790 108 418 2089 553 200 3178 1960

Sum 37793 7526 23095 80372 80605 57060 133786 100647

Average 1260 251 770 2679 2687 1902 4460 3355

Median 384 114 212 1931 2617 1086 >5000 >5000

12.4 banded approximation

Table 12.7 shows the number of iterations for preconditioners where the Toeplitz structure has been replaced with a banded approximation. None of these preconditioners has any additional approximation on the 3 × 3 symmetrical level.

While all banded approximations are fairly good for a majority of the Group 1 test cases, they are mostly far worse in the other cases. Overall they are worse or far worse than the circulant one.



A trend is visible that a larger bandwidth results in fewer iterations. However, even a heptadiagonal preconditioner is still worse than the circulant preconditioner. Therefore, it seems that banded preconditioners are only an interesting choice if a much larger bandwidth can be used.

Table 12.7: Number of iterations for preconditioners based on banded approximations.

Group Test Case No Prec. Exact C Circulant Diagonal Tridiagonal Pentadiagonal Heptadiagonal

Group 1 Test Case 1a 1049 210 682 759 663 423 430

Group 1 Test Case 1b >5000 254 3905 3450 3206 1816 1889

Group 1 Test Case 1c 2488 1178 2211 1565 1560 1320 1407

Group 1 Test Case 2a 383 220 380 276 271 226 224

Group 1 Test Case 2b 71 54 62 57 56 53 54

Group 1 Test Case 3a 373 176 338 238 240 210 211

Group 1 Test Case 3b 69 47 60 56 56 50 49

Group 1 Test Case 4a 322 184 290 221 220 195 199

Group 1 Test Case 4b 74 39 46 45 42 38 39

Group 1 Test Case 5a 380 224 327 252 248 237 230

Group 1 Test Case 5b 77 53 58 55 55 52 52

Group 1 Test Case 6a 584 267 450 361 361 516 507

Group 1 Test Case 6b 113 51 60 63 62 93 91

Group 1 Test Case 7a 402 195 231 331 3125 4927 2881

Group 1 Test Case 7b 172 74 93 133 1444 1252 205

Group 2 Test Case 8a 429 122 197 361 2235 3547 494

Group 2 Test Case 8b 166 56 100 158 1134 629 331

Group 2 Test Case 9a 384 119 147 297 651 122 219

Group 2 Test Case 9b 147 48 68 121 251 68 188

Group 2 Test Case 10a 92 41 63 83 63 83 124

Group 2 Test Case 10b 50 24 33 49 37 33 37

Group 3 Test Case 11a 504 76 96 179 103 90 86

Group 3 Test Case 11b 2991 100 107 274 348 166 123

Group 3 Test Case 11c >5000 803 856 >5000 >5000 >5000 2780

Group 3 Test Case 12a 549 79 227 434 598 160 142

Group 3 Test Case 12b >5000 770 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 13a >5000 191 1518 >5000 1250 1223 1406

Group 3 Test Case 13b >5000 1700 >5000 >5000 >5000 >5000 >5000

Group 3 Test Case 14a 129 63 70 121 111 112 105

Group 3 Test Case 14b 790 108 418 762 637 743 652

Sum 37793 7526 23095 30705 34030 33387 25157

Average 1260 251 770 1024 1134 1113 839

Median 384 114 212 263 355 218 215


Part IV

C O N C L U S I O N

In this last part, several suggestions are made for future investigations of the work discussed in this thesis. It concludes with a summary of the main results and some conclusions.


13 FUTURE WORK

In this section, several topics are listed that are suggested starting points for future work continuing this research.

A main aspect is the implementation of optimized versions of the preconditioners described in this work. Realizing this will make it possible to replace the benchmarks in Chapter 12 with time measurements instead of only counting iterations.

Besides this general suggestion, there are a few preconditioner-specific suggestions for future work.

13.1 inverse generating function

13.1.1 Regularization

In Section 9.3, a regularization has been suggested for cases where the regular IGF method does not return a successful preconditioner. In those cases, we propose using the IGF not on the original generating function f, but on an elevated function f + α.

In Section 9.3, it was also pointed out that the optimal value of α varies between different test cases. So far, no method besides trial and error could be found to estimate this optimal α. Finding such methods could be an interesting topic for future work.

Besides this regularization, where the whole function is shifted, a different regularization can be tested that only changes the function in the problematic region. For example, instead of using f or f + α, the function f_β = max(f, β) could be used. In this case, the function is only changed in regions where it is close to zero. Analogously to a regularization of the whole function, the optimal value of β is the result of a trade-off.
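The two regularizations can be contrasted on sampled values of a generating function. The following is a minimal sketch; the function names and the sample function `f` are illustrative, not part of the thesis implementation:

```python
import numpy as np

def regularize_shift(f_samples, alpha):
    """Global regularization: elevate the whole generating function by alpha."""
    return f_samples + alpha

def regularize_floor(f_samples, beta):
    """Local regularization: lift the function only where it falls below beta."""
    return np.maximum(f_samples, beta)

# Illustrative samples of a generating function with a zero at x = 0.
x = np.linspace(-np.pi, np.pi, 9)
f = 1.0 - np.cos(x)

f_shift = regularize_shift(f, alpha=0.1)  # changed everywhere
f_floor = regularize_floor(f, beta=0.1)   # changed only near the zero
```

The floor variant leaves the function untouched wherever it already exceeds β, which is exactly the local behavior described above.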

13.1.2 Other Kernels

As mentioned in Section 9.1.1, different kernels could be used to approximate the original generating function if it is not available.



108 future work

In future work, different kernels besides the Dirichlet kernel used in this work could be tested. Chan and Yeung [13] offer some kernels satisfying the requirements of the IGF method that could be explored in future work.
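For reference, the Dirichlet kernel mentioned above has the standard closed form D_n(x) = sin((n + 1/2)x) / sin(x/2) = 1 + 2 Σ_{k=1}^{n} cos(kx); a small sketch (not the thesis code) evaluates it via the cosine sum, which avoids the removable singularity at x = 0:

```python
import numpy as np

def dirichlet_kernel(n, x):
    """Dirichlet kernel D_n(x) = 1 + 2 * sum_{k=1}^{n} cos(k x),
    equal to sin((n + 1/2) x) / sin(x / 2) away from x = 0."""
    x = np.asarray(x, dtype=float)
    return 1.0 + 2.0 * sum(np.cos(k * x) for k in range(1, n + 1))
```

At x = 0 the kernel attains its maximum value 2n + 1, consistent with the closed form in the limit.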

13.2 kronecker product approximation

13.2.1 Using a Common Basis

In this work, several options for applying the Kronecker product approximation to BTTB-block matrices have been suggested in Section 10.2.2. One of them is to use a common basis for all BTTB matrices. Theoretically, the basis of any BTTB matrix could be used, but it is expected that this will produce a suboptimal result. Different choices for a common basis could be analyzed in future research.

13.3 preconditioner selection

The last part compared the performance of the preconditioners in different test cases and showed that the optimal preconditioner depends on the test scenario. So naturally the question arises whether and how we can find the optimal preconditioner for a given test case.

For example, an automatic process could be tested that chooses a preconditioner for a given test system. Ideally, this process should pick the optimal preconditioner for this system, but a more realistic goal would be to ensure the process chooses a preconditioner that performs only slightly worse than the optimum.

In order to choose a (quasi-)optimal preconditioner for a given system, the time needed for solving the system with each preconditioner needs to be estimated. This could be done using a regression model for each preconditioner that was trained with test data in a precomputation phase and could potentially be updated throughout the use of the selection algorithm.

To make sure the regression produces realistic time predictions, we need good input values x for the regression model that can adequately predict the performance of a preconditioner in a given scenario. This is a very important step, which is crucial to the overall performance of the selection algorithm. These predictors could be, for example:

• Diagonal dominance: The ratio of the diagonal elements and the sum of the off-diagonal elements. This could, for example, predict if a banded approximation is a good choice. The



diagonal dominance could be approximated on just a few rows and columns.

• Norms: The norms of I − P^{-1}C or I − P^{-1}A could be computed or approximated. However, this requires the actual computation of the preconditioner and its inverse. If the 2-norm is used here, it is necessary to also compute the 2-norm of the inverse to get a good prediction.

• Separability: As mentioned before, the Kronecker product approximation works really well for separable geometries.
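The first predictor is cheap to compute; a sketch of a row-sampled diagonal-dominance ratio follows (the function name and sampling scheme are hypothetical, chosen only to illustrate the idea):

```python
import numpy as np

def diagonal_dominance(A, sample_rows=None):
    """Predictor: ratio of |diagonal| to the sum of |off-diagonal| entries,
    averaged over a few sampled rows (cheap approximation of dominance)."""
    A = np.asarray(A, dtype=float)
    rows = range(A.shape[0]) if sample_rows is None else sample_rows
    ratios = []
    for i in rows:
        off = np.sum(np.abs(A[i])) - abs(A[i, i])
        ratios.append(abs(A[i, i]) / off if off > 0 else np.inf)
    return float(np.mean(ratios))
```

A large value suggests a banded (or even diagonal) approximation may perform well; passing only a few row indices keeps the cost negligible for large matrices.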

The general idea of the automatic selection process is that the selection algorithm chooses the preconditioner based on its expected time. However, the regression model is just an approximation of the actual time needed, and it is highly unlikely that the preconditioner with the smallest expected time is always the optimal one.

Instead, a weaker assumption is made, namely that the probability that a preconditioner k is the optimal one is proportional to the estimated time computed by the regression model:

P(P = k|x) ∝ T^k_expected(x) , (13.1)

where

x is the input vector of the regression, i.e. the predictors,

T^k_expected(x) is the expected time of preconditioner k given the predictors, estimated by the regression model,

P(P = k|x) is the probability that the selection algorithm will choose k, given the predictors x. It approximates the probability that k is the optimal preconditioner, given the predictors x.

Furthermore, it can be a good choice to try out a preconditioner for which we do not yet have a lot of test data similar to the data we are currently trying to solve. In other words, the probability that we choose preconditioner k could also depend on the 'uncertainty' we have concerning test cases around x:

P(P = k|x) ∝ U^k_uncertainty(x) , (13.2)

where

P(P = k|x) is the probability that the selection algorithm will choose k, given the predictors x,

U^k_uncertainty(x) is a measure for the uncertainty of the expected time of preconditioner k around x. This could be computed, for example, by measuring the distance to the nearest set of predictors for which the computational time is known.



(13.2) describes a scaling of the probabilities depending on the amount of knowledge around x. This means that preconditioners with less test data around a given x get chosen more often relative to preconditioners with more test data around the same x. In other words, (13.2) describes the fact that the selection algorithm tends to try out things.

While (13.1) describes a preference of the algorithm for faster preconditioners, (13.2) at the same time describes a preference for 'trying out' uncertain preconditioners. In the end, however, we want the balance between those two aspects to change over time. The more knowledge we have already gathered (in total), the more we want the algorithm to choose the one with the smallest expected time. At the same time, it makes more sense to try out preconditioners at the beginning, when the gathering of more training data could improve the performance in the long run.

If we combine this fact with (13.1) and (13.2), a viable choice for the probability to choose a certain preconditioner is

P(P = k|x) = N · (T^k_expected(x))^t · (U^k_uncertainty(x))^{1/t} ,

where N is a suitable normalization (so that Σ_k P(P = k|x) = 1) and t is a measure of the already accumulated training set, or the 'time' already invested. This way, we make sure that as we process more data, we value expected time more.

However, this is just one of many possible algorithms for choosing a preconditioner, and a lot more literature research is needed. The method described here is merely a suggestion to spark future work.


14 CONCLUSION

In this work, several preconditioning methods for BTTB and BTTB-block systems were presented, with the goal of reducing the time needed for solving a linear system using iterative solvers.

The performance of many of the suggested preconditioners has been analyzed in various real-world test cases. From the obtained results of the transformation-based methods, it can be concluded that using transformations other than the discrete Fourier transform (DFT) does not seem promising. The performance of the other transformations was on average slightly worse than the performance of the circulant preconditioner, which is based on the DFT.

In contrast to that, the Kronecker product approximation seems very promising. The performance of this preconditioner family is significantly better than the performance of the circulant preconditioner, and it should be considered the new default preconditioner in those test cases.

While the inverse generating function (IGF) method has been shown to perform far worse in the benchmarks, we were still able, in the context of this work, to extend the method to Toeplitz-block and BTTB-block matrices. Additionally, theoretical results could be obtained that prove that the IGF works in these cases if certain assumptions are met. However, these assumptions are typically not met in the test cases, which could explain its bad performance. Regularization could potentially help in those cases, but has to be studied further.

Banded approximations, especially the diagonal approximation, have been shown to work well in about half of the test cases. However, the diagonal approximation, as well as the other banded preconditioners, performs worse in the other half of the test set.

Finally, we want to mention that while the focus was on the application of metrology for integrated circuits (ICs), many of the proposed methods and generalizations are also viable choices in other applications.



Part V

A P P E N D I X


A INVERSION FORMULAS FOR KRONECKER PRODUCT APPROXIMATION

a.1 one term approximation

a.1.1 Sum Approximation

This section describes the elements of $T^{-1}_{(\text{block})}$ if, for a one-term approximation, the sum approximation described in Section 10.2.1.1 is used.

The last step for each element of $T^{-1}_{(\text{block})}$ is necessary if the MVP will be computed following the strategy of (10.6).



116 inversion formulas for kronecker product approximation

\begin{align*}
\bigl(T^{-1}_{(\text{block})}\bigr)_{1,1}
&= \bigl(W^{-1}\bigr)_{1,1}
 + \Bigl(\bigl(W^{-1}\bigr)_{1,1} T_{1,3} + \bigl(W^{-1}\bigr)_{1,2} T_{2,3}\Bigr) T_{3,3}^{-1}
   \Bigl(T_{3,1}\bigl(W^{-1}\bigr)_{1,1} + T_{3,2}\bigl(W^{-1}\bigr)_{2,1}\Bigr) \\
&= \bigl(W^{-1}\bigr)_{1,1}
 + \bigl(W^{-1}\bigr)_{1,1} T_{1,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,1}
 + \bigl(W^{-1}\bigr)_{1,1} T_{1,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,1} \\
&\quad + \bigl(W^{-1}\bigr)_{1,2} T_{2,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,1}
 + \bigl(W^{-1}\bigr)_{1,2} T_{2,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,1} \\
&= A_{1,1}^{-1} \otimes B_{1,1}^{-1}
 + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1}) \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (-A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (-A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (-A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) ,
\end{align*}



\begin{align*}
\bigl(T^{-1}_{(\text{block})}\bigr)_{1,2}
&= \bigl(W^{-1}\bigr)_{1,2}
 + \Bigl(\bigl(W^{-1}\bigr)_{1,1} T_{1,3} + \bigl(W^{-1}\bigr)_{1,2} T_{2,3}\Bigr) T_{3,3}^{-1}
   \Bigl(T_{3,1}\bigl(W^{-1}\bigr)_{1,2} + T_{3,2}\bigl(W^{-1}\bigr)_{2,2}\Bigr) \\
&= \bigl(W^{-1}\bigr)_{1,2}
 + \bigl(W^{-1}\bigr)_{1,1} T_{1,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,2}
 + \bigl(W^{-1}\bigr)_{1,1} T_{1,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,2} \\
&\quad + \bigl(W^{-1}\bigr)_{1,2} T_{2,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,2}
 + \bigl(W^{-1}\bigr)_{1,2} T_{2,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,2} \\
&= (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1}) \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1}) \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1}) \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1}) \\
&\qquad \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1}) \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1}) ,
\end{align*}

\begin{align*}
\bigl(T^{-1}_{(\text{block})}\bigr)_{1,3}
&= \Bigl(\bigl(W^{-1}\bigr)_{1,1} T_{1,3} + \bigl(W^{-1}\bigr)_{1,2} T_{2,3}\Bigr) T_{3,3}^{-1} \\
&= (A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1}) \otimes (B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1}) \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1}) \\
&\quad + (A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1}) \otimes (B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1}) ,
\end{align*}

\begin{align*}
\bigl(T^{-1}_{(\text{block})}\bigr)_{2,1}
&= \bigl(W^{-1}\bigr)_{2,1}
 + \Bigl(\bigl(W^{-1}\bigr)_{2,1} T_{1,3} + \bigl(W^{-1}\bigr)_{2,2} T_{2,3}\Bigr) T_{3,3}^{-1}
   \Bigl(T_{3,1}\bigl(W^{-1}\bigr)_{1,1} + T_{3,2}\bigl(W^{-1}\bigr)_{2,1}\Bigr) \\
&= \bigl(W^{-1}\bigr)_{2,1}
 + \bigl(W^{-1}\bigr)_{2,1} T_{1,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,1}
 + \bigl(W^{-1}\bigr)_{2,1} T_{1,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,1} \\
&\quad + \bigl(W^{-1}\bigr)_{2,2} T_{2,3} T_{3,3}^{-1} T_{3,1} \bigl(W^{-1}\bigr)_{1,1}
 + \bigl(W^{-1}\bigr)_{2,2} T_{2,3} T_{3,3}^{-1} T_{3,2} \bigl(W^{-1}\bigr)_{2,1} \\
&= (-A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \otimes (B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1}) \otimes (B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1}) \\
&\quad + (-A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1} A_{1,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1} B_{1,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1}) \otimes (B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1}) \\
&\quad + (A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,1} A_{1,1}^{-1} A_{1,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,1} B_{1,1}^{-1} B_{1,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) \\
&\quad + (-A_{2,2}^{-1} A_{2,3} A_{3,3}^{-1} A_{3,2} A_{2,2}^{-1} A_{2,1} A_{1,1}^{-1}) \\
&\qquad \otimes (B_{2,2}^{-1} B_{2,3} B_{3,3}^{-1} B_{3,2} B_{2,2}^{-1} B_{2,1} B_{1,1}^{-1}) ,
\end{align*}


\begin{align*}
(T^{-1}_{(\mathrm{block})})_{2,2}
&= (W^{-1})_{2,2} + \big((W^{-1})_{2,1}T_{1,3} + (W^{-1})_{2,2}T_{2,3}\big)T_{3,3}^{-1}\big(T_{3,1}(W^{-1})_{1,2} + T_{3,2}(W^{-1})_{2,2}\big) \\
&= (W^{-1})_{2,2} + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} \\
&\quad + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} \\
&= A_{2,2}^{-1} \otimes B_{2,2}^{-1} \\
&\quad + (-A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1}A_{1,3}A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1}) \\
&\qquad\quad \otimes (B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}B_{1,3}B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}) \\
&\quad + (-A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1}A_{1,3}A_{3,3}^{-1}A_{3,2}A_{2,2}^{-1})
        \otimes (B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}B_{1,3}B_{3,3}^{-1}B_{3,2}B_{2,2}^{-1}) \\
&\quad + (A_{2,2}^{-1}A_{2,3}A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1})
        \otimes (B_{2,2}^{-1}B_{2,3}B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}) \\
&\quad + (A_{2,2}^{-1}A_{2,3}A_{3,3}^{-1}A_{3,2}A_{2,2}^{-1})
        \otimes (B_{2,2}^{-1}B_{2,3}B_{3,3}^{-1}B_{3,2}B_{2,2}^{-1}),
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{2,3}
&= \big((W^{-1})_{2,1}T_{1,3} + (W^{-1})_{2,2}T_{2,3}\big)T_{3,3}^{-1} \\
&= (-A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1}A_{1,3}A_{3,3}^{-1}) \otimes (B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}B_{1,3}B_{3,3}^{-1}) \\
&\quad + (A_{2,2}^{-1}A_{2,3}A_{3,3}^{-1}) \otimes (B_{2,2}^{-1}B_{2,3}B_{3,3}^{-1}),
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{3,1}
&= -T_{3,3}^{-1}\big(T_{3,1}(W^{-1})_{1,1} + T_{3,2}(W^{-1})_{2,1}\big) \\
&= (-A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}) \otimes (B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}) \\
&\quad + (-A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1})
        \otimes (B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}) \\
&\quad + (A_{3,3}^{-1}A_{3,2}A_{2,2}^{-1}A_{2,1}A_{1,1}^{-1})
        \otimes (B_{3,3}^{-1}B_{3,2}B_{2,2}^{-1}B_{2,1}B_{1,1}^{-1}),
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{3,2}
&= -T_{3,3}^{-1}\big(T_{3,1}(W^{-1})_{1,2} + T_{3,2}(W^{-1})_{2,2}\big) \\
&= (-A_{3,3}^{-1}A_{3,1}A_{1,1}^{-1}A_{1,2}A_{2,2}^{-1})
        \otimes (B_{3,3}^{-1}B_{3,1}B_{1,1}^{-1}B_{1,2}B_{2,2}^{-1}) \\
&\quad + (-A_{3,3}^{-1}A_{3,2}A_{2,2}^{-1}) \otimes (B_{3,3}^{-1}B_{3,2}B_{2,2}^{-1}),
\end{align*}

\begin{equation*}
(T^{-1}_{(\mathrm{block})})_{3,3} = A_{3,3}^{-1} \otimes B_{3,3}^{-1}.
\end{equation*}
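Every entry above reduces to products of small factors through two standard identities: the mixed-product property $(A\otimes B)(C\otimes D)=(AC)\otimes(BD)$ and the inversion rule $(A\otimes B)^{-1}=A^{-1}\otimes B^{-1}$. As an illustrative numerical check (not part of the thesis; small random matrices stand in for the $A$- and $B$-factors), both can be verified with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((4, 4)) for _ in range(4))

# Mixed-product property: a product of Kronecker products is again a
# Kronecker product of the (small) factor products.
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)

# Inversion rule: the inverse of a Kronecker product is the Kronecker
# product of the inverses, which is why each entry of the approximate
# block inverse stays a sum of Kronecker products.
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
print("Kronecker identities verified")
```

Because of these identities, each block of the approximate inverse can be stored and applied as a pair of small factors instead of one large matrix.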


a.2 multiple terms approximation

a.2.1 Sum Approximation

This section lists the elements of $T^{-1}_{(\mathrm{block})}$ when the sum approximation described in Section 10.2.2.1 is used, where each BTTB matrix has been approximated via an approximate SVD.
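As a minimal illustration of the factor notation used below (a sketch only: a plain dense SVD of a random block stands in for the structured approximate SVD of a BTTB block), each block $T_{i,j}$ is written as $U S V^H$, so a diagonal block's inverse becomes $V S^{-1} U^H$:

```python
import numpy as np

rng = np.random.default_rng(1)
T11 = rng.standard_normal((40, 40))  # stand-in for one (dense) block

# SVD of the block: T11 = U @ diag(s) @ Vh.
U, s, Vh = np.linalg.svd(T11)

# Inverse from the factors, in the form V S^{-1} U^H used throughout
# this section (exact here because no singular values are dropped).
T11_inv = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T
assert np.allclose(T11_inv @ T11, np.eye(40))
```

When the SVD is truncated to rank $k$, the same expression yields a rank-$k$ approximation of the inverse rather than the exact inverse.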

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{1,1}
&= (W^{-1})_{1,1} + (W^{-1})_{1,1}T_{1,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,1} \\
&\quad + (W^{-1})_{1,1}T_{1,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,1} + (W^{-1})_{1,2}T_{2,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,1} \\
&\quad + (W^{-1})_{1,2}T_{2,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,1} \\
&= V_{11}S_{11}^{-1}U_{11}^{H}
 + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\qquad\quad \cdot U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\qquad\quad \cdot U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H} \\
&\qquad\quad \cdot V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}
          U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}
          V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\qquad\quad \cdot U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}
          V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H} \\
&\qquad\quad \cdot V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H} \\
&\qquad\quad \cdot V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H},
\end{align*}


\begin{align*}
(T^{-1}_{(\mathrm{block})})_{1,2}
&= (W^{-1})_{1,2} + (W^{-1})_{1,1}T_{1,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} \\
&\quad + (W^{-1})_{1,1}T_{1,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} + (W^{-1})_{1,2}T_{2,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} \\
&\quad + (W^{-1})_{1,2}T_{2,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} \\
&= V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}
          V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H} \\
&\qquad\quad \cdot V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H} \\
&\qquad\quad \cdot V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\qquad\quad \cdot U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}
          V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H},
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{1,3}
&= \big((W^{-1})_{1,1}T_{1,3} + (W^{-1})_{1,2}T_{2,3}\big)T_{3,3}^{-1} \\
&= V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}
          V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\quad + V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H},
\end{align*}


\begin{align*}
(T^{-1}_{(\mathrm{block})})_{2,1}
&= (W^{-1})_{2,1} + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,1} \\
&\quad + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,1} + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,1} \\
&\quad + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,1} \\
&= -V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}
          V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H} \\
&\qquad\quad \cdot V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\qquad\quad \cdot U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\qquad\quad \cdot U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}
          V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H},
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{2,2}
&= (W^{-1})_{2,2} + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} \\
&\quad + (W^{-1})_{2,1}T_{1,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,1}(W^{-1})_{1,2} \\
&\quad + (W^{-1})_{2,2}T_{2,3}T_{3,3}^{-1}T_{3,2}(W^{-1})_{2,2} \\
&= V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad - V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\qquad\quad \cdot U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad - V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}
          V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}
          V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H},
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{2,3}
&= \big((W^{-1})_{2,1}T_{1,3} + (W^{-1})_{2,2}T_{2,3}\big)T_{3,3}^{-1} \\
&= -V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{13}S_{13}V_{13}^{H}V_{33}S_{33}^{-1}U_{33}^{H} \\
&\quad + V_{22}S_{22}^{-1}U_{22}^{H}U_{23}S_{23}V_{23}^{H}V_{33}S_{33}^{-1}U_{33}^{H},
\end{align*}


\begin{align*}
(T^{-1}_{(\mathrm{block})})_{3,1}
&= -T_{3,3}^{-1}\big(T_{3,1}(W^{-1})_{1,1} + T_{3,2}(W^{-1})_{2,1}\big) \\
&= -V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad - V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}
          V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H} \\
&\quad + V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H}U_{21}S_{21}V_{21}^{H}V_{11}S_{11}^{-1}U_{11}^{H},
\end{align*}

\begin{align*}
(T^{-1}_{(\mathrm{block})})_{3,2}
&= -T_{3,3}^{-1}\big(T_{3,1}(W^{-1})_{1,2} + T_{3,2}(W^{-1})_{2,2}\big) \\
&= -V_{33}S_{33}^{-1}U_{33}^{H}U_{31}S_{31}V_{31}^{H}V_{11}S_{11}^{-1}U_{11}^{H}U_{12}S_{12}V_{12}^{H}V_{22}S_{22}^{-1}U_{22}^{H} \\
&\quad - V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H},
\end{align*}

\begin{equation*}
(T^{-1}_{(\mathrm{block})})_{3,3} = V_{33}S_{33}^{-1}U_{33}^{H}.
\end{equation*}
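For preconditioning, the point of these factored expressions is that they are applied to a vector, never formed explicitly. The sketch below is illustrative only (random rank-$k$ matrices stand in for the SVD factors, and real data is used, so $(\cdot)^H$ becomes $(\cdot)^T$): it applies the second term of $(T^{-1}_{(\mathrm{block})})_{3,2}$, namely $-V_{33}S_{33}^{-1}U_{33}^{H}U_{32}S_{32}V_{32}^{H}V_{22}S_{22}^{-1}U_{22}^{H}$, as a chain of thin matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 10  # block size and assumed truncation rank

def rank_k_factors(seed):
    """Hypothetical rank-k factors (U, s, V) of one block, for illustration."""
    r = np.random.default_rng(seed)
    return (r.standard_normal((n, k)), r.uniform(1.0, 2.0, k),
            r.standard_normal((n, k)))

U33, s33, V33 = rank_k_factors(0)
U32, s32, V32 = rank_k_factors(1)
U22, s22, V22 = rank_k_factors(2)

x = rng.standard_normal(n)

# Apply the term right to left, one thin factor at a time:
t = V22 @ ((U22.T @ x) / s22)     # V22 S22^{-1} U22^H x
t = U32 @ (s32 * (V32.T @ t))     # U32 S32 V32^H ...
y = -(V33 @ ((U33.T @ t) / s33))  # -V33 S33^{-1} U33^H ...

# The same term formed densely, for comparison only (O(n^2) storage):
dense = -(V33 / s33) @ U33.T @ (U32 * s32) @ V32.T @ (V22 / s22) @ U22.T
assert np.allclose(y, dense @ x)
```

With rank-$k$ factors each step costs $O(nk)$, so applying the whole approximate inverse stays far below the $O(n^2)$ cost of dense matrix-vector products.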




D E C L A R A T I O N

I, Frank Schneider, declare that this thesis titled, "Approximation of Inverses of BTTB Matrices", and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at the named Universities.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at these Universities or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

Eindhoven, December 2016

Frank Schneider