Perturbation Analysis of Some
Matrix Factorizations
by
Xiao-Wen Chang
School of Computer Science
McGill University
Montréal, Québec
Canada
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH
OF MCGILL UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY
Copyright © 1997 by Xiao-Wen Chang
The author has granted a non- exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
Matrix factorizations are among the most important and basic tools in numerical
linear algebra. Perturbation analyses of matrix factorizations are not only important
in their own right, but also useful in many applications, e.g. in estimation, control
and statistics. The aim of such analyses is to show what effects changes in the data
will have on the factors. This thesis is concerned with developing new general purpose
perturbation analyses, and applying them to the Cholesky, QR and LU factorizations,
and the Cholesky downdating problem.
We develop a new approach, the so-called 'matrix-vector equation' approach,
to obtain sharp results and true condition numbers for the above problems. Our
perturbation bounds give significant improvements on previous results, and could
not be sharper. We also use the so-called 'matrix equation' approach, originated by
G. W. Stewart to derive perturbation bounds that are usually weaker but easier to
interpret. This approach allows efficient computation of satisfactory estimates for
the true condition numbers derived by our approach. The combination of these two
approaches gives a powerful understanding of these problems. Although first-order
perturbation bounds are satisfactory for all but the most delicate work, we also give
some rigorous perturbation bounds for some factorizations.
We show that the condition of many such factorizations is significantly improved
by the standard pivoting strategies (except the L factor in the LU factorization), and
provide firmly based theoretical explanations as to why this is so. This extremely
important information is very useful for designing more reliable matrix algorithms.
Our approach is a powerful general tool, and appears to be applicable to the
perturbation analysis of any matrix factorization.
Résumé
Les factorisations de matrices sont parmi les outils les plus importants et les plus
fondamentaux de l'algèbre linéaire numérique. Les analyses de perturbation
des factorisations de matrices sont non seulement importantes en elles-mêmes, mais
elles sont aussi utiles dans maintes applications, par exemple dans les domaines de
l'estimation, du contrôle et des statistiques. Ces analyses ont pour but de démontrer
quels effets les changements dans les données produiront sur les facteurs. Cette thèse
s'intéresse au développement de nouvelles analyses générales des perturbations,
et à leur application aux factorisations de Cholesky, QR et LU, et au problème de
modification de factorisation de Cholesky.
Nous développons une nouvelle approche que nous nommons l'approche équation
matrice-vecteur, afin d'obtenir des résultats précis et des vrais nombres de condition-
nement pour les problèmes mentionnés ci-dessus. Nos bornes sur les perturbations
apportent des améliorations significatives aux résultats antérieurs et ne pourraient
être plus précises. De plus, nous utilisons l'approche équation matrice développée par
G. W. Stewart pour dériver des bornes sur les perturbations qui sont généralement
plus faibles mais plus faciles à interpréter. Cette approche permet des calculs effi-
caces d'estimés satisfaisants pour les vrais nombres de conditionnement qui dérivent
de notre approche. La combinaison de ces deux approches donne une compréhension
profonde de ces problèmes. Bien que des bornes sur les perturbations de premier
ordre soient satisfaisantes pour tout travail, sauf pour le plus délicat, nous donnons
également des bornes rigoureuses sur les perturbations pour certaines factorisations.
Nous démontrons que la condition de plusieurs de ces factorisations est améliorée
de façon significative par les stratégies usuelles de pivotage (sauf en ce qui con-
cerne le facteur L dans la factorisation LU), et nous fournissons des explications
théoriques solidement fondées pour démontrer pourquoi il en est ainsi. Cette infor-
mation extrêmement importante est des plus utiles pour construire des algorithmes
plus sûrs pour les matrices.
Notre approche est un outil général puissant, et semble pouvoir s'appliquer aux
analyses de perturbation de n'importe quelle factorisation de matrice.
Acknowledgements
I would like to extend my sincere thanks and deep gratitude to my supervisor, Profes-
sor Chris Paige, for his invaluable guidance, support, encouragement and care through
the years of my Ph.D. studies. He displayed extreme generosity with his ideas and time,
and generated a continuous enthusiasm for this work. He made all his research facil-
ities available to me. My association with him is, for me, a rare privilege. I hope he
will find this thesis to be a small reward for the many days of inspired collaboration
as well as for his efforts to ensure that I received the best possible training. I also wish
to thank Dr. Françoise Paige for translating the abstract of this thesis into French.
I thank Professor G. W. Stewart for his insightful ideas, which provided one of
the approaches used in this thesis. My thanks to Professor Ji-guang Sun for his
suggestions and encouragement, and for providing me with draft versions of his work.
Their pioneering work was a major inspiration for this work.
I am thankful to Professor Jim Varah, external reviewer of this thesis, for his care-
ful reading. This thesis has been improved by his helpful comments and suggestions.
I am indebted to Professors Gene Golub, Nick Higham, and George Styan for their
interest and encouragement in this work.
I am grateful to Professor Jia-Song Wang, my Master's thesis supervisor, for
introducing me to numerical analysis and for his continuous encouragement.
The School of Computer Science at McGill University provided me with a com-
fortable environment for my thesis work. Many people, including my fellow graduate
students, gave me various help. My heartfelt thanks go to Penny Anderson, Franca
Cianci, Naixun Pei, Josée Turgeon, Josie Vallelonga, Xiaoyan Zhan and Binghai Zhu.
My deepest gratitude goes to my parents, brothers and sister. Their love and care
have always been a constant source of my enthusiasm to complete this work. I deeply
regret that my father, who died during my Ph.D. studies, cannot see the completion
of this thesis. Also my thanks to my uncle and aunt for their strong support.
Finally, I would like to give special thanks to my wife, Su, whose love, patience
and understanding made it possible for me to finally finish my work on this thesis.
To my wife and my parents
Contents
List of Tables x
1 Introduction and Preliminaries 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Notation and basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 The Cholesky Factorization
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Perturbation analysis with norm-bounded changes in A . . . . . . . .
2.2.1 First-order perturbation bounds . . . . . . . . . . . . . . . . .
2.2.2 Rigorous perturbation bounds . . . . . . . . . . . . . . . . . .
2.2.3 Numerical experiments . . . . . . . . . . . . . . . . . . . . . .
2.3 Perturbation analysis with component-bounded changes in A . . . . .
2.3.1 First-order perturbation bound with |ΔA| ≤ ε|A| . . . . . . .
2.3.2 First-order perturbation bounds with backward rounding errors
2.3.3 Rigorous perturbation bound with backward rounding errors .
2.3.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . .
2.4 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . .
3 The QR factorization 60
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Rate of change of Q and R. and previous results . . . . . . . . . . . . 62
3.3 Refined analysis for Q . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Perturbation analyses for R . . . . . . . . . . . . . . . . . . . . . . . 68
3.4.1 Matrix-vector equation analysis for R . . . . . . . . . . . . . . 68
3.4.2 Matrix equation analysis for R . . . . . . . . . . . . . . . . . . 72
3.5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.6 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 80
4 The LU factorization 83
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Rate of change of L and U . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 New perturbation results . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1 Matrix-vector equation analysis . . . . . . . . . . . . . . . . . 86
4.3.2 Matrix equation analysis . . . . . . . . . . . . . . . . . . . . . 90
4.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 100
5 The Cholesky downdating problem 101
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 Basics. previous results. and an improvement . . . . . . . . . . . . . . 103
5.3 New perturbation results . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.1 Matrix-vector equation analysis . . . . . . . . . . . . . . . . . 108
5.3.2 Matrix equation analysis . . . . . . . . . . . . . . . . . . . . . 114
5.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 124
6 Conclusions and future research 126
List of Tables
2.2.1 Results for matrix A = QΛQ^T of order 25, Â = PAP^T, D = D_r . . . 34
2.2.2 Results for Pascal matrices, Â = PAP^T, D = D_r . . . . . . . . . . . 34
2.2.3 Results for A = K_n^T(θ)K_n(θ), θ = π/4, Â = PAP^T, D = D_r . . . 35
2.3.1 Results for Pascal matrices without pivoting . . . . . . . . . . . . . . 56
2.3.2 Results for Pascal matrices with pivoting, Â = PAP^T . . . . . . . . 57
3.5.1 Results for Pascal matrices without pivoting, A = QR . . . . . . . . 78
3.5.2 Results for Pascal matrices with pivoting, AP = QR . . . . . . . . . 79
3.5.3 Results for 10 × 8 matrices A_j, j = 1:8, without pivoting . . . . . . 79
3.5.4 Results for Kahan matrices, θ = π/8, AΠ = QR . . . . . . . . . . . . 80
4.4.1 Results without pivoting, A = LU . . . . . . . . . . . . . . . . . . . . 98
4.4.2 Results with partial pivoting, PA = L̃Ũ . . . . . . . . . . . . . . . . 98
4.4.3 Results with complete pivoting, P1AP2 = L̃Ũ . . . . . . . . . . . . . 99
5.4.1 Results for the example in Sun's paper . . . . . . . . . . . . . . . . . 123
Chapter 1
Introduction and Preliminaries
1.1 Introduction
This thesis is concerned with the perturbation analysis of the Cholesky, QR, and LU
factorizations, and of the Cholesky downdating problem. These matrix factorizations
are among the most fundamental and important tools in numerical linear algebra
(see for example Golub and Van Loan [26, 1996]). The goal of such an analysis is
to determine bounds for the changes in the factors of a matrix when the matrix is
perturbed.
We first give some motivation for our concerns. Suppose A is a given matrix, and
has a factorization
A = BC, (1.1.1)
where B and C are the factors of A. As in any topic of matrix perturbation theory,
there are three main considerations in perturbation theory for matrix factorizations.
First, the elements of A may be determined from physical measurement, and therefore
be subject to errors of observation. The true matrix is A + ΔA, where ΔA is the
observation error. Suppose the same factorization for A + ΔA is
A + ΔA = (B + ΔB)(C + ΔC).     (1.1.2)
CHAPTER 1. INTRODUCTION AND PRELIMINARIES
We are thus led immediately to the consideration of the perturbations ΔB and ΔC.
Second, even if the elements of A can be defined exactly by mathematical formulae,
usually these cannot be represented exactly by a digital computer due to
its finite precision. The matrix stored in a computer is A + ΔA, with |ΔA| ≤ u|A|,
where u is the unit roundoff. So we are faced with much the same problem as before.
Finally, backward rounding error analysis throws back errors made in executing an
algorithm on the original data, see Wilkinson [52, 1963]. Suppose for a stored matrix
A, a backward rounding error analysis shows the computed factors B̃ and C̃ of A are
the exact factors of A + ΔA, i.e.,
A + ΔA = B̃C̃,
where a bound on ||ΔA|| (or |ΔA|) is known; then perturbation theory is used to
assess the effects of these backward errors on the accuracy of the computed factors,
i.e., to give bounds on ||B̃ − B|| and ||C̃ − C|| (or |B̃ − B| and |C̃ − C|).
Although in solving linear equations the sensitivity of factors may not be of cen-
tral interest, it is important when the factors have significance. For example, in the
estimation problem with m × n A of full column rank and m-dimensional y given
(where E(·) indicates the expected value),
y = Ax + v,   E(v) = 0,   E(vv^T) = σ²I,
if we obtain the QR factorization A = QR, then solving Rx̂ = Q^T y gives the best
linear unbiased estimate (BLUE) x̂ of x, and
E((x̂ − x)(x̂ − x)^T) = σ²(A^T A)^{-1} = σ²R^{-1}R^{-T},
so R is the factor of what has been called in the engineering literature the 'information
matrix'. This is important in its own right (see for example Paige [34, 1985]), and we
are interested in how changes in A affect R.
In general we regularly use the fact that the columns of Q in the QR factorization
of A form an orthonormal basis for R(A), and we are concerned with how changes in
A affect Q.
In some statistical applications, if certain matrices A and B have QR factorizations
A = Q_A R_A,   B = Q_B R_B,
then the singular values of Q_A^T Q_B give what are called the 'canonical correlations'
(more generally these give the angles between the subspaces R(A) and R(B)), see for
example Björck and Golub [4, 1973]. Thus the sensitivity of Q in the QR factorization
can be used directly to answer the following important problem: "How do changes
in A and B affect R(A) and R(B) and the angles between these (or the canonical
correlations)?"
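A minimal NumPy sketch of this computation, with randomly generated A and B purely for illustration: the canonical correlations are the singular values of Q_A^T Q_B, and the principal angles are their arccosines.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
A = rng.standard_normal((m, 4))
B = rng.standard_normal((m, 3))

# Orthonormal bases for R(A) and R(B) via thin QR factorizations.
QA, _ = np.linalg.qr(A)
QB, _ = np.linalg.qr(B)

# Singular values of QA^T QB are the canonical correlations,
# i.e. the cosines of the principal angles between R(A) and R(B).
correlations = np.linalg.svd(QA.T @ QB, compute_uv=False)
angles = np.arccos(np.clip(correlations, -1.0, 1.0))
```

There are min(4, 3) = 3 correlations here, each in [0, 1]; two random subspaces of R^100 typically meet at angles well away from zero.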
We thus see the area is an interesting and useful one to study in general, and it
has been an active area of research in recent years. Most of the existing results
have been incorporated in Higham [30, 1996].
Realizing that most of the published results on the sensitivity of factorizations, such as
LU, Cholesky, and QR, were extremely weak for certain classes of matrices, Chang,
under the supervision of Chris Paige (see the commentary in Chang, Paige and Stew-
art [14, 1996]), originated an approach to obtaining provably sharp results and corre-
sponding condition numbers for the Cholesky factorization. He also realized that the
condition of the problem was significantly improved by pivoting, and provided the
first firmly based theoretical explanations as to why this was so. Even though the
original work was only about the Cholesky factorization, the approach is a general
approach, and thus can be applied to almost all well-known matrix factorizations.
From (1.1.1) and (1.1.2) we have, by dropping the second-order term,
ΔA = BΔC + ΔBC.     (1.1.3)
The basic idea of this approach is to write the approximate matrix equation (1.1.3)
as a matrix-vector equation by using the special structure and properties of B and
C, then obtain vector-type expressions for ΔB and ΔC. So we will call this the
'matrix-vector equation' approach.
Stewart [44, 1995] was stimulated by Chang's work on the Cholesky factorization
to understand this more deeply, and present simple explanations for what was going
on. Before Chang's work, the most used approach to perturbation analyses of factor-
izations was what we will call the 'matrix equation' approach, which keeps equations
like (1.1.3) in their matrix-matrix form. Stewart [44] (also see Chang, Paige and
Stewart [13, 1996]) used an elegant construct, partly illustrated by the 'up' and 'low'
notation in Section 1.2, which makes the matrix equation approach a far more usable
and intuitive tool. He combined this with deep insights on scaling to produce the new
matrix equation analysis which is appealingly clear, and provides excellent insights
into the sensitivities of the LU and Cholesky factorizations. This new matrix equa-
tion analysis does not in general provide tight results like the matrix-vector equation
analyses do, but its bounds are usually simpler, and provide practical estimates for the
true condition numbers obtained from the latter. This approach is also fairly general,
but for each factorization a particular treatment is needed. This is different from the
matrix-vector equation approach, which can be applied to any factorization directly
without any difficulty.
We combined these two approaches to give a deep understanding of the sensitivity
of the Cholesky factorization, see Chang, Paige and Stewart [13, 1996]. We also
applied the two approaches to the QR factorization and the Cholesky downdating
problem, see Chang, Paige and Stewart [14, 1996], and Chang and Paige [10, 1996].
The interplay of the two approaches goes through the whole thesis.
The main purpose of this thesis is to establish first-order perturbation bounds
that are as tight as possible for the factorizations mentioned above, present the cor-
responding condition numbers, give some condition estimators, and shed light on the
effect of the standard pivoting on the conditioning. Although first-order perturbation
bounds are satisfactory for all but the most delicate work, we also give some rigorous
perturbation bounds for some factorizations. Some results in this thesis have been
presented in the papers mentioned above. Some other new results here have not yet
been published.
1.2 Notation and basics
First we describe some mostly standard notation, and define some elementary con-
cepts used throughout this thesis.
• R^{m×n} denotes the vector space of all m × n real matrices, and R^n = R^{n×1}.
• A matrix is always denoted by a capital letter, e.g. A. The corresponding
lowercase letter with subscripts j and ij refers to the jth column and the
(i, j)th entry respectively, e.g. a_j, a_ij. Also the notation (A)_ij designates the
(i, j)th entry. A(i, :) denotes the ith row of A and A(:, j) the jth column.
• A vector is represented by a lowercase letter, e.g. b. The individual components
are denoted with single subscripts, e.g. b_i.
• R(A) denotes the space spanned by the columns of A.
• λ(A) denotes an eigenvalue of a matrix A; ρ(A) denotes the spectral radius of
A, i.e. ρ(A) = max |λ(A)|.
• σ(A) denotes a nonzero singular value of a matrix A; σ_max(A) and σ_min(A)
denote the largest and smallest nonzero singular values of A, respectively.
• Let A = (a_ij) be an m × n matrix; then |A| is defined by |A| = (|a_ij|).
• Let t be a scalar and let A(t) = (a_ij(t)) be an m × n matrix. If a_ij(t) is a
differentiable function of t for all i and j, then we say A(t) is differentiable with
respect to t and define Ȧ(t) ≡ dA(t)/dt = (da_ij(t)/dt).
• ||·|| denotes a vector norm or matrix norm.
• The 1-norm, 2-norm (or Euclidean norm), and ∞-norm of an n-dimensional vector
x are defined respectively by
||x||_1 = Σ_{i=1}^n |x_i|,   ||x||_2 = (Σ_{i=1}^n x_i²)^{1/2},   ||x||_∞ = max_{1≤i≤n} |x_i|.
• The S-norm (S for summation), F-norm (or Frobenius norm), and M-norm (M
for maximum) of an m × n matrix A are defined respectively by
||A||_S = Σ_{i,j} |a_ij|,   ||A||_F = (Σ_{i,j} a_ij²)^{1/2},   ||A||_M = max_{i,j} |a_ij|.
• The 1-norm, 2-norm (or spectral norm), and ∞-norm of an m × n matrix A
are given respectively by
||A||_1 = max_j Σ_i |a_ij|,   ||A||_2 = σ_max(A),   ||A||_∞ = max_i Σ_j |a_ij|.
• κ_ν(A) = ||A†||_ν ||A||_ν denotes the standard condition number of matrix A, and
cond_ν(A) = || |A†||A| ||_ν the Bauer–Skeel condition number of matrix A, when
the norm ||·||_ν is used, where A† is the Moore–Penrose generalized inverse of A. If
A is nonsingular, then κ_ν(A) = ||A^{-1}||_ν ||A||_ν and cond_ν(A) = || |A^{-1}||A| ||_ν.
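Both condition numbers are easy to compute for small matrices. The following NumPy sketch (the helper names and the 2 × 2 example are ours, not from the thesis) also illustrates the well-known fact that the Bauer–Skeel number is invariant under row scaling A → DA for positive diagonal D, while the standard condition number is not.

```python
import numpy as np

def kappa(A, p=2):
    """Standard condition number kappa_p(A) = ||A^{-1}||_p ||A||_p (A nonsingular)."""
    return np.linalg.norm(np.linalg.inv(A), p) * np.linalg.norm(A, p)

def bauer_skeel(A, p=np.inf):
    """Bauer-Skeel condition number cond_p(A) = || |A^{-1}| |A| ||_p."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), p)

# Row scaling A -> D A leaves cond unchanged but can ruin kappa:
A = np.array([[1.0, 1.0], [1.0, -1.0]])
D = np.diag([1.0, 1e-6])
```

Since |(DA)^{-1}| |DA| = |A^{-1}| |D^{-1}| |D| |A| = |A^{-1}| |A|, the Bauer–Skeel number ignores the bad row scaling that inflates κ.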
• Let C = [c_1, c_2, . . . , c_n] be an m × n matrix; then vec(C) is defined by
vec(C) = [c_1^T, c_2^T, . . . , c_n^T]^T.
Now we describe some special notation used throughout this thesis.
• D_n always denotes the set of all n × n real positive definite diagonal matrices.
• Let X = (x_ij) be an n × n matrix. The upper triangular part, strictly lower
triangular part and strictly upper triangular part of X are denoted by ut(X),
slt(X) and sut(X) respectively, and the diagonal of X is denoted by
diag(X) ≡ diag(x_11, . . . , x_nn).
• For any n × n matrix X = (x_ij), we define the upper and lower triangular
matrices
up(X) ≡ sut(X) + (1/2)diag(X),   low(X) ≡ slt(X) + (1/2)diag(X).   (1.2.3)
• For any n × n matrix C = [c_1, . . . , c_n], denote by c_j^(i) the vector of the first i
elements of c_j, and by c_j_(i) the vector of the last i elements of c_j. With these,
we define ('u' denotes 'upper', 'sl' denotes 'strictly lower')
uvec(C) ≡ [c_1^(1)T, c_2^(2)T, . . . , c_n^(n)T]^T,   slvec(C) ≡ [c_1_(n−1)^T, c_2_(n−2)^T, . . . , c_{n−1}_(1)^T]^T.   (1.2.4)
These are the vectors formed by stacking the columns of the upper triangular
part of C into one long vector, and by stacking the columns of the strictly lower
triangular part of C into one long vector, respectively.
The 'low' and 'up' notation has the following basic properties. For general X ∈ R^{n×n},
low(X) + up(X) = X,   ||up(X)||_F ≤ ||X||_F.
For symmetric X ∈ R^{n×n},
up(X) + up(X)^T = X,   ||up(X)||_F ≤ (1/√2)||X||_F.   (1.2.7)
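The 'up', 'low' and 'uvec' operations, and the properties just listed, take only a few lines of NumPy; the function names below are ours, chosen to match the notation (a sketch, not code from the thesis).

```python
import numpy as np

def up(X):
    """Strictly upper triangular part of X plus half of its diagonal."""
    return np.triu(X, 1) + 0.5 * np.diag(np.diag(X))

def low(X):
    """Strictly lower triangular part of X plus half of its diagonal."""
    return np.tril(X, -1) + 0.5 * np.diag(np.diag(X))

def uvec(X):
    """Stack the columns of the upper triangular part of X (length n(n+1)/2)."""
    n = X.shape[0]
    return np.concatenate([X[: j + 1, j] for j in range(n)])

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))
S = X + X.T                                   # a symmetric matrix

assert np.allclose(up(X) + low(X), X)         # low(X) + up(X) = X
assert np.allclose(up(S) + up(S).T, S)        # symmetric case
assert np.linalg.norm(up(S)) <= np.linalg.norm(S) / np.sqrt(2) + 1e-12
```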
The following well-known theorem obtained by van der Sluis [51, 1969] will often
be referred to when we discuss the effect of scaling on the condition estimators in this
thesis.
Theorem 1.2.1 If S ∈ R^{n×n} is symmetric positive definite and D_s = diag(S)^{1/2},
then
κ₂(D_s^{-1}SD_s^{-1}) ≤ n inf_{D∈D_n} κ₂(D^{-1}SD^{-1}).   □
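A quick numerical illustration of this theorem, comparing the diagonally scaled κ₂ with random positive diagonal scalings (a sketch with invented data; the random search only samples the infimum from above, so the inequality must hold against it a fortiori).

```python
import numpy as np

def kappa2(M):
    # 2-norm condition number via singular values.
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)                   # symmetric positive definite

Ds_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
scaled = Ds_inv @ S @ Ds_inv                  # D_s^{-1} S D_s^{-1}, unit diagonal

# Sample random positive diagonal scalings D^{-1} S D^{-1}.
best = min(kappa2(np.diag(1.0 / d) @ S @ np.diag(1.0 / d))
           for d in np.exp(rng.standard_normal((200, n))))

assert kappa2(scaled) <= n * best * (1 + 1e-9)
```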
We will often use the following results when discussing the effect of standard
pivoting on the condition numbers.
Theorem 1.2.2 Let T ∈ R^{n×n} be a nonsingular upper triangular matrix satisfying
|t_ii| ≥ |t_ij| for all j > i.   (1.2.17)
Then with D ≡ diag(T) and T̃ ≡ D^{-1}T,
|(T̃^{-1})_ij| ≤ 2^{j−i−1} for all j > i,   (1.2.18)
||T̃^{-1}||_M ≤ 2^{n−2},   (1.2.19)
||T̃^{-1}||_1 ≤ 2^{n−1},   (1.2.20)
||T̃^{-1}||_∞ ≤ 2^{n−1}.   (1.2.21)
Proof. Let T̃ = D^{-1}T. Then we have 1 = |t̃_ii| ≥ |t̃_ij|, and it is easy to show by
induction on j − i that |(T̃^{-1})_ij| ≤ 2^{j−i−1} for j > i, which is (1.2.18). The bounds
(1.2.19), (1.2.20) and (1.2.21) follow by taking the corresponding norms of these
elementwise bounds. All of the inequalities in (1.2.18), (1.2.19), (1.2.20) and (1.2.21)
become equalities with
T̃: t̃_ii = 1, t̃_ij = −1 for j > i,   (1.2.22)
since if T has the form of (1.2.22), then we easily verify that (T̃^{-1})_ij = 2^{j−i−1}
for j > i, i.e.
T̃^{-1} = [1 1 2 4 . . . 2^{n−2};
          0 1 1 2 . . . 2^{n−3};
          . . .
          0 . . . 0 1 1;
          0 . . . 0 0 1]. □
Chapter 2
The Cholesky Factorization
2.1 Introduction
The Cholesky factorization is a fundamental tool in matrix computations: given an
n × n real symmetric positive definite matrix A, there exists a unique upper triangular
matrix R with positive diagonal entries such that
A = R^T R.
R is called the Cholesky factor.
There are different algorithmic forms of the Cholesky factorization. The following
algorithm is the 'bordered' form.
Algorithm CHOL: Given a symmetric positive definite A ∈ R^{n×n}, this algorithm
computes the Cholesky factorization A = R^T R.
for j = 1:n
    for i = 1:j−1
        r_ij = (a_ij − Σ_{k=1}^{i−1} r_ki r_kj)/r_ii
    end
    r_jj = (a_jj − Σ_{k=1}^{j−1} r_kj²)^{1/2}
end
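A direct NumPy transcription of Algorithm CHOL might look as follows (a sketch; the function name is ours):

```python
import numpy as np

def chol_bordered(A):
    """Compute the upper triangular R with positive diagonal and A = R^T R,
    column by column as in Algorithm CHOL ('bordered' form)."""
    n = A.shape[0]
    R = np.zeros((n, n))
    for j in range(n):
        for i in range(j):
            # r_ij = (a_ij - sum_{k<i} r_ki r_kj) / r_ii
            R[i, j] = (A[i, j] - R[:i, i] @ R[:i, j]) / R[i, i]
        # r_jj = (a_jj - sum_{k<j} r_kj^2)^{1/2}
        R[j, j] = np.sqrt(A[j, j] - R[:j, j] @ R[:j, j])
    return R
```

In practice one would call a library routine (e.g. `np.linalg.cholesky`, which returns the lower triangular factor L = R^T); the loop form above mirrors the algorithm as stated.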
CHAPTER 2. THE CHOLESKY FACTORIZATION
Let ΔA ∈ R^{n×n} be a symmetric matrix such that A + ΔA is still symmetric
positive definite; then A + ΔA has the unique Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR).
The goal of the perturbation analysis for the Cholesky factorization is to determine
a bound on ||ΔR|| (or |ΔR|) in terms of (a bound on) ||ΔA|| (or |ΔA|).
The rest of this chapter is organized as follows. In Section 2.2 we consider the case
where only a bound on ||ΔA|| is known. We refer to this as 'norm-bounded changes in
A'. First-order and rigorous tight bounds are presented by the so-called matrix-vector
equation approach, and somewhat weaker but more insightful and computationally
applicable bounds are also given by the so-called matrix equation approach. In
Section 2.3 we make a similar analysis to that in Section 2.2 for the case where a
bound on |ΔA| is known. We refer to this as 'component-bounded changes in A'. In
both of these sections, we derive useful upper bounds on the condition of the problem
when we use pivoting, and give numerical results to confirm our theoretical analyses.
Finally we summarize our findings and point out future work in Section 2.4.
This Cholesky analysis (and particularly Section 2.2.1) may also be taken as
an introduction to the general approach to perturbation analysis of factorizations
proposed by this thesis.
2.2 Perturbation analysis with norm-bounded
changes in A
There have been several papers dealing with the perturbation analysis for the Cholesky
factorization with norm-bounded changes in A. The first result was that of Stew-
art [39, 1977]. It was further modified and improved by Sun [46, 1991], who in-
cluded a first-order perturbation result. Using a different approach, Stewart [41,
1993] obtained the same first-order perturbation result. Recently Drmač, Omladič
and Veselić [20, 1994] presented perturbation results of a different flavor. They
made a perturbation analysis for the Cholesky factorization of H = D_s^{-1}AD_s^{-1} with
D_s = diag(a_ii^{1/2}), instead of A. The advantage of their approach is that a good bound
can be obtained when ΔA corresponds to backward rounding errors. So their result
will be referred to in the next section.
The main goal of this section is to establish new first-order bounds on the norm
of the perturbation in the Cholesky factor, smaller than those of Sun [46, 1991] and
Stewart [41, 1993], and to present a condition number which more closely reflects the
true sensitivity of the problem. Also, we give rigorous perturbation bounds. Many
of the results have been presented in Chang, Paige and Stewart [13, 1996].
2.2.1 First-order perturbation bounds
We first obtain an equation and an expression for Ṙ(0) in the Cholesky factorization
A + tG = R^T(t)R(t); then we use these to obtain our new first-order perturbation
bounds by the matrix-vector equation approach and the matrix equation approach.
The first approach will provide a sharp bound, resulting in the condition number
for the Cholesky factorization with norm-bounded changes in A, while the second
approach provides results that are usually weaker but easier to interpret, and allows
efficient computation of satisfactory estimates for the actual condition number.
Rate of change of R
Here we derive the basic results on how R changes as A changes, from which we
then derive Sun's [46, 1991] results. The following theorem summarizes the results we use later.
Theorem 2.2.1 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = R^T R, let G ∈ R^{n×n} be symmetric, and let ΔA = εG, for some
ε ≥ 0. If
ρ(A^{-1}ΔA) < 1,   (2.2.1)
then A + ΔA has the Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR),   (2.2.2)
where
ΔR = εṘ(0) + O(ε²),   (2.2.3)
and Ṙ(0) is defined by the unique Cholesky factorization
A + tG = R^T(t)R(t),   |t| ≤ ε,   (2.2.4)
and so satisfies the equations
R^TṘ(0) + Ṙ(0)^TR = G,   (2.2.5)
Ṙ(0) = up(R^{-T}GR^{-1})R,   (2.2.6)
where the 'up' notation is defined by (1.2.3).
Proof. If (2.2.1) holds, then for all |t| ≤ ε the spectral radius of tR^{-T}GR^{-1} satisfies
ρ(tR^{-T}GR^{-1}) = |t|ρ(A^{-1}G) ≤ ερ(A^{-1}G) = ρ(A^{-1}ΔA) < 1.
Therefore for all |t| ≤ ε, A + tG = R^T(I + tR^{-T}GR^{-1})R is symmetric positive definite
and so has the unique Cholesky factorization (2.2.4). Notice that R(0) = R and
R(ε) = R + ΔR, so (2.2.2) holds.
It is easy to verify that R(t) is twice continuously differentiable for |t| ≤ ε from
the algorithm for the Cholesky factorization. If we differentiate (2.2.4) and set t = 0
in the result, we obtain (2.2.5), which we will see is a linear equation uniquely defining
the elements of upper triangular Ṙ(0) in terms of the elements of G. From upper
triangular Ṙ(0)R^{-1} in
(Ṙ(0)R^{-1})^T + Ṙ(0)R^{-1} = R^{-T}GR^{-1},
we see with the 'up' notation in (1.2.3) that (2.2.6) holds. Finally the Taylor expansion
for R(t) about t = 0 gives (2.2.3) at t = ε. □
Using Theorem 2.2.1 we can now easily obtain the first-order perturbation bound
due to Sun [46, 1991], also proved by Stewart [41, 1993] by a different approach.
Theorem 2.2.2 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = R^T R, and let ΔA be a real symmetric n × n matrix satisfying
||ΔA||_F ≤ ε||A||_2. If
κ₂(A)ε < 1,   (2.2.7)
then A + ΔA has the Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR),
where
||ΔR||_F/||R||_2 ≤ (1/√2)κ₂(A)ε + O(ε²).   (2.2.8)
Proof. Let G = ΔA/ε (if ε = 0, the theorem is trivial). Then ||G||_F ≤ ||A||_2. Since
ρ(A^{-1}ΔA) ≤ ||A^{-1}ΔA||_2 ≤ ||A^{-1}||_2||ΔA||_2 ≤ κ₂(A)ε,
the assumption (2.2.7) implies that (2.2.1) holds, so the conclusion of Theorem 2.2.1
holds here. By using the fact that ||up(X)||_F ≤ (1/√2)||X||_F for any symmetric X (see
(1.2.7)), we have from (2.2.6) that
||Ṙ(0)||_F ≤ (1/√2)||R^{-T}GR^{-1}||_F||R||_2 ≤ (1/√2)||A^{-1}||_2||A||_2||R||_2.   (2.2.11)
Then (2.2.8) follows immediately from the Taylor expansion (2.2.3). □
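The first-order bound (2.2.8) is easy to check numerically for a random symmetric perturbation with ||ΔA||_F = ε||A||_2; the following sketch (invented data, not from the thesis) does so.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                     # symmetric positive definite

eps = 1e-8
G = rng.standard_normal((n, n))
G = (G + G.T) / 2.0
G *= np.linalg.norm(A, 2) / np.linalg.norm(G)   # now ||G||_F = ||A||_2
dA = eps * G                                    # so ||dA||_F = eps ||A||_2

R = np.linalg.cholesky(A).T                     # upper triangular Cholesky factor
dR = np.linalg.cholesky(A + dA).T - R

lhs = np.linalg.norm(dR) / np.linalg.norm(R, 2)
rhs = np.linalg.cond(A) * eps / np.sqrt(2)      # first-order bound (2.2.8)
assert lhs <= rhs * (1 + 1e-3)                  # small slack for the O(eps^2) term
```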
Clearly from (2.2.8) we see (1/√2)κ₂(A) can be regarded as a measure of the sensitivity
of the Cholesky factorization. Since a condition number, as a function of a matrix of a
certain class, has to come from a bound which is attainable to first order for any matrix
in the given class (see (2.2.19) for a more formal definition of the condition number of
the Cholesky factorization with norm-bounded changes in A), we will use this rigorous
terminology, and use the qualified term 'condition estimator' when this criterion is not
met. For general A the first-order bound in (2.2.8) is not attainable; in other words,
we are not always able to choose a symmetric ΔA satisfying ||ΔA||_F ≤ ε||A||_2 to make
(2.2.11) an equality. We could use a simple example to illustrate this, but we choose
not to do so here, as it will be obvious after we obtain the actual condition number.
Therefore we say (1/√2)κ₂(A) is a condition estimator for the Cholesky factorization.
We have seen that the basis for deriving first-order perturbation bounds for R is the
equation (2.2.5) (or the expression (2.2.6) for its solution), which will be used later.
Our following analyses will be based on the same assumptions as in Theorem 2.2.2.
Matrix-vector equation analysis
Now we would like to derive an attainable first-order perturbation bound.
The upper and lower triangular parts of the matrix equation (2.2.5) contain iden-
tical information. The upper triangular part can be rewritten in the following form
by using the 'uvec' notation in (1.2.4) (for the derivation, see the Appendix of Chang,
Paige and Stewart [13, 1996]):
W_R uvec(Ṙ(0)) = duvec(G),   (2.2.12)
where W_R ∈ R^{n(n+1)/2 × n(n+1)/2} is a lower triangular matrix whose nonzero
elements are formed from the elements of R.
Note that for any upper triangular X, ||uvec(X)||_2 = ||X||_F. To help our norm
analysis, for any matrix G ∈ R^{n×n}, duvec(G) denotes the vector of the elements
g_ij, i ≤ j, scaled so that for any symmetric matrix G we have ||duvec(G)||_2 = ||G||_F.
Since R is nonsingular, W_R is also, and from (2.2.12)
uvec(Ṙ(0)) = W_R^{-1} duvec(G).   (2.2.16)
We can choose G such that duvec(G) lies in the space spanned by the right singular
vectors corresponding to the largest singular value of W_R^{-1}, with ||G||_F = ||A||_2. Using
the Taylor expansion (2.2.3) and ||A||_2 = ||R||_2^2, we see the resulting bound is
attainable to first order in ε. Thus for the Cholesky factorization with norm-bounded
changes in A the condition number (with respect to the combination of the F- and
2-norms)

κ_c(A) ≡ lim_{ε→0} sup { ||ΔR||_F / (ε||R||_2) : A + ΔA = (R + ΔR)ᵀ(R + ΔR), ||ΔA||_F ≤ ε||A||_2 }  (2.2.19)

is given by

κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2}.  (2.2.20)
Obviously with the definition of κ_c(A) we have from (2.2.8) that

κ_c(A) ≤ κ_2(A)/√2.

This upper bound on κ_c(A) is achieved if R is an n × n identity matrix with n ≥ 2,
and so is tight.
We now derive a lower bound on κ_c(A). Observe that the n × n bottom right
hand corner of W_R is just diag(1, . . . , 1, 2) r_nn, so that W_R has the form
Therefore it follows that

κ_c(A) ≥ κ_2^{1/2}(A)/2.

This bound is tight for any n, since equality will hold by taking R = diag(r_ii), with
0 < √2 r_nn ≤ r_ii, i ≠ n.
We summarize these results as the following theorem.

Theorem 2.2.3 With the same assumptions as in Theorem 2.2.2, A + ΔA has the
unique Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),  (2.2.24)

such that

||ΔR||_F / ||R||_2 ≤ κ_c(A) ε + O(ε²),  (2.2.25)

where κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2}, and the first-order bound in (2.2.25) is attainable.
□
From (2.2.26) we know the new first-order bound in (2.2.25) is at least as good
as that in (2.2.8), but it suggests the former may be considerably smaller than the
latter. Consider the following example.
Example 1: Let A = diag(1, δ²). Then R = diag(1, δ) and W_R = diag(2, √2, 2δ).
When 0 < δ ≤ 1/4, we obtain

κ_c(A) = 1/(2δ),  while  κ_2(A)/√2 = 1/(√2 δ²).

We see that the first-order perturbation bound (2.2.8) can severely overestimate the
effect of a perturbation in A.
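The gap between these two measures is easy to check numerically. The sketch below (in Python with NumPy, rather than the Matlab used for the experiments in this thesis; the helper name is ours) applies random symmetric perturbations with ||ΔA||_F = ε||A||_2 to A = diag(1, δ²) and compares the observed sensitivity ||ΔR||_F/(ε||R||_2) with κ_c(A) = 1/(2δ) and with the estimator κ_2(A)/√2 from (2.2.8):

```python
import numpy as np

def chol_sensitivity(A, eps=1e-9, trials=200, rng=np.random.default_rng(0)):
    """Largest observed ||dR||_F / (eps ||R||_2) over random symmetric dA
    with ||dA||_F = eps ||A||_2 (a lower bound on the true condition number)."""
    R = np.linalg.cholesky(A).T          # A = R^T R, R upper triangular
    worst = 0.0
    for _ in range(trials):
        E = rng.standard_normal(A.shape)
        E = (E + E.T) / 2                # random symmetric direction
        dA = eps * np.linalg.norm(A, 2) * E / np.linalg.norm(E, 'fro')
        Rp = np.linalg.cholesky(A + dA).T
        worst = max(worst, np.linalg.norm(Rp - R, 'fro') /
                           (eps * np.linalg.norm(R, 2)))
    return worst

delta = 1e-2
A = np.diag([1.0, delta**2])
observed = chol_sensitivity(A)
kappa_c = 1.0 / (2 * delta)                    # = ||W_R^{-1}||_2 ||A||_2^{1/2}
estimator = np.linalg.cond(A, 2) / np.sqrt(2)  # the quantity in (2.2.8)
```

The observed sensitivity stays below κ_c(A) (to first order in ε), while the estimator from (2.2.8) is larger by a factor of order 1/δ.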
But it is possible that κ_c(A) has the same order as κ_2(A).

Example 2: If A = [ · ] with small δ > 0, then R = [ · ], and simple computations
give κ_c(A) of the same order as κ_2(A).

Pivoting usually improves the condition of the problem, as we now show.
Suppose the Cholesky factorization of A is computed by using the standard
symmetric pivoting strategy: PAPᵀ = RᵀR, where P is an n × n permutation matrix
designed so that rows and columns of A are interchanged, during the computation of
the reduction, to make the leading diagonal elements of R as large as possible. Let
the Cholesky factorization of P(A + ΔA)Pᵀ be P(A + ΔA)Pᵀ = (R + ΔR)ᵀ(R + ΔR).
Then by Theorem 2.2.3 we have

||ΔR||_F / ||R||_2 ≤ κ_c(PAPᵀ) ε + O(ε²).
Note that the first-order bound in (2.2.8) does not change when the Cholesky factorization
of A is computed by using any pivoting strategy. Clearly the perturbation
bound (2.2.25) more closely reflects the structure of the problem. Many numerical
experiments with the standard pivoting strategy suggest that κ_c(PAPᵀ) usually
has the same order as κ_2^{1/2}(A), and in fact κ_c(PAPᵀ) can be bounded above by
κ_2^{1/2}(A) √(n(n+1)(4^n + 6n − 1)/6), see (2.2.39). Using the standard symmetric pivoting
strategy in Example 2 gives κ_c(PAPᵀ) = O(1/δ), showing how pivoting can improve the condition of the
problem (as measured by our condition number) by an arbitrary amount.
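The standard symmetric pivoting strategy described above can be sketched as follows; this is our own illustrative implementation (Python/NumPy rather than the Matlab used elsewhere in this thesis), not code from the thesis:

```python
import numpy as np

def pivoted_cholesky(A):
    """Cholesky with standard symmetric (diagonal) pivoting:
    returns P, R with P A P^T = R^T R and r_11 >= r_22 >= ... > 0."""
    A = A.copy().astype(float)
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n):
        # bring the largest remaining diagonal element to position k
        j = k + np.argmax(np.diag(A)[k:])
        A[[k, j], :] = A[[j, k], :]
        A[:, [k, j]] = A[:, [j, k]]
        piv[[k, j]] = piv[[j, k]]
        # one step of the (upper triangular) Cholesky reduction
        A[k, k] = np.sqrt(A[k, k])
        A[k, k+1:] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k, k+1:], A[k, k+1:])
    R = np.triu(A)
    P = np.eye(n)[piv]      # row-permutation matrix: P A P^T = A[piv][:, piv]
    return P, R
```

Because each pivot is the largest remaining diagonal element, and the Schur complement updates only shrink the diagonals, the diagonal of R is nonincreasing.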
There have been several techniques for estimating the 2-norm of the inverse of a
triangular matrix, e.g., Cline et al [16, 1979], Cline et al [15, 1982] and Dixon [18,
1983]. A comprehensive, comparative survey of these techniques has been given by
Higham [27, 1987]. Using these techniques, ||W_R^{-1}||_2 could be estimated at the cost
of solving a few linear systems with matrices W_R and W_Rᵀ. To solve the former is
equivalent to solving RᵀX + XᵀR = G (G symmetric) for upper triangular X,
and the cost is O(n³). Even though W_Rᵀ y = c is not the transpose of the above
matrix equation, it can also be solved in O(n³) by using the special structure of W_R.
However, since the Cholesky factorization itself costs O(n³), such a computation would
rarely be considered feasible. Of course, if it is known that κ_c(PAPᵀ) ≈ κ_2^{1/2}(A)/2, as
usually happens when we use the standard symmetric pivoting, then we need only
estimate κ_2^{1/2}(A) = κ_2(R) for this case, and this can be done in O(n²). For a practical
approach to the general case, see the following matrix equation analysis.
Matrix equation analysis
As we saw, κ_c(A) is unreasonably expensive to compute or estimate directly with
the usual approach, except when we use pivoting, in which case κ_c(PAPᵀ) usually
approaches its lower bound κ_2^{1/2}(A)/2, see (2.2.39). Fortunately, the approach of
Stewart [44, 1995] can be extended to obtain an excellent upper bound on κ_c(A), to
give considerable insight into what is going on, and to lead to efficient and practical
condition estimators even when we do not use pivoting.
In Theorem 2.2.2 we used the expression for Ṙ(0) in (2.2.6) to derive Sun's first-order
perturbation bound. Now we again look at (2.2.6), repeated here for clarity:

Ṙ(0) = up(R^{-T} G R^{-1}) R.

A useful observation is that for any matrix B ∈ R^{n×n} and diagonal matrix D ∈ R^{n×n},

up(D B D^{-1}) = D up(B) D^{-1}.

Let D_n be the set of all n × n real positive definite diagonal matrices. For any
D = diag(δ_1, . . . , δ_n) ∈ D_n we can take R = D R̄, which leads to cancellation of the
D^{-1} with D. Then from the Taylor expansion (2.2.3) we obtain the first-order bound
(2.2.32) in terms of κ'_c(A, D), and defining

κ'_c(A) ≡ inf_{D ∈ D_n} κ'_c(A, D),

we have the encouraging result

κ_c(A) ≤ κ'_c(A) ≤ κ'_c(A, I) = κ_2(R) = κ_2^{1/2}(A).

Here (2.2.31) shows that 1/√2 times the first-order bound in (2.2.32) is at least as good as that
in (2.2.8), the first-order perturbation bound in Sun [46, 1991] and Stewart [41, 1993].
With (2.2.19), the definition of κ_c(A), we see from (2.2.32) that κ_c(A) ≤ κ'_c(A).
But it is useful to prove this more delicately, and so obtain some indication of how
weak κ'_c(A) is as an approximation to κ_c(A). We know G can be chosen to make
(2.2.17) an equality, so that for such a G we obtain a chain of inequalities leading
from κ_c(A) to κ'_c(A, D) for any D ∈ D_n. Note the two inequalities (2.2.33) and (2.2.34) in going from κ_c(A)
to κ'_c(A, D).
Now we summarize the above results as the following theorem.

Theorem 2.2.4 With the same assumptions as in Theorem 2.2.2, A + ΔA has
the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),

such that the first-order bound (2.2.37) holds, where κ'_c(A) is as in (2.2.29) and (2.2.30). □
This matrix equation approach is very simple, yet gives considerable insight into
why the Cholesky factorization can be far better conditioned than we previously
thought. For example, if the ill-conditioning of R is mostly due to bad scaling of the
rows, then a correct choice of D in R = D R̄ can give κ_2(R̄) very near one, so κ'_c(A, D)
will approach twice the lower bound κ_2^{1/2}(A)/2 on κ_c(A). As an illustration, suppose

R = [1 1; 0 δ];

then for small δ > 0, κ_2(R) = O(1/δ). But if we set D = diag(1, δ),
then R̄ = D^{-1}R has κ_2(R̄) = O(1), so that κ'_c(A, D) is close to the lower bound on
κ_c(A). Note how almost all the ill-conditioning was revealed by the diagonal of R.
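This scaling effect is easy to check numerically. The sketch below (Python/NumPy, our own notation) uses R = [1 1; 0 δ], consistent with the stated values κ_2(R) = O(1/δ) and D = diag(1, δ):

```python
import numpy as np

delta = 1e-3
R = np.array([[1.0, 1.0],
              [0.0, delta]])       # ill conditioned: kappa_2(R) = O(1/delta)
D = np.diag([1.0, delta])          # captures the scale of the rows of R
Rbar = np.linalg.inv(D) @ R        # R = D Rbar; rows of Rbar are O(1)

kappa_R = np.linalg.cond(R, 2)     # large, O(1/delta)
kappa_Rbar = np.linalg.cond(Rbar, 2)   # modest, O(1)
```

Here κ_2(R) is of order 1/δ while κ_2(R̄) remains of order one: the ill-conditioning of R was purely an artifact of row scaling.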
This also provides another explanation as to why the standard symmetric pivoting
of A is so successful, making κ_c(PAPᵀ) approach its lower bound in nearly
all cases. If A is ill-conditioned (so there is a large distance between the lower and
upper bounds on κ_c(A)) and the Cholesky factorization is computed with standard
symmetric pivoting, the ill-conditioning of A will usually reveal itself in the diagonal
elements of R. Stewart [43, 1995] has shown that such upper triangular matrices
are artificially ill-conditioned in the sense that they can be made well-conditioned by
scaling the rows via D. This implies that κ'_c(PAPᵀ, D), and therefore (as we shall
show) κ_c(PAPᵀ), will approach its lower bound. We support this mathematically in
the following.
Theorem 2.2.5 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky
factorization PAPᵀ = RᵀR when the standard symmetric pivoting strategy is used.
Then

κ_2^{1/2}(A)/2 ≤ κ_c(PAPᵀ) ≤ κ_2^{1/2}(A) √(n(n+1)(4^n + 6n − 1)/6),  (2.2.39)

and from this (2.2.40) follows.
Proof. We need only prove the last inequality of (2.2.39). In fact standard pivoting
ensures r_ii² ≥ Σ_{k=i}^{j} r_kj² for all j ≥ i, so |r_ii| ≥ |r_ij|. The inequality then
follows immediately from κ_2(R) = κ_2^{1/2}(A) and (1.2.18) in Theorem
1.2.2 with D = diag(R). □
One may not be impressed by the 4^n factor in the upper bound in (2.2.39), and
may wonder if the upper bound can be significantly improved. In fact the upper bound
can nearly be attained by the parametrized family of matrices A = K_nᵀ(θ) K_n(θ),
where

K_n(θ) = diag(1, s, . . . , s^{n−1}) U_n,  (2.2.41)

with U_n unit upper triangular with all off-diagonal elements equal to −c, c = cos(θ)
and s = sin(θ); these matrices were introduced by Kahan [32, 1966]. Notice here
the permutation P corresponding to the standard symmetric pivoting strategy is the
identity. Taking D = diag(K_n(θ)) = diag(1, s, . . . , s^{n−1}), then by the last part of
Theorem 1.2.2 we obtain (2.2.42). Letting D_r = diag(||K_n(θ)(i, :)||_2) and using (1.2.14)
in van der Sluis's Theorem 1.2.1, it follows from (2.2.42) that the bound is approached
as θ → 0. This indicates the upper bound in (2.2.39) is nearly tight.
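A Kahan matrix is easy to generate. The sketch below (Python/NumPy, our own helper name) builds K_n(θ) as the row scaling diag(1, s, …, s^{n−1}) of a unit upper triangular matrix with off-diagonal elements −c, and confirms that it is the Cholesky factor of A = K_nᵀ(θ)K_n(θ) and that its diagonal is already nonincreasing (so diagonal pivoting has nothing to reorder):

```python
import numpy as np

def kahan(n, theta):
    """Kahan matrix K_n(theta): diag(1, s, ..., s^{n-1}) times a unit upper
    triangular matrix whose strictly upper part is all -c."""
    c, s = np.cos(theta), np.sin(theta)
    U = np.triu(-c * np.ones((n, n)), k=1) + np.eye(n)
    return np.diag(s ** np.arange(n)) @ U

n, theta = 8, np.pi / 4
K = kahan(n, theta)
A = K.T @ K                        # SPD matrix whose Cholesky factor is K
R = np.linalg.cholesky(A).T        # upper triangular Cholesky factor of A
```

Since K is upper triangular with positive, nonincreasing diagonal s^{i−1}, the computed R coincides with K.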
Many computational experiments show, with standard symmetric pivoting, that
κ_c(PAPᵀ) is usually quite close to the lower bound κ_2^{1/2}(A)/2, see Section 2.2.3,
Tables 2.2.1 and 2.2.2, but it can be significantly larger for matrices whose Cholesky
factors are Kahan matrices, see Section 2.2.3, Table 2.2.3, as well as the comments there.
In the latter case, something like a rank-revealing pivoting strategy, such as that in
Hong and Pan [31, 1992], will most likely be required to make the condition number
close to its lower bound.
The practical outcome of this simple analysis is that we now have an O(n²) condition
estimator for the Cholesky factor. By (1.2.14) in van der Sluis's Theorem 1.2.1,
κ_2(R̄) will be nearly optimal when the rows of R̄ are equilibrated in the 2-norm. Thus
the estimation procedure for the condition of the Cholesky factorization is to choose
D = D_r = diag(||R(i, :)||_2) in R = D R̄, and use a standard condition estimator (for
matrix inversion) to estimate κ_2(R̄) and κ_2(R) in (2.2.29).
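The choice of D_r in this procedure can be sketched as follows. For clarity the sketch (Python/NumPy, our own names) computes κ_2(R̄) and κ_2(R), the two factors the procedure asks us to estimate, exactly via SVD-based condition numbers instead of with an O(n²) norm estimator; only the row-equilibration step is being illustrated:

```python
import numpy as np

def cholesky_condition_factors(A):
    """Return kappa_2(Rbar) and kappa_2(R), the two factors to be estimated,
    where R is the Cholesky factor of A and Rbar = D_r^{-1} R with
    D_r = diag(||R(i,:)||_2), i.e. R with its rows equilibrated in the 2-norm."""
    R = np.linalg.cholesky(A).T
    Dr = np.linalg.norm(R, axis=1)       # 2-norm of each row of R
    Rbar = R / Dr[:, None]               # D_r^{-1} R: rows of unit 2-norm
    return np.linalg.cond(Rbar, 2), np.linalg.cond(R, 2)

# a badly row-scaled Cholesky factor
delta = 1e-4
R = np.array([[1.0, 1.0], [0.0, delta]])
k_rbar, k_r = cholesky_condition_factors(R.T @ R)
```

For this badly row-scaled R, κ_2(R) is of order 1/δ, yet κ_2(R̄) stays near one, which is exactly the situation in which the estimator based on D_r is most informative.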
Finally we give a new perturbation bound which does not involve any scaling
matrix D, by using a monotone and consistent matrix norm || · || (see Section 1.2).
Theorem 2.2.6 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR, and let ΔA ∈ R^{n×n} be a symmetric matrix satisfying ||ΔA|| ≤
ε||A|| for some monotone and consistent matrix norm. If κ(A)ε < 1, then the first-order
bound (2.2.43) holds.

Proof. Let G ≡ ΔA/ε (if ε = 0 the theorem is trivial). Since

ρ(A^{-1}ΔA) ≤ ||A^{-1}ΔA|| ≤ κ(A)ε < 1,

Theorem 2.2.1 is applicable here. Taking || · || on both sides of (2.2.6), together
with the Taylor expansion (2.2.3), gives (2.2.43). □
Note that cond(R) is invariant under row scaling of R; in other words, the perturbation
bound (2.2.43) provides the scaling automatically. This makes (2.2.43) look
simpler than (2.2.37). Also, from (1.2.12) in van der Sluis's Theorem 1.2.1, we know
that for the ∞-norm,

cond_∞(R) = min_{D ∈ D_n} κ_∞(D^{-1}R) = κ_∞(D_1^{-1}R),

where D_1^{-1}R has rows of unit 1-norm (D_1 = diag(||R(i, :)||_1)). This gives a corresponding
condition estimator with respect to the ∞-norm.
2.2.2 Rigorous perturbation bounds
Usually a first-order bound is satisfactory, but sometimes more careful work is needed.
In this section, we will present rigorous perturbation bounds (with no higher-order
terms) for the Cholesky factor by the matrix-vector equation approach and the matrix
equation approach.
Let A = RᵀR. If A + ΔA = (R + ΔR)ᵀ(R + ΔR), then we have

RᵀΔR + ΔRᵀR = ΔA − ΔRᵀΔR,  (2.2.44)

which gives

ΔR R^{-1} = up[R^{-T}(ΔA − ΔRᵀΔR)R^{-1}].  (2.2.45)
We will use (2.2.14) and (2.2.45) in deriving rigorous perturbation bounds, as well
as the following lemma.

Lemma 2.2.1 (A trivial variant of Theorem 3.1 in Stewart [38, 1973] and Theorem
2.11 in Stewart and Sun [45, 1990]) Let T be a bounded linear operator on a Banach
space B. Assume that T has a bounded inverse, and set δ = ||T^{-1}||^{-1}. Let φ : B → B
be a function that satisfies

||φ(x)|| ≤ η ||x||²

for some η > 0, together with a corresponding Lipschitz condition. Then if ||g|| < δ²/(4η),
the equation

T x = g + φ(x)

has a unique solution x that satisfies

||x|| ≤ 2 ||g|| / δ.
First we give a rigorous perturbation bound by the matrix-vector equation
approach.
Let X = ΔR, and define T_R X = RᵀX + XᵀR and φ(X) = XᵀX. Then (2.2.44)
can be rewritten as

T_R X = ΔA − φ(X).  (2.2.46)
By using Lemma 2.2.1 we can prove the following theorem.

Theorem 2.2.7 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be symmetric. If

||W_R^{-1}||_2 ||W_R^{-1} duvec(ΔA)||_2 < 1/4,  (2.2.47)

then A + ΔA has the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR).  (2.2.48)

Obviously, the weakest bound above can be rewritten in the elegant form (2.2.49).

Proof. See Chang, Paige and Stewart [13, 1996]. □
Thus the assumption (2.2.47) is generally stronger than (2.2.7), and may be greatly
so. Following the same argument as Stewart [41, 1993], however, (2.2.47) is needed
to guarantee that the bound on ||ΔR||_F will not explode. Furthermore, if the ill-conditioning
of R is mostly due to bad scaling of the rows, then a correct choice of D
can give κ_2(D^{-1}R) very near one. In particular, if the standard symmetric pivoting
is used in computing the Cholesky factorization, then ||W_R^{-1}||_2 and ||A^{-1}||_2^{1/2} will
usually be of similar magnitude, see (2.2.40); that is, ||W_R^{-1}||_2 ||W_R^{-1} duvec(ΔA)||_2
and ||A^{-1}||_2 ||ΔA||_F will usually have similar magnitude. So the condition (2.2.47) is
not too constraining. Numerical experiments suggest that (2.2.49) is better than the
equivalent result in Sun [46, 1991], see Chang, Paige and Stewart [13, Section 3.2].
Now we use the matrix equation approach to derive a weaker but practical rigorous
perturbation bound. Let R = D R̄; then from (2.2.45) we have

ΔR R̄^{-1} = up(R^{-T} ΔA R̄^{-1}) − up(R^{-T} ΔRᵀ ΔR R̄^{-1}).  (2.2.51)

Let X = ΔR R̄^{-1}, and define φ(X) = up(D^{-1} Xᵀ X). Then (2.2.51) can be rewritten
as

X = up(R^{-T} ΔA R̄^{-1}) − φ(X).  (2.2.52)
Applying Lemma 2.2.1 to (2.2.52) we obtain the following result.

Theorem 2.2.8 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be symmetric, and assume D ∈ D_n. If
the assumption (2.2.53) holds, then A + ΔA has the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),  (2.2.54)

with ΔR satisfying the bounds (2.2.55). If the assumption (2.2.53) is strengthened to
(2.2.56), the stronger bound (2.2.57) holds.
In the proof we take η = ||D^{-1}||_2, since ||φ(X)||_F ≤ ||D^{-1}||_2 ||X||_F². By the
assumption (2.2.53) the condition of Lemma 2.2.1 is satisfied, where the operator T here
is the identity. Thus, by Lemma 2.2.1, (2.2.52) has a unique upper triangular
solution ΔR R̄^{-1}, where R̄ = D^{-1}R, that satisfies (2.2.59);
thus (2.2.54) and (2.2.55) follow, the latter using ||ΔR||_F ≤ ||ΔR R̄^{-1}||_F ||R̄||_2.

But R + ΔR in (2.2.54) must have positive diagonal to satisfy our definition of
the Cholesky factorization, where R was given and ΔR R̄^{-1} solves (2.2.52). We now
prove the positivity. From (2.2.59) and (2.2.53) it follows that
R + tΔR is nonsingular for any t ∈ [0, 1], and by continuity of the elements, R + ΔR
has positive diagonal. □
If we take D = I in Theorem 2.2.8, the assumption (2.2.56) can be weakened to
(2.2.60), and we have the perturbation bound (2.2.61), which is due to Sun [46, 1991].
This is slightly stronger than (2.2.57) with D replaced by I. The proof is similar
to that of Theorem 2.2.8; the only difference is using the fact that ||up(X)||_F ≤
||X||_F for any symmetric X ∈ R^{n×n}.
As we know from (1.2.14) in van der Sluis's Theorem 1.2.1, if we take D =
D_r = diag(||R(i, :)||_2) in κ'_c(A, D), then κ'_c(A, D_r) will be nearly minimal. Thus possibly
the bound (2.2.57) is much smaller than (2.2.61). But the assumption (2.2.56) with
D = D_r is possibly much more constraining than (2.2.60).
2.2.3 Numerical experiments
In Section 2.2.1, we made first-order perturbation analyses for the Cholesky factorization
with norm-bounded changes in A using two different approaches, presented
κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2} as the corresponding condition number, and suggested
that κ_c(A) could be approximated in practice by κ'_c(A, D_r)
with D_r = diag(||R(i, :)||_2), which could be estimated by a standard condition estimator
(see for example Higham [30, 1996, Ch. 14]) in O(n²). Also we showed that the
condition of the problem can usually be (significantly) improved by standard symmetric
pivoting. In Section 2.2.2 rigorous perturbation bounds were obtained. In order to
confirm our theoretical analyses, we have carried out several numerical experiments
to compute the following measures of the sensitivity of the Cholesky factorization,
which satisfy (see (1.2.14), (2.2.26), (2.2.30) and (2.2.31)):
The computations were performed in Matlab. Here we give three sets of numerical
examples.
(1) The matrices in the first set have the form A = QΛQᵀ, where Q ∈ R^{n×n} is an
orthogonal matrix obtained from the QR factorization of a random n × n matrix, and
Λ = diag(τ_1, . . . , τ_{n−2}, δ, δ) with τ_1, . . . , τ_{n−2} random positive
numbers and 0 < δ ≤ 1. We generated different matrices by taking all combinations of
n ∈ {5, 10, 15, 20, 25} and δ ∈ {1, 10^{-1}, . . . , 10^{-10}}. The results for n = 25, δ = 10^{-i},
i = 0, 1, . . . , 10 are shown in Table 2.2.1, where P is a permutation corresponding to
the standard symmetric pivoting. Results obtained by putting the two δ's at the top
of Λ were mainly similar.
(2) The second set of matrices are the n × n Pascal matrices (with elements
a_{1j} = a_{i1} = 1, a_{ij} = a_{i,j−1} + a_{i−1,j}), n = 1, . . . , 15. The results are shown in Table 2.2.2,
where P is a permutation corresponding to the standard symmetric pivoting.
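The Pascal matrices of this second test set can be generated directly from the stated recurrence; a minimal sketch (Python/NumPy, our own helper name):

```python
import numpy as np

def pascal(n):
    """n x n Pascal matrix: a_1j = a_i1 = 1, a_ij = a_{i,j-1} + a_{i-1,j}.
    These matrices are symmetric positive definite."""
    A = np.ones((n, n), dtype=float)     # first row and column are all ones
    for i in range(1, n):
        for j in range(1, n):
            A[i, j] = A[i, j - 1] + A[i - 1, j]
    return A

P4 = pascal(4)
```

For example, pascal(3) is [[1, 1, 1], [1, 2, 3], [1, 3, 6]], and the entries grow quickly with n, which is what makes these matrices increasingly ill-conditioned test cases.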
(3) The third set of matrices are n × n A = K_nᵀ(θ)K_n(θ), where the K_n(θ) are Kahan
matrices, see (2.2.41). The results for n = 5, 10, 15, 20, 25 with θ = π/4 are shown
in Table 2.2.3, where Π is a permutation such that the first column and row are
moved to the last column and row positions, and the remaining columns and rows
are moved left and up one position; this permutation Π corresponds to the rank-revealing
pivoting strategy, for details see Hong and Pan [31, 1992]. The permutation
P corresponding to the standard symmetric pivoting is the identity, so the standard
symmetric pivoting does not bring any changes to our condition number and condition
estimators.
We give some comments on the results.
• The experiments confirm that κ_2(A)/√2 can be much larger than κ_c(A) for ill-conditioned
problems, so the first-order bound in (2.2.25) may be much smaller
Table 2.2.1: Results for matrix A = QΛQᵀ of order 25, Ã = PAPᵀ, D = D_r

Table 2.2.2: Results for Pascal matrices, Ã = PAPᵀ, D = D_r
Table 2.2.3: Results for A = K_nᵀ(θ)K_n(θ), θ = π/4, Ã = ΠAΠᵀ, D = D_r
than that in (2.2.8).
• The standard symmetric pivoting almost always gives an improvement on κ_c(A)
and κ'_c(A, D_r). Table 2.2.2 indicates the improvement can be significant. Our
experiments suggest that if the Cholesky factorization of A is computed using
the standard symmetric pivoting strategy, then the condition number of the
Cholesky factorization κ_c(PAPᵀ) will usually have the same order as its lower
limit κ_2^{1/2}(A)/2 (the ratio in the last columns of Table 2.2.1 and Table 2.2.2 was
never larger than 4). But Table 2.2.3 shows the ratio can be large. However, such
examples are rare in practice, and furthermore if we adopt the rank-revealing
pivoting strategy, we see from Table 2.2.3 that the ratio 2κ_c(ΠAΠᵀ)/κ_2^{1/2}(A) is again
small.
• Note that in Table 2.2.1 and Table 2.2.3, κ'_c(A, D_r) (respectively κ'_c(PAPᵀ, D_r), κ'_c(ΠAΠᵀ, D_r))
is a very good approximation of κ_c(A) (respectively κ_c(PAPᵀ), κ_c(ΠAΠᵀ)). In Table 2.2.2,
the results with pivoting also show this. For n = 15 without pivoting in Table
2.2.2, κ'_c(A, D_r) overestimates κ_c(A) by a factor of about 250, but is much
better for low n. A study of the n = 2 case shows κ'_c(A) can never be much
larger than κ_c(A), and we have not found an example which shows κ'_c(A, D_r)
can be much larger than κ_c(A) for n > 2, so we suspect κ'_c(A) and κ'_c(A, D_r)
are at worst κ_c(A) times some function of n alone (probably involving something
like 2^n) in general. From Theorem 2.2.5 we know this is true when the
standard symmetric pivoting strategy is used.
2.3 Perturbation analysis with component-bounded
changes in A
In this section we consider the case where a bound on |ΔA| is given. There have been
a few papers dealing with such problems. Sun [47, 1992] first presented a rigorous
perturbation bound on |ΔR| in terms of |ΔA|, which was improved by Sun [48,
1992]. For ΔA corresponding to the backward rounding error in A resulting from
numerically stable computations, e.g. Algorithm CHOL, on finite precision floating
point computers, Drmač, Omladič and Veselić [20, 1994] presented a nice norm-based
perturbation result using their H = D_c^{-1} A D_c^{-1} approach. Sun in [47] and [48] also
included component perturbation bounds for a different and somewhat complicated
form of the backward rounding error in A.
The main purpose of this section is to establish new perturbation bounds when ΔA
corresponds to the backward rounding error, bounds which are sharper than the equivalent
results in [20]. Most of the results have been presented in Chang [8, 1996]. Also we
present first-order perturbation bounds for some other kinds of bounds on |ΔA|.
In Section 2.3.1 we establish first-order perturbation bounds with a general bound
|ΔA| ≤ εE, and apply such results to a special case: |ΔA| ≤ ε|A|. The motivation
for considering this special form is that possibly the relative error in each element
of A has the same reasonable known bound; for example, often the elements of A
cannot be stored exactly on a digital computer, and the matrix actually stored
is A + ΔA with |ΔA| ≤ u|A|, where u is the unit roundoff. In Section 2.3.2 and
Section 2.3.3 we respectively derive first-order and rigorous perturbation bounds with
|ΔA| ≤ ε d dᵀ, where d_i = a_ii^{1/2}, which comes from the backward rounding error
analysis, see Demmel [17, 1989]. In our analyses we use both the matrix-vector
equation approach and the matrix equation approach.
2.3.1 First-order perturbation bound with |ΔA| ≤ ε|A|

First we give the following result by using the matrix-vector equation approach.
Theorem 2.3.1 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be a symmetric matrix satisfying |ΔA| ≤ εE
for some nonnegative matrix E and nonnegative ε. If the condition (2.3.1) holds,
where || · || denotes a monotone and consistent matrix norm, then A + ΔA has the
Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),

such that the bounds (2.3.2) and (2.3.3) hold,
where ||X||_M = max_{ij} |x_ij| and ||X||_S = Σ_{ij} |x_ij|, and for the M-norm the first-order
bound in (2.3.3) is attainable. Thus, in particular, if E = |A|, then (2.3.4) and (2.3.5) hold,
and for the M-norm the first-order bound in (2.3.5) is attainable.
Proof. Let G ≡ ΔA/ε (if ε = 0 the theorem is trivial). The conclusion of Theorem
2.2.1 holds, and from (2.2.12), the matrix-vector equation
form of (2.2.5), together with the Taylor expansion (2.2.3), we obtain (2.3.2), from which
(2.3.3) follows. For the M-norm the first-order bound is attained for a suitably chosen
ΔA. □
Theorem 2.3.1 implies that for the Cholesky factorization under relative changes
in the elements of A, the condition number (with respect to the M-norm)

ρ_c(A) = lim_{ε→0} sup { ||ΔR||_M / (ε||R||_M) : A + ΔA = (R + ΔR)ᵀ(R + ΔR), |ΔA| ≤ ε|A| }

is given by the constant in the attainable bound (2.3.5).
We see ρ_c(A) is not very intuitive, and is expensive to estimate directly by any
presently known approach. Fortunately, by the matrix equation approach we have the
following practical results.
Theorem 2.3.2 With the same assumptions as in Theorem 2.3.1, A + ΔA has the
Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),
and for a monotone and consistent matrix norm || · || the bound (2.3.9) holds, while
for the M-norm the bound (2.3.10) holds, with ρ'_c(A) defined in (2.3.11).
Proof. (2.3.7) can easily be proved by using (2.2.6) and (2.2.3). Noticing |A| ≤ |Rᵀ||R|,
(2.3.8) follows immediately from (2.3.7), and (2.3.9) is obtained by taking the
norm || · ||. Similarly (2.3.10) can be obtained by taking the M-norm and using
the fact that ||AB||_M ≤ ||A||_M ||B||_1 and ||AB||_M ≤ ||A||_∞ ||B||_M for any A ∈ R^{m×p}
and B ∈ R^{p×n}, which can easily be verified. From the definition of ρ_c(A) and (2.3.10),
we see the inequality (2.3.12) holds. □
The quintuple matrix product in (2.3.8) is nice, as it counteracts the effect of
poor column scaling in R (the first and third factors) and poor row
scaling (the fourth factor). This is reflected in both of the first-order bounds in
(2.3.9) and (2.3.10). The significance of this theorem is that we can now estimate a bound
on the relative change in R in O(n²) by using standard condition estimators.
Numerical experiments suggest that usually ρ'_c(A) is a reasonable upper bound on the
condition number ρ_c(A). But the former can be arbitrarily larger than the latter;
for example, for a certain 2 × 2 family A depending on a small parameter δ > 0,
simple computations show the overestimation grows without bound as δ → 0.
It is easy to check that the overestimation was caused by the inequality |up(B)| ≤ |B|
used in deriving (2.3.10): clearly the strictly lower triangular part of |B| can be arbitrarily
larger than that of |up(B)|. Hence (2.3.10) can sometimes overestimate the true
sensitivity of the problem, and so can (2.3.9) for the same reason. We can also give an
example to show that ρ'_c(A) can overestimate ρ_c(A) due to the inequality (2.3.14).
A careful reader may have noticed the following fact: in deriving Theorem
2.2.4 we also used the inequality |up(B)| ≤ |B| (see (2.2.28)), but we mentioned
in Section 2.2.3 that we have not found an example to show κ'_c(A) can be
arbitrarily larger than κ_c(A), and furthermore initial investigations suggest that probably
κ'_c(A)/κ_c(A) can be bounded above by a function of n. Why does the inequality
appear to have different effects? The reason is that here B is a function of R only,
and so has a special structure, whereas in (2.2.28) B has a parameter matrix G, and
for any R possibly G can be chosen such that ||B|| is close to ||up(B)||.
Comparing the first-order bound in (2.3.9) with that in (2.2.43), we see the former
is at least as small as the latter, since cond(Rᵀ) ≤ κ(Rᵀ). If the ill-conditioning of
R is mostly due to bad scaling of the columns, then cond(Rᵀ) ≪ κ(Rᵀ); that is to
say, the former can be much smaller than the latter. That is not surprising, since
the assumption |ΔA| ≤ ε|A| in Theorem 2.3.2 provides more information about the
perturbations in the data than the assumption ||ΔA|| ≤ ε||A|| in Theorem 2.2.6.
The standard pivoting strategy can usually improve ρ_c(A), just as it can usually
improve κ_c(A). In fact we have the following theorem.

Theorem 2.3.3 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky
factorization PAPᵀ = RᵀR when the standard pivoting strategy is used. Then (2.3.15) holds.

Proof. Standard pivoting ensures |r_ii| ≥ |r_ij| for all j ≥ i. Then from the definition of
ρ'_c(A) in (2.3.11), (2.3.15) is immediately obtained by using (1.2.21) in Theorem 1.2.2.
□
From (2.3.15) it is natural to raise the following question: is it possible that
the standard pivoting makes cond_∞(Rᵀ) much worse than that without pivoting,
so that ρ'_c(PAPᵀ) is actually much worse than ρ'_c(A)? The answer is no. In fact
we can show that no pivoting can bring an essential change to cond_∞(Rᵀ). Let
PAPᵀ = RᵀR, where P is any permutation matrix. Let D_1 = diag(||Rᵀ(i, :)||_1) and
let D_2 = diag(||Rᵀ(i, :)||_2). Then ||D_2^{-1}D_1||_2 ≤ √n and ||D_1^{-1}D_2||_2 ≤ 1. By using
van der Sluis's Theorem 1.2.1, we can bound cond_∞(Rᵀ) in terms of κ_∞(D_2^{-1}Rᵀ).
Noticing that D_2 = diag(||Rᵀ(i, :)||_2) = diag(PAPᵀ)^{1/2} = P diag(A)^{1/2} Pᵀ, and that
||H^{-1}||_2 is independent of P, we conclude that permutation has no significant effect on
cond_∞(Rᵀ).
Using the same approaches as in Section 2.2.2, we could easily provide rigorous
perturbation bounds with |ΔA| ≤ ε|A|. But we choose not to do so here, in order to
keep the material and the basic ideas as brief as possible.
2.3.2 First-order perturbation bounds with backward rounding errors
In this section we first use the matrix-vector equation approach to derive tight perturbation
bounds, leading to the condition number χ_c(A) for perturbations having
bounds of the form of the equivalent backward rounding error for the Cholesky factorization;
then we use the matrix equation approach to derive a practical perturbation
bound, leading to a practical estimator χ'_c(A) of χ_c(A). We also compare χ_c(A) with
κ_c(A), and χ'_c(A) with κ'_c(A). Finally we show how standard pivoting improves the
condition number χ_c(A).
Before proceeding we introduce the following result due to Demmel [17, 1989];
see also Higham [30, 1996, Theorems 10.5 and 10.7].

Lemma 2.3.1 Let A ≡ D_c H D_c ∈ R^{n×n} be a symmetric positive definite floating
point matrix, where D_c = diag(A)^{1/2}. If

n ε ||H^{-1}||_2 < 1,  (2.3.16)

where ε = (n + 1)u/(1 − 2(n + 1)u) with u being the unit round-off, then the Cholesky
factorization applied to A succeeds (barring underflow and overflow) and produces a
nonsingular R̃, which satisfies

A + ΔA = R̃ᵀR̃,  |ΔA| ≤ ε d dᵀ,  (2.3.17)

where d_i = a_ii^{1/2}. □
Based on Lemma 2.3.1, we can establish the following bound for the computed
Cholesky factor R̃.

Theorem 2.3.4 With A = RᵀR and the same assumptions as in Lemma 2.3.1, for
the perturbation ΔA and the computed R̃ in (2.3.17), the bounds (2.3.18), (2.3.19)
and (2.3.20) hold, where ΔR = R̃ − R, R̄ ≡ R D_c^{-1}, and W_{R̄} is just W_R in (2.2.15)
with each entry r_ij replaced by r̄_ij. The
first-order bound in (2.3.19) is attainable for the M-norm, and the first-order bound
in (2.3.20) is approximately attainable.
Proof. Let G ≡ ΔA/ε. By (2.3.17) and (2.3.16), we have

ρ(A^{-1}ΔA) = ρ(H^{-1} D_c^{-1} ΔA D_c^{-1}) ≤ ε ||H^{-1}||_2 ||D_c^{-1} d dᵀ D_c^{-1}||_2 = n ε ||H^{-1}||_2 < 1.

Then, as in the proof of Theorem 2.3.1, we can apply Theorem 2.2.1 to show that
(2.3.18) and (2.3.19) hold and that the first-order bound in (2.3.19) is attainable for the
M-norm.

It remains to show (2.3.20) and its attainability. From (2.2.5) in Theorem 2.2.1
and R = R̄ D_c, we have
As we know, (2.2.5) can be rewritten in the matrix-vector equation form (2.2.15), so
similarly (2.3.22) can be rewritten in the following matrix-vector equation form:

W_{R̄} uvec(Ṙ(0) D_c^{-1}) = duvec(D_c^{-1} G D_c^{-1}).  (2.3.23)

It is easy to verify that uvec(Ṙ(0) D_c^{-1}) = D̃_c^{-1} uvec(Ṙ(0)) with D̃_c as in (2.3.21); then
from (2.3.23) we have

uvec(Ṙ(0)) = D̃_c W_{R̄}^{-1} duvec(D_c^{-1} G D_c^{-1}),

which with |G| = |ΔA|/ε ≤ d dᵀ gives

||Ṙ(0)||_F ≤ ||D̃_c W_{R̄}^{-1}||_2 ||duvec(D_c^{-1} d dᵀ D_c^{-1})||_2 = n ||D̃_c W_{R̄}^{-1}||_2.

Thus (2.3.20) is obtained immediately from the Taylor expansion (2.2.3).
Obviously there exists a symmetric matrix F ∈ R^{n×n} for which this bound is
essentially achieved, which shows the first-order bound in (2.3.20) is approximately
attained for such G.

It is easy to verify that D_c = diag(a_jj^{1/2}) = diag(||R(:, j)||_2). Thus R̄, the Cholesky
factor of H (H = D_c^{-1} A D_c^{-1} = D_c^{-1} Rᵀ R D_c^{-1} = R̄ᵀ R̄), has columns of unit Euclidean
length. That is the reason we use the notation D_c, where 'c' denotes 'column'.
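The unit-column-length property of R̄ is easy to verify numerically. A minimal sketch (Python/NumPy, our own names), assuming only the definitions D_c = diag(A)^{1/2} and H = D_c^{-1} A D_c^{-1} used above:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)            # a symmetric positive definite test matrix

Dc = np.diag(np.sqrt(np.diag(A)))      # D_c = diag(a_ii^{1/2})
H = np.linalg.inv(Dc) @ A @ np.linalg.inv(Dc)   # unit diagonal by construction
Rbar = np.linalg.cholesky(H).T         # upper triangular Cholesky factor of H
col_norms = np.linalg.norm(Rbar, axis=0)
```

Since ||R̄(:, j)||_2² = h_jj = 1 for every j, all columns of R̄ have unit Euclidean length, and likewise the column norms of the Cholesky factor R of A reproduce diag(A)^{1/2}.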
Since the first-order bound in (2.3.20) is approximately attainable, the quantity

χ_c(A) ≡ n ||D̃_c W_{R̄}^{-1}||_2 / ||R||_2

can be thought of as the condition number for the Cholesky factorization with
backward rounding error satisfying (2.3.17), when the combination of the F-
and 2-norms is used. Notice here we have relaxed a little the requirement a condition
number should satisfy: strictly, the condition number should be defined by a limiting
supremum as in (2.2.19). But from the proof of Theorem 2.3.1 we see
such relaxing is harmless. The main reason for introducing the condition number
with respect to the combination of the F- and 2-norms, rather than the M-norm, is
that we are interested in the comparison of the results given in this section with those
given in Section 2.2.
We now give
2.2.
a lower bound on x,-(A). Since has the form (cf. (2 .227 ) )
where
Hence we get the following lower bound on χ_c(A):
This bound is tight for any n, since equality will hold by taking R = diag(r_ii), with 0 < r_ii ≤ r_nn, i ≠ n. But it is a little complicated. In fact we can get a slightly weaker but simpler lower bound. Since
we have from (2.3.28) that
Numerical experiments suggest that usually χ_c(A) is smaller or much smaller than κ_c(A), the condition number for a general perturbation ΔA, defined by (2.2.20). We now relate these two condition numbers mathematically for the general case. For the case where standard pivoting is used in computing the Cholesky factorization, see the comment following Theorem 2.3.9.
Theorem 2.3.5
Proof. Since R = R̄D_c, it is easy to verify from the structure of Φ_c that
where V_c is defined by (2.3.21) and
Thus using (2.3.31) and max_i a_ii ≤ ||A||_2, we have
The proof is complete. □
The first inequality in (2.3.30) is attainable, since equality will hold by taking A = cI with c > 0. The second inequality is at least nearly attainable. In fact by taking A = R^T R, with R = diag(δ^{n-1}, δ^{n-2}, ..., δ, 1) + e_1 e_n^T for small δ > 0, we easily obtain
This example also suggests that κ_c(A) is possibly much larger than χ_c(A) if the maximum element on the diagonal of A is much larger than the minimum.
A reader might want to know why the first inequality in (2.3.30) is not χ_c(A) ≤ κ_c(A). This can easily be explained. For a general ΔA, we have by Theorem 2.2.3 that ||ΔR||_F/||R||_2 ≤ κ_c(A)ε, where ε satisfies ||ΔA||_F ≤ ε||A||_2. For the backward rounding error ΔA, we have by Theorem 2.3.4 that ||ΔR||_F/||R||_2 ≤ χ_c(A)ε, where ε satisfies |ΔA| ≤ ε dd^T (see (2.3.17)), from which it follows that ||ΔA||_F ≤ ε||dd^T||_F = ε||R||_F^2, where ||R||_F^2 satisfies the attainable inequalities ||A||_2 ≤ ||R||_F^2 ≤ n||A||_2.
Like κ_c(A), χ_c(A) is difficult to estimate directly. Now we derive practical perturbation bounds by using the matrix equation approach.
Theorem 2.3.6 With A = R^T R and the same assumptions as in Lemma 2.3.1, for the perturbation ΔA and resulting R̃ in (2.3.17) we have
where ΔR = R̃ − R, R̄ ≡ R D_c^{-1}, and
Proof. Let G ≡ ΔA/ε. Since ρ(A^{-1}ΔA) < 1 (see the proof of Theorem 2.3.4), we can apply Theorem 2.2.1 here. From (2.2.6) with R = R̄D_c, it follows that
Then by the Taylor expansion (2.2.3), (2.3.33) follows immediately.
Also from (2.3.38) we have for any D ∈ D_n that
Thus taking the Frobenius norm we have
which with |G| ≤ dd^T gives
Since this holds for any D ∈ D_n, (2.3.34) follows by using the Taylor expansion (2.2.3).
It remains to prove (2.3.35). From (2.3.24) and (2.3.39) we have

||V_c Φ_c^{-1} duvec(D_c^{-1} G D_c^{-1})||_2 ≤ ||R̄^{-1}||_2 ||D_c^{-1} G D_c^{-1}||_F ||R^{-1}D||_2 ||D^{-1}R||_2.   (2.3.40)

Actually this holds for any symmetric G ∈ R^{n×n}, since it was essentially obtained from the matrix equation R^T X + X^T R = G with X triangular by the two different approaches. Notice ||duvec(D_c^{-1} G D_c^{-1})||_2 = ||D_c^{-1} G D_c^{-1}||_F; thus from (2.3.40) we must have

||V_c Φ_c^{-1}||_2 ≤ ||R̄^{-1}||_2 ||R^{-1}D||_2 ||D^{-1}R||_2,

which implies (2.3.35). □
Notice that ||up(X)||_F ≤ ||X||_F/√2 for any symmetric X (see (1.2.7)), and ||R̄||_2^2 = ||H||_2; then from (2.3.38) it is easy to obtain
which is essentially the first-order bound of Drmač, Omladič and Veselić [20, Theorem 3.1], except that their bound is rigorous and only the 2-norm is used. But this bound can severely overestimate the true relative errors of the computed Cholesky factor.
In fact

χ'_c(A) ≤ χ'_c(A, I) = n ||H^{-1}||_2,   (2.3.42)

and n ||H^{-1}||_2 can be arbitrarily larger than χ'_c(A). For example, if A = R^T R with R = [1 1; 0 δ] for small δ > 0, then taking D = diag(1, δ), simple computations give
Thus the new approximation χ'_c(A) to the condition number χ_c(A) is a significant improvement on that of Drmač et al. Furthermore, it is easy to see that ||H^{-1}||_2 is invariant if pivoting is used in computing the Cholesky factorization of A, whereas χ_c(A) and χ'_c(A) depend on any pivoting. Thus the new bounds (2.3.20) and (2.3.34) more closely reflect the true sensitivity of the Cholesky factorization than (2.3.41). However, if the ill-conditioning of R is mostly due to bad scaling of its columns, then n||H^{-1}||_2 is small, and (2.3.41) is as good as χ'_c(A).
We see (2.3.42) implies n||H^{-1}||_2 is also an upper bound on χ_c(A). In fact we have the following stronger result
which can easily be proved by using (2.3.24), (2.3.38) and the fact that for any symmetric X, ||up(X)||_F ≤ ||X||_F/√2. That is to say, the first-order bound in (2.2.20)
is at least as small as that in (2.3.41). The example above suggests the former can be much smaller than the latter.
The practical outcome of Theorem 2.3.6 is that χ'_c(A) is quite easy to estimate. According to (1.2.10) in van der Sluis's Theorem 1.2.1, all we need to do is to choose D = D_r = diag(||R(i, :)||_2) in χ'_c(A, D) in (2.3.37), then use norm estimators (see for example Higham [30, 1996, Ch. 14]) to estimate χ'_c(A, D_r) in O(n^2) operations.
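To illustrate the kind of O(n^2) estimation involved, here is a pure-Python sketch using plain power iteration rather than the estimator of Higham's Ch. 14 (so it is an assumption-laden illustration, not the method cited): ||R^{-1}||_2 for triangular R can be estimated using only triangular solves, each costing O(n^2).

```python
import math

# Sketch of an O(n^2)-per-step estimator for ||R^{-1}||_2 (power iteration
# on (R^T R)^{-1} via triangular solves); an illustration only, not the
# estimator of Higham [30, Ch. 14] referred to in the text.

def forward_solve(R, b):
    # solve R^T x = b (R upper triangular, so R^T is lower triangular)
    n = len(b)
    x = b[:]
    for i in range(n):
        s = sum(R[j][i] * x[j] for j in range(i))
        x[i] = (x[i] - s) / R[i][i]
    return x

def back_solve(R, b):
    # solve R x = b
    n = len(b)
    x = b[:]
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / R[i][i]
    return x

def est_inv_norm2(R, iters=20):
    # the largest eigenvalue of (R^T R)^{-1} equals ||R^{-1}||_2^2
    x = [1.0] * len(R)
    lam = 1.0
    for _ in range(iters):
        y = back_solve(R, forward_solve(R, x))   # y = (R^T R)^{-1} x
        lam = max(abs(v) for v in y)
        x = [v / lam for v in y]
    return math.sqrt(lam)

R = [[2.0, 1.0],
     [0.0, 0.5]]
est = est_inv_norm2(R)

# check against the closed form for this 2x2 example
lam_min = (5.25 - math.sqrt(5.25 ** 2 - 4.0)) / 2.0  # smallest eigenvalue of R^T R
assert abs(est - lam_min ** -0.5) < 1e-6             # est ~ ||R^{-1}||_2
```

Each iteration costs two triangular solves, i.e. O(n^2) operations, which is the cost regime the text refers to.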
Numerical experiments suggest that usually χ'_c(A) is a reasonable approximation to χ_c(A). But the following example shows that χ'_c(A) can still be much larger than χ_c(A):
In Theorem 2.3.6 the F- and 2-norms are used. If we use a monotone and consistent matrix norm ||·||, then from (2.3.38) it is straightforward to show that we have the following perturbation bound instead of (2.3.34):
The advantage here is that the bound does not involve the scaling matrix D.
In Theorem 2.3.5 we established a relationship (2.3.30) between χ_c(A) and κ_c(A) defined in Theorem 2.2.3. Is there a similar relationship between χ'_c(A, D) (or χ'_c(A)) and κ'_c(A, D) (or κ'_c(A)) defined in Theorem 2.2.4? The answer is yes.
Theorem 2.3.7

(1/n)(max_i a_ii/||A||_2) κ'_c(A, D) ≤ χ'_c(A, D) ≤ (||A||_2/min_i a_ii) κ'_c(A, D),

and from this,

(1/n)(max_i a_ii/||A||_2) κ'_c(A) ≤ χ'_c(A) ≤ (||A||_2/min_i a_ii) κ'_c(A).
On the other hand,
The proof is complete. □
The standard pivoting strategy can usually improve χ_c(A) too. In fact we have the following theorem.

Theorem 2.3.8 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky factorization PAP^T = R^T R when the standard pivoting strategy is used. Then
and so from (2.3.37) we obtain
Standard pivoting ensures |r_ii| ≥ |r_ij| for all j > i. Then (2.3.46) follows immediately by ||R̄^{-1}||_2 = ||H^{-1}||_2^{1/2} and (1.2.18) in Theorem 1.2.2. □
In Theorem 2.2.5 we have
By van der Sluis's result (1.2.16),
where it is possible that
κ_2^{1/2}(H) ≪ κ_2^{1/2}(A) if A is badly scaled, that is, if the columns of R are badly scaled. But 1 ≤ ||H||_2 ≤ n since H is positive definite with h_ii = 1, hence
and it is possible that
||H^{-1}||_2^{1/2} ≪ κ_2^{1/2}(A)
if R has badly scaled columns. Thus it is expected that χ_c(PAP^T) can be arbitrarily smaller than κ_c(PAP^T).
Suppose the Cholesky factorization of A is computed by using the standard symmetric pivoting strategy: PAP^T = R^T R. If the permutation matrix P is known beforehand and the Cholesky factorization is applied to PAP^T directly, it is easy to observe that Lemma 2.3.1 still holds except that now D_c and H should be redefined as

D_c = diag(PAP^T)^{1/2},   H ≡ D_c^{-1} PAP^T D_c^{-1},
and A + ΔA = R̃^T R̃ in (2.3.17) should be replaced by

P(A + ΔA)P^T = R̃^T R̃.

Then by Theorem 2.3.4 we have

||R̃ − R||_F/||R||_2 ≤ χ_c(PAP^T) ε.

This with (2.3.46) suggests that the computed Cholesky factor R̃ has high accuracy.
However, usually the permutation matrix P for A is not known beforehand. The permutation matrix for A + ΔA, which is produced in the computing process, is possibly different from that for A. Fortunately, Higham [30, 1996, Lemma 10.11] showed that if there are no ties in the pivoting strategy for PAP^T = R^T R, then for sufficiently small ΔA, the two permutation matrices are the same. Thus the Cholesky factorization with standard symmetric pivoting will most likely give an R̃ which is about as accurate as possible.
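The effect of the standard symmetric pivoting strategy can be sketched in pure Python on a hypothetical 2 x 2 example (an illustration only, not a case from the text): the permutation brings the largest diagonal entry of A to the front, which forces the diagonal of R to be nonincreasing.

```python
import math

# Hypothetical 2x2 SPD matrix whose larger diagonal entry comes second
A = [[1.0, 0.9],
     [0.9, 4.0]]

# Standard symmetric pivoting: bring the largest diagonal entry to the
# front, i.e. apply the permutation P swapping rows/columns 1 and 2
Ap = [[A[1][1], A[1][0]],
      [A[0][1], A[0][0]]]          # Ap = P A P^T

# Cholesky factorization P A P^T = R^T R (explicit 2x2 formulas)
r11 = math.sqrt(Ap[0][0])
r12 = Ap[0][1] / r11
r22 = math.sqrt(Ap[1][1] - r12 * r12)

# Pivoting guarantees a nonincreasing diagonal in R
assert abs(r11) >= abs(r22)

# The factorization reproduces P A P^T
assert abs(r11 * r11 - Ap[0][0]) < 1e-12
assert abs(r11 * r12 - Ap[0][1]) < 1e-12
assert abs(r12 * r12 + r22 * r22 - Ap[1][1]) < 1e-12
```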
2.3.3 Rigorous perturbation bounds with backward rounding errors
Drmač, Omladič and Veselić [20, 1994] obtained rigorous perturbation bounds. For comparison, here we also present our rigorous perturbation bounds, which can be obtained by applying the results in Section 2.2.2.
Theorem 2.3.9 Let A ∈ R^{n×n} be a symmetric positive definite floating point matrix with the exact Cholesky factorization A = R^T R. Define D_c = diag(A)^{1/2} and R̄ = R D_c^{-1}. If

nε ||V_c Φ_c^{-1}||_2^2 / min_i a_ii < 1/4,   (2.3.48)

where ε = (n+1)u/(1 − 2(n+1)u) with u being the unit round-off, and V_c is defined by (2.3.21), then the Cholesky factorization applied to A succeeds (barring underflow
and overflow) and produces a nonsingular R̃, which satisfies
where ΔR ≡ R̃ − R. Obviously the weaker bound above can be rewritten in the following form:
Proof. Let H = D_c^{-1} A D_c^{-1} as before. Then H = R̄^T R̄. From (2.3.29) and (2.3.27),
which with the assumption (2.3.48) implies that (2.3.16) holds. Thus by Lemma 2.3.1 the Cholesky factorization applied to A succeeds and the computed Cholesky factor R̃ satisfies

A + ΔA = R̃^T R̃,   |ΔA| ≤ ε dd^T,
which with the assumption (2.3.48) implies that the condition (2.2.47) of Theorem 2.2.7 is satisfied. Then (2.3.49) is immediately obtained by using Theorem 2.2.7. □
Theorem 2.3.10 Let A ∈ R^{n×n} be a symmetric positive definite floating point matrix with the exact Cholesky factorization A = R^T R. Define D_c ≡ diag(A)^{1/2} and R̄ ≡ R D_c^{-1}. Assume D ∈ D_n. If

nε ||R̄^{-1}||_2 ||R^{-1}D||_2 ||D^{-1}R||_2 < 1/4,   (2.3.51)

where ε ≡ (n+1)u/(1 − 2(n+1)u) with u being the unit round-off, then the Cholesky factorization applied to A succeeds (barring underflow and overflow) and produces a nonsingular R̃, which satisfies
where ΔR ≡ R̃ − R.
Proof. Let H ≡ D_c^{-1} A D_c^{-1}. Since
which with (2.3.51) implies that (2.3.16) holds. Thus by Lemma 2.3.1 the Cholesky factorization applied to A succeeds and the computed Cholesky factor R̃ satisfies
where d_i = a_ii^{1/2}. Then
Hence the bound (2.3.52) can easily be obtained from (2.2.55) in Theorem 2.2.8. □
If we take D = I, it is easy to show, by using the fact that ||up(X)||_F ≤ ||X||_F/√2 for any symmetric X ∈ R^{n×n}, that the assumption (2.3.51) can be weakened to

nε ||H^{-1}||_2 ≤ 1/2,   (2.3.54)
Table 2.3.1: Results for Pascal matrices without pivoting
and we have the following bound:
which is slightly stronger than (2.3.52) with D = I. An equivalent rigorous bound where only the 2-norm is used was obtained by Drmač, Omladič and Veselić [20, 1994].
As we pointed out in the comment following Theorem 2.3.6, with a correct choice of D, e.g. D = D_r = diag(||R(i, :)||_2), χ'_c(A, D) can possibly be much smaller than ||H^{-1}||_2. So the bound (2.3.55) is potentially weak, although the condition (2.3.54) is not as constraining as (2.3.51).

2.3.4 Numerical experiments

Table 2.3.2: Results for Pascal matrices with pivoting, Ā = PAP^T

In Sections 2.3.2 and 2.3.3 we presented new first-order and rigorous perturbation bounds for the changes caused by backward rounding errors for the Cholesky factorization using two different approaches, defined χ_c(A) = n ||V_c Φ_c^{-1}||_2 / ||A||_2^{1/2} as
the condition number of the problem, and suggested that usually χ_c(A) could be estimated in practice by χ'_c(A, D_r) = n ||R̄^{-1}||_2 ||R^{-1}D_r||_2 ||D_r^{-1}R||_2 / ||R||_2, with D_r = diag(||R(i, :)||_2), which can be estimated by standard norm estimators in O(n^2) operations. Our bounds are potentially much smaller than the equivalent bound in Drmač, Omladič and Veselić [20, 1994]. Also we compare χ_c(A) with κ_c(A), and compare the corresponding estimators χ'_c(A, D) with κ'_c(A, D) as well. These condition numbers and condition estimators satisfy the following inequalities (see (1.2.10), (2.3.30), (2.3.35), (2.3.36), (2.3.42), (2.3.43) and (2.3.44)):
Now we give a set of examples to show our findings. The matrices are n × n Pascal matrices. The results are shown in Table 2.3.1 without pivoting and in Table 2.3.2 with pivoting.

Note in Tables 2.3.1 and 2.3.2 how n||H^{-1}||_2 can be worse than χ_c(A). In Table 2.3.2 pivoting is seen to give a significant improvement on χ_c(A). Also we observe from both Tables 2.3.1 and 2.3.2 that χ'_c(A) is a reasonable approximation to χ_c(A). We see χ_c(A) is smaller than κ_c(A) for n > 2.
2.4 Summary and future work
Although with norm-bounded changes in A the Sun [46, 1991] and Stewart [41, 1993] first-order perturbation bound (2.2.8) is relevant in the sense that some problems do come close to attaining the indicated condition, we have shown that it gives a large over-bound for most problems. The more refined bound (2.2.25) obtained by the matrix-vector equation approach is usually significantly stronger, and is never weaker, and the resulting condition number κ_c(A) more accurately reflects the true sensitivity of the problem. Further, the sizes of our condition numbers depend on any symmetric pivoting used, and numerical results and analyses show that the standard symmetric pivoting strategy usually leads to a near optimally conditioned factorization for a given A in PAP^T = R^T R. Because of the difficulty in understanding and computing κ_c(A), there was a need for, and fortunately we have been able to give by the matrix equation approach, a simpler bound. Although the new bound (2.2.29) is somewhat weaker, it provides a computationally practical and useful estimate κ'_c(A, D_r) of κ_c(A), and at the same time gives us insight into why the Cholesky factorization is often less sensitive than we thought, and adds to our understanding as to why standard pivoting usually gives a condition number approaching its lower bound κ_2^{1/2}(A).

For the perturbation ΔA which comes from the backward rounding error analysis,
we first presented first-order (nearly) attainable bounds (see Theorem 2.3.4) by the matrix-vector equation approach, then gave computationally practical bounds (see Theorem 2.3.6) by the matrix equation approach. Even though the latter are weaker than the former, both of them are (potentially) stronger than the corresponding equivalent bound (2.3.41) of Drmač et al. Our condition number χ_c(A) more closely reflects the true sensitivity of the problem. Also, numerical experiments and analysis show that usually the standard symmetric pivoting strategy can significantly improve the condition number χ_c(A). So the computed Cholesky factor most likely has high accuracy when standard symmetric pivoting is used.

With relative changes in the elements of A (i.e. |ΔA| ≤ ε|A|) we presented first-order perturbation analyses, which resulted in the condition number p_c(A) and a practical and useful upper bound p'_c(A) on p_c(A).
In the future we would like to:

- Investigate the ratio κ_c(A)/χ_c(A), which we suspect is bounded by a function of n alone, probably involving something like 2^n.

- Explore the effect of rank-revealing pivoting on κ_c in both theory and computations, and study the optimization problem min_P κ_c(PAP^T).

- Give a better approximation to χ_c(A) than our current χ'_c(A), which can sometimes overestimate χ_c(A), or alternatively look for other methods to estimate χ_c(A) efficiently.

- Give a better approximation to p_c(A) than our current p'_c(A), which can sometimes overestimate p_c(A), or alternatively look for other methods to estimate p_c(A) efficiently.
Chapter 3

The QR Factorization

3.1 Introduction

The QR factorization is an important tool in matrix computations: given an m × n real matrix A with full column rank, there exists a unique m × n real matrix Q with orthonormal columns, and a unique nonsingular upper triangular n × n real matrix R with positive diagonal entries, such that

A = QR.

The matrix Q is referred to as the orthogonal factor, and R the triangular factor.
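For concreteness, here is a modified Gram-Schmidt sketch of this factorization (pure Python, with a hypothetical 3 x 2 example; library routines typically use Householder transformations instead, and the helper name mgs_qr is our own):

```python
import math

# Modified Gram-Schmidt QR of a full-column-rank matrix (sketch)
def mgs_qr(A):
    m, n = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(Q[i][k] ** 2 for i in range(m)))
        for i in range(m):
            Q[i][k] /= R[k][k]                 # normalize column k
        for j in range(k + 1, n):
            R[k][j] = sum(Q[i][k] * Q[i][j] for i in range(m))
            for i in range(m):
                Q[i][j] -= R[k][j] * Q[i][k]   # orthogonalize later columns
    return Q, R

A = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
Q, R = mgs_qr(A)

# R is upper triangular with positive diagonal; Q has orthonormal columns
assert R[1][0] == 0.0 and R[0][0] > 0 and R[1][1] > 0
assert abs(sum(Q[i][0] * Q[i][1] for i in range(3))) < 1e-12
# A is reproduced: A = Q R
for i in range(3):
    for j in range(2):
        aij = sum(Q[i][k] * R[k][j] for k in range(2))
        assert abs(aij - A[i][j]) < 1e-12
```

Keeping the diagonal of R positive is what makes the factors match the unique Q and R described above.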
Let ΔA be a real m × n matrix such that A + ΔA is still of full column rank; then A + ΔA has a unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR).

The goal of the perturbation analysis for the QR factorization is to determine a bound on ||ΔQ|| (or |ΔQ|) and ||ΔR|| (or |ΔR|) in terms of (a bound on) ||ΔA|| (or |ΔA|).

The perturbation analysis for the QR factorization has been considered by several authors. Given ||ΔA||, the first result was presented by Stewart [39, 1977]. That was further modified and improved by Sun [46, 1991]. Using different approaches, Sun [46,
CHAPTER 3. THE QR FACTORIZATION
1991] and Stewart [41, 1993] gave first-order perturbation analyses. Recently a new rigorous perturbation bound for Q alone was given by Sun [49, 1995]. Given |ΔA|, Sun [48, 1992] presented a rigorous analysis for the components of Q and R. For ΔA which has the form of the equivalent backward rounding error (componentwise form) from a numerically stable computation of the QR factorization, Zha [55, 1993] gave a first-order analysis.
The main goal of this chapter is to establish new first-order perturbation bounds given a bound on ||ΔA||, which are sharper than the equivalent results for the R factor in Sun [46, 1991] and Stewart [41, 1993], and more straightforward than the sharp result in Sun [49, 1995] for the Q factor, and to present the corresponding condition numbers, which more closely reflect the true sensitivity of the problem.
The rest of this chapter is organized as follows. In Section 3.2 we obtain expressions for Ṙ(0) and Q̇(0) in the QR factorization A + tG = Q(t)R(t). These basic sensitivity expressions will be used to obtain our new perturbation bounds in Sections 3.3 and 3.4, but in Section 3.2 they are also used to derive Sun's results on the sensitivity of R and Q. In Section 3.3 we give a refined perturbation analysis for Q, showing in a simple way why the standard column pivoting strategy for A can be beneficial for certain aspects of the sensitivity of Q. In Section 3.4 we analyze the perturbation in R, first by the detailed and tight matrix-vector equation approach, then by the straightforward matrix equation approach. We give numerical results and suggest practical condition estimators in Section 3.5. Finally we summarize our findings and point out future work in Section 3.6.
3.2 Rate of change of Q and R, and previous results
Our perturbation bounds for Q will be tighter if we bound separately the perturbations along the column space of A and along its orthogonal complement. Thus we introduce the following notation. For real m × n A, let P_1 be the orthogonal projector onto R(A), and P_2 be the orthogonal projector onto R(A)^⊥. For real m × n ΔA define
so ε^2 = ε_1^2 + ε_2^2. Here we derive the basic results on how Q and R change as A changes. We then derive the first-order results obtained by Sun [46, 1991][49, 1995]. The following theorem summarizes the results we use later.
Theorem 3.2.1 Let A ∈ R^{m×n} be of full column rank n with the QR factorization A = QR, let G be a real m × n matrix, and let ΔA = εG, for some ε ≥ 0. If
where κ_2(A) ≡ ||A^†||_2 ||A||_2 and P_1 is the orthogonal projector onto R(A), then A + ΔA has the unique QR factorization
with ΔR and ΔQ satisfying

ΔR = εṘ(0) + O(ε^2),   ΔQ = εQ̇(0) + O(ε^2),

where Ṙ(0) and Q̇(0) are defined by the unique QR factorization

A(t) = A + tG = Q(t)R(t),   Q^T(t)Q(t) = I,   |t| ≤ ε,

and so satisfy the equations
where the 'up' notation is defined by (1.2.3).
Proof. Take any Q̃ such that [Q, Q̃] is square and orthogonal; then for all |t| ≤ ε
From the inequality (3.2.2) we see ||tQ^T G||_2 ≤ σ_min(A) = σ_min(R) for all |t| ≤ ε. Thus A + tG has full column rank and the unique QR factorization (3.2.6). Notice that R(0) = R, R(ε) = R + ΔR, Q(0) = Q and Q(ε) = Q + ΔQ, so (3.2.3) holds.
It is easy to verify from the algorithm for the QR factorization that Q(t) and R(t) are twice continuously differentiable for |t| ≤ ε. If we differentiate R(t)^T R(t) = A(t)^T A(t) with respect to t and set t = 0, and use A = QR, we obtain (3.2.7), which we will see is a linear equation uniquely defining the elements of the upper triangular Ṙ(0) in terms of the elements of Q^T G. From the upper triangular Ṙ(0)R^{-1} in
we see with the 'up' notation (see (1.2.3)) that (3.2.8) holds. Next, differentiating (3.2.6) at t = 0 gives

G = QṘ(0) + Q̇(0)R,

and combining this with (3.2.8) gives (3.2.9). Finally the Taylor expansions for R(t) and Q(t) about t = 0 give (3.2.4) and (3.2.5) at t = ε. □
By Theorem 3.2.1 we can easily obtain the first-order perturbation bound for R given by Sun [46, 1991] and also by Stewart [41, 1993], and the first-order bound for Q given by Sun [49, 1995].
Theorem 3.2.2 Let A ∈ R^{m×n} be of full column rank n with the QR factorization A = QR, and let ΔA be a real m × n matrix. Define ε = ||ΔA||_F/||A||_2 and ε_1 = ||QQ^T ΔA||_F/||A||_2, see (3.2.1). If (3.2.2) holds, then A + ΔA has the unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR),

where
Proof. Let G ≡ ΔA/ε (if ε = 0, the theorem is trivial); then
Clearly all the conclusions of Theorem 3.2.1 hold here. From (3.2.8) and the fact that for symmetric X, ||up(X)||_F ≤ ||X||_F/√2 (see (1.2.7)), we have
and since κ_2(R) = κ_2(A),
Thus (3.2.10) follows from the Taylor expansion (3.2.4).
If [Q, Q̃] is square and orthogonal, then from (3.2.9) we have
which, with the Taylor expansion (3.2.5), gives the bound (3.2.11) for the Q factor. □
We see √2 κ_2(A) is a measure of the sensitivity of both R and Q, but it is not the actual condition number, since for general A the first-order bounds in (3.2.10) and (3.2.11) are not attainable. Thus √2 κ_2(A) is a condition estimator for both R and Q in the QR factorization.
3.3 Refined analysis for Q
The results of Sun [49, 1995] give about as good as possible overall bounds on the change ΔQ in Q. But by looking at how ΔQ is distributed between R(Q) and its orthogonal complement, and following the ideas in the proof of Theorem 3.2.2, we are able to obtain a result which is tight but, unlike the related tight result in [49], easy to follow. It makes clear exactly where any ill-conditioning lies. From (3.2.5) with [Q, Q̃] square and orthogonal,
and the key is to bound the first term on the right separately from the second. Note from (3.2.9) with G = ΔA/ε and (3.2.1) that
where G can be chosen to give equality here for any given A. Hence
and for the part of ΔQ in R(Q̃) the condition number (with respect to the combination of the F- and 2-norms)

κ_{Q⊥}(A) ≡ lim sup_{ε_2→0} { ||P_2 ΔQ||_F / (ε_2 ||Q||_2) : A + ΔA = (Q + ΔQ)(R + ΔR) }

is given by

κ_{Q⊥}(A) = κ_2(A).
Now for the part of ΔQ in R(Q). We see from (3.2.9) that
which is skew-symmetric with clearly zero diagonal. If we partition Q, G and R as
then from (3.3.2) we have
It is easy to verify for any n > 1 that equalities are obtained by taking G = (q_n y^T, 0), with y nonzero such that ||R_{11}^{-T} y||_2 = ||R_{11}^{-1}||_2 ||y||_2. It follows that the first-order bound is attainable in
so for the part of ΔQ in R(Q) the condition number
In some problems we are mainly (in fact only, if A is square and nonsingular) interested in the change in Q lying in R(Q), and this result shows its bound can be smaller than we previously thought. In particular if A has only one small singular value, and we use the standard column pivoting strategy in computing the QR factorization, then R_{11} will usually be quite well-conditioned compared with R, and we will have ||R_{11}^{-1}||_2 ||A||_2 < κ_2(A). However for some special cases this may not be true, for example the Kahan matrix in Section 3.5, and then a rank-revealing pivoting strategy such as in Hong and Pan [31, 1992] would be required to obtain such an improvement.
We now summarize the above results as the following theorem.

Theorem 3.3.1 Let A = [Q, Q̃][R; 0] be the QR factorization of A ∈ R^{m×n} with full column rank, and let ΔA be a real m × n matrix. Let ε = ||ΔA||_F/||A||_2, ε_1 = ||QQ^T ΔA||_F/||A||_2 and ε_2 = ||Q̃Q̃^T ΔA||_F/||A||_2. If (3.2.2) holds, then there is a unique QR factorization satisfying

A + ΔA = (Q + ΔQ)(R + ΔR),

where
with
3.4 Perturbation analyses for R

In Section 3.2 we saw that the key to deriving first-order perturbation bounds for R in the QR factorization of full column rank A is the equation (3.2.7). As in Chapter 2 for the Cholesky factor, we will now analyze the equation in two ways. The first, the matrix-vector equation approach, gives a sharp bound leading to the condition number κ_R(A) for R in the QR factorization of A; while the second, the matrix equation approach, gives a clear improvement on (3.2.10), and provides an upper bound on κ_R(A). Both approaches provide efficient condition estimators (see Chang and Paige [9, 1995] for the matrix-vector equation approach), and nice results for the special case of AP = QR, where P is a permutation matrix giving the standard column pivoting, but we will only derive the matrix equation versions of these. The tighter but more complicated matrix-vector equation analysis for the case of pivoting is given in [9], and only the results will be quoted here. All our analyses in this section are based on the same assumptions as in Theorem 3.2.2. Most of the results to be given here have been presented in Chang, Paige and Stewart [14, 1996].
3.4.1 Matrix-vector equation analysis for R

The matrix-vector equation approach views the matrix equation (3.2.7) as a large matrix-vector equation. The upper and lower triangular parts of (3.2.7) contain identical information. By using the 'uvec' notation defined by (1.2.4) and the 'vec' notation, we can easily show that the upper triangular part can be rewritten in the following form (for the derivation, see [14]):

W_R uvec(Ṙ(0)) = Z_R vec(Q^T G),   (3.4.1)

where W_R is an n(n+1)/2 × n(n+1)/2 lower triangular matrix whose nonzero elements are the elements r_11; r_12, r_22; r_13, r_23, r_33; ... of R (see [14] for its precise structure).
Since R is nonsingular, W_R is also, and from (3.4.1)

uvec(Ṙ(0)) = W_R^{-1} Z_R vec(Q^T G).

Remembering Ṙ(0) is upper triangular, we see

||Ṙ(0)||_F = ||uvec(Ṙ(0))||_2 = ||W_R^{-1} Z_R vec(Q^T G)||_2 ≤ ||W_R^{-1} Z_R||_2 ||Q^T G||_F,
where for any nonsingular upper triangular R, equality can be obtained by choosing G such that vec(Q^T G) lies in the space spanned by the right singular vectors corresponding to the largest singular value of W_R^{-1} Z_R. Therefore we see from the Taylor expansion (3.2.4),
and this bound is attainable to first order in ε. This implies that for R in the QR factorization of A the condition number (with respect to the combination of the F- and 2-norms) defined by

κ_R(A) ≡ lim sup_{ε→0} { ||ΔR||_F / (ε ||R||_2) : A + ΔA = (Q + ΔQ)(R + ΔR) }

is given by

κ_R(A) = ||W_R^{-1} Z_R||_2.
From the definition of κ_R(A) and Sun's first-order perturbation bound (3.2.10) we easily observe

κ_R(A) ≤ √2 κ_2(A).

This upper bound is achieved if R is an identity matrix, and so is tight.
The structure of W_R and Z_R reveals that each column of W_R is one of the columns of Z_R, and so W_R^{-1} Z_R has an n(n+1)/2 square identity submatrix, giving
We now summarize these results as the following theorem.
Theorem 3.4.1 With the same assumptions as in Theorem 3.2.2, A + ΔA has the unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR),

where, with W_R and Z_R as in (3.4.1),
and the first-order bound in (3.4.5) is attainable. □
Unity is the best constant lower bound on κ_R(A) we can obtain, as can be seen from the following example.

Example 1. Let R = diag(1, δ, ..., δ^{n-1}), 0 < δ ≤ 1. We can easily show
From (3.4.6) we know the first-order perturbation bound in (3.4.5) is at least as good as that in (3.2.10). In fact it can be better by an arbitrary factor. Consider Example 1:

κ_R(A) = √(1 + δ^2),   κ_2(A) = 1/δ^{n-1},

and
We see the first-order perturbation bound (3.2.10) can severely overestimate the effect of a perturbation in A.
Suppose we use the standard column pivoting strategy in AP = QR, where P is a permutation matrix designed so that columns of A are interchanged, during the computation of the reduction, to make the leading diagonal elements of R as large as possible; see Golub and Van Loan [26, 1996, §5.4] for details. We see κ_2(A) in (3.2.10) does not change, but κ_R(A) does change. The following result was shown by Chang and Paige [9, 1995].
Theorem 3.4.2 Let A ∈ R^{m×n} be of full column rank, with the QR factorization AP = QR when the standard column pivoting strategy is used. Then
If R = R_n(θ), where R_n(θ) are the Kahan matrices (see (2.2.41)), then
Theorem 3.4.2 shows that when the standard column pivoting strategy is used, κ_R(AP) is bounded for fixed n no matter how large κ_2(A) is. Many numerical experiments with this strategy suggest that κ_R(AP) is usually close to its lower bound of one. But it is not for the Kahan matrices. Fortunately such examples are rare in practice, and furthermore if we adopt a rank-revealing pivoting strategy, the condition number will most likely be close to its lower bound; see Section 3.5.
3.4.2 Matrix equation analysis for R

As far as we can see, κ_R(A) is unreasonably expensive to compute or estimate directly with the usual approach, except when we use pivoting, in which case κ_R(AP) usually approaches its lower bound of 1. Fortunately, by the matrix equation approach we can obtain an excellent upper bound on κ_R(A).
In the proof of Theorem 3.2.2 we used the expression for Ṙ(0) in (3.2.8) to derive Sun's first-order perturbation bound. Now we again look at (3.2.8), repeated here for clarity:

Ṙ(0) = up[Q^T G R^{-1} + (Q^T G R^{-1})^T] R.

Let D_n be the set of all n × n real positive definite diagonal matrices. For any D = diag(δ_1, ..., δ_n) ∈ D_n, let R = DR̄. Note that for any matrix B we have up(B)D = up(BD). Hence if we define B = Q^T G R̄^{-1}, then

Ṙ(0) = up[Q^T G R̄^{-1} + D^{-1}(Q^T G R̄^{-1})^T D] R̄ = [up(B) + D^{-1} up(B^T) D] R̄.   (3.4.9)
With obvious notation, the upper triangular matrix up(B) + D^{-1} up(B^T) D has (i, j) element b_ij + (δ_j/δ_i) b_ji for i < j, and (i, i) element b_ii. To bound this, we use the following lemma.
where
Proof. Clearly
But by the Cauchy-Schwarz inequality, (b_ij + (δ_j/δ_i) b_ji)^2 ≤ (b_ij^2 + b_ji^2)(1 + (δ_j/δ_i)^2), so
From this (3.4.10) follows. □
We can now bound Ṙ(0) of (3.4.9):
But this is true for all D ∈ D_n, so that

κ'_R(A) ≡ inf_{D ∈ D_n} κ'_R(A, D),

where ζ_D is defined in (3.4.11). This gives the encouraging result
Hence from the Taylor expansion (3.2.4) we have
where from (3.4.17) this is never worse than the bound (3.2.10).
Clearly κ'_R(A) is a measure of the sensitivity of R in the QR factorization. Since κ_R(A) is the condition number for R (see the definition (3.4.3)), certainly we have
Usually (3.4.19) is a strict inequality, but if R is diagonal equality will hold. In fact, take D = R, and let ζ_D = max_{j>i} r_jj/r_ii; then κ'_R(A, D) = √(1 + ζ_D^2). On the other hand, it is straightforward to show κ_R(A) = √(1 + ζ_D^2). So κ_R(A) = κ'_R(A, D), which implies κ_R(A) = κ'_R(A). For another proof of this, see Chang, Paige and Stewart [14, 1996, Remark 5.1].
Now we summarize the above results as the following theorem.

Theorem 3.4.3 With the same assumptions as in Theorem 3.2.2, A + ΔA has the unique QR factorization satisfying

A + ΔA = (Q + ΔQ)(R + ΔR),

where, with κ'_R(A) as in (3.4.15) and (3.4.16),
and zf R zs diagonal the first inequality in (3.4.21) will become an equality. 0
From (3.4.21) we know the first-order perturbation bound (3.4.20) is at least as good as (3.2.10). In fact it can be better by an arbitrary factor, as can also be seen from Example 1. Taking D = R (possible when R is diagonal, so that κ_2(D^{-1}R) = 1), we have κ'_R(A, R) = √(1 + ζ_D²). If we take R = diag(δ^{n-1}, …, δ, 1), 0 < δ ≤ 1, we see κ_2(R) = κ_2(A) = δ^{1-n}, while

κ'_R(A, R) = √(1 + δ^{2(1-n)}),

which is close to the upper bound √2 κ_2(A) for small δ. This shows that relatively small early diagonal elements of R cause poor condition, and suggests that if we do not use pivoting, there is a significant chance that the condition of the problem will approach its upper bound, at least for randomly chosen matrices.
With the standard column pivoting strategy in AP = QR, P a permutation matrix, this analysis also leads simply to a very nice result, even though it is a bit weaker than the tight result (3.4.8).

Theorem 3.4.4. Let A ∈ R^{m×n} be of full column rank, with the QR factorization AP = QR when the standard column pivoting strategy is used. Then (3.4.22) holds, a bound on κ'_R(AP) depending only on n.

Proof. Standard column pivoting ensures |r_ii| ≥ |r_ij| and |r_ii| ≥ |r_jj| for all j ≥ i, so that ζ_D ≤ 1 with D = diag(R). Then (3.4.22) is immediately obtained from (1.2.18) in Theorem 1.2.2 by taking D = diag(R). ∎
This analysis gives some insight as to why R in the QR factorization is less sensitive than the earlier condition estimator √2 κ_2(A) indicated. If the ill-conditioning of R is mostly due to bad scaling of its rows, then a correct choice of D can give κ_2(D^{-1}R) very near one. If at the same time ζ_D is not large, then κ'_R(A, D) in (3.4.16) can be much smaller than √2 κ_2(R); see (3.4.17). Standard pivoting always ensures that such a D exists, and in fact gives (3.4.22).

The significance of this analysis is that it provides an excellent upper bound on κ_R(A), and κ'_R(A) is quite easy to estimate. All we need to do is choose a suitable D in κ'_R(A, D) in (3.4.16). We consider how to do this in the next section.
3.5 Numerical experiments

In Section 3.4 we presented new first-order perturbation bounds for the R factor of the QR factorization using two different approaches, obtained the condition number κ_R(A) for the R factor, and suggested κ_R(A) could be estimated in practice by κ'_R(A, D). Our new first-order results are sharper than previous results for R, and at least as sharp for Q, and we give some numerical experiments to illustrate both this, and the possible estimators for κ_R(A).
We would like to choose D such that κ'_R(A, D) is a good approximation to the minimum κ'_R(A) in (3.4.15), and show that this is a good estimate of the condition number κ_R(A). Then a procedure for obtaining an O(n²) condition estimator for R in the QR factorization (i.e. an estimator for κ_R(A)) is to choose such a D, use a standard condition estimator (see for example Higham [27, 1987]) to estimate κ_2(D^{-1}R), and take κ'_R(A, D) in (3.4.16) as the appropriate estimate.
By van der Sluis's Theorem 1.2.1, κ_2(D^{-1}R) will be nearly minimal when the rows of D^{-1}R are equilibrated. But this could lead to a large ζ_D in (3.4.16). There are three obvious possibilities for D. The first one is choosing D to equilibrate R precisely: take δ_i = ||e_i^T R||_2 for i = 1, …, n. The second one is choosing D to equilibrate R as far as possible while keeping ζ_D ≤ 1: take δ_1 = ||e_1^T R||_2, and for i = 2, …, n take δ_i = ||e_i^T R||_2 if ||e_i^T R||_2 ≤ δ_{i-1}, otherwise δ_i = δ_{i-1}. The third one is choosing δ_i = r_ii. Computations show that the third choice can sometimes cause unnecessarily large estimates, so we will not give any results for that choice. In the following we denote the diagonal matrices D obtained by the first method and the second method by D_1 and D_2 respectively.
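The two scalings D_1 and D_2 are cheap to form. A numpy sketch (the function names and the 3 × 3 test matrix are illustrative choices of ours, not from the experiments below):

```python
import numpy as np

def scalings(R):
    # D1: exact row equilibration of R; D2: equilibrate as far as possible
    # while keeping the delta_i nonincreasing, so that zeta_D <= 1
    rownorms = np.linalg.norm(R, axis=1)
    d1 = rownorms.copy()
    d2 = rownorms.copy()
    for i in range(1, len(d2)):
        d2[i] = min(d2[i], d2[i - 1])
    return np.diag(d1), np.diag(d2)

def kappa_est(R, D):
    # the estimate sqrt(1 + zeta_D^2) * kappa_2(D^{-1} R) of (3.4.16)
    d = np.diag(D)
    n = len(d)
    zeta = max(d[j] / d[i] for i in range(n) for j in range(i + 1, n))
    return np.sqrt(1.0 + zeta**2) * np.linalg.cond(np.linalg.solve(D, R))

R = np.array([[1e-6, 1.0, 1.0],
              [0.0,  1.0, 1.0],
              [0.0,  0.0, 1.0]])
D1, D2 = scalings(R)
print(kappa_est(R, D1), kappa_est(R, D2), np.sqrt(2.0) * np.linalg.cond(R))
```

A standard condition estimator would replace the explicit `np.linalg.cond` call in an O(n²) implementation.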
We give three sets of examples.
(1) The first set of matrices are n × n Pascal matrices, n = 1, 2, …, 15. The results are shown in Table 3.5.1 without pivoting, and in Table 3.5.2 with standard column pivoting. Table 3.5.1 illustrates how the upper bound √2 κ_2(A) can be far worse than the condition number κ_R(A), which itself can be much greater than its lower bound of 1. In Table 3.5.2 standard column pivoting is seen to give a significant improvement on κ_R(A), bringing κ_R(AP) very close to its lower bound, but of course κ_2(AP) = κ_2(A) still. Also we observe from Table 3.5.1 that both κ'_R(A, D_1) and κ'_R(A, D_2) are very good estimates for κ_R(A). The latter is a little better than the former. In Table 3.5.2, κ'_R(AP, D_1) = κ'_R(AP, D_2) (in fact D_1 = D_2), and they are also good estimates for κ_R(AP).
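An experiment of this kind is easy to reproduce; the sketch below builds the symmetric Pascal matrix from binomial coefficients and compares √2 κ_2(A) with the condition number of the row-equilibrated R (the sizes and the use of numpy rather than MATLAB are our assumptions):

```python
import numpy as np
from math import comb

def pascal(n):
    # symmetric Pascal matrix, entry (i, j) = C(i + j, i)
    return np.array([[comb(i + j, i) for j in range(n)] for i in range(n)], float)

for n in (4, 8, 12):
    A = pascal(n)
    Q, R = np.linalg.qr(A)
    D = np.diag(np.linalg.norm(R, axis=1))       # row equilibration of R
    print(n, np.sqrt(2.0) * np.linalg.cond(A), np.linalg.cond(np.linalg.solve(D, R)))
```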
(2) The second set of matrices are 10 × 8 matrices A_j, j = 1, 2, …, 8, which are all obtained from the same random 10 × 8 matrix (produced by the MATLAB function randn), but with its j-th column multiplied by 10^{-8} to give A_j. The results are shown in Table 3.5.3 without pivoting. All the results with pivoting are similar to that for j = 8 in Table 3.5.3, and so are not given here. For j = 1, 2, …, 7, κ_R(A) and κ'_R(A) are both close to their upper bound √2 κ_2(A), but for j = 8, both κ_R(A) and κ'_R(A) are significantly smaller than √2 κ_2(A). All these results are what we expected, since the matrix R is ill-conditioned due to the fact that r_jj is very small, but for j = 1, 2, …, 7 the rows of R are already essentially equilibrated, and we do not expect κ_R(A) to be much better than √2 κ_2(A). Also for the first seven cases the smallest singular value of the leading part R_{n-1} is close to that of R, so that
Table 3.5.1: Results for Pascal matrices without pivoting, A = QR
κ_Q(A) could not be much better than √2 κ_2(A). For j = 8, even though R is still ill-conditioned due to the fact that r_{8,8} is very small, it is not at all equilibrated, and becomes well-conditioned by row scaling. Notice at the same time ζ_D is close to 1, so κ'_R(A, D_1), κ'_R(A, D_2), and therefore κ_R(A), are much better than √2 κ_2(A). In this case, the smallest singular value of R is significantly smaller than that of R_{n-1}. Thus κ_Q(A), the condition number for the change in Q lying in the range of Q, is spectacularly better than √2 κ_2(A). This is a contrived example, but serves to emphasize the benefits of pivoting for the condition of both Q and R.
(3) The third set of matrices are n × n Kahan matrices A = K_n(θ); see (2.2.41). Of course without pivoting Q = I here, but the condition numbers depend on R only, and these are all we are interested in. The results for n = 5, 10, 15, 20, 25 with θ = π/8 are shown in Table 3.5.4, where Π is a permutation such that the first column is moved to the last column position, and the remaining columns are moved left
Table 3.5.2: Results for Pascal matrices with pivoting, AP = Q R
Table 3.5.3: Results for 10 × 8 matrices A_j, j = 1, …, 8, without pivoting
Table 3.5.4: Results for Kahan matrices, θ = π/8, AΠ = QR
one position; this permutation Π is adopted in the rank-revealing QR factorization for Kahan matrices, see Hong and Pan [31, 1992]. Again we found D_1 = D_2, and only list the results corresponding to D_1. As we know, the Kahan matrices correspond to correctly pivoted A by standard column pivoting. From Table 3.5.4 we see that, in these extreme cases, with large enough n, κ_R(A) can be large even with standard pivoting. This is about as bad a result as we can get with standard column pivoting (it gets a bit worse as θ → 0 in R), since the Kahan matrices make the relevant upper bound approximately reachable; see Theorem 3.4.2. However if we use the rank-revealing pivoting strategy, we see from Table 3.5.4 that κ_R(AΠ) is again close to its lower bound of 1. Also we see κ_Q(AΠ) is significantly smaller than κ_Q(A). This is due to the fact that the smallest singular value of R_{n-1} is much smaller than that of R̃_{n-1}, the leading (n − 1) × (n − 1) part of R̃ in AΠ = Q̃R̃. For both of the cases, with and without rank-revealing pivoting, κ'_R(A, D_1) still estimates κ_R(A) excellently.
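Kahan matrices and the permutation Π are easy to generate. The sketch below uses the standard construction K_n(θ) = diag(1, s, …, s^{n-1}) T with s = sin θ and T unit upper triangular with every strictly upper entry equal to −cos θ (the size n = 10 is an illustrative choice):

```python
import numpy as np

def kahan(n, theta):
    # K_n(theta) = diag(1, s, ..., s^(n-1)) * T, s = sin(theta),
    # T unit upper triangular with strictly upper entries -cos(theta)
    s, c = np.sin(theta), np.cos(theta)
    T = np.eye(n) - c * np.triu(np.ones((n, n)), 1)
    return np.diag(s ** np.arange(n)) @ T

n, theta = 10, np.pi / 8
R = kahan(n, theta)
# rank-revealing permutation Pi: cycle column 1 to the last position
Pi = np.eye(n)[:, list(range(1, n)) + [0]]
Rtilde = np.linalg.qr(R @ Pi)[1]
print(np.linalg.cond(R), np.linalg.cond(Rtilde))
```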
In all these examples we see κ'_R(A, D_1) and κ'_R(A, D_2) gave excellent estimates for κ_R(A), with κ'_R(A, D_2) being marginally preferable.
3.6 Summary and future work
The first-order perturbation analyses presented here show just what the sensitivity (condition) of each of Q and R is in the QR factorization of full column rank A, and
in so doing provide their condition numbers (with respect to the measures used, and for sufficiently small ΔA), as well as efficient ways of approximating these. The key norm-based condition numbers we derived for A + ΔA = (Q + ΔQ)(R + ΔR) are:

κ_2(A), for that part of ΔQ in R(A)^⊥, see (3.3.1);

κ_Q(A), for that part of ΔQ in R(A), see (3.3.4);

κ_R(A) for R, with its estimate κ'_R(A) = inf_{D∈D_n} κ'_R(A, D), where κ'_R(A, D) ≡ √(1 + ζ_D²) κ_2(D^{-1}R), see (3.4.16).

The condition numbers obey analogous lower and upper bounds for Q, while for R

1 ≤ κ_R(A) ≤ κ'_R(A) ≤ √2 κ_2(A),

see (3.4.6) and (3.4.21). The numerical examples, and an analysis of the n = 2 case (not given here), suggest that κ'_R(A, D), with D chosen to equilibrate R̄ = D^{-1}R subject to ζ_D ≤ 1, gives an excellent approximation to κ'_R(A) in the general case.
In the special case of A with orthogonal columns, so that R is diagonal, by taking D = R we obtain

κ_R(A) = κ'_R(A, R) = √(1 + ζ_D²),
see Theorem 3.4.3. For general A, when we use the standard column pivoting strategy in the QR factorization, AP = QR, we saw from (3.4.8) and (3.4.22) that κ_R(AP) is bounded by a function of n alone.
As a result of these analyses we see both R and, in a certain sense, Q can be less sensitive than was thought from previous analyses. The condition numbers depend on any column pivoting of A, and show that the standard pivoting strategy often results in a much less sensitive R, and sometimes leads to a much smaller possible change of Q in the range of Q, for a given size of perturbation in A.
By following the approach of Stewart [38, 1973, Th. 3.1], see also [45, 1990, Th. 2.11], it would be straightforward, but detailed and lengthy, to extend our first-order results to provide strict perturbation bounds, as was done in Chapter 2. Our condition numbers and resulting bounds are asymptotically sharp, so there is less need for strict bounds.
In the future we would like to:

• Investigate the ratio κ_R(A)/κ'_R(A).

• Explore the effect of rank-revealing pivoting on κ_R in both theory and computations, and study the optimization problem min_P κ_R(AP).

• Extend our analysis to the case where ΔA has the equivalent componentwise form of backward rounding errors. In fact a new perturbation bound has been given by Chang and Paige [9, 1995]. Also some other results have been obtained by Chang and Paige [11].
Chapter 4
The LU factorization
4.1 Introduction
The LU factorization is a basic and effective tool in numerical linear algebra: given a real n × n matrix A whose leading principal submatrices are all nonsingular, there exist a unique unit lower triangular matrix L and an upper triangular matrix U such that

A = LU.
Notice here we require the diagonal elements of L to be 1. L and U are referred to as the LU factors. The LU factorization is a "high-level" algebraic description of Gaussian elimination. Simple examples show the standard algorithms for the LU factorization are not numerically stable. In order to repair this shortcoming of the algorithms, partial pivoting or complete pivoting is introduced in the computation. For all of these details, see for example Wilkinson [53, 1965, Chap. 4], Higham [30, 1996, Chap. 9] and Golub and Van Loan [26, 1996, Chap. 3].
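The instability referred to here is easy to demonstrate. A minimal numpy sketch of Doolittle elimination without pivoting, applied to the standard 2 × 2 example with a tiny (1,1) pivot (the routine and example are illustrative, not from this thesis):

```python
import numpy as np

def lu_nopivot(A):
    # Doolittle elimination without pivoting: A = L U, L unit lower triangular
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
L, U = lu_nopivot(A)
# the residual is O(1) even though A is well conditioned:
# elimination without pivoting is unstable for this A
print(np.abs(L @ U - A).max())
```

With the rows of A interchanged (partial pivoting) the same routine reproduces A to full working accuracy.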
Let ΔA be a sufficiently small n × n matrix such that the leading principal submatrices of A + ΔA are still all nonsingular; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU).
The goal of the sensitivity analysis for the LU factorization is to determine a bound on ||ΔL|| (or |ΔL|) and a bound on ||ΔU|| (or |ΔU|) in terms of (a bound on) ||ΔA|| (or |ΔA|).
The perturbation analysis of the LU factorization has been considered by a few authors. Given ||ΔA||, the first rigorous normwise perturbation bound was presented by Barrlund [2, 1991]. Using a different approach, Stewart [41, 1993] gave first-order perturbation bounds, which recently were improved by Stewart [42, 1995]. In [42], L was not assumed to be unit lower triangular, and a parameter ρ was used to control how much of the perturbation is attached to the diagonals of L and U. Given |ΔA|, the first rigorous componentwise perturbation bounds were given by Sun [48, 1992].
The main purpose of this chapter is to establish new first-order perturbation bounds given a bound on ||ΔA||, present the condition numbers, give the condition estimators, and shed light on the effect of partial pivoting and complete pivoting on the sensitivity of the problem.
The rest of this chapter is organized as follows. In Section 4.2 we obtain expressions for L̇(0) and U̇(0) in the LU factorization A + tG = L(t)U(t). These basic sensitivity expressions will be used to obtain our new perturbation bounds in Section 4.3. In Section 4.3 we present perturbation results, first by the so-called matrix-vector equation approach, which leads to sharp bounds, then by the so-called matrix equation approach, which leads to weaker but practical bounds. We give numerical examples in Section 4.4. Finally we briefly summarize our findings and point out future work in Section 4.5.
4.2 Rates of change of L and U

Here we derive the basic results on how L and U change as A changes, which will be used later.
Theorem 4.2.1. Let A ∈ R^{n×n} have nonsingular leading k × k principal submatrices for k = 1, …, n, with the LU factorization A = LU, let G be a real n × n matrix, and let ΔA = εG for some ε ≥ 0. If ε is sufficiently small that all leading principal submatrices of A + tG are nonsingular for all |t| ≤ ε, then A + ΔA has the LU factorization

A + ΔA = (L + ΔL)(U + ΔU),   (4.2.1)

with ΔL and ΔU satisfying

ΔL = ε L̇(0) + O(ε²),   (4.2.2)

ΔU = ε U̇(0) + O(ε²),   (4.2.3)

where L̇(0) and U̇(0) are defined by the unique LU factorization

A + tG = L(t) U(t),   (4.2.4)

and so satisfy the equation

L̇(0) U + L U̇(0) = G.   (4.2.5)

Proof. Since all leading principal submatrices of A + tG are nonsingular for all |t| ≤ ε, A + tG has the unique LU factorization (4.2.4). Note that L(0) = L, L(ε) = L + ΔL, U(0) = U and U(ε) = U + ΔU. When t = ε, (4.2.4) becomes (4.2.1). It is easy to observe from a standard algorithm for the LU factorization that L(t) and U(t) are twice continuously differentiable for |t| ≤ ε. If we differentiate (4.2.4) and set t = 0 in the result, we obtain (4.2.5), which we will see is a linear equation uniquely defining the elements of the strictly lower triangular L̇(0) and the upper triangular U̇(0) in terms of the elements of G. From (4.2.5) we have

L^{-1} L̇(0) + U̇(0) U^{-1} = L^{-1} G U^{-1}.
Note that L^{-1} L̇(0) is strictly lower triangular and U̇(0) U^{-1} is upper triangular; thus we have

L^{-1} L̇(0) = slt(L^{-1} G U^{-1}),   U̇(0) U^{-1} = ut(L^{-1} G U^{-1}),

which give (4.2.6) and (4.2.7). Finally the Taylor expansions for L(t) and U(t) about t = 0 give (4.2.2) and (4.2.3). ∎
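The expressions (4.2.6) and (4.2.7) can be checked against a finite-difference approximation. A numpy sketch (the unpivoted factorization routine and the symmetric positive definite test matrix, chosen so that all leading minors are safely nonsingular, are our own assumptions):

```python
import numpy as np

def lu_nopivot(A):
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # SPD, so all leading minors are nonsingular
G = rng.standard_normal((n, n))
L, U = lu_nopivot(A)

F = np.linalg.solve(L, G) @ np.linalg.inv(U)   # L^{-1} G U^{-1}
Ldot = L @ np.tril(F, -1)                      # L * slt(L^{-1} G U^{-1}), cf. (4.2.6)
Udot = np.triu(F) @ U                          # ut(L^{-1} G U^{-1}) * U,  cf. (4.2.7)

t = 1e-7
Lt, Ut = lu_nopivot(A + t * G)                 # divided differences approximate the derivatives
print(np.abs((Lt - L) / t - Ldot).max(), np.abs((Ut - U) / t - Udot).max())
```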
4.3 New perturbation results

The basis for deriving first-order perturbation bounds is the equation (4.2.5) (or the expressions (4.2.6) and (4.2.7) for its solution). As in the preceding chapters, we will now analyze the equation in two ways. The first, the matrix-vector equation approach, provides sharp bounds, resulting in the condition numbers of the problem; while the second, the matrix equation approach, gives practical bounds, resulting in condition estimators. Throughout this section we suppose all assumptions in Theorem 4.2.1 hold, so we can use its conclusions. Also we assume ||ΔA||_F ≤ ε ||A||_F, hence ||G||_F = ||ΔA||_F/ε ≤ ||A||_F (if ε = 0, all results we will present are obviously true). One exception is Theorem 4.3.3, where we assume ||ΔA||_{1,∞} ≤ ε ||A||_{1,∞}.

4.3.1 Matrix-vector equation analysis

It is not difficult to show that with the 'uvec' notation and 'slvec' notation in (1.2.4) the matrix equation (4.2.5) can be rewritten in the following form:

W [ slvec(L̇(0)) ; uvec(U̇(0)) ] = vec(G),   (4.3.1)
where W = [W_L, W_U], with W_L ∈ R^{n² × n(n-1)/2} and W_U ∈ R^{n² × n(n+1)/2}. It is easy to observe that after appropriate column permutations, [W_L, W_U] becomes lower triangular, with diagonal elements the u_jj and 1's. Since U is nonsingular, W is also nonsingular, and from (4.3.1)

[ slvec(L̇(0)) ; uvec(U̇(0)) ] = W^{-1} vec(G).
Partitioning W^{-1} into two blocks of rows, Y_L ∈ R^{n(n-1)/2 × n²} and Y_U ∈ R^{n(n+1)/2 × n²}, we have

slvec(L̇(0)) = Y_L vec(G),   uvec(U̇(0)) = Y_U vec(G),   (4.3.3)

so taking the 2-norm and using ||G||_F ≤ ||A||_F gives

||L̇(0)||_F ≤ ||Y_L||_2 ||A||_F,   ||U̇(0)||_F ≤ ||Y_U||_2 ||A||_F,

where equalities can be obtained by choosing G such that vec(G) lies in the space spanned by the right singular vectors corresponding to the largest singular values of Y_L and Y_U, respectively, with ||G||_F = ||A||_F. Therefore we see from the Taylor expansions (4.2.2) and (4.2.3) that

||ΔL||_F ≤ ||Y_L||_2 ||A||_F ε + O(ε²),   ||ΔU||_F ≤ ||Y_U||_2 ||A||_F ε + O(ε²),

and for the L factor and the U factor we obtain the condition numbers (with respect to the F-norm)

κ_L(A) ≡ lim sup_{ε→0} { ||ΔL||_F / (ε ||L||_F) : A + ΔA = (L + ΔL)(U + ΔU), ||ΔA||_F ≤ ε ||A||_F },

κ_U(A) ≡ lim sup_{ε→0} { ||ΔU||_F / (ε ||U||_F) : A + ΔA = (L + ΔL)(U + ΔU), ||ΔA||_F ≤ ε ||A||_F },

so that κ_L(A) = ||Y_L||_2 ||A||_F / ||L||_F and κ_U(A) = ||Y_U||_2 ||A||_F / ||U||_F.
We summarize these results as the following theorem.

Theorem 4.3.1. Suppose all the assumptions of Theorem 4.2.1 hold, and let ||ΔA||_F ≤ ε ||A||_F; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

where

||ΔL||_F / ||L||_F ≤ κ_L(A) ε + O(ε²),   ||ΔU||_F / ||U||_F ≤ κ_U(A) ε + O(ε²),

and these bounds are attainable to first order in ε. ∎
4.3.2 Matrix equation analysis

In Section 4.3.1 we derived sharp perturbation bounds for the L factor and the U factor, and presented the corresponding condition numbers. But it is difficult to estimate the condition numbers by the usual approach. Now we use the matrix equation approach to derive practical perturbation bounds, leading to condition estimators.
First we derive a perturbation bound for the L factor. Let U_{n-1} denote the leading (n − 1) × (n − 1) block of U, and write

U = [ U_{n-1}  u ; 0  u_{nn} ].

Then from (4.2.6),

L̇(0) = L [ slt(L^{-1} G_{n-1} U_{n-1}^{-1}), 0 ],   (4.3.10)

where G_{n-1} denotes the first n − 1 columns of G. Denote by D_n the set of all n × n real positive definite diagonal matrices. Let D = diag(δ_1, …, δ_n) ∈ D_n. Note that for any matrix B we have D slt(B) = slt(DB); then from (4.3.10) we obtain

L̇(0) = L D^{-1} [ slt(D L^{-1} G_{n-1} U_{n-1}^{-1}), 0 ].

Note ||slt(B)||_F ≤ ||B||_F for any B, so we have

||L̇(0)||_F ≤ ||L D^{-1}||_2 ||D L^{-1}||_2 ||U_{n-1}^{-1}||_2 ||G||_F.

Since this is true for all D ∈ D_n, by the Taylor expansion (4.2.2) we have

||ΔL||_F / ||L||_F ≤ κ'_L(A) ε + O(ε²),   (4.3.12)

where

κ'_L(A) ≡ inf_{D∈D_n} κ'_L(A, D),   κ'_L(A, D) ≡ κ_2(L D^{-1}) ||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F.
From the definition of κ_L(A) and the perturbation bound (4.3.12) we easily see

κ_L(A) ≤ κ'_L(A).   (4.3.15)

Also we can obtain a lower bound on κ_L(A). Let v ∈ R^{n-1} be such that ||U_{n-1}^{-T} v||_2 = ||U_{n-1}^{-T}||_2 ||v||_2. In (4.3.10), take G = [e_n v^T, 0], where e_n = (0, …, 0, 1)^T ∈ R^n. Then it is easy to verify that

L̇(0) = e_n [ v^T U_{n-1}^{-1}, 0 ].

Combining this with the first equality of (4.3.3), we have for this special G that ||Y_L||_2 ≥ ||U_{n-1}^{-1}||_2, which gives

||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F ≤ κ_L(A).   (4.3.16)

We would like to point out that (4.3.16) can also be derived directly from the structure of W in (4.3.1).
Now we derive a practical perturbation bound for the U factor. Notice for any B ∈ R^{n×n} and D ∈ D_n we have ut(BD) = ut(B)D; then from (4.2.7) we obtain

U̇(0) = ut(L^{-1} G U^{-1} D) D^{-1} U,

so that

||U̇(0)||_F ≤ ||L^{-1}||_2 ||U^{-1} D||_2 ||D^{-1} U||_2 ||G||_F.

Since this is true for all D ∈ D_n, by the Taylor expansion (4.2.3) we have

||ΔU||_F / ||U||_F ≤ κ'_U(A) ε + O(ε²),   (4.3.19)

where

κ'_U(A) ≡ inf_{D∈D_n} κ'_U(A, D),   κ'_U(A, D) ≡ κ_2(D^{-1} U) ||L^{-1}||_2 ||A||_F / ||U||_F.

From the definition of κ_U(A) and the perturbation bound (4.3.19) we see

κ_U(A) ≤ κ'_U(A).

Also we can get a lower bound on κ_U(A). Let v ∈ R^n be such that ||L^{-1} v||_2 = ||L^{-1}||_2 ||v||_2, and take G = v e_n^T in (4.2.7); then combining (4.2.7) and the second equality of (4.3.3) we can easily show

||L^{-1}||_2 ||A||_F / ||U||_F ≤ κ_U(A).   (4.3.23)

Like (4.3.16), (4.3.23) can also be shown directly from the structure of W in (4.3.1).
These results lead to the following theorem.
Theorem 4.3.2. With the same assumptions as in Theorem 4.3.1, A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

where for the L factor

||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F ≤ κ_L(A) ≤ κ'_L(A) ≡ inf_{D∈D_n} κ'_L(A, D),   (4.3.26)

and for the U factor

||L^{-1}||_2 ||A||_F / ||U||_F ≤ κ_U(A) ≤ κ'_U(A) ≡ inf_{D∈D_n} κ'_U(A, D).   (4.3.28) ∎
Noting that

||A||_F ≤ ||L||_F ||U||_F,   (4.3.29)

we then have

κ'_L(A, D) ≤ κ_2(L D^{-1}) ||U_{n-1}^{-1}||_2 ||U||_F,   (4.3.30)

κ'_U(A, D) ≤ κ_2(D^{-1} U) ||L^{-1}||_2 ||L||_F.   (4.3.31)

As we know, in practice κ_2(L) is usually smaller or much smaller than κ_2(U). So these bounds suggest the L factor may be more sensitive than the U factor in practice. However both of the right-hand sides of (4.3.30) and (4.3.31) can be arbitrarily larger than the corresponding left-hand sides, due to the inequality (4.3.29).
If we take D = I in both κ'_L(A, D) and κ'_U(A, D) and use ||L||_2 ≤ ||L||_F, ||U||_2 ≤ ||U||_F and ||U_{n-1}^{-1}||_2 ≤ ||U^{-1}||_2, we obtain the simpler bounds (4.3.32) and (4.3.33). Thus from (4.3.23) and (4.3.27) we have the first-order bounds (4.3.34) and (4.3.35), which are due to Stewart [41, 1993]. These perturbation bounds are simple, but can overestimate the true sensitivity of the problem. By using a scaling technique, Stewart [42, 1995] obtained significant improvements on the above results. In [42], the diagonal elements of L were not assumed to be 1's, the diagonal elements of ΔL may not be 0's, and a parameter ρ was used to control how much of the perturbation is attached to the diagonals of L and U. The perturbation bounds given in [42] are equivalent to (4.3.36) and (4.3.37). These bounds were derived by using the inequality (4.3.29), so they are unnecessarily weak, as we pointed out in the preceding comment. (4.3.30) suggests that under the usual assumption that the diagonal elements of L are always 1's, a better bound than (4.3.36) could be obtained.
As we know, it is expensive to estimate κ_L(A) and κ_U(A) directly by the usual approach. Fortunately we now have other methods to do this. By van der Sluis's Theorem 1.2.1, κ_2(L D^{-1}) will be nearly minimal when each column of L D^{-1} has unit 2-norm, so in practice we choose D = D_L = diag(||L(:, j)||_2), then use a standard condition estimator and a norm estimator to estimate κ'_L(A, D_L), which costs O(n²). Similarly, κ_2(D^{-1} U) will be nearly minimal when each row of D^{-1} U has unit 2-norm, so we choose D = D_U = diag(||U(i, :)||_2), and use a standard condition estimator and a norm estimator to estimate κ'_U(A, D_U), which costs O(n²). Numerical experiments showed κ'_L(A, D_L) and κ'_U(A, D_U) are good approximations of κ_L(A) and κ_U(A).
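In numpy the two scalings are one-liners. The sketch below builds a random unit lower triangular L and a graded U (illustrative data, not from the experiments of Section 4.4), forms D_L and D_U, and compares condition numbers before and after scaling; explicit `cond` calls stand in for the O(n²) estimators:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)
U = np.triu(rng.standard_normal((n, n)), 1) + np.diag(np.geomspace(1.0, 1e-4, n))

D_L = np.diag(np.linalg.norm(L, axis=0))   # column 2-norms of L
D_U = np.diag(np.linalg.norm(U, axis=1))   # row 2-norms of U

# after scaling, the columns of L D_L^{-1} and rows of D_U^{-1} U have unit 2-norm
print(np.linalg.cond(L), np.linalg.cond(L @ np.linalg.inv(D_L)))
print(np.linalg.cond(U), np.linalg.cond(np.linalg.solve(D_U, U)))
```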
When we use the 1- and ∞-norms, we can get perturbation bounds without involving the scaling matrix D.
Theorem 4.3.3. Suppose all the assumptions of Theorem 4.2.1 hold, and let ||ΔA||_p ≤ ε ||A||_p, p = 1, ∞; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

with the bounds (4.3.38)-(4.3.41).

Proof. Let G ≡ ΔA/ε (if ε = 0, the theorem is trivial); then ||G||_p ≤ ||A||_p, p = 1, ∞. From (4.3.10), taking the p-norm (p = 1, ∞) gives a bound on ||L̇(0)||_p, and then (4.3.38) follows immediately from the Taylor expansion (4.2.2). By using ||A||_p ≤ ||L||_p ||U||_p, (4.3.39) follows. The results (4.3.40) and (4.3.41) can similarly be proved. ∎

Note cond_p(L^{-1}) is invariant under column scaling of L, and cond_p(U) is invariant under row scaling of U. These make (4.3.38) and (4.3.40) look simpler than (4.3.8) and (4.3.9), respectively, where the Frobenius norm is used.
As we know, the standard algorithms for the LU factorization with no pivoting are not numerically stable. In order to repair this shortcoming of the algorithms, partial or complete pivoting should be incorporated in the computation. Do these two pivoting strategies have effects on the sensitivity of the factorization? Let us see the following theorem.
Theorem 4.3.4. If partial pivoting is used in the LU factorization, PA = LU, where P is a permutation matrix, then (4.3.42) holds, and

1 ≤ κ_U(PA) ≤ κ'_U(PA) ≤ √(n(n+1)(4^n + 6n − 1)/6) inf_{D∈D_n} κ_2(D^{-1} U).   (4.3.43)

If complete pivoting is used in the LU factorization, PAQ = LU, where P and Q are permutation matrices, then (4.3.44) and (4.3.45) hold.

Proof. If partial pivoting or complete pivoting is used in computing the LU factorization, then |l_ij| ≤ 1 for i > j. Since l_ii = 1, |(L^T)_ii| ≥ |(L^T)_ij| for all j > i. By the proof of Theorem 1.2.2 we see ||L^{-1}||_2 ≤ √((4^n + 6n − 1)/3). Also note ||L||_F ≤ √(n(n+1)/2). Then (4.3.42) and (4.3.44) follow from (4.3.26) by taking D = I.

Note κ'_U(A, D) ≤ κ_2(D^{-1} U) ||L^{-1}||_2 ||L||_F (see (4.3.31)) and ||U||_F = ||L^{-1} A||_F ≤ ||L^{-1}||_2 ||A||_F; then (4.3.43) follows from (4.3.28). If complete pivoting is used, then |u_ii| ≥ |u_ij| for all j > i. Thus by Theorem 1.2.2 we have κ_2(D^{-1} U) ≤ √(n(n+1)(4^n + 6n − 1)/6) with D = diag(U). Then from (4.3.43) we obtain the much better result (4.3.45). ∎
When partial pivoting or complete pivoting is used, κ_2(L) is bounded by a function of n, but possibly ||U_{n-1}^{-1}||_2 may become larger; thus from (4.3.42) and (4.3.44) we cannot tell whether κ_L(PA) and κ_L(PAQ) are larger or smaller than κ_L(A). (4.3.42) and (4.3.44) also suggest there is no big difference between the effects of partial pivoting and complete pivoting on the sensitivity of the L factor. Similarly, from (4.3.43) we are not quite sure if κ_U(A) will become larger or smaller when partial pivoting is used. But note there is an essential difference between the upper bound on κ_U(PA) and that on κ_L(PA): the former has a choice of D, which may make inf_{D∈D_n} κ_2(D^{-1} U) not increase much, so the possibility that κ_U(A) will become smaller seems high. From (4.3.45) we see complete pivoting can give a significant improvement on κ_U(A).
4.4 Numerical experiments

In Section 4.3 we presented sharp first-order perturbation bounds for the LU factors, obtained the corresponding condition numbers κ_L(A) and κ_U(A), and suggested κ_L(A) and κ_U(A) could be respectively estimated in practice by κ'_L(A, D_L) and κ'_U(A, D_U), with D_L = diag(||L(:, j)||_2) and D_U = diag(||U(i, :)||_2). The condition numbers and condition estimators satisfy the inequalities in (1.2.14), (1.2.15), (4.3.25), (4.3.26), (4.3.32) and (4.3.33). Also we discussed the effects of partial pivoting and complete pivoting on those condition numbers.
Now we give some numerical experiments to illustrate our theoretical analyses.
The matrices have the form A = D_1 B D_2, where D_1 = diag(1, d_1, …, d_1^{n-1}), D_2 = diag(1, d_2, …, d_2^{n-1}), and B is an n × n random matrix (produced by the MATLAB function randn). The results for n = 10, d_1, d_2 ∈ {0.2, 1, 2} and the same matrix B are shown in Table 4.4.1 without pivoting, in Table 4.4.2 with partial pivoting, and in Table 4.4.3 with complete pivoting.
We give some comments on the results.
Table 4.4.1: Results without pivoting, A = LU
Table 4.4.2: Results with partial pivoting, PA = LU
Table 4.4.3: Results with complete pivoting, PAQ = LU
• The results confirm that β ≡ ||L^{-1}||_2 ||U^{-1}||_2 ||A||_F can be much larger than κ_L(A) and κ_U(A), especially the latter, so the first-order bounds (4.3.34) and (4.3.35) can significantly overestimate the true sensitivity of the LU factorization.
• κ'_L(A, D_L) and κ'_U(A, D_U) are good approximations of κ_L(A) and κ_U(A), respectively, no matter whether pivoting is used or not. This is also confirmed by our other numerical experiments.
• Both κ_L(PA) and κ_L(PAQ) can be much larger or smaller than κ_L(A). So partial pivoting and complete pivoting can make the L factor more sensitive or less sensitive. But from Tables 4.4.1-4.4.2 we see partial pivoting can give a significant improvement on the condition of the U factor. In fact here κ_U(PA) ≤ κ_U(A) for all cases. From Table 4.4.3 we see that complete pivoting can give a more significant improvement.
• It can be seen that for most cases the L factor is more sensitive than the U factor, no matter whether pivoting is used or not.
• When partial pivoting or complete pivoting is used, we see both κ_L and κ_U are close to their lower bounds β_L and β_U, respectively.
4.5 Summary and future work

The first-order perturbation analyses presented here show what the sensitivity of each of L and U is in the LU factorization of A, and in so doing provide their condition numbers κ_L(A) and κ_U(A) (with respect to the measures used, and for sufficiently small ΔA), as well as efficient ways of approximating these.
As we know, κ_2(L) is usually (much) smaller than κ_2(U), especially in practice when we use partial pivoting in computing the LU factorization. So we can expect that the computed solution of the linear system Lx = b will usually be more accurate than that of the linear system Uy = b. However, our analysis and numerical experiments suggest that usually the L factor is more sensitive than the U factor in the LU factorization, so we expect U to be more accurate than L. This is an interesting phenomenon. Also we see the effect of partial pivoting and complete pivoting on the sensitivity of L is uncertain: both κ_L(PA) and κ_L(PAQ) can be much larger or smaller than κ_L(A). But partial pivoting can usually improve the condition of U, and complete pivoting can give a significant improvement.
In the future we would like to:

• Investigate the ratios κ_L(A)/κ'_L(A) and κ_U(A)/κ'_U(A).

• Extend our analysis to the case where |ΔA| ≤ ε|A|. In fact some results have been obtained by Chang and Paige [12].
Chapter 5
The Cholesky downdating
problem
5.1 Introduction
In this chapter we give perturbation analyses of the following problem: given an upper triangular matrix R ∈ R^{n×n} and a matrix X ∈ R^{k×n} such that R^T R − X^T X is positive definite, find an upper triangular matrix U ∈ R^{n×n} with positive diagonal elements such that

U^T U = R^T R − X^T X.   (5.1.1)
This problem is called the block Cholesky downdating problem, and the matrix U is referred to as the downdated Cholesky factor. The block Cholesky downdating problem has many important applications, and the case k = 1 has been extensively studied in the literature (see [1, 5, 6, 19, 25, 24, 36, 37, 40]).
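A mathematically direct (though numerically naive) way to compute the downdated factor is to form R^T R − X^T X explicitly and take its Cholesky factor; careful implementations use orthogonal transformations instead, but the sketch below is convenient for experiments (all names and sizes are illustrative, and the positive definite matrix M is built first so that R^T R − X^T X is positive definite by design):

```python
import numpy as np

def block_downdate(R, X):
    # U with U^T U = R^T R - X^T X, via an explicit Cholesky factorization
    C = R.T @ R - X.T @ X
    return np.linalg.cholesky(C).T     # numpy returns the lower factor

rng = np.random.default_rng(4)
n, k = 6, 2
M = rng.standard_normal((n, n))
M = M @ M.T + np.eye(n)                # positive definite target for U^T U
X = rng.standard_normal((k, n))
R = np.linalg.cholesky(M + X.T @ X).T  # so that R^T R - X^T X = M
U = block_downdate(R, X)
print(np.abs(U.T @ U - M).max())
```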
Let ΔR and ΔX be real n × n and k × n matrices, respectively, such that (R + ΔR)^T(R + ΔR) − (X + ΔX)^T(X + ΔX) is still positive definite; then this has the unique Cholesky factorization

(U + ΔU)^T(U + ΔU) = (R + ΔR)^T(R + ΔR) − (X + ΔX)^T(X + ΔX).
The goal of the perturbation analysis for the Cholesky downdating problem is to determine a bound on ||ΔU|| (or |ΔU|) in terms of ||ΔR|| (or |ΔR|) and ||ΔX|| (or |ΔX|).
The perturbation analysis for the Cholesky downdating problem with norm-bounded changes in R and X has been considered by several authors. Stewart [40, 1979] presented perturbation results for single downdating (i.e. k = 1). Eldén and Park [22, 1994] made an analysis for block downdating. But these two papers just considered the case where only R or X is perturbed. More complete analyses, with both R and X being perturbed, were given by Pan [35, 1993] and Sun [50, 1995]. Pan [35, 1993] gave first-order perturbation bounds for single downdating. Sun [50, 1995] gave rigorous, also first-order, perturbation bounds for single downdating, and first-order perturbation bounds for block downdating. Recently Eldén and Park [23, 1996] gave new first-order perturbation bounds for block downdating. Unfortunately there was an error in their paper when the result of Sun [46, 1991] was applied in deriving the perturbation bound. Because of this, the results presented in [23] will not be cited in this chapter.
The main purpose of this chapter is to establish new first-order perturbation results and present new condition numbers which more closely reflect the true sensitivity of the problem. In Section 5.2 we will give the key result of Sun [50, 1995], and a new result using the approach of these earlier papers. In Section 5.3 we present new perturbation results, first by the matrix-vector equation approach, then by the matrix equation approach. We give numerical results and suggest practical condition estimators in Section 5.4. Finally we briefly summarize our findings and point out future work in Section 5.5. Most of the results have been presented in Chang and Paige [10, 1996].
Previous work by others implied the change ΔR in R was upper triangular, and Sun [50, 1995] said this, but neither he nor the others made use of this fact. In fact a backward stable algorithm for computing U given R and X would produce the exact result U_C = U + ΔU for nearby data R + ΔR and X + ΔX, where it is not clear that ΔR would be upper triangular; the form of the equivalent backward rounding error ΔR would depend on the algorithm, and if it were upper triangular, it would require a rounding error analysis to show this. Thus for completeness it seems necessary to consider two separate cases, general ΔR and upper triangular ΔR. We do this throughout Sections 5.3-5.4, and get stronger results for upper triangular ΔR than in the general case.
In any perturbation analysis it is important to examine how good the results are.
In Section 5.3.1 we produce provably tight bounds, leading to the true condition
numbers (for the norms chosen). The numerical example in Section 5.4 indicates how
much better the results of this new analysis can be compared with some earlier ones,
but a theoretical understanding is also desirable. By considering the asymptotic case
as X → 0, the results simplify, and are easily understandable. We show the new
results have the correct properties as X → 0, in contrast to earlier results.
5.2 Basics, previous results, and an improvement
Let Γ satisfy Γ^T Γ = I_n − R^{-T} X^T X R^{-1} (so Γ would be the Cholesky factor of I_n − R^{-T} X^T X R^{-1}), and let σ_n(Γ) be the smallest singular value of Γ. Notice that for
fixed R, Γ^T Γ → I_n as X → 0, so σ_n(Γ) → 1. First we derive some relationships
among U, R, X and Γ. 1) From (5.1.1) obviously we have
2) From (5.1.1) it follows that
so that taking the 2-norm gives
3) From (5.1.1) we have
which, combined with (5.2.2), gives
4) From (5.1.1) we have
which, combined with (5.2.2), gives
5) By (5.2.2) we have
6) Finally from (5.2.4) we see
Now we derive the basic result on how U changes as R and X change.
Theorem 5.2.1 Suppose we have an upper triangular matrix R ∈ R^{n×n} and a matrix
X ∈ R^{k×n} with the Cholesky factorization U^T U = R^T R − X^T X, where U ∈ R^{n×n} is
upper triangular with positive diagonal elements. Let G be a real n × n matrix, and
let F be a real k × n matrix. Assume ΔR = εG and ΔX = εF, for some ε ≥ 0. If
(5.2.7) holds, then there is a unique Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),   (5.2.8)

with ΔU satisfying

ΔU = ε U̇(0) + O(ε²),   (5.2.9)

where U̇(0) is defined by the unique Cholesky factorization (5.2.10),
and so satisfies the equations (5.2.11) and (5.2.12),
where the 'up' notation is defined by (1.2.3).
Proof. If ||ΔR R^{-1}||₂ < 1, then it is easy to show R + tG is nonsingular for all |t| ≤ ε.
Notice for all |t| ≤ ε,
and
then if (5.2.7) holds, (R + tG)^T (R + tG) − (X + tF)^T (X + tF) is positive definite and has
the unique Cholesky factorization (5.2.10). Notice that U(0) = U and U(ε) = U + ΔU,
so (5.2.8) holds.
It is easy to verify that U(t) is twice continuously differentiable for |t| ≤ ε from
the algorithm for the Cholesky factorization. If we differentiate (5.2.10) and set t = 0
in the result, we obtain (5.2.11), which, like (2.2.5), is a linear equation uniquely
defining the elements of upper triangular U̇(0) in terms of the elements of G and F.
With the 'up' notation in (1.2.3) we see (5.2.12) holds. Finally the Taylor expansion
for U(t) about t = 0 gives (5.2.9) at t = ε. □
By Theorem 5.2.1 we derive new first-order perturbation bounds for the block
Cholesky downdating problem, from which the first-order perturbation bound given
by Sun [50, 1995] can be derived.
Theorem 5.2.2 Suppose we have an upper triangular matrix R ∈ R^{n×n} and a matrix
X ∈ R^{k×n} with the Cholesky factorization U^T U = R^T R − X^T X, where U ∈ R^{n×n} is
upper triangular with positive diagonal elements. Let ΔR be a real n × n matrix, and
let ΔX be a real k × n matrix. Define ε_R ≡ ||ΔR||_F/||R||₂ and ε_X ≡ ||ΔX||_F/||X||₂. Set ε = max{ε_R, ε_X}. If
then there is a unique Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

where
Proof. Let G = ΔR/ε and F = ΔX/ε (if ε = 0, the theorem is trivial), then
It is easy to verify that (5.2.13) implies that (5.2.7) holds, so Theorem 5.2.1 is applicable
here. From (5.2.12) and the fact that for any symmetric B, ||up(B)||_F ≤ (1/√2)||B||_F (see (1.2.7)) we have with (5.2.15) that
which, combined with (5.2.2) and (5.2.3), gives
Then (5.2.14) follows from the Taylor expansion (5.2.9). □
From (5.2.14) we see
can be regarded as the condition estimators for U with respect to relative changes in
R and X, respectively. Notice from (5.2.1) we see φ_R ≥ φ_X, so we can define a new
overall condition estimator
and combine it with (5.2.5) and (5.2.6); then we obtain Sun's bound
which leads to the overall condition estimator proposed by Sun:
We have seen the right hand side of (5.2.14) is never worse than that of (5.2.18).
Although φ is a minor improvement on β, it is still not what we want. We can
see this from the asymptotic behavior of these condition estimators. The Cholesky
factorization is unique, so as X → 0, U → R, and X^T ΔX → 0 in (5.2.8). Now
for any upper triangular perturbation ΔR in R, ΔU → ΔR, so the true condition
number should approach unity. Here β, φ → √2 κ₂(R). The next section shows how
we can overcome this inadequacy.
5.3 New perturbation results
In Section 5.2 we saw the key to deriving first-order perturbation bounds for U in
the block Cholesky downdating problem is the equation (5.2.11). We will now analyze
it by two approaches. The first approach, the matrix-vector equation approach,
gives sharp perturbation bounds, which lead to the condition numbers for the block
Cholesky downdating problem, while the second, the matrix equation approach,
gives a clear improvement on other earlier results, and provides practical condition
estimators for the true condition numbers. All our discussion is based on the same
assumptions as in Theorem 5.2.2.
5.3.1 Matrix-vector equation analysis
The matrix-vector equation approach views the matrix equation (5.2.11) as a large
matrix-vector equation.
First assume ΔR is a general real n × n matrix. It is easy to show (5.2.11) can be
rewritten in the following matrix-vector form (cf. Chapter 3):

W_U uvec(U̇(0)) = Z_R vec(G) − Y_X vec(F),
Since U is nonsingular, W_U is also, and from (5.3.1)
where for any R and X equality can be made by choosing G and F such that
Then from the Taylor expansion (5.2.9), we see
and the condition numbers for U with respect to relative changes in R and X are
(here subscript G refers to general ΔR, and later the subscript T will refer to upper
triangular ΔR)
and

κ_X(R, X) ≡ lim_{ε→0} sup { ||ΔU||_F/(ε||U||₂) : (U + ΔU)^T (U + ΔU) = R^T R − (X + ΔX)^T (X + ΔX), ε = ||ΔX||_F/||X||₂ },   (5.3.8)

respectively. Then a whole condition number for the Cholesky downdating problem
with general ΔR can be defined as

κ_CDG(R, X) ≡ max{κ_RG(R, X), κ_X(R, X)}.   (5.3.9)

By the definitions of κ_RG(R, X) and κ_X(R, X), it is easy to verify from (5.2.14)
and (5.2.16) that

κ_RG(R, X) ≤ φ_R,   κ_X(R, X) ≤ φ_X,   (5.3.10)

It is easy to observe that if X → 0, κ_RG(R, X) → ||W_R^{-1} Z_R||₂, where W_R is
just W_U with each entry u_ij replaced by r_ij. If R was found using the standard
pivoting strategy in the Cholesky factorization, then ||W_R^{-1} Z_R||₂ has a bound which
is a function of n alone (see Theorem 3.4.2). So in this case our condition number
κ_CDG(R, X) also has a bound which is a function of n alone as X → 0.
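The matrix-vector equation viewpoint is easy to illustrate numerically. The sketch below (with one plausible column ordering for 'uvec'; the thesis fixes its own ordering in Chapter 1) builds the matrix of the linear map uvec(V) ↦ uvec(V^T U + U^T V) over upper-triangular V by applying it to basis matrices, and checks that nonsingular U gives a nonsingular matrix:

```python
import numpy as np

def uvec(A):
    """Stack the upper-triangular part of A column by column into a
    vector of length n(n+1)/2 (one plausible 'uvec' ordering)."""
    n = A.shape[0]
    return np.concatenate([A[:j + 1, j] for j in range(n)])

def build_WU(U):
    """Matrix of the linear map  uvec(V) -> uvec(V^T U + U^T V)  over
    upper-triangular V, formed by applying the map to basis matrices."""
    n = U.shape[0]
    m = n * (n + 1) // 2
    W = np.zeros((m, m))
    col = 0
    for j in range(n):
        for i in range(j + 1):
            V = np.zeros((n, n))
            V[i, j] = 1.0
            W[:, col] = uvec(V.T @ U + U.T @ V)
            col += 1
    return W

rng = np.random.default_rng(1)
n = 4
U = np.triu(rng.standard_normal((n, n)), 1) + np.diag(rng.uniform(1.0, 2.0, n))
W = build_WU(U)
# consistency: the matrix reproduces the map on a random upper-triangular V
V = np.triu(rng.standard_normal((n, n)))
assert np.allclose(W @ uvec(V), uvec(V.T @ U + U.T @ V))
# U nonsingular implies the map (and hence W) is nonsingular
assert np.linalg.matrix_rank(W) == n * (n + 1) // 2
```

Solving with this matrix is exactly what yields the sharp bounds above; the general-ΔR case simply feeds vec(G) through the wider matrix whose columns include those of the triangular case.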
Now we consider the case where ΔR is upper triangular. (5.2.11) can now be
rewritten in the following matrix-vector form:

W_U uvec(U̇(0)) = W_R uvec(G) − Y_X vec(F),
where W_U and Y_X are defined as before, and W_R ∈ R^{n(n+1)/2 × n(n+1)/2} is just W_U with each entry u_ij replaced by r_ij. Since U is nonsingular,
W_U is also, and from (5.3.12)
so taking the 2-norm gives
where, like (5.3.3), (5.3.14) will become an equality if we choose G and F as in (5.3.4)
and (5.3.5) with Z_R there replaced by W_R. Then from the Taylor expansion (5.2.9),
we see
and the condition numbers for U with respect to relative changes in R and X are
(subscript T indicates upper triangular ΔR)
and

κ_X(R, X) ≡ lim_{ε→0} sup { ||ΔU||_F/(ε||U||₂) : (U + ΔU)^T (U + ΔU) = R^T R − (X + ΔX)^T (X + ΔX), ε = ||ΔX||_F/||X||₂ },

respectively. Note κ_X(R, X) is the same as that defined in (5.3.8). Then a whole
condition number for the Cholesky downdating problem with upper triangular ΔR
can be defined as

κ_CDT(R, X) ≡ max{κ_RT(R, X), κ_X(R, X)}.   (5.3.18)
Since κ_RG(R, X) is for a general ΔR, certainly we have
which can also be proved directly by the fact that the columns of W_R form a proper
subset of the columns of Z_R. Thus from (5.3.9) and (5.3.18) we see

κ_CDT(R, X) ≤ κ_CDG(R, X).   (5.3.20)

If as well X → 0, then since U → R, W_U^{-1} W_R → I_{n(n+1)/2}, and κ_RT(R, X) → 1.
So in this case the Cholesky downdating problem becomes very well conditioned no
matter how ill-conditioned R or U is.
Finally we summarize the results above as the following theorem.
Theorem 5.3.1 With the same assumptions as in Theorem 5.2.2, there is a unique
Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

There are the following relationships among the various measures of sensitivity of the
problem (see (5.3.10), (5.3.11), (5.3.19) and (5.3.20)):
5.3.2 Matrix equation analysis
As far as we see, the condition numbers obtained in the last section are expensive
to compute or estimate directly with the usual approach. We now use the matrix
equation approach to obtain practical condition estimators.
In Theorem 5.2.2 we used the expression of U̇(0) in (5.2.12) to derive a new first-order
perturbation bound (5.2.14), from which Sun's bound was derived. Now we
again look at (5.2.12), repeated here for clarity:
Let D_n be the set of all n × n real positive definite diagonal matrices. For any
D = diag(δ_1, ..., δ_n) ∈ D_n, note that for any matrix B we have

up(B D^{-1}) = up(B) D^{-1}   and   up(D^{-1} B) = D^{-1} up(B).

First, with general ΔR, let U = D Ū. We have from (5.3.21) that
so taking the F-norm gives
where ς_D ≡ max_{1≤i<j≤n} δ_j/δ_i. Thus from (5.3.22) we have
which leads to the following perturbation bound in terms of relative changes
Naturally we define the following two quantities as condition estimators for U with
respect to relative changes in R and X, respectively:
κ'_RG(R, X) ≡ inf_{D∈D_n} κ'_RG(R, X, D),   κ'_X(R, X) ≡ inf_{D∈D_n} κ'_X(R, X, D),   (5.3.24)
where
Then an overall condition estimator can be defined as

κ'_CDG(R, X) ≡ inf_{D∈D_n} κ'_CDG(R, X, D),

where

κ'_CDG(R, X, D) ≡ max{κ'_RG(R, X, D), κ'_X(R, X, D)}.
Since ||X||₂ ≤ ||R||₂, we see
which gives

κ'_CDG(R, X) = κ'_RG(R, X) ≥ κ'_X(R, X).

Therefore with these, we have from (5.3.23) that
Clearly if we take D = I_n, (5.3.23) will become (5.2.14), and
It is not difficult to give an example to show φ can be arbitrarily larger than κ'_CDG(R, X),
as can be seen from the following asymptotic behaviour.
If X → 0 we saw U → R and σ_n(Γ) → 1, so
It is shown in Theorem 3.4.4 that with an appropriate choice of D, √(1 + ς_D²) κ₂(D^{-1}R)
has a bound which is a function of n only, if R was found using the standard pivoting
strategy in the Cholesky factorization, and in this case, we see κ'_CDG(R, X) is bounded
independently of κ₂(R) as X → 0, for general ΔR. At the end of this section we give
an even stronger result when X → 0 for the case of upper triangular ΔR. Note in
the case here that φ in (5.2.17) can be made as large as we like, and thus arbitrarily
larger than κ'_CDG(R, X).
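The effect of such a diagonal scaling is easy to demonstrate. The NumPy sketch below is illustrative only: it takes D to be the diagonal of row 2-norms (a van der Sluis-type row equilibration [51]) rather than the thesis's particular choice of D:

```python
import numpy as np

# An ill-scaled upper-triangular R: a modestly conditioned triangular
# matrix whose rows are then scaled to wildly different sizes.
n = 20
T = np.triu(np.ones((n, n)))               # kappa_2(T) grows only mildly with n
R = np.diag(2.0 ** -np.arange(n)) @ T      # row i scaled by 2^{-i}

# Row equilibration: D holds the 2-norms of the rows of R.
D = np.diag(np.linalg.norm(R, axis=1))
kappa_R = np.linalg.cond(R)
kappa_DR = np.linalg.cond(np.linalg.solve(D, R))   # cond of D^{-1} R
assert kappa_DR < kappa_R / 100.0   # the scaling removes the artificial ill-conditioning
```

Here κ₂(R) is huge purely because of the row scaling, while κ₂(D^{-1}R) stays modest; this is the mechanism by which a good D keeps κ'_CDG(R, X) bounded even when κ₂(R) is large.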
By the definitions of κ_RG(R, X) and κ_X(R, X) respectively in (5.3.7) and (5.3.8),
we can easily verify from (5.3.29) that

κ_RG(R, X) ≤ κ'_RG(R, X),   κ_X(R, X) ≤ κ'_X(R, X).   (5.3.32)
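The two 'up' scaling identities used at the start of this subsection are elementary to check numerically, assuming (as the form (1.2.3) is used elsewhere in the thesis) up(B) = sut(B) + ½ diag(B):

```python
import numpy as np

def up(B):
    """up(B): strictly-upper part of B plus half its diagonal, so that
    up(B) + up(B)^T = B whenever B is symmetric (assumed form of (1.2.3))."""
    return np.triu(B, 1) + 0.5 * np.diag(np.diag(B))

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
d = rng.uniform(1.0, 2.0, 6)
D, Dinv = np.diag(d), np.diag(1.0 / d)

assert np.allclose(up(B @ Dinv), up(B) @ Dinv)   # column scaling commutes with 'up'
assert np.allclose(up(Dinv @ B), Dinv @ up(B))   # row scaling commutes with 'up'
S = B + B.T
assert np.allclose(up(S) + up(S).T, S)           # 'up' halves a symmetric matrix
```

Both identities hold because a diagonal factor scales rows (or columns) without moving entries across the diagonal.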
In the case where ΔR is upper triangular (so G is upper triangular), we can refine
the analysis further. From (5.3.21) we have
Notice with the 'slt', 'sut' and 'diag' notation defined in (1.2.1) and (1.2.2),
But for any upper triangular matrix T we have
so that if we define T accordingly, then
Thus from (5.3.34), (5.3.35) and (5.3.36) we obtain
As before, let U = D Ū, where D = diag(δ_1, ..., δ_n) ∈ D_n. From (5.3.37) it follows
that
Then, applying (5.3.2) to this, we have
which leads to the following perturbation bound
Comparing this with (5.3.23) and noticing (5.2.3), we see the coefficient multiplying ε_X
does not change, so κ'_X(R, X) defined in (5.3.24) can still be regarded as a condition
estimator for U with respect to changes in X. But we now need to define a new
condition estimator for U with respect to upper triangular changes in R, that is
κ'_RT(R, X) ≡ inf_{D∈D_n} κ'_RT(R, X, D),

where

Thus an overall condition estimator can be defined as

κ'_CDT(R, X) ≡ inf_{D∈D_n} κ'_CDT(R, X, D),
Obviously we have
With these, we have from (5.3.38) that
What is the relationship between κ'_CDT(R, X) and κ'_CDG(R, X) = κ'_RG(R, X)? For
any n × n upper triangular matrix T = (t_ij), observe the following two facts:
1) t_ii, i = 1, 2, ..., n, are the eigenvalues of T, so that |t_ii| ≤ ||T||₂. From this it
follows that

||diag(T)||₂ ≤ ||T||₂,

so that

κ'_RT(R, X) ≤ (1 + √n) κ'_RG(R, X).
Thus we have from (5.3.28) and (5.3.41) that

κ'_CDT(R, X) ≤ (1 + √n) κ'_CDG(R, X).   (5.3.44)
On the other hand, κ'_CDT(R, X) can be arbitrarily smaller than κ'_CDG(R, X). This
can be seen from the asymptotic behaviour, which is important in its own right. As
X → 0, since U → R, σ_n(Γ) → 1 and R U^{-1} → I_n, we have
so for upper triangular changes in R, whether pivoting was used in finding R or not,
Thus when X → 0, the bound in (5.3.42) reflects the true sensitivity of the problem.
For the case of general ΔR, if we do not use pivoting it is straightforward to make
κ'_CDG(R, X) in (5.3.27) arbitrarily large even with X = 0; see (5.3.25).
By the definition of κ_RT(R, X) in (5.3.16), we can easily verify from (5.3.42) that
Thus from this and the second inequality in (5.3.32), it follows with (5.3.18) and
(5.3.41) that

κ_CDT(R, X) ≤ κ'_CDT(R, X).   (5.3.46)
Now we summarize these results as the following theorem.
Theorem 5.3.2 With the same assumptions as in Theorem 5.2.2, there is a unique
Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

where for general ΔR,
and for upper triangular ΔR,
There are the following relationships among the various measures of sensitivity of
the problem (see (5.3.28), (5.3.30), (5.3.32), (5.3.33), (5.3.43), (5.3.44), (5.3.45) and
(5.3.46)):
Our numerical experiments suggest κ'_CDG(R, X) is usually a good approximation
to κ_CDG(R, X). But the following example shows κ'_CDG(R, X) can sometimes be
arbitrarily larger than κ_CDG(R, X):
where δ is a small positive number. It is not difficult to show
But κ'_CDG(R, X) has an advantage over κ_CDG(R, X) - it can be quite easy to estimate
- all we need to do is to choose a suitable D in κ'_CDG(R, X, D). We consider how to do
this in the next section. In contrast κ_CDG(R, X) is, as far as we can see, unreasonably
expensive to compute or estimate.
Numerical experiments also suggest κ'_CDT(R, X) is usually a good approximation
to κ_CDT(R, X). But sometimes κ'_CDT(R, X) can be arbitrarily larger than κ_CDT(R, X).
This can also be seen from the example above. In fact, it is not difficult to obtain
Like κ_CDG(R, X), κ_CDT(R, X) is difficult to compute or estimate. But κ'_CDT(R, X) is
easy to estimate, which is discussed in the next section.
5.4 Numerical experiments
In Section 5.3 we presented new first-order perturbation bounds for the downdated
Cholesky factor U using first the matrix-vector equation approach, and then the
matrix equation approach. We defined κ_CDG(R, X) for general ΔR, and κ_CDT(R, X)
for upper triangular ΔR, as the overall condition numbers of the problem. Also we
gave two corresponding practical but weaker condition estimators κ'_CDG(R, X) and
κ'_CDT(R, X) for the two ΔR cases.
We would like to choose D such that κ'_CDG(R, X, D) and κ'_CDT(R, X, D) are good
approximations to κ'_CDG(R, X) and κ'_CDT(R, X), respectively. We see from (5.3.25),
(5.3.26) and (5.3.39) that we want to find D such that √(1 + ς_D²) κ₂(D^{-1}U) approximates
its infimum. That is the same problem we faced in Section 3.5, and we adopt
the best method of choosing D proposed there, taking the δ_i, i = 1, ..., n, as described
in that section. Then we use
a standard condition estimator to estimate κ₂(D^{-1}U) in O(n²) operations.
Notice from (5.2.4) we have σ_n(Γ) = √(1 − ||X R^{-1}||₂²). Usually k, the number of
rows of X, is much smaller than n, so σ_n(Γ) can be computed in O(n²). If k is not
much smaller than n, then we use a standard norm estimator to estimate ||X R^{-1}||₂ in
O(n²). Similarly ||U||₂ and ||R||₂ can be estimated in O(n²). So finally κ'_CDG(R, X, D)
can be estimated in O(n²). Estimating κ'_CDT(R, X, D) is not as easy as estimating
κ'_CDG(R, X, D). The part ||diag(R U^{-1})||₂ in κ'_RT(R, X, D) can easily be computed
in O(n), since diag(R U^{-1}) = diag(r_11/u_11, ..., r_nn/u_nn). The part ||sut(R U^{-1})||₂ in
κ'_RT(R, X, D) can roughly be estimated in O(n²), based on
and the fact that ||R U^{-1}||_F can be estimated by a standard norm estimator in O(n²).
The value of ||X U^{-1}||₂ in κ'_X(R, X, D) can be calculated (if k < n) or estimated
by a standard estimator in O(n²). All the remaining values ||R||₂, ||X||₂ and ||U||₂
can also be estimated by a standard norm estimator in O(n²). Hence κ'_RT(R, X, D),
κ'_X(R, X, D), and thus κ'_CDT(R, X, D) can be estimated in O(n²). For standard condition
estimators and norm estimators, see Chapter 14 of Higham [30, 1996].
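For instance, the σ_n(Γ) piece can be sketched as follows (illustrative code; `solve_triangular` performs the k triangular solves that form X R^{-1}):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sigma_min_gamma(R, X):
    """sigma_n(Gamma) = sqrt(1 - ||X R^{-1}||_2^2), computed from k
    triangular solves; cheap when k, the number of rows of X, is small."""
    Y = solve_triangular(R, X.T, trans='T').T   # Y = X R^{-1}
    s = np.linalg.norm(Y, 2)                    # ||X R^{-1}||_2
    return np.sqrt(1.0 - s * s)

rng = np.random.default_rng(4)
n, k = 6, 2
R = np.triu(rng.standard_normal((n, n)), 1) + 4.0 * np.eye(n)
X = 0.1 * rng.standard_normal((k, n))
# cross-check against the eigenvalues of Gamma^T Gamma = I - (X R^{-1})^T (X R^{-1})
Y = np.linalg.solve(R.T, X.T).T
lam_min = np.linalg.eigvalsh(np.eye(n) - Y.T @ Y).min()
assert np.isclose(sigma_min_gamma(R, X), np.sqrt(lam_min))
```

When k is not small, the norm ||X R^{-1}||₂ inside this formula would instead be obtained from a standard norm estimator, as described above.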
The relationships among the various overall measures of sensitivity of the Cholesky
downdating problem presented in Section 5.2 and Section 5.3 are as follows:

β ≥ φ ≥ κ'_CDG(R, X) ≥ κ_CDG(R, X) ≥ κ_CDT(R, X),
(1 + √n) κ'_CDG(R, X) ≥ κ'_CDT(R, X) ≥ κ_CDT(R, X).
Now we give one numerical example to illustrate these. The example, quoted from
Sun [50, 1995], is as follows:

R = diag(1, s, s², s³, s⁴)

where c = 0.95, s = √(1 − c²). The results obtained using MATLAB are shown in
Table 5.4.1 for various values of the parameter:
Table 5.4.1: Results for the example in Sun's paper
Note in Table 5.4.1 how β and φ can be far worse than the condition numbers
κ_CDG(R, X) and κ_CDT(R, X), although φ is not as bad as β. Also we observe
that κ'_CDG(R, X, D) and κ'_CDT(R, X, D) are very good approximations to κ_CDG(R, X)
and κ_CDT(R, X), respectively. When X becomes small, all of the condition numbers
and condition estimators decrease. The asymptotic behavior of κ'_CDG(R, X, D),
κ'_CDT(R, X, D), κ_CDG(R, X) and κ_CDT(R, X) coincides with our theoretical results:
when X → 0, κ'_CDG(R, X) and κ_CDG(R, X) will be bounded in terms of n since here
R is actually K₅(arccos(0.95)), a Kahan matrix, which corresponds to the Cholesky
factor of a correctly pivoted A, and κ'_CDT(R, X), κ_CDT(R, X) → 1.
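For reference, the Kahan matrix named here can be generated as follows (a standard construction with s = sin θ, c = cos θ):

```python
import numpy as np

def kahan(n, theta):
    """Kahan matrix K_n(theta): row i of the unit upper-triangular matrix
    with off-diagonal entries -cos(theta) is scaled by sin(theta)^i."""
    c, s = np.cos(theta), np.sin(theta)
    K = np.eye(n) + np.triu(np.full((n, n), -c), 1)
    return np.diag(s ** np.arange(n)) @ K

R = kahan(5, np.arccos(0.95))
assert np.allclose(np.diag(R), np.sin(np.arccos(0.95)) ** np.arange(5))
assert np.isclose(R[0, 1], -0.95)
```

Its diagonal matches the diag(1, s, s², s³, s⁴) scaling of the example, which is why the example's R behaves like a correctly pivoted Cholesky factor.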
5.5 Summary and future work
The first-order perturbation analyses presented here show just what the sensitivity of
the Cholesky downdating problem is, and in so doing provide the condition numbers,
as well as efficient ways of approximating these. The key measures of the sensitivity
of the problem we derived are:
For general ΔR:
- overall condition number: κ_CDG(R, X), see (5.3.9),
- overall condition estimator: κ'_CDG(R, X) ≡ inf_{D∈D_n} κ'_CDG(R, X, D), see
(5.3.27),
For triangular ΔR:
- overall condition number: κ_CDT(R, X), see (5.3.18),
- overall condition estimator: κ'_CDT(R, X) ≡ inf_{D∈D_n} κ'_CDT(R, X, D), see
(5.3.40).
These quantities and the condition estimators φ (see (5.2.17)) and β (see (5.2.19))
obey
For the asymptotic case as X → 0, κ_CDG(R, X) and κ'_CDG(R, X) will be bounded in
terms of n, and κ_CDT(R, X), κ'_CDT(R, X) → 1, while β and φ have no such properties.
Recently Stewart [42, 1995] presented a backward rounding error analysis for the
block downdating algorithm presented by Eldén and Park. It would be straightforward
to combine our results here with Stewart's result to give a forward error estimate
for the computed U. But we choose not to do this here in order to keep the material
as simple as possible.
In the future we would like to:
- Give better approximations to κ_CDG(R, X) and κ_CDT(R, X) than κ'_CDG(R, X)
and κ'_CDT(R, X).
- Extend our analyses here to other cases, such as that when ΔR and ΔX come
from a componentwise backward rounding error analysis.
Chapter 6
Conclusions and future research
A new approach, the so-called 'matrix-vector equation approach', has been developed
here for the perturbation analysis of matrix factorizations. The basic idea of this
approach is to write the perturbation matrix equation as a matrix-vector equation by
using the special structure and properties of the factors. Using this approach we obtained
tight first-order perturbation results and condition numbers for the Cholesky,
QR and LU factorizations, and for the Cholesky downdating problem. Our perturbation
bounds give significant improvements on the previous results, and could not
be sharper.
Also we used the so-called 'matrix equation approach' originated by G. W. Stewart
to obtain perturbation bounds that are usually weaker but easier to interpret,
leading to condition estimators which are easily estimated by the standard condition
estimators (for matrix inversion) or norm estimators. Our experiments suggested
that for the Cholesky, QR and LU factorizations with norm-bounded changes in the
original matrices the condition estimators are very good approximations of the corresponding
condition numbers. Also our numerical experiments suggested that for the
Cholesky factorization with component-bounded changes in the original matrix and
the Cholesky downdating problem with norm-bounded changes in the original matrices,
the condition estimators are usually good approximations of the corresponding
condition numbers, even though some counter-examples were found.
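A minimal example of the kind of O(n²)-per-step norm estimator referred to is sketched below. This is a generic power-iteration sketch, not the specific LINPACK-style estimator [16]; it produces a lower bound on the 2-norm that is usually very close:

```python
import numpy as np

def norm2_estimate(A, iters=50, seed=2):
    """Power-iteration lower-bound estimate of ||A||_2: each step costs
    two matrix-vector products, O(n^2) flops for a dense n x n matrix."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A.T @ (A @ x)        # one step of power iteration on A^T A
        x /= np.linalg.norm(x)
    return np.linalg.norm(A @ x)

A = np.random.default_rng(5).standard_normal((50, 50))
est, true = norm2_estimate(A), np.linalg.norm(A, 2)
assert est <= true * (1 + 1e-10)   # ||A x|| <= ||A||_2 for unit x
assert est >= 0.9 * true           # and it converges close from below
```

Applied to triangular solves instead of products, the same idea estimates ||U^{-1}||₂, and hence a condition number, at O(n²) cost per step.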
The matrix-vector equation approach is a powerful general tool, and appears to
be applicable to the perturbation analysis of any matrix factorization. The matrix
equation approach is also fairly general, but for each factorization a particular treatment
is needed. The combination of these two approaches gives a deep understanding
of these problems. Although first-order perturbation bounds are satisfactory for all
but the most delicate work, we also gave some rigorous perturbation bounds for the
Cholesky factorization.
In computing these factorizations, standard pivoting is often used to improve the
stability of the algorithms. We showed that the condition of these factorizations is
significantly improved by the standard pivoting strategies (except for the L factor in the
LU factorization), and provided firmly based theoretical explanations as to why this
is so. This extremely important information is very useful for designing more reliable
matrix algorithms.
In the future we hope to continue this research in several directions:
- To analyze the Cholesky, LU and QR factorizations, and Cholesky downdating
of general matrices, where perturbations have special structure, for example, by
assuming the perturbation has the form of the equivalent backward rounding
error from a numerically stable computation of the factorization (some results
for the Cholesky factorization have been given in this thesis). Such structure
leads to improved sensitivity results.
- To analyze other factorizations of general matrices for both general perturbations
and structured perturbations.
- To extend our approach to the factorizations of special matrices. In many applications
matrices and the resulting factorizations used to solve the problems
in a numerically stable way have some special structure. Applying the existing
general perturbation results to these special problems will result in overestimation
of the true sensitivity. Our new approach to such perturbation analyses
can make full use of the structure, so should lead to results which closely reflect
the true sensitivity of the problems.
References
[1] S. T. Alexander, C.-T. Pan and R. J. Plemmons. Analysis of a recursive least squares hyperbolic rotation algorithm for signal processing. Linear Algebra Appl., 98 (1988), pp. 3-40.
[2] A. Barrlund. Perturbation bounds for the LDL^H and the LU factorizations. BIT, 31 (1991), pp. 358-363.
[3] R. Bhatia. Matrix factorizations and their perturbations. Linear Algebra and Appl., 197-198 (1994), pp. 245-276.
[4] Å. Björck and G. H. Golub. Numerical methods for computing angles between linear subspaces. Math. Comp., 27 (1973), pp. 579-594.
[5] Å. Björck, H. Park and L. Eldén. Accurate downdating of least squares solutions. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 549-568.
[6] A. W. Bojanczyk, R. P. Brent, P. Van Dooren and F. R. de Hoog. A note on downdating the Cholesky factorization. SIAM J. Sci. Stat. Comput., 8 (1987), pp. 210-221.
[7] P. A. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Numer. Math., 7 (1965), pp. 269-276.
[8] X.-W. Chang. Perturbation analyses for the Cholesky factorization with backward rounding errors. Technical Report SOCS-96.3, School of Computer Science, McGill University, 1996.
[9] X.-W. Chang and C. C. Paige. A perturbation analysis for R in the QR factorization. Technical Report SOCS-95.7, School of Computer Science, McGill University, 1995.
[10] X.-W. Chang and C. C. Paige. Perturbation analyses for the Cholesky downdating problem. Submitted to SIAM J. Matrix Anal. Appl., April 1996. 15 pp.
[11] X.-W. Chang and C. C. Paige. Perturbation analyses for the QR factorization with bounds on changes in the elements of A. In preparation.
[12] X.-W. Chang and C. C. Paige. Perturbation analyses for the LU factorization. In preparation.
[13] X.-W. Chang, C. C. Paige and G. W. Stewart. New perturbation analyses for the Cholesky factorization. IMA J. Numer. Anal., 16 (1996), pp. 457-484.
[14] X.-W. Chang, C. C. Paige and G. W. Stewart. Perturbation analyses for the QR factorization. SIAM J. Matrix Anal. Appl., to appear. 18 pp.
[15] A. K. Cline, A. R. Conn and C. F. Van Loan. Generalizing the LINPACK condition estimator. Numerical Analysis, ed. J. P. Hennart. Lecture Notes in Mathematics, no. 909, Springer-Verlag, NY, 1982.
[16] A. K. Cline, C. B. Moler, G. W. Stewart and J. H. Wilkinson. An estimate for the condition number of a matrix. SIAM J. Numer. Anal., 16 (1979), pp. 368-375.
[17] J. W. Demmel. On floating point errors in Cholesky. Technical Report CS-89-87, Department of Computer Science, University of Tennessee, October 1989. 6 pp. LAPACK Working Note 14.
[18] J. D. Dixon. Estimating extremal eigenvalues and condition numbers of matrices. SIAM J. Numer. Anal., 20 (1983), pp. 812-814.
[19] J. J. Dongarra, J. R. Bunch, C. B. Moler and G. W. Stewart. LINPACK Users' Guide. SIAM, Philadelphia, 1979.
[20] Z. Drmač, M. Omladič and K. Veselić. On the perturbation of the Cholesky factorization. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1319-1332.
[21] L. Eldén and H. Park. Block downdating of least squares solutions. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1018-1034.
[22] L. Eldén and H. Park. Perturbation analysis for block downdating of a Cholesky decomposition. Numerische Mathematik, 68 (1994), pp. 457-467.
[23] L. Eldén and H. Park. Perturbation and error analyses for block downdating of a Cholesky decomposition. BIT, 36 (1996), pp. 247-263.
[24] P. E. Gill, G. H. Golub, W. Murray and M. A. Saunders. Methods for modifying matrix factorizations. Math. Comp., 28 (1974), pp. 505-535.
[25] G. H. Golub and G. P. H. Styan. Numerical computations for univariate linear models. J. Stat. Comp. Simul., 2 (1973), pp. 253-272.
[26] G. H. Golub and C. F. Van Loan. Matrix Computations. Third Edition. The Johns Hopkins University Press, Baltimore, Maryland, 1996.
[27] N. J. Higham. A survey of condition number estimation for triangular matrices. SIAM Rev., 29 (1987), pp. 575-596.
[28] N. J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. Reliable Numerical Computation, ed. M. G. Cox & S. J. Hammarling. Oxford University Press, 1990, pp. 161-186.
[29] N. J. Higham. A survey of componentwise perturbation theory in numerical linear algebra. In Mathematics of Computation 1943-1993: A Half Century of Computational Mathematics, Walter Gautschi, editor, volume 48 of Proceedings of Symposia in Applied Mathematics, American Mathematical Society, 1994.
[30] N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1996.
[31] Y. P. Hong and C.-T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Mathematics of Computation, 58 (1992), pp. 213-232.
[32] W. Kahan. Numerical linear algebra. Can. Math. Bull., 9 (1966), pp. 757-801.
[33] J. Meinguet. Refined error analyses of Cholesky factorization. SIAM J. Numer. Anal., 20 (1983), pp. 1243-1250.
[34] C. C. Paige. Covariance matrix representation in linear filtering. In Linear Algebra and Its Role in Systems Theory, B. N. Datta (ed.), AMS, Providence, RI, 1985, pp. 309-321.
[35] C.-T. Pan. A perturbation analysis of the problem of downdating a Cholesky factorization. Linear Algebra Appl., 183 (1993), pp. 103-116.
[36] C.-T. Pan and R. Plemmons. Least squares modifications with inverse factorizations: parallel implications. J. Comp. Appl. Math., 27 (1989), pp. 109-127.
[37] M. A. Saunders. Large-scale linear programming using the Cholesky factorization. Technical Report CS252, Computer Science Department, Stanford University, 1972.
[38] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15 (1973), pp. 727-764.
[39] G. W. Stewart. Perturbation bounds for the QR factorization of a matrix. SIAM J. Numer. Anal., 14 (1977), pp. 509-518.
[40] G. W. Stewart. The effects of rounding error on an algorithm for downdating a Cholesky factorization. J. Inst. Math. Appl., 23 (1979), pp. 203-213.
[41] G. W. Stewart. On the perturbation of LU, Cholesky, and QR factorizations. SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1141-1145.
[42] G. W. Stewart. On sequential updates and downdates. IEEE Transactions on Signal Processing, 43 (1995), pp. 2642-2648.
[43] G. W. Stewart. The triangular matrices of Gaussian elimination and related decompositions. Technical Report CS-TR-3533 UMIACS-TR-95-91, Department of Computer Science, University of Maryland, 1995.
[44] G. W. Stewart. On the perturbation of LU and Cholesky factors. Technical Report CS-TR-3535 UMIACS-TR-95-93, Department of Computer Science, University of Maryland, 1995.
[45] G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, London, 1990.
[46] J. G. Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT, 31 (1991), pp. 341-352.
[47] J. G. Sun. Rounding-error and perturbation bounds for the Cholesky and LDL^T factorizations. Linear Algebra and Appl., 173 (1992), pp. 77-97.
[48] J. G. Sun. Componentwise perturbation bounds for some matrix decompositions. BIT, 32 (1992), pp. 702-714.
[49] J. G. Sun. On perturbation bounds for the QR factorization. Linear Algebra and Appl., 215 (1995), pp. 95-112.
[50] J.-G. Sun. Perturbation analysis of the Cholesky downdating and QR updating problems. SIAM J. Matrix Anal. Appl., 16 (1995), pp. 760-775.
[51] A. van der Sluis. Condition numbers and equilibration of matrices. Numerische Mathematik, 14 (1969), pp. 14-23.
[52] J. H. Wilkinson. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, NJ, 1963.
[53] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.
[54] J. H. Wilkinson. A priori error analysis of algebraic processes. In Proc. International Congress of Mathematicians, Moscow 1966, I. G. Petrovsky, ed., Mir Publishers, Moscow, 1968, pp. 629-640.
[55] H. Zha. A componentwise perturbation analysis of the QR decomposition. SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1124-1131.