Perturbation Analysis of Some
Matrix Factorizations
by
Xiao-Wen Chang
School of Computer Science
McGill University
Montréal, Québec
Canada
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH
OF MCGILL UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY
Copyright © 1997 by Xiao-Wen Chang
The author has granted a non- exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
Matrix factorizations are among the most important and basic tools in numerical
linear algebra. Perturbation analyses of matrix factorizations are not only important
in their own right, but also useful in many applications, e.g. in estimation, control
and statistics. The aim of such analyses is to show what effects changes in the data
will have on the factors. This thesis is concerned with developing new general purpose
perturbation analyses, and applying them to the Cholesky, QR and LU factorizations,
and the Cholesky downdating problem.
We develop a new approach, the so-called 'matrix-vector equation' approach,
to obtain sharp results and true condition numbers for the above problems. Our
perturbation bounds give significant improvements on previous results, and could
not be sharper. We also use the so-called 'matrix equation' approach, originated by
G. W. Stewart to derive perturbation bounds that are usually weaker but easier to
interpret. This approach allows efficient computation of satisfactory estimates for
the true condition numbers derived by our approach. The combination of these two
approaches gives a powerful understanding of these problems. Although first-order
perturbation bounds are satisfactory for all but the most delicate work, we also give
some rigorous perturbation bounds for some factorizations.
We show that the condition of many such factorizations is significantly improved
by the standard pivoting strategies (except the L factor in the LU factorization), and
provide firmly based theoretical explanations as to why this is so. This extremely
important information is very useful for designing more reliable matrix algorithms.
Our approach is a powerful general tool, and appears to be applicable to the
perturbation analysis of any matrix factorization.
Résumé
Les factorisations de matrices sont parmi les outils les plus importants et les plus
fondamentaux de l'algèbre linéaire numérique. Les analyses de perturbation
des factorisations de matrices sont non seulement importantes en elles-mêmes, mais
elles sont aussi utiles dans maintes applications, par exemple dans les domaines de
l'estimation, du contrôle et des statistiques. Ces analyses ont pour but de démontrer
quels effets les changements dans les données produiront sur les facteurs. Cette thèse
s'intéresse au développement de nouvelles analyses générales des perturbations,
et à leur application aux factorisations de Cholesky, QR et LU, et au problème de
modification de factorisation de Cholesky.
Nous développons une nouvelle approche que nous nommons l'approche équation
matrice-vecteur, afin d'obtenir des résultats précis et des vrais nombres de condition-
nement pour les problèmes mentionnés ci-dessus. Nos bornes sur les perturbations
apportent des améliorations significatives aux résultats antérieurs et ne pourraient
être plus précises. De plus, nous utilisons l'approche équation matrice développée par
G. W. Stewart pour dériver des bornes sur les perturbations qui sont généralement
plus faibles mais plus faciles à interpréter. Cette approche permet des calculs effi-
caces d'estimés satisfaisants pour les vrais nombres de conditionnement qui dérivent
de notre approche. La combinaison de ces deux approches donne une compréhension
profonde de ces problèmes. Bien que des bornes sur les perturbations de premier
ordre soient satisfaisantes pour tout travail, sauf pour le plus délicat, nous donnons
également des bornes rigoureuses sur les perturbations pour certaines factorisations.
Nous démontrons que la condition de plusieurs de ces factorisations est améliorée
de façon significative par les stratégies usuelles de pivotage (sauf en ce qui con-
cerne le facteur L dans la factorisation LU), et nous fournissons des explications
théoriques solidement fondées pour démontrer pourquoi il en est ainsi. Cette infor-
mation extrêmement importante est des plus utiles pour construire des algorithmes
plus sûrs pour les matrices.
Notre approche est un outil général puissant, et semble pouvoir s'appliquer aux
analyses de perturbation de n'importe quelle factorisation de matrice.
Acknowledgements
I would like to extend my sincere thanks and deep gratitude to my supervisor, Profes-
sor Chris Paige, for his invaluable guidance, support, encouragement and care through
the years of my Ph.D. studies. He displayed extreme generosity with his ideas and time,
and generated a continuous enthusiasm for this work. He made all his research facil-
ities available to me. My association with him is, for me, a rare privilege. I hope he
will find this thesis to be a small reward for the many days of inspired collaboration
as well as for his efforts to ensure that I received the best possible training. I also wish
to thank Dr. Françoise Paige for translating the abstract of this thesis into French.
I thank Professor G. W. Stewart for his insightful ideas, which provided one of
the approaches used in this thesis. My thanks to Professor Ji-guang Sun for his
suggestions and encouragement, and for providing me with draft versions of his work.
Their pioneering work was a major inspiration for this work.
I am thankful to Professor Jim Varah, external reviewer of this thesis, for his care-
ful reading. This thesis has been improved by his helpful comments and suggestions.
I am indebted to Professors Gene Golub, Nick Higham, and George Styan for their
interest and encouragement in this work.
I am grateful to Professor Jia-Song Wang, my Master's thesis supervisor, for
introducing me to numerical analysis and for his continuous encouragement.
The School of Computer Science at McGill University provided me with a com-
fortable environment for my thesis work. Many people, including my fellow graduate
students, gave me various help. My heartfelt thanks go to Penny Anderson, Franca
Cianci, Naixun Pei, Josée Turgeon, Josie Vallelonga, Xiaoyan Zhan and Binghai Zhu.
My deepest gratitude goes to my parents, brothers and sister. Their love and care
have always been a constant source of my enthusiasm to complete this work. I deeply
regret that my father, who died during my Ph.D. studies, cannot see the completion
of this thesis. Also my thanks to my uncle and aunt for their strong support.
Finally, I would like to give special thanks to my wife, Su, whose love, patience
and understanding made it possible for me to finally finish my work on this thesis.
To my wife and my parents
Contents
List of Tables x
1 Introduction and Preliminaries 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Notation and basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 The Cholesky Factorization
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Perturbation analysis with norm-bounded changes in A . . . . . . . .
2.2.1 First-order perturbation bounds . . . . . . . . . . . . . . . . .
2.2.2 Rigorous perturbation bounds . . . . . . . . . . . . . . . . . .
2.2.3 Numerical experiments . . . . . . . . . . . . . . . . . . . . . .
2.3 Perturbation analysis with component-bounded changes in A . . . . .
2.3.1 First-order perturbation bound with |ΔA| ≤ ε|A| . . . . . . .
2.3.2 First-order perturbation bounds with backward rounding errors
2.3.3 Rigorous perturbation bound with backward rounding errors .
2.3.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . .
2.4 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . .
3 The QR factorization 60
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Rate of change of Q and R. and previous results . . . . . . . . . . . . 62
3.3 Refined analysis for Q . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Perturbation analyses for R . . . . . . . . . . . . . . . . . . . . . . . 68
3.4.1 Matrix-vector equation analysis for R . . . . . . . . . . . . . . 68
3.4.2 Matrix equation analysis for R . . . . . . . . . . . . . . . . . . 72
3.5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.6 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 80
4 The LU factorization 83
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Rate of change of L and U . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 New perturbation results . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1 Matrix-vector equation analysis . . . . . . . . . . . . . . . . . 86
4.3.2 Matrix equation analysis . . . . . . . . . . . . . . . . . . . . . 90
4.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 100
5 The Cholesky downdating problem 101
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 Basics. previous results. and an improvement . . . . . . . . . . . . . . 103
5.3 New perturbation results . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.1 Matrix-vector equation analysis . . . . . . . . . . . . . . . . . 108
5.3.2 Matrix equation analysis . . . . . . . . . . . . . . . . . . . . . 114
5.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Summary and future work . . . . . . . . . . . . . . . . . . . . . . . . 124
6 Conclusions and future research 126
List of Tables
2.2.1 Results for matrix A = QΛQ^T of order 25, Â = PAP^T, D = D_r . . . 34
2.2.2 Results for Pascal matrices, Â = PAP^T, D = D_r . . . . . . . . . . . 34
2.2.3 Results for A = K_n^T(θ)K_n(θ), θ = π/4, Â = PAP^T, D = D_r . . . 35
2.3.1 Results for Pascal matrices without pivoting . . . . . . . . . . . . . . 56
2.3.2 Results for Pascal matrices with pivoting, Â = PAP^T . . . . . . . . 57
3.5.1 Results for Pascal matrices without pivoting, A = QR . . . . . . . . 78
3.5.2 Results for Pascal matrices with pivoting, AP = QR . . . . . . . . . 79
3.5.3 Results for 10 × 8 matrices A_j, j = 1:8, without pivoting . . . . . . 79
3.5.4 Results for Kahan matrices, θ = π/8, AΠ = QR . . . . . . . . . . . . 80
4.4.1 Results without pivoting, A = LU . . . . . . . . . . . . . . . . . . . . 98
4.4.2 Results with partial pivoting, PA = L̃Ũ . . . . . . . . . . . . . . . . 98
4.4.3 Results with complete pivoting, P1AP2 = L̃Ũ . . . . . . . . . . . . . 99
5.4.1 Results for the example in Sun's paper . . . . . . . . . . . . . . . . . 123
Chapter 1
Introduction and Preliminaries
1.1 Introduction
This thesis is concerned with the perturbation analysis of the Cholesky, QR, and LU
factorizations, and of the Cholesky downdating problem. These matrix factorizations
are among the most fundamental and important tools in numerical linear algebra
(see for example Golub and Van Loan [26, 1996]). The goal of such an analysis is
to determine bounds for the changes in the factors of a matrix when the matrix is
perturbed.
We first give some motivation for our concerns. Suppose A is a given matrix, and
has a factorization
A = BC, (1.1.1)
where B and C are the factors of A. As in any topic of matrix perturbation theory,
there are three main considerations in perturbation theory for matrix factorizations.
First, the elements of A may be determined from physical measurement, and therefore
be subject to errors of observation. The true matrix is A + ΔA, where ΔA is the
observation error. Suppose the same factorization for A + ΔA is
A + ΔA = (B + ΔB)(C + ΔC).     (1.1.2)
CHAPTER 1. INTRODUCTION AND PRELIMINARIES
We are thus led immediately to the consideration of the perturbations ΔB and ΔC.
Second, even if the elements of A can be defined exactly by mathematical formulae,
usually these cannot be represented exactly by a digital computer due to
its finite precision. The matrix stored in a computer is A + ΔA, with |ΔA| ≤ u|A|,
where u is the unit roundoff. So we are faced with much the same problem as before.
Finally, backward rounding error analysis throws back errors made in executing an
algorithm on the original data, see Wilkinson [52, 1963]. Suppose for a stored matrix
A, a backward rounding error analysis shows the computed factors B̃ and C̃ of A are
the exact factors of A + ΔA, i.e.,
A + ΔA = B̃C̃,
where a bound on ||ΔA|| (or |ΔA|) is known; then perturbation theory is used to
assess the effects of these backward errors on the accuracy of the computed factors,
i.e., to give bounds on ||B̃ − B|| and ||C̃ − C|| (or |B̃ − B| and |C̃ − C|).
Although in solving linear equations the sensitivity of factors may not be of cen-
tral interest, it is important when the factors have significance. For example, in the
estimation problem with m × n A of full column rank and m-dimensional y given
(where E(·) indicates the expected value),
y = Ax + v,   E(v) = 0,   E(vv^T) = σ²I,
if we obtain the QR factorization A = QR, then solving Rx̂ = Q^T y gives the best
linear unbiased estimate (BLUE) x̂ of x, and
E((x̂ − x)(x̂ − x)^T) = σ²(A^T A)^{-1} = σ²R^{-1}R^{-T},
so R is the factor of what has been called in the engineering literature the 'information
matrix'. This is important in its own right (see for example Paige [34, 1985]), and we
are interested in how changes in A affect R.
In general we regularly use the fact that the columns of Q in the QR factorization
of A form an orthonormal basis for R(A), and we are concerned with how changes in
A affect Q.
In some statistical applications, if certain matrices A and B have QR factorizations
A = Q_A R_A,   B = Q_B R_B,
then the singular values of Q_A^T Q_B give what are called the 'canonical correlations'
(more generally these give the angles between the subspaces R(A) and R(B)), see for
example Björck and Golub [4, 1973]. Thus the sensitivity of Q in the QR factorization
can be used directly to answer the following important problem: "How do changes
in A and B affect R(A) and R(B) and the angles between these (or the canonical
correlations)?"
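A minimal NumPy sketch of this computation, with randomly generated A and B purely for illustration: the canonical correlations are the singular values of Q_A^T Q_B, and the principal angles are their arccosines.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
A = rng.standard_normal((m, 4))
B = rng.standard_normal((m, 3))

# Orthonormal bases for R(A) and R(B) via thin QR factorizations.
QA, _ = np.linalg.qr(A)
QB, _ = np.linalg.qr(B)

# Singular values of QA^T QB are the canonical correlations,
# i.e. the cosines of the principal angles between R(A) and R(B).
correlations = np.linalg.svd(QA.T @ QB, compute_uv=False)
angles = np.arccos(np.clip(correlations, -1.0, 1.0))
```

There are min(4, 3) = 3 correlations here, each in [0, 1]; two random subspaces of R^100 typically meet at angles well away from zero.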
We thus see the area is an interesting and useful one to study in general, and it
has been an active area of research in recent years. Most of the existing results
have been incorporated in Higham [30, 1996].
Realizing that most of the published results on the sensitivity of factorizations, such as
LU, Cholesky, and QR, were extremely weak for certain classes of matrices, Chang,
under the supervision of Chris Paige (see the commentary in Chang, Paige and Stew-
art [14, 1996]), originated an approach to obtaining provably sharp results and corre-
sponding condition numbers for the Cholesky factorization. He also realized that the
condition of the problem was significantly improved by pivoting, and provided the
first firmly based theoretical explanations as to why this was so. Even though the
original work was only about the Cholesky factorization, the approach is a general
approach, and thus can be applied to almost all well-known matrix factorizations.
From (1.1.1) and (1.1.2) we have, by dropping the second-order term,
ΔA = BΔC + ΔBC.     (1.1.3)
The basic idea of this approach is to write the approximate matrix equation (1.1.3)
as a matrix-vector equation by using the special structure and properties of B and
C, then obtain vector-type expressions for ΔB and ΔC. So we will call this the
'matrix-vector equation' approach.
Stewart [44, 1995] was stimulated by Chang's work on the Cholesky factorization
to understand this more deeply, and present simple explanations for what was going
on. Before Chang's work, the most used approach to perturbation analyses of factor-
izations was what we will call the 'matrix equation' approach, which keeps equations
like (1.1.3) in their matrix-matrix form. Stewart [44] (also see Chang, Paige and
Stewart [13, 1996]) used an elegant construct, partly illustrated by the 'up' and 'low'
notation in Section 1.2, which makes the matrix equation approach a far more usable
and intuitive tool. He combined this with deep insights on scaling to produce the new
matrix equation analysis which is appealingly clear, and provides excellent insights
into the sensitivities of the LU and Cholesky factorizations. This new matrix equa-
tion analysis does not in general provide tight results like the matrix-vector equation
analyses do, but its bounds are usually simpler, and provide practical estimates for the
true condition numbers obtained from the latter. This approach is also fairly general,
but for each factorization a particular treatment is needed. This is different from the
matrix-vector equation approach, which can be applied to any factorization directly
without any difficulty.
We combined these two approaches to give a deep understanding of the sensitivity
of the Cholesky factorization, see Chang, Paige and Stewart [13, 1996]. We also
applied the two approaches to the QR factorization and the Cholesky downdating
problem, see Chang, Paige and Stewart [14, 1996], and Chang and Paige [10, 1996].
The interplay of the two approaches goes through the whole thesis.
The main purpose of this thesis is to establish first-order perturbation bounds
that are as tight as possible for the factorizations mentioned above, present the cor-
responding condition numbers, give some condition estimators, and shed light on the
effect of the standard pivoting on the conditioning. Although first-order perturbation
bounds are satisfactory for all but the most delicate work, we also give some rigorous
perturbation bounds for some factorizations. Some results in this thesis have been
presented in the papers mentioned above. Some other new results here have not yet
been published.
1.2 Notation and basics
First we describe some mostly standard notation, and define some elementary con-
cepts used throughout this thesis.
• R^{m×n} denotes the vector space of all m × n real matrices, and R^n = R^{n×1}.
• A matrix is always denoted by a capital letter, e.g. A. The corresponding
lowercase letter with subscripts j and ij refers to the jth column and the
(i, j)th entry respectively, e.g. a_j, a_ij. Also the notation (A)_ij designates the
(i, j)th entry. A(i, :) denotes the ith row of A and A(:, j) the jth column.
• A vector is represented by a lowercase letter, e.g. b. The individual components
are denoted with single subscripts, e.g. b_i.
• R(A) denotes the space spanned by the columns of A.
• λ(A) denotes an eigenvalue of a matrix A; ρ(A) denotes the spectral radius of
A, i.e. ρ(A) = max |λ(A)|.
• σ(A) denotes a nonzero singular value of a matrix A; σ_max(A) and σ_min(A)
denote the largest and smallest nonzero singular values of A, respectively.
• Let A = (a_ij) be an m × n matrix; then |A| is defined by |A| = (|a_ij|).
• Let t be a scalar and let A(t) = (a_ij(t)) be an m × n matrix. If a_ij(t) is a
differentiable function of t for all i and j, then we say A(t) is differentiable with
respect to t and define Ȧ(t) ≡ dA(t)/dt = (da_ij(t)/dt).
• ||·|| denotes a vector norm or matrix norm.
• The 1-norm, 2-norm (or Euclidean norm), and ∞-norm of an n-dimensional vector
x are defined respectively by
||x||_1 = Σ_{i=1}^n |x_i|,   ||x||_2 = (Σ_{i=1}^n x_i²)^{1/2},   ||x||_∞ = max_{1≤i≤n} |x_i|.
• The S-norm (S for summation), F-norm (or Frobenius norm), and M-norm (M
for maximum) of an m × n matrix A are defined respectively by
||A||_S = Σ_{i,j} |a_ij|,   ||A||_F = (Σ_{i,j} a_ij²)^{1/2},   ||A||_M = max_{i,j} |a_ij|.
• The 1-norm, 2-norm (or spectral norm), and ∞-norm of an m × n matrix A
are given respectively by
||A||_1 = max_j Σ_i |a_ij|,   ||A||_2 = σ_max(A),   ||A||_∞ = max_i Σ_j |a_ij|.
• κ_ν(A) = ||A†||_ν ||A||_ν denotes the standard condition number of matrix A, and
cond_ν(A) = || |A†||A| ||_ν the Bauer–Skeel condition number of matrix A, when
the norm ||·||_ν is used, where A† is the Moore–Penrose generalized inverse of A. If
A is nonsingular, then κ_ν(A) = ||A^{-1}||_ν ||A||_ν and cond_ν(A) = || |A^{-1}||A| ||_ν.
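Both condition numbers are easy to compute for small matrices. The following NumPy sketch (the helper names and the 2 × 2 example are ours, not from the thesis) also illustrates the well-known fact that the Bauer–Skeel number is invariant under row scaling A → DA for positive diagonal D, while the standard condition number is not.

```python
import numpy as np

def kappa(A, p=2):
    """Standard condition number kappa_p(A) = ||A^{-1}||_p ||A||_p (A nonsingular)."""
    return np.linalg.norm(np.linalg.inv(A), p) * np.linalg.norm(A, p)

def bauer_skeel(A, p=np.inf):
    """Bauer-Skeel condition number cond_p(A) = || |A^{-1}| |A| ||_p."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), p)

# Row scaling A -> D A leaves cond unchanged but can ruin kappa:
A = np.array([[1.0, 1.0], [1.0, -1.0]])
D = np.diag([1.0, 1e-6])
```

Since |(DA)^{-1}| |DA| = |A^{-1}| |D^{-1}| |D| |A| = |A^{-1}| |A|, the Bauer–Skeel number ignores the bad row scaling that inflates κ.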
• Let C = [c_1, c_2, . . . , c_n] be an m × n matrix; then vec(C) is defined by
vec(C) = [c_1^T, c_2^T, . . . , c_n^T]^T.
Now we describe some special notation used throughout this thesis.
• D_n always denotes the set of all n × n real positive definite diagonal matrices.
• Let X = (x_ij) be an n × n matrix. The upper triangular part, strictly lower
triangular part and strictly upper triangular part of X are denoted by ut(X),
slt(X) and sut(X) respectively, and the diagonal of X is denoted by
diag(X) ≡ diag(x_11, . . . , x_nn).
• For any n × n matrix X = (x_ij), we define the upper and lower triangular
matrices
up(X) ≡ sut(X) + (1/2)diag(X),   low(X) ≡ slt(X) + (1/2)diag(X).   (1.2.3)
• For any n × n matrix C = [c_1, . . . , c_n], denote by c_j^(i) the vector of the first i
elements of c_j, and by c_j_(i) the vector of the last i elements of c_j. With these,
we define ('u' denotes 'upper', 'sl' denotes 'strictly lower')
uvec(C) ≡ [c_1^(1)T, c_2^(2)T, . . . , c_n^(n)T]^T,   slvec(C) ≡ [c_1_(n−1)^T, c_2_(n−2)^T, . . . , c_{n−1}_(1)^T]^T.   (1.2.4)
These are the vectors formed by stacking the columns of the upper triangular
part of C into one long vector, and by stacking the columns of the strictly lower
triangular part of C into one long vector, respectively.
The 'low' and 'up' notation has the following basic properties. For general X ∈ R^{n×n},
low(X) + up(X) = X,   ||up(X)||_F ≤ ||X||_F.
For symmetric X ∈ R^{n×n},
up(X) + up(X)^T = X,   ||up(X)||_F ≤ (1/√2)||X||_F.   (1.2.7)
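The 'up', 'low' and 'uvec' operations, and the properties just listed, take only a few lines of NumPy; the function names below are ours, chosen to match the notation (a sketch, not code from the thesis).

```python
import numpy as np

def up(X):
    """Strictly upper triangular part of X plus half of its diagonal."""
    return np.triu(X, 1) + 0.5 * np.diag(np.diag(X))

def low(X):
    """Strictly lower triangular part of X plus half of its diagonal."""
    return np.tril(X, -1) + 0.5 * np.diag(np.diag(X))

def uvec(X):
    """Stack the columns of the upper triangular part of X (length n(n+1)/2)."""
    n = X.shape[0]
    return np.concatenate([X[: j + 1, j] for j in range(n)])

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))
S = X + X.T                                   # a symmetric matrix

assert np.allclose(up(X) + low(X), X)         # low(X) + up(X) = X
assert np.allclose(up(S) + up(S).T, S)        # symmetric case
assert np.linalg.norm(up(S)) <= np.linalg.norm(S) / np.sqrt(2) + 1e-12
```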
The following well-known theorem obtained by van der Sluis [51, 1969] will often
be referred to when we discuss the effect of scaling on the condition estimators in this
thesis.
Theorem 1.2.1 If S ∈ R^{n×n} is symmetric positive definite and D_s = diag(S)^{1/2},
then
κ₂(D_s^{-1}SD_s^{-1}) ≤ n inf_{D∈D_n} κ₂(D^{-1}SD^{-1}).   □
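A quick numerical illustration of this theorem, comparing the diagonally scaled κ₂ with random positive diagonal scalings (a sketch with invented data; the random search only samples the infimum from above, so the inequality must hold against it a fortiori).

```python
import numpy as np

def kappa2(M):
    # 2-norm condition number via singular values.
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)                   # symmetric positive definite

Ds_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
scaled = Ds_inv @ S @ Ds_inv                  # D_s^{-1} S D_s^{-1}, unit diagonal

# Sample random positive diagonal scalings D^{-1} S D^{-1}.
best = min(kappa2(np.diag(1.0 / d) @ S @ np.diag(1.0 / d))
           for d in np.exp(rng.standard_normal((200, n))))

assert kappa2(scaled) <= n * best * (1 + 1e-9)
```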
We will often use the following results when discussing the effect of standard
pivoting on the condition numbers.
Theorem 1.2.2 Let T ∈ R^{n×n} be a nonsingular upper triangular matrix satisfying
|t_ii| ≥ |t_ij| for all j > i.   (1.2.17)
Then with D ≡ diag(T) and T̃ ≡ D^{-1}T,
|(T̃^{-1})_ij| ≤ 2^{j−i−1} for all j > i,   (1.2.18)
||T̃^{-1}||_M ≤ 2^{n−2},   (1.2.19)
||T̃^{-1}||_1 ≤ 2^{n−1},   (1.2.20)
||T̃^{-1}||_∞ ≤ 2^{n−1}.   (1.2.21)
Proof. Let T̃ = D^{-1}T. Then we have 1 = |t̃_ii| ≥ |t̃_ij|, and it is easy to show by
induction on j − i that |(T̃^{-1})_ij| ≤ 2^{j−i−1} for j > i, which is (1.2.18). The bounds
(1.2.19), (1.2.20) and (1.2.21) follow by taking the corresponding norms of these
elementwise bounds. All of the inequalities in (1.2.18), (1.2.19), (1.2.20) and (1.2.21)
become equalities with
T̃: t̃_ii = 1, t̃_ij = −1 for j > i,   (1.2.22)
since if T has the form of (1.2.22), then we easily verify that (T̃^{-1})_ij = 2^{j−i−1}
for j > i, i.e.
T̃^{-1} = [1 1 2 4 . . . 2^{n−2};
          0 1 1 2 . . . 2^{n−3};
          . . .
          0 . . . 0 1 1;
          0 . . . 0 0 1]. □
Chapter 2
The Cholesky Factorization
2.1 Introduction
The Cholesky factorization is a fundamental tool in matrix computations: given an
n × n real symmetric positive definite matrix A, there exists a unique upper triangular
matrix R with positive diagonal entries such that
A = R^T R.
R is called the Cholesky factor.
There are different algorithmic forms of the Cholesky factorization. The following
algorithm is the 'bordered' form.
Algorithm CHOL: Given a symmetric positive definite A ∈ R^{n×n}, this algorithm
computes the Cholesky factorization A = R^T R.
for j = 1:n
    for i = 1:j−1
        r_ij = (a_ij − Σ_{k=1}^{i−1} r_ki r_kj)/r_ii
    end
    r_jj = (a_jj − Σ_{k=1}^{j−1} r_kj²)^{1/2}
end
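A direct NumPy transcription of Algorithm CHOL might look as follows (a sketch; the function name is ours):

```python
import numpy as np

def chol_bordered(A):
    """Compute the upper triangular R with positive diagonal and A = R^T R,
    column by column as in Algorithm CHOL ('bordered' form)."""
    n = A.shape[0]
    R = np.zeros((n, n))
    for j in range(n):
        for i in range(j):
            # r_ij = (a_ij - sum_{k<i} r_ki r_kj) / r_ii
            R[i, j] = (A[i, j] - R[:i, i] @ R[:i, j]) / R[i, i]
        # r_jj = (a_jj - sum_{k<j} r_kj^2)^{1/2}
        R[j, j] = np.sqrt(A[j, j] - R[:j, j] @ R[:j, j])
    return R
```

In practice one would call a library routine (e.g. `np.linalg.cholesky`, which returns the lower triangular factor L = R^T); the loop form above mirrors the algorithm as stated.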
CHAPTER 2. THE CHOLESKY FACTORIZATION
Let ΔA ∈ R^{n×n} be a symmetric matrix such that A + ΔA is still symmetric
positive definite; then A + ΔA has the unique Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR).
The goal of the perturbation analysis for the Cholesky factorization is to determine
a bound on ||ΔR|| (or |ΔR|) in terms of (a bound on) ||ΔA|| (or |ΔA|).
The rest of this chapter is organized as follows. In Section 2.2 we consider the case
where only a bound on ||ΔA|| is known. We refer to this as 'norm-bounded changes in
A'. First-order and rigorous tight bounds are presented by the so-called matrix-vector
equation approach, and somewhat weaker but more insightful and computationally
applicable bounds are also given by the so-called matrix equation approach. In
Section 2.3 we make a similar analysis to that in Section 2.2 for the case where a
bound on |ΔA| is known. We refer to this as 'component-bounded changes in A'. In
both of these sections, we derive useful upper bounds on the condition of the problem
when we use pivoting, and give numerical results to confirm our theoretical analyses.
Finally we summarize our findings and point out future work in Section 2.4.
This Cholesky analysis (and particularly Section 2.2.1) may also be taken as
an introduction to the general approach to perturbation analysis of factorizations
proposed by this thesis.
2.2 Perturbation analysis with norm-bounded
changes in A
There have been several papers dealing with the perturbation analysis for the Cholesky
factorization with norm-bounded changes in A. The first result was that of Stew-
art [39, 1977]. It was further modified and improved by Sun [46, 1991], who in-
cluded a first-order perturbation result. Using a different approach, Stewart [41,
1993] obtained the same first-order perturbation result. Recently Drmač, Omladič
and Veselić [20, 1994] presented perturbation results of a different flavor. They
made a perturbation analysis for the Cholesky factorization of H = D_s^{-1}AD_s^{-1} with
D_s = diag(a_ii^{1/2}), instead of A. The advantage of their approach is that a good bound
can be obtained when ΔA corresponds to backward rounding errors. So their result
will be referred to in the next section.
The main goal of this section is to establish new first-order bounds on the norm
of the perturbation in the Cholesky factor, smaller than those of Sun [46, 1991] and
Stewart [41, 1993], and to present a condition number which more closely reflects the
true sensitivity of the problem. Also, we give rigorous perturbation bounds. Many
of the results have been presented in Chang, Paige and Stewart [13, 1996].
2.2.1 First-order perturbation bounds
We first obtain an equation and an expression for Ṙ(0) in the Cholesky factorization
A + tG = R^T(t)R(t); then we use these to obtain our new first-order perturbation
bounds by the matrix-vector equation approach and the matrix equation approach.
The first approach will provide a sharp bound, resulting in the condition number
for the Cholesky factorization with norm-bounded changes in A, while the second
approach provides results that are usually weaker but easier to interpret, and allows
efficient computation of satisfactory estimates for the actual condition number.
Rate of change of R
Here we derive the basic results on how R changes as A changes, from which we
then derive Sun's [46, 1991] results. The following theorem summarizes the results we use later.
Theorem 2.2.1 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = R^T R, let G ∈ R^{n×n} be symmetric, and let ΔA = εG, for some
ε ≥ 0. If
ρ(A^{-1}ΔA) < 1,   (2.2.1)
then A + ΔA has the Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR),   (2.2.2)
where
ΔR = εṘ(0) + O(ε²),   (2.2.3)
and Ṙ(0) is defined by the unique Cholesky factorization
A + tG = R^T(t)R(t),   |t| ≤ ε,   (2.2.4)
and so satisfies the equations
R^TṘ(0) + Ṙ(0)^TR = G,   (2.2.5)
Ṙ(0) = up(R^{-T}GR^{-1})R,   (2.2.6)
where the 'up' notation is defined by (1.2.3).
Proof. If (2.2.1) holds, then for all |t| ≤ ε the spectral radius of tR^{-T}GR^{-1} satisfies
ρ(tR^{-T}GR^{-1}) = |t|ρ(A^{-1}G) ≤ ερ(A^{-1}G) = ρ(A^{-1}ΔA) < 1.
Therefore for all |t| ≤ ε, A + tG = R^T(I + tR^{-T}GR^{-1})R is symmetric positive definite
and so has the unique Cholesky factorization (2.2.4). Notice that R(0) = R and
R(ε) = R + ΔR, so (2.2.2) holds.
It is easy to verify that R(t) is twice continuously differentiable for |t| ≤ ε from
the algorithm for the Cholesky factorization. If we differentiate (2.2.4) and set t = 0
in the result, we obtain (2.2.5), which we will see is a linear equation uniquely defining
the elements of upper triangular Ṙ(0) in terms of the elements of G. From upper
triangular Ṙ(0)R^{-1} in
(Ṙ(0)R^{-1})^T + Ṙ(0)R^{-1} = R^{-T}GR^{-1},
we see with the 'up' notation in (1.2.3) that (2.2.6) holds. Finally the Taylor expansion
for R(t) about t = 0 gives (2.2.3) at t = ε. □
Using Theorem 2.2.1 we can now easily obtain the first-order perturbation bound
due to Sun [46, 1991], also proved by Stewart [41, 1993] by a different approach.
Theorem 2.2.2 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = R^T R, and let ΔA be a real symmetric n × n matrix satisfying
||ΔA||_F ≤ ε||A||_2. If
κ₂(A)ε < 1,   (2.2.7)
then A + ΔA has the Cholesky factorization
A + ΔA = (R + ΔR)^T(R + ΔR),
where
||ΔR||_F/||R||_2 ≤ (1/√2)κ₂(A)ε + O(ε²).   (2.2.8)
Proof. Let G = ΔA/ε (if ε = 0, the theorem is trivial). Then ||G||_F ≤ ||A||_2. Since
ρ(A^{-1}ΔA) ≤ ||A^{-1}ΔA||_2 ≤ ||A^{-1}||_2||ΔA||_2 ≤ κ₂(A)ε,
the assumption (2.2.7) implies that (2.2.1) holds, so the conclusion of Theorem 2.2.1
holds here. By using the fact that ||up(X)||_F ≤ (1/√2)||X||_F for any symmetric X (see
(1.2.7)), we have from (2.2.6) that
||Ṙ(0)||_F ≤ (1/√2)||R^{-T}GR^{-1}||_F||R||_2 ≤ (1/√2)||A^{-1}||_2||A||_2||R||_2.   (2.2.11)
Then (2.2.8) follows immediately from the Taylor expansion (2.2.3). □
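The first-order bound (2.2.8) is easy to check numerically for a random symmetric perturbation with ||ΔA||_F = ε||A||_2; the following sketch (invented data, not from the thesis) does so.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                     # symmetric positive definite

eps = 1e-8
G = rng.standard_normal((n, n))
G = (G + G.T) / 2.0
G *= np.linalg.norm(A, 2) / np.linalg.norm(G)   # now ||G||_F = ||A||_2
dA = eps * G                                    # so ||dA||_F = eps ||A||_2

R = np.linalg.cholesky(A).T                     # upper triangular Cholesky factor
dR = np.linalg.cholesky(A + dA).T - R

lhs = np.linalg.norm(dR) / np.linalg.norm(R, 2)
rhs = np.linalg.cond(A) * eps / np.sqrt(2)      # first-order bound (2.2.8)
assert lhs <= rhs * (1 + 1e-3)                  # small slack for the O(eps^2) term
```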
Clearly from (2.2.8) we see (1/√2)κ₂(A) can be regarded as a measure of the sensitivity
of the Cholesky factorization. Since a condition number, as a function of a matrix of a
certain class, has to come from a bound which is attainable to first order for any matrix
in the given class (see (2.2.19) for a more formal definition of the condition number of
the Cholesky factorization with norm-bounded changes in A), we will use this rigorous
terminology, and use the qualified term 'condition estimator' when this criterion is not
met. For general A the first-order bound in (2.2.8) is not attainable; in other words,
we are not always able to choose a symmetric ΔA satisfying ||ΔA||_F ≤ ε||A||_2 to make
(2.2.11) an equality. We could use a simple example to illustrate this, but we choose
not to do so here, as it will be obvious after we obtain the actual condition number.
Therefore we say (1/√2)κ₂(A) is a condition estimator for the Cholesky factorization.
We have seen that the basis for deriving first-order perturbation bounds for R is the
equation (2.2.5) (or the expression (2.2.6) for its solution), which will be used later.
Our following analyses will be based on the same assumptions as in Theorem 2.2.2.
Matrix-vector equation analysis
Now we would like to derive an attainable first-order perturbation bound.
The upper and lower triangular parts of the matrix equation (2.2.5) contain iden-
tical information. The upper triangular part can be rewritten in the following form
by using the 'uvec' notation in (1.2.4) (for the derivation, see the Appendix of Chang,
Paige and Stewart [13, 1996]):
W_R uvec(Ṙ(0)) = duvec(G),   (2.2.12)
where W_R ∈ R^{n(n+1)/2 × n(n+1)/2} is a lower triangular matrix whose nonzero
elements are formed from the elements of R.
Note that for any upper triangular X, ||uvec(X)||_2 = ||X||_F. To help our norm
analysis, for any matrix G ∈ R^{n×n}, duvec(G) denotes the vector of the elements
g_ij, i ≤ j, scaled so that for any symmetric matrix G we have ||duvec(G)||_2 = ||G||_F.
Since R is nonsingular, W_R is also, and from (2.2.12)
uvec(Ṙ(0)) = W_R^{-1} duvec(G).   (2.2.16)
We can choose G such that duvec(G) lies in the space spanned by the right singular
vectors corresponding to the largest singular value of W_R^{-1}, with ||G||_F = ||A||_2. Using
the Taylor expansion (2.2.3) and ||A||_2 = ||R||_2^2, we see the resulting bound is
attainable to first order in ε. Thus for the Cholesky factorization with norm-bounded
changes in A the condition number (with respect to the combination of the F- and
2-norms)

κ_c(A) ≡ lim_{ε→0} sup { ||ΔR||_F / (ε||R||_2) : A + ΔA = (R + ΔR)ᵀ(R + ΔR), ||ΔA||_F ≤ ε||A||_2 }  (2.2.19)

is given by

κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2}.  (2.2.20)
Obviously with the definition of κ_c(A) we have from (2.2.8) that

κ_c(A) ≤ κ_2(A)/√2.

This upper bound on κ_c(A) is achieved if R is an n × n identity matrix with n ≥ 2,
and so is tight.
We now derive a lower bound on κ_c(A). Observe that the n × n bottom right
hand corner of W_R is just diag(1, . . . , 1, 2) r_nn, so that W_R has the form
Therefore it follows that

κ_c(A) ≥ κ_2^{1/2}(A)/2.

This bound is tight for any n, since equality will hold by taking R = diag(r_ii), with
0 < √2 r_nn ≤ r_ii, i ≠ n.
We summarize these results as the following theorem.

Theorem 2.2.3 With the same assumptions as in Theorem 2.2.2, A + ΔA has the
unique Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),  (2.2.24)

such that

||ΔR||_F / ||R||_2 ≤ κ_c(A) ε + O(ε²),  (2.2.25)

where κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2}, and the first-order bound in (2.2.25) is attainable.
□
From (2.2.26) we know the new first-order bound in (2.2.25) is at least as good
as that in (2.2.8), but it suggests the former may be considerably smaller than the
latter. Consider the following example.
Example 1: Let A = diag(1, δ²). Then R = diag(1, δ) and W_R = diag(2, √2, 2δ).
When 0 < δ ≤ 1/4, we obtain

κ_c(A) = 1/(2δ),  while  κ_2(A)/√2 = 1/(√2 δ²).

We see that the first-order perturbation bound (2.2.8) can severely overestimate the
effect of a perturbation in A.
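The gap between these two measures is easy to check numerically. The sketch below (in Python with NumPy, rather than the Matlab used for the experiments in this thesis; the helper name is ours) applies random symmetric perturbations with ||ΔA||_F = ε||A||_2 to A = diag(1, δ²) and compares the observed sensitivity ||ΔR||_F/(ε||R||_2) with κ_c(A) = 1/(2δ) and with the estimator κ_2(A)/√2 from (2.2.8):

```python
import numpy as np

def chol_sensitivity(A, eps=1e-9, trials=200, rng=np.random.default_rng(0)):
    """Largest observed ||dR||_F / (eps ||R||_2) over random symmetric dA
    with ||dA||_F = eps ||A||_2 (a lower bound on the true condition number)."""
    R = np.linalg.cholesky(A).T          # A = R^T R, R upper triangular
    worst = 0.0
    for _ in range(trials):
        E = rng.standard_normal(A.shape)
        E = (E + E.T) / 2                # random symmetric direction
        dA = eps * np.linalg.norm(A, 2) * E / np.linalg.norm(E, 'fro')
        Rp = np.linalg.cholesky(A + dA).T
        worst = max(worst, np.linalg.norm(Rp - R, 'fro') /
                           (eps * np.linalg.norm(R, 2)))
    return worst

delta = 1e-2
A = np.diag([1.0, delta**2])
observed = chol_sensitivity(A)
kappa_c = 1.0 / (2 * delta)                    # = ||W_R^{-1}||_2 ||A||_2^{1/2}
estimator = np.linalg.cond(A, 2) / np.sqrt(2)  # the quantity in (2.2.8)
```

The observed sensitivity stays below κ_c(A) (to first order in ε), while the estimator from (2.2.8) is larger by a factor of order 1/δ.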
But it is possible that κ_c(A) has the same order as κ_2(A).

Example 2: If A = [ · ] with small δ > 0, then R = [ · ], and simple computations
give κ_c(A) of the same order as κ_2(A).

Pivoting usually improves the condition of the problem, as we now show.
Suppose the Cholesky factorization of A is computed by using the standard
symmetric pivoting strategy: PAPᵀ = RᵀR, where P is an n × n permutation matrix
designed so that rows and columns of A are interchanged, during the computation of
the reduction, to make the leading diagonal elements of R as large as possible. Let
the Cholesky factorization of P(A + ΔA)Pᵀ be P(A + ΔA)Pᵀ = (R + ΔR)ᵀ(R + ΔR).
Then by Theorem 2.2.3 we have

||ΔR||_F / ||R||_2 ≤ κ_c(PAPᵀ) ε + O(ε²).
Note that the first-order bound in (2.2.8) does not change when the Cholesky factorization
of A is computed by using any pivoting strategy. Clearly the perturbation
bound (2.2.25) more closely reflects the structure of the problem. Many numerical
experiments with the standard pivoting strategy suggest that κ_c(PAPᵀ) usually
has the same order as κ_2^{1/2}(A), and in fact κ_c(PAPᵀ) can be bounded above by
κ_2^{1/2}(A) √(n(n+1)(4^n + 6n − 1)/6), see (2.2.39). Using the standard symmetric pivoting
strategy in Example 2 gives κ_c(PAPᵀ) = O(1/δ), showing how pivoting can improve the condition of the
problem (as measured by our condition number) by an arbitrary amount.
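The standard symmetric pivoting strategy described above can be sketched as follows; this is our own illustrative implementation (Python/NumPy rather than the Matlab used elsewhere in this thesis), not code from the thesis:

```python
import numpy as np

def pivoted_cholesky(A):
    """Cholesky with standard symmetric (diagonal) pivoting:
    returns P, R with P A P^T = R^T R and r_11 >= r_22 >= ... > 0."""
    A = A.copy().astype(float)
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n):
        # bring the largest remaining diagonal element to position k
        j = k + np.argmax(np.diag(A)[k:])
        A[[k, j], :] = A[[j, k], :]
        A[:, [k, j]] = A[:, [j, k]]
        piv[[k, j]] = piv[[j, k]]
        # one step of the (upper triangular) Cholesky reduction
        A[k, k] = np.sqrt(A[k, k])
        A[k, k+1:] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k, k+1:], A[k, k+1:])
    R = np.triu(A)
    P = np.eye(n)[piv]      # row-permutation matrix: P A P^T = A[piv][:, piv]
    return P, R
```

Because each pivot is the largest remaining diagonal element, and the Schur complement updates only shrink the diagonals, the diagonal of R is nonincreasing.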
There have been several techniques for estimating the 2-norm of the inverse of a
triangular matrix, e.g., Cline et al [16, 1979], Cline et al [15, 1982] and Dixon [18,
1983]. A comprehensive, comparative survey of these techniques has been given by
Higham [27, 1987]. Using these techniques, ||W_R^{-1}||_2 could be estimated at the cost
of solving a few linear systems with matrices W_R and W_Rᵀ. To solve the former is
equivalent to solving RᵀX + XᵀR = G (G symmetric) for upper triangular X,
and the cost is O(n³). Even though W_Rᵀ y = c is not the transpose of the above
matrix equation, it can also be solved in O(n³) by using the special structure of W_R.
However, since the Cholesky factorization itself costs O(n³), such a computation would
rarely be considered feasible. Of course, if it is known that κ_c(PAPᵀ) ≈ κ_2^{1/2}(A)/2, as
usually happens when we use the standard symmetric pivoting, then we need only
estimate κ_2^{1/2}(A) = κ_2(R) for this case, and this can be done in O(n²). For a practical
approach to the general case, see the following matrix equation analysis.
Matrix equation analysis
As we saw, κ_c(A) is unreasonably expensive to compute or estimate directly with
the usual approach, except when we use pivoting, in which case κ_c(PAPᵀ) usually
approaches its lower bound κ_2^{1/2}(A)/2, see (2.2.39). Fortunately, the approach of
Stewart [44, 1995] can be extended to obtain an excellent upper bound on κ_c(A), to
give considerable insight into what is going on, and to lead to efficient and practical
condition estimators even when we do not use pivoting.
In Theorem 2.2.2 we used the expression for Ṙ(0) in (2.2.6) to derive Sun's first-order
perturbation bound. Now we again look at (2.2.6), repeated here for clarity:

Ṙ(0) = up(R^{-T} G R^{-1}) R.

A useful observation is that for any matrix B ∈ R^{n×n} and diagonal matrix D ∈ R^{n×n},

up(D B D^{-1}) = D up(B) D^{-1}.

Let D_n be the set of all n × n real positive definite diagonal matrices. For any
D = diag(δ_1, . . . , δ_n) ∈ D_n we can take R = D R̄, which leads to cancellation of the
D^{-1} with D. Then from the Taylor expansion (2.2.3) we obtain the first-order bound
(2.2.32) in terms of κ'_c(A, D), and defining

κ'_c(A) ≡ inf_{D ∈ D_n} κ'_c(A, D),

we have the encouraging result

κ_c(A) ≤ κ'_c(A) ≤ κ'_c(A, I) = κ_2(R) = κ_2^{1/2}(A).

Here (2.2.31) shows that 1/√2 times the first-order bound in (2.2.32) is at least as good as that
in (2.2.8), the first-order perturbation bound in Sun [46, 1991] and Stewart [41, 1993].
With (2.2.19), the definition of κ_c(A), we see from (2.2.32) that κ_c(A) ≤ κ'_c(A).
But it is useful to prove this more delicately, and so obtain some indication of how
weak κ'_c(A) is as an approximation to κ_c(A). We know G can be chosen to make
(2.2.17) an equality, so that for such a G we obtain a chain of inequalities leading
from κ_c(A) to κ'_c(A, D) for any D ∈ D_n. Note the two inequalities (2.2.33) and (2.2.34) in going from κ_c(A)
to κ'_c(A, D).
Now we summarize the above results as the following theorem.

Theorem 2.2.4 With the same assumptions as in Theorem 2.2.2, A + ΔA has
the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),

such that the first-order bound (2.2.37) holds, where κ'_c(A) is as in (2.2.29) and (2.2.30). □
This matrix equation approach is very simple, yet gives considerable insight into
why the Cholesky factorization can be far better conditioned than we previously
thought. For example, if the ill-conditioning of R is mostly due to bad scaling of the
rows, then a correct choice of D in R = D R̄ can give κ_2(R̄) very near one, so κ'_c(A, D)
will approach twice the lower bound κ_2^{1/2}(A)/2 on κ_c(A). As an illustration, suppose

R = [1 1; 0 δ];

then for small δ > 0, κ_2(R) = O(1/δ). But if we set D = diag(1, δ),
then R̄ = D^{-1}R has κ_2(R̄) = O(1), so that κ'_c(A, D) is close to the lower bound on
κ_c(A). Note how almost all the ill-conditioning was revealed by the diagonal of R.
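This scaling effect is easy to check numerically. The sketch below (Python/NumPy, our own notation) uses R = [1 1; 0 δ], consistent with the stated values κ_2(R) = O(1/δ) and D = diag(1, δ):

```python
import numpy as np

delta = 1e-3
R = np.array([[1.0, 1.0],
              [0.0, delta]])       # ill conditioned: kappa_2(R) = O(1/delta)
D = np.diag([1.0, delta])          # captures the scale of the rows of R
Rbar = np.linalg.inv(D) @ R        # R = D Rbar; rows of Rbar are O(1)

kappa_R = np.linalg.cond(R, 2)     # large, O(1/delta)
kappa_Rbar = np.linalg.cond(Rbar, 2)   # modest, O(1)
```

Here κ_2(R) is of order 1/δ while κ_2(R̄) remains of order one: the ill-conditioning of R was purely an artifact of row scaling.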
This also provides another explanation as to why the standard symmetric pivoting
of A is so successful, making κ_c(PAPᵀ) approach its lower bound in nearly
all cases. If A is ill-conditioned (so there is a large distance between the lower and
upper bounds on κ_c(A)) and the Cholesky factorization is computed with standard
symmetric pivoting, the ill-conditioning of A will usually reveal itself in the diagonal
elements of R. Stewart [43, 1995] has shown that such upper triangular matrices
are artificially ill-conditioned in the sense that they can be made well-conditioned by
scaling the rows via D. This implies that κ'_c(PAPᵀ, D), and therefore (as we shall
show) κ_c(PAPᵀ), will approach its lower bound. We support this mathematically in
the following.
Theorem 2.2.5 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky
factorization PAPᵀ = RᵀR when the standard symmetric pivoting strategy is used.
Then

κ_2^{1/2}(A)/2 ≤ κ_c(PAPᵀ) ≤ κ_2^{1/2}(A) √(n(n+1)(4^n + 6n − 1)/6),  (2.2.39)

and from this (2.2.40) follows.
Proof. We need only prove the last inequality of (2.2.39). In fact standard pivoting
ensures r_ii² ≥ Σ_{k=i}^{j} r_kj² for all j ≥ i, so |r_ii| ≥ |r_ij|. The inequality then
follows immediately from κ_2(R) = κ_2^{1/2}(A) and (1.2.18) in Theorem
1.2.2 with D = diag(R). □
One may not be impressed by the 4^n factor in the upper bound in (2.2.39), and
may wonder if the upper bound can be significantly improved. In fact the upper bound
can nearly be attained by the parametrized family of matrices A = K_nᵀ(θ) K_n(θ),
where

K_n(θ) = diag(1, s, . . . , s^{n−1}) U_n,  (2.2.41)

with U_n unit upper triangular with all off-diagonal elements equal to −c, c = cos(θ)
and s = sin(θ); these matrices were introduced by Kahan [32, 1966]. Notice here
the permutation P corresponding to the standard symmetric pivoting strategy is the
identity. Taking D = diag(K_n(θ)) = diag(1, s, . . . , s^{n−1}), then by the last part of
Theorem 1.2.2 we obtain (2.2.42). Letting D_r = diag(||K_n(θ)(i, :)||_2) and using (1.2.14)
in van der Sluis's Theorem 1.2.1, it follows from (2.2.42) that the bound is approached
as θ → 0. This indicates the upper bound in (2.2.39) is nearly tight.
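A Kahan matrix is easy to generate. The sketch below (Python/NumPy, our own helper name) builds K_n(θ) as the row scaling diag(1, s, …, s^{n−1}) of a unit upper triangular matrix with off-diagonal elements −c, and confirms that it is the Cholesky factor of A = K_nᵀ(θ)K_n(θ) and that its diagonal is already nonincreasing (so diagonal pivoting has nothing to reorder):

```python
import numpy as np

def kahan(n, theta):
    """Kahan matrix K_n(theta): diag(1, s, ..., s^{n-1}) times a unit upper
    triangular matrix whose strictly upper part is all -c."""
    c, s = np.cos(theta), np.sin(theta)
    U = np.triu(-c * np.ones((n, n)), k=1) + np.eye(n)
    return np.diag(s ** np.arange(n)) @ U

n, theta = 8, np.pi / 4
K = kahan(n, theta)
A = K.T @ K                        # SPD matrix whose Cholesky factor is K
R = np.linalg.cholesky(A).T        # upper triangular Cholesky factor of A
```

Since K is upper triangular with positive, nonincreasing diagonal s^{i−1}, the computed R coincides with K.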
Many computational experiments show, with standard symmetric pivoting, that
κ_c(PAPᵀ) is usually quite close to the lower bound κ_2^{1/2}(A)/2, see Section 2.2.3,
Tables 2.2.1 and 2.2.2, but it can be significantly larger for matrices whose Cholesky
factors are Kahan matrices, see Section 2.2.3, Table 2.2.3, as well as the comments there.
In the latter case, something like a rank-revealing pivoting strategy, such as that in
Hong and Pan [31, 1992], will most likely be required to make the condition number
close to its lower bound.
The practical outcome of this simple analysis is that we now have an O(n²) condition
estimator for the Cholesky factor. By (1.2.14) in van der Sluis's Theorem 1.2.1,
κ_2(R̄) will be nearly optimal when the rows of R̄ are equilibrated in the 2-norm. Thus
the estimation procedure for the condition of the Cholesky factorization is to choose
D = D_r = diag(||R(i, :)||_2) in R = D R̄, and use a standard condition estimator (for
matrix inversion) to estimate κ_2(R̄) and κ_2(R) in (2.2.29).
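The choice of D_r in this procedure can be sketched as follows. For clarity the sketch (Python/NumPy, our own names) computes κ_2(R̄) and κ_2(R), the two factors the procedure asks us to estimate, exactly via SVD-based condition numbers instead of with an O(n²) norm estimator; only the row-equilibration step is being illustrated:

```python
import numpy as np

def cholesky_condition_factors(A):
    """Return kappa_2(Rbar) and kappa_2(R), the two factors to be estimated,
    where R is the Cholesky factor of A and Rbar = D_r^{-1} R with
    D_r = diag(||R(i,:)||_2), i.e. R with its rows equilibrated in the 2-norm."""
    R = np.linalg.cholesky(A).T
    Dr = np.linalg.norm(R, axis=1)       # 2-norm of each row of R
    Rbar = R / Dr[:, None]               # D_r^{-1} R: rows of unit 2-norm
    return np.linalg.cond(Rbar, 2), np.linalg.cond(R, 2)

# a badly row-scaled Cholesky factor
delta = 1e-4
R = np.array([[1.0, 1.0], [0.0, delta]])
k_rbar, k_r = cholesky_condition_factors(R.T @ R)
```

For this badly row-scaled R, κ_2(R) is of order 1/δ, yet κ_2(R̄) stays near one, which is exactly the situation in which the estimator based on D_r is most informative.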
Finally we give a new perturbation bound which does not involve any scaling
matrix D, by using a monotone and consistent matrix norm || · || (see Section 1.2).
Theorem 2.2.6 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR, and let ΔA ∈ R^{n×n} be a symmetric matrix satisfying ||ΔA|| ≤
ε||A|| for some monotone and consistent matrix norm. If κ(A)ε < 1, then the first-order
bound (2.2.43) holds.

Proof. Let G ≡ ΔA/ε (if ε = 0 the theorem is trivial). Since

ρ(A^{-1}ΔA) ≤ ||A^{-1}ΔA|| ≤ κ(A)ε < 1,

Theorem 2.2.1 is applicable here. Taking || · || on both sides of (2.2.6), together
with the Taylor expansion (2.2.3), gives (2.2.43). □
Note that cond(R) is invariant under row scaling of R; in other words, the perturbation
bound (2.2.43) provides the scaling automatically. This makes (2.2.43) look
simpler than (2.2.37). Also, from (1.2.12) in van der Sluis's Theorem 1.2.1, we know
that for the ∞-norm,

cond_∞(R) = min_{D ∈ D_n} κ_∞(D^{-1}R) = κ_∞(D_1^{-1}R),

where D_1^{-1}R has rows of unit 1-norm (D_1 = diag(||R(i, :)||_1)). This gives a corresponding
condition estimator with respect to the ∞-norm.
2.2.2 Rigorous perturbation bounds
Usually a first-order bound is satisfactory, but sometimes more careful work is needed.
In this section, we will present rigorous perturbation bounds (with no higher-order
terms) for the Cholesky factor by the matrix-vector equation approach and the matrix
equation approach.
Let A = RᵀR. If A + ΔA = (R + ΔR)ᵀ(R + ΔR), then we have

RᵀΔR + ΔRᵀR = ΔA − ΔRᵀΔR,  (2.2.44)

which gives

ΔR R^{-1} = up[R^{-T}(ΔA − ΔRᵀΔR)R^{-1}].  (2.2.45)
We will use (2.2.14) and (2.2.45) in deriving rigorous perturbation bounds, as well
as the following lemma.

Lemma 2.2.1 (A trivial variant of Theorem 3.1 in Stewart [38, 1973] and Theorem
2.11 in Stewart and Sun [45, 1990]) Let T be a bounded linear operator on a Banach
space B. Assume that T has a bounded inverse, and set δ = ||T^{-1}||^{-1}. Let φ : B → B
be a function that satisfies

||φ(x)|| ≤ η ||x||²

for some η > 0, together with a corresponding Lipschitz condition. Then if ||g|| < δ²/(4η),
the equation

T x = g + φ(x)

has a unique solution x that satisfies

||x|| ≤ 2 ||g|| / δ.
First we give a rigorous perturbation bound by the matrix-vector equation
approach.
Let X = ΔR, and define T_R X = RᵀX + XᵀR and φ(X) = XᵀX. Then (2.2.44)
can be rewritten as

T_R X = ΔA − φ(X).  (2.2.46)
By using Lemma 2.2.1 we can prove the following theorem.

Theorem 2.2.7 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be symmetric. If

||W_R^{-1}||_2 ||W_R^{-1} duvec(ΔA)||_2 < 1/4,  (2.2.47)

then A + ΔA has the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR).  (2.2.48)

Obviously, the weakest bound above can be rewritten in the elegant form (2.2.49).

Proof. See Chang, Paige and Stewart [13, 1996]. □
Thus the assumption (2.2.47) is generally stronger than (2.2.7), and may be greatly
so. Following the same argument as Stewart [41, 1993], however, (2.2.47) is needed
to guarantee that the bound on ||ΔR||_F will not explode. Furthermore, if the ill-conditioning
of R is mostly due to bad scaling of the rows, then a correct choice of D
can give κ_2(D^{-1}R) very near one. In particular, if the standard symmetric pivoting
is used in computing the Cholesky factorization, then ||W_R^{-1}||_2 and ||A^{-1}||_2^{1/2} will
usually be of similar magnitude, see (2.2.40); that is, ||W_R^{-1}||_2 ||W_R^{-1} duvec(ΔA)||_2
and ||A^{-1}||_2 ||ΔA||_F will usually have similar magnitude. So the condition (2.2.47) is
not too constraining. Numerical experiments suggest that (2.2.49) is better than the
equivalent result in Sun [46, 1991], see Chang, Paige and Stewart [13, Section 3.2].
Now we use the matrix equation approach to derive a weaker but practical rigorous
perturbation bound. Let R = D R̄; then from (2.2.45) we have

ΔR R̄^{-1} = up(R^{-T} ΔA R̄^{-1}) − up(R^{-T} ΔRᵀ ΔR R̄^{-1}).  (2.2.51)

Let X = ΔR R̄^{-1}, and define φ(X) = up(D^{-1} Xᵀ X). Then (2.2.51) can be rewritten
as

X = up(R^{-T} ΔA R̄^{-1}) − φ(X).  (2.2.52)
Applying Lemma 2.2.1 to (2.2.52) we obtain the following result.

Theorem 2.2.8 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be symmetric, and assume D ∈ D_n. If
the assumption (2.2.53) holds, then A + ΔA has the Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),  (2.2.54)

with ΔR satisfying the bounds (2.2.55). If the assumption (2.2.53) is strengthened to
(2.2.56), the stronger bound (2.2.57) holds.
In the proof we take η = ||D^{-1}||_2, since ||φ(X)||_F ≤ ||D^{-1}||_2 ||X||_F². By the
assumption (2.2.53) the condition of Lemma 2.2.1 is satisfied, where the operator T here
is the identity. Thus, by Lemma 2.2.1, (2.2.52) has a unique upper triangular
solution ΔR R̄^{-1}, where R̄ = D^{-1}R, that satisfies (2.2.59);
thus (2.2.54) and (2.2.55) follow, the latter using ||ΔR||_F ≤ ||ΔR R̄^{-1}||_F ||R̄||_2.

But R + ΔR in (2.2.54) must have positive diagonal to satisfy our definition of
the Cholesky factorization, where R was given and ΔR R̄^{-1} solves (2.2.52). We now
prove the positivity. From (2.2.59) and (2.2.53) it follows that
R + tΔR is nonsingular for any t ∈ [0, 1], and by continuity of the elements, R + ΔR
has positive diagonal. □
If we take D = I in Theorem 2.2.8, the assumption (2.2.56) can be weakened to
(2.2.60), and we have the perturbation bound (2.2.61), which is due to Sun [46, 1991].
This is slightly stronger than (2.2.57) with D replaced by I. The proof is similar
to that of Theorem 2.2.8; the only difference is using the fact that ||up(X)||_F ≤
||X||_F for any symmetric X ∈ R^{n×n}.
As we know from (1.2.14) in van der Sluis's Theorem 1.2.1, if we take D =
D_r = diag(||R(i, :)||_2) in κ'_c(A, D), then κ'_c(A, D_r) will be nearly minimal. Thus possibly
the bound (2.2.57) is much smaller than (2.2.61). But the assumption (2.2.56) with
D = D_r is possibly much more constraining than (2.2.60).
2.2.3 Numerical experiments
In Section 2.2.1, we made first-order perturbation analyses for the Cholesky factorization
with norm-bounded changes in A using two different approaches, presented
κ_c(A) = ||W_R^{-1}||_2 ||A||_2^{1/2} as the corresponding condition number, and suggested
that κ_c(A) could be approximated in practice by κ'_c(A, D_r)
with D_r = diag(||R(i, :)||_2), which could be estimated by a standard condition estimator
(see for example Higham [30, 1996, Ch. 14]) in O(n²). Also we showed that the
condition of the problem can usually be (significantly) improved by standard symmetric
pivoting. In Section 2.2.2 rigorous perturbation bounds were obtained. In order to
confirm our theoretical analyses, we have carried out several numerical experiments
to compute the following measures of the sensitivity of the Cholesky factorization,
which satisfy (see (1.2.14), (2.2.26), (2.2.30) and (2.2.31)):
The computations were performed in Matlab. Here we give three sets of numerical
examples.
(1) The matrices in the first set have the form A = QΛQᵀ, where Q ∈ R^{n×n} is an
orthogonal matrix obtained from the QR factorization of a random n × n matrix, and
Λ = diag(τ_1, . . . , τ_{n−2}, δ, δ) with τ_1, . . . , τ_{n−2} random positive
numbers and 0 < δ ≤ 1. We generated different matrices by taking all combinations of
n ∈ {5, 10, 15, 20, 25} and δ ∈ {1, 10^{-1}, . . . , 10^{-10}}. The results for n = 25, δ = 10^{-i},
i = 0, 1, . . . , 10 are shown in Table 2.2.1, where P is a permutation corresponding to
the standard symmetric pivoting. Results obtained by putting the two δ's at the top
of Λ were mainly similar.
(2) The second set of matrices are the n × n Pascal matrices (with elements
a_{1j} = a_{i1} = 1, a_{ij} = a_{i,j−1} + a_{i−1,j}), n = 1, . . . , 15. The results are shown in Table 2.2.2,
where P is a permutation corresponding to the standard symmetric pivoting.
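The Pascal matrices of this second test set can be generated directly from the stated recurrence; a minimal sketch (Python/NumPy, our own helper name):

```python
import numpy as np

def pascal(n):
    """n x n Pascal matrix: a_1j = a_i1 = 1, a_ij = a_{i,j-1} + a_{i-1,j}.
    These matrices are symmetric positive definite."""
    A = np.ones((n, n), dtype=float)     # first row and column are all ones
    for i in range(1, n):
        for j in range(1, n):
            A[i, j] = A[i, j - 1] + A[i - 1, j]
    return A

P4 = pascal(4)
```

For example, pascal(3) is [[1, 1, 1], [1, 2, 3], [1, 3, 6]], and the entries grow quickly with n, which is what makes these matrices increasingly ill-conditioned test cases.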
(3) The third set of matrices are n × n A = K_nᵀ(θ)K_n(θ), where the K_n(θ) are Kahan
matrices, see (2.2.41). The results for n = 5, 10, 15, 20, 25 with θ = π/4 are shown
in Table 2.2.3, where Π is a permutation such that the first column and row are
moved to the last column and row positions, and the remaining columns and rows
are moved left and up one position; this permutation Π corresponds to the rank-revealing
pivoting strategy, for details see Hong and Pan [31, 1992]. The permutation
P corresponding to the standard symmetric pivoting is the identity, so the standard
symmetric pivoting does not bring any changes to our condition number and condition
estimators.
We give some comments on the results.
• The experiments confirm that κ_2(A)/√2 can be much larger than κ_c(A) for ill-conditioned
problems, so the first-order bound in (2.2.25) may be much smaller
Table 2.2.1: Results for matrix A = QΛQᵀ of order 25, Ã = PAPᵀ, D = D_r

Table 2.2.2: Results for Pascal matrices, Ã = PAPᵀ, D = D_r
Table 2.2.3: Results for A = K_nᵀ(θ)K_n(θ), θ = π/4, Ã = ΠAΠᵀ, D = D_r
than that in (2.2.8).
• The standard symmetric pivoting almost always gives an improvement on κ_c(A)
and κ'_c(A, D_r). Table 2.2.2 indicates the improvement can be significant. Our
experiments suggest that if the Cholesky factorization of A is computed using
the standard symmetric pivoting strategy, then the condition number of the
Cholesky factorization κ_c(PAPᵀ) will usually have the same order as its lower
limit κ_2^{1/2}(A)/2 (the ratio in the last columns of Table 2.2.1 and Table 2.2.2 was
never larger than 4). But Table 2.2.3 shows the ratio can be large. However, such
examples are rare in practice, and furthermore if we adopt the rank-revealing
pivoting strategy, we see from Table 2.2.3 that the ratio 2κ_c(ΠAΠᵀ)/κ_2^{1/2}(A) is again
small.
• Note that in Table 2.2.1 and Table 2.2.3, κ'_c(A, D_r) (respectively κ'_c(PAPᵀ, D_r), κ'_c(ΠAΠᵀ, D_r))
is a very good approximation of κ_c(A) (respectively κ_c(PAPᵀ), κ_c(ΠAΠᵀ)). In Table 2.2.2,
the results with pivoting also show this. For n = 15 without pivoting in Table
2.2.2, κ'_c(A, D_r) overestimates κ_c(A) by a factor of about 250, but is much
better for low n. A study of the n = 2 case shows κ'_c(A) can never be much
larger than κ_c(A), and we have not found an example which shows κ'_c(A, D_r)
can be much larger than κ_c(A) for n > 2, so we suspect κ'_c(A) and κ'_c(A, D_r)
are at worst κ_c(A) times some function of n alone (probably involving something
like 2^n) in general. From Theorem 2.2.5 we know this is true when the
standard symmetric pivoting strategy is used.
2.3 Perturbation analysis with component-bounded
changes in A
In this section we consider the case where a bound on |ΔA| is given. There have been
a few papers dealing with such problems. Sun [47, 1992] first presented a rigorous
perturbation bound on |ΔR| in terms of |ΔA|, which was improved by Sun [48,
1992]. For ΔA corresponding to the backward rounding error in A resulting from
numerically stable computations, e.g. Algorithm CHOL, on finite precision floating
point computers, Drmač, Omladič and Veselić [20, 1994] presented a nice norm-based
perturbation result using their H = D_c^{-1} A D_c^{-1} approach. Sun in [47] and [48] also
included component perturbation bounds for a different and somewhat complicated
form of the backward rounding error in A.
The main purpose of this section is to establish new perturbation bounds when ΔA
corresponds to the backward rounding error, bounds which are sharper than the equivalent
results in [20]. Most of the results have been presented in Chang [8, 1996]. Also we
present first-order perturbation bounds for some other kinds of bounds on |ΔA|.
In Section 2.3.1 we establish first-order perturbation bounds with a general bound
|ΔA| ≤ εE, and apply such results to a special case: |ΔA| ≤ ε|A|. The motivation
for considering this special form is that possibly the relative error in each element
of A has the same reasonable known bound; for example, often the elements of A
cannot be stored exactly on a digital computer, and the matrix actually stored
is A + ΔA with |ΔA| ≤ u|A|, where u is the unit roundoff. In Section 2.3.2 and
Section 2.3.3 we respectively derive first-order and rigorous perturbation bounds with
|ΔA| ≤ ε d dᵀ, where d_i = a_ii^{1/2}, which comes from the backward rounding error
analysis, see Demmel [17, 1989]. In our analyses we use both the matrix-vector
equation approach and the matrix equation approach.
2.3.1 First-order perturbation bound with |ΔA| ≤ ε|A|

First we give the following result by using the matrix-vector equation approach.
Theorem 2.3.1 Let A ∈ R^{n×n} be symmetric positive definite, with the Cholesky
factorization A = RᵀR. Let ΔA ∈ R^{n×n} be a symmetric matrix satisfying |ΔA| ≤ εE
for some nonnegative matrix E and nonnegative ε. If the condition (2.3.1) holds,
where || · || denotes a monotone and consistent matrix norm, then A + ΔA has the
Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),

such that the bounds (2.3.2) and (2.3.3) hold,
where ||X||_M = max_{ij} |x_ij| and ||X||_S = Σ_{ij} |x_ij|, and for the M-norm the first-order
bound in (2.3.3) is attainable. Thus, in particular, if E = |A|, then (2.3.4) and (2.3.5) hold,
and for the M-norm the first-order bound in (2.3.5) is attainable.
Proof. Let G ≡ ΔA/ε (if ε = 0 the theorem is trivial). The conclusion of Theorem
2.2.1 holds, and from (2.2.12), the matrix-vector equation
form of (2.2.5), together with the Taylor expansion (2.2.3), we obtain (2.3.2), from which
(2.3.3) follows. For the M-norm the first-order bound is attained for a suitably chosen
ΔA. □
Theorem 2.3.1 implies that for the Cholesky factorization under relative changes
in the elements of A, the condition number (with respect to the M-norm)

ρ_c(A) = lim_{ε→0} sup { ||ΔR||_M / (ε||R||_M) : A + ΔA = (R + ΔR)ᵀ(R + ΔR), |ΔA| ≤ ε|A| }

is given by the constant in the attainable bound (2.3.5).
We see ρ_c(A) is not very intuitive, and is expensive to estimate directly by any
presently known approach. Fortunately, by the matrix equation approach we have the
following practical results.
Theorem 2.3.2 With the same assumptions as in Theorem 2.3.1, A + ΔA has the
Cholesky factorization

A + ΔA = (R + ΔR)ᵀ(R + ΔR),
and for a monotone and consistent matrix norm || · || the bound (2.3.9) holds, while
for the M-norm the bound (2.3.10) holds, with ρ'_c(A) defined in (2.3.11).
Proof. (2.3.7) can easily be proved by using (2.2.6) and (2.2.3). Noticing |A| ≤ |Rᵀ||R|,
(2.3.8) follows immediately from (2.3.7), and (2.3.9) is obtained by taking the
norm || · ||. Similarly (2.3.10) can be obtained by taking the M-norm and using
the fact that ||AB||_M ≤ ||A||_M ||B||_1 and ||AB||_M ≤ ||A||_∞ ||B||_M for any A ∈ R^{m×p}
and B ∈ R^{p×n}, which can easily be verified. From the definition of ρ_c(A) and (2.3.10),
we see the inequality (2.3.12) holds. □
The quintuple matrix product in (2.3.8) is nice, as it counteracts the effect of
poor column scaling in R (the first and third factors) and poor row
scaling (the fourth factor). This is reflected in both of the first-order bounds in
(2.3.9) and (2.3.10). The significance of this theorem is that we can now estimate a bound
on the relative change in R in O(n²) by using standard condition estimators.
Numerical experiments suggest that usually ρ'_c(A) is a reasonable upper bound on the
condition number ρ_c(A). But the former can be arbitrarily larger than the latter;
for example, for a certain 2 × 2 family A depending on a small parameter δ > 0,
simple computations show the overestimation grows without bound as δ → 0.
It is easy to check that the overestimation was caused by the inequality |up(B)| ≤ |B|
used in deriving (2.3.10): clearly the strictly lower triangular part of |B| can be arbitrarily
larger than that of |up(B)|. Hence (2.3.10) can sometimes overestimate the true
sensitivity of the problem, and so can (2.3.9) for the same reason. We can also give an
example to show that ρ'_c(A) can overestimate ρ_c(A) due to the inequality (2.3.14).
A careful reader may have noticed the following fact: in deriving Theorem
2.2.4 we also used the inequality |up(B)| ≤ |B| (see (2.2.28)), but we mentioned
in Section 2.2.3 that we have not found an example to show κ'_c(A) can be
arbitrarily larger than κ_c(A), and furthermore initial investigations suggest that probably
κ'_c(A)/κ_c(A) can be bounded above by a function of n. Why does the inequality
appear to have different effects? The reason is that here B is a function of R only,
and so has a special structure, whereas in (2.2.28) B has a parameter matrix G, and
for any R possibly G can be chosen such that ||B|| is close to ||up(B)||.
Comparing the first-order bound in (2.3.9) with that in (2.2.43), we see the former
is at least as small as the latter, since cond(Rᵀ) ≤ κ(Rᵀ). If the ill-conditioning of
R is mostly due to bad scaling of the columns, then cond(Rᵀ) ≪ κ(Rᵀ); that is to
say, the former can be much smaller than the latter. That is not surprising, since
the assumption |ΔA| ≤ ε|A| in Theorem 2.3.2 provides more information about the
perturbations in the data than the assumption ||ΔA|| ≤ ε||A|| in Theorem 2.2.6.
The standard pivoting strategy can usually improve ρ_c(A), just as it can usually
improve κ_c(A). In fact we have the following theorem.

Theorem 2.3.3 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky
factorization PAPᵀ = RᵀR when the standard pivoting strategy is used. Then (2.3.15) holds.

Proof. Standard pivoting ensures |r_ii| ≥ |r_ij| for all j ≥ i. Then from the definition of
ρ'_c(A) in (2.3.11), (2.3.15) is immediately obtained by using (1.2.21) in Theorem 1.2.2.
□
From (2.3.15) it is natural to raise the following question: is it possible that
the standard pivoting makes cond_∞(Rᵀ) much worse than that without pivoting,
so that ρ'_c(PAPᵀ) is actually much worse than ρ'_c(A)? The answer is no. In fact
we can show that no pivoting can bring an essential change to cond_∞(Rᵀ). Let
PAPᵀ = RᵀR, where P is any permutation matrix. Let D_1 = diag(||Rᵀ(i, :)||_1) and
let D_2 = diag(||Rᵀ(i, :)||_2). Then ||D_2^{-1}D_1||_2 ≤ √n and ||D_1^{-1}D_2||_2 ≤ 1. By using
van der Sluis's Theorem 1.2.1, we can bound cond_∞(Rᵀ) in terms of κ_∞(D_2^{-1}Rᵀ).
Noticing that D_2 = diag(||Rᵀ(i, :)||_2) = diag(PAPᵀ)^{1/2} = P diag(A)^{1/2} Pᵀ, and that
||H^{-1}||_2 is independent of P, we conclude that permutation has no significant effect on
cond_∞(Rᵀ).
Using the same approaches as in Section 2.2.2, we could easily provide rigorous
perturbation bounds with |ΔA| ≤ ε|A|. But we choose not to do so here, in order to
keep the material and the basic ideas as brief as possible.
2.3.2 First-order perturbation bounds with backward rounding errors
In this section we first use the matrix-vector equation approach to derive tight perturbation
bounds, leading to the condition number χ_c(A) for perturbations having
bounds of the form of the equivalent backward rounding error for the Cholesky factorization;
then we use the matrix equation approach to derive a practical perturbation
bound, leading to a practical estimator χ'_c(A) of χ_c(A). We also compare χ_c(A) with
κ_c(A), and χ'_c(A) with κ'_c(A). Finally we show how standard pivoting improves the
condition number χ_c(A).
Before proceeding we introduce the following result due to Demmel [17, 1989];
see also Higham [30, 1996, Theorems 10.5 and 10.7].

Lemma 2.3.1 Let A ≡ D_c H D_c ∈ R^{n×n} be a symmetric positive definite floating
point matrix, where D_c = diag(A)^{1/2}. If

n ε ||H^{-1}||_2 < 1,  (2.3.16)

where ε = (n + 1)u/(1 − 2(n + 1)u) with u being the unit round-off, then the Cholesky
factorization applied to A succeeds (barring underflow and overflow) and produces a
nonsingular R̃, which satisfies

A + ΔA = R̃ᵀR̃,  |ΔA| ≤ ε d dᵀ,  (2.3.17)

where d_i = a_ii^{1/2}. □
Based on Lemma 2.3.1, we can establish the following bound for the computed
Cholesky factor R̃.

Theorem 2.3.4 With A = RᵀR and the same assumptions as in Lemma 2.3.1, for
the perturbation ΔA and the computed R̃ in (2.3.17), the bounds (2.3.18), (2.3.19)
and (2.3.20) hold, where ΔR = R̃ − R, R̄ ≡ R D_c^{-1}, and W_{R̄} is just W_R in (2.2.15)
with each entry r_ij replaced by r̄_ij. The
first-order bound in (2.3.19) is attainable for the M-norm, and the first-order bound
in (2.3.20) is approximately attainable.
Proof. Let G ≡ ΔA/ε. By (2.3.17) and (2.3.16), we have

ρ(A^{-1}ΔA) = ρ(H^{-1} D_c^{-1} ΔA D_c^{-1}) ≤ ε ||H^{-1}||_2 ||D_c^{-1} d dᵀ D_c^{-1}||_2 = n ε ||H^{-1}||_2 < 1.

Then, as in the proof of Theorem 2.3.1, we can apply Theorem 2.2.1 to show that
(2.3.18) and (2.3.19) hold and that the first-order bound in (2.3.19) is attainable for the
M-norm.

It remains to show (2.3.20) and its attainability. From (2.2.5) in Theorem 2.2.1
and R = R̄ D_c, we have
As we know, (2.2.5) can be rewritten in the matrix-vector equation form (2.2.15), so
similarly (2.3.22) can be rewritten in the following matrix-vector equation form:

W_{R̄} uvec(Ṙ(0) D_c^{-1}) = duvec(D_c^{-1} G D_c^{-1}).  (2.3.23)

It is easy to verify that uvec(Ṙ(0) D_c^{-1}) = D̃_c^{-1} uvec(Ṙ(0)) with D̃_c as in (2.3.21); then
from (2.3.23) we have

uvec(Ṙ(0)) = D̃_c W_{R̄}^{-1} duvec(D_c^{-1} G D_c^{-1}),

which with |G| = |ΔA|/ε ≤ d dᵀ gives

||Ṙ(0)||_F ≤ ||D̃_c W_{R̄}^{-1}||_2 ||duvec(D_c^{-1} d dᵀ D_c^{-1})||_2 = n ||D̃_c W_{R̄}^{-1}||_2.

Thus (2.3.20) is obtained immediately from the Taylor expansion (2.2.3).
Obviously there exists a symmetric matrix F ∈ R^{n×n} for which this bound is
essentially achieved, which shows the first-order bound in (2.3.20) is approximately
attained for such G.

It is easy to verify that D_c = diag(a_jj^{1/2}) = diag(||R(:, j)||_2). Thus R̄, the Cholesky
factor of H (H = D_c^{-1} A D_c^{-1} = D_c^{-1} Rᵀ R D_c^{-1} = R̄ᵀ R̄), has columns of unit Euclidean
length. That is the reason we use the notation D_c, where 'c' denotes 'column'.
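The unit-column-length property of R̄ is easy to verify numerically. A minimal sketch (Python/NumPy, our own names), assuming only the definitions D_c = diag(A)^{1/2} and H = D_c^{-1} A D_c^{-1} used above:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)            # a symmetric positive definite test matrix

Dc = np.diag(np.sqrt(np.diag(A)))      # D_c = diag(a_ii^{1/2})
H = np.linalg.inv(Dc) @ A @ np.linalg.inv(Dc)   # unit diagonal by construction
Rbar = np.linalg.cholesky(H).T         # upper triangular Cholesky factor of H
col_norms = np.linalg.norm(Rbar, axis=0)
```

Since ||R̄(:, j)||_2² = h_jj = 1 for every j, all columns of R̄ have unit Euclidean length, and likewise the column norms of the Cholesky factor R of A reproduce diag(A)^{1/2}.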
Since the first-order bound in (2.3.20) is approximately attainable, the quantity

χ_c(A) ≡ n ||D̃_c W_{R̄}^{-1}||_2 / ||R||_2

can be thought of as the condition number for the Cholesky factorization with
backward rounding error satisfying (2.3.17), when the combination of the F-
and 2-norms is used. Notice here we have relaxed a little the requirement a condition
number should satisfy: strictly, the condition number should be defined by a limiting
supremum as in (2.2.19). But from the proof of Theorem 2.3.1 we see
such relaxing is harmless. The main reason for introducing the condition number
with respect to the combination of the F- and 2-norms, rather than the M-norm, is
that we are interested in the comparison of the results given in this section with those
given in Section 2.2.
We now give
2.2.
a lower bound on x,-(A). Since has the form (cf. (2 .227 ) )
where
Hence we get the following lower bound on χ_c(A):
This bound is tight for any n, since equality will hold by taking R = diag(r_ii), with 0 < r_ii ≤ r_nn, i ≠ n. But it is a little complicated. In fact we can get a slightly weaker but simpler lower bound. Since
we have from (2.3.28) that
Numerical experiments suggest that usually χ_c(A) is smaller or much smaller than κ_c(A), the condition number for a general perturbation ΔA, defined by (2.2.20). We now relate these two condition numbers mathematically for the general case. For the case where standard pivoting is used in computing the Cholesky factorization, see the comment following Theorem 2.3.9.
Theorem 2.3.5
Proof. Since R = R̄D_c, it is easy to verify from the structure of Φ_c that
where V_c is defined by (2.3.21) and
Thus using (2.3.31) and max_i a_ii ≤ ||A||_2, we have
The proof is complete. □
The first inequality in (2.3.30) is attainable, since equality will hold by taking A = cI with c > 0. The second inequality is at least nearly attainable. In fact by taking A = R^T R, with R = diag(δ^{n-1}, δ^{n-2}, ..., δ, 1) + e_1 e_n^T for small δ > 0, we easily obtain
This example also suggests that κ_c(A) is possibly much larger than χ_c(A) if the maximum element on the diagonal of A is much larger than the minimum.
A reader might want to know why the first inequality in (2.3.30) is not χ_c(A) ≤ κ_c(A). This can easily be explained. For a general ΔA, we have by Theorem 2.2.3 that ||ΔR||_F/||R||_2 ≤ κ_c(A)ε, where ε satisfies ||ΔA||_F ≤ ε||A||_2. For the backward rounding error ΔA, we have by Theorem 2.3.4 that ||ΔR||_F/||R||_2 ≤ χ_c(A)ε, where ε satisfies |ΔA| ≤ ε dd^T (see (2.3.17)), from which it follows that ||ΔA||_F ≤ ε||dd^T||_F = ε||R||_F^2, where ||R||_F^2 satisfies the attainable inequalities ||A||_2 ≤ ||R||_F^2 ≤ n||A||_2.
Like κ_c(A), χ_c(A) is difficult to estimate directly. Now we derive practical perturbation bounds by using the matrix equation approach.
Theorem 2.3.6 With A = R^T R and the same assumptions as in Lemma 2.3.1, for the perturbation ΔA and resulting R̃ in (2.3.17) we have
where ΔR = R̃ − R, R̄ ≡ R D_c^{-1}, and
Proof. Let G ≡ ΔA/ε. Since ρ(A^{-1}ΔA) < 1 (see the proof of Theorem 2.3.4), we can apply Theorem 2.2.1 here. From (2.2.6) with R = R̄D_c, it follows that
Then by the Taylor expansion (2.2.3), (2.3.33) follows immediately.
Also from (2.3.38) we have for any D ∈ D_n that
Thus taking the Frobenius norm we have
which with |G| ≤ dd^T gives
Since this holds for any D ∈ D_n, (2.3.34) follows by using the Taylor expansion (2.2.3).
It remains to prove (2.3.35). From (2.3.24) and (2.3.39) we have

||V_c Φ_c^{-1} duvec(D_c^{-1} G D_c^{-1})||_2 ≤ ||R̄^{-1}||_2 ||D_c^{-1} G D_c^{-1}||_F ||R^{-1}D||_2 ||D^{-1}R||_2.   (2.3.40)

Actually this holds for any symmetric G ∈ R^{n×n}, since it was essentially obtained from the matrix equation R^T X + X^T R = G with X triangular by the two different approaches. Notice ||duvec(D_c^{-1} G D_c^{-1})||_2 = ||D_c^{-1} G D_c^{-1}||_F; thus from (2.3.40) we must have

||V_c Φ_c^{-1}||_2 ≤ ||R̄^{-1}||_2 ||R^{-1}D||_2 ||D^{-1}R||_2,

which implies (2.3.35). □
Notice that ||up(X)||_F ≤ ||X||_F/√2 for any symmetric X (see (1.2.7)), and ||R̄||_2^2 = ||H||_2; then from (2.3.38) it is easy to obtain
which is essentially the first-order bound of Drmač, Omladič and Veselić [20, Theorem 3.1], except that their bound is rigorous and only the 2-norm is used. But this bound can severely overestimate the true relative errors of the computed Cholesky factor.
In fact

χ'_c(A) ≤ χ'_c(A, I) = n ||H^{-1}||_2,   (2.3.42)

and n ||H^{-1}||_2 can be arbitrarily larger than χ'_c(A). For example, if A = R^T R with R = [1 1; 0 δ] for small δ > 0, then taking D = diag(1, δ), simple computations give
Thus the new approximation χ'_c(A) to the condition number χ_c(A) is a significant improvement on that of Drmač et al. Furthermore, it is easy to see that ||H^{-1}||_2 is invariant if pivoting is used in computing the Cholesky factorization of A, whereas χ_c(A) and χ'_c(A) depend on any pivoting. Thus the new bounds (2.3.20) and (2.3.34) more closely reflect the true sensitivity of the Cholesky factorization than (2.3.41). However, if the ill-conditioning of R is mostly due to bad scaling of its columns, then n||H^{-1}||_2 is small, and (2.3.41) is as good as χ'_c(A).
We see (2.3.42) implies n||H^{-1}||_2 is also an upper bound on χ_c(A). In fact we have the following stronger result
which can easily be proved by using (2.3.24), (2.3.38) and the fact that for any symmetric X, ||up(X)||_F ≤ ||X||_F/√2. That is to say, the first-order bound in (2.2.20)
is at least as small as that in (2.3.41). The example above suggests the former can be much smaller than the latter.
The practical outcome of Theorem 2.3.6 is that χ'_c(A) is quite easy to estimate. According to (1.2.10) in van der Sluis's Theorem 1.2.1, all we need to do is to choose D = D_r = diag(||R(i, :)||_2) in χ'_c(A, D) in (2.3.37), then use norm estimators (see for example Higham [30, 1996, Ch. 14]) to estimate χ'_c(A, D_r) in O(n^2) operations.
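To illustrate the kind of O(n^2) estimation involved, here is a pure-Python sketch using plain power iteration rather than the estimator of Higham's Ch. 14 (so it is an assumption-laden illustration, not the method cited): ||R^{-1}||_2 for triangular R can be estimated using only triangular solves, each costing O(n^2).

```python
import math

# Sketch of an O(n^2)-per-step estimator for ||R^{-1}||_2 (power iteration
# on (R^T R)^{-1} via triangular solves); an illustration only, not the
# estimator of Higham [30, Ch. 14] referred to in the text.

def forward_solve(R, b):
    # solve R^T x = b (R upper triangular, so R^T is lower triangular)
    n = len(b)
    x = b[:]
    for i in range(n):
        s = sum(R[j][i] * x[j] for j in range(i))
        x[i] = (x[i] - s) / R[i][i]
    return x

def back_solve(R, b):
    # solve R x = b
    n = len(b)
    x = b[:]
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / R[i][i]
    return x

def est_inv_norm2(R, iters=20):
    # the largest eigenvalue of (R^T R)^{-1} equals ||R^{-1}||_2^2
    x = [1.0] * len(R)
    lam = 1.0
    for _ in range(iters):
        y = back_solve(R, forward_solve(R, x))   # y = (R^T R)^{-1} x
        lam = max(abs(v) for v in y)
        x = [v / lam for v in y]
    return math.sqrt(lam)

R = [[2.0, 1.0],
     [0.0, 0.5]]
est = est_inv_norm2(R)

# check against the closed form for this 2x2 example
lam_min = (5.25 - math.sqrt(5.25 ** 2 - 4.0)) / 2.0  # smallest eigenvalue of R^T R
assert abs(est - lam_min ** -0.5) < 1e-6             # est ~ ||R^{-1}||_2
```

Each iteration costs two triangular solves, i.e. O(n^2) operations, which is the cost regime the text refers to.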
Numerical experiments suggest that usually χ'_c(A) is a reasonable approximation to χ_c(A). But the following example shows that χ'_c(A) can still be much larger than χ_c(A):
In Theorem 2.3.6 the F- and 2-norms are used. If we use a monotone and consistent matrix norm ||·||, then from (2.3.38) it is straightforward to show that we have the following perturbation bound instead of (2.3.34):
The advantage here is that the bound does not involve the scaling matrix D.
In Theorem 2.3.5 we established a relationship (2.3.30) between χ_c(A) and κ_c(A) defined in Theorem 2.2.3. Is there a similar relationship between χ'_c(A, D) (or χ'_c(A)) and κ'_c(A, D) (or κ'_c(A)) defined in Theorem 2.2.4? The answer is yes.
Theorem 2.3.7

(1/n)(max_i a_ii/||A||_2) κ'_c(A, D) ≤ χ'_c(A, D) ≤ (||A||_2/min_i a_ii) κ'_c(A, D),

and from this,

(1/n)(max_i a_ii/||A||_2) κ'_c(A) ≤ χ'_c(A) ≤ (||A||_2/min_i a_ii) κ'_c(A).
On the other hand,
The proof is complete. □
The standard pivoting strategy can usually improve χ_c(A) too. In fact we have the following theorem.

Theorem 2.3.8 Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky factorization PAP^T = R^T R when the standard pivoting strategy is used. Then
and so from (2.3.37) we obtain
Standard pivoting ensures |r_ii| ≥ |r_ij| for all j > i. Then (2.3.46) follows immediately by ||R̄^{-1}||_2 = ||H^{-1}||_2^{1/2} and (1.2.18) in Theorem 1.2.2. □
In Theorem 2.2.5 we have
By van der Sluis's result (1.2.16),
where it is possible that
κ_2^{1/2}(H) ≪ κ_2^{1/2}(A) if A is badly scaled, that is, if the columns of R are badly scaled. But 1 ≤ ||H||_2 ≤ n since H is positive definite with h_ii = 1, hence
and it is possible that
||H^{-1}||_2^{1/2} ≪ κ_2^{1/2}(A)
if R has badly scaled columns. Thus it is expected that χ_c(PAP^T) can be arbitrarily smaller than κ_c(PAP^T).
Suppose the Cholesky factorization of A is computed by using the standard symmetric pivoting strategy: PAP^T = R^T R. If the permutation matrix P is known beforehand and the Cholesky factorization is applied to PAP^T directly, it is easy to observe that Lemma 2.3.1 still holds except that now D_c and H should be redefined as

D_c = diag(PAP^T)^{1/2},   H ≡ D_c^{-1} PAP^T D_c^{-1},
and A + ΔA = R̃^T R̃ in (2.3.17) should be replaced by

P(A + ΔA)P^T = R̃^T R̃.

Then by Theorem 2.3.4 we have

||R̃ − R||_F/||R||_2 ≤ χ_c(PAP^T) ε.

This with (2.3.46) suggests that the computed Cholesky factor R̃ has high accuracy.
However, usually the permutation matrix P for A is not known beforehand. The permutation matrix for A + ΔA, which is produced in the computing process, is possibly different from that for A. Fortunately, Higham [30, 1996, Lemma 10.11] showed that if there are no ties in the pivoting strategy for PAP^T = R^T R, then for sufficiently small ΔA, the two permutation matrices are the same. Thus the Cholesky factorization with standard symmetric pivoting will most likely give an R̃ which is about as accurate as possible.
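The effect of the standard symmetric pivoting strategy can be sketched in pure Python on a hypothetical 2 x 2 example (an illustration only, not a case from the text): the permutation brings the largest diagonal entry of A to the front, which forces the diagonal of R to be nonincreasing.

```python
import math

# Hypothetical 2x2 SPD matrix whose larger diagonal entry comes second
A = [[1.0, 0.9],
     [0.9, 4.0]]

# Standard symmetric pivoting: bring the largest diagonal entry to the
# front, i.e. apply the permutation P swapping rows/columns 1 and 2
Ap = [[A[1][1], A[1][0]],
      [A[0][1], A[0][0]]]          # Ap = P A P^T

# Cholesky factorization P A P^T = R^T R (explicit 2x2 formulas)
r11 = math.sqrt(Ap[0][0])
r12 = Ap[0][1] / r11
r22 = math.sqrt(Ap[1][1] - r12 * r12)

# Pivoting guarantees a nonincreasing diagonal in R
assert abs(r11) >= abs(r22)

# The factorization reproduces P A P^T
assert abs(r11 * r11 - Ap[0][0]) < 1e-12
assert abs(r11 * r12 - Ap[0][1]) < 1e-12
assert abs(r12 * r12 + r22 * r22 - Ap[1][1]) < 1e-12
```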
2.3.3 Rigorous perturbation bounds with backward rounding errors
Drmač, Omladič and Veselić [20, 1994] obtained rigorous perturbation bounds. For comparison, here we also present our rigorous perturbation bounds, which can be obtained by applying the results in Section 2.2.2.
Theorem 2.3.9 Let A ∈ R^{n×n} be a symmetric positive definite floating point matrix with the exact Cholesky factorization A = R^T R. Define D_c = diag(A)^{1/2} and R̄ = R D_c^{-1}. If

nε ||V_c Φ_c^{-1}||_2^2 / min_i a_ii < 1/4,   (2.3.48)

where ε = (n+1)u/(1 − 2(n+1)u) with u being the unit round-off, and V_c is defined by (2.3.21), then the Cholesky factorization applied to A succeeds (barring underflow
and overflow) and produces a nonsingular R̃, which satisfies
where ΔR ≡ R̃ − R. Obviously the weaker bound above can be rewritten in the following form:
Proof. Let H = D_c^{-1} A D_c^{-1} as before. Then H = R̄^T R̄. From (2.3.29) and (2.3.27),
which with the assumption (2.3.48) implies that (2.3.16) holds. Thus by Lemma 2.3.1 the Cholesky factorization applied to A succeeds and the computed Cholesky factor R̃ satisfies

A + ΔA = R̃^T R̃,   |ΔA| ≤ ε dd^T,
which with the assumption (2.3.48) implies that the condition (2.2.47) of Theorem 2.2.7 is satisfied. Then (2.3.49) is immediately obtained by using Theorem 2.2.7. □
Theorem 2.3.10 Let A ∈ R^{n×n} be a symmetric positive definite floating point matrix with the exact Cholesky factorization A = R^T R. Define D_c ≡ diag(A)^{1/2} and R̄ ≡ R D_c^{-1}. Assume D ∈ D_n. If

nε ||R̄^{-1}||_2 ||R^{-1}D||_2 ||D^{-1}R||_2 < 1/4,   (2.3.51)

where ε ≡ (n+1)u/(1 − 2(n+1)u) with u being the unit round-off, then the Cholesky factorization applied to A succeeds (barring underflow and overflow) and produces a nonsingular R̃, which satisfies
where ΔR ≡ R̃ − R.
Proof. Let H ≡ D_c^{-1} A D_c^{-1}. Since
which with (2.3.51) implies that (2.3.16) holds. Thus by Lemma 2.3.1 the Cholesky factorization applied to A succeeds and the computed Cholesky factor R̃ satisfies
where d_i = a_ii^{1/2}. Then
Hence the bound (2.3.52) can easily be obtained from (2.2.55) in Theorem 2.2.8. □
If we take D = I, it is easy to show, by using the fact that ||up(X)||_F ≤ ||X||_F/√2 for any symmetric X ∈ R^{n×n}, that the assumption (2.3.51) can be weakened to

nε ||H^{-1}||_2 ≤ 1/2,   (2.3.54)
Table 2.3.1: Results for Pascal matrices without pivoting
and we have the following bound:
which is slightly stronger than (2.3.52) with D = I. An equivalent rigorous bound where only the 2-norm is used was obtained by Drmač, Omladič and Veselić [20, 1994].
As we pointed out in the comment following Theorem 2.3.6, with a correct choice of D, e.g. D = D_r = diag(||R(i, :)||_2), χ'_c(A, D) can possibly be much smaller than ||H^{-1}||_2. So the bound (2.3.55) is potentially weak, although the condition (2.3.54) is not as constraining as (2.3.51).

2.3.4 Numerical experiments

Table 2.3.2: Results for Pascal matrices with pivoting, Ā = PAP^T

In Sections 2.3.2 and 2.3.3 we presented new first-order and rigorous perturbation bounds for the changes caused by backward rounding errors for the Cholesky factorization using two different approaches, defined χ_c(A) = n ||V_c Φ_c^{-1}||_2 / ||A||_2^{1/2} as
the condition number of the problem, and suggested that usually χ_c(A) could be estimated in practice by χ'_c(A, D_r) = n ||R̄^{-1}||_2 ||R^{-1}D_r||_2 ||D_r^{-1}R||_2 / ||R||_2, with D_r = diag(||R(i, :)||_2), which can be estimated by standard norm estimators in O(n^2) operations. Our bounds are potentially much smaller than the equivalent bound in Drmač, Omladič and Veselić [20, 1994]. Also we compare χ_c(A) with κ_c(A), and compare the corresponding estimators χ'_c(A, D) with κ'_c(A, D) as well. These condition numbers and condition estimators satisfy the following inequalities (see (1.2.10), (2.3.30), (2.3.35), (2.3.36), (2.3.42), (2.3.43) and (2.3.44)):
Now we give a set of examples to show our findings. The matrices are n × n Pascal matrices. The results are shown in Table 2.3.1 without pivoting and in Table 2.3.2 with pivoting.

Note in Tables 2.3.1 and 2.3.2 how n||H^{-1}||_2 can be worse than χ_c(A). In Table 2.3.2 pivoting is seen to give a significant improvement on χ_c(A). Also we observe from both Tables 2.3.1 and 2.3.2 that χ'_c(A) is a reasonable approximation to χ_c(A). We see χ_c(A) is smaller than κ_c(A) for n > 2.
2.4 Summary and future work
Although with norm-bounded changes in A the Sun [46, 1991] and Stewart [41, 1993] first-order perturbation bound (2.2.8) is relevant in the sense that some problems do come close to attaining the indicated condition, we have shown that it gives a large over-bound for most problems. The more refined bound (2.2.25) obtained by the matrix-vector equation approach is usually significantly stronger, and is never weaker, and the resulting condition number κ_c(A) more accurately reflects the true sensitivity of the problem. Further, the sizes of our condition numbers depend on any symmetric pivoting used, and numerical results and analyses show that the standard symmetric pivoting strategy usually leads to a near optimally conditioned factorization for a given A in PAP^T = R^T R. Because of the difficulty in understanding and computing κ_c(A), there was a need for, and fortunately we have been able to give by the matrix equation approach, a simpler bound. Although the new bound (2.2.29) is somewhat weaker, it provides a computationally practical and useful estimate κ'_c(A, D_r) of κ_c(A), and at the same time gives us insight into why the Cholesky factorization is often less sensitive than we thought, and adds to our understanding as to why standard pivoting usually gives a condition number approaching its lower bound κ_2^{1/2}(A).

For the perturbation ΔA which comes from the backward rounding error analysis,
we first presented first-order (nearly) attainable bounds (see Theorem 2.3.4) by the matrix-vector equation approach, then gave computationally practical bounds (see Theorem 2.3.6) by the matrix equation approach. Even though the latter are weaker than the former, both of them are (potentially) stronger than the corresponding equivalent bound (2.3.41) of Drmač et al. Our condition number χ_c(A) more closely reflects the true sensitivity of the problem. Also, numerical experiments and analysis show that usually the standard symmetric pivoting strategy can significantly improve the condition number χ_c(A). So the computed Cholesky factor most likely has high accuracy when standard symmetric pivoting is used.

With relative changes in the elements of A (i.e. |ΔA| ≤ ε|A|) we presented first-order perturbation analyses, which resulted in the condition number p_c(A) and a practical and useful upper bound p'_c(A) on p_c(A).
In the future we would like to:

- Investigate the ratio κ_c(A)/χ_c(A), which we suspect is bounded by a function of n alone, probably involving something like 2^n.

- Explore the effect of rank-revealing pivoting on κ_c in both theory and computations, and study the optimization problem min_P κ_c(PAP^T).

- Give a better approximation to χ_c(A) than our current χ'_c(A), which can sometimes overestimate χ_c(A), or alternatively look for other methods to estimate χ_c(A) efficiently.

- Give a better approximation to p_c(A) than our current p'_c(A), which can sometimes overestimate p_c(A), or alternatively look for other methods to estimate p_c(A) efficiently.
Chapter 3

The QR Factorization

3.1 Introduction

The QR factorization is an important tool in matrix computations: given an m × n real matrix A with full column rank, there exists a unique m × n real matrix Q with orthonormal columns, and a unique nonsingular upper triangular n × n real matrix R with positive diagonal entries, such that

A = QR.

The matrix Q is referred to as the orthogonal factor, and R the triangular factor.
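For concreteness, here is a modified Gram-Schmidt sketch of this factorization (pure Python, with a hypothetical 3 x 2 example; library routines typically use Householder transformations instead, and the helper name mgs_qr is our own):

```python
import math

# Modified Gram-Schmidt QR of a full-column-rank matrix (sketch)
def mgs_qr(A):
    m, n = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(Q[i][k] ** 2 for i in range(m)))
        for i in range(m):
            Q[i][k] /= R[k][k]                 # normalize column k
        for j in range(k + 1, n):
            R[k][j] = sum(Q[i][k] * Q[i][j] for i in range(m))
            for i in range(m):
                Q[i][j] -= R[k][j] * Q[i][k]   # orthogonalize later columns
    return Q, R

A = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
Q, R = mgs_qr(A)

# R is upper triangular with positive diagonal; Q has orthonormal columns
assert R[1][0] == 0.0 and R[0][0] > 0 and R[1][1] > 0
assert abs(sum(Q[i][0] * Q[i][1] for i in range(3))) < 1e-12
# A is reproduced: A = Q R
for i in range(3):
    for j in range(2):
        aij = sum(Q[i][k] * R[k][j] for k in range(2))
        assert abs(aij - A[i][j]) < 1e-12
```

Keeping the diagonal of R positive is what makes the factors match the unique Q and R described above.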
Let ΔA be a real m × n matrix such that A + ΔA is still of full column rank; then A + ΔA has a unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR).

The goal of the perturbation analysis for the QR factorization is to determine a bound on ||ΔQ|| (or |ΔQ|) and ||ΔR|| (or |ΔR|) in terms of (a bound on) ||ΔA|| (or |ΔA|).

The perturbation analysis for the QR factorization has been considered by several authors. Given ||ΔA||, the first result was presented by Stewart [39, 1977]. That was further modified and improved by Sun [46, 1991]. Using different approaches, Sun [46,
CHAPTER 3. THE QR FACTORIZATION
1991] and Stewart [41, 1993] gave first-order perturbation analyses. Recently a new rigorous perturbation bound for Q alone was given by Sun [49, 1995]. Given |ΔA|, Sun [48, 1992] presented a rigorous analysis for the components of Q and R. For ΔA which has the form of the equivalent backward rounding error (componentwise form) from a numerically stable computation of the QR factorization, Zha [55, 1993] gave a first-order analysis.
The main goal of this chapter is to establish new first-order perturbation bounds given a bound on ||ΔA||, which are sharper than the equivalent results for the R factor in Sun [46, 1991] and Stewart [41, 1993], and more straightforward than the sharp result in Sun [49, 1995] for the Q factor, and to present the corresponding condition numbers, which more closely reflect the true sensitivity of the problem.
The rest of this chapter is organized as follows. In Section 3.2 we obtain expressions for Ṙ(0) and Q̇(0) in the QR factorization A + tG = Q(t)R(t). These basic sensitivity expressions will be used to obtain our new perturbation bounds in Sections 3.3 and 3.4, but in Section 3.2 they are also used to derive Sun's results on the sensitivity of R and Q. In Section 3.3 we give a refined perturbation analysis for Q, showing in a simple way why the standard column pivoting strategy for A can be beneficial for certain aspects of the sensitivity of Q. In Section 3.4 we analyze the perturbation in R, first by the detailed and tight matrix-vector equation approach, then by the straightforward matrix equation approach. We give numerical results and suggest practical condition estimators in Section 3.5. Finally we summarize our findings and point out future work in Section 3.6.
3.2 Rate of change of Q and R, and previous results
Our perturbation bounds for Q will be tighter if we bound separately the perturbations along the column space of A and along its orthogonal complement. Thus we introduce the following notation. For real m × n A, let P_1 be the orthogonal projector onto R(A), and P_2 be the orthogonal projector onto R(A)^⊥. For real m × n ΔA define
so ε^2 = ε_1^2 + ε_2^2. Here we derive the basic results on how Q and R change as A changes. We then derive the first-order results obtained by Sun [46, 1991][49, 1995]. The following theorem summarizes the results we use later.
Theorem 3.2.1 Let A ∈ R^{m×n} be of full column rank n with the QR factorization A = QR, let G be a real m × n matrix, and let ΔA = εG, for some ε ≥ 0. If
where κ_2(A) ≡ ||A^†||_2 ||A||_2 and P_1 is the orthogonal projector onto R(A), then A + ΔA has the unique QR factorization
with ΔR and ΔQ satisfying

ΔR = εṘ(0) + O(ε^2),   ΔQ = εQ̇(0) + O(ε^2),

where Ṙ(0) and Q̇(0) are defined by the unique QR factorization

A(t) = A + tG = Q(t)R(t),   Q^T(t)Q(t) = I,   |t| ≤ ε,

and so satisfy the equations
where the 'up' notation is defined by (1.2.3).
Proof. Take any Q̃ such that [Q, Q̃] is square and orthogonal; then for all |t| ≤ ε
From the inequality (3.2.2) we see ||tQ^T G||_2 ≤ σ_min(A) = σ_min(R) for all |t| ≤ ε. Thus A + tG has full column rank and the unique QR factorization (3.2.6). Notice that R(0) = R, R(ε) = R + ΔR, Q(0) = Q and Q(ε) = Q + ΔQ, so (3.2.3) holds.
It is easy to verify from the algorithm for the QR factorization that Q(t) and R(t) are twice continuously differentiable for |t| ≤ ε. If we differentiate R(t)^T R(t) = A(t)^T A(t) with respect to t and set t = 0, and use A = QR, we obtain (3.2.7), which we will see is a linear equation uniquely defining the elements of the upper triangular Ṙ(0) in terms of the elements of Q^T G. From the upper triangular Ṙ(0)R^{-1} in
we see with the 'up' notation (see (1.2.3)) that (3.2.8) holds. Next, differentiating (3.2.6) at t = 0 gives

G = QṘ(0) + Q̇(0)R,

and combining this with (3.2.8) gives (3.2.9). Finally the Taylor expansions for R(t) and Q(t) about t = 0 give (3.2.4) and (3.2.5) at t = ε. □
By Theorem 3.2.1 we can easily obtain the first-order perturbation bound for R given by Sun [46, 1991] and also by Stewart [41, 1993], and the first-order bound for Q given by Sun [49, 1995].
Theorem 3.2.2 Let A ∈ R^{m×n} be of full column rank n with the QR factorization A = QR, and let ΔA be a real m × n matrix. Define ε = ||ΔA||_F/||A||_2 and ε_1 = ||QQ^T ΔA||_F/||A||_2, see (3.2.1). If (3.2.2) holds, then A + ΔA has the unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR),

where
Proof. Let G ≡ ΔA/ε (if ε = 0, the theorem is trivial); then
Clearly all the conclusions of Theorem 3.2.1 hold here. From (3.2.8) and the fact that for symmetric X, ||up(X)||_F ≤ ||X||_F/√2 (see (1.2.7)), we have
and since κ_2(R) = κ_2(A),
Thus (3.2.10) follows from the Taylor expansion (3.2.4).
If [Q, Q̃] is square and orthogonal, then from (3.2.9) we have
which, with the Taylor expansion (3.2.5), gives the bound (3.2.11) for the Q factor. □
We see √2 κ_2(A) is a measure of the sensitivity of both R and Q, but it is not the actual condition number, since for general A the first-order bounds in (3.2.10) and (3.2.11) are not attainable. Thus √2 κ_2(A) is a condition estimator for both R and Q in the QR factorization.
3.3 Refined analysis for Q
The results of Sun [49, 1995] give about as good as possible overall bounds on the change ΔQ in Q. But by looking at how ΔQ is distributed between R(Q) and its orthogonal complement, and following the ideas in the proof of Theorem 3.2.2, we are able to obtain a result which is tight but, unlike the related tight result in [49], easy to follow. It makes clear exactly where any ill-conditioning lies. From (3.2.5) with [Q, Q̃] square and orthogonal,
and the key is to bound the first term on the right separately from the second. Note from (3.2.9) with G = ΔA/ε and (3.2.1) that
where G can be chosen to give equality here for any given A. Hence
and for the part of ΔQ in R(Q̃) the condition number (with respect to the combination of the F- and 2-norms)

κ_{Q⊥}(A) ≡ lim sup_{ε_2→0} { ||P_2 ΔQ||_F / (ε_2 ||Q||_2) : A + ΔA = (Q + ΔQ)(R + ΔR) }

is given by

κ_{Q⊥}(A) = κ_2(A).
Now for the part of ΔQ in R(Q). We see from (3.2.9) that
which is skew-symmetric with clearly zero diagonal. If we partition Q, G and R as
then from (3.3.2) we have
It is easy to verify for any n > 1 that equalities are obtained by taking G = (q_n y^T, 0), with y nonzero such that ||R_{11}^{-T} y||_2 = ||R_{11}^{-1}||_2 ||y||_2. It follows that the first-order bound is attainable in
so for the part of ΔQ in R(Q) the condition number
In some problems we are mainly (in fact only, if A is square and nonsingular) interested in the change in Q lying in R(Q), and this result shows its bound can be smaller than we previously thought. In particular if A has only one small singular value, and we use the standard column pivoting strategy in computing the QR factorization, then R_{11} will usually be quite well-conditioned compared with R, and we will have ||R_{11}^{-1}||_2 ||A||_2 < κ_2(A). However for some special cases this may not be true, for example the Kahan matrix in Section 3.5, and then a rank-revealing pivoting strategy such as in Hong and Pan [31, 1992] would be required to obtain such an improvement.
We now summarize the above results as the following theorem.

Theorem 3.3.1 Let A = [Q, Q̃][R; 0] be the QR factorization of A ∈ R^{m×n} with full column rank, and let ΔA be a real m × n matrix. Let ε = ||ΔA||_F/||A||_2, ε_1 = ||QQ^T ΔA||_F/||A||_2 and ε_2 = ||Q̃Q̃^T ΔA||_F/||A||_2. If (3.2.2) holds, then there is a unique QR factorization satisfying

A + ΔA = (Q + ΔQ)(R + ΔR),

where
with
3.4 Perturbation analyses for R

In Section 3.2 we saw that the key to deriving first-order perturbation bounds for R in the QR factorization of full column rank A is the equation (3.2.7). As in Chapter 2 for the Cholesky factor, we will now analyze the equation in two ways. The first, the matrix-vector equation approach, gives a sharp bound leading to the condition number κ_R(A) for R in the QR factorization of A; while the second, the matrix equation approach, gives a clear improvement on (3.2.10), and provides an upper bound on κ_R(A). Both approaches provide efficient condition estimators (see Chang and Paige [9, 1995] for the matrix-vector equation approach), and nice results for the special case of AP = QR, where P is a permutation matrix giving the standard column pivoting, but we will only derive the matrix equation versions of these. The tighter but more complicated matrix-vector equation analysis for the case of pivoting is given in [9], and only the results will be quoted here. All our analyses in this section are based on the same assumptions as in Theorem 3.2.2. Most of the results to be given here have been presented in Chang, Paige and Stewart [14, 1996].
3.4.1 Matrix-vector equation analysis for R

The matrix-vector equation approach views the matrix equation (3.2.7) as a large matrix-vector equation. The upper and lower triangular parts of (3.2.7) contain identical information. By using the 'uvec' notation defined by (1.2.4) and the 'vec' notation, we can easily show that the upper triangular part can be rewritten in the following form (for the derivation, see [14]):

W_R uvec(Ṙ(0)) = Z_R vec(Q^T G),   (3.4.1)

where W_R is an n(n+1)/2 × n(n+1)/2 lower triangular matrix whose nonzero elements are the elements r_11; r_12, r_22; r_13, r_23, r_33; ... of R (see [14] for its precise structure).
Since R is nonsingular, W_R is also, and from (3.4.1)

uvec(Ṙ(0)) = W_R^{-1} Z_R vec(Q^T G).

Remembering Ṙ(0) is upper triangular, we see

||Ṙ(0)||_F = ||uvec(Ṙ(0))||_2 = ||W_R^{-1} Z_R vec(Q^T G)||_2 ≤ ||W_R^{-1} Z_R||_2 ||Q^T G||_F,
where for any nonsingular upper triangular R, equality can be obtained by choosing G such that vec(Q^T G) lies in the space spanned by the right singular vectors corresponding to the largest singular value of W_R^{-1} Z_R. Therefore we see from the Taylor expansion (3.2.4),
and this bound is attainable to first order in ε. This implies that for R in the QR factorization of A the condition number (with respect to the combination of the F- and 2-norms) defined by

κ_R(A) ≡ lim sup_{ε→0} { ||ΔR||_F / (ε ||R||_2) : A + ΔA = (Q + ΔQ)(R + ΔR) }

is given by

κ_R(A) = ||W_R^{-1} Z_R||_2.
From the definition of κ_R(A) and Sun's first-order perturbation bound (3.2.10) we easily observe

κ_R(A) ≤ √2 κ_2(A).

This upper bound is achieved if R is an identity matrix, and so is tight.
The structure of W_R and Z_R reveals that each column of W_R is one of the columns of Z_R, and so W_R^{-1} Z_R has an n(n+1)/2 square identity submatrix, giving
We now summarize these results as the following theorem.
Theorem 3.4.1 With the same assumptions as in Theorem 3.2.2, A + ΔA has the unique QR factorization

A + ΔA = (Q + ΔQ)(R + ΔR),

where, with W_R and Z_R as in (3.4.1),
and the first-order bound in (3.4.5) is attainable. □
Unity is the best constant lower bound on κ_R(A) we can obtain, as can be seen from the following example.

Example 1. Let R = diag(1, δ, ..., δ^{n-1}), 0 < δ ≤ 1. We can easily show
From (3.4.6) we know the first-order perturbation bound in (3.4.5) is at least as good as that in (3.2.10). In fact it can be better by an arbitrary factor. Consider Example 1:

κ_R(A) = √(1 + δ^2),   κ_2(A) = 1/δ^{n-1},

and
We see the first-order perturbation bound (3.2.10) can severely overestimate the effect of a perturbation in A.
Suppose we use the standard column pivoting strategy in AP = QR, where P is a permutation matrix designed so that columns of A are interchanged, during the computation of the reduction, to make the leading diagonal elements of R as large as possible; see Golub and Van Loan [26, 1996, §5.4] for details. We see κ_2(A) in (3.2.10) does not change, but κ_R(A) does change. The following result was shown by Chang and Paige [9, 1995].
Theorem 3.4.2 Let A ∈ R^{m×n} be of full column rank, with the QR factorization AP = QR when the standard column pivoting strategy is used. Then
If R = R_n(θ), where R_n(θ) are the Kahan matrices (see (2.2.41)), then
Theorem 3.4.2 shows that when the standard column pivoting strategy is used, κ_R(AP) is bounded for fixed n no matter how large κ_2(A) is. Many numerical experiments with this strategy suggest that κ_R(AP) is usually close to its lower bound of one. But it is not for the Kahan matrices. Fortunately such examples are rare in practice, and furthermore if we adopt a rank-revealing pivoting strategy, the condition number will most likely be close to its lower bound; see Section 3.5.
3.4.2 Matrix equation analysis for R

As far as we can see, κ_R(A) is unreasonably expensive to compute or estimate directly with the usual approach, except when we use pivoting, in which case κ_R(AP) usually approaches its lower bound of 1. Fortunately, by the matrix equation approach we can obtain an excellent upper bound on κ_R(A).
In the proof of Theorem 3.2.2 we used the expression for Ṙ(0) in (3.2.8) to derive Sun's first-order perturbation bound. Now we again look at (3.2.8), repeated here for clarity:

Ṙ(0) = up[Q^T G R^{-1} + (Q^T G R^{-1})^T] R.

Let D_n be the set of all n × n real positive definite diagonal matrices. For any D = diag(δ_1, ..., δ_n) ∈ D_n, let R = DR̄. Note that for any matrix B we have up(B)D = up(BD). Hence if we define B = Q^T G R̄^{-1}, then

Ṙ(0) = up[Q^T G R̄^{-1} + D^{-1}(Q^T G R̄^{-1})^T D] R̄ = [up(B) + D^{-1} up(B^T) D] R̄.   (3.4.9)
With obvious notation, the upper triangular matrix up(B) + D^{-1} up(B^T) D has (i, j) element b_ij + (δ_j/δ_i) b_ji for i < j, and (i, i) element b_ii. To bound this, we use the following lemma.
where
Proof. Clearly
But by the Cauchy-Schwarz inequality, (b_ij + (δ_j/δ_i) b_ji)^2 ≤ (b_ij^2 + b_ji^2)(1 + (δ_j/δ_i)^2), so
From this (3.4.10) follows. □
We can now bound Ṙ(0) of (3.4.9):
But this is true for all D ∈ D_n, so that

κ'_R(A) ≡ inf_{D ∈ D_n} κ'_R(A, D),

where ζ_D is defined in (3.4.11). This gives the encouraging result
Hence from the Taylor expansion (3.2.4) we have
where from (3.4.17) this is never worse than the bound (3.2.10).
Clearly κ'_R(A) is a measure of the sensitivity of R in the QR factorization. Since κ_R(A) is the condition number for R (see the definition (3.4.3)), certainly we have
Usually (3.4.19) is a strict inequality, but if R is diagonal equality will hold. In fact, take D = R, and let ζ_D = max_{j>i} r_jj/r_ii; then κ'_R(A, D) = √(1 + ζ_D^2). On the other hand, it is straightforward to show κ_R(A) = √(1 + ζ_D^2). So κ_R(A) = κ'_R(A, D), which implies κ_R(A) = κ'_R(A). For another proof of this, see Chang, Paige and Stewart [14, 1996, Remark 5.1].
Now we summarize the above results as the following theorem.

Theorem 3.4.3 With the same assumptions as in Theorem 3.2.2, A + ΔA has the unique QR factorization satisfying

A + ΔA = (Q + ΔQ)(R + ΔR),

where, with κ'_R(A) as in (3.4.15) and (3.4.16),
and zf R zs diagonal the first inequality in (3.4.21) will become an equality. 0
From (3.4.21) we know the first-order perturbation bound (3.4.20) is at least as good as (3.2.10). In fact it can be better by an arbitrary factor, as can also be seen from Example 1. Taking D = R (possible when R is diagonal, so that κ_2(D^{-1}R) = 1), we have κ'_R(A, R) = √(1 + ζ_D²). If we take R = diag(δ^{n-1}, …, δ, 1), 0 < δ ≤ 1, we see κ_2(R) = κ_2(A) = δ^{1-n}, while

κ'_R(A, R) = √(1 + δ^{2(1-n)}),

which is close to the upper bound √2 κ_2(A) for small δ. This shows that relatively small early diagonal elements of R cause poor condition, and suggests that if we do not use pivoting, there is a significant chance that the condition of the problem will approach its upper bound, at least for randomly chosen matrices.
With the standard column pivoting strategy in AP = QR, P a permutation matrix, this analysis also leads simply to a very nice result, even though it is a bit weaker than the tight result (3.4.8).

Theorem 3.4.4. Let A ∈ R^{m×n} be of full column rank, with the QR factorization AP = QR when the standard column pivoting strategy is used. Then (3.4.22) holds, a bound on κ'_R(AP) depending only on n.

Proof. Standard column pivoting ensures |r_ii| ≥ |r_ij| and |r_ii| ≥ |r_jj| for all j ≥ i, so that ζ_D ≤ 1 with D = diag(R). Then (3.4.22) is immediately obtained from (1.2.18) in Theorem 1.2.2 by taking D = diag(R). ∎
This analysis gives some insight as to why R in the QR factorization is less sensitive than the earlier condition estimator √2 κ_2(A) indicated. If the ill-conditioning of R is mostly due to bad scaling of its rows, then a correct choice of D can give κ_2(D^{-1}R) very near one. If at the same time ζ_D is not large, then κ'_R(A, D) in (3.4.16) can be much smaller than √2 κ_2(R); see (3.4.17). Standard pivoting always ensures that such a D exists, and in fact gives (3.4.22).

The significance of this analysis is that it provides an excellent upper bound on κ_R(A), and κ'_R(A) is quite easy to estimate. All we need to do is choose a suitable D in κ'_R(A, D) in (3.4.16). We consider how to do this in the next section.
3.5 Numerical experiments

In Section 3.4 we presented new first-order perturbation bounds for the R factor of the QR factorization using two different approaches, obtained the condition number κ_R(A) for the R factor, and suggested κ_R(A) could be estimated in practice by κ'_R(A, D). Our new first-order results are sharper than previous results for R, and at least as sharp for Q, and we give some numerical experiments to illustrate both this, and the possible estimators for κ_R(A).
We would like to choose D such that κ'_R(A, D) is a good approximation to the minimum κ'_R(A) in (3.4.15), and show that this is a good estimate of the condition number κ_R(A). Then a procedure for obtaining an O(n²) condition estimator for R in the QR factorization (i.e. an estimator for κ_R(A)) is to choose such a D, use a standard condition estimator (see for example Higham [27, 1987]) to estimate κ_2(D^{-1}R), and take κ'_R(A, D) in (3.4.16) as the appropriate estimate.
By van der Sluis's Theorem 1.2.1, κ_2(D^{-1}R) will be nearly minimal when the rows of D^{-1}R are equilibrated. But this could lead to a large ζ_D in (3.4.16). There are three obvious possibilities for D. The first one is choosing D to equilibrate R precisely: take δ_i = ||e_i^T R||_2 for i = 1, …, n. The second one is choosing D to equilibrate R as far as possible while keeping ζ_D ≤ 1: take δ_1 = ||e_1^T R||_2, and for i = 2, …, n take δ_i = ||e_i^T R||_2 if ||e_i^T R||_2 ≤ δ_{i-1}, otherwise δ_i = δ_{i-1}. The third one is choosing δ_i = r_ii. Computations show that the third choice can sometimes cause unnecessarily large estimates, so we will not give any results for that choice. In the following we denote the diagonal matrices D obtained by the first method and the second method by D_1 and D_2 respectively.
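The two scalings D_1 and D_2 are cheap to form. A numpy sketch (the function names and the 3 × 3 test matrix are illustrative choices of ours, not from the experiments below):

```python
import numpy as np

def scalings(R):
    # D1: exact row equilibration of R; D2: equilibrate as far as possible
    # while keeping the delta_i nonincreasing, so that zeta_D <= 1
    rownorms = np.linalg.norm(R, axis=1)
    d1 = rownorms.copy()
    d2 = rownorms.copy()
    for i in range(1, len(d2)):
        d2[i] = min(d2[i], d2[i - 1])
    return np.diag(d1), np.diag(d2)

def kappa_est(R, D):
    # the estimate sqrt(1 + zeta_D^2) * kappa_2(D^{-1} R) of (3.4.16)
    d = np.diag(D)
    n = len(d)
    zeta = max(d[j] / d[i] for i in range(n) for j in range(i + 1, n))
    return np.sqrt(1.0 + zeta**2) * np.linalg.cond(np.linalg.solve(D, R))

R = np.array([[1e-6, 1.0, 1.0],
              [0.0,  1.0, 1.0],
              [0.0,  0.0, 1.0]])
D1, D2 = scalings(R)
print(kappa_est(R, D1), kappa_est(R, D2), np.sqrt(2.0) * np.linalg.cond(R))
```

A standard condition estimator would replace the explicit `np.linalg.cond` call in an O(n²) implementation.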
We give three sets of examples.
(1) The first set of matrices are n × n Pascal matrices, n = 1, 2, …, 15. The results are shown in Table 3.5.1 without pivoting, and in Table 3.5.2 with standard column pivoting. Table 3.5.1 illustrates how the upper bound √2 κ_2(A) can be far worse than the condition number κ_R(A), which itself can be much greater than its lower bound of 1. In Table 3.5.2 standard column pivoting is seen to give a significant improvement on κ_R(A), bringing κ_R(AP) very close to its lower bound, but of course κ_2(AP) = κ_2(A) still. Also we observe from Table 3.5.1 that both κ'_R(A, D_1) and κ'_R(A, D_2) are very good estimates for κ_R(A). The latter is a little better than the former. In Table 3.5.2, κ'_R(AP, D_1) = κ'_R(AP, D_2) (in fact D_1 = D_2), and they are also good estimates for κ_R(AP).
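An experiment of this kind is easy to reproduce; the sketch below builds the symmetric Pascal matrix from binomial coefficients and compares √2 κ_2(A) with the condition number of the row-equilibrated R (the sizes and the use of numpy rather than MATLAB are our assumptions):

```python
import numpy as np
from math import comb

def pascal(n):
    # symmetric Pascal matrix, entry (i, j) = C(i + j, i)
    return np.array([[comb(i + j, i) for j in range(n)] for i in range(n)], float)

for n in (4, 8, 12):
    A = pascal(n)
    Q, R = np.linalg.qr(A)
    D = np.diag(np.linalg.norm(R, axis=1))       # row equilibration of R
    print(n, np.sqrt(2.0) * np.linalg.cond(A), np.linalg.cond(np.linalg.solve(D, R)))
```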
(2) The second set of matrices are 10 × 8 matrices A_j, j = 1, 2, …, 8, which are all obtained from the same random 10 × 8 matrix (produced by the MATLAB function randn), but with its j-th column multiplied by 10^{-8} to give A_j. The results are shown in Table 3.5.3 without pivoting. All the results with pivoting are similar to that for j = 8 in Table 3.5.3, and so are not given here. For j = 1, 2, …, 7, κ_R(A) and κ'_R(A) are both close to their upper bound √2 κ_2(A), but for j = 8, both κ_R(A) and κ'_R(A) are significantly smaller than √2 κ_2(A). All these results are what we expected, since the matrix R is ill-conditioned due to the fact that r_jj is very small, but for j = 1, 2, …, 7 the rows of R are already essentially equilibrated, and we do not expect κ_R(A) to be much better than √2 κ_2(A). Also for the first seven cases the smallest singular value of the leading part R_{n-1} is close to that of R, so that
Table 3.5.1: Results for Pascal matrices without pivoting, A = QR
κ_Q(A) could not be much better than √2 κ_2(A). For j = 8, even though R is still ill-conditioned due to the fact that r_{8,8} is very small, it is not at all equilibrated, and becomes well-conditioned by row scaling. Notice at the same time ζ_D is close to 1, so κ'_R(A, D_1), κ'_R(A, D_2), and therefore κ_R(A), are much better than √2 κ_2(A). In this case, the smallest singular value of R is significantly smaller than that of R_{n-1}. Thus κ_Q(A), the condition number for the change in Q lying in the range of Q, is spectacularly better than √2 κ_2(A). This is a contrived example, but serves to emphasize the benefits of pivoting for the condition of both Q and R.
(3) The third set of matrices are n × n Kahan matrices A = K_n(θ); see (2.2.41). Of course without pivoting Q = I here, but the condition numbers depend on R only, and these are all we are interested in. The results for n = 5, 10, 15, 20, 25 with θ = π/8 are shown in Table 3.5.4, where Π is a permutation such that the first column is moved to the last column position, and the remaining columns are moved left
Table 3.5.2: Results for Pascal matrices with pivoting, AP = Q R
Table 3.5.3: Results for 10 × 8 matrices A_j, j = 1, …, 8, without pivoting
Table 3.5.4: Results for Kahan matrices, θ = π/8, AΠ = QR
one position; this permutation Π is adopted in the rank-revealing QR factorization for Kahan matrices, see Hong and Pan [31, 1992]. Again we found D_1 = D_2, and only list the results corresponding to D_1. As we know, the Kahan matrices correspond to correctly pivoted A by standard column pivoting. From Table 3.5.4 we see that, in these extreme cases, with large enough n, κ_R(A) can be large even with standard pivoting. This is about as bad a result as we can get with standard column pivoting (it gets a bit worse as θ → 0 in R), since the Kahan matrices make the relevant upper bound approximately reachable; see Theorem 3.4.2. However if we use the rank-revealing pivoting strategy, we see from Table 3.5.4 that κ_R(AΠ) is again close to its lower bound of 1. Also we see κ_Q(AΠ) is significantly smaller than κ_Q(A). This is due to the fact that the smallest singular value of R_{n-1} is much smaller than that of R̃_{n-1}, the leading (n − 1) × (n − 1) part of R̃ in AΠ = Q̃R̃. For both of the cases, with and without rank-revealing pivoting, κ'_R(A, D_1) still estimates κ_R(A) excellently.
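Kahan matrices and the permutation Π are easy to generate. The sketch below uses the standard construction K_n(θ) = diag(1, s, …, s^{n-1}) T with s = sin θ and T unit upper triangular with every strictly upper entry equal to −cos θ (the size n = 10 is an illustrative choice):

```python
import numpy as np

def kahan(n, theta):
    # K_n(theta) = diag(1, s, ..., s^(n-1)) * T, s = sin(theta),
    # T unit upper triangular with strictly upper entries -cos(theta)
    s, c = np.sin(theta), np.cos(theta)
    T = np.eye(n) - c * np.triu(np.ones((n, n)), 1)
    return np.diag(s ** np.arange(n)) @ T

n, theta = 10, np.pi / 8
R = kahan(n, theta)
# rank-revealing permutation Pi: cycle column 1 to the last position
Pi = np.eye(n)[:, list(range(1, n)) + [0]]
Rtilde = np.linalg.qr(R @ Pi)[1]
print(np.linalg.cond(R), np.linalg.cond(Rtilde))
```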
In all these examples we see κ'_R(A, D_1) and κ'_R(A, D_2) gave excellent estimates for κ_R(A), with κ'_R(A, D_2) being marginally preferable.
3.6 Summary and future work
The first-order perturbation analyses presented here show just what the sensitivity (condition) of each of Q and R is in the QR factorization of full column rank A, and
in so doing provide their condition numbers (with respect to the measures used, and for sufficiently small ΔA), as well as efficient ways of approximating these. The key norm-based condition numbers we derived for A + ΔA = (Q + ΔQ)(R + ΔR) are:

κ_2(A), for that part of ΔQ in R(A)^⊥, see (3.3.1);

κ_Q(A), for that part of ΔQ in R(A), see (3.3.4);

κ_R(A) for R, with its estimate κ'_R(A) = inf_{D∈D_n} κ'_R(A, D), where κ'_R(A, D) ≡ √(1 + ζ_D²) κ_2(D^{-1}R), see (3.4.16).

The condition numbers obey analogous lower and upper bounds for Q, while for R

1 ≤ κ_R(A) ≤ κ'_R(A) ≤ √2 κ_2(A),

see (3.4.6) and (3.4.21). The numerical examples, and an analysis of the n = 2 case (not given here), suggest that κ'_R(A, D), with D chosen to equilibrate R̄ = D^{-1}R subject to ζ_D ≤ 1, gives an excellent approximation to κ'_R(A) in the general case.
In the special case of A with orthogonal columns, so that R is diagonal, by taking D = R we obtain

κ_R(A) = κ'_R(A, R) = √(1 + ζ_D²),
see Theorem 3.4.3. For general A, when we use the standard column pivoting strategy in the QR factorization, AP = QR, we saw from (3.4.8) and (3.4.22) that κ_R(AP) is bounded by a function of n alone.
As a result of these analyses we see both R and, in a certain sense, Q can be less sensitive than was thought from previous analyses. The condition numbers depend on any column pivoting of A, and show that the standard pivoting strategy often results in a much less sensitive R, and sometimes leads to a much smaller possible change of Q in the range of Q, for a given size of perturbation in A.
By following the approach of Stewart [38, 1973, Th. 3.1], see also [45, 1990, Th. 2.11], it would be straightforward, but detailed and lengthy, to extend our first-order results to provide strict perturbation bounds, as was done in Chapter 2. Our condition numbers and resulting bounds are asymptotically sharp, so there is less need for strict bounds.
In the future we would like to:

• Investigate the ratio κ_R(A)/κ'_R(A).

• Explore the effect of rank-revealing pivoting on κ_R in both theory and computations, and study the optimization problem min_P κ_R(AP).

• Extend our analysis to the case where ΔA has the equivalent componentwise form of backward rounding errors. In fact a new perturbation bound has been given by Chang and Paige [9, 1995]. Also some other results have been obtained by Chang and Paige [11].
Chapter 4
The LU factorization
4.1 Introduction
The LU factorization is a basic and effective tool in numerical linear algebra: given a real n × n matrix A whose leading principal submatrices are all nonsingular, there exist a unique unit lower triangular matrix L and an upper triangular matrix U such that

A = LU.
Notice here we require the diagonal elements of L to be 1. L and U are referred to as the LU factors. The LU factorization is a "high-level" algebraic description of Gaussian elimination. Simple examples show the standard algorithms for the LU factorization are not numerically stable. In order to repair this shortcoming of the algorithms, partial pivoting or complete pivoting is introduced in the computation. For all of these details, see for example Wilkinson [53, 1965, Chap. 4], Higham [30, 1996, Chap. 9] and Golub and Van Loan [26, 1996, Chap. 3].
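The instability referred to here is easy to demonstrate. A minimal numpy sketch of Doolittle elimination without pivoting, applied to the standard 2 × 2 example with a tiny (1,1) pivot (the routine and example are illustrative, not from this thesis):

```python
import numpy as np

def lu_nopivot(A):
    # Doolittle elimination without pivoting: A = L U, L unit lower triangular
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
L, U = lu_nopivot(A)
# the residual is O(1) even though A is well conditioned:
# elimination without pivoting is unstable for this A
print(np.abs(L @ U - A).max())
```

With the rows of A interchanged (partial pivoting) the same routine reproduces A to full working accuracy.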
Let ΔA be a sufficiently small n × n matrix such that the leading principal submatrices of A + ΔA are still all nonsingular; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU).
The goal of the sensitivity analysis for the LU factorization is to determine a bound on ||ΔL|| (or |ΔL|) and a bound on ||ΔU|| (or |ΔU|) in terms of (a bound on) ||ΔA|| (or |ΔA|).
The perturbation analysis of the LU factorization has been considered by a few authors. Given ||ΔA||, the first rigorous normwise perturbation bound was presented by Barrlund [2, 1991]. Using a different approach, Stewart [41, 1993] gave first-order perturbation bounds, which recently were improved by Stewart [42, 1995]. In [42], L was not assumed to be unit lower triangular, and a parameter ρ was used to control how much of the perturbation is attached to the diagonals of L and U. Given |ΔA|, the first rigorous componentwise perturbation bounds were given by Sun [48, 1992].
The main purpose of this chapter is to establish new first-order perturbation bounds given a bound on ||ΔA||, present the condition numbers, give the condition estimators, and shed light on the effect of partial pivoting and complete pivoting on the sensitivity of the problem.
The rest of this chapter is organized as follows. In Section 4.2 we obtain expressions for L̇(0) and U̇(0) in the LU factorization A + tG = L(t)U(t). These basic sensitivity expressions will be used to obtain our new perturbation bounds in Section 4.3. In Section 4.3 we present perturbation results, first by the so-called matrix-vector equation approach, which leads to sharp bounds, then by the so-called matrix equation approach, which leads to weaker but practical bounds. We give numerical examples in Section 4.4. Finally we briefly summarize our findings and point out future work in Section 4.5.
4.2 Rates of change of L and U

Here we derive the basic results on how L and U change as A changes, which will be used later.
Theorem 4.2.1. Let A ∈ R^{n×n} have nonsingular leading k × k principal submatrices for k = 1, …, n, with the LU factorization A = LU, let G be a real n × n matrix, and let ΔA = εG for some ε ≥ 0. If ε is sufficiently small that all leading principal submatrices of A + tG are nonsingular for all |t| ≤ ε, then A + ΔA has the LU factorization

A + ΔA = (L + ΔL)(U + ΔU),   (4.2.1)

with ΔL and ΔU satisfying

ΔL = ε L̇(0) + O(ε²),   (4.2.2)

ΔU = ε U̇(0) + O(ε²),   (4.2.3)

where L̇(0) and U̇(0) are defined by the unique LU factorization

A + tG = L(t) U(t),   (4.2.4)

and so satisfy the equation

L̇(0) U + L U̇(0) = G.   (4.2.5)

Proof. Since all leading principal submatrices of A + tG are nonsingular for all |t| ≤ ε, A + tG has the unique LU factorization (4.2.4). Note that L(0) = L, L(ε) = L + ΔL, U(0) = U and U(ε) = U + ΔU. When t = ε, (4.2.4) becomes (4.2.1). It is easy to observe from a standard algorithm for the LU factorization that L(t) and U(t) are twice continuously differentiable for |t| ≤ ε. If we differentiate (4.2.4) and set t = 0 in the result, we obtain (4.2.5), which we will see is a linear equation uniquely defining the elements of the strictly lower triangular L̇(0) and the upper triangular U̇(0) in terms of the elements of G. From (4.2.5) we have

L^{-1} L̇(0) + U̇(0) U^{-1} = L^{-1} G U^{-1}.
Note that L^{-1} L̇(0) is strictly lower triangular and U̇(0) U^{-1} is upper triangular; thus we have

L^{-1} L̇(0) = slt(L^{-1} G U^{-1}),   U̇(0) U^{-1} = ut(L^{-1} G U^{-1}),

which give (4.2.6) and (4.2.7). Finally the Taylor expansions for L(t) and U(t) about t = 0 give (4.2.2) and (4.2.3). ∎
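The expressions (4.2.6) and (4.2.7) can be checked against a finite-difference approximation. A numpy sketch (the unpivoted factorization routine and the symmetric positive definite test matrix, chosen so that all leading minors are safely nonsingular, are our own assumptions):

```python
import numpy as np

def lu_nopivot(A):
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # SPD, so all leading minors are nonsingular
G = rng.standard_normal((n, n))
L, U = lu_nopivot(A)

F = np.linalg.solve(L, G) @ np.linalg.inv(U)   # L^{-1} G U^{-1}
Ldot = L @ np.tril(F, -1)                      # L * slt(L^{-1} G U^{-1}), cf. (4.2.6)
Udot = np.triu(F) @ U                          # ut(L^{-1} G U^{-1}) * U,  cf. (4.2.7)

t = 1e-7
Lt, Ut = lu_nopivot(A + t * G)                 # divided differences approximate the derivatives
print(np.abs((Lt - L) / t - Ldot).max(), np.abs((Ut - U) / t - Udot).max())
```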
4.3 New perturbation results

The basis for deriving first-order perturbation bounds is the equation (4.2.5) (or the expressions (4.2.6) and (4.2.7) for its solution). As in the preceding chapters, we will now analyze the equation in two ways. The first, the matrix-vector equation approach, provides sharp bounds, resulting in the condition numbers of the problem; while the second, the matrix equation approach, gives practical bounds, resulting in condition estimators. Throughout this section we suppose all assumptions in Theorem 4.2.1 hold, so we can use its conclusions. Also we assume ||ΔA||_F ≤ ε ||A||_F, hence ||G||_F = ||ΔA||_F/ε ≤ ||A||_F (if ε = 0, all results we will present are obviously true). One exception is Theorem 4.3.3, where we assume ||ΔA||_{1,∞} ≤ ε ||A||_{1,∞}.

4.3.1 Matrix-vector equation analysis

It is not difficult to show that with the 'uvec' notation and 'slvec' notation in (1.2.4) the matrix equation (4.2.5) can be rewritten in the following form:

W [ slvec(L̇(0)) ; uvec(U̇(0)) ] = vec(G),   (4.3.1)
where W = [W_L, W_U], with W_L ∈ R^{n² × n(n-1)/2} and W_U ∈ R^{n² × n(n+1)/2}. It is easy to observe that after appropriate column permutations, [W_L, W_U] becomes lower triangular, with diagonal elements the u_jj and 1's. Since U is nonsingular, W is also nonsingular, and from (4.3.1)

[ slvec(L̇(0)) ; uvec(U̇(0)) ] = W^{-1} vec(G).
Partitioning W^{-1} into two blocks of rows, Y_L ∈ R^{n(n-1)/2 × n²} and Y_U ∈ R^{n(n+1)/2 × n²}, we have

slvec(L̇(0)) = Y_L vec(G),   uvec(U̇(0)) = Y_U vec(G),   (4.3.3)

so taking the 2-norm and using ||G||_F ≤ ||A||_F gives

||L̇(0)||_F ≤ ||Y_L||_2 ||A||_F,   ||U̇(0)||_F ≤ ||Y_U||_2 ||A||_F,

where equalities can be obtained by choosing G such that vec(G) lies in the space spanned by the right singular vectors corresponding to the largest singular values of Y_L and Y_U, respectively, with ||G||_F = ||A||_F. Therefore we see from the Taylor expansions (4.2.2) and (4.2.3) that

||ΔL||_F ≤ ||Y_L||_2 ||A||_F ε + O(ε²),   ||ΔU||_F ≤ ||Y_U||_2 ||A||_F ε + O(ε²),

and for the L factor and the U factor we obtain the condition numbers (with respect to the F-norm)

κ_L(A) ≡ lim sup_{ε→0} { ||ΔL||_F / (ε ||L||_F) : A + ΔA = (L + ΔL)(U + ΔU), ||ΔA||_F ≤ ε ||A||_F },

κ_U(A) ≡ lim sup_{ε→0} { ||ΔU||_F / (ε ||U||_F) : A + ΔA = (L + ΔL)(U + ΔU), ||ΔA||_F ≤ ε ||A||_F },

so that κ_L(A) = ||Y_L||_2 ||A||_F / ||L||_F and κ_U(A) = ||Y_U||_2 ||A||_F / ||U||_F.
We summarize these results as the following theorem.

Theorem 4.3.1. Suppose all the assumptions of Theorem 4.2.1 hold, and let ||ΔA||_F ≤ ε ||A||_F; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

where

||ΔL||_F / ||L||_F ≤ κ_L(A) ε + O(ε²),   ||ΔU||_F / ||U||_F ≤ κ_U(A) ε + O(ε²),

and these bounds are attainable to first order in ε. ∎
4.3.2 Matrix equation analysis

In Section 4.3.1 we derived sharp perturbation bounds for the L factor and the U factor, and presented the corresponding condition numbers. But it is difficult to estimate the condition numbers by the usual approach. Now we use the matrix equation approach to derive practical perturbation bounds, leading to condition estimators.
First we derive a perturbation bound for the L factor. Let U_{n-1} denote the leading (n − 1) × (n − 1) block of U, and write

U = [ U_{n-1}  u ; 0  u_{nn} ].

Then from (4.2.6),

L̇(0) = L [ slt(L^{-1} G_{n-1} U_{n-1}^{-1}), 0 ],   (4.3.10)

where G_{n-1} denotes the first n − 1 columns of G. Denote by D_n the set of all n × n real positive definite diagonal matrices. Let D = diag(δ_1, …, δ_n) ∈ D_n. Note that for any matrix B we have D slt(B) = slt(DB); then from (4.3.10) we obtain

L̇(0) = L D^{-1} [ slt(D L^{-1} G_{n-1} U_{n-1}^{-1}), 0 ].

Note ||slt(B)||_F ≤ ||B||_F for any B, so we have

||L̇(0)||_F ≤ ||L D^{-1}||_2 ||D L^{-1}||_2 ||U_{n-1}^{-1}||_2 ||G||_F.

Since this is true for all D ∈ D_n, by the Taylor expansion (4.2.2) we have

||ΔL||_F / ||L||_F ≤ κ'_L(A) ε + O(ε²),   (4.3.12)

where

κ'_L(A) ≡ inf_{D∈D_n} κ'_L(A, D),   κ'_L(A, D) ≡ κ_2(L D^{-1}) ||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F.
From the definition of κ_L(A) and the perturbation bound (4.3.12) we easily see

κ_L(A) ≤ κ'_L(A).   (4.3.15)

Also we can obtain a lower bound on κ_L(A). Let v ∈ R^{n-1} be such that ||U_{n-1}^{-T} v||_2 = ||U_{n-1}^{-T}||_2 ||v||_2. In (4.3.10), take G = [e_n v^T, 0], where e_n = (0, …, 0, 1)^T ∈ R^n. Then it is easy to verify that

L̇(0) = e_n [ v^T U_{n-1}^{-1}, 0 ].

Combining this with the first equality of (4.3.3), we have for this special G that ||Y_L||_2 ≥ ||U_{n-1}^{-1}||_2, which gives

||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F ≤ κ_L(A).   (4.3.16)

We would like to point out that (4.3.16) can also be derived directly from the structure of W in (4.3.1).
Now we derive a practical perturbation bound for the U factor. Notice for any B ∈ R^{n×n} and D ∈ D_n we have ut(BD) = ut(B)D; then from (4.2.7) we obtain

U̇(0) = ut(L^{-1} G U^{-1} D) D^{-1} U,

so that

||U̇(0)||_F ≤ ||L^{-1}||_2 ||U^{-1} D||_2 ||D^{-1} U||_2 ||G||_F.

Since this is true for all D ∈ D_n, by the Taylor expansion (4.2.3) we have

||ΔU||_F / ||U||_F ≤ κ'_U(A) ε + O(ε²),   (4.3.19)

where

κ'_U(A) ≡ inf_{D∈D_n} κ'_U(A, D),   κ'_U(A, D) ≡ κ_2(D^{-1} U) ||L^{-1}||_2 ||A||_F / ||U||_F.

From the definition of κ_U(A) and the perturbation bound (4.3.19) we see

κ_U(A) ≤ κ'_U(A).

Also we can get a lower bound on κ_U(A). Let v ∈ R^n be such that ||L^{-1} v||_2 = ||L^{-1}||_2 ||v||_2, and take G = v e_n^T in (4.2.7); then combining (4.2.7) and the second equality of (4.3.3) we can easily show

||L^{-1}||_2 ||A||_F / ||U||_F ≤ κ_U(A).   (4.3.23)

Like (4.3.16), (4.3.23) can also be shown directly from the structure of W in (4.3.1).
These results lead to the following theorem.
Theorem 4.3.2. With the same assumptions as in Theorem 4.3.1, A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

where for the L factor

||U_{n-1}^{-1}||_2 ||A||_F / ||L||_F ≤ κ_L(A) ≤ κ'_L(A) ≡ inf_{D∈D_n} κ'_L(A, D),   (4.3.26)

and for the U factor

||L^{-1}||_2 ||A||_F / ||U||_F ≤ κ_U(A) ≤ κ'_U(A) ≡ inf_{D∈D_n} κ'_U(A, D).   (4.3.28) ∎
Noting that

||A||_F ≤ ||L||_F ||U||_F,   (4.3.29)

we then have

κ'_L(A, D) ≤ κ_2(L D^{-1}) ||U_{n-1}^{-1}||_2 ||U||_F,   (4.3.30)

κ'_U(A, D) ≤ κ_2(D^{-1} U) ||L^{-1}||_2 ||L||_F.   (4.3.31)

As we know, in practice κ_2(L) is usually smaller or much smaller than κ_2(U). So these bounds suggest the L factor may be more sensitive than the U factor in practice. However both of the right-hand sides of (4.3.30) and (4.3.31) can be arbitrarily larger than the corresponding left-hand sides, due to the inequality (4.3.29).
If we take D = I in both κ'_L(A, D) and κ'_U(A, D) and use ||L||_2 ≤ ||L||_F, ||U||_2 ≤ ||U||_F and ||U_{n-1}^{-1}||_2 ≤ ||U^{-1}||_2, we obtain the simpler bounds (4.3.32) and (4.3.33). Thus from (4.3.23) and (4.3.27) we have the first-order bounds (4.3.34) and (4.3.35), which are due to Stewart [41, 1993]. These perturbation bounds are simple, but can overestimate the true sensitivity of the problem. By using a scaling technique, Stewart [42, 1995] obtained significant improvements on the above results. In [42], the diagonal elements of L were not assumed to be 1's, the diagonal elements of ΔL may not be 0's, and a parameter ρ was used to control how much of the perturbation is attached to the diagonals of L and U. The perturbation bounds given in [42] are equivalent to (4.3.36) and (4.3.37). These bounds were derived by using the inequality (4.3.29), so they are unnecessarily weak, as we pointed out in the preceding comment. (4.3.30) suggests that under the usual assumption that the diagonal elements of L are always 1's, a better bound than (4.3.36) could be obtained.
As we know, it is expensive to estimate κ_L(A) and κ_U(A) directly by the usual approach. Fortunately we now have other methods to do this. By van der Sluis's Theorem 1.2.1, κ_2(L D^{-1}) will be nearly minimal when each column of L D^{-1} has unit 2-norm, so in practice we choose D = D_L = diag(||L(:, j)||_2), then use a standard condition estimator and a norm estimator to estimate κ'_L(A, D_L), which costs O(n²). Similarly, κ_2(D^{-1} U) will be nearly minimal when each row of D^{-1} U has unit 2-norm, so we choose D = D_U = diag(||U(i, :)||_2), and use a standard condition estimator and a norm estimator to estimate κ'_U(A, D_U), which costs O(n²). Numerical experiments showed κ'_L(A, D_L) and κ'_U(A, D_U) are good approximations of κ_L(A) and κ_U(A).
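In numpy the two scalings are one-liners. The sketch below builds a random unit lower triangular L and a graded U (illustrative data, not from the experiments of Section 4.4), forms D_L and D_U, and compares condition numbers before and after scaling; explicit `cond` calls stand in for the O(n²) estimators:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)
U = np.triu(rng.standard_normal((n, n)), 1) + np.diag(np.geomspace(1.0, 1e-4, n))

D_L = np.diag(np.linalg.norm(L, axis=0))   # column 2-norms of L
D_U = np.diag(np.linalg.norm(U, axis=1))   # row 2-norms of U

# after scaling, the columns of L D_L^{-1} and rows of D_U^{-1} U have unit 2-norm
print(np.linalg.cond(L), np.linalg.cond(L @ np.linalg.inv(D_L)))
print(np.linalg.cond(U), np.linalg.cond(np.linalg.solve(D_U, U)))
```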
When we use the 1- and ∞-norms, we can get perturbation bounds without involving the scaling matrix D.
Theorem 4.3.3. Suppose all the assumptions of Theorem 4.2.1 hold, and let ||ΔA||_p ≤ ε ||A||_p, p = 1, ∞; then A + ΔA has the unique LU factorization

A + ΔA = (L + ΔL)(U + ΔU),

with the bounds (4.3.38)-(4.3.41).

Proof. Let G ≡ ΔA/ε (if ε = 0, the theorem is trivial); then ||G||_p ≤ ||A||_p, p = 1, ∞. From (4.3.10), taking the p-norm (p = 1, ∞) gives a bound on ||L̇(0)||_p, and then (4.3.38) follows immediately from the Taylor expansion (4.2.2). By using ||A||_p ≤ ||L||_p ||U||_p, (4.3.39) follows. The results (4.3.40) and (4.3.41) can similarly be proved. ∎

Note cond_p(L^{-1}) is invariant under column scaling of L, and cond_p(U) is invariant under row scaling of U. These make (4.3.38) and (4.3.40) look simpler than (4.3.8) and (4.3.9), respectively, where the Frobenius norm is used.
As we know, the standard algorithms for the LU factorization with no pivoting are not numerically stable. In order to repair this shortcoming of the algorithms, partial or complete pivoting should be incorporated in the computation. Do these two pivoting strategies have effects on the sensitivity of the factorization? Let us see the following theorem.
Theorem 4.3.4. If partial pivoting is used in the LU factorization, PA = LU, where P is a permutation matrix, then (4.3.42) holds, and

1 ≤ κ_U(PA) ≤ κ'_U(PA) ≤ √(n(n+1)(4^n + 6n − 1)/6) inf_{D∈D_n} κ_2(D^{-1} U).   (4.3.43)

If complete pivoting is used in the LU factorization, PAQ = LU, where P and Q are permutation matrices, then (4.3.44) and (4.3.45) hold.

Proof. If partial pivoting or complete pivoting is used in computing the LU factorization, then |l_ij| ≤ 1 for i > j. Since l_ii = 1, |(L^T)_ii| ≥ |(L^T)_ij| for all j > i. By the proof of Theorem 1.2.2 we see ||L^{-1}||_2 ≤ √((4^n + 6n − 1)/3). Also note ||L||_F ≤ √(n(n+1)/2). Then (4.3.42) and (4.3.44) follow from (4.3.26) by taking D = I.

Note κ'_U(A, D) ≤ κ_2(D^{-1} U) ||L^{-1}||_2 ||L||_F (see (4.3.31)) and ||U||_F = ||L^{-1} A||_F ≤ ||L^{-1}||_2 ||A||_F; then (4.3.43) follows from (4.3.28). If complete pivoting is used, then |u_ii| ≥ |u_ij| for all j > i. Thus by Theorem 1.2.2 we have κ_2(D^{-1} U) ≤ √(n(n+1)(4^n + 6n − 1)/6) with D = diag(U). Then from (4.3.43) we obtain the much better result (4.3.45). ∎
When partial pivoting or complete pivoting is used, κ_2(L) is bounded by a function of n, but possibly ||U_{n-1}^{-1}||_2 may become larger; thus from (4.3.42) and (4.3.44) we cannot tell whether κ_L(PA) and κ_L(PAQ) are larger or smaller than κ_L(A). (4.3.42) and (4.3.44) also suggest there is no big difference between the effects of partial pivoting and complete pivoting on the sensitivity of the L factor. Similarly, from (4.3.43) we are not quite sure if κ_U(A) will become larger or smaller when partial pivoting is used. But note there is an essential difference between the upper bound on κ_U(PA) and that on κ_L(PA): the former has a choice of D, which may make inf_{D∈D_n} κ_2(D^{-1} U) not increase much, so the possibility that κ_U(A) will become smaller seems high. From (4.3.45) we see complete pivoting can give a significant improvement on κ_U(A).
4.4 Numerical experiments

In Section 4.3 we presented sharp first-order perturbation bounds for the LU factors, obtained the corresponding condition numbers κ_L(A) and κ_U(A), and suggested κ_L(A) and κ_U(A) could be respectively estimated in practice by κ'_L(A, D_L) and κ'_U(A, D_U), with D_L = diag(||L(:, j)||_2) and D_U = diag(||U(i, :)||_2). The condition numbers and condition estimators satisfy the inequalities in (1.2.14), (1.2.15), (4.3.25), (4.3.26), (4.3.32) and (4.3.33). Also we discussed the effects of partial pivoting and complete pivoting on those condition numbers.
Now we give some numerical experiments to illustrate our theoretical analyses.
The matrices have the form A = D_1 B D_2, where D_1 = diag(1, d_1, …, d_1^{n-1}), D_2 = diag(1, d_2, …, d_2^{n-1}), and B is an n × n random matrix (produced by the MATLAB function randn). The results for n = 10, d_1, d_2 ∈ {0.2, 1, 2} and the same matrix B are shown in Table 4.4.1 without pivoting, in Table 4.4.2 with partial pivoting, and in Table 4.4.3 with complete pivoting.
We give some comments on the results.
Table 4.4.1: Results without pivoting, A = LU
Table 4.4.2: Results with partial pivoting, PA = LU
Table 4.4.3: Results with complete pivoting, PAQ = LU
• The results confirm that β ≡ ||L^{-1}||_2 ||U^{-1}||_2 ||A||_F can be much larger than κ_L(A) and κ_U(A), especially the latter, so the first-order bounds (4.3.34) and (4.3.35) can significantly overestimate the true sensitivity of the LU factorization.
• κ'_L(A, D_L) and κ'_U(A, D_U) are good approximations of κ_L(A) and κ_U(A), respectively, no matter whether pivoting is used or not. This is also confirmed by our other numerical experiments.
• Both κ_L(PA) and κ_L(PAQ) can be much larger or smaller than κ_L(A). So partial pivoting and complete pivoting can make the L factor more sensitive or less sensitive. But from Tables 4.4.1-4.4.2 we see partial pivoting can give a significant improvement on the condition of the U factor. In fact here κ_U(PA) ≤ κ_U(A) for all cases. From Table 4.4.3 we see that complete pivoting can give a more significant improvement.
• It can be seen that for most cases the L factor is more sensitive than the U factor, no matter whether pivoting is used or not.
• When partial pivoting or complete pivoting is used, we see both κ_L and κ_U are close to their lower bounds β_L and β_U, respectively.
4.5 Summary and future work

The first-order perturbation analyses presented here show what the sensitivity of each of L and U is in the LU factorization of A, and in so doing provide their condition numbers κ_L(A) and κ_U(A) (with respect to the measures used, and for sufficiently small ΔA), as well as efficient ways of approximating these.
As we know, κ_2(L) is usually (much) smaller than κ_2(U), especially in practice when we use partial pivoting in computing the LU factorization. So we can expect that the computed solution of the linear system Lx = b will usually be more accurate than that of the linear system Uy = b. However, our analysis and numerical experiments suggest that usually the L factor is more sensitive than the U factor in the LU factorization, so we expect U to be more accurate than L. This is an interesting phenomenon. Also we see the effect of partial pivoting and complete pivoting on the sensitivity of L is uncertain: both κ_L(PA) and κ_L(PAQ) can be much larger or smaller than κ_L(A). But partial pivoting can usually improve the condition of U, and complete pivoting can give a significant improvement.
In the future we would like to:

• Investigate the ratios κ_L(A)/κ'_L(A) and κ_U(A)/κ'_U(A).

• Extend our analysis to the case where |ΔA| ≤ ε|A|. In fact some results have been obtained by Chang and Paige [12].
Chapter 5
The Cholesky downdating
problem
5.1 Introduction
In this chapter we give perturbation analyses of the following problem: given an upper triangular matrix R ∈ R^{n×n} and a matrix X ∈ R^{k×n} such that R^T R − X^T X is positive definite, find an upper triangular matrix U ∈ R^{n×n} with positive diagonal elements such that

U^T U = R^T R − X^T X.   (5.1.1)
This problem is called the block Cholesky downdating problem, and the matrix U is referred to as the downdated Cholesky factor. The block Cholesky downdating problem has many important applications, and the case k = 1 has been extensively studied in the literature (see [1, 5, 6, 19, 25, 24, 36, 37, 40]).
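A mathematically direct (though numerically naive) way to compute the downdated factor is to form R^T R − X^T X explicitly and take its Cholesky factor; careful implementations use orthogonal transformations instead, but the sketch below is convenient for experiments (all names and sizes are illustrative, and the positive definite matrix M is built first so that R^T R − X^T X is positive definite by design):

```python
import numpy as np

def block_downdate(R, X):
    # U with U^T U = R^T R - X^T X, via an explicit Cholesky factorization
    C = R.T @ R - X.T @ X
    return np.linalg.cholesky(C).T     # numpy returns the lower factor

rng = np.random.default_rng(4)
n, k = 6, 2
M = rng.standard_normal((n, n))
M = M @ M.T + np.eye(n)                # positive definite target for U^T U
X = rng.standard_normal((k, n))
R = np.linalg.cholesky(M + X.T @ X).T  # so that R^T R - X^T X = M
U = block_downdate(R, X)
print(np.abs(U.T @ U - M).max())
```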
Let ΔR and ΔX be real n × n and k × n matrices, respectively, such that (R + ΔR)^T(R + ΔR) − (X + ΔX)^T(X + ΔX) is still positive definite; then this has the unique Cholesky factorization

(U + ΔU)^T(U + ΔU) = (R + ΔR)^T(R + ΔR) − (X + ΔX)^T(X + ΔX).
The goal of the perturbation analysis for the Cholesky downdating problem is to determine a bound on ||ΔU|| (or |ΔU|) in terms of ||ΔR|| (or |ΔR|) and ||ΔX|| (or |ΔX|).
The perturbation analysis for the Cholesky downdating problem with norm-bounded changes in R and X has been considered by several authors. Stewart [40, 1979] presented perturbation results for single downdating (i.e. k = 1). Eldén and Park [22, 1994] made an analysis for block downdating. But these two papers just considered the case where only R or X is perturbed. More complete analyses, with both R and X being perturbed, were given by Pan [35, 1993] and Sun [50, 1995]. Pan [35, 1993] gave first-order perturbation bounds for single downdating. Sun [50, 1995] gave rigorous, also first-order, perturbation bounds for single downdating, and first-order perturbation bounds for block downdating. Recently Eldén and Park [23, 1996] gave new first-order perturbation bounds for block downdating. Unfortunately there was an error in their paper when the result of Sun [46, 1991] was applied in deriving the perturbation bound. Because of this, the results presented in [23] will not be cited in this chapter.
The main purpose of this chapter is to establish new first-order perturbation results and present new condition numbers which more closely reflect the true sensitivity of the problem. In Section 5.2 we will give the key result of Sun [50, 1995], and a new result using the approach of these earlier papers. In Section 5.3 we present new perturbation results, first by the matrix-vector equation approach, then by the matrix equation approach. We give numerical results and suggest practical condition estimators in Section 5.4. Finally we briefly summarize our findings and point out future work in Section 5.5. Most of the results have been presented in Chang and Paige [10, 1996].
Previous work by others implied the change ΔR in R was upper triangular, and Sun [50, 1995] said this, but neither he nor the others made use of this fact. In fact a backward stable algorithm for computing U given R and X would produce the exact result U_C = U + ΔU for nearby data R + ΔR and X + ΔX, where it is not clear that ΔR would be upper triangular; the form of the equivalent backward rounding error ΔR would depend on the algorithm, and if it were upper triangular, it would require a rounding error analysis to show this. Thus for completeness it seems necessary to consider two separate cases, general ΔR and upper triangular ΔR. We do this throughout Sections 5.3-5.4, and get stronger results for upper triangular ΔR than in the general case.
In any perturbation analysis it is important to examine how good the results are.
In Section 5.3.1 we produce provably tight bounds, leading to the true condition
numbers (for the norms chosen). The numerical example in Section 5.4 indicates how
much better the results of this new analysis can be compared with some earlier ones,
but a theoretical understanding is also desirable. By considering the asymptotic case
as X → 0, the results simplify, and are easily understandable. We show the new
results have the correct properties as X → 0, in contrast to earlier results.
5.2 Basics, previous results, and an improvement
Let Γ satisfy Γ^T Γ = I_n − R^{-T} X^T X R^{-1} (so Γ would be the Cholesky factor of I_n − R^{-T} X^T X R^{-1}), and let σ_n(Γ) be the smallest singular value of Γ. Notice that for
fixed R, Γ^T Γ → I_n as X → 0, so σ_n(Γ) → 1. First we derive some relationships
among U, R, X and Γ. 1) From (5.1.1) obviously we have
2) From (5.1.1) it follows that
so that taking the 2-norm gives
3) From (5.1.1) we have
which, combined with (5.2.2), gives
4) From (5.1.1) we have
which, combined with (5.2.2), gives
5) By (5.2.2) we have
6) Finally from (5.2.4) we see
Now we derive the basic result on how U changes as R and X change.
Theorem 5.2.1 Suppose we have an upper triangular matrix R ∈ R^{n×n} and a matrix
X ∈ R^{k×n} with the Cholesky factorization U^T U = R^T R − X^T X, where U ∈ R^{n×n} is
upper triangular with positive diagonal elements. Let G be a real n × n matrix, and
let F be a real k × n matrix. Assume ΔR = εG and ΔX = εF, for some ε ≥ 0. If
(5.2.7) holds, then there is a unique Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),   (5.2.8)

with ΔU satisfying

ΔU = ε U̇(0) + O(ε²),   (5.2.9)

where U̇(0) is defined by the unique Cholesky factorization (5.2.10),
and so satisfies the equations (5.2.11) and (5.2.12),
where the 'up' notation is defined by (1.2.3).
Proof. If ||ΔR R^{-1}||₂ < 1, then it is easy to show R + tG is nonsingular for all |t| ≤ ε.
Notice for all |t| ≤ ε,
and
then if (5.2.7) holds, (R + tG)^T (R + tG) − (X + tF)^T (X + tF) is positive definite and has
the unique Cholesky factorization (5.2.10). Notice that U(0) = U and U(ε) = U + ΔU,
so (5.2.8) holds.
It is easy to verify that U(t) is twice continuously differentiable for |t| ≤ ε from
the algorithm for the Cholesky factorization. If we differentiate (5.2.10) and set t = 0
in the result, we obtain (5.2.11), which, like (2.2.5), is a linear equation uniquely
defining the elements of upper triangular U̇(0) in terms of the elements of G and F.
With the 'up' notation in (1.2.3) we see (5.2.12) holds. Finally the Taylor expansion
for U(t) about t = 0 gives (5.2.9) at t = ε. □
By Theorem 5.2.1 we derive new first-order perturbation bounds for the block
Cholesky downdating problem, from which the first-order perturbation bound given
by Sun [50, 1995] can be derived.
Theorem 5.2.2 Suppose we have an upper triangular matrix R ∈ R^{n×n} and a matrix
X ∈ R^{k×n} with the Cholesky factorization U^T U = R^T R − X^T X, where U ∈ R^{n×n} is
upper triangular with positive diagonal elements. Let ΔR be a real n × n matrix, and
let ΔX be a real k × n matrix. Define ε_R ≡ ||ΔR||_F/||R||₂ and ε_X ≡ ||ΔX||_F/||X||₂. Set ε = max{ε_R, ε_X}. If
then there is a unique Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

where
Proof. Let G = ΔR/ε and F = ΔX/ε (if ε = 0, the theorem is trivial), then
It is easy to verify that (5.2.13) implies that (5.2.7) holds, so Theorem 5.2.1 is applicable
here. From (5.2.12) and the fact that for any symmetric B, ||up(B)||_F ≤ (1/√2)||B||_F (see (1.2.7)) we have with (5.2.15) that
which, combined with (5.2.2) and (5.2.3), gives
Then (5.2.14) follows from the Taylor expansion (5.2.9). □
From (5.2.14) we see
can be regarded as the condition estimators for U with respect to relative changes in
R and X, respectively. Notice from (5.2.1) we see φ_R ≥ φ_X, so we can define a new
overall condition estimator
and combine it with (5.2.5) and (5.2.6); then we obtain Sun's bound
which leads to the overall condition estimator proposed by Sun:
We have seen the right hand side of (5.2.14) is never worse than that of (5.2.18).
Although φ is a minor improvement on β, it is still not what we want. We can
see this from the asymptotic behavior of these condition estimators. The Cholesky
factorization is unique, so as X → 0, U → R, and X^T ΔX → 0 in (5.2.8). Now
for any upper triangular perturbation ΔR in R, ΔU → ΔR, so the true condition
number should approach unity. Here β, φ → √2 κ₂(R). The next section shows how
we can overcome this inadequacy.
5.3 New perturbation results
In Section 5.2 we saw the key to deriving first-order perturbation bounds for U in
the block Cholesky downdating problem is the equation (5.2.11). We will now analyze
it by two approaches. The first approach, the matrix-vector equation approach,
gives sharp perturbation bounds, which lead to the condition numbers for the block
Cholesky downdating problem, while the second, the matrix equation approach,
gives a clear improvement on other earlier results, and provides practical condition
estimators for the true condition numbers. All our discussion is based on the same
assumptions as in Theorem 5.2.2.
5.3.1 Matrix-vector equation analysis
The matrix-vector equation approach views the matrix equation (5.2.11) as a large
matrix-vector equation.
First assume ΔR is a general real n × n matrix. It is easy to show (5.2.11) can be
rewritten in the following matrix-vector form (cf. Chapter 3):

W_U uvec(U̇(0)) = Z_R vec(G) − Y_X vec(F),
Since U is nonsingular, W_U is also, and from (5.3.1)
where for any R and X equality can be made by choosing G and F such that
Then from the Taylor expansion (5.2.9), we see
and the condition numbers for U with respect to relative changes in R and X are
(here subscript G refers to general ΔR, and later the subscript T will refer to upper
triangular ΔR)
and

κ_X(R, X) ≡ lim_{ε→0} sup { ||ΔU||_F/(ε||U||₂) : (U + ΔU)^T (U + ΔU) = R^T R − (X + ΔX)^T (X + ΔX), ε = ||ΔX||_F/||X||₂ },   (5.3.8)

respectively. Then a whole condition number for the Cholesky downdating problem
with general ΔR can be defined as

κ_CDG(R, X) ≡ max{κ_RG(R, X), κ_X(R, X)}.   (5.3.9)

By the definitions of κ_RG(R, X) and κ_X(R, X), it is easy to verify from (5.2.14)
and (5.2.16) that

κ_RG(R, X) ≤ φ_R,   κ_X(R, X) ≤ φ_X,   (5.3.10)

It is easy to observe that if X → 0, κ_RG(R, X) → ||W_R^{-1} Z_R||₂, where W_R is
just W_U with each entry u_ij replaced by r_ij. If R was found using the standard
pivoting strategy in the Cholesky factorization, then ||W_R^{-1} Z_R||₂ has a bound which
is a function of n alone (see Theorem 3.4.2). So in this case our condition number
κ_CDG(R, X) also has a bound which is a function of n alone as X → 0.
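The matrix-vector equation viewpoint is easy to illustrate numerically. The sketch below (with one plausible column ordering for 'uvec'; the thesis fixes its own ordering in Chapter 1) builds the matrix of the linear map uvec(V) ↦ uvec(V^T U + U^T V) over upper-triangular V by applying it to basis matrices, and checks that nonsingular U gives a nonsingular matrix:

```python
import numpy as np

def uvec(A):
    """Stack the upper-triangular part of A column by column into a
    vector of length n(n+1)/2 (one plausible 'uvec' ordering)."""
    n = A.shape[0]
    return np.concatenate([A[:j + 1, j] for j in range(n)])

def build_WU(U):
    """Matrix of the linear map  uvec(V) -> uvec(V^T U + U^T V)  over
    upper-triangular V, formed by applying the map to basis matrices."""
    n = U.shape[0]
    m = n * (n + 1) // 2
    W = np.zeros((m, m))
    col = 0
    for j in range(n):
        for i in range(j + 1):
            V = np.zeros((n, n))
            V[i, j] = 1.0
            W[:, col] = uvec(V.T @ U + U.T @ V)
            col += 1
    return W

rng = np.random.default_rng(1)
n = 4
U = np.triu(rng.standard_normal((n, n)), 1) + np.diag(rng.uniform(1.0, 2.0, n))
W = build_WU(U)
# consistency: the matrix reproduces the map on a random upper-triangular V
V = np.triu(rng.standard_normal((n, n)))
assert np.allclose(W @ uvec(V), uvec(V.T @ U + U.T @ V))
# U nonsingular implies the map (and hence W) is nonsingular
assert np.linalg.matrix_rank(W) == n * (n + 1) // 2
```

Solving with this matrix is exactly what yields the sharp bounds above; the general-ΔR case simply feeds vec(G) through the wider matrix whose columns include those of the triangular case.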
Now we consider the case where ΔR is upper triangular. (5.2.11) can now be
rewritten in the following matrix-vector form:

W_U uvec(U̇(0)) = W_R uvec(G) − Y_X vec(F),
where W_U and Y_X are defined as before, and W_R ∈ R^{n(n+1)/2 × n(n+1)/2} is just W_U with each entry u_ij replaced by r_ij. Since U is nonsingular,
W_U is also, and from (5.3.12)
so taking the 2-norm gives
where, like (5.3.3), (5.3.14) will become an equality if we choose G and F as in (5.3.4)
and (5.3.5) with Z_R there replaced by W_R. Then from the Taylor expansion (5.2.9),
we see
and the condition numbers for U with respect to relative changes in R and X are
(subscript T indicates upper triangular ΔR)
and

κ_X(R, X) ≡ lim_{ε→0} sup { ||ΔU||_F/(ε||U||₂) : (U + ΔU)^T (U + ΔU) = R^T R − (X + ΔX)^T (X + ΔX), ε = ||ΔX||_F/||X||₂ },

respectively. Note κ_X(R, X) is the same as that defined in (5.3.8). Then a whole
condition number for the Cholesky downdating problem with upper triangular ΔR
can be defined as

κ_CDT(R, X) ≡ max{κ_RT(R, X), κ_X(R, X)}.   (5.3.18)
Since κ_RG(R, X) is for a general ΔR, certainly we have
which can also be proved directly by the fact that the columns of W_R form a proper
subset of the columns of Z_R. Thus from (5.3.9) and (5.3.18) we see

κ_CDT(R, X) ≤ κ_CDG(R, X).   (5.3.20)

If as well X → 0, then since U → R, W_U^{-1} W_R → I_{n(n+1)/2}, and κ_RT(R, X) → 1.
So in this case the Cholesky downdating problem becomes very well conditioned no
matter how ill-conditioned R or U is.
Finally we summarize the results above as the following theorem.
Theorem 5.3.1 With the same assumptions as in Theorem 5.2.2, there is a unique
Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

There are the following relationships among the various measures of sensitivity of the
problem (see (5.3.10), (5.3.11), (5.3.19) and (5.3.20)):
5.3.2 Matrix equation analysis
As far as we see, the condition numbers obtained in the last section are expensive
to compute or estimate directly with the usual approach. We now use the matrix
equation approach to obtain practical condition estimators.
In Theorem 5.2.2 we used the expression of U̇(0) in (5.2.12) to derive a new first-order
perturbation bound (5.2.14), from which Sun's bound was derived. Now we
again look at (5.2.12), repeated here for clarity:
Let D_n be the set of all n × n real positive definite diagonal matrices. For any
D = diag(δ_1, ..., δ_n) ∈ D_n, note that for any matrix B we have

up(B D^{-1}) = up(B) D^{-1}   and   up(D^{-1} B) = D^{-1} up(B).

First, with general ΔR, let U = D Ū. We have from (5.3.21) that
so taking the F-norm gives
where ς_D ≡ max_{1≤i<j≤n} δ_j/δ_i. Thus from (5.3.22) we have
which leads to the following perturbation bound in terms of relative changes
Naturally we define the following two quantities as condition estimators for U with
respect to relative changes in R and X, respectively:
κ'_RG(R, X) ≡ inf_{D∈D_n} κ'_RG(R, X, D),   κ'_X(R, X) ≡ inf_{D∈D_n} κ'_X(R, X, D),   (5.3.24)
where
Then an overall condition estimator can be defined as

κ'_CDG(R, X) ≡ inf_{D∈D_n} κ'_CDG(R, X, D),

where

κ'_CDG(R, X, D) ≡ max{κ'_RG(R, X, D), κ'_X(R, X, D)}.
Since ||X||₂ ≤ ||R||₂, we see
which gives

κ'_CDG(R, X) = κ'_RG(R, X) ≥ κ'_X(R, X).

Therefore with these, we have from (5.3.23) that
Clearly if we take D = I_n, (5.3.23) will become (5.2.14), and
It is not difficult to give an example to show φ can be arbitrarily larger than κ'_CDG(R, X),
as can be seen from the following asymptotic behaviour.
If X → 0 we saw U → R and σ_n(Γ) → 1, so
It is shown in Theorem 3.4.4 that with an appropriate choice of D, √(1 + ς_D²) κ₂(D^{-1}R)
has a bound which is a function of n only, if R was found using the standard pivoting
strategy in the Cholesky factorization, and in this case, we see κ'_CDG(R, X) is bounded
independently of κ₂(R) as X → 0, for general ΔR. At the end of this section we give
an even stronger result when X → 0 for the case of upper triangular ΔR. Note in
the case here that φ in (5.2.17) can be made as large as we like, and thus arbitrarily
larger than κ'_CDG(R, X).
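The effect of such a diagonal scaling is easy to demonstrate. The NumPy sketch below is illustrative only: it takes D to be the diagonal of row 2-norms (a van der Sluis-type row equilibration [51]) rather than the thesis's particular choice of D:

```python
import numpy as np

# An ill-scaled upper-triangular R: a modestly conditioned triangular
# matrix whose rows are then scaled to wildly different sizes.
n = 20
T = np.triu(np.ones((n, n)))               # kappa_2(T) grows only mildly with n
R = np.diag(2.0 ** -np.arange(n)) @ T      # row i scaled by 2^{-i}

# Row equilibration: D holds the 2-norms of the rows of R.
D = np.diag(np.linalg.norm(R, axis=1))
kappa_R = np.linalg.cond(R)
kappa_DR = np.linalg.cond(np.linalg.solve(D, R))   # cond of D^{-1} R
assert kappa_DR < kappa_R / 100.0   # the scaling removes the artificial ill-conditioning
```

Here κ₂(R) is huge purely because of the row scaling, while κ₂(D^{-1}R) stays modest; this is the mechanism by which a good D keeps κ'_CDG(R, X) bounded even when κ₂(R) is large.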
By the definitions of κ_RG(R, X) and κ_X(R, X) respectively in (5.3.7) and (5.3.8),
we can easily verify from (5.3.29) that

κ_RG(R, X) ≤ κ'_RG(R, X),   κ_X(R, X) ≤ κ'_X(R, X).   (5.3.32)
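The two 'up' scaling identities used at the start of this subsection are elementary to check numerically, assuming (as the form (1.2.3) is used elsewhere in the thesis) up(B) = sut(B) + ½ diag(B):

```python
import numpy as np

def up(B):
    """up(B): strictly-upper part of B plus half its diagonal, so that
    up(B) + up(B)^T = B whenever B is symmetric (assumed form of (1.2.3))."""
    return np.triu(B, 1) + 0.5 * np.diag(np.diag(B))

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
d = rng.uniform(1.0, 2.0, 6)
D, Dinv = np.diag(d), np.diag(1.0 / d)

assert np.allclose(up(B @ Dinv), up(B) @ Dinv)   # column scaling commutes with 'up'
assert np.allclose(up(Dinv @ B), Dinv @ up(B))   # row scaling commutes with 'up'
S = B + B.T
assert np.allclose(up(S) + up(S).T, S)           # 'up' halves a symmetric matrix
```

Both identities hold because a diagonal factor scales rows (or columns) without moving entries across the diagonal.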
In the case where ΔR is upper triangular (so G is upper triangular), we can refine
the analysis further. From (5.3.21) we have
Notice with the 'slt', 'sut' and 'diag' notation defined in (1.2.1) and (1.2.2),
But for any upper triangular matrix T we have
so that if we define T accordingly, then
Thus from (5.3.34), (5.3.35) and (5.3.36) we obtain
As before, let U = D Ū, where D = diag(δ_1, ..., δ_n) ∈ D_n. From (5.3.37) it follows
that
Then, applying (5.3.2) to this, we have
which leads to the following perturbation bound
Comparing this with (5.3.23) and noticing (5.2.3), we see the coefficient multiplying ε_X
does not change, so κ'_X(R, X) defined in (5.3.24) can still be regarded as a condition
estimator for U with respect to changes in X. But we now need to define a new
condition estimator for U with respect to upper triangular changes in R, that is
κ'_RT(R, X) ≡ inf_{D∈D_n} κ'_RT(R, X, D),

where

Thus an overall condition estimator can be defined as

κ'_CDT(R, X) ≡ inf_{D∈D_n} κ'_CDT(R, X, D),
Obviously we have
With these, we have from (5.3.38) that
What is the relationship between κ'_CDT(R, X) and κ'_CDG(R, X) = κ'_RG(R, X)? For
any n × n upper triangular matrix T = (t_ij), observe the following two facts:
1) t_ii, i = 1, 2, ..., n, are the eigenvalues of T, so that |t_ii| ≤ ||T||₂. From this it
follows that

||diag(T)||₂ ≤ ||T||₂,

so that

κ'_RT(R, X) ≤ (1 + √n) κ'_RG(R, X).
Thus we have from (5.3.28) and (5.3.41) that

κ'_CDT(R, X) ≤ (1 + √n) κ'_CDG(R, X).   (5.3.44)
On the other hand, κ'_CDT(R, X) can be arbitrarily smaller than κ'_CDG(R, X). This
can be seen from the asymptotic behaviour, which is important in its own right. As
X → 0, since U → R, σ_n(Γ) → 1 and R U^{-1} → I_n, we have
so for upper triangular changes in R, whether pivoting was used in finding R or not,
Thus when X → 0, the bound in (5.3.42) reflects the true sensitivity of the problem.
For the case of general ΔR, if we do not use pivoting it is straightforward to make
κ'_CDG(R, X) in (5.3.27) arbitrarily large even with X = 0; see (5.3.25).
By the definition of κ_RT(R, X) in (5.3.16), we can easily verify from (5.3.42) that
Thus from this and the second inequality in (5.3.32), it follows with (5.3.18) and
(5.3.41) that

κ_CDT(R, X) ≤ κ'_CDT(R, X).   (5.3.46)
Now we summarize these results as the following theorem.
Theorem 5.3.2 With the same assumptions as in Theorem 5.2.2, there is a unique
Cholesky factorization

(U + ΔU)^T (U + ΔU) = (R + ΔR)^T (R + ΔR) − (X + ΔX)^T (X + ΔX),

where for general ΔR,
and for upper triangular ΔR,
There are the following relationships among the various measures of sensitivity of
the problem (see (5.3.28), (5.3.30), (5.3.32), (5.3.33), (5.3.43), (5.3.44), (5.3.45) and
(5.3.46)):
Our numerical experiments suggest κ'_CDG(R, X) is usually a good approximation
to κ_CDG(R, X). But the following example shows κ'_CDG(R, X) can sometimes be
arbitrarily larger than κ_CDG(R, X):
where δ is a small positive number. It is not difficult to show
But κ'_CDG(R, X) has an advantage over κ_CDG(R, X) - it can be quite easy to estimate
- all we need to do is to choose a suitable D in κ'_CDG(R, X, D). We consider how to do
this in the next section. In contrast κ_CDG(R, X) is, as far as we can see, unreasonably
expensive to compute or estimate.
Numerical experiments also suggest κ'_CDT(R, X) is usually a good approximation
to κ_CDT(R, X). But sometimes κ'_CDT(R, X) can be arbitrarily larger than κ_CDT(R, X).
This can also be seen from the example above. In fact, it is not difficult to obtain
Like κ_CDG(R, X), κ_CDT(R, X) is difficult to compute or estimate. But κ'_CDT(R, X) is
easy to estimate, which is discussed in the next section.
5.4 Numerical experiments
In Section 5.3 we presented new first-order perturbation bounds for the downdated
Cholesky factor U using first the matrix-vector equation approach, and then the
matrix equation approach. We defined κ_CDG(R, X) for general ΔR, and κ_CDT(R, X)
for upper triangular ΔR, as the overall condition numbers of the problem. Also we
gave two corresponding practical but weaker condition estimators κ'_CDG(R, X) and
κ'_CDT(R, X) for the two ΔR cases.
We would like to choose D such that κ'_CDG(R, X, D) and κ'_CDT(R, X, D) are good
approximations to κ'_CDG(R, X) and κ'_CDT(R, X), respectively. We see from (5.3.25),
(5.3.26) and (5.3.39) that we want to find D such that √(1 + ς_D²) κ₂(D^{-1}U) approximates
its infimum. That is the same problem we faced in Section 3.5, and we adopt
the best method of choosing D proposed there, taking the δ_i, i = 1, ..., n, as described
in that section. Then we use
a standard condition estimator to estimate κ₂(D^{-1}U) in O(n²) operations.
Notice from (5.2.4) we have σ_n(Γ) = √(1 − ||X R^{-1}||₂²). Usually k, the number of
rows of X, is much smaller than n, so σ_n(Γ) can be computed in O(n²). If k is not
much smaller than n, then we use a standard norm estimator to estimate ||X R^{-1}||₂ in
O(n²). Similarly ||U||₂ and ||R||₂ can be estimated in O(n²). So finally κ'_CDG(R, X, D)
can be estimated in O(n²). Estimating κ'_CDT(R, X, D) is not as easy as estimating
κ'_CDG(R, X, D). The part ||diag(R U^{-1})||₂ in κ'_RT(R, X, D) can easily be computed
in O(n), since diag(R U^{-1}) = diag(r_11/u_11, ..., r_nn/u_nn). The part ||sut(R U^{-1})||₂ in
κ'_RT(R, X, D) can roughly be estimated in O(n²), based on
and the fact that ||R U^{-1}||_F can be estimated by a standard norm estimator in O(n²).
The value of ||X U^{-1}||₂ in κ'_X(R, X, D) can be calculated (if k < n) or estimated
by a standard estimator in O(n²). All the remaining values ||R||₂, ||X||₂ and ||U||₂
can also be estimated by a standard norm estimator in O(n²). Hence κ'_RT(R, X, D),
κ'_X(R, X, D), and thus κ'_CDT(R, X, D) can be estimated in O(n²). For standard condition
estimators and norm estimators, see Chapter 14 of Higham [30, 1996].
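For instance, the σ_n(Γ) piece can be sketched as follows (illustrative code; `solve_triangular` performs the k triangular solves that form X R^{-1}):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sigma_min_gamma(R, X):
    """sigma_n(Gamma) = sqrt(1 - ||X R^{-1}||_2^2), computed from k
    triangular solves; cheap when k, the number of rows of X, is small."""
    Y = solve_triangular(R, X.T, trans='T').T   # Y = X R^{-1}
    s = np.linalg.norm(Y, 2)                    # ||X R^{-1}||_2
    return np.sqrt(1.0 - s * s)

rng = np.random.default_rng(4)
n, k = 6, 2
R = np.triu(rng.standard_normal((n, n)), 1) + 4.0 * np.eye(n)
X = 0.1 * rng.standard_normal((k, n))
# cross-check against the eigenvalues of Gamma^T Gamma = I - (X R^{-1})^T (X R^{-1})
Y = np.linalg.solve(R.T, X.T).T
lam_min = np.linalg.eigvalsh(np.eye(n) - Y.T @ Y).min()
assert np.isclose(sigma_min_gamma(R, X), np.sqrt(lam_min))
```

When k is not small, the norm ||X R^{-1}||₂ inside this formula would instead be obtained from a standard norm estimator, as described above.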
The relationships among the various overall measures of sensitivity of the Cholesky
downdating problem presented in Section 5.2 and Section 5.3 are as follows:

β ≥ φ ≥ κ'_CDG(R, X) ≥ κ_CDG(R, X) ≥ κ_CDT(R, X),
(1 + √n) κ'_CDG(R, X) ≥ κ'_CDT(R, X) ≥ κ_CDT(R, X).
Now we give one numerical example to illustrate these. The example, quoted from
Sun [50, 1995], is as follows:

R = diag(1, s, s², s³, s⁴)

where c = 0.95, s = √(1 − c²). The results obtained using MATLAB are shown in
Table 5.4.1 for various values of the parameter:
Table 5.4.1: Results for the example in Sun's paper
Note in Table 5.4.1 how β and φ can be far worse than the condition numbers
κ_CDG(R, X) and κ_CDT(R, X), although φ is not as bad as β. Also we observe
that κ'_CDG(R, X, D) and κ'_CDT(R, X, D) are very good approximations to κ_CDG(R, X)
and κ_CDT(R, X), respectively. When X becomes small, all of the condition numbers
and condition estimators decrease. The asymptotic behavior of κ'_CDG(R, X, D),
κ'_CDT(R, X, D), κ_CDG(R, X) and κ_CDT(R, X) coincides with our theoretical results:
when X → 0, κ'_CDG(R, X) and κ_CDG(R, X) will be bounded in terms of n since here
R is actually K₅(arccos(0.95)), a Kahan matrix, which corresponds to the Cholesky
factor of a correctly pivoted A, and κ'_CDT(R, X), κ_CDT(R, X) → 1.
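For reference, the Kahan matrix named here can be generated as follows (a standard construction with s = sin θ, c = cos θ):

```python
import numpy as np

def kahan(n, theta):
    """Kahan matrix K_n(theta): row i of the unit upper-triangular matrix
    with off-diagonal entries -cos(theta) is scaled by sin(theta)^i."""
    c, s = np.cos(theta), np.sin(theta)
    K = np.eye(n) + np.triu(np.full((n, n), -c), 1)
    return np.diag(s ** np.arange(n)) @ K

R = kahan(5, np.arccos(0.95))
assert np.allclose(np.diag(R), np.sin(np.arccos(0.95)) ** np.arange(5))
assert np.isclose(R[0, 1], -0.95)
```

Its diagonal matches the diag(1, s, s², s³, s⁴) scaling of the example, which is why the example's R behaves like a correctly pivoted Cholesky factor.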
5.5 Summary and future work
The first-order perturbation analyses presented here show just what the sensitivity of
the Cholesky downdating problem is, and in so doing provide the condition numbers,
as well as efficient ways of approximating these. The key measures of the sensitivity
of the problem we derived are:
For general ΔR:
- overall condition number: κ_CDG(R, X), see (5.3.9),
- overall condition estimator: κ'_CDG(R, X) ≡ inf_{D∈D_n} κ'_CDG(R, X, D), see
(5.3.27),
For triangular ΔR:
- overall condition number: κ_CDT(R, X), see (5.3.18),
- overall condition estimator: κ'_CDT(R, X) ≡ inf_{D∈D_n} κ'_CDT(R, X, D), see
(5.3.40).
These quantities and the condition estimators φ (see (5.2.17)) and β (see (5.2.19))
obey
For the asymptotic case as X → 0, κ_CDG(R, X) and κ'_CDG(R, X) will be bounded in
terms of n, and κ_CDT(R, X), κ'_CDT(R, X) → 1, while β and φ have no such properties.
Recently Stewart [42, 1995] presented a backward rounding error analysis for the
block downdating algorithm presented by Eldén and Park. It would be straightforward
to combine our results here with Stewart's result to give a forward error estimate
for the computed U. But we choose not to do this here in order to keep the material
as simple as possible.
In the future we would like to:
- Give better approximations to κ_CDG(R, X) and κ_CDT(R, X) than κ'_CDG(R, X)
and κ'_CDT(R, X).
- Extend our analyses here to other cases, such as that when ΔR and ΔX come
from a componentwise backward rounding error analysis.
Chapter 6
Conclusions and future research
A new approach, the so-called 'matrix-vector equation approach', has been developed
here for the perturbation analysis of matrix factorizations. The basic idea of this
approach is to write the perturbation matrix equation as a matrix-vector equation by
using the special structure and properties of the factors. Using this approach we obtained
tight first-order perturbation results and condition numbers for the Cholesky,
QR and LU factorizations, and for the Cholesky downdating problem. Our perturbation
bounds give significant improvements on the previous results, and could not
be sharper.
Also we used the so-called 'matrix equation approach' originated by G. W. Stewart
to obtain perturbation bounds that are usually weaker but easier to interpret,
leading to condition estimators which are easily estimated by the standard condition
estimators (for matrix inversion) or norm estimators. Our experiments suggested
that for the Cholesky, QR and LU factorizations with norm-bounded changes in the
original matrices the condition estimators are very good approximations of the corresponding
condition numbers. Also our numerical experiments suggested that for the
Cholesky factorization with component-bounded changes in the original matrix and
the Cholesky downdating problem with norm-bounded changes in the original matrices,
the condition estimators are usually good approximations of the corresponding
condition numbers, even though some counter-examples were found.
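A minimal example of the kind of O(n²)-per-step norm estimator referred to is sketched below. This is a generic power-iteration sketch, not the specific LINPACK-style estimator [16]; it produces a lower bound on the 2-norm that is usually very close:

```python
import numpy as np

def norm2_estimate(A, iters=50, seed=2):
    """Power-iteration lower-bound estimate of ||A||_2: each step costs
    two matrix-vector products, O(n^2) flops for a dense n x n matrix."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A.T @ (A @ x)        # one step of power iteration on A^T A
        x /= np.linalg.norm(x)
    return np.linalg.norm(A @ x)

A = np.random.default_rng(5).standard_normal((50, 50))
est, true = norm2_estimate(A), np.linalg.norm(A, 2)
assert est <= true * (1 + 1e-10)   # ||A x|| <= ||A||_2 for unit x
assert est >= 0.9 * true           # and it converges close from below
```

Applied to triangular solves instead of products, the same idea estimates ||U^{-1}||₂, and hence a condition number, at O(n²) cost per step.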
The matrix-vector equation approach is a powerful general tool, and appears to
be applicable to the perturbation analysis of any matrix factorization. The matrix
equation approach is also fairly general, but for each factorization a particular treatment
is needed. The combination of these two approaches gives a deep understanding
of these problems. Although first-order perturbation bounds are satisfactory for all
but the most delicate work, we also gave some rigorous perturbation bounds for the
Cholesky factorization.
In computing these factorizations, standard pivoting is often used to improve the
stability of the algorithms. We showed that the condition of these factorizations is
significantly improved by the standard pivoting strategies (except for the L factor in the
LU factorization), and provided firmly based theoretical explanations as to why this
is so. This extremely important information is very useful for designing more reliable
matrix algorithms.
In the future we hope to continue this research in several directions:
- To analyze the Cholesky, LU and QR factorizations, and Cholesky downdating
of general matrices, where perturbations have special structure, for example, by
assuming the perturbation has the form of the equivalent backward rounding
error from a numerically stable computation of the factorization (some results
for the Cholesky factorization have been given in this thesis). Such structure
leads to improved sensitivity results.
- To analyze other factorizations of general matrices for both general perturbations
and structured perturbations.
- To extend our approach to the factorizations of special matrices. In many applications
matrices and the resulting factorizations used to solve the problems
in a numerically stable way have some special structure. Applying the existing
general perturbation results to these special problems will result in overestimation
of the true sensitivity. Our new approach to such perturbation analyses
can make full use of the structure, so should lead to results which closely reflect
the true sensitivity of the problems.
References
[1] S. T. Alexander, C.-T. Pan and R. J. Plemmons. Analysis of a recursive least squares hyperbolic rotation algorithm for signal processing. Linear Algebra Appl., 98 (1988), pp. 3-40.
[2] A. Barrlund. Perturbation bounds for the LDL^H and the LU factorizations. BIT, 31 (1991), pp. 358-363.
[3] R. Bhatia. Matrix factorizations and their perturbations. Linear Algebra and Appl., 197-198 (1994), pp. 245-276.
[4] Å. Björck and G. H. Golub. Numerical methods for computing angles between linear subspaces. Math. Comp., 27 (1973), pp. 579-594.
[5] Å. Björck, H. Park and L. Eldén. Accurate downdating of least squares solutions. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 549-568.
[6] A. W. Bojanczyk, R. P. Brent, P. Van Dooren and F. R. de Hoog. A note on downdating the Cholesky factorization. SIAM J. Sci. Stat. Comput., 8 (1987), pp. 210-221.
[7] P. A. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Numer. Math., 7 (1965), pp. 269-276.
[8] X.-W. Chang. Perturbation analyses for the Cholesky factorization with backward rounding errors. Technical Report SOCS-96.3, School of Computer Science, McGill University, 1996.
[9] X.-W. Chang and C. C. Paige. A perturbation analysis for R in the QR factorization. Technical Report SOCS-95.7, School of Computer Science, McGill University, 1995.
[10] X.-W. Chang and C. C. Paige. Perturbation analyses for the Cholesky downdating problem. Submitted to SIAM J. Matrix Anal. Appl., April 1996. 15 pp.
[11] X.-W. Chang and C. C. Paige. Perturbation analyses for the QR factorization with bounds on changes in the elements of A. In preparation.
[12] X.-W. Chang and C. C. Paige. Perturbation analyses for the LU factorization. In preparation.
[13] X.-W. Chang, C. C. Paige and G. W. Stewart. New perturbation analyses for the Cholesky factorization. IMA J. Numer. Anal., 16 (1996), pp. 457-484.
[14] X.-W. Chang, C. C. Paige and G. W. Stewart. Perturbation analyses for the QR factorization. SIAM J. Matrix Anal. Appl., to appear. 18 pp.
[15] A. K. Cline, A. R. Conn and C. F. Van Loan. Generalizing the LINPACK condition estimator. Numerical Analysis, ed. J. P. Hennart. Lecture Notes in Mathematics, no. 909, Springer-Verlag, NY, 1982.
[16] A. K. Cline, C. B. Moler, G. W. Stewart and J. H. Wilkinson. An estimate for the condition number of a matrix. SIAM J. Numer. Anal., 16 (1979), pp. 368-375.
[17] J. W. Demmel. On floating point errors in Cholesky. Technical Report CS-89-87, Department of Computer Science, University of Tennessee, October 1989. 6 pp. LAPACK Working Note 14.
[18] J. D. Dixon. Estimating extremal eigenvalues and condition numbers of matrices. SIAM J. Numer. Anal., 20 (1983), pp. 812-814.
[19] J. J. Dongarra, J. R. Bunch, C. B. Moler and G. W. Stewart. LINPACK Users' Guide. SIAM, Philadelphia, 1979.
[20] Z. Drmač, M. Omladič and K. Veselić. On the perturbation of the Cholesky factorization. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1319-1332.
[21] L. Eldén and H. Park. Block downdating of least squares solutions. SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1018-1034.
[22] L. Eldén and H. Park. Perturbation analysis for block downdating of a Cholesky decomposition. Numerische Mathematik, 68 (1994), pp. 457-467.
[23] L. Eldén and H. Park. Perturbation and error analyses for block downdating of a Cholesky decomposition. BIT, 36 (1996), pp. 247-263.
[24] P. E. Gill, G. H. Golub, W. Murray and M. A. Saunders. Methods for modifying matrix factorizations. Math. Comp., 28 (1974), pp. 505-535.
[25] G. H. Golub and G. P. H. Styan. Numerical computations for univariate linear models. J. Stat. Comp. Simul., 2 (1973), pp. 253-272.
[26] G. H. Golub and C. F. Van Loan. Matrix Computations. Third Edition. The Johns Hopkins University Press, Baltimore, Maryland, 1996.
[27] N. J. Higham. A survey of condition number estimation for triangular matrices. SIAM Rev., 29 (1987), pp. 575-596.
[28] N. J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. Reliable Numerical Computation, ed. M. G. Cox & S. J. Hammarling. Oxford University Press, 1990, pp. 161-186.
[29] N. J. Higham. A survey of componentwise perturbation theory in numerical linear algebra. In Mathematics of Computation 1943-1993: A Half Century of Computational Mathematics, Walter Gautschi, editor, volume 48 of Proceedings of Symposia in Applied Mathematics, American Mathematical Society, 1994.
[30] N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1996.
[31] Y. P. Hong and C.-T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Mathematics of Computation, 58 (1992), pp. 213-232.
[32] W. Kahan. Numerical linear algebra. Can. Math. Bull., 9 (1966), pp. 757-801.
[33] J. Meinguet. Refined error analyses of Cholesky factorization. SIAM J. Numer. Anal., 20 (1983), pp. 1243-1250.
[34] C. C. Paige. Covariance matrix representation in linear filtering. In Linear Algebra and Its Role in Systems Theory, B. N. Datta (ed.), AMS, Providence, RI, 1985, pp. 309-321.
[35] C.-T. Pan. A perturbation analysis of the problem of downdating a Cholesky factorization. Linear Algebra Appl., 183 (1993), pp. 103-116.
[36] C.-T. Pan and R. Plemmons. Least squares modifications with inverse factorizations: parallel implications. J. Comp. Appl. Math., 27 (1989), pp. 109-127.
[37] M. A. Saunders. Large-scale linear programming using the Cholesky factorization. Technical Report CS252, Computer Science Department, Stanford University, 1972.
[38] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15 (1973), pp. 727-764.
[39] G. W. Stewart. Perturbation bounds for the QR factorization of a matrix. SIAM J. Numer. Anal., 14 (1977), pp. 509-518.
[40] G. W. Stewart. The effects of rounding error on an algorithm for downdating a Cholesky factorization. J. Inst. Math. Appl., 23 (1979), pp. 203-213.
[41] G. W. Stewart. On the perturbation of LU, Cholesky, and QR factorizations. SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1141-1145.
[42] G. W. Stewart. On sequential updates and downdates. IEEE Transactions on Signal Processing, 43 (1995), pp. 2642-2648.
[43] G. W. Stewart. The triangular matrices of Gaussian elimination and related decompositions. Technical Report CS-TR-3533 UMIACS-TR-95-91, Department of Computer Science, University of Maryland, 1995.
[44] G. W. Stewart. On the perturbation of LU and Cholesky factors. Technical Report CS-TR-3535 UMIACS-TR-95-93, Department of Computer Science, University of Maryland, 1995.
[45] G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, London, 1990.
[46] J. G. Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT, 31 (1991), pp. 341-352.
[47] J. G. Sun. Rounding-error and perturbation bounds for the Cholesky and LDL^T factorizations. Linear Algebra and Appl., 173 (1992), pp. 77-97.
[48] J. G. Sun. Componentwise perturbation bounds for some matrix decompositions. BIT, 32 (1992), pp. 702-714.
[49] J. G. Sun. On perturbation bounds for the QR factorization. Linear Algebra and Appl., 215 (1995), pp. 95-112.
[50] J.-G. Sun. Perturbation analysis of the Cholesky downdating and QR updating problems. SIAM J. Matrix Anal. Appl., 16 (1995), pp. 760-775.
[51] A. van der Sluis. Condition numbers and equilibration of matrices. Numerische Mathematik, 14 (1969), pp. 14-23.
[52] J. H. Wilkinson. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, NJ, 1963.
[53] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.
[54] J. H. Wilkinson. A priori error analysis of algebraic processes. In Proc. International Congress of Mathematicians, Moscow 1966, I. G. Petrovsky, ed., Mir Publishers, Moscow, 1968, pp. 629-640.
[55] H. Zha. A componentwise perturbation analysis of the QR decomposition. SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1124-1131.