Elements of the Theory of Generalized Inverses of Matrices


  • Modules and Monographs in Undergraduate Mathematics and its Applications Project (UMAP)

    ELEMENTS OF THE THEORY OF GENERALIZED INVERSES FOR MATRICES

    Randall E. Cline Mathematics Department University of Tennessee Knoxville, Tennessee 37916

    The Project acknowledges Robert M. Thrall, Chairman of the UMAP Monograph Editorial Board, for his help in the development and review of this monograph.


    The goal of UMAP is to develop, through a community of users and developers, a system of instructional modules and monographs in undergraduate mathematics which may be used to supplement existing courses and from which complete courses may eventually be built.

    The Project is guided by a National Steering Committee of mathematicians, scientists, and educators. UMAP is funded by a grant from the National Science Foundation to Education Development Center, Inc., a publicly supported, nonprofit corporation engaged in educational research in the U.S. and abroad.

    The Project acknowledges the help of the Monograph Editorial Board in the development and review of this monograph. Members of the Monograph Editorial Board include: Robert M. Thrall, Chairman, of Rice University; Clayton Aucoin, Clemson University; James C. Frauenthal, SUNY at Stony Brook; Helen Marcus-Roberts, Montclair State College; Ben Noble, University of Wisconsin; Paul C. Rosenbloom, Columbia University. Ex-officio members: Michael Anbar, SUNY at Buffalo; G. Robert Boynton, University of Iowa; Kenneth R. Rebman, California State University; Carroll O. Wilde, Naval Postgraduate School; Douglas A. Zahn, Florida State University.

    The Project wishes to thank Thomas N.E. Greville of the University of Wisconsin at Madison for his review of this manuscript and Edwina R. Michener, UMAP Editorial Consultant, for her assistance.

    Project administrative staff: Ross L. Finney, Director; Solomon Garfunkel, Associate Director/Consortium Coordinator; Felicia DeMay, Associate Director for Administration; Barbara Kelczewski, Coordinator for Materials Production.

    ISBN-13: 978-0-8176-3013-3 e-ISBN-13: 978-1-4684-6717-8 DOI: 10.1007/978-1-4684-6717-8

    Copyright © 1979 by Education Development Center, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

  • TABLE OF CONTENTS

    Preface . . . v

    1. INTRODUCTION . . . 1
    1.1 Preliminary Remarks . . . 1
    1.2 Matrix Notation and Terminology . . . 2
    1.3 A Rationale for Generalized Inverses . . . 4

    2. SYSTEMS OF EQUATIONS AND THE MOORE-PENROSE INVERSE OF A MATRIX . . . 9
    2.1 Zero, One or Many Solutions of Ax = b . . . 9
    2.2 Full Rank Factorizations and the Moore-Penrose Inverse . . . 12
    2.3 Some Geometric Illustrations . . . 21
    2.4 Miscellaneous Exercises . . . 27

    3. MORE ON MOORE-PENROSE INVERSES . . . 29
    3.1 Basic Properties of A+ . . . 29
    3.2 Applications with Matrices of Special Structure . . . 40
    3.3 Miscellaneous Exercises . . . 56

    4. DRAZIN INVERSES . . . 57
    4.1 The Drazin Inverse of a Square Matrix . . . 57
    4.2 An Extension to Rectangular Matrices . . . 64
    4.3 Expressions Relating Ad and A+ . . . 68
    4.4 Miscellaneous Exercises . . . 70

    5. OTHER GENERALIZED INVERSES . . . 71
    5.1 Inverses That Are Not Unique . . . 71

    APPENDIX 1: Hints for Certain Exercises . . . 79

    APPENDIX 2: Selected References . . . 85

    Index to Principal Definitions . . . 87

  • Preface

    The purpose of this monograph is to provide a concise introduction to the theory of generalized inverses of matrices that is accessible to undergraduate mathematics majors. Although results from this active area of research have appeared in a number of excellent graduate level textbooks since 1971, material for use at the undergraduate level remains fragmented. The basic ideas are so fundamental, however, that they can be used to unify various topics that an undergraduate has seen but perhaps not related.

    Material in this monograph was first assembled by the author as lecture notes for the senior seminar in mathematics at the University of Tennessee. In this seminar one meeting per week was for a lecture on the subject matter, and another meeting was to permit students to present solutions to exercises. Two major problems were encountered the first quarter the seminar was given. These were that some of the students had had only the required one-quarter course in matrix theory and were not sufficiently familiar with eigenvalues, eigenvectors and related concepts, and that many


  • of the exercises required fortitude. At the suggestion of the UMAP Editor, the approach in the present monograph is (1) to develop the material in terms of full rank factorizations and to relegate all discussions using eigenvalues and eigenvectors to exercises, and (2) to include an appendix of hints for exercises. In addition, it was suggested that the order of presentation be modified to provide some motivation for considering generalized inverses before developing the algebraic theory. This has been accomplished by introducing the Moore-Penrose inverse of a matrix and immediately exploring its use in characterizing particular solutions to systems of equations before establishing many of its algebraic properties.

    To prepare a monograph of limited length for use at the undergraduate level precludes giving extensive references to original sources. Most of the material can be found in texts such as Ben-Israel and Greville [2] or Rao and Mitra [11].

    Every career is always influenced by colleagues. The author wishes to express his appreciation particularly to T.N.E. Greville, L.D. Pyle and R.M. Thrall for continuing encouragement and availability for consultation.

    Randall E. Cline Knoxville, Tennessee September 1978


  • 1 Introduction

    1.1 Preliminary Remarks

    The material in this monograph requires a knowledge of basic matrix theory available in excellent textbooks such as Halmos [7], Noble [9] or Strang [12]. Fundamental definitions and concepts are used without the detailed discussion which would be included in a self-contained work. Therefore, it may be helpful to have a standard linear algebra textbook for reference if needed.

    Many examples and exercises are included to illustrate and complement the topics discussed in the text. It is recommended that every exercise be attempted. Although perhaps not always successful, the challenge of distinguishing among what can be assumed, what is known and what must be shown is an integral part of the development of the nebulous concept called mathematical maturity.


  • 1.2 Matrix Notation and Terminology

    Throughout subsequent sections capital Latin letters denote matrices and small Latin letters denote column vectors. Unless otherwise stated, all matrices (and thus vectors, being matrices having a single column) are assumed to have complex numbers as elements. Also, sizes of matrices are assumed to be arbitrary, subject to conformability in sums and products. For example, writing A + B tacitly assumes A and B have the same size, whereas AB implies that A is m by n and B is n by p for some m, n and p. (Note, however, that even with AB defined, BA is defined if and only if m = p.) The special symbols I and 0 are used to denote the n by n identity matrix and the m by n null matrix, respectively, with sizes determined by the context when no subscripts are used. If it is important to emphasize size we will write I_n or 0_mn.

    For any A = (a_ij), the conjugate transpose of A is written as A^H. Thus A^H = (ā_ji), where ā_ji denotes the conjugate of the complex scalar a_ji, and if x is a column vector with components x_1, ..., x_n, then x^H is the row vector [x̄_1, ..., x̄_n]. Consequently, for a real matrix (vector) the superscript "H" denotes transpose.

    Given vectors x and y, we write the inner product of x and y as

    (x,y) = Σ_{i=1}^n x_i ȳ_i.

    Since only Euclidean norms will be considered, we write ||x|| without a subscript to mean

    ||x|| = +√(x^H x) = +√(Σ_{i=1}^n |x_i|²).
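These conventions are easy to mirror numerically. The sketch below uses NumPy (not part of the original text); the matrix and vectors are illustrative choices, and the inner product conjugates the second argument, as in the definition above.

```python
import numpy as np

# Illustrative complex matrix and vectors (not from the text).
A = np.array([[1 + 1j, 2], [0, 3 - 2j]])
x = np.array([1j, 2, -2])
y = np.array([1, 1j, 0])

# Conjugate transpose A^H: transpose, then conjugate every entry.
AH = A.conj().T

# Inner product (x, y) = sum_i x_i * conj(y_i), and the Euclidean norm
# ||x|| = +sqrt(x^H x).
inner = np.dot(x, y.conj())
norm_x = np.sqrt(np.vdot(x, x).real)

print(AH)
print(inner, norm_x)   # norm_x is sqrt(1 + 4 + 4) = 3
```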

    To conclude this section it is noted that there are certain concepts in the previously cited textbooks which are


  • either used implicitly or discussed in a manner that does not emphasize their importance for present purposes. Although sometimes slightly redundant, the decision to include such topics was based upon the desire to stress fundamental understanding.

    Exercises

    1.1 Let x be any m-tuple and y be any n-tuple.

    a. Form xy^H and yx^H.

    b. Suppose m = n and that neither x nor y is the zero vector. Prove that xy^H is Hermitian if and only if y = ax for some real scalar a.

    1.2 Let A be any m by n matrix with rows w_1^H, ..., w_m^H and columns x_1, ..., x_n, and let B be any n by p matrix with rows y_1^H, ..., y_n^H and columns z_1, ..., z_p.

    a. Prove that the product AB can be written as

    AB = ((z_j, w_i)),

    the m by p matrix whose (i,j) entry is the inner product (z_j, w_i), and also as

    AB = Σ_{i=1}^n x_i y_i^H.

    b. Prove that A = 0 if and only if either A^H A = 0 or AA^H = 0.

    c. Show that BA^H A = CA^H A for any matrices A, B and C implies BA^H = CA^H.

    *1.3 Let A be any normal matrix with eigenvalues λ_1, ..., λ_n and orthonormal eigenvectors x_1, ..., x_n.

    *Exercises or portions of exercises designated by an asterisk assume a knowledge of eigenvalues and eigenvectors.


  • a. Show that A can be written as

    A = Σ_{i=1}^n λ_i x_i x_i^H.

    b. If E_i = x_i x_i^H, i = 1, ..., n, show that E_i is Hermitian and idempotent, and that E_i E_j = E_j E_i = 0 for all i ≠ j.

    c. Use the expression for A in 1.3a and the result of 1.3b to conclude that A is Hermitian if and only if all eigenvalues λ_i are real.

    1.3 A Rationale for Generalized Inverses

    Given a square matrix, A, the existence of a matrix, X, such that AX = I is but one of many equivalent necessary and sufficient conditions that A is nonsingular. (See Exercise 1.4.) In this case X = A^{-1} is the unique two-sided inverse of A, and x = A^{-1}b is the unique solution of the linear algebraic system of equations Ax = b for every right-hand side b. Loosely speaking, the theory of generalized inverses of matrices is concerned with extending the concept of an inverse of a square nonsingular matrix to singular matrices and, more generally, to rectangular matrices by considering various sets of equations which A and X may be required to satisfy. For this purpose we will use combinations of the following five fundamental equations:

    (1.1)  A^k X A = A^k,  for some positive integer k,
    (1.2)  X A X = X,
    (1.3)  (AX)^H = AX,
    (1.4)  (XA)^H = XA,
    (1.5)  A X = X A.

    (It should be noted that (1.1) with k > 1 and (1.5) implicitly assume A and X are square matrices, whereas (1.1) with k = 1, (1.2), (1.3), and (1.4) require only that X has the size of A^H. Also, observe that all of the equations clearly hold when A


  • is square and nonsingular, and X = A^{-1}.) Given A and subsets of Equations (1.1)-(1.5), it is logical to ask whether a solution X exists, is it unique, how can it be constructed and what properties does it have? These are the basic questions to be explored in subsequent chapters.
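These questions can also be probed numerically. As a sketch (NumPy is an illustrative tool, not part of the original text), `numpy.linalg.pinv` produces, for the singular matrix below, a matrix X satisfying (1.1) with k = 1 together with (1.2)-(1.4), even though no ordinary inverse exists:

```python
import numpy as np

# A singular 2-by-2 matrix: no ordinary inverse exists, yet a unique X
# satisfying (1.1) with k = 1, (1.2), (1.3) and (1.4) still does.
A = np.array([[1.0, 2.0], [2.0, 4.0]])
X = np.linalg.pinv(A)

print(np.allclose(A @ X @ A, A))              # (1.1) with k = 1: AXA = A
print(np.allclose(X @ A @ X, X))              # (1.2): XAX = X
print(np.allclose((A @ X).conj().T, A @ X))   # (1.3): AX Hermitian
print(np.allclose((X @ A).conj().T, X @ A))   # (1.4): XA Hermitian
```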

    In Chapter 2 we establish the existence and uniqueness of a particular generalized inverse of any matrix A (to be called the Moore-Penrose inverse of A), and show how this inverse can be used to characterize the minimal norm or least squares solutions to systems of equations Ax = b when A has full row rank or full column rank. This inverse is then further explored in Chapter 3, where many of its properties are derived and certain applications discussed. In Chapter 4 we consider another unique generalized inverse of square matrices A (called the Drazin inverse of A), and relate this inverse to Moore-Penrose inverses. The concluding chapter provides a brief introduction to the theory of generalized inverses that are not unique.

    Exercises

    1.4 For any A, let N(A) denote the null space of A, that is, N(A) = {z | Az = 0}.

    a. If A is an n by n matrix, show that the following conditions are equivalent:
    (i) A is nonsingular,
    (ii) N(A) contains only the null vector,
    (iii) rank(A) = n,
    (iv) A has a right inverse,
    (v) Ax = b has a unique solution for every right-hand side b.

    b. What other equivalent statements can be added to this list?

    1.5 Let

    A_4 = [1 2 1 1 ; 1 1 2 1 ; 1 1 1 2 ; 1 1 1 1]  and  X = [-1 -1 -1 4 ; 1 0 0 -1 ; 0 1 0 -1 ; 0 0 1 -1].


  • a. Show that X = A_4^{-1}.

    b. If A_5 is the five by five matrix obtained by extending A_4 in the obvious manner, that is, A_5 = (a_ij) where

    a_ij = 2 if i = j - 1, and a_ij = 1 otherwise,

    form A_5^{-1}. More generally, given A_n of this form for any n ≥ 2, what is A_n^{-1}?

    c. Prove that A_n is unimodular for all n ≥ 2, that is, |det A_n| = 1.

    d. Show that any system of equations A_n x = b with b integral has an integral solution x.

    1.6 Let x be any vector with ||x|| = 1 and let k be any real number.

    a. Show that A = I + k x x^H is nonsingular for all k ≠ -1.

    b. Given the forty by forty matrix A = (a_ij) with

    a_ij = 2 if i = j, and a_ij = 1 otherwise,

    construct A^{-1}.

    c. Show that A in 1.6a is an involution when k = -2.

    d. Show that A is idempotent when k = -1.

    *e. Show that A has one eigenvalue equal to 1 + k and all other eigenvalues equal to unity. (Hint: Consider x and any vector y orthogonal to x.)

    *f. Construct an orthonormal set of eigenvectors for A in 1.6b.

    1.7 Given the following pairs of matrices, show that A and X satisfy (1.1) with k = 1, (1.2), (1.3), and (1.4).

    a. A = [1 0 -1 ; -2 0 2],  X = (1/10)[1 -2 ; 0 0 ; -1 2];

    b., c. [the remaining two pairs are illegible in this transcript].

    1.8 Show that the matrices A and X displayed here [illegible in this transcript] satisfy (1.1) with k = 2, (1.2) and (1.5).


  • 2 Systems of Equations and the Moore-Penrose Inverse of a Matrix

    2.1 Zero, One or Many Solutions of Ax = b

    Given a linear algebraic system of m equations in n unknowns written as Ax = b, a standard method to determine the number of solutions is to first reduce the augmented matrix [A,b] to row echelon form. The number of solutions is then characterized by relations among the number of unknowns, rank(A) and rank([A,b]). In particular, Ax = b is a consistent system of equations, that is, there exists at least one solution, if and only if rank(A) = rank([A,b]). Moreover, a consistent system of equations Ax = b has a unique solution if and only if rank(A) = n. On the other hand, Ax = b has no exact solution when rank(A) < rank([A,b]). It is the purpose of this chapter to show how the Moore-Penrose inverse of A can be used to distinguish among these three cases and to provide alternative forms of representations which are frequently employed in each case.

    For any matrix, A, let CS(A) denote the column space of A, that is,


  • CS(A) = {y | y = Ax, for some vector x}.

    Then Ax = b a consistent system of equations implies b ∈ CS(A), and conversely (which is simply another way of saying that A and [A,b] have the same rank). Now by definition,

    rank(A) = dimension(CS(A)),

    and if

    N(A) = {z | Az = 0}

    denotes the null space of A (cf. Exercise 1.4), then we have the well-known relation that

    rank(A) + dimension(N(A)) = n.

    Given a consistent system of equations Ax = b with A m by n of rank r, it follows, therefore, that if r = n, then A has full column rank and x = A^L b is the unique solution, where A^L is any left inverse of A. The problem in this case is thus to construct A^L b.

    However, when r < n, so that N(A) consists of more than only the zero vector, then for any solution, x_1, of Ax = b, any vector z ∈ N(A) and any scalar α, x_2 = x_1 + αz is also a solution of Ax = b. Conversely, if x_1 and x_2 are any pair of solutions of Ax = b, and if z = x_1 - x_2, then Az = Ax_1 - Ax_2 = b - b = 0, so that z ∈ N(A). Hence all solutions to Ax = b in this case can be written as

    (2.1)    x = x_1 + Σ_{i=1}^{n-r} α_i z_i,

    where x_1 is any particular solution, z_1, ..., z_{n-r} are any set of vectors which form a basis of N(A), and α_1, ..., α_{n-r} are arbitrary scalars.
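Representation (2.1) can be checked numerically. In the sketch below (an illustrative system; NumPy, not part of the original text), a particular solution comes from `lstsq` and a basis of N(A) from the singular value decomposition:

```python
import numpy as np

# A consistent system of 2 equations in 3 unknowns with rank r = 2,
# chosen only for illustration.
A = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
b = np.array([1.0, 2.0])

# One particular solution x1 of Ax = b.
x1, *_ = np.linalg.lstsq(A, b, rcond=None)

# A basis z_1, ..., z_{n-r} of N(A): the last n - r right singular vectors.
r = np.linalg.matrix_rank(A)
_, _, Vt = np.linalg.svd(A)
Z = Vt[r:].T                       # here a single basis vector z_1

# Every x = x1 + sum_i alpha_i z_i again solves Ax = b.
alpha = np.array([3.7])
x = x1 + Z @ alpha
print(np.allclose(A @ x, b))
```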

    Often the problem now is simply to characterize all solutions. More frequently, it is to determine those solutions which satisfy one or more additional conditions as, for example, in linear programming, where we wish to construct a nonnegative solution of Ax = b which also maximizes (c,x), where c is some given vector and A, b and c have real elements.


  • Given an inconsistent system of equations Ax = b, that is, where rank(A) < rank([A,b]) so that there is no exact solution, a frequently used procedure is to construct a vector x̂, say, which is a "best approximate" solution by some criterion. Perhaps the most generally used criterion is that of least squares, in which it is required to determine x̂ to minimize ||Ax - b|| or, equivalently, to minimize ||Ax - b||². In this case, if A has full column rank, then x̂ = (A^H A)^{-1} A^H b is the least squares solution (see Exercise 2.7).
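The normal-equations formula can be checked against a numerical least squares routine; the data below are an illustrative choice (NumPy, not from the text):

```python
import numpy as np

# An inconsistent system with full column rank (rank(A) = 2 < rank([A,b])).
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([0.0, 1.0, 1.0])

# xhat = (A^H A)^{-1} A^H b, via the normal equations.
xhat = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with numpy's built-in least squares solver.
xls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(xhat, np.allclose(xhat, xls))
```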

    Exercises

    2.1 Given the following matrices, A_i, and vectors, b_i, determine which of the sets of equations A_i x = b_i have a unique solution, infinitely many solutions or no exact solution, and construct the unique solutions when they exist.

    [the six displayed pairs (A_1, b_1), ..., (A_6, b_6) are illegible in this transcript]

    2.2 For any partitioned matrix A = [B,R] with B nonsingular, prove that the columns of the matrix

    [-B^{-1}R ; I]

    form a basis of N(A).

    2.3 Construct a basis for N(A_6) in Exercise 2.1.


  • 2.4 Apply the Gram-Schmidt process to the basis in Exercise 2.3 to construct an orthonormal basis of N(A_6).

    2.5 Show that if z_1 and z_2 are any vectors which form an orthonormal basis of N(A_6) and if all solutions of A_6 x = b_6 are written as

    x = x_1 + α_1 z_1 + α_2 z_2,

    where x_1 is the particular solution displayed in the original [illegible in this transcript], then every solution satisfies ||x||² ≥ 25/24.

    2.6 Show that A, A^H A and AA^H have the same rank for every matrix A.

    2.7 a. Given any system of equations Ax = b with A m by n and rank(A) = n, show by use of calculus that the least squares solution, x̂, has the form x̂ = (A^H A)^{-1} A^H b. What happens when m = n?

    b. Construct the least squares solution of Ax = b if A and b are as displayed in the original [illegible in this transcript].

    2.2 Full Rank Factorizations and the Moore-Penrose Inverse

    Given any matrix A (not necessarily square), it follows at once that if X is any matrix such that A and X satisfy (1.1) with k = 1, (1.2), (1.3) and (1.4), then X is unique. For if

    (2.2)    AXA = A,  XAX = X,  (AX)^H = AX,  (XA)^H = XA,

    and if A and Y also satisfy these equations, then

    X = XAX = X(AX)^H = XX^H A^H = XX^H(AYA)^H = XX^H A^H(AY)^H = XAY
      = (XA)^H Y = A^H X^H Y = (AYA)^H X^H Y = (YA)^H A^H X^H Y = YAXAY = YAY = Y.

    Now if A has full row rank, then with X any right inverse of A, AX = I is Hermitian and the first two equations in (2.2) hold. Dually, if A has full column rank and X is any left inverse of


  • A, XA = I is Hermitian and again the first two equations in (2.2) hold. As shown in the following lemma, there is a choice of X in both cases so that all four conditions hold.

    LEMMA 1: Let A be any matrix with full row rank or full column rank. If A has full row rank, then X = A^H(AA^H)^{-1} is the unique right inverse of A with XA Hermitian. If A has full column rank, then X = (A^H A)^{-1}A^H is the unique left inverse of A with AX Hermitian.

    Proof: If A is any matrix with full row rank, AA^H is nonsingular by Exercise 2.6. Now X = A^H(AA^H)^{-1} is a right inverse of A, and

    (XA)^H = (A^H(AA^H)^{-1}A)^H = A^H(AA^H)^{-1}A = XA.

    Thus A and X satisfy the four equations in (2.2), and X is unique.

    The dual relationship when A has full column rank follows in an analogous manner with A^H A nonsingular.

    It should be noted that X = A^{-1} in (2.2) when A is square and nonsingular, and that both forms for X in Lemma 1 reduce to A^{-1} in this case. More generally, we will see in Theorem 4 that the unique X in (2.2) exists for every matrix A. Such an X is called the Moore-Penrose inverse of A and is written A+. Thus we have from Lemma 1 the special cases:

    (2.3)    A+ = A^H(AA^H)^{-1}, if A has full row rank;
             A+ = (A^H A)^{-1}A^H, if A has full column rank.

    Example 2.1

    If

    A = [1 0 -1 ; 0 1 -1],

    then A has full row rank,

    (AA^H)^{-1} = (1/3)[2 -1 ; -1 2],

    and so

    A+ = A^H(AA^H)^{-1} = (1/3)[2 -1 ; -1 2 ; -1 -1].
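Both formulas in (2.3) are easy to check numerically. A sketch (NumPy, not part of the original text; the matrices are illustrative choices):

```python
import numpy as np

# Full row rank A: X = A^H (A A^H)^{-1} is a right inverse with XA Hermitian.
A = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
X = A.T @ np.linalg.inv(A @ A.T)
print(np.allclose(A @ X, np.eye(2)))       # right inverse
print(np.allclose((X @ A).T, X @ A))       # XA Hermitian

# Full column rank B: Y = (B^H B)^{-1} B^H is a left inverse with BY Hermitian.
B = A.T
Y = np.linalg.inv(B.T @ B) @ B.T
print(np.allclose(Y @ B, np.eye(2)))       # left inverse
print(np.allclose((B @ Y).T, B @ Y))       # BY Hermitian
```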

  • Example 2.2

    If y is the column vector

    y = [1, 2+i, 0, -3i]^H,

    then

    y+ = (1/15)[1, 2-i, 0, 3i].

    Note in general that the Moore-Penrose inverse of any nonzero row or column vector is simply the conjugate transpose of the vector multiplied by the reciprocal of the square of its length.
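This rule for vectors can be confirmed against a general-purpose routine; the complex vector below is an illustrative choice (NumPy, not from the text):

```python
import numpy as np

# For a nonzero column vector y, y+ = y^H / ||y||^2.
y = np.array([1, 2 + 1j, 0, -3j])
y_plus = y.conj() / np.vdot(y, y).real     # y^H / ||y||^2, here ||y||^2 = 15

# numpy's pinv of the 4-by-1 matrix y gives the same 1-by-4 result.
print(np.allclose(np.linalg.pinv(y.reshape(-1, 1)), y_plus))
```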

    Let us next consider the geometry of solving systems of equations in terms of A+ for the special cases in (2.3). Given any system of equations Ax = b and any matrix, X, such that AXA = A, it follows at once that the system is consistent if and only if

    (2.4)    AXb = b.

    For if (2.4) holds, then x = Xb is a solution. Conversely, if Ax = b is consistent, multiplying each side on the left by AX gives Ax = AXAx = AXb, so that (2.4) follows. Suppose now that Ax = b is a system of m equations in n unknowns where A has full row rank. Then Ax = b is always consistent since (2.4) holds with X any right inverse of A, and we have from (2.1) that all solutions can be written as

    (2.5)    x = Xb + Zy

    with Z any matrix whose n - m columns form a basis of N(A) and y an arbitrary vector. Taking X to be the right inverse A+ in this case gives Theorem 2.

    THEOREM 2: For any system of equations Ax = b where A has full row rank, x = A+b is the unique solution with ||x||² minimal.


  • Proof: With X = A+ in (2.5),

    (x,x) = (A+b + Zy, A+b + Zy) = (A+b, A+b) + (Zy, Zy) = ||A+b||² + ||Zy||²,

    since (A+b, Zy) = 0 (the columns of Z lie in N(A), while A+b = A^H(AA^H)^{-1}b lies in the column space of A^H). Thus

    ||x||² ≥ ||A+b||²,

    where equality holds if and only if Zy = 0.

    Example 2.3

    If A is the matrix in Example 2.1 and b = [-5, 4]^H, then

    x = A+b = (1/3)[-14, 13, 1]^H

    is the minimal norm solution of Ax = b with ||x||² = 122/3.
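Theorem 2's minimal norm property can be verified numerically; the full row rank system below is an illustrative choice (NumPy, not part of the original text):

```python
import numpy as np

# A full row rank system: every b is attainable, and A+ b picks out the
# solution of minimal Euclidean norm.
A = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
b = np.array([-5.0, 4.0])

x = np.linalg.pinv(A) @ b
print(np.allclose(A @ x, b))       # x solves the system
print(3 * x)                       # the exact solution is (1/3)[-14, 13, 1]
print(x @ x)                       # ||x||^2 = 122/3

# Any other solution x + z with z in N(A) is longer, since (x, z) = 0.
z = np.array([1.0, 1.0, 1.0])      # A z = 0
print(np.dot(x, z), (x + z) @ (x + z) > x @ x)
```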

    It was noted in Section 2.1 that the least squares solution of an inconsistent system of equations Ax = b when A has full column rank is x̂ = (A^H A)^{-1}A^H b. From (2.3) we have, therefore, that x̂ = A+b is the least squares solution in this case. Although this result can be established by use of calculus (Exercise 2.7), the following derivation in terms of norms is more direct.

    THEOREM 3: For any system of equations Ax = b where A has full column rank, x = A+b is the unique vector with ||b - Ax||² minimal.

    Proof: If A is square or if m > n and Ax = b is consistent, then with A+ = (A^H A)^{-1}A^H a left inverse of A and AA+b = b, the vector x = A+b is the unique solution with ||b - Ax||² = 0. On the other hand, if m > n and Ax = b is inconsistent,

    ||b - Ax||² = ||(I - AA+)b - A(x - A+b)||² = ||b - AA+b||² + ||A(x - A+b)||²

    since A^H(I - AA+) = 0. Hence ||b - Ax||² ≥ ||b - AA+b||², where equality holds if and only if ||A(x - A+b)||² = 0. But A with full column rank implies ||Ay||² > 0 for any vector y ≠ 0, in particular for y = x - A+b.

  • Example 2.4

    If

    A = [2 -1 ; 1 1 ; -1 0]  and  b = [1, 2, 3]^H,

    then

    A+ = (1/11)[3 3 -2 ; -4 7 -1],

    AA+b = (1/11)[-1, 10, -3]^H,

    and x = A+b = (1/11)[3, 7]^H is the least squares solution of Ax = b with ||b - Ax||² = 144/11.
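Theorem 3 is equally easy to check numerically; the inconsistent full column rank system below is an illustrative choice (NumPy, not part of the original text):

```python
import numpy as np

# An inconsistent full column rank system (3 equations, 2 unknowns).
A = np.array([[2.0, -1.0], [1.0, 1.0], [-1.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.pinv(A) @ b
r = b - A @ x
print(11 * x)                      # the exact least squares solution is (1/11)[3, 7]
print(11 * (r @ r))                # ||b - Ax||^2 = 144/11, so this prints 144
print(np.allclose(A.T @ r, 0))     # the residual is orthogonal to CS(A)
```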

    Having established A+ for the special cases in Lemma 1, it remains to establish existence for the general case of an arbitrary matrix A. For this purpose we first require a definition.

    DEFINITION 1: Any product EFG with E m by r, F r by r and G r by n is called a full rank factorization if each of the matrices E, F and G has rank r.

    The importance of Definition 1 is that any nonnull matrix can be expressed in terms of full rank factorizations, and that the Moore-Penrose inverse of such a product is the product of the corresponding inverses in reverse order.

    To construct a full rank factorization of a nonnull matrix, let A be any m by n matrix with rank r. Designate the columns of A as a_1, ..., a_n. Then A with rank r implies that there exists at least one set of r columns of A which are linearly independent. Let J = {j_1, ..., j_r} be any set of indices for which a_{j_1}, ..., a_{j_r} are linearly independent, and let E be the m by r matrix

    E = [a_{j_1}, ..., a_{j_r}].

    If r = n, then A = E is a trivial full rank factorization (with F = G = I). Suppose r < n. Then for every column a_j, j ∉ J, there is a column vector y_j, say, such that a_j = Ey_j. Now form the r by n matrix, G, with columns g_1, ..., g_n as follows: Let

    g_{j_i} = e_i, i = 1, ..., r,  and  g_j = y_j if j ∉ J,

    where e_i, i = 1, ..., r, denote unit vectors. For this matrix G we then have A = EG. Moreover, since the columns e_1, ..., e_r of G form an r by r identity submatrix, rank(G) = r, and with rank(E) = r by construction, A = EG is a full rank factorization (with F = I).
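The column selection construction just described can be sketched as a small routine. This is an illustrative implementation, not from the text: NumPy's `lstsq` is used to test whether a column depends on those already chosen, and the numerical tolerance is an implementation choice.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-10):
    """A = E G with E of full column rank r and G of full row rank r,
    following the column selection construction in the text (F = I)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    basis = []        # indices j_1, ..., j_r of independent columns
    coeffs = {}       # j -> y_j with a_j = E y_j for the dependent columns
    for j in range(n):
        if not basis:
            if np.linalg.norm(A[:, j]) < tol:
                coeffs[j] = np.zeros(0)
            else:
                basis.append(j)
            continue
        E = A[:, basis]
        y, *_ = np.linalg.lstsq(E, A[:, j], rcond=None)
        if np.linalg.norm(E @ y - A[:, j]) < tol:
            coeffs[j] = y             # a_j depends on the chosen columns
        else:
            basis.append(j)           # a_j is a new independent column
    r = len(basis)
    E = A[:, basis]
    G = np.zeros((r, n))
    for i, j in enumerate(basis):
        G[i, j] = 1.0                 # g_{j_i} = e_i
    for j, y in coeffs.items():
        G[:len(y), j] = y             # g_j = y_j (padded with zeros)
    return E, G

# A 2-by-4 matrix of rank 2, chosen for illustration.
A = np.array([[1.0, 2.0, 0.0, 3.0],
              [2.0, 4.0, 1.0, 7.0]])
E, G = full_rank_factorization(A)
print(E.shape, G.shape, np.allclose(E @ G, A))
```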

    That a full rank factorization A = EFG is not unique is apparent by observing that if M and N are any nonsingular matrices, then A = (EM)(M^{-1}FN)(N^{-1}G) is also a full rank factorization. The following example illustrates four full rank factorizations of a given matrix, A, where F ≠ I in each case.

    Example 2.5

    [The displayed matrix A and its four full rank factorizations A = EFG, each with F ≠ I, are illegible in this transcript.]

    Using full rank factorization, the existence of the Moore-Penrose inverse of any matrix follows at once. The following theorem, stated in the form rediscovered by Penrose [10] but originally established by Moore [8], is fundamental to the theory of generalized inverses of matrices.

    THEOREM 4: For any matrix, A, the four equations

    AXA = A,  XAX = X,  (AX)^H = AX,  (XA)^H = XA

    have a unique solution X = A+. If A = 0_mn is the m by n null matrix, A+ = 0_nm. If A is not the null matrix, then for any full rank factorization EFG of A, A+ = G+ F^{-1} E+.

    Proof: Uniqueness in every case follows from the remarks after (2.2).

    If A = 0_mn, then XAX = X implies X = A+ = 0_nm. If A is not the null matrix, then for any full rank factorization A = EFG it follows by definition that E has full column rank, F is nonsingular and G has full row rank. Thus E+ = (E^H E)^{-1}E^H and G+ = G^H(GG^H)^{-1}, by (2.3), with E+ a left inverse of E and G+ a right inverse of G. Then if X = G+ F^{-1} E+, XA = G+G and AX = EE+ are Hermitian, by Lemma 1. Moreover, AXA = A and XAX = X, so that X = A+.

    It should be noted that although the existence of a full rank factorization A = EG has been established for any nonnull matrix A, this does not provide a systematic computational procedure for constructing a factorization. Such a procedure will be developed in Exercise 3.3, however, after we have considered the relationship between A+ and the Moore-Penrose inverse of matrices obtained by permuting rows or columns or both rows and columns of A. Observe, moreover, that if Ax = b is any system of equations with A = EG a full rank factorization, and if y = Gx, then ŷ = E+b is the least squares solution to Ey = b, by Theorem 3. Now the system of equations


  • Gx = E+b is always consistent and has minimal norm solution x̂ = G+E+b, by Theorem 2. Consequently we can combine the results of Theorems 2 and 3 by saying that x = A+b is the least squares solution of Ax = b with ||x||² minimal. Although of mathematical interest (see, for example, Exercises 3.12 and 3.13), most practical applications of least squares require that problems be formulated in such a way that the matrix A has full column rank.
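Theorem 4's factorized formula is simple to verify numerically. In the sketch below (NumPy, not part of the original text) the factorization has F = I, so A+ = G+ E+; the matrices are illustrative choices:

```python
import numpy as np

# Moore-Penrose inverse via a full rank factorization A = E G (so F = I):
# A+ = G+ E+ with E+ = (E^H E)^{-1} E^H and G+ = G^H (G G^H)^{-1}.
E = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]])           # full column rank
G = np.array([[1.0, 2.0, 0.0, 3.0], [0.0, 0.0, 1.0, 1.0]])   # full row rank
A = E @ G

E_plus = np.linalg.inv(E.T @ E) @ E.T
G_plus = G.T @ np.linalg.inv(G @ G.T)
A_plus = G_plus @ E_plus

print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches numpy's A+
```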

    Exercises

    2.8 Show that x_1 ... [the remainder of this exercise, and Exercise 2.9, are illegible in this transcript]

    2.10 Let u be the column vector with n elements each equal to unity. Show that

    [I,u]+ = (1/(n+1)) [(n+1)I - uu^H ; u^H],

    that is, the n+1 by n matrix whose first n rows are (n+1)I - uu^H and whose last row is u^H.

    2.11 a. Given any real numbers b_1, ..., b_n, show that all solutions to the equations

    x_i + x_{n+1} = b_i,  i = 1, ..., n,

    can be written as

    x_i = b_i - (1/(n+1)) Σ_{j=1}^n b_j + α,  i = 1, ..., n,

    and

    x_{n+1} = (1/(n+1)) Σ_{j=1}^n b_j - α,

    where α is arbitrary.

    b. For what choice of α can we impose the additional condition that Σ_{i=1}^n x_i = 0?

    c. Show that when the condition in 2.11b is imposed, x_{n+1} becomes simply the mean of b_1, ..., b_n, that is,

    x_{n+1} = (1/n) Σ_{i=1}^n b_i.

    d. Show that the problem of solving the equations in 2.11a, subject to the condition in 2.11b, can be formulated equivalently as a system of equations Ax = b with the n+1 by n+1 matrix A Hermitian and nonsingular.

    2.12 Given any real numbers b_1, ..., b_n, show that their mean is the least squares solution to the equations

    x = b_i,  i = 1, ..., n.

    2.13 If Ax = b is any system of equations with A = uv^H a matrix of rank one, show that

    x = (u^H b / (||u||² ||v||²)) v

    is the least squares solution with minimal norm.

    2.14 Let Ax = b be any consistent system of equations and let z_1, ..., z_{n-r} be any set of vectors which form an orthonormal basis of N(A), where rank(A) = r. Show that if x is any solution of Ax = b, then

    A+b = x - Σ_{i=1}^{n-r} α_i z_i,  with α_i = (x, z_i), i = 1, ..., n-r.

    2.15 (Continuation): Let A be any m by n matrix with full row rank, and let Z be any n by n-m matrix whose columns form an orthonormal basis of N(A). Prove that if X is any right inverse of A, then

    A+ = (I - ZZ^H)X.

    2.16 Use the results of Exercises 2.4 and 2.15 to construct A_6^+ starting with the right inverse

    X = [2 -3 ; -1 2 ; 0 0 ; 0 0].


  • 2.3 Some Geometric Illustrations

    In this section we illustrate the geometry of Theorems 2 and 3 with some diagrams:

    Consider a single equation in three real variables of the form

    (2.6)    a_11 x_1 + a_12 x_2 + a_13 x_3 = b_1.

    Then it is well known that the set of all vectors x = [x_1, x_2, x_3]^H which satisfy (2.6) is a plane P_1(b_1), as shown in Figure 1. Now the plane P_1(0) is

    Figure 1. The plane P_1(b_1) and solution x̂.

    parallel to P_1(b_1), and consists of all solutions z = [z_1, z_2, z_3]^H to the homogeneous equation

    (2.7)    a_11 z_1 + a_12 z_2 + a_13 z_3 = 0.

    Then if b_1 ≠ 0, all solutions x ∈ P_1(b_1) can be written as x = x̂ + z for some z ∈ P_1(0), and conversely, as shown in Figure 2. (Clearly, this is the geometric interpretation of (2.1) for a single equation in three unknowns with two vectors required to span P_1(0).) If we now let a_1 = [a_11, a_12, a_13]^H, so that (2.6) can be written as a_1^H x = b_1, Theorem 2 implies


  • Figure 2. The representation x = x̂ + z, z ∈ P_1(0).

    that the solution of the form x̂ = a_1^{H+} b_1 is the point on P_1(b_1) with minimal distance from the origin. Also, since the vector x̂ is perpendicular to the planes P_1(0) and P_1(b_1), ||x̂|| is the distance between P_1(0) and P_1(b_1). The representation of any solution x as x = a_1^{H+} b_1 + αz, corresponding to (2.1) in this case, is illustrated in Figure 3.

    Figure 3. The representation x = a_1^{H+} b_1 + αz.


  • Suppose next that we consider (2.6) and a second equation

    (2.8)    a_21 x_1 + a_22 x_2 + a_23 x_3 = b_2.

    Let the plane of solutions of (2.8) be designated as P_2(b_2). Then it follows that either the planes P_1(b_1) and P_2(b_2) coincide, or they are parallel and distinct, or they intersect in a straight line. In the first case, when P_1(b_1) and P_2(b_2) coincide, the equation in (2.8) is a multiple of (2.6) and any point satisfying one equation also satisfies the other. On the other hand, when P_1(b_1) and P_2(b_2) are parallel and distinct, there is no exact solution. Finally, when P_1(b_1) and P_2(b_2) intersect in a straight line ℓ_12, say, that is, ℓ_12 = P_1(b_1) ∩ P_2(b_2), then any point on ℓ_12 satisfies both (2.6) and (2.8). Observe, moreover, that with

    A = [a_1^H ; a_2^H]  and  b = [b_1, b_2]^H,

    the point on ℓ_12 with minimal distance from the origin is x̂ = A+b. This last case is illustrated in Figure 4, where ℓ_12 is a "translation" of the subspace N(A) of the form P_1(0) ∩ P_2(0).

    -23-

    The extension to three or more equations is now obvious: Given a third equation

    (2.9)    a31 x1 + a32 x2 + a33 x3 = b3,

    let P3(b3) be the associated plane of solutions. Then, assuming the planes P1(b1) and P2(b2) neither coincide nor are parallel and distinct, that is, they intersect in the line l12 as shown in Figure 4, the existence of a vector x^H = (x1, x2, x3) satisfying (2.6), (2.8) and (2.9) is determined by the conditions that either P3(b3) contains the line l12, or P3(b3) and l12 are parallel and distinct, or P3(b3) and l12 intersect in a single point. (The reader is urged to construct figures to illustrate these cases. An illustration of three different planes containing the same line may also be found in Figure 5.) For m ≥ 4 equations, similar considerations as to the intersections of planes Pk(bk) and lines lij = Pi(bi) ∩ Pj(bj) again hold, but diagrams become exceedingly difficult to visualize.

    For any system of equations Ax = b, let y = AA+b; that is, y is the perpendicular projection of b onto CS(A), the column space of A. Then it follows from (2.4) that Ax = y is always a consistent system of equations, and from Theorem 2 that A+y = A+(AA+)b = A+b is its minimal norm solution. Moreover, we have from Theorem 3 that if Ax = b is inconsistent, then x̂ = A+b is a vector for which

        ||Ax̂ − b||² = ||y − b||²

    is minimal. Thus, the minimal norm solution A+y of Ax = y also minimizes ||Ax − b||².

    Consider an inconsistent system of, say, three equations in three unknowns, Ax = b, and suppose rank(A) = 2. Let y = AA+b have components y1, y2, y3, and let a1^H, a2^H, a3^H designate the rows of A. Now if Pi(yi) is the plane of all solutions of ai^H x = yi, i = 1, 2, 3, then the set of all solutions of Ax = y is the line l12 = P1(y1) ∩ P2(y2), as shown in Figure 5a; A+y = A+b is the point on l12 of minimal norm, and ||y − b||² is minimal, as shown in Figure 5b.
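    These two minimizations are easy to check numerically. The sketch below is purely illustrative and assumes the numpy library (not part of this monograph): it forms y = AA+b for a small inconsistent system and confirms that Ax = y is consistent and that the residual b − y is perpendicular to CS(A).

```python
import numpy as np

# A small inconsistent system: rank(A) = 2, b outside CS(A).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 1.0, 3.0])

A_plus = np.linalg.pinv(A)
y = A @ A_plus @ b        # perpendicular projection of b onto CS(A)
x_hat = A_plus @ b        # A+b = A+y, the minimal norm least squares solution

print(np.allclose(A @ x_hat, y))          # Ax = y is consistent
print(np.allclose(A.T @ (b - y), 0.0))    # b - y is orthogonal to CS(A)
```

    Here x̂ = A+b plays the role of the point of minimal norm on the solution set of Ax = y.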


    [Figure 5. (a) Solutions of Ax = y where y = AA+b. (b) The vectors b, y and (I − AA+)b.]

    To conclude this section we remark that since

        b = AA+b + (I − AA+)b

    is an orthogonal decomposition of any vector b, with

        ||b||² = ||AA+b||² + ||(I − AA+)b||²,

    the ratio

    (2.10)    ρ = ||AA+b||² / ||(I − AA+)b||² ≥ 0

    provides a measure of inconsistency of the system Ax = b. In particular, ρ = 0 implies b is orthogonal to CS(A), whereas large values of ρ imply that b is nearly contained in CS(A), that is, ||(I − AA+)b||² is relatively small. (For statistical applications [1], [4], [5], the values ||b||², ||AA+b||² and ||(I − AA+)b||² are frequently referred to as TSS [total sum of squares], SSR [sum of squares due to regression] and SSE [sum of squares due to error], respectively. Under certain general assumptions, particular multiples of ρ can be shown to have distributions which can be used in tests of significance.)

    Although the statistical theory of linear regression models is not germane to the present considerations, formation of ρ in (2.10) can provide insight into the inconsistency of a system of equations Ax = b. (See Exercise 2.18.)
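    The orthogonal decomposition above, and the identity TSS = SSR + SSE that it implies, can be verified numerically. The following sketch assumes numpy and is not part of the original text.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, -1.0, 2.0])

P = A @ np.linalg.pinv(A)          # perpendicular projection onto CS(A)
TSS = b @ b                        # ||b||^2
SSR = b @ P @ b                    # ||AA+ b||^2 (P is Hermitian idempotent)
SSE = b @ (np.eye(3) - P) @ b      # ||(I - AA+) b||^2

print(np.isclose(TSS, SSR + SSE))  # the orthogonal decomposition of ||b||^2
```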

    Exercises

    2.17 Use the techniques of solid analytic geometry to prove that the lines l12 = P1(b1) ∩ P2(b2), b1 ≠ 0 and b2 ≠ 0, and l̂12 = P1(0) ∩ P2(0) are parallel. In addition, show by similar methods that if

        Pi(bi) = {x : ai^H x = bi,  ai^H = (ai1, ai2, ai3)},

    then

    2.18 Given any points (xi, yi), i = 0, 1, ..., n, in the (x, y) plane with x0, ..., xn distinct, it is well known that there is a unique interpolating polynomial Pn(x) of degree ≤ n (that is, Pn(xi) = yi for all i = 0, ..., n), and if

        Pn(x) = a0 + a1 x + ... + an x^n,

    then a0, ..., an can be determined by solving the system of equations Aa = y, where

        A = [1  x0  x0²  ...  x0^n]      a = [a0]      y = [y0]
            [1  x1  x1²  ...  x1^n]          [a1]          [y1]
            [.   .    .         . ]          [. ]          [. ]
            [1  xn  xn²  ...  xn^n],         [an],         [yn].

    Now any matrix, A, with this form is called a Vandermonde matrix, and it can be shown that

        det(A) = ∏_{i<j} (xj − xi).

    Thus, with x0, ..., xn distinct, A is nonsingular, and if Ak denotes the submatrix consisting of the first k columns of A, k = 1, 2, ..., n, then Ak has full column rank for every k.

    For k ≤ n, the least squares polynomial approximation of degree k to the points (xi, yi), i = 0, ..., n, is defined to be that polynomial

        Pk(x) = a0 + a1 x + ... + ak x^k

    which minimizes

        Σ_{i=0}^{n} [yi − Pk(xi)]².

    a. Show that the coefficients a0, ..., ak of the least squares polynomial approximation of degree k are elements of the vector a, where

        a = A_{k+1}+ y.

    b. Show that with TSS = Σ_{i=0}^{n} yi², then SSR = ||A_{k+1} A_{k+1}+ y||² and SSE = TSS − SSR.

    c. Given the data

        xi:  −1   0   1   2
        yi:  −5  −4  −3  10

    construct the best linear, quadratic and cubic least squares approximations. For each case determine SSR and SSE. What conclusions can you draw from the data available?
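    A short numerical sketch of the construction in Exercise 2.18 (numpy assumed; the sample points below are hypothetical and are not the data of part c):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 1.0, 0.5, 2.0, 3.5, 9.0])

n = len(x) - 1
A = np.vander(x, n + 1, increasing=True)   # Vandermonde matrix: columns 1, x, x^2, ...

k = 2                                      # degree of the least squares approximation
a = np.linalg.pinv(A[:, :k + 1]) @ y       # coefficients a = A_{k+1}+ y
residual = y - A[:, :k + 1] @ a
print(a)
```

    The residual is orthogonal to the columns of A_{k+1}, which is exactly the least squares property of part a.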

    2.4 Miscellaneous Exercises

    2.19 Let A, Z1 and Z2 be any matrices.

    a. Prove that a solution, X, of the equations XAX = X, AX = Z1 and XA = Z2, if it exists, is unique.

    b. For what choices of Z1 and Z2 is X a generalized inverse of A?

    2.20 Verify the following steps in the original Penrose proof of the existence of X in (2.2):

    a. The equations XAX = X and (AX)^H = AX are equivalent to the single equation X X^H A^H = X. Dually, AXA = A and (XA)^H = XA are equivalent to the single equation X A A^H = A^H.

    b. If there exists a matrix B satisfying B A^H A A^H = A^H, then X = B A^H is a solution of the equations X X^H A^H = X and X A A^H = A^H.

    c. The matrices A^H A, (A^H A)², (A^H A)³, ... are not all linearly independent, so that there exist scalars d1, ..., dk, not all zero, for which

        d1 (A^H A) + d2 (A^H A)² + ... + dk (A^H A)^k = 0.

    (Note that if A has n columns, k ≤ n² + 1. Why?)

    d. Let ds be the first nonzero scalar in the matrix polynomial in 2.20c, and let

        B = −(1/ds) [d_{s+1} I + d_{s+2} (A^H A) + ... + dk (A^H A)^{k−s−1}].

    Then B (A^H A)^{s+1} = (A^H A)^s.

    e. The matrix B also satisfies B A^H A A^H = A^H.

    2.21 Let A and X be any matrices such that AXA = A. Show that if Ax = b is a consistent system of equations, then all solutions can be written as

        x = Xb + (I − XA)y,

    where y is arbitrary. (Note, in particular, that this expression is equivalent to the form for x in (2.5), since the columns of I − XA span N(A). Why?)

    2.22 (Continuation): Prove, more generally, that AWC = B is a consistent system of equations in the matrix unknown W if and only if AA+BC+C = B, in which case all solutions can be written as

        W = A+BC+ + Y − A+AYCC+,

    where Y is arbitrary.


    3  More on Moore-Penrose Inverses

    3.1 Basic Properties of A+

    The various properties of A+ discussed in this section are fundamental to the theory of Moore-Penrose inverses. In many cases, proofs simply require verification that the defining equations in (2.2) are satisfied for A and some particular matrix X. Having illustrated this proof technique in a number of cases, we will leave the remaining similar arguments as exercises.

    LEMMA 5: Let A be any m by n matrix. Then

    (a) A m by n implies A+ n by m;
    (b) A = 0_{m,n} implies A+ = 0_{n,m};
    (c) A++ = A;
    (d) A^{H+} = A^{+H};
    (e) A+ = (A^H A)+ A^H = A^H (A A^H)+;
    (f) (A^H A)+ = A+ A^{H+};
    (g) (αA)+ = α+ A+ for any scalar α, where α+ = 1/α if α ≠ 0 and α+ = 0 if α = 0;
    (h) if U and V are unitary matrices, (UAV)+ = V^H A+ U^H;
    (i) if A = Σ_{i=1}^{n} Ai, where Ai^H Aj = 0 and Ai Aj^H = 0 whenever i ≠ j, then A+ = Σ_{i=1}^{n} Ai+;
    (j) if A is normal, A+A = AA+;
    (k) A, A+, A+A and AA+ all have rank equal to trace(A+A).

    Proof: Properties (a) and (b) have been noted previously in Section 1.3 and Theorem 4, respectively. The relations in (c) and (d) follow by observing that there is complete duality in the roles of A and X in the defining equations.

    To establish the first expression for A+ in (e), let X = (A^H A)+ A^H. Then XA = (A^H A)+ A^H A is Hermitian, and so is AX = A (A^H A)+ A^H, by use of (d). Moreover, XAX = X and

        AXA = A (A^H A)+ A^H A = A^{H+} A^H A (A^H A)+ A^H A = A^{H+} A^H A = A.

    The second expression in (e) follows by a similar type of argument, as do the expressions in (g) and (h).

    To prove (f), we have A^{H+} = A (A^H A)+ by (d) and (e). Then

        A+ A^{H+} = (A^H A)+ A^H A (A^H A)+ = (A^H A)+.

    To prove (i), observe first that Ai^H Aj = 0 implies

        Ai+ Aj = Ai+ Ai^{+H} Ai^H Aj = 0,

    and also Aj+ Ai = 0 since Aj^H Ai = 0; dually, Ai Aj^H = 0 implies Ai Aj+ = 0 and Aj Ai+ = 0. Using these relations we can again show that A and X = Σ_{i=1}^{n} Ai+ satisfy the defining equations.

    That (j) holds follows by use of (e) to write

        A+A = (A^H A)+ A^H A = (A A^H)+ A A^H = A^{H+} A^H = (AA+)^H = AA+.

    To show that A, A+, A+A and AA+ all have the same rank, we can apply the fact that the rank of a product of matrices never exceeds the rank of any factor to the equations AA+A = A and A+AA+ = A+. Then rank(A) = trace(A+A) holds since rank(E) = trace(E) for any idempotent matrix E [7].
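    Several parts of Lemma 5 are easy to check numerically; the sketch below (numpy assumed, with an arbitrary random complex test matrix) verifies (c), (e), (f) and (k).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
Ap = np.linalg.pinv(A)
H = lambda M: M.conj().T                # conjugate transpose

print(np.allclose(np.linalg.pinv(Ap), A))                               # (c) A++ = A
print(np.allclose(Ap, np.linalg.pinv(H(A) @ A) @ H(A)))                 # (e)
print(np.allclose(np.linalg.pinv(H(A) @ A), Ap @ np.linalg.pinv(H(A)))) # (f)
print(np.isclose(np.trace(Ap @ A).real, np.linalg.matrix_rank(A)))      # (k)
```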

    Observe in Lemma 5(e) that these expressions for A+ reduce to the expressions in (2.3) whenever A has full row rank or full column rank. Moreover, observe that the relationship (EG)+ = G+E+, which holds for full rank factorizations EG by Theorem 4, also holds for A^H A where A is any matrix, by Lemma 5(f). The following example shows, however, that the relation (BA)+ = A+B+ need not hold for arbitrary matrices A and B.

    Example 3.1

    Let

        A = [1  0]        B = [1  1  −1]
            [1  1]            [0  1  −1].
            [1  1],

    Then

        BA = [1  0]  = (BA)+,
             [0  0]

    since BA is Hermitian and idempotent. Also,

        A+ = (A^H A)^{-1} A^H = 1/2 [ 2  −2] [1  1  1]  = 1/2 [ 2  0  0]
                                    [−2   3] [0  1  1]        [−2  1  1]

    and

        B+ = B^H (BB^H)^{-1} = [ 1   0] 1/2 [ 2  −2]  = 1/2 [2  −2]
                               [ 1   1]     [−2   3]        [0   1]
                               [−1  −1]                     [0  −1],

    so that

        A+B+ = [ 1  −1]  ≠ (BA)+.
               [−1   1]
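    Example 3.1 can be confirmed numerically with numpy's pinv (an illustration only; numpy is of course not part of the monograph):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 1.0, -1.0],
              [0.0, 1.0, -1.0]])

BA = B @ A                                   # Hermitian and idempotent
lhs = np.linalg.pinv(BA)                     # (BA)+ = BA
rhs = np.linalg.pinv(A) @ np.linalg.pinv(B)  # A+B+

print(np.allclose(lhs, BA))     # True
print(np.allclose(lhs, rhs))    # False: (BA)+ != A+B+ here
```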

    Let A be any m by n matrix with columns a1, ..., an, and let Q designate the permutation matrix obtained by permuting the columns of In in any order {j1, ..., jn}. Then

        AQ = [a_{j1}, ..., a_{jn}].

    In a similar manner, if w1^H, ..., wm^H designate the rows of A, and if P is the permutation matrix obtained by permuting the rows of Im in any order {i1, ..., im}, then

        PA = [w_{i1}^H]
             [   ...  ]
             [w_{im}^H].

    Combining these observations, it follows that if Ā is any m by n matrix formed by permuting rows of A, or columns of A, or both, in any manner, then Ā = PAQ for some permutation matrices P and Q. Moreover, since P and Q are unitary matrices,

        Ā+ = (PAQ)+ = Q^H A+ P^H

    by Lemma 5(h), and thus Ā+ can be obtained by permuting rows and/or columns of A+.

    Example 3.2

    Construct B+ if B is the matrix obtained from the matrix A of Example 2.1 by permuting its rows and columns. Since B in this case can be written as B = PAQ for permutation matrices P and Q which are also Hermitian, we have

        B+ = (PAQ)+ = Q^H A+ P^H = Q A+ P.

    (It should be noted that B can be written alternately as B = P1 A Q1 for a different pair of permutation matrices, and so we have also B+ = Q1^H A+ P1^H.)
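    The permutation identity Ā+ = Q^H A+ P^H can be spot-checked numerically (numpy assumed; the matrices below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

P = np.eye(3)[[2, 0, 1]]          # permutes the rows of A
Q = np.eye(4)[:, [1, 3, 0, 2]]    # permutes the columns of A
A_bar = P @ A @ Q

# Lemma 5(h): permutation matrices are unitary, so (PAQ)+ = Q^H A+ P^H
print(np.allclose(np.linalg.pinv(A_bar), Q.T @ np.linalg.pinv(A) @ P.T))
```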

    Further applications of full rank factorizations and of permuted matrices PAQ in the computation of A+ will be illustrated in the exercises at the end of this section. We turn now to a somewhat different method for computing A+. This procedure essentially provides a method for constructing the Moore-Penrose inverse of any matrix with k columns, given that the Moore-Penrose inverse of the submatrix consisting of the first k−1 columns is known.

    For any k ≥ 2, let Ak denote the matrix with k columns a1, ..., ak. Then Ak can be written in partitioned form as Ak = [A_{k−1}, ak]. Assuming A_{k−1}+ is known, Ak+ can be formed using the formulas in Theorem 6.

    THEOREM 6: For any matrix Ak = [A_{k−1}, ak], let

        ck = (I − A_{k−1} A_{k−1}+) ak

    and let

        γk = ak^H A_{k−1}^{H+} A_{k−1}+ ak.

    Then

    (3.1)    Ak+ = [A_{k−1}+ − A_{k−1}+ ak bk]
                   [           bk           ],

    where

        bk = ck+                                        if ck ≠ 0,

        bk = (1 + γk)^{-1} ak^H A_{k−1}^{H+} A_{k−1}+   if ck = 0.

    Proof: Since ck is a column vector, the two cases ck ≠ 0 and ck = 0 are exhaustive.

    Let X designate the right-hand side of (3.1). Then to establish the representation for Ak+ requires only that we show the defining equations in (2.2) are satisfied by Ak and X for the two forms of bk.

    Forming AkX and XAk gives

    (3.2)    Ak X = A_{k−1} A_{k−1}+ + ck bk,

    by definition of ck, and

    (3.3)    X Ak = [A_{k−1}+ A_{k−1} − A_{k−1}+ ak bk A_{k−1}    A_{k−1}+ ak (1 − bk ak)]
                    [bk A_{k−1}                                   bk ak                  ].

    Continuing, using (3.2) gives

    (3.4)    Ak X Ak = [A_{k−1} + ck bk A_{k−1},  A_{k−1} A_{k−1}+ ak + ck bk ak]

    and

    (3.5)    X Ak X = [A_{k−1}+ − A_{k−1}+ ak (bk A_{k−1} A_{k−1}+ + bk ck bk)]
                      [bk A_{k−1} A_{k−1}+ + bk ck bk                         ],

    since A_{k−1}+ ck = A_{k−1}+ (I − A_{k−1} A_{k−1}+) ak = 0.

    Suppose now that ck ≠ 0 and bk = ck+. Then

        Ak X = A_{k−1} A_{k−1}+ + ck ck+

    in (3.2) is Hermitian. Also, with ck+ ck = 1, and since A_{k−1}+ ck = 0 implies ck^H A_{k−1}^{H+} = 0, so that ck+ A_{k−1} = 0 and thus ck+ ak = 1, then

        X Ak = [A_{k−1}+ A_{k−1}   0]
               [0                  1]

    in (3.3) is Hermitian. Moreover,

        Ak X Ak = [A_{k−1},  A_{k−1} A_{k−1}+ ak + ck] = Ak

    in (3.4), and X Ak X = X in (3.5). Having shown that the defining equations hold, X = Ak+ in (3.1) when ck ≠ 0.

    Suppose instead that ck = 0 and bk = (1 + γk)^{-1} ak^H A_{k−1}^{H+} A_{k−1}+. Then

        Ak X = A_{k−1} A_{k−1}+

    in (3.2) is Hermitian. In this case, with γk = ||A_{k−1}+ ak||² a nonnegative real number and

        bk ak = γk / (1 + γk),        bk A_{k−1} A_{k−1}+ = bk,

    we have also X Ak in (3.3) Hermitian. Furthermore, since ck = 0 implies A_{k−1} A_{k−1}+ ak = ak, we get Ak X Ak = [A_{k−1}, ak] = Ak in (3.4) and X Ak X = X in (3.5). Thus, when ck = 0 it has been shown again that Ak and X satisfy the defining equations for the given form of bk.

    That the formulas in Theorem 6 can be used not only directly to construct Ak+, assuming A_{k−1}+ is known, but also recursively to form A+ for any matrix A is easily seen: Let A be any matrix with n columns a1, ..., an, and for k = 1, ..., n, let Ak designate the submatrix consisting of the first k columns of A. Now A1+ = a1+ follows directly from Lemma 5(b) or (e), and if n ≥ 2, then A2+, ..., An+ = A+ can be formed sequentially using Theorem 6.

    Example 3.3

    Let

        A = [ 2  1   0]
            [ 0  2  −4]
            [−1  1  −3].

    Then with A1 = a1,

        A1+ = 1/5 [2  0  −1]

    and

        c2 = (I − A1 A1+) a2 = 1/5 [ 3]
                                   [10]
                                   [ 6].

    Hence

        b2 = c2+ = 1/29 [3  10  6],

    and so

        A2+ = [A1+ − (A1+ a2) b2]  = 1/145 [55  −10  −35]  = 1/29 [11  −2  −7]
              [         b2      ]          [15   50   30]         [ 3  10   6].

    Continuing,

        A2+ a3 = 1/29 [ 29]  = [ 1]
                      [−58]    [−2]

    and

        c3 = (I − A2 A2+) a3 = 0.

    Thus, with γ3 = a3^H A2^{H+} A2+ a3 = 5 and

        a3^H A2^{H+} A2+ = (A2+ a3)^H A2+ = 1/29 [5  −22  −19],

    we have

        b3 = (1 + γ3)^{-1} (A2+ a3)^H A2+ = 1/174 [5  −22  −19],

    so that

        A3+ = [A2+ − (A2+ a3) b3]  = 1/174 [61   10  −23]
              [         b3      ]          [28   16   −2]  = A+.
                                           [ 5  −22  −19]
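    The column-by-column recursion of Theorem 6 is easily programmed. The following sketch (numpy assumed; the function name and the zero tolerance are our own choices, not the monograph's) builds A+ one column at a time and reproduces the result of Example 3.3.

```python
import numpy as np

def pinv_by_columns(A):
    """Form A+ by the recursion of Theorem 6, one column at a time."""
    A = np.asarray(A, dtype=float)
    a1 = A[:, :1]
    X = a1.T / (a1.T @ a1) if np.any(a1) else a1.T      # A1+ = a1+
    for k in range(1, A.shape[1]):
        Ak1, ak = A[:, :k], A[:, k:k + 1]
        ck = ak - Ak1 @ (X @ ak)                        # ck = (I - A_{k-1}A_{k-1}+) ak
        if np.linalg.norm(ck) > 1e-12:
            bk = ck.T / (ck.T @ ck)                     # bk = ck+
        else:
            gk = (ak.T @ X.T @ X @ ak).item()           # gamma_k
            bk = (ak.T @ X.T @ X) / (1.0 + gk)
        X = np.vstack([X - X @ ak @ bk, bk])            # (3.1)
    return X

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, -4.0],
              [-1.0, 1.0, -3.0]])
print(np.allclose(pinv_by_columns(A), np.linalg.pinv(A)))   # True
```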

    As will be indicated in Exercise 3.7, there is a converse of Theorem 6 which can be used to construct A_{k−1}+, given [A_{k−1}, ak]+. Combining Theorem 6 and its converse thus provides a technique for constructing the Moore-Penrose inverse of a matrix, Ā, say, starting from any matrix, A, of the same size with A+ known. (For practical purposes, however, A and Ā should differ in only a small number of columns.)

    Exercises

    3.1 Let A be any matrix with columns a1, ..., an, and let A+ have rows w1^H, ..., wn^H. Prove that if K denotes any subset of the indices 1, ..., n such that ai = 0 for all i ∈ K, then wi^H = 0 for all i ∈ K.

    3.2 Let A be any m by n matrix with rank r, 0 < r < min(m, n).

    a. Prove that there exist permutation matrices, P and Q, such that Ā = PAQ has the partitioned form

        Ā = [W  X]
            [Y  Z],

    with W r by r and nonsingular.

    b. Show that Z = Y W^{-1} X.

    c. Construct Ā+.

    3.3 (Continuation): A matrix [U, V] is called upper trapezoidal if U is upper triangular and nonsingular; a matrix, B, is called lower trapezoidal if B^H is upper trapezoidal.

    a. Show that any matrix, Ā, in Exercise 3.2 has a full rank factorization Ā = EG with E lower trapezoidal and G upper trapezoidal. (Such a factorization Ā = EG is called a trapezoidal decomposition of Ā.)

    b. Construct a trapezoidal decomposition of some matrix, Ā, obtained from

    2

    4 6

    -1

    o

    Hint: Start by writing Ā = EG with E lower trapezoidal and G upper trapezoidal, the first column of E and the first row of G filled in, and asterisks denoting elements yet to be determined. Proceed to construct a P and Q, if necessary, so that the elements e22 and g22 of PE and GQ, respectively, are both nonzero. (Note that if this is not possible, the factorization is complete.) Now compute the remaining elements in the second column of PE and the second row of GQ, and continue.

    c. Compute Ā+.

    3.4 Let A = [B, R] be any m by n upper trapezoidal matrix with n ≥ m + 2, and let z1 be any nonnull vector in N(A). Show that Gaussian elimination, together with a permutation matrix, Q, can be used to reduce the matrix

        [A   ]
        [z1^H]

    to an upper trapezoidal matrix S, say. Prove now that if z2 is any vector such that Sz2 = 0, then Qz2 ∈ N(A) and (z1, Qz2) = 0.

    3.5 Apply the procedure in Exercise 3.4 to construct an orthonormal basis for N(A) in Exercise 3.3.

    3.6 Show that the two forms for [A_{k−1}, ak]+ in Theorem 6 can be written in the single expression

        [A_{k−1}, ak]+ = [A_{k−1}+ − A_{k−1}+ ak dk^H]
                         [           dk^H            ],

    where

        dk^H = ck+ + (1 − ck+ ak)(1 + γk)^{-1} ak^H A_{k−1}^{H+} A_{k−1}+.

    3.7 Let [A_{k−1}, ak]+ be partitioned as

        [A_{k−1}, ak]+ = [E   ]
                         [dk^H],

    with dk^H a row vector.

    a. Show that if dk^H ak ≠ 1, then A_{k−1}+ = E + (1 − dk^H ak)^{-1} E ak dk^H.

    b. Construct A+ if

        A = [·  0  −1  ·]
            [·  0   0  1],

    where the dots stand for entries illegible in this copy. Hint: See Exercise 3.3c.

    3.8 For any product AB, let B1 = A+AB and A1 = AB1B1+. Then (AB)+ = (A1B1)+ = B1+A1+. Why does this expression reduce to Theorem 4 when AB is a full rank factorization?

    *3.9 a. Use Exercise 1.3 to prove that if A is any normal matrix with eigenvalues λi and corresponding orthonormal eigenvectors xi, then

        A+ = Σ' (1/λi) xi xi^H,

    where Σ' indicates that the sum is taken over indices with eigenvalues λi ≠ 0.

    b. Prove that if A is normal, (A^n)+ = (A+)^n.

    c. If λi = 2i − 2, i = 1, 2, 3, and

        x1 = (1/√2) [ 1]     x2 = (1/3) [ 2]     x3 = (1/(3√2)) [−1]
                    [ 0]                [−1]                    [−4]
                    [−1],               [ 2],                   [−1],

    construct the Moore-Penrose inverse of the matrix, A, for which Axi = λi xi, i = 1, 2, 3.

    3.2 Applications with Matrices of Special Structure

    For many applications of mathematics it is required to solve systems of equations Ax = b in which A or b, or both, have some special structure resulting from the physical considerations of the particular problem. In some cases this special structure is such that we can obtain information concerning the set of all solutions. For example, the explicit form for all solutions of the equations

        xi + x_{n+1} = bi,  i = 1, ..., n,

    given in Exercise 2.11, was obtained using the Moore-Penrose inverse of the matrix [I, u] from Exercise 2.10, where u is the n-tuple with each element equal to unity. In this section we introduce the concept of the Kronecker product of matrices, which can be used to characterize all solutions of certain classes of problems that occur in the design of experiments and in linear programming.

    DEFINITION 2: For any m by n matrix, P, and s by t matrix, Q = (qkl), the Kronecker product of P and Q is the ms by nt matrix, P × Q, of the form

        P × Q = [q11 P  q12 P  ...  q1t P]
                [q21 P  q22 P  ...  q2t P]
                [ ...                    ]
                [qs1 P  qs2 P  ...  qst P].

    It should be noted in Definition 2 that if P = (pij) and Q = (qkl), then P × Q is obtained by replacing each element qkl by the matrix qkl P, whereas Q × P is obtained by replacing each element pij by the matrix pij Q. Consequently, P × Q and Q × P differ only in the order in which rows and columns appear, and there exist permutation matrices R and S, say, such that Q × P = R[P × Q]S. (We remark also that some authors, for example, Thrall and Tornheim [13], define the Kronecker product of P and Q alternately as P × Q = (pij Q), that is, our Q × P. In view of the discussion in Section 3.1 of the Moore-Penrose inverses of matrices A and Ā, where Ā is obtained by permuting rows of A, columns of A, or both, each of the following results obtained using the form for P × Q in Definition 2 has a corresponding dual if the alternate definition is employed.)

    Example 3.4

    If

        P = [1  2]        Q = [4  −1]
            [3  0],           [i   3],

    then

        P × Q = [ 4   8  −1  −2]          Q × P = [ 4  −1   8  −2]
                [12   0  −3   0]                  [ i   3  2i   6]
                [ i  2i   3   6]                  [12  −3   0   0]
                [3i   0   9   0]                  [3i   9   0   0].

    Given any Kronecker product P × Q, it follows from Definition 2 that

    (3.6)    [P × Q]^H = P^H × Q^H.

    Also, for any matrices R and S = (skl) with the products PR and QS defined, the product [P × Q][R × S] is defined, and we have by use of block multiplication that the (k, l) block of [P × Q][R × S] is

        Σ_j (qkj P)(sjl R) = (Σ_j qkj sjl) PR.

    Therefore,

    (3.7)    [P × Q][R × S] = PR × QS.

    The following lemma can be established simply by combining the relationships in (3.6) and (3.7) to show that the defining equations in (2.2) are satisfied.

    LEMMA 7: For any matrices P and Q, [P × Q]+ = P+ × Q+.

    Example 3.5

    To construct A+ if

        A = [2  0   1  4   0   2]
            [2  3  −1  4   6  −2]
            [6  0   3  8   0   4]
            [6  9  −3  8  12  −4],

    observe that A = P × Q, where

        P = [2  0   1]        Q = [1  2]
            [2  3  −1],           [3  4].

    Then we have

        Q+ = Q^{-1} = −1/2 [ 4  −2]        P+ = 1/61 [22   4]
                           [−3   1],                 [−9  15]
                                                     [17  −8],

    so that

        A+ = P+ × Q+ = 1/122 [−88  −16   44    8]
                             [ 36  −60  −18   30]
                             [−68   32   34  −16]
                             [ 66   12  −22   −4]
                             [−27   45    9  −15]
                             [ 51  −24  −17    8].
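    In the notation of Definition 2, P × Q replaces each element qkl of Q by the block qkl P, which corresponds to numpy's kron(Q, P); with that dictionary, Lemma 7 and Example 3.5 can be checked directly (numpy assumed, as an illustration only):

```python
import numpy as np

P = np.array([[2.0, 0.0, 1.0],
              [2.0, 3.0, -1.0]])
Q = np.array([[1.0, 2.0],
              [3.0, 4.0]])

A = np.kron(Q, P)                                        # P x Q of Definition 2
A_plus = np.kron(np.linalg.pinv(Q), np.linalg.pinv(P))   # P+ x Q+

print(np.allclose(np.linalg.pinv(A), A_plus))            # Lemma 7
```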

    Example 3.6

    To construct A+ if

        A = [1 1 0 0 1 0 1 0 0 0 0 0]
            [1 0 1 0 1 0 0 1 0 0 0 0]
            [1 0 0 1 1 0 0 0 1 0 0 0]
            [1 1 0 0 0 1 0 0 0 1 0 0]
            [1 0 1 0 0 1 0 0 0 0 1 0]
            [1 0 0 1 0 1 0 0 0 0 0 1],

    observe first that if ui denotes the vector with all i elements equal to unity, then A can be written in partitioned form as

    (3.8)    A = [u3  I3  u3  0   I3  0 ]
                 [u3  I3  0   u3  0   I3].

    Whereupon, the first two block columns of A in (3.8) can be written as the Kronecker product [u3, I3] × u2. Next observe that, permuting columns of A to form

        Ā = [u3  I3  u3  I3  0   0 ]
            [u3  I3  0   0   u3  I3],

    the last four block columns of Ā become [u3, I3] × I2.

    Ā can be written as

        Ā = [u3, I3] × [u2, I2],

    and thus

    (3.9)    Ā+ = [u3, I3]+ × [u2, I2]+.

    Permuting rows of [I, u]+ from Exercise 2.10 now gives

        [ui, Ii]+ = 1/(i+1) [      ui^H        ]
                            [(i+1) Ii − ui ui^H],

    so that (3.9) becomes

    (3.10)    Ā+ = 1/4 [     u3^H     ]  × 1/3 [     u2^H     ]
                       [4 I3 − u3 u3^H]        [3 I2 − u2 u2^H].

    Substituting numerical values into (3.10), a suitable permutation of the rows of Ā+ then yields A+.
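    The closed form for [ui, Ii]+ used in (3.10) can be verified numerically (numpy assumed; i = 3 is an arbitrary choice):

```python
import numpy as np

i = 3
u = np.ones((i, 1))
M = np.hstack([u, np.eye(i)])     # the matrix [u_i, I_i]

# closed form: [u_i, I_i]+ = 1/(i+1) [ u_i^H ; (i+1) I_i - u_i u_i^H ]
M_plus = np.vstack([u.T, (i + 1) * np.eye(i) - u @ u.T]) / (i + 1)
print(np.allclose(np.linalg.pinv(M), M_plus))   # True
```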

    Matrices, A, as in Example 3.6, with elements zero or one, occur frequently in the statistical design of experiments, and the technique of introducing Kronecker products can often be used to construct A+, and thus all solutions of systems of equations Ax = b, by use of Exercise 2.21. Additional restrictions on Ax = b to obtain solutions with particular properties can then be formulated in terms of conditions on N(A) or, equivalently, I − A+A. (See Exercise 3.11.)

    That Kronecker products can be combined with forms for Moore-Penrose inverses of partitioned matrices to construct A+ for other classes of structured matrices is shown by the representation in Theorem 8.

    THEOREM 8: Let W be any m by n matrix, and for any positive integer p let Gp = (pIn + W^H W)^{-1}. Then

    (3.11)    [In × up^H]+
              [W × Ip   ]   = [Gp × up ,  W+ × Ip − Gp W+ × up up^H].

    Proof: Observe first that with

        (pIn + W^H W)(In − W+W) = p(In − W+W),

    then

    (3.12)    Gp(In − W+W) = (1/p)(In − W+W).

    Also, observe that with (pIn + W^H W)W+ = pW+ + W^H, the relation

    (3.13)    W+ = Gp(pW+ + W^H) = pGpW+ + GpW^H,

    together with the fact that Gp is Hermitian, implies WGpW+ is Hermitian.

    Let

        A = [In × up^H]
            [W × Ip   ],

    and let

        X = [Gp × up ,  W+ × Ip − GpW+ × up up^H].

    Then it follows from (3.12) and (3.7) that

        XA = Gp × up up^H + W+W × Ip − GpW+W × up up^H
           = Gp(In − W+W) × up up^H + W+W × Ip
           = (1/p)(In − W+W) × up up^H + W+W × Ip

    is Hermitian. Also, with up^H up = p,

        AXA = [(1/p)(In − W+W) × up^H up up^H + W+W × up^H]
              [(1/p)W(In − W+W) × up up^H + WW+W × Ip     ]  = [In × up^H]
                                                               [W × Ip   ]  = A.

    Continuing, we have

        XA(Gp × up) = (1/p)(In − W+W)Gp × p up + W+WGp × up = Gp × up,

        XA(W+ × Ip) = (1/p)(In − W+W)W+ × up up^H + W+WW+ × Ip = W+ × Ip,

    and

        XA(GpW+ × up up^H) = (1/p)(In − W+W)GpW+ × p up up^H + W+WGpW+ × up up^H = GpW+ × up up^H.

    Hence, XAX = X. Finally, forming AX gives

        AX = [pGp          W+ × up^H − GpW+ × p up^H  ]
             [WGp × up     WW+ × Ip − WGpW+ × up up^H ].

    Now

        W+ × up^H − GpW+ × p up^H = GpW^H × up^H

    by (3.13), which, with Gp and WGpW+ Hermitian, implies (AX)^H = AX.

    Having shown that A and X satisfy the equations in (2.2), then X = A+, which establishes (3.11).

    Example 3.7

    If p = 3, then for any m by n matrix, W,

        [In × u3^H]   [In  In  In]
        [W × I3   ] = [W   0   0 ]
                      [0   W   0 ]
                      [0   0   W ]

    and

        [In × u3^H]+   [G3   W+ − G3W+   −G3W+        −G3W+     ]
        [W × I3   ]  = [G3   −G3W+        W+ − G3W+   −G3W+     ]
                       [G3   −G3W+       −G3W+         W+ − G3W+],

    where G3 = (3In + W^H W)^{-1}.

    Suppose now that we let T = T(p, W) denote the matrix in Theorem 8, which is completely determined by p and the submatrix W, that is,

        T = T(p, W) = [In × up^H]
                      [W × Ip   ].

    Now given a system of equations Tx = b, with W m by n of rank r, 0 < r ≤ n, and p any positive integer, it follows that if we partition x and b as

        x = [x(1)]        b = [b(0)]
            [ ...]            [b(1)]
            [x(p)],           [ ...]
                              [b(p)],

    with x(1), ..., x(p) and b(0) n-tuples, and b(1), ..., b(p) m-tuples, then x is a solution if and only if

    (3.14)    Σ_{j=1}^{p} x(j) = b(0)

    and

    (3.15)    W x(j) = b(j),  j = 1, ..., p.

    In other words, each x(j) must be a solution of m equations in n unknowns, subject to the condition that the sum of the solutions is equal to b(0). These characterizations are further explored for the general case of an arbitrary matrix, W, in Exercises 3.16 and 3.17, and for an important special case in Exercises 3.18 and 3.19.
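    The structure of T(p, W) and the formula of Theorem 8 can both be checked numerically. In the sketch below (numpy assumed; W is an arbitrary example), the document's P × Q corresponds to numpy's kron(Q, P):

```python
import numpy as np

def T_matrix(p, W):
    """T(p, W) = [I_n x u_p^H ; W x I_p] in the notation of Definition 2."""
    n = W.shape[1]
    top = np.kron(np.ones((1, p)), np.eye(n))   # I_n x u_p^H = [I_n, ..., I_n]
    bottom = np.kron(np.eye(p), W)              # W x I_p = diag(W, ..., W)
    return np.vstack([top, bottom])

W = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
p = 3
n = W.shape[1]

Gp = np.linalg.inv(p * np.eye(n) + W.T @ W)
Wp = np.linalg.pinv(W)
u = np.ones((p, 1))

# Theorem 8: T+ = [G_p x u_p , W+ x I_p - G_p W+ x u_p u_p^H]
T_plus = np.hstack([np.kron(u, Gp),
                    np.kron(np.eye(p), Wp) - np.kron(u @ u.T, Gp @ Wp)])
print(np.allclose(np.linalg.pinv(T_matrix(p, W)), T_plus))   # True
```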

    Exercises

    3.10 Complete the numerical construction of A+ in Example 3.6 and verify that A and A+ satisfy the defining equations in (2.2).

    3.11 Matrices of the form [up, Ip] and, more generally, Kronecker products such as A in Example 3.6, in which at least one of the factors has this form, occur frequently in the statistical design of experiments [1] [4]. For example, suppose it is required to examine the effect of p different fertilizers on soybean yield. One approach to this problem is to divide a field into pq subsections (called plots), randomly assign each of the p types of fertilizer to q plots, and measure the yield from each. Neglecting other factors which may affect yield, a model for this experiment has the form

    (3.16)    yij = m + ti + eij,

    where yij is the yield of the jth plot to which fertilizer i has been applied, m is an estimate of an overall "main" effect, ti is an estimate of the effect of the particular fertilizer treatment, and eij is the experimental error associated with the particular plot. The question now is to determine m and t1, ..., tp to minimize the sum of squares of experimental error, that is,

        Σ_{i=1}^{p} Σ_{j=1}^{q} eij².

    a. If y and e denote the vectors

        y^H = (y11, ..., yp1, y12, ..., yp2, ..., y1q, ..., ypq)

    and

        e^H = (e11, ..., ep1, e12, ..., ep2, ..., e1q, ..., epq),

    show that data for the model in (3.16) can be represented as

    (3.17)    y = Ax + e,

    where x^H = (m, t1, ..., tp) and A = [up, Ip] × uq.

    b. Show that r(A) = p, and construct the minimal norm solution, x̂ = A+y, to (3.17).

    c. For statistical applications it is also assumed that

        Σ_{i=1}^{p} ti = 0.

    Starting with x̂ from 3.11b above, construct that solution x̃, say, for which this additional condition holds.

    3.12 (Continuation): Given the experimental situation described in Exercise 3.11, it is sometimes assumed that there is another effect, called a block effect, present. In this case one of two models is assumed: First, if there is no interaction between the treatment and block effect, then

    (3.18)    yij = m + ti + bj + eij,

    whereas if there is an assumed interaction between the treatment and block effect, then

    (3.19)    yij = m + ti + bj + (tb)ij + eij,

    where yij, m and ti have the same meaning as in (3.16), bj, j = 1, ..., q, designate block effects, and (tb)ij, i = 1, ..., p and j = 1, ..., q, designate the effect of the interaction between treatment i and block j.

    a. Using the notation for y and e from Exercise 3.11, show that the data for the model in (3.18) can be represented as

        y = A1 x + e,

    where now

        x^H = (m, t1, ..., tp, b1, ..., bq)

    and

        A1 = [[up, Ip] × uq ,  up × Iq].

    b. Show that the data for the model in (3.19) can be represented as

        y = A2 x + e,

    where

        x^H = (m, t1, ..., tp, b1, ..., bq, (tb)11, ..., (tb)1q, ..., (tb)p1, ..., (tb)pq).

    c. Use the procedure of Example 3.6 to construct A2+ and thus the solution x̂ = A2+y. (For statistical applications the model in (3.19) is not meaningful unless there is more than one observation for each pair of indices i and j, that is, a model of the form

        yijk = m + ti + bj + (tb)ij + eijk,

    where k = 1, ..., r. In this case the unique solution is obtained by assuming

        Σ_{i=1}^{p} ti = Σ_{j=1}^{q} bj = 0,  and also  Σ_{i=1}^{p} (tb)ij = 0 and Σ_{j=1}^{q} (tb)ij = 0

    for all i and j. Note, in addition, that the construction of A1+ in 3.12a above is somewhat more complicated, but can be formed using related techniques. The particular solution used for statistical applications in this case assumes that

        Σ_{i=1}^{p} ti = Σ_{j=1}^{q} bj = 0.)

    *3.13 Show that if P and Q are any square matrices with x an eigenvector of P corresponding to eigenvalue λ and y an eigenvector of Q corresponding to eigenvalue μ, then x × y is an eigenvector of P × Q corresponding to eigenvalue λμ.

    3.14 a. Show that for any p and W,

        I − T+T = (In − W+W) × (Ip − up up+).

    *b. Construct a complete orthonormal set of eigenvectors for I − T+T.

    3.15 Prove directly that for any m by n matrix W of rank r and any p, rank(T) = n + r(p − 1).

    3.16 For any system of equations Tx = b, with x and b partitioned to give (3.14) and (3.15), let X and B denote the matrices

        X = [x(1), ..., x(p)],    B = [b(1), ..., b(p)].

    a. Prove that Tx = b if and only if there exists a matrix X such that

    (3.20)    X up = b(0)

    and

    (3.21)    WX = B.

    b. Prove that a necessary condition for a solution, X, of (3.20) and (3.21) to exist is WW+B = B and Wb(0) = B up.

    3.17 (Continuation): For any eigenvector zi × yj of I_{np} − T+T, where yj has components yj1, ..., yjp, let Zij denote the matrix

        Zij = [yj1 zi, ..., yjp zi].

    Show that zi × yj corresponds to eigenvalue 1 if and only if

    (3.22)    Zij up = 0

    and

    (3.23)    W Zij = 0.

    3.18 The transportation problem in linear programming is an example of a problem in which it is required to solve a system of equations Tx = b. This famous problem can be stated as follows: Consider a company with n plants which produce a1, ..., an units, respectively, of a given product in some time period. This company has p distributors which require b1, ..., bp units, respectively, of the product in the same time period, where

        Σ_{i=1}^{n} ai = Σ_{j=1}^{p} bj.

    If there is a unit cost cij for shipping from plant i to distributor j, i = 1, ..., n and j = 1, ..., p, then how should the shipments be allocated in order to minimize total transportation cost? This problem can be illustrated in a schematic form (called a tableau) as shown in Figure 6, where O1, ..., On designate origins of shipment (plants), D1, ..., Dp designate destinations (distributors) and, for each i and j, xij denotes the number of units to be shipped from Oi to Dj. The problem now is to determine the xij, i = 1, ..., n and j = 1, ..., p, to minimize the total shipping cost.

  • 1 0. J P

    1 ~ bJ ~ xll x lj x lp

    0. ~ ~ ~ I x .. X.

    xi 1 I J IP

    ~ ~ ~

    n x

    nl x nj x np

    bl b. b J p

    Figure 6. The transportation problem tableau.

    n p (3.24) L L c . . x . . ,

    i=l j=l IJ IJ

    subject to the conditions that

    (3.25 )

    and

    (3.26)

    p L x .. = a j , j=l I J

    n

    L x .. i =1 I J

    b., j J

    1, .. . ,n,

    1, . ,p.

    a l

    a. I

    a n

    La. = Lb. I j J I

    Also, we must have x .. > 0, for all i and j, and, assuming frac-IJ -

    tional units cannot be manufactured or shipped, all ai' bj and x ij

    -52-

  • integers. (This last requirement that the x .. are integers IJ

    fol lows automatically when the a. and b. are [6].) I J

a. Show that if the x_ij in Figure 6 are the elements of an n by p matrix X, and if b^(1) = [a_1,...,a_n]^H and b^(2) = [b_1,...,b_p]^H, then the conditions in (3.25) can be written as Xu_p = b^(1) and the conditions in (3.26) become X^H u_n = b^(2). Therefore, (3.25) and (3.26) together imply that any set of numbers x_ij which satisfies the row and column requirements of the tableau is a solution of Tx = b, where T = T(p,u_n^H) and b = [a_1,...,a_n,b_1,...,b_p]^H.

b. Prove that if T = T(p,u_n^H), then the tableau form of T^+Tx is

    (1/p)[I_n - (1/(n+p))u_n u_n^H] X u_p u_p^H + (1/n) u_n u_n^H X [I_p - (1/(n+p))u_p u_p^H].

Moreover, show that if x_ij is the element in row i and column j of the tableau form of x = T^+b, then

    x_ij = (1/p)a_i + (1/n)b_j - (1/np) Σ_{i=1}^n a_i

for i = 1,...,n and j = 1,...,p.

*c. Show that rank(T) = n + p - 1 when W = u_n^H, and thus that rank(I_np - T^+T) = (n-1)(p-1). Also, construct a complete orthonormal set of eigenvectors of I_np - T^+T, and show that z is an eigenvector corresponding to eigenvalue λ = 1 if and only if all row sums and column sums in the tableau form of z are zero.

d. The vector g = (I_np - T^+T)c is called the gradient of the inner product

    (c,x) = Σ_{i=1}^n Σ_{j=1}^p c_ij x_ij

in (3.24). Show that the elements g_ij in the tableau form of g can be written as

    g_ij = c_ij - (1/n) Σ_{i=1}^n c_ij - (1/p) Σ_{j=1}^p c_ij + (1/np) Σ_{i=1}^n Σ_{j=1}^p c_ij

for i = 1,...,n and j = 1,...,p.

3.19 (Continuation): The transportation problem has been generalized in a number of different ways, and one of these extensions follows directly using matrices of the form T = T(p,W). Suppose that we are given q transportation problems, each with n origins and p destinations, and let a_ik, b_jk, c_ijk and x_ijk be the row sums, column sums, costs and variables, respectively, associated with the kth tableau, k = 1,...,q. A "three-dimensional" transportation problem is now obtained by adding the conditions that

(3.27)    Σ_{k=1}^q x_ijk = d_ij

for i = 1,...,n and j = 1,...,p, where d_11, ..., d_np are given positive integers. (The choice of nomenclature "three-dimensional" is apparent by noting that if the tableaus are stacked to form a parallelepiped with q layers, each with np cells, then (3.27) simply imposes np conditions that must be satisfied when the x_ijk are summed in the vertical direction, as shown in Figure 7, where only the row, column and vertical sum requirements are indicated.)

a. Show that the conditions

    Σ_{i=1}^n a_ik = Σ_{j=1}^p b_jk,    k = 1,...,q,

    Σ_{k=1}^q a_ik = Σ_{j=1}^p d_ij,    i = 1,...,n,

    Σ_{k=1}^q b_jk = Σ_{i=1}^n d_ij,    j = 1,...,p,

are necessary in order for a three-dimensional transportation problem to have a solution.

b. Show that the conditions which the x_ijk must satisfy if there is a solution can be written as Tx = b, where T = T(q,W) with


Figure 7. The parallelepiped requirements for the tableau of a three-dimensional transportation problem.

    W = T_np = T(p,u_n^H),

the matrix for the "two-dimensional" transportation problem in Exercise 3.18, and a suitable vector b.

c. Show that

    [I_qp - [I_p - (1/(p+q)) u_p u_p^H] X] ⋯

and that T^+ = [U, V], where

    U = (1/((n+q)p)) (I_n - ⋯)

and

    V = (1/(n(p+q))) u_p X [I_n - ((n+2p+q)/((n+p)(n+p+q))) u_n u_n^H].
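The closed form for x_ij in Exercise 3.18b is easy to check numerically. The sketch below assumes numpy and uses hypothetical supply and demand totals.

```python
import numpy as np

# Hypothetical balanced data: total supply equals total demand.
a = np.array([20., 30., 50.])        # origin totals a_i, n = 3
b = np.array([10., 25., 40., 25.])   # destination totals b_j, p = 4
n, p = len(a), len(b)
assert np.isclose(a.sum(), b.sum())

# Tableau form of x = T^+ b claimed in Exercise 3.18b:
#   x_ij = a_i/p + b_j/n - (sum_k a_k)/(n p)
X = a[:, None] / p + b[None, :] / n - a.sum() / (n * p)

print(np.allclose(X.sum(axis=1), a))   # row sums reproduce (3.25)
print(np.allclose(X.sum(axis=0), b))   # column sums reproduce (3.26)
```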

3.3 Miscellaneous Exercises

3.20 Prove that a necessary and sufficient condition that the equations AX = C, XB = D have a common solution is that each equation has a solution and that AD = CB, in which case X = A^+C + DB^+ - A^+ADB^+ is a particular solution.

3.21 Prove Lemma 7.

3.22 Prove that ⋯ for any matrix B, and that ⋯ .

4 Drazin Inverses

4.1 The Drazin Inverse of a Square Matrix

In this section we consider another type of generalized inverse for square complex matrices. The inverse in Theorem 9, due to Drazin [3], has a variety of applications.

THEOREM 9: For any square matrix A, there is a unique matrix X such that

(4.1)    A^k = A^{k+1}X, for some positive integer k,

(4.2)    X = AX²,

(4.3)    AX = XA.

Proof: Observe first that if A = 0 is the null matrix, then A and X = 0 satisfy (4.1), (4.2) and (4.3).

Suppose A ≠ 0 is any n by n matrix. Then there exist scalars d_1, ..., d_t, not all zero, such that

    Σ_{i=1}^t d_i A^i = 0,

where t ≤ n² + 1 since the A^i can be viewed as vectors with n² elements. Let d_k be the first nonzero coefficient. Then we can write

(4.4)    A^k = A^{k+1}U,

where

    U = -(1/d_k) Σ_{i=k+1}^t d_i A^{i-k-1}.

Since U is a polynomial in A, U and A commute. Also, multiplying both sides of (4.4) by AU gives

    A^k = A^{k+2}U² = A^{k+3}U³ = ⋯,

and thus

(4.5)    A^k = A^{k+m}U^m for all m ≥ 1.

Let X = A^kU^{k+1}. Then for this choice of X,

    A^{k+1}X = A^{2k+1}U^{k+1} = A^k

and

    AX² = A^{2k+1}U^{2k+2} = A^kU^{k+1} = X,

by use of (4.5). Also, X and A commute since U and A commute. Thus the conditions (4.1), (4.2) and (4.3) hold for this X.

To show that X is unique, suppose that Y is also a solution to (4.1), (4.2) and (4.3), where X corresponds to an exponent k_1 and Y corresponds to an exponent k_2 in (4.1). Let ℓ = maximum(k_1,k_2). Then it follows using (4.1), (4.2), (4.3) and (4.5) that

    X = X²A = X³A² = ⋯ = X^{ℓ+1}A^ℓ = X^{ℓ+1}A^{ℓ+1}Y = XAY

and

    XAY = XA^{ℓ+1}Y^{ℓ+1} = A^ℓY^{ℓ+1} = ⋯ = AY² = Y²A = Y,

to establish uniqueness.

We will call the unique matrix X in Theorem 9 the Drazin inverse of A and write X alternately as X = A_d. Also, we will call the smallest k such that (4.1) holds the index of A.

That A_d is a generalized inverse of A is apparent by noting that (4.1) holds with k = 1 when X = A^{-1} exists, and that (4.2) and (4.3) also hold in that case. Observe, moreover, that in general (4.1) can be rewritten as

(4.6)    A^kXA = A^k,

and (4.2) becomes XAX = X, by use of (4.3), so that the defining equations in Theorem 9 can be viewed as an alternative to those used for A^+, in which AXA = A is replaced by (4.6), (1.2) remains unchanged, and (1.3) and (1.4) are replaced by the condition in (4.3) that A and X commute. (Various relationships between A_d and A^+ will be explored in the exercises at the end of this section and in Section 4.3.)

As will be discussed following the proof of Lemma 10, full rank factorizations of A can be used effectively in the construction of A_d.

LEMMA 10: For any factorization A = BC, A_d = B((CB)_d)²C.

Proof: Observe first that for any square matrix A of index k and positive integers m and n, we have A_d^m A^n = A_d^{m-n} if m > n, and A^{m+n}A_d^n = A^m if m ≥ k. Let k denote the larger of the index of BC and the index of CB, and let X = B((CB)_d)²C. Then

    AX = BCB(CB)_d²C = B(CB)_dC = B(CB)_d²(CB)C = XA,

    AX² = B(CB)_dC · B(CB)_d²C = B(CB)_d(CB)(CB)_d²C = B(CB)_d²C = X,

and

    A^{k+2}X = B(CB)^{k+2}(CB)_d²C = B(CB)^kC = A^{k+1},

so that X satisfies (4.1), (4.2) and (4.3), and hence X = A_d.

Suppose now that A = B_1C_1 is a full rank factorization where rank(A) = r_1. Forming the r_1 by r_1 matrix C_1B_1, either C_1B_1 is nonsingular, or C_1B_1 = 0, or rank(C_1B_1) = r_2 where 0 < r_2 < r_1. In the first case, with C_1B_1 nonsingular, (C_1B_1)_d = (C_1B_1)^{-1} so that A_d = B_1(C_1B_1)^{-2}C_1, by Lemma 10, in which case A has index one.

On the other hand, if C_1B_1 = 0 then (C_1B_1)_d = 0 and thus A_d = 0 by again using Lemma 10. Finally, if rank(C_1B_1) = r_2 with 0 < r_2 < r_1, then for any full rank factorization C_1B_1 = B_2C_2 we have

    (C_1B_1)_d = B_2(C_2B_2)_d²C_2,

so that A_d in Lemma 10 becomes A_d = B_1B_2(C_2B_2)_d³C_2C_1. The same argument now applies to C_2B_2; that is, either C_2B_2 is nonsingular and

    A_d = B_1B_2(C_2B_2)^{-3}C_2C_1,

or C_2B_2 = 0 and thus A_d = 0, or rank(C_2B_2) = r_3 where 0 < r_3 < r_2, and C_2B_2 = B_3C_3 is a full rank factorization to which Lemma 10 can be applied. Continuing in this manner with

    C_{m-1}B_{m-1} = B_mC_m,

then either C_mB_m = 0 for some index m, and so A_d = 0, or rank(B_mC_m) = rank(C_mB_m) > 0 for some index m with C_mB_m nonsingular, in which case

(4.7)    A_d = B_1B_2 ⋯ B_m(C_mB_m)^{-m-1}C_mC_{m-1} ⋯ C_1

in Lemma 10. Observe, moreover, that with A = B_1C_1,

    A² = B_1C_1B_1C_1 = B_1B_2C_2C_1, ..., A^m = B_1B_2 ⋯ B_mC_mC_{m-1} ⋯ C_1,

and

(4.8)    A^{m+1} = B_1B_2 ⋯ B_m(C_mB_m)C_mC_{m-1} ⋯ C_1,

we have either A^{m+1} = 0, and A_d = 0, or that A_d has the form in (4.7) where, since each B_i has full column rank and each C_i has full row rank,

    B_mC_m = B_{m-1}^+ ⋯ B_1^+ A^m C_1^+ ⋯ C_{m-1}^+

and

    C_mB_m = B_m^+ ⋯ B_1^+ A^{m+1} C_1^+ ⋯ C_m^+.

Therefore, in both cases we have rank(A^m) = rank(A^{m+1}). Furthermore, it follows in both cases that (4.1) holds for k = m and does not hold for any k < m. That is to say, k in (4.1) is the smallest positive integer such that A^k and A^{k+1} have the same rank.
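The characterization just obtained — the index k is the smallest positive integer with rank(A^k) = rank(A^{k+1}) — translates directly into a numerical test. The following is a minimal sketch, assuming numpy is available; `index_of` is a hypothetical helper name, not part of the text.

```python
import numpy as np

def index_of(A, tol=1e-10):
    """Smallest positive k with rank(A^k) == rank(A^(k+1))."""
    k, P = 1, A
    while np.linalg.matrix_rank(P, tol) != np.linalg.matrix_rank(A @ P, tol):
        k, P = k + 1, A @ P
    return k

N = np.diag([1., 1.], k=1)   # 3 by 3 nilpotent Jordan block, N^3 = 0
print(index_of(N))           # 3
print(index_of(np.eye(3)))   # 1
```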

Example 4.1

If A is the singular matrix

    A = [ 6  4   0
          3  5  -3
          3  3  -1 ],

written as the full rank factorization

    A = B_1C_1 = [ 1  2      [ 0  2  -2
                   2  1        3  1   1 ],
                   1  1 ]

then

    C_1B_1 = [ 2  0
               6  8 ]

is nonsingular, so that A has index one, and

    A_d = B_1(C_1B_1)^{-2}C_1 = (1/64) [ 6  -26   30
                                         3   35  -33
                                         3    3   -1 ].

Example 4.2

If A is a matrix whose full rank factorization A = B_1C_1 yields a singular product C_1B_1, the construction must be continued. For the 4 by 4 matrix A of this example, C_1B_1 is singular, with full rank factorization C_1B_1 = B_2C_2. Continuing, the 2 by 2 matrix C_2B_2 is again singular, with full rank factorization

    C_2B_2 = B_3C_3,

in which C_3 = [1  0] and C_3B_3 = 7. Hence A has index three, and A_d becomes

    A_d = B_1B_2B_3(C_3B_3)^{-4}C_3C_2C_1 = (1/2401) B_1B_2B_3C_3C_2C_1.

For the special case of matrices with index one we have

(4.9)    AA_dA = A,    A_dAA_d = A_d,    AA_d = A_dA,

so that

(4.10)    (A_d)_d = A,

by the duality in the roles of A and A_d. Conversely, if (4.10) holds, then the first and last relations in (4.9) follow from the defining relations in (4.2) and (4.3) applied to (A_d)_d and A_d, and the second relation in (4.9) is simply (4.2) for A_d and A. Consequently, (4.10) holds if and only if A has index one. In this special case the Drazin inverse of A is frequently called the group inverse of A, and is designated alternately as A#. Thus X = A#, when it exists, is the unique solution of AXA = A, XAX = X and AX = XA, and it follows from Lemma 10 that for any full rank factorization A = BC, A# = B(CB)^{-2}C.
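The recursive construction culminating in (4.7) can be sketched algorithmically. The code below is a minimal illustration, assuming numpy; each full rank factorization is taken from the SVD, and the result is checked on the matrix of Example 4.1 as reconstructed there. The helper names are hypothetical.

```python
import numpy as np

def full_rank_factor(A, tol=1e-10):
    """Full rank factorization A = B @ C via the SVD:
    B has full column rank, C has full row rank."""
    U, s, Vt = np.linalg.svd(A)
    r = int((s > tol).sum())
    return U[:, :r] * s[:r], Vt[:r], r

def drazin(A, tol=1e-10):
    """Drazin inverse by repeated full rank factorizations, as in (4.7):
    A_d = B_1...B_m (C_m B_m)^(-m-1) C_m...C_1, or A_d = 0 if the
    process terminates in a zero product."""
    Bs, Cs, M = [], [], A
    while True:
        B, C, r = full_rank_factor(M, tol)
        if r == 0:                               # some C_m B_m vanished
            return np.zeros_like(A, dtype=float)
        Bs.append(B)
        Cs.append(C)
        M = C @ B
        if np.linalg.matrix_rank(M, tol) == r:   # C_m B_m nonsingular
            break
    m = len(Bs)
    core = np.linalg.matrix_power(np.linalg.inv(M), m + 1)
    left = Bs[0] if m == 1 else np.linalg.multi_dot(Bs)
    right = Cs[0] if m == 1 else np.linalg.multi_dot(Cs[::-1])
    return left @ core @ right

# The index-one matrix of Example 4.1 (as reconstructed there):
A = np.array([[6., 4., 0.], [3., 5., -3.], [3., 3., -1.]])
Ad = drazin(A)
print(np.allclose(A @ Ad, Ad @ A))    # (4.3)
print(np.allclose(A @ Ad @ Ad, Ad))   # (4.2): X = A X^2
print(np.allclose(A @ A @ Ad, A))     # (4.1) with k = 1
```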

Exercises

4.1 Compute A_d for each of the matrices A_1 and A_2.

4.2 Given any matrices B and C of the same size where B has full column rank, we will say that C is alias to B if B^+ = (C^HB)^+C^H.

a. Prove that if C is alias to B, then C^HB is nonsingular.

b. Show that the set of all matrices alias to B forms an equivalence class.

c. Prove that A_d = A^+ if and only if C is alias to B for any full rank factorization A = BC^H.

d. Note, in particular, that A_d = A^+ when A is Hermitian. Prove this fact directly and also by using the result in 4.2c above.

4.3 Prove that (A^H)_d = (A_d)^H and that ((A_d)_d)_d = A_d for any matrix A.

4.4 Prove that A_d = 0 for any nilpotent matrix A.

4.5 Prove that (PXQ)_d = PX_dQ for any square matrices P and Q such that QP = I. What is the index of PXQ?

    4.2 An Extension to Rectangular Matrices

The Drazin inverse of a matrix A, as defined in Theorem 9, exists only if A is square, and an obvious question is how this definition can be extended to rectangular matrices. One approach to this problem is to observe that if B is m by n with m > n, say, then B can be augmented by m-n columns of zeroes to form a square matrix A. Now forming A_d, we might then take those columns of A_d which correspond to the locations of the columns of B in A as a definition of the "Drazin inverse" of B. As shown in the following example, however, the difficulty in this approach is that there are (m choose m-n) such matrices A, obtained by considering all possible arrangements of the n columns of B (taken without any permutations) and the m-n columns of zeroes, and that A_d can be different in each case.

    Example 4.3

If B is a 3 by 2 matrix and A_1, A_2 and A_3 are the 3 by 3 matrices obtained from B by inserting a column of zeroes in each of the three possible positions, then the corresponding Drazin inverses (A_1)_d, (A_2)_d and (A_3)_d are all different. They are obtained by applying Lemma 10 to the matrices A_i = BC_i, where each C_i is the 2 by 3 matrix obtained by inserting a column of zeroes into the identity matrix I_2 in the corresponding position.

Observe in Example 4.3 that the nonzero columns of each matrix (A_i)_d correspond to the product B(C_iB)_d². Consequently, using the nonzero columns of (A_i)_d to define the "Drazin inverse" of B implies that the resulting matrix is a function of C_i. That such matrices are uniquely determined by a set of defining equations and are special cases of a class of generalized inverses that can be constructed for any matrix B will be apparent from Theorem 11.

THEOREM 11: For any m by n matrix B and any n by m matrix W, there is a unique matrix X such that

(4.11)    (BW)^k = (BW)^{k+1}XW, for some positive integer k,

(4.12)    XWBWX = X,

(4.13)    BWX = XWB.

Proof: Let X = B(WB)_d². Then with XW = B(WB)_d²W = (BW)_d, by Lemma 10, (4.11) holds with k the index of BW. Also,

    XWBWX = B(WB)_d²WBWB(WB)_d² = B(WB)_d² = X

and

    BWX = B(WB)(WB)_d² = B(WB)_d = B(WB)_d²(WB) = XWB,

so that (4.12) and (4.13) hold.

To show that X is unique we can proceed as in the proof of Theorem 9. Thus, suppose X_1 and X_2 are solutions of (4.11), (4.12) and (4.13) corresponding to positive integers k_1 and k_2, respectively, in (4.11). Then with k = maximum(k_1,k_2), it follows that

    X_1 = X_1WBWX_1 = BWX_1WX_1 = (BW)²(X_1W)²X_1 = ⋯ = (BW)^k(X_1W)^kX_1
        = (BW)^{k+1}X_2W(X_1W)^kX_1 = X_2(WB)^{k+1}W(X_1W)^kX_1 = X_2WBW(BW)^k(X_1W)^kX_1 = ⋯ .

Continuing in a similar manner, it follows that

    X_2(WX_2)^{k+1}(WB)^{k+1}WBWX_1 = X_2(WX_2)^{k+1}W(BW)^{k+1}X_1WB = ⋯ .

Therefore X_1 = X_2, and the solution to (4.11), (4.12) and (4.13) is unique.

The unique matrix X in Theorem 11 will be called the W-weighted Drazin inverse of B and will be written alternately as X = B_{d,W}.

The choice of nomenclature W-weighted Drazin inverse of B is easily seen by noting that, with B_{d,W} = B(WB)_d², we have B_{d,W} = B_d when B is square and W is the identity matrix. Also, observe more generally that with B and B_{d,W} of the same size, and with W and WBW the size of B^H, the relation BWB_{d,W} = B_{d,W}WB in (4.13) can be viewed as a generalized commutativity condition, and B_{d,W}WBWB_{d,W} = B_{d,W} in (4.12) is analogous to (4.2) when written in the form XAX = X.

Example 4.4

If B is the matrix in Example 4.3 and W_1 and W_2 are two different 2 by 3 matrices, then computing B_{d,W_1} = B(W_1B)_d² and B_{d,W_2} = B(W_2B)_d² shows that the W-weighted Drazin inverse of B depends on the choice of W: the two weighted inverses are different matrices.

Exercises

4.6 Verify that B and B_{d,W} satisfy the defining equations in Theorem 11 for W = C_1, C_2, C_3 in Example 4.3 and for W = W_1, W_2 in Example 4.4.

4.7 Prove that E_d = E for any idempotent matrix E, and thus that B_{d,W} = B²B_d when B is square and W = B_d. (Consequently, B_{d,W} = B when W = B_d and B has index one.)

4.8 Show that if W^H is any matrix alias to B, then B_{d,W} = W^+(WB)^{-1}.

4.9 Prove that B_{d,W} = B^{H+}B^+B^{H+} for any matrix B when W = B^H. (Note that this result follows at once from Lemma 5(f) and Exercise 4.8 if B has full column rank, whereas Lemma 5(f) and Exercise 4.2d can be used for the general case.)
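Theorem 11's construction X = B(WB)_d² is easy to exercise numerically. In the sketch below (numpy assumed; B and W are random, so WB is nonsingular with probability one and (WB)_d reduces to the group inverse computed from a full rank factorization), the three defining equations are checked with k = 1.

```python
import numpy as np

def group_inverse(M, tol=1e-10):
    """Group inverse M# = B (C B)^(-2) C from a full rank factorization
    M = B C (valid when M has index one; here W @ B is nonsingular)."""
    U, s, Vt = np.linalg.svd(M)
    r = int((s > tol).sum())
    B, C = U[:, :r] * s[:r], Vt[:r]
    return B @ np.linalg.matrix_power(np.linalg.inv(C @ B), 2) @ C

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2))   # m by n
W = rng.standard_normal((2, 4))   # n by m

G = group_inverse(W @ B)          # (WB)_d, since WB is nonsingular a.s.
X = B @ G @ G                     # X = B (WB)_d^2
BW = B @ W                        # BW has index one here, so k = 1
print(np.allclose(BW, BW @ BW @ X @ W))   # (4.11)
print(np.allclose(X @ W @ B @ W @ X, X))  # (4.12)
print(np.allclose(B @ W @ X, X @ W @ B))  # (4.13)
```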


4.3 Expressions Relating A_d and A^+

It is an immediate consequence of Exercise 4.9 that if W = B^H, so that

    B_{d,W} = B^{H+}B^+B^{H+},

then

    B^+ = W B_{d,W} W.

Thus, using W-weighted Drazin inverses with W = B^H, B_{d,W} and B^+ are related directly in terms of products of matrices, which implies that B_{d,W} and B^+ have the same rank. In contrast, for any square matrix A we have

    rank(A_d) = rank(A^k) = rank(A^{k+1}),

with k the index of A, whereas rank(A^+) = rank(A). Therefore, rank(A_d) ≤ rank(A^+), with equality holding if and only if A has a group inverse. The following result can be used to give a general expression for the Drazin inverse of a matrix A in terms of powers of A and a Moore-Penrose inverse.

THEOREM 12: For any square matrix A with index k,

(4.14)    A_d = A^kYA^k

for any matrix Y such that

(4.15)    A^{2k+1}YA^{2k+1} = A^{2k+1}.

Proof: Starting with the right-hand side of (4.14), we have

    A^kYA^k = A_d^{k+1}A^{2k+1}YA^{2k+1}A_d^{k+1} = A_d^{k+1}A^{2k+1}A_d^{k+1} = A_d.

Observe in (4.15) that one obvious choice of Y is (A^{2k+1})^+, and it then follows that A^ℓ, (A^ℓ)^+ and A_d have the same rank for every positive integer ℓ ≥ k. In this case,

various relationships among A^ℓ, (A^ℓ)^+ and A_d can be established. For example, it can be shown that for any ℓ ≥ k, there is a unique matrix X satisfying

(4.16)    A^ℓXA^ℓ = A^ℓ,    XA^ℓX = X,

and

(4.17)    (XA^ℓ)^H = XA^ℓ,    A^ℓX = AA_d.

Dually, there is a unique matrix X satisfying (4.16) and

(4.18)    (A^ℓX)^H = A^ℓX,    XA^ℓ = AA_d.

The unique solutions of (4.16) and (4.17) and of (4.16) and (4.18) are called the left and right power inverses of A, respectively, and are designated as (A^ℓ)_L and (A^ℓ)_R. Moreover, it can be shown (Exercise 4.11) that

(4.19)    (A^ℓ)_L = (A^ℓ)^+AA_d    and    (A^ℓ)_R = AA_d(A^ℓ)^+,

and (Exercise 4.14) that (A^ℓ)_L and (A^ℓ)_R can be computed using full rank factorizations.
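One closed form for the left power inverse consistent with (4.16) and (4.17) is (A^ℓ)_L = (A^ℓ)^+AA_d. The sketch below (numpy assumed; the index-one matrix of Example 4.1 and its Drazin inverse are taken as reconstructed in that example) checks the defining equations for ℓ = 2.

```python
import numpy as np

# Index-one matrix of Example 4.1, so A_d is the group inverse.
A = np.array([[6., 4., 0.], [3., 5., -3.], [3., 3., -1.]])
Ad = np.array([[6., -26., 30.], [3., 35., -33.], [3., 3., -1.]]) / 64

Al = A @ A                         # A^l with l = 2
X = np.linalg.pinv(Al) @ A @ Ad    # candidate left power inverse (A^l)_L

print(np.allclose(Al @ X @ Al, Al))            # (4.16)
print(np.allclose(X @ Al @ X, X))              # (4.16)
print(np.allclose((X @ Al).conj().T, X @ Al))  # (4.17): X A^l Hermitian
print(np.allclose(Al @ X, A @ Ad))             # (4.17): A^l X = A A_d
```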

Exercises

4.10 Show that if A and W satisfy A^ℓWA^ℓ = A^ℓ and (WA^ℓ)^H = WA^ℓ for any positive integer ℓ, then WA^ℓ = (A^ℓ)^+A^ℓ, and conversely. What is the dual form of this result for A^ℓW?

4.11 Prove that (A^ℓ)_L in (4.19) is the unique solution to (4.16) and (4.17).

4.12 Prove that for every ℓ ≥ k, (A^ℓ)^+ = (A^ℓ)_L A^ℓ (A^ℓ)_R and A_d = (A^ℓ)_R A^{2ℓ-1} (A^ℓ)_L.

4.13 Use a sequence of full rank factorizations

    A = B_1C_1, A² = B_1B_2C_2C_1, ...,

to show that A^ℓ(A^ℓ)^+ = A^k(A^k)^+ and (A^ℓ)^+A^ℓ = (A^k)^+A^k for all ℓ ≥ k.

4.14 (Continuation): Show how (A^ℓ)_L and (A^ℓ)_R can be computed from these full rank factorizations.

4.15 Construct (A²)_L and (A²)_R for the matrix A in Example 4.1.

4.16 Prove that if Ax = b is a consistent system of equations and if A has index one, then the general solution of A^n x = b, n = 1,2,..., can be written as x = A_R^n b + (I - A_LA)y, where y is arbitrary. (Note that this expression reduces to x = A^{-n}b when A is nonsingular. The terminology "power inverse" of A was chosen since we use powers of A_R in a similar manner to obtain a particular solution of A^n x = b.)

4.4 Miscellaneous Exercises

4.17 Let B and W be any matrices, m by n and n by m, respectively, and let p be any positive integer.

a. Show that there is a unique matrix X such that

    BWX = XWB,

a unique matrix X such that

    WX = WB(WB)_d^p,

and that the unique X which satisfies both sets of equations is X = B(WB)_d^p.

b. Show that if p ≥ 1, q ≥ -1 and r ≥ 0 are integers such that q + 2r + 2 = p, and if (WB)^q = (WB)_d when q = -1, then

    B(WB)_d^p = B(WB)^q [((WB)^rW)(B(WB)^q)]_d².

(Consequently, the unique X in 4.17a is the (WB)^rW-weighted Drazin inverse of B(WB)^q.)

4.18 Prove that if A and B are any matrices such that A_d² = B_d², then AA_d = BB_d.

5 Other Generalized Inverses

    5.1 Inverses That Are Not Unique

Given matrices A and X, subsets of the relations in (1.1) to (1.5) other than those used to define A^+ and A_d provide additional types of generalized inverses. Although not unique, some of these generalized inverses exhibit the essential properties of A^+ required in various applications. For example, observe that only the condition AXA = A was needed to characterize consistent systems of equations Ax = b by the relation AXb = b in (2.4). Moreover, if A and X also satisfy (XA)^H = XA, then XA = A^+A, by Exercise 4.10, and with A^+b a particular solution of Ax = b, the general solution in Exercise 2.21 can be written as

    x = A^+b + (I - A^+A)y,

with the orthogonal decomposition

    ‖x‖² = ‖A^+b‖² + ‖(I - A^+A)y‖².

(Note that this is an extension of the special case of matrices with full row rank used in the proof of Theorem 2.) In this section we consider relationships among certain of these generalized inverses in terms of full rank factorizations, and illustrate the construction of such inverses with numerical examples.

For any A and X such that AXA = A, rank(X) ≥ rank(A), whereas XAX = X implies rank(X) ≤ rank(A). The following lemma characterizes solutions of AXA = A and XAX = X in terms of group inverses.

LEMMA 13: For any full rank factorizations A = BC and X = YZ, AXA = A and XAX = X if and only if AX = (BZ)#BZ and XA = (YC)#YC.

Proof: If A = BC and X = YZ are full rank factorizations where B is m by r, C is r by n, Y is n by s and Z is s by m, then AXA = A implies

(5.1)    CYZB = I_r,

and XAX = X implies

(5.2)    ZBCY = I_s.

Consequently, with r = s, ZB = (CY)^{-1} so that

    AX = BCYZ = B(ZB)^{-1}Z = (BZ)#BZ

and

    XA = YZBC = Y(CY)^{-1}C = (YC)#YC,

by Lemma 10.

Conversely, since Z and C have full row rank, (BZ)#BZB = B and (YC)#YCY = Y. Hence AX = (BZ)#BZ gives AXA = A, and XA = (YC)#YC gives XAX = X.

It should be noted that the relation in (5.1) is both necessary and sufficient to have AXA = A, and does not require that YZ be a full rank factorization. Dually, (5.2) is both necessary and sufficient to have XAX = X, and BC need not be a full rank factorization. Observe, moreover, that given any matrix A with full rank factorization A = BC, then for any choice of Y such that CY has full column rank, taking Z = (CY)_LB_L, with (CY)_L any left inverse of CY and B_L any left inverse of B, gives a matrix X = YZ such that (5.2) holds. Therefore, we can always construct matrices X of any given rank not exceeding the rank of A with XAX = X. On the other hand, given full rank factorizations A = BC and X = YZ such that AXA = A and XAX = X, then for any matrix U with full column rank satisfying CU = 0 and for any matrix V with UVA defined, we have

(5.3)    A(X + UV)A = A.

Now

(5.4)    X + UV = [Y  U] [ Z
                           V ],

where the first matrix on the right-hand side has full column rank (Exercise 5.6). Thus, for any choice of V such that the second matrix on the right-hand side of (5.4) has full row rank, (5.3) holds and rank(X + UV) > rank(A).

The following example illustrates the construction of matrices X of prescribed rank such that A and X satisfy at least one of the conditions AXA = A and XAX = X.

Example 5.1

Let A be the matrix

    A = [ 6  4   0
          3  5  -3
          3  3  -1 ]

from Example 4.1, with full rank factorization

    A = BC = [ 1  2      [ 0  2  -2
               2  1        3  1   1 ].
               1  1 ]

Then rank(A) = 2, and X_0 = 0 satisfies X_0AX_0 = X_0 trivially. To construct a matrix X_1 of rank one such that X_1AX_1 = X_1, note first that

(5.5)    B_L = (1/3) [ -1   2  0
                        2  -1  0 ]

is a left inverse of B. Now if y^H = [3  4  5], then

    Cy = [ -2
           18 ],    (Cy)^+ = (1/164)[-1  9],    z^H = (Cy)^+B_L = (1/492)[19  -11  0],

so that

    X_1 = yz^H = (1/492) [ 57  -33  0
                           76  -44  0
                           95  -55  0 ].

To next construct a matrix X_2 of rank two such that X_2AX_2 = X_2 (and thus AX_2A = A), let

    Y = [ 1  -1
          1   1
          1  -1 ].

Then

    CY = [ 0   4
           5  -3 ],    (CY)^{-1} = (1/20) [ 3  4
                                            5  0 ],

and with B_L the left inverse of B in (5.5),

    Z = (CY)^{-1}B_L = (1/60) [  5   2  0
                                -5  10  0 ],

so that

    X_2 = YZ = (1/30) [ 5  -4  0
                        0   6  0
                        5  -4  0 ].

Finally, to construct a matrix X_3 of rank three such that AX_3A = A, let

    X_3 = X_2 + uv^H,

where u ∈ N(C) and v^H is chosen so that X_3 is nonsingular; the choice made here gives det X_3 = -5.
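The constructions of X_1 and X_2 in Example 5.1 can be replayed numerically. The following is a minimal sketch assuming numpy, with A, B, C and B_L as in the example.

```python
import numpy as np

A = np.array([[6., 4., 0.], [3., 5., -3.], [3., 3., -1.]])
B = np.array([[1., 2.], [2., 1.], [1., 1.]])
C = np.array([[0., 2., -2.], [3., 1., 1.]])
BL = np.array([[-1., 2., 0.], [2., -1., 0.]]) / 3   # a left inverse of B

# Rank one: Y = y, Z = (Cy)^+ B_L, X1 = y Z.
y = np.array([[3.], [4.], [5.]])
Z1 = np.linalg.pinv(C @ y) @ BL
X1 = y @ Z1
print(np.allclose(X1 @ A @ X1, X1))   # X1 A X1 = X1, rank one

# Rank two: Z = (CY)^(-1) B_L, X2 = Y Z, so that A X2 A = A as well.
Y = np.array([[1., -1.], [1., 1.], [1., -1.]])
X2 = Y @ np.linalg.inv(C @ Y) @ BL
print(np.allclose(X2 @ A @ X2, X2))
print(np.allclose(A @ X2 @ A, A))
```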

That the procedure in Example 5.1 can be extended to construct matrices X of given rank satisfying (AX)^H = AX and at least one of the conditions AXA = A and XAX = X is apparent by observing that CY with full column rank implies BCY has full column rank. Hence, taking Z = (BCY)^+, (5.2) holds and AX = BCY(BCY)^+ is Hermitian. In the following example we indicate matrices Z in Example 5.1 so that the resulting matrices X_i satisfy (AX_i)^H = AX_i, i = 1,2,3.

Example 5.2

Given the matrix A and full rank factorization A = BC in Example 5.1, again let y^H = [3  4  5]. Then

    Ay = BCy = [ 34
                 14
                 16 ],    z^H = (Ay)^+ = (1/804)[17  7  8],

and

    X_1 = yz^H = (1/804) [ 51  21  24
                           68  28  32
                           85  35  40 ]

satisfies X_1AX_1 = X_1 with AX_1 Hermitian and rank(X_1) = 1. Continuing, if we again use

    Y = [ 1  -1
          1   1
          1  -1 ],

then

    AY = BCY = [ 10  -2
                  5   5
                  5   1 ],    Z = (AY)^+ = (1/220) [  16   5  7
                                                     -20  35  5 ],

and

    X_2 = YZ = (1/110) [ 18  -15  1
                         -2   20  6
                         18  -15  1 ]

with X_2AX_2 = X_2, AX_2 Hermitian and rank(X_2) = 2.
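The choice Z = (AY)^+ above is equally easy to check; a short sketch assuming numpy, with A and Y as in Example 5.2:

```python
import numpy as np

A = np.array([[6., 4., 0.], [3., 5., -3.], [3., 3., -1.]])
Y = np.array([[1., -1.], [1., 1.], [1., -1.]])

# Taking Z = (AY)^+ makes AX = AY(AY)^+ an orthogonal projector,
# hence Hermitian, while X = YZ still satisfies X A X = X.
X2 = Y @ np.linalg.pinv(A @ Y)

print(np.allclose(X2 @ A @ X2, X2))
print(np.allclose(A @ X2, (A @ X2).conj().T))   # A X2 Hermitian
```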