Lecture 2: Generalized Inverses
Moore’s plan
The striking analogies between the theories for linear equations in
n–dimensional Euclidean space, for Fredholm integral equations in
the space of continuous functions defined on a finite real interval,
and for linear equations in Hilbert space of infinitely many
dimensions, led Moore to lay down his well–known principle.
“The existence of analogies between central features of various
theories implies the existence of a more fundamental general
theory embracing the special theories as particular instances and
unifying them as to those central features.” (Moore, 1912)
“The effectiveness of the reciprocal of a non–singular finite
matrix in the study of properties of such matrices makes it
desirable to define if possible an analogous matrix to be
associated with each finite matrix even if it is not square or, if
square, is not necessarily non–singular.” (Moore, 1935)
Desiderata
C^{m×n}_r = the m × n matrices over C with rank r.
A matrix A ∈ C^{n×n} is nonsingular if rank A = n, or det A ≠ 0.
The inverse X = A^{−1} of A satisfies, by definition, the following equations:
AXA = A (1)
XAX = X (2)
(AX)∗ = AX (3)
(XA)∗ = XA (4)
AX = XA (5)
as well as the conditions
Ax = λx =⇒ A^{−1}x = (1/λ) x (6)
A, B nonsingular =⇒ (AB)^{−1} = B^{−1}A^{−1} (7)
These properties are desirable; can one have them for a general A ?
The Penrose equations
The Penrose equations for A ∈ C^{m×n} are:
AXA = A , (1)
XAX = X , (2)
(AX)∗ = AX , (3)
(XA)∗ = XA . (4)
Let A{i, j, . . . , k} denote the set of matrices X ∈ C^{n×m} which
satisfy equations (i), (j), . . . , (k).
A matrix X ∈ A{i, j, . . . , k} is called an {i, j, . . . , k}–inverse of A,
and is also denoted by A^{(i,j,...,k)}.
In particular, a {1}–inverse, a {2}–inverse, a {1, 3}–inverse, etc.
The Moore–Penrose inverse of A is its {1, 2, 3, 4}–inverse,
denoted A†.
Why was Moore’s work unknown in 1955?
Answer: Telegraphic style and idiosyncratic notation. Example:
(29.3) Theorem.
UC B1 II B2 II κ12·) ·
∃ |λ21 type M2
κ∗ M1
κ � ·S2 κ12 λ21 = δ11M1
κ
· S1 λ21 κ12 = δ22M2
κ∗
English translation:
(29.3) Theorem.
For every matrix A there exists a unique matrix X : R(A) → R(A∗)
such that
AX = P_{R(A)} , XA = P_{R(A∗)} .
Construction of {1}–inverses
Given A ∈ C^{m×n}_r , let E ∈ C^{m×m}_m and P ∈ C^{n×n}_n be such that

EAP = [ I_r  K ]
      [ O    O ] .     (1)

Then for any L ∈ C^{(n−r)×(m−r)}, the n × m matrix

X = P [ I_r  O ]
      [ O    L ] E     (2)

is a {1}–inverse of A. The partitioned matrices in (1), (2) must be
suitably interpreted in case r = m or r = n.
Proof. Write (1) as

A = E^{−1} [ I_r  K ]
           [ O    O ] P^{−1} ,

then verify that any X given by (2) satisfies AXA = A. ∎
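A quick numerical sketch of this construction (in NumPy; the matrices E, P, K and L below are arbitrary illustrative choices, and A is built so that (1) holds by construction):

```python
import numpy as np

# The slide's recipe: pick nonsingular E, P and build
# A = E^{-1} [[I_r, K], [0, 0]] P^{-1}; then X = P [[I_r, 0], [0, L]] E
# is a {1}-inverse of A for ANY choice of L.
rng = np.random.default_rng(0)
m, n, r = 4, 5, 2
E = rng.standard_normal((m, m))          # nonsingular with probability 1
P = rng.standard_normal((n, n))
K = rng.standard_normal((r, n - r))
EAP = np.vstack([np.hstack([np.eye(r), K]),
                 np.zeros((m - r, n))])
A = np.linalg.inv(E) @ EAP @ np.linalg.inv(P)

L = rng.standard_normal((n - r, m - r))  # arbitrary
X = P @ np.block([[np.eye(r), np.zeros((r, m - r))],
                  [np.zeros((n - r, r)), L]]) @ E

assert np.allclose(A @ X @ A, A)         # X satisfies Penrose equation (1)
```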
Linear equations
Given A ∈ C^{m×n} , b ∈ C^m, the equations
Ax = b (1)
have a solution if and only if for any X ∈ A{1},
AXb = b , (2)
in which case the general solution is
x = X b + (I − XA)y , y ∈ C^n arbitrary . (3)
Proof. AXA = A =⇒ AX idempotent, rank AX = rank A.
∴ AX = P_{R(A),M} , for some M such that C^m = R(A) ⊕ M .
Ax = b consistent ⇐⇒ b ∈ R(A) ⇐⇒ P_{R(A),M} b = b , ∀ M
Finally, A (X b + (I − XA)y) = AX b = b . ∎
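The solvability test (2) and the general solution (3) are easy to check numerically; here np.linalg.pinv(A) stands in for the {1}-inverse X (the Moore–Penrose inverse is in particular a {1}-inverse, but any other would do):

```python
import numpy as np

# Consistency test AXb = b and general solution x = Xb + (I - XA)y.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))  # rank <= 3
X = np.linalg.pinv(A)                    # a {1}-inverse of A

b = A @ rng.standard_normal(6)           # b in R(A): consistent by construction
assert np.allclose(A @ X @ b, b)         # consistency test (2)

y = rng.standard_normal(6)               # arbitrary
x = X @ b + (np.eye(6) - X @ A) @ y      # general solution (3)
assert np.allclose(A @ x, b)
```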
Linear matrix equations
Theorem. Let A ∈ C^{m×n} , B ∈ C^{p×q} , D ∈ C^{m×q}. Then the
matrix equation
AXB = D (1)
is consistent if and only if for some A^{(1)}, B^{(1)},
A A^{(1)} D B^{(1)} B = D , (2)
in which case the general solution is
X = A^{(1)} D B^{(1)} + Y − A^{(1)} A Y B B^{(1)} (3)
for arbitrary Y ∈ C^{n×p}.
Proof. If (1) is consistent then
D = AXB = A A^{(1)} A X B B^{(1)} B = A A^{(1)} D B^{(1)} B .
Conversely, if (2) holds, then every X of the form (3) satisfies
AXB = A A^{(1)} D B^{(1)} B + AY B − A A^{(1)} A Y B B^{(1)} B = D . ∎
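A numerical check of the consistency condition and the general solution, with the Moore–Penrose inverses standing in for A^(1) and B^(1) (any {1}-inverses would serve):

```python
import numpy as np

# Condition (2): A A^(1) D B^(1) B = D, and general solution (3) for AXB = D.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 6))
D = A @ rng.standard_normal((3, 5)) @ B     # consistent by construction
Ai, Bi = np.linalg.pinv(A), np.linalg.pinv(B)

assert np.allclose(A @ Ai @ D @ Bi @ B, D)  # condition (2)

Y = rng.standard_normal((3, 5))             # arbitrary
X = Ai @ D @ Bi + Y - Ai @ A @ Y @ B @ Bi   # solution (3)
assert np.allclose(A @ X @ B, D)
```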
Kronecker products and matrix equations
The Kronecker product A ⊗ B of the two matrices
A = (a_{ij}) ∈ C^{m×n} , B ∈ C^{p×q} is the mp × nq matrix

A ⊗ B = [ a_{11}B  a_{12}B  · · ·  a_{1n}B ]
        [ a_{21}B  a_{22}B  · · ·  a_{2n}B ]
        [   · · ·    · · ·  · · ·    · · · ]
        [ a_{m1}B  a_{m2}B  · · ·  a_{mn}B ]

For X = (x_{ij}) ∈ C^{m×n}, let vec(X) = (v_k) ∈ C^{mn} be the vector
obtained by listing the elements of X by rows,
v_{n(i−1)+j} = x_{ij}   (i ∈ 1, m ; j ∈ 1, n)
Lemma. For compatible matrices A, X, B
(A ⊗ B^T) vec(X) = vec(AXB)
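The lemma can be verified directly in NumPy. Note that the slide's vec lists entries of X by rows, which matches NumPy's default (C-order) ravel:

```python
import numpy as np

# With row-wise vec, vec(AXB) = (A kron B^T) vec(X).
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 2))

lhs = np.kron(A, B.T) @ X.ravel()   # (A ⊗ B^T) vec(X)
rhs = (A @ X @ B).ravel()           # vec(AXB)
assert np.allclose(lhs, rhs)
```

This is why the matrix equation AXB = D can be rewritten as the ordinary linear system (A ⊗ B^T) vec(X) = vec(D).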
Construction of {1, 2}–inverses
Proposition. Let Y, Z ∈ A{1}, and let
X = Y AZ .
Then X ∈ A{1, 2}.
Proof. AXA = A(Y AZ)A = (AY A)ZA = AZA = A ,
XAX = (Y AZ)A(Y AZ) = Y (AZA)Y AZ = Y (AY A)Z = Y AZ = X . ∎
Proposition. Any two of the following statements imply the third:
(a) X ∈ A{1} ,
(b) X ∈ A{2} ,
(c) rank X = rank A .
Proof. X ∈ A{1}, Y ∈ A{2} =⇒ rank Y ≤ rank A ≤ rank X , etc.
Projections
Theorem. For any A ∈ C^{m×n} and any A^{(1)} ∈ A{1} :
R(AA^{(1)}) = R(A) , N(A^{(1)}A) = N(A) , R((A^{(1)}A)∗) = R(A∗) .
Proof. Always R(AX) ⊂ R(A) , N(A) ⊂ N(XA) .
But AXA = A =⇒ rank AX = rank XA = rank A . ∎
Theorem. Let X be a {1, 2}–inverse of A. Then:
(a) AX is the projector on R(A) along N(X), and
(b) XA is the projector on R(X) along N(A).
Proof. AX = (AX)^2 =⇒ AX = P_{R(AX),N(AX)}
AXA = A =⇒ R(AX) = R(A)
XAX = X , rank AX = rank X =⇒ N(AX) = N(X) ∎
The set of {1, 3}–inverses
Theorem. The set A{1, 3} consists of all solutions for X of
AX = AA^{(1,3)} , (1)
where A^{(1,3)} is an arbitrary element of A{1, 3}.
Proof. If X satisfies (1), then
AXA = AA^{(1,3)}A = A , AX = (AX)∗ . ∴ X ∈ A{1, 3} .
Conversely, if X ∈ A{1, 3}, then
AA^{(1,3)} = AXAA^{(1,3)} = (AX)∗AA^{(1,3)} = X∗A∗(A^{(1,3)})∗A∗
= X∗A∗ = AX .
Theorem. The set A{1, 4} consists of all solutions for X of
XA = A^{(1,4)}A .
Characterizations of {1, 3}, and {1, 4}–inverses
Recall that, for C^n = L ⊕ M ,
M = L^⊥ ⇐⇒ P_{L,M} is Hermitian .
Theorem. For any A ∈ C^{m×n}:
(a) AX = P_{R(A)} ⇐⇒ X ∈ A{1, 3}
(b) XA = P_{R(A∗)} ⇐⇒ X ∈ A{1, 4}
Proof. (a) ⇐= :
AXA = A =⇒ AX = P_{R(AX),N(AX)}
AXA = A =⇒ R(AX) = R(A) ∴ AX = P_{R(A),N(AX)}
AX = (AX)∗ =⇒ N(AX) = R(A)^⊥ ∴ AX = P_{R(A)}
(a) =⇒ : AX = P_{R(A)} = AA^{(1,3)} =⇒ X ∈ A{1, 3}
{1, 2, 3}, and {1, 2, 4}–inverses
Theorem (Urquhart). For every A ∈ C^{m×n} ,
(A∗A)^{(1)}A∗ ∈ A{1, 2, 3} , (a)
A∗(AA∗)^{(1)} ∈ A{1, 2, 4} , (b)
A^{(1,4)}AA^{(1,3)} ∈ A{1, 2, 3, 4} . (c)
Proof of (a). Let X := (A∗A)^{(1)}A∗.
R(A∗A) = R(A∗) (why?) =⇒ A∗ = A∗AU for some U , ∴ A = U∗A∗A
∴ AXA = U∗A∗A(A∗A)^{(1)}A∗A = U∗A∗A = A ∴ X ∈ A{1}
rank X ≤ rank A∗ and X ∈ A{1} =⇒ rank X ≥ rank A
∴ rank X = rank A ∴ X ∈ A{2}
Finally,
AX = U∗A∗A(A∗A)^{(1)}A∗AU = U∗A∗AU , Hermitian , ∴ X ∈ A{3} ∎
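Urquhart's constructions are easy to test numerically; below, np.linalg.pinv supplies the inner {1}-inverses (A∗A)^(1) and (AA∗)^(1):

```python
import numpy as np

# Urquhart's constructions checked on a rank-deficient real matrix.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
Ah = A.conj().T

X = np.linalg.pinv(Ah @ A) @ Ah     # (a): a {1,2,3}-inverse
assert np.allclose(A @ X @ A, A)                    # (1)
assert np.allclose(X @ A @ X, X)                    # (2)
assert np.allclose((A @ X).conj().T, A @ X)         # (3)

Z = Ah @ np.linalg.pinv(A @ Ah)     # (b): a {1,2,4}-inverse
assert np.allclose((Z @ A).conj().T, Z @ A)         # (4)

# (c): A^(1,4) A A^(1,3) is the (unique) {1,2,3,4}-inverse
assert np.allclose(Z @ A @ X, np.linalg.pinv(A))
```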
The Moore–Penrose inverse
Theorem (Penrose). Given A ∈ C^{m×n}, a solution of
AXA = A , (1)
XAX = X , (2)
(AX)∗ = AX , (3)
(XA)∗ = XA , (4)
exists and is unique. The {1, 2, 3, 4}–inverse of A is denoted A†.
Proof. Uniqueness. Let X, Y ∈ A{1, 2, 3, 4}. Then
X = X(AX)∗ = XX∗A∗ = X(AX)∗(AY )∗
= XAY = (XA)∗(Y A)∗Y = A∗Y ∗Y
= (Y A)∗Y = Y .
Existence. A† = A^{(1,4)}AA^{(1,3)} . ∎
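np.linalg.pinv computes exactly this Moore–Penrose inverse; a quick check that it satisfies all four Penrose equations on a rank-deficient rectangular matrix:

```python
import numpy as np

# All four Penrose equations for X = A† (real case, so * is transpose).
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2
X = np.linalg.pinv(A)

assert np.allclose(A @ X @ A, A)        # (1)
assert np.allclose(X @ A @ X, X)        # (2)
assert np.allclose((A @ X).T, A @ X)    # (3)
assert np.allclose((X @ A).T, X @ A)    # (4)
```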
Full–rank factorization
Given A ∈ C^{m×n}_r , r > 0, a full–rank factorization is
A = CR , C ∈ C^{m×r}_r , R ∈ C^{r×n}_r (1)
Theorem (MacDuffee). Given A ∈ C^{m×n}_r , r > 0, and C, R as in (1),
A† = R∗(C∗AR∗)^{−1}C∗ . (2)
Proof. C∗AR∗ is nonsingular, because
C∗AR∗ = (C∗C)(RR∗) , a product of nonsingular matrices .
Let X = RHS(2) = R∗(RR∗)^{−1}(C∗C)^{−1}C∗ , and check that X
satisfies the 4 Penrose equations. ∎
A† = R∗(RR∗)^{−1}(C∗C)^{−1}C∗ = R†C† (3)
Q: What is a “good” method for full–rank factorization ?
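One convenient answer (not the only one) is the truncated SVD: C takes the leading left singular vectors scaled by the singular values, R the leading right singular vectors. A sketch, together with a check of MacDuffee's formula against np.linalg.pinv:

```python
import numpy as np

# Full-rank factorization A = CR via the truncated SVD, then MacDuffee (2).
rng = np.random.default_rng(6)
r = 2
A = rng.standard_normal((5, r)) @ rng.standard_normal((r, 4))  # rank r

U, s, Vh = np.linalg.svd(A)
C = U[:, :r] * s[:r]          # m x r, full column rank
R = Vh[:r, :]                 # r x n, full row rank
assert np.allclose(C @ R, A)  # a full-rank factorization

Adag = R.conj().T @ np.linalg.inv(C.conj().T @ A @ R.conj().T) @ C.conj().T
assert np.allclose(Adag, np.linalg.pinv(A))
```

QR with column pivoting is another standard rank-revealing choice when the full SVD is too expensive.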
Singular value decomposition
Let A ∈ C^{m×n}_r , r > 0, and let
AA∗ u_i = σ_i^2 u_i , i ∈ 1, m
A∗A v_i = σ_i^2 v_i , i ∈ 1, n
σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0 = σ_{r+1} = σ_{r+2} = · · ·
The singular value decomposition (SVD) of A is
A = UΣV ∗ (SVD)
U = [u_1 u_2 · · · u_m] ∈ C^{m×m} , U∗U = I_m ,
V = [v_1 v_2 · · · v_n] ∈ C^{n×n} , V ∗V = I_n ,
Σ = diag(σ_1, σ_2, · · · , σ_r) ∈ R^{m×n} .
Theorem (Penrose). A† = V Σ†U∗ ,
where Σ† = diag(1/σ_1 , 1/σ_2 , · · · , 1/σ_r) ∈ R^{n×m} .
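Building A† from the SVD in this way, and comparing with np.linalg.pinv (which uses the same construction internally, with a tolerance cutoff for the rank):

```python
import numpy as np

# A† = V Σ† U*, with Σ† the transposed Σ and nonzero σ's inverted.
rng = np.random.default_rng(7)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))  # rank 2

U, s, Vh = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))                        # numerical rank
Sig_dag = np.zeros((A.shape[1], A.shape[0]))    # n x m
Sig_dag[:r, :r] = np.diag(1.0 / s[:r])

Adag = Vh.conj().T @ Sig_dag @ U.conj().T
assert np.allclose(Adag, np.linalg.pinv(A))
```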
Properties of the Moore–Penrose inverse
(a) For any scalar λ , λ† = 1/λ if λ ≠ 0 , and λ† = 0 otherwise .
If a,b are column vectors then
(b) a† = (a∗a)†a∗ (c) (ab∗)† = (a∗a)†(b∗b)†ba∗
(d) If D = diag(λ_1, · · · , λ_k) ∈ C^{m×n} then
D† = diag(λ_1†, · · · , λ_k†) ∈ C^{n×m}
For any matrix A
(e) (A†)† = A (f) (A∗)† = (A†)∗
(g) (A^T)† = (A†)^T (h) A† = (A∗A)†A∗ = A∗(AA∗)†
(i) R(A†) = R(A∗) (j) N(A†) = N(A∗)
(k) AA† = PR(A) (l) A†A = PR(A∗)
(m) If U and V are unitary matrices, (UAV )† = V ∗A†U∗
(n) For any matrices A, B: (A ⊗ B)† = A† ⊗ B†
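A numerical spot-check of a few of these properties, using np.linalg.pinv for A† on an illustrative rank-deficient matrix:

```python
import numpy as np

# Properties (e), (h), (k), (n) of the Moore-Penrose inverse.
rng = np.random.default_rng(8)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2
B = rng.standard_normal((3, 3))
Ad = np.linalg.pinv(A)

assert np.allclose(np.linalg.pinv(Ad), A)                 # (e)
assert np.allclose(Ad, np.linalg.pinv(A.T @ A) @ A.T)     # (h)
assert np.allclose(Ad, A.T @ np.linalg.pinv(A @ A.T))     # (h)
assert np.allclose(A @ Ad, (A @ Ad).T)                    # (k): Hermitian projector
assert np.allclose(np.linalg.pinv(np.kron(A, B)),
                   np.kron(Ad, np.linalg.pinv(B)))        # (n)
```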
Non–properties of the Moore–Penrose inverse
(a) In general, for compatible A, B,
(AB)† ≠ B†A†
(b) If A, B are similar, i.e. B = S^{−1}AS for some nonsingular S,
then, in general, B† ≠ S^{−1}A†S .
(c) If J_k(0) is a Jordan block corresponding to the eigenvalue zero,
then (J_k(0))† = (J_k(0))^T . For example,

[ 0 1 0 0 ]†   [ 0 0 0 0 ]
[ 0 0 1 0 ]  = [ 1 0 0 0 ]
[ 0 0 0 1 ]    [ 0 1 0 0 ]
[ 0 0 0 0 ]    [ 0 0 1 0 ]

∴ A† is not, in general, a polynomial in A : every polynomial in J_k(0)
is upper triangular, while (J_k(0))† = (J_k(0))^T is not.
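The 4 × 4 example above in NumPy:

```python
import numpy as np

# The nilpotent Jordan block J_4(0): its Moore-Penrose inverse is its transpose.
J = np.diag(np.ones(3), k=1)    # ones on the superdiagonal
assert np.allclose(np.linalg.pinv(J), J.T)
```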
Continuity of the inverse
Let ‖ · ‖ be a multiplicative matrix norm, i.e.
‖XY ‖ ≤ ‖X‖‖Y ‖ , if XY is defined .
Let X ∈ C^{n×n}_n . Then the perturbation (X + E) = (I + EX^{−1})X
is nonsingular for all E such that ‖E‖ < 1/‖X^{−1}‖ , and its inverse is
(X + E)^{−1} = X^{−1}(I − EX^{−1} + (EX^{−1})^2 − (EX^{−1})^3 + · · · ) ,
which converges if
‖EX^{−1}‖ < 1 , guaranteed by ‖E‖ < 1/‖X^{−1}‖ .
The inverse is therefore a continuous function C^{n×n}_n → C^{n×n}_n , and the
nonsingular matrices are an open set in C^{n×n}.
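The Neumann series above can be checked numerically; the matrices below are small illustrative choices for which ‖EX⁻¹‖ < 1 holds comfortably:

```python
import numpy as np

# Truncated Neumann series: (X+E)^{-1} = X^{-1} sum_k (-E X^{-1})^k.
rng = np.random.default_rng(9)
X = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
Xi = np.linalg.inv(X)
E = 0.01 * rng.standard_normal((3, 3))
assert np.linalg.norm(E @ Xi) < 1       # convergence condition

S = np.zeros((3, 3))
T = np.eye(3)
for _ in range(60):                     # partial sums of sum_k (-E X^{-1})^k
    S = S + T
    T = T @ (-E @ Xi)

assert np.allclose(Xi @ S, np.linalg.inv(X + E))
```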
The Moore–Penrose inverse is discontinuous
Ex. Let

X(ε) = [ 1 0 ]  →  X(0) = [ 1 0 ] , as ε → 0 .
       [ 0 ε ]            [ 0 0 ]

But

X(ε)† = [ 1  0  ]  ↛  X(0)† = [ 1 0 ] .
        [ 0 1/ε ]             [ 0 0 ]

For perturbations E_k → O,
(X + E_k)† → X† ⇐⇒ rank (X + E_k) → rank X
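The example in NumPy: the perturbed matrices converge, but the pseudoinverses diverge (the gap grows like 1/ε because the rank drops in the limit):

```python
import numpy as np

# pinv is discontinuous at rank-dropping limits: ||X(eps)† - X(0)†|| ~ 1/eps.
X0 = np.array([[1.0, 0.0], [0.0, 0.0]])
for eps in (1e-2, 1e-4, 1e-8):
    Xe = np.array([[1.0, 0.0], [0.0, eps]])
    gap = np.linalg.norm(np.linalg.pinv(Xe) - np.linalg.pinv(X0))
    assert gap > 0.5 / eps          # blows up instead of converging
```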
The Smith normal form
A nonsingular matrix A ∈ Z^{n×n} whose inverse A^{−1} is also in Z^{n×n}
is called a unit matrix (also called unimodular).
Two matrices A, S ∈ Z^{m×n} are said to be equivalent over Z if
there exist two unit matrices P ∈ Z^{m×m} and Q ∈ Z^{n×n} such that
PAQ = S . (1)
Theorem. Let A ∈ Z^{m×n}_r . Then A is equivalent over Z to a
matrix S = [s_{ij}] ∈ Z^{m×n}_r such that:
(a) s_{ii} ≠ 0 , i ∈ 1, r,
(b) s_{ij} = 0 otherwise, and
(c) s_{ii} divides s_{i+1,i+1} for i ∈ 1, r − 1.
S is called the Smith normal form of A, and its nonzero
elements s_{ii} (i ∈ 1, r) are the invariant factors of A.
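The invariant factors can also be obtained as s_k = d_k/d_{k−1}, where d_k is the gcd of all k × k minors. The function below is an illustrative (and deliberately inefficient) implementation of that characterization, not a practical Smith normal form algorithm:

```python
import numpy as np
from math import gcd
from itertools import combinations

def invariant_factors(A):
    """Invariant factors s_k = d_k / d_{k-1}, d_k = gcd of all k x k minors."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    factors, d_prev = [], 1
    for k in range(1, min(m, n) + 1):
        d_k = 0
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                minor = round(np.linalg.det(A[np.ix_(rows, cols)]))
                d_k = gcd(d_k, abs(minor))
        if d_k == 0:            # all k x k minors vanish: rank < k
            break
        factors.append(d_k // d_prev)
        d_prev = d_k
    return factors

# diag(2, 4) is the Smith normal form of [[2, 4], [6, 8]]
assert invariant_factors([[2, 4], [6, 8]]) == [2, 4]
```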
Integer solutions
Let A ∈ Z^{m×n} , b ∈ Z^m and let the linear equation
Ax = b (P)
be consistent. It is required to determine whether (P) has an integer
solution, and if so, to determine all of them.
Theorem (Hurt and Waid). Let A ∈ Z^{m×n}. Then there is an
n × m matrix X satisfying
AXA = A , (1)
XAX = X , (2)
AX ∈ Z^{m×m} , XA ∈ Z^{n×n} . (6)
Proof. Let PAQ = S be the Smith normal form of A. Then take
X = QS†P . ∎
Integer solutions (cont’d)
Let X = QS†P be the {1, 2}–inverse of A constructed above.
Theorem (Hurt and Waid). Let A and b be integral, and let
the vector equation
Ax = b (P)
be consistent. Then (P) has an integral solution if and only if the
vector
X b
is integral, in which case the general integral solution of (P) is
x = X b + (I − XA)y , y ∈ Z^n .
Application of {2}–inverses to Newton’s method
The Newton method for solving a single equation in one variable,
f(x) = 0 ,
is
x_{k+1} = x_k − f(x_k)/f′(x_k) , (k = 0, 1, . . .) .
A Newton method for solving m equations in n variables,
f_i(x_1, . . . , x_n) = 0 , i ∈ 1, m , or f(x) = 0 ,
is similarly given, for the case m = n, by
x_{k+1} = x_k − f′(x_k)^{−1} f(x_k) , (k = 0, 1, . . .) ,
where f′(x_k) is the derivative of f at x_k, represented by the
matrix of partial derivatives (the Jacobian matrix)
f′(x_k) = ( ∂f_i/∂x_j (x_k) ) .
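A minimal sketch of this iteration on a toy 2 × 2 system with root (1, 1); here the Moore–Penrose inverse of the Jacobian is used in place of f′(x_k)⁻¹ (it is in particular a {2}-inverse, anticipating the theorem below, and reduces to the ordinary inverse when the Jacobian is square and nonsingular):

```python
import numpy as np

# Newton iteration x_{k+1} = x_k - T f(x_k), with T = pinv(Jacobian),
# on the system f1 = x^2 - y, f2 = y^2 - x, which has the root (1, 1).
def f(v):
    x, y = v
    return np.array([x**2 - y, y**2 - x])

def J(v):
    x, y = v
    return np.array([[2*x, -1.0], [-1.0, 2*y]])

x = np.array([1.3, 0.8])             # starting point near the root
for _ in range(20):
    x = x - np.linalg.pinv(J(x)) @ f(x)

assert np.allclose(f(x), 0.0)
assert np.allclose(x, [1.0, 1.0])
```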
Notation
We denote the derivative of f at c,
f′(c) = ( ∂f_i/∂x_j (c) ) ,
by J_f (c) or by J_c .
We denote by ‖ · ‖ both a vector norm in R^n and a matrix norm
consistent with it,
‖Ax‖ ≤ ‖A‖‖x‖ , ∀ x .
For a given point x_0 ∈ R^n and a positive scalar r we denote by
B(x_0, r) = {x ∈ R^n : ‖x − x_0‖ < r}
the open ball with center x_0 and radius r. The closed ball
with the same center and radius is
B̄(x_0, r) = {x ∈ R^n : ‖x − x_0‖ ≤ r} .
Newton method using {2}–inverses of f ′
Theorem. Let x_0 ∈ R^n, r > 0, and let f : R^n → R^m be
differentiable in B(x_0, r). Let M > 0 be such that
‖J_u − J_v‖ ≤ M ‖u − v‖ (1)
for all u, v ∈ B(x_0, r). Further, assume that for all x ∈ B(x_0, r),
the Jacobian J_x has a {2}–inverse T_x ∈ R^{n×m}, T_x J_x T_x = T_x ,
such that
‖T_{x_0}‖ ‖f(x_0)‖ < α , (2)
‖(T_u − T_v) f(v)‖ ≤ N ‖u − v‖^2 , ∀ u, v ∈ B(x_0, r) (3)
(M/2) ‖T_u‖ + N ≤ K < 1 , ∀ u ∈ B(x_0, r) (4)
for some positive scalars N, K and α, and
h := αK < 1 , α/(1 − h) < r . (5)
Theorem (cont’d)
Then:
(a) Starting at x_0, all iterates
x_{k+1} = x_k − T_{x_k} f(x_k) , k = 0, 1, . . . (6)
lie in B(x_0, r).
(b) The sequence {x_k} converges, as k → ∞, to a point
x_∞ ∈ B(x_0, r), that is a solution of
T_{x_∞} f(x) = 0 . (7)
(c) For all k ≥ 0,
‖x_k − x_∞‖ ≤ α h^{2^k − 1} / (1 − h^{2^k}) . (8)
Since 0 < h < 1, the method is (at least) quadratically convergent.
The iterates converge not to a solution of f(x) = 0, but of (7). The
degree of approximation depends on the {2}–inverse used.