Lecture 2: Generalized Inverses
Transcript of: Lecture 2: Generalized Inverses, Ben-Israel.net (benisrael.net/GI-LECTURE-2.pdf)

• Lecture 2: Generalized Inverses


• Moore’s plan

The striking analogies between the theories for linear equations in

n–dimensional Euclidean space, for Fredholm integral equations in

the space of continuous functions defined on a finite real interval,

and for linear equations in Hilbert space of infinitely many

dimensions, led Moore to lay down his well–known principle.

“The existence of analogies between central features of various

theories implies the existence of a more fundamental general

theory embracing the special theories as particular instances and

unifying them as to those central features.” (Moore, 1912)

“The effectiveness of the reciprocal of a non–singular finite

matrix in the study of properties of such matrices makes it

desirable to define if possible an analogous matrix to be

associated with each finite matrix even if it is not square or, if

square, is not necessarily non–singular.” (Moore 1935)


• Desiderata

C^{m×n}_r = the m × n matrices over C with rank r.

A matrix A ∈ C^{n×n} is nonsingular if rank A = n, or det A ≠ 0.

The inverse X = A^{−1} satisfies, by definition, the equations

AXA = A (1)
XAX = X (2)
(AX)∗ = AX (3)
(XA)∗ = XA (4)
AX = XA (5)

as well as the conditions

Ax = λx =⇒ A^{−1}x = (1/λ)x (6)
A, B nonsingular =⇒ (AB)^{−1} = B^{−1}A^{−1} (7)

These properties are desirable; can one have them for general A?


• The Penrose equations

The Penrose equations for A ∈ C^{m×n} are:

AXA = A , (1)
XAX = X , (2)
(AX)∗ = AX , (3)
(XA)∗ = XA . (4)

Let A{i, j, . . . , k} denote the set of matrices X ∈ C^{n×m} which satisfy equations (i), (j), . . . , (k).

A matrix X ∈ A{i, j, . . . , k} is called an {i, j, . . . , k}–inverse of A, and is also denoted A^{(i,j,...,k)}.

In particular, a {1}–inverse, a {2}–inverse, a {1, 3}–inverse, etc.

The Moore–Penrose inverse of A is its {1, 2, 3, 4}–inverse,

denoted A†.
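A quick numerical check (mine, not from the slides): numpy's np.linalg.pinv computes A†, which is in particular a {1, 2, 3, 4}–inverse, so all four Penrose equations can be verified directly. The 3 × 4 rank-2 test matrix below is arbitrary.

```python
import numpy as np

# An arbitrary rectangular, rank-deficient test matrix (rank 2).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

X = np.linalg.pinv(A)  # numpy's Moore-Penrose inverse A^dagger

# The four Penrose equations (1)-(4):
ok1 = np.allclose(A @ X @ A, A)             # (1) AXA = A
ok2 = np.allclose(X @ A @ X, X)             # (2) XAX = X
ok3 = np.allclose((A @ X).conj().T, A @ X)  # (3) (AX)* = AX
ok4 = np.allclose((X @ A).conj().T, X @ A)  # (4) (XA)* = XA
print(ok1, ok2, ok3, ok4)
```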


• Why was Moore’s work unknown in 1955?

Answer: Telegraphic style and idiosyncratic notation. Example:

(29.3) Theorem. [Stated by Moore in his own symbolic notation, which does not survive transcription and is not reproduced here.]

English translation:

(29.3) Theorem.

For every matrix A there exists a unique matrix X : R(A) → R(A∗)

such that

AX = P_{R(A)} , XA = P_{R(A∗)} .


• Construction of {1}–inverses

Given A ∈ C^{m×n}_r, let E ∈ C^{m×m}_m and P ∈ C^{n×n}_n be such that

EAP = [ I_r K ; O O ] . (1)

Then for any L ∈ C^{(n−r)×(m−r)}, the n × m matrix

X = P [ I_r O ; O L ] E (2)

is a {1}–inverse of A. The partitioned matrices in (1), (2) must be suitably interpreted in case r = m or r = n.

Proof. Write (1) as

A = E^{−1} [ I_r K ; O O ] P^{−1} ,

then verify that any X given by (2) satisfies AXA = A. □


• Linear equations

Given A ∈ C^{m×n}, b ∈ C^m, the equation

Ax = b (1)

has a solution if and only if, for any X ∈ A{1},

AXb = b , (2)

in which case the general solution is

x = Xb + (I − XA)y , y ∈ C^n arbitrary. (3)

Proof. AXA = A =⇒ AX is idempotent and rank AX = rank A.
∴ AX = P_{R(A),M} , for some M such that C^m = R(A) ⊕ M.
Ax = b consistent ⇐⇒ b ∈ R(A) ⇐⇒ P_{R(A),M} b = b , ∀ M.
Finally, A(Xb + (I − XA)y) = AXb = b. □
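The theorem is easy to exercise numerically (a sketch with my own A and b; A† serves as the {1}-inverse X, which it is):

```python
import numpy as np
rng = np.random.default_rng(0)

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # rank 2
b = A @ np.array([1., 1., 1., 1.])    # b in R(A), so Ax = b is consistent

X = np.linalg.pinv(A)                 # A^dagger is in particular a {1}-inverse

consistent = np.allclose(A @ X @ b, b)  # consistency test (2): AXb = b

# General solution (3): x = Xb + (I - XA)y, y arbitrary
y = rng.standard_normal(4)
x = X @ b + (np.eye(4) - X @ A) @ y
solves = np.allclose(A @ x, b)
print(consistent, solves)
```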


• Linear matrix equations

Theorem. Let A ∈ C^{m×n}, B ∈ C^{p×q}, D ∈ C^{m×q}. Then the matrix equation

AXB = D (1)

is consistent if and only if, for some A^{(1)}, B^{(1)},

AA^{(1)}DB^{(1)}B = D , (2)

in which case the general solution is

X = A^{(1)}DB^{(1)} + Y − A^{(1)}AY BB^{(1)} (3)

for arbitrary Y ∈ C^{n×p}.

Proof. If (1) is consistent, then

D = AXB = AA^{(1)}AXBB^{(1)}B = AA^{(1)}DB^{(1)}B .

Conversely, if (2) holds, then X = A^{(1)}DB^{(1)} satisfies (1), and so does any X of the form (3), since then AXB = AA^{(1)}DB^{(1)}B = D. □


• Kronecker products and matrix equations

The Kronecker product A ⊗ B of two matrices A = (a_{ij}) ∈ C^{m×n}, B ∈ C^{p×q} is the mp × nq matrix

A ⊗ B = [ a_{11}B a_{12}B · · · a_{1n}B ]
        [ a_{21}B a_{22}B · · · a_{2n}B ]
        [  · · ·   · · ·   · · ·  · · · ]
        [ a_{m1}B a_{m2}B · · · a_{mn}B ]

For X = (x_{ij}) ∈ C^{m×n}, let vec(X) = (v_k) ∈ C^{mn} be the vector obtained by listing the elements of X by rows,

v_{n(i−1)+j} = x_{ij} (i ∈ 1,m ; j ∈ 1,n)

Lemma. For compatible matrices A, X, B,

(A ⊗ B^T) vec(X) = vec(AXB)
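The lemma can be checked directly; numpy's ravel() is row-major by default, matching the row-wise vec used here (matrix sizes below are arbitrary):

```python
import numpy as np
rng = np.random.default_rng(1)

# Compatible sizes: A is m x n, X is n x p, B is p x q.
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec by rows (numpy's default ravel order) on both sides of the lemma:
lhs = np.kron(A, B.T) @ X.ravel()
rhs = (A @ X @ B).ravel()
match = np.allclose(lhs, rhs)
print(match)
```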


• Construction of {1, 2}–inverses

Proposition. Let Y, Z ∈ A{1}, and let

X = Y AZ .

Then X ∈ A{1, 2}.

Proof. AXA = A(YAZ)A = (AYA)ZA = AZA = A ,

XAX = (YAZ)A(YAZ) = Y(AZA)(YAZ) = Y(AYA)Z = YAZ = X . □

Proposition. Any two of the following statements imply the third:

(a) X ∈ A{1} ,

(b) X ∈ A{2} ,

(c) rank X = rank A .

Proof. X ∈ A{1} =⇒ rank X ≥ rank A ; X ∈ A{2} =⇒ rank X ≤ rank A ; etc. □
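A numerical sketch of the first proposition. Two (generally different) {1}-inverses Y, Z are produced by perturbing A† into the null spaces of A and A∗ (a construction assumed here; it is easily checked to preserve AXA = A), and X = YAZ is then verified to be a {1, 2}-inverse:

```python
import numpy as np
rng = np.random.default_rng(2)

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # 3 x 4, rank 2
Ad = np.linalg.pinv(A)

# Two {1}-inverses of A: A^dagger plus null-space perturbations.
Y = Ad + (np.eye(4) - Ad @ A) @ rng.standard_normal((4, 3))
Z = Ad + rng.standard_normal((4, 3)) @ (np.eye(3) - A @ Ad)
assert np.allclose(A @ Y @ A, A) and np.allclose(A @ Z @ A, A)

X = Y @ A @ Z                         # the proposition's construction
is12 = np.allclose(A @ X @ A, A) and np.allclose(X @ A @ X, X)
print(is12)
```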


• Projections

Theorem. For any A ∈ C^{m×n} and any A^{(1)} ∈ A{1}:

R(AA^{(1)}) = R(A) , N(A^{(1)}A) = N(A) , R((A^{(1)}A)∗) = R(A∗) .

Proof. Always R(AX) ⊂ R(A) , N(A) ⊂ N(XA) .

But AXA = A =⇒ rank AX = rank XA = rank A . □

Theorem. Let X be a {1, 2}–inverse of A. Then:

(a) AX is the projector on R(A) along N(X), and

(b) XA is the projector on R(X) along N(A).

Proof. AX = (AX)^2 =⇒ AX = P_{R(AX),N(AX)} ;

AXA = A =⇒ R(AX) = R(A) ;

XAX = X , rank AX = rank X =⇒ N(AX) = N(X) . □


• The set of {1, 3}–inverses

Theorem. The set A{1, 3} consists of all solutions X of

AX = AA^{(1,3)} , (1)

where A^{(1,3)} is an arbitrary element of A{1, 3}.

Proof. If X satisfies (1), then

AXA = AA^{(1,3)}A = A , AX = (AX)∗ . ∴ X ∈ A{1, 3} .

Conversely, if X ∈ A{1, 3}, then

AA^{(1,3)} = AXAA^{(1,3)} = (AX)∗AA^{(1,3)} = X∗A∗(A^{(1,3)})∗A∗ = X∗A∗ = AX . □

Theorem. The set A{1, 4} consists of all solutions X of

XA = A^{(1,4)}A .


• Characterizations of {1, 3}, and {1, 4}–inverses

Recall that for C^n = L ⊕ M,

M = L^⊥ ⇐⇒ P_{L,M} is Hermitian.

Theorem. For any A ∈ C^{m×n}:

(a) AX = P_{R(A)} ⇐⇒ X ∈ A{1, 3}

(b) XA = P_{R(A∗)} ⇐⇒ X ∈ A{1, 4}

Proof of (a), ⇐= :

AXA = A =⇒ AX = P_{R(AX),N(AX)} ;
AXA = A =⇒ R(AX) = R(A) , ∴ AX = P_{R(A),N(AX)} ;
AX = (AX)∗ =⇒ N(AX) = R(A)^⊥ , ∴ AX = P_{R(A)} .

Proof of (a), =⇒ : AX = P_{R(A)} = AA^{(1,3)} =⇒ X ∈ A{1, 3} . □


• {1, 2, 3}, and {1, 2, 4}–inverses

Theorem (Urquhart). For every A ∈ C^{m×n},

(A∗A)^{(1)}A∗ ∈ A{1, 2, 3} , (a)

A∗(AA∗)^{(1)} ∈ A{1, 2, 4} , (b)

A^{(1,4)}AA^{(1,3)} ∈ A{1, 2, 3, 4} . (c)

Proof of (a). Let X := (A∗A)^{(1)}A∗.

R(A∗A) = R(A∗) (why?) =⇒ A∗ = A∗AU for some U , ∴ A = U∗A∗A .

∴ AXA = U∗A∗A(A∗A)^{(1)}A∗A = U∗A∗A = A , ∴ X ∈ A{1} .

rank X ≤ rank A∗ and X ∈ A{1} =⇒ rank X ≥ rank A ,

∴ rank X = rank A , ∴ X ∈ A{2} . Finally,

AX = U∗A∗A(A∗A)^{(1)}A∗AU = U∗A∗AU , which is Hermitian, ∴ X ∈ A{3} . □
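Urquhart's formulas can be exercised numerically. Below, generic {1}-inverses of A∗A and AA∗ are built by perturbing their pseudoinverses (my own construction, not part of the theorem); parts (a), (b) are checked, and the product in (c) reproduces A†:

```python
import numpy as np
rng = np.random.default_rng(3)

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # 3 x 4, rank 2 (real, so * is just .T)
Ad = np.linalg.pinv(A)

# Generic {1}-inverses of A*A and AA*: pinv plus a null-space perturbation.
M, N = A.T @ A, A @ A.T
Md, Nd = np.linalg.pinv(M), np.linalg.pinv(N)
G = Md + (np.eye(4) - Md @ M) @ rng.standard_normal((4, 4))
H = Nd + rng.standard_normal((3, 3)) @ (np.eye(3) - N @ Nd)

X13 = G @ A.T      # (A*A)^(1) A*  -- a {1,2,3}-inverse by (a)
X14 = A.T @ H      # A* (AA*)^(1)  -- a {1,2,4}-inverse by (b)

ok13 = np.allclose(A @ X13 @ A, A) and np.allclose((A @ X13).T, A @ X13)
ok14 = np.allclose(A @ X14 @ A, A) and np.allclose((X14 @ A).T, X14 @ A)
okc = np.allclose(X14 @ A @ X13, Ad)  # (c) gives the Moore-Penrose inverse
print(ok13, ok14, okc)
```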


• The Moore–Penrose inverse

Theorem (Penrose). Given A ∈ C^{m×n}, a solution X of

AXA = A , (1)
XAX = X , (2)
(AX)∗ = AX , (3)
(XA)∗ = XA , (4)

exists and is unique. The {1, 2, 3, 4}–inverse of A is denoted A†.

Proof. Uniqueness. Let X, Y ∈ A{1, 2, 3, 4}. Then

X = X(AX)∗ = XX∗A∗ = X(AX)∗(AY)∗
  = XAY = (XA)∗(YA)∗Y = A∗Y∗Y
  = (YA)∗Y = Y .

Existence. A† = A^{(1,4)}AA^{(1,3)} . □


• Full–rank factorization

Given A ∈ C^{m×n}_r, r > 0, a full–rank factorization is

A = CR , C ∈ C^{m×r}_r , R ∈ C^{r×n}_r . (1)

Theorem (MacDuffee). Given A ∈ C^{m×n}_r, r > 0, and C, R as in (1),

A† = R∗(C∗AR∗)^{−1}C∗ . (2)

Proof. C∗AR∗ is nonsingular, because

C∗AR∗ = (C∗C)(RR∗) , a product of nonsingular matrices.

Let X = RHS(2) = R∗(RR∗)^{−1}(C∗C)^{−1}C∗ , and check that X satisfies the 4 Penrose equations. □

A† = R∗(RR∗)^{−1}(C∗C)^{−1}C∗ = R†C† (3)

Q: What is a “good” method for full–rank factorization?
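One convenient answer, sketched here with numpy rather than taken from the lecture: the thin SVD supplies a full-rank factorization A = CR, after which MacDuffee's formula (2) reproduces A†:

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # rank 2

# Full-rank factorization from the SVD: C = U_r diag(s_r), R = V_r^T.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
C = U[:, :r] * s[:r]                  # m x r, full column rank
R = Vt[:r, :]                         # r x n, full row rank
assert np.allclose(C @ R, A)

# MacDuffee's formula (2): A^dagger = R*(C* A R*)^{-1} C*.
X = R.T @ np.linalg.inv(C.T @ A @ R.T) @ C.T
ok = np.allclose(X, np.linalg.pinv(A))
print(ok)
```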


• Singular value decomposition

Let A ∈ C^{m×n}_r, r > 0, and let

AA∗ u_i = σ_i^2 u_i , i ∈ 1,m
A∗A v_i = σ_i^2 v_i , i ∈ 1,n
σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0 = σ_{r+1} = σ_{r+2} = · · ·

The singular value decomposition (SVD) of A is

A = UΣV∗ (SVD)

U = [u_1 u_2 · · · u_m] ∈ C^{m×m} , U∗U = I_m ,
V = [v_1 v_2 · · · v_n] ∈ C^{n×n} , V∗V = I_n ,
Σ = diag(σ_1, σ_2, · · · , σ_r) ∈ R^{m×n} .

Theorem (Penrose). A† = V Σ† U∗ , where

Σ† = diag(1/σ_1, 1/σ_2, · · · , 1/σ_r) ∈ R^{n×m}
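A direct numpy rendering of the theorem (the test matrix is arbitrary; note that Σ† is n × m, with the reciprocals of the nonzero singular values on its diagonal):

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # m = 3, n = 4, rank 2
m, n = A.shape

U, s, Vt = np.linalg.svd(A)           # full SVD: A = U Sigma V*
r = int(np.sum(s > 1e-12))

Sd = np.zeros((n, m))                 # Sigma^dagger, n x m
Sd[:r, :r] = np.diag(1.0 / s[:r])

Ad = Vt.T @ Sd @ U.T                  # A^dagger = V Sigma^dagger U*
ok = np.allclose(Ad, np.linalg.pinv(A))
print(ok)
```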


• Properties of the Moore–Penrose inverse

(a) For any scalar λ, λ† = 1/λ if λ ≠ 0 , and λ† = 0 otherwise.

If a, b are column vectors, then
(b) a† = (a∗a)†a∗
(c) (ab∗)† = (a∗a)†(b∗b)†ba∗

(d) If D = diag(λ_1, · · · , λ_k) ∈ C^{m×n} then D† = diag(λ_1†, · · · , λ_k†) ∈ C^{n×m}

For any matrix A:
(e) (A†)† = A
(f) (A∗)† = (A†)∗
(g) (A^T)† = (A†)^T
(h) A† = (A∗A)†A∗ = A∗(AA∗)†
(i) R(A†) = R(A∗)
(j) N(A†) = N(A∗)
(k) AA† = P_{R(A)}
(l) A†A = P_{R(A∗)}
(m) If U and V are unitary matrices, (UAV)† = V∗A†U∗
(n) For any matrices A, B: (A ⊗ B)† = A† ⊗ B†


• Non–properties of the Moore–Penrose inverse

(a) In general, for compatible A, B,

(AB)† ≠ B†A† .

(b) If A, B are similar, i.e. B = S^{−1}AS for some nonsingular S, then, in general, B† ≠ S^{−1}A†S .

(c) If J_k(0) is a Jordan block corresponding to the eigenvalue zero, then (J_k(0))† = (J_k(0))^T. For example,

[ 0 1 0 0 ]†   [ 0 0 0 0 ]
[ 0 0 1 0 ]  = [ 1 0 0 0 ]
[ 0 0 0 1 ]    [ 0 1 0 0 ]
[ 0 0 0 0 ]    [ 0 0 1 0 ]

∴ A† is not a polynomial in A (every polynomial in J_k(0) is upper triangular, but (J_k(0))† is not).


• Continuity of the inverse

Let ‖ · ‖ be a multiplicative matrix norm, i.e.

‖XY‖ ≤ ‖X‖ ‖Y‖ , if XY is defined.

Let X ∈ C^{n×n}_n. Then the perturbation (X + E) = (I + EX^{−1})X is nonsingular for all E such that ‖E‖ < 1/‖X^{−1}‖, and its inverse is

(X + E)^{−1} = X^{−1} (I − EX^{−1} + (EX^{−1})^2 − (EX^{−1})^3 + · · · )

which converges if

‖EX^{−1}‖ < 1 , guaranteed by ‖E‖ < 1/‖X^{−1}‖ .

The inverse is a continuous function C^{n×n}_n → C^{n×n}_n, and the nonsingular matrices are an open set in C^{n×n}.
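The Neumann series above is easy to check numerically (example matrices are mine; the spectral norm serves as the multiplicative norm):

```python
import numpy as np

X = np.array([[3., 1.],
              [0., 2.]])
E = np.array([[0.05, -0.02],
              [0.01, 0.03]])

Xi = np.linalg.inv(X)
assert np.linalg.norm(E, 2) < 1.0 / np.linalg.norm(Xi, 2)  # ||E|| < 1/||X^-1||

# Partial sums of (X+E)^{-1} = X^{-1}(I - EX^{-1} + (EX^{-1})^2 - ...):
S, term = np.eye(2), np.eye(2)
for _ in range(50):
    term = term @ (-E @ Xi)
    S = S + term
ok = np.allclose(Xi @ S, np.linalg.inv(X + E))
print(ok)
```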


• The Moore–Penrose inverse is discontinuous

Ex. Let

X(ε) = [ 1 0 ; 0 ε ] → X(0) = [ 1 0 ; 0 0 ] , as ε → 0 .

But

X(ε)† = [ 1 0 ; 0 1/ε ] does not converge to X(0)† = [ 1 0 ; 0 0 ] .

For perturbations E_k → O,

(X + E_k)† → X† ⇐⇒ rank(X + E_k) → rank X
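The rank condition is visible numerically; the ε-example above diverges, while a rank-preserving perturbation (of my own choosing) converges:

```python
import numpy as np

def X(eps):
    return np.array([[1., 0.],
                     [0., eps]])

X0d = np.linalg.pinv(X(0.0))

# rank X(eps) = 2 for eps != 0 but rank X(0) = 1, so pinv(X(eps))
# moves away from pinv(X(0)) like 1/eps:
err = [np.linalg.norm(np.linalg.pinv(X(e)) - X0d) for e in (1e-2, 1e-4, 1e-6)]
diverges = err[0] < err[1] < err[2]

# A perturbation that keeps the rank at 1 does converge:
Ek = np.array([[1e-6, 0.], [0., 0.]])
converges = np.linalg.norm(np.linalg.pinv(X(0.0) + Ek) - X0d) < 1e-5
print(diverges, converges)
```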


• The Smith normal form

A nonsingular matrix A ∈ Z^{n×n} whose inverse A^{−1} is also in Z^{n×n} is called a unit matrix.

Two matrices A, S ∈ Z^{m×n} are said to be equivalent over Z if there exist two unit matrices P ∈ Z^{m×m} and Q ∈ Z^{n×n} such that

PAQ = S . (1)

Theorem. Let A ∈ Z^{m×n}_r. Then A is equivalent over Z to a matrix S = [s_{ij}] ∈ Z^{m×n}_r such that:

(a) s_{ii} ≠ 0 , i ∈ 1,r ,
(b) s_{ij} = 0 otherwise, and
(c) s_{ii} divides s_{i+1,i+1} for i ∈ 1,r−1 .

S is called the Smith normal form of A, and its nonzero elements s_{ii} (i ∈ 1,r) are the invariant factors of A.


• Integer solutions

Let A ∈ Z^{m×n}, b ∈ Z^m, and let the linear equation

Ax = b (P)

be consistent. It is required to determine whether (P) has an integer solution and, if so, to find all such solutions.

Theorem (Hurt and Waid). Let A ∈ Z^{m×n}. Then there is an n × m matrix X satisfying

AXA = A , (1)
XAX = X , (2)
AX ∈ Z^{m×m} , XA ∈ Z^{n×n} . (6)

Proof. Let PAQ = S be the Smith normal form of A. Then take

X = QS†P . □


• Integer solutions (cont’d)

Let Â = QS†P be the {1, 2}–inverse of A given above.

Theorem (Hurt and Waid). Let A and b be integral, and let the vector equation

Ax = b (P)

be consistent. Then (P) has an integral solution if and only if the vector Âb is integral, in which case the general integral solution of (P) is

x = Âb + (I − ÂA)y , y ∈ Z^n .
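A small hand-worked illustration (my own example, not from the lecture): A = [[2, 4], [4, 8]] has Smith normal form S = diag(2, 0) with explicit unit matrices P, Q, and Â = QS†P satisfies equations (1), (2), (6) and the integrality test:

```python
import numpy as np

A = np.array([[2., 4.],
              [4., 8.]])              # rank 1, integer entries
P = np.array([[1., 0.],
              [-2., 1.]])             # unit matrices: det = +-1
Q = np.array([[1., -2.],
              [0., 1.]])
S = P @ A @ Q                         # Smith normal form
assert np.allclose(S, np.diag([2., 0.]))

Sd = np.array([[0.5, 0.],
               [0., 0.]])             # S^dagger
Ahat = Q @ Sd @ P                     # the Hurt-Waid {1,2}-inverse

# (1), (2), and integrality (6) of AX and XA:
assert np.allclose(A @ Ahat @ A, A) and np.allclose(Ahat @ A @ Ahat, Ahat)
assert np.allclose(A @ Ahat, np.round(A @ Ahat))
assert np.allclose(Ahat @ A, np.round(Ahat @ A))

# b = (2, 4): Ahat b = (1, 0) is integral, so (P) has integer solutions.
b = np.array([2., 4.])
xb = Ahat @ b
integral = np.allclose(xb, np.round(xb)) and np.allclose(A @ xb, b)
print(integral)
```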


• Application of {2}–inverses to Newton’s method

The Newton method for solving a single equation in one variable,

f(x) = 0 ,

is

x_{k+1} = x_k − f(x_k)/f′(x_k) , (k = 0, 1, . . .) .

A Newton method for solving m equations in n variables,

f_i(x_1, . . . , x_n) = 0 , i ∈ 1,m , or f(x) = 0 ,

is similarly given, for the case m = n, by

x_{k+1} = x_k − f′(x_k)^{−1} f(x_k) , (k = 0, 1, . . .) ,

where f′(x_k) is the derivative of f at x_k, represented by the matrix of partial derivatives (the Jacobian matrix)

f′(x_k) = ( ∂f_i/∂x_j (x_k) ) .


• Notation

We denote the derivative of f at c,

f′(c) = ( ∂f_i/∂x_j (c) ) , by J_f(c) or by J_c .

We denote by ‖ · ‖ both a vector norm in R^n and a matrix norm consistent with it,

‖Ax‖ ≤ ‖A‖ ‖x‖ , ∀ x .

For a given point x^0 ∈ R^n and a positive scalar r we denote by

B(x^0, r) = {x ∈ R^n : ‖x − x^0‖ < r}

the open ball with center x^0 and radius r. The closed ball with the same center and radius is

B̄(x^0, r) = {x ∈ R^n : ‖x − x^0‖ ≤ r} .


• Newton method using {2}–inverses of f ′

Theorem. Let x^0 ∈ R^n, r > 0, and let f : R^n → R^m be differentiable in B(x^0, r). Let M > 0 be such that

‖J_u − J_v‖ ≤ M ‖u − v‖ (1)

for all u, v ∈ B(x^0, r). Further, assume that for all x ∈ B(x^0, r) the Jacobian J_x has a {2}–inverse T_x ∈ R^{n×m}, T_x J_x T_x = T_x, such that

‖T_{x^0}‖ ‖f(x^0)‖ < α , (2)

and

‖(T_u − T_v) f(v)‖ ≤ N ‖u − v‖^2 , ∀ u, v ∈ B(x^0, r) , (3)

(M/2) ‖T_u‖ + N ≤ K < 1 , ∀ u ∈ B(x^0, r) , (4)

for some positive scalars N, K and α, and

h := αK < 1 , α/(1 − h) < r . (5)


• Theorem (cont’d)

Then:

(a) Starting at x^0, all iterates

x^{k+1} = x^k − T_{x^k} f(x^k) , k = 0, 1, . . . (6)

lie in B(x^0, r).

(b) The sequence {x^k} converges, as k → ∞, to a point x^∞ ∈ B̄(x^0, r) that is a solution of

T_{x^∞} f(x) = 0 . (7)

(c) For all k ≥ 0,

‖x^k − x^∞‖ ≤ α h^{2^k − 1} / (1 − h^{2^k}) . (8)

Since 0 < h < 1, the method is (at least) quadratically convergent.

The iterates converge not to a solution of f(x) = 0, but of (7). The degree of approximation depends on the {2}–inverse used.
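A minimal sketch of iteration (6), with J† (numpy's pinv, in particular a {2}-inverse of the Jacobian) as T_x, for one equation in two unknowns. In this example the limit happens to solve f(x) = 0 itself:

```python
import numpy as np

def f(x):                       # one equation, two unknowns: the unit circle
    return np.array([x[0]**2 + x[1]**2 - 1.0])

def J(x):                       # 1 x 2 Jacobian matrix
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

x = np.array([2.0, 1.0])        # starting point x^0
for _ in range(20):
    T = np.linalg.pinv(J(x))    # J^dagger is in particular a {2}-inverse
    x = x - T @ f(x)

residual = abs(f(x)[0])
print(x, residual)
```

Each iterate moves along the ray through x^0, so the sequence converges to the point of the circle nearest the starting point.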
