Lecture 2: Generalized Inverses (A. Ben-Israel, benisrael.net/GI-LECTURE-2.pdf)

  • Lecture 2: Generalized Inverses


  • Moore’s plan

    The striking analogies between the theories for linear equations in

    n–dimensional Euclidean space, for Fredholm integral equations in

    the space of continuous functions defined on a finite real interval,

    and for linear equations in Hilbert space of infinitely many

    dimensions, led Moore to lay down his well–known principle.

    “The existence of analogies between central features of various

    theories implies the existence of a more fundamental general

    theory embracing the special theories as particular instances and

    unifying them as to those central features.” (Moore, 1912)

    “The effectiveness of the reciprocal of a non–singular finite

    matrix in the study of properties of such matrices makes it

    desirable to define if possible an analogous matrix to be

    associated with each finite matrix even if it is not square or, if

    square, is not necessarily non–singular.” (Moore, 1935)


  • Desiderata

    C^{m×n}_r = the m × n matrices over C with rank r.

    A matrix A ∈ Cn×n is nonsingular if rank A = n, or det A ≠ 0.

    The inverse of A satisfies, by definition, the following equations,

    AXA = A (1)

    XAX = X (2)

    (AX)∗ = AX (3)

    (XA)∗ = XA (4)

    AX = XA (5)

    as well as the conditions

    Ax = λx =⇒ A−1x = (1/λ)x (6)

    A, B nonsingular =⇒ (AB)−1 = B−1A−1 (7)

    These properties are desirable; can one have them for a general A?


  • The Penrose equations

    The Penrose equations for A ∈ Cm×n are:

    AXA = A , (1)

    XAX = X , (2)

    (AX)∗ = AX , (3)

    (XA)∗ = XA . (4)

    Let A{i, j, . . . , k} denote the set of matrices X ∈ Cn×m which

    satisfy equations (i), (j), · · · , (k).

    A matrix X ∈ A{i, j, . . . , k} is called an {i, j, . . . , k}–inverse of A,

    and also denoted by A(i,j,...,k).

    In particular, a {1}–inverse, a {2}–inverse, a {1, 3}–inverse, etc.

    The Moore–Penrose inverse of A is its {1, 2, 3, 4}–inverse,

    denoted A†.
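    As a quick numerical illustration (a sketch assuming NumPy; the test matrix is an arbitrary rank-deficient example), `numpy.linalg.pinv` computes A† and therefore satisfies all four Penrose equations:

```python
import numpy as np

# Arbitrary 5x4 matrix of rank <= 3 (a hypothetical example).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))
X = np.linalg.pinv(A)                        # numerical A†, an n x m matrix

ok1 = np.allclose(A @ X @ A, A)              # (1) AXA = A
ok2 = np.allclose(X @ A @ X, X)              # (2) XAX = X
ok3 = np.allclose((A @ X).conj().T, A @ X)   # (3) AX Hermitian
ok4 = np.allclose((X @ A).conj().T, X @ A)   # (4) XA Hermitian
assert ok1 and ok2 and ok3 and ok4
```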


  • Why was Moore’s work unknown in 1955?

    Answer: Telegraphic style and idiosyncratic notation. Example:

    (29.3) Theorem.

    UC B1 II B2 II κ12 ·) · ∃ | λ21 type M2κ∗M1κ
    · S2 κ12λ21 = δ11M1κ
    · S1 λ21κ12 = δ22M2κ∗

    English translation:

    (29.3) Theorem.

    For every matrix A there exists a unique matrix X : R(A) → R(A∗)

    such that

    AX = P_{R(A)} , XA = P_{R(A∗)} .


  • Construction of {1}–inverses

    Given A ∈ C^{m×n}_r , let E ∈ C^{m×m}_m and P ∈ C^{n×n}_n be such that

    EAP = [ Ir K ; O O ] . (1)

    Then for any L ∈ C^{(n−r)×(m−r)}, the n × m matrix

    X = P [ Ir O ; O L ] E (2)

    is a {1}–inverse of A. The partitioned matrices in (1), (2) must be
    suitably interpreted in case r = m or r = n.

    Proof. Write (1) as

    A = E−1 [ Ir K ; O O ] P−1 ,

    then verify that any X given by (2) satisfies AXA = A. □


  • Linear equations

    Given A ∈ Cm×n , b ∈ Cm, the equations

    Ax = b (1)

    have a solution if and only if for any X ∈ A{1},

    AXb = b , (2)

    in which case the general solution is

    x = X b + (I − XA)y , y ∈ Cn arbitrary (3)

    Proof. AXA = A =⇒ AX idempotent, rank AX = rank A.
    ∴ AX = P_{R(A),M} , for some M such that Cm = R(A) ⊕ M .
    Ax = b consistent ⇐⇒ b ∈ R(A) ⇐⇒ P_{R(A),M} b = b , ∀ M .
    Finally, A(Xb + (I − XA)y) = AXb = b . □
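    A numerical sketch of the consistency test (2) and the general solution (3), assuming NumPy and using A† as one convenient {1}-inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-deficient 4x3 coefficient matrix and a consistent right-hand side.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
b = A @ rng.standard_normal(3)              # b in R(A) by construction

X = np.linalg.pinv(A)                       # A† is in particular a {1}-inverse
assert np.allclose(A @ X @ b, b)            # consistency test (2): AXb = b

# General solution (3): x = Xb + (I - XA)y for arbitrary y.
for _ in range(3):
    y = rng.standard_normal(3)
    x = X @ b + (np.eye(3) - X @ A) @ y
    assert np.allclose(A @ x, b)
```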


  • Linear matrix equations

    Theorem. Let A ∈ Cm×n , B ∈ Cp×q , D ∈ Cm×q. Then the

    matrix equation

    AXB = D (1)

    is consistent if and only if for some A(1), B(1),

    AA(1)DB(1)B = D , (2)

    in which case the general solution is

    X = A(1)DB(1) + Y − A(1)AY BB(1) (3)

    for arbitrary Y ∈ Cn×p.

    Proof. If (1) is consistent, then

    D = AXB = AA(1)AXBB(1)B = AA(1)DB(1)B .

    Conversely, if (2) holds, every X of the form (3) satisfies (1):
    AXB = AA(1)DB(1)B + AY B − AA(1)AY BB(1)B = D + AY B − AY B = D . □


  • Kronecker products and matrix equations

    The Kronecker product A ⊗ B of the two matrices

    A = (aij) ∈ Cm×n , B ∈ Cp×q is the mp × nq matrix

    A ⊗ B = [ a11B a12B · · · a1nB ]
            [ a21B a22B · · · a2nB ]
            [ · · ·            · · · ]
            [ am1B am2B · · · amnB ]

    For X = (xij) ∈ Cm×n, let vec(X) = (vk) ∈ Cmn be the vector
    obtained by listing the elements of X by rows,

    v_{n(i−1)+j} = x_{ij} (i ∈ 1,m ; j ∈ 1,n)

    Lemma. For compatible matrices A, X, B

    (A ⊗ BT ) vec(X) = vec (AXB)
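    The lemma is easy to check numerically (a sketch assuming NumPy; note that `ravel()` lists entries by rows, matching the vec convention above):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

# Row-major vec, as in the lecture: numpy's ravel() lists X by rows.
lhs = np.kron(A, B.T) @ X.ravel()
rhs = (A @ X @ B).ravel()
assert np.allclose(lhs, rhs)
```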


  • Construction of {1, 2}–inverses

    Proposition. Let Y, Z ∈ A{1}, and let

    X = Y AZ .

    Then X ∈ A{1, 2}.

    Proof. AXA = A(Y AZ)A = (AY A)ZA = AZA = A ,

    XAX = (Y AZ)A(Y AZ) = Y (AZA)Y AZ = Y (AY A)Z = X . □

    Proposition. Any two of the following statements imply the third:

    (a) X ∈ A{1} ,

    (b) X ∈ A{2} ,

    (c) rank X = rank A .

    Proof. X ∈ A{1}, Y ∈ A{2} =⇒ rank Y ≤ rank A ≤ rank X , etc.


  • Projections

    Theorem. For any A ∈ Cm×n and any A(1) ∈ A{1} :

    R(AA(1)) = R(A) , N(A(1)A) = N(A) , R((A(1)A)∗) = R(A∗) .

    Proof. Always R(AX) ⊂ R(A) , N(A) ⊂ N(XA) .
    But AXA = A =⇒ rank AX = rank XA = rank A .

    Theorem. Let X be a {1, 2}–inverse of A. Then:

    (a) AX is the projector on R(A) along N(X), and

    (b) XA is the projector on R(X) along N(A).

    Proof. AX = (AX)² =⇒ AX = P_{R(AX),N(AX)}
    AXA = A =⇒ R(AX) = R(A)
    XAX = X , rank AX = rank X =⇒ N(AX) = N(X)


  • The set of {1, 3}–inverses

    Theorem. The set A{1, 3} consists of all solutions for X of

    AX = AA(1,3) , (1)

    where A(1,3) is an arbitrary element of A{1, 3}.

    Proof. If X satisfies (1), then

    AXA = AA(1,3)A = A , AX = (AX)∗ . ∴ X ∈ A{1, 3} .

    Conversely, if X ∈ A{1, 3}, then

    AA(1,3) = AXAA(1,3) = (AX)∗AA(1,3) = X∗A∗(A(1,3))∗A∗

    = X∗A∗ = AX .

    Theorem. The set A{1, 4} consists of all solutions for X of

    XA = A(1,4)A .


  • Characterizations of {1, 3}, and {1, 4}–inverses

    Recall that for Cn = L ⊕ M :

    M = L⊥ ⇐⇒ P_{L,M} is Hermitian.

    Theorem. For any A ∈ Cm×n:

    (a) AX = P_{R(A)} ⇐⇒ X ∈ A{1, 3}

    (b) XA = P_{R(A∗)} ⇐⇒ X ∈ A{1, 4}

    Proof. (a) ⇐= :
    AXA = A =⇒ AX = P_{R(AX),N(AX)} and R(AX) = R(A) ∴ AX = P_{R(A),N(AX)} .
    AX = (AX)∗ =⇒ N(AX) = R(A)⊥ ∴ AX = P_{R(A)} .
    (a) =⇒ : AX = P_{R(A)} = AA(1,3) =⇒ X ∈ A{1, 3} .


  • {1, 2, 3}, and {1, 2, 4}–inverses

    Theorem (Urquhart). For every A ∈ Cm×n ,

    (A∗A)(1)A∗ ∈ A{1, 2, 3} , (a)

    A∗(AA∗)(1) ∈ A{1, 2, 4} , (b)

    A(1,4)AA(1,3) ∈ A{1, 2, 3, 4} . (c)

    Proof of (a). Let X := (A∗A)(1)A∗.
    R(A∗A) = R(A∗) (why?) =⇒ A∗ = A∗AU for some U ∴ A = U∗A∗A .
    ∴ AXA = U∗A∗A(A∗A)(1)A∗A = U∗A∗A = A ∴ X ∈ A{1} .
    rank X ≤ rank A∗ and X ∈ A{1} =⇒ rank X ≥ rank A ,
    ∴ rank X = rank A ∴ X ∈ A{2} .
    Finally, AX = U∗A∗A(A∗A)(1)A∗AU = U∗A∗AU , which is Hermitian ∴ X ∈ A{3} . □
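    Part (a) can be checked numerically (a sketch assuming NumPy; here `pinv(A.T @ A)` serves as one particular {1}-inverse of A∗A, and the matrices are real so ∗ is just transpose):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))  # rank 2, real

# One particular {1}-inverse of A*A; any element of (A*A){1} would do.
X = np.linalg.pinv(A.T @ A) @ A.T

assert np.allclose(A @ X @ A, A)        # X is a {1}-inverse
assert np.allclose(X @ A @ X, X)        # X is a {2}-inverse
assert np.allclose((A @ X).T, A @ X)    # AX Hermitian: X is a {3}-inverse
```

    With this specific choice X happens to equal A†; the theorem itself only guarantees membership in A{1, 2, 3}.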


  • The Moore–Penrose inverse

    Theorem (Penrose). Given A ∈ Cm×n, a solution of

    AXA = A , (1)

    XAX = X , (2)

    (AX)∗ = AX , (3)

    (XA)∗ = XA , (4)

    exists and is unique. The {1, 2, 3, 4}–inverse of A is denoted A†.

    Proof. Uniqueness. Let X, Y ∈ A{1, 2, 3, 4}. Then

    X = X(AX)∗ = XX∗A∗ = X(AX)∗(AY )∗

    = XAY = (XA)∗(Y A)∗Y = A∗Y ∗Y

    = (Y A)∗Y = Y .

    Existence. A† = A(1,4)AA(1,3) . □


  • Full–rank factorization

    Given A ∈ C^{m×n}_r , r > 0, a full–rank factorization is

    A = CR , C ∈ C^{m×r}_r , R ∈ C^{r×n}_r (1)

    Theorem (MacDuffee). Given A ∈ C^{m×n}_r , r > 0, and C, R as in (1),

    A† = R∗(C∗AR∗)−1C∗ . (2)

    Proof. C∗AR∗ is nonsingular, because

    C∗AR∗ = (C∗C)(RR∗) , a product of nonsingular matrices .

    Let X = RHS(2) = R∗(RR∗)−1(C∗C)−1C∗ , and check that X

    satisfies the 4 Penrose equations. □

    A† = R∗(RR∗)−1(C∗C)−1C∗ = R†C† (3)

    Q: What is a “good” method for full–rank factorization ?
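    One illustration of MacDuffee's formulas (2) and (3) (a sketch assuming NumPy; the full-rank factors are chosen at random rather than computed from a given A):

```python
import numpy as np

rng = np.random.default_rng(4)
# Build A = CR with full-rank factors (m = 5, n = 4, r = 2), real case.
C = rng.standard_normal((5, 2))
R = rng.standard_normal((2, 4))
A = C @ R

# MacDuffee's formula (2), and the equivalent factored form (3).
X = R.T @ np.linalg.inv(C.T @ A @ R.T) @ C.T
assert np.allclose(X, np.linalg.pinv(A))
assert np.allclose(
    X, R.T @ np.linalg.inv(R @ R.T) @ np.linalg.inv(C.T @ C) @ C.T)
```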


  • Singular value decomposition

    Let A ∈ C^{m×n}_r , r > 0, and let

    AA∗ui = σi² ui , i ∈ 1,m
    A∗Avi = σi² vi , i ∈ 1,n
    σ1 ≥ σ2 ≥ · · · ≥ σr > 0 = σr+1 = σr+2 = · · ·

    The singular value decomposition (SVD) of A is

    A = UΣV ∗ (SVD)

    U = [u1 u2 · · · um] ∈ Cm×m , U∗U = Im ,
    V = [v1 v2 · · · vn] ∈ Cn×n , V ∗V = In ,
    Σ = diag(σ1, σ2, · · · , σr) ∈ Rm×n .

    Theorem (Penrose). A† = V Σ†U∗ , where

    Σ† = diag(1/σ1, 1/σ2, · · · , 1/σr) ∈ Rn×m .
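    The SVD formula is how A† is computed in practice; a sketch assuming NumPy (the rank threshold 1e-12 is an arbitrary illustrative tolerance):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2

U, s, Vh = np.linalg.svd(A)        # full SVD: A = U @ Sigma @ Vh
r = int(np.sum(s > 1e-12))         # numerical rank

# Sigma† : transpose the shape, invert the nonzero singular values.
Sp = np.zeros((A.shape[1], A.shape[0]))
Sp[:r, :r] = np.diag(1.0 / s[:r])

A_dagger = Vh.T @ Sp @ U.T         # V Σ† U* (real case)
assert np.allclose(A_dagger, np.linalg.pinv(A))
```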


  • Properties of the Moore–Penrose inverse

    (a) For any scalar λ, λ† = 1/λ if λ ≠ 0 , and λ† = 0 otherwise.

    If a, b are column vectors then

    (b) a† = (a∗a)†a∗ (c) (ab∗)† = (a∗a)†(b∗b)†ba∗

    (d) If D = diag(λ1, · · · , λk) ∈ Cm×n then D† = diag(λ†1, · · · , λ†k) ∈ Cn×m

    For any matrix A

    (e) (A†)† = A (f) (A∗)† = (A†)∗

    (g) (AT )† = (A†)T (h) A† = (A∗A)†A∗ = A∗(AA∗)†

    (i) R(A†) = R(A∗) (j) N(A†) = N(A∗)

    (k) AA† = P_{R(A)} (l) A†A = P_{R(A∗)}

    (m) If U and V are unitary matrices, (UAV )† = V ∗A†U∗

    (n) For any matrices A, B: (A ⊗ B)† = A† ⊗ B†
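    A few of these properties verified numerically (a sketch assuming NumPy; real matrices are used, so (f) and (g) coincide):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
pinv = np.linalg.pinv

assert np.allclose(pinv(pinv(A)), A)                                # (e)
assert np.allclose(pinv(A.T), pinv(A).T)                            # (g)
assert np.allclose(pinv(A), pinv(A.T @ A) @ A.T)                    # (h)
assert np.allclose(pinv(np.kron(A, B)), np.kron(pinv(A), pinv(B)))  # (n)
```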


  • Non–properties of the Moore–Penrose inverse

    (a) In general, for compatible A, B,

    (AB)† ≠ B†A†

    (b) If A, B are similar, i.e. B = S−1AS for some nonsingular S,

    then, in general, B† ≠ S−1A†S .

    (c) If Jk(0) is a Jordan block corresponding to the eigenvalue zero,

    then (Jk(0))† = (Jk(0))T . For example,

    [ 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; 0 0 0 0 ]† = [ 0 0 0 0 ; 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ]

    ∴ A† is not a polynomial in A (any polynomial in Jk(0) is upper
    triangular, while (Jk(0))† is not).
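    Both non-properties are easy to exhibit numerically (a sketch assuming NumPy; the counterexample for (a) is a minimal hand-picked pair):

```python
import numpy as np

pinv = np.linalg.pinv

# (a) (AB)† != B†A† in general: a minimal counterexample.
A = np.array([[1.0, 0.0]])          # 1 x 2
B = np.array([[1.0], [1.0]])        # 2 x 1
assert np.allclose(pinv(A @ B), [[1.0]])          # (AB)† = [1]
assert np.allclose(pinv(B) @ pinv(A), [[0.5]])    # B†A† = [1/2]

# (c) The Moore-Penrose inverse of a nilpotent Jordan block is its transpose.
J = np.diag(np.ones(3), 1)          # J4(0)
assert np.allclose(pinv(J), J.T)
```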


  • Continuity of the inverse

    Let ‖ · ‖ be a multiplicative matrix norm, i.e.

    ‖XY ‖ ≤ ‖X‖‖Y ‖ , if XY is defined

    Let X ∈ C^{n×n}_n . Then the perturbation X + E = (I + EX−1)X is
    nonsingular for all E such that ‖E‖ < 1/‖X−1‖ , and its inverse is

    (X + E)−1 = X−1(I − EX−1 + (EX−1)² − (EX−1)³ + · · · ) ,

    which converges if ‖EX−1‖ < 1 , guaranteed by ‖E‖ < 1/‖X−1‖ .

    The inverse is a continuous function C^{n×n}_n → C^{n×n}_n , and the
    nonsingular matrices are an open set in Cn×n.


  • The Moore–Penrose inverse is discontinuous

    Ex. Let

    X(ε) = [ 1 0 ; 0 ε ] → X(0) = [ 1 0 ; 0 0 ] , as ε → 0 .

    But

    X(ε)† = [ 1 0 ; 0 1/ε ] ↛ X(0)† = [ 1 0 ; 0 0 ] .

    For perturbations Ek → O,

    (X + Ek)† → X† ⇐⇒ rank (X + Ek) → rank X
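    A numerical sketch of the discontinuity and of the rank criterion (assuming NumPy; the perturbation sizes are arbitrary):

```python
import numpy as np

pinv = np.linalg.pinv
X0 = np.array([[1.0, 0.0],
               [0.0, 0.0]])

# Rank-increasing perturbation: pinv does NOT converge to pinv(X0);
# the gap grows like 1/e as e -> 0.
gap_bad = [np.linalg.norm(pinv(np.diag([1.0, e])) - pinv(X0))
           for e in (1e-2, 1e-4)]
assert gap_bad[1] > gap_bad[0]

# Rank-preserving perturbation: pinv does converge.
gap_ok = [np.linalg.norm(pinv(np.diag([1.0 + e, 0.0])) - pinv(X0))
          for e in (1e-2, 1e-4)]
assert gap_ok[1] < gap_ok[0]
```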


  • The Smith normal form

    A nonsingular matrix A ∈ Zn×n whose inverse A−1 is also in Zn×n

    is called a unit matrix.

    Two matrices A, S ∈ Zm×n are said to be equivalent over Z if

    there exist two unit matrices P ∈ Zm×m and Q ∈ Zn×n such that

    PAQ = S . (1)

    Theorem. Let A ∈ Z^{m×n}_r . Then A is equivalent over Z to a
    matrix S = [sij ] ∈ Z^{m×n}_r such that:

    (a) sii ≠ 0 , i ∈ 1,r ,
    (b) sij = 0 otherwise, and
    (c) sii divides si+1,i+1 for i ∈ 1,r−1 .

    S is called the Smith normal form of A, and its nonzero
    elements sii (i ∈ 1,r) are called the invariant factors of A.


  • Integer solutions

    Let A ∈ Zm×n , b ∈ Zm, and let the linear equation

    Ax = b (P)

    be consistent. It is required to determine if (P) has an integer

    solution, in which case determine all of them.

    Theorem (Hurt and Waid). Let A ∈ Zm×n. Then there is an

    n × m matrix X satisfying

    AXA = A , (1)

    XAX = X , (2)

    AX ∈ Zm×m, XA ∈ Zn×n . (6)

    Proof. Let PAQ = S be the Smith normal form of A, and take
    X = QS†P . Then AXA = A and XAX = X follow from SS†S = S and
    S†SS† = S† , while AX = P−1(SS†)P and XA = Q(S†S)Q−1 are
    integral, since SS† and S†S are diagonal 0–1 matrices. □


  • Integer solutions (cont’d)

    Let Â = QS†P be the {1, 2}–inverse of A constructed above.

    Theorem (Hurt and Waid). Let A and b be integral, and let

    the vector equation

    Ax = b (P)

    be consistent. Then (P) has an integral solution if and only if the

    vector

    Âb

    is integral, in which case the general integral solution of (P) is

    x = Âb + (I − ÂA)y , y ∈ Zn .
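    A small worked instance of the Hurt–Waid theorem (a sketch assuming NumPy; the unit matrices P, Q and the Smith form S are hand-picked for illustration, not computed from A):

```python
import numpy as np

# Hand-built example: unit matrices P, Q (det = ±1) and a rank-1 Smith
# form S with invariant factor 2, so that PAQ = S.
P = np.array([[1, 0], [1, 1]])
Q = np.array([[1, -1, 0], [0, 1, 0], [0, 0, 1]])
S = np.array([[2, 0, 0], [0, 0, 0]])
A = np.linalg.inv(P) @ S @ np.linalg.inv(Q)     # integral: [[2,2,0],[-2,-2,0]]

Sp = np.zeros((3, 2)); Sp[0, 0] = 1 / 2          # S†
X = Q @ Sp @ P                                   # Â = Q S† P

assert np.allclose(A @ X @ A, A) and np.allclose(X @ A @ X, X)
assert np.allclose(A @ X, np.round(A @ X))       # AX integral
assert np.allclose(X @ A, np.round(X @ A))       # XA integral

b = np.array([2.0, -2.0])                        # Âb integral: integer solution
assert np.allclose(X @ b, np.round(X @ b))
c = np.array([1.0, -1.0])                        # consistent, but Âc = (1/2,0,0)
assert np.allclose(A @ (X @ c), c)
assert not np.allclose(X @ c, np.round(X @ c))   # so no integer solution
```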


  • Application of {2}–inverses to Newton’s method

    The Newton method for solving a single equation in 1 variable,

    f(x) = 0 ,

    is xk+1 = xk − f(xk)/f ′(xk) , (k = 0, 1, . . .) .

    A Newton method for solving m equations in n variables

    fi(x1, . . . , xn) = 0 , i ∈ 1, m or f(x) = 0 ,

    is similarly given, for the case m = n, by

    xk+1 = xk − f ′(xk)−1f(xk) , (k = 0, 1, . . .) ,

    where f ′(xk) is the derivative of f at xk, represented by the

    matrix of partial derivatives (the Jacobian matrix)

    f ′(xk) = ( ∂fi/∂xj (xk) ) .
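    The case m = n in a short sketch (assuming NumPy; the system and the starting point are arbitrary illustrations):

```python
import numpy as np

# Newton's method for f(x, y) = (x^2 + y^2 - 4, x - y) = 0,
# whose positive solution is x = y = sqrt(2).
def f(v):
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x - y])

def jac(v):
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

v = np.array([1.0, 0.5])
for _ in range(20):
    v = v - np.linalg.solve(jac(v), f(v))   # x_{k+1} = x_k - f'(x_k)^{-1} f(x_k)

assert np.allclose(v, [np.sqrt(2), np.sqrt(2)])
```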


  • Notation

    We denote the derivative of f at c

    f ′(c) = ( ∂fi/∂xj (c) ) by Jf (c) or by Jc .

    We denote by ‖ · ‖ both a vector norm in Rn and a matrix norm

    consistent with it,

    ‖Ax‖ ≤ ‖A‖‖x‖ , ∀ x .

    For a given point x0 ∈ Rn and a positive scalar r we denote by

    B(x0, r) = {x ∈ Rn : ‖x − x0‖ < r}

    the open ball with center x0 and radius r. The closed ball
    with the same center and radius is

    B̄(x0, r) = {x ∈ Rn : ‖x − x0‖ ≤ r} .


  • Newton method using {2}–inverses of f ′

    Theorem. Let x0 ∈ Rn, r > 0 and let f : Rn → Rm be

    differentiable in B(x0, r). Let M > 0 be such that

    ‖Ju − Jv‖ ≤ M ‖u− v‖ (1)

    for all u,v ∈ B(x0, r). Further, assume that for all x ∈ B(x0, r),

    the Jacobian Jx has a {2}–inverse Tx ∈ Rn×m, TxJxTx = Tx,

    such that

    ‖Tx0‖ ‖f(x0)‖ < α , (2)

    and

    ‖(Tu − Tv)f(v)‖ ≤ N ‖u − v‖² , ∀ u,v ∈ B(x0, r) , (3)

    (M/2) ‖Tu‖ + N ≤ K < 1 , ∀ u ∈ B(x0, r) , (4)

    for some positive scalars N, K and α, and

    h := αK < 1 , α/(1 − h) < r . (5)


  • Theorem (cont’d)

    Then:

    (a) Starting at x0, all iterates

    xk+1 = xk − Txk f(xk), k = 0, 1, . . . (6)

    lie in B(x0, r).

    (b) The sequence {xk} converges, as k → ∞, to a point

    x∞ ∈ B(x0, r), that is a solution of

    Tx∞f(x) = 0 . (7)

    (c) For all k ≥ 0,

    ‖xk − x∞‖ ≤ α h^{2^k − 1}/(1 − h^{2^k}) . (8)

    Since 0 < h < 1, the method is (at least) quadratically convergent.

    The iterates converge not to a solution of f(x) = 0, but of (7). The

    degree of approximation depends on the {2}–inverse used.
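    A sketch of iteration (6) with Tx = Jx†, the Moore–Penrose inverse of the Jacobian (which is in particular a {2}-inverse of Jx), on an underdetermined system; assumes NumPy, and the example system is an arbitrary illustration:

```python
import numpy as np

# One equation f(x) = x1^2 + x2^2 - 1 = 0 in two unknowns; the Jacobian
# is 1x2 and T_x = (J_x)† is a {2}-inverse of it.
def f(v):
    return np.array([v[0]**2 + v[1]**2 - 1.0])

def jac(v):
    return np.array([[2.0 * v[0], 2.0 * v[1]]])

v = np.array([2.0, 1.0])
for _ in range(25):
    v = v - np.linalg.pinv(jac(v)) @ f(v)   # x_{k+1} = x_k - T_{x_k} f(x_k)

# The limit solves T_x f(x) = 0; here J_x != 0 near the limit, so in fact
# f(x) = 0, i.e. the iterates land on the unit circle.
assert abs(v[0]**2 + v[1]**2 - 1.0) < 1e-10
```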
