Topics in Eigen-analysis - University of Hong Kong

Topics in Eigen-analysis

Lin Zanjiang

28 July 2014

Contents

1 Terminology
2 Some Basic Properties and Results
3 Eigen-properties of Hermitian Matrices
  3.1 Basic Theorems
  3.2 Quadratic Forms & Nonnegative Definite Matrices
    3.2.1 Definitions
    3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
4 Inequalities and Extremal Properties of Eigenvalues
  4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem
  4.2 Some Eigenvalue Inequalities
  4.3 Application to Principal Component Analysis (PCA)


1 Terminology

(1) ๐‘๐ด(๐‘ฅ) โˆถ= det(๐‘ฅ๐ผ โˆ’ ๐ด) โ‰ the characteristic polynomial of ๐ด

(2) The eigenvalues ๐œ† of ๐ด โˆถ= the roots of ๐‘๐ด(๐‘ฅ)

(3) The ๐œ† โˆ’eigenvectors ๐’™ โˆถ= the nonzero solutions to (๐œ†๐ผ โˆ’ ๐ด)๐’™ = ๐ŸŽ

(4) The eigenvalue-eigenvector equation: ๐ด๐’™ = ๐œ†๐’™

(5) ๐‘†๐ด(๐œ†) โˆถ= The eigenspace of a matrix ๐ด corresponding to the eigenvalue ๐œ†

(6) The characteristic equation: |๐‘ฅ๐ผ โˆ’ ๐ด| = 0

(7) Standard inner product: โŸจ๐’›, ๐’˜โŸฉ = ๐‘ง1๐‘ค1ฬ…ฬ…ฬ…ฬ… + โ‹ฏ+ ๐‘ง๐‘›๐‘ค๐‘›ฬ…ฬ… ฬ…ฬ… ๐’›, ๐’˜ โˆˆ โ„‚๐‘›

2 Some Basic Properties and Results

Theorem 2.1

(a) The eigenvalues of A are the same as the eigenvalues of A^T.

(b) A is singular if and only if at least one eigenvalue of A is equal to zero.

(c) The eigenvalues and corresponding geometric multiplicities of BAB^{−1} are the same as those of A, if B is a nonsingular matrix.

(d) The modulus of each eigenvalue of A is equal to 1 if A is an orthogonal matrix.

Theorem 2.2 (Revision)

Suppose that λ is an eigenvalue, with multiplicity r ≥ 1, of the n × n matrix A. Then

1 ≤ dim{S_A(λ)} ≤ r

Proof. If λ is an eigenvalue of A, then by definition an x ≠ 0 satisfying the equation Ax = λx exists, so clearly dim{S_A(λ)} ≥ 1. Now let k = dim{S_A(λ)}, and let x_1, ⋯, x_k be linearly independent eigenvectors corresponding to λ. Form a nonsingular n × n matrix X that has these k vectors as its first k columns; that is, X has the form X = [X_1 X_2], where X_1 = (x_1, ⋯, x_k) and X_2 is n × (n − k). Since each column of X_1 is an eigenvector of A corresponding to the eigenvalue λ, it follows that AX_1 = λX_1, and

X^{−1}X_1 = [I_k; 0],

which follows from the fact that X^{−1}X = I_n. As a result, we find that

X^{−1}AX = X^{−1}[AX_1 AX_2] = X^{−1}[λX_1 AX_2] = [λI_k B_1; 0 B_2],

where B_1 and B_2 form a partition of the matrix X^{−1}AX_2. If μ is an eigenvalue of X^{−1}AX, then

0 = |X^{−1}AX − μI_n| = |(λ − μ)I_k B_1; 0 B_2 − μI_{n−k}| = (λ − μ)^k |B_2 − μI_{n−k}|.

Thus λ must be an eigenvalue of X^{−1}AX with multiplicity at least k. The result follows because, by Theorem 2.1(c), the eigenvalues and corresponding geometric multiplicities of X^{−1}AX are the same as those of A.
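The upper bound dim{S_A(λ)} ≤ r can be strict. Here is a minimal numerical sketch in Python; the matrix is a hypothetical example, not one from the notes:

```python
# A = [[1, 1], [0, 1]] has p_A(x) = (x - 1)^2, so the eigenvalue lambda = 1
# has algebraic multiplicity r = 2, yet S_A(1) is only 1-dimensional.

A = [[1.0, 1.0],
     [0.0, 1.0]]

# p_A(x) = x^2 - tr(A) x + |A| = x^2 - 2x + 1 = (x - 1)^2
tr_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert (tr_A, det_A) == (2.0, 1.0)

# S_A(1) = null(A - I), and A - I = [[0, 1], [0, 0]]: the equation
# (A - I)(x1, x2)^T = (x2, 0)^T = 0 forces x2 = 0, so S_A(1) = span{(1, 0)}.
x = [1.0, 0.0]
Ax = [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]
assert Ax == x                      # (1, 0) is a 1-eigenvector

y = [0.0, 1.0]
Ay = [A[0][0]*y[0] + A[0][1]*y[1], A[1][0]*y[0] + A[1][1]*y[1]]
assert Ay == [1.0, 1.0]             # but (0, 1) is not an eigenvector
```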

Theorem 2.3

Let λ be an eigenvalue of the n × n matrix A, and let x be a corresponding eigenvector. Then,

(a) If k ≥ 1 is an integer, λ^k is an eigenvalue of A^k corresponding to the eigenvector x.

(b) If A is nonsingular, λ^{−1} is an eigenvalue of A^{−1} corresponding to the eigenvector x.

Proof.

(a) We prove this by induction. Clearly (a) holds when k = 1, since this is just the definition of eigenvalue and eigenvector. Next, if (a) holds for k − 1, that is, A^{k−1}x = λ^{k−1}x, then

A^k x = A(A^{k−1}x) = A(λ^{k−1}x) = λ^{k−1}(Ax) = λ^{k−1}(λx) = λ^k x.

(b) Premultiplying the equation Ax = λx by A^{−1} gives the equation

x = λA^{−1}x.

Since A is nonsingular, we know from Theorem 2.1(b) that λ ≠ 0, and so dividing both sides of the above equation by λ yields

A^{−1}x = λ^{−1}x,

which implies that A^{−1} has an eigenvalue λ^{−1} with corresponding eigenvector x.
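A quick numerical check of both parts of Theorem 2.3, on a hypothetical 2 × 2 example (A = [[2, 1], [1, 2]] has the eigenpair λ = 3, x = (1, 1); the helper `matvec` is just an illustrative name):

```python
# Verify A^2 x = lambda^2 x and A^{-1} x = lambda^{-1} x for a known eigenpair.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[2.0, 1.0], [1.0, 2.0]]
x = [1.0, 1.0]
lam = 3.0

# (a) with k = 2: A^2 x = lambda^2 x
A2x = matvec(A, matvec(A, x))
assert A2x == [lam**2 * xi for xi in x]     # [9, 9]

# (b) A^{-1} x = lambda^{-1} x; for a 2x2 matrix, A^{-1} = adj(A)/det(A)
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]     # 3
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]
Ainvx = matvec(Ainv, x)
assert Ainvx == [xi / lam for xi in x]      # [1/3, 1/3]
```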

Theorem 2.4

Let A be an n × n matrix with eigenvalues λ_1, ⋯, λ_n. Then

(a) tr(A) = ∑_{i=1}^n λ_i

(b) |A| = ∏_{i=1}^n λ_i

Proof. Express the characteristic equation |xI_n − A| = 0 in the polynomial form

x^n + α_{n−1}x^{n−1} + ⋯ + α_1 x + α_0 = 0.

To determine α_0, substitute x = 0 into the polynomial: α_0 = |(0)I_n − A| = |−A| = (−1)^n |A|. To determine α_{n−1}, we find the coefficient of x^{n−1}. Recall that the determinant is a sum of terms, each the product of one entry in each column (row), with row (column) positions running over all permutations of the integers (1, 2, ⋯, n), with the appropriate +/− signs. It is then easy to see that the only term involving at least n − 1 of the diagonal elements of (xI_n − A) is the one involving the product of all the diagonal elements. Since this term corresponds to the identity permutation, which is even (a trivial composition of zero transpositions), its sign is +1; therefore α_{n−1} is the coefficient of x^{n−1} in

(x − a_11)(x − a_22)⋯(x − a_nn),

which is clearly −tr(A). Finally, since λ_1, ⋯, λ_n are the roots of the characteristic equation, it follows that

|xI_n − A| = (x − λ_1)(x − λ_2)⋯(x − λ_n),

whose constant term is (−1)^n ∏_{i=1}^n λ_i and whose x^{n−1} coefficient is −∑_{i=1}^n λ_i. Matching coefficients, we find that

|A| = ∏_{i=1}^n λ_i, tr(A) = ∑_{i=1}^n λ_i,

which completes the proof.
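To see the theorem in action, here is a small check in Python on a hypothetical 2 × 2 matrix, whose eigenvalues we can get in closed form from the quadratic formula:

```python
# For A = [[4, 2], [1, 3]], p_A(x) = x^2 - 7x + 10, with roots 5 and 2.
import math

A = [[4.0, 2.0], [1.0, 3.0]]
tr = A[0][0] + A[1][1]                       # 7
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]      # 10

# Roots of x^2 - tr*x + det via the quadratic formula:
disc = math.sqrt(tr*tr - 4*det)              # sqrt(9) = 3
lam1, lam2 = (tr + disc)/2, (tr - disc)/2    # 5, 2

assert lam1 + lam2 == tr                     # tr(A) = sum of eigenvalues
assert lam1 * lam2 == det                    # |A| = product of eigenvalues
```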


3 Eigen-properties of Hermitian Matrices

3.1 Basic Theorems

Theorem 3.1.1

Let A ∈ M_{n×n} be a matrix. Then A is hermitian if and only if

⟨Ax, y⟩ = ⟨x, Ay⟩

for all vectors x, y ∈ ℂ^n.

Proof.

(1) If A is hermitian, then ⟨Ax, y⟩ = y^H Ax = y^H A^H x = (Ay)^H x = ⟨x, Ay⟩, which proves the "⟹" part.

(2) For the converse direction, let x and y run over the standard basis vectors e_1, ⋯, e_n of ℂ^n. Taking x = e_i and y = e_j, the identity gives a_ji = ā_ij for all i, j = 1, ⋯, n, that is, A = A^H ⟹ A is hermitian.

Theorem 3.1.2 (Schur's Theorem Revisited)

Let A ∈ M_{n×n} be a matrix. There exists a unitary matrix U such that

U^H A U = T

is upper triangular; this is Schur's theorem. It is just the complex counterpart of the Triangulation Theorem we learnt in class. With a minor rearrangement, we may write

A = U T U^H,

and this is called the Schur decomposition, with the diagonal entries of T being the eigenvalues of A.

3.1.3 Theorem (The Spectral Theorem Revisited)

If the A in the last theorem turns out to be hermitian, then the corresponding T becomes diagonal. This is called the Spectral Theorem. Similarly, it is the complex extension of the Principal Axis Theorem we learnt in class. Again, with the same rearrangement, we may rewrite it as

A = U T U^H.

This is called the spectral decomposition. These eigenvalue decompositions are closely related to the singular value decomposition, which applies even to nonsquare matrices; in particular, the spectral decomposition of a nonnegative definite matrix is a singular value decomposition.

3.1.4 Theorem

Let A ∈ M_{n×n} be a hermitian matrix. Then

(a) x^H A x is real for all x ∈ ℂ^n.

(b) All eigenvalues of A are real.

(c) S^H A S is hermitian for all S ∈ M_{n×n}.

(d) Eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

(e) It is possible to construct a set of n orthonormal eigenvectors of A.

Proof.

(a) The complex conjugate of the scalar x^H A x is (x^H A x)^H = x^H A^H x = x^H A x; that is, x^H A x equals its own complex conjugate and hence is real.

(b) (1) If Ax = λx and x^H x = k ∈ ℝ^+, then λ = (1/k) x^H (λx) = (1/k) x^H A x, which is real by (a).

(2) Alternative proof for (b): let λ, μ be two eigenvalues of A, with corresponding eigenvectors x, y, so that Ax = λx and Ay = μy. According to Theorem 3.1.1, we have λ⟨x, y⟩ = ⟨λx, y⟩ = ⟨Ax, y⟩ = ⟨x, Ay⟩ = ⟨x, μy⟩ = μ̄⟨x, y⟩. In the case where μ = λ and y = x, this becomes λ⟨x, x⟩ = λ̄⟨x, x⟩, which in turn implies that λ = λ̄, since we know ⟨x, x⟩ = ‖x‖² > 0 for a nonzero eigenvector x. Therefore λ must be real, and similarly μ is real.

(c) (S^H A S)^H = S^H A^H S = S^H A S, so S^H A S is always hermitian.

(d) Following the discussion in (b)(2), and using the fact that μ is real, we get the equation λ⟨x, y⟩ = μ⟨x, y⟩; so if λ ≠ μ, it follows immediately that ⟨x, y⟩ = 0, thus implying that x and y are orthogonal.

(e) Following from the spectral decomposition, we rewrite it as

AU = UT
⟺ A[x_1 ⋯ x_n] = [x_1 ⋯ x_n] diag(λ_1, ⋯, λ_n)
⟺ [Ax_1 ⋯ Ax_n] = [λ_1 x_1 ⋯ λ_n x_n].

Therefore, it can easily be seen that x_1, ⋯, x_n are the eigenvectors corresponding to λ_1, ⋯, λ_n. As U is unitary, it follows that these eigenvectors are orthonormal.
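Parts (a) and (b) can be checked numerically with Python's built-in complex numbers, on a hypothetical hermitian matrix A = [[2, i], [−i, 2]] (note a_21 = ā_12):

```python
# (a) x^H A x is real; (b) the eigenvalues of a hermitian matrix are real.

A = [[2 + 0j, 1j],
     [-1j, 2 + 0j]]

def matvec(M, v):  # illustrative helper
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

# (a) x^H A x for x = (1, i): here it comes out to exactly 2.
x = [1 + 0j, 1j]
Ax = matvec(A, x)
q = sum(x[i].conjugate() * Ax[i] for i in range(2))
assert q.imag == 0.0

# (b) p_A(t) = (2 - t)^2 - |a12|^2 = t^2 - 4t + 3, a real quadratic with
# real roots t = 3 and t = 1.
tr = (A[0][0] + A[1][1]).real                      # 4
det_ = (A[0][0]*A[1][1] - A[0][1]*A[1][0]).real    # 4 - (i)(-i) = 3
disc = tr*tr - 4*det_                              # 4 > 0: two real roots
lam1, lam2 = (tr + disc**0.5)/2, (tr - disc**0.5)/2
assert (lam1, lam2) == (3.0, 1.0)
```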

3.2 Quadratic Forms & Nonnegative Definite Matrices

3.2.1 Definitions

(a) Let A ∈ M_{n×n} be a symmetric matrix (a hermitian matrix with real entries) and let x denote an n × 1 column vector; then Q = x^T A x is said to be a quadratic form. Observe that

Q = x^T A x = (x_1 ⋯ x_n) [a_11 ⋯ a_1n; ⋮ ⋱ ⋮; a_n1 ⋯ a_nn] (x_1; ⋮; x_n)
  = (x_1 ⋯ x_n) (∑_i a_1i x_i; ⋮; ∑_i a_ni x_i)
  = ∑_{i,j} a_ij x_i x_j

(b) Let A ∈ M_{n×n} be a symmetric matrix. Then A is

(1) positive definite if Q = x^T A x > 0 for all nonzero x ∈ ℝ^n;

(2) nonnegative definite (positive semidefinite) if Q = x^T A x ≥ 0 for all x ∈ ℝ^n;

(3) negative definite if Q = x^T A x < 0 for all nonzero x ∈ ℝ^n;

(4) nonpositive definite (negative semidefinite) if Q = x^T A x ≤ 0 for all x ∈ ℝ^n;

(5) indefinite if Q > 0 for some x while Q < 0 for some other x.

We are only interested in the positive and nonnegative cases, because all theorems are similar for the negative and nonpositive cases.
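The expansion Q = ∑_{i,j} a_ij x_i x_j can be sanity-checked against the matrix product directly; a minimal sketch on a hypothetical symmetric 2 × 2 matrix:

```python
# Two routes to the quadratic form give the same value.

A = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, 2.0]

# Matrix route: x^T (A x)
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
Q_matrix = sum(x[i] * Ax[i] for i in range(2))

# Double-sum route: sum over i, j of a_ij * x_i * x_j
Q_sum = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

assert Q_matrix == Q_sum == 18.0   # 2*1 + 1*2 + 1*2 + 3*4 = 18
```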

3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices

3.2.2.1 Theorem

Let λ_1, ⋯, λ_n be the eigenvalues of the n × n symmetric matrix A. Then

(a) A is positive definite if and only if λ_i > 0 for all i;

(b) A is nonnegative definite if and only if λ_i ≥ 0 for all i.

Proof.

(a) Let the columns of U = (x_1 ⋯ x_n) be a set of orthonormal eigenvectors of A corresponding to the eigenvalues λ_1, ⋯, λ_n, so that A = U T U^T, where T = diag(λ_1, ⋯, λ_n). If A is positive definite, then x^T A x > 0 for all x ≠ 0, so in particular, choosing x = x_i, we have

x_i^T A x_i = x_i^T (λ_i x_i) = λ_i x_i^T x_i = λ_i > 0.

Conversely, if λ_i > 0 for all i, then for any x ≠ 0 define y = U^T x (which is nonzero since U is nonsingular), and note that

x^T A x = x^T U T U^T x = y^T T y = ∑_{i=1}^n λ_i y_i²

has to be positive, since the λ_i are all positive and at least one of the y_i² is positive because y ≠ 0.

(b) By a similar argument as in (a), it is easy to show that A is nonnegative definite if and only if λ_i ≥ 0 for all i.
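A numerical illustration of part (a), again on a hypothetical 2 × 2 example (`quad_form` is an illustrative helper name; spot-checking a few vectors is of course not a proof):

```python
# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1, both positive, so x^T A x
# should be positive for every nonzero x.
import math

A = [[2.0, 1.0], [1.0, 2.0]]
tr, det = 4.0, 3.0
disc = math.sqrt(tr*tr - 4*det)          # sqrt(4) = 2
lams = ((tr + disc)/2, (tr - disc)/2)    # (3.0, 1.0)
assert min(lams) > 0

def quad_form(A, x):
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

# Positivity on a few nonzero vectors:
for x in ([1.0, 0.0], [0.0, -1.0], [1.0, -1.0], [-2.0, 3.0]):
    assert quad_form(A, x) > 0
```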

3.2.2.2 Theorem

Let T be an m × n real matrix with rank(T) = r. Then

(a) T^T T has r positive eigenvalues. It is always nonnegative definite, and it is positive definite if r = n.

(b) The positive eigenvalues of T^T T are equal to the positive eigenvalues of T T^T.

Proof.

(a) For any nonzero n × 1 vector x, let y = Tx; then

x^T T^T T x = y^T y = ∑_{i=1}^m y_i²

is nonnegative, so T^T T is nonnegative definite, and thus by Theorem 3.2.2.1(b) all of its eigenvalues are nonnegative. Further, observe that x ≠ 0 is an eigenvector of T^T T corresponding to a zero eigenvalue if and only if y = Tx = 0, in which case the expression above equals zero. Therefore the number of zero eigenvalues equals the dimension of null(T), which is n − r, so (a) is proved.

(b) Let λ > 0 be an eigenvalue of T^T T with multiplicity h. Since the n × n matrix T^T T is symmetric, we can find an n × h matrix X, whose columns are orthonormal, satisfying

T^T T X = λX.

Let Y = TX and observe that

T T^T Y = T T^T T X = T(λX) = λTX = λY,

so λ is also an eigenvalue of T T^T, with multiplicity also h, because

rank(Y) = rank(TX) = rank((TX)^T TX) = rank(X^T T^T T X) = rank(λX^T X) = rank(λI_h) = h.

So the proof is done.

4 Inequalities and Extremal Properties of Eigenvalues

4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem

In this section, we investigate some extremal properties of the eigenvalues of a hermitian matrix, and see how to turn the problem of finding the eigenvalues into a constrained optimization problem.

4.1.1 Definition

Let A ∈ M_{n×n} be a hermitian matrix. Then the Rayleigh quotient of A, denoted R_A(x), is a function from ℂ^n\{0} to ℝ, defined as follows:

R_A(x) = x^H A x / x^H x

It is not difficult to see that when the norm of x satisfies ‖x‖ = 1, the Rayleigh quotient of A equals its quadratic form. In the next part, we relate the Rayleigh quotient of a hermitian matrix to its eigenvalues.

4.1.2 Theorem

Let A be a hermitian n × n matrix with ordered eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n. For any x ∈ ℂ^n\{0},

λ_n ≤ x^H A x / x^H x ≤ λ_1,

and, in particular,

λ_n = min_{x≠0} x^H A x / x^H x, λ_1 = max_{x≠0} x^H A x / x^H x.

Proof. Let A = U T U^H be the spectral decomposition of A, where the columns of U = (x_1 ⋯ x_n) are the orthonormal set of eigenvectors corresponding to λ_1, ⋯, λ_n, which make up the diagonal entries of the diagonal matrix T. As in the proof of Theorem 3.2.2.1, define y = U^H x; then we have

x^H A x / x^H x = x^H U T U^H x / x^H U U^H x = y^H T y / y^H y = ∑_{i=1}^n λ_i |y_i|² / ∑_{i=1}^n |y_i|².

Together with the fact that

λ_n ∑_{i=1}^n |y_i|² ≤ ∑_{i=1}^n λ_i |y_i|² ≤ λ_1 ∑_{i=1}^n |y_i|²,

and noting that the two bounds are attained at x = x_n and x = x_1 respectively, the proof is complete.

In fact, the implication of this theorem is that we may regard the problem of finding the largest and smallest eigenvalues of a hermitian matrix as a constrained optimization problem:

maximize: x^H A x
subject to: x^H x = 1

Below is a theorem that generalizes the above theorem to all eigenvalues of A.
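The bounds of Theorem 4.1.2 are easy to watch numerically. A sketch on a hypothetical real symmetric matrix A = [[2, 1], [1, 2]], with λ_1 = 3, λ_2 = 1 and eigenvectors (1, 1) and (1, −1) (`rayleigh` is an illustrative helper name):

```python
# For any x != 0, lambda_2 <= R_A(x) <= lambda_1, with equality at the
# eigenvectors.

A = [[2.0, 1.0], [1.0, 2.0]]
lam1, lam2 = 3.0, 1.0

def rayleigh(A, x):
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return sum(x[i] * Ax[i] for i in range(2)) / sum(xi * xi for xi in x)

# The quotient of every test vector stays inside [lambda_2, lambda_1]:
for x in ([1.0, 0.0], [2.0, -1.0], [1.0, 5.0], [-3.0, 2.0]):
    assert lam2 <= rayleigh(A, x) <= lam1

# The extremes are attained at the eigenvectors:
assert rayleigh(A, [1.0, 1.0]) == lam1
assert rayleigh(A, [1.0, -1.0]) == lam2
```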

4.1.3 Theorem (the Courant-Fischer min-max theorem)

Let A be an n × n hermitian matrix with ordered eigenvalues λ_1 ≥ ⋯ ≥ λ_n. Then

(a) λ_i = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x

(b) λ_i = min_{dim(V)=n−i+1} max_{x∈V, ‖x‖=1} x^H A x

Proof.

(a) Recall that x^H A x = y^H T y = ∑_{j=1}^n λ_j |y_j|², where y = U^H x, or equivalently, x = Uy. Note that the linear transformation from x to y is an isomorphism and there is no scaling, as U is unitary, so we may restate the constraint as dim(W) = i, y ∈ W, ‖y‖ = 1. Choosing W = span{e_1, ⋯, e_i}, it is easily verified that

λ_i = min_{y∈span{e_1,⋯,e_i}, ‖y‖=1} ∑_{j=1}^i λ_j |y_j|² ≤ max_{dim(W)=i} min_{y∈W, ‖y‖=1} y^H T y = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x.

Now it remains to prove that the left-hand side of the above relation is greater than or equal to the right-hand side, to finally get the equality. To prove it, we must show that every i-dimensional subspace V of ℂ^n contains a unit vector x such that

λ_i ≥ x^H A x,

and from the previous discussion, this is equivalent to showing that every i-dimensional subspace W of ℂ^n contains a unit vector y such that

λ_i ≥ y^H T y.

Now let Ω = span{e_i, ⋯, e_n}, with dimension n − i + 1, so it must have nontrivial intersection with every such W (the dimensions sum to n + 1 > n). Then let w be a unit vector in Ω ∩ W; we may write

w^H T w = ∑_{j=i}^n λ_j |w_j|² with ∑_{j=i}^n |w_j|² = 1,

so that w^H T w ≤ λ_i, since λ_j ≤ λ_i for j ≥ i. Thus the reverse inequality is proved, and we finally achieve the equality.

(b) This can be proved simply by replacing A by −A and using the fact that λ_i(−A) = −λ_{n−i+1}(A).

4.2 Some Eigenvalue Inequalities

In this section we introduce a few inequalities concerning eigenvalues, which can be applied to eigenvalue estimation and inference, and also to eigenvalue perturbation theory. It turns out that many of these inequalities can be proved using the min-max theorem derived in the last section.


4.2.1 Theorem (Weyl's inequality)

Let A, B ∈ M_{n×n} be hermitian matrices. Then for 1 ≤ i ≤ n, we have

λ_i(A) + λ_n(B) ≤ λ_i(A + B) ≤ λ_i(A) + λ_1(B).

Proof. First, we have

λ_i(A + B) = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H (A + B) x
= max_{dim(V)=i} min_{x∈V, ‖x‖=1} (x^H A x + x^H B x)
≥ max_{dim(V)=i} ( min_{x∈V, ‖x‖=1} x^H A x + min_{x∈V, ‖x‖=1} x^H B x )
≥ max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x + λ_n(B) = λ_i(A) + λ_n(B),

where the last inequality uses min_{x∈V, ‖x‖=1} x^H B x ≥ λ_n(B) from Theorem 4.1.2. This proves the left inequality; the right inequality can be proved in exactly the same manner.
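Weyl's inequality can be watched on hypothetical 2 × 2 symmetric matrices, where the ordered eigenvalues come straight from the quadratic formula (`eigs2` is an illustrative helper):

```python
# Check lambda_i(A) + lambda_n(B) <= lambda_i(A+B) <= lambda_i(A) + lambda_1(B).
import math

def eigs2(M):
    tr = M[0][0] + M[1][1]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    d = math.sqrt(tr*tr - 4*det)
    return ((tr + d)/2, (tr - d)/2)      # (lambda_1, lambda_2), descending

A = [[2.0, 1.0], [1.0, 2.0]]             # eigenvalues 3, 1
B = [[0.0, 2.0], [2.0, 0.0]]             # eigenvalues 2, -2
C = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

lA, lB, lC = eigs2(A), eigs2(B), eigs2(C)   # C has eigenvalues 5, -1
for i in range(2):
    assert lA[i] + lB[1] <= lC[i] <= lA[i] + lB[0]
```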

4.2.2 Corollary

Let A ∈ M_{n×n} be a hermitian matrix and let B ∈ M_{n×n} be a positive semidefinite matrix. Then for 1 ≤ i ≤ n, we have

λ_i(A) ≤ λ_i(A + B).

4.2.3 Theorem

Let A, B ∈ M_{n×n} be hermitian matrices. If 1 ≤ j_1 < ⋯ < j_k ≤ n, then

∑_{ℓ=1}^k λ_{j_ℓ}(A + B) ≤ ∑_{ℓ=1}^k λ_{j_ℓ}(A) + ∑_{ℓ=1}^k λ_ℓ(B).

4.2.4 Theorem (Cauchy Interlacing Theorem)

Let A ∈ M_{n×n} be a hermitian matrix with eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n, partitioned as follows:

A = [H B^*; B R],

where H ∈ M_{m×m} has eigenvalues θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_m. Then

λ_k ≥ θ_k ≥ λ_{k+n−m}.
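The simplest case of interlacing is a 2 × 2 hermitian matrix and its leading 1 × 1 principal submatrix H = [a_11], for which the theorem says λ_1 ≥ a_11 ≥ λ_2. A quick check on a hypothetical example:

```python
# A = [[2, 1], [1, 3]] has eigenvalues (5 +/- sqrt(5))/2; its 1x1 leading
# principal submatrix H = [2] has theta_1 = 2, which interlaces them.
import math

A = [[2.0, 1.0], [1.0, 3.0]]
tr, det = 5.0, 5.0
d = math.sqrt(tr*tr - 4*det)             # sqrt(5)
lam1, lam2 = (tr + d)/2, (tr - d)/2      # ~3.618, ~1.382

theta1 = A[0][0]                         # the single eigenvalue of H = [2]
assert lam2 <= theta1 <= lam1
```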


4.3 Application to Principal Component Analysis (PCA)

Principal component analysis (PCA) is a technique useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample's information.

4.3.1 Definition

Let X = (x_1, x_2, ⋯, x_p) be an n × p data matrix, with p being the number of variables and n being the number of observations of each variable; assume each column has been centered to have mean zero. Define the first principal component of the sample by the linear transformation

z_1 = Xa_1 = ∑_{i=1}^p a_{i1} x_i,

where the vector a_1 = (a_11, a_21, ⋯, a_p1)^T is chosen such that var[z_1] is maximal subject to a_1^T a_1 = 1. Likewise, the kth principal component is defined as the linear transformation

z_k = Xa_k = ∑_{i=1}^p a_{ik} x_i, k = 1, ⋯, p,

where the vector a_k = (a_1k, a_2k, ⋯, a_pk)^T is chosen such that var[z_k] is maximal subject to a_k^T a_k = 1 and to cov[z_k, z_l] = 0 ⟹ a_k^T a_l = 0, for k > l ≥ 1.

After some computation, we get var[z_k] = a_k^T S a_k, where S = (1/(n−1)) X^T X is the covariance matrix of the data matrix X. It is clear that S is nonnegative definite, thus having nonnegative eigenvalues. If we want to find the first k (k < p) principal components, then we form

Z = [z_1 ⋯ z_k] = X[a_1 ⋯ a_k] = XA.

The matrix Z is called the score matrix, while A is called the loading matrix.

4.3.2 Methods for implementing PCA

(a) Constrained optimization

To maximize a_1^T S a_1 subject to a_1^T a_1 = 1, we use the technique of Lagrange multipliers: we maximize the function

a_1^T S a_1 − μ(a_1^T a_1 − 1)

with respect to a_1. Differentiating with respect to a_1 and setting the derivative to zero,

d/da_1 (a_1^T S a_1 − μ(a_1^T a_1 − 1)) = 0 ⟹ 2Sa_1 − 2μa_1 = 0 ⟹ Sa_1 = μa_1.

From this step, it is obvious that μ is an eigenvalue of S (and of course, we can reach this step without the Lagrange multiplier, using the min-max theorem instead). So to maximize a_1^T S a_1 = μ, certainly we choose the largest eigenvalue λ_1 of S, and set a_1 to be a corresponding unit eigenvector; then we get our first principal component z_1 = Xa_1, which finishes our first step. To find the kth principal component, note that a_1, ⋯, a_p constitute an orthonormal basis, so to satisfy the zero-covariance constraint we have to carry out the optimization in the orthogonal complement of span{a_1, ⋯, a_{k−1}}; thus, according to the min-max theorem, max a_k^T S a_k = λ_k with z_k = Xa_k, where a_k is the unit eigenvector corresponding to λ_k. Finally, we conclude that Z = [z_1 ⋯ z_k] = X[a_1 ⋯ a_k].

(b) Spectral decomposition and singular value decomposition (SVD)

First, we draw the conclusion from (a) that if we write the spectral decomposition of S as

S = A T A^T,

where T = diag(λ_1, λ_2, ⋯, λ_p) with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p, then the first k columns of A make up the loading matrix that we want.

Next, let X = UΣV^T be the singular value decomposition of the data matrix X, and recall that X^T X = VΣ²V^T = (n − 1)S, so the columns of V are the unit eigenvectors of S. Letting A = V, we finally get Z = XA = XV = UΣV^T V = UΣ.

In practice, the singular value decomposition is the standard way to do PCA, since it avoids the trouble of computing X^T X.
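The covariance-eigenvector route of 4.3.2(a) can be run end to end on a tiny hypothetical data set. This is a pure-Python sketch for p = 2 variables (so the eigenproblem closes under the quadratic formula); a real implementation would call an SVD routine instead, as the section notes:

```python
# PCA by hand: covariance matrix -> top eigenpair -> first PC scores.
import math

# n = 4 observations of p = 2 variables, already mean-centered column-wise.
X = [[-3.0, -1.0],
     [-1.0, -1.0],
     [ 1.0,  1.0],
     [ 3.0,  1.0]]
n = len(X)

# Covariance matrix S = X^T X / (n - 1), a 2x2 symmetric matrix.
S = [[sum(X[r][i] * X[r][j] for r in range(n)) / (n - 1) for j in range(2)]
     for i in range(2)]

# Largest eigenvalue of S via the quadratic formula.
tr = S[0][0] + S[1][1]
det = S[0][0]*S[1][1] - S[0][1]*S[1][0]
d = math.sqrt(tr*tr - 4*det)
lam1 = (tr + d) / 2                      # = var of the first PC

# A unit eigenvector a1 for lam1, in closed form (valid since S[0][1] != 0).
a = [S[0][1], lam1 - S[0][0]]
norm = math.sqrt(a[0]**2 + a[1]**2)
a1 = [a[0]/norm, a[1]/norm]

# First principal component scores z1 = X a1; its variance equals lambda_1.
z1 = [row[0]*a1[0] + row[1]*a1[1] for row in X]
var_z1 = sum(z*z for z in z1) / (n - 1)
assert abs(var_z1 - lam1) < 1e-9
```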