Topics in Eigen-analysis - University of Hong Kong

Topics in Eigen-analysis

Lin Zanjiang

28 July 2014

Contents

1 Terminology
2 Some Basic Properties and Results
3 Eigen-properties of Hermitian Matrices
  3.1 Basic Theorems
  3.2 Quadratic Forms & Nonnegative Definite Matrices
    3.2.1 Definitions
    3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
4 Inequalities and Extremal Properties of Eigenvalues
  4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem
  4.2 Some Eigenvalue Inequalities
  4.3 Application to Principal Component Analysis (PCA)


1 Terminology

(1) ๐‘๐ด(๐‘ฅ) โˆถ= det(๐‘ฅ๐ผ โˆ’ ๐ด) โ‰ the characteristic polynomial of ๐ด

(2) The eigenvalues ๐œ† of ๐ด โˆถ= the roots of ๐‘๐ด(๐‘ฅ)

(3) The ๐œ† โˆ’eigenvectors ๐’™ โˆถ= the nonzero solutions to (๐œ†๐ผ โˆ’ ๐ด)๐’™ = ๐ŸŽ

(4) The eigenvalue-eigenvector equation: ๐ด๐’™ = ๐œ†๐’™

(5) ๐‘†๐ด(๐œ†) โˆถ= The eigenspace of a matrix ๐ด corresponding to the eigenvalue ๐œ†

(6) The characteristic equation: |๐‘ฅ๐ผ โˆ’ ๐ด| = 0

(7) Standard inner product: โŸจ๐’›, ๐’˜โŸฉ = ๐‘ง1๐‘ค1ฬ…ฬ…ฬ…ฬ… + โ‹ฏ+ ๐‘ง๐‘›๐‘ค๐‘›ฬ…ฬ… ฬ…ฬ… ๐’›, ๐’˜ โˆˆ โ„‚๐‘›

2 Some Basic Properties and Results

Theorem 2.1

(a) The eigenvalues of A are the same as the eigenvalues of A^T.

(b) A is singular if and only if at least one eigenvalue of A is equal to zero.

(c) The eigenvalues and corresponding geometric multiplicities of BAB^{−1} are the same as those of A, if B is a nonsingular matrix.

(d) The modulus of each eigenvalue of A is equal to 1 if A is an orthogonal matrix.

Theorem 2.2 (Revision)

Suppose that λ is an eigenvalue, with multiplicity r ≥ 1, of the n × n matrix A. Then

1 ≤ dim{S_A(λ)} ≤ r

Proof. If λ is an eigenvalue of A, then by definition an x ≠ 0 satisfying the equation Ax = λx exists, so clearly dim{S_A(λ)} ≥ 1. Now let k = dim{S_A(λ)}, and let x_1, ⋯, x_k be linearly independent eigenvectors corresponding to λ. Form a nonsingular n × n matrix X that has these k vectors as its first k columns; that is, X has the form X = [X_1 X_2], where X_1 = (x_1, ⋯, x_k) and X_2 is n × (n − k). Since each column of X_1 is an eigenvector of A corresponding to the eigenvalue λ, it follows that AX_1 = λX_1, and

X^{−1}X_1 = [I_k; 0],

which follows from the fact that X^{−1}X = I_n. As a result, we find that

X^{−1}AX = X^{−1}[AX_1 AX_2] = X^{−1}[λX_1 AX_2] = [λI_k B_1; 0 B_2],

where B_1 and B_2 form a partition of the matrix X^{−1}AX_2. If μ is an eigenvalue of X^{−1}AX, then

0 = |X^{−1}AX − μI_n| = |(λ − μ)I_k B_1; 0 B_2 − μI_{n−k}| = (λ − μ)^k |B_2 − μI_{n−k}|.

Thus λ must be an eigenvalue of X^{−1}AX with multiplicity at least k. The result follows because, by Theorem 2.1(c), the eigenvalues and corresponding geometric multiplicities of X^{−1}AX are the same as those of A.
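The upper bound dim{S_A(λ)} ≤ r can be strict. Here is a minimal numerical sketch in Python; the matrix is a hypothetical example, not one from the notes:

```python
# A = [[1, 1], [0, 1]] has p_A(x) = (x - 1)^2, so the eigenvalue lambda = 1
# has algebraic multiplicity r = 2, yet S_A(1) is only 1-dimensional.

A = [[1.0, 1.0],
     [0.0, 1.0]]

# p_A(x) = x^2 - tr(A) x + |A| = x^2 - 2x + 1 = (x - 1)^2
tr_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert (tr_A, det_A) == (2.0, 1.0)

# S_A(1) = null(A - I), and A - I = [[0, 1], [0, 0]]: the equation
# (A - I)(x1, x2)^T = (x2, 0)^T = 0 forces x2 = 0, so S_A(1) = span{(1, 0)}.
x = [1.0, 0.0]
Ax = [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]
assert Ax == x                      # (1, 0) is a 1-eigenvector

y = [0.0, 1.0]
Ay = [A[0][0]*y[0] + A[0][1]*y[1], A[1][0]*y[0] + A[1][1]*y[1]]
assert Ay == [1.0, 1.0]             # but (0, 1) is not an eigenvector
```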

Theorem 2.3

Let λ be an eigenvalue of the n × n matrix A, and let x be a corresponding eigenvector. Then,

(a) If k ≥ 1 is an integer, λ^k is an eigenvalue of A^k corresponding to the eigenvector x.

(b) If A is nonsingular, λ^{−1} is an eigenvalue of A^{−1} corresponding to the eigenvector x.

Proof.

(a) We prove this by induction. Clearly (a) holds when k = 1, since this is just the definition of eigenvalue and eigenvector. Next, if (a) holds for k − 1, that is, A^{k−1}x = λ^{k−1}x, then

A^k x = A(A^{k−1}x) = A(λ^{k−1}x) = λ^{k−1}(Ax) = λ^{k−1}(λx) = λ^k x.

(b) Premultiplying the equation Ax = λx by A^{−1} gives the equation

x = λA^{−1}x.

Since A is nonsingular, we know from Theorem 2.1(b) that λ ≠ 0, and so dividing both sides of the above equation by λ yields

A^{−1}x = λ^{−1}x,

which implies that A^{−1} has an eigenvalue λ^{−1} with corresponding eigenvector x.
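A quick numerical check of both parts of Theorem 2.3, on a hypothetical 2 × 2 example (A = [[2, 1], [1, 2]] has the eigenpair λ = 3, x = (1, 1); the helper `matvec` is just an illustrative name):

```python
# Verify A^2 x = lambda^2 x and A^{-1} x = lambda^{-1} x for a known eigenpair.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[2.0, 1.0], [1.0, 2.0]]
x = [1.0, 1.0]
lam = 3.0

# (a) with k = 2: A^2 x = lambda^2 x
A2x = matvec(A, matvec(A, x))
assert A2x == [lam**2 * xi for xi in x]     # [9, 9]

# (b) A^{-1} x = lambda^{-1} x; for a 2x2 matrix, A^{-1} = adj(A)/det(A)
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]     # 3
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]
Ainvx = matvec(Ainv, x)
assert Ainvx == [xi / lam for xi in x]      # [1/3, 1/3]
```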

Theorem 2.4

Let A be an n × n matrix with eigenvalues λ_1, ⋯, λ_n. Then

(a) tr(A) = ∑_{i=1}^n λ_i

(b) |A| = ∏_{i=1}^n λ_i

Proof. Express the characteristic equation |xI_n − A| = 0 in the polynomial form

x^n + α_{n−1}x^{n−1} + ⋯ + α_1 x + α_0 = 0.

To determine α_0, substitute x = 0 into the polynomial: α_0 = |(0)I_n − A| = |−A| = (−1)^n |A|. To determine α_{n−1}, we find the coefficient of x^{n−1}. Recall that the determinant is a sum of terms, each the product of one entry in each column (row), with row (column) positions running over all permutations of the integers (1, 2, ⋯, n), with the appropriate +/− signs. It is then easy to see that the only term involving at least n − 1 of the diagonal elements of (xI_n − A) is the one involving the product of all the diagonal elements. Since this term corresponds to the identity permutation, which is even (a trivial composition of zero transpositions), its sign is +1; therefore α_{n−1} is the coefficient of x^{n−1} in

(x − a_11)(x − a_22)⋯(x − a_nn),

which is clearly −tr(A). Finally, since λ_1, ⋯, λ_n are the roots of the characteristic equation, it follows that

|xI_n − A| = (x − λ_1)(x − λ_2)⋯(x − λ_n),

whose constant term is (−1)^n ∏_{i=1}^n λ_i and whose x^{n−1} coefficient is −∑_{i=1}^n λ_i. Matching coefficients, we find that

|A| = ∏_{i=1}^n λ_i, tr(A) = ∑_{i=1}^n λ_i,

which completes the proof.
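To see the theorem in action, here is a small check in Python on a hypothetical 2 × 2 matrix, whose eigenvalues we can get in closed form from the quadratic formula:

```python
# For A = [[4, 2], [1, 3]], p_A(x) = x^2 - 7x + 10, with roots 5 and 2.
import math

A = [[4.0, 2.0], [1.0, 3.0]]
tr = A[0][0] + A[1][1]                       # 7
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]      # 10

# Roots of x^2 - tr*x + det via the quadratic formula:
disc = math.sqrt(tr*tr - 4*det)              # sqrt(9) = 3
lam1, lam2 = (tr + disc)/2, (tr - disc)/2    # 5, 2

assert lam1 + lam2 == tr                     # tr(A) = sum of eigenvalues
assert lam1 * lam2 == det                    # |A| = product of eigenvalues
```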


3 Eigen-properties of Hermitian Matrices

3.1 Basic Theorems

Theorem 3.1.1

Let A ∈ M_{n×n} be a matrix. Then A is hermitian if and only if

⟨Ax, y⟩ = ⟨x, Ay⟩

for all vectors x, y ∈ ℂ^n.

Proof.

(1) If A is hermitian, then ⟨Ax, y⟩ = y^H Ax = y^H A^H x = (Ay)^H x = ⟨x, Ay⟩, which proves the "⟹" part.

(2) For the converse direction, let x and y run over the standard basis vectors e_1, ⋯, e_n of ℂ^n. Taking x = e_i and y = e_j, the identity gives a_ji = ā_ij for all i, j = 1, ⋯, n, that is, A = A^H ⟹ A is hermitian.

Theorem 3.1.2 (Schur's Theorem Revisited)

Let A ∈ M_{n×n} be a matrix. There exists a unitary matrix U such that

U^H A U = T

is upper triangular; this is Schur's theorem. It is just the complex counterpart of the Triangulation Theorem we learnt in class. With a minor rearrangement, we may write

A = U T U^H,

and this is called the Schur decomposition, with the diagonal entries of T being the eigenvalues of A.

3.1.3 Theorem (The Spectral Theorem Revisited)

If the A in the last theorem turns out to be hermitian, then the corresponding T becomes diagonal. This is called the Spectral Theorem. Similarly, it is the complex extension of the Principal Axis Theorem we learnt in class. Again, with the same rearrangement, we may rewrite it as

A = U T U^H.

This is called the spectral decomposition. These eigenvalue decompositions are closely related to the singular value decomposition, which applies even to nonsquare matrices; in particular, the spectral decomposition of a nonnegative definite matrix is a singular value decomposition.

3.1.4 Theorem

Let A ∈ M_{n×n} be a hermitian matrix. Then

(a) x^H A x is real for all x ∈ ℂ^n.

(b) All eigenvalues of A are real.

(c) S^H A S is hermitian for all S ∈ M_{n×n}.

(d) Eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

(e) It is possible to construct a set of n orthonormal eigenvectors of A.

Proof.

(a) The complex conjugate of the scalar x^H A x is (x^H A x)^H = x^H A^H x = x^H A x; that is, x^H A x equals its own complex conjugate and hence is real.

(b) (1) If Ax = λx and x^H x = k ∈ ℝ^+, then λ = (1/k) x^H (λx) = (1/k) x^H A x, which is real by (a).

(2) Alternative proof for (b): let λ, μ be two eigenvalues of A, with corresponding eigenvectors x, y, so that Ax = λx and Ay = μy. According to Theorem 3.1.1, we have λ⟨x, y⟩ = ⟨λx, y⟩ = ⟨Ax, y⟩ = ⟨x, Ay⟩ = ⟨x, μy⟩ = μ̄⟨x, y⟩. In the case where μ = λ and y = x, this becomes λ⟨x, x⟩ = λ̄⟨x, x⟩, which in turn implies that λ = λ̄, since we know ⟨x, x⟩ = ‖x‖² > 0 for a nonzero eigenvector x. Therefore λ must be real, and similarly μ is real.

(c) (S^H A S)^H = S^H A^H S = S^H A S, so S^H A S is always hermitian.

(d) Following the discussion in (b)(2), and using the fact that μ is real, we get the equation λ⟨x, y⟩ = μ⟨x, y⟩; so if λ ≠ μ, it follows immediately that ⟨x, y⟩ = 0, thus implying that x and y are orthogonal.

(e) Following from the spectral decomposition, we rewrite it as

AU = UT
⟺ A[x_1 ⋯ x_n] = [x_1 ⋯ x_n] diag(λ_1, ⋯, λ_n)
⟺ [Ax_1 ⋯ Ax_n] = [λ_1 x_1 ⋯ λ_n x_n].

Therefore, it can easily be seen that x_1, ⋯, x_n are the eigenvectors corresponding to λ_1, ⋯, λ_n. As U is unitary, it follows that these eigenvectors are orthonormal.
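Parts (a) and (b) can be checked numerically with Python's built-in complex numbers, on a hypothetical hermitian matrix A = [[2, i], [−i, 2]] (note a_21 = ā_12):

```python
# (a) x^H A x is real; (b) the eigenvalues of a hermitian matrix are real.

A = [[2 + 0j, 1j],
     [-1j, 2 + 0j]]

def matvec(M, v):  # illustrative helper
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

# (a) x^H A x for x = (1, i): here it comes out to exactly 2.
x = [1 + 0j, 1j]
Ax = matvec(A, x)
q = sum(x[i].conjugate() * Ax[i] for i in range(2))
assert q.imag == 0.0

# (b) p_A(t) = (2 - t)^2 - |a12|^2 = t^2 - 4t + 3, a real quadratic with
# real roots t = 3 and t = 1.
tr = (A[0][0] + A[1][1]).real                      # 4
det_ = (A[0][0]*A[1][1] - A[0][1]*A[1][0]).real    # 4 - (i)(-i) = 3
disc = tr*tr - 4*det_                              # 4 > 0: two real roots
lam1, lam2 = (tr + disc**0.5)/2, (tr - disc**0.5)/2
assert (lam1, lam2) == (3.0, 1.0)
```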

3.2 Quadratic Forms & Nonnegative Definite Matrices

3.2.1 Definitions

(a) Let A ∈ M_{n×n} be a symmetric matrix (a hermitian matrix with real entries) and let x denote an n × 1 column vector; then Q = x^T A x is said to be a quadratic form. Observe that

Q = x^T A x = (x_1 ⋯ x_n) [a_11 ⋯ a_1n; ⋮ ⋱ ⋮; a_n1 ⋯ a_nn] (x_1; ⋮; x_n)
  = (x_1 ⋯ x_n) (∑_i a_1i x_i; ⋮; ∑_i a_ni x_i)
  = ∑_{i,j} a_ij x_i x_j

(b) Let A ∈ M_{n×n} be a symmetric matrix. Then A is

(1) positive definite if Q = x^T A x > 0 for all nonzero x ∈ ℝ^n;

(2) nonnegative definite (positive semidefinite) if Q = x^T A x ≥ 0 for all x ∈ ℝ^n;

(3) negative definite if Q = x^T A x < 0 for all nonzero x ∈ ℝ^n;

(4) nonpositive definite (negative semidefinite) if Q = x^T A x ≤ 0 for all x ∈ ℝ^n;

(5) indefinite if Q > 0 for some x while Q < 0 for some other x.

We are only interested in the positive and nonnegative cases, because all theorems are similar for the negative and nonpositive cases.
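The expansion Q = ∑_{i,j} a_ij x_i x_j can be sanity-checked against the matrix product directly; a minimal sketch on a hypothetical symmetric 2 × 2 matrix:

```python
# Two routes to the quadratic form give the same value.

A = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, 2.0]

# Matrix route: x^T (A x)
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
Q_matrix = sum(x[i] * Ax[i] for i in range(2))

# Double-sum route: sum over i, j of a_ij * x_i * x_j
Q_sum = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

assert Q_matrix == Q_sum == 18.0   # 2*1 + 1*2 + 1*2 + 3*4 = 18
```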

3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices

3.2.2.1 Theorem

Let λ_1, ⋯, λ_n be the eigenvalues of the n × n symmetric matrix A. Then

(a) A is positive definite if and only if λ_i > 0 for all i;

(b) A is nonnegative definite if and only if λ_i ≥ 0 for all i.

Proof.

(a) Let the columns of U = (x_1 ⋯ x_n) be a set of orthonormal eigenvectors of A corresponding to the eigenvalues λ_1, ⋯, λ_n, so that A = U T U^T, where T = diag(λ_1, ⋯, λ_n). If A is positive definite, then x^T A x > 0 for all x ≠ 0, so in particular, choosing x = x_i, we have

x_i^T A x_i = x_i^T (λ_i x_i) = λ_i x_i^T x_i = λ_i > 0.

Conversely, if λ_i > 0 for all i, then for any x ≠ 0 define y = U^T x (which is nonzero since U is nonsingular), and note that

x^T A x = x^T U T U^T x = y^T T y = ∑_{i=1}^n λ_i y_i²

has to be positive, since the λ_i are all positive and at least one of the y_i² is positive because y ≠ 0.

(b) By a similar argument as in (a), it is easy to show that A is nonnegative definite if and only if λ_i ≥ 0 for all i.
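A numerical illustration of part (a), again on a hypothetical 2 × 2 example (`quad_form` is an illustrative helper name; spot-checking a few vectors is of course not a proof):

```python
# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1, both positive, so x^T A x
# should be positive for every nonzero x.
import math

A = [[2.0, 1.0], [1.0, 2.0]]
tr, det = 4.0, 3.0
disc = math.sqrt(tr*tr - 4*det)          # sqrt(4) = 2
lams = ((tr + disc)/2, (tr - disc)/2)    # (3.0, 1.0)
assert min(lams) > 0

def quad_form(A, x):
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

# Positivity on a few nonzero vectors:
for x in ([1.0, 0.0], [0.0, -1.0], [1.0, -1.0], [-2.0, 3.0]):
    assert quad_form(A, x) > 0
```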

3.2.2.2 Theorem

Let T be an m × n real matrix with rank(T) = r. Then

(a) T^T T has r positive eigenvalues. It is always nonnegative definite, and it is positive definite if r = n.

(b) The positive eigenvalues of T^T T are equal to the positive eigenvalues of T T^T.

Proof.

(a) For any nonzero n × 1 vector x, let y = Tx; then

x^T T^T T x = y^T y = ∑_{i=1}^m y_i²

is nonnegative, so T^T T is nonnegative definite, and thus by Theorem 3.2.2.1(b) all of its eigenvalues are nonnegative. Further, observe that x ≠ 0 is an eigenvector of T^T T corresponding to a zero eigenvalue if and only if y = Tx = 0, in which case the expression above equals zero. Therefore the number of zero eigenvalues equals the dimension of null(T), which is n − r, so (a) is proved.

(b) Let λ > 0 be an eigenvalue of T^T T with multiplicity h. Since the n × n matrix T^T T is symmetric, we can find an n × h matrix X, whose columns are orthonormal, satisfying

T^T T X = λX.

Let Y = TX and observe that

T T^T Y = T T^T T X = T(λX) = λTX = λY,

so λ is also an eigenvalue of T T^T, with multiplicity also h, because

rank(Y) = rank(TX) = rank((TX)^T TX) = rank(X^T T^T T X) = rank(λX^T X) = rank(λI_h) = h.

So the proof is done.

4 Inequalities and Extremal Properties of Eigenvalues

4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem

In this section, we investigate some extremal properties of the eigenvalues of a hermitian matrix, and see how to turn the problem of finding the eigenvalues into a constrained optimization problem.

4.1.1 Definition

Let A ∈ M_{n×n} be a hermitian matrix. Then the Rayleigh quotient of A, denoted R_A(x), is a function from ℂ^n\{0} to ℝ, defined as follows:

R_A(x) = x^H A x / x^H x

It is not difficult to see that when the norm of x satisfies ‖x‖ = 1, the Rayleigh quotient of A equals its quadratic form. In the next part, we relate the Rayleigh quotient of a hermitian matrix to its eigenvalues.

4.1.2 Theorem

Let A be a hermitian n × n matrix with ordered eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n. For any x ∈ ℂ^n\{0},

λ_n ≤ x^H A x / x^H x ≤ λ_1,

and, in particular,

λ_n = min_{x≠0} x^H A x / x^H x, λ_1 = max_{x≠0} x^H A x / x^H x.

Proof. Let A = U T U^H be the spectral decomposition of A, where the columns of U = (x_1 ⋯ x_n) are the orthonormal set of eigenvectors corresponding to λ_1, ⋯, λ_n, which make up the diagonal entries of the diagonal matrix T. As in the proof of Theorem 3.2.2.1, define y = U^H x; then we have

x^H A x / x^H x = x^H U T U^H x / x^H U U^H x = y^H T y / y^H y = ∑_{i=1}^n λ_i |y_i|² / ∑_{i=1}^n |y_i|².

Together with the fact that

λ_n ∑_{i=1}^n |y_i|² ≤ ∑_{i=1}^n λ_i |y_i|² ≤ λ_1 ∑_{i=1}^n |y_i|²,

and noting that the two bounds are attained at x = x_n and x = x_1 respectively, the proof is complete.

In fact, the implication of this theorem is that we may regard the problem of finding the largest and smallest eigenvalues of a hermitian matrix as a constrained optimization problem:

maximize: x^H A x
subject to: x^H x = 1

Below is a theorem that generalizes the above theorem to all eigenvalues of A.
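The bounds of Theorem 4.1.2 are easy to watch numerically. A sketch on a hypothetical real symmetric matrix A = [[2, 1], [1, 2]], with λ_1 = 3, λ_2 = 1 and eigenvectors (1, 1) and (1, −1) (`rayleigh` is an illustrative helper name):

```python
# For any x != 0, lambda_2 <= R_A(x) <= lambda_1, with equality at the
# eigenvectors.

A = [[2.0, 1.0], [1.0, 2.0]]
lam1, lam2 = 3.0, 1.0

def rayleigh(A, x):
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return sum(x[i] * Ax[i] for i in range(2)) / sum(xi * xi for xi in x)

# The quotient of every test vector stays inside [lambda_2, lambda_1]:
for x in ([1.0, 0.0], [2.0, -1.0], [1.0, 5.0], [-3.0, 2.0]):
    assert lam2 <= rayleigh(A, x) <= lam1

# The extremes are attained at the eigenvectors:
assert rayleigh(A, [1.0, 1.0]) == lam1
assert rayleigh(A, [1.0, -1.0]) == lam2
```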

4.1.3 Theorem (the Courant-Fischer min-max theorem)

Let A be an n × n hermitian matrix with ordered eigenvalues λ_1 ≥ ⋯ ≥ λ_n. Then

(a) λ_i = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x

(b) λ_i = min_{dim(V)=n−i+1} max_{x∈V, ‖x‖=1} x^H A x

Proof.

(a) Recall that x^H A x = y^H T y = ∑_{j=1}^n λ_j |y_j|², where y = U^H x, or equivalently, x = Uy. Note that the linear transformation from x to y is an isomorphism and there is no scaling, as U is unitary, so we may restate the constraint as dim(W) = i, y ∈ W, ‖y‖ = 1. Choosing W = span{e_1, ⋯, e_i}, it is easily verified that

λ_i = min_{y∈span{e_1,⋯,e_i}, ‖y‖=1} ∑_{j=1}^i λ_j |y_j|² ≤ max_{dim(W)=i} min_{y∈W, ‖y‖=1} y^H T y = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x.

Now it remains to prove that the left-hand side of the above relation is greater than or equal to the right-hand side, to finally get the equality. To prove it, we must show that every i-dimensional subspace V of ℂ^n contains a unit vector x such that

λ_i ≥ x^H A x,

and from the previous discussion, this is equivalent to showing that every i-dimensional subspace W of ℂ^n contains a unit vector y such that

λ_i ≥ y^H T y.

Now let Ω = span{e_i, ⋯, e_n}, with dimension n − i + 1, so it must have nontrivial intersection with every such W (the dimensions sum to n + 1 > n). Then let w be a unit vector in Ω ∩ W; we may write

w^H T w = ∑_{j=i}^n λ_j |w_j|² with ∑_{j=i}^n |w_j|² = 1,

so that w^H T w ≤ λ_i, since λ_j ≤ λ_i for j ≥ i. Thus the reverse inequality is proved, and we finally achieve the equality.

(b) This can be proved simply by replacing A by −A and using the fact that λ_i(−A) = −λ_{n−i+1}(A).

4.2 Some Eigenvalue Inequalities

In this section we introduce a few inequalities concerning eigenvalues, which can be applied to eigenvalue estimation and inference, and also to eigenvalue perturbation theory. It turns out that many of these inequalities can be proved using the min-max theorem derived in the last section.


4.2.1 Theorem (Weyl's inequality)

Let A, B ∈ M_{n×n} be hermitian matrices. Then for 1 ≤ i ≤ n, we have

λ_i(A) + λ_n(B) ≤ λ_i(A + B) ≤ λ_i(A) + λ_1(B).

Proof. First, we have

λ_i(A + B) = max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H (A + B) x
= max_{dim(V)=i} min_{x∈V, ‖x‖=1} (x^H A x + x^H B x)
≥ max_{dim(V)=i} ( min_{x∈V, ‖x‖=1} x^H A x + min_{x∈V, ‖x‖=1} x^H B x )
≥ max_{dim(V)=i} min_{x∈V, ‖x‖=1} x^H A x + λ_n(B) = λ_i(A) + λ_n(B),

where the last inequality uses min_{x∈V, ‖x‖=1} x^H B x ≥ λ_n(B) from Theorem 4.1.2. This proves the left inequality; the right inequality can be proved in exactly the same manner.
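Weyl's inequality can be watched on hypothetical 2 × 2 symmetric matrices, where the ordered eigenvalues come straight from the quadratic formula (`eigs2` is an illustrative helper):

```python
# Check lambda_i(A) + lambda_n(B) <= lambda_i(A+B) <= lambda_i(A) + lambda_1(B).
import math

def eigs2(M):
    tr = M[0][0] + M[1][1]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    d = math.sqrt(tr*tr - 4*det)
    return ((tr + d)/2, (tr - d)/2)      # (lambda_1, lambda_2), descending

A = [[2.0, 1.0], [1.0, 2.0]]             # eigenvalues 3, 1
B = [[0.0, 2.0], [2.0, 0.0]]             # eigenvalues 2, -2
C = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

lA, lB, lC = eigs2(A), eigs2(B), eigs2(C)   # C has eigenvalues 5, -1
for i in range(2):
    assert lA[i] + lB[1] <= lC[i] <= lA[i] + lB[0]
```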

4.2.2 Corollary

Let A ∈ M_{n×n} be a hermitian matrix and let B ∈ M_{n×n} be a positive semidefinite matrix. Then for 1 ≤ i ≤ n, we have

λ_i(A) ≤ λ_i(A + B).

4.2.3 Theorem

Let A, B ∈ M_{n×n} be hermitian matrices. If 1 ≤ j_1 < ⋯ < j_k ≤ n, then

∑_{ℓ=1}^k λ_{j_ℓ}(A + B) ≤ ∑_{ℓ=1}^k λ_{j_ℓ}(A) + ∑_{ℓ=1}^k λ_ℓ(B).

4.2.4 Theorem (Cauchy Interlacing Theorem)

Let A ∈ M_{n×n} be a hermitian matrix with eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n, partitioned as follows:

A = [H B^*; B R],

where H ∈ M_{m×m} has eigenvalues θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_m. Then

λ_k ≥ θ_k ≥ λ_{k+n−m}.
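The simplest case of interlacing is a 2 × 2 hermitian matrix and its leading 1 × 1 principal submatrix H = [a_11], for which the theorem says λ_1 ≥ a_11 ≥ λ_2. A quick check on a hypothetical example:

```python
# A = [[2, 1], [1, 3]] has eigenvalues (5 +/- sqrt(5))/2; its 1x1 leading
# principal submatrix H = [2] has theta_1 = 2, which interlaces them.
import math

A = [[2.0, 1.0], [1.0, 3.0]]
tr, det = 5.0, 5.0
d = math.sqrt(tr*tr - 4*det)             # sqrt(5)
lam1, lam2 = (tr + d)/2, (tr - d)/2      # ~3.618, ~1.382

theta1 = A[0][0]                         # the single eigenvalue of H = [2]
assert lam2 <= theta1 <= lam1
```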


4.3 Application to Principal Component Analysis (PCA)

Principal component analysis (PCA) is a technique useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample's information.

4.3.1 Definition

Let X = (x_1, x_2, ⋯, x_p) be an n × p data matrix, with p being the number of variables and n being the number of observations of each variable; assume each column has been centered to have mean zero. Define the first principal component of the sample by the linear transformation

z_1 = Xa_1 = ∑_{i=1}^p a_{i1} x_i,

where the vector a_1 = (a_11, a_21, ⋯, a_p1)^T is chosen such that var[z_1] is maximal subject to a_1^T a_1 = 1. Likewise, the kth principal component is defined as the linear transformation

z_k = Xa_k = ∑_{i=1}^p a_{ik} x_i, k = 1, ⋯, p,

where the vector a_k = (a_1k, a_2k, ⋯, a_pk)^T is chosen such that var[z_k] is maximal subject to a_k^T a_k = 1 and to cov[z_k, z_l] = 0 ⟹ a_k^T a_l = 0, for k > l ≥ 1.

After some computation, we get var[z_k] = a_k^T S a_k, where S = (1/(n−1)) X^T X is the covariance matrix of the data matrix X. It is clear that S is nonnegative definite, thus having nonnegative eigenvalues. If we want to find the first k (k < p) principal components, then we form

Z = [z_1 ⋯ z_k] = X[a_1 ⋯ a_k] = XA.

The matrix Z is called the score matrix, while A is called the loading matrix.

4.3.2 Methods for implementing PCA

(a) Constrained optimization

To maximize a_1^T S a_1 subject to a_1^T a_1 = 1, we use the technique of Lagrange multipliers: we maximize the function

a_1^T S a_1 − μ(a_1^T a_1 − 1)

with respect to a_1. Differentiating with respect to a_1 and setting the derivative to zero,

d/da_1 (a_1^T S a_1 − μ(a_1^T a_1 − 1)) = 0 ⟹ 2Sa_1 − 2μa_1 = 0 ⟹ Sa_1 = μa_1.

From this step, it is obvious that μ is an eigenvalue of S (and of course, we can reach this step without the Lagrange multiplier, using the min-max theorem instead). So to maximize a_1^T S a_1 = μ, certainly we choose the largest eigenvalue λ_1 of S, and set a_1 to be a corresponding unit eigenvector; then we get our first principal component z_1 = Xa_1, which finishes our first step. To find the kth principal component, note that a_1, ⋯, a_p constitute an orthonormal basis, so to satisfy the zero-covariance constraint we have to carry out the optimization in the orthogonal complement of span{a_1, ⋯, a_{k−1}}; thus, according to the min-max theorem, max a_k^T S a_k = λ_k with z_k = Xa_k, where a_k is the unit eigenvector corresponding to λ_k. Finally, we conclude that Z = [z_1 ⋯ z_k] = X[a_1 ⋯ a_k].

(b) Spectral decomposition and singular value decomposition (SVD)

First, we draw the conclusion from (a) that if we write the spectral decomposition of S as

S = A T A^T,

where T = diag(λ_1, λ_2, ⋯, λ_p) with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p, then the first k columns of A make up the loading matrix that we want.

Next, let X = UΣV^T be the singular value decomposition of the data matrix X, and recall that X^T X = VΣ²V^T = (n − 1)S, so the columns of V are the unit eigenvectors of S. Letting A = V, we finally get Z = XA = XV = UΣV^T V = UΣ.

In practice, the singular value decomposition is the standard way to do PCA, since it avoids the trouble of computing X^T X.
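The covariance-eigenvector route of 4.3.2(a) can be run end to end on a tiny hypothetical data set. This is a pure-Python sketch for p = 2 variables (so the eigenproblem closes under the quadratic formula); a real implementation would call an SVD routine instead, as the section notes:

```python
# PCA by hand: covariance matrix -> top eigenpair -> first PC scores.
import math

# n = 4 observations of p = 2 variables, already mean-centered column-wise.
X = [[-3.0, -1.0],
     [-1.0, -1.0],
     [ 1.0,  1.0],
     [ 3.0,  1.0]]
n = len(X)

# Covariance matrix S = X^T X / (n - 1), a 2x2 symmetric matrix.
S = [[sum(X[r][i] * X[r][j] for r in range(n)) / (n - 1) for j in range(2)]
     for i in range(2)]

# Largest eigenvalue of S via the quadratic formula.
tr = S[0][0] + S[1][1]
det = S[0][0]*S[1][1] - S[0][1]*S[1][0]
d = math.sqrt(tr*tr - 4*det)
lam1 = (tr + d) / 2                      # = var of the first PC

# A unit eigenvector a1 for lam1, in closed form (valid since S[0][1] != 0).
a = [S[0][1], lam1 - S[0][0]]
norm = math.sqrt(a[0]**2 + a[1]**2)
a1 = [a[0]/norm, a[1]/norm]

# First principal component scores z1 = X a1; its variance equals lambda_1.
z1 = [row[0]*a1[0] + row[1]*a1[1] for row in X]
var_z1 = sum(z*z for z in z1) / (n - 1)
assert abs(var_z1 - lam1) < 1e-9
```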