Topics in Eigen-analysis
Lin Zanjiang
28 July 2014
Contents
1 Terminology
2 Some Basic Properties and Results
3 Eigen-properties of Hermitian Matrices
3.1 Basic Theorems
3.2 Quadratic Forms & Nonnegative Definite Matrices
3.2.1 Definitions
3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
4 Inequalities and Extremal Properties of Eigenvalues
4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem
4.2 Some Eigenvalue Inequalities
4.3 Application to Principal Component Analysis (PCA)
1 Terminology
(1) $p_A(x) := \det(xI - A)$: the characteristic polynomial of $A$
(2) The eigenvalues $\lambda$ of $A$ := the roots of $p_A(x)$
(3) The $\lambda$-eigenvectors $x$ := the nonzero solutions to $(\lambda I - A)x = 0$
(4) The eigenvalue-eigenvector equation: $Ax = \lambda x$
(5) $S_A(\lambda)$ := the eigenspace of the matrix $A$ corresponding to the eigenvalue $\lambda$
(6) The characteristic equation: $|xI - A| = 0$
(7) Standard inner product: $\langle z, w \rangle = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n$ for $z, w \in \mathbb{C}^n$
2 Some Basic Properties and Results
Theorem 2.1
(a) The eigenvalues of $A$ are the same as the eigenvalues of $A^T$.
(b) $A$ is singular if and only if at least one eigenvalue of $A$ is equal to zero.
(c) The eigenvalues and corresponding geometric multiplicities of $BAB^{-1}$ are the same as those of $A$, if $B$ is a nonsingular matrix.
(d) The modulus of each eigenvalue of $A$ is equal to 1 if $A$ is an orthogonal matrix.
Theorem 2.2 (revision)
Suppose that $\lambda$ is an eigenvalue, with multiplicity $m \geq 1$, of the $n \times n$ matrix $A$. Then
$$1 \leq \dim\{S_A(\lambda)\} \leq m$$
Proof. If $\lambda$ is an eigenvalue of $A$, then by definition an $x \neq 0$ satisfying the equation $Ax = \lambda x$ exists, and so clearly $\dim\{S_A(\lambda)\} \geq 1$. Now let $k = \dim\{S_A(\lambda)\}$, and let $x_1, \cdots, x_k$ be linearly independent eigenvectors corresponding to $\lambda$. Form a nonsingular $n \times n$ matrix $X$ that has these $k$ vectors as its first $k$ columns; that is, $X$ has the form $X = [X_1\ X_2]$, where $X_1 = (x_1, \cdots, x_k)$ and $X_2$ is $n \times (n-k)$. Since each column of $X_1$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, it follows that $AX_1 = \lambda X_1$, and
$$X^{-1}X_1 = \begin{bmatrix} I_k \\ (0) \end{bmatrix},$$
which follows from the fact that $X^{-1}X = I_n$. As a result, we find that
$$X^{-1}AX = X^{-1}[AX_1\ \ AX_2] = X^{-1}[\lambda X_1\ \ AX_2] = \begin{bmatrix} \lambda I_k & B_1 \\ (0) & B_2 \end{bmatrix},$$
where $B_1$ and $B_2$ are a partition of the matrix $X^{-1}AX_2$. If $u$ is an eigenvalue of $X^{-1}AX$, then
$$0 = |X^{-1}AX - uI_n| = \begin{vmatrix} (\lambda - u)I_k & B_1 \\ (0) & B_2 - uI_{n-k} \end{vmatrix} = (\lambda - u)^k\,|B_2 - uI_{n-k}|.$$
Thus, $\lambda$ must be an eigenvalue of $X^{-1}AX$ with multiplicity at least $k$. The result follows because, by Theorem 2.1(c), the eigenvalues and corresponding geometric multiplicities of $X^{-1}AX$ are the same as those of $A$.
Theorem 2.3
Let $\lambda$ be an eigenvalue of the $n \times n$ matrix $A$, and let $x$ be a corresponding eigenvector. Then,
(a) If $k \geq 1$ is an integer, $\lambda^k$ is an eigenvalue of $A^k$ corresponding to the eigenvector $x$.
(b) If $A$ is nonsingular, $\lambda^{-1}$ is an eigenvalue of $A^{-1}$ corresponding to the eigenvector $x$.
Proof.
(a) We prove this by induction. Clearly, (a) holds when $k = 1$, as this is just the definition of eigenvalue and eigenvector. Next, if (a) holds for $k - 1$, that is, $A^{k-1}x = \lambda^{k-1}x$, then
$$A^k x = A(A^{k-1}x) = A(\lambda^{k-1}x) = \lambda^{k-1}(Ax) = \lambda^{k-1}(\lambda x) = \lambda^k x$$
(b) Premultiplying the equation $Ax = \lambda x$ by $A^{-1}$ gives
$$x = \lambda A^{-1}x$$
Since $A$ is nonsingular, we know from Theorem 2.1(b) that $\lambda \neq 0$, and so dividing both sides of the above equation by $\lambda$ yields
$$A^{-1}x = \lambda^{-1}x,$$
which implies that $A^{-1}$ has the eigenvalue $\lambda^{-1}$ with corresponding eigenvector $x$.
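Both parts of Theorem 2.3 are easy to check numerically; the sketch below (an illustration added here, not part of the original notes) verifies them with NumPy:

```python
import numpy as np

# A small symmetric matrix, so its eigenvectors are well conditioned.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, V = np.linalg.eig(A)
x = V[:, 0]          # eigenvector for eigenvalue lam[0]

# (a) A^k x = lambda^k x for an integer k >= 1
k = 3
assert np.allclose(np.linalg.matrix_power(A, k) @ x, lam[0]**k * x)

# (b) A^{-1} x = lambda^{-1} x when A is nonsingular
assert np.allclose(np.linalg.inv(A) @ x, x / lam[0])
```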
Theorem 2.4
Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \cdots, \lambda_n$. Then
(a) $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$
(b) $|A| = \prod_{i=1}^{n} \lambda_i$
Proof.
Express the characteristic equation $|xI_n - A| = 0$ in the polynomial form
$$x^n + \alpha_{n-1}x^{n-1} + \cdots + \alpha_1 x + \alpha_0 = 0$$
To determine $\alpha_0$, substitute $x = 0$ into the left-hand side, which gives $\alpha_0 = |(0)I_n - A| = |-A| = (-1)^n|A|$. To determine $\alpha_{n-1}$, we find the coefficient of $x^{n-1}$. Recall that the determinant is a sum of terms, each a product of one entry from each column (row), with row (column) positions ranging over all permutations of the integers $(1, 2, \cdots, n)$, with the appropriate $+/-$ signs. It is then easy to see that the only term involving at least $n - 1$ of the diagonal elements of $(xI_n - A)$ is the one involving the product of all the diagonal elements. Since this term corresponds to the identity permutation, which is even (a trivial composition of zero transpositions), its sign is $+1$; therefore $\alpha_{n-1}$ is the coefficient of $x^{n-1}$ in
$$(x - a_{11})(x - a_{22})\cdots(x - a_{nn}),$$
which is clearly $-\mathrm{tr}(A)$. Finally, since $\lambda_1, \cdots, \lambda_n$ are the roots of the characteristic equation, it follows that
$$|xI_n - A| = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$$
Matching coefficients (the constant term of the right-hand side is $(-1)^n\prod\lambda_i$, and the $x^{n-1}$ coefficient is $-\sum\lambda_i$), we find that
$$|A| = \prod_{i=1}^{n}\lambda_i, \qquad \mathrm{tr}(A) = \sum_{i=1}^{n}\lambda_i,$$
which completes the proof.
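Theorem 2.4 can be verified numerically; a short check (added here as an illustration, not from the original notes) follows:

```python
import numpy as np

# Any square matrix works; eigenvalues may be complex for a nonsymmetric A,
# but they come in conjugate pairs, so the sum and product are real.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])
lam = np.linalg.eigvals(A)

# (a) trace equals the sum of the eigenvalues
assert np.isclose(np.trace(A), lam.sum().real)
# (b) determinant equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(lam).real)
```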
3 Eigen-properties of Hermitian Matrices
3.1 Basic Theorems
Theorem 3.1.1
Let $A \in M_{n \times n}$ be a matrix. Then $A$ is Hermitian if and only if
$$\langle Ax, y \rangle = \langle x, Ay \rangle$$
for all vectors $x, y \in \mathbb{C}^n$.
Proof.
(1) If $A$ is Hermitian, then $\langle Ax, y \rangle = y^H A x = y^H A^H x = (Ay)^H x = \langle x, Ay \rangle$, which proves the "$\Longrightarrow$" part.
(2) For the converse direction, let $x, y$ range over the standard basis $e_1, \cdots, e_n$ of $\mathbb{C}^n$. Taking $x = e_j$ and $y = e_i$, it is immediately clear that $a_{ij} = \bar{a}_{ji}$ for all $i, j = 1, \cdots, n$, so $A$ is Hermitian.
Theorem 3.1.2 (Schur's Theorem Revisited)
Let $A \in M_{n \times n}$ be a matrix. Then there exists a unitary matrix $U$ such that
$$U^H A U = T$$
is upper triangular; this is Schur's Theorem. It is just the complex counterpart of the Triangulation Theorem we learnt in class. With a minor modification, we may write
$$A = UTU^H$$
and this is called the Schur decomposition, with the diagonal entries of $T$ being the eigenvalues of $A$.
3.1.3 Theorem (The Spectral Theorem Revisited)
If the $A$ in the last theorem is Hermitian, then the corresponding $T$ becomes diagonal. This is the Spectral Theorem. Similarly, it is the complex extension of the Principal Axis Theorem we learnt in class. Again, with the same modification, we may rewrite it as
$$A = UTU^H$$
This is called the spectral decomposition. These two eigenvalue decompositions are just special cases of the singular value decomposition, which applies even to nonsquare matrices.
3.1.4 Theorem
Let $A \in M_{n \times n}$ be a Hermitian matrix. Then
(a) $x^H A x$ is real for all $x \in \mathbb{C}^n$.
(b) All eigenvalues of $A$ are real.
(c) $B^H A B$ is Hermitian for all $B \in M_{n \times n}$.
(d) Eigenvectors of $A$ corresponding to distinct eigenvalues are orthogonal.
(e) It is possible to construct a set of $n$ orthonormal eigenvectors of $A$.
Proof.
(a) $\overline{(x^H A x)} = (x^H A x)^H = x^H A^H x = x^H A x$; that is, $x^H A x$ equals its complex conjugate and hence is real.
(b) (1) If $Ax = \lambda x$ and $x^H x = b \in \mathbb{R}^+$, then
$$\lambda = \frac{1}{b}\,x^H \lambda x = \frac{1}{b}\,(x^H A x),$$
which is real by (a).
(2) Alternative proof for (b): let $\lambda, \mu$ be two eigenvalues of $A$, with eigenvectors $x, y$ correspondingly, so that $Ax = \lambda x$ and $Ay = \mu y$. According to Theorem 3.1.1, we have $\lambda\langle x, y \rangle = \langle \lambda x, y \rangle = \langle Ax, y \rangle = \langle x, Ay \rangle = \langle x, \mu y \rangle = \bar{\mu}\langle x, y \rangle$. In the case where $y = x$ and $\mu = \lambda$, this becomes $\lambda\langle x, x \rangle = \bar{\lambda}\langle x, x \rangle$, which in turn implies that $\lambda = \bar{\lambda}$, since $\langle x, x \rangle = \|x\|^2 > 0$ for a nonzero eigenvector $x$. Therefore $\lambda$ must be real, and similarly $\mu$ is real.
(c) $(B^H A B)^H = B^H A^H B = B^H A B$, so $B^H A B$ is always Hermitian.
(d) Continuing the discussion in (b)(2): since $\mu$ is real, we get the equation $\lambda\langle x, y \rangle = \mu\langle x, y \rangle$, so if $\lambda \neq \mu$, it follows immediately that $\langle x, y \rangle = 0$, thus implying that $x$ and $y$ are orthogonal.
(e) Following from the spectral decomposition, we rewrite it as
$$AU = UT$$
$$\Longrightarrow A[u_1 \ \cdots \ u_n] = [u_1 \ \cdots \ u_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$$
$$\Longrightarrow [Au_1 \ \cdots \ Au_n] = [\lambda_1 u_1 \ \cdots \ \lambda_n u_n]$$
Therefore, it is easily seen that $(u_1, \cdots, u_n)$ are the eigenvectors corresponding to $(\lambda_1, \cdots, \lambda_n)$. As $U$ is unitary, it follows that these eigenvectors are orthonormal.
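These properties are easy to observe numerically. The sketch below (an illustration, not from the original notes) builds a random Hermitian matrix and checks that its eigenvalues are real, that $x^H A x$ is real, and that the eigenvectors returned by `numpy.linalg.eigh` are orthonormal:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2          # Hermitian by construction

lam, U = np.linalg.eigh(A)        # eigh is specialized for Hermitian matrices

# (b) the eigenvalues are real (eigh returns them as real floats)
assert np.isrealobj(lam)
# (a) x^H A x is real for any x
x = rng.normal(size=4) + 1j * rng.normal(size=4)
assert abs((x.conj() @ A @ x).imag) < 1e-12
# (e) the eigenvectors are orthonormal: U^H U = I
assert np.allclose(U.conj().T @ U, np.eye(4))
```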
3.2 Quadratic Forms & Nonnegative Definite Matrices
3.2.1 Definitions
(a) Let $A \in M_{n \times n}$ be a symmetric matrix (a Hermitian matrix with real entries) and let $x$ denote an $n \times 1$ column vector. Then $Q = x^T A x$ is said to be a quadratic form. Observe that
$$Q = x^T A x = (x_1 \ \cdots \ x_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1 \ \cdots \ x_n)\begin{pmatrix} \sum_j a_{1j}x_j \\ \vdots \\ \sum_j a_{nj}x_j \end{pmatrix} = \sum_{i,j} a_{ij}x_i x_j$$
(b) Let $A \in M_{n \times n}$ be a symmetric matrix. Then $A$ is
(1) Positive definite if $Q = x^T A x > 0$ for all $x \neq 0$, $x \in \mathbb{R}^n$
(2) Nonnegative definite (positive semidefinite) if $Q = x^T A x \geq 0$ for all $x \in \mathbb{R}^n$
(3) Negative definite if $Q = x^T A x < 0$ for all $x \neq 0$, $x \in \mathbb{R}^n$
(4) Nonpositive definite (negative semidefinite) if $Q = x^T A x \leq 0$ for all $x \in \mathbb{R}^n$
(5) Indefinite if $Q > 0$ for some $x$ while $Q < 0$ for some other $x$
We are only interested in the positive and nonnegative cases, because the theorems for the negative and nonpositive cases are all analogous.
3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
3.2.2.1 Theorem
Let $\lambda_1, \cdots, \lambda_n$ be the eigenvalues of the $n \times n$ symmetric matrix $A$. Then
(a) $A$ is positive definite if and only if $\lambda_i > 0$ for all $i$,
(b) $A$ is nonnegative definite if and only if $\lambda_i \geq 0$ for all $i$.
Proof.
(a) Let the columns of $U = (u_1 \ \cdots \ u_n)$ be a set of orthonormal eigenvectors of $A$ corresponding to the eigenvalues $\lambda_1, \cdots, \lambda_n$, so that $A = U\Lambda U^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \cdots, \lambda_n)$. If $A$ is positive definite, then $x^T A x > 0$ for all $x \neq 0$, so in particular, choosing $x = u_i$, we have
$$u_i^T A u_i = u_i^T(\lambda_i u_i) = \lambda_i u_i^T u_i = \lambda_i > 0$$
Conversely, if $\lambda_i > 0$ for all $i$, then for any $x \neq 0$ define $y = U^T x$, and note that
$$x^T A x = x^T U \Lambda U^T x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2$$
has to be positive, since the $\lambda_i$s are all positive and at least one of the $y_i^2$s is positive because $y \neq 0$.
(b) By a similar argument to (a), it is easy to show that $A$ is nonnegative definite if and only if $\lambda_i \geq 0$ for all $i$.
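Theorem 3.2.2.1 gives a practical test for definiteness: compute the eigenvalues and check their signs. A minimal sketch (added here for illustration; the helper name `is_positive_definite` is ours, not from the notes):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness of a symmetric matrix via its
    eigenvalues, as in Theorem 3.2.2.1(a)."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

# x^T A x = 2x1^2 + 2x1x2 + 2x2^2 > 0 for x != 0, so A is positive definite
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# B has eigenvalues -1 and 1 (eigenvectors (1, -1) and (1, 1)): indefinite
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

assert is_positive_definite(A)      # eigenvalues 1 and 3
assert not is_positive_definite(B)
```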
3.2.2.2 Theorem
Let $X$ be an $m \times n$ real matrix with $\mathrm{rank}(X) = r$. Then
(a) $X^T X$ has $r$ positive eigenvalues. It is always nonnegative definite, and positive definite if $r = n$.
(b) The positive eigenvalues of $X^T X$ are equal to the positive eigenvalues of $XX^T$.
Proof.
(a) For any nonzero $n \times 1$ vector $a$, let $y = Xa$. Then
$$a^T X^T X a = y^T y = \sum_{i=1}^{m} y_i^2$$
is nonnegative, so $X^T X$ is nonnegative definite, and thus by Theorem 3.2.2.1(b) all of its eigenvalues are nonnegative. Further, observe that $a \neq 0$ is an eigenvector of $X^T X$ corresponding to a zero eigenvalue if and only if $y = Xa = 0$, in which case the above expression equals zero. Therefore the number of zero eigenvalues equals the dimension of $\mathrm{null}(X)$, which is $n - r$, so (a) is proved.
(b) Let $\lambda > 0$ be an eigenvalue of $X^T X$ with multiplicity $h$. Since the $n \times n$ matrix $X^T X$ is symmetric, we can find an $n \times h$ matrix $V$, whose columns are orthonormal, satisfying
$$X^T X V = \lambda V.$$
Let $W = XV$ and observe that
$$XX^T W = XX^T XV = X(\lambda V) = \lambda XV = \lambda W,$$
so that $\lambda$ is also an eigenvalue of $XX^T$, with multiplicity also $h$, because
$$\mathrm{rank}(W) = \mathrm{rank}(XV) = \mathrm{rank}((XV)^T XV) = \mathrm{rank}(V^T X^T X V) = \mathrm{rank}(\lambda V^T V) = \mathrm{rank}(\lambda I_h) = h$$
So the proof is done.
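A quick numerical illustration of Theorem 3.2.2.2 (added here, not part of the notes): the positive eigenvalues of $X^T X$ and $XX^T$ coincide, and their count equals $\mathrm{rank}(X)$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))   # m = 5, n = 3; generically rank r = 3

# X^T X is 3x3, X X^T is 5x5; the larger matrix simply has
# m - r extra zero eigenvalues.
small = np.linalg.eigvalsh(X.T @ X)
big = np.linalg.eigvalsh(X @ X.T)

assert np.allclose(sorted(small[small > 1e-10]),
                   sorted(big[big > 1e-10]))
assert np.sum(big > 1e-10) == np.linalg.matrix_rank(X)
```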
4 Inequalities and Extremal Properties of Eigenvalues
4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem
In this section, we investigate some extremal properties of the eigenvalues of a Hermitian matrix, and see how to turn the problem of finding the eigenvalues into a constrained optimization problem.
4.1.1 Definition
Let $A \in M_{n \times n}$ be a Hermitian matrix. The Rayleigh quotient of $A$, denoted $R_A(x)$, is a function from $\mathbb{C}^n \setminus \{0\}$ to $\mathbb{R}$, defined as follows:
$$R_A(x) = \frac{x^H A x}{x^H x}$$
It is not difficult to see that when $x$ has norm $\|x\| = 1$, the Rayleigh quotient of $A$ equals its quadratic form. In the next part, we relate the Rayleigh quotient of a Hermitian matrix to its eigenvalues.
4.1.2 Theorem
Let $A$ be a Hermitian $n \times n$ matrix with ordered eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. For any $x \in \mathbb{C}^n \setminus \{0\}$,
$$\lambda_n \leq \frac{x^H A x}{x^H x} \leq \lambda_1$$
and, in particular,
$$\lambda_n = \min_{x \neq 0} \frac{x^H A x}{x^H x}, \qquad \lambda_1 = \max_{x \neq 0} \frac{x^H A x}{x^H x}$$
Proof.
Let $A = U\Lambda U^H$ be the spectral decomposition of $A$, where the columns of $U = (u_1 \ \cdots \ u_n)$ are the orthonormal eigenvectors corresponding to $\lambda_1, \cdots, \lambda_n$, which make up the diagonal entries of the diagonal matrix $\Lambda$. As in the proof of Theorem 3.2.2.1, define $y = U^H x$. Then we have
$$\frac{x^H A x}{x^H x} = \frac{x^H U \Lambda U^H x}{x^H U U^H x} = \frac{y^H \Lambda y}{y^H y} = \frac{\sum_{i=1}^{n} \lambda_i |y_i|^2}{\sum_{i=1}^{n} |y_i|^2}$$
Together with the fact that
$$\lambda_n \sum_{i=1}^{n} |y_i|^2 \ \leq\ \sum_{i=1}^{n} \lambda_i |y_i|^2 \ \leq\ \lambda_1 \sum_{i=1}^{n} |y_i|^2,$$
this gives both bounds, and they are attained at $x = u_n$ and $x = u_1$ respectively. The proof is complete.
In fact, the implication of this theorem is that we may regard the problem of finding the largest and smallest eigenvalues of a Hermitian matrix as a constrained optimization problem:
maximize: $x^H A x$
subject to: $x^H x = 1$
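The bounds of Theorem 4.1.2 can be observed directly; the following sketch (an illustration added here, not from the notes) evaluates the Rayleigh quotient at random vectors and checks it stays between the extreme eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                   # real symmetric, hence Hermitian
lam = np.linalg.eigvalsh(A)         # ascending: lam[0] = min, lam[-1] = max

def rayleigh(A, x):
    # R_A(x) = x^H A x / x^H x (real case)
    return (x @ A @ x) / (x @ x)

# Every nonzero x gives a quotient between the extreme eigenvalues.
for _ in range(100):
    x = rng.normal(size=4)
    r = rayleigh(A, x)
    assert lam[0] - 1e-10 <= r <= lam[-1] + 1e-10
```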
Below is a theorem that generalizes the above theorem to all eigenvalues of ๐ด.
4.1.3 Theorem (the Courant-Fischer min-max theorem)
Let $A$ be an $n \times n$ Hermitian matrix. Then
(a) $\displaystyle \lambda_k = \max_{\dim(S)=k}\ \min_{x \in S, \|x\|=1} x^H A x$
(b) $\displaystyle \lambda_k = \min_{\dim(S)=n-k+1}\ \max_{x \in S, \|x\|=1} x^H A x$
Proof.
(a) Recall that $x^H A x = y^H \Lambda y = \sum_{i=1}^{n} \lambda_i |y_i|^2$, where $y = U^H x$, or equivalently, $x = Uy$. Note also that the linear transformation from $x$ to $y$ is an isomorphism with no scaling, as $U$ is unitary, so we may restate the constraint as $\dim(S) = k$, $y \in S$, $\|y\| = 1$. To get the maximum under the constraint $\dim(S) = k$, we just choose $S = \mathrm{span}\{e_1, \cdots, e_k\}$. Therefore, it is easily verified that
$$\lambda_k \ \leq\ \min_{y \in \mathrm{span}\{e_1, \cdots, e_k\},\, \|y\|=1}\ \sum_{i=1}^{k} \lambda_i |y_i|^2 \ \leq\ \max_{\dim(S)=k}\ \min_{y \in S, \|y\|=1} y^H \Lambda y \ =\ \max_{\dim(S)=k}\ \min_{x \in S, \|x\|=1} x^H A x$$
It remains to prove that the left-hand side is greater than or equal to the right-hand side, which gives equality. To prove this, we must show that every $k$-dimensional subspace $S$ of $\mathbb{C}^n$ contains a unit vector $x$ such that
$$\lambda_k \geq x^H A x$$
From the previous discussion, this is equivalent to saying that every $k$-dimensional subspace $S$ of $\mathbb{C}^n$ contains a unit vector $y$ such that
$$\lambda_k \geq y^H \Lambda y$$
Now let $G = \mathrm{span}\{e_k, \cdots, e_n\}$, which has dimension $n - k + 1$, so it must have nontrivial intersection with every such $S$. Let $w$ be a unit vector in $G \cap S$; we may write
$$w^H \Lambda w = \sum_{i=k}^{n} \lambda_i |w_i|^2 \quad \text{with} \quad \sum_{i=k}^{n} |w_i|^2 = 1$$
Since $\lambda_i \leq \lambda_k$ for all $i \geq k$, it is immediately clear that $w^H \Lambda w \leq \lambda_k$, which proves the reverse inequality. So finally we achieve the equality.
(b) This can be proved simply by replacing $A$ by $-A$ and using the fact that $\lambda_k(-A) = -\lambda_{n-k+1}(A)$.
4.2 Some Eigenvalue Inequalities
In this section we introduce a few inequalities concerning eigenvalues, which may be applied to eigenvalue estimation and inference, as well as to eigenvalue perturbation theory. It turns out that many of these inequalities can be proved using the min-max theorem derived in the last section.
4.2.1 Theorem (Weyl's inequality)
Let $A, B \in M_{n \times n}$ be Hermitian matrices. Then for $1 \leq i \leq n$, we have
$$\lambda_i(A) + \lambda_n(B) \ \leq\ \lambda_i(A + B) \ \leq\ \lambda_i(A) + \lambda_1(B)$$
Proof.
First, we have
$$\lambda_i(A + B) = \max_{\dim(S)=i}\ \min_{x \in S, \|x\|=1} x^H(A + B)x = \max_{\dim(S)=i}\ \min_{x \in S, \|x\|=1} \left( x^H A x + x^H B x \right)$$
$$\geq \max_{\dim(S)=i} \left( \min_{x \in S, \|x\|=1} x^H A x + \min_{x \in S, \|x\|=1} x^H B x \right) \geq \max_{\dim(S)=i}\ \min_{x \in S, \|x\|=1} x^H A x + \lambda_n(B) = \lambda_i(A) + \lambda_n(B),$$
where the last inequality holds because $\min_{x \in S, \|x\|=1} x^H B x \geq \lambda_n(B)$ for every $S$. This proves the left inequality; the right inequality can be proved in exactly the same manner.
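Weyl's inequality is straightforward to check on random matrices; the sketch below (an illustration added here, not from the notes) sorts eigenvalues in the descending order used throughout this section:

```python
import numpy as np

def eigs_desc(M):
    # Eigenvalues of a symmetric matrix, ordered so lam[0] is the largest.
    return np.linalg.eigvalsh(M)[::-1]

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5)); A = (A + A.T) / 2
B = rng.normal(size=(5, 5)); B = (B + B.T) / 2

a, b, ab = eigs_desc(A), eigs_desc(B), eigs_desc(A + B)

# Weyl: lam_i(A) + lam_n(B) <= lam_i(A+B) <= lam_i(A) + lam_1(B)
for i in range(5):
    assert a[i] + b[-1] - 1e-10 <= ab[i] <= a[i] + b[0] + 1e-10
```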
4.2.2 Corollary
Let $A \in M_{n \times n}$ be a Hermitian matrix and $B \in M_{n \times n}$ a positive semidefinite matrix. Then for $1 \leq i \leq n$, we have
$$\lambda_i(A) \leq \lambda_i(A + B)$$
4.2.3 Theorem
Let $A, B \in M_{n \times n}$ be Hermitian matrices. If $1 \leq i_1 < \cdots < i_k \leq n$, then
$$\sum_{h=1}^{k} \lambda_{i_h}(A + B) \ \leq\ \sum_{h=1}^{k} \lambda_{i_h}(A) + \sum_{h=1}^{k} \lambda_h(B)$$
4.2.4 Theorem (Cauchy Interlacing Theorem)
Let $A \in M_{n \times n}$ be a Hermitian matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, partitioned as follows:
$$A = \begin{bmatrix} H & B^H \\ B & R \end{bmatrix}$$
where $H \in M_{m \times m}$ with eigenvalues $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_m$. Then
$$\lambda_k \ \geq\ \eta_k \ \geq\ \lambda_{k+n-m}$$
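The interlacing pattern can be seen numerically by comparing a symmetric matrix with a leading principal submatrix; the check below is an illustration added here, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(6, 6))
A = (M + M.T) / 2                   # symmetric, n = 6
n, m = 6, 4
H = A[:m, :m]                       # leading m x m principal submatrix

lam = np.linalg.eigvalsh(A)[::-1]   # descending eigenvalues of A
eta = np.linalg.eigvalsh(H)[::-1]   # descending eigenvalues of H

# Cauchy interlacing (0-based): lam[k] >= eta[k] >= lam[k + n - m]
for k in range(m):
    assert lam[k] + 1e-10 >= eta[k] >= lam[k + n - m] - 1e-10
```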
4.3 Application to Principal Component Analysis (PCA)
Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample's information.
4.3.1 Definition
Let $X = (x_1, x_2, \cdots, x_p)$ be an $n \times p$ data matrix, with $p$ being the number of variables and $n$ being the number of observations for each variable. Define the first principal component of the sample by the linear transformation
$$z_1 = Xa_1 = \sum_{i=1}^{p} a_{i1} x_i$$
where the vector $a_1 = (a_{11}, a_{21}, \cdots, a_{p1})^T$ is chosen such that $\mathrm{var}[z_1]$ is maximal. Likewise, the $j$th principal component is defined as the linear transformation
$$z_j = Xa_j = \sum_{i=1}^{p} a_{ij} x_i, \qquad j = 1, \cdots, p$$
where the vector $a_j = (a_{1j}, a_{2j}, \cdots, a_{pj})^T$ is chosen such that $\mathrm{var}[z_j]$ is maximal, subject to $\mathrm{cov}[z_k, z_j] = 0 \Longrightarrow a_k^T S a_j = 0$ for $j > k \geq 1$, and to $a_j^T a_j = 1$.
After some computation, we get $\mathrm{var}[z_j] = a_j^T S a_j$, where $S = \frac{1}{n-1} X^T X$ is the covariance matrix of the data matrix $X$. It is clear that $S$ is nonnegative definite, thus having nonnegative eigenvalues. If we want to find $m$ ($m < p$) principal components, then we get
$$Z = [z_1 \ \cdots \ z_m] = X[a_1 \ \cdots \ a_m] = XA$$
The matrix $Z$ is called the score matrix, while $A$ is called the loading matrix.
4.3.2 Methods for implementing PCA
(a) Constrained optimization
To maximize $a_j^T S a_j$ subject to $a_j^T a_j = 1$, we use the technique of Lagrange multipliers. We want to maximize the function
$$a_j^T S a_j - \lambda(a_j^T a_j - 1)$$
with respect to $a_j$; differentiating with respect to $a_j$:
$$\frac{d}{da_j}\left( a_j^T S a_j - \lambda(a_j^T a_j - 1) \right) = 2Sa_j - 2\lambda a_j = 0 \ \Longrightarrow\ Sa_j = \lambda a_j$$
From this step, it is obvious that $\lambda$ is an eigenvalue of $S$ (and of course, we could reach this step without the Lagrange multiplier, using the min-max theorem instead), so to maximize $a_j^T S a_j$, we choose the largest eigenvalue $\lambda_1$ of $S$ and set $a_1$ to be its corresponding unit eigenvector; then we get our first principal component $z_1 = Xa_1$, which finishes our first step. To find the $j$th principal component, note that $a_1, \cdots, a_p$ constitute an orthonormal basis, so to satisfy the zero-covariance constraint, we have to do the optimization in the orthogonal complement of $\mathrm{span}\{a_1, \cdots, a_{j-1}\}$. Thus, according to the min-max theorem, $\max a_j^T S a_j = \lambda_j$ and $z_j = Xa_j$, where $a_j$ is the unit eigenvector corresponding to $\lambda_j$. Finally, we conclude that $Z = [z_1 \ \cdots \ z_m] = X[a_1 \ \cdots \ a_m]$.
(b) Spectral decomposition and singular value decomposition (SVD)
First, we draw the conclusion from (a) that if we write the spectral decomposition of $S$ as
$$S = A\Lambda A^T$$
where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_p)$ with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, then the first $m$ columns of $A$ just make up the loading matrix that we want.
Next, let $X = UDV^T$ be the singular value decomposition of the data matrix $X$. Recall that $X^T X = (n-1)S = VD^2V^T$, so the columns of $V$ are the unit eigenvectors of $S$. Then, letting $A = V$, we finally get
$$Z = XA = XV = UDV^TV = UD$$
In practice, the singular value decomposition is the standard way to do PCA, since it avoids the trouble of computing $X^T X$.
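The SVD route above can be sketched in a few lines of NumPy. This is a minimal illustration added here, not part of the notes; the helper name `pca_svd` is ours, and it centers the columns of $X$ so that $X^T X$ is proportional to the covariance matrix, as the derivation assumes:

```python
import numpy as np

def pca_svd(X, m):
    """PCA via the SVD, as in Section 4.3.2(b): returns the n x m score
    matrix Z and the p x m loading matrix A."""
    Xc = X - X.mean(axis=0)       # center each variable
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = Vt.T[:, :m]               # loadings: top m right singular vectors
    Z = Xc @ A                    # scores: equals U[:, :m] * d[:m]
    return Z, A

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))
Z, A = pca_svd(X, 2)

# Loadings are orthonormal, and the score variances equal the top
# eigenvalues of the covariance matrix S, as derived in (a).
assert np.allclose(A.T @ A, np.eye(2))
S = np.cov(X, rowvar=False)
lam = np.linalg.eigvalsh(S)[::-1]
assert np.allclose(np.var(Z, axis=0, ddof=1), lam[:2])
```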