
8. The Singular Value Decomposition

CMDA 3606; Mark Embree

Version of 5 May 2014

The singular value decomposition (SVD) is among the most important and widely applicable matrix factorizations. It provides a natural way to untangle a matrix into its four fundamental subspaces, and reveals the relative importance of each direction within those subspaces. Thus the singular value decomposition is a vital tool for analyzing data, and it provides a slick way to understand (and prove) many fundamental results in matrix theory. It is the perfect tool for solving least squares problems, and provides the best way to approximate a matrix with one of lower rank. These notes construct the SVD in various forms, then describe a few of its most compelling applications.

8.1 Helpful facts about symmetric matrices

To derive the singular value decomposition of a general (rectangular) matrix A ∈ C^{m×n}, we shall rely on several special properties of the square, symmetric matrix A^*A. For this reason we first recall some fundamental results from the theory of symmetric matrices.

We shall use the term “symmetric” to mean a matrix where A^* = A. Often such matrices are instead called “Hermitian” or “self-adjoint,” and the term “symmetric” is reserved for matrices with A^T = A. If the entries of A are all real, there is no distinction.

For example, when
$$H = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix},$$
we have λ_1 = 4 and λ_2 = 2, with
$$v_1 = \begin{bmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}.$$
Note that these eigenvectors are unit vectors, and they are orthogonal.

Theorem 8.1 (Spectral Theorem) Suppose H ∈ C^{n×n} is symmetric. Then there exist n (not necessarily distinct) eigenvalues λ_1, ..., λ_n and corresponding unit-length eigenvectors v_1, ..., v_n such that
$$H v_j = \lambda_j v_j.$$
The eigenvectors form an orthonormal basis for C^n:
$$\mathbb{C}^n = \operatorname{span}\{v_1, \ldots, v_n\},$$
and v_j^* v_k = 0 when j ≠ k, and v_j^* v_j = ‖v_j‖² = 1.

As a consequence of the Spectral Theorem, we can write any symmetric matrix H ∈ C^{n×n} in the form
$$H = \sum_{j=1}^{n} \lambda_j v_j v_j^*. \qquad (8.1)$$
This equation expresses H as the sum of the special rank-1 matrices λ_j v_j v_j^*. The singular value decomposition will provide a similar way to tease apart a rectangular matrix.
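To make (8.1) concrete, here is a minimal MATLAB sketch (an addition to these notes) that rebuilds the example matrix H from its eigenpairs; for a symmetric input, eig returns orthonormal eigenvectors.

H = [3 -1; -1 3];
[V, D] = eig(H);               % columns of V are orthonormal eigenvectors, D holds eigenvalues
Hsum = zeros(size(H));
for j = 1:size(H,1)
    Hsum = Hsum + D(j,j) * V(:,j) * V(:,j)';   % rank-1 term  lambda_j v_j v_j^*
end
norm(H - Hsum)                 % numerically zero, confirming (8.1)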


Theorem 8.2 All eigenvalues of a symmetric matrix are real.

Proof. Let (λ_j, v_j) be an arbitrary eigenpair of the symmetric matrix H, so that Hv_j = λ_j v_j. Without loss of generality, we can assume that v_j is scaled so that ‖v_j‖ = 1, i.e., v_j^* v_j = ‖v_j‖² = 1. Thus
$$\lambda_j = \lambda_j (v_j^* v_j) = v_j^* (\lambda_j v_j) = v_j^* (H v_j).$$
Since H is symmetric, H = H^*, and so
$$v_j^* (H v_j) = v_j^* H^* v_j = (H v_j)^* v_j = (\lambda_j v_j)^* v_j = \overline{\lambda_j}\, v_j^* v_j = \overline{\lambda_j}.$$
Thus λ_j = \overline{λ_j}, which is only possible if λ_j is real.

For the example above,
$$x^* H x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^* \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 3x_1^2 - 2x_1x_2 + 3x_2^2 = 2(x_1 - x_2)^2 + (x_1 + x_2)^2.$$
This last expression, a sum of squares, is clearly positive for all nonzero x, so H is positive definite.

Definition 8.1 A symmetric matrix H ∈ C^{n×n} is positive definite provided x^*Hx > 0 for all nonzero x ∈ C^n; if x^*Hx ≥ 0 for all x ∈ C^n, we say H is positive semidefinite.

Theorem 8.3 All eigenvalues of a symmetric positive definite matrix are positive; all eigenvalues of a symmetric positive semidefinite matrix are nonnegative.

Proof. Let (λ_j, v_j) denote an eigenpair of the symmetric positive definite matrix H ∈ C^{n×n} with ‖v_j‖² = v_j^* v_j = 1. Since H is symmetric, λ_j must be real. We conclude that
$$\lambda_j = \lambda_j v_j^* v_j = v_j^* (\lambda_j v_j) = v_j^* H v_j,$$
which must be positive since H is positive definite and v_j ≠ 0. The proof for positive semidefinite matrices is the same, except we can only conclude that λ_j = v_j^* H v_j ≥ 0.

Can you prove the converse of this theorem? (A symmetric matrix with positive eigenvalues is positive definite.) Hint: use the Spectral Theorem. With this result, we can check if H is positive definite by just looking at its eigenvalues, rather than working out a formula for x^*Hx, as done above.
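As a quick illustration of that eigenvalue test, a small MATLAB check (an addition to these notes, using the example H above):

H = [3 -1; -1 3];
lambda = eig(H)     % eigenvalues of the symmetric matrix H: 2 and 4
all(lambda > 0)     % returns 1 (true), so H is positive definite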

8.2 Derivation of the singular value decomposition: Full rank case

We seek to derive the singular value decomposition of a general rectangular matrix. To simplify our initial derivation, we shall assume that A ∈ C^{m×n} with m ≥ n, and that rank(A) is as large as possible, i.e.,
$$\operatorname{rank}(A) = n.$$

First, form A^*A, which is an n×n matrix. Notice that A^*A is always symmetric, since
$$(A^*A)^* = A^*(A^*)^* = A^*A.$$
Furthermore, this matrix is positive definite: notice that
$$x^* A^* A x = (Ax)^*(Ax) = \|Ax\|^2 \ge 0.$$


Since rank(A) = n, notice that
$$\dim(N(A)) = n - \operatorname{rank}(A) = 0.$$
Since the null space of A is trivial, Ax ≠ 0 whenever x ≠ 0, so
$$x^* A^* A x = \|Ax\|^2 > 0$$
for all nonzero x. Hence A^*A is positive definite.

Keep this in mind: if rank(A) < n, then dim(N(A)) > 0, so there exist x ≠ 0 for which x^*A^*Ax = ‖Ax‖² = 0. Hence A^*A will only be positive semidefinite in this case.

We are now ready to construct our first version of the singular value decomposition. We shall construct the pieces one at a time, then assemble them into the desired decomposition.

Step 1. Compute the eigenvalues and eigenvectors of A∗A.

As a consequence of results about symmetric matrices presented above, we can find n eigenpairs {(λ_j, v_j)}_{j=1}^{n} of H = A^*A with unit eigenvectors (v_j^* v_j = ‖v_j‖² = 1) that are orthogonal to one another (v_j^* v_k = 0 when j ≠ k). We are free to pick any convenient indexing for these eigenpairs; we shall label them so that the eigenvalues are decreasing in size, λ_1 ≥ λ_2 ≥ · · · ≥ λ_n > 0. It is helpful to emphasize that v_1, ..., v_n ∈ C^n.

(Even if A is a square matrix, be sure to compute the eigenvalues and eigenvectors of A^*A. Since A^*A is positive definite, all its eigenvalues are positive.)

Step 2. Define σ_j = ‖Av_j‖ = √λ_j, for j = 1, ..., n.

Note that σ_j² = ‖Av_j‖² = v_j^* A^*A v_j = λ_j. Since the eigenvalues λ_1, ..., λ_n are decreasing in size, so too are the σ_j values:
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0.$$

Step 3. Define u_j = Av_j/σ_j for j = 1, ..., n.

Notice that u_1, ..., u_n ∈ C^m. Because σ_j = ‖Av_j‖, we ensure that
$$\|u_j\| = \Big\|\frac{1}{\sigma_j} A v_j\Big\| = \frac{\|Av_j\|}{\sigma_j} = 1.$$
(The assumption that rank(A) = n helped us out here, by ensuring that σ_j > 0 for all j: hence we can safely divide by σ_j in the definition of u_j.)

Furthermore, these u_j vectors are orthogonal. To see this, write
$$u_j^* u_k = \frac{1}{\sigma_j \sigma_k} (Av_j)^*(Av_k) = \frac{1}{\sigma_j \sigma_k} v_j^* A^*A v_k.$$
Since v_k is an eigenvector of A^*A corresponding to eigenvalue λ_k,
$$\frac{1}{\sigma_j \sigma_k} v_j^* A^*A v_k = \frac{1}{\sigma_j \sigma_k} v_j^* (\lambda_k v_k) = \frac{\lambda_k}{\sigma_j \sigma_k} v_j^* v_k.$$
Since the eigenvectors of the symmetric matrix A^*A are orthogonal, v_j^* v_k = 0 when j ≠ k, so the u_j vectors inherit the orthogonality of the v_j vectors:
$$u_j^* u_k = 0, \qquad j \ne k.$$


Step 4. Put the pieces together.

For all j = 1, ..., n,
$$A v_j = \sigma_j u_j,$$
regardless of whether σ_j = 0 or not. We can stack these n vector equations as columns of a single matrix equation,
$$\begin{bmatrix} | & | & & | \\ Av_1 & Av_2 & \cdots & Av_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_n u_n \\ | & | & & | \end{bmatrix}.$$
Note that both matrices in this equation can be factored into the product of simpler matrices:
$$A \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix}.$$
Denote these matrices as
$$A V = \hat{U} \hat{\Sigma}, \qquad (8.2)$$
where A ∈ C^{m×n}, V ∈ C^{n×n}, Û ∈ C^{m×n}, and Σ̂ ∈ C^{n×n}.

We now have all the ingredients for various forms of the singular value decomposition. Since the eigenvectors v_j of the symmetric matrix A^*A are orthonormal, the square matrix V has orthonormal columns. This means that
$$V^* V = I,$$
since the (j, k) entry of V^*V is simply v_j^* v_k. Since V is square, the equation V^*V = I implies that V^* = V^{-1}. (The inverse of a square matrix is unique: since V^* does what the inverse of V is supposed to do, i.e., V^*V = I, it must be the unique matrix V^{-1}.) Thus, in addition to V^*V = I, we also have
$$V V^* = V V^{-1} = I.$$

Thus multiplying both sides of equation (8.2) on the right by V^* gives
$$A = \hat{U} \hat{\Sigma} V^*. \qquad (8.3)$$
This factorization is the reduced (or skinny) singular value decomposition of A. It can be obtained via the MATLAB command

[Uhat, Sighat, V] = svd(A,0).
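As a sanity check on Steps 1–4, here is a small MATLAB sketch (an illustration added to these notes, not the recommended way to compute an SVD in practice, since forming A^*A explicitly can lose accuracy) that builds Û, Σ̂, and V from the eigenpairs of A^*A; it assumes A has full column rank.

A = [1 1; 0 0; sqrt(2) -sqrt(2)];   % any m-by-n matrix with rank(A) = n
[V, D] = eig(A'*A);                  % Step 1: eigenpairs of A*A (A' is the conjugate transpose)
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);                       % reorder so that lambda_1 >= lambda_2 >= ... >= lambda_n
sigma = sqrt(lambda);                % Step 2: singular values
Uhat = (A*V) * diag(1./sigma);       % Step 3: u_j = A*v_j / sigma_j
Sighat = diag(sigma);                % Step 4: assemble the reduced factorization
norm(A - Uhat*Sighat*V')             % numerically zero, confirming A = Uhat*Sighat*V'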


What can be said of the matrix Û ∈ C^{m×n}? Recall that its columns, the vectors u_1, ..., u_n, are orthonormal. However, in contrast to V, we cannot conclude that ÛÛ^* = I when m > n. Why not? Because when m > n, the matrix Û^* has a nontrivial null space, and hence ÛÛ^* cannot be invertible (in particular, it cannot equal I).

(When m > n, there must exist some nonzero z ∈ C^m such that z ⊥ u_1, ..., u_n, which implies Û^*z = 0. Hence ÛÛ^*z = 0, so we cannot have ÛÛ^* = I. However, ÛÛ^* ∈ C^{m×m} is a projector onto the n-dimensional subspace span{u_1, ..., u_n} of C^m.)

We wish to augment the matrix Û with m−n additional column vectors, to give a full set of m orthonormal vectors in C^m. Here is the recipe to find these extra vectors: for j = n+1, ..., m, pick
$$u_j \perp \operatorname{span}\{u_1, \ldots, u_{j-1}\}$$
with u_j^* u_j = 1. Then define
$$U = \begin{bmatrix} | & & | & | & & | \\ u_1 & \cdots & u_n & u_{n+1} & \cdots & u_m \\ | & & | & | & & | \end{bmatrix} \in \mathbb{C}^{m\times m}. \qquad (8.4)$$
We have constructed u_1, ..., u_m to be orthonormal vectors, so
$$U^* U = I.$$
Moreover, since U ∈ C^{m×m}, this orthogonality also implies U^{-1} = U^*.

Now we are ready to replace the rectangular matrix Û ∈ C^{m×n} in the reduced SVD (8.3) with the square matrix U ∈ C^{m×m}. To do so, we also need to replace Σ̂ ∈ C^{n×n} by some Σ ∈ C^{m×n} in such a way that
$$U \Sigma = \hat{U} \hat{\Sigma}.$$
The simplest approach is to obtain Σ by appending zeros to the end of Σ̂, thus ensuring there is no contribution when the new entries of U multiply against the new entries of Σ:
$$\Sigma = \begin{bmatrix} \hat{\Sigma} \\ 0 \end{bmatrix} \in \mathbb{C}^{m\times n}. \qquad (8.5)$$

Finally, we arrive at the main result, the full singular value decomposition, for the case where rank(A) = n.

Theorem 8.4 (Singular value decomposition, provisional version) Suppose A ∈ C^{m×n} has rank(A) = n, with m ≥ n. Then we can write
$$A = U \Sigma V^*,$$
where the columns of U ∈ C^{m×m} and V ∈ C^{n×n} are orthonormal,
$$U^* U = I \in \mathbb{C}^{m\times m}, \qquad V^* V = I \in \mathbb{C}^{n\times n},$$
and Σ ∈ C^{m×n} is zero everywhere except for entries on the main diagonal, where the (j, j) entry is σ_j, for j = 1, ..., n and
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0.$$


The full SVD is obtained via the MATLAB command

[U,S,V] = svd(A).

Definition 8.2 Let A = UΣV^* be a full singular value decomposition. The diagonal entries of Σ, denoted σ_1 ≥ σ_2 ≥ · · · ≥ σ_n, are called the singular values of A. The columns u_1, ..., u_m of U are the left singular vectors; the columns v_1, ..., v_n of V are the right singular vectors.

8.3 The dyadic form of the SVD

We are now prepared to develop an analogue of the formula (8.1) for rectangular matrices. Consider the reduced SVD,
$$A = \hat{U} \hat{\Sigma} V^*,$$
and multiply Û Σ̂ to obtain
$$\begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_n u_n \\ | & | & & | \end{bmatrix}.$$

Now notice that you can write A = (Û Σ̂)V^* as
$$\begin{bmatrix} | & | & & | \\ \sigma_1 u_1 & \sigma_2 u_2 & \cdots & \sigma_n u_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_n^* \end{bmatrix} = \sum_{j=1}^{n} \sigma_j u_j v_j^*,$$
which parallels the form (8.1) we had for symmetric matrices:
$$A = \sum_{j=1}^{n} \sigma_j u_j v_j^*. \qquad (8.6)$$

This expression is called the dyadic form of the SVD. Because we have ordered σ_1 ≥ σ_2 ≥ · · · ≥ σ_n, the leading terms in this sum dominate the others. This fact plays a crucial role in applications where we want to approximate a matrix with its leading low-rank part.

8.4 A small example

Consider the matrix
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix},$$
for which A^*A is the symmetric matrix used as an example earlier in these notes:
$$A^*A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}.$$


This matrix has rank(A) = 2 = n, so we can apply the analysis described above.

Step 1. Compute the eigenvalues and eigenvectors of A∗A.

We have already seen that, for this matrix, λ_1 = 4 and λ_2 = 2, with
$$v_1 = \begin{bmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix},$$
with λ_1 ≥ λ_2, the required order. The vectors v_1 and v_2 will be the right singular vectors of A.

Step 2. Define σ_j = ‖Av_j‖ = √λ_j, for j = 1, ..., n.

In this case, we compute
$$\sigma_1 = \sqrt{\lambda_1} = 2, \qquad \sigma_2 = \sqrt{\lambda_2} = \sqrt{2}.$$

Alternatively, we could have computed the singular values from
$$A v_1 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \qquad
A v_2 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix} = \begin{bmatrix} \sqrt{2} \\ 0 \\ 0 \end{bmatrix},$$
with σ_1 = ‖Av_1‖ = 2 and σ_2 = ‖Av_2‖ = √2.

Step 3. Define u_j = Av_j/σ_j, for j = 1, ..., n.

We use the vectors Av_1 and Av_2 computed at the last step:
$$u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{2}\begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad
u_2 = \frac{1}{\sigma_2} A v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} \sqrt{2} \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.$$

Step 4. Put the pieces together.

We immediately have the reduced SVD A = Û Σ̂ V^*:
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}.$$
To get the full SVD, we need a unit vector u_3 that is orthogonal to u_1 and u_2. In this case, such a vector is easy to spot:
$$u_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$


Thus we can write the full SVD A = UΣV^*:
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}.$$

Finally, we write the dyadic form of the SVD, A = ∑_{j=1}^{2} σ_j u_j v_j^*:
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix}
= 2 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix}
+ \sqrt{2} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \sqrt{2} & -\sqrt{2} \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

8.5 Derivation of the singular value decomposition: Rank-deficient case

Having computed the singular value decomposition of a matrix A ∈ C^{m×n} with rank(A) = n, we must now consider the adjustments necessary when rank(A) = r < n, still with m ≥ n.

Recall that the dimension of the null space of A is given by

$$\dim(N(A)) = n - \operatorname{rank}(A) = n - r.$$

How do the null spaces of A and A∗A compare?

Lemma 8.1 For any matrix A ∈ C^{m×n}, N(A^*A) = N(A).

Proof. First we show that N(A) is contained in N(A^*A). If x ∈ N(A), then Ax = 0. Premultiplying by A^* gives A^*Ax = 0, so x ∈ N(A^*A).

Now we show that N(A^*A) is contained in N(A). If x ∈ N(A^*A), then A^*Ax = 0. Premultiplying by x^* gives
$$0 = x^* A^* A x = (Ax)^*(Ax) = \|Ax\|^2.$$
Since ‖Ax‖ = 0, we conclude that Ax = 0, and so x ∈ N(A). Since the spaces N(A) and N(A^*A) each contain the other, we conclude that N(A) = N(A^*A).

Can you construct a 2×2 matrix A whose only eigenvalue is zero, but dim(N(A)) = 1? What is the multiplicity of the zero eigenvalue of A^*A?

Now we can make a crucial insight: the dimension of N(A) tells us how many zero eigenvalues A^*A has. In particular, suppose x_1, ..., x_{n−r} is a basis for N(A). Then Ax_j = 0 implies
$$A^*A x_j = 0 = 0\, x_j, \qquad j = 1, \ldots, n-r,$$
and so λ = 0 is an eigenvalue of A^*A of multiplicity n−r.


How do these zero eigenvalues of A^*A affect the singular value decomposition? To begin, perform Steps 1 and 2 of the SVD procedure just as before.

Step 1. Compute the eigenvalues and eigenvectors of A∗A.

Since we order the eigenvalues of A^*A so that λ_1 ≥ · · · ≥ λ_n ≥ 0, and we have just seen that zero is an eigenvalue of A^*A of multiplicity n−r, we must have
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0, \qquad \lambda_{r+1} = \cdots = \lambda_n = 0.$$
The corresponding orthonormal eigenvectors are v_1, ..., v_n, with the last n−r of these vectors in N(A^*A) = N(A), i.e., Av_j = 0.

Step 2. Define σ_j = ‖Av_j‖ = √λ_j, for j = 1, ..., n.

This step proceeds without any alterations, though now we have
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0, \qquad \sigma_{r+1} = \cdots = \sigma_n = 0.$$

The third step of the SVD construction needs alteration, since we can only define the left singular vectors via u_j = Av_j/σ_j when σ_j > 0, that is, for j = 1, ..., r. Any choice for the remaining vectors, u_{r+1}, ..., u_n, will trivially satisfy the equation Av_j = σ_j u_j, since Av_j = 0 and σ_j = 0 for j = r+1, ..., n. Since we are building Û ∈ C^{m×n} (and eventually U ∈ C^{m×m}) to have orthonormal columns, we will simply build out u_{r+1}, ..., u_n so that all the vectors u_1, ..., u_n are orthonormal.

Step 3a. Define uj = Avj/σj for j = 1, . . . , r.

Step 3b. Construct orthonormal vectors ur+1, . . . , un.

For each j = r+1, ..., n, construct a unit vector u_j such that
$$u_j \perp \operatorname{span}\{u_1, \ldots, u_{j-1}\}.$$
This procedure is exactly the same as used above to construct the vectors u_{n+1}, ..., u_m that extend the reduced SVD with Û ∈ C^{m×n} to the full SVD with U ∈ C^{m×m}. (If r = 0, which implies the trivial case A = 0, just set u_1 to be any unit vector.)

Step 4. Put the pieces together.

This step proceeds exactly as before. Now we define
$$\hat{U} = \begin{bmatrix} | & & | & | & & | \\ u_1 & \cdots & u_r & u_{r+1} & \cdots & u_n \\ | & & | & | & & | \end{bmatrix} \in \mathbb{C}^{m\times n},$$


$$\hat{\Sigma} = \begin{bmatrix} \sigma_1 & & & & & \\ & \ddots & & & & \\ & & \sigma_r & & & \\ & & & \sigma_{r+1} & & \\ & & & & \ddots & \\ & & & & & \sigma_n \end{bmatrix}
= \begin{bmatrix} \sigma_1 & & & & & \\ & \ddots & & & & \\ & & \sigma_r & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix} \in \mathbb{C}^{n\times n},$$

and
$$V = \begin{bmatrix} | & & | & | & & | \\ v_1 & \cdots & v_r & v_{r+1} & \cdots & v_n \\ | & & | & | & & | \end{bmatrix} \in \mathbb{C}^{n\times n}.$$

Notice that V is still a square matrix with orthonormal columns, so V^*V = I and V^{-1} = V^*. Since Av_j = σ_j u_j holds for j = 1, ..., n, we again have the reduced singular value decomposition
$$A = \hat{U} \hat{\Sigma} V^*.$$

As before, Û ∈ C^{m×n} can be enlarged to give U ∈ C^{m×m} by supplying extra orthogonal unit vectors that complete a basis for C^m:
$$u_j \perp \operatorname{span}\{u_1, \ldots, u_{j-1}\}, \qquad \|u_j\| = 1, \qquad j = n+1, \ldots, m.$$
Constructing U ∈ C^{m×m} as in (8.4) and Σ ∈ C^{m×n} as in (8.5), we have the full singular value decomposition
$$A = U \Sigma V^*.$$

The dyadic decomposition could still be written as
$$A = \sum_{j=1}^{n} \sigma_j u_j v_j^*,$$
but we get more insight if we crop the trivial terms from this sum. Since σ_{r+1} = · · · = σ_n = 0, we can truncate the decomposition to its first r terms:
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^*.$$
We will see that this form of A is especially useful for understanding the four fundamental subspaces.
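To see this numerically, here is a brief MATLAB sketch (added for illustration; the rank-1 matrix is a made-up example):

A = [1 2; 2 4; 3 6];                      % rank(A) = 1, so r = 1 < n = 2
[U, S, V] = svd(A);                        % full SVD; S is 3-by-2
sigma = diag(S)                            % the second singular value is (numerically) zero
r = sum(sigma > 1e-12);                    % numerical rank
Ar = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';    % truncated dyadic sum of the first r terms
norm(A - Ar)                               % numerically zero: the zero-sigma terms contribute nothing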

8.6 The connection to AA∗

Our derivation of the SVD relied heavily on an eigenvalue decomposition of A^*A. How does the SVD relate to AA^*? Consider forming
$$AA^* = (U\Sigma V^*)(U\Sigma V^*)^* = U\Sigma V^* V \Sigma^* U^* = U \Sigma \Sigma^* U^*. \qquad (8.7)$$


Notice that ΣΣ^* is a diagonal m×m matrix:
$$\Sigma\Sigma^* = \begin{bmatrix} \hat{\Sigma} \\ 0 \end{bmatrix} \begin{bmatrix} \hat{\Sigma}^* & 0 \end{bmatrix} = \begin{bmatrix} \hat{\Sigma}^2 & 0 \\ 0 & 0 \end{bmatrix},$$
where we have used the fact that Σ̂ is a diagonal matrix. Indeed,
$$\hat{\Sigma}^2 = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix},$$

where the λ_j values still denote the eigenvalues of A^*A. Thus equation (8.7) becomes
$$AA^* = U \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} U^*,$$
which is a diagonalization of AA^*. Postmultiplying this equation by U, we have
$$(AA^*) U = U \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix};$$
the first n columns of this equation give
$$AA^* u_j = \lambda_j u_j, \qquad j = 1, \ldots, n,$$
while the last m−n columns give
$$AA^* u_j = 0\, u_j, \qquad j = n+1, \ldots, m.$$

Thus the columns u_1, ..., u_m are all eigenvectors of AA^*. Notice then that AA^* and A^*A have the same eigenvalues, except that AA^* has m−n extra zero eigenvalues.
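A quick numerical illustration of this eigenvalue correspondence (an addition to these notes, reusing the small example from Section 8.4):

A = [1 1; 0 0; sqrt(2) -sqrt(2)];   % a 3-by-2 matrix, so m = 3 and n = 2
eig(A'*A)                            % eigenvalues 2 and 4
eig(A*A')                            % the same values, plus m - n = 1 extra zero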

This suggests a different way to compute the U matrix: form AA^* and compute all its eigenvectors, giving u_1, ..., u_m all at once. Thus we avoid the need for a special procedure to construct unit vectors orthogonal to u_1, ..., u_r.

8.7 Modification for the case of m < n

How does the singular value decomposition change if A has more columns than rows, n > m? The answer is easy: write the SVD of A^* (which has more rows than columns) using the procedure above, then take the conjugate-transpose of each term in the SVD. If this makes good sense, skip ahead to the next section. If you prefer the gory details, read on.

We will formally adapt the steps described above to handle the case n > m. Let r = rank(A) ≤ m.

Step 1. Compute the eigenvalues and eigenvectors of AA∗.

Label the eigenvalues of AA^* ∈ C^{m×m} as
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$$
and the corresponding orthonormal eigenvectors as
$$u_1, u_2, \ldots, u_m.$$


Step 2. Define σ_j = ‖A^*u_j‖ = √λ_j, for j = 1, ..., m.

Step 3a. Define vj = A∗uj/σj for j = 1, . . . , r.

Step 3b. Construct orthonormal vectors vr+1, . . . , vm.

Notice that these vectors only arise in the rank-deficient case, when r < m. (Steps 3a and 3b construct a matrix V̂ ∈ C^{n×m} with orthonormal columns.)

Step 3c. Construct orthonormal vectors vm+1, . . . , vn.

Following the same procedure as step 3b, we construct the extra vectors needed to obtain a full orthonormal basis for C^n.

Step 4. Put the pieces together.

First, defining
$$U = \begin{bmatrix} | & & | \\ u_1 & \cdots & u_m \\ | & & | \end{bmatrix} \in \mathbb{C}^{m\times m}, \qquad \hat{V} = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{bmatrix} \in \mathbb{C}^{n\times m},$$
with diagonal matrix
$$\hat{\Sigma} = \operatorname{diag}(\sigma_1, \ldots, \sigma_m) \in \mathbb{C}^{m\times m},$$
we have the reduced SVD
$$A = U \hat{\Sigma} \hat{V}^*.$$

To obtain the full SVD, we extend V̂ to obtain
$$V = \begin{bmatrix} | & & | & | & & | \\ v_1 & \cdots & v_m & v_{m+1} & \cdots & v_n \\ | & & | & | & & | \end{bmatrix} \in \mathbb{C}^{n\times n},$$
and similarly extend Σ̂,
$$\Sigma = \begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix} \in \mathbb{C}^{m\times n},$$
where we have now added extra zero columns, in contrast to the extra zero rows added in the m > n case in (8.5). We thus arrive at the full SVD,
$$A = U \Sigma V^*.$$
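A short MATLAB check of the shapes in this wide case (an illustrative addition; the 2-by-3 matrix is arbitrary):

A = [1 0 2; 0 3 0];              % m = 2 < n = 3
[U, S, V] = svd(A);               % full SVD: U is 2-by-2, S is 2-by-3, V is 3-by-3
[Uh, Sh, Vh] = svd(A, 'econ');    % reduced form: Sh is 2-by-2 and Vh is 3-by-2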

8.8 General statement of the singular value decomposition

We now can state the singular value decomposition in its fullest generality.


Theorem 8.5 (Singular value decomposition) Suppose A ∈ C^{m×n} has rank(A) = r. Then we can write
$$A = U \Sigma V^*,$$
where the columns of U ∈ C^{m×m} and V ∈ C^{n×n} are orthonormal,
$$U^* U = I \in \mathbb{C}^{m\times m}, \qquad V^* V = I \in \mathbb{C}^{n\times n},$$
and Σ ∈ C^{m×n} is zero everywhere except for entries on the main diagonal, where the (j, j) entry is σ_j, for j = 1, ..., min{m, n} and
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0.$$

Denoting the columns of U and V as u_1, ..., u_m and v_1, ..., v_n, we can write
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^*. \qquad (8.8)$$
(Of course, when r = 0 all the singular values are zero; when r = min{m, n}, all the singular values are positive.)

8.9 Connection to the four fundamental subspaces

Having labored to develop the singular value decomposition in its complete generality, we are ready to reap its many rewards. We begin by establishing the connection between the singular vectors and the ‘four fundamental subspaces,’ i.e., the column space
$$R(A) = \{Ax : x \in \mathbb{C}^n\} \subseteq \mathbb{C}^m,$$
the row space
$$R(A^*) = \{A^*y : y \in \mathbb{C}^m\} \subseteq \mathbb{C}^n,$$
the null space
$$N(A) = \{x \in \mathbb{C}^n : Ax = 0\} \subseteq \mathbb{C}^n,$$
and the left null space
$$N(A^*) = \{y \in \mathbb{C}^m : A^*y = 0\} \subseteq \mathbb{C}^m.$$

We shall explore these spaces using the dyadic form of the SVD (8.8). To characterize the column space, apply A to a generic vector x ∈ C^n:
$$Ax = \Big(\sum_{j=1}^{r} \sigma_j u_j v_j^*\Big) x = \sum_{j=1}^{r} \big(\sigma_j u_j v_j^* x\big) = \sum_{j=1}^{r} (\sigma_j v_j^* x)\, u_j, \qquad (8.9)$$
where in the last step we have switched the order of the scalar v_j^* x and the vector u_j. We see that Ax is a weighted sum of the vectors u_1, ..., u_r. Since this must hold for all x ∈ C^n, we conclude that
$$R(A) \subseteq \operatorname{span}\{u_1, \ldots, u_r\}.$$


Can we conclude the converse? We know that R(A) is a subspace, so if we can show that each of the vectors u_1, ..., u_r is in R(A), then we will know that
$$\operatorname{span}\{u_1, \ldots, u_r\} \subseteq R(A). \qquad (8.10)$$
To show that u_k ∈ R(A), we must find some x such that Ax = u_k. Inspect equation (8.9). We can make Ax = u_k if all the coefficients σ_j v_j^* x are zero when j ≠ k, and σ_k v_k^* x = 1. Can you see how to use orthogonality of the right singular vectors v_1, ..., v_r to achieve this? Setting
$$x = \frac{1}{\sigma_k} v_k,$$
we have Ax = u_k. Thus u_k ∈ R(A), and we can conclude that (8.10) holds. Since R(A) and span{u_1, ..., u_r} contain one another, we conclude that
$$R(A) = \operatorname{span}\{u_1, \ldots, u_r\}.$$

We can characterize the row space in exactly the same way, using the dyadic form
$$A^* = \Big(\sum_{j=1}^{r} \sigma_j u_j v_j^*\Big)^* = \sum_{j=1}^{r} \big(\sigma_j u_j v_j^*\big)^* = \sum_{j=1}^{r} \sigma_j v_j u_j^*.$$
Adapting the argument we have just made leads to
$$R(A^*) = \operatorname{span}\{v_1, \ldots, v_r\}.$$

Equation (8.9) for Ax is also the key that unlocks the null space N(A). For what x ∈ C^n does Ax = 0? Let us consider
$$\|Ax\|^2 = (Ax)^*(Ax) = \Big(\sum_{j=1}^{r} (\sigma_j v_j^* x)\, u_j\Big)^* \Big(\sum_{k=1}^{r} (\sigma_k v_k^* x)\, u_k\Big)
= \Big(\sum_{j=1}^{r} (\sigma_j x^* v_j)\, u_j^*\Big) \Big(\sum_{k=1}^{r} (\sigma_k v_k^* x)\, u_k\Big)
= \sum_{j=1}^{r} \sum_{k=1}^{r} (\sigma_j x^* v_j)(\sigma_k v_k^* x)\, u_j^* u_k.$$
(We quietly used (v_j^* x)^* = x^* v_j here. If v_j^* x is real, (v_j^* x)^* = x^* v_j = v_j^* x; if v_j^* x is complex, we must be more careful: (v_j^* x)^* = x^* v_j = \overline{v_j^* x}, where the line denotes complex conjugation.)

Since the left singular vectors are orthogonal, u_j^* u_k = 0 for j ≠ k, so this double sum collapses: only the terms with j = k make a nontrivial contribution:
$$\|Ax\|^2 = \sum_{j=1}^{r} (\sigma_j x^* v_j)(\sigma_j v_j^* x)\, u_j^* u_j = \sum_{j=1}^{r} \sigma_j^2 |v_j^* x|^2, \qquad (8.11)$$
since u_j^* u_j = 1 and (x^* v_j)(v_j^* x) = |v_j^* x|². (If z is complex, then z^* z = \bar{z}\,z = |z|².)


Since σ_j > 0 for j = 1, ..., r, the right-hand side of (8.11) is a sum of nonnegative numbers. To have ‖Ax‖ = 0, all the coefficients in this sum must be zero. The only way for that to happen is for
$$v_j^* x = 0, \qquad j = 1, \ldots, r,$$
i.e., Ax = 0 if and only if x is orthogonal to v_1, ..., v_r. We already have a characterization of such vectors from the singular value decomposition:
$$x \in \operatorname{span}\{v_{r+1}, \ldots, v_n\}.$$
Thus we conclude
$$N(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}.$$
(If r = n, then this span is vacuous, and we just have N(A) = {0}.)

To compute N(A^*), we can repeat the same argument based on ‖A^*y‖² to obtain
$$N(A^*) = \operatorname{span}\{u_{r+1}, \ldots, u_m\}.$$

Putting these results together, we arrive at a beautiful elaboration of the Fundamental Theorem of Linear Algebra.¹

¹ Gilbert Strang. The fundamental theorem of linear algebra. Amer. Math. Monthly, 100:848–855, 1993.

Theorem 8.6 (Fundamental Theorem of Linear Algebra, SVD Version)

Suppose A ∈ C^{m×n} has rank(A) = r, with left singular vectors {u_1, ..., u_m} and right singular vectors {v_1, ..., v_n}. Then
$$R(A) = \operatorname{span}\{u_1, \ldots, u_r\}, \qquad N(A^*) = \operatorname{span}\{u_{r+1}, \ldots, u_m\},$$
$$R(A^*) = \operatorname{span}\{v_1, \ldots, v_r\}, \qquad N(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\},$$
which implies
$$R(A) \oplus N(A^*) = \operatorname{span}\{u_1, \ldots, u_m\} = \mathbb{C}^m, \qquad R(A^*) \oplus N(A) = \operatorname{span}\{v_1, \ldots, v_n\} = \mathbb{C}^n,$$
and
$$R(A) \perp N(A^*), \qquad R(A^*) \perp N(A).$$
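In MATLAB, Theorem 8.6 translates directly into code: orthonormal bases for all four subspaces come from a single call to svd. Here is a small sketch (added for illustration, using a made-up rank-deficient matrix):

A = [1 2 3; 2 4 6; 1 0 1];          % a 3-by-3 matrix of rank 2
[U, S, V] = svd(A);
r = sum(diag(S) > 1e-12);            % numerical rank
colspace  = U(:, 1:r);               % R(A)  = span{u_1, ..., u_r}
leftnull  = U(:, r+1:end);           % N(A*) = span{u_{r+1}, ..., u_m}
rowspace  = V(:, 1:r);               % R(A*) = span{v_1, ..., v_r}
nullspace = V(:, r+1:end);           % N(A)  = span{v_{r+1}, ..., v_n}
norm(A * nullspace)                  % numerically zero, confirming Ax = 0 on N(A)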

8.10 Matrix norms

How ‘large’ is a matrix? We do not mean dimension – but how large, in aggregate, are its entries? One can imagine a multitude of ways to measure the entries; perhaps most natural is to sum the squares of the entries, then take the square root. This idea is useful, but we prefer a more subtle alternative that is of more universal utility throughout mathematics: we shall gauge the size of A ∈ C^{m×n} by the maximum amount it can stretch a vector x ∈ C^n. That is, we will measure ‖A‖ by the largest that ‖Ax‖ can be. Of course, we can inflate ‖Ax‖ as much as we like simply by making ‖x‖ larger, which we avoid by imposing a normalization: ‖x‖ = 1. We arrive at the definition
$$\|A\| = \max_{\|x\|=1} \|Ax\|.$$

To study ‖Ax‖, we could appeal to the formula (8.11); however, we will take a slightly different approach. First, suppose that Q is some matrix with orthonormal columns, so that Q^*Q = I. Then
$$\|Qx\|^2 = (Qx)^*(Qx) = x^* Q^* Q x = x^* x = \|x\|^2,$$
so premultiplying by Q does not alter the norm of x. Now substitute the full SVD A = UΣV^* for A:
$$\|Ax\| = \|U\Sigma V^* x\| = \|\Sigma V^* x\|,$$
where we have used the orthonormality of the columns of U. Now define a new variable y = V^*x (which means Vy = x), and notice that ‖x‖ = ‖Vy‖ = ‖y‖, since the columns of V are orthonormal. Now we can compute the matrix norm:
$$\|A\| = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=1} \|\Sigma V^* x\| = \max_{\|Vy\|=1} \|\Sigma y\| = \max_{\|y\|=1} \|\Sigma y\|.$$

So the norm of A is the same as the norm of Σ. We now must figure out how to pick the unit vector y to maximize ‖Σy‖. This is easy: we want to optimize
$$\|\Sigma y\|^2 = \sigma_1^2 |y_1|^2 + \cdots + \sigma_r^2 |y_r|^2$$
subject to 1 = ‖y‖² ≥ |y_1|² + · · · + |y_r|². Since σ_1 ≥ · · · ≥ σ_r,
$$\|\Sigma y\|^2 = \sigma_1^2 |y_1|^2 + \cdots + \sigma_r^2 |y_r|^2 \le \sigma_1^2 \big(|y_1|^2 + \cdots + |y_r|^2\big) \le \sigma_1^2 \|y\|^2 = \sigma_1^2,$$
resulting in the upper bound
$$\|\Sigma\| = \max_{\|y\|=1} \|\Sigma y\| \le \sigma_1. \qquad (8.12)$$
(Alternatively, you could compute ‖Σ‖ by maximizing f(y) = ‖Σy‖ subject to ‖y‖ = 1 using the Lagrange multiplier technique from vector calculus.)

Will any unit vector y attain this upper bound? That is, can we find such a vector so that ‖Σy‖ = σ_1? Sure: just take y = [1, 0, ..., 0]^*, the first column of the identity matrix. For this special vector,
$$\|\Sigma y\|^2 = \sigma_1^2 |y_1|^2 + \cdots + \sigma_r^2 |y_r|^2 = \sigma_1^2.$$


Since ‖Σy‖ can be no larger than σ_1 for any y, and since ‖Σy‖ = σ_1 for at least one choice of y, we conclude
$$\|\Sigma\| = \max_{\|y\|=1} \|\Sigma y\| = \sigma_1,$$
and hence the norm of a matrix is its largest singular value:
$$\|A\| = \sigma_1.$$
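This identity is easy to test in MATLAB, whose norm command computes exactly this norm for matrices (a quick check added to these notes):

A = randn(5, 3);            % any matrix will do
norm(A) - max(svd(A))        % numerically zero: the matrix 2-norm equals the largest singular value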

Consider the matrix
$$A = \begin{bmatrix} 1/2 & 1 \\ -1/2 & 1 \end{bmatrix} = \left(\frac{\sqrt{2}}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\right) \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2}/2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}^*.$$
We see from this SVD that ‖A‖ = σ_1 = √2. For this example the vector Ax has the form
$$Ax = \sigma_1 (v_1^* x)\, u_1 + \sigma_2 (v_2^* x)\, u_2 = \sqrt{2}\, x_2\, u_1 + \frac{\sqrt{2}}{2}\, x_1\, u_2,$$
so Ax is a blend of some expansion in the u_1 direction and some contraction in the u_2 direction. We maximize the size of Ax by picking an x for which Ax is maximally rich in u_1, i.e., x = v_1.

[Figure: the unit circle in the (x_1, x_2) plane. Every unit vector x in C² is a point where ‖x‖² = x_1² + x_2² = 1, so the set of all such vectors traces out the unit circle, shown in black. Two distinguished vectors are highlighted: x = v_1 (blue) and x = v_2 (red).]

[Figure: the image of the unit circle under A, an ellipse in the ((Ax)_1, (Ax)_2) plane. The vector x = v_1 is mapped to Ax = σ_1 u_1 (blue), and this is the most A stretches any unit vector; x = v_2 is mapped to Ax = σ_2 u_2 (red), which gives the smallest value of ‖Ax‖. Plots like this can be traced out with MATLAB's eigshow command.]

8.11 Low-rank approximation

Perhaps the most important property of the singular value decomposition is its ability to immediately deliver optimal low-rank approximations to a matrix. The dyadic form
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^*$$
writes the rank-r matrix A as the sum of the r rank-1 matrices σ_j u_j v_j^*.

Since σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0, we might hope that the partial sum
$$\sum_{j=1}^{k} \sigma_j u_j v_j^*$$
will give a good approximation to A for some value of k that is much smaller than r (mathematicians write k ≪ r for emphasis). This is especially true in situations where A models some low-rank phenomenon, but some noise (such as random sampling errors, when the entries of A are measured from some physical process) causes A to have much larger rank. If the noise is small relative to the “true” data in A, we expect A to have a number of very small singular values that we might wish to neglect as we work with A. We will see examples of this kind of behavior in the next chapter.

For square diagonalizable matrices, the eigenvalue decompositions we wrote down in Chapter 6 also express A as the sum of rank-1 matrices,
$$A = \sum_{j=1}^{n} \lambda_j w_j \hat{w}_j^*,$$
but there are three key distinctions that make the singular value decomposition a better tool for developing low-rank approximations to A. (Here we write w_j for the right eigenvectors and ŵ_j for the left eigenvectors, the columns of W^{-*}, to distinguish them from the singular vectors.)

1. The SVD holds for all matrices, while the eigenvalue decomposition only holds for square matrices.

2. The singular values are nonnegative real numbers whose ordering
σ_1 ≥ σ_2 ≥ · · · ≥ σ_r > 0
gives a natural way to understand how much the rank-1 matrices σ_j u_j v_j^* contribute to A. In contrast, the eigenvalues will generally be complex numbers, and thus do not have the same natural order.

3. The eigenvectors are not generally orthogonal, and this can skew the rank-1 matrices λ_j w_j ŵ_j^* away from giving good approximations to A. In particular, we can find that ‖w_j ŵ_j^*‖ ≫ 1, whereas the matrices u_j v_j^* from the SVD always satisfy ‖u_j v_j^*‖ = 1.

This last point is subtle, so let us investigate it with an example. Consider
$$A = \begin{bmatrix} 2 & 100 \\ 0 & 1 \end{bmatrix}$$
with eigenvalues λ_1 = 2 and λ_2 = 1 and eigenvalue decomposition
$$A = W \Lambda W^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & -1/100 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 100 \\ 0 & -100 \end{bmatrix} = \lambda_1 w_1 \hat{w}_1^* + \lambda_2 w_2 \hat{w}_2^*$$
$$= 2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 100 \end{bmatrix} + 1 \begin{bmatrix} 1 \\ -1/100 \end{bmatrix} \begin{bmatrix} 0 & -100 \end{bmatrix} = 2 \begin{bmatrix} 1 & 100 \\ 0 & 0 \end{bmatrix} + 1 \begin{bmatrix} 0 & -100 \\ 0 & 1 \end{bmatrix}.$$

Let us inspect individually the two rank-1 matrices that appear in the eigendecomposition:
$$\lambda_1 w_1 \hat{w}_1^* = \begin{bmatrix} 2 & 200 \\ 0 & 0 \end{bmatrix}, \qquad \lambda_2 w_2 \hat{w}_2^* = \begin{bmatrix} 0 & -100 \\ 0 & 1 \end{bmatrix}.$$


Neither matrix individually gives a good approximation to A:
$$A - \lambda_1 w_1 \hat{w}_1^* = \begin{bmatrix} 0 & -100 \\ 0 & 1 \end{bmatrix}, \qquad A - \lambda_2 w_2 \hat{w}_2^* = \begin{bmatrix} 2 & 200 \\ 0 & 0 \end{bmatrix}.$$
Both rank-1 “approximations” to A leave large errors!

Contrast this situation with the rank-1 approximation σ_1 u_1 v_1^* given by the SVD for this A. To five decimal digits, we have
$$A = U\Sigma V^* = \begin{bmatrix} 0.99995 & -0.01000 \\ 0.01000 & 0.99995 \end{bmatrix} \begin{bmatrix} 100.025 & 0 \\ 0 & 0.020 \end{bmatrix} \begin{bmatrix} 0.01999 & 0.99980 \\ -0.99980 & 0.01999 \end{bmatrix}$$
$$= \sigma_1 u_1 v_1^* + \sigma_2 u_2 v_2^* = 100.025 \begin{bmatrix} 0.99995 \\ 0.01000 \end{bmatrix} \begin{bmatrix} 0.01999 & 0.99980 \end{bmatrix} + 0.020 \begin{bmatrix} -0.01000 \\ 0.99995 \end{bmatrix} \begin{bmatrix} -0.99980 & 0.01999 \end{bmatrix}$$
$$= 100.025 \begin{bmatrix} 0.01999 & 0.99975 \\ 0.00020 & 0.00999 \end{bmatrix} + 0.020 \begin{bmatrix} 0.00999 & -0.00020 \\ -0.99975 & 0.01999 \end{bmatrix}.$$

Like the eigendecomposition, the SVD breaks A into two rank-1 pieces:
$$\sigma_1 u_1 v_1^* = \begin{bmatrix} 1.99980 & 100.00000 \\ 0.01999 & 0.99960 \end{bmatrix}, \qquad \sigma_2 u_2 v_2^* = \begin{bmatrix} 0.00020 & 0.00000 \\ -0.01999 & 0.00040 \end{bmatrix}.$$

The first of these, the dominant term in the SVD, gives an excellent approximation to A:
$$A - \sigma_1 u_1 v_1^* = \begin{bmatrix} 0.00020 & 0.00000 \\ -0.01999 & 0.00040 \end{bmatrix}.$$

The key factor making this approximation so good is that σ_1 ≫ σ_2. What is more remarkable is that the dominant part of the singular value decomposition is actually the best low-rank approximation for all matrices.

Definition 8.3 Let A = ∑_{j=1}^{r} σ_j u_j v_j^* be a rank-r matrix, written in terms of its singular value decomposition. Then for any k ≤ r, the truncated singular value decomposition of rank k is the partial sum
$$A_k = \sum_{j=1}^{k} \sigma_j u_j v_j^*.$$

Theorem 8.7 (Schmidt–Mirsky–Eckart–Young) Let A ∈ C^{m×n}. Then for all k ≤ rank(A), the truncated singular value decomposition
$$A_k = \sum_{j=1}^{k} \sigma_j u_j v_j^*$$
is a best rank-k approximation to A, in the sense that
$$\|A - A_k\| = \min_{\operatorname{rank}(X) \le k} \|A - X\| = \sigma_{k+1}.$$


It is easy to see that this A_k gives the approximation error σ_{k+1}, since
$$A - A_k = \sum_{j=1}^{r} \sigma_j u_j v_j^* - \sum_{j=1}^{k} \sigma_j u_j v_j^* = \sum_{j=k+1}^{r} \sigma_j u_j v_j^*,$$
and this last expression is an SVD for the error in the approximation A − A_k. As described in Section 8.10, the norm of a matrix equals its largest singular value, so
$$\|A - A_k\| = \Big\|\sum_{j=k+1}^{r} \sigma_j u_j v_j^*\Big\| = \sigma_{k+1}.$$

To complete the proof, one needs to show that no other rank-k matrix can come closer to A than A_k. This pretty argument is a bit too intricate for this course, but we include it in the margin for those that are interested.

Let X ∈ C^{m×n} be any rank-k matrix. The Fundamental Theorem of Linear Algebra gives C^n = R(X^*) ⊕ N(X). Since rank(X^*) = rank(X) = k, notice that dim(N(X)) = n − k. From the singular value decomposition of A extract v_1, ..., v_{k+1}, a basis for some k+1 dimensional subspace of C^n. Since N(X) ⊆ C^n has dimension n − k, it must be that the intersection
$$N(X) \cap \operatorname{span}\{v_1, \ldots, v_{k+1}\}$$
has dimension at least one. (Otherwise, N(X) ⊕ span{v_1, ..., v_{k+1}} would be an n+1 dimensional subspace of C^n: impossible!) Let z be some unit vector in that intersection: ‖z‖ = 1 and
$$z \in N(X) \cap \operatorname{span}\{v_1, \ldots, v_{k+1}\}.$$
Expand z = γ_1 v_1 + · · · + γ_{k+1} v_{k+1}, so that ‖z‖ = 1 implies
$$1 = z^* z = \Big(\sum_{j=1}^{k+1} \gamma_j v_j\Big)^* \Big(\sum_{j=1}^{k+1} \gamma_j v_j\Big) = \sum_{j=1}^{k+1} |\gamma_j|^2.$$
Since z ∈ N(X), we have
$$\|A - X\| \ge \|(A - X)z\| = \|Az\|,$$
and then
$$\|Az\| = \Big\|\sum_{j=1}^{k+1} \sigma_j u_j v_j^* z\Big\| = \Big\|\sum_{j=1}^{k+1} \sigma_j \gamma_j u_j\Big\|.$$
Since σ_{k+1} ≤ σ_k ≤ · · · ≤ σ_1 and the u_j vectors are orthogonal,
$$\Big\|\sum_{j=1}^{k+1} \sigma_j \gamma_j u_j\Big\| \ge \sigma_{k+1} \Big\|\sum_{j=1}^{k+1} \gamma_j u_j\Big\|.$$
But notice that
$$\Big\|\sum_{j=1}^{k+1} \gamma_j u_j\Big\|^2 = \sum_{j=1}^{k+1} |\gamma_j|^2 = 1,$$
where the last equality was derived above from the fact that ‖z‖ = 1. In conclusion, for any rank-k matrix X,
$$\|A - X\| \ge \sigma_{k+1} \Big\|\sum_{j=1}^{k+1} \gamma_j u_j\Big\| = \sigma_{k+1}.$$
(This proof is adapted from §3.2.3 of Demmel's text.)

James W. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, 1997.
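A numerical spot-check of Theorem 8.7 in MATLAB (added here; randn just supplies a generic test matrix):

A = randn(8, 5);
[U, S, V] = svd(A);
k = 2;
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % truncated SVD of rank k
norm(A - Ak) - S(k+1,k+1)                  % numerically zero: the error norm equals sigma_{k+1}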

8.11.1 Compressing images with low-rank approximations

Image compression provides the most visually appealing application of the low-rank matrix factorization ideas we have just described. An image can be represented as a matrix. For example, typical grayscale images consist of a rectangular array of pixels, m in the vertical direction, n in the horizontal direction. The color of each of those pixels is denoted by a single number, an integer between 0 (black) and 255 (white). (This gives 2^8 = 256 different shades of gray for each pixel. Color images are represented by three such matrices: one for red, one for green, and one for blue. Thus each pixel in a typical color image takes (2^8)^3 = 2^24 = 16,777,216 shades.)

MATLAB has many built-in routines for processing images. The imread command reads in image files. For example, if you want to load the file snapshot.jpg into MATLAB, you would use the command:

A = double(imread('snapshot.jpg'));

If your file contains a grayscale image, A will now contain the m×n matrix containing the gray colors of your image. If you have a color image, then A will be an m×n×3 matrix, and you will need to extract the color levels in an extra step:

Ared = A(:,:,1); Agreen = A(:,:,2); Ablue = A(:,:,3);

The double command converts the entries of the image into floating point numbers. (To conserve memory, MATLAB's default is to save the entries of an image as integers, but MATLAB's linear algebra routines like svd will only work with floating point matrices.) Finally, to visualize an image in MATLAB, use

imagesc(A)


and, if the image is grayscale, follow this with

colormap(gray)

The imagesc command is a useful tool for visualizing any matrix of data; it does not require that the entries in A be integers. (However, for color images stored in m×n×3 floating point matrices, you need to use imagesc(uint8(A)) to convert A back to positive integer values.)

Images are ripe for data compression: often they contain broad regions of similar colors, and in many areas of the image adjacent rows (or columns) will look quite similar. If the image stored in A can be represented well by a rank-k matrix, then one can approximate A by storing only the leading k singular values and vectors. To build this approximation
$$A_k = \sum_{j=1}^{k} \sigma_j u_j v_j^*,$$
one need only store k(1 + m + n) values. When k(1 + m + n) ≪ mn, there will be a significant savings in storage, thus giving an effective compression of A.
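A compact MATLAB sketch of this compression step (an addition to these notes; it assumes a grayscale image has already been loaded into the matrix A as described above, and the choice k = 25 is only illustrative):

k = 25;                                    % target rank for the compressed image
[U, S, V] = svd(A);                         % full SVD of the image matrix
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';     % rank-k truncated SVD
imagesc(Ak), colormap(gray)                 % display the compressed image
k*(1 + size(A,1) + size(A,2)) / numel(A)    % fraction of the original storage required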

Let us look at an example to see how effective this image compression can be. For convenience we shall use an image built into MATLAB,

load gatlin, A = X;
imagesc(A), colormap(gray)

which shows some of the key developers of the numerical linear algebra algorithms we have studied this semester, gathered in Gatlinburg, Tennessee, for an important early conference in the field. The image is of size 480×640, so rank(A) ≤ 480. We shall compress this image with truncated singular value decompositions. Figures 8.2 and 8.3 show compressions of A for dimensions ranging from k = 200 down to k = 1. For k = 200 and 100, the compression A_k provides an excellent proxy for the full image A. For k = 50, 25 and 10, the quality degrades a bit, but even for k = 10 you can still tell that the image shows six men in suits standing on a patterned floor. For k ≤ 5 we lose much of the quality, but isn't it remarkable how much structure is still apparent even when k = 5? The last image is interesting as a visualization of a rank-1 matrix: each row is a multiple of all the other rows, and each column is a multiple of all the other columns.

[Figure 8.1 (original uncompressed image, rank = 480): A sample image: the founders of numerical linear algebra at an early Gatlinburg Symposium. From left to right: Jim Wilkinson, Wallace Givens, George Forsythe, Alston Householder, Peter Henrici, and Friedrich Bauer.]

[Figure 8.2 (truncated SVD, ranks k = 200, 100, 50, 25): Compressions of the Gatlinburg image in Figure 8.1 using truncated SVDs A_k = ∑_{j=1}^{k} σ_j u_j v_j^*. Each of these images can be stored with less memory than the original full image.]

[Figure 8.3 (truncated SVD, ranks k = 10, 5, 2, 1): Continuation of Figure 8.2, showing compressions of the Gatlinburg image via truncated SVDs of rank 10, 5, 2, and 1. The rank-10 image might be useful as a "thumbnail" sketch of the image (e.g., an icon on a computer desktop), but the other images are compressed beyond the point of being useful.]

We gain an understanding of the quality of this compression by looking at the singular values of A, shown in Figure 8.4. The first singular value σ_1 is about an order of magnitude larger than the rest, and the singular values decay quite rapidly. (Notice the logarithmic vertical axis.) We have σ_1 ≈ 15,462, while σ_50 ≈ 204.48. When we truncate the singular value decomposition at k = 50, the neglected terms in the singular value decomposition do not make a major contribution to the image.


[Figure 8.4: Singular values of the 480×640 Gatlinburg image matrix. The first few singular values are much larger than the rest, suggesting the potential for accurate low-rank approximation (compression).]

student experiments

8.1. The two images shown in Figure 8.5 show characters generated in the MinionPro italic font, defined in the image files minionamp.jpg and minion11.jpg, each of which leads to a 200×200 matrix. Which image do you think will better lend itself to low-rank approximation? Compute the singular values and truncated SVD approximations for a variety of ranks k. Do your results agree with your intuition?

[Figure 8.5: Two images showing characters in the italic MinionPro font. (The first is an ampersand, which one can clearly see here derives from the Latin word et, meaning "and.")]

8.12 Special Topic: Reducing dimensions with POD

8.13 Afterword

The singular value decomposition was developed in its initial form by Eugenio Beltrami (1873) and, independently, by Camille Jordan (1874).²

² G. W. Stewart. On the early history of the singular value decomposition. SIAM Review, 35:551–566, 1993.