Lecture 3 Eigenvalues


November 22, 2010

EIGENVALUES AND EIGENVECTORS

RODICA D. COSTIN

Contents

1. An example: linear differential equations
2. Eigenvalues and eigenvectors: definition and calculation
2.1. Definitions
2.2. The characteristic equation
2.3. Geometric interpretation of eigenvalues and eigenvectors
2.4. Digression: the fundamental theorem of algebra
2.5. Solutions of the characteristic equation
2.6. Diagonal matrices
2.7. Similar matrices have the same eigenvalues
2.8. Projections
2.9. Trace, determinant and eigenvalues
2.10. Eigenvectors corresponding to different eigenvalues are independent
2.11. Diagonalization of matrices with linearly independent eigenvectors
2.12. Real matrices with complex eigenvalues: decomplexification
2.13. Eigenspaces
2.14. Multiple eigenvalues, Jordan normal form
3. Solutions of linear differential equations with constant coefficients
3.1. Diagonalizable matrix
3.2. Repeated eigenvalues
3.3. Higher order linear differential equations; companion matrix
3.4. Stability in differential equations
4. Difference equations (Discrete dynamical systems)
4.1. Linear difference equations with constant coefficients
4.2. Solutions of linear difference equations
4.3. Stability
4.4. Why does the theory of discrete equations parallel that of differential equations?
4.5. Example: Fibonacci numbers
4.6. Positive matrices
4.7. Markov chains
5. More functional calculus
5.1. Functional calculus for diagonalizable matrices
5.2. Commuting matrices

1. An example: linear differential equations

One important motivation for the study of eigenvalues is solving linear differential equations.

Consider the simple equation

du/dt = λu

which is linear, homogeneous, with constant coefficients, and the unknown u of dimension one. Its general solution is, as is well known, u(t) = C e^{λt}.

Consider next an equation which is also linear, homogeneous, with constant coefficients, but the unknown u has dimension two:

(1)  du1/dt = (4a1 − 3a2)u1 + (−2a1 + 2a2)u2
     du2/dt = (6a1 − 6a2)u1 + (−3a1 + 4a2)u2

where a1, a2 are constants, or, in matrix form,

(2)  du/dt = Mu

where

(3)  M = [ 4a1 − 3a2   −2a1 + 2a2
           6a1 − 6a2   −3a1 + 4a2 ],   u = [ u1
                                             u2 ]

For the system of equations (1), the solution is not immediately clear.

Inspired by the one dimensional case we can look for exponential solutions. Substituting u(t) = e^{λt} x (where x is a constant vector) in (2) we obtain that the scalar λ and the vector x must satisfy

(4) λx = Mx

or

(5) (M − λI)x = 0

If the null space of the matrix M − λI is zero, then the only solution of (5) is x = 0, which gives the (trivial!) solution identically zero for the differential system (2).

If, however, we can find special values of λ for which N(M − λI) is not null, then we have found a nontrivial solution of (2). Such values of λ are called eigenvalues of the matrix M, and nonzero vectors x ∈ N(M − λI) are called eigenvectors corresponding to the eigenvalue λ.


Of course, the necessary and sufficient condition for N(M − λI) ≠ {0} is that

(6) det(M − λI) = 0

Solving (6) for the matrix (3) gives λ1 = a1 and λ2 = a2. For each eigenvalue we need the corresponding eigenvectors.

Substituting λ = a1 (then a2) in (4) and solving for x we obtain that the eigenvector x is any scalar multiple of x1 (respectively x2) where

(7)  x1 = [ 2
            3 ],   x2 = [ 1
                          2 ]

We obtained the solutions u1(t) = e^{a1 t} x1 and u2(t) = e^{a2 t} x2 of (1).

Since the system is linear, any linear combination of solutions is again a solution, and we have found the general solution of (1) as

(8)  u(t) = C1 u1(t) + C2 u2(t) = C1 e^{a1 t} x1 + C2 e^{a2 t} x2

To complete the problem, we should also check that for any initial conditions we can find constants C1, C2 for which the solution (8) satisfies these initial data. However, we leave this discussion for a course in differential equations. Furthermore, we will see in what follows that by carefully studying the algebraic properties of eigenvalues and eigenvectors this information comes for free!
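For readers who want to experiment, here is a minimal numpy sketch checking the claims above for the matrix (3), using the illustrative (assumed) values a1 = 1, a2 = 2:

```python
import numpy as np

# Illustrative values for the constants a1, a2 in the matrix (3)
a1, a2 = 1.0, 2.0
M = np.array([[4*a1 - 3*a2, -2*a1 + 2*a2],
              [6*a1 - 6*a2, -3*a1 + 4*a2]])

# Eigenvalues should be a1 and a2
print(np.linalg.eigvals(M))           # approximately [1., 2.] (possibly in another order)

# Check Mx = λx directly for x1 = (2,3)^T and x2 = (1,2)^T, as in (7)
x1, x2 = np.array([2.0, 3.0]), np.array([1.0, 2.0])
print(np.allclose(M @ x1, a1 * x1))   # True
print(np.allclose(M @ x2, a2 * x2))   # True
```

The same check works for any other choice of a1 ≠ a2.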

2. Eigenvalues and eigenvectors: definition and calculation

2.1. Definitions. Denote the set of n × n (square) matrices with entries in F

Mn(F) = {M | M = [Mij]_{i,j=1,...,n}, Mij ∈ F}

A matrix M ∈ Mn(F) defines a linear transformation on the vector space F^n (over the scalars F) by usual multiplication x ↦ Mx.

Note that a matrix with real entries can also act on C^n, since for any x ∈ C^n we can define Mx ∈ C^n. But a matrix with complex, non-real entries cannot act on R^n, since for x ∈ R^n the image Mx may not belong to R^n (while certainly Mx ∈ C^n).

Definition 1. Let M be an n × n matrix acting on the vector space V = F^n. A scalar λ ∈ F is an eigenvalue of M if for some nonzero vector x ∈ V, x ≠ 0, we have

(9) Mx = λx

The vector x is called an eigenvector corresponding to the eigenvalue λ.


2.2. The characteristic equation. Equation (9) can be rewritten as Mx − λx = 0, or (M − λI)x = 0, which means that the nonzero vector x belongs to the null space of the matrix M − λI, and in particular this matrix is not invertible. Using the theory of matrices, we know that this is equivalent to the fact that its determinant is zero:

det(M − λI) = 0

The determinant has the form

det(M − λI) = | M11 − λ   M12       . . .   M1n     |
              | M21       M22 − λ   . . .   M2n     |
              | ...       ...               ...     |
              | Mn1       Mn2       . . .   Mnn − λ |

where the polynomial form is easy to see by expansion of the determinant (and rigorously proved by induction on n).

Examples. For n = 2 the characteristic polynomial is

| M11 − λ   M12     |
| M21       M22 − λ |  = (M11 − λ)(M22 − λ) − M12 M21 = polynomial in λ of degree 2

For n = 3 the characteristic polynomial is

| M11 − λ   M12       M13     |
| M21       M22 − λ   M23     |
| M31       M32       M33 − λ |

and expanding along, say, row 1,

= (−1)^{1+1} (M11 − λ) | M22 − λ   M23     |  + (−1)^{1+2} M12 | M21   M23     |  + (−1)^{1+3} M13 | M21   M22 − λ |
                       | M32       M33 − λ |                   | M31   M33 − λ |                   | M31   M32     |

= polynomial in λ of degree 3

It is easy to show by induction that

det(M − λI) = polynomial in λ of degree n

Definition 2. The polynomial det(M − λI) is called the characteristic polynomial of the matrix M, and the equation det(M − λI) = 0 is called the characteristic equation of M.
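As a small numerical aside (an illustration only, with an arbitrary 2 × 2 matrix), numpy can produce the coefficients of the characteristic polynomial of a concrete matrix and confirm that its roots are the eigenvalues; np.poly returns the coefficients of det(λI − M), which has the same roots as det(M − λI):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Coefficients of det(λI - M) = λ^2 - (Tr M) λ + det M
coeffs = np.poly(M)
print(coeffs)                                # [ 1., -5.,  5.]

# Its roots agree with the eigenvalues of M
print(np.sort(np.roots(coeffs)))             # [1.3819..., 3.6180...]
print(np.sort(np.linalg.eigvals(M)))         # same values
```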

2.3. Geometric interpretation of eigenvalues and eigenvectors. Let M be an n × n matrix, and T : R^n → R^n defined by T(x) = Mx be the corresponding linear transformation.

If v is an eigenvector corresponding to an eigenvalue λ of M: Mv = λv, then T expands or contracts v (and any vector in its direction) by the factor λ (and it does not change its direction!).

If the eigenvalue/vector are not real, a similar fact is true, only that multiplication by a complex (not real) scalar cannot be easily called an expansion or a contraction (there is no ordering in complex numbers); see the example of rotations, §2.5.1.

The special directions of the eigenvectors are called principal axes of the linear transformation (or of the matrix).

2.4. Digression: the fundamental theorem of algebra.

2.4.1. Polynomials of degree two: roots and factorization. Consider polynomials of degree two, with real coefficients: p(x) = ax^2 + bx + c. It is well known that p(x) has real roots

x1,2 = (−b ± √(b^2 − 4ac)) / (2a)   if b^2 − 4ac ≥ 0

(where x1 = x2 when b^2 − 4ac = 0), and p(x) has no real roots if b^2 − 4ac < 0.

When the roots are real, the polynomial factors as ax^2 + bx + c = a(x − x1)(x − x2).

In particular, if x1 = x2 then p(x) = a(x − x1)^2 and x1 is called a double root. It is convenient to say that in this case, too, p(x) has two roots.

Note: if we know the two zeroes of the polynomial, then we know each factor of degree one: x − x1 and x − x2; the number a in front of (x − x1)(x − x2) takes care that the coefficient of x^2 equals a.

On the other hand, if p(x) has no real roots, then it cannot be factored within real numbers; it is called irreducible (over the real numbers).

2.4.2. Complex numbers and factorization of polynomials of degree two. If p(x) = ax^2 + bx + c is irreducible, this means that b^2 − 4ac < 0 and we cannot take the square root of this quantity in order to calculate the two roots of p(x). However, writing b^2 − 4ac = (−1)(−b^2 + 4ac) and introducing the symbol i for √−1, we can write the zeroes of p(x) as

x1,2 = (−b ± i√(−b^2 + 4ac)) / (2a) = −b/(2a) ± i √(−b^2 + 4ac)/(2a) ∈ R + iR = C

Considering the two roots x1, x2 complex, we can now factor ax^2 + bx + c = a(x − x1)(x − x2), only now the factors have complex coefficients. Within complex numbers every polynomial of degree two is reducible!

Note that the two roots of a quadratic polynomial with real coefficients are complex conjugate: if a, b, c ∈ R and x1,2 ∉ R then x2 = x̄1.

2.4.3. The fundamental theorem of algebra. It is absolutely remarkable that any polynomial can be completely factored using complex numbers:

Theorem 3. The fundamental theorem of algebra.
Any polynomial p(x) = an x^n + a_{n−1} x^{n−1} + . . . + a0 with coefficients aj ∈ C can be factored

(10)  an x^n + a_{n−1} x^{n−1} + . . . + a0 = an (x − x1)(x − x2) . . . (x − xn)

for a unique set of complex numbers x1, x2, . . . , xn (the roots of the polynomial p(x)).


Remark. With probability one, the zeroes x1, . . . , xn of polynomials p(x) are distinct. Indeed, if x1 is a double root (or has higher multiplicity) then both relations p(x1) = 0 and p′(x1) = 0 must hold. This means that there is a relation between the coefficients a0, . . . , an of p(x) (the multiplet (a0, . . . , an) belongs to an n-dimensional surface in C^{n+1}).

2.4.4. Factorization within real numbers. If we want to restrict ourselves only to real numbers then we can factor any polynomial into factors of degree one or two:

Theorem 4. Factorization within real numbers.
Any polynomial of degree n with real coefficients can be factored into factors of degree one or two with real coefficients.

Theorem 4 is an easy consequence of the (deep) Theorem 3. Indeed, first factor the polynomial in complex numbers (10). Then note that the zeroes x1, x2, . . . , xn come in pairs of complex conjugate numbers, since if z satisfies p(z) = 0, then also its complex conjugate z̄ satisfies p(z̄) = 0. Then each pair of factors (x − z)(x − z̄) must be replaced in (10) by its expanded value:

(x − z)(x − z̄) = x^2 − (z + z̄)x + |z|^2

which is an irreducible polynomial of degree 2, with real coefficients. □

2.5. Solutions of the characteristic equation. Let M be an n × n matrix. Its characteristic equation, det(M − λI) = 0, is a polynomial equation of degree n, therefore it has n solutions in C (some of them may be equal).

If M is considered acting on vectors in C^n then this is just fine, but if M is a matrix with real entries acting on R^n, then eigenvectors corresponding to complex nonreal eigenvalues are also nonreal.

For an n × n matrix with real entries, if we want to have n guaranteed eigenvalues, then we have to accept working in C^n. Otherwise, if we want to restrict ourselves to working only with real vectors, then we have to accept that we may have fewer (real) eigenvalues, or perhaps none, as in the case of rotations in R^2, shown below.

2.5.1. Rotation in the plane. Consider a rotation matrix

Rθ = [ cos θ   −sin θ
       sin θ    cos θ ]

To find its eigenvalues calculate

det(Rθ − λI) = | cos θ − λ   −sin θ     |
               | sin θ        cos θ − λ |  = (cos θ − λ)^2 + sin^2 θ = λ^2 − 2λ cos θ + 1

hence the solutions of the characteristic equation det(Rθ − λI) = 0 are λ1,2 = cos θ ± i sin θ = e^{±iθ}. It is easy to see that v1 = (i, 1)^T is the eigenvector corresponding to λ1 = e^{iθ} and v2 = (−i, 1)^T is the eigenvector corresponding to λ2 = e^{−iθ}.


Note that both eigenvalues are not real (in general) and, since the entries of Rθ are real, λ2 = λ̄1: the two eigenvalues are complex conjugate. Also, so are the two eigenvectors: v2 = v̄1.
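A brief numerical sketch of the rotation example (with an arbitrary illustrative angle θ = 0.7):

```python
import numpy as np

theta = 0.7                                  # an arbitrary angle, for illustration
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(R)
print(eigvals)                               # cos θ ± i sin θ = e^{±iθ}
print(np.allclose(sorted(eigvals, key=np.imag),
                  [np.exp(-1j*theta), np.exp(1j*theta)]))   # True

# The eigenvector for e^{iθ} is a multiple of (i, 1)^T
v = np.array([1j, 1.0])
print(np.allclose(R @ v, np.exp(1j*theta) * v))             # True
```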

2.6. Diagonal matrices. Let D be a diagonal matrix:

(11)  D = [ d1   0    . . .   0
            0    d2   . . .   0
            ...  ...          ...
            0    0    . . .   dn ]

Diagonal matrices are associated to dilations in R^n (or C^n): each axis xj is stretched (or compressed) dj times.
To find the eigenvalues of D, calculate

det(D − λI) = | d1 − λ   0        . . .   0      |
              | 0        d2 − λ   . . .   0      |
              | ...      ...              ...    |
              | 0        0        . . .   dn − λ |  = (d1 − λ)(d2 − λ) . . . (dn − λ)

The eigenvalues are precisely the diagonal elements, and the eigenvector corresponding to dj is ej (as it is easy to check). The principal axes of diagonal matrices are the coordinate axes x1, . . . , xn.

Diagonal matrices are easy to work with: you can easily check that any power D^k of D is the diagonal matrix having dj^k on the diagonal.

Furthermore, let p(t) be a polynomial

p(t) = ak t^k + a_{k−1} t^{k−1} + . . . + a1 t + a0

then for any n × n matrix M it makes sense to define p(M) as

p(M) = ak M^k + a_{k−1} M^{k−1} + . . . + a1 M + a0 I

If D is a diagonal matrix (11) then p(D) is the diagonal matrix having p(dj) on the diagonal. (Check!)

Diagonal matrices can be viewed as the collection of their eigenvalues!

2.7. Similar matrices have the same eigenvalues. It is very easy to work with diagonal matrices and a natural question arises: which linear transformations have a diagonal matrix in a well chosen basis? This is the main topic we will be exploring for many sections to come.

Recall that if the matrix A represents the linear transformation T : V → V in some basis BV of V, and the matrix A′ represents the same linear transformation T, only in a different basis B′V, then the two matrices are similar: A′ = SAS^{−1} (where S is the matrix of change of basis).

Eigenvalues are associated to the linear transformation (rather than its matrix representation):


Proposition 5. Two similar matrices have the same eigenvalues: if A, A′, S are n × n matrices, and A′ = SAS^{−1}, then the eigenvalues of A′ and of A are the same.

This is very easy to check, since

det(A′ − λI) = det(SAS^{−1} − λI) = det[S(A − λI)S^{−1}] = det S det(A − λI) det S^{−1} = det(A − λI)

so A′ and A have the same characteristic equation. □

2.8. Projections. Recall that projections do satisfy P^2 = P (we saw this for projections in dimension two, and we will prove it in general).

Proposition 6. Let P be a square matrix satisfying P^2 = P. Then the eigenvalues of P can only be zero or one.

Proof. Let λ be an eigenvalue; this means that there is a nonzero vector v so that Pv = λv. Applying P to both sides of the equality we obtain P^2 v = P(λv) = λPv = λ^2 v. Using the fact that P^2 v = Pv = λv it follows that λv = λ^2 v, so (λ − λ^2)v = 0, and since v ≠ 0 then λ − λ^2 = 0, so λ ∈ {0, 1}. □

Example. Consider the projection of R^3 onto the x1x2 plane. Its matrix

P = [ 1  0  0
      0  1  0
      0  0  0 ]

is diagonal, with eigenvalues 1, 1, 0.

2.9. Trace, determinant and eigenvalues.

Definition 7. Let M be an n × n matrix, M = [Mij]_{i,j=1,...,n}. The trace of M is the sum of its elements on the principal diagonal:

Tr M = ∑_{j=1}^{n} Mjj

The following theorem shows that the trace and the determinant of any matrix can be found as if the matrix were diagonal (its entries being, of course, its eigenvalues):

Theorem 8. Let M be an n × n matrix, and let λ1, . . . , λn be its n eigenvalues (complex, with multiplicities counted). Then

(12) detM = λ1λ2 . . . λn

and

(13) TrM = λ1 + λ2 + . . .+ λn

In particular, the traces of similar matrices are equal, and so are their determinants.


Corollary 9. A matrix M is invertible if and only if all its eigenvalues are nonzero.

Sketch of the proof.
On one hand it is not hard to see that for any factored polynomial

(14)  p(x) ≡ (x − λ1)(x − λ2) . . . (x − λn)
           = x^n − (λ1 + λ2 + . . . + λn) x^{n−1} + . . . + (−1)^n (λ1 λ2 . . . λn)

In particular, p(0) = (−1)^n λ1 λ2 . . . λn.
On the other hand, let p(λ) = (−1)^n det(M − λI) (recall that the coefficient of λ^n in the characteristic polynomial is (−1)^n); then p(0) = (−1)^n det M, which proves (12).

To show (13) we expand the determinant det(M − λI) using minors and cofactors, keeping track of the coefficient of λ^{n−1}. As seen on the examples in §2.2, only the first term in the expansion contains the power λ^{n−1}, and continuing to expand to lower and lower dimensional determinants, we see that the only term containing λ^{n−1} is

(M11 − λ)(M22 − λ) . . . (Mnn − λ)
= (−1)^n λ^n − (−1)^n (M11 + M22 + . . . + Mnn) λ^{n−1} + lower powers of λ

which, compared to (14), gives (13). □
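A quick numerical check of (12) and (13), on a randomly generated matrix (an illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
lam = np.linalg.eigvals(M)

# (12): det M = product of eigenvalues; (13): Tr M = sum of eigenvalues
print(np.allclose(np.linalg.det(M), np.prod(lam)))   # True
print(np.allclose(np.trace(M), np.sum(lam)))         # True
```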

2.10. Eigenvectors corresponding to different eigenvalues are independent.

Theorem 10. Let M be an n × n matrix, and λ1, . . . , λk a set of distinct eigenvalues (λj ≠ λi for i ≠ j, i, j = 1, . . . , k). Let vj be eigenvectors corresponding to the eigenvalues λj for j = 1, . . . , k.
Then the set v1, . . . , vk is linearly independent.
In particular, if all the eigenvalues of M are in F and are distinct, then the set of corresponding eigenvectors forms a basis for F^n.

Proof.
Assume, to obtain a contradiction, that the eigenvectors are linearly dependent: there are c1, . . . , ck ∈ F, not all zero, such that

(15) c1v1 + . . .+ ckvk = 0

Step I. We can assume that all cj are nonzero, otherwise we just remove those vj from (15) and we have a similar equation with a smaller k. Then we can solve (15) for vk:

(16) vk = c′1v1 + . . .+ c′k−1vk−1

where c′j = −cj/ck. Applying M to both sides of (16) we obtain

(17) λkvk = c′1λ1v1 + . . .+ c′k−1λk−1vk−1


Multiplying (16) by λk and subtracting from (17) we obtain

(18) 0 = c′1(λ1 − λk)v1 + . . .+ c′k−1(λk−1 − λk)vk−1

Step II. There are two possibilities.
1) Either all c′j(λj − λk) are zero, which implies that all c′1, . . . , c′_{k−1} are zero, therefore all c1, . . . , c_{k−1} are zero, which from (15) implies that ck vk = 0, and since vk ≠ 0 then also ck = 0, which is a contradiction.

Or 2) not all c′j are zero, in which case we go back to Step I with a smaller k.
The procedure decreases k, therefore it must end, and we have a contradiction. □

2.11. Diagonalization of matrices with linearly independent eigenvectors. Suppose that the n × n matrix M has n independent eigenvectors v1, . . . , vn.

Note that, by Theorem 10, this is the case if we work in F = C and all the eigenvalues are distinct (recall that this happens with probability one). Also this is the case if we work in F = R and all the eigenvalues are real and distinct.

Let S be the matrix with columns v1, . . . ,vn:

S = [v1, . . . ,vn]

which is invertible, since v1, . . . , vn are linearly independent. Since Mvj = λj vj, we have

(19) M [v1, . . . ,vn] = [λ1v1, . . . , λnvn]

The left side of (19) equals MS. To identify the matrix on the right side of (19), note that since S ej = vj, then S(λj ej) = λj vj, so

[λ1 v1, . . . , λn vn] = S[λ1 e1, . . . , λn en] = SΛ,   where Λ = [ λ1   0    . . .   0
                                                                    0    λ2   . . .   0
                                                                    ...  ...          ...
                                                                    0    0    . . .   λn ]

Relation (19) is therefore

MS = SΛ,   or   S^{−1}MS = Λ = diagonal

Note that the matrix S which diagonalizes a matrix is not unique. For example, we can replace any eigenvector by a scalar multiple of it. Also, we can use different orders for the eigenvectors (this will result in a diagonal matrix with the same values on the diagonal, but in different positions).

Example 1. Consider the matrix (3), for which we found the eigenvalues λ1,2 = a1,2 and the corresponding eigenvectors (7). Taking

S = [ 2  1
      3  2 ]


we have

S^{−1}MS = [ a1   0
             0    a2 ]

Not all matrices are diagonalizable, only those with distinct eigenvalues, and some matrices with multiple eigenvalues.

Example 2. The matrix

(20)  N = [ 0  1
            0  0 ]

has eigenvalues λ1 = λ2 = 0 (easy to calculate) but only one (up to a scalar multiple) eigenvector v1 = e1.

Multiple eigenvalues are not guaranteed to have an equal number of independent eigenvectors!

To show that N is not diagonalizable, assume the contrary, to arrive at a contradiction. Suppose there exists an invertible matrix S so that S^{−1}NS = Λ where Λ is diagonal, hence it has the eigenvalues of N on its diagonal, and therefore it is the zero matrix: S^{−1}NS = 0, which multiplied by S to the left and S^{−1} to the right gives N = 0, which is a contradiction.

Remark about the zero eigenvalue. A matrix M has an eigenvalue equal to zero if and only if its null space N(M) is nontrivial. Moreover, the matrix M has dim N(M) linearly independent eigenvectors corresponding to the eigenvalue zero.

Example 3. Examples of matrices that can be diagonalized even if they have multiple eigenvalues can be found as M = SΛS^{−1} where Λ is diagonal, with some entries equal, and S is any invertible matrix.

Remark. If a matrix has real entries, and distinct eigenvalues some of which are not real, we cannot diagonalize within real numbers, but we can do it using complex numbers and complex vectors; see for example rotations, §2.5.1.

2.12. Real matrices with complex eigenvalues: decomplexification. Suppose the n × n matrix M has real elements, eigenvalues λ1, . . . , λn and n independent eigenvectors v1, . . . , vn. Then M is diagonalizable: if S = [v1, . . . , vn] then S^{−1}MS = Λ where Λ is a diagonal matrix with λ1, . . . , λn on its diagonal.

Suppose that some eigenvalues are not real. Then the matrices S and Λ are not real either, and the diagonalization of M must be done in C^n. Recall that the nonreal eigenvalues and eigenvectors of real matrices come in pairs of complex-conjugate ones.

Suppose that we want to work in R^n only. This can be achieved by replacing the diagonal 2 × 2 blocks in Λ

[ λj   0
  0    λ̄j ]

by a 2 × 2 matrix which is not diagonal, but has real entries.


To see how this is done, suppose λ1 ∈ C \ R and λ2 = λ̄1, v2 = v̄1. Splitting into real and imaginary parts, write λ1 = α1 + iβ1 and v1 = x1 + iy1. Then from M(x1 + iy1) = (α1 + iβ1)(x1 + iy1), identifying the real and imaginary parts, we obtain

Mx1 + iMy1 = (α1 x1 − β1 y1) + i(β1 x1 + α1 y1)

Then in the matrix S for diagonalization, replace the first two columns v1, v2 = v̄1 by x1, y1 (which are vectors in R^n): using the matrix S̃ = [x1, y1, v3, . . . , vn] instead of S we have M S̃ = S̃ Λ̃ where

Λ̃ = [ α1    β1   0    . . .   0
      −β1    α1   0    . . .   0
       0     0    λ3   . . .   0
       ...   ...  ...          ...
       0     0    0    . . .   λn ]

We can similarly replace any pair of complex conjugate eigenvalues with a 2 × 2 real block.
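Here is a small numpy sketch of this decomplexification (an illustration only), using an arbitrary real 2 × 2 matrix with eigenvalues ±i; the eigenpair with positive imaginary part is selected, and x1, y1 are its real and imaginary parts:

```python
import numpy as np

M = np.array([[1.0, -2.0],
              [1.0, -1.0]])                 # illustrative real matrix with eigenvalues ±i

lam, V = np.linalg.eig(M)
j = np.argmax(lam.imag)                     # pick λ1 = α1 + iβ1 with β1 > 0
alpha, beta = lam[j].real, lam[j].imag
x1, y1 = V[:, j].real, V[:, j].imag         # v1 = x1 + i y1

S_tilde = np.column_stack([x1, y1])         # real matrix replacing the columns v1, conj(v1)
block = np.linalg.inv(S_tilde) @ M @ S_tilde

print(np.round(block, 10))
print(np.allclose(block, [[alpha, beta], [-beta, alpha]]))   # True
```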

2.13. Eigenspaces. Consider an n × n matrix M with entries in F, with eigenvalues λ1, . . . , λn in F.

Definition 11. The set

Vλj = {x ∈ Fn |Mx = λjx}

is called the eigenspace of M associated to the eigenvalue λj.

Exercise. Show that Vλj is the null space of the transformation M − λj I and that Vλj is a subspace of F^n.

Note that all the nonzero vectors in Vλj are eigenvectors of M corresponding to the eigenvalue λj.

Definition 12. A subspace V is called an invariant subspace for M if M(V) ⊂ V (which means that if x ∈ V then Mx ∈ V).

Exercise. Show that Vλj is an invariant subspace for M. Show that Vλj ∩ Vλk = {0} if λj ≠ λk.

Denote by λ1, . . . , λk the distinct eigenvalues of M (in other words, {λ1, . . . , λk} = {λ1, . . . , λn}).

Show that M is diagonalizable in F^n if and only if Vλ1 ⊕ . . . ⊕ Vλk = F^n, and explain why the sum above is a direct sum.

2.14. Multiple eigenvalues, Jordan normal form. We noted in §2.13 that a matrix is similar to a diagonal matrix if and only if the dimension of each eigenspace Vλj equals the order of multiplicity of the eigenvalue λj. Otherwise, there are fewer than n independent eigenvectors; such a matrix is called defective.


2.14.1. Jordan normal form. Defective matrices cannot be diagonalized, but we will see that they are similar to almost diagonal matrices, called Jordan normal forms, which have the eigenvalues on the diagonal, 1 in selected places above the diagonal, and zero in the rest. In §2.14.2 we will see how to construct the matrix S which conjugates a defective matrix to its Jordan normal form.

Example 0. The matrix (20) is defective, and it is in Jordan normal form.

Example 1. The matrix

[ λ + 1/3   −1/3
  1/3        λ − 1/3 ]

is defective: it has eigenvalues λ, λ and only one independent eigenvector, (1, 1)^T. The matrix is not similar to a diagonal matrix (it is not diagonalizable), but we will see that it is similar to a 2 × 2 matrix with the eigenvalues λ, λ on the diagonal, 1 above the diagonal, and zeroes elsewhere, the Jordan block:

(21)  J2(λ) = [ λ  1
                0  λ ]

Example 2. Suppose M is a 3 × 3 matrix with characteristic polynomial det(M − λI) = −(λ − 2)^3. Therefore M has the eigenvalues 2, 2, 2. Suppose that M has only one independent eigenvector v (the dimension of the eigenspace V2 is 1). This matrix is defective, it cannot be diagonalized, but it is similar to the 3 × 3 Jordan block

(22)  J3(2) = [ 2  1  0
                0  2  1
                0  0  2 ]

In general, if an eigenvalue λ is repeated p times, but there are only k < p independent eigenvectors corresponding to the eigenvalue λ, then the Jordan normal form has k blocks associated to λ whose dimensions add up to p; in the simplest case these are a (k − 1)-dimensional diagonal block having λ on the diagonal together with the Jordan block J_{p−k+1}(λ) (of dimension p − k + 1, having λ on the diagonal, 1 above the diagonal, and zeroes elsewhere).

Example 3. Suppose M is a 5 × 5 matrix with characteristic polynomial det(M − λI) = −(λ − 2)^4 (λ − 3). Therefore M has the eigenvalues 3, 2, 2, 2, 2. Denote by v1 the eigenvector corresponding to the eigenvalue 3. Suppose that M has only two linearly independent eigenvectors v2, v3 associated to the eigenvalue 2 (the dimension of the eigenspace V2 is 2, while the multiplicity of the eigenvalue 2 is four). This matrix is similar to its Jordan normal form

S^{−1}MS = [ 3  0  0  0  0
             0  2  0  0  0
             0  0  2  1  0
             0  0  0  2  1
             0  0  0  0  2 ]

Note that the diagonal contains all the eigenvalues. Note the blocks that form this matrix:

[ 3 | 0 | 0 0 0
  --+---+------
  0 | 2 | 0 0 0
  --+---+------
  0 | 0 | 2 1 0
  0 | 0 | 0 2 1
  0 | 0 | 0 0 2 ]

All the blocks are along the diagonal. There are two 1 × 1 blocks, corresponding to the eigenvectors v1 and v2, and one 3 × 3 block for the eigenvalue 2, which is missing two eigenvectors.

In general defective matrices are similar to a matrix which is block-diagonal, having Jordan blocks on its diagonal.

2.14.2. Generalized eigenvectors. If M is a defective matrix, the matrix S which conjugates M to a Jordan normal form is constructed as follows. First choose a basis for each eigenspace Vλj; these vectors will form a linearly independent set. For each eigenvalue for which dim Vλj is smaller than the order of multiplicity of λj we add a number of independent vectors, called generalized eigenvectors, constructed as follows.

Let λ be a defective eigenvalue. Let v be an eigenvector corresponding to λ. Let

(23) x1 = v

and solve iteratively

(24) (M − λI)xk = xk−1, k = 2, 3, . . .

Note that

0 = (M − λI)v = (M − λI)x1 = (M − λI)^2 x2 = (M − λI)^3 x3 = . . .

This produces enough linearly independent vectors:


Lemma 13. Assume that the characteristic polynomial of the matrix M has the root λ with multiplicity p, but the dimension of the eigenspace Vλ equals k with k < p (this means that M has only k < p linearly independent eigenvectors associated to the eigenvalue λ).

Then x1, . . . ,xp−k+1 given by (23), (24) are linearly independent.

We will not include here the proof of the Lemma.
Example. The matrix

M = [ 1  −2   3
      1   2  −1
      0  −1   3 ]

has eigenvalues 2, 2, 2 and only one independent eigenvector v = (1, 1, 1)^T.

Let x1 = (1, 1, 1)^T. Then solving (M − 2I)x2 = x1 we obtain x2 = (1, −1, 0)^T, and solving (M − 2I)x3 = x2 we obtain x3 = (0, 1, 1)^T. For S = [x1, x2, x3] we have

(25)  S^{−1}MS = [ 2  1  0
                   0  2  1
                   0  0  2 ]
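A short numerical verification of this example (the vectors x1, x2, x3 are the ones given above):

```python
import numpy as np

M = np.array([[1.0, -2.0,  3.0],
              [1.0,  2.0, -1.0],
              [0.0, -1.0,  3.0]])

x1 = np.array([1.0, 1.0, 1.0])     # eigenvector
x2 = np.array([1.0, -1.0, 0.0])    # generalized eigenvectors from the text
x3 = np.array([0.0, 1.0, 1.0])

A = M - 2*np.eye(3)
print(np.allclose(A @ x2, x1), np.allclose(A @ x3, x2))   # True True

S = np.column_stack([x1, x2, x3])
J = np.linalg.inv(S) @ M @ S
print(np.round(J, 10))             # the Jordan form (25): [[2,1,0],[0,2,1],[0,0,2]]
```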

3. Solutions of linear differential equations with constant coefficients

In §1 we saw an example which motivated the theory of eigenvalues and eigenvectors. We can now solve in a similar way general linear first order systems of differential equations with constant coefficients:

(26)  du/dt = Mu

where M is an m × m constant matrix and u is an m-dimensional vector.

3.1. Diagonalizable matrix. Assume that M is diagonalizable: there are m independent eigenvectors v1, . . . , vm. As in §1 we find m purely exponential solutions uj(t) = e^{λj t} vj for j = 1, . . . , m, where vj is the eigenvector associated to the eigenvalue λj of M.

3.1.1. Fundamental matrix solution. Since the differential system is linear, any linear combination of solutions is again a solution:

(27)  u(t) = a1 u1(t) + . . . + am um(t) = a1 e^{λ1 t} v1 + . . . + am e^{λm t} vm,   aj arbitrary constants

The matrix U(t) = [u1(t), . . . , um(t)] is called a fundamental matrix solution. The formula (27) is

(28) u(t) = U(t)a, where a = (a1, . . . , am)T


The constants a are found so that the initial condition is satisfied: u(0) = u0 implies U(0)a = u0, therefore a = U(0)^{−1} u0 (the matrix U(0) = [v1, . . . , vm] is invertible since v1, . . . , vm are linearly independent).

In conclusion, any initial value problem has a unique solution, and (28) is the general solution of the differential system.

3.1.2. The matrix e^{Mt}. In fact it is preferable to work with a matrix of independent solutions rather than with a set of independent solutions. Note that the m × m matrix U(t) satisfies

(29)  d/dt U(t) = M U(t)

If the differential equation (29) were a scalar one (that is, if U and M were scalars), then U(t) = e^{Mt} would be a solution and U(t) = e^{Mt} a the general solution. Can we define the exponential of a matrix?

Recall that the exponential function is the sum of its power series, which converges for all x ∈ C:

e^x = 1 + x/1! + x^2/2! + . . . + x^n/n! + . . . = ∑_{n=0}^{∞} x^n/n!

We can then define for any matrix M the exponential e^{tM} using the series

(30)  e^{tM} = I + tM/1! + t^2 M^2/2! + . . . + t^n M^n/n! + . . . = ∑_{n=0}^{∞} (1/n!) t^n M^n

provided we show that the series converges. If, furthermore, we can differentiate term by term, then this matrix is a solution of (29) since

(31)  d/dt e^{tM} = d/dt ∑_{n=0}^{∞} (1/n!) t^n M^n = ∑_{n=0}^{∞} (1/n!) d/dt (t^n M^n) = ∑_{n=1}^{∞} (n/n!) t^{n−1} M^n = M e^{tM}

To establish uniform convergence of (30) the easiest way is to diagonalize M. If v1, . . . , vm are independent eigenvectors corresponding to the eigenvalues λ1, . . . , λm of M, let S = [v1, . . . , vm]. Then M = SΛS^{−1} with Λ the diagonal matrix with entries λ1, . . . , λm.

Note that

M^2 = (SΛS^{−1})^2 = SΛS^{−1}SΛS^{−1} = SΛ^2 S^{−1}

then

M^3 = M^2 M = (SΛ^2 S^{−1})(SΛS^{−1}) = SΛ^3 S^{−1}

and so on; for any power

M^n = SΛ^n S^{−1}

Then the series (30) is

(32)  e^{tM} = ∑_{n=0}^{∞} (1/n!) t^n M^n = ∑_{n=0}^{∞} (1/n!) t^n SΛ^n S^{−1} = S ( ∑_{n=0}^{∞} (1/n!) t^n Λ^n ) S^{−1}


For

(33)  Λ = [ λ1   0    . . .   0
            0    λ2   . . .   0
            ...  ...          ...
            0    0    . . .   λm ]

it is easy to see that

(34)  Λ^n = [ λ1^n   0      . . .   0
              0      λ2^n   . . .   0
              ...    ...            ...
              0      0      . . .   λm^n ]    for n = 1, 2, 3, . . .

therefore

(35)  ∑_{n=0}^{∞} (1/n!) t^n Λ^n = [ ∑_{n=0}^{∞} (1/n!) t^n λ1^n    0    . . .   0
                                     0    ∑_{n=0}^{∞} (1/n!) t^n λ2^n    . . .   0
                                     ...
                                     0    0    . . .   ∑_{n=0}^{∞} (1/n!) t^n λm^n ]

(36)                              = [ e^{tλ1}   0         . . .   0
                                      0         e^{tλ2}   . . .   0
                                      ...       ...               ...
                                      0         0         . . .   e^{tλm} ]  = e^{tΛ}

and (32) becomes

(37)  e^{tM} = S e^{tΛ} S^{−1}

which shows that the series defining the matrix e^{tM} converges uniformly for all t, and (37), (36) give a formula for its value. Furthermore, (31) shows that U(t) = e^{tM} is a solution of the differential equation (29). Obviously, U(t) = e^{tM} C is also a solution for any constant matrix C.

Note that the general vector solution of (26) is

(38)  u(t) = e^{tM} b,   with b an arbitrary constant vector

To see how the two forms of the general solution (38) and (28) are connected, write M = SΛS^{−1} in (26), and multiply by S^{−1}:

S^{−1} du/dt = Λ S^{−1} u

Denoting y = S^{−1} u, the equation becomes y′ = Λy, with general solution y(t) = e^{tΛ} a. Therefore

u(t) = S y(t) = S e^{tΛ} a

which is exactly (27). On the other hand, (38) is u(t) = S e^{tΛ} S^{−1} b, so (38) and (28) are the same for a = S^{−1} b.
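As a numerical illustration of formula (37), one can compare S e^{tΛ} S^{−1} with scipy's matrix exponential; the matrix below is the matrix (3) with the illustrative values a1 = 1, a2 = 2 used earlier:

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[-2.0, 2.0],
              [-6.0, 5.0]])            # matrix (3) with a1 = 1, a2 = 2 (illustrative)
t = 0.5

lam, S = np.linalg.eig(M)
etM_diag = S @ np.diag(np.exp(t*lam)) @ np.linalg.inv(S)   # S e^{tΛ} S^{-1}, formula (37)

print(np.allclose(expm(t*M), etM_diag))    # True
```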


Theorem 14. Fundamental facts on linear differential systems.
Let M be an n × n matrix (diagonalizable or not).
(i) The matrix differential problem

(39)  d/dt U(t) = M U(t),   U(0) = U0

has a unique solution, namely U(t) = e^{Mt} U0.
(ii) Let W(t) = det U(t). Then

(40)  W′(t) = (Tr M) W(t)

therefore

(41)  W(t) = W(0) e^{t Tr M}

(iii) If U0 is an invertible matrix, then the matrix U(t) is invertible for all t, and the columns of U(t) form an independent set of solutions of the system

(42)  du/dt = Mu

(iv) Let u1(t), . . . , un(t) be solutions of the system (42). If the vectors u1(t), . . . , un(t) are linearly independent at some t then they are linearly independent at any t.

Proof.
We give here the proof in the case when M is diagonalizable. The general case is quite similar, see §3.2.
(i) Clearly U(t) = e^{Mt} U0 is a solution, and it is unique by the general theory of differential equations: (39) is a linear system of n^2 differential equations in n^2 unknowns.
(ii) Using (37) it follows that

W(t) = det U(t) = det(S e^{tΛ} S^{−1} U0) = det e^{tΛ} det U0 = e^{t ∑_{j=1}^{n} λj} det U0 = e^{t Tr M} det U0 = e^{t Tr M} W(0)

which is (41), implying (40).
(iii), (iv) are immediate consequences of (41). □

3.2. Repeated eigenvalues. All we did for diagonalizable matrices M can be worked out for matrices which are not diagonalizable, by replacing the diagonal matrix Λ by a Jordan normal form J, see §2.14.

In formula (32) we need to understand what the exponential of a Jordan normal form is. Since such a matrix is block diagonal, its exponential will be block diagonal as well, with the exponentials of the Jordan blocks as its blocks.


3.2.1. Example: 2 × 2 blocks. For

(43)  J = [ λ  1
            0  λ ]

direct calculations give

(44)  J^2 = [ λ^2  2λ
              0    λ^2 ],   J^3 = [ λ^3  3λ^2
                                    0    λ^3 ],   . . . ,   J^k = [ λ^k  k λ^{k−1}
                                                                    0    λ^k ],   . . .

and then

(45)  e^{tJ} = ∑_{k=0}^{∞} (1/k!) t^k J^k = [ e^{tλ}  t e^{tλ}
                                              0       e^{tλ} ]

3.2.2. Example: 3 × 3 blocks. For

(46)  J = [ λ  1  0
            0  λ  1
            0  0  λ ]

direct calculations give

J^2 = [ λ^2  2λ   1
        0    λ^2  2λ
        0    0    λ^2 ],   J^3 = [ λ^3  3λ^2  3λ
                                   0    λ^3   3λ^2
                                   0    0     λ^3 ],   J^4 = [ λ^4  4λ^3  6λ^2
                                                               0    λ^4   4λ^3
                                                               0    0     λ^4 ]

Higher powers can be calculated by induction; it is clear that

(47)  J^k = [ λ^k   k λ^{k−1}   (k(k−1)/2) λ^{k−2}
              0     λ^k         k λ^{k−1}
              0     0           λ^k ]

Then

(48)  e^{tJ} = ∑_{k=0}^{∞} (1/k!) t^k J^k = [ e^{tλ}  t e^{tλ}  (1/2) t^2 e^{tλ}
                                              0       e^{tλ}    t e^{tλ}
                                              0       0         e^{tλ} ]

In general, if S is the matrix whose columns are an independent set of eigenvectors and generalized eigenvectors of M such that M = SJS^{−1} where J is a Jordan normal form, then

(49)  e^{tM} = S e^{tJ} S^{−1}

is a fundamental matrix solution of (26), and the general solution of (26) is

u(t) = e^{tM} b


3.2.3. General solution for 2 × 2 Jordan blocks. Let M be a 2 × 2 matrix, with eigenvalues λ, λ and only one independent eigenvector v. Find a generalized eigenvector: let x1 = v and find x2 a solution of (M − λI)x2 = x1 (note that this means that Mx2 = x1 + λx2). If S = [x1, x2] then S^{−1}MS is the Jordan block (43) (check!).

Then, as in the diagonalizable case, the general vector solution is

u(t) = e^{tM} b = S e^{tJ} S^{−1} b = S e^{tJ} a = a1 e^{λt} x1 + a2 (t e^{λt} x1 + e^{λt} x2)

3.2.4. General solution for 3 × 3 Jordan blocks. Let M be a 3 × 3 matrix, with eigenvalues λ, λ, λ and only one independent eigenvector v. We need two generalized eigenvectors: let x1 = v, find x2 a solution of (M − λI)x2 = x1, then x3 a solution of (M − λI)x3 = x2 (note that this means that Mx2 = x1 + λx2, and Mx3 = x2 + λx3). If S = [x1, x2, x3] then S^{−1}MS is the Jordan block (46) (check!).

Then the general vector solution is

u(t) = e^{tM} b = S e^{tJ} S^{−1} b = S e^{tJ} a
     = a1 e^{λt} x1 + a2 (t e^{λt} x1 + e^{λt} x2) + a3 ((1/2) t^2 e^{λt} x1 + t e^{λt} x2 + e^{λt} x3)

Note the practical prescription: if λ is a repeated eigenvalue of multiplicity p then we need to look for solutions of the type u(t) = e^{λt} q(t) where q(t) are polynomials in t of degree at most p − 1.

3.3. Higher order linear differential equations; companion matrix. Consider scalar linear differential equations, with constant coefficients, of degree n:

(50)  y^{(n)} + a_{n−1} y^{(n−1)} + . . . + a1 y′ + a0 y = 0

where y(t) is a scalar function and a0, . . . , a_{n−1} are constants.
Such equations can be transformed into linear systems of first order equations. The substitution:

(51)  u1 = y,  u2 = y′,  . . . ,  un = y^{(n−1)}

transforms (50) into the system

(52)  u′ = Mu,   where M = [ 0     1     0     . . .   0
                             0     0     1     . . .   0
                             ...                       ...
                             0     0     0     . . .   1
                             −a0   −a1   −a2   . . .   −a_{n−1} ]

The matrix M is called the companion matrix of the differential equation (50).


To find its eigenvalues an easy method is to search for λ so that the linear system Mx = λx has a solution x ≠ 0:

x2 = λx1,  x3 = λx2,  . . . ,  xn = λx_{n−1},  −a0 x1 − a1 x2 − . . . − a_{n−1} xn = λ xn

which implies that

(53)  λ^n + a_{n−1} λ^{n−1} + . . . + a1 λ + a0 = 0

which is the characteristic equation of M. Note that we can obtain this equation by searching for solutions of (50) which are purely exponential: y(t) = e^{λt}.

Using the results obtained for first order linear systems, and looking just at the first component u1(t) (which is y(t)) of the vector u(t), we find the following:

(i) if the characteristic equation (53) has n distinct solutions λ1, . . . , λn then the general solution is a linear combination of purely exponential solutions

y(t) = a1 e^{λ1 t} + . . . + an e^{λn t}

(ii) if λj is a repeated eigenvalue of multiplicity pj then there are pj independent solutions of the type e^{λj t} q(t) where q(t) are polynomials in t of degree at most pj − 1; therefore they can be taken to be e^{λj t}, t e^{λj t}, . . . , t^{pj − 1} e^{λj t}.
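The following sketch builds the companion matrix (52) for an illustrative third order equation y''' − 2y'' − 5y' + 6y = 0 and confirms that its eigenvalues are the roots of (53):

```python
import numpy as np

# y''' - 2y'' - 5y' + 6y = 0, i.e. a2 = -2, a1 = -5, a0 = 6 (illustrative)
a = [6.0, -5.0, -2.0]                    # a0, a1, a2
n = len(a)
M = np.zeros((n, n))
M[:-1, 1:] = np.eye(n - 1)               # the shifted-identity part of (52)
M[-1, :] = [-c for c in a]               # last row: -a0, -a1, -a2

# Eigenvalues of M = roots of λ^3 + a2 λ^2 + a1 λ + a0 = 0, i.e. (53)
print(np.sort(np.linalg.eigvals(M)))              # [-2.  1.  3.]
print(np.sort(np.roots([1.0, a[2], a[1], a[0]])))  # same roots
```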

3.3.1. The Wronskian. Let y1(t), . . . , yn(t) be n solutions of the equation (50). The substitution (51) produces n solutions u1, . . . , un of the system (52). Let U(t) = [u1(t), . . . , un(t)] be the matrix solution of (52). Theorem 14 applied to the companion system yields:

Theorem 15. Let y1(t), . . . , yn(t) be solutions of equation (50).
(i) Their Wronskian

W(t) = | y1(t)           . . .   yn(t)          |
       | y1′(t)          . . .   yn′(t)         |
       | ...                     ...            |
       | y1^{(n−1)}(t)   . . .   yn^{(n−1)}(t)  |

satisfies

W(t) = e^{−t a_{n−1}} W(0)

(ii) y1(t), . . . , yn(t) are linearly independent if and only if their Wronskian is not zero.

3.3.2. Decomplexification. Suppose aj ∈ R and there are nonreal eigenvalues, say λ1,2 = α1 ± iβ1 (recall that they come in pairs of conjugate ones). Then there are two independent solutions y1,2(t) = e^{t(α1 ± iβ1)} = e^{tα1}(cos(tβ1) ± i sin(tβ1)). If real valued solutions are needed (or desired), note that Sp(y1, y2) = Sp(yc, ys) where

yc(t) = e^{tα1} cos(tβ1),   ys(t) = e^{tα1} sin(tβ1)

and ys, yc are two independent solutions (real valued).


Furthermore, any solution in Sp(yc, ys) can be written as

(54)  C1 e^{tα1} cos(tβ1) + C2 e^{tα1} sin(tβ1) = A e^{tα1} sin(tβ1 + B)

where

A = √(C1^2 + C2^2)

and B is the unique angle in [0, 2π) so that

cos B = C2 / √(C1^2 + C2^2),   sin B = C1 / √(C1^2 + C2^2)

3.4. Stability in differential equations. A linear, first order system of differential equations

(55)  du/dt = Mu

always has a particular solution, the zero solution: u(t) = 0 for all t. The point 0 is called an equilibrium point of the system (55). More generally,

Definition 16. An equilibrium point of a differential equation u′ = f(u) is a point u0 for which the constant function u(t) = u0 is a solution, therefore f(u0) = 0.

It is important in applications to know how solutions behave near an equilibrium point.

An equilibrium point u0 is called stable if any solution which starts close enough to u0 remains close to u0 for all t > 0. (This definition can be made more mathematically precise, but it will not be needed here, and it is beyond the scope of these lectures.)

Definition 17. An equilibrium point u0 is called asymptotically stable if

lim_{t→∞} u(t) = u0   for any solution u(t)

It is clear that an asymptotically stable point is stable, but the converse is not necessarily true.

An equilibrium point which is not stable is called unstable.
We can easily characterize the nature of the equilibrium point u0 = 0 of linear differential equations in terms of the eigenvalues of the matrix M.
We saw that solutions of a linear system (55) are linear combinations of exponentials e^{tλj} where λj are the eigenvalues of the matrix M, and, if M is not diagonalizable, also of t^k e^{tλj} for 0 < k ≤ (multiplicity of λj) − 1.
Recall that

lim_{t→∞} t^k e^{tλj} = 0   if and only if   Re λj < 0

Therefore:
(i) if all λj have negative real parts, then any solution u(t) of (55) converges to zero: lim_{t→∞} u(t) = 0, and 0 is asymptotically stable.


(ii) If all Re λj ≤ 0, some real parts are zero, and the eigenvalues with zero real part have the dimension of the eigenspace equal to the multiplicity of the eigenvalue (see footnote 1), then 0 is stable.

(iii) If any eigenvalue has a positive real part, then 0 is unstable.
As examples, let us consider 2 by 2 systems with real coefficients.
Example 1: an asymptotically stable case, with all eigenvalues real.
For

M = [ −5  −2
      −1  −4 ],   M = SΛS^{−1}, with Λ = [ −3   0
                                            0  −6 ],   S = [  1  2
                                                              −1  1 ]

The figure shows the field plot (a representation of the linear transformation x → Mx of R^2). The trajectories are tangent to the line field, and they are clearly going towards the origin. Along the directions of the two eigenvectors of M, Mx is parallel to x (seen here from the fact that the arrow points towards O). The point 0 is a hyperbolic equilibrium point.

Footnote 1: This means that if Re λj = 0 then there are no solutions q(t) e^{tλj} with nonconstant q(t).


Example 2: an asymptotically stable case, with nonreal eigenvalues.
For

M = [ −1  −2
       1  −3 ]   with Λ = [ −2 + i     0
                             0      −2 − i ],   S = [ 1 + i   1 − i
                                                      1        1    ]

The figure shows the field plot. The trajectories are tangent to the line field, and they are clearly going towards the origin, though rotating around it. The equilibrium point 0 is hyperbolic.


Example 3: an unstable case, with one negative eigenvalue and a positive one.
For

M = [ 3  6
      3  0 ]   with Λ = [ −3  0
                           0  6 ],   S = [  1  2
                                           −1  1 ]

The figure shows the field plot. Note that there is a stable direction (in the direction of the eigenvector corresponding to the negative eigenvalue), and an unstable one (the direction of the second eigenvector, corresponding to the positive eigenvalue). The equilibrium point 0 is a saddle point.


Example 4: an unstable point, with both eigenvalues positive.
For

M = [ 5  2
      1  4 ]   with Λ = [ 3  0
                          0  6 ],   S = [  1  2
                                          −1  1 ]

The figure shows the field plot. Note that the arrows have directions opposite to those in Example 1. The trajectories go away from the origin.


Example 5: the equilibrium point 0 is stable, not asymptotically stable.
For

M = [ 1  −2
      1  −1 ]   with Λ = [ i    0
                           0   −i ],   S = [ 1 + i   1 − i
                                             1        1    ]

The trajectories rotate around the origin on ellipses, with axes determined by the real part and the imaginary part of the eigenvectors.

4. Difference equations (Discrete dynamical systems)

4.1. Linear difference equations with constant coefficients. A first order difference equation, linear, homogeneous, with constant coefficients, has the form

(56) xk+1 = Mxk

where M is an n × n matrix, and xk are n-dimensional vectors. Given an initial condition x0 the solution of (56) is uniquely determined: x1 = Mx0, then we can determine x2 = Mx1, then x3 = Mx2, etc. Clearly the solution of (56) with the initial condition x0 is

(57)  xk = M^k x0


A second order difference equation, linear, homogeneous, with constant coefficients, has the form

(58) xk+2 = M1xk+1 +M0xk

A solution of (58) is uniquely determined if we give two initial conditions, x0 and x1. Then we can find x2 = M1 x1 + M0 x0, then x3 = M1 x2 + M0 x1, etc.

yk be the 2n dimensional vector

yk =

[xk

xk+1

]Then yk satisfies the recurrence

yk+1 = Myk where M =

[0 IM0 M1

]which is of the type (56), and has a unique solution if y0 is given.

More generally, a difference equation of order p which is linear, homogeneous, with constant coefficients, has the form

(59)  x_{k+p} = M_{p−1} x_{k+p−1} + . . . + M1 x_{k+1} + M0 xk

which has a unique solution if the initial p values are specified: x0, x1, . . . , x_{p−1}. The recurrence (59) can be reduced to a first order one for a vector of dimension np.

To understand the solutions of the linear difference equations it then suffices to study the first order ones, (56).

4.2. Solutions of linear difference equations. Consider the equation (56). If M has n independent eigenvectors v1, . . . , vn (i.e. M is diagonalizable), let S = [v1, . . . , vn]; then M = SΛS^{−1} with Λ the diagonal matrix with entries λ1, . . . , λn.

xk = Mkx0 = SΛkS−1x0

and denoting S−1x0 = b,

xk = SΛkb = b1λk1v1 + . . .+ bnλ

knvn

hence solutions xk are linear combinations of λj^k multiples of the eigenvectors vj.
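A small numerical illustration of this solution formula, with an arbitrary 2 × 2 matrix:

```python
import numpy as np

M = np.array([[0.5, 0.3],
              [0.2, 0.4]])              # illustrative matrix
x0 = np.array([1.0, 2.0])
k = 20

# Direct iteration of (56)
xk = x0.copy()
for _ in range(k):
    xk = M @ xk

# Via diagonalization: x_k = S Λ^k S^{-1} x_0
lam, S = np.linalg.eig(M)
xk_diag = S @ np.diag(lam**k) @ np.linalg.inv(S) @ x0

print(np.allclose(xk, xk_diag))          # True
```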

If M is not diagonalizable, just as in the case of differential equations, then consider a matrix S so that S^{−1}MS is in Jordan normal form.

Consider the example of a 2 × 2 Jordan block: M = SJS^{−1} with J given by (43). Let S = [y1, y2] where y1 is the eigenvector of M corresponding to the eigenvalue λ and y2 is a generalized eigenvector (as in §3.2.3, only with a different notation). Using (44) we obtain the general solution

xk = [y1, y2] [ λ^k   k λ^{k−1}
                0     λ^k       ] [ b1
                                    b2 ]  = b1 λ^k y1 + b2 (k λ^{k−1} y1 + λ^k y2)

and the recurrence has two linearly independent solutions of the form λ^k y1 and q(k) λ^k where q(k) is a polynomial in k of degree one.
In a similar way, for p × p Jordan blocks there are p linearly independent solutions of the form q(k) λ^k where q(k) are polynomials in k of degree at most p − 1, one of them being constant, equal to the eigenvector.

4.3. Stability. Clearly the constant zero sequence xk = 0 is a solution of any linear homogeneous discrete equation (56): 0 is an equilibrium point (or a steady state).

As in the case of differential equations, an equilibrium point of a difference equation is called asymptotically stable, or an attractor, if solutions starting close enough to the equilibrium point converge towards it.

For linear difference equations this means that lim_{k→∞} xk = 0 for all solutions xk. This clearly happens if and only if all the eigenvalues λj of M satisfy |λj| < 1.

If all eigenvalues have either |λj| < 1 or |λj| = 1, and for the eigenvalues of modulus 1 the dimension of the eigenspace equals the multiplicity of the eigenvalue, then 0 is a stable point (or neutral).

In all other cases the equilibrium point is called unstable.

4.4. Why does the theory of discrete equations parallel that of differential equations?

An example: compounding interestIf a principal P0 is deposited in a bank, earning an interest of r% com-

pounded anually, after the first year, the amount in the account is P1 =(1 + r/100)P0, after the second year is P2 = (1 + r/100)P1, and so on: afterthe nth year the amount is

(60)  Pn = (1 + r/100) P_{n−1},   therefore   Pn = (1 + r/100)^n P0

Now assume the same principal P0 is deposited in a bank earning the same interest of r% per year, compounded semiannually. This means that every half year an r/2% interest is added. After the first year, the amount in the account is P^{[2]}_1 = (1 + r/200)^2 P0, after the second year it is P^{[2]}_2 = (1 + r/200)^2 P^{[2]}_1, and so on: after the nth year the amount is

(61)  P^{[2]}_n = (1 + r/200)^2 P^{[2]}_{n−1},   therefore   P^{[2]}_n = (1 + r/200)^{2n} P0

Suppose now that the principal P0 is deposited in a bank earning r% interest per year, but compounded more often, k times a year. After the first year, the amount in the account is P^{[k]}_1 = (1 + r/(100k))^k P0, and so on: after the nth year the amount is

(62)  P^{[k]}_n = (1 + r/(100k))^k P^{[k]}_{n−1}

therefore

(63)  P^{[k]}_n = (1 + r/(100k))^{nk} P0

Now assume the interest is compounded more and more often, continuously. This means k → ∞, and taking this limit in (63) and using the well known fact that

lim_{x→0} (1 + x)^{1/x} = e,   therefore   lim_{x→0} (1 + cx)^{1/x} = e^c,

we obtain

(64)  P^{[∞]}_n = e^{nr/100} P0

Relations (60), (61), (62) are difference equations, while (64) satisfies a differential equation:

d/dn P^{[∞]}_n = (r/100) P^{[∞]}_n

More generally:
Suppose a function y(t) satisfies a differential equation; take for simplicity y(t) scalar, and a linear, first order equation

(65)  dy/dt = a y(t),   y(0) = y0,   with solution y(t) = e^{at} y0

Let us consider a discretization of t: let h be small, and consider only the values of t: tn = nh. Using the approximation

(66)  y(t_{n+1}) = y(tn) + y′(tn) h + O(h^2)

then

h y′(tn) ≈ y(t_{n+1}) − y(tn)

(read: the shift by h equals, to a first approximation, h d/dt), which used in (65) gives the difference equation

(67)  y(t_{n+1}) − y(tn) = a h y(tn),   y(0) = y0,   with solution y(tn) = (1 + ah)^n y0

The values y(tn) are approximations of y(t), and in the limit h → 0 they equal y(t):

y(tn) = (1 + ah)^{tn/h} y0 → e^{a tn} y0   as h → 0

More abstractly:
It appears that the argument above relies on a (limit) connection between the shift operator (which takes a function y(t) to its shift y(t + h)) and the differentiation operator (taking a function y(t) to its derivative d/dt y(t)). Let us explore this connection.


Writing the whole Taylor series expansion in (66):

y(t + h) = y(t) + h dy/dt + (1/2!) h^2 d^2y/dt^2 + . . . + (1/n!) h^n d^ny/dt^n + . . .
         = ( 1 + h d/dt + (1/2!) h^2 d^2/dt^2 + . . . + (1/n!) h^n d^n/dt^n + . . . ) y(t) = e^{h d/dt} y(t)

therefore

y(t + h) = e^{h d/dt} y(t)

The difference operator ∆h defined as (∆h y)(t) = y(t + h) − y(t) satisfies

∆h = e^{h d/dt} − 1

(of course, justification is needed, and it is done using operator theory).

4.5. Example: Fibonacci numbers.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .

this is one of the most famous sequences of numbers, studied for more than two millennia (it first appeared in ancient Indian mathematics), which describes countless phenomena in nature, in art and in sciences.

The Fibonacci numbers are defined by the recurrence relation

(68) Fk+2 = Fk+1 + Fk

with the initial conditions F0 = 0, F1 = 1.
Denoting

xk = [ Fk
       Fk+1 ]

then xk satisfies the recurrence xk+1 = Mxk where M is the companion matrix

M = [ 0  1
      1  1 ]

As in the case of differential equations, the characteristic equation can be found either using the matrix M, or by directly substituting Fk = λ^k into the recurrence (68), yielding λ^2 = λ + 1.

The eigenvalues are

λ1 = (1 + √5)/2 = φ = the golden ratio,   λ2 = (1 − √5)/2

Fk is a linear combination of λ1^k and λ2^k: Fk = c1 λ1^k + c2 λ2^k. The values of c1, c2 can be found from the initial conditions: c1 = 1/√5, c2 = −1/√5.
Note that the ratio of two consecutive Fibonacci numbers converges to the golden ratio:

lim_{k→∞} F_{k+1}/Fk = φ
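A short numerical sketch of the formula Fk = c1 λ1^k + c2 λ2^k with c1 = 1/√5 = −c2 (Binet's formula), compared against the recurrence (68):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2                # λ1, the golden ratio
psi = (1 - np.sqrt(5)) / 2                # λ2

def fib_formula(k):
    # c1 = 1/sqrt(5), c2 = -1/sqrt(5) follow from F0 = 0, F1 = 1
    return (phi**k - psi**k) / np.sqrt(5)

F = [0, 1]
for _ in range(18):
    F.append(F[-1] + F[-2])               # the recurrence (68)

print(all(round(fib_formula(k)) == F[k] for k in range(20)))   # True
print(F[11] / F[10], phi)                 # the ratio approaches the golden ratio
```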


4.6. Positive matrices.

Definition 18. A positive matrix is a square matrix whose entries areall positive numbers.

Caution: this is not to be confused with positive definite self adjointmatrices, which will be studied later.

Positive matrices have countless applications and very special properties.Notations.

x ≥ 0 denotes a vector with all components xj ≥ 0

x > 0 denotes a vector with all components xj > 0

Theorem 19. Perron-Frobenius TheoremLet P be a positive matrix: P = [Pij ]i,j=1,...,n, Pij > 0.P has a dominant eigenvalue (or, Perron root, or Perron-Frobenius eigen-

value) r(P ) = λ1 with the following properties:(i) λ1 > 0 and the associated eigenvector v1 is positive: v1 > 0.(ii) λ1 is a simple eigenvalue.(iii) All other eigenvalues have smaller modulus: if |λj | < λ1 for all

eigenvalues λj of P , j > 1.(iv) All other eigenvectors of P are not nonnegative, vj 6≥ 0 (they have

have at least one negative or nonreal entry).(v) λ1 satisfies the following maximin property: λ1 = maxT where

T = {t ≥ 0 |Px ≥ tx, for some x ≥ 0, x 6= 0}

(v’) λ1 satisfies the following minimax property: λ1 = minS where

S = {t ≥ 0 |Px ≤ tx, for all x ≥ 0}

(vi) Also

mini

∑j

Pij ≤ λ1 ≤ maxi

∑j

Pij

The proof of the Perron theorem will not be given here.

4.7. Markov chains. Markov processes model random chains of events whose likelihood depends on, and only on, what happened last.

4.7.1. Example. Suppose that it was found that every year 1% of the US population living in coastal areas moves inland, and 2% of the US population living inland moves to coastal areas (see footnote 2). Denote by xk and yk the number of people living in coastal areas, respectively inland, at year k. We are interested to understand how the population distribution among these areas evolves in the future.

Footnote 2: These are not real figures. Unfortunately, I could not find real data on this topic.


Assuming the US population remains the same, in the year k + 1 we find that xk+1 = .99 xk + .02 yk and yk+1 = .01 xk + .98 yk, or

(69) xk+1 = Mxk

where

xk = [ xk
       yk ],   M = [ .99  .02
                     .01  .98 ]

Relation (69) modeling our process is a first order difference equation.

Note that the entries of the matrix M are nonnegative (they represent a percentage, or a probability), and that its columns add up to 1, since the whole population is subject to the process: any person of the US population is in one of the two regions.

Question: what happens in the long run, as k → ∞? Would the whole population eventually move to coastal areas?

To find the solution xk of (69) we need the eigenvalues and eigenvectors of M: it is easily calculated that there is one eigenvalue equal to 1, corresponding to v1 = (2, 1)^T, and an eigenvalue .97, corresponding to v2 = (−1, 1)^T. (Note that M is a positive matrix, and the Perron-Frobenius Theorem applies: the dominant eigenvalue is 1, and its eigenvector has positive components, while the other eigenvector has components of both signs.)

Then

xk = c1 v1 + c2 (.97)^k v2

and

x∞ = lim_{k→∞} xk = c1 v1

The limit is an eigenvector corresponding to the eigenvalue 1!
In fact this is not a big surprise if we reason as follows: assuming that xk converges (which is not guaranteed without information on the eigenvalues of M), then taking the limit k → ∞ in the recurrence relation (69) we find that x∞ = Mx∞, hence the limit x∞ is an eigenvector of M corresponding to the eigenvalue 1, or the limit is 0, which is excluded by the interpretation that xk + yk = const = the total population.

Note: all the eigenvectors corresponding to the eigenvalue 1 are steady-states: if the initial population distribution was x0 = a v1 then the population distribution remains the same: xk = x0 for all k (since Mv1 = v1).

Exercise. What is the type of the equilibrium points c1 v1 (asymptotically stable, stable, unstable)?

In conclusion, in the long run the population becomes distributed with twice as many people living in coastal areas as inland.
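A minimal simulation of this Markov process, with an illustrative initial split of 150 (millions) in each region:

```python
import numpy as np

M = np.array([[0.99, 0.02],
              [0.01, 0.98]])
x = np.array([150.0, 150.0])              # illustrative initial populations (coastal, inland)

for _ in range(500):
    x = M @ x                             # iterate (69)

print(x)                                  # approaches a multiple of v1 = (2, 1)^T
print(np.allclose(x[0] / x[1], 2.0))      # True: twice as many people in coastal areas
print(np.allclose(M @ x, x))              # the limit is (numerically) a steady state
```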

4.7.2. Markov matrices. More generally, a Markov process is governed by an equation (69) where the matrix M has two properties, summarized as follows.


Definition 20. An n × n matrix M = [Mij] is called a Markov matrix (or a stochastic matrix) if:
(i) all Mij ≥ 0, and
(ii) each column adds up to 1: ∑_i Mij = 1.

Theorem 21. Properties of Markov matrices.
If M is a Markov matrix then:
(i) λ = 1 is an eigenvalue.
(ii) All the eigenvalues satisfy |λj| ≤ 1. If all the entries of M are positive, then |λj| < 1 for j > 1.
(iii) If for some k all the entries of M^k are positive, then λ1 = 1 has multiplicity 1 and all the other eigenvalues satisfy |λj| < 1 for j = 2, . . . , n.

Proof.
(i) The matrix M − I is not invertible: since all the columns of M add up to 1, all the columns of M − I add up to zero, so the rows of M − I sum to the zero row and are therefore linearly dependent. Hence det(M − I) = 0 and 1 is an eigenvalue.
(ii) and (iii) follow from the Perron-Frobenius Theorem 19. □

Note that for general Markov matrices all eigenvectors corresponding to the eigenvalue 1 are steady states.
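A quick numerical illustration of Theorem 21, assuming NumPy; the random column-stochastic matrix below is just one way to generate a test case:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((5, 5))          # nonnegative entries
    M = A / A.sum(axis=0)           # divide each column by its sum, so columns sum to 1

    eigvals = np.linalg.eigvals(M)
    print(np.max(np.abs(eigvals)))       # 1 (up to rounding): lambda = 1 is an eigenvalue
    print(np.abs(eigvals) <= 1 + 1e-12)  # all True: every eigenvalue has modulus <= 1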

5. More functional calculus

5.1. Functional calculus for diagonalizable matrices. Let M be a square n × n matrix, assumed diagonalizable: it has n independent eigenvectors v_1, . . . , v_n corresponding to the eigenvalues λ_1, . . . , λ_n, and if S = [v_1, . . . , v_n] then S^{-1} M S = Λ, a diagonal matrix with λ_1, . . . , λ_n on its diagonal.

5.1.1. Polynomials of M. We looked at positive integer powers of M, and we saw that M^k = S Λ^k S^{-1}, where the power k is applied to each diagonal entry of Λ. To be consistent we clearly need to define M^0 = I.

Recall that M is invertible if and only if all its eigenvalues are nonzero. Assume this is the case. Then we can easily check that M^{-1} = S Λ^{-1} S^{-1}, where Λ^{-1} is the diagonal matrix with λ_j^{-1} on the diagonal. We can then define any negative integer power of M.

If p(t) = a_n t^n + . . . + a_1 t + a_0 is a polynomial in t, we can easily define

    p(M) = a_n M^n + . . . + a_1 M + a_0 I = S p(Λ) S^{-1}

where p(Λ) is the diagonal matrix with p(λ_j) on the diagonal.
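As an illustration (not part of the construction above), the two ways of computing p(M) can be compared numerically; a sketch assuming NumPy, with an arbitrary diagonalizable 2 × 2 matrix and p(t) = t^2 − 3t + 1:

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                  # diagonalizable (symmetric), chosen arbitrarily
    lam, S = np.linalg.eig(M)                   # columns of S are eigenvectors

    # p(M) evaluated directly ...
    p_direct = M @ M - 3*M + np.eye(2)

    # ... and via S p(Lambda) S^{-1}, applying p to each eigenvalue.
    p_diag = S @ np.diag(lam**2 - 3*lam + 1) @ np.linalg.inv(S)

    print(np.allclose(p_direct, p_diag))        # True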

5.1.2. The exponential e^M. We defined the exponential e^M using its Taylor series

    e^M = ∑_{k=0}^{∞} (1/k!) M^k

and e^M = S e^Λ S^{-1}, where e^Λ is the diagonal matrix with e^{λ_j} on the diagonal.
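A similar numerical check for the exponential, assuming NumPy and SciPy (scipy.linalg.expm computes e^M directly); the matrix below is an arbitrary diagonalizable example:

    import numpy as np
    from scipy.linalg import expm

    M = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])    # diagonalizable, eigenvalues -1 and -2
    lam, S = np.linalg.eig(M)

    # e^M via the eigendecomposition: S e^Lambda S^{-1}
    expM_diag = S @ np.diag(np.exp(lam)) @ np.linalg.inv(S)
    print(np.allclose(expm(M), expM_diag))   # True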


5.1.3. The resolvent. For which numbers z ∈ C does the matrix zI − M have an inverse, and what are its eigenvalues? Clearly the matrix zI − M is invertible for all z which differ from the eigenvalues of M (in the infinite dimensional case things are not quite so simple).

The matrix valued function R(z) = (zI − M)^{-1}, defined for all z ≠ λ_1, . . . , λ_n, is called the resolvent of M. The resolvent has many uses, and is particularly useful in infinite dimensions.

If M is diagonalizable then R(z) = (zI − M)^{-1} = S (zI − Λ)^{-1} S^{-1}, where (zI − Λ)^{-1} is the diagonal matrix with (z − λ_j)^{-1} on the diagonal.
Here is another formula, obtained for z = 1 and very useful in the infinite dimensional case: if M is diagonalizable, with all the eigenvalues satisfying |λ_j| < 1, then

(70)    (I − M)^{-1} = I + M + M^2 + M^3 + . . .

which follows from the fact that

    1/(1 − λ_j) = 1 + λ_j + λ_j^2 + λ_j^3 + . . .    if |λ_j| < 1

The resolvent is extremely useful for nondiagonalizable cases as well. In infinite dimensions, the numbers z for which the resolvent does not exist form the spectrum of the linear transformation (they may or may not be eigenvalues), and they play the role of the eigenvalues in finite dimensions.

Returning to finite dimensions, if M is not diagonalizable, formula (70) may not hold (see for example a 2-dimensional Jordan block with zero eigenvalue). We will see that (70) is true for matrices of norm less than 1; this is a good motivation for introducing the notion of the norm of a matrix later on.
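As a numerical illustration of (70), here is a sketch assuming NumPy; the matrix below is an arbitrary one with spectral radius well below 1:

    import numpy as np

    M = np.array([[0.2, 0.5],
                  [0.1, 0.3]])                   # eigenvalues well inside the unit disk

    inv_direct = np.linalg.inv(np.eye(2) - M)

    # Partial sums I + M + M^2 + ... + M^K
    term, total = np.eye(2), np.eye(2)
    for _ in range(200):
        term = term @ M
        total = total + term

    print(np.allclose(inv_direct, total))        # True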

5.1.4. The square root of M. Given the diagonalizable matrix M, can we find matrices R so that R^2 = M, and what are they?

Using the diagonal form of M we have R^2 = S Λ S^{-1}, which is equivalent to S^{-1} R^2 S = Λ and therefore (S^{-1} R S)^2 = Λ.

Assuming that S^{-1} R S is diagonal, then S^{-1} R S = diag(±λ_1^{1/2}, . . . , ±λ_n^{1/2}) and therefore

(71)    R = S diag(σ_1 λ_1^{1/2}, . . . , σ_n λ_n^{1/2}) S^{-1},    σ_j ∈ {1, −1}

There are 2^n such matrices!
But there are also matrices with S^{-1} R S not diagonal. Take for example M = I, and find all the matrices R with R^2 = I. Then besides the four diagonal solutions

    R = [ σ_1    0  ]
        [  0    σ_2 ],     σ_1, σ_2 ∈ {1, −1}


there is the two-parameter family of solutions

    R = [ ±√(1 − ab)         a       ]
        [      b        ∓√(1 − ab)   ]

Some of these matrices have nonreal entries!
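It is easy to verify numerically that every member of this family squares to the identity; a sketch assuming NumPy (the parameter values a, b below are arbitrary, and the complex square root is used so that ab > 1 is allowed):

    import numpy as np

    def R(a, b, sign=1.0):
        # One member of the two-parameter family of square roots of I.
        c = sign * np.sqrt(1 - a*b + 0j)    # +0j: complex sqrt, so ab > 1 gives nonreal entries
        return np.array([[c, a],
                         [b, -c]])

    for a, b in [(0.5, 1.0), (2.0, 3.0)]:   # the second pair yields nonreal entries
        print(np.allclose(R(a, b) @ R(a, b), np.eye(2)))   # True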

5.1.5. Functional calculus for diagonalizable matrices. What other functions of M can we define? If M is diagonalizable it seems that given a function f(t) we can define f(M) provided that all f(λ_j) are defined (a careful construction is needed).

Diagonalizable matrices are thus very "user friendly". Later on we will see that there is a quick test to determine which matrices can be diagonalized by a unitary matrix (that is, which admit an orthonormal basis of eigenvectors), and which cannot. It will be proved that this is the case if and only if M commutes with its adjoint: MM* = M*M. Such matrices are called normal, and this is the gateway to generalizing functional calculus to linear operators in infinite dimensions.

5.1.6. Working with Jordan blocks. The calculations done for 2- and 3-dimensional Jordan blocks in §3.2 can be done in a tidy way for the general n × n blocks using functional calculus.

First note that any n × n Jordan block with eigenvalue λ can be written as

    J = λI + N

where N is a matrix whose only nonzero entries are 1's just above the diagonal. A short calculation shows that the only nonzero entries of N^2 are a sequence of 1's at distance two above the diagonal, and so on: each additional power of N pushes the slanted line of 1's toward the upper right corner. Eventually N^n = 0. For example, in dimension four:

    N = [ 0 1 0 0 ]     N^2 = [ 0 0 1 0 ]     N^3 = [ 0 0 0 1 ]     N^4 = 0
        [ 0 0 1 0 ]           [ 0 0 0 1 ]           [ 0 0 0 0 ]
        [ 0 0 0 1 ]           [ 0 0 0 0 ]           [ 0 0 0 0 ]
        [ 0 0 0 0 ]           [ 0 0 0 0 ]           [ 0 0 0 0 ]

Since I and N commute we can use the binomial formula, which gives

    J^k = (λI + N)^k = ∑_{j=0}^{k} C(k, j) λ^{k−j} N^j

where C(k, j) are the binomial coefficients; for k > n − 2 this equals ∑_{j=0}^{n−1} C(k, j) λ^{k−j} N^j. See (44), (47) for n = 2 and n = 3.

Also, because I and N commute,

    e^J = e^{λI+N} = e^{λI} e^N = e^λ ∑_{k=0}^{n−1} (1/k!) N^k

See (45), (48) for n = 2 and n = 3.
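These formulas can be checked numerically on a small Jordan block; a sketch assuming NumPy and SciPy, with n = 4 and an arbitrary eigenvalue λ = 0.5:

    import numpy as np
    from math import comb, factorial
    from scipy.linalg import expm

    n, lam, k = 4, 0.5, 7
    N = np.diag(np.ones(n - 1), 1)          # 1's just above the diagonal; N^n = 0
    J = lam * np.eye(n) + N

    # J^k via the binomial formula, truncated at j = n-1 since N^j = 0 for j >= n
    Jk = sum(comb(k, j) * lam**(k - j) * np.linalg.matrix_power(N, j)
             for j in range(n))
    print(np.allclose(Jk, np.linalg.matrix_power(J, k)))   # True

    # e^J = e^lambda * sum_{j=0}^{n-1} N^j / j!
    eJ = np.exp(lam) * sum(np.linalg.matrix_power(N, j) / factorial(j)
                           for j in range(n))
    print(np.allclose(eJ, expm(J)))                        # True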


Exercise. What is (I − J)^{-1} for J an n × n Jordan block with eigenvalue λ?

5.1.7. The Cayley-Hamilton Theorem. Here is a beautiful fact:

Theorem 22. The Cayley-Hamilton Theorem. Let M be a square matrix, and p(λ) = det(M − λI) be its characteristic polynomial.
Then p(M) = 0.

Note that if M is n × n then it follows in particular that M^n is a linear combination of the earlier powers I, M, M^2, . . . , M^{n−1}.

Proof of the Cayley-Hamilton Theorem.
Assume first that M is diagonalizable: M = S Λ S^{-1}. Then p(M) = p(S Λ S^{-1}) = S p(Λ) S^{-1}, where p(Λ) is the diagonal matrix having p(λ_j) on the diagonal. Since p(λ_j) = 0 for all j, then p(Λ) = 0 and the theorem is proved.

In the general case M = S J S^{-1}, where J is a Jordan normal form. Then p(J) is a block diagonal matrix, the blocks being p applied to the standard Jordan blocks. Let J_1 be any one of these blocks, with eigenvalue λ_1 and dimension p_1, and write J_1 = λ_1 I + N_1. Then the characteristic polynomial of M contains the factor (λ_1 − λ)^{p_1}. Since (λ_1 I − J_1)^{p_1} = (−N_1)^{p_1} = 0, then p(J_1) = 0. As this is true for each Jordan block composing J, the theorem follows. □
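A numerical illustration (not a proof), assuming NumPy: np.poly returns the coefficients of the monic characteristic polynomial det(λI − M), which differs from p(λ) = det(M − λI) only by a factor (−1)^n, so it too must vanish at M.

    import numpy as np

    M = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0],
                  [4.0, 0.0, 2.0]])         # an arbitrary 3 x 3 matrix

    coeffs = np.poly(M)                     # [1, c2, c1, c0] of lambda^3 + c2 lambda^2 + c1 lambda + c0

    # Evaluate the characteristic polynomial at M itself.
    pM = sum(c * np.linalg.matrix_power(M, len(coeffs) - 1 - i)
             for i, c in enumerate(coeffs))
    print(np.allclose(pM, 0))               # True: the characteristic polynomial annihilates M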

5.2. Commuting matrices. The beautiful world of functional calculus with matrices is marred by noncommutativity. For example, e^A e^B equals e^{A+B} when A and B commute (but not in general), and the square (A + B)^2 = A^2 + AB + BA + B^2 cannot be simplified to A^2 + 2AB + B^2 unless A and B commute.

When do two matrices commute?

Theorem 23. Let A and B be two diagonalizable matrices.
Then AB = BA if and only if they can be diagonalized using the same matrix of eigenvectors S (they are simultaneously diagonalizable).

Proof.
Assume that A = S Λ S^{-1} and B = S Λ' S^{-1}, where Λ, Λ' are diagonal. Then, since diagonal matrices commute,

    AB = S Λ S^{-1} S Λ' S^{-1} = S Λ Λ' S^{-1} = S Λ' Λ S^{-1} = S Λ' S^{-1} S Λ S^{-1} = BA

Conversely, assume AB = BA and let S = [v_1, . . . , v_n] be the matrix diagonalizing A, with A v_j = α_j v_j. Then B A v_j = α_j B v_j, so A(B v_j) = α_j (B v_j), which means that both v_j and B v_j are eigenvectors of A corresponding to the same eigenvalue α_j (unless B v_j = 0).

If all the eigenvalues of A are simple, then this means that B v_j is a scalar multiple of v_j, so S diagonalizes B as well.

If A has multiple eigenvalues then we may need to change S a little (within each set of eigenvectors of A corresponding to the same eigenvalue) to accommodate B.


First replace A by a diagonal matrix: AB = BA is equivalent to S Λ S^{-1} B = B S Λ S^{-1}, and therefore to Λ S^{-1} B S = S^{-1} B S Λ. Let C = S^{-1} B S, which satisfies Λ C = C Λ.

We can assume that the multiple eigenvalues of Λ are grouped together, so that Λ is built of diagonal blocks of the type α_j I, of dimensions d_j, with distinct α_j.

A direct calculation shows that Λ C = C Λ is equivalent to the fact that C is block diagonal, with blocks of dimensions d_j. Since C is diagonalizable, each block can be diagonalized: C = T Λ' T^{-1}, where T is block diagonal (each block of T diagonalizes the corresponding block of C), and this conjugation leaves Λ invariant: T Λ T^{-1} = Λ.

Then the matrix ST diagonalizes both A and B. □

Examples. Any two functions of a matrix M, f(M) and g(M), commute.
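For instance, matrices built from the same eigenvector matrix S commute, and so do any two functions of the same matrix; a small numerical sketch assuming NumPy (the choices of S, Λ, Λ' and of the functions f, g below are arbitrary):

    import numpy as np

    S = np.array([[1.0, 1.0],
                  [1.0, -1.0]])                        # shared matrix of eigenvectors
    Sinv = np.linalg.inv(S)
    A = S @ np.diag([2.0, 5.0]) @ Sinv                 # A = S Lambda S^{-1}
    B = S @ np.diag([3.0, -1.0]) @ Sinv                # B = S Lambda' S^{-1}
    print(np.allclose(A @ B, B @ A))                   # True: simultaneously diagonalizable

    # Two functions of the same matrix commute: here f(t) = t^2 + 3t + 1, g(t) = t^3 - t
    M = A
    fM = M @ M + 3*M + np.eye(2)
    gM = np.linalg.matrix_power(M, 3) - M
    print(np.allclose(fM @ gM, gM @ fM))               # True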