CS 450 – Numerical Analysis

Chapter 4: Eigenvalue Problems †

Prof. Michael T. Heath

Department of Computer Science, University of Illinois at Urbana-Champaign

heath@illinois.edu

January 28, 2019

†Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright © 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

2

Eigenvalue Problems

3

Eigenvalues and Eigenvectors

I Standard eigenvalue problem: Given n × n matrix A, find scalar λ and nonzero vector x such that

Ax = λ x

I λ is eigenvalue, and x is corresponding eigenvector

I λ may be complex even if A is real

I Spectrum = λ(A) = set of all eigenvalues of A

I Spectral radius = ρ(A) = max{|λ| : λ ∈ λ(A)}

4

Geometric Interpretation

I Matrix expands or shrinks any vector lying in direction of eigenvector by scalar factor

I Scalar expansion or contraction factor is given by corresponding eigenvalue λ

I Eigenvalues and eigenvectors decompose complicated behavior of general linear transformation into simpler actions

5

Eigenvalue Problems

I Eigenvalue problems occur in many areas of science and engineering, such as structural analysis

I Eigenvalues are also important in analyzing numerical methods

I Theory and algorithms apply to complex matrices as well as real matrices

I With complex matrices, we use conjugate transpose, A^H, instead of usual transpose, A^T

6

Examples: Eigenvalues and Eigenvectors

I A = [1 0; 0 2]: λ1 = 1, x1 = [1, 0]^T, λ2 = 2, x2 = [0, 1]^T

I A = [1 1; 0 2]: λ1 = 1, x1 = [1, 0]^T, λ2 = 2, x2 = [1, 1]^T

I A = [3 −1; −1 3]: λ1 = 2, x1 = [1, 1]^T, λ2 = 4, x2 = [1, −1]^T

I A = [1.5 0.5; 0.5 1.5]: λ1 = 2, x1 = [1, 1]^T, λ2 = 1, x2 = [−1, 1]^T

I A = [0 1; −1 0]: λ1 = i, x1 = [1, i]^T, λ2 = −i, x2 = [i, 1]^T, where i = √−1
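These eigenpairs are easy to check numerically; a minimal sketch with NumPy, using the symmetric example A = [3 −1; −1 3]:

```python
import numpy as np

# Symmetric example from above: eigenvalues should be 2 and 4
A = np.array([[3.0, -1.0],
              [-1.0, 3.0]])

# numpy returns eigenvalues and unit-norm eigenvectors (as columns)
evals, evecs = np.linalg.eig(A)
print(evals)                      # e.g. [4. 2.] (ordering not guaranteed)

# Check the defining relation A x = lambda x for each pair
for lam, x in zip(evals, evecs.T):
    assert np.allclose(A @ x, lam * x)
```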

7

Characteristic Polynomial and Multiplicity

8

Characteristic Polynomial

I Equation Ax = λx is equivalent to

(A − λI)x = 0

which has nonzero solution x if, and only if, its matrix is singular

I Eigenvalues of A are roots λi of characteristic polynomial

det(A − λI) = 0

which is of degree n in λ

I Fundamental Theorem of Algebra implies that n × n matrix A always has n eigenvalues, but they need not be real or distinct

I Complex eigenvalues of real matrix occur in complex conjugate pairs: if α + iβ is eigenvalue of real matrix, then so is α − iβ, where i = √−1

9

Example: Characteristic Polynomial

I Characteristic polynomial of previous example matrix is

det([3 −1; −1 3] − λ[1 0; 0 1]) = det([3−λ −1; −1 3−λ])

= (3 − λ)(3 − λ) − (−1)(−1) = λ^2 − 6λ + 8 = 0

so eigenvalues are given by

λ = (6 ± √(36 − 32)) / 2, or λ1 = 2, λ2 = 4

10

Companion Matrix

I Monic polynomial

p(λ) = c0 + c1λ + · · · + cn−1 λ^(n−1) + λ^n

is characteristic polynomial of companion matrix

       [ 0  0  · · ·  0  −c0   ]
       [ 1  0  · · ·  0  −c1   ]
  Cn = [ 0  1  · · ·  0  −c2   ]
       [ ⋮  ⋮    ⋱    ⋮    ⋮    ]
       [ 0  0  · · ·  1  −cn−1 ]

I Roots of polynomial of degree > 4 cannot always be computed in finite number of steps

I So in general, computation of eigenvalues of matrices of order > 4 requires (theoretically infinite) iterative process
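As an illustration of this correspondence, the following NumPy sketch builds the companion matrix of a monic cubic (the helper function and test polynomial are illustrative, not from the slides) and checks that its eigenvalues are the polynomial's roots:

```python
import numpy as np

def companion(c):
    """Companion matrix of monic polynomial with coefficients
    c = [c0, c1, ..., c_{n-1}], i.e. p(t) = c0 + c1 t + ... + t^n."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)    # ones on the subdiagonal
    C[:, -1] = -np.asarray(c)     # last column holds -c0, ..., -c_{n-1}
    return C

# p(t) = t^3 - 6 t^2 + 11 t - 6 = (t - 1)(t - 2)(t - 3)
c = [-6.0, 11.0, -6.0]
C = companion(c)
print(np.sort(np.linalg.eigvals(C)))                # approx [1. 2. 3.]
print(np.sort(np.roots([1.0, -6.0, 11.0, -6.0])))   # same roots
```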

11

Characteristic Polynomial, continued

I Computing eigenvalues using characteristic polynomial is not recommended because of

I work in computing coefficients of characteristic polynomial

I sensitivity of coefficients of characteristic polynomial

I work in solving for roots of characteristic polynomial

I Characteristic polynomial is powerful theoretical tool but usually not useful computationally

12

Example: Characteristic Polynomial

I Consider

A = [1 ε; ε 1]

where ε is positive number slightly smaller than √εmach

I Exact eigenvalues of A are 1 + ε and 1− ε

I Computing characteristic polynomial in floating-point arithmetic, we obtain

det(A − λI) = λ^2 − 2λ + (1 − ε^2) = λ^2 − 2λ + 1

which has 1 as double root

I Thus, eigenvalues cannot be resolved by this method even though they are distinct in working precision
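The failure is easy to reproduce; in the sketch below (a NumPy illustration, with ε chosen so that 1 − ε^2 rounds to exactly 1 in double precision), the characteristic polynomial yields a double root while working with the matrix itself resolves the eigenvalues:

```python
import numpy as np

# eps is on the order of sqrt(eps_mach); this particular value is an assumption
# for the demo, chosen so that 1 - eps**2 rounds to exactly 1 in double precision
eps = 7.0e-9
A = np.array([[1.0, eps],
              [eps, 1.0]])

# Characteristic polynomial det(A - t I) = t^2 - 2 t + (1 - eps^2),
# but the constant term evaluates to 1.0 in floating point
coeffs = [1.0, -2.0, 1.0 - eps**2]
print(coeffs[2] == 1.0)        # True: the eps^2 contribution is lost
print(np.roots(coeffs))        # [1. 1.] -- double root, eigenvalues unresolved

# Working with the matrix itself resolves them without difficulty
print(np.linalg.eigvalsh(A))   # approx [1 - 7e-09, 1 + 7e-09]
```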

13

Multiplicity and Diagonalizability

I Multiplicity is number of times root appears when polynomial is written as product of linear factors

I Simple eigenvalue has multiplicity 1

I Defective matrix has eigenvalue of multiplicity k > 1 with fewer than k linearly independent corresponding eigenvectors

I Nondefective matrix A has n linearly independent eigenvectors, so it is diagonalizable

X^−1 A X = D

where X is nonsingular matrix of eigenvectors

14

Eigenspaces and Invariant Subspaces

I Eigenvectors can be scaled arbitrarily: if Ax = λx, then A(γx) = λ(γx) for any scalar γ, so γx is also eigenvector corresponding to λ

I Eigenvectors are usually normalized by requiring some norm of eigenvector to be 1

I Eigenspace = Sλ = {x : Ax = λx}

I Subspace S of R^n (or C^n) is invariant if AS ⊆ S

I For eigenvectors x1, . . . , xp, span([x1 · · · xp]) is invariant subspace

15

Relevant Properties of Matrices

16

Relevant Properties of Matrices

I Properties of matrix A relevant to eigenvalue problems

Property      Definition
diagonal      aij = 0 for i ≠ j
tridiagonal   aij = 0 for |i − j| > 1
triangular    aij = 0 for i > j (upper) or aij = 0 for i < j (lower)
Hessenberg    aij = 0 for i > j + 1 (upper) or aij = 0 for i < j − 1 (lower)
orthogonal    A^T A = A A^T = I
unitary       A^H A = A A^H = I
symmetric     A = A^T
Hermitian     A = A^H
normal        A^H A = A A^H

17

Examples: Matrix Properties

I Transpose: [1 2; 3 4]^T = [1 3; 2 4]

I Conjugate transpose: [1+i 1+2i; 2−i 2−2i]^H = [1−i 2+i; 1−2i 2+2i]

I Symmetric: [1 2; 2 3]

I Nonsymmetric: [1 3; 2 4]

I Hermitian: [1 1+i; 1−i 2]

I NonHermitian: [1 1+i; 1+i 2]

18

Examples, continued

I Orthogonal: [0 1; 1 0], [−1 0; 0 −1], [√2/2 √2/2; −√2/2 √2/2]

I Unitary: [i√2/2 √2/2; −√2/2 −i√2/2]

I Nonorthogonal: [1 1; 1 2]

I Normal: [1 2 0; 0 1 2; 2 0 1]

I Nonnormal: [1 1; 0 1]

19

Properties of Eigenvalue Problems

Properties of eigenvalue problem affecting choice of algorithm and software

I Are all eigenvalues needed, or only a few?

I Are only eigenvalues needed, or are corresponding eigenvectors also needed?

I Is matrix real or complex?

I Is matrix relatively small and dense, or large and sparse?

I Does matrix have any special properties, such as symmetry, or is it general matrix?

20

Conditioning of Eigenvalue Problems

21

Conditioning of Eigenvalue Problems

I Condition of eigenvalue problem is sensitivity of eigenvalues and eigenvectors to changes in matrix

I Conditioning of eigenvalue problem is not same as conditioning of solution to linear system for same matrix

I Different eigenvalues and eigenvectors are not necessarily equally sensitive to perturbations in matrix

22

Conditioning of Eigenvalues

I If µ is eigenvalue of perturbation A + E of nondefective matrix A, then

|µ − λk| ≤ cond2(X) ‖E‖2

where λk is closest eigenvalue of A to µ and X is nonsingular matrix of eigenvectors of A

I Absolute condition number of eigenvalues is condition number of matrix of eigenvectors with respect to solving linear equations

I Eigenvalues may be sensitive if eigenvectors are nearly linearly dependent (i.e., matrix is nearly defective)

I For normal matrix (A^H A = A A^H), eigenvectors are orthogonal, so eigenvalues are well-conditioned

23

Conditioning of Eigenvalues

I If (A + E)(x + ∆x) = (λ + ∆λ)(x + ∆x), where λ is simple eigenvalue of A, then

|∆λ| ≲ (‖y‖2 · ‖x‖2 / |y^H x|) ‖E‖2 = (1 / cos(θ)) ‖E‖2

where x and y are corresponding right and left eigenvectors and θ is angle between them

I For symmetric or Hermitian matrix, right and left eigenvectors are same, so cos(θ) = 1 and eigenvalues are inherently well-conditioned

I Eigenvalues of nonnormal matrices may be sensitive

I For multiple or closely clustered eigenvalues, corresponding eigenvectors may be sensitive

24

Computing Eigenvalues and Eigenvectors

25

Problem Transformations

I Shift: If Ax = λx and σ is any scalar, then (A − σI)x = (λ − σ)x, so eigenvalues of shifted matrix are shifted eigenvalues of original matrix

I Inversion: If A is nonsingular and Ax = λx with x ≠ 0, then λ ≠ 0 and A^−1 x = (1/λ)x, so eigenvalues of inverse are reciprocals of eigenvalues of original matrix

I Powers: If Ax = λx, then A^k x = λ^k x, so eigenvalues of power of matrix are same power of eigenvalues of original matrix

I Polynomial: If Ax = λx and p(t) is polynomial, then p(A)x = p(λ)x, so eigenvalues of polynomial in matrix are values of polynomial evaluated at eigenvalues of original matrix
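These identities can be checked directly; a small NumPy sketch (the test matrix and shift are illustrative):

```python
import numpy as np

# Small symmetric, nonsingular test matrix (illustrative choice)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam = np.sort(np.linalg.eigvalsh(A))
sigma = 1.5

# Shift: eigenvalues of A - sigma*I are lam - sigma
assert np.allclose(np.sort(np.linalg.eigvalsh(A - sigma * np.eye(3))), lam - sigma)

# Inversion: eigenvalues of inv(A) are 1/lam
assert np.allclose(np.sort(np.linalg.eigvalsh(np.linalg.inv(A))), np.sort(1.0 / lam))

# Powers: eigenvalues of A^3 are lam^3
assert np.allclose(np.sort(np.linalg.eigvalsh(np.linalg.matrix_power(A, 3))),
                   np.sort(lam ** 3))
```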

26

Similarity Transformation

I B is similar to A if there is nonsingular matrix T such that

B = T^−1 A T

I Then

By = λy ⇒ T^−1 A T y = λy ⇒ A(Ty) = λ(Ty)

so A and B have same eigenvalues, and if y is eigenvector of B, then x = Ty is eigenvector of A

I Similarity transformations preserve eigenvalues, and eigenvectors are easily recovered

27

Example: Similarity Transformation

I From eigenvalues and eigenvectors for previous example,

[3 −1; −1 3] [1 1; 1 −1] = [1 1; 1 −1] [2 0; 0 4]

and hence

[0.5 0.5; 0.5 −0.5] [3 −1; −1 3] [1 1; 1 −1] = [2 0; 0 4]

I So original matrix is similar to diagonal matrix, and eigenvectors form columns of similarity transformation matrix

28

Diagonal Form

I Eigenvalues of diagonal matrix are diagonal entries, and eigenvectors are columns of identity matrix

I Diagonal form is desirable in simplifying eigenvalue problems for general matrices by similarity transformations

I But not all matrices are diagonalizable by similarity transformation

I Closest one can get, in general, is Jordan form, which is nearly diagonal but may have some nonzero entries on first superdiagonal, corresponding to one or more multiple eigenvalues

29

Triangular Form

I Any matrix can be transformed into triangular (Schur) form by similarity, and eigenvalues of triangular matrix are diagonal entries

I Eigenvectors of triangular matrix less obvious, but still straightforward to compute

I If

A − λI = [ U11  u   U13 ]
         [  0   0   v^T ]
         [  O   0   U33 ]

is triangular, then U11 y = u can be solved for y, so that

x = [ y ; −1 ; 0 ]

is corresponding eigenvector

30

Block Triangular Form

I If

A = [ A11  A12  · · ·  A1p ]
    [      A22  · · ·  A2p ]
    [             ⋱     ⋮   ]
    [                  App ]

with square diagonal blocks, then

λ(A) = ⋃_{j=1}^{p} λ(Ajj)

so eigenvalue problem breaks into p smaller eigenvalue problems

I Real Schur form has 1 × 1 diagonal blocks corresponding to real eigenvalues and 2 × 2 diagonal blocks corresponding to pairs of complex conjugate eigenvalues

31

Forms Attainable by Similarity

A                     T            B
distinct eigenvalues  nonsingular  diagonal
real symmetric        orthogonal   real diagonal
complex Hermitian     unitary      real diagonal
normal                unitary      diagonal
arbitrary real        orthogonal   real block triangular (real Schur)
arbitrary             unitary      upper triangular (Schur)
arbitrary             nonsingular  almost diagonal (Jordan)

I Given matrix A with indicated property, matrices B and T exist with indicated properties such that B = T^−1 A T

I If B is diagonal or triangular, eigenvalues are its diagonal entries

I If B is diagonal, eigenvectors are columns of T

32

Power Iteration

33

Power Iteration

I Simplest method for computing one eigenvalue-eigenvector pair is power iteration, which repeatedly multiplies matrix times initial starting vector

I Assume A has unique eigenvalue of maximum modulus, say λ1, with corresponding eigenvector v1

I Then, starting from nonzero vector x0, iteration scheme

xk = A xk−1

converges to multiple of eigenvector v1 corresponding to dominant eigenvalue λ1

34

Convergence of Power Iteration

I To see why power iteration converges to dominant eigenvector, express starting vector x0 as linear combination

x0 = Σ_{i=1}^{n} αi vi

where vi are eigenvectors of A

I Then

xk = A xk−1 = A^2 xk−2 = · · · = A^k x0 = Σ_{i=1}^{n} λi^k αi vi = λ1^k ( α1 v1 + Σ_{i=2}^{n} (λi/λ1)^k αi vi )

I Since |λi/λ1| < 1 for i > 1, successively higher powers go to zero, leaving only component corresponding to v1

35

Example: Power Iteration

I Ratio of values of given component of xk from one iteration to next converges to dominant eigenvalue λ1

I For example, if A = [1.5 0.5; 0.5 1.5] and x0 = [0, 1]^T, we obtain

k      xk^T            ratio
0      0.0     1.0
1      0.5     1.5     1.500
2      1.5     2.5     1.667
3      3.5     4.5     1.800
4      7.5     8.5     1.889
5     15.5    16.5     1.941
6     31.5    32.5     1.970
7     63.5    64.5     1.985
8    127.5   128.5     1.992

I Ratio is converging to dominant eigenvalue, which is 2

36

Limitations of Power Iteration

Power iteration can fail for various reasons

I Starting vector may have no component in dominant eigenvector v1 (i.e., α1 = 0) — not problem in practice because rounding error usually introduces such component in any case

I There may be more than one eigenvalue having same (maximum) modulus, in which case iteration may converge to linear combination of corresponding eigenvectors

I For real matrix and starting vector, iteration can never converge to complex vector

37

Normalized Power Iteration

I Geometric growth of components at each iteration risks eventual overflow (or underflow if |λ1| < 1)

I Approximate eigenvector should be normalized at each iteration, say, by requiring its largest component to be 1 in modulus, giving iteration scheme

yk = A xk−1
xk = yk / ‖yk‖∞

I With normalization, ‖yk‖∞ → |λ1|, and xk → v1 / ‖v1‖∞
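A minimal NumPy sketch of this normalized scheme (the tolerance and iteration cap are illustrative choices):

```python
import numpy as np

def power_iteration(A, x0, tol=1e-10, maxiter=500):
    """Normalized power iteration: approximations to the eigenvalue of largest
    modulus (its magnitude) and a corresponding eigenvector."""
    x = np.asarray(x0, dtype=float)
    lam = 0.0
    for _ in range(maxiter):
        y = A @ x
        lam_new = np.linalg.norm(y, np.inf)   # converges to |lambda_1|
        x = y / lam_new                       # largest component becomes 1 in modulus
        if abs(lam_new - lam) < tol * abs(lam_new):
            break
        lam = lam_new
    return lam, x

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])
lam, v = power_iteration(A, [0.0, 1.0])
print(lam, v)   # approx 2.0 and a multiple of [1, 1]
```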

38

Example: Normalized Power Iteration

I Repeating previous example with normalized scheme,

k    xk^T            ‖yk‖∞
0    0.000   1.0
1    0.333   1.0     1.500
2    0.600   1.0     1.667
3    0.778   1.0     1.800
4    0.882   1.0     1.889
5    0.939   1.0     1.941
6    0.969   1.0     1.970
7    0.984   1.0     1.985
8    0.992   1.0     1.992

〈 interactive example 〉

39

Geometric Interpretation

I Behavior of power iteration depicted geometrically

I Initial vector x0 = v1 + v2 contains equal components in eigenvectors v1 and v2 (dashed arrows)

I Repeated multiplication by A causes component in v1 (corresponding to larger eigenvalue, 2) to dominate, so sequence of vectors xk converges to v1

40

Power Iteration with Shift

I Convergence rate of power iteration depends on ratio |λ2/λ1|, where λ2 is eigenvalue having second largest modulus

I May be possible to choose shift, A − σI, such that

|(λ2 − σ) / (λ1 − σ)| < |λ2 / λ1|

so convergence is accelerated

I Shift must then be added to result to obtain eigenvalue of original matrix

I In earlier example, for instance, if we pick shift of σ = 1 (which is equal to other eigenvalue), then ratio becomes zero and method converges in one iteration

I In general, we would not be able to make such fortuitous choice, but shifts can still be extremely useful in some contexts, as we will see later

41

Inverse and Rayleigh Quotient Iterations

42

Inverse Iteration

I To compute smallest eigenvalue of matrix rather than largest, can make use of fact that eigenvalues of A^−1 are reciprocals of those of A, so smallest eigenvalue of A is reciprocal of largest eigenvalue of A^−1

I This leads to inverse iteration scheme

A yk = xk−1
xk = yk / ‖yk‖∞

which is equivalent to power iteration applied to A^−1

I Inverse of A not computed explicitly, but factorization of A used to solve system of linear equations at each iteration

I Inverse iteration converges to eigenvector corresponding to smallest eigenvalue of A

I Eigenvalue obtained is dominant eigenvalue of A^−1, and hence its reciprocal is smallest eigenvalue of A in modulus
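A sketch of inverse iteration with SciPy, factoring A once and reusing the factorization at every step (the stopping test is an illustrative choice):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, x0, tol=1e-10, maxiter=500):
    """Inverse iteration: approximates the eigenvalue of A smallest in modulus
    and a corresponding eigenvector (A assumed nonsingular)."""
    lu_piv = lu_factor(A)            # factor A once, reuse each iteration
    x = np.asarray(x0, dtype=float)
    mu = 0.0
    for _ in range(maxiter):
        y = lu_solve(lu_piv, x)      # solve A y = x, i.e. power iteration on A^-1
        mu_new = np.linalg.norm(y, np.inf)
        x = y / mu_new
        if abs(mu_new - mu) < tol * abs(mu_new):
            break
        mu = mu_new
    return 1.0 / mu, x               # reciprocal of dominant eigenvalue of A^-1

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])
lam, v = inverse_iteration(A, [0.0, 1.0])
print(lam, v)    # approx 1.0 and an eigenvector in direction [1, -1]
```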

43

Example: Inverse Iteration

I Applying inverse iteration to previous example to compute smallest eigenvalue yields sequence

k     xk^T             ‖yk‖∞
0     0.000    1.0
1    −0.333    1.0     0.750
2    −0.600    1.0     0.833
3    −0.778    1.0     0.900
4    −0.882    1.0     0.944
5    −0.939    1.0     0.971
6    −0.969    1.0     0.985

which is indeed converging to 1 (which is its own reciprocal in this case)

〈 interactive example 〉

44

Inverse Iteration with Shift

I As before, shifting strategy, working with A − σI for some scalar σ, can greatly improve convergence

I Inverse iteration is particularly useful for computing eigenvector corresponding to approximate eigenvalue, since it converges rapidly when applied to shifted matrix A − λI, where λ is approximate eigenvalue

I Inverse iteration is also useful for computing eigenvalue closest to given value β, since if β is used as shift, then desired eigenvalue corresponds to smallest eigenvalue of shifted matrix

45

Rayleigh Quotient

I Given approximate eigenvector x for real matrix A, determining best estimate for corresponding eigenvalue λ can be considered as n × 1 linear least squares approximation problem

xλ ≅ Ax

I From normal equation x^T x λ = x^T A x, least squares solution is given by

λ = x^T A x / x^T x

I This quantity, known as Rayleigh quotient, has many useful properties

46

Example: Rayleigh Quotient

I Rayleigh quotient can accelerate convergence of iterative methods such as power iteration, since Rayleigh quotient xk^T A xk / xk^T xk gives better approximation to eigenvalue at iteration k than does basic method alone

I For previous example using power iteration, value of Rayleigh quotient at each iteration is shown below

k    xk^T            ‖yk‖∞    xk^T A xk / xk^T xk
0    0.000   1.0
1    0.333   1.0     1.500    1.500
2    0.600   1.0     1.667    1.800
3    0.778   1.0     1.800    1.941
4    0.882   1.0     1.889    1.985
5    0.939   1.0     1.941    1.996
6    0.969   1.0     1.970    1.999

47

Rayleigh Quotient Iteration

I Given approximate eigenvector, Rayleigh quotient yields good estimate for corresponding eigenvalue

I Conversely, inverse iteration converges rapidly to eigenvector if approximate eigenvalue is used as shift, with one iteration often sufficing

I These two ideas combined in Rayleigh quotient iteration

σk = xk^T A xk / xk^T xk
(A − σk I) yk+1 = xk
xk+1 = yk+1 / ‖yk+1‖∞

starting from given nonzero vector x0
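A minimal NumPy sketch of Rayleigh quotient iteration (the stopping test is an illustrative choice; a new shifted system is solved, and hence effectively refactored, at every step):

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, tol=1e-12, maxiter=50):
    """Rayleigh quotient iteration for a real (ideally symmetric) matrix A."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x, np.inf)
    sigma = x @ A @ x / (x @ x)                       # Rayleigh quotient shift
    n = A.shape[0]
    for _ in range(maxiter):
        y = np.linalg.solve(A - sigma * np.eye(n), x)  # shifted inverse iteration step
        x = y / np.linalg.norm(y, np.inf)
        sigma_new = x @ A @ x / (x @ x)
        if abs(sigma_new - sigma) < tol * abs(sigma_new):
            return sigma_new, x
        sigma = sigma_new
    return sigma, x

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])
sigma, v = rayleigh_quotient_iteration(A, [0.8, 0.4])
print(sigma, v)   # converges very rapidly, here to eigenvalue 2.0
```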

48

Rayleigh Quotient Iteration, continued

I Rayleigh quotient iteration is especially effective for symmetric matrices and usually converges very rapidly

I Using different shift at each iteration means matrix must be refactored each time to solve linear system, so cost per iteration is high unless matrix has special form that makes factorization easy

I Same idea also works for complex matrices, for which transpose is replaced by conjugate transpose, so Rayleigh quotient becomes x^H A x / x^H x

49

Example: Rayleigh Quotient Iteration

I Using same matrix as previous examples and randomly chosen starting vector x0, Rayleigh quotient iteration converges in two iterations

k    xk^T             σk
0    0.807   0.397    1.896
1    0.924   1.000    1.998
2    1.000   1.000    2.000

50

Deflation

51

Deflation

I After eigenvalue λ1 and corresponding eigenvector x1 have been computed, then additional eigenvalues λ2, . . . , λn of A can be computed by deflation, which effectively removes known eigenvalue

I Let H be any nonsingular matrix such that Hx1 = αe1, scalar multiple of first column of identity matrix (Householder transformation is good choice for H)

I Then similarity transformation determined by H transforms A into form

H A H^−1 = [ λ1  b^T ]
           [ 0    B  ]

where B is matrix of order n − 1 having eigenvalues λ2, . . . , λn

52

Deflation, continued

I Thus, we can work with B to compute next eigenvalue λ2

I Moreover, if y2 is eigenvector of B corresponding to λ2, then

x2 = H^−1 [ α ; y2 ], where α = b^T y2 / (λ2 − λ1)

is eigenvector corresponding to λ2 for original matrix A, provided λ1 ≠ λ2

I Process can be repeated to find additional eigenvalues and eigenvectors

53

Deflation, continued

I Alternative approach lets u1 be any vector such that u1^T x1 = λ1

I Then A − x1 u1^T has eigenvalues 0, λ2, . . . , λn

I Possible choices for u1 include

I u1 = λ1 x1, if A is symmetric and x1 is normalized so that ‖x1‖2 = 1

I u1 = λ1 y1, where y1 is corresponding left eigenvector (i.e., A^T y1 = λ1 y1) normalized so that y1^T x1 = 1

I u1 = A^T ek, if x1 is normalized so that ‖x1‖∞ = 1 and kth component of x1 is 1
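A small NumPy sketch of the first (symmetric) choice of u1, applied to the earlier 2 × 2 example with its known dominant eigenpair:

```python
import numpy as np

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])

# Dominant eigenpair assumed already computed (e.g. by power iteration)
lam1 = 2.0
x1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # normalized so ||x1||_2 = 1

# Symmetric choice u1 = lam1 * x1: deflated matrix has eigenvalues 0, lam2, ...
u1 = lam1 * x1
A_defl = A - np.outer(x1, u1)

print(np.sort(np.linalg.eigvals(A_defl)))  # approx [0., 1.]: lam1 replaced by 0
```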

54

QR Iteration

55

Simultaneous Iteration

I Simplest method for computing many eigenvalue-eigenvector pairs is simultaneous iteration, which repeatedly multiplies matrix times matrix of initial starting vectors

I Starting from n × p matrix X0 of rank p, iteration scheme is

Xk = A Xk−1

I span(Xk) converges to invariant subspace determined by p largest eigenvalues of A, provided |λp| > |λp+1|

I Also called subspace iteration

56

Orthogonal Iteration

I As with power iteration, normalization is needed with simultaneous iteration

I Each column of Xk converges to dominant eigenvector, so columns of Xk become increasingly ill-conditioned basis for span(Xk)

I Both issues can be addressed by computing QR factorization at each iteration

Q̂k Rk = Xk−1
Xk = A Q̂k

where Q̂k Rk is reduced QR factorization of Xk−1

I This orthogonal iteration converges to block triangular form, and leading block is triangular if moduli of consecutive eigenvalues are distinct
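A compact NumPy sketch of orthogonal (subspace) iteration; reading eigenvalue estimates from the projected block Q^T A Q is a common convenience, not stated on the slide:

```python
import numpy as np

def orthogonal_iteration(A, p, iters=200, seed=0):
    """Orthogonal iteration for the p dominant eigenvalues of A."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((A.shape[0], p))   # random n-by-p starting block
    for _ in range(iters):
        Q, _ = np.linalg.qr(X)                 # reduced QR of previous iterate
        X = A @ Q
    Q, _ = np.linalg.qr(X)
    return Q, Q.T @ A @ Q                      # orthonormal basis and projected block

# Illustrative symmetric test matrix
A = np.diag([5.0, 3.0, 1.0]) + 0.1 * np.ones((3, 3))
Q, H = orthogonal_iteration(A, p=2)
print(np.linalg.eigvals(H))    # approx the two largest eigenvalues of A
```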

57

QR Iteration

I For p = n and X0 = I, matrices

Ak = Q̂k^H A Q̂k

generated by orthogonal iteration converge to triangular or block triangular form, yielding all eigenvalues of A

I QR iteration computes successive matrices Ak without forming above product explicitly

I Starting with A0 = A, at iteration k compute QR factorization

Qk Rk = Ak−1

and form reverse product

Ak = Rk Qk

58

QR Iteration, continued

I Successive matrices Ak are unitarily similar to each other

Ak = Rk Qk = Qk^H Ak−1 Qk

I Diagonal entries (or eigenvalues of diagonal blocks) of Ak converge to eigenvalues of A

I Product of orthogonal matrices Qk converges to matrix of corresponding eigenvectors

I If A is symmetric, then symmetry is preserved by QR iteration, so Ak converge to matrix that is both triangular and symmetric, hence diagonal
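The unshifted iteration is only a few lines in NumPy; a sketch (the fixed iteration count is illustrative, and practical implementations add the shifts and preliminary reduction described next):

```python
import numpy as np

def qr_iteration(A, iters=100):
    """Unshifted QR iteration: factor, then reverse the product."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)   # A_{k-1} = Q R
        Ak = R @ Q                # A_k = R Q is unitarily similar to A_{k-1}
    return Ak

A = np.array([[7.0, 2.0],
              [2.0, 4.0]])
Ak = qr_iteration(A)
print(np.diag(Ak))    # approx [8., 3.], the eigenvalues of A
```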

59

Example: QR Iteration

I Let A0 = [7 2; 2 4]

I Compute QR factorization

A0 = Q1 R1 = [.962 −.275; .275 .962] [7.28 3.02; 0 3.30]

and form reverse product

A1 = R1 Q1 = [7.83 .906; .906 3.17]

I Off-diagonal entries are now smaller, and diagonal entries closer to eigenvalues, 8 and 3

I Process continues until matrix is within tolerance of being diagonal, and diagonal entries then closely approximate eigenvalues

60

QR Iteration with Shifts

I Convergence rate of QR iteration can be accelerated by incorporating shifts

Qk Rk = Ak−1 − σk I
Ak = Rk Qk + σk I

where σk is rough approximation to eigenvalue

I Good shift can be determined by computing eigenvalues of 2 × 2 submatrix in lower right corner of matrix

61

Example: QR Iteration with Shifts

I Repeat previous example, but with shift of σ1 = 4, which is lower right corner entry of matrix

I We compute QR factorization

A0 − σ1 I = Q1 R1 = [.832 .555; .555 −.832] [3.61 1.66; 0 1.11]

and form reverse product, adding back shift to obtain

A1 = R1 Q1 + σ1 I = [7.92 .615; .615 3.08]

I After one iteration, off-diagonal entries smaller compared with unshifted algorithm, and diagonal entries closer approximations to eigenvalues

〈 interactive example 〉

62

Preliminary Reduction

I Efficiency of QR iteration can be enhanced by first transforming matrix as close to triangular form as possible before beginning iterations

I Hessenberg matrix is triangular except for one additional nonzero diagonal immediately adjacent to main diagonal

I Any matrix can be reduced to Hessenberg form in finite number of steps by orthogonal similarity transformation, for example using Householder transformations

I Symmetric Hessenberg matrix is tridiagonal

I Hessenberg or tridiagonal form is preserved during successive QR iterations
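For experiments, the reduction itself is available in SciPy; a short sketch using scipy.linalg.hessenberg (the random test matrix is illustrative):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

# Orthogonal similarity reduction A = Q H Q^T with H upper Hessenberg
H, Q = hessenberg(A, calc_q=True)

print(np.allclose(Q @ H @ Q.T, A))        # similarity holds
print(np.allclose(np.tril(H, -2), 0.0))   # zero below the first subdiagonal
print(np.sort(np.linalg.eigvals(A)))
print(np.sort(np.linalg.eigvals(H)))      # same spectrum as A
```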

63

Preliminary Reduction, continued

Advantages of initial reduction to upper Hessenberg or tridiagonal form

I Work per QR iteration is reduced from O(n^3) to O(n^2) for general matrix or O(n) for symmetric matrix

I Fewer QR iterations are required because matrix nearly triangular (or diagonal) already

I If any zero entries on first subdiagonal, then matrix is block triangular and problem can be broken into two or more smaller subproblems

〈 interactive example 〉

64

Preliminary Reduction, continued

I QR iteration is implemented in two stages

symmetric −→ tridiagonal −→ diagonal

or

general −→ Hessenberg −→ triangular

I Preliminary reduction requires definite number of steps, whereas subsequent iterative stage continues until convergence

I In practice only modest number of iterations usually required, so much of work is in preliminary reduction

I Cost of accumulating eigenvectors, if needed, dominates total cost

65

Cost of QR Iteration

Approximate overall cost of preliminary reduction and QR iteration, counting both additions and multiplications

I Symmetric matrices

I (4/3) n^3 for eigenvalues only

I 9 n^3 for eigenvalues and eigenvectors

I General matrices

I 10 n^3 for eigenvalues only

I 25 n^3 for eigenvalues and eigenvectors

66

Krylov Subspace Methods

67

Krylov Subspace Methods

I Krylov subspace methods reduce matrix to Hessenberg (or tridiagonal) form using only matrix-vector multiplication

I For arbitrary starting vector x0, if

Kk = [ x0  Ax0  · · ·  A^(k−1) x0 ]

then

Kn^−1 A Kn = Cn

where Cn is upper Hessenberg (in fact, companion matrix)

I To obtain better conditioned basis for span(Kn), compute QR factorization

Qn Rn = Kn

so that

Qn^H A Qn = Rn Cn Rn^−1 ≡ H

with H upper Hessenberg

68

Krylov Subspace Methods

I Equating kth columns on each side of equation A Qn = Qn H yields recurrence

A qk = h1k q1 + · · · + hkk qk + hk+1,k qk+1

relating qk+1 to preceding vectors q1, . . . , qk

I Premultiplying by qj^H and using orthonormality,

hjk = qj^H A qk,   j = 1, . . . , k

I These relationships yield Arnoldi iteration, which produces unitary matrix Qn and upper Hessenberg matrix Hn column by column using only matrix-vector multiplication by A and inner products of vectors

69

Arnoldi Iteration

x0 = arbitrary nonzero starting vector
q1 = x0 / ‖x0‖2
for k = 1, 2, . . .
    uk = A qk
    for j = 1 to k
        hjk = qj^H uk
        uk = uk − hjk qj
    end
    hk+1,k = ‖uk‖2
    if hk+1,k = 0 then stop
    qk+1 = uk / hk+1,k
end
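A direct NumPy transcription of this pseudocode (restarting is omitted and the matrix is real; a sketch, not a production Arnoldi):

```python
import numpy as np

def arnoldi(A, x0, m):
    """Run m steps of Arnoldi iteration; returns Q (n-by-(m+1)) and H ((m+1)-by-m)
    with A Q[:, :m] = Q H (up to rounding)."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = x0 / np.linalg.norm(x0)
    for k in range(m):
        u = A @ Q[:, k]
        for j in range(k + 1):              # orthogonalize against previous q's
            H[j, k] = Q[:, j] @ u
            u = u - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(u)
        if H[k + 1, k] == 0.0:              # invariant subspace found
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = u / H[k + 1, k]
    return Q, H

# Illustrative test matrix with well-separated large eigenvalues
rng = np.random.default_rng(2)
A = np.diag(np.arange(1.0, 51.0)) + 0.01 * rng.standard_normal((50, 50))
Q, H = arnoldi(A, rng.standard_normal(50), m=20)
ritz = np.linalg.eigvals(H[:20, :20])
print(np.sort_complex(ritz)[-3:])                  # largest Ritz values ...
print(np.sort_complex(np.linalg.eigvals(A))[-3:])  # ... typically close to extremes
```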

70

Arnoldi Iteration, continued

I If

Qk = [ q1 · · · qk ]

then

Hk = Qk^H A Qk

is upper Hessenberg matrix

I Eigenvalues of Hk, called Ritz values, are approximate eigenvalues of A, and Ritz vectors given by Qk y, where y is eigenvector of Hk, are corresponding approximate eigenvectors of A

I Eigenvalues of Hk must be computed by another method, such as QR iteration, but this is easier problem if k ≪ n

71

Arnoldi Iteration, continued

I Arnoldi iteration fairly expensive in work and storage because each new vector qk must be orthogonalized against all previous columns of Qk, and all must be stored for that purpose

I So Arnoldi process usually restarted periodically with carefully chosen starting vector

I Ritz values and vectors produced are often good approximations to eigenvalues and eigenvectors of A after relatively few iterations

72

Lanczos Iteration

I Work and storage costs drop dramatically if matrix is symmetric or Hermitian, since recurrence then has only three terms and Hk is tridiagonal (so usually denoted Tk)

q0 = 0
β0 = 0
x0 = arbitrary nonzero starting vector
q1 = x0 / ‖x0‖2
for k = 1, 2, . . .
    uk = A qk
    αk = qk^H uk
    uk = uk − βk−1 qk−1 − αk qk
    βk = ‖uk‖2
    if βk = 0 then stop
    qk+1 = uk / βk
end

73

Lanczos Iteration, continued

I αk and βk are diagonal and subdiagonal entries of symmetric tridiagonal matrix Tk

I As with Arnoldi, Lanczos iteration does not produce eigenvalues and eigenvectors directly, but only tridiagonal matrix Tk, whose eigenvalues and eigenvectors must be computed by another method to obtain Ritz values and vectors

I If βk = 0, then algorithm appears to break down, but in that case invariant subspace has already been identified (i.e., Ritz values and vectors are already exact at that point)

74

Lanczos Iteration, continued

I In principle, if Lanczos algorithm were run until k = n, resulting tridiagonal matrix would be orthogonally similar to A

I In practice, rounding error causes loss of orthogonality, invalidating this expectation

I Problem can be overcome by reorthogonalizing vectors as needed, but expense can be substantial

I Alternatively, can ignore problem, in which case algorithm still produces good eigenvalue approximations, but multiple copies of some eigenvalues may be generated

75

Krylov Subspace Methods, continued

I Great virtue of Arnoldi and Lanczos iterations is their ability to produce good approximations to extreme eigenvalues for k ≪ n

I Moreover, they require only one matrix-vector multiplication by A per step and little auxiliary storage, so are ideally suited to large sparse matrices

I If eigenvalues are needed in middle of spectrum, say near σ, then algorithm can be applied to matrix (A − σI)^−1, assuming it is practical to solve systems of form (A − σI)x = y

76

Example: Lanczos Iteration

I For 29 × 29 symmetric matrix with eigenvalues 1, . . . , 29, behavior of Lanczos iteration is shown below [figure not reproduced in this transcript]

77

Jacobi Method

78

Jacobi Method

I One of oldest methods for computing eigenvalues is Jacobi method, which uses similarity transformation based on plane rotations

I Sequence of plane rotations chosen to annihilate symmetric pairs of matrix entries, eventually converging to diagonal form

I Choice of plane rotation slightly more complicated than in Givens method for QR factorization

I To annihilate given off-diagonal pair, choose c and s so that

J^T A J = [c −s; s c] [a b; b d] [c s; −s c]

        = [ c^2 a − 2csb + s^2 d        c^2 b + cs(a − d) − s^2 b ]
          [ c^2 b + cs(a − d) − s^2 b   c^2 d + 2csb + s^2 a      ]

is diagonal

79

Jacobi Method, continued

I Transformed matrix diagonal if

c^2 b + cs(a − d) − s^2 b = 0

I Dividing both sides by c^2 b, we obtain

1 + (s/c) (a − d)/b − s^2/c^2 = 0

I Making substitution t = s/c, we get quadratic equation

1 + t (a − d)/b − t^2 = 0

for tangent t of angle of rotation, from which we can recover c = 1/√(1 + t^2) and s = c · t

I Advantageous numerically to use root of smaller magnitude
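A small NumPy helper implementing this 2 × 2 rotation choice (the handling of b = 0 is an added convenience, not on the slide):

```python
import numpy as np

def jacobi_rotation(a, b, d):
    """Return (c, s) annihilating the off-diagonal entry of [[a, b], [b, d]]."""
    if b == 0.0:
        return 1.0, 0.0                    # already diagonal
    r = (a - d) / b
    roots = np.roots([-1.0, r, 1.0])       # roots of -t^2 + r t + 1 = 0
    t = roots[np.argmin(np.abs(roots))]    # root of smaller magnitude
    c = 1.0 / np.sqrt(1.0 + t * t)
    return c, c * t

a, b, d = 1.0, 2.0, 1.0
c, s = jacobi_rotation(a, b, d)
J = np.array([[c, s], [-s, c]])
A = np.array([[a, b], [b, d]])
print(J.T @ A @ J)    # diagonal, with entries 3 and -1 (in some order)
```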

80

Example: Plane Rotation

I Consider 2 × 2 matrix

A = [1 2; 2 1]

I Quadratic equation for tangent reduces to t^2 = 1, so t = ±1

I Two roots of same magnitude, so we arbitrarily choose t = −1, which yields c = 1/√2 and s = −1/√2

I Resulting plane rotation J gives

J^T A J = [1/√2 1/√2; −1/√2 1/√2] [1 2; 2 1] [1/√2 −1/√2; 1/√2 1/√2] = [3 0; 0 −1]

81

Jacobi Method, continued

I Starting with symmetric matrix A0 = A, each iteration has form

Ak+1 = Jk^T Ak Jk

where Jk is plane rotation chosen to annihilate a symmetric pair of entries in Ak

I Plane rotations repeatedly applied from both sides in systematic sweeps through matrix until off-diagonal mass of matrix is reduced to within some tolerance of zero

I Resulting diagonal matrix orthogonally similar to original matrix, so diagonal entries are eigenvalues, and eigenvectors are given by product of plane rotations

82

Jacobi Method, continued

I Jacobi method is reliable, simple to program, and capable of high accuracy, but converges rather slowly and difficult to generalize beyond symmetric matrices

I Except for small problems, more modern methods usually require 5 to 10 times less work than Jacobi

I One source of inefficiency is that previously annihilated entries can subsequently become nonzero again, thereby requiring repeated annihilation

I Newer methods such as QR iteration preserve zero entries introduced into matrix

83

Example: Jacobi Method

I Let A0 = [1 0 2; 0 2 1; 2 1 1]

I First annihilate (1,3) and (3,1) entries using rotation

J0 = [0.707 0 −0.707; 0 1 0; 0.707 0 0.707]

to obtain

A1 = J0^T A0 J0 = [3 0.707 0; 0.707 2 0.707; 0 0.707 −1]

84

Example, continued

I Next annihilate (1,2) and (2,1) entries using rotation

J1 = [0.888 −0.460 0; 0.460 0.888 0; 0 0 1]

to obtain

A2 = J1^T A1 J1 = [3.366 0 0.325; 0 1.634 0.628; 0.325 0.628 −1]

I Next annihilate (2,3) and (3,2) entries using rotation

J2 = [1 0 0; 0 0.975 −0.221; 0 0.221 0.975]

to obtain

A3 = J2^T A2 J2 = [3.366 0.072 0.317; 0.072 1.776 0; 0.317 0 −1.142]

85

Example, continued

I Beginning new sweep, again annihilate (1,3) and (3,1) entries using rotation

J3 = [0.998 0 −0.070; 0 1 0; 0.070 0 0.998]

to obtain

A4 = J3^T A3 J3 = [3.388 0.072 0; 0.072 1.776 −0.005; 0 −0.005 −1.164]

I Process continues until off-diagonal entries reduced to as small as desired

I Result is diagonal matrix orthogonally similar to original matrix, with the orthogonal similarity transformation given by product of plane rotations

〈 interactive example 〉

86

Other Methods for Symmetric Matrices

87

Bisection or Spectrum-Slicing

I For real symmetric matrix, we can determine how many eigenvalues are less than given real number σ

I By systematically choosing various values for σ (slicing spectrum at σ) and monitoring resulting count, any eigenvalue can be isolated as accurately as desired

I For example, symmetric indefinite factorization A = LDL^T makes inertia (numbers of positive, negative, and zero eigenvalues) of symmetric matrix A easy to determine

I By applying factorization to matrix A − σI for various values of σ, individual eigenvalues can be isolated as accurately as desired using interval bisection technique

88

Sturm Sequence

I Another spectrum-slicing method for computing individual eigenvalues is based on Sturm sequence property of symmetric matrices

I Let A be symmetric matrix and let pr(σ) denote determinant of leading principal minor of order r of A − σI

I Then zeros of pr(σ) strictly separate those of pr−1(σ), and number of agreements in sign of successive members of sequence pr(σ), for r = 1, . . . , n, equals number of eigenvalues of A strictly greater than σ

I This property allows computation of number of eigenvalues lying in any given interval, so individual eigenvalues can be isolated as accurately as desired

I Determinants pr(σ) are easy to compute if A is transformed to tridiagonal form before applying Sturm sequence technique
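A sketch of spectrum slicing for a symmetric tridiagonal matrix: the count below uses the pivots of the LDL^T factorization of T − σI (equivalent in spirit to the sign counts above) together with bisection; the zero-pivot safeguard is an ad hoc choice.

```python
import numpy as np

def count_below(d, e, sigma):
    """Number of eigenvalues less than sigma for the symmetric tridiagonal
    matrix with diagonal d and subdiagonal e (count of negative LDL^T pivots)."""
    count, q = 0, 1.0
    for i in range(len(d)):
        off2 = e[i - 1] ** 2 if i > 0 else 0.0
        q = (d[i] - sigma) - off2 / q
        if q == 0.0:
            q = 1e-300            # crude safeguard against an exact zero pivot
        if q < 0.0:
            count += 1
    return count

def eigenvalue_by_bisection(d, e, k, lo, hi, tol=1e-12):
    """Isolate the k-th smallest eigenvalue (k = 1, 2, ...) inside [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) >= k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Tridiagonal T = tridiag(-1, 2, -1) of order 5, smallest eigenvalue 2 - sqrt(3)
d = np.full(5, 2.0)
e = np.full(4, -1.0)
print(eigenvalue_by_bisection(d, e, k=1, lo=0.0, hi=4.0))   # approx 0.2679
```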

89

Divide-and-Conquer Method

I Express symmetric tridiagonal matrix T as

T = [T1 O; O T2] + β u u^T

I Can now compute eigenvalues and eigenvectors of smaller matrices T1 and T2

I To relate these back to eigenvalues and eigenvectors of original matrix requires solution of secular equation, which can be done reliably and efficiently

I Applying this approach recursively yields divide-and-conquer algorithm for symmetric tridiagonal eigenproblems

90

Relatively Robust Representation

I With conventional methods, cost of computing eigenvalues of symmetric tridiagonal matrix is O(n^2), but if orthogonal eigenvectors are also computed, then cost rises to O(n^3)

I Another possibility is to compute eigenvalues first at O(n^2) cost, and then compute corresponding eigenvectors separately using inverse iteration with computed eigenvalues as shifts

I Key to making this idea work is computing eigenvalues and corresponding eigenvectors to very high relative accuracy so that expensive explicit orthogonalization of eigenvectors is not needed

I RRR algorithm exploits this approach to produce eigenvalues and orthogonal eigenvectors at O(n^2) cost

91

Generalized Eigenvalue Problems and SVD

92

Generalized Eigenvalue Problems

I Generalized eigenvalue problem has form

Ax = λBx

where A and B are given n × n matrices

I If either A or B is nonsingular, then generalized eigenvalue problem can be converted to standard eigenvalue problem, either

(B^−1 A)x = λx   or   (A^−1 B)x = (1/λ)x

I This is not recommended, since it may cause

I loss of accuracy due to rounding error

I loss of symmetry if A and B are symmetric

I Better alternative for generalized eigenvalue problems is QZ algorithm
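In practice a QZ-based library routine is usually called directly; a SciPy sketch (the matrices are illustrative):

```python
import numpy as np
from scipy.linalg import eig, eigh

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[4.0, 0.0],
              [0.0, 1.0]])

# General case: scipy.linalg.eig with a second matrix solves A x = lambda B x
lam, X = eig(A, B)
print(lam)

# Symmetric A with symmetric positive definite B: eigh exploits the structure
print(eigh(A, B, eigvals_only=True))
```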

93

QZ Algorithm

I If A and B are triangular, then eigenvalues are given by λi = aii/bii, for bii ≠ 0

I QZ algorithm reduces A and B simultaneously to upper triangular form by orthogonal transformations

I First, B is reduced to upper triangular form by orthogonal transformation from left, which is also applied to A

I Next, transformed A is reduced to upper Hessenberg form by orthogonal transformation from left, while maintaining triangular form of B, which requires additional transformations from right

I Finally, analogous to QR iteration, A is reduced to triangular form while still maintaining triangular form of B, which again requires transformations from both sides

I Eigenvalues can now be determined from mutually triangular form, and eigenvectors can be recovered from products of left and right transformations, denoted by Q and Z

94

Computing SVD

I Singular values of A are nonnegative square roots of eigenvalues of A^T A, and columns of U and V are orthonormal eigenvectors of A A^T and A^T A, respectively

I Algorithms for computing SVD work directly with A, without forming A A^T or A^T A, to avoid loss of information associated with forming these matrix products explicitly

I SVD is usually computed by variant of QR iteration, with A first reduced to bidiagonal form by orthogonal transformations, then remaining off-diagonal entries are annihilated iteratively

I SVD can also be computed by variant of Jacobi method, which can be useful on parallel computers or if matrix has special structure
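A quick NumPy check of the first bullet (np.linalg.svd works directly with A, as recommended; A^T A is formed here only to verify the relationship):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values are nonnegative square roots of eigenvalues of A^T A
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(evals)))                  # True

# Columns of V are eigenvectors of A^T A: (A^T A) v_j = s_j^2 v_j
print(np.allclose(A.T @ A @ Vt.T, Vt.T * s**2))        # True
```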

95

Summary – Computing Eigenvalues and Eigenvectors

I Algorithms for computing eigenvalues and eigenvectors are necessarily iterative

I Power iteration and its variants approximate one eigenvalue-eigenvector pair

I QR iteration transforms matrix to triangular form by orthogonal similarity, producing all eigenvalues and eigenvectors simultaneously

I Preliminary orthogonal similarity transformation to Hessenberg or tridiagonal form greatly enhances efficiency of QR iteration

I Krylov subspace methods (Lanczos, Arnoldi) are especially useful for computing eigenvalues of large sparse matrices

I Many specialized algorithms are available for computing eigenvalues of real symmetric or complex Hermitian matrices

I SVD can be computed by variant of QR iteration