Research Matters Nick Higham February 25, 2009 School of ...higham/talks/ecm12_log.pdf · Research...

Research Matters

February 25, 2009

Nick HighamDirector of Research

School of Mathematics

The Matrix Logarithm:from Theory to Computation

Nick HighamSchool of Mathematics

The University of Manchester

higham@ma.man.ac.ukhttp://www.ma.man.ac.uk/~higham/

6th European Congress of Mathematics, July 2012

Definition Applications Theory Methods

Matrix Logarithm

DefinitionA logarithm of A ∈ Cn×n is any matrix X such that eX = A.

Implicit definition.Properties, classification?

University of Manchester Nick Higham Matrix Logarithm 2 / 40

Outline

1 Definition and Properties

2 Applications

3 Theory

4 Computing the Matrix Logarithm andits Fréchet derivative

Cayley and Sylvester

Matrix algebra developed by ArthurCayley, FRS (1821–1895) in Memoir onthe Theory of Matrices (1858).

Cayley considered matrix squareroots.

Term “matrix” coined in 1850 by JamesJoseph Sylvester, FRS (1814–1897).

Gave (1883) first definition of f (A)for general f .

Multiplicity of Definitions

There have been proposed in the literature since 1880eight distinct definitions of a matric function,

by Weyr, Sylvester and Buchheim,Giorgi, Cartan, Fantappiè, Cipolla,

Schwerdtfeger and Richter.

— R. F. Rinehart,The Equivalence of Definitions

of a Matric Function (1955)

Multiplicity of Definitions

There have been proposed in the literature since 1880eight distinct definitions of a matric function,

by Weyr, Sylvester and Buchheim,Giorgi, Cartan, Fantappiè, Cipolla,

Schwerdtfeger and Richter.

— R. F. Rinehart,The Equivalence of Definitions

of a Matric Function (1955)

Jordan Canonical Form

Z−1AZ = J = diag(J1, . . . , Jp), Jk︸︷︷︸mk×mk

λk. . .. . . 1

Definition

f (A) = Zf (J)Z−1 = Zdiag(f (Jk))Z−1,

f (Jk) =

f (λk) f ′(λk) . . .

f (mk−1))(λk)

(mk − 1)!

f (λk). . . .... . . f ′(λk)

f (λk)

Primary and Nonprimary Logarithms

A = diag(1,1,e,e).

Primary: log(A) = diag(0,0,1,1).

Nonprimary: log(A) = diag(0,2πi ,1,1).

Cauchy Integral Theorem

Definition

f (A) =1

f (z)(zI − A)−1 dz,

where f is analytic on and inside a closed contour Γ thatencloses λ(A).

Mercator’s Series

By integrating (1 + t)−1 = 1− t + t2 − t3 + · · · between 0and x we obtain Mercator’s series (1668),

log(1 + x) = x − x2

3− x4

4+ · · · , |x | < 1.

For A ∈ Cn×n,

log(I + A) = A− A2

3− A4

4+ · · · , ρ(A) < 1.

Composite Functions

Theoremf (t) = g(h(t)) ⇒ f (A) = g(h(A)).

Corollaryexp(log(A)) = A when log(A) is defined.

Composite Functions

Theoremf (t) = g(h(t)) ⇒ f (A) = g(h(A)).

Corollaryexp(log(A)) = A when log(A) is defined.

What about log(exp(A)) = A?

Matrix unwinding number

U(A) = A− log(exp(A))2πi

Outline

2 Applications

3 Theory

Toolbox of Matrix Functions

d2ydt2 + Ay = 0, y(0) = y0, y ′(0) = y ′0

has solution

y(t) = cos(√

At)y0 +(√

A)−1 sin(

√At)y ′0.

But [y ′

]= exp

([0 −tA

t In 0

])[y ′0y0

In software want to be able evaluate interesting f atmatrix args as well as scalar args.MATLAB has expm, logm, sqrtm, funm.

d2ydt2 + Ay = 0, y(0) = y0, y ′(0) = y ′0

has solution

y(t) = cos(√

At)y0 +(√

A)−1 sin(

√At)y ′0.

But [y ′

]= exp

([0 −tA

t In 0

])[y ′0y0

d2ydt2 + Ay = 0, y(0) = y0, y ′(0) = y ′0

has solution

y(t) = cos(√

At)y0 +(√

A)−1 sin(

√At)y ′0.

But [y ′

]= exp

([0 −tA

t In 0

])[y ′0y0

Application: Control Theory

Convert continuous-time system

= Fx(t) + Gu(t),

y = Hx(t) + Ju(t),

to discrete-time state-space system

xk+1 = Axk + Buk ,

yk = Hxk + Juk .

HaveA = eFτ , B =

(∫ τ

0eFtdt

where τ is the sampling period.MATLAB Control System Toolbox: c2d and d2c.

The Average Eye

First order character of optical system characterized bytransference matrix

[S δ0 1

]∈ R5×5,

where S ∈ R4×4 is symplectic:

ST JS = J =

[0 I2−I2 0

Average m−1∑mi=1 Ti is not a transference matrix.

Harris (2005) proposes the average exp(m−1∑mi=1 log(Ti)).

Markov Models

Time-homogeneous continuous-time Markov processwith transition probability matrix P(t) ∈ Rn×n.Transition intensity matrix Q: qij ≥ 0 (i 6= j),∑n

j=1 qij = 0, P(t) = eQt .

For discrete-time Markov processes:

Embeddability problemWhen does a given stochastic P have a real logarithm Qthat is an intensity matrix?

Markov Models (1)—Example

With x = −e−2√

3π ≈ −1.9× 10−5,

1 + 2x 1− x 1− x1− x 1 + 2x 1− x1− x 1− x 1 + 2x

.P diagonalizable, Λ(P) = {1, x , x}.Every primary log complex (can’t have complexconjugate ei’vals).Yet a generator is the non-primary log

Q = 2√

−2/3 1/2 1/61/6 −2/3 1/21/2 1/6 −2/3

.University of Manchester Nick Higham Matrix Logarithm 17 / 40

Markov Models (2)

Suppose P ≡ P(1) has a generator Q = log P.Then P(t) at other times is P(t) = exp(Qt).E.g., if P transition matrix for 1 year,

P(1/12) = e1

12 log P ≡ P1/12 is matrix for 1 month.

Problem: log P, P1/k may have wrong sign patterns⇒“regularize”.

HIV to Aids Transition

Estimated 6-month transition matrix.Four AIDS-free states and 1 AIDS state.2077 observations (Charitos et al., 2008).

0.8149 0.0738 0.0586 0.0407 0.01200.5622 0.1752 0.1314 0.1169 0.01430.3606 0.1860 0.1521 0.2198 0.08150.1676 0.0636 0.1444 0.4652 0.1592

0 0 0 0 1

.Want to estimate the 1-month transition matrix.

Λ(P) = {1,0.9644,0.4980,0.1493,−0.0043}.N. J. Higham and L. Lin.

On pth roots of stochastic matrices, LAA, 2011.

Outline

2 Applications

3 Theory

Logs of A = I3

0 0 00 0 00 0 0

0 2π − 1 1−2π 0 0−2π 0 0

0 2π 1−2π 0 0

,eB = eC = eD = I3.

Λ(C) = Λ(D) = {0,2πi ,−2πi}.

All Solutions of eX = A

Theorem (Gantmacher, 1959)

A ∈ Cn×n nonsing with Jordan canonical formZ−1AZ = J = diag(J1, J2, . . . , Jp). All solutions to eX = Aare given by

X = Z U diag(L(j1)1 ,L(j2)

2 , . . . ,L(jp)p ) U

−1Z−1,

whereL(jk )

k = log(Jk(λk)) + 2 jk π i Imk ,

jk ∈ Z arbitrary, and U an arbitrary nonsing matrix thatcommutes with J.

All Solutions of eX = A: Classified

TheoremA ∈ Cn×n nonsing: p Jordan blocks, s distinct ei’vals.eX = A has a countable infinity of solutions that are primaryfunctions of A:

Xj = Zdiag(L(j1)1 ,L(j2)

2 , . . . ,L(jp)p )Z−1,

where λi = λk implies ji = jk . If s < p then eX = A hasnon-primary solutions

Xj(U) = Z U diag(L(j1)1 ,L(j2)

2 , . . . ,L(jp)p ) U

−1Z−1,

where jk ∈ Z arbitrary, U arbitrary nonsing with UJ = JU,and for each j ∃ i and k s.t. λi = λk while ji 6= jk .

Logs of A = I3 (again)

0 2π − 1 1−2π 0 0−2π 0 0

0 2π 1−2π 0 0

,e0 = eC = eD = I3. Λ(C) = Λ(D) = {0,2πi ,−2πi}.

1 α 00 1 α0 0 1

, α ∈ C,

X = U diag(2πi ,−2πi ,0)U−1 = 2π i

1 −2α 2α2

0 1 −α0 0 1

.University of Manchester Nick Higham Matrix Logarithm 24 / 40

Square Roots of Rotations

G(θ) =

[cos θ sin θ− sin θ cos θ

G(θ/2) is the natural square root of G(θ).

For θ = π,

G(π) =

[−1 00 −1

], G(π/2) =

[0 1−1 0

G(π/2) is a nonprimary square root.

Principal Logarithm and pth Root

Let A ∈ Cn×n have no eigenvalues on R− .

Principal logX = log(A) denotes unique X such that

eX = A.−π < Im

(λ(X )

)< π.

Principal pth root

For integer p > 0, X = A1/p is unique X such thatX p = A.−π/p < arg(λ(X )) < π/p.

Principal Logarithm and pth Root

Let A ∈ Cn×n have no eigenvalues on R− .

Principal logX = log(A) denotes unique X such that

eX = A.−π < Im

(λ(X )

)< π.

Principal pth root

For integer p > 0, X = A1/p is unique X such thatX p = A.−π/p < arg(λ(X )) < π/p.

Outline

2 Applications

3 Theory

Henry Briggs (1561–1630)

Arithmetica Logarithmica (1624)Logarithms to base 10 of 1–20,000 and90,000–100,000 to 14 decimal places.

Briggs must be viewed as one of thegreat figures in numerical analysis.

—Herman H. Goldstine,A History of Numerical Analysis (1977)

Henry Briggs (1561–1630)

Arithmetica Logarithmica (1624)Logarithms to base 10 of 1–20,000 and90,000–100,000 to 14 decimal places.

Briggs must be viewed as one of thegreat figures in numerical analysis.

—Herman H. Goldstine,A History of Numerical Analysis (1977)

Briggs’ Log Method (1617)

log(ab) = log a + log b ⇒ log a = 2 log a1/2.

Use repeatedly:log a = 2k log a1/2k

Write a1/2k= 1 + x and note log(1 + x) ≈ x . Briggs worked

to base 10 and used

log10 a ≈ 2k · log10 e · (a1/2k − 1).

When Does log(BC) = log(B) + log(C)?

TheoremLet B,C ∈ Cn×n commute and have no ei’vals on R−. If forevery ei’val λj of B and the corr. ei’val µj of C,|argλj + argµj | < π, then log(BC) = log(B) + log(C).

Proof. log(B) and log(C) commute, since B and C do.Therefore

elog(B)+log(C) = elog(B)elog(C) = BC.

Thus log(B) + log(C) is some logarithm of BC. Then

Im(logλj + logµj) = argλj + argµj ∈ (−π, π),

so log(B) + log(C) is the principal logarithm of BC.

When Does log(BC) = log(B) + log(C)?

TheoremLet B,C ∈ Cn×n commute and have no ei’vals on R−. If forevery ei’val λj of B and the corr. ei’val µj of C,|argλj + argµj | < π, then log(BC) = log(B) + log(C).

Proof. log(B) and log(C) commute, since B and C do.Therefore

elog(B)+log(C) = elog(B)elog(C) = BC.

Thus log(B) + log(C) is some logarithm of BC. Then

Im(logλj + logµj) = argλj + argµj ∈ (−π, π),

so log(B) + log(C) is the principal logarithm of BC.

Inverse Scaling and Squaring Method

Take B = C in previous theorem:

log A = log(A1/2 · A1/2) = 2 log(A1/2),

since argλ(A1/2) ∈ (−π/2, π/2).

Use Briggs’ idea: log A = 2k log(A1/2k)

Kenney & Laub’s (1989) inverse scaling and squaringmethod:

Bring A close to I by repeated square roots.Approximate log(A1/2s

) using an [m/m] Padéapproximant rm(x) ≈ log(1 + x).Rescale to find log(A).

log A = log(A1/2 · A1/2) = 2 log(A1/2),

since argλ(A1/2) ∈ (−π/2, π/2).

log A = log(A1/2 · A1/2) = 2 log(A1/2),

since argλ(A1/2) ∈ (−π/2, π/2).

Choice of Parameters s, m

Must have ‖I − A1/2s‖ < 1.

Larger Padé degree m means smaller s.

Let h2m+1(X ) = erm(X) − X − I.Assume ρ(rm(X )) < π, so log(erm(X)) = rm(X ). Then

rm(X ) = log(I + X + h2m+1(X )) =: log(I + X +∆X ),

h2m+1(X ) =∞∑

k=2m+1

ckX k .

Bounding the Backward Error

Want to bound norm of h2m+1(X ) =∑∞

k=2m+1 ckX k .

Non-normality implies ρ(A)� ‖A‖.

Note that

ρ(A) ≤ ‖Ak‖1/k ≤ ‖A‖, k = 1 : ∞.

and limk→∞ ‖Ak‖1/k = ρ(A).

Use ‖Ak‖1/k instead of ‖A‖ in the truncation bounds.

[0.9 5000 −0.5

0 5 10 15 2010

‖A‖k2‖Ak‖2(‖A5‖1/52 )k

(‖A10‖1/102 )k

‖Ak‖1/k2

Algorithm of Al-Mohy & H (2012)

Truncation bounds use ‖Ak‖1/k rather than ‖A‖, leadingto major benefits in speed and accuracy.Matrix norms not such a blunt tool!Use estimates of ‖Ak‖ (alg of H & Tisseur (2000)).Choose s and m to achieve double precision backwarderror at minimal cost.Initial Schur decomposition: A = QTQ∗.Directly and accurately compute certain elements ofT 1/2s − I and log(T ). Use

a1/2s − 1 =a− 1∏s

i=1(1 + a1/2i ).

Frechét Derivative of Logarithm

f (A + E)− f (A)− L(A,E) = o(‖E‖).Integral formula

L(A,E) =

(t(A− I) + I

)−1E(t(A− I) + I

)−1 dt .

Method based on

X E0 X

[f (X ) L(X ,E)

0 f (X )

Kenney & Laub (1998): Kronecker–Sylvester alg,Padé of tanh(x)/x . Requires complex arithmetic.

Algorithm of Al-Mohy, H & Relton (2012)

Fréchet differentiate the ISS algorithm!

1 E0 = E2 for i = 1: s3 Compute A1/2i .4 Solve the Sylvester eqn A1/2i Ei + EiA1/2i

= Ei−1.5 end6 log(A) ≈ 2srm(A1/2s − I)7 Llog(A,E) ≈ 2sLrm(A1/2s − I,Es)

Backward Error Result

rm(X ) = log(I + X +∆X ),

Lrm(X ,E) = Llog(I + X +∆X ,E +∆E).

Conclusions & Future Directions

Log appears in a growing number of applications.Have good algorithms for log(A), Llog(A) and estimatingthe condition number.If A is real can work entirely in real arithmetic.

Conditioning of f (A).Non-primary functions.Functions ofstructured matrices.

References I

A. H. Al-Mohy and N. J. Higham.A new scaling and squaring algorithm for the matrixexponential.SIAM J. Matrix Anal. Appl., 31(3):970–989, 2009.

A. H. Al-Mohy and N. J. Higham.Improved inverse scaling and squaring algorithms forthe matrix logarithm.SIAM J. Sci. Comput., 34(4):C152–C169, 2012.

References II

A. H. Al-Mohy, N. J. Higham, and S. D. Relton.Computing the fréchet derivative of the matrix logarithmand estimating the condition number.MIMS EPrint 2012.72, Manchester Institute forMathematical Sciences, The University of Manchester,UK, July 2012.17 pp.

T. Charitos, P. R. de Waal, and L. C. van der Gaag.Computing short-interval transition matrices of adiscrete-time Markov chain from partially observeddata.Statistics in Medicine, 27:905–921, 2008.

References III

W. F. Harris.The average eye.Opthal. Physiol. Opt., 24:580–585, 2005.

N. J. Higham.The Matrix Function Toolbox.http://www.ma.man.ac.uk/~higham/mftoolbox.

N. J. Higham.Evaluating Padé approximants of the matrix logarithm.SIAM J. Matrix Anal. Appl., 22(4):1126–1135, 2001.

References IV

N. J. Higham.Functions of Matrices: Theory and Computation.Society for Industrial and Applied Mathematics,Philadelphia, PA, USA, 2008.ISBN 978-0-898716-46-7.xx+425 pp.

N. J. Higham and A. H. Al-Mohy.Computing matrix functions.Acta Numerica, 19:159–208, 2010.

N. J. Higham and L. Lin.On pth roots of stochastic matrices.Linear Algebra Appl., 435(3):448–463, 2011.

References V

C. S. Kenney and A. J. Laub.Condition estimates for matrix functions.SIAM J. Matrix Anal. Appl., 10(2):191–209, 1989.

C. S. Kenney and A. J. Laub.A Schur–Fréchet algorithm for computing the logarithmand exponential of a matrix.SIAM J. Matrix Anal. Appl., 19(3):640–663, 1998.

Research Matters Nick Higham February 25, 2009 School of ...higham/talks/ecm12_log.pdf · Research...

Documents

Transcript of Research Matters Nick Higham February 25, 2009 School of ...higham/talks/ecm12_log.pdf · Research...

David higham twitter presentation birmingham 131008

Nick Higham Research Matters School of Mathematics The ...higham/talks/waseda18.pdf · See Jorge Nocedal’s plenary talk Stochastic Gradient Methods for Machine Learning at SIAM

Early Numerical Linear Algebra in the UKhigham/talks/talk05_early-hist.pdf · 2005. 7. 18. · Early Numerical Linear Algebra in the UK Nick Higham School of Mathematics The University

2009 08 SITA - Stuart Hayward-Higham

School of Mathematics | The University of …higham/talks/fun11.pdfResearch Matters February 25, 2009 Nick Higham Director of Research School of Mathematics 1 / 6 Functions of a Matrix:

C.F.W. Higham , T.F.G. Higham and K. Douka

February 25, 2009 Nick Higham School of ... - WordPress.com · Wordpress. I tend to reach via Twitter: post announcements or recommendations from other readers. Hootsuite Social Media

The University of Manchester - Research Matters Nick ...higham/talks/fun17.pdfNC State, Tuesday April 25, 2017 Multivalued troubleLogarithmInverse Trig/HypAlgorithms Outline 1 The

Blocked Algorithms for the Matrix Square Root - University of …siam/files/snscc12/... · 2013. 1. 7. · Time Series Analysis Operations Research Edvin Deadman , Nick Higham , Rui

Research Matters Nick Higham School of Mathematics The …higham/talks/writing... · 2019. 4. 25. · Better: as shown by Vaughan and Williams [2]. Nick Higham How to Write Scientiﬁc

Research Matters Nick Higham School of Mathematics The …higham/talks/asna13... · 2013. 1. 24. · Justin Gatlin Record, 2006 “The American was given a time of 9.76sec at the

BREED CHAMPIONSHIP SHOW - Higham Press

FERRY ROUTES & TRAVEL ADVICE ISLE OF WIGHT CANINE ASS OCIATION - Higham Press Ltd · 2020-02-27 · Higham Press Ltd., New Street, Higham, Alfreton, Derbys DE55 6BP. Tel: 01773 832390

Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester

HIGHAM & RUSHDEN NEWSLETTER

How and Why to Estimate Condition Numbers for Matrix Functionshigham/talks/claode13.pdf · Research Matters February 25, 2009 Nick Higham Director of Research School of Mathematics

Cayley, Sylvester, and Early Matrix Theory - Society for... · Cayley, Sylvester, and Early Matrix Theory Nick Higham ... NPL: aircraft ﬂutter, matrix structural analysis. Elementary

leeds - Higham Press Ltd · Higham Press Ltd., New Street, Higham, Alfreton, Derbyshire DE55 6BP. Tel. 01773 832390 Fax 01773 520794 E-mail: mail@highampress.co.uk Championship Show

The Rise of Multiprecision Computationshigham/talks/arith17.pdf · Research Matters February 25, 2009 Nick Higham Director of Research School of Mathematics 1 / 6 The Rise of Multiprecision

Research Matters Nick Higham February 25, 2009 School of … · 2019. 6. 4. · the square root of the constant in error bound” rule of thumb—and our results hold for any n!,