Page 1: Global body-wave tomography - Sorbonne Universitéhestia.lgs.jussieu.fr/~boschil/tomography/bodywaves.pdf · off curve” a plot of the misfit (1−variance reduction) achieved

Global body-wave tomography

Lapo Boschi

September 23, 2009

Travel-time anomalies in the infinite-frequency approximation

One example of a relationship between ground displacement as observed on a seismogram and the properties of the Earth (i.e. equation (5) of the previous lecture) is the equation associating the travel time $T$ of a seismic phase to the velocity $v(\mathbf{r})$ of the same phase within the Earth,

$$T = \int_{\text{ray path}} \frac{1}{v(\mathbf{r}(s))}\, ds, \qquad (1)$$

where $s$ denotes the incremental length along the ray path, and $\mathbf{r} = \mathbf{r}(s)$ is the ray-path equation. Equation (1) is based on the assumption that the propagation of seismic waves in the Earth follows the laws of geometrical optics (i.e. Fermat’s principle, Snell’s law): the ray-path equation is derived on the basis of those laws. This approach is referred to as “ray theory”. In the global seismology course (spring semester) we will determine the limits of validity of this assumption: most importantly, the assumption is valid so long as the seismic wavelength is much shorter than the spatial extent of seismically heterogeneous (high- or low-velocity) regions through which the wave travels. Put differently, ray theory is an infinite-frequency approximation.

After a first-order Taylor expansion around a reference model $v_0(\mathbf{r})$,

$$\delta T = -\int_{\text{ray path}} \frac{\delta v(\mathbf{r}(s))}{v_0^2(\mathbf{r}(s))}\, ds, \qquad (2)$$

δT being the travel-time difference between the perturbed and reference models.
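As a numerical check of the linearization in (2), the sketch below compares the exact travel-time difference with the first-order estimate along a fixed ray. The straight ray, the 8 km/s reference velocity and the Gaussian anomaly are hypothetical values chosen only for illustration; the ray path is kept the same in both models, as Fermat’s principle permits to first order.

```python
import numpy as np

def travel_time(v, s):
    """Trapezoidal approximation of T = integral of ds / v along a sampled ray."""
    f = 1.0 / v
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))

# Hypothetical straight ray of length 1000 km, sampled every 1 km.
s = np.linspace(0.0, 1000.0, 1001)
v0 = np.full_like(s, 8.0)                          # reference velocity (km/s)
dv = 0.08 * np.exp(-((s - 500.0) / 100.0) ** 2)    # anomaly peaking at +1% of v0

# Exact travel-time difference between perturbed and reference models.
dT_exact = travel_time(v0 + dv, s) - travel_time(v0, s)

# Linearized difference, eq. (2): dT = -integral of dv / v0^2 ds.
g = -dv / v0**2
dT_linear = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s)))

print(dT_exact, dT_linear)
```

A fast anomaly gives a negative travel-time anomaly, and for a perturbation of about 1% the two values differ only by a term of second order in $\delta v / v_0$.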

Parameterization

If one measures the travel time of a body wave in the true Earth, and derives the corresponding ray path via Fermat’s principle in a reference Earth model (say, PREM), equation (2) can be employed to improve the model, using the observation $\delta T$ to determine a correction $\delta v(\mathbf{r})$¹. We must first get rid of the integral in (2), which is done by writing the unknown function $\delta v(\mathbf{r})$ as a linear combination of $N$ known “basis functions” $f_i(\mathbf{r})$ $(i = 1, ..., N)$:

$$\delta v(\mathbf{r}) = \sum_{i=1}^{N} c_i f_i(\mathbf{r}), \qquad (3)$$

where the $N$ constant coefficients $c_i$ are unknown.

¹ Comparing (2) with equation (5) of lecture 1, you’ll see that the sensitivity kernel here equals $-1/v_0^2(\mathbf{r})$ along the JWKB ray path, 0 elsewhere.


Substituting (3) into (2),

$$\delta T = -\sum_{i=1}^{N} c_i \int_{\text{ray path}} \frac{f_i(\mathbf{r}(s))}{v_0^2(\mathbf{r}(s))}\, ds. \qquad (4)$$

$f_i(\mathbf{r})$ and $v_0(\mathbf{r})$ are known, thus the integral in (4) can be calculated.

Naturally $\delta v(\mathbf{r})$ can only be constrained properly when a large number of observations $\delta T$ are available. Let $\delta T_j$ denote the $j$-th “travel time anomaly” measurement in a set of $M$, all corresponding in general to different seismic sources and stations. We can then write $M$ equations

$$\delta T_j = -\sum_{i=1}^{N} c_i \int_{\text{ray path}_j} \frac{f_i(\mathbf{r}(s))}{v_0^2(\mathbf{r}(s))}\, ds \qquad (j = 1, ..., M). \qquad (5)$$

The known integrals in (5) depend on the two indexes $i$ and $j$. Let us denote them $-A_{ji}$, so that

$$A_{ji} = -\int_{\text{ray path}_j} \frac{f_i(\mathbf{r}(s))}{v_0^2(\mathbf{r}(s))}\, ds;$$

introducing the $M \times N$ matrix $\mathbf{A}$, (5) can be rewritten

$$\delta T_j = \sum_{i=1}^{N} c_i A_{ji}, \qquad (6)$$

or, in tensor notation,

$$\delta\mathbf{T} = \mathbf{A} \cdot \mathbf{c}, \qquad (7)$$

a linear system of $M$ equations.

Parameterization is the choice of the set of basis functions $f_i$. As $N$, in practical applications, is necessarily finite, and has to be as small as possible for the tomographic algorithm to be fast, no parameterization is entirely adequate to describe the seismic velocity distributions in the Earth.
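To make the construction of $\mathbf{A}$ concrete, here is a minimal sketch for a much simplified setting: a 2-D unit square divided into pixels, a constant reference velocity, and straight rays. None of these choices comes from the lecture; the function name `build_A` and the sampling approach are assumptions made for illustration. With pixel basis functions, $A_{ji}$ reduces to minus the length of ray $j$ inside pixel $i$, divided by $v_0^2$.

```python
import numpy as np

def build_A(sources, receivers, nx, ny, v0, n_samples=2000):
    """Approximate A_ji = -(length of ray j inside pixel i) / v0**2 on the unit square."""
    A = np.zeros((len(sources), nx * ny))
    for j, (src, rec) in enumerate(zip(sources, receivers)):
        src, rec = np.asarray(src, float), np.asarray(rec, float)
        length = np.linalg.norm(rec - src)
        # Midpoints of n_samples equal sub-segments of the straight ray.
        t = (np.arange(n_samples) + 0.5) / n_samples
        pts = src + t[:, None] * (rec - src)
        ix = np.minimum((pts[:, 0] * nx).astype(int), nx - 1)
        iy = np.minimum((pts[:, 1] * ny).astype(int), ny - 1)
        # Accumulate ds in the pixel that contains each sub-segment midpoint.
        np.add.at(A[j], iy * nx + ix, length / n_samples)
    return -A / v0**2

# Two hypothetical source-receiver pairs on a 4 x 4 pixel grid.
sources = [(0.0, 0.1), (0.0, 0.9)]
receivers = [(1.0, 0.1), (1.0, 0.4)]
A = build_A(sources, receivers, nx=4, ny=4, v0=1.0)

c = np.full(16, 0.01)     # hypothetical uniform velocity correction
dT = A @ c                # eq. (7)
```

Each row of `A` sums to minus the total ray length divided by $v_0^2$, which is a convenient sanity check on any ray-tracing code of this kind.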

In practice, not one, but two sets of functions (a set of functions of $r$, and a set of functions of $\theta$ and $\phi$) need to be chosen; each $f_i$ is then defined as the product of a “radial” function times a “horizontal” one. Examples of radial functions are Heaviside functions (i.e. “layers”), polynomials of various kinds, splines...; examples of horizontal functions are the characteristic functions of pixels, spherical harmonics, splines...

Figure 1: Examples of spherical harmonic functions. Left to right: $l = 2, m = 0$; $l = 4, m = 2$; $l = 8, m = 4$; $l = 16, m = 16$. Different colours correspond to different signs, different colour intensities to different amplitudes.

I call “voxel” the product of a pixel function times a layer function. Voxels, and the product of spherical harmonics times some polynomials in $r$, are the most commonly employed parameterizations in the literature. Spherical harmonics have useful mathematical properties² arising in spherically symmetric (or approximately spherically symmetric) physical problems, like ours. Voxels are attractive because of their simplicity. Both voxels and spherical harmonics form orthogonal bases, which turns out to be very practical; splines don’t enjoy this property.

Figure 2: Approximately equal-area ($5^\circ \times 5^\circ$ at the equator) pixel subdivision of the globe.

Spherical harmonics are nonzero over the entire surface of the Earth. Then, if spherical harmonics are chosen as basis functions, the integral in eq. (5) will be nonzero for all values of $i$. In the case of a voxel (spline, wavelet or other “local” functions) parameterization, the same integral will be 0, except for values of $i$ whose corresponding voxel is crossed by a ray path. The matrix $\mathbf{A}$ will therefore be dense if the model is parameterized in terms of spherical harmonics or other “global” functions; sparse if “local” functions like voxels are used.
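The contrast between local and global basis functions can be illustrated with a 1-D analogue, which is an assumption made here for brevity and not the spherical geometry of the lecture: a “ray” covering the segment $[a, b]$ of the unit interval, boxcar cells standing in for voxels, and cosines standing in for spherical harmonics. The sketch computes one row of $\mathbf{A}$ (up to the factor $-1/v_0^2$) in each basis and counts the nonzero entries.

```python
import numpy as np

N = 20                      # number of basis functions
a, b = 0.21, 0.47           # hypothetical ray segment on the unit interval

# Local basis: boxcar functions on N equal cells; the integral is the overlap length.
edges = np.linspace(0.0, 1.0, N + 1)
local_row = np.clip(np.minimum(b, edges[1:]) - np.maximum(a, edges[:-1]), 0.0, None)

# Global basis: cos(i*pi*x), i = 1..N; the integral over [a, b] is known analytically.
i = np.arange(1, N + 1)
global_row = (np.sin(i * np.pi * b) - np.sin(i * np.pi * a)) / (i * np.pi)

n_local = np.count_nonzero(local_row)
n_global = np.count_nonzero(np.abs(global_row) > 1e-12)
print(n_local, n_global)
```

Only the cells overlapped by the segment contribute in the local basis, while every cosine contributes in the global one; repeated over many rays, this is what makes $\mathbf{A}$ sparse or dense.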

“Damped” least-squares solution

Since travel time observations are naturally noisy, it is best to overdetermine³ the problem, making sure that the number $N$ of solution coefficients $c_i$ be much smaller than the number $M$ of observations $\delta T_j$. Equation (7) is then solved in the least-squares⁴ sense, and

$$\mathbf{c}_{LS} = (\mathbf{A}^T \cdot \mathbf{A})^{-1} \cdot \mathbf{A}^T \cdot \delta\mathbf{T}, \qquad (8)$$

where the subscript LS reminds us that this is the least-squares solution, not the only possible solution of (7). By definition, the least-squares solution minimizes the sum of squared data residuals, that is, it maximizes the variance reduction:

$$\text{variance reduction} = 1 - \frac{\sum_{i=1}^{M} \left[ (\mathbf{A} \cdot \mathbf{c}_{LS})_i - \delta T_i \right]^2}{\sum_{i=1}^{M} \delta T_i^2} = \text{maximum}. \qquad (9)$$
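Equations (8) and (9) can be tried out on a small synthetic problem. In the sketch below the matrix `A` is random, a hypothetical stand-in for a real ray-path matrix, and the noise level is arbitrary; `np.linalg.lstsq` is used instead of forming the inverse in (8) explicitly, which gives the same solution when $\mathbf{A}^T \cdot \mathbf{A}$ is well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 200, 10                        # many more data than unknowns
A = rng.normal(size=(M, N))           # hypothetical stand-in for the ray-path matrix
c_true = rng.normal(size=N)
dT = A @ c_true + 0.1 * rng.normal(size=M)   # noisy synthetic travel-time anomalies

# Least-squares solution; lstsq avoids forming (A^T A)^-1 explicitly.
c_ls, *_ = np.linalg.lstsq(A, dT, rcond=None)

def variance_reduction(A, c, dT):
    """Eq. (9): 1 - sum of squared residuals / sum of squared data."""
    r = A @ c - dT
    return 1.0 - (r @ r) / (dT @ dT)

vr_ls = variance_reduction(A, c_ls, dT)
vr_true = variance_reduction(A, c_true, dT)
print(vr_ls, vr_true)
```

Because of the noise, the least-squares model fits the data at least as well as the model that generated them: `vr_ls` is never smaller than `vr_true`.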

² e.g., Dahlen, F. A., and J. Tromp, Theoretical Global Seismology, Princeton Univ. Press 1998, Appendix B.

³ Menke, W., Geophysical Data Analysis: Discrete Inverse Theory, Academic Press 1989.

⁴ Trefethen, L. N., and D. Bau, Numerical Linear Algebra, SIAM 1997, theorem 11.1.

Equation (8) is apparently simple, but its numerical implementation carries a number of problems.

Regularization, or damping

The matrix $\mathbf{A}^T \cdot \mathbf{A}$ often turns out to be singular or very close to singular (numerically singular); the inverse problem must then be regularized by forcing $\mathbf{c}_{LS}$ to satisfy certain requirements, reflecting our a-priori knowledge of the velocity distribution. For example, we know that a reference model like PREM already provides a relatively good fit of the data, so it is legitimate to expect $\delta v(\mathbf{r})$ to be small. The “size” of a model $\delta v(\mathbf{r})$ can be quantified as

$$\int_V |\delta v(\mathbf{r})|^2\, dV, \qquad (10)$$

$V$ being the entire volume of the Earth. If we substitute (3) into (10), and assume the functions $f_i$ to be orthogonal to each other (note that spherical harmonics and voxels share this property),

$$\int_V \left( \sum_{i=1}^{N} c_i f_i(\mathbf{r}) \right) \left( \sum_{j=1}^{N} c_j f_j(\mathbf{r}) \right) dV = \sum_{i=1}^{N} c_i^2 \int_V f_i^2(\mathbf{r})\, dV. \qquad (11)$$

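The values of $\mathbf{c}$ for which (10) is minimum are then given by the system of $N$ equations $(k = 1, ..., N)$

$$\frac{\partial}{\partial c_k} \left( \sum_{i=1}^{N} c_i^2 \int_V f_i^2(\mathbf{r})\, dV \right) = 0, \qquad (12)$$

or, after differentiation and division by 2,

$$c_k \int_V f_k^2(\mathbf{r})\, dV = 0 \qquad (k = 1, ..., N), \qquad (13)$$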

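which in tensor notation becomes

$$\mathbf{B} \cdot \mathbf{c} = \mathbf{0}, \qquad (14)$$

where $\mathbf{B}$ is the $N \times N$ diagonal matrix whose $k$-th diagonal entry equals the integral in (13). Regularization then consists of replacing the inverse problem (7) with

$$\begin{pmatrix} \mathbf{A} \\ \lambda \mathbf{B} \end{pmatrix} \cdot \mathbf{c} = \begin{pmatrix} \delta\mathbf{T} \\ \mathbf{0} \end{pmatrix}, \qquad (15)$$

where the value of $\lambda$ can be adjusted to regularize the inversion more or less strongly. The formula for the least-squares solution of (15) gives

$$\mathbf{c}_{LS} = (\mathbf{A}^T \cdot \mathbf{A} + \lambda^2 \mathbf{B}^T \cdot \mathbf{B})^{-1} \cdot \mathbf{A}^T \cdot \delta\mathbf{T}. \qquad (16)$$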

Regularization is often referred to as “damping”; $\lambda$ is then called “damping parameter” and (16) “damped” least-squares solution. Another common regularization constraint is the requirement that the integral of the squared gradient of $\delta v(\mathbf{r})$ be minimum.
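The equivalence between the augmented system (15) and the damped normal equations (16) can be verified numerically. The sketch below uses a random, deliberately underdetermined `A` and a random positive diagonal `B`; both are hypothetical stand-ins, and the value of `lam` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 30, 50                          # fewer data than unknowns: A^T A is singular
A = rng.normal(size=(M, N))
dT = rng.normal(size=M)
B = np.diag(rng.uniform(0.5, 1.5, size=N))   # hypothetical diagonal B (integrals of f_k^2)
lam = 0.3

# Eq. (15): stack A on top of lambda*B, and dT on top of N zeros.
A_aug = np.vstack([A, lam * B])
d_aug = np.concatenate([dT, np.zeros(N)])
c_aug, *_ = np.linalg.lstsq(A_aug, d_aug, rcond=None)

# Eq. (16): the corresponding normal equations.
c_16 = np.linalg.solve(A.T @ A + lam**2 * B.T @ B, A.T @ dT)

print(np.max(np.abs(c_aug - c_16)))
```

Without damping the matrix $\mathbf{A}^T \cdot \mathbf{A}$ in this example has rank 30 at most and cannot be inverted; with any $\lambda > 0$ the damped matrix is positive-definite and both routes give the same model.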

The selection of damping scheme and parameters is to a large extent arbitrary. It requires that a number of preliminary inversions be performed with different damping. We call “trade-off curve” a plot of the misfit (1 − variance reduction) achieved by each solution model as a function of its complexity (for example, the integral over the Earth’s volume of the squared value of the model, or of its squared gradient). The trade-off curve has a large derivative where a small increase in model complexity can quickly decrease the misfit (improve the fit). Above a certain complexity, the decrease in misfit becomes smaller and eventually negligible. This results in the characteristic L-shape: the trade-off curve is also called “L-curve”. We want to keep increasing model complexity only so long as this results in a significant improvement of the fit; we want to avoid increasing the complexity of the model if that does not bring a significant improvement in the fit. Then, preferable solutions are found away from both extremes, i.e., near the vertex of the “L”. Even after this analysis, however, tomographers are typically left with a broad spectrum of acceptable solutions.
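A trade-off curve of this kind can be computed by repeating the damped inversion for a range of damping parameters. The sketch below is only an illustration: the matrix is random with columns of decreasing sensitivity, norm damping with $\mathbf{B}$ equal to the identity is assumed, and the range of $\lambda$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 120, 60
A = rng.normal(size=(M, N)) * np.logspace(0, -3, N)   # columns of decreasing sensitivity
c_true = rng.normal(size=N)
dT = A @ c_true + 0.05 * rng.normal(size=M)

lambdas = np.logspace(-3, 1, 9)
misfit, norm2 = [], []
for lam in lambdas:
    # Norm damping with B = identity (an assumption made for simplicity).
    c = np.linalg.solve(A.T @ A + lam**2 * np.eye(N), A.T @ dT)
    r = A @ c - dT
    misfit.append((r @ r) / (dT @ dT))   # 1 - variance reduction
    norm2.append(c @ c)                  # model complexity (squared norm)

for lam, m, n in zip(lambdas, misfit, norm2):
    print(f"lambda = {lam:8.3f}   misfit = {m:.5f}   squared norm = {n:10.3f}")
```

Plotting `misfit` against `norm2` gives the L-curve: as $\lambda$ grows the model norm decreases and the misfit increases, and the preferred damping lies near the corner.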

Figure 3: How the L-curve criterion works. To the left: tomographic map of surface-wave phase velocity (more on this in the next lecture) perturbation to PREM, corresponding to shear velocity at ∼100 km, with isolines showing temperature at the same depth estimated from heat flow observations. The tomographic map corresponds to the point of maximum curvature of the L-curve (plot at the bottom right). Notice that maximum L-curve curvature corresponds to highest correlation between tomography and independent geothermal data (top right). (Courtesy of Julia Schafer, ETH Zurich.)

Your homework assignment will allow you to experiment with both size (or “norm”) and roughness damping. You should try and find a trade-off curve like those in figure 3.

Least-squares algorithms

Least-squares solvers can be subdivided into two main classes: direct and iterative. Direct algorithms consist of efficient approaches to the implementation of eq. (16), taking advantage of any peculiar properties of the matrix to be inverted. In our case, $(\mathbf{A}^T \cdot \mathbf{A} + \lambda^2 \mathbf{B}^T \cdot \mathbf{B})$ is by construction symmetric and positive-definite, so the most efficient direct inversion algorithm, Cholesky factorization, can be applied. Cholesky factorization and other direct algorithms (the most popular being perhaps the singular value decomposition) are discussed in detail, and with great clarity, in the textbook of Trefethen and Bau, cited above.

Figure 4: One of the earliest global tomographic models based on travel-time observations: the P-velocity model published by Dziewonski in 1984 (in JGR). The basis functions are products of spherical harmonics up to degree 6, and, radially, Legendre polynomials up to degree 4, resulting in $49 \times 5 = 245$ model coefficients total.

Figure 5: High resolution models ($\sim 10^5$ voxels) of the mantle. Left is a P-velocity map published in 1997 by van der Hilst and co-authors. Right is an S-velocity map derived independently and published in 1994 by Grand.

Figure 6: The models of figure 5, plotted in “cross section” (i.e. on a vertical plane perpendicular to the Earth’s surface, going down to the core-mantle boundary). The authors associated the fast anomaly to subducted material.
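The direct, Cholesky-based solution of eq. (16) described above can be sketched in a few lines. The matrices are random stand-ins and $\mathbf{B}$ is taken to be the identity, both assumptions made for illustration; `np.linalg.solve` is used for the two triangular systems only for brevity, where a dedicated triangular solver would exploit the structure of the factor.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 80, 20
A = rng.normal(size=(M, N))           # hypothetical stand-in for the ray-path matrix
dT = rng.normal(size=M)
lam = 0.5
B = np.eye(N)

G = A.T @ A + lam**2 * B.T @ B        # symmetric, positive-definite for lam > 0
L = np.linalg.cholesky(G)             # G = L @ L.T, with L lower triangular

# Solve G c = A^T dT through the two triangular systems L y = rhs and L^T c = y.
y = np.linalg.solve(L, A.T @ dT)
c = np.linalg.solve(L.T, y)

print(np.max(np.abs(G @ c - A.T @ dT)))
```

The factorization is computed once and can be reused for any number of right-hand sides, for instance when inverting several synthetic data vectors with the same damping.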

Iterative algorithms, like “conjugate gradients” or LSQR, are faster than direct ones only if $\mathbf{A}$ is a sparse matrix. Iterative algorithms do not involve the implementation of eq. (16); the matrix $\mathbf{A}^T \cdot \mathbf{A}$ is not needed, as iterative algorithms operate directly on $\mathbf{A}$. An iterative formula is applied to calculate $\mathbf{c}_{LS}$ in a number of successive steps; the iterative solution is guaranteed (in exact arithmetic) to converge to $\mathbf{c}_{LS}$ after $N$ iterations are performed, but iterative least-squares solvers include a criterion to determine, at each iteration, if convergence has been achieved to a sufficiently good approximation; in global tomography, a number of iterations $\ll N$ is usually enough⁵.
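As an illustration of an iterative solver that never forms $\mathbf{A}^T \cdot \mathbf{A}$, here is a minimal sketch of conjugate gradients applied to the normal equations (often called CGLS). It is not LSQR and not the code used in the homework; the function name `cgls`, the stopping rule and the dense random test matrix are all assumptions. Only products with `A` and `A.T` appear, which is what makes such schemes cheap when $\mathbf{A}$ is sparse.

```python
import numpy as np

def cgls(A, d, n_iter, tol=1e-10):
    """Conjugate gradients on the normal equations, using only products with A and A.T."""
    c = np.zeros(A.shape[1])
    r = d - A @ c                # data residual
    s = A.T @ r                  # A^T r, which vanishes at the least-squares solution
    p = s.copy()
    gamma = s @ s
    gamma0 = gamma
    for k in range(n_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        c += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new <= tol**2 * gamma0:   # convergence criterion on ||A^T r||
            return c, k + 1
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return c, n_iter

rng = np.random.default_rng(6)
M, N = 400, 40
A = rng.normal(size=(M, N))      # hypothetical, well-conditioned test matrix
d = rng.normal(size=M)

c_it, n_used = cgls(A, d, n_iter=N, tol=1e-8)
c_ref, *_ = np.linalg.lstsq(A, d, rcond=None)
print(n_used, np.max(np.abs(c_it - c_ref)))
```

In this well-conditioned example the stopping criterion is met in fewer than $N$ iterations. A damped version can be obtained by applying the same function to the augmented system (15).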

Because spherical harmonic parameterizations require $\mathbf{A}$ to be dense, iterative inversion algorithms are a prerogative of inverse problems formulated in terms of voxels or other local functions. Early global tomographers preferred to make use of spherical harmonics (see the mid-80s papers of Adam Dziewonski and John Woodhouse, where the Earth’s mantle was described by $\sim 10^2$ harmonic coefficients). In the 1990s, combining iterative algorithms and voxel parameterizations, Rob van der Hilst, Steve Grand and others⁶ were able to derive tomographic models of the Earth’s mantle described by $\sim 10^5$ parameters.

For a discussion of the implications of different parameterizations, and associated regularization schemes and inversion algorithms, see, e.g., Boschi, L., Applications of Linear Inverse Theory in Modern Global Seismology⁷, Harvard University 2001.

Resolution

The resolution of a tomographic image is the highest spatial frequency at which the image is expected to be meaningful. In other words, it is the smallest spatial extent of a velocity anomaly that the inverted seismic database can properly map.

Resolution is limited by two factors: the quality of the data and the uniformity of their spatial distribution (the “data coverage”), and the approximations involved in the formulation of the inverse problem.

We have seen that our formulation rests on the JWKB solution of the Earth’s equation of motion: we then expect the latter issue to be relevant when measurements of relatively low-frequency waves are inverted. We will be concerned with this problem after a few more lectures but, for the time being, let us assume that we are working at high enough frequencies.

If the effect of flaws in the theory is negligible, we can use the approximated theory to build a “theoretical” data vector $\delta\mathbf{T}_{syn}$, based on a known “input” model of the Earth, and then invert it following the method described above. If resolution is good, the solution model that we find should be very close to the input one.

If I call $\mathbf{c}_{in}$ the coefficients of the input model in the chosen parameterization, then from equation (7)

$$\delta\mathbf{T}_{syn} = \mathbf{A} \cdot \mathbf{c}_{in}. \qquad (17)$$

⁵ Your homework assignment gives you a chance to verify this statement.

⁶ Grand, S., 1994. Mantle shear structure between the Americas and surrounding oceans, JGR vol. 99, page 11,591. Van der Hilst, R. D., Widiyantoro, S., and Engdahl, E. R., 1997. Evidence for deep mantle circulation from global tomography, Nature vol. 386, page 578.

⁷ http://www.seg2.ethz.ch/boschil/thesis.pdf

$\delta\mathbf{T}_{syn}$ can then be substituted for $\delta\mathbf{T}$ in (15) and (16), and the “output” model $\mathbf{c}_{LS}$ derived by direct or iterative implementation of the latter equation. $\mathbf{c}_{LS}$ is then compared to $\mathbf{c}_{in}$. Note that, for this resolution test to make sense, the same regularization scheme should be employed as when inverting real data.

Figure 7: The simplest way to evaluate the resolution of a database is to subdivide into voxels the volume to be mapped, and count the number of ray paths crossing each voxel. (Boschi and Dziewonski, 1999.)

Tests of this type are referred to as “synthetic” tests (hence the subscript syn to denote theoretical, “synthetic” data), or “checkerboard” tests, if velocity anomalies in the input model are distributed like a checkerboard, with constant values of alternating sign.
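A checkerboard test can be sketched on a small 2-D pixel grid. Everything specific here is an assumption made for illustration: the matrix `A` is a random sparse stand-in for real ray coverage (positive entries, i.e. the factor $-1/v_0^2$ is ignored), $\mathbf{B}$ is the identity, and the damping value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
nx = ny = 8
N, M = nx * ny, 400

# Hypothetical stand-in for a ray-path matrix: each "ray" samples about 10% of the pixels.
A = rng.uniform(0.0, 1.0, size=(M, N)) * (rng.uniform(size=(M, N)) < 0.1)

# Checkerboard input model: constant values of alternating sign.
ix, iy = np.meshgrid(np.arange(nx), np.arange(ny))
c_in = np.where((ix + iy) % 2 == 0, 1.0, -1.0).ravel()

dT_syn = A @ c_in                                   # eq. (17)
lam = 0.1                                           # same damping as for real data
c_out = np.linalg.solve(A.T @ A + lam**2 * np.eye(N), A.T @ dT_syn)   # eq. (16), B = I

correlation = np.corrcoef(c_in, c_out)[0, 1]
print(correlation)
```

With this dense and uniform coverage the checkerboard is recovered almost perfectly. Removing rays from part of the grid, or increasing `lam`, degrades the output model in the way a real checkerboard test is meant to reveal.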

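The measure of resolution provided by synthetic tests can change depending on the form of the input model, and has therefore only a relative value. Resolution can be quantified more rigorously if the resolution matrix $\mathbf{R}$ is calculated. $\mathbf{R}$ can be introduced by substituting the expression (17) for $\delta\mathbf{T}_{syn}$ in place of $\delta\mathbf{T}$ in equation (16). Then,

$$\mathbf{c}_{LS} = (\mathbf{A}^T \cdot \mathbf{A} + \lambda^2 \mathbf{B}^T \cdot \mathbf{B})^{-1} \cdot \mathbf{A}^T \cdot \mathbf{A} \cdot \mathbf{c}_{in}. \qquad (18)$$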

Instead of deriving $\mathbf{c}_{LS}$ from (17) and then comparing it to $\mathbf{c}_{in}$, one can look at the matrix that relates those two vectors, and see how close it is to the identity matrix. We call it resolution matrix, and denote it $\mathbf{R}$; from (18), $\mathbf{R} = (\mathbf{A}^T \cdot \mathbf{A} + \lambda^2 \mathbf{B}^T \cdot \mathbf{B})^{-1} \cdot \mathbf{A}^T \cdot \mathbf{A}$. Note that calculating $\mathbf{R}$ requires that $\mathbf{A}^T \cdot \mathbf{A} + \lambda^2 \mathbf{B}^T \cdot \mathbf{B}$ be inverted; we have seen that this computation is very expensive; high-resolution tomographic algorithms, which take advantage of the sparsity of $\mathbf{A}$ and of iterative least-squares solvers, cannot provide $\mathbf{R}$. This issue is discussed in some recent articles by Boschi (GRL 2003) and Soldati & Boschi (GJI 2005)⁸.
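For a small problem the resolution matrix can be computed directly from eq. (18). In the sketch below the matrix is a random stand-in in which the last five parameters are made almost insensitive to the data, to mimic poorly sampled voxels; this setup, $\mathbf{B}$ equal to the identity and the value of `lam` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 100, 25
A = rng.normal(size=(M, N))          # hypothetical stand-in for the ray-path matrix
A[:, -5:] *= 1e-3                    # five poorly sampled parameters
lam = 0.5
B = np.eye(N)

G = A.T @ A + lam**2 * B.T @ B
R = np.linalg.solve(G, A.T @ A)      # R = G^-1 A^T A, so that c_LS = R c_in

# Consistency with the synthetic test: invert data computed from a known input model.
c_in = rng.normal(size=N)
c_ls = np.linalg.solve(G, A.T @ (A @ c_in))

diag = np.diag(R)
print(diag.round(3))
```

The diagonal of `R` is close to 1 for the well-sampled parameters and close to 0 for the poorly sampled ones, whose output values are controlled by the damping and not by the data.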

Figure 8: Example of a “checkerboard” test, from Soldati and Boschi, GJI 2005.

⁸ http://www.seg2.ethz.ch/boschil/publications.html

Figure 9: Example of $\mathbf{R}$. Left: the full matrix, smoothed to be plotted on a limited space. Right: detail of $\mathbf{R}$, associated with the eleventh layer (strictly speaking, the eleventh radial spline; see Boschi, GRL 2003, “Measures of resolution in global body-wave tomography”). The basis function index grows first West to East, then North to South, then from the top to the bottom of the mantle. The total number of basis functions here equals 7240; $\mathbf{R}$ is a $7240 \times 7240$ matrix.