
    INVERSE PROBLEMS IN GEOPHYSICS

    GEOS 567

    A Set of Lecture Notes

    by

    Professors Randall M. Richardson and George Zandt

    Department of Geosciences

    University of Arizona

    Tucson, Arizona 85721

    Revised and Updated Fall 2009


    TABLE OF CONTENTS

    PREFACE
    CHAPTER 1: INTRODUCTION
        1.1 Inverse Theory: What It Is and What It Does
        1.2 Useful Definitions
        1.3 Possible Goals of an Inverse Analysis
        1.4 Nomenclature
        1.5 Examples of Forward Problems
            1.5.1 Example 1: Fitting a Straight Line
            1.5.2 Example 2: Fitting a Parabola
            1.5.3 Example 3: Acoustic Tomography
            1.5.4 Example 4: Seismic Tomography
            1.5.5 Example 5: Convolution
        1.6 Final Comments
    CHAPTER 2: REVIEW OF LINEAR ALGEBRA AND STATISTICS
        2.1 Introduction
        2.2 Matrices and Linear Transformations
            2.2.1 Review of Matrix Manipulations
            2.2.2 Matrix Transformations
            2.2.3 Matrices and Vector Spaces
        2.3 Probability and Statistics
            2.3.1 Introduction
            2.3.2 Definitions, Part 1
            2.3.3 Some Comments on Applications to Inverse Theory
            2.3.4 Definitions, Part 2
    CHAPTER 3: INVERSE METHODS BASED ON LENGTH
        3.1 Introduction
        3.2 Data Error and Model Parameter Vectors
        3.3 Measures of Length
        3.4 Minimizing the Misfit: Least Squares
            3.4.1 Least Squares Problem for a Straight Line
            3.4.2 Derivation of the General Least Squares Solution
            3.4.3 Two Examples of Least Squares Problems
            3.4.4 Four-Parameter Tomography Problem
        3.5 Determinancy of Least Squares Problems
            3.5.1 Introduction
            3.5.2 Even-Determined Problems: M = N
            3.5.3 Overdetermined Problems: Typically, N > M
            3.5.4 Underdetermined Problems: Typically, M > N
        3.6 Minimum Length Solution
            3.6.1 Background Information
            3.6.2 Lagrange Multipliers
            3.6.3 Application to the Purely Underdetermined Problem


            3.6.4 Comparison of Least Squares and Minimum Length Solutions
            3.6.5 Example of Minimum Length Problem
        3.7 Weighted Measures of Length
            3.7.1 Introduction
            3.7.2 Weighted Least Squares
            3.7.3 Weighted Minimum Length
            3.7.4 Weighted Damped Least Squares
        3.8 A Priori Information and Constraints
            3.8.1 Introduction
            3.8.2 A First Approach to Including Constraints
            3.8.3 A Second Approach to Including Constraints
            3.8.4 Example From Seismic Receiver Functions
        3.9 Variance of the Model Parameters
            3.9.1 Introduction
            3.9.2 Application to Least Squares
            3.9.3 Application to the Minimum Length Problem
            3.9.4 Geometrical Interpretation of Variance
    CHAPTER 4: LINEARIZATION OF NONLINEAR PROBLEMS
        4.1 Introduction
        4.2 Linearization of Nonlinear Problems
        4.3 General Procedure for Nonlinear Problems
        4.4 Three Examples
            4.4.1 A Linear Example
            4.4.2 A Nonlinear Example
            4.4.3 Nonlinear Straight-Line Example
        4.5 Creeping vs. Jumping (Shaw and Orcutt, 1985)
    CHAPTER 5: THE EIGENVALUE PROBLEM
        5.1 Introduction
        5.2 The Eigenvalue Problem for Square (M × M) Matrix A
            5.2.1 Background
            5.2.2 How Many Eigenvalues, Eigenvectors?
            5.2.3 The Eigenvalue Problem in Matrix Notation
            5.2.4 Summarizing the Eigenvalue Problem for A
        5.3 Geometrical Interpretation of the Eigenvalue Problem for Symmetric A
            5.3.1 Introduction
            5.3.2 Geometrical Interpretation
            5.3.3 Coordinate System Rotation
            5.3.4 Summarizing Points
        5.4 Decomposition Theorem for Square A
            5.4.1 The Eigenvalue Problem for AT
            5.4.2 Eigenvectors for AT
            5.4.3 Decomposition Theorem for Square Matrices
            5.4.4 Finding the Inverse A⁻¹ for the M × M Matrix A
            5.4.5 What Happens When There Are Zero Eigenvalues?
            5.4.6 Some Notes on the Properties of SP and RP
        5.5 Eigenvector Structure of mLS
            5.5.1 Square Symmetric A Matrix With Nonzero Eigenvalues
            5.5.2 The Case of Zero Eigenvalues
            5.5.3 Simple Tomography Problem Revisited


    CHAPTER 6: SINGULAR-VALUE DECOMPOSITION (SVD)
        6.1 Introduction
        6.2 Formation of a New Matrix B
            6.2.1 Formulating the Eigenvalue Problem With G
            6.2.2 The Role of GT as an Operator
        6.3 The Eigenvalue Problem for B
            6.3.1 Properties of B
            6.3.2 Partitioning W
        6.4 Solving the Shifted Eigenvalue Problem
            6.4.1 The Eigenvalue Problem for GTG
            6.4.2 The Eigenvalue Problem for GGT
        6.5 How Many λi Are There, Anyway??
            6.5.1 Introducing P, the Number of Nonzero Pairs (+λi, −λi)
            6.5.2 Finding the Eigenvector Associated with −λi
            6.5.3 No New Information From the −λi System
            6.5.4 What About the Zero Eigenvalues λi, i = 2(P + 1), . . . , N + M?
            6.5.5 How Big is P?
        6.6 Introducing Singular Values
            6.6.1 Introduction
            6.6.2 Definition of the Singular Value
            6.6.3 Definition of Λ, the Singular-Value Matrix
        6.7 Derivation of the Fundamental Decomposition Theorem for General G (N × M, N ≠ M)
        6.8 Singular-Value Decomposition (SVD)
            6.8.1 Derivation of Singular-Value Decomposition
            6.8.2 Rewriting the Shifted Eigenvalue Problem
            6.8.3 Summarizing SVD
        6.9 Mechanics of Singular-Value Decomposition
        6.10 Implications of Singular-Value Decomposition
            6.10.1 Relationships Between U, UP, and U0
            6.10.2 Relationships Between V, VP, and V0
            6.10.3 Graphic Representation of U, UP, U0, V, VP, and V0 Spaces
        6.11 Classification of d = Gm Based on P, M, and N
            6.11.1 Introduction
            6.11.2 Class I: P = M = N
            6.11.3 Class II: P = M


            7.3.4 The Unit (Model) Covariance Matrix [covum]
            7.3.5 A Closer Look at Stability
            7.3.6 Combining R, N, [covum]
            7.3.7 An Illustrative Example
        7.4 Quantifying the Quality of R, N, and [covum]
            7.4.1 Introduction
            7.4.2 Classes of Problems
            7.4.3 Effect of the Generalized Inverse Operator Gg⁻¹
        7.5 Resolution Versus Stability
            7.5.1 Introduction
            7.5.2 R, N, and [covum] for Nonlinear Problems
    CHAPTER 8: VARIATIONS OF THE GENERALIZED INVERSE
        8.1 Linear Transformations
            8.1.1 Analysis of the Generalized Inverse Operator Gg⁻¹
            8.1.2 Gg⁻¹ Operating on a Data Vector d
            8.1.3 Mapping Between Model and Data Space: An Example
        8.2 Including Prior Information, or the Weighted Generalized Inverse
            8.2.1 Mathematical Background
            8.2.2 Coordinate System Transformation of Data and Model Parameter Vectors
            8.2.3 The Maximum Likelihood Inverse Operator, Resolution, and Model Covariance
            8.2.4 Effect on Model- and Data-Space Eigenvectors
            8.2.5 An Example
        8.3 Damped Least Squares and the Stochastic Inverse
            8.3.1 Introduction
            8.3.2 The Stochastic Inverse
            8.3.3 Damped Least Squares
        8.4 Ridge Regression
            8.4.1 Mathematical Background
            8.4.2 The Ridge Regression Operator
            8.4.3 An Example of Ridge Regression Analysis
        8.5 Maximum Likelihood
            8.5.1 Background
            8.5.2 The General Case
    CHAPTER 9: CONTINUOUS INVERSE THEORY AND OTHER APPROACHES
        9.1 Introduction
        9.2 The Backus-Gilbert Approach
        9.3 Neural Networks
        9.4 The Radon Transform and Tomography (Approach 1)
            9.4.1 Introduction
            9.4.2 Interpretation of Tomography Using the Radon Transform
            9.4.3 Slant-Stacking as a Radon Transform (following Claerbout, 1985)
        9.5 A Review of the Radon Transform (Approach 2)
        9.6 Alternative Approach to Tomography


    PREFACE

    This set of lecture notes has its origin in a nearly incomprehensible course in inverse theory that I took as a first-semester graduate student at MIT. My goal, as a teacher and in these notes, is to present inverse theory in such a way that it is not only comprehensible but useful.

    Inverse theory, loosely defined, is the fine art of inferring as much as possible about a problem from all available information. Information takes both the traditional form of data, as well as the relationship between actual and predicted data. In a nuts-and-bolts definition, it is one (some would argue the best!) way to find and assess the quality of a solution to some (mathematical) problem of interest.

    Inverse theory has two main branches dealing with discrete and continuous problems, respectively. This text concentrates on the discrete case, covering enough material for a single-semester course. A background in linear algebra, probability and statistics, and computer programming will make the material much more accessible. Review material is provided on the first two topics in Chapter 2.

    This text could stand alone. However, it was written to complement and extend the material covered in the supplemental text for the course, which deals more completely with some areas. Furthermore, these notes make numerous references to sections in the supplemental text. Besides, the supplemental text is, by far, the best textbook on the subject and should be a part of the library of anyone interested in inverse theory. The supplemental text is:

    Geophysical Data Analysis: Discrete Inverse Theory (Revised Edition)
    by William Menke, Academic Press, 1989.

    The course format is largely lecture. We may, from time to time, read articles from the literature and work in a seminar format. I will try to schedule a couple of guest lectures on applications. Be forewarned: there is a lot of homework for this course, and the assignments are occasionally very time consuming. I make every effort to avoid pure algebraic nightmares, but my general philosophy is summarized below:

    I hear, and I forget.
    I see, and I remember.
    I do, and I understand.

    Chinese Proverb

    I try to have you do a simple problem by hand before turning you loose on the computer, where all realistic problems must be solved. You will also have access to existing code and a computer account on a SPARC workstation. You may use and modify the code for some of the homework and for the term project. The term project is an essential part of the learning process and, I hope, will help you tie the course work together. Grading for this course will be as follows:

    60% Homework
    30% Term Project
    10% Class Participation

    Good luck, and may you find the trade-off between stability and resolution less traumatic than most, on average.

    Randy Richardson
    August 2009


    CHAPTER 1: INTRODUCTION

    1.1 Inverse Theory: What It Is and What It Does

    Inverse theory, at least as I choose to define it, is the fine art of estimating model parameters from data. It requires a knowledge of the forward model capable of predicting data if the model parameters were, in fact, already known. Anyone who attempts to solve a problem in the sciences is probably using inverse theory, whether or not he or she is aware of it. Inverse theory, however, is capable (at least when properly applied) of doing much more than just estimating model parameters. It can be used to estimate the quality of the predicted model parameters. It can be used to determine which model parameters, or which combinations of model parameters, are best determined. It can be used to determine which data are most important in constraining the estimated model parameters. It can determine the effects of noisy data on the stability of the solution. Furthermore, it can help in experimental design by determining where, what kind, and how precise data must be to determine model parameters.

    Inverse theory is, however, inherently mathematical and as such does have its limitations. It is best suited to estimating the numerical values of, and perhaps some statistics about, model parameters for some known or assumed mathematical model. It is less well suited to provide the fundamental mathematics or physics of the model itself. I like the example Albert Tarantola gives in the introduction of his classic book¹ on inverse theory. He says, ". . . you can always measure the captain's age (for instance by picking his passport), but there are few chances for this measurement to carry much information on the number of masts of the boat." You must have a good idea of the applicable forward model in order to take advantage of inverse theory. Sooner or later, however, most practitioners become rather fanatical about the benefits of a particular approach to inverse theory. Consider the following as an example of how, or how not, to apply inverse theory. The existence or nonexistence of a God is an interesting question. Inverse theory, however, is poorly suited to address this question. However, if one assumes that there is a God and that She makes angels of a certain size, then inverse theory might well be appropriate to determine the number of angels that could fit on the head of a pin. Now, who said practitioners of inverse theory tend toward the fanatical?

    In the rest of this chapter, I will give some useful definitions of terms that will come up time and again in inverse theory, and give some examples, mostly from Menke's book, of how to set up forward problems in an attempt to clearly identify model parameters from data.

    ¹ Inverse Problem Theory, by Albert Tarantola, Elsevier Scientific Publishing Company, 1987.


    1.2 Useful Definitions

    Let us begin with some definitions of things like forward and inverse theory, models and model parameters, data, etc.

    Forward Theory: The (mathematical) process of predicting data based on some physical or mathematical model with a given set of model parameters (and perhaps some other appropriate information, such as geometry, etc.).

    Schematically, one might represent this as follows:

    model parameters  →  model  →  predicted data

    As an example, consider the two-way vertical travel time t of a seismic wave through M layers of thickness hi and velocity vi. Then t is given by

    $$t = 2 \sum_{i=1}^{M} \frac{h_i}{v_i} \qquad (1.1)$$

    The forward problem consists of predicting data (travel time) based on a (mathematical) model of how seismic waves travel. Suppose that for some reason thickness was known for each layer (perhaps from drilling). Then only the M velocities would be considered model parameters. One would obtain a particular travel time t for each set of model parameters one chooses.

    Inverse Theory: The (mathematical) process of predicting (or estimating) the numerical values (and associated statistics) of a set of model parameters of an assumed model based on a set of data or observations.

    Schematically, one might represent this as follows:

    data  →  model  →  predicted (or estimated) model parameters

    As an example, one might invert the travel time t above to determine the layer velocities. Note that one needs to know the (mathematical) model relating travel time to layer thickness and velocity information. Inverse theory should not be expected to provide the model itself.

    Model: The model is the (mathematical) relationship between model parameters (and other auxiliary information, such as the layer thickness information in the previous example) and the data. It may be linear or nonlinear, etc.


    Model Parameters: The model parameters are the numerical quantities, or unknowns, that one is attempting to estimate. The choice of model parameters is usually problem dependent, and quite often arbitrary. For example, in the case of travel times cited earlier, layer thickness is not considered a model parameter, while layer velocity is. There is nothing sacred about these choices. As a further example, one might choose to cast the previous example in terms of slowness si, where

    si = 1 / vi (1.2)

    Travel time t is a nonlinear function of layer velocities but a linear function of layer slowness. As you might expect, it is much easier to solve linear than nonlinear inverse problems. A more serious problem, however, is that linear and nonlinear formulations may result in different estimates of velocity if the data contain any noise. The point I am trying to impress on you now is that there is quite a bit of freedom in the way model parameters are chosen, and it can affect the answers you get!
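    To see the linearity explicitly, substitute si = 1/vi into Equation (1.1); a brief rearrangement (added here only for illustration, not part of the original notes) gives

    $$t = 2 \sum_{i=1}^{M} h_i s_i = \begin{bmatrix} 2h_1 & 2h_2 & \cdots & 2h_M \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_M \end{bmatrix}$$

    which is already in the explicit linear form d = Gm, with the single datum t, a model vector of slownesses, and a 1 × M matrix G whose entries 2hi are known constants.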

    Data: Data are simply the observations or measurements one makes in an attempt to constrain the solution of some problem of interest. Travel time in the example above is an example of data. There are, of course, many other examples of data.

    Some examples of inverse problems (mostly from Menke) follow:

    Medical tomography
    Earthquake location
    Earthquake moment tensor inversion
    Earth structure from surface or body wave inversion
    Plate velocities (kinematics)
    Image enhancement
    Curve fitting
    Satellite navigation
    Factor analysis

    1.3 Possible Goals of an Inverse Analysis

    Now let us turn our attention to some of the possible goals of an inverse analysis. These might include:

    1. Estimates of a set of model parameters (obvious).
    2. Bounds on the range of acceptable model parameters.
    3. Estimates of the formal uncertainties in the model parameters.
    4. How sensitive is the solution to noise (or small changes) in the data?
    5. Where, and what kind, of data are best suited to determine a set of model parameters?
    6. Is the fit between predicted and observed data adequate?


    7. Is a more complicated (i.e., more model parameters) model significantly better than a simpler model?

    Not all of these are completely independent goals. It is important to realize, as early as possible, that there is much more to inverse theory than simply a set of estimated model parameters. Also, it is important to realize that there is very often not a single correct answer. Unlike a mathematical inverse, which either exists or does not exist, there are many possible approximate inverses. These may give different answers. Part of the goal of an inverse analysis is to determine if the answer you have obtained is reasonable, valid, acceptable, etc. This takes experience, of course, but you have begun the process.

    Before going on with how to formulate the mathematical methods of inverse theory, I should mention that there are two basic branches of inverse theory. In the first, the model parameters and data are discrete quantities. In the second, they are continuous functions. An example of the first might occur with the model parameters we seek being given by the moments of inertia of the planets:

    model parameters = I1, I2, I3, . . . , I10 (1.3)

    and the data being given by the perturbations in the orbital periods of satellites:

    data = T1, T2, T3, . . . , TN (1.4)

    An example of a continuous function type of problem might be given by velocity as a function of depth:

    model parameters = v(z) (1.5)

    and the data given by a seismogram of ground motion

    data = d(t) (1.6)

    Separate strategies have been developed for discrete and continuous inverse theory. There is, of course, a fair bit of overlap between the two. In addition, it is often possible to approximate continuous functions with a discrete set of values. There are potential problems (aliasing, for example) with this approach, but it often makes otherwise intractable problems tractable. Menke's book deals exclusively with the discrete case. This course will certainly emphasize discrete inverse theory, but I will also give you a little of the continuous inverse theory at the end of the semester.

    1.4 Nomenclature

    Now let us introduce some nomenclature. In these notes, vectors will be denoted by boldface lowercase letters, and matrices will be denoted by boldface uppercase letters.


    Suppose one makes N measurements in a particular experiment. We are trying to determine the values of M model parameters. Our nomenclature for data and model parameters will be

    data: d = [d1, d2, d3, . . . , dN]T (1.7)

    model parameters: m = [m1, m2, m3, . . . , mM]T (1.8)

    where d and m are N- and M-dimensional column vectors, respectively, and T denotes transpose.

    The model, or relationship between d and m, can have many forms. These can generally be classified as either explicit or implicit, and either linear or nonlinear.

    Explicit means that the data and model parameters can be separated onto different sides of the equal sign. For example,

    d1 = 2m1 + 4m2 (1.9)

    and

    d1 = 2m1 + 4m1²m2 (1.10)

    are two explicit equations.

    Implicit means that the data cannot be separated on one side of an equal sign with model parameters on the other side. For example,

    d1 − (m1 + m2) = 0 (1.11)

    and

    d1 − (m1 + m1²m2) = 0 (1.12)

    are two implicit equations. In each example above, the first represents a linear relationship between the data and model parameters, and the second represents a nonlinear relationship.

    In this course we will deal exclusively with explicit-type equations, and predominantly with linear relationships. Then, the explicit linear case takes the form

    d = Gm (1.13)

    where d is an N-dimensional data vector, m is an M-dimensional model parameter vector, and G is an N × M matrix containing only constant coefficients.

    The matrix G is sometimes called the kernel or data kernel or even the Green's function because of the analogy with the continuous function case:


    $$d(x) = \int G(x, t)\, m(t)\, dt \qquad (1.14)$$

    Consider the following discrete case example with two observations (N = 2) and three model parameters (M = 3):

    d1 = 2m1 + 0m2 − 4m3
    d2 = m1 + 2m2 + 3m3 (1.15)

    which may be written as

    $$\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -4 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} \qquad (1.16)$$

    or simply

    d = Gm (1.13)

    where

    d = [d1, d2]T

    m = [m1, m2, m3]T

    and

    $$\mathbf{G} = \begin{bmatrix} 2 & 0 & -4 \\ 1 & 2 & 3 \end{bmatrix} \qquad (1.17)$$

    Then d and m are 2 × 1 and 3 × 1 column vectors, respectively, and G is a 2 × 3 matrix with constant coefficients.
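    As a quick numerical check of the forward problem (the particular model vector here is chosen only for illustration and is not from the notes), take m = [1, 2, 3]T. Then

    $$\mathbf{d} = \mathbf{Gm} = \begin{bmatrix} 2(1) + 0(2) - 4(3) \\ 1(1) + 2(2) + 3(3) \end{bmatrix} = \begin{bmatrix} -10 \\ 14 \end{bmatrix}$$

    so each datum is simply the weighted sum of the model parameters given by the corresponding row of G.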

    On the following pages I will give some examples of how forward problems are set up using matrix notation. See pages 10-16 of Menke for these and other examples.


    1.5 Examples of Forward Problems

    1.5.1 Example 1: Fitting a Straight Line (See Page 10 of Menke)

    [Figure: temperature T (vertical axis) versus depth z (horizontal axis); the data points scatter about a straight line with intercept a and slope b.]

    Suppose that N temperature measurements Ti are made at depths zi in the earth. The data are then a vector d of N measurements of temperature, where d = [T1, T2, T3, . . . , TN]T. The depths zi are not data. Instead, they provide some auxiliary information that describes the geometry of the experiment. This distinction will be further clarified below.

    Suppose that we assume a model in which temperature is a linear function of depth: T = a + bz. The intercept a and slope b then form the two model parameters of the problem, m = [a, b]T. According to the model, each temperature observation must satisfy Ti = a + bzi:

    T1 = a + bz1
    T2 = a + bz2
       ⋮
    TN = a + bzN

    These equations can be arranged as the matrix equation Gm = d:

    $$\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_N \end{bmatrix} = \begin{bmatrix} 1 & z_1 \\ 1 & z_2 \\ \vdots & \vdots \\ 1 & z_N \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$
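    In practice, setting up the forward problem amounts to filling G from the auxiliary information. A minimal FORTRAN sketch (not from the notes; the four depths are hypothetical values used only to show the bookkeeping) for this straight-line problem might look like:

      PROGRAM LINEG
C     Build the N x 2 matrix G for T = a + b*z: column 1 is all ones
C     (multiplies the intercept a), column 2 holds the depths Z(I)
C     (multiplies the slope b). The depths below are hypothetical.
      PARAMETER (N=4)
      REAL Z(N), G(N,2)
      DATA Z /10.0, 20.0, 30.0, 40.0/
      DO 10 I = 1, N
         G(I,1) = 1.0
         G(I,2) = Z(I)
   10 CONTINUE
      DO 20 I = 1, N
   20 WRITE(*,*) G(I,1), G(I,2)
      END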


    1.5.2 Example 2: Fitting a Parabola (See Page 11 of Menke)

    [Figure: temperature T versus depth z; the data points scatter about a parabola.]

    If the model in example 1 is changed to assume a quadratic variation of temperature with depth of the form T = a + bz + cz², then a new model parameter is added to the problem, m = [a, b, c]T. The number of model parameters is now M = 3. The data are supposed to satisfy

    T1 = a + bz1 + cz1²
    T2 = a + bz2 + cz2²
       ⋮
    TN = a + bzN + czN²

    These equations can be arranged into the matrix equation

    $$\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_N \end{bmatrix} = \begin{bmatrix} 1 & z_1 & z_1^2 \\ 1 & z_2 & z_2^2 \\ \vdots & \vdots & \vdots \\ 1 & z_N & z_N^2 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

    This matrix equation has the explicit linear form Gm = d. Note that, although the equation is linear in the data and model parameters, it is not linear in the auxiliary variable z.

    The equation has a very similar form to the equation of the previous example, which brings out one of the underlying reasons for employing matrix notation: it can often emphasize similarities between superficially different problems.


    1.5.3 Example 3: Acoustic Tomography (See Pages 12-13 of Menke)

    Suppose that a wall is assembled from a rectangular array of bricks (Figure 1.1 from Menke, below) and that each brick is composed of a different type of clay. If the acoustic velocities of the different clays differ, one might attempt to distinguish the different kinds of bricks by measuring the travel time of sound across the various rows and columns of bricks in the wall. The data in this problem are N = 8 measurements of travel times, d = [T1, T2, T3, . . . , T8]T. The model assumes that each brick is composed of a uniform material and that the travel time of sound across each brick is proportional to the width and height of the brick. The proportionality factor is the brick's slowness si, thus giving M = 16 model parameters, m = [s1, s2, s3, . . . , s16]T, where the ordering is according to the numbering scheme of the figure.

    [Figure 1.1, after Menke: The travel time of acoustic waves (dashed lines) through the rows and columns of a square array of bricks is measured with the acoustic source S and receiver R placed on the edges of the square. The inverse problem is to infer the acoustic properties of the bricks (which are assumed to be homogeneous).]

    row 1: T1 = hs1 + hs2 + hs3 + hs4
    row 2: T2 = hs5 + hs6 + hs7 + hs8
       ⋮
    column 4: T8 = hs4 + hs8 + hs12 + hs16

    and the matrix equation is

    $$\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_8 \end{bmatrix} = h \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ & & & & & & & \vdots & & & & & & & & \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_{16} \end{bmatrix}$$

    Here the bricks are assumed to be of width and height h.
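    The structure of G is easier to see if you build it explicitly: entry Gij is h when ray i crosses brick j and zero otherwise. A short FORTRAN sketch (not from the notes; it assumes the row-major brick numbering implied by the equations above and an arbitrary brick dimension h = 1) is:

      PROGRAM BRICKG
C     Fill the 8 x 16 matrix G for the brick tomography example.
C     Rows 1-4 of G are the horizontal ray paths (brick rows 1-4);
C     rows 5-8 are the vertical ray paths (brick columns 1-4).
C     H = 1.0 is an arbitrary illustrative brick dimension.
      PARAMETER (H=1.0)
      REAL G(8,16)
      DO 10 I = 1, 8
      DO 10 J = 1, 16
   10 G(I,J) = 0.0
      DO 20 I = 1, 4
      DO 20 J = 1, 4
C        Ray across row I crosses bricks 4*(I-1)+1, ..., 4*I
   20 G(I, 4*(I-1)+J) = H
      DO 30 I = 1, 4
      DO 30 J = 1, 4
C        Ray down column I crosses bricks I, I+4, I+8, I+12
   30 G(I+4, I+4*(J-1)) = H
      DO 40 I = 1, 8
   40 WRITE(*,*) (G(I,J), J = 1, 16)
      END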


    1.5.4 Example 4: Seismic Tomography

    An example of the impact of inverse methods in the geosciences: Northern California

    A large amount of data is available, much of it redundant.
    Patterns in the data can be interpreted qualitatively. Inversion results quantify the patterns.
    Perhaps more importantly, inverse methods provide quantitative information on the resolution, standard error, and "goodness of fit."
    We cannot overemphasize the "impact" of colorful graphics, for both good and bad.
    Inverse theory is not a magic bullet. Bad data will still give bad results, and interpretation of even good results requires breadth of understanding in the field.
    Inverse theory does provide quantitative information on how well the model is "determined," the importance of data, and model errors.
    Another example: improvements in "imaging" subduction zones.

    1.5.5 Example 5: Convolution

    Convolution is widely significant as a physical concept and offers an advantageous starting point for many theoretical developments. One way to think about convolution is that it describes the action of an observing instrument when it takes a weighted mean of some physical quantity over a narrow range of some variable. All physical observations are limited in this way, and for this reason alone convolution is ubiquitous (paraphrased from Bracewell, The Fourier Transform and Its Applications, 1964). It is widely used in time series analysis as well to represent physical processes.

    The convolution of two functions f(x) and g(x), represented as f(x) * g(x), is

    $$\int_{-\infty}^{+\infty} f(u)\, g(x-u)\, du \qquad (1.18)$$

    For discrete finite functions with common sampling intervals, the convolution is

    $$h_k = \sum_{i=0}^{m} f_i\, g_{k-i}, \qquad 0 \le k \le m+n \qquad (1.19)$$

    A FORTRAN computer program for convolution would look something like:

    L=M+N-1

    DO 10 I=1,L

    10 H(I)=0

    DO 20 I=1,M

    DO 20 J=1,N

    20 H(I+J-1)=H(I+J-1)+G(I)*F(J)
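    As a check on the indexing, here is a complete little driver (not part of the original notes) that wraps the loop above around two short, hypothetical test sequences; convolving (1, 1) with (1, 2, 3) should produce (1, 3, 5, 3):

      PROGRAM CONVO
C     Exercise the convolution loop with hypothetical test sequences
C     G = (1,1) and F = (1,2,3); expected output is 1, 3, 5, 3.
      PARAMETER (M=2, N=3)
      REAL G(M), F(N), H(M+N-1)
      DATA G /1.0, 1.0/
      DATA F /1.0, 2.0, 3.0/
      L = M + N - 1
      DO 10 I = 1, L
   10 H(I) = 0.0
      DO 20 I = 1, M
      DO 20 J = 1, N
   20 H(I+J-1) = H(I+J-1) + G(I)*F(J)
      WRITE(*,*) (H(I), I = 1, L)
      END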


    Convolution may also be written using matrix notation as

    $$\begin{bmatrix} f_1 & 0 & \cdots & 0 \\ f_2 & f_1 & & \vdots \\ \vdots & f_2 & \ddots & 0 \\ f_m & \vdots & & f_1 \\ 0 & f_m & & f_2 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & f_m \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix} = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_{m+n-1} \end{bmatrix} \qquad (1.20)$$

    In the matrix form, we recognize our familiar equation Gm = d (ignoring the confusing notation differences between fields, when, for example, g1 above would be m1), and we can define deconvolution as the inverse problem of finding m = G⁻¹d. Alternatively, we can also reformulate the problem as GTGm = GTd and find the solution as m = [GTG]⁻¹[GTd].
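    Using the same short sequences as in the driver program above (an illustrative choice, not from the notes), the matrix form (1.20) with f = (1, 1) and g = (1, 2, 3) reads

    $$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \\ 3 \end{bmatrix}$$

    which reproduces the h = (1, 3, 5, 3) found with the loop, and makes clear that the convolution matrix here plays exactly the role of G.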

    1.6 Final Comments

    The purpose of the previous examples has been to help you formulate forward problems in matrix notation. It helps you to clearly differentiate model parameters from other information needed to calculate predicted data. It also helps you separate data from everything else. Getting the forward problem set up in matrix notation is essential before you can invert the system.

    The logical next step is to take the forward problem given by

    d = Gm (1.13)

    and invert it for an estimate of the model parameters mest as

    mest = Ginverse d (1.21)

    We will spend a lot of effort determining just what Ginverse means when the inverse does not exist in the mathematical sense of

    GGinverse = GinverseG = I (1.22)

    where I is the identity matrix.

    The next order of business, however, is to shift our attention to a review of the basics of matrices and linear algebra as well as probability and statistics in order to take full advantage of the power of inverse theory.


    CHAPTER 2: REVIEW OF LINEAR ALGEBRA AND STATISTICS

    2.1 Introduction

    In discrete inverse methods, matrices and linear transformations play fundamental roles. So do probability and statistics. This review chapter, then, is divided into two parts. In the first, we will begin by reviewing the basics of matrix manipulations. Then we will introduce some special types of matrices (Hermitian, orthogonal, and semiorthogonal). Finally, we will look at matrices as linear transformations that can operate on vectors of one dimension and return a vector of another dimension. In the second section, we will review some elementary probability and statistics, with emphasis on Gaussian statistics. The material in the first section will be particularly useful in later chapters when we cover eigenvalue problems and methods based on the length of vectors. The material in the second section will be very useful when we consider the nature of noise in the data and when we consider the maximum likelihood inverse.

    2.2 Matrices and Linear Transformations

    Recall from the first chapter that, by convention, vectors will be denoted by lowercase letters in boldface (i.e., the data vector d), while matrices will be denoted by uppercase letters in boldface (i.e., the matrix G) in these notes.

    2.2.1 Review of Matrix Manipulations

    Matrix Multiplication

    If A is an N × M matrix (as in N rows by M columns), and B is an M × L matrix, we write the N × L product C of A and B as

    C = AB (2.1)

    We note that matrix multiplication is associative, that is

    (AB)C = A(BC) (2.2)

    but in general is not commutative. That is, in general

    AB ≠ BA (2.3)


    In fact, even if AB exists, the product BA exists only when the number of columns of B equals the number of rows of A; both products have the same size only when A and B are square.

    In Equation (2.1) above, the ijth entry in C is the product of the ith row of A and the jth column of B. Computationally, it is given by

    $$c_{ij} = \sum_{k=1}^{M} a_{ik}\, b_{kj} \qquad (2.4)$$

    One way to form C using standard FORTRAN code would be

    DO 300 I = 1, N
    DO 300 J = 1, L
    C(I,J) = 0.0
    DO 300 K = 1, M
    300 C(I,J) = C(I,J) + A(I,K)*B(K,J) (2.5)

    A special case of the general rule above is the multiplication of a matrix G (N × M) and a vector m (M × 1):

    d = G m (1.13)
    (N × 1) (N × M) (M × 1)

    In terms of computation, the vector d is given by

    $$d_i = \sum_{j=1}^{M} G_{ij}\, m_j \qquad (2.6)$$

    The Inverse of a Matrix

    The mathematical inverse of the M × M matrix A, denoted A⁻¹, is defined such that:

    AA⁻¹ = A⁻¹A = IM (2.7)

    where IM is the M × M identity matrix given by:

    $$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix} \qquad (2.8)$$

    (M × M)


    A⁻¹ is the matrix which, when either pre- or postmultiplied by A, returns the identity matrix. Clearly, since only square matrices can both pre- and postmultiply each other, the mathematical inverse of a matrix only exists for square matrices.

    A useful theorem follows concerning the inverse of a product of matrices:

    Theorem: If A = B C D (2.9), where A, B, C, and D are each N × N,

    then A⁻¹, if it exists, is given by

    A⁻¹ = D⁻¹C⁻¹B⁻¹ (2.10)

    Proof: A(A⁻¹) = BCD(D⁻¹C⁻¹B⁻¹)

    = BC(DD⁻¹)C⁻¹B⁻¹

    = BCIC⁻¹B⁻¹

    = B(CC⁻¹)B⁻¹

    = BB⁻¹

    = I (2.11)

    Similarly, (A⁻¹)A = D⁻¹C⁻¹B⁻¹BCD = · · · = I (Q.E.D.)

    The Transpose and Trace of a Matrix

    The transpose of a matrix A is written as AT and is given by

    (AT)ij = Aji (2.12)

    That is, you interchange rows and columns.

    The transpose of a product of matrices is the product of the transposes, in reverse order. That is,

    (AB)T = BTAT (2.13)


    Just about everything we do with real matrices A has an analog for complex matrices. In the complex case, wherever the transpose of a matrix occurs, it is replaced by the complex conjugate transpose of the matrix, denoted here AH. That is,

    if Aij = aij + bij i (2.14)

    then (AH)ij = cij + dij i (2.15)

    where cij = aji (2.16)

    and dij = −bji (2.17)

    that is, (AH)ij = aji − bji i (2.18)

    Finally, the trace of A is given by

    $$\mathrm{trace}\,(\mathbf{A}) = \sum_{i=1}^{M} a_{ii} \qquad (2.19)$$

    Hermitian Matrices

    A matrix A is said to be Hermitian if it is equal to its complex conjugate transpose. That is, if

    A = AH (2.20)

    If A is a real matrix, this is equivalent to

    A = AT (2.21)

    This implies that A must be square. The reason that Hermitian matrices will be important is that they have only real eigenvalues. We will take advantage of this many times when we consider eigenvalue and shifted eigenvalue problems later.
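    A small illustrative example (not from the notes) may help fix the definition:

    $$\mathbf{A} = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$$

    satisfies A = AH, since transposing and conjugating returns the same matrix; its eigenvalues follow from λ² − 5λ + 4 = 0 and are λ = 1 and λ = 4, both real, as promised.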

    2.2.2 Matrix Transformations

    Linear Transformations

    A matrix equation can be thought of as a linear transformation. Consider, for example, the original matrix equation:

    d = Gm (1.13)


    where d is an N × 1 vector, m is an M × 1 vector, and G is an N × M matrix. The matrix G can be thought of as an operator that operates on an M-dimensional vector m and returns an N-dimensional vector d.

    Equation (1.13) represents an explicit, linear relationship between the data and model parameters. The operator G, in this case, is said to be linear because if m is doubled, for example, so is d. Mathematically, one says that G is a linear operator if the following is true:

    If d = Gm

    and f = Gr

    then [d + f] = G[m + r] (2.22)

    In another way to look at matrix multiplications, in the by-now-familiar Equation (1.13),

    d = Gm (1.13)

    the column vector d can be thought of as a weighted sum of the columns of G, with the weighting factors being the elements in m. That is,

    d = m1g1 + m2g2 + · · · + mMgM (2.23)

    where

    m = [m1, m2, . . . , mM]T (2.24)

    and

    gi = [g1i, g2i, . . . , gNi]T (2.25)

    is the ith column of G. Also, if GA = B, then the above can be used to infer that the first column of B is a weighted sum of the columns of G with the elements of the first column of A as weighting factors, etc. for the other columns of B. Each column of B is a weighted sum of the columns of G.
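    For instance, with the 2 × 3 matrix G of Equation (1.17), this column-by-column view of Equation (1.16) reads

    $$\mathbf{d} = m_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + m_2 \begin{bmatrix} 0 \\ 2 \end{bmatrix} + m_3 \begin{bmatrix} -4 \\ 3 \end{bmatrix}$$

    so every possible (noise-free) data vector is a weighted sum of the three columns of G, whatever values the model parameters take.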

    Next, consider

    dT = [Gm]T (2.26)

    or

    dT = mT GT (2.27)
    (1 × N) (1 × M) (M × N)

    The row vector dT is the weighted sum of the rows of GT, with the weighting factors again being the elements in m. That is,


    dT = m1g1T + m2g2T + · · · + mMgMT (2.28)

    Extending this to

    ATGT = BT (2.29)

    we have that each row of BT is a weighted sum of the rows of GT, with the weighting factors being the elements of the appropriate row of AT.

    In a long string of matrix multiplications such as

    ABC = D (2.30)

    each column of D is a weighted sum of the columns of A, and each row of D is a weighted sum of the rows of C.

    Orthogonal Transformations

    An orthogonal transformation is one that leaves the length of a vector unchanged. We can only talk about the length of a vector being unchanged if the dimension of the vector is unchanged. Thus, only square matrices may represent an orthogonal transformation.

    Suppose L is an orthogonal transformation. Then, if

    Lx = y (2.31)

    where L is N × N, and x, y are both N-dimensional vectors. Then

    xTx = yTy (2.32)

    where Equation (2.32) represents the dot product of the vectors with themselves, which is equal to the length squared of the vector. If you have ever done coordinate transformations in the past, you have dealt with an orthogonal transformation. Orthogonal transformations rotate vectors but do not change their lengths.
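    The familiar 2-D rotation through an angle θ is the standard example (added here for illustration):

    $$\mathbf{L} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad \mathbf{L}^T\mathbf{L} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \sin^2\theta + \cos^2\theta \end{bmatrix} = \mathbf{I}_2$$

    so y = Lx is x rotated by θ, and yTy = xTx as required.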

    Properties of orthogonal transformations. There are several properties of orthogonal transformations that we will wish to use.

    First, if L is an N × N orthogonal transformation, then

    LTL = IN (2.33)

    This follows from

    yTy = [Lx]T[Lx]


    = xTLTLx (2.34)

    but yTy = xTx by Equation (2.32). Thus,

    LTL = IN (Q.E.D.) (2.35)

    Second, the relationship between L and its inverse is given by

    L⁻¹ = LT (2.36)

    and

    L = [LT]⁻¹ (2.37)

    These two follow directly from Equation (2.35) above.

    Third, the determinant of a matrix is unchanged if it is operated upon by orthogonal transformations. Recall that the determinant of a 3 × 3 matrix A, for example, where A is given by

    $$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \qquad (2.38)$$

    is given by

    det (A) = a11(a22a33 − a23a32)
    − a12(a21a33 − a23a31)
    + a13(a21a32 − a22a31) (2.39)

    Thus, if A is an M × M matrix, and L is an orthogonal transformation, and if

    A′ = (L)A(L)T (2.40)

    it follows that

    det (A′) = det (A) (2.41)

    Fourth, the trace of a matrix is unchanged if it is operated upon by an orthogonal transformation, where trace (A) is defined as

    $$\mathrm{trace}\,(\mathbf{A}) = \sum_{i=1}^{M} a_{ii} \qquad (2.42)$$


    That is, the sum of the diagonal elements of a matrix is unchanged by an orthogonal transformation. Thus,

    trace (A′) = trace (A) (2.43)

    Semiorthogonal Transformations

    Suppose that the linear operator L is not square, but N × M (N ≠ M). Then L is said to be semiorthogonal if and only if

    LTL = IM, but LLT ≠ IN, N > M (2.44)

    or

    LLT = IN, but LTL ≠ IM, M > N (2.45)

    where IN and IM are the N × N and M × M identity matrices, respectively.

    A matrix cannot be both orthogonal and semiorthogonal. Orthogonal matrices must be square, and semiorthogonal matrices cannot be square. Furthermore, if L is a square N × N matrix, and

    LTL = IN (2.35)

    then it is not possible to have

    LLT ≠ IN (2.46)
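    A minimal example of a semiorthogonal matrix (illustrative only, not from the notes) is the 3 × 2 matrix

    $$\mathbf{L} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \mathbf{L}^T\mathbf{L} = \mathbf{I}_2, \qquad \mathbf{L}\mathbf{L}^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \ne \mathbf{I}_3$$

    which satisfies Equation (2.44) with N = 3 and M = 2.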

    2.2.3 Matrices and Vector Spaces

    The columns or rows of a matrix can be thought of as vectors. For example, if A is an N × M matrix, each column can be thought of as a vector in N-space because it has N entries. Conversely, each row of A can be thought of as being a vector in M-space because it has M entries.

    We note that for the linear system of equations given by

    Gm = d (1.13)

    where G is N × M, m is M × 1, and d is N × 1, the model parameter vector m lies in M-space (along with all the rows of G), while the data vector lies in N-space (along with all the columns of G). In general, we will think of the M × 1 vectors as lying in model space, while the N × 1 vectors lie in data space.

    Spanning a Space


    The notion of spanning a space is important for any discussion of the uniqueness of solutions or of the ability to fit the data. We first need to introduce definitions of linear independence and vector orthogonality.

    A set of M vectors vi, i = 1, . . . , M, in M-space (the set of all M-dimensional vectors) is said to be linearly independent if and only if

    a1v1 + a2v2 + · · · + aMvM = 0 (2.47)

    where the ai are constants, has only the trivial solution ai = 0, i = 1, . . . , M.

    This is equivalent to saying that an arbitrary vector s in M-space can be written as a linear combination of the vi, i = 1, . . . , M. That is, one can find ai such that for an arbitrary vector s

    s = a1v1 + a2v2 + · · · + aMvM (2.48)

    Two vectors r and s in M-space are said to be orthogonal to each other if their dot, or inner, product with each other is zero. That is, if

    $$\mathbf{r} \cdot \mathbf{s} = |\mathbf{r}|\,|\mathbf{s}| \cos\theta = 0 \qquad (2.49)$$

    where θ is the angle between the vectors, and |r|, |s| are the lengths of r and s, respectively.

    The dot product of two vectors is also given by

    $$\mathbf{r}^T\mathbf{s} = \mathbf{s}^T\mathbf{r} = \sum_{i=1}^{M} r_i s_i \qquad (2.50)$$

    M-space is spanned by any set of M linearly independent M-dimensional vectors.
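    As a concrete illustrative case in 2-space (not from the notes): v1 = [1, 0]T and v2 = [1, 1]T are linearly independent, since a1v1 + a2v2 = 0 forces a2 = 0 (from the second component) and then a1 = 0. They therefore span 2-space, and any s = [s1, s2]T can be written s = (s1 − s2)v1 + s2v2. By contrast, [1, 0]T and [2, 0]T are linearly dependent and span only a line.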

    Rank of a Matrix

    The number of linearly independent rows in a matrix, which is also equal to the number of linearly independent columns, is called the rank of the matrix. The rank of matrices is defined for both square and nonsquare matrices. The rank of a matrix cannot exceed the minimum of the number of rows or columns in the matrix (i.e., the rank is less than or equal to the minimum of N, M).

    If an M × M matrix is an orthogonal matrix, then it has rank M. The M rows are all linearly independent, as are the M columns. In fact, not only are the rows independent for an orthogonal matrix, they are orthogonal to each other. The same is true for the columns. If a matrix is semiorthogonal, then the M columns (or N rows, if N < M) are linearly independent.


    2.3 Probability and Statistics

    2.3.1 Introduction

    We need some background in probability and statistics before proceeding very far. In this review section, I will cover the material from Menke's book, using some material from other math texts to help clarify things.

    Basically, what we need is a way of describing the noise in data and estimated model parameters. We will need the following terms: random variable, probability distribution, mean or expected value, maximum likelihood, variance, standard deviation, standardized normal variables, covariance, correlation coefficients, Gaussian distributions, and confidence intervals.

    2.3.2 Definitions, Part 1

    Random Variable: A function that assigns a value to the outcome of an experiment. A random variable has well-defined properties based on some distribution. It is called random because you cannot know beforehand the exact value for the outcome of the experiment. One cannot measure directly the true properties of a random variable. One can only make measurements, also called realizations, of a random variable, and estimate its properties. The birth weight of baby goslings is a random variable, for example.

    Probability Density Function: The true properties of a random variable b are specified by the probability density function P(b). The probability that a particular realization of b will fall between b and b + db is given by P(b) db. (Note that Menke uses d where I use b. His notation is bad when one needs to use integrals.) P(b) satisfies

    $$1 = \int_{-\infty}^{+\infty} P(b)\, db \qquad (2.51)$$

    which says that the probability of b taking on some value is 1. P(b) completely describes the random variable b. It is often useful to try and find a way to summarize the properties of P(b) with a few numbers, however.

    Mean or Expected Value: The mean value E(b) (also denoted <b>) is much like the mean of a set of numbers; that is, it is the balancing point of the distribution P(b) and is given by

    $$E(b) = \int_{-\infty}^{+\infty} b\, P(b)\, db \qquad (2.52)$$

    Maximum Likelihood: This is the point in the probability distribution P(b) that has the highest likelihood or probability. It may or may not be close to the mean E(b) = <b>. An important point is that for Gaussian distributions, the maximum likelihood point and the mean E(b) = <b> are the same!


    The graph below (after Figure 2.3, p. 23, Menke) illustrates a case where the two are different.

    [Figure: a skewed distribution P(b) versus b, with the maximum likelihood point bML marked.]

    The maximum likelihood point bML of the probability distribution P(b) for data b gives the most probable value of the data. In general, this value can be different from the mean datum <b>, which is at the balancing point of the distribution.

    Variance: Variance is one measure of the spread, or width, of P(b) about the mean E(b). It is given by

    2 = (b < b >)2 P(b)

    +

    db (2.53)

    Computationally, for L experiments in which the kth experiment gives b_k, the variance is given by

    σ² = [1 / (L - 1)] Σ_{k=1}^{L} (b_k - <b>)²    (2.54)

    Standard Deviation: Standard deviation is the positive square root of the variance, given by

    σ = +√(σ²)    (2.55)

    Covariance: Covariance is a measure of the correlation between errors. If the errors in two observations are uncorrelated, then the covariance is zero. We need another definition before proceeding.


    Joint Density Function P(b): The probability that b_1 is between b_1 and b_1 + db_1, that b_2 is between b_2 and b_2 + db_2, etc. If the data are independent, then

    P(b) = P(b_1) P(b_2) . . . P(b_n)    (2.56)

    If the data are correlated, then P(b) will have some more complicated form. Then, the covariance between b_1 and b_2 is defined as

    cov(b_1, b_2) = ∫_{-∞}^{+∞} . . . ∫_{-∞}^{+∞} (b_1 - <b_1>)(b_2 - <b_2>) P(b) db_1 db_2 . . . db_n


    The figure below (after Figure 2.8, page 26, Menke) shows three different cases of degree of correlation for two observations b_1 and b_2.

    [Figure: three contour plots of P(b_1, b_2), labeled (a), (b), and (c).]

    Contour plots of P(b_1, b_2) when the data are (a) uncorrelated, (b) positively correlated, (c) negatively correlated. The dashed lines indicate the four quadrants of alternating sign used to determine correlation.

    2.3.3 Some Comments on Applications to Inverse Theory

    Some comments are now in order about the nature of the estimated model parameters. We will always assume that the noise in the observations can be described as random variables. Whatever inverse we create will map errors in the data into errors in the estimated model parameters. Thus, the estimated model parameters are themselves random variables. This is true even though the true model parameters may not be random variables. If the distribution of noise for the data is known, then in principle the distribution for the estimated model parameters can be found by mapping through the inverse operator.

    This is often very difficult, but one particular case turns out to have a rather simple form. We will see where this form comes from when we get to the subject of generalized inverses. For now, consider the following as magic.

    If the transformation between data b and model parameters m is of the form

    m = Mb + v    (2.61)

    where M is any arbitrary matrix and v is any arbitrary vector, then

    <m> = M<b> + v    (2.62)

    and

    [cov m] = M [cov b] M^T    (2.63)
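    As a concrete illustration of Equations (2.62) and (2.63), the following short Python sketch propagates the mean and covariance of the data through a linear transformation of this form. The operator M, the offset v, and the data statistics are made-up numbers used only for the example:

    import numpy as np

    # Hypothetical 2 x 3 linear operator M and offset vector v
    M = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, -0.5]])
    v = np.array([0.1, -0.2])

    # Assumed mean and covariance of the data b (uncorrelated, variance 0.04)
    b_mean = np.array([1.0, 2.0, 3.0])
    cov_b = 0.04 * np.eye(3)

    # Equation (2.62): <m> = M<b> + v
    m_mean = M @ b_mean + v

    # Equation (2.63): [cov m] = M [cov b] M^T
    cov_m = M @ cov_b @ M.T

    print(m_mean)
    print(cov_m)

    Note that even though [cov b] is diagonal here, [cov m] is not: the transformation introduces correlations between the estimated model parameters.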


    2.3.4 Definitions, Part 2

    Gaussian Distribution: This is a particular probability distribution given by

    P(b) = [1 / (σ √(2π))] exp[-(b - <b>)² / (2σ²)]    (2.64)

    for a single random variable b with mean <b> and variance σ². For a vector b of N data with mean <b> and covariance matrix [cov b], the joint Gaussian distribution is

    P(b) = (2π)^{-N/2} {det[cov b]}^{-1/2} exp{-(1/2) [b - <b>]^T [cov b]^{-1} [b - <b>]}    (2.65)


    which reduces to the previous case in Equation (2.64) for N = 1 and var(b_1) = σ². In statistics books, Equation (2.65) is often given as

    P(b) = (2π)^{-N/2} |cov b|^{-1/2} exp{-(1/2) [b - <b>]^T [cov b]^{-1} [b - <b>]}

    With this background, it makes sense (statistically, at least) to replace the original relationship

    b = Gm    (1.13)

    with

    <b> = Gm    (2.66)

    The reason is that one cannot expect that there is an m that should exactly predict any particular realization of b when b is in fact a random variable.

    Then the joint probability is given by

    P(b) = (2π)^{-N/2} {det[cov b]}^{-1/2} exp{-(1/2) [b - Gm]^T [cov b]^{-1} [b - Gm]}    (2.67)

    What one then does is seek an m that maximizes the probability that the predicted data are in fact close to the observed data. This is the basis of the maximum likelihood or probabilistic approach to inverse theory.

    Standardized Normal Variables: It is possible to standardize random variables by subtracting their mean and dividing by the standard deviation.

    If the random variable had a Gaussian (i.e., normal) distribution, then so does the standardized random variable. Now, however, the standardized normal variables have zero mean and standard deviation equal to one. Random variables can be standardized by the following transformation:

    s = (m - <m>) / σ_m    (2.68)

    where you will often see z replacing s in statistics books.

    We will see, when all is said and done, that most inverses represent a transformation to standardized variables, followed by a simple inverse analysis, and then a transformation back for the final solution.
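    As a small Python illustration of Equation (2.68), with a made-up sample of a random variable m (the variable names are mine, not from the notes):

    import numpy as np

    # Hypothetical sample of a random variable m
    m = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])

    # Equation (2.68): subtract the mean, divide by the standard deviation
    s = (m - m.mean()) / m.std(ddof=1)

    print(s.mean())        # approximately 0
    print(s.std(ddof=1))   # exactly 1

    The standardized values s then have zero mean and unit standard deviation, as described above.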

    Chi-Squared (Goodness of Fit) Test: A statistical test to see whether a particular observed distribution is likely to have been drawn from a population having some known form.


    The application we will make of the chi-squared test is to test whether the noise in a particular problem is likely to have a Gaussian distribution. This is not the kind of question one can answer with certainty, so one must talk in terms of probability or likelihood. For example, in the chi-squared test, one typically says things like "there is only a 5% chance that this sample distribution does not follow a Gaussian distribution."

    As applied to testing whether a given distribution is likely to have come from a Gaussian population, the procedure is as follows: One sets up an arbitrary number of bins and compares the number of observations that fall into each bin with the number expected from a Gaussian distribution having the same mean and variance as the observed data. One quantifies the departure between the two distributions, called the chi-squared value and denoted χ², as

    χ² = Σ_{i=1}^{k} [(# obs in bin i) - (# expected in bin i)]² / (# expected in bin i)    (2.69)

    where the sum is over the number of bins, k. Next, the number of degrees of freedom for the problem must be considered. For this problem, the number of degrees of freedom is equal to the number of bins minus three. The reason you subtract three is as follows: You subtract 1 because if an observation does not fall into any subset of k - 1 bins, you know it falls in the one bin left over. You are not free to put it anywhere else. The other two come from the fact that you have assumed that the mean and standard deviation of the observed data set are the mean and standard deviation for the theoretical Gaussian distribution.

    With this information in hand, one uses standard chi-squared test tables from statistics books and determines whether such a departure would occur randomly more often than, say, 5% of the time. Officially, the null hypothesis is that the sample was drawn from a Gaussian distribution. If the observed value for χ² is greater than χ²_c, called the critical χ² value for the α significance level, then the null hypothesis is rejected at the α significance level. Commonly, α = 0.05 is used for this test, although α = 0.01 is also used. The α significance level is equivalent to the 100(1 - α)% confidence level (i.e., α = 0.05 corresponds to the 95% confidence level).

    Consider the following example, where the underlying Gaussian distribution from which all data samples d are drawn has a mean of 7 and a variance of 10. Seven bins are set up with edges at -4, 2, 4, 6, 8, 10, 12, and 18, respectively. Bin widths are not prescribed for the chi-squared test, but ideally are chosen so there are about an equal number of occurrences expected in each bin. Also, one rule of thumb is to only include bins having at least five expected occurrences. I have not followed the "about equal number expected in each bin" suggestion because I want to be able to visually compare a histogram with an underlying Gaussian shape. However, I have chosen wider bins at the edges in these test cases to capture more occurrences at the edges of the distribution.

    Suppose our experiment with 100 observations yields a sample mean of 6.76 and a sample variance of 8.27, and 3, 13, 26, 25, 16, 14, and 3 observations, respectively, in the bins from left to right. Using standard formulas for a Gaussian distribution with a mean of 6.76 and a variance of 8.27, the number expected in each bin is 4.90, 11.98, 22.73, 27.10, 20.31, 9.56, and 3.41, respectively. The calculated χ², using Equation (2.69), is 4.48.


    For seven bins, the DOF for the test is 4, and χ²_c = 9.49 for α = 0.05. Thus, in this case, the null hypothesis would be accepted. That is, we would accept that this sample was drawn from a Gaussian distribution with a mean of 6.76 and a variance of 8.27 at the α = 0.05 significance level (95% confidence level). The distribution is shown below, with a filled circle in each histogram at the number expected in that bin.
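    The χ² calculation quoted above is short enough to verify directly. Here is a minimal Python sketch that does so, assuming scipy is available for the critical value:

    import numpy as np
    from scipy.stats import chi2

    # Observed and expected counts in the seven bins quoted above
    obs = np.array([3, 13, 26, 25, 16, 14, 3])
    exp = np.array([4.90, 11.98, 22.73, 27.10, 20.31, 9.56, 3.41])

    # Equation (2.69): chi-squared statistic
    chi_sq = np.sum((obs - exp) ** 2 / exp)

    # Degrees of freedom: 7 bins minus 3
    dof = len(obs) - 3
    critical = chi2.ppf(0.95, dof)    # critical value for alpha = 0.05

    print(chi_sq)              # about 4.48
    print(critical)            # about 9.49
    print(chi_sq > critical)   # False, so do not reject the null hypothesis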

    It is important to note that this distribution does not look exactly like a Gaussian distribution, but still passes the χ² test. A simple, non-chi-square analogy may help better understand the reasoning behind the chi-square test. Consider tossing a true coin 10 times. The most likely outcome is 5 heads and 5 tails. Would you reject a null hypothesis that the coin is a true coin if you got 6 heads and 4 tails in your one experiment of tossing the coin ten times? Intuitively, you probably would not reject the null hypothesis in this case, because 6 heads and 4 tails is not that unlikely for a true coin.

    In order to make an informed decision, as we try to do with the chi-square test, you would need to quantify how likely, or unlikely, a particular outcome is before accepting or rejecting the null hypothesis that it is a true coin. For a true coin, 5 heads and 5 tails has a probability of 0.246 (that is, on average, it happens 24.6% of the time), while the probability of 6 heads and 4 tails is 0.205, 7 heads and 3 tails is 0.117, and 8 heads and 2 tails is 0.044, respectively. A distribution of 7 heads and 3 tails does not look like 5 heads and 5 tails, but occurs more than 10% of the time with a true coin.
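    These binomial probabilities are easy to check with a short Python sketch (a fair coin is assumed, as in the analogy):

    from math import comb

    # Probability of exactly k heads in 10 tosses of a fair coin
    for k in (5, 6, 7, 8, 10):
        print(k, comb(10, k) * 0.5 ** 10)   # ~0.246, 0.205, 0.117, 0.044, 0.001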

    Hence, by analogy, it is not too unlikely and you would probably not reject the null hypothesis that the coin is a true coin just because you tossed 7 heads and 3 tails in one experiment. Ten heads and no tails only occurs, on average, one time in 1024 experiments (or about 0.098% of the time). If you got 10 heads and 0 tails, you'd probably reject the null hypothesis that you are tossing a true coin because the outcome is very unlikely. Eight heads and two tails occurs 4.4% of the time, on average.


    You might also reject the null hypothesis in this case, but you would do so with less confidence, or at a lower significance level. In both cases, however, your conclusion will be wrong occasionally just due to random variations. You accept the possibility that you will be wrong rejecting the null hypothesis 4.4% of the time in this case, even if the coin is true.

    The same is true with the chi-square test. That is, at the α = 0.05 significance level (95% confidence level), with χ² greater than χ²_c, you reject the null hypothesis, even though you recognize that you will reject the null hypothesis incorrectly about 5% of the time in the presence of random variations. Note that this analogy is a simple one in the sense that it is entirely possible to actually do a chi-square test on this coin toss example. Each time you toss the coin ten times you get one outcome: x heads and (10 - x) tails. This falls into the "x heads and (10 - x) tails" bin. If you repeat this many times you get a distribution across all bins from "0 heads and 10 tails" to "10 heads and 0 tails." Then you would calculate the number expected in each bin and use Equation (2.69) to calculate a chi-square value to compare with the critical value at the α significance level.

    Now let us return to another example of the chi-square test where we reject the null hypothesis. Consider a case where the observed number in each of the seven bins defined above is now 2, 17, 13, 24, 26, 9, and 9, respectively, and the observed distribution has a mean of 7.28 and variance of 10.28. The expected number in each bin, for the observed mean and variance, is 4.95, 10.32, 19.16, 24.40, 21.32, 12.78, and 7.02, respectively. The calculated χ² is now 10.77, and the null hypothesis would be rejected at the α = 0.05 significance level (95% confidence level). That is, we would reject that this sample was drawn from a Gaussian distribution with a mean of 7.28 and variance of 10.28 at this significance level. The distribution is shown on the next page, again with a filled circle in each histogram at the number expected in that bin.

    Confidence Intervals: One says, for example, with 98% confidence that the true mean of a random variable lies between two values.


    This is based on knowing the probability distribution for the random variable, of course, and can be very difficult, especially for complicated distributions that include nonzero correlation coefficients. However, for Gaussian distributions, these are well known and can be found in any standard statistics book. For example, Gaussian distributions have 68% and 95% confidence intervals of approximately ±1σ and ±2σ, respectively.

    T and F Tests: These two statistical tests are commonly used to determine whether the properties of two samples are consistent with the samples coming from the same population.

    The F test in particular can be used to test the improvement in the fit between predicted and observed data when one adds a degree of freedom in the inversion. One expects to fit the data better by adding more model parameters, so the relevant question is whether the improvement is significant.

    As applied to the test of improvement in fit between case 1 and case 2, where case 2 uses more model parameters to describe the same data set, the F ratio is given by

    F = [(E_1 - E_2) / (DOF_1 - DOF_2)] / [E_2 / DOF_2]    (2.70)

    where E is the residual sum of squares and DOF is the number of degrees of freedom for each case.

    If F is large, one accepts that the second case with more model parameters provides a significantly better fit to the data. The calculated F is compared to published tables with DOF_1 - DOF_2 and DOF_2 degrees of freedom at a specified confidence level. (Reference: T. M. Hearn, Pn travel times in Southern California, J. Geophys. Res., 89, 1843-1855, 1984.)
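    A short Python sketch of this test, with made-up residual sums of squares and degrees of freedom (scipy is assumed to be available for the critical value):

    from scipy.stats import f

    # Hypothetical misfits and degrees of freedom for two fits to the same data;
    # case 2 uses more model parameters than case 1
    E1, dof1 = 12.4, 18
    E2, dof2 = 8.1, 16

    # Equation (2.70): F ratio for the improvement in fit
    F = ((E1 - E2) / (dof1 - dof2)) / (E2 / dof2)

    # Critical F value with (dof1 - dof2, dof2) degrees of freedom at alpha = 0.05
    F_crit = f.ppf(0.95, dof1 - dof2, dof2)

    print(F, F_crit, F > F_crit)

    If the calculated F exceeds the critical value, the extra model parameters are judged to provide a significant improvement in fit.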

    The next section will deal with solving inverse problems based on length measures. This will include the classic least squares approach.


    CHAPTER 3: INVERSE METHODS BASED ON LENGTH

    3.1 Introduction

    This chapter is concerned with inverse methods based on the length of various vectors that arise in a typical problem. The two most common vectors concerned are the data-error or misfit vector and the model parameter vector. Methods based on the first vector give rise to classic least squares solutions. Methods based on the second vector give rise to what are known as minimum length solutions. Improvements over simple least squares and minimum length solutions include the use of information about noise in the data and a priori information about the model parameters, and are known as weighted least squares or weighted minimum length solutions, respectively. This chapter will end with material on how to handle constraints and on variances of the estimated model parameters.

    3.2 Data Error and Model Parameter Vectors

    The data error and model parameter vectors will play an essential role in the development of inverse methods. They are given by

    data error vector = e = d^obs - d^pre    (3.1)

    and

    model parameter vector = m    (3.2)

    The dimension of the error vector e is N × 1, while the dimension of the model parameter vector is M × 1, respectively. In order to utilize these vectors, we next consider the notion of the size, or length, of vectors.

    3.3 Measures of Length

    The norm of a vector is a measure of its size, or length. There are many possible definitions for norms. We are most familiar with the Cartesian (L_2) norm. Some examples of norms follow:

    L_1 = Σ_{i=1}^{N} |e_i|    (3.3)


    L_2 = [Σ_{i=1}^{N} |e_i|²]^{1/2}    (3.4)

    L_M = [Σ_{i=1}^{N} |e_i|^M]^{1/M}    (3.5)

    and finally,

    L_∞ = max_i |e_i|    (3.6)

    Important Notice! Inverse methods based on different norms can, and often do, give different answers!

    The reason is that different norms give different weight to outliers. For example, the L_∞ norm gives all the weight to the largest misfit. Low-order norms give more equal weight to errors of different sizes.
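    A small Python sketch, with a made-up error vector containing one outlier, illustrates how the different norms weight that outlier:

    import numpy as np

    # Hypothetical misfit vector with one outlier
    e = np.array([0.1, -0.2, 0.15, 3.0])

    L1   = np.sum(np.abs(e))               # Equation (3.3)
    L2   = np.sum(np.abs(e) ** 2) ** 0.5   # Equation (3.4)
    Linf = np.max(np.abs(e))               # Equation (3.6)

    print(L1, L2, Linf)

    The higher the order of the norm, the more the single large misfit dominates the result.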

    The L_2 norm gives the familiar Cartesian length of a vector. Consider the total misfit E between observed and predicted data. It has units of length squared and can be found either as the square of the L_2 norm of e, the error vector (Equation 3.1), or by noting that it is also equivalent to the dot (or inner) product of e with itself, given by

    E = e^T e = [e_1  e_2  . . .  e_N] [e_1  e_2  . . .  e_N]^T = Σ_{i=1}^{N} e_i²    (3.7)

    Inverse methods based on the L_2 norm are also closely tied to the notion that errors in the data have Gaussian statistics. They give considerable weight to large errors, which would be considered unlikely if, in fact, the errors were distributed in a Gaussian fashion.

    Now that we have a way to quantify the misfit between predicted and observed data, we are ready to define a procedure for estimating the value of the elements in m. The procedure is to take the partial derivative of E with respect to each element in m and set the resulting equations to zero. This will produce a system of M equations that can be manipulated in such a way that, in general, leads to a solution for the M elements of m.

    The next section will show how this is done for the least squares problem of finding a best fit straight line to a set of data points.


    3.4 Minimizing the Misfit: Least Squares

    3.4.1 Least Squares Problem for a Straight Line

    Consider the figure below (after Figure 3.1 from Menke, page 36):

    [Figure: (a) a set of (z, d) data points with a best-fitting straight line; (b) a close-up at z_i showing d_i^obs, d_i^pre, and the error e_i.]

    (a) Least squares fitting of a straight line to (z, d) pairs. (b) The error e_i for each observation is the difference between the observed and predicted datum: e_i = d_i^obs - d_i^pre.

    The ith predicted datum d_i^pre for the straight line problem is given by

    d_i^pre = m_1 + m_2 z_i    (3.8)

    where the two unknowns, m_1 and m_2, are the intercept and slope of the line, respectively, and z_i is the value along the z axis where the ith observation is made.

    For N points we have a system of N such equations that can be written in matrix form as:

    | d_1 |   | 1   z_1 |
    | d_2 |   | 1   z_2 |
    |  :  | = | :    :  |  | m_1 |
    | d_i |   | 1   z_i |  | m_2 |
    |  :  |   | :    :  |
    | d_N |   | 1   z_N |    (3.9)

    Or, in the by now familiar matrix notation, as


    d    =    G      m    (1.13)
    (N × 1)  (N × 2) (2 × 1)

    The total misfit E is given by

    E = e^T e = Σ_{i=1}^{N} [d_i^obs - d_i^pre]²    (3.10)

      = Σ_{i=1}^{N} [d_i^obs - (m_1 + m_2 z_i)]²    (3.11)

    Dropping the "obs" in the notation for the observed data, we have

    E = Σ_{i=1}^{N} [d_i² - 2 d_i m_1 - 2 d_i m_2 z_i + m_1² + 2 m_1 m_2 z_i + m_2² z_i²]    (3.12)

    Then, taking the partials of E with respect to m_1 and m_2, respectively, and setting them to zero yields the following equations:

    ∂E/∂m_1 = 2 N m_1 - 2 Σ_{i=1}^{N} d_i + 2 m_2 Σ_{i=1}^{N} z_i = 0    (3.13)

    and

    ∂E/∂m_2 = -2 Σ_{i=1}^{N} d_i z_i + 2 m_1 Σ_{i=1}^{N} z_i + 2 m_2 Σ_{i=1}^{N} z_i² = 0    (3.14)

    Rewriting Equations (3.13) and (3.14) above yields

    N m_1 + m_2 Σ_i z_i = Σ_i d_i    (3.15)

    and

    m_1 Σ_i z_i + m_2 Σ_i z_i² = Σ_i d_i z_i    (3.16)

    Combining the two equations in matrix notation in the form Am = b yields

    | N        Σ_i z_i  | | m_1 |   | Σ_i d_i     |
    | Σ_i z_i  Σ_i z_i² | | m_2 | = | Σ_i d_i z_i |    (3.17)

    or simply


    A       m    =   b    (3.18)
    (2 × 2) (2 × 1) (2 × 1)

    Note that by the above procedure we have reduced the problem from one with N equations in two unknowns (m_1 and m_2) in Gm = d to one with two equations in the same two unknowns in Am = b.

    The matrix equation Am = b can also be rewritten in terms of the original G and d when one notices that the matrix A can be factored as

    | N        Σ_i z_i  |   | 1    1   . . .  1   | | 1   z_1 |
    | Σ_i z_i  Σ_i z_i² | = | z_1  z_2 . . .  z_N | | 1   z_2 | = G^T G    (3.19)
                                                    | :    :  |
                                                    | 1   z_N |
     (2 × 2)                 (2 × N)                 (N × 2)      (2 × 2)

    Also, b above can be rewritten similarly as

    | Σ_i d_i     |   | 1    1   . . .  1   | | d_1 |
    | Σ_i d_i z_i | = | z_1  z_2 . . .  z_N | | d_2 | = G^T d    (3.20)
                                              |  :  |
                                              | d_N |

    Thus, substituting Equations (3.19) and (3.20) into Equation (3.17), one arrives at the so-called normal equations for the least squares problem:

    G^T G m = G^T d    (3.21)

    The least squares solution m_LS is then found as

    m_LS = [G^T G]^{-1} G^T d    (3.22)

    assuming that [G^T G]^{-1} exists.

    In summary, we used the forward problem (Equation 3.9) to give us an explicit relationship between the model parameters (m_1 and m_2) and a measure of the misfit to the observed data, E. Then, we minimized E by taking the partial derivatives of the misfit function with respect to the unknown model parameters, setting the partials to zero, and solving for the model parameters.
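    The recipe above is short enough to carry out numerically in a few lines. Here is a minimal Python sketch with made-up (z, d) pairs (the data values are illustrative only):

    import numpy as np

    # Hypothetical observation points and data for a straight-line fit
    z = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    d = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    # Build G for d_i = m_1 + m_2 z_i  (Equation 3.9)
    G = np.column_stack([np.ones_like(z), z])

    # Solve the normal equations G^T G m = G^T d  (Equation 3.21)
    m_ls = np.linalg.solve(G.T @ G, G.T @ d)

    print(m_ls)   # [intercept m_1, slope m_2]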


    3.4.2 Derivation of the General Least Squares Solution

    We start with any system of linear equations which can be expressed in the form

    d    =    G      m    (1.13)
    (N × 1)  (N × M) (M × 1)

    Again, let E = e^T e = [d - d^pre]^T [d - d^pre]

    E = [d - Gm]^T [d - Gm]    (3.23)

    E = Σ_{i=1}^{N} [d_i - Σ_{j=1}^{M} G_ij m_j][d_i - Σ_{k=1}^{M} G_ik m_k]    (3.24)

    As before, the procedure is to write out the above equation with all its cross terms, take partials of E with respect to each of the elements in m, and set the corresponding equations to zero. For example, following Menke, page 40, Equations (3.6)-(3.9), we obtain an expression for the partial of E with respect to m_q:

    ∂E/∂m_q = 2 Σ_{k=1}^{M} m_k Σ_{i=1}^{N} G_iq G_ik - 2 Σ_{i=1}^{N} G_iq d_i = 0    (3.25)

    We can simplify this expression by recalling Equation (2.4) from the introductory remarks on matrix manipulations in Chapter 2:

    C_ij = Σ_{k=1}^{M} a_ik b_kj    (2.4)

    Note that the first summation on i in Equation (3.25) looks similar in form to Equation (2.4), but the subscripts on the first G term are backwards. If we further note that interchanging the subscripts is equivalent to taking the transpose of G, we see that the summation on i gives the qkth entry in G^T G:

    Σ_{i=1}^{N} G_iq G_ik = Σ_{i=1}^{N} [G^T]_qi G_ik = [G^T G]_qk    (3.26)

    Thus, Equation (3.25) reduces to

    ∂E/∂m_q = 2 Σ_{k=1}^{M} m_k [G^T G]_qk - 2 Σ_{i=1}^{N} G_iq d_i = 0    (3.27)

    Now, we can further simplify the first summation by recalling Equation (2.6) from the same section

    d_i = Σ_{j=1}^{M} G_ij m_j    (2.6)


    To see this clearly, we rearrange the order of terms in the first sum as follows:

    Σ_{k=1}^{M} m_k [G^T G]_qk = Σ_{k=1}^{M} [G^T G]_qk m_k = [G^T G m]_q    (3.28)

    which is the qth entry in G^T G m. Note that G^T G m has dimension (M × N)(N × M)(M × 1) = (M × 1). That is, it is an M-dimensional vector.

    In a similar fashion, the second summation on i can be reduced to a term in [G^T d]_q, the qth entry in an (M × N)(N × 1) = (M × 1) dimensional vector. Thus, for the qth equation, we have

    0 = ∂E/∂m_q = 2[G^T G m]_q - 2[G^T d]_q    (3.29)

    Dropping the common factor of 2 and combining the q equations into matrix notation, we arrive at

    G^T G m = G^T d    (3.30)

    The least squares solution for m is thus given by

    m_LS = [G^T G]^{-1} G^T d    (3.31)

    The least squares operator, G^{-1}_LS, is thus given by

    G^{-1}_LS = [G^T G]^{-1} G^T    (3.32)

    Recalling basic calculus, we note that m_LS above is the solution that minimizes E, the total misfit. Summarizing, setting the M partial derivatives of E with respect to the elements in m (one for each m_q) to zero leads to the least squares solution.
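    As a sketch of how the general solution (3.31) might be evaluated in practice (the function and data below are mine, not from the notes), one can form the normal equations directly or call a library least squares routine; the two agree whenever [G^T G]^{-1} exists:

    import numpy as np

    def least_squares(G, d):
        # m_LS = [G^T G]^{-1} G^T d, assuming G^T G is invertible
        return np.linalg.solve(G.T @ G, G.T @ d)

    # Hypothetical example with N = 5 observations and M = 2 model parameters
    G = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    d = np.array([0.9, 2.1, 2.9, 4.2, 5.1])

    m_normal = least_squares(G, d)
    m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)  # avoids forming G^T G explicitly

    print(m_normal, m_lstsq)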

    We have just derived the least squares solution by taking the partial derivatives of E with respect to m_q and then combining the terms for q = 1, 2, . . ., M. An alternative, but equivalent, formulation begins with Equation (3.23) but is written out as

    E = [d - Gm]^T [d - Gm]    (3.23)

      = [d^T - m^T G^T][d - Gm]

      = d^T d - d^T G m - m^T G^T d + m^T G^T G m    (3.33)

    Then, taking the partial derivative of E with respect to m^T turns out to be equivalent to what was done in Equations (3.25)-(3.30) for m_q, namely


    ∂E/∂m^T = -G^T d + G^T G m = 0    (3.34)

    which leads to

    G^T G m = G^T d    (3.30)

    and

    m_LS = [G^T G]^{-1} G^T d    (3.31)

    It is also perhaps interesting to note that we could have obtained the same solution without taking partials. To see this, consider the following four steps.

    Step 1. We begin with

    Gm = d (1.13)

    Step 2. We then premultiply both sides by G^T

    G^T G m = G^T d    (3.30)

    Step 3. Premultiply both sides by [G^T G]^{-1}

    [G^T G]^{-1} G^T G m = [G^T G]^{-1} G^T d    (3.35)

    Step 4. This reduces to

    m_LS = [G^T G]^{-1} G^T d    (3.31)

    as before. The point is, however, that this way does not show why m_LS is the solution which minimizes E, the misfit between the observed and predicted data.

    All of this assumes that [G^T G]^{-1} exists, of course. We will return to the existence and properties of [G^T G]^{-1} later. Next, we will look at two examples of least squares problems to show a striking similarity that is not obvious at first glance.

    3.4.3 Two Examples of Least Squares Problems

    Example 1. Best-Fit Straight-Line Problem

    We have, of course, already derived the solution for this problem in the last section. Briefly, then, for the system of equations


    d = Gm (1.13)

    given by

    | d_1 |   | 1   z_1 |
    | d_2 |   | 1   z_2 |
    |  :  | = | :    :  |  | m_1 |
    | d_N |   | 1   z_N |  | m_2 |    (3.9)

    we have

    G^T G = | 1    1   . . .  1   | | 1   z_1 |   | N        Σ_i z_i  |
            | z_1  z_2 . . .  z_N | | 1   z_2 | = | Σ_i z_i  Σ_i z_i² |    (3.36)
                                    | :    :  |
                                    | 1   z_N |

    and

    G^T d = | 1    1   . . .  1   | | d_1 |   | Σ_i d_i     |
            | z_1  z_2 . . .  z_N | | d_2 | = | Σ_i d_i z_i |    (3.37)
                                    |  :  |
                                    | d_N |

    Thus, the least squares solution is given by

    m_LS = | N        Σ_i z_i  |^{-1} | Σ_i d_i     |    (3.38)
           | Σ_i z_i  Σ_i z_i² |      | Σ_i d_i z_i |

    Example 2. Best-Fit Parabola Problem

    The ith predicted datum for a parabola is given by

    d_i = m_1 + m_2 z_i + m_3 z_i²    (3.39)

    where m_1 and m_2 have the same meanings as in the straight line problem, and m_3 is the coefficient of the quadratic term. Again, the problem can be written in the form:

    d = Gm (1.13)

    where now we have


    | d_1 |   | 1   z_1   z_1² |
    |  :  |   | :    :     :   |  | m_1 |
    | d_i | = | 1   z_i   z_i² |  | m_2 |
    |  :  |   | :    :     :   |  | m_3 |
    | d_N |   | 1   z_N   z_N² |    (3.40)

    and

            | N         Σ_i z_i    Σ_i z_i² |             | Σ_i d_i      |
    G^T G = | Σ_i z_i   Σ_i z_i²   Σ_i z_i³ | ,   G^T d = | Σ_i d_i z_i  |    (3.41)
            | Σ_i z_i²  Σ_i z_i³   Σ_i z_i⁴ |             | Σ_i d_i z_i² |

    As before, we form the least squares solution as

    m_LS = [G^T G]^{-1} G^T d    (3.31)

    Although the forward problems of predicting data for the straight line and parabolic cases look very different, the least squares solution is formed in a way that emphasizes the fundamental similarity between the two problems. For example, notice how the straight-line problem is buried within the parabola problem. The upper left-hand 2 × 2 part of G^T G in Equation (3.41) is the same as Equation (3.36). Also, the first two entries in G^T d in Equation (3.41) are the same as Equation (3.37).
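    A brief Python sketch of the parabola fit, again with made-up data, shows the same recipe; note that the first two columns of G are identical to those of the straight-line problem:

    import numpy as np

    # Hypothetical (z, d) observations for a parabola fit
    z = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    d = np.array([1.2, 2.1, 5.3, 9.8, 17.1, 26.2])

    # G for d_i = m_1 + m_2 z_i + m_3 z_i^2  (Equation 3.40)
    G = np.column_stack([np.ones_like(z), z, z ** 2])

    # Least squares solution (Equation 3.31)
    m_ls = np.linalg.solve(G.T @ G, G.T @ d)

    print(m_ls)   # [m_1, m_2, m_3]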

    Next we consider a four-parameter example.

    3.4.4 Four-Parameter Tomography Problem

    Finally, let's consider a four-parameter problem, but this one based on the concept of tomography.

    [Figure: a 2 × 2 grid of blocks of side h, numbered 1 and 2 across the top row and 3 and 4 across the bottom row, with sources S and receivers R arranged so that travel time t_1 samples blocks 1 and 2, t_2 samples blocks 3 and 4, t_3 samples blocks 1 and 3, and t_4 samples blocks 2 and 4.]

    t_1 = h/v_1 + h/v_2 = h(s_1 + s_2)
    t_2 = h/v_3 + h/v_4 = h(s_3 + s_4)
    t_3 = h/v_1 + h/v_3 = h(s_1 + s_3)
    t_4 = h/v_2 + h/v_4 = h(s_2 + s_4)    (3.42)


    | t_1 |     | 1  1  0  0 | | s_1 |
    | t_2 | = h | 0  0  1  1 | | s_2 |
    | t_3 |     | 1  0  1  0 | | s_3 |
    | t_4 |     | 0  1  0  1 | | s_4 |    (3.43)

    or

    d = Gm    (1.13)

    G^T G = h² | 1  0  1  0 | | 1  1  0  0 |      | 2  1  1  0 |
               | 1  0  0  1 | | 0  0  1  1 | = h² | 1  2  0  1 |    (3.44)
               | 0  1  1  0 | | 1  0  1  0 |      | 1  0  2  1 |
               | 0  1  0  1 | | 0  1  0  1 |      | 0  1  1  2 |

    G^T d = h | t_1 + t_3 |
              | t_1 + t_4 |    (3.45)
              | t_2 + t_3 |
              | t_2 + t_4 |

    So, the normal equations are

    G^T G m = G^T d    (3.21)

    h² | 2  1  1  0 | | s_1 |     | t_1 + t_3 |
       | 1  2  0  1 | | s_2 | = h | t_1 + t_4 |    (3.46)
       | 1  0  2  1 | | s_3 |     | t_2 + t_3 |
       | 0  1  1  2 | | s_4 |     | t_2 + t_4 |

    or

    h | 2  1  1  0 | | s_1 |   | t_1 + t_3 |
      | 1  2  0  1 | | s_2 | = | t_1 + t_4 |    (3.47)
      | 1  0  2  1 | | s_3 |   | t_2 + t_3 |
      | 0  1  1  2 | | s_4 |   | t_2 + t_4 |
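    As a numerical check on these normal equations, the following Python sketch builds G from Equation (3.43) and forms G^T G and G^T d. The block size h and the true slownesses are assumed values invented for the example, since none are specified here. Note that for this ray geometry G^T G turns out to have rank 3, so [G^T G]^{-1} does not exist, which is one reason the existence of [G^T G]^{-1} is revisited later:

    import numpy as np

    h = 1.0                                      # assumed block side length
    s_true = np.array([0.25, 0.30, 0.28, 0.35])  # assumed true slownesses

    # G from Equation (3.43): one row per ray, one column per block
    G = h * np.array([[1, 1, 0, 0],
                      [0, 0, 1, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1]], dtype=float)

    t = G @ s_true      # travel times t_1 ... t_4 (Equation 3.42)

    GtG = G.T @ G       # left-hand side of the normal equations (3.44)
    Gtd = G.T @ t       # right-hand side (3.45)

    print(GtG)
    print(Gtd)
    print(np.linalg.matrix_rank(GtG))   # 3, not 4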

    Example: s1 =