1

PRINCIPLES OF RETRIEVAL THEORY

Clive D Rodgers

Atmospheric, Oceanic and Planetary Physics

University of Oxford

ESA Advanced Atmospheric Training Course

September 15th – 20th, 2008

2

ATMOSPHERIC REMOTE SENSING: THE INVERSE PROBLEM

Topics

1. Introduction

2. Bayesian approach

3. Information content

4. Error analysis and characterisation

3

ADVERTISEMENT

C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific Publishing Co., 2000.

4

WHAT IS AN INVERSE OR RETRIEVAL PROBLEM?

• Almost any measurement you make...

• When you measure some function of the quantity you really want, you have a retrieval problem.

• Sometimes it’s trivial, sometimes it isn’t.

• Various aspects:

• Formulate the problem properly:

◦ Describe the measurement in terms of some Forward Model

◦ Don’t forget experimental error!

• Finding a solution, inverting the forward model

◦ Algebraic

◦ Numerical

◦ No unique solution

◦ No solution at all

• Finding the ‘best’ solution

◦ Uniqueness - a unique solution may not be the best...

◦ Accuracy

◦ Efficiency

• Understanding the answer

5

THINGS TO THINK ABOUT

• Why isn’t the problem trivial?

• Forward models which are not explicitly invertible

• Ill-conditioned or ill-posed problems

• Errors in the measurement (and in the forward model) can map into errors in the solution in a non-trivial way.

• What to measure?

• Does it actually contain the information you want?

• Updating existing knowledge

• You always have some prior knowledge of the ‘unknown’

• the measurement improves that knowledge

• the measurement may not be enough by itself to completely determine the unknown

• Ill-posed problems

• You cannot solve an ill-posed problem. You have to convert it into a well-posed problem.

• Which of an infinite manifold of solutions do you want?

6

MATHEMATICAL CONCEPTS I

• Measurement Vector: y = (y1, y2, ...ym)

• Any measurement is of a finite number of quantities.

• Arrange them as a vector for computational purposes

• State Vector: x = (x1, x2, ...xn)

• The desired quantity is often continuous - e.g. a temperature profile

• We can only make a finite number of measurements and calculations

• Express the unknown in terms of a finite number of parameters

• They do not all have to be of the same type

• Arrange them as a vector for computational purposes

• Examples:

◦ Temperature on a set of pressure levels, with a specified interpolation rule.

◦ Fourier coefficients for a set of waves

7

MATHEMATICAL CONCEPTS II

Using vectors, it is convenient to think in terms of linear algebra and vector spaces - even if the forward model is not linear.

Measurement Space

– Measurement space is the space of measurement vectors, dimension m.

State Space

– State space is the space of state vectors, dimension n.

Generally the two vector spaces will have different dimensions.

8

MATHEMATICAL CONCEPTS III

Forward Function and Model

– The Forward Function f(x) maps from state space onto measurement space, depending on the physics of the measurement.

– The Forward Model F(x) is the best we can do in the circumstances to model the forward function.

Inverse or Retrieval Method

– The inverse problem is one of finding an inverse mapping R(y):

Given a point in measurement space, which point or set of points in state space could have mapped into it?

9

NOTATION

I have tried to make it mnemonic as far as possible:

Matrices              Bold upper case
Vectors               Bold lower case
State vectors         x
Measurement vectors   y
Covariance matrices   S (based on σ, and not wanting to use Σ)
Measurement error     ε
Forward model         F(x) (Really ought to be f(x), it's a vector)
Jacobian              K (originally a Kernel of an integral transform)
Gain matrix           G (I've used up K, which might stand for Kalman gain)
Averaging Kernel      A

Different vectors, matrices of the same type are distinguished by superscripts, subscripts, etc:

a priori              xa (background)
estimate              x̂
first guess           x0
'true' value          x (no subscript)

10

STANDARD ILLUSTRATION

Idealised thermal-emission nadir sounder represented as a linear forward model:

y = Kx + ε

K is the ‘weighting function’ matrix, ε is measurement error or noise.

• Vertical coordinate is notionally ln(p), discretised at 100 levels from 0 in steps of 0.1 to 9.9 – around 0 to 70 km.

• Eight channels (elements of y).

• State vector is notionally temperature at 100 levels.

• Measurement error (when considered) is 0.5 K.
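
A minimal NumPy sketch may help make this setup concrete. The Gaussian-in-ln(p) shape chosen for the weighting functions and the synthetic temperature profile are assumptions for illustration only; the slides do not specify the exact functions.

import numpy as np

# Minimal sketch of the standard illustration: 8 channels, 100 levels on a
# ln(p)-like grid from 0 to 9.9, forward model y = K x + eps with 0.5 K noise.
rng = np.random.default_rng(0)
n_levels, n_chan = 100, 8
z = 0.1 * np.arange(n_levels)                        # vertical coordinate (0, 0.1, ..., 9.9)
peaks = np.linspace(1.0, 8.0, n_chan)                # assumed peak heights of the 8 channels

K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)   # rows of K are the weighting functions
K /= K.sum(axis=1, keepdims=True)                       # normalise each row

x_true = 250.0 + 20.0 * np.sin(z / 3.0)              # stand-in temperature profile [K]
y = K @ x_true + 0.5 * rng.standard_normal(n_chan)   # y = Kx + eps, 0.5 K measurement error
print(y.round(2))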

11

STANDARD WEIGHTING FUNCTIONS

12

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

13

NOISE FREE MEASUREMENTS

Row Space and Null Space

Consider an error-free linear measurement, equivalent to solving linear equations:

y = Kx

The rows of K are the weighting functions k_i:

y_i = k_i^T x

The k_i are a set of vectors in state space; the measurements are projections of the state x onto them.

They span a subspace called the row space, of dimension equal to the rank of K, p ≤ min(n, m). If p < m then the weighting functions are not linearly independent.

Only those components of x in the row space can be measured.

The null space is the part of state space which is not in the row space.
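
A small NumPy check of this idea, on an arbitrary made-up K with fewer measurements than unknowns: any component of the state lying in the null space is invisible in the noise-free measurement.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
K = rng.standard_normal((m, n))                   # arbitrary weighting functions, m < n

U, s, Vt = np.linalg.svd(K)
p = int(np.sum(s > 1e-10 * s.max()))              # numerical rank = dimension of the row space
null_basis = Vt[p:].T                             # columns span the null space (dimension n - p)

x = rng.standard_normal(n)
dx = null_basis @ rng.standard_normal(n - p)      # a perturbation lying entirely in the null space
print(np.allclose(K @ x, K @ (x + dx)))           # True: this part of the state cannot be measured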

14

ILL-POSED AND WELL-POSED PROBLEMS

Ill or well posed? · · · Under- or over-determined? · · · Under- or over-constrained?

Which is which?

1. p = m = n. Well posed.

The number of unknowns is equal to the number of measurements, and they are all independent.

2. p = m < n. Underconstrained, ill-posed

More unknowns than measurements, but the measurements are all independent.

3. p = n < m. Overconstrained, ill-posed

More measurements than unknowns, so they could be inconsistent, but the unknowns are all in the row space, so there is information about all of them.

15

ILL-POSED AND WELL-POSED PROBLEMS II

Another category: Mixed-determined, the problem is both underconstrained and overconstrained.

1. p < m = n.

The number of unknowns is equal to the number of measurements, but the measurements are not independent, so they could be inconsistent, and the number of independent pieces of information is less than the number of unknowns. Simple example:

y1 = x1 + x2 + ε1    (1)

y2 = x1 + x2 + ε2    (2)

2. p < m < n.

More unknowns than measurements, but the measurements are not independent, so they could be inconsistent.

3. p < n < m.

More measurements than the rank, so they could be inconsistent, more unknowns than the rank, so not all are defined by the measurement.

16

ILL-POSED AND WELL-POSED PROBLEMS III

Summary

If p < n then the system is underconstrained; there is a null space.

If p < m then the system is overconstrained in some part of the row space.

How do we identify the row and null spaces?

One straightforward way is Gram-Schmidt orthogonalisation, but . . .

17

SINGULAR VECTOR DECOMPOSITION [>>]

Is the neatest way of doing the job.

18

MATRIX ALGEBRA – EIGENVECTORS

The eigenvalue problem associated with an arbitrary square matrix A, of order n, is to find eigenvectors l and scalar eigenvalues λ which satisfy

A l = λ l

If A is a coordinate transformation, then l has the same representation in the untransformed and transformed coordinates, apart from a factor λ.

This is the same as (A − λI) l = 0, a homogeneous set of equations, which can only have a solution if |A − λI| = 0, giving a polynomial equation of degree n, with n solutions for λ. They will be complex in general.

An eigenvector can be scaled by an arbitrary factor. It is conventional to normalise them so that

l^T l = 1   or   l† l = 1 (Hermitian adjoint)

19

EIGENVECTORS II

We can assemble the eigenvectors as columns in a matrix L:

AL = LΛ

where Λ is a diagonal matrix, with the eigenvalues on the diagonal.

Transpose:                     L^T A^T = Λ L^T

Multiply by R = (L^T)^{-1}:    A^T = R Λ L^T

Postmultiply by R:             A^T R = R Λ

Thus:

R is the matrix of eigenvectors of A^T.

A^T has the same eigenvalues as A.

In the case of a Symmetric matrix, S = S^T, we must have L = R, so that L^T L = L L^T = I or L^T = L^{-1}, and the eigenvectors are orthogonal.

In this case the eigenvalues are real.
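
A quick numerical check of the symmetric case, using an arbitrary random symmetric (positive definite) matrix standing in for a covariance:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B @ B.T                                       # symmetric, positive definite

lam, L = np.linalg.eigh(S)                        # eigendecomposition for symmetric matrices
print(np.allclose(L.T @ L, np.eye(4)))            # L^T L = I: eigenvectors are orthonormal
print(np.allclose(L @ np.diag(lam) @ L.T, S))     # S = L Λ L^T
print(lam)                                        # eigenvalues are real (here also positive)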

20

EIGENVECTORS – GEOMETRIC INTERPRETATION

Consider the scalar equation:

x^T S x = 1

where S is symmetric. This is the equation of a quadratic surface centered on the origin, in n-space.

The normal to the surface is the vector ∂(x^T S x)/∂x, i.e. Sx, and x is the radius vector, so

S x = λ x

is the problem of finding points where the normal and the radius vector are parallel. These are where the principal axes intersect the surface.

At these points, x^T S x = 1, so x^T λ x = 1, or:

λ = 1 / (x^T x)

So the eigenvalues are the reciprocals of the squares of the lengths of the principal axes.

21

GEOMETRIC INTERPRETATION II

The lengths are independent of the coordinate system, so will also be invariant under an arbitrary orthogonal transformation, i.e. one in which (distance)² = x^T x is unchanged.

Consider using the eigenvectors of S to transform the equation for the quadratic surface:

x^T L Λ L^T x = 1   or   y^T Λ y = 1   or   Σ_i λ_i y_i² = 1

where y = L^T x or x = L y. This transforms the surface into its principal axis representation.

22

EIGENVECTORS - USEFUL RELATIONSHIPS

Asymmetric Matrices                              Symmetric Matrices

A R = R Λ                                        S L = L Λ
L^T A = Λ L^T
L^T = R^{-1},  R^T = L^{-1}                      L^T = L^{-1}
L R^T = L^T R = I                                L L^T = L^T L = I
A = R Λ L^T = Σ_i λ_i r_i l_i^T                  S = L Λ L^T = Σ_i λ_i l_i l_i^T
A^T = L Λ R^T = Σ_i λ_i l_i r_i^T
A^{-1} = R Λ^{-1} L^T                            S^{-1} = L Λ^{-1} L^T
A^n = R Λ^n L^T                                  S^n = L Λ^n L^T
L^T A R = Λ                                      L^T S L = Λ
L^T A^n R = Λ^n                                  L^T S^n L = Λ^n
L^T A^{-1} R = Λ^{-1}                            L^T S^{-1} L = Λ^{-1}
|A| = Π_i λ_i                                    |S| = Π_i λ_i
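
A short numerical check of a few of the asymmetric-matrix relations above, using an arbitrary random matrix; the left-eigenvector matrix is obtained from L^T = R^{-1}:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

lam, R = np.linalg.eig(A)                         # right eigenvectors as columns of R
Lt = np.linalg.inv(R)                             # L^T = R^{-1}; its rows are left eigenvectors
Lam = np.diag(lam)

print(np.allclose(R @ Lam @ Lt, A))               # A = R Λ L^T
print(np.allclose(Lt @ A @ R, Lam))               # L^T A R = Λ
print(np.allclose(R @ Lam @ Lam @ Lt, A @ A))     # A^n = R Λ^n L^T, checked for n = 2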

23

SINGULAR VECTOR DECOMPOSITION

The standard eigenvalue problem is meaningless for non-square matrices.

A ‘shifted’ eigenvalue problem associated with an arbitrary non-square matrix K, with m rows and n columns, can be constructed:

K v = λ u,    K^T u = λ v    (3)

where v, of length n, and u, of length m, are called the singular vectors of K.

This is equivalent to the symmetric problem:

( O    K ) (u)     (u)
( K^T  O ) (v) = λ (v)

From (3) we can get

K^T K v = λ K^T u = λ² v,    K K^T u = λ K v = λ² u    (4)

so u and v are the eigenvectors of K K^T (m × m) and K^T K (n × n) respectively.

24

SINGULAR VECTOR DECOMPOSITION II

Care is needed in constructing a matrix of singular vectors, because individual u and v vectors correspond to each other, yet there are potentially different numbers of v and u vectors.

If the rank of K is p, then there will be p non-zero singular values, and both K K^T and K^T K will have p non-zero eigenvalues.

The surplus eigenvectors will have zero eigenvalues, and can be discarded, and we can write:

( O    K ) (U)   (U)
( K^T  O ) (V) = (V) Λ

where Λ is p × p, U is m × p, and V is n × p.

There will be n + m − p more eigenvectors of the composite matrix, all with zero eigenvalue.

25

SINGULAR VECTORS - USEFUL RELATIONSHIPS

K V = U Λ        K^T U = V Λ
U^T K V = V^T K^T U = Λ
K = U Λ V^T
K^T = V Λ U^T
V^T V = U^T U = I_p
K K^T U = U Λ²
K^T K V = V Λ²    (5)
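
These relations are easy to verify numerically with a reduced ('economy') SVD; the matrix below is an arbitrary random K used only for illustration:

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
K = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(K, full_matrices=False)  # U: m x p, s: the p singular values, Vt: p x n
Lam, V = np.diag(s), Vt.T
p = len(s)

print(np.allclose(U @ Lam @ Vt, K))               # K = U Λ V^T
print(np.allclose(K @ V, U @ Lam))                # K V = U Λ
print(np.allclose(K.T @ U, V @ Lam))              # K^T U = V Λ
print(np.allclose(V.T @ V, np.eye(p)), np.allclose(U.T @ U, np.eye(p)))   # V^T V = U^T U = I_p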

27

SINGULAR VECTOR DECOMPOSITION [<<]

Is the neatest way of doing the job.

Express K as

K(m×n) = U(m×p) Λ(p×p) V^T(p×n)

where the bracketed dimensions indicate the sizes of the matrices.

Then the forward model (no noise) becomes:

y(m×1) = K(m×n) x(n×1) = U(m×p) Λ(p×p) V^T(p×n) x(n×1)

so that

U^T y = Λ V^T x

or

y′ = Λ x′

where y′ = U^T y and x′ = V^T x are both of order p.

The rows of V^T, or the columns of V (in state space), are a basis for the row space of K. Similarly the columns of U (in measurement space) are a basis of its column space.

28

SINGULAR VECTOR DECOMPOSITION II

We can also see that an exact solution is

x′ = Λ^{-1} y′

or

x = V Λ^{-1} U^T y

This is only a unique solution if p = n. If p < n any multiples of vectors with zero singular values can be added, and still satisfy the equations.
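
A small sketch of this solution for an underconstrained, noise-free problem (a random K with m = 3 measurements and n = 5 unknowns, made up for illustration): the SVD solution reproduces the measurement exactly, but it does not recover the true state, because null-space components are lost.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
K = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = K @ x_true                                    # noise-free measurement

U, s, Vt = np.linalg.svd(K, full_matrices=False)
x_exact = Vt.T @ np.diag(1.0 / s) @ U.T @ y       # x = V Λ^(-1) U^T y

print(np.allclose(K @ x_exact, y))                # True: it reproduces the measurement exactly
print(np.allclose(x_exact, x_true))               # False: null-space components are not recovered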

29

SVD OF THE STANDARD WEIGHTING FUNCTIONS

30

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

31

APPROACHES TO INVERSE PROBLEMS

• Bayesian Approach

• What is the pdf of the state, given the measurement and the a priori?

• Optimisation Approaches:

• Maximum Likelihood

• Maximum A Posteriori

• Minimum Variance

• Backus-Gilbert - resolution/noise trade-off

• Ad hoc Approaches

• Relaxation

• Exact algebraic solutions

32

BAYESIAN APPROACH

This is the most general approach to the problem (that I know of).

Knowledge is represented in terms of probability density functions:

• P(x) is the a priori p.d.f. of the state, describing what we know about the state before we make the measurement.

• P(y) is the a priori p.d.f. of the measurement.

• P(x, y) is the joint a priori p.d.f. of x and y.

• P(y|x) is the p.d.f. of the measurement given the state - this depends on experimental error and the forward function.

• P(x|y) is the p.d.f. of the state given the measurement - this is what we want to find.

33

BAYES THEOREM

The theorem states:

P(x, y) = P(x|y) P(y)

and of course

P(y, x) = P(y|x) P(x)

so that

P(x|y) = P(y|x) P(x) / P(y)

If we have a prior p.d.f. for x, P(x), and we know statistically how y is related to x via P(y|x), then we can find an un-normalised version of P(x|y), namely P(y|x) P(x), which can be normalised if required.
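
A minimal one-dimensional sketch of the theorem in action, with a made-up Gaussian prior and likelihood; normalising the product on a grid reproduces the analytic answer:

import numpy as np

x = np.linspace(-10.0, 20.0, 3001)                # grid over the scalar state
xa, sa = 5.0, 3.0                                 # prior mean and standard deviation
y, se = 8.0, 1.0                                  # measurement of x and its error std

prior = np.exp(-0.5 * ((x - xa) / sa) ** 2)       # P(x), un-normalised
likelihood = np.exp(-0.5 * ((y - x) / se) ** 2)   # P(y|x), un-normalised
posterior = prior * likelihood                    # P(x|y) is proportional to P(y|x) P(x)
posterior /= posterior.sum() * (x[1] - x[0])      # normalise if required

# Analytic Gaussian result for comparison:
var_hat = 1.0 / (1.0 / sa**2 + 1.0 / se**2)
x_hat = var_hat * (xa / sa**2 + y / se**2)
print(x[np.argmax(posterior)], x_hat)             # both about 7.7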

34

BAYES THEOREM GEOMETRICALLY

35

APPLICATION OF THE BAYESIAN APPROACH

We need explicit forms for the p.d.f’s:

• Assume that experimental error is Gaussian:

− ln P(y|x) = ½ (y − F(x))^T Sε^{-1} (y − F(x)) + const

where F(x) is the Forward model:

y = F(x) + ε

and Sε is the covariance matrix of the experimental error, ε:

Sε = E{ε ε^T} = E{(y − F(x))(y − F(x))^T}

• Assume that the a priori p.d.f. is Gaussian (less justifiable):

− ln P(x) = ½ (x − xa)^T Sa^{-1} (x − xa) + const

i.e. x is distributed normally with mean xa and covariance Sa.

36

APPLICATION OF THE BAYESIAN APPROACH II

• Thus the pdf of the state when the measurements and the a priori are given is:

−2 ln P(x|y) = [y − F(x)]^T Sε^{-1} [y − F(x)] + [x − xa]^T Sa^{-1} [x − xa] + const

• If we want a state estimate x̂ rather than a p.d.f., then we must calculate some function of P(x|y), such as its mean or its maximum:

x̂ = ∫ P(x|y) x dx    or    dP(x|y)/dx = 0

37

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM

The linear problem has a forward model:

F(x) = Kx

so the p.d.f. P(x|y) becomes:

−2 ln P(x|y) = [y − Kx]^T Sε^{-1} [y − Kx] + [x − xa]^T Sa^{-1} [x − xa] + c1

This is quadratic in x, so has to be of the form:

−2 ln P(x|y) = [x − x̂]^T Ŝ^{-1} [x − x̂] + c2

Equate the terms that are quadratic in x:

x^T K^T Sε^{-1} K x + x^T Sa^{-1} x = x^T Ŝ^{-1} x

giving

Ŝ^{-1} = K^T Sε^{-1} K + Sa^{-1}

• ‘The Fisher Information Matrix’.

38

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM II

Equating the terms linear in x^T gives:

(−Kx)^T Sε^{-1} (y) + (x)^T Sa^{-1} (−xa) = x^T Ŝ^{-1} (−x̂)

This must be valid for any x. Cancel the x^T's, and substitute for Ŝ^{-1}:

K^T Sε^{-1} y + Sa^{-1} xa = (K^T Sε^{-1} K + Sa^{-1}) x̂

giving:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} y + Sa^{-1} xa)

The mean x̂ and the covariance Ŝ define the full posterior pdf.
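
A minimal NumPy sketch of this retrieval formula on a scaled-down stand-in for the standard illustration; the weighting-function shape, a priori covariance (10 K, uncorrelated) and 0.5 K noise level are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 20
z = np.linspace(0.0, 9.9, n)
K = np.exp(-0.5 * (z[None, :] - np.linspace(1.0, 8.0, m)[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)

x_true = 250.0 + 20.0 * np.sin(z / 3.0)
x_a = np.full(n, 250.0)                            # a priori state
S_a = 10.0 ** 2 * np.eye(n)                        # a priori covariance
S_e = 0.5 ** 2 * np.eye(m)                         # measurement error covariance
y = K @ x_true + 0.5 * rng.standard_normal(m)

S_a_inv, S_e_inv = np.linalg.inv(S_a), np.linalg.inv(S_e)
S_hat = np.linalg.inv(K.T @ S_e_inv @ K + S_a_inv)        # posterior covariance
x_hat = S_hat @ (K.T @ S_e_inv @ y + S_a_inv @ x_a)       # posterior mean
print(np.sqrt(np.diag(S_hat)).round(2))                   # retrieval uncertainty per level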

39

A GEOMETRIC INTERPRETATION OF THE SOLUTION

40

AN ALGEBRAIC INTERPRETATION OF THE SOLUTION

The expected value is:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} y + Sa^{-1} xa)    (1)

Underconstrained case

There must exist at least one ‘exact’ solution xe = Gy in the sense that K xe = y, i.e. KG = I. For example G = K^T (K K^T)^{-1}.

Replace y by K xe in (1):

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xe + Sa^{-1} xa)

Overconstrained case

The least squares solution xl satisfies K^T Sε^{-1} K xl = K^T Sε^{-1} y.

Inserting this in (1) gives:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xl + Sa^{-1} xa)

41

AN INTERPRETATION OF THE SOLUTION II

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xe + Sa^{-1} xa)

or

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xl + Sa^{-1} xa)

• Both represent a weighted mean of a solution (exact, xe, or least squares, xl) with xa, using relative weights K^T Sε^{-1} K and Sa^{-1} respectively – their Fisher information matrices.

• This is exactly like the familiar combination of scalar measurements x1 and x2 of an unknown x, with variances σ1² and σ2² respectively:

x̂ = (1/σ1² + 1/σ2²)^{-1} (x1/σ1² + x2/σ2²)
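
The same weighted-mean arithmetic in a few lines of Python, with made-up values x1 = 10 ± 1 and x2 = 14 ± 2:

x1, var1 = 10.0, 1.0 ** 2
x2, var2 = 14.0, 2.0 ** 2
x_hat = (x1 / var1 + x2 / var2) / (1.0 / var1 + 1.0 / var2)   # 10.8, pulled towards the more precise x1
var_hat = 1.0 / (1.0 / var1 + 1.0 / var2)                     # 0.8, smaller than either variance
print(x_hat, var_hat)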

42

End of Section

43

INFORMATION CONTENT AND ERROR ANALYSIS

Clive D Rodgers

Atmospheric, Oceanic and Planetary Physics

University of Oxford

ESA Advanced Atmospheric Training Course

September 15th – 20th, 2008

44

INFORMATION CONTENT OF A MEASUREMENT

Information in a general qualitative sense:

Conceptually, what does y tell you about x?

We need to answer this

• to determine if a conceptual instrument design actually works

• to optimise designs

Use the linear problem for simplicity to illustrate the ideas.

y = Kx + ε

45

SHANNON INFORMATION

• The Shannon information content of a measurement of x is the change in the entropy of the probability density function describing our knowledge of x.

• Entropy is defined by:

S{P} = − ∫ P(x) log(P(x)/M(x)) dx

M(x) is a measure function. We will take it to be constant.

• Compare this with the statistical mechanics definition of entropy:

S = −k Σ_i p_i ln p_i

P(x) dx corresponds to p_i. 1/M(x) is a kind of scale for dx.

• The Shannon information content of a measurement is the change in entropy between the p.d.f. before, P(x), and the p.d.f. after, P(x|y), the measurement:

H = S{P(x)} − S{P(x|y)}

What does this all mean?

46

ENTROPY OF A BOXCAR PDF

Consider a uniform p.d.f. in one dimension, constant in (0, a):

P(x) = 1/a    0 < x < a

and zero outside. The entropy is given by

S = − ∫₀ᵃ (1/a) ln(1/a) dx = ln a

Similarly, the entropy of any constant pdf in a finite volume V of arbitrary shape is:

S = − ∫_V (1/V) ln(1/V) dv = ln V

i.e. the entropy is the log of the volume of state space occupied by the p.d.f.

47

ENTROPY OF A GAUSSIAN PDF

Consider the Gaussian distribution:

P(x) = (1 / ((2π)^{n/2} |S|^{1/2})) exp[−½ (x − x̄)^T S^{-1} (x − x̄)]

If you evaluate the entropy of a Gaussian distribution you will find it is proportional to log |S|^{1/2}.

• The contours of P(x) in n-space are ellipsoidal, described by

(x − x̄)^T S^{-1} (x − x̄) = constant

• The principal axes of the ellipsoid are the eigenvectors of S, and their lengths are proportional to the square roots of the corresponding eigenvalues.

• The volume of the ellipsoid is proportional to the root of the product of the eigenvalues, which is proportional to |S|^{1/2}.

• Therefore entropy is the log of the volume enclosed by some particular contour of P(x). A ‘volume of uncertainty’.

48

ENTROPY AND INFORMATION

Information content is the change in entropy,

• i.e. the log of the ratio of the volumes of uncertainty before and after making a measurement.

• A generalisation of ‘signal to noise’.

• In the Boxcar case, the log of the ratio of the volumes before and after

• In the Gaussian case:

H = log |Sa| − log |Ŝ| = − log |(K^T Sε^{-1} K + Sa^{-1})^{-1} Sa^{-1}|

• minus the log of the determinant of the weight of xa in the Bayesian expectation.
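
A short sketch of evaluating this for a small made-up linear problem. The factor of ½ used below reflects the entropy being proportional to log |S|^{1/2}, and the base-2 logarithm (giving the answer in bits) is a choice, not something fixed by the slides.

import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 20
K = rng.standard_normal((m, n))                   # arbitrary weighting functions
S_e = 0.5 ** 2 * np.eye(m)                        # measurement error covariance
S_a = 10.0 ** 2 * np.eye(n)                       # a priori covariance

S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_e) @ K + np.linalg.inv(S_a))
_, logdet_a = np.linalg.slogdet(S_a)
_, logdet_hat = np.linalg.slogdet(S_hat)
H = 0.5 * (logdet_a - logdet_hat) / np.log(2.0)   # ratio of prior and posterior volumes, in bits
print(f"H = {H:.1f} bits")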

49

The log of the ratio of the volumes of the ellipsoids

50

DEGREES OF FREEDOM FOR SIGNAL AND NOISE

The state estimate that maximises P(x|y) in the linear Gaussian case is the one which minimises

χ² = [y − Kx]^T Sε^-1 [y − Kx] + [x − xa]^T Sa^-1 [x − xa]

The r.h.s. initially has m + n degrees of freedom, of which n are fixed by choosing x to be x̂, so the expected value of χ² is m.

These m degrees of freedom can be assigned to degrees of freedom for noise dn and degrees of freedom for signal ds according to:

dn = E{[y − Kx̂]^T Sε^-1 [y − Kx̂]}

and

ds = E{[x̂ − xa]^T Sa^-1 [x̂ − xa]}

Using tr(CD) = tr(DC), we can see that

ds = E{tr([x̂ − xa][x̂ − xa]^T Sa^-1)} = tr(E{[x̂ − xa][x̂ − xa]^T} Sa^-1)    (6)

51

DEGREES OF FREEDOM FOR SIGNAL AND NOISE II

With some manipulation we can find

ds = tr((K^T Sε^-1 K + Sa^-1)^-1 K^T Sε^-1 K) = tr(K Sa K^T (K Sa K^T + Sε)^-1)    (7)

and

dn = tr((K^T Sε^-1 K + Sa^-1)^-1 Sa^-1) + m − n = tr(Sε (K Sa K^T + Sε)^-1)    (8)
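These trace identities are easy to check numerically. Below is a minimal numpy sketch using an arbitrary illustrative K, Sa and Sε (invented for illustration, not the course's standard example); it confirms that the ds and dn of (7) and (8) sum to m, and also evaluates the Shannon information content H = log|Sa| − log|Ŝ| quoted earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 8                          # state and measurement dimensions (illustrative)
K = rng.standard_normal((m, n))      # arbitrary Jacobian, for illustration only
Sa = 4.0 * np.eye(n)                 # a priori covariance (illustrative)
Se = 0.25 * np.eye(m)                # measurement noise covariance (illustrative)

# Retrieval (solution) covariance S-hat
Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))

# Degrees of freedom for signal and noise, eqs (7) and (8)
ds = np.trace(K @ Sa @ K.T @ np.linalg.inv(K @ Sa @ K.T + Se))
dn = np.trace(Se @ np.linalg.inv(K @ Sa @ K.T + Se))
assert np.isclose(ds + dn, m)        # the m measurement degrees of freedom are partitioned

# Shannon information: log of the ratio of prior to posterior uncertainty volumes
H = np.linalg.slogdet(Sa)[1] - np.linalg.slogdet(Shat)[1]
print(f"ds = {ds:.3f}, dn = {dn:.3f}, H = {H:.3f}")
```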

52

INDEPENDENT MEASUREMENTS

If the measurement error covariance is not diagonal, the elements of the y vector will not be

statistically independent. Likewise for the a priori.

The measurements will not be independent functions of the state if K is not diagonal.

Therefore it helps to understand where the information comes from if we transform to a different

basis.

First, statistical independence. Define:

ỹ = Sε^-1/2 y        x̃ = Sa^-1/2 x

The transformed a priori and noise covariances both become unit matrices. [>>]

53

SQUARE ROOTS OF MATRICES

The square root of an arbitrary matrix is defined as A^1/2 where

A^1/2 A^1/2 = A

Using A^n = R Λ^n L^T for n = 1/2:

A^1/2 = R Λ^1/2 L^T

This square root of a matrix is not unique, because the diagonal elements of Λ^1/2 in R Λ^1/2 L^T can have either sign, leading to 2^n possibilities.

We only use square roots of symmetric covariance matrices. In this case S^1/2 = L Λ^1/2 L^T is symmetric.

54

SQUARE ROOTS OF SYMMETRIC MATRICES

Symmetric matrices can also have non-symmetric roots satisfying S = (S^1/2)^T S^1/2, of which the Cholesky decomposition

S = T^T T

where T is upper triangular, is the most useful.

There are an infinite number of non-symmetric square roots: if S^1/2 is a square root, then clearly so is X S^1/2, where X is any orthonormal matrix.

The inverse symmetric square root is S^-1/2 = L Λ^-1/2 L^T.

The inverse Cholesky decomposition is S^-1 = T^-1 T^-T.

The inverse square root T^-1 is triangular, and its numerical effect is implemented efficiently by back substitution.
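Both kinds of root are available from standard routines. A minimal numpy sketch, with a small illustrative covariance matrix (not one of the course examples):

```python
import numpy as np

# An illustrative symmetric, positive definite covariance matrix
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])

# Symmetric square root via the eigendecomposition S = L Lambda L^T
lam, L = np.linalg.eigh(S)
S_half = L @ np.diag(np.sqrt(lam)) @ L.T
assert np.allclose(S_half @ S_half, S)

# Cholesky root: numpy returns a lower-triangular C with S = C C^T,
# so T = C^T is the upper-triangular factor satisfying S = T^T T
T = np.linalg.cholesky(S).T
assert np.allclose(T.T @ T, S)

# Applying the inverse root whitens a vector whose covariance is S:
# cov(T^-T y) = T^-T S T^-1 = I (in practice done by back substitution,
# not by forming the explicit inverse as here)
W = np.linalg.inv(T).T
assert np.allclose(W @ S @ W.T, np.eye(3))
```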

55

INDEPENDENT MEASUREMENTS[<<]

If the measurement error covariance is not diagonal, the elements of the y vector will not be

statistically independent. Likewise for the a priori.

The measurements will not be independent functions of the state if K is not diagonal.

Therefore it helps to understand where the information comes from if we transform to a different

basis.

First, statistical independence. Define:

ỹ = Sε^-1/2 y        x̃ = Sa^-1/2 x

The transformed a priori and noise covariances both become unit matrices.

The forward model becomes:

ỹ = K̃x̃ + ε̃

where K̃ = Sε^-1/2 K Sa^1/2.

The solution covariance becomes:

Ŝ = (In + K̃^T K̃)^-1

57

TRANSFORM AGAIN

Now make K̃ diagonal. Rotate both x̃ and ỹ to yet another basis, defined by the singular vectors of K̃:

ỹ = K̃x̃ + ε̃   →   ỹ = UΛV^T x̃ + ε̃

Define:

x′ = V^T x̃        y′ = U^T ỹ        ε′ = U^T ε̃

The forward model becomes:

y′ = Λx′ + ε′    (1)

The Jacobian is now diagonal, Λ, and the a priori and noise covariances are still unit matrices, hence the solution covariance becomes:

Ŝ = (In + K̃^T K̃)^-1   →   S′ = (In + Λ²)^-1

which is diagonal, and the solution itself is

x̂′ = (In + Λ²)^-1 (Λy′ + x′a)

not x̂′ = Λ^-1 y′ as you might expect from (1).

• Elements for which λi ≫ 1, or equivalently (1 + λi²)^-1 ≪ 1, are well measured

• Elements for which λi ≪ 1, or equivalently (1 + λi²)^-1 ≈ 1, are poorly measured.
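The two transformations are compact in code. Below is a minimal numpy sketch with arbitrary illustrative K, Sa and Sε (not the standard example): it pre-whitens with the symmetric inverse square roots, rotates to the singular-vector basis, and evaluates the per-component information content and degrees of freedom for signal discussed on the next two slides (the information is expressed in bits, as in Table 1 later).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 10
K  = rng.standard_normal((m, n))     # illustrative Jacobian
Sa = 3.0 * np.eye(n) + 0.5           # illustrative (correlated) a priori covariance
Se = 0.1 * np.eye(m)                 # illustrative noise covariance

def sym_power(S, p):
    """Symmetric matrix power S^p via the eigendecomposition."""
    lam, L = np.linalg.eigh(S)
    return L @ np.diag(lam**p) @ L.T

# Pre-whiten: K_tilde = Se^(-1/2) K Sa^(1/2); prior and noise covariances become I
Kt = sym_power(Se, -0.5) @ K @ sym_power(Sa, 0.5)

# Rotate to the singular-vector basis: the transformed Jacobian is diagonal (Lambda)
U, lam, VT = np.linalg.svd(Kt, full_matrices=False)

H_i  = 0.5 * np.log2(1.0 + lam**2)   # information per component, in bits
ds_i = lam**2 / (1.0 + lam**2)       # degrees of freedom for signal per component

print("singular values:", lam)
print("H_i (bits)     :", H_i)
print("ds_i           :", ds_i, "  total ds =", ds_i.sum())
```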

58

INFORMATION

Shannon Information in the Transformed Basis

Because it is a ratio of volumes, the linear transformation does not change the information

content. So consider information in the x′, y′ system:

H = S{S′a} − S{Ŝ′}
  = ½ log|In| − ½ log|(Λ² + In)^-1|
  = Σi ½ log(1 + λi²)    (9)

59

DEGREES OF FREEDOM

Degrees of Freedom in the Transformed Basis

The number of independent quantities measured can be thought of as the number of singular values for which λi ≳ 1.

The degrees of freedom for signal is

ds = Σi λi² (1 + λi²)^-1

It is also the sum of the eigenvalues of In − Ŝ.

Summary: for each independent component x′i

• The information content is ½ log(1 + λi²)

• The degrees of freedom for signal is λi² (1 + λi²)^-1

60

THE OBSERVABLE AND NULL SPACES

• The part of state space that the measurement can see is that spanned by the weighting functions.

Anything outside that is in the null space of K.

• Any orthogonal linear combination of the weighting functions will form a basis (coordinate

system) for the observable space. An example is those singular vectors of K which have non-zero

singular values.

• The vectors which have zero singular values form a basis for the null space.

• Any component of the state in the null space maps onto the origin in measurement space.

• This implies that there are distinct states, in fact whole subspaces, which map onto the same

point in measurement space, and cannot be distinguished by the measurement.

61

THE NEAR NULL SPACE

However

• the solution can have components in the null space – from the a priori.

• components observable in principle can have near-zero contributions from the measurement: the ‘near null space’.

In the x′, y′ system:

• vectors with λ = 0 are in the null space

• vectors with λ ≪ 1 are in the near null space

• vectors with λ ≳ 1 are in the non-null space, and are observable.

62

Diagnostics for the Standard Case

Table 1: Singular values of K, together with contributions of each vector to the degrees of freedom

ds and information content H for both covariance matrices. Measurement noise variance is 0.25 K².

Diagonal covariance Full covariance

i λi dsi Hi (bits) λi dsi Hi (bits)

1 6.51929 0.97701 2.72149 27.81364 0.99871 4.79865

2 4.79231 0.95827 2.29147 18.07567 0.99695 4.17818

3 3.09445 0.90544 1.70134 9.94379 0.98999 3.32105

4 1.84370 0.77269 1.06862 5.00738 0.96165 2.35227

5 1.03787 0.51858 0.52731 2.39204 0.85123 1.37443

6 0.55497 0.23547 0.19368 1.09086 0.54337 0.56546

7 0.27941 0.07242 0.05423 0.46770 0.17948 0.14270

8 0.13011 0.01665 0.01211 0.17989 0.03135 0.02297

totals 4.45653 8.57024 5.55272 16.75571

Diagonal covariance: Sa = 100 I K²

Full covariance: Saij = 100 exp(−|zi − zj| / h) K²

where h = 1 in ln(p) coordinates.

63

FTIR Measurements of CO

30 levels; 894 measurements:

Apparently heavily overconstrained.

Table 2: Singular Values of K

5.345 3.498 0.033 0.0046 7.15e-4 2.56e-4

7.76e-5 2.13e-3 1.71e-5 1.38e-5 9.01e-6 6.73e-6

5.82e-6 4.79e-6 2.87e-6 3.52e-6 3.75e-6 1.91e-6

9.83e-7 2.37e-7 7.71e-7 1.18e-7 1.48e-6 1.95e-7

1.37e-7 6.67e-8 3.50e-8 3.37e-8 5.83e-9 6.29e-9

Noise is 0.03 in these units;

Prior variance is about 1.

There are 2.5 degrees of freedom.

Lots of near null space.
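The quoted 2.5 degrees of freedom can be reproduced approximately from Table 2 alone. The sketch below assumes that 0.03 is the noise standard deviation and that the prior standard deviation is 1 (the slide only says “noise is 0.03” and “prior variance is about 1”), scales each singular value accordingly, and sums λi²/(1 + λi²):

```python
import numpy as np

# Singular values of K copied from Table 2 (FTIR CO example)
sv = np.array([
    5.345,   3.498,   0.033,   0.0046,  7.15e-4, 2.56e-4,
    7.76e-5, 2.13e-3, 1.71e-5, 1.38e-5, 9.01e-6, 6.73e-6,
    5.82e-6, 4.79e-6, 2.87e-6, 3.52e-6, 3.75e-6, 1.91e-6,
    9.83e-7, 2.37e-7, 7.71e-7, 1.18e-7, 1.48e-6, 1.95e-7,
    1.37e-7, 6.67e-8, 3.50e-8, 3.37e-8, 5.83e-9, 6.29e-9,
])

noise_std = 0.03   # assumed to be a standard deviation in these units
prior_std = 1.0    # "prior variance is about 1"

lam = sv * prior_std / noise_std             # singular values of the whitened Jacobian
ds = np.sum(lam**2 / (1.0 + lam**2))         # degrees of freedom for signal
near_null = int(np.sum(lam < 0.1))           # components contributing almost nothing

print(f"ds = {ds:.2f}  (the slide quotes 2.5)")
print(f"near null space: {near_null} of {lam.size} components")
```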

64

A SIMULATED RETRIEVAL

——— ’true state’

· · · · a priori

— — noise free simulation

— · — with simulated noise

How do we understand the nature of a retrieval?

65

ERROR ANALYSIS AND CHARACTERISATION

Conceptually

• The measurement y is a function of some unknown state x:

y = f(x, b) + ε

where:

• y is the measurement vector, length m

• x is the state vector, length n

• f is a function describing the physics of the measurement,

• b is a set of ‘known’ parameters of this function,

• ε is measurement error, with covariance Sε.

66

ERROR ANALYSIS & CHARACTERISATION II

• The retrieval x̂ is a function of the form:

x̂ = R(y, b̂, c)

where:

• R represents the retrieval method

• b̂ is our estimate of the forward function parameters b

• c represents any parameters used in the inverse method that do not affect the measurement, e.g. the a priori.

The Transfer Function

Thus the retrieval is related to the ‘truth’ x formally by:

x̂ = R(f(x, b) + ε, b̂, c)

which may be regarded as the transfer function of the measurement and retrieval system as a whole:

x̂ = T(x, b, ε, b̂, c)

67

ERROR ANALYSIS & CHARACTERISATION III

Characterisation means evaluating:

• ∂x̂/∂x = A, sensitivity to the actual state: the Averaging Kernel Matrix

Error analysis involves evaluating:

• ∂x̂/∂ε = Gy, sensitivity to noise (or to the measurement!)

• ∂x̂/∂b = Gb, sensitivity to non-retrieved parameters

• ∂x̂/∂c = Gc, sensitivity to retrieval method parameters

and understanding the effect of replacing f by a numerical Forward Model F.

68

THE FORWARD MODEL

We often need to approximate the forward function by a Forward Model:

F(x, b) ≃ f(x, b)

where F models the physics of the measurement, including the instrument, as well as we can.

• It usually has parameters b which have experimental error

• The vector b is not a target for retrieval

• There may be parameters b′ of the forward function that are not included in the forward model:

F(x, b) ≃ f(x, b, b′)

69

LINEARISE THE TRANSFER FUNCTION

The retrieved quantity is expressed as:

x̂ = R(f(x, b, b′) + ε, b̂, xa, c)

where we have also separated xa, the a priori, from the other components of c.

Replace f by F + ∆f:

x̂ = R(F(x, b) + ∆f(x, b, b′) + ε, b̂, xa, c)

where ∆f is the error in the forward model relative to the real physics:

∆f = f(x, b, b′) − F(x, b)

Linearise F about x = xa, b = b̂:

x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε, b̂, xa, c)

where Kx = ∂F/∂x (the weighting function) and Kb = ∂F/∂b.

70

LINEARISE THE TRANSFER FUNCTION II

x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε, b̂, xa, c)

Linearise R with respect to its first argument, y:

x̂ = R[F(xa, b̂), b̂, xa, c] + Gy[Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε]

where Gy = ∂R/∂y (the contribution function).

71

CHARACTERISATION

Some rearrangement gives:

x̂ − xa = R(F(xa, b̂), b̂, xa, c) − xa      bias
        + A(x − xa)                       smoothing
        + Gy εy                           error        (10)

where

A = GyKx = ∂x̂/∂x

the averaging kernel, and

εy = Kb(b − b̂) + ∆f(x, b, b′) + ε

is the total error in the measurement relative to the forward model.
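For the linear Gaussian retrieval the contribution function and averaging kernel have the familiar closed forms Gy = (Kx^T Sε^-1 Kx + Sa^-1)^-1 Kx^T Sε^-1 and A = GyKx. A minimal numpy sketch with illustrative matrices (not the course example); it also evaluates the row areas of A and tr(A), which by eq (7) is the degrees of freedom for signal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 10
Kx = rng.standard_normal((m, n))    # illustrative weighting function matrix
Sa = 2.0 * np.eye(n)                # illustrative a priori covariance
Se = 0.05 * np.eye(m)               # illustrative noise covariance

# Gain (contribution function) and averaging kernel for the linear Gaussian retrieval
Shat = np.linalg.inv(Kx.T @ np.linalg.inv(Se) @ Kx + np.linalg.inv(Sa))
Gy = Shat @ Kx.T @ np.linalg.inv(Se)
A = Gy @ Kx                         # A = Gy Kx = d(xhat)/dx

areas = A.sum(axis=1)               # row areas: rough measure of real information per level
print("averaging kernel areas:", areas)
print("degrees of freedom for signal, tr(A):", np.trace(A))
```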

72

BIAS

This is the error you would get on doing a simulated error-free retrieval of the a priori.

A priori is what you know about the state before you make the measurement. Any reasonable

retrieval method should return the a priori given a measurement vector that corresponds to it, so

the bias should be zero.

But check your system to be sure it is...

73

THE AVERAGING KERNEL

The retrieved state is a smoothed version of the true state, with smoothing functions given by the rows of A, plus error terms:

x̂ = xa + A(x − xa) + Gyεy = (I − A)xa + Ax + Gyεy

You can either:

• accept that the retrieval is an estimate of a smoothed state, not the true state

or

• consider the retrieval as an estimate of the true state, with an error contribution due to smoothing.

The error analysis is different in the two cases, because in the second case there is an extra term.

• If the state represents a profile, then the averaging kernel is a smoothing function with a width and an area.

• The width is a measure of the resolution of the observing system

• The area (generally between zero and unity) is a simple measure of the amount of real information that appears in the retrieval.

74

STANDARD EXAMPLE: AVERAGING KERNEL

75

STANDARD EXAMPLE: RETRIEVAL GAIN

76

THE OTHER TWO PARAMETERS

“The retrieved quantity is expressed as:

x̂ = R(f(x, b, b′) + ε, b̂, xa, c)

where we have also separated xa, the a priori, from other components of c.”

We should also look at the sensitivity of the retrieval to the inverse model parameters, xa and c.

The linear expansion in x, b and ε gave:

x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gyεy

We argued that the term in square brackets should be equal to xa, for any reasonable retrieval.

This has the consequence that, at least to first order, ∂R/∂c = 0 for any reasonable retrieval.

77

THE OTHER TWO PARAMETERS II

x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gyεy

It also follows that:

∂R/∂xa = ∂x̂/∂xa = In − A

for any reasonable retrieval.

Thus the reasonableness criterion has the consequence that there can be no inverse model parameters c that matter, other than xa.

Incidentally:

The term in square brackets should not depend on b̂ either.

This implies that Gb + GyKb = 0, which is no more than:

∂x̂/∂b̂ + (∂x̂/∂y)(∂y/∂b̂) = 0
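The first identity is easy to demonstrate for the linear Gaussian retrieval, which can be written x̂ = xa + Gy(y − Kxa). A minimal sketch with illustrative matrices (the helper name retrieve is just local to the example); because the retrieval is linear in xa, the perturbation test holds exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 12
K  = rng.standard_normal((m, n))    # illustrative Jacobian
Sa = np.eye(n)                      # illustrative a priori covariance
Se = 0.1 * np.eye(m)                # illustrative noise covariance

Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

def retrieve(y, xa):
    # linear Gaussian retrieval in 'a priori plus correction' form
    return xa + Gy @ (y - K @ xa)

y = rng.standard_normal(m)          # some fixed measurement
xa = np.zeros(n)
dxa = 1e-3 * rng.standard_normal(n) # perturbation of the a priori

dxhat = retrieve(y, xa + dxa) - retrieve(y, xa)
assert np.allclose(dxhat, (np.eye(n) - A) @ dxa)   # d(xhat)/d(xa) = In - A
```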

78

ERROR ANALYSIS

Some further rearrangement gives for the error in x̂:

x̂ − x = (A − I)(x − xa)      smoothing
       + Gy εy                measurement error

where the bias term has been dropped, and:

εy = Kb(b − b̂) + ∆f(x, b, b′) + ε

Thus the error sources can be split up as:

x̂ − x = (A − I)(x − xa)       smoothing
       + GyKb(b − b̂)          model parameters
       + Gy∆f(x, b, b′)        modelling error
       + Gyε                   measurement noise

Some of these are easy to estimate, some are not.

79

MEASUREMENT NOISE

measurement noise = Gyε

This is the easiest component to evaluate.

ε is usually random noise, and is often unbiassed and uncorrelated between channels, and has a

known covariance matrix.

The covariance of the measurement noise is:

Sn = Gy Sε Gy^T

Note that Sn is not necessarily diagonal, so there will be errors which are correlated between

different elements of the state vector.

This is true of all of the error sources.

80

SMOOTHING ERROR

To estimate the actual smoothing error, you need to know the true state:

smoothing error = (A− I)(x− xa)

To characterise the statistics of this error, you need its mean and covariance over some ensemble.

The mean should be zero.

The covariance is:

Ss = E{(A − I)(x − xa)(x − xa)^T (A − I)^T}
   = (A − I) E{(x − xa)(x − xa)^T} (A − I)^T
   = (A − I) Sa (A − I)^T

where Sa is the covariance of an ensemble of states about the a priori state. This is best regarded

as the covariance of a climatology.

To estimate the smoothing error, you need to know the climatological covariance matrix.

To do the job properly, you need the real climatology, not just some ad hoc matrix that has been

used as a constraint in the retrieval. The real climatology is often not available. Much of the

smoothing error can be in fine spatial scales that may never have been measured.
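In the linear Gaussian case the smoothing and noise terms are simple matrix sandwiches, and if the climatological covariance is taken to be the same Sa used as the retrieval constraint they add up exactly to the solution covariance Ŝ. A minimal numpy sketch with illustrative matrices, anticipating the error components plotted on the next slide:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 12
K  = rng.standard_normal((m, n))   # illustrative Jacobian
Sa = np.eye(n)                     # a priori constraint, here also used as the climatology
Se = 0.2 * np.eye(m)               # illustrative noise covariance

Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

Ss = (A - np.eye(n)) @ Sa @ (A - np.eye(n)).T   # smoothing error covariance
Sn = Gy @ Se @ Gy.T                             # retrieval noise covariance

# With the climatology equal to the constraint, smoothing + noise = solution covariance
assert np.allclose(Ss + Sn, Shat)

print("smoothing error std :", np.sqrt(np.diag(Ss)))
print("retrieval noise std :", np.sqrt(np.diag(Sn)))
print("total error std     :", np.sqrt(np.diag(Ss + Sn)))
```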

81

STANDARD EXAMPLE: ERROR COMPONENTS

Standard Deviations:

——— A Priori

· · · · Smoothing error

— — Retrieval noise

— · — Total retrieval error

82

FORWARD MODEL ERRORS

Forward model parameters

forward model parameter error = GyKb(b − b̂)

This one is easy (in principle).

If you have estimated the forward model parameters properly, their individual errors will be unbiassed, so the mean error will be zero.

The covariance is:

Sf = Gy Kb Sb Kb^T Gy^T

where Sb is the error covariance matrix of b̂, namely E{(b − b̂)(b − b̂)^T}.

However, remember that this is most probably a systematic, not a random, error.
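Propagating the parameter errors is a one-line sandwich once Gy and Kb are available. A minimal sketch with illustrative matrices (Kb and Sb here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, nb = 6, 10, 3
K  = rng.standard_normal((m, n))    # Jacobian with respect to the state (illustrative)
Kb = rng.standard_normal((m, nb))   # Jacobian with respect to the model parameters (illustrative)
Sa = np.eye(n)
Se = 0.1 * np.eye(m)
Sb = np.diag([0.01, 0.04, 0.02])    # assumed error covariance of the parameter estimates

Gy = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa)) @ K.T @ np.linalg.inv(Se)

Sf = Gy @ Kb @ Sb @ Kb.T @ Gy.T     # forward model parameter error covariance
print("parameter-error std dev in the retrieval:", np.sqrt(np.diag(Sf)))
```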

83

FORWARD MODEL ERRORS II

Modelling error

modelling error = Gy∆f = Gy(f(x, b, b′)− F(x, b))

Note that this is evaluated at the true state, and with the true value of b, but hopefully its

sensitivity to these quantities is not large.

This can be hard to evaluate, because it requires a model f which includes the correct physics. If F is simply a numerical approximation for efficiency’s sake, it may not be too difficult, but if f is not

known in detail, or so horrendously complex that no proper model is feasible, then modelling error

can be tricky to estimate.

This is also usually a systematic error.

84

ERROR COVARIANCE MATRIX

An Error Covariance Matrix S is defined as

Sij = E{εiεj}

Diagonal elements are the familiar error variances.

The corresponding probability density function (PDF), if Gaussian, is:

P(y) ∝ exp(−½ (y − ȳ)^T S^-1 (y − ȳ))

Contours of the PDF are

(y − ȳ)^T S^-1 (y − ȳ) = const

i.e. ellipsoids, with arbitrary axes.

85

CORRELATED ERRORS

• How do we understand an error covariance matrix?

• What corresponds to error bars for a profile?

We would really like a PDF of the form

P(z) ∝ Πi exp(−zi² / 2σi²)

i.e. each zi to have independent errors.

This can be done by diagonalising S, substituting S = LΛL^T, where L is the matrix of eigenvectors li. Then

P(x) ∝ exp(−½ (x − x̄)^T L Λ^-1 L^T (x − x̄))

So if we put z = L^T(x − x̄), then the zi are independent, with variance λi.

86

ERROR PATTERNS

• Error covariance is described in state space by an ellipsoid

• We have, in effect, transformed to the principal axes of the ellipsoid

• We can express the error in e.g. a state vector estimate x̂, due to an error covariance S, in terms of Error Patterns ei = λi^1/2 li, such that the total error is of the form

x̂ − x = Σi βi ei

where the error patterns are orthogonal, and the coefficients βi are independent random variables with unit variance.
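The construction is a few lines of numpy: diagonalise an illustrative covariance S, scale the eigenvectors to get the error patterns, and check that errors synthesised as Σ βi ei with unit-variance coefficients reproduce S on average:

```python
import numpy as np

rng = np.random.default_rng(6)

# An illustrative symmetric, positive definite error covariance
B = rng.standard_normal((5, 5))
S = B @ B.T + 5.0 * np.eye(5)

lam, L = np.linalg.eigh(S)          # S = L Lambda L^T
patterns = L * np.sqrt(lam)         # column i is the error pattern e_i = sqrt(lam_i) l_i

# Synthesise many error realisations as sums of patterns with unit-variance coefficients
beta = rng.standard_normal((5, 200_000))
errors = patterns @ beta
S_sample = np.cov(errors)           # sample covariance should approach S

print("max |S_sample - S| =", np.max(np.abs(S_sample - S)))
```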

87

STANDARD EXAMPLE: NOISE ERROR PATTERNS

There are only eight, same as the rank of the problem.

88

STANDARD EXAMPLE: SMOOTHING ERROR PATTERNS

The largest ten out of one hundred, the dimension of the state vector.

89

Unsolved Problem

How do you describe an error covariance to the naive data user?

90

End of Section