1

PRINCIPLES OF RETRIEVAL THEORY

Clive D Rodgers

Atmospheric, Oceanic and Planetary Physics

University of Oxford

ESA Advanced Atmospheric Training Course

September 15th – 20th, 2008

2

ATMOSPHERIC REMOTE SENSING: THE INVERSE PROBLEM

Topics

1. Introduction

2. Bayesian approach

3. Information content

4. Error analysis and characterisation

3

ADVERTISEMENT

C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific Publishing Co., 2000.

4

WHAT IS AN INVERSE OR RETRIEVAL PROBLEM?

• Almost any measurement you make...

• When you measure some function of the quantity you really want, you have a retrieval problem.

• Sometimes it’s trivial, sometimes it isn’t.

• Various aspects:

• Formulate the problem properly:

◦ Describe the measurement in terms of some Forward Model

◦ Don’t forget experimental error!

• Finding a solution, inverting the forward model

◦ Algebraic

◦ Numerical

◦ No unique solution

◦ No solution at all

• Finding the ‘best’ solution

◦ Uniqueness - a unique solution may not be the best...

◦ Accuracy

◦ Efficiency

• Understanding the answer

5

THINGS TO THINK ABOUT

• Why isn’t the problem trivial?

• Forward models which are not explicitly invertible

• Ill-conditioned or ill-posed problems

• Errors in the measurement (and in the forward model) can map into errors in the solution in a non-trivial way.

• What to measure?

• Does it actually contain the information you want?

• Updating existing knowledge

• You always have some prior knowledge of the ‘unknown’

• the measurement improves that knowledge

• the measurement may not be enough by itself to completely determine the unknown

• Ill-posed problems

• You cannot solve an ill-posed problem. You have to convert it into a well-posed problem.

• Which of an infinite manifold of solutions do you want?

6

MATHEMATICAL CONCEPTS I

• Measurement Vector: y = (y1, y2, ...ym)

• Any measurement is of a finite number of quantities.

• Arrange them as a vector for computational purposes

• State Vector: x = (x1, x2, ...xn)

• The desired quantity is often continuous - e.g. a temperature profile

• We can only make a finite number of measurements and calculations

• Express the unknown in terms of a finite number of parameters

• They do not all have to be of the same type

• Arrange them as a vector for computational purposes

• Examples:

◦ Temperature on a set of pressure levels, with a specified interpolation rule.

◦ Fourier coefficients for a set of waves

7

MATHEMATICAL CONCEPTS II

Using vectors, it is convenient to think in terms of linear algebra and vector spaces - even if the forward model is not linear.

Measurement Space

– Measurement space is the space of measurement vectors, dimension m.

State Space

– State space is the space of state vectors, dimension n.

Generally the two vector spaces will have different dimensions.

8

MATHEMATICAL CONCEPTS III

Forward Function and Model

– The Forward Function f(x) maps from state space onto measurement space, depending on the physics of the measurement.

– The Forward Model F(x) is the best we can do in the circumstances to model the forward function.

Inverse or Retrieval Method

– The inverse problem is one of finding an inverse mapping R(y):

Given a point in measurement space, which point or set of points in state space could have mapped into it?

9

NOTATION

I have tried to make it mnemonic as far as possible:

Matrices              Bold upper case
Vectors               Bold lower case
State vectors         x
Measurement vectors   y
Covariance matrices   S (based on σ, and not wanting to use Σ)
Measurement error     ε
Forward model         F(x) (Really ought to be f(x), it's a vector)
Jacobian              K (originally a Kernel of an integral transform)
Gain matrix           G (I've used up K, which might stand for Kalman gain)
Averaging Kernel      A

Different vectors, matrices of the same type are distinguished by superscripts, subscripts, etc:

a priori              xa (background)
estimate              x̂
first guess           x0
'true' value          x (no subscript)

10

STANDARD ILLUSTRATION

Idealised thermal-emission nadir sounder represented as a linear forward model:

y = Kx + ε

K is the ‘weighting function’ matrix, ε is measurement error or noise.

• Vertical coordinate is notionally ln(p), discretised at 100 levels from 0 in steps of 0.1 to 9.9 – around 0 to 70 km.

• Eight channels (elements of y).

• State vector is notionally temperature at 100 levels.

• Measurement error (when considered) is 0.5 K.
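
A minimal NumPy sketch may help make this setup concrete. The Gaussian-in-ln(p) shape chosen for the weighting functions and the synthetic temperature profile are assumptions for illustration only; the slides do not specify the exact functions.

import numpy as np

# Minimal sketch of the standard illustration: 8 channels, 100 levels on a
# ln(p)-like grid from 0 to 9.9, forward model y = K x + eps with 0.5 K noise.
rng = np.random.default_rng(0)
n_levels, n_chan = 100, 8
z = 0.1 * np.arange(n_levels)                        # vertical coordinate (0, 0.1, ..., 9.9)
peaks = np.linspace(1.0, 8.0, n_chan)                # assumed peak heights of the 8 channels

K = np.exp(-0.5 * (z[None, :] - peaks[:, None]) ** 2)   # rows of K are the weighting functions
K /= K.sum(axis=1, keepdims=True)                       # normalise each row

x_true = 250.0 + 20.0 * np.sin(z / 3.0)              # stand-in temperature profile [K]
y = K @ x_true + 0.5 * rng.standard_normal(n_chan)   # y = Kx + eps, 0.5 K measurement error
print(y.round(2))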

11

STANDARD WEIGHTING FUNCTIONS

12

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

13

NOISE FREE MEASUREMENTS

Row Space and Null Space

Consider an error-free linear measurement, equivalent to solving linear equations:

y = Kx

The rows of K are the weighting functions k_i:

y_i = k_i^T x

The k_i are a set of vectors in state space; the measurements are projections of the state x onto them.

They span a subspace called the row space, of dimension equal to the rank of K, p ≤ min(n, m). If p < m then the weighting functions are not linearly independent.

Only those components of x in the row space can be measured.

The null space is the part of state space which is not in the row space.
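
A small NumPy check of this idea, on an arbitrary made-up K with fewer measurements than unknowns: any component of the state lying in the null space is invisible in the noise-free measurement.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
K = rng.standard_normal((m, n))                   # arbitrary weighting functions, m < n

U, s, Vt = np.linalg.svd(K)
p = int(np.sum(s > 1e-10 * s.max()))              # numerical rank = dimension of the row space
null_basis = Vt[p:].T                             # columns span the null space (dimension n - p)

x = rng.standard_normal(n)
dx = null_basis @ rng.standard_normal(n - p)      # a perturbation lying entirely in the null space
print(np.allclose(K @ x, K @ (x + dx)))           # True: this part of the state cannot be measured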

14

ILL-POSED AND WELL-POSED PROBLEMS

Ill or well posed? · · · Under- or over-determined? · · · Under- or over-constrained?

Which is which?

1. p = m = n. Well posed.

The number of unknowns is equal to the number of measurements, and they are all independent.

2. p = m < n. Underconstrained, ill-posed

More unknowns than measurements, but the measurements are all independent.

3. p = n < m. Overconstrained, ill-posed

More measurements than unknowns, so they could be inconsistent, but the unknowns are all in the row space, so there is information about all of them.

15

ILL-POSED AND WELL-POSED PROBLEMS II

Another category: Mixed-determined, the problem is both underconstrained and overconstrained.

1. p < m = n.

The number of unknowns is equal to the number of measurements, but the measurements are not independent, so they could be inconsistent, and the number of independent pieces of information is less than the number of unknowns. Simple example:

y1 = x1 + x2 + ε1    (1)

y2 = x1 + x2 + ε2    (2)

2. p < m < n.

More unknowns than measurements, but the measurements are not independent, so they could be inconsistent.

3. p < n < m.

More measurements than the rank, so they could be inconsistent, more unknowns than the rank, so not all are defined by the measurement.

16

ILL-POSED AND WELL-POSED PROBLEMS III

Summary

If p < n then the system is underconstrained; there is a null space.

If p < m then the system is overconstrained in some part of the row space.

How do we identify the row and null spaces?

One straightforward way is Gram-Schmidt orthogonalisation, but . . .

17

SINGULAR VECTOR DECOMPOSITION [>>]

Is the neatest way of doing the job.

18

MATRIX ALGEBRA – EIGENVECTORS

The eigenvalue problem associated with an arbitrary square matrix A, of order n, is to find eigenvectors l and scalar eigenvalues λ which satisfy

A l = λ l

If A is a coordinate transformation, then l has the same representation in the untransformed and transformed coordinates, apart from a factor λ.

This is the same as (A − λI) l = 0, a homogeneous set of equations, which can only have a solution if |A − λI| = 0, giving a polynomial equation of degree n, with n solutions for λ. They will be complex in general.

An eigenvector can be scaled by an arbitrary factor. It is conventional to normalise them so that

l^T l = 1   or   l† l = 1 (Hermitian adjoint)

19

EIGENVECTORS II

We can assemble the eigenvectors as columns in a matrix L:

AL = LΛ

where Λ is a diagonal matrix, with the eigenvalues on the diagonal.

Transpose:                     L^T A^T = Λ L^T

Multiply by R = (L^T)^{-1}:    A^T = R Λ L^T

Postmultiply by R:             A^T R = R Λ

Thus:

R is the matrix of eigenvectors of A^T.

A^T has the same eigenvalues as A.

In the case of a Symmetric matrix, S = S^T, we must have L = R, so that L^T L = L L^T = I or L^T = L^{-1}, and the eigenvectors are orthogonal.

In this case the eigenvalues are real.
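
A quick numerical check of the symmetric case, using an arbitrary random symmetric (positive definite) matrix standing in for a covariance:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B @ B.T                                       # symmetric, positive definite

lam, L = np.linalg.eigh(S)                        # eigendecomposition for symmetric matrices
print(np.allclose(L.T @ L, np.eye(4)))            # L^T L = I: eigenvectors are orthonormal
print(np.allclose(L @ np.diag(lam) @ L.T, S))     # S = L Λ L^T
print(lam)                                        # eigenvalues are real (here also positive)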

20

EIGENVECTORS – GEOMETRIC INTERPRETATION

Consider the scalar equation:

x^T S x = 1

where S is symmetric. This is the equation of a quadratic surface centered on the origin, in n-space.

The normal to the surface is the vector ∂(x^T S x)/∂x, i.e. Sx, and x is the radius vector, so

S x = λ x

is the problem of finding points where the normal and the radius vector are parallel. These are where the principal axes intersect the surface.

At these points, x^T S x = 1, so x^T λ x = 1, or:

λ = 1 / (x^T x)

So the eigenvalues are the reciprocals of the squares of the lengths of the principal axes.

21

GEOMETRIC INTERPRETATION II

The lengths are independent of the coordinate system, so will also be invariant under an arbitrary orthogonal transformation, i.e. one in which (distance)² = x^T x is unchanged.

Consider using the eigenvectors of S to transform the equation for the quadratic surface:

x^T L Λ L^T x = 1   or   y^T Λ y = 1   or   Σ_i λ_i y_i² = 1

where y = L^T x or x = L y. This transforms the surface into its principal axis representation.

22

EIGENVECTORS - USEFUL RELATIONSHIPS

Asymmetric Matrices                              Symmetric Matrices

A R = R Λ                                        S L = L Λ
L^T A = Λ L^T
L^T = R^{-1},  R^T = L^{-1}                      L^T = L^{-1}
L R^T = L^T R = I                                L L^T = L^T L = I
A = R Λ L^T = Σ_i λ_i r_i l_i^T                  S = L Λ L^T = Σ_i λ_i l_i l_i^T
A^T = L Λ R^T = Σ_i λ_i l_i r_i^T
A^{-1} = R Λ^{-1} L^T                            S^{-1} = L Λ^{-1} L^T
A^n = R Λ^n L^T                                  S^n = L Λ^n L^T
L^T A R = Λ                                      L^T S L = Λ
L^T A^n R = Λ^n                                  L^T S^n L = Λ^n
L^T A^{-1} R = Λ^{-1}                            L^T S^{-1} L = Λ^{-1}
|A| = Π_i λ_i                                    |S| = Π_i λ_i
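
A short numerical check of a few of the asymmetric-matrix relations above, using an arbitrary random matrix; the left-eigenvector matrix is obtained from L^T = R^{-1}:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

lam, R = np.linalg.eig(A)                         # right eigenvectors as columns of R
Lt = np.linalg.inv(R)                             # L^T = R^{-1}; its rows are left eigenvectors
Lam = np.diag(lam)

print(np.allclose(R @ Lam @ Lt, A))               # A = R Λ L^T
print(np.allclose(Lt @ A @ R, Lam))               # L^T A R = Λ
print(np.allclose(R @ Lam @ Lam @ Lt, A @ A))     # A^n = R Λ^n L^T, checked for n = 2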

23

SINGULAR VECTOR DECOMPOSITION

The standard eigenvalue problem is meaningless for non-square matrices.

A ‘shifted’ eigenvalue problem associated with an arbitrary non-square matrix K, with m rows and n columns, can be constructed:

K v = λ u,    K^T u = λ v    (3)

where v, of length n, and u, of length m, are called the singular vectors of K.

This is equivalent to the symmetric problem:

( O    K ) (u)     (u)
( K^T  O ) (v) = λ (v)

From (3) we can get

K^T K v = λ K^T u = λ² v,    K K^T u = λ K v = λ² u    (4)

so u and v are the eigenvectors of K K^T (m × m) and K^T K (n × n) respectively.

24

SINGULAR VECTOR DECOMPOSITION II

Care is needed in constructing a matrix of singular vectors, because individual u and v vectors correspond to each other, yet there are potentially different numbers of v and u vectors.

If the rank of K is p, then there will be p non-zero singular values, and both K K^T and K^T K will have p non-zero eigenvalues.

The surplus eigenvectors will have zero eigenvalues, and can be discarded, and we can write:

( O    K ) (U)   (U)
( K^T  O ) (V) = (V) Λ

where Λ is p × p, U is m × p, and V is n × p.

There will be n + m − p more eigenvectors of the composite matrix, all with zero eigenvalue.

25

SINGULAR VECTORS - USEFUL RELATIONSHIPS

K V = U Λ        K^T U = V Λ
U^T K V = V^T K^T U = Λ
K = U Λ V^T
K^T = V Λ U^T
V^T V = U^T U = I_p
K K^T U = U Λ²
K^T K V = V Λ²    (5)
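
These relations are easy to verify numerically with a reduced ('economy') SVD; the matrix below is an arbitrary random K used only for illustration:

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
K = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(K, full_matrices=False)  # U: m x p, s: the p singular values, Vt: p x n
Lam, V = np.diag(s), Vt.T
p = len(s)

print(np.allclose(U @ Lam @ Vt, K))               # K = U Λ V^T
print(np.allclose(K @ V, U @ Lam))                # K V = U Λ
print(np.allclose(K.T @ U, V @ Lam))              # K^T U = V Λ
print(np.allclose(V.T @ V, np.eye(p)), np.allclose(U.T @ U, np.eye(p)))   # V^T V = U^T U = I_p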

27

SINGULAR VECTOR DECOMPOSITION [<<]

Is the neatest way of doing the job.

Express K as

K(m×n) = U(m×p) Λ(p×p) V^T(p×n)

where the bracketed dimensions indicate the sizes of the matrices.

Then the forward model (no noise) becomes:

y(m×1) = K(m×n) x(n×1) = U(m×p) Λ(p×p) V^T(p×n) x(n×1)

so that

U^T y = Λ V^T x

or

y′ = Λ x′

where y′ = U^T y and x′ = V^T x are both of order p.

The rows of V^T, or the columns of V (in state space), are a basis for the row space of K. Similarly the columns of U (in measurement space) are a basis of its column space.

28

SINGULAR VECTOR DECOMPOSITION II

We can also see that an exact solution is

x′ = Λ^{-1} y′

or

x = V Λ^{-1} U^T y

This is only a unique solution if p = n. If p < n any multiples of vectors with zero singular values can be added, and still satisfy the equations.
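
A small sketch of this solution for an underconstrained, noise-free problem (a random K with m = 3 measurements and n = 5 unknowns, made up for illustration): the SVD solution reproduces the measurement exactly, but it does not recover the true state, because null-space components are lost.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
K = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = K @ x_true                                    # noise-free measurement

U, s, Vt = np.linalg.svd(K, full_matrices=False)
x_exact = Vt.T @ np.diag(1.0 / s) @ U.T @ y       # x = V Λ^(-1) U^T y

print(np.allclose(K @ x_exact, y))                # True: it reproduces the measurement exactly
print(np.allclose(x_exact, x_true))               # False: null-space components are not recovered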

29

SVD OF THE STANDARD WEIGHTING FUNCTIONS

30

EXACT RETRIEVAL SIMULATION

The state vector is a set of eight coefficients of a degree seven polynomial.

(a) Original profile: US standard atmosphere
(b) Exact retrieval with no experimental error
(c) Exact retrieval with simulated 0.5 K error

31

APPROACHES TO INVERSE PROBLEMS

• Bayesian Approach

• What is the pdf of the state, given the measurement and the a priori?

• Optimisation Approaches:

• Maximum Likelihood

• Maximum A Posteriori

• Minimum Variance

• Backus-Gilbert - resolution/noise trade-off

• Ad hoc Approaches

• Relaxation

• Exact algebraic solutions

32

BAYESIAN APPROACH

This is the most general approach to the problem (that I know of).

Knowledge is represented in terms of probability density functions:

• P(x) is the a priori p.d.f. of the state, describing what we know about the state before we make the measurement.

• P(y) is the a priori p.d.f. of the measurement.

• P(x, y) is the joint a priori p.d.f. of x and y.

• P(y|x) is the p.d.f. of the measurement given the state - this depends on experimental error and the forward function.

• P(x|y) is the p.d.f. of the state given the measurement - this is what we want to find.

33

BAYES THEOREM

The theorem states:

P(x, y) = P(x|y) P(y)

and of course

P(y, x) = P(y|x) P(x)

so that

P(x|y) = P(y|x) P(x) / P(y)

If we have a prior p.d.f. for x, P(x), and we know statistically how y is related to x via P(y|x), then we can find an un-normalised version of P(x|y), namely P(y|x) P(x), which can be normalised if required.
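
A minimal one-dimensional sketch of the theorem in action, with a made-up Gaussian prior and likelihood; normalising the product on a grid reproduces the analytic answer:

import numpy as np

x = np.linspace(-10.0, 20.0, 3001)                # grid over the scalar state
xa, sa = 5.0, 3.0                                 # prior mean and standard deviation
y, se = 8.0, 1.0                                  # measurement of x and its error std

prior = np.exp(-0.5 * ((x - xa) / sa) ** 2)       # P(x), un-normalised
likelihood = np.exp(-0.5 * ((y - x) / se) ** 2)   # P(y|x), un-normalised
posterior = prior * likelihood                    # P(x|y) is proportional to P(y|x) P(x)
posterior /= posterior.sum() * (x[1] - x[0])      # normalise if required

# Analytic Gaussian result for comparison:
var_hat = 1.0 / (1.0 / sa**2 + 1.0 / se**2)
x_hat = var_hat * (xa / sa**2 + y / se**2)
print(x[np.argmax(posterior)], x_hat)             # both about 7.7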

34

BAYES THEOREM GEOMETRICALLY

35

APPLICATION OF THE BAYESIAN APPROACH

We need explicit forms for the p.d.f’s:

• Assume that experimental error is Gaussian:

− ln P(y|x) = ½ (y − F(x))^T Sε^{-1} (y − F(x)) + const

where F(x) is the Forward model:

y = F(x) + ε

and Sε is the covariance matrix of the experimental error, ε:

Sε = E{ε ε^T} = E{(y − F(x))(y − F(x))^T}

• Assume that the a priori p.d.f. is Gaussian (less justifiable):

− ln P(x) = ½ (x − xa)^T Sa^{-1} (x − xa) + const

i.e. x is distributed normally with mean xa and covariance Sa.

36

APPLICATION OF THE BAYESIAN APPROACH II

• Thus the pdf of the state when the measurements and the a priori are given is:

−2 ln P(x|y) = [y − F(x)]^T Sε^{-1} [y − F(x)] + [x − xa]^T Sa^{-1} [x − xa] + const

• If we want a state estimate x̂ rather than a p.d.f., then we must calculate some function of P(x|y), such as its mean or its maximum:

x̂ = ∫ P(x|y) x dx    or    dP(x|y)/dx = 0

37

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM

The linear problem has a forward model:

F(x) = Kx

so the p.d.f. P(x|y) becomes:

−2 ln P(x|y) = [y − Kx]^T Sε^{-1} [y − Kx] + [x − xa]^T Sa^{-1} [x − xa] + c1

This is quadratic in x, so has to be of the form:

−2 ln P(x|y) = [x − x̂]^T Ŝ^{-1} [x − x̂] + c2

Equate the terms that are quadratic in x:

x^T K^T Sε^{-1} K x + x^T Sa^{-1} x = x^T Ŝ^{-1} x

giving

Ŝ^{-1} = K^T Sε^{-1} K + Sa^{-1}

• ‘The Fisher Information Matrix’.

38

BAYESIAN SOLUTION FOR THE LINEAR PROBLEM II

Equating the terms linear in x^T gives:

(−Kx)^T Sε^{-1} (y) + (x)^T Sa^{-1} (−xa) = x^T Ŝ^{-1} (−x̂)

This must be valid for any x. Cancel the x^T's, and substitute for Ŝ^{-1}:

K^T Sε^{-1} y + Sa^{-1} xa = (K^T Sε^{-1} K + Sa^{-1}) x̂

giving:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} y + Sa^{-1} xa)

The mean x̂ and the covariance Ŝ define the full posterior pdf.
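
A minimal NumPy sketch of this retrieval formula on a scaled-down stand-in for the standard illustration; the weighting-function shape, a priori covariance (10 K, uncorrelated) and 0.5 K noise level are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 20
z = np.linspace(0.0, 9.9, n)
K = np.exp(-0.5 * (z[None, :] - np.linspace(1.0, 8.0, m)[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)

x_true = 250.0 + 20.0 * np.sin(z / 3.0)
x_a = np.full(n, 250.0)                            # a priori state
S_a = 10.0 ** 2 * np.eye(n)                        # a priori covariance
S_e = 0.5 ** 2 * np.eye(m)                         # measurement error covariance
y = K @ x_true + 0.5 * rng.standard_normal(m)

S_a_inv, S_e_inv = np.linalg.inv(S_a), np.linalg.inv(S_e)
S_hat = np.linalg.inv(K.T @ S_e_inv @ K + S_a_inv)        # posterior covariance
x_hat = S_hat @ (K.T @ S_e_inv @ y + S_a_inv @ x_a)       # posterior mean
print(np.sqrt(np.diag(S_hat)).round(2))                   # retrieval uncertainty per level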

39

A GEOMETRIC INTERPRETATION OF THE SOLUTION

40

AN ALGEBRAIC INTERPRETATION OF THE SOLUTION

The expected value is:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} y + Sa^{-1} xa)    (1)

Underconstrained case

There must exist at least one ‘exact’ solution xe = Gy in the sense that K xe = y, i.e. KG = I. For example G = K^T (K K^T)^{-1}.

Replace y by K xe in (1):

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xe + Sa^{-1} xa)

Overconstrained case

The least squares solution xl satisfies K^T Sε^{-1} K xl = K^T Sε^{-1} y.

Inserting this in (1) gives:

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xl + Sa^{-1} xa)

41

AN INTERPRETATION OF THE SOLUTION II

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xe + Sa^{-1} xa)

or

x̂ = (K^T Sε^{-1} K + Sa^{-1})^{-1} (K^T Sε^{-1} K xl + Sa^{-1} xa)

• Both represent a weighted mean of a solution (exact, xe, or least squares, xl) with xa, using relative weights K^T Sε^{-1} K and Sa^{-1} respectively – their Fisher information matrices.

• This is exactly like the familiar combination of scalar measurements x1 and x2 of an unknown x, with variances σ1² and σ2² respectively:

x̂ = (1/σ1² + 1/σ2²)^{-1} (x1/σ1² + x2/σ2²)
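
The same weighted-mean arithmetic in a few lines of Python, with made-up values x1 = 10 ± 1 and x2 = 14 ± 2:

x1, var1 = 10.0, 1.0 ** 2
x2, var2 = 14.0, 2.0 ** 2
x_hat = (x1 / var1 + x2 / var2) / (1.0 / var1 + 1.0 / var2)   # 10.8, pulled towards the more precise x1
var_hat = 1.0 / (1.0 / var1 + 1.0 / var2)                     # 0.8, smaller than either variance
print(x_hat, var_hat)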

42

End of Section

43

INFORMATION CONTENT AND ERROR ANALYSIS

Clive D Rodgers

Atmospheric, Oceanic and Planetary Physics

University of Oxford

ESA Advanced Atmospheric Training Course

September 15th – 20th, 2008

44

INFORMATION CONTENT OF A MEASUREMENT

Information in a general qualitative sense:

Conceptually, what does y tell you about x?

We need to answer this

• to determine if a conceptual instrument design actually works

• to optimise designs

Use the linear problem for simplicity to illustrate the ideas.

y = Kx + ε

45

SHANNON INFORMATION

• The Shannon information content of a measurement of x is the change in the entropy of the probability density function describing our knowledge of x.

• Entropy is defined by:

S{P} = − ∫ P(x) log(P(x)/M(x)) dx

M(x) is a measure function. We will take it to be constant.

• Compare this with the statistical mechanics definition of entropy:

S = −k Σ_i p_i ln p_i

P(x) dx corresponds to p_i. 1/M(x) is a kind of scale for dx.

• The Shannon information content of a measurement is the change in entropy between the p.d.f. before, P(x), and the p.d.f. after, P(x|y), the measurement:

H = S{P(x)} − S{P(x|y)}

What does this all mean?

46

ENTROPY OF A BOXCAR PDF

Consider a uniform p.d.f. in one dimension, constant in (0, a):

P(x) = 1/a    0 < x < a

and zero outside. The entropy is given by

S = − ∫₀ᵃ (1/a) ln(1/a) dx = ln a

Similarly, the entropy of any constant pdf in a finite volume V of arbitrary shape is:

S = − ∫_V (1/V) ln(1/V) dv = ln V

i.e. the entropy is the log of the volume of state space occupied by the p.d.f.

47

ENTROPY OF A GAUSSIAN PDF

Consider the Gaussian distribution:

P(x) = (1 / ((2π)^{n/2} |S|^{1/2})) exp[−½ (x − x̄)^T S^{-1} (x − x̄)]

If you evaluate the entropy of a Gaussian distribution you will find it is proportional to log |S|^{1/2}.

• The contours of P(x) in n-space are ellipsoidal, described by

(x − x̄)^T S^{-1} (x − x̄) = constant

• The principal axes of the ellipsoid are the eigenvectors of S, and their lengths are proportional to the square roots of the corresponding eigenvalues.

• The volume of the ellipsoid is proportional to the root of the product of the eigenvalues, which is proportional to |S|^{1/2}.

• Therefore entropy is the log of the volume enclosed by some particular contour of P(x). A ‘volume of uncertainty’.

48

ENTROPY AND INFORMATION

Information content is the change in entropy,

• i.e. the log of the ratio of the volumes of uncertainty before and after making a measurement.

• A generalisation of ‘signal to noise’.

• In the Boxcar case, the log of the ratio of the volumes before and after

• In the Gaussian case:

H = log |Sa| − log |Ŝ| = − log |(K^T Sε^{-1} K + Sa^{-1})^{-1} Sa^{-1}|

• minus the log of the determinant of the weight of xa in the Bayesian expectation.
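
A short sketch of evaluating this for a small made-up linear problem. The factor of ½ used below reflects the entropy being proportional to log |S|^{1/2}, and the base-2 logarithm (giving the answer in bits) is a choice, not something fixed by the slides.

import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 20
K = rng.standard_normal((m, n))                   # arbitrary weighting functions
S_e = 0.5 ** 2 * np.eye(m)                        # measurement error covariance
S_a = 10.0 ** 2 * np.eye(n)                       # a priori covariance

S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_e) @ K + np.linalg.inv(S_a))
_, logdet_a = np.linalg.slogdet(S_a)
_, logdet_hat = np.linalg.slogdet(S_hat)
H = 0.5 * (logdet_a - logdet_hat) / np.log(2.0)   # ratio of prior and posterior volumes, in bits
print(f"H = {H:.1f} bits")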

49

The log of the ratio of the volumes of the ellipsoids

50

DEGREES OF FREEDOM FOR SIGNAL AND NOISE

The state estimate that maximises P(x|y) in the linear Gaussian case is the one which minimises

χ² = [y − Kx]^T Sε^-1 [y − Kx] + [x − xa]^T Sa^-1 [x − xa]

The r.h.s. initially has m + n degrees of freedom, of which n are fixed by choosing x to be x̂, so the expected value of χ² is m.

These m degrees of freedom can be assigned to degrees of freedom for noise dn and degrees of freedom for signal ds according to:

dn = E{[y − Kx̂]^T Sε^-1 [y − Kx̂]}

and

ds = E{[x̂ − xa]^T Sa^-1 [x̂ − xa]}

Using tr(CD) = tr(DC), we can see that

ds = E{tr([x̂ − xa][x̂ − xa]^T Sa^-1)} = tr(E{[x̂ − xa][x̂ − xa]^T} Sa^-1)    (6)

51

DEGREES OF FREEDOM FOR SIGNAL AND NOISE II

With some manipulation we can find

ds = tr((K^T Sε^-1 K + Sa^-1)^-1 K^T Sε^-1 K) = tr(K Sa K^T (K Sa K^T + Sε)^-1)    (7)

and

dn = tr((K^T Sε^-1 K + Sa^-1)^-1 Sa^-1) + m − n = tr(Sε (K Sa K^T + Sε)^-1)    (8)
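These trace identities are easy to check numerically. Below is a minimal numpy sketch using an arbitrary illustrative K, Sa and Sε (invented for illustration, not the course's standard example); it confirms that the ds and dn of (7) and (8) sum to m, and also evaluates the Shannon information content H = log|Sa| − log|Ŝ| quoted earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 8                          # state and measurement dimensions (illustrative)
K = rng.standard_normal((m, n))      # arbitrary Jacobian, for illustration only
Sa = 4.0 * np.eye(n)                 # a priori covariance (illustrative)
Se = 0.25 * np.eye(m)                # measurement noise covariance (illustrative)

# Retrieval (solution) covariance S-hat
Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))

# Degrees of freedom for signal and noise, eqs (7) and (8)
ds = np.trace(K @ Sa @ K.T @ np.linalg.inv(K @ Sa @ K.T + Se))
dn = np.trace(Se @ np.linalg.inv(K @ Sa @ K.T + Se))
assert np.isclose(ds + dn, m)        # the m measurement degrees of freedom are partitioned

# Shannon information: log of the ratio of prior to posterior uncertainty volumes
H = np.linalg.slogdet(Sa)[1] - np.linalg.slogdet(Shat)[1]
print(f"ds = {ds:.3f}, dn = {dn:.3f}, H = {H:.3f}")
```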

52

INDEPENDENT MEASUREMENTS

If the measurement error covariance is not diagonal, the elements of the y vector will not be

statistically independent. Likewise for the a priori.

The measurements will not be independent functions of the state if K is not diagonal.

Therefore it helps to understand where the information comes from if we transform to a different

basis.

First, statistical independence. Define:

ỹ = Sε^-1/2 y        x̃ = Sa^-1/2 x

The transformed a priori and noise covariances both become unit matrices. [>>]

53

SQUARE ROOTS OF MATRICES

The square root of an arbitrary matrix is defined as A^1/2 where

A^1/2 A^1/2 = A

Using A^n = R Λ^n L^T for n = 1/2:

A^1/2 = R Λ^1/2 L^T

This square root of a matrix is not unique, because the diagonal elements of Λ^1/2 in R Λ^1/2 L^T can have either sign, leading to 2^n possibilities.

We only use square roots of symmetric covariance matrices. In this case S^1/2 = L Λ^1/2 L^T is symmetric.

54

SQUARE ROOTS OF SYMMETRIC MATRICES

Symmetric matrices can also have non-symmetric roots satisfying S = (S^1/2)^T S^1/2, of which the Cholesky decomposition

S = T^T T

where T is upper triangular, is the most useful.

There are an infinite number of non-symmetric square roots: if S^1/2 is a square root, then clearly so is X S^1/2, where X is any orthonormal matrix.

The inverse symmetric square root is S^-1/2 = L Λ^-1/2 L^T.

The inverse Cholesky decomposition is S^-1 = T^-1 T^-T.

The inverse square root T^-1 is triangular, and its numerical effect is implemented efficiently by back substitution.
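Both kinds of root are available from standard routines. A minimal numpy sketch, with a small illustrative covariance matrix (not one of the course examples):

```python
import numpy as np

# An illustrative symmetric, positive definite covariance matrix
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])

# Symmetric square root via the eigendecomposition S = L Lambda L^T
lam, L = np.linalg.eigh(S)
S_half = L @ np.diag(np.sqrt(lam)) @ L.T
assert np.allclose(S_half @ S_half, S)

# Cholesky root: numpy returns a lower-triangular C with S = C C^T,
# so T = C^T is the upper-triangular factor satisfying S = T^T T
T = np.linalg.cholesky(S).T
assert np.allclose(T.T @ T, S)

# Applying the inverse root whitens a vector whose covariance is S:
# cov(T^-T y) = T^-T S T^-1 = I (in practice done by back substitution,
# not by forming the explicit inverse as here)
W = np.linalg.inv(T).T
assert np.allclose(W @ S @ W.T, np.eye(3))
```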

55

INDEPENDENT MEASUREMENTS[<<]

If the measurement error covariance is not diagonal, the elements of the y vector will not be

statistically independent. Likewise for the a priori.

The measurements will not be independent functions of the state if K is not diagonal.

Therefore it helps to understand where the information comes from if we transform to a different

basis.

First, statistical independence. Define:

ỹ = Sε^-1/2 y        x̃ = Sa^-1/2 x

The transformed a priori and noise covariances both become unit matrices.

The forward model becomes:

ỹ = K̃x̃ + ε̃

where K̃ = Sε^-1/2 K Sa^1/2.

The solution covariance becomes:

Ŝ = (In + K̃^T K̃)^-1

57

TRANSFORM AGAIN

Now make K̃ diagonal. Rotate both x̃ and ỹ to yet another basis, defined by the singular vectors of K̃:

ỹ = K̃x̃ + ε̃   →   ỹ = UΛV^T x̃ + ε̃

Define:

x′ = V^T x̃        y′ = U^T ỹ        ε′ = U^T ε̃

The forward model becomes:

y′ = Λx′ + ε′    (1)

The Jacobian is now diagonal, Λ, and the a priori and noise covariances are still unit matrices, hence the solution covariance becomes:

Ŝ = (In + K̃^T K̃)^-1   →   S′ = (In + Λ²)^-1

which is diagonal, and the solution itself is

x̂′ = (In + Λ²)^-1 (Λy′ + x′a)

not x̂′ = Λ^-1 y′ as you might expect from (1).

• Elements for which λi ≫ 1, or equivalently (1 + λi²)^-1 ≪ 1, are well measured

• Elements for which λi ≪ 1, or equivalently (1 + λi²)^-1 ≈ 1, are poorly measured.
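The two transformations are compact in code. Below is a minimal numpy sketch with arbitrary illustrative K, Sa and Sε (not the standard example): it pre-whitens with the symmetric inverse square roots, rotates to the singular-vector basis, and evaluates the per-component information content and degrees of freedom for signal discussed on the next two slides (the information is expressed in bits, as in Table 1 later).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 10
K  = rng.standard_normal((m, n))     # illustrative Jacobian
Sa = 3.0 * np.eye(n) + 0.5           # illustrative (correlated) a priori covariance
Se = 0.1 * np.eye(m)                 # illustrative noise covariance

def sym_power(S, p):
    """Symmetric matrix power S^p via the eigendecomposition."""
    lam, L = np.linalg.eigh(S)
    return L @ np.diag(lam**p) @ L.T

# Pre-whiten: K_tilde = Se^(-1/2) K Sa^(1/2); prior and noise covariances become I
Kt = sym_power(Se, -0.5) @ K @ sym_power(Sa, 0.5)

# Rotate to the singular-vector basis: the transformed Jacobian is diagonal (Lambda)
U, lam, VT = np.linalg.svd(Kt, full_matrices=False)

H_i  = 0.5 * np.log2(1.0 + lam**2)   # information per component, in bits
ds_i = lam**2 / (1.0 + lam**2)       # degrees of freedom for signal per component

print("singular values:", lam)
print("H_i (bits)     :", H_i)
print("ds_i           :", ds_i, "  total ds =", ds_i.sum())
```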

58

INFORMATION

Shannon Information in the Transformed Basis

Because it is a ratio of volumes, the linear transformation does not change the information

content. So consider information in the x′, y′ system:

H = S{S′a} − S{Ŝ′}
  = ½ log|In| − ½ log|(Λ² + In)^-1|
  = Σi ½ log(1 + λi²)    (9)

59

DEGREES OF FREEDOM

Degrees of Freedom in the Transformed Basis

The number of independent quantities measured can be thought of as the number of singular values for which λi ≳ 1.

The degrees of freedom for signal is

ds = Σi λi² (1 + λi²)^-1

It is also the sum of the eigenvalues of In − Ŝ.

Summary: for each independent component x′i

• The information content is ½ log(1 + λi²)

• The degrees of freedom for signal is λi² (1 + λi²)^-1

60

THE OBSERVABLE AND NULL SPACES

• The part of state space that the measurement can see is that spanned by the weighting functions.

Anything outside that is in the null space of K.

• Any orthogonal linear combination of the weighting functions will form a basis (coordinate

system) for the observable space. An example is those singular vectors of K which have non-zero

singular values.

• The vectors which have zero singular values form a basis for the null space.

• Any component of the state in the null space maps onto the origin in measurement space.

• This implies that there are distinct states, in fact whole subspaces, which map onto the same

point in measurement space, and cannot be distinguished by the measurement.

61

THE NEAR NULL SPACE

However

• the solution can have components in the null space – from the a priori.

• components observable in principle can have near-zero contributions from the measurement: the ‘near null space’.

In the x′, y′ system:

• vectors with λ = 0 are in the null space

• vectors with λ ≪ 1 are in the near null space

• vectors with λ ≳ 1 are in the non-null space, and are observable.

62

Diagnostics for the Standard Case

Table 1: Singular values of K, together with contributions of each vector to the degrees of freedom

ds and information content H for both covariance matrices. Measurement noise variance is 0.25 K².

Diagonal covariance Full covariance

i λi dsi Hi (bits) λi dsi Hi (bits)

1 6.51929 0.97701 2.72149 27.81364 0.99871 4.79865

2 4.79231 0.95827 2.29147 18.07567 0.99695 4.17818

3 3.09445 0.90544 1.70134 9.94379 0.98999 3.32105

4 1.84370 0.77269 1.06862 5.00738 0.96165 2.35227

5 1.03787 0.51858 0.52731 2.39204 0.85123 1.37443

6 0.55497 0.23547 0.19368 1.09086 0.54337 0.56546

7 0.27941 0.07242 0.05423 0.46770 0.17948 0.14270

8 0.13011 0.01665 0.01211 0.17989 0.03135 0.02297

totals 4.45653 8.57024 5.55272 16.75571

Diagonal covariance: Sa = 100 I K²

Full covariance: Saij = 100 exp(−|zi − zj| / h) K²

where h = 1 in ln(p) coordinates.

63

FTIR Measurements of CO

30 levels; 894 measurements:

Apparently heavily overconstrained.

Table 2: Singular Values of K

5.345 3.498 0.033 0.0046 7.15e-4 2.56e-4

7.76e-5 2.13e-3 1.71e-5 1.38e-5 9.01e-6 6.73e-6

5.82e-6 4.79e-6 2.87e-6 3.52e-6 3.75e-6 1.91e-6

9.83e-7 2.37e-7 7.71e-7 1.18e-7 1.48e-6 1.95e-7

1.37e-7 6.67e-8 3.50e-8 3.37e-8 5.83e-9 6.29e-9

Noise is 0.03 in these units;

Prior variance is about 1.

There are 2.5 degrees of freedom.

Lots of near null space.
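The quoted 2.5 degrees of freedom can be reproduced approximately from Table 2 alone. The sketch below assumes that 0.03 is the noise standard deviation and that the prior standard deviation is 1 (the slide only says “noise is 0.03” and “prior variance is about 1”), scales each singular value accordingly, and sums λi²/(1 + λi²):

```python
import numpy as np

# Singular values of K copied from Table 2 (FTIR CO example)
sv = np.array([
    5.345,   3.498,   0.033,   0.0046,  7.15e-4, 2.56e-4,
    7.76e-5, 2.13e-3, 1.71e-5, 1.38e-5, 9.01e-6, 6.73e-6,
    5.82e-6, 4.79e-6, 2.87e-6, 3.52e-6, 3.75e-6, 1.91e-6,
    9.83e-7, 2.37e-7, 7.71e-7, 1.18e-7, 1.48e-6, 1.95e-7,
    1.37e-7, 6.67e-8, 3.50e-8, 3.37e-8, 5.83e-9, 6.29e-9,
])

noise_std = 0.03   # assumed to be a standard deviation in these units
prior_std = 1.0    # "prior variance is about 1"

lam = sv * prior_std / noise_std             # singular values of the whitened Jacobian
ds = np.sum(lam**2 / (1.0 + lam**2))         # degrees of freedom for signal
near_null = int(np.sum(lam < 0.1))           # components contributing almost nothing

print(f"ds = {ds:.2f}  (the slide quotes 2.5)")
print(f"near null space: {near_null} of {lam.size} components")
```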

64

A SIMULATED RETRIEVAL

——— ’true state’

· · · · a priori

— — noise free simulation

— · — with simulated noise

How do we understand the nature of a retrieval?

65

ERROR ANALYSIS AND CHARACTERISATION

Conceptually

• The measurement y is a function of some unknown state x:

y = f(x, b) + ε

where:

• y is the measurement vector, length m

• x is the state vector, length n

• f is a function describing the physics of the measurement,

• b is a set of ‘known’ parameters of this function,

• ε is measurement error, with covariance Sε.

66

ERROR ANALYSIS & CHARACTERISATION II

• The retrieval x̂ is a function of the form:

x̂ = R(y, b̂, c)

where:

• R represents the retrieval method

• b̂ is our estimate of the forward function parameters b

• c represents any parameters used in the inverse method that do not affect the measurement, e.g. the a priori.

The Transfer Function

Thus the retrieval is related to the ‘truth’ x formally by:

x̂ = R(f(x, b) + ε, b̂, c)

which may be regarded as the transfer function of the measurement and retrieval system as a whole:

x̂ = T(x, b, ε, b̂, c)

67

ERROR ANALYSIS & CHARACTERISATION III

Characterisation means evaluating:

• ∂x̂/∂x = A, sensitivity to the actual state: the Averaging Kernel Matrix

Error analysis involves evaluating:

• ∂x̂/∂ε = Gy, sensitivity to noise (or to the measurement!)

• ∂x̂/∂b = Gb, sensitivity to non-retrieved parameters

• ∂x̂/∂c = Gc, sensitivity to retrieval method parameters

and understanding the effect of replacing f by a numerical Forward Model F.

68

THE FORWARD MODEL

We often need to approximate the forward function by a Forward Model:

F(x, b) ≃ f(x, b)

where F models the physics of the measurement, including the instrument, as well as we can.

• It usually has parameters b which have experimental error

• The vector b is not a target for retrieval

• There may be parameters b′ of the forward function that are not included in the forward model:

F(x, b) ≃ f(x, b, b′)

69

LINEARISE THE TRANSFER FUNCTION

The retrieved quantity is expressed as:

x̂ = R(f(x, b, b′) + ε, b̂, xa, c)

where we have also separated xa, the a priori, from the other components of c.

Replace f by F + ∆f:

x̂ = R(F(x, b) + ∆f(x, b, b′) + ε, b̂, xa, c)

where ∆f is the error in the forward model relative to the real physics:

∆f = f(x, b, b′) − F(x, b)

Linearise F about x = xa, b = b̂:

x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε, b̂, xa, c)

where Kx = ∂F/∂x (the weighting function) and Kb = ∂F/∂b.

70

LINEARISE THE TRANSFER FUNCTION II

x̂ = R(F(xa, b̂) + Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε, b̂, xa, c)

Linearise R with respect to its first argument, y:

x̂ = R[F(xa, b̂), b̂, xa, c] + Gy[Kx(x − xa) + Kb(b − b̂) + ∆f(x, b, b′) + ε]

where Gy = ∂R/∂y (the contribution function).

71

CHARACTERISATION

Some rearrangement gives:

x̂ − xa = R(F(xa, b̂), b̂, xa, c) − xa      bias
        + A(x − xa)                       smoothing
        + Gy εy                           error        (10)

where

A = GyKx = ∂x̂/∂x

the averaging kernel, and

εy = Kb(b − b̂) + ∆f(x, b, b′) + ε

is the total error in the measurement relative to the forward model.
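For the linear Gaussian retrieval the contribution function and averaging kernel have the familiar closed forms Gy = (Kx^T Sε^-1 Kx + Sa^-1)^-1 Kx^T Sε^-1 and A = GyKx. A minimal numpy sketch with illustrative matrices (not the course example); it also evaluates the row areas of A and tr(A), which by eq (7) is the degrees of freedom for signal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 10
Kx = rng.standard_normal((m, n))    # illustrative weighting function matrix
Sa = 2.0 * np.eye(n)                # illustrative a priori covariance
Se = 0.05 * np.eye(m)               # illustrative noise covariance

# Gain (contribution function) and averaging kernel for the linear Gaussian retrieval
Shat = np.linalg.inv(Kx.T @ np.linalg.inv(Se) @ Kx + np.linalg.inv(Sa))
Gy = Shat @ Kx.T @ np.linalg.inv(Se)
A = Gy @ Kx                         # A = Gy Kx = d(xhat)/dx

areas = A.sum(axis=1)               # row areas: rough measure of real information per level
print("averaging kernel areas:", areas)
print("degrees of freedom for signal, tr(A):", np.trace(A))
```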

72

BIAS

This is the error you would get on doing a simulated error-free retrieval of the a priori.

A priori is what you know about the state before you make the measurement. Any reasonable

retrieval method should return the a priori given a measurement vector that corresponds to it, so

the bias should be zero.

But check your system to be sure it is...

73

THE AVERAGING KERNEL

The retrieved state is a smoothed version of the true state, with smoothing functions given by the rows of A, plus error terms:

x̂ = xa + A(x − xa) + Gyεy = (I − A)xa + Ax + Gyεy

You can either:

• accept that the retrieval is an estimate of a smoothed state, not the true state

or

• consider the retrieval as an estimate of the true state, with an error contribution due to smoothing.

The error analysis is different in the two cases, because in the second case there is an extra term.

• If the state represents a profile, then the averaging kernel is a smoothing function with a width and an area.

• The width is a measure of the resolution of the observing system

• The area (generally between zero and unity) is a simple measure of the amount of real information that appears in the retrieval.

74

STANDARD EXAMPLE: AVERAGING KERNEL

75

STANDARD EXAMPLE: RETRIEVAL GAIN

76

THE OTHER TWO PARAMETERS

“The retrieved quantity is expressed as:

x̂ = R(f(x, b, b′) + ε, b̂, xa, c)

where we have also separated xa, the a priori, from other components of c.”

We should also look at the sensitivity of the retrieval to the inverse model parameters, xa and c.

The linear expansion in x, b and ε gave:

x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gyεy

We argued that the term in square brackets should be equal to xa, for any reasonable retrieval.

This has the consequence that, at least to first order, ∂R/∂c = 0 for any reasonable retrieval.

77

THE OTHER TWO PARAMETERS II

x̂ = [R(F(xa, b̂), b̂, xa, c)] + A(x − xa) + Gyεy

It also follows that:

∂R/∂xa = ∂x̂/∂xa = In − A

for any reasonable retrieval.

Thus the reasonableness criterion has the consequence that there can be no inverse model parameters c that matter, other than xa.

Incidentally:

The term in square brackets should not depend on b̂ either.

This implies that Gb + GyKb = 0, which is no more than:

∂x̂/∂b̂ + (∂x̂/∂y)(∂y/∂b̂) = 0
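The first identity is easy to demonstrate for the linear Gaussian retrieval, which can be written x̂ = xa + Gy(y − Kxa). A minimal sketch with illustrative matrices (the helper name retrieve is just local to the example); because the retrieval is linear in xa, the perturbation test holds exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 12
K  = rng.standard_normal((m, n))    # illustrative Jacobian
Sa = np.eye(n)                      # illustrative a priori covariance
Se = 0.1 * np.eye(m)                # illustrative noise covariance

Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

def retrieve(y, xa):
    # linear Gaussian retrieval in 'a priori plus correction' form
    return xa + Gy @ (y - K @ xa)

y = rng.standard_normal(m)          # some fixed measurement
xa = np.zeros(n)
dxa = 1e-3 * rng.standard_normal(n) # perturbation of the a priori

dxhat = retrieve(y, xa + dxa) - retrieve(y, xa)
assert np.allclose(dxhat, (np.eye(n) - A) @ dxa)   # d(xhat)/d(xa) = In - A
```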

78

ERROR ANALYSIS

Some further rearrangement gives for the error in x̂:

x̂ − x = (A − I)(x − xa)      smoothing
       + Gy εy                measurement error

where the bias term has been dropped, and:

εy = Kb(b − b̂) + ∆f(x, b, b′) + ε

Thus the error sources can be split up as:

x̂ − x = (A − I)(x − xa)       smoothing
       + GyKb(b − b̂)          model parameters
       + Gy∆f(x, b, b′)        modelling error
       + Gyε                   measurement noise

Some of these are easy to estimate, some are not.

79

MEASUREMENT NOISE

measurement noise = Gyε

This is the easiest component to evaluate.

ε is usually random noise, and is often unbiassed and uncorrelated between channels, and has a

known covariance matrix.

The covariance of the measurement noise is:

Sn = Gy Sε Gy^T

Note that Sn is not necessarily diagonal, so there will be errors which are correlated between

different elements of the state vector.

This is true of all of the error sources.

80

SMOOTHING ERROR

To estimate the actual smoothing error, you need to know the true state:

smoothing error = (A− I)(x− xa)

To characterise the statistics of this error, you need its mean and covariance over some ensemble.

The mean should be zero.

The covariance is:

Ss = E{(A − I)(x − xa)(x − xa)^T (A − I)^T}
   = (A − I) E{(x − xa)(x − xa)^T} (A − I)^T
   = (A − I) Sa (A − I)^T

where Sa is the covariance of an ensemble of states about the a priori state. This is best regarded

as the covariance of a climatology.

To estimate the smoothing error, you need to know the climatological covariance matrix.

To do the job properly, you need the real climatology, not just some ad hoc matrix that has been

used as a constraint in the retrieval. The real climatology is often not available. Much of the

smoothing error can be in fine spatial scales that may never have been measured.
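In the linear Gaussian case the smoothing and noise terms are simple matrix sandwiches, and if the climatological covariance is taken to be the same Sa used as the retrieval constraint they add up exactly to the solution covariance Ŝ. A minimal numpy sketch with illustrative matrices, anticipating the error components plotted on the next slide:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 12
K  = rng.standard_normal((m, n))   # illustrative Jacobian
Sa = np.eye(n)                     # a priori constraint, here also used as the climatology
Se = 0.2 * np.eye(m)               # illustrative noise covariance

Shat = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa))
Gy = Shat @ K.T @ np.linalg.inv(Se)
A = Gy @ K

Ss = (A - np.eye(n)) @ Sa @ (A - np.eye(n)).T   # smoothing error covariance
Sn = Gy @ Se @ Gy.T                             # retrieval noise covariance

# With the climatology equal to the constraint, smoothing + noise = solution covariance
assert np.allclose(Ss + Sn, Shat)

print("smoothing error std :", np.sqrt(np.diag(Ss)))
print("retrieval noise std :", np.sqrt(np.diag(Sn)))
print("total error std     :", np.sqrt(np.diag(Ss + Sn)))
```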

81

STANDARD EXAMPLE: ERROR COMPONENTS

Standard Deviations:

——— A Priori

· · · · Smoothing error

— — Retrieval noise

— · — Total retrieval error

82

FORWARD MODEL ERRORS

Forward model parameters

forward model parameter error = GyKb(b − b̂)

This one is easy (in principle).

If you have estimated the forward model parameters properly, their individual errors will be unbiassed, so the mean error will be zero.

The covariance is:

Sf = Gy Kb Sb Kb^T Gy^T

where Sb is the error covariance matrix of b̂, namely E{(b − b̂)(b − b̂)^T}.

However, remember that this is most probably a systematic, not a random, error.
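Propagating the parameter errors is a one-line sandwich once Gy and Kb are available. A minimal sketch with illustrative matrices (Kb and Sb here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, nb = 6, 10, 3
K  = rng.standard_normal((m, n))    # Jacobian with respect to the state (illustrative)
Kb = rng.standard_normal((m, nb))   # Jacobian with respect to the model parameters (illustrative)
Sa = np.eye(n)
Se = 0.1 * np.eye(m)
Sb = np.diag([0.01, 0.04, 0.02])    # assumed error covariance of the parameter estimates

Gy = np.linalg.inv(K.T @ np.linalg.inv(Se) @ K + np.linalg.inv(Sa)) @ K.T @ np.linalg.inv(Se)

Sf = Gy @ Kb @ Sb @ Kb.T @ Gy.T     # forward model parameter error covariance
print("parameter-error std dev in the retrieval:", np.sqrt(np.diag(Sf)))
```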

83

FORWARD MODEL ERRORS II

Modelling error

modelling error = Gy∆f = Gy(f(x, b, b′)− F(x, b))

Note that this is evaluated at the true state, and with the true value of b, but hopefully its

sensitivity to these quantities is not large.

This can be hard to evaluate, because it requires a model f which includes the correct physics. If F is simply a numerical approximation for efficiency’s sake, it may not be too difficult, but if f is not

known in detail, or so horrendously complex that no proper model is feasible, then modelling error

can be tricky to estimate.

This is also usually a systematic error.

84

ERROR COVARIANCE MATRIX

An Error Covariance Matrix S is defined as

Sij = E{εiεj}

Diagonal elements are the familiar error variances.

The corresponding probability density function (PDF), if Gaussian, is:

P(y) ∝ exp(−½ (y − ȳ)^T S^-1 (y − ȳ))

Contours of the PDF are

(y − ȳ)^T S^-1 (y − ȳ) = const

i.e. ellipsoids, with arbitrary axes.

85

CORRELATED ERRORS

• How do we understand an error covariance matrix?

• What corresponds to error bars for a profile?

We would really like a PDF of the form

P(z) ∝ Πi exp(−zi² / 2σi²)

i.e. each zi to have independent errors.

This can be done by diagonalising S, substituting S = LΛL^T, where L is the matrix of eigenvectors li. Then

P(x) ∝ exp(−½ (x − x̄)^T L Λ^-1 L^T (x − x̄))

So if we put z = L^T(x − x̄), then the zi are independent, with variance λi.

86

ERROR PATTERNS

• Error covariance is described in state space by an ellipsoid

• We have, in effect, transformed to the principal axes of the ellipsoid

• We can express the error in e.g. a state vector estimate x̂, due to an error covariance S, in terms of Error Patterns ei = λi^1/2 li, such that the total error is of the form

x̂ − x = Σi βi ei

where the error patterns are orthogonal, and the coefficients βi are independent random variables with unit variance.
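The construction is a few lines of numpy: diagonalise an illustrative covariance S, scale the eigenvectors to get the error patterns, and check that errors synthesised as Σ βi ei with unit-variance coefficients reproduce S on average:

```python
import numpy as np

rng = np.random.default_rng(6)

# An illustrative symmetric, positive definite error covariance
B = rng.standard_normal((5, 5))
S = B @ B.T + 5.0 * np.eye(5)

lam, L = np.linalg.eigh(S)          # S = L Lambda L^T
patterns = L * np.sqrt(lam)         # column i is the error pattern e_i = sqrt(lam_i) l_i

# Synthesise many error realisations as sums of patterns with unit-variance coefficients
beta = rng.standard_normal((5, 200_000))
errors = patterns @ beta
S_sample = np.cov(errors)           # sample covariance should approach S

print("max |S_sample - S| =", np.max(np.abs(S_sample - S)))
```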

87

STANDARD EXAMPLE: NOISE ERROR PATTERNS

There are only eight, same as the rank of the problem.

88

STANDARD EXAMPLE: SMOOTHING ERROR PATTERNS

The largest ten out of one hundred, the dimension of the state vector.

89

Unsolved Problem

How do you describe an error covariance to the naive data user?

90

End of Section