MSc Methods XX: YY


Page 1: MSc Methods XX: YY

MSc Methods XX: YY

Dr. Mathias (Mat) Disney

UCL Geography

Office: 113, Pearson Building

Tel: 7670 0592

Email: [email protected]

www.geog.ucl.ac.uk/~mdisney

Page 2: MSc Methods XX: YY

• Two parameter estimation
– Gaussian peak + background

• Uncertainty & linear approximations
– Parameter estimation, uncertainty
– Practical: basic Bayesian estimation

• Linear models
– Parameter estimation, uncertainty
– Practical: basic Bayesian estimation

Lecture outline

Page 3: MSc Methods XX: YY

• Example: signal in the presence of background noise

• Very common problem: e.g. peak of lidar return from a forest canopy? Presence of a star against a background? Transiting planet?

Parameter estimation continued

[Figure: Gaussian peak of amplitude A on a flat background B, centred at x0]

See p 35-60 in Sivia & Skilling

Page 4: MSc Methods XX: YY

• Data are e.g. photon counts in a particular channel, so expect the count in the kth channel, $N_k$, to be proportional to the signal plus background there, where A, B are the signal and background amplitudes

• Assume the peak is Gaussian (for now), of width w and centred on $x_0$, so the ideal datum $D_k$ is then given by $D_k = n_0 \left[ A e^{-(x_k - x_0)^2 / (2w^2)} + B \right]$ (coded in the sketch below)

• where $n_0$ is a constant (proportional to integration time). Unlike $N_k$, $D_k$ is not a whole number, so the actual datum will be some integer close to $D_k$

• The Poisson distribution is the pdf which represents this property, i.e. $P(N_k \mid D_k) = \dfrac{D_k^{N_k} e^{-D_k}}{N_k!}$

Gaussian peak + background
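This model drops straight into code. A minimal sketch in Python (function and argument names are illustrative, not from the practical):

```python
import numpy as np

def ideal_datum(x, A, B, x0=0.0, w=1.0, n0=1.0):
    """Expected counts D_k = n0 * (A * exp(-(x - x0)^2 / (2 w^2)) + B)."""
    return n0 * (A * np.exp(-(x - x0) ** 2 / (2.0 * w ** 2)) + B)
```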

Page 5: MSc Methods XX: YY

• Poisson: prob. of N events occurring over some fixed time interval, if events occur at a known rate, independently of the time since the previous event

• If the expected number over a given interval is D, the prob. of exactly N events is $P(N \mid D) = \dfrac{D^{N} e^{-D}}{N!}$

Aside: Poisson distribution

Used in discrete counting experiments, particularly cases where there is a large number of opportunities for an event, each of which is rare (law of rare events), e.g.
• Nuclear decay
• No. of calls arriving at a call centre per minute – large number arriving BUT rare from the POV of the general population….

See practical page for poisson_plot.py
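The actual script lives on the practical page; as a hedged sketch of the same idea, the pmf can be plotted with scipy.stats (rates and names here are illustrative, not taken from poisson_plot.py):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# P(N | D) = D^N e^{-D} / N! for a few expected rates D
N = np.arange(0, 25)
for D in (1, 5, 10):
    plt.plot(N, poisson.pmf(N, D), 'o-', label=f'D = {D}')
plt.xlabel('N (number of events)')
plt.ylabel('P(N | D)')
plt.legend()
plt.show()
```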

Page 6: MSc Methods XX: YY

• So the likelihood for datum $N_k$ is $P(N_k \mid A, B, I) = \dfrac{D_k^{N_k} e^{-D_k}}{N_k!}$

• where I includes the relation between the expected counts $D_k$ and A, B, i.e. for our Gaussian model $x_0$, w, $n_0$ are given (as is $x_k$)

• IF the data are independent, then the likelihood over all M data is just the product of the probs of the individual measurements, i.e. $P(\{N_k\} \mid A, B, I) = \prod_{k=1}^{M} P(N_k \mid A, B, I)$

• As usual, we want the posterior pdf of A, B given $\{N_k\}$, I: $P(A, B \mid \{N_k\}, I) \propto P(\{N_k\} \mid A, B, I) \times P(A, B \mid I)$ (see the sketch below)

Gaussian peak + background
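A minimal sketch of this likelihood in code, working in logs to avoid underflow in the product over k; $\ln(N_k!)$ is computed with scipy's gammaln, and all names are illustrative:

```python
import numpy as np
from scipy.special import gammaln

def log_likelihood(A, B, x, N, x0=0.0, w=1.0, n0=1.0):
    """ln P({N_k} | A, B, I) = sum_k [N_k ln(D_k) - D_k - ln(N_k!)]."""
    D = n0 * (A * np.exp(-(x - x0) ** 2 / (2.0 * w ** 2)) + B)
    return np.sum(N * np.log(D) - D - gammaln(N + 1))
```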

Page 7: MSc Methods XX: YY

• Prior? Neither A nor B can be –ve, so the most naïve prior pdf is $P(A, B \mid I) = \text{constant}$ for $A \geq 0$, $B \geq 0$, and zero otherwise

• To calculate the constant we need $A_{max}$, $B_{max}$, but we may assume they are large enough not to cut off the posterior pdf, i.e. the posterior is effectively zero by then

• So the log of the posterior is $L = \ln P(A, B \mid \{N_k\}, I) = \text{constant} + \sum_{k=1}^{M} \left[ N_k \ln D_k - D_k \right]$

• And, as before, we want the A, B that maximise L (see the sketch below)

• Reliability is given by the width of the posterior about that point

Gaussian peak + background
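A sketch of the corresponding log posterior, reusing the log_likelihood sketch above; the flat prior contributes only a constant, which is dropped:

```python
import numpy as np

def log_posterior(A, B, x, N, **model):
    """L = ln P(A, B | {N_k}, I), up to an additive constant."""
    if A < 0 or B < 0:           # flat prior: impossible outside A, B >= 0
        return -np.inf
    return log_likelihood(A, B, x, N, **model)
```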

Page 8: MSc Methods XX: YY

• ‘Generate’ experimental data (see practical, and the sketch below)

• $n_0$ chosen to give maximum expectation $D_k = 100$. Why do some $N_k$ exceed 100?

Gaussian peak + background
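A sketch of the data generation step (the real version is in the practical; the true values and bin layout here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-10.0, 10.0, 15)            # 15 bin centres (illustrative)
A_true, B_true, x0, w = 1.0, 2.0, 0.0, 2.0  # assumed 'true' values
n0 = 100.0 / (A_true + B_true)              # so max expectation D_k = 100 at x = x0
D = n0 * (A_true * np.exp(-(x - x0) ** 2 / (2.0 * w ** 2)) + B_true)
N = rng.poisson(D)   # Poisson scatter about D_k: this is why some N_k exceed 100
```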

Page 9: MSc Methods XX: YY

• Posterior PDF is now 2D

• Max L at A = 1.11, B = 1.89 (actual 1.09, 1.93) – see the sketch below

Gaussian peak + background
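A brute-force sketch of locating that maximum, evaluating the log-posterior sketch from earlier on a grid (grid ranges are illustrative):

```python
import numpy as np

A_grid = np.linspace(0.01, 3.0, 300)
B_grid = np.linspace(0.01, 3.0, 300)
L = np.array([[log_posterior(a, b, x, N, x0=x0, w=w, n0=n0)
               for a in A_grid] for b in B_grid])
j, i = np.unravel_index(np.argmax(L), L.shape)  # row = B index, column = A index
A_best, B_best = A_grid[i], B_grid[j]
```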

Page 10: MSc Methods XX: YY

• Changing the experimental setup?
– E.g. reducing counts per bin (SNR), e.g. because of shorter integration time, lower signal threshold etc.

Gaussian peak + background

Same signal, but data look much noisier – broader PDF. Truncated at 0 – prior important.

Page 11: MSc Methods XX: YY

• Changing the experimental setup?
– Increasing the number of bins (same count rate, but spread out over twice the measurement range)

Gaussian peak + background

Much narrower posterior PDF, BUT the reduction is mostly in B.

Page 12: MSc Methods XX: YY

• More data, so why is the uncertainty in A, B not reduced equally?
– Data far from the origin only tell you about the background
– Conversely, if we restrict the range of x over which data are collected (fewer bins), it is hard to distinguish A from B (signal from noise)
– Skewed & high correlation between A, B

Gaussian peak + background

Page 13: MSc Methods XX: YY

• If we are only interested in A, then according to the marginalisation rule we integrate the joint posterior pdf w.r.t. B, i.e. $P(A \mid \{N_k\}, I) = \int_0^\infty P(A, B \mid \{N_k\}, I)\, dB$

• So we obtain the marginal pdf for A alone (see the sketch below)

• See the previous experimental cases…..

Marginal distribution

[Figures 1 & 2: marginal posteriors for 15 bins with ~100 counts maximum (1), and 15 bins with ~10 counts maximum (2)]
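Numerically, marginalisation is just a sum over one axis of the gridded posterior; a sketch continuing from the grid above:

```python
import numpy as np

post = np.exp(L - L.max())        # unnormalised joint posterior on the grid
dB = B_grid[1] - B_grid[0]
marg_A = post.sum(axis=0) * dB    # integrate out B (the grid rows), rectangle rule
dA = A_grid[1] - A_grid[0]
marg_A /= marg_A.sum() * dA       # normalise to unit area
```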

Page 14: MSc Methods XX: YY

• Marginal vs. conditional

• Marginal pdf: takes into account our prior ignorance of B

• Conditional pdf: assumes we know B, e.g. via calibration (see the slice sketch below)

• Least difference when measurements are made far from A (3)

• Most when data are close to A (4)

Marginal distribution

[Figures 3 & 4: 31 bins with ~100 counts maximum, and 7 bins with ~100 counts maximum]
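The conditional pdf, by contrast, is a single slice of the same grid at the known B; a sketch, with B_true standing in for a calibrated value:

```python
import numpy as np

j_cal = np.argmin(np.abs(B_grid - B_true))         # grid row closest to the calibrated B
cond_A = post[j_cal, :].copy()                     # P(A | {N_k}, B, I), unnormalised
cond_A /= cond_A.sum() * (A_grid[1] - A_grid[0])   # normalise; compare with marg_A above
```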

Page 15: MSc Methods XX: YY

• The posterior L shows the reliability of the parameters & we want the optimal values

• For parameters $\{X_j\}$, with posterior $P = P(\{X_j\} \mid \{\text{data}\}, I)$, the optimal $\{X_{0j}\}$ is the solution of the set of simultaneous equations $\left.\dfrac{\partial P}{\partial X_i}\right|_{\{X_{0j}\}} = 0$

• for $i = 1, 2, \ldots, N_{\text{params}}$

• So for the log of P, i.e. $L = \ln P$, and for 2 parameters (X, Y), we want $\left.\dfrac{\partial L}{\partial X}\right|_{X_0, Y_0} = 0$ and $\left.\dfrac{\partial L}{\partial Y}\right|_{X_0, Y_0} = 0$

• where $(X_0, Y_0)$ denotes the position of the maximum

Uncertainty

[Figure: posterior surface – is the stationary point the maximum?]

Sivia & Skilling (2006) Chapter 3, p 35-51

Page 16: MSc Methods XX: YY

• To estimate the reliability of the best estimate we want the spread of P about $(X_0, Y_0)$

• Do this using a Taylor expansion of L about the maximum, i.e. $L = L(X_0, Y_0) + \frac{\partial L}{\partial X}(X - X_0) + \frac{\partial L}{\partial Y}(Y - Y_0) + \frac{1}{2}\left[\frac{\partial^2 L}{\partial X^2}(X - X_0)^2 + 2\frac{\partial^2 L}{\partial X \partial Y}(X - X_0)(Y - Y_0) + \frac{\partial^2 L}{\partial Y^2}(Y - Y_0)^2\right] + \ldots$

• So for the first three terms (to quadratic order) we have $L \approx L(X_0, Y_0) + \frac{1}{2}\left[\frac{\partial^2 L}{\partial X^2}(X - X_0)^2 + 2\frac{\partial^2 L}{\partial X \partial Y}(X - X_0)(Y - Y_0) + \frac{\partial^2 L}{\partial Y^2}(Y - Y_0)^2\right]$

• We ignore the linear $(X - X_0)$ and $(Y - Y_0)$ terms, since the expansion is about the maximum, where the first derivatives vanish (the 2nd derivatives can be estimated numerically – see the sketch below)

Uncertainty

http://en.wikipedia.org/wiki/Taylor_series
Sivia & Skilling (2008) Chapter 3, p 35-51
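The 2nd derivatives needed for the quadratic term can be estimated by central finite differences at the maximum; a sketch reusing log_posterior and (A_best, B_best) from the earlier sketches (step size h is illustrative):

```python
import numpy as np

h = 1e-3

def f(a, b):
    return log_posterior(a, b, x, N, x0=x0, w=w, n0=n0)

# central-difference estimates of the 2nd derivatives at the maximum
LAA = (f(A_best + h, B_best) - 2 * f(A_best, B_best) + f(A_best - h, B_best)) / h**2
LBB = (f(A_best, B_best + h) - 2 * f(A_best, B_best) + f(A_best, B_best - h)) / h**2
LAB = (f(A_best + h, B_best + h) - f(A_best + h, B_best - h)
       - f(A_best - h, B_best + h) + f(A_best - h, B_best - h)) / (4 * h**2)
```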

Page 17: MSc Methods XX: YY

• So we are mainly concerned with the quadratic terms. Rephrase via matrices: the quadratic term Q is $Q = \begin{pmatrix} X - X_0 & Y - Y_0 \end{pmatrix}\begin{pmatrix} A & C \\ C & B \end{pmatrix}\begin{pmatrix} X - X_0 \\ Y - Y_0 \end{pmatrix}$

• where $A = \left.\dfrac{\partial^2 L}{\partial X^2}\right|_{X_0, Y_0}$, $B = \left.\dfrac{\partial^2 L}{\partial Y^2}\right|_{X_0, Y_0}$ and $C = \left.\dfrac{\partial^2 L}{\partial X \partial Y}\right|_{X_0, Y_0}$

Uncertainty

[Figure: the contour Q = k is an ellipse in the X–Y plane, centred on (X0, Y0), with principal axes along the eigenvector directions e1 and e2]

• Contour of Q in X-Y plane i.e. line of constant L

• Orientation and eccentricity determined by A, B, C

• Directions e1 and e2 are the eigenvectors of the matrix of 2nd derivatives with elements A, B, C

Sivia & Skilling (2008) Chapter 3, p 35-51

Page 18: MSc Methods XX: YY

• So the (x, y) components of e1 and e2 are given by the solutions of $\begin{pmatrix} A & C \\ C & B \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$

• where the eigenvalues $\lambda_1$ and $\lambda_2$ scale as $1/k_1^2$ and $1/k_2^2$ ($k_{1,2}$ are the widths of the ellipse along the principal directions) – see the sketch below

• If $(X_0, Y_0)$ is a maximum then $\lambda_1 < 0$ and $\lambda_2 < 0$

• So $A < 0$, $B < 0$ and $AB > C^2$

• So if $C \neq 0$ the ellipse is not aligned with the axes – how then do we estimate error bars on $X_0$, $Y_0$?

• We can get rid of parameters we don’t want (Y, for example) by integrating, i.e. $P(X \mid \{\text{data}\}, I) = \int_{-\infty}^{\infty} P(X, Y \mid \{\text{data}\}, I)\, dY$

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51
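The eigenvalues and eigenvectors e1, e2 come straight from the 2nd-derivative matrix; a sketch using the finite-difference LAA, LBB, LAB from the earlier sketch:

```python
import numpy as np

H = np.array([[LAA, LAB],
              [LAB, LBB]])     # symmetric matrix of 2nd derivatives at the maximum
lam, vecs = np.linalg.eigh(H)  # eigenvalues (both < 0 at a maximum); columns are e1, e2
widths = 1.0 / np.sqrt(-lam)   # widths of the ellipse along the principal directions
```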

Page 19: MSc Methods XX: YY

• And then use Taylor again

• So (see S&S p 46 & Appendix) $P(X \mid \{\text{data}\}, I) \propto \exp\left[\dfrac{1}{2}\left(A - \dfrac{C^2}{B}\right)(X - X_0)^2\right]$

• And so the marginal distribution for X is just a Gaussian with best estimate (mean) $X_0$ and uncertainty (SD) $\sigma_X = \sqrt{\dfrac{-B}{AB - C^2}}$

• So all fine and we can calculate uncertainty……right?

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51

Page 20: MSc Methods XX: YY

• Note $AB - C^2$ is the determinant of $\begin{pmatrix} A & C \\ C & B \end{pmatrix}$ and is $\lambda_1 \times \lambda_2$

• So if $\lambda_1$ or $\lambda_2 \to 0$, then $AB - C^2 \to 0$ and $\sigma_X$ and $\sigma_Y \to \infty$

• Oh dear……

• So consider the variance of the posterior, $\sigma^2 = \left\langle (X - \mu)^2 \right\rangle = \int (X - \mu)^2\, P(X \mid \{\text{data}\}, I)\, dX$

• where $\mu$ is the mean

• For a 1D normal distribution this gives the familiar $\sigma^2$

• For the 2D case (X, Y) here, $\left\langle (X - X_0)^2 \right\rangle = \iint (X - X_0)^2\, P(X, Y \mid \{\text{data}\}, I)\, dX\, dY = \dfrac{-B}{AB - C^2} = \sigma_X^2$

• which we have from before. The same for Y gives $\sigma_Y^2 = \dfrac{-A}{AB - C^2}$, so…..

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51


Page 21: MSc Methods XX: YY

• Consider the covariance $\sigma_{XY}^2 = \left\langle (X - X_0)(Y - Y_0) \right\rangle = \iint (X - X_0)(Y - Y_0)\, P(X, Y \mid \{\text{data}\}, I)\, dX\, dY$

• which describes the correlation between X and Y; if the estimate of X has little/no effect on the estimate of Y, then $\sigma_{XY}^2 \to 0$

• And, using Taylor as before, $\sigma_{XY}^2 = \dfrac{C}{AB - C^2}$

• So in matrix notation $\begin{pmatrix} \sigma_X^2 & \sigma_{XY}^2 \\ \sigma_{XY}^2 & \sigma_Y^2 \end{pmatrix} = \dfrac{1}{AB - C^2}\begin{pmatrix} -B & C \\ C & -A \end{pmatrix} = -\begin{pmatrix} A & C \\ C & B \end{pmatrix}^{-1}$ (see the sketch below)

• where we remember that A, B, C are the 2nd derivatives of L at the maximum

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51
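In code, the whole covariance matrix is minus the inverse of the 2nd-derivative matrix; a sketch reusing H from the eigenvector sketch:

```python
import numpy as np

cov = -np.linalg.inv(H)                 # [[sigma_A^2, sigma_AB^2], [sigma_AB^2, sigma_B^2]]
sigma_A = np.sqrt(cov[0, 0])            # marginal error bar on A
sigma_B = np.sqrt(cov[1, 1])            # marginal error bar on B
corr = cov[0, 1] / (sigma_A * sigma_B)  # correlation coefficient between A and B
```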

Page 22: MSc Methods XX: YY

• Covariance (or variance-covariance) matrix describes covariance of error terms

• When C = 0, $\sigma_{XY}^2 = 0$: there is no correlation, and e1 and e2 are aligned with the axes

• If C increases (relative to A, B), the posterior pdf becomes more skewed and elliptical, rotated at angle $\pm \tan^{-1}\sqrt{A/B}$

Uncertainty

After Sivia & Skilling (2008) fig 3.7 p. 48

[Figure: posterior contours for C = 0 (X, Y uncorrelated); large –ve correlation; large +ve correlation]

Page 23: MSc Methods XX: YY

• As correlation grows, if $C = (AB)^{1/2}$ the contours become infinitely wide in one direction (except for prior bounds)

• In this case $\sigma_X$ and $\sigma_Y$ are very large (i.e. very unreliable parameter estimates)

• BUT large off-diagonals in the covariance matrix mean we can estimate linear combinations of parameters

• For –ve covariance, the posterior is wide in the direction $Y = -mX$, where $m = (A/B)^{1/2}$, but narrow perpendicular to it, along $Y + mX = c$

• i.e. a lot of information about $Y + mX$ but little about $Y - X/m$

• For +ve correlation, most info. on $Y - mX$ but not $Y + X/m$

Uncertainty

After Sivia & Skilling (2008) fig 3.7 p. 48

Page 24: MSc Methods XX: YY

• Seen the 2 parameter case, so what about generalising the Taylor quadratic approximation to M parameters?

• Remember, we want $\{X_{0j}\}$ to maximise L, the (log) posterior pdf

• Rephrase in matrix form: at $\mathbf{X}_0$, for $i = 1, 2, \ldots, M$, we want $\left.\dfrac{\partial L}{\partial X_i}\right|_{\mathbf{X}_0} = 0$, i.e. $\nabla L(\mathbf{X}_0) = 0$

• The extension of the Taylor expansion to M variables is $L(\mathbf{X}) = L(\mathbf{X}_0) + \frac{1}{2}(\mathbf{X} - \mathbf{X}_0)^T\, \nabla\nabla L(\mathbf{X}_0)\, (\mathbf{X} - \mathbf{X}_0) + \ldots$

• So if $\mathbf{X}$ is an M × 1 column vector and we ignore higher terms, the exponential of the posterior pdf is $P(\mathbf{X} \mid \{\text{data}\}, I) \propto \exp\left[\frac{1}{2}(\mathbf{X} - \mathbf{X}_0)^T\, \nabla\nabla L(\mathbf{X}_0)\, (\mathbf{X} - \mathbf{X}_0)\right]$

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51

Page 25: MSc Methods XX: YY

• where $\nabla\nabla L(\mathbf{X}_0)$ is a symmetric M × M matrix of 2nd derivatives, and $(\mathbf{X} - \mathbf{X}_0)^T$ is the transpose of $(\mathbf{X} - \mathbf{X}_0)$ (a row vector)

• So this is the generalisation of Q from the 2D case

• And the contour map from before is just a 2D slice through our now M-dimensional parameter space

• The constant of proportionality is that of the multivariate Gaussian, $\sqrt{\det\left(-\nabla\nabla L(\mathbf{X}_0)\right) / (2\pi)^M}$

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51

Page 26: MSc Methods XX: YY

• So what are the implications of all of this?

• The maximum of the M-parameter posterior PDF is $\mathbf{X}_0$, & we know $\nabla L(\mathbf{X}_0) = 0$

• Compare to the 2D case & see that $\nabla\nabla L$ is analogous to $-1/\sigma^2$

• Can show that the generalised case for the covariance matrix $\boldsymbol{\sigma}^2$ is $\boldsymbol{\sigma}^2 = \left[-\nabla\nabla L(\mathbf{X}_0)\right]^{-1}$ (see the sketch below)

• Square roots of the diagonals (i = j) give the marginal error bars, and the off-diagonals (i ≠ j) describe the correlations between parameters

• So the covariance matrix contains most of the information describing the model fit AND the faith we have in the parameter estimates

Uncertainty

Sivia & Skilling (2008) Chapter 3, p 35-51
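The same recipe for the M-parameter case; a sketch with a hypothetical 3 × 3 matrix of 2nd derivatives standing in for the real thing:

```python
import numpy as np

# hypothetical symmetric matrix of 2nd derivatives at the maximum X0
H = np.array([[-5.0,  1.0,  0.5],
              [ 1.0, -4.0,  0.3],
              [ 0.5,  0.3, -6.0]])
cov = np.linalg.inv(-H)            # generalised covariance matrix
err_bars = np.sqrt(np.diag(cov))   # marginal error bars (i = j)
# off-diagonals cov[i, j] (i != j) describe the correlations between parameters
```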

Page 27: MSc Methods XX: YY

• Sivia & Skilling make the important point (p50) that the inverse of the diagonal elements of a matrix ≠ the diagonal of the inverse of the matrix (see the sketch below)

• i.e. do NOT try to estimate the value / spread of one parameter in the M-dimensional case by holding all the others fixed at their optimal values

Uncertainty

After Sivia & Skilling (2008) p50.

[Figure: posterior contours in the Xi–Xj plane; the width of the slice at fixed Xoj gives an incorrect ‘best fit’ σii, narrower than the true marginal σii]

• Need to include marginalisation to get the correct magnitude for the uncertainty

• See S&S p51 for discussion of multimodal and asymmetric posterior PDFs, for which the Gaussian is not a good approximation
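A small numerical check of that point for a hypothetical correlated 2 × 2 case (C ≠ 0): the "hold the other parameter fixed" width underestimates the true marginal width:

```python
import numpy as np

H = np.array([[-4.0,  3.0],
              [ 3.0, -4.0]])                 # A = B = -4, C = 3 (so AB > C^2)
wrong = np.sqrt(-1.0 / np.diag(H))           # others held fixed: [0.5, 0.5]
right = np.sqrt(np.diag(np.linalg.inv(-H)))  # marginalised: [0.756, 0.756] - larger
```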

Page 28: MSc Methods XX: YY

• We have seen that we can express the condition for the best estimate of a set of M parameters $\{X_j\}$ very compactly as $\nabla L(\mathbf{X}_0) = 0$

• where the jth element of $\nabla L$ is $\partial L / \partial X_j$ (L the log posterior pdf), evaluated at $\mathbf{X} = \mathbf{X}_0$

• So this is a set of simultaneous equations which, IF they are linear, i.e. of the form $\mathbf{A}\mathbf{x} = \mathbf{b}$,

• can then be solved using linear algebra methods, i.e. $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ (see the sketch below)

• This is the power (joy?) of linearity! Will see more on this later

• Even if the system is not linear, we can often approximate it as linear over some limited domain, to allow linear methods to be used

• If not, then we have to use other (non-linear) methods…..

Summary
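A sketch of such a linear solve in numpy (matrix and vector values are illustrative); np.linalg.solve is preferred to forming the inverse explicitly:

```python
import numpy as np

A_mat = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A_mat, b)   # solves A x = b; here x = [2., 3.]
```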

Page 29: MSc Methods XX: YY

• Two parameter e.g.: Gaussian peak + background

• Solve via Bayes’ theorem using a Taylor expansion (to quadratic order)

• Issues over experimental setup
– Integration time, number of bins, size etc.
– Impact on the posterior PDF

• Can use linear methods to derive uncertainty estimates and explore correlation between parameters

• Extend to the multi-dimensional case using the same method

• Be careful when dealing with uncertainty

• KEY: not always useful to look for summary statistics – if in doubt, look at the posterior PDF – this gives the full description

Summary