Error analysis lecture 9
Transcript of Error analysis lecture 9
-
7/30/2019 Error analysis lecture 9
Physics 509: Least Squares
Parameter Estimation
Scott Oser, Lecture #9
October 2, 2008
-
Outline
Least squares estimators
Errors on least squares estimators
Fitting binned data sets
Properties of linear least squares fitting
Nonlinear least squares fitting
Goodness of fit estimation
Dealing with error bars on x and y
Last time: we were introduced to frequentist parameter estimation, and learned the maximum likelihood method, a very powerful parameter estimation technique. Today:
-
Maximum Likelihood with Gaussian Errors
Suppose we want to fit a set of points (x_i, y_i) to some model y = f(x|θ), in order to determine the parameter(s) θ. Often the measurements will be scattered around the model with some Gaussian error. Let's derive the ML estimator for θ. The likelihood is

L = \prod_{i=1}^{N} \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_i - f(x_i|\theta)}{\sigma_i}\right)^2\right]

The log likelihood is then

\ln L = -\frac{1}{2}\sum_{i=1}^{N} \frac{(y_i - f(x_i|\theta))^2}{\sigma_i^2} - \sum_{i=1}^{N} \ln\left(\sigma_i\sqrt{2\pi}\right)

Maximizing this is equivalent to minimizing

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - f(x_i|\theta))^2}{\sigma_i^2}
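The equivalence above is easy to check numerically. Below is a minimal sketch (my own toy example, not from the lecture): a one-parameter model f(x|θ) = θx with heteroscedastic Gaussian errors. Both −ln L, including its θ-independent normalization terms, and the χ² are evaluated on a grid of θ and reach their minimum at the same point.

```python
import numpy as np

# Toy check that maximizing the Gaussian likelihood and minimizing
# chi^2 select the same parameter.  Model: f(x|theta) = theta * x.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 20)
sigma = 0.5 + 0.1 * x                 # heteroscedastic Gaussian errors
y = 2.0 * x + rng.normal(0, sigma)    # data generated with theta = 2

thetas = np.linspace(1.5, 2.5, 1001)
resid = y[None, :] - thetas[:, None] * x[None, :]

chi2 = np.sum((resid / sigma) ** 2, axis=1)
# Full -ln L keeps the theta-independent normalization terms:
neg_lnL = 0.5 * chi2 + np.sum(np.log(sigma * np.sqrt(2 * np.pi)))

# Both curves are minimized at the same theta.
assert np.argmin(chi2) == np.argmin(neg_lnL)
```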
-
The Least Squares Method
Taken outside the context of the ML method, the least squares method is the most commonly known estimator.

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - f(x_i|\theta))^2}{\sigma_i^2}

Why?
1) Easily implemented.
2) Graphically motivated (see title slide!).
3) Mathematically straightforward: often an analytic solution exists.
4) Extension of LS to correlated uncertainties is straightforward:

\chi^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} (y_i - f(x_i|\theta))\,(V^{-1})_{ij}\,(y_j - f(x_j|\theta))
-
Least Squares Straight Line Fit
The most straightforward example is a linear fit: y=mx+b.
\chi^2 = \sum_{i=1}^{N} \frac{(y_i - m x_i - b)^2}{\sigma_i^2}

Least squares estimators for m and b are found by differentiating χ² with respect to m & b:

\frac{d\chi^2}{dm} = -2\sum_{i=1}^{N} \frac{(y_i - m x_i - b)}{\sigma_i^2}\, x_i = 0

\frac{d\chi^2}{db} = -2\sum_{i=1}^{N} \frac{(y_i - m x_i - b)}{\sigma_i^2} = 0

This is a linear system of simultaneous equations with two unknowns.
-
Solving for m and b
The most straightforward example is a linear fit: y = mx + b.

\frac{d\chi^2}{dm} = -2\sum \frac{(y_i - m x_i - b)}{\sigma_i^2}\, x_i = 0 \qquad \frac{d\chi^2}{db} = -2\sum \frac{(y_i - m x_i - b)}{\sigma_i^2} = 0

Expanding these gives the normal equations:

\sum \frac{y_i}{\sigma_i^2} = m \sum \frac{x_i}{\sigma_i^2} + b \sum \frac{1}{\sigma_i^2}

\sum \frac{x_i y_i}{\sigma_i^2} = m \sum \frac{x_i^2}{\sigma_i^2} + b \sum \frac{x_i}{\sigma_i^2}

Solving for m and b:

m = \frac{\sum \frac{y_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2} - \sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\left(\sum \frac{x_i}{\sigma_i^2}\right)^2 - \sum \frac{x_i^2}{\sigma_i^2} \sum \frac{1}{\sigma_i^2}}

b = \frac{\sum \frac{y_i}{\sigma_i^2} - m \sum \frac{x_i}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2}}

In the special case of equal σ's these reduce to

m = \frac{\bar{y}\,\bar{x} - \overline{xy}}{\bar{x}^2 - \overline{x^2}} \qquad b = \bar{y} - m\bar{x}

(Special case of equal σ's.)
-
Solution for least squares m and b
There's a nice analytic solution: rather than trying to numerically minimize a χ², we can just plug values into the formulas! This worked out nicely because of the very simple form of the likelihood, due to the linearity of the problem and the assumption of Gaussian errors.
m = \frac{\sum \frac{y_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2} - \sum \frac{1}{\sigma_i^2} \sum \frac{x_i y_i}{\sigma_i^2}}{\left(\sum \frac{x_i}{\sigma_i^2}\right)^2 - \sum \frac{x_i^2}{\sigma_i^2} \sum \frac{1}{\sigma_i^2}} \qquad m = \frac{\bar{y}\,\bar{x} - \overline{xy}}{\bar{x}^2 - \overline{x^2}}

b = \frac{\sum \frac{y_i}{\sigma_i^2} - m \sum \frac{x_i}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2}} \qquad b = \bar{y} - m\bar{x}

(The right-hand forms are the special case of equal errors.)
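As a sanity check, the analytic weighted-fit formulas can be coded directly. This is a sketch with made-up data; the weighted sums S, Sx, Sy, Sxx, Sxy are my own shorthand, not the lecture's notation. NumPy's `polyfit` with weights 1/σ minimizes the same χ², so the two results should agree.

```python
import numpy as np

# Analytic weighted straight-line fit (toy data, my own example).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
sigma = np.full_like(x, 0.4)
y = 1.7 * x + 0.5 + rng.normal(0, sigma)

w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()

denom = S * Sxx - Sx**2
m_hat = (S * Sxy - Sx * Sy) / denom   # slope from the normal equations
b_hat = (Sy - m_hat * Sx) / S         # intercept: weighted ybar - m * xbar

# Cross-check against numpy's weighted polynomial fit (w = 1/sigma).
m_np, b_np = np.polyfit(x, y, 1, w=1.0 / sigma)
assert np.allclose([m_hat, b_hat], [m_np, b_np])
```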
-
Errors in the Least Squares Method
What about the errors and correlations between m and b? The simplest way to derive them is to look at the chi-squared, remembering that this is a special case of the ML method:

-\ln L = \frac{1}{2}\chi^2 = \frac{1}{2}\sum_{i=1}^{N} \frac{(y_i - m x_i - b)^2}{\sigma_i^2}

In the ML method, we define the 1σ error on a parameter by the minimum and maximum values of that parameter satisfying Δln L = −1/2.

In the LS method, this corresponds to Δχ² = +1 above the best-fit point. A two sigma error range corresponds to Δχ² = +4, 3σ to Δχ² = +9, etc.

But notice one thing about the dependence of the χ²: it is quadratic in both m and b, and generally includes a cross-term proportional to mb. Conclusion: Gaussian uncertainties on m and b, with a covariance between them.
-
Errors in Least Squares: another approach
There is another way to calculate the errors on the least squares estimator, as described in the text. Treat the estimator as a function of the measured data:

\hat{m} = \hat{m}(y_1, y_2, \ldots, y_N)

and then use error propagation to turn the errors on the y_i into errors on the estimator. (Note that the x_i are considered fixed.)

This approach is used in Barlow. We will discuss error propagation and the meaning of error bars in more detail in the next lecture.
-
Formulas for Errors in the Least Squares Method
We can also derive the errors by relating the χ² to the negative log likelihood, and using the error formula:

\left(\mathrm{cov}^{-1}\right)(a_i, a_j) = -\left.\frac{\partial^2 \ln L}{\partial a_i\,\partial a_j}\right|_{a=\hat{a}} = \left.\frac{1}{2}\frac{\partial^2 \chi^2}{\partial a_i\,\partial a_j}\right|_{a=\hat{a}}

For the straight-line fit this gives:

\sigma_m^2 = \frac{1}{\sum 1/\sigma_i^2} \cdot \frac{1}{\overline{x^2} - \bar{x}^2} = \frac{\sigma^2}{N} \cdot \frac{1}{\overline{x^2} - \bar{x}^2}

\sigma_b^2 = \frac{1}{\sum 1/\sigma_i^2} \cdot \frac{\overline{x^2}}{\overline{x^2} - \bar{x}^2} = \frac{\sigma^2}{N} \cdot \frac{\overline{x^2}}{\overline{x^2} - \bar{x}^2}

\mathrm{cov}(m, b) = -\frac{1}{\sum 1/\sigma_i^2} \cdot \frac{\bar{x}}{\overline{x^2} - \bar{x}^2} = -\frac{\sigma^2}{N} \cdot \frac{\bar{x}}{\overline{x^2} - \bar{x}^2}

(The second form in each case assumes equal errors σ; the covariance is intuitive when x̄ = 0.)
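These formulas can be verified numerically. The sketch below (my own check, assuming equal errors σ so that V = σ²I) builds the design matrix for the straight line and compares the analytic variances and covariance with the matrix (AᵀV⁻¹A)⁻¹, which is the general covariance matrix for a linear least squares fit.

```python
import numpy as np

# Check the straight-line error formulas against (A^T V^-1 A)^-1,
# with equal errors sigma (V = sigma^2 * I).  Toy x values, my choice.
N, sigma = 50, 0.3
x = np.linspace(0, 5, N)
xbar, x2bar = x.mean(), (x**2).mean()

var_m = (sigma**2 / N) / (x2bar - xbar**2)
var_b = (sigma**2 / N) * x2bar / (x2bar - xbar**2)
cov_mb = -(sigma**2 / N) * xbar / (x2bar - xbar**2)

# Design matrix A = [x, 1] for y = m*x + b:
A = np.column_stack([x, np.ones(N)])
U = np.linalg.inv(A.T @ A / sigma**2)   # (A^T V^-1 A)^-1

assert np.allclose([var_m, var_b, cov_mb], [U[0, 0], U[1, 1], U[0, 1]])
```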
-
Fitting Binned Data
A very popular use of least squares fitting is when you have binned data (as in a histogram). For example:

p(x|\lambda) = \frac{x}{\lambda^2}\, e^{-x/\lambda}

The number of events occurring in any bin is assumed to be distributed with a Poisson with mean f_j:

f_j = \nu \int_{x_{\mathrm{low},j}}^{x_{\mathrm{hi},j}} dx\; p(x|\lambda)

\chi^2(\nu, \lambda) = \sum_{j=1}^{N} \frac{(n_j - f_j)^2}{f_j}
-
A lazy way to do things
It's bad enough that the least squares method assumes Gaussian errors. When applied to binned data, it also throws away information. Even worse is the common dodge of approximating the error as √n_j; this is called modified least squares:

\chi^2(\nu, \lambda) = \sum_{j=1}^{N} \frac{(n_j - f_j)^2}{n_j}

You can get away with this when all the n_j are large, but what happens when some bins are close to zero?

You can exclude zero bins, but then you're throwing away even more information: the fact that a bin has zero counts is telling you something about the underlying distribution.

WARNING: In spite of these drawbacks, this kind of fit is what most standard plotting packages default to when you ask them to fit something.
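The zero-bin pathology is easy to exhibit. The sketch below (my own toy: 50 events from a unit exponential, histogrammed over a wide range) shows that the modified-least-squares χ² terms blow up to infinity in empty bins, which is why those bins must either be thrown away or the method abandoned.

```python
import numpy as np

# Toy demonstration of why errors ~ sqrt(n_j) fail for low-count bins.
rng = np.random.default_rng(2)
counts, edges = np.histogram(rng.exponential(1.0, size=50),
                             bins=20, range=(0, 10))

# Tail bins of an exponential with only 50 events are typically empty:
assert (counts == 0).any()

# Predicted mean counts per bin for a unit exponential, 50 events total:
f = 50 * np.diff(-np.exp(-edges))
with np.errstate(divide="ignore"):
    chi2_terms = (counts - f) ** 2 / counts
# The MLS chi^2 is infinite wherever a bin is empty:
assert np.isinf(chi2_terms[counts == 0]).all()
```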
-
Comparison of ML and MLS fit to exponential
(Figure: red = true distribution, black = fit.)

The more bins you have with small statistics, the worse the MLS method becomes.

If you MUST use MLS instead of ML to fit histograms, at least rebin the data to get reasonable statistics in almost all bins (and seriously consider excluding low-statistics bins from the fitted region). You'll throw away information, but will be less likely to bias your result.
-
Evil tidings: Don't fit for normalization with least squares

If that didn't scare you off least squares fitting to histograms, consider the following morality tale ...
Suppose we have some normalized distribution we're fitting to:

f(x|\lambda), \quad \text{satisfying} \quad \int dx\; f(x|\lambda) = 1

When letting the normalization constant ν float as a free parameter in the fit, with predicted bin contents

f_j = \nu \int_{x_{\mathrm{low},j}}^{x_{\mathrm{hi},j}} dx\; f(x|\lambda)

the least squares fit will return a biased result for ν:

Least squares best-fit: ν = n + χ²/2
Modified least squares best-fit: ν = n − χ²

(Ask me about the parable of SNO's three independent analyses ...)
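The modified-least-squares half of this claim is easy to see in a Monte Carlo. The sketch below is my own toy, not the lecture's example: the shape is held fixed and uniform across bins, so the MLS minimum over ν is analytic, ν̂ = Σp_j / Σ(p_j²/n_j), while the ML estimate is simply the total observed count. Averaged over many trials the MLS estimate sits roughly χ² ≈ (number of bins − 1) below the truth.

```python
import numpy as np

# Monte Carlo of the MLS normalization bias (toy: fixed uniform shape).
rng = np.random.default_rng(3)
nbins, nu_true, ntrials = 10, 250.0, 2000
p = np.full(nbins, 1.0 / nbins)            # fixed bin probabilities

n = rng.poisson(nu_true * p, size=(ntrials, nbins))
nu_ml = n.sum(axis=1)                      # ML estimate: total observed count
nu_mls = p.sum() / (p**2 / n).sum(axis=1)  # analytic MLS minimum (n_j > 0 here)

# ML is unbiased; MLS is biased low by roughly chi^2 ~ (nbins - 1):
assert nu_mls.mean() < nu_ml.mean() - 5
```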
-
Fitting binned data with ML method
If you need to fit for a normalization, you're better off using the extended maximum likelihood method:

L(\nu, \lambda) = \frac{\nu^n e^{-\nu}}{n!} \prod_{i=1}^{n} p(x_i|\lambda) = \frac{e^{-\nu}}{n!} \prod_{i=1}^{n} \nu\, p(x_i|\lambda)

\ln L(\nu, \lambda) = -\nu + \sum_{i=1}^{n} \ln\left[\nu\, p(x_i|\lambda)\right] \quad \text{(dropping the constant term)}

This works even when ν is a function of the other parameters; feel free to reparameterize however you find convenient. In fact the ν term in ln(L) will help to constrain the other parameters in this case.

It's easy to imagine cases where a model parameter affects not only the shape of a PDF, but also the rate of the process in question.

For binned data:

\ln L(\nu_{\mathrm{tot}}, \lambda) = -\nu_{\mathrm{tot}} + \sum_{i=1}^{N} n_i \ln\left[\nu_i(\nu_{\mathrm{tot}}, \lambda)\right]

Here ν_tot is the predicted total number of events, ν_i is the number predicted in the i-th bin, and n_i is the number observed in the i-th bin.
-
Linear least squares and matrix algebra
Least squares fitting really shines in one area: linear parameter dependence in your fit function:

y(x) = \sum_{j=1}^{m} \lambda_j\, f_j(x)

In this special case, LS estimators for the λ_j are unbiased, have the minimum possible variance of any linear estimators, and can be solved analytically, even when N is small, and independent of the individual measurement PDFs.*

A_{ij} = f_j(x_i) = \begin{pmatrix} f_1(x_1) & f_2(x_1) & \cdots \\ f_1(x_2) & f_2(x_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \qquad \vec{y}_{\mathrm{pred}} = A\vec{\lambda}

\chi^2 = (\vec{y}_{\mathrm{meas}} - \vec{y}_{\mathrm{pred}})^T\, V^{-1}\, (\vec{y}_{\mathrm{meas}} - \vec{y}_{\mathrm{pred}}) = (\vec{y}_{\mathrm{meas}} - A\vec{\lambda})^T\, V^{-1}\, (\vec{y}_{\mathrm{meas}} - A\vec{\lambda})

*Some conditions apply; see the Gauss-Markov theorem for the exact statement.
-
Linear least squares: exact matrix solution
y(x) = \sum_{j=1}^{m} \lambda_j\, f_j(x) \qquad A_{ij} = f_j(x_i) \qquad \vec{y}_{\mathrm{pred}} = A\vec{\lambda}

\chi^2 = (\vec{y}_{\mathrm{meas}} - A\vec{\lambda})^T\, V^{-1}\, (\vec{y}_{\mathrm{meas}} - A\vec{\lambda})

Best-fit estimator:

\hat{\lambda} = \left(A^T V^{-1} A\right)^{-1} A^T V^{-1} \vec{y}

Covariance matrix of the estimators:

U_{ij} = \mathrm{cov}(\hat{\lambda}_i, \hat{\lambda}_j) = \left(A^T V^{-1} A\right)^{-1}

Nice in principle, but requires lots of matrix inversions, which is rather nasty numerically. It might be simpler to just minimize the χ²!
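The matrix solution transcribes almost line for line into NumPy. The sketch below is my own toy (a quadratic model, i.e. basis functions f_j = 1, x, x²); with equal errors the weighted solution must agree with an ordinary least-squares solve, which gives a cheap cross-check.

```python
import numpy as np

# Exact matrix solution lambda_hat = (A^T V^-1 A)^-1 A^T V^-1 y
# for the toy model y = l1 + l2*x + l3*x^2 with independent errors.
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 40)
sigma = np.full_like(x, 0.2)
lam_true = np.array([1.0, -0.5, 0.3])

A = np.column_stack([np.ones_like(x), x, x**2])   # A_ij = f_j(x_i)
y = A @ lam_true + rng.normal(0, sigma)

Vinv = np.diag(1.0 / sigma**2)
U = np.linalg.inv(A.T @ Vinv @ A)     # covariance matrix of the estimators
lam_hat = U @ A.T @ Vinv @ y          # best-fit estimator

# With equal errors this must agree with an ordinary least-squares solve:
lam_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(lam_hat, lam_lstsq)
```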
-
Nonlinear least squares
The derivation of the least squares method doesn't depend on the assumption that your fitting function is linear in the parameters. Nonlinear fits, such as A + B sin(Ct + D), can be tackled with the least squares technique as well. But things aren't nearly as nice:

- No closed-form solution; you have to minimize the χ² numerically.
- Estimators are no longer guaranteed to have zero bias and minimum variance.
- Contours generated by Δχ² = +1 are no longer ellipses, and the tangents to these contours no longer give the standard deviations. (However, we can still interpret them as giving 1σ errors, although since the distribution is non-Gaussian, this error range isn't the same thing as a standard deviation.)
- Be very careful with minimization routines; depending on how badly nonlinear your problem is, there may be multiple solutions, local minima, etc.
-
Goodness of fit for least squares
By now you're probably wondering why I haven't discussed the use of χ² as a goodness of fit parameter. Partly this is because parameter estimation and goodness of fit are logically separate things: if you're CERTAIN that you've got the correct model and error estimates, then a poor χ² can only be bad luck, and tells you nothing about how accurate your parameter estimates are.

Carefully distinguish between:

1) The value of χ² at the minimum: a measure of goodness of fit.
2) How quickly the χ² changes as a function of the parameter: a measure of the uncertainty on the parameter.

Nonetheless, a major advantage of the χ² approach is that it automatically generates a goodness of fit parameter as a byproduct of the fit. As we'll see, the maximum likelihood method doesn't.

How does this work?
-
χ² as a goodness of fit parameter
Remember that the sum of N Gaussian variables with zero mean and unit RMS, when squared and added, follows a χ² distribution with N degrees of freedom. Compare to the least squares formula:

\chi^2 = \sum_{i}\sum_{j} (y_i - f(x_i))\,(V^{-1})_{ij}\,(y_j - f(x_j))

If each y_i is distributed around the function according to a Gaussian, and f(x|λ) is a linear function of the m free parameters, and the error estimates don't depend on the free parameters, then the best-fit least squares quantity we call χ² actually follows a χ² distribution with N − m degrees of freedom.

People usually ignore these various caveats and assume this works even when the parameter dependence is non-linear and the errors aren't Gaussian. Be very careful with this, and check with simulation if you're not sure.
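The last sentence's advice, checking with simulation, is itself a one-screen exercise. The toy below (my own construction) fits a straight line (m = 2 parameters) to N = 12 Gaussian data points many times and confirms that the mean of the minimum χ² comes out near N − m = 10, as the χ²_{N−m} distribution predicts.

```python
import numpy as np

# Simulate the distribution of the minimum chi^2 for a linear model.
rng = np.random.default_rng(5)
N, sigma, ntrials = 12, 1.0, 5000
x = np.linspace(0, 1, N)
A = np.column_stack([np.ones(N), x])     # model: b + m*x (2 parameters)

chi2_min = np.empty(ntrials)
for k in range(ntrials):
    y = 2.0 + 3.0 * x + rng.normal(0, sigma, N)
    lam, *_ = np.linalg.lstsq(A, y, rcond=None)   # best-fit line
    chi2_min[k] = np.sum(((y - A @ lam) / sigma) ** 2)

# E[chi^2_{N-m}] = N - m = 10; with 5000 trials the standard error
# of the mean is sqrt(2*10/5000) ~ 0.06, so this is a strong check.
assert abs(chi2_min.mean() - (N - 2)) < 0.3
```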
-
Goodness of fit: an example
Does the data sample, known to have Gaussian errors, fit acceptably to a constant (flat line)?

6 data points − 1 free parameter = 5 d.o.f.

χ² = 8.85 for 5 d.o.f.

The chance of getting a larger χ² is 12.5%: an acceptable fit by almost anyone's standard.

A flat line is a good fit.
-
Distinction between goodness of fit and parameter estimation
Now if we fit a sloped line to the same data, is the slope consistent with flat?

The χ² is obviously going to be somewhat better.

But the slope is 3.5σ different from zero! The chance probability of this is 0.0002.

How can we simultaneously say that the same data set is acceptably fit by a flat line and has a slope that is significantly larger than zero???
-
Distinction between goodness of fit and parameter estimation
Goodness of fit and parameter estimation are answering two different questions.

1) Goodness of fit: is the data consistent with having been drawn from a specified distribution?

2) Parameter estimation: which of the following limited set of hypotheses is most consistent with the data?

One way to think of this is that a χ² goodness of fit test compares the data set to all the possible ways that random Gaussian data might fluctuate. Parameter estimation chooses the best of a more limited set of hypotheses.

Parameter estimation is generally more powerful, at the expense of being more model-dependent.

Complaint of the statistically illiterate: "Although you say your data strongly favours solution A, doesn't solution B also have an acceptable χ²/dof close to 1?"
-
Goodness of fit: ML method
Sadly, the ML method does not yield a useful goodness of fit parameter. This is perhaps surprising, and is not commonly appreciated.

First of all, the quantity that plays the role of the χ² in the minimization, −ln(L), doesn't follow a standard distribution.

One sometimes recommended approach is to generate many simulated data sets with the same number of data points as your real data, and to fit them all with ML. Then make a histogram of the resulting minimum values of −ln(L) from all of the fits. Interpret this as a PDF for −ln(L) and see where the −ln(L) value for your data lies. If it lies in the meat of the distribution, it's a good fit. If it's way out on the tail, you can report that the fit is poor, and that the probability of getting a larger value of −ln(L) than that seen from your data is tiny.

This is a necessary condition to conclude that your model is a good fit to your data, but it is not sufficient ...
-
Goodness of fit in ML method: a counterexample
p(x|b) = 1 + b x^2

\ln L(b) = \sum_{i=1}^{N} \ln\left[1 + b\, x_i^2\right]

Data set B is a terrible fit to the model for x
-
How to test goodness of fit when using the ML method?
My personal recommendation: use ML to estimate parameters, but in the end bin your data and compare to the model using a χ² or similar test.

Use simulation to determine the true distribution of the χ² statistic whenever you can, without assuming that it necessarily follows a true χ² distribution. This is especially important when estimating parameters using ML.

We'll return to goodness of fit later when studying hypothesis testing, and look at some alternatives to the χ² test.
-
When to use Least Squares vs. Maximum Likelihood
My general advice: use maximum likelihood whenever you can. To use it, you must know how to calculate the PDFs of the measurements. But always remember that ML estimators are often biased (although the bias is usually negligible if N is large).

Consider using least squares if:

- your problem is linear in all parameters, or
- the errors are known to be Gaussian, or
- you don't know the form of the measurement PDFs but only know the covariances, or
- for computational reasons, you need to use a simplified likelihood that may have a closed form solution.

In general, the ML method has more general applicability, and makes use of more of the available information.

And avoid fitting histograms with LS whenever possible.
-
Errors on both x and y
Surely we've all encountered the case where we measure a set of (x,y) points, and have errors on both x and y. We then want to fit the data to some model. How do we do this?

Let's start with a Bayesian approach:

P(\theta|D, I) \propto P(D|\theta, I)\, P(\theta|I)

Let's assume that x and y have Gaussian errors around their true values. We can think of the errors Δx and Δy as nuisance parameters we'll integrate over:

P(D|\theta, I) \propto \prod_i \exp\left[-\frac{(\Delta y_i)^2}{2\sigma_{y,i}^2}\right] \exp\left[-\frac{(\Delta x_i)^2}{2\sigma_{x,i}^2}\right]

P(D|\theta, I) \propto \int d\Delta x_1 \ldots d\Delta x_n \prod_i \exp\left[-\frac{\left(y_{\mathrm{obs},i} - y(x_{\mathrm{obs},i} - \Delta x_i)\right)^2}{2\sigma_{y,i}^2}\right] \exp\left[-\frac{(\Delta x_i)^2}{2\sigma_{x,i}^2}\right]

If we integrate out the Δx_i, then we get a likelihood function as a function of θ that doesn't depend on the Δx_i.
-
Errors on both x and y
In principle you just do this integral. It may be the case that y(x) is linear over the range of ~2σ_x. In that case we can expand in a Taylor series as:

y(x_{\mathrm{obs}} + \Delta x) \approx y(x_{\mathrm{true}}) + \left.\frac{dy}{dx}\right|_{x_{\mathrm{true}}} \Delta x

If I did the integral correctly (and you should definitely check for yourself rather than trust me!), integration over Δx just winds up altering the denominator under the Δy term. We get

P(D|\theta, I) \propto \prod_i \exp\left[-\frac{\left(y_{\mathrm{obs},i} - y(x_{\mathrm{true},i})\right)^2}{2\left(\sigma_{y,i}^2 + \sigma_{x}^2 \left(\frac{dy}{dx}\right)^2\right)}\right]
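This "effective variance" result is straightforward to apply in practice. The sketch below is my own toy for a straight line y = mx + b, where dy/dx = m and the denominator becomes σ_y² + m²σ_x². Because the weights then depend on m itself, the weighted fit is simply iterated until the slope stops changing (all data and noise levels here are invented for the illustration).

```python
import numpy as np

# Effective-variance fit of a straight line with errors on x and y.
rng = np.random.default_rng(6)
m_true, b_true = 1.5, 0.7
xt = np.linspace(0, 10, 40)
sx = 0.1 + 0.05 * xt                 # x error grows along the axis
sy = np.full_like(xt, 0.4)
x = xt + rng.normal(0, sx)           # x measured with error
y = m_true * xt + b_true + rng.normal(0, sy)

m = 0.0                              # first pass: ignore the x errors
for _ in range(20):
    # np.polyfit weights are 1/sigma, so use 1/sigma_eff:
    w = 1.0 / np.sqrt(sy**2 + m**2 * sx**2)
    m, b = np.polyfit(x, y, 1, w=w)

assert abs(m - m_true) < 0.2
```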