Copie de LeastSquares.xls


Weighted Least Squares Fits

Intro

Each tab of this spreadsheet describes a function.
Look along the bottom row. The tabs there -- LSFit, LSFit Y Uncertainty, etc. -- each introduce a useful set of routines.
Just click on a tab to go to that section, or use these links:

How to View the Visual Basic for Applications Source Code for These Functions
How to Use an Array Function
How to Goal Seek
How to Use LSFit to find an Unweighted Least Squares Fit
How to Use LSFit to find a Weighted Least Squares Fit
How to Use LSFitBoth to find a Weighted Least Squares Fit, with Errors in X and Y
About this program

To view the source code for the functions in this spreadsheet

Run Tools/Macro/Visual Basic Editor.
You may not have installed the VB editor when you installed Excel. If not, you can just run Office setup to add it in.
Make sure the Project Explorer is open with View/Project Explorer.
The function is in "VBAProject (LeastSquares.xls) / Modules / LeastSquares".

Many of these functions are "Array Functions"

Some of the functions are Excel array functions, which you might not be familiar with.
An array function takes one or several values and returns an array -- several values in a particular 2-D order.
An example of a built-in array function is LINEST, the Excel function to find linear fits.
Also see the Excel help topic, "About array formulas and how to enter them."

To enter an array formula

Select a range of cells with the shape of the array the function returns.
For example, the LSFit function returns a 2 column by 4 row array.
You don't have to use the full size. For instance, if you only want the slope and its uncertainty, just choose a 2 column by 1 row set of cells. If you choose too many cells, the extra cells will be filled with "#N/A", but otherwise the function will still work correctly.
Type your formula into the formula entry box at the top of the screen.
For instance, if you wanted to run LSFit on data in cells E10:E15 and G10:G15, you would type in "=LSFit(E10:E15, G10:G15)" without the quotes.
Then hit "Control-Shift-Enter"; your range should be filled with the least squares fit results.

For instance, if you run LSFit, your result will be in the form

Slope                       Intercept
Uncertainty in Slope        Uncertainty in Intercept
Pearson's R2 Value          Standard Error in Y
Regression Sum of Squares   Residuals Sum of Squares

If, on the other hand, you used LSFitBoth, your result would appear as

Slope                       Intercept
Uncertainty in Slope        Uncertainty in Intercept
Minimize This               Residuals Sum of Squares

You don't have to use the array functions, though.
For LSFit, there are non-array functions which directly return each of the above values.
Take note though: each requires a separate call to the main (LSFit or LSFitBoth, etc.) function.
Therefore, getting all eight of the values listed above takes eight times as long as using the array formula. It will also take longer to type in.
However, in practice, you will not notice the speed difference. Only if you are iterating on a large array should this be a factor. Feel free to use the non-array versions of these commands.
For LSFitBoth, you must either use the array function or use the INDEX function to extract the desired value. For instance, to find the Uncertainty in the Slope for LSFitBoth, use
=INDEX(LSFitBoth(A10:A20, B10:B20, C10:C20, D10:D20, A5), 2, 1)

About this program

These routines were written by Philip Kromer for the UT-Austin Modern Physics Lab (Hook 'em Horns).
http://www.ph.utexas.edu/~phy453/
This work was partially supported by a grant from National Instruments.
http://www.natinst.com/
These routines are distributed under the GPL. Read more at
http://www.gnu.org/licenses/gpl.html
You may use this code, without charge, for as long as you want.
You may use this code for any application, commercial or not.
However, you must ensure that however the code is distributed, the source code to these modules remains visible and editable.
You must also ensure that I (Philip Kromer) receive credit for this functionality, and that any person who views the source code will also see these restrictions.
You may not charge for this code, although you may use it in a commercial product as long as there is significant additional functionality.
If you make an interesting modification to this code, please let me know!
Philip Kromer
For a pretty spreadsheet which documents and provides examples, go see
http://mrflip.com/resources/ExcelFunctions/
Copyright Philip Kromer, 1999


LSFit

Function LSFit

Does a simple least-squares fit on a set of data.
Notice that the arguments are LSFit(X, Y).
This is the opposite of the order of the Excel commands Slope, Intercept, and LinEst!
Also, we do not compute the F value, so the output does not match up exactly either.

To Run the Function

Make two arrays of numbers -- an X and a Y.
Run "{=LSFit(B41:B50, C41:C50)}"

An Example Array:

Slope           -5
Intercept       20
Nonlinearity    0%
Noise           10%

 X         Y
 1     13.61
 2     10.71
 3      5.19
 4      0.00
 5     -5.19
 6    -10.71
 7    -13.62
 8    -20.00
 9    -27.31
10    -27.89

The Results:

Using LSFit as an Array Function
                  Slope    Intercept
Value              0.00         0.00
Uncertainty        0.00         0.00
R / Sy             0.00         0.00
SSreg / SSres         0          0.0

Using LSFitSlope, LSFitIntercept, etc.
                  Slope    Intercept
Value              0.00         0.00
Uncertainty        0.00         0.00
R / Sy             0.00         0.00
SSreg / SSres         0          0.0

Using Built in Excel Functions
                  Slope    Intercept
Value            -4.921       19.545
Uncertainty       0.147        0.910
R / Sy            0.993          1.3
F / dF           1124.7          8.0
SSreg / SSres    1997.6         14.2
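As a cross-check outside Excel, the unweighted fit described above can be sketched in Python. This is a minimal illustration using the textbook least-squares formulas, not the VBA source of LSFit (which is not reproduced here); the name ls_fit and the returned list-of-rows layout are assumptions chosen to mirror the 2 column by 4 row array the text describes. The data are the example array from this tab.

```python
import math


def ls_fit(x, y):
    """Unweighted straight-line fit y = slope * x + intercept.

    Hypothetical stand-in for LSFit(X, Y); returns rows in the order
    value, uncertainty, R^2 / Sy, SSreg / SSres.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    ss_reg = slope * sxy                 # regression sum of squares
    ss_res = syy - ss_reg                # residual sum of squares
    s_y = math.sqrt(ss_res / (n - 2))    # standard error in Y
    slope_err = s_y / math.sqrt(sxx)
    intercept_err = s_y * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    r_squared = ss_reg / syy

    return [
        [slope, intercept],
        [slope_err, intercept_err],
        [r_squared, s_y],
        [ss_reg, ss_res],
    ]


# Example array from the LSFit tab (values as displayed, rounded to 0.01)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [13.61, 10.71, 5.19, 0.00, -5.19, -10.71, -13.62, -20.00, -27.31, -27.89]

result = ls_fit(X, Y)
for row in result:
    print("%12.4f %12.4f" % (row[0], row[1]))
```

Because the data above are the rounded values shown in the table, the printed numbers should agree with the "Using Built in Excel Functions" table only to about the displayed precision.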

[Chart: Y versus X]

LSFit; Y Uncertainty

Function LSFit -- With Uncertainties (Weighted Fit)

Does a simple least-squares fit on a set of data, weighting each point by its uncertainty in Y.
The arguments are LSFit(X, Y, Yerr).
Notice that this is the opposite of the order of the Excel commands Slope, Intercept, and LinEst!
Also, we do not compute the F value, so the output does not match up exactly either.

To Run the Function

Make three arrays of numbers -- an X, a Y, and a Y uncertainty.
Run, for example, "{=LSFit(B41:B50, C41:C50, D41:D50)}", where the third range holds the Y uncertainties.

An Example Array:

Slope           4
Intercept       25
Nonlinearity    30%
Noise           10%
Uncertainty     40%

 X         Y    Y Uncertainty
 1     27.41            11.44
 2     40.47            14.95
 3     49.63            21.79
 4     54.18            19.56
 5     77.86            33.66
 6     98.73            37.64
 7    101.48            40.81
 8    133.77            55.56
 9    172.83            64.03
10    171.96            75.37

The Results:

Using LSFit as an Array Function
                  Slope    Intercept
Value              0.00         0.00
Uncertainty        0.00         0.00
R / Sy             0.00         0.00
SSreg / SSres         0          0.0

Using LSFitSlope, LSFitIntercept, etc.
                  Slope    Intercept
Value              0.00         0.00
Uncertainty        0.00         0.00
R / Sy             0.00         0.00
SSreg / SSres         0          0.0

Using Built in Functions (which don't know about uncertainty!)
                  Slope    Intercept
Value             17.04        -0.86
Uncertainty        1.30         8.04
R / Sy             0.96        11.78
F / dF              173          8.0
SSreg / SSres     23943         1109

Using Mathematica (does Uncertainty right)
                  Slope    Intercept
Value             14.02        10.72
Uncertainty        1.24         4.14
R / Sy             0.94
SSreg / SSres        16          1.0

Values plotted in the chart:

  X    Weighted Fit Y    Excel's Fit Y
  0                 0               -1
  1                 0               16
  2                 0               33
  3                 0               50
  4                 0               67
  5                 0               84
  6                 0              101
  7                 0              118
  8                 0              135
  9                 0              152
 10                 0              169

This Cell should say "Null": Null
This Cell should say 1: 1
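The weighted fit on this tab can also be sketched in Python. This is a minimal illustration of the standard weighted least-squares formulas with weights 1/Yerr^2, not the VBA source of LSFit; the function name weighted_ls_fit and the dictionary keys are assumptions. Two uncertainty conventions are returned because the text does not say which one LSFit uses: the "scaled" values multiply the formal uncertainties by the square root of chi-square per degree of freedom, which appears to be the convention behind the Mathematica numbers in the table.

```python
import math


def weighted_ls_fit(x, y, y_err):
    """Straight-line fit with each point weighted by 1 / y_err**2.

    Hypothetical stand-in for LSFit(X, Y, Yerr).
    """
    n = len(x)
    w = [1.0 / (e * e) for e in y_err]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx * sx

    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta

    # Weighted sum of squared residuals (chi-square)
    chi2 = sum(wi * (yi - slope * xi - intercept) ** 2
               for wi, xi, yi in zip(w, x, y))
    scale = math.sqrt(chi2 / (n - 2))

    slope_err = math.sqrt(s / delta)        # formal uncertainty from the weights
    intercept_err = math.sqrt(sxx / delta)
    return {
        "slope": slope,
        "intercept": intercept,
        "slope_err": slope_err,
        "intercept_err": intercept_err,
        "slope_err_scaled": slope_err * scale,      # assumption: Regress-style scaling
        "intercept_err_scaled": intercept_err * scale,
        "chi2": chi2,
    }


# Example array from this tab (values as displayed)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [27.41, 40.47, 49.63, 54.18, 77.86, 98.73, 101.48, 133.77, 172.83, 171.96]
Y_ERR = [11.44, 14.95, 21.79, 19.56, 33.66, 37.64, 40.81, 55.56, 64.03, 75.37]

fit = weighted_ls_fit(X, Y, Y_ERR)
for key in sorted(fit):
    print("%-22s %10.4f" % (key, fit[key]))
```

The points with small Y uncertainty (the low-X end of this data set) dominate the sums, which is why the weighted slope differs so much from the unweighted Excel result.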

[Chart: Y, Weighted Fit Y, and Excel's Fit Y versus X, with error bars on Y]

LSFitBoth; Uncy in Both Vars

Function LSFitBoth -- Goal Seeking Method For Weighted Fits With Uncertainties in Both Coordinates

LSFitBoth can do a weighted least-squares fit on a set of data with uncertainty in X and Y.
The arguments are LSFitBoth(X, Y, Xerr, Yerr, Slope).
The weight for each point is one over its uncertainty squared. This is usually the right thing.

This function does not actually determine the slope. Instead, it returns a value -- Minimize This -- which the Tools / Goal Seek command can use to find the true slope. Just have Excel goal seek the Minimize This result to be zero, based on the slope seed which you provide.

The resulting slope -- after goal seeking -- is either completely right or completely wrong. That is, it will converge to the best or to the worst fit. If it goes to the worst fit (obvious upon graphing the result), simply start goal seek with a different initial slope. Elementary geometry suggests that the negative reciprocal -- (-1/m) -- is a good value in that case.

This technique is based on the paper "Linear least-squares fits with errors in both coordinates," Am. J. Phys. 57 #7 p. 642 (1989) by B.C. Reed.
The algorithm was first developed by York, Can. J. Phys. 44 (1966) p. 1079: "Least Squares Fitting with Errors in Both Coordinates."
If you are really going to look these papers up, also see the erratum in Am. J. Phys. 58 #2 p. 189 (1990).

To Run the Function

Make four columns of numbers containing X and Y data, and their uncertainties.
Make another cell and initialize it with a seed for the slope of the data. SLOPE or LSFit can suggest a value.
Make another cell with the formula "=LSFitBothMinimizeThis($C$65:$C$74,$D$65:$D$74,$E$65:$E$74,$F$65:$F$74,$J$93)"
Of course, you should leave off the quotes and use the ranges for the four columns followed by the slope seed.
Run the Tools / Goal Seek menu function; you would like to seek the MinimizeThis cell to be zero (0) by changing the slope seed cell.
The resulting value of the slope seed is the slope of the fit.
For the full information, select a range that is two columns wide by up to three rows long.
Type "=LSFitBoth($C$65:$C$74,$D$65:$D$74,$E$65:$E$74,$F$65:$F$74,$J$93)" into the top left cell of that range.
Hit Control-Shift-Enter.

The values returned by LSFitBoth are

Slope                       Intercept
Uncertainty in Slope        Uncertainty in Intercept
Minimize This               Residuals Sum of Squares

How to Goal Seek

There is no simple, general closed-form function for a fit to data when there are errors in both coordinates.
A rough and ready approach is to graph X vs. Y, then Y vs. X, and to use the average of their fit values and the Pythagorean sum of their uncertainties.
There are, however, pathological cases which will defeat this method.
Here we describe a technique which uses a quite powerful tool built into Excel, known as "Goal Seek."
Goal Seek lets you iterate a function until its output achieves a given value. It can find roots, maxima and minima, among other abilities.

For example, we know that Sin(x) has a maximum at π/2. We will demonstrate Goal Seek by finding a maximum of Sin(x).
Set up cells with the input to your function (in our case, x) and the function you want to calculate. (It may be arbitrarily complicated, and it may depend on many cells and functions; they must all, however, depend on only one input variable -- otherwise you need to use the Solver add-in.)
For instance, in the box for X below, type in "0.4"; the box for Sin(x) should read 0.389.

X          0.400
Sin (X)    0.389

Now, on the Excel Menu, choose Tools / Goal Seek. Tell it to set cell E28 (or whatever) to value 1 by changing cell E27.
You should get the value for π/2. Instead, I think you will get the result 1.541 ("Below"). Try it again, but start with x=2.0 ("Above"):

              x    Sin(x)    1 - Sin(x)
π/2       1.571     1.000       0.00000
Below     1.541     1.000       0.00044
Above     1.603     0.999       0.00052

What happened? Well, we are looking for a maximum: by definition, the function is flat near such a point, so all output values look the same near that value for x. This is an intrinsic problem for any such algorithm.
Notice, too, that although our input values are different, the output values are off by less than a thousandth part.
Fortunately, this is not usually a worry when we find roots (places where a function crosses zero) since, except for pathological cases, the function will do so at a reasonably sharp angle. (Those pathological cases are easy to see by graphing the function.)
Punch in an initial value for x of say 4.0, and goal seek sin(x) to be zero. You will get the familiar result 3.14159.

X          4.00000
Sin (X)    -0.757

As promised, Goal Seek found the root to a high degree of accuracy. Also note that if you start it near say 7 it will find 2π, etc.
Goal Seek will generally find the "nearest" solution to your starting value. If you give it a sane starting point it will settle down correctly.

Some Example Data:

Slope           4
Intercept       25
Nonlinearity    18%
Noise           15%
Uncertainty     10%

 Xi      Yi     DXi     DYi
0      5.90    0.10    1.00
0.9    5.40    0.10    0.75
1.8    4.40    0.04    0.50
2.6    4.60    0.04    0.35
3.3    3.50    0.07    0.22
4.4    3.70    0.11    0.22
5.2    2.80    0.13    0.12
6.1    2.80    0.22    0.12
6.5    2.40    0.75    0.10
7.4    1.50    1.00    0.04

From York, Can. J. Phys. 44 (1966) 1079: Least Squares Fitting

The Results:

Using LSFitBoth
                      Slope    Intercept
Slope / Intercept      0.00         0.00
Uncertainty            0.00         0.00
MinThis / SSRes     0.00000         0.00
alpha / beta           0.00         0.00
gamma / delta          0.00         0.00

Goal Seek this cell        0.00    to be Zero
Based on this cell     -0.48096
A good initial value is    0.00

The last two rows of values are diagnostics; they are not generally useful statistical values.
There is no LSFitBothSlope, LSFitBothIntercept, etc. I figured it would clutter up the Function Wizard.

Weighted in Y only
                      Slope    Intercept
Slope / Intercept      0.00         0.00
Uncertainty            0.00         0.00
R / Sy                 0.00         0.00
SSreg / SSres             0          0.0

Unweighted
                      Slope    Intercept
Slope / Intercept      0.00         0.00
Uncertainty            0.00         0.00
R / Sy                 0.00         0.00
SSreg / SSres             0          0.0

Values plotted in the chart:

   X    Weighted in X & Y    Weighted in Y    Unweighted
-0.9                  0.0              0.0           0.0
 0                    0.0              0.0           0.0
 0.9                  0.0              0.0           0.0
 1.8                  0.0              0.0           0.0
 2.6                  0.0              0.0           0.0
 3.3                  0.0              0.0           0.0
 4.4                  0.0              0.0           0.0
 5.2                  0.0              0.0           0.0
 6.1                  0.0              0.0           0.0
 6.5                  0.0              0.0           0.0
 7.4                  0.0              0.0           0.0
 8.3                  0.0              0.0           0.0
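The goal-seeking procedure above can be sketched in Python. This is a hedged illustration, not the VBA source of LSFitBoth or LSFitBothMinimizeThis: the function minimize_this below is an assumption built from the effective weight 1 / (DY^2 + slope^2 * DX^2) used in Reed/York-style fits, and it returns a quantity proportional to the derivative of the weighted residual sum with respect to the slope, so it is zero at both the best and the worst fit, as the text describes. The function goal_seek is a plain secant iteration standing in for Excel's Goal Seek, whose internal method is not documented here.

```python
import math


def minimize_this(slope, x, y, x_err, y_err):
    """Return (g, intercept, chi2) for a trial slope.

    g is zero where the weighted residual sum chi2 is stationary.
    Hypothetical analogue of LSFitBothMinimizeThis.
    """
    # Effective weight of each point for this trial slope
    w = [1.0 / (ey * ey + slope * slope * ex * ex)
         for ex, ey in zip(x_err, y_err)]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    g = 0.0
    chi2 = 0.0
    for wi, xi, yi, ex, ey in zip(w, x, y, x_err, y_err):
        u, v = xi - x_bar, yi - y_bar
        resid = v - slope * u
        beta = wi * (u * ey * ey + slope * v * ex * ex)
        g += wi * beta * resid
        chi2 += wi * resid * resid
    return g, y_bar - slope * x_bar, chi2


def goal_seek(f, seed, tol=1e-12, max_iter=100):
    """Secant iteration: change the input, starting at seed, until f is zero."""
    x0, x1 = seed, seed + 1e-3
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1


# Root of sin(x) starting from 4.0, as in the Goal Seek walkthrough
root = goal_seek(math.sin, 4.0)
print("root of sin near 4.0:", root)

# Example data from this tab (values as displayed)
XI = [0, 0.9, 1.8, 2.6, 3.3, 4.4, 5.2, 6.1, 6.5, 7.4]
YI = [5.90, 5.40, 4.40, 4.60, 3.50, 3.70, 2.80, 2.80, 2.40, 1.50]
DXI = [0.10, 0.10, 0.04, 0.04, 0.07, 0.11, 0.13, 0.22, 0.75, 1.00]
DYI = [1.00, 0.75, 0.50, 0.35, 0.22, 0.22, 0.12, 0.12, 0.10, 0.04]

slope = goal_seek(lambda m: minimize_this(m, XI, YI, DXI, DYI)[0], -0.5)
g, intercept, chi2 = minimize_this(slope, XI, YI, DXI, DYI)
print("slope:", slope, "intercept:", intercept, "chi2:", chi2)
```

As with Goal Seek, the seed matters: a seed far from the best-fit slope can land on the other stationary point, so it is worth comparing chi2 for the result against chi2 for a nearby slope or graphing the fit.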

Note: the box on the left was computed using the Mathematica-Excel Link; it is for comparison only. None of the other commands require this package.
If you have the Mathematica/Excel link, evaluate the pink cell below and then the cells at left.

Note that the unweighted fit pays too much attention to the (poorer quality) points on the left side of the graph. The weighted fit hews more closely to the points with small uncertainties, and is appropriately disdainful of the points with large error bars. The difference is especially obvious in a curve with high scatter and large error bars.

=Math( "LSFit", C22:$C$31, $D$22:$D$31, $E$22:$E$31 )*$I$65
=MathCode( 'M-Code For LSFit in Mathematica'!$B$4:$D$42 )

[Chart: Yi with the Unweighted, Weighted in Y, and Weighted in X & Y fits versus X, with error bars in X and Y]

LSFitBoth; Example2

Function LSFitBoth -- Another Example

Another Example Array:

Slope           4
Intercept       25
Nonlinearity    18%
Noise           15%
Uncertainty     10%

  Xi      Yi     DXi     DYi
0.89    0.67    0.01    0.01
1       0.64    0.01    0.01
0.92    0.76    0.01    0.01
0.87    0.61    0.01    0.01
0.9     0.74    0.01    0.01
0.86    0.61    0.01    0.01
1.08    0.77    0.01    0.01
0.86    0.61    0.01    0.01
1.25    0.99    0.01    0.01
1.01    0.77    0.01    0.01
0.86    0.73    0.01    0.01
0.85    0.64    0.01    0.01
0.88    0.62    0.01    0.01
0.84    0.63    0.01    0.01
0.79    0.57    0.01    0.01
0.88    0.66    0.01    0.01
0.7     0.53    0.01    0.01
0.81    0.46    0.01    0.01
0.88    0.79    0.01    0.01
0.92    0.77    0.01    0.01
0.92    0.70    0.01    0.01
1.01    0.88    0.01    0.01
0.88    0.62    0.01    0.01
0.92    0.80    0.01    0.01
0.96    0.74    0.01    0.01
0.85    0.64    0.01    0.01
1.04    0.93    0.01    0.01

From Reed, Am. J. Phys. 57 #7 (1989) 645: Linear Least Squares

The Results:

Using LSFitBoth
                      Slope    Intercept
Slope / Intercept    0.0000       0.0000
Uncertainty          0.0000       0.0000
MinThis / SSRes     0.00000         0.00
alpha / beta           0.00         0.00
gamma / delta          0.00         0.00

Goal Seek this cell        0.00    to be Zero
Based on this cell       1.1668
A good initial value is    0.00

There is no LSFitBothSlope, LSFitBothIntercept, etc. I figured it would clutter up the Function Wizard.

Weighted in Y only
                      Slope    Intercept
Slope / Intercept      0.00         0.00
Uncertainty            0.00         0.00
R / Sy                 0.00         0.00
SSreg / SSres             0          0.0

Unweighted
                      Slope    Intercept
Slope / Intercept      0.00         0.00
Uncertainty            0.00         0.00
R / Sy                 0.00         0.00
SSreg / SSres             0          0.0

Values plotted in the chart:

   X    Weighted in X & Y    Weighted in Y    Unweighted
0.78                  0.0              0.0           0.0
0.89                  0.0              0.0           0.0
1                     0.0              0.0           0.0
0.92                  0.0              0.0           0.0
0.87                  0.0              0.0           0.0
0.9                   0.0              0.0           0.0
0.86                  0.0              0.0           0.0
1.08                  0.0              0.0           0.0
0.86                  0.0              0.0           0.0
1.25                  0.0              0.0           0.0
1.01                  0.0              0.0           0.0
0.86                  0.0              0.0           0.0
0.85                  0.0              0.0           0.0
0.88                  0.0              0.0           0.0
0.84                  0.0              0.0           0.0
0.79                  0.0              0.0           0.0
0.88                  0.0              0.0           0.0
0.7                   0.0              0.0           0.0
0.81                  0.0              0.0           0.0
0.88                  0.0              0.0           0.0
0.92                  0.0              0.0           0.0
0.92                  0.0              0.0           0.0
1.01                  0.0              0.0           0.0
0.88                  0.0              0.0           0.0
0.92                  0.0              0.0           0.0
0.96                  0.0              0.0           0.0
0.85                  0.0              0.0           0.0
1.04                  0.0              0.0           0.0

Note that the unweighted fit pays too much attention to the (poorer quality) points on the left and right side of the graph.

The weighted fit hews more closely to the points with small uncertainties, and is appropriately disdainful of the points with large error bars, but is unfortunately misled by the points at the lower right -- which have large uncertainties in X and are not as trustworthy as the naive algorithm thinks.

The X & Y weighted fit laughs at the impostors on both the left and the right of the graph, and finds a visibly truer fit.

The technique for errors in both coordinates is necessary when there are points with large X but small Y uncertainty, and vice versa.
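A small numerical illustration of why this matters, written as a Python sketch. It assumes the effective weight 1 / (DY^2 + slope^2 * DX^2) commonly used for fits with errors in both coordinates (the source does not print the exact formula LSFitBoth uses), and the function names are hypothetical. The two points are the last and first rows of the York example data, and the slope of -0.48 is close to the value shown in the slope cell of that example.

```python
def y_only_weight(y_err):
    """Weight used by a fit that only knows about the Y uncertainty."""
    return 1.0 / y_err ** 2


def effective_weight(x_err, y_err, slope):
    """Weight when the X uncertainty is projected onto Y through the slope."""
    return 1.0 / (y_err ** 2 + (slope * x_err) ** 2)


SLOPE = -0.48
POINTS = [
    ("large X error, small Y error", 1.00, 0.04),
    ("small X error, large Y error", 0.10, 1.00),
]

for label, dx, dy in POINTS:
    print("%-30s  Y-only weight %8.2f   X & Y weight %8.2f"
          % (label, y_only_weight(dy), effective_weight(dx, dy, SLOPE)))
```

The first point looks extremely trustworthy to a Y-only weighting, yet its large X uncertainty cuts its weight by more than two orders of magnitude once both coordinates are considered; the second point is barely affected.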

[Chart: Yi with the Unweighted, Weighted in Y, and Weighted in X & Y fits versus X, with error bars in X and Y]

M-Code For LSFit in Mathematica

(***********************************************************
This is a Mathematica routine to do Weighted least squares fits.
It is to be used with the Mathematica/Excel Link for Windows.
See http://www.mathematica.com/
If you don't have all that stuff -- don't sweat it.
You clearly have Excel and that's all you need for the
Weighted least squares fits.
************************************************************)

(***********************************************************
If this looks horribly ugly... that's because it is.
All it does is take the brain damaged output format of
the Regress command (in Statistics`LinearRegression`) and
extract the useful (for us) values.
************************************************************)

... 1/Yerr^2];

(* Get Fit and Uncertainty *)
paramTable = ParameterTable /. RegressResult;
{{fitb, fitm}, {fitdb, fitdm}, TStat, PValue} =
    ((FullForm[paramTable])[[1,1]])//Transpose;

(* Get R^2 value *)
r2val = RSquared /. RegressResult;

(* Get Model Sum of Squares and Residual Sum of Squares *)
anovaTable = (FullForm[ANOVATable /. RegressResult]) [[1,1]];
{SSModel, SSRes, SSTotal} =
    {anovaTable[[1,2]], anovaTable[[2,2]], anovaTable[[3,2]]};

{{fitm, fitb},{fitdm, fitdb},{r2val, syNotYet},{SSModel, SSRes}}
)]

Another instance for which this technique is appropriate is when the data show a large amount of scatter. In that case the wrong weighting scheme can throw things off. The iterative LSFitBoth function does better than the simple weighted fit in Y.