
    Function Minimization

    By Andrew McLennan

    Supervisor: Lorenzo Moneta


    Index

• Introduction to Minimization
• Project Description
• Minimization for Histograms
• Examining Areas of Convergence
• Introduction of Extra Noise
• Pure Minimization
• Next Step
• Conclusion
• Appendix 1
• Appendix 2a
• Appendix 2b
• Appendix 3
• Appendix 4
• Appendix 5a
• Appendix 5b
• Appendix 6

I would like to take this opportunity to thank everyone who has helped me to get this far at CERN. Special thanks go to my supervisor Lorenzo Moneta for his teaching and direction during this project. To Andras Zsenei for his consistent help in the office. To everyone in my department for their support during this summer. To my university tutors Geraint Jones and Andy Wathen for their teaching and references. To all my friends here at CERN who have made my time here so enjoyable and taught me so many new things about their cultures. Especially Jan, Nino, Dana, David, Carlos, John, Stian, Diana, Florian, Laura, Lucia, Cristina, Aoife and Laura, whom I will hopefully visit soon. And finally, I would like to thank my friends and family back in England for always being there for me.


Introduction to Minimization

A large class of problems in many different fields of research can be reduced to the problem of finding the smallest value taken on by a function of one or more variable parameters. For example, the minimum of f(x) = (x - 3)² is zero and is obtained at x = 3.

    The classic example of minimization which occurs so often in scientific research here at CERN, however, is the

estimation of unknown parameters in a theory, by minimizing the difference between theory and experimental data.

    Before we can tackle the minimization problem, we have to state what assumptions we are allowed to make. It is

    assumed that the function F(x) is not actually known analytically but is only defined by the value it takes at a given

     position. It is also assumed that we are allowed to specify a range which the parameters are allowed to take. Any

    additional information we wish to provide such as the numerical values of the derivatives dF/dx at any point should be

    given when available, but in general should not be assumed.

The function is repeatedly evaluated at different points until its minimum is found, and the method which finds

    the minimum (within a given tolerance) after the fewest evaluations is generally considered to be the best.*
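For illustration, a minimal sketch of this evaluate-only approach is shown below: a golden-section search that minimizes the example function f(x) = (x - 3)² using nothing but function values. The range and tolerance are illustrative choices, not values used in my project.

    #include <cmath>
    #include <cstdio>

    // Example function: its minimum value is 0, attained at x = 3.
    double f(double x) { return (x - 3.0) * (x - 3.0); }

    // Golden-section search: repeatedly evaluates func on [a, b] until the
    // bracket is smaller than tol. Uses only function values, no derivatives.
    double goldenSection(double (*func)(double), double a, double b, double tol)
    {
        const double r = 0.5 * (std::sqrt(5.0) - 1.0);   // ~0.618
        double x1 = b - r * (b - a);
        double x2 = a + r * (b - a);
        double f1 = func(x1), f2 = func(x2);
        while (b - a > tol) {
            if (f1 < f2) { b = x2; x2 = x1; f2 = f1; x1 = b - r * (b - a); f1 = func(x1); }
            else         { a = x1; x1 = x2; f1 = f2; x2 = a + r * (b - a); f2 = func(x2); }
        }
        return 0.5 * (a + b);
    }

    int main()
    {
        double xmin = goldenSection(f, -10.0, 10.0, 1e-6);
        std::printf("minimum near x = %f, f = %f\n", xmin, f(xmin));
        return 0;
    }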

    There are several difficult situations which the minimization methods have to overcome. These include

•  Finding the global minimum as opposed to a local minimum

    •  Finding the minimum point located somewhere within a large plateau

•  Finding the minimum point without having to check every point in the allowable range

We will test our methods to check whether the above situations are handled correctly.

Project Description

The goal of my project here at CERN is to examine how much information we need to supply in order for the minimization methods to converge to the correct result, as well as to compare the differences between the minimization methods Minuit, GMinuit and Fumili. Minuit comes from the class TMinuit, and GMinuit is the name of the package for the new C++ version of Minuit. Both Minuit and GMinuit are general minimization methods

     based on Fletcher’s unified approach to Variable Metric Methods (VMM) which combines rank-one and rank-two

    formulas to deal with different types of minimization problems.** Fumili comes from the class TFumili and is a

    specialized minimization technique based on Chi Squared minimization and is designed to work very quickly for certain

    functions provided we supply good starting information.
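For illustration, the sketch below shows the typical way such a method is driven from ROOT through the TMinuit class, again using the simple example function from the introduction. The macro name, parameter name, step size and MIGRAD arguments are illustrative choices rather than settings taken from my programs.

    #include "TMinuit.h"
    #include <cstdio>

    // FCN in the standard Minuit form: Minuit passes the current parameter
    // values in par[] and expects the function value back in fval.
    void fcn(Int_t& npar, Double_t* grad, Double_t& fval, Double_t* par, Int_t flag)
    {
        fval = (par[0] - 3.0) * (par[0] - 3.0);   // f(x) = (x - 3)^2
    }

    void minimizeExample()
    {
        TMinuit minuit(1);                 // one free parameter
        minuit.SetFCN(fcn);

        Int_t ierflg = 0;
        // parameter number, name, start value, step, lower/upper limit (0,0 = none)
        minuit.mnparm(0, "x", 0.0, 0.1, 0.0, 0.0, ierflg);

        Double_t arglist[2] = {500.0, 0.01};   // max calls, tolerance
        minuit.mnexcm("MIGRAD", arglist, 2, ierflg);

        Double_t x, err;
        minuit.GetParameter(0, x, err);
        printf("MIGRAD found x = %f +/- %f\n", x, err);
    }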

    My aim has been to produce software tools to systematically analyze the length of time each method takes to run as well

as graphical displays of the regions in which each method converges. It is important for backwards compatibility that both Minuit and GMinuit work when provided with the same initial conditions, but ideally GMinuit should work faster in general and over a larger range of input parameters. It is also important to guard against islands being formed within the range of input parameters.***

    Minimization within ROOT**** can be used in several ways:

    •  In order to produce a “line of best fit” on data within a histogram

    •  With pure minimization of a function of our choosing.

    Given a histogram, the production of a line of best fit is very useful. Minimization methods are designed to take this data

    and use it to efficiently produce a function which best matches the data points. This is a difficult mathematical problem

which is why many different methods exist. Pure function minimization is what is behind the production of a line of best fit and hence should be tested also. I therefore restricted my investigation to looking at these two cases using non-

    trivial functions to get the most out of the methods.
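As an example of the first case, the sketch below fits a Gaussian to a histogram and shows where the minimization engine is selected through TVirtualFitter::SetDefaultFitter(). The generating function, start values and fit range are illustrative, and the name under which the new C++ Minuit is registered may differ between installations; this is not code taken from my project.

    #include "TH1F.h"
    #include "TF1.h"
    #include "TVirtualFitter.h"

    void fitExample()
    {
        // Select the minimization engine used by TH1::Fit. "Minuit" is the
        // original TMinuit; "Fumili" selects TFumili. The name registered
        // for the new C++ Minuit may differ between installations.
        TVirtualFitter::SetDefaultFitter("Minuit");

        TH1F h("h", "example", 100, 0.0, 20.0);
        TF1  gen("gen", "gaus(0) + gaus(3)", 0.0, 20.0);
        gen.SetParameters(100.0, 8.0, 1.0, 80.0, 11.0, 1.0);
        h.FillRandom("gen", 10000);        // populate from the generating function

        // Fit a single Gaussian around the first peak with assumed start values.
        TF1 fit("fit", "gaus", 5.0, 10.0);
        fit.SetParameters(100.0, 8.0, 1.0);
        h.Fit(&fit, "R");                  // "R" = restrict to the function range
    }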

    Initially I didn’t know what results I would produce and so a lot of experimenting was required. This allowed me to

     become very familiar with the methods I was examining and helped me to evolve and tailor my programs at each stage so

    that I could produce interesting and useful results to aid in the fixing of bugs within GMinuit.

* Occasionally other considerations may be important, such as the amount of storage required by the method or the amount of computation required to implement the method, but normally the dominating factor will be the time spent in evaluating the function.
** A general minimization method is one which can handle all functions. More details about these and many more minimization methods can be found on the Minuit website under Documentation: http://www.cern.ch/minuit
*** An island is a region of input parameters of non-convergence for the method, located inside another region of convergence.
**** ROOT was produced at CERN for use in analyzing the data from high energy physics. See http://root.cern.ch for more details.



Minimization for histograms

The first example of minimization I considered was the fitting of a histogram in order to produce a line of best fit. I

    constructed a function consisting of many Gaussian peaks together with some linear background noise and used this to

     populate a one-dimensional histogram.* The function ranged over 0-1000 with the peaks located at random places. This

    type of histogram is regularly seen here at CERN and so is a good place to start my testing.

    As the histogram was filled using randomly chosen points, I decided that the best way to test the pure efficiency of the

fitting algorithms would be to pre-locate each of the peaks using a TSpectrum object. This information was then passed to each method as their initial search parameters and the length of time each one took to converge was recorded.
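The peak pre-location step can be sketched as follows; the search parameters and the way the found positions are copied into the fit function are illustrative assumptions rather than the exact code I used.

    #include "TH1F.h"
    #include "TF1.h"
    #include "TSpectrum.h"

    void locatePeaks(TH1F* h, TF1* fitFunc, Int_t maxPeaks = 10)
    {
        // Search for candidate peaks; the sigma (2 bins) and threshold (10% of
        // the highest peak) are illustrative choices.
        TSpectrum spectrum(maxPeaks);
        Int_t nFound = spectrum.Search(h, 2.0, "", 0.10);
        // GetPositionX() returns Double_t* in recent ROOT; old versions used Float_t*.
        Double_t* xPeaks = spectrum.GetPositionX();

        // Use the found positions as the initial means of the Gaussian peaks,
        // assuming fitFunc is a sum of Gaussians with parameters ordered as
        // (amplitude, mean, sigma) for each peak.
        for (Int_t i = 0; i < nFound; ++i) {
            fitFunc->SetParameter(3 * i + 0, h->GetBinContent(h->FindBin(xPeaks[i])));
            fitFunc->SetParameter(3 * i + 1, xPeaks[i]);
            fitFunc->SetParameter(3 * i + 2, 5.0);
        }
    }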

The output of this program can be seen in Appendix 1, and Table 1 below contains the CPU running times of each

    method.

    As can be seen from the output, all three methods produced a good fit to the data but

    more importantly each one produced the same fit to the data implying that this fit is

    actually the best fit available.

    Examining the lengths of time each method took to run reveals something interesting.

    As suspected, Fumili works by far the fastest. This is because we supplied good

    initial information and Fumili is a specialized method for problems such as this,

while Minuit and GMinuit are general methods. GMinuit ran faster than TMinuit when used in combination with an optimized fitter interface in the ROOT framework,

    exploiting object oriented features of the new C++ version of Minuit. When the same

    interface as in TMinuit is used, comparable running times are then obtained.

    Further tests on the range of convergence are now needed to fully examine the differences between the running times and

    range of convergence of the two methods before Minuit can be retired in favor of the new GMinuit.

Examining areas of convergence

From the above program and the general requirements of each minimization method, we can expect that when good

    information is provided, all methods will converge to the correct and same results. If,

however, information is supplied which is not good, then it is not known whether the methods will fit correctly or not. To test the methods when different initial information

    is supplied I produced a program to systematically cycle through the input parameters

    in order to see at which points the method works and which points it fails.

    To begin with I decided to start with a simpler example than the one above and to use

    only two peaks located at 8.0 and 11.0 on the range of 0-20. The program was created

    in such a way that changing the method of minimization involved only changing the

name within the TVirtualFitter's SetDefaultFitter() method. For each iteration, the program set the initial suggested locations of the means for the two Gaussian peaks before performing the fit. It cycled through each of the suggested means from (-5,-5) to (25,25), going

    up in integer intervals. The real means were located at 8.0 and 11.0 and so I expected that at least around these regions,

    all three methods should work.
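The scan can be sketched roughly as below. This is a reconstruction of the idea rather than my original program: the fit function, the use of TStopwatch for timing and the tolerance used to decide whether a fit "worked" are all illustrative assumptions.

    #include "TH1F.h"
    #include "TH2F.h"
    #include "TF1.h"
    #include "TStopwatch.h"
    #include "TVirtualFitter.h"
    #include <cmath>

    void scanStartValues(TH1F* h)
    {
        TVirtualFitter::SetDefaultFitter("Minuit");   // change to compare fitters

        TH2F good("good", "converged", 31, -5.5, 25.5, 31, -5.5, 25.5);
        TF1  f("f", "gaus(0) + gaus(3) + pol1(6)", 0.0, 20.0);

        for (Int_t ix = -5; ix <= 25; ++ix) {
            for (Int_t iy = -5; iy <= 25; ++iy) {
                // Suggested initial means for the two Gaussian peaks.
                f.SetParameters(100.0, ix, 1.0, 100.0, iy, 1.0, 0.0, 0.0);

                TStopwatch timer;
                timer.Start();
                h->Fit(&f, "Q0");          // quiet fit, do not draw
                timer.Stop();

                // The fit is counted as correct if the fitted means land close
                // to the true peak positions 8.0 and 11.0 (tolerance assumed).
                double m1 = f.GetParameter(1), m2 = f.GetParameter(4);
                bool ok = (std::fabs(m1 - 8.0) < 0.5 && std::fabs(m2 - 11.0) < 0.5) ||
                          (std::fabs(m1 - 11.0) < 0.5 && std::fabs(m2 - 8.0) < 0.5);
                double t = timer.CpuTime();
                // Fill with the reciprocal time so fast convergence stands out.
                if (ok && t > 0) good.Fill(ix, iy, 1.0 / t);
            }
        }
        good.Draw("colz");
    }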

    Together with identifying where each method converged correctly, I also produced two histograms of the length of time

    each method took given a specific input parameter for each method. One was for when the fit was good and the other was

    for when the method didn’t fit the histogram correctly. I chose to display the reciprocals of the actual times in order to be

    able to see the places of very fast convergence more easily.

The results of this program for Minuit and GMinuit provided me with very useful information. As can be seen clearly

    in the two pictures below, Minuit converged to the correct results in far more places than GMinuit did. This suggested

    there was a major bug which needed to be fixed otherwise GMinuit wouldn’t be fully backwards compatible with the

    older version. Another worrying result from this program was the existence of islands within areas of convergence and

bands of convergence and non-convergence. This wasn't expected, especially for such a simple example with parameters

    so close to the actual minimum.

Looking at the lengths of time each method took to converge, we find that for GMinuit we almost always knew within 2 seconds whether the method had converged correctly or not. Minuit does in some cases work faster than GMinuit, which

    is a problem, but generally there exists a greater number of places where Minuit converges much more slowly.

Table 1: CPU Running Times

Method                           CPU time
TMinuit                          0.64 s
Fumili                           0.15 s
GMinuit (new interface)          0.47 s
GMinuit (interface as TMinuit)   0.66 s

    * To populate the histogram I allowed the computer to randomly choose points from this function and used these values to fill the histogram.


    By looking more closely at the points where Minuit performs better than GMinuit, it was possible to track down what

    was causing some of these problems in the code and hence fix it. Running this program again using the fixed version of

GMinuit now produced results which were far closer to those of Minuit, as can be seen below and in Appendix 2b.

    Islands and bands still exist, but to solve this will require a much more in-depth look at the reasons and paths the

    algorithm takes during its execution. The full output results of my program for Minuit and GMinuit can be found in

Appendix 2a. More research into this problem was therefore definitely needed.

Fumili, on the other hand, converged correctly in a far smaller region than either Minuit or GMinuit, but for this function at

    least no islands or bands existed. The more interesting result however is that when Fumili does fit correctly the running

    time of the method is far less than either of the other two algorithms, but when bad initial information is supplied the

    running time can be exceptionally large. Hence as suspected, provided we supply good information to Fumili it will work

    very well for functions such as this, but if we don’t have adequate information then a general minimization method will

    usually work faster. The full output results for Fumili can be found in Appendix 3.

Introduction of extra noise

To allow me to further my investigation into these methods, I decided that generalizing my code would allow me much

    more freedom in the future when changing the function to be examined. I increased the range of the function to 0-1000

with the peaks now located at 350.0 and 750.0 respectively. I also realized that the programs were currently taking much too long to run using the standard ROOT interface, and hence I adapted the code so that I could produce compiled C code

    which was, in turn, more stable.

The main step I took, however, was to have the comparison of Minuit and GMinuit both located on the same results output. I decided this would make it easier to spot where the methods differed and hence help my investigation. Locations which are coloured Black indicate parameters where neither Minuit nor GMinuit converged correctly. Blue represents areas where both converged correctly. Green marks areas where GMinuit outperformed Minuit, and Red areas where GMinuit failed but the original Minuit worked.

    Another feature I added to my program

    was the ability to add some extra

    random noise to the histogram. By being able to select and compare the differences between different percentage levels

    of noise I was able to compare the methods under more realistic experimental conditions. My final adaptation was to

    have two histograms displaying the frequency of convergence for the different distances away from the actual means.

This would show that as the initial parameters are moved away from the actual means, it becomes less likely that either method will converge.
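One plausible way to add such noise (a sketch of the idea, not my original code) is to fill a given percentage of extra, uniformly distributed entries on top of the generated histogram:

    #include "TH1F.h"
    #include "TRandom3.h"

    // Add extra entries, uniformly distributed over the histogram range,
    // amounting to 'fraction' (e.g. 0.05 for 5%) of the current entries.
    // This is only one possible model of "extra noise".
    void addUniformNoise(TH1F* h, double fraction)
    {
        TRandom3 rng(0);                                  // seed chosen by ROOT
        Long64_t nExtra = static_cast<Long64_t>(fraction * h->GetEntries());
        double xlo = h->GetXaxis()->GetXmin();
        double xhi = h->GetXaxis()->GetXmax();
        for (Long64_t i = 0; i < nExtra; ++i)
            h->Fill(rng.Uniform(xlo, xhi));
    }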

[Figure: results for Minuit, Old GMinuit and Fixed GMinuit]



    The above picture represents a section of the output from the updated program. As can be seen, when no extra noise is

    introduced and even when using the version of GMinuit we had previously fixed, there are still several regions and bands

    where the method fails.

    When we next run the program with different levels of extra noise, we get some very interesting results. Not only are the

    regions of convergence for the methods reduced as the amount of noise is increased, but points where the method

previously didn't converge correctly now in some cases do converge. Bands of non-convergence still exist for this function even as the level of noise is increased, but the position of the bands actually changes. For 5% extra noise we see that GMinuit performs with greater uniformity than Minuit, as fewer islands are created. Finally, we notice for this example at least, that as the amount of extra random noise is increased, GMinuit starts to outperform Minuit. Table 2 shows the actual number of points where the two methods converged.

Hence, not only does the initial information we supply have to be more accurate as the noise level increases, but it seems GMinuit was designed to work well as

    the noise level increases. It would be interesting to have some further investigations

    with different levels and types of noise to see whether GMinuit actually does work

     better as noise increases. The full output results for the 3 different noise levels can be

    found in Appendix 4.

Pure Minimization

The final part of my project was to move away from using

    minimization for fitting histograms to testing the minimization

    methods for Minuit and GMinuit directly. I did this by again

    adapting my program to deal with more complicated two

    dimensional functions such as Rosenbrock’s Curved Valley

     problem* with the aim of being able to change the function

    under investigation easily. I designed the program so that it

    was easy to change the viewing range of the function, the

    amount of detail to print out, the position about which to

    concentrate the investigation and also whether I wanted just

    the basic information about this position or the extra graphical

outputs. All these parameters can be set at runtime, allowing the user to view general information around the actual

    minimum and then to target specific regions or points without

    having to go and change any of the underlying code.
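For the pure minimization the function is handed to the fitter directly rather than through a histogram. The sketch below illustrates the mechanism for Rosenbrock's function from a chosen start point; the step sizes and MIGRAD arguments are illustrative assumptions, not settings copied from my program.

    #include "TVirtualFitter.h"
    #include <cstdio>

    // Rosenbrock's curved valley in standard Minuit FCN form.
    void rosenbrockFCN(Int_t& npar, Double_t* grad, Double_t& fval, Double_t* par, Int_t flag)
    {
        double x = par[0], y = par[1];
        fval = 100.0 * (y - x * x) * (y - x * x) + (1.0 - x) * (1.0 - x);
    }

    void minimizeRosenbrock(double x0 = -1.2, double y0 = 1.0)
    {
        TVirtualFitter::SetDefaultFitter("Minuit");            // or the new C++ Minuit
        TVirtualFitter* fitter = TVirtualFitter::Fitter(nullptr, 2);   // 2 parameters
        fitter->SetFCN(rosenbrockFCN);
        fitter->SetParameter(0, "x", x0, 0.1, 0.0, 0.0);        // (0,0) = no limits
        fitter->SetParameter(1, "y", y0, 0.1, 0.0, 0.0);

        Double_t arglist[2] = {500.0, 0.01};                    // max calls, tolerance
        fitter->ExecuteCommand("MIGRAD", arglist, 2);

        std::printf("minimum at (%f, %f)\n",
                    fitter->GetParameter(0), fitter->GetParameter(1));
    }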

    Running the program without setting any of the input parameters produced the output results in Appendix 5a. As can be

    seen, there are several places where GMinuit should work but doesn’t. Some of these places are actually quite close to

    the actual minimum and hence even though we have fixed one problem in the code, others must exist. Again the bottom

    two histograms represent the number of correctly and incorrectly minimized positions which are of different distances

    away from the actual minimum. The second output screen shows the proportional number of correctly and incorrectly

    minimized points with respect to the total number of points available at that level. It can be clearly seen that as the

    information you supply diverges from the actual minimum, the number of points which will actually converge correctly

    tends to zero. This happens quite well with Minuit but GMinuit again struggles.

    Using this information and the code, it was possible to go into

    detail for the points where GMinuit failed and fix two more

problems, making a total of 3 fixes. Running the same program

    again but now using this fixed version of GMinuit results in the

    output in Appendix 5b. It can now clearly be seen that GMinuit

works well for this function and that it even outperforms Minuit

    in several places. This encouraging result shows that my

    investigation was going in the correct direction at least and

    hopefully it should be possible to find all the problems causing

    the discrepancies seen.

Table 2: Number of points where each method converged

Noise   Minuit   GMinuit
0%      847      814
2.5%    805      806
5.0%    165      176

    * More information about Rosenbrock’s Curved Valley problem can be found in Appendix 5a.


Next step

To determine whether all the differences between Minuit and GMinuit had been found, I tested one further function. Goldstein and Price's function with four minima* is another difficult problem for minimization methods

due to it having more than one local minimum. I again adapted my program, allowing me to specify more exactly how much information I would like to see output, as well as adding an extra indicator for the case when Minuit and GMinuit both failed but failed to different values. I felt that this was also an important case to consider as it shows that the two methods must have taken different routes to get to their final results. Upon

    running my program for this function I expected there to only be a few places

    where the function failed. Unfortunately, however, it was obvious there are

still major problems with GMinuit due to the large quantity of red markers dotted all over the input region. Not only did

    Minuit outperform GMinuit, both methods left many islands and bands of non-convergence within areas of convergence

and vice versa. Also, by looking at the second output results, the distance away from the actual minimum did not seem to

    affect the probability of whether a set of input parameters would converge.

To get the most out of my program, it should be run with input parameters. The following is an example of running the program:

    root Project5GoldsteinAndPriceFunction.C+(range, extra, print, X, Y)

where

• "range" is the distance either side of our chosen point to be displayed,
• "extra" is whether we would like only the basic output or the full detailed program output,
• "print" decides how much information each method should display, going from 0 to 3,
• "X" & "Y" are the (x, y) coordinates about which my program should look.

The default execution parameters would effectively be equivalent to

    root Project5GoldsteinAndPriceFunction.C+(10, 0, 0, 0, -1.0)

Conclusion

During this project, I have been able to investigate the properties of different minimization strategies. I have been able to

    show graphically where each method works and compare the new and old versions of Minuit. Even though three fixes

    have resulted from my investigation I have been able to show that there must be more still hidden within the code

    somewhere. The most important result I have found, however, is that both the new and old versions of Minuit fail in

places where they were not expected to. It would be very interesting to be able to look at the code in far more detail and follow the routes they take for these islands compared to the surrounding areas to find out what the reason is for the

differences. Then, it may be possible to produce more rigorous methods for finding minimizations and fits to histogrammed

    data.

    Thus, to take this project further I would initially look at fixing the problems which are causing GMinuit to fail in so

    many places where Minuit actually worked for the Goldstein and Price’s function. This would be done in the same way

    as before by looking at the locations which caused anomalies, seeing what results they were actually giving and trying to

    use this information together with stepping through the code until the problem is discovered.

Next I would adapt my program to higher dimensions, specifically 4 dimensions, as there exist several difficult test

    functions for which it would be interesting to see how GMinuit performed. As it is only really possible to display 2

    dimensions effectively, I would want it to be possible to fix certain dimensions and vary others. It should therefore be

possible to specify this in the program's input parameters, and at which specific values the fixed dimensions should be set. The information I would then get for these new test functions would either provide more information and points which need to be compared (so as to fix bugs within GMinuit) or increase our confidence in its algorithm. Finally, the

    last extension I would make would be to display both the function and region of convergence on the same graph, as this

    would make the investigation easier to visualize. Each comparison point would then be tied directly to an area of the

    function being minimized.

    As this investigation was very open ended with no previous information about what results to expect, I had to produce

    my program so that it could evolve and develop as more information was discovered. This can be seen with the

     progression of the programs I produced. The final program I produced is thus stable and has the ability to provide a firm

     base to continue the investigation. In the end I decided to only compare versions of Minuit leaving the Fumili

    investigation for another time. However, this wouldn’t be hard to incorporate into my program if someone so wished. I

felt that comparing Minuit versions (which are generalized methods) to Fumili (which is a specialized method) would not

    actually produce any comparable results.

* More information about Goldstein and Price's function with four minima can be found in Appendix 6.


Appendix 1

The output of my program for comparing Minuit, Fumili and GMinuit when good initial information is supplied. As can be seen, each method fits the function but the running time of each method differs greatly.

    The horizontal axis contains the bins for the range of the function.

    The vertical axis is the number of elements for each bin.


    Appendix 2a

    Appendix 2b

[Figures: convergence results for Minuit, Old GMinuit and New GMinuit]


    Appendix 3

[Figure: convergence results for Fumili]


    Appendix 4

Noise   Minuit   GMinuit
0%      847      814
2.5%    805      806
5.0%    165      176

[Figure panels: 0% noise, 2.5% noise, 5.0% noise. Legend (results of minimization): Neither, Both, Minuit only, GMinuit only.]


    Appendix 5a

    Appendix 5b

    Rosenbrock’s Curved Valley

F(x, y) = 100(y - x²)² + (1 - x)²

Minimum: F(1.0, 1.0) = 0

Possible starting point: F(-1.2, 1.0) = 24.20

This problem is probably the best known test problem for minimization methods. It consists of a narrow parabolic valley with very steep sides. The floor of the valley follows approximately the parabola y = x² + 1/200, and stepping methods tend to perform at least as well as gradient methods for this function.

Appendix 5a: before the two problems were fixed. Appendix 5b: after the two problems were fixed.


    Appendix 6

Goldstein and Price function with four minima

F(x, y) = [1 + (x + y + 1)² (19 - 14x + 3x² - 14y + 6xy + 3y²)] × [30 + (2x - 3y)² (18 - 32x + 12x² + 48y - 36xy + 27y²)]

Local minima: F(1.2, 0.8) = 840
              F(1.8, 0.2) = 84
              F(-0.6, -0.4) = 30

Minimum: F(0.0, -1.0) = 3

Possible starting point: F(-0.4, -0.6) = 35

    This is another standard test function for minimization methods. It is an eighth-order polynomial in two variables

    which is well behaved near each minimum, but has four minima. An interesting place to start looking would be at the

    above starting point as it lies in between the two lowest minima.
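For reference, the function written out as code, with the quoted minima as a quick check (only the function itself is shown, not the fitting machinery):

    #include <cstdio>

    // Goldstein and Price test function (standard form); global minimum
    // F(0, -1) = 3, local minima F(1.2, 0.8) = 840, F(1.8, 0.2) = 84,
    // F(-0.6, -0.4) = 30.
    double goldsteinPrice(double x, double y)
    {
        double a = 1.0 + (x + y + 1.0) * (x + y + 1.0) *
                   (19.0 - 14.0 * x + 3.0 * x * x - 14.0 * y + 6.0 * x * y + 3.0 * y * y);
        double b = 30.0 + (2.0 * x - 3.0 * y) * (2.0 * x - 3.0 * y) *
                   (18.0 - 32.0 * x + 12.0 * x * x + 48.0 * y - 36.0 * x * y + 27.0 * y * y);
        return a * b;
    }

    int main()
    {
        std::printf("F(0, -1)      = %g\n", goldsteinPrice(0.0, -1.0));   // expect 3
        std::printf("F(-0.4, -0.6) = %g\n", goldsteinPrice(-0.4, -0.6));  // suggested start, expect 35
        return 0;
    }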