
    Function Minimization

    By Andrew McLennan

    Supervisor: Lorenzo Moneta


    Index

• Introduction to Minimization
• Project Description
• Minimization for Histograms
• Examining Areas of Convergence
• Introduction of Extra Noise
• Pure Minimization
• Next Step
• Conclusion
• Appendix 1
• Appendix 2a
• Appendix 2b
• Appendix 3
• Appendix 4
• Appendix 5a
• Appendix 5b
• Appendix 6

I would like to take this opportunity to thank everyone who has helped me to get this far at CERN. Special thanks go to my supervisor Lorenzo Moneta for his teaching and direction during this project. To Andras Zsenei for his consistent help in the office. To everyone in my department for their support during this summer. To my university tutors Geraint Jones and Andy Wathen for their teaching and references. To all my friends here at CERN who have made my time here so enjoyable and taught me so many new things about their cultures. Especially Jan, Nino, Dana, David, Carlos, John, Stian, Diana, Florian, Laura, Lucia, Cristina, Aoife and Laura, whom I will hopefully visit soon. And finally, I would like to thank my friends and family back in England for always being there for me.


Introduction to Minimization

A large class of problems in many different fields of research can be reduced to the problem of finding the smallest value taken on by a function of one or more variable parameters. For example, the minimum of f(x) = (x - 3)² is zero and is obtained at x = 3.

    The classic example of minimization which occurs so often in scientific research here at CERN, however, is the

estimation of unknown parameters in a theory, by minimizing the difference between theory and experimental data.

    Before we can tackle the minimization problem, we have to state what assumptions we are allowed to make. It is

    assumed that the function F(x) is not actually known analytically but is only defined by the value it takes at a given

     position. It is also assumed that we are allowed to specify a range which the parameters are allowed to take. Any

    additional information we wish to provide such as the numerical values of the derivatives dF/dx at any point should be

    given when available, but in general should not be assumed.

The function is repeatedly evaluated at different points until its minimum is found, and the method which finds

    the minimum (within a given tolerance) after the fewest evaluations is generally considered to be the best.*
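For illustration, a minimal sketch of this evaluate-only approach is shown below: a golden-section search that minimizes the example function f(x) = (x - 3)² using nothing but function values. The range and tolerance are illustrative choices, not values used in my project.

    #include <cmath>
    #include <cstdio>

    // Example function: its minimum value is 0, attained at x = 3.
    double f(double x) { return (x - 3.0) * (x - 3.0); }

    // Golden-section search: repeatedly evaluates func on [a, b] until the
    // bracket is smaller than tol. Uses only function values, no derivatives.
    double goldenSection(double (*func)(double), double a, double b, double tol)
    {
        const double r = 0.5 * (std::sqrt(5.0) - 1.0);   // ~0.618
        double x1 = b - r * (b - a);
        double x2 = a + r * (b - a);
        double f1 = func(x1), f2 = func(x2);
        while (b - a > tol) {
            if (f1 < f2) { b = x2; x2 = x1; f2 = f1; x1 = b - r * (b - a); f1 = func(x1); }
            else         { a = x1; x1 = x2; f1 = f2; x2 = a + r * (b - a); f2 = func(x2); }
        }
        return 0.5 * (a + b);
    }

    int main()
    {
        double xmin = goldenSection(f, -10.0, 10.0, 1e-6);
        std::printf("minimum near x = %f, f = %f\n", xmin, f(xmin));
        return 0;
    }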

    There are several difficult situations which the minimization methods have to overcome. These include

•  Finding the global minimum as opposed to a local minimum

    •  Finding the minimum point located somewhere within a large plateau

•  Finding the minimum point without having to check every point in the allowable range

We will test our methods to check whether the above situations are handled correctly.

Project Description

The goal of my project here at CERN is to examine how much information we need to supply in order for the minimization methods to converge to the correct result, as well as to compare the differences between the minimization methods Minuit, GMinuit and Fumili. Minuit comes from the class TMinuit, and GMinuit is the name of the package for the new C++ version of Minuit. Both Minuit and GMinuit are general minimization methods

     based on Fletcher’s unified approach to Variable Metric Methods (VMM) which combines rank-one and rank-two

    formulas to deal with different types of minimization problems.** Fumili comes from the class TFumili and is a

    specialized minimization technique based on Chi Squared minimization and is designed to work very quickly for certain

    functions provided we supply good starting information.
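For illustration, the sketch below shows the typical way such a method is driven from ROOT through the TMinuit class, again using the simple example function from the introduction. The macro name, parameter name, step size and MIGRAD arguments are illustrative choices rather than settings taken from my programs.

    #include "TMinuit.h"
    #include <cstdio>

    // FCN in the standard Minuit form: Minuit passes the current parameter
    // values in par[] and expects the function value back in fval.
    void fcn(Int_t& npar, Double_t* grad, Double_t& fval, Double_t* par, Int_t flag)
    {
        fval = (par[0] - 3.0) * (par[0] - 3.0);   // f(x) = (x - 3)^2
    }

    void minimizeExample()
    {
        TMinuit minuit(1);                 // one free parameter
        minuit.SetFCN(fcn);

        Int_t ierflg = 0;
        // parameter number, name, start value, step, lower/upper limit (0,0 = none)
        minuit.mnparm(0, "x", 0.0, 0.1, 0.0, 0.0, ierflg);

        Double_t arglist[2] = {500.0, 0.01};   // max calls, tolerance
        minuit.mnexcm("MIGRAD", arglist, 2, ierflg);

        Double_t x, err;
        minuit.GetParameter(0, x, err);
        printf("MIGRAD found x = %f +/- %f\n", x, err);
    }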

    My aim has been to produce software tools to systematically analyze the length of time each method takes to run as well

as graphical displays of the regions in which each method converges. It is important for backwards compatibility that both Minuit and GMinuit work when provided with the same initial conditions, but ideally GMinuit should work faster in general and over a larger range of input parameters. It is also important to guard against islands being formed within the range of input parameters.***

    Minimization within ROOT**** can be used in several ways:

    •  In order to produce a “line of best fit” on data within a histogram

    •  With pure minimization of a function of our choosing.

    Given a histogram, the production of a line of best fit is very useful. Minimization methods are designed to take this data

    and use it to efficiently produce a function which best matches the data points. This is a difficult mathematical problem

which is why many different methods exist. Pure function minimization is what is behind the production of a line of best fit and hence should be tested also. I therefore restricted my investigation to looking at these two cases using non-

    trivial functions to get the most out of the methods.
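As an example of the first case, the sketch below fits a Gaussian to a histogram and shows where the minimization engine is selected through TVirtualFitter::SetDefaultFitter(). The generating function, start values and fit range are illustrative, and the name under which the new C++ Minuit is registered may differ between installations; this is not code taken from my project.

    #include "TH1F.h"
    #include "TF1.h"
    #include "TVirtualFitter.h"

    void fitExample()
    {
        // Select the minimization engine used by TH1::Fit. "Minuit" is the
        // original TMinuit; "Fumili" selects TFumili. The name registered
        // for the new C++ Minuit may differ between installations.
        TVirtualFitter::SetDefaultFitter("Minuit");

        TH1F h("h", "example", 100, 0.0, 20.0);
        TF1  gen("gen", "gaus(0) + gaus(3)", 0.0, 20.0);
        gen.SetParameters(100.0, 8.0, 1.0, 80.0, 11.0, 1.0);
        h.FillRandom("gen", 10000);        // populate from the generating function

        // Fit a single Gaussian around the first peak with assumed start values.
        TF1 fit("fit", "gaus", 5.0, 10.0);
        fit.SetParameters(100.0, 8.0, 1.0);
        h.Fit(&fit, "R");                  // "R" = restrict to the function range
    }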

    Initially I didn’t know what results I would produce and so a lot of experimenting was required. This allowed me to

     become very familiar with the methods I was examining and helped me to evolve and tailor my programs at each stage so

    that I could produce interesting and useful results to aid in the fixing of bugs within GMinuit.

* Occasionally other considerations may be important, such as the amount of storage required by the method or the amount of computation required to implement the method, but normally the dominating factor will be the time spent in evaluating the function.
** A general minimization method is one which can handle all functions. More details about these and many more minimization methods can be found on the Minuit website under Documentation: http://www.cern.ch/minuit
*** An island is a region of input parameters of non-convergence for the method, located inside another region of convergence.
**** ROOT was produced at CERN for use in analyzing the data from high energy physics. See http://root.cern.ch for more details.



Minimization for histograms

The first example of minimization I considered was the fitting of a histogram in order to produce a line of best fit. I

    constructed a function consisting of many Gaussian peaks together with some linear background noise and used this to

     populate a one-dimensional histogram.* The function ranged over 0-1000 with the peaks located at random places. This

    type of histogram is regularly seen here at CERN and so is a good place to start my testing.

    As the histogram was filled using randomly chosen points, I decided that the best way to test the pure efficiency of the

fitting algorithms would be to pre-locate each of the peaks using a TSpectrum object. This information was then passed to each method as their initial search parameters and the length of time each one took to converge was recorded.
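The peak pre-location step can be sketched as follows; the search parameters and the way the found positions are copied into the fit function are illustrative assumptions rather than the exact code I used.

    #include "TH1F.h"
    #include "TF1.h"
    #include "TSpectrum.h"

    void locatePeaks(TH1F* h, TF1* fitFunc, Int_t maxPeaks = 10)
    {
        // Search for candidate peaks; the sigma (2 bins) and threshold (10% of
        // the highest peak) are illustrative choices.
        TSpectrum spectrum(maxPeaks);
        Int_t nFound = spectrum.Search(h, 2.0, "", 0.10);
        // GetPositionX() returns Double_t* in recent ROOT; old versions used Float_t*.
        Double_t* xPeaks = spectrum.GetPositionX();

        // Use the found positions as the initial means of the Gaussian peaks,
        // assuming fitFunc is a sum of Gaussians with parameters ordered as
        // (amplitude, mean, sigma) for each peak.
        for (Int_t i = 0; i < nFound; ++i) {
            fitFunc->SetParameter(3 * i + 0, h->GetBinContent(h->FindBin(xPeaks[i])));
            fitFunc->SetParameter(3 * i + 1, xPeaks[i]);
            fitFunc->SetParameter(3 * i + 2, 5.0);
        }
    }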

The output of this program can be seen in Appendix 1, and Table 1 below contains the CPU running times of each

    method.

    As can be seen from the output, all three methods produced a good fit to the data but

    more importantly each one produced the same fit to the data implying that this fit is

    actually the best fit available.

    Examining the lengths of time each method took to run reveals something interesting.

    As suspected, Fumili works by far the fastest. This is because we supplied good

    initial information and Fumili is a specialized method for problems such as this,

while Minuit and GMinuit are general methods. GMinuit ran faster than TMinuit when used in combination with an optimized fitter interface in the ROOT framework,

    exploiting object oriented features of the new C++ version of Minuit. When the same

    interface as in TMinuit is used, comparable running times are then obtained.

    Further tests on the range of convergence are now needed to fully examine the differences between the running times and

    range of convergence of the two methods before Minuit can be retired in favor of the new GMinuit.

Examining areas of convergence

From the above program and the general requirements of each minimization method, we can expect that when good

    information is provided, all methods will converge to the correct and same results. If,

however, information is supplied which is not good, then it is not known whether the methods will fit correctly or not. To test the methods when different initial information

    is supplied I produced a program to systematically cycle through the input parameters

    in order to see at which points the method works and which points it fails.

    To begin with I decided to start with a simpler example than the one above and to use

    only two peaks located at 8.0 and 11.0 on the range of 0-20. The program was created

    in such a way that changing the method of minimization involved only changing the

name within the TVirtualFitter's SetDefaultFitter() method. For each iteration, the program set the initial suggested locations of the means for the two Gaussian peaks before performing the fit. It cycled through each of the suggested means from (-5,-5) to (25,25), going

    up in integer intervals. The real means were located at 8.0 and 11.0 and so I expected that at least around these regions,

    all three methods should work.
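The scan can be sketched roughly as below. This is a reconstruction of the idea rather than my original program: the fit function, the use of TStopwatch for timing and the tolerance used to decide whether a fit "worked" are all illustrative assumptions.

    #include "TH1F.h"
    #include "TH2F.h"
    #include "TF1.h"
    #include "TStopwatch.h"
    #include "TVirtualFitter.h"
    #include <cmath>

    void scanStartValues(TH1F* h)
    {
        TVirtualFitter::SetDefaultFitter("Minuit");   // change to compare fitters

        TH2F good("good", "converged", 31, -5.5, 25.5, 31, -5.5, 25.5);
        TF1  f("f", "gaus(0) + gaus(3) + pol1(6)", 0.0, 20.0);

        for (Int_t ix = -5; ix <= 25; ++ix) {
            for (Int_t iy = -5; iy <= 25; ++iy) {
                // Suggested initial means for the two Gaussian peaks.
                f.SetParameters(100.0, ix, 1.0, 100.0, iy, 1.0, 0.0, 0.0);

                TStopwatch timer;
                timer.Start();
                h->Fit(&f, "Q0");          // quiet fit, do not draw
                timer.Stop();

                // The fit is counted as correct if the fitted means land close
                // to the true peak positions 8.0 and 11.0 (tolerance assumed).
                double m1 = f.GetParameter(1), m2 = f.GetParameter(4);
                bool ok = (std::fabs(m1 - 8.0) < 0.5 && std::fabs(m2 - 11.0) < 0.5) ||
                          (std::fabs(m1 - 11.0) < 0.5 && std::fabs(m2 - 8.0) < 0.5);
                double t = timer.CpuTime();
                // Fill with the reciprocal time so fast convergence stands out.
                if (ok && t > 0) good.Fill(ix, iy, 1.0 / t);
            }
        }
        good.Draw("colz");
    }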

    Together with identifying where each method converged correctly, I also produced two histograms of the length of time

    each method took given a specific input parameter for each method. One was for when the fit was good and the other was

    for when the method didn’t fit the histogram correctly. I chose to display the reciprocals of the actual times in order to be

    able to see the places of very fast convergence more easily.

The results of this program for Minuit and GMinuit provided me with very useful information. As can be seen clearly

    in the two pictures below, Minuit converged to the correct results in far more places than GMinuit did. This suggested

    there was a major bug which needed to be fixed otherwise GMinuit wouldn’t be fully backwards compatible with the

    older version. Another worrying result from this program was the existence of islands within areas of convergence and

bands of convergence and non-convergence. This wasn't expected, especially for such a simple example with parameters

    so close to the actual minimum.

Looking at the lengths of time each method took to converge, we find that for GMinuit we almost always knew within 2 seconds whether the method had converged correctly or not. Minuit does in some cases work faster than GMinuit, which

    is a problem, but generally there exists a greater number of places where Minuit converges much more slowly.

Table 1: CPU Running Times

Method                           CPU time
TMinuit                          0.64 s
Fumili                           0.15 s
GMinuit (new interface)          0.47 s
GMinuit (interface as TMinuit)   0.66 s

    * To populate the histogram I allowed the computer to randomly choose points from this function and used these values to fill the histogram.


    By looking more closely at the points where Minuit performs better than GMinuit, it was possible to track down what

    was causing some of these problems in the code and hence fix it. Running this program again using the fixed version of

GMinuit now produced results which were far closer to those of Minuit, as can be seen below and in Appendix 2b.

    Islands and bands still exist, but to solve this will require a much more in-depth look at the reasons and paths the

    algorithm takes during its execution. The full output results of my program for Minuit and GMinuit can be found in

Appendix 2a. More research into this problem was therefore definitely needed.

Fumili, on the other hand, converged correctly in a far smaller region than either Minuit or GMinuit, but for this function at

    least no islands or bands existed. The more interesting result however is that when Fumili does fit correctly the running

    time of the method is far less than either of the other two algorithms, but when bad initial information is supplied the

    running time can be exceptionally large. Hence as suspected, provided we supply good information to Fumili it will work

    very well for functions such as this, but if we don’t have adequate information then a general minimization method will

    usually work faster. The full output results for Fumili can be found in Appendix 3.

Introduction of extra noise

To allow me to further my investigation into these methods, I decided that generalizing my code would allow me much

    more freedom in the future when changing the function to be examined. I increased the range of the function to 0-1000

with the peaks now located at 350.0 and 750.0 respectively. I also realized that the programs were currently taking much too long to run using the standard ROOT interface, and hence I adapted the code so that I could produce compiled C code

    which was, in turn, more stable.

The main step I took, however, was to have the comparison of Minuit and GMinuit both located on the same results output. I decided this would make it easier to spot where the methods differed and hence help my investigation. Locations which are coloured Black indicate parameters where neither Minuit nor GMinuit converged correctly. Blue represents areas where both converged correctly. Green marks areas where GMinuit outperformed Minuit, and Red areas where GMinuit failed but the original Minuit worked.

    Another feature I added to my program

    was the ability to add some extra

    random noise to the histogram. By being able to select and compare the differences between different percentage levels

    of noise I was able to compare the methods under more realistic experimental conditions. My final adaptation was to

    have two histograms displaying the frequency of convergence for the different distances away from the actual means.

This would show that as the initial parameters are moved away from the actual means, it becomes less likely that either method will converge.
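One plausible way to add such noise (a sketch of the idea, not my original code) is to fill a given percentage of extra, uniformly distributed entries on top of the generated histogram:

    #include "TH1F.h"
    #include "TRandom3.h"

    // Add extra entries, uniformly distributed over the histogram range,
    // amounting to 'fraction' (e.g. 0.05 for 5%) of the current entries.
    // This is only one possible model of "extra noise".
    void addUniformNoise(TH1F* h, double fraction)
    {
        TRandom3 rng(0);                                  // seed chosen by ROOT
        Long64_t nExtra = static_cast<Long64_t>(fraction * h->GetEntries());
        double xlo = h->GetXaxis()->GetXmin();
        double xhi = h->GetXaxis()->GetXmax();
        for (Long64_t i = 0; i < nExtra; ++i)
            h->Fill(rng.Uniform(xlo, xhi));
    }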

[Figure: results for Minuit, Old GMinuit and Fixed GMinuit]



    The above picture represents a section of the output from the updated program. As can be seen, when no extra noise is

    introduced and even when using the version of GMinuit we had previously fixed, there are still several regions and bands

    where the method fails.

    When we next run the program with different levels of extra noise, we get some very interesting results. Not only are the

    regions of convergence for the methods reduced as the amount of noise is increased, but points where the method

previously didn't converge correctly now in some cases do converge. Bands of non-convergence still exist for this function even as the level of noise is increased, but the position of the bands actually changes. For 5% extra noise we see that GMinuit performs with greater uniformity than Minuit, as fewer islands are created. Finally, we notice for this example at least, that as the amount of extra random noise is increased, GMinuit starts to outperform Minuit. Table 2 shows the actual number of points where the two methods converged.

Hence, not only does the initial information we supply have to be more accurate as the noise level increases, but it seems GMinuit was designed to work well as

    the noise level increases. It would be interesting to have some further investigations

    with different levels and types of noise to see whether GMinuit actually does work

     better as noise increases. The full output results for the 3 different noise levels can be

    found in Appendix 4.

Pure Minimization

The final part of my project was to move away from using

    minimization for fitting histograms to testing the minimization

    methods for Minuit and GMinuit directly. I did this by again

    adapting my program to deal with more complicated two

    dimensional functions such as Rosenbrock’s Curved Valley

     problem* with the aim of being able to change the function

    under investigation easily. I designed the program so that it

    was easy to change the viewing range of the function, the

    amount of detail to print out, the position about which to

    concentrate the investigation and also whether I wanted just

    the basic information about this position or the extra graphical

outputs. All these parameters can be set at runtime, allowing the user to view general information around the actual

    minimum and then to target specific regions or points without

    having to go and change any of the underlying code.
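For the pure minimization the function is handed to the fitter directly rather than through a histogram. The sketch below illustrates the mechanism for Rosenbrock's function from a chosen start point; the step sizes and MIGRAD arguments are illustrative assumptions, not settings copied from my program.

    #include "TVirtualFitter.h"
    #include <cstdio>

    // Rosenbrock's curved valley in standard Minuit FCN form.
    void rosenbrockFCN(Int_t& npar, Double_t* grad, Double_t& fval, Double_t* par, Int_t flag)
    {
        double x = par[0], y = par[1];
        fval = 100.0 * (y - x * x) * (y - x * x) + (1.0 - x) * (1.0 - x);
    }

    void minimizeRosenbrock(double x0 = -1.2, double y0 = 1.0)
    {
        TVirtualFitter::SetDefaultFitter("Minuit");            // or the new C++ Minuit
        TVirtualFitter* fitter = TVirtualFitter::Fitter(nullptr, 2);   // 2 parameters
        fitter->SetFCN(rosenbrockFCN);
        fitter->SetParameter(0, "x", x0, 0.1, 0.0, 0.0);        // (0,0) = no limits
        fitter->SetParameter(1, "y", y0, 0.1, 0.0, 0.0);

        Double_t arglist[2] = {500.0, 0.01};                    // max calls, tolerance
        fitter->ExecuteCommand("MIGRAD", arglist, 2);

        std::printf("minimum at (%f, %f)\n",
                    fitter->GetParameter(0), fitter->GetParameter(1));
    }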

    Running the program without setting any of the input parameters produced the output results in Appendix 5a. As can be

    seen, there are several places where GMinuit should work but doesn’t. Some of these places are actually quite close to

    the actual minimum and hence even though we have fixed one problem in the code, others must exist. Again the bottom

    two histograms represent the number of correctly and incorrectly minimized positions which are of different distances

    away from the actual minimum. The second output screen shows the proportional number of correctly and incorrectly

    minimized points with respect to the total number of points available at that level. It can be clearly seen that as the

    information you supply diverges from the actual minimum, the number of points which will actually converge correctly

    tends to zero. This happens quite well with Minuit but GMinuit again struggles.

    Using this information and the code, it was possible to go into

    detail for the points where GMinuit failed and fix two more

problems, making a total of 3 fixes. Running the same program

    again but now using this fixed version of GMinuit results in the

    output in Appendix 5b. It can now clearly be seen that GMinuit

works well for this function and that it even outperforms Minuit

    in several places. This encouraging result shows that my

    investigation was going in the correct direction at least and

    hopefully it should be possible to find all the problems causing

    the discrepancies seen.

Table 2: Number of points where each method converged

Noise   Minuit   GMinuit
0%      847      814
2.5%    805      806
5.0%    165      176

    * More information about Rosenbrock’s Curved Valley problem can be found in Appendix 5a.


Next step

To determine whether all the differences between Minuit and GMinuit had been found, I tested one further function. Goldstein and Price's function with four minima* is another difficult problem for minimization methods

due to it having more than one local minimum. I again adapted my program, allowing me to specify more exactly how much information I would like to see output, as well as adding an extra indicator for the case when Minuit and GMinuit both failed but failed to different values. I felt that this was also an important case to consider as it shows that the two methods must have taken different routes to get to their final results. Upon

    running my program for this function I expected there to only be a few places

    where the function failed. Unfortunately, however, it was obvious there are

still major problems with GMinuit due to the large quantity of red markers dotted all over the input region. Not only did

    Minuit outperform GMinuit, both methods left many islands and bands of non-convergence within areas of convergence

and vice versa. Also, by looking at the second output results, the distance away from the actual minimum did not seem to

    affect the probability of whether a set of input parameters would converge.

To get the most out of my program, it should be run with input parameters. The following is an example of running the program:

    root Project5GoldsteinAndPriceFunction.C+(range, extra, print, X, Y)

where

• "range" is the distance either side of our chosen point to be displayed,
• "extra" is whether we would like only the basic output or the full detailed program output,
• "print" decides how much information each method should display, going from 0 to 3,
• "X" & "Y" are the (x, y) coordinates about which my program should look.

The default execution parameters would effectively be equivalent to

    root Project5GoldsteinAndPriceFunction.C+(10, 0, 0, 0, -1.0)

Conclusion

During this project, I have been able to investigate the properties of different minimization strategies. I have been able to

    show graphically where each method works and compare the new and old versions of Minuit. Even though three fixes

    have resulted from my investigation I have been able to show that there must be more still hidden within the code

    somewhere. The most important result I have found, however, is that both the new and old versions of Minuit fail in

places where they were not expected to. It would be very interesting to be able to look at the code in far more detail and follow the routes they take for these islands compared to the surrounding areas to find out what the reason is for the

differences. Then, it may be possible to produce more rigorous methods for finding minimizations and fits to histogrammed

    data.

    Thus, to take this project further I would initially look at fixing the problems which are causing GMinuit to fail in so

    many places where Minuit actually worked for the Goldstein and Price’s function. This would be done in the same way

    as before by looking at the locations which caused anomalies, seeing what results they were actually giving and trying to

    use this information together with stepping through the code until the problem is discovered.

Next I would adapt my program to higher dimensions, specifically 4 dimensions, as there exist several difficult test

    functions for which it would be interesting to see how GMinuit performed. As it is only really possible to display 2

    dimensions effectively, I would want it to be possible to fix certain dimensions and vary others. It should therefore be

possible to specify this in the program's input parameters, and at which specific values the fixed dimensions should be set. The information I would then get for these new test functions would either provide more information and points which need to be compared (so as to fix bugs within GMinuit) or increase our confidence in its algorithm. Finally, the

    last extension I would make would be to display both the function and region of convergence on the same graph, as this

    would make the investigation easier to visualize. Each comparison point would then be tied directly to an area of the

    function being minimized.

    As this investigation was very open ended with no previous information about what results to expect, I had to produce

    my program so that it could evolve and develop as more information was discovered. This can be seen with the

     progression of the programs I produced. The final program I produced is thus stable and has the ability to provide a firm

     base to continue the investigation. In the end I decided to only compare versions of Minuit leaving the Fumili

    investigation for another time. However, this wouldn’t be hard to incorporate into my program if someone so wished. I

felt that comparing Minuit versions (which are generalized methods) to Fumili (which is a specialized method) would not

    actually produce any comparable results.

* More information about Goldstein and Price's function with four minima can be found in Appendix 6.


Appendix 1

The output of my program for comparing Minuit, Fumili and GMinuit when good initial information is supplied. As can be seen, each method fits the function but the running time of each method differs greatly.

    The horizontal axis contains the bins for the range of the function.

    The vertical axis is the number of elements for each bin.


    Appendix 2a

    Appendix 2b

[Figures: convergence results for Minuit, Old GMinuit and New GMinuit]


    Appendix 3

[Figure: convergence results for Fumili]


    Appendix 4

Noise   Minuit   GMinuit
0%      847      814
2.5%    805      806
5.0%    165      176

[Figure panels: 0% noise, 2.5% noise, 5.0% noise. Legend (results of minimization): Neither, Both, Minuit only, GMinuit only.]


    Appendix 5a

    Appendix 5b

    Rosenbrock’s Curved Valley

F(x, y) = 100(y - x²)² + (1 - x)²

Minimum: F(1.0, 1.0) = 0

Possible starting point: F(-1.2, 1.0) = 24.20

This problem is probably the best known test problem for minimization methods. It consists of a narrow parabolic valley with very steep sides. The floor of the valley follows approximately the parabola y = x² + 1/200, and stepping methods tend to perform at least as well as gradient methods for this function.

Appendix 5a: before the two problems were fixed. Appendix 5b: after the two problems were fixed.


    Appendix 6

Goldstein and Price function with four minima

F(x, y) = [1 + (x + y + 1)² (19 - 14x + 3x² - 14y + 6xy + 3y²)] × [30 + (2x - 3y)² (18 - 32x + 12x² + 48y - 36xy + 27y²)]

Local minima: F(1.2, 0.8) = 840
              F(1.8, 0.2) = 84
              F(-0.6, -0.4) = 30

Minimum: F(0.0, -1.0) = 3

Possible starting point: F(-0.4, -0.6) = 35

    This is another standard test function for minimization methods. It is an eighth-order polynomial in two variables

    which is well behaved near each minimum, but has four minima. An interesting place to start looking would be at the

    above starting point as it lies in between the two lowest minima.
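For reference, the function written out as code, with the quoted minima as a quick check (only the function itself is shown, not the fitting machinery):

    #include <cstdio>

    // Goldstein and Price test function (standard form); global minimum
    // F(0, -1) = 3, local minima F(1.2, 0.8) = 840, F(1.8, 0.2) = 84,
    // F(-0.6, -0.4) = 30.
    double goldsteinPrice(double x, double y)
    {
        double a = 1.0 + (x + y + 1.0) * (x + y + 1.0) *
                   (19.0 - 14.0 * x + 3.0 * x * x - 14.0 * y + 6.0 * x * y + 3.0 * y * y);
        double b = 30.0 + (2.0 * x - 3.0 * y) * (2.0 * x - 3.0 * y) *
                   (18.0 - 32.0 * x + 12.0 * x * x + 48.0 * y - 36.0 * x * y + 27.0 * y * y);
        return a * b;
    }

    int main()
    {
        std::printf("F(0, -1)      = %g\n", goldsteinPrice(0.0, -1.0));   // expect 3
        std::printf("F(-0.4, -0.6) = %g\n", goldsteinPrice(-0.4, -0.6));  // suggested start, expect 35
        return 0;
    }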