COS702; Assignment 4 Cross Validation Methods

COS702; Assignment 4

Cross Validation Methods

University of Southern Mississippi

Tyler Reese

November 7, 2012

The Problem

COS 702, Assignment 4:

Using Inverse MQ ($1/\sqrt{r^2 c^2 + 1}$) and Gaussian ($e^{-c r^2}$) Radial Basis Functions in the region $[0,1]^2$ and the following test function:

f1 = @(x, y) .75 * exp(-((9*x - 2).^2 + (9*y - 2).^2)/4);
f2 = @(x, y) .75 * exp(-((9*x + 1).^2/49 + (9*y + 1).^2/10));
f3 = @(x, y) .5 * exp(-((9*x - 7).^2 + (9*y - 3).^2)/4);
f4 = @(x, y) .2 * exp(-((9*x - 4).^2 + (9*y - 7).^2));

testfunction = @(x, y) f1(x, y) + f2(x, y) + f3(x, y) - f4(x, y);

Examine the performance of the following cross validation techniques:

• Leave-One-Out Method [2]

• The technique shown by Rippa [1, 2]

over a range of shape parameters and dataset sizes using Halton Sequence Quasi-Random data points.
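The Halton data points mentioned above can be illustrated with a minimal sketch. It assumes the usual construction (radical inverse in bases 2 and 3, starting at index 1); the provided haltonSeq.m is not reproduced in this report, so its indexing and bases may differ. The names `radical_inverse` and `halton_points` are chosen for this illustration only.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """Return n quasi-random points in [0,1]^2 (assumed bases 2 and 3)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


points = halton_points(25)
# In base 2 the first coordinates run 1/2, 1/4, 3/4, ...
```

Unlike pseudo-random points, consecutive Halton points fill the unit square evenly, which is why the same construction can supply data sets of increasing size.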

Base their performance on both the time it takes to run for each dataset size and the RMSE that results from using the shape parameter generated for each instance over an additional 100 random points.

Compare these results to those obtained by using a simple "brute force" method of generating a full interpolation matrix for each shape parameter for each size dataset and finding the smallest RMSE by testing each of them over 100 random test points.

$$\mathrm{RMSE} = \sqrt{\frac{1}{100}\sum_{k=1}^{100}\left(f(x_k, y_k) - s(x_k, y_k)\right)^2}$$


Methods Overview

As it offers a useful summary of the Radial Basis Function (RBF) method utilized in this work, the following description is taken directly from the report titled COS702 Assignment 1; Radial Basis Functions by Tyler Reese.

"Using the functions defined in the provided matlab files (DistanceMatrix.m, testfunction.m, haltonseq.m),... The main procedure to reconstruct a surface using radial basis functions (RBFs) is to use known data (i.e. the value of f(x,y)) to generate a distance matrix with each element representing its cumulative distance in the x-y plane from the rest of the set. The elements from this distance matrix are then used to calculate the corresponding interpolation matrix with each element in the interpolation matrix equal to the value of the RBF evaluated at the value of the distance matrix. Appropriately setting the product of the interpolation matrix and a vector of unknown coefficients equal to the vector comprised of the values for f(x,y) leads to the solutions for the unknown coefficients. With these coefficients, the surface can be reconstructed over a set of known test points and the validity of the model confirmed using a similar procedure with the exception that now the coefficients are known and the values for f(x,y) are being calculated and compared to the known values." [4, 3]
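The procedure quoted above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the provided MATLAB code: `distance_matrix`, `fit`, and `evaluate` are names invented here, the five data sites and their values are arbitrary, and the linear system is solved with a small hand-written elimination routine.

```python
import math


def inverse_mq(r, c):
    return 1.0 / math.sqrt((r * c) ** 2 + 1.0)


def gaussian(r, c):
    return math.exp(-c * r * r)


def distance_matrix(points, centers):
    """Entry (i, j) is the Euclidean distance from points[i] to centers[j]."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in centers] for p in points]


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(centers, values, rbf, c):
    """Build the interpolation matrix and solve for the coefficients."""
    A = [[rbf(r, c) for r in row] for row in distance_matrix(centers, centers)]
    return solve(A, values)


def evaluate(points, centers, coeffs, rbf, c):
    """Evaluate the interpolant s at the given points."""
    D = distance_matrix(points, centers)
    return [sum(rbf(r, c) * a for r, a in zip(row, coeffs)) for row in D]


# Arbitrary illustrative data: five sites and values of a smooth function.
centers = [(0.1, 0.2), (0.8, 0.3), (0.5, 0.5), (0.2, 0.9), (0.9, 0.8)]
values = [x * x + y for x, y in centers]
coeffs = fit(centers, values, inverse_mq, 3.0)
recovered = evaluate(centers, centers, coeffs, inverse_mq, 3.0)
```

Evaluating the interpolant back at the data sites should reproduce the data values up to rounding, which is a quick sanity check before testing on separate points.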

Similar to a subsection of that report, the problem discussed here pertains to determining the shape parameter, c, that is used in both the Inverse MQ and the Gaussian RBFs. Optimizing that shape parameter improves the accuracy of the interpolated results. The Leave-One-Out method of determining an optimal shape parameter involves removing a single data point from the dataset used as data centers in developing the initial distance matrix. After building the interpolation matrix based on this N-1 dataset, this interpolation matrix is then used to generate a prediction at the location of the point that was removed. This prediction is then compared to the actual value of the point. The difference between these two values is recorded as the error associated with that point, $e_i$. The whole process is carried out N times, sequentially removing a different individual point from the original dataset. Each time this process is repeated, the interpolation matrix is constructed using N-1 data points and is tested on a different data point until they have all been used as the test points. Then the Prediction Residual Error Sum of Squares (PRESS) is calculated by the following formula:

$$\mathrm{PRESS} = \sum_{i=1}^{N} e_i^2$$
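The PRESS computation, and the selection of the shape parameter with the smallest PRESS, can be sketched as below. This is a hedged Python illustration rather than the RBFeffHaltonCross.m script: the eight data sites, the sampled function, and the candidate grid are arbitrary choices, and `press` and `loo_select` are names introduced here.

```python
import math


def inverse_mq(r, c):
    return 1.0 / math.sqrt((r * c) ** 2 + 1.0)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def press(points, values, rbf, c):
    """Sum of squared leave-one-out errors e_i for one shape parameter."""
    total = 0.0
    for i in range(len(points)):
        rest = points[:i] + points[i + 1:]          # the N-1 remaining centers
        rest_values = values[:i] + values[i + 1:]
        A = [[rbf(dist(p, q), c) for q in rest] for p in rest]
        coeffs = solve(A, rest_values)
        prediction = sum(a * rbf(dist(points[i], q), c)
                         for a, q in zip(coeffs, rest))
        e_i = values[i] - prediction
        total += e_i * e_i
    return total


def loo_select(points, values, rbf, candidates):
    """Pick the candidate shape parameter with the smallest PRESS."""
    return min(candidates, key=lambda c: press(points, values, rbf, c))


# Arbitrary illustrative data, not the assignment's Halton sets.
points = [(0.10, 0.15), (0.85, 0.20), (0.50, 0.55), (0.20, 0.80),
          (0.90, 0.90), (0.35, 0.30), (0.65, 0.75), (0.75, 0.45)]
values = [math.sin(3 * x) * math.cos(2 * y) for x, y in points]
candidates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
best_c = loo_select(points, values, inverse_mq, candidates)
```

Each call to `press` solves N separate systems of size N-1, which is where the cost of the method comes from.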


This process is repeated over a range of shape parameters, and the Leave-One-Out method takes the shape parameter that yields the smallest PRESS value as the optimal one. While this method is straightforward and thorough, it is also clearly very computationally expensive [2].

Another method for determining an optimal shape parameter was developed by Shmuel Rippa and was discussed in his paper titled An algorithm for selecting a good value for the parameter c in radial basis function interpolation [1]. The resulting formula that aids in optimizing the shape parameters, as demonstrated by Rippa, is as follows:

$$E_c(k) = \frac{a_k}{A^{-1}_{kk}}$$

where $a_k$ is the kth element of the coefficient vector obtained by solving the interpolation system, $A$ is the interpolation matrix, and $A^{-1}_{kk}$ is the kth diagonal element of the inverse of that matrix [2]. This method only requires a single calculation of the interpolation matrix using all N points for each shape parameter, and the optimal shape parameter is found by determining which error vector, $E_c$, has the smallest norm.

The Matlab scripts developed and/or modified to solve this problem use the general method used in the RBFeff.m script provided and include calls to the functions mentioned above (DistanceMatrix.m, testfunction.m, and haltonSeq.m). The script titled RBFeffHaltonCross.m employs the Leave-One-Out method of cross validation for datasets of Halton Sequence Quasi-Random points. The number of data points utilized is specified by changing the value on line 4 (values used were 25, 30, 35, 40, 45, 50, and 100). The script titled RBFeffHalCrossRippa.m employs the method of cross validation outlined by Rippa for datasets of 25, 30, 35, 40, 45, 50, 100, 200, 300, 400, and 500 Halton Sequence Quasi-Random points. Both of these scripts run both the Inverse MQ and Gaussian RBFs and output the optimal shape parameter found for each and the time it took to determine them, as well as a plot of PRESS vs. shape parameter or norm(E) vs. shape parameter. These shape parameter values were then included in a modified version of a script developed for Assignment 1 titled RBFeffHalton.m. This script evaluated the RMSE over 100 random test points for both RBFs, sequentially employing each of the shape parameters found.

These results are also compared to the "brute force" method developed to determine appropriate shape parameters for Assignment 1. The previously developed script, RBFeffHaltonFindC.m, was modified to find the shape parameter for datasets of 25, 30, 35, 40, 45, 50, 100, 200, 300, 400, and 500 points and generate the same outputs as the two cross validation scripts. This script generates a distance matrix for each size dataset, then uses it to build interpolation matrices for each of the shape parameters included in the range specified (2.01 to 7 in 0.01 increments), compares each against 100 random test points to minimize the RMSE directly, and outputs the smallest RMSE found. It should be noted that the results of this method and the results of the two methods previously discussed cannot be directly compared, as each instance of this method uses 100 more points than are provided to the other two methods to determine the optimal shape parameter. This will be discussed further in the conclusions section.
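Rippa's formula described earlier in this section can be checked numerically against explicit leave-one-out errors. The sketch below is a Python illustration under stated assumptions, not the RBFeffHalCrossRippa.m script: it uses six arbitrary data sites, the Gaussian RBF in the form $e^{-c r^2}$, and a hand-written Gauss-Jordan inverse. The names `rippa_errors` and `direct_loo_errors` are invented for this example.

```python
import math


def gaussian(r, c):
    return math.exp(-c * r * r)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def interpolation_matrix(points, rbf, c):
    return [[rbf(dist(p, q), c) for q in points] for p in points]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rippa_errors(points, values, rbf, c):
    """E_c(k) = a_k / (A^-1)_kk, using one inverse of the full N x N matrix."""
    A_inv = invert(interpolation_matrix(points, rbf, c))
    coeffs = matvec(A_inv, values)
    return [coeffs[k] / A_inv[k][k] for k in range(len(points))]


def direct_loo_errors(points, values, rbf, c):
    """Explicit leave-one-out errors, refitting on N-1 points each time."""
    errors = []
    for k in range(len(points)):
        rest = points[:k] + points[k + 1:]
        rest_values = values[:k] + values[k + 1:]
        coeffs = matvec(invert(interpolation_matrix(rest, rbf, c)), rest_values)
        prediction = sum(a * rbf(dist(points[k], q), c)
                         for a, q in zip(coeffs, rest))
        errors.append(values[k] - prediction)
    return errors


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# Arbitrary illustrative data, not the assignment's Halton sets.
points = [(0.10, 0.15), (0.85, 0.20), (0.50, 0.55),
          (0.20, 0.80), (0.90, 0.90), (0.35, 0.30)]
values = [math.sin(3 * x) * math.cos(2 * y) for x, y in points]
fast = rippa_errors(points, values, gaussian, 4.0)
slow = direct_loo_errors(points, values, gaussian, 4.0)
candidates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
best_c = min(candidates, key=lambda c: norm(rippa_errors(points, values, gaussian, c)))
```

For well-conditioned matrices the two error vectors should agree to rounding, so the Rippa approach delivers the same N leave-one-out errors from a single matrix inverse per shape parameter.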


Results

Figures 1 through 7 below show examples of the results from each of the various methods. In the case of the Gaussian RBF shape parameters as calculated from 400 Halton Quasi-Random points, it can be seen that there is a region of instability for smaller values of the shape parameter, c. This behavior affects larger values of c as the density of points in the region $[0,1]^2$ increases, because a denser set contains points that are very near to each other, and for smaller values of c this creates issues with solving the matrix equation A * c = Z (where c here denotes the vector of coefficients rather than the shape parameter). To illustrate this point, a second example plot is given showing the results generated using only 200 points. This behavior was also exhibited in the case of the Inverse MQ RBF, but it primarily occurs below the lower limit of the values for the shape parameter that are being investigated here and can only slightly be seen in the case of 500 points (not shown).
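A heuristic way to see why small shape parameters and dense point sets interact, offered as a sketch rather than as an analysis of the scripts: for the Gaussian RBF, each entry of the interpolation matrix is $A_{ij} = e^{-c r_{ij}^2}$, where $r_{ij}$ (notation introduced here) is the distance between data points i and j. Expanding for small $c\,r_{ij}^2$ gives

$$A_{ij} = e^{-c r_{ij}^2} = 1 - c\,r_{ij}^2 + O\!\left(c^2 r_{ij}^4\right), \qquad A_{ii} = 1.$$

When $c\,r_{ij}^2 \ll 1$ for some pair of points, the corresponding rows of A are nearly identical, and in the limit $c \to 0$ every entry tends to 1, so A approaches the singular matrix of all ones. Adding data points shrinks the smallest $r_{ij}$, so a larger c is needed to keep $c\,r_{ij}^2$ away from zero, which is consistent with the unstable region reaching larger values of c as the number of points grows.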

Figure 1: Example plot of the effect of shape parameter on the PRESS value when employing the Leave-One-Out method on Inverse MQ RBFs on Halton Quasi-Random Points.


Figure 2: Example plot of the effect of shape parameter on the Norm of the error vector when employing the Rippa method on Inverse MQ RBFs on Halton Quasi-Random Points.

Figure 3: Example plot of the effect of shape parameter on the RMSE when employing the "brute force" method on Inverse MQ RBFs on Halton Quasi-Random Points.


Figure 4: Example plot of the effect of shape parameter on the PRESS value when employing the Leave-One-Out method on Gaussian RBFs on Halton Quasi-Random Points.

Figure 5: Example plot of the effect of shape parameter on the Norm of the error vector when employing the Rippa method on Gaussian RBFs on Halton Quasi-Random Points.


Figure 6: Example plot of the effect of shape parameter on the RMSE when employing the "brute force" method on Gaussian RBFs on Halton Quasi-Random Points.

Figure 7: Example plot illustrating the change in instability in the shape parameter as the density of data points changes (in this case it is decreased).


The following two tables outline the overall results for both Inverse MQ and Gaussian RBFs.

Table 1: Number of data points used, shape parameter, time elapsed, and corresponding RMSE from 100 random points for Inverse MQ RBF

Method          Number of Points   c      time (s)   RMSE
Leave-One-Out   25                 3.4    32.890     3.200x10^-2
Leave-One-Out   30                 3.3    52.563     1.788x10^-2
Leave-One-Out   35                 3.4    85.390     1.453x10^-2
Leave-One-Out   40                 3.7    125.610    1.456x10^-2
Leave-One-Out   45                 3.9    191.313    1.437x10^-2
Leave-One-Out   50                 4.3    257.25     1.339x10^-2
Leave-One-Out   100                2.6    3019.7     2.644x10^-3
Rippa           25                 2.44   0.219      4.652x10^-2
Rippa           30                 3.54   0.172      2.247x10^-2
Rippa           35                 3.32   0.250      1.818x10^-2
Rippa           40                 3.06   0.312      1.813x10^-2
Rippa           45                 2.01   0.391      2.349x10^-2
Rippa           50                 3.11   0.500      1.324x10^-2
Rippa           100                3.56   2.437      5.186x10^-3
Rippa           200                2.69   11.797     2.757x10^-4
Rippa           300                3.26   32.375     6.002x10^-4
Rippa           400                2.39   67.172     4.227x10^-5
Rippa           500                2.28   119.781    2.270x10^-7
"Brute Force"   25                 5.52   0.609      2.815x10^-2
"Brute Force"   30                 3.67   0.656      1.976x10^-2
"Brute Force"   35                 3.16   0.718      1.467x10^-2
"Brute Force"   40                 3.17   0.828      1.545x10^-2
"Brute Force"   45                 3.49   0.937      1.384x10^-2
"Brute Force"   50                 2.98   1.109      7.538x10^-3
"Brute Force"   100                2.67   2.969      2.811x10^-3
"Brute Force"   200                3.64   8.890      8.887x10^-5
"Brute Force"   300                2.59   19.797     4.467x10^-6
"Brute Force"   400                2.02   34.312     2.601x10^-6
"Brute Force"   500                2.2    56.969     1.706x10^-7


Table 2: Number of data points used, shape parameter, time elapsed, and corresponding RMSE from 100 random points for Gaussian RBF

Method          Number of Points   c      time (s)   RMSE
Leave-One-Out   25                 2.6    31.078     6.02x10^-2
Leave-One-Out   30                 7.5    49.843     1.08x10^-1
Leave-One-Out   35                 7.5    78.969     1.07x10^-1
Leave-One-Out   40                 7.4    117.25     6.58x10^-2
Leave-One-Out   45                 5.3    166.593    2.98x10^-2
Leave-One-Out   50                 5.5    260.89     2.49x10^-2
Leave-One-Out   100                3.6    2793.8     3.14x10^-2
Rippa           25                 3.87   0.141      4.59x10^-2
Rippa           30                 4.01   0.172      4.76x10^-2
Rippa           35                 4.03   0.234      4.63x10^-2
Rippa           40                 2.94   0.312      2.86x10^-2
Rippa           45                 3.36   0.375      2.79x10^-2
Rippa           50                 4.26   0.484      1.95x10^-2
Rippa           100                5.4    2.234      1.10x10^-2
Rippa           200                5.59   11.281     2.37x10^-3
Rippa           300                6.16   31.297     1.78x10^-3
Rippa           400                6.2    64.485     1.01x10^-3
Rippa           500                5.89   115.922    1.73x10^-6
"Brute Force"   25                 4.26   0.531      3.14x10^-2
"Brute Force"   30                 3.45   0.61       4.03x10^-2
"Brute Force"   35                 3.62   0.672      2.44x10^-2
"Brute Force"   40                 4.18   0.75       1.78x10^-2
"Brute Force"   45                 3.7    0.875      2.20x10^-2
"Brute Force"   50                 4.28   1.015      1.11x10^-2
"Brute Force"   100                4.74   2.453      5.77x10^-3
"Brute Force"   200                9.05   8.894      3.28x10^-4
"Brute Force"   300                5.89   19.64      6.58x10^-5
"Brute Force"   400                5.98   38.282     2.15x10^-5
"Brute Force"   500                5.95   68.453     2.57x10^-6


Conclusions

Given the values shown in Tables 1 and 2, one can immediately confirm that the Leave-One-Out method is very computationally expensive. For only 25 data points, the Leave-One-Out method takes over 30 seconds, and for 100 points it takes over 2700 seconds to run (over 45 min). While it is clear that this is an inefficient method, it can be seen that, in general, as more points were used the resulting RMSE decreased in a manner roughly similar to that exhibited by the other methods. This implies that the concept employed was sound even though the practicality of implementation is very limited.

As expected, the method developed by Rippa is much more efficient than the Leave-One-Out method. For data sets containing 50 data points or fewer, it takes only up to half a second to run and yields RMSE values in the low 10^-2 range. As the number of points increases, the RMSE drops until it reaches 1.730x10^-6 for the Gaussian RBF and 2.270x10^-7 for the Inverse MQ. These values come at a slight computational cost (just under 2 min), but they take nowhere near the time required for the Leave-One-Out method to process 1/5 the number of points.

An unexpected result was the overall performance of the "brute force" method. For smaller datasets (≤ 50 points), it is slightly outperformed by the Rippa method as far as computational time is concerned while yielding only slightly better RMSE values. However, for larger datasets, the "brute force" method starts to outperform the Rippa method for computational time, and relatively speaking it has a very impressive computational time for 500 points at around half that taken by the Rippa method. In general, it also yields smaller RMSE values. As was previously mentioned, direct comparisons must be made carefully because the "brute force" method is using an extra 100 random points during its calculation of the optimal shape parameter. This is essentially equivalent to setting aside 100 points from the dataset to use later for testing the accuracy of the interpolation. With this in mind, one compares the results for the "brute force" method to those corresponding to the Rippa method using 100 more points (i.e. the 400 pt "b.f." to the 500 pt Rippa and so on). From this perspective, when there are enough data points to set aside 100 of them, the "brute force" method seems to be a clearly better choice as far as computational time is concerned and only potentially gives up a slight benefit on accuracy. Certainly, for cases when there are a small number of points available (i.e. fewer than 200, though hard to confirm from this particular set of results), the Rippa method appears to have the upper hand and quickly provides N evaluations of the error for each shape parameter, with N equal to the number of points in the data set.

With this bit of knowledge concerning the potential benefit of the "brute force" method, there are two relatively obvious directions one might be inclined to investigate. First, can the implementation of the Rippa method be made more efficient through more sophisticated programming techniques? However, if this is the case, it might also be possible to make the "brute force" method more efficient as well. Second, can the "brute force" method be optimized for the size of the data set provided? That is to say that perhaps as the data set provided gets smaller, the number of points "set aside" to evaluate the accuracy against should also get smaller. Perhaps this could even be developed in such a way that as the size of the data set provided decreases, the number of points in this subset converges to a single point where the Rippa method offers a clear benefit over the "brute force" method.
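The timing trends discussed above can be related to a rough operation count. This is a sketch under the assumption that each script relies on dense direct linear algebra, where solving or inverting an $n \times n$ system costs on the order of $n^3$ operations; the report does not state which solver calls the scripts use. Per shape parameter, with N data points and M = 100 test points:

$$\text{Leave-One-Out: } N \cdot O\!\left((N-1)^3\right) = O(N^4), \qquad \text{Rippa: } O(N^3), \qquad \text{brute force: } O(N^3) + O(MN).$$

Under that assumption, the Leave-One-Out cost grows one power of N faster than the other two, which matches the qualitative picture in Tables 1 and 2. The Rippa and "brute force" costs share the same order, so their difference at large N would come down to constant factors: forming an explicit inverse takes roughly $2N^3$ floating-point operations, compared with roughly $\tfrac{2}{3}N^3$ for one LU-based solve. The measured times also depend on how many shape parameters each script scanned and on implementation details, so only the growth trend should be read from this estimate.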


References

[1] Shmuel Rippa, An algorithm for selecting a good value for the parameter c in radial basis function interpolation, Advances in Computational Mathematics 11, pgs. 193-210, 1999.

[2] C.S. Chen, Lecture Notes; COS 702; Choosing a good shape parameter using cross validation, Department of Mathematics, University of Southern Mississippi, USA.

[3] C.S. Chen, Y.C. Hon, R.A. Schaback, Scientific Computing with Radial Basis Functions, Department of Mathematics, University of Southern Mississippi, USA.

[4] Tyler Reese, COS702 Assignment 1; Radial Basis Functions, University of Southern Mississippi, September 22nd, 2012.
