"The Variable Projection method: some applications to Physics"
V. Pereyra
Weidlinger Associates
399 El Camino Real #200
Mountain View, CA 94040
Yale University
International Workshop on Numerical Analysis and QCD
May 1-3, 2007
Plan
Separable Nonlinear Least Squares Problems
The Variable Projection Method
Properties
Applications
– Mossbauer Spectroscopy
– Nuclear Magnetic Resonance Spectroscopy
– Lattice Quantum Chromodynamics
– Highly Accurate Approximation of Difficult Functions
– Neural Networks
– Surrogates
– Example Application: Optimal Design of a Supersonic Business Jet
Separable Nonlinear Least Squares Problems
In 1973, Gene Golub (Stanford University) and I wrote a paper introducing the method of Variable Projections for the solution of separable nonlinear least squares problems:
where the vector 'a' contains the "linear" parameters, 'w' holds the "nonlinear" parameters, 'x' is the vector of input variables, and 'y' is the output variable. The pairs $(x_j, y_j)$ are the data to be fitted in the least squares sense.
Thus these models, $\sum_i a_i\,\varphi_i(x;w)$, are characterized by being linear combinations of nonlinear functions.
This method has had a remarkable number and variety of applications over the last 30 years (see the references below), and I would like to comment briefly on some of them that may be of interest to this audience.

G. H. Golub and V. Pereyra, "The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate". SIAM J. Numer. Anal. 10:413-432 (1973).

G. H. Golub and V. Pereyra, "Separable nonlinear least squares: the Variable Projection method and its applications". Inverse Problems 19:R1-R26 (2003).
$$\min_{a,w}\ \big\|\Phi(x;w)\,a-y\big\|_2^2 \;=\; \min_{a,w}\ \sum_{j=1}^{m}\Big(\sum_{i=1}^{k}a_i\,\varphi_i(x_j;w)-y_j\Big)^{2},$$

where $\Phi(x;w)$ is the $m\times k$ matrix with entries $\varphi_i(x_j;w)$.
Variable Projections Method
The basic idea of the Variable Projection method is to solve "analytically" for the linear parameters in order to eliminate them from the problem:

$$a = \Phi^{+}(w)\,y,$$

where $\Phi^{+}(w)$ is the pseudoinverse of $\Phi(w)$. Since $I-\Phi(w)\Phi^{+}(w)$ is the projector onto the space orthogonal to the columns of $\Phi(w)$, we call the resulting functional the Variable Projection functional.
The method consists of minimizing the modified problem in the nonlinear parameters 'w' first and then obtaining the linear parameters 'a' using the above formula. This implies that no guessed values are needed for the linear parameters and the problem dimension is diminished.
It turns out that the reduced problem is always better conditioned than the original one and the same iterative method converges faster for it. If properly implemented, the iterations have a similar cost, so the VP method is always faster.
An additional problem, also solved in the original paper, comes from the fact that optimization algorithms requiring gradients of the above functional need the derivative of the pseudo-inverse of a matrix whose elements are functions of several variables.
$$\min_{w}\ \big\|\big(\Phi(w)\Phi^{+}(w)-I\big)\,y\big\|_2^{2}.$$
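The elimination above can be sketched numerically. This is a minimal sketch, not the VARPRO code itself: `phi` is a hypothetical user-supplied routine that builds the m-by-k basis matrix, the linear parameters are eliminated by a least-squares solve, and a general-purpose SciPy minimizer stands in for the Gauss-Newton/Marquardt iterations used in the original paper:

```python
import numpy as np
from scipy.optimize import minimize

def vp_residual(w, x, y, phi):
    """Variable Projection functional: squared residual of y after
    projecting onto the column space of Phi(w)."""
    Phi = phi(x, w)                              # m-by-k basis matrix
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # best linear params for this w
    return np.sum((Phi @ a - y) ** 2)

def varpro(x, y, phi, w0):
    """Minimize over the nonlinear parameters w only, then recover a."""
    res = minimize(lambda w: vp_residual(w, x, y, phi), w0, method="Nelder-Mead")
    Phi = phi(x, res.x)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a, res.x

# Example: fit y = a1*exp(-w1*x) + a2*exp(-w2*x); no initial guess for a needed
phi = lambda x, w: np.exp(-np.outer(x, w))
x = np.linspace(0, 4, 50)
y = 2.0 * np.exp(-1.0 * x) + 0.5 * np.exp(-3.0 * x)
a, w = varpro(x, y, phi, w0=[0.8, 2.5])
```

Note how only the two nonlinear parameters are iterated over; the two linear ones fall out of a single least-squares solve per evaluation.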
A Comparison of Methods (in original paper)

Optimization Methods Used in Comparisons
A1: Minimization without derivatives (R. Brent, PRAXIS).
A2: Minimization by Gauss-Newton (G-N) with step control.
A3: Minimization by Marquardt's regularization of G-N.
The method was implemented in a publicly available computer program: VARPRO.
A more modern version was produced at Bell Labs (Gay and Kaufman) and is available at www.nanet.org
An interactive, well documented version was produced at the National Bureau of Standards and has been in extensive use for many years.
Applications: Mössbauer Spectroscopy
Nuclear Magnetic Resonance Spectroscopy
In NMR Spectroscopy, a sample is placed in an external magnetic field and a transient field from an RF coil is used to temporarily drive the various nuclei into a non-equilibrium distribution of magnetic spin states.
As the sample relaxes back to its equilibrium distribution, each type of excited nucleus radiates at a characteristic frequency $f_k$. The sum of all these radiations gives rise to a signal of the form:

$$y_n=\sum_{k} a_k\,e^{i\varphi_k}\,e^{(-d_k+2\pi i f_k)\,t_n}+e_n.$$
Generally the frequencies are known (or can be picked in a pre-processing step) and one wants to determine the amplitudes ak and the damping parameters dk. There are a number of other a priori constraints that reduce the number of free parameters.
The problem of fitting this data is separable, with a model that is a linear combination of complex exponentials.
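Since the frequencies are known, for any trial damping values the complex amplitudes are linear parameters and fall out of a single least-squares solve, which is exactly the separable structure VARPRO exploits. A small synthetic sketch (the names and the data are illustrative, not part of any NMR package):

```python
import numpy as np

def nmr_basis(t, d, f):
    """Columns exp((-d_k + 2*pi*i*f_k) t): damped complex exponentials."""
    return np.exp(np.outer(t, -d + 2j * np.pi * f))

def fit_amplitudes(t, y, d, f):
    """With frequencies f (and trial dampings d) fixed, the complex
    amplitudes a_k are linear parameters: one least-squares solve."""
    B = nmr_basis(t, d, f)
    a, *_ = np.linalg.lstsq(B, y, rcond=None)
    return a

# Synthetic two-peak signal with known frequencies
t = np.arange(256) * 1e-3
f = np.array([50.0, 120.0])            # known frequencies (Hz)
d = np.array([8.0, 15.0])              # damping rates
a_true = np.array([1.0 + 0.5j, 0.3 - 0.2j])
y = nmr_basis(t, d, f) @ a_true
a_est = fit_amplitudes(t, y, d, f)
```

In the full VP treatment the dampings d would be the nonlinear parameters being iterated, with this solve embedded in each evaluation of the reduced functional.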
In vivo NMRS and VP
In 1988 van der Veen et al., at Delft University and Philips Medical Systems, published a very influential paper on the accurate quantification of in vivo NMR data using the Variable Projection method. By August 2002 this paper had more than 200 citations, and as a result VARPRO became the standard tool in this area.
There is a center in Barcelona, with more than 300 paying customers, where a modern version of this software is located and used routinely in many different applications.
J. van der Veen, R. de Beer, P. Luyten and D. van Ormondt, "Accurate quantification of in vivo 31P NMR signals using the Variable Projection method and prior knowledge". Magnetic Resonance in Medicine 6:92-98 (1988).
Lattice Quantum Chromodynamics
G. T. Fleming, "What can lattice QCD theorists learn from NMR spectroscopists?". Proceedings of the Third International Workshop on Numerical Analysis and Lattice QCD, Edinburgh, Scotland (2003).
QCD (continued)
This is again a separable problem involving linear combinations of exponentials.
These problems are ubiquitous and have proven to be very hard, since they are usually ill-conditioned.
The great merit of VP is that the reduced problem is always better conditioned than the original one (see reference).
J. Sjöberg and M. Viberg, "Separable non-linear least squares minimization - possible improvements for neural net fitting". IEEE Workshop on Neural Networks for Signal Processing, Amelia Island Plantation, FL (1997).
Some Challenging Data Fitting Problems From Gregory Beylkin (UC Boulder)
• Beylkin and Monzon have developed some very accurate ways of approximating difficult functions
• Their method combines quadrature formulas, moments, and other techniques, but in some cases the approximants consist of linear combinations of complex exponentials or Gaussians
• I show below how VARPRO performs in some of these tasks.
G. Beylkin and L. Monzón, "On Approximation of Functions by Exponential Sums". Applied and Computational Harmonic Analysis 19 (2005).
Approximating 1/x in [0.01, 1]
By linear combinations of real exponentials; times are on a Pentium 4 PC, 1.5 GHz, running Linux.

# Exponentials   Max. rel. error   Time (sec)
10               2×10⁻⁷            0.39
14               1.8×10⁻⁸          0.46
16               1.8×10⁻⁹          0.57
18               6.5×10⁻¹¹         0.5
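A simplified sketch of this kind of fit (not Beylkin-Monzón's construction, and not VARPRO): holding the exponents fixed at assumed, geometrically spaced values, only the linear coefficients are solved for, which already yields a modest relative error. Optimizing the exponents as well, which is the separable (VARPRO) part, is what produces the very small errors in the table above.

```python
import numpy as np

# Approximate 1/x on [0.01, 1] by a linear combination of real
# exponentials with fixed, geometrically spaced exponents (assumed
# values); only the linear coefficients are fit here.
x = np.linspace(0.01, 1.0, 500)
y = 1.0 / x
s = np.logspace(0, 3.3, 16)                 # trial exponents (assumption)
E = np.exp(-np.outer(x, s))                 # 500-by-16 design matrix

# Weight the rows by 1/y so the least-squares fit targets *relative* error
a, *_ = np.linalg.lstsq(E / y[:, None], np.ones_like(y), rcond=None)
rel_err = np.abs(E @ a / y - 1.0)
```
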
Approximating Bessel's Function J0 in [0, 20]
By a linear combination of 28 damped exponentials (i.e., real parts of complex exponentials), using 2000 uniformly chosen samples. The exact and approximate curves are superposed and indistinguishable.
J0 Fitting Error
Neural Networks: Single Hidden Layer, Fully Connected Perceptron
[Diagram: a network with two inputs x1, x2 and three hidden nodes]

$$\varphi_i(x_1,x_2)=\frac{1}{1+\exp(w_{0i}+w_{1i}x_1+w_{2i}x_2)},\qquad y(x_1,x_2)=\sum_{i=1}^{3}a_i\,\varphi_i(x_1,x_2;w_i)$$
Neural Networks (really)
Neural Networks provide universal approximation functions of n variables that are linear combinations of nonlinear functions, such as sigmoids or arctanh:
$$\varphi(t)=\frac{1}{1+\exp(-t)}\quad\text{(sigmoid)}$$

$$\sum_{i=1}^{k}a_i\,\varphi_i\Big(\sum_{j=1}^{n}w_{ij}x_j\Big)=\sum_{i=1}^{k}a_i\,\varphi_i(\langle w_i,x\rangle)\quad\text{(single-layer perceptron)}$$
V. Pereyra, G. Scherer and F. Wong, 'Variable Projections Neural Network Training'. Mathematics and Computers in Simulation 73:231-243 (2006).
Training the Network
Training the network means finding the linear and nonlinear parameters in order to best fit (usually in the l2 sense) a set of input and output values, the training set:
$$T=\{(x_s,y_s),\ s=1,\dots,m\},\quad\text{with } x_s\in\mathbb{R}^n.$$
So, this is also a separable problem, amenable to VARPRO.
Usually one keeps part of the data separately for testing the resulting Network.
One problem is deciding how many basis functions are necessary for a given accuracy. We choose to run repeatedly with an increasing number of nodes until we detect overfitting.
This process is sometimes referred to as "learning by example".
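The training above can be sketched with the Variable Projection reduction: the linear output weights are eliminated by a least-squares solve inside the loss, so only the hidden-layer (nonlinear) weights are optimized. This is a toy sketch, not the code of the paper cited above; the data, network size, and names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def hidden(X, W):
    """Hidden-layer outputs: sigmoid of the affine input to each node.
    W has one row per hidden node: input weights, then a bias (last)."""
    Z = X @ W[:, :-1].T + W[:, -1]
    return 1.0 / (1.0 + np.exp(-Z))

def vp_loss(wflat, X, y, k):
    """Reduced (Variable Projection) training loss: the output-layer
    weights are linear, so eliminate them with a least-squares solve."""
    W = wflat.reshape(k, X.shape[1] + 1)
    H = hidden(X, W)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.sum((H @ a - y) ** 2)

# Toy training set: a smooth target function of two inputs (assumed data)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.tanh(X[:, 0] + 0.5 * X[:, 1])

k = 3                                         # hidden nodes
w0 = rng.normal(scale=0.5, size=k * (X.shape[1] + 1))
res = minimize(vp_loss, w0, args=(X, y, k), method="BFGS")
```

Only the k*(n+1) nonlinear weights enter the optimizer; no initial guess for the output-layer weights is needed, exactly as in the separable problems above.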
Surrogates
In applications (such as optimal design) where one wants to evaluate an expensive function many times (say, the output of a large-scale simulation), it is convenient to first create a database of inputs and outputs and use it to generate a faithful approximation that can be evaluated rapidly.
These approximations are called response surfaces, proxies, or surrogates, and the process is called fast model evaluation.
Neural Networks can be used to provide a general surrogate, given a data base of inputs and outputs that can be used to train and test the network.
Design requirements for a supersonic aircraft (objectives and constraints):
– 8,000 lb payload
– Cruise Mach number = 1.8
– Take-off and landing field lengths < 6,000 ft
– Ultimate load factor = 2.5 for pull-up maneuver
– 2 modern turbofan engines
Design parameterization
– Wing parameters: sweep, aspect ratio, position, reference area
– Mission parameters: Maximum Take-Off Gross Weight, Initial and final cruise altitudes
Example Design Task
Example Design Task (Cont'd)
Multi-objective optimization problem. By varying the parameters of the problem, simultaneously:
– Minimize the maximum sonic boom perceived loudness (in dBA with rise time modification)
– Maximize the range (in nmi) of the aircraft
Mission module accounts for cruise, subsonic climb, supersonic acceleration and take-off and landing segments.
Multiphysics simulation:
– Aerodynamics
– Aeroelasticity (structural analysis)
– Control
Surrogate Functions
Inputs:
1. Wing reference area
2. Wing aspect ratio
3. Longitudinal position of the wing
4. Wing sweep angle
5. Lift coefficient at initial cruise
6. Lift coefficient at final cruise
7. Altitude at initial cruise
8. Altitude at final cruise
Outputs:
• Drag coefficient at initial cruise
• Sonic boom initial rise at initial cruise
• Sonic boom sound level at initial cruise w/out rise time modification
• Sonic boom sound level at initial cruise w/ first type rise time modification
• Sonic boom sound level at initial cruise w/ third type rise time modification
• Drag coefficient at final cruise
• Sonic boom initial rise at final cruise
• Sonic boom sound level at final cruise w/out rise time modification
• Sonic boom sound level at final cruise w/ first type rise time modification
• Sonic boom sound level at final cruise w/ third type rise time modification
Results of Training for Output #3

The NN results for output #3, "Sonic boom sound level at initial cruise w/out rise time modification", are shown in the next table.
We look for an increase in the RMS error of either the training or the testing set as the number of nodes grows, but we also monitor the maximum error and the size of the linear coefficients of the network to judge the quality of the fit.
The output values are in the range 85-95. We scale the input variables to [-1, 1].
Since the problem is non-convex, 40 random initial vectors of nonlinear parameters are chosen and VARPRO is run from each one of them. The solution with the smallest RMS is selected.
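The multistart strategy can be sketched generically. A minimal sketch with a toy non-convex objective standing in for the reduced VP functional; the names `multistart` and `loss` are illustrative, not VARPRO's interface:

```python
import numpy as np
from scipy.optimize import minimize

def multistart(loss, dim, n_starts=40, seed=0):
    """Run a local minimizer from many random initial nonlinear-parameter
    vectors and keep the local solution with the smallest objective."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        res = minimize(loss, rng.normal(size=dim), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Toy non-convex objective with many local minima (global minimum 0 at 0)
loss = lambda w: float(np.sum(np.sin(3 * w) ** 2 + 0.1 * w ** 2))
best = multistart(loss, dim=2, n_starts=10)
```
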
Results of Training for Output #3
Results on a 3 GHz, 64-bit dual-processor Intel Xeon.

# Nodes   RMS training   RMS testing   Time (sec)   Max. res.   Max. coeff.
 2        0.75            1.65           12.56       4
 3        0.69           10.43           44.87       4
 4        0.57            1.23           68.9        2
 5        0.58            1.09          106.7        4
 6        0.51            1.2           190          3
 7        0.46            1.11          334          1.8         10⁸
 8        0.46            1.47          452          2.8
 9        0.47            0.94          649          2.5
10        0.44            1.07          661          1.5         10⁴
12        0.35            1.12         1097          2
Visualization
An unresolved problem (as far as I know) is the meaningful visualization of functions of many variables (say, more than 5; in this case 8).
In the next slide we show some views of response surface #3 (sonic boom sound level at initial cruise) as a function of pairs of input variables, keeping the others fixed. This is a notoriously nonlinear function with sharp discontinuities (shock fronts).
The NN chosen has 10 sigmoids (i.e., 90 nonlinear parameters and 11 linear ones).
We recall ...
Multi-objective optimization problem. By varying the parameters of the problem, simultaneously:
– Minimize the maximum sonic boom perceived loudness (in dBA with rise time modification)
– Maximize the range (in nmi) of the aircraft
– The take off and landing length are restricted:
Take-off and landing field lengths < 6,000 ft
Optimization
We use an evolutionary algorithm to obtain Pareto equilibrium points for the bi-objective optimization problem.
The costly aerodynamics and boom calculations are replaced by the surrogate functions.
We use Deb's algorithm, which tends to locate points on the boundary of the Pareto set.
These points are then analyzed and an appropriate compromise is chosen.
This compromise design is then verified with the high-fidelity tools.
Some new algorithms under development promise to be two orders of magnitude faster.
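As a minimal illustration of the Pareto machinery (a nondominated-set filter over a sample of surrogate evaluations, not Deb's NSGA-II), assuming both objectives are cast as minimizations, so the range is negated:

```python
import numpy as np

def pareto_front(F):
    """Boolean mask of the nondominated rows of F, where each row holds
    the objective values of one design (all objectives minimized)."""
    n = F.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        # j dominates i if F[j] <= F[i] in every objective, < in at least one
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        keep[i] = not np.any(dominates_i)
    return keep

# Toy bi-objective sample: minimize loudness, maximize range (negated);
# the numbers are synthetic stand-ins for surrogate outputs
rng = np.random.default_rng(1)
loudness = rng.uniform(80, 100, 200)
neg_range = -rng.uniform(3000, 5000, 200)
F = np.column_stack([loudness, neg_range])
mask = pareto_front(F)
```

An actual evolutionary algorithm would iterate selection and variation around such a filter; here it only extracts the front from a fixed sample.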
V. Pereyra, Fast Computation of Equispaced Pareto Manifolds and Pareto Fronts for Multi-objective Optimization Problems. To appear in Mathematics and Computers in Simulation, 2007.
K. Deb, A. Pratap, S. Agrawal and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA-II". Technical Report 2000001, Indian Institute of Technology Kanpur, India (2000).
J. J. Alonso, P. LeGresley and V. Pereyra, 'Aircraft design optimization'. To appear in Mathematics and Computers in Simulation (2007).
Open circles (64): initial population; only blue circles are feasible;
Asterisks: Pareto equilibrium front (blue for feasible designs).
Pareto Fronts
By the way …
This paper has an early proof of the convergence of nested iterations. It gives the recipe for quadratic convergence of inexact Newton methods for large problems, where an inner iterative method is used to solve the linear systems. This is usually attributed to: R. S. Dembo, S. C. Eisenstat and T. Steihaug, "Inexact Newton methods". SIAM J. Numer. Anal. 19:400-408 (1982).
And by the way …
This contains a stable O(n²) method for solving (square) Vandermonde systems.
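A minimal sketch in the spirit of the Björck-Pereyra idea for the primal system $Vc=f$ with $V_{ij}=\alpha_i^{\,j}$ (polynomial interpolation): divided differences produce the Newton form in one O(n²) pass, and a Horner-type pass converts it to monomial coefficients, never forming or factoring V.

```python
import numpy as np

def bp_solve(alpha, f):
    """O(n^2) solve of the Vandermonde system V c = f, where
    V[i, j] = alpha[i]**j: divided differences give the Newton form of
    the interpolating polynomial, then a Horner-type pass converts the
    Newton coefficients to monomial coefficients (ascending powers)."""
    c = np.array(f, dtype=float)
    n = len(c)
    # Stage 1: Newton divided differences, computed in place
    for k in range(n - 1):
        for i in range(n - 1, k, -1):
            c[i] = (c[i] - c[i - 1]) / (alpha[i] - alpha[i - k - 1])
    # Stage 2: Newton form -> monomial coefficients
    for k in range(n - 2, -1, -1):
        for i in range(k, n - 1):
            c[i] -= alpha[k] * c[i + 1]
    return c

# Interpolate p(x) = 2 - 3x + x^2 at three nodes and recover it
alpha = np.array([1.0, 2.0, 3.0])
f = 2 - 3 * alpha + alpha**2
c = bp_solve(alpha, f)          # -> approximately [2, -3, 1]
```

The point of the two-stage structure is that it costs O(n²) operations and tends to be far more accurate than forming V and calling a general dense solver.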