Curve Fitting Best Practice Part 5
Enabling Science
IDBS • Unit 2 • Occam Court • Surrey Research Park • Guildford • Surrey GU2 7QB • UK
t: +44 1483 595000 • e: [email protected] • w: http://www.idbs.com
Curve Fitting Best Practice Part 5: Robust Fitting and Complex Models
Most researchers are familiar with standard kinetics, Michaelis-Menten and dose response curves, but many more
modern analysis techniques are available that allow you to get greater value from data. This article
discusses the methods used in curve fitting today, including Iteratively Re-weighted Least Squares (IRLS), which is
also known as robust fitting. The constraints of this technique are explored, along with the reasons why robust
fitting, introduced some 20 years ago, is now more widely accepted and used. The principles behind
complex models, and how they can be applied, are also discussed.
Quick introduction to weights
By default, equal weight is given to every data point in a curve fit. To determine the best fit, the
Levenberg-Marquardt algorithm (LVM) minimizes the sum of squares of the vertical distances (residuals) between the
observed data and the fitted curve.
By default, LVM minimizes:
$$\sum \left( Y_{\mathrm{data}} - Y_{\mathrm{fit}} \right)^2$$
Unequal weighting can be assigned according to any scheme.
If weight values are assigned, LVM minimizes:
$$\sum \left[ \left( Y_{\mathrm{data}} - Y_{\mathrm{fit}} \right) / \mathrm{Weight} \right]^2$$
The lower the weight value (the closer to 0), the greater that point's bearing on the fit.
Unequal weight can be assigned to data points within a certain tolerance, so that all points are included in the
analysis but those given less weight have less bearing on the ultimate result.
For example, an instrument may guarantee a high level of accuracy within a certain data range, but
when the limits of that range are exceeded, the accuracy of the instrument decreases. In this
scenario, more weight can be given to the data points within the instrument's guaranteed range. Data points outside
that range are still included in the analysis, but they have less bearing on the ultimate result.
A set of weighting values can be applied that reflects that assumption and reduces the impact of any outliers
outside the tolerance ranges in the fitting process.
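The two sums above, and the instrument-range example, can be sketched in a few lines of Python. This is a minimal illustration and not any product's implementation: the function names, the sample values and the choice of 5.0 as the weight value outside the trusted range are all assumptions made for the example. It follows the convention stated above, in which the residual is divided by the weight value, so a larger value means less bearing on the fit.

```python
def weighted_sum_of_squares(y_data, y_fit, weights=None):
    """Return sum(((y - f) / w) ** 2); with no weights every point uses w = 1."""
    if weights is None:
        weights = [1.0] * len(y_data)
    return sum(((y - f) / w) ** 2 for y, f, w in zip(y_data, y_fit, weights))


def range_weights(x_values, low, high, outside_weight=5.0):
    """Hypothetical scheme: weight value 1 inside the trusted range, larger outside."""
    return [1.0 if low <= x <= high else outside_weight for x in x_values]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [1.1, 2.0, 2.9, 4.2, 8.0]  # the last reading lies outside the trusted range
y_fit = [1.0, 2.0, 3.0, 4.0, 5.0]   # values from some candidate curve

w = range_weights(x, low=1.0, high=4.0)
unweighted = weighted_sum_of_squares(y_data, y_fit)
weighted = weighted_sum_of_squares(y_data, y_fit, w)
print(unweighted, weighted)
```

With equal weighting the last point dominates the sum; with the larger weight value it is still counted, but contributes far less.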
IRLS (Robust Fitting)
Standard regression analysis is very sensitive to outliers, and even a single outlier can affect results considerably, as
shown in Fig 1 below. Knocking out an individual outlier improves the curve fit considerably, as shown in Fig 2.
IDBS 2008 Page 2 of 7
Robust fitting is an extension of standard regression (standard non-linear Least Squares Fitting (LSF)) that can
down-weight individual outliers in a data set and neutralize their effect on the ultimate result.
Robust fitting was introduced about 20 years ago but was not initially widely accepted because of the many
competing techniques available at the time and a lack of understanding of the most appropriate way to use it.
Another reason for the general reluctance to adopt IRLS was its computationally intensive nature. Standard
non-linear LSF could be worked through on paper using standard math techniques, but
IRLS was much harder to perform in the same way. Early curve-fitting software packages were
not able to employ robust fitting, making the technique and its algorithms mostly unavailable to the mainstream.
Fig 1: Even one outlying data point can significantly affect the quality of a fit
Fig 2: Knocking out the outlier considerably improves results
IRLS (Robust Fitting)
A fitting process is iterative: on each iteration, the fitting algorithm changes the parameter values based on the data
set provided in order to converge on the best result.
Robust fitting introduces another variable to the fitting process by varying the weights of individual data
points as well as the parameter values. On each cycle of the iteration, the weighting value for each data point is
changed to enable the fit to converge on the best fit for the data. If there is an outlier in the data set, it will be
significantly down-weighted to achieve a more robust and better fit for the rest of the data set.
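The re-weighting cycle described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular package: it fits a straight line with a closed-form weighted fit instead of running LVM on a non-linear model, it uses Tukey's Biweight with the commonly quoted tuning constant 4.685, and it estimates the scatter from the median absolute residual. Note that here the weight multiplies the squared residual (1 means full bearing, 0 means none), which is the reciprocal sense of the weight values introduced earlier. All function names and data are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = a + b*x (w multiplies squared residuals)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def tukey_weight(residual, scale, c=4.685):
    """Tukey's Biweight: falls smoothly from 1 to 0, and is 0 beyond c * scale."""
    u = residual / (c * scale)
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def irls_line_fit(x, y, iterations=20):
    """Alternate between fitting the parameters and re-weighting every point."""
    w = [1.0] * len(x)  # first pass is an ordinary equal-weight fit
    for _ in range(iterations):
        a, b = weighted_line_fit(x, y, w)
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scatter estimate; the tiny floor only guards against division by zero.
        scale = max(median([abs(r) for r in residuals]) / 0.6745, 1e-12)
        w = [tukey_weight(r, scale) for r in residuals]
    return a, b, w


x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, 0.0, -0.1, 0.05, -0.05, 0.1]
y = [2.0 + 0.5 * xi + ni for xi, ni in zip(x, noise)]
y[5] = 20.0  # a single gross outlier

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_irls, b_irls, weights = irls_line_fit(x, y)
print("standard fit:", a_ols, b_ols)
print("IRLS fit:    ", a_irls, b_irls)
```

In this sketch the outlier ends up with a weight of 0 while the remaining points keep weights close to 1, so the IRLS line follows the bulk of the data.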
There are many IRLS techniques available, but the six most commonly used are:
• Tukey’s Biweight*
• Andrews’ Sine*
• Geman-McClure
• Huber
• Welsch
• Cauchy
*Not defined over the complete error space, resulting in outliers being ‘removed’
Tukey’s Biweight and Andrews’ Sine are the most commonly used. Because they are not defined over the whole
error space, these two techniques differ slightly from the other four: when employing Tukey’s
Biweight or Andrews’ Sine, a data point that is given a sufficiently low weighting value is construed
as an outlier and removed from the data set. This occurs in curve-fitting applications such as XLfit when a user
chooses to automatically remove outlying points from a data set.
Note: The other four techniques down-weight outlying points so that they have little or no bearing on the fit, which is
effectively equivalent to knocking them out. It is possible to combine IRLS with manual outlier knock-out where appropriate.
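To make the difference between the two groups concrete, the sketch below writes out commonly quoted textbook forms of the six weight functions. These forms are an assumption for illustration and may differ from those used in XLfit or any other package. The argument `u` is a hypothetical scaled residual (the residual divided by a tuning constant and a scatter estimate), and the returned weight multiplies the squared residual, so 1 means full bearing on the fit and 0 means the point is effectively removed.

```python
import math


def tukey_biweight(u):
    """Exactly 0 beyond |u| = 1, so distant points are removed."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def andrews_sine(u):
    """Exactly 0 beyond |u| = pi, so distant points are removed."""
    if u == 0.0:
        return 1.0
    return math.sin(u) / u if abs(u) <= math.pi else 0.0


def geman_mcclure(u):
    return 1.0 / (1.0 + u * u) ** 2


def huber(u):
    """Full weight near the curve, then weight falling as 1/|u|."""
    return 1.0 if abs(u) <= 1.0 else 1.0 / abs(u)


def welsch(u):
    return math.exp(-u * u)


def cauchy(u):
    return 1.0 / (1.0 + u * u)


schemes = [tukey_biweight, andrews_sine, geman_mcclure, huber, welsch, cauchy]

# Tabulate each weight for points progressively further from the curve.
for u in (0.0, 0.5, 1.0, 2.0, 5.0):
    row = "  ".join(f"{f.__name__}={f(u):.4f}" for f in schemes)
    print(f"u={u:3.1f}  {row}")
```

Under these forms, Tukey’s Biweight and Andrews’ Sine return exactly 0 once a point is far enough from the curve, whereas the other four shrink towards 0 without reaching it.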
In the IRLS fitting scenarios below in Fig 3, Tukey’s Biweight is applied to three different sets of data, which are
all well defined but contain easily identifiable outliers. IRLS has removed these data points from the set, making
manual interaction unnecessary because the fit is of sufficient quality to be confident in the results produced.
Note: These data sets are well defined from a data perspective and are complete, with a reasonably high number of
data points. Applying IRLS to well-formed data sets enables the analysis to be of sufficient quality and the process
to produce accurate results. Much like standard non-linear LSF, robust fitting does not work if there are any errors in
the X value.
For a data set with a large amount of scatter, IRLS has to change the weight of every point on each iteration,
and it is very difficult for the fitting process to converge on a single best fit for such data. IRLS
requires that the fitted data is of sufficient quality; otherwise it is prone to failure. It is recommended that
IRLS is always used in conjunction with other data quality checks to ensure good results.
IRLS (Robust Fitting)
The graph below (Fig 4) illustrates how the IRLS process works. Assume that the vertical axis shows the level of
impact that an individual outlier has on the curve fit as its residual value, the distance from the fitted curve, increases.
The blue line, which proceeds to infinity, is standard least-squares regression: the further the point is from the fitted
line, the higher its outlier status and the greater its impact on the curve fit. The red line is the IRLS fit. For one given
individual outlier in the data set, as its outlier status increases, its impact on the fit decreases and eventually reaches 0.
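One common way to formalize "impact on the fit" is the influence function: the residual multiplied by its weight. The sketch below assumes that reading of the graph and uses Tukey’s Biweight as the IRLS scheme; the cutoff of 1.0 and the function names are illustrative only.

```python
import math


def least_squares_influence(r):
    """Under standard least squares the influence grows without limit."""
    return r


def tukey_influence(r, c=1.0):
    """Under Tukey's Biweight: r * (1 - (r/c)^2)^2 inside the cutoff c, 0 outside."""
    u = r / c
    return r * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


cutoff = 1.0
peak = cutoff / math.sqrt(5.0)  # residual at which the biweight influence is largest

for r in (0.0, 0.2, peak, 0.8, 1.0, 2.0, 10.0):
    print(f"residual={r:6.3f}  "
          f"least squares={least_squares_influence(r):6.3f}  "
          f"Tukey={tukey_influence(r, cutoff):6.3f}")
```

The least-squares column keeps rising with the residual, while the Tukey column rises, peaks, and then falls back to 0 at the cutoff.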
Fig 3: IRLS fitting improves the accuracy of fit results when applied to three different data sets
Complex models
Data fitting and analysis is not confined to basic Michaelis-Menten and dose response models. Complex
modelling can be used to analyze different types of data using standard non-linear LSF. The example below in Fig 5
shows time-controlled drug delivery with a number of different parameters being measured, while a drug is
administered in pulses at different time points. The graph shows absorption of the drug into the blood
stream over time, indicated by the wave-like fit, allowing the researcher to determine the cycle and rate at which the
drug is distributed.
Fig 4: The impact of IRLS on an outlier compared to standard least-squares regression
Fig 5: Analyzing the cycle and rate at which drug absorption occurs
Composite models such as those shown in Fig 6 allow us to analyze a data set using two different models. For
example, the researcher fits the first model up to the point in time at which the data points start to go back down, where the
model is changed to analyze a different phase within the data. Although this is a complex model, it allows the
researcher to fit results with a high degree of confidence.
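As a rough sketch of the idea, a composite model can be written as a single function that switches from one sub-model to another at a chosen time. The two phases used here (a saturating rise followed by an exponential decay), the parameter names and the sample values are assumptions for illustration; they are not the models shown in Fig 6.

```python
import math


def composite_model(t, plateau, k_up, t_switch, k_down):
    """Two-phase model: saturating rise until t_switch, exponential decay afterwards."""
    def rise(time):
        return plateau * (1.0 - math.exp(-k_up * time))

    if t <= t_switch:
        return rise(t)
    # Start the decay from the value reached at the switch so the curve is continuous.
    return rise(t_switch) * math.exp(-k_down * (t - t_switch))


times = [0.0, 1.0, 2.0, 4.0, 6.0, 8.0]
curve = [composite_model(t, plateau=10.0, k_up=0.8, t_switch=4.0, k_down=0.5)
         for t in times]
print(curve)
```

Because the whole curve is one function of its parameters, it could be passed to a standard non-linear least-squares routine in the same way as a single-phase model.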
Fig 7 below shows a common scenario where data is fitted to a standard dose response curve but the data points
start decreasing at the end of the measurements. The researcher can set up a technique to remove those final data
points, such as applying an IRLS fitting technique to eliminate the points that start to drop off.
Alternatively, the researcher can use a model that has been constructed to tackle this kind of scenario. A
bell-shaped dose response model fits both the rising and the falling part of the data, so that parameters C1 and
C2 can be extracted as the EC50 values for these two linked dose response curves, with measurable slope factors
for both curves. Bell-shaped models provide an effective means of analyzing and interpreting a whole set of data,
as opposed to having to reject data points.
A scenario such as this comes up frequently in standard dose response analysis. If the last six points of this
example were knocked out and a standard dose response curve fitted to the remaining data, the results would be
similar or identical to those for the first curve in the bell-shaped model.
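The relationship between the two models can be illustrated with one possible parameterization of a bell-shaped curve: a rising dose response term multiplied by a falling one. This form is an assumption made for the sketch and is not necessarily the model used to produce Fig 7; the parameter names `c1`, `c2`, `slope1` and `slope2` simply mirror C1, C2 and the two slope factors mentioned above. In this form, C1 and C2 approximate the two EC50 values only when they are well separated.

```python
def dose_response(x, bottom, top, c, slope):
    """Standard four-parameter dose response curve rising from bottom to top."""
    return bottom + (top - bottom) / (1.0 + (c / x) ** slope)


def bell_shaped(x, bottom, top, c1, slope1, c2, slope2):
    """Hypothetical bell-shaped form: a rise around c1 followed by a fall around c2."""
    rise = 1.0 / (1.0 + (c1 / x) ** slope1)
    fall = 1.0 / (1.0 + (x / c2) ** slope2)
    return bottom + (top - bottom) * rise * fall


params = dict(bottom=0.0, top=100.0, c1=1.0, slope1=1.0, c2=1.0e4, slope2=1.0)

# Compare the two models at low concentrations and follow the bell curve further out.
for x in (0.01, 0.1, 1.0, 10.0, 1.0e4, 1.0e8):
    standard = dose_response(x, 0.0, 100.0, 1.0, 1.0)
    bell = bell_shaped(x, **params)
    print(f"x={x:12.2f}  standard={standard:8.3f}  bell={bell:8.3f}")
```

At concentrations well below C2 the two curves are almost indistinguishable, which is consistent with the observation above; at higher concentrations the bell-shaped curve falls back towards the bottom value.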
Fig 6: Fitting composite data
Summary
IRLS provides an advanced technique for reducing and neutralizing the effects of outliers in a fit. By weighting
individual data points, IRLS can increase the accuracy of fit results compared to those achieved using standard
regression (standard non-linear LSF). Both techniques, however, must be applied to a well-defined and complete
data set in order to produce quality results.
Curve fitting is a flexible process offering a range of data analysis types, and researchers do not have to be
constrained by standard analysis techniques. Complex models provide a variety of ways of applying data analysis to
extract the required results in varying scenarios. They extend data fitting and analysis beyond basic
Michaelis-Menten and dose response models and can be used in a wide range of applications.
Fig 7: A bell-shaped dose response model producing two fit results