Polynomial Curve Fitting
BITS C464/BITS F464
Navneet Goyal
Department of Computer Science, BITS-Pilani, Pilani Campus, India
Polynomial Curve Fitting
Seems a very trivial concept! Why are we discussing it in a Machine Learning course? A simple regression problem! Because it motivates a number of key concepts of ML. Let's discover…
Polynomial Curve Fitting
• Observe a real-valued input variable x
• Use x to predict the value of a target variable t
• Synthetic data generated from sin(2πx)
• Random noise in target values
[Figure: training data, target variable t plotted against input variable x]
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Polynomial Curve Fitting
• N observations of x: x = (x1, …, xN)^T, with targets t = (t1, …, tN)^T
• Goal is to exploit the training set to predict the value of t for a new value of x
• Inherently a difficult problem

Data Generation:
• N = 10, spaced uniformly in range [0, 1]
• Generated from sin(2πx) by adding small Gaussian noise
• Such noise is typical of real data, arising from unobserved variables
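A minimal sketch of this data-generation step in Python; the noise standard deviation 0.3 is an assumption chosen to resemble Bishop's figures, not a value given on the slide:

```python
import numpy as np

# Training data as described on the slide: N = 10 inputs spaced
# uniformly in [0, 1], targets from sin(2*pi*x) plus small Gaussian
# noise. The noise scale 0.3 is an assumed value.
rng = np.random.default_rng(seed=0)

N = 10
x = np.linspace(0.0, 1.0, N)                                # uniform inputs
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)   # noisy targets
```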
Polynomial Curve Fitting
y(x, w) = w0 + w1·x + w2·x^2 + … + wM·x^M = Σ_{j=0}^{M} wj x^j

• M is the order of the polynomial
• Is a higher value of M better? We'll see shortly!
• Coefficients w0, …, wM are collectively denoted by the vector w
• y(x, w) is a nonlinear function of x, but a linear function of the coefficients w
• Such models are called linear models
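The linearity in w is easy to see in code: stacking the powers of x into a design matrix makes y = Φw a matrix-vector product. A minimal sketch (the helper names design_matrix and poly are ours, introduced for illustration):

```python
def design_matrix(x, M):
    """N x (M+1) matrix whose columns are x**0, x**1, ..., x**M."""
    return np.vander(x, M + 1, increasing=True)

def poly(x, w):
    """Evaluate y(x, w) = sum_j w_j * x**j; linear in w."""
    return design_matrix(x, len(w) - 1) @ w
```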
Sum-of-Squares Error Function

E(w) = (1/2) Σ_{n=1}^{N} [ y(xn, w) − tn ]^2

Fitting chooses w* to minimize E(w); the factor of 1/2 is included for later convenience.
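A sketch of the error function and its minimizer, reusing the helpers above; minimizing E(w) is an ordinary linear least-squares problem, solved here with NumPy's generic solver:

```python
def sum_of_squares_error(w, x, t):
    """E(w) = 0.5 * sum_n (y(x_n, w) - t_n)**2."""
    return 0.5 * np.sum((poly(x, w) - t) ** 2)

def fit_polynomial(x, t, M):
    """Minimize E(w): a linear least-squares problem in w."""
    Phi = design_matrix(x, M)
    w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_star
```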
Polynomial curve fitting
[Figure: polynomial fit y(x, w) to the training points]
Polynomial curve fitting
Choice of M? This is called the model selection or model comparison problem.
0th Order Polynomial
Poor representation of sin(2πx)
1st Order Polynomial
Poor representation of sin(2πx)
3rd Order Polynomial
Best Fit to sin(2πx)
9th Order Polynomial
Over Fit: Poor representation of sin(2πx)
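The four fits above can be reproduced by refitting at each order, reusing the earlier sketches; the comment notes what to expect at M = 9:

```python
# Refit at the orders shown above and report the training error.
for M in (0, 1, 3, 9):
    w_star = fit_polynomial(x, t, M)
    # With N = 10 points, M = 9 interpolates the data exactly
    # (training error ~ 0) yet oscillates wildly between them.
    print(f"M={M}: training E(w*) = {sum_of_squares_error(w_star, x, t):.4f}")
```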
Polynomial Curve Fitting
• Good generalization is the objective
• How does generalization performance depend on M?
• Consider a separate test set of 100 points
• Calculate E(w*) for both the training data and the test data
• Choose the M which minimizes E(w*) on the test data

Root Mean Square (RMS) Error:

E_RMS = sqrt( 2 E(w*) / N )

• Sometimes more convenient to use, as the division by N allows us to compare data sets of different sizes on an equal footing
• The square root ensures E_RMS is measured on the same scale (and in the same units) as the target variable t
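A sketch of the train/test comparison, reusing the earlier helpers; following the slide, the test set has 100 points generated the same way as the training data:

```python
# A test set of 100 points, generated the same way as the training data.
x_test = rng.uniform(0.0, 1.0, size=100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def rms_error(w, x, t):
    """E_RMS = sqrt(2 * E(w) / N)."""
    return np.sqrt(2 * sum_of_squares_error(w, x, t) / len(x))

for M in range(10):
    w_star = fit_polynomial(x, t, M)
    print(f"M={M}: train RMS = {rms_error(w_star, x, t):.3f}, "
          f"test RMS = {rms_error(w_star, x_test, t_test):.3f}")
```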
Flexibility & Model Complexity
• M = 0: very rigid! Only 1 parameter to play with
• M = 1: not so rigid! 2 parameters to play with
• So what value of M is most suitable? Any answers?
Over-fitting
• Small M (0, 1, 2): too inflexible to capture the oscillations of sin(2πx)
• M = 3 to 8: flexible enough to capture the oscillations of sin(2πx)
• M = 9: too flexible! Training error = 0, but generalization error is high

Why is this happening?
Polynomial Coefficients
[Table: fitted coefficients w* for increasing M; the coefficient magnitudes grow dramatically as M increases]
Data Set Size (M = 9)
• The larger the data set, the more complex the model we can afford to fit to the data
• A rough heuristic: the number of data points should be no less than 5 to 10 times the number of adaptive parameters in the model
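A quick illustration of this effect, reusing the earlier sketches; N = 15 and N = 100 are the data-set sizes used in Bishop's corresponding figures:

```python
# Fit the M = 9 polynomial to progressively larger data sets and
# evaluate on the held-out test points from the sketch above.
for N_big in (15, 100):
    x_big = np.linspace(0.0, 1.0, N_big)
    t_big = np.sin(2 * np.pi * x_big) + rng.normal(scale=0.3, size=N_big)
    w_star = fit_polynomial(x_big, t_big, 9)
    print(f"N={N_big}: test RMS = {rms_error(w_star, x_test, t_test):.3f}")
```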
Over-fitting Problem
• Should we limit the number of parameters according to the size of the available training set?
• The complexity of the model should depend only on the complexity of the problem!
• Least-squares error minimization represents a specific case of maximum likelihood
• Over-fitting is a general property of maximum likelihood
• The over-fitting problem can be avoided by using the Bayesian approach!
Over-fitting Problem
• In the Bayesian approach, the effective number of parameters adapts automatically to the size of the data set
• In the Bayesian approach, models can have more parameters than the number of data points
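This adaptivity can be illustrated with Bayesian linear regression over the polynomial basis (Bishop, Chapter 3). The sketch below computes the Gaussian posterior over w; the prior precision alpha and noise precision beta are assumed values for illustration, not given on the slides:

```python
# Bayesian linear regression over the polynomial basis: instead of a
# single w*, we get a Gaussian posterior N(w | m_N, S_N) with
#   S_N^-1 = alpha * I + beta * Phi^T Phi
#   m_N    = beta * S_N Phi^T t
# alpha (prior precision) and beta (noise precision) are assumed values.
alpha, beta = 5e-3, 11.1
M = 9
Phi = design_matrix(x, M)
S_N = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t
# Even with M + 1 = 10 parameters and only N = 10 data points, the
# posterior mean m_N stays well behaved: the prior controls the
# effective number of parameters.
print("posterior mean coefficients:", np.round(m_N, 2))
```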
Regularization
Penalize large coefficient values by adding a penalty term to the error function:

Ẽ(w) = (1/2) Σ_{n=1}^{N} [ y(xn, w) − tn ]^2 + (λ/2) ||w||^2

where ||w||^2 = w0^2 + w1^2 + … + wM^2, and the coefficient λ governs the relative importance of the penalty term.
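A minimal sketch of the regularized fit, reusing the earlier helpers. The penalized least-squares problem has the closed-form solution w* = (λI + Φ^T Φ)^{-1} Φ^T t (this is ridge regression):

```python
def fit_polynomial_regularized(x, t, M, lam):
    """Ridge solution w* = (lam*I + Phi^T Phi)^-1 Phi^T t."""
    Phi = design_matrix(x, M)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

# The values ln(lambda) = -18 and ln(lambda) = 0 match Bishop's figures.
for ln_lam in (-18.0, 0.0):
    w_star = fit_polynomial_regularized(x, t, 9, np.exp(ln_lam))
    print(f"ln lambda = {ln_lam:g}: "
          f"test RMS = {rms_error(w_star, x_test, t_test):.3f}")
```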
Regularization: ln λ = −18
[Figure: M = 9 polynomial fit with ln λ = −18; over-fitting is suppressed]
Regularization: ln λ = 0
[Figure: M = 9 polynomial fit with ln λ = 0; the fit is now too smooth]
Regularization: E_RMS vs. ln λ
[Figure: training and test RMS error as a function of ln λ for the M = 9 polynomial]
Polynomial Coefficients
[Table: coefficients w* for M = 9 with varying λ; increasing λ shrinks the coefficient magnitudes]
Take Away from Polynomial Curve Fitting
• Concept of over-fitting
• Model complexity & flexibility

We will keep revisiting these ideas from time to time…