
Spline Regression Models

• Things to Keep in Mind when Fitting Polynomial Regression Models

• Piecewise Polynomials (Splines)

• Applying Variable Selection Methods to Choose Knots


Important Considerations when Fitting Polynomials to Data

• Example (paper):

• How does the speed of a paper mill machine affect the quality of the finished product?

• Measurements of the amount of green liquor produced are recorded for various speeds.


Paper Mill Example (Cont’d)

Data:


> paper

   green.liquor machine.speed
1          16.0          1700
2          15.8          1720
3          15.6          1730
4          15.5          1740
5          14.8          1750
6          14.0          1760
7          13.5          1770
8          13.0          1780
9          12.0          1790
10         11.0          1795

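(For readers following along, the paper data frame can be reconstructed from the listing above; the course presumably loads it from a file.)

paper <- data.frame(
  green.liquor  = c(16.0, 15.8, 15.6, 15.5, 14.8, 14.0, 13.5, 13.0, 12.0, 11.0),
  machine.speed = c(1700, 1720, 1730, 1740, 1750, 1760, 1770, 1780, 1790, 1795))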

Example (Cont’d)

> attach(paper)

> plot(green.liquor ~ machine.speed) # paperplot.pdf

[Scatter plot of green.liquor vs. machine.speed (paperplot.pdf)]

What happens if we use a linear model?

Fit the model and check the residual plot:

> paper.lm <- lm(green.liquor ~ machine.speed)
> plot(paper.lm, which=1, pch=16) # paperres.pdf

[Residuals vs Fitted plot for lm(green.liquor ~ machine.speed) (paperres.pdf)]

Try a quadratic model

> paper.lm2 <- lm(green.liquor ~ machine.speed +
    I(machine.speed^2))

> summary(paper.lm2)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.709e+03 2.448e+02 -6.984 0.000215

machine.speed 2.023e+00 2.798e-01 7.230 0.000173

I(machine.speed^2) -5.929e-04 7.994e-05 -7.417 0.000147

Residual standard error: 0.2101 on 7 degrees of freedom

• Fitted model:

ŷ = −1709 + 2.02x − .00059x²

(x = machine speed, y = amount of green liquor); error standard deviation estimate = .21.

Is this model satisfactory?

Check the residual plot again:

> plot(paper.lm2, which=1, pch=16) # paper2res.pdf

[Residuals vs Fitted plot for the quadratic model (paper2res.pdf)]

Overlaying the Data with the Fitted Curve

> plot(green.liquor ~ machine.speed)

> quadline(paper.lm2) # paper2plot.pdf

[Quadratic Fitted to Paper Data (paper2plot.pdf)]
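(quadline() is not a base R function; it is presumably a course helper. A minimal sketch of what it likely does, assuming a fit with coefficients ordered as intercept, x, x²:)

quadline <- function(fit, n=200, ...) {
  # add the fitted quadratic curve to the current plot
  u <- par("usr")
  x <- seq(u[1], u[2], length=n)
  b <- coef(fit)
  lines(x, b[1] + b[2]*x + b[3]*x^2, ...)
}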

Are there any Influential Observations?

Check Cook’s Distance:

> plot(paper.lm2, which=4, pch=16) # paper2cook.pdf

[Cook's distance plot for the quadratic model (paper2cook.pdf); observation 10 stands out]

Check observation 10 more closely:

Influence on the coefficients and on the fitted values:

> dfbetas(paper.lm2)[10,]

(Intercept) machine.speed I(machine.speed^2)

-1.383708 1.395116 -1.406660

> dffits(paper.lm2)[10]

10

-2.258152

Observation 10 is highly influential.


Important Considerations when Fitting Polynomials to Data

• Order of the Model

Keep this as low as possible – parsimony

• Example: titanium heat data - 49 observations on g and temperature.

Quadratic fit:

> attach(titanium)

> titanium.lm2 <- lm(g ~ poly(temperature,2))

> plot(titanium, pch=16)

> lines(spline(temperature,predict(titanium.lm2)),

col=4, lwd=2)


Titanium Example (Cont’d): A Failure for the Quadratic Polynomial Model

A pretty miserable fit:

[Plot of g vs. temperature with the quadratic fit overlaid]

Does a higher order polynomial fit better?

• 5th order fit:

> titanium.lm5 <- lm(g ~ poly(temperature,5))

> plot(titanium, pch=16)

> lines(spline(temperature,predict(titanium.lm5)),

col=4, lwd=2)


Titanium Heat Data Example (Cont’d): Attempting to Fit Using High Degree Polynomials

• 5th order model:

[Plot of g vs. temperature with the 5th order fit overlaid]

Example (Cont’d)

• 21st order fit:

> titanium.lm21 <- lm(g ~ poly(temperature,21))

> plot(titanium, pch=16)

> lines(spline(temperature,predict(titanium.lm21)),

col=4, lwd=2)


Titanium Heat Data Example (Cont’d): High Degree Polynomial Regression is Futile

• 21st order model:

[Plot of g vs. temperature with the 21st order fit overlaid]

Better to use piecewise polynomials (splines)

Important Considerations (Cont’d)

• Ill-Conditioning

• Example: Consider the Hilbert matrix H_p, with entries h_ij = 1/(i + j − 1), i, j = 1, …, p:

hilbert <- function(n=2){

matrix(1/(rep(seq(1,n),n)+

rep(seq(0,n-1),rep(n,n))),ncol=n)

}

> hilbert(2)

[,1] [,2]

[1,] 1.0 0.500

[2,] 0.5 0.333


Ill-Conditioning (Cont’d)

> hilbert(4)

[,1] [,2] [,3] [,4]

[1,] 1.000 0.500 0.333 0.250

[2,] 0.500 0.333 0.250 0.200

[3,] 0.333 0.250 0.200 0.167

[4,] 0.250 0.200 0.167 0.143


Ill-Conditioning (Cont’d)

• The Hilbert matrix is famous for being ill-conditioned.

• The ratio of the largest eigenvalue to the smallest eigenvalue is very large.

• Matrix inversion is unstable, since the smallest eigenvalue is numerically indistinguishable from 0.

• The inverse of the 2 × 2 Hilbert matrix:

solve(hilbert(2))

[,1] [,2]

[1,] 4 -6

[2,] -6 12

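(A quick look at the conditioning, using the hilbert() function defined above; eigen() and kappa() are base R:)

> eigen(hilbert(7))$values  # smallest eigenvalue is near machine precision
> kappa(hilbert(7))         # estimated condition number: very large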

Ill-Conditioned Matrices (Cont’d)

• Try to invert the 7 × 7 Hilbert matrix:

solve(hilbert(7))

Error in solve.default(hilbert(7)) : singular matrix ‘a’ in solve

• From a numerical point of view, large Hilbert matrices appear singular.

• The determinant of a matrix is equal to the product of the eigenvalues; since the Hilbert matrix is positive definite, all of its eigenvalues are strictly positive, so in exact arithmetic every Hilbert matrix is invertible (nonsingular).


Ill-Conditioned Matrices (Cont’d)

• The inverse of the Hilbert matrix can be computed exactly using the following function:

inverse.hilbert <- function(n) {
  Hinv <- matrix(0, nrow=n, ncol=n)
  for (i in 1:n) {
    for (j in 1:n) {
      Hinv[i,j] <- (-1)^(i+j)*(i+j-1)*
        choose(n+i-1,n-j)*choose(n+j-1,n-i)*choose(i+j-2,i-1)^2
    }
  }
  Hinv
}

> inverse.hilbert(7)
       [,1]     [,2]      [,3]      [,4]       [,5]       [,6]      [,7]
[1,]     49    -1176      8820    -29400      48510     -38808     12012
[2,]  -1176    37632   -317520   1128960   -1940400    1596672   -504504
[3,]   8820  -317520   2857680 -10584000   18711000  -15717240   5045040
[4,] -29400  1128960 -10584000  40320000  -72765000   62092800 -20180160
[5,]  48510 -1940400  18711000 -72765000  133402500 -115259760  37837800
[6,] -38808  1596672 -15717240  62092800 -115259760  100590336 -33297264
[7,]  12012  -504504   5045040 -20180160   37837800  -33297264  11099088


Example (Cont’d)

• We can check that this is correct by multiplying with the Hilbert matrix of size 7:

> sum(hilbert(7)[1,]*inverse.hilbert(7)[1,])
[1] 1
> sum(hilbert(7)[1,]*inverse.hilbert(7)[2,])
[1] 0
> sum(hilbert(7)[1,]*inverse.hilbert(7)[3,])
[1] 0
> sum(hilbert(7)[1,]*inverse.hilbert(7)[4,])
[1] 0
> sum(hilbert(7)[1,]*inverse.hilbert(7)[5,])
[1] 0
> sum(hilbert(7)[1,]*inverse.hilbert(7)[6,])
[1] 0
> sum(hilbert(7)[1,]*inverse.hilbert(7)[7,])
[1] 0


Is the Hilbert Matrix just an Artificial Example?

• Consider the following regression problem:

y = β₀ + β₁x + ⋯ + β_p xᵖ + ε

where the x values have been taken equally spaced on the interval from 0 to 1: e.g. if n = 10, x = .1, .2, …, 1.

• As n → ∞, (1/n)XᵀX → H_{p+1} (the Hilbert matrix of order p + 1, since X has the columns 1, x, …, xᵖ).
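(A quick numerical check of this limit, for p = 2:)

> n <- 1000; x <- seq(1/n, 1, length=n)
> X <- outer(x, 0:2, "^")   # design columns 1, x, x^2
> round(crossprod(X)/n, 3)  # approximately hilbert(3)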


Multicollinearity

• Consider the paper example again:

> paper.lm3 <- lm(green.liquor ~ machine.speed +
    I(machine.speed^2) + I(machine.speed^3))

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 6.75e+03 1.80e+04 0.38 0.72

machine.speed -1.25e+01 3.08e+01 -0.41 0.70

I(machine.speed^2) 7.72e-03 1.76e-02 0.44 0.68

I(machine.speed^3) -1.59e-06 3.36e-06 -0.47 0.65


Multicollinearity

> library(car)  # vif() is in the car package
> vif(paper.lm2)

machine.speed I(machine.speed^2)

15616 15616

• Reminder: If VIF > 10, we have a problem.

> vif(paper.lm3)

machine.speed I(machine.speed^2) I(machine.speed^3)

1.68e+08 6.76e+08 1.70e+08
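(Recall how VIF is computed: VIF_j = 1/(1 − R²_j), where R²_j comes from regressing the jth predictor on the others. A quick check for the quadratic model:)

> r2 <- summary(lm(machine.speed ~ I(machine.speed^2)))$r.squared
> 1/(1 - r2)  # matches vif(paper.lm2)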


Orthogonal Polynomials

• It is better to use orthogonal polynomials.

• Obtain them using the Gram-Schmidt orthogonalization procedure.

• poly(x,k) evaluates the orthogonal polynomials P₁(x), …, P_k(x) at x; together with the intercept (P₀ ≡ 1) this gives the k + 1 basis functions of the model below.

• The model becomes

yᵢ = β₀P₀(xᵢ) + β₁P₁(xᵢ) + ⋯ + β_k P_k(xᵢ) + εᵢ

where P₀(xᵢ) = 1, P₁(x) is linear in x, …, P_k(x) is a kth degree polynomial in x.


Orthogonal Polynomials

• Orthogonality property:

∑_{i=1}^n P_j(xᵢ) P_k(xᵢ) = 0   if j ≠ k

• Implication: XᵀX is a diagonal matrix with jth diagonal element

(XᵀX)_jj = ∑_{i=1}^n P_j²(xᵢ)

• Numerically stable to invert!

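(In R, the columns returned by poly() are in fact orthonormal, so XᵀX is the identity matrix up to rounding; a quick check on the paper data:)

> round(crossprod(poly(machine.speed, 3)), 10)  # identity matrix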

Example: Applying Orthogonal Polynomials to the Paper Data

> paper.orth3 <- lm(green.liquor ~ poly(machine.speed,3))
> summary(paper.orth3)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               14.1200     0.0705  200.40  1.0e-12
poly(machine.speed, 3)1   -4.9051     0.2228  -22.01  5.7e-07
poly(machine.speed, 3)2   -1.5581     0.2228   -6.99  0.00043
poly(machine.speed, 3)3   -0.1050     0.2228   -0.47  0.65399

• Forward selection is straightforward now. VIFs are all 1.


7.5 More on Orthogonal Polynomials

• Model:

yᵢ = α₀P₀(xᵢ) + α₁P₁(xᵢ) + ⋯ + α_k P_k(xᵢ) + εᵢ

where P₀(xᵢ) = 1, P₁(x) is linear in x, …, P_k(x) is a kth degree polynomial in x.

• Orthogonality property:

∑_{i=1}^n P_j(xᵢ) P_k(xᵢ) = 0   if j ≠ k


Quadratic Example

x   -1   0   1    2
y    2   1   2   10

• If we use the non-orthogonal polynomials 1, x, x², then

        [ 1  -1   1 ]
    X = [ 1   0   0 ] = [x₁ x₂ x₃]
        [ 1   1   1 ]
        [ 1   2   4 ]


Gram-Schmidt

• Convert this to an orthogonal basis:

y₁ᵀ = x₁ᵀ/‖x₁‖ = [1/2  1/2  1/2  1/2]

y₂′ = x₂ − (x₂ᵀy₁) y₁

y₂ = y₂′/‖y₂′‖,    y₂ᵀ = [−3  −1  1  3]/√20

y₃′ = x₃ − (x₃ᵀy₁) y₁ − (x₃ᵀy₂) y₂

Gram-Schmidt (Cont’d)

y₃ = y₃′/‖y₃′‖,    y₃ᵀ = [1  −1  −1  1]/2

z₁ = 2y₁,    z₂ = √20 y₂,    z₃ = 2y₃

                          [ 1  -3   1 ]
    Xorth = [z₁ z₂ z₃] =  [ 1  -1  -1 ]
                          [ 1   1  -1 ]
                          [ 1   3   1 ]

Gram-Schmidt (Cont’d)

• The columns of Xorth are orthogonal, so

                   [ 4   0   0 ]
    XorthᵀXorth =  [ 0  20   0 ]
                   [ 0   0   4 ]

• What are the orthogonal polynomials in this case?

P₀(x) = 1

P₁(x) = Ax + B = 2x − 1

P₂(x) = Ax² + Bx + C = x² − x − 1

Orthogonal Polynomials (Cont’d)

• Check orthogonality:

∑_{i=1}^4 P_j(xᵢ) P_k(xᵢ) = 0,   j ≠ k

• The orthogonalized regression problem is

y = β₀ + β₁(2x − 1) + β₂(x² − x − 1) + ε

ŷ = 3.75 + 1.25(2x − 1) + 2.25(x² − x − 1)

• This simplifies to

ŷ = .25 + .25x + 2.25x²

which could be obtained (but with attendant numerical difficulties) from the original X matrix.
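(A quick check of this example in R; the orthogonal basis and the fitted coefficients come out as above:)

> x <- c(-1, 0, 1, 2); y <- c(2, 1, 2, 10)
> Xorth <- cbind(1, 2*x - 1, x^2 - x - 1)
> crossprod(Xorth)         # diag(4, 20, 4)
> coef(lm(y ~ Xorth - 1))  # 3.75, 1.25, 2.25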

General Case: Higher Order Orthogonal Polynomial Regression

yᵢ = α₀P₀(xᵢ) + α₁P₁(xᵢ) + ⋯ + α_k P_k(xᵢ) + εᵢ

        [ P₀(x₁)  P₁(x₁)  ...  P_k(x₁) ]
    X = [ P₀(x₂)  P₁(x₂)  ...  P_k(x₂) ]
        [   ...     ...   ...    ...   ]
        [ P₀(x_n) P₁(x_n) ...  P_k(x_n) ]

           [ ∑_{i=1}^n P₀²(xᵢ)  ...          0          ]
    XᵀX =  [        ...         ...         ...         ]
           [         0          ...  ∑_{i=1}^n P_k²(xᵢ) ]

    Xᵀy = [ ∑ P₀(xᵢ)yᵢ, …, ∑ P_k(xᵢ)yᵢ ]ᵀ

    α̂_j = ∑ P_j(xᵢ)yᵢ / ∑ P_j(xᵢ)²

Confidence interval for αj

E[α̂_j] = α_j

V(α̂_j) = σ² / ∑_{i=1}^n P_j²(xᵢ)

SSE = ∑ yᵢ² − ∑_{j=0}^k α̂_j ∑_{i=1}^n P_j(xᵢ)yᵢ

MSE = SSE/(n − k − 1)

α̂_j ± t_{α/2, n−k−1} √(MSE / ∑ P_j²(xᵢ))
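(In R these intervals are available directly; e.g. for the orthogonal-polynomial fit to the paper data:)

> confint(paper.orth3, level=0.95)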

Significance testing

SSR(α_j) = α̂_j ∑ P_j(xᵢ)yᵢ

H₀: α_j = 0    H₁: α_j ≠ 0

F₀ = SSR(α_j)/MSE

Reject H₀ if F₀ > F_{α, 1, n−k−1}.

Since the regression sum of squares for the jth term does not depend on any other coefficients, this partial F test does not depend on terms already included in the model.
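(For a single term, F₀ equals the square of the usual t statistic; e.g. for the cubic term in the paper fit:)

> summary(paper.orth3)$coefficients[4, "t value"]^2  # F0 for the cubic term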


Introduction to Splines

• Polynomials are not flexible enough to adequately model all smooth functions.

• Piecewise polynomials or splines are more flexible.

• What is a piecewise polynomial?

• Example: Consider two polynomials

p₁(x) = −2x² + .5x³

p₂(x) = −2x² + .5x³ − 2(x − 6)³


Splines (Cont’d)

• We can make a piecewise polynomial out of these two polynomials by cutting them at x = 6 (they are equal there) and tying them together with a knot at x = 6:

s(x) = −2x² + .5x³,                x < 6
s(x) = −2x² + .5x³ − 2(x − 6)³,    x ≥ 6

or

s(x) = −2x² + .5x³ − 2(x − 6)³ I(x ≥ 6)

or

s(x) = −2x² + .5x³ − 2(x − 6)₊³

• This is a cubic spline with a knot at 6.


Splines (Cont’d)

[Two panels: the cubic −2x² + 0.5x³ alone; then −2x² + 0.5x³ and −2x² + 0.5x³ − 2(x − 6)³ overlaid with the resulting cubic spline, knot at 6.0]

> source("splineeg.R") # some spline examples

> spline.eg() # plots two polynomials and a spline

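(splineeg.R is course code not shown here; a minimal sketch of what spline.eg() presumably draws:)

spline.eg <- function() {
  # the two cubics and the spline that joins them at the knot x = 6
  x  <- seq(0, 10, length=201)
  p1 <- -2*x^2 + 0.5*x^3
  p2 <- p1 - 2*(x - 6)^3
  s  <- p1 - 2*pmax(x - 6, 0)^3  # truncated power term switches on at 6
  plot(x, p1, type="l", lty=2, ylim=range(p1, p2), ylab="y")
  lines(x, p2, lty=3)
  lines(x, s, col=4, lwd=2)
  title("cubic spline, knot at 6.0")
}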

Another Example: Three different cubic polynomials

p₁(x) = −2x² + .5x³

p₂(x) = −2x² + .5x³ − 5(x − 6)³

p₃(x) = −2x² + .5x³ − 5(x − 6)³ + 5(x − 6.5)³

• Cut them up at x = 6 and x = 6.5 and tie them together with knots there:

s(x) = −2x² + .5x³ − 5(x − 6)³ I(x ≥ 6) + 5(x − 6.5)³ I(x ≥ 6.5)

or

s(x) = −2x² + .5x³ − 5(x − 6)₊³ + 5(x − 6.5)₊³

• This is a cubic spline with knots at 6 and 6.5.


A Spline Curve

[Three panels building up −2x² + 0.5x³, then −2x² + 0.5x³ − 5(x − 6)³, then −2x² + 0.5x³ − 5(x − 6)³ + 5(x − 6.5)³, overlaid with the resulting cubic spline, knots at 6.0, 6.5]

> spline.eg2() # plots three polynomials and a spline


Truncated Power function

T(x) = (x − τ)₊ᵏ = (x − τ)ᵏ I(x ≥ τ)

T(x) is 0 for x < τ and (x − τ)ᵏ for x ≥ τ.

T(x) is a simple degree-k spline with a knot at τ.


Spline Regression Models:

y = β₀ + β₁x + ⋯ + β_k xᵏ + γ₁T₁(x) + ⋯ + γ_h T_h(x) + ε

where T_j(x) = (x − τ_j)₊ᵏ.

• Knots are at τ₁, τ₂, …, τ_h.

• The β's and γ's can be estimated by least squares. The X matrix has k + h + 1 columns.

• (Exercise: What are the columns of X?)
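(A sketch of this design matrix, for checking your answer to the exercise; spline.X is a hypothetical helper built from the truncated power functions above:)

# columns: 1, x, ..., x^k, (x - tau_1)_+^k, ..., (x - tau_h)_+^k
spline.X <- function(x, knots, k=3) {
  cbind(1, outer(x, 1:k, "^"),
        sapply(knots, function(tau) pmax(x - tau, 0)^k))
}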


Splines (Cont’d)

• B-splines are more numerically stable than truncated splines.

• The regression model in terms of B-Splines:

y = β₀ + β₁B₁(x) + β₂B₂(x) + ⋯ + β_k B_k(x) + ε


Example: The B-spline transformations of temperature (Titanium Data)

knots at 825, 885, 895, 905, 990; degree = 3

[Four panels: the first four B-spline basis functions B(temperature) plotted against temperature]

Example (Cont’d)

The other four transformations:

[Four panels: the remaining four B-spline basis functions B(temperature) plotted against temperature]

Number of transformations = degree + number of knots
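(Check in R; bs() is in the splines package, and with 5 knots and degree 3 it returns 8 columns:)

> library(splines)
> ncol(bs(temperature, knots=c(825,885,895,905,990), degree=3))
[1] 8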


Example: Fitting a Spline Curve to the Titanium Data

> require(splines)

[1] TRUE

> titanium.spline <- lm(g ~ bs(temperature,
    knots=c(825,885,895,905,990), degree=3))


Example (Cont’d)

> plot(titanium, pch=16)
> lines(spline(temperature, predict(titanium.spline)),
    col=4, lwd=2)
> title("Cubic Spline Fitted to Titanium Data")

[Cubic Spline Fitted to Titanium Data: g vs. temperature with the fitted curve]

Choosing knots using Variable Selection Methods

• Spline example - geophones:

tr.pwr <- function(x, knot, degree=3) {
  # truncated power function
  (x > knot)*(x - knot)^degree
}

# one knot per function

xx <- cbind(distance, distance^2, distance^3,
            outer(distance, seq(20,80,length=20), tr.pwr))

# we start with 20 knots equally spaced between

# 20 and 80, and use forward selection to choose

# the best ones:


# regsubsets() and summary.regsubsets() are in the leaps package
geophones.fwd <- regsubsets(thickness ~ xx,
    method="forward", nvmax=12, data=geophones)

summary.regsubsets(geophones.fwd)$cp

[1] 153.39 52.34 31.41 20.27 15.77 10.59

[7] 10.78 10.68 9.75 8.81 8.52 9.21

# Which knots are in?

seq(20,80,length=20)[summary.regsubsets(

geophones.fwd)$which[11,-seq(1,4)]]

[1] 20.0 32.6 35.8 38.9 45.3 51.6 54.7 70.5 73.7 76.8

knots.sub <- summary.regsubsets(

geophones.fwd)$which[11,-seq(1,4)]

knots.try<-seq(20,80,length=20)[knots.sub]

geophones.bs <- lm(thickness ˜ bs(distance, knots =

knots.try,Boundary.knots = c(0,100)),data=geophones)

PRESS(geophones.bs)

[1] 285

plot(geophones)

lines(spline(geophones$distance,

predict(geophones.bs)),col=4)

# you can check plot(geophones.bs) to see if there are

# problems
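(PRESS() above is presumably a course helper, not base R; the standard PRESS statistic can be computed from the hat values:)

PRESS <- function(fit) {
  # sum of squared leave-one-out residuals e_i/(1 - h_ii)
  sum((residuals(fit) / (1 - hatvalues(fit)))^2)
}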

Titanium Example

xx <- cbind(temperature, temperature^2, temperature^3,
            outer(temperature, seq(620,1050,length=30), tr.pwr))

titanium.fwd <- regsubsets(g ~ xx, method="forward",
    nvmax=15, data=titanium)

summary.regsubsets(titanium.fwd)$cp
 [1] 72343.36 57292.13 42136.90 27174.83 11382.40  5551.22
 [7]  1795.84   351.67   103.14    68.64    11.84     5.72
[13]     6.91     8.15     9.52

knots.try <- seq(620,1050,
    length=30)[summary.regsubsets(titanium.fwd)$which[12,-seq(1,4)]]

titanium.bs <- lm(g ~ bs(temperature, knots = knots.try,
    Boundary.knots = c(500,1100)))
plot(titanium)
lines(spline(temperature, predict(titanium.bs)), col=4)

# plot(titanium.bs)
