
Part VI

Moving Beyond Linearity

As of Dec 4, 2019

Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani

Seppo Pynnönen, Applied Multivariate Statistical Analysis


1 Moving beyond linearity

Polynomial regressions

Step functions

Basis functions

Regression splines

Smoothing splines

Local regressions

Generalized additive models


Linear models may be too rough an approximation of the underlying relationships.

However, because the true (non-linear) model is unknown, several approaches have been developed to approximate the underlying true relationship:

Polynomial regression.

Step functions.

Regression splines.

Smoothing splines.

Local regression.

Generalized additive models.

Polynomial regressions

Polynomial regression replaces the simple regression

y = β0 + β1x + ε (1)

with the polynomial function

y = β0 + β1x + β2x^2 + ··· + βdx^d + ε. (2)

The motivation is to increase the flexibility of the fitted model to capture possible non-linearities.

Usually the degree of the polynomial is at most three or four.

Polynomial regression can be applied in logit and probit models as well.
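Below is a minimal sketch of this, assuming the ISLR Wage data and modelling the event wage > 250 (the cutoff is only an illustration):

library(ISLR)
# logit regression on a fourth-order polynomial of age;
# the response is the indicator of a high wage (wage > 250)
fit.logit <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial)
summary(fit.logit)
# predicted probabilities over the observed age range
age.grid <- min(Wage$age):max(Wage$age)
prob <- predict(fit.logit, newdata = data.frame(age = age.grid), type = "response")

Replacing family = binomial with family = binomial(link = "probit") would give the corresponding probit fit.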


Example 1

Fit a fourth-order polynomial regression in the Wage data by regressing wage on a fourth-order polynomial of age. All coefficients are statistically significant at the 5% level (the fourth-order term is borderline).

> (fit <- lm(wage ~ poly(age, 4), data = Wage)) # polynomial model of order 4

Call:

lm(formula = wage ~ poly(age, 4), data = Wage)

Coefficients:

(Intercept) poly(age, 4)1 poly(age, 4)2 poly(age, 4)3 poly(age, 4)4

111.70 447.07 -478.32 125.52 -77.91

> summary(fit)

Call:

lm(formula = wage ~ poly(age, 4), data = Wage)

Residuals:

Min 1Q Median 3Q Max

-98.707 -24.626 -4.993 15.217 203.693

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 111.7036 0.7287 153.283 < 2e-16 ***

poly(age, 4)1 447.0679 39.9148 11.201 < 2e-16 ***

poly(age, 4)2 -478.3158 39.9148 -11.983 < 2e-16 ***

poly(age, 4)3 125.5217 39.9148 3.145 0.00168 **

poly(age, 4)4 -77.9112 39.9148 -1.952 0.05104 .

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 39.91 on 2995 degrees of freedom

Multiple R-squared: 0.08626, Adjusted R-squared: 0.08504

F-statistic: 70.69 on 4 and 2995 DF, p-value: < 2.2e-16


> with(Wage, plot(x = age, y = wage, pch = 20, col = "light grey", fg = "gray")) # scatter plot

> (age.range <- range(Wage$age)) # age range to be used to impose the fitted line

[1] 18 80

> ## predictions; setting se.fit = TRUE also produces standard errors of the fitted line

> wage.pred <- predict(fit, newdata = data.frame(age = age.range[1]:age.range[2]), se.fit = TRUE)

> lines(x = age.range[1]:age.range[2], y = wage.pred$fit, col = "steel blue") # fitted line

> ## 95% lower bound of the fitted line

> lines(x = age.range[1]:age.range[2], y = wage.pred$fit - 2 * wage.pred$se.fit, col = "red") #

> ## 95% upper bound

> lines(x = age.range[1]:age.range[2], y = wage.pred$fit + 2 * wage.pred$se.fit, col = "red") #

[Figure: scatter plot of wage against age with the fitted fourth-degree polynomial (blue) and ±2 standard-error bands (red); x-axis age 20–80, y-axis wage 50–300.]

Step functions

Polynomial functions impose global structure on x.

Step functions impose local structure by converting a continuous variable into an ordered categorical variable:

C0(x) = I(x < c1),
C1(x) = I(c1 ≤ x < c2),
C2(x) = I(c2 ≤ x < c3),
...
CK−1(x) = I(cK−1 ≤ x < cK),
CK(x) = I(cK ≤ x), (3)

where I(·) is an indicator function that returns 1 if the condition is true, and zero otherwise.


Because C0(x) + C1(x) + ··· + CK(x) = 1, C0(x) becomes the reference class, and the dependent variable y is regressed on Cj(x), j = 1, ..., K, resulting in the dummy-variable regression

y = β0 + β1C1(x) + β2C2(x) + ··· + βKCK(x) + ε, (4)

so the regression line becomes a piecewise-constant function.

As class C0(x) is the reference class, the βj indicate deviations from it.
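Since (4) is an ordinary dummy-variable regression, the fitted value in each class is simply the within-class sample mean of y. A minimal sketch checking this on the Wage data used in Example 2 below (object names are illustrative):

library(ISLR)
age4cl <- cut(Wage$age, breaks = 4)             # the classes C0, ..., C3
fit <- lm(wage ~ age4cl, data = Wage)           # dummy-variable regression (4)
class.means <- tapply(Wage$wage, age4cl, mean)  # within-class means of wage
# intercept = mean of the reference class; slopes = deviations from it
all.equal(unname(coef(fit)[1] + c(0, coef(fit)[-1])), unname(class.means))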


Example 2

Divide age into four sub-intervals and fit a regression on the dummy variables based on them.

> library(ISLR)

> age4cl <- cut(Wage$age, breaks = 4) # four age classes

> table(x = cut(Wage$age, breaks = 4)) # n of observations in four age classes

x

(17.9,33.5] (33.5,49] (49,64.5] (64.5,80.1]

750 1399 779 72

> summary(fit.step <- lm(wage ~ age4cl, data = Wage)) # fit and summarize

Call:

lm(formula = wage ~ age4cl, data = Wage)

Residuals:

Min 1Q Median 3Q Max

-98.126 -24.803 -6.177 16.493 200.519

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 94.158 1.476 63.790 <2e-16 ***

age4cl(33.5,49] 24.053 1.829 13.148 <2e-16 ***

age4cl(49,64.5] 23.665 2.068 11.443 <2e-16 ***

age4cl(64.5,80.1] 7.641 4.987 1.532 0.126

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 40.42 on 2996 degrees of freedom

Multiple R-squared: 0.0625, Adjusted R-squared: 0.06156

F-statistic: 66.58 on 3 and 2996 DF, p-value: < 2.2e-16


> range(age.range <- min(Wage$age):max(Wage$age)) # age range from min to max

[1] 18 80

> pred.step <- predict(fit.step, newdata = data.frame(age4cl = cut(age.range, breaks = 4)),

+ interval = "confidence") # prediction with 95% confidence interval

[Figure: "Step Function Regression of Wage on Four Age Classes" — scatter of wage against age with the piecewise-constant fit and ±2 standard-error bands; x-axis Age 20–80, y-axis Wage 50–300; legend: Fit, +/−2 std err.]

Basis functions

Polynomial regressions and piecewise-constant regressions are special cases of a basis function approach

y = β0 + β1b1(x) + β2b2(x) + ··· + βKbK(x) + ε, (5)

where the bj(x) are fixed and known functions, called basis functions.

Other popular basis functions are based on Fourier series, wavelets, and various spline functions.
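As an illustration of (5) with a non-spline basis, the following minimal sketch fits a small Fourier basis by ordinary least squares on simulated data (the frequencies and the basis size are arbitrary choices):

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + 0.5 * cos(4 * pi * x) + rnorm(200, sd = 0.3)
# basis functions b_j(x): sine/cosine pairs at frequencies 1 and 2
fit <- lm(y ~ sin(2 * pi * x) + cos(2 * pi * x) +
              sin(4 * pi * x) + cos(4 * pi * x))
plot(x, y, col = "grey")
lines(x, fitted(fit), col = "red", lwd = 2)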

Regression splines

Spline basis representation

The basis model in (5) can be used to define a piecewise degree-d polynomial under the constraint that it, and possibly its first d − 1 derivatives, are continuous.

Typically, again d is fairly low, like 3 (cubic).

There are many different ways to represent polynomial splines.

The most direct way to represent, for example, a cubic spline with K knots, where the knots are the points at which the polynomial coefficients may change, is to start off with a basis for a cubic polynomial, x, x^2, x^3, and add one truncated power basis function per knot.

A truncated power basis function is defined as

h(x, ξ) = (x − ξ)+^3 = (x − ξ)^3 if x > ξ, and 0 otherwise, (6)

where ξ is the knot.


The regression is of the form

y = β0 + β1x + β2x^2 + β3x^3 + β4h(x, ξ1) + ··· + βK+3 h(x, ξK) + ε, (7)

where ξ1, ..., ξK are the K knots.

Therefore the regression amounts to estimating K + 4 regression coefficients and thus uses K + 4 degrees of freedom.
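A minimal sketch of (7) on the Wage data used in Example 3 below: the truncated power basis (6) is built by hand (h() is a helper defined here, not a package function), and since it spans the same function space as the B-spline basis used by bs(), the fitted values should agree up to numerical error:

library(ISLR)
library(splines)
h <- function(x, xi) pmax(x - xi, 0)^3     # truncated power basis (6)
fit.tp <- lm(wage ~ age + I(age^2) + I(age^3) +
               h(age, 25) + h(age, 40) + h(age, 60),
             data = Wage)                  # K + 4 = 7 coefficients
fit.bs <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
max(abs(fitted(fit.tp) - fitted(fit.bs)))  # approximately zero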

Because splines can have high variance at the outer range of the predictors (very small and very large values of x), additional boundary constraints are often imposed.

A natural spline is a regression spline where the function is required to be linear at the boundary (the region where x is smaller than the smallest knot or larger than the largest knot).


The number and locations of the knots can be chosen, for example, by cross-validation, or simply by choosing them so that the fitted curve looks reasonable.

Fewer knots can be placed where the function seems to change slowly, and more knots where it appears to change rapidly.

Compared to polynomial regression, splines can produce more flexibility with fewer estimated coefficients.


Example 3

Splines in R can be fitted using the splines package, which ships with base R and is therefore readily available.

The function bs() generates basis functions for splines with a specified set of knots (see help(bs), which shows that the default is a cubic spline).

Fit, in the Wage data set, wage on a spline function of age with three knots, at ξ1 = 25, ξ2 = 40, and ξ3 = 60.

> library(ISLR)

> library(splines)

> fit.bs <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage) # spline regression

> summary(fit.bs)

Call:

lm(formula = wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)

Residuals:

Min 1Q Median 3Q Max

-98.832 -24.537 -5.049 15.209 203.207

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 60.494 9.460 6.394 1.86e-10 ***

bs(age, knots = c(25, 40, 60))1 3.980 12.538 0.317 0.750899

bs(age, knots = c(25, 40, 60))2 44.631 9.626 4.636 3.70e-06 ***

bs(age, knots = c(25, 40, 60))3 62.839 10.755 5.843 5.69e-09 ***

bs(age, knots = c(25, 40, 60))4 55.991 10.706 5.230 1.81e-07 ***

bs(age, knots = c(25, 40, 60))5 50.688 14.402 3.520 0.000439 ***

bs(age, knots = c(25, 40, 60))6 16.606 19.126 0.868 0.385338

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 39.92 on 2993 degrees of freedom

Multiple R-squared: 0.08642, Adjusted R-squared: 0.08459

F-statistic: 47.19 on 6 and 2993 DF, p-value: < 2.2e-16


> age.grid <- min(Wage$age):max(Wage$age)

> pred.bs <- predict(fit.bs,

+ newdata = data.frame(age = age.grid),

+ interval = "confidence") # produce confidence interval for the line

> head(pred.bs) # a few first lines of the regression fit and confidence interval

fit lwr upr

1 60.49371 41.94418 79.04325

2 62.71254 51.68291 73.74217

3 65.81602 58.41249 73.21956

4 69.59243 63.05391 76.13096

5 73.83004 67.51567 80.14442

6 78.31713 72.58930 84.04495

> with(Wage, plot(x = age, y = wage, col = "light gray",

+ xlab = "Age", ylab = "Wage",

+ main = "Regression Spline with Three Knots"))

> lines(x = age.grid, y = pred.bs[, 1], col = "steel blue") # fitted line

> lines(x = age.grid, y = pred.bs[, 2], col = "red") # l95

> lines(x = age.grid, y = pred.bs[, 3], col = "red") # u95

> legend("topright", legend = c("Regression fit", "+/-2 std err"), lty = c("solid", "solid"),

+ col = c("steel blue", "red"), bty = "n")

> dev.print(pdf, "../lectures/figures/ex63.pdf")

Natural splines can be produced by the ns() function

> fit.ns <- lm(wage ~ ns(age, knots = c(25, 40, 60)), data = Wage) # natural spline

> summary(fit.ns)

Call:

lm(formula = wage ~ ns(age, knots = c(25, 40, 60)), data = Wage)

Residuals:

Min 1Q Median 3Q Max

-98.733 -24.761 -5.187 15.216 204.965

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 54.760 5.138 10.658 < 2e-16 ***

ns(age, knots = c(25, 40, 60))1 67.402 5.013 13.444 < 2e-16 ***

ns(age, knots = c(25, 40, 60))2 51.383 5.712 8.996 < 2e-16 ***

ns(age, knots = c(25, 40, 60))3 88.566 12.016 7.371 2.18e-13 ***

ns(age, knots = c(25, 40, 60))4 10.637 9.833 1.082 0.279

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 39.92 on 2995 degrees of freedom

Multiple R-squared: 0.08588, Adjusted R-squared: 0.08466

F-statistic: 70.34 on 4 and 2995 DF, p-value: < 2.2e-16


> pred.ns <- predict(fit.ns, newdata = data.frame(age = age.grid)) # natural spline predictions

> with(Wage, plot(x = age, y = wage, col = "light gray",

+ xlab = "Age", ylab = "Wage",

+ main = "Regression Spline with Three Knots")) # scatter plot

> lines(x = age.grid, y = pred.bs[, 1], col = "steel blue") # basis spline fitted line

> lines(x = age.grid, y = pred.bs[, 2], lty = "dashed", col = "red") # l95 bs regression

> lines(x = age.grid, y = pred.bs[, 3], lty = "dashed", col = "red") # u95 bs regression

> lines(x = age.grid, y = pred.ns, col = "green", lwd = 2) # natural spline fitted line

> abline(v = c(25, 40, 60), col = "grey", lty = "dashed")

> legend("topright", legend = c("Base Spline", "+/-2 std err (bs)", "Natural Spline"),

+ lty = c("solid", "dashed", "solid"),

+ col = c("steel blue", "red", "green"), bty = "n")

[Figure: "Regression Spline with Three Knots" — scatter of wage against age with the basis-spline fit (blue), ±2 standard-error bands (red, dashed), the natural-spline fit (green), and dashed vertical lines at the knots 25, 40, 60; x-axis Age 20–80, y-axis Wage 50–300; legend: Basis Spline, +/−2 std err (bs), Natural Spline.]


Instead of specifying the knots in bs() or ns(), we can specify the degrees of freedom, df, which places the knots at uniform quantiles of the data.

Recall that a cubic spline estimates K + 4 parameters, of which one is the intercept.

Accordingly, with K = 3 knots, df is set to 6 in bs().

In a natural spline the function is required to be linear beyond the boundary, which reduces the number of estimated parameters by 2.

Therefore, for K = 3 knots, df is 4 in ns().

> dim(bs(Wage$age, knots = c(25, 40, 60))) # bs() predefined knots

[1] 3000 6

> dim(bs(Wage$age, df = 6)) # bs() equal spaced knots

[1] 3000 6

> attr(bs(Wage$age, df = 6), which = "knots")

25% 50% 75%

33.75 42.00 51.00

> attr(bs(Wage$age, knots = c(25, 40, 60)), which = "knots")

[1] 25 40 60

> dim(ns(Wage$age, knots = c(25, 40, 60))) # ns() predefined knots

[1] 3000 4

> dim(ns(Wage$age, df = 4)) # ns() equal spaced knots

[1] 3000 4

> attr(ns(Wage$age, knots = c(25, 40, 60)), which = "knots")

[1] 25 40 60

> attr(ns(Wage$age, df = 4), which = "knots")

25% 50% 75%

33.75 42.00 51.00

Finally, the degree parameter of bs() can be used to set the degree of the polynomial (the default is 3, i.e., cubic); ns() always produces a cubic natural spline.


Choosing the number and location of knots


The number of knots defines the flexibility of the spline: more knots, more flexibility.

Place more knots where the function varies most rapidly.

Use fewer knots where the function seems more stable.

In practice, however, the knots are often placed at uniform quantiles of the data. In R this can be done by specifying df, as seen in the example above.

Cross-validation can be used to select the degrees of freedom.
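A minimal sketch of such a cross-validation, assuming the boot package's cv.glm() for 10-fold CV over a grid of natural-spline df values (the grid itself is an arbitrary choice):

library(ISLR)
library(splines)
library(boot)
set.seed(1)
# estimated test MSE for natural splines of age with df = 2, ..., 8
cv.err <- sapply(2:8, function(d) {
  f <- glm(wage ~ ns(age, df = d), data = Wage)  # glm() so that cv.glm() applies
  cv.glm(Wage, f, K = 10)$delta[1]
})
names(cv.err) <- 2:8
cv.err  # pick the df with the smallest estimated prediction error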

Smoothing splines

Fitting some 'smooth' curve g(x) is another way to fit regression lines.

This amounts to minimizing RSS = ∑_{i=1}^n (yi − g(xi))^2.

In order to make g(x) smooth, i.e., not too jagged (in the extreme it passes through every data point, making RSS zero), one approach is to find the function g that minimizes

∑_{i=1}^n (yi − g(xi))^2 + λ ∫ g''(t)^2 dt, (8)

where λ is a nonnegative tuning parameter and g''(t) is the second derivative of g.

The function g that minimizes (8) is known as a smoothing spline.

Similar to the lasso, equation (8) takes the "loss + penalty" formulation, where the sum is the loss and the integral is the penalty term that penalizes the variability in g.


It can be shown that the g(x) that minimizes (8) is a piecewise cubic polynomial with knots at the unique values of x1, ..., xn, and that it is linear outside the extreme knots.

Thus, the g(x) that minimizes (8) is a natural cubic spline with knots at x1, ..., xn.

However, it is not the same natural cubic spline that one would get by applying the basis function approach with these knots.

The smoothness of the spline is controlled by the penalty term, i.e., by the magnitude of λ.


Choosing the smoothing parameter λ

The tuning parameter λ controls the roughness of the smoothing spline: λ = 0 results in a g that goes through every observation (xi, yi), while λ → ∞ results in a linear function, for which the second derivative g'' is zero, i.e., a perfectly smooth function.

In the former case the effective degrees of freedom is n, and in the latter it is 2 (a line is determined by 2 points).

Degrees of freedom refer to the number of free parameters in the model.

A smoothing spline has n parameters and hence n nominal degrees of freedom.


Choosing λ

However, these n parameters are heavily constrained, or shrunk down, by the penalty term, which reduces the flexibility of the smoothing spline.

The effective degrees of freedom, denoted dfλ, is a measure of the flexibility of the smoothing spline.

High dfλ indicates high flexibility (lower bias but higher variance), while low dfλ implies low flexibility (higher bias but lower variance).

For a given λ the smoothing spline is defined in terms of an n × n matrix Sλ, such that

gλ = Sλ y, (9)

where gλ is the solution of (8) for the given λ, i.e., an n-vector of fitted values of the smoothing spline at the training points x1, ..., xn.


The effective degrees of freedom is defined as the trace of the matrix Sλ, i.e.,

dfλ = tr(Sλ) = ∑_{i=1}^n {Sλ}ii, (10)

where {Sλ}ii is the i-th diagonal element of Sλ.

The problem is to find the optimal λ.

This is done by cross-validation, like LOOCV.

It turns out that LOOCV can be computed very efficiently (only a single fit is needed), using

RSScv(λ) = ∑_{i=1}^n (yi − gλ^(−i)(xi))^2 = ∑_{i=1}^n ((yi − gλ(xi)) / (1 − {Sλ}ii))^2, (11)

where gλ^(−i)(xi) denotes the fitted value estimated without the i-th observation (xi, yi).
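The identity in (11) holds for any linear smoother. A minimal sketch verifying it in the simplest case, an OLS fit, where the diagonal elements {S}ii are the hat values (the brute-force loop is included only for transparency; it is slow on the full Wage data):

library(ISLR)
fit <- lm(wage ~ poly(age, 4), data = Wage)
h <- lm.influence(fit)$hat                    # {S}_ii for the OLS smoother
rss.shortcut <- sum(((Wage$wage - fitted(fit)) / (1 - h))^2)
# brute-force leave-one-out for comparison
rss.brute <- sum(sapply(seq_len(nrow(Wage)), function(i) {
  f <- lm(wage ~ poly(age, 4), data = Wage[-i, ])
  (Wage$wage[i] - predict(f, newdata = Wage[i, , drop = FALSE]))^2
}))
all.equal(rss.shortcut, rss.brute)            # TRUE up to rounding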


Example 4

Continuing with the Wage data, smoothing splines can be fitted with the smooth.spline() function (in base R's stats package).

> fit.smooth <- smooth.spline(x = Wage$age, y = Wage$wage, df = 16) # fixed df = 16

> fit.smooth2 <- smooth.spline(x = Wage$age, y = Wage$wage, cv = TRUE) # LOOCV

Warning message:

In smooth.spline(x = Wage$age, y = Wage$wage, cv = TRUE) :

cross-validation with non-unique 'x' values seems doubtful

> fit.smooth3 <- smooth.spline(x = Wage$age, y = Wage$wage) # default 'generalized' CV

> round(fit.smooth2$df, digits = 1)

[1] 6.8

> round(fit.smooth3$df, digits = 1)

[1] 6.5

> with(Wage, plot(x = age, y = wage,

+ cex = .5, # reduce the size of the plotting symbol to one half

+ col = "grey",

+ xlab = "Age", ylab = "Wage",

+ main = "Smoothing Spline"))

> lines(fit.smooth, col = "steel blue", lwd = 2) # fitted line, lwd = 2 doubles the line thickness

> lines(fit.smooth2, col = "red", lwd = 2) # LOOCV

> lines(fit.smooth3, col = "green", lwd = 2) # 'generalized' CV

> legend("topright", legend = c(paste("Fixed", round(fit.smooth$df, 0), "df"),

+ paste("LOOCV", round(fit.smooth2$df, 1), "df")),

+ col = c("steel blue", "red"), lty = "solid", lwd = 2, bty = "n")

> fit.smooth$lambda # lambda when df = 16

[1] 0.0006537868

> fit.smooth2$lambda # lambda produced by LOOCV

[1] 0.02792303

> fit.smooth3$lambda # lambda produced by ’generalized’ CV

[1] 0.0348627


Smoothing spline fits with fixed and CV-selected df values.

[Figure: "Smoothing Spline" — scatter of wage against age with the fixed df = 16 fit (blue) and the LOOCV fit with 6.8 df (red); x-axis Age 20–80, y-axis Wage 50–300; legend: Fixed 16 df, LOOCV 6.8 df.]

Local regressions

Local regression gives another approach for fitting flexible non-linear functions.

The general idea is to fit weighted least squares using the points in a neighborhood around a given point x0.

The (relative) number of points used in the weighted least squares fit is defined by the span parameter s.

The weights Ki0 = K(xi, x0) are given by a weight function that assigns the largest weights to the points closest to x0.
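A minimal sketch of a single local fit at a target point x0, assuming the tricube weight function that loess() uses by default (a degree-1 fit for simplicity, whereas loess() defaults to degree 2; local.fit.at() is an illustrative helper, not a package function):

library(ISLR)
local.fit.at <- function(x0, x, y, span = 0.5) {
  k  <- ceiling(span * length(x))               # number of points inside the span
  d  <- abs(x - x0)
  d0 <- sort(d)[k]                              # distance to the k-th nearest point
  w  <- ifelse(d < d0, (1 - (d / d0)^3)^3, 0)   # tricube weights Ki0
  fit <- lm(y ~ x, weights = w)                 # weighted least squares around x0
  predict(fit, newdata = data.frame(x = x0))
}
local.fit.at(50, Wage$age, Wage$wage)           # local estimate of f(50)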


The graph below illustrates the idea of local regression for simulated data: the yellow areas depict the weighting function, the blue line is the underlying true function f(x), the orange line is the fitted local regression estimate f̂(x), and the red lines are the local weighted regression fits around the particular x value.

[Figure: "Local Regression" — two panels of simulated data on (0, 1) showing the local weighted fit at two target points. Source: James et al. (2013), Fig. 7.9.]


Example 5

Local regressions with span values 0.2 and 0.5

> fit.loess02 <- loess(wage ~ age, data = Wage, span = .2)

> fit.loess05 <- loess(wage ~ age, data = Wage, span = .5)

> age.grid <- min(Wage$age):max(Wage$age) # needed for prediction to plot fitted lines

> with(data = Wage, plot(x = age, y = wage, cex = .5, col = "grey",

+ xlab = "Age", ylab = "Wage",

+ main = "Local Regression"))

> lines(x = age.grid, predict(fit.loess02, newdata = data.frame(age = age.grid)), lwd = 2,

+ col = "red") # loess with s = .2

> lines(x = age.grid, predict(fit.loess05, newdata = data.frame(age = age.grid)), lwd = 2,

+ col = "blue") # s = .5

> legend("topright", legend = c("span = 0.2", "span = 0.5"), col = c("red", "blue"),

+ lwd = 2, lty = "solid", bty = "n")

[Figure: "Local Regression" — scatter of wage against age with loess fits for span = 0.2 (red) and span = 0.5 (blue); x-axis Age 20–80, y-axis Wage 50–300; legend: span = 0.2, span = 0.5.]

Generalized additive models

Generalized additive models (GAMs) deal with multiple regression with several predictors, x1, ..., xp, by allowing nonlinear functions of each predictor while maintaining additivity.

The multiple linear regression model

y = β0 + β1x1 + · · ·+ βpxp + ε (12)

can be extended by replacing each linear component βjxj with a (smooth) nonlinear function fj(xj), such that

y = β0 + f1(x1) + f2(x2) + · · ·+ fp(xp) + ε. (13)

This is an example of a GAM.

It is notable that if the fj are chosen to be natural splines, fitting the model reduces to an OLS fit with natural splines of the xj as explanatory variables.


Example 6

Consider again the Wage data and fit the model

wage = β0 + f1(year) + f2(age) + f3(education) + ε (14)

utilizing natural splines.

In (14), year and age are quantitative and education is qualitative with five levels (less than HS, HS, some college, college, advanced).

> library(ISLR)

> library(splines)

> levels(Wage$education)

[1] "1. < HS Grad" "2. HS Grad" "3. Some College"

[4] "4. College Grad" "5. Advanced Degree"

> str(Wage)

’data.frame’: 3000 obs. of 11 variables:

$ year : int 2006 2004 2003 2003 2005 2008 2009 2008 2006 2004 ...

$ age : int 18 24 45 43 50 54 44 30 41 52 ...

$ maritl : Factor w/ 5 levels "1. Never Married",..: 1 1 2 2 4 2 2 1 1 2 ...

$ race : Factor w/ 4 levels "1. White","2. Black",..: 1 1 1 3 1 1 4 3 2 1 ...

$ education : Factor w/ 5 levels "1. < HS Grad",..: 1 4 3 4 2 4 3 3 3 2 ...

$ region : Factor w/ 9 levels "1. New England",..: 2 2 2 2 2 2 2 2 2 2 ...

$ jobclass : Factor w/ 2 levels "1. Industrial",..: 1 2 1 2 2 2 1 2 2 2 ...

$ health : Factor w/ 2 levels "1. <=Good","2. >=Very Good": 1 2 1 2 1 2 2 1 2 2 ...

$ health_ins: Factor w/ 2 levels "1. Yes","2. No": 2 2 1 1 1 1 1 1 1 1 ...

$ logwage : num 4.32 4.26 4.88 5.04 4.32 ...

$ wage : num 75 70.5 131 154.7 75 ...


Regression with natural splines

> gam1 <- lm(wage ~ ns(year, df = 4) + ns(age, df = 5) + education, data = Wage) # natural splines

> year.grid <- min(Wage$year):max(Wage$year)

> summary(gam1) # regression summary

Call:

lm(formula = wage ~ ns(year, df = 4) + ns(age, df = 5) + education,

data = Wage)

Residuals:

Min 1Q Median 3Q Max

-120.513 -19.608 -3.583 14.112 214.535

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 46.949 4.704 9.980 < 2e-16 ***

ns(year, df = 4)1 8.625 3.466 2.488 0.01289 *

ns(year, df = 4)2 3.762 2.959 1.271 0.20369

ns(year, df = 4)3 8.127 4.211 1.930 0.05375 .

ns(year, df = 4)4 6.806 2.397 2.840 0.00455 **

ns(age, df = 5)1 45.170 4.193 10.771 < 2e-16 ***

ns(age, df = 5)2 38.450 5.076 7.575 4.78e-14 ***

ns(age, df = 5)3 34.239 4.383 7.813 7.69e-15 ***

ns(age, df = 5)4 48.678 10.572 4.605 4.31e-06 ***

ns(age, df = 5)5 6.557 8.367 0.784 0.43328

education2. HS Grad 10.983 2.430 4.520 6.43e-06 ***

education3. Some College 23.473 2.562 9.163 < 2e-16 ***

education4. College Grad 38.314 2.547 15.042 < 2e-16 ***

education5. Advanced Degree 62.554 2.761 22.654 < 2e-16 ***

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 35.16 on 2986 degrees of freedom

Multiple R-squared: 0.293, Adjusted R-squared: 0.2899

F-statistic: 95.2 on 13 and 2986 DF, p-value: < 2.2e-16


There seems to be non-linearity in each component.

Consider next the marginal effects of each predictor.

> pred.wage1 <- predict(gam1, newdata = data.frame(year = year.grid,

+ age = mean(Wage$age),

+ education = levels(Wage$education)[4]),

+ interval = "confidence") # year effect

> pred.wage2 <- predict(gam1, newdata = data.frame(age = age.grid,

+ year = mean(Wage$year),

+ education = levels(Wage$education)[4]),

+ interval = "confidence") # age effect

> pred.wage3 <- predict(gam1, newdata = data.frame(age = mean(Wage$age),

+ year = mean(Wage$year),

+ education = levels(Wage$education)),

+ interval = "confidence") # education effect

> par(mfrow = c(1, 3)) # split the graph window into three segments

> with(Wage, plot(x = year, y = wage, cex = .5, col = "grey",

+ xlab = "Year", ylab = "Log(Wage)", main = "Marginal Effect of Year"))

> lines(x = year.grid, y = pred.wage1[, 1], col = "red")

> lines(x = year.grid, y = pred.wage1[, 2], col = "steel blue", lty = "dashed")

> lines(x = year.grid, y = pred.wage1[, 3], col = "steel blue", lty = "dashed")

> with(Wage, plot(x = age, y = wage, cex = .5, col = "grey",

+ xlab = "Age", ylab = "Wage", main = "Marginal Effect of Age"))

> lines(x = age.grid, y = pred.wage2[, 1], type = "l", col = "red")

> lines(x = age.grid, y = pred.wage2[, 2], lty = "dashed", col = "steel blue")

> lines(x = age.grid, y = pred.wage2[, 3], lty = "dashed", col = "steel blue")

> with(Wage, plot(x = education, y = wage, xlab = "Education", ylab = "Wage",

+ main = "Marginal Effect of Education")) # education effect

> lines(x = 1:5, y = pred.wage3[, 1], col = "red")

> lines(x = 1:5, y = pred.wage3[, 2], col = "steel blue", lty = "dashed")

> lines(x = 1:5, y = pred.wage3[, 3], col = "steel blue", lty = "dashed")


Marginal effects of year, age, and education on wage.

[Figure: three panels — "Marginal Effect of Year" (wage vs. year, 2003–2009), "Marginal Effect of Age" (wage vs. age, 20–80), and "Marginal Effect of Education" (wage boxplots by education level, 1. < HS Grad to 5. Advanced Degree); fitted effects in red with dashed confidence bands in blue.]


Qualitatively similar plots can be produced directly with the plot.Gam() function available in the gam package.

> library(gam)

> plot.Gam(gam1, se = TRUE, ylim = c(-50, 40), col = "red",

+ ylab = "f(year)", terms = attr(gam1$terms, which = "term.labels")[1])

> par(mfrow = c(1, 3))

> plot.Gam(gam1, se = TRUE, ylim = c(-30, 30), col = "red",

+ ylab = "f(year)", terms = attr(gam1$terms, which = "term.labels")[1])

> plot.Gam(gam1, se = TRUE, ylim = c(-50, 30), col = "red",

+ ylab = "f(age)", terms = attr(gam1$terms, which = "term.labels")[2])

> plot.Gam(gam1, se = TRUE, ylim = c(-50, 30), col = "red",

+ ylab = "f(education)", terms = attr(gam1$terms, which = "term.labels")[3])

[Figure: plot.Gam output — three panels showing f(year), f(age), and f(education) with standard-error bands.]


The interpretation of the above marginal effects is fairly straightforward.

Holding age and education fixed, wage tends to increase slightly with year.

Similarly, holding year and education fixed, wage tends to be highest for persons 35 to 60 years old.

Finally, holding year and age fixed, wage tends to be higher the higher the level of education.


The gam() function of the gam package fits similar models using smoothing splines, specified with its s() function.

> gam2 <- gam(wage ~ s(year, df = 4) + s(age, df = 5) + education, data = Wage)

> summary(gam2)

Call: gam(formula = wage ~ s(year, df = 4) + s(age, df = 5) + education,

data = Wage)

Deviance Residuals:

Min 1Q Median 3Q Max

-119.43 -19.70 -3.33 14.17 213.48

(Dispersion Parameter for gaussian family taken to be 1235.69)

Null Deviance: 5222086 on 2999 degrees of freedom

Residual Deviance: 3689770 on 2986 degrees of freedom

AIC: 29887.75

Number of Local Scoring Iterations: 2

Anova for Parametric Effects

Df Sum Sq Mean Sq F value Pr(>F)

s(year, df = 4) 1 27162 27162 21.981 2.877e-06 ***

s(age, df = 5) 1 195338 195338 158.081 < 2.2e-16 ***

education 4 1069726 267432 216.423 < 2.2e-16 ***

Residuals 2986 3689770 1236

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Anova for Nonparametric Effects

Npar Df Npar F Pr(F)

(Intercept)

s(year, df = 4) 3 1.086 0.3537

s(age, df = 5) 4 32.380 <2e-16 ***

education

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1


> par(mfrow = c(1, 3)) # split window into 3 sub-windows

> plot(gam2, se = TRUE, ylim = c(-30, 30), col = "red",

+ ylab = "f(year)", terms = attr(gam2$terms, which = "term.labels")[1])

> plot(gam2, se = TRUE, ylim = c(-50, 30), col = "red",

+ ylab = "f(age)", terms = attr(gam2$terms, which = "term.labels")[2])

> plot(gam2, se = TRUE, ylim = c(-50, 30), col = "red",

+ ylab = "f(education)", terms = attr(gam2$terms, which = "term.labels")[3])

[Figure: gam() plot output — three panels showing f(year), f(age), and f(education) with standard-error bands.]

In gam(), other smoothers such as local regression (the gam package includes its own lo() function) and poly() can be used as well; see library(help = gam) and help(lo).


We can also test the appropriate specification of the models.

For example, the effect of year looks quite linear.

The statistical significance of the non-linearity can be tested with an F-test.

Also, we can test whether year can be removed from the model altogether, i.e., whether it truly does not affect wage.

This can be accomplished by estimating a model without year and a model where year is included only linearly, without higher-order components, and comparing them with the full model.

> m1 <- gam(wage ~ s(age, df = 5) + education, data = Wage) # year removed

> m2 <- gam(wage ~ year + s(age, df = 5) + education, data = Wage) # year’s linear effect

> m3 <- gam(wage ~ s(year, df = 4) + s(age, df = 5) + education, data = Wage) # same as gam2

> anova(m1, m2, m3, test = "F")

Analysis of Deviance Table

Model 1: wage ~ s(age, df = 5) + education

Model 2: wage ~ year + s(age, df = 5) + education

Model 3: wage ~ s(year, df = 4) + s(age, df = 5) + education

Resid. Df Resid. Dev Df Deviance F Pr(>F)

1 2990 3711731

2 2989 3693842 1 17889.2 14.4771 0.0001447 ***

3 2986 3689770 3 4071.1 1.0982 0.3485661

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

The results show that the linear effect of year is significant but the spline non-linearity is not.

Thus, there is no evidence that a nonlinear function of year is needed.
