Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1...

15
Advanced Topics I Nonlinear Regression Non-parametric Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney Huang Clemson University

Transcript of Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1...

Page 1: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.1

Lecture 11Advanced Topics ISTAT 8020 Statistical Methods IISeptember 24, 2020

Whitney HuangClemson University

Page 2: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.2

Agenda

1 Nonlinear Regression

2 Non-parametric Regression

3 Ridge Regression

Page 3: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.3

Moving Away From Linear Regression

We have mainly focused on linear regression so far

The class of polynomial regression can be thought as astarting point for relaxing the linear assumption

In this lecture we are going to discuss non-linear andnon-parametric regression modeling

Page 4: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.4

Population of the United States

Let’s look at the USPop data set, a bulit-in data set in R. This isa decennial time-series from 1790 to 2000.

* * * * * * * * **

**

**

* **

**

**

*

1800 1850 1900 1950 2000

0

50

100

150

200

250

300

U.S. population

Census year

Pop

ulat

ion

in m

illio

ns

Page 5: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.5

Logistic Growth CurveA simple model for population growth is the logistic growthmodel,

Y = m(X,φ) + ε

=φ1

1 + exp [−(x− φ2)/φ3]+ ε

−5 0 5 10

0

2

4

6

8

10

Logistic growth curve

We are going to fit a logistic growth curve to the U.S.population data set

Page 6: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.6

Fitting logistic growth curve to the U.S. population

φ1 = 440.83, φ2 = 1976.63, φ3 = 46.29

* * * * * * * * * * * * * * * * **

**

**

1800 1850 1900 1950 2000 2050 2100

0

100

200

300

400

500

Census year

Pop

ulat

ion

(mill

ions

)

*

NLRPolyR

Page 7: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.7

Non-parametric Regression

Let’s use the motor-cycle impact data as an illustrativeexample. This data set is taken from a simulated motor-cyclecrash experiment in order to study the efficacy of crashhelmets.

***** *** **** ****** *********

*

**

***

**

****

**

*

*

**

**

*

*

*

****

*

*

**

****

*** **

*

**

****

*

*

*

*

*

*

*

*

*

***

*

**

*

*

*

*

** *

*

*

*

*

*

*

*

**

*

*

*

* **

**

*

*

*

**

***

* *

** *

**

****

*

10 20 30 40 50

−100

−50

0

50

Time (ms)

Acc

eler

atio

n (g

)

Page 8: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.8

Non-parametric Regression Fits

The main idea “non-parametric” regression modeling is to fitthe data “locally”. Therefore, no global structure assumptionmade when fitting the data.

***** *** **** ****** *********

*

**

***

**

****

**

*

*

**

**

*

*

*

****

*

*

**

****

*** **

*

**

****

*

*

*

*

*

*

*

*

*

***

*

**

*

*

*

*

** *

*

*

*

*

*

*

*

**

*

*

*

* **

**

*

*

*

**

***

* *

** *

**

****

*

10 20 30 40 50

−100

−50

0

50

Time (ms)

Acc

eler

atio

n (g

)

Regression SplineGeneralized Additive ModelSmoothing Spline

Page 9: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.9

Regression Tree

Partitioning X-space into sub-regions and fit simple modelto each sub-regionThe partitioning pattern is encoded in a tree structure

We will use Major League Baseball Hitters Data from the1986–1987 season to give you a quick idea of what aregression tree might look like

Page 10: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.10

Regression Tree

Years < 5

Hits >= 42

Years < 4

Hits < 118

Years < 7 Hits < 185

Years < 6

Hits < 142

Hits >= 152

Hits < 160

>= 5

< 42

>= 4

>= 118

>= 7 >= 185

>= 6

>= 142

< 152

>= 160

536100%

22634%

20431%

14221%

32910%

4543%

69766%

46534%

33510%

51824%

94932%

91429%

6223%

94926%

85111%

102614%

95110%

6883%

10756%

11705%

13283%

Page 11: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.11

Longley’s Economic Regression Data

We are going to use Longley’s data set, which provides awell-known example of multicollinearity, to illustrate Ridgeregression.

Page 12: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.12

Linear Regression Fit

Page 13: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.13

The Predictor Variables are Highly Correlated

Page 14: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.14

Ridge Regression as Multicollinearity Remedy

Recall least squares suffers because (XTX) is almostsingular thereby resulting in highly unstable parameterestimates

Modification of least squares that overcomesmulticollinearity problem

βridge = argminβ

(Y −Zβ

)T (Y −Zβ

)s.t.

p−1∑j=1

β2j ≤ t,

where Z is assumed to be standardized and Y isassumed to be centered

Ridge regression results in (slightly) biased but morestable estimates and better prediction performance

Page 15: Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1 Lecture 11 Advanced Topics I STAT 8020 Statistical Methods II September 24, 2020 Whitney

Advanced Topics I

Nonlinear Regression

Non-parametricRegression

Ridge Regression

11.15

Ridge Regression Fit