Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1...
Transcript of Advanced Topics I Lecture 11 - GitHub Pages · 2021. 5. 21. · Regression Ridge Regression 11.1...
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.1
Lecture 11Advanced Topics ISTAT 8020 Statistical Methods IISeptember 24, 2020
Whitney HuangClemson University
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.2
Agenda
1 Nonlinear Regression
2 Non-parametric Regression
3 Ridge Regression
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.3
Moving Away From Linear Regression
We have mainly focused on linear regression so far
The class of polynomial regression can be thought as astarting point for relaxing the linear assumption
In this lecture we are going to discuss non-linear andnon-parametric regression modeling
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.4
Population of the United States
Let’s look at the USPop data set, a bulit-in data set in R. This isa decennial time-series from 1790 to 2000.
* * * * * * * * **
**
**
* **
**
**
*
1800 1850 1900 1950 2000
0
50
100
150
200
250
300
U.S. population
Census year
Pop
ulat
ion
in m
illio
ns
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.5
Logistic Growth CurveA simple model for population growth is the logistic growthmodel,
Y = m(X,φ) + ε
=φ1
1 + exp [−(x− φ2)/φ3]+ ε
−5 0 5 10
0
2
4
6
8
10
Logistic growth curve
We are going to fit a logistic growth curve to the U.S.population data set
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.6
Fitting logistic growth curve to the U.S. population
φ1 = 440.83, φ2 = 1976.63, φ3 = 46.29
* * * * * * * * * * * * * * * * **
**
**
1800 1850 1900 1950 2000 2050 2100
0
100
200
300
400
500
Census year
Pop
ulat
ion
(mill
ions
)
*
NLRPolyR
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.7
Non-parametric Regression
Let’s use the motor-cycle impact data as an illustrativeexample. This data set is taken from a simulated motor-cyclecrash experiment in order to study the efficacy of crashhelmets.
***** *** **** ****** *********
*
**
***
**
****
**
*
*
**
**
*
*
*
****
*
*
**
****
*** **
*
**
****
*
*
*
*
*
*
*
*
*
***
*
**
*
*
*
*
** *
*
*
*
*
*
*
*
**
*
*
*
* **
**
*
*
*
**
***
* *
** *
**
****
*
10 20 30 40 50
−100
−50
0
50
Time (ms)
Acc
eler
atio
n (g
)
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.8
Non-parametric Regression Fits
The main idea “non-parametric” regression modeling is to fitthe data “locally”. Therefore, no global structure assumptionmade when fitting the data.
***** *** **** ****** *********
*
**
***
**
****
**
*
*
**
**
*
*
*
****
*
*
**
****
*** **
*
**
****
*
*
*
*
*
*
*
*
*
***
*
**
*
*
*
*
** *
*
*
*
*
*
*
*
**
*
*
*
* **
**
*
*
*
**
***
* *
** *
**
****
*
10 20 30 40 50
−100
−50
0
50
Time (ms)
Acc
eler
atio
n (g
)
Regression SplineGeneralized Additive ModelSmoothing Spline
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.9
Regression Tree
Partitioning X-space into sub-regions and fit simple modelto each sub-regionThe partitioning pattern is encoded in a tree structure
We will use Major League Baseball Hitters Data from the1986–1987 season to give you a quick idea of what aregression tree might look like
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.10
Regression Tree
Years < 5
Hits >= 42
Years < 4
Hits < 118
Years < 7 Hits < 185
Years < 6
Hits < 142
Hits >= 152
Hits < 160
>= 5
< 42
>= 4
>= 118
>= 7 >= 185
>= 6
>= 142
< 152
>= 160
536100%
22634%
20431%
14221%
32910%
4543%
69766%
46534%
33510%
51824%
94932%
91429%
6223%
94926%
85111%
102614%
95110%
6883%
10756%
11705%
13283%
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.11
Longley’s Economic Regression Data
We are going to use Longley’s data set, which provides awell-known example of multicollinearity, to illustrate Ridgeregression.
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.12
Linear Regression Fit
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.13
The Predictor Variables are Highly Correlated
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.14
Ridge Regression as Multicollinearity Remedy
Recall least squares suffers because (XTX) is almostsingular thereby resulting in highly unstable parameterestimates
Modification of least squares that overcomesmulticollinearity problem
βridge = argminβ
(Y −Zβ
)T (Y −Zβ
)s.t.
p−1∑j=1
β2j ≤ t,
where Z is assumed to be standardized and Y isassumed to be centered
Ridge regression results in (slightly) biased but morestable estimates and better prediction performance
Advanced Topics I
Nonlinear Regression
Non-parametricRegression
Ridge Regression
11.15
Ridge Regression Fit