Regression Theory with Additive Models and CMARS
4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009
Gerhard-Wilhelm Weber *
Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan **,
Elcin Kartal, Efsun Kürüm, Ayse Özmen

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
* Faculty of Economics, Management and Law, University of Siegen, Germany;
  Center for Research on Optimization and Control, University of Aveiro, Portugal
** Department of Mathematics, Dicle University, Turkey
Content
• Introduction, Motivation
• Regression
• Additive Models
• MARS
• PRSS for MARS
• CQP for MARS
• Tikhonov Regularization for MARS
• Numerical Experience and Comparison
• Research Extensions
• Conclusion
Introduction
Learning from data has become very important in every field of science and technology, e.g., in
• the financial sector,
• quality improvement in manufacturing,
• computational biology,
• medicine and
• engineering.
Learning enables estimation and prediction.
Regression is mainly based on the problems and methods of
• least squares estimation,
• maximum likelihood estimation and
• classification.
New tools for data analysis, based on nonparametric regression and smoothing:
• additive (and multiplicative) models.
Introduction
CART vs. MARS [figure]
Introduction
Additive (and multiplicative) models (studied at IAM, METU):
• spline regression in additive models,
• spline regression in generalized additive models,
• MARS: piecewise linear (per dimension) regression in multiplicative models,
• spline regression for stochastic differential equations via additive and nonlinear models.
Regression: a Motivation
One of the motivations of this research has been the approximation of financial data points (x, y), e.g., coming from
• the stock market,
• credit rating,
• economic factors,
• company properties.
For example, to estimate the default probability of a particular credit, one of the last three kinds of data above is used.
There are different approaches for estimating the probability of a default; (binary choice) regression models are one of them. For example, we assume that the dependent variable $Y$, with $Y = 1$ ("default") or $Y = 0$ ("no default"), satisfies

$Y = F(X) + \varepsilon$,

where $X$ is the vector of independent variable(s) (input), such as a credit rating.

• Estimation of the default probability $P$:

$P = E[Y] = E[F(X) + \varepsilon] = F(X)$.

• This estimation can also be done via the following linear regression:

$Y = \alpha + \beta^T X + \varepsilon$.

Regression: a Motivation
• An estimate for the default probability of a corporate bond can be obtained as

$P = \alpha + \beta^T X$;

$\alpha$ and $\beta$ are unknown parameters. They can be estimated via linear regression methods or maximum likelihood estimation. In many important cases, these just mean least squares estimation.
Regression
Input vector $X = (X_1, X_2, \ldots, X_m)^T$ and output variable $Y$;
linear regression:
• $E(Y \mid X)$ is linear (…) and

$Y = E(Y \mid X_1, \ldots, X_m) + \varepsilon = \beta_0 + \sum_{j=1}^{m} \beta_j X_j + \varepsilon$.

• We look for $\beta = (\beta_0, \beta_1, \ldots, \beta_m)^T$ which minimizes

$RSS(\beta) := \sum_{i=1}^{N} \big( y_i - x_i^T \beta \big)^2$, or $RSS(\beta) = (y - X\beta)^T (y - X\beta)$;

$\hat\beta = (X^T X)^{-1} X^T y$, $\quad \mathrm{Cov}(\hat\beta) = (X^T X)^{-1} \sigma^2$.
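As a quick illustration of these formulas (our addition, not part of the original slides), here is a minimal numpy sketch computing $\hat\beta$ and its covariance estimate on synthetic data; the sample size, coefficients and noise level are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, m inputs (values chosen only for illustration).
N, m = 100, 3
X = rng.normal(size=(N, m))
beta_true = np.array([1.0, -2.0, 0.5])
y = 2.0 + X @ beta_true + rng.normal(scale=0.3, size=N)

# Augment with a column of ones for the intercept beta_0.
Xa = np.column_stack([np.ones(N), X])

# Least squares estimate beta_hat = (X^T X)^{-1} X^T y,
# solved via lstsq for numerical stability instead of an explicit inverse.
beta_hat, *_ = np.linalg.lstsq(Xa, y, rcond=None)

# Covariance estimate Cov(beta_hat) = (X^T X)^{-1} sigma^2,
# with sigma^2 estimated from the residuals.
resid = y - Xa @ beta_hat
sigma2 = resid @ resid / (N - m - 1)
cov_beta = np.linalg.inv(Xa.T @ Xa) * sigma2

print(beta_hat)
print(np.sqrt(np.diag(cov_beta)))  # standard errors
```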
Regression, Additive Models
• Classical understanding: additive separation of variables in the input space.
• New interpretation: separation of clusters and corresponding enumeration.

(A) $E(Y \mid x_{i1}, x_{i2}, \ldots, x_{im}) = \beta_0 + \sum_{j=1}^{m} f_j(x_{ij})$

The functions $f_j$ are estimated by a smoothing on a single coordinate.
Standard convention at $x$: $E(f_j(x)) = 0$.
Regression, Additive Models
• Backfitting algorithm (a Gauss-Seidel type algorithm).
• This procedure depends on the partial residual of $f_j$ against $x_{ij}$,

$r_{ij} = y_i - \hat\beta_0 - \sum_{k \neq j} \hat f_k(x_{ik})$,

with the standard convention at $x_{ij}$: $E(f_j(x_{ij})) = 0$.
• Each smooth function is estimated by holding all the other ones fixed:

initialization: $\hat\beta_0 := \mathrm{ave}(y_i \mid i = 1, \ldots, N)$, $\;\hat f_j(x_{ij}) \equiv 0 \;\; \forall i, j$;

cycle $j = 1, \ldots, m, \; 1, \ldots, m, \; 1, \ldots$:

$r_{ij} = y_i - \hat\beta_0 - \sum_{k \neq j} \hat f_k(x_{ik}) \quad (i = 1, \ldots, N)$.
Regression, Additive Models
$\hat f_j$ is updated by smoothing the partial residuals

$r_{ij} = y_i - \hat\beta_0 - \sum_{k \neq j} \hat f_k(x_{ik}) \quad (i = 1, \ldots, N)$

against $x_{ij}$, until the functions almost do not change.
• Convergence (condition):
• Convergence of the backfitting: write $f = (f_1^T, \ldots, f_m^T)^T$ and consider the maps

$\hat T_j : \mathbb{R}^{Nm} \to \mathbb{R}^{Nm}, \quad f \mapsto \Big( f_1, \ldots, f_{j-1}, \; S_j\big( y - \sum_{k \neq j} f_k \big), \; f_{j+1}, \ldots, f_m \Big)$,

where $S_j$ is the smoother acting on the $j$-th coordinate.

Regression, Additive Models
• Full cycle: $\hat T = \hat T_m \hat T_{m-1} \cdots \hat T_1$; then, $\hat T^l$ corresponds to $l$ full cycles.
• Backfitting always converges if all smoothers $S_j$ are symmetric and all eigenvalues of $\hat T$ are either $+1$ or in the interior of the unit ball: $|\lambda| < 1$. A small backfitting sketch follows.
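To make the cycle above concrete, the following sketch (our addition, not from the talk) runs backfitting with a simple k-nearest-neighbour running-mean smoother; the smoother choice, data and tolerance are illustrative assumptions.

```python
import numpy as np

def running_mean_smoother(x, r, k=7):
    """Smooth partial residuals r against x by averaging the k nearest points in x-order."""
    order = np.argsort(x)
    fhat = np.empty_like(r)
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - k // 2), min(len(x), pos + k // 2 + 1)
        fhat[i] = r[order[lo:hi]].mean()
    return fhat

def backfit(X, y, n_cycles=20, tol=1e-6):
    N, m = X.shape
    beta0 = y.mean()                      # initialization: beta0 = ave(y_i)
    f = np.zeros((N, m))                  # f_j(x_ij) = 0 for all i, j
    for _ in range(n_cycles):
        change = 0.0
        for j in range(m):                # cycle j = 1, ..., m, 1, ..., m, ...
            # partial residual r_ij = y_i - beta0 - sum_{k != j} f_k(x_ik)
            r = y - beta0 - f.sum(axis=1) + f[:, j]
            new_fj = running_mean_smoother(X[:, j], r)
            new_fj -= new_fj.mean()       # convention: E f_j = 0
            change = max(change, np.abs(new_fj - f[:, j]).max())
            f[:, j] = new_fj
        if change < tol:                  # stop when the functions almost do not change
            break
    return beta0, f

# Illustrative data: y = sin(x1) + x2^2 + noise.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
beta0, f = backfit(X, y)
print(beta0, np.abs(y - beta0 - f.sum(axis=1)).mean())
```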
Regression, Generalized Additive Models
• To extend the additive model to a wide range of distribution families: generalized additive models (GAM):

$G(\mu(X)) = \psi(X) = \beta_0 + \sum_{j=1}^{m} f_j(X_j)$,

• the $f_j$ are unspecified, $G$: link function;
• $f_j$: elements of a finite-dimensional space consisting, e.g., of splines;
• spline orders (or degrees): suitably chosen, depending on the density and variation properties of the corresponding data in the $x$ and $y$ components, respectively;
• the problem of specifying $\theta := (\beta_0, f_1, \ldots, f_m)^T$ becomes a finite-dimensional parameter estimation problem.
Regression, Generalized Additive Models, Splines
• Let $x_0, \ldots, x_N$ be $N + 1$ distinct knots of $[a, b]$, with $a = x_0 < x_1 < \ldots < x_N = b$.
• The function $g_k(x)$ on the interval $[a, b]$ is a spline of degree $k$ relative to the knots $x_j$ if:
(1) $g_k|_{[x_j, x_{j+1}]} \in \mathbb{P}_k$ (polynomial of degree $\le k$; $j = 0, \ldots, N-1$),
(2) $g_k \in C^{k-1}[a, b]$.
• The space of splines of degree $k$ on $[a, b]$ relative to the $N + 1$ distinct knots is called $\wp_k$; then, $\dim \wp_k = N + k$.
• In practice, a spline is represented by a different polynomial on each subinterval, and for this reason there could be a discontinuity in its $k$-th derivative at the internal knots $x_1, \ldots, x_{N-1}$.
• To characterize a spline of degree $k$: each piece $f_{k,j} := f_k|_{[x_j, x_{j+1}]}$ can be represented by

$f_{k,j}(x) = \sum_{i=0}^{k} g_{ij} (x - x_j)^i$, if $x \in [x_j, x_{j+1}]$,

with $(k+1)N$ coefficients $g_{ij}$ to be determined.

Regression, Generalized Additive Models, Splines
• For

$f_{k,j-1}^{(l)}(x_j) = f_{k,j}^{(l)}(x_j) \quad (j = 1, \ldots, N-1; \; l = 0, \ldots, k-1)$

to hold, there are $k(N-1)$ conditions, and the remaining degrees of freedom are

$(k+1)N - k(N-1) = N + k$.
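A small sketch of this dimension count (our addition; knots and data are illustrative): for degree k with N subintervals, a spline can be written in the truncated power basis $1, x, \ldots, x^k, (x - x_j)_+^k$ over the N - 1 internal knots, giving exactly N + k basis functions.

```python
import numpy as np

def truncated_power_basis(x, knots, k=3):
    """Basis 1, x, ..., x^k plus (x - x_j)_+^k for the internal knots.

    With N subintervals (N+1 knots including the boundary), this yields
    (k + 1) + (N - 1) = N + k functions, matching dim = N + k above.
    """
    internal = knots[1:-1]
    cols = [x ** i for i in range(k + 1)]
    cols += [np.maximum(x - t, 0.0) ** k for t in internal]
    return np.column_stack(cols)

# Illustrative least-squares fit of noisy data in this basis.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 120))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)
knots = np.linspace(0.0, 1.0, 6)          # N = 5 subintervals
B = truncated_power_basis(x, knots, k=3)  # 5 + 3 = 8 basis functions
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(B.shape, np.abs(y - B @ coef).mean())
```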
Clustering for Generalized Additive Models
• Financial markets have different kinds of trading activities. These activities work with
• short-, mid- or long-term horizons,
• from days and weeks to months and years.
• These data can sometimes be problematic for the models, e.g., given a longer horizon with sometimes less frequently recorded data, but at other times highly frequent measurements.
• The structure of the data may have particular properties:
i. larger variability,
ii. outliers,
iii. some data do not have any meaning.
• data variation: [figure]
Clustering for Generalized Additive Models
• For the sake of simplicity, write $N \equiv N_j$ for each interval $I_j$.
• Density: given intervals $I_1, \ldots, I_m$, the density of the input data in the $j$-th interval is

$D_j := \dfrac{\text{number of points } x_{ij} \text{ in } I_j}{\text{length of } I_j}$.

• Variation: if over the interval $I_j$ the data are $(x_{1j}, y_{1j}), \ldots, (x_{N_j j}, y_{N_j j})$:

$V_j := \sum_{i=1}^{N-1} \big| y_{i+1, j} - y_{ij} \big|$.

• If this value is big at many data points, the curvature of any approximating curve could be big:
→ occurrence of outliers,
→ instability of the model.
Clustering for Generalized Additive Models
• We group the data into intervals $I_1, \ldots, I_p$ (or cubes $Q_1, \ldots, Q_m$).
• For each interval $I_j$ (cube $Q_j$), the associated index of data variation is

$\mathrm{Ind}_j := D_j \cdot V_j$  or  $\mathrm{Ind}_j := d(D_j) \cdot v(V_j)$.

• In fact, from the viewpoints of both data fitting and complexity (or stability):
o cases with a high variation distributed over a very long interval are much less problematic than cases with a high variation over a short interval;
o oscillation, curvature, up to nonsmoothness
o → penalty! A small computational sketch of these indices follows.
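The following sketch (our addition; the interval grid and data are invented for illustration) computes the density $D_j$, variation $V_j$ and index $\mathrm{Ind}_j = D_j \cdot V_j$ per interval, as defined above.

```python
import numpy as np

def variation_indices(x, y, edges):
    """Per interval I_j = [edges[j], edges[j+1]):
    D_j   = (#points in I_j) / length(I_j),
    V_j   = sum_i |y_{i+1,j} - y_{i,j}|  (data sorted by x inside I_j),
    Ind_j = D_j * V_j.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    out = []
    for j in range(len(edges) - 1):
        mask = (x >= edges[j]) & (x < edges[j + 1])
        yj = y[mask]
        D = mask.sum() / (edges[j + 1] - edges[j])
        V = np.abs(np.diff(yj)).sum() if yj.size > 1 else 0.0
        out.append((D, V, D * V))
    return out

rng = np.random.default_rng(3)
x = rng.uniform(0, 3, 90)
y = np.where(x < 2, np.sin(6 * x), 3 * np.sin(40 * x))  # high variation on [2, 3)
for j, (D, V, ind) in enumerate(variation_indices(x, y, np.array([0., 1., 2., 3.]))):
    print(f"I_{j}: D={D:.1f}  V={V:.1f}  Ind={ind:.1f}")
```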
Regression, Additive Models
• The additive model can be fit to data. Given observations $(y_i, x_i)$ $(i = 1, 2, \ldots, N)$:
• penalized residual sum of squares (PRSS):

$PRSS(\beta_0, f_1, \ldots, f_m) := \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{m} f_j(x_{ij}) \Big)^2 + \sum_{j=1}^{m} \mu_j \int_a^b \big[ f_j''(t_j) \big]^2 \, dt_j$.

• $\mu_j \ge 0$: smoothing parameters (tradeoff between goodness of fit and smoothness);
• large values of $\mu_j$ yield smoother curves, smaller ones result in more fluctuation.
• New estimation method for the additive model with CQP:

$\min_{t, \beta_0, f} \; t$,
subject to $\;\sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{m} f_j(x_{ij}) \Big)^2 \le t^2, \;\; t \ge 0$,
$\;\int_a^b \big[ f_j''(t_j) \big]^2 \, dt_j \le M_j \quad (j = 1, 2, \ldots, m)$.

Regression, Additive Models
• The functions $f_j$ are splines: $f_j(x) = \sum_{l=1}^{d_j} \theta_l^j h_l^j(x)$.
• Then, we get:

$\min_{t, \beta, \theta} \; t$,
subject to $\;\| W(\beta_0, \theta) \|_2^2 \le t^2, \;\; t \ge 0$,
$\;\| V_j(\beta_0, \theta) \|_2^2 \le M_j \quad (j = 1, \ldots, m)$.

Regression, Additive Models
http://144.122.137.55/gweber/
MARS: Multivariate Adaptive Regression Splines
• Estimates general functions of high-dimensional arguments.
• An adaptive procedure.
• A nonparametric regression procedure.
• No specific assumption about the underlying functional relationship between the dependent and independent variables.
• Ability to estimate the contributions of the basis functions so that both the additive and the interactive effects of the predictors are allowed to determine the response variable.
• Uses expansions in piecewise linear basis functions of the form

$c^+(x, \tau) = [ +(x - \tau) ]_+$, $\quad c^-(x, \tau) = [ -(x - \tau) ]_+$, where $[q]_+ := \max\{0, q\}$.
MARS
Basic elements in the regression with MARS:
• Let us consider $Y = f(X) + \varepsilon$, $\; X = (X_1, X_2, \ldots, X_p)^T$.
• The goal is to construct reflected pairs for each input $X_j$ $(j = 1, 2, \ldots, p)$:

$c^+(x, \tau) = [ +(x - \tau) ]_+$, $\quad c^-(x, \tau) = [ -(x - \tau) ]_+$.

[figure: a reflected pair of piecewise linear basis functions with knot $\tau$]
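In code, a reflected pair is just two hinge functions; this small sketch (our addition) evaluates $c^+$ and $c^-$ for an illustrative knot $\tau$.

```python
import numpy as np

def reflected_pair(x, tau):
    """MARS reflected pair: c+(x, tau) = [+(x - tau)]_+ , c-(x, tau) = [-(x - tau)]_+ ."""
    return np.maximum(x - tau, 0.0), np.maximum(tau - x, 0.0)

x = np.linspace(-2.0, 2.0, 9)
c_plus, c_minus = reflected_pair(x, tau=0.5)   # tau chosen only for illustration
print(np.round(c_plus, 2))
print(np.round(c_minus, 2))
```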
MARS
• Set of basis functions:

$\wp := \big\{ (X_j - \tau)_+, \; (\tau - X_j)_+ \;\big|\; \tau \in \{x_{1,j}, x_{2,j}, \ldots, x_{N,j}\}, \; j \in \{1, 2, \ldots, p\} \big\}$.

• Thus, $f(X)$ can be represented by

$Y = \theta_0 + \sum_{m=1}^{M} \theta_m \psi_m(X) + \varepsilon$.

• $\psi_m$ $(m = 1, 2, \ldots, M)$ are basis functions from $\wp$, or products of two or more such functions; interaction basis functions are created by multiplying an existing basis function with a truncated linear function involving a new variable.
• Provided the observations are represented by the data $(x_i, y_i)$ $(i = 1, 2, \ldots, N)$:

$\psi_m(x) := \prod_{j=1}^{K_m} \big[ s_{\kappa_j^m} \cdot \big( x_{\kappa_j^m} - \tau_{\kappa_j^m} \big) \big]_+$.
MARS
• Two subalgorithms:
(i) Forward stepwise algorithm:
• Search for the basis functions.
• Minimization of some "lack of fit" criterion.
• The process stops when a user-specified value $M_{\max}$ is reached.
• Overfitting! So a backward deletion procedure is applied, decreasing the complexity of the model without degrading the fit to the data.
(ii) Backward stepwise algorithm:
• Remove from the model basis functions that contribute to the smallest increase in the residual squared error at each stage, producing an optimally estimated model $\hat f_\alpha$ with respect to each number of terms, called $\alpha$.
• $\alpha$ is related to some complexity of the estimation.
• To estimate the optimal value of $\alpha$:

MARS
• Alternative: generalized cross-validation,

$GCV := \dfrac{\sum_{i=1}^{N} \big( y_i - \hat f_\alpha(x_i) \big)^2}{N \big( 1 - M(\alpha)/N \big)^2}$, with $M(\alpha) := u + d \cdot K$,

where
$N$ := number of samples,
$u$ := number of independent basis functions,
$K$ := number of knots selected by the forward stepwise algorithm,
$d$ := cost of optimal basis.
A direct transcription in code follows.
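A direct transcription of the GCV formula (our addition; the inputs below are placeholders, and the default knot cost d = 3 is a common choice we assume, not a value from the slides):

```python
import numpy as np

def gcv(y, y_hat, u, K, d=3.0):
    """GCV = sum_i (y_i - f_hat(x_i))^2 / (N * (1 - M(alpha)/N)^2),
    with M(alpha) = u + d*K:
      u = number of independent basis functions,
      K = number of knots chosen by the forward pass,
      d = cost per knot (d = 3 is a common default; an assumption here)."""
    N = len(y)
    M_alpha = u + d * K
    return np.sum((y - y_hat) ** 2) / (N * (1.0 - M_alpha / N) ** 2)

# Illustrative call with made-up numbers.
rng = np.random.default_rng(4)
y = rng.normal(size=30)
y_hat = y + rng.normal(scale=0.5, size=30)
print(gcv(y, y_hat, u=4, K=3))
```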
PRSS for MARS

$PRSS := \sum_{i=1}^{N} \big( y_i - f(x_i) \big)^2 + \sum_{m=1}^{M_{\max}} \lambda_m \sum_{|\alpha|=1}^{2} \; \sum_{\substack{r < s \\ r, s \in V(m)}} \int \big[ D^{\alpha}_{r,s} \psi_m(t^m) \big]^2 \, dt^m$,

where $V(m) := \{ \kappa_j^m \mid j = 1, 2, \ldots, K_m \}$, $\; t^m := (t_{m_1}, t_{m_2}, \ldots, t_{m_{K_m}})^T$,
$\alpha = (\alpha_1, \alpha_2)^T$, $\; |\alpha| := \alpha_1 + \alpha_2$, $\; \alpha_1, \alpha_2 \in \{0, 1\}$,
and $D^{\alpha}_{r,s} \psi_m(t^m) := \dfrac{\partial^{|\alpha|} \psi_m}{\partial^{\alpha_1} t_r \, \partial^{\alpha_2} t_s}(t^m)$.

• Tradeoff between both accuracy and complexity.
• Penalty parameters $\lambda_m$.
Knot Selection, Grid Selection [figures]
Motivation
CQP and Tikhonov Regularization for MARS
• Collect the basis function values at the data and the parameters in vectors:

$\psi(d_i) := \big( 1, \psi_1(x_i), \ldots, \psi_M(x_i), \psi_{M+1}(x_i), \ldots, \psi_{M_{\max}}(x_i) \big)^T$,
$\theta := (\theta_0, \theta_1, \ldots, \theta_{M_{\max}})^T$,
$\psi(d) := \big( \psi(d_1), \ldots, \psi(d_N) \big)^T$.

• Approximate the penalty integrals by Riemann sums over a grid of points $\hat x^{m,i}$ built from the input data (grid indices $\sigma_{\kappa_j^m} \in \{0, 1, 2, \ldots, N+1\}$ for $j \in \{1, 2, \ldots, K_m\}$), with increments

$\Delta \hat x^{m,i} := \prod_{j=1}^{K_m} \Big( x^{\kappa_j^m}_{l_{\kappa_j^m}+1} - x^{\kappa_j^m}_{l_{\kappa_j^m}} \Big)$.
$L_m^2 := \sum_{i} \sum_{|\alpha|=1}^{2} \; \sum_{\substack{r < s \\ r, s \in V(m)}} \big[ D^{\alpha}_{r,s} \psi_m(\hat x^{m,i}) \big]^2 \, \Delta \hat x^{m,i}$;

$L := \mathrm{diag}(0, L_1, \ldots, L_{M_{\max}})$ is an $(M_{\max}+1) \times (M_{\max}+1)$ matrix.
• For a short representation, we can rewrite the approximate relation as

$PRSS \approx \| \psi(d)\,\theta - y \|_2^2 + \sum_{m=1}^{M_{\max}} \lambda_m L_m^2 \theta_m^2$.
CQP and Tikhonov Regularization for MARS
• In case of the same penalty parameter $\lambda := \lambda_m$ (with $\lambda = \varphi^2$) for all $m$:

$PRSS \approx \| \psi(d)\,\theta - y \|_2^2 + \lambda \| L\theta \|_2^2$ (Tikhonov regularization).
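For a fixed $\lambda$, this is a standard regularized least-squares problem; a minimal sketch (our addition; $\psi(d)$, $y$ and $\lambda$ below are stand-ins) solves it via the normal equations $(\psi^T\psi + \lambda L^T L)\,\theta = \psi^T y$, using the diagonal $L$ found later in the slides.

```python
import numpy as np

def tikhonov(Psi, y, L, lam):
    """Minimize ||Psi @ theta - y||^2 + lam * ||L @ theta||^2
    via the normal equations (Psi^T Psi + lam L^T L) theta = Psi^T y."""
    A = Psi.T @ Psi + lam * L.T @ L
    return np.linalg.solve(A, Psi.T @ y)

# Stand-in problem data (shapes as in the text: N x (Mmax+1) design, diagonal L).
rng = np.random.default_rng(5)
N, p = 30, 6
Psi = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
y = rng.normal(size=N)
L = np.diag([0.0, 1.8419, 0.7514, 0.9373, 2.1996, 0.3905])  # diagonal from the slides
theta = tikhonov(Psi, y, L, lam=0.5)
print(theta)
```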
CQP for MARS
• Conic quadratic programming:

$\min_{t, \theta} \; t$,
subject to $\;\| \psi(d)\,\theta - y \|_2 \le t$, $\quad \| L\theta \|_2 \le M$.

• In general:

$\min_x \; c^T x$, subject to $\;\| D_i x - d_i \|_2 \le p_i^T x - q_i \quad (i = 1, 2, \ldots, k)$.
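In the talk the CQP was solved with MOSEK; as a hedged illustration (our addition, not the original code), the same second-order cone program can be stated in a few lines with the open-source cvxpy modeling layer. The problem data below are stand-ins.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
N, p = 30, 6
Psi = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])   # stand-in psi(d)
y = rng.normal(size=N)
L = np.diag([0.0, 1.8419, 0.7514, 0.9373, 2.1996, 0.3905])
M = 0.5                                                            # bound on ||L theta||_2

t = cp.Variable()
theta = cp.Variable(p)
prob = cp.Problem(
    cp.Minimize(t),
    [cp.norm(Psi @ theta - y, 2) <= t,      # ||psi(d) theta - y||_2 <= t
     cp.norm(L @ theta, 2) <= M],           # ||L theta||_2 <= M
)
prob.solve()                                 # any installed SOCP solver will do
print(t.value, np.linalg.norm(L @ theta.value))
```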
CQP for MARS
• Moreover, $(t, \theta, \chi, \eta, \omega_1, \omega_2)$ is a primal-dual optimal solution if and only if

$\chi := \begin{pmatrix} 0_N & \psi(d) \\ 1 & 0_{M_{\max}+1}^T \end{pmatrix} \begin{pmatrix} t \\ \theta \end{pmatrix} + \begin{pmatrix} -y \\ 0 \end{pmatrix}$,

$\eta := \begin{pmatrix} 0_{M_{\max}+1} & L \\ 0 & 0_{M_{\max}+1}^T \end{pmatrix} \begin{pmatrix} t \\ \theta \end{pmatrix} + \begin{pmatrix} 0_{M_{\max}+1} \\ M \end{pmatrix}$,

$\omega_1^T \chi = 0, \quad \omega_2^T \eta = 0$,

$\chi, \omega_1 \in L^{N+1}, \quad \eta, \omega_2 \in L^{M_{\max}+2}$,

where $L^{N+1}$ and $L^{M_{\max}+2}$ denote second-order ("ice cream") cones.
CQP for MARS
• CQPs belong to the well-structured convex problems.
• Interior point methods:
• better complexity bounds,
• better practical performance.
C-MARS
Numerical Experience and Comparison
• We had the following data (30 observations; first 15 in the upper block, last 15 in the lower block):

X1: 1.5554  1.5326  -0.1823  0.1627  0.5687  0.1706  0.2041  -0.1823  -0.82  -0.7234  0.4446  -0.3291  -1.5583  1.2706  1.7555
X2: 0.1849  1.1538  0.7586  -1.5363  1.906  0.3761  1.3323  -0.0064  -1.7275  1.141  0.3761  0.5673  -0.1976  0.7586  0.1849
X3: 1.264  1.2023  -1.0995  0.8529  1.3051  -0.3802  -0.7913  0.1336  0.2363  -1.0995  -0.0719  -0.894  -1.0995  0.9557  1.5722
X4: 1.2843  1.0175  -0.9676  0.7408  1.0635  -0.506  -0.7937  -0.0564  0.0455  -0.9676  -0.2482  -0.8557  -0.9676  0.8707  1.7339
X5: -0.7109  0.1777  0.1422  0.0355  3.2699  0.3554  -0.1777  1.5283  -0.0711  0.3554  0.8886  0.4621  -0.9241  -0.9241  -0.0711
Y:  0.67  0.9047  -0.197  -1.0108  0.1616  0.2984  -0.6039  0.8823  -1.6832  0.9531  -0.3208  0.0507  -0.3916  0.44  0.263

X1: 0.0474  -0.8713  -0.2158  0.2179  1.5426  -1.16  0.9857  0.6752  0.5402  -1.4528  1.9349  -0.8299  -0.681  0.7304  -1.1305
X2: 0.9498  -0.1976  -1.7275  -0.9626  1.3323  -0.9626  0.1849  -1.345  1.3323  -0.0064  0.1849  0.3761  -1.345  -0.7713  -0.0064
X3: 0.0308  -0.6885  1.0584  0.5446  0.5446  -0.483  0.4419  1.264  0.0308  -1.3051  2.086  -0.5857  -0.2775  1.5722  -1.3051
X4: 0.1543  -0.7278  1.0046  0.3752  0.3752  -0.5839  0.2613  1.2843  -0.1543  -1.0635  2.5631  -0.6578  -0.4241  1.7339  -1.0635
X5: 1.1018  0.6753  -0.391  -0.2843  1.4217  0.4621  -0.8175  0.7819  0.2488  1.5283  -0.1777  -1.7771  0.4621  -1.0307  0.3554
Y:  1.1477  -0.3916  -0.4624  -1.0993  2.8639  -1.0285  0.1923  -0.7631  2.05  1.0238  0.9177  -1.2055  -0.3208  -0.5862  -0.6216
• We constructed model functions for these data using the MARS software, where we selected the maximum number of basis elements $M_{\max} = 5$. Then:
Model 1: ω = 1
BF1 = max{0, X2 + 1.728};
Y = -1.081 + 0.626 * BF1

Numerical Experience and Comparison
Model 2: ω = 2
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1
Y = -1.073 + 0.499 * BF1 + 0.656 * BF2

Model 3: ω = 3 (>>> best model)
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1
BF4 = max{0, X3 + 0.586} * BF1
Y = -1.176 + 0.422 * BF1 + 0.597 * BF2 + 0.236 * BF4
• and, finally,

Model 4: ω = 4
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1
BF3 = max{0, 0.462 - X5} * BF1
BF4 = max{0, X3 + 0.586} * BF1
Y = -1.242 + 0.555 * BF1 + 0.484 * BF2 - 0.093 * BF3 + 0.2246 * BF4
Model 5: ω = 5
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1
BF3 = max{0, 0.462 - X5} * BF1
BF4 = max{0, X3 + 0.586} * BF1
BF5 = max{0, -0.586 - X3} * BF1
Y = -1.248 + 0.487 * BF1 + 0.486 * BF2 - 0.118 * BF3 + 0.282 * BF4 + 0.263 * BF5
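Written out as code, the best model above (Model 3) is just nested hinge functions; this sketch (our addition) evaluates it on one row of the data table, with inputs on the standardized scale shown there.

```python
import numpy as np

def model3(x):
    """Model 3 from the slides: Y = -1.176 + 0.422*BF1 + 0.597*BF2 + 0.236*BF4,
    with x = (X1, ..., X5) on the standardized scale of the data above."""
    X2, X3, X5 = x[1], x[2], x[4]
    bf1 = max(0.0, X2 + 1.728)
    bf2 = max(0.0, X5 - 0.462) * bf1           # interaction: product of hinges
    bf4 = max(0.0, X3 + 0.586) * bf1
    return -1.176 + 0.422 * bf1 + 0.597 * bf2 + 0.236 * bf4

# First observation of the data table above; the observed Y for this row was 0.67.
x1 = np.array([1.5554, 0.1849, 1.2640, 1.2843, -0.7109])
print(model3(x1))
```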
• Then, we considered a large model with five basis functions; we found (writing a MATLAB code):

$L = \mathrm{diag}(0, \; 1.8419, \; 0.7514, \; 0.9373, \; 2.1996, \; 0.3905)$

Numerical Experience and Comparison
• We constructed models using different values for $M$ in the optimization problem, which was solved by MOSEK (CQP).
• Our algorithm always constructs a model with 5 parameters; in the case of Salford MARS, there are 1, 2, 3, 4 or 5 parameters.
RESULTS OF SALFORD MARS

ω   z = RSS   t = ||ψ(d)θ − y||₂   ||Lθ||₂   GCV
1   17.6425   4.2003               1.1531    0.771
2   11.1870   3.3447               1.0430    0.613
3    7.7824   2.7897               1.0368    0.550
4    6.6126   2.5715               1.1967    0.626
5    6.2961   2.5092               1.1600    0.840
RESULTS OF OUR APPROACH

M         ω   z = ||ψ(d)θ − y||₂   t = ||Lθ||₂
0.05      5   5.16894              0.05
0.1       5   4.959342             0.1
0.15      5   4.755559             0.15
0.2       5   4.557617             0.2
0.25      5   4.365811             0.25
0.265     5   4.3095               0.2650
0.275     5   4.2723               0.2750
0.285     5   4.2354               0.2850
0.2865    5   4.2299               0.2865
0.2875    5   4.2262               0.2875
0.2885    5   4.2226               0.2885
0.2895    5   4.2189               0.2895
0.28965   5   4.2183               0.2897
0.28975   5   4.2180               0.2897
0.28985   5   4.2176               0.2899
0.28995   5   4.2172               0.2899
0.2940    5   4.2024               0.2940
0.2945    5   4.2006               0.2945
0.295     5   4.1988               0.2950
0.3       5   4.180557             0.3
0.35      5   4.002338             0.35
0.4       5   3.831675             0.4
0.45      5   3.669118             0.45
0.5       5   3.515233             0.5
0.55      5   3.370588             0.55
0.552     5   3.3650               0.5520
0.555     5   3.3567               0.5550
0.558     5   3.3483               0.558
0.560     5   3.3428               0.5600
0.561     5   3.3401               0.5610
0.562     5   3.3373               0.5620
0.565     5   3.3291               0.5650
0.575     5   3.3019               0.5750
0.585     5   3.2751               0.5850
0.595     5   3.2488               0.5950
0.6       5   3.235746             0.6
0.65      5   3.111253             0.65
0.7       5   2.997622             0.7
0.75      5   2.895324             0.75
0.8       5   2.804764             0.8
0.805     5   2.7964               0.8050
0.810     5   2.7881               0.8100
0.820     5   2.7719               0.8200
0.830     5   2.7562               0.8300
0.840     5   2.7410               0.8400
0.85      5   2.726261             0.85
0.9       5   2.660023             0.9
0.95      5   2.60612              0.95
0.96      5   2.5968               0.96
0.97      5   2.5880               0.97
0.98      5   2.5797               0.98
0.99      5   2.5718               0.99
1         5   2.564459             1
2         5   2.509165             1.16009
2.1       5   2.509165             1.16009
2.2       5   2.509165             1.16009
2.3       5   2.509165             1.16007
2.4       5   2.509165             1.16008
2.5       5   2.509165             1.16001
2.6       5   2.509165             1.16007
2.7       5   2.509165             1.16007
2.8       5   2.509165             1.16009
2.9       5   2.509165             1.16009
3         5   2.509165             1.16009
4         5   2.509165             1.160084
• We drew L-curves: $\| \psi(d)\theta - y \|_2$ plotted against $\| L\theta \|_2$.

[figure: L-curves; vertical axis $\| \psi(d)\theta - y \|_2$, horizontal axis $\| L\theta \|_2$]

Numerical Experience and Comparison
• Conclusion: based on the L-curve criterion and for the given data, our solution is better than the Salford MARS solution.
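To reproduce the idea of the L-curve plots (our sketch, not the original code): sweep the Tikhonov parameter, record $\| \psi(d)\theta - y \|_2$ against $\| L\theta \|_2$, and look for the corner of the resulting curve. Problem data below are stand-ins.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
N, p = 30, 6
Psi = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])  # stand-in psi(d)
y = rng.normal(size=N)
L = np.diag([0.0, 1.8419, 0.7514, 0.9373, 2.1996, 0.3905])

res, reg = [], []
for lam in np.logspace(-3, 3, 60):          # sweep the regularization parameter
    theta = np.linalg.solve(Psi.T @ Psi + lam * L.T @ L, Psi.T @ y)
    res.append(np.linalg.norm(Psi @ theta - y))
    reg.append(np.linalg.norm(L @ theta))

plt.plot(reg, res, "o-")
plt.xlabel(r"$\|L\theta\|_2$")
plt.ylabel(r"$\|\psi(d)\theta - y\|_2$")
plt.title("L-curve (corner = good tradeoff)")
plt.show()
```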
• All test data sets are also compared according to performance measures such as MSE, MAE, correlation coefficient, R², PRESS, Mallows' Cp, etc.
• These measures are based on the average of nine values (one for each fold and each replication).
Numerical Experience and Comparison
C-MARS
Please find much more numerical experience and comparison in:
Yerlikaya, F., A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing, MSc thesis, Institute of Applied Mathematics, METU, Ankara, 2008.
Piecewise Linear Functions - Stock Market
[figures generated by Erik Kropat]
Forward Stepwise Algorithm Revisited
[figure sequence: forward stepwise basis selection; high complexity]
Regularization & Uncertainty, Robust Optimization (Laurent El Ghaoui)
[figure]
References
• Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.
• Breiman, L., Friedman, J.H., Olshen, R., and Stone, C., Classification and Regression Trees, Belmont, CA: Wadsworth Int. Group, 1984.
• Craven, P., and Wahba, G., Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik 31 (1979) 377-403.
• Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991) 1-141.
• Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
• Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning, Springer Verlag, NY, 2001.
• MOSEK software, http://www.mosek.com/ .
• Myers, R.H., and Montgomery, D.C., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley, 2002.
• Nemirovski, A., Lectures on modern convex optimization, Israel Institute of Technology (2002), http://iew3.technion.ac.il/Labs/Opt/LN/Final.pdf.
• Nesterov, Y.E., and Nemirovskii, A.S., Interior Point Methods in Convex Programming, SIAM, 1993.
• Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization 56, 5-6 (2007) 675-698.
• Taylan, P., Weber, G.-W., and Yerlikaya, F., Continuous optimization applied in MARS for modern applications in finance, science and technology, in: ISI Proceedings of 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies", Neringa, Lithuania, May 20-23, 2008.