Transcript of: OPTIMIZATION OF MODELS: LOOKING FOR THE BEST STRATEGY
Pavel Kordík, Oleg Kovářík, Miroslav Šnorek
Department of Computer Science and Engineering, Faculty of Electrical Engineering,
Czech Technical University in Prague, Czech Republic
[email protected] (Pavel Kordík)
2/18
Motivation
• Continuous optimization
• Several methods available
• Which is the best?
• Is there any strategy to choose the best method for a given task?
3/18
Our task: FAKE GAME research project
[Diagram: the FAKE GAME framework. Stages: PROBLEM IDENTIFICATION, DATA COLLECTION, DATA INSPECTION, DATA INTEGRATION, DATA CLEANING, DATA WAREHOUSING, INPUT DATA, AUTOMATED DATA PREPROCESSING, GAME ENGINE (a group of models). The FAKE interface provides math equations, feature ranking, interesting behaviour, credibility estimation, class boundaries and relationships of variables; the GAME engine performs classification, prediction, identification and regression.]
4/18
The GAME engine for automated data mining
[Diagram: preprocessed data enters the GAME engine, which builds a group of models used for classification, regression, prediction, or identification.]
How does it work inside?
5/18
The GAME engine: building a model
Group of Adaptive Models Evolution (GAME)
Inductive model
Heterogeneous units
A niching genetic algorithm (explained later) is employed in each layer to optimize the topology of GAME networks.
[Diagram: input variables x1, …, xn feed the first layer of units, then a second layer, and finally the output variable. Units in each layer are evolved by the genetic algorithm.]

Polynomial unit: $y = \sum_{i=1}^{m} a_i \prod_{j=1}^{n} x_j^{r_{ji}} + a_0$

Linear unit: $y = \sum_{i=1}^{n} a_i x_i + a_{n+1}$
6/18
Heterogeneous units in GAME (each unit receives inputs x1, …, xn):

Linear (LinearNeuron): $y = \sum_{i=1}^{n} a_i x_i + a_{n+1}$

Polynomial (CombiNeuron): $y = \sum_{i=1}^{m} a_i \prod_{j=1}^{n} x_j^{r_{ji}} + a_0$

Gaussian (GaussianNeuron): $y = (1 + a_{n+2})\, e^{-\frac{\sum_{i=1}^{n} (a_i x_i - a_{n+1})^2}{a_{n+2}^2}} + a_0$

Sin (SinusNeuron): $y = a_{n+2} \sin\left(a_{n+1} \sum_{i=1}^{n} a_i x_i + a_{n+3}\right) + a_0$

Logistic (SigmNeuron): $y = \frac{1}{1 + e^{-\sum_{i=1}^{n} a_i x_i}} + a_0$

Exponential (ExpNeuron): $y = a_{n+2}\, e^{a_{n+1} \sum_{i=1}^{n} a_i x_i} + a_0$

Rational (PolyFractNeuron): a ratio of two polynomials in the inputs, combining linear terms $a_i x_i$, squared terms $x_i^2$ and cross terms $x_i x_j$ with a bias $a_0$

Universal (BPNetwork): $y = \sum_{q=1}^{2n+1} \psi_q\left(\sum_{p=1}^{n} \phi_{pq}(x_p)\right)$
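Two of these transfer functions written out as code. A minimal sketch with illustrative names (not the actual GAME classes), assuming the linear form $y = \sum_{i=1}^{n} a_i x_i + a_{n+1}$ and the logistic form $y = 1/(1 + e^{-\sum a_i x_i}) + a_0$:

```python
# Sketch of two GAME-style unit transfer functions; illustrative
# implementations, not the actual GAME code.

import math

def linear_unit(a, x):
    # a holds n weights followed by the bias a_{n+1} as the last element
    return sum(ai * xi for ai, xi in zip(a, x)) + a[-1]

def logistic_unit(a, x, a0=0.0):
    # sigmoid of the weighted input sum, shifted by the bias a_0
    s = sum(ai * xi for ai, xi in zip(a, x))
    return 1.0 / (1.0 + math.exp(-s)) + a0

print(linear_unit([2.0, -1.0, 0.5], [1.0, 1.0]))  # 2 - 1 + 0.5 = 1.5
print(logistic_unit([0.0, 0.0], [3.0, 4.0]))      # sigmoid(0) = 0.5
```

Each unit is just a parametric function of its inputs; training a unit means fitting the coefficient vector `a`.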
7/18
Optimization of coefficients (learning)
Gaussian (GaussianNeuron): $y' = (1 + a_{n+2})\, e^{-\frac{\sum_{i=1}^{n} (a_i x_i - a_{n+1})^2}{a_{n+2}^2}} + a_0$
We have inputs x1, x2, …, xn and target output y in the training data set
We are looking for optimal values of coefficients a0, a1, …, an+2
y’
The difference between unit output y’ and the target value y should be minimal for all vectors from the training data set
$E = \sum_{i=1}^{m} (y'_i - y_i)^2$
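The training criterion can be made concrete in a few lines of code. A minimal sketch (illustrative names, not the GAME implementation), using a linear unit $y' = \sum a_i x_i + a_{n+1}$:

```python
# Sketch of the training criterion E = sum_i (y'_i - y_i)^2 for one unit.
# Illustrative only -- not the actual GAME implementation.

def linear_unit(coeffs, x):
    """y' = a_1*x_1 + ... + a_n*x_n + a_{n+1} (bias is the last coefficient)."""
    *weights, bias = coeffs
    return sum(a * xi for a, xi in zip(weights, x)) + bias

def error(coeffs, data):
    """Sum of squared differences between unit output y' and target y."""
    return sum((linear_unit(coeffs, x) - y) ** 2 for x, y in data)

# Toy training set generated by y = 2*x1 - x2 + 1
data = [((1.0, 0.0), 3.0), ((0.0, 1.0), 0.0),
        ((1.0, 1.0), 2.0), ((2.0, 1.0), 4.0)]

print(error([2.0, -1.0, 1.0], data))  # exact coefficients -> 0.0
print(error([0.0, 0.0, 0.0], data))   # all-zero coefficients -> 29.0
```

Learning the unit means searching the coefficient space for the minimum of this error surface, which is where the optimization methods compared later come in.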
8/18
What is an analytic gradient and how to derive it?
Error of the unit for training data (energy surface)
Gradient of the error
Unit with Gaussian transfer function
Partial derivative of the error with respect to coefficient a_i
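In the notation above (output $y'_k$ on the $k$-th of $m$ training vectors), the derivation is one application of the chain rule; a sketch of the step the slide refers to:

```latex
% Partial derivative of E = \sum_k (y'_k - y_k)^2 with respect to a_i:
\frac{\partial E}{\partial a_i}
  = \frac{\partial}{\partial a_i} \sum_{k=1}^{m} (y'_k - y_k)^2
  = \sum_{k=1}^{m} 2\,(y'_k - y_k)\,\frac{\partial y'_k}{\partial a_i}
% The gradient collects these partials over all coefficients:
\nabla E = \left( \frac{\partial E}{\partial a_0},
                  \frac{\partial E}{\partial a_1},
                  \ldots,
                  \frac{\partial E}{\partial a_{n+2}} \right)
```

Only the factor $\partial y'_k / \partial a_i$ depends on the particular transfer function.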
9/18
Partial derivatives of the Gauss unit
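Assuming a Gaussian unit of the form $y' = (1+a_{n+2})\,e^{-\rho} + a_0$ with $\rho = \sum_{i=1}^{n}(a_i x_i - a_{n+1})^2 / a_{n+2}^2$ (a reconstruction from this transcript; the exact coefficient indexing is an assumption), the partials the slide shows follow mechanically:

```latex
\frac{\partial y'}{\partial a_0} = 1
\qquad
\frac{\partial y'}{\partial a_i}
  = -(1+a_{n+2})\,e^{-\rho}\,
    \frac{2\,x_i\,(a_i x_i - a_{n+1})}{a_{n+2}^2},
  \quad i = 1,\dots,n
\qquad
\frac{\partial y'}{\partial a_{n+1}}
  = (1+a_{n+2})\,e^{-\rho}\,
    \frac{2 \sum_{i=1}^{n} (a_i x_i - a_{n+1})}{a_{n+2}^2}
\qquad
\frac{\partial y'}{\partial a_{n+2}}
  = e^{-\rho} + (1+a_{n+2})\,e^{-\rho}\,\frac{2\rho}{a_{n+2}}
% Plugged into the chain rule:
\frac{\partial E}{\partial a}
  = \sum_{k=1}^{m} 2\,(y'_k - y_k)\,\frac{\partial y'_k}{\partial a}
```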
10/18
Optimization of their coefficients
[Diagram, two variants of unit training:
a) The unit does not provide an analytic gradient, just its error. Repeat: the optimization method, given initial values, optimizes the coefficients a1, a2, …, an; new values are passed to the unit, which computes the error on the training data; the gradient is estimated numerically from error evaluations; final values are returned.
b) The unit provides the analytic gradient as well as its error. Repeat: the optimization method optimizes the coefficients; the unit computes both the error on the training data and the gradient of the error, and supplies the gradient back to the method.]
11/18
A very efficient gradient-based training for hybrid networks was developed!
Quasi-Newton method: a) gradient estimated, b) gradient supplied
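The difference between variants a) and b) can be sketched without any GAME internals: the same descent loop, fed either a finite-difference estimate of the gradient or the analytic gradient. A toy illustration (plain gradient descent rather than quasi-Newton; names and data are illustrative):

```python
# Sketch: training one linear unit y' = a1*x + a0, with the gradient either
# estimated numerically (case a) or supplied analytically (case b).
# Illustrative only -- not the GAME / quasi-Newton implementation.

DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1

def error(a):
    return sum((a[1] * x + a[0] - y) ** 2 for x, y in DATA)

def grad_estimated(a, h=1e-6):
    """Case a): central finite differences -- two extra error
    evaluations per coefficient, every iteration."""
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        g.append((error(ap) - error(am)) / (2 * h))
    return g

def grad_analytic(a):
    """Case b): dE/da0 = sum 2(y'-y), dE/da1 = sum 2(y'-y)*x."""
    d0 = sum(2 * (a[1] * x + a[0] - y) for x, y in DATA)
    d1 = sum(2 * (a[1] * x + a[0] - y) * x for x, y in DATA)
    return [d0, d1]

def descend(grad, a=(0.0, 0.0), lr=0.05, steps=2000):
    a = list(a)
    for _ in range(steps):
        a = [ai - lr * gi for ai, gi in zip(a, grad(a))]
    return a

print(descend(grad_estimated))  # both converge near [1.0, 2.0]
print(descend(grad_analytic))
```

Both variants reach the same minimum, but case b) avoids the extra error evaluations per coefficient, which is why supplying the analytic gradient pays off as the number of coefficients grows.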
12/18
Optimization methods available in GAME
13/18
Experimental results of competing opt. methods on Building data set
[Figure, three panels comparing the optimization methods. Methods listed per panel:
Hot water consumption: QN, CG, SADE, DE, all, HGAPSO, CACO, SOS, palDE, ACO, PSO, OS
Cold water consumption: QN, all, DE, SADE, CG, OS, HGAPSO, CACO, SOS, palDE, ACO, PSO
Energy consumption: CG, DE, QN, SADE, all, SOS, CACO, PSO, HGAPSO, ACO, OS, palDE]
RMS error on testing data sets (Building data) averaged over 5 runs
14/18
RMS error on the Boston data set
15/18
Classification accuracy [%] on the Spiral data set
16/18
Evaluation on diverse data sets
What is “all”?
17/18
Remember the Genetic algorithm optimizing the structure of GAME?
[Diagram: the niching GA evolves chromosomes describing each unit. A chromosome encodes which of the inputs 1–7 are used (e.g. 1001000 for a linear transfer unit, 0000110 for a polynomial transfer unit), the transfer function (e.g. genes 2115130 and 1203211 encoding the polynomial exponents), and — newly added into the chromosomes — the optimization method (e.g. CACO, marked “not implemented”, or DE).
Decoded examples: linear unit $y = a_1 x_1 + a_2 x_2 + a_0$; polynomial unit $y = a_1 \prod_j x_j^{r_{j1}} + a_2 \prod_j x_j^{r_{j2}} + a_0$.]
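The chromosome structure on this slide can be written out as follows; the layout, method list, and names are illustrative, not the actual GAME encoding:

```python
# Sketch of a GAME-style chromosome: one bit per candidate input, a gene
# selecting the transfer function, and (newly) a gene selecting the
# optimization method used to train the unit's coefficients.
# Layout and names are illustrative, not the actual GAME encoding.

import random

TRANSFER = ["linear", "polynomial", "gaussian", "sigmoid"]
OPT_METHODS = ["QN", "CG", "DE", "SADE", "PSO", "ACO"]

def random_chromosome(n_inputs=7):
    return {
        "inputs": [random.randint(0, 1) for _ in range(n_inputs)],
        "transfer": random.choice(TRANSFER),
        "opt_method": random.choice(OPT_METHODS),  # the newly added gene
    }

def describe(ch):
    used = [i + 1 for i, bit in enumerate(ch["inputs"]) if bit]
    return f'{ch["transfer"]} unit on inputs {used}, trained by {ch["opt_method"]}'

random.seed(1)
print(describe(random_chromosome()))
```

Because the optimization method is part of the genome, the niching GA can now select not only the unit's inputs and transfer function but also which method trains it.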
18/18
Conclusion
• It is wise to combine several different optimization strategies for the training of inductive models.
• Evolution of optimization methods works, but it is not significantly better than the random selection of methods.
• Nature-inspired methods are slow for this problem (they do not exploit the analytic gradient).
• Future work: utilize the gradient in nature-inspired methods.