OPTIMIZATION OF MODELS: LOOKING FOR THE BEST STRATEGY

Post on 29-Sep-2020



Pavel Kordík, Oleg Kovářík, Miroslav Šnorek
Department of Computer Science and Engineering,
Faculty of Electrical Engineering,
Czech Technical University in Prague, Czech Republic

kordikp@fel.cvut.cz (Pavel Kordík)

2/18

Motivation

• Continuous optimization
• Several methods available
• Which one is the best?
• Is there any strategy to choose the best method for a given task?

3/18

Our task: FAKE GAME research project

[Diagram: the FAKE GAME processing pipeline. Problem identification, data collection, data inspection, data cleaning, data integration and data warehousing produce the input data; automated data preprocessing feeds the GAME engine, which builds a group of models; the FAKE interface presents math equations, feature ranking, interesting behaviour, credibility estimation, and class boundaries / relationships of variables for classification, prediction, identification and regression.]

4/18

The GAME engine for automated data mining

[Diagram: preprocessed data enter the GAME engine, which builds a group of models performing classification, regression, prediction or identification.]

How does it work inside?

5/18

The GAME engine: building a model

Group of Adaptive Models Evolution (GAME)

• Inductive model
• Heterogeneous units
• A niching genetic algorithm (explained later) is employed in each layer to optimize the topology of GAME networks.

[Diagram: input variables x1, x2, ..., xn feed the first layer of units, followed by a second layer of units and a single output variable; the units in each layer are evolved by the genetic algorithm.]

Linear unit: y = \sum_{i=1}^{n} a_i x_i + a_{n+1}

Polynomial unit: y = \sum_{i=1}^{m} a_i \prod_{j=1}^{n} x_j^{r_{ij}} + a_0
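The inductive, layer-by-layer construction can be sketched as follows. This is hypothetical Python, not the actual GAME engine: candidate units are simple least-squares linear units on random input subsets, and selection is plain truncation rather than a niching genetic algorithm.

```python
import numpy as np

def train_linear_unit(X, y, cols):
    # Least-squares fit of y = sum_i a_i * x_i + bias on the chosen columns.
    A = np.column_stack([X[:, cols], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    err = np.sum((A @ coef - y) ** 2)
    return cols, coef, err

def build_inductive_model(X, y, n_layers=2, candidates=20, survivors=3, seed=0):
    """Grow the model layer by layer; outputs of surviving units become
    extra input features for the next layer. Returns the training error
    of the best unit in the last layer."""
    rng = np.random.default_rng(seed)
    feats = X.copy()
    best_unit = None
    for _ in range(n_layers):
        units = []
        for _ in range(candidates):
            k = rng.integers(1, feats.shape[1] + 1)
            cols = rng.choice(feats.shape[1], size=k, replace=False)
            units.append(train_linear_unit(feats, y, cols))
        units.sort(key=lambda u: u[2])  # keep the fittest candidate units
        kept = units[:survivors]
        best_unit = kept[0]
        outs = [np.column_stack([feats[:, c], np.ones(len(feats))]) @ a
                for c, a, _ in kept]
        feats = np.column_stack([feats] + outs)
    return best_unit[2]
```

In the real engine each unit may have a different transfer function and training method; here all candidates are linear only to keep the layering idea visible.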

6/18

Heterogeneous units in GAME

Each unit takes inputs x1, x2, ..., xn:

• Linear (LinearNeuron): y = \sum_{i=1}^{n} a_i x_i + a_{n+1}

• Polynomial (CombiNeuron): y = \sum_{i=1}^{m} a_i \prod_{j=1}^{n} x_j^{r_{ij}} + a_0

• Gaussian (GaussianNeuron): y = (1 + a_{n+2})\, e^{-\sum_{i=1}^{n} (x_i - a_i)^2 / a_{n+1}^2} + a_0

• Sine (SinusNeuron): y = a_{n+1} \sin\left(a_{n+2} \sum_{i=1}^{n} a_i x_i + a_{n+3}\right) + a_0

• Logistic (SigmNeuron): y = \frac{1}{1 + e^{-\sum_{i=1}^{n} a_i x_i}} + a_0

• Exponential (ExpNeuron): y = a_{n+1}\, e^{a_{n+2} \sum_{i=1}^{n} a_i x_i} + a_0

• Rational (PolyFractNeuron): a ratio of polynomial terms in the inputs (linear and quadratic terms with coefficients a_i, a_{ij})

• Universal (BPNetwork): y = \sum_{q=1}^{2n+1} \psi_q\left(\sum_{p=1}^{n} \phi_{qp}(x_p)\right)
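Two of these transfer functions can be sketched in Python (hypothetical helper names; the coefficient layout in the arrays is an assumption matching the formulas above):

```python
import numpy as np

def linear_unit(x, a):
    """Linear unit: y = sum_{i=1..n} a_i * x_i + a_{n+1}."""
    n = len(x)
    return float(np.dot(a[:n], x) + a[n])

def gaussian_unit(x, a):
    """Gaussian unit: y = (1 + amp) * exp(-sum (x_i - c_i)^2 / w^2) + offset.

    Assumed coefficient layout: a[0:n] are the centres a_1..a_n,
    a[n] is the width a_{n+1}, a[n+1] is the amplitude term a_{n+2},
    a[n+2] is the offset a_0.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    centres = np.asarray(a[:n], dtype=float)
    width, amp, offset = a[n], a[n + 1], a[n + 2]
    s = np.sum((x - centres) ** 2) / width ** 2
    return float((1.0 + amp) * np.exp(-s) + offset)
```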

7/18

Optimization of coefficients (learning)

Gaussian unit (GaussianNeuron): y' = (1 + a_{n+2})\, e^{-\sum_{i=1}^{n} (x_i - a_i)^2 / a_{n+1}^2} + a_0

We have inputs x1, x2, …, xn and a target output y in the training data set.

We are looking for optimal values of the coefficients a0, a1, …, a_{n+2}.

The difference between the unit output y' and the target value y should be minimal for all vectors from the training data set:

E = \sum_{v=1}^{m} (y'_v - y_v)^2

8/18

What is an analytic gradient and how to derive it?

• Error of the unit on the training data (energy surface)
• Gradient of the error
• Unit with Gaussian transfer function
• Partial derivative of the error in the direction of coefficient a_i

9/18

Partial derivatives of the Gauss unit
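A reconstruction of these derivatives, assuming the Gaussian unit y' = (1 + a_{n+2}) e^{-S} + a_0 with S = \sum_{i=1}^{n} (x_i - a_i)^2 / a_{n+1}^2 and the squared-error E from the previous slide:

```latex
E = \sum_{v=1}^{m} (y'_v - y_v)^2, \qquad
\frac{\partial E}{\partial a_k} = \sum_{v=1}^{m} 2\,(y'_v - y_v)\,\frac{\partial y'_v}{\partial a_k}

% Partial derivatives of the Gaussian unit output:
\frac{\partial y'}{\partial a_0} = 1, \qquad
\frac{\partial y'}{\partial a_{n+2}} = e^{-S}

\frac{\partial y'}{\partial a_k} = (1 + a_{n+2})\, e^{-S}\, \frac{2\,(x_k - a_k)}{a_{n+1}^2},
\quad k = 1, \dots, n

\frac{\partial y'}{\partial a_{n+1}} = (1 + a_{n+2})\, e^{-S}\, \frac{2S}{a_{n+1}}
```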

10/18

Optimization of their coefficients

[Diagram: the coefficient optimization loop. The optimization method repeatedly proposes new values of the coefficients a1, a2, ..., an, given initial values; the unit computes the error on the training data; the loop repeats until final values are found.
a) The unit does not provide an analytic gradient, just its error; the optimization method must estimate the gradient itself.
b) The unit provides the analytic gradient of the error together with the error itself.]
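The two loops can be contrasted in a minimal sketch (hypothetical code, not the GAME implementation), using a linear unit with error E = \sum (y' - y)^2:

```python
import numpy as np

def error(a, X, y):
    # Linear unit: y' = X . a[:-1] + a[-1] (bias); squared error over the set.
    pred = X @ a[:-1] + a[-1]
    return np.sum((pred - y) ** 2)

def estimated_grad(a, X, y, h=1e-6):
    # Case a): the optimizer only sees errors, so it estimates the gradient
    # by central finite differences (two error evaluations per coefficient).
    g = np.zeros_like(a)
    for k in range(len(a)):
        d = np.zeros_like(a)
        d[k] = h
        g[k] = (error(a + d, X, y) - error(a - d, X, y)) / (2 * h)
    return g

def analytic_grad(a, X, y):
    # Case b): the unit supplies the exact gradient:
    # dE/da_k = sum 2*(y' - y)*x_k, dE/dbias = sum 2*(y' - y).
    r = X @ a[:-1] + a[-1] - y
    return np.concatenate([2 * X.T @ r, [2 * r.sum()]])
```

Both gradients can drive the same descent loop; the analytic one is exact and needs a single pass over the data instead of 2n error evaluations per step.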

11/18

Very efficient gradient-based training for hybrid networks developed!

Quasi-Newton method:
a) estimating the gradient
b) gradient supplied

12/18

Optimization methods available in GAME

13/18

Experimental results of competing opt. methods on Building data set

[Figure, three panels comparing the methods: "Hot water consumption" (QN, CG, SADE, DE, all, HGAPSO, CACO, SOS, palDE, ACO, PSO, OS), "Cold water consumption" (QN, all, DE, SADE, CG, OS, HGAPSO, CACO, SOS, palDE, ACO, PSO) and "Energy consumption" (CG, DE, QN, SADE, all, SOS, CACO, PSO, HGAPSO, ACO, OS, palDE).]

RMS error on testing data sets (Building data) averaged over 5 runs.

14/18

RMS error on the Boston data set

15/18

Classification accuracy [%] on the Spiral data set

16/18

Evaluation on diverse data sets

What is "all"?

17/18

Remember the Genetic algorithm optimizing the structure of GAME?

[Diagram: chromosomes of the niching GA. A linear transfer unit is encoded by a bitmask over the seven inputs (e.g. 1001000), its transfer function (y = a_1 x_1 + a_2 x_2 + a_0) and its optimization method (CACO: not implemented). A polynomial transfer unit is encoded by an input bitmask (e.g. 0000110), the exponents of its polynomial transfer function and its optimization method (DE). The optimization method is newly added into the chromosomes.]
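The chromosome described on this slide can be sketched as follows (a hypothetical Python encoding; the gene names and method list are illustrative, not the actual GAME data structures):

```python
import random

# Genes of one unit: which inputs it uses, which transfer function it has,
# and (the new part) which optimization method trains its coefficients.
TRANSFER_TYPES = ["linear", "polynomial", "gaussian", "sigmoid"]
OPT_METHODS = ["QN", "CG", "DE", "SADE", "PSO", "CACO"]

def random_chromosome(n_inputs):
    return {
        "inputs": [random.randint(0, 1) for _ in range(n_inputs)],  # bitmask
        "transfer": random.choice(TRANSFER_TYPES),
        "opt_method": random.choice(OPT_METHODS),  # added into chromosomes
    }

def crossover(a, b):
    # One-point crossover on the input bitmask; the remaining genes are
    # inherited from either parent at random.
    point = random.randrange(1, len(a["inputs"]))
    return {
        "inputs": a["inputs"][:point] + b["inputs"][point:],
        "transfer": random.choice([a["transfer"], b["transfer"]]),
        "opt_method": random.choice([a["opt_method"], b["opt_method"]]),
    }
```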

18/18

Conclusion

• It is wise to combine several different optimization strategies for the training of inductive models.

• Evolution of optimization methods works, but it is not significantly better than the random selection of methods.

• Nature-inspired methods are slow for this problem (they do not exploit the analytic gradient).

• Future work: utilize the gradient in nature inspired methods.