
Czech Technical University in Prague
Faculty of Electrical Engineering

Bachelor thesis

GMDH networks the KnowledgeMiner software

Jakub Novák

Supervisor: Ing. Pavel Kordík

Study programme: Electrical Engineering and Informatics (structured bachelor's programme)

Field of study: Informatics and Computer Science

May 2006


Acknowledgements

I would like to thank Ing. Pavel Kordík for his help and for supervising my bachelor thesis.


Declaration

I declare that I have written my bachelor thesis independently and that I have used only the sources listed in the enclosed list. I have no serious reason against the use of this school work within the meaning of §60 of Act No. 121/2000 Coll., on copyright, on rights related to copyright and on amendments to certain acts (the Copyright Act).

Prague, 26 June 2006 ............................................


Abstract

Inductive methods such as the Group Method of Data Handling (GMDH) are well suited to solving ill-posed tasks. Our past studies indicate that the task of age estimation based on senescence indicators obtained from skeletons is ill-posed. In this work I apply the GMDH methods implemented in the KnowledgeMiner software to the "Anthro" and "Building" data. I also compare KnowledgeMiner to another application, the GAME simulator.

Abstrakt

Real-world problems can often be very complex, and a single quantity alone cannot describe them. The Group Method of Data Handling (GMDH) therefore appears to be a suitable tool for solving them. Research on age assessment from skeletal ageing indicators has turned out to be exactly such a case, so it is appropriate to solve it with these tools. In this thesis I focus on the KnowledgeMiner program and the GMDH methods implemented in it, and I use them to experiment on the "Anthro" and "Building" data. Finally, I compare the results with the GAME simulator.


Contents

List of pictures
List of tables

1 Introduction

2 Data mining description
  2.1 Data-driven approach
  2.2 Data mining
    2.2.1 Data selection
    2.2.2 Data transformation
    2.2.3 Choice and application of data mining algorithms

3 Parametric GMDH algorithms
  3.1 Induction
  3.2 Principles used in GMDH
    3.2.1 Model of optimal complexity

4 KnowledgeMiner - theoretical part
  4.1 Features
  4.2 GMDH implementation
    4.2.1 Elementary models and active neurons
    4.2.2 Generation of alternate model variants
    4.2.3 Criteria of model selection

5 KnowledgeMiner - experimental part
  5.1 Creating a model
  5.2 Real world data
    5.2.1 "Anthro" data
      5.2.1.1 Data description
      5.2.1.2 Age variable modeling
      5.2.1.3 Results
    5.2.2 Building data
      5.2.2.1 Data description
      5.2.2.2 Prediction of consumption

6 Comparing to GAME and WEKA
  6.1 Group of Adaptive Models Evolution (GAME)
    6.1.1 Ensemble techniques
  6.2 Results

7 Conclusion

8 References

A List of used shortcuts

B Content of appended CD


List of pictures

3.1 Network at start of modelling
3.2 Network after creation of all models of the 1st layer
3.3 Network after selection of best models
3.4 Network after creation of all models of the 2nd layer
3.5 Final network

4.1 Table menu providing spreadsheet related functionality
4.2 The data basis of KnowledgeMiner
4.3 Classical layer structure of a final multilayered GMDH model using active neurons
4.4 Final multilayered GMDH model

5.1 Selecting input and output variables
5.2 The Modeling menu of KnowledgeMiner
5.3 Dialog window for setting up the modelling process
5.4 Extended dialog window for setting up the modelling process
5.5 Model equation and report for the created model
5.6 Graph of the created model with training and testing data part
5.7 Graph of the created model - zoom into the learning part
5.8 Graph of the created model - HW consumption

6.1 An example of the GAME network


List of tables

5.1 GMDH Age regression
5.2 Building data - estimating consumption

6.1 Age regression


1 Introduction

This bachelor thesis deals with the Group Method of Data Handling (GMDH), originally introduced by A. G. Ivakhnenko in 1966, and with its implementation in the KnowledgeMiner software. GMDH is reported to have good characteristics for modelling short multidimensional data samples, so it should be suitable for solving ill-posed tasks on real-world data. The Anthro and Building data are exactly such real-world data.

We address the task of age estimation based on senescence indicators obtained from skeletons; this information is contained in the Anthro data. Our goal was to find out whether we could predict age from these data. Current methods of age assessment yield predictions whose error lies between 10 and 20 years. We hoped to predict age more accurately.

To solve this task we used the applications GAME, KnowledgeMiner and WEKA, and we compared our results to find the best age prediction. We performed the same experiment with another real-world data set, the Building data (described in [Prechelt, 94]), which is frequently used for benchmarking modelling methods. We excluded the information about the time of measurement, validated our results on this data set, and showed that we used adequate methods.

GMDH is closely related to data mining: both try to uncover relationships among data items and to search for patterns. That is the reason why I mention data mining in this thesis; we tried to find patterns between the input and output variables.

The assessment of age at death from the skeleton is a very interesting topic for me: predicting how long somebody lived. This topic caught my attention together with modern technologies such as neural networks. I wanted to be part of this project, so I started to learn how to work with KnowledgeMiner.

This thesis begins with the chapter Data mining description. It explains what data mining is and what it is suitable for, what the data-driven approach is and what it depends on, and it covers data selection and transformation.

Next, the chapter Parametric GMDH algorithms introduces the Group Method of Data Handling, what it is based on, and what principles are used in GMDH.

The chapter KnowledgeMiner - theoretical part introduces the main features of KM, how the original GMDH differs from the GMDH implemented in the KnowledgeMiner software, and what criteria of model selection are used.

The chapter KnowledgeMiner - experimental part shows how to build an input-output model and describes all experiments on the Anthro and Building data sets together with their results.

The chapter Comparing to GAME and WEKA describes another application, called GAME, and presents the final results; at its end there is a table with comparative results.

The final chapter, Conclusion, summarizes this thesis.


2 Data mining description

Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. Three of the major data mining techniques are regression, classification and clustering. Data mining is well suited to our purpose of solving the "ill-posed task": we can predict output variables using regression. Basically, regression takes a numerical data set and develops a mathematical formula that fits the data.

Models are generated from the data in the form of networks of active neurons, in an evolutionary fashion of repeatedly generating populations of competing models of growing complexity, validating and selecting them, until an optimally complex model - not too simple and not too complex - has been created. That is, a tree-like network is grown out of seed information (input and output variable data) in an evolutionary fashion of pairwise combination and survival-of-the-fittest selection, from a simple single individual (neuron) to the desired final, not overspecialized behaviour (model). Neither the number of neurons and layers in the network nor the actual behaviour of each created neuron is predefined; all of this is adjusted during the process of self-organisation, which is why it is called self-organising data mining, according to [Mueller, Lemke, 99]. I built such networks using KnowledgeMiner; they were learned from the input and output variables of two data sets.

2.1 Data-driven approach

Knowledge extraction from data, i.e. deriving a model from experimental measurements, has advantages when a priori only little knowledge or no definite theory is at hand. This is particularly true for objects with fuzzy characteristics. It corresponds exactly to our problem: the data are very noisy, and we try to find a model from this information.

The data-driven approach generates a description of the system behaviour from observations of real systems, evaluating how the system behaves (output) under different conditions (input). This is similar to statistical modelling, and its goal is to infer general laws from specific cases. However, the mathematical relationship that assigns an input to an output and that imitates the behaviour of a real-world system usually has nothing to do with the real processes running in the system. The system is not described in all of its details and functions; it is treated as a black box.

The task of experimental systems analysis is to select mathematical models from data of N observations or cases and of M system variables x_it, i = 1, 2, ..., M, t = 1, 2, ..., N: to select the structure of the mathematical model (structure identification) and to estimate the unknown parameters (parameter identification). Commonly, statistically based principles of model formation are used that require the modeller to have a priori information about the structure of the mathematical model. A good deal of work goes into identifying, gathering, cleansing and labelling the data, into specifying the questions to be asked of it, and into finding the right way to view it to discover useful patterns. Unfortunately, this traditional processing can take up a big part of the whole project effort.

Obviously, methods of experimental systems analysis cannot solve an analysis of the causes of events for such fuzzy objects. Several important facts have to be underlined. First of all, the goal of data-driven modelling is to estimate the unknown relation between output (y) and input (x) from a set of past observations. Very important is the fact that models obtained in this way are only able to represent a relation between input and output within the range of values covered by the observed samples.

Secondly, many other factors that are not observed or controlled may influence the system's output. Therefore, knowledge of the observed input values does not uniquely specify the output. This uncertainty of the output is caused by the lack of knowledge of the unobserved factors, and it results in a statistical dependency between the observed inputs and outputs.

Thirdly, there is a difference between statistical dependency and causality. Cherkassky [Cherkassky, 98] has underlined that the task of learning/estimating a statistical dependency between (observed) inputs and outputs can occur in the following situations or any of their combinations:

• the outputs causally depend on the (observed) inputs;

• inputs causally depend on the output(s);

• input-output dependency is caused by other (unobserved) factors;

• input-output correlation is noncausal.

It follows that causality cannot be inferred from data analysis alone; any of the four possibilities or their combinations may hold. Therefore, causality must be assumed or demonstrated by arguments outside the data, according to [Cherkassky, 98].

2.2 Data mining

Knowledge discovery from data, and specifically data mining as its heart, is an interactive and iterative process of solving several subtasks and decisions, such as data selection and preprocessing, choice and application of data mining algorithms, and analysis of the extracted knowledge. SAS Institute, Inc., for example, has formulated the data mining process as a five-step process called SEMMA: sampling, exploration, manipulation, modelling and assessment. IBM Corp. has another interpretation of the data mining process, and other companies may have their own terms as well. An overview of the possible steps comprising knowledge discovery from data is given by Fayyad et al. [Fayyad, 96].

Data mining itself involves more than the straightforward utilisation of a single analytical technique. It consists of processes for which many methods and techniques are appropriate, depending on the nature of the inquiry. This set of methods contains data visualisation, tree-based models, neural networks, methods of mathematical statistics (clustering and discriminant analysis, e.g.), and methods of artificial intelligence. Also included in the spectrum of modelling methods are methods of knowledge extraction from data using self-organising modelling technology.

2.2.1 Data selection

Often, limitations of the data itself, which are rarely collected for knowledge extraction, are the major barrier to obtaining high quality knowledge. It is necessary to decide which data are important for the task we are trying to solve; not all available information is usable for data mining. A detailed understanding of the structure, coverage and quality of the information is required. Therefore, it is necessary to preselect, from the available information, a set of variables that might have an impact on the user's decision making process and that can be observed and measured, or transformed into a measurable numerical value. In practical applications, it is impossible to obtain complete sets of variables. The modelled systems are therefore open systems, and all important variables that are not included in the data set (for whatever reason) are summarised as noise.

2.2.2 Data transformation

All kinds of data mining algorithms need symbolic values transformed into numeric values. This translation often requires turning discrete symbols or categories into numeric values. Sometimes the data also need to be transformed into a form that is accepted as input by a certain data mining algorithm. Neural networks, e.g., need scaled data: most neural network algorithms accept only numeric data in the range of 0 to 1 or -1 to 1, depending on their activation function. Our data sets were already in numerical format, so I did not need to convert any non-numerical values; all used variables were suitable for the algorithm.

For statistically based algorithms, it is useful to normalise or scale the data to achieve betternumerical results.

A very important kind of transformation is the subdivision of the data samples into two or more subsets. One subset of the data is used to train the model (training data set) and another subset is used to test the accuracy of the model (checking or testing data set). Sometimes a third data set is used to validate the obtained model (validation data set). The most common approach is to randomly divide the source data into two explicit data sets. When time series are considered, the most recent data are used to test a model, while the other data serve to train it. In self-organising data mining, a more powerful way of data subdivision is suggested.
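As an illustration of the random subdivision (a minimal Python sketch; the function name, the 70/30 ratio and the fixed seed are my own choices, not features of any of the tools discussed here):

```python
import numpy as np

def split_train_test(data, train_fraction=0.7, seed=0):
    """Randomly divide the rows of `data` into a training and a testing part."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    n_train = int(train_fraction * len(data))
    return data[indices[:n_train]], data[indices[n_train:]]

# For time series, keep the temporal order instead and test on the most recent data:
# train, test = data[:n_train], data[n_train:]
```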

2.2.3 Choice and application of data mining algorithms

Data mining is a method of searching data for unexpected patterns or relationships using a variety of tools and algorithms [Devlin, 97]. According to Fayyad [Fayyad, 96], this can be:

• classification: learning a function that maps (classifies) a data item into one of several predefined classes;

• regression: learning a function that maps a data item into a real-valued prediction variable;

• clustering: identifying a finite set of categories or clusters to describe the data;

• summarisation: finding a compact description for a subset of data;

• dependency modelling: finding a model that describes significant dependencies between variables;

• change and deviation detection: discovering the most significant changes in the data from previously measured or normative values.

Many techniques have been developed to solve such data mining functions. They can be applied to perform the common data mining activities of association, clustering, classification, modelling, sequential patterns, and time series forecasting. Algorithms of self-organising modelling can be applied to many data mining functions in a similar context, according to [Mueller, Lemke, 99]. In my experiments I focused on regression modelling, because I tried to find a function that describes the input data and that I can use to predict the output variables.


3 Parametric GMDH algorithms

3.1 Induction

The Group Method of Data Handling has been applied in a great variety of areas for data mining and knowledge discovery, forecasting and systems modelling, optimization and pattern recognition. Inductive GMDH algorithms make it possible to find interrelations in data automatically, to select the optimal structure of a model or network, and to increase the accuracy of existing algorithms, according to [GMDH web].

The first polynomial network algorithm, the GMDH, was developed by Ivakhnenko in 1967, and considerable improvements were introduced in the 1970s and 1980s by versions of the Polynomial Network Training algorithm (PNETTR) by Barron and the Algorithm for Synthesis of Polynomial Networks (ASPN) by Elder.

The GMDH approach is based on

• the black-box method as a basic approach to analyse systems from input-output data samples, and

• connectionism as a representation of complex functions through networks of elementary functions.

The objective of this approach is to estimate networks of the right size, with a structure evolved during the estimation process. The first polynomial network algorithm, the Group Method of Data Handling of Ivakhnenko, uses linear regression to fit quadratic polynomial nodes to an output variable, considering all input variable pairs in turn. The basic idea is that once the elements of a lower level have been estimated and the corresponding intermediate outputs computed, the parameters of the elements of the next level can be estimated. In the first layer, all possible pairs of the inputs are considered, and the best models of the layer (intermediate models) - in the sense of the selection criterion - are used as inputs for the next layer(s). In the succeeding layers, all possible pairs of the intermediate models from the preceding layer(s) are connected as inputs to the units of the next layer(s). This means that the output of a unit of a processed level may become an input to several other units in the next level, depending on a local threshold value. Finally, when additional layers provide no further improvement, the network synthesis stops, according to [Mueller, Lemke, 99].
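This layer-wise procedure can be condensed into a short sketch (my own simplified illustration in Python, not the code of KnowledgeMiner, PNETTR or ASPN; it assumes full quadratic two-input neurons, mean squared error on a held-out validation set as the selection criterion, and that the outputs of the surviving neurons become the next layer's inputs):

```python
import numpy as np
from itertools import combinations

def design(xi, xj):
    """Design matrix of the full quadratic elementary model of two inputs."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def gmdh(X_tr, y_tr, X_va, y_va, n_best=5, max_layers=10):
    """Multilayered GMDH sketch; returns the validation error of the best model."""
    best_err = np.inf
    for layer in range(max_layers):
        m = X_tr.shape[1]
        candidates = []  # (validation error, train output, validation output)
        for i, j in combinations(range(m), 2):      # all k = m*(m-1)/2 pairs
            A_tr = design(X_tr[:, i], X_tr[:, j])
            A_va = design(X_va[:, i], X_va[:, j])
            coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)  # fit parameters
            err = np.mean((y_va - A_va @ coef) ** 2)            # external criterion
            candidates.append((err, A_tr @ coef, A_va @ coef))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:n_best]
        if survivors[0][0] >= best_err:   # no further improvement: stop synthesis
            break
        best_err = survivors[0][0]
        # outputs of the surviving intermediate models become the next layer's inputs
        X_tr = np.column_stack([c[1] for c in survivors])
        X_va = np.column_stack([c[2] for c in survivors])
    return best_err
```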

3.2 Principles used in GMDH

The principle of induction is composed of:

1. the cybernetic principle of self-organisation as an adaptive creation of a network without subjective points given;

2. the principle of external complement enabling an objective selection of a model of optimal complexity and

3. the principle of regularization of ill-posed tasks.

1. The cybernetic principle of self-organisation as an adaptive creation of a network without subjective points given

Self-organisation is considered in building the connections between the units by a learning mechanism to represent discrete items. For this approach, the objective is to estimate networks of the right size, with a structure evolving during the estimation process. A process is said to undergo self-organisation if identification emerges through the system's environment.

To realise self-organisation of models from a finite number of input-output data samples, the following conditions must be fulfilled:

First condition: There is a very simple initial organisation (neuron) that enables the description of a large class of systems through the organisation's evolution (pic. 3.1).

Picture 3.1: Network at start of modelling

The recent version of the accompanying KnowledgeMiner software includes a GMDH algorithm that, for each created neuron, optimises the structure of the transfer function (Active Neuron) [Lemke, 97]. As a result, the synthesised network is a composition of different, a priori unknown neurons, and their corresponding transfer functions are selected from all possible linear or nonlinear polynomials of the form:

f(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2.

In this way, the neurons themselves are self-organised, which significantly increases the flexibility of network function synthesis.

Second condition: There is an algorithm for mutation of the initial or already evolved organisations of a population.

Genetic algorithms work on more or less stochastic mutations of the model structure by means of crossover, stochastic changes of characteristic parameters, and others. In the GMDH approach, a gradual increase of model complexity is used as the basic principle. The successive combination of many variants of mathematical models with increasing complexity has proven to be a universal solution in the theory of self-organisation. To apply this principle, a system of basic functions (simple elementary models or neurons) is needed. Their appropriate choice, and the way the elementary models are combined into more complicated model variants along with a regulation of their production, decides the success of self-organisation.

In the first layer, all possible pairs of the m inputs are generated to create the transfer functions of the k = m(m-1)/2 neurons of the first network layer (pic. 3.2); for the 22 input variables of the Anthro data in chapter 5, for example, this gives k = 22·21/2 = 231 first-layer neurons. In KnowledgeMiner, each transfer function f_k is adaptively created by another self-organising process, and the functions may differ from one another in the number of variables used and in their functional structure and complexity.

Picture 3.2: Network after creation of all models of the 1st layer

Third condition: There is a selection criterion for validating and measuring the usefulness of an organisation relative to the intended task of modelling.

According to this condition, several best models - in the first layer each consisting of a single neuron only - are ranked and selected by the external selection criterion (pic. 3.3). The selected intermediate models survive and are used as inputs for the next layer to create a new generation of models, while the nonselected models die (pic. 3.4). The procedure of inheritance, mutation and selection stops automatically if a new generation of models provides no further model quality improvement; then a final, optimally complex model is obtained. In distinction to neural networks, the complete GMDH network is, during modelling, a superposition of many alternative networks that live simultaneously. Only the final, optimally complex model represents a single network, while all others die after modelling (pic. 3.5).

Picture 3.3: Network after selection of best models

Picture 3.4: Network after creation of all models of the 2nd layer

2. The principle of an external complement which enables an objective choice of a model of optimal complexity

GMDH objectively selects the model of optimal complexity using the inductive approach shown above. An important feature of such an inductive approach is the use of an external complement, which can be a selection criterion, for example. For the ill-posed task of selecting a model from the set of possible models, this principle is as follows: a "best" model can be selected from a given data sample only if additional external information is used. External information is information or data not yet used for model creation and parameter estimation, which is usually done on a training data set. This means an external criterion is necessary to evaluate the models' quality on fresh information (a testing data set, for example).

3. The principle of regularization of ill-posed tasks.

Picture 3.5: Final network after selection of an optimal model y* (here: after 3 layers)

The goal of statistical modelling is to infer general laws (models) from specific cases (observations). The task of selecting a model based on observed input and output data is, according to Tichonov [Tichonov, 74], an "ill-posed" task. This is true for any modelling method, since modelling is always done on a finite number of data only: statistics, neural networks, GMDH...

For ill-posed tasks, it is in principle not possible to establish a single valid model from the set of possible models without further additional information. This additional information must be given as a criterion that creates new information to select a best model from all equivalent models. Such information can be introduced, for example, by an additional term where model complexity (e.g., number of parameters) or roughness (e.g., integrated squared slope of its response surface) is used to constrain the fit. The criterion to be minimised is then a weighted sum of the training error and the measure of model complexity or roughness, according to [Elder, 96].
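Written out generically (my own notation, formalising the preceding sentence rather than quoting [Elder, 96]), such a regularised criterion has the form

C(f) = (1/N) Σ_{t=1}^{N} (y_t − f(x_t))^2 + λ · k(f)

where the first term is the training error, k(f) measures the complexity or roughness of the model f (e.g., its number of parameters), and the weight λ ≥ 0 controls how strongly complexity is penalised.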

3.2.1 Model of optimal complexity

In selecting models, the goal is to estimate a function using a finite number of training samples. The finite number of training samples implies that any estimate of an unknown function is always inaccurate (biased). For highly complex functions it becomes difficult to collect enough samples to attain high accuracy. There is a contradiction between the approximation capability and the prediction capability of a model. The model must have appropriate structure and complexity: powerful enough to approximate the known data (training data), but also constrained enough to generalise successfully, that is, to do well on new data not seen during modelling (testing data). There are always many models with a similar closeness of fit on the training data. Often, simpler, less accurate models will generalise better on testing data than more complex ones, according to [Mueller, Lemke, 99].


4 KnowledgeMiner - theoretical part

4.1 Features

KnowledgeMiner is a self-organising modelling and prediction tool that, in version 5.0, implements the GMDH, Analog Complexing (AC), and Fuzzy Rule Induction (FRI) modelling methods. I focused on the GMDH implementation and on testing it on data sets.

The GMDH implementation employs active neurons and in this way provides networks of active neurons already at the lowest possible level. It can be used to create linear/nonlinear, static/dynamic time series models, multi-input/single-output models, and multi-input/multi-output models as systems of equations from short and noisy data samples. All obtained models are described analytically. Systems of equations are necessary to model a set of interdependent variables objectively and free of conflicts. They are available both analytically and graphically, via a system graph reflecting the interdependence structure of the system.

For modelling and prediction of fuzzy objects, Analog Complexing and Fuzzy Rule Inductioncan be applied.

In Analog Complexing, a model consists of a composition of similar patterns. Since several most similar patterns are always used to form a model and a prediction by synthesis, an interval of uncertainty of the prediction is produced simultaneously. This is of special importance when using predictions for decision support.

Fuzzy Rule Induction can be used to create fuzzy or logic rules, or systems of rules, from fuzzy or Boolean data. Complex systems are described qualitatively, and the obtained models are relatively easy to interpret.

Overview of generated models:

• linear/nonlinear time series models

• static/dynamic linear/nonlinear multi-input/single-output models

• systems of linear/nonlinear difference equations (multi-input/multi-output models)

• systems of static/dynamic multi-input/multi-output fuzzy rules, described analytically in all four cases, as well as

• nonparametric models obtained by Analog Complexing.

All models created in a document are stored in a virtually unlimited model base. This means every variable in KnowledgeMiner's data base can have five different model representations in a single document: a time series model, a multi-input/single-output model, a system model (multi-output model), an AC model and a fuzzy model. Another, special kind of model, related to fuzzy modelling, is a defuzzification model. When using GMDH or FRI, up to three best models are created and stored in the model base separately; they are equally accessible and applicable. Every model can be used immediately after modelling for status-quo or what-if predictions within the program, creating new data. Comments can be added to models by writing text or by recording voice annotations.

All data are stored in a spreadsheet with core functionality, including simple formulas and absolute or relative cell references. The data can be imported in two ways: as a standard ASCII text file or via the clipboard by copying/pasting. Several mathematical functions are available for synthesising new data to optionally extend the data basis (pic. 4.1). Picture 4.2 shows the typical construction of the data basis. To keep the information homogeneous, the complete document (data, models, text, sound) is stored in a single file.


Picture 4.1: Table menu providing spreadsheet related functionality

Picture 4.2: The data basis of KnowledgeMiner: data are stored in a spreadsheet and can be edited

4.2 GMDH implementation

This section describes how the GMDH in KnowledgeMiner differs from the original description given in chapter 3. KM implements a number of additional features and uses an improved GMDH algorithm to achieve better results. For example, the neurons are not restricted to exactly two inputs, which allows more flexible modelling. The further features are described below.


4.2.1 Elementary models and active neurons

One condition for self-organisation of models is that there is a very simple initial organisation allowing modelling of a large class of systems by its evolution. For many dynamic and static systems, these elementary models or neurons can have a polynomial form. Therefore, KnowledgeMiner implements a complete second order polynomial as the default analytical elementary model structure:

f(v_i, v_j) = a_0 + a_1 v_i + a_2 v_j + a_3 v_i v_j + a_4 v_i^2 + a_5 v_j^2.

The arguments v_i, v_j represent all kinds of input data, such as non-lagged input variables x_{i,t}, lagged input variables x_{i,t-n}, derivative input variables, or even functions or models, e.g. √x_i, 1/x_i, sin(x_i) or log(x_i). In contrast to classical GMDH algorithms, the elementary model structure is not fixed here. It is rather an abstract definition of the most complex model a single neuron can have a priori. The true model of every created neuron is instantiated adaptively by self-organisation. In this sense, the chosen abstract elementary model defines the class of possible models for this level of self-organisation. Here, self-organisation and statistics are closely connected: beginning from the simplest possible model f(v_i, v_j) = a_0, a process of advanced regression techniques, model validation and selection is induced. Each particular model instance is estimated on a learning data set and validated on a separate testing data set using an external criterion. By this evolution from a simple to a more complex model, a neuron can have every possible model instance within the frame of a second order polynomial. Every neuron self-selects during modelling which inputs, in which combination, are relevant according to the chosen task-related criterion. This neuron activity has proven to be very effective both for avoiding the inclusion of non-relevant variables and for creating more parsimonious, less complex network models.

4.2.2 Generation of alternate model variants

The second level of self-organisation employs a multilayered-iterational GMDH algorithm. There are, however, two enhancements to the basic algorithm.
The first difference is that, due to their self-selecting capability, the neurons need not have exactly two input variables. It is now obvious that this is a valuable feature, since it allows a more flexible, more independent model synthesis and estimation. With active neurons, for example, it is equally possible to select models consisting of an odd number of variables or of an odd polynomial order.
The second difference of KnowledgeMiner's GMDH algorithm is the (optional) application of a so-called layer-break-through structure: all original input variables v_i and all selected F_p best models of all p preceding layers are used as inputs to create the model candidates of layer p+1. In the classical algorithm, for comparison, the original input variables are used as inputs for the first layer only; the F_1 best models of the first layer are used as inputs to form the models of the second layer, and so forth (pic. 4.3). The enhanced version breaks with this fixed layer dependence structure and allows considering any selected model candidate (or original input variable) as input information at any layer of model creation (pic. 4.4). This processing requires more computer memory and more computing time, of course, but it improves even further the independence and flexibility of model synthesis already gained by active neurons. This greater flexibility of model synthesis, however, also amplifies the danger that models become increasingly collinear with a growing number of layers. To avoid the evolution of collinear input variables generated during modelling, the information matrix is optimised after each new layer.
To get a well conditioned information matrix for any given data set, all used variables are internally normalised before modelling and denormalised after modelling, if this option was chosen:

v ← (v − µ_v) / σ_v

where µ_v is the mean and σ_v the standard deviation of the variable v.

Although the complete modelling is done in the normalised data space, the modeller always works in the original data space, both for modelling and for the application of models (e.g., prediction). Considering the two levels of self-organisation, the GMDH implementation of KnowledgeMiner represents a (multi-input/single-output) network of active neurons.
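A sketch of the normalisation step and its inverse (my own illustration of the formula above):

```python
import numpy as np

def normalise(v):
    """Standardise a variable to zero mean and unit standard deviation."""
    mu, sigma = v.mean(), v.std()
    return (v - mu) / sigma, mu, sigma

def denormalise(v_norm, mu, sigma):
    """Map normalised values (e.g., model outputs) back to the original data space."""
    return v_norm * sigma + mu
```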

Picture 4.3: Classical layer structure of a final multilayered GMDH model using active neurons

Picture 4.4: Final multilayered GMDH model that has employed both active neurons and layer break-through

4.2.3 Criteria of model selection

Since the main purpose of KnowledgeMiner is prediction, it implements a selection criterion that produces powerful predictive models. One of the most efficient solutions here is the Prediction Error Sum of Squares criterion (PESS):

PESS = (1/N) Σ_{t=1}^{N} (y_t − f(x_t, a_t))^2

It is an external criterion, but it does not require users to subdivide the data explicitly, since it employs cross-validation techniques internally. It is therefore appropriate for under-determined modelling tasks or for modelling short data samples.
The PESS criterion virtually uses the complete data set for learning and validating a model - virtually only, because it automatically excludes the t-th observation (t ∈ T) at a time as a validation set. Using a kind of sliding window of length one, each data point is cross-validated against the other data points. The sum of all N validations provides a measure of how consistent a model is when applied to new data. PESS is computed every time a new model candidate variant is synthesised, and it controls both the self-organisation of active neurons and the self-organisation of the network model.
A second criterion that is reported, but that has no influence on model selection, is the mean absolute percentage error criterion (MAPE):

MAPE = Σ_{i=1}^{N} |y_i − ŷ_i| / Σ_{i=1}^{N} |y_i|

where y_i and ŷ_i are the observed and estimated values of the output variable, according to [Mueller, Lemke, 99].
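Read literally, PESS is a leave-one-out cross-validation. A direct but computationally naive sketch for a model that is linear in its parameters (my own illustration; `A` stands for the design matrix of the model terms, and a real implementation would use closed-form shortcuts instead of refitting N times):

```python
import numpy as np

def pess(A, y):
    """Leave-one-out PESS for the linear-in-parameters model A @ a ≈ y."""
    N = len(y)
    errors = np.empty(N)
    for t in range(N):
        keep = np.arange(N) != t                       # exclude the t-th observation
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[t] = (y[t] - A[t] @ coef) ** 2          # validate on the held-out point
    return errors.mean()

def mape(y, y_hat):
    """Mean absolute percentage error as defined above."""
    return np.abs(y - y_hat).sum() / np.abs(y).sum()
```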


5 KnowledgeMiner - experimental part

I performed all experiments in KnowledgeMiner v5.0.9 Platinum edition. It is designed for Mac OS and Macintosh Classic, so I had to use emulation software to run it under Windows. I used the data sets as text files; KM had no problems opening these files and working with them. I had one data set with training data and another with testing data. It was necessary to let KM learn the patterns on the training data set and then use the resulting models on the testing data to predict the observed value. I used two pairs of training and testing data sets to make this testing procedure more representative. It sometimes took a lot of time, because the calculations involved are difficult, so it was not easy to get results quickly. I worked with anthropological and "building" data.

5.1 Creating a model

A short example of creating an input-output model shall illustrate the overall process more clearly. Assume we have a set of anthropological data and we want to use variables of the data set to model a certain output variable. For this example, the dependent variable was

x23 = y (Age).

The following variables were used as potential inputs:

x1 = Europe
x2 = Asia
x3 = Africa
x4 = North America
x5 = Africaner
x6 = Portugal
x7 = SOTO
x8 = Spain
x9 = Suisse
x10 = Thailand
x11 = USAB
x12 = USAW
x13 = ZULU
x14 = male
x15 = female
x16 = PUSA
x17 = PUSB
x18 = PUSC
x19 = SSPIA
x20 = SSPIB
x21 = SSPIC
x22 = SSPID

The definition of these output and input variables is done intuitively in the spreadsheet (pic. 5.1): by clicking with the mouse in the first row of column X23, the output variable is defined first; the header of the column changes from X23 to Y. Then, by selecting the corresponding cells, the input variables are identified - columns indicate variables and rows indicate data samples. Irregular definitions are detected and corrected later, during modelling. In this way, any combination of input and output variables can be chosen for a given table of data, without the need to reconstruct the table manually every time. The construction of the information matrix is a task KnowledgeMiner is itself responsible for.

Picture 5.1: Input and output variables can be defined by selecting corresponding cells in the spreadsheet

Choose the menu item Create Input-Output Model... (pic. 5.2), and the dialog window of pic. 5.3 appears.

Picture 5.2: The Modeling menu of KnowledgeMiner

On the left side of the window, the output variable and all selected input variables are listed and can be reviewed. Then it is necessary to define the data length one would like to use. When the input variables were already defined by selecting them in the table (as shown in pic. 5.1 for this example), they do not need to be redefined in this window. Next, the general model type can be chosen: linear or nonlinear. When nonlinear is selected, KnowledgeMiner will not necessarily create a nonlinear model, due to active neurons: if the detected optimal model is linear, it is selected as the best model. Finally, it must be decided whether a system of equations should be generated or not. A system of equations would consist of all considered variables as output variables and input variables accordingly. Click on the "Modeling" button and the modelling process will start.

Picture 5.3: Dialog window for setting up the modelling process

Alternatively, an extended dialog window (pic. 5.4) can be used by clicking on the More Choices button.

Picture 5.4: Extended dialog window for setting up the modelling process


First, a third data subset, the examination set, can be defined if a length greater than zero is typed in. The examination set is employed for true out-of-sample model performance testing during modelling, on data not yet used for either learning or testing of the created model candidates. The performance measure is then referenced as another discriminating criterion when selecting the best model candidates within a layer. The examination data set is always cut from the end of a data set, reducing the data available for learning and testing models. Therefore, the examination set length should be kept very small.

Next, it is possible to set up a range of variables within which the chosen settings shall be applied automatically. In our example, the choice "from X1 to X23" would generate input-output models for the variables x1 to x23 sequentially, each on the corresponding sets of input and output variables. In this case, the same effect would be achieved by simply checking the "all selected variables" control. Usually, it should also be checked when creating systems of equations in a single modelling run.

The “Layer Break-Through on...” control defines the permissible degree of freedom for network structure synthesis. If “no application” is chosen, the network will evolve in the classical way (pic. 4.3). Otherwise, either lagged and non-lagged inputs, or non-lagged inputs only, are served to all layers of network synthesis (pic. 4.4). Note that applying layer break-through significantly increases memory requirements and computing time, but it may also improve modelling results dramatically. It is therefore the recommended option.

Finally, there are two slider controls influencing the selection of the best model candidates at different levels. Active neurons - the left slider controls the self-organisation of active neurons by defining a threshold value for the performance gain a created model candidate must show after validation - compared with the previously selected intermediate best model - to become the new best reference model for that neuron. The greater this threshold value, the more restrictive the active neuron self-organisation will be, and the more its transfer function will be composed of only the most significant input variables and terms. Usually, thresholds between 1% (linear models) and 15% (nonlinear models) are good.

Network layers - the right slider should be used to define an appropriate number of best models of a layer which will survive as inputs for the following layer(s). This choice dramatically influences memory requirements, so "appropriate" means finding a compromise between memory and computing time on one side and a comfortable freedom of choice on the other. This is especially true when layer break-through network synthesis is chosen; there, a small number of best models is already sufficient, since the best models of all layers plus the initial input variables are used as inputs at every layer.

When the modelling process has finished, the created model(s) are visible both graphically and analytically (pic. 5.5). Along with the model equation(s), other important information is reported in the same window. The model(s) are added to the model base, and they can be applied immediately in a next step for prediction.

5.2 Real world data

We worked with data from the real world. The first data set consists of the Anthro data, where we try to predict age from skeletons. The second data set consists of the Building data, where we predict the consumption of energy and of hot and cold water from outdoor conditions. Real-world data are mostly very noisy, so our goal was to experiment with these data sets and try to find the configuration that achieves the best prediction.


Picture 5.5: Model equation and report for the created model

Training and test set
The data used for the experiments in KnowledgeMiner were split into two parts: one part on which the training is performed, called the training data, and another part on which the performance of the resulting network is measured, called the test set. The idea is that the performance of a network on the test set estimates its performance in real use. This means that absolutely no information about the test set examples or the test set performance of the network may be available during the training process. Our data set was divided into approximately 30% test data and 70% training data.

Pre-processing
I used data sets saved as text files. Only one processing step had to be done: I had to import the data into Microsoft Excel so that each value was placed in a separate cell. After this I saved the data to a text file again, now with a tabulator inserted by Microsoft Excel between the values. That was essential for KM, because thanks to the tabulators it knows how to split the values into separate cells. I applied this procedure to both the Anthro and Building data. No other pre-processing was necessary.

5.2.1 “Anthro” data

Our school cooperates with the Université Bordeaux, especially with Jaroslav Bruzek, who gave us these data. They were collected from museums all over the world. They contain different appearances of skeletal attrition together with the age the dead person reached. I tried to predict the age from these remains, but the data are very noisy, so the prediction was sometimes not very accurate; the resulting predictions have errors between 10 and 30 years. I used different settings to achieve the best results.

5.2.1.1 Data description

The data represent a set of observations of the skeletal indicators studied for the proposal of methods of age-at-death assessment from the human skeleton (Schmitt, 2001; Schmitt et al., 2002). They are the results of the visual scoring of the morphological changes of the features on two pelvic joint surfaces, defined and described by a text accompanied with photos. The material consists of 955 subjects from 9 human skeletal series of subjects of known age and sex. These collections (populations) are dispersed over 4 continents (Europe, North America, Africa, Asia). The age at death of the individuals varies between 19 and 100 years.

The input data consist of several pieces of information. The first is the input factor name; it is not included in the modelling process because it is only a serial number. The next 13 attributes encode the nationality of the skeleton (x1 - x13), and the following two whether it was a female or a male (x14, x15). The most important pieces of knowledge are the following:

Three features are scored on the pubic symphysis in the pelvis:

• Posterior plate (PUSA) scored in three phases (1-2-3) - x16

• Anterior plate (PUSB) observed in three phases (1-2-3) - x17

• Posterior lip (PUSC) scored in two phases (1-2) - x18

Four features on the sacro-pelvic surface of the ilium were observed:

• Transverse organisation (SSPIA) evaluated in two phases (1-2) - x19

• Modification of the articular surface (SSPIB) scored in four phases (1-2-3-4) - x20

• Modification of the apex (SSPIC) observed in two phases (1-2) - x21

• Modification of the iliac tuberosity (SSPID) estimated in two phases (1-2) - x22

That these inputs are the most important can be seen in picture 5.5: the "Relevant input variables" report shows that x16 - x22 are the most relevant variables for the prediction. This is no wonder, because they represent the skeletal indicators. Finally, there is the information about the age of the skeleton (x23); it is used as the output for the training process, and I predicted it in the testing phase. All these input-output factors are listed in section 5.1.


Next, we divided the data into two subsets to avoid an unbalanced division, obtaining the train1/test1 and train2/test2 data sets. Each training set had 639 samples and each testing set 319 samples.
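
How such a division can be produced is sketched below (the function name is illustrative; the thesis does not specify the exact splitting procedure):

    import numpy as np

    def two_way_split(X, y, train_size=639, seed=0):
        """Shuffle the samples and divide them into a training and a
        testing part; calling this twice with different seeds yields
        two different train/test pairs as used in the experiments."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        train, test = idx[:train_size], idx[train_size:]
        return X[train], y[train], X[test], y[test]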

5.2.1.2 Age variable modeling

The main reason why we work with the anthropological data is to find out whether it is possible to predict age from different appearances of skeletal attrition. It was not an easy task. We worked with different applications so that we could compare the results, as shown in chapter 6.

I experimented with different settings to reach the best results, as shown in table 5.1. All configurations share the default configuration as a common basis and differ only in the points mentioned in the description of configurations below.

I tried other configurations too, but they were not as successful; that is why they are not in the table. It proved that using Layer break through leads to better results: the error was around 0.710 (modified RMS - root mean squared error, equation 5.1) without LBT, but 0.662 with it (Layer break through, see section 5.1). I therefore performed all tests with this option. I also tried using a third examination set kept unseen during the training phase, but it did not improve the results either, so I worked without this option.

\[ \mathrm{RMS} = \frac{1}{N} \sqrt{\sum_{i=1}^{N} (y - d)_i^2} \tag{5.1} \]

Where: N – number of samples used, y – predicted value, d – real value.
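
Equation 5.1 translates directly into code; a small sketch (note that, unlike the usual RMS, the square root is taken before dividing by N):

    import math

    def modified_rms(predicted, actual):
        """Modified RMS from equation (5.1): the square root of the summed
        squared errors, divided by the number of samples N."""
        n = len(actual)
        sse = sum((y - d) ** 2 for y, d in zip(predicted, actual))
        return math.sqrt(sse) / n

For 319 test samples and a typical error around 12 years this gives sqrt(319 * 12^2) / 319 ≈ 0.67, which is the order of the values reported in table 5.1.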

Next, I had to choose between a linear and a nonlinear model. However, when nonlinear is selected in KM, KM will not necessarily create a nonlinear model, due to active neurons: if the detected optimal model is linear, it will be selected as the best model anyway. This proved true during testing - the results were equivalent whether I selected a linear or a nonlinear model. That is why I performed the experiments mostly with a nonlinear model.

Configuration   Train 1   Test 1    Train 2   Test 2    Test avg
config 1        0.45773   0.69570   0.46431   0.67458   0.68514
config 2        0.47688   0.71613   0.46795   0.66152   0.68883
config 3        0.47670   0.71094   0.46771   0.66193   0.68643
config 4        0.46646   0.70330   0.46917   0.66177   0.68253
config 5        0.53752   0.77992   0.55504   0.72760   0.75376
config 6        0.45780   0.69291   0.46367   0.67290   0.68290
config 7        0.45417   0.69027   0.46110   0.67421   0.68224
config 8        0.44980   0.70599   0.45575   0.66131   0.68365

Table 5.1: Estimating the ”Age” variable - error of GMDH models

Description of configurations:
I chose these configurations because they should lead to the best results. Network layers (NL) defines how many best models of a layer survive as inputs for the following layers; with it I tried to verify whether keeping more best models improves the prediction. Active neurons (AN) controls the self-organisation of active neurons: the greater this threshold value is, the more restrictive the active neuron self-organisation will be, meaning that the model will be composed of only the more significant input variables.

Page 36: GMDH networks the KnowledgeMiner software Jakub Nov¶akfakegame.sourceforge.net/lib/exe/fetch.php?media=novakj... · 2009-08-24 · Bachelor thesis GMDH networks the KnowledgeMiner

22 CHAPTER 5. KNOWLEDGEMINER - EXPERIMENTAL PART

• config 1: Default configuration (linear model (LM), active neurons (AN) 10%, network layers (NL) )

• config 2: nonlinear model (NLM), network layers set to 3

• config 3: NLM, NL 7

• config 4: NLM, NL 25

• config 5: NLM, AN 35%

• config 6: NLM, AN 1%

• config 7: NLM, AN 0.1%

• config 8: NLM, AN 0.1%, NL 45
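
For clarity, the tested grid of settings can be written down as follows (purely illustrative - KnowledgeMiner is configured through its user interface, not through code; None marks options left at their defaults):

    # Illustrative summary of the tested configurations (not a KM API).
    configs = [
        {"name": "config 1", "model": "linear",    "an": 0.10,  "nl": None},
        {"name": "config 2", "model": "nonlinear", "an": 0.10,  "nl": 3},
        {"name": "config 3", "model": "nonlinear", "an": 0.10,  "nl": 7},
        {"name": "config 4", "model": "nonlinear", "an": 0.10,  "nl": 25},
        {"name": "config 5", "model": "nonlinear", "an": 0.35,  "nl": None},
        {"name": "config 6", "model": "nonlinear", "an": 0.01,  "nl": None},
        {"name": "config 7", "model": "nonlinear", "an": 0.001, "nl": None},
        {"name": "config 8", "model": "nonlinear", "an": 0.001, "nl": 45},
    ]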

The meaning of these adjustments is explained in section 5.1, page 18, which also describes in more detail how I built the models.

5.2.1.3 Results

The best results I achieved can be seen in table 5.1. The numbers represent the testing error, calculated from equation 5.1 (the modified RMS). As you can see, it is really high, which shows that the Anthro data are very noisy. The predicted age was sometimes about 30 years off the real value. This indicates that the assessment of age at death from the skeleton in this way is not accurate. We hoped that it could be estimated more accurately, but the experiments showed that this is not possible from these indicators. The best model generated for the chosen data basis can be seen in picture 5.5.

I hoped that a large number of best models would improve the prediction, but it did not prove so in this case; to achieve a good result, only a small number of best models in combination with LBT was enough. The major influence came from the active neurons (AN) control. It proved that a threshold around 1% is good: when it was much more restrictive, the model included a smaller number of input variables and the prediction was less accurate. More input factors were better for this prediction, which shows that there is no single unique indicator in this data set.

Picture 5.6: Graph of the created model with the training and testing data parts (anthro data)


Picture 5.7: Graph of the created model (anthro data) - zoom into the learning part

5.2.2 Building data

This task concerns the prediction of consumption in a building. I try to predict the consumption of electrical energy, hot water, and cold water based on the outside temperature, outside air humidity, solar radiation, and wind speed, which represent the input information. We excluded the information about the time of measurement.

5.2.2.1 Data description

For this example, the dependent output variables were

x5 = Electrical energy
x6 = Hot water
x7 = Cold water

The following potential input variables were used:

x1 = Outside temperature
x2 = Outside air humidity
x3 = Solar radiation
x4 = Wind speed

There are 4 inputs, 3 outputs, and 4010 examples. The data set was divided into a training set with 2007 samples and a testing set with 2003 samples. That is a lot of information for building a good model.

The data set was created based on the problem of “The Great Energy Predictor Shootout - the first building data analysis and prediction problem” contest, organized in 1993 for the ASHRAE meeting in Denver, Colorado [Prechelt, 94].

5.2.2.2 Prediction of consumption

We used this data set for comparison with the anthro data. The data set is well known, so we can validate that our prediction methods are suitable for this kind of task. Both tasks are of the same nature (prediction of consumption and of age), so we could use the same methods for prediction and verify how well we predict age. The anthro data, however, are so noisy that age could not be estimated accurately.


I used the same configurations as in section 5.2.1.2. The data set was separated in the same way as the anthro data (training and testing set).

Results

The best results were achieved in the prediction of cold and hot water, as table 5.2 shows. Energy prediction, on the other hand, turned out worse. We can deduce from these experiments that the consumption of water really depends on the outdoor conditions, whereas the electrical energy does not, so it cannot be predicted from these data. The same conclusions appear in [Prechelt, 94], which validates our results and shows that we used adequate methods.

Configuration   Train E   Train CW  Train HW  Test E    Test CW   Test HW
config 1        2.65571   0.01164   0.01380   2.64272   0.01208   0.01419
config 2        2.80550   0.01196   0.01417   2.87280   0.01249   0.01432
config 3        2.80550   0.01196   0.01417   2.76889   0.01249   0.01432
config 4        2.80550   0.01196   0.01417   2.76889   0.01249   0.01432
config 5        3.38124   0.01196   0.01417   3.34822   0.01249   0.01432
config 6        2.67744   0.01177   0.01394   2.66539   0.01234   0.01423
config 7        2.65571   0.01151   0.01380   2.64272   0.01201   0.01426
config 8        2.65571   0.01164   0.01370   2.64272   0.01208   0.01404

Table 5.2: Estimating energy (E), cold (CW) and hot water (HW) values - error of GMDH models

Picture 5.8: Graph of the created model (building data) - config 8


6 Comparing to GAME and WEKA

6.1 Group of Adaptive Models Evolution (GAME)

This work proceeds from the theory of inductive model construction commonly known as the Group Method of Data Handling (GMDH), originally introduced by A. G. Ivakhnenko in 1966 [Ivakhnenko, 94].


Picture 6.1: An example of a GAME network. The network evolved on the training data consists of units with suitable transfer functions (P - perceptron unit optimized by the backpropagation algorithm, L - linear transfer unit, and C - polynomial transfer unit, both optimized by the Quasi-Newton method).

Where traditional modeling methods (e.g. an MLP neural network) fail due to the ”curse of dimensionality” phenomenon, the inductive methods are capable of building reliable models. The problem is decomposed into small subtasks: at first, the information from the most important inputs is analyzed in a subspace of low dimensionality, and later the abstracted information is combined to get a global knowledge of the relationships between the system variables. Picture 6.1 shows an example of an inductive model (a GAME network). It is constructed layer by layer during the learning stage from units that transfer information feedforwardly from the inputs to the output. The coefficients of the units' transfer functions are estimated using the training data set describing the modeled system. Units within a single model can be of several types (a hybrid model) - their transfer function can be linear (L), polynomial (C), logistic (S), exponential (E), a small multilayer perceptron network (P), etc. Each type of unit has its own learning algorithm for coefficient estimation. A niching genetic algorithm is employed in each layer to choose suitable units. Which types of units are selected during the evolution to make up the model depends on the nature of the modeled data. More information about inductive modeling can be found in [Kordık, 05], according to [Kordık, Snorek, 05].
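
To make the layer-by-layer construction concrete, here is a minimal sketch of one self-organising layer in the classical GMDH spirit, assuming quadratic Ivakhnenko polynomials as transfer functions (the real GAME and KM implementations are considerably richer):

    import itertools
    import numpy as np

    def design_matrix(x1, x2):
        """Regressors of the quadratic Ivakhnenko polynomial
        y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2."""
        return np.column_stack([np.ones_like(x1), x1, x2,
                                x1 * x2, x1 ** 2, x2 ** 2])

    def build_layer(X_tr, y_tr, X_val, y_val, survivors=5):
        """Fit one candidate unit for every pair of inputs, rank the
        units by validation error and keep the best ones; their outputs
        become the inputs of the next layer."""
        candidates = []
        for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
            A = design_matrix(X_tr[:, i], X_tr[:, j])
            coeffs, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
            val_pred = design_matrix(X_val[:, i], X_val[:, j]) @ coeffs
            err = np.sqrt(np.sum((val_pred - y_val) ** 2)) / len(y_val)
            candidates.append((err, i, j, coeffs))
        candidates.sort(key=lambda c: c[0])
        best = candidates[:survivors]
        new_tr = np.column_stack([design_matrix(X_tr[:, i], X_tr[:, j]) @ c
                                  for _, i, j, c in best])
        new_val = np.column_stack([design_matrix(X_val[:, i], X_val[:, j]) @ c
                                   for _, i, j, c in best])
        return new_tr, new_val, best

Repeating this step, and stopping when the best validation error no longer improves, yields the layered model; the survivors parameter plays the role of the number of best models kept per layer.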

6.1.1 Ensemble techniques

The age estimation using GAME was performed with ensemble techniques, so here is a short introduction to this topic.

Ensemble techniques [Garvin, 04] are based on the idea that a collection of a finite number of models (e.g. neural networks) is trained for the same task. A neural network ensemble [Zhi-Hua, Jianxin, Wei, 02] is a learning paradigm where a group of neural networks is trained for the same task. It originates from Hansen and Salamon's work [Hansen, Salamon, 90], which shows that ensembling a number of neural networks can significantly improve the generalization ability. The most prevailing approaches to creating an ensemble of neural networks are Bagging and Boosting. Bagging is based on bootstrap sampling [Zhi-Hua, Jianxin, Wei, 02]: it generates several training sets from the original training set and then trains a component neural network on each of them. Our Group of Adaptive Models Evolution method generates the ensemble of models (GAME networks) by the Bagging approach. By using the ensemble instead of a single GAME model, we can improve the accuracy of the modelling. But that is not the only advantage of using ensembles: highly interesting information is encoded in the ensemble behaviour, namely information about the credibility of the member models, according to [Kordık, Snorek, 05].
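
A minimal sketch of the Bagging idea (model-agnostic; train_fn stands for any learning algorithm and is an assumed placeholder):

    import numpy as np

    def bagging(train_fn, X, y, n_models=10, seed=0):
        """Train n_models member models, each on a bootstrap resample
        (drawn with replacement) of the original training set."""
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_models):
            idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample
            members.append(train_fn(X[idx], y[idx]))
        return members

    def ensemble_predict(members, X):
        """Average the member predictions; the spread between the members
        can serve as the credibility information mentioned above."""
        preds = np.stack([m.predict(X) for m in members])
        return preds.mean(axis=0), preds.std(axis=0)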

6.2 Results

I wanted to compare the results to WEKA too, but they were not available in a suitable form: they were reported with a different error measure and on a different data set, so it was impossible to compare them to our results.

The final results can be seen in table 6.1:

Configuration   Train 1   Test 1    Train 2   Test 2    Test avg
GMDH            0.454     0.690     0.461     0.674     0.682
GMDH            0.449     0.705     0.455     0.661     0.683
GAME            0.458     0.660     0.455     0.679     0.669
GAME            0.458     0.659     0.455     0.679     0.669

Table 6.1: Estimating the ”Age” variable - error of models

Only the best two models from each technique are included. The GMDH rows contain the best results from table 5.1; they were achieved as described in the previous section. Pavel Kordık performed the experiments in GAME and provided me with the results. He used the ensemble techniques described in section 6.1.1 to achieve them.

You can see that the results are very similar, which shows that the tools have similar power. This may be because GMDH and GAME have the same basis: they work on the same principle, using neural networks. On average, however, GAME is better. That is no wonder, because it has an improved neural network and is the more scientific tool. KnowledgeMiner does not have as many settings and restrictive variables as GAME, but on the other hand the GMDH implemented in KM is well tuned, because it keeps up with GAME. They have similar results on the training and testing data; the training error is similar too, which shows that their ability to avoid overfitting is the same.

More experiments would be needed to get more exact results, such as dividing the data into more training and testing sets. We can see from the GMDH results that there is no unique configuration achieving the best results, which indicates a somewhat unbalanced division; it would be better to have more training and testing sets. Even that, however, did not help to get a better age prediction - the Anthro data set is evidently very noisy.


7 Conclusion

This thesis addresses the prediction of age at death from the skeleton based on the Anthro data. I tried to find out whether it is possible to accurately predict age from the given samples. This research shows that it is impossible to determine exactly. Even when I applied the best configuration to build the best models, the prediction was not accurate enough. Unfortunately, the data set is so noisy that we cannot rely on the created models. One reason why it is so noisy is that the evaluation or scoring of skeletal morphological changes is very subjective. Another reason is that the range and the speed of senescence vary from one individual to another, which explains the difficulty of elaborating a reliable and accurate method of age estimation. These reasons have an impact on the prediction. The comparison with the GAME application validates my results.

Using the GMDH in the KnowledgeMiner, I created neural networks to build appropriate models. The applied method was evaluated on a well-known benchmark modeling data set. The KnowledgeMiner software proved its quality: although it is a commercial tool and the configuration options are therefore limited, it produced good results. It is easy to understand and to work with, and it is suitable for people who do not want to explore a lot of settings - even with the default settings it is able to achieve good results. This research can continue with an improvement of the data samples; they need not be so noisy if the scoring or evaluation of the samples is more accurate.

In this work, I have performed several experiments with the KnowledgeMiner tool. I found the best configuration of this tool on the real-world ”Antro” and ”Building” data. I have also compared the accuracy of the models generated by the KnowledgeMiner with the results computed by the GAME application. Although it is possible to get slightly better models with the GAME application, I recommend using the implicit settings in the KnowledgeMiner tool. The resulting models are good enough and it takes just a fraction of the time to get them (compared to advanced configuration settings and the GAME application).


8 References

[Devlin, 97] Devlin, B.: Data Warehouse from Architecture to Implementation. Addison-Wesley, Reading, Massachusetts 1997

[Elder, 96] Elder IV, J.F., D. Pregibon: A Statistical Perspective on Knowledge Discovery in Databases. Menlo Park, California 1996

[Fayyad, 96] Fayyad, U.M., G. Piatetsky-Shapiro, P. Smyth: From Data Mining to Knowledge Discovery: An Overview. In: Fayyad, U.M. et al.: Advances in Knowledge Discovery and Data Mining. California 1996

[Garvin, 04] Gavin Brown: Diversity in Neural Network Ensembles. Ph.D. thesis, The University of Birmingham, January 2004

[Hansen, Salamon, 90] Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Machine Intelligence 12, 1990

[Cherkassky, 98] Cherkassky, V., F. Mulier: Learning from Data. New York 1998

[Ivakhnenko, 94] Madala, Ivakhnenko: Inductive Learning Algorithm for Complex System Modelling. CRC Press, Boca Raton 1994

[Kordık, 05] Kordık, P.: Group of Adaptive Models Evolution. Technical Report DCSE-DTP-2005-07, CTU Prague 2005

[Kordık, Snorek, 05] Kordık, P., Snorek, M.: Ensemble Techniques for Credibility Estimation of GAME Model. In: Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005. Berlin: Springer, 2005

[Lemke, 97] Lemke, F., J.-A. Muller: Self-Organizing Data Mining for a Portfolio Trading System. Journal for Computational Intelligence in Finance, vol. 5, 1997

[Mueller, Lemke, 99] Mueller, Johann-Adolf; Lemke, Frank: Self-Organising Data Mining. Berlin, Dresden 1999, 1st edition

[Prechelt, 94] Lutz Prechelt: Proben1 - A Set of Neural Network Benchmark Problems and Benchmarking Rules. Germany 1994

[Tichonov, 74] Tichonov, A.N., V.Ja. Arsent'jev: Metody resenija nekorrektnych zadac. Nauka, Moskva 1974

[Zhi-Hua, Jianxin, Wei, 02] Zhi-Hua Zhou, Jianxin Wu, Wei Tang: Ensembling neural networks: Many could be better than all. Artificial Intelligence 137, 2002

[GMDH web] Group method of data handling webpage, http://www.gmdh.net


[CTU] School webpage of Neural networks, http://service.felk.cvut.cz/courses/36NAN/index.html


A List of used shortcuts

AN Active Neurons

ASPN Algorithm for Synthesis of Polynomial Networks

GAME Group of Adaptive Models Evolution

GMDH Group Method of Data Handling

KM KnowledgeMiner

LBT Layer Break Through

LM Linear Model

MAPE Mean Absolute Percentage Error Criterion

NL Network Layers

NLM Nonlinear Model

PESS Prediction Error Sum of Squares criterion

PNETTR Polynomial Network Training algorithm

RMS Root mean squared error


B Content of appended CD

| - readme.txt
|
| - text/
| - - BP.pdf
|
| - data/

readme.txt - includes a description of what can be found in the individual directories

text/ - directory containing the text of the BP

data/ - directory containing the appended files