Using ontogenic classification networks in a smart structures application


Pergamon 0305-0548(94)00078-6

Computers Ops Res. Vol. 22, No. 9, pp. 871-881, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
0305-0548/95 $9.50 + 0.00

USING ONTOGENIC CLASSIFICATION NETWORKS IN A SMART STRUCTURES APPLICATION

Laura Burke† and Seth Flanders‡
Department of Industrial Engineering, Lehigh University, Bethlehem, PA 18015, U.S.A.

(Received April 1994; in revised form September 1994)

Scope and Purpose--This paper describes the usefulness of a class of neural networks which operates quite differently from the popular backpropagation trained network. The performance of the RCE network, a so-called ontogenic or evolutionary neural network, does not depend on arbitrarily or heuristically selected user parameters or configurations, as does backpropagation. Thus, the computational overhead involved in finding the "right" backpropagation model, which often goes unreported, is negligible when using the RCE network. We show that this network not only automates the process of configuring the neural network architecture, but also has the capability to outperform backpropagation for two classification problems derived from experimental data on beam vibration minimization.

Abstract--A key aspect of neural network application is determining how to select the network architecture, and how to use the data to optimize performance and to facilitate proper evaluation. Ontogenic (evolutionary) networks address these problems by leaving the user with minimal parameter selection and decision making, and utilizing available data more fully. We explain and illustrate this idea by comparing the classification performance for backpropagation and an ontogenic neural network. The ontogenic network is shown to have significant advantages in classification accuracy, speed, and utilization of data; furthermore, it does not require the user to determine the number of hidden units or any learning rate parameters, which avoids the trial and error approach necessary for both backpropagation and radial basis functions. In previous tests, ontogenic nets showed the ability to match backpropagation's results in a fraction of the training time. In addition, one of those problems demonstrated a further advantage of the network: by refraining from responding to patterns lying outside of the region on which it was trained, the network does not attempt to extrapolate and may signal that training and test sets lack adequate compatibility. Two new test problems arising out of a beam vibration minimization application in smart structures show the ability of the network to find solutions to low dimensional complex mappings for classifications which pose great difficulty for backpropagation.

1. INTRODUCTION

One of the most popular application areas for neural networks is classification, with the backpropagation paradigm frequently used. Classification problems perhaps suit neural networks better than any other single application area, since a neural network trades precision for superior generalizability, which leads to good performance. The precision of the constructed classification boundary is not an issue; the ability of the boundary to discriminate between classes is the criterion for success.

Unfortunately, recent authors have noted a risk in using backpropagation in difficult classification problems. That is, unless the appropriate combination of architectural (number of hidden units, for example) and algorithmic parameters is found, backpropagation trained neural networks can perform miserably. Even when a good combination is found, the amount of data needed both to adequately train and properly evaluate a neural network can be excessive.

†Laura Burke is an Associate Professor of Industrial and Manufacturing Systems Engineering at Lehigh University. She received the B.S. and M.S. in Industrial Engineering from The Pennsylvania State University, and the Ph.D. from the University of California-Berkeley. Her research interests are in artificial intelligence applications in decision and optimization problems.

‡Seth Flanders is a Ph.D. candidate in the Department of Industrial and Manufacturing Systems Engineering at Lehigh University. He received the B.S. and M.S. in General Engineering from the University of Illinois at Urbana-Champaign. His interests include neural networks, genetic algorithms, and other operations research techniques.

"Ontogenic" networks automate the process of parameter selection by dynamically constructing units to mask subregions of classes. One of the first such networks is due to Reilly et al. [1]. Their RCE network has been used in many image recognition and other pattern recognition applications. In addition, ontogenic nets such as ARTMAP [2], cascade correlation [3], and dynamic radial basis functions [4] have surfaced recently. For the task of classification, nets such as ARTMAP and RCE have sufficient complexity, and we focus on that type in this paper; specifically, RCE based networks. The primary advantage of these networks is the lack of arbitrariness associated with their design. For that reason, as Section 2 will describe, less data is required for comprehensive training and evaluation. An additional critical advantage is the net's ability to "know what it does not know", i.e. to signal extrapolation. These advantages have been previously demonstrated [5], and those results on several classification problems have shown that the RCE network can perform comparably to and sometimes better than both backpropagation and radial basis function (RBF) networks. On two small classification problems it showed strong advantages in data utilization, speed, and automation of design decisions.

In this paper, we illustrate the considerable gap in performance as measured by classification accuracy between backpropagation and ontogenic nets for data collected from a beam considered a smart structure. For these problems, large data sets and highly nonlinear mappings significantly increase the difficulty of the problem of achieving a good classification. Data sets are sufficiently large to allow thorough training and to enable generalization assessment. In addition, neither problem is actually a true classification problem--rather, they both represent mappings which we have "coarsened." By this we mean that the original, continuous output variable was divided into discrete ranges, which are sufficient for the application. In their original continuous mapping form, they proved nearly impossible to model with empirical methods.
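To make the coarsening step concrete, the following sketch (Python; the equal-width binning and the function name are our assumptions, since the paper does not specify how its bin boundaries were placed) maps a continuous output variable into a fixed number of discrete classes:

    import numpy as np

    def coarsen(y, n_classes=7):
        # Discretize a continuous output into equal-width classes.
        # Illustrative only: the paper reports seven discrete regions
        # but does not state how the boundaries were chosen.
        edges = np.linspace(y.min(), y.max(), n_classes + 1)
        # interior edges -> integer labels 0 .. n_classes-1
        return np.digitize(y, edges[1:-1])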

2. COMPUTATION AND DATA-RELATED DEMANDS OF NEURAL NETWORKS WITH ARBITRARY HIDDEN LAYERS

Backpropagation has become well-known and often synonymous with the term 'neural network'. Figure 1 illustrates a typical backpropagation neural network. The input units receive data attributes, and the output units give the network's response. The hidden units play a critical role in constructing a mapping from input to output space. Weights on the connections between units are adjusted iteratively in response to a set of training data according to the backpropagation algorithm [6]. The shortcomings of such a neural network, however, have also attracted note. Perhaps one of the most unsettling difficulties lies in the arbitrariness associated with network design, i.e. the number of hidden units and their connectivity to other layers of units. For the same data, different experimenters can achieve vastly different results. For difficult data as well as uncomplicated data, the search for the "correct" combination of design and training choices can itself demand excessive time.

It is well-recognized that the performance of backpropagation is, in general, quite dependent on the architecture chosen and the training data and strategy. In order to ensure adequate learning of a mapping from training data, a critical number of hidden units is necessary. Some researchers find that any number of units at or above this critical limit will yield similar performance [7]. Unfortunately, this is not always the case. It is quite possible that an "oversized" network will overfit the training data and perform unacceptably on test data [8].

To address this problem, several approaches have emerged. Weigend et al. [8] proposed a method which draws on common sense principles and is easily implemented. Their method recommends purposely oversizing the network, but avoiding overfitting problems by monitoring the generalization ability of the network as training progresses. The rationale for the approach is that not only the number of weights in the network but also the length of the training period affects generalization ability. If an oversized network is trained to too low an error, then the likelihood of overfitting is greatly increased. Since their work, several authors have used or proposed the same or similar strategies.
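A minimal sketch of this monitored-training strategy follows, with the actual backpropagation routines left as user-supplied callables (train_one_cycle, evaluate, get_weights, and set_weights are hypothetical stand-ins, not from the paper):

    def train_with_monitoring(train_one_cycle, evaluate, get_weights,
                              set_weights, max_cycles=1000, check_every=10):
        # Oversize the network deliberately, then keep the weight snapshot
        # with the best monitoring-set error rather than training to the
        # lowest training error, to avoid overfitting.
        best_err, best_weights = float("inf"), None
        for cycle in range(max_cycles):
            train_one_cycle()                    # one backpropagation pass
            if cycle % check_every == 0:
                err = evaluate()                 # error on the monitoring set
                if err < best_err:
                    best_err, best_weights = err, get_weights()
        if best_weights is not None:
            set_weights(best_weights)            # restore the best snapshot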

The drawback of this kind of approach is twofold. It is expensive in terms of time, since training is interrupted at regular, short intervals in order to gauge performance on the monitoring set. Obviously it makes heavy demands on data, since the total amount of data used for training includes both the set used to find weights and the set used to monitor progress. A third holdout set is still required to validate the network. And finally, it can still yield relatively inferior results.

Fig. 1. Three layer backpropagation neural network. [Figure: environmental information enters the input layer of the artificial neural structure, and the output layer gives the network response.]

Other approaches to improving generalization, like weight elimination [8], can yield improvements but incorporate still more parameters which must be determined experimentally and heuristically. Adding noise to training data has also been explored in an effort to ameliorate the overfitting problem [9], but results seem to be dependent on the problem. Fahlman and Lebiere's cascade correlation algorithm [3] shows promise as a constructive approach which adds hidden units "as needed", but again, user defined parameters dictate the precise meaning of "as needed."

In addition, though the network's ability to generalize is highly prized, it is not desirable that the network should respond to inputs which are significantly different from those on which it was trained. The network should signal this case, which is presumably of more practical use than a "guess" by the network. Backpropagation trained networks do not have the ability to make such distinctions, unless considerably modified.

2.1. Training and validation issues

Probably the most important aspect of proper training and validation for neural networks is the quantity and quality of data needed. Typical applications involve training networks of various sizes and configurations, then testing on a separate data set. The configuration and architecture yielding the best results is often chosen and that level of performance is reported. The flaw with this approach is that the separate test set, while not used to decide the values of weights in the network, is used to decide the number of hidden units--another critical parameter. Using the best test set performance may, then, overestimate the ability of the network. Additionally, the trial and error approach consumes time and lacks a guarantee of success.

On the other hand, it is even more difficult to assess the results if only one set (the training set used to find weights) is used in the experiment. In conventional statistics, a fair judgment of the goodness of fit is the adjusted coefficient of determination:

$$ R^2_{adj} = 1 - \frac{SSE \cdot (N - 1)}{SSTO \cdot (N - p - 1)} \qquad (1) $$

where N is the number of examples in the training set, p is the number of parameters in the model, and where

$$ SSE = \sum_{j=1}^{P} (y_j - y_j^s)^2 \qquad (2) $$

$$ SSTO = \sum_{j=1}^{P} (y_j - \bar{y})^2 \qquad (3) $$

y_j is the correct output for input pattern j; y_j^s is the model (neural network) output for input pattern j; and ȳ is the mean y value over all P patterns in the training set. Thus, finding the adjusted R² for the error on the training set requires normalization with the number of parameters; however, the apparent parameters of neural networks, the weights, do not correspond in a one-to-one fashion with the effective parameters of the model [10].
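For reference, equations (1)-(3) can be computed directly in a few lines; a minimal sketch (Python with NumPy), bearing in mind that for a neural network the appropriate value of p is itself ill-defined, as just discussed:

    import numpy as np

    def adjusted_r2(y, y_model, p):
        # y: correct outputs; y_model: network outputs; p: parameter count.
        n = len(y)
        sse = np.sum((y - y_model) ** 2)       # equation (2)
        ssto = np.sum((y - y.mean()) ** 2)     # equation (3)
        return 1.0 - (sse * (n - 1)) / (ssto * (n - p - 1))   # equation (1)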

Thus the use of networks having arbitrary hidden layer configurations entails massive amounts of data, a great deal of which is not even involved in finding the mapping, in order to build a truly validated network and ensure that overfitting or overestimation of performance has not occurred. In addition, the standard approaches to experimentation with various numbers of hidden units, combinations of learning rate, momentum, sigmoidal slope, and other parameters demand excessive computational overhead which typically goes unreported in most applications. Often the training time required by the backpropagation algorithm is itself dominated by the overhead involved in determining the appropriate structure. At least for classification problems, the ontogenic RCE network appears to offer an important alternative to this costly and time-consuming approach. Moreover, the classification yielded by ontogenic classifiers can provide a "coarse mapping" which can be used in a continuous mapping to yield good results, as we found in Ref. [11].

Fig. 2. Classification via mask construction. [Figure: bounded masks cover regions of class examples; a pattern X falls outside all masks.]

Fig. 3. Classification via decision surface. [Figure: a decision surface separates classes labeled 0 and 1; pattern X lies beyond the training region but on the "1" side.]

3. ONTOGENIC NETWORKS

The critical shortcomings of backpropagation described in Section 2 may be summarized as arbitrariness of architecture and parameter selection, data demands for proper evaluation and generalization assessment, and the inability of the network to "know what it does not know." We propose here two key components to a neural network system which can address these shortcomings. First, the neural network must construct bounded masks of decision regions rather than hypersurfaces, as in backpropagation. By masking a classification region, the network inherently limits its ability to respond to input patterns which fall considerably outside of the masks formed. Figures 2 and 3 illustrate the idea.

In Fig. 2, the classification decision is made by the following rule: If a pattern lies within a mask labeled for class i, then respond "i". If a pattern lies outside of any constructed masks, a "no response" is made. In this scenario, pattern X would be rightfully regarded as too dissimilar to previous data to judge. In Fig. 3, a nonlinear decision surface--such as that which may be constructed via backpropagation--would prompt a response of "1" to the pattern labelled X. This pattern, however, lies well outside of the region on which the network was trained by training exemplars. Thus, a more reliable response would have been "no response".
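The mask rule of Fig. 2 amounts to a covering test; a minimal sketch, assuming spherical masks as in the distance-based variant of Section 3.2 (the tuple layout is our own):

    import numpy as np

    def classify_by_masks(x, masks):
        # masks: list of (center, radius, label) tuples.
        for center, radius, label in masks:
            if np.linalg.norm(x - center) <= radius:
                return label        # pattern falls inside a class-i mask
        return None                 # "no response": too dissimilar to judge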

Radial basis function networks (RBFs) [12] have been proposed as an alternative to backpropagation because of their advantage in speed. They also solve the classification problem by finding masks in the pattern space which reduce the complexity of the problem. RBFs will not be described here; these methods, like backpropagation, are quite dependent on choices which the user must make, and our previous experiments demonstrated this problem [5].

The second critical component, which RBFs do not possess, is that of evolutionary or ontogenic construction. Networks possessing this characteristic build or claim critical hidden units as deemed necessary by the corresponding learning algorithm. Ontogenic nets have begun to emerge in a variety of forms. In 1982, Reilly et al. [1] proposed an evolutionary network which later became the basis for their NESTOR hardware. The fundamental idea underlying the original RCE network, which is quite simple, shares characteristics with several newer paradigms, such as cascade correlation, ARTMAP, and dynamic RBFs. Each of these networks possesses either an ability to mask decision regions with bounded surfaces or to generate hidden units as deemed necessary, or both.

Fig. 4. RCE network. [Figure: the input layer receives pattern F; weights A connect it to the prototype layer with widths L; weights B connect the prototype layer to the classification (output) layer.]

Fig. 5. Experimental beam results: optimal electric field setting as a function of disturbance frequency. [Figure: optimal E (0-3.5 kV) plotted against disturbance frequency (0-300 Hz).]

Probably the two networks which share the most important properties are RCEs and ARTMAP. The well-known vigilance parameter of ART is adapted in ARTMAP according to correctness of classification, as is the width or radius parameter for the RCE network. Next we describe the operation of the RCE based network used here.

3.1. Ontogenic RCE-based network description

The RCE network due to Reilly et al. [1] has been applied in several classification type problems and represents an alternative to backpropagation. Its primary advantage lies in its automation of the decision on the number of hidden units. Additionally, no user defined learning parameters are necessary. It has several desirable properties with respect to "common sense" measures that we discovered. We first describe the network and our modification which allows more general classification. We contrast the performance of the network with backpropagation, and isolate those training decisions which can affect performance.

The details of the RCE network appear in Ref. [1]. Figure 4 illustrates a labelled network structure. Here, following previous work [5], we describe its basic operations. The network consists of three layers of units: input, prototype, and classification. Prototype units respond to an input pattern F according to a threshold function of the net input received:

$$ g_j = \begin{cases} F \cdot A_j, & \text{if } F \cdot A_j > 1 \\ 0, & \text{otherwise} \end{cases} \qquad (4) $$

The weight vector for unit j, A_j, can be represented as

$$ A_j = L_j P_j \qquad (5) $$

where P_j is the prototype vector for unit j and L_j represents the angular width of the region of influence of unit j. This angular width can be likened to an "inverse vigilance". The response function (4) can be rewritten as:

$$ g_j = \begin{cases} L_j (F \cdot P_j), & \text{if } F \cdot P_j > 1/L_j \\ 0, & \text{otherwise} \end{cases} \qquad (6) $$

which suggests that a unit j responds to a pattern F only if the similarity between F and the prototype vector for unit j exceeds some threshold.
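In code, the response of a single prototype unit under equations (4)-(6) looks like the following sketch (Python; F and P are assumed normalized to unit length, as the network requires):

    import numpy as np

    def prototype_response(F, P, L):
        # Equation (6): respond only if the similarity F . P exceeds 1/L.
        s = float(np.dot(F, P))
        return L * s if s > 1.0 / L else 0.0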

The prototype units then transmit their responses to a final classification layer. Initially all weights connecting prototype units with classification units are 0. When a classification unit which corresponds to that of the pattern present at the input layer fails to respond, the following takes place. A new prototype unit, J, is committed to the pattern. Thus P_J is set equal to F(c), the present pattern (where c is the class of the pattern). L_J is set to L_0, an initial width parameter. Finally, the connection between prototype unit J and the classification unit corresponding to class c is set to 1.

When a classification unit, c', responds to an input pattern having associated class c ≠ c', the following takes place. All committed prototype units which responded to the pattern and which are connected to unit c' with a weight of 1 will have their L_j parameters reduced. This reduction ensures that the present pattern does not cause the same units, corresponding to incorrect classes, to respond. Thus for pattern F(c') which causes prototype unit J to respond wrongly, L_J is reduced to:

$$ L_J = F(c') \cdot P_J \qquad (7) $$

which clearly suppresses response as equations (4) and (5) show.
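A sketch of one supervised training step combining the commit rule and the width reduction of equation (7) (Python; the dictionary representation of prototype units and the function names are our assumptions, not the authors'):

    import numpy as np

    def responds(unit, F):
        # Equation (6): unit responds if F . P exceeds 1/L.
        return float(np.dot(F, unit["P"])) > 1.0 / unit["L"]

    def rce_train_step(prototypes, F, c, L0):
        # Commit a new unit if no unit of the correct class responds to F.
        if not any(u["cls"] == c and responds(u, F) for u in prototypes):
            prototypes.append({"P": F, "L": L0, "cls": c})
        # Equation (7): shrink widths of wrongly responding units.
        for u in prototypes:
            if u["cls"] != c and responds(u, F):
                u["L"] = float(np.dot(F, u["P"]))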

3.2. Modification for general distance classification

RCE based networks are typically applied to image and character recognition problems, and problems in which the classification method is required to be invariant to the norm of the pattern. Such requirements suit the RCE network since normalization of patterns allows the parameter L to adjust in a meaningful way.

Unfortunately, however, the use of normalized patterns restricts the applicability of the RCE net. In order to address this problem, the following important modification is proposed. Instead of response equation (4), we substitute the following:

$$ g_j = \begin{cases} |F - P_j| - R_j, & \text{if } |F - P_j| - R_j < 0 \\ 0, & \text{otherwise} \end{cases} \qquad (8) $$

where |X - Y| denotes the Euclidean distance between vectors X and Y. The new response equation has a similar effect to equation (4). If a pattern F does not fall within a spherical region of influence of unit j, as defined by radius R_j, then unit j does not respond. At the same time, the supervised approach to modifying L_j now becomes modification of R_j as follows:

$$ R_j = |F - P_j| \qquad (9) $$

for a pattern F which caused unit j to respond wrongly. Although the distance modification is straightforward, and enables the network to address a wider variety of problems, it requires additional connectivity in the actual network structure.
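Putting the distance modification together, a compact sketch of the modified network's training pass (Python; the class structure and method names are ours, not the authors'):

    import numpy as np

    class DistanceRCE:
        # Sketch of the distance-based RCE variant of Section 3.2.
        def __init__(self, R0):
            self.R0 = R0
            self.units = []   # each unit: [P, R, cls]

        def inside(self, F, P, R):
            return np.linalg.norm(F - P) - R < 0    # equation (8)

        def train_pattern(self, F, c):
            # Commit a new unit if no unit of class c covers F.
            if not any(cls == c and self.inside(F, P, R)
                       for P, R, cls in self.units):
                self.units.append([F.copy(), self.R0, c])
            # Equation (9): shrink radii of wrongly responding units.
            for unit in self.units:
                P, R, cls = unit
                if cls != c and self.inside(F, P, R):
                    unit[1] = np.linalg.norm(F - P)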

3.3. Assessment of performance

The RCE network may be trained to an acceptably low error, or for some maximum number of cycles. In all of the experiments reported here, training was stopped as soon as "sufficiently good" classification was achieved on the training set. To assess its generalization performance, we use a "liberal" interpretation of the network response. That is, for the test set patterns separate from those on which the network is trained, the classification unit with the highest net input is chosen as the only responding unit. It is possible to interpret the network response in other ways which signal confusion, but we do not discuss those here. The net input to a classification unit in recall mode is given by

$$ \text{net}(j) = \sum_k b_{kj} \, g_k \qquad (10) $$

where b_kj is the weight between prototype unit k and classification unit j. Recall that these weights are initialized at 0 and set equal to 1 according to the training procedure described earlier.
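A sketch of this liberal recall rule under the distance-based response (Python; treating the magnitude of the negative response in equation (8) as the unit's activation strength is our reading, since the paper leaves that detail open):

    import numpy as np

    def classify_liberal(F, units, classes):
        # units: [P, R, cls] triples; b_kj = 1 between a prototype unit
        # and its own class unit, 0 elsewhere (equation (10)).
        net_input = {c: 0.0 for c in classes}
        for P, R, cls in units:
            g = np.linalg.norm(F - P) - R      # equation (8)
            if g < 0:
                net_input[cls] += -g           # closer pattern -> stronger
        winner = max(net_input, key=net_input.get)
        return winner if net_input[winner] > 0 else None   # else: no response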

3.4. Determination of initial width

The single parameter selected by the user is the initial width or radius, R_0. This radius determines the size of the spherical region associated with a prototype unit. The network begins with all units having identical widths. In order to ensure that prototype units will not proliferate initially, we require an R_0 large enough that no patterns are excluded. The data set for the problem will determine the width; for example, the maximum variance in the data will suffice. Thus, this decision is not made arbitrarily and the resulting parameter can be completely determined by the data.
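As a sketch of this data-driven choice (Python; taking the maximum per-dimension variance of the training inputs is one literal reading of the rule above, not necessarily the authors' exact computation):

    import numpy as np

    def initial_radius(X):
        # X: training inputs, one pattern per row.
        # One reading of Section 3.4: the maximum variance in the data.
        return float(np.max(np.var(X, axis=0)))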

4. EXPERIMENTAL RESULTS

Previously [5], we illustrated the advantages of RCE over backpropagation for two two-dimensional classification problems. A third problem, classification of tool wear, corresponded to physical data and illustrated another key property of RCEs: while none of the networks used performed very well on the test data (yet all achieved good training accuracy), only the RCE indicated the problem by refraining from responding to a high percentage of test patterns. Without such an indication, the neural network provides answers in which the user places undue confidence. In all, these experiments served primarily to illustrate the ease of use of RCE, its speed, and comparable accuracy. The backpropagation network results represented the best over several architectures and training times, while RCE results represent a single trial with no arbitrary user decisions.

In this paper we show the advantage of the ontogenic network for problems having high complexity and sufficient data to achieve training over a specified range. We next describe the nature of the data, and how it was obtained. Then, we present results for the ontogenic and backpropagation networks.

4.1. Description of experimental data for smart structure

Two test problems, both generated from the same physical experiment, were used to illustrate the properties of the RCE network and contrast it with backpropagation. The physical data collected consisted of the frequency of vibration of a beam in the presence of an applied disturbance. The frequency of the disturbance, and hence that of the beam's vibration, ranged from 0 to about 300 Hz. For each frequency level, the amplitude of the beam's vibration at a single point was measured. The same information was collected under several different control actions. This control action was the application of an electric field to the beam, and that field ranged from 0 to 3.5 kV in steps of 0.1 kV.

Application of an electric field to the beam affects its vibration because the beam encases an ER (electrorheological) fluid which changes its structural properties in response to the applied field [13]. More details on this phenomenon can be found in Ref. [14]. Since the specifics of the structural changes caused by varying the applied E continue to be investigated, our goal was to find an empirical relationship between the frequency of vibration measured and the "best" field to apply. The best field is that which leads to minimum vibration. More formally, the desired electric field, E_f, for a given frequency, f, is

$$ E_f = \arg\min_{e \in E} \{ a_e \} \qquad (11) $$

where a_e is the measured amplitude of vibration of the beam in the presence of electric field e, and E is the set of all electric field values.
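Equation (11) is a simple arg-min over the recorded amplitudes; a minimal sketch (Python; the dictionary form of the measurements is our assumption):

    def best_field(amplitudes):
        # amplitudes: {field e (kV): measured amplitude a_e} for one frequency.
        return min(amplitudes, key=amplitudes.get)

    # e.g. best_field({0.0: 2.1, 0.1: 1.8, 0.2: 2.3}) -> 0.1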

The experiments conducted yielded the graph shown in Fig. 5. The ability to plot the relationship seems to negate the need for an analytical approach. However, our intention to extend this research to more complex scenarios leading to higher dimensions in the input and output space indicates the need to model this relationship. Moreover, an analytical model which could compute the best E for any frequency, with the potential of hardware implementation, can fill a useful role in an overall strategy for beam vibration control.

Fig. 6. Experimental beam results: amplitude of vibration as a function of disturbance frequency and applied electric field. [Figure: surface plot of vibration amplitude against frequency (Hz) and voltage (kV).]

The data for the second experiment consisted of all 9000 observations taken which related the frequency of excitation and applied voltage (as inputs) to the observed amplitude of displacement of the beam. Thus, the second problem required a causal or "forward" model from inputs to effected output; the first problem required a mapping from externally applied input to optimal control (voltage) setting.

The results reported for backpropagation are the best over a series of experiments using varying numbers of hidden units and training periods. The experiments conducted using backpropagation used a "low" and a "high" value for the number of hidden units. In some cases, a range of units was tested. Training was stopped at regular intervals or when training error appeared to level out. For each test problem, then, assessment of performance was taken at least 10 times. No attempt was made to optimize learning rates and momentum values, which were preset at a level which had yielded good results over previous problems.

For the RCE network, a maximum value for the number of prototype units to be committed, and an initial R_0 as described in Section 3.4, were set. Training then progressed for each problem until acceptable training set classification occurred. "Acceptable" translates to either perfect classification or that achieved at the end of some maximum elapsed time. The test set was not used to help determine when training should end or how many hidden units to use, as it typically has to be for backpropagation in order to avoid overfitting. In the experiments reported here, however, overfitting was not as much a problem as simply achieving acceptable training accuracy.

4.2. Results for problem 1 (optimal mapping)

The mapping pictured in Fig. 5 and discussed in the previous section was transformed into a classification problem by dividing the range of the output into seven discrete regions. (Attempts to use backpropagation to model the continuous mapping met with failure [11].) Using the RCE-based method, we achieved perfect classification accuracy on the training set of 250 patterns after six cycles (corresponding to under a minute of processing time on a 486-based PC). The test set accuracy was 94%, corresponding to three misclassifications out of 49. About 70 prototype units were generated. For backpropagation, we used numerous architectures and found typically that we could not achieve better than 50% classification on the training set. For example, for 100 hidden units and 100 cycles (corresponding to over 250,000 presentations of data) the network achieved about 50% accuracy on the training set and about 35% on the test set. Other architectures, including two hidden layers of 100 and 50 units respectively, did not improve on these results. We do not intend to suggest that, given enough training time, experimentation time (on the part of the user) and parameters, backpropagation cannot succeed; rather, it is apparent that the time required to find the successful combination of architecture and training time greatly exceeds that for the RCE based system.

4.3. Results for problem 2 (forward mapping)

Figure 6 shows experimental results relating the externally applied frequency of excitation and the applied voltage to the resulting amplitude of beam vibration. We again "coarsened" the continuous mapping by dividing the continuous output variable (amplitude) into seven discrete regions.

After 15 cycles of training on the RCE based network, an accuracy of 93.8% (1666/1778) correct classification was achieved. We stopped training before perfect classification due to the very high number of patterns in the training set. It is inevitable that a correspondingly high number of prototype units will evolve, and we wished to achieve reasonably good classification in an acceptable period of time (about 7 min on a 486 based machine). After training, the network achieved 82.5% correct classification on the test set, corresponding to 5814 correct classifications out of a total of 7000 patterns. Using backpropagation and numerous architectures, including one with two hidden layers having 100 and 50 units respectively, and after 20 cycles of training (corresponding to over an hour), the network achieved no better than 62% classification. The improvement with training cycles was becoming negligible, which is why no further training was attempted. Test set results were also at or below 60%.

5. CONCLUSION

Two difficult nonlinear mappings were treated as classification problems by discretizing the output variable. We found that numerous choices for architecture and training time for backpropagation still led to results considerably inferior to those of a single training experiment with our RCE-based network. The important finding here is not that backpropagation can perform poorly. It is extremely easy to find a configuration for backpropagation which will fail, even though it is still possible to succeed with the "right" choices. Instead, the point we wish to make is that when a problem suits a classification representation, an ontogenic network offers significant advantages over backpropagation in both the time spent finding a configuration (since ontogenic networks automate that process) and in processing time. Only one parameter need be input to the RCE-based network, and that is a simple function of the spread of the data.

The results found here further support previous research which indicates the usefulness of these networks. However, ontogenic networks seem best suited to problems in which a large amount of data is available. Like radial basis function networks, RCE based nets require similarity in input patterns to drive mask formation, and when there is only enough data to support very few patterns per mask created, performance is likely to be poor. In the beam vibration experiments, our wealth of data facilitated the RCE network but actually posed problems for the backpropagation networks.

While these networks have shortcomings, our findings with beam vibration data from a physical experiment suggest they may be powerful alternatives to backpropagation for difficult nonlinear mappings.

Acknowledgement--This research was supported in part by a National Science Foundation National Young Investigator Award and by funding from the Army Research Office (Grant No. DAAL03-92-G-0388) to the first author.


REFERENCES

1. D. L. Reilly, L. N. Cooper and C. Elbaum, A neural model for category learning. Biol. Cybernetics 45, 35-41 (1982).
2. G. Carpenter, S. Grossberg and J. Reynolds, ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks 4, 565-588 (1991).
3. S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture. Technical Report CMU-CS-90-100, Carnegie Mellon University, Pittsburgh, PA (1990).
4. W. Blevins and D. St. Clair, Determining the number and placement of functions for radial basis approximation networks. Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 3. ASME Press (1993).
5. L. I. Burke, A comparison of neural networks for classification: advantages of RCE-based networks. Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 3. ASME Press (1993).
6. D. Rumelhart, J. McClelland and the PDP Research Group, Parallel Distributed Processing, Vol. 1. MIT Press, Cambridge, MA (1986).
7. J. F. Pollard, M. R. Broussard, D. B. Garrison and K. Y. San, Process identification using neural networks. Computers chem. Engng 16, 253-270 (1992).
8. A. Weigend, B. Huberman and D. Rumelhart, Predicting the future: a connectionist approach. Int. J. Neural Systems 1, 193-210 (1990).
9. K. Matsuoka, Noise injection into inputs in backpropagation learning. IEEE Trans. Systems, Man, and Cybernetics 22, 436-440 (1992).
10. J. Moody, The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. In Advances in Neural Information Processing Systems, Vol. 4 (Edited by J. Moody et al.). Morgan Kaufmann Publishers, San Mateo, CA (1992).
11. L. Burke, S. Vaithyanathan and S. Flanders, A hybrid neural network approach to beam vibration minimization. IEEE Trans. Systems, Man and Cybernetics, submitted.
12. J. Moody and C. Darken, Fast learning in networks of locally-tuned processing units. Neural Comput. 1, 281-294 (1989).
13. W. Winslow, Induced fibration of suspensions. J. Applied Phys. 20, 1137-1140 (1949).
14. J. P. Coulter, T. G. Duclos and D. N. Acker, Electrorheological materials in structural damping applications. J. Sound Vibration, to appear.