Mechanical Systems and Signal Processing 18 (2004) 1273–1282
www.elsevier.com/locate/jnlabr/ymssp

Letter to the editor

Artificial neural networks and genetic algorithms for gear fault detection
1. Introduction
Condition monitoring is gaining importance in industry because of the need to increase machine availability. The use of vibration and acoustic emission (AE) signals is quite common in the field of condition monitoring of rotating machinery [1–5], with potential applications of artificial neural networks (ANNs) in automated detection and diagnosis [2,4,5]. Multi-layer perceptrons (MLPs) and radial basis functions (RBFs) are the most commonly used ANNs [6,7], though interest in probabilistic neural networks (PNNs) is increasing in general [7,8] and in the area of machine condition monitoring [9,10]. Genetic algorithms (GAs) have been used to make the classification process faster and more accurate, using the minimum number of features that primarily characterise the system conditions, together with optimised structures or parameters of the ANNs [10,11]. In a recent work [12], results of MLPs with GAs were presented for fault detection of gears using only time-domain features of vibration signals. In that approach, the features were extracted from finite segments of two signals: one with normal condition and the other with defective gears. In the present work, comparisons are made between the performance of three different types of ANNs, both without and with automatic selection of features and classifier parameters, for the dataset of [12]. The roles of different vibration signals, obtained under both normal and light loads and at low and high sampling rates, are investigated. The results show the effectiveness of the features extracted from the acquired and preprocessed signals in diagnosing the machine condition. The procedure is illustrated using the vibration data of an experimental setup with normal and defective gears [13].
2. Vibration data and feature extraction
In [13], vibration signals measured from seven accelerometers on a pump set driven by an electrical motor through a two-stage gear reduction unit were presented. The sensors were placed near the driving shaft and the bearings supporting the gear-shafts. Four sets of measurements, with two levels of load (maximum and minimum) and at two sampling rates (3200 and 128,000 samples/s), were obtained. The sampling rates were selected well above the gear mesh frequency. The number of samples collected for each channel was 77,824, to cover a sufficient number of cycles.
0888-3270/$ - see front matter © 2004 Published by Elsevier Ltd.
doi:10.1016/j.ymssp.2003.11.003
The samples were divided into 38 segments of 2048 samples each, which were further processed [12] to extract nine features (1–9): mean, root mean square, variance, skewness, kurtosis and normalised fifth to ninth central moments. The effect of segment size on the variation of features between segments was studied, and the present segment size (2048 data points) was chosen on that basis. Similar features were extracted from the derivative and integral of the signals (10–27) and from low- and high-pass filtered signals (28–45) [12]. The procedure of feature extraction was repeated for two load conditions, two sampling rates (high and low) and two gear conditions (normal and defective), giving a total set of 45 × 266 × 2 × 2 × 2 features. The features were normalised by dividing each feature row by its absolute maximum value, keeping the inputs within ±1 for better speed and success of the network training. However, a scheme of statistical normalisation, with zero mean and a standard deviation of 1 for each feature set, was also attempted. Results comparing the effectiveness of the two normalisation schemes (magnitude and statistical) are discussed in Section 5.4.
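As an illustration, the segmentation, feature computation and the two normalisation schemes can be sketched as below. The paper's own processing was carried out in Matlab [12]; this Python version is only a minimal sketch under those descriptions, and the function names are ours.

```python
import numpy as np
from scipy.stats import moment

def segment_features(x):
    """Time-domain features of one segment: mean, RMS, variance,
    and the normalised 3rd-9th central moments (skewness,
    kurtosis and the 5th-9th moments)."""
    sigma = np.std(x)
    feats = [np.mean(x), np.sqrt(np.mean(x ** 2)), np.var(x)]
    for n in range(3, 10):
        # normalised nth central moment: E[(x - mean)^n] / sigma^n
        feats.append(moment(x, moment=n) / sigma ** n)
    return np.array(feats)

def normalise(F, scheme="magnitude"):
    """Magnitude scheme: divide each feature column by its absolute
    maximum, keeping all inputs within +/-1. Statistical scheme:
    zero mean and unit standard deviation per feature.
    F has shape (n_segments, n_features)."""
    if scheme == "magnitude":
        return F / np.abs(F).max(axis=0)
    return (F - F.mean(axis=0)) / F.std(axis=0)

# Example: one channel of 77,824 samples split into 38 segments of 2048.
signal = np.random.randn(77824)  # placeholder for a measured channel
segments = signal[:38 * 2048].reshape(38, 2048)
F = normalise(np.vstack([segment_features(s) for s in segments]))
```

The derivative/integral features (10–27) and the filtered-signal features (28–45) would be obtained by applying the same routine to the correspondingly preprocessed signals.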
3. Artificial neural networks
There are numerous applications of ANNs in data analysis, pattern recognition and control [6,7]. Among the different types of ANNs, three, namely the MLP, RBF and PNN, are considered in this work. Here a brief introduction to these ANNs is given; readers are referred to the texts [6,7] for details.
3.1. Multi-layer perceptron
The feed-forward MLP neural network used in this work consisted of three layers: input, hidden and output. The input layer had nodes representing the normalised features extracted from the measured vibration signals. The number of input nodes was varied from 3 to 45 and the number of output nodes was 2. The number of hidden nodes was varied between 10 and 30, similar to [12].
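A minimal sketch of such a network is given below, using scikit-learn's MLPClassifier as a stand-in for the paper's Matlab implementation and randomly generated placeholder data. Note that scikit-learn uses a single output unit for binary problems rather than the two complementary output nodes described here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(280, 9))      # placeholder normalised features
y_train = rng.integers(0, 2, size=280)   # 0 = normal, 1 = faulty gear

# One hidden layer of 24 neurons, the value used for the "straight"
# MLPs in Section 5.1.
mlp = MLPClassifier(hidden_layer_sizes=(24,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("training success: %.2f%%" % (100 * mlp.score(X_train, y_train)))
```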
3.2. Radial basis function networks
The structure of an RBF network is similar to that of an MLP. The activation function of the hidden layer is the Gaussian spheroid function:
$y(x) = e^{-\|x - c\|^{2}/2\sigma^{2}}. \qquad (1)$
The output of the hidden neuron gives a measure of the distance between the input vector $x$ and the centroid $c$ of the data cluster. The parameter $\sigma$ represents the radius of the hypersphere. This parameter is generally determined through an iterative process, selecting an optimum width on the basis of the full data sets; however, in the present work the width is selected along with the relevant input features using the GA-based approach. The RBFs were created, trained and tested using Matlab, through a simple iterative algorithm that adds neurons to the hidden layer until the performance goal is reached.
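A one-line implementation of the activation of Eq. (1) is sketched below; the variable names follow the equation and the example values are arbitrary.

```python
import numpy as np

def rbf_activation(x, c, sigma):
    """Gaussian spheroid function of Eq. (1):
    y(x) = exp(-||x - c||^2 / (2 sigma^2)),
    where c is the cluster centroid and sigma the hypersphere radius."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(c)) ** 2)
                  / (2.0 * sigma ** 2))

# The activation decays with distance from the centroid:
print(rbf_activation([0.2, 0.4, 0.9], [0.2, 0.4, 0.9], sigma=0.5))  # 1.0
print(rbf_activation([1.0, 0.0, 0.0], [0.2, 0.4, 0.9], sigma=0.5))  # ~0.04
```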
3.3. Probabilistic neural networks
The structure of a PNN is similar to that of an RBF, both having a Gaussian spheroid activation function in the first of the two layers. The linear output layer of the RBF is replaced with a competitive layer in the PNN, which allows only one neuron to fire, with all others in the layer returning zero. The major drawback of PNNs is the computational cost of the potentially large hidden layer, which can be as large as the number of training vectors. The PNN can act as a Bayesian classifier, approximating the probability density function (PDF) of a class using Parzen windows [6]. The generalised expression for the Parzen-approximated PDF at a given point $x$ in feature space is:
$f_A(x) = \dfrac{1}{(2\pi)^{p/2}\,\sigma^{p} N_A} \displaystyle\sum_{i=1}^{N_A} e^{-\|x - c_i\|^{2}/2\sigma^{2}}, \qquad (2)$
where $p$ is the dimensionality of the feature vector, $N_A$ is the number of examples of class $A$ used for training the network, and $c_i$ denotes the $i$th training vector of class $A$. The parameter $\sigma$ represents the spread of the Gaussian function and has significant effects on the generalisation of a PNN.

One of the problems with the PNN is handling skewed training data, where the data from one class significantly outnumber those from the other class. Skewed data are more likely in a real environment, as the amount of data for the normal machine condition would, in general, be much larger than for the machine fault conditions. A basic assumption of the PNN approach concerns the so-called prior probabilities: the proportional representation of the classes in the training data should match, to some degree, the actual representation in the population being modelled [6,8]. If the prior probability differs from the level of representation in the training cases, the accuracy of classification is reduced. To compensate for this mismatch, the a priori probabilities can be given as input to the network, and the class weightings are adjusted accordingly at the binary output nodes of the PNN [6,8]. If the a priori probabilities are not known, the training data set should be large enough for the PDF estimators to asymptotically approach the underlying probability density. The skewed training set problem also affects MLPs.

In the present work, the data sets have equal numbers of samples from normal and faulty gear
conditions. The PNNs were created, trained and tested using Matlab. The width parameter is generally determined through an iterative process, selecting an optimum value on the basis of the full data sets; however, in the present work the width is selected along with the relevant input features using the GA-based approach, as for the RBFs.
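The following sketch illustrates the two PNN layers described above, namely the Parzen estimate of Eq. (2) for each class and the competitive output with optional prior weighting. It is an illustrative reconstruction, not the paper's Matlab code, and the sample data are placeholders.

```python
import numpy as np

def parzen_pdf(x, centres, sigma):
    """Parzen-window PDF estimate of Eq. (2) for one class:
    f_A(x) = sum_i exp(-||x - c_i||^2 / (2 sigma^2))
             / ((2 pi)^(p/2) * sigma^p * N_A)."""
    x, centres = np.asarray(x), np.asarray(centres)
    p, n_a = x.size, len(centres)
    d2 = np.sum((centres - x) ** 2, axis=1)
    norm = (2 * np.pi) ** (p / 2) * sigma ** p * n_a
    return np.exp(-d2 / (2 * sigma ** 2)).sum() / norm

def pnn_classify(x, train_by_class, sigma, priors=None):
    """Competitive output layer: the class with the largest
    (prior-weighted) PDF fires; all other outputs are zero."""
    priors = priors or {c: 1.0 for c in train_by_class}
    scores = {c: priors[c] * parzen_pdf(x, v, sigma)
              for c, v in train_by_class.items()}
    return max(scores, key=scores.get)

# Usage with equal class sizes, as in the present data sets:
rng = np.random.default_rng(1)
train = {"normal": rng.normal(0.0, 0.1, size=(140, 3)),
         "faulty": rng.normal(0.5, 0.1, size=(140, 3))}
print(pnn_classify([0.45, 0.5, 0.55], train, sigma=0.1))  # "faulty"
```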
4. Genetic algorithms
GAs have been considered with increasing interest in a wide variety of applications. These algorithms search the solution space through simulated evolution based on 'survival of the fittest', solving linear and non-linear problems through mutation, crossover and selection operations applied to individuals in a population [14]. The basic issues of GAs, in the context of the present work, are briefly discussed in this section.

A population size of 10 individuals was used, starting with randomly generated genomes. The GA
was used to select the most suitable features and one variable parameter related to the particular
classifier: the number of neurons in the hidden layer for the MLP, and the RBF kernel width ($\sigma$) for the RBF and PNN. For a training run needing $N$ different inputs to be selected from a set of $Q$ possible inputs, the genome string consists of $N + 1$ real numbers. The first $N$ entries $(x_i,\ i = 1, \ldots, N)$ in the genome, treated as integers, are constrained to lie in the range $1 \le x_i \le Q$:

$X = \{x_1, x_2, \ldots, x_N, x_{N+1}\}^{T}. \qquad (3)$

The last number, $x_{N+1}$, has to be within the range $S_{min} \le x_{N+1} \le S_{max}$. The parameters $S_{min}$ and $S_{max}$
represent, respectively, the lower and upper bounds on the classifier parameter. A probabilistic selection function, namely normalised geometric ranking [15], was used, such that better individuals, judged on the fitness criterion in the evaluation function, have a higher chance of being selected. A non-uniform mutation function [14], which mutates using a random number based on the current generation and the maximum generation number, among other parameters, was adopted. Heuristic crossover [14], producing a linear extrapolation of two individuals based on the fitness information, was chosen. The maximum number of generations was adopted as the termination criterion for the solution process. The classification success on the test data was used as the fitness criterion in the evaluation function.
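The genome of Eq. (3) and the fitness evaluation can be sketched as follows. The build_classifier hook and all names are ours, and the paper's actual operators (normalised geometric ranking, heuristic crossover, non-uniform mutation [14,15]) are only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q = 3, 45                 # features per genome, size of feature pool
S_MIN, S_MAX = 0.1, 1.0      # bounds on the classifier parameter
POP_SIZE = 10                # population size used in the paper

def random_genome():
    """Genome of Eq. (3): N feature indices in [1, Q] followed by one
    classifier parameter in [S_min, S_max] (hidden-layer size for the
    MLP, kernel width sigma for the RBF and PNN)."""
    idx = rng.integers(1, Q + 1, size=N).astype(float)
    return np.append(idx, rng.uniform(S_MIN, S_MAX))

def fitness(genome, X_tr, y_tr, X_te, y_te, build_classifier):
    """Fitness = test-set classification success. build_classifier is
    a hypothetical hook that trains one of the three ANNs on the
    selected feature columns with the genome's last entry."""
    cols = genome[:N].astype(int) - 1    # 1-based genes -> 0-based columns
    model = build_classifier(X_tr[:, cols], y_tr, genome[N])
    return np.mean(model.predict(X_te[:, cols]) == y_te)

population = [random_genome() for _ in range(POP_SIZE)]
# Selection (normalised geometric ranking), heuristic crossover and
# non-uniform mutation [14,15] would then evolve this population until
# the maximum number of generations is reached.
```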
5. Results and discussion
The data sets were split into training and test sets of size 45 × 140 × 2 × 2 × 2 and 45 × 126 × 2 × 2 × 2, respectively. No separate validation set was used because of the limited size of the available data. The number of output nodes was two for the MLPs and RBFs, and one for the PNNs. One output node for all classifiers would have been enough; however, the classification success was not satisfactory with one output node for the MLPs and RBFs on the present data sets, with the particular choice of network structure and activation functions. The target value of the first output node was set to 1 and 0 for normal and faulty gears, respectively, and the values were interchanged (0 and 1) for the second output node. For the PNNs, the target values were specified as 0 and 1, respectively, representing normal and faulty conditions. Results are presented to show the effects of accelerometer location and signal processing on the diagnosis of machine condition using ANNs without and with GA-based feature selection. The training success in each case was 100%.
5.1. Performance comparison of ANNs without and with feature selection
In this section, classification results are presented for the ANNs without and with GA-based feature selection. For each case of the straight ANNs, the number of neurons in the hidden layer was 24; for the straight RBFs and PNNs, the widths ($\sigma$) were kept constant at 0.50 and 0.10, respectively. These values were found on the basis of several training trials. In the GA-based approach, only three features were selected from the corresponding range of input features. For the MLPs, the number of neurons in the hidden layer was selected in the range 10–30, whereas for the RBFs and PNNs the Gaussian spread ($\sigma$) was selected in the range 0.1–1.0 with a step size of 0.1.
5.1.1. Effect of sensor location
Table 1 shows the classification results for each of the sensor locations. The first nine input features (1–9) were used in the straight ANNs. The test success improved substantially in each case with feature selection. The poor performance of the straight ANNs may be attributed to an insufficient feature set, without enough discrimination between normal and faulty conditions.
5.1.2. Effect of signal pre-processing
Table 2 shows the effects of signal processing on the classification results with the first four signals (1–4) and features from the corresponding ranges. Test success improved with the GA, in some cases to 100%.
5.1.3. Effect of load and sampling rate
Table 3 shows the effects of load and signal sampling rate on the classification results using the full feature range (1–45) for the first four signals (1–4). For most cases with the GA, test success improved to 100%.
Table 1
Performance comparison of classifiers without and with feature selection for different sensor locations

Test success (%), straight / with GA:

Data set    MLP              RBF              PNN
Signal 1    88.89 / 100      50.00 / 100      91.67 / 100
Signal 2    95.83 / 100      50.00 / 100      83.33 / 100
Signal 3    100 / 100        55.56 / 100      94.44 / 100
Signal 4    87.50 / 94.44    91.67 / 100      80.56 / 94.44
Signal 5    48.61 / 100      63.89 / 100      55.56 / 100
Signal 6    56.94 / 98.61    91.67 / 94.44    63.89 / 98.61
Signal 7    87.50 / 100      55.56 / 100      77.78 / 100
Table 2
Performance comparison of classifiers without and with feature selection for different signal preprocessing

Test success (%), straight / with GA:

Data set                    MLP              RBF              PNN
Signals 1–4                 96.53 / 100      97.92 / 99.31    98.61 / 98.61
Derivative/integral         97.92 / 100      97.92 / 95.14    97.92 / 97.92
High-/low-pass filtering    94.44 / 100      97.92 / 100      99.31 / 100
5.2. Effect of number of selected features
Table 4 shows the test classification success of the ANNs with the number of selected features varying from 3 to 6 (out of 45) for the first four signals at low load and low sampling rate. The test success was almost 100% for all three classifiers.
5.3. Performance of PNNs with selection of 6 features
The performance of PNNs with 6 features selected from the corresponding ranges was studied. The test success was 100% for all cases except one. The computation time (on a PC with a Pentium III processor at 533 MHz and 64 MB RAM) for training the PNNs was also noted for each case. These times (39.397–61.468 s) were not much different from those of the PNNs with three features (36.983–56.872 s), but higher than the straight cases (0.250–1.272 s). They were substantially lower than for the RBFs and MLPs; however, a direct comparison among the ANNs is not made because of differences in code efficiency.
5.4. Results using statistical normalisation
The data sets discussed in the previous sections were normalised in magnitude to keep the features within ±1. The GA-based selection procedure of Section 5.2 was repeated for the PNNs using statistically normalised features. The test classification success was almost 100% for both schemes
Table 3
Performance comparison of classifiers without and with feature selection for different loads and sampling rates

Test success (%), straight / with GA:

Load    Sampling rate    MLP              RBF              PNN
Max     Low              100 / 100        97.92 / 97.92    99.31 / 100
Min     Low              96.88 / 100      94.44 / 100      100 / 100
Max     High             80.21 / 100      86.81 / 99.31    96.53 / 100
Min     High             97.57 / 100      97.92 / 100      99.31 / 100
Table 4
ANN performance for different numbers of selected features

Test success (%):

Number of selected features    MLP      RBF    PNN
3                              100      100    100
4                              99.65    100    100
5                              100      100    100
6                              100      100    100
of normalisation. The training time increased somewhat with a higher number of features, but not in direct proportion.
5.5. Separability of data sets
To investigate the separability of the data sets with and without gear fault, the three features selected by the GA for the MLP, RBF and PNN are shown in Figs. 1(a)–(c), respectively, for magnitude-normalised features. In all three cases, the data clusters are quite well separated, with only a small amount of overlap; the separation is best for the PNN. This can explain the 100%
Fig. 1. Scatter plots of GA selected features with magnitude normalisation: (a) MLP, (b) RBF, (c) PNN. (Axes: 1st, 2nd and 3rd selected feature; classes: normal and faulty.)
classification success, even with only three features, for all three classifiers. Fig. 2 shows the scatter plot of three statistically normalised features selected by the GA for the PNN. This also shows good separation of the data clusters, explaining the 100% classification success.
5.6. Comparison with another technique
Principal component analysis (PCA) is used to reduce the dimensionality of data by forming a new set of variables, known as principal components (PCs), that represent the maximal variability in the data with minimal loss of information [16]. Figs. 3(a) and (b) show plots of the first three PCs for the magnitude- and statistically normalised feature sets, respectively. These PCs account for more than 60% of the variability of the feature sets. The separation between the data clusters of the two classes is not very prominent. The classification success using the first three to eight PCs of the magnitude-normalised data is presented in Table 5. The results are very unsatisfactory: 54.86–67.71% for the MLPs, 50.00–77.78% for the RBFs and 65.97–85.42% for the PNNs, compared with the GA-based feature selection procedure, which gave almost 100% classification success for all three classifiers. This shows the superiority of the present approach of GA-based feature selection over using the PCs.
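For reference, extracting the first three PCs is straightforward, for example with scikit-learn; the feature matrix below is a random placeholder standing in for the 45 normalised features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
F = rng.normal(size=(280, 45))   # placeholder normalised feature matrix

pca = PCA(n_components=3)
pcs = pca.fit_transform(F)       # first three principal components
print("variability captured: %.1f%%"
      % (100 * pca.explained_variance_ratio_.sum()))
```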
To summarise, the classification success of the MLPs and PNNs without feature selection was comparable, and better than that of the RBFs, for most of the cases considered, but almost all were substantially below 100%. The use of GAs with only three features gave almost 100% classification for most of the test cases, even with different load conditions and sampling rates, for all the ANNs. The use of six selected features with PNNs gave almost 100% test success for the cases considered. The training time with feature selection was reasonable for the PNNs and substantially lower than for the RBFs and MLPs, with and without feature selection. The classification success of the ANNs compares well with that of support vector machines for the same data and feature
Fig. 2. Scatter plot of GA selected features with statistical normalisation for PNN. (Axes: 1st, 2nd and 3rd selected feature; classes: normal and faulty.)
sets [12], and with other work on GA-based MLPs [10]. However, it should be mentioned that the ANN-based approach has an inherent limitation: for a changed machine condition with a different load, retraining of the ANNs may be required [5].
Acknowledgements
The data set was acquired in the Delft "Machine diagnostics by neural network" project with help from TechnoFysica B.V., The Netherlands, and can be downloaded freely from http://www.ph.tn.tudelft.nl/~ypma/mechanical.html. The author thanks Dr. Alexander Ypma of Delft University of Technology for making the dataset available and providing useful clarifications.
Fig. 3. Scatter plots of principal components for features with different normalisation schemes: (a) magnitude, (b) statistical. (Axes: 1st, 2nd and 3rd principal component; classes: normal and faulty.)
Table 5
ANN performance with principal components (PCs)

Test success (%):

Number of first PCs    MLP      RBF      PNN
3                      65.28    63.89    65.97
4                      67.71    77.78    74.31
5                      54.51    57.64    75.69
6                      64.24    53.47    83.33
8                      54.86    50.00    85.42
References
[1] J. Shiroishi, Y. Li, S. Liang, T. Kurfess, S. Danyluk, Bearing condition diagnostics via vibration and acoustic emission measurements, Mechanical Systems and Signal Processing 11 (1997) 693–705.
[2] R.B. Randall (Guest Ed.), Special issue on gear and bearing diagnostics, Mechanical Systems and Signal Processing 15 (2001) 827–993.
[3] K.R. Al-Balushi, B. Samanta, Gear fault diagnosis using energy-based features of acoustic emission signals, Proceedings of the IMechE, Part I: Journal of Systems and Control Engineering 216 (2002) 249–263.
[4] A.C. McCormick, A.K. Nandi, Classification of the rotating machine condition using artificial neural networks, Proceedings of the IMechE, Part C: Journal of Mechanical Engineering Science 211 (1997) 439–450.
[5] B. Samanta, K.R. Al-Balushi, Artificial neural network based fault diagnostics of rolling element bearings using time-domain features, Mechanical Systems and Signal Processing 17 (2003) 317–328.
[6] P.D. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, 1995.
[7] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, NJ, 1999.
[8] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 109–118.
[9] L.B. Jack, A.K. Nandi, A.C. McCormick, Diagnosis of rolling element bearing faults using radial basis functions, Applied Signal Processing 6 (1999) 25–32.
[10] L.B. Jack, Applications of artificial intelligence in machine condition monitoring, Ph.D. Thesis, Department of Electrical Engineering and Electronics, The University of Liverpool, 2000.
[11] L.B. Jack, A.K. Nandi, Genetic algorithms for feature extraction in machine condition monitoring with vibration signals, IEE Proceedings – Vision, Image and Signal Processing 147 (2000) 205–212.
[12] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic algorithms, Mechanical Systems and Signal Processing 18 (2004) 625–644.
[13] A. Ypma, R. Ligteringen, R.P.W. Duin, E.E.E. Frietman, Pump vibration data sets, Pattern Recognition Group, Delft University of Technology, 1999.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, 1999.
[15] C.R. Houck, J. Joines, M.G. Kay, A genetic algorithm for function optimisation: a Matlab implementation, North Carolina State University, Report No. NCSU-IE-TR-95-09, 1995.
[16] I.T. Jolliffe, Principal Component Analysis, 2nd Edition, Springer, Berlin, 2002.
B. Samanta
Department of Mechanical and Industrial Engineering, College of Engineering, Sultan Qaboos University, P.O. Box 33, PC 123, Muscat, Sultanate of Oman
E-mail address: [email protected]