Mechanical Systems and Signal Processing 18 (2004) 1273–1282
www.elsevier.com/locate/jnlabr/ymssp

Letter to the editor

Artificial neural networks and genetic algorithms for gear fault detection
1. Introduction
Condition monitoring is gaining importance in industry because of the need to increase machine availability. The use of vibration and acoustic emission (AE) signals is quite common in the field of condition monitoring of rotating machinery [1–5], with potential applications of artificial neural networks (ANNs) in automated detection and diagnosis [2,4,5]. Multi-layer perceptrons (MLPs) and radial basis functions (RBFs) are the most commonly used ANNs [6,7], though interest in probabilistic neural networks (PNNs) is increasing in general [7,8] and in the area of machine condition monitoring [9,10]. Genetic algorithms (GAs) have been used to make the classification process faster and more accurate, using the minimum number of features that primarily characterise the system conditions, together with optimised structures or parameters of the ANNs [10,11]. In a recent work [12], results of MLPs with GAs were presented for fault detection of gears using only time-domain features of vibration signals. In that approach, the features were extracted from finite segments of two signals: one with normal condition and the other with defective gears. In the present work, comparisons are made between the performance of three different types of ANNs, both without and with automatic selection of features and classifier parameters, for the dataset of [12]. The roles of different vibration signals, obtained under both normal and light loads and at low and high sampling rates, are investigated. The results show the effectiveness of the features extracted from the acquired and preprocessed signals in diagnosing the machine condition. The procedure is illustrated using the vibration data of an experimental setup with normal and defective gears [13].
2. Vibration data and feature extraction
In [13], vibration signals measured from seven accelerometers on a pump set driven by an electrical motor through a two-stage gear reduction unit were presented. The sensors were placed near the driving shaft and the bearings supporting the gear-shafts. Four sets of measurements, with two levels of load (maximum and minimum) and at two sampling rates (3200 and 128,000 samples/s), were obtained. The sampling rates were selected well above the gear mesh frequency. The number of samples collected for each channel was 77,824, to cover a sufficient number of cycles.
0888-3270/$ - see front matter © 2004 Published by Elsevier Ltd.
doi:10.1016/j.ymssp.2003.11.003
The samples were divided into 38 segments of 2048 samples each, which were further processed [12] to extract nine features (1–9): mean, root mean square, variance, skewness, kurtosis and normalised fifth to ninth central moments. The effect of segment size on the variation of features between segments was studied, and the present segment size (2048 data points) was chosen on that basis. Similar features were extracted from the derivative and integral of the signals (10–27) and from low- and high-pass filtered signals (28–45) [12]. The procedure of feature extraction was repeated for two load conditions, two sampling rates (high and low) and two gear conditions (normal and defective), giving a total set of 45 × 266 × 2 × 2 × 2 features. The features were normalised by dividing each feature row by its absolute maximum value, keeping the inputs within ±1 for better speed and success of the network training. However, a scheme of statistical normalisation, with zero mean and a standard deviation of 1 for each feature set, was also attempted. Results comparing the effectiveness of the two normalisation schemes (magnitude and statistical) are discussed in Section 5.4.
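As an illustration, the segmentation, feature computation and the two normalisation schemes can be sketched as below. The paper's own processing was carried out in Matlab [12]; this Python version is only a minimal sketch under those descriptions, and the function names are ours.

```python
import numpy as np
from scipy.stats import moment

def segment_features(x):
    """Time-domain features of one segment: mean, RMS, variance,
    and the normalised 3rd-9th central moments (skewness,
    kurtosis and the 5th-9th moments)."""
    sigma = np.std(x)
    feats = [np.mean(x), np.sqrt(np.mean(x ** 2)), np.var(x)]
    for n in range(3, 10):
        # normalised nth central moment: E[(x - mean)^n] / sigma^n
        feats.append(moment(x, moment=n) / sigma ** n)
    return np.array(feats)

def normalise(F, scheme="magnitude"):
    """Magnitude scheme: divide each feature column by its absolute
    maximum, keeping all inputs within +/-1. Statistical scheme:
    zero mean and unit standard deviation per feature.
    F has shape (n_segments, n_features)."""
    if scheme == "magnitude":
        return F / np.abs(F).max(axis=0)
    return (F - F.mean(axis=0)) / F.std(axis=0)

# Example: one channel of 77,824 samples split into 38 segments of 2048.
signal = np.random.randn(77824)  # placeholder for a measured channel
segments = signal[:38 * 2048].reshape(38, 2048)
F = normalise(np.vstack([segment_features(s) for s in segments]))
```

The derivative/integral features (10–27) and the filtered-signal features (28–45) would be obtained by applying the same routine to the correspondingly preprocessed signals.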
3. Artificial neural networks
There are numerous applications of ANNs in data analysis, pattern recognition and control [6,7]. Among the different types of ANNs, three, namely the MLP, RBF and PNN, are considered in this work. Here a brief introduction to these ANNs is given; readers are referred to the texts [6,7] for details.
3.1. Multi-layer perceptron
The feed-forward MLP neural network used in this work consisted of three layers: input, hidden and output. The input layer had nodes representing the normalised features extracted from the measured vibration signals. The number of input nodes was varied from 3 to 45 and the number of output nodes was 2. The number of hidden nodes was varied between 10 and 30, similar to [12].
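A minimal sketch of such a network is given below, using scikit-learn's MLPClassifier as a stand-in for the paper's Matlab implementation and randomly generated placeholder data. Note that scikit-learn uses a single output unit for binary problems rather than the two complementary output nodes described here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(280, 9))      # placeholder normalised features
y_train = rng.integers(0, 2, size=280)   # 0 = normal, 1 = faulty gear

# One hidden layer of 24 neurons, the value used for the "straight"
# MLPs in Section 5.1.
mlp = MLPClassifier(hidden_layer_sizes=(24,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("training success: %.2f%%" % (100 * mlp.score(X_train, y_train)))
```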
3.2. Radial basis function networks
The structure of an RBF network is similar to that of an MLP. The activation function of the hidden layer is the Gaussian spheroid function:
$y(x) = e^{-\|x - c\|^{2}/2\sigma^{2}}. \qquad (1)$
The output of the hidden neuron gives a measure of the distance between the input vector $x$ and the centroid $c$ of the data cluster. The parameter $\sigma$ represents the radius of the hypersphere. This parameter is generally determined through an iterative process, selecting an optimum width on the basis of the full data sets; however, in the present work the width is selected along with the relevant input features using the GA-based approach. The RBFs were created, trained and tested using Matlab, through a simple iterative algorithm that adds neurons to the hidden layer until the performance goal is reached.
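A one-line implementation of the activation of Eq. (1) is sketched below; the variable names follow the equation and the example values are arbitrary.

```python
import numpy as np

def rbf_activation(x, c, sigma):
    """Gaussian spheroid function of Eq. (1):
    y(x) = exp(-||x - c||^2 / (2 sigma^2)),
    where c is the cluster centroid and sigma the hypersphere radius."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(c)) ** 2)
                  / (2.0 * sigma ** 2))

# The activation decays with distance from the centroid:
print(rbf_activation([0.2, 0.4, 0.9], [0.2, 0.4, 0.9], sigma=0.5))  # 1.0
print(rbf_activation([1.0, 0.0, 0.0], [0.2, 0.4, 0.9], sigma=0.5))  # ~0.04
```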
3.3. Probabilistic neural networks
The structure of a PNN is similar to that of an RBF, both having a Gaussian spheroid activation function in the first of the two layers. The linear output layer of the RBF is replaced with a competitive layer in the PNN, which allows only one neuron to fire, with all others in the layer returning zero. The major drawback of PNNs is the computational cost of the potentially large hidden layer, which can be as large as the number of training vectors. The PNN can act as a Bayesian classifier, approximating the probability density function (PDF) of a class using Parzen windows [6]. The generalised expression for the Parzen-approximated PDF at a given point $x$ in feature space is:
$f_A(x) = \dfrac{1}{(2\pi)^{p/2}\,\sigma^{p} N_A} \displaystyle\sum_{i=1}^{N_A} e^{-\|x - c_i\|^{2}/2\sigma^{2}}, \qquad (2)$
where $p$ is the dimensionality of the feature vector, $N_A$ is the number of examples of class $A$ used for training the network, and $c_i$ denotes the $i$th training vector of class $A$. The parameter $\sigma$ represents the spread of the Gaussian function and has significant effects on the generalisation of a PNN.

One of the problems with the PNN is handling skewed training data, where the data from one class significantly outnumber those from the other class. Skewed data are more likely in a real environment, as the amount of data for the normal machine condition would, in general, be much larger than for the machine fault conditions. A basic assumption of the PNN approach concerns the so-called prior probabilities: the proportional representation of the classes in the training data should match, to some degree, the actual representation in the population being modelled [6,8]. If the prior probability differs from the level of representation in the training cases, the accuracy of classification is reduced. To compensate for this mismatch, the a priori probabilities can be given as input to the network, and the class weightings are adjusted accordingly at the binary output nodes of the PNN [6,8]. If the a priori probabilities are not known, the training data set should be large enough for the PDF estimators to asymptotically approach the underlying probability density. The skewed training set problem also affects MLPs.

In the present work, the data sets have equal numbers of samples from normal and faulty gear
conditions. The PNNs were created, trained and tested using Matlab. The width parameter is generally determined through an iterative process, selecting an optimum value on the basis of the full data sets; however, in the present work the width is selected along with the relevant input features using the GA-based approach, as for the RBFs.
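The following sketch illustrates the two PNN layers described above, namely the Parzen estimate of Eq. (2) for each class and the competitive output with optional prior weighting. It is an illustrative reconstruction, not the paper's Matlab code, and the sample data are placeholders.

```python
import numpy as np

def parzen_pdf(x, centres, sigma):
    """Parzen-window PDF estimate of Eq. (2) for one class:
    f_A(x) = sum_i exp(-||x - c_i||^2 / (2 sigma^2))
             / ((2 pi)^(p/2) * sigma^p * N_A)."""
    x, centres = np.asarray(x), np.asarray(centres)
    p, n_a = x.size, len(centres)
    d2 = np.sum((centres - x) ** 2, axis=1)
    norm = (2 * np.pi) ** (p / 2) * sigma ** p * n_a
    return np.exp(-d2 / (2 * sigma ** 2)).sum() / norm

def pnn_classify(x, train_by_class, sigma, priors=None):
    """Competitive output layer: the class with the largest
    (prior-weighted) PDF fires; all other outputs are zero."""
    priors = priors or {c: 1.0 for c in train_by_class}
    scores = {c: priors[c] * parzen_pdf(x, v, sigma)
              for c, v in train_by_class.items()}
    return max(scores, key=scores.get)

# Usage with equal class sizes, as in the present data sets:
rng = np.random.default_rng(1)
train = {"normal": rng.normal(0.0, 0.1, size=(140, 3)),
         "faulty": rng.normal(0.5, 0.1, size=(140, 3))}
print(pnn_classify([0.45, 0.5, 0.55], train, sigma=0.1))  # "faulty"
```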
4. Genetic algorithms
GAs have been considered with increasing interest in a wide variety of applications. These algorithms search the solution space through simulated evolution based on 'survival of the fittest', solving linear and non-linear problems through mutation, crossover and selection operations applied to individuals in a population [14]. The basic issues of GAs, in the context of the present work, are briefly discussed in this section.

A population size of 10 individuals was used, starting with randomly generated genomes. The GA
was used to select the most suitable features and one variable parameter related to the particular
classifier: the number of neurons in the hidden layer for the MLP, and the RBF kernel width ($\sigma$) for the RBF and PNN. For a training run needing $N$ different inputs to be selected from a set of $Q$ possible inputs, the genome string consists of $N + 1$ real numbers. The first $N$ entries $(x_i,\ i = 1, \ldots, N)$ in the genome, treated as integers, are constrained to lie in the range $1 \le x_i \le Q$:

$X = \{x_1, x_2, \ldots, x_N, x_{N+1}\}^{T}. \qquad (3)$

The last number, $x_{N+1}$, has to be within the range $S_{min} \le x_{N+1} \le S_{max}$. The parameters $S_{min}$ and $S_{max}$
represent, respectively, the lower and upper bounds on the classifier parameter. A probabilistic selection function, namely normalised geometric ranking [15], was used, such that better individuals, judged on the fitness criterion in the evaluation function, have a higher chance of being selected. A non-uniform mutation function [14], which mutates using a random number based on the current generation and the maximum generation number, among other parameters, was adopted. Heuristic crossover [14], producing a linear extrapolation of two individuals based on the fitness information, was chosen. The maximum number of generations was adopted as the termination criterion for the solution process. The classification success on the test data was used as the fitness criterion in the evaluation function.
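The genome of Eq. (3) and the fitness evaluation can be sketched as follows. The build_classifier hook and all names are ours, and the paper's actual operators (normalised geometric ranking, heuristic crossover, non-uniform mutation [14,15]) are only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q = 3, 45                 # features per genome, size of feature pool
S_MIN, S_MAX = 0.1, 1.0      # bounds on the classifier parameter
POP_SIZE = 10                # population size used in the paper

def random_genome():
    """Genome of Eq. (3): N feature indices in [1, Q] followed by one
    classifier parameter in [S_min, S_max] (hidden-layer size for the
    MLP, kernel width sigma for the RBF and PNN)."""
    idx = rng.integers(1, Q + 1, size=N).astype(float)
    return np.append(idx, rng.uniform(S_MIN, S_MAX))

def fitness(genome, X_tr, y_tr, X_te, y_te, build_classifier):
    """Fitness = test-set classification success. build_classifier is
    a hypothetical hook that trains one of the three ANNs on the
    selected feature columns with the genome's last entry."""
    cols = genome[:N].astype(int) - 1    # 1-based genes -> 0-based columns
    model = build_classifier(X_tr[:, cols], y_tr, genome[N])
    return np.mean(model.predict(X_te[:, cols]) == y_te)

population = [random_genome() for _ in range(POP_SIZE)]
# Selection (normalised geometric ranking), heuristic crossover and
# non-uniform mutation [14,15] would then evolve this population until
# the maximum number of generations is reached.
```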
5. Results and discussion
The data sets were split into training and test sets of size 45 × 140 × 2 × 2 × 2 and 45 × 126 × 2 × 2 × 2, respectively. No separate validation set was used because of the limited size of the available data. The number of output nodes was two for the MLPs and RBFs, and one for the PNNs. One output node for all classifiers would have been enough; however, the classification success was not satisfactory with one output node for the MLPs and RBFs on the present data sets, with the particular choice of network structure and activation functions. The target value of the first output node was set to 1 and 0 for normal and faulty gears, respectively, and the values were interchanged (0 and 1) for the second output node. For the PNNs, the target values were specified as 0 and 1, respectively, representing normal and faulty conditions. Results are presented to show the effects of accelerometer location and signal processing on the diagnosis of machine condition using ANNs without and with GA-based feature selection. The training success in each case was 100%.
5.1. Performance comparison of ANNs without and with feature selection
In this section, classification results are presented for the ANNs without and with GA-based feature selection. For each case of the straight ANNs, the number of neurons in the hidden layer was 24; for the straight RBFs and PNNs, the widths ($\sigma$) were kept constant at 0.50 and 0.10, respectively. These values were found on the basis of several training trials. In the GA-based approach, only three features were selected from the corresponding range of input features. For the MLPs, the number of neurons in the hidden layer was selected in the range 10–30, whereas for the RBFs and PNNs the Gaussian spread ($\sigma$) was selected in the range 0.1–1.0 with a step size of 0.1.
5.1.1. Effect of sensor location
Table 1 shows the classification results for each of the sensor locations. The first nine input features (1–9) were used in the straight ANNs. The test success improved substantially in each case with feature selection. The poor performance of the straight ANNs may be attributed to an insufficient feature set, without enough discrimination between normal and faulty conditions.
5.1.2. Effect of signal pre-processing
Table 2 shows the effects of signal processing on the classification results with the first four signals (1–4) and features from the corresponding ranges. Test success improved with the GA, in some cases to 100%.
5.1.3. Effect of load and sampling rate
Table 3 shows the effects of load and signal sampling rate on the classification results using the full feature range (1–45) for the first four signals (1–4). For most cases with the GA, test success improved to 100%.
Table 1
Performance comparison of classifiers without and with feature selection for different sensor locations

Test success (%), straight / with GA:

Data set    MLP              RBF              PNN
Signal 1    88.89 / 100      50.00 / 100      91.67 / 100
Signal 2    95.83 / 100      50.00 / 100      83.33 / 100
Signal 3    100 / 100        55.56 / 100      94.44 / 100
Signal 4    87.50 / 94.44    91.67 / 100      80.56 / 94.44
Signal 5    48.61 / 100      63.89 / 100      55.56 / 100
Signal 6    56.94 / 98.61    91.67 / 94.44    63.89 / 98.61
Signal 7    87.50 / 100      55.56 / 100      77.78 / 100
Table 2
Performance comparison of classifiers without and with feature selection for different signal preprocessing

Test success (%), straight / with GA:

Data set                    MLP              RBF              PNN
Signals 1–4                 96.53 / 100      97.92 / 99.31    98.61 / 98.61
Derivative/integral         97.92 / 100      97.92 / 95.14    97.92 / 97.92
High-/low-pass filtering    94.44 / 100      97.92 / 100      99.31 / 100
5.2. Effect of number of selected features
Table 4 shows the test classification success of the ANNs with the number of selected features varying from 3 to 6 (out of 45) for the first four signals at low load and low sampling rate. The test success was almost 100% for all three classifiers.
5.3. Performance of PNNs with selection of 6 features
The performance of PNNs with 6 features selected from the corresponding ranges was studied. The test success was 100% for all cases except one. The computation time (on a PC with a Pentium III processor at 533 MHz and 64 MB RAM) for training the PNNs was also noted for each case. These times (39.397–61.468 s) were not much different from those of the PNNs with three features (36.983–56.872 s), but higher than the straight cases (0.250–1.272 s). They were substantially lower than for the RBFs and MLPs; however, a direct comparison among the ANNs is not made because of differences in code efficiency.
5.4. Results using statistical normalisation
The data sets discussed in the previous sections were normalised in magnitude to keep the features within ±1. The GA-based selection procedure of Section 5.2 was repeated for the PNNs using statistically normalised features. The test classification success was almost 100% for both schemes
Table 3
Performance comparison of classifiers without and with feature selection for different loads and sampling rates

Test success (%), straight / with GA:

Load    Sampling rate    MLP              RBF              PNN
Max     Low              100 / 100        97.92 / 97.92    99.31 / 100
Min     Low              96.88 / 100      94.44 / 100      100 / 100
Max     High             80.21 / 100      86.81 / 99.31    96.53 / 100
Min     High             97.57 / 100      97.92 / 100      99.31 / 100
Table 4
ANN performance for different numbers of selected features

Test success (%):

Number of selected features    MLP      RBF    PNN
3                              100      100    100
4                              99.65    100    100
5                              100      100    100
6                              100      100    100
of normalisation. The training time increased somewhat with a higher number of features, but not in direct proportion.
5.5. Separability of data sets
To investigate the separability of the data sets with and without gear fault, the three features selected by the GA for the MLP, RBF and PNN are shown in Figs. 1(a)–(c), respectively, for magnitude-normalised features. In all three cases, the data clusters are quite well separated, with only a small amount of overlap; the separation is best for the PNN. This can explain the 100%
Fig. 1. Scatter plots of GA selected features with magnitude normalisation: (a) MLP, (b) RBF, (c) PNN. (Axes: 1st, 2nd and 3rd selected feature; classes: normal and faulty.)
classification success, even with only three features, for all three classifiers. Fig. 2 shows the scatter plot of three statistically normalised features selected by the GA for the PNN. This also shows good separation of the data clusters, explaining the 100% classification success.
5.6. Comparison with another technique
Principal component analysis (PCA) is used to reduce the dimensionality of data by forming a new set of variables, known as principal components (PCs), that represent the maximal variability in the data with minimal loss of information [16]. Figs. 3(a) and (b) show plots of the first three PCs for the magnitude- and statistically normalised feature sets, respectively. These PCs account for more than 60% of the variability of the feature sets. The separation between the data clusters of the two classes is not very prominent. The classification success using the first three to eight PCs of the magnitude-normalised data is presented in Table 5. The results are very unsatisfactory: 54.86–67.71% for the MLPs, 50.00–77.78% for the RBFs and 65.97–85.42% for the PNNs, compared with the GA-based feature selection procedure, which gave almost 100% classification success for all three classifiers. This shows the superiority of the present approach of GA-based feature selection over using the PCs.
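For reference, extracting the first three PCs is straightforward, for example with scikit-learn; the feature matrix below is a random placeholder standing in for the 45 normalised features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
F = rng.normal(size=(280, 45))   # placeholder normalised feature matrix

pca = PCA(n_components=3)
pcs = pca.fit_transform(F)       # first three principal components
print("variability captured: %.1f%%"
      % (100 * pca.explained_variance_ratio_.sum()))
```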
To summarise, the classification success of the MLPs and PNNs without feature selection was comparable, and better than that of the RBFs, for most of the cases considered, but almost all were substantially below 100%. The use of GAs with only three features gave almost 100% classification for most of the test cases, even with different load conditions and sampling rates, for all the ANNs. The use of six selected features with PNNs gave almost 100% test success for the cases considered. The training time with feature selection was reasonable for the PNNs and substantially lower than for the RBFs and MLPs, with and without feature selection. The classification success of the ANNs compares well with that of support vector machines for the same data and feature
Fig. 2. Scatter plot of GA selected features with statistical normalisation for PNN. (Axes: 1st, 2nd and 3rd selected feature; classes: normal and faulty.)
sets [12], and with other work on GA-based MLPs [10]. However, it should be mentioned that the ANN-based approach has an inherent limitation: for a changed machine condition with a different load, retraining of the ANNs may be required [5].
Acknowledgements
The data set was acquired in the Delft "Machine diagnostics by neural network" project with help from TechnoFysica B.V., The Netherlands, and can be downloaded freely from http://www.ph.tn.tudelft.nl/~ypma/mechanical.html. The author thanks Dr. Alexander Ypma of Delft University of Technology for making the dataset available and providing useful clarifications.
Fig. 3. Scatter plots of principal components for features with different normalisation schemes: (a) magnitude, (b) statistical. (Axes: 1st, 2nd and 3rd principal component; classes: normal and faulty.)
Table 5
ANN performance with principal components (PCs)

Test success (%):

Number of first PCs    MLP      RBF      PNN
3                      65.28    63.89    65.97
4                      67.71    77.78    74.31
5                      54.51    57.64    75.69
6                      64.24    53.47    83.33
8                      54.86    50.00    85.42
References
[1] J. Shiroishi, Y. Li, S. Liang, T. Kurfess, S. Danyluk, Bearing condition diagnostics via vibration and acoustic emission measurements, Mechanical Systems and Signal Processing 11 (1997) 693–705.
[2] R.B. Randall (Guest Ed.), Special issue on gear and bearing diagnostics, Mechanical Systems and Signal Processing 15 (2001) 827–993.
[3] K.R. Al-Balushi, B. Samanta, Gear fault diagnosis using energy-based features of acoustic emission signals, Proceedings of the IMechE, Part I: Journal of Systems and Control Engineering 216 (2002) 249–263.
[4] A.C. McCormick, A.K. Nandi, Classification of the rotating machine condition using artificial neural networks, Proceedings of the IMechE, Part C: Journal of Mechanical Engineering Science 211 (1997) 439–450.
[5] B. Samanta, K.R. Al-Balushi, Artificial neural network based fault diagnostics of rolling element bearings using time-domain features, Mechanical Systems and Signal Processing 17 (2003) 317–328.
[6] P.D. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, 1995.
[7] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, NJ, 1999.
[8] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 109–118.
[9] L.B. Jack, A.K. Nandi, A.C. McCormick, Diagnosis of rolling element bearing faults using radial basis functions, Applied Signal Processing 6 (1999) 25–32.
[10] L.B. Jack, Applications of artificial intelligence in machine condition monitoring, Ph.D. Thesis, Department of Electrical Engineering and Electronics, The University of Liverpool, 2000.
[11] L.B. Jack, A.K. Nandi, Genetic algorithms for feature extraction in machine condition monitoring with vibration signals, IEE Proceedings – Vision, Image and Signal Processing 147 (2000) 205–212.
[12] B. Samanta, Gear fault detection using artificial neural networks and support vector machines with genetic algorithms, Mechanical Systems and Signal Processing 18 (2004) 625–644.
[13] A. Ypma, R. Ligteringen, R.P.W. Duin, E.E.E. Frietman, Pump vibration data sets, Pattern Recognition Group, Delft University of Technology, 1999.
[14] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, 1999.
[15] C.R. Houck, J. Joines, M.G. Kay, A genetic algorithm for function optimisation: a Matlab implementation, North Carolina State University, Report No. NCSU-IE-TR-95-09, 1995.
[16] I.T. Jolliffe, Principal Component Analysis, 2nd Edition, Springer, Berlin, 2002.
B. Samanta
Department of Mechanical and Industrial Engineering, College of Engineering, Sultan Qaboos University, P.O. Box 33, PC 123, Muscat, Sultanate of Oman
E-mail address: [email protected]