Early Fault Detection Scheme and Optimized CBM Strategy for Gear Transmission System Operating under
Varying Loads
by
Ming Yang
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Mechanical and Industrial Engineering University of Toronto
© Copyright by Ming Yang (2011)
Early Fault Detection Scheme and Optimized CBM Strategy for
Gear Transmission System Operating under Varying Loads
Ming Yang
Doctor of Philosophy
Graduate Department of Mechanical and Industrial Engineering University of Toronto
2011
Abstract
The development of fault detection schemes and optimized condition-based maintenance (CBM)
strategies for gear transmission systems has received considerable attention in recent years.
Most existing models assume that the gear system operates under a constant load, which
implies that changes in the condition monitoring data are caused only by deterioration of
the gear system. However, most real gear systems operate under varying loads and speeds,
which affect the condition monitoring data signature of the system. This typically makes it
difficult to recognize the occurrence of an impending fault and to optimize a CBM strategy.
This thesis first presents a novel approach to detecting and localizing gear faults in a
gear system operating under varying load conditions. An autoregressive model with exogenous
variables (ARX) is fitted to the time-synchronously averaged (TSA) condition monitoring data
collected while the gear transmission system operated in a good state under various load
conditions. The fault detection and localization indicator is calculated by applying an F-test
to the residual signals of the ARX model. Then, the gear deterioration process is modeled as a
three-state continuous-time hidden Markov model with partial information, and the model
parameters are estimated from the ARX model residuals by maximizing the pseudo likelihood
function with the EM algorithm.
Finally, a multivariate Bayesian process control approach is developed to optimize the decision
variables of the CBM strategy: the time interval until the next data collection and the control
limit for the posterior probability calculated at the next data collection epoch. The decision
variables are calculated by minimizing the total expected cost from the current stage to the end
of the production run. Dynamic programming is applied to update the decision variables after new
monitoring data are acquired, and the maintenance decision is made by taking all available
information into account.
Acknowledgments
I would like to take this opportunity to thank everyone who has helped me during the
development of this thesis.
I am very grateful to my supervisor Dr. Viliam Makis. It is his vast knowledge, considerable
enthusiasm and great patience that guided me throughout the development of the research. His
careful reviews of my manuscript helped me a lot in this thesis development. I also enjoyed and
appreciated the thoughtful insights and advice from the other members of my PhD committee Dr.
Jean Zu and Dr. Roy H. Kwon, and I would like to thank them for their reviews of my thesis.
Also, I would like to acknowledge Syncrude Canada Ltd., NSERC, and the University of
Toronto Fellowship for providing financial support for my research. In addition, I would like to
thank all the people currently and formerly at the Department of Mechanical and Industrial
Engineering for providing a great social and professional environment.
I would also like to express my appreciation to all the members of Quality, Reliability and
Maintenance Laboratory, and Vibration Monitoring, Signal Processing and CBM Laboratory,
especially Lawrence Yip, Jing Yu, Rui Jiang, ZhiJian Yin, and Michael J. Kim for their help and
friendship.
Finally, I am grateful to my parents for their constant support and encouragement in spite of the
distance, and to my beautiful wife who has stood by my side and always supported me in all of
my endeavors.
Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Outline
1.3 Thesis Contributions
Chapter 2 ARX Model-Based Gearbox Fault Detection and Localization Operating under Varying Load
2.1 Introduction
2.2 Experimental set-up and data collection
2.3 Time-Synchronous Averaging Algorithm and Filtered Envelope Extraction
2.4 ARX Model using Gear Vibration Data under Varying Load
2.4.1 ARX Model based on the Modified TSA Signal and Filtered TSA Envelope
2.4.2 ARX Model Order Identification and Parameters Estimation
2.4.3 Numerical Results
2.5 Hypothesis testing
2.6 Conclusions
Chapter 3 POHM Model for Monitoring Gear Deterioration Condition
3.1 Introduction
3.2 Model Formulation of Gear Deterioration Process
3.2.1 Model Assumptions
3.2.2 Continuous-Time Markov Model for Gear Deterioration Process
3.2.3 Stochastic Relationship between Residual Observation and Operational States
3.2.4 POHM model
3.3 Parameter Estimation Using EM Algorithm
3.3.1 Random Breakdown Time of a Gear Deterioration Process
3.3.2 Sojourn Time of the Deterioration Process in Good State
3.3.3 Parameter Estimation using the EM Algorithm
3.4 Numerical Results
3.4.1 Residual Observations Extraction
3.4.2 POHM Parameters Obtained by Expert Judgment
3.4.3 Estimated POHM Parameters using the EM Iteration Algorithm
3.5 Conclusions
Chapter 4 Bayesian Process Control to Optimize Gearbox CBM Policy
4.1 Introduction
4.2 Problem Statement and Assumptions
4.3 The Bayesian model
4.3.1 Posterior Probability of Gear Deteriorating Process in Warning State
4.3.2 Distribution of $z_{i+j}$
4.3.3 Control Limit of Gear Deteriorating Process in Good State
4.4 Cost Function for Gear Deterioration Process
4.4.1 Cost Function before the End of the Production Run
4.4.2 Cost Function at the End of the Production Run
4.5 Dynamic Programming to Optimize the Control Limit and the Data Collecting Intervals
4.6 Numerical Results
4.7 Conclusions
Chapter 5 Conclusions and Future Research
5.1 Conclusions
5.2 Future Research
5.2.1 Application of the Bayesian Process Control Model to Planetary Gear System
5.2.2 Relaxation of Model Assumptions
Bibliography
List of Tables
Table 3.1 EM Algorithm Iteration Results: Mean Vectors $\mu_1$ and $\mu_2$
Table 3.2 EM Algorithm Iteration Results: Covariance Matrix $\Sigma_1$
Table 3.3 EM Algorithm Iteration Results: Covariance Matrix $\Sigma_2$
Table 3.4 EM Algorithm Iteration Results: States Transition Parameters $\Lambda$
List of Figures
Figure 1.1 Schematic Representation of Parallel Gear Transmission System
Figure 2.1 Mechanical Diagnostic Test Rig
Figure 2.2 Locations of Accelerometers
Figure 2.3 (a) Gearbox Output Torque and (b) Gearbox Input Speed
Figure 2.4 Broken Teeth in Test Run #14
Figure 2.5 Vibration Data: (a) Healthy State at 300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque
Figure 2.6 TSA Signals (Dotted Line) and Filtered TSA Envelope (Solid Line): (a) Healthy State at 300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque
Figure 2.7 TSA Data for ARX Model Building: (a) Filtered TSA Envelopes used as an ARX Model Input, (b) TSA Vibration Data used as an ARX Model Output
Figure 2.8 ARX Model Order Selection Using BIC Criterion
Figure 2.9 Autocorrelation Function for ARX Model Residuals
Figure 2.10 ARX Residuals: (a) Healthy State at 300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque
Figure 2.11 F-test of ARX Model Residuals (1 Indicates Rejection of $H_0$)
Figure 3.1 Transition Paths from Good State to Breakdown State
Figure 3.2 (a), (b) Residual Standard Deviations of A02 and A03, and (c) Motor Current Mean
Figure 3.3 (a) and (b) Residuals of Polynomial Model for A02 and A03
Figure 3.4 Residual Observations Collected by Accelerometers A02 and A03
Figure 3.5 EM Algorithm Estimated Mean Vectors $\mu_1$ and $\mu_2$
Figure 3.6 EM Algorithm Estimated Covariance Parameters $\Sigma_1$ and $\Sigma_2$
Figure 3.7 EM Algorithm Estimated States Transition Parameters $\Lambda$
Figure 4.1 Total expected cost at each production stage with breakdown cost 150
Figure 4.2 Optimal data collection intervals and control limits with breakdown cost 150
Figure 4.3 Total expected cost at each production stage with breakdown cost 400
Figure 4.4 Optimal data collection intervals and control limits with breakdown cost 400
Figure 5.1 Schematic Representation of Planetary Gear Transmission System
Figure 5.2 Lab Planetary Gear Train Test Rig
List of Appendices
Appendix A: ARX model
Appendix B: F-test
Chapter 1 Introduction
1.1 Motivation
Gears are among the most efficient and compact mechanical devices for smoothly transmitting torque
and changing the angular velocity from one shaft to another via gear tooth contact (Townsend [16]).
The schematic of a gear transmission system is shown in Figure 1.1; it consists of a pinion
gear (the smaller gear) and a driven gear (the larger gear). The pinion gear has fewer teeth than
the driven gear. Normally, a gear transmission system is designed to reduce the angular velocity
in order to increase the output torque. In such a speed-reduction gear transmission system, the
pinion is connected to the input shaft, and the driven gear is connected to the output shaft.
Figure 1.1 Schematic Representation of Parallel Gear Transmission System
Compared with other mechanical transmission systems, gear systems have a very compact
structure due to the short centre distance between the two gears, and can achieve a high reduction
ratio and a constant velocity ratio. Hence, gear transmission systems can be found in nearly
every rotating machine. Since the gear transmission system is a key mechanical component,
its failure directly impacts the overall system. To improve gear
transmission system reliability and reduce unplanned failures, it is necessary to apply condition
monitoring and to implement condition-based maintenance (CBM), where CBM is a
maintenance program that recommends maintenance decisions based on the information
collected through condition monitoring.
In industry, most gears are sealed in a gearbox to reduce the noise due to gear tooth meshing,
to avoid the risk of injury due to the high rotation speed, and to prevent leakage of the lubricating
oil that protects the meshing faces of the gears. However, sealing the gears makes it
impossible to directly monitor and inspect their state without stopping the system. Davis [31]
indicated that gear transmission systems can fail in many different ways, and that there is often no
indication of difficulty until total failure occurs, except for an increase in noise level and
vibration. Collecting and analyzing vibration data has been very popular for early fault detection,
condition monitoring, and CBM of gear transmission systems. The development of fault
detection and maintenance schemes for such gear systems has received considerable attention in
recent years, for example in Badaoui et al. [35], Hambaba et al. [3], Hambaba et al. [4], Lin et al. [12],
Miao & Makis [42], Qu & Lin [33], Tan & Mba [9], Toutountzakis et al. [50], Wang et al. [54],
Wang et al. [57], and Liu & Makis [6], mostly under the assumption of constant load. However, in real
production situations, most gear transmission systems operate under varying load, which affects
the vibration signature of the system and in general makes it difficult to recognize the occurrence
of an incipient fault. Only a few studies dealing with this problem have been found, such as
Zhan et al. [63], Zhan & Makis [62], and Wang et al. [59].
1.2 Thesis Outline
In this study, we focus on the development of an early fault detection scheme and an optimized
CBM strategy based on the condition monitoring data collected from a gear transmission system
operating under varying load conditions. Chapter 2 presents an approach based on an autoregressive
model with exogenous variables (ARX) to extract the fault-related signature from the condition
monitoring data. An F-test is then applied to detect and localize early gear failure. In Chapter 3,
a partially observable hidden Markov (POHM) model is used to describe the deterioration
process of the gear transmission system and the stochastic relationship between the condition
monitoring data and the gear deterioration. The POHM model parameters are estimated from the
ARX model residuals using the expectation-maximization (EM) algorithm. Chapter 4 deals with
the optimal choice of the epochs of monitoring data collection and of the stopping policy for
performing inspection and possible preventive maintenance, based on the POHM model and the ARX
model residuals. The conclusions and future research are summarized in Chapter 5.
Chapter 2
This chapter presents a novel approach to detecting and localizing gear failure in a gear
transmission system operating under varying load conditions. The data used in this research
were collected from the Mechanical Diagnostics Test Bed, which was designed to provide
experimental vibration data for research on diagnostics of a commercial transmission, at the
Applied Research Laboratory of the Pennsylvania State University. The gear transmission
system was driven at a constant input speed by a drive motor, and the variable torque was
applied by an absorption motor. In this test, the gearbox contained a 70-tooth driven gear and a
21-tooth pinion gear. The full-lifetime condition monitoring data were collected from the
experimental system operating from a new condition to breakdown under varying load. The data
files were separated into two groups: the first group was used to build the model, and the second
group was used to detect the gearbox failure.
Vibration data obtained from the gearbox represent a complex combination of information plus
background noise. First, a time-synchronous averaging (TSA) algorithm is
applied to average out the contributions of both the noise and the signals whose periods are not equal
to the shaft rotation period of the gear of interest in the gearbox. Filtered TSA envelopes are then
extracted from the TSA vibration data for each data file. Using the TSA signals as the output and
the filtered TSA envelopes as the input variable, an ARX model is identified based on the Bayesian
information criterion (BIC), and the model parameters are estimated using the least squares
algorithm. The residual signal of the ARX model represents the deviation from the average
tooth-meshing vibration in a healthy state and usually shows evidence of faults earlier and more
clearly than the TSA signal. Since the majority of the energy in the healthy state of the target gear is
concentrated at the meshing fundamental and its harmonics, the ARX model residuals are much
less sensitive to load variation and shaft imbalance than the TSA signal.
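
The identification step described above can be illustrated with a minimal Python sketch. It assumes a single-input ARX structure in which the current output depends on past outputs and past inputs, ordinary least squares for the coefficients, and one common form of the BIC for order selection. The function names, the synthetic data, and the candidate order range are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def fit_arx(y, u, na, nb, start=None):
    """Ordinary least-squares fit of an assumed ARX(na, nb) structure:
    y[t] = a_1*y[t-1] + ... + a_na*y[t-na] + b_1*u[t-1] + ... + b_nb*u[t-nb] + e[t]
    Returns the coefficient vector (a_1..a_na, b_1..b_nb) and the residuals.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if start is None:
        start = max(na, nb)
    # Each regressor row holds the most recent outputs and inputs, newest first.
    rows = [np.concatenate((y[t - na:t][::-1], u[t - nb:t][::-1]))
            for t in range(start, len(y))]
    X = np.asarray(rows)
    target = y[start:]
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta, target - X @ theta

def bic(residuals, n_params):
    """One common form of the BIC for Gaussian residuals (lower is better)."""
    n = len(residuals)
    return n * np.log(np.mean(residuals ** 2)) + n_params * np.log(n)

def select_order(y, u, max_order):
    """Grid search over (na, nb); every candidate is scored on the same samples."""
    best = None
    for na in range(1, max_order + 1):
        for nb in range(1, max_order + 1):
            _, res = fit_arx(y, u, na, nb, start=max_order)
            score = bic(res, na + nb)
            if best is None or score < best[0]:
                best = (score, na, nb)
    return best[1], best[2]

# Synthetic stand-in data (not the thesis data): a known, stable ARX(2, 1) process.
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.5 * u[t - 1] + 0.1 * rng.standard_normal()

theta, residuals = fit_arx(y, u, na=2, nb=1)
print(theta)  # estimates of (a_1, a_2, b_1)
```

In this sketch the residuals returned by `fit_arx` play the role of the ARX model residuals: once the model is fitted on healthy-state data, the same coefficients would be applied to later data files and the resulting prediction errors examined.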
The variance of the ARX model residuals is further investigated in this research. A statistical
hypothesis test (F-test) is conducted to test the equality of the population variance and the sample
variance of the ARX model residuals for each data file. The p-value of the F-test is used as an
indicator of fault advancement over the full lifetime of a gear transmission system, which reduces
the complexity of decision making based on the gear motion residuals. The F-test results
indicate not only when the gear failure occurs, but also where the gear failure is localized.
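
A hedged sketch of such a variance test follows. It assumes a two-sample, one-sided F-test that compares the residual variance of a new data file against the residual variance of healthy-state baseline data, with approximately independent Gaussian residuals. The exact test statistic used in the thesis (given in its Appendix B) may differ, and the function names and the significance level `alpha` are hypothetical.

```python
import numpy as np
from scipy import stats

def variance_f_test(baseline_residuals, new_residuals):
    """One-sided F-test: is the new residual variance larger than the baseline's?"""
    base = np.asarray(baseline_residuals, dtype=float)
    new = np.asarray(new_residuals, dtype=float)
    f_stat = np.var(new, ddof=1) / np.var(base, ddof=1)
    # Survival function gives P(F >= f_stat) under H0 of equal variances.
    p_value = stats.f.sf(f_stat, new.size - 1, base.size - 1)
    return f_stat, p_value

def fault_indicator(baseline_residuals, new_residuals, alpha=0.01):
    """Return 1 if H0 (equal variances) is rejected at level alpha, else 0."""
    _, p_value = variance_f_test(baseline_residuals, new_residuals)
    return int(p_value < alpha)
```

Applying `fault_indicator` to each data file in sequence would give a 0/1 series over the lifetime of the gearbox, where a sustained run of ones suggests that the residual variance has departed from its healthy-state level.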
Chapter 3
In Chapter 3, the deterioration process of the gear transmission system is modeled as a
homogeneous continuous-time Markov chain with three states: good state $s_1$, warning state $s_2$,
and breakdown state $s_3$. States $s_1$ and $s_2$ are classified as operational states, and $s_3$ is a
non-operational state. State $s_1$ represents either a new gear or a gear with minor defects, and
it is not necessary to take any maintenance action in this state. State $s_2$ is worse than $s_1$, and
preventive maintenance action must be taken immediately to avoid a major breakdown. State $s_3$
is the breakdown state and can be observed directly.
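
As an illustration of this three-state structure, the sketch below builds a generator matrix and computes transition probabilities over an interval with the matrix exponential. It assumes that deterioration is irreversible, that the breakdown state is absorbing, and that a direct transition from the good state to breakdown is possible. The rate values used in the usage line are hypothetical and are not the estimates reported in the thesis.

```python
import numpy as np
from scipy.linalg import expm

def generator_matrix(rate_12, rate_13, rate_23):
    """Generator Q for states (good, warning, breakdown); each row sums to zero."""
    return np.array([
        [-(rate_12 + rate_13), rate_12, rate_13],
        [0.0, -rate_23, rate_23],
        [0.0, 0.0, 0.0],  # breakdown is absorbing
    ])

def transition_matrix(Q, t):
    """P(t) = exp(Q t) for a homogeneous continuous-time Markov chain."""
    return expm(Q * t)

# Hypothetical rates, per unit time.
P = transition_matrix(generator_matrix(0.02, 0.005, 0.1), t=10.0)
print(P)
```

Under these assumptions the sojourn time in the good state is exponential with rate `rate_12 + rate_13`, so the top-left entry of `P` equals `exp(-(rate_12 + rate_13) * t)`.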
If the breakdown costs of the gear transmission system are higher than the preventive maintenance
costs, as is typically the case in real applications, the warning state becomes a critical state in the
gear deterioration process. It is important to determine when the gear deterioration process transits
into the warning state, so that preventive maintenance can be applied to avoid the considerable
breakdown cost. However, the warning state is unobservable, so the residual information extracted
from the condition monitoring data is used to estimate the epoch at which the gear transits into the
warning state. The stochastic relationship between the actual gear state and the residual information
is described by a multivariate normal distribution that depends on the actual system state.
A partially observable hidden Markov (POHM) model can be applied to a system whose
internal states are not known and can only be inferred from external observations. Because
maximum likelihood estimation of the POHM model parameters is difficult to apply, a pseudo
likelihood function is introduced to estimate the model parameters. The model parameters are
obtained by maximizing the pseudo likelihood function using the expectation-maximization (EM)
iterative algorithm. The POHM model parameters are calibrated using the real condition
monitoring data collected from the Mechanical Diagnostics Test Bed. The model plays an
important role in establishing the Bayesian process control model used to
optimize the data collection schedule and CBM strategy in Chapter 4.
Chapter 4
In this chapter, a multivariate Bayesian control model is presented to optimize the CBM
strategy for the gear transmission system. First, we assume that the gear transmission system
production run is finite, so the system deterioration process is considered a finite process.
Once the end of the production run is reached, the system will be stopped and an inspection will
be performed regardless of the maintenance decision made based on the condition monitoring data.
Before the system breakdown state $s_3$ is observed, there are two states in the system
deterioration process: good state $s_1$ and warning state $s_2$. Right after new data are collected, the
posterior probability that the process has transited into the warning state is calculated. If the posterior
probability is greater than a pre-calculated control limit, the system will be stopped for
inspection. On the other hand, if the posterior probability is less than the control limit, this posterior
probability will be used to compute the epoch of the next data collection and the control limit for
making an inspection decision after the next data collection.
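
The posterior update described above can be sketched as a prediction step followed by Bayes' theorem. The sketch assumes that the system has not broken down before the new observation, that observations in the good and warning states follow multivariate normal distributions with known parameters, and that the transition probabilities over the sampling interval are supplied by the caller. All function and parameter names are illustrative assumptions.

```python
import numpy as np

def mvn_pdf(z, mean, cov):
    """Density of a multivariate normal distribution at the point z."""
    d = np.asarray(z, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    quad = d @ np.linalg.solve(cov, d)
    norm = np.sqrt((2.0 * np.pi) ** d.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

def update_posterior(prior_warning, p11, p12, p22, z,
                     mean_good, cov_good, mean_warn, cov_warn):
    """Posterior probability of the warning state after observing z.

    p11, p12, p22 are transition probabilities over the sampling interval
    (good->good, good->warning, warning->warning); the survival normalization
    cancels in the ratio below.
    """
    good = (1.0 - prior_warning) * p11
    warn = (1.0 - prior_warning) * p12 + prior_warning * p22
    w_good = good * mvn_pdf(z, mean_good, cov_good)
    w_warn = warn * mvn_pdf(z, mean_warn, cov_warn)
    return w_warn / (w_good + w_warn)

def should_stop(posterior, control_limit):
    """Stop for inspection when the posterior exceeds the control limit."""
    return posterior > control_limit
```

An observation close to the warning-state mean pushes the posterior upward, while an observation close to the good-state mean pulls it down, which is the behavior the control limit relies on.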
In this chapter, we optimize the epoch of each data collection and the inspection schedule by
minimizing the total expected cost using dynamic programming. Dynamic programming is a
well-known method of solving complex optimization problems by means of a backward recursion
algorithm. However, dynamic programming requires that the state variable and the decision
variables take values in a finite set, so we discretize the posterior probability, the sampling interval
and the control limit. Then, the dynamic programming optimality equation is established with the
control limit and the data collection interval as decision variables, and the posterior probability
as the state variable.
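
The backward recursion on a discretized posterior can be illustrated with a deliberately simplified toy problem. The sketch below is not the cost model of Chapter 4: it assumes a fixed sampling interval, a stop-or-continue decision only, a posterior that drifts one grid step upward per stage, and hypothetical cost figures. It only shows how the optimality equation is evaluated from the end of the production run backward and how a control limit can be read off as the smallest grid value at which stopping is optimal.

```python
def solve_stopping_dp(n_stages, grid, c_inspect, c_pm, c_break, fail_frac, reward):
    """Toy finite-horizon DP over a discretized posterior grid (all costs hypothetical).

    Stopping costs c_inspect + c_pm * p. Continuing earns `reward`, risks a
    breakdown with probability p * fail_frac, and otherwise moves the posterior
    one grid step up. Inspection is mandatory at the end of the run.
    """
    last = len(grid) - 1
    stop_cost = [c_inspect + c_pm * p for p in grid]
    value = stop_cost[:]  # terminal stage: inspection is mandatory
    limits = []
    for _ in range(n_stages):  # stages N-1, N-2, ..., 0
        new_value, stop_flags = [], []
        for i, p in enumerate(grid):
            nxt = min(i + 1, last)
            p_break = p * fail_frac
            cont = -reward + p_break * c_break + (1.0 - p_break) * value[nxt]
            stop_flags.append(stop_cost[i] <= cont)
            new_value.append(min(stop_cost[i], cont))
        value = new_value
        # Control limit: smallest grid posterior at which stopping is optimal.
        limits.append(next((p for p, s in zip(grid, stop_flags) if s), None))
    limits.reverse()  # limits[n] now refers to stage n
    return value, limits

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
value, limits = solve_stopping_dp(5, grid, c_inspect=5, c_pm=10,
                                  c_break=150, fail_frac=0.5, reward=10)
print(limits)
```

In the model described in the text, the decision at each stage would also include the data collection interval, and the next posterior would be random rather than deterministic, so the inner minimization would run over a grid of intervals and control limits and take an expectation over the next posterior.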
1.3 Thesis Contributions
The main original contributions of this research are as follows:
1) An ARX model-based approach to extract the gear fault-related signature of a gear
transmission system operating under varying load conditions has been established. The
ARX model residuals dramatically alleviate the impact of the gear meshing signature and
varying load signature, and show evidence of gear failure occurrence earlier and more
clearly than the original vibration data.
2) The F-test has been introduced as a quantitative indicator of gear fault occurrence. The test
reduces the complexity of decision making based on the ARX model residuals. The F-test
results indicate not only when the gear failure occurs, but also where the gear failure is
localized.
3) A POHM model for the gear transmission system based on the expectation-maximization
iterative algorithm has been developed. The numerical iteration procedure
generated reasonable parameter estimates compared to our results from subjective
judgment. It is confirmed that the POHM model can be applied to estimate the hidden
states of the gear based on the condition monitoring data.
4) A multivariate Bayesian control model for optimizing the gear transmission system
CBM strategy has been developed. When new data are collected, Bayes' theorem is
applied to obtain the updated posterior probability, which is used to recalculate the epoch of the
next data collection and to make an on-line inspection decision by minimizing the total expected cost.
Chapter 2 ARX Model-Based Gearbox Fault Detection and Localization
Operating under Varying Load
2.1 Introduction
Gears are among the most efficient and compact devices used to transmit torque and change angular
velocity. They are widely used in many machines, such as mining machines, automobiles,
helicopters, and aircraft turbine engines. Gearbox vibration data carry a lot of useful
information and have been widely used for condition monitoring and early fault detection of
gearboxes. Condition monitoring and fault detection schemes improve the reliability of gear
transmission systems and reduce their failure occurrence. The development of fault detection and
diagnostic schemes for gear transmission systems has been an active area of research in recent
years, due to the need of many manufacturing companies to reduce unplanned production
capacity loss caused by gear transmission system failure and to improve equipment reliability
through condition monitoring and failure prevention. Vibration data are mostly collected by
accelerometers mounted on the shell of a gearbox. However, the accelerometers collect not only
gear vibration signals, but also the vibration signals associated with all transmission components
(such as rotating shafts, bearings, and motor rotation), which makes it very difficult to recognize
the occurrence of an incipient fault.
In recent articles, advanced non-parametric approaches have been considered such as wavelets in
Wang et al. [17] and Wang & McFadden [56], short-time Fourier transform in Wang &
McFadden [55], and multiresolution Fourier transform in Kar & Mohanty [8]. Some papers have
applied parametric time-series models for gearbox fault detection, mostly assuming constant load.
In Kay [46], the author showed that time series models are appropriate for representing spectra
with sharp peaks (but not deep valleys) and are particularly useful for modeling sinusoidal data,
which is the case of the gear meshing signals. For example, in Wang & Wong [53], an
autoregressive (AR) model was considered for gear fault detection fitted to the vibration signal
of the gear of interest in its healthy state. The model was used as a linear prediction error filter
to process the future state signal from the same gear. The gear state was assessed using the error
signal obtained as the difference between the filtered and unfiltered signals. Zhan & Makis [62]
presented an adaptive Kalman filter-based AR model with varying coefficients fitted to the gear
motion residual signal in the healthy state of the gear of interest considering several load
conditions. The compromised model order was determined using several criteria. Liu & Makis
[6] proposed a gear failure diagnosis method based on vector autoregressive (VAR) modeling of
the vibration signals, dimensionality reduction applying dynamic principal component analysis
and condition monitoring using a multivariate Q control chart. Rofe [49] considered an
autoregressive moving-average (ARMA) model as a prediction error filter to detect gear faults
while accounting for load variation, and the variance and kurtosis of the ARMA model residuals
were used to indicate the presence or absence of a fault. However, different models were fitted for
different loads, i.e., each model was built under the assumption that the load was constant.
If the load is assumed to be a constant, the vibration signals caused by a fluctuating load are not
interpreted correctly. The fluctuating load condition dramatically affects the vibration signature
of the system, so that the model developed under the constant load assumption cannot recognize
whether the vibration signature changes are caused by the load variation or by a failure
occurrence. It should be noted that most real production systems operate under varying load.
The previous AR, ARMA and VAR models which were fitted to vibration data did not consider
load variation. The proper approach would be to consider load as additional information in a
time series model, so that an autoregressive model with an exogenous input (ARX) or a
multivariate vector ARX model would be the appropriate choice. To our knowledge, this novel
approach has not been considered in the vibration literature.
In this chapter, an ARX model is proposed that takes the varying load into account through the
model input, so that the model and the developed fault detection scheme can be used in real
situations. The chapter is organized as follows. Section 2.2 briefly describes the experimental gear
rig and the data acquisition system used in this study. In Section 2.3, a time-synchronous averaging
(TSA) algorithm is applied to average out the contributions of both the noise and the signals whose
periods are not equal to the shaft rotation period of the gear of interest in the gearbox. Filtered TSA
envelopes are then extracted from the TSA vibration data for each data file under varying load
conditions. In Section 2.4, using the TSA signals as the output and the filtered TSA envelopes as the
input variable, an ARX model is identified and fitted to the data. In Section 2.5, an F-test
is applied to the ARX model residuals, and the corresponding p-value is proposed as a
quantitative indicator of fault advancement over the full lifetime of a gearbox. The conclusions are
summarized in Section 2.6.
2.2 Experimental set-up and data collection
The vibration data used in this research were obtained from the Applied Research
Laboratory of the Pennsylvania State University. The data were collected using the Mechanical
Diagnostics Test Bed (MDTB, shown in Figure 2.1), which was designed to provide experimental
vibration data for research on diagnostics of a commercial transmission. The gearbox was driven
at a set input speed using a 22.38 kW, 1750 rpm drive motor, and the torque was applied by a
55.95 kW absorption motor (MDTB [38]). A vector unit capable of controlling the current
output of the absorption motor accomplished the variation of the torque. The MDTB is highly
efficient because the electrical power that is generated by the absorber is fed back to the driver
motor. The mechanical and electrical losses are sustained by a small fraction of wall power. The
MDTB has the capability of testing single and double reduction industrial gearboxes with ratios
from about 1.2:1 to 6:1. The gearboxes are nominally in the 3.675~14.7 kW range. The system is
designed to provide maximum versatility in speed and torque settings.
Figure 2.1 Mechanical Diagnostic Test Rig
The vibration data of test-run #14 is used in this study. In this test, the gearbox contained a 70-
tooth driven gear and a 21-tooth pinion gear. The nominal output of the gearbox was 62.66 Nm.
The vibration data were collected by accelerometers A02 and A03 mounted on the shell of the
gearbox as shown in the following figure.
Figure 2.2 Locations of Accelerometers
Each data file obtained from accelerometer A02 includes 200,000 samples, which were collected
in 10 second windows with 20 kHz sampling frequency. The resolution of analog-to-digital
converter was 16-bit, which assured that the accuracy of the accelerometers was preserved. The
motor velocity was acquired using the drive motor speed vector feedback V01. The gearbox
output load was acquired from the absorption motor torque vector feedback V05. V01 and V05
were sampled at 1 kHz, and they started recording only after the vibration data collection had
finished. Each of their data files contained 10,000 samples collected over a 10-second period.
Hence, the vibration data were not
synchronized with the V01 and V05 data. The whole test took 116 hours, and there were 338
data files in total. The gearbox was run at 100% (62.66 Nm) output torque for 96 hours. Then,
the gearbox was run under varying load from file 194 to 338. The output load ramped up from
50% (31.33 Nm) to 300% (187.98 Nm) in 8 minutes, and then ramped down back to 50% again
in 8 minutes. Figure 2.3 shows the mean of the drive motor speed and gearbox output torque.
Figure 2.3 (a) Gearbox Output Torque and (b) Gearbox Input Speed
The gearbox input speed fluctuated within a range of less than 0.06%, so the speed can be considered
constant. After a further 20 hours under varying load, test run #14 was shut down with five fully broken and two
partially broken teeth on the driven gear, as shown in the following figure.
Figure 2.4 Broken Teeth in Test Run #14
In this research, we considered the data files obtained under varying load condition, so the files
from 194 to 338 were analyzed except file 212 that was reported unreliable in MDTB [38] due to
some accelerometer problems. All the data files were separated into two groups. The files 194 -
246 were used to build the ARX model and the files 247 - 338 were used to detect the gearbox
failure.
We chose six data files to demonstrate our procedures. File 283 was selected because it was
the first file showing that the gear fault had occurred. The results obtained for file 283 were
compared with the results obtained for file 254; both of these files were collected when the
gearbox was ramped up to 300% output load. We also chose files 259 and 288 for comparison,
since for both of them the output load was ramped down to 100%. When the gearbox runs at a
low output load condition, the detection of gear failure is much more difficult than when the
gearbox runs at a high output load condition. For the low load condition, we compared the
results using the data files 261 and 290, both of which were collected at 50% output torque. Based on the
inspection performed after gearbox failure, the data files 254, 259, and 261 were collected when
the gearbox was in healthy state, and files 283, 288, and 290 were collected when the gearbox
was in failure state.
2.3 Time-Synchronous Averaging Algorithm and Filtered Envelope Extraction
The filtered TSA envelope is composed of the information of load variation and gear shaft
vibration due to shaft imbalance. This envelope is used as an input for the ARX model. A
method for extracting the filtered TSA envelope from the collected vibration data is presented in
this section. First, the TSA signal is obtained, and then the envelope carrying load information is
extracted from the TSA signal. Finally, a low pass filter is applied to remove the gear fault
signature from the TSA envelope.
Vibration data obtained from the gearbox represent a complex combination of information plus
noise produced by the background. The time-domain synchronous averaging (TSA) algorithm,
which provides an average time signal of one individual gear over a large number of cycles, has been
commonly used in the detection of gear faults; see, e.g., McFadden [39], [40], and Dalpiaz et al.
[22]. TSA reduces the contributions of both noise and signals whose periods are not equal to the
given gear shaft rotation period by synchronizing the sampling of the vibration signal with the
shaft rotation of a particular gear and evaluating the ensemble average over many revolutions
with the start of each revolution at the same angular position. Suppose there are $N$ data points in
each vibration data file collected from a gearbox operating under a constant speed, and let $V(k)$,
$k = 1, 2, \ldots, N$, denote this discrete-time vibration data. Since the gearbox is running
under a constant speed, the number of sampling points which correspond to one complete
revolution of the gear of interest is

$$K = \left\lceil \frac{f_s}{f_m / N_t} \right\rceil \tag{2.1}$$

where $f_s$ is the sampling frequency, $f_m$ is the fundamental meshing frequency of the gear of
interest, $N_t$ is the number of teeth of the given gear, and $\lceil \cdot \rceil$ is the ceiling function returning the
closest higher integer. The number of cycles to be averaged is
$M_0 = \lfloor N / K \rfloor$, where $N$ is the total number of sampling points in each data file and $\lfloor \cdot \rfloor$ is the
floor function returning the closest lower integer. In the time domain, the conventional TSA
signal of $V(k)$ is defined as

$$V_{\mathrm{TSA}_0}(k) = \frac{1}{M_0} \sum_{i=0}^{M_0 - 1} V(k + iK), \quad k = 1, 2, \ldots, K \tag{2.2}$$
If the gearbox operates under various speeds, we will re-sample the original vibration data. For
example, if the input speed increases, we decrease the re-sampling frequency; if the input speed
decreases, we increase the re-sampling frequency. In this way, the number of data points which
correspond to one complete revolution of the gear becomes a constant, so the Equation (2.2) can
be applied to calculate the TSA signals.
In the next section, the TSA signal is processed by an ARX model with optimal
model orders $b$, $s$ and $r$, which requires the TSA signal to contain $K + c$ points, where
$c = \max(r, b + s)$, in order to obtain $K$ residuals corresponding to a complete revolution of the gear of
interest. Thus, the equation to calculate the TSA signal is modified as follows:

$$V_{\mathrm{TSA}}(k) = \frac{1}{M} \sum_{i=0}^{M - 1} V(k + iK), \quad k = 1, 2, \ldots, K + c \tag{2.3}$$

where $M = \lfloor N / K \rfloor - \lceil c / K \rceil$.
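As an illustration, Equations (2.1) to (2.3) can be sketched in Python for the constant-speed case. This is a minimal sketch and not the implementation used in this study: the function names and the 0-based indexing are assumptions. Setting c = 0 reduces the modified TSA of Equation (2.3) to the conventional TSA of Equation (2.2).

```python
import math


def samples_per_revolution(fs, fm, n_teeth):
    """Equation (2.1): K = ceil(fs / (fm / n_teeth))."""
    return math.ceil(fs * n_teeth / fm)


def tsa(v, K, c=0):
    """Average M revolutions of K samples each and return K + c points.

    c = 0 gives the conventional TSA of Equation (2.2); c > 0 gives the
    modified TSA of Equation (2.3). Indices are 0-based in this sketch.
    """
    M = len(v) // K - math.ceil(c / K)  # number of revolutions averaged
    if M < 1:
        raise ValueError("record too short for the requested K and c")
    return [sum(v[k + i * K] for i in range(M)) / M for k in range(K + c)]
```

Reducing the number of averaged revolutions by ceil(c / K) guarantees that the last index used, M*K + c - 1, stays inside the record.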
The TSA signals contain not only gear fault signatures, but also gear meshing frequencies, gear
shaft imbalance signals, and load variation signatures. An ARX model is introduced to partially
filter out the gear meshing frequencies, shaft imbalance signals, and load variation signatures
from the TSA signals using a filtered TSA envelope as an external input. We now present a
method for extracting the filtered TSA envelope. First, we localize the peaks of the meshing signal of
each gear. The upper envelope is made up of all the local maximum values. We filter out the
peaks caused by the local gear fault, and fit a piecewise linear function $l_u$ to the modified upper
envelope. Using the same method, another piecewise linear function $l_d$ is fitted to the lower
envelope using the local minimum value of each gear meshing signal. The estimated load
signal $\hat{l}(k)$ for each data file is given by

$$\hat{l}(k) = \frac{l_u(k) - l_d(k)}{2} \tag{2.4}$$

When a local gear fault occurs, a peak may be found in the estimated load signal, and this kind
of peak can be considered as an impulse component of $\hat{l}(k)$.
The filtered TSA envelope $x(k)$ is extracted from the estimated load signal $\hat{l}(k)$
by applying a low-pass filter that averages out the impulse component caused by the local
gear fault. Thus, the filtered TSA envelope $x(k)$ contains the information related to
varying load and shaft imbalance, and this information is not affected by local gear faults.
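The envelope extraction described above can be sketched as follows. Several details are assumptions made only for illustration: each tooth-meshing period is taken to span an integer number of samples (samples_per_tooth), one maximum and one minimum per meshing period define the envelope points, and a centered moving average stands in for the low-pass filter, whose type is not specified in the text. The step that removes fault-induced peaks before fitting the envelopes is omitted.

```python
def piecewise_linear(xs, ys, n):
    """Evaluate the piecewise-linear function through (xs, ys) at 0..n-1.

    xs must be strictly increasing; values are held flat outside [xs[0], xs[-1]].
    """
    out, j = [], 0
    for k in range(n):
        if k <= xs[0]:
            out.append(ys[0])
        elif k >= xs[-1]:
            out.append(ys[-1])
        else:
            while xs[j + 1] < k:
                j += 1
            t = (k - xs[j]) / (xs[j + 1] - xs[j])
            out.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return out


def estimated_load(tsa_signal, samples_per_tooth):
    """Equation (2.4): half the difference of the upper and lower envelopes."""
    n = len(tsa_signal)
    up_x, up_y, lo_x, lo_y = [], [], [], []
    for start in range(0, n, samples_per_tooth):
        seg = tsa_signal[start:start + samples_per_tooth]
        i_max = max(range(len(seg)), key=seg.__getitem__)
        i_min = min(range(len(seg)), key=seg.__getitem__)
        up_x.append(start + i_max)
        up_y.append(seg[i_max])
        lo_x.append(start + i_min)
        lo_y.append(seg[i_min])
    l_u = piecewise_linear(up_x, up_y, n)
    l_d = piecewise_linear(lo_x, lo_y, n)
    return [(u - d) / 2 for u, d in zip(l_u, l_d)]


def moving_average(x, width):
    """Centered moving average used as a simple low-pass filter.

    The window shrinks near the two ends of the record.
    """
    half = width // 2
    out = []
    for k in range(len(x)):
        window = x[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out
```

In this sketch the filtered TSA envelope would be obtained as moving_average(estimated_load(tsa_signal, samples_per_tooth), width), with the window width chosen wide enough to smooth a single-tooth impulse.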
We then applied the previously described TSA technique to the vibration data collected by
accelerometer A02. Each data file contained data collected over 10 seconds. In
Figure 2.5, the horizontal axis is time and the vertical axis is the amplitude of the acceleration.
Figure 2.5 Vibration Data: (a) Healthy State at 300% Output Torque, (b) Faulty State at 300%
Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output
Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque.
Based on these figures, it is difficult to obtain any information about the gear state, so we apply the TSA
algorithm and extract filtered TSA envelopes, which are shown in the following figure.
Figure 2.6 TSA Signals (Dotted Line) and Filtered TSA Envelope (Solid Line): (a) Healthy State
at 300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at 100% Output
Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at 50% Output Torque, (f)
Faulty State at 50% Output Torque.
2.4 ARX Model using Gear Vibration Data under Varying Load
In this study, an ARX model is utilized to detect and localize gear failure for a gearbox operating
under varying load. The residual signal of an ARX model represents the difference from the
average tooth-meshing vibration in a healthy state and usually shows evidence of faults earlier
and more clearly than the TSA signal. Since the majority of energy in the healthy state of the
target gear is concentrated at the meshing fundamental and its harmonics, the ARX model
residual signal will be much less sensitive to the load variation and shaft imbalance than the TSA
signal. In the next section, we present an approach to identify the model order and to estimate
parameters of an ARX model and show numerical results using real data.
2.4.1 ARX Model based on the Modified TSA Signal and Filtered TSA Envelope
The TSA signal $V_{\mathrm{TSA}}$ allows signal frequencies that are synchronous with the meshing
frequency of interest to be isolated from all other signals. However, $V_{\mathrm{TSA}}$ still contains a large
number of components which are not related to the gear fault signature, such as the gear
meshing signal $V_g$, the load variation signature, and the gear shaft vibration signal caused by
shaft geometric asymmetry and assembly errors (Rofe [49]). In gear fault detection and
diagnosis, it is necessary to isolate these signals from the TSA signal to obtain a residual signal
that is more clearly indicative of the gear fault.
For healthy gears, the gear meshing signal is modulated by some low shaft order (first and/or
second order) functions. The spectrum of the gear meshing signal is therefore dominated by the
sharp spectral peaks at the fundamental gear meshing frequency and its harmonics accompanied
by some low-order modulation sidebands. The TSA signal of a gear meshing signal can be
expressed by the following equation (McFadden [40]):
$$V_g(k) = \sum_{h=0}^{H} A_h \left[1 + a_h(k)\right] \cos\left(2\pi f_h k + \beta_h + \theta_h(k)\right), \quad k = 1, 2, \ldots, K + b \tag{2.5}$$

where $h = 0, 1, \ldots, H$ is the meshing harmonic number, $A_h$ is the amplitude at the $h$-th
harmonic frequency (i.e., $f_h = h \times f_s$), $a_h(k)$ is the amplitude modulation function, $\beta_h$ is
the initial phase, and $\theta_h(k)$ is the phase modulation function at the $h$-th harmonic.
Kay [46] proved that the AR model is appropriate to represent spectra of gear meshing signals
with sharp peaks. For a gearbox operating under constant load and speed, with the gear shaft
imbalance ignored, Wang & Wong [53] successfully applied an AR model to represent the gear
meshing signal. The AR model is given by the following equation:
$$V_g(k) = \delta_r(B)\, V_g(k) + n_g(k) \tag{2.6}$$

where

$$\delta_r(B) = \delta_1 B + \cdots + \delta_r B^r \tag{2.7}$$

$B$ denotes the backshift operator and $n_g(k)$ are the AR model residuals.
When the load variation and shaft imbalance are present, the filtered TSA envelope $x(k)$ can
be used to represent the combination of the signatures caused by varying load and shaft imbalance.
Then, the TSA signal $V_{\mathrm{TSA}}$ is described as an ARX model with the filtered TSA envelope as an
exogenous variable by the following equation:

$$V_{\mathrm{TSA}}(k) = V_g(k) + w_p(B)\, x(k) + n(k) \tag{2.8}$$

where

$$w_p(B) = w_0 + w_1 B + \cdots + w_p B^p \tag{2.9}$$

and $n(k)$ is an error term. More details about the ARX model are given in Appendix A.
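Substituting Equation (2.6) into Equation (2.8), we can represent the TSA signal as

$$
\begin{aligned}
V_{\mathrm{TSA}}(k) &= \delta_r(B) V_g(k) + n_g(k) + w_p(B) x(k) + n(k) \\
&= \delta_r(B) V_g(k) + n_g(k) + w_p(B) x(k) + n(k) \\
&\quad + \delta_r(B) w_p(B) x(k) + \delta_r(B) n(k) - \delta_r(B) w_p(B) x(k) - \delta_r(B) n(k) \\
&= \delta_r(B) \left[ V_g(k) + w_p(B) x(k) + n(k) \right] \\
&\quad + w_p(B) \left[ 1 - \delta_r(B) \right] x(k) + \left[ 1 - \delta_r(B) \right] n(k) + n_g(k) \\
&= \delta_r(B) V_{\mathrm{TSA}}(k) + w_p(B) \left[ 1 - \delta_r(B) \right] x(k) + \left[ 1 - \delta_r(B) \right] n(k) + n_g(k)
\end{aligned}
\tag{2.10}
$$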
In order to rewrite the above equation as an ARX model, let $\omega_s(B) B^b = w_p(B) \left[ 1 - \delta_r(B) \right]$
and $N(k) = \left[ 1 - \delta_r(B) \right] n(k) + n_g(k)$. We then have an ARX model described by the following
equation:

$$V_{\mathrm{TSA}}(k) = \delta_r(B) V_{\mathrm{TSA}}(k) + \omega_s(B) B^b x(k) + N(k) \tag{2.11}$$

where $N(k)$ is the noise term approximated by the ARX model residuals. For fault detection,
only the information embedded in the residuals is of interest.
2.4.2 ARX Model Order Identification and Parameters Estimation
In this section, we show how to identify an ARX model order and estimate the model parameters.
The structure of an ARX model is represented by the equation:
$$y(k) = \delta_r(B)\, y(k) + \omega_s(B) B^b x(k) + N(k) \tag{2.12}$$

where

$$\omega_s(B) = \omega_0 + \omega_1 B + \cdots + \omega_s B^s \tag{2.13}$$

$$\delta_r(B) = \delta_1 B + \cdots + \delta_r B^r \tag{2.14}$$

$y(k)$ is the TSA vibration signal $V_{\mathrm{TSA}}(k)$, $x(k)$ is the ARX model exogenous input representing
the filtered TSA envelope, and $N(k)$ is assumed to be Gaussian white noise with zero mean and
variance $\sigma_N^2$ when the gearbox is in a healthy state.
In order to test the normality assumption on $N(k)$, we apply the Jarque-Bera test (Jarque &
Bera [10] and [11]), which is a goodness-of-fit measure of departure from normality based on the
sample kurtosis $K_J$ and skewness $S_J$ with sample size $n_J$. The test statistic $J$ is defined as

$$J = \frac{n_J}{6} \left[ S_J^2 + \frac{(K_J - 3)^2}{4} \right] \tag{2.15}$$

If the sample size $n_J$ is large (e.g. $n_J > 3$, preferably $n_J > 100$), $J$ approximately follows a chi-square distribution
with two degrees of freedom. Based on the Jarque-Bera test at significance level 0.01, the
ARX residuals of each data file from 194 to 246 are normally distributed, except for data file
210. We then calculate the autocorrelation of all ARX model residuals from data files 194 to 246,
and Figure 2.9 shows that the assumption that $N(k)$ is uncorrelated is reasonable.
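The Jarque-Bera statistic of Equation (2.15) is simple to compute directly. The following is a minimal sketch with a hypothetical function name; it uses the biased (divide by n) sample moments and the fact that the survival function of a chi-square distribution with two degrees of freedom is exp(-J/2).

```python
import math


def jarque_bera(x):
    """Return the Jarque-Bera statistic J of Equation (2.15) and its p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    J = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-J / 2)  # chi-square(2) survival function
    return J, p_value
```

Normality is rejected at significance level 0.01 when the returned p-value is below 0.01.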
For given model orders $b$, $s$ and $r$, the parameters of the ARX model are estimated
using the least squares algorithm (Ljung [32]):

$$\hat{\theta}_m = \left[ \sum_{k=1}^{m} \varphi(k) \varphi^T(k) \right]^{-1} \sum_{k=1}^{m} \varphi(k)\, y(k) \tag{2.16}$$

where $m$ is the number of data values, $\hat{\theta}_m$ is the estimated ARX model parameter vector

$$\hat{\theta} = \left[ \hat{\delta}_1 \ \cdots \ \hat{\delta}_r \ \ \hat{\omega}_0 \ \cdots \ \hat{\omega}_s \right]^T \tag{2.17}$$

and $\varphi(k)$ is the input and output series vector used to build the ARX model,

$$\varphi(k) = \left[ y(k-1), \ldots, y(k-r),\ x(k-b), \ldots, x(k-b-s) \right]^T \tag{2.18}$$
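A least squares fit of the ARX structure in Equation (2.12) can be sketched as follows. This is an illustrative sketch and not the code used in this study: the function names are hypothetical, the regressor is written so that y(k) equals the inner product of the regressor and the parameter vector plus noise, and the normal equations of Equation (2.16) are solved by Gaussian elimination with partial pivoting, assuming they are nonsingular.

```python
def regressor(y, x, k, r, s, b):
    """Regressor phi(k): r lagged outputs followed by s + 1 lagged inputs."""
    return [y[k - i] for i in range(1, r + 1)] + [x[k - b - j] for j in range(s + 1)]


def solve(A, g):
    """Solve A theta = g by Gaussian elimination (A assumed nonsingular)."""
    n = len(g)
    M = [row[:] + [gi] for row, gi in zip(A, g)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    theta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (M[i][n] - tail) / M[i][i]
    return theta


def fit_arx(y, x, r, s, b):
    """Return (delta, omega, residuals) of the least squares ARX fit."""
    c = max(r, b + s)          # first index with a complete regressor
    d = r + s + 1              # number of regression coefficients
    A = [[0.0] * d for _ in range(d)]
    g = [0.0] * d
    for k in range(c, len(y)):
        phi = regressor(y, x, k, r, s, b)
        for i in range(d):
            g[i] += phi[i] * y[k]
            for j in range(d):
                A[i][j] += phi[i] * phi[j]
    theta = solve(A, g)
    residuals = []
    for k in range(c, len(y)):
        phi = regressor(y, x, k, r, s, b)
        residuals.append(y[k] - sum(p * t for p, t in zip(phi, theta)))
    return theta[:r], theta[r:], residuals
```

The residuals returned here correspond to the one-step-ahead prediction errors of the fitted model, which are the quantities used later for fault detection.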
The model orders $b$, $s$ and $r$ are determined using the Bayesian information criterion
(BIC) (Schwarz [24])

$$\mathrm{BIC}_{b,s,r} = \ln\left( \hat{\sigma}_N^2 \right) + \frac{d \ln(m)}{m} \tag{2.19}$$

where $\hat{\sigma}_N^2$ is the maximum likelihood estimate of $\sigma_N^2$, and $d = b + s + r + 1$ is the number of
estimated parameters in the model. BIC imposes a greater penalty for the number of estimated
model parameters than does the Akaike information criterion (Akaike [27]):

$$\mathrm{AIC}_{b,s,r} = \ln\left( \hat{\sigma}_N^2 \right) + \frac{2d}{m} + \text{constant} \tag{2.20}$$
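Given the residuals of a fitted model, the two criteria in Equations (2.19) and (2.20) can be evaluated as in the following sketch. The function name is hypothetical, the additive constant of the AIC is dropped because it does not affect the ranking of candidate orders, and d is computed as b + s + r + 1 as stated above.

```python
import math


def information_criteria(residuals, b, s, r):
    """Return (BIC, AIC) of Equations (2.19) and (2.20), AIC without its constant."""
    m = len(residuals)
    sigma2 = sum(e * e for e in residuals) / m  # ML estimate of the noise variance
    d = b + s + r + 1
    bic = math.log(sigma2) + d * math.log(m) / m
    aic = math.log(sigma2) + 2 * d / m
    return bic, aic
```

Comparing the two penalty terms shows that BIC penalizes each additional parameter more heavily than AIC whenever ln(m) > 2, that is, for any record with at least 8 data values. Order selection then amounts to fitting the model for each candidate (b, s, r) and keeping the combination with the smallest BIC.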
Once the ARX model is identified and all its parameters are estimated, the ARX model residuals
can be calculated using the following equation:

$$\hat{N}(k) = \left[ 1 - \hat{\delta}_r(B) \right] y(k) - \hat{\omega}_s(B) B^b x(k) \tag{2.21}$$
2.4.3 Numerical Results
In this study, the filtered TSA envelope and TSA data $V_{\mathrm{TSA}}$ from files 194 to 246 (shown in
Figure 2.7) are used to fit an ARX model. Comparing Figure 2.7 with Figure 2.3, we can see
that the filtered TSA envelope contains components caused not only by the varying load but
also by the gear shaft imbalance.
Figure 2.7 TSA Data for ARX Model Building: (a) Filtered TSA Envelopes used as an ARX
Model Input (b) TSA Vibration Data used as an ARX Model Output
As mentioned above, the least squares algorithm is applied to estimate the ARX model parameters.
The order of the model is identified by the BIC approach introduced in Section 2.4.2.
Figure 2.8 shows that the minimum value of BIC for our ARX model is obtained for $b = 0$, $s = 3$
and $r = 66$.
Figure 2.8 ARX Model Order Selection Using BIC Criterion
The parameters of this ARX model, $y(k) = \delta_{66}(B)\, y(k) + \omega_3(B)\, x(k) + N(k)$, were obtained as
follows.

$$\omega_3(B) = 1.082 - 1.674B + 0.5991B^2 \tag{2.22}$$

and
$$
\begin{aligned}
\delta_{66}(B) ={} & 1.598B - 1.01B^{2} + 0.2965B^{3} + 0.1293B^{4} + 0.008993B^{5} - 0.1169B^{6} \\
& - 0.011B^{7} + 0.08425B^{8} - 0.1159B^{9} + 0.06916B^{10} - 0.01803B^{11} - 0.1243B^{12} \\
& + 0.1699B^{13} - 0.1695B^{14} + 0.04267B^{15} - 0.03125B^{16} + 0.0126B^{17} - 0.0005206B^{18} \\
& - 0.136B^{19} + 0.1622B^{20} + 0.1128B^{21} - 0.2211B^{22} + 0.06914B^{23} + 0.09999B^{24} \\
& - 0.1493B^{25} + 0.1034B^{26} + 0.001125B^{27} - 0.05813B^{28} - 0.001733B^{29} + 0.01286B^{30} \\
& + 0.03604B^{31} + 0.1178B^{32} + 0.01246B^{33} - 0.2306B^{34} + 0.2217B^{35} - 0.1782B^{36} \\
& + 0.02661B^{37} - 0.08044B^{38} + 0.123B^{39} - 0.02903B^{40} - 0.02264B^{41} + 0.05132B^{42} \\
& - 0.01523B^{43} + 0.04177B^{44} - 0.02584B^{45} + 0.02091B^{46} + 0.02912B^{47} + 0.024B^{48} \\
& - 0.03102B^{49} + 0.0716B^{50} - 0.02914B^{51} + 0.02916B^{52} - 0.119B^{53} + 0.06028B^{54} \\
& + 0.007412B^{55} - 0.0362B^{56} + 0.02966B^{57} - 0.00834B^{58} - 0.01164B^{59} - 0.03023B^{60} \\
& + 0.05247B^{61} - 0.01407B^{62} + 0.04786B^{63} - 0.0418B^{64} + 0.2373B^{65} - 0.196B^{66}
\end{aligned}
\tag{2.23}
$$
The residual checking described in Box et al. [19] is used to examine whether the ARX model is
correct. Figure 2.9 shows the autocorrelations of the ARX model residuals $N(k)$ using the
data from files 194 to 246. All autocorrelations are very close to zero and do not show any
marked correlation pattern. Hence, the residual checking indicates that the model is adequate
and correct.
Figure 2.9 Autocorrelation Function for ARX Model Residuals
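A residual autocorrelation check of this kind can be sketched as follows. The function names are hypothetical, and the bound 1.96 divided by the square root of n is the usual approximate 95% band for the sample autocorrelation of white noise; it is an assumption added here for illustration and is not taken from the text.

```python
import math


def autocorrelation(residuals, max_lag):
    """Sample autocorrelation of the residuals at lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[k] * dev[k + lag] for k in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% bound for the autocorrelation of n white-noise samples."""
    return 1.96 / math.sqrt(n)
```

Residuals are consistent with the white-noise assumption when the autocorrelations at nonzero lags stay mostly inside plus or minus white_noise_bound(n).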
Now, we compute the ARX residuals for each data file from 194 to 338.
Figure 2.10 shows the ARX residuals for data files 254, 283, 259, 288, 261, and 290. The gear
failure still cannot be easily detected using the ARX residuals, so the F-test is introduced to
analyze the ARX model residuals. The results of the F-test can tell us not only when the gear
failure occurs, but also where the gear failure is localized.
Figure 2.10 ARX Residuals: (a) Healthy State at 300% Output Torque, (b) Faulty State at 300%
Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output
Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque.
2.5 Hypothesis testing
The variance of the residuals shows a strong relationship with the state of each gear tooth, so the
variance of the ARX model residuals is investigated in this research. After the ARX model
residuals are obtained, an F-test is applied to the variance of the residuals. More details about the
F-test are given in Appendix B. The F-test result is then used as an indicator to decide whether a
gear fault has occurred or not, which reduces the complexity of decision making based on the
gear motion residuals.
First, the 70-tooth driven gear is divided into 10 sections so that each section contains 7 adjacent
teeth corresponding to 36 degrees out of the total of 360 degrees of one shaft rotation. The ARX
model residuals are uncorrelated and independent when the teeth are in the healthy state. Hence,
the model residuals would be normally distributed with zero mean. In the healthy state, it is
reasonable to assume that all the gear teeth are identical, so we denote the population variance of the
ARX residuals of each data file as $\sigma^2$, the sample variance of the $n_s$ residuals in each section as
$S_s^2$, and the sample variance of the $n_n$ residuals of the other 63 teeth as $S_n^2$. Then, the F-statistic is
used to test the equality of the section variance $\sigma_s^2$ and the variance $\sigma_n^2$ of the remaining sections.
Hence, we test the hypothesis

$$
\begin{aligned}
H_0 &: \sigma_s^2 = \sigma_n^2 \\
H_1 &: \sigma_s^2 > \sigma_n^2
\end{aligned}
\tag{2.24}
$$
using the following F-statistic

$$F_{\nu_s, \nu_n} = \frac{S_s^2}{S_n^2} \tag{2.25}$$

where $\nu_s = n_s - 1$ and $\nu_n = n_n - 1$ are the degrees of freedom. Since we have a one-sided test, the
rejection of the null hypothesis $H_0$ means that the variance $\sigma_s^2$ is greater than $\sigma_n^2$. The
residuals of the selected section are then more variable than the residuals of the other sections, which
indicates a gear fault occurrence in this section.
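The section-wise test of Equations (2.24) and (2.25) can be sketched in pure Python as follows. The function names and the significance level alpha = 0.01 are assumptions made for illustration. The upper-tail probability of the F distribution is obtained from the regularized incomplete beta function, evaluated with the standard continued-fraction expansion (modified Lentz method).

```python
import math


def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def sample_variance(v):
    mean = sum(v) / len(v)
    return sum((e - mean) ** 2 for e in v) / (len(v) - 1)


def f_test_upper(section, rest):
    """One-sided F-test of H0: equal variances against H1: section more variable."""
    F = sample_variance(section) / sample_variance(rest)
    nu_s, nu_n = len(section) - 1, len(rest) - 1
    p = reg_inc_beta(nu_n / 2, nu_s / 2, nu_n / (nu_n + nu_s * F))
    return F, p


def section_flags(residuals, n_sections=10, alpha=0.01):
    """Return 1 for each section where H0 is rejected, else 0."""
    size = len(residuals) // n_sections
    flags = []
    for j in range(n_sections):
        section = residuals[j * size:(j + 1) * size]
        rest = residuals[:j * size] + residuals[(j + 1) * size:n_sections * size]
        flags.append(1 if f_test_upper(section, rest)[1] < alpha else 0)
    return flags
```

Applied to the K residuals of one revolution, section_flags returns one indicator per group of 7 teeth, so the position of a 1 in the list localizes the fault on the gear.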
Figure 2.11 shows that there is no rejection of the null hypothesis $H_0$ for gear teeth 1~35 and
57~70. In the section containing gear teeth 43~49, $H_0$ is rejected starting from data file
283, which indicates that the gear fault occurred. Also, for the sections 36~42 and 50~56, the
hypothesis $H_0$ is rejected starting from data files 335 and 338, respectively.
Figure 2.11 F-test of ARX Model Residuals (1 Indicates Rejection of $H_0$)
2.6 Conclusions
In this section, a gearbox fault detection scheme was proposed using vibration data collected
under varying load condition. A modified TSA algorithm was introduced to compute a TSA
signal for an ARX model building. An ARX model takes into account the effect of the
exogenous input variables on the output variables. In this research, the ARX model was fitted
using the filtered TSA envelope as an exogenous input and the TSA signal as the output. In each
data file, the TSA signal includes about 2500 data points collected in 10 sec. The model order is
66 and the load varying information is included in the TSA envelopes. The parameters of the
ARX model were estimated using the least squares algorithm, and the order of the ARX model
was determined using the BIC. Then, the F-test was applied to the variances of the ARX
model residuals for each data file, and the corresponding p-value was proposed as a quantitative
indicator of fault advancement over a full lifetime of a gearbox. The method presented in this
section was compared with the method in Zhan & Makis [61]. The data from the test-run # 14
was analyzed in both studies. The method presented in this chapter can detect a gear fault
occurrence earlier than their method. As shown in Figure 2.11, the gear defects started in the
section containing teeth 43~49 and later spread to the adjacent sections. At the end of the test, the gearbox
was inspected, and seven broken teeth were found spread over 3 adjacent sections, which gives
evidence that the ARX model can localize the gear defects, estimate the severity of the gear
defects and describe the gear deteriorating process. Also, our method is much faster and hence,
very suitable for on-line gearbox fault detection and damage localization. The results obtained in
this chapter are very promising and more research should be done in this area.
Chapter 3 POHM Model for Monitoring Gear Deterioration Condition
3.1 Introduction
In Chapter 2, an early gear fault detection scheme was designed and validated using real
vibration data. The state of the gear in early detection can be classified into one of two states: 1)
there is no gear defect; 2) there exist some minor defects. The F-test statistical method was
applied to infer the state of the gear. The result of the F-test can be 0 or 1. If the result is 0, we
can infer that there is no gear defect. If the result is 1, we can conclude that gear defect has
started to develop; the test system should be stopped and inspected. If there was no gear defect
on inspection, we concluded the F-test gave a false alarm. If there existed some minor defects
on inspection, then maintenance action must be taken immediately to avoid a major breakdown.
Based on the above description, the states of the gear system cannot be observed directly. The
actual states are hidden and unobservable. In situations other than complete system breakdown,
the only way to observe the actual state is to stop and inspect the system. In many situations, the
cost of inspection is very high. The ability to model the deterioration of the gear transmission
system will allow for optimization of the schedule for performing inspection and preventive
maintenance. Furthermore, the model should consider both the unobservable states, also called
hidden states, as well as the observable states.
In this chapter, a partially observable hidden Markov (POHM) model is used to model the
deterioration process of the gear transmission system, and the stochastic relationship between the
condition monitoring data and the system states. POHM model considers both unobservable and
observable states, and it is a generalization of the Hidden Markov model. Hidden Markov model
considers only the unobservable process, and is an extension of a Markov model where the
observations are probabilistic functions of the states. A hidden Markov chain is a random
process that involves a finite number of states. These states are linked by possible transitions,
and they are stochastically related to an observation process. The state transition is only
dependent on the current state and not on the past states. The actual sequence of states is not
observable. Hidden Markov models have been successfully used in the research area of speech
recognition (e.g. Lawrence [44]), and later also in the area of system diagnostics and fault
detection (e.g. Daidone et al. [1], Ge et al. [36], LeGland & Mevel [20], Nelwamondo et al.
[21], Bunks et al. [7], Miao & Makis [42], and Ocak & Loparo [30]).
POHM models have been used in various maintenance applications. In Jiang & Makis [58], a
POHM model was proposed to describe a partly observable deteriorating system. The state of
the POHM model evolves as a continuous-time homogeneous Markov process with a finite state
space. With the exception of the breakdown state, the system states are not observable.
However the unobservable states can be estimated using the condition monitoring data. Lin &
Makis [13] considered a filtering and parameter estimation problem for a POHM process with
discrete range observations. In Lin & Makis [14], a general recursive filter for the POHM model
was presented with continuous range observations. Both papers derived recursive formulas for
the state estimation of POHM models. Lin & Makis [15] provided two on-line parameter
estimation algorithms, a recursive maximum likelihood algorithm and a recursive prediction
error algorithm. Kim & Makis [37] derived explicit formulas for the POHM model parameter
estimation by maximizing pseudo likelihood function using the expectation-maximization (EM)
iterative algorithm.
In the following sections, we demonstrate how a POHM model is developed using the full
lifetime condition monitoring data for a gear transmission system. The data were collected from
the MDTB which operated from new condition to a failure. The condition monitoring data
includes the motor current data acquired by sensor V04, and the vibration data acquired by
accelerometers A02 and A03. The gear deterioration process is first considered as a continuous-
time homogeneous Markov process with three states: good, warning, and breakdown. The good
state and warning state are considered to be unobservable, and the breakdown state is considered
to be observable. The stochastic relationship between the unobservable gear states and the
residual observations is described by multivariate normal distributions. The residual
observations are obtained from the condition monitoring data by the following steps. First,
following the method introduced in Chapter 2, ARX model residuals are calculated from the
vibration data collected by accelerometers A02 and A03. Second, we compute the variance of the
ARX residuals and the mean of the motor current data for each data file collected when the
gear system is in the good state (data files 194-246). Then a polynomial model is built to establish
the relationship between the variance of the ARX residuals and the mean of the motor current data.
Residual observation is the difference between the variance of the ARX residuals and the
estimated variance by the polynomial model using the mean of motor current data as input.
Later the residual observation extracted from each data file is applied to estimate the POHM
model parameters.
The POHM model describes this gear deterioration process where some internal states of a gear
transmission system are unknown and can only be inferred from external observations (Section
3.2). The model parameters are estimated by maximizing the pseudo likelihood function using the
EM iterative algorithm in Section 3.3. Section 3.4 presents a polynomial model to extract
residual observation from the condition monitoring data. Additionally, a numerical method is
implemented to estimate the POHM model parameters using the motor current data acquired by
V04, and vibration data collected by A02 and A03.
3.2 Model Formulation of Gear Deterioration Process
In this study, the gear deterioration process was classified as being in either the good state ($s_1$), the warning
state ($s_2$), or the breakdown state ($s_3$). $s_1$ and $s_2$ are classified as operational states, and $s_2$ is
worse than $s_1$. $s_3$ is a non-operational state and is worse than the operational states.
State $s_1$ can be considered as either a new gearbox or a gearbox with normal wear. It is not
necessary to take any maintenance action for $s_1$. For $s_2$, preventive maintenance action must be
taken immediately to avoid a major breakdown. For example, when small fatigue cracks are
initiated or one tooth breaks, this minor gear defect will rapidly propagate to the adjacent teeth of
this gear and the teeth of its meshing gears if no repair or replacement is applied immediately.
The gear defects will also spread to the rest of mechanical parts, such as the gear shafts or
bearings, as the number of the broken teeth increases. When the gearbox stops working properly,
the system is considered to have broken down and to be in $s_3$. In the breakdown state,
considerable maintenance costs will be incurred.
3.2.1 Model Assumptions
The following are model assumptions for the deterioration process of a gear transmission system:
1) The deterioration process cannot improve its condition except by replacement or repair. For
example, it is impossible for a damaged gear to fix itself without external
maintenance. Hence, the gear state cannot transit from a worse state to a
better state, i.e., $\Pr(s_2 \to s_1) = 0$, $\Pr(s_3 \to s_2) = 0$ and $\Pr(s_3 \to s_1) = 0$.
2) The time spent in each state (sojourn time) is a random variable denoted as $T_{s_i}$, and it is
assumed to follow an exponential distribution with mean $1/v_i$, $i = 1, 2$.
3) During the operation period of a gearbox, the operational states ($s_1$ and $s_2$) are normally
unobservable except through an inspection. However, the condition monitoring data contain
information that can be used to infer the state of an operating gear transmission system.
For example, small fatigue cracks or a broken tooth will affect the signature of the vibration data.
These vibration signature changes can be used to infer the gear operational states.
4) Breakdown state $s_3$ can be observed directly. The cost of $s_3$ is higher than
the preventive maintenance cost for warning state $s_2$. If this assumption is not satisfied, it
would be more economical to replace the gear transmission system after it breaks down
instead of applying preventive maintenance.
3.2.2 Continuous-Time Markov Model for Gear Deterioration Process
Based on the model assumptions, the gear deterioration process $\mathbf{X} = (X_t : t \in \mathbb{R}^+)$ evolves as a continuous-time homogeneous Markov process with state space $S_X = \{s_i,\ 1 \le i \le 3\}$. The transition probability matrix is expressed as
$$\mathbf{R}(t) = [r_{ij}(t)] = \begin{bmatrix} r_{11}(t) & r_{12}(t) & 1 - r_{11}(t) - r_{12}(t) \\ 0 & r_{22}(t) & 1 - r_{22}(t) \\ 0 & 0 & 1 \end{bmatrix}, \quad s_i, s_j \in S_X \tag{3.1}$$
where $r_{ij}(t) = \Pr(X(t_p + t) = s_j \mid X(t_p) = s_i)$ is the transition probability that the gear is in state $s_j$ at time $t_p + t$ given that the gear was in $s_i$ at time $t_p$. Since the sojourn time $T_{s_i}$ is assumed to follow an exponential distribution, the transition probability $r_{ij}(t)$ is a function of the time interval $t$ and is independent of the starting time $t_p$, that is
$$r_{ij}(t) = \Pr(X(t_p + t) = s_j \mid X(t_p) = s_i) = \Pr(X(t) = s_j \mid X(0) = s_i) \tag{3.2}$$
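Denoting $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ as the transition rate parameters, the instantaneous transition rate matrix of the continuous-time Markov chain can be written as
$$\mathbf{Q} = [q_{ij}] = \begin{bmatrix} -(\lambda_{12} + \lambda_{13}) & \lambda_{12} & \lambda_{13} \\ 0 & -\lambda_{23} & \lambda_{23} \\ 0 & 0 & 0 \end{bmatrix}, \quad s_i, s_j \in S_X \tag{3.3}$$
where $q_{ij}$ is called the instantaneous transition rate and is interpreted as the rate at which the process makes a transition from $s_i$ into $s_j$:
$$\begin{aligned} q_{ij} &= \lim_{t \to 0} \frac{r_{ij}(t)}{t} = \lim_{t \to 0} \frac{\Pr(X(t) = s_j \mid X(0) = s_i)}{t}, \quad i \ne j,\ s_i, s_j \in S_X \\ q_{ii} &= -\sum_{j \ne i} q_{ij} \end{aligned} \tag{3.4}$$
For the continuous-time Markov process, the expected sojourn time $\mathrm{E}(T_{s_i}) = 1/v_i$ can also be expressed in terms of the transition rate parameters through the following equations
$$\begin{aligned} v_1 &= -q_{11} = \lambda_{12} + \lambda_{13} \\ v_2 &= -q_{22} = \lambda_{23} \\ v_3 &= -q_{33} = 0 \end{aligned} \tag{3.5}$$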
where $v_i$ is interpreted as the rate at which the process makes a transition out of state $s_i$.
The embedded discrete-time Markov transition probability $p_{ij}$ is the probability that the process transits into $s_j$ after it leaves $s_i$, and is given by
$$\mathbf{P} = [p_{ij}] = \begin{bmatrix} 0 & \dfrac{\lambda_{12}}{\lambda_{12} + \lambda_{13}} & \dfrac{\lambda_{13}}{\lambda_{12} + \lambda_{13}} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \tag{3.6}$$
where $p_{12} = \lambda_{12} / (\lambda_{12} + \lambda_{13})$ is the probability that the gear deterioration process $X(t)$ transits into the warning state $s_2$ from $s_1$, and $p_{13} = \lambda_{13} / (\lambda_{12} + \lambda_{13})$ is the probability that the process transits into the breakdown state $s_3$ from $s_1$. $p_{23} = 1$ indicates that the process can only transit into the breakdown state once it leaves the warning state $s_2$. $p_{33} = 1$ indicates that $s_3$ is an absorbing state, which means that the gear will stay in the breakdown state $s_3$ if no repair or replacement is applied. The embedded transition matrix shows that $p_{ij} = 0$ whenever $i > j$, which satisfies the assumption that the gear deterioration process cannot transit into a better state from a worse state without external maintenance.
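As a small illustration of Equations (3.3), (3.5) and (3.6), the following Python sketch builds the rate matrix and the embedded transition matrix from a set of transition rates. The function names and the numerical rates are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def rate_matrix(lam12, lam13, lam23):
    """Instantaneous transition rate matrix Q, following Equation (3.3)."""
    return [
        [-(lam12 + lam13), lam12, lam13],
        [0.0, -lam23, lam23],
        [0.0, 0.0, 0.0],
    ]


def embedded_matrix(lam12, lam13, lam23):
    """Embedded discrete-time transition matrix P, following Equation (3.6)."""
    v1 = lam12 + lam13  # rate of leaving the good state, Equation (3.5)
    return [
        [0.0, lam12 / v1, lam13 / v1],
        [0.0, 0.0, 1.0],   # the warning state can only be left towards breakdown
        [0.0, 0.0, 1.0],   # the breakdown state is absorbing
    ]


# Hypothetical rates (per hour), for illustration only.
Q = rate_matrix(0.07, 0.01, 0.19)
P = embedded_matrix(0.07, 0.01, 0.19)
```

Each row of `Q` sums to zero and each row of `P` sums to one, which is a quick sanity check on any chosen set of rates.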
Using the expected sojourn time $1/v_i$ and the instantaneous transition rate $q_{ij}$, Kolmogorov's backward differential equations can be derived and solved to find the continuous-time transition probability matrix $\mathbf{R}(t)$. Kolmogorov's backward differential equations (see e.g. Ross [48]) are given as
$$r'_{ij}(t) = \sum_{k \ne i} q_{ik}\, r_{kj}(t) - v_i\, r_{ij}(t), \quad \text{for all } s_i, s_j \in S_X \text{ and } t \ge 0 \tag{3.7}$$
Solving the above differential equations, the gear state transition matrix can be expressed in terms of the transition parameters $\lambda_{12}$, $\lambda_{13}$, and $\lambda_{23}$ as
$$\mathbf{R}(t) = [r_{ij}(t)] = \begin{bmatrix} e^{-(\lambda_{12} + \lambda_{13}) t} & \dfrac{\lambda_{12} \left( e^{-\lambda_{23} t} - e^{-(\lambda_{12} + \lambda_{13}) t} \right)}{\lambda_{12} + \lambda_{13} - \lambda_{23}} & 1 - r_{11}(t) - r_{12}(t) \\ 0 & e^{-\lambda_{23} t} & 1 - r_{22}(t) \\ 0 & 0 & 1 \end{bmatrix}, \quad s_i, s_j \in S_X \tag{3.8}$$
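The closed form of the transition matrix can be checked numerically. The sketch below evaluates $\mathbf{R}(t)$ from the rates and verifies the Chapman-Kolmogorov identity $\mathbf{R}(s + t) = \mathbf{R}(s)\mathbf{R}(t)$ that any homogeneous Markov transition matrix must satisfy. The function names and rates are illustrative assumptions, and the formula for $r_{12}(t)$ assumes $\lambda_{12} + \lambda_{13} \ne \lambda_{23}$.

```python
import math


def transition_matrix(t, lam12, lam13, lam23):
    """Closed-form R(t) for the three-state model (assumes v1 != v2)."""
    v1, v2 = lam12 + lam13, lam23
    r11 = math.exp(-v1 * t)
    r22 = math.exp(-v2 * t)
    r12 = lam12 * (r22 - r11) / (v1 - v2)
    return [
        [r11, r12, 1.0 - r11 - r12],
        [0.0, r22, 1.0 - r22],
        [0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Plain 3x3 matrix product, used only for the consistency check."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Hypothetical rates (per hour), for illustration only.
rates = (0.07, 0.01, 0.19)
R_sum = transition_matrix(5.0, *rates)
R_prod = matmul(transition_matrix(2.0, *rates), transition_matrix(3.0, *rates))
```

Here `R_sum` and `R_prod` should agree up to floating-point error, and every row of `R_sum` should sum to one.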
In this section, a three-state homogeneous continuous-time Markov model was developed to
describe the deterioration process of the gear transmission system. The process can only be in
one of the two operational states, or in the breakdown state. The two operational states cannot be
directly observed, but inference can be made about their states from the condition monitoring
data.
3.2.3 Stochastic Relationship between Residual Observation and Operational States
A statistical method is proposed for finding the stochastic relationship between gear operational
states and residual observations. In the CBM for a gear transmission system, the warning state is
the critical state for performing preventive maintenance in order to avoid considerable
breakdown costs. The objective of this section is to decide when the deterioration process transits into the warning state $s_2$ using the residual observations extracted from the condition monitoring data. The condition monitoring data are discrete signals collected at times $t_i$, so the residual observations are also discrete signals. The procedure for extracting residual observations from condition monitoring data is presented in Section 3.4.
The interval between the data collection times $t_{i-1}$ and $t_i$ is denoted $h_i = t_i - t_{i-1}$. The time to breakdown is $t_f$. If a failure occurs in the time interval $h_i$ (i.e. $t_{i-1} < t_f < t_i$), the system enters the breakdown state and considerable breakdown costs are incurred. If no failure occurs in the time interval $h_i$ (i.e. $t_i < t_f$), the $i$-th condition monitoring data are collected. The $i$-th residual observation contains the information that is used to infer the unobservable state of the gear system.
The residual observation extracted from the condition monitoring data collected at time $t_i$ is denoted $\mathbf{y}(t_i)$. For vibration data collected by $q$ accelerometers, the residual observation $\mathbf{Y}_i$ is the $q$-dimensional vector
$$\mathbf{y}(t_i) = [y_1(t_i), y_2(t_i), \ldots, y_q(t_i)]^{\mathrm{T}} \tag{3.9}$$
If the gear deterioration process is in good state $s_1$, the variation in the residual observation is caused only by the background noise, and $\mathbf{y}(t_i)$ follows a multivariate normal distribution with probability density function given by
$$f_{Y|X}(\mathbf{y}(t_i) \mid X(t_i) = s_1) = \frac{1}{(2\pi)^{q/2} |\boldsymbol{\Sigma}_1|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{y}(t_i) - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}_1^{-1} (\mathbf{y}(t_i) - \boldsymbol{\mu}_1) \right] \tag{3.10}$$
where the mean vector is $\boldsymbol{\mu}_1 = \mathrm{E}(\mathbf{y}(t_i)) = [\mathrm{E}(y_1(t_i)), \mathrm{E}(y_2(t_i)), \cdots, \mathrm{E}(y_q(t_i))]^{\mathrm{T}}$, $\mathrm{E}(\cdot)$ is the expected value of a random variable, the covariance matrix $\boldsymbol{\Sigma}_1 = \mathrm{cov}(\mathbf{y}(t_i)) = \mathrm{E}[(\mathbf{y}(t_i) - \boldsymbol{\mu}_1)(\mathbf{y}(t_i) - \boldsymbol{\mu}_1)^{\mathrm{T}}]$ is a symmetric matrix, and $\mathbf{A}^{\mathrm{T}}$ denotes the transpose of the matrix $\mathbf{A}$.
If the gear deterioration process transits into warning state $s_2$ in the time interval $h_i$, the variation in the data will also contain the signatures caused by the minor gear defects. The residual observation $\mathbf{y}(t_i)$ will follow a multivariate normal distribution with covariance matrix $\boldsymbol{\Sigma}_2$ and mean vector $\boldsymbol{\mu}_2$
$$f_{Y|X}(\mathbf{y}(t_i) \mid X(t_i) = s_2) = \frac{1}{(2\pi)^{q/2} |\boldsymbol{\Sigma}_2|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{y}(t_i) - \boldsymbol{\mu}_2)^{\mathrm{T}} \boldsymbol{\Sigma}_2^{-1} (\mathbf{y}(t_i) - \boldsymbol{\mu}_2) \right] \tag{3.11}$$
The above two equations give the probability density functions of the residual observations given the state of the deterioration process at time $t_i$. However, the operational states are unobservable, so a POHM model is constructed to infer the probability of the operational states at time $t_i$ from the residual observations.
3.2.4 POHM model
A POHM model is a Markov-based model in which some states cannot be observed directly, and the observations are probabilistic functions of the unobservable states. A POHM model for the gear deterioration process is characterized by three components defined as
$$\Omega = \{\mathbf{R}, \mathbf{F}, \boldsymbol{\pi}\} \tag{3.12}$$
The first component $\mathbf{R}$ is the transition probability distribution of the continuous-time homogeneous Markov process, which was defined by Equation (3.8) in Section 3.2.2. The second component $\mathbf{F}$ is the observation probability distribution, which was defined by Equations (3.10) and (3.11) in Section 3.2.3. The last component $\boldsymbol{\pi} = \{\pi_i\}$ is the initial distribution given by
$$\pi_i = \Pr(X(t_0) = s_i), \quad i = 1, 2, 3 \tag{3.13}$$
which is interpreted as the probability of $s_i$ being the initial state. Since the deterioration process starts in the good state at time $t_0 = 0$, the probabilities of $s_i$ being the initial state are
$$\begin{aligned} \pi_1 &= \Pr(X(t_0) = s_1) = 1 \\ \pi_2 &= \Pr(X(t_0) = s_2) = 0 \\ \pi_3 &= \Pr(X(t_0) = s_3) = 0 \end{aligned} \tag{3.14}$$
There are two sets of unknown parameters in the POHM model: the relation parameters $\Psi = (\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$ that describe the stochastic relationship between the residual observations and the unobservable states, and the transition rate parameters $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ that describe the evolution of the gear deterioration process. The POHM model can also be represented as a function of these two sets of parameters as
$$\Omega = f_{\Omega}(\Psi, \Lambda) \tag{3.15}$$
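Given a residual observation $\mathbf{y}(t_i) = [y_1(t_i), y_2(t_i), \ldots, y_q(t_i)]^{\mathrm{T}}$ at time $t_i$ with $t_i < t_f$, the probability element of the residual observation can be calculated as
$$\begin{aligned} \Pr(\mathbf{y}(t_i) \mid \Omega) ={}& \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_0) = s_1) \cdot \pi_1 + \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_0) = s_2) \cdot \pi_2 \\ &+ \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_0) = s_3) \cdot \pi_3 \end{aligned} \tag{3.16}$$
Using the initial distribution of Equation (3.14), the above equation can be written as
$$\begin{aligned} \Pr(\mathbf{y}(t_i) \mid \Omega) ={}& \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_0) = s_1) \\ ={}& \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_i) = s_1, X(t_0) = s_1) \cdot \Pr(X(t_i) = s_1; \Psi, \Lambda \mid X(t_0) = s_1) \\ &+ \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_i) = s_2, X(t_0) = s_1) \cdot \Pr(X(t_i) = s_2; \Psi, \Lambda \mid X(t_0) = s_1) \end{aligned} \tag{3.17}$$
The deterioration process is a Markov process, so the above equation can be simplified as
$$\begin{aligned} \Pr(\mathbf{y}(t_i) \mid \Omega) ={}& \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_i) = s_1) \cdot \Pr(X(t_i) = s_1; \Psi, \Lambda \mid X(t_0) = s_1) \\ &+ \Pr(\mathbf{y}(t_i); \Psi, \Lambda \mid X(t_i) = s_2) \cdot \Pr(X(t_i) = s_2; \Psi, \Lambda \mid X(t_0) = s_1) \end{aligned} \tag{3.18}$$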
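Equation (3.18) can be evaluated directly once the two state densities and the transition probabilities $r_{11}(t_i)$ and $r_{12}(t_i)$ are available. The following is a minimal sketch for $q = 2$ accelerometers, written in plain Python. The function names are hypothetical, the closed forms for $r_{11}$ and $r_{12}$ follow Equation (3.8), and the sketch assumes $\lambda_{12} + \lambda_{13} \ne \lambda_{23}$.

```python
import math


def mvn_pdf_2d(y, mu, cov):
    """Bivariate normal density, i.e. Equations (3.10)/(3.11) with q = 2."""
    d0, d1 = y[0] - mu[0], y[1] - mu[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form (y - mu)^T cov^{-1} (y - mu) using the 2x2 inverse.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def observation_likelihood(y, t, lam, psi):
    """Equation (3.18): state densities weighted by r11(t) and r12(t)."""
    lam12, lam13, lam23 = lam
    mu1, cov1, mu2, cov2 = psi
    v1, v2 = lam12 + lam13, lam23
    r11 = math.exp(-v1 * t)
    r12 = lam12 * (math.exp(-v2 * t) - r11) / (v1 - v2)  # assumes v1 != v2
    return mvn_pdf_2d(y, mu1, cov1) * r11 + mvn_pdf_2d(y, mu2, cov2) * r12


# Hypothetical parameters, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
psi = ([0.0, 0.0], identity, [5.0, 5.0], identity)
lam = (0.07, 0.01, 0.19)
early = observation_likelihood([0.0, 0.0], 0.0, lam, psi)
```

At $t = 0$ the process is in the good state with probability one, so `early` reduces to the good-state density evaluated at the observation.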
3.3 Parameter Estimation Using EM Algorithm
A POHM model was built in the previous section to infer the unobservable gear states $s_1$ and $s_2$ using the residual observations $\mathbf{y}(t_i) = [y_1(t_i), y_2(t_i), \ldots, y_q(t_i)]^{\mathrm{T}}$ at time $t_i$ with $t_i < t_f$. The POHM model $\Omega = f_{\Omega}(\Psi, \Lambda)$ was constructed on the assumption that the relation parameters $\Psi = (\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$ and the transition rate parameters $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ are given.
In this section, the formulas for estimating the POHM model parameters from historical data are introduced. First, we construct the pseudo-likelihood function that contains the relation parameter $\Psi$ and the transition rate parameter $\Lambda$. Using the historical data, the formulas for the parameter estimation are then obtained by maximizing the pseudo-likelihood function with the iterative EM algorithm.
3.3.1 Random Breakdown Time of a Gear Deterioration Process
The observable breakdown time of the gear system is considered as a random variable and is denoted by $T_f$. If no preventive maintenance action is applied, the probability density function of the breakdown time $T_f$ can be written as
$$f_{T_f}(t_f; \Lambda) = p_{13} v_1 e^{-v_1 t_f} + p_{12} p_{23} \frac{v_1 v_2}{v_1 - v_2} \left[ e^{-v_2 t_f} - e^{-v_1 t_f} \right] \tag{3.19}$$
The gear deterioration process starts in good state $s_1$ at time $t_0 = 0$. There are two paths by which the process can make the transition to breakdown state $s_3$ at time $t_f > 0$, as shown in Figure 3.1. The upper path represents direct entry into breakdown state $s_3$ immediately after the process leaves good state $s_1$ at time $t_f$. The lower path represents entry into breakdown state $s_3$ from warning state $s_2$ at time $t_f$, where the transition into warning state $s_2$ from good state $s_1$ occurs at some time $t_{s_1}$ such that $0 < t_{s_1} < t_f$.
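[Figure: time axis from $0$ to $t_f$. Upper path: $s_1 \to s_3$, leaving $s_1$ with density $v_1 e^{-v_1 t_f}$ and entering $s_3$ with probability $p_{13}$. Lower path: $s_1 \to s_2 \to s_3$, leaving $s_1$ at time $t$ with density $v_1 e^{-v_1 t}$ and probability $p_{12}$, then leaving $s_2$ with density $v_2 e^{-v_2 (t_f - t)}$ and probability $p_{23}$.]
Figure 3.1 Transition Paths from Good State to Breakdown State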
We first derive the probability density function of the upper path in Figure 3.1. The time at which the process leaves $s_1$ (the sojourn time of state $s_1$) is exponentially distributed, so the probability density that the process leaves $s_1$ at time $t_f > 0$ is
$$f_{T_{s_1}}(t_f; \Lambda) = v_1 e^{-v_1 t_f} \tag{3.20}$$
Based on the transition matrix of the embedded discrete Markov chain in Equation (3.6), the probability that the process enters $s_3$ after it leaves state $s_1$ is $p_{13} = \lambda_{13} / (\lambda_{12} + \lambda_{13})$. Since the event that the process leaves $s_1$ and the event that the process enters $s_3$ are independent, the joint probability density function for the process leaving $s_1$ at time $t_f$ and entering $s_3$ is
$$f^{u}_{T_f}(t_f; \Lambda) = f_{T_{s_1}}(t_f; \Lambda) \cdot p_{13} = v_1 e^{-v_1 t_f} \cdot p_{13} \tag{3.21}$$
We now derive the probability density function for the lower path in Figure 3.1. The probability density function that the process leaves $s_1$ at time $t_{s_1}$ with $0 < t_{s_1} < t_f$ is
$$f_{T_{s_1}}(t_{s_1}; \Lambda) = v_1 e^{-v_1 t_{s_1}} \tag{3.22}$$
The probability that the deteriorating process enters $s_2$ after it leaves state $s_1$ is $p_{12} = \lambda_{12} / (\lambda_{12} + \lambda_{13})$. The joint probability density function that the process enters $s_2$ at time $t_{s_1}$ is
$$f_{T_{s_1}, T_{12}}(t_{s_1}, t_{s_1}; \Lambda) = f_{T_{s_1}}(t_{s_1}; \Lambda) \cdot p_{12} = p_{12} v_1 e^{-v_1 t_{s_1}} \tag{3.23}$$
The process will enter $s_3$ at time $t_f$. Hence, the time that the process stays in $s_2$ is $(t_f - t_{s_1})$. The joint probability density function that the process makes a transition into $s_3$ at time $t_f$ is given by
$$f_{T_{s_2}, T_{23}}(t_f - t_{s_1}, t_f - t_{s_1}; \Lambda) = p_{23} v_2 e^{-v_2 (t_f - t_{s_1})} \tag{3.24}$$
The probability density function of the lower path is obtained by integrating over $t_{s_1}$ from $0$ to $t_f$,
$$\begin{aligned} f^{l}_{T_f}(t_f; \Lambda) &= \int_0^{t_f} f_{T_{s_1}, T_{12}}(t_{s_1}, t_{s_1}; \Lambda) \cdot f_{T_{s_2}, T_{23}}(t_f - t_{s_1}, t_f - t_{s_1}; \Lambda) \, dt_{s_1} \\ &= \int_0^{t_f} p_{12} v_1 e^{-v_1 t_{s_1}} \cdot p_{23} v_2 e^{-v_2 (t_f - t_{s_1})} \, dt_{s_1} \\ &= p_{12} p_{23} \frac{v_1 v_2}{v_1 - v_2} \left[ e^{-v_2 t_f} - e^{-v_1 t_f} \right] \end{aligned} \tag{3.25}$$
Thus, the probability density function of the breakdown time in Equation (3.19) can be obtained by adding the probability density functions of the upper path and the lower path:
$$f_{T_f}(t_f; \Lambda) = f^{u}_{T_f}(t_f; \Lambda) + f^{l}_{T_f}(t_f; \Lambda) \tag{3.26}$$
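The sum of the two path densities can be checked numerically: a valid breakdown-time density must integrate to one, with the upper path contributing $p_{13}$ and the lower path $p_{12}$. The sketch below implements the two paths and provides a simple trapezoidal integrator for this check. The function names and rates are illustrative assumptions, and $v_1 \ne v_2$ is assumed.

```python
import math


def breakdown_pdf(tf, lam12, lam13, lam23):
    """Density of the breakdown time: upper path plus lower path."""
    v1, v2 = lam12 + lam13, lam23
    p12, p13, p23 = lam12 / v1, lam13 / v1, 1.0
    upper = p13 * v1 * math.exp(-v1 * tf)
    lower = (p12 * p23 * v1 * v2 / (v1 - v2)
             * (math.exp(-v2 * tf) - math.exp(-v1 * tf)))  # assumes v1 != v2
    return upper + lower


def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + i * h) for i in range(1, n))
    return total * h


# Hypothetical rates (per hour), for illustration only.
rates = (0.07, 0.01, 0.19)
total_mass = trapezoid(lambda t: breakdown_pdf(t, *rates), 0.0, 400.0, 40000)
```

With these rates the tail beyond 400 hours is negligible, so `total_mass` should be very close to one.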
3.3.2 Sojourn Time of the Deterioration Process in Good State
The sojourn time of the gear deterioration process in good state $s_1$ is considered as a random variable and denoted by $T_{s_1}$. The conditional probability density function of $T_{s_1}$, given the failure time $T_f = t_f$ and $0 < t_{s_1} < t_f$, can be written as
$$f_{T_{s_1} | T_f}(t_{s_1}; \Lambda \mid T_f = t_f) = \frac{f_{T_f, T_{s_1}}(t_f, t_{s_1}; \Lambda)}{f_{T_f}(t_f; \Lambda)} \tag{3.27}$$
The denominator of the above equation is given by Equation (3.19), and the numerator corresponds to the lower path of Figure 3.1 and is given by
$$f_{T_f, T_{s_1}}(t_f, t_{s_1}; \Lambda) = f_{T_{s_1}, T_{12}}(t_{s_1}, t_{s_1}; \Lambda) \cdot f_{T_{s_2}, T_{23}}(t_f - t_{s_1}, t_f - t_{s_1}; \Lambda) = p_{12} v_1 e^{-v_1 t_{s_1}} \cdot p_{23} v_2 e^{-v_2 (t_f - t_{s_1})} \tag{3.28}$$
If $t_{s_1} < t_f$, the process cannot directly enter $s_3$ after it leaves $s_1$ at time $t_{s_1}$: if the process entered $s_3$ at time $t_{s_1}$, we would have $t_{s_1} = t_f$, which contradicts $t_{s_1} < t_f$. Thus, the process can only enter $s_2$ after it leaves state $s_1$, with probability $p_{12} = \lambda_{12} / (\lambda_{12} + \lambda_{13})$. Given the breakdown time $t_f$, the probability density function of the sojourn time of the gear deterioration process in good state $s_1$ for $t_{s_1} < t_f$ is
$$f_{T_{s_1} | T_f}(t_{s_1}; \Lambda \mid T_f = t_f) = \frac{p_{12} v_1 e^{-v_1 t_{s_1}} \cdot p_{23} v_2 e^{-v_2 (t_f - t_{s_1})}}{p_{13} v_1 e^{-v_1 t_f} + p_{12} p_{23} \dfrac{v_1 v_2}{v_1 - v_2} \left[ e^{-v_2 t_f} - e^{-v_1 t_f} \right]} \tag{3.29}$$
If $0 < t_{s_1} = t_f$, we have the following conditional probability function
$$f_{T_{s_1} | T_f}(t_f; \Lambda \mid T_f = t_f) = \frac{f_{T_f, T_{s_1}}(t_f, t_f; \Lambda)}{f_{T_f}(t_f; \Lambda)} \tag{3.30}$$
where $f_{T_f, T_{s_1}}(t_f, t_f; \Lambda) = v_1 e^{-v_1 t_f} \cdot p_{13}$ is the probability density of the process entering state $s_3$ right after it leaves state $s_1$ at time $t_f$. Hence, we have
$$f_{T_{s_1} | T_f}(t_f; \Lambda \mid T_f = t_f) = \frac{v_1 e^{-v_1 t_f} \cdot p_{13}}{p_{13} v_1 e^{-v_1 t_f} + p_{12} p_{23} \dfrac{v_1 v_2}{v_1 - v_2} \left[ e^{-v_2 t_f} - e^{-v_1 t_f} \right]} \tag{3.31}$$
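Equations (3.29) and (3.31) together describe the whole conditional distribution of the good-state sojourn time: a continuous part on $(0, t_f)$ and a weight on $t_{s_1} = t_f$ for the direct failure path. A quick numerical check is that the integral of (3.29) over $(0, t_f)$ plus the value of (3.31) equals one. The sketch below performs that check; the function names and rates are hypothetical and $v_1 \ne v_2$ is assumed.

```python
import math


def _path_densities(tf, lam12, lam13, lam23):
    """Upper and lower path densities of the breakdown time at tf."""
    v1, v2 = lam12 + lam13, lam23
    p12, p13 = lam12 / v1, lam13 / v1
    upper = p13 * v1 * math.exp(-v1 * tf)
    lower = (p12 * v1 * v2 / (v1 - v2)
             * (math.exp(-v2 * tf) - math.exp(-v1 * tf)))  # assumes v1 != v2
    return v1, v2, p12, upper, lower


def sojourn_density(ts1, tf, lam12, lam13, lam23):
    """Equation (3.29): density of the good-state sojourn time for ts1 < tf."""
    v1, v2, p12, upper, lower = _path_densities(tf, lam12, lam13, lam23)
    numerator = p12 * v1 * math.exp(-v1 * ts1) * v2 * math.exp(-v2 * (tf - ts1))
    return numerator / (upper + lower)


def direct_failure_weight(tf, lam12, lam13, lam23):
    """Equation (3.31): weight of the direct path s1 -> s3 given T_f = tf."""
    _, _, _, upper, lower = _path_densities(tf, lam12, lam13, lam23)
    return upper / (upper + lower)


# Hypothetical rates (per hour) and breakdown time, for illustration only.
rates, tf = (0.07, 0.01, 0.19), 19.0
n = 20000
h = tf / n
continuous_part = h * (0.5 * (sojourn_density(0.0, tf, *rates)
                              + sojourn_density(tf, tf, *rates))
                       + sum(sojourn_density(i * h, tf, *rates) for i in range(1, n)))
total = continuous_part + direct_failure_weight(tf, *rates)
```

The quantity `total` should equal one up to the trapezoidal integration error.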
3.3.3 Parameter Estimation using the EM Algorithm
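The probability distribution of a residual observation $\mathbf{Y}_i$ was described as a multivariate normal distribution that depends on the state of the gear deterioration process at time $t_i$. Assuming the residual observations $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n$ are independent, the joint probability density function of the residual observations given the sojourn time of state $s_1$ and the breakdown time $t_f$ is
$$\begin{aligned} f_{W | T_{s_1}, T_f}(\mathbf{w}; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) &= f_{W | T_{s_1}, T_f}(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) \\ &= f_{Y | T_{s_1}, T_f}(\mathbf{y}_1; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) \cdot f_{Y | T_{s_1}, T_f}(\mathbf{y}_2; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) \\ &\quad \cdots f_{Y | T_{s_1}, T_f}(\mathbf{y}_n; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) \end{aligned} \tag{3.32}$$
where $\mathbf{w} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n]$ is a full-lifetime history of residual observations obtained from a gear transmission system running from the good state to the breakdown state. Denote $t_k$, $1 \le k < n$, as the first data collection epoch after the process leaves good state $s_1$, i.e. $t_{k-1} < t_{s_1} < t_k$. The above equation becomes
$$\begin{aligned} f_{W | T_{s_1}, T_f}(\mathbf{w}; \Psi, \Lambda \mid T_{s_1} = t_{s_1}, T_f = t_f) ={}& f_{Y|X}(\mathbf{y}_1; \Psi \mid X(t_1) = s_1) \cdots f_{Y|X}(\mathbf{y}_{k-1}; \Psi \mid X(t_{k-1}) = s_1) \\ &\cdot f_{Y|X}(\mathbf{y}_k; \Psi \mid X(t_k) = s_2) \cdots f_{Y|X}(\mathbf{y}_n; \Psi \mid X(t_n) = s_2) \\ ={}& \left\{ \prod_{i=1}^{k-1} \frac{1}{(2\pi)^{q/2} |\boldsymbol{\Sigma}_1|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{y}(t_i) - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}_1^{-1} (\mathbf{y}(t_i) - \boldsymbol{\mu}_1) \right] \right\} \\ &\cdot \left\{ \prod_{i=k}^{n} \frac{1}{(2\pi)^{q/2} |\boldsymbol{\Sigma}_2|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{y}(t_i) - \boldsymbol{\mu}_2)^{\mathrm{T}} \boldsymbol{\Sigma}_2^{-1} (\mathbf{y}(t_i) - \boldsymbol{\mu}_2) \right] \right\} \end{aligned} \tag{3.33}$$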
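In practice the product in Equation (3.33) is evaluated on the log scale to avoid numerical underflow. The following sketch does this for a single accelerometer ($q = 1$, chosen here only to keep the example short) and shows that, for a synthetic history with a clear level shift, the log-likelihood is largest at the true switch epoch $k$. The data, parameter values and function names are hypothetical.

```python
import math


def log_normal_pdf(y, mu, var):
    """Log of the univariate normal density (q = 1 case of Eq. (3.10)/(3.11))."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mu) ** 2 / var


def history_loglik(ys, k, mu1, var1, mu2, var2):
    """Log of Equation (3.33) for q = 1.

    Observations 1..k-1 are attributed to the good state s1 and
    observations k..n to the warning state s2 (k is 1-based).
    """
    good = sum(log_normal_pdf(y, mu1, var1) for y in ys[:k - 1])
    warning = sum(log_normal_pdf(y, mu2, var2) for y in ys[k - 1:])
    return good + warning


# Synthetic history: the level shifts at the 4th observation.
ys = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]
scores = {k: history_loglik(ys, k, 0.0, 1.0, 5.0, 1.0) for k in range(1, len(ys) + 1)}
best_k = max(scores, key=scores.get)
```

For this synthetic history `best_k` is the index of the first observation taken after the shift.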
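In order to estimate the relation parameter $\Psi$ and the transition rate parameter $\Lambda$, the following steps are taken. A set of $m$ histories ($\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$) of residual observations is obtained; for history $j$, with $1 \le j \le m$, the breakdown epoch is denoted $t_f^j$ and the sojourn time of the good state is denoted $t_{s_1}^j$. Since the data obtained from different histories are independent, the log-likelihood function is constructed as
$$\begin{aligned} l(\Psi, \Lambda) &= \ln f_{W, T_f, T_{s_1}}(\mathbf{w}_1, \ldots, \mathbf{w}_m, t_f^1, \ldots, t_f^m, t_{s_1}^1, \ldots, t_{s_1}^m; \Psi, \Lambda) \\ &= \ln \left[ f_{W, T_f, T_{s_1}}(\mathbf{w}_1, t_f^1, t_{s_1}^1; \Psi, \Lambda) \cdot f_{W, T_f, T_{s_1}}(\mathbf{w}_2, t_f^2, t_{s_1}^2; \Psi, \Lambda) \cdots f_{W, T_f, T_{s_1}}(\mathbf{w}_m, t_f^m, t_{s_1}^m; \Psi, \Lambda) \right] \\ &= \sum_{j=1}^{m} \ln \left[ f_{W, T_f, T_{s_1}}(\mathbf{w}_j, t_f^j, t_{s_1}^j; \Psi, \Lambda) \right] \end{aligned} \tag{3.34}$$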
Because the states $s_1$ and $s_2$ are unobservable, it is very difficult to directly maximize the above log-likelihood function and to obtain the optimal estimators for the relation parameter $\Psi$ and the transition rate parameter $\Lambda$. The EM algorithm is therefore applied to estimate the model parameters.
The EM algorithm is an iterative method for finding the maximum likelihood estimators of parameters in statistical models that depend on unobservable variables (Dempster et al. [2]). The EM algorithm consists of two steps: an expectation (E) step constructs the pseudo-likelihood function, which is the conditional expectation of the log-likelihood evaluated using the current estimates; a maximization (M) step computes the parameters by maximizing the pseudo-likelihood function constructed in the E-step. These estimated parameters are then used to determine the distribution of the unobservable variable in the next E-step. Given the current estimates $\hat{\Psi}$ and $\hat{\Lambda}$, the pseudo-likelihood function is given by
$$\begin{aligned} Q(\Psi, \Lambda \mid \hat{\Psi}, \hat{\Lambda}) &= \sum_{j=1}^{m} Q_j(\Psi, \Lambda \mid \hat{\Psi}, \hat{\Lambda}) \\ &= \sum_{j=1}^{m} \mathrm{E}\left[ \ln f_{W, T_f, T_{s_1}}(\mathbf{w}_j, t_f^j, T_{s_1}; \Psi, \Lambda) \,\middle|\, \hat{\Psi}, \hat{\Lambda}, \mathbf{W} = \mathbf{w}_j, T_f = t_f^j \right] \\ &= \sum_{j=1}^{m} \left\{ \frac{\displaystyle\int_0^{t_f^j} \ln\left[ f_{W, T_f, T_{s_1}}(\mathbf{w}_j, t_f^j, \tau; \Psi, \Lambda) \right] f_{T_{s_1} | W, T_f}(\tau; \hat{\Psi}, \hat{\Lambda} \mid \mathbf{W} = \mathbf{w}_j, T_f = t_f^j) \, d\tau}{\displaystyle\int_0^{t_f^j} f_{T_{s_1} | W, T_f}(\tau; \hat{\Psi}, \hat{\Lambda} \mid \mathbf{W} = \mathbf{w}_j, T_f = t_f^j) \, d\tau} \right\} \end{aligned} \tag{3.35}$$
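The first term inside the integration of the numerator of Equation (3.35) is separated into two terms
$$\begin{aligned} \ln f_{W, T_f, T_{s_1}}(\mathbf{w}_j, t_f^j, \tau; \Psi, \Lambda) ={}& \ln\left[ f_{T_f}(t_f^j; \Psi, \Lambda) \, f_{T_{s_1} | T_f}(\tau; \Psi, \Lambda \mid T_f = t_f^j) \, f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \Psi, \Lambda \mid T_{s_1} = \tau, T_f = t_f^j) \right] \\ ={}& \ln f_{T_f}(t_f^j; \Psi, \Lambda) + \ln f_{T_{s_1} | T_f}(\tau; \Psi, \Lambda \mid T_f = t_f^j) + \ln f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \Psi, \Lambda \mid T_{s_1} = \tau, T_f = t_f^j) \\ ={}& \ln f_{T_f}(t_f^j; \Lambda) + \ln f_{T_{s_1} | T_f}(\tau; \Lambda \mid T_f = t_f^j) + \ln f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \Psi \mid T_{s_1} = \tau, T_f = t_f^j) \\ ={}& \ln\left[ f_{T_f}(t_f^j; \Lambda) \, f_{T_{s_1} | T_f}(\tau; \Lambda \mid T_f = t_f^j) \right] + \ln f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \Psi \mid T_{s_1} = \tau, T_f = t_f^j) \end{aligned} \tag{3.36}$$
Given $\mathbf{w}_j$ and $t_f^j$, the second term inside the integration of the numerator of Equation (3.35) can be decomposed as follows
$$f_{T_{s_1} | W, T_f}(\tau; \hat{\Psi}, \hat{\Lambda} \mid \mathbf{W} = \mathbf{w}_j, T_f = t_f^j) = \frac{f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \hat{\Psi} \mid T_{s_1} = \tau, T_f = t_f^j) \cdot f_{T_{s_1} | T_f}(\tau; \hat{\Lambda} \mid T_f = t_f^j)}{f_{W | T_f}(\mathbf{w}_j; \hat{\Psi}, \hat{\Lambda} \mid T_f = t_f^j)} \tag{3.37}$$
The term inside the integration of the denominator of Equation (3.35) is the same as Equation (3.37). The pseudo-likelihood function can be rewritten as
$$Q(\Psi, \Lambda \mid \hat{\Psi}, \hat{\Lambda}) = Q_{\Lambda}(\Lambda \mid \hat{\Psi}, \hat{\Lambda}) + Q_{\Psi}(\Psi \mid \hat{\Psi}, \hat{\Lambda}) \tag{3.38}$$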
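where
$$\begin{aligned} Q_{\Lambda}(\Lambda \mid \hat{\Psi}, \hat{\Lambda}) &= \sum_{j=1}^{m} \frac{\displaystyle\int_0^{t_f^j} \ln\left[ f_{T_f}(t_f^j; \Lambda) \, f_{T_{s_1} | T_f}(\tau; \Lambda \mid T_f = t_f^j) \right] f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \hat{\Psi} \mid T_{s_1} = \tau, T_f = t_f^j) \, f_{T_{s_1} | T_f}(\tau; \hat{\Lambda} \mid T_f = t_f^j) \, d\tau}{\displaystyle\int_0^{t_f^j} f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \hat{\Psi} \mid T_{s_1} = \tau, T_f = t_f^j) \, f_{T_{s_1} | T_f}(\tau; \hat{\Lambda} \mid T_f = t_f^j) \, d\tau} \\ Q_{\Psi}(\Psi \mid \hat{\Psi}, \hat{\Lambda}) &= \sum_{j=1}^{m} \frac{\displaystyle\int_0^{t_f^j} \ln\left[ f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \Psi \mid T_{s_1} = \tau, T_f = t_f^j) \right] f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \hat{\Psi} \mid T_{s_1} = \tau, T_f = t_f^j) \, f_{T_{s_1} | T_f}(\tau; \hat{\Lambda} \mid T_f = t_f^j) \, d\tau}{\displaystyle\int_0^{t_f^j} f_{W | T_{s_1}, T_f}(\mathbf{w}_j; \hat{\Psi} \mid T_{s_1} = \tau, T_f = t_f^j) \, f_{T_{s_1} | T_f}(\tau; \hat{\Lambda} \mid T_f = t_f^j) \, d\tau} \end{aligned} \tag{3.39}$$
The M-step chooses $\Psi$ and $\Lambda$ by maximizing $Q(\Psi, \Lambda \mid \hat{\Psi}, \hat{\Lambda})$. The updated $\Psi$ and $\Lambda$ can be obtained by solving the following two equations, in which $Q_{\Lambda}^{j}$ and $Q_{\Psi}^{j}$ denote the contributions of history $j$:
$$\begin{aligned} \frac{\partial Q_{\Lambda}(\Lambda \mid \hat{\Psi}, \hat{\Lambda})}{\partial \Lambda} &= \sum_{j=1}^{m} \frac{\partial Q_{\Lambda}^{j}(\Lambda \mid \hat{\Psi}, \hat{\Lambda})}{\partial \Lambda} = 0 \\ \frac{\partial Q_{\Psi}(\Psi \mid \hat{\Psi}, \hat{\Lambda})}{\partial \Psi} &= \sum_{j=1}^{m} \frac{\partial Q_{\Psi}^{j}(\Psi \mid \hat{\Psi}, \hat{\Lambda})}{\partial \Psi} = 0 \end{aligned} \tag{3.40}$$
After the updated $\Psi$ and $\Lambda$ are calculated from the above equations, the E-step and M-step are repeated until the updated parameters converge to constant values $\Psi^{*}$ and $\Lambda^{*}$.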
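The E-step and M-step can be illustrated with a deliberately reduced version of the model. The sketch below is not the estimation procedure of this chapter: it assumes a single history, one accelerometer ($q = 1$), a known common standard deviation `sigma`, known rates `v1` and `v2`, and $\lambda_{13} = 0$, so that only the two state means are updated. Under these assumptions the posterior of the switch time is piecewise constant between data collection epochs, the E-step reduces to a posterior over the number of observations taken in the good state, and the M-step for the means is a weighted average. All names and data are hypothetical.

```python
import math


def interval_weight(a, b, v1, v2):
    """Integral of exp((v2 - v1) * tau) over [a, b].

    Proportional to the prior mass of the switch time on [a, b] given t_f,
    because the density is proportional to exp(-v1*tau) * exp(-v2*(tf - tau)).
    """
    c = v2 - v1
    if abs(c) < 1e-12:
        return b - a
    return (math.exp(c * b) - math.exp(c * a)) / c


def em_means(y, times, tf, v1, v2, sigma, mu1, mu2, iterations=10):
    """EM updates of the two state means for one history (simplified model)."""
    n = len(y)
    edges = [0.0] + list(times) + [tf]  # switch lies in (edges[j], edges[j+1])
    prior = [interval_weight(edges[j], edges[j + 1], v1, v2) for j in range(n + 1)]
    for _ in range(iterations):
        # E-step: posterior that exactly j observations were taken in s1.
        loglik = [sum(-0.5 * ((y[i] - (mu1 if i < j else mu2)) / sigma) ** 2
                      for i in range(n))
                  for j in range(n + 1)]
        top = max(loglik)
        post = [p * math.exp(ll - top) for p, ll in zip(prior, loglik)]
        total = sum(post)
        post = [p / total for p in post]
        # gamma[i]: posterior probability that observation i was taken in s1.
        gamma = [sum(post[i + 1:]) for i in range(n)]
        # M-step: weighted means maximize the expected log-likelihood.
        mu1 = sum(g * yi for g, yi in zip(gamma, y)) / sum(gamma)
        mu2 = sum((1.0 - g) * yi for g, yi in zip(gamma, y)) / sum(1.0 - g for g in gamma)
    return mu1, mu2


# Synthetic history with a level shift after the 5th observation.
y = [0.1, -0.1, 0.2, -0.2, 0.0, 5.1, 4.9, 5.2, 4.8, 5.0]
times = [float(i) for i in range(1, 11)]
mu1_hat, mu2_hat = em_means(y, times, tf=10.5, v1=0.07, v2=0.19,
                            sigma=1.0, mu1=0.0, mu2=1.0)
```

For this synthetic history the estimates settle near the sample means of the two groups of observations. Extending the sketch to covariance matrices, to the rates and to several histories would follow Equations (3.39) and (3.40).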
3.4 Numerical Results
3.4.1 Residual Observations Extraction
In this section, residual observations are extracted from the vibration data collected by accelerometers A02 and A03. The ARX model residuals are obtained following the procedure presented in Section 2. In Section 2, a time-synchronous averaging (TSA) algorithm was first applied to average out the contributions of both the noise and the signals whose periods are not equal to the shaft rotation period of the gear of interest. Filtered TSA envelopes were then extracted from the TSA vibration data for each data file. The files from 194 to 338 were analyzed, except file 212, which was reported to be unreliable due to accelerometer problems. The data files were used in two groups: files 194 - 246 were used to fit the ARX model, and files 194 - 338 were used to estimate the POHM model parameters.
It is possible to build a multivariate ARX model based on the two-dimensional vibration data collected by both A02 and A03. Since the univariate ARX model for A02 was given in Section 2, the same procedure is applied to build an ARX model for A03. The ARX model for accelerometer A03 is $y(k) = \delta_{66}(B)\, y(k-1) + \omega_3(B)\, x(k) + N(k)$ and the parameters are given below
$$\omega_3(B) = 1.21 - 2.537 B + 1.327 B^2 \tag{3.41}$$
( )
66 65 64
63 62 61 60 59
58 57 56 55 54
53 52 51 50 49 48
47 46 45 44 43 42
41 40 39 38 37
36 35 34 33 32 31
30 29 28 27 26 25
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 9 8 7
6 5 4 3 2 65
0.07068 - 0.02268 0.01672
0.06597 0.06862 - 0.0075320.039020.0001627
0.05632-0.01372 004592.00.03815- 0.1147
0.05343 - 06538.0 0.1123 0.1115-0.01921- 0.04052
07984.0 0.03789 0.1312 0.04328 0.09520.001385-
0.02075 0.03948 - 0.0005747 0.0088050.06063
0.06339 - 0.1687 0.2372 - 0.09378- 0.1338 0.0998
0.024840.07188 0.0432107812.00.13380.1265-
1185.0 0.06088- 0.02701 - 0.04363 0.1172- 0.1371
0.02291193.00.27907868.00.2754-0.3689
0.1542- 0.01244- 0.005257-0.117-0.09517 0.05079
07838.02897.00.048560.63621.5692.005
BBB
BBBBB
BBBBB
BBBBBB
BBBBBB
BBBBB
BBBBBB
BBBBBB
BBBBBB
BBBBBB
BBBBBB
BBBBBB
Bδ
++
++++
+++
−++
++−++
−+++
+++
+−+−+
+++
+−+−+
++
+−++−=
(3.42)
After we obtain the ARX model residuals for A02 and A03, the standard deviations are
calculated for each data file.
It is well known that a motor current is directly related to the output load of the gear transmission
system and is easier to collect than the load measurement, so the motor current data is included
in condition monitoring data too.
Figure 3.2 (a), (b) Residual Standard Deviations of A02 and A03, and (c) Motor Current Mean
Figure 3.2 shows that the standard deviation of the ARX model residuals collected by A02 and A03 varies in proportion to the motor current. We therefore apply the least squares method to estimate a first-order polynomial model for each accelerometer. The models are obtained as
$$\begin{aligned} \hat{S}_{\mathrm{A02}} &= 0.0023 \times C_{\mathrm{A02}} + 13.2596 \\ \hat{S}_{\mathrm{A03}} &= 0.0032 \times C_{\mathrm{A03}} + 14.914 \end{aligned} \tag{3.43}$$
where $C_{\mathrm{A02}}$ and $C_{\mathrm{A03}}$ are the mean motor current of each data file, and $\hat{S}_{\mathrm{A02}}$ and $\hat{S}_{\mathrm{A03}}$ are the estimated standard deviations of the ARX model residuals. For the data files 194-246, the standard deviations remaining after the polynomial model has been filtered out are plotted in Figure 3.3
Figure 3.3 (a) and (b) Residuals of Polynomial Model for A02 and A03
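The extracted residual observations for the POHM model parameter estimation can be obtained by the following equations
$$\begin{aligned} y_{1i} &= S_{\mathrm{A02}, i} - \hat{S}_{\mathrm{A02}, i} = S_{\mathrm{A02}, i} - (0.0023 \times C_{\mathrm{A02}, i} + 13.2596) \\ y_{2i} &= S_{\mathrm{A03}, i} - \hat{S}_{\mathrm{A03}, i} = S_{\mathrm{A03}, i} - (0.0032 \times C_{\mathrm{A03}, i} + 14.914) \end{aligned} \tag{3.44}$$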
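The two steps above, fitting a first-order polynomial by least squares and subtracting its prediction, can be sketched as follows. The function names and the synthetic data are hypothetical; in this study the fit would use the per-file mean motor current and residual standard deviation.

```python
def fit_line(current, std_dev):
    """Ordinary least squares fit of std_dev = slope * current + intercept."""
    n = len(current)
    c_bar = sum(current) / n
    s_bar = sum(std_dev) / n
    sxy = sum((c - c_bar) * (s - s_bar) for c, s in zip(current, std_dev))
    sxx = sum((c - c_bar) ** 2 for c in current)
    slope = sxy / sxx
    intercept = s_bar - slope * c_bar
    return slope, intercept


def residual_observations(current, std_dev, slope, intercept):
    """Observed standard deviation minus the load-dependent prediction."""
    return [s - (slope * c + intercept) for c, s in zip(current, std_dev)]


# Synthetic example: linear load effect plus a small disturbance.
current = [100.0, 200.0, 300.0, 400.0]
noise = [0.1, -0.1, -0.1, 0.1]
std_dev = [0.002 * c + 13.0 + e for c, e in zip(current, noise)]
slope, intercept = fit_line(current, std_dev)
y = residual_observations(current, std_dev, slope, intercept)
```

Because the disturbance in this synthetic example is uncorrelated with the current and has zero mean, the fit recovers the slope and intercept exactly and the residuals equal the disturbance.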
3.4.2 POHM Parameters Obtained by Expert Judgment
In the testing, the data collection interval was not constant: some data collection intervals were only 1 minute, while others were 30 minutes. To simplify the parameter estimation procedure, we selected the data that were collected at intervals of around 30 minutes, which gives 38 data files: 194, 197, 198, 199, 207, 211, 213, 221, 225, 226, 227, 238, 239, 240, 241, 252, 253, 255, 265, 267, 268, 269, 281, 282, 283, 284, 295, 296, 307, 308, 309, 310, 320, 322, 323, 334, 337, 338. The residual observations are
$$\mathbf{w}_1 = [\mathbf{y}(t_1)\ \ \mathbf{y}(t_2)\ \cdots\ \mathbf{y}(t_{38})] = \begin{bmatrix} y_1(t_1) & y_1(t_2) & \cdots & y_1(t_{38}) \\ y_2(t_1) & y_2(t_2) & \cdots & y_2(t_{38}) \end{bmatrix} \tag{3.45}$$
and $\mathbf{w}_1$ is plotted in the following figures
Figure 3.4 Residual Observations Collected by Accelerometers A02 and A03
From the above figures and subjective judgment, the residual observations collected by each accelerometer can be separated into two groups. The first group contains the data files from 194 to 295, and the second group contains the files from 296 to 338. Based on the analysis in Section 2, the gear system is considered to be in good state $s_1$ for the first group and in warning state $s_2$ for the second group. The time interval between samples is $\Delta h = 0.5$ hour. Assuming the process transits into the warning state from the good state in the middle of the data collection interval between files 295 and 296, the sojourn time of the good state is $t_{s_1} = 27.5 \Delta h$. The time spent in the warning state is $t_{s_2} = 10.5 \Delta h$, and the gearbox failure occurred at $t_f = 38 \Delta h$. Based on the history data $\mathbf{w}_1$, the model parameters can be estimated as follows. Since the process passed through the warning state before breaking down,
$$p_{13} = 0 \;\Rightarrow\; \frac{\lambda_{13}}{\lambda_{12} + \lambda_{13}} = 0 \;\Rightarrow\; \lambda_{13} = 0 \tag{3.46}$$
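The transition rate out of the warning state is estimated from its sojourn time as
$$v_2 = \frac{1}{t_{s_2}} = \frac{1}{10.5 \times 0.5} = 0.1905 \tag{3.47}$$
and $\lambda_{23} = v_2 = 0.1905$. The transition rate out of the good state is estimated from its sojourn time as
$$v_1 = \frac{1}{t_{s_1}} = \frac{1}{27.5 \times 0.5} = 0.07273 \tag{3.48}$$
and $\lambda_{12} = v_1 - \lambda_{13} = 0.07273$. When the gearbox is in the good state, the mean vector is calculated by
$$\boldsymbol{\mu}_1 = \begin{bmatrix} \dfrac{1}{27} \displaystyle\sum_{i=1}^{27} y_{1i} \\ \dfrac{1}{27} \displaystyle\sum_{i=1}^{27} y_{2i} \end{bmatrix} = \begin{bmatrix} 1.2356 \\ 1.8152 \end{bmatrix} \tag{3.49}$$
and the covariance matrix is
$$\boldsymbol{\Sigma}_1 = \frac{1}{26} \sum_{i=1}^{27} (\mathbf{y}_i - \boldsymbol{\mu}_1)(\mathbf{y}_i - \boldsymbol{\mu}_1)^{\mathrm{T}} = \begin{bmatrix} 14.0051 & 17.3106 \\ 17.3106 & 22.5165 \end{bmatrix} \tag{3.50}$$
When the gearbox is in the warning state, the mean vector is
$$\boldsymbol{\mu}_2 = \begin{bmatrix} \dfrac{1}{11} \displaystyle\sum_{i=28}^{38} y_{1i} \\ \dfrac{1}{11} \displaystyle\sum_{i=28}^{38} y_{2i} \end{bmatrix} = \begin{bmatrix} 26.3266 \\ 37.6970 \end{bmatrix} \tag{3.51}$$
and the covariance matrix is
$$\boldsymbol{\Sigma}_2 = \frac{1}{10} \sum_{i=28}^{38} (\mathbf{y}_i - \boldsymbol{\mu}_2)(\mathbf{y}_i - \boldsymbol{\mu}_2)^{\mathrm{T}} = \begin{bmatrix} 132.6485 & 189.6591 \\ 189.6591 & 285.0404 \end{bmatrix} \tag{3.52}$$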
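The expert-judgment estimates above are simple to reproduce. The sketch below recomputes the two transition rates from the sojourn times stated in this section and gives a small helper for the sample mean vector and the unbiased sample covariance matrix (divisor $n - 1$, matching the factors $1/26$ and $1/10$). The helper name and the three-point data set used to exercise it are hypothetical; the residual observations themselves are not reproduced here.

```python
def mean_and_covariance(observations):
    """Sample mean vector and unbiased covariance matrix of 2-D observations."""
    n = len(observations)
    mu = [sum(obs[d] for obs in observations) / n for d in range(2)]
    cov = [[sum((obs[r] - mu[r]) * (obs[c] - mu[c]) for obs in observations) / (n - 1)
            for c in range(2)]
           for r in range(2)]
    return mu, cov


# Sojourn times from the text: 27.5 and 10.5 sampling intervals of 0.5 hour.
delta_h = 0.5
t_s1 = 27.5 * delta_h
t_s2 = 10.5 * delta_h
v1 = 1.0 / t_s1           # rate of leaving the good state
v2 = 1.0 / t_s2           # rate of leaving the warning state
lam13 = 0.0               # no direct transition from good state to breakdown
lam12 = v1 - lam13
lam23 = v2

# Tiny hypothetical data set, used only to exercise the helper.
mu, cov = mean_and_covariance([(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)])
```

Rounded to the precision used in the text, `lam12` and `lam23` reproduce the values 0.07273 and 0.1905.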
3.4.3 Estimated POHM Parameters using the EM Iteration Algorithm
In the previous section, the POHM parameters were obtained by assuming that the deterioration process transited into the warning state at time $t_{s_1} = 27.5 \Delta h$, which required subjective judgment. In this section, the POHM parameters are estimated using the EM algorithm and compared with the results obtained in Section 3.4.2 using subjective judgment.
The number of iterations for the EM algorithm is set to 10, and the EM estimates of the mean vectors are shown in the following figures
Figure 3.5 EM Algorithm Estimated Mean Vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$
The dotted lines are the mean vectors estimated using subjective judgment, and the solid lines are the mean vectors estimated using the EM algorithm. The initial values are set to 0. The estimated mean values converge to the mean vectors obtained in the previous section. Starting with the 6th iteration, the relative differences are less than 0.3%. The following table gives the estimated means and their convergence, which is measured by the absolute changes of the updated estimates. Starting from iteration 6, the increments of all estimated means are less than 1.0E-5.
Table 3.1 EM Algorithm Iteration Results: Mean Vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$
Columns: iteration; A02 $\mu_1$ (Value, Convergence); A02 $\mu_2$ (Value, Convergence); A03 $\mu_1$ (Value, Convergence); A03 $\mu_2$ (Value, Convergence)
initial 0 0 0 0
1 -0.6312 1.00E+00 9.4310 1.00E+00 -0.4943 1.00E+00 13.4983 1.00E+00
2 1.1221 1.56E+00 23.9001 6.05E-01 1.6960 1.29E+00 34.1370 6.05E-01
3 1.2261 8.48E-02 26.1742 8.69E-02 1.8059 6.09E-02 37.4684 8.89E-02
4 1.2318 4.64E-03 26.2677 3.56E-03 1.8115 3.08E-03 37.6083 3.72E-03
5 1.2322 2.93E-04 26.2734 2.17E-04 1.8119 1.95E-04 37.6169 2.28E-04
6 1.2322 7.66E-06 26.2735 5.67E-06 1.8119 5.09E-06 37.6171 5.94E-06
7 1.2322 1.93E-07 26.2735 1.43E-07 1.8119 1.28E-07 37.6171 1.50E-07
8 1.2322 4.83E-09 26.2735 3.57E-09 1.8119 3.21E-09 37.6171 3.74E-09
9 1.2322 1.21E-10 26.2735 8.94E-11 1.8119 8.04E-11 37.6171 9.37E-11
10 1.2322 3.03E-12 26.2735 2.24E-12 1.8119 2.01E-12 37.6171 2.35E-12
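The "Convergence" columns are not defined by a formula in the text. The tabulated values appear to be consistent with the magnitude of the change in the estimate relative to the updated estimate, and the sketch below computes that quantity. This reading is an inference from the numbers in the tables, not a definition taken from the thesis, and the function name is hypothetical.

```python
def relative_change(new, old):
    """Magnitude of the update relative to the new estimate (inferred measure)."""
    return abs(new - old) / abs(new)


# Values taken from the first rows of the A02 mean column of Table 3.1.
first = relative_change(-0.6312, 0.0)      # initial value 0 -> iteration 1
second = relative_change(1.1221, -0.6312)  # iteration 1 -> iteration 2
third = relative_change(1.2261, 1.1221)    # iteration 2 -> iteration 3
```

Under this reading, the three results agree with the tabulated entries 1.00E+00, 1.56E+00 and 8.48E-02 to the precision shown.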
The following figures show each element of the estimated covariance matrices $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ for the good state and the warning state, respectively.
Figure 3.6 EM Algorithm Estimated Covariance Parameters $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$
The initial value for the estimation of the covariance matrices $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ is set to the symmetric matrix $\begin{bmatrix} 30 & 0 \\ 0 & 30 \end{bmatrix}$. The estimated covariance matrices converge towards the matrices calculated using subjective judgment, but small discrepancies remain between the covariances calculated by subjective judgment and by the EM algorithm. After the 5th iteration, the discrepancies are less than 4% for $\boldsymbol{\Sigma}_1$ and 9% for $\boldsymbol{\Sigma}_2$. These discrepancies may be caused by the standard error of a covariance matrix calculated from a very small sample of history data, since only one history is available for the covariance matrix calculation. Better estimates would be obtained as more data histories are collected. The following two tables give the EM-estimated covariance matrices and their convergence at each iteration. Starting from iteration 6, the increments of all elements in $\boldsymbol{\Sigma}_1$ are less than 1.0E-5.
Table 3.2 EM Algorithm Iteration Results: Covariance Matrix $\boldsymbol{\Sigma}_1$
Columns: iteration; $\Sigma_1(1,1)$ (Value, Convergence); $\Sigma_1(1,2) = \Sigma_1(2,1)$ (Value, Convergence); $\Sigma_1(2,2)$ (Value, Convergence)
1 11.3023 1.65E+00 15.3237 1.00E+00 21.6475 3.86E-01
2 16.7311 3.24E-01 20.8341 2.64E-01 27.0157 1.99E-01
3 13.5038 2.39E-01 16.6975 2.48E-01 21.7263 2.43E-01
4 13.4886 1.13E-03 16.6755 1.32E-03 21.6944 1.47E-03
5 13.4883 1.91E-05 16.6749 3.76E-05 21.6932 5.49E-05
6 13.4883 4.39E-07 16.6749 9.34E-07 21.6932 1.40E-06
7 13.4883 1.08E-08 16.6749 2.33E-08 21.6932 3.51E-08
8 13.4883 2.71E-10 16.6749 5.84E-10 21.6932 8.78E-10
9 13.4883 6.77E-12 16.6749 1.46E-11 21.6932 2.20E-11
10 13.4883 1.71E-13 16.6749 3.66E-13 21.6932 5.53E-13
As shown in the next table, the increments of all elements in $\boldsymbol{\Sigma}_2$ are less than 1.0E-5 after iteration 6.
Table 3.3 EM Algorithm Iteration Results: Covariance Matrix $\boldsymbol{\Sigma}_2$
Columns: iteration; $\Sigma_2(1,1)$ (Value, Convergence); $\Sigma_2(1,2) = \Sigma_2(2,1)$ (Value, Convergence); $\Sigma_2(2,2)$ (Value, Convergence)
1 270.1911 8.89E-01 384.8667 1.00E+00 553.3765 9.46E-01
2 367.2451 2.64E-01 525.9849 2.68E-01 764.8557 2.76E-01
3 128.2178 1.86E+00 183.7324 1.86E+00 275.8238 1.77E+00
4 121.5471 5.49E-02 173.8773 5.67E-02 261.3171 5.55E-02
5 121.4466 8.28E-04 173.7243 8.80E-04 261.0879 8.78E-04
6 121.4442 2.00E-05 173.7206 2.13E-05 261.0823 2.12E-05
7 121.4441 4.97E-07 173.7205 5.29E-07 261.0822 5.28E-07
8 121.4441 1.24E-08 173.7205 1.32E-08 261.0822 1.32E-08
9 121.4441 3.11E-10 173.7205 3.32E-10 261.0822 3.31E-10
10 121.4441 7.79E-12 173.7205 8.30E-12 261.0822 8.28E-12
The following figures show the EM estimation results for the transition rate parameters $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ when the initial values are set to 0.5.
Figure 3.7 EM Algorithm Estimated States Transition Parameters $\Lambda$
The estimated transition parameters converge to the state transition parameters based on subjective judgment. Starting from iteration 5, the relative differences between the EM-estimated state transition parameters and the parameters calculated in the previous section are less than 0.3%. The following table gives the estimated values and their convergence at each iterative step. Starting from iteration 6, the increments of all estimated parameters are less than 1.0E-5.
Table 3.4 EM Algorithm Iteration Results: States Transition Parameters $\Lambda$
$\lambda_{12}$ $\lambda_{13}$ $\lambda_{23}$
Value Convergence Value Convergence Value Convergence
initial 0.5 0.5 0.5
1 0.5003 6.87E-04 0.0588 7.50E+00 0.0000 4.00E+05
2 0.0764 5.55E+00 0.1691 6.52E-01 0 0
3 0.0729 4.78E-02 0.1892 1.06E-01 0 0
4 0.0728 1.77E-03 0.1900 4.58E-03 0 0
5 0.0728 1.07E-04 0.1901 2.80E-04 0 0
6 0.0728 2.85E-06 0.1901 7.44E-06 0 0
7 0.0728 7.19E-08 0.1901 1.88E-07 0 0
8 0.0728 1.80E-09 0.1901 4.70E-09 0 0
9 0.0728 4.51E-11 0.1901 1.18E-10 0 0
10 0.0728 1.13E-12 0.1901 2.95E-12 0 0
Based on the above analysis, all the parameters estimated using the EM algorithm converged to
the values calculated using subjective judgment. Some small discrepancies exist for the
covariance matrix estimates. As mentioned, only one history is available for this study. The
accuracy of the estimation should increase if more data histories are collected.
3.5 Conclusions
In this chapter, a POHM model was built to describe the relationship between the residual
observations and the three states of a gear deterioration process. Once the POHM model was
well defined, two sets of parameters needed to be estimated: 1) the relation parameters
$\Psi = (\mu_1, \Sigma_1, \mu_2, \Sigma_2)$ that describe the stochastic relationship between the residual
observations and the unobservable operational states of the gearbox transmission system, and 2) the
transition rate parameters $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ that describe the evolution of the gear
deterioration process. It was very difficult to find a feasible solution by directly maximizing the
likelihood function of the POHM model. The EM iterative algorithm was introduced to make
the parameter estimation possible by maximizing a pseudo-likelihood function.
Finally, a POHM model was built using full lifetime condition monitoring data obtained from the
MDTB operating from good state to breakdown state. The condition monitoring data files with
equal data collection interval were selected, and residual observations were extracted from motor
current data and vibration data. The EM iterative algorithm provided a feasible solution for the
pseudo-likelihood function of the POHM model. The numerical iteration procedure generated
reasonable parameter estimates compared to our results from subjective judgment. It was
confirmed that the POHM model can be applied to estimate the hidden gear states based on the
condition monitoring data. In the next chapter, a Bayesian process control framework will be
developed to optimize the maintenance policy and to schedule the optimal epochs for condition
monitoring data collection.
Chapter 4 Bayesian Process Control to Optimize Gearbox CBM Policy
4.1 Introduction
In chapter 2, we presented the method to detect an early gear defect occurrence using the
vibration data collected from the gear transmission system testing. There were 338 vibration
data collections during 116 hours. At the end of the test, the system was shut down and
inspection took place. Five fully broken and two partially broken teeth were observed on the
driven gear. In chapter 3, a POHM model was built to describe the gear deterioration process as
well as the stochastic relationship between the condition monitoring data and the unobservable
states. This raises the following question: what is the best time to stop the gear system for
inspection? If we stop the system and inspect it too early, no gear defect will have formed yet.
If we stop the system too late, the defect will have propagated to other
parts of the system and the maintenance cost will be higher than the cost at the optimal
maintenance epoch. We may consider stopping the system every half hour, so that we would not
miss the optimal maintenance epoch. However, such a high inspection frequency would significantly
increase the system downtime and the cost of inspection. In order to find the optimal
maintenance epoch, a Bayesian process control model is developed to optimize the gear
transmission system inspection schedule by minimizing the total expected cost. A Bayesian
approach is used to determine the optimal CBM strategy based on the posterior probability that
the gear deteriorating process is out of control, by minimizing either the total expected cost over a finite
horizon or the long-run expected average cost.
In 1952, Girshick & Rubin [34] first introduced a Bayesian approach to the control of a
production process. They showed that the process is stopped for repair when the posterior
probability is greater than a control limit. Taylor [28], [29] presented a method of
selecting the posterior probability control limit and proved that the Bayesian approach is an
optimal way to determine the control parameters compared with non-Bayesian charts. Several
Bayesian control charts have been proposed for monitoring a univariate process
mean in a finite production run: Carter [41] was the first to discuss issues related to the optimal
selection of sample size and sampling intervals in the Bayesian process control context; Tagaras
[25], [26] proposed Bayesian X charts with adaptive sample sizes and sampling
intervals; Porteus & Angelus [18] applied Bayesian charts to identify opportunities for cost
reduction in statistical process control; Nenes & Tagaras [23] derived properties that facilitated
the cost computation and the optimization of a Bayesian chart, and evaluated the effectiveness of
the chart by comparing its costs against the expected costs; Nikolaidis et al. [60] presented the
economic design of Shewhart and Bayesian control charts for monitoring a critical stage of the
main production process of a tile manufacturer. Makis [52] presented a theoretical development
of a multivariate Bayesian control chart with fixed sample size and sampling interval for finite
production runs. He proved that the Bayesian control policy is optimal under standard operating
and cost assumptions.
In this chapter, we propose a Bayesian process control framework that allows optimal
computation of the two decision variables: the time interval to the next data collection and the
control limit at the next data collection epoch. The decision variables are calculated by minimizing
the total expected cost from the current stage to the end of the production run. Dynamic
programming is applied to update the decision variables after new monitoring data are acquired,
and the maintenance decision is made by taking all available information into account. Section
4.2 presents the problem statement and assumptions. In section 4.3, a Bayesian process control
model is derived to formulate the relationship between the decision variables and the total
expected costs. Section 4.4 introduces the development of the maintenance cost function.
In section 4.5, dynamic programming is introduced to optimize the time interval for next data
collection and the control limit at next data collection epoch by minimizing the total expected
cost. Section 4.6 presents numerical results.
4.2 Problem Statement and Assumptions
In this chapter, a Bayesian process control model will be developed to make an optimal
maintenance decision for the gear transmission system. The time interval to the next data collection
and the control limit at the next data collection epoch are calculated based on the on-line condition
monitoring data. If the posterior probability that the gear deterioration process is in warning state
exceeds the control limit, the system will be immediately stopped and inspected. Otherwise, the gear deterioration
process is considered to be in good state $s_1$. The system will keep running and the inspection
decision will be made at the next data collection epoch. The Bayesian process control is
designed based on the POHM model that was built in chapter 3. The assumptions of section 3.2.1
are kept for the Bayesian process control model.
Based on the results from chapter 3, the mean vector and covariance matrix of the deterioration
process will increase if the gear state transits from good state $s_1$ to warning state $s_2$. In this
chapter, we assume that the mean vector of the gear deterioration process shifts from $\mu_1$ to $\mu_2$ with
an unchanged covariance matrix, i.e. $\Sigma_1 = \Sigma_2 = \Sigma$, when the gear state transits from $s_1$ into $s_2$.
The shift size of the mean vector can be measured by the Mahalanobis distance (M-distance), which is
defined as
$$d = \left[(\mu_2 - \mu_1)^{T}\,\Sigma^{-1}\,(\mu_2 - \mu_1)\right]^{1/2} > 0 \tag{4.1}$$
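As an illustration of Equation (4.1), the following minimal Python sketch evaluates the M-distance for two-dimensional residuals. The function name `mahalanobis_2d` is hypothetical, and the numerical inputs are the parameter values quoted later in Section 4.6, with the component ordering assumed from the tables of chapter 3.

```python
import math

def mahalanobis_2d(mu1, mu2, sigma):
    """M-distance d between two 2-D mean vectors under a shared 2x2 covariance."""
    (a, b), (c, e) = sigma
    det = a * e - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Explicit inverse of a 2x2 matrix.
    inv = ((e / det, -b / det), (-c / det, a / det))
    dx = (mu2[0] - mu1[0], mu2[1] - mu1[1])
    # Quadratic form (mu2 - mu1)^T Sigma^{-1} (mu2 - mu1).
    q = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
    return math.sqrt(q)

# Sanity check: with the identity covariance, d is the Euclidean distance.
d_unit = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))

# Section 4.6 parameter values (component ordering is an assumption).
d_thesis = mahalanobis_2d(
    (1.2322, 1.8119),
    (26.2735, 37.6171),
    ((13.4833, 16.6749), (16.6749, 21.6932)),
)
```

The explicit 2x2 inverse keeps the sketch free of external dependencies; for higher-dimensional residuals a linear-algebra library would be the more natural choice.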
There are two main reasons to assume that the covariance matrix remains constant after the gear
state transits from $s_1$ into $s_2$. Firstly, the constant covariance matrix dramatically simplifies the
development of the Bayesian control model. Using the constant covariance matrix, an explicit
expression of the control limit in terms of the posterior probability can be obtained. Secondly, the constant
covariance matrix reduces the computation time, which makes the Bayesian control model more
suitable for on-line application. Hence, we can assume that the time needed for each data collection
and analysis can be ignored, and that the gear state will not change during
data collection and data analysis.
In addition, we assume a finite production run, and the length of the production run is denoted as
$t_H$. At the end of the production run the gear system will be stopped and a full inspection will
be performed no matter what maintenance decision is made by the control chart. If any defect is
found during the inspection, the gear system will be repaired or replaced and the gear
transmission system is returned to the good state. The gear system will then be reset for a new
production run. There are $H-1$ potential data collection epochs equally spaced from time
$t_0 = 0$ to $t_H$. The Bayesian control chart will decide whether the condition monitoring data will be
collected at each potential epoch. The time interval between any two adjacent potential data
collection epochs is obtained by
$$\Delta_h = \frac{t_H}{H} \tag{4.2}$$
Then the potential data collection epochs can be written as $t_i = i \cdot \Delta_h$, where $i = 1, 2, \ldots, H-1$.
In the Bayesian control model, the decision variables are determined by minimizing the total
expected cost. There are four costs that will be considered in the cost function:
1) $C_0$ denotes the cost of collecting and analyzing the condition monitoring data. $C_0$ is
incurred at each data collection epoch.
2) $C_1$ denotes the inspection cost. If the posterior probability of the monitored process
exceeds its control limit, the gear system will be stopped and inspected. If the gear is found
to be in good state during the inspection, no maintenance is needed: the control
chart made a wrong decision about the actual state of the gear. This situation is called a
false alarm. The inspection cost $C_1$ includes the cost of the inspection and of the system
downtime due to the inspection.
3) $C_2$ denotes the maintenance cost for repair or replacement when the control chart gives a true
alarm, i.e. some gear defects are found during the inspection. The total cost of a true
alarm includes the downtime cost, inspection cost, and maintenance cost, i.e. $C_1 + C_2$. After
the restoration, the gear system is considered to be in good state and the posterior probability
is reset to 0.
4) $C_3$ denotes the breakdown cost for the gear transmission system. The cost of breakdown
state $s_3$ is higher than the maintenance cost for warning state $s_2$, i.e. $C_3 > C_2$. If
this assumption were not satisfied, it would be more economical to replace the gear
transmission system after it breaks down instead of applying preventive maintenance. If the
system enters the warning state and no repair or replacement is applied, the defects will
propagate to the rest of the system until the whole system breaks down. Since it is very
difficult to measure the exact loss rate of the system at each time, an average loss rate is used
to approximate the true loss rate due to the gear running in warning state. If the mean sojourn time
in warning state is $1/v_2 > 0$, the loss rate during operation in warning state is approximated
by $v_2 \cdot (C_3 - C_2)$.
4.3 The Bayesian model
In chapter 3, we assumed that the time spent in each state follows an exponential distribution with
mean $1/v_i$, $i = 1, 2$, with density function
$$f_{T_{s_i}}(\tau; v_i) = \begin{cases} v_i\, e^{-v_i \tau}, & \tau > 0, \\ 0, & \text{elsewhere.} \end{cases} \tag{4.3}$$
The condition monitoring data are collected at time $t_i$ with $i \in \{1, 2, \ldots, H-1\}$, and the
control chart decides the next data collection epoch $t_{i+j}$. The time interval to the
next data collection is denoted as $j \cdot \Delta_h = t_{i+j} - t_i$. If the gear deteriorating process is in good
state at $t_i$, the probability that the gear deteriorating process stays in good state at $t_{i+j}$ is
$$\Pr\left(T_{s_1} > t_{i+j} \mid T_{s_1} > t_i\right) = \Pr\left(T_{s_1} > j \cdot \Delta_h\right) = 1 - \Pr\left(T_{s_1} \le j \cdot \Delta_h\right) = 1 - \int_0^{j \cdot \Delta_h} v_1 e^{-v_1 \tau}\, d\tau = e^{-v_1 \cdot j \cdot \Delta_h} \tag{4.4}$$
and the probability that the gear deteriorating process transits into warning state in the time
interval $j \cdot \Delta_h$ is
$$1 - \Pr\left(T_{s_1} > t_{i+j} \mid T_{s_1} > t_i\right) = 1 - e^{-v_1 \cdot j \cdot \Delta_h} \tag{4.5}$$
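As a quick numerical illustration of Equations (4.4) and (4.5), the worked values below assume the parameters quoted later in Section 4.6 ($v_1 = 0.0728$, $\Delta_h = 0.5$); the choices $j = 1$ and $j = 4$ are arbitrary.

```latex
\begin{aligned}
j = 1:&\quad e^{-v_1 \cdot j \cdot \Delta_h} = e^{-0.0364} \approx 0.9643,
      &\quad 1 - e^{-0.0364} \approx 0.0357, \\
j = 4:&\quad e^{-v_1 \cdot j \cdot \Delta_h} = e^{-0.1456} \approx 0.8645,
      &\quad 1 - e^{-0.1456} \approx 0.1355.
\end{aligned}
```

Under these assumed values, waiting four time units instead of one before the next data collection raises the probability of an undetected transition into warning state from about 3.6% to about 13.6%.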
4.3.1 Posterior Probability of Gear Deteriorating Process in Warning State
In this section, $P_i$ denotes the posterior probability that the gear deteriorating process is in
warning state, and $P_i$ is updated based on the condition monitoring data collected at time $t_i$.
$\tilde{P}_{i+j}$ is the prior probability that the process is in warning state at time $t_{i+j}$ given all the
observations up to $t_i$. Makis [51] has proven that $\tilde{P}_{i+j}$ can be obtained by
$$\tilde{P}_{i+j} = 1 - (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h} \tag{4.6}$$
The prior probability that the process is in good state at time $t_{i+j}$ given all the observations up to
$t_i$ can be represented as
$$1 - \tilde{P}_{i+j} = (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h} \tag{4.7}$$
At time $t_{i+j}$ the residual observation $\mathbf{y}(t_{i+j})$ is calculated from the newly collected condition
monitoring data. By using Bayes' theorem, the posterior probability $P_{i+j}$ that the gear
deteriorating process is in warning state at time $t_{i+j}$ can be represented as
$$P_{i+j} = \frac{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right) \cdot \tilde{P}_{i+j}}{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_1\right) \cdot (1 - \tilde{P}_{i+j}) + f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right) \cdot \tilde{P}_{i+j}} \tag{4.8}$$
Substituting Equation (4.6) into the above equation, the posterior probability at time $t_{i+j}$ becomes
$$P_{i+j} = \frac{1 - (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h}}{\dfrac{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_1\right)}{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right)} \cdot (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h} + \left[1 - (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h}\right]} \tag{4.9}$$
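where $f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_1\right)$ and $f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right)$ are defined as multivariate
normal densities in chapter 3, that is
$$\frac{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_1\right)}{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right)} = \frac{\dfrac{1}{(2\pi)^{q/2} |\Sigma_1|^{1/2}} \exp\left[-\dfrac{1}{2}\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma_1^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_1\right)\right]}{\dfrac{1}{(2\pi)^{q/2} |\Sigma_2|^{1/2}} \exp\left[-\dfrac{1}{2}\left(\mathbf{y}(t_{i+j}) - \mu_2\right)^{T} \Sigma_2^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_2\right)\right]} \tag{4.10}$$
In section 4.2, the assumption $\Sigma_1 = \Sigma_2 = \Sigma$ was made to simplify the Bayesian process
control model, so the above equation can be written as
$$\begin{aligned}
\frac{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_1\right)}{f_{Y|X}\left(\mathbf{y}(t_{i+j}) \mid X(t_{i+j}) = s_2\right)}
&= \frac{\dfrac{1}{(2\pi)^{q/2} |\Sigma|^{1/2}} \exp\left[-\dfrac{1}{2}\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_1\right)\right]}{\dfrac{1}{(2\pi)^{q/2} |\Sigma|^{1/2}} \exp\left[-\dfrac{1}{2}\left(\mathbf{y}(t_{i+j}) - \mu_2\right)^{T} \Sigma^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_2\right)\right]} \\
&= \exp\left\{-\frac{1}{2}\left[\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_1\right) - \left(\mathbf{y}(t_{i+j}) - \mu_2\right)^{T} \Sigma^{-1} \left(\mathbf{y}(t_{i+j}) - \mu_2\right)\right]\right\} \\
&= \exp\left[-\frac{1}{2}\left(z_{i+j} - d^2\right)\right]
\end{aligned} \tag{4.11}$$
where $d$ is the M-distance defined as
$$d = \left[(\mu_2 - \mu_1)^{T}\,\Sigma^{-1}\,(\mu_2 - \mu_1)\right]^{1/2}, \tag{4.12}$$
and
$$z_{i+j} = 2\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1). \tag{4.13}$$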
Hence, substituting Equation (4.11) into Equation (4.9), the posterior probability $P_{i+j}$ that the
process is in warning state at time $t_{i+j}$ becomes
$$P_{i+j} = \frac{1 - (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h}}{\exp\left[-\dfrac{1}{2}\left(z_{i+j} - d^2\right)\right] \cdot (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h} + \left[1 - (1 - P_i) \cdot e^{-v_1 \cdot j \cdot \Delta_h}\right]} \tag{4.14}$$
From the above equation, the posterior probability $P_{i+j}$ is a function of $P_i$ and $\mathbf{y}(t_{i+j})$. $P_i$ is the
posterior probability that the process is in warning state at $t_i$. $\mathbf{y}(t_{i+j})$ is the residual observation
extracted from the condition monitoring data collected at $t_{i+j}$. Since the process is in good state
at $t_0 = 0$, the posterior probability of the process being in warning state at $t_0$ is
$$P_0 = 0 \tag{4.15}$$
Each time new data are collected, the posterior probability is updated recursively using
Equation (4.14).
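The recursive update can be sketched in a few lines of Python. This is a minimal illustration of Equations (4.6), (4.7), (4.11) and (4.14), not the implementation used in the thesis; the function name `posterior_warning` and the overflow guard are assumptions added here.

```python
import math

def posterior_warning(p_prev, z, d, v1, j, dh):
    """Posterior probability of the warning state, following Equation (4.14).

    p_prev : posterior probability P_i at the previous collection epoch
    z      : scalar statistic z_{i+j} from Equation (4.13)
    d      : M-distance from Equation (4.12)
    v1     : transition rate out of the good state
    j, dh  : number of time units to the next collection and unit length
    """
    stay_good = (1.0 - p_prev) * math.exp(-v1 * j * dh)  # Eq. (4.7)
    prior = 1.0 - stay_good                               # Eq. (4.6)
    log_ratio = -0.5 * (z - d * d)                        # log of Eq. (4.11)
    if log_ratio > 700.0:
        # exp() would overflow: the observation overwhelmingly favours the good state.
        return 0.0 if stay_good > 0.0 else 1.0
    return prior / (math.exp(log_ratio) * stay_good + prior)

# Starting from P_0 = 0 (Equation (4.15)); when z equals d^2 the likelihood
# ratio is 1, so the posterior equals the prior.
p_neutral = posterior_warning(0.0, 4.0, 2.0, 0.0728, 1, 0.5)
```

Larger values of `z` move the posterior towards 1, and smaller values move it towards 0, which is the behaviour the control limit in Section 4.3.3 relies on.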
4.3.2 Distribution of $z_{i+j}$
In Equation (4.13), $\mathbf{y}(t_{i+j})$ is a column vector that follows a multivariate normal distribution, and
$z_{i+j}$ is a scalar that follows a univariate distribution. Using Equation (4.13), the multivariate
problem is reduced to a univariate problem. In what follows, we show that $z_{i+j}$ is
normally distributed as $N\left[0, (2d)^2\right]$ if the gear deteriorating process is in good state,
and as $N\left[2d^2, (2d)^2\right]$ if the process is in warning state.
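If the gear deteriorating process is in good state, $\mathbf{y}(t_{i+j})$ has a multivariate normal distribution
with mean $\mu_1 = \mathrm{E}\left[\mathbf{y}(t_{i+j})\right]$ and covariance matrix $\Sigma = \mathrm{Cov}\left[\mathbf{y}(t_{i+j})\right]$. $z_{i+j}$ is a linear combination
of the components of $\mathbf{y}(t_{i+j})$, so $z_{i+j}$ is normally distributed by the properties of the multivariate normal
distribution. The mean of $z_{i+j}$ is
$$\begin{aligned}
\mathrm{E}(z_{i+j}) &= \mathrm{E}\left[2\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1)\right] \\
&= 2\left(\mathrm{E}\left[\mathbf{y}(t_{i+j})\right] - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1) \\
&= 2\left(\mu_1 - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1) = 0
\end{aligned} \tag{4.16}$$
and the variance of $z_{i+j}$ is (see e.g. Johnson & Wichern [43])
$$\begin{aligned}
\mathrm{Var}(z_{i+j}) &= \mathrm{Cov}(z_{i+j}) \\
&= \mathrm{Cov}\left[2\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1)\right] \\
&= 4\left[\Sigma^{-1} (\mu_2 - \mu_1)\right]^{T} \cdot \mathrm{Cov}\left[\mathbf{y}(t_{i+j}) - \mu_1\right] \cdot \Sigma^{-1} (\mu_2 - \mu_1) \\
&= 4\,(\mu_2 - \mu_1)^{T}\, \Sigma^{-1} \Sigma\, \Sigma^{-1} (\mu_2 - \mu_1) \\
&= 4\,(\mu_2 - \mu_1)^{T}\, \Sigma^{-1} (\mu_2 - \mu_1) \\
&= (2d)^2
\end{aligned} \tag{4.17}$$
If the gear deteriorating process is in warning state, $\mathbf{y}(t_{i+j})$ has a multivariate normal
distribution with mean $\mu_2 = \mathrm{E}\left[\mathbf{y}(t_{i+j})\right]$ and covariance matrix $\Sigma = \mathrm{Cov}\left[\mathbf{y}(t_{i+j})\right]$. The mean of $z_{i+j}$
is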
$$\begin{aligned}
\mathrm{E}(z_{i+j}) &= \mathrm{E}\left[2\left(\mathbf{y}(t_{i+j}) - \mu_1\right)^{T} \Sigma^{-1} (\mu_2 - \mu_1)\right] \\
&= 2\,(\mu_2 - \mu_1)^{T}\, \Sigma^{-1} (\mu_2 - \mu_1) \\
&= 2d^2
\end{aligned} \tag{4.18}$$
and the variance of $z_{i+j}$ remains
$$\mathrm{Var}(z_{i+j}) = (2d)^2 \tag{4.19}$$
Based on the above analysis, the standard deviation $\sigma_z = 2d$ of $z$ does not change when the gear
deterioration process transits from good state into warning state. However, the mean of $z$ shifts
from 0 to $2d^2$. From Equation (4.13), we can conclude that the univariate control
chart designed to detect the mean shift of the univariate process $z_{i+j}$ is equivalent to the
multivariate control chart designed to detect the mean shift of the residual observation $\mathbf{y}(t_{i+j})$.
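The reduction from the residual vector to the scalar statistic can be checked numerically. The sketch below evaluates Equation (4.13) for two-dimensional residuals and confirms that $z$ is 0 at the good-state mean and $2d^2$ at the warning-state mean. The function name `z_statistic` and the small test vectors are illustrative assumptions, not values from the thesis.

```python
def z_statistic(y, mu1, mu2, sigma):
    """Scalar z = 2 (y - mu1)^T Sigma^{-1} (mu2 - mu1) for 2-D residuals, Eq. (4.13)."""
    (a, b), (c, e) = sigma
    det = a * e - b * c
    # Explicit inverse of the 2x2 covariance matrix.
    inv = ((e / det, -b / det), (-c / det, a / det))
    dy = (y[0] - mu1[0], y[1] - mu1[1])
    dm = (mu2[0] - mu1[0], mu2[1] - mu1[1])
    return 2.0 * sum(dy[r] * inv[r][s] * dm[s] for r in range(2) for s in range(2))

# Hypothetical parameters chosen only to exercise the formula.
mu1, mu2 = (0.0, 0.0), (2.0, 1.0)
sigma = ((2.0, 0.5), (0.5, 1.0))

z_at_good_mean = z_statistic(mu1, mu1, mu2, sigma)     # mean of z in good state
z_at_warning_mean = z_statistic(mu2, mu1, mu2, sigma)  # mean of z in warning state
```

For these inputs $d^2 = 4/1.75$, so `z_at_warning_mean` equals $2d^2 = 8/1.75$, in line with Equation (4.18).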
4.3.3 Control Limit of Gear Deteriorating Process in Good State
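First, we rearrange Equation (4.14) as
$$\begin{aligned}
z_{i+j} &= d^2 + 2 \ln\left[\frac{P_{i+j}\, e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)}{(1 - P_{i+j}) \left(1 - e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)\right)}\right] \\
&= 0 + \left\{\frac{d}{2} + \frac{1}{d} \ln\left[\frac{P_{i+j}\, e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)}{(1 - P_{i+j}) \left(1 - e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)\right)}\right]\right\} \cdot 2d \\
&= \mu_z + \left\{\frac{d}{2} + \frac{1}{d} \ln\left[\frac{P_{i+j}\, e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)}{(1 - P_{i+j}) \left(1 - e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)\right)}\right]\right\} \sigma_z
\end{aligned} \tag{4.20}$$
where $\mu_z = 0$ is the mean of $z_{i+j}$ and $\sigma_z = 2d$ is the standard deviation of $z_{i+j}$ with the
process in good state. The above equation can be written as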
$$z_{i+j} = \mu_z + k_{i+j} \cdot \sigma_z \tag{4.21}$$
where
$$k_{i+j} = \frac{d}{2} + \frac{1}{d} \ln\left[\frac{P_{i+j}\, e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)}{(1 - P_{i+j}) \left(1 - e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)\right)}\right] \tag{4.22}$$
Assuming we have calculated an optimal $k^*_{i+j^*}$ and the time interval to the next data collection $j^* \Delta_h$ by
minimizing the total cost function, we apply $k^*_{i+j^*}$ to detect the mean shift of $z_{i+j^*}$. If
$z_{i+j^*} \le \mu_z + k^*_{i+j^*} \cdot \sigma_z$, the mean of $z_{i+j^*}$ is considered unchanged. If
$z_{i+j^*} > \mu_z + k^*_{i+j^*} \cdot \sigma_z$, the mean of $z_{i+j^*}$ is considered to have shifted.
Given $j^* \Delta_h$ and $k^*_{i+j^*}$, the posterior probability control limit of the process in good state, $P^*_{i+j^*}$,
can be calculated from the following equation
$$k^*_{i+j^*} = \frac{d}{2} + \frac{1}{d} \ln\left[\frac{P^*_{i+j^*}\, e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)}{(1 - P^*_{i+j^*}) \left(1 - e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)\right)}\right] \tag{4.23}$$
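Rearranging the above equation gives the posterior probability control limit
$$\begin{aligned}
P^*_{i+j^*} &= \frac{1 - e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)}{1 - e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i) + e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i) \cdot \exp\left(\dfrac{d^2}{2} - d\, k^*_{i+j^*}\right)} \\
&= \left\{1 + \left[\frac{e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)}{1 - e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)}\right] \cdot \exp\left(\frac{d^2}{2} - d\, k^*_{i+j^*}\right)\right\}^{-1}
\end{aligned} \tag{4.24}$$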
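Equations (4.23) and (4.24) are inverses of each other, which can be verified with a short round-trip computation. The following Python sketch is illustrative only; the function names `k_from_limit` and `limit_from_k` and the numerical inputs are assumptions.

```python
import math

def k_from_limit(p_limit, p_prev, d, v1, j, dh):
    """Equation (4.23): chart coefficient k* for a posterior control limit P*."""
    a = (1.0 - p_prev) * math.exp(-v1 * j * dh)  # prior probability of the good state
    odds = (p_limit * a) / ((1.0 - p_limit) * (1.0 - a))
    return d / 2.0 + math.log(odds) / d

def limit_from_k(k, p_prev, d, v1, j, dh):
    """Equation (4.24): posterior control limit P* for a chart coefficient k*."""
    a = (1.0 - p_prev) * math.exp(-v1 * j * dh)
    return (1.0 - a) / (1.0 - a + a * math.exp(d * d / 2.0 - d * k))

# Round trip with hypothetical inputs: P* -> k* -> P*.
k_star = k_from_limit(0.3, 0.1, 2.0, 0.0728, 2, 0.5)
p_star = limit_from_k(k_star, 0.1, 2.0, 0.0728, 2, 0.5)
```

Both functions require $0 < P^* < 1$ and a strictly positive interval $j \Delta_h$ (or $P_i > 0$), otherwise the logarithm or the division is undefined.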
If the mean of $z_{i+j^*}$ does not shift, we have
$$\mu_z + k_{i+j^*} \cdot \sigma_z \le \mu_z + k^*_{i+j^*} \cdot \sigma_z \;\Rightarrow\; k_{i+j^*} \le k^*_{i+j^*} \tag{4.25}$$
which is equivalent to
$$P_{i+j^*} \le P^*_{i+j^*} \tag{4.26}$$
At time $t_i$, $j^* \Delta_h$ (the optimal time interval to the next data collection) and $P^*_{i+j^*}$ (the optimal
posterior probability control limit at the next data collection epoch) are calculated by dynamic
programming. At time $t_{i+j^*}$, after the new data are collected, the posterior probability $P_{i+j^*}$ is
calculated using Equation (4.14). The following policy is applied to decide whether an inspection will be
performed:
a) No inspection is needed if $P_{i+j^*} \le P^*_{i+j^*}$;
b) Stop and inspect the system immediately if $P_{i+j^*} > P^*_{i+j^*}$.
If the posterior probability $P_{i+j^*}$ is larger than the optimal control limit $P^*_{i+j^*}$, the
mean of the residual observation may have shifted, so an inspection is carried out to check whether gear
defects have formed. If $P_{i+j^*} \le P^*_{i+j^*}$, the gear should still be in good state, so no
inspection is carried out. The time interval to the next data collection and the next control limit
are then decided using all the information available up to time $t_{i+j^*}$.
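4.4 Cost Function for Gear Deterioration Process
The cost function considered in this thesis has a structure similar to the one considered in
Tagaras [25]. When $t_{i+j}$ is less than the length of the production run $t_H$, the expected cost incurred
between $t_i$ and $t_{i+j}$ is
$$\begin{aligned}
V(P_i, j, P^*_{i+j}) = {} & P_i \cdot \left[C_0 + \alpha_1(k^*_{i+j}) \cdot (C_1 + C_2) + j \Delta_h \cdot v_2 (C_3 - C_2)\right] \\
& + (1 - P_i) \cdot \Big\{ C_0 + e^{-v_1 j \Delta_h}\, \alpha_0(k^*_{i+j})\, C_1 + \left(1 - e^{-v_1 j \Delta_h}\right) \alpha_1(k^*_{i+j}) (C_1 + C_2) \\
& + \left(1 - e^{-v_1 j \Delta_h}\right) \left[ j \Delta_h - \frac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)} \right] v_2 (C_3 - C_2) \Big\}
\end{aligned} \tag{4.27}$$
where the probability of a false alarm (type I error) is
$$\alpha_0(k^*_{i+j}) = \Phi(-k^*_{i+j}) = \int_{-\infty}^{-k^*_{i+j}} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du \tag{4.28}$$
and $\alpha_1(k^*_{i+j})$ is the probability of detecting a mean shift (1 - Pr(type II error)) when
some minor defects have been initiated:
$$\alpha_1(k^*_{i+j}) = \Phi(d - k^*_{i+j}) = \int_{-\infty}^{d - k^*_{i+j}} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du \tag{4.29}$$
Both expressions follow from section 4.3.2: the chart signals when $z_{i+j} > \mu_z + k^*_{i+j} \sigma_z$, and the standardized statistic
$z_{i+j} / \sigma_z$ is $N(0, 1)$ in good state and $N(d, 1)$ in warning state. The coefficient $k^*_{i+j}$ is given by
$$k^*_{i+j} = \frac{d}{2} + \frac{1}{d} \ln\left[\frac{P^*_{i+j}\, e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)}{(1 - P^*_{i+j}) \left(1 - e^{-v_1 \cdot j \cdot \Delta_h} (1 - P_i)\right)}\right] \tag{4.30}$$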
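The two alarm probabilities can be evaluated with the standard normal distribution function. The sketch below assumes the signal rule $z > \mu_z + k \sigma_z$ together with the distributions derived in Section 4.3.2, so that the standardized statistic is $N(0, 1)$ in good state and $N(d, 1)$ in warning state. The function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def false_alarm_prob(k):
    """alpha_0: Pr(chart signals | good state), assuming z / sigma_z ~ N(0, 1)."""
    return std_normal_cdf(-k)

def detection_prob(k, d):
    """alpha_1: Pr(chart signals | warning state), assuming z / sigma_z ~ N(d, 1)."""
    return std_normal_cdf(d - k)

# Hypothetical coefficient and shift size, for illustration only.
alpha0 = false_alarm_prob(3.0)
alpha1 = detection_prob(3.0, 2.0)
```

For any positive shift size `d`, the detection probability exceeds the false alarm probability at the same coefficient `k`, which is a useful sanity check when implementing the cost function.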
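When $t_{i+j} = t_H$, the expected cost becomes
$$V(P_i, j) = P_i \cdot v_2 (C_3 - C_2) \cdot j \Delta_h + (1 - P_i) \cdot \left(1 - e^{-v_1 j \Delta_h}\right) \cdot v_2 (C_3 - C_2) \cdot \left[ j \Delta_h - \frac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)} \right] \tag{4.31}$$
In the following subsections, we show the derivation of the above cost functions.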
4.4.1 Cost Function before the End of the Production Run
At time $t_i$, the condition monitoring data are collected and analyzed. The posterior probability
$P_i$ of the process being in warning state is calculated using the new residual observation $\mathbf{y}(t_i)$
obtained at $t_i$, and the control limit $P^*_i$ was calculated at the previous data collection epoch. When
$(z_i - \mu_z)/\sigma_z \le k^*_i$, we decide the epoch of the next data collection $t_{i+j} < t_H$, where
$j \in \{1, 2, \ldots, H-i-1\}$. If the gear's actual state is the warning state at $t_i$ and an inspection is applied
at $t_{i+j}$, the cost includes three components:
1) $C_0$ is the cost of collecting and analyzing the condition monitoring data at $t_{i+j}$;
2) $\alpha_1(k^*_{i+j}) \cdot (C_1 + C_2)$ is the expected cost of applying an inspection and performing preventive
maintenance when the Bayesian control chart gives a true alarm (i.e. the gear system
is in warning state);
3) $j \Delta_h \cdot v_2 (C_3 - C_2)$ is the cost due to the gear running in warning state, which is calculated
by multiplying the time interval between $t_i$ and $t_{i+j}$ by the loss rate when the gear
transmission system operates in warning state.
If the gear's actual state is the good state at time $t_i$, the cost includes four components:
1) $C_0$ is the cost of collecting and analyzing the condition monitoring data at $t_{i+j}$;
2) $e^{-v_1 j \Delta_h}\, \alpha_0(k^*_{i+j})\, C_1$ is the expected cost of a false alarm. $e^{-v_1 j \Delta_h}$ is the probability that the gear
deteriorating process stays in good state in the time interval $j \Delta_h$. $\alpha_0(k^*_{i+j})$ is the
probability that the control chart signals a false alarm at time $t_{i+j}$. The control chart made
the decision to inspect the system, but the gear system is still in good state at the time of
inspection.
3) $\left(1 - e^{-v_1 j \Delta_h}\right) \alpha_1(k^*_{i+j}) (C_1 + C_2)$ is the expected cost of a true alarm. $\left(1 - e^{-v_1 j \Delta_h}\right)$ is the probability that
the gear deteriorating process transits into warning state in the time interval $j \Delta_h$. $\alpha_1(k^*_{i+j})$
is the probability that the control chart signals a true alarm at time $t_{i+j}$. The control chart
made the decision to inspect the system, and the gear system transited into warning state
before the time of inspection.
4) $\left(1 - e^{-v_1 j \Delta_h}\right) \left[ j \Delta_h - \dfrac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)} \right] v_2 (C_3 - C_2)$ is the expected cost of the gear running in
warning state. $\left(1 - e^{-v_1 j \Delta_h}\right)$ is the probability that the gear deteriorating process transits into
warning state in the time interval $j \Delta_h$. $\left[ j \Delta_h - \dfrac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)} \right]$ is the average time
the process is in warning state within the time interval $j \Delta_h$. $v_2 (C_3 - C_2)$ is the loss rate
when the gear transmission system operates in warning state.
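The components listed above can be assembled into the stage cost of Equation (4.27). The following Python sketch is one possible arrangement, assuming the alarm probabilities `alpha0` and `alpha1` have already been computed for the chosen control limit; the function names are hypothetical.

```python
import math

def mean_time_in_warning(v1, t):
    """Average time in warning state within an interval of length t, given that the
    process starts in good state and leaves it during the interval (Eqs. 4.32-4.33)."""
    q = math.exp(-v1 * t)
    mean_good = (1.0 - (1.0 + v1 * t) * q) / (v1 * (1.0 - q))
    return t - mean_good

def stage_cost(p_i, j, dh, alpha0, alpha1, v1, v2, c0, c1, c2, c3):
    """Expected cost between t_i and t_{i+j} < t_H, following Equation (4.27)."""
    t = j * dh
    q = math.exp(-v1 * t)        # probability of staying in good state
    loss_rate = v2 * (c3 - c2)   # average loss rate in warning state
    cost_if_warning = c0 + alpha1 * (c1 + c2) + t * loss_rate
    cost_if_good = (
        c0
        + q * alpha0 * c1
        + (1.0 - q) * alpha1 * (c1 + c2)
        + (1.0 - q) * mean_time_in_warning(v1, t) * loss_rate
    )
    return p_i * cost_if_warning + (1.0 - p_i) * cost_if_good

# Cost values from Section 4.6; the alarm probabilities are placeholders.
cost = stage_cost(1.0, 2, 0.5, 0.1, 0.0, 0.0728, 0.191, 1, 50, 75, 150)
```

With `p_i = 1` and `alpha1 = 0` only the data collection cost and the running loss remain, which gives 1 + 0.191 x 75 x 1 = 15.325 for the inputs shown.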
Given that the process is in good state at time $t_i$ and in warning state at time $t_{i+j}$, the average time that the
gear deteriorating process spends in warning state within the time interval $j \Delta_h$ is
$$\mathrm{E}\left(\tau_{s_2} \mid X(t_{i+j}) = s_2, X(t_i) = s_1\right) = j \Delta_h - \mathrm{E}\left(\tau_{s_1} \mid X(t_{i+j}) = s_2, X(t_i) = s_1\right) \tag{4.32}$$
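where $\mathrm{E}\left(\tau_{s_1} \mid X(t_{i+j}) = s_2, X(t_i) = s_1\right)$ is the average time that the process stays in good state
within the time interval $j \Delta_h$, given that the process is in good state at time $t_i$; it can be obtained
by (Duncan [5])
$$\begin{aligned}
\mathrm{E}\left(\tau_{s_1} \mid X(t_{i+j}) = s_2, X(t_i) = s_1\right)
&= \frac{\displaystyle\int_0^{j \Delta_h} \tau_{s_1}\, f_{T_{s_1}}(\tau_{s_1}; v_1)\, d\tau_{s_1}}{\displaystyle\int_0^{j \Delta_h} f_{T_{s_1}}(\tau_{s_1}; v_1)\, d\tau_{s_1}} \\
&= \frac{\displaystyle\int_0^{j \Delta_h} \tau_{s_1} \cdot v_1 e^{-v_1 \tau_{s_1}}\, d\tau_{s_1}}{\displaystyle\int_0^{j \Delta_h} v_1 e^{-v_1 \tau_{s_1}}\, d\tau_{s_1}} \\
&= \frac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)}
\end{aligned} \tag{4.33}$$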
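The closed form in Equation (4.33) can be cross-checked against a direct numerical evaluation of the two integrals. The sketch below uses a simple midpoint rule; the function names and the step count are arbitrary choices made for illustration.

```python
import math

def mean_time_in_good(v1, t):
    """Closed form of Equation (4.33) for an interval of length t = j * Delta_h."""
    q = math.exp(-v1 * t)
    return (1.0 - (1.0 + v1 * t) * q) / (v1 * (1.0 - q))

def mean_time_in_good_numeric(v1, t, steps=20000):
    """Midpoint-rule evaluation of the ratio of integrals in Equation (4.33)."""
    h = t / steps
    num = 0.0
    den = 0.0
    for n in range(steps):
        tau = (n + 0.5) * h
        w = v1 * math.exp(-v1 * tau) * h  # density times step width
        num += tau * w
        den += w
    return num / den

closed = mean_time_in_good(0.0728, 2.0)
numeric = mean_time_in_good_numeric(0.0728, 2.0)
```

Because the exponential density is decreasing, the conditional mean always lies below the midpoint of the interval, so `closed` is smaller than 1.0 for an interval of length 2.0.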
4.4.2 Cost Function at the End of the Production Run
The cost function for $t_{i+j}$ with $t_{i+j} = t_H$ is expressed by Equation (4.31). The first term on the
right side of the equation is the cost when the process is already in warning state at time $t_i$. The
second term is the expected cost when the process transits from good state to warning state
within the time interval between $t_i$ and $t_H$. If the process is in good state at time $t_i$ and no
transition occurs between $t_i$ and $t_H$, the gear will be in good state at time $t_H$, so the corresponding cost is 0.
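A minimal Python sketch of the end-of-run cost in Equation (4.31) is given below. The function name `terminal_cost` is hypothetical, and the inputs reuse the parameter and cost values quoted in Section 4.6.

```python
import math

def terminal_cost(p_i, j, dh, v1, v2, c2, c3):
    """Expected cost between t_i and t_H = t_{i+j}, following Equation (4.31)."""
    t = j * dh
    q = math.exp(-v1 * t)
    loss_rate = v2 * (c3 - c2)
    # Average time in warning state, given a transition within the interval.
    mean_warning = t - (1.0 - (1.0 + v1 * t) * q) / (v1 * (1.0 - q))
    already_warning = p_i * loss_rate * t
    becomes_warning = (1.0 - p_i) * (1.0 - q) * loss_rate * mean_warning
    return already_warning + becomes_warning

cost_if_warning = terminal_cost(1.0, 4, 0.5, 0.0728, 0.191, 75, 150)
cost_if_good = terminal_cost(0.0, 4, 0.5, 0.0728, 0.191, 75, 150)
```

The second term simplifies algebraically to `loss_rate * (t - (1 - q) / v1)`, which can serve as an independent check of the implementation.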
4.5 Dynamic Programming to Optimize the Control Limit and the Data Collecting Intervals
In section 4.4, a cost function was set up to describe the expected costs between $t_i$ and $t_{i+j}$;
this cost is a function of $P^*_{i+j}$ and $j \Delta_h$. In this section, dynamic
programming is applied to calculate the optimal $j^* \Delta_h$ and $P^*_{i+j^*}$ by minimizing the total
expected cost from time $t_i$ to the end of the production run $t_H$.
First, the minimum expected total cost from time $t_i$ to the end of the production run $t_H$ is
defined recursively by
$$G(P_i, j^*, P^*_{i+j^*}) = \min_{j,\; P^*_{i+j}} \left[ V(P_i, j, P^*_{i+j}) + \mathrm{E}(G_{i+j}) \right] \tag{4.34}$$
where $i = 1, 2, \ldots, H-1$ and $j = 1, 2, \ldots, H-i-1$. If $i + j = H$, then the expected cost is the cost
at the end of the production run, where
$$G(P_i, j) = P_i \cdot v_2 (C_3 - C_2) \cdot j \Delta_h + (1 - P_i) \cdot \left(1 - e^{-v_1 j \Delta_h}\right) \cdot v_2 (C_3 - C_2) \cdot \left[ j \Delta_h - \frac{1 - (1 + v_1 j \Delta_h)\, e^{-v_1 j \Delta_h}}{v_1 \left(1 - e^{-v_1 j \Delta_h}\right)} \right] \tag{4.35}$$
One possible approach to solving Equation (4.34) is to enumerate all possible situations.
However, the amount of computation would become very large. Dynamic
programming is a very useful approach to solving equations such as Equation (4.34) with much
less effort than exhaustive enumeration. Using a backward recursion algorithm, dynamic
programming starts with the problem in the last stage and finds the optimal solution for this
smaller problem. It then includes the previous stage in the problem, finding the current optimal
solution from the preceding one, until the original problem is solved entirely (e.g. Ross [47]).
The basic features that characterize dynamic programming problems are as follows:
1) The decision variables, whose optimal values are to be determined by dynamic programming (i.e.
$j$ and $P^*_{i+j}$);
2) The state variables, which describe the evolution of the gear deteriorating process (i.e. the posterior
probability $P_i$ of the process being in warning state);
3) The recursive relationship, which describes what the state variables will be in the next stage, given
the state variables and decision variables at the current stage (i.e. Equation (4.14) and Equation
(4.15));
4) The objective function, a mathematical function to be optimized (i.e. Equation (4.34) and
Equation (4.35)).
In order to find the optimal epoch $t_{i+j^*}$ for the next data collection and the control limit $P^*_{i+j^*}$ at
stage $i = 1, 2, \ldots, H-1$, the dynamic programming model is formulated as follows:
$$\text{Minimize } G(P_i, j, P^*_{i+j}), \tag{4.36}$$
subject to
$$i = 1, 2, \ldots, H-1, \qquad j = 1, 2, \ldots, H-i.$$
Since dynamic programming is only suitable for discrete problems, it is necessary to
discretize the continuous variables before we apply dynamic programming to find the optimal
inspection schedule for the gear deterioration process. The data collection interval is already a
discrete variable, $j \Delta_h$ with $j = 1, 2, \ldots, H-i$. The state variable $P_{i+j}$ is divided into $n$ equal
intervals of length $1/n$ from 0 to 1: the discretized $P_{i+j} = 0$ when $P_{i+j} = 0$, the discretized
$P_{i+j} = 1/(2n)$ when $P_{i+j} \in (0, 1/n]$, the discretized $P_{i+j} = 3/(2n)$ when $P_{i+j} \in (1/n, 2/n]$, ..., and
the discretized $P_{i+j} = (2n-1)/(2n)$ when $P_{i+j} \in ((n-1)/n, 1]$.
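The discretization rule can be written as a small helper that maps a continuous posterior probability to the midpoint of its interval. This is a sketch under the interval convention described above; the function name `discretize_posterior` and the floating-point tolerance are assumptions.

```python
import math

def discretize_posterior(p, n):
    """Map p in [0, 1] to 0 (if p == 0) or to the midpoint of its interval ((m-1)/n, m/n]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        return 0.0
    # The small tolerance keeps right endpoints such as m/n in interval m
    # despite floating-point rounding in p * n.
    m = max(1, min(n, math.ceil(p * n - 1e-9)))
    return (2 * m - 1) / (2.0 * n)

grid_values = [discretize_posterior(p, 100) for p in (0.0, 0.004, 0.01, 0.0101, 1.0)]
```

With `n = 100` the five inputs map to 0, 0.005, 0.005, 0.015 and 0.995, so the right endpoint 0.01 stays in the first interval, as the half-open intervals require.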
When $P_{i+j^*} > P^*_{i+j^*}$, an inspection is performed. If this is a false alarm, the posterior
probability is reset to 0. If it is a true alarm, repair or replacement is applied. After the
preventive maintenance, the posterior probability is also reset to 0. Hence, the posterior
probability $P_{i+j^*}$ is reset to 0 whenever it exceeds the control limit
$P^*_{i+j^*}$, and the probability of $P_{i+j^*} = 0$ can be expressed by the following equation
$$\Pr\left[P_{i+j^*} = 0 \mid P_i, j^*, k^*_{i+j^*}\right] = e^{-v_1 j^* \Delta_h} (1 - P_i) \cdot \alpha_0(k^*_{i+j^*}) + \left[1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right] \cdot \alpha_1(k^*_{i+j^*}) \tag{4.37}$$
where
$$k^*_{i+j^*} = \frac{d}{2} + \frac{1}{d} \ln\left[\frac{P^*_{i+j^*}\, e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)}{(1 - P^*_{i+j^*}) \left(1 - e^{-v_1 \cdot j^* \cdot \Delta_h} (1 - P_i)\right)}\right] \tag{4.38}$$
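Except for $P_{i+j^*} = 0$, the probability of each discretized value of $P_{i+j^*}$ equals the probability that the continuous $P_{i+j^*}$ lies in
the corresponding interval $\left(\underline{P}_{i+j^*}, \overline{P}_{i+j^*}\right]$, where $\underline{P}_{i+j^*}$ and $\overline{P}_{i+j^*}$ denote the lower and upper bounds of the interval. The probability that the posterior probability $P_{i+j^*}$ lies in the interval
$\left(\underline{P}_{i+j^*}, \overline{P}_{i+j^*}\right]$ can be derived as
$$\begin{aligned}
& \Pr\left[\underline{P}_{i+j^*} < P_{i+j^*} \le \overline{P}_{i+j^*} \mid P_i, j^*, P^*_{i+j^*}\right] \\
&\quad = \Pr\left[\mu_z + \underline{k}_{i+j^*} \sigma_z < z_{i+j^*} \le \mu_z + \overline{k}_{i+j^*} \sigma_z \mid P_i, j^*, P^*_{i+j^*}\right] \\
&\quad = \Pr\left[z_{i+j^*} \le \mu_z + \overline{k}_{i+j^*} \sigma_z \mid P_i, j^*, P^*_{i+j^*}\right] - \Pr\left[z_{i+j^*} \le \mu_z + \underline{k}_{i+j^*} \sigma_z \mid P_i, j^*, P^*_{i+j^*}\right] \\
&\quad = \left\{ e^{-v_1 j^* \Delta_h} (1 - P_i)\, \Phi\left(\overline{k}_{i+j^*}\right) + \left[1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right] \Phi\left(\overline{k}_{i+j^*} - d\right) \right\} \\
&\qquad - \left\{ e^{-v_1 j^* \Delta_h} (1 - P_i)\, \Phi\left(\underline{k}_{i+j^*}\right) + \left[1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right] \Phi\left(\underline{k}_{i+j^*} - d\right) \right\}
\end{aligned} \tag{4.39}$$
where
$$\begin{aligned}
\underline{k}_{i+j^*} &= \frac{d}{2} + \frac{1}{d} \ln\left[\frac{\underline{P}_{i+j^*}\, e^{-v_1 j^* \Delta_h} (1 - P_i)}{\left(1 - \underline{P}_{i+j^*}\right) \left(1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right)}\right], \quad \text{if } \underline{P}_{i+j^*} < P^*_{i+j^*}, \\
\overline{k}_{i+j^*} &= \begin{cases}
\dfrac{d}{2} + \dfrac{1}{d} \ln\left[\dfrac{\overline{P}_{i+j^*}\, e^{-v_1 j^* \Delta_h} (1 - P_i)}{\left(1 - \overline{P}_{i+j^*}\right) \left(1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right)}\right], & \text{if } \underline{P}_{i+j^*} < \overline{P}_{i+j^*} \le P^*_{i+j^*}, \\[2ex]
\dfrac{d}{2} + \dfrac{1}{d} \ln\left[\dfrac{P^*_{i+j^*}\, e^{-v_1 j^* \Delta_h} (1 - P_i)}{\left(1 - P^*_{i+j^*}\right) \left(1 - e^{-v_1 j^* \Delta_h} (1 - P_i)\right)}\right], & \text{if } \underline{P}_{i+j^*} < P^*_{i+j^*} < \overline{P}_{i+j^*},
\end{cases}
\end{aligned} \tag{4.40}$$
and
$$\Pr\left[\underline{P}_{i+j^*} < P_{i+j^*} \le \overline{P}_{i+j^*} \mid P_i, j^*, P^*_{i+j^*}\right] = 0, \quad \text{if } \underline{P}_{i+j^*} > P^*_{i+j^*} \tag{4.41}$$
When the lower bound $\underline{P}_{i+j^*}$ of the posterior probability interval is larger than the control limit $P^*_{i+j^*}$, the
process will be restored and $P_{i+j^*}$ is reset to 0, so the probability in Equation (4.41) is equal to 0.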
After the control limits and the next data collection interval have been calculated using dynamic
programming, the following calculations are required at each data collection epoch to monitor
the state of the gear deterioration process:
Step 1. Collect the condition monitoring data and extract the residual observation $y(t_i)$ using
the approach introduced in Chapter 2 and Chapter 3;
Step 2. Compute $z_i$ from the residual observation $y(t_i)$ using Equation (4.13);
Step 3. Compute the posterior probability $P_i$ from $z_i$ using Equation (4.14);
Step 4. Compare the posterior probability $P_i$ with the control limit $P^*_i$ obtained at the
previous data collection epoch. If $P_i \le P^*_i$, the optimal time interval $t_{i+j^*}$ and the control
limit $P^*_{i+j^*}$ for the next data collection are decided using dynamic programming. On
the other hand, if $P_i > P^*_i$, the gear system is stopped and inspected immediately. If
gear defects are found, repair or replacement is applied. If no defect is found during
inspection, the control chart has given a false alarm. In either case, the posterior probability is
reset to 0, and the optimal time interval $t_{i+j^*}$ and control limit $P^*_{i+j^*}$ for the next data
collection are calculated using dynamic programming;
Step 5. Return to Step 1 and repeat until $t_{i+j^*} = t_H$;
Step 6. Perform a full inspection of the gear transmission system at $t_H$, and repair or replace it
if necessary. Then the system is reset for a new production run.
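Steps 3 and 4 can be illustrated with a short Python sketch. It is not the thesis's Equations
(4.13) and (4.14), which are not reproduced here: it assumes a plain Bayes update for a bivariate
normal residual with a common covariance matrix, an exponential good-to-warning transition with
rate `nu1`, and no breakdown between epochs. The function names are hypothetical, and the numbers
are the POHM estimates as read from Section 4.6.

```python
import math


def inv_quadratic_form(v, s):
    """Return v' S^{-1} v for a 2-vector v and a symmetric 2x2 matrix S."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return (d * v[0] ** 2 - (b + c) * v[0] * v[1] + a * v[1] ** 2) / det


def update_posterior(p_prev, y, mu1, mu2, sigma, nu1, elapsed):
    """Assumed Bayes update of the warning-state probability after observing residual y."""
    prior = 1.0 - (1.0 - p_prev) * math.exp(-nu1 * elapsed)  # before seeing y
    if prior <= 0.0:
        return 0.0
    if prior >= 1.0:
        return 1.0
    d1 = [y[0] - mu1[0], y[1] - mu1[1]]
    d2 = [y[0] - mu2[0], y[1] - mu2[1]]
    # Log-likelihood ratio of warning state versus good state (equal covariance)
    log_lr = 0.5 * (inv_quadratic_form(d1, sigma) - inv_quadratic_form(d2, sigma))
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() within floating-point range
    return 1.0 / (1.0 + math.exp(-log_odds))


def decide(p_i, control_limit):
    """Step 4: stop and inspect only if the posterior exceeds the control limit."""
    return "inspect" if p_i > control_limit else "continue"


MU1 = (1.2322, 1.8119)
MU2 = (26.2735, 37.6171)
SIGMA = ((13.4833, 16.6749), (16.6749, 21.6932))
NU1 = 0.0728

p_good = update_posterior(0.0, MU1, MU1, MU2, SIGMA, NU1, elapsed=0.5)
p_warn = update_posterior(0.0, MU2, MU1, MU2, SIGMA, NU1, elapsed=0.5)
actions = (decide(p_good, 0.13), decide(p_warn, 0.13))
```

A residual at the good-state mean leaves the posterior near 0, while a residual at the
warning-state mean drives it towards 1 and triggers an inspection.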
4.6 Numerical Results
In this section, two examples are provided to demonstrate the numerical results obtained by
applying the Bayesian process control model to optimize the gear system inspection policy. The
POHM model parameters, which were obtained in Chapter 3, are used in both examples, i.e.
$\nu_1 = 0.0728$, $\nu_2 = 0.191$,
$\mu_1 = \begin{bmatrix} 1.2322 \\ 1.8119 \end{bmatrix}$,
$\mu_2 = \begin{bmatrix} 26.2735 \\ 37.6171 \end{bmatrix}$, and
$\Sigma = \Sigma_1 = \begin{bmatrix} 13.4833 & 16.6749 \\ 16.6749 & 21.6932 \end{bmatrix}$.
The data collection interval is taken to be a multiple of the time unit $\Delta h = 0.5$ hours, and
the length of the production run $H$ is 38. The state variable $P_i$ is divided into 100 equal
intervals of length 0.01: the discretized $P_{i+j} = 0$ when $P_{i+j} = 0$, the discretized
$P_{i+j} = 0.005$ when $P_{i+j} \in (0, 0.01]$, the discretized $P_{i+j} = 0.015$ when
$P_{i+j} \in (0.01, 0.02]$, ..., and the discretized $P_{i+j} = 0.995$ when $P_{i+j} \in (0.99, 1]$.
The range of the control limit $P^*_{i+j}$ is from 0.01 to 0.99, which is divided into 49 equal
subintervals, so the decision variable is chosen from the discretized values
$P^*_{i+j} \in \{0.01, 0.03, \ldots, 0.99\}$.
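The discretization described above can be sketched in Python as follows. The function and
variable names are hypothetical; the grid itself (interval midpoints 0.005, 0.015, ..., 0.995 for
the state and 0.01, 0.03, ..., 0.99 for the control limit) follows the text.

```python
import math


def discretize_posterior(p, width=0.01):
    """Map a posterior probability to the midpoint of its interval ((m-1)*width, m*width]."""
    if p <= 0.0:
        return 0.0
    # Rounding before ceil guards against floating-point noise at interval edges
    m = math.ceil(round(p / width, 9))
    return (m - 0.5) * width


# Candidate control limits: 0.01, 0.03, ..., 0.99 (50 values, 49 subintervals)
CONTROL_LIMITS = [round(0.01 + 0.02 * n, 2) for n in range(50)]
```

For instance, a posterior probability of 0.05 falls in (0.04, 0.05] and maps to 0.045, which is
the sixth grid point when the point 0 is counted first.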
The data collection cost $C_0$ is chosen to be 1; the cost of a false alarm $C_1$ is 50; the cost of
preventive maintenance $C_2$ is 75; and the breakdown cost $C_3$ is 150. The following figure shows
the optimal costs calculated using the above parameters.
Figure 4.1 Total expected cost at each production stage with breakdown cost 150
From the above figure, we observe that the optimal cost is non-decreasing in the posterior
probability $P_i$ and the production running time $t_i$. The total expected cost at $t_0$ is 100.53.
At the first stage $t_0 = 0$, the posterior probability $P_0$ is 0 because we assume the gear system
starts in good state. At time $t_{13} = 6.5$, the following figures show the simulation results of the
optimal time interval for the next data collection, $j^*\Delta h$, and the optimal control limit
$P^*_{13+j^*}$.
Figure 4.2 Optimal data collection intervals and control limits with breakdown cost 150
When the posterior probability at the current stage becomes large, the gear system may already
be in the warning state or may transit into the warning state soon. In order to avoid the breakdown
cost, the time interval to the next data collection should decrease. The Bayesian process control
model makes the same decision, as shown in Figure 4.2 (a).
Figure 4.2 (b) shows that the first point, 0.11, is the probability that the gear system will be reset
at the next data collection epoch ($t_{13+4} = 8.5$). Using Figure 4.2 (a) and (b), the time interval for
the next data collection and the control limit for the next data collection epoch can be decided.
For example, if the posterior probability is calculated as $P_{13} = 0.05$, which corresponds to the
sixth point, then the next data will be collected at $t_{13+3} = 8$, and the posterior probability
control limit will be $P^*_{13+3} = 0.13$.
In Figure 4.2 (b), the dot-dash line is a reference line on which the control limit at the next data
collection epoch is equal to the current posterior probability. When the posterior probability is
less than 0.17, the control limit fluctuates around 0.14. When the posterior probability exceeds
0.17, the control limit increases with the posterior probability, but more slowly than the posterior
probability itself. The difference between a posterior probability and its corresponding control
limit for the next data collection epoch therefore grows as the posterior probability increases. For
example, if the current posterior probability is 0.3, the control limit at the next data collection
epoch ($t_{14} = 7$) will be 0.25, and the difference between the current posterior probability and
the control limit at the next data collection epoch is 0.05. If the current posterior probability is
0.6, the control limit at the next data collection epoch ($t_{14} = 7$) will be 0.51, and the difference
is 0.09.
If we increase the breakdown cost $C_3$ to 400 and keep all other parameters unchanged, the
following figure shows the optimal costs.
Figure 4.3 Total expected cost at each production stage with breakdown cost 400
The trend of the total expected cost does not change. The total expected cost at $t_0$ is 156.69,
which is higher than the total expected cost when the breakdown cost is 150. The following figures
show the simulation results for the optimal control limit and the optimal time interval for the next
data collection with respect to the posterior probability at the current stage.
Figure 4.4 Optimal data collection intervals and control limits with breakdown cost 400
Figure 4.4 (a) shows that all the time intervals for the next data collection are 0.5 hour except at
the first point. They are less than or equal to the time intervals when the breakdown cost is 150. It
is reasonable to increase the data collection frequency when the breakdown cost increases.
Figure 4.4 (b) shows that the control limit increases with the posterior probability, but more
slowly than the posterior probability itself, which is similar to the results in Figure 4.2 (b).
However, the difference between the control limit and the current posterior probability in
Figure 4.4 (b) is larger than the difference in Figure 4.2 (b). For example, if the current posterior
probability is 0.3, the control limit at the next data collection epoch ($t_{14} = 7$) will be 0.19. The
difference is 0.11, which is greater than the difference (0.05) when the breakdown cost is 150. With
the same posterior probability, the control limit will be lower for a gear system with a higher
breakdown cost ($C_3 = 400$) than for a gear system with a lower breakdown cost ($C_3 = 150$).
4.7 Conclusions
In order to monitor and control the gear deterioration process, a Bayesian process control
model was derived to optimize the control limit and data collection epoch by minimizing the
total costs of a gear transmission system. We assumed that the gear production run is finite. After
the condition monitoring data are collected at each data collection epoch, the residual observation
is extracted from the vibration data collected by the accelerometers A02 and A03. Since the
vibration data were collected by two accelerometers, a multivariate Bayesian process control
model was developed in this research.
Firstly, we developed a recursive formula to calculate the posterior probability of gear in
warning state using all the available information. Then, we derived the cost function with
respect to the time interval for the next data collection and the process control limit at next data
collection epoch. Using dynamic programming, the optimal time interval for the next data
collection and the control limit were calculated by minimizing the total expected cost from the
current stage to the end of the production run. In the cost function, we considered the data
collection cost, inspection cost, maintenance cost, and the system breakdown cost. Finally, the
optimal Bayesian process control parameters were calculated using the POHM model parameters.
Chapter 5 Conclusions and Future Research
In this thesis, we developed an early fault detection scheme and an optimized CBM strategy
based on the condition monitoring data collected from a gear transmission system under
varying load. An ARX model with an F-test was proposed for detecting early gear defects. By
minimizing the total expected cost until the end of the production run, two decision variables
were optimized for the CBM strategy: the time interval to the next data collection, and the control
limit at the next data collection epoch. The purpose of this chapter is to summarize the results
obtained in Chapters 2, 3, and 4, and to propose future research related to this study.
5.1 Conclusions
In Chapter 2, a gearbox fault detection scheme was proposed using vibration data collected under
varying load. A modified TSA algorithm was introduced to compute a TSA signal for ARX
model building. The ARX model was fitted using the filtered TSA envelope as an external input
and the TSA signal as the output. The parameters of the ARX model were estimated using the
least squares algorithm, and the order of the ARX model was determined using the BIC.
The F-test was applied to the variances of the ARX model residuals for each data file, and the
corresponding p-value was proposed as a quantitative indicator of fault advancement over the full
lifetime of a gearbox. Based on the results obtained, we concluded that the method presented in
Chapter 2 can detect a gear fault occurrence earlier than the other methods mentioned in the
chapter. Our method is also faster, which makes it more suitable than the other methods for
on-line gearbox fault detection and damage localization.
In Chapter 3, a POHM model was built to describe the relationship between the residual
observations and the three states of a gear deterioration process. Once the POHM model was
well defined, two sets of parameters needed to be estimated: 1) the relation parameters
$\Psi = (\mu_1, \Sigma_1, \mu_2, \Sigma_2)$ that described the stochastic relationship between the residual
observations and the unobservable operational states of the gearbox transmission system, and 2) the
transition rate parameters $\Lambda = (\lambda_{12}, \lambda_{13}, \lambda_{23})$ that described the evolution of the gear
deterioration process. It was very difficult to find a feasible solution by directly maximizing the
likelihood function of the POHM model. The EM iterative algorithm was introduced to make
the parameter estimation possible by maximizing a pseudo-likelihood function. Finally, a
POHM model was built using full lifetime condition monitoring data obtained from the MDTB
operating from good state to breakdown state. The condition monitoring data files with equal
data collection intervals were selected, and residual observations were extracted from motor
current data and vibration data. The EM iterative algorithm provided a feasible solution for the
pseudo-likelihood function of the POHM model. The numerical iteration procedure generated
reasonable parameter estimates compared with our results from subjective judgment. It was
confirmed that the POHM model can be applied to estimate the hidden states of a gear based on the
condition monitoring data.
In Chapter 4, a multivariate Bayesian process control model was derived to optimize the control
limit and data collection epoch by minimizing the total costs of a gear transmission system. We
assumed that the gear production run is finite. Firstly, we developed a recursive formula to
calculate the posterior probability of gear in warning state using all the available information.
Then, we derived the cost function with respect to the time interval for the next data collection
and the process control limit at next data collection epoch. Using dynamic programming, the
optimal time interval for the next data collection and the control limit were calculated by
minimizing the total expected cost from the current stage to the end of the production run. In the
cost function, we considered the data collection cost factor, inspection cost factor, maintenance
cost factor, and the system breakdown cost factor. Finally, the optimal Bayesian process control
parameters were calculated using the POHM model parameters.
5.2 Future Research
5.2.1 Application of the Bayesian Process Control Model to Planetary Gear System
In this thesis, the typical parallel gear transmission system was studied. However, there is
another gear transmission system that is widely applied in industry, i.e. the planetary gear
transmission system. The schematic representation of a planetary gear system is shown in Figure
5.1, which consists of three planets, a sun gear, an annulus gear and a carrier. In general, the sun
gear is connected to the drive shaft, and the sun gear drives three planetary gears, which are
contained within an internally toothed ring gear. The planetary gears are mounted on the planetary
carrier, which is connected to the output shaft. When the sun gear rotates, it drives
the three planetary gears inside the ring gear.
Figure 5.1 Schematic Representation of Planetary Gear Transmission System
The balanced planetary kinematics and the associated load sharing make the planetary gear
transmission system well suited to industrial applications. In our lab, we have already designed
and built a test rig for the future study of a planetary gear train. The test rig includes a three-stage
planetary gear train, drive motor, load motor, motor control system, accelerometers and data
acquisition systems, as shown in the following picture.
Figure 5.2 Lab Planetary Gear Train Test Rig
Condition monitoring data have been collected with the gear system in good state. In the future,
the optimized CBM strategy introduced in this thesis will be adjusted for monitoring the
planetary gear system. First, the approach for extracting ARX model residuals should be
modified based on the mechanism of the planetary gear system. Then the POHM model and
Bayesian process control model can be directly applied based on the new ARX model residuals.
Since fifteen spare planetary gearboxes are available for future testing, the POHM model
parameters can be estimated using 15 full lifetime condition monitoring histories.
5.2.2 Relaxation of Model Assumptions
In building the POHM model and the Bayesian process control model, we made some
assumptions about the gear deterioration process. Most of the assumptions are based on the
real industrial situation. However, there are four assumptions that can be relaxed:
1) The time spent in each state was assumed to follow an exponential distribution. Based on
this assumption, the continuous Markov model was used to describe the gear deterioration
process. If the assumption is relaxed, a semi-Markov model may be built to model the gear
deterioration process, and a partially observable hidden semi-Markov model will be used to
estimate the parameters for the Bayesian process control.
2) Three states were considered in the POHM model, and the cost for each state was included in
the total expected cost function. If more states are introduced into the POHM model, the
cost function will be closer to the real situation, so that the optimized CBM strategy will be
more economical than that of the three-state model. Building a POHM model to describe a four-
state or multi-state process will not be very difficult, but the estimation of the POHM model
parameters presents a challenge.
3) In this research, we assumed a finite production run. Dynamic programming was applied to
find the optimal time interval for minimizing the total expected cost over a finite production
run. In the future, the gear deterioration process will be considered as an infinite production
run, and the total expected cost will be replaced by an expected average cost per unit time.
4) The Bayesian process control model was used to monitor the mean shift of the gear
deterioration process, and the process covariance matrix was assumed to be unchanged. In
Chapter 3, we observed that the covariance matrix shifted when the process
transited from good state to warning state. If we can consider the shift of both the
process mean and the covariance matrix, the accuracy of monitoring the gear state transitions
will be improved. This consideration will also lead to a reduction in the cost due to false alarms
and in the cost of the gear running in warning state. In future research, the assumption of an
unchanged covariance matrix can be relaxed in two steps. First, the Bayesian process control
model can be built based on the assumption that $\Sigma_2 = n \cdot \Sigma_1$, where $n$ is a scalar. Then two
general covariance matrices $\Sigma_1$ and $\Sigma_2$ can be considered.
Bibliography
[1] A. Daidone, F. Di Giandomenico, A. Bondavalli, S. Chiaradonna, Hidden Markov Models
as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution,
25th IEEE Symposium on Reliable Distributed Systems (2006) 245-256.
[2] A. Dempster, N. Laird and D. Rubin, Maximum Likelihood from Incomplete Data via the
EM Algorithm, Journal of the Royal Statistical Society, Series B, 39(1) (1977) 1-38.
[3] A. Hambaba, H. E. Huff, U. Kaul, Time-Scale Local Approach for Vibration Monitoring,
2001 IEEE Aerospace Conference Proceedings, 7 (2001), 3201-3209.
[4] A. Hambaba, H.E. Huff, U. Kaul, Detection and Diagnosis of Changes in the Time-Scale
Eigenstructure for Vibrating Systems, IEEE Aerospace Conference Proceedings, 7 (2001),
3211-3220.
[5] A.J. Duncan, The Economic Design of X -Charts Used to Maintain Current Control of a
Process, Journal of the American Statistical Association, 51 (1956) 228-242.
[6] B. Liu, V. Makis, Gearbox failure diagnosis based on vector autoregressive modelling of
vibration data and dynamic principal component analysis, IMA Journal of Management
Mathematics, 19 (2008) 39-50.
[7] C. Bunks, D. McCarthy, T. Al-Ani, Condition-Based Maintenance of Machines using
Hidden Markov Models, Mechanical Systems and Signal Processing, 14 (2000) 597-612.
[8] C. Kar, A.R. Mohanty, Vibration and current transient monitoring for gearbox fault
detection using multiresolution Fourier transform, Journal of Sound and Vibration, 311 (1-2)
(2008) 109-132.
[9] C.K. Tan, D. Mba, Identification of the Acoustic Emission Source during a Comparative
Study on Diagnosis of a Spur Gearbox, Tribology International, 38 (2005), 469-480.
[10] C.M. Jarque, A.K. Bera, Efficient tests for normality, homoscedasticity and serial
independence of regression residuals, Economics Letters 6 (3) (1980) 255–259.
[11] C.M. Jarque, A.K. Bera, Efficient tests for normality, homoscedasticity and serial
independence of regression residuals: Monte Carlo evidence, Economics Letters 7 (4) (1981)
313–318.
[12] D. Lin, M. Wiseman, D. Banjevic and A.K.S. Jardine, An Approach to Signal Processing
and Condition-Based Maintenance for Gearboxes Subject to Tooth Failure, Mechanical
Systems and Signal Processing, 18(2004), 993-1007.
[13] D. Lin, V. Makis, Recursive Filters for a Partially Observable System Subject to Random
Failure, Advances in Applied Probability, 35(1) (2003) 207-227.
[14] D. Lin, V. Makis, Filters and Parameter Estimation for a Partially Observable System
Subject to Random Failure with Continuous-range Observations, Advances in Applied
Probability, 36(4) (2004) 1212-1230.
[15] D. Lin, V. Makis, On-Line Parameter Estimation for a Partially Observable System Subject
to Random Failure, Naval Research Logistics, 53(5) (2006) 477-483.
[16] D.P. Townsend, Dudley’s Gear Handbook, 2nd Edition, McGraw-Hill, Inc 1992.
[17] D. Wang, Q. Miao, R. Kang, Robust health evaluation of gearbox subject to tooth failure
with wavelet decomposition, Journal of Sound and Vibration, 324 (3-5) (2009) 109-132.
[18] E.L. Porteus, A. Angelus, Opportunities for improved statistical process control,
Management Science, 43 (1997) 1214-1228.
[19] G.E.P. Box, G.M. Jenkins, G.C. Reinsel, Time series analysis: forecasting and control, 4th
Edition, John Wiley & Sons, Inc., 2008.
[20] F. LeGland, L. Mevel, Fault Detection in Hidden Markov Models: A Local Asymptotic
Approach, Proceedings of the 39th IEEE conference on Decision and Control, (2000).
[21] F.V. Nelwamondo, T. Marwala, U. Mahola, Early Classifications of Bearing Faults using
Hidden Markov Models, Gaussian Mixture Models, Mel-Frequency Cepstral Coefficients
and Fractals, International Journal of Innovative Computing, Information and Control, 2(6)
(2006) 1281-1299.
[22] G. Dalpiaz, A. Rivola, R. Rubini, Effectiveness and sensitivity of vibration processing
techniques for local fault detection in gears, Mechanical System and Signal Processing 3
(2000) 387–412.
[23] G. Nenes, G. Tagaras, The economically designed two-sided Bayesian X control chart,
European Journal of Operational Research, 183 (2007) 263-277.
[24] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, 6 (1978) 461-
464.
[25] G. Tagaras, A dynamic programming approach to the economic design of X-charts, IIE
Transactions, 26 (1994) 48-56.
[26] G. Tagaras, Dynamic control charts for finite production runs, European Journal of
Operational Research, 91 (1996) 38-55.
[27] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd
International Symposium on Information Theory, (Petrov, B. N. and Csaki, F. eds),
Akademiai Kiado, Budapest, (1973) 267-281.
[28] H.M. Taylor, Markovian sequential replacement processes, Annals of Mathematical
Statistics, 36 (1965) 1677-1694.
[29] H.M. Taylor, Statistical control of a Gaussian process, Technometrics, 9 (1967) 29-41.
[30] H. Ocak, K.A. Loparo, A new Bearing Fault Detection and Diagnosis Scheme Based on
Hidden Markov Modeling of Vibration Signals, Acoustic, Speech and Signal Processing,
2001, Proceedings, (2001).
[31] J.R. Davis, Gear Materials, Properties, and Manufacture, ASM International, 2005.
[32] L. Ljung, System identification: theory for the user, 2nd Edition, Prentice Hall, Inc., 1999.
[33] L. Qu, J. Lin, A Difference Resonator for Detecting Weak Signals, Measurement, 26(1999),
69-77.
[34] M.A. Girshick, H. Rubin, A Bayes approach to a quality control model, Annals of
Mathematical Statistics, 23 (1952) 114-125.
[35] M.E Badaoui, F. Guillet, J. Danière, New Applications of the Real Cepstrum to Gear
Signals, Including Definition of a Robust Fault Indicator, Mechanical Systems and Signal
Processing, 18(2004), 1031-1046.
[36] M. Ge, R. Du, Y. Xu, Hidden Markov Model based fault diagnosis for stamping processes,
Mechanical Systems and Signal Processing, 18 (2004) 391-408.
[37] M.J. Kim and V. Makis, Parameter Estimation in a CBM Model with Multivariate
Observations, Proceedings of the 2009 Industrial Engineering Research Conference,
(2009) 1910-1915.
[38] MDTB data (Data CDs: test-runs #14), Condition–based maintenance department, Applied
Research Laboratory, The Pennsylvania State University, 1998.
[39] P.D. McFadden, Detecting Fatigue Cracks in Gears by Amplitude and Phase Demodulation
of the Meshing Vibration, ASME Transactions Journal of Acoustics Stress and Reliability in
Design 108 (1986) 165–170.
[40] P.D. McFadden, Examination of a technique for the early detection of failure in gears by
signal processing of the time domain average of the meshing vibration, Mechanical systems
and signal processing 1 (1987) 173–183.
[41] P.L. Carter, A Bayesian approach to quality control. Management Science, 18 (1972) 647-
655.
[42] Q. Miao, V. Makis, Condition Monitoring and Classification of Rotating Machinery using
Wavelets and Hidden Markov Models, Mechanical Systems and Signal Processing, 21 (2007)
840-855.
[43] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, 6th Edition,
Pearson Prentice Hall, (2007).
[44] L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition, Proceedings of the IEEE, 77(2) (1989) 257-286.
[45] R.E. Walpole, R.H. Myers, S.L. Myers, K. Ye, Probability & statistics for engineers &
scientists, 8th Edition, Pearson Education, Inc. 2007.
[46] S.M. Kay, Modern spectral estimation: theory and application, Prentice Hall, 1988.
[47] S.M. Ross, Introduction to Stochastic Dynamic Programming, Academic Press, Inc, (1982).
[48] S.M. Ross, Introduction to Probability Models, 9th Edition, Elsevier Inc., (2007).
[49] S. Rofe, Signal processing methods for gearbox fault detection, Defence Science and
Technology Organization Technical Report, DSTO-TR-0476, Australia, 1997.
[50] T. Toutountzakis, C.K. Tan, D. Mba, Application of Acoustic Emission to Seeded Gear
Fault Detection, NDT & E International, 38 (2005), 27-36.
[51] V. Makis, Multivariate Bayesian Control Chart, Operations Research, 56(2) (2008) 487-
496.
[52] V. Makis, Finite horizon multivariate Bayesian process control, European Journal of
Operational Research, 194 (2009) 795-806.
[53] W. Wang, A.K. Wong, Autoregressive model-based gear fault diagnosis, Journal of
Vibration and Acoustics, Transactions of the ASME, 124(2) (2002) 172-179.
[54] W. Wang, F. Ismail, F. Golnaraghi, A Neuro-Fuzzy Approach to Gear System Monitoring,
IEEE Transactions on Fuzzy Systems, 12 (2004), 710-723.
[55] W.J. Wang, P.D. McFadden, Early detection of gear failure by vibration analysis-I.
Calculation of the time-frequency distribution, Mechanical Systems and Signal Processing,
7 (3) (1994) 289-307.
[56] W.J. Wang, P.D. McFadden, Application of wavelets to gearbox vibration signals for fault
detection, Journal of Sound and Vibration, 192 (5) (1996) 109-132.
[57] W.Q. Wang, F. Ismail, M.F. Golnaraghi, Assessment of Gear Damage Monitoring
Techniques using Vibration Measurements, Mechanical Systems and Signal Processing,
15 (2001), 905-922.
[58] X. Jiang, V. Makis, Optimal Replacement under Partial Observations, Mathematics of
Operations Research, 28(2), (2003) 382-394.
[59] X. Wang, V. Makis, M. Yang, A wavelet Approach to Fault Diagnosis of a Gearbox under
Varying Load Conditions, Journal of Sound and Vibration, 329(2010), 1570-1585.
[60] Y. Nikolaidis, G. Rigas, G. Tagaras, Using economically designed Shewhart and adaptive
X charts for monitoring the quality of tiles, Quality and Reliability Engineering
International, 23 (2007) 233-245.
[61] Y. Zhan, V. Makis, Adaptive state detection of gearboxes under varying load conditions
based on parametric modelling, Mechanical Systems and Signal Processing, 20 (2006) 188-
221.
[62] Y. Zhan, V. Makis, A Robust Diagnostic Model for Gearboxes Subject to Vibration
Monitoring, Journal of Sound and Vibration, 290(2006), 928-955.
[63] Y. Zhan, V. Makis, A.K.S. Jardine, Adaptive State Detection of Gearboxes under Varying
Load Conditions Based on Parametric Modelling, Mechanical Systems and Signal
Processing, 20(2006), 188-221.
Appendix A: ARX model
An ARX model is a time-series model that relates the exogenous input $x(k)$ to the output $y(k)$,
which is corrupted by noise $N(k)$. If we assume that the exogenous input process $x(k)$ is generated
independently of the noise process $N(k)$ with standard deviation $\sigma_N$, we can write the ARX
model, following Box et al. [19], as

$$
\begin{aligned}
y(k) ={}& \delta_1 y(k-1) + \delta_2 y(k-2) + \cdots + \delta_r y(k-r) \\
&+ \omega_0 x(k-b) + \omega_1 x(k-b-1) + \cdots + \omega_s x(k-b-s) + N(k)
\end{aligned}
\tag{A.1}
$$

where $b \ge 0$. Let $B$ denote the backshift operator, which is defined by $B x(k) = x(k-1)$; hence
$B^b x(k) = x(k-b)$. The above difference equation can be written as

$$
\begin{aligned}
y(k) ={}& \delta_1 B y(k) + \delta_2 B^2 y(k) + \cdots + \delta_r B^r y(k) \\
&+ \omega_0 x(k-b) + \omega_1 B x(k-b) + \cdots + \omega_s B^s x(k-b) + N(k) \\
={}& \left(\delta_1 B + \delta_2 B^2 + \cdots + \delta_r B^r\right) y(k)
 + \left(\omega_0 + \omega_1 B + \cdots + \omega_s B^s\right) x(k-b) + N(k)
\end{aligned}
\tag{A.2}
$$

If we define two operators by

$$
\omega_s(B) = \omega_0 + \omega_1 B + \cdots + \omega_s B^s \tag{A.3}
$$

$$
\delta_r(B) = \delta_1 B + \cdots + \delta_r B^r \tag{A.4}
$$

the ARX model can be rewritten as

$$
y(k) = \delta_r(B)\, y(k) + \omega_s(B)\, B^b x(k) + N(k) \tag{A.5}
$$

The model orders $r$, $s$ and $b$ are identified based on different criteria, and the unknown
parameters $\delta_1$, $\delta_2$, ..., $\delta_r$, $\omega_0$, $\omega_1$, ..., $\omega_s$, $\sigma_N^2$ are estimated from the observation data.
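A least-squares fit of the difference equation (A.1) can be sketched with NumPy as follows.
This is an illustrative sketch rather than the thesis's implementation: the function name
`fit_arx`, the synthetic data, and the coefficient values are hypothetical, and the orders
$r$, $s$ and $b$ are assumed to be known in advance.

```python
import numpy as np


def fit_arx(y, x, r, s, b):
    """Least-squares estimates of (delta_1..delta_r) and (omega_0..omega_s) in (A.1)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    start = max(r, b + s)  # first index at which every lagged value is available
    rows = []
    for k in range(start, len(y)):
        lagged_y = [y[k - i] for i in range(1, r + 1)]   # y(k-1), ..., y(k-r)
        lagged_x = [x[k - b - i] for i in range(s + 1)]  # x(k-b), ..., x(k-b-s)
        rows.append(lagged_y + lagged_x)
    design = np.array(rows)
    target = y[start:]
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ theta  # residual signal, e.g. for a later F-test
    return theta[:r], theta[r:], residuals


# Synthetic, noise-free check with hypothetical coefficients (r = 2, s = 1, b = 1)
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.zeros(200)
for k in range(2, 200):
    y[k] = 0.5 * y[k - 1] - 0.2 * y[k - 2] + 1.0 * x[k - 1] + 0.3 * x[k - 2]

delta, omega, residuals = fit_arx(y, x, r=2, s=1, b=1)
```

On noise-free data the fit recovers the generating coefficients and the residuals vanish; with
real data, the residual variance is the quantity that a variance-ratio test would compare.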
Appendix B: F-test
The fundamentals of the F-test are well presented, e.g. in Walpole et al. [45], and the following
provides only a brief summary useful for the analysis in this thesis. A continuous random
variable $z$ has an F-distribution with degrees of freedom $\nu_1 > 0$ and $\nu_2 > 0$ if its probability
density function is given by

$$
f_{\nu_1,\nu_2}(z) =
\begin{cases}
\dfrac{\Gamma\left[(\nu_1+\nu_2)/2\right]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}
\cdot \dfrac{z^{(\nu_1-2)/2}}{\left(1+\nu_1 z/\nu_2\right)^{(\nu_1+\nu_2)/2}}, & z > 0, \\[2ex]
0, & \text{elsewhere,}
\end{cases}
\tag{B.1}
$$

where $\Gamma(\cdot)$ is the Gamma function defined by

$$
\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx, \quad \text{for } \alpha > 0.
$$
The F-distribution is a continuous probability distribution that is widely applied in testing
whether the population variances of two samples are equal. Suppose there are two normal
populations with population variances $\sigma_1^2$ and $\sigma_2^2$, and two samples $X$ and $Y$ are
randomly selected from these two populations respectively.

                      Sample X                                           Sample Y
Sample size           $n_1$                                              $n_2$
Samples               $X = \{X_1, X_2, \ldots, X_{n_1}\}$                $Y = \{Y_1, Y_2, \ldots, Y_{n_2}\}$
Sample mean           $\bar{X} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i$      $\bar{Y} = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_i$
Sample variance       $S_1^2 = \frac{\sum_{i=1}^{n_1}(X_i-\bar{X})^2}{n_1-1}$   $S_2^2 = \frac{\sum_{i=1}^{n_2}(Y_i-\bar{Y})^2}{n_2-1}$
Population variance   $\sigma_1^2$                                       $\sigma_2^2$
Then, the random variable

$$
F_{\nu_1,\nu_2} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \tag{B.2}
$$

has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom. To test the null
hypothesis $H_0$ that $\sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_1$ that $\sigma_1^2 > \sigma_2^2$, the
ratio $f = S_1^2/S_2^2$ is a value of the F-distribution under $H_0$. Therefore, the critical region of
significance level $\alpha$ corresponding to the alternative hypothesis $H_1$ is $f > f_{\alpha}(\nu_1, \nu_2)$, that is

$$
\frac{S_1^2}{S_2^2} > f_{\alpha}(n_1 - 1,\; n_2 - 1) \tag{B.3}
$$

If the above inequality is satisfied, the null hypothesis $H_0$ is rejected at significance level
$\alpha$, and we conclude that $\sigma_1^2$ is greater than $\sigma_2^2$ with $100\cdot(1-\alpha)\%$ confidence.