Early Fault Detection Scheme and Optimized CBM Strategy ...€¦ · Early Fault Detection Scheme...

Early Fault Detection Scheme and Optimized CBM Strategy for Gear Transmission System Operating under

Varying Loads

by

Ming Yang

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Department of Mechanical and Industrial Engineering University of Toronto

© Copyright by Ming Yang (2011)

II

Early Fault Detection Scheme and Optimized CBM Strategy for

Gear Transmission System Operating under Varying Loads

Ming Yang

Doctor of Philosophy

Graduate Department of Mechanical and Industrial Engineering University of Toronto

2011

Abstract The development of fault detection schemes and optimized condition-based maintenance (CBM)

strategies for gear transmission systems has received considerable attention in recent years.

Most models considered the gear systems operating under constant load. Constant load

assumptions imply that changes in condition monitoring data are caused only by deterioration of

the gear systems. However, most real gear systems operate under varying loads and speeds

which affect the condition monitoring data signature of the system. This typically makes it

difficult to recognize the occurrence of an impending fault and to optimize a CBM strategy.

This thesis first presents a novel approach to detect and localize the gear failure occurrence for a

gear system operating under varying load conditions. An autoregressive model with exogenous

variables is fitted to the time-synchronously averaged (TSA) condition monitoring data when the

gear transmission system operated under various load conditions in good state. The fault

detection and localization indicator is calculated by applying F-test to the residual signals of the

ARX model. Then, the gear deteriorating process is modeled as a three state hidden continuous

Markov model with partial information, and the model parameters are estimated using ARX

model residuals. The pseudo likelihood function is maximized and the EM algorithm is applied.

III

Finally, a multivariate Bayesian process control is developed to optimize the decision variables

of the CBM strategy: the time interval for next data collection and the control limit for the

posterior probability calculated at next data collection epoch. The decision variables are

calculated by minimizing the total expected cost from current stage to the end of the production

run. Dynamic programming is applied to update the decision variables after new monitoring data

are acquired, and the maintenance decision is made by taking all available information into

account.

IV

Acknowledgments

I would like to take this opportunity to thank everyone who has helped me during the

development of this thesis.

I am very grateful to my supervisor Dr. Viliam Makis. It is his vast knowledge, considerable

enthusiasm and great patience that guided me throughout the development of the research. His

careful reviews of my manuscript helped me a lot in this thesis development. I also enjoyed and

appreciated the thoughtful insights and advice from the other members of my PhD committee Dr.

Jean Zu and Dr. Roy H. Kwon, and I would like to thank them for their reviews of my thesis.

Also, I would like to acknowledge Syncrude Canada Ltd., NSERC, and the University of

Toronto Fellowship for providing financial support for my research. In addition, I would like to

thank all the people currently and formerly at the Department of Mechanical and Industrial

Engineering for providing a great social and professional environment.

I would also like to express my appreciation to all the members of Quality, Reliability and

Maintenance Laboratory, and Vibration Monitoring, Signal Processing and CBM Laboratory,

especially Lawrence Yip, Jing Yu, Rui Jiang, ZhiJian Yin, and Michael J. Kim for their help and

friendship.

Finally, I am grateful to my parents for their constant support and encouragement in spite of the

distance, and to my beautiful wife who has stood by my side and always supported me in all of

my endeavors.

V

Table of Contents

Chapter 1 Introduction................................................................................................................. 1

1.1 Motivation........................................................................................................................... 1

1.2 Thesis Outline ..................................................................................................................... 2

1.3 Thesis Contributions ........................................................................................................... 6

Chapter 2 ARX Model-Based Gearbox Fault Detection and Localization Operating

under Varying Load................................................................................................................. 8

2.1 Introduction......................................................................................................................... 8

2.2 Experimental set-up and data collection ........................................................................... 10

2.3 Time-Synchronous Averaging Algorithm and Filtered Envelope Extraction .................. 14

2.4 ARX Model using Gear Vibration Data under Varying Load.......................................... 18

2.4.1 ARX Model based on the Modified TSA Signal and Filtered TSA Envelope ........ 19

2.4.2 ARX Model Order Identification and Parameters Estimation.............................. 21

2.4.3 Numerical Results ................................................................................................. 23

2.5 Hypothesis testing............................................................................................................. 26

2.6 Conclusions....................................................................................................................... 28

Chapter 3 POHM Model for Monitoring Gear Deterioration Condition ............................. 30

3.1 Introduction....................................................................................................................... 30

3.2 Model Formulation of Gear Deterioration Process........................................................... 33

3.2.1 Model Assumptions ............................................................................................... 33

3.2.2 Continuous-Time Markov Model for Gear Deterioration Process....................... 34

3.2.3 Stochastic Relationship between Residual Observation and Operational States. 37

3.2.4 POHM model ........................................................................................................ 39

VI

3.3 Parameter Estimation Using EM Algorithm..................................................................... 41

3.3.1 Random Breakdown Time of a Gear Deterioration Process ................................ 41

3.3.2 Sojourn Time of the Deterioration Process in Good State ................................... 44

3.3.3 Parameter Estimation using the EM Algorithm.................................................... 45

3.4 Numerical Results............................................................................................................. 49

3.4.1 Residual Observations Extraction ........................................................................ 49

3.4.2 POHM Parameters Obtained by Expert Judgment .............................................. 52

3.4.3 Estimated POHM Parameters using the EM Iteration Algorithm........................ 55

3.5 Conclusions....................................................................................................................... 60

Chapter 4 Bayesian Process Control to Optimize Gearbox CBM Policy .............................. 62

4.1 Introduction....................................................................................................................... 62

4.2 Problem Statement and Assumptions ............................................................................... 64

4.3 The Bayesian model.......................................................................................................... 67

4.3.1 Posterior Probability of Gear Deteriorating Process in Warning State .............. 68

4.3.2 Distribution of jiz + ............................................................................................... 70

4.3.3 Control Limit of Gear Deteriorating Process in Good State................................ 72

4.4 Cost Function for Gear Deterioration Process .................................................................. 75

4.4.1 Cost Function before the End of the Production Run........................................... 76

4.4.2 Cost Function at the End of the Production Run.................................................. 78

4.5 Dynamic Programming to Optimize the Control Limit and the Data Collecting

Intervals............................................................................................................................. 78

4.6 Numerical Results............................................................................................................. 83

4.7 Conclusions....................................................................................................................... 88

Chapter 5 Conclusions and Future Research........................................................................... 89

VII

5.1 Conclusions....................................................................................................................... 89

5.2 Future Research ................................................................................................................ 91

5.2.1 Application of the Bayesian Process Control Model to Planetary Gear System.. 91

5.2.2 Relaxation of Model Assumptions......................................................................... 92

Bibliography ................................................................................................................................ 94

VIII

List of Tables

Table 3.1 EM Algorithm Iteration Results: Mean Vectors 1 μ and 2 μ ....................................... 56

Table 3.2 EM Algorithm Iteration Results: Covariance Matrix 1 Σ ............................................. 57

Table 3.3 EM Algorithm Iteration Results: Covariance Matrix 2 Σ ............................................ 58

Table 3.4 EM Algorithm Iteration Results: States Transition Parameters Λ ............................. 60

IX

List of Figures

Figure 1.1 Schematic Representation of Parallel Gear Transmission System................................ 1

Figure 2.1 Mechanical Diagnostic Test Rig ................................................................................. 11

Figure 2.2 Locations of Accelerometers ....................................................................................... 11

Figure 2.3 (a) Gearbox Output Torque and (b) Gearbox Input Speed.......................................... 12

Figure 2.4 Broken Teeth in Test Run #14..................................................................................... 13

Figure 2.5 Vibration Data: (a) Healthy State 300% Output Torque, (b) Faulty State at 300%

Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100%

Output Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50%

Output Torque. ........................................................................................................... 17

Figure 2.6 TSA Signals (Dotted Line) and Filtered TSA Envelop (Solid Line): (a) Healthy State

300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at

100% Output Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at

50% Output Torque, (f) Faulty State at 50% Output Torque..................................... 18

Figure 2.7 TSA Data for ARX Model Building: (a) Filtered TSA Envelopes used as an ARX

Model Input (b) TSA Vibration Data used as an ARX Model Output ...................... 23

Figure 2.8 ARX Model Order Selection Using BIC Criterion ..................................................... 24

Figure 2.9 Autocorrelation Function for ARX Model Residuals ................................................. 25

Figure 2.10 ARX Residuals: (a) Healthy State 300% Output Torque, (b) Faulty State at 300%

Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100%

Output Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50%

Output Torque. ........................................................................................................... 26

Figure 2.11 F-test of ARX Model Residuals (1 Indicates Rejection of 0 H ) ............................... 28

X

Figure 3.1 Transition Paths from Good State to Breakdown State............................................... 42

Figure 3.2 (a), (b) Residual Standard Deviations of A02 and A03, and (c) Motor Current Mean51

Figure 3.3 (a) and (b) Residuals of Polynomial Model for A02 and A03 .................................... 52

Figure 3.4 Residual Observations Collected by Accelerometers A02 and A03 ........................... 53

Figure 3.5 EM algorithm Estimated Mean Vectors 1 μ and 2 μ ................................................... 55

Figure 3.6 EM Algorithm Estimated Covariance Parameters 1 Σ and 2 Σ .................................. 57

Figure 3.7 EM Algorithm Estimated States Transition Parameters Λ ........................................ 59

Figure 4.1 Total expected cost at each production stage with breakdown cost 150..................... 84

Figure 4.2 Optimal data collection intervals and control limits with breakdown cost 150 .......... 85

Figure 4.3 Total expected cost at each production stage with breakdown cost 400..................... 86

Figure 4.4 Optimal data collection intervals and control limits with breakdown cost 400 .......... 87

Figure 5.1 Schematic Representation of Planetary Gear Transmission System........................... 91

Figure 5.2 Lab Planetary Gear Train Test Rig.............................................................................. 92

XI

List of Appendices

Appendix A: ARX model ........................................................................................................... 100

Appendix B: F-test...................................................................................................................... 101

1

Chapter 1 Introduction

1.1 Motivation Gears are one of the most efficient and compact mechanical designs to smoothly transmit torques

and change the angular velocities from one shaft to another via gear tooth contact Townsend [16].

The schematic of a gear transmission system is shown in Figure 1.1, which consists of a pinion

gear (the smaller gear) and a driven gear (the larger gear). The pinion gear has fewer teeth than

the driven gear. Normally, a gear transmission system is designed to reduce the angular velocity

in order to increase the output torque. In such a speed reduction gear transmission system, the

pinion is connected with an input shaft, and the driven gear is connected with an output shaft.

Figure 1.1 Schematic Representation of Parallel Gear Transmission System

Compared with other mechanical transmission systems, gear systems have very compact

structure due to short centre distances between two gears, and can achieve high reduction ratio

and a constant value of velocity ratio. Hence, gear transmission systems can be found in nearly

every rotating machine. Since the gear transmission system is the key mechanical component,

the failure of gear transmission system will directly impact the overall system. To improve gear

2

transmission systems reliability and reduce unplanned failure, it is necessary to apply condition

monitoring and to implement condition-based maintenance (CBM), where CBM is a

maintenance program that recommends maintenance decisions based on the information

collected through condition monitoring.

In industry, most of gears are sealed in a gearbox to reduce the noise due to gear tooth meshing,

to avoid the risk of injury due to the high rotation speed, and to prevent the leakage of lubricate

oil that is used to protect the meshing face of gears. However, the sealed gears make it

impossible to directly monitor and inspect the gears state without stopping the system. Davis [31]

indicated that gear transmission systems could fail in many different ways, and there is often no

indication of difficulty until total failure occurs, except for an increase in noise level and

vibration. Collecting and analyzing vibration data has been very popular for early fault detection,

condition monitoring, and CBM of gear transmission systems. The development of fault

detection and maintenance schemes for such gear system has received considerable attention in

recent years, such as Badaoui et al. [35], Hambaba et al. [3], Hambaba et al. [4], Lin et al. [12],

Miao & Makis [42], Qu & Lin [33], Tan & Mba [9], Toutountzakis et al. [50], Wang et al. [54],

Wang et al. [57], and Liu & Makis [6], mostly considering constant load. However, in real

production situations, most gear transmission system operate under varying load which affects

the vibration signature of the system and in general makes it difficult to recognize the occurrence

of an incipient fault. Only a few studies dealing with this problem have been found, such as

Zhan et al. [63], Zhan & Makis [62], and Wang et al. [59].

1.2 Thesis Outline In this study, we focus on the development of an early fault detection scheme and optimized a

CBM strategy based on the condition monitoring data collected from a gear transmission system

3

operating under varying load conditions. Chapter 2 presents an autoregressive model with

exogenous variables ARX-based approach to extract the fault-relating signature from condition

monitoring data. Then F-test is applied to detect and localize the gear early failure. In Chapter 3,

a partially observable hidden Markov (POHM) model is used to describe the deterioration

process of the gear transmission system and the stochastic relationship between condition

monitoring data and the gear deterioration. Using the ARX model residuals, the POHM model

parameters are estimated using expectation-maximization (EM) algorithm. Chapter 4 deals with

an optimal choice of the epoch of monitoring data collection and stopping policy to perform

inspection and possible preventive maintenance based on the POHM model and the ARX model

residuals. The conclusions and future research are summarized in Chapter 5.

Chapter 2

This Chapter presents a novel approach to detect and localize the gear failure for a gear

transmission system operating under varying load conditions. The data used in this research

have been collected from Mechanical Diagnostics Test Bed, which was designed to provide

experimental vibration data for research on diagnostics of a commercial transmission, at the

Applied Research Laboratory of the Pennsylvania State University. The gear transmission

system was driven at a constant input speed using a drive motor, and the variable torque was

applied by absorption motor. In this test, the gearbox contained a 70-tooth driven gear and a 21-

tooth pinion gear. The full lifetime condition monitoring data are collected from the experimental

system operating from a new condition to a breakdown under varying load. The data files were

separated into two groups. The first group data were used to build the model and the second

group data were used to detect the gearbox failure.

4

Vibration data obtained from the gearbox represent a complex combination of information plus

noise produced by the background. Firstly, a time-synchronous averaging (TSA) algorithm is

applied to average out the contributions of both the noise and signals whose periods are not equal

to the shaft rotation period of the gear of interest in a gearbox. Filtered TSA envelopes are then

extracted from the TSA vibration data for each data file. Using the TSA signals as output and

the filtered TSA envelopes as input variable, an ARX model is identified based on Bayesian

information criterion (BIC), and the model parameters are estimated using the least squares

algorithm. The residual signal of an ARX model represents the difference from the average

tooth-meshing vibration in a healthy state and usually shows evidence of faults earlier and more

clearly than the TSA signal. Since the majority of energy in the healthy state of the target gear is

concentrated at the meshing fundamental and its harmonics, the ARX model residuals are much

less sensitive to the load variation and shaft imbalance than the TSA signal.

The variance of the ARX model residuals is further investigated in this research. A statistical

hypothesis test (F-test) is conducted to test the equality of the population variance and sample

variance of the ARX model residuals for each data file. The p-value of the F-test is used as

indicator of fault advancement over a full lifetime of a gear transmission system, which reduces

the complexity of the decision making based on the gear motion residuals. The F-test results can

predict not only when the gear failure occurs, but also where the gear failure is localized.

Chapter 3

In Chapter 3, the deterioration process of the gear transmission system is modeled as a

homogeneous continuous-time Markov chain with three states: good state 1 s , warning state 2 s ,

and breakdown state 3 s . State 1 s , and state 2 s are classified as operational states and 3 s is a

non-operational state. State 1 s is considered as either new gear or gear with minor defects, and

5

it is not necessary to take any maintenance action at this state. State 2 s is worse than 1 s and

preventive maintenance action must be taken immediately to avoid major breakdown. State 3 s

is breakdown state and can be observed directly.

If the gear transmission system breakdown costs are higher than preventive maintenance costs,

which is true in real application, the warning state becomes a critical state in the gear

deteriorating process. It is important to decide when the gear deteriorating process transits into

warning state, so that we can apply preventive maintenance to avoid the considerable breakdown

cost. However, the warning state is unobservable, so the residual information, which is extracted

from condition monitoring data, is used to estimate the epoch when the gear transits into warning

state. The stochastic relationship between the gear real states and residual information is

described by a multivariate normal distribution which depends on the actual system state.

Partially observable hidden Markov (POHM) model can be applied to the system where the

internal states are not known, but can only be inferred from external observations. Because the

maximum likelihood estimation of the POHM model parameters is difficult to apply, the pseudo

likelihood function is introduced to estimate the model parameters. The model parameters are

obtained by maximizing the pseudo likelihood function, and the feasible solution is the result of

the expectation-maximization (EM) iterative algorithm. The POHM model parameters are

calibrated using the real condition monitoring data collected from Mechanical Diagnostics Test

Bed. The model plays an important role in establishing a Bayesian process control model to

optimize the data collection schedule and CBM strategy in chapter 4.

Chapter 4

In this chapter, a multivariate Bayesian control model is presented to optimize an optimal CBM

strategy for the gear transmission system. First, we assume the gear transmission system

6

production run is finite, so the system deteriorating process is considered as a finite process.

Once the length of the production run is achieved, the system will be stop and an inspection will

be applied no matter what maintenance decision is made based on the condition monitoring data.

Before the system breakdown state 3 s is observed, there are two states in the system

deteriorating process: good state 1 s and warning state 2 s . Right after new data are collected, a

posterior probability that the process transits into warning state is calculated. If the posterior

probability is greater than a pre-calculated control limit, the system will be stopped for

inspection. On the other hand, if posterior probability is less than the control limit, this posterior

probability will be applied to compute an epoch of the next data collection and a control limit to

make a inspection decision after next data collection.

In this chapter, we optimize the epoch of each data collection and the inspection schedule by

minimizing the total expected cost using dynamic programming. Dynamic programming is a

well known method of solving the complex optimization problems by using backward recursion

algorithm. However, dynamic programming requires that the state variable and the decision

variable should be in a finite set, so we discretize the posterior probability, sampling interval and

control limit variable. Then, dynamic programming optimality equation is established with the

control limit and the data collection interval as decision variables, and the posterior probability

as a state variable.

1.3 Thesis Contributions The main original contributions of this research are as follows:

1) An ARX model-based approach to extract the gear fault-relating signature of the gear

transmission system operating under varying load condition has been established. The

ARX model residuals dramatically alleviate the impact of the gear meshing signature and

7

varying load signature, and show evidence of gear failure occurrence earlier and more

clearly than the original vibration data.

2) F-test as a quantitative indicator of gear fault occurrence has been introduced. The test

reduces the complexity of the decision making based on the ARX model residuals. The F-

test results can tell us not only when the gear failure occurs, but also where the gear failure is

localized

3) A POHM model for the gear transmission system based on the expectation-

maximization iterative algorithm has been developed. The numerical iteration procedure

generated a reasonable parameter estimates compared to our results from subjective

judgment. It is confirmed that the POHM model can be applied to estimate gear hidden

states based on the condition monitoring data.

4) A multivariate Bayesian control model for optimizing the gear transmission system

CBM strategy has been developed. When new data are collected, the Bayes theorem is

applied to get updated posterior probability, which is used to recalculate the epoch of next

data collection and make an on-line inspection decision by minimizing total expected cost.

8

Chapter 2 ARX Model-Based Gearbox Fault Detection and Localization

Operating under Varying Load

2.1 Introduction Gears are the most efficient and compact devices used to transmit torques and change the angular

velocities. They are widely applied in many machines, such as mining machines, automobiles,

helicopters, and aircraft turbine engines. Gearbox vibration data carries a lot of useful

information and it has been very popular for condition monitoring and early fault detection of

gearboxes. The condition monitoring and fault detection schemes improve gear transmission

systems reliability and reduce their failure occurrence. The development of fault detection and

diagnostic schemes for gear transmission systems has been an active area of research in recent

years, due to the need for many manufacturing companies to reduce unplanned production

capacity loss caused by gear transmission systems failure and to improve equipment reliability

through condition monitoring and failure prevention. Vibration data are mostly collected by

accelerometers mounted on the shell of a gearbox. However, the accelerometers collect not only

gear vibration signals, but also the vibration signals associated with all transmission components

(such as rotating shafts, bearings, and motor rotation), which makes it very difficult to recognize

the occurrence of an incipient fault.

In recent articles, advanced non-parametric approaches have been considered such as wavelets in

Wang et al. [17] and Wang & McFadden [56], short-time Fourier transform in Wang &

McFadden [55], and multiresolution Fourier transform in Kar & Mohanty [8]. Some papers have

applied parametric time-series models for gearbox fault detection, mostly assuming constant load.

In Kay [46], the author showed that time series models are appropriate for representing spectra

with sharp peaks (but not deep valleys) and are particularly useful for modeling sinusoidal data,

9

which is the case of the gear meshing signals. For example, in Wang & Wong [53], an

autoregressive (AR) model was considered for gear fault detection fitted to the vibration signal

of the gear of interest in its healthy state. The model was used as a linear prediction error filter

to process the future state signal from the same gear. The gear state was assessed using the error

signal obtained as the difference between the filtered and unfiltered signals. Zhan & Makis [62]

presented an adaptive Kalman filter-based AR model with varying coefficients fitted to the gear

motion residual signal in the healthy state of the gear of interest considering several load

conditions. The compromised model order was determined using several criteria. Liu & Makis

[6] proposed a gear failure diagnosis method based on vector autoregressive (VAR) modeling of

the vibration signals, dimensionality reduction applying dynamic principal component analysis

and condition monitoring using a multivariate Q control chart. Rofe [49] considered an

autoregressive moving-average (ARMA) model as a prediction error filter to detect gear fault

accounting for load variation, and the variance and kurtosis of ARMA model residuals were used

to indicate the presence or absence of a fault. However, different models were fitted for different

loads, i.e., each model was built under the assumption that the load was a constant.

If the load is assumed to be a constant, the vibration signals caused by a fluctuating load are not

interpreted correctly. The fluctuating load condition dramatically affects the vibration signature

of the system, so that the model developed under the constant load assumption cannot recognize

whether the vibration signature changes are caused by the load variation or by a failure

occurrence. It should be noted that most real production systems operate under varying load.

The previous AR, ARMA and VAR models which were fitted to vibration data did not consider

load variation. The proper approach would be to consider load as additional information in a

time series model, so that an autoregressive model with an exogenous input (ARX) or a

10

multivariate vector ARX model would be the appropriate choice. To our knowledge, this novel

approach has not been considered in the vibration literature.

In this section, an ARX model is proposed to consider varying load as the input, so that the

model and the developed fault detection scheme can be used in real situations. The section is

organized as follows. Subsection 2.2 briefly describes the experimental gear rig and the data

acquisition system used in this study. In Subsection 2.3, a time-synchronous averaging (TSA)

algorithm is applied to average out the contributions of both the noise and signals whose periods

are not equal to the shaft rotation period of the gear of interest in a gearbox. Filtered TSA

envelopes are then extracted from the TSA vibration data for each data file under varying load

condition. In Subsection 2.4, using the TSA signals as output and the filtered TSA envelopes as

input variable, an ARX model is identified and fitted to the data. In Subsection 2.5, F -test

applied to the ARX model residuals is performed and the corresponding p-value is proposed as a

quantitative indicator of fault advancement over a full lifetime of a gearbox. The conclusions are

summarized in Subsection 2.6.

2.2 Experimental set-up and data collection The vibration data used in this research have been obtained from the Applied Research

Laboratory of the Pennsylvania State University. The data were collected using the Mechanical

Diagnostics Test Bed (as shown in Figure 2.1), which was designed to provide experimental

vibration data for research on diagnostics of a commercial transmission. The gearbox was driven

at a set input speed using a 22.38KW, 1750 rpm drive motor, and the torque was applied by a

55.95KW absorption motor (MDTB [38]). A vector unit capable of controlling the current

output of the absorption motor accomplished the variation of the torque. The MDTB is highly

efficient because the electrical power that is generated by the absorber is fed back to the driver

11

motor. The mechanical and electrical losses are sustained by a small fraction of wall power. The

MDTB has the capability of testing single and double reduction industrial gearboxes with ratios

from about 1.2:1 to 6:1. The gearboxes are nominally in the 3.675~14.7 kW range. The system is

defined to provide the maximum versatility to speed and torque settings.

Figure 2.1 Mechanical Diagnostic Test Rig

The vibration data of test-run #14 is used in this study. In this test, the gearbox contained a 70-

tooth driven gear and a 21-tooth pinion gear. The nominal output of the gearbox was 62.66 Nm.

The vibration data were collected by accelerometer A02 and A03 mounted on the shell of the

gearbox as shown in the following figure.

Figure 2.2 Locations of Accelerometers

12

Each data file obtained from accelerometer A02 includes 200,000 samples, which were collected

in 10 second windows with 20 kHz sampling frequency. The resolution of analog-to-digital

converter was 16-bit, which assured that the accuracy of the accelerometers was preserved. The

motor velocity was acquired using the drive motor speed vector feedback V01. The gearbox

output load was acquired from the absorption motor torque vector feedback V05. The sampling

frequencies of V01 and V05 were 1 kHz. After the vibration data collection, V01 and V05

started collecting load data. In each data file, there were 10,000 samples, which were collected

during 10 second periods with 1kHz sampling frequency. Hence, the vibration data were not

synchronized with the V01 and V05 data. The whole test took 116 hours, and there were 338

data files in total. The gearbox was run at 100% (62.66 Nm) output torque for 96 hours. Then,

the gearbox was run under varying load from file 194 to 338. The output load ramped up from

50% (31.33 Nm) to 300% (187.98 Nm) in 8 minutes, and then ramped down back to 50% again

in 8 minutes. Figure 2.3 shows the mean of the drive motor speed and gearbox output torque.

Figure 2.3 (a) Gearbox Output Torque and (b) Gearbox Input Speed

13

The gearbox input speed fluctuated in the range less than 0.06%, so the speed can be consider as

a constant. After 20 hours, the test-run #14 was shut down with five fully broken and two

partially broken teeth on the driven gear which occurred as shown in the following figure.

Figure 2.4 Broken Teeth in Test Run #14

In this research, we considered the data files obtained under varying load condition, so the files

from 194 to 338 were analyzed except file 212 that was reported unreliable in MDTB [38] due to

some accelerometer problems. All the data files were separated into two groups. The files 194 -

246 were used to build the ARX model and the files 247 - 338 were used to detect the gearbox

failure.

We chose six data files to demonstrate our procedures. The file 283 was selected, since it was

the first file showing that the gear fault occurred. The results obtained for file 283 were

compared with the result obtained for file 254, both of these files were collected when the

gearbox was ramped up to 300% output load. Also, we chose files 259 and 288 for a comparison,

since for both of them the output load was ramped down to 100%. When the gearbox runs at a

low output load condition, the detection of gear failure is much more difficult than when the

gearbox runs at a high output load condition. For the low load condition, we compared the

14

results using the data files 261 and 290, both were collected at 50% output torque. Based on the

inspection performed after gearbox failure, the data files 254, 259, and 261 were collected when

the gearbox was in healthy state, and files 283, 288, and 290 were collected when the gearbox

was in failure state.

2.3 Time-Synchronous Averaging Algorithm and Filtered Envelope Extraction

The filtered TSA envelope is composed of the information of load variation and gear shaft

vibration due to shaft imbalance. This envelope is considered as an input for the ARX model. A

method how to extract the filtered TSA envelope from the collected vibration data is presented in

this section. First, the TSA signal is obtained and then the envelope carrying load information is

extracted from the TSA signal. Finally, a low pass filter is applied to remove the gear fault

signature from the TSA envelope.

Vibration data obtained from the gearbox represent a complex combination of information plus

noise produced by the background. Time-domain Synchronous Averaging Algorithm (TSA)

providing an average time signal of one individual gear over a large number of cycles has been

commonly used in the detection of gear faults, see e.g. McFadden [39], [40], and Dalpiaz et al.

[22]. TSA reduces the contributions of both noise and signals whose periods are not equal to the

given gear shaft rotation period by synchronizing the sampling of the vibration signal with the

shaft rotation of a particular gear and evaluating the ensemble average over many revolutions

with the start of each revolution at the same angular position. Suppose there are n data points in

each vibration data file collected from a gearbox operating under a constant speed. )(kV ,

nk ,...,2,1= , is used to denote this discrete time vibration data. Since the gearbox is running

15

under a constant speed, the number of sampling points which correspond to one complete

revolution of the gear of interest is described as

⎥⎥

⎤⎢⎢

⎡=

t m

s

/NffK (2.1)

where s f is the sampling frequency, mf is the fundamental meshing frequency of the gear of

interest, tN is the number of teeth of the given gear, and ⎡ ⎤ • is the ceil function returning the

closest higher integer. The number of cycles to be averaged can be obtained from

⎣ ⎦ 0 / KNM = , where N is total number of sampling points in each data file and ⎣ ⎦• is the

floor function returning the closest lower integer. In the time domain, the conventional TSA

signal of )(kV is defined as

KkiKkM

k M

i,...,2,1 ,)(1)(

1

00

TSA 0

=+= ∑−

=

VV (2.2)

If the gearbox operates under various speeds, we will re-sample the original vibration data. For

example, if the input speed increases, we decrease the re-sampling frequency; if the input speed

decreases, we increase the re-sampling frequency. In this way, the number of data points which

correspond to one complete revolution of the gear becomes a constant, so the Equation (2.2) can

be applied to calculate the TSA signals.

In next section, the TSA signal is going to be processed by an ARX model with an optimal

model order b , s and r , which requires that the number of TSA signals is cK + , where

( )sbrc += , max , to obtain K residuals corresponding to a complete revolution of the gear of

interest. Thus, the equation to calculate the TSA signal is modified as follows:

16

cKkiKkM

kM

i+=+= ∑

−

=

,...,2,1 ,)(1)( 1

0

TSA VV (2.3)

where ⎣ ⎦ ⎡ ⎤ -/ c/KKNM = .

The TSA signals contain not only gear fault signatures, but also gear meshing frequencies, gear

shaft imbalance signals, and load variation signatures. An ARX model is introduced to partially

filter out the gear meshing frequencies, shaft imbalance signals, and load variation signatures

from the TSA signals using a filtered TSA envelope as an external input. We will now present a

method how to extract a filtered TSA envelope. First, we localize peaks of the meshing signal of

each gear. The upper envelope is made up of all the local maximum values. We filter out the

peaks caused by the local gear fault, and fit a piecewise linear function ul to the modified upper

envelope. Using the same method, the other piecewise function dl is fitted to the lower

envelope using the local minimum value of each gear meshing signal. The estimated load

signals )(ˆ kl for each data file are given as:

( ) ( )2

)(ˆ klklkl du −= (2.4)

When a local gear fault occurs, a peak may be found in the estimated load signals and this kind

of peak can be considered as a impulse component when compared with estimated load signals

)(ˆ kl . The filtered TSA envelope )( kx is extracted from the estimated load signals )(ˆ kl

by applying a is applied to average out the impact this impulse component caused by the local

gear fault from. So, the filtered TSA envelope )( kx contain the information related to

varying load and shaft imbalance, and this information is not affected by local gear faults.

17

Then, we applied previously described TSA technique to the vibration data collected by

accelerometer A02. Each data file contained the data collected during 10 seconds. As shown in

Figure 2.5, the horizontal axis is the time and the vertical axis is the amplitude of the acceleration.

Figure 2.5 Vibration Data: (a) Healthy State 300% Output Torque, (b) Faulty State at 300%

Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output

Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque.

Based on these figures, it is difficult to obtain any information of gear state, so we apply the TSA

algorithm and extract filtered TSA envelopes, which are shown in the following figures

18

Figure 2.6 TSA Signals (Dotted Line) and Filtered TSA Envelop (Solid Line): (a) Healthy State

300% Output Torque, (b) Faulty State at 300% Output Torque; (c) Healthy State at 100% Output

Torque, (d) Faulty State at 100% Output Torque; (e) Healthy State at 50% Output Torque, (f)

Faulty State at 50% Output Torque.

2.4 ARX Model using Gear Vibration Data under Varying Load In this study, an ARX model is utilized to detect and localize gear failure for a gearbox operating

under varying load. The residual signal of an ARX model represents the difference from the

average tooth-meshing vibration in a healthy state and usually shows evidence of faults earlier

and more clearly than the TSA signal. Since the majority of energy in the healthy state of the

target gear is concentrated at the meshing fundamental and its harmonics, the ARX model

residual signal will be much less sensitive to the load variation and shaft imbalance than the TSA

signal. In the next section, we present an approach to identify the model order and to estimate

parameters of an ARX model and show numerical results using real data.

19

2.4.1 ARX Model based on the Modified TSA Signal and Filtered TSA Envelope

The TSA signal TSA V allows signal frequencies, which are synchronous with the meshing

frequency of interest, to be isolated from all other signals. However, TSA V still contains a large

numbers of components which are not related to the gear fault signature, such as the gear

meshing signal g V , the load variation signature, and the gear shaft vibration signal caused by

shaft geometric asymmetry and assembly errors (Rofe [49]). In gear fault detection and

diagnosis, it is necessary to isolate these signals from the TSA signal to obtain a residual signal

that is more efficiently indicative of the gear fault.

For healthy gears, the gear meshing signal is modulated by some low shaft order (first and/or

second order) functions. The spectrum of the gear meshing signal is therefore dominated by the

sharp spectral peaks at the fundamental gear meshing frequency and its harmonics accompanied

by some low-order modulation sidebands. The TSA signal of a gear meshing signal can be

expressed by the following equation (McFadden [40]):

[ ] bKkkβkfkAH

kH

h h h h h h +=+++= ∑

=

,...,2,1 , ) ( 2 cos ) (1 1)(0

g θaV π (2.5)

where Hh , ... , 1 , 0= is the meshing harmonic number, hA is the amplitude at the th- h

harmonic frequency (i.e., s fhf h ×= ), ) ( k ha is the amplitude modulation function, hβ is

the initial phase, and ) ( k hθ is the phase modulation function at the th- h harmonic.

Kay [46] proved that the AR model is appropriate to represent spectra of gear meshing signals

with sharp peaks. For a gearbox operating under constant load and speed, and the gear shaft

imbalances ignored, Wang & Wong [53] successfully applied an AR model to represent gear

meshing signal. The AR model is given by the following equation:

20

( ) )()()( g g g kkk r nVBδV += (2.6)

where

( ) rrr

1 BBBδ δδ L−= (2.7)

B denotes the backshift operator and g n are the AR models residuals.

When the load variation and shaft imbalance are present, the filtered TSA envelope )( kx can

be used to represent the combination of the signature caused by varying load and shaft imbalance.

Then, the TSA signal TSA V is described as an ARX model with the filtered TSA envelope as an

exogenous variable by the following equation:

( ) )( )( )()( g TSA kkkk p nxBwVV ++= (2.8)

where

( ) ppp www

1 0 BBBw L++= (2.9)

and )( kn is an error term. More details about ARX model are introduced in Appendix A.

Submitting Equation (2.6) into Equation (2.8), we can represent the TSA signal as

( ) ( )

( ) ( )( ) ( ) ( ) ( ) ( ) ( )

( ) ( )[ ]( ) ( )[ ] ( )[ ]

( ) ( ) ( )[ ] ( )[ ] )()( 1)( 1 )(

)()( 1)( 1

)()( )(

)( -)( -)( )(

)( )( )()(

)( )( )()()(

g TSA

g

g g

g g

g g

g g TSA

kkkk

kkk

kkk

kkkk

kkkk

kkkkk

rrpr

rrp

pr

rprrpr

pr

pr

nnBδxBδBwVBδ

nnBδxBδBw

nxBwVBδ

nBδxBwBδnBδxBwBδ

nxBwnVBδ

nxBwnVBδV

+−+−+=

+−+−+

++=

++

+++=

+++=

(2.10)

21

In order to rewrite the above equation as an ARX model, letting ( ) ( ) ( )[ ]BδBwBBω rpb

s

1 −=

and ( )[ ] )()(1)( g kkk r nnBδN +−= , we have an ARX model described by the following

equation:

( ) ( ) )( )( )()(

TSA

TSA kkkk bsr NxBBωVBδV ++= (2.11)

where )( kN is the noise term approximated by the ARX model residuals. For fault detection,

only the information embedded in residuals is of interest.

2.4.2 ARX Model Order Identification and Parameters Estimation

In this section, we show how to identify an ARX model order and estimate the model parameters.

The structure of an ARX model is represented by the equation:

( ) ( ) )( )( )( )( kkkk bsr NxBBωyBδy ++= (2.12)

where

( ) sss

1 0 BBB ωωωω +++= L (2.13)

( ) rrr

1 BBB δδδ ++= L (2.14)

y is the TSA vibration signal )(TSA kV , x is the ARX model exogenous input representing

the filtered TSA envelop, and )( kN is assumed as a Gaussian white noise with zero mean and

variance 2 Nσ when the gearbox is in a healthy state.

In order to test the normality assumption of )( kN , we apply the Jarque-Bera test (Jarque &

Bera [10] and [11]) that is a goodness-of-fit measure of departure from normality based on the

sample kurtosis JK and skewness JS with sample size Jn . The test statistic J is defined as

22

( )⎥⎦

⎤⎢⎣

⎡ −+=

43

6

2 2

J

JJ KSnJ (2.15)

If the sample size Jn is large (e.g. 3 >Jn , preferably 100 >Jn ), J is chi-square distribution

with two degrees of freedom. Based on the Jarque-Bera test with significant level 0.01, each

ARX residual corresponding to from data files 194 to 246 is normal distribution except data file

210. Then we calculate the autocorrelation of all ARX model residual from data files 194 to 246,

and the Figure 2.9 shows that the uncorrelated assumption about )( kN is reasonable.

For given model order parameters b , s and r , the parameters of the ARX model are estimated

using the least squares algorithm (Ljung [32]):

)( )( )( )( ˆ1

1-

1∑∑==

⎥⎦

⎤⎢⎣

⎡=

m

k

m

k

T m kkkk yθ ϕϕϕ (2.16)

where m is the number of data values, mθ is the estimated ARX model parameter vector

[ ]T

sr

0 1 ˆˆˆˆˆ ωωδδ KK=θ (2.17)

and ϕ is the input and output series vector used to build the ARX model,

[ ]T , )(, , )(, )(, , )1()( sbkbkrkkk −−−−−−−= xxyy KKϕ (2.18)

The model order parameters b , s and r are determined using Bayesian information criterion

(BIC) (Schwarz [24])

mmdσ

Nrsb) ln( ) ˆ (ln BIC 2

, , += (2.19)

23

where 2ˆ Nσ is the maximum likelihood estimate of 2

Nσ , and 1+++= rsbd is the number of

estimated parameters in the model. BIC imposes a greater penalty for the number of estimated

model parameters than does the Akaike information criterion (Akaike [27]):

constant 2 ) ˆ (ln AIC 2 , , ++=

mdσ

Nrsb (2.20)

Once the ARX model is identified and all its parameters are estimated, the ARX model residuals

can be calculated using the following equation:

( )[ ] ( ) )( ˆ-)(ˆ1)( ˆ kkk bsr xBB yBN ωδ−= (2.21)

2.4.3 Numerical Results

In this study, the filtered TSA envelope and TSA data TSA V from files 194 to 246 (shown in

Figure 2.7) are used to fit an ARX model. Comparing Figure 2.7 with Figure 2.3, we can find

out that the filtered TSA envelope contains the components caused not only by varying load but

also by the gear shaft imbalance.

Figure 2.7 TSA Data for ARX Model Building: (a) Filtered TSA Envelopes used as an ARX

Model Input (b) TSA Vibration Data used as an ARX Model Output

24

As mentioned above, least squares algorithm is applied to estimate the ARX model parameters.

The order of the model is identified by BIC approach that was introduced in Section 2.4.2.

Figure 2.8 shows that the minimum value of BIC for our ARX model is obtained for 0=b , 3=s

and 66=r .

Figure 2.8 ARX Model Order Selection Using BIC Criterion

The parameters of this ARX model ( ) ( ) )( )( )1( )( 3 66 kkkk NxBωyBδ y ++−= were obtained as

follows.

( ) 2 3 5991.0674.1082.1 BBBω +−= (2.22)

and

25

( )

66 65 64 63 62 61

60 59 58 57 56 55

54 53 52 51 50 49

48 47 46 45 44 43

42 41 40 39 38 37

36 35 34 33 32 31

30 29 28 27 26 25

24 23 22 21 20 19

18 17 16 15 14 13

12 11 10 9 8 7

6 5 4 3 2 66

0.196-0.23730.0418-0.04786 0.01407 - 0.05247

0.03023-0.01164-0.00834-0.02966 0.03620.007412

0.06028 0.119 - 0.02916 0.02914 - 0.0716 0.03102-

0.024 0.02912 0.02091 0.02584 0.04177 0.01523-

0.05132 0.02264 0.02903 - 0.123 0.08044 - 0.02661

0.1782 - 0.2217 0.2306 - 0.01246 0.1178 0.03604

0.012860.001733 0.05813-0.0011250.10340.1493 -

0.09999 0.06914 0.2211 - 0.1128 0.1622 0.136 -

0.0005206-0.0126 0.03125-0.04267 0.1695-0.1699

0.1243 - 0.01803 - 0.06916 0.1159- 0.08425 0.011 -

0.11690.008993 0.12930.29651.01-1.598

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

Bδ

+++

+−+

+++

+++−+

+−++

++++

+−++

++++

+++

++

−+++=

(2.23)

The residual checking described in Box et al. [19] is used to examine whether the ARX model is

correct. Figure 2.9 shows the autocorrelations of the ARX model residuals )( kN using the

data from files194 to 246. All autocorrelations are very close to zero and do not show any

marked correlation pattern. Hence, the residual checking indicates that the model is adequate

and correct.

Figure 2.9 Autocorrelation Function for ARX Model Residuals

26

Now, we compute the ARX residuals for each data file from 194 to 338.

Figure 2.10 shows the ARX residuals for data files 254, 283, 259, 288, 261, and 290. The gear

failure still cannot be easily detected using the ARX residuals, so the F-test is introduced to

analyze the ARX model residuals. The results of the F-test can tell us not only when the gear

failure occurs, but also where the gear failure is localized.

Figure 2.10 ARX Residuals: (a) Healthy State 300% Output Torque, (b) Faulty State at 300%

Output Torque; (c) Healthy State at 100% Output Torque, (d) Faulty State at 100% Output

Torque; (e) Healthy State at 50% Output Torque, (f) Faulty State at 50% Output Torque.

2.5 Hypothesis testing The variance of residuals shows a strong relationship with the state of each gear tooth, so

variance of the ARX model residuals is investigated in this research. After the ARX model

residuals are obtained, an F-test is applied to the variance of the residuals. More details about F-

test are given in Appendix B. The F-test result is then used as indicator to decide whether the

27

gear default occurred or not, which reduces the complexity of the decision making based on the

gear motion residuals.

First, the 70-tooth driven gear is divided into 10 sections so that each section contains 7 adjacent

teeth corresponding to 36 degrees out of the total of 360 degrees of one shaft rotation. The ARX

model residuals are uncorrelated and independent when the teeth are in the healthy-state. Hence,

the model residuals would be normally distributed with zero mean. In the healthy state, it is

reasonable to assume that all the gears are identical, so we denote the population variance of the

ARX residuals of each data file as 2 σ , the sample variance of the sn residuals in each section as

2 s S , and the sample variance of the nn residuals of the other 63 teeth as 2

nS . Then, F-statistic is

utilized to test the equality of the population variance 2 sσ and the no section variance 2

nσ .

Hence, we test the hypothesis

221

220

n

s

n

s

: H

: H

σσ

σσ

>

= (2.24)

using the following F-statistic

,2

2

, n

s

SSF

ns=νν (2.25)

where 1 s −= snν and 1 −= nn nν are degrees of freedom. Since we have a one sided test, the

rejection of the null hypothesis 0 H means that the variance 2 sσ is greater than 2

nσ . The

residuals of the selected section are more variable than the residuals of other sections, which

indicate a gear fault occurrence in this section.

Figure 2.11 shows that there is no rejection of the null hypothesis 0 H for gear teeth 1~35 and

57~70. In the section containing gear teeth 43~49, the 0 H is rejected starting from the data file

28

283, which indicates that the gear fault occurred. Also, for the sections of 36~42 and 50~56, the

hypothesis 0 H is rejected starting from the data files 335 and 338, respectively.

Figure 2.11 F-test of ARX Model Residuals (1 Indicates Rejection of 0 H )

2.6 Conclusions In this section, a gearbox fault detection scheme was proposed using vibration data collected

under varying load condition. A modified TSA algorithm was introduced to compute a TSA

signal for an ARX model building. An ARX model takes into account the effect of the

exogenous input variables on the output variables. In this research, the ARX model was fitted

using the filtered TSA envelope as an exogenous input and the TSA signal as the output. In each

data file, the TSA signal includes about 2500 data points collected in 10 sec. The model order is

66 and the load varying information is included in the TSA envelopes. The parameters of the

ARX model were estimated using the least squares algorithm, and the order of the ARX model

29

was determined using BIC criterion. Then, the F -test was applied to the variances of the ARX

model residuals for each data file, and the corresponding p-value was proposed as a quantitative

indicator of fault advancement over a full lifetime of a gearbox. The method presented in this

section was compared with the method in Zhan & Makis [61]. The data from the test-run # 14

was analyzed in both studies. The method presented in this chapter can detect a gear fault

occurrence earlier than their method. As shown in Figure 2.11, the gear defects started from

section of 43~59 and spread to the adjacent sections later. At the end of the testing, the gearbox

was checked and seven teeth were broken and spread in 3 adjacent sections, which gives

evidence that the ARX model can localize the gear defects, estimate the severity of the gear

defects and describe the gear deteriorating process. Also, our method is much faster and hence,

very suitable for on-line gearbox fault detection and damage localization. The results obtained in

this chapter are very promising and more research should be done in this area.

30

Chapter 3 POHM Model for Monitoring Gear Deterioration Condition

3.1 Introduction In Chapter 2, an early gear fault detection scheme was designed and validated using real

vibration data. The state of the gear in early detection can be classified into one of two states: 1)

there is no gear defect; 2) there exist some minor defects. The F-test statistical method was

applied to infer the state of the gear. The result of the F-test can be 0 or 1. If the result is 0, we

can infer that there is no gear defect. If the result is 1, we can conclude that gear defect has

started to develop; the test system should be stopped and inspected. If there was no gear defect

on inspection, we concluded the F-test gave a false alarm. If there existed some minor defects

on inspection, then maintenance action must be taken immediately to avoid a major breakdown.

Based on the above description, the states of the gear system cannot be observed directly. The

actual states are hidden and unobservable. In situations other than complete system breakdown,

the only way to observe the actual state is to stop and inspect the system. In many situations, the

cost of inspection is very high. The ability to model the deterioration of the gear transmission

system will allow for optimization of the schedule for performing inspection and preventive

maintenance. Furthermore, the model should consider both the unobservable states, also called

hidden states, as well as the observable states.

In this chapter, a partially observable hidden Markov (POHM) model is used to model the

deterioration process of the gear transmission system, and the stochastic relationship between the

condition monitoring data and the system states. POHM model considers both unobservable and

observable states, and it is a generalization of the Hidden Markov model. Hidden Markov model

considers only the unobservable process, and is an extension of a Markov model where the

31

observations are probabilistic functions of the states. A hidden Markov chain is a random

process that involves a finite number of states. These states are linked by possible transitions,

and they are stochastically related to an observation process. The state transition is only

dependent on the current state and not on the past states. The actual sequence of states is not

observable. Hidden Markov models have been successfully used in the research area of speech

recognition (e.g. Lawrence [44]), and later also in the area of system diagnostics and fault

detection （e.g. Daidone et al. [1], Ge et al. [36], LeGland & Mevel [20] Nelwamondo et al.

[21], Bunks et al. [7], Miao & Makis [42], and Ocak & Loparo [30]）.

POHM models have been used in various maintenance applications. In Jiang & Makis [58], a

POHM model was proposed to describe a partly observable deteriorating system. The state of

the POHM model evolves as a continuous-time homogeneous Markov process with a finite state

space. With the exception of the breakdown state, the system states are not observable.

However the unobservable states can be estimated using the condition monitoring data. Lin &

Makis [13] considered a filtering and parameter estimation problem for a POHM process with

discrete range observations. In Lin & Makis [14], a general recursive filter for the POHM model

was presented with continuous range observations. Both papers derived recursive formulas for

the state estimation of POHM models. Lin & Makis [15] provided two on-line parameter

estimation algorithms, a recursive maximum likelihood algorithm and a recursive prediction

error algorithm. Kim & Makis [37] derived explicit formulas for the POHM model parameter

estimation by maximizing pseudo likelihood function using the expectation-maximization (EM)

iterative algorithm.

In the following sections, we demonstrate how a POHM model is developed using the full

lifetime condition monitoring data for a gear transmission system. The data were collected from

32

the MDTB which operated from new condition to a failure. The condition monitoring data

includes the motor current data acquired by sensor V04, and the vibration data acquired by

accelerometers A02 and A03. The gear deterioration process is first considered as a continuous-

time homogeneous Markov process with three states: good, warning, and breakdown. The good

state and warning state are considered to be unobservable, and the breakdown state is considered

to be observable. The stochastic relationship between the gear unobservable states and the

residual observations are described by multivariate normal distributions. The residual

observations are obtained from the condition monitoring data by the following steps: first,

following the method introduced in Chapter 2, ARX model residuals are calculated from

vibration data collected by accelerometers A02 and A03, second, we compute the variance of the

ARX residuals for each file and the mean of the motor current data of each data file when the

gear system is in good state (data files 194-246). Then a polynomial model is built to establish

the relationship between the variance of ARX residuals and the mean of motor current data.

Residual observation is the difference between the variance of the ARX residuals and the

estimated variance by the polynomial model using the mean of motor current data as input.

Later the residual observation extracted from each data file is applied to estimate the POHM

model parameters.

The POHM model describes this gear deterioration process where some internal states of a gear

transmission system are unknown and can only be inferred from external observations (section

3.2.). The model parameters are estimated by maximizing pseudo likelihood function using the

EM iterative algorithm in section 3.3. Section 3.4 presents a polynomial model to extract

residual observation from the condition monitoring data. Additionally, a numerical method is

implemented to estimate the POHM model parameters using the motor current data acquired by

V04, and vibration data collected by A02 and A03.

33

3.2 Model Formulation of Gear Deterioration Process

In this study, a gear deterioration process was classified to be either in good state ( 1 s ), warning

state ( 2 s ), or breakdown state ( 3 s ). 1 s and 2 s are classified as operational states and 2 s is

worse than 1 s . 3 s is a non-operational state and is worse than the operational states.

State 1 s can be considered as either a new gearbox or a gearbox with normal wearing. It is not

necessary to take any maintenance action for 1 s . For 2 s , preventive maintenance action must be

taken immediately to avoid a major breakdown. For example, when small fatigue cracks are

initiated or one tooth breaks, this minor gear defect will rapidly propagate to the adjacent teeth of

this gear and the teeth of its meshing gears if no repair or replacement is applied immediately.

The gear defects will also spread to the rest of mechanical parts, such as the gear shafts or

bearings, as the number of the broken teeth increases. When the gearbox stops working properly,

the system is considered to have broken down and to be in 3 s . In the breakdown state,

considerable maintenance costs will be incurred.

3.2.1 Model Assumptions

The following are model assumptions for the deterioration process of a gear transmission system:

1) The deterioration process cannot enhance its condition except by replacement or repair. For

example, it is impossible for a damaged gear to fix itself without applying external

maintenance. Hence, it is impossible for the gear state transiting from a worse state to a

better state, such as 0 1 2 =→ ssPr , 0 2 3 =→ ssPr and 0 1 3 =→ ssPr .

2) The time spent in each state (sojourn time) is a random variable denoted as isT , and it is

assumed to follow an exponential distribution with mean 2 ,1 ,/1 =iv i .

34

3) During the operation period of a gearbox, the operational states ( 1 s and 2 s ) are normally

unobservable except through an inspection. However, the condition monitoring data contain

the information that can be used to infer the state of an operating gear transmission system.

For example, small fatigue cracks or a broken tooth will affect signature of the vibration data.

These vibration signature changes can be applied to infer about the gear operational states.

4) Breakdown state 3 s can be observed directly. The cost of 3 s is higher than that of the

preventive maintenance cost for warning state 2 s . If this assumption is not satisfied, it

would be more economical to replace the gear transmission system after it breaks down

instead of applying preventive maintenance.

3.2.2 Continuous-Time Markov Model for Gear Deterioration Process

Based on the model assumptions, the gear deterioration process ( ) tt X +∈= RX : evolves as

a continuous-time homogeneous Markov process with state space 31 , ≤≤= isS iX . The

transition probability matrix is expressed as

( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) X j i

j i Ssstrtr

trtrtrtrtrt 2 22 2

2 11 12 11 1

, ,100

1 0 1

∈⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−−

==R (3.1)

where ( ) ( ) ip jp ij stXsttXtr ==+= | )( Pr which is the transition probability that the gear is

in state js at time )( tt p + given that the gear was in is at time pt . Since the sojourn time i

Ts is

assumed to follow an exponential distribution, the transition probability )(tr ij is a function of the

time interval t and is independent of the starting time pt ， that is

35

( ) ( ) ( ) ( ) i j ip jp ij sXstXstXsttXtr =====+= 0||)( PrPr (3.2)

Denoting ) , ,( 3 2 3 1 2 1 λλλ=Λ as transition rate parameters, the instantaneous transition rate

matrix of the continuous-time Markov chain can be written as

( )

X j i Sssq 23 23

13 12 3 112

ij , , 000

0 ∈⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡−

+−== λλ

λλλλQ (3.3)

where ijq is called instantaneous transition rate and it is interpreted as the rate when the process

makes a transition into js from is .

( ) ( )( )

∑≠

→→

−=

∈≠==

==

ij ij ii

X j i i j

t

ij

t ij

qq

Ssst

sXstXttr

q 00 ,

0|lim

)(lim

Pr

(3.4)

For the continuous-time Markov process, the expected sojourn time ( ) is vTi /1E = can also be

represented as the function of the transition rate parameters through the following equations

03 3 3

3 2 2 2 2

3 112 1 1 1

=−==−=

+=−=

qvqvqv

λλλ

(3.5)

where iv is interpreted as the rate at which the process makes a transition from state is .

The embedded discrete-time Markov transition probability is the probability of the process

transiting into js is after it leaves is , and is given by the following

36

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡++

==100100

0

3 1 2 1

3 1

3 1 2 1

2 1

λλλ

λλλ

j ipP (3.6)

where )( 3 1 2 1 2 1 2 1 λλλ += p is the probability that the gear deterioration process ( )tX transits

into the warning state 2 s from 1 s ; )( 3 1 2 1 3 1 3 1 λλλ += p is the probability that the process

transits into the breakdown state 3 s from 1 s . 13 2 = p indicates that the process can only transit

into breakdown state once the process leaves the warning state 2 s . 13 3 = p indicates that the

state 3 s is an absorption state which means that the gear will stay in the breakdown state 3 s if

no repair or replacement is applied. The embedded transition matrix shows that all 0 = j ip

when ji > to satisfy the assumption that the gear deterioration process cannot transit into a

better state from a worse state without applying external maintenance.

Using the expected sojourn time iv /1 and the instantaneous transition rate j iq , Kolmogorov’s

backward differential equations can be derived and solved to find the continuous-time transition

probability matrix ( )tR . Kolmogorov’s backward differential equations (see e.g. Ross [48]) are

given as

0 and ,, allfor , )()()(' ≥∈−= ∑≠

tSsstrvtrqtr X j iijiik

kjikij (3.7)

Solving the above differential equations, the gear states transition matrix can be represented with

respect to the transition parameters 2 1 λ , 3 1 λ , and 3 2 λ as

37

( ) ( )

( )( )( ) ( ) ( )

( ) X j i

t

ttt

j i Ssstre

trtreee

trt 2 2

2 11 13 2 3 1 2 1

2 1

, ,

100

10

1

3 2

3 1 2 1 3 2 3 1 2 1

∈

⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

−

−−−+

−

== −

+−−+−

λ

λλλλλ

λλλλ

R (3.8)

In this section, a three-state homogeneous continuous-time Markov model was developed to

describe the deterioration process of the gear transmission system. The process can only be in

one of the two operational states, or in the breakdown state. The two operational states cannot be

directly observed, but inference can be made about their states from the condition monitoring

data.

3.2.3 Stochastic Relationship between Residual Observation and Operational States

A statistical method is proposed for finding the stochastic relationship between gear operational

states and residual observations. In the CBM for a gear transmission system, the warning state is

the critical state for performing preventive maintenance in order to avoid considerable

breakdown costs. The objective of this section is to decide when the deterioration process

transits into the warning state 2 s using the residual observations extracted from the condition

monitoring data. The condition monitoring data are discrete signals collected at time it , so the

residual observations are also discrete signals. The procedure for extracting residual

observations from condition monitoring data will be presented in section 3.4.

The interval between the time of data collection 1 −it and it is denoted as 1 −−= iii tth . The time

to breakdown is ft . If failures occur in the time interval ih (i.e. ifi ttt 1 <<− ), the system enters

a breakdown state and considerable breakdown costs will be incurred. If no failure occurs in the

38

time interval ih , (i.e. fi tt < ), the i -th condition monitoring data will be collected. The i -th

residual observation contains the information that will be used to infer the unobservable state of

the gear system.

The residual observation that is extracted from the condition monitoring data collected at time it

is denoted ( )it y . For vibration data collected by q accelerometers, the residual observation i Y

is the q dimensional vector

( ) ( ) ( ) ( )[ ]T 2 1 ,,, iqiii tytytyt K=y (3.9)

If the gear deterioration process is in good state 1 s , the variation in the residual observation is

only caused by the background noise. ( )it y follows a multivariate normal distribution with

probability density function given by

( ) ( )( )( )

( )( ) ( )( )⎥⎦⎤

⎢⎣⎡ −−−== 1

1- 1

T1 2/1

1 2 1 | 2

1 exp 2

1 | μyΣμyΣ

y ii/q iiXY ttstXtfπ

(3.10)

where mean vector ( )( ) ( )( ) ( )( ) ( )( )[ ]T 2 1 1 EEEE iqiii tytytyt L== yμ . ( )• E is the expected

value of a random variable, the covariance matrix ( )( ) ( )( ) ( )( )[ ]T1 1 1 --Ecov μyμyyΣ iii ttt == is

a symmetric matrix, and T A denotes the transpose of the matrix A .

If the gear deterioration process transits into warning state 2 s in the time interval ih , the

variation in the data will also contain the signatures caused by the minor gear defects. The

residual observation i y will follow a multivariate normal distribution with covariance matrix

2 Σ and mean vector 2 μ

39

( ) ( )( )( )

( )( ) ( )( )⎥⎦⎤

⎢⎣⎡ −−−== 2

1- 2

T2 2/1

2 2 2 | 2

1 exp 2

1 | μyΣμyΣ

y ii/q iiXY ttstXtfπ

(3.11)

The above two equations give us the probability density functions of the residual observations

given the state of the deterioration process at time it . However, the operational states are

unobservable, so a POHM model will be constructed to infer about the probability of the

operational states at time it from the residual observations.

3.2.4 POHM model

POHM model is a Markovian-based model where some sates cannot be observed directly, and

the observations are probabilistic functions of the unobservable states. A POHM model for the

gear deterioration process is characterized by three components defined as

, , πFR Ω = (3.12)

The first component R is the continuous-time homogenous Markov process transition

probability distribution which was defined by Equation (3.8) in section 3.2.2. The second

component F is the observation probability distribution which was defined by Equations (3.10)

and (3.11) in section 3.2.3. The last component iπ =π is the initial distribution given by

( ) 1,2,3 i , i 0 === stXπ i Pr (3.13)

which is interpreted as the probability of is being the initial state. Since the deterioration

process starts in a good state at time 00 =t , the probabilities of is being the initial state are

40

( ) ( ) ( ) 0

0 1

3 0 3

2 0 2

1 0 1

=========

stXπstXπstXπ

PrPrPr

(3.14)

There are two sets of unknown parameters in the POHM model: the relation parameters

) , , , ( 2 2 1 1 ΣμΣμΨ = that describe the stochastic relationship between the residual

observations and the unobservable states and the transition rate parameters ) , ,(: 3 2 3 1 2 1 λλλΛ

that describe the evolution of the gear deterioration process. The POHM model can also be

represented as a function with respect to these two set of parameters as

( )ΛΨΩ ,Ω= f (3.15)

Given an residual observation ( ) ( ) ( ) ( )[ ]T 2 1 ,,, iqiii tytytyt K=y at time it with fi tt < , the

probability element of the residual observation can be calculated as,

( )( ) ( ) ( )( ) ( ) ( )( )

( ) ( )( ) 3 3 0

2 2 0 1 1 0

|,;

|,; |,; |

π

ππ

⋅=+

⋅=+⋅==

ΛΨyPr

ΛΨyPr ΛΨyPr ΩyPr

stXt

stXtstXtt

i

iii

(3.16)

Using the results of Equation (3.12), the above equation can be written as

( )( ) ( ) ( )( )

( ) ( ) ( )( ) ( ) ( )( )

( ) ( ) ( )( ) ( ) ( )( )1 0 2 1 0 2

1 0 1 1 0 1

1 0

|,;, |,;

|,;, |,;

|,; |

stXstXstXstXt

stXstXstXstXt

stXtt

iii

iii

ii

==⋅==+

==⋅===

==

ΛΨPrΛΨyPr

ΛΨPr ΛΨyPr

ΛΨyPr ΩyPr (3.17)

The deterioration process is a Markov process, so the above equation can be simplified as

41

( )( ) ( ) ( )( ) ( ) ( )( )

( ) ( )( ) ( ) ( )( ) ΛΨPrΛΨyPr

ΛΨPr ΛΨyPr ΩyPr

1 0 2 2

1 0 1 1

|,; |,;

|,; |,; |

stXstXstXt

stXstXstXtt

iii

iiii

==⋅=+

==⋅==

(3.18)

3.3 Parameter Estimation Using EM Algorithm

A POHM model was built in the previous section to infer about the gear unobservable states 1 s

and 2 s . using residual observations ( ) ( ) ( ) ( )[ ]T 2 1 ,,, iqiii tytytyt K=y at time it with

fi tt < . The POHM model ( )ΛΨΩ ,Ω= f was constructed based on the assumption that the

relation parameters ) , , , ( 2 2 1 1 ΣμΣμΨ = and the transition rate parameters

) , ,( 3 2 3 1 2 1 λλλ=Λ are given.

In this section, the formulas to estimate the POHM model parameters using the history data will

be introduced. First, we will construct the pseudo-likelihood function that contains the relation

parameter Ψ and the transition rate parameter Λ . Using the history data, the formulas for the

parameter estimation are obtained by maximizing the pseudo likelihood function using the EM

iteration algorithm.

3.3.1 Random Breakdown Time of a Gear Deterioration Process

The observable breakdown time of the gear system is considered as a random variable and it is

denoted by fT . If no preventive maintenance action is applied, the probability density function

of the breakdown time fT can be written as

( ) [ ]fff

f

tvtv

tv fT ee

vvvvppevptf

2 1

2 1 3 22 1

1 3 1

1 2 1

; −−− −

−+=Λ (3.19)

42

The gear deterioration process starts in a good state 1 s at time 00 =t . There are two paths

representing how the process can make transition to breakdown state 3 s at time 0 >ft as shown

in Figure 3.1. The upper path represents direct entry into breakdown state 3 s immediately after

it leaves the good state 1 s at time ft . The lower path represents entry into breakdown state 3 s

from the warning state 2 s at time ft , where transit into warning state 2 s from good state 1 s

occurs at some time 1 st such that fs tt 1

0 << .

te 1 1

νν − s2

s3

0 ft timet

s1

s1

12p

s1

s2

)(2

2 tt fe −−νν

fte 1 1

νν − 13p

32p

Figure 3.1 Transition Paths from Good State to Breakdown State

We first derive the probability density function of the upper path in Figure 3.1. The probability

density that the process leaves 1 s at time 0 >ft (the sojourn time of the state 1 s ) is

exponentially distributed as

( ) ftvfT evtf

1 1

1s ; −=Λ (3.20)

Based on the transition matrix of the embedded discrete Markov chain in Equation (3.6), the

probability that the process enters 3 s after it leaves the state 1 s is )( 3 1 2 1 3 1 3 1 λλλ += p . Since

the event that the process leaves 1 s and the event that the process enters 3 s are independent, the

joint probability density function for the process leaving 1 s at time ft and entering 3 s is

43

( ) ( ) 13

1 13

1

1 ;; pevptftf f

sf

tvfTf

uT ⋅=⋅= −ΛΛ (3.21)

We now derive the probability density function for the lower path in Figure 3.1. The probability

density function that process leaves 1 s at time 1 st with fs tt 1

0 << is

( ) 1 1

1s

1 ; stv

T evtf −=Λ (3.22)

The probability that the deteriorating process enters 2 s after it leaves the state 1 s is

)( 3 1 2 1 2 1 2 1 λλλ += p . The joint probability density function that the process enters 2 s at time

1 st is

( ) ( ) 1 1

1 1 1 1 1 2

1 12 12 , ; ;, s

sss

tvsTssTT evpptfttf −=⋅= ΛΛ (3.23)

The process will enter 3 s at time ft . Hence, the time interval that the process stays in the 2 s is

)(1 sf tt − . The joint probability density function that process makes a transition into 3 s at time

ft is given by

( ) ( )1 2

1 1 2

2 23 , ;, sf

sf

ttvsfsfTT evpttttf −−=−− Λ (3.24)

The probability density function of the lower path is obtained by integrating 1 st from 0 to ft ,

( ) ( ) ( )

( ) ( )( )[ ]ff

f sfs

f

sssff

tvtv

t

sttvtv

t

ssfsfTTssTTflT

eevv

vvpp

dtevpevp

dtttttfttftf

2 1

2 1 3 22 1

0

2 23

1 21

0 , ,

1 2

1

1 2 1 1

1 1 1 1 2 1 1 2

;,;, ;

−−

−−−

−−

=

⋅=

−−⋅=

∫

∫ ΛΛΛ

(3.25)

44

Thus, the probability density function of the breakdown time in Equation (3.19) can be obtained

by adding the probability density functions of the upper path and lower path:

( ) ( ) ( )ΛΛΛ ;;;

f

lTf

uTfT tftftf

fff+= (3.26)

3.3.2 Sojourn Time of the Deterioration Process in Good State

The sojourn time of the gear deterioration process in good state 1 s is to be considered as a

random variable and denoted by 1 sT . The conditional probability density function of

1 sT , given

the failure time ff tT = and fs tt 1 0 << , can be written as

( ) ( )( )Λ

ΛΛ

;

;,|;

, |

1 1

1 1

fT

sfTTffsTT

tf

ttftTtf

f

sf

fs== (3.27)

The denominator of the above equation is given by Equation (3.19), and the numerator is the

lower path of Figure 3.1 and is given by

( ) ( ) ( ) )( 2 3 2

1 21 , , ,

1 2 1 1

2 1 1 1 2 1 1 ;,;,;, sfs

sfsssf

ttv

tvffTTssTTsfTT evpevpttfttfttf −−− ⋅=⋅= ΛΛΛ (3.28)

If fs tt 1 < , the process cannot directly enter 3 s after it leaves from 1 s at time

1 st . This is due to

the fact that if the process enters 3 s at this time 1 st , we have fs tt 1

= , which contradicts with

fs tt 1 < . Thus, the process can only enter 2 s after the process leaves the state 1 s with the

probability )( 3 1 2 1 2 1 2 1 λλλ += p . Given the breakdown time as ft , the probability density

function of the sojourn time of the gear deterioration process in good state 1 s for fs tt 1 < is

45

( )[ ]fff

sfs

fs tvtv

tv

ttv

tv

ffsTT

eevv

vvppevp

evppevtTtf

2 1

2 1 3 22 1

1 3 1

)( 2 3 221

1

| 1 2 1

1 2 1 1

1 1 |;

−−−

−−−

−−

+

⋅⋅==Λ

(3.29)

If fs tt 1 0 =< , we have the following conditional probability function

( ) ( )( )Λ

ΛΛ

;

;,|;

, |

1

1

fT

ffTTfffTT

tf

ttftTtf

f

sf

fs== (3.30)

where ( ) 31

1 , 1

1 ;, pevttf f

sf

tvffTT ⋅= −Λ is the probability of the process entering state 3 s right

after it leaves state 1 s at time ft . Hence, we have

( )[ ]fff

f

fs tvtv

tv

tv

fffTT

eevv

vvppevp

pevtTtf

2 1

2 1 3 22 1

1 3 1

31

1 |

1 2 1

1

1 |;

−−−

−

−−

+

⋅==Λ

(3.31)

3.3.3 Parameter Estimation using the EM Algorithm

The probability distribution of a residual observation i Y was described as multivariate normal

distribution based on the states of the gear deteriorating process at time it . Assuming the

residual observations 1 Y , 2 Y ,…, n Y are independent, and the joint probability density function

of the residual observations given the sojourn time of state 1 s and the breakdown time ft is

( )

( )

( ) ( )( )ffssnTT

ffssTTffssTT

ffssnTT

ffssTT

tTtTf

tTtTftTtTf

tTtTf

tTtTf

fs

fsfs

fs

fs

,|

2 ,| 1 ,|

2 1 ,|

,|

, |,;

, |,;, |,;

, |,;,,,

, |,;

1 1 1

1 1 1 1 1 1

1 1 1

1 1 1

==

==⋅===

===

==

ΛΨy

ΛΨyΛΨy

ΛΨyyy

ΛΨw

Y

YY

W

W

K

K

(3.32)

46

where ],,,[ 2 1 nyyyw K= is a full lifetime historical residual observations obtained from a gear

transmission system from a good state to a breakdown state. Denote kt , nk <≤1 , as the first

data collection epoch after the process leaves the good state 1 s , i.e. ksk ttt <<− 1 1 . The above

equation becomes

( )

( )( ) ( )( )

( )( ) ( )( )

( )( ) ( )

( )( ) ( )

⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

⎥⎦⎤

⎢⎣⎡ −−−

⋅⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

⎥⎦⎤

⎢⎣⎡ −−−=

==⋅

===

==

∏

∏

=

−

=

−−

n

kiii/q

k

iii/q

nn kk

kk

ffssTT

tt

tt

stXfstXf

stXfstXf

tTtTffs

2 1- 2

T2 2/1

2 2

1

11

1- 1

T1 2/1

1 2

2 2

11 1 11 1

,|

)()(21 exp

21

)()(21 exp

21

|; |;

|; |;

, |,;1 1 1

μyΣμyΣ

μyΣμyΣ

ΨyΨy

ΨyΨy

ΛΨw

X|YX|Y

X|YX|Y

W

π

π

L

L

(3.33)

In order to estimate the relation parameterΨ and the transition rate parameter Λ , the following

steps are taken: m histories data ( 1 w , 2 w ,…, m w ) of residual observations are obtained, the

breakdown epoch is denoted as jft , and the sojourn time of good state is denoted as j

st 1 with

mj ≤≤1 . Since the data obtained from different histories are independent, the log-likelihood

function is constructed as

( )

( ) ( ) ( )[ ]

( )[ ]∑=

=

=

=

m

j

js

jfiTT

ms

mfmTTsfTTsfTT

msss

mfffm

ttf

ttfttfttf

ttttttfl

sf

sfsfsf

1

, ,

, ,

2 2 2 , ,

1 1 1 , ,

2 1 2 1 2 1

,;,,ln

,;,,,;,,,;,,ln

,;,,,,,,,,,,,ln

11

11 11 11

111

ΛΨw

ΛΨwΛΨwΛΨw

ΛΨwww

W

WWW L

KKK (3.34)

47

Because the states 1 s and 2 s are unobservable, it is very difficult to directly maximize the above

log-likelihood function and to obtain the optimal estimators for the relation parameter Ψ and the

transition rate parameter Λ . The EM algorithm will be applied to estimate the model parameters.

The EM algorithm is an iterative method for finding the maximum log-likelihood estimators of

parameters in statistical models where the model depends on unobservable variables (Dempster

et al. [2]). The EM includes two steps: an expectation (E) step constructs the pseudo-likelihood

function that is a conditional expectation of the log-likelihood evaluated using the current

estimation; a maximization (M) step computes parameters by maximizing the pseudo-likelihood

function constructed in the E-step. These estimated parameters are then used to determine the

unobservable variable in the next E-step. Given the current estimations Ψ and Λ , the pseudo-

likelihood function is given by

( )

( )

( )[ ]

( )[ ] ( )( )∑

∫∫

∑

∑

=

=

=

⎪⎭

⎪⎬

⎫

⎪⎩

⎪⎨

⎧

==

===

=====

=

m

jt j

sj

ffjj

sTT

t js

jffj

jsjTT

js

jfjTT

m

j

jffj

js

jfiTT

m

i

j

jf

fs

jf

fssf

sf

τdtTτf

τdtTτfτtf

tTttf

Q

Q

1

0

,|

0

| ,

, ,

1

, ,

1

11 1

11 1 11

11

,|ˆ,ˆ;

,|ˆ,ˆ;, ,;,,ln

,,ˆ, ˆ ,;,,ln E

ˆ,ˆ|,

ˆ,ˆ|,

wWΛΨ

wWΛΨwΛΨw

wWΛΛΨΨΛΨw

ΛΨΛΨ

ΛΨΛΨ

W

WW

W

(3.35)

The first term inside the integration of the numerator of Equation (3.35) is separated into two

terms

48

( )( ) ( ) ( )[ ]

( ) ( ) ( )

( ) ( ) ( )

( ) ( )[ ] ( )jff

jssjTT

jff

jsTT

jfT

jff

jssjTT

jff

jsTT

jfT

jff

jssjTT

jff

jsTT

jfT

jff

jssjTT

jff

jsTT

jfT

js

jfjTT

tTτTftTτftf

tTτTftTτftf

tTτTftTτftf

tTτTftTτftf

τtf

fsfsf

fsfsf

fsfsf

fsfsf

sf

,|

|

,|

|

,|

|

,|

|

, ,

,| ; ln| ; ; ln

,| ; ln| ; ln ; ln

,|, ; ln|, ; ln, ; ln

,|, ; |, ; , ; ln

, ;,, ln

11 1 1 1

11 1 1 1

11 1 1 1

11 1 1 1

11

==+==

==+=+=

==+=+=

====

ΨwΛΛ

ΨwΛΛ

ΛΨwΛΨΛΨ

ΛΨwΛΨΛΨ

ΛΨw

W

W

W

W

W(3.36)

Given j w and jft , the second term inside the integration of the numerator of Equation (3.35) can

be decomposed as follows

( )( ) ( )

( )jffjT

jff

isTT

jff

jssjTT

jffjj

jsTT

tTf

tTτftTτTf

tTτf

f

fsfs

fs

|

|

,|

,|

|ˆ,ˆ;

|ˆ; ,|ˆ;

,|ˆ,ˆ ;

1 1 11 1

1 1

=

=⋅===

==

ΛΨw

ΛΨw

wWΛΨ

W

W

W

(3.37)

The term inside the integration of the denominator of Equation (3.35) is the same as Equation

(3.37). The pseudo-likelihood function can be rewritten as

( ) ( ) ( )ΛΨΨΛΨΛΛΨΛΨ ΨΛˆ,ˆ| ˆ,ˆ| ˆ,ˆ|, QQQ += (3.38)

where

( )( ) ( )[ ] ( ) ( )

( ) ( )( )

( )[ ] ( ) ( )( ) ( )∑

∫∫

∑∫

∫

=

=

=⋅==

=⋅=====

=⋅==

=⋅====

m

it j

sj

ffi

sTTj

ffj

ssjTT

t js

jff

isTT

jff

jssjTT

jff

jssjTT

m

it j

sj

ffi

sTTj

ffj

ssjTT

t js

jff

isTT

jff

jssjTT

jff

jsTT

jfT

jf

fsfs

jf

fsfsfs

jf

fsfs

jf

fsfsfsf

dtTτftTτTf

dtTτftTτTftTτTf

Q

dtTτftTτTf

dtTτftTτTftTτftf

Q

10

|

,|

0

|

,|

,|

10

|

,|

0

|

,|

|

11 1 11 1

11 1 11 1 11 1

11 1 11 1

11 1 11 1 1 1

|ˆ; ,|ˆ;

|ˆ; ,|ˆ; ,| ; ln

ˆ,ˆ|

|ˆ; ,|ˆ;

|ˆ; ,|ˆ; | ; ; ln

ˆ,ˆ|

τ

τ

τ

τ

ΛΨw

ΛΨwΨw

ΛΨΨ

ΛΨw

ΛΨwΛΛ

ΛΨΛ

W

WW

Ψ

W

W

Λ

(3.39)

49

The M-step chooses Ψ and Λ by maximizing the ( )ΛΨΛΨ ˆ,ˆ|,Q . The updated Ψ and Λ can

be obtained by solving the following two differential equations:

( ) ( )

( ) ( ) 0

0

1

1

=∂

∂=

∂∂

=∂

∂=

∂∂

∑

∑

=

=

m

i

i

m

i

i

ˆ,ˆ|Qˆ,ˆ|Q

ˆ,ˆ|Qˆ,ˆ|Q

ΨΛΨΨ

ΨΛΨΨ

ΛΛΨΛ

ΛΛΨΛ

ΨΨ

ΛΛ

(3.40)

After the updated Ψ and Λ are calculated from the above equations, the E-step and M-step is

repeated until the update parameters convergence to constant values *Ψ and *Λ .

3.4 Numerical Results 3.4.1 Residual Observations Extraction

In this section, residual observations are extracted from the vibration data collected by

accelerometers A02 and A03. The ARX model residuals are obtained following the procedure

presented in Section 2. In section 2, a time-synchronous averaging (TSA) algorithm was first

applied to average out the contributions of both the noise and signals whose periods are not equal

to the shaft rotation period of the gear of interest. Filtered TSA envelopes were then extracted

from the TSA vibration data for each data file. The files from 194 to 338 were analyzed except

file 212 that was reported unreliable due to some accelerometer problems. All the data files were

separated into two groups. The files 194 - 246 were used to fit the ARX model and the files 194

- 338 were used to estimate the POHM model parameters.

It is possible to build a multivariate ARX models based on the two-dimensional vibration data

collected by both A02 and A03. Since the univariate ARX model for A02 was given in Section

2, the same procedure will be applied to build an ARX model for A03. The ARX model for

accelerometer A03 is ( ) ( ) )( )( )1( )( 3 66 kkkk NxBωyBδ y ++−= and the parameters are given below

50

( ) 2 3 327.1537.221.1 BBBω +−= (3.41)

( )

66 65 64

63 62 61 60 59

58 57 56 55 54

53 52 51 50 49 48

47 46 45 44 43 42

41 40 39 38 37

36 35 34 33 32 31

30 29 28 27 26 25

24 23 22 21 20 19

18 17 16 15 14 13

12 11 10 9 8 7

6 5 4 3 2 65

0.07068 - 0.02268 0.01672

0.06597 0.06862 - 0.0075320.039020.0001627

0.05632-0.01372 004592.00.03815- 0.1147

0.05343 - 06538.0 0.1123 0.1115-0.01921- 0.04052

07984.0 0.03789 0.1312 0.04328 0.09520.001385-

0.02075 0.03948 - 0.0005747 0.0088050.06063

0.06339 - 0.1687 0.2372 - 0.09378- 0.1338 0.0998

0.024840.07188 0.0432107812.00.13380.1265-

1185.0 0.06088- 0.02701 - 0.04363 0.1172- 0.1371

0.02291193.00.27907868.00.2754-0.3689

0.1542- 0.01244- 0.005257-0.117-0.09517 0.05079

07838.02897.00.048560.63621.5692.005

BBB

BBBBB

BBBBB

BBBBBB

BBBBBB

BBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

BBBBBB

Bδ

++

++++

+++

−++

++−++

−+++

+++

+−+−+

+++

+−+−+

++

+−++−=

(3.42)

After we obtain the ARX model residuals for A02 and A03, the standard deviations are

calculated for each data file.

It is well known that a motor current is directly related to the output load of the gear transmission

system and is easier to collect than the load measurement, so the motor current data is included

in condition monitoring data too.

51

Figure 3.2 (a), (b) Residual Standard Deviations of A02 and A03, and (c) Motor Current Mean

The Figure 3.2 shows that the motor current data can be expressed as a proportion of the standard

deviation of ARX model residuals collected by A02 and A03. We then apply least squares

method to estimate first-order polynomial model for each accelerometer. The models are

obtained as

914.140032.0ˆ

2596.130023.0ˆ

A03 A03

A02 A02

+×=

+×=

CS

CS (3.43)

where A02 C and A03 C are the mean of motor current of each data file, and A02 S and A02 S are the

estimated standard deviations of ARX model residuals. For the data files 194-246, the filtered

out standard deviation are plotted in Figure 3.3

52

Figure 3.3 (a) and (b) Residuals of Polynomial Model for A02 and A03

The extracted residual observations for the POHM model parameter estimation can be obtained

by the following equations

( )( )914.140032.0ˆ

2596.130023.0ˆ

A03A03A03A032

A02A02A02A021

+×−=−=

+×−=−=

iiiii

iiiii

CSSSy

CSSSy (3.44)

3.4.2 POHM Parameters Obtained by Expert Judgment

In the testing, the data collection interval is not a constant, some data collection intervals are only

1 minutes, but some intervals are 30 minutes. To simplify the parameter estimation procedure,

we selected the data that were collected with the time interval around 30 minutes, so there are 38

data files: 194, 197, 198, 199, 207, 211, 213, 221, 225, 226, 227, 238, 239, 240, 241, 252, 253,

255, 265, 267, 268, 269, 281, 282, 283, 284, 295, 296, 307, 308, 309, 310, 320, 322, 323, 334,

337, 338. The residual observations are

[ ] ⎥⎦

⎤⎢⎣

⎡==

)()()()()()(

)()()(38 2 2 2 1 2

38 1 2 1 1 1 38 2 1 1 tytyty

tytytyttt

L

LK yyyw (3.45)

53

and 1w is plotted in the following figures

Figure 3.4 Residual Observations Collected by Accelerometers A02 and A03

From the above figures and subjective judgment, the residual observations collected by each

accelerometer can be separated into two groups. The first group contains the data files from 194

to 295, and second group contains files from 296 to 338. Based on the analysis in section 2, the

first group is considered that the gear system is in good state 1 s , and the second group is

considered to be in warning state 2 s . The time interval between each sample is 5.0=Δh hour.

Assuming the process transits into warning state from good state in the middle of the interval of

data collection for files 295 and 296, the sojourn time of the good state is hts Δ= 5.271

. The time

spent in warning state is hts Δ= 5.102

, and the gearbox failure occurred at ht f Δ= 38 . Based on

the history data 1w , the relation parameters can be estimated as

00)(0 3 1 3 1 2 1 3 1 3 1 =⇒=+⇒= λλλλ p (3.46)

The sojourn time of the warning state is estimated as

54

1905.05.05.10

11

2

2 =×

==st

v (3.47)

and 1905.02 3 2 == vλ . The sojourn time of the good state is

07273.05.05.27

11

1

1 =×

==st

v (3.48)

and 07273.03 11 12 =−= λλ v . When the gearbox is in the good state, the mean vector is

calculated by

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡=

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡

=

∑

∑

=

=

8152.1

2356.1

271

271

27

12

27

11

1

i

i

i

i

y

y

μ (3.49)

and the covariance matrix is

( )( ) ⎥⎦

⎤⎢⎣

⎡=−−= ∑

= 5165.223106.173106.170051.14

261 27

11 1 1

i

Tii μyμyΣ (3.50)

When the gearbox is in the warning state, the mean vector is

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡=

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡

=

∑

∑

=

=

6970.37

3266.26

111

111

38

282

38

281

2

i

i

i

i

y

y

μ (3.51)

and the covariance matrix is

55

( )( ) ⎥⎦

⎤⎢⎣

⎡=−−= ∑

= 0404.2856591.1896591.1896485.132

101 38

282 2 2

i

Tii μyμyΣ (3.52)

3.4.3 Estimated POHM Parameters using the EM Iteration Algorithm

In the previous section, the POHM parameters were obtained by assuming the deterioration

process transited into the warning state at time hts Δ= 5.271

, which required subjective judgment.

In this section, the POHM parameters will be estimated using the EM algorithm, and compare

with the results obtain in section 3.4.2 using subjective judgment.

The number of iterations for the EM algorithm is selected as 10, and the EM estimation of mean

vectors is shown in the following figures

Figure 3.5 EM algorithm Estimated Mean Vectors 1 μ and 2 μ

The dotted lines are the mean vectors estimated using subjective judgment, and the solid lines are

the mean vectors estimated using the EM algorithm. The above figures show that the initial

values are selected as 0. The estimated mean values converge to the mean vectors obtained in

previous section. Starting with the 6-th iteration, relative differences are less than 0.3%. The

56

following table gives the estimated mean and its convergence that is measured by the absolute

changes of the updated estimation. Starting from iteration 6, the increments of all estimated

means are less than 50.1 −E .

Table 3.1 EM Algorithm Iteration Results: Mean Vectors 1 μ and 2 μ

A02 A03

1 μ 2 μ 1 μ 2 μ

Value Convergence Value Convergence Value Convergence Value Convergence

initial 0 0 0 0

1 -0.6312 1.00E+00 9.4310 1.00E+00 -0.4943 1.00E+00 13.4983 1.00E+00

2 1.1221 1.56E+00 23.9001 6.05E-01 1.6960 1.29E+00 34.1370 6.05E-01

3 1.2261 8.48E-02 26.1742 8.69E-02 1.8059 6.09E-02 37.4684 8.89E-02

4 1.2318 4.64E-03 26.2677 3.56E-03 1.8115 3.08E-03 37.6083 3.72E-03

5 1.2322 2.93E-04 26.2734 2.17E-04 1.8119 1.95E-04 37.6169 2.28E-04

6 1.2322 7.66E-06 26.2735 5.67E-06 1.8119 5.09E-06 37.6171 5.94E-06

7 1.2322 1.93E-07 26.2735 1.43E-07 1.8119 1.28E-07 37.6171 1.50E-07

8 1.2322 4.83E-09 26.2735 3.57E-09 1.8119 3.21E-09 37.6171 3.74E-09

9 1.2322 1.21E-10 26.2735 8.94E-11 1.8119 8.04E-11 37.6171 9.37E-11

10 1.2322 3.03E-12 26.2735 2.24E-12 1.8119 2.01E-12 37.6171 2.35E-12

The following figures show each element of the estimated covariance matrix 1 Σ and 2 Σ for

good state and warning state respectively.

57

Figure 3.6 EM Algorithm Estimated Covariance Parameters 1 Σ and 2 Σ

The initial value for the covariance 1 Σ and 2 Σ estimation is selected as a symmetrical matrix

⎥⎦

⎤⎢⎣

⎡300030

. The estimated covariance matrices converge to our calculated matrix using

subjective judgment, but there exist small discrepancies between covariance calculated by

subjective judgment and the EM algorithm. After 5-th iterations, the discrepancies become less

than 4% for 1 Σ and 9% for 2 Σ . These discrepancies may be caused by standard error to

calculate covariance matrix from very small sample of history data, since only one history is

available for the covariance matrix calculation. Better estimation would be obtained as more

data histories are collected. The following two tables give the EM estimated covariance matrix

and their convergence speed at each iteration. Starting from iteration 6, the increments of all

elements in 1 Σ are less than 50.1 −E .

Table 3.2 EM Algorithm Iteration Results: Covariance Matrix 1 Σ

58

)1,1(1 Σ )1,2()2,1( 1 1 ΣΣ = )2,2(1 Σ

Value Convergence Value Convergence Value Convergence

initial 30 0 30

1 11.3023 1.65E+00 15.3237 1.00E+00 21.6475 3.86E-01

2 16.7311 3.24E-01 20.8341 2.64E-01 27.0157 1.99E-01

3 13.5038 2.39E-01 16.6975 2.48E-01 21.7263 2.43E-01

4 13.4886 1.13E-03 16.6755 1.32E-03 21.6944 1.47E-03

5 13.4883 1.91E-05 16.6749 3.76E-05 21.6932 5.49E-05

6 13.4883 4.39E-07 16.6749 9.34E-07 21.6932 1.40E-06

7 13.4883 1.08E-08 16.6749 2.33E-08 21.6932 3.51E-08

8 13.4883 2.71E-10 16.6749 5.84E-10 21.6932 8.78E-10

9 13.4883 6.77E-12 16.6749 1.46E-11 21.6932 2.20E-11

10 13.4883 1.71E-13 16.6749 3.66E-13 21.6932 5.53E-13

As shown in the next table, the increments of all elements in 2 Σ are less than 50.1 −E after

iteration 6.

Table 3.3 EM Algorithm Iteration Results: Covariance Matrix 2 Σ

)1,1(2 Σ )1,2()2,1( 2 2 ΣΣ = )2,2(2 Σ


initial 30 0 30

1 270.1911 8.89E-01 384.8667 1.00E+00 553.3765 9.46E-01

2 367.2451 2.64E-01 525.9849 2.68E-01 764.8557 2.76E-01

3 128.2178 1.86E+00 183.7324 1.86E+00 275.8238 1.77E+00

4 121.5471 5.49E-02 173.8773 5.67E-02 261.3171 5.55E-02

5 121.4466 8.28E-04 173.7243 8.80E-04 261.0879 8.78E-04

59

6 121.4442 2.00E-05 173.7206 2.13E-05 261.0823 2.12E-05

7 121.4441 4.97E-07 173.7205 5.29E-07 261.0822 5.28E-07

8 121.4441 1.24E-08 173.7205 1.32E-08 261.0822 1.32E-08

9 121.4441 3.11E-10 173.7205 3.32E-10 261.0822 3.31E-10

10 121.4441 7.79E-12 173.7205 8.30E-12 261.0822 8.28E-12

The following figures show EM estimation results of transition rate parameters

) , ,( 3 2 3 1 2 1 λλλ=Λ when the initial value is selected as 0.5.

Figure 3.7 EM Algorithm Estimated States Transition Parameters Λ

The estimated transition parameters converge to the state transition parameters based on the

subjective judgment. Starting from iteration 5, the relative differences between the EM

estimated states transition parameters and our calculated parameters in previous section are less

than 0.3%. The following table gives the estimated values and their convergence speed for each

60

iterative step. Starting from iteration 5, the increments of all estimated parameters are less than

50.1 −E

Table 3.4 EM Algorithm Iteration Results: States Transition Parameters Λ

2 1 λ 3 1 λ 3 2 λ


intial 0.5 0.5 0.5

1 0.5003 6.87E-04 0.0588 7.50E+00 0.0000 4.00E+05

2 0.0764 5.55E+00 0.1691 6.52E-01 0 0

3 0.0729 4.78E-02 0.1892 1.06E-01 0 0

4 0.0728 1.77E-03 0.1900 4.58E-03 0 0

5 0.0728 1.07E-04 0.1901 2.80E-04 0 0

6 0.0728 2.85E-06 0.1901 7.44E-06 0 0

7 0.0728 7.19E-08 0.1901 1.88E-07 0 0

8 0.0728 1.80E-09 0.1901 4.70E-09 0 0

9 0.0728 4.51E-11 0.1901 1.18E-10 0 0

10 0.0728 1.13E-12 0.1901 2.95E-12 0 0

Based on the above analysis, all the parameters estimated using the EM algorithm converged to

the values that calculated using subjective judgments. Some small discrepancies exist for the

covariance matrix estimations. As mentioned, only one history is available for this study. The

accuracy of the estimation should increase if more data histories are collected.

3.5 Conclusions In this chapter, a POHM model was built to describe the relationship among the residual

observations and the three states of a gear deterioration process. Once the POHM model was

61

well defined, two sets of parameters were needed to be estimated: 1) the relation parameters

) , , , ( 2 2 1 1 ΣμΣμΨ = that described the stochastic relationship between the residual

observations and the unobservable operational states of gearbox transmission system, and 2) the

transition rate parameters ) , ,( 3 2 3 1 2 1 λλλ=Λ that described the evolution of the gear

deterioration process. It was very difficult to find a feasible solution by directly maximumizing

likelihood function of the POHM model. The EM iterative algorithm was introduced to make

the parameter estimation possible by maximizing a pseudo likelihood function.

Finally, a POHM model was built using full lifetime condition monitoring data obtained from the

MDTB operating from good state to breakdown state. The condition monitoring data files with

equal data collection interval were selected, and residual observations were extracted from motor

current data and vibration data. The EM iterative algorithm provided a feasible solution for the

pseudo-likelihood function of the POHM model. The numerical iteration procedure generated a

reasonable parameter estimates compared to our results from subjective judgment. It was

confirmed that the POHM model can be applied to estimate gear hidden states based on the

condition monitoring data. In the next chapter, a Bayesian process control framework will be

developed to optimize maintenance policy and the schedule the optimal epoch for condition

monitoring data collection.

62

Chapter 4 Bayesian Process Control to Optimize Gearbox CBM Policy

4.1 Introduction In chapter 2, we presented the method to detect an early gear defect occurrence using the

vibration data collected from the gear transmission system testing. There were 338 vibration

data collections during 116 hours. At the end of the test, the system was shut down and

inspection took place. Five fully broken and two partially broken teeth were observed on the

driven gear. In chapter 3, a POHM model was built to describe the gear deterioration process as

well as the stochastic relationship between the condition monitoring data and the unobservable

states. This raises the following question: What will be the best time to stop the gear system for

inspection? If we stop the system and take the inspection too early, there would have been no

gear defect formed. If we stop the system too late, the defect would have propagated to the other

parts of the system and the maintenance costs will be higher than the cost at the optimal

maintenance epoch. We may consider stopping the system every half hour, so that we would not

miss the optimal maintenance epoch. However, the high frequency of inspections will increase

the system downtime and the cost of inspection significantly. In order to find out the optimal

maintenance epoch, a Bayesian process control model is developed to optimize the gear

transmission system inspection schedule by minimizing the total expected cost. Bayesian

approach is used to determine the optimal CBM strategy based on the posterior probability that

the gear deteriorating process is out of control by minimizing the total expected cost over a finite

horizon, or the long-run expected average cost.

In 1952, Girshick & Rubin [34] first introduced a Bayesian approach to the control of a

production process. They showed that the process is stopped for repair when the posterior

63

probability is greater than a control limit. Taylor [28] and [29] have presented a method of

selecting the posterior probability control limit and proven that the Bayesian approach is an

optimal way to determine the control parameters compared with non-Bayesian charts. There are

several Bayesian control charts that have been proposed for monitoring a univariate process

mean in finite production run: Carter [41] was the first that discussed issues related to optimal

selection of sample size and sampling intervals in the Bayesian process control context; Tagaras

[25]and [26] have proposed Bayesian X charts with adaptive sample sizes and sampling

intervals; Porteus & Angelus[18] applied Bayesian charts to identify opportunities of cost

reduction in statistical process control; Nenes & Tagaras [23] derived properties that facilitated

the cost computation and the optimization of a Bayesian chart, and evaluated the effectiveness of

the chart by comparing its costs against the expected costs; Nikolaidis et al. [60] presented the

economic design of Shewhart and Bayesian control charts for monitoring a critical stage of the

main production process of a tile manufacturer. Makis [52] presented a theoretical development

of a multivariate Bayesian control chart with fixed sample size and sampling interval for finite

production runs. He has proven the Bayesian control policy is optimal under standard operating

and cost assumptions.

In this chapter, we propose a Bayesian process control framework, which allows for optimal

computation of the two decision variables: the time interval for next data collection and the

control limit at next data collection epoch. The decision variables are calculated by minimizing

the total expected cost from current stage to the end of production run. The dynamic

programming is applied to update the decision variables after new monitoring data are acquired,

and the maintenance decision is made by taking all available information into account. Section

4.2 presents the problem statement and assumptions. In section 4.3, a Bayesian process control

64

model is derived to formulate the relationship between the decision variables and the total

expected costs. Section 4.4 introduces the development of the maintenance cost function.

In section 4.5, dynamic programming is introduced to optimize the time interval for next data

collection and the control limit at next data collection epoch by minimizing the total expected

cost. Section 4.6 presents numerical results.

4.2 Problem Statement and Assumptions In this chapter, a Bayesian process control model will be developed to make an optimal

maintenance decision for the gear transmission system. The time interval for next data collection

and the control limit at next data collection epoch are calculated based on the on-line condition

monitoring data. If the posterior probability of gear deterioration process exceeds the control

limits, the system will be immediately stopped and inspected. Otherwise, the gear deterioration

process is considered to be in good state 1 s . The system will keep running and the inspection

decision will be made at the next data collection epoch. The Bayesian process control is

designed based on the POHM model that was built in chapter 3. The assumptions of section 3.2.1

are kept for the Bayesian process control model.

Based on the results from chapter 3, the mean vector and covariance matrix of the deterioration

process will increase if the gear state transits from good state 1 s to warning state 2 s . In this

chapter, we assume that the gear deterioration process mean vector shifts 1 μ from 2 μ to with

unchanged covariance matrix i.e. ΣΣΣ == 1 2 , when the gear state transits into 2 s from 1 s .

The shift size of mean vector can be measured by the Mahalanobis distance (M-distance) that is

defined as

65

( ) ( )[ ] 021

1 2 1- T

1 2 >−−= μμΣμμd (4.1)

There are two main reasons to assume that the covariance matrix remains constant after the gear

state transits into 2 s from 1 s . Firstly, the constant covariance matrix dramatically simplifies the

development for the Bayesian control model. Using the constant covariance matrix, the explicit

expression of control limit from the posterior probability can be obtained. Secondly, the constant

covariance matrix reduces the computation time, which makes Bayesian control model more

suitable for on-line application. Hence, we can assume the time interval for each data collection

and analysis can be ignored, and the gear state will not change during the time interval of data

collection and data analysis.

In addition we assume a finite production run, and the length of the production run is denoted as

Ht . At the end of the production run the gear system will be stopped and a full inspection will

be taken no matter what maintenance decision is made by the control chart. If any defect is

found during the inspection, the gear system will be repaired or replaced and the gear

transmission system is returned to the good state. The gear system will be reset for a new

production run. There are 1−H potential data collection epochs equally distributed for time

00 =t to Ht . The Bayesian control chart will decide if the condition monitoring data will be

collected at each potential epoch. The time interval between any two adjacent potential data

collection epochs is obtained by

Ht H

h

=Δ (4.2)

Then the potential data collection epoch can be written as hi it Δ⋅= , where 1 , ,2 ,1 −= Hi K .

66

In the Bayesian control model, the decision variables are determined by minimizing the total

expected cost. There are four costs that will be considered in the cost function:

1) 0 C is denoted as the cost for collecting and analyzing the condition monitoring data. 0 C is

incurred at the epoch of each data collection.

2) 1 C is denoted as inspection cost. If the posterior probability of the monitoring process

exceeds its control limit, the gear system will be stopped and inspected. If during the

inspection, the gear is found to be in good state, no maintenance is needed. The control

chart made a wrong inspection decision of the gear actual state. This situation is called a

false alarm. The inspection cost 1 C includes the cost of the inspection and the system

downtime due to the inspection.

3) 2 C is denoted as maintenance cost for repair or replacement when control chart gives a true

alarm such that some gear defects are found during the inspection. The total cost for a true

alarm includes downtime cost, inspection cost, and maintenance cost, i.e. ( 2 1 CC + ). After

the restoration, the gear system is considered to be in good state and the posterior probability

is reset to 0.

4) 3 C is denoted as a breakdown cost for the gear transmission system. The cost of breakdown

state 3 s is higher than that of the maintenance cost for warning state 2 s , i.e. ( 2 3 CC > ). If

this assumption is not satisfied, it would be more economical to replace the gear

transmission system after it breaks down instead of applying preventive maintenance. If the

system enters warning state and no repair or replacement is applied, the defects will

propagate to the rest of the system until the whole system breaks down. Since it is very

67

difficult to measure the exact loss rate at each time of the system, an average loss rate is used

to approximate the true loss rate due to the gear running in warning state. If the sojourn time

of warning state is 01 2 >v/ , the loss rate during operation in warning state is approximated

by ( )2 3 2 CCv −⋅

4.3 The Bayesian model In chapter 3, we assumed the time spent in each state follows an exponential distribution with

mean 2 ,1 ,/1 =iv i with density function as

( ) elsewhere., 0

, 0, 1;

⎩⎨⎧ >

==⋅ τ

ττi

i

-v i

is Tev

v μf (4.3)

The condition monitoring data were collected at time it with )1( , ,2 ,1 −∈ Hi K , and the

control chart decided what will be the next data collection epoch jit + . The time interval to the

next data collection is denoted as ijih ttj −=Δ⋅ + . If the gear deteriorating process is in good

state at it , the probability that the gear deteriorating process stays in good state at jit + is

( ) ( ) ( )

0

1

1

1

1 1 1 1 1 1 | h

h j -vj -vh sh si sji s edevjTjTtTtT Δ⋅⋅Δ⋅ ⋅

+ =−=Δ⋅≤−=Δ⋅>=>> ∫ ττPrPrPr (4.4)

and the probability that the gear deteriorating process transits into warning state in the time

interval hj Δ⋅ is

( ) 1 1 1

1 1 hj -vi sji s etT|tT Δ⋅⋅

+ −=>>−Pr (4.5)

68

4.3.1 Posterior Probability of Gear Deteriorating Process in Warning State

In this section, iP denotes the posterior probability that the gear deteriorating process is in

warning state, and iP is updated based on the condition monitoring data collected at time it .

jiP + ~ is the prior probability that the process is in warning state at time jit + given all the

observations up to it . Makis [51] has proven that jiP + ~ could be obtained by

hj viji ePP 1- )1(1~ Δ⋅⋅

+ ⋅−−= (4.6)

The prior probability that the process is in good state at time jit + given all the observations up to

it can be represented as

hj viji ePP 1- )1(~1 Δ⋅⋅

+ ⋅−=− (4.7)

At time jit + the residual observation )( jit +y is calculated from the new collected condition

monitoring data. By using Bayes’ theorem, the posterior probability jiP + that the gear

deteriorating process is in warning state at time jit + can be represented as

( )( )( )( ) ( )( ) ji jijiXYji jijiXY

ji jijiXYji

PstXtfPstXtf

PstXtfP

++++++

++++

⋅=+−⋅=

⋅==

2 | 1 |

2 | ~ |)()~1( |)(

~ |)(

yy

y (4.8)

Substituting Equation (4.6) into above equation, the posterior probability at time jit + becomes

( )( )

[ ]hh

h

j vi

j vi

jijiXY

jijiXY

j vi

ji

ePePstXtf

stXtf

ePP 1 1

1

-

-

2 |

1 |

-

)1(1)1()( |)(

)( |)(

)1(1 Δ⋅⋅Δ⋅⋅

++

++

Δ⋅⋅

+

⋅−−+⋅−⋅=

=

⋅−−=

y

y

(4.9)

69

where ( )1 | )( |)( jijiXY stXtf =++y and ( )2 | )( |)( jijiXY stXtf =++y are defined as multivariate

normal distributions in chapter 3, that is

( )( )

( )( ) ( )

( )( ) ( )⎥⎦

⎤⎢⎣⎡ −−−

⎥⎦⎤

⎢⎣⎡ −−−

==

=

++

++

++

++

2 1-

2 T

2 2/1 2

2

1 1-

1 T

1 2/1 1

2

2 |

1 |

)()(21 exp

21

)()(21 exp

21

)( |)(

)( |)(

μyΣμyΣ

μyΣμyΣ

y

y

jiji/q

jiji/q

jijiXY

jijiXY

tt

tt

stXtf

stXtf

π

π(4.10)

In the section 4.2, the assumption ΣΣΣ == 1 2 was made for simplifying the Bayesian process

control model, so the above equation can be written as

( )( )

( )( ) ( )

( )( ) ( )

( ) ( ) ( ) ( )[ ]

( )⎥⎦⎤

⎢⎣⎡ −−=

⎭⎬⎫

⎩⎨⎧ −−−−−−=

⎥⎦⎤

⎢⎣⎡ −−−

⎥⎦⎤

⎢⎣⎡ −−−

=

=

=

+

++++

++

++

++

++

2

2 1-

T

2 1 1-

T

1

2 1-

T

2 2/1 2

1 1-

T

1 2/1 2

2 |

1 |

)(21exp

)()()()( 21 exp

)()(21 exp

21

)()(21 exp

21

)( |)(

)( |)(

dtz

tttt

tt

tt

stXtf

stXtf

ji

jijijiji

jiji/q

jiji/q

jijiXY

jijiXY

μyΣμyμyΣμy

μyΣμyΣ

μyΣμyΣ

y

y

π

π

(4.11)

where d is the M-distance defined as

( ) ( )[ ] 21

1 2 1- T

1 2 μμΣμμ −−=d , (4.12)

and

( ) ( )1 2 1- T

1 )(2 μμΣμy −−= ++ jiji tz . (4.13)

70

Hence, substituting Equation (4.11) into Equation (4.9), the posterior probability jiP + that the

process is in warning state at time jit + becomes

( ) [ ]hh

h

j vi

j viji

j vi

ji

ePePdtz

ePP 1 1

1

-

-

2

-

)1(1)1()(21exp

)1(1 Δ⋅⋅Δ⋅⋅

+

Δ⋅⋅

+

⋅−−+⋅−⋅⎥⎦⎤

⎢⎣⎡ −−

⋅−−=

(4.14)

From the above equation, the posterior probability jiP + is a function of iP and )( jit +y . iP is the

posterior probability that the process is in warning state at it . )( jit +y is the residual observation

extracted from the condition monitoring data collected at jit + . Since the process is in good state

at 00 =t , the posterior probability of the process in warning at 0 t is

00 =P (4.15)

Each time after the new data was collected, the posterior probability is updated recursively using

Equation (4.14).

4.3.2 Distribution of jiz +

In Equation (4.13), )( jit +y is a column vector and follows multivariate normal distribution, and

jiz + is a scalar and follows univariate distribution. Using Equation (4.13), the multivariate

problem is reduced to a univariate problem. In the next section, we will show that jiz + is

univariate normal distributed with [ ]2)(2 ,0 dN if the gear deteriorating process is in good state,

and univariate normal distributed with [ ]22 )(2 ,2 ddN if the process is in warning state.

71

If the gear deteriorating process is in good state, ( )jit + y has a multivariate normal distribution

with mean ( )[ ]jit += 1 E yμ and covariance matrix ( )[ ]jit += yΣ . jiz + is the linear combination

of ( )jit + y , so jiz + is normally distributed based on the properties of a multivariate normal

distribution. The mean of jiz + is

( ) ( )[ ][ ] ( )

( ) ( )02

)]([2

)(2E)(E

1 2 1- T

1 1

1 2 1- T

1

1 2 1- T

1

=−−=

−−=

−−=

+

++

μμΣμμ

μμΣμy

μμΣμy

ji

jiji

tE

tz

(4.16)

and the variance of jiz + is (see e.g. Johnson & Wichern [43])

( ) ( )[ ]( )[ ] [ ] ( )

( ) ( )

( ) ( )2

1 2 1- T

1 2

1 2 1- 1- T

1 2

1 2 1- T

1 T

1 2 1-

1 2 1- T

1

)2(

4

4

)(cov4

)(4

)()(

d

t

tCov

zCovzVar

ji

ji

jiji

=

−−⋅=

−⋅−⋅=

−⋅−−⋅=

−−=

=

+

+

++

μμΣμμ

μμΣΣΣμμ

μμΣμyμμΣ

μμΣμy

(4.17)

If the gear deteriorating process is in warning state, ( )jit + y has a multivariate normal

distribution with mean ( )[ ]jit += 2 E yμ and covariance matrix ( )[ ]jit += yΣ . The mean of jiz +

is

72

( ) ( )[ ]( ) ( )

2

1 2 1- T

1 2

1 2 1- T

1

2

2

)(2E)(E

d

tz jiji

=

−−=

−−= ++

μμΣμμ

μμΣμy

(4.18)

and the variance of jiz + changes as

2 )2()( dzVar ji =+ (4.19)

Based on the above analysis, the standard deviation of dz 2 =σ does not change if the gear

deterioration process transits into warning state from good state. However, the mean of z shifts

from 0 to 2 2d . From Equation (4.13), we can draw a conclusion that the univariate control

chart designed to detect the mean shift for the univariate process jiz + is equivalent to the

multivariate control chart designed to detect the mean shift for the residual observation ( )jit + y .

4.3.3 Control Limit of Gear Deteriorating Process in Good State

First, we rearrange the Equation (4.14) as

( )( ) ( )( )

( )( ) ( )( )

( )( ) ( )( ) z

ij v

ji

ij v

jiz

ij v

ji

ij v

ji

ij v

ji

ij v

jiji

PeP

PePd

d

PeP

PePd

dd

PeP

PePdz

h

h

h

h

h

h

-

-

-

-

-

-

2

11 1

1ln1

2

11 1

1ln1

220

11 1

1ln2

1

1

1

1

1

1

σμ⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−++=

⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−+⋅+=

⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−+=

Δ⋅⋅+

Δ⋅⋅+

Δ⋅⋅+

Δ⋅⋅+

Δ⋅⋅+

Δ⋅⋅+

+

(4.20)

where 0 =zμ is the mean of jiz + and dz 2 =σ is the standard variation of jiz + with the

process in good state. The above equation can be written as

73

zjizji kz σμ ⋅+= ++ (4.21)

where

( )( ) ( )( ) ⎥

⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−+=

Δ⋅⋅+

Δ⋅⋅+

+

ij v

ji

ij v

jiji

PeP

PePd

dkh

h

-

-

11 1

1ln1

2 1

1

(4.22)

Assuming we calculated an optimal * * ji

k+

and the time interval for next data collection hj * Δ by

minimizing the total cost function, we will apply the * * ji

k+

to detect the mean shift of * jiz

+. If

zjizjikz

* ** σμ ⋅+≤

++, the mean of * ji

z+

is consider as unchanged. If the

zjizjikz

* * * σμ ⋅+>++

, the mean of * jiz

+ is shifted.

Given hj * Δ , and *

*jik

+, the posterior probability control limit of the process in good state ∗

+

* jiP

can be calculated from the following equation

( )

( ) ( )( ) ⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−⋅−

−⋅+=

Δ⋅⋅∗+

Δ⋅⋅∗+

+

ij v

ji

ij v

jiji

PeP

PeP

ddk

h

h

-

-

*

11 1

1ln1

2 *

1*

*

1*

* (4.23)

Rearranging the above equation gives us the posterior probability control limit

( )( ) ( ) ( )

( )( ) 2 exp

11 111

1

2 exp 111

11

*

2

-

*

2

-

-

-

*

* 1

*

* 1

* 1

*

1

*

dkdPe

dkdPePe

PeP

jii

j v

jiij v

ij v

ij v

ji

h

hh

h

+Δ⋅⋅

+Δ⋅⋅Δ⋅⋅

Δ⋅⋅∗+

−⋅⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−−−

=

−⋅−+−−

−−=

(4.24)

74

If the mean of * jiz

+ does not shift, we will have the following equation

*

* ****

jijizjizzjiz kkkk++++

≤⇒⋅+≤⋅+ σμσμ (4.25)

which is equivalent to

∗++

≤ * * jiji

PP (4.26)

At time it , hj * Δ (the optimal time interval for next data collection) and ∗

+

* jiP (the optimal

posterior probability control limit at next data collection epoch) are calculated by dynamic

programming. At time * jit

+ after the new data are collected, the posterior probability * ji

P+

are

calculated using Equation (4.14). The following policy is applied to decide if inspection will be

performed or not:

a) No inspection is needed, if ∗++

≤ * * jiji

PP ;

b) Stop and inspect the system immediately, if ∗++

> * * jiji

PP

If the posterior probability * jiP

+ is larger than the optimal control limit ∗

+

*jiP , which means the

mean of the residual observation may shift, so the inspection will be applied to check if the gear

defects are formed or not. If ∗++

≤ ** jiji

PP , the gear should be still in good state, so no

inspection will be applied. The time interval for the next data collection and the control limit

will be decided using all the available information until the time * jit

+.

75

4.4 Cost Function for Gear Deterioration Process The cost function considered in this thesis has a similar structure as the one considered in

Tagaras [25]. When jit + is less than the length of production run Ht , the expected cost incurred

between it and jit +

( )[ ]

( )

( )⎭⎬⎫

−⋅⎥⎦

⎤⎢⎣

⎡−Δ+−

−Δ−+

+−++⎩⎨⎧⋅−+

−⋅Δ++⋅+=

Δ−

Δ−Δ−

+Δ−

+Δ−

++

)1(

)1(1)1(

))(( )1()( 1

)()(),,(

2 3 2 1

1

2 1 *

1

1 *

0

0

2 3 2 2 1 *

1 0 *

1

1 1

1 1

CCve

ejje

CCkeCkeCP

CCvjCCkCPPjPV

h

hh

hh

j

jh

hj

jij

jij

i

hjiiijii

ν

νν

νν

νν

αα

α

(4.27)

where the probability of false alarm (type I error)

∫+−

∞−

−++ =−=

)( 2/*

* 0

* 2

21)()( jikd u

jiji duekdkπ

α Φ (4.28)

and )( * 1 jik +α is the probability of detecting a mean shift (1- Prtype II error) when there are

some minor defects initiated:

∫+

∞−

−++ =−=

* 2 - 2/*

*

1 21)()( jik u

jiji duekkπ

α Φ (4.29)

with

( )

( ) ( )( ) ⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−⋅−

−⋅+=

Δ⋅⋅∗+

Δ⋅⋅∗+

+

ij v

ji

ij v

jiji

PeP

PeP

ddk

h

h

-

-

*

11 1

1ln1

2 *

1*

*

1*

* (4.30)

76

When Hji tt =+ , the expected cost becomes

( )

( ) ( ) ⎥⎦

⎤⎢⎣

⎡−Δ+−

−Δ−⋅−⋅−+

Δ−=

Δ−

Δ−Δ−

)1()1(1)1(1

),(

1

1 1

1

1

2 3 2

2 3 2

h

hh

j

jh

hj

i

hiii

eejjeCCvP

jCCvPPjV

ν

νν

νν (4.31)

In the next section, we will show the derivation of the above cost functions.

4.4.1 Cost Function before the End of the Production Run

At time it , the condition monitoring data are collected and analyzed. The posterior probability

of the process in warning state iP is calculated using the new residual observation )( ity

obtained at it , and control limit ∗ iP is calculated at the previous data collection epoch. When

* )( izzi kz ≤− σμ , we will decide the epoch for the next data collection Hji tt <+ , where

1 2 1 −−∈ iH,,,j K . If the gear's actual state was warning state at it and inspection is applied

at jit + , the cost includes three components:

1) 0 C is the cost for collecting and analyzing the condition monitoring data at jit + ;

2) )( )( 2 1 *

1 CCk ji +⋅+α is the cost for applying inspection and performing preventive

maintenance when the Bayesian control chart gives a true alarm (such that the gear system

was in warning state);

3) ( ) 2 3 2 CCvj h −⋅Δ is the cost due to the gear running in warning state, which is calculated

by multiplying the time interval between it and jit + by the loss rate when the gear

transmission system operates in warning state.

If the gear actual state was in good state at time it , the cost includes four components:

1) 0 C is the cost for collecting and analyzing the condition monitoring data at jit + ;

77

2) 1 *

0 )( 1 Cke ji

j h+

Δ− αν is the cost for false alarm. hje 1 Δ−ν is the probability that the gear

deteriorating process stays in good state in the time interval hj Δ . )( * 0 jik +α is the

probability that the control chart signals a false alarm at time jit + . The control chart made

the decision to inspect the system, but the gear system is still in a good state at the time of

inspection.

3) ))(( )1( 2 1 *

1 1 CCke ji

j h +− +Δ− αν is the cost for true alarm. )1( 1 hje Δ−− ν is the probability that

the gear deteriorating process transits into warning state in the time interval hj Δ . )( * 1 jik +α

is the probability that the control chart signals a true alarm at time jit + . The control chart

made the decision to inspect the system, and the gear system transits into a warning state

before the time of inspection.

4) ( )2 3 2 1

1

)1()1(1)1(

1

1 1 CCv

eejje

h

hh

j

jh

hj −⋅⎥

⎦

⎤⎢⎣

⎡−Δ+−

−Δ⋅− Δ−

Δ−Δ−

ν

νν

νν is the cost of the gear running in

warning state. )1( 1 hje Δ−− ν is the probability that the gear deteriorating process transits into

warning state in the time interval hj Δ . ⎥⎦

⎤⎢⎣

⎡−Δ+−

−Δ Δ−

Δ−

)1()1(1

1

1

1

1

h

h

j

jh

h eejj ν

ν

νν is the average time

the process is in the warning state within the time interval hj Δ . ( )2 3 2 CCv − is the loss rate

when the gear transmission system operates in warning state.

Given the process is in good state at time it , we derive the formula of the average time that the

gear deteriorating process is in the warning state within the time interval hj Δ as

( ) ( )1 2 1 2 )(,)(|E)(,)(|E 1 2

stXstXjstXstX ijishijis ==−Δ=== ++ ττ (4.32)

78

where ( )1 2 )(,)(|E1

stXstX ijis ==+τ is the average time that the process stays in good state

within the time interval hj Δ , given the process is in good state at time it , and can be obtained

by (Duncan [5])

( ) ( )( )

)1()1(1

1;

1;)(,)(|E

1

1

1

1 1

1

1 1

1

1 1 1

1 1 11

1

1

1

0

1

0

1

0 1

0 1 1 2

h

h

h s

h s

h

h

j

jh

j

s

j

ss

j

s ss T

j

s ss Tsijis

eej

de

de

dv μf

dv μfstXstX

Δ−

Δ−

Δ −

Δ −

Δ

Δ

+

−Δ+−

=

⋅=

=

====

∫∫∫∫

ν

ν

τν

τν

νν

τν

τντ

ττ

ττττ

(4.33)

4.4.2 Cost Function at the End of the Production Run

The cost function for jit + with Hji tt =+ is expressed by Equation (4.31). The first term on the

right side of the equation is the cost when the process is already in warning state at time it . The

second term is the expected cost when the process transits from good state to warning state

within the time interval between it and Ht . If the process starts in good state at time it , and no

transition occurs between it and Ht , the gear will be in good state at time Ht , so its cost is 0.

4.5 Dynamic Programming to Optimize the Control Limit and the Data Collecting Intervals

In section 4.4, a cost function is set up to describe the expected costs between it and jit + , and

the cost functions are also the function of * jiP + and hj Δ . In this section, a dynamic

programming will be applied to calculate the optimal hj *Δ and *

* jiP

+ by minimizing the total

expected cost from time it to the end of the production run Ht .

79

First, the minimum expected total cost from time it to the end of the production run Ht is

defined recursively by

[ ] ),(V min arg),,( *

,

* *

*

* jijiiPj

ijii GEjPPjPGji

++++=

+

(4.34)

where 1 , ,2 ,1 −= Hi K and 1 , ,2 ,1 −−= iHj K . If Hij =+ , then the expected cost is the cost

at the end of the production run, where

( )

( ) ( ) ⎥⎦

⎤⎢⎣

⎡−Δ+−

−Δ⋅−⋅−⋅−+

Δ−=

Δ−

Δ−Δ−

)1()1(1)1(1

),(

1

1 1

1

1

2 3 2

2 3 2

h

hh

j

jh

hj

i

hiii

eejjeCCvP

jCCvPPjG

ν

νν

νν

(4.35)

One possible approach to solve Equation (4.34) is by considering all the possible situations.

However, the amount of computation will become considerably large. The dynamic

programming is very useful approach to solve the equation like the Equation (4.34) with much

less effort than by exhaustive enumeration. Using a backward recursion algorithm, dynamic

programming starts with the problem in the last stage and finds the optimal solution for this

smaller problem. It then includes the previous state into the problem, finding the current optimal

solution from the preceding one until the original problem is solved entirely (e.g. Ross [47]).

The basic features that characterize dynamic programming problems are presented:

1) The optimal values of decision variables are to be determined by dynamic programming (i.e.

j and * jiP + );

2) The state variables describe the evolution of the gear deteriorating process (i.e. the posterior

probability of the process in warning state iP );

80

3) The recursive relationship describes what the state variables will be in the next stage, given

the state variables and decision variables at current stage (i.e. Equation (4.14) and Equation

(4.15));

4) The objective function is a mathematical function to be optimized(i.e. Equation (4.34) and

Equation (4.35)).

In order to find the optimal epoch * jit

+ for the next data collection and the control limit *

* jiP

+ at

stage 1 , ,2 ,1 −= Hi K , the dynamic programming model is formulated as follows:

),( Minimize * jPG jii + , (4.36)

subject to

iHjHi

−=−=

, ,2 ,11 , ,2 ,1

K

K

Since dynamic programming is only suitable to solve discrete problems, it is necessary to

discretize these continuous variables before we apply dynamic programming to find the optimal

inspection schedule for the gear deterioration process. The data collection interval is already a

discrete variable as hj Δ with iHj −= , ,2 ,1 K . The state variable jiP + is divided into n equal

intervals from 0 to 1 with the length n/1 : the discretized 0 =+ jiP when 01 =+iP , the discretized

nP ji 2/1 =+ when ( ]nP ji /1 ,0 ∈+ , the discretized nP ji 2/3 =+ when ( ]nnP ji /2 ,/1 ∈+ , ..., and

the discretized nnP ji 2)12( −=+ when ( ]1 ,)1( nnP ji −∈+ .

When * * *jiji PP

++ > , the inspection will be performed. If this is a false alarm, the posterior

probability is reset to 0. If it is a true alarm, repair or replacement will be applied. After the

81

preventive maintenance, the posterior probability will also be reset to 0. Hence, the posterior

probability * jiP + will be reset to 0 when the posterior probability * jiP

+ exceeds the control limit

* *ji

P+

, and the probability of 0 * =+ ji

P can be expressed by the following equation

( )[ ] ( ) )(1 )( 11 , ,0 * * 1

* * * 0

*

*

1 *

1 * * jii

jjii

jjiiji kPekPekjPP hh

+Δ−

+Δ−

++−+−−== αα ννPr (4.37)

where

( )

( ) ( )( ) ⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−⋅−

−⋅+=

Δ⋅⋅∗+

Δ⋅⋅∗+

+

ij v

ji

ij v

jiji

PeP

PeP

ddk

h

h

-

-

*

11 1

1ln1

2 *

1*

*

1*

* (4.38)

Except for 0 * =+ ji

P , the probability of the discrete *ji

P+

is equivalent to the continuous *ji

P+

in

the interval ( ]

** , jiji PP ++ . The probability of the posterior probability *ji

P+

in the interval

( ]

** , jiji PP ++ can be derived as

( )[ ] ( ) ( ) ( )( )[ ] ( ) ( ) ( )

⎭⎬⎫

⎩⎨⎧ −+−−−

⎭⎬⎫

⎩⎨⎧

−+−−−=

+≤−+≤=

+≤<+=

≤<

+Δ−

+Δ−

+Δ−

+Δ−

++++++

++++

++++

1 11 -

1 11

, , , ,

, ,

, ,

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*1

*

*1

* *

1 * *

1

* * * *

* * *

* *

**

jiij

jiij

jiij

jiij

jiizjizjijiizjizji

jiizjizjizjiz

jiijijiji

kPedkPe

kPedkPe

PjPσkμzPjPσkμz

PjPσkμzσkμ

PjPPPP

hh

hh

ΦΦ

ΦΦ

PrPr

Pr

Pr

νν

νν

(4.39)

where

82

( )( ) ( )( )

( )( ) ( )( )

( )( ) ( )( )⎪

⎪⎪

⎩

⎪⎪⎪

⎨

⎧

<<⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−⋅⋅+

≤<⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−⋅⋅+

=

<⎥⎥⎦

⎤

⎢⎢⎣

⎡

−−⋅−

−⋅+=

+++Δ−+

Δ−+

+++Δ−+

Δ−+

+

++Δ−+

Δ−+

+

* * *

*

* *

*

* *

* * * *

*

*

*

*

* *

* *

*

*

*

*

* * *

if , 11 1

1 ln1

2

if , 11 1

1ln12

if ,11 1

1ln1

2

1

1

1

1

1

1

jijiji

ij

ji

ij

ji

jijiji

ij

ji

ij

ji

ji

jiji

ij

ji

ij

jiji

PPPPeP

PePd

d

PPPPeP

PePd

d

k

PPPeP

PePd

dk

h

h

h

h

h

h

ν

ν

ν

ν

ν

ν

(4.40)

and

* *

* *

* *

* * * if ,0 , , jijijiijijiji PPPjPPPP ++++++ >=≤<Pr (4.41)

When the lower bound of posterior probability * jiP + is larger than the control limit * *

jiP + , the

process will be restored and * jiP + is reset to 0, so the probability in Equation(4.41) is equal to 0.

After we calculated the control limits and next data collection interval using dynamic

programming, the following calculations are required at each data collection epoch to monitoring

the state of the gear deterioration process:

Setp 1. To collect the condition monitoring data and extract residual observation )( ity using

the approach introduced in chapter 2 and chapter 3;

Setp 2. To compute iz using Equation (4.13) from the residual observation )( ity ;

Setp 3. To compute the posterior probability iP using Equation (4.14) from the residual

observation iz ;

83

Setp 4. To compare the posterior probability iP with the control limit * iP that is obtained from

previous data collection epoch. If * ii PP ≤ , the optimal time interval * ji

t+

and control

limit * * ji

P+

for the next data collection are decided using the dynamic programming. On

the other hand, if * ii PP > , the gear system is stopped and inspected immediately. When

gear defects are found, repair or replacement is applied. If no defect is found during

inspection, the control chart gives a false alarm. Then, the posterior probability will be

reset to 0. The optimal time interval * jit

+and control limit *

* jiP

+ for the next data

collection are calculated using the dynamic programming;

Setp 5. Continue to Step 1 until the Hjitt * =+

;

Setp 6. Take a full inspection for the gear transmission system at Ht , repair or replacement will

be performed if necessary. Then, the system will be reset for a new production run.

4.6 Numerical Results In this section, two examples are provided to demonstrate the numerical results by applying the

Bayesian process control model to optimize the gear system inspection policy. The POHM

model parameters, which were obtained from chapter 3, are used in both examples, i.e.

0728.01 =v , 191.02 =v , ⎥⎦

⎤⎢⎣

⎡=

8119.12322.1

1 μ , ⎥⎦

⎤⎢⎣

⎡=

6171.372735.26

2 μ , and ⎥⎦

⎤⎢⎣

⎡==

6932.216749.166749.164833.13

1 ΣΣ .

The data collection interval is considered as multiples of certain time unit 5.0 =Δ h and the

length of production run H is 38. The state variable iP is divided into 100 equal intervals of

length 0.01: the discretized 0 =iP when 0 =iP , the discretized 005.0 =iP when

( ]01.0 ,0 ∈+ jiP , the discretized 15.0 =+ jiP when ( ]02.0 ,01.0 ∈+ jiP , ..., and the discretized

84

995.0 =+ jiP when ( ]1 ,99.0 ∈+ jiP . The range of the control limit * jiP + is from 0.01 to 0.99,

which is divided into 49 equal subintervals, so the decision variable is chosen from the

discretized value 99.0...,,03.0,01.0* ∈+ jiP .

The data collection cost 0 C is chosen to be 1; the cost of false alarm 1 C is 50; the cost of

preventive maintenance 2 C is 75, and the breakdown cost 3 C is 150. The following figure is

the optimal costs calculated using the above parameters.

Figure 4.1 Total expected cost at each production stage with breakdown cost 150

From the above figure, we observe that the optimal cost is non-decreasing in the posterior

probability iP and the production running time it . The total expected cost at 0 t is 100.53

At the first stage 00 =t , the posterior probability 0 P is 0 because we assume the gear system

starts in good state. At time 5.631 =t , the following figures show the simulation results of the

optimal time interval for next data collection hj * Δ and the optimal control limit *

*31 jP + .

85

Figure 4.2 Optimal data collection intervals and control limits with breakdown cost 150

When the current stage posterior probability becomes large, the gear system may have already

been in warning state or will transit into warning state soon. In order to avoid the breakdown

cost, the next data collection time interval will decrease. The Bayesian process control model

makes the same decision as shown in Figure 4.2 (a).

Figure 4.2 (b) shows that the first point 0.11 is the probability that the gear system will be reset

at the next data collection epoch ( 5.8413 =+t ). Using Figure 4.2 (a) and (b), the time interval for

the next data collection and the control limit for the next data collection epoch can be decided.

For example, the posterior probability is calculated as 05.031 =P , which corresponds to the sixth

point. Then the next data will be collected at 8313 =+t , and the posterior probability control limit

will be 13.0 313 =∗

+P .

86

In Figure 4.2 (b), the dot-dash line is a reference line where control limit at next data collection

epoch is equal to current posterior probability. When the posterior probability is less than 0.17,

the control limit fluctuates around 0.14. When the posterior probability exceeds the point 0.17,

the control limit follows an increasing trend with respect to posterior probability, but the

increasing of control limit is slower than the increasing of posterior probability, which means the

difference between a posterior probability and its corresponding control limit for the next data

collection epoch will be larger as the posterior probability increase. For example, if the current

posterior probability is 0.3, the control limit at next data collection epoch ( 714 =t ) will be 0.25.

The difference between the current posterior probability and the control limit at next data

collection epoch is 0.05. If the current posterior probability is 0.6, the control limit at the next

data collection epoch ( 714 =t ) will be 0.51. The difference between the current posterior

probability and the control limit at next data collection epoch is 0.09.

If we increase the breakdown cost 3 C to 400 and keep all other parameters unchanged, the

following figure shows the optimal costs.

Figure 4.3 Total expected cost at each production stage with breakdown cost 400

87

The trend of the total expected cost does not change. The total expected cost at 0 t is 156.69,

which is higher than total expected cost when the breakdown cost is 150. The following figures

show the simulation results for the optimal control limit and the optimal time interval for next

data collection with respect to the posterior probability at current stage.

Figure 4.4 Optimal data collection intervals and control limits with breakdown cost 400

Figure 4.4 (a) shows that all the time intervals for next data collection are 0.5 hour except the

first point. They are less or equal to the time intervals when the breakdown cost is 150. It is

reasonable to increase the data collection frequency when the breakdown cost increases.

Figure 4.4 (b) shows that the control limit follows an increasing trend with respect to posterior

probability, but the increasing of control limit is slower than the increasing of posterior

probability, which is similar as the results from Figure 4.2 (b). However, the difference between

the control limit and current posterior probability in Figure 4.4 (b) is larger than the difference in

88

Figure 4.2 (b). For example, if the current posterior probability is 0.3, the control limit at next

data collection epoch ( 714 =t ) will be 0.19. The difference is 0.11, which is greater than the

difference (0.09) when the breakdown cost is 150. With the same posterior probability, the

control limit will be lower for a higher breakdown cost ( 4003 =C ) gear system than for the

lower breakdown cost ( 1503 =C ) gear system.

4.7 Conclusions In order to monitor and control the gear deteriorating process, the Bayesian process control

model was derived to optimize the control limit and data collection epoch by minimizing the

total costs of a gear transmission system. We assume the gear production run is finite. After the

condition monitoring data are collected at each data collection epoch, the residual observation is

extracted from the vibration data collected by the accelerometers A02 and A03. Since the

vibration data were collected by two accelerometers, a multivariate Bayesian process control

model was developed in this research.

Firstly, we developed a recursive formula to calculate the posterior probability of gear in

warning state using all the available information. Then, we derived the cost function with

respect to the time interval for the next data collection and the process control limit at next data

collection epoch. Using dynamic programming, the optimal time interval for the next data

collection and the control limit were calculated by minimizing the total expected cost from the

current stage to the end of the production run. In the cost function, we considered the data

collection cost, inspection cost, maintenance cost, and the system breakdown cost. Finally, the

optimal Bayesian process control parameters were calculated using the POHM model parameters.

89

Chapter 5 Conclusions and Future Research

In this thesis, we developed an early fault detection scheme and an optimized CBM strategy

based on the condition monitoring data collected from the gear transmission system under

varying load. An ARX model with F-test was proposed for detecting the early gear defects. By

minimizing the total expected cost until the end of the production run, two decision variables

were optimized for the CBM strategy: the time interval for next data collection, and the control

limits at next data collection epoch. The purpose of this chapter is to conclude the results

obtained from Chapters 2, 3, and 4, as well as to propose future research related to this study.

5.1 Conclusions In Chapter 2, a gearbox fault detection scheme was proposed using vibration data collected under

varying load. A modified TSA algorithm was introduced to compute a TSA signal for ARX

model building. The ARX model was fitted using the filtered TSA envelope as an external input

and the TSA signal as the output. The parameters of the ARX model were estimated using the

least squares algorithm, and the order of the ARX model was determined using BIC criterion.

The F -test was applied to the variances of the ARX model residuals for each data file, and the

corresponding p-value was proposed as a quantitative indicator of fault advancement over a full

lifetime of a gearbox. Based on the results obtained, we concluded that the method presented in

Chapter 2 can detect a gear fault occurrence earlier than the other methods mentioned in the

chapter. Also, our method is faster which makes it more suitable than the other methods for on-

line gearbox fault detection and damage localization.

In Chapter 3, a POHM model was built to describe the relationship among the residual

observations and the three states of a gear deterioration process. Once the POHM model was

90

well defined, two sets of parameters were needed to be estimated: 1) the relation parameters

) , , , ( 2 2 1 1 ΣμΣμΨ = that described the stochastic relationship between the residual

observations and the unobservable operational states of gearbox transmission system, and 2) the

transition rate parameters ) , ,( 3 2 3 1 2 1 λλλ=Λ that described the evolution of the gear

deterioration process. It was very difficult to find a feasible solution by directly maximizing the

likelihood function of the POHM model. The EM iterative algorithm was introduced to make

the parameter estimation possible by maximizing a pseudo likelihood function. Finally, a

POHM model was built using full lifetime condition monitoring data obtained from the MDTB

operating from good state to breakdown state. The condition monitoring data files with equal

data collection interval were selected, and residual observations were extracted from motor

current data and vibration data. The EM iterative algorithm provided a feasible solution for the

pseudo-likelihood function of the POHM model. The numerical iteration procedure generated a

reasonable parameter estimates compared to our results from subjective judgment. It was

confirmed that the POHM model can be applied to estimate gear hidden states based on the

condition monitoring data.

In Chapter 4, a multivariate Bayesian process control model was derived to optimize the control

limit and data collection epoch by minimizing the total costs of a gear transmission system. We

assumed that the gear production run is finite. Firstly, we developed a recursive formula to

calculate the posterior probability of gear in warning state using all the available information.

Then, we derived the cost function with respect to the time interval for the next data collection

and the process control limit at next data collection epoch. Using dynamic programming, the

optimal time interval for the next data collection and the control limit were calculated by

minimizing the total expected cost from the current stage to the end of the production run. In the

91

cost function, we considered the data collection cost factor, inspection cost factor, maintenance

cost factor, and the system breakdown cost factor. Finally, the optimal Bayesian process control

parameters were calculated using the POHM model parameters.

5.2 Future Research 5.2.1 Application of the Bayesian Process Control Model to Planetary Gear System

In this thesis, the typical parallel gear transmission system was studied. However, there is

another gear transmission system that is widely applied in industry, i.e. planetary gear

transmission system. The schematic representation of a planetary gear system is shown in Figure

5.1, which consists of three planets, a sun gear, an annulus gear and a carrier. In general, the sun

gear is connected to the drive shaft, and the sun gear drives three planetary gears, which are

contained within an internal toothed ring gear. The planetary gears are mounted on the planetary

carrier. The planetary carrier is connected with output shaft. When the sun gear rotates, it drives

the three planetary gears inside the ring gearboxes.

Figure 5.1 Schematic Representation of Planetary Gear Transmission System

The balanced planetary kinematics and the associated load sharing make the planetary gear

transmission system truly ideal for industrial applications. In our lab, we have already designed

and built a test rig for future study of a planetary gear train. The test rig includes a three stage

92

planetary gear train, drive motor, load motor, motor control system, accelerometers and data

acquisition systems as shown in the following picture.

Figure 5.2 Lab Planetary Gear Train Test Rig

The condition monitoring data were collected from when the gear system in good state. In future,

the optimized CBM strategy introduced in this thesis will be adjusted for monitoring the

planetary gear system. First, the approach for extracting ARX model residuals should be

modified based on the mechanism of the planetary gear system. Then the POHM model and

Bayesian process control model can be directly applied based on the new ARX model residuals.

Since fifteen spare planetary gearboxes are available for future testing, the POHM model

parameters can be estimated using 15 full lifetime condition monitoring histories.

5.2.2 Relaxation of Model Assumptions

During the building of the POHM model and Bayesian process control model, we made some

assumptions on the gear deterioration process. Most of the assumptions are made based on the

real industrial situation. However, there are four assumptions that can be relaxed:

1) The time spent in each state was assumed to follow an exponential distribution. Based on

this assumption, the continuous Markov model was used to describe the gear deterioration

93

process. If the assumption is relaxed, a semi-Markov model may be built to model the gear

deterioration process, and a partially observable hidden semi-Markov model will be used to

estimate the parameters for the Bayesian process control.

2) Three states were considered in the POHM model, and the cost for each state was involved in

the total expected cost function. If more states will be introduced into POHM model, the

cost function will be closer to real situation, so that the optimized CBM strategy will be

more economical than the three-state model. Building a POHM model to describe a four-

state or multi-state process will not be very difficult, but the estimation of the POHM model

parameters presents a challenge.

3) In this research, we assumed a finite production run. Dynamic programming was applied to

find the optimal time interval for minimizing the total expected cost over a finite production

run. In the future, the gear deterioration process will be considered as an infinite production

run, and the total expected cost will be replaced by an expected average cost per unit time.

4) The Bayesian process control model was used to monitor the mean shift of the gear

deterioration process, and the process covariance matrix was assumed to be unchanged. In

chapter 3, we have observed that that the covariance matrix shifted when the process

transited from good state to warning state. If we can consider the shift of the Bayesian

process mean and covariance matrix, the accuracy of monitoring the gear states transition

will be improved. This consideration will also lead to reduction in the cost due to false alarm,

and the cost of the gear running in warning state. In future research, the assumption of

unchanged covariance matrix can be relaxed in two steps. First the Bayesian process control

model can be built based on the assumption that 1 2 ΣΣ ⋅= n , where n is a scalar. Then two

general covariance matrices 1 Σ and 2 Σ can be considered.

94

Bibliography [1] A. Daidone, F.D., Giandomenico, A. Bondavalli, S. Chiaradonna, Hidden Markov Models

as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution,

25th IEEE Symposium on Reliable Distributed Systems (2006) 245-256.

[2] A. Dempster, N. Laird and D. Rubin, Maximum Likelihood from Incomplete Data via the

EM Algorithm, Journal of the Royal Statistical Society, Series B, 39(1) (1977) 1-38.

[3] A. Hambaba, H. E. Huff, U. Kaul, Time-Scale Local Approach for Vibration Monitoring,

2001 IEEE Aerospace Conference Proceedings, 7 (2001), 3201-3209.

[4] A. Hambaba, H.E. Huff, U. Kaul, , Detection and Diagnosis of Changes in the Time-Scale

Eigenstructure for Vibrating Systems, IEEE Aerospace Conference Proceedings, 7(2001),

3211-3220.

[5] A.J. Duncan, The Economic Design of X -Charts Used to Maintain Current Control of a

Process, Journal of the American Statistical Association, 51 (1956) 228-242.

[6] B. Liu, V. Makis, Gearbox failure diagnosis based on vector autoregressive modelling of

vibration data and dynamic principal component analysis, IMA Journal of Management

Mathematics, 19 (2008) 39-50.

[7] C. Bunks, D. McCrthy, T. Al-Ani, Condition-Based Maintenance of Machines using

Hidden Markov Models, Mechanical System and Signal Process,14 (2000) 597-612.

[8] C. Kar, A.R. Mohanty, Vibration and current transient monitoring for gearbox fault

detection using multiresolution Fourier transform, Journal of Sound and Vibration, 311 (1-2)

(2008) 109-132.

[9] C.K. Tan, D. Mba, Identification of the Acoustic Emission Source during a Comparative

Study on Diagnosis of a Spur Gearbox,” Tribology International, 38(2005), 469-480.

95

[10] C.M. Jarque, A.K. Bera, Efficient tests for normality, homoscedasticity and serial

independence of regression residuals, Economics Letters 6 (3) (1980) 255–259.

[11] C.M. Jarque, A.K. Bera, Efficient tests for normality, homoscedasticity and serial

independence of regression residuals: Monte Carlo evidence, Economics Letters 7 (4) (1981)

313–318.

[12] D. Lin, M. Wiseman, D. Banjevic and A.K.S. Jardine, An Approach to Signal Processing

and Condition-Based Maintenance for Gearboxes Subject to Tooth Failure, Mechanical

Systems and Signal Processing, 18(2004), 993-1007.

[13] D. Lin, V. Makis, Recursive Filters for a Partially Observable System Subject to Random

Failure,” Advances in Applied Probability, 35(1) (2003) 207-227.

[14] D. Lin, V. Makis, Filters and Parameter Estimation for a Partially Observable System

Subject to Random Failure with Continuous-range Observations,” Advances in Applied

Probability, 36(4) (2004) 1212-1230.

[15] D. Lin, V. Makis, On-Line Parameter Estimation for a Partially Observable System Subject

to Random Failure, Naval Research Logistics, 53(5) (2006) 477-483.

[16] D.P. Townsend, Dudley’s Gear Handbook, 2nd Edition, McGraw-Hill, Inc 1992.

[17] D. Wang, Q. Miao, R. Kang, Robust health evaluation of gearbox subject to tooth failure

with wavelet decomposition, Journal of Sound and Vibration, 324 (3-5) (2009) 109-132.

[18] E.L. Porteus, A. Angelus, Opportunities for improved statistical process control,

Management Science, 43 (1997) 1214-1228.

[19] E.P.G. Box, M.G. Jenkins, C.G. Reinsel, Time series analysis: forecasting and control 4th

Edition, John Wiley & Sons, Inc., 2008.

[20] F. LeGland, L. Mevel, Fault Detection in Hidden Markov Models: A Local Asymptotic

Approach, Proceedings of the 39th IEEE conference on Decision and Control, (2000).

96

[21] F.V. Nelwamondo, T. Marwala, U. Mahola, Early Classifications of Bearing Faults using

Hidden Markov Models, Gaussian Mixture Models, Mel-Frequency Cepstral Coefficients

and Fractals, International Journal of Innovative Computing, Information and Control, 2(6)

(2006) 1281-1299.

[22] G. Dalpiaz, A. Rivola, R. Rubini, Effectiveness and sensitivity of vibration processing

techniques for local fault detection in gears, Mechanical System and Signal Processing 3

(2000) 387–412.

[23] G. Nenes, G. Tagaras, The economically designed two-sided Bayesian X control chart,

European Journal of Operational Research, 183 (2007) 263-277.

[24] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, 6 (1978) 461-

464.

[25] G. Tagaras, A dynamic programming approach to the economic design of X-charts, IIE

Transactions, 26 (1994) 48-56.

[26] G. Tagaras, Dynamic control charts for finite production runs, European Journal of

Operational Research, 91 (1996) 38-55.

[27] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd

International Symposium on Information Theory, (Petrov, B. N. and Csaki, F. eds),

Akademiai Kiado, Budapest, (1973) 267-281.

[28] H.M. Taylor, Markovian sequential replacement processes, Annals of Mathematical

Statistics, 36 (1965) 1677-1694.

[29] H.M. Taylor, Statistical control of a Gaussian process, Technometrics, 9 (1967) 29-41.

[30] H. Ocak, K.A. Loparo, A new Bearing Fault Detection and Diagnosis Scheme Based on

Hidden Markov Modeling of Vibration Signals,” Acoustic, Speech and Signal Processing,

2001, Proceedings, (2001).

97

[31] J.R. Davis, Gear Materials, Properties, and Manufacture, ASM International, 2005.

[32] L. Ljung, System identification: theory for the user, 2nd Edition, Prentice Hall, Inc., 1999.

[33] L. Qu, J. Lin, A Difference Resonator for Detecting Weak Signals, Measurement, 26(1999),

69-77.

[34] M.A. Girshich, H. Rubin, A Bayes’ approach to a quality control model. Annals of

Mathematical Statistics, 23 (1952) 114-125.

[35] M.E Badaoui, F. Guillet, J. Danière, New Applications of the Real Cepstrum to Gear

Signals, Including Definition of a Robust Fault Indicator, Mechanical Systems and Signal

Processing, 18(2004), 1031-1046.

[36] M. Ge, R. Du, Y. Xu, Hidden Markov Model based fault diagnosis for stamping processes,”

Mechanical Systems and Signal Processing, 18 (2004) 391-408.

[37] M.J. Kim and V. Makis, Parameter Estimation in a CBM Model with Multivariate

Observations, Proceedings of the 2009 Industrial Engineering Research Conference,

(2009)1910-1915

[38] MDTB data (Data CDs: test-runs #14), Condition–based maintenance department, Applied

Research Laboratory, The Pennsylvania State University, 1998.

[39] P.D. McFadden, Detecting Fatigue Cracks in Gears by Amplitude and Phase Demodulation

of the Meshing Vibration, ASME Transactions Journal of Acoustics Stress and Reliability in

Design 108 (1986) 165–170.

[40] P.D. McFadden, Examination of a technique for the early detection of failure in gears by

signal processing of the time domain average of the meshing vibration, Mechanical systems

and signal processing 1 (1987) 173–183.

[41] P.L. Carter, A Bayesian approach to quality control. Management Science, 18 (1972) 647-

655.

98

[42] Q. Miao, V. Makis, Condition Monitoring and Classification of Rotating Machinery using

Waelets and Hidden Markov Models, , Mechanical System and Signal Process,21 (2007)

840-855.

[43] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, 6th Edition,

Pearson Prentice Hall, (2007).

[44] R.R. Lawrence, A Tutorial on Hidden Markov Models and Selected Application in Speech

Recognition, Proceedings of the IEEE, 77(2) (1989) 257-286.

[45] R.E. Walpole, H.R. Myers, L.S. Myers, K. Ye, Probability & statistics for engineers &

scientists, 8th Edition, Person Education, Inc. 2007.

[46] S.M. Kay, Modern spectral estimation: theory and application, Prentice Hall, 1988.

[47] S.M. Ross, Introduction to Stochastic Dyanmic Programming, Academic Press, Inc, (1982).

[48] S.M. Ross, Introduction to Probability Models, 9th Edition, Elsevier Inc., (2007).

[49] S. Rofe, Signal processing methods for gearbox fault detection, Defence Science and

Technology Organization Technical Report, DSTO-TR-0476, Australia, 1997.

[50] T. Toutountzakis, C.K. Tan, D. Mba, Application of Acoustic Emission to Seeded Gear

Fault Detection,” NDT & E International, 38(2005), 27-36.

[51] V. Makis, Multivariate Bayesian Control Chart, Operational Research, 56(2) (2008) 487-

496.

[52] V. Makis, Finite horizon multivariate Bayesian process control, European Journal of

Operational Research, 194 (2009) 795-806.

[53] W. Wang, A.K. Wong, Autoregressive model-based gear fault diagnosis, Journal of

Vibration and Acoustics, Transactions of the ASME, 124(2) (2002) 172-179.

[54] W. Wang, F. Ismail, F. Golnaraghi A Neuro-Fuzzy Approach to Gear System Monitoring,”

IEEE Transactions on Fuzzy Systems, 12(2004), 710-723.

99

[55] W.J. Wang, P.D. McFadden, Early detection of gar failure by vibration analysis-I.

Calculation of the time-frequency distribution, Mechanical Systems and Signal Processing,

7 (3) (1994) 289-307.

[56] W.J. Wang, P.D. McFadden, Application of wavelets to gearbox vibration signals for fault

detection, Journal of Sound and Vibration, 192 (5) (1996) 109-132.

[57] W.Q. Wang, F. Ismail, M.F. Golnaraghi, Assessment of Gear Damage Monitoring

Techniques using Vibration Measurements,” Mechanical Systems and Signal Processing,

15(2001), 905-922.

[58] X. Jiang, V. Makis, Optimal Replacement under Partial Observations, Mathematics of

Operations Research, 28(2), (2003) 382-394.

[59] X. Wang, V. Makis, M. Yang, A wavelet Approach to Fault Diagnosis of a Gearbox under

Varying Load Conditions, Journal of Sound and Vibration, 329(2010), 1570-1585.

[60] Y. Nikolaidis, G. Rigas, G. Tagaras, Using economically designed Shewhart and adaptive

X charts for monitoring the quality of titles, Quality and Reliability Engineering

International, 23 (2007) 233-245.

[61] Y. Zhan, V. Makis, Adaptive state detection of gearboxes under varying load conditions

based on parametric modelling, Mechanical Systems and Signal Processing, 20 (2006) 188-

221.

[62] Y. Zhan, V. Makis, A Robust Diagnostic Model for Gearboxes Subject to Vibration

Monitoring, Journal of Sound and Vibration, 290(2006), 928-955.

[63] Y. Zhan, V. Makis, A.K.S. Jardine, Adaptive State Detection of Gearboxes under Varying

Load Conditions Based on Parametric Modelling, Mechanical Systems and Signal

Processing, 20(2006), 188-221.

100

Appendix A: ARX model

ARX model is a time-series model that relates the exogenous input )(kx and output )(ky that

corrupted by noise )(kN . If we assume that the exogenous input process )(kx is generated

independently of the noise process )(kN with standard deviation Nσ , we can write the ARX

model as Box et al. [19]

)( )( )1( )( )( )2( )1( )(

s 1 0

2 1

ksbkωbkωbkωrkkkk r

Nxxxyyyy

+−−++−−+−+−++−+−=L

L δδδ

(A.1)

where 0≥b . Let B denote the backshift operator, which is defined by )1()( −= kk xxB ; hence

)()( bkkb −= xxB . The above difference equation can be written as

)( )( ) (

)( ) (

)( )( )( )(

)( )( )( )(

s 1 0

2 2 1

s 1 0

2 2 1

kbkωωω

k

kbkωbkωbkω

kkkk

bs

rr

s

rr

NxBBB

yBBB

NxBxBx

yByByBy

+−++++

+++=

+−++−+−+

+++=

L

L

L

L

δδδ

δδδ

(A.2)

If we define two operators by

( ) sss

1 0 BBBω ωωω +++= L (A.3)

( ) rrr

1 BBBδ δδ ++= L (A.4)

The ARX model can be rewritten as

( ) ( ) )( )( )( )( kkkk bsr NxBBωyBδy ++= (A.5)

The model order r , s and b are identified based on different criteria, and the unknown

parameters 1 δ , 2 δ , … r δ , 0 ω , 1 ω , ... r ω , 2 Nσ , which are estimated from the observation data.

101

Appendix B: F-test

The fundamentals of F-test are well presented e.g. in Walpole et al. [45], and the following will

provide only a brief summary useful for the analysis in this thesis. A continuous random

variable z has a F-distribution, with degrees of freedom 01 >ν and 02 >ν if its probability

density function is given by

( )[ ] ( )

( ) ( )( )

elsewhere.,0

,0, 1

)(

2/)(

2 1 2 21

1 21

2/)2( 2/2 1 2 1 2

1

,

2 1

1 1

2 1

⎪⎪⎪

⎩

⎪⎪⎪

⎨

⎧>

++

=+

−

zz

z

zfνν

νν

νννννν

ννννΓΓ

Γ

(B.1)

where ( ) •Γ is the Gamma Function defined by

( ) 0for

0

1 >=∫∞

α, dxexa

-x- αΓ (B.1)

An F-distribution is a continuous probability distribution with enormous applications in testing

whether the population variances of two samples are equal. Suppose there are two normal

populations with population variance 21 σ and 2

2 σ , and two samples X and Y are random

selected from these two populations respectively.

Sample X Sample Y

Samples size 1 n 2 n

Samples , , ,1 21 nXXX K=X , , ,

2 21 nYYY K=Y

Samples mean ∑=

=1

11

1 n

iiX

nX ∑

=

=2

12

1 n

iiY

nY

102

Samples variance ( )11

121

1

−

−=∑=

n

XXS

n

ii

( )

12

122

2

−

−=∑=

n

YYS

n

ii

Population variance 21 σ 2

2 σ

Then, the random variable

22

22

21

21

, //

2 1

SSF

σσ

νν = (B.2)

has a F-distribution with 11 1 −= nν and 12 2 −= nν degrees of freedom. To test the null

hypothesis 0 H that 22

21

σσ = against the alternative hypothesis 1 H that 22

21

σσ > , the

ratio 22

21 / SSf = is a value of the F-distribution. Therefore, the critical regions of significance

level α corresponding to the alternative hypothesis 1 H is )(2 1 , αννff > , that is

( ) ( ) )(1 ,1 22

21

2 1 α−−> nn

fSS (B.3)

If the above equation is satisfied, we can say null hypothesis 0 H is rejected at significance level

α and conclude that 21 σ is greater than 2

2 σ with )%1(100 α−⋅ confidence.

Early Fault Detection Scheme and Optimized CBM Strategy ...€¦ · Early Fault Detection Scheme...

Documents

Transcript of Early Fault Detection Scheme and Optimized CBM Strategy ...€¦ · Early Fault Detection Scheme...