A Process for Data Driven Prognostics

Eric Bechhoefer

NRG Systems

110 Riggs Road

Hinesburg, VT 05461

Telephone: (802) 482-2255

[email protected]

David He

Professor, Dept of Mechanical and Industrial Engineering

University of Illinois at Chicago

842 W. Taylor Street, Room 3027 ERF

Chicago, IL 60607-7022

[email protected]

Abstract: A prognostic is an estimate of the remaining useful life of a monitored part. While diagnostics alone can support condition based maintenance practices, prognostics facilitates changes in logistics which can greatly reduce cost or increase readiness and availability of the monitored system. A successful prognostic requires four processes. First, features are extracted from measured data to estimate damage. Second, a threshold is set for the feature, above which it is appropriate to perform maintenance. Third, given a future load profile, a model estimates the life of the component from the current damage state. Finally, an estimate of the confidence in the prognostic is needed. This paper outlines a process for data driven prognostics by: describing appropriate condition indicators (CIs) for gear fault, setting thresholds for those CIs through fusion into a component health indicator, using a state space process to estimate the remaining useful life given the current component health, and using a state estimate to quantify the confidence in the estimate of the remaining useful life. Finally, a gear fault run to failure test is used as an example.

Key words: Confidence; Condition indicators; Health indicator; Paris’ Law; Remaining useful life

Introduction: Condition based maintenance (CBM) systems have been shown to produce cost savings by reducing scheduled maintenance cost. However, CBM systems can be leveraged into far greater cost savings by developing a prognostics capability. The ability to estimate the remaining useful life (RUL) of a component can greatly improve availability and reduce logistics cost. Prognostics are the maturation of the CBM system. Consider the effect on wind farm operations if a prognostics capability were available. Major maintenance events require a heavy lift crane. The availability of a crane can be limited, and the cost of rental is large. If a crane is needed to replace a gearbox when there are other wind turbines with small RULs, there are cost savings and improvements to readiness from conducting maintenance on those other marginal turbines at the same time.

Alternatively, if the operator of a fleet of helicopters knows the RUL of its assets, the operator can deploy those aircraft that have the highest RUL and be assured that the aircraft will not need major maintenance while deployed.

Knowledge of the RUL allows the logistician to reduce the inventory of spares. It also affects the manpower needed for maintenance and facilitates more efficient operations. That said, there are currently few deployed prognostic health management (PHM) systems. While CBM is a maturing technology, PHM is relatively immature and difficult to implement.

The ability to estimate the RUL requires four pieces of information:

An estimate of the current equipment health,
A limit or threshold where it is appropriate to do maintenance,
An estimate of the future equipment load, and
A model to estimate the time from the current state to the limit/threshold based on projected load.

The current health of a system can be determined by a CBM system. Such systems measure some feature representing damage. For example, for a pump or generator, shaft order one acceleration is a measure of health. The limit for this vibration for some equipment can be found in such standards as [1]. Most monitored components, such as gears and bearings, are not covered by an ISO standard; there are no formal or standardized limits. Additionally, while the load for stationary equipment may be well known, for many systems (helicopters or wind turbines, for example) the load is variable.

Damage models to predict future equipment health fall into two categories: physics of failure and data driven. Physics of failure models are by their nature appealing. However, there is a cost associated with building the models, which then require validation and testing. Further, the robustness of these models in application may not be satisfactory: how are material/manufacturing variance, unknown usage, and maintenance accounted for in a real application? Data driven methods, while not capable of giving an absolute level of damage, can give a relative limit which may give acceptable performance.

Presented here is an end to end process for a data driven prognostic based on a vibration sensor. Descriptions of how condition indicators (CIs) are generated for a gear run to failure test are given. The CIs are fused into a health indicator (HI) through a statistical process (to control the probability of false alarm). Given the current HI, Paris’ Law is used to estimate the time until the HI reaches a predetermined value; this time is the remaining useful life (RUL). Once the RUL is calculated, a bound can then be calculated and a confidence in the RUL is given. Finally, this process is demonstrated on a spiral bevel gear.

Condition Indicators - Feature Extraction to Improve Signal to Noise: Vibration signatures for machinery faults tend to be small relative to other vibration signatures. For example, in the typical gearbox, the energy associated with gear mesh and shaft vibrations will be orders of magnitude larger than a fault feature. Spectral analysis or the root mean square (RMS) of vibration are not powerful enough CIs to find an early fault, let alone provide information useful for prognostics. Techniques to improve the signal to noise ratio are needed to remove tones associated with nominal components while preserving the fault signatures.

Gear analysis is based on operations on the time synchronous average [3]. Time synchronous averaging (TSA) is a signal processing technique that extracts periodic waveforms from noisy data. The TSA is well suited for gearbox analysis, where it allows the vibration signature of the gear under analysis to be separated from other gears and noise sources in the gearbox that are not synchronous with that gear. Additionally, variations in shaft speed can be corrected, which would otherwise result in spreading of spectral energy into adjacent gear mesh bins. In order to do this, the signal is phase-locked with the angular position of the shaft under analysis.

This phase information can be provided through an n per revolution tachometer signal (such as a Hall sensor or optical encoder, where the time at which the tachometer signal crosses from low to high is called the zero crossing) or through demodulation of gear mesh signatures [3].

The model for vibration of a shaft in a gearbox was given in [2] as:

$$x(t) = \sum_{i=1}^{K} X_i \left(1 + a_i(t)\right) \cos\left(2\pi i f_m(t) + \Phi_i\right) + b(t)$$

where:

$X_i$ is the amplitude of the ith mesh harmonic,
$f_m(t)$ is the average mesh frequency,
$a_i(t)$ is the amplitude modulation function of the ith mesh harmonic,
$\phi_i(t)$ is the phase modulation function of the ith mesh harmonic,
$\Phi_i$ is the initial phase of harmonic i, and
$b(t)$ is additive background noise.

The mesh frequency is a function of the shaft rotational speed: $f_m = Nf$, where N is the number of teeth on the gear and f is the shaft speed.

This vibration model assumes that f is constant. In most systems, there is some wander in the shaft speed due to changes in load or feedback delay in the control system. This change in speed results in smearing of amplitude energy in the frequency domain. The smearing effect, and non-synchronous noise, is reduced by resampling the time domain signal into the angular domain: $m_x(\theta) = E[x(\theta)] = m_x(\theta + \Phi)$. The variable $\Phi$ is the period of the cycle over which the gearbox operation is periodic, and E[] is the expectation (e.g. ensemble mean). This makes the assumption that $m_x(\theta)$ is stationary and ergodic. If this assumption is true, then non-synchronous noise is reduced by 1/sqrt(rev), where rev is the number of cycles measured for the TSA.

TSA Techniques for Condition Indicators: The TSA is an example of angular resampling [2], [4], where the number of data points in one shaft revolution ($r_n$) is interpolated into m data points, such that:

For all shaft revolutions n, m is larger than r, and
$m = 2^{\lceil \log_2 r \rceil}$ (typical for a radix 2 Fast Fourier Transform).

Linear, bandwidth limited linear interpolation, and spline techniques have been used [5]. In this study, linear interpolation was used as it is considerably faster than spline or bandwidth limited filtering, with no reduction in analysis performance of the TSA.
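The resampling and averaging described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `time_synchronous_average`, the use of NumPy, and the assumption that the tachometer zero crossings are already available as (possibly fractional) sample indices are all choices made here for the example.

```python
import math
import numpy as np

def time_synchronous_average(signal, zero_cross_idx):
    """Resample each shaft revolution to m points and average the revolutions.

    signal         : 1-D sequence of vibration samples
    zero_cross_idx : increasing sample indices (may be fractional) of the
                     tachometer zero crossings; assumed to be given
    """
    signal = np.asarray(signal, dtype=float)
    zc = np.asarray(zero_cross_idx, dtype=float)
    # r: the largest number of samples found in any one revolution
    r = int(np.ceil(np.max(np.diff(zc))))
    # m = 2^ceil(log2(r)): smallest power of two not less than r
    m = 2 ** math.ceil(math.log2(r))
    sample_idx = np.arange(signal.size)
    revolutions = []
    for start, stop in zip(zc[:-1], zc[1:]):
        # m equally spaced shaft angles inside this revolution
        query = start + (stop - start) * np.arange(m) / m
        # linear interpolation, as used in the study
        revolutions.append(np.interp(query, sample_idx, signal))
    # ensemble mean over revolutions
    return np.mean(revolutions, axis=0)
```

Because every revolution is mapped onto the same m shaft angles before averaging, small speed variations no longer smear the mesh tones, and components that are not synchronous with the shaft average toward zero.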

The TSA itself can be used for CIs. Typically, a CI is a statistic of a waveform (in this case the TSA). Common statistics are RMS, peak to peak, crest factor, kurtosis, and skewness. For shafts, shaft order 1, 2, and 3 (the first, second, and third shaft rate harmonics) can be used to detect shaft out of balance, bent shaft, and/or shaft coupling damage, respectively. Figure 1 outlines the process of generating the TSA and shaft CIs.

Figure 1 Generation of the TSA and selected CIs
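The waveform statistics and shaft order CIs named above can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, crest factor is taken as peak over RMS, kurtosis is the non-excess form (3 for a Gaussian), and the shaft order amplitude is read from the FFT bin of the TSA with the same index as the order.

```python
import numpy as np

def waveform_cis(x):
    """Common CI statistics of a waveform such as the TSA."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "rms": rms,
        "peak_to_peak": x.max() - x.min(),
        "crest_factor": np.max(np.abs(x)) / rms,        # peak / RMS (assumed definition)
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # non-excess kurtosis
        "skewness": np.mean(centered ** 3) / std ** 3,
    }

def shaft_orders(tsa, orders=(1, 2, 3)):
    """Amplitude of the requested shaft orders.

    The TSA spans exactly one revolution, so FFT bin k is shaft order k.
    """
    spectrum = np.fft.rfft(tsa) / len(tsa)
    return {k: 2.0 * abs(spectrum[k]) for k in orders}
```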

Gear Fault Indicators: There are at least six failure modes for gears [6]: surface disturbances, scuffing, deformations, surface fatigue, fissures/cracks, and tooth breakage. Each failure mode can, potentially, generate a different fault signature. Additionally, relative to the energy associated with gear mesh tones and other noise sources, the fault signatures are typically small. A number of researchers have proposed analysis techniques to identify these different faults [7], [8], [9]. Typically, these analyses are based on operations on the TSA. Examples of analysis are:

Residual, where the shaft order 1, 2, and 3 frequencies and the gear mesh harmonics of the TSA are removed. Faults such as a soft/broken tooth generate a 1 per rev impact in the TSA. In the frequency domain of the TSA, these impacts are expressed as multiple harmonics of the 1 per rev. The shaft order 1, 2, and 3 frequencies and the gear mesh harmonics are removed in the frequency domain, and then the inverse FFT is performed. This allows the impact signature to become prominent in the time domain. CIs are statistics of this waveform (RMS, peak to peak, crest factor, kurtosis).

Energy operator (EO), which is a type of residual of the autocorrelation function. For a nominal gear, the predominant vibration is gear mesh. Surface disturbances, scuffing, etc., generate small higher frequency values which are not removed by autocorrelation. Formally, the EO is: $\mathbf{TSA}_{2:n-1} \times \mathbf{TSA}_{2:n-1} - \mathbf{TSA}_{1:n-2} \times \mathbf{TSA}_{3:n}$. The bold indicates a vector of TSA values. The CIs of the EO are the standard statistics of the EO vector.

Narrowband analysis, which operates on the TSA by filtering out all tones except that of the gear mesh, within a given bandwidth. It is calculated by zeroing all bins of the Fourier transform of the TSA except those around the gear mesh. The bandwidth is typically 10% of the number of teeth on the gear under analysis. For example, a 23 tooth gear analysis would retain bins 21, 22, 23, 24, and 25, and their conjugates in the Fourier domain. Then the inverse FFT is taken, and statistics of the waveform are taken. Narrowband analysis can capture sideband modulation of the gear mesh tone due to misalignment or a cracked/broken tooth.

Amplitude modulation (AM) analysis, which is the absolute value of the Hilbert transform of the narrowband signal. For a gear with minimum transmission error, the AM analysis feature should be a constant value. Faults will greatly increase the kurtosis of the signal.

Frequency modulation (FM) analysis, which is the derivative of the angle of the Hilbert transform of the narrowband signal. It is a powerful tool capable of detecting changes of phase due to uneven tooth loading, characteristic of a number of fault types.
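The five analyses listed above can be sketched with a handful of FFT operations on the TSA. The code below is illustrative only. The function names are hypothetical, every mesh harmonic below the Nyquist bin is removed in the residual (the text does not say how many), and "Hilbert transform" is interpreted as the analytic signal, whose magnitude and phase give the envelope and instantaneous phase.

```python
import numpy as np

def residual(tsa, n_teeth):
    """Remove shaft orders 1-3 and the gear mesh harmonics, then invert."""
    spec = np.fft.rfft(tsa)
    remove = [1, 2, 3] + list(range(n_teeth, spec.size, n_teeth))
    for b in remove:
        if b < spec.size:
            spec[b] = 0.0
    return np.fft.irfft(spec, n=len(tsa))

def energy_operator(tsa):
    """EO = TSA[2:n-1]*TSA[2:n-1] - TSA[1:n-2]*TSA[3:n] (1-based indices)."""
    x = np.asarray(tsa, dtype=float)
    return x[1:-1] * x[1:-1] - x[:-2] * x[2:]

def narrowband(tsa, n_teeth, half_width):
    """Keep only bins n_teeth - half_width .. n_teeth + half_width."""
    spec = np.fft.rfft(tsa)
    keep = np.zeros(spec.size, dtype=bool)
    keep[n_teeth - half_width : n_teeth + half_width + 1] = True
    return np.fft.irfft(np.where(keep, spec, 0.0), n=len(tsa))

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def am_fm(nb):
    """AM: envelope of the narrowband signal. FM: derivative of its phase."""
    z = analytic_signal(nb)
    am = np.abs(z)
    fm = np.diff(np.unwrap(np.angle(z)))
    return am, fm
```

For the 23 tooth example in the text, `narrowband(tsa, 23, 2)` retains bins 21 through 25; the conjugate bins are handled implicitly because the real-input FFT stores only the non-negative frequencies.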

For a more complete description of these analyses, see [7] or [8]. Figure 2 is an example of the processing used to generate the gear CIs for a spiral bevel gear with surface pitting and scuffing. This gear fault will be used throughout the paper.

Figure 2 Process for Generating Gear CIs

Threshold Setting and Component Health: In a physics of failure prognostics method, modeling would estimate the CI generated for some level of damage. When the measured CI exceeds the modeled threshold value, maintenance is performed. In a data driven process, maintenance is performed when a statistically set threshold is exceeded. Thus, the performance of a data driven method is completely determined by the quality of the threshold setting process.

The concept of thresholding was explored in [10], where for a given, single CI, a probability density function (PDF) for the Rician/Rice statistical distribution was used to set a threshold based on a probability of false alarm (PFA). This is contrasted with [11], which explored the relationship between CI threshold and PFA to describe the receiver operating characteristics (ROC) of the CI for a given fault. Additionally, Dempsey used the ROC to evaluate the performance of the CI for a fault type. These methods support a data driven approach for prognostics by formalizing a method for threshold setting.

Estimation of RUL given a threshold is complicated in that there are numerous failure modes for a gear. Further, no single CI has been identified that works with all fault modes. This suggests one of two possible architectures for a prognostics system:

Estimate the RUL for each gear CI used, where the reported RUL is the minimum of the remaining useful lives over all CIs, or
Fuse n CIs into a gear health indicator (HI) and calculate the RUL based on the HI.

Computationally, the use of HIs is attractive. Health indicators provide decision making tools for the end user on the status of system health. A health indicator consists of the integration of several condition indicators into one value that provides the health status of the component to the end user [11]. Highlighted in [12] are a number of advantages of the HI over CIs, such as: controlling the false alarm rate, improved detection, and simplification of the user display. Further, [13] describes a generalized threshold setting process for gear health, where the HI is a function of the CI distributions, regardless of the correlation between the CIs.

Gear Health as a Function of Distribution: Prior to detailing the mathematical methods used to develop the HI, a nomenclature for component health is needed. To simplify presentation and knowledge creation for a user, a uniform meaning across all components in the monitored machine should be developed. The measured CI statistics (e.g. PDFs) will be unique for each component type (due to different rates, materials, loads, etc.). This means that the critical values (thresholds) will be different for each monitored component. By using the HI paradigm, one can normalize the CIs, such that the HI is independent of the component. Further, using guidance from [14], the HI is designed such that there are two alert levels: warning and alarm. A common nomenclature for the HI can then be developed, such that:

The HI ranges from 0 to 1, where the probability of exceeding an HI of 0.5 is the PFA,
A warning alert is generated when the HI is greater than or equal to 0.75. Maintenance should be planned by estimating the RUL until the HI is 1.0.
An alarm alert is generated when the HI is greater than or equal to 1.0. Continued operations could cause collateral damage.
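The alert nomenclature above maps directly onto a small decision function. The sketch below is only an illustration of the stated levels; the function name and the label "normal" for HI values below 0.75 are assumptions made here.

```python
def alert_level(hi):
    """Map a health indicator value to the alert levels described in the text."""
    if hi >= 1.0:
        return "alarm"    # continued operation could cause collateral damage
    if hi >= 0.75:
        return "warning"  # plan maintenance: estimate the RUL until HI = 1.0
    return "normal"       # label assumed here; the text names only two alerts
```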

Note that this nomenclature does not define a probability of failure for the component, nor does it imply that the component fails when the HI is 1.0. Rather, it suggests a change in operator behavior to a proactive maintenance policy: perform maintenance prior to the generation of cascading faults. For example, by performing maintenance on a bearing prior to the bearing shedding extensive material, costly gearbox replacement can be avoided.

Controlling for the Correlation Between CIs: All CIs have a probability density function (PDF). Any operation on the CIs to form a health indicator (HI) is then a function of distributions [15]. Functions such as:

The maximum of n CIs (the order statistics),
The sum of n CIs, or
The norm of n CIs (energy)

are valid if and only if the distributions (e.g. CIs) are independent and identical [15]. For Gaussian distributions, subtracting the mean and dividing by the standard deviation will give identical Z distributions. The issue of ensuring independence is much more difficult. In general, the correlation between CIs is non-zero. As an example, many of the correlation coefficients used in this study were near 1 (see Table 1).

Table 1 Correlation Coefficients for the Six CIs Used in the Study

ρij     CI 1    CI 2    CI 3    CI 4    CI 5    CI 6
CI 1    1       0.84    0.79    0.66    -0.47   0.74
CI 2            1       0.46    0.27    -0.59   0.36
CI 3                    1       0.96    -0.03   0.97
CI 4                            1       0.11    0.98
CI 5                                    1       0.05
CI 6                                            1

This correlation between CIs implies that for a given function of distributions to have a threshold that operationally meets the design PFA, the CIs must be whitened (e.g. de-correlated). In [16], Fukunaga presents a whitening transform using the eigenvector matrix and the inverse square root of the eigenvalues (a diagonal matrix) of the covariance of the CIs: $A = \Lambda^{-1/2} \Phi^T$, where $\Phi^T$ is the transpose of the eigenvector matrix and $\Lambda$ is the eigenvalue matrix. The transform is not orthonormal: Euclidean distances are not preserved in the transform. While ideal for maximizing the distance (separation) between classes (such as in a Bayesian classifier), the distribution of the original CI is not preserved. This property of the transform makes it inappropriate for threshold setting.

If the CIs represent a metric such as shaft order acceleration, then one can construct an HI which is the square of the normalized power (e.g. square root of the acceleration squared). This can be defined as normalized energy, as per [17], which was able to whiten the CI and establish a threshold for a given PFA.

A more general whitening solution can be found using Cholesky decomposition (see [13]). The Cholesky decomposition of a Hermitian, positive definite matrix results in $A = LL^*$, where L is lower triangular and $L^*$ is its conjugate transpose. By definition, the inverse covariance is positive definite Hermitian. It then follows that if $LL^* = \Sigma^{-1}$, then $Y = L \times \mathbf{CI}^T$. The vector CI is the correlated CIs used for the HI calculation, and Y is 1 to n independent CIs with unit variance (one CI representing the trivial case). The Cholesky decomposition, in effect, creates the square root of the inverse covariance. This in turn is analogous to dividing the CI by its standard deviation (the trivial case of one CI). In turn, $Y = L \times \mathbf{CI}^T$ creates the necessary independent and identical distributions required to calculate the critical values for a function of distributions.
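A short numerical sketch of the Cholesky whitening step follows. It is an illustration under assumptions, not the authors' code: the function name `whiten` is hypothetical, the CIs are assumed to be mean-removed and stored one observation per row, and the orientation of the product is chosen so that the result has identity covariance with NumPy's lower-triangular factor (each row of CIs is multiplied by L, which is the transpose of L applied to the CI column vector).

```python
import numpy as np

def whiten(ci, cov):
    """De-correlate CIs using the Cholesky factor of the inverse covariance.

    ci  : array of shape (n_samples, n_ci), mean-removed CIs, one row per sample
    cov : array of shape (n_ci, n_ci), covariance estimated from healthy data
    """
    # L is lower triangular with L @ L.T == inv(cov)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    # Row-vector form: cov(ci @ L) = L.T @ cov @ L = identity
    return ci @ L
```

Applied to two CIs with a correlation of 0.9, the whitened outputs have unit variance and zero correlation, which is the property needed before a function of distributions (sum, maximum, norm) is used to set a threshold.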

As an example of the importance of correlation on threshold setting, consider a simple HI function: HI = CI1 + CI2. The CIs are normally distributed with mean 0 and standard deviation 1. The standard deviation of this HI is: $\sigma_{HI} = \sqrt{\sigma_{CI_1}^2 + \sigma_{CI_2}^2 + 2\rho_{CI_1,CI_2}\,\sigma_{CI_1}\sigma_{CI_2}}$, where $\rho_{CI_1,CI_2}$ is the correlation between CI1 and CI2. If one assumes $\rho_{CI_1,CI_2}$ is 0.0, then $\sigma_{HI}$ = 1.414 (e.g. sqrt(2)). For a PFA of $10^{-6}$, the threshold is then 6.722. Consider the case in which the observed correlation is closer to 1 (e.g. $\rho_{CI_1,CI_2}$ is 1.0); then the observed $\sigma_{HI}$ = 2. For a threshold of 6.722, the operational PFA is $4 \times 10^{-4}$. This is 390 times greater than the designed PFA. This illustrates the effect of correlation on threshold setting.
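The numbers in this example can be reproduced with the standard normal distribution from the Python standard library. The sketch assumes a one-sided threshold, which is the reading under which the design threshold works out to about 6.722; the variable and function names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def hi_sigma(rho, s1=1.0, s2=1.0):
    """Standard deviation of HI = CI1 + CI2 for correlation rho."""
    return sqrt(s1 ** 2 + s2 ** 2 + 2.0 * rho * s1 * s2)

z = NormalDist()  # standard normal
design_pfa = 1e-6

# Threshold set under the assumption that the CIs are uncorrelated
threshold = z.inv_cdf(1.0 - design_pfa) * hi_sigma(0.0)

# PFA actually realized if the CIs are perfectly correlated
operational_pfa = 1.0 - z.cdf(threshold / hi_sigma(1.0))
ratio = operational_pfa / design_pfa

print(f"threshold = {threshold:.3f}")
print(f"operational PFA = {operational_pfa:.2e}, ratio = {ratio:.0f}")
```

The computed ratio is a few hundred, in line with the rounded values quoted in the text.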

HI Based on Rayleigh PDFs: The CIs used for this example have Rayleigh like PDFs (e.g. heavily tailed). Consequently, the HI function was designed using the Rayleigh distribution. The PDF for the Rayleigh distribution uses a single parameter, $\beta$, resulting in the mean ($\mu = \beta(\pi/2)^{0.5}$) and variance ($\sigma^2 = (2 - \pi/2)\beta^2$). The PDF of the Rayleigh is: $(x/\beta^2)\exp(-x^2/(2\beta^2))$. Note that when applying these equations to the whitening process, the value for $\beta$ for each CI will then be: $\sigma^2 = 1$, and $\beta = \sigma / (2 - \pi/2)^{0.5} = 1.5264$. For a more complete analysis, see [17].

A number of HI functions could be used, but experience has shown [13] that the greatest signal to noise is achieved when the HI function is the norm of n CIs. This represents the normalized energy of the CIs. If the CIs are IID, it can be shown that the function defines a Nakagami PDF [17]. The statistics for the Nakagami are: $\eta = n$, and $\omega = 1/(2 - \pi/2) \times 2 \times n$. For this study, data was collected from experiments performed in the Spiral Bevel Gear Test facility at NASA Glenn. A description of the test rig and test procedure is given in [13]. Six CIs were used, so that: $\eta = 6$, and $\omega = 27.96$. For a PFA of $10^{-6}$, the threshold is 10.882, with the HI function calculated as: $HI = (0.5/10.882) \times \left(\sum_{i=1}^{6} Y_i^2\right)^{1/2}$.
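The Rayleigh and Nakagami values quoted above can be checked numerically without a statistics library. The sketch relies on two standard facts: the square of a Rayleigh variable with parameter beta is exponential with mean 2*beta^2, and the sum of n IID exponentials follows an Erlang distribution with a closed-form tail. The function names and the bisection search are choices made for this example, and the 0.5 scaling follows the convention that an HI of 0.5 corresponds to the PFA.

```python
from math import exp, pi, sqrt

def erlang_sf(x, k, scale):
    """P(S > x), where S is the sum of k IID exponentials with mean `scale`."""
    u = x / scale
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= u / j
        total += term
    return exp(-u) * total

def norm_threshold(n_ci, pfa, beta):
    """Threshold t with P(sqrt(sum Y_i^2) > t) = pfa for n_ci IID Rayleigh CIs."""
    scale = 2.0 * beta ** 2  # mean of each squared Rayleigh variable
    lo, hi = 0.0, 100.0
    for _ in range(200):     # bisection on the monotone tail probability
        mid = 0.5 * (lo + hi)
        if erlang_sf(mid ** 2, n_ci, scale) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 6
beta = 1.0 / sqrt(2.0 - pi / 2.0)   # Rayleigh parameter for unit variance
eta = n                             # Nakagami shape
omega = 2.0 * n / (2.0 - pi / 2.0)  # Nakagami spread
threshold = norm_threshold(n, 1e-6, beta)

def health_indicator(y, threshold=threshold):
    """HI = 0.5 / threshold * norm of the whitened CIs."""
    return 0.5 / threshold * sqrt(sum(v * v for v in y))
```

With these definitions a whitened CI vector whose norm equals the threshold maps to an HI of 0.5.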

The six CIs used for the HI calculation were: Residual RMS, Energy Operator RMS, FM0, NB KT, AM KT and FM RMS. These CIs were chosen because they exhibited good sensitivity to the fault. Residual Kurtosis and Energy Ratio also were good indicators, but were not chosen because:

It has been the researchers’ experience that these CIs become ineffective when used in complex gearboxes, and
As the fault progresses, these CIs lose effectiveness. The residual kurtosis can in fact decrease, while the energy ratio will approach 1.

Covariance and mean values for the six CIs were calculated by sampling healthy data from four gears prior to the fault propagating. This was done by randomly selecting 100 data points from each gear, and calculating the covariance and means over the resulting 400 data points. The selected CIs’ PDFs were not Gaussian, but exhibited a high degree of skewness. Because of this, the PDFs were “left shifted” by subtracting an offset such that the PDFs exhibited Rayleigh like distributions. The estimated gear health is plotted in Figure 3, where the damage on the gear at the end of the test is seen in the upper left corner.

Figure 3 Gear Health, Torque, and Image of Gear Damage at HI 1.5

The key issue with a data driven prognostic is the appropriateness of the threshold. When the HI is 1.0, is the damage such that it is appropriate to do maintenance? From the example (Figure 3), it is apparent that an HI of 1 displays damage warranting maintenance. Because it is appropriate to perform maintenance when the HI is 1.0 or greater, one can state that the RUL is the time from the current state until the estimated HI is 1.0.

State Space Models for Prognostics: State-space representation of data provides a versatile and robust way to model systems. Starting with the definition of the states, and the basic principles underlying the characterization of the phenomena under study, one can propagate the states as a data driven stochastic process.

The choice of which type of state space model to use is driven by the nature of the system dynamics and noise source. If the phenomenology of the system has linear dynamics with Gaussian noise, a Kalman filter (KF) is used. If it is a non-linear process with Gaussian noise, a sigma-point Bayesian process (e.g. unscented Kalman filter - UKF) or extended Kalman filter (EKF) is appropriate. For non-linear dynamics with non-Gaussian noise, we use a sequential Monte Carlo method employing sequential estimation of the probability distribution using “importance sampling” techniques. This method is generally referred to as particle filtering (PF) [18].

A state space model estimates the state variable on the basis of measurement of the output and input control variables [19]. In general, a system plant can be defined by: $\dot{x} = Ax + Bu$, and $y = Cx$, where x is the state variable, $\dot{x}$ is the rate of change of the state variable, and y is the output of the system.

An observer is a subsystem used to reconstruct the state space of the plant. The model of the observer is the same as that of the plant, except that one adds an additional term which includes the estimated error to account for inaccuracies in the A and B matrices. This means that any hidden state (such as RUL) can be reconstructed if we can model the plant (e.g. failure propagation) successfully.

The observer is defined as: $E[\dot{x}] = E[Ax] + Bu + K(y - E[Cx])$, where $E[\dot{x}]$ is the estimated state derivative, and E[Cx] is the expectation of the system output. The matrix K is called the Kalman gain matrix (linear, Gaussian case). It is a weighting matrix that maps the differences between the measured output y and the estimated output E[Cx]. A KF is used to optimally set the Kalman gain matrix.

A KF is a recursive algorithm that optimally filters the measured state based on a priori information such as the measurement noise, the unknown behavior of the state, the relationship between the input and output states (e.g. the plant), and the time between measurements. Computationally, it is attractive because it can be designed with no matrix inversion and it is a one step, iterative process. The filtering process is given as:

Prediction:

$X_{t|t-1} = A X_{t-1|t-1}$ (state)
$P_{t|t-1} = A P_{t-1|t-1} A^T + Q$ (covariance)

Gain:

$K = P_{t|t-1} C^T [C P_{t|t-1} C^T + R]^{-1}$

Update:

$P_{t|t} = (I - KC) P_{t|t-1}$ (state covariance)
$X_{t|t} = X_{t|t-1} + K(Y - C X_{t|t-1})$ (state update)

where:

t|t-1 is the condition statement (e.g. t given the information at t-1),
X is the state information ($x$, $dx/dt$, $d^2x/dt^2$),
A is the state transition matrix,
Y is the measured data,
K is the Kalman gain,
P is the state covariance matrix,
Q is the process noise model,
C is the measurement matrix, and
R is the measurement variance.
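One iteration of the prediction, gain, and update equations can be written compactly with NumPy. This is a generic linear Kalman filter step for illustration, not the extended filter used in the study; the function name `kalman_step` and the explicit matrix inverse are choices made here (for a scalar measurement the inverse reduces to a division).

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/gain/update cycle of a linear Kalman filter.

    x : state estimate at t-1 given t-1, shape (n,)
    P : state covariance at t-1 given t-1, shape (n, n)
    y : measurement at t, shape (k,)
    """
    # Prediction
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Gain
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Update
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```

As a usage example, a two-state model (value and rate of change) with `A = [[1, dt], [0, 1]]` and `C = [[1, 0]]` lets the filter reconstruct the rate of change of a measured health value, which is the quantity needed later for the life estimate.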

For nonlinear systems with Gaussian noise (UKF or EKF), the state prediction is a function of $X_{t|t-1}$ and the state transition matrix A, and C is the derivative of the state with respect to the measurement.

For non-linear, non-Gaussian noise problems, particle filters (PF) are attractive. PF is

based on representing the filtering distribution as a set of particles. The particles are

generated using sequential importance re-sampling (a Monte Carlo technique), where a

proposed distribution is used to approximate a posterior distribution by appropriate

weighting. In this example, the state update is nonlinear and the measurement noise is

Gaussian. As such, an extended Kalman filter was used.

System Dynamics for Estimating the RUL: The state space model can be constructed

as a parallel system to the plant (e.g. the system under study). This requires an

appropriate model to simulate the system dynamics. In general, failure modes

propagating in mechanical systems are difficult to model at a level of fidelity that would

generate any meaningful results (e.g. Health and RUL based on physics of failure). One

needs a generalized, data driven process that can model the plant adequately enough to

generate RUL with small error.

Since 1953, a number of fault growth theories have been proposed, such as: net area stress theories, accumulated strain hypothesis, dislocation theories, and others [20]. Through substitution of variables, most of these theories can be generalized by Paris’ Law: $da/dN = D(\Delta K)^n$. Paris’ Law governs the rate of crack growth in a homogeneous material, where:

da/dN is the rate of change of the half crack length,
D is a material constant of the crack growth equation,
$\Delta K$ is the range of strain during a fatigue cycle, and
n is the exponent of the crack growth equation.

The range of strain, $\Delta K$, is given as: $\Delta K = 2\sigma\alpha(\pi a)^{1/2}$, where

$\sigma$ is gross strain,
$\alpha$ is a geometric correction factor, and
a is the half crack length.

These variables are specific to a given material and test article. In practice, the variables are unknown, so some simplifying assumptions are needed to facilitate analysis. For many materials, the crack growth exponent is 2 (see [20]). The geometric correction factor, $\alpha$, is set to 1 (a constant which will be accounted for in the calculation of $D$), which allows Paris’ Law to be reduced to: $da/dN = D(4\sigma^2\pi a)$.

Taking the inverse of $da/dN$ gives the rate of change in cycles per change in crack length, or: $dN/da = 1/[D(4\sigma^2\pi a)]$. Integrating over crack length gives the number of cycles (for near synchronous systems, the RUL is $N$/rpm): $N = [\ln(a_f) - \ln(a_o)]/[D(4\sigma^2\pi)]$, where the current measured crack length is $a_o$ and the final crack length is $a_f$. Since the crack length is unknown, the current state, HI, will be used as a surrogate for $a_o$, while $a_f$ will be 1.0 (the RUL is the time from the current HI state until the HI is 1.0). $N$ is the RUL times some constant (RPM, for example). The material crack constant, $D$, can be estimated as: $D = (da/dN)/(4\sigma^2\pi a)$. Gross strain cannot generally be measured; thus, an appropriate surrogate value (e.g. torque or yaw misalignment) will be used.
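The reduced form of Paris’ Law can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical. The sketch assumes the exponent is 2, the geometric correction factor is 1, the HI stands in for crack length, and the growth rate is expressed per hour so that N comes out in hours rather than cycles.

```python
import math


def paris_constant(da_dn: float, a: float, strain: float) -> float:
    """Estimate D from the reduced law da/dN = D * 4 * strain^2 * pi * a."""
    return da_dn / (4.0 * strain ** 2 * math.pi * a)


def life_to_threshold(a0: float, d: float, strain: float, af: float = 1.0) -> float:
    """N = (ln(af) - ln(a0)) / (D * 4 * strain^2 * pi)."""
    if not 0.0 < a0 <= af:
        raise ValueError("require 0 < a0 <= af")
    return (math.log(af) - math.log(a0)) / (d * 4.0 * strain ** 2 * math.pi)


# Hypothetical values: HI of 0.4 growing at 0.1 per hour, with a
# normalised strain surrogate (e.g. torque) of 1.0.
hi, dhi_dt, strain = 0.4, 0.1, 1.0
d = paris_constant(dhi_dt, hi, strain)
rul_hours = life_to_threshold(hi, d, strain)
print(round(rul_hours, 2))  # about 3.67 hours until the HI reaches 1.0
```

Because D is re-estimated from the current HI and its rate of change, the strain term cancels when the same strain is used for both steps; it matters when the future load differs from the current load.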

The use of Paris’ Law for the calculation of RUL was given in [21] and [22], but those works lacked a measure of confidence (i.e. how good the prognostic was). Confidence is an important requirement for a PHM system [23].

A Prognostic and Confidence in the Prognostic: In practice, a prognostic or PHM

capability would be used to schedule maintenance or assist in asset management and

logistics support. The asset owner/operator will make decisions which affect the

operational availability and future revenues based on the PHM system. They will need an

intuitive, simple display that conveys information on: current health, RUL, and

confidence in the RUL prediction.

Model confidence is essential in any RUL estimate (see [23]). For any RUL calculation, given 1 hour of nominal usage, the RUL should decrease by 1 hour (i.e. $dN/dt$ is approximately -1: one hour of life is consumed for each hour of operation). Further, a measure of model drift or convergence is the second derivative $d^2N/dt^2$: a value close to zero indicates convergence. When these conditions are met, the model used for calculation of the RUL is consistent, and is indicative of a good estimate of the RUL of the component.

One can use visual cues for the confidence of the prognostic, based on model convergence. Visual cues,

such as color, can indicate the confidence in the RUL:

Low Confidence: Yellow, $|dN/dt - 1| > 3$ and $|d^2N/dt^2| > 0.5$

Medium Confidence: Blue, $|dN/dt - 1| > 2$ and $|d^2N/dt^2| > 0.5$

High Confidence: Green, $|dN/dt - 1| < 2$ and $|d^2N/dt^2| < 0.5$
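A minimal sketch of how these cues might be computed from a history of RUL estimates follows. Two assumptions are made that the text does not settle. First, since the nominal slope is stated to be -1, the deviation is measured here as |dN/dt + 1|. Second, any case not covered by the green or yellow rule is shown as blue. The function names and the backward-difference scheme are hypothetical.

```python
from typing import List, Tuple


def rul_derivatives(times: List[float], ruls: List[float]) -> Tuple[float, float]:
    """Backward finite differences for dN/dt and d2N/dt2 at the latest sample."""
    if len(times) < 3 or len(times) != len(ruls):
        raise ValueError("need at least three matching samples")
    dt1 = times[-1] - times[-2]
    dt0 = times[-2] - times[-3]
    slope1 = (ruls[-1] - ruls[-2]) / dt1
    slope0 = (ruls[-2] - ruls[-3]) / dt0
    accel = (slope1 - slope0) / (0.5 * (dt1 + dt0))
    return slope1, accel


def confidence_color(dn_dt: float, d2n_dt2: float) -> str:
    """Map the RUL derivatives to a display color."""
    deviation = abs(dn_dt + 1.0)  # ASSUMPTION: distance from the nominal slope of -1
    drift = abs(d2n_dt2)
    if deviation < 2.0 and drift < 0.5:
        return "green"            # high confidence
    if deviation > 3.0 and drift > 0.5:
        return "yellow"           # low confidence
    return "blue"                 # ASSUMPTION: all remaining cases are medium


# Hypothetical history: RUL falls by one hour per hour of operation.
slope, accel = rul_derivatives([0.0, 1.0, 2.0], [5.0, 4.0, 3.0])
print(confidence_color(slope, accel))  # green
```

In a deployed system the derivatives would more likely come from the state estimator itself than from raw differences, which amplify noise.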

A key requirement of the prognostic model is the ability to predict what the health of the

component will be some time in the future. For a given state space model, the RUL or any

predicted health is an expectation based on the current state and future usage (e.g.

damage or strain). Paris’ Law is driven by the range of strain: changes in strain will affect

the RUL. Future health is then based on the mean strain and a bound on that strain to give

a range on the RUL (one benefit in a PF model is a direct distribution of the RUL). This

strain information could be based on forecast weather or usage for a wind turbine or type

of mission for a helicopter. The health at any time in the future is then: $a_f = \exp(N D(4\sigma^2\pi) + \ln(a_o))$.
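The projection of health under a forecast load, and the resulting range on the RUL, can be sketched as follows. The function names, the symmetric strain bound and the numbers are hypothetical. The sketch assumes the reduced form of Paris’ Law (exponent 2, correction factor 1) with the HI as a surrogate for crack length.

```python
import math
from typing import Tuple


def future_health(a0: float, n: float, d: float, strain: float) -> float:
    """af = exp(N * D * 4 * strain^2 * pi + ln(a0))."""
    return math.exp(n * d * 4.0 * strain ** 2 * math.pi + math.log(a0))


def rul_range(a0: float, d: float, mean_strain: float, strain_bound: float,
              af: float = 1.0) -> Tuple[float, float, float]:
    """RUL at (mean + bound), mean and (mean - bound) strain.

    Higher strain consumes life faster, so the first value is the
    shortest RUL and the last is the longest.
    """
    if mean_strain - strain_bound <= 0.0:
        raise ValueError("lower strain bound must be positive")

    def life(strain: float) -> float:
        return (math.log(af) - math.log(a0)) / (d * 4.0 * strain ** 2 * math.pi)

    return (life(mean_strain + strain_bound),
            life(mean_strain),
            life(mean_strain - strain_bound))


# Hypothetical values: HI of 0.4, D chosen so that D * 4 * pi = 0.25 per hour.
d = 0.25 / (4.0 * math.pi)
print(future_health(0.4, 2.0, d, 1.0))   # HI expected two hours ahead
print(rul_range(0.4, d, 1.0, 0.1))       # (shortest, nominal, longest) RUL
```

A particle filter would replace the three-point range with a full distribution of the RUL.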

Test Article and a Prognostics Example: Data used for this example was provided by

the Spiral Bevel Gear Test facility at NASA. A description of the test rig and test

procedure is given in [13], [24]. The tests consisted of running the gears under load

through a “back to back” configuration, with acquisitions made at 1 minute intervals,

generating time synchronous averages (TSA) on the gear shaft (36 teeth). The pinion, on

which the damage occurred, has 12 teeth. This is highly accelerated life testing, and as

such, the RUL estimates are compressed. The calculated HIs were used to sequentially update the state estimator. At each update, the HI, dHI/dt, RUL, dRUL/dt and $d^2RUL/dt^2$ were calculated. The confidence of the RUL was then evaluated. The fault starts to propagate at approximately 25 hours into the test. Figure 4 displays the HI state at 26.85 hours. The state estimate of health has increased from a nominal value of 0.2 to 0.4, with an RUL of 2.5 hours.

Figure 4 Initial Low Confidence Prognostic

The confidence is low: note that the prognostic is lagging the actual RUL by

approximately 0.5 hours. However, the actual RUL is still within the estimated

confidence bound of the RUL. As the fault continues to propagate (Figure 5), the confidence in the prognostic has improved and the estimated RUL is concurrent with the actual RUL.


Figure 5 High Confidence Prognostic with Small Error Bounds

In practice, it is anticipated that the time period of the RUL will be thousands of hours for

equipment such as wind turbines and hundreds of hours for devices such as helicopter

transmissions (see [21], where a prognostic of 100 to 150 hours of flight time was

observed).

Conclusion: Data driven prognostics requires four conditions: the ability to extract a feature related to damage, a process to set thresholds, a fault model to propagate the current state to the desired threshold, and a measure of confidence in the prognostic.

Critical to a successful estimation of remaining useful life is an appropriate threshold.

The process described is based on hypothesis testing and sets a threshold relative to a

probability of false alarm. Refinement in RUL estimation will require feedback from

depot level repair services to validate the appropriateness of the threshold.

Physics of failure models may ultimately give an absolute level of damage for a given CI

value. That said, the cost associated with model development and validation may be

great. The advantage of a data driven approach is the generality of the model, and the ability to set thresholds with nominal components. This leads to a relatively low application cost and faster deployment of systems.


References:

[1] ISO 10816-3:2009 (2009). Mechanical vibration – Evaluation of Machine

Vibration by Measurement on Non-rotating Parts.

[2] McFadden, P. (1987). A revised model for the extraction of periodic waveforms

by time domain averaging. Mechanical Systems and Signal Processing 1 (1), 83-95

[3] Combet, L., Gelman L. (2007). An automated methodology for performing time

synchronous averaging of a gearbox signal without seed sensor. Mechanical Systems and

Signal Processing, 21 (6), 2590-2606.

[4] Randall, Robert B. (2011). Vibration-based Condition Monitoring. West Sussex, United Kingdom, John Wiley & Sons.

[5] Bechhoefer, E., Kingsley, M. (2009). A Review of Time Synchronous Average

Algorithms. Annual Conference of the Prognostics and Health Management Society

[6] ISO 10825. (2007) Gears -- Wear and damage to gear teeth -- Terminology

[7] Vecer, P., Kreidl, M., Smid, R. (2005). Condition Indicators for Gearbox

Condition Monitoring Systems. Acta Polytechnica. 45 (6).

[8] McFadden, P., Smith, J., (1985), A Signal Processing Technique for detecting

local defects in a gear from a signal average of the vibration. Proc Instn Mech Engrs, 199

(4)

[9] Zakrajsek, J. Townsend, D., Decker, H. (1993). An Analysis of Gear Fault

Detection Method as Applied to Pitting Fatigue Failure Damage. NASA Technical

Memorandum 105950.

[10] Byington, C., Safa-Bakhsh, R., Watson., M., Kalgren, P. (2003). Metrics

Evaluation and Tool Development for Health and Usage Monitoring System Technology.

HUMS 2003 Conference, DSTO-GD-0348

[11] Dempsey, P., Keller, J. (2008). Signal Detection Theory Applied to Helicopter

Transmissions Diagnostics Thresholds. NASA Technical Memorandum 2008-215262

[12] Bechhoefer, E., Duke, A., Mayhew, E. (2007). A Case for Health Indicators vs.

Condition Indicators in Mechanical Diagnostics. American Helicopter Society Forum 63,

Virginia Beach.

[13] Bechhoefer, E., He, D., Dempsey, P. (2011). Gear Threshold Setting Based On a

Probability of False Alarm. Annual Conference of the Prognostics and Health

Management Society.

[14] GL Renewables, (2007), Guidelines for the Certification of Condition Monitoring Systems for Wind Turbines. http://www.gl-group.com/en/certification/renewables/CertificationGuidelines.php

[15] Wackerly, D., Mendenhall, W., Scheaffer, R.,(1996), Mathematical Statistics with

Applications, Duxbury Press, Belmont, 1996.

[16] Fukunaga, K., (1990), Introduction to Statistical Pattern Recognition, Academic

Press, London, 1990, page 75.

[17] Bechhoefer, E., Bernhard, A. (2007). A Generalized Process for Optimal

Threshold Setting in HUMS. IEEE Aerospace Conference, Big Sky.

[18] Candy, J. (2009). Bayesian Signal Processing: Classical, Modern, and Particle

Filtering Methods, John Wiley & Sons, Hoboken.

[19] Brogan, W. (1991). Modern Control Theory, Prentice Hall, Upper Saddle River, NJ, 07458, 1991.

[20] Frost, N., March, K., Pook, L. (1999). Metal Fatigue, 1999, Dover Publications, Mineola, NY., page 228-244.

[21] Bechhoefer, E., Bernhard, A., He, D. (2008). Use of Paris Law for Prediction of Component Remaining Life, IEEE Aerospace Conference, Big Sky.

[22] Orchard, M., Vachtsevanos, G. (2007). A Particle Filtering Approach for On-Line Failure Prognosis in a Planetary Carrier Plate. International Journal of Fuzzy Logic and Intelligent Systems, 7 (4), 221-227

[23] Vachtsevanos, G., Lewis, F. L., Roemer, M. Hess, A., and Wu, A. (2006).

Intelligent Fault Diagnosis and Prognosis for Engineering Systems, 1st ed. Hoboken,

New Jersey: John Wiley & Sons, Inc, 2006.

[24] Dempsey, P., Afjeh, A., (2002). Integrating Oil Debris and Vibration Gear

Damage Detection Technologies Using Fuzzy Logic, NASA Technical Memorandum

2002-211126
