A PROCESS FOR DATA DRIVEN PROGNOSTICS
Eric Bechhoefer
NRG Systems
110 Riggs Road
Hinesburg, VT 05461
Telephone: (802) 482-2255
David He
Professor, Dept of Mechanical and Industrial Engineering
University of Illinois at Chicago
842 W. Taylor Street, Room 3027 ERF
Chicago, IL 60607-7022
Abstract: A prognostic is an estimate of the remaining useful life of a monitored part. While diagnostics alone can support condition based maintenance practices, prognostics facilitate changes in logistics that can greatly reduce cost or increase the readiness and availability of the monitored system. A successful prognostic requires four processes. First, feature extraction from measured data to estimate damage. Second, a threshold for the feature beyond which it is appropriate to perform maintenance. Third, given a future load profile, a model that can estimate the life of the component based on its current damage state. Finally, an estimate of the confidence in the prognostic. This paper outlines a process for data driven prognostics by: describing appropriate condition indicators (CIs) for gear faults, setting thresholds for those CIs through fusion into a component health indicator, using a state space process to estimate the remaining useful life given the current component health, and using a state estimate to quantify the confidence in the estimate of the remaining useful life. Finally, a gear fault run-to-failure test is used as an example.
Key words: Confidence; Condition indicators; Health indicator; Paris’ Law; Remaining useful life
Introduction: Condition based maintenance (CBM) systems have been shown to produce cost savings by reducing scheduled maintenance cost. However, CBM systems can be leveraged into far greater cost savings by developing a prognostics capability. The ability to estimate the remaining useful life (RUL) of a component can greatly improve availability and reduce logistics cost. Prognostics are the maturation of the CBM system.
Consider the effect on wind farm operations if a prognostics capability were available. Major maintenance events require a heavy lift crane. Crane availability can be limited and the rental cost is high. If a crane is needed to replace a gearbox while other wind turbines have small RULs, cost can be saved and readiness improved by also conducting maintenance on those marginal turbines.
Alternatively, if the operator of a fleet of helicopters knows the RUL of each asset, the operator can deploy the aircraft with the highest RUL and be assured that they will not need major maintenance while deployed.
Knowledge of the RUL allows the logistician to reduce spares inventory. It also affects the manpower needed for maintenance and facilitates more efficient operations. That said, there are currently few deployed prognostic health management (PHM) systems. While CBM is a maturing technology, PHM is relatively immature and difficult to implement.
The ability to estimate the RUL requires four pieces of information:
An estimate of the current equipment health,
A limit or threshold where it is appropriate to do maintenance,
An estimate of the future equipment load, and
A model to estimate the time from the current state to the limit/threshold based on
projected load.
The current health of a system can be determined by a CBM system. Such systems measure some feature representing damage. For example, for a pump or generator, shaft order one acceleration is a measure of health. The limit for this vibration for some equipment can be found in standards such as [1]. Most monitored components, such as gears and bearings, are not covered by ISO standards; there are no formal or standardized limits. Additionally, while the load for stationary equipment may be well known, for many systems (helicopters or wind turbines, for example) the load is variable.
Damage models to predict future equipment health fall into two categories: physics of failure and data driven. Physics of failure models are inherently appealing, but there is a cost associated with building the models, which then require validation and testing. Further, the robustness of these models in application may not be satisfactory: how are material/manufacturing variance, unknown usage and maintenance accounted for in a real application? Data driven methods, while not capable of giving an absolute level of damage, can give a relative limit which may give acceptable performance.
Presented here is an end to end process for data driven prognostics using a vibration sensor. Descriptions of how condition indicators (CIs) are generated for a gear run to failure test are given. The CIs are fused into a health indicator (HI) through a statistical process (to control the probability of false alarm). Given the current HI, Paris’ Law is used to calculate the time until the HI reaches a predetermined value; this time is the remaining useful life (RUL). Once the RUL is calculated, a bound on it can be calculated and a confidence in the RUL is given. Finally, this process is demonstrated on a spiral bevel gear.
Condition Indicators - Feature Extraction to Improve Signal to Noise: Vibration signatures of machinery faults tend to be small relative to other vibration signatures. For example, in a typical gearbox, the energy associated with gear mesh and shaft vibrations will be orders of magnitude larger than a fault feature. Spectral analysis or the root mean square (RMS) of vibration are not powerful enough CIs to find an early fault, let alone provide information useful for prognostics. Techniques to improve the signal to noise ratio are needed to remove tones associated with nominal components while preserving the fault signatures.
Gear analysis is based on operations on the time synchronous average [3]. Time synchronous averaging (TSA) is a signal processing technique that extracts periodic waveforms from noisy data. The TSA is well suited for gearbox analysis, where it allows the vibration signature of the gear under analysis to be separated from other gears and noise sources in the gearbox that are not synchronous with that gear. Additionally, variations in shaft speed, which would otherwise spread spectral energy into adjacent gear mesh bins, can be corrected. To do this, the signal is phase-locked to the angular position of the shaft under analysis.
This phase information can be provided by an n-per-revolution tachometer signal (such as a Hall sensor or optical encoder, where the time at which the tachometer signal crosses from low to high is called the zero crossing) or through demodulation of gear mesh signatures [3].
The model for vibration of a shaft in a gearbox was given in [2] as:
$$x(t) = \sum_{i=1}^{K} X_i \bigl(1 + a_i(t)\bigr)\cos\bigl(2\pi i f_m t + \phi_i(t) + \alpha_i\bigr) + b(t)$$
where:
$X_i$ is the amplitude of the ith mesh harmonic,
$f_m$ is the average mesh frequency,
$a_i(t)$ is the amplitude modulation function of the ith mesh harmonic,
$\phi_i(t)$ is the phase modulation function of the ith mesh harmonic,
$\alpha_i$ is the initial phase of harmonic i, and
$b(t)$ is additive background noise.
The mesh frequency is a function of the shaft rotational speed: $f_m = Nf$, where N is the number of teeth on the gear and f is the shaft speed.
This vibration model assumes that f is constant. In most systems, there is some wander in the shaft speed due to changes in load or feedback delay in the control system. This change in speed smears the amplitude energy in the frequency domain. The smearing effect, and non-synchronous noise, is reduced by resampling the time domain signal into the angular domain: $m_x(\theta) = E[x(\theta)] = m_x(\theta + \Theta)$. The variable $\Theta$ is the period of the cycle over which the gearbox operation is periodic, and $E[\cdot]$ is the expectation (i.e., the ensemble mean). This assumes that $m_x(\theta)$ is stationary and ergodic. If this assumption is true, then non-synchronous noise is reduced by a factor of $1/\sqrt{rev}$, where rev is the number of revolutions (cycles) averaged in the TSA.
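The angular-domain model and the $1/\sqrt{rev}$ noise reduction can be checked numerically. The sketch below is a minimal illustration, assuming constant shaft speed (so every revolution contains the same samples and no resampling is needed), zero amplitude and phase modulation, and illustrative values chosen here rather than taken from the paper: a 23 tooth gear, 256 samples per revolution, 400 revolutions, and unit-variance Gaussian noise for b(t). The helper names are hypothetical.

```python
import numpy as np

def mesh_signal(theta, n_teeth, amps, phases):
    """Mesh harmonics of the vibration model in the angular domain.

    theta is shaft angle in revolutions, so harmonic i of the mesh sits at
    i * n_teeth shaft orders. The modulation terms a_i and phi_i are zero here.
    """
    x = np.zeros_like(theta)
    for i, (amp, alpha) in enumerate(zip(amps, phases), start=1):
        x += amp * np.cos(2 * np.pi * i * n_teeth * theta + alpha)
    return x

def synchronous_average(x, samples_per_rev):
    """Ensemble mean over whole revolutions (valid only at constant speed)."""
    revs = len(x) // samples_per_rev
    return x[: revs * samples_per_rev].reshape(revs, samples_per_rev).mean(axis=0)

rng = np.random.default_rng(0)
m, revs = 256, 400
theta = np.arange(m * revs) / m                      # shaft angle (revolutions)
clean = mesh_signal(theta, n_teeth=23, amps=[1.0, 0.3], phases=[0.0, 0.5])
noisy = clean + rng.normal(0.0, 1.0, clean.size)     # b(t): unit-variance noise

tsa = synchronous_average(noisy, m)
residual_noise = np.std(tsa - clean[:m])             # noise left after averaging
expected = 1 / np.sqrt(revs)                         # 0.05 for 400 revolutions
print(residual_noise, expected)
```

The standard deviation of what remains after averaging lands close to $1/\sqrt{400} = 0.05$ of the original noise level, while the synchronous mesh tones pass through unchanged.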
TSA Techniques for Condition Indicators: The TSA is an example of angular resampling [2], [4], where the $r_n$ data points in shaft revolution n are interpolated into m data points, such that:
For all shaft revolutions n, m is larger than $r_n$, and
$m = 2^{\lceil \log_2 r \rceil}$, where r is the largest $r_n$ (typical for a radix-2 Fast Fourier Transform).
Linear, bandwidth limited, and spline interpolation techniques have been used [5]. In this study, linear interpolation was used because it is considerably faster than spline or bandwidth limited interpolation, with no reduction in the analysis performance of the TSA.
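A minimal sketch of the TSA with linear interpolation follows. It assumes a 1-per-revolution tachometer whose zero-crossing times are already known, and assumes the shaft speed is constant within each revolution, so the m resampled points are spaced evenly in time between consecutive zero crossings. The function name `tsa_linear` and the test signal (a 5th shaft order tone on a shaft slowly accelerating from 30 Hz to 32 Hz) are illustrative, not the authors' implementation.

```python
import numpy as np

def tsa_linear(x, fs, zero_crossings):
    """Time synchronous average using linear interpolation.

    x              : vibration samples
    fs             : sample rate (Hz)
    zero_crossings : times (s) of the 1-per-rev tachometer zero crossings
    Each revolution is resampled to m = 2**ceil(log2(r)) points, where r is
    the largest number of raw samples in any revolution, then averaged.
    """
    t = np.arange(len(x)) / fs
    zc = np.asarray(zero_crossings, dtype=float)
    r = int(np.max(np.diff(zc)) * fs) + 1            # samples in longest revolution
    m = 2 ** int(np.ceil(np.log2(r)))
    revs = []
    for start, stop in zip(zc[:-1], zc[1:]):
        # m points evenly spaced over this revolution (end point excluded)
        t_rev = start + (stop - start) * np.arange(m) / m
        revs.append(np.interp(t_rev, t, x))
    return np.mean(revs, axis=0)

# Usage: shaft speed ramps from 30 Hz to 32 Hz over one second.
fs, f0, c = 10_000.0, 30.0, 2.0
t = np.arange(int(fs)) / fs
angle = f0 * t + 0.5 * c * t**2                      # shaft angle (revolutions)
x = np.cos(2 * np.pi * 5 * angle)                    # 5th shaft order tone
k = np.arange(int(angle[-1]) + 1)                    # whole revolutions completed
zc = (-f0 + np.sqrt(f0**2 + 2 * c * k)) / c          # times where angle == k
tsa = tsa_linear(x, fs, zc)
print(len(tsa))
```

Because each revolution is stretched onto the same m-point grid, the tone stays in a single order bin even though its frequency in hertz drifts during the record.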
The TSA itself can be used for CIs. Typically, a CI is a statistic of a waveform (in this case, the TSA). Common statistics are RMS, Peak to Peak, Crest Factor, Kurtosis and Skewness. For shafts, shaft orders 1, 2 and 3 (the first, second and third shaft rate harmonics) can be used to determine shaft out of balance, bent shaft, and/or shaft coupling damage, respectively. Figure 1 outlines the process of generating the TSA and shaft CIs.
Figure 1 Generation of the TSA and selected CIs
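The common TSA statistics and shaft order amplitudes can be sketched as below, assuming a TSA that spans exactly one shaft revolution so that FFT bin k corresponds to shaft order k. Kurtosis here is the non-excess form (3 for Gaussian data), and the helper names are hypothetical.

```python
import numpy as np

def waveform_cis(w):
    """Common CI statistics of a waveform such as the TSA."""
    w = np.asarray(w, dtype=float)
    d = w - w.mean()
    rms = np.sqrt(np.mean(w**2))
    return {
        "rms": rms,
        "peak_to_peak": w.max() - w.min(),
        "crest_factor": np.max(np.abs(w)) / rms,
        "kurtosis": np.mean(d**4) / np.mean(d**2) ** 2,
        "skewness": np.mean(d**3) / np.mean(d**2) ** 1.5,
    }

def shaft_order_cis(tsa, orders=(1, 2, 3)):
    """Amplitudes of shaft orders 1, 2 and 3 from a one-revolution TSA."""
    spec = np.fft.rfft(tsa)
    m = len(tsa)
    return {f"so{k}": 2 * np.abs(spec[k]) / m for k in orders}

# Usage: 2.0 units of shaft order 1 plus 0.5 units of shaft order 3.
theta = np.arange(256) / 256
tsa = 2.0 * np.cos(2 * np.pi * theta) + 0.5 * np.cos(2 * np.pi * 3 * theta)
print(shaft_order_cis(tsa))   # so1 ~ 2.0, so2 ~ 0.0, so3 ~ 0.5
print(waveform_cis(tsa))
```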
Gear Fault Indicators: There are at least six failure modes for gears [6]: surface
disturbances, scuffing, deformations, surface fatigue, fissures/cracks and tooth breakage.
Each type of failure mode, potentially, can generate a different fault signature.
Additionally, relative to the energy associated with gear mesh tone and other noise
sources, the fault signatures are typically small. A number of researchers have proposed
analysis techniques to identify these different faults [7], [8], [9]. Typically, these analyses are based on operations on the TSA. Examples of analyses are:
Residual, where the shaft order 1, 2, and 3 frequencies and the gear mesh harmonics are removed from the TSA. Faults such as a soft/broken tooth generate a 1 per rev impact in the TSA. In the frequency domain of the TSA, these impacts are expressed as multiple harmonics of the 1 per rev. The shaft order 1, 2 and 3 frequencies and gear mesh harmonics are zeroed in the frequency domain, and then the inverse FFT is performed. This allows the impact signature to become prominent in the time domain. CIs are statistics of this waveform (RMS, Peak to Peak, Crest Factor, Kurtosis).
Energy operator (EO), which is a type of residual of the autocorrelation function. For a nominal gear, the predominant vibration is gear mesh. Surface disturbances, scuffing, etc., generate small higher frequency values which are not removed by autocorrelation. Formally, the EO is: $\mathbf{EO} = \mathbf{TSA}_{2:n-1} \odot \mathbf{TSA}_{2:n-1} - \mathbf{TSA}_{1:n-2} \odot \mathbf{TSA}_{3:n}$, where bold indicates a vector of TSA values and $\odot$ is the element-by-element product. The CIs of the EO are the standard statistics of the EO vector.
Narrowband analysis operates on the TSA by filtering out all tones except the gear mesh and the content within a given bandwidth of it. It is calculated by zeroing all bins of the Fourier transform of the TSA except those around the gear mesh. The bandwidth is typically 10% of the number of teeth on the gear under analysis. For example, the analysis of a 23 tooth gear would retain bins 21, 22, 23, 24, and 25, and their conjugates in the Fourier domain. Then the inverse FFT is taken, and statistics of the waveform are computed. Narrowband analysis can capture sideband modulation of the gear mesh tone due to misalignment or a cracked/broken tooth.
Amplitude Modulation (AM) analysis is the absolute value of the Hilbert transform (i.e., the envelope of the analytic signal) of the Narrowband signal. For a gear with minimum transmission error, the AM analysis feature should be a constant value. Faults will greatly increase the kurtosis of the signal.
Frequency Modulation (FM) analysis is the derivative of the angle (phase) of the Hilbert transform of the Narrowband signal. It is a powerful tool capable of detecting changes of phase due to uneven tooth loading, characteristic of a number of fault types.
For a more complete description of these analyses, see [7] or [8]. Figure 2 is an example of the processing used to generate the gear CIs for a spiral bevel gear with surface pitting and scuffing. This gear fault will be used throughout the paper.
Figure 2 Process for Generating Gear CIs
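The gear analyses listed above can be sketched with NumPy FFTs. This is a minimal illustration under stated assumptions, not the authors' code: the TSA covers one revolution of the gear shaft (so bin k is shaft order k), the residual removes shaft orders 1 to 3 and the first three mesh harmonics, the narrowband half-width is round(0.1 × teeth) bins, the analytic signal is computed directly by FFT (equivalent to what `scipy.signal.hilbert` returns), and FM RMS is taken about the mean so the constant mesh carrier does not dominate it. All function names are hypothetical.

```python
import numpy as np

def kurtosis(w):
    d = w - w.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

def rms(w):
    return np.sqrt(np.mean(w**2))

def analytic_signal(w):
    """Analytic signal via FFT; abs() gives the envelope, angle() the phase."""
    n = len(w)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(w) * h)

def residual(tsa, n_teeth, shaft_orders=(1, 2, 3), mesh_harmonics=3):
    """Zero shaft orders and mesh harmonics (and their conjugates)."""
    spec = np.fft.fft(tsa)
    m = len(tsa)
    bins = list(shaft_orders) + [h * n_teeth for h in range(1, mesh_harmonics + 1)]
    for b in bins:
        if 0 < b < m // 2:
            spec[b] = spec[-b] = 0.0
    return np.real(np.fft.ifft(spec))

def energy_operator(tsa):
    """EO = TSA[2:n-1]**2 - TSA[1:n-2] * TSA[3:n] (1-based), element by element."""
    return tsa[1:-1] ** 2 - tsa[:-2] * tsa[2:]

def narrowband(tsa, n_teeth, frac=0.10):
    """Keep only bins within round(frac * n_teeth) of the mesh bin."""
    spec = np.fft.fft(tsa)
    keep = np.zeros(len(tsa), dtype=bool)
    half = int(round(frac * n_teeth))
    for b in range(n_teeth - half, n_teeth + half + 1):
        keep[b] = keep[-b] = True
    return np.real(np.fft.ifft(np.where(keep, spec, 0.0)))

def gear_cis(tsa, n_teeth):
    nb = narrowband(tsa, n_teeth)
    z = analytic_signal(nb)
    am = np.abs(z)
    fm = np.diff(np.unwrap(np.angle(z)))
    res = residual(tsa, n_teeth)
    return {
        "residual_rms": rms(res),
        "residual_kurtosis": kurtosis(res),
        "eo_rms": rms(energy_operator(tsa)),
        "nb_kurtosis": kurtosis(nb),
        "am_kurtosis": kurtosis(am),
        "fm_rms": rms(fm - fm.mean()),
    }

# Usage: a 23 tooth mesh tone plus shaft order 1, with one impact per revolution.
theta = np.arange(256) / 256
healthy = np.cos(2 * np.pi * 23 * theta) + 0.3 * np.cos(2 * np.pi * theta)
faulty = healthy.copy()
faulty[100] += 2.0
print(gear_cis(faulty, n_teeth=23))
```

For the 23 tooth example the narrowband keeps bins 21 to 25, matching the text; the impact survives the residual almost untouched, which is why residual kurtosis responds strongly to it.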
Threshold Setting and Component Health: In a physics of failure prognostics method, modeling would estimate the CI generated for some level of damage. When the measured CI exceeds the modeled threshold value, maintenance is performed. In a data driven process, maintenance is performed when a statistically set threshold is exceeded. Thus, the performance of a data driven method is completely determined by the quality of the threshold setting process.
The concept of thresholding was explored in [10], where, for a given single CI, the probability density function (PDF) of the Rician (Rice) distribution was used to set a threshold based on a probability of false alarm (PFA). This contrasts with [11], which explored the relationship between CI threshold and PFA to describe the receiver operating characteristics (ROC) of the CI for a given fault. Additionally, Dempsey used the ROC to evaluate the performance of the CI for a fault type. These methods support a data driven approach for prognostics by formalizing a method for threshold setting.
Estimation of RUL given a threshold is complicated in that there are numerous failure modes for a gear. Further, no single CI has been identified that works with all fault modes. This suggests one of two possible architectures for a prognostics system:
Estimate the RUL for each gear CI used, where the reported RUL is the minimum of the per-CI remaining useful lives, or
Fuse n CIs into a gear health indicator (HI) and calculate the RUL based on the HI.
Computationally, the use of HIs is attractive. Health indicators (HIs) provide the end user with decision making tools on the status of system health. A health indicator integrates several condition indicators into one value that provides the health status of the component to the end user [11]. Highlighted in [12] are a number of advantages of the HI over CIs, such as: controlling the false alarm rate, improved detection, and simplification of the user display. Further, [13] describes a threshold setting process for gear health, where the HI is a function of the CI distributions. It gives a generalized process for threshold setting, where the HI is a function of the distribution of CIs, regardless of the correlation between the CIs.
Gear Health as a Function of Distribution: Prior to detailing the mathematical methods used to develop the HI, a nomenclature for component health is needed. To simplify presentation and knowledge creation for a user, a uniform meaning across all components in the monitored machine should be developed. The measured CI statistics (e.g., PDFs) will be unique for each component type (due to different rates, materials, loads, etc.). This means that the critical values (thresholds) will be different for each monitored component. By using the HI paradigm, one can normalize the CIs such that the HI is independent of the component. Using guidance from [14], the HI is designed with two alert levels: warning and alarm. A common nomenclature for the HI can then be developed, such that:
The HI ranges from 0 to 1, where the probability of exceeding an HI of 0.5 is the
PFA,
A warning alert is generated when the HI is greater than or equal to 0.75.
Maintenance should be planned by estimating the RUL until the HI is 1.0.
An alarm alert is generated when the HI is greater than or equal to 1.0. Continued
operations could cause collateral damage.
Note that this nomenclature does not define a probability of failure for the component, nor does it imply that the component fails when the HI is 1.0. Rather, it suggests a change in operator behavior to a proactive maintenance policy: perform maintenance prior to the generation of cascading faults. For example, by performing maintenance on a bearing before it sheds extensive material, a costly gearbox replacement can be avoided.
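The alert nomenclature above reduces to a simple mapping from HI to alert level. A minimal sketch, where the function name and the "nominal" label for an HI below 0.75 are illustrative choices rather than terms from the paper:

```python
def hi_alert(hi: float) -> str:
    """Map a health indicator value to the alert levels described above."""
    if hi >= 1.0:
        return "alarm"     # continued operation could cause collateral damage
    if hi >= 0.75:
        return "warning"   # plan maintenance using the RUL until HI reaches 1.0
    return "nominal"       # healthy parts exceed HI = 0.5 only with probability PFA

print([hi_alert(h) for h in (0.4, 0.75, 0.9, 1.2)])
# ['nominal', 'warning', 'warning', 'alarm']
```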
Controlling for the Correlation Between CIs: All CIs have a probability density function (PDF). Any operation on the CIs to form a health indicator (HI) is then a function of distributions [15]. Functions such as:
The maximum of n CIs (the order statistic),
The sum of n CIs, or
The norm of n CIs (energy)
are valid if and only if the distributions (i.e., the CIs) are independent and identically distributed (IID) [15]. For Gaussian distributions, subtracting the mean and dividing by the standard deviation will give identical Z distributions. Ensuring independence is much more difficult.
In general, the correlation between CIs is non-zero. As an example, many of the correlation coefficients between the CIs used in this study were near 1 (see Table 1).
Table 1 Correlation Coefficients for the Six CIs Used in the Study
ρij    CI 1   CI 2   CI 3   CI 4   CI 5   CI 6
CI 1   1      0.84   0.79   0.66   -0.47  0.74
CI 2          1      0.46   0.27   -0.59  0.36
CI 3                 1      0.96   -0.03  0.97
CI 4                        1      0.11   0.98
CI 5                               1      0.05
CI 6                                      1
This correlation between CIs implies that, for a given function of distributions to have a threshold that operationally meets the design PFA, the CIs must be whitened (i.e., de-correlated). In [16], Fukunaga presents a whitening transform built from the eigenvector matrix and the inverse square root of the eigenvalues (a diagonal matrix) of the covariance of the CIs: $A = \Lambda^{-1/2}\Phi^T$, where $\Phi^T$ is the transpose of the eigenvector matrix and $\Lambda$ is the eigenvalue matrix. The transform is not orthonormal: Euclidean distances are not preserved by the transform. While ideal for maximizing the distance (separation) between classes (such as in a Bayesian classifier), the distribution of the original CIs is not preserved. This property of the transform makes it inappropriate for threshold setting.
If the CIs represented a metric such as shaft order acceleration, then one could construct an HI which is the square of the normalized power (i.e., the square root of the acceleration squared). This can be defined as normalized energy, as per [17], where the CIs were whitened and a threshold was established for a given PFA.
A more general whitening solution can be found using the Cholesky decomposition (see [13]). The Cholesky decomposition of a Hermitian, positive definite matrix A results in $A = LL^*$, where L is lower triangular and $L^*$ is its conjugate transpose. Because the covariance of the CIs is symmetric and positive definite (when non-singular), its inverse is also positive definite and Hermitian. It then follows that if $LL^* = \Sigma^{-1}$, then $Y = L^* \mathbf{CI}^T$, where the vector $\mathbf{CI}$ holds the correlated CIs used for the HI calculation and Y holds the n de-correlated CIs with unit variance (a single CI being the trivial case). The Cholesky decomposition, in effect, creates the square root of the inverse covariance. This is analogous to dividing a CI by its standard deviation (the trivial case of one CI). In turn, $Y = L^* \mathbf{CI}^T$ creates the de-correlated (independent, for jointly Gaussian CIs), identically scaled distributions required to calculate the critical values for a function of distributions.
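A minimal NumPy sketch of Cholesky whitening follows. It assumes the CIs are stored as rows of observations (one column per CI), estimates the covariance from healthy data, and takes the lower triangular factor L of the inverse covariance, so that $LL^T = \Sigma^{-1}$. For a row vector of CIs, `ci @ L` equals $(L^T \mathbf{ci}^T)^T$; it is the transpose of L, not L itself, that whitens a column vector. The two-CI synthetic data with correlation 0.84 (the CI 1/CI 2 value in Table 1) and the function name are illustrative assumptions, and the left-shift offset used later in the paper is not reproduced.

```python
import numpy as np

def cholesky_whitener(healthy_cis):
    """Lower Cholesky factor L of the inverse CI covariance (L @ L.T = inv(S))."""
    cov = np.cov(healthy_cis, rowvar=False)
    return np.linalg.cholesky(np.linalg.inv(cov))

rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.84], [0.84, 1.0]])
cis = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=20_000)

L = cholesky_whitener(cis)
y = cis @ L                                  # each row: (L^T ci^T)^T
print(np.round(np.cov(y, rowvar=False), 6))  # identity: uncorrelated, unit variance
```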
As an example of the importance of correlation, consider a simple HI function: $HI = CI_1 + CI_2$, where the CIs are normally distributed with mean 0 and standard deviation 1. The standard deviation of this HI is $\sigma_{HI} = \sqrt{\sigma_{CI_1}^2 + \sigma_{CI_2}^2 + 2\rho_{CI_1,CI_2}\,\sigma_{CI_1}\sigma_{CI_2}}$, where $\rho_{CI_1,CI_2}$ is the correlation between $CI_1$ and $CI_2$. If one assumes $\rho_{CI_1,CI_2}$ is 0.0, then $\sigma_{HI} = 1.414$ (i.e., $\sqrt{2}$). For a PFA of $10^{-6}$, the threshold is then 6.722. Now consider the case in which the observed correlation is close to 1 (e.g., $\rho_{CI_1,CI_2} = 1.0$): the observed $\sigma_{HI}$ is 2. For a threshold of 6.722, the operational PFA is $4 \times 10^{-4}$, about 390 times greater than the design PFA. This illustrates the effect of correlation on threshold setting.
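The arithmetic in this example can be reproduced with the Python standard library. A short sketch, assuming Gaussian CIs with unit variance and a one-sided threshold, as in the text; `hi_sigma` is a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist

def hi_sigma(rho, s1=1.0, s2=1.0):
    """Standard deviation of HI = CI1 + CI2 for CIs with correlation rho."""
    return sqrt(s1**2 + s2**2 + 2 * rho * s1 * s2)

design_pfa = 1e-6
# Threshold designed assuming uncorrelated CIs (sigma_HI = sqrt(2)).
threshold = NormalDist(0.0, hi_sigma(0.0)).inv_cdf(1 - design_pfa)
# False alarm rate actually obtained when the CIs are fully correlated.
operational_pfa = 1 - NormalDist(0.0, hi_sigma(1.0)).cdf(threshold)

print(round(threshold, 3))               # 6.722
print(operational_pfa / design_pfa)      # roughly 390
```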
HI Based on Rayleigh PDFs: The CIs used for this example have Rayleigh-like PDFs (i.e., heavily tailed). Consequently, the HI function was designed using the Rayleigh distribution. The PDF of the Rayleigh distribution uses a single parameter, $\beta$, resulting in the mean $\mu = \beta(\pi/2)^{0.5}$ and variance $\sigma^2 = (2 - \pi/2)\beta^2$. The PDF of the Rayleigh is $f(x) = \frac{x}{\beta^2}\exp\left(-\frac{x^2}{2\beta^2}\right)$. Note that when applying these equations to the whitening process, each whitened CI has $\sigma^2 = 1$, so the value of $\beta$ for each CI is $\beta = \left(\sigma^2/(2 - \pi/2)\right)^{0.5} = 1.5264$. For a more complete analysis, see [17].
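The unit-variance Rayleigh parameter can be checked directly. The sketch below also includes a single-CI threshold from the Rayleigh survival function $P(X > t) = \exp(-t^2/(2\beta^2))$; that helper is an illustrative addition, since the HI in this paper thresholds the fused norm of the CIs rather than individual CIs.

```python
from math import exp, log, pi, sqrt

beta = sqrt(1.0 / (2 - pi / 2))       # Rayleigh scale giving unit variance
mean = beta * sqrt(pi / 2)
variance = (2 - pi / 2) * beta**2

def rayleigh_pdf(x, b=beta):
    return x / b**2 * exp(-x**2 / (2 * b**2))

def rayleigh_threshold(pfa, b=beta):
    """t such that P(X > t) = pfa for a Rayleigh variable with scale b."""
    return b * sqrt(-2 * log(pfa))

print(round(beta, 4), round(variance, 6))   # 1.5264 1.0
```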
A number of HI functions could be used, but experience has shown [13] that the greatest signal to noise is achieved when the HI function is the norm of n CIs. This represents the normalized energy of the CIs. If the CIs are IID, it can be shown that the function defines a Nakagami PDF [17]. The statistics for the Nakagami are: $\mu = n$ and $\omega = 2n/(2 - \pi/2)$.
For this study, data were collected from experiments performed in the Spiral Bevel Gear Test facility at NASA Glenn. A description of the test rig and test procedure is given in [13]. Six CIs were used, so that $\mu = 6$ and $\omega = 27.96$. For a PFA of $10^{-6}$, the threshold is 10.882, and the HI function is calculated as $HI = \frac{0.5}{10.882}\left(\sum_{i=1}^{6} Y_i^2\right)^{1/2}$.
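The Nakagami parameters and the 10.882 threshold can be reproduced without a statistics library. If each whitened CI is Rayleigh with scale β, each $Y_i^2$ is exponential with mean $2\beta^2$, so the sum of n squares is Erlang (gamma with integer shape n) and its survival function has a closed form. The sketch below assumes exactly that IID Rayleigh model and solves for the threshold by bisection; the function names are hypothetical.

```python
from math import exp, factorial, pi, sqrt

def norm_exceedance(t, n, beta2):
    """P(||Y|| > t) for n IID Rayleigh CIs with scale**2 = beta2.

    ||Y||**2 is Erlang(n) with scale 2*beta2, whose survival function is
    exp(-s) * sum_{k<n} s**k / k!  with s = t**2 / (2*beta2).
    """
    s = t**2 / (2 * beta2)
    return exp(-s) * sum(s**k / factorial(k) for k in range(n))

def norm_threshold(pfa, n, beta2, lo=0.0, hi=100.0):
    """Bisection for the t where the exceedance probability equals pfa."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if norm_exceedance(mid, n, beta2) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 6
beta2 = 1 / (2 - pi / 2)                  # Rayleigh scale**2 for unit variance
omega = 2 * n * beta2                     # Nakagami spread parameter
threshold = norm_threshold(1e-6, n, beta2)

def health_indicator(y, thr=threshold):
    """HI = 0.5/thr * ||y||, so healthy parts exceed HI = 0.5 with prob. PFA."""
    return 0.5 / thr * sqrt(sum(v * v for v in y))

print(round(omega, 2), round(threshold, 3))   # 27.96 10.882
```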
The six CIs used for the HI calculation were: Residual RMS, Energy Operator RMS, FM0, NB KT, AM KT and FM RMS. These CIs were chosen because they exhibited good sensitivity to the fault. Residual Kurtosis and Energy Ratio were also good indicators, but were not chosen because:
It has been the researchers’ experience that these CIs become ineffective when used in complex gearboxes, and
As the fault progresses, these CIs lose effectiveness. The residual kurtosis can in fact decrease, while the energy ratio will approach 1.
Covariance and mean values for the six CIs were calculated by sampling healthy data from four gears prior to the fault propagating. This was done by randomly selecting 100 data points from each gear and calculating the covariance and means over the resulting 400 data points. The PDFs of the selected CIs were not Gaussian, but exhibited a high degree of skewness. Because of this, the PDFs were “left shifted” by subtracting an offset such that they exhibited Rayleigh-like distributions. The estimated gear health is plotted in Figure 3, where the damage on the gear at the end of the test is seen in the upper left corner.
Figure 3 Gear Health, Torque, and Image of Gear Damage at HI 1.5
The key issue with a data driven prognostic is the appropriateness of the threshold. When the HI is 1.0, is the damage such that it is appropriate to do maintenance? From the example (Figure 3), it is apparent that an HI of 1 displays damage warranting maintenance. Because it is appropriate to perform maintenance when the HI is 1.0 or greater, one can state that the RUL is the time from the current state until the estimated HI is 1.0.
State Space Models for Prognostics: State-space representation of data provides a versatile and robust way to model systems. Starting with the definition of the states and the basic principles underlying the phenomena under study, one can propagate the states as a data driven stochastic process.
The choice of which type of state space model to use is driven by the nature of the system dynamics and noise sources. If the system has linear dynamics with Gaussian noise, a Kalman filter (KF) is used. If it is a non-linear process with Gaussian noise, a sigma-point Bayesian process (e.g., the unscented Kalman filter, UKF) or an extended Kalman filter (EKF) is appropriate. For non-linear dynamics with non-Gaussian noise, a sequential Monte Carlo method is used, which sequentially estimates the probability distribution using “importance sampling” techniques. This method is generally referred to as particle filtering (PF) [18].
A state space model estimates the state variable on the basis of measurements of the output and input control variables [19]. In general, a system plant can be defined by $\dot{x} = Ax + Bu$ and $y = Cx$, where x is the state variable, $\dot{x}$ is the rate of change of the state variable, u is the input, and y is the output of the system.
An observer is a subsystem used to reconstruct the state of the plant. The model of the observer is the same as that of the plant, except that an additional term is added, which includes the estimation error, to account for inaccuracies in the A and B matrices. This means that any hidden state (such as RUL) can be reconstructed if the plant (e.g., failure propagation) can be modeled successfully.
The observer is defined as $E[\dot{x}] = E[Ax] + Bu + K(y - E[Cx])$, where $E[\dot{x}]$ is the estimated state derivative and $E[Cx]$ is the expectation of the system output. The matrix K is called the Kalman gain matrix (linear, Gaussian case). It is a weighting matrix that maps the difference between the measured output y and the estimated output $E[Cx]$ into a correction of the state estimate. A KF is used to optimally set the Kalman gain matrix.
A KF is a recursive algorithm that optimally filters the measured state based on a priori information such as the measurement noise, the unknown behavior of the state, the relationship between the input and output states (i.e., the plant), and the time between measurements. Computationally, it is attractive because it is a one step, recursive process, and it can be designed with no matrix inversion (for a scalar measurement, the inverse in the gain equation is a scalar division). The filtering process is given as:
Prediction
$X_{t|t-1} = A X_{t-1|t-1}$ (state)
$P_{t|t-1} = A P_{t-1|t-1} A^T + Q$ (covariance)
Gain
$K = P_{t|t-1} C^T \left[C P_{t|t-1} C^T + R\right]^{-1}$
Update
$P_{t|t} = (I - KC) P_{t|t-1}$ (state covariance)
$X_{t|t} = X_{t|t-1} + K\left(Y - C X_{t|t-1}\right)$ (state update)
where:
t|t-1 is the conditional statement (i.e., the value at t given the information at t-1),
X is the state information $(x, dx/dt, d^2x/dt^2)$,
A is the state transition matrix,
Y is the measured data,
K is the Kalman gain,
P is the state covariance matrix,
Q is the process noise model,
C is the measurement matrix, and
R is the measurement variance.
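The prediction, gain and update equations above translate directly into NumPy. The sketch below is a linear KF with the state (x, dx/dt, d²x/dt²) and a constant-acceleration transition matrix. The 1 minute sample interval matches the test described later, but the noise settings Q and R, the synthetic HI trajectory, and the function name are illustrative assumptions, and this is not the EKF the paper ultimately uses.

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One prediction/gain/update cycle of the filter equations above."""
    # Prediction
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Gain (S is 1x1 for a scalar HI, so the inverse is a division)
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Update
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    x_new = x_pred + K @ (y - C @ x_pred)
    return x_new, P_new

dt = 1.0 / 60.0                                  # 1 minute, in hours
A = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])                  # constant-acceleration model
C = np.array([[1.0, 0.0, 0.0]])                  # only the HI itself is measured
Q = np.diag([1e-6, 1e-5, 1e-4])                  # assumed process noise
R = np.array([[0.05**2]])                        # assumed HI measurement variance

# Usage: filter a noisy, accelerating synthetic HI over 3 hours.
rng = np.random.default_rng(2)
t = np.arange(1, 181) * dt
hi_true = 0.2 + 0.1 * t**2
x, P = np.zeros(3), np.eye(3)
for z in hi_true + rng.normal(0.0, 0.05, t.size):
    x, P = kalman_step(x, P, np.array([z]), A, C, Q, R)
print(x, hi_true[-1])    # x = [HI, dHI/dt, d2HI/dt2]
```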
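For nonlinear systems with Gaussian noise (UKF or EKF), the state prediction is a nonlinear function of $X_{t-1|t-1}$ rather than the product $AX_{t-1|t-1}$. In the EKF, A and C are the Jacobians of the state transition and measurement functions, so C is the derivative of the measurement with respect to the state.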
For non-linear, non-Gaussian noise problems, particle filters (PF) are attractive. A PF represents the filtering distribution as a set of weighted particles. The particles are generated using sequential importance resampling (a Monte Carlo technique), where a proposal distribution is used to approximate a posterior distribution by appropriate weighting. In this example, the state update is nonlinear and the measurement noise is Gaussian. As such, an extended Kalman filter was used.
System Dynamics for Estimating the RUL: The state space model can be constructed as a parallel system to the plant (i.e., the system under study). This requires an appropriate model to simulate the system dynamics. In general, failure modes propagating in mechanical systems are difficult to model at a level of fidelity that would generate meaningful results (e.g., health and RUL based on physics of failure). One needs a generalized, data driven process that can model the plant adequately enough to generate an RUL with small error.
Since 1953, a number of fault growth theories have been proposed, such as net area stress theories, the accumulated strain hypothesis, dislocation theories, and others [20]. Through substitution of variables, most of these theories can be generalized by Paris’ Law: $da/dN = D(\Delta K)^n$. Paris’ Law governs the rate of crack growth in a homogeneous material, where:
$da/dN$ is the rate of change of the half crack length,
D is a material constant of the crack growth equation,
$\Delta K$ is the range of strain during a fatigue cycle, and
n is the exponent of the crack growth equation.
The range of strain, $\Delta K$, is given as $\Delta K = 2\sigma(\beta a)^{1/2}$, where:
$\sigma$ is the gross strain,
$\beta$ is a geometric correction factor, and
a is the half crack length.
These variables are specific to a given material and test article. In practice, the variables are unknown. This requires some simplifying assumptions to be made to facilitate analysis. For many materials, the crack growth exponent is 2 (see [20]). The geometric correction factor, $\beta$, is set to 1 (a constant which will be accounted for in the calculation of D), which allows Paris’ Law to be reduced to: $da/dN = D(4\sigma^2 a)$.
Taking the inverse of da/dN gives the number of cycles per unit change in crack length: $dN/da = 1/[D(4\sigma^2 a)]$. Integrating over crack length gives the number of cycles (for near-synchronous systems, the RUL is N divided by the shaft rate, e.g., rpm): $N = \frac{1}{4D\sigma^2}\left[\ln(a_f) - \ln(a_o)\right]$, where the current measured crack length is $a_o$ and the final crack length is $a_f$. Since the crack length is unknown, the current state, HI, will be used as a surrogate for $a_o$, while $a_f$ will be 1.0 (the RUL is the time from the current HI state until the HI is 1.0). N is the RUL times some constant (RPM, for example). The material crack constant, D, can be estimated as $D = \frac{da/dN}{4\sigma^2 a}$. Gross strain cannot generally be measured; thus, an appropriate surrogate value (e.g., torque or yaw misalignment) will be used.
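The reduced Paris’ Law relations can be sketched as plain functions, using the HI as the crack length surrogate and a failure level of 1.0. The rate dHI/dN would come from the state estimator; the numbers in the usage lines (an HI of 0.4, a rate of 0.2 HI per hour, 3000 rpm, a unit strain surrogate) and the function names are hypothetical. When the same strain surrogate is used both to estimate D and to project forward, it cancels and N reduces to $\ln(1/HI)\cdot HI/(dHI/dN)$; the strain only matters when the future load differs from the past load.

```python
import math

def crack_constant(dhi_dn, hi, strain):
    """D = (da/dN) / (4 sigma^2 a), with HI standing in for the crack length a."""
    return dhi_dn / (4 * strain**2 * hi)

def cycles_remaining(hi, D, strain, hi_final=1.0):
    """N = [ln(a_f) - ln(a_o)] / (4 D sigma^2)."""
    return (math.log(hi_final) - math.log(hi)) / (4 * D * strain**2)

def rul_hours(hi, dhi_dn, strain, rpm, future_strain=None, hi_final=1.0):
    """RUL = N / rpm, converted from minutes to hours."""
    D = crack_constant(dhi_dn, hi, strain)
    s = strain if future_strain is None else future_strain
    return cycles_remaining(hi, D, s, hi_final) / rpm / 60.0

rpm = 3000.0
dhi_dn = 0.2 / (rpm * 60.0)          # 0.2 HI per hour expressed per shaft cycle
base = rul_hours(0.4, dhi_dn, strain=1.0, rpm=rpm)
heavier = rul_hours(0.4, dhi_dn, strain=1.0, rpm=rpm, future_strain=2.0)
print(base, heavier)                 # doubling the strain cuts the RUL by 4x
```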
The use of Paris’s law for the calculation of RUL was given by [21] and [22], but lacked
a measure of confidence (e.g. how good was the prognostics). Confidence is an important
requirement for a PHM system [23].
A Prognostic and Confidence in the Prognostic: In practice, a prognostic or PHM capability would be used to schedule maintenance or assist in asset management and logistic support. The asset owner/operator will make decisions which affect the operational availability and future revenues based on the PHM system. They will need an intuitive, simple display that conveys information on current health, RUL, and confidence in the RUL prediction.
Model confidence is essential in any RUL estimate (see [23]). For any RUL calculation, given 1 hour of nominal usage, the RUL should decrease by 1 hour (i.e., dN/dt is approximately -1: one hour of life is consumed for each hour of operation). Further, a measure of model drift or convergence is the second derivative $d^2N/dt^2$: a value close to zero indicates convergence. When these conditions are met, the model used for calculation of the RUL is consistent, which is indicative of a good estimate of the RUL of the component.
One can use visual cues for the confidence of the prognostic based on model convergence. Visual cues, such as color, can indicate the confidence in the RUL:
Low Confidence: Yellow, $|dN/dt + 1| > 3$ and $|d^2N/dt^2| > 0.5$
Medium Confidence: Blue, $|dN/dt + 1| > 2$ and $|d^2N/dt^2| > 0.5$
High Confidence: Green, $|dN/dt + 1| < 2$ and $|d^2N/dt^2| < 0.5$
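The color rules can be expressed as a small classifier over finite-difference derivatives of the RUL history. This sketch measures the deviation from the expected $dN/dt \approx -1$ (one hour of life consumed per hour of operation), uses backward differences over the last three RUL estimates, and, as an assumption of this sketch, assigns Medium to any combination the listed rules leave unassigned. The function names are hypothetical.

```python
def rul_derivatives(rul, dt):
    """Backward differences of the last three RUL estimates taken dt apart."""
    d1 = (rul[-1] - rul[-2]) / dt
    d2 = (rul[-1] - 2 * rul[-2] + rul[-3]) / dt**2
    return d1, d2

def rul_confidence(dn_dt, d2n_dt2):
    """Map RUL convergence to the confidence colors listed above."""
    dev = abs(dn_dt + 1.0)        # distance from the ideal dN/dt = -1
    curv = abs(d2n_dt2)           # near zero when the model has converged
    if dev < 2 and curv < 0.5:
        return "high (green)"
    if dev > 3 and curv > 0.5:
        return "low (yellow)"
    # dev > 2 with curv > 0.5 is Medium in the list above; other, unlisted
    # combinations are also reported as Medium here (an assumption).
    return "medium (blue)"

# RUL falling by 0.1 hour every 0.1 hour of operation: ideal behavior.
print(rul_confidence(*rul_derivatives([3.0, 2.9, 2.8], dt=0.1)))   # high (green)
```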
A key requirement of the prognostic model is the ability to predict what the health of the component will be at some time in the future. For a given state space model, the RUL or any predicted health is an expectation based on the current state and future usage (e.g., damage or strain). Paris’ Law is driven by delta strain: changes in strain will affect the RUL. Future health is then based on the mean strain and a bound on that strain, which gives a range on the RUL (one benefit of a PF model is that it gives the distribution of the RUL directly). This strain information could be based on forecast weather or usage for a wind turbine, or on the type of mission for a helicopter. The health at any time in the future is then $a_f = \exp\left(N D\, 4\sigma^2 + \ln(a_o)\right)$.
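Projecting health forward and bounding the RUL with a strain range can be sketched as follows. The crack constant, current HI, mean strain and ±10% strain bound are hypothetical values; the functions simply evaluate the closed-form expressions above.

```python
import math

def future_health(hi_now, D, strain, n_cycles):
    """a_f = exp(N * D * 4 * sigma**2 + ln(a_o))."""
    return math.exp(n_cycles * D * 4 * strain**2 + math.log(hi_now))

def rul_cycles(hi_now, D, strain, hi_final=1.0):
    """Cycles until the HI reaches hi_final (the same relation inverted)."""
    return (math.log(hi_final) - math.log(hi_now)) / (4 * D * strain**2)

hi_now, D = 0.4, 1e-6                 # hypothetical current HI and crack constant
mean_strain, bound = 1.0, 0.1         # hypothetical mean strain and +/- bound
rul_short, rul_mean, rul_long = (
    rul_cycles(hi_now, D, s)
    for s in (mean_strain + bound, mean_strain, mean_strain - bound)
)
print(rul_short, rul_mean, rul_long)  # higher strain gives a shorter RUL
print(future_health(hi_now, D, mean_strain, rul_mean))   # back to 1.0
```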
Test Article and a Prognostics Example: Data used for this example were provided by the Spiral Bevel Gear Test facility at NASA. A description of the test rig and test procedure is given in [13], [24]. The tests consisted of running the gears under load in a “back to back” configuration, with acquisitions made at 1 minute intervals, generating time synchronous averages (TSA) on the gear shaft (36 teeth). The pinion, on which the damage occurred, has 12 teeth. This is highly accelerated life testing and, as such, the RUL estimates are compressed. The calculated HI values (see Figure 3) were used to sequentially update the state estimator. At each update, the HI, dHI/dt, RUL, dRUL/dt and $d^2RUL/dt^2$ were calculated. The confidence of the RUL was then evaluated. The fault starts to propagate at approximately 25 hours into the test. Figure 4 displays the HI state at 26.85 hours. The state estimate of health has increased from a nominal value of 0.2 to 0.4, with an RUL of 2.5 hours.
Figure 4 Initial Low Confidence Prognostic
The confidence is low: note that the prognostic is lagging the actual RUL by
approximately 0.5 hours. However, the actual RUL is still within the estimated
confidence bound of the RUL. As the fault continues to propagate (Figure 5), the confidence
in the prognostic improves and the estimated RUL agrees with the actual RUL.
Figure 5 High Confidence Prognostic with Small Error Bounds
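The per-update computation of HI, $dHI/dt$ and RUL described for this test can be sketched as follows. The paper's state estimator is not reproduced here; as a stand-in, the hypothetical `LogHealthTracker` is a two-state (level and rate) Kalman filter on $\ln(HI)$, motivated by the exponential Paris’ law form, under which $\ln(HI)$ grows linearly in time for constant load. The noise levels `q` and `r`, the HI threshold of 1.0 and the synthetic HI trace are assumptions for illustration.

```python
import math


class LogHealthTracker:
    """Hypothetical constant-rate Kalman filter on y = ln(HI).

    State x = [ln(HI), d ln(HI)/dt]; the measurement is ln(HI) itself (H = [1, 0]).
    """

    def __init__(self, hi0: float, q: float = 1e-4, r: float = 1e-2):
        self.x = [math.log(hi0), 0.0]      # [level, rate]; rate starts unknown (0)
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # broad initial state covariance
        self.q = q                         # process noise per step (assumed)
        self.r = r                         # measurement noise on ln(HI) (assumed)

    def update(self, hi: float, dt: float) -> None:
        """Predict over dt (hours), then correct with a new HI measurement."""
        level, rate = self.x
        level += rate * dt                 # x <- F x, with F = [[1, dt], [0, 1]]
        (p00, p01), (p10, p11) = self.P
        m00 = p00 + dt * (p01 + p10) + dt * dt * p11 + self.q  # F P F^T + Q
        m01 = p01 + dt * p11
        m10 = p10 + dt * p11
        m11 = p11 + self.q
        s = m00 + self.r                   # innovation variance
        k0, k1 = m00 / s, m10 / s          # Kalman gain K = M H^T / s
        innov = math.log(hi) - level
        self.x = [level + k0 * innov, rate + k1 * innov]
        self.P = [[(1 - k0) * m00, (1 - k0) * m01],   # P = (I - K H) M
                  [m10 - k1 * m00, m11 - k1 * m01]]

    def health(self) -> float:
        return math.exp(self.x[0])

    def dhi_dt(self) -> float:
        return self.health() * self.x[1]   # chain rule: dHI/dt = HI * d ln(HI)/dt

    def rul(self, threshold: float = 1.0) -> float:
        """Hours until HI reaches threshold at the current rate (inf if not growing)."""
        level, rate = self.x
        if rate <= 0.0:
            return math.inf
        return max(0.0, (math.log(threshold) - level) / rate)


# Synthetic, noiseless HI growing as 0.2 * exp(0.05 t), sampled every 0.1 h.
tracker = LogHealthTracker(hi0=0.2)
dt = 0.1
for i in range(1, 201):
    tracker.update(0.2 * math.exp(0.05 * i * dt), dt)
print(tracker.health(), tracker.dhi_dt(), tracker.rul(threshold=1.0))
```

Recording `rul()` after every update gives the RUL trajectory whose first and second time derivatives the confidence rules examine; a particle filter could replace the Kalman filter when the RUL distribution itself is needed.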
In practice, the RUL is anticipated to span thousands of hours for equipment such as wind
turbines and hundreds of hours for devices such as helicopter transmissions (see [21], where
prognostics of 100 to 150 hours of flight time were observed).
Conclusion: Data-driven prognostics requires four conditions: the ability to extract a
feature related to damage, a process to set thresholds, a fault model to propagate the
current state to the desired threshold, and a measure of confidence in the prognostic.
Critical to a successful estimation of remaining useful life is an appropriate threshold.
The process described here is based on hypothesis testing and sets a threshold relative to a
probability of false alarm. Refinement of the RUL estimate will require feedback from
depot-level repair services to validate the appropriateness of the threshold.
Physics-of-failure models may ultimately give an absolute level of damage for a given CI
value. That said, the cost of model development and validation may be high. The advantage
of a data-driven approach is the generality of the model and the ability to set thresholds
using nominal components. This leads to a relatively low application cost and faster
deployment of systems.
References:
[1] ISO 10816-3:2009 (2009). Mechanical vibration – Evaluation of Machine
Vibration by Measurement on Non-rotating Parts.
[2] McFadden, P. (1987). A revised model for the extraction of periodic waveforms
by time domain averaging. Mechanical Systems and Signal Processing 1 (1), 83-95
[3] Combet, F., Gelman, L. (2007). An automated methodology for performing time
synchronous averaging of a gearbox signal without speed sensor. Mechanical Systems and
Signal Processing, 21 (6), 2590-2606.
[4] Randall, R. B. (2011). Vibration-based Condition Monitoring. West Sussex,
United Kingdom: John Wiley & Sons.
[5] Bechhoefer, E., Kingsley, M. (2009). A Review of Time Synchronous Average
Algorithms. Annual Conference of the Prognostics and Health Management Society
[6] ISO 10825. (2007) Gears -- Wear and damage to gear teeth -- Terminology
[7] Vecer, P., Kreidl, M., Smid, R. (2005). Condition Indicators for Gearbox
Condition Monitoring Systems. Acta Polytechnica. 45 (6).
[8] McFadden, P., Smith, J., (1985), A Signal Processing Technique for detecting
local defects in a gear from a signal average of the vibration. Proc Instn Mech Engrs, 199
(4)
[9] Zakrajsek, J. Townsend, D., Decker, H. (1993). An Analysis of Gear Fault
Detection Method as Applied to Pitting Fatigue Failure Damage. NASA Technical
Memorandum 105950.
[10] Byington, C., Safa-Bakhsh, R., Watson, M., Kalgren, P. (2003). Metrics
Evaluation and Tool Development for Health and Usage Monitoring System Technology.
HUMS 2003 Conference, DSTO-GD-0348.
[11] Dempsey, P., Keller, J. (2008). Signal Detection Theory Applied to Helicopter
Transmissions Diagnostics Thresholds. NASA Technical Memorandum 2008-215262.
[12] Bechhoefer, E., Duke, A., Mayhew, E. (2007). A Case for Health Indicators vs.
Condition Indicators in Mechanical Diagnostics. American Helicopter Society Forum 63,
Virginia Beach.
[13] Bechhoefer, E., He, D., Dempsey, P. (2011). Gear Threshold Setting Based On a
Probability of False Alarm. Annual Conference of the Prognostics and Health
Management Society.
[14] GL Renewables, (2007), Guidelines for the Certification of Condition Monitoring
Systems for Wind Turbines. http://www.gl-
group.com/en/certification/renewables/CertificationGuidelines.php
[15] Wackerly, D., Mendenhall, W., Scheaffer, R. (1996). Mathematical Statistics with
Applications. Duxbury Press, Belmont.
[16] Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic
Press, London, page 75.
[17] Bechhoefer, E., Bernhard, A. (2007). A Generalized Process for Optimal
Threshold Setting in HUMS. IEEE Aerospace Conference, Big Sky.
[18] Candy, J. (2009). Bayesian Signal Processing: Classical, Modern, and Particle
Filtering Methods, John Wiley & Sons, Hoboken.
[19] Brogan, W. (1991). Modern Control Theory. Prentice Hall, Upper Saddle River, NJ.
[20] Frost, N., Marsh, K., Pook, L. (1999). Metal Fatigue. Dover Publications,
Mineola, NY, pages 228-244.
[21] Bechhoefer, E., Bernhard, A., He, D. (2008). Use of Paris Law for Prediction of
Component Remaining Life. IEEE Aerospace Conference, Big Sky.
[22] Orchard, M., Vachtsevanos, G. (2007). A Particle Filtering Approach for On-Line
Failure Prognosis in a Planetary Carrier Plate. International Journal of Fuzzy Logic
and Intelligent Systems, 7 (4), 221-227.
[23] Vachtsevanos, G., Lewis, F. L., Roemer, M., Hess, A., Wu, B. (2006).
Intelligent Fault Diagnosis and Prognosis for Engineering Systems, 1st ed. Hoboken,
New Jersey: John Wiley & Sons, Inc.
[24] Dempsey, P., Afjeh, A., (2002). Integrating Oil Debris and Vibration Gear
Damage Detection Technologies Using Fuzzy Logic, NASA Technical Memorandum
2002-211126