J Intell Manuf (2012) 23:303–311
DOI 10.1007/s10845-009-0352-0

Minor maintenance actions and their impact on diagnostic and prognostic CBM models

Neil Montgomery · Dragan Banjevic · Andrew K. S. Jardine

Received: 15 September 2009 / Accepted: 19 October 2009 / Published online: 19 November 2009
© Springer Science+Business Media, LLC 2009

Abstract Minor maintenance actions can affect condition-monitoring measurements, which may in turn affect the accuracy of diagnostic and prognostic techniques used in condition-based maintenance (CBM). Outputs of a CBM model include the calculation of optimal maintenance decisions, conditional reliability, and the calculation of remaining useful life, among other measures. It is necessary to have a model for the manner in which the condition monitoring data changes over time to produce these output measures; many models have been developed to do so. It is also common to record minor maintenance actions carried out on critical assets, with lubricant changes being one of the most common, but it is unusual for models to consider the impact of such maintenance actions that affect the condition monitoring data. In this paper we discuss the impact of minor maintenance on CBM models. A dataset on a collection of gearboxes, consisting of reliability and oil analysis information, including data on oil changes and oil additions, is used to illustrate the benefit of modelling minor maintenance actions.

Keywords Maintenance · Decision support systems · Data-driven maintenance model · Condition-based maintenance · Oil analysis · Maintenance models · Remaining useful life · Minor maintenance

N. Montgomery (B) · D. Banjevic · A. K. S. Jardine
Center for Maintenance Optimization and Reliability Engineering, University of Toronto, Toronto, Canada
e-mail: [email protected]

D. Banjevic
e-mail: [email protected]

A. K. S. Jardine
e-mail: [email protected]

Introduction

This paper concerns itself with practical mathematical models for maintenance policies that are determined primarily by the data that are collected concerning an asset or a class of assets. There are three interrelated types of data that we will consider. First, there are the maintenance records, which will include installation times, failure times and reasons, and records of various maintenance actions. These maintenance actions will include anything from a complete renewal of the asset, to the replacement or repair of a component, all the way down to minor maintenance actions such as an oil change or oil addition in the case of an asset whose health depends in part on lubrication. Second, for some assets there will be condition-monitoring data used to monitor asset health. Third, for some assets there will be data available on the financial and environmental impacts of failures and maintenance actions. Together, these three types of data can be used to select and implement the appropriate maintenance strategy.

The main contribution of this paper will be to show the extent to which the minor maintenance actions can have an impact on the accuracy of data-driven mathematical models for maintenance that are becoming widely popular. First, we will provide a brief review of very well-known basic material about maintenance policies for the purpose of highlighting which aspects can be data-driven, and to what extent. We will then provide a description of a well-established data-driven CBM methodology that can account for minor maintenance actions. We will illustrate the benefits of considering minor maintenance actions in an example using real data collected on a collection of gearboxes. We extend our previous results in this area by introducing a method for incorporating data collected on the amount of oil periodically added to the gearbox.


Overview of maintenance strategies

Assets can be classified according to the type of maintenance strategy that should be employed. If an asset’s functional failures result in negligible downtime and negligible excess financial or environmental impact, it makes sense to adopt a reactive or breakdown maintenance policy for that asset. This type of maintenance policy is data-driven only to the extent that the asset owner may have recorded the cost, environmental, or availability impacts of past failures, and as such is able to show that predictive maintenance would not be useful. If, however, an asset exhibits an increasing failure rate as it ages, and the avoidance of asset failure results in substantial savings in maintenance costs, reduction in environmental impact, or increase in asset availability, then it is more suitable to use a predictive maintenance strategy.

Time-based maintenance

The less sophisticated type of predictive maintenance is time-based maintenance, in which a periodic maintenance interval is specified, with the goal of maximizing the useful life of the asset while minimizing the number of failures that occur. There is an obvious trade-off between maximizing useful life and minimizing failure, so asset owners will naturally seek out the best solution for their situation. This optimal policy must therefore depend on the probability distribution of failure times of the asset class, the choice of optimization criteria, and any extra data required (which will usually be independent of the distribution of failure times) that are specific to the chosen optimization criteria. For example, if an asset class’s failure times have a known Weibull distribution, the optimization criterion chosen is to minimize average cost per unit time, and the total cost due to failure and the total cost incurred from performing preventive maintenance are available, then it is a relatively easy calculation to determine the optimal time-based maintenance interval using the formulae in Jardine and Tsang (2006).
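The Weibull example above can be made concrete with a small numerical sketch. It assumes the standard age-replacement cost rate, (preventive cost × R(t) + failure cost × (1 − R(t))) divided by the expected cycle length, which may differ in detail from the formulae in Jardine and Tsang (2006). The shape and scale parameters, both costs, and the candidate grid are hypothetical values chosen only for illustration.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability that a Weibull(beta, eta) asset survives past age t."""
    return math.exp(-((t / eta) ** beta))

def cost_rate(t_p, beta, eta, cost_preventive, cost_failure, steps=500):
    """Expected cost per unit time when replacing preventively at age t_p."""
    # Expected cycle length = integral of R(u) from 0 to t_p (trapezoid rule).
    h = t_p / steps
    area = 0.0
    prev = 1.0
    for k in range(1, steps + 1):
        cur = weibull_reliability(k * h, beta, eta)
        area += 0.5 * (prev + cur) * h
        prev = cur
    r = weibull_reliability(t_p, beta, eta)
    expected_cost = cost_preventive * r + cost_failure * (1.0 - r)
    return expected_cost / area

def optimal_interval(beta, eta, cost_preventive, cost_failure, candidates):
    """Grid search for the candidate interval with the lowest cost rate."""
    return min(candidates,
               key=lambda t: cost_rate(t, beta, eta, cost_preventive, cost_failure))

# Hypothetical inputs: beta = 2.5, eta = 1000 h, failure costs 5x a preventive action.
best_interval = optimal_interval(2.5, 1000.0, 1.0, 5.0, range(50, 3001, 10))
print(best_interval, cost_rate(best_interval, 2.5, 1000.0, 1.0, 5.0))
```

A grid search is used here instead of solving the first-order condition so that the trade-off stays visible: short intervals waste useful life, long intervals incur more failures.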

Condition-based maintenance

The more sophisticated type of predictive maintenance is condition-based maintenance (CBM), in which condition-monitoring (CM) data are used to assess the health of the asset and, ideally, prompt maintenance actions that will prevent failures while maximizing useful life. If there is a condition monitoring technique that can accurately assess asset health, then condition-based maintenance holds the promise of more accurate predictive maintenance, since each individual asset within a class will have its own on-condition maintenance recommendation. The potential value of on-condition maintenance has spawned a vast literature in CM technologies, data processing techniques, and methods to produce maintenance decisions from CM data. A recent review of this literature with a comprehensive list of references is contained in Jardine et al. (2006).

Fig. 1 Sample plot of iron measurements versus time

Shortcomings of simplest CBM implementations

Implementations of CBM vary widely in their complexity, with the simplest data-driven technique being a simple plot of a CM variable against time. A simple illustration appears in Fig. 1, which will be used to illuminate some of the points we wish to make in this paper. Figure 1 consists of artificial data from a series of oil analyses carried out on some asset requiring lubrication.

The user of this plot, who could be responsible for the maintenance of the asset or who could even be the oil analysis vendor, will likely have pre-defined warning and action limits. We will refer to this CBM methodology as the “control chart technique”. While simple to implement and easy to understand, the limitations of the control chart technique have been widely recognized [resulting in an extensive literature in data processing and decision modelling as reviewed in Jardine et al. (2006), for example]:

1. There are often (if not almost always) many condition monitoring variables. A large number of plots could prove difficult to interpret and it is not obvious from the plots which variables, perhaps in combination, are the key predictors of failure, especially when there is more than one mode of failure to consider. The plot also does not take into account the possible effect of wear due to age over and above the wear information provided by the variables.

2. It is not obvious where the warning and action limits should be set. Manufacturer recommendations may not be optimized with respect to the costs of failure and preventive maintenance particular to that asset’s operating environment.

3. If an optimal maintenance decision is desired, or indeed any calculation related to the probability of failure given the current asset age and health as measured by the CM variables, it is necessary to predict future values of the key variables. It is not clear how this should be done from a simple plot.

4. The simplest version of such a plot does not take into account minor maintenance actions taken on the asset, such as component replacements, oil changes, or oil additions. For example, if in fact there were oil changes or significant oil additions made at 400 and 800 h, the plot would not be an accurate representation of the amount of iron being accumulated in the oil, as is well known.
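The control chart technique and its fourth limitation can be sketched in a few lines. The readings are the artificial iron values listed in Table 1; the warning limit of 25 ppm and the action limit of 33 ppm are hypothetical, since the paper does not specify any limits.

```python
def classify_readings(readings, warning_limit, action_limit):
    """Label each (working age, value) reading against fixed chart limits."""
    labelled = []
    for age, value in readings:
        if value >= action_limit:
            status = "action"
        elif value >= warning_limit:
            status = "warning"
        else:
            status = "normal"
        labelled.append((age, value, status))
    return labelled

# Artificial Fe readings in ppm, as listed in Table 1.
fe_readings = [(100, 4), (200, 15), (300, 17), (400, 27), (500, 17),
               (600, 32), (700, 30), (800, 35), (900, 27), (1000, 34)]

# Hypothetical limits, chosen only for illustration.
chart = classify_readings(fe_readings, warning_limit=25, action_limit=33)
for row in chart:
    print(row)
```

The classifier sees only the raw values. It has no way of knowing that the drop from 27 to 17 ppm after 400 h might reflect an oil change and not an improvement in asset health, which is exactly the gap described in point 4.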

A way to address these shortcomings of such basic plots is to make the CBM implementation more data-driven. The key CM variables which most strongly predict failures can be determined by any number of mathematical and statistical models, which combine CM data with the failure data. Mathematical models for predicting future CM variables have also been developed. We will describe one such set of mathematical and statistical models with a particular focus on point 4 above. We will show how the mathematical modelling can provide better predictions if minor maintenance actions that affect the CM variables are carefully considered.

Review of past work on the impact of minor maintenance

In the particular case of CM using oil records, it is very well known among oil analysts that accurate interpretation of trends in the data requires knowledge of oil changes and oil additions. In Mayer (2007) it is advised that upon receipt of an unexpected oil reading, the asset owner should determine if an oil change or oil addition was performed but not properly recorded. An ideal “body of knowledge” for the oil analyst is enumerated in Fitch (2006), with “lubricant service history” and “repair/failure history” being two out of the 16 recommendations listed by the author. In Reilly (2000) it is proposed that lubrication management and lubrication analysis systems be integrated, with one of the principal concerns being that “… the lubricant analyst is handicapped by not having easy access to data in the lubricant scheduling system. The lubrication schedule within the plant, and the types of additions and changes made to the active lubricant within the machine will have a profound effect on lubricant analysis results.”

Several simple data-driven adjustments to the basic plot illustrated in Fig. 1 above have been suggested in the oil analysis literature. Clearly, one should not attempt to implement any trending technique using oil analysis data points before and after an oil change. This is common knowledge, as suggested in Toms (1996) who notes that “few oil analysts would misinterpret the data shown” in a plot of oil readings versus time that also noted when significant oil additions were made. One very simple suggestion is made in Spano (2008): take each sample at the same oil age. This would indeed control for any variation due to oil changes, but it would not work if the sampling frequency is greater than the oil change frequency and it would not account for smaller oil additions, in addition to being impractical to implement.

A slightly more sophisticated treatment of minor maintenance data appears in Mayer (2005). It gives the common recommendation to use the rate of change of oil readings as opposed to the raw trend, and also to adjust these rates of change by a “normalization factor” of 1 + v/V, where v is the amount of oil added since the last reading and V is the oil capacity of the system. In theory, using the rate of change of CM readings, which in the case of oil analysis would mean using oil reading divided by oil age, would take into account oil changes and would be more sensitive to increased machine wear. However, that sensitivity comes at a cost. In practice, oil analysis results are subject to large amounts of variability (Mayer 2007; Fitch 2006; Reilly 2000), which will only be exacerbated after the rate-of-change transformation, especially at lower oil ages, and will possibly include negative rates of change. The normalization factor is merely a multiplier of the rate of change, so a decrease in a wear metal reading (which is perfectly plausible if a large amount of oil were added since the last sample) would result in a negative rate of change, made even larger in magnitude after being multiplied by the normalization factor. The proposed normalization factor leads to better information only if the rate of change in the data is always positive (or always negative) and has relatively low variation. The normalization factor also provides information about an increase in oil consumption over time, but that is not directly related to the amount of wear particles.
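The objection to the normalization factor can be checked with simple arithmetic. The sketch below assumes the rate of change is taken between two consecutive readings, which is one plausible reading of the recommendation and may not match Mayer (2005) exactly. The readings, the elapsed time, and the oil volumes are hypothetical.

```python
def rate_of_change(previous_reading, reading, hours_between):
    """Change in a CM reading per operating hour between two samples."""
    return (reading - previous_reading) / hours_between

def normalized_rate(previous_reading, reading, hours_between,
                    litres_added, capacity_litres):
    """Rate of change multiplied by the normalization factor 1 + v/V."""
    factor = 1.0 + litres_added / capacity_litres
    return rate_of_change(previous_reading, reading, hours_between) * factor

# Hypothetical case: iron falls from 27 to 17 ppm over 100 h after 8 litres
# of fresh oil were added to a 20 litre system.
raw = rate_of_change(27.0, 17.0, 100.0)
adjusted = normalized_rate(27.0, 17.0, 100.0,
                           litres_added=8.0, capacity_litres=20.0)
print(raw, adjusted)
```

Because the factor is never less than 1, a negative rate caused purely by dilution is pushed further from zero, when a useful correction would move it back toward the underlying wear trend.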

The references in this section are drawn from the professional literature on oil analysis best practices. While they address the need to consider minor maintenance, and suggest simple data-driven techniques to adjust the raw data accordingly, the final result remains a plot of (adjusted) oil readings versus time. None of the other limitations of such plots mentioned in the previous section are addressed.

There is a large number of academic papers published concerning mathematical models that can be used to determine the critical CM variables, produce optimal decisions, and predict the future life of an asset. A recent review of this literature with an extensive list of references is also contained in Jardine et al. (2006). This body of work is largely silent on important practical matters related to the challenges of real industrial data. This shortcoming is discussed in many papers calling for greater cooperation between industry and the academy, with the review in Scarf (1997) being a thorough example. One CBM methodology that does consider the intricacies of industrial data, and does consider minor maintenance actions such as oil changes, is described in depth in Banjevic et al. (2001), and will be the methodology used in this paper to illustrate the impact of minor maintenance.

There is also a vast literature concerning the seemingly related field of imperfect repair, recently reviewed in Lindquist (2006), and we would like to distinguish the results in this paper from that field of work. A recent implementation of this methodology appears in Lugtigheid et al. (2004) using a repair maintenance indicator estimated from the component repair history. In the imperfect repair literature, maintenance actions might be described as restoring a system to some state between bad-as-old and good-as-new, depending on the extent of repair made. While minor maintenance actions, such as oil changes and oil additions, could be thought of as repair actions that restore the system to its bad-as-old state, the field of imperfect repair generally concerns itself with component repairs and replacements. In the context of condition-based maintenance, this usually means that the CM data will be used to monitor the health of specific components, and models for the impact of component repairs on the system itself are considered. In some cases the lubricant is indeed treated as a component, and oil analyses are conducted not only to detect system degradation but also to determine the quality of the oil itself with a view toward optimizing the oil change interval. But in other cases, lubrication changes are done according to a schedule, and lubricant additions are performed simply to restore the maximum amount of oil without specifically considering the health of the oil itself. In this paper it is irrelevant whether or not the lubrication system is considered to be a specifically monitored component of the overall system, as the maintenance model for the overall system will be affected the same way by oil changes whether they are conducted on a schedule or in reaction to a particular oil analysis result.

A data-driven maintenance model that accounts for minor maintenance actions

The Centre for Maintenance Optimization and Reliability Engineering at the University of Toronto has developed a CBM methodology that combines equipment age data, condition monitoring data, and data respecting the relative effects of failure and preventive replacement, in a way that can produce an optimal maintenance decision policy and also the reliability function given the equipment’s age and current CM variable readings. The theory is fully elaborated in Banjevic et al. (2001) and Banjevic and Jardine (2006). A selection of applications of this theory using real industrial data is described in Vlok et al. (2002), Montgomery et al. (2006), and Sundin et al. (2007). We will provide a brief review of this theory emphasizing where minor maintenance actions can have an important effect on the model.

Description of the model

The basic model consists of a continuous-time non-homogeneous discrete Markov process $V(t) = (I(T > t), Z(t))$, where $I(T > t)$ is simply the indicator process taking on value 1 if the system is alive at time $t$ and 0 otherwise, and where $Z(t)$ represents the condition monitoring indicators, which are in general time-dependent and in the form of a vector. In our model $Z(t)$ takes on a finite number of states, $0, 1, 2, \ldots, m$. We will call $Z(t)$ the covariate process. In practice, many components of the vector of condition monitoring indicators would be real valued; such components are discretized into a finite number of intervals. The mathematical analysis of a Markov process begins with its transition probabilities $L_{ij}(x, t)$, which we will expand into the product of two factors of particular interest as follows:

$$
\begin{aligned}
L_{ij}(x, t) &= P(T > t,\ Z(t) = j \mid T > x,\ Z(x) = i) \\
&= P(T > t \mid T > x,\ Z(x) = i)\, P(Z(t) = j \mid T > t,\ Z(x) = i).
\end{aligned}
\tag{1}
$$

The first factor is the conditional probability of surviving past time $t$ given that the equipment was operating at time $x$ and the covariate process was in state $i$, and represents the influence of the covariate process on the failure time. The second probability is the conditional probability of the covariate process moving from state $i$ at time $x$ to state $j$ at time $t$, given survival past time $t$, and represents the evolution of the covariate process in time.

The factor $P(T > t \mid T > x, Z(x) = i)$ can be estimated from data in many ways. Our methodology estimates this probability over a short interval using the hazard function as follows:

$$
\begin{aligned}
P(T > x + \Delta x \mid T > x,\ Z(x) = i) &= 1 - P(x < T \le x + \Delta x \mid T > x,\ Z(x) = i) \\
&\approx 1 - h(x, Z(x) = i)\,\Delta x,
\end{aligned}
\tag{2}
$$

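which holds since $h(x, Z(x) = i) = \lim_{\Delta x \downarrow 0} P(x < T \le x + \Delta x \mid T > x, Z(x) = i)/\Delta x$ by definition. The hazard function is then modelled using a proportional hazards model (PHM) with a Weibull baseline hazard function as follows:

$$
h(t, Z(t); \beta, \eta, \vec{\gamma}) = \frac{\beta}{\eta} \left( \frac{t}{\eta} \right)^{\beta - 1} \exp\left( \vec{\gamma} \cdot Z(t) \right). \tag{3}
$$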

The parameters $(\beta, \eta, \vec{\gamma})$ of this hazard function are estimated from the data using the method of maximum likelihood as described fully in Banjevic et al. (2001).
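As a minimal sketch of Eqs. (2) and (3), the functions below evaluate the Weibull proportional hazards model and the resulting short-interval survival probability. The parameter values and the single iron covariate are hypothetical and are not estimates from the gearbox data; in the methodology itself the parameters come from maximum likelihood estimation.

```python
import math

def weibull_phm_hazard(t, z, beta, eta, gamma):
    """Eq. (3): Weibull baseline hazard scaled by exp(gamma . z)."""
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    linear_term = sum(g * zi for g, zi in zip(gamma, z))
    return baseline * math.exp(linear_term)

def short_interval_survival(x, dx, z, beta, eta, gamma):
    """Eq. (2): approximate P(T > x + dx | T > x, Z(x) = z) for small dx."""
    return 1.0 - weibull_phm_hazard(x, z, beta, eta, gamma) * dx

# Hypothetical parameters: beta = 2, eta = 1000 h, one covariate (Fe in ppm)
# with coefficient 0.05, evaluated at working age 500 h and 20 ppm of iron.
hazard = weibull_phm_hazard(500.0, [20.0], beta=2.0, eta=1000.0, gamma=[0.05])
survival = short_interval_survival(500.0, 1.0, [20.0], 2.0, 1000.0, [0.05])
print(hazard, survival)
```

Setting every coefficient in `gamma` to zero recovers the plain Weibull hazard, which is a convenient check that age and covariates enter the model as separate factors.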

Consider the second factor in (1), $P(Z(t) = j \mid T > t, Z(x) = i)$, describing the covariate process. It consists of a matrix of probabilities of moving from state $i$ at time $x$ to state $j$ at time $t$. These unknown probabilities are also parameters that can be estimated, first by estimating transition rates using the empirical occurrence-exposure rates, then using these rates to calculate the transition probabilities. See Banjevic et al. (2001) for details.

Model enhances the basic control chart technique

In section “Shortcomings of simplest CBM implementations” four shortcomings of the control chart CBM methodology (simply using plots of CM variables versus working age) were enumerated. This model as we have described it proposes solutions to the first three.

The first point concerns the determination of the most important CM variables and the possible effect of age on the failure hazard. The maximum likelihood method used to estimate the parameters $(\beta, \eta, \vec{\gamma})$ of the hazard function also gives standard errors for the parameter estimates. Standard statistical tests can then be used to determine which components of the vector $\vec{\gamma}$ are significantly different from 0, and if $\beta$ is significantly greater than 1, meaning that over and above the information provided by the CM variables in the model, there remains a significant impact of age on hazard. (Note that the scale parameter $\eta$ has no meaningful interpretation in this model.)

The second point concerns the setting of optimal warning and action limits. The CBM methodology described in this section can calculate the optimal time at which to do preventive maintenance. The asset owner determines the total cost of preventive repair, $C$, and the total cost of repair due to failure, $C + K$. The optimal hazard level $d^*$ that minimizes the long term average cost per unit time can be calculated as the value that minimizes this cost function:

$$
\Phi(d) = \frac{C(1 - Q(d)) + (C + K)\,Q(d)}{W(d)}, \tag{4}
$$

where $Q(d)$ is the probability of equipment failure under a replacement policy that mandates replacement when the hazard is $d$ and $W(d)$ is the expected life of equipment under that same replacement policy. The minimization of this cost function is complex and is described in detail in Banjevic et al. (2001). What is important to note about this minimization procedure is that it depends substantially on $P(Z(t) = j \mid T > t, Z(x) = i)$ from Eq. (1), the model for the evolution of the covariate process in time.
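The structure of Eq. (4) can be illustrated with a toy calculation. Reading the numerator literally, C is incurred in every replacement cycle and K is added only in cycles that end in failure. The table of Q(d) and W(d) values below is entirely hypothetical; in the actual methodology these quantities are computed from the hazard and covariate models, which this sketch does not attempt.

```python
def cost_per_unit_time(C, K, failure_probability, expected_life):
    """Eq. (4): long-run average cost per unit time for one hazard level d."""
    cycle_cost = C * (1.0 - failure_probability) + (C + K) * failure_probability
    return cycle_cost / expected_life

def best_hazard_level(C, K, policy_table):
    """Pick the hazard level d whose (Q(d), W(d)) pair minimizes Eq. (4)."""
    return min(policy_table,
               key=lambda d: cost_per_unit_time(C, K, *policy_table[d]))

# Hypothetical lookup: hazard level d -> (Q(d), W(d) in hours).
policy_table = {
    0.001: (0.05, 400.0),
    0.002: (0.15, 600.0),
    0.004: (0.40, 750.0),
    0.008: (0.80, 850.0),
}

d_star = best_hazard_level(C=1.0, K=4.0, policy_table=policy_table)
print(d_star, cost_per_unit_time(1.0, 4.0, *policy_table[d_star]))
```

Raising the hazard level lengthens the expected life but also raises the failure probability, so the minimum sits at an intermediate level in this example.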

The third point in section “Shortcomings of simplest CBM implementations” concerns the requirement to predict the evolution of the covariate process in time, which was required for the minimization of the cost function in the previous paragraph. Our model accounts for this need.

Impact of minor maintenance actions on the described model

The fourth shortcoming of the basic control chart technique concerns the impact of minor maintenance actions. We will describe some known and some new techniques, based on augmenting the dataset, that can address this impact.

Incorporating minor maintenance information into augmented datasets

The data required to estimate the parameters in the model described in section “Description of the model” consist of equipment histories. An equipment history itself consists of the working age at the beginning of the history (assumed to be 0, i.e. the equipment is good-as-new), the CM observations, the working age at the end of the history, and whether the history is right-censored, which could be due to preventive maintenance having renewed the equipment, or simply that the equipment is still in operation. Take the example data from Fig. 1 above in table form, shown in Table 1. Iron is measured in parts per million, abbreviated as ppm.

This history cannot be used in the statistical estimation procedures described in section “Description of the model” for the simple reason that there are missing entries in the table. Reasonable values for the iron readings must be provided. In the case of a wear metal like iron, it is reasonable to enter 0 for the beginning of the history. If the oil reading had concerned a component of an oil additive (calcium is a common example), then a reasonable value to enter would correspond to whatever amount is added to the oil. Also, this history is censored because the equipment is still operating as of the last observation; in this case the last observed value of 34 would of course be repeated in the last line of the table.

In practice, however, CM variables are observed only at nearly regular intervals, so it would be just as likely that the working age at the end of the history would have in fact been some value between 1,000 and 1,100. Also, if the equipment should ever fail, it is likely that we will not know the exact value of the CM variables at failure. It would be unwise to measure them at that time, since CM variables are best collected in as uniform a manner as possible, and measuring them after the unit has failed would be a clear violation of such a common sense policy. In addition, after a failure the oil may be contaminated. So some method is required to provide plausible values for CM variables at critical junctures such as the end of a history. Any sort of trending or prediction method could be employed, but given the typically variable nature of real industrial data there is some risk of imputing an unreasonable value at history ending. In our method (Banjevic et al. 2001) we use a conservative approach, which is to simply repeat the last observed value at the end of the history.


Table 1 Sample history based on Fe data from Fig. 1

| Event | Working age | Fe in ppm |
| --- | --- | --- |
| Beginning of history | 0 | Missing |
| Oil sample | 100 | 4 |
| Oil sample | 200 | 15 |
| Oil sample | 300 | 17 |
| Oil sample | 400 | 27 |
| Oil sample | 500 | 17 |
| Oil sample | 600 | 32 |
| Oil sample | 700 | 30 |
| Oil sample | 800 | 35 |
| Oil sample | 900 | 27 |
| Oil sample | 1,000 | 34 |
| Ending of history (suspension; equipment still in use) | 1,000 | Missing |

If there is a minor maintenance action during the equipment’s history that does not substantially affect the health of the equipment, but which does have a substantial effect on the CM observations, the equipment history will be more accurate if the minor maintenance action is included in the history. Continuing the example from Table 1, suppose there were complete oil changes taken immediately after sampling at 400 and 800 h. Then the history would appear as in Table 2.

In this table imputed values of 0 for iron have been added to correspond with a reasonable value of such a wear metal after an oil change. Default values for other oil analysis readings, such as water content, kinematic viscosity, etc., will also be set based on values expected of new oil. Using a default value of 0 for a wear metal, for example, is not necessarily the best strategy. In De Sousa (2005) it is recommended to take into account residual oil that remains even after the oil is drained and refilled. If there were data available on expected oil readings immediately after oil changes then it would be possible to include imputed values into histories that account for residual oil. In our experience such data is rarely available in practice.

In some cases not only are oil changes performed, but also simple oil additions, for the purpose of ensuring the correct amount of lubricant is in the system in cases where there are lubricant leaks or oil is being consumed in some way. We are finding that some asset owners are beginning to take this level of data collection more seriously. The paper Mayer (2005) attempts to adjust control chart data for oil additions, but as we pointed out in our review of past work, it does not do so correctly.

We can modify the logic employed to impute values at oil change events as demonstrated in Table 2. The default values will be denoted by $Z_d$ and can be used to calculate data-driven values to be entered at oil addition events in the history. Suppose the total oil capacity is $G$ litres, and at time $t$ an oil addition of $g$ litres takes place. Denote the last available observed oil readings prior to time $t$ by $Z(t^-)$. Then the values to be used at oil addition events can simply be the weighted average of the default values of fresh oil ($Z_d$) and the last observed values ($Z(t^-)$), weighted by the amount of fresh oil added:

$$
\frac{Z_d \cdot g + Z(t^-) \cdot (G - g)}{G}. \tag{5}
$$

A complete oil change is simply the case $g = G$. This method is reasonable for measurements related to components of the oil, such as metal values, that are simply being diluted with the addition of fresh oil. It may not be reasonable for inherent measures of the quality of the oil itself, such as kinematic viscosity. For such measures a more careful consideration of the physics involved should be employed to determine a plausible imputed value at an oil addition.
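The weighted average in Eq. (5) translates directly into a short function. The function and argument names are illustrative only, and the 20 litre capacity and 5 litre addition in the example are hypothetical.

```python
def impute_after_oil_addition(last_reading, default_value, litres_added, capacity):
    """Eq. (5): dilute the last observed reading with fresh oil.

    last_reading  -- Z(t-), the last observed value before the addition
    default_value -- Z_d, the value expected of fresh oil
    litres_added  -- g, the amount of fresh oil added
    capacity      -- G, the total oil capacity of the system
    """
    if capacity <= 0 or not 0 <= litres_added <= capacity:
        raise ValueError("require 0 <= litres_added <= capacity and capacity > 0")
    return (default_value * litres_added
            + last_reading * (capacity - litres_added)) / capacity

# Hypothetical case: 27 ppm of iron, then 5 litres added to a 20 litre system.
after_top_up = impute_after_oil_addition(27.0, 0.0, litres_added=5.0, capacity=20.0)

# A complete oil change (g = G) returns the default value for fresh oil.
after_change = impute_after_oil_addition(27.0, 0.0, litres_added=20.0, capacity=20.0)
print(after_top_up, after_change)
```

As the text cautions, this dilution logic suits quantities carried by the oil, such as wear metals, and should not be applied blindly to properties of the oil itself such as viscosity.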

Consideration of minor maintenance actions in the model computations

The impact of minor maintenance on the estimation of the parameters for the hazard model is the most challenging to measure directly. The observed likelihood function used to estimate these parameters will change when the dataset is augmented to account for minor maintenance, but it is not clear that the hazard model is being improved, aside from the common sense notion that more accurate data will lead to a more accurate model.

The transition probability estimation procedure accounts for minor maintenance simply by ignoring an instance of an observed transition between state i and state j if there was a minor maintenance action that occurred between the two observed values. Such transitions do not contribute to the occurrence-exposure rate used to estimate the transition rate between the two states. These adjusted computations more accurately reflect the true evolution of the covariate process.
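The following sketch illustrates one way such an occurrence-exposure computation could be organised. It is not the implementation used in the software: the record layout and the function name are hypothetical, the whole interval between two observations is attributed to the starting state, and we assume that the exposure time of a skipped interval is dropped together with its transition.

```python
from collections import defaultdict


def transition_rates(records):
    """Estimate transition rates as (number of i -> j transitions) / (time spent in i).

    records -- age-ordered list of (working_age, state, maintenance_before), where
               maintenance_before is True if a minor maintenance action occurred
               between the previous record and this one.
    """
    counts = defaultdict(int)      # (i, j) -> observed i -> j transitions
    exposure = defaultdict(float)  # i -> time at risk in state i
    for (t0, s0, _), (t1, s1, maintained) in zip(records, records[1:]):
        if maintained:
            # Interval spans a minor maintenance action: ignore it entirely (assumption).
            continue
        exposure[s0] += t1 - t0
        if s1 != s0:
            counts[(s0, s1)] += 1
    return {pair: n / exposure[pair[0]] for pair, n in counts.items() if exposure[pair[0]] > 0}


# Hypothetical history with two covariate states (0 = low iron, 1 = high iron).
history = [
    (0, 0, False), (100, 0, False), (200, 0, False), (300, 0, False),
    (400, 1, False), (500, 0, True), (600, 1, False),
]
print(transition_rates(history))
```

In this example the apparent drop from state 1 to state 0 at 500 h follows an oil change, so it is not counted as a transition of the covariate process.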


Table 2 Sample history with oil changes included

| Event | Working age | Fe in ppm |
|---|---|---|
| Beginning of history | 0 | 0 (imputed) |
| Oil sample | 100 | 4 |
| Oil sample | 200 | 15 |
| Oil sample | 300 | 17 |
| Oil sample | 400 | 27 |
| Oil change | 400 | 0 (imputed) |
| Oil sample | 500 | 17 |
| Oil sample | 600 | 32 |
| Oil sample | 700 | 30 |
| Oil sample | 800 | 35 |
| Oil change | 800 | 0 (imputed) |
| Oil sample | 900 | 27 |
| Oil sample | 1,000 | 34 |
| Ending of history (suspension, equipment still in use) | 1,000 | 34 (imputed) |

The computation of the optimal maintenance decision and conditional reliability functions should also take into account minor maintenance actions, but only in an average sense. The user of the model needs to supply an average regular maintenance interval. Then, when the model computes predicted values of the covariate process it accounts for the fact that CM variable values will be periodically reset to the default values Zd. Without this adjustment the model will result in predicted histories that are too short.

In the statistical literature there is the notion of influence diagnostics, in which the influence of individual records within a dataset is considered. Some work on influence diagnostics applied to the analysis of reliability data has been done and results are summarized in Reid (2006). In this work, however, only the influence of entire histories is considered, and not the exclusion of individual records within a history. A rigorous analysis of the impact of individual records within a history would be a topic for advanced statistical research.

Gearbox oil analysis case study

To illustrate the methodology we use a CBM dataset consisting of maintenance records from log cards and spectroscopic oil analysis programme (SOAP) analysis records on a fleet of gearboxes. The data compiled thus far has resulted in 105 usable histories, with 72 histories ending in failure, 12 censored histories due to an age-based refurbishment policy, and 21 histories involving gearboxes still in operation when the latest oil records were available. In a full analysis of this dataset, the 72 failures would be classified into failure modes, with each failure mode to be analysed separately. For the purpose of this case study, however, we will consider all failures together.

For these 105 histories there are 1,732 oil analysis records. The oil readings available include iron, copper, silver, aluminium, titanium, chromium, magnesium, silicon, zinc, and lead. There were only 10 oil changes explicitly recorded on the log cards; however, the oil analysis records also contain another very important variable: oil additions. The oil capacity in the gearbox is approximately 29 l. There are 81 records indicating an oil addition of 29 l, which is in fact an oil change. There are some records of oil additions nearing 60 l. These refer to actions designed to flush out the gearbox by, in effect, performing two oil changes. In addition to the oil addition readings that are really indications of oil changes, there are 194 records of oil additions of less than 29 l. Oil additions occur on average every 150 operating hours. An analysis of the oil readings suggests that not all oil additions were recorded in the data. There are some oil records that show a substantial decline in all readings, but no oil additions were recorded. For these cases we have assumed that oil changes did in fact take place and adjusted the data accordingly.

We will consider two versions of this dataset, A and B. Both datasets will have the same number of records: one for each oil analysis, each oil change, and each oil addition, but will differ in the following ways. Dataset A will ignore minor maintenance records by entering the last measured oil reading whenever an oil change or oil addition took place. Dataset B will account for complete oil changes and oil additions by using Eq. (5) together with default values Zd and oil capacity G provided by the gearbox operator.
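To make the distinction between the two versions concrete, the sketch below builds both from a single event list for one oil reading. The event format and function name are hypothetical, and starting the history at the fresh-oil default is our assumption; the actual data preparation may differ.

```python
def build_histories(events, default_value, capacity):
    """Return (dataset_a, dataset_b) as lists of (working_age, reading).

    events -- age-ordered list of ("sample", age, reading) or ("addition", age, litres)
    """
    dataset_a, dataset_b = [], []
    last_observed = default_value  # assumption: the history begins with fresh oil
    for kind, age, amount in events:
        if kind == "sample":
            last_observed = amount
            dataset_a.append((age, amount))
            dataset_b.append((age, amount))
        elif kind == "addition":
            g = min(amount, capacity)  # additions at or above capacity act as oil changes
            diluted = (default_value * g + last_observed * (capacity - g)) / capacity
            dataset_a.append((age, last_observed))  # Dataset A: carry the last reading forward
            dataset_b.append((age, diluted))        # Dataset B: impute with Eq. (5)
        else:
            raise ValueError(f"unknown event type: {kind}")
    return dataset_a, dataset_b


# First part of the Table 2 history (Fe in ppm), with a 29-litre capacity.
events = [
    ("sample", 100, 4.0), ("sample", 200, 15.0), ("sample", 300, 17.0),
    ("sample", 400, 27.0), ("addition", 400, 29.0), ("sample", 500, 17.0),
]
a, b = build_histories(events, default_value=0.0, capacity=29.0)
print(a[4], b[4])
```

Both versions contain the same number of records; they differ only in the value entered at each oil change or oil addition.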

First we determine the key CM variables in the Weibull proportional hazards model, as described in section “Model enhances the basic control chart technique”. For both datasets the same oil analysis variables, silicon and iron, are found to significantly affect the hazard. The impact of age on hazard, given the information already provided by the silicon and iron, is of borderline significance. The parameter estimates (with their standard errors in parentheses) are summarized in Table 3.

Table 3 Weibull PHM parameter estimates (with standard errors)

| Parameter | Dataset A | Dataset B |
|---|---|---|
| Scale η | 2476 (914) | 1912 (791) |
| Shape β | 1.41 (0.21) | 1.38 (0.16) |
| Silicon [Si] | 0.31 (0.11) | 0.33 (0.10) |
| Iron [Fe] | 0.059 (0.019) | 0.049 (0.009) |

We notice that the standard errors are lower for all estimates from Dataset B than from Dataset A, and markedly so for the parameter associated with iron. It is plausible that the more accurate portrayal of iron levels used in Dataset B led to a more precise parameter estimate, although this may not always be the case.
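As an illustration of how the estimates in Table 3 enter the model, the sketch below evaluates a hazard assuming the standard Weibull PHM form h(t, Z) = (β/η)(t/η)^(β−1) exp(γ_Si·Si + γ_Fe·Fe). The function name, the chosen working age, and the covariate values are hypothetical and serve only to show the calculation.

```python
import math


def weibull_phm_hazard(age, covariates, shape, scale, coefficients):
    """Hazard under the standard Weibull PHM form (assumed here for illustration)."""
    baseline = (shape / scale) * (age / scale) ** (shape - 1)
    return baseline * math.exp(sum(c * z for c, z in zip(coefficients, covariates)))


# Dataset B point estimates from Table 3: scale, shape, and the Si and Fe coefficients.
params_b = dict(shape=1.38, scale=1912.0, coefficients=(0.33, 0.049))

# Hypothetical readings (Si, Fe) at a working age of 500 h.
low_iron = weibull_phm_hazard(500.0, (1.0, 10.0), **params_b)
high_iron = weibull_phm_hazard(500.0, (1.0, 20.0), **params_b)
print(high_iron / low_iron)  # equals exp(0.049 * 10), the hazard ratio for +10 units of Fe
```

Under this form, each covariate coefficient acts multiplicatively on the hazard, which is why an inaccurate iron history can distort both the estimate and the resulting decisions.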

Does accounting for minor maintenance improve the model?

Clearly the inclusion of minor maintenance records in a dataset and their correct treatment in computations will make a data-driven CBM model a better reflection of reality. Engineers and asset managers will be more likely to adopt an advanced CBM methodology if it is seen to take seriously the complexities of real industrial data. It is less obvious that the inclusion of minor maintenance records will result in significantly better models and predictions. We propose several different criteria by which the benefit of including maintenance records might be measured.

We will compare the fit of the PHM to the data, using the standard Kolmogorov–Smirnov (K-S) goodness-of-fit statistic. We will also calculate the optimal replacement decisions produced by both datasets and apply them to the data with and without minor maintenance records included, to assess which approach would result in lower average cost per unit time. The optimal decision policy will be calculated using an 8:1 cost ratio for failure replacement versus preventive replacement. Finally, we will use both datasets to predict the remaining life of each failed gearbox at 200 h prior to its true ending, to assess which approach results in more accurate residual life predictions. For Dataset B we will specify an average oil addition interval of 150 h.

The K-S test does not reject the null hypothesis of PHM fit for either dataset. The fit is better for Dataset B with an observed K-S statistic of 0.83 (p-value 0.81), than for Dataset A with an observed K-S statistic of 0.106 (p-value 0.21). Next, when we compared the performance of the optimal maintenance strategy computed using both datasets we found that they both would have prevented 42 out of the 72 failures, but that Dataset B would have preserved 67.4% of average useful life versus 63.1% preserved by Dataset A, resulting in a lower cost per unit time when using Dataset B. This was to be expected, since the computations using Dataset A, which are devoid of the periodic oil reading resets caused by oil changes and additions, will predict a more rapid long-term rise in iron and silicon levels, thus proposing preventive removals too soon. This longer lifetime prediction was also noted when predicting the final 200 h of gearbox life: Dataset A predicted an average remaining life of 189 h (standard error 18 h) and Dataset B predicted an average remaining life of 211 h (standard error 21 h). The average pairwise difference was 22 h with a standard error of 10.1 h, so Dataset B produced significantly longer remaining life estimates (p-value 0.03). These results for the two datasets did not reveal dramatic differences, perhaps owing to the large variability normally present in real SOAP data.
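The reported p-value for the pairwise difference can be checked approximately from the mean difference and its standard error. The sketch below uses a two-sided normal approximation, which is our assumption; the text does not state which test was applied, and a t-test would give a very similar value at this sample size.

```python
import math


def two_sided_p_normal(estimate, standard_error):
    """Two-sided p-value for estimate / standard_error under a normal approximation."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


# Mean pairwise difference of 22 h with a standard error of 10.1 h.
p_value = two_sided_p_normal(22.0, 10.1)
print(round(p_value, 2))  # 0.03
```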

Conclusions

For one well-established general CBM methodology, applicable to a wide variety of situations, we have explained the impact of minor maintenance actions on the model, and have provided evidence that the careful consideration of minor maintenance actions can give results that are plausibly more accurate. It has been advocated that the maintenance records be kept with more care and attention to the details of all maintenance actions, even relatively minor actions such as oil changes and oil additions. We believe our preliminary results should provide motivation for asset owners to continue to collect such detailed and accurate data, and should also convince researchers in condition-based maintenance to consider all the intricacies of real industrial data. Our example uses SOAP data in particular, and introduces the adjustment to this data using Eq. (5). With adequate engineering and physics knowledge of the asset and the condition monitoring techniques used, similar formulas might be developed to more accurately account for minor maintenance actions no matter what the equipment, the condition monitoring technique employed, or the minor maintenance actions undertaken. The greater challenge might in fact be to ensure accurate maintenance and CM records are kept, making such detailed analyses possible.


Acknowledgments We thank Mr. Tim Jefferis of UK Ministry of Defence for his valuable support and insight, and for providing the data. This work was made possible through the research and financial contributions from industry members of the C-MORE consortium, NSERC, and the Ontario Centres of Excellence.

References

Banjevic, D., & Jardine, A. K. S. (2006). Calculation of reliability function and remaining useful life for a Markov failure time process. IMA Journal of Management Mathematics, 17, 115–120.

Banjevic, D., Jardine, A. K. S., Makis, V., & Ennis, M. (2001). A control limit policy and software for condition-based maintenance. Information, 39, 32–50.

Basawa, I. V., & Rao, P. (1980). Statistical inference for stochastic processes. London: Academic Press.

De Sousa, T. (2005). Factoring residual oil into wear rate. Practicing Oil Analysis Magazine.

Fitch, J. (2006). The Importance of provenance. Practicing Oil Analysis Magazine.

Jardine, A. K. S., & Tsang, A. (2006). Maintenance, replacement, and reliability: Theory and applications. London: CRC Press.

Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20, 1483–1510.

Lindquist, B. H. (2006). On the statistical modeling and analysis of repairable systems. Statistical Science, 21, 532–551.

Lugtigheid, D., Banjevic, D., & Jardine, A. K. S. (2004). Modelling repairable system reliability with explanatory variables and maintenance actions. IMA Journal of Management Mathematics, 15, 89–110.

Mayer, A. (2007). What to do when your oil analysis results go south. Practicing Oil Analysis Magazine.

Mayer, A. (2005). Understanding time-dependent limits. Practicing Oil Analysis Magazine.

Montgomery, N., Lindquist, T., Garnero, M.-A., Chevalier, R., & Jardine, A. K. S. (2006). Reliability functions and optimal decisions using condition data for EDF primary pumps. 9th International conference on probabilistic methods applied to power systems (PMAPS), Stockholm, Sweden, June 11–15.

Reid, N. (2006). Influence function in survival analysis. In P. K. Anderson & N. Keiding (Eds.), Survival and event history analysis (pp. 252–254). New York: Wiley.

Reilly, S. (2000). Integrated management of lubrication and lubricant analysis information - Part I - the case for an integrated system. Practicing Oil Analysis Magazine.

Scarf, P. A. (1997). A Framework for condition monitoring and condition based maintenance. Quality Technology & Quantitative Management, 99, 493–506.

Spano, A. (2008). Increasing accuracy in lubrication testing. www.reliabilityweb.com, Accessed Feb 18. http://www.reliabilityweb.com/art05/accurate_lube_test.htm.

Sundin, P., Montgomery, N., & Jardine, A. K. S. (2007). Pulp mill on-site implementation of CBM decision support software. International conference of maintenance societies, Melbourne, Australia, May 22–24.

Toms, L. A. (1996). Adaptive trend analysis: A simple solution to data variability. Proceedings of the technology showcase, integrated monitoring, diagnostics, and failure prevention conference, Mobile Alabama, University of Wales, Swansea (pp. 48–53).

Vlok, P. J., Coetzee, J. L., Banjevic, D., Jardine, A. K. S., & Makis, V. (2002). Optimal component replacement decisions using vibration monitoring and the proportional-hazards model. The Journal of the Operational Research Society, 53, 193–202.
