APPLIED RISK AND RELIABILITY FOR TURBOMACHINERY PART I ...


Charles R. (Charlie) Rutan is Senior Engineering Advisor, Specialty Engineering, with Lyondell Chemical Company, in Alvin, Texas. His expertise is in the field of rotating equipment, hot tapping/plugging, and special problem resolution. He has three patents and has consulted on turbomachinery, hot tapping, and plugging problems all over the world in chemical, petrochemical, power generation, and polymer facilities.

Mr. Rutan received his B.S. degree (Mechanical Engineering, 1973) from Texas Tech University. He is a member of the Advisory Committee of the Turbomachinery Symposium, and has published and/or presented many articles.

ABSTRACT

There are several methods of analysis to define the reliability of the critical rotating equipment at various facilities within a company. Part I of this tutorial is intended to present several of these methods. The following Part II presents the calculations.

INTRODUCTION

In the mid-1980s, the author’s supervisor gave him an “opportunity” to better define the reliability of the critical machinery as turnaround intervals were extended. The task was to develop an equation that would produce a number to be used as a guide, so that plant management would have a feel for the risk of postponing the next olefin unit turnaround or of minimizing the amount of inspection performed on the major rotating equipment, e.g., minor or major overhaul(s). Based on the history, design, vibration, and process conditions of the critical machinery, he developed a weighted number that he tried to use as an indicator to justify the overhaul requirements. At the time, this was not well accepted by plant management, because other facilities in the company had longer or shorter shutdown intervals, and because published reports from other companies in the same commodity chemical industry, along with the “Solomon” report that was published every two years, did not agree with the conclusions he derived from this number. During this period there were two methods:

• Kepner-Tregoe’s® Problem Solving and Decision Making

• Managerial Analytics, a Monsanto Chemical Company event analysis system

In later years, several other methods of potential problem and/or root cause analysis have been developed to aid in quantifying the risk of extending the operational time of the critical turbomachinery: hazard and operability analysis (HAZOP), failure mode and effects analysis (FMEA), failure mode, effects, and criticality analysis (FMECA), event-tree analysis (ETA), Delphi, method organized for a systematic analysis of risks (MOSAR), management oversight and risk tree (MORT), Weibull analysis, and Six Sigma.

KEPNER-TREGOE®

Kepner-Tregoe® Problem Solving and Decision Making (PSDM) is a step-by-step process that helps people resolve business issues. Used in organizations worldwide, PSDM helps individuals, groups, and/or teams efficiently organize and analyze vast amounts of information and take the appropriate action.

This process provides a framework for problem solving and decision making that can be integrated into standard operating procedures. It is used to enhance other operational improvement tools such as Six Sigma and Lean Manufacturing.

PSDM comprises four distinct processes:

• Situation Appraisal is used to separate, clarify, and prioritize concerns. It applies when confusion is mounting, the correct approach is unclear, or priorities overwhelm plans.

• Problem Analysis is used to find the cause of a positive or negative deviation. When people, machinery, systems, or processes are not performing as expected, Problem Analysis points to the relevant information and leads the way to the root cause.

• Decision Analysis is used for making a choice; it clarifies the purpose and balances risks and benefits to arrive at a supported choice.

• Potential Problem/Opportunity Analysis is used to protect and leverage actions or plans. Potential Problem Analysis defines the driving factors and identifies ways to lower risk. When one action is taken, new opportunities, good or bad, may arise. These opportunities must be recognized and acted on to maximize the benefits and minimize the risks.

MANAGERIAL ANALYTICS

Managerial Analytics (MA) is an analytical identification process developed by the Monsanto Company. Managing change and using change to manage requires the use of some combination of the five basic analytical processes.

• Event Analysis is a systematic process to identify events and events of change that impact the reliability of the turbomachinery. Events and events of change from the past and present have an impact on the present and future reliability. Events of change could be the stopping of wash oil injection, the rising of the sodium concentration in the steam, or not repairing the spare rotor.

Step 1. First person responsibility
Step 2. Recognize events and their relationships
Step 3. Establish priority
Step 4. Separate and sequence components


APPLIED RISK AND RELIABILITY FOR TURBOMACHINERY

PART I—RELIABILITY PROCESSES

by Charles R. Rutan

Senior Engineering Advisor, Specialty Engineering

Lyondell Chemical Company

Alvin, Texas


Step 5. Identify intended nominal first person results
Step 6. Identify resolution resources, threats, and opportunities
Step 7. Identify intended nominal first person action
Step 8. Select the analytical processes
Step 9. Statement for the first analysis

• Deviation Analysis is a process to help determine the unknown cause of an observed effect that deviates from a standard effect, in order to decide on action. It is used for both positive and negative deviations, that is, for events of change that have already happened.

Step 1. Statement of the deviation effect
Step 2. Deviation effect specifications
Step 3. Unique characteristics of the “involved” dimensions
Step 4. Change events
Step 5. Possible causes
Step 6. Test possible causes
Step 7. Set priority on possible causes
Step 8. Verification of cause

• Action Analysis is used to select a single course of action from several courses of action. This is based on the projected performance of the action to attain a set of desired effects and assessing the impacts of and to that action if it were taken.

Step 1. Statement of the intended nominal action
Step 2. Set the desired effects
Step 3. Classify the desired effects
Step 4. Weight the “want” effects
Step 5. Generate action alternatives
Step 6. Filter the action alternatives through “must” effects
Step 7. Score action alternatives to the “wanted” effects
Step 8. Impact evaluation
   Step a. Identify an action alternative
   Step b. Identify impacts
   Step c. Identify potential deviations
   Step d. Assess impacts on the environment
   Step e. Summarize potential deviations and their likely causes
   Step f. Plan actions to manage likely causes
   Step g. Plan actions to manage potential deviations
Step 9. Make best balanced selection
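Steps 4 through 7 of Action Analysis amount to a weighted scoring of alternatives that first pass a set of “must” effects. The following is a minimal sketch of that idea in Python, not the Managerial Analytics procedure itself: the class name, function name, weights, scores, and overhaul options are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    meets_musts: dict[str, bool]   # "must" effect -> satisfied?
    want_scores: dict[str, float]  # "want" effect -> score, 0 (poor) to 10 (best)


def rank_alternatives(alternatives, want_weights):
    """Drop alternatives failing any "must", then rank by weighted "want" score."""
    ranked = []
    for alt in alternatives:
        if not all(alt.meets_musts.values()):
            continue  # Step 6: filtered out by a "must" effect
        total = sum(weight * alt.want_scores.get(effect, 0.0)
                    for effect, weight in want_weights.items())  # Step 7
        ranked.append((alt.name, total))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: three overhaul options for one compressor train.
want_weights = {"low cost": 5, "short outage": 8, "long run length": 10}
alternatives = [
    Alternative("major overhaul", {"safe": True},
                {"low cost": 2, "short outage": 3, "long run length": 9}),
    Alternative("minor overhaul", {"safe": True},
                {"low cost": 6, "short outage": 7, "long run length": 6}),
    Alternative("no inspection", {"safe": False},
                {"low cost": 10, "short outage": 10, "long run length": 2}),
]
ranking = rank_alternatives(alternatives, want_weights)
```

With these made-up inputs the option that fails the “must” effect is excluded before scoring, and the remaining options are ordered by their weighted totals. The impact evaluation of Step 8 and the balanced selection of Step 9 remain matters of judgment and are not modeled here.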

• Action Planning helps to decide “What will be done?”, “Who will do it?”, and “When will it be done?” to reach one or more desired future effects that are not expected to occur unless something is done. It involves a set of interrelated and interdependent actions.

Step 1. Define the action required
Step 2. Define person(s) who have the prime responsibility to complete the required action
Step 3. Define support resources
Step 4. Define the date and time to initiate the action
Step 5. Define the projected time of completion of the action

• Potential Deviation Analysis is a process used to examine a planned action or a future event of change for significant impacts and deviations, and to plan additional actions to manage these results.

Section 1. Statement of the intended nominal action
Section 2. Identification of potential deviations
   Step a. Identify a planned action
   Step b. Identify impacts
   Step c. Identify potential deviations
   Step d. Assess impacts to the environment
   Step e. Summarize potential deviations and their likely causes
Section 3. Action planning for potential deviations
   Step a. Plan actions to manage likely cause
   Step b. Plan action to manage potential deviation
   Step c. Revise the action plan

ROOT CAUSE ANALYSIS (RCA)

Define the Problem

Cause and Effect

There are five elements of a cause and effect chart.

1. Primary effect:
   a. A singular effect of consequence that we wish to eliminate or mitigate
   b. The “what” in the problem definition
   c. It is always the most present cause in the analysis and frequently the most significant.
   d. It is the point at which we begin to ask “why.”

2. Action and condition causes:
   a. Both are causes; actions are momentary and conditions exist over time.
   b. Actions can become conditions and conditions can become actions.
   c. Actions and conditions interact to create an effect.

3. Causal connection, “caused by”:
   a. Forces the analysis to go from present to past
   b. Elicits a more specific response
   c. Minimizes storytelling
   d. If a cause “connects,” it adds to the visual dialogue and therefore has value.

4. Evidence is the data that supports a conclusion and is presented as:
   a. Sensed: It is processed through our senses of sight, sound, smell, taste, and touch. Sensed is the highest quality evidence.
   b. Inferred: The ability to infer is derived from our understanding of known and repeated causal relationships. Inference is the next highest quality of evidence.
   c. Evidence is important because:
      i. It supports the reality of any single cause.
      ii. Solutions should only be applied to evidence-based causes.
      iii. It minimizes the influence of politics and power plays.

5. “Stop” or a “?”

Problem Definition

There are four elements of problem definition.

1. What: What is the problem?
2. When: When did it happen?
3. Where: Where did it happen?
4. Significance: Why are we working on this?

Identify Effective Solutions

The root cause is not what we seek; what we seek is effective solutions.

1. Challenge each cause.
2. Offer possible solutions for each cause.

Implement the Best Solutions

1. Prevents recurrences
   a. Prevents or mitigates this problem
   b. Prevents similar problems
   c. Does not create additional problems or unacceptable situations

2. Within your control
   a. Your control may be you, your department, your company, your suppliers, or your customers.
   b. Nature is not within your control.
   c. The facilitator is rarely the problem owner.

3. Meets your goals and objectives
   a. The goals of the overall organization
   b. The goals of your department or group
   c. Your individual goals and objectives
   d. Must provide reasonable value

PROCEEDINGS OF THE THIRTY-FOURTH TURBOMACHINERY SYMPOSIUM • 2005

RELIABILITY ASSESSMENT OF ROTATING EQUIPMENT

Reliability assessment of rotating equipment (RARE) is one tool of an overall maintenance and turnaround strategy of critical rotating equipment.

Origin of Initial Program and Concept

The Hartford Steam Boiler Inspection & Insurance Company (HSB) expected the companies they insured to follow the steam turbine overhaul intervals recommended by the original equipment manufacturer (OEM). Due to world market pressure and technological advancements, companies began extending the intervals between major unit turnarounds. These companies depended on the expertise of their rotating equipment engineers to help define how long they could reliably operate between turnarounds. Their goal was to lengthen the interval between major overhaul outages to coincide with the need for other inspections and testing mandated by government agencies or process needs. During the 1970s and 1980s the chemical, petrochemical, and refining industries would perform a major overhaul of their critical machinery every four to six years as a standard practice. As operating companies extended their turnaround intervals, HSB became concerned about the increased risk their customers were taking. At that time, data showing the effects or risks of stretching run times between overhauls were not available. HSB decided to develop a program that would quantify these risks for their customers. M&M Engineering, a part of the HSB company at that time, was given the assignment to develop a tool to assess the reliability of steam turbines. The tool is called Steam Turbine Reliability Assessment Program (STRAP). HSB did not plan to use this program to structure premiums. The vision was to require a STRAP analysis whenever a company chose to run longer than the OEM recommended practice. If STRAP showed that they were at low risk, then HSB would authorize or accept the company’s plans, but if STRAP showed the company to have a high risk, they were told that they needed to implement some risk reduction procedures or improvements to mitigate some of these risks.

Consortium of Companies

With this vision, M&M Engineering pulled together a group of recognized industry experts in the turbomachinery field. The experts brought with them their individual company’s reliability assessment techniques, industry standards (API, ASME, etc.), recognized industry best practices, new technologies, and, most important, their personal experience.

Starting with a blank sheet of paper, the group spent countless hours defining, developing, categorizing, and weighting questions and responses. The understanding within the group was that, when the program was completed, they would take it back to their respective companies and use it as another tool to justify turbine improvements that would improve reliability and extend run lengths, resulting in significant savings.

The turbine data accumulated, which consisted of original design specifications, history, uprates, upgrades, failures, and site-specific information, were then entered into the program developed by M&M Engineering. The algorithms in the program took the weighted turbine data and generated a risk index number (RIN). The RIN is the number of days that the turbine is projected to be down during the run extension. It is based on statistical information that is calculated from the data contained in the database for the same general type of turbine. The program gives 25 specific items to be addressed and calculates a return on investment (ROI) using site pricing data. Armed with this information, the rotating equipment engineer and managers can make informed decisions to accept the projected risk or to take corrective actions to minimize or mitigate the risks that were defined. As with home and car insurance, risk can never be eliminated, but most risks can be minimized; it then becomes a decision on what level of risk the plant/company is willing to accept. With this information, a long-term shutdown strategy can be developed that is based on the operation, maintenance, and reliability practices of the unit.

STEAM TURBINE RISK ASSESSMENT PROGRAM

1. General Information Data Sheet
   a. Plant specifics?
   b. Size or class of the turbine (five categories)?
   c. Age of the turbine?
   d. Manufacturer of the turbine?
   e. What is the turbine driving?
   f. When was the turbine last dismantled?
   g. Etc.?

2. Turbine Performance (Design and Actual)
   a. Horsepower?
   b. Speed?
   c. Inlet flow?
   d. Temperature?
   e. Pressure?
   f. Etc.?

3. Site and Utility Data
   a. Location?
   b. Steam generation?
   c. Etc.?

4. Construction Features
   a. Type of turbine?
   b. Critical speed?
   c. Control system?
   d. Etc.?

5. Spares
   a. Complete turbines?
   b. Bearing sets?
   c. Complete rotor sets?
   d. Stationary diaphragms?
   e. Case?
   f. Nozzle blocks?
   g. Where are they stored?
   h. Couplings?
   i. Labyrinth seal sets?
   j. Control valve parts/governor valve assemblies?
   k. How many days are required to prepare the rotor for installation?
   l. Is there a periodic inspection for signs of corrosion on the spare parts?

6. Maintenance and Repairs
   a. How many hours to the repair shop?
   b. How often do you drain water from the oil reservoir?
   c. Do you drain it at startup?
   d. Turbine overhauls?
   e. Are there documented detailed overhaul procedures for this/similar turbines?
   f. How is the lube oil system cleaned during overhauls?
   g. Qualifications?
   h. Foundation inspection?
   i. Alignment?
   j. Inspections?
   k. Oil systems?

7. Turbine Operation
   a. Procedures?
      i. Do the plant’s written steam turbine operating procedures include:
         (1) Prestartup checklist?
         (2) Overspeed tests?
         (3) Putting turbine on slow roll?
         (4) Normal operations?
      ii. Do you have a procedure for lining up sealing steam?
      iii. Is there a formal management of change?
   b. Tests?
      i. When is the governor or valve rack exercised online?
      ii. When do you test the trip and throttle valve?
      iii. When do you test the nonreturn valve?
   c. Qualifications?
      i. Do operators have the authority to shut down the turbine?

8. Monitoring and Protection
   a. Which of the following parameters are monitored and trended?
      i. Turbine steam inlet temperature?
      ii. Bearing metal temperatures?
      iii. Thrust?
      iv. Vibration?
      v. Lube oil pressure?
   b. Do the following parameters trigger an operating-specific alarm?
      i. Turbine steam inlet temperatures?
      ii. Bearing metal temperatures?
      iii. Thrust?
      iv. Vibration (radial)?
   c. Do the following parameters trigger an alarm followed by a trip?
      i. Vibration (radial)?
      ii. Thrust?

9. Upgrades: Have the following components been upgraded?
   a. Rotor assembly?
   b. Bearings?
   c. Casing?
   d. Coupling?
   e. Seals?
   f. Trip and throttle valve?
   g. Miscellaneous?

10. Steam System
   a. What supplies steam to the turbine?
   b. What type of make up water does it use?
   c. What type of condensate polishing does this unit use?
   d. Is steam purity monitored?
   e. Etc.?

11. Past Failures/Problems
   a. Turbine internal components?
   b. Bearings?
   c. Casing?
   d. Coupling?
   e. Seals?
   f. Trip and throttle valve and its components?
   g. Governor?
   h. Fouling?

12. Consequence Data
   a. Plant production in dollars per day (minimum/maximum)?
   b. What is the cost if the turbine goes down?
   c. What other costs are there if the turbine shuts down?
   d. How does the plant typically handle a rub during the startup of this turbine?
   e. How would the plant typically handle fouling of this turbine during normal operation?

The total number of possible questions is about 3000, but for a large steam turbine 350 to 400 questions are normally answered.

Question Weighting

The team of experts then weighted each of the 3000+ questions and their respective answers based on their experience and knowledge. As part of the development of failure probabilities, the team needed to set a baseline interval for overhaul outages.

They decided to use a six-year dismantle overhaul interval for Class 1 to 4 turbines and a five-year interval for Class 5 turbines. Thus, the baseline probabilities developed were based on the turbine operating for six years or five years without opening the case. Since risk is calculated by multiplying the probability by the consequence, the calculated risk applies to that six-year or five-year interval and is expressed in days of lost production. Because some STRAP users do not divulge financial data about production revenue, converting the days of lost production to dollars cannot be performed for all turbines. It was therefore decided to create the term “risk index number,” which allows all turbines to be compared to each other. The RIN is the risk of failure in days of lost production over a six-year interval for Class 1 to 4 turbines and over a five-year interval for Class 5 turbines.
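The relation between probability, consequence, and risk in days of lost production can be sketched as follows. This is only an illustration of the arithmetic: the STRAP algorithms are not given in this paper, so the summation over components, the function names, and every number below are assumptions.

```python
def risk_days(failure_probability, consequence_days):
    """Risk in days of lost production = probability x consequence."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return failure_probability * consequence_days


def risk_index_number(components):
    """Sum component risks over the baseline interval (assumed aggregation)."""
    return sum(risk_days(p, days) for p, days in components.values())


# Hypothetical components: (probability of failure over the baseline
# interval, outage length in days if the failure occurs).
components = {
    "rotor": (0.05, 30.0),
    "bearings": (0.10, 5.0),
    "trip and throttle valve": (0.02, 10.0),
}
rin = risk_index_number(components)   # days of lost production
dollars_at_risk = rin * 250_000.0     # hypothetical production value, $/day
```

Expressing the result in days keeps turbines comparable even when production revenue is not disclosed; the last line shows the optional conversion to dollars when a site does supply a daily production value.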

On the basis of the input data and the likelihood-consequence information, a risk for operation of the turbine may be calculated as a function of time between the dismantle inspections. In each case, a quantified list of recommendations to mitigate the risk will also be reported based on the greatest contribution to the risk. Inspection outage plans then may be tailored to optimize the time between overhauls on the basis of an acceptable level of risk. The risk index number is a number generated by the program based on:

• The questions and their corresponding answers.
• Industry standards (ASME, API, etc.).
• Accepted industry practices.
• Latest technology.
• The turbine’s standing relative to other turbines in the company and/or the number of turbines of the same design in the database.

Probability

The probability of failure of a component is the risk of failure in days of lost production (RIN) divided by the consequence of failure of that component. Since risk is the product of probability times consequence, some questions will significantly affect the RIN if answered with a poor option or not answered at all. For example, answering the question on “testing of overspeed trip” with “never tested” would have a significant impact on the RIN and the recommendations by increasing the probability of failure.
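As a worked illustration of this relation, using hypothetical numbers that are not taken from STRAP: if a component contributes 1.5 days of lost production to the risk and its failure would cause a 30-day outage, the implied probability of failure over the baseline interval is:

```latex
P_{\text{failure}} = \frac{\text{risk}}{\text{consequence}}
                   = \frac{1.5\ \text{days}}{30\ \text{days}} = 0.05
```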

Program Aids

The program has several aids for the user in an effort to make the output meet the user needs while providing the best output. Some of the aids are:

• Check for missing answers.
• Check for inconsistent answers.
• Etc.
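The two checks named above might look like the following sketch. The question keys, the answer values, and the single consistency rule are hypothetical; the actual checks built into the program are not described in this paper.

```python
def find_missing(answers, required_questions):
    """Return required questions with no answer recorded."""
    return [q for q in required_questions if answers.get(q) in (None, "")]


def find_inconsistent(answers, rules):
    """Return descriptions of rules whose predicate flags the answers."""
    return [description for description, is_violated in rules
            if is_violated(answers)]


# Hypothetical question keys and one hypothetical consistency rule.
answers = {
    "overspeed_trip_test_interval": "never tested",
    "last_overspeed_trip_test_year": 2004,
    "steam_purity_monitored": "",
}
required = ["overspeed_trip_test_interval", "steam_purity_monitored"]
rules = [
    ("trip reported as never tested, yet a test year is recorded",
     lambda a: a.get("overspeed_trip_test_interval") == "never tested"
     and a.get("last_overspeed_trip_test_year") is not None),
]
missing = find_missing(answers, required)
inconsistent = find_inconsistent(answers, rules)
```

Flagging unanswered questions matters because, as noted under Probability, a question left unanswered can raise the RIN in the same way as a poor answer.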

Results and Comparisons

STRAP will compare the turbines with:

• Other turbines in the company.
• All the turbines in the database.
• Turbines in the same class.
• Turbines by the same manufacturer.
• Turbines in the same industry.
• Etc.

Recommendations

STRAP makes recommendations to improve the reliability of the turbine(s) based on return on investment. It is then up to the engineers to decide which recommendations can be executed, in what order, and whether during an outage and/or online.


Impact

Based upon the consequence data of the operating unit that has been input to the program, the ROI can be calculated. Some of the basic questions are:

• Unit production rate?
• Effect on the plant if a unit goes down per day?
• Impact of a trip of the turbine?
• Etc.?
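One way such consequence data could feed an ROI ranking is sketched below. The ROI definition used here, avoided loss minus cost divided by cost, is an assumption made for illustration; the formula used by the program is not stated in this paper, and the recommendations, risk reductions, costs, and production value are invented.

```python
def return_on_investment(risk_reduction_days, production_value_per_day, cost):
    """Simple ROI: (avoided loss - cost) / cost. An assumed definition."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    avoided_loss = risk_reduction_days * production_value_per_day
    return (avoided_loss - cost) / cost


# Hypothetical recommendations: (reduction in risk, days; cost, dollars).
recommendations = {
    "test overspeed trip annually": (0.8, 20_000.0),
    "purchase spare rotor": (1.2, 900_000.0),
    "add steam purity monitoring": (0.3, 50_000.0),
}
production_value_per_day = 250_000.0  # hypothetical, dollars per day

# Rank recommendations from highest to lowest ROI.
ranked = sorted(
    ((name, return_on_investment(days, production_value_per_day, cost))
     for name, (days, cost) in recommendations.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this made-up case the inexpensive procedural change ranks first even though the spare rotor removes more days of risk, which is the kind of trade-off a ranking by return rather than by risk reduction alone is meant to expose.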

RELIABILITY ASSESSMENT OF COMPRESSORS

The reliability assessment of compressors (RAC) program has been developed in a very similar manner to the STRAP program. Compressors are divided into the types of service to which they are being applied. In addition, the program requires all the constituents of the process gas on a percent molecular weight basis. In practice, the RAC program is far more complicated than the STRAP program. The questions fall into a number of categories:

A. General Information
   1. What group does this compressor belong to?
      a. Charge gas/cracked gas?
      b. Air?
      c. Ammonia?
      d. Chlorine?
      e. Oxygen?
      f. Refrigeration (clean gas)?
      g. Other?
   2. What industry environment does this compressor operate in?
      a. Chemical/petrochemical?
      b. Gas?
      c. Refining?
      d. Other?
   3. Compressor General Details?
      a. Compressor manufacturer (there are a multitude of manufacturers)?
      b. Model number?
      c. Serial number?
      d. Date manufactured?
      e. Compressor duty?
      f. Driver?
      g. Number of years in service?

B. Construction
   1. Type of end seals?
   2. Type of interstage seals?
   3. Internal coatings?
   4. Has the rotor been high-speed balanced?
   5. Do you have a lube and oil system designed to API 614?
   6. Type of bearings?
   7. Materials of construction?
   8. Type of coupling(s)?
   9. What is the number of impellers?

C. Past Failures/Problems
   1. Has the compressor ever had past failures or problems?
   2. Has the compressor ever operated in reverse?
   3. How often does the compressor have vibration problems?
   4. Has the lube oil system been contaminated?
   5. Has the seal oil system been contaminated?
   6. Has the buffer gas consumption increased?
   7. Have you experienced compressor trips due to instrumentation problems?
   8. Has the compressor ever been oversped?
   9. Have you ever had problems with kinking in the past?

D. Design Versus Actual
   1. Inlet
      a. Pressure?
      b. Temperature?
      c. Molecular weight?
   2. Discharge
      a. Pressure?
      b. Temperature?
      c. Molecular weight?
   3. Brake horsepower required?
   4. Speed?
   5. Estimated surge ICFM?

E. Process Gas Data
   1. Is the process dry or wet?
   2. Is the process gas corrosive?
   3. Does the process foul?
   4. Do you monitor gas composition online?
   5. What molecular weight was the compressor designed for?
   6. What is the current molecular weight of the process?
   7. Process stream
      a. Air (MW 28.966)
      b. Carbon monoxide (MW 28.010)
      c. Ethylene (MW 28.052)
      d. Propane (MW 44.094)

F. Site Data
G. Control Systems
H. Lube Oil Systems
I. Spares
J. Maintenance and Repairs
K. Operation
L. Monitoring and Protection
M. Rerates and Upgrades
N. Seal Fluid System
O. Environment and Business Consequence Data
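The design-versus-current molecular weight questions in the Process Gas Data category can be illustrated with a short calculation. The sketch below assumes the stream composition is supplied in mole percent, which is one reading of the constituent basis the program asks for; the function name, the example composition, and the design value are hypothetical, while the molecular weights are those listed in the questionnaire above.

```python
# Molecular weights as listed in the questionnaire above.
MOLECULAR_WEIGHT = {
    "air": 28.966,
    "carbon monoxide": 28.010,
    "ethylene": 28.052,
    "propane": 44.094,
}


def mixture_molecular_weight(mole_percent):
    """Mole-fraction-weighted average molecular weight of a gas mixture."""
    total = sum(mole_percent.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"mole percents sum to {total}, expected 100")
    return sum(pct / 100.0 * MOLECULAR_WEIGHT[gas]
               for gas, pct in mole_percent.items())


# Hypothetical stream: 70 mol% ethylene, 30 mol% propane.
current_mw = mixture_molecular_weight({"ethylene": 70.0, "propane": 30.0})
design_mw = 30.0  # hypothetical design molecular weight
deviation_percent = (current_mw - design_mw) / design_mw * 100.0
```

A deviation of this kind between design and current molecular weight is the sort of information the Design Versus Actual and Process Gas Data questions are collecting.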

CONCLUSION

The Reliability Assessment of Rotating Equipment (RARE) is the combination of the STRAP and RAC programs. Depending on the maintenance strategy of the company and/or the facility, either or both of the STRAP and RARE programs can be of significant benefit in assessing critical machinery performance, identifying the areas that could or should be addressed to improve reliability, and defining the obstacles to extending the run time between minor and major overhauls with an acceptable risk.


Shiraz A. Pradhan is a ConsultingEngineer, with ExxonMobil ChemicalCompany, in Baytown, Texas. His recentexperiences include the design, commis-sioning, and startups of oxoalcohol,halobutyle, polyethylene, polypropylene,and fluids’ projects in the Far East, SouthAmerica, and the U.S. He previouslyworked with Esso/Imperial Oil in Canadaas Machinery Engineer and with BritishGas Corporation in the United Kingdom as

Senior Project Leader. He has been involved in reliability auditsand service factor improvement projects in ammonia, fertilizer,PVC, olefins and polyolfins plants, and oil and gas pipelineprojects internationally.

Mr. Pradhan has a degree (Mechanical Engineering) from theUniversity of Nairobi, and an M.S. degree from Lehigh Universityin Pennsylvania. He holds the title of European Engineer (FEENI,Paris), is a fellow of the Institution of Mechanical Engineers, U.K.,and is a registered Professional Engineer in the Province ofOntario, Canada.

INTRODUCTION

Reliability is critical for all industries. For the petrochemicalindustry it assumes added significance because much equipment isunspared or has minimal redundancy. Table 1 shows a comparisonbetween the commercial airlines, nuclear, and petrochemicalindustries.

Table 1. System Characteristics for Different Industries.

COMPARISON BETWEENELECTRONICS AND MECHANICAL SYSTEMS

In electronic component reliability assessment the concept of constant failure rate is used (Ireson, et al., 1996). This is not the case for mechanical and machinery components, for several reasons. Machinery components:

• Have an increasing failure rate pattern
• Are not standardized like electrical components
• Have more failure modes than electronic components

Fundamental to reliability assessment of mechanical components is the need for a failure distribution and supporting data that describe the behavior of the components in the real world. This is more easily said than done.

Cumulative Distribution Function

For reliability prediction one would like to know the probability of a failure occurring before a time t. This can be derived by the equation:

F (t), Probability of a failure before time t =

(1)

As t approaches infinity, F(t) approaches 1.

Reliability Function

The reliability function is complementary to the cumulative distribution function and gives the probability of survival of a component or system to a specified time t.

(2)

Failure Rate or Hazard Function

This function allows the determination of the failure probability of a system or component in a small increment of time Δt, having survived to time t.

(3)

Failure Distributions for Mechanical Systems

Exponential Distribution

This distribution is used extensively in the electronic industry and also for the reliability assessment of some mechanical systems:

(4)

where λ = failure rate per unit time

(5)

where MTBF = mean time between failure.

The reliability function for exponential distribution is:

(6)

PROCEEDINGS OF THE THIRTY-FOURTH TURBOMACHINERY SYMPOSIUM • 2005

APPLIED RISK AND RELIABILITY FOR TURBOMACHINERY

PART II—RELIABILITY CALCULATIONS

by

Shiraz A. Pradhan

Consulting Engineer

ExxonMobil Chemical Company

Baytown, Texas

| System | Commercial Airlines | Nuclear | Petrochemical |
|---|---|---|---|
| Mission Length, Hr | <50 | <5000* | 8760 < T < 70000 |
| Access During Mission | NIL | NIL | NIL |
| Access Between Mission | Full | Limited | FULL if a planned IRD** |

* Depending on mandated maintenance

** Inspection, Repair Downtime

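$$F(t) = \int_{-\infty}^{t} f(t)\,dt \tag{1}$$

$$R(t) = 1 - F(t) = 1 - \int_{-\infty}^{t} f(t)\,dt \tag{2}$$

$$h(t) = \frac{f(t)}{R(t)} \tag{3}$$

$$f(t) = \lambda e^{-\lambda t} \quad \text{for } t > 0 \tag{4}$$

$$\lambda = 1/\mathrm{MTBF} \tag{5}$$

$$R(t) = e^{-\lambda t} \tag{6}$$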
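The exponential relations in Equations (3) through (6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the three-year MTBF are assumptions chosen for the example, not part of the original tutorial.

```python
import math


def exp_reliability(t, mtbf):
    """R(t) = exp(-lambda * t), with lambda = 1 / MTBF (Equations 5 and 6)."""
    lam = 1.0 / mtbf
    return math.exp(-lam * t)


def exp_failure_probability(t, mtbf):
    """F(t) = 1 - R(t), the complement of the reliability (Equation 2)."""
    return 1.0 - exp_reliability(t, mtbf)


def exp_hazard(t, mtbf):
    """h(t) = f(t) / R(t) (Equation 3); for the exponential case this equals lambda."""
    lam = 1.0 / mtbf
    density = lam * math.exp(-lam * t)  # Equation 4
    return density / exp_reliability(t, mtbf)


# Illustrative MTBF of 3 years (an assumed value for this sketch).
for years in (0.5, 1.0, 2.0):
    print(years, exp_reliability(years, 3.0), exp_hazard(years, 3.0))
```

The hazard printed for each time is the same value, 1/MTBF, which is the constant-failure-rate property of the exponential distribution.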


For equipment that follows exponential distribution, the probability of having exactly k failures by time T is given by the Microsoft® Excel function:

(7)

Some characteristics of the exponential distribution are:

• Applies to situations where failure events are random and not due to wear, age, or deterioration
• Is a memoryless distribution. This means that the probability of failure in an interval of given length is the same regardless of how long the equipment has already operated.
• Has constant hazard rate
• Is often applied to systems that are repairable

Normal Distribution

This distribution is applied to situations where the failures are due to wear. However, the mathematics of the failure rate or hazard rate is complex.

Log Normal Distribution

This distribution has wide applicability to mechanical systems where failures are due to crack propagation, corrosion, and stress-temperature phenomena.

Weibull Distribution

The Weibull distribution (Dodson, 1994) is one of the most versatile of the failure distributions and has wide applicability for mechanical systems. It is defined by two parameters: η, called the characteristic life or scale factor, and a constant β, called the shape parameter. The Weibull probability density function is:

(8)

The Weibull reliability function is:

(9)

The characteristic life, η, is the age at which 63.2 percent of the population will have failed.
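The 63.2 percent figure follows directly from the Weibull reliability function, Equation (9), by setting t = η. The short derivation below is added for illustration and uses only the symbols already defined:

$$R(\eta) = e^{-(\eta/\eta)^{\beta}} = e^{-1} \approx 0.368, \qquad F(\eta) = 1 - R(\eta) = 1 - e^{-1} \approx 0.632$$

The result does not depend on the shape parameter β.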

The shape parameter β has several cases of interest in mechanical reliability assessment.

• When β < 1: This indicates a decreasing hazard rate. In mechanical systems this is the initial run-in phase where faulty components with defects fail. With time these early failures diminish. This phase is often called the infant mortality or burn-in phase.

• When β = 1: This is a special case of Weibull distribution when it becomes an exponential distribution. As previously noted, for the exponential distribution the hazard rate is constant and failures are random. This phase designates the useful life of the component. The failure rate is the reciprocal of the MTBF.

• When β = 2: The hazard rate is increasing linearly with time. This case is known as the Rayleigh distribution.

• When β = 2.5: The hazard rate is increasing and the distribution approximates the log normal distribution.

• When β = 3.5: The hazard rate is increasing and the distribution approximates a normal distribution.

Figure 1 shows these various cases.
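As a rough numerical check of the shape-parameter cases listed above, the sketch below evaluates the Weibull hazard rate obtained by dividing the density, Equation (8), by the reliability, Equation (9), which gives h(t) = (β/η)(t/η)^(β−1). The function names, the unit characteristic life, and the sample times are assumptions made for this illustration.

```python
def weibull_hazard(t, beta, eta):
    """Hazard rate h(t) = f(t) / R(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def hazard_trend(beta, eta=1.0, times=(0.5, 1.0, 1.5, 2.0)):
    """Classify the hazard rate over the sample times as decreasing, constant, or increasing."""
    values = [weibull_hazard(t, beta, eta) for t in times]
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(abs(d) < 1e-12 for d in diffs):
        return "constant"
    return "increasing" if all(d > 0 for d in diffs) else "decreasing"


# Shape parameters discussed in the text; eta = 1 is an arbitrary scale.
for beta in (0.5, 1.0, 2.0, 2.5, 3.5):
    print(beta, hazard_trend(beta))
```

For β = 2 the hazard doubles when the time doubles, which is the linear increase noted for the Rayleigh case.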

Bathtub Curve

A bathtub curve is a plot of hazard rate against time. Figure 2 shows the curves for electronic and mechanical systems.

Figure 1. The Weibull Probability Density Function.

Figure 2. Bathtub Curve for Mechanical and Electronic Components.

APPLICATIONS OF FAILURE DISTRIBUTIONS

Example 1

In this example there are two identical pumps in parallel as shown in Figure 3. They have negative exponential failure distribution and therefore:

(10)

Figure 3. Two Pumps in Parallel.

Survival of one pump is sufficient to assure the success of the system. For this case the reliability of the system is given by:

(11)

(12)


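$$\text{POISSON}(k,\ \lambda T,\ \text{False}) \tag{7}$$

$$f(t) = \frac{\beta}{\eta^{\beta}}\, t^{\beta - 1}\, e^{-(t/\eta)^{\beta}} \tag{8}$$

$$R(t) = e^{-(t/\eta)^{\beta}} \tag{9}$$

$$\lambda_A = \lambda_B = \lambda \tag{10}$$

$$R_t = R_A + R_B - R_A \times R_B \tag{11}$$

$$R_t = 2e^{-\lambda t} - e^{-2\lambda t} \tag{12}$$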


• Assume pump MTBF = 36 months (3 years)
Failure rate, λ = 1/MTBF = 1/3 = 0.333 failures/year
Mission time = 1 year

• For a single pump, the reliability is:
$R = e^{-0.333 \times 1}$ = 71 percent and failure probability is 29 percent

• For a parallel pump system, the reliability is:
$R = 2e^{-0.333 \times 1} - e^{-2 \times 0.333 \times 1}$ = 1.4335 − 0.5137 = 92 percent and failure probability is 8 percent

We will now evaluate the situation when one pump fails and the sister pump is operating without backup (Figure 4). Assume that pump mean time to repair (MTTR) = six days. Evaluating the probability of success for six days when the spare pump is operating without backup:

• Repair time = 6 days = 6/365 = 0.0164 years
$R = e^{-0.333 \times 0.0164}$ = 99.45 percent and failure probability is 0.55 percent

Installing a spare pump reduces the one-year probability of system failure from 29 percent to 8 percent, and the probability of losing the remaining pump during a six-day repair is less than 1 percent.
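The arithmetic of Example 1 can be reproduced with a short script. The sketch below assumes the values stated in the example (MTBF of three years, a one-year mission, and a six-day repair); the function and constant names are hypothetical, and λ is taken as exactly 1/3 rather than the rounded 0.333, so the last digits may differ slightly from the hand calculation.

```python
import math

MTBF_YEARS = 3.0             # 36 months, from Example 1
LAMBDA = 1.0 / MTBF_YEARS    # failures per year
MISSION_YEARS = 1.0
REPAIR_YEARS = 6.0 / 365.0   # six-day mean time to repair


def single_reliability(lam, t):
    """Reliability of one pump with constant failure rate (Equation 6)."""
    return math.exp(-lam * t)


def parallel_reliability(lam, t):
    """Two identical pumps in parallel, one required for success (Equation 12)."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)


r_single = single_reliability(LAMBDA, MISSION_YEARS)
r_parallel = parallel_reliability(LAMBDA, MISSION_YEARS)
r_window = single_reliability(LAMBDA, REPAIR_YEARS)  # one pump running without backup

print(f"single pump, 1 yr:     {r_single:.4f}")
print(f"parallel pumps, 1 yr:  {r_parallel:.4f}")
print(f"single pump, 6 days:   {r_window:.4f}")
```

Equation (12) is equivalent to one minus the probability that both pumps fail, which gives a quick cross-check on the parallel result.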

Figure 4. Only One Pump in Service.

Example 2

A screw compressor has a failure rate of 0.0666 failure/year. What is the probability of exactly two failures in five years? This problem is solved by Equation (7).

• POISSON(k, λT, False)
λT = 0.0666 × 5 = 0.333
k = 2

The POISSON expression answer for the above values is approximately 4 percent (0.0397).
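Outside a spreadsheet, the same Poisson probability can be evaluated directly from its definition, P(k) = e^(−λT)(λT)^k / k!. The following is a minimal sketch with a hypothetical helper name, using the Example 2 inputs.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected number of events is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


failure_rate = 0.0666   # failures per year, from Example 2
mission_years = 5.0
expected_failures = failure_rate * mission_years  # lambda * T

p_two = poisson_pmf(2, expected_failures)
print(f"P(exactly 2 failures in 5 years) = {p_two:.4f}")
```

Summing `poisson_pmf` over k = 0, 1, 2, ... gives 1, which is a simple sanity check on the helper.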

Risk Based Maintenance

With a spared system one is often faced with deciding if the repair of failed equipment should be carried out on an expedited or emergency basis. A typical situation is the repair strategy for, say, boiler feedwater pumps when the main pump has failed. Should it be repaired on an emergency basis? Plant operators have no confidence in the operating pump. Figure 5 can be used to aid in this decision. It relates the MTBF of the spare pump with MTTR or days unavailable and reliability.

How long can the spared pump be out for repair without compromising the target reliability of 99 percent? The following facts will aid in the decision:

• The spare pump is running satisfactorily.
• Its MTBF is 15 months.

Entering the x-axis of Figure 5 at 15 months and moving up to the 99 percent reliability line, the intersection lies on the horizontal line that corresponds to an allowable outage of nine days. In this case there is no need to do emergency repair.

Example 3

In this example we will use a commercially available Weibull program to plot the Weibull for a set of reactor pump seals. There are two pumps in service and seven seal failures. Inputting the times to failure in the program yields the Weibull plot shown in Figure 6.

Figure 5. Reliability Versus Mean Time Between Failure and Repair Time. (Courtesy Bloch and Geitner, 1990)

• Each Weibull distribution is applicable to a single failure mode of the equipment.
• A Weibull distribution plots as a straight line.
• Statistically, more data points give greater accuracy to the plot.
• The r² in the plot gives an indication of the goodness of fit; r² = 1 is best.
• In engineering we are sometimes forced to work with fewer data points. This increases the uncertainty of the plot.

Figure 6. Weibull Plot for Reactor Pump Seals.

The Beta (β) or the shape factor = 3.95 and the characteristic life (η) = 28.3 months. This suggests that for these pump seals the hazard function is increasing and the seals have a wear-out mode.

Table 2 is an example of a Microsoft® Excel function reliability calculator that can be programmed. It takes Beta (β) and the characteristic life (η) from the Weibull plot and calculates the probability of failure and its converse, the reliability of the pump seals, based on the Weibull reliability function [Equation (9)] for a range of desired operating months.


Input Data: Reactor Seal Failures

| Failure # | Time to Failure (months) |
|---|---|
| 1 | 17 |
| 2 | 18 |
| 3 | 23 |
| 4 | 24 |
| 5 | 31 |
| 6 | 33 |
| 7 | 34 |


Table 2. Reliability Calculator for Reactor Pump Seals, Based on Weibull.

In Table 3, the Weibull equation is solved for critical values. From this table it is possible to get the expected time to failure for any desired reliability.

Table 3. Inverse Weibull: Time to Failure for Desired Reliability.
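The calculations behind Tables 2 and 3 can also be scripted. The sketch below is one possible form, assuming the two-parameter Weibull of Equation (9) and its inverse; the function names are hypothetical, and the parameter values (β = 3.948, η = 28.3 months for the calculator and η = 28.28 months for the inverse) are those quoted with the tables.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta), the Weibull reliability function (Equation 9)."""
    return math.exp(-((t / eta) ** beta))


def time_for_reliability(reliability, beta, eta):
    """Invert Equation (9): the operating time at which R(t) equals `reliability`."""
    return eta * (-math.log(reliability)) ** (1.0 / beta)


BETA = 3.948

# Forward calculation, as in Table 2 (eta = 28.3 months).
for months in (28, 24, 20, 16, 12, 8):
    r = weibull_reliability(months, BETA, 28.3)
    print(f"{months:>2} months: failure probability {1.0 - r:.2f}, reliability {r:.2f}")

# Inverse calculation, as in Table 3 (eta = 28.28 months).
for target in (0.5, 0.9, 0.99):
    print(f"R = {target}: {time_for_reliability(target, BETA, 28.28):.3f} months")
```

Feeding the output of `time_for_reliability` back into `weibull_reliability` should return the target reliability, which is a convenient check that the two functions are consistent.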

Reliability Simulations

In multicomponent systems the mathematics gets too complex for closed form solutions. In such cases simulation is the next best approach. With the availability of computers, simulations have become relatively easy. The objective of the simulations is to predict the central tendency of a given variable. In this case it is to predict the probability of failure of a complex system. The algorithm for Monte Carlo simulations for mechanical systems is developed from the basic Weibull reliability function [Equation (9)], noting that the failure function is:

(13)

Taking the logarithm of both sides yields the equation for the time to failure, t:

(14)

Inputting a random number for the F(t) in Equation (14) yields an estimate of time to failure, t, for a given set of the Beta and Eta. With modern computers it is possible to execute several thousand simulations.
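The sampling step described above can be sketched as follows. This is a minimal illustration of drawing times to failure from Equation (14) with a uniform random number in place of F(t); it is not the commercial program used in the examples. The function names, the seed, the number of runs, and the use of the reactor seal parameters are assumptions made for the sketch.

```python
import math
import random


def sample_time_to_failure(beta, eta, rng):
    """Draw one time to failure by substituting a uniform random number for F(t) in Equation (14)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return eta * math.log(1.0 / (1.0 - u)) ** (1.0 / beta)


def simulate(beta, eta, n_runs, seed=0):
    """Return n_runs independent simulated times to failure."""
    rng = random.Random(seed)
    return [sample_time_to_failure(beta, eta, rng) for _ in range(n_runs)]


# Assumed inputs: the reactor seal parameters, beta = 3.948 and eta = 28.3 months.
times = simulate(3.948, 28.3, 20000, seed=1)
failed_by_24 = sum(t <= 24.0 for t in times) / len(times)
analytic = 1.0 - math.exp(-((24.0 / 28.3) ** 3.948))

print(f"simulated fraction failed by 24 months: {failed_by_24:.3f}")
print(f"analytic failure probability:           {analytic:.3f}")
```

With enough runs the simulated fraction approaches the analytic failure probability, which is the sense in which the simulation predicts the central tendency.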

Example 4

Reliability/Risk and Maintenance Planning

A machinery engineer must recommend to management whether a turbine, which has operated successfully for four years, should be overhauled in the fifth or the sixth year of operation. In this example only three components of the turbine for which Weibull data are available from similar machines are considered. A commercially available reliability program was used for the simulation.

Figure 7 shows the reliability block diagram (RBD) of the turbine and Table 4 shows the input data for the program.

Figure 7. Reliability Block Diagram of the Turbine.

Table 4. Inputs for Turbine Components.

The methodology for making a risk-based decision is as follows:

• Step 1: Assemble failure distribution for each component. The data for the turbine components are from the maintenance/failure histories from the operating plant, original equipment manufacturer's (OEM) data, and public domain databases (Table 4).

• Step 2: Make program simulations:
  • Number of program simulations: 1000
  • Mission times: five years and six years. The simulation program is run from time t = 0 to t = 6.

• Step 3: From the event log feature of the reliability program, assemble histograms of the number of failures for each component of the turbine for each successive year of mission time until the sixth year. The histograms essentially confirm the trend in the wear-out modes of the turbine components. Figure 8 shows a sample output from the event log and Figure 9 shows a histogram for the turbine components.

Figure 8. Sample Output from Event Log for Run #15.

• Step 4: Calculate the reliability of each component for a mission time of five and six years having survived four years.

Use the Weibull reliability function [Equation (9)]:


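Beta = 3.948; Eta = 28.3 months

| Months | Failure Probability | Reliability |
|---|---|---|
| 28 | 0.62 | 0.38 |
| 27 | 0.57 | 0.43 |
| 26 | 0.51 | 0.49 |
| 25 | 0.46 | 0.54 |
| 24 | 0.41 | 0.59 |
| 23 | 0.36 | 0.64 |
| 22 | 0.31 | 0.69 |
| 21 | 0.27 | 0.73 |
| 20 | 0.22 | 0.78 |
| 19 | 0.19 | 0.81 |
| 18 | 0.15 | 0.85 |
| 17 | 0.13 | 0.87 |
| 16 | 0.10 | 0.90 |
| 15 | 0.08 | 0.92 |
| 14 | 0.06 | 0.94 |
| 13 | 0.05 | 0.95 |
| 12 | 0.03 | 0.97 |
| 11 | 0.02 | 0.98 |
| 10 | 0.02 | 0.98 |
| 9 | 0.01 | 0.99 |
| 8 | 0.01 | 0.99 |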

Beta = 3.948; Eta = 28.28 months

| Reliability | Time to Failure (Months) |
|---|---|
| 0.01 | 41.637 |
| 0.1 | 34.932 |
| 0.5 | 25.773 |
| 0.75 | 20.627 |
| 0.8 | 19.341 |
| 0.9 | 15.993 |
| 0.99 | 8.820 |

$$F(t) = 1 - R(t) \tag{13}$$

$$\text{Time to failure} = \eta \left[ \ln\!\left( \frac{1}{1 - F(t)} \right) \right]^{1/\beta} \tag{14}$$

Mon Apr 25 14:06:48 2005

| Block Name | Failure Distro | Param1 (Beta) | Param2 (Eta) | Param3 |
|---|---|---|---|---|
| BladeErosion | Weibull | 4.20000 | 8.50000 | 0.00000 |
| Controller-HydroCU | Weibull | 1.50000 | 11.0000 | 0.00000 |
| TTV | Weibull | 4.70000 | 8.50000 | 0.00000 |
| Valve-Control | Weibull | 1.50000 | 10.0000 | 0.00000 |

Starting Run 15

| Time (Years) | Event | Detail | System |
|---|---|---|---|
| 2.619166 | Valve-Control Failed | TimeOperated=2.619166 | Red |
| 2.634584 | Valve-Control Repaired | RepairTime=0.015419 | Green |
| 4.239981 | Controller-HydroCU Failed | TimeOperated=4.224562 | Red |
| 4.243281 | Controller-HydroCU Repaired | RepairTime=0.003300 | Green |
| 4.535766 | TTV Failed | TimeOperated=4.517048 | Red |
| 4.548038 | TTV Repaired | RepairTime=0.012272 | Green |
| 5.423212 | BladeErosion Failed | TimeOperated=5.392222 | Red |
| 5.510067 | BladeErosion Repaired | RepairTime=0.086855 | Green |
| 6.000000 | Simulation Terminated | | |

End of Run #15


Figure 9. Histograms of Component Failures with Time.

(15)

• Sample calculations for trip and throttle valve (TTV):
Beta = 3.8
Eta = 9 years

• Reliability at time = 4 years
$R(4\ \text{yr}) = e^{-(4/9)^{3.8}}$ = 95.5 percent

• Reliability at time = 5 years
$R(5\ \text{yr}) = e^{-(5/9)^{3.8}}$ = 89.8 percent

Thus, R(5 yr/having survived 4 yr) = R(5 yr)/R(4 yr) = 89.8/95.5 = 94.0 percent
Failure probability = 100 − 94.0 = 6.0 percent

• Reliability at time = 6 years
$R(6\ \text{yr}) = e^{-(6/9)^{3.8}}$ = 80.7 percent

Thus, R(6 yr/having survived 4 yr) = R(6 yr)/R(4 yr) = 80.7/95.5 = 84.5 percent
Failure probability = 100 − 84.5 = 15.5 percent
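The conditional reliability calculation of Step 4 can be written as a small helper. The sketch below is illustrative only: the function names are hypothetical, the TTV parameters (β = 3.8, η = 9 years) are those of the sample calculation, and the control valve parameters (β = 1.5, η = 10 years) are taken from the Valve-Control row of the program inputs.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta), the Weibull reliability function."""
    return math.exp(-((t / eta) ** beta))


def conditional_reliability(t, survived, beta, eta):
    """Reliability to time t given survival to time `survived`: R(t) / R(survived)."""
    return weibull_reliability(t, beta, eta) / weibull_reliability(survived, beta, eta)


components = {
    "Trip and Throttle Valve": (3.8, 9.0),   # beta, eta in years (sample calculation)
    "Control Valve": (1.5, 10.0),            # beta, eta in years (program inputs)
}

for name, (beta, eta) in components.items():
    for mission in (5.0, 6.0):
        r = conditional_reliability(mission, 4.0, beta, eta)
        print(f"{name}: {mission:.0f} yr given 4 yr -> "
              f"reliability {100 * r:.1f} percent, failure probability {100 * (1 - r):.1f} percent")
```

Because both reliabilities share the same exponential form, the ratio depends only on the difference between the two (t/η)^β terms, so the helper could equally be written with a single exponential.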

Table 5 and Figure 10 show the relative reliabilities and failure probabilities for the turbine components.

Table 5. Relative Reliabilities and Probabilities of Failure for Turbine Components.

Figure 10. Relative Probability of Failure for Turbine Components (Example 4).

The analysis shows that there is an 11 percent additional risk in extending the operation from the fifth to the sixth year. The total risk is 19 percent for the control valve.

Example 5

A gas turbine generator (GTG) system is arranged in parallel as shown in Figure 11. Survival of one GTG line is sufficient for system success. All components have negative exponential distribution. The desire is to:

• Forecast the system reliability and availability for a mission time of two years.
• Assess system vulnerability when one GTG is out for planned maintenance for 10 days.

Figure 11. Two GTGs in Parallel.

The reliability program model comprises all the components of the GTGs including the gearboxes, turbine and compressor sections, the generator, and the auxiliaries such as cooling water, lube system, control system, and fire suppression system.

The results (Table 6) indicate that for two GTGs in parallel and for a mission time of two years, the system reliability will be 99 percent with a mean of 0.01 system failures in two years. The mean availability is >99 percent.

Table 6. Results of Simulation: GTGs in Parallel, Mission Time = Two Years.

When one GTG is down for planned maintenance of 10 days, the results (Table 7) show that the system is vulnerable to failure within the 10 days. For the duration of the 10 days the system reliability is only 97 percent and the mean number of system failures is 0.03.

REFERENCES

Ireson, W. G., Coombs, C. F., Jr., and Moss, R. Y., 1996, Handbook of Reliability Engineering and Management, Second Edition, New York, New York: McGraw Hill.

Dodson, B., 1994, Weibull Analysis, Milwaukee, Wisconsin: ASQ Press.

Bloch, H. and Geitner, F., 1990, Machinery Reliability Assessment, New York, New York: Van Nostrand Reinhold.


[Figure 9 panels: Failure Trend of TTV; Failure Trend of Blade Erosion; Failure Trend of Hydraulic Control Unit; Failure Trend of Valve Control. Each panel is accumulated over 20 simulations, with Years on the x-axis and # Failures on the y-axis.]

$$R(t) = e^{-(t/\eta)^{\beta}} \tag{15}$$

| Component | Reliability (5 yr/having survived 4 yr), percent | Failure Prob. (5 yr/having survived 4 yr), percent | Reliability (6 yr/having survived 4 yr), percent | Failure Prob. (6 yr/having survived 4 yr), percent |
|---|---|---|---|---|
| Control Valve | 90.43 | 9.57 | 80.9 | 19.1 |
| Controller Hydraulic CU | 91.65 | 8.35 | 83.2 | 16.8 |
| Trip and Throttle Valve | 94.05 | 5.95 | 84.5 | 15.5 |
| Blade Erosion | 98.7 | 1.3 | 96.4 | 3.6 |

[Figure 10 chart: Risk Assessment, Probability of Failure (Having Survived 4 Yrs.); series: 5 Yr. Run and 6 Yr. Run.]

Mon Apr 25 15:20:35 2005

Results from 100 run(s):

| Parameter | Minimum | Mean | Maximum | Standard Deviation |
|---|---|---|---|---|
| Total Costs | 66.00 | 67.62 | 72.66 | 1.37 |
| Ao | 0.991366829 | 0.999913668 | 1.0000000000 | 0.000858989 |
| MTBDE | 1.982734 | >1.999827 | >2.000000 | n/a |
| MDT (1 runs) | 0.017266 | 0.017266 | 0.017266 | n/a |
| MTBM | 0.283248 | >1.439166 | >2.000000 | n/a |
| MRT (79 runs) | 0.003004 | 0.013159 | 0.030500 | 0.006948 |
| %Green Time | 95.527907 | 98.950059 | 100.000000 | 1.002106 |
| %Yellow Time | 0.000000 | 1.041308 | 4.065847 | 0.975965 |
| %Red Time | 0.000000 | 0.008633 | 0.863317 | 0.085899 |
| System Failures | 0 | 0.010000 | 1 | 0.099499 |

R(t=2.000000) = 0.990000


Table 7. Results of Simulation: One GTG Out for Overhaul, Mission Time = 10 Days.


Mon Apr 25 15:27:36 2005

Results from 100 run(s):

| Parameter | Minimum | Mean | Maximum | Standard Deviation |
|---|---|---|---|---|
| Total Costs | 11.30 | 11.33 | 12.50 | 0.17 |
| Ao | 0.729199029 | 0.993634398 | 1.0000000000 | 0.037137561 |
| MTBDE | 0.019973 | >0.027216 | >0.027390 | n/a |
| MDT (3 runs) | 0.004199 | 0.005812 | 0.007417 | 0.001314 |
| MTBM | 0.019973 | >0.027216 | >0.027390 | n/a |
| MRT (3 runs) | 0.000000 | 0.003339 | 0.005819 | 0.002452 |
| %Green Time | 72.919903 | 99.363440 | 100.000000 | 3.713756 |
| %Yellow Time | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| %Red Time | 0.000000 | 0.636560 | 27.080097 | 3.713756 |
| System Failures | 0 | 0.030000 | 1 | 0.170587 |

R(t=0.027390) = 0.970000
