Risk Assessments and Reliability, What You Need To Know

INFRASTRUCTURE RELIABILITY AND RISK ASSESSMENTS Morrison Hershfield Mission Critical Steven Shapiro, P.E., ATD Mission Critical Practice Lead Morrison Hershfield Mission Critical

description

By the end of this presentation, attendees will understand why their critical environment needs an Infrastructure Reliability and Risk Assessment, which types of systems the evaluation should include, how it should be performed to ensure tangible results, how it should be reported, and how to interpret and use the information in the assessment to their advantage.

Presentation Outline
1. What is an Infrastructure Reliability and Risk Assessment, and what do I need one for?
2. Who should perform an Infrastructure Reliability and Risk Assessment?
3. What information should be included in an Infrastructure Reliability and Risk Assessment?
4. What building systems should be included? This will be an infrastructure system-by-system approach.
5. What are the key things to look for when my study is complete?
   A. Reliability Level.
   B. Single Points of Failure within Critical Systems.
   C. Redundancy of Critical Systems.
   D. System Integration.
   E. Adequacy of Engineered Systems (Exhaust Points).
   F. Adequacy of Operations, Maintenance and Testing Programs.
   G. Benchmark Findings with Industry Standards.
6. Availability, MTBF Calculations and Probability of Failure Calculations. What are they, who does them, and what do they mean?
7. Computational fluid dynamic modeling.
8. How long should a study like this take?
9. Review of a sample study.

Transcript of Risk Assessments and Reliability, What You Need To Know

Page 1: Risk Assessments and Reliability, What You Need To Know

INFRASTRUCTURE RELIABILITY AND RISK ASSESSMENTS

Morrison Hershfield Mission Critical

Steven Shapiro, P.E., ATD
Mission Critical Practice Lead
Morrison Hershfield Mission Critical

Page 2: Risk Assessments and Reliability, What You Need To Know

• RISK ASSESSMENT

• INFRASTRUCTURE RELIABILITY: POWER, COOLING

WHAT YOU NEED TO KNOW

AGENDA

Morrison Hershfield Mission Critical – Infrastructure and Risk Assessments

Page 3: Risk Assessments and Reliability, What You Need To Know

• WHY

• SITE EVALUATION

• METRICS

RISK ASSESSMENTS

Page 4: Risk Assessments and Reliability, What You Need To Know

• Location
• Design
• Redundancy level
• Construction
• Quality of equipment
• Age
• Operations & Maintenance program
• Personnel training
• Level of operator coverage
• Thoroughness of the commissioning program

Lurking Vulnerabilities

Causes of Critical Failures

Page 5: Risk Assessments and Reliability, What You Need To Know

• Equipment failure

• Operator error

• Natural disaster

• Design error

• Installation error

• Commissioning or test deficiency

• Maintenance oversight

• Equipment design

Causes of Critical Failures

Page 6: Risk Assessments and Reliability, What You Need To Know

• Root cause not always easy to ascertain

• Combination of factors (Cascading Failures)

• Latent failures

• Most occur during change of state events

• More maintenance does not necessarily mean higher availability

• Non-Fault tolerant systems

Causes of Critical Failures

Page 7: Risk Assessments and Reliability, What You Need To Know

Causes of Critical Failures

• Commissioning or Test Deficiency: 4%
• Equipment Design: 13%
• Equipment Failure: 28%
• Human Error: 18%
• Installation Error: 10%
• Maintenance Oversight: 4%
• Natural Disaster: 3%
• System Design: 20%

WHY

Page 8: Risk Assessments and Reliability, What You Need To Know

WHY DO A RISK ASSESSMENT

• Alignment of business mission and facility performance expectation

• Quantifies the risk and exposure of the critical facilities to failure

• Identifies vulnerabilities and single points of failure

• First step in creating an action plan for site hardening

• Benchmark against the industry

• Assists in developing business case for capital expenditures

Page 9: Risk Assessments and Reliability, What You Need To Know

SITE EVALUATION

STEP 1

• Quantify reliability expectations

• Develop resiliency metrics

Page 10: Risk Assessments and Reliability, What You Need To Know

SITE EVALUATION

STEP 2
• Develop PRA (Probabilistic Risk Assessment) model
• Identify Single Points of Failure within critical systems
• Evaluate redundancy of critical systems
• Capacity and expandability analysis
• Adequacy of Engineered Systems
• Operation and maintenance policies, practices and procedures
• Adequacy of maintenance and testing programs
• Evaluate risks associated with site location
• Overall Risk Analysis
• Evaluate the adequacy of operations and maintenance programs

Page 11: Risk Assessments and Reliability, What You Need To Know

SITE EVALUATION

STEP 2 cont.

• Harmonics analysis

• EMF studies

• Short circuit & coordination studies

• Air flow modeling-CFD

Page 12: Risk Assessments and Reliability, What You Need To Know

STEP 3
• Perform gap analysis

STEP 4
• Recommendations for upgrade/alteration to optimize facility performance
• Budget and schedule development
• Assess risk during implementation
• Benchmark findings with industry standards

SITE EVALUATION

Page 13: Risk Assessments and Reliability, What You Need To Know

• Probability of Failure/Reliability

• Availability

• MTTF

• MTTR

• Susceptibility to natural disasters

• Fault tolerance

• Single Points of Failure

• Maintainability

• Operational readiness

• Maintenance program

RISK ASSESSMENT METRICS

Page 14: Risk Assessments and Reliability, What You Need To Know

• RELIABILITY / AVAILABILITY

• RELIABILITY MODELING

• RELIABILITY CONSIDERATIONS

INFRASTRUCTURE RELIABILITY

Page 15: Risk Assessments and Reliability, What You Need To Know

RELIABILITY

• “Reliability” is used as an umbrella definition

• May Refer to Availability, Durability, Quality

• Five 9’s ????

• Reliability = Probability of Successful Operation

Page 16: Risk Assessments and Reliability, What You Need To Know

RELIABILITY AND AVAILABILITY

• Reliability predicts how likely the system is to fail.

• Availability is a measure (or a future prediction) of the percentage of time the system will be operating properly.

Page 17: Risk Assessments and Reliability, What You Need To Know

AVAILABILITY

Five 9’s refers to Availability

Availability (A) = average fraction of time that something is in service and performing its intended function.

99.999% availability means:
• 5.3 minutes of downtime each year
or
• 1.77 hours of downtime every 20 years

Availability does not specify how often an outage occurs
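
The downtime figures above follow directly from the definition of availability. A minimal sketch of the arithmetic, assuming a 365-day year; the function name is illustrative and not from the presentation:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes(availability: float, years: float = 1.0) -> float:
    """Expected cumulative downtime in minutes over the given number of years."""
    return (1.0 - availability) * MINUTES_PER_YEAR * years


per_year = downtime_minutes(0.99999)
per_20_years_hours = downtime_minutes(0.99999, years=20) / 60

print(f"{per_year:.2f} minutes per year")
print(f"{per_20_years_hours:.2f} hours per 20 years")
```

The unrounded result is about 5.26 minutes per year, or about 1.75 hours over 20 years; the slide's 1.77 hours comes from rounding to 5.3 minutes first. The calculation says nothing about whether that downtime arrives as one outage or as many short ones.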

Page 18: Risk Assessments and Reliability, What You Need To Know

AVAILABILITY

Availability (A) = MTBF/(MTBF + MTTR)

MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair, or Downtime
MTBF = MTTF + MTTR
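
A minimal sketch of the availability formula, using hypothetical MTBF and MTTR values chosen only for illustration. With MTBF defined as MTTF + MTTR, the strictly consistent form is A = MTTF / (MTTF + MTTR). The second function shows that form, and the two agree closely whenever MTTR is small compared with the time between failures:

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), as written on the slide."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_from_mttf(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / (MTTF + MTTR), which equals MTTF / MTBF when MTBF = MTTF + MTTR."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical component: one failure per 100,000 hours, 4 hours to repair.
mtbf, mttr = 100_000.0, 4.0
a_slide = availability_from_mtbf(mtbf, mttr)
a_strict = availability_from_mttf(mtbf - mttr, mttr)

print(f"slide form:  {a_slide:.8f}")
print(f"strict form: {a_strict:.8f}")
```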

Page 19: Risk Assessments and Reliability, What You Need To Know

RELIABILITY BATHTUB CURVE

[Figure: bathtub curve. Failure rate plotted against time (t) in years, with three regions: early life, useful life period, and wear-out.]

Page 20: Risk Assessments and Reliability, What You Need To Know

RELIABILITY MODELING

• Used to compare system designs and assist in the evaluation of risk versus the cost to mitigate the risk.

• Failure and Repair data comes from IEEE 493, Recommended Practice for Design of Reliable Industrial and Commercial Power Systems (IEEE Gold Book)

Page 21: Risk Assessments and Reliability, What You Need To Know

RELIABILITY MODELING

Components used for reliability modeling of the electrical system shown here:

• Utility power
• Generator
• Circuit breakers
• Switchboards
• Cables
• Automatic Transfer Switch
• UPS module
• Battery
• Static Bypass Switch
• Rack Power

Page 22: Risk Assessments and Reliability, What You Need To Know

RELIABILITY MODELING

Reliability Block Diagram (RBD)
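
A reliability block diagram is evaluated by combining blocks in series (every block must work) and in parallel (at least one block must work). A minimal sketch, assuming the blocks fail independently; the component reliabilities below are hypothetical placeholders, not values from IEEE 493 or from the study:

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work: one minus the product of the unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical values: utility and generator are alternative sources,
# and either one feeds a single UPS module in series.
utility, generator, ups = 0.95, 0.98, 0.99
source = parallel(utility, generator)
system = series(source, ups)

print(f"source: {source:.5f}, system: {system:.5f}")
```

In this toy model the single UPS block limits the system result no matter how reliable the redundant source pair becomes, which is why single points of failure receive so much attention.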

Page 23: Risk Assessments and Reliability, What You Need To Know

RELIABILITY MODELING

[Table: results of the reliability calculations, reported in hours. The values were not captured in this transcript.]

Page 24: Risk Assessments and Reliability, What You Need To Know

THE TRADITIONAL CLASSIFICATION SYSTEM
The Uptime Institute

Tier 1 – Basic Non-Redundant Data Center
Single path for power and cooling distribution without redundant components

Tier 2 – Basic Redundant Data Center
Single path for power and cooling distribution with redundant components

Tier 3 – Concurrently Maintainable Data Center
Multiple paths for power and cooling distribution with only one path active and with redundant components

Tier 4 – Fault Tolerant Data Center
Multiple active power and cooling distribution paths with redundant components and fault tolerance

Page 25: Risk Assessments and Reliability, What You Need To Know

Tier Definitions

Item | Tier I | Tier II | Tier III | Tier IV
Number of Delivery Paths | 1 | 1 | 1 Active, 1 Passive | 2 Active
Redundancy | N | N+1 | N+1 | 2N Minimum
Compartmentalization | No | No | No | Yes
Concurrent Maintainability | No | No | Yes | Yes
Fault Tolerance | No | No | No | Yes
Availability (%) | 99.67 | 99.75 | 99.982 | 99.995
Downtime in Hr/Yr | 28.8 | 22 | 1.6 | 0.4
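
The availability and downtime rows describe the same quantity in two ways. A minimal sketch that derives the availability percentage from the annual downtime hours in the table, assuming an 8,760-hour year; the function and variable names are illustrative:

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def availability_percent(downtime_hours_per_year: float) -> float:
    """Convert annual downtime in hours to an availability percentage."""
    return 100.0 * (1.0 - downtime_hours_per_year / HOURS_PER_YEAR)


# Downtime figures as listed in the tier table.
tier_downtime_hours = {"Tier I": 28.8, "Tier II": 22.0, "Tier III": 1.6, "Tier IV": 0.4}

for tier, hours in tier_downtime_hours.items():
    print(f"{tier}: {availability_percent(hours):.3f}%")
```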

TIER REQUIREMENTS

Page 26: Risk Assessments and Reliability, What You Need To Know

From the Uptime Institute (UI)

• Tier I - $10,000 US/kW of Useable UPS Power Output

• Tier II - $11,000 US/kW of Useable UPS Power Output

• Tier III - $20,000 US/kW of Useable UPS Power Output

• Tier IV - $22,000 US/kW of Useable UPS Power Output

• Plus $225 US/SF of Computer Room
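
The cost figures above combine a per-kW charge that depends on the tier with a flat per-square-foot charge for the computer room. A minimal sketch of that arithmetic; the function name and the example facility (2,000 kW, 20,000 SF) are hypothetical:

```python
# Per-kW rates and the per-SF adder as listed on the slide (US dollars).
COST_PER_KW = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF = 225


def estimate_cost(tier: str, usable_ups_kw: float, computer_room_sf: float) -> float:
    """Rough construction cost: tier rate per kW of usable UPS output plus floor area."""
    return COST_PER_KW[tier] * usable_ups_kw + COST_PER_SF * computer_room_sf


# Hypothetical facility: 2,000 kW of usable UPS output, 20,000 SF of computer room.
tier_iii_cost = estimate_cost("III", 2_000, 20_000)
tier_iv_cost = estimate_cost("IV", 2_000, 20_000)

print(f"Tier III: ${tier_iii_cost:,.0f}")
print(f"Tier IV:  ${tier_iv_cost:,.0f}")
```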

Data Center Cost

Page 27: Risk Assessments and Reliability, What You Need To Know

HOW MUCH REDUNDANCY IS ENOUGH?

Page 28: Risk Assessments and Reliability, What You Need To Know

Assumptions

• Various configurations examined for single or dual utility feeders, UPS, generators, STSs, single or dual cords

• Compare reliability at 2000 kW and 4000 kW load

• 5 Year Probability of Failure
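
A probability of failure over a fixed mission time can be estimated from an MTBF if a constant failure rate is assumed, which corresponds to the flat useful-life region of the bathtub curve. The slides do not state which model the study used, so this is a sketch under that assumption, with a hypothetical MTBF:

```python
from math import exp

HOURS_PER_YEAR = 8760


def probability_of_failure(mtbf_hours: float, years: float) -> float:
    """P(at least one failure within the mission time) = 1 - exp(-t / MTBF).

    Assumes a constant failure rate (exponential time to failure).
    """
    mission_hours = years * HOURS_PER_YEAR
    return 1.0 - exp(-mission_hours / mtbf_hours)


# Hypothetical system MTBF of 500,000 hours evaluated over 5 years.
pf_5yr = probability_of_failure(500_000, years=5)
print(f"5-year probability of failure: {pf_5yr:.1%}")
```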

Reliability Considerations

Page 29: Risk Assessments and Reliability, What You Need To Know

Single utility feeder, parallel redundant UPS and generators, single cord IT equipment

Page 30: Risk Assessments and Reliability, What You Need To Know

2N UPS, N+1 Generators, ASTSs, Dual Cord Rack

Page 31: Risk Assessments and Reliability, What You Need To Know

Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual Cord Rack

Page 32: Risk Assessments and Reliability, What You Need To Know

Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual Cord Rack

Page 33: Risk Assessments and Reliability, What You Need To Know

Reliability Considerations

Page 34: Risk Assessments and Reliability, What You Need To Know

Reliability Considerations

Page 35: Risk Assessments and Reliability, What You Need To Know

Reliability Considerations

Page 36: Risk Assessments and Reliability, What You Need To Know

Reliability Considerations

Page 37: Risk Assessments and Reliability, What You Need To Know

Reliability Considerations

Page 38: Risk Assessments and Reliability, What You Need To Know

Emergency Diesel Generators

Study performed by Idaho National Engineering Laboratory at nuclear power plants, February 1996

[Chart categories: fail to start; fail after ½ hour; fail after 8 hours; fail after 24 hours. The values were not captured in this transcript.]

Reliability Considerations

Page 39: Risk Assessments and Reliability, What You Need To Know

• 2(N+1) UPS/Generator with dual utility feeders - most reliable topology

• 2(N+1) UPS > 2N UPS by small margin

• 2N > Distributed Redundant by small margin

• Significant improvement if a second utility feeder is provided

• N+2 and/or 2N generator systems are more reliable than N+1

• Hybrid configuration in a hybrid facility is sometimes the best solution
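
The advantage of N+2 over N+1 generators can be illustrated with a k-out-of-n model: the plant succeeds if at least N of the installed units start and run. A minimal sketch, assuming identical and independent units and ignoring common-cause failures such as shared fuel or controls; the per-unit success probability of 0.95 and N = 4 are hypothetical:

```python
from math import comb


def k_of_n_reliability(required: int, installed: int, unit_reliability: float) -> float:
    """Probability that at least `required` of `installed` independent units work."""
    p = unit_reliability
    return sum(
        comb(installed, k) * p**k * (1.0 - p) ** (installed - k)
        for k in range(required, installed + 1)
    )


n = 4  # generators needed to carry the load (hypothetical)
n_plus_1 = k_of_n_reliability(n, n + 1, 0.95)
n_plus_2 = k_of_n_reliability(n, n + 2, 0.95)

print(f"N+1: {n_plus_1:.5f}")
print(f"N+2: {n_plus_2:.5f}")
```

Under these assumptions the extra unit removes most of the remaining risk of falling short of N running generators.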

Reliability Considerations

Page 40: Risk Assessments and Reliability, What You Need To Know

• Assess the condition of the mechanical plant in conjunction with the electrical system

• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)

Reliability Considerations

Page 41: Risk Assessments and Reliability, What You Need To Know

System Reliability Block

Blocks: Electrical System, Electrical, Mechanical

Electrical system powering the critical load

Mechanical system supporting critical load

Page 42: Risk Assessments and Reliability, What You Need To Know

System | MTBF (hours) | Availability | Pf (3 years)
Electrical system alone | 330,184 | 0.99999 | 8.10%
Mechanical system alone | 178,611 | 0.999943 | 11.70%
Electrical system supporting mechanical | 108,500 | 0.999985 | 21.40%
Overall mechanical system | 70,087 | 0.999931 | 29.20%
Combined electrical mechanical system | 57,819 | 0.999922 | 36.90%
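
As a rough cross-check of the combined row, a simple series model can be applied to the electrical system and the overall mechanical system: availabilities multiply and failure rates (1/MTBF) add. This is a sketch that assumes independent blocks in series; it is not a statement of the method used in the study, and the Pf column is not reproduced here:

```python
# Values taken from the table rows (MTBF in hours).
electrical = {"mtbf": 330_184, "availability": 0.99999}
overall_mechanical = {"mtbf": 70_087, "availability": 0.999931}

# Series model: both systems must be up for the critical load to be supported.
combined_availability = electrical["availability"] * overall_mechanical["availability"]
combined_failure_rate = 1.0 / electrical["mtbf"] + 1.0 / overall_mechanical["mtbf"]
combined_mtbf = 1.0 / combined_failure_rate

print(f"combined availability: {combined_availability:.6f}")
print(f"combined MTBF: {combined_mtbf:,.0f} hours")
```

Both results land within rounding distance of the table's combined figures (0.999922 and 57,819 hours), and they show how the less reliable mechanical side dominates the combined result.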

System Reliability Block

Page 43: Risk Assessments and Reliability, What You Need To Know

The Cost of Reliability

[Figure: cost, shown from $ to $$$$$, rising with reliability level, with reliability marked at 99.0, 99.9, 99.99, 99.999 and 99.9999.]

Page 44: Risk Assessments and Reliability, What You Need To Know

• What Reliability Level Do You Really Need Based on Your Business Case?

• Minimize Single Points of Failure

• Concurrent Maintainability?

• Fault Tolerance?

• Ensure Adequacy of Operations, Maintenance and Testing Programs

• How to justify the cost to upgrade from present state?

Key Takeaways – Risk Assessment

Page 45: Risk Assessments and Reliability, What You Need To Know

• Design objective – find optimum compromise between cost and reliability

• Size matters – larger facilities yield lower reliability

• System architecture and design implementation play a more important role than equipment selection

• Segregate the system into independent blocks

• Eliminate common source components to minimize fault propagation (e.g. LBS, hot-tie, manual bus ties)

• Move single points of failure as close to the load as possible

• Always maintain two independent sources of power to the critical load

• Optimize the design of monitoring and controls circuits

• Keep it simple/minimize human intervention/Utilize Automation

Key Takeaways – Reliability

Page 46: Risk Assessments and Reliability, What You Need To Know

QUESTIONS? Thank you and please feel free to contact me

Steven Shapiro, PE, [email protected]
www.linkedin.com/in/stevenshapirope

References:
Uptime Institute White Papers:
• Tier Myths and Misconceptions
• Data Center Site Infrastructure Tier Standard: Topology

Page 47: Risk Assessments and Reliability, What You Need To Know

Building Areas/Systems Reviewed

• General Construction
• Electrical
• Mechanical
• Plumbing and Fire Protection
• Operation and Maintenance
• Security
• Load Density

Page 48: Risk Assessments and Reliability, What You Need To Know

Site Reliability

• Is Project Compatible With Zoning
• Natural Environment Issues
  - Seismic Zone
  - Geo Technical Reports
  - Sub Surface Conditions
  - Tornado/Hurricane Risk
  - Site Flood Potential
  - Fire Potential
  - Site Topography
  - Weather Extremes
• Man-Made Environment Issues
  - Power/Data and Communication/Water Supply/Sanitary Sewer Availability
  - ISP Connectivity to Mirror and DR Sites
  - Proximity of Hazardous Operational Facilities, i.e. Nuclear Power Plants, Military Bases, Chemical Plants, Tank Farms, Water/Sewage Treatment Plants, Dams/Reservoirs, Gas Stations, etc.
  - Distance to Airports & Freeways
  - Distance to Emergency Services, i.e. Fire and Police Departments, Hospital

Page 49: Risk Assessments and Reliability, What You Need To Know

Building Areas/Systems Reviewed

• Building Utilities and Physical Issues
  - General building systems and area characteristics
  - Life safety and environmental
• Electrical Systems
  - Utility feeders
  - Service entry
  - Base building electrical distribution system including busways, step-down transformers, switchgear and distribution panels
  - Uninterruptible power supply (UPS) systems
  - Battery systems
  - Power Distribution System including the critical computer rooms
  - Emergency/standby generator and fuel system
  - Normal/standby power transfer switchgear
  - Grounding
  - Emergency Power Off Systems
  - Lightning protection system
  - Fire alarm and smoke detection systems

Page 50: Risk Assessments and Reliability, What You Need To Know

Building Areas/Systems Reviewed

• Mechanical Systems
  - Critical Systems Chilled Water Plant: Chillers, pumps, piping distribution system, controls, etc.
  - Critical Systems Condenser Water System: Cooling towers, pumps, piping, etc.
  - Critical Systems Air Handling Systems
  - Critical Systems Air Distribution
  - Critical Systems Secondary Chilled Water Loop
  - Fuel Oil Systems
  - Boiler Systems
  - Compressed Air Systems
• Plumbing Systems
  - Domestic Water Systems
  - Natural Gas Systems
  - Fire Suppression Systems (Water and Gaseous)
• Operation and Maintenance of the Critical Support Systems
  - Maintenance procedures and programs
  - Normal operating procedures
  - Emergency operating procedures
  - Training programs and methods
  - Spare parts

Page 51: Risk Assessments and Reliability, What You Need To Know

Building Areas/Systems Reviewed

• Building Automation
  - Building Automation Systems
  - Physical Security Systems
  - Access control
  - Intrusion detection
  - CCTV systems
  - ID badging systems
  - Intercom systems
  - Smoke Purge Systems
• Technology Systems
  - Entrance Facility Feeds
  - Telephone Company Services
• Systems Integration
  - The integration, compatibility and interaction of the above systems with each other, as well as with the other building elements, will be reviewed to ensure that the systems are compatible and fully integrated.