Current Trends in Data Center Commissioning
RICHARD L SAWYER, Strategist - HP Critical Facilities
ACG – Chicago, April 2013
AGENDA:
• WHAT IS A DATA CENTER?
• DIRTY LITTLE SECRET
• RISK MITIGATION
• LEVERAGING COMMISSIONING
• USING FAILURE TO SUCCEED
What is a Data Center?
• By NFPA 70: “Critical Operations Data System”
• By clients: Wherever I process data.
• By commissioning agents: A power-intensive critical space.
[Image: Liebert system 3 control panel with status indicators (heating, cooling, dehumidification, humidification) and alarm indicators (high/low temperature, loss of air flow, high/low humidity, change air filters, local alarm)]
Successful Data Center Operations Start with Commissioning
• Data centers are designed to a certain availability expectation to meet business goals.
• Whether or not they meet the design goal depends on how well the contractor builds them.
• Commissioning is the only way to ensure that the designed availability is achieved in practice!
It’s all about availability!
Tier 1
• Single generator or no generator
• Basic, non-redundant UPS for LAN room
• Single utility, or radial line from a loop
• 99.671% availability per Uptime Institute
Tier 2
• Generator
• N+1 UPS with redundant components
• Single utility feeders, N+1 mechanical system
• 99.741% availability per Uptime Institute
Tier 3 - Concurrently Maintainable
• N+1 generator system
• N+1 UPS with redundant components
• One active and one passive utility source, N+1 mechanical system
• 99.982% availability per Uptime Institute
Tier 4 - Fault Tolerant
• 2N generator system
• 2N UPS systems
• Dual active utility feeders, 2N mechanical system, compartmentalization
• 99.995% availability per Uptime Institute
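To make the tier percentages more tangible, the short sketch below converts each quoted availability into expected downtime per year. It assumes a non-leap 8,760-hour year and treats availability simply as the fraction of time the site is up; the names are illustrative.

```python
# Convert the Uptime Institute tier availabilities quoted above into
# expected annual downtime. Assumes a non-leap 8,760-hour year.
HOURS_PER_YEAR = 8760

TIER_AVAILABILITY = {
    "Tier 1": 0.99671,
    "Tier 2": 0.99741,
    "Tier 3": 0.99982,
    "Tier 4": 0.99995,
}

def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year that the site is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR

if __name__ == "__main__":
    for tier, a in TIER_AVAILABILITY.items():
        print(f"{tier}: {a:.3%} available, about {annual_downtime_hours(a):.1f} h/yr down")
```

Run as written, this reports roughly 28.8, 22.7, 1.6 and 0.4 hours of downtime per year for Tiers 1 through 4, which is why each step up in tier is so costly.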
April 19, 20234
Data centers have specified design features.
These features are investments made to deliver a specified availability.
The cost is huge: availability is expensive!
1. Data center build costs per sq. ft. (sq. m) increase with power density.
2. As tier level increases, build cost rises.
3. Costs of Tier IV are almost double those of Tier II.
[Chart: Tier II, III and IV build costs ($/sq. ft., $0 to $5,000) versus power density (50 to 300 W/sq. ft.)]
HP data, based on a 40,000 sq. ft. raised-floor data center.
A 20,000 sq. ft. Tier III data center built to 150 W/sq. ft. costs $35 million.
And the IT investment is even larger!
• A 20,000 square foot data center built to 150 watts/square foot can accommodate 800 racks of IT equipment @3.75 kW per rack.
• This translates to 3,000 to 10,000 servers, depending on architecture, form factor and configuration.
• The IT investment in hardware, software and service can amount to 5 to 8 times the data center facility investment.
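The arithmetic behind these bullets can be checked in a few lines: critical power is floor area times design density, rack count is that power divided by the per-rack load, and the IT spend range follows from the 5x to 8x multiplier applied to the $35 million Tier III example. The function names are illustrative only.

```python
def critical_power_kw(area_sqft: float, watts_per_sqft: float) -> float:
    """Design critical IT load in kW."""
    return area_sqft * watts_per_sqft / 1000.0

def rack_count(power_kw: float, kw_per_rack: float) -> int:
    """Whole racks supportable at the given per-rack load."""
    return int(power_kw // kw_per_rack)

def it_investment_range(facility_cost: float, low_mult: float, high_mult: float) -> tuple:
    """IT spend range as multiples of the facility investment."""
    return facility_cost * low_mult, facility_cost * high_mult

power = critical_power_kw(20_000, 150)                   # 3,000 kW
racks = rack_count(power, 3.75)                          # 800 racks
cost_per_sqft = 35_000_000 / 20_000                      # $1,750 per sq. ft.
it_low, it_high = it_investment_range(35_000_000, 5, 8)  # $175M to $280M
```

The $1,750 per sq. ft. result is in line with the Tier III curve at 150 W/sq. ft. on the cost chart.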
Can you safely assume the data center investment will work as designed from Day One?
Availability interdependency
End-to-end availability is the product of the availability of the IT architecture and the availability of the facility infrastructure (FI).
(Tier 3 FI x MS Server) = Total availability
99.982% x 99.202% = 99.184%
IT architecture and facility infrastructure are interdependent in meeting the data center goal: the speed of IT recovery depends on the speed of facility recovery!
Formula: (Availability of IT) X (Availability of FI) = Total End-to-End Availability
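The formula treats IT and facility as links in a series chain: both must be up for the service to be up. A minimal sketch, assuming the two layers fail independently, reproduces the Tier 3 example:

```python
from math import prod

def end_to_end_availability(*availabilities: float) -> float:
    """Series availability: every layer must be up, so the values multiply.
    Assumes failures in each layer are independent."""
    return prod(availabilities)

tier3_fi = 0.99982   # Tier 3 facility infrastructure
ms_server = 0.99202  # server availability from the example above
total = end_to_end_availability(tier3_fi, ms_server)
print(f"{total:.3%}")  # 99.184%
```

Additional layers (network, storage) can be passed as extra arguments; since each factor is at most 1, every added layer can only lower the end-to-end figure.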
Dirty Little Secret: Data Centers Fail
Failure is:
• Expensive
• Inevitable
• Predictable
• Manageable
• Useful
Failure is Inevitable
[Chart: 5-year probability of failure]
Source: AFCOM 2007, “Understanding Tier Systems”, Tom Roberts and Rick Sawyer
Predictability of Failure
[Diagram: Option 2N one-line, with two utilities and two generators (G) feeding two UPS systems with bypasses, primary buses 1 and 2, a static switch and PDU serving the critical load]
Option 2N:
• 2 utilities
• 2 generators
• 2 ATS
• 2 UPS systems
• STS
MTBF = 315,766 hours
Availability = 99.9985%
Probability of failure in 5 years = 12.95%
Failure is Predictable
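The 12.95% figure is consistent with the standard constant-failure-rate (exponential) model, P = 1 - exp(-t / MTBF). The sketch below assumes that model, and also backs out the mean time to repair implied by the quoted availability using A = MTBF / (MTBF + MTTR); that MTTR is a derived value, not one stated on the slide.

```python
import math

HOURS_PER_YEAR = 8760

def probability_of_failure(mtbf_hours: float, years: float) -> float:
    """P(at least one failure in the period), assuming a constant failure rate."""
    t = years * HOURS_PER_YEAR
    return 1.0 - math.exp(-t / mtbf_hours)

def implied_mttr(mtbf_hours: float, availability: float) -> float:
    """Solve A = MTBF / (MTBF + MTTR) for MTTR, in hours."""
    return mtbf_hours * (1.0 - availability) / availability

p5 = probability_of_failure(315_766, 5)
mttr = implied_mttr(315_766, 0.999985)
print(f"5-year probability of failure: {p5:.2%}")  # 12.95%
print(f"Implied MTTR: {mttr:.1f} hours")           # 4.7 hours
```

Even a 2N design with an MTBF of roughly 36 years has better than a one-in-eight chance of a failure within five years, which is the point of the slide.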
Good News! Failure is Manageable
STRATEGY TO SURVIVE:
• Design to Survive
• Map Foreseeable Failures
• Develop SOPs, MOPs, EOPs
• Commission! Test, Document, Train
Design to Survive
Using ITSM Capability Maturity Model to assess Facility Infrastructure Design
• Absence: No dedicated data center; processing is in office space.
• Initial: Data center is a basic server or network room in a dedicated space, with minimal dedicated infrastructure systems.
• Repeatable: Data center has dedicated cooling, generators, UPS, fire, security and monitoring systems.
• Managed: Data center has concurrent maintainability features.
• Defined: Data center systems have redundant features for resiliency (N+1).
• Optimizing: Fault-tolerant system features (2N).
[Diagram: data center floor plan with CRAC units, PDUs and UPSs along alternating hot and cold aisles, fire and security systems, heat rejection, EPO, and system monitoring with web link]
Zoned Availability: scalable mission-critical infrastructure using a central UPS and rack-based UPSs for 2N redundancy
Site Availability – 99.995%
[Diagram: hot and cold aisles with CRAC units; a central UPS system with battery provides one scalable “N” side, and rack-based UPS systems are added as needed for 2N redundancy]
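The value of a second "N" side can be estimated with the textbook parallel-redundancy formula: the load is lost only if every path is down, so A = 1 - (1 - A1)(1 - A2). The sketch below assumes the two sides fail independently and uses a hypothetical per-side availability purely for illustration; common-mode failures such as a shared EPO circuit break that assumption, which is why real 2N sites fall short of the idealized number.

```python
def parallel_availability(*paths: float) -> float:
    """Availability of redundant paths: the load is lost only if every
    path is down. Assumes path failures are independent."""
    unavailability = 1.0
    for a in paths:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability

# Hypothetical per-side availability, for illustration only.
side = 0.999
print(f"{parallel_availability(side):.4%}")        # one path: 99.9000%
print(f"{parallel_availability(side, side):.4%}")  # 2N:       99.9999%
```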
Map Foreseeable Failures
SPOF Matrix - Common Single Points of Failure
Check observed SPOFs found in the survey
Electrical
There is one utility supply with no standby generator.
Multiple generators are connected through a single paralleling switchgear lineup.
There is one transfer switch where the generator and utility are switched.
The UPS and Static Bypass are fed off of the same circuit breaker.
The UPS output distribution is controlled by one circuit breaker.
The UPS synchronization is controlled by one external circuit.
There is one electrical path to the critical load with no redundancy or automatic bypass provisions.
There is one step-down transformer in the critical electrical path, or step down transformers are in series if multiple.
There is one static switch in series with the UPS output.
All power is fed through one piece of supply electrical switchgear.
There is an EPO circuit that disconnects all electrical power.
There is a switchgear ground fault protection circuit that disconnects all electrical power distribution.
All power is fed through one piece of electrical distribution switchgear to the critical load.
There is one set of electrical cables from utility supply to critical power supplies.
There is one set of electrical cables from critical power supply to critical power distribution.
The HVAC critical cooling system is supplied from one motor control center.
The HVAC critical cooling system is supplied from one piece of distribution switchgear.
The heat rejection system (i.e., cooling towers) is fed from one electrical distribution point.
Critical pumps are fed/controlled from one electrical distribution point.
HVAC water supply is from one distribution point.
The chilled water piping system is a single loop system.
The condenser water piping system is a single loop system.
The glycol piping system is non-redundant.
There are no redundant air handling units supplying the critical load areas.
The building management system can only be operated/controlled from a single point.
The building management system is required for default HVAC system operation.
The water treatment system is not monitored for free chlorine content or biological contamination.
There is only one method, or piece of equipment to provide adequate critical space cooling.
The heat rejection system is non-redundant.
The fire detection system interrupts air flow to the critical load spaces without verifying sensors.
There is an EPO circuit that interrupts cooling to the critical load.
There are common valves that can fail, interrupting chilled water, condenser water or supply water.
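Many of the component-level entries in this matrix can be found mechanically once the one-line diagram is modelled as a directed graph: a component is a single point of failure if removing it alone leaves the critical load with no path to any source. The sketch below assumes that graph model and uses hypothetical component names; it does not catch common-mode items such as an EPO circuit or a shared ground fault trip, which still need the manual survey.

```python
from collections import deque

def reachable(graph, sources, target, removed):
    """Breadth-first search from any source to target, skipping one removed node."""
    seen = set()
    queue = deque(s for s in sources if s != removed)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                queue.append(nxt)
    return False

def single_points_of_failure(graph, sources, load):
    """Components whose loss alone disconnects the load from every source."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs} | set(sources)
    return sorted(n for n in nodes if n != load and not reachable(graph, sources, load, n))

# Hypothetical one-line: two sources, but one ATS, one static switch, one PDU.
one_line = {
    "utility": ["ats"],
    "generator": ["ats"],
    "ats": ["ups", "bypass"],
    "ups": ["static_switch"],
    "bypass": ["static_switch"],
    "static_switch": ["pdu"],
    "pdu": ["critical_load"],
}
print(single_points_of_failure(one_line, ["utility", "generator"], "critical_load"))
# ['ats', 'pdu', 'static_switch']
```

The three components flagged correspond directly to matrix entries: one transfer switch, one static switch in series with the UPS output, and a single distribution path to the critical load.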
Test, Document, Train
Develop MOPs, SOPs, EOPs
• Absence: No operational processes formally in place or measured.
• Initial: Maintenance and operations are ad hoc, not site specific or complete, and depend on staff memory and knowledge.
• Repeatable: Standard, maintenance and emergency operating procedures exist and are site specific.
• Managed: Procedures are associated with asset management systems and are tracked to completion and effectiveness.
• Defined: Documentation is complete and available; compliance is measured and trended.
• Optimizing: Real-time monitoring and continuous improvement features.
[Diagram: runbook automation alongside automating servers, networks and storage]
O&M: MGE EPS 8000 UPS System A, Module 01
Based on best available data 05/11- Verify against As-Builts
• Simplified One-Line power supply diagram
• Simplified One-Line UPS system diagram
– Normal power flow diagrams
– Emergency power flow diagrams
– Automatic Transfer Control diagram
• Location of equipment
• Start-Up and Shut-Down procedure
• Emergency response procedure
• Recommended maintenance practices
• Reference Engineering Prints
• Reference MGE EPS 8000 Operations and Maintenance Manual
[One-line diagram: UPS Systems A01 and B01, showing switchgear SG-3A01, SG-3A02, SG-3B01, SG-3B02, SG-01A01/A02 and SG-01B01/B02, 13.8 kV/480 V transformers T-31A01 and T-31B01, transfer switches ATS-31A01 and ATS-31B01, automatic transfer control, load bus synchronization control, and bypass power flow to UPS A01 for maintenance on modules or module failure mode; outputs to SG-01A03 (Critical UPS Load A) and SG-01B03 (Critical UPS Load B)]
Process for installing a new IT server
• Order delivery
• Physical inspection
• Install in rack
• Software verification
• Data test of software
• Burn-in functional test
• Firmware verification
• Network assignment
• Integration with existing systems
• Online production
Process for “installing” a new data center
• Design
• Construct
• Physical inspection
• Equipment startup
• Equipment tests
• Controls and monitoring tests
• System-level tests
• Capacity tests
• Failure mode tests
• “Pull-the-plug” integrated test
• Turn over to IT and Operations
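The comparison with server installation suggests treating commissioning as a gated sequence: the site is not turned over until every test stage has passed, in order. The sketch below assumes a plausible ordering of the stages named above (the ordering, names and data structure are illustrative, not a prescribed method) and reports the earliest stage still blocking turnover.

```python
from typing import Optional

# Assumed sequence of the commissioning checks named on the slide.
COMMISSIONING_STAGES = [
    "Physical inspection",
    "Equipment startup",
    "Equipment tests",
    "Controls and monitoring tests",
    "System-level tests",
    "Capacity tests",
    "Failure mode tests",
    "Pull-the-plug integrated test",
]

def first_blocking_stage(results: dict) -> Optional[str]:
    """Earliest stage that has not passed, or None if the site can be
    turned over to IT and Operations. A later pass does not unblock
    turnover while an earlier stage is still open."""
    for stage in COMMISSIONING_STAGES:
        if not results.get(stage, False):
            return stage
    return None

# Capacity tests passed early, but equipment tests are still open.
status = {"Physical inspection": True, "Equipment startup": True, "Capacity tests": True}
print(first_blocking_stage(status))  # Equipment tests
```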
The Value of Commissioning
• Assures design performance is achieved following construction
• Verifies performance levels
– Capacity
– Availability (redundancies)
• Provides a documentation base for SOPs, MOPs, and EOPs
• Provides an opportunity for “hands-on” training of operations staff on events they may not otherwise see for years!
– Videotaping of procedures
– Monitoring and alarm testing with response procedures
– “New Employee” training guide development
IT investment is 3-5X the data center investment. Commissioning ensures that the systems supporting the IT architecture work, and that they can be recovered quickly when they fail.
Leverage Facility Commissioning
1. Involve everyone: IT, management, vendors, contractor, engineers and operating staff.
2. Manage your documents – capture everything methodically.
3. Test everything that can be safely tested.
4. Videotape procedures, especially risk mitigation procedures for SPOFs.
Know your data center!
Commissioning Trends
• Standardized procedures to test standardized systems
• Capacity testing to verify efficiency at all load levels
• Staff training during the commissioning process
• Videotaping of test procedures for future training
• Integrated testing of raised floor areas before IT equipment is installed
• Digital data logging of system performance during commissioning to lower cost and provide better information
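Capacity testing and digital data logging fit together naturally: each load-bank step yields logged input and output power, and efficiency at that load level is simply output divided by input. The sketch below assumes a UPS logged at three load steps; the readings, class and function names are hypothetical and for illustration only.

```python
from dataclasses import dataclass

@dataclass
class LoadStep:
    percent_load: int   # load-bank setting as % of rated capacity
    input_kw: float     # logged UPS input power
    output_kw: float    # logged UPS output power

    @property
    def efficiency(self) -> float:
        return self.output_kw / self.input_kw

def efficiency_report(steps):
    """Map each load level to efficiency in percent, one decimal place."""
    return {s.percent_load: round(s.efficiency * 100, 1) for s in steps}

# Hypothetical logged readings, for illustration only.
steps = [
    LoadStep(25, 260.0, 240.0),
    LoadStep(50, 510.0, 480.0),
    LoadStep(100, 1010.0, 960.0),
]
print(efficiency_report(steps))  # {25: 92.3, 50: 94.1, 100: 95.0}
```

Logging every step this way gives a baseline efficiency curve that later operational readings can be trended against.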
Typical Integrated Test
[Diagram: utility and generators (G) feeding the UPS, UPS bypass, static switch, primary bus 1 and PDU to the critical load]
• Load banks are installed to simulate the critical load.
• Static switch sources are failed to test performance.
• UPS redundancy is tested by failing modules and the system.
• Utility power is failed to test transfer switch and generator performance.
• Generator capacity and redundancy are tested by failing units.
• Digital meters record performance at the critical load.
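Once digital meters have recorded the critical bus during each induced failure, the pass/fail question is whether voltage stayed inside an acceptance window. The sketch below assumes a simple plus-or-minus percentage band around nominal; the 5% tolerance, the sample log and the function name are hypothetical, and real acceptance criteria would come from the project's commissioning specification.

```python
def excursions(samples, nominal_v, tolerance_pct):
    """Return the (timestamp, volts) samples outside nominal +/- tolerance_pct."""
    low = nominal_v * (1 - tolerance_pct / 100)
    high = nominal_v * (1 + tolerance_pct / 100)
    return [(t, v) for t, v in samples if not (low <= v <= high)]

# Hypothetical critical-bus readings (seconds, volts) logged while the
# utility feed is failed and the generator picks up.
log = [(0.0, 480.0), (0.5, 479.0), (1.0, 431.0), (1.5, 476.0), (2.0, 481.0)]
bad = excursions(log, nominal_v=480.0, tolerance_pct=5.0)
print("PASS" if not bad else f"FAIL: {bad}")  # FAIL: [(1.0, 431.0)]
```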
Things happen...
Use Failure as an Opportunity
• When you’re down, you’re down.
• Use the downtime to access, maintain or modify systems you can’t get to any other time
– Verify breaker operation: “retro commission”!
– Inspect and repair equipment in a powered-down condition
– Tie in valves and breakers for future use
– Test systems and operations procedures
Plan recovery procedures to leverage downtime opportunity for maintenance, testing and training!
Summary
• Modern office buildings contain high-power data center spaces.
• Availability of those spaces is a key client demand.
• Design can only do so much; performance must be proven through commissioning!
• Actual availability is an operational issue.
• Data center performance is contingent on a strong commissioning program from the start!
Questions?
Richard L. Sawyer
Strategist, HP Critical Facility Services
rsawyer@hp.com
518-857-9751