Energy Efficient HPC Data Centers
Bill Tschudi and Dave Martinez
Lawrence Berkeley National Laboratory / Sandia National Laboratories
November 11
[email protected]; [email protected]
Outline
• Resources
• Benchmarking
• Efficiency opportunities – typical measures
  – Air management
  – Environmental conditions
  – Power distribution
  – Liquid cooling
  – Free cooling
World Data Centers
[Chart: data center electricity use compared with 2005 national electricity consumption of countries such as Turkey, Sweden, Iran, Mexico, South Africa, and Italy; x-axis: Electricity Use (Billion kWh), 0–300]
Source for country data in 2005: International Energy Agency, World Energy Balances (2007 edition)
Data centers account for more energy use than many countries.
Many organizations are involved with data center efficiency
Federal data center energy efficiency drivers
• Federal Data Center Consolidation Initiative: the FDCCI has set aggressive goals, such as decommissioning 800 Federal data centers by 2015.
• Executive Order 13514: E.O. 13514 directs Federal agencies to employ best practices for energy-efficient data center management.
LBNL supports DOE’s Data Center initiatives and the Federal Energy Management Program (FEMP)
Technical support is available
• DC Pro Tools
• Data Center Energy Practitioner
• Case studies
• Best practice guides, etc.
• Assessments
In progress: Guideline for Energy Considerations during Consolidations
The DC Pro tool suite includes a high-level profiling tool and system-level tools
• Profiling Tool: profiling and tracking
  – Establish PUE baseline and efficiency potential (a few hours of effort)
  – Document actions taken
  – Track progress in PUE over time
• Assessment tools: more in-depth site assessments
  – Suite of tools to address major sub-systems
  – Provides savings estimates for efficiency actions
  – ~2 week effort (including site visit)
DC Pro tools point to efficiency opportunities but are not a substitute for an investment-grade audit
High-Level Profiling Tool
• Overall energy performance (baseline) of data center
• Performance of systems compared to benchmarks
• List of energy efficiency actions and their savings, in terms of energy cost ($), source energy (Btu), and carbon emissions (Mtons)
• Points to more detailed system tools

System-level assessment tools:
• IT Module: servers; storage & networking; software
• Electrical Systems: UPS; transformers; lighting; standby generation
• Cooling: air handlers/conditioners; chillers, pumps, fans; free cooling
• Air Management: hot/cold separation; environmental conditions
The Data Center Energy Practitioner (DCEP) program qualifies individuals to perform data center assessments
Developed by DOE in collaboration with industry.
Objective: raise standards, ensure repeatability, and reach large numbers of practitioners.
http://www.cdcdp.com/dcep.php
The DCEP program is administered by Professional Training Organizations – selected through a competitive process
http://www.datacenterdynamics.com/training/course-types/doe-certifed-energy-professional
PTOs license training and exam content from DOE, provide training, administer exams, and issue certificates
The Federal Energy Management Program has resources
Best Practices Guide
Benchmarking Guide
Data Center Programming Guide
Technology Case Study Bulletins
Procurement Specifications
Report Templates
Process Manuals
Quick‐Start Guide
LBNL’s energy efficiency resources are diverse
Influencing the Market
Demonstration Projects
EE HPC Working Group
The EE HPC WG drives energy and computational performance improvement through collective actions
Members collaborate in areas such as performance metrics, thermal conditions, and best practices, as determined by the group. This large market influences manufacturers.
LBNL demonstrates new or under-utilized technologies that improve data center efficiency
Demonstrations have included DC power, cooling options, control options, air management, etc.
LBNL supports the Silicon Valley Leadership Group's Data Center Summit.
[Figure: results of a demonstration of wireless temperature monitoring]
LBNL develops resources for the public
• DC Pro tools
• Data Center Energy Practitioner program
• Computing metrics development
• Federal consolidation guideline
• ESPC content for data centers
• Wireless assessment kit
• Compressor-less cooling
• Heat re-use
• Demonstration projects
High-Level Metric: Power Usage Effectiveness (PUE) = Total Energy / IT Energy
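As a quick worked example (the energy figures here are invented for illustration, not from any benchmarked center), PUE is a simple ratio:

```python
# Minimal PUE illustration; the kWh numbers are hypothetical.
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh

# Example: 10,000,000 kWh/yr total, 6,250,000 kWh/yr to IT -> PUE = 1.60,
# i.e., 0.6 kWh of cooling and power-distribution overhead per IT kWh.
print(f"PUE = {pue(10_000_000, 6_250_000):.2f}")
```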
Energy benchmarking reveals wide variation
End use breakdowns can be instructional
Intel measured energy end use in one of their data centers
Courtesy of Michael Patterson, Intel Corporation
We found many areas for improvement…

Cooling:
• Air management
• Free cooling – air or water
• Environmental conditions
• Centralized air handlers
• Low pressure drop systems
• Fan efficiency
• Cooling plant optimization
• Direct liquid cooling
• Right-sizing/redundancy
• Heat recovery
• Building envelope

Electrical:
• UPS and transformer efficiency
• High-voltage distribution
• Premium efficiency motors
• Use of DC power
• Standby generation
• Right-sizing/redundancy
• Lighting – efficiency and controls
• On-site generation

IT:
• Power supply efficiency
• Standby/sleep power modes
• IT equipment fans
• Virtualization
• Load shifting
• Storage de-duplication

Some of these measures can also serve as demand response strategies.
What is Air Management?
[Diagram: recirculation and bypass air paths around an equipment rack]
Minimize mixing of air – both recirculated and bypassed
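One published way to quantify that mixing from field measurements is the Return Temperature Index (RTI); the talk does not prescribe a particular metric, so this sketch and its example temperatures are purely illustrative:

```python
def return_temperature_index(t_return, t_supply, t_it_out, t_it_in):
    """RTI (%) = temperature rise across the room / rise across the IT.
    ~100% means little mixing; >100% suggests recirculation (hot exhaust
    re-entering intakes); <100% suggests bypass (cold air skipping IT)."""
    return 100.0 * (t_return - t_supply) / (t_it_out - t_it_in)

# Example: CRAC supply 60F, return 75F; IT intake 65F, exhaust 90F.
rti = return_temperature_index(t_return=75, t_supply=60, t_it_out=90, t_it_in=65)
print(f"RTI = {rti:.0f}%")  # 60% -> substantial bypass air
```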
Thermal monitoring confirms separation of hot and cold air streams
Goal: Supply air directly to IT equipment intakes without mixing
[Diagram: cool front aisle, hot rear aisle, and no air mixing at the equipment rack]
By‐pass air does no cooling
• Misplaced perforated tiles
• Leaky cable penetrations
• Too much supply airflow
• Too-high tile exit velocity
Recirculated air causes localized cooling problems
[Diagram: recirculation air path at an equipment rack]
• Lack of blanking panels
• Gaps between racks
• Too little air supply
• Short equipment rows
Containment can be achieved in various ways
Air curtains from the cleanroom industry
Fire marshal needs to be consulted
Don’t do stupid things
Open-grate floor tiles reduce underfloor air pressure, so air cannot flow to the tops of racks
Air management does not include adding more fan heat to the room
These displays show two issues – low return air temperatures and simultaneous heating and cooling.
If you walk into an HPC center and it is cold… there is an energy efficiency opportunity
There is a lot of misunderstanding of the environmental conditions needed for IT equipment
Microsoft operated computers in a tent for 8 months with no failures.
ASHRAE Thermal Guidelines apply to HPC systems
• Developed with IT manufacturers
• Provides a common understanding between IT and facility staff
• Recommends a temperature range up to 80.6°F, with the "allowable" range much higher
• Identifies six classes of equipment with wider allowable ranges, up to 45°C (113°F)
• Provides more justification for operating above the recommended limits
• Provides wider humidity ranges
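Those class limits lend themselves to a simple check of measured inlet temperatures. The range values below are the published 2011 guideline numbers; the helper itself is just an illustration, not part of any DOE/LBNL tool:

```python
# Classify an IT inlet temperature against the ASHRAE 2011 envelopes (degC).
RECOMMENDED = (18.0, 27.0)      # 64.4-80.6 degF, all classes
ALLOWABLE = {                   # class: (min, max) degC
    "A1": (15.0, 32.0),
    "A2": (10.0, 35.0),
    "A3": (5.0, 40.0),
    "A4": (5.0, 45.0),          # the 113 degF upper limit
}

def classify_inlet(temp_c: float, equip_class: str = "A2") -> str:
    lo, hi = RECOMMENDED
    if lo <= temp_c <= hi:
        return "within recommended range"
    a_lo, a_hi = ALLOWABLE[equip_class]
    if a_lo <= temp_c <= a_hi:
        return f"outside recommended but allowable for class {equip_class}"
    return f"outside allowable range for class {equip_class}"

print(classify_inlet(29.0))  # outside recommended but allowable for class A2
```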
ASHRAE Thermal Guidelines
Note that the recommended range goes to 80.6°F and the allowable range is much higher.
Computing equipment is designed to meet ASHRAE Guidelines
The industry has moved on: IT equipment operates reliably at higher temperatures.
Perceptions that data centers need to be cold should change
Data centers should be designed for computer comfort, not people comfort
Focusing on environmental conditions can lead to large energy savings
[Diagram: server power-conversion chain – UPS (rectifier, battery/charger, inverter, bypass) feeding an AC/DC multi-output power supply (PWM/PFC switcher converting unregulated DC to regulated 12V, 5V, and 3.3V outputs), followed by DC/DC voltage regulator modules supplying the processor, memory controller, SDRAM, graphics controller, internal/external drives, and I/O at 1.1V–1.85V, 1.5/2.5V, 3.3V, and 12V]
Many power conversions occur before any computing is done. Each power conversion loses some energy and creates heat in the process.
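Because the stages are in series, their efficiencies multiply. The per-stage numbers below are illustrative round figures, not the measured values from the research:

```python
# Conversion losses compound multiplicatively down the power chain.
stages = {
    "UPS (double conversion)":  0.90,
    "PDU/transformer":          0.98,
    "Server AC/DC supply":      0.80,
    "DC/DC voltage regulators": 0.85,
}

overall = 1.0
for name, eff in stages.items():
    overall *= eff
    print(f"{name:26s} {eff:.0%}  cumulative {overall:.0%}")

# Roughly 60% of utility power reaches the chips; the other 40% becomes
# heat that the cooling plant must then remove, compounding the loss.
print(f"Delivered to silicon: {overall:.1%}")
```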
Research illustrated that power conversion losses can be significant
[Diagram: power path from the Uninterruptible Power Supply (UPS) through the Power Distribution Unit (PDU) to the server, with successive AC–DC conversions at each stage]
[Chart: measured power supply efficiency (45%–85%) vs. % of nameplate power output (0%–100%), with the average of all servers highlighted]
Efficiency was measured for:
• Uninterruptible Power Supplies (UPS)
• Power supplies in IT equipment
Measured UPS efficiency confirms the curve
[Chart: Factory Measurements of UPS Efficiency – efficiency (70%–100%) vs. percent of rated active power load (0%–100%) for flywheel, double-conversion, and delta-conversion UPS topologies, tested using linear loads]
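The shape of those curves follows from a fixed (no-load) loss plus a loss proportional to load; the coefficients below are assumed for illustration, not taken from the factory data:

```python
# Why lightly loaded UPS systems test poorly: fixed + proportional losses.
def ups_efficiency(load_fraction, fixed_loss=0.04, prop_loss=0.05):
    """Efficiency = output / (output + losses)."""
    losses = fixed_loss + prop_loss * load_fraction
    return load_fraction / (load_fraction + losses)

for lf in (0.10, 0.25, 0.50, 1.00):
    print(f"{lf:>4.0%} load -> {ups_efficiency(lf):.1%} efficient")
# Redundant designs that hold each UPS near 25% load pay this penalty
# around the clock, which is why the part-load curves matter.
```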
Sandia – thermal management case study
Past:
• Supply air at 48–52°F
• Centrifugal fans
• No automation
• No integrated controls
• Little concern for efficiency
• Moved from water-cooled to air-cooled computers

Today:
• Supply air at 58–70°F
• Plug fans
• Automation
• Integrated controls
• Efficiency drives CRAC selection decisions
• Mixed environment of water/refrigerant-cooled computers and indirect (air-) cooled computers
• Hot/cold aisles
Thermal management
To achieve these goals, we have taken the following approach:
• Build a written program to automate and tie all the CRAC units together and interface with our existing Facilities Control System.
• Standardize our equipment (CRAC units). Equipment criteria included:
  – CRAC units that can deliver an even airflow under the floor
  – Plug fan technology that modulates speed (via VFDs) based on air temperature
  – Improved water-side technologies (e.g., coil surface area, slant-fin coils)
  – Vendors with a demonstrated commitment to improving energy efficiency and a vision of the future
Automated controls
Thermal management
[Photos: evolution of air management at Sandia]
• Welding curtain CRAC plenum – early 1980s
• Hot air balloon material for cold aisle containment – 2006
• Structured CRAC plenums – 2008
• Red Sky's Glacier doors and XDPs – 2009
• Airflow dams removed from under floor – 2010
Thermal management – 2012
Hot aisle containment and controls:
• Set point: 95°F return air temperature
• Controls water valve and fan speed
• 24 kW per rack
• Water temperature 50°F (can be raised to 60°F)
• 12 CRAC units controlled by VFDs
• 30-ton units at 15,000 cfm
[Photos: inside the hot aisle; top view of containment; TLCC's Chama and Pecos systems]
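A minimal sketch of the kind of control loop described here, holding the 95°F return-air setpoint by modulating the chilled-water valve and VFD fan speed. The proportional-control structure and gain are assumptions; the real system runs through Sandia's Facilities Control System:

```python
# Hypothetical proportional control step for the hot-aisle containment.
SETPOINT_F = 95.0
GAIN = 0.05  # fraction of actuator range per degF of error (assumed)

def control_step(return_temp_f: float, valve: float, fan: float):
    """Return new (valve, fan) commands in the range 0.0-1.0."""
    error = return_temp_f - SETPOINT_F            # positive = too hot
    valve = min(1.0, max(0.0, valve + GAIN * error))
    fan = min(1.0, max(0.2, fan + GAIN * error))  # keep minimum airflow
    return valve, fan

# Example: return air at 98 degF -> open the valve and speed up the fans.
valve, fan = control_step(98.0, valve=0.5, fan=0.5)
print(f"valve {valve:.0%}, fan {fan:.0%}")  # valve 65%, fan 65%
```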
HPC centers are moving to liquid cooling. Liquid has much more heat capacity than air
Volumetric heat capacity comparison
[Graphic: a ~200,000 cubic foot blimp of air equated with a small volume of water]
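The blimp comparison falls out of standard property values: water stores roughly 3,500 times more heat per unit volume than air. A quick check:

```python
# Volumetric heat capacity, using room-temperature property values.
rho_water, cp_water = 998.0, 4186.0   # kg/m^3, J/(kg*K)
rho_air, cp_air = 1.20, 1005.0

vhc_water = rho_water * cp_water      # ~4.18e6 J/(m^3*K)
vhc_air = rho_air * cp_air            # ~1.21e3 J/(m^3*K)
ratio = vhc_water / vhc_air
print(f"Water holds ~{ratio:,.0f}x more heat per unit volume than air")
# ~3,465x: the heat capacity of a ~200,000 ft^3 blimp of air fits in
# roughly 200,000 / 3,465 ~= 58 ft^3 of water.
```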
Pumps move energy more efficiently
[Diagram: moving the same heat load takes a 0.58 bhp fan for air, but only a 0.05 bhp pump pushing 4 gpm of water through a pipe]
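The standard HVAC rules of thumb for sensible heat make the fan/pump gap concrete. The 20,000 BTU/hr load and 10°F rise below are example numbers chosen so the water side comes out at the 4 gpm shown in the diagram:

```python
# Sensible heat transport rules of thumb (BTU/hr):
#   air:   q ~= 1.08 * cfm * dT(F)
#   water: q  = 500  * gpm * dT(F)
load_btuh = 20_000   # example heat load (~5.9 kW)
dt_f = 10.0          # temperature rise across the IT equipment

cfm_needed = load_btuh / (1.08 * dt_f)
gpm_needed = load_btuh / (500.0 * dt_f)
print(f"Air:   {cfm_needed:,.0f} cfm")   # ~1,852 cfm
print(f"Water: {gpm_needed:,.1f} gpm")   # 4.0 gpm, as in the diagram
# Pushing 4 gpm through a pipe takes far less shaft power than moving
# ~1,850 cfm through ducts and plenums: 0.05 vs 0.58 bhp above.
```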
Liquid cooling takes on different forms
Liquid cooling guidelines enable standardization while achieving energy-efficient liquid cooling
• High-temperature chilled water
• Higher temperatures produced with a cooling tower alone
• Highest temperatures produced with a dry cooler (no water use)
• Heat reuse
Source: ASHRAE whitepaper, 2011
Liquid cooling can reduce or eliminate compressor based cooling
• Chillers carry high capital and operating costs
• Chillers can be eliminated or downsized
• Dry coolers alone can produce these temperatures
“Free cooling” isn’t entirely free but can save considerable energy in data centers
Translation:
• Air economizer – open the window
• Water economizer – use cooling towers with a heat exchanger
• Reduce or eliminate the need for chillers (a rough availability estimate is sketched below)
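A rough way to size the opportunity is to count hours in a weather file where each economizer type could meet the supply setpoint. The setpoint and approach temperatures below are assumptions for illustration:

```python
# Estimate economizer availability from hourly (drybulb_F, wetbulb_F) data.
SUPPLY_SETPOINT_F = 65.0
TOWER_APPROACH_F = 7.0   # tower leaving water ~ wetbulb + approach (assumed)
HX_APPROACH_F = 3.0      # plate heat exchanger approach (assumed)

def economizer_hours(weather):
    air_hrs = sum(1 for db, wb in weather if db <= SUPPLY_SETPOINT_F)
    water_hrs = sum(
        1 for db, wb in weather
        if wb + TOWER_APPROACH_F + HX_APPROACH_F <= SUPPLY_SETPOINT_F)
    return air_hrs, water_hrs

# Example with a few made-up hours:
weather = [(58, 50), (72, 60), (64, 48), (80, 66)]
air, water = economizer_hours(weather)
print(f"Air-side: {air}/4 hours, water-side: {water}/4 hours")  # 2 and 2
```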
The results so far…
Cold plate data
[Chart: trends of chilled water supply temperature from the heat exchanger, tower water supply temperature to the heat exchanger, outside air wetbulb temperature, and outside air drybulb temperature]
There is always more that can be done:
• Knowledge and adoption of best practices
• Influencing suppliers: thermal conditions, liquid cooling
• Training
In summary, there are many opportunities for energy efficiency in HPC centers.
Questions?
Resources – websites
http://www1.eere.energy.gov/femp/program/data_center.html
http://hightech.lbl.gov/datacenters.html
http://hightech.lbl.gov/serverclosets.htm
http://www.energystar.gov/index.cfm?c=prod_development.server_efficiency
U.S. Department of Energy: http://www1.eere.energy.gov/manufacturing/datacenters/