Aurora HPC Energy Efficiency
Motivations for energy efficiency
Energy Efficiency and SuperMUC: Motivation
• Academic and governmental institutions in Bavaria use electrical energy from renewable sources
• We currently pay 15.8 cents per kWh
• We already know that we will have to pay at least 17.8 cents per kWh in 2013
(quote from Meijer Huber, LRZ)
Motivations for energy efficiency
Motivation
• Data centers are highly energy-intensive facilities
• 10-100x more energy intensive than an office
• Server racks well in excess of 30 kW
• Surging demand for data storage
• ~3% of U.S. electricity consumption, projected to double in the next 5 years
• Power and cooling constraints in existing facilities
Sustainable Computing: Why should we care?
• Carbon footprint
• Water usage
• Mega$ per MW-year
• Cost: OpEx > IT CapEx!
Thus, we need a holistic approach to sustainability and TCO for the entire computing enterprise, not just the HPC system.
(quote from Steve Hammond, NREL)
PUEs in various data centers (PUE < 1.8 requires free cooling or liquid cooling)

Facility - PUE - cooling approach
• Global bank's best data center (of more than 100): 2.25 - air
• EPA Energy Star average: 1.91 - air/liquid
• Intel average: >1.80 - air
• ORNL: 1.25 - liquid
• Google: 1.16 - liquid coils, evaporative tower, hot aisle containment
• Leibniz Supercomputing Centre (LRZ): 1.15 - direct liquid
• National Center for Atmospheric Research (NCAR): 1.10 - liquid
• Yahoo Lockport (PUE declared in project): 1.08 - free air cooling + evaporative cooling
• Facebook Prineville: 1.07 - free cooling, evaporative
• National Renewable Energy Laboratory (NREL): 1.06 - direct liquid + evaporative tower
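PUE is defined as total facility energy divided by IT equipment energy. The short sketch below is not part of the original slides; it only illustrates, with an assumed 1 MW IT load, how the PUE values in the list translate into cooling and power-distribution overhead.

```python
# Illustrative sketch: what a given PUE means in facility overhead.
# Assumption: a hypothetical 1 MW IT load; the PUE values mirror the list above.

IT_LOAD_KW = 1000.0  # hypothetical IT equipment load in kW

def facility_power(pue: float, it_load_kw: float = IT_LOAD_KW) -> float:
    """PUE = total facility power / IT power, so total = PUE * IT."""
    return pue * it_load_kw

for name, pue in [("Air-cooled bank DC", 2.25), ("LRZ", 1.15), ("NREL", 1.06)]:
    total = facility_power(pue)
    overhead = total - IT_LOAD_KW
    print(f"{name}: PUE {pue} -> {total:.0f} kW total, {overhead:.0f} kW overhead")
```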
Energy efficiency - methods
1. IT equipment - maximize efficiency (Flops/Watt)
• Increased work per watt
• Eliminate fans
• Component-level heat exchange
• Newest processors are more efficient
• Liquid cooling
• Energy-aware design

2. Data center - facility PUE
• Optimize air cooling
• Free cooling
• Liquid cooling
• Direct liquid cooling
• Optimization of power conversion

3. Data center or ecosystem - reuse thermal energy
• Direct liquid cooling
• Maximize outlet temperature
• Holistic view of data center planning
Eurotech energy efficient design
• Aurora supercomputers have been designed using standard components, but making the choices that give the best energy efficiency possible
• Aurora HPCs benefit from Eurotech's experience in progressively increasing the efficiency of the power conversion chain from 89% to 97%

The approach is:
• Choice of the most efficient components on the market, i.e. components (processors, accelerators, voltage regulators, memories, minor electronic parts) that minimize energy consumption while giving the same functionality and performance
• Choice of the best «working points» to stay at the top of the power converters' efficiency curves
• Water cooling, to lower the working temperature of components, maximize their efficiency and eliminate fans
Gain in DC/DC conversion efficiency
• The chosen DC/DC converters gain over 2% in efficiency, from 95.5% to 98%
• Choice of the optimal current (I) to work at the top of the conversion curves

[Charts: existing DC/DC conversion vs. new upgraded DC/DC conversion efficiency curves]
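As a rough, illustrative check (the 1 MW load below is an assumption, not a slide value), the difference between 95.5% and 98% conversion efficiency is easiest to see in the converter losses:

```python
# Illustrative sketch of why a 95.5% -> 98% DC/DC efficiency gain matters.
# The 1 MW delivered load is an assumed figure for the example only.

def conversion_loss_kw(output_kw: float, efficiency: float) -> float:
    """Power dissipated in the converter for a given delivered load."""
    input_kw = output_kw / efficiency
    return input_kw - output_kw

load_kw = 1000.0                            # assumed delivered IT load
old = conversion_loss_kw(load_kw, 0.955)    # ~47 kW lost
new = conversion_loss_kw(load_kw, 0.98)     # ~20 kW lost
print(f"Old losses: {old:.1f} kW, new losses: {new:.1f} kW, saved: {old - new:.1f} kW")
```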
Water cooling and efficiency
178 nodes - AMD Opteron 6128HE CPUs (Magny Cours) - 16 GB RAM. Measurements taken by LRZ.
• With air cooling, the CPUs operate at about 5°C below the maximum case temperature
• Normal operation of a water-cooled server is with water at 20°C, which is about 40°C below the maximum case temperature
Water cooling = no fans, low noise
• Fans consume about 20 W per node in «normal» operation!
• This is roughly 5% of peak power: a small contribution per se, but the sum of all the contributions described gives a considerable positive delta in energy efficiency
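A minimal sketch of the "sum of contributions" point. The ~20 W fan figure and the 5%-of-peak ratio come from the slide above (implying a node peak around 400 W); the other per-node savings are assumptions used only to illustrate how small gains compound.

```python
# Illustrative only: how several small per-node savings add up.
# All numbers are assumptions for the sketch except the ~20 W fan figure.

node_peak_w = 400.0            # implied by "20 W is roughly 5% of peak power"
savings_w = {
    "eliminated fans": 20.0,          # from the slide above
    "DC/DC efficiency gain": 10.0,    # assumed
    "better working points": 5.0,     # assumed
}
total = sum(savings_w.values())
print(f"Per-node saving: {total:.0f} W ({100 * total / node_peak_w:.1f}% of peak)")
```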
Some efficiency results - measurements taken on a single Aurora Tigon node card
Intel Xeon E5 + Nvidia K20x: peak efficiency 3.57 Gflop/s per Watt
Some efficiency results - measurements taken on a single Aurora Tigon node card
Intel Xeon E5 + Nvidia K20x: peak efficiency 3.63 Gflop/s per Watt
Some efficiency results - scalability
• Single-node analysis gives the settings to find the most efficient working point
• Scalability over a rack (128 nodes) is 90%
• Rmax measured over a 128-node system: 215 Tflop/s
• Efficiency over a 128-node system: 3.2 Gflop/s per Watt

System configuration:
• 128 node cards
• 2x Nvidia K20 GPUs and 2x Intel Xeon E5-2687W CPUs per node, 16 GB of RAM
• Direct water cooling, water temperature 19 ± 1°C
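A minimal sketch (illustrative only) of how the rack-level figures follow from the single-node numbers above; treating rack efficiency as single-node efficiency times the scalability factor is an assumption, not a statement of the original methodology.

```python
# Illustrative check of the rack-level efficiency figures above.
# The 90% scalability and the single-node 3.57 Gflop/s/W come from the slides.

single_node_gflops_per_w = 3.57
scalability = 0.90
rack_gflops_per_w = single_node_gflops_per_w * scalability
print(f"Estimated rack efficiency: {rack_gflops_per_w:.2f} Gflop/s per Watt")   # ~3.21

rmax_tflops = 215.0                                # measured over 128 nodes
rack_power_kw = rmax_tflops * 1000 / 3.2 / 1000    # Tflop/s -> Gflop/s, / (Gflop/s per W), W -> kW
print(f"Implied rack power during LINPACK: ~{rack_power_kw:.0f} kW")
```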
Number 1 in the Green500 list
The Eurora supercomputer, built on the same Aurora Tigon architecture, is installed at Cineca.
The energy efficiency measured on Eurora following the Green500 methodology was 3.21 GFlops/Watt. This result placed Eurora first in the Green500 list.

Eurora supercomputer - the world's most efficient architecture
The measurements were taken with a calibrated power meter, with the system running a customized version of LINPACK.
Eurora supercomputer
• System: 64 nodes, 128 CPUs, 128 GPUs
• Node card: Intel Xeon E5-2687W (150 W), 2x NVIDIA Tesla K20, 1x InfiniBand QDR
• Ambient temperature: 20°C ± 1°C
• Coolant temperature: 19°C ± 1°C
• Coolant: water
• Flow rate: 120 l/h ± 7 l/h per Eurora board
Reducing cooling energy
Ways to reduce cooling energy consumption:
• Air cooling optimization (hot and cold aisle containment, ...)
• Free cooling: avoid compressor-based cooling (chillers) by using cold air coming from outside the data center. Possible only in cold climates or seasonally
• Free cooling with heat exchangers (dry coolers). Dry coolers consume much less energy than chillers!
• Liquid cooling, to increase the cooling efficacy and reduce the power absorbed by chillers
• Liquid cooling with free cooling: the liquid is not cooled by chillers but by dry coolers
• Hot liquid cooling allows the use of dry coolers all year round, and also in warm climates
• Liquid cooling using a natural cold source
• Alternative approaches: spray cooling, oil submersion cooling

Eurotech Aurora approach:
• Direct hot water cooling with no chillers, only dry coolers
Internal cooling loop
[Diagram: internal cooling loops #1 to #12, with pump, heat exchanger, dry cooler and filter]
Aurora liquid cooling infrastructure
[Diagram: cooling loops #1 and #2 with bypass/heater, chillers and dry coolers]
Pumps consume energy, but they can control the flow rate.
Increasing the flow rate is much less energy demanding than switching on a chiller.
Advantages of the Eurotech approach

Hot liquid cooling, no chillers: saves energy
• Avoid/limit expensive and power-hungry chillers, with a cooling method that almost always requires only dry coolers
• Minimize PUE and hence maximize energy cost savings
• Reuse thermal energy for heating, air conditioning, electrical energy or industrial processes
• “Clean” free cooling: no dust, no filters needed to filter external air

Direct liquid cooling via cold plates: effective cooling
• Allows very limited heat spillage
• Maximizes the effectiveness of cooling, allowing hot water to be used (up to 55°C inlet water)

Comprehensive: more efficiency
• Cools any source of heat in the server (including the power supply)
Aurora results
• Use of telecom technology: low cost, high reliability, low level of maintenance, easy to control
• Power conversion with a high level of redundancy
• Very high efficiency at EVERY load of the computer
• The AC/DC conversion efficiency increased from 96% to close to 97%
• 11 kW telecom rectifier shelves of 1U, digitally controlled
• Power distributed over the backplane at 54 V, minimizing Ohmic losses
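A brief sketch of why a higher backplane voltage cuts Ohmic losses. The 54 V figure and the 11 kW shelf rating come from the slide; the 12 V comparison and the 1 milliohm distribution resistance are assumptions chosen only to make the scaling visible.

```python
# Illustrative sketch: Ohmic loss scales with current squared, so delivering the same
# power at a higher voltage (lower current) sharply reduces distribution losses.

def ohmic_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """I^2 * R loss for delivering `power_w` at `voltage_v` over `resistance_ohm`."""
    current = power_w / voltage_v
    return current ** 2 * resistance_ohm

power = 11_000.0          # one 11 kW rectifier shelf, from the slide
trace_r = 0.001           # assumed 1 milliohm distribution resistance
print(f"Loss at 54 V: {ohmic_loss_w(power, 54, trace_r):.0f} W")
print(f"Loss at 12 V: {ohmic_loss_w(power, 12, trace_r):.0f} W")
```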
Three-stage cooling + heat recovery
[Diagram: liquid-to-liquid heat exchanger splitting 1 MW into 0.87 MW (55°C circuit) and 0.13 MW (30°C circuit)]
Thermal energy re-use
[Diagram: computing system racks 1 to #n connected through a liquid-to-liquid heat exchanger; water temperatures of 20°C, 25°C and 30°C in the loops]
Minimize waste: thermal energy re-use
PUE < 1 !!
Minimize waste: thermal energy re-use
• The ability to effectively re-use the waste heat from the outlets increases with higher temperatures
• Outlet temperatures starting from 45°C can be used to heat buildings; temperatures starting from 55°C can be used to drive adsorption chillers
• Higher temperatures may even allow for trigeneration, the combined production of electricity, heating and cooling
• Warm water can also be used in industrial processes
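One way to read the "PUE < 1" claim above: if re-used heat is credited one-for-one against the facility's purchased energy, the effective ratio can drop below one. A minimal sketch, assuming a 1 MW IT load, the 0.87 MW recovery figure from the earlier diagram, and a hypothetical facility PUE of 1.05 before any reuse credit:

```python
# Sketch of how re-using waste heat can push the *effective* PUE below 1.
# The 1 MW IT load, 0.87 MW recovered heat and 1.05 mechanical PUE are
# assumptions for illustration, loosely based on figures in this deck.

it_power_mw = 1.0
facility_pue = 1.05                      # assumed facility PUE without any reuse credit
recovered_heat_mw = 0.87                 # heat delivered to consumers (e.g. building heating)

total_power_mw = it_power_mw * facility_pue
effective_pue = (total_power_mw - recovered_heat_mw) / it_power_mw
print(f"Effective PUE with heat-reuse credit: {effective_pue:.2f}")  # ~0.18
```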
Thermal energy recovery and swimming pools
A 50 m swimming pool, 4 lanes, 2 m deep, loses 2°C per day if not heated. The heat exchange system has 90% efficiency.
Volume of water = 2.5 m x 4 x 50 m x 2 m = 1000 m^3 = 10^6 litres = 10^6 kg
Specific heat of water c = 4186 J/(kg K)
Target water temperature = 28°C
How much power do we need to keep the swimming pool at 28°C?

P = Q/t = m * c * deltaT / t = 10^6 kg * 4186 J/(kg K) * 2 K / (24*60*60 s) = 96,898 W ≈ 96.9 kW

Accounting for the 90% heat exchange efficiency, about 96.9 / 0.9 ≈ 108 kW of waste heat is needed, so we need a supercomputer generating roughly 110 kW. Assuming an energy efficiency of 900 Mflop/s per Watt, to heat the swimming pool we would need to install a 100 Tflop/s system. This is, for instance, one Eurotech Aurora HPC 10-10 rack.
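The same back-of-the-envelope calculation as a runnable sketch; all input figures are taken from the slide above, and the formulas are basic heat physics.

```python
# Back-of-the-envelope check of the swimming pool example above.

mass_kg = 1.0e6            # 1000 m^3 of water
c_water = 4186.0           # specific heat, J/(kg*K)
delta_t_per_day = 2.0      # K lost per day if not heated
seconds_per_day = 24 * 60 * 60
hx_efficiency = 0.90       # heat exchange system efficiency

heating_power_w = mass_kg * c_water * delta_t_per_day / seconds_per_day
required_waste_heat_w = heating_power_w / hx_efficiency

gflops_per_watt = 0.9      # 900 Mflop/s per Watt, from the slide
system_tflops = required_waste_heat_w * gflops_per_watt / 1000.0

print(f"Heating power needed: {heating_power_w / 1000:.1f} kW")                 # ~96.9 kW
print(f"Waste heat required (90% HX): {required_waste_heat_w / 1000:.0f} kW")   # ~108 kW
print(f"Corresponding HPC system: ~{system_tflops:.0f} Tflop/s")                # ~100 Tflop/s
```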
Comparison - investment
Investment (K US dollars)       Datacenter A   Datacenter B   Datacenter C
Servers                         $6,200         $6,200         $6,200
Network and other IT            $440           $440           $440
Building                        $1,260         $540           $360
Racks                           $280           $120           $60
Cooling                         $2,670         $3,060         $1,660
Electrical                      $3,570         $3,570         $2,420
TOTAL INVESTMENT                $14,420        $13,930        $11,140
Data center A - PUE 2.2 - medium density (20 kW per rack) - air cooled
Data center B - PUE 1.6 - high density (50 kW per rack) - optimized air cooling, rear door liquid cooling
Data center C - PUE 1.05 - high density (87 kW per rack) - direct hot liquid cooling, floating Tamb
Comparison – annualized TCO
Annual cost (K US dollars)                        Datacenter A   Datacenter B   Datacenter C
Cost of energy                                    $1,970         $1,430         $640
Retuning and additional CFD                       $6             $3             $0
Total outage cost                                 $270           $270           $230
Preventive maintenance                            $150           $150           $150
Annual facility and infrastructure maintenance    $310           $290           $140
Lighting                                          $5             $3             $2
Annualized 3-year capital costs                   $2,040         $2,000         $1,980
Annualized 10-year capital costs                  $880           $940           $540
Annualized 15-year capital costs                  $130           $60            $40
ANNUALIZED TCO                                    $5,761         $5,146         $3,722
Data center A - PUE 2.2 - medium density (20 kW per rack) - air cooled
Data center B - PUE 1.6 - high density (50 kW per rack) - optimized air cooling, rear door liquid cooling
Data center C - PUE 1.05 - high density (87 kW per rack) - direct hot liquid cooling, floating Tamb
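A hedged sketch of how the "cost of energy" row above relates to PUE. The 1 MW average IT load and $0.10/kWh electricity price are assumptions chosen for illustration, not the values behind the TCO table, so the output only roughly tracks the table's figures.

```python
# Rough sketch: annual energy cost as a function of PUE.
# The IT load and electricity price below are assumed for illustration.

HOURS_PER_YEAR = 8760

def annual_energy_cost_kusd(pue: float, it_load_kw: float = 1000.0,
                            usd_per_kwh: float = 0.10) -> float:
    """Total facility energy = IT energy * PUE; returns cost in thousands of USD."""
    return it_load_kw * pue * HOURS_PER_YEAR * usd_per_kwh / 1000.0

for name, pue in [("Datacenter A", 2.2), ("Datacenter B", 1.6), ("Datacenter C", 1.05)]:
    print(f"{name}: ~${annual_energy_cost_kusd(pue):,.0f}K per year")
```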
Main areas of impact on TCO, and links to sustainability

• Energy savings: lower cost due to less energy consumed. Sustainability impact: high
• Space savings: savings in real estate, racks, electrical, cooling and network. Sustainability impact: high
• Reliability: savings in downtime indirect cost and maintenance. Sustainability impact: medium