Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling...

34
Jie Wei Fujitsu Advanced Technologies Limited Liquid Cooling Challenges & Opportunities of the Technology for HPC Systems ECTC2015/CPMT, San Diego, CA, May 28, 2015

Transcript of Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling...

Page 1: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Jie Wei Fujitsu Advanced Technologies Limited

Liquid Cooling Challenges & Opportunities of the Technology for HPC Systems

ECTC2015/CPMT, San Diego, CA, May 28, 2015

Page 2: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

• System reliability: 100X

• Power consumption: 0.5X

7 years ago

2 © Fujitsu

Page 3: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Topics

3

• Liquid cooling, back to the future - the state of the art technologies - high density, toward volumetric scalability

• Challenges, design & implementation - packaging & thermal capability - reliability and product validation

• Opportunities, cooling and beyond - energy efficiency, saving, and reuse - bring ITE and facility together

© Fujitsu

Page 4: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Cold plate

CPU/LSI chips

Coolant/ Water flow

Liquid-Cooled Electronics • Capability for high density packaging • Energy efficiency at datacom level

© Fujitsu

Page 5: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Indirect liquid cooling

Direct liquid cooling

© Intel

© Fujitsu

Page 6: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Open-loop to facility cooling

Cold plates

Coolant/Water to facility cooling

Cold plates Air-cooled

heat-exchanger

Liquid pumps Closed-loop on board

© Fujitsu

Page 7: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

7

Fujitsu PRIMEHPC FX10 (2012)

・Air & water hybrid cooling ・Open-loop/chilled water full cooling

© Fujitsu

Page 8: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Fujitsu PRIMEHPC FX100 (2014)

・High density packaging ・Open-loop/chilled water full cooling

© Fujitsu

Page 9: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

IBM BG/Q Sequoia (2012)

・Thermal contact structure ・Open-loop/chilled water full cooling

© IBM

Page 10: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

10

IBM Aquasar (2012)

・Zero-emission ・Open-loop/warm water full cooling

© IBM

Page 11: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

HP Apollo 8000 (2014)

Pumped water circulation under vacuum

Heat-pipe dry-disconnect with rack water cooling

© Hewlett Packard

Page 12: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

12

Immersion (2013/2014) ExaScaler/KEK

Suiren

NEC/TIT TSUBAME-KFC

Allied Control ASIC Miner

Page 13: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

13

Design & Implementation of the LC Components

- Packaging/thermal capability - Reliability and product validation

Page 14: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

14

Design: packaging & cooling • Performance, mfg., cost, maintenance • Materials and novel technologies

- 1U_board - Hybrid cooling

© Fujitsu

Page 15: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

15

Methodology: cold plates

© IBM Power 775

Cold-plate with embedded tubes

Cold-plate with finned mini-channels

© Fujitsu FX10

Page 16: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Compliant & integrated tubing

Flow-channel

Cold-plates

Mechanics: structure & tubing

© Fujitsu

Page 17: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

System board

Cold plates

Thermal: hybrid configuration Air convection

17 © Fujitsu

プレゼンター
プレゼンテーションのノート
モデルの大きさの違いをシステムボードを例に説明します
Page 18: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

18

Implementation: reliability & product validation

© Fujitsu

Page 19: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

- Electronics on thermal management - System control, redundancy, detection

- Mechanical design & verification

19

Reliability issues

Leakage Performance

© Fujitsu

Page 20: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

20

ASHRAE Guidelines

• Liquid Cooling Guidelines for Datacom Equipment Centers • Datacom Equipment Power Trends and Cooling Applications

ASTM Standards

• ASTM D1384-05 Standard Test Method for Corrosion Test of Engine Coolants in Glassware • ASTM D4340-96 Standard Test Method for Corrosion of Cast Aluminum Alloys in Engine Coolants Under Heat Rejecting

UL/ANSI Standards

• UL 1995 Heating and Cooling Equipment (includes thermal cycling, aging for gaskets, pressure, and fatigue tests) • 109 Tube Fittings for Flammable and Combustible Fluids, Refrigeration Service and Marine USE

RoHS Specifications

• Directive 2002/95/EC of the European Parliament and of the Council on the restriction of the use of certain hazardous substances in electrical and electronic equipment

Standards & specifications

© Fujitsu

Page 21: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

- Coolants/Fluids - Deionized water of ASTM D1193-06, type II, grade A - 100-1000 ppm BTA – copper corrosion inhibitor

- Materials - Copper, brasses: low zine <15%, low lead - Stainless steel: low carbon 304, 304L, 316 Homogenized and passivated - Plastics / Rubber: Flammability with UL 94 V1 or VW1 Geometrical stable of no swelling Copper

Aluminum

21

Compatibility of coolants & materials

© Fujitsu

Page 22: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

22

Cold plate

Implementation & assembling

Electronic assembling

Cooling unit manufacturing/brazing

Manufacturing Assembling & test

Inspecting & validation

© Fujitsu

Page 23: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

23

Life Test • Long-term heat load testing or Accelerated life testing - temperature cycle, high-temperature/humidity, pressure - thermal/flow load testing for system performance variability

Component/Unit Seal Validation

• Helium leak testing, with thermal cycle testing • Chemical compatibility testing, Tubing permeability testing • Burst testing (UL1995 pressure cycle at low and high temp.)

Coolant Lifetime Validation

• Fluid breakdown testing (ASTM D1384/D4340) • System level fluid loss and/or permeation testing • Long-term storage testing (corrosion and fluid volume)

Component/Unit Freeze/Thaw Test

• Max./Min. shipping, operating, storage temperatures • Freezing-point validation for water-based solutions

Product validation

© Fujitsu

Page 24: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

24

Cooling and Beyond

© Fujitsu

Page 25: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Power density & environment

Source: Emerson Network Power, “Data Center 2025”

© ASHRAE

Page 26: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

ICT

rac

k

24℃

Fan

CRAC

Chiller

Coo

ling

tow

er

Fan

Fan 18℃

Pump Pump

9℃

Rack/CRAC fans ITE Chiller pump

Tower pump/fans

Compressor

Bring ITE & facility together Systematic optimization for - power / space / volume densities - energy / cooling efficiency

Electric required for an air-cooling DC © Fujitsu 26

Page 27: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

ICT

rac

k Fa

n

Chiller

Coo

ling

tow

er

Pump Pump

Fan CDU

20℃

Coolant pump ICT Chiller pump

Tower pump/fans

Compressor

18℃

9℃ Pump

Electric required for an water-cooling DC

Power consumption

Rack fans

CRAC fans

CDU pumps

Chiller pumps

Refri. compressor

Tower Pumps/fans

Air convection O O X O O O

Liquid circulation Ñ X O O O O

Immersion bath X X O O O O

© Fujitsu 27

Page 28: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

28

Expanded cooling margins

© Fujitsu

20

30

40

50

60

70

80

90

1 2 3 4 5 6 7 8

CPU_Tc

ambient

air-

cool

ed h

eat s

ink

cool

ing

wat

er fo

r C

RAC

/CD

U

cool

ing

mar

gin

wat

er-c

oole

d co

ld p

late

Air- Water- cooling cooling

Tem

pera

ture

/ ºC

・CPU power: 150W ・CPU package: 1U ・cooling water required - air cooling: 27ºC - liquid cooling: 65ºC

Page 29: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

29

Power/Energy saving for cooling

Air- 1.0 Liquid- 0.7 Liquid+Envir. 0.2

© Fujitsu

Refri.

CRAC

Pumping

Lighting, etc.

Page 30: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

5 years later ~2020 • Energy efficiency & reuse

• Density toward volumetric

© Fujitsu 30

Page 31: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Ultimate efficiency/density

© IBM

© Fujitsu 31

Page 32: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

Integration & innovation of the technologies for

Chip power: 50~100 W/cm2 , 500+ W/Chip Packaging: 2000+ W/Board

・ 3D PKG/cooling: 1~3 kW/cm3 ・ energy efficiency: PKG~DC, PUE<1.1 ・ reuse of exhaust: PUE~1.0

© Fujitsu 32

Page 33: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

• Liquid cooling - components, units and systems are considerably

complicated and greater reliability necessitated. - reliability and product validation in each step of

mfg./assembling process, is the most important.

• Cooling and beyond - Integration of the technologies from chip to system. - Co-design of the system for energy saving/reusing

from chip to environment, and power-plant.

In a Summary

33 © Fujitsu

Page 34: Challenges & Opportunities of the Technology for HPC Systems · 2018. 1. 31. · Liquid Cooling Challenges & Opportunities . of the Technology for HPC Systems . ECTC2015/CPMT, San

34