ISCA 2004 Tutorial: Thermal Issues for Temperature-Aware Computer Systems
Saturday, June 19th, 8:00am - 5:00pm
© Mircea Stan, Kevin Skadron, David Brooks, 2002


Presenters:

Kevin Skadron, CS Department, University of Virginia

Mircea Stan, ECE Department, University of Virginia

David Brooks, CS Department, Harvard University

Antonio Gonzalez, UPC-Barcelona, and Intel Barcelona Research Center

Lev Finkelstein, Intel Haifa


Overview

1. Motivation (Kevin)
2. Thermal issues (Kevin)
3. Power modeling (David)
4. Thermal management (David)
5. Optimal DTM (Lev)
6. Clustering (Antonio)
7. Power distribution (David)
8. What current chips do (Lev)
9. HotSpot (Kevin)


Motivation

• Power consumption: first-order design constraint
- unconstrained power is a theoretical max
- peak (instantaneous) power limits power delivery (dI/dt)
- sustained power limits thermal design/packaging
- max sustained power: thermal “virus”, same as thermal design power
- average active power and idle power limit mobile battery life, etc.
- Common fallacy: equating instantaneous power with temperature

• Power density is increasing even faster: thermal effects become more problematic.
- Moore’s Law: exponential increase
- Need power/temperature-aware computing!


Power density

From PACT 2000 keynote; source: Intel website

But this curve is flattening


Power-aware figures of merit

• Power (P): battery time (mobile), packaging (high-performance)

• Energy (PD): battery life (mobile), fundamental limits (kT)

• Energy-delay (PD^2): performance and low power

• Energy-delay^2 (PD^3): emphasis on performance

Power-aware ≠ low power
Similar to “old” VLSI complexity (A, AD, AD^2)
None of these are appropriate for thermal
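
A minimal sketch of how the four figures of merit above can rank the same two designs differently. The function name and the two design points (power in watts, delay in seconds) are invented for illustration and are not from the tutorial.

```python
def figures_of_merit(power_w, delay_s):
    """Return P, PD, PD^2 and PD^3 for one design point (lower is better)."""
    return {
        "P": power_w,                    # power
        "PD": power_w * delay_s,         # energy
        "PD^2": power_w * delay_s ** 2,  # energy-delay
        "PD^3": power_w * delay_s ** 3,  # energy-delay^2
    }

# Hypothetical design points: a fast, power-hungry design and a slow, frugal one.
fast = figures_of_merit(power_w=80.0, delay_s=1.0)
slow = figures_of_merit(power_w=30.0, delay_s=2.0)

for metric in ("P", "PD", "PD^2", "PD^3"):
    winner = "fast" if fast[metric] < slow[metric] else "slow"
    print(f"{metric}: fast={fast[metric]:.0f} slow={slow[metric]:.0f} -> {winner}")
```

With these made-up numbers the slow design wins on P and PD, while the fast design wins on PD^2 and PD^3, which is the sense in which the higher-order metrics put more emphasis on performance.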

Refs: R. Gonzalez et al. “Supply and threshold voltage scaling for low power CMOS”, JSSC, Aug. 1997
A. Martin et al. “Design of an Asynchronous MIPS R3000”, ARVLSI’97
J. Ullman, “Computational aspects of VLSI”, CS Press, 1984


Cooking-aware computing

Boiling water will come soon


Power and temperature are BAD

• and can be EVIL

Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html


Thermal issues

Temperature affects:
• Circuit performance
• Circuit power (leakage)
• IC reliability
• IC and system packaging cost
• Environment


Performance and leakage

Temperature affects:

• Transistor threshold and mobility

• Subthreshold leakage, gate leakage

• Ion, Ioff, Igate, delay

• ITRS: 85°C for high-performance, 110°C for embedded!

[Figure: Ion (NMOS) and Ioff plots]


Temperature-aware circuits

• Robustness constraint: sets Ion/Ioff ratio

• Robustness and reliability: Ion/Igate ratio

Idea: keep ratios constant with T: trade leakage for performance!

Refs: Ghoshal et al. “Refrigeration Technologies…”, ISSCC 2000
Garrett et al. “T3…”, ISCAS 2001


Resulting performance

25% - 30% extra performance (110°C to 0°C)

[Figure legend: regular, TAC]


Reliability

The Arrhenius equation: MTF = A · exp(Ea / (K · T))

MTF: mean time to failure at T
A: empirical constant
Ea: activation energy
K: Boltzmann’s constant
T: absolute temperature

Failure mechanisms:
• Die metallization (corrosion, electromigration, contact spiking)
• Oxide (charge trapping, gate oxide breakdown, hot electrons)
• Device (ionic contamination, second breakdown, surface-charge)
• Die attach (fracture, thermal breakdown, adhesion fatigue)
• Interconnect (wirebond failure, flip-chip joint failure)
• Package (cracking, whisker and dendritic growth, lid seal failure)

Most of the above increase with T (Arrhenius)
Notable exception: hot electrons are worse at low temperatures
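
As a rough illustration of the Arrhenius relation above, the sketch below computes how much longer the predicted MTF is at a cooler temperature than at a hotter one. The empirical constant A cancels in the ratio. The activation energy of 0.7 eV is an assumed, illustrative value (it varies by failure mechanism), and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def mtf_ratio(t_cool_c, t_hot_c, ea_ev):
    """Return MTF(t_cool) / MTF(t_hot) under MTF = A * exp(Ea / (K * T)).

    Temperatures are given in degrees Celsius and converted to kelvin,
    because the equation needs absolute temperature.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


# Assumed activation energy of 0.7 eV; 85°C and 110°C are the ITRS limits
# quoted earlier in the tutorial.
ratio = mtf_ratio(t_cool_c=85.0, t_hot_c=110.0, ea_ev=0.7)
print(f"Predicted MTF at 85°C is {ratio:.1f}x the MTF at 110°C")
```

Because the dependence is exponential, a modest temperature reduction produces a multiplicative change in predicted lifetime. This holds only to the extent that the Arrhenius model applies to the mechanism in question.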


Arrhenius or Erroneous?

“Hot” issue in thermal community: is the Arrhenius equation correct/relevant?

C. Lasance (Philips): “Erroneous” equation
• Claim: what really matters are thermal gradients in space and time, thermal cycling
• Will not solve the dispute here!
• Agreement: thermal issues are key for reliability, whether static or dynamic

Another famous quote: “We have a headache with Arrhenius” (T. Okada, Sony, when asked about reliability prediction methods)


Packaging cost

From Cray (local power generator and refrigeration)…

Source: Gordon Bell, “A Seymour Cray perspective”, http://www.research.microsoft.com/users/gbell/craytalk/


Packaging cost

To today…
• Grid computing: power plants co-located near compute farms
• IBM S/390: refrigeration

Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D


IBM S/390 refrigeration

• Complex and expensive

Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D


IBM S/390 processor packaging

Processor subassembly: complex!
C4: Controlled Collapse Chip Connection (flip-chip)

Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D


Intel Itanium packaging

Complex and expensive (note heatpipe)

Source: H. Xie et al. “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002


P4 packaging

• Simpler, but still…

Source: Intel web site


Environment

• Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
– This includes peripherals, possibly also manufacturing
– A DOE report suggested this percentage is much lower
– No consensus, but it’s still a lot

• Equivalent power (with only 30% efficiency) for AC
• CFCs used for refrigeration
• Lap burn
• Fan noise


Heat mechanisms

• Conduction
• Convection
• Radiation
• Phase change
• Heat storage


Conduction

• Similar to electrical conduction (e.g. metals are good conductors)
• Heat flows from high energy to low energy
• Microscopic (vibration, adjacent molecules, electron transport)
• No major displacement of molecules
• Needs a material: typically in solids (fluids: distance between molecules)
• Typical example: thermal “slug”, spreader, heatsink

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Conduction

Different materials (not a strong function of temperature)
Si – more variation

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Convection

• Macroscopic (bulk transport, mix of hot and cold, energy storage)
• Needs a material (typically in fluids: liquid, gas)
• Natural vs. forced (gas or liquid)
• Typical example: heatsink (fan), liquid cooling

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Radiation

• Electromagnetic waves (can occur in vacuum)
• Negligible in typical applications
• Sometimes the only mechanism (e.g. in space)

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Surface-to-surface contacts

• Not negligible, heat crowding
• Thermal greases (can “pump-out”)
• Phase Change Films (undergo a transition from solid to semi-solid with the application of heat)

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Phase-change

Thermal solutions evolution:
• Natural air cooling
• Forced-air cooling
• Liquid cooling
• Phase change (e.g. heat pipe)
• Refrigeration

Phase change:

a. Solid changing to a liquid: fusion, or melting,

b. Liquid changing to a vapor: evaporation, also boiling,

c. Vapor changing to a liquid: condensation,

d. Liquid changing to a solid: crystallization, or freezing,

e. Solid changing to a vapor: sublimation,

f. Vapor changing to a solid: deposition.


Thermal capacitance

• Example:

ρ(Aluminum) = 2,710 kg/m³

Cp(Aluminum) = 875 J/(kg·°C)

V = t·A = 0.000025 m³

Cbulk = V·Cp·ρ = 59.28 J/°C
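
The aluminum example above can be reproduced with a few lines. This is only a sketch: the function and parameter names are invented for illustration, while the numeric inputs are the ones on the slide.

```python
def bulk_thermal_capacitance(volume_m3, specific_heat_j_per_kg_c, density_kg_per_m3):
    """Lumped thermal capacitance in J/°C: volume x specific heat x density."""
    return volume_m3 * specific_heat_j_per_kg_c * density_kg_per_m3


# Slide values for aluminum: V = 0.000025 m^3, Cp = 875 J/(kg·°C), density = 2,710 kg/m^3
c_bulk = bulk_thermal_capacitance(
    volume_m3=0.000025,
    specific_heat_j_per_kg_c=875.0,
    density_kg_per_m3=2710.0,
)
print(f"Cbulk = {c_bulk:.2f} J/°C")
```

The units work out as m³ · J/(kg·°C) · kg/m³ = J/°C, which is the thermal analogue of a capacitance.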


Refrigeration

“Conventional” vs. thermo-electric (TEC)
• Can get T < T_amb (“negative” Rth!)
TEC: Peltier effect (can use for local cooling)


TEC electro-thermal model


Simplistic steady-state model

All thermal transfer: R = k/A (k is a constant that depends on the transfer mechanism and material, A is the area)

Power density matters!

Ohm’s law for thermals (steady-state):

V = I · R -> ΔT = P · R

T_hot = P · Rth + T_amb

Ways to reduce T_hot:

- reduce P (power-aware)

- reduce Rth (packaging)

- reduce T_amb (Alaska?)

- maybe also take advantage of transients (Cth)
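
A small sketch of the steady-state relation T_hot = P · Rth + T_amb and of the first three knobs listed above. The baseline numbers (50 W, 0.8 °C/W, 45 °C ambient) are illustrative assumptions, not values from the tutorial, and the function name is hypothetical.

```python
def hot_temperature(power_w, r_th_c_per_w, t_amb_c):
    """Steady-state hot-spot temperature: T_hot = P * Rth + T_amb."""
    return power_w * r_th_c_per_w + t_amb_c


# Assumed baseline operating point.
baseline = hot_temperature(power_w=50.0, r_th_c_per_w=0.8, t_amb_c=45.0)

# The three steady-state knobs, each changed in isolation.
lower_power = hot_temperature(power_w=40.0, r_th_c_per_w=0.8, t_amb_c=45.0)
better_package = hot_temperature(power_w=50.0, r_th_c_per_w=0.6, t_amb_c=45.0)
cooler_ambient = hot_temperature(power_w=50.0, r_th_c_per_w=0.8, t_amb_c=25.0)

print(f"baseline:        {baseline:.1f} °C")
print(f"reduce P:        {lower_power:.1f} °C")
print(f"reduce Rth:      {better_package:.1f} °C")
print(f"reduce T_amb:    {cooler_ambient:.1f} °C")
```

The fourth knob, exploiting transients through Cth, cannot be seen in a steady-state formula because it depends on how long the power is applied.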


Simplistic dynamic thermal model

Electrical-thermal duality:
V ↔ temperature (T)
I ↔ power (P)
R ↔ thermal resistance (Rth)
C ↔ thermal capacitance (Cth)
RC ↔ time constant

KCL:
differential eq.: I = C · dV/dt + V/R
difference eq.: ΔV = (I/C) · Δt − (V/(R·C)) · Δt
thermal domain: ΔT = (P/C) · Δt − (T/(R·C)) · Δt
(T = T_hot – T_amb)

One can compute stepwise changes in temperature for any granularity at which one can get P, T, R, C
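
The stepwise computation described above can be sketched as a forward-Euler loop over a power trace. This assumes a single lumped R and C, a time step much smaller than R·C, and illustrative values (Rth = 0.8 °C/W, Cth = 59.28 J/°C, constant 50 W); the function names are hypothetical.

```python
def step_temperature(t_rise, power_w, r_th, c_th, dt_s):
    """One step of dT = (P/C)*dt - (T/(R*C))*dt, with T measured above ambient."""
    return t_rise + (power_w / c_th) * dt_s - (t_rise / (r_th * c_th)) * dt_s


def simulate(power_trace_w, r_th, c_th, dt_s, t_rise0=0.0):
    """Apply step_temperature once per power sample; return the temperature history."""
    t_rise = t_rise0
    history = []
    for power_w in power_trace_w:
        t_rise = step_temperature(t_rise, power_w, r_th, c_th, dt_s)
        history.append(t_rise)
    return history


R_TH = 0.8    # °C/W, assumed
C_TH = 59.28  # J/°C, assumed
DT_S = 0.1    # s, far below the R*C time constant of about 47 s

# Constant 50 W for 500 s, i.e. roughly ten time constants.
history = simulate([50.0] * 5000, R_TH, C_TH, DT_S)
steady_state = 50.0 * R_TH  # what T = P * R predicts once transients die out

print(f"after ~1 time constant: {history[473]:.1f} °C above ambient")
print(f"after 500 s:            {history[-1]:.2f} °C above ambient")
print(f"steady-state P*R:       {steady_state:.2f} °C above ambient")
```

The power trace can change at every step, so the same loop works for per-interval power estimates from a simulator. A smaller time step, or an exact exponential update, would be needed if the sampling interval approached R·C.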


Combined package model

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001

Steady-state

Tj – junction temperature

Tc – case temperature

Ts – heatsink temperature

Ta – ambient temperature
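
A hedged sketch of how the four temperatures above relate if the package is modeled as a simple series chain of thermal resistances (junction to case, case to heatsink, heatsink to ambient). The figure itself is not reproduced here, so the series-chain form, the resistance names r_jc, r_cs, r_sa, and all numeric values are assumptions for illustration.

```python
def package_temperatures(power_w, t_ambient_c, r_jc, r_cs, r_sa):
    """Steady-state node temperatures for a series junction-case-sink-ambient chain.

    All of the power is assumed to flow through every resistance in turn,
    so each node sits P * R above the next node toward ambient.
    """
    t_sink = t_ambient_c + power_w * r_sa
    t_case = t_sink + power_w * r_cs
    t_junction = t_case + power_w * r_jc
    return {"Ta": t_ambient_c, "Ts": t_sink, "Tc": t_case, "Tj": t_junction}


# Assumed values: 50 W, 40 °C ambient, resistances in °C/W.
temps = package_temperatures(power_w=50.0, t_ambient_c=40.0, r_jc=0.3, r_cs=0.1, r_sa=0.4)
for name in ("Ta", "Ts", "Tc", "Tj"):
    print(f"{name} = {temps[name]:.1f} °C")
```

In this form the junction temperature is Ta + P · (r_jc + r_cs + r_sa), so the largest resistance in the chain is the most rewarding one to reduce.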


Itanium package model

Example: processor + 4 cache modules

Source: H. Xie et al. “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002


Thermal issues summary

• Performance, power, reliability
• Architecture-level: conduction only
• Convection: too complicated
• Radiation: can be ignored
• Use compact models for package
• Power density is key