
Managing MPSoCs beyond their Thermal Design Power*

Luca Benini
Università di Bologna & STMicroelectronics

Luca.benini@unibo.it

*ACK: INTEL, FP7: THERMINATOR, Artist-Design, ERC-Multitherman

Page 2:

Mobile SoCs are cool…right?

Wrong!

"So.. got myself a HTC (T-Mobile) HD2 […]. As i found out the problem is pretty common: damn thing restart itself - thermally related - the old CPU overheat problem. By searching the net I found out that it's pretty common with some HTC models. HD2 has it, Desire has it, Nexus One has it, hell even some xperia models have it.. about half of the devices powered by anything from the Snapdragon series could have it."

June 2011 - http://forum.xda-developers.com/showthread.php?t=982454

Why?

"ARM has unveiled its next generation "Eagle" (Cortex A15) processor, pitching it at everything from smartphones to energy-efficient servers. The A15 will be initially produced at 32nm or 28nm, although ARM claims the roadmap stretches down to 20nm. It will deliver clock speeds of up to 2.5GHz."

Aug. 2011 - http://www.pcpro.co.uk/news/360994/arm-preys-on-smartphones-and-servers-with-eagle

Page 3:

A 2011 Mobile SoC

• Tegra II
– TSMC 40nm (LP/G)
– A9 - 1GHz (G)
– GPU, etc. - 330MHz (LP)
– GeForce ULV (8 shaders)
– 2 separate Vdd rails
– 1MB L2$
– 32b LPDDR2 (600MHz DR)

• Tegra II 3D
– A9 - 1.2GHz
– GPU - 400MHz

[Figures: NVIDIA Tegra II SoC; SoC+DRAM PoP]

Page 4:

3D-SoCs are even worse

[Figure labels: Power wall, Thermal wall]

Page 5:

Outline

• Introduction
• Scalable Control
• Scalable model learning
• Experimental Environment
• Challenges ahead

Page 6:

DRM - General Architecture

• System (chip scale)
• Sensors
– Performance counters - PMU
– Core temperature
• Actuators - knobs
– ACPI states
– P-state: DVFS
– C-state: PGATING (power gating)
– Task allocation
• Controller
– Reactive
– Threshold/heuristic
– Control theory
– Proactive
– Predictors

[Figure: simulation snapshot. HW: CPU1 … CPUN, each with its L1, plus L2, network and DRAM. SW: O.S. running App.1 … App.N, each with Thread 1 … Thread N. The CONTROLLER reads TCPU, #L1MISS, #BUSACCESS, CYCLEACTIVE, … and sets f, v and PGATING.]

• Controller
– Minimize energy
– Bound the CPU temperature
• Layered approach
– Stack of controllers
– Energy controller
– Thermal controller

[Figure: CONTROLLER stack with an Energy block and a Thermal block; signal labels: WL, Ti, fEC,i, fTC,i]

Page 7:

Thermal & Power Modeling

• Power model (non-linear): P = g(task, f)
• Thermal model: linear dynamic state-space model

[Figure: tasks feed the power model, which gives the power Pj of task j; the thermal model maps the powers Pn,j to the temperatures Tj, Tn,j]
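The two models on this slide can be chained as in the following minimal sketch. It is only an illustration: the cubic power law, the first-order (single-state) thermal model and every coefficient (`k_dyn`, `p_static`, `a`, `b`, `t_amb`) are assumptions chosen for readability, not values from the talk.

```python
def power_model(activity, f_ghz, k_dyn=1.0, p_static=0.3):
    """Nonlinear power model P = g(task, f).

    Assumption: dynamic power scales with f^3 (voltage scaled with
    frequency) and with a task-dependent activity factor in [0, 1];
    p_static is a constant leakage term in watts.
    """
    return p_static + k_dyn * activity * f_ghz ** 3


def thermal_step(temp, power, t_amb=25.0, a=0.9, b=2.0):
    """One step of a linear, first-order, discrete-time thermal model.

    T(k+1) - Tamb = a * (T(k) - Tamb) + b * P(k), with 0 < a < 1.
    """
    return t_amb + a * (temp - t_amb) + b * power


def simulate(activity, f_ghz, steps, t0=25.0):
    """Chain the models: (task, frequency) -> power -> temperature."""
    power = power_model(activity, f_ghz)
    temps = [t0]
    for _ in range(steps):
        temps.append(thermal_step(temps[-1], power))
    return temps


if __name__ == "__main__":
    trace = simulate(activity=1.0, f_ghz=1.0, steps=100)
    print(f"temperature after 100 steps: {trace[-1]:.2f} C")
```

With these placeholder coefficients the temperature settles at Tamb + b*P/(1-a), which makes the role of the two model parameters easy to see.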

Page 8:

Thermal locality (direct implication of Fourier's law):
• Continuous model:
– Thermal neighborhood = physical neighborhood
• Discrete model:
– Thermal neighborhood depends on the sample time
• HotSpot simulation of an 'Intel SCC-like' 48-core chip
– Each core: area = 11.82 mm2, Pmax = 2.6 W
– Only Core(5,3) is powered on
– Thermal neighborhood: cores whose temperature rises by more than 0.1°C

[Figure: thermal neighborhood of Core(5,3) for different sample times; panels: Ts < 2ms, Ts < 50ms, Ts < 75ms, Ts < 0.1s, Ts < 0.25s, Ts < 0.5s, Ts > 0.5s; labels: O.S. time scale, Neighborhood]

• Thermal transient - model order
– Different materials result in different time constants [1]
– Silicon die, heat spreader, heat sink
– Second-order model

[Figure: thermal transient]

[1] W. Huang, "Differentiating the roles of IR measurement and simulation for power and temperature-aware design", 2009.

Page 9:

Thermal Controller

[Figure: controllers ordered by COMPLEXITY; image credit: Intel®, ISSCC 2007]

Threshold-based controller
• T > Tmax: low freq
• T < Tmin: high freq
• Cannot prevent overshoot
• Thermal cycle

Classical feedback controller
• PID controllers
• Better than the threshold-based approach
• Cannot prevent overshoot
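The threshold rule and its two drawbacks (overshoot and thermal cycling) can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the plant is a hypothetical first-order thermal model with a cubic power law, and the thresholds and frequencies are made-up numbers.

```python
def threshold_controller(temp, current_freq, t_max=80.0, t_min=70.0,
                         f_low=1.0, f_high=2.0):
    """T > Tmax -> low frequency, T < Tmin -> high frequency, else hold."""
    if temp > t_max:
        return f_low
    if temp < t_min:
        return f_high
    return current_freq


def plant_step(temp, freq, t_amb=25.0, a=0.9, b=1.0):
    """Hypothetical plant: first-order thermal model, power ~ freq^3."""
    return t_amb + a * (temp - t_amb) + b * freq ** 3


def run(steps=100):
    """Closed loop: the controller only reacts after the limit is crossed."""
    temp, freq = 25.0, 2.0
    temps, freqs = [], []
    for _ in range(steps):
        freq = threshold_controller(temp, freq)
        temp = plant_step(temp, freq)
        temps.append(temp)
        freqs.append(freq)
    return temps, freqs


if __name__ == "__main__":
    temps, freqs = run()
    switches = sum(1 for x, y in zip(freqs, freqs[1:]) if x != y)
    print(f"peak temperature: {max(temps):.2f} C, frequency switches: {switches}")
```

Because the controller acts only on the current measurement, the peak temperature ends up above Tmax and the frequency keeps toggling between the two levels.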

Model Predictive Controller
• Internal prediction: avoids overshoot
• Optimization: maximizes performance
• Centralized
– Aware of the thermal influence of neighbor cores
– All at once: MIMO controller
– Complexity!!!
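To make the "internal prediction" idea concrete, here is a minimal single-core sketch of a receding-horizon controller. It is not the QP-based MPC of the talk: it assumes a hypothetical first-order thermal model, a small set of discrete frequency levels, and a brute-force search that holds each candidate frequency over the whole horizon.

```python
def predict(temp, freq, horizon, t_amb=25.0, a=0.9, b=1.0):
    """Predict the temperature trajectory if `freq` is held for `horizon` steps."""
    trajectory = []
    for _ in range(horizon):
        temp = t_amb + a * (temp - t_amb) + b * freq ** 3
        trajectory.append(temp)
    return trajectory


def mpc_frequency(temp, f_target, levels, t_max=80.0, horizon=10):
    """Highest frequency <= f_target whose predicted trajectory stays <= t_max."""
    feasible = [f for f in levels
                if f <= f_target and max(predict(temp, f, horizon)) <= t_max]
    return max(feasible) if feasible else min(levels)


def closed_loop(steps=200, f_target=2.0,
                levels=(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)):
    """Receding horizon: re-optimize every step, apply only the first move."""
    temp = 25.0
    temps, freqs = [], []
    for _ in range(steps):
        freq = mpc_frequency(temp, f_target, levels)
        temp = predict(temp, freq, 1)[0]  # the plant equals the model here
        temps.append(temp)
        freqs.append(freq)
    return temps, freqs


if __name__ == "__main__":
    temps, freqs = closed_loop()
    print(f"peak temperature: {max(temps):.2f} C, lowest frequency: {min(freqs)}")
```

In this idealized setting, where the plant matches the model exactly, the temperature never crosses the limit while the frequency stays as close to the target as the constraint allows.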

[Figure: MPC block diagram. Blocks: Thermal Model, Optimizer. Signals: past input & output, future input, future output, target frequency, future error (+/-). The optimizer uses a cost function and constraints.]

Page 10:

MPC Scalability

MPC complexity
• Implicit - a.k.a. on-line
– Computational burden
• Explicit - a.k.a. off-line
– High memory occupation

[Figure: # explicit regions vs. # cores; image credit: Intel, ISSCC 2007]
• 4 cores: 36 explicit regions
• 8 cores: 1296 explicit regions
• 16 cores: out of memory

Complexity grows superlinearly with the number of cores!!

Page 11:

Addressing Scalability

On a short time window, power has a local thermal effect!

[Figure: # explicit regions vs. # cores. First series: 36 (4 cores), 1296 (8 cores), out of memory (16 cores). Second series: 8, 16, 32.]

One controller for each core. Each controller uses:
• local power & thermal model
• neighbors' temperatures

Fully distributed: complexity scales linearly with #cores
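A minimal sketch of the per-core scheme follows. It assumes a one-dimensional chain of cores, a hypothetical local model with a coupling coefficient `c` toward the neighbors, and a one-step prediction. These are illustrative choices, not the controller from the talk.

```python
def local_predict(t_own, t_neigh, freq, t_amb=25.0, a=0.9, b=1.0, c=0.02):
    """Local model: own dynamics plus heat exchanged with the neighbors."""
    coupling = c * sum(t - t_own for t in t_neigh)
    return t_amb + a * (t_own - t_amb) + coupling + b * freq ** 3


def local_controller(t_own, t_neigh, f_target, levels, t_max=80.0):
    """Highest frequency <= f_target with a predicted temperature <= t_max."""
    ok = [f for f in levels
          if f <= f_target and local_predict(t_own, t_neigh, f) <= t_max]
    return max(ok) if ok else min(levels)


def distributed_step(temps, f_targets, levels):
    """One decision per core; each core reads only its adjacent neighbors."""
    n = len(temps)
    freqs = []
    for i in range(n):
        neigh = [temps[j] for j in (i - 1, i + 1) if 0 <= j < n]
        freqs.append(local_controller(temps[i], neigh, f_targets[i], levels))
    return freqs


if __name__ == "__main__":
    print(distributed_step([30.0, 82.0, 30.0], [2.0, 2.0, 2.0], [1.0, 1.5, 2.0]))
```

Each core solves a problem of fixed size, so the total work per control step grows linearly with the number of cores.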

Page 12:

Distributed Control

[Figure: multicore with Core 1 … Core i … Core N]

Distributed and hierarchical controllers:
• Energy Controller (EC)
– Output: frequency fEC
– Minimize power, CPI based
– Performance degradation < 5%
• Temperature Controller (TC)
– Distributed MPC
– Inputs: fEC, TCORE, TNEIGHBOURS
– Output: core frequency (fTC)
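One way a CPI-based energy controller could bound the performance loss is sketched below. The model is an assumption, not the talk's algorithm: the time per instruction is split into a core part that scales with 1/f and a memory-stall part that does not, and the controller picks the lowest frequency whose predicted slowdown stays within 5%.

```python
def slowdown(f_ghz, f_max_ghz, cpi_core, mem_time_ns):
    """Predicted time per instruction at f relative to f_max.

    cpi_core: cycles per instruction spent in the core (scales with 1/f).
    mem_time_ns: memory stall time per instruction (frequency independent).
    """
    t = cpi_core / f_ghz + mem_time_ns
    t_ref = cpi_core / f_max_ghz + mem_time_ns
    return t / t_ref


def energy_controller(levels, cpi_core, mem_time_ns, max_loss=0.05):
    """Lowest frequency whose predicted performance loss is <= max_loss."""
    f_max = max(levels)
    ok = [f for f in levels
          if slowdown(f, f_max, cpi_core, mem_time_ns) <= 1.0 + max_loss]
    return min(ok)  # f_max is always in `ok`, so the list is never empty


if __name__ == "__main__":
    levels = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
    print("CPU-bound   :", energy_controller(levels, cpi_core=1.0, mem_time_ns=0.0))
    print("memory-bound:", energy_controller(levels, cpi_core=1.0, mem_time_ns=10.0))
```

A CPU-bound phase keeps the maximum frequency, while a memory-bound phase can run much slower at almost no performance cost; the resulting fEC is then handed to the temperature controller.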

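[Figure: one controller node per core. ECi takes CPIi,k+1 and outputs fEC i,k+1; TCi takes fEC i,k+1, Ti,k and the neighbors' temperatures (Ti-1,k, Ti+1,k, …) and outputs fTC i,k+1. The same structure is repeated up to core N (ECN, TCN, fEC N,k+1, fTC N,k+1).]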

Page 13:

High Level Architecture

[Figure: an Energy Controller feeds the Distributed Controller, which has one Thermal Controller per core (cores 1 to 4). Thermal Controller i receives CPI, fi,EC and Ti + Tneigh from the PLANT and outputs fi,TC.]

Page 14:

Distributed Thermal Controller

MPC Controller, Core 1
• Nonlinear part (frequency to power): g(·) maps f1,EC and CPI to P1,EC; g-1(·) maps P1,TC and CPI back to f1,TC
• Linear part (power to temperature): linear model, 2 states per core
• QP optimization (min … s.t. …), implicit formulation
• Observer: classic Luenberger state observer

[Figure: block diagram with signals CPI, f1,EC, P1,EC, P1,TC, f1,TC, x1, T1, TENV]
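The observer reconstructs the two states per core from the single measured temperature. Below is a minimal sketch of a discrete-time Luenberger observer in plain Python. The matrices `A`, `B`, `C` and the gain `L` are hypothetical values chosen so that A - L*C is stable; they are not identified from any real chip.

```python
# Hypothetical 2-state core model (temperature rises above ambient):
# x[0] is the measured core temperature, x[1] an unmeasured slower state.
A = [[0.8, 0.15],
     [0.1, 0.88]]
B = [0.5, 0.0]   # power enters the first state only
C = [1.0, 0.0]   # only x[0] is measured
L = [0.9, 1.5]   # observer gain (assumed); eigenvalues of A - L*C lie in the unit circle


def model_step(x, u):
    """x(k+1) = A x(k) + B u(k)."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]


def observer_step(x_hat, u, y):
    """x_hat(k+1) = A x_hat(k) + B u(k) + L (y(k) - C x_hat(k))."""
    innovation = y - (C[0] * x_hat[0] + C[1] * x_hat[1])
    pred = model_step(x_hat, u)
    return [pred[0] + L[0] * innovation, pred[1] + L[1] * innovation]


def run(steps=80, power=2.0):
    """The true state starts away from the estimate; the estimate catches up."""
    x, x_hat = [10.0, 5.0], [0.0, 0.0]
    for _ in range(steps):
        y = C[0] * x[0] + C[1] * x[1]  # sensor reading
        x_hat = observer_step(x_hat, power, y)
        x = model_step(x, power)
    return x, x_hat


if __name__ == "__main__":
    x, x_hat = run()
    print("true state:", x, "estimate:", x_hat)
```

The estimation error evolves as e(k+1) = (A - L*C) e(k), so it decays whenever the gain places those eigenvalues inside the unit circle.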

Page 15:

MPC trade-off

Trace-driven simulation (Matlab) - gold model
• Parsec traces obtained on real HW
• Power model: nonlinear vs. linear
• Thermal model: one vs. two states per core
• Centralized vs. distributed

Page 16:

Outline

• Introduction
• Scalable Control
• Scalable model learning
• Experimental Environment
• Challenges ahead

Page 17:

Model Identification

MPC needs a Thermal Model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing

"In field" Self-Calibration
• Force test workloads
• Measure core temperatures
• System identification

[Figure: MPC block diagram (Thermal Model, Optimizer, cost function, constraint, target frequency, past input & output, future input, future output, future error)]

[Figure: training tasks are executed as workloads over time on Core 1 … Core i … Core N of the multicore; workload, power and temperature feed System Identification, which produces the identified state-space thermal model]

Page 18:

LS System Identification

[Figure: a test pattern is applied to the system and to the model; the system response is collected as data, and a least-squares optimization returns the optimal parameters]

Parametric optimization
• Parameters: A, B
• Error function: e = Tmodel - Treal
• Cost function: least squares on the error
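As an illustration of least-squares identification, the sketch below fits a scalar model T(k+1) = a*T(k) + b*P(k) to a recorded response, with temperatures expressed as rises above ambient. The scalar model, the square-wave test pattern and the "true" coefficients used to generate the data are assumptions made for this example.

```python
def simulate_response(a, b, powers, t0=0.0):
    """System response to a test pattern (temperatures relative to ambient)."""
    temps = [t0]
    for p in powers:
        temps.append(a * temps[-1] + b * p)
    return temps


def fit_ls(temps, powers):
    """Least-squares estimate of (a, b) in T(k+1) = a*T(k) + b*P(k).

    Solves the 2x2 normal equations in closed form; expects
    len(temps) == len(powers) + 1.
    """
    s_tt = s_tp = s_pp = s_ty = s_py = 0.0
    for k, p in enumerate(powers):
        t, y = temps[k], temps[k + 1]
        s_tt += t * t
        s_tp += t * p
        s_pp += p * p
        s_ty += t * y
        s_py += p * y
    det = s_tt * s_pp - s_tp * s_tp
    a = (s_ty * s_pp - s_py * s_tp) / det
    b = (s_py * s_tt - s_ty * s_tp) / det
    return a, b


if __name__ == "__main__":
    # Test pattern: a square wave toggling the core power every 5 samples.
    pattern = [1.0 if (k // 5) % 2 == 0 else 0.0 for k in range(50)]
    data = simulate_response(0.9, 2.0, pattern)
    print("identified (a, b):", fit_ls(data, pattern))
```

With more states or more cores the same normal equations become a matrix problem, which is where the cost of a centralized fit starts to grow.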

Page 19:

Black-box Identification

Identification based on pure LS fitting

Problem: unstable model

[Figure: Tcore2 (°C) vs. time (seconds); curves: measured, thermal model]

Page 20:

Gray Box identification

[Figure: Tcore and Tamb vs. t]

With P = 0, Tcore cannot differ from Tamb (at steady state)

LS model must be constrained by physical properties to avoid over-fitting
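One simple way to build such physical knowledge into the fit is sketched below. It assumes a scalar model and enforces two properties: the model is written in deviations from ambient, so that P = 0 necessarily settles at Tamb, and the pole `a` is kept in [0, a_max] so the identified model cannot be unstable. This is an illustrative scheme, not the constrained formulation used in the talk.

```python
def fit_gray_box(temps, powers, t_amb, a_max=0.999):
    """Fit T(k+1) - Tamb = a*(T(k) - Tamb) + b*P(k) with 0 <= a <= a_max."""
    dev = [t - t_amb for t in temps]
    s_tt = s_tp = s_pp = s_ty = s_py = 0.0
    for k, p in enumerate(powers):
        t, y = dev[k], dev[k + 1]
        s_tt += t * t
        s_tp += t * p
        s_pp += p * p
        s_ty += t * y
        s_py += p * y
    det = s_tt * s_pp - s_tp * s_tp
    a = (s_ty * s_pp - s_py * s_tp) / det
    b = (s_py * s_tt - s_ty * s_tp) / det
    if not 0.0 <= a <= a_max:
        # The cost is convex, so the constrained optimum lies on the
        # violated bound; re-solve for b with a fixed at that bound.
        a = min(max(a, 0.0), a_max)
        b = (s_py - a * s_tp) / s_pp
    return a, b


def predict_next(temp, power, a, b, t_amb):
    """One-step prediction with the identified gray-box model."""
    return t_amb + a * (temp - t_amb) + b * power


if __name__ == "__main__":
    t_amb = 25.0
    powers = [1.0 if (k // 5) % 2 == 0 else 0.0 for k in range(50)]
    temps = [t_amb]
    for p in powers:
        temps.append(t_amb + 0.9 * (temps[-1] - t_amb) + 2.0 * p)
    a, b = fit_gray_box(temps, powers, t_amb)
    print("identified (a, b):", (a, b))
    print("idle core at ambient stays at:", predict_next(t_amb, 0.0, a, b, t_amb))
```

A black-box fit with a free offset term could instead place the idle temperature anywhere, which is the kind of non-physical behavior the constraint rules out.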

Page 21:

Physical Constraints

CONSTRAINED LEAST SQUARES
• Constraint on the initial condition
• Linear constraint

[Figure: Tcore2 [°C] vs. time (seconds); curves: measured, thermal model]

Page 22:

Model learning Scalability

MPC Weaknesses - 2nd: Internal Thermal Model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing

"In field" Self-Calibration
• Force test workloads
• Measure core temperatures
• System identification

[Figure: MPC block diagram; training tasks executed as workloads on Core 1 … Core i … Core N of the multicore; workload, power and temperature feed System Identification, which produces the identified state-space thermal model]

Complexity issue
• State-of-the-art is centralized MIMO
• Least squares is based on matrix inversion (cubic in #cores)

[Figure: identification time (s) vs. # cores]
• 4 cores: 105 s
• 8 cores: 1230 s
• 16 cores: > 1 h
• Second series: 1, 2, 4

Distributed approach: each core identifies its local thermal model. Complexity scales linearly with #cores

Page 23:

Distributed Model learning

Distributed MISO identification
• ARX model

[Figure: Modeli of Corei with inputs Pi(k-s), Ti(k-s) and the neighbors' temperatures TneigN,i(k-s), TneigE,i(k-s), TneigW,i(k-s)]

ARX Model

\[
T_i(k) = a_1\,T_i(k-1) + a_2\,T_i(k-2) + b_{1,1}\,P_i(k-1) + b_{1,2}\,P_i(k-2) + b_{2,1}\,T_{neigE,i}(k-1) + b_{2,2}\,T_{neigE,i}(k-2) + \dots + e(k)
\]

• AutoRegressive term: the past values of T_i
• Power input: the past values of P_i
• Neighbours' temperatures: the past values of T_{neig,i}
• Errors (disturbances, measurement errors): e

State Space Model

\[
\begin{bmatrix} T_i(k+1) \\ x_i(k+1) \end{bmatrix}
= A \begin{bmatrix} T_i(k) \\ x_i(k) \end{bmatrix}
+ B \begin{bmatrix} P_i(k) \\ T_{neig,i}(k) \end{bmatrix}
\]

Page 24:

Distributed model-learning

Distributed identification
• ARX model
• Computation algorithm
– Objective: find a_i and b_{i,j}
– System data collection:
– input: PRBS signals to all cores (persistently exciting input sequence)
– output: temperatures of all cores (T_i^o, with i = core index)
– Parameters computation:
– T^p(a_i, b_{i,j}) = T(k+1|k), computed with the previous equation
– T^o: output temperature (measured)
– Least-squares algorithm:

\[
\min_{a_i,\,b_{i,j}} \frac{1}{L} \sum_{k=1}^{L} \left( T^o(k) - T^p(k) \right)^2
= \min_{a_i,\,b_{i,j}} \frac{1}{L} \left\| T^o - T^p \right\|_2^2
\]
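A PRBS test pattern can be generated with a linear-feedback shift register. The sketch below uses the common PRBS7 polynomial x^7 + x^6 + 1 (period 127); the choice of polynomial, the seed and the mapping of bits to "idle"/"run" workload phases are assumptions made for illustration.

```python
def prbs7(n_bits, seed=0x7F):
    """Pseudo-random binary sequence from a 7-bit LFSR (x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def to_workload(bits):
    """Map the binary pattern to per-sample workload phases for one core."""
    return ["run" if b else "idle" for b in bits]


if __name__ == "__main__":
    pattern = prbs7(20)
    print(pattern)
    print(to_workload(pattern))
```

Giving each core a different seed or a time-shifted copy of the sequence is one way to keep the excitations of neighboring cores distinguishable during identification.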

• Results
1) Mean Absolute Error between original and identified models
2) Temperature response of core 1

Page 25:

Outline

• Introduction
• Scalable Control
• Scalable model learning
• Experimental Environment
• Challenges ahead

Page 26:

Simulator Virtual Platform

[Figure: virtual platform. Virtutech Simics with RUBY simulates CPU1, CPU2 … CPUN with L1, L2, network and DRAM (HW) and the O.S. with App.1 … App.N, threads T1 … TN (SW). Counters (PC, #hlt, stall, active cycles, #L1MISS, #BUSACCESS, CYCLEACTIVE, …) and fi, VDD feed the POWER MODEL (PCORE, PL1, PL2); P feeds the TEMPERATURE MODEL (T). The Matlab interface passes CPI, P and T to the controller in Mathworks Matlab, whose output fi drives the DVFS module (f, v, PGATING).]

Mathworks Matlab interface:
• New module named Controller in RUBY
• Initialization: starts the Mathworks Matlab engine as a concurrent process
• Every N cycles - wake-up:
– send the current performance monitor output to the Mathworks Simulink model
– execute one step of the controller Mathworks Simulink model
– propagate the Mathworks Simulink controller decision to the DVFS module

CONTROL-STRATEGIES DEVELOPMENT CYCLE
1. Controller design in the Mathworks Matlab/Simulink framework
• system represented by a simplified model
• obtained by physical considerations and identification techniques
2. Set of simulation tests and design adjustments done in Simulink
3. Evaluation of the tuned controller with an accurate model of the plant, done in the virtual platform
4. Performance analysis, by simulating the overall system

Page 27:

Results

Energy Controller (EC)
– Performance loss < 5%
– Energy minimization

Temperature Controller (TC)
– Complexity reduction
• 2 explicit regions per controller
– Performs as the centralized one
• Thermal capping

[Figure annotations: <0.3°, <3%, <3%]

Page 28:

Working on Real Chips (Intel)

[Figure: identification flow. Pattern Generator (PRBS.csv: ..0,0,1,1,1,…) → Workloader (..idle,idle,run,run,run,…) → core0, core1 … coreN on the HW (Ts=1/10ms) → XTS → DATA.csv → PREPROCESSING → LS MODEL]

DATA.csv contents:
• Temperature
• Frequency
• Workload

LS MODEL implementations:
• C/FORTRAN (SLICOT, MINPACK)
• Matlab System Identification
– N4SID
– PEM
– LS (Levenberg-Marquardt)

SUN FIRE X4270
• Intel Nehalem
• 8 cores/16 threads
• 2.9GHz
• 95W TDP
• IPMI

[Figure: SUN FIRE X4270 layout: fan board, CPU1, CPU2, RAM, chipset, storage drives, air flow]

Single Chip Cloud (45nm)
• 567.1 mm2
• 48 cores @ 1GHz
• 2GHz NoC
• 25-125W
• 27 (f), 8 (V) islands

Page 29:

SCC thermal calibration

Intel Single-Chip Cloud Computer thermal sensors and model calibration

[Figure: SCC tile with core sensor and router sensor]

Thermal sensors are ring oscillators:
#cycles = a + b*TEMPERATURE

We need to find a, b for each core!!

System level - thermal sensor characterization (@ steady state, @ const. Vdd)
T = k*PCORE + TAMB
#cycles = a + b*k*PCORE + b*TAMB
least squares (#cycles, PCORE, TAMB) → {a, b}

• Spatial variation in raw sensor values (> 50%)
• Hot - cold < 20%
Need of calibration!
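The per-core calibration amounts to a straight-line fit. The sketch below assumes that the thermal gain `k` and the ambient temperature `t_amb` are known for the steady-state measurements, so that each power level gives a reference temperature T = k*PCORE + TAMB. The numbers are synthetic, and the negative slope (the oscillator slowing down as it heats up) is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def calibrate_sensor(powers, counts, k, t_amb):
    """Estimate (a, b) in #cycles = a + b*T, with T = k*P + Tamb at steady state."""
    temps = [k * p + t_amb for p in powers]
    return fit_line(temps, counts)


def counts_to_temperature(count, a, b):
    """Invert the sensor law once a and b are known for this core."""
    return (count - a) / b


if __name__ == "__main__":
    # Synthetic steady-state measurements for one core.
    true_a, true_b, k, t_amb = 5000.0, -20.0, 2.0, 30.0
    powers = [0.5, 1.0, 1.5, 2.0, 2.5]
    counts = [true_a + true_b * (k * p + t_amb) for p in powers]
    a, b = calibrate_sensor(powers, counts, k, t_amb)
    print("a, b:", a, b)
    print("temperature at count 4200:", counts_to_temperature(4200.0, a, b))
```

Repeating the fit for every sensor gives each core its own (a, b) pair, which absorbs the spatial variation in the raw readings.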

Page 30:

Outline

• Introduction
• Scalable Control
• Scalable model learning
• Experimental Environment
• Challenges ahead

Page 31:

The 1,000 Cores Chip

• STM-CEA Platform 2012 project
– Die with 4 16-core tiles with L1 & L2: a few tens of mm2 (28nm)
– An SCC-sized die fits 20 of these dies: 1,280 cores
– Thousands of Vdd, f domains
• 3D stacking is currently the only technology which can provide sufficient L3 bandwidth
– Vertical thermal dissipation!
– Heterogeneous requirements (DRAM≠LOGIC)
• Major static and dynamic variability

Page 32:

Power management Challenges

• Truly scalable algorithms: O(N log N)
• Hardware support needed (e.g. DPM NoC)
• Cross-layer algorithms are needed
– Real-time intra+inter layer communication
– Abstraction and filtering
– Multi-scale
• The threat of non-linearity
– Hybrid control complexity (MILP is NP-hard)
– Lack of robustness (ill-conditioning) and stability proofs

SoCs as complex systems (societies/markets): DPM as political sociology/finance?

Page 33:

Thank you!

Page 34:

Management Loop: Holistic view

• Resource management policy
• SW
– SW-introspection (system tracing): system information from the OS, workload, CPU utilization, queue status
– SW-control: scheduling, allocation
– OS: scheduler; MW: job manager
• HW
– HW-introspection (monitors): perf. counters, temperature, power, reliability alarms
– HW-control (knobs): Vdd, Vb, f, on/off
• Multicore platform: Proc. 1, Proc. 2 … Proc. N (Core 1 … Core N), each with private memory, connected by the INTERCONNECT