Low-Power Integrated Systems - unibo.it circuits and... · H264 encoding H264 decoding Image...


Slide -1 -DEIS Doctoral School 2010

Low-Power Integrated Systems: A HW/SW perspective

Luca Benini DEIS Universita' di Bologna, Italy


Slide -2 -DEIS Doctoral School 2010

Outline

Introduction
System-Level Power modeling and estimation
Dynamic power management
    Shutdown-based
    Variable voltage devices
    Implementation strategies
Conclusions


Slide -3 -DEIS Doctoral School 2010

Embedded applications: Requirements

[Figure: embedded application requirements by year of introduction (2005 to 2015), plotted against energy-efficiency levels of 5 GOPS/W, 10 GOPS/W, 100 GOPS/W and 1 TOPS/W. Applications shown: sign recognition, A/V streaming, adaptive route, collision avoidance, autonomous driving, 3D projected display, HMI by motion, gesture detection, ubiquitous navigation, Si Xray, Gbit radio, UWB, 802.11n, structured encoding, structured decoding, 3D TV, 3D gaming, H264 encoding, H264 decoding, image recognition, fully recognition (security), auto personalization, dictation, 3D ambient interaction, language, emotion recognition, gesture recognition, expression recognition, mobile base-band. Sources: [Philips/IMEC], [DARPA08].]

0.1-1 TOPS/W embedded platforms by 2015!

Slide -4 -DEIS Doctoral School 2010

[Figure: Power trend. Power consumption (log scale) versus year, 1990 to 2020. Curves: dynamic power, sub-threshold leakage, gate-oxide leakage.]

[Figure: Power density trend. Power density (W/cm², 0 to 150) versus technology node (250nm, 180nm, 130nm, 90nm, 65nm). Series: leakage power, dynamic power. [STM ASIC]]

Annotations: technology innovations (e.g. high-k dielectrics); 30%. [Intel, Microsoft and Stanford]

The Era of “Power Limited Scaling”


Slide -5 -DEIS Doctoral School 2010

CMOS circuit power consumption components

P = ½ CswVdd ΔV f + IstVdd + IstaticVdd

Dynamic power consumption (½ CswVdd ΔV f + IstVdd)
• Load switching (including parasitic & interconnect)
• Glitching
• Shoot-through power (IstVdd)

Static power consumption (IstaticVdd)
• Current sources: bias currents
• Current-dependent logic: NMOS, pseudo-NMOS, CML
• Junction currents
• Subthreshold MOS currents
• Gate tunneling
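As a minimal sketch of how the three terms of the power equation combine, the following evaluates them for one operating point. The function name and all numeric values are hypothetical, chosen only for illustration.

```python
def cmos_power(c_sw, v_dd, delta_v, f, i_st, i_static):
    """Evaluate P = 1/2*Csw*Vdd*dV*f + Ist*Vdd + Istatic*Vdd (all SI units)."""
    switching = 0.5 * c_sw * v_dd * delta_v * f   # load switching and glitching
    shoot_through = i_st * v_dd                   # short-circuit current term
    dynamic = switching + shoot_through
    static = i_static * v_dd                      # leakage and bias currents
    return dynamic, static, dynamic + static

# Hypothetical operating point: 1 nF switched capacitance, full-swing 1.2 V, 100 MHz.
dynamic, static, total = cmos_power(
    c_sw=1e-9, v_dd=1.2, delta_v=1.2, f=100e6, i_st=5e-3, i_static=10e-3
)
print(f"dynamic = {dynamic * 1e3:.1f} mW")
print(f"static  = {static * 1e3:.1f} mW")
print(f"total   = {total * 1e3:.1f} mW")
```

With full-swing signaling (ΔV = Vdd) the switching term grows with the square of the supply voltage, which is why voltage reduction is the strongest lever on dynamic power.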

Slide -6 -DEIS Doctoral School 2010

Review of Constant Field Scaling

Parameter                     Value        Scaled Value
Dimensions                    L, W, Tox    αL, αW, αTox
Dopant concentrations         Na, Nd       Na/α, Nd/α
Voltage                       V            αV
Field                         Ε            Ε
Capacitance                   C            αC
Current                       I            αI
Propagation time (~CV/I)      t            αt
Power (VI)                    P            α²P
Density                       d            d/α²
Power density                 P/A          P/A

Scale factor α<1

[Figure: cross-section of a transistor on a conventional silicon substrate (source, gate, drain, STI isolation, electron flow). All features reduce in width and thickness; the shorter distance for electron flow produces faster transistors.]
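The derived rows of the constant-field table follow from the primary ones. A small sketch, assuming voltage, current and capacitance all scale by α as listed (the function and key names are illustrative only):

```python
def constant_field_scaling(alpha):
    """Per-generation multipliers under constant-field scaling (alpha < 1)."""
    voltage = alpha
    current = alpha
    capacitance = alpha
    delay = capacitance * voltage / current   # t ~ CV/I scales by alpha
    power = voltage * current                 # P = VI scales by alpha^2
    density = 1.0 / alpha ** 2                # devices per unit area
    power_density = power * density           # unchanged
    return {
        "voltage": voltage,
        "delay": delay,
        "power": power,
        "density": density,
        "power_density": power_density,
    }

factors = constant_field_scaling(0.7)
for name, value in factors.items():
    print(f"{name:>14}: x{value:.2f}")
```

The power per device falls as fast as the device density rises, so the power density stays constant as long as the voltage really scales with the dimensions.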


Slide -7 -DEIS Doctoral School 2010

Supply Voltage Trend

With each generation, voltage has decreased 0.85x, not 0.7x as for constant field scaling. Thus, energy/device is decreasing by 50% rather than 65%.

[Figure: Vdd (volts, 0 to 2.5) versus technology node: 0.25 µm, 0.18 µm, 0.13 µm, 90 nm, 65 nm, 45 nm. Annotation: slow decline to 0.7V in 22nm (some think nothing below 0.9V).]

P = ½ CswVdd ΔV f + IstVdd + IstaticVdd
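The 50% versus 65% figures can be checked with a short calculation, assuming the switching energy per device goes as C·V² and that capacitance still scales by 0.7x per generation (both are assumptions of this sketch, taken from the constant-field rules):

```python
def energy_scale(cap_scale, v_scale):
    """Multiplier on switching energy per device, E ~ C * V^2."""
    return cap_scale * v_scale ** 2

constant_field = energy_scale(0.7, 0.7)   # voltage scaled by 0.7x
observed = energy_scale(0.7, 0.85)        # voltage scaled by only 0.85x

print(f"constant field: energy x{constant_field:.3f} "
      f"({(1 - constant_field) * 100:.0f}% lower)")
print(f"observed:       energy x{observed:.3f} "
      f"({(1 - observed) * 100:.0f}% lower)")
```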

Slide -8 -DEIS Doctoral School 2010

Active Power Trend

But the number of transistors has been increasing, thus:
− a net increase in energy consumption
− with freq 2x, active power is increasing by 50%

(src: ITRS ’01-’05)

[Figure: expected HP MP power (W, 100 to 300) versus technology (160 down to 20), with curves for ITRS’01 and ITRS’05. Annotation: 198 Watts forever!]

P = ½ CswVdd ΔV f + IstVdd + IstaticVdd


Slide -9 -DEIS Doctoral School 2010

Recent (180nm – 65nm) “Real Scaling”

Parameter                Value        Scaled Value              Real Scaling
Dimensions               L, W, Tox    0.7 L, 0.7 W, 0.7 Tox
Dopant concentrations    Na, Nd       1.4 Na, 1.4 Nd
Voltage                  V            0.7 V                     0.9 V
Performance              F            1.4 F                     2.0 F
Power/device             P            0.5 P                     1.0 P
Power/chip               P            1 P                       1.5 P
Power density            P/A          P/A                       2.0 P/A

Slide -10 -DEIS Doctoral School 2010

65nm – 22nm “Projected Scaling”

Parameter                Value        Scaled Value              Projected Scaling
Dimensions               L, W, Tox    0.7 L, 0.7 W, 0.7 Tox
Dopant concentrations    Na, Nd       1.4 Na, 1.4 Nd
Voltage                  V            0.7 V                     0.9 V
Performance              F            1.4 F
Power/device             P            0.5 P                     0.8 P
Power/chip               P            1 P                       1.2 P
Power density            P/A          P/A                       1.2 P/A

198 Watts forever!? How?


Slide -11 -DEIS Doctoral School 2010

Active-Power Reduction Techniques

P = ½ CswVdd ΔV f + IstVdd + IstaticVdd

Active power can be reduced through:
− Capacitance minimization
    − Power/Performance in sizing
    − Clock-gating
    − Glitch suppression
    − Hardware-accelerators
    − System-on-a-chip integration
− Voltage minimization
    − (Dynamic) voltage-scaling
    − Low swing signaling
    − SOC/Accelerators
− Frequency minimization
    − (Dynamic) frequency-scaling
    − SOC/Accelerators

Slide -12 -DEIS Doctoral School 2010

Static Power

P = ½ CswVdd ΔV f + IstVdd + IstaticVdd

Static power consumption (IstaticVdd)
• Current sources: even uA bias currents can add up.
• NMOS, pseudo-NMOS: not commonly used
• CMOS CML logic: significant power for specialized use.
• Junction currents
• Subthreshold MOS currents
• Gate tunneling


Slide -13 -DEIS Doctoral School 2010

Passive Power Continues to Explode

[Figure: power density (W/cm², log scale from 0.001 to 1000) versus gate length (microns, 1 down to 0.01), with curves for active power, passive power and gate leakage; the marked interval runs from 1994 to 2005. Fit of published active and subthreshold CMOS device leakage densities. Src: Nowak, et al.]

Leakage is the price we pay for the increasing device performance

Slide -14 -DEIS Doctoral School 2010

Standby-Power Reduction Techniques

Standby power can be reduced through:
− Capacitance minimization
− Voltage-scaling
− Power gating
− Vdd/Vt selection


Slide -15 -DEIS Doctoral School 2010

Where Does the Power Go?

[Figure: pie chart with slices for issue queues, reg files, icache/itlb, dcache/dtlb, L2 cache, FUs, result buses, clock, other.]

Power profile (dynamic power) of a 4-way superscalar microprocessor

Bottom line: power needs to be reduced across-the-board

Slide -16 -DEIS Doctoral School 2010

Need to consider CPU & System Power

CPU Dominates Thermal Design Power
Multiple Platform Components Comprise Average Power

Mobile PC system power (Note: Based on Actual Measurements)

Component           Thermal Design (TDP)    Average
600/500 MHz uP      37%                     13%
LCD 10"             19%                     30%
HDD                 9%                      19%
Memory+Graphics     12%                     15%
Power Supply        10%                     10%
Other               13%                     13%

[Courtesy: N. Dutt; Source: V. Tiwari]


Slide -17 -DEIS Doctoral School 2010

Cost metrics

POWER
• P(t) = I(t)V(t)
• Average power: $\frac{1}{T}\int_T P\,dt$
• Peak power: $\max_T(P)$

PERFORMANCE
• Latency vs. throughput
• Worst case vs. average case
• ~ $T^{-1}$

Never considered in isolation
• Compound cost metric: $C = P\,T^{\alpha}$, α>1
• Performance constraints: Min{P} s.t. T<Tmax
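The metrics above can be sketched in a few lines. The power trace, the two design points and the choice α = 2 are hypothetical values used only to show how the metrics rank alternatives.

```python
def average_power(samples_w, dt_s):
    """Discrete form of (1/T) * integral of P dt for uniformly spaced samples."""
    energy_j = sum(p * dt_s for p in samples_w)
    return energy_j / (len(samples_w) * dt_s)

def peak_power(samples_w):
    return max(samples_w)

def compound_cost(power_w, exec_time_s, alpha):
    """C = P * T^alpha; alpha > 1 weights execution time more than power."""
    return power_w * exec_time_s ** alpha

def min_power_design(designs, t_max_s):
    """Min{P} subject to T < Tmax; returns None if nothing is feasible."""
    feasible = [d for d in designs if d["T"] < t_max_s]
    return min(feasible, key=lambda d: d["P"]) if feasible else None

trace = [1.0, 3.0, 2.0, 2.0]               # hypothetical power samples in W
avg = average_power(trace, dt_s=0.5)
peak = peak_power(trace)

designs = [
    {"name": "fast", "P": 2.0, "T": 1.0},  # hypothetical design points
    {"name": "slow", "P": 1.0, "T": 1.5},
]
costs = {d["name"]: compound_cost(d["P"], d["T"], alpha=2.0) for d in designs}
best = min_power_design(designs, t_max_s=1.2)
print(avg, peak, costs, best["name"])
```

In this example the lower-power design has the higher compound cost and also violates the time constraint, which illustrates why power is never considered in isolation.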

Slide -18 -DEIS Doctoral School 2010

Battery Properties

Energy-constrained systems do not always target energy minimization
The charge that can be drawn from a battery depends not only on its capacity (energy budget) but also on the discharge rate
Goal is lifetime maximization


Slide -19 -DEIS Doctoral School 2010

Is optimization for low energy always the same as optimization for high performance?

    int a[1000];
    c = a;
    for (i = 1; i < 100; i++) {
        b += *c;
        b += *(c+7);
        c += 1;
    }

First compiled version:

    LDR r3, [r2, #0]
    ADD r3,r0,r3
    MOV r0,#28
    LDR r0, [r2, r0]
    ADD r0,r3,r0
    ADD r2,r2,#4
    ADD r1,r1,#1
    CMP r1,#100
    BLT LL3

Second compiled version:

    ADD r3,r0,r2
    MOV r0,#28
    MOV r2,r12
    MOV r12,r11
    MOV r11,r10
    MOV r0,r9
    MOV r9,r8
    MOV r8,r1
    LDR r1, [r4, r0]
    ADD r0,r3,r1
    ADD r4,r4,#4
    ADD r5,r5,#1
    CMP r5,#100
    BLT LL3

Measured results for the two versions: 2231 cycles, 16.47 µJ and 2096 cycles, 19.92 µJ.

No!
• High performance if the available memory bandwidth is fully used; low energy consumption if memories are in stand-by mode
• Reduced energy if more values are kept in registers

Slide -20 -DEIS Doctoral School 2010

Outline

Introduction
System-Level Power modeling and estimation
Dynamic power management
    Shutdown-based
    Variable voltage devices
    Implementation strategies
Conclusions


Slide -21 -DEIS Doctoral School 2010

Impact of software

For a given hardware platform, the energy to realize a function depends on software
• Operating system
• Different algorithms to embody a function (e.g., sorting)
• Different coding styles
• Application software compilation

Slide -22 -DEIS Doctoral School 2010

Estimation of SW Power

SW consumes power on the hardware
System power models
• Constant additive model (spreadsheet)
• Power state machines (abstract event simulator)
• Instruction-level (ISS)
Tradeoff: functional accuracy vs. speed


Slide -23 -DEIS Doctoral School 2010

The spreadsheet model

Constant power dissipation for each component
Total power consumption by summing contributions
• General-purpose systems
• Backward compatibility
• Component-based
Spreadsheet-based analysis
• Basic budgeting
• Simple “what if” analyses
• No learning curve

Slide -24 -DEIS Doctoral School 2010

Example: spreadsheet analysis

PDA      #Comp   Vdd   Iidle   Ion    %on    %idle   I(mA)
Proc     1       3.3   0.5     50     0.7    0.3     35.15
DRAM     1       3.3   0.1     12     0.7    0.3     8.43
FLASH    5       3.3   0.0     9      0.7    0.3     31.5
IR       1       3.3   0.0     64     0.05   0.95    3.2
RTC      1       3.3   0.0     0.1    1      0       0.1
DC-DC    1       -     0.1     5.5    0.99   0.01    5.44

TOT                                                  83.82
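The spreadsheet can be reproduced with a few lines of code. This is a sketch that assumes each row's average current is #Comp × (Ion·%on + Iidle·%idle), with both currents in mA; the formula is inferred from the rows of the table rather than stated on the slide.

```python
# (name, number of components, idle current mA, on current mA, fraction of time on)
ROWS = [
    ("Proc",  1, 0.5, 50.0, 0.70),
    ("DRAM",  1, 0.1, 12.0, 0.70),
    ("FLASH", 5, 0.0,  9.0, 0.70),
    ("IR",    1, 0.0, 64.0, 0.05),
    ("RTC",   1, 0.0,  0.1, 1.00),
    ("DC-DC", 1, 0.1,  5.5, 0.99),
]

def average_current(count, i_idle, i_on, frac_on):
    """Time-weighted average current of one spreadsheet row, in mA."""
    return count * (i_on * frac_on + i_idle * (1.0 - frac_on))

currents = {name: average_current(n, i_idle, i_on, frac_on)
            for name, n, i_idle, i_on, frac_on in ROWS}
total = sum(currents.values())

for name, value in currents.items():
    print(f"{name:<6} {value:6.2f} mA")
print(f"TOT    {total:6.2f} mA")
```

A “what if” analysis then amounts to editing one entry of `ROWS`, for example the IR duty cycle, and rerunning the sum.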


Slide -25 -DEIS Doctoral School 2010

Limitations

The estimation is mainly left to the designer
Workload estimation is not a straightforward task
Need a more complex high-level system model

Slide -26 -DEIS Doctoral School 2010

Power State Machines: System Model

Event-driven model (resources & events)

Key feature: No overhead for long inactivity (no events).

[Figure: system model. Elements: users in the environment issuing requests to the system, power manager, resources, DC-DC converter, battery.]


Slide -27 -DEIS Doctoral School 2010

Key Features

• Use for component selection and partitioning phases
• Abstract away all but the power behavior of the system
• The model includes information about the power behavior of each block, block interactions, and the environment which drives the behavior
• Each resource is a power state machine
• Power manager translates environment stimuli into state changes of system resources
• Faster than Constant Additive Model

Slide -28 -DEIS Doctoral School 2010

Power State Machines: Resource Model

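Example of PFSM: LCD display unit.

[Figure: power state machine with three states, BACKLIT (150mW), DISPLAY (50mW) and OFF (0mW); the transitions are labelled with delays of 0.5msec, 0.1msec, 0.1msec, 10msec, 0.1msec and 11msec.]

Key features:
• Power associated with states
• Transitions have a cost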
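A power state machine of this kind can be sketched as a table of state powers plus a table of transition delays. The state powers are those of the LCD example; which delay belongs to which transition, and the rule that a transition burns the higher of the two state powers, are assumptions made for this sketch.

```python
POWER_MW = {"OFF": 0.0, "DISPLAY": 50.0, "BACKLIT": 150.0}

# Assumed assignment of the six delays to transitions (not given in the text).
LATENCY_MS = {
    ("OFF", "DISPLAY"): 10.0,
    ("OFF", "BACKLIT"): 11.0,
    ("DISPLAY", "BACKLIT"): 0.5,
    ("BACKLIT", "DISPLAY"): 0.1,
    ("DISPLAY", "OFF"): 0.1,
    ("BACKLIT", "OFF"): 0.1,
}

def simulate(initial, schedule):
    """Run a list of (target_state, dwell_ms); return (energy_uJ, elapsed_ms)."""
    state, energy_uj, elapsed_ms = initial, 0.0, 0.0
    for target, dwell_ms in schedule:
        if target != state:
            latency = LATENCY_MS[(state, target)]
            # Assumption: the transition costs the higher of the two state powers.
            energy_uj += max(POWER_MW[state], POWER_MW[target]) * latency
            elapsed_ms += latency
            state = target
        energy_uj += POWER_MW[state] * dwell_ms   # mW * ms = uJ
        elapsed_ms += dwell_ms
    return energy_uj, elapsed_ms

energy_uj, elapsed_ms = simulate(
    "OFF", [("DISPLAY", 100.0), ("BACKLIT", 20.0), ("OFF", 50.0)]
)
print(f"{energy_uj:.0f} uJ over {elapsed_ms:.1f} ms")
```

Because the simulation advances from event to event, a long dwell in OFF costs one loop iteration regardless of its duration.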


Slide -29 -DEIS Doctoral School 2010

Power State Machines: Additional Components

Workload:
• User/Environment: non-deterministic FSM (models the non-deterministic nature of the requests).

Power supply sub-system
• Battery
• DC-DC converter

Slide -30 -DEIS Doctoral School 2010

Functional Power Models

Objective: Estimate the power dissipated by a specific fragment of code
• Needs to track instruction execution
• Must be fast (millions of instructions): RTL or gate-level are not fast enough
• Needs to model processor & memory system


Slide -31 -DEIS Doctoral School 2010

Software Power Estimation: Instruction-Level

ILPA [TMWL96]
Empirical method for characterizing single (or very short sequences of) instructions.
Key issues:
• Evaluation of power dissipation for single instructions.
• Choice of representative instructions for characterization.
Advantage: Roughly architecture-independent.

Slide -32 -DEIS Doctoral School 2010

Instruction-Level Power Characterization

Direct measurement of the currents drawn from the power supply while executing the instructions.
HDL simulation:
• The instructions are simulated on a processor model in some HDL.
• The processor is plugged into a tester machine and simulation traces are applied. The current is measured by the tester.
Use simulation of a gate-level description of the processor.


Slide -33 -DEIS Doctoral School 2010

Instruction-Level Models

A power cost is assigned to each instruction.
Two components of the cost:
• Static component, called “base cost”: the individual instruction cost without a notion of “state”.
• Dynamic component, called “circuit state effects”: accounts for the previous processor state.
Dynamic cost accounts for events depending on sequences of events (e.g., cache misses, pipeline stalls).

Slide -34 -DEIS Doctoral School 2010

Extracting the model

The base cost is computed as follows:
• An infinite loop containing a total of N copies of the target instruction I is executed.
• The average current is measured as described earlier.
• The power cost is obtained from the values of the current, the supply voltage and the cycles per instruction.
N should not be too small, so that the loop overhead is amortized.
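The last step of the procedure can be written out as a short calculation. This sketch assumes the energy per instruction is average current × supply voltage × cycles per instruction × clock period; the measurement values below are hypothetical.

```python
def base_cost_joules(avg_current_a, v_dd, cycles_per_instr, f_clk_hz):
    """Energy of one instruction: I_avg * Vdd * cycles * T_clk."""
    power_w = avg_current_a * v_dd
    return power_w * cycles_per_instr / f_clk_hz

# Hypothetical measurement: 100 mA average current at 3.3 V, 1 cycle, 40 MHz clock.
energy_j = base_cost_joules(
    avg_current_a=0.1, v_dd=3.3, cycles_per_instr=1, f_clk_hz=40e6
)
print(f"base cost = {energy_j * 1e9:.2f} nJ per instruction")
```

The loop-closing branch is not part of the target instruction, so its share of the measured current shrinks as N grows.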


Slide -35 -DEIS Doctoral School 2010

Computing program execution cost

Due to the averaging process, the costs for I1 → I2 and I2 → I1 cannot be distinguished.
The cost of a program can be summarized as follows:

$$\text{Cost(Program)} = \sum_i (B_i \cdot N_i) + \sum_{i,j} (O_{ij} \cdot N_{ij}) + \sum_k E_k$$

where:
• B_i: base cost of instruction i.
• N_i: # of occurrences of instruction i.
• O_ij: dynamic cost of sequence i→j.
• N_ij: # of occurrences of sequence i→j.
• E_k: other effects, obtained from program profiling.
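A minimal sketch of the cost formula applied to an instruction trace follows. Pair costs are keyed by unordered pairs because i→j and j→i cannot be distinguished; the instruction names and every cost value are hypothetical.

```python
from collections import Counter

def program_cost(trace, base_cost, pair_cost, other_effects=()):
    """Sum of B_i*N_i, plus O_ij*N_ij over adjacent pairs, plus other effects E_k."""
    n_i = Counter(trace)
    n_ij = Counter(frozenset(pair) for pair in zip(trace, trace[1:]))
    base = sum(base_cost[i] * n for i, n in n_i.items())
    inter = sum(pair_cost[p] * n for p, n in n_ij.items())
    return base + inter + sum(other_effects)

# Hypothetical costs for two made-up instructions OP_A and OP_B.
BASE = {"OP_A": 2.0, "OP_B": 1.0}
PAIR = {
    frozenset(["OP_A"]): 0.1,            # OP_A followed by OP_A
    frozenset(["OP_B"]): 0.2,            # OP_B followed by OP_B
    frozenset(["OP_A", "OP_B"]): 0.5,    # either order
}

cost = program_cost(["OP_A", "OP_B", "OP_B", "OP_A"], BASE, PAIR,
                    other_effects=[0.3])
print(f"program cost = {cost:.2f}")
```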

Slide -36 -DEIS Doctoral School 2010

Instruction-Level power model: example

Example of power cost values (expressed in pJ):

Instruction   Base    Circuit State Effects
Name          Cost    LOAD    DLOAD   ADD     MULT
LOAD          1.98    0.13    0.15    1.19    0.92
DLOAD         2.37            0.17    1.19    0.92
ADD           0.99                    0.26    0.53
MULT          1.19                            0.66

Example of computation:

Evaluation Program
(initial state is ADD)    Base Cost   Circuit State
DLOAD A←x, B←y            2.37        1.19
LOAD C←z                  1.98        0.15
ADD A←C, B                0.99        1.19
Total                     5.34        2.53

Total value = (5.34 + 2.53)pJ/(3·25ns) = 7.87pJ/75ns ≈ 104.93μW (Tc = 25ns)


Slide -37 -DEIS Doctoral School 2010

Micro-architectural Power Model

The processor is viewed as an interconnection of macro blocks
• E.g. execution units, register file, etc.
Power models are built for the macros
• E.g. analytical, look-up tables, etc.
Advantage: allows micro-architecture exploration
Disadvantage: no black-box model for COTS processors

Slide -38 -DEIS Doctoral School 2010

FLPA: Functional Level Power Analysis

Between ILPA and micro-architectural
• Fewer parameters than ILPA, less info on internals than micro-architectural
• Suitable for complex cores, with limited internal information
• Algorithmic parameters require functional simulation (ISS run or code analysis)

Algorithmic parameters
• α: parallelism rate
• β: processing rate
• γ: ext. IM access rate
• ε: DMA activity rate
• τ: ext. DM access rate

Architectural parameters
• F: clock frequency
• MM: internal Mem mode (mapped, bypass, cache, freeze)
• DD: data mapping
• DW: DMA data width

[Laurent03] (example TI62, TI67 DSPs)


Slide -39 -DEIS Doctoral School 2010

Integrating functional and power models

Estimating HW and SW power consumption together is more effective than considering the two contributions separately. This is because the power consumption of a task mapped onto software is not independent of the implementation of the remaining tasks.
Two approaches:
• Non-interacting (trace-based) HW/SW estimation.
• Concurrent HW/SW estimation.

Slide -40 -DEIS Doctoral School 2010

Non-Interacting HW/SW Power Estimation

Avalanche [LH98]
Target system architecture:

[Figure: SparcLite CPU with I-cache and D-cache, main memory, custom HW (ASICs).]

• Power estimation of custom HW done separately (constant power in the model).
• Focus on power dissipation of SW and memory hierarchy.


Slide -41 -DEIS Doctoral School 2010

Trace-based Estimator Architecture

Block diagram:

[Figure: block diagram with the following elements: application program, behavioral-level simulator, memory trace profiler, program execution trace, memory access trace, software energy model, Dinero III, cache energy model, main memory energy model. Outputs: CPU energy, cache energy, main memory energy and total system energy.]

Main feature: Exploitation of detailed software, memory, and cache energy models.
Main limitation: No interaction between SW and HW during the estimation.

Slide -42 -DEIS Doctoral School 2010

Concurrent HW/SW Power estimation

[Figure: instruction set simulator (pipeline stages IF, ID, EX, MEM, WB) with two interfaces: a microarchitecture units utilization interface towards the processor power models (processor units, memory models), and an addr/data stream interface towards the external power models (Icache, Dcache, main memory, peripherals).]

E.g.: Simplescalar/Wattch


Slide -43 -DEIS Doctoral School 2010

State of the art: MPARM

[Figure: MPARM platform. Four cores, each with a private memory (PRI MEM 1 to PRI MEM 4), plus a shared memory, semaphores and an interrupt controller, attached to an interconnection: STbus or AMBA or Xpipes.]

Simulation is cycle accurate (~24 Kcycles/sec with 4 cores on a 2-proc Pentium III, 1GHz, 512MB)

Slide -44 -DEIS Doctoral School 2010

Power modeling

Invoked from hardware modules after activation events on a cycle-by-cycle basis
Energy info is passed to the data collector routine at each cycle

[Figure: MEMORY (or CACHE) MODULE, Power Model and Data Collector; the memory state goes to the power model, which returns the energy spent.]

1. The module calls the power model function
2. The module sends the energy consumption info to the data collector routines
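The two-step flow described above can be sketched as follows. The class names, the per-state energies and the module name are all hypothetical; the sketch only illustrates the calling pattern, not the simulator's actual code.

```python
class DataCollector:
    """Accumulates the energy reported by every module, in pJ."""
    def __init__(self):
        self.energy_pj = {}

    def add(self, module_name, energy_pj):
        self.energy_pj[module_name] = self.energy_pj.get(module_name, 0.0) + energy_pj

def memory_power_model(state):
    """Hypothetical energy per cycle (pJ) as a function of the memory state."""
    return {"read": 20.0, "write": 25.0, "idle": 1.0}[state]

class MemoryModule:
    def __init__(self, name, collector):
        self.name = name
        self.collector = collector

    def cycle(self, state):
        energy = memory_power_model(state)      # 1. module calls the power model
        self.collector.add(self.name, energy)   # 2. module reports to the collector

collector = DataCollector()
ram = MemoryModule("RAM 0", collector)
for state in ["read", "read", "write", "idle"]:
    ram.cycle(state)
print(collector.energy_pj)
```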


Slide -45 -DEIS Doctoral School 2010

Power model for ARM core

Power statistics for the ARM core are collected in a different way
Need to account for idle power when the ARM module is stalled (ISS not invoked)

[Figure: ARM MODULE, Data Collector and Power Model; the core state goes to the power model, which returns the energy spent.]

1. The ISS calls the data collector routine
2. The data collector routine gets the energy information from the power model

Slide -46 -DEIS Doctoral School 2010

Using power models

ISS core: SWI_METRIC_START

Initialization (installs the handler):

    ...
    RegisterSWI(SWI_METRIC_START, metric_start_swi_call);
    ...

    uint32_t metric_start_swi_call(CArmProc *arm, uint32_t r0, uint32_t r1,
                                   uint32_t r2, uint32_t r3)
    {
        statobject->startMeasuring(arm->ID);
        return r0;
    }

Program (handler invocation):

    ...
    __asm ("swi " SWI_METRIC_STARTstr);
    ...

• The handler can easily be modified to be invoked by a pseudo-hardware module for collection of system power statistics


Slide -47 -DEIS Doctoral School 2010

Power profiling

Waveforms: cycle by cycle consumption

Output file: totals

Power estimation
----------------

Energy spent:
ARM 0
  core: 25609147.30 [pJ]
  cache: 105048808.17 [pJ]
ARM 1
  core: 25609092.30 [pJ]
  cache: 105048808.17 [pJ]
ARM 2
  core: 25609092.30 [pJ]
  cache: 105048808.17 [pJ]
ARM 3
  core: 25614207.30 [pJ]
  cache: 105048808.17 [pJ]
RAM 0: 2825183.87 [pJ]
RAM 1: 2825183.87 [pJ]
RAM 2: 2825183.87 [pJ]
RAM 3: 2824958.26 [pJ]
RAM 4: 0.00 [pJ]
BUS: 50778876.39 [pJ]

Power spent:
ARM 0
  core: 51.18 [mW]
  cache: 209.95 [mW]
ARM 1
  core: 51.18 [mW]
  cache: 209.95 [mW]
ARM 2
  core: 51.18 [mW]
  cache: 209.95 [mW]
ARM 3
  core: 51.18 [mW]
  cache: 209.95 [mW]
RAM 0: 5.65 [mW]
RAM 1: 5.65 [mW]
RAM 2: 5.65 [mW]
RAM 3: 5.65 [mW]
RAM 4: 0.00 [mW]
BUS: 101.49 [mW]

Slide -48 -DEIS Doctoral School 2010

Energy characterization of communication primitives

[Figures: power distributions for send and power distributions for receive, for message sizes of 128 bytes and 256 bytes.]


Slide -49 -DEIS Doctoral School 2010

DVFS Model

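Performance: if fCK1 = k * fCK2 (k>1)
• k CPU1 sim cycles correspond to 1 CPU2 sim cycle
• Tacc CPU1→L1 = k * Tacc CPU2→L1
• TaccL2, TaccDRAM = const

DVFS model

[Figure: simulation snapshot (Simics & Ruby). CPU1, CPU2, ..., CPU N, each with its own L1 and clocked at f1 = k1 * fnom, f2 = k2 * fnom, ..., fN = kN * fnom; L2 banks clocked at fL2; network; DRAM clocked at fDRAM.]

$$T_g = \frac{1}{f} = L_f \cdot \frac{V_{dd}}{(V_{dd} - V_t)^{\alpha}}$$

MODEL INITIALIZATION: the nominal values of f, V_dd and V_t give the proportionality constant L_f.

RUN TIME: the run-time selected frequency f, together with L_f and V_t, gives the associated supply voltage V_dd.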
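The two phases of the DVFS model can be sketched numerically. This assumes the delay relation T_g = 1/f = L_f · V_dd / (V_dd − V_t)^α, which is how the slide's equation is read here; the nominal operating point, α = 1.3 and the bisection solver are choices of this sketch, not of the original model.

```python
def prop_const(f_nom, vdd_nom, vt, alpha):
    """Model initialization: L_f from the nominal operating point."""
    return (vdd_nom - vt) ** alpha / (f_nom * vdd_nom)

def frequency(vdd, l_f, vt, alpha):
    """Maximum frequency sustained at supply voltage vdd: 1 / T_g."""
    return (vdd - vt) ** alpha / (l_f * vdd)

def supply_for(f_target, l_f, vt, alpha, v_max=2.0, iterations=60):
    """Run time: lowest Vdd that sustains f_target, found by bisection.

    frequency() increases monotonically with vdd when alpha >= 1.
    """
    lo, hi = vt, v_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if frequency(mid, l_f, vt, alpha) < f_target:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical nominal point: 1 GHz at 1.2 V with Vt = 0.3 V.
l_f = prop_const(f_nom=1.0e9, vdd_nom=1.2, vt=0.3, alpha=1.3)
v_full = supply_for(1.0e9, l_f, vt=0.3, alpha=1.3)
v_half = supply_for(0.5e9, l_f, vt=0.3, alpha=1.3)
print(f"Vdd at 1.0 GHz: {v_full:.3f} V, Vdd at 0.5 GHz: {v_half:.3f} V")
```

Bisection is used because the relation has no closed-form inverse for a general α; a lookup table per frequency level would serve equally well.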

Slide -50 -DEIS Doctoral School 2010

Power Model: power model interface

[Figure: simulation snapshot. CPU1, CPU2, ..., CPU N, each with its own L1 and clocked at f1 = k1 * fnom, f2 = k2 * fnom, ..., fN = kN * fnom; L2 banks; network; DRAM.]

Statistics passed to the power model, together with f, V_dd and T, on a sampling window of 1.3us:
• i-th CPU: # cycles active, # cycles stall, # cycles idle, # cycles PG
• i-th L1: # line & WD reads, # line & WD writes
• i-th L2: # line reads, # line writes
• DRAM: # burst reads, # burst writes


Slide -51 -DEIS Doctoral School 2010

Power Model

Power model equations (i-th CPU)

$$P_{\text{dyn-Active}} = K_d \cdot V_{dd}^2 \cdot f$$
$$P_{\text{sta-Active}} = Z \cdot V_{dd} \cdot T^2 \cdot e^{-\frac{q \cdot V_t}{K \cdot T}}$$
$$P_{\text{dyn-Stall}} = K_{ds} \cdot P_{\text{dyn-Active}} \qquad P_{\text{sta-Stall}} = K_{ss} \cdot P_{\text{sta-Active}}$$
$$P_{\text{dyn-Idle}} = K_{di} \cdot P_{\text{dyn-Active}} \qquad P_{\text{sta-Idle}} = K_{si} \cdot P_{\text{sta-Active}}$$
$$P_{\text{dyn-PowerG}} = K_{dpgating} \cdot P_{\text{dyn-Active}} \qquad P_{\text{sta-PowerG}} = K_{spgating} \cdot P_{\text{sta-Active}}$$

MODEL INITIALIZATION: the CPU nominal values (power per cycle in the Active, Stall, Idle and PowerGating states, dynamic and static, at the nominal f, V_dd, T, V_t) give the proportional constants per CPU: K_d, Z, K_ds, K_di, K_dpg, K_ss, K_si, K_spg.

RUN TIME: the run-time operating conditions (f, V_dd, T, V_t) and the proportional constants give the power per cycle at the specific operating condition, for each of the four states (dynamic and static).
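A sketch of the run-time evaluation follows, assuming the dynamic term is K_d·V_dd²·f, the static term is Z·V_dd·T²·exp(−q·V_t/(K·T)) with K read as the Boltzmann constant, and the other states are fixed fractions of the active values. All constants below are hypothetical.

```python
import math

Q = 1.602176634e-19     # elementary charge, C
K_B = 1.380649e-23      # Boltzmann constant, J/K

def active_power(k_d, z, v_dd, f, temp_k, v_t):
    """Dynamic and static power in the Active state."""
    p_dyn = k_d * v_dd ** 2 * f
    p_sta = z * v_dd * temp_k ** 2 * math.exp(-Q * v_t / (K_B * temp_k))
    return p_dyn, p_sta

# Hypothetical per-state scale factors: (dynamic fraction, static fraction).
STATE_FACTORS = {
    "active": (1.0, 1.0),
    "stall": (0.6, 1.0),
    "idle": (0.2, 1.0),
    "power_gated": (0.0, 0.1),
}

def state_powers(p_dyn_active, p_sta_active):
    """Total power in each state, scaled from the Active values."""
    return {state: k_dyn * p_dyn_active + k_sta * p_sta_active
            for state, (k_dyn, k_sta) in STATE_FACTORS.items()}

p_dyn, p_sta = active_power(k_d=1e-9, z=0.02, v_dd=1.0, f=1e9,
                            temp_k=350.0, v_t=0.3)
totals = state_powers(p_dyn, p_sta)
for state, power in totals.items():
    print(f"{state:<12} {power:.3f} W")
```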

Slide -52 -DEIS Doctoral School 2010

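Thermal Model: power to thermal interface

[Figure: the power values of each CPU and its L1 (P_CPU1 ... P_CPUn, P_L1), of the L2 banks and of the network are applied to a thermal model of the chip stack: IC die (silicon cells), heat spreader (copper cells), IC package, package pins and PCB. The thermal model produces the temperatures T_i.]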


Slide -53 -DEIS Doctoral School 2010

Reliability Model

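Aging and critical path delay: A. Tiwari, J. Torrellas, “Facelift: Hiding and Slowing Down Aging in Multicores”

ΔVt under NBTI stress:
$$\Delta V_{t\text{-}stress} = A_{NBTI} \cdot t_{ox} \cdot \sqrt{C_{ox} \cdot (V_{dd} - V_t)} \cdot e^{\frac{V_{dd} - V_t}{t_{ox} \cdot E_0} - \frac{E_a}{K \cdot T}} \cdot t_{stress}^{0.25}$$

ΔVt after NBTI recovery:
$$\Delta V_{t\text{-}recovery} = \Delta V_{t\text{-}stress} \cdot \left(1 - \sqrt{\frac{\eta \cdot t_{recovery}}{t_{recovery} + t_{stress}}}\right)$$

ΔVt under HCI stress:
$$\Delta V_{t} = A_{HCI} \cdot \alpha \cdot f \cdot e^{\frac{V_{dd} - V_t}{t_{ox} \cdot E_1}} \cdot t^{0.5}$$

The reliability model takes its inputs from the DVFS model, from the thermal model and from CPU usage; its results go to the power model and to the DVFS model.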

Slide -54 -DEIS Doctoral School 2010

Simulator Performance

Host: Intel Pentium Core 2 Duo 2.4 GHz, 2GB RAM
Target: 4-core Pentium 4, 2GB RAM, 32 KB private L1 cache, 4 MB shared L2 cache

Configuration                                          Tsim
Simics + Ruby                                          1040 s
Simics + Ruby + DVFS                                   1045 s
Simics + Ruby + DVFS + Power                           1110 s
Simics + Ruby + DVFS + Power + Thermal interface       1160 s
Simics + Ruby + DVFS + Power + Thermal Model           1240 s

Notes: 68 cells, T = 100ns; compute every 13us

1 billion instructions ~ 0.5 sec virtual time
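The cost of each added model can be read off the simulation times. This sketch assumes the five Tsim values belong to the five configurations in the order listed, and expresses each as an overhead relative to plain Simics + Ruby.

```python
T_SIM_S = {
    "Simics + Ruby": 1040,
    "+ DVFS": 1045,
    "+ Power": 1110,
    "+ Thermal interface": 1160,
    "+ Thermal Model": 1240,
}

baseline = T_SIM_S["Simics + Ruby"]
overhead_pct = {name: (t - baseline) / baseline * 100.0
                for name, t in T_SIM_S.items()}

# Slowdown of the baseline with respect to the ~0.5 s of simulated (virtual) time.
slowdown = baseline / 0.5

for name, pct in overhead_pct.items():
    print(f"{name:<22} {T_SIM_S[name]:5d} s  (+{pct:.1f}%)")
print(f"baseline slowdown vs. virtual time: {slowdown:.0f}x")
```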