On The Energy Efficiency of Computation Mihai Budiu CMU CS CALCM Seminar Feb 17, 2004 this version fixes some errors in the ASH performance graphs

On The Energy Efficiency of Computation

Mihai Budiu

CMU CS

CALCM Seminar

Feb 17, 2004

Note: this version fixes some errors in the ASH performance graphs.

Presentation Setup

main()
{
    signal(SIGINT, welcome);
    while (slides() && time()) {
        talk();
    }
}

Why Do We Care?

Toasted CPU: about 2 sec after removing cooler. (Tom’s Hardware Guide)

Power and Power Density

[Figure: leakage power and active power in Watts (left axis, 0 to 250) and power density in W/cm² (right axis, 0 to 100) across process generations 0.25µm, 0.18µm, 0.13µm, and 0.1µm.]

Data from Fred Pollack, Intel, MICRO 32

Assuming constant die size, no power management

Power Density Distribution

[Figure: power density distribution over the chip surface.]

Data from Fred Pollack, Intel, MICRO 32

Outline

• Introduction

• Power and Energy Efficiency
  – data from Bob Brodersen, Berkeley wireless group

• Synchronous Hardware Efficiency

• Asynchronous Hardware Efficiency

• ASH Efficiency

• Conclusions

Energy Efficiency Metric

How much computing can we do with a finite energy source?

Some Arithmetic

Energy and Power Efficiency

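The energy-efficiency metric for energy-constrained applications (OP/nJ) is numerically the same as the power-efficiency metric used for thermal (power) considerations when maximizing throughput (MOPS/mW):

OP/nJ = MOPS/mW

Here nJ is a unit of energy (Joule) and mW is a unit of power (Watt).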
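The identity is pure unit bookkeeping: one million operations per second divided by one milliwatt is 10^9 operations per joule, which is one operation per nanojoule. A minimal sketch of that conversion follows; the function name is illustrative and does not come from the talk.

```python
def mops_per_mw_to_ops_per_nj(mops_per_mw: float) -> float:
    """Convert an efficiency in MOPS/mW into operations per nanojoule."""
    # 1 MOPS = 10**6 operations per second; 1 mW = 10**-3 joules per second,
    # so 1 MOPS/mW = 10**6 * 10**3 = 10**9 operations per joule.
    ops_per_joule = mops_per_mw * 10**6 * 10**3
    # 1 joule = 10**9 nanojoules.
    return ops_per_joule / 10**9


if __name__ == "__main__":
    for value in (0.13, 7.0, 200.0):
        print(value, "MOPS/mW =", mops_per_mw_to_ops_per_nj(value), "OP/nJ")
```

The two scale factors cancel exactly, which is why the talk uses both units interchangeably.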

ISSCC Chips (0.18µm-0.25µm)

 #  Year  Description        #  Year  Description
 1  1997  S/390             11  1998  Graphics
 2  2000  PPC (SOI)         12  1998  Multimedia
 3  1999  G5                13  2000  Multimedia
 4  2000  G6                14  2002  Mpg decoder
 5  2000  Alpha             15  1998  Multimedia
 6  1998  P6                16  2001  Encryption Processor
 7  1998  Alpha             17  2000  Hearing Aid Processor
 8  1999  PPC               18  2000  FIR for Disk Read Head
 9  1998  StrongArm         19  1998  MPEG Encoder
10  2000  Comm              20  2002  802.11a Baseband

Chip categories marked on the slide: Microprocessors, DSPs, Dedicated

Energy Efficiency (MOPS/mW or OP/nJ)

[Figure: energy (power) efficiency in MOPS/mW for chips 1 to 20, on a log scale from 0.01 to 1000.]

3 orders of magnitude!

Outline

• Introduction

• Power and Energy Efficiency

• Synchronous Hardware Efficiency

• Asynchronous Hardware Efficiency

• ASH Efficiency

• Conclusions

Explaining the Difference

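Operations per second:

$\mathrm{MOPS} = f_{clk} \times N_{op}$, where $N_{op}$ is the number of operations per clock.

Power:

$P_{chip} = A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}$, where $C_{sw}$ is the normalized switched capacitance.

Efficiency:

$\mathrm{MOPS}/P_{chip} = (f_{clk} \times N_{op}) / (A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}) = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$, where $A_{op} = A_{chip}/N_{op}$ is the chip area per operation.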
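The cancellation of the clock frequency can be checked numerically. The sketch below evaluates the efficiency both from chip-level quantities and from the closed form; the function names, the unit choices (mm², pF/mm², V, MHz) and the sample values in the usage section are assumptions made for illustration, not data from the talk.

```python
def chip_power_mw(a_chip_mm2, c_sw_pf_per_mm2, vdd_volts, f_clk_mhz):
    """P_chip = A_chip * C_sw * Vdd^2 * f_clk, returned in milliwatts."""
    # pF * V^2 * MHz = 1e-12 * 1e6 W = 1 microwatt, hence the 1e-3 factor.
    return a_chip_mm2 * c_sw_pf_per_mm2 * vdd_volts ** 2 * f_clk_mhz * 1e-3


def efficiency_from_chip(a_chip_mm2, n_op, c_sw_pf_per_mm2, vdd_volts, f_clk_mhz):
    """MOPS/mW computed as (f_clk * N_op) / P_chip."""
    mops = f_clk_mhz * n_op
    return mops / chip_power_mw(a_chip_mm2, c_sw_pf_per_mm2, vdd_volts, f_clk_mhz)


def efficiency_closed_form(a_op_mm2, c_sw_pf_per_mm2, vdd_volts):
    """MOPS/mW computed as 1 / (A_op * C_sw * Vdd^2), with the same unit scaling."""
    return 1000.0 / (a_op_mm2 * c_sw_pf_per_mm2 * vdd_volts ** 2)


if __name__ == "__main__":
    # Hypothetical chip: 10 mm^2, 10 ops per clock, 50 pF/mm^2, 2 V.
    for f_mhz in (100.0, 400.0):
        print(f_mhz, efficiency_from_chip(10.0, 10, 50.0, 2.0, f_mhz))
    print(efficiency_closed_form(10.0 / 10, 50.0, 2.0))
```

Changing the clock frequency alone leaves the result unchanged, so in this model any difference in efficiency must come from the area per operation, the switched capacitance, or the supply voltage.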

Supply Voltage, Vdd

[Figure: supply voltage Vdd in Volts (0 to 3) for chips 1 to 20.]

$\mathrm{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$

Normalized Switched Capacitance, Csw

[Figure: normalized switched capacitance Csw in pF/mm² (10 to 110) for chips 1 to 20, with an annotation reading "3x".]

$\mathrm{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$

Area per operation, Aop

[Figure: area per operation Aop in mm² per operation (log scale, 0.01 to 1000) for chips 1 to 20.]

$A_{op} = A_{chip}/N_{op}$

$\mathrm{MOPS}/P_{chip} = 1/(A_{op} \times C_{sw} \times V_{dd}^2)$

AHA!

Focusing In

[Figure: energy (power) efficiency in MOPS/mW (log scale, 0.01 to 1000) for chips 1 to 20, with three chips highlighted: PPC, NEC DSP, and 802.11a.]

µP: MOPS/mW = 0.13

Useful arithmetic

Nop = 2 (two ways)

fclock = 450 MHz ⇒ 900 MIPS

Aop = Achip/2 = 42 mm²

Power = 7 Watts

DSP: MOPS/mW = 7

Nop = 16 (4 processors × 4 ops each)

fclock = 50 MHz ⇒ 800 MOPS

Aop = Achip/16 = 5.3 mm²

Power = 110 mW

Dedicated Design: MOPS/mW = 200

Nop = 96

fclock = 25 MHz ⇒ 2400 MOPS

Aop = 5.4 mm²/96 = 0.15 mm²

Power = 12 mW

Complex MAC = 8 ops

Fully parallel mapping of adaptive correlator algorithm.
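The headline efficiencies of the three chips follow directly from the clock rate, operations per clock, and power quoted on the slides. The sketch below redoes that arithmetic; the dictionary keys and function names are illustrative, and the microprocessor's MIPS figure is treated as MOPS for the comparison.

```python
# (clock in MHz, operations per clock, power in mW), as quoted on the slides.
CHIPS = {
    "microprocessor": (450.0, 2, 7000.0),  # 7 Watts
    "dsp": (50.0, 16, 110.0),
    "dedicated": (25.0, 96, 12.0),
}


def mops(f_clk_mhz, n_op):
    """Millions of operations per second: MOPS = f_clk * N_op."""
    return f_clk_mhz * n_op


def mops_per_mw(f_clk_mhz, n_op, power_mw):
    """Energy (power) efficiency in MOPS/mW."""
    return mops(f_clk_mhz, n_op) / power_mw


EFFICIENCY = {name: mops_per_mw(*spec) for name, spec in CHIPS.items()}

if __name__ == "__main__":
    for name, value in EFFICIENCY.items():
        print(f"{name}: {value:.2f} MOPS/mW")
```

Rounded, the results match the 0.13, 7, and 200 MOPS/mW quoted in the slide titles. The dedicated design reaches the highest throughput of the three at the lowest clock rate, because it performs far more operations per clock.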

Memory is More Power-Efficient

[Figure: power density in Watts/cm² (log scale, 1 to 100) for logic and for memory across process generations 0.25µm, 0.18µm, 0.13µm, and 0.1µm.]

Hint: use on-chip caches

Energy Distribution in µP

Integer execution: 19%

Reservation stations: 10%

Reorder buffer: 15%

Memory order buffer: 8%

Data cache: 14%

Branch target buffer: 6%

Floating point execution: 10%

Global clock: 10%

Register alias table: 8%

“useful” (includes local clock)

Efficiency and Performance

• Vdd ↓ → fclock ↓, MOPS ↓, Power ↓, MOPS/mW ↑

• Better metric: Energy × delay
  – Roughly independent of Vdd

Efficiency and Technology

[Figure: energy efficiency in MOPS/mW (log scale, 0.001 to 1000) versus feature size in µm (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07) for three classes of chips: hardwired, DSP, and microprocessors.]

[T. Claasen, ISSCC 1999]

How Low Can You Go?

• Energy required to compute is ZERO

• If computation is quasistatic...

• ...and no information is destroyed (reversible)

Ops/nJ → ∞

Rolf Landauer
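For contrast with the reversible limit, Landauer's principle puts the minimum energy for erasing one bit (an irreversible step) at kT ln 2. The slide does not quote this number, so the sketch below is an added illustration; the 300 K temperature and the function names are assumptions.

```python
import math

# Boltzmann constant in joules per kelvin (exact SI value).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_joules_per_bit(temperature_kelvin=300.0):
    """Minimum energy dissipated when one bit of information is erased."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def max_bit_erasures_per_nj(temperature_kelvin=300.0):
    """Upper bound on irreversible bit erasures per nanojoule."""
    return 1e-9 / landauer_joules_per_bit(temperature_kelvin)


if __name__ == "__main__":
    print(landauer_joules_per_bit())   # joules per erased bit at 300 K
    print(max_bit_erasures_per_nj())   # erasures per nanojoule at 300 K
```

At room temperature the bound is a few zeptojoules per bit, many orders of magnitude below the energy per operation of any chip discussed in the talk.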

Outline

• Introduction

• Power and Energy Efficiency

• Synchronous Hardware Efficiency

• Asynchronous Hardware Efficiency

• ASH Efficiency

• Conclusions

Lutonium Performance

• Asynchronous microcontroller

• Designed and implemented at Caltech

• 0.18 µm technology

• 1.8V supply, 0.4V/0.5V Vth

• 200 MIPS

• 1.8 ops/nJ (DSP-like)

Alain Martin

Efficiency and Supply Voltage

[Figure: Lutonium performance in MIPS (left axis, 0 to 250) and efficiency in MIPS/mW (right axis, 0 to 25) versus supply voltage.]

Supply voltage   Performance (MIPS)   Efficiency (MIPS/mW)
1.8V             200                  1.8
1.1V             100                  4.83
0.9V             66                   7.2
0.8V             48                   10.9
0.5V             4                    23
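The chart's data labels also give power, energy per instruction, and the energy × delay product at each supply voltage. The sketch below derives them; pairing each label with a voltage assumes performance falls and efficiency rises monotonically as the voltage drops, and all names are illustrative.

```python
# (supply voltage in V, performance in MIPS, efficiency in MIPS/mW).
# The pairing of chart labels with voltages is an assumption (monotonic order).
LUTONIUM_POINTS = [
    (1.8, 200.0, 1.8),
    (1.1, 100.0, 4.83),
    (0.9, 66.0, 7.2),
    (0.8, 48.0, 10.9),
    (0.5, 4.0, 23.0),
]


def power_mw(mips, mips_per_mw):
    """Power drawn at a given operating point, in milliwatts."""
    return mips / mips_per_mw


def energy_per_instruction_nj(mips_per_mw):
    """1 MIPS/mW equals 1 instruction per nJ, so energy is the reciprocal."""
    return 1.0 / mips_per_mw


def delay_per_instruction_us(mips):
    """Average time per instruction in microseconds."""
    return 1.0 / mips


def energy_delay_product(mips, mips_per_mw):
    """Energy x delay per instruction, in nJ * microseconds."""
    return energy_per_instruction_nj(mips_per_mw) * delay_per_instruction_us(mips)


if __name__ == "__main__":
    for vdd, mips, eff in LUTONIUM_POINTS:
        print(vdd, power_mw(mips, eff), energy_delay_product(mips, eff))
```

Under this pairing, the energy × delay product stays within a factor of about 1.5 between 1.8V and 0.8V while efficiency alone changes by about 6x, which is in line with the earlier remark that energy × delay is roughly independent of Vdd. The 0.5V point, close to the threshold voltages, departs from that trend.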

Async Processor Breakdown

ALU: 2%

Registers: 14%

Decode: 24%

I-Mem: 24%

I-Fetch: 24%

Slack: 6%

Buses: 2%

PSW: 4%

“useful”

Outline

• Introduction

• Power and Energy Efficiency

• Synchronous Hardware Efficiency

• Asynchronous Hardware Efficiency

• ASH Efficiency

• Conclusions

Application-Specific Hardware

C code

Compiler for Application Specific Hardware

Asynchronous Circuits

Memory

Tool-Flow

C

CASH core

Verilog back-end

Synopsys, Cadence P/R

ASIC

180nm std. cell library, 2V

~1999 technology

Mediabench kernels (1 hot function/benchmark)

Memory

Caveat

Memory

we model this part accurately

optimistic speed model, no power accounting

ASH Performance

[Figure: ASH performance in megaoperations per second (0 to 3000) for the benchmarks adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d, with three series: MOPSall, MOPSspec, and MOPS.]

ASH vs 600MHz CPU

ASH Area

[Figure: ASH area in square mm (0 to 9) for the benchmarks adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d, with an annotation reading "minimal RISC core".]

Normalized Area

[Figure: normalized area in source lines per square mm (0 to 100) for the benchmarks adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d, with an annotation reading "many C macros".]

ASH Energy Efficiency

[Figure: ASH energy efficiency in useful operations per nJ (0 to 70) for the benchmarks adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d.]

All Together Now

[Figure: energy efficiency (MOPS/mW or OP/nJ) on a log scale from 0.01 to 1000, with ranges shown for general-purpose DSP, dedicated hardware, ASH media kernels, the asynchronous microcontroller, and microprocessors.]

Conclusions

• Performance comes at a price

• Energy efficiency is expressed in ops/nJ or MOPS/mW

• Dedicated hardware is more power-efficient than microprocessors

• ASH efficiency is competitive with dedicated hardware