Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel...


Page 1: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Speed and Power Trade-offs: Applied to Adder Design

Vojin G. Oklobdzija, Ram Krishnamurthy
Intel AMR / ACSEL Laboratory
Intel Corp / University of California Davis
www.ece.ucdavis.edu/acsel

From: Tutorial Presentation, 16th International Symposium on Computer Arithmetic
Santiago de Compostela, SPAIN
June 18, 2003

Page 2: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

June 18, 2003 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN

Issues to be addressed

• How do we compare different topologies for their efficiency?

• How do we estimate the speed and efficiency of our algorithm?

• What criteria should we use when developing a new algorithm?

• How does power enter into this equation?

Page 3: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Additional Issues

• Determine which topology is the best for a given power or delay budget

• Determine which topology can stretch the furthest in terms of speed or power

Page 4: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Metric

Page 5: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Previously used estimates

Counting the number of gates (logic levels): not accurate

[Figure: carry-lookahead adder with carries C_in, C_4, C_8, C_12, C_16, C_20, C_24, C_28, C_out, built from: individual adders (inputs a_i, b_i) generating g_i, p_i, and sum S_i; carry-lookahead blocks of 4 bits generating G_i, P_i, and C_in for the adders; carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i, and C_in for the 4-bit blocks; and a group producing the final carry C_out and C_16.]

Critical path delay = 1Δ (for g_i, p_i) + 2×2Δ (for G, P) + 3×2Δ (for C_in) + 1 XOR-Δ (for Sum) = approx. 12Δ of delay

Page 6: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Critical path in Motorola's 64-bit CLA

Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63

[Figure: block diagram of the 64-bit CLA, built from PG blocks and a carry block. Bit-level G and P signals are merged into group signals (G3:0, G7:4, G11:8, G15:12; G7:0, G11:0, G15:0; G31:16, G31:0; G47:32, G47:0; G51:48, G55:52, G59:56; G55:48, G59:48; G63:60, G63:48, G63:0, with matching P signals), which produce the carries C0, C4, C8, C12, C16, C32, C48, C52, C56, C60, C61, C62, C63, and C64.]

Page 7: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Motorola's 64-bit CLA

Modified PG Block

Intermediate propagate signals Pi:0 are generated to speed up C3

Page 8: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)

Page 9: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Delay Comparison: Variable Block Adder

(Oklobdzija, Barnes: IBM 1985)

Delay Complexity

Page 10: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Design Objective

• Design takes time:
– finding results afterward is not of much value

• There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation
– we want estimates that are as close as possible to the measured results

• A simple tool that can evaluate different design trade-offs for a given technology is needed

• Power trade-off is the most important
– speed and power are tradable

Page 11: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Logical Effort Theory

• “Back of the Envelope” complexity: good for estimating speed

• Gate delay = linear function of load
– Slope: logical effort (gate driving characteristics)
– Intercept: parasitic delay (gate internal load)

• “Logical Effort” accuracy is not sufficient
– We needed to extend and refine the method
– However, that becomes more than “Back of the Envelope”

• Logical Effort does not account for possible power-delay trade-offs

Page 12: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Logical Effort Theory

• Excel: a platform of choice (ARITH-16)
– Simple enough
– Can provide computation quickly
– Easy to enter a given design

• Technology characterization is needed:
– This needs to be done only once: available for every design afterwards
– Domino gate = 2 stages, dynamic and static
  • Different driving characteristics of these stages
  • Multi-output gate (carry-look-ahead, Ling/conditional sum)

• Energy model needs to be included

Page 13: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Energy Motivation

• AGUs: performance and peak-current limiters
• High activity: thermal hotspot
• Goal: high-performance energy-efficient design

[Figure: processor thermal map with a temperature scale, Temp (°C). Labels: Execution core, 120°C, Cache, AGU.]

*courtesy of Intel Corp.

Page 14: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Critical Paths of Representative 64-bit Adders

Page 15: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Kogge-Stone Adder

• Critical path = PG + 5 + XOR = 7 gate stages
• Generate, Propagate fanout of 2, 3
• Maximum interconnect spans 16b
• Energy inefficient

[Figure: Kogge-Stone prefix tree over bit positions 0 to 31, with a PG row, rows of carry-merge gates, and an XOR row.]

Page 16: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Sparse-tree Adder Architecture

• Generate every 4th carry in parallel
• Side-path: 4-bit conditional sum generator
• 73% fewer carry-merge gates: energy-efficient

[Figure: sparse carry tree over bit positions 0 to 31 producing carries C3, C7, C11, C15, C19, C23, C27.]

Page 17: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Kogge-Stone adder (8-stage)

Stage   Logical Effort (G)   Branch Effort (B)   Int. Pitch (C)   Effective Branch Effort (B + I·C)   Parasitic Comp.
PG      0.6                  2                   1                2.1                                 1.3
CM0     1.48                 2                   2                2.2                                 2.5
CM1     0.59                 2                   4                2.4                                 1.6
CM2     1.48                 2                   8                2.8                                 2.5
CM3     0.59                 2                   16               3.6                                 1.6
CM4     1.48                 1                   0                1.0                                 2.5
XOR     1.69                 1                   0                1.0                                 3.0
Inv     1                    1                   0                1.0                                 1.0

Path Branch Effort = ΠBi:    108.92
Path Logical Effort = ΠGi:   1.14
Path Effort:                 124.63
Path Delay (ps):             93.97

Design Parameters
Adder Pitch (um):               10
Interconnect Cap (fF/um):       0.157
Gate Cap (fF/um):               1.15
Avg inp. Cap / gate (um):       14
% int to gate cap / pitch (I):  10%
Inv. L.E.:                      2.24
Parasitic delay:                3.8

D = 8 × (GBH)^(1/8) × 2.2 + 3.8 × P
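The Kogge-Stone path totals can be re-derived from the per-stage table and the design parameters. The sketch below is a minimal Python illustration, not the authors' spreadsheet. It assumes that I is the wire capacitance of one bit pitch divided by the average gate input capacitance (about 9.75%, shown rounded to 10% on the slide) and that the path electrical effort H is 1; the names `I_RATIO`, `STAGES`, and `path_summary` are hypothetical. Under those assumptions the path branch effort, path logical effort, and path effort come out at the slide's 108.92, 1.14, and 124.63; the delay lands near, but not exactly at, the slide's 93.97 ps because the table entries are rounded.

```python
import math

# Design parameters as listed on the slide.
ADDER_PITCH_UM = 10.0        # adder bit pitch (um)
WIRE_CAP_FF_PER_UM = 0.157   # interconnect capacitance (fF/um)
GATE_CAP_FF_PER_UM = 1.15    # gate capacitance (fF/um)
AVG_GATE_WIDTH_UM = 14.0     # average input capacitance per gate, as width (um)

# Assumption: I = wire cap of one bit pitch / average gate input cap.
I_RATIO = (ADDER_PITCH_UM * WIRE_CAP_FF_PER_UM) / (AVG_GATE_WIDTH_UM * GATE_CAP_FF_PER_UM)

# (stage, logical effort G, branch effort B, wire span C in bit pitches, parasitic)
STAGES = [
    ("PG",  0.60, 2, 1,  1.3),
    ("CM0", 1.48, 2, 2,  2.5),
    ("CM1", 0.59, 2, 4,  1.6),
    ("CM2", 1.48, 2, 8,  2.5),
    ("CM3", 0.59, 2, 16, 1.6),
    ("CM4", 1.48, 1, 0,  2.5),
    ("XOR", 1.69, 1, 0,  3.0),
    ("Inv", 1.00, 1, 0,  1.0),
]


def path_summary(stages, i_ratio, h_path=1.0, inv_effort_ps=2.2, inv_parasitic_ps=3.8):
    """Evaluate D = N * (GBH)^(1/N) * inv_effort + inv_parasitic * P for a path."""
    n = len(stages)
    path_g = math.prod(g for _, g, _, _, _ in stages)
    # Effective branch effort per stage is B + I*C (wire span folded into branching).
    path_b = math.prod(b + i_ratio * c for _, _, b, c, _ in stages)
    path_f = path_g * path_b * h_path   # h_path = 1 is an assumption
    parasitic = sum(p for _, _, _, _, p in stages)
    delay_ps = n * path_f ** (1.0 / n) * inv_effort_ps + inv_parasitic_ps * parasitic
    return {
        "path_logical": path_g,
        "path_branch": path_b,
        "path_effort": path_f,
        "delay_ps": delay_ps,
    }


summary = path_summary(STAGES, I_RATIO)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```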

Page 18: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

MXA2 – Architecture & Result

• Multiplexer-based
• Generate carries using radix-2 (P,G)
• 4-bit conditional sum selected by carries
• 4-b cell width = 17 μm
• 9-stage critical path
– Per-stage effort = 3.7
– Total effort delay = 33.3
– Total parasitic = 22.5
– Total delay = 55.8

[Figure: 64-bit MXA2 organization with sixteen PG4 blocks and sixteen S4 blocks covering bits 0..3 through 60..63. Insets: the PG group circuit (inputs a0..a3, b0..b3; signals G01, G23, G03, P23, P03, p3) and the multiplexer-based conditional-sum circuit producing Sum0 to Sum3 from p0..p3, g0..g2, G01, and Cin.]

Page 19: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

HC2 – Architecture

• Generate even carries using radix-2 (P,G)
• Generate odd carries from even carries
• CMOS adder for sum
• 1-b cell width 4 μm
• 10-stage critical path

[Figure: 64-bit HC2 prefix network over bits 0 to 63 (levels L1 to L6, then Odd and Sum rows), and the gate-level critical path for even and odd bits: (p,g) generation with XOR2 and NAND2, carry-merge stages CM0 to CM6 built from NAND2/AOI and NOR2/OAI gates, and XOR2 gates for the sum.]

Page 20: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

HC2 – Circuits & Results

[Figure: circuits for the bit-level g, p generation from a, b; the carry-merge G gate (inputs p_i, g_i-1, g_i) and P gate (inputs p_i, p_i-1); the sum from P and Cin; and clocked (CK) versions with inputs Ai, Bi, Gi, Gi-1, Pi, Pi-1.]

          Per-Stage Effort   Total Effort Delay   Total Parasitic   Total Delay
Static    2.8τ               28.0τ                34.5τ             62.5τ

Page 21: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

KS2 – Architecture & Results

• Generate carries using radix-2 (P,G)
• CMOS adder for sum
• Similar circuits as HC2
• 1-b cell width 4 μm
• 9-stage critical path

          Per-Stage Effort   Total Effort Delay   Total Parasitic   Total Delay
Static    3.0τ               27.0τ                30.6τ             57.6τ
Dynamic   2.11τ              19.0τ                23.6τ             42.6τ

[Figure: 64-bit KS2 prefix network over bits 0 to 63, levels L1 to L6, followed by Inv and Sum rows.]

Page 22: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

KS4 – Architecture

• Generate carries using redundant radix-4 (P,G)
• Dynamic circuit
• 1-b cell width 4 μm
• 6-stage critical path

[Figure: 64-bit KS4 prefix network over bits 0 to 63 with G4/P4 and G16/P16 levels, followed by Co and Sum.]

Page 23: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

KS4 – Circuits & Result

[Figure: dynamic (clocked, CK) circuits for G4 and P4 from inputs A0..A3, B0..B3; for G16 and P16 from g0..g3 and p1..p3; and the sum stage (signals G0..G3, P1..P3, HS, HSN, STB, Sum).]

          Per-Stage Effort   Total Effort Delay   Total Parasitic   Total Delay
Dynamic   2.3τ               13.8τ                16.3τ             30.1τ

Page 24: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

CLA4 – Architecture

• Generate carries using radix-4 (P,G,C)
• 1-b cell width 4 μm
• 15-stage critical path

[Figure: (P,G,C) network of PGC blocks over bits b0 to b63, with the G-path and P-path marked, producing carries C4, C8, C12, C16, C20, C24, C28, C32, C36, C40, C44, C48, C52, C56, C60 from Cin = C0.]

Page 25: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

CLA4 – Circuits & Result

[Figure: dynamic (clocked, CK) circuits generating G, P, K from A, B and their complements AN, BN; the sum stage (signals Ci, CiN, STB, pg, Sum); and a 4-bit block taking G0..G3, P0..P3, and C0 and producing P1:0, P2:0, P3:0, G1:0, G2:0, G3:0 and carries C1, C2, C3.]

          Per-Stage Effort   Total Effort Delay   Total Parasitic   Total Delay
Dynamic   1.4τ               21.0τ                33.3τ             54.3τ

Page 26: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

LNG4 – Architecture

• Generate carries using Ling pseudo-carries
• Conditional sums selected by local & long carries
• 1-b cell width 5.1 μm; 9-stage critical path

Page 27: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

LNG4 – Circuits & Result

[Figure: dynamic (clocked, CK) circuits for G3, G4, and P4 from inputs A0..A3, B0..B3; the local-carry gate (LC) from G0, G1, G2, P1, P2; and the conditional-sum stages producing SumH and SumL from P, G, K and the select signals LCH, LCL, C0H, C0L, C1H, C1L.]

          Per-Stage Effort   Total Effort Delay   Total Parasitic   Total Delay
Dynamic   2.4τ               21.6τ                22.3τ             43.9τ
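The result rows reported for the individual adders all follow the same bookkeeping: total effort delay = number of stages × per-stage effort, and total delay = total effort delay + total parasitic. The short Python sketch below re-checks that arithmetic using the figures quoted on the adder slides (all in units of τ). The list name `RESULTS` and the helper `total_delay` are hypothetical, introduced only for this illustration.

```python
# (adder, circuit style, stages, per-stage effort, total parasitic, reported total delay)
# Values are copied from the adder slides; MXA2's style is not stated on its slide.
RESULTS = [
    ("MXA2", None,      9,  3.7,  22.5, 55.8),
    ("HC2",  "static",  10, 2.8,  34.5, 62.5),
    ("KS2",  "static",  9,  3.0,  30.6, 57.6),
    ("KS2",  "dynamic", 9,  2.11, 23.6, 42.6),
    ("KS4",  "dynamic", 6,  2.3,  16.3, 30.1),
    ("CLA4", "dynamic", 15, 1.4,  33.3, 54.3),
    ("LNG4", "dynamic", 9,  2.4,  22.3, 43.9),
]


def total_delay(stages, per_stage_effort, total_parasitic):
    """Total delay in tau: N * f + P."""
    return stages * per_stage_effort + total_parasitic


for name, style, n, f, p, reported in RESULTS:
    computed = total_delay(n, f, p)
    label = f"{name} ({style})" if style else name
    print(f"{label}: computed {computed:.1f}, reported {reported:.1f}")
```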

Page 28: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Results from Simulation

• Fairly consistent with logical effort analysis
• Per-stage delay
– 1.4 FO4 (static)
– 0.8 FO4 (dynamic)

[Figure: bar chart, vertical axis "HSPICE & Difference (FO4)" from 0 to 16, one group per adder (axis labels: KS, CS, HC, KS-4, KS-2, Ling, HC, CLA), with the differences annotated.]

Type      Adder   # Stages   LE (FO4)   SPICE (FO4)   Diff (FO4)
Static    KS2     9          11.8       10.9          -0.88
Static    MX2     9          11.4       12.8          1.41
Static    HC2     10         12.8       13.3          0.46
Dynamic   KS4     6          6.2        7.4           1.27
Dynamic   KS2     9          8.7        9.2           0.44
Dynamic   LNG4    9          9.0        9.5           0.51
Dynamic   HC2     10         9.8        9.9           0.08
Dynamic   CLA4    16         11.4       14.2          2.74

Page 29: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Delay of Representative 64-b Adders

[Figure: bar chart of total delay (FO4), 0 to 12, for MXA2, HC2, KS2, QTA2, KS4, and LNG4; series: Static and Dynamic.]

Page 30: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

What happens when power is considered?

[Figure: energy vs. delay curves for Adder A and Adder B, with design points A and B and the plane divided into Region 1 and Region 2.]

Page 31: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

What happens when power is considered?

[Figure: energy vs. delay curves for Adder A and Adder B, with Region 1 and Region 2, design points A, B, A′, B′, A″, B″, and the speed of A and speed of B marked on the delay axis. Annotations: "A is faster"; "Less power"; "Point where B becomes better than A"; "With better E-D tradeoff B can achieve more speed with less power than A".]

• Must look at Energy-Delay Space of designs

Page 32: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Energy-Delay Space

[Figure: energy vs. delay curves for different adders, bounded by Emin and Dmin, with the speed barrier and the power limit marked.]

Page 33: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Logical Effort in Energy-Delay Space

[Figure: energy vs. total delay curve with the LE point marked, lower stage effort on one side and higher stage effort on the other, and the annotation "Most design approaches focus here".]

• It is possible to lower energy by trading delay? or …

Page 34: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Logical Effort

Page 35: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Delay in a Logic Gate

Delay of a logic gate has two components:

d = f + p
(f: effort delay, or stage effort; p: parasitic delay)

f = gh
(g: logical effort; h: electrical effort = Cout/Cin)

• Logical effort describes the relative ability of a gate topology to deliver current (defined to be 1 for an inverter)
• Electrical effort is the ratio of output to input capacitance; it is also called “fanout”

*from Mathew Sanu / D. Harris

Page 36: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Logical Effort Parameters: Inverter

• d = gh + p
• Delay increases linearly with fanout
• More complex gates have greater g and p

[Figure: inverter delay (0 to 16) vs. fanout h = Cout/Cin (0 to 6), with the line d = gh + p, g = 2.2 (logical effort), and p = 3.8 ps (parasitic delay).]

*from Mathew Sanu / D. Harris

Page 37: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Normalized Logical Effort: Inverter

• Define delay of unloaded inverter = 1
• Define logical effort ‘g’ of inverter = 1
• Delay of complex gates can be defined w.r.t. d = 1

[Figure: normalized delay d (1 to 6) vs. fanout h = Cout/Cin (1 to 5), split into parasitic delay and effort delay. For the inverter: g = 1, p = 1, d = gh + p = h + 1.]

*from Mathew Sanu / D. Harris

Page 38: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Computing Logical Effort

DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current

• Measured from delay vs. fanout plots of simulated gates
• Or estimated, counting capacitance in units of transistor W

*from Mathew Sanu / D. Harris

Page 39: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

L.E. for Adder Gates

[Figure: delay (ps), 0 to 35, vs. fanout, 0 to 6, for Inverter, Static CM, Dyn PG, Dyn CM, and Mux.]

• Logical effort parameters obtained from simulation for std cells
• Define logical effort ‘g’ of inverter = 1
• Delay of complex gates can be defined w.r.t. d = 1

*from Mathew Sanu / D. Harris

Page 40: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Normalized L.E.

• Logical effort & parasitic delay normalized to that of inverter

Gate type     Logical Eff. (g)   Parasitics (Pinv)
Inverter      1                  1
Dyn. Nand     0.6                1.34
Dyn. CM       0.6                1.62
Dyn. CM-4N    1                  3.71
Static CM     1.48               2.53
Mux           1.68               2.93
XOR           1.69               2.97

*from Mathew Sanu
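As a small illustration of how the normalized values are used, the Python sketch below evaluates d = gh + p for each gate type at a chosen fanout h, in units of the inverter's normalized delay. The dictionary `GATES` and the function `gate_delay` are hypothetical names; the numbers are the ones in the table.

```python
# Normalized (logical effort g, parasitic delay p) per gate type, from the table.
GATES = {
    "Inverter":   (1.00, 1.00),
    "Dyn. Nand":  (0.60, 1.34),
    "Dyn. CM":    (0.60, 1.62),
    "Dyn. CM-4N": (1.00, 3.71),
    "Static CM":  (1.48, 2.53),
    "Mux":        (1.68, 2.93),
    "XOR":        (1.69, 2.97),
}


def gate_delay(gate, h):
    """Normalized gate delay d = g*h + p at electrical effort (fanout) h."""
    g, p = GATES[gate]
    return g * h + p


# Delay of every gate type when driving a fanout of 4.
for name in GATES:
    print(f"{name}: d = {gate_delay(name, 4):.2f}")
```

With these normalized values, an inverter driving a fanout of 4 has d = 1 × 4 + 1 = 5.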

Page 41: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Delay of a string of gates

• Delay of a path: D = Σ d_i = Σ g_i h_i + Σ p_i

• g_i and p_i are constants

• To minimize path delay, optimal values of h_i are to be determined

D is minimized when each stage bears the same effort, i.e. g_i h_i = g_(i+1) h_(i+1)

*from Mathew Sanu / D. Harris

Page 42: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Minimizing path delay

• Logical effort of a string of gates: G = Π g_i

• Path electrical effort: H = Π h_i = Cout(path) / Cin(path)

• Branching effort: b = (Con-path + Coff-path) / Con-path

• Path branching effort: B = Π b_i

• Path effort: F = GBH

Delay is minimized when each stage bears the same effort:

f = g_i h_i = F^(1/N)

The minimum delay of an N-stage path is: N F^(1/N) + P

*from Mathew Sanu / D. Harris
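A minimal Python sketch of the equal-stage-effort rule follows. It computes F = GBH, the per-stage effort f = F^(1/N), and the minimum delay N·F^(1/N) + P, then compares that with an unequal split of the same path electrical effort. The three-stage path (inverter, static carry-merge, XOR, using the normalized values quoted earlier), the choice H = 8, and the function names are illustrative assumptions, not a design from the slides.

```python
import math


def min_path_delay(g, b, h_path, p):
    """Return (F, f, D) for an N-stage path sized for equal stage effort."""
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * h_path   # F = G * B * H
    stage_effort = path_effort ** (1.0 / n)              # f = F^(1/N)
    delay = n * stage_effort + sum(p)                    # D = N * F^(1/N) + P
    return path_effort, stage_effort, delay


def path_delay(g, h, p):
    """Delay of a path with explicitly chosen per-stage electrical efforts h_i."""
    return sum(gi * hi + pi for gi, hi, pi in zip(g, h, p))


# Hypothetical 3-stage path without branching (b_i = 1) and H = 8.
g = [1.00, 1.48, 1.69]
p = [1.00, 2.53, 2.97]
b = [1, 1, 1]
H = 8.0

F, f, d_min = min_path_delay(g, b, H, p)
h_opt = [f / gi for gi in g]             # equal effort: g_i * h_i = f
d_uneven = path_delay(g, [2.0, 2.0, 2.0], p)  # same H = 2*2*2, unequal efforts

print(f"F = {F:.2f}, f = {f:.2f}, D_min = {d_min:.2f}, D_uneven = {d_uneven:.2f}")
```

The equal-effort sizing keeps the product of the h_i at H while spreading the effort evenly, so its delay is no larger than that of any other split with the same H.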

Page 43: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Inclusion of Wire Delay into Logical Effort

Page 44: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Wiring Load

• Wiring in hand analysis
– Only lumped capacitance included

• Wiring in HSPICE
– Short wire: 1-segment π-model RC network
– Long wire: 4-segment π-model RC network
– Using worst-case wire capacitance

• Wire length
– Estimated from most critical 1-bit pitch

Page 45: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Modeling interconnect cap.

• Include interconnect cap in branching factor

[Figure: a PG gate driving two CM0 gates (one on-path, one off-path) one adder bit pitch apart, drawn without and with the interconnect capacitance Cint.]

Without interconnect:

b = (Con-path + Coff-path) / Con-path = 2

With interconnect:

b = (Con-path + Coff-path + Cint) / Con-path = 2 + Cint / Con-path = 2 + I

I: % int. cap to gate cap in 1 adder bit pitch
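The branching factor with a lumped wire capacitance can be written as a one-line function. The Python sketch below assumes equal on-path and off-path gate capacitances and takes its capacitance values from the Kogge-Stone design parameters quoted earlier (14 um × 1.15 fF/um of gate input, 10 um × 0.157 fF/um of wire per bit pitch); the function name and the idea of scaling Cint by the wire span in bit pitches are assumptions made for illustration.

```python
def branching_with_wire(c_on_path, c_off_path, c_int=0.0):
    """Branching effort b = (Con-path + Coff-path + Cint) / Con-path."""
    return (c_on_path + c_off_path + c_int) / c_on_path


# Capacitances in fF, derived from the design parameters (assumed usage).
C_GATE = 14.0 * 1.15            # average gate input capacitance
C_WIRE_PER_PITCH = 10.0 * 0.157  # wire capacitance of one adder bit pitch

b_no_wire = branching_with_wire(C_GATE, C_GATE)
b_one_pitch = branching_with_wire(C_GATE, C_GATE, C_WIRE_PER_PITCH)
b_sixteen = branching_with_wire(C_GATE, C_GATE, 16 * C_WIRE_PER_PITCH)

print(f"no wire: {b_no_wire:.2f}, 1 pitch: {b_one_pitch:.2f}, 16 pitches: {b_sixteen:.2f}")
```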

Page 46: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Branching

[Figure: input CIN drives two branches: gates g0 and g1 (stage efforts f0, f1) to COUT1, and gates g2 and g3 (stage efforts f2, f3) to COUT2.]

Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies.

Page 47: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Correction on Branching

[Figure: input CIN drives two branches: gates g0 and g1 (stage efforts f0, f1) to COUT1, and gates g2 and g3 (stage efforts f2, f3) to COUT2.]

f0 = f1, f2 = f3

Td1 = (f0 + f1 + parasitics)
Td2 = (f2 + f3 + parasitics)

Minimum delay occurs when Td1 = Td2

Page 48: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

“Real” Branching Calculation

F1 = g0 · g1 · Cout1 / Cin
F2 = g2 · g3 · Cout2 / Cin

B1 = (F1 + F2) / F1 = (g0 · g1 · Cout1 + g2 · g3 · Cout2) / (g0 · g1 · Cout1)

B2 = (F1 + F2) / F2 = (g0 · g1 · Cout1 + g2 · g3 · Cout2) / (g2 · g3 · Cout2)

Branching only equals 2 when:

g0 · g1 · Cout1 = g2 · g3 · Cout2

This explains why we had to resort to Excel!
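One way to read the branching correction, sketched below in Python under stated assumptions: if each two-stage branch is sized so that both branches see the same stage effort (parasitic differences ignored), the input capacitance splits in proportion to g0·g1·Cout1 and g2·g3·Cout2, so the branching seen by branch 1 is 1 + (g2·g3·Cout2)/(g0·g1·Cout1), and symmetrically for branch 2. The function name `real_branching` and the example values are hypothetical.

```python
def real_branching(g0, g1, c_out1, g2, g3, c_out2):
    """Branching factors (b1, b2) for two 2-stage branches sharing one input.

    Assumes both branches are sized for equal stage effort, so each branch's
    share of the input capacitance is proportional to its g * g * Cout product.
    """
    e1 = g0 * g1 * c_out1
    e2 = g2 * g3 * c_out2
    b1 = (e1 + e2) / e1
    b2 = (e1 + e2) / e2
    return b1, b2


# Symmetric branches: the usual branching factor of 2 is recovered.
print(real_branching(1.0, 1.0, 10.0, 1.0, 1.0, 10.0))

# Hypothetical asymmetric loads: branch 2 drives three times the capacitance.
b1, b2 = real_branching(1.0, 1.0, 10.0, 1.0, 1.0, 30.0)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Because the two branches share the same input capacitance, 1/b1 + 1/b2 = 1 in this model, which is a quick sanity check on any pair of values.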

Page 49: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Technology Characterization

Page 50: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

Characterization Setup

• Logical Effort requirements:
– Equalize input and output transitions.

• Logical Effort is characterized by varying the h (Cout/Cin) of a gate. By using a variable load of inverters, each gate can be characterized over the same range of loads.

• The Logical Effort of each gate is characterized for each input.

• Energy is characterized for each output transition of the gate caused by each input transition, e.g. for an inverter, energy is measured for tLH and tHL.

Page 51: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

LE Characterization Setup for Static Gates

[Figure: a chain of four gates driven from In, with a variable load. Measured: tLH, tHL, average, and energy.]

Page 52: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

LE Characterization Setup for Dynamic Gates

[Figure: a chain of two gates driven from In, with a variable load. Measured: tHL and energy.]

Page 53: Speed and Power Trade-offs: Applied to Adder Design: Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University of California.

June 18, 2003 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN

53

LE Table (Static CMOS)

• Technology: P/N ratio = 2; inverter g = 3.67 ps, p = 4.29 ps

• Measured on worst-case single-input switching

Delay (ps) by fan-out:

Fan-out   INV    NAND2  NAND3  NOR2   TGXORi  TGXORs  TGMUXi  TGMUXs  AOI    OAI
2         11.6   16.3   22.2   20.5   34.9    22.3    8.0     26.0    23.2   21.3
3         15.3   20.0   26.6   25.4   42.6    28.2    9.9     33.0    28.5   26.7
4         19.0   24.0   31.2   30.6   50.2    34.2    12.0    39.0    34.1   32.1
6         26.4   32.4   40.6   41.1   64.4    45.7    16.0    53.0    45.3   43.6
8         33.6   40.6   50.0   51.9   79.8    56.5    20.2    68.0    56.7   55.3

g (ps)    3.67   4.08   4.65   5.25   7.43    5.71    2.04    6.97    5.60   5.68
p (ps)    4.29   7.90   12.74  9.77   20.19   11.12   3.85    11.76   11.82  9.69

g (norm)  1.00   1.11   1.27   1.43   2.03    1.56    0.55    1.90    1.52   1.55
p (norm)  1.00   1.84   2.97   2.28   4.71    2.59    0.90    2.74    2.76   2.26
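The g and p rows of the table are consistent with a straight-line fit of delay against fan-out, d = g·h + p. The sketch below is an illustration rather than the authors' tool: it re-derives g and p for three gates by ordinary least squares and normalizes them to the inverter. The function and variable names are hypothetical, and the slides do not say that least squares was the fitting method.

```python
# Least-squares fit of delay = g*h + p for a few static gates.
# Delay values (ps) are copied from the LE table; all names are illustrative.

FANOUTS = [2, 3, 4, 6, 8]
DELAYS_PS = {
    "INV":   [11.6, 15.3, 19.0, 26.4, 33.6],
    "NAND2": [16.3, 20.0, 24.0, 32.4, 40.6],
    "NOR2":  [20.5, 25.4, 30.6, 41.1, 51.9],
}


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def characterize(delays=DELAYS_PS, fanouts=FANOUTS, ref="INV"):
    """Fit g and p (ps) per gate and normalize both to the reference gate."""
    fits = {name: fit_line(fanouts, d) for name, d in delays.items()}
    g_ref, p_ref = fits[ref]
    return {
        name: {"g_ps": g, "p_ps": p, "g_norm": g / g_ref, "p_norm": p / p_ref}
        for name, (g, p) in fits.items()
    }


if __name__ == "__main__":
    for name, row in characterize().items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

The fitted slopes and intercepts land within rounding of the tabulated g (ps) and p (ps) values for these three gates, and the normalized columns follow by dividing by the inverter's g and p.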


Static CMOS Gates: Delay Graphs

[Figure, two panels: Delay (0 to 90) vs. Fanout (0 to 9). Panel 1: INV, NAND2, NAND3, NOR2, AOI, OAI. Panel 2: INV, TGXORi, TGXORs, TGMUXi, TGMUXs.]


Static Gates: Pull-up Delay Graph

[Figure: Delay (0 to 70) vs. Fanout (0 to 9) for INV, NAND2, NAND3, NOR2, AOI, OAI.]


LE Table (Dynamic CMOS)

• Technology:
• Minimum-sized keeper included
• Measured on all-input switching of worst path

Delay (ps) by fan-out:

Fan-out   DN2    DN3    DN4    Dk1ND2  Dk1NR2  DAOI_A  DOAI_O
2         9.9    12.7   16.0   13.7    10.6    10.1    8.8
3         12.6   14.7   19.1   16.7    13.2    12.1    11.3
4         16.0   18.3   23.2   20.7    16.7    14.7    14.0
6         21.7   24.7   30.2   27.9    23.2    20.0    19.2
8         27.3   31.2   37.8   36.1    29.5    24.8    24.0

g (ps)    2.92   3.15   3.65   3.75    3.19    2.49    2.55
p (ps)    4.04   5.82   8.46   5.76    3.95    4.86    3.75

g (norm)  0.80   0.86   1.00   1.02    0.87    0.68    0.69
p (norm)  0.94   1.36   1.97   1.34    0.92    1.13    0.87
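The normalized rows of the dynamic table appear to use the static inverter of the previous table (g = 3.67 ps, p = 4.29 ps) as the reference. The slide does not state this, but the arithmetic supports it. A minimal sketch with hypothetical helper names checks that reading and uses the same g and p to estimate a delay at a given fan-out:

```python
# Normalize dynamic-gate g and p to the static inverter and estimate delays.
# Numbers are copied from the two LE tables; helper names are illustrative.

G_INV_PS, P_INV_PS = 3.67, 4.29  # static inverter (assumed normalization basis)

DYNAMIC_PS = {  # gate: (g in ps, p in ps)
    "DN2":    (2.92, 4.04),
    "DN3":    (3.15, 5.82),
    "DN4":    (3.65, 8.46),
    "Dk1ND2": (3.75, 5.76),
    "Dk1NR2": (3.19, 3.95),
    "DAOI_A": (2.49, 4.86),
    "DOAI_O": (2.55, 3.75),
}


def normalize(table=DYNAMIC_PS, g_ref=G_INV_PS, p_ref=P_INV_PS):
    """Return {gate: (g_norm, p_norm)} relative to the reference inverter."""
    return {name: (g / g_ref, p / p_ref) for name, (g, p) in table.items()}


def delay_ps(gate, fanout, table=DYNAMIC_PS):
    """Linear delay model d = g*h + p, in ps."""
    g, p = table[gate]
    return g * fanout + p


if __name__ == "__main__":
    for name, (g_n, p_n) in normalize().items():
        print(f"{name:7s} g_norm={g_n:.2f} p_norm={p_n:.2f}")
    print("DN2 at fan-out 2:", round(delay_ps("DN2", 2), 1), "ps")
```

The computed ratios agree with the tabulated normalized values to within about 0.01; the small residual is what one would expect if the slide's ratios were taken before g and p were rounded.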


Dynamic CMOS: Delay Graphs

[Figure, two panels: vertical axis 0 to 40, horizontal axis 0 to 10. Panel 1: N2, N3, N4, k1ND2, k1NR2, AOI_A, OAI_O. Panel 2: G4, P4, C4, STBSum.]


Dynamic CMOS: Delay Graphs

[Figure, two panels: vertical axis 0 to 50, horizontal axis 0 to 10. Panel 1: LG3, LP4, G4, P4, LC, Lsum. Panel 2: KSG4, KSP4, KSG16, KSP16, KSSum.]


Energy Calculation


Energy Calculation

[Figure, two panels: 8X Minimal Size Dyn-NAND; 16X Minimal Size Dyn-NAND.]


Energy Calculation

Offset (parasitic + wiring energy) vs. Size (in multiples of the gate size)

[Figure: Offset (0 to 60) vs. Gate Size (x) (0 to 45), with data points and linear fits for inv, dgck, oai_o, daoi, tgxor, aoi_o, na2s, tgmuxs.]

Fitted lines as printed on the plot:

y = 0.8931x + 4.6411
y = 1.1413x + 10.22
y = 1.6382x + 11.988
y = 0.5538x + 12.338
y = 3.89x + 14.5
y = 1.9595x + 9.621
y = 1.2559x + 6.762
y = 1.0592x + 1.71


Energy Calculation

[Figure: 3-D plot for the Inverter. Energy [fJ] (0 to 140) vs. Load [u] (12, 18, 24, 36, 48) and Size (2.5, 5, 7.5, 10).]


Energy Calculation

INV

      Output Capacitance (u)                  Energy [fJ]
M     1      5      10     15     20          1         5         10        15        20
0     1.12   5.6    11.2   16.8   22.4        2.51E+00  1.26E+01  2.51E+01  3.77E+01  5.02E+01
1     2.24   11.2   22.4   33.6   44.8        3.70E+00  1.85E+01  3.70E+01  5.54E+01  7.39E+01
2     3.36   16.8   33.6   50.4   67.2        4.85E+00  2.42E+01  4.85E+01  7.27E+01  9.70E+01
3     4.48   22.4   44.8   67.2   89.6        6.16E+00  3.08E+01  6.16E+01  9.24E+01  1.23E+02
4     5.6    28     56     84     112         7.45E+00  3.73E+01  7.45E+01  1.12E+02  1.49E+02
5     6.72   33.6   67.2   100.8  134.4       8.74E+00  4.37E+01  8.74E+01  1.31E+02  1.75E+02
6     7.84   39.2   78.4   117.6  156.8       1.02E+01  5.08E+01  1.02E+02  1.52E+02  2.03E+02
7     8.96   44.8   89.6   134.4  179.2       1.15E+01  5.75E+01  1.15E+02  1.72E+02  2.30E+02
8     10.08  50.4   100.8  151.2  201.6       1.27E+01  6.36E+01  1.27E+02  1.91E+02  2.54E+02
9     11.2   56     112    168    224         1.42E+01  7.08E+01  1.42E+02  2.13E+02  2.83E+02
10    12.32  61.6   123.2  184.8  246.4       1.55E+01  7.76E+01  1.55E+02  2.33E+02  3.10E+02
11    13.44  67.2   134.4  201.6  268.8       1.69E+01  8.44E+01  1.69E+02  2.53E+02  3.37E+02
12    14.56  72.8   145.6  218.4  291.2       1.81E+01  9.05E+01  1.81E+02  2.71E+02  3.62E+02
13    15.68  78.4   156.8  235.2  313.6       1.97E+01  9.85E+01  1.97E+02  2.96E+02  3.94E+02
14    16.8   84     168    252    336         2.09E+01  1.04E+02  2.09E+02  3.13E+02  4.18E+02
15    17.92  89.6   179.2  268.8  358.4       2.26E+01  1.13E+02  2.26E+02  3.39E+02  4.52E+02
16    19.04  95.2   190.4  285.6  380.8       2.39E+01  1.20E+02  2.39E+02  3.59E+02  4.79E+02
17    20.16  100.8  201.6  302.4  403.2       2.53E+01  1.27E+02  2.53E+02  3.80E+02  5.06E+02
18    21.28  106.4  212.8  319.2  425.6       2.67E+01  1.34E+02  2.67E+02  4.01E+02  5.34E+02
19    22.4   112    224    336    448         2.81E+01  1.40E+02  2.81E+02  4.21E+02  5.61E+02

Energy Factors (slide labels: Multiplier Factor, Output Capacitance Factor): 1.211300121, 7.39E-01

NAND-2
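The inverter energies in the table grow almost linearly with output capacitance. The sketch below fits a straight line to the first energy column (header "1") against its output capacitance, which steps by 1.12u per row. Names are hypothetical and the fitting method is an assumption. The resulting slope (about 1.21 fJ/u) and intercept (about 0.74 fJ) are close to the two "Energy Factors" numbers printed on the slide, which suggests, though the slide does not say so, that those numbers are a fitted slope and offset.

```python
# Linear fit of inverter energy vs. output capacitance (first table column).
# Data copied from the Energy Calculation table; names are illustrative.

UNIT_CAP_U = 1.12  # output-capacitance step per table row, in u
ENERGY_FJ = [2.51, 3.70, 4.85, 6.16, 7.45, 8.74, 10.2, 11.5, 12.7, 14.2,
             15.5, 16.9, 18.1, 19.7, 20.9, 22.6, 23.9, 25.3, 26.7, 28.1]


def fit_energy(energies=ENERGY_FJ, unit_cap=UNIT_CAP_U):
    """Return (slope in fJ/u, intercept in fJ) of the least-squares line."""
    caps = [unit_cap * (m + 1) for m in range(len(energies))]  # rows M = 0..19
    n = len(caps)
    mean_c = sum(caps) / n
    mean_e = sum(energies) / n
    scc = sum((c - mean_c) ** 2 for c in caps)
    sce = sum((c - mean_c) * (e - mean_e) for c, e in zip(caps, energies))
    slope = sce / scc
    return slope, mean_e - slope * mean_c


def estimate_energy_fj(c_out_u, slope, intercept):
    """Energy estimate for a given output capacitance using the fitted line."""
    return slope * c_out_u + intercept


if __name__ == "__main__":
    slope, intercept = fit_energy()
    print(f"slope = {slope:.3f} fJ/u, intercept = {intercept:.3f} fJ")
    print(f"E(11.2u) ~ {estimate_energy_fj(11.2, slope, intercept):.1f} fJ")
```

The remaining energy columns are, to within rounding, the first column multiplied by the column header (5, 10, 15, 20), so the same line can be rescaled for them.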


Examples


64-Bit Adders

• Han-Carlson (prefix-2, HC2): Static and Dynamic

• Han-Carlson (prefix-2, HC2-2): Dynamic-Static

• Kogge-Stone (prefix-2, KS2): Static and Dynamic

• Kogge-Stone (prefix-2, KS2-2): Dynamic-Static

• Quaternary-Tree (prefix-2, QT2): Static and Dynamic

Wire delay included: t_delay = 0.7 · R_wire · C_wire

Wire energy included: E_w = C_wire · V^2

Len (um)     10    20    30    40    60    80    120   160   240   320   480
Delay (ps)   0.01  0.04  0.09  0.17  0.38  0.67  1.50  2.67  6.01  10.7  24.1
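Because both R_wire and C_wire scale with length, the 0.7·R·C delay grows with the square of the wire length, and the table follows that trend. The sketch below is an illustration under stated assumptions: it fits a single coefficient k in delay = k·L² to the tabulated points and includes the wire-energy expression. The function names and the example capacitance and supply values are hypothetical; the slides give neither the per-unit-length resistance and capacitance nor the fitting procedure.

```python
# Quadratic wire-delay model fitted to the slide's length/delay table.
# Table values copied from the slide; everything else is illustrative.

LENGTH_UM = [10, 20, 30, 40, 60, 80, 120, 160, 240, 320, 480]
DELAY_PS = [0.01, 0.04, 0.09, 0.17, 0.38, 0.67, 1.50, 2.67, 6.01, 10.7, 24.1]


def fit_quadratic_coeff(lengths=LENGTH_UM, delays=DELAY_PS):
    """Least-squares k (ps/um^2) for delay = k * L**2, i.e. k = 0.7*r*c."""
    num = sum((l ** 2) * d for l, d in zip(lengths, delays))
    den = sum(l ** 4 for l in lengths)
    return num / den


def wire_delay_ps(length_um, k):
    """Wire delay in ps for a wire of the given length."""
    return k * length_um ** 2


def wire_energy_fj(c_wire_ff, vdd):
    """E_w = C_wire * V^2; capacitance in fF and voltage in V give fJ."""
    return c_wire_ff * vdd ** 2


if __name__ == "__main__":
    k = fit_quadratic_coeff()
    print(f"k = {k:.4e} ps/um^2")
    for length, measured in zip(LENGTH_UM, DELAY_PS):
        print(f"{length:4d} um: model {wire_delay_ps(length, k):6.2f} ps, table {measured} ps")
    # Hypothetical wire: 100 fF at an assumed 1.2 V supply.
    print("E_w =", round(wire_energy_fj(100.0, 1.2), 1), "fJ")
```

A single coefficient of roughly 1.04e-4 ps/um² tracks every table entry to within a few hundredths of a picosecond.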


Test Setup

[Figure: 64-bit adder with inputs A0 to A63 and outputs S0 to S63; each output drives a 1 mm wire modeled as Cwire.]

H = (Cin + Cwire)/Cin
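As a rough illustration of how H changes over the input range used in the later plots, the sketch below evaluates H = (Cin + Cwire)/Cin with Cwire set to 160u, the gate-capacitance equivalent of the 1 mm wire quoted on the Energy vs. Delay slides. The minimum input capacitance of 2.0u and the function names are assumptions made for this example only.

```python
# Electrical effort seen in the test setup: H = (Cin + Cwire) / Cin.
C_WIRE_U = 160.0  # 1 mm wire as equivalent gate capacitance (u), from the slides


def electrical_effort(c_in_u, c_wire_u=C_WIRE_U):
    """Return H = (Cin + Cwire) / Cin for an input capacitance given in u."""
    if c_in_u <= 0:
        raise ValueError("input capacitance must be positive")
    return (c_in_u + c_wire_u) / c_in_u


def sweep_effort(c_min_u, multiples=(1, 2, 5, 10, 20, 50)):
    """H for Cin swept from c_min_u up to 50 * c_min_u (c_min_u is hypothetical)."""
    return [(m, electrical_effort(m * c_min_u)) for m in multiples]


if __name__ == "__main__":
    for multiple, h in sweep_effort(c_min_u=2.0):  # 2.0u is an assumed minimum input
        print(f"Cin = {multiple:>2} x min -> H = {h:.1f}")
```

Larger inputs reduce H sharply, which is why sweeping Cin from the minimum to 50 times the minimum traces out a whole energy-delay curve for each adder rather than a single point.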


Energy-Delay Estimates


Adders: Energy

Energy vs. Delay

Cout = 1 mm wire (160u gate cap), for Cin = ~minimum input to 50 × minimum input

[Figure: Energy [pJ] (0 to 900) vs. Delay [ps] (0 to 300). Series: HC Dynamic (2-2), KS Dynamic (2-0), HC Dynamic (2-0), KS Dynamic (2-2), KS Static Prefix 2, HC Static Prefix 2, Quaternary Dynamic (2-2), Quaternary Static. Curve groups annotated: Dynamic (KS, HC); Static (QT, KS, HC); Dynamic-Static.]


Dynamic-Static Implementation of Carry-Merge Stage

[Figure, two schematics: Regular Domino Implementation and Compound-Domino Implementation. Both use clocked (Clk) dynamic gates with inputs Gi, Gi-1, Pi; Gi-2, Gi-3, Pi-2; and Pi-1, Pi. Other labels: VDD, Delayed Clk, Static Gate, "inverters to be eliminated".]


Energy-Delay comparison of 64-bit KS, HC and QT adders

[Figure: Normalized Energy (0 to 3) vs. Normalized Delay (0.9 to 2.1). Series: QT Static, HC Static, KS Static, QT compound-domino, HC compound-domino, KS compound-domino.]


Adders: Critical Path Energy

Critical Path Energy vs. Delay (no internal wire energy)

Cout = 1 mm wire (160u gate cap), for Cin = ~minimum input to 50 × minimum input

[Figure: Energy [fJ] (0 to 12000) vs. Delay [ps] (0 to 300). Series: HC Dynamic (2-2), KS Dynamic (2-0), HC Dynamic (2-0), KS Dynamic (2-2), KS Static Prefix 2, HC Static Prefix 2, Quaternary (2-2), Quaternary Static (2-2). Curves annotated: QT dynamic-static, HC dynamic-static, KS dynamic-static, HC dynamic, KS dynamic, QT static, HC static, KS static.]


Intel 32-bit Adder 0.13u 1.2V [VLSI-2002]

Comparison with Intel Measured Data

[Figure: Energy [fJ] (0 to 50) vs. Delay [ps] (0 to 200). Series: Kogge-Stone (2-0), Quaternary (2-2), Intel Kogge-Stone (2-0), Intel Quaternary (2-2). Annotations: QT, KS, KS estimated, QT estimated.]


Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 µm technology

[Figure: Energy [pJ] (0 to 60) vs. Delay [ps] (90 to 160). Series: KS [9], QT [9], KS Estimate, QT Estimate. Annotations: 55%, 35%.]


Est. Results: All Adders w/o Wires

[Figure: Estimated Energy (J) (0E+00 to 1E-10) vs. Delay (FO4) (7 to 15). Series: sKS, sHC, sQT9, dKS, dHC, dQT9, dQT7, dCLA, dIBM, dLNG.]


Est. Results: All Adders w/ Wires

[Figure: Estimated Energy (J) (0.0E+00 to 2.0E-10) vs. Delay (FO4) (8 to 18). Series: sKS_LE, sHC_LE, sQT9_LE, dKS_LE, dHC_LE, dQT9_LE, dQT7_LE, dIBM_LE, dLNG_LE.]


Energy-Delay Trade-offs

[Figure: Energy [pJ] (0 to 80) vs. Delay [ns] (0.0 to 4.0), showing the Initial Design and the Optimized Design with the Energy Saving and Delay Saving marked. Worst-case energy vector with 100% input activity.]

90nm technology

Collaboration with Intel AMR


Conclusion

• Using realistic measures for comparing various designs leads to better design choices

• Power is as important as speed

• Making comparison in Energy-Delay space is necessary:

– power can always be traded for speed and vice versa

• Wire effects are significant

• Leakage currents ?