Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of...

48
Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University

Transcript of Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of...

Page 1: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Recent Developments inTheory and

Implementationof Parallel Prefix Adders

Neil BurgessDivision of Electronics

Cardiff School of EngineeringCardiff University

Page 2: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Motivation

• Parallel Prefix Adders (e.g. Kogge-Stone) mostly ignored for deep submicron VLSI– large fan-out points– wide wiring channels

• Recent insights: can remove both and do...– absolute difference– late increment– media processing

Page 3: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Structure of Presentation

• Parallel Prefix Adder theory– Kogge-Stone, Ladner-Fisher

• New log-depth prefix trees– Knowles’ “family of adders”

• New applications of prefix adders– late operations, media adder

Page 4: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

I.

Parallel Prefix Adder theory

Page 5: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix adder structureA(0:w-1)

Bit propagate and generate cells

g(0:w-1)p(0:w-1)

B(0:w-1)

c(1:w)

Prefix carry tree

s(0:w)

Sum cells (XOR gates)

Page 6: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix Equations - 1

• g(i) = a(i) b(i) “carry generate”• p(i) = a(i) b(i)“carry propagate”• k(i) = {a(i) b(i)} “carry kill”

• g(i), p(i), & k(i) are mutually exclusive– Use any two: g(i) & k(i) = NAND & NOR– p(i) needed as well: s(i) = p(i) c(i)

Page 7: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix Equations - 2

• Generate and Not Kill signals are com-bined to form “Group Signals”Gx

z Kxz interpretation

0 0 c(x+1) = 00 1 c(x+1) = c(z)1 0 Don’t care1 1 c(x+1) = 1

Page 8: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix Equations - Interpretation

• Group signals yield carry signals:

• Tree outputs: c(i+1) = Gi0

• Tree inputs: Gii = g(i) ; Ki

i = k(i)

zy

yx

zx

zy

yx

yx

zx

KKK

GPGG

1

1

zy

yx

zx GKGKGK 1

Page 9: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix Equations - characteristics

• Associative– sub-terms may be pre-computed in

parallel g (0 ), (0 )kg (0 ), (0 )k g (1 ), (1 )kg (1 ), (1 )k g (2 ), (2 )kg (2 ), (2 )k g k(3 ), (3 )g k(3 ), (3 )

G K10G K

10G K

G K

G K

G K

3

3

2

3

2

0

0

0

c (4 )c (4 ) c (3 ) c (2 )c (2 ) c (1 )c (1 )

Page 10: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Prefix equations - characteristics

• Idempotent– sub-terms may be “overlapped”

g(0), k(0)g(0), k(0) g(1), k(1)g(1), k(1) g(2), k(2)g(2), k(2)

GK10 GKGK

2211

GKGK2200

c(3)c(3) c(2)c(2) c(1)c(1)

Page 11: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

4-bit Ladner-Fisher prefix tree

• 1 sub-term pre-computed

• Logarithmic depth

• Fan-out = 2 in 2nd row (laterally)

g (0 ), (0 )kg (1 ), (1 )kg (2 ), (2 )kg k(3 ), (3 )

G K10G K

G K G K

3

3 2

2

0 0

c (4 ) c (3 ) c (2 ) c (1 )

Page 12: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

8-bit Ladner-Fisher prefix tree

• Log depth; lateral fan-out = 4 in 3rd row

• No exploitation of idempotencyg (0 ), (0 )k

c (1 )

g k(3 ), (3 )

c (4 )

g k(7 ), (7 )

c (8 )

Page 13: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

16-bit Ladner-Fisher prefix tree

• Log depth with large fan-out in final row

Page 14: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

4-bit Kogge-Stone prefix graph

• Fan-out = 1(laterally)

• 1 extra cell

• parallel wires in 2nd row

g (0 ), (0 )kg (1 ), (1 )kg (2 ), (2 )kg k(3 ), (3 )

G K10G K

21G K

G K G K

3

3 2

2

0 0

c (4 ) c (3 ) c (2 ) c (1 )

Page 15: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

8-bit Kogge-Stone prefix graph

• More cells & wiring than Ladner-Fisher g (0 ), (0 )k

c (1 )

g k(3 ), (3 )

c (4 )

g k(7 ), (7 )

c (8 )

Page 16: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

16-bit Kogge-Stone prefix graph

• Low fan-out but wider wiring channels

• No exploitation of idempotency

Page 17: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Black cells and grey cells

• Carries, c(i) = Gi-10; Ki-1

0 terms not needed

• G-only cells called and coloured “grey”

Page 18: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

The story so far…

• Parallel prefix adders available in VLSI• Log-depth adders possible:

– high fan-outs {1,2,4,8…} & low cell count– low fan-outs {1,1,1,1…} & high cell count

• Problematic in VLSI (buffering, area)• Idempotency of ‘’ operator not

exploited

Page 19: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

II.

Knowles’“Family of Adders”

Page 20: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Log-depth prefix trees

• In VLSI:– L-F trees require too much buffering

delay– K-S trees require too much area (wire

flux)

• Fan-outs characterised as:– {1,2,4,8…} Ladner-Fisher– {1,1,1,1…} Kogge-Stone

Page 21: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Knowles’ insight

• Use other fan-out schemes• 5 possible 8-bit log-depth prefix

trees:– {1,1,1} 17 cells Kogge-Stone– {1,1,2} 17 cells uses idempotency– {1,1,4} 14 cells no idempotency– {1,2,2} 14 cells no idempotency– {1,2,4} 12 cells Ladner-Fisher

Page 22: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Knowles’ 8-bit prefix trees

• All trees are log-depth

{ 1 ,1 ,1 }

{ 1 ,1 ,2 }

{ 1 ,2 ,2 }

{ 1 ,1 ,4 }

{ 1 ,2 ,4 }

Page 23: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Tree construction rules

• Levels are labelled 0,1,2...

• Fan-out at jth level, 2k , satisfies 2k 2j

• Fan-out at jth level fan-out at j+1th level

• Lateral wire length at jth level is 2j

Page 24: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Knowles’ 16-bit trees - I

• {1,1,1,1} 49 cells {1,1,1,8} 42 cells

• {1,1,1,2} 49 cells {1,2,2,2} 42 cells• {1,1,1,4} 49 cells {1,1,4,4} 40 cells• {1,1,2,2} 49 cells {1,1,4,8} 36 cells• {1,1,2,4} 49 cells {1,2,2,8} 36 cells• {1,1,2,8} 42 cells {1,2,4,4} 36 cells• {1,2,2,4} 42 cells {1,2,4,8} 32 cells

Page 25: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Knowles’ 16-bit trees - II

• {1,1,1,1} {1,1,1,8}• {1,1,1,2} Idempotent {1,2,2,2}• {1,1,1,4} Idempotent {1,1,4,4}• {1,1,2,2} Idempotent {1,1,4,8} • {1,1,2,4} Idempotent {1,2,2,8} • {1,1,2,8} Idempotent {1,2,4,4} • {1,2,2,4} Idempotent {1,2,4,8}

Page 26: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Knowles’ 16-bit trees - III

• {1,1,1,1} {1,1,1,8} R• {1,1,1,2} I {1,2,2,2} R• {1,1,1,4} I {1,1,4,4} R• {1,1,2,2} I {1,1,4,8} R• {1,1,2,4} I {1,2,2,8} R• {1,1,2,8} R, I {1,2,4,4} R• {1,2,2,4} R, I {1,2,4,8} R

Page 27: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Quick way of spotting R, I

• Define span(l) as distance from start of wire to first cell in lth level

• span(l) = 2l fanout(l) 1• tree characteristics

– R if span(j) span(k) for j < k– I if span(i) + span(j) = span(k) for i < j <

k

Page 28: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Examples of R & I spotting

fanout(l) span(l) characteristic• [1,1,1,1] [1,2,4,8] neither R nor

I• [1,1,2,2] [1,2,3,7] I only• [1,2,2,2] [1,1,3,7] R only• [1,2,2,4] [1,1,3,5] R & I• Are R & I adders “best”?

Page 29: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

VLSI design of prefix adders

• Adders laid out as rectangular array of prefix cells (and gaps)

• Assume cells measure 10m 4m– 2 cells per significance 20m / bit

• Key design parameters:– buffering (area & delay)– wiring channels (area)

Page 30: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

16-bit adder example

• Assumptions• Maximum fan-out without

buffering:– 3 cells + 80m wire (4 cell widths)

• Maximum fan-out with buffering:– 9 cells + 240m wire (12 cell widths)

• Employ {1,2,2,4} architecture

Page 31: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

{1,2,2,4} prefix adder layout

g

xor

xor

b u f b u f b u f b u f b u f b u f b u f b u f b u f b u f b u f b u f

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

xor

k

K G

K G

K G

G G G G G

G

G

G

G

G

G

G

G

G

G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G

K G K G

K G

K G K G

K G

k k k k k k k k k k k k k k kg g g g g g g g g g g g g g g

Page 32: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Area vs Time for 32-bit adders

Delay12 12.5 13 13.5 14

24

26

28

30

32

34

36

38

40

Area K-S {1,1,1,1,1}

{1,1,2,2,2}

L-F {1,2,4,8,16}{1,2,2,4,4}

[1,1,3,5,13]

Page 33: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

32-bit prefix tree adders

• Exploitable trade-off between adder’s delay and area– Kogge-Stone adder 16% faster than

Ladner-Fisher but 66% larger– {1,2,2,4,4} adder 8% faster than Ladner-

Fisher but only 3% larger– buffering also trades off speed for area

Page 34: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

III.

New applications of prefix adders

Page 35: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Other addition operations

• Late increment– Mod 2w-1 addition for Reed-Solomon coding– floating-point rounding

• Late complement– absolute difference for video motion

estimation– sign-magnitude addition

• Typically use 2 adders and a MUX

Page 36: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Increments in prefix trees

• Row of prefix cells = ‘late +1’ operation

• Ladner-Fisher comprises many late +1’s– 1 8-bit, 2 4-bit, 4 2-bit, & 8 1-bit

Page 37: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Late increment tree

• Adder returns A+B if inc = 0• Adder returns A+B+1 if inc = 1

inc

Page 38: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Late increment logic

• “Late Carry” lc(i) set high if:– c(i) = 1 or– inc = 1 and a(n),b(n) 0,0 n: 0 n < i

p(i)

s(i)

incKi-1

0

c(i) = G i-10

lc(i)

Page 39: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Late complement theory

• In 2’s-complement, N = -(N+1)• A + B = A B 1

* late increment then yields A B

(A + B) = -(A B 1+1) = B A

• Absolute difference readily available

Page 40: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Absolute difference logic

• If c(w) = 0, result negative– if c(w) = 0, invert all the bits– else always perform late increment with

Ki-10

p(i)

s(i)

c(w)

Ki-10

c(i)

Page 41: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Summary of “late” ops

• Available on all prefix adders• Extra delay: 1 gate’s delay +

buffering• Extra hardware: w black cells • This technique used in floating-point

units– late increment for rounding– late complement for true subtraction

Page 42: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Media (“packed”) arithmetic

• Fundamental strategy:Use full wordlength hardware for

multiple sub-wordlength computations

• Examples:– 32-bit adder 4 8-bit adders– 32-bit multiplier 2 16-bit multipliers

Page 43: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Partitioning an adder

• Criteria:– support carries propagating within sub-adders– prevent carries propagating between sub-

adders

• Solutions:– put AND gates on carry chains slower adder– put dummy 0’s on operand bits larger adder

• Use prefix adder!!

Page 44: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Packed prefix adder - 1

• Force k(n) = 0 at partition points– prevents carries propagating across bit n– exploits don’t care condition (g,k) = (1,0)

• Implementation– change k(n) gate to (2,1) OR-AND gate– delay-neutral modification

Page 45: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Packed prefix adder - 2

• Force c(n) = Gn-10 = 0 at partition

points– prevents c(n) s(n) errors

• Implementation– insert AND gates (off critical path) or

– change Gn-10 gate to ({2,1},1) complex gate

– BUT need Gn-10 signal for sub-adder overflows

Page 46: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Packed prefix adder - 3

• Sub-adder carries complete early• Extraneous cells automatically do

nothing Force k(n) = 0

Force c(n) = 0

Page 47: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

Last Slide

• Recent developments in prefix adders:– new “family” of log-depth trees– late operations– packed arithmetic for media processing

• Future possibilities:– systematic exploitation of idempotency – trees with reduced buffering– combine packed arithmetic/late ops

Page 48: Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.

ANY QUESTIONS OR COMMENTS?