Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645:...

Post on 04-Jan-2016

232 views 7 download

Transcript of Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645:...

Fast Adders:Parallel Prefix Network Adders,

Conditional-Sum Adders,& Carry-Skip Adders

ECE 645: Lecture 5

Required Reading

Chapter 6.4, Carry Determination as Prefix ComputationChapter 6.5, Alternative Parallel Prefix Networks

Chapter 7.4, Conditional-Sum AdderChapter 7.5, Hybrid Adder Designs

Chapter 7.1, Simple Carry-Skip Adders

Note errata at:http://www.ece.ucsb.edu/~parhami/text_comp_arit_1ed.htm#errors

Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design

Recommended Reading

J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems

Chapter 11.1.9, Prefix Adders

Chapter 11.1.3, Carry-Skip AdderChapter 11.1.4, Optimization of Carry-Skip AddersChapter 11.1.10, FPGA Implementations of AddersChapter 11.1.11, Long-Operand Adders

Parallel Prefix Network Adders

5

Basic component - Carry operator (1)

B” B’

g” p” g’ p’

g p

g = g” + g’p”p = p’p”

(g, p) = (g’, p’) ¢ (g”, p”) = (g” + g’p”, p’p”)

B

Parallel Prefix Network Adders

6

Basic component - Carry operator (2)

B” B’

g” p” g’ p’

g p

g = g” + g’p”p = p’p”

(g, p) = (g’, p’) ¢ (g”, p”) = (g” + g’p”, p’p”)

B

overlap okay!

Parallel Prefix Network Adders

7

Associative

Not commutative

[(g1, p1) ¢ (g2, p2)] ¢ (g3, p3) = (g1, p1) ¢ [(g2, p2) ¢ (g3, p3)]

(g1, p1) ¢ (g2, p2) (g2, p2) ¢ (g1, p1)

Properties of the carry operator ¢

8

Major concept

Given:

Find:

(g0, p0) (g1, p1) (g2, p2) …. (gk-1, pk-1)

(g[0,0], p[0,0]) (g[0,1], p[0,1]) (g[0,2], p[0,2]) … (g[0,k-1], p[0,k-1])

ci = g[0,i-1] + c0p[0,i-1]

Parallel Prefix Network Adders

block generate from index 0 to k-1

9

Parallel Prefix Sum Problem

Given:

Find:

Parallel Prefix Adder Problem

Given:

Find:

x0 x0+x1 x0+x1+x2 … x0+x1+x2+ …+ xk-1

x0 x1 x2 … xk-1

x0 x0 ¢ x1 x0 ¢ x1 ¢ x2 … x0 ¢ x1 ¢ x2 ¢ … ¢ xk-1

x0 x1 x2 … xk-1

where xi = (gi, pi)

Similar to Parallel Prefix Sum Problem

10

Parallel Prefix Sums Network I

11

Cost = C(k) = 2 C(k/2) + k/2 = = 2 [2C(k/4) + k/4] + k/2 = 4 C(k/4) + k/2 + k/2 = = …. = = 2 log k-1C(2) + k/2 (log2k-1) = = k/2 log2k

2

Example:

C(16) = 2 C(8) + 8 = 2[2 C(4) + 4] + 8 = = 4 C(4) + 16 = 4 [2 C(2) + 2] + 16 = = 8 C(2) + 24 = 8 + 24 = 32 = (16/2) log2 16

C(2) = 1

Parallel Prefix Sums Network I – Cost (Area) Analysis

12

Delay = D(k) = D(k/2) + 1 = = [D(k/4) + 1] + 1 = D(k/4) + 1 + 1 = = …. = = log2k

Example:

D(16) = D(8) + 1 = [D(4) + 1] + 1 = = D(4) + 2 = [D(2) + 1] + 2 = = 4 = log2 16

D(2) = 1

Parallel Prefix Sums Network I – Delay Analysis

13

Parallel Prefix Sums Network II (Brent-Kung)

14

Cost = C(k) = C(k/2) + k-1 = = [C(k/4) + k/2-1] + k-1 = C(k/4) + 3k/2 - 2 = = …. = = C(2) + (2k - 2k/2(log k-1)) - (log2k-1) = = 2k - 2 - log2k

Example:

C(16) = C(8) + 16-1 = [C(4) + 8-1] + 16-1 = = C(2) + 4-1 + 24-2 = 1 + 28 - 3 = 26 = 2·16 - 2 - log216

C(2) = 1

2

Parallel Prefix Sums Network II – Cost (Area) Analysis

15

Delay = D(k) = D(k/2) + 2 = = [D(k/4) + 2] + 2 = D(k/4) + 2 + 2 = = …. = = 2 log2k - 1

Example:

D(16) = D(8) + 2 = [D(4) + 2] + 2 = = D(4) + 4 = [D(2) + 2] + 4 = = 7 = 2 log2 16 - 1

D(2) = 1

Parallel Prefix Sums Network II – Delay Analysis

16

x0x1x2x3x4x5x6x7

s0s1s2s3s4s5s6s7

4 –bit B-K PPN

8-bit Brent-Kung Parallel Prefix Network

17

x1’x3’

s1’s3’

2 –bit B-K PPN

x5’x7’

s5’s7’

4-bit Brent-Kung Parallel Prefix Network

18

8-bit Brent-Kung Parallel Prefix Network Critical Path

GP GP GP GP GP GP GPGPGP

x7 y7 x6 y6 x5 y5 x4 y4 x3 y3 x2 y2 x1 y1 x0 y0

g7,p7 g6,p6 g5,p5 g4,p4 g3,p3 g2,p2 g1,p1 g0,p0

ccc ccc ccc ccc

ccc ccc

ccc

ccc

ccc ccc ccc

g[0,0]

p[0,0]

g[0,1]

p[0,1]

g[0,2]

p[0,2]

g[0,3]

p[0,3]

g[0,4]p[0,4]

g[0,5]p[0,5]

g[0,6]p[0,6]

g[0,7]

p[0,7]

CC

c0

C C C C C C C

SS S S S S S S S

c7 p7 c6 p6 c5 p5 c4 p4 c3 p3 c2 p2 c1 p1

p0

s7 s6 s5 s4 s3 s2 s1 s0

c8

criticalpath

19

Critical Path

GP

c

C

S

gi = xi yi

pi = xi yi

g = g” + g’ p”p = p’ p”

ci+1 = g[0,i] + c0 p[0,i]

si = pi ci

1 gate delay

2 gate delays

2 gate delays

1 gate delay

20

Brent-Kung Parallel Prefix Graph for 16 Inputs

21

Kogge-Stone Parallel Prefix Graph for 16 Inputs

22

Comparison of architectures

HybridNetwork 2Brent-Kung

Kogge-Stone

Cost(k)

Delay(k)

Cost(16)

Delay(16)

Cost(32)

Delay(32)

2 log2k - 2

2k - 2 - log2k

log2k+1 log2k

k/2 log2k k log2k - k + 1

6 5 4

26 32 49

8 6 5

57 80 129

Parallel Prefix Network Adders

23

Latency vs. Area Tradeoff

24

Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs

Conditional-SumAdders

One-level k-bit Carry-Select Adder

Two-level k-bit Carry Select Adder

28

Conditional Sum Adder

• Extension of carry-select adder• Carry select adder

• One-level using k/2-bit adders• Two-level using k/4-bit adders• Three-level using k/8-bit adders• Etc.

• Assuming k is a power of two, eventually have an extreme where there are log2k-levels using 1-bit adders• This is a conditional sum adder

29

Conditional Sum Adder:Top-Level Block for One Bit Position

30

Three Levels of a Conditional Sum Adder

2 2

2 2

xi+3 yi+3

c=0c=1

2 2

xi+2 yi+2

c=0c=1

1 1

1

1

3c=0

3c=1

2 2

2 2

xi+1 yi+1

c=0c=1

2 2

1-bit conditional sum block

xi yi

c=0c=1

1 1

1

1

3c=0

3c=1

22

1

1

3 3

5

c=1

5

c=0

1+1

2+1

4+1

branch point

concatenation

block carry-indetermines selection

31

16-Bit Conditional Sum Adder Example

32

Conditional Sum Adder Metrics

Hybrid Adders

A Hybrid Ripple-Carry/Carry-Lookahead Adder

A Hybrid Carry-Lookahead/Carry-Select Adder

Carry-Skip AddersFixed-Block-Size

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 37

7.1 Simple Carry-Skip Adders

Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

cc ccc

cc ccc

pppp

SkipSkipSkip

4-Bit Block

Skip logic (2 gates)

1612

8

4

0

0

4

8

1216

[12,15] [8,11] [4,7][0,3]

(a) Ripple-carry adder.

(b) Simple carry-skip adder.

3 2 1 0

Ripple-carry stages

4-Bit Block

4-Bit Block

4-Bit Block

4-Bit Block

4-Bit Block

3 2 1 0

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 38

Another View of Carry-Skip Addition

Street/freeway analogy for carry-skip adder.

c

g

p

4j+1

4j+1

g

p

4j

4j

g

p

4j+2

4j+2

g

p

4j+3

4j+3

c

4j

4j+4

c

4j+3

c

4j+2

c

4j+1

One-way street

Freeway

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 39

Mux-Based Skip Carry Logic

The carry-skip adder with “OR combining” works fine if we begin with a clean slate, where all signals are 0s at the outset; otherwise, it will run into problems, which do not exist in mux-based version

c

g

p

4j+1

4j+1

g

p

4j

4j

g

p

4j+2

4j+2

g

p

4j+3

4j+3

c

4j

4j+4

c

4j+3

c

4j+2

c

4j+1

01

p[4j, 4j+3]

c4j+4

c

g

p

4j+1

4j+1

g

p

4j

4j

g

p

4j+2

4j+2

g

p

4j+3

4j+3

c

4j

4j+4

c

4j+3

c

4j+2

c

4j+1

Fig. 10.7 of arch book

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 40

Carry-Skip Adder with Fixed Block Size

Block width b; k/b blocks to form a k-bit adder (assume b divides k)

Example: k = 32, b opt = 4, T

opt = 12.5 stages(contrast with 32 stages for a ripple-carry adder)

Tfixed-skip-add = (b – 1) + 0.5 + (k/b – 2) + (b – 1) in block 0 OR gate skips in last block

2b + k/b – 3.5 stages

dT/db = 2 – k/b2 = 0 b opt = k/2

T opt = 22k – 3.5

. . .

1

Fixed-Block-Size Carry-Skip Adder (1)

Notation & Assumptions

Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit

Latency of the carry-skip adder with fixed block width

Fixed block size - b bits

Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + - 2 + ( b - 1 )

in block 0

OR gate skips in last block

Adder size - k-bits

= 2b + - 3.5

kb

kb

Number of stages - t

Fixed-Block-Size Carry-Skip Adder (2)

Optimal fixed block size

dLatencyfixed-carry-skip

db= 2 -

k

b2= 0

bopt=

k2

topt = k

bopt

=

2 k

Latencyfixed-carry-skipopt = 2

k2

+k

k2

- 3.5 =

=

2 k +

2 k - 3.5 =

2 k2 - 3.5

k bopt Latencyfixed-carry-skip

32

16

4

Latencyripple-carry Latencylook-ahead

64

128

Fixed-Block-Size Carry-Skip Adder (3)

8

topt

8

16

12.5

28.5

32

128

6.5

6.5

4.5

8.5

2

3

8

5

8.5 16

5

6

13

11

18.5 64

7.5

18.5

Carry-Chain &Carry-Skip Adders

in Xilinx FPGAs

LUT 0 1

xi yi ci+1

0

0

1

1

0

1

0

1

yi

yi

ci

ci

xi

yi

pi

ci+1

ci

si

Basic Cell of a Carry-Chain Adder in Xilinx FPGAs

LUT 0 1

Carry-Skip Adder

b-bit blockLUT 0 1

LUT 0 1xib

. . . . . . .

yib

pib

cib

sib

cib+1

sib+1

xib+1

yib+1

pib+1

xib+(b-1)

yib+(b-1)

pib+(b-1)

cc(i+1)b

sib+(b-1)

LUT 0 1

Carry-Skip Adder:

Carry SkipMultiplexers

LUT 0 1

LUT 0 1

. . . . . . .

p[0, b-1]

c0

cb

p[b, 2b-1]

p[k-b, k-1]

ck

c0

cb

ck-b

p[k-b, k-1]

p[b, 2b-1]

p[0, b-1]

ccb

cc2b

cck-b

LUT 0 1

Carry-Skip Adder:

Computationof the BlockPropagate

Signals

LUT

LUT 0 1

. . . . . . .

cb

p[ib, ib+(b-1)]

0 1

p[ib, ib+1]

p[ib, ib+3]

p[ib, ib+(b-3)]

0 1

0

0

xib+1

yib+1

xib

yib

xib+3

yib+3

xib+2

yib+2

xib+(b-1)

yib+(b-1)

xib+(b-2)

yib+(b-2)

Complete Carry-Skip Adder

xb-1..0

yb-1..0

x2b-1..b

y2b-1..b

x3b-1..2b

y3b-1..2b

xk-1..k-b

yk-1..k-b. . . . .

cc2b

0

1

cb

0

1

c2b

cc3bccb

0

1

c0

cck

ck-b

. . .

ss-1..0 s2s-1..s s3b-1..2b sk-1..k-b

c0

0

1

ck

xb-1..0

yb-1..0

x2b-1..b

y2b-1..b

x3b-1..2b

y3b-1..2b

xk-1..k-b

yk-1..k-b. . . . .

p[0,b-1] p[b,2b-1] p[2b,3b-1] p[k-b,k-1]

Carry-Skip Adders in Xilinx Spartan II FPGAs

by J-P. Deschamps, G. Bioul, G. Sutter

Delay in ns

b=k b=8 b=16 b=32

Max. Frequency

Increase

64 14 13 12 13%

96 16 14 13 21%

128 23 14 14 63%

256 38 - 16 17 141%

512 77 - 20 20 296%

1024 159 - 28 25 531%

k

Carry-Skip Adders in Xilinx Spartan II FPGAs

by J-P. Deschamps, G. Bioul, G. Sutter

Area in CLB slices

b=k b=8 b=16 b=32 b=8 b=16 b=32

Area Overhead

64 32 47 41 - 47% 28% -

96 48 73 66 - 52% 38% -

128 64 99 91 - 55% 42% -

256 128 - 191 179 - 49% 40%

512 256 - 391 375 - 53% 46%

1024 512 - 791 767 - 54% 50%

k

Carry-Skip AddersVariable-Block-Size

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 53

Carry-Skip Adder with Variable-Width Blocks

Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.

b b b b. . .

RippleSkip

Carry path (1)

01t–1 t–2 Block widths

Carry path (3)

Carry path (2)

The total number of bits in the t blocks is k:

2[b + (b + 1) + . . . + (b + t/2 – 1)] = t(b + t/4 – 1/2) = k

b = k/t – t/4 + 1/2

Tvar-skip-add = 2(b – 1) + 0.5 + t – 2 = 2k/t + t/2 – 2.5

dT/db = –2k/t 2 + 1/2 = 0 t

opt = 2k

T opt = 2k – 2.5 (a factor of 2 smaller than for fixed-block)

P

xj yj

pj

P

xj+1 yj+1

pj+1

P

pj+bi-2

P

xj+b -2 yj+b -2xj+b -1 yj+b -1

pj+bi-1

i i i i

CP

p[i,i|+bi-1]

cin=0

xj yj

FA

xj+1 yj+1

FA

xj+b -2 yj+b -2

FAFA . . .

bi stages

xj+b -1 yj+b -1i i i i

cout

sj+b -1 sj+b -2 sj+1 sj

Most critical path to produce carry

i i

P

xj yj

pj

P

xj+1 yj+1

pj+1

P

pj+b -2

P

xj+b -2 yj+b -2xj+b -1 yj+b -1

pj+b -1

i i i i

CP

p[i,i|+bi-1]

cin

xj yj

FA

xj+1 yj+1

FA

xj+b -2 yj+b -2

FAFA . . .

bi stages

xj+b -1 yj+b -1i i i

cout

sj+b -1 sj+b -2 sj+1 sj

Most critical path to assimilate carry

i i

i

i i

Variable-Block-Size Carry-Skip Adder (1)

Notation & Assumptions

Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit

Adder size - k-bits

Number of stages - t

Block size - variable

First and last block size - b bits

Variable-Block-Size Carry-Skip Adder (2)

Optimum block sizes

bt-1 bt-2 bt-3 . . . bt/2+1 bt/2-1 . . . b2 b1 b0

b b+1 b+2 . . . b+t

2-1 b+

t

2-1 b+2 b+1 b

Total number of bits

k = 2 [ b + (b+1) + (b+2) + … + (b + + 1 ) ] =

= t ( b + - )

t2

t4

12

Variable-Block-Size Carry-Skip Adder (3)

Number of bits in the first and last block

b = kt

- t4

+ 12

Latency of the carry-skip adder with variable block width

Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + t - 2 + ( b - 1 )

in block 0

OR gate skips in last block

= 2 b + t - 3.5 = 2 kt

- t4

+ 12

+ t - 3.5 =

= 2kt

12

t - 2.5+

Variable-Block-Size Carry-Skip Adder (4)

Optimal number of blocks

dLatencyvariable-carry-skip

dt=

2k

t2

topt=

4 k

- +1

2= 0

=

k2

bopt = ktopt

-topt

4+ 1

2=

= k -4

+ 12

k2

k2

= 12

bopt = 1

Variable-Block-Size Carry-Skip Adder (5)

Optimal latency

= 2kt

12

t - 2.5+Latencyvariable-carry-skipopt =

=2k

+2

k2

k2

= - 2.5

=

k2 - 2.5

Latencyvariable-carry-skipopt

Latencyfixed-carry-skipopt

2