EE5324 Adders

Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan

EE 5324 – VLSI Design II

Kia Bazargan

University of Minnesota

Part II: Adders

References and Copyright

• Textbooks referenced
  [WE92] N. H. E. Weste, K. Eshraghian, “Principles of CMOS VLSI Design: A System Perspective,” Addison-Wesley, 2nd Ed., 1992.
  [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective,” Prentice Hall, 1996.
  [Par00] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” Oxford University Press, 2000.

References and Copyright (cont.)

• Slides used
  [©Hauck] © Scott A. Hauck, 1996-2000; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
  [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
  [©Oxford U Press] © Oxford University Press, New York, 2000. Slides for [Par00], with permission from the author: http://www.ece.ucsb.edu/Faculty/Parhami/files_n_docs.htm

Outline

• One-bit adder, basic ripple-carry adder

• Carry-Lookahead adders (CLA)

• Manchester carry chain

• Carry bypass

• Carry select adder

• Brent-Kung adder

Why Adders?

• Addition: a fundamental operation
  – Basic block of most arithmetic operations
  – Address calculation
• Faster, faster and faster
• How?
  – Architectural level optimization
  – Gate-level optimization
  – Speed/area trade-off

Adding Two One-bit Operands

• One-bit Half Adder (HA): inputs A, B; outputs Sum, Cout
  Sum = A ⊕ B
  Cout = A.B

  A B | Sum Cout
  0 0 |  0   0
  0 1 |  1   0
  1 0 |  1   0
  1 1 |  0   1

• One-bit Full Adder (FA): inputs A, B, Cin; outputs Sum, Cout
  Sum = A ⊕ B ⊕ Cin
  Cout = A.B + B.Cin + A.Cin

  Cin A B | Sum Cout
   0  0 0 |  0   0
   0  0 1 |  1   0
   0  1 0 |  1   0
   0  1 1 |  0   1
   1  0 0 |  1   0
   1  0 1 |  0   1
   1  1 0 |  0   1
   1  1 1 |  1   1
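The two cells can be checked against their truth tables with a short Python sketch. It assumes bits are the integers 0 and 1; the function names are illustrative and do not come from the slides.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, cout) for two one-bit operands."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (sum, cout): Sum = A xor B xor Cin, Cout = A.B + B.Cin + A.Cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


if __name__ == "__main__":
    # Print the full-adder truth table in the same column order as the slide.
    print("Cin A B | Sum Cout")
    for cin, a, b in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        print(f" {cin}  {a} {b} |  {s}   {cout}")
```

In both cells the pair (Cout, Sum) is simply the two-bit binary count of how many inputs are 1.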

N-Bit Ripple-Carry Adder: Series of FA Cells

• To add two n-bit numbers

[Figure: n FA cells in series; cell i takes Ai, Bi and the carry from cell i-1 and produces Si; C0 enters the first cell and Cn leaves the last]

• Note: adder delay = Tc * n
• Tc = (Cin:Cout delay) of one FA
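A behavioral sketch of the ripple-carry chain in Python follows. It assumes bit lists with index 0 as the LSB; `to_bits` and `from_bits` are hypothetical helpers added only for the example.

```python
def full_adder(a, b, cin):
    """One FA cell: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (index 0 = LSB); returns (sum_bits, cn)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each cell waits for the carry of the previous one: n cells, n carry delays.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Hypothetical helper: integer to n-bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    bits, cout = ripple_carry_add(to_bits(0b0011, 4), to_bits(0b0101, 4))
    print(format(from_bits(bits), "04b"), cout)
```

The loop makes the serial dependence explicit, which is where the Tc * n delay comes from.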

4-bit Ripple Carry Addition: Example

• A = 0011, B = 0101, C0 = 0
• Each FA entry below is its (Sum, Cout) output pair at time T, listed from bit 3 down to bit 0

  T | FA3 FA2 FA1 FA0 | S
  0 | 00  00  00  00  | 0000
  1 | 00  10  10  01  | 0110
  2 | 00  10  01  01  | 0100
  3 | 00  01  01  01  | 0000
  4 | 10  01  01  01  | 1000

[Figure: four FA cells with inputs A0..A3, B0..B3, C0, internal carries C1, C2, C3, carry out C4 and outputs S0..S3]

One-bit Full Adder Implementation

• Direct gate implementation
  Cout = A.B + B.Cin + A.Cin = A.B + Cin.(A+B)
  Sum = A ⊕ B ⊕ Cin

[Figure: gate-level schematic with inputs A, B, Cin and outputs Sum, Cout]

• 32 transistors used

[WE92] p516

One-Bit Full Adder: Share Logic

• An observation: almost always, sum = NOT carry

  Cin A B | Sum Cout
   0  0 0 |  0   0
   0  0 1 |  1   0
   0  1 0 |  1   0
   0  1 1 |  0   1
   1  0 0 |  1   0
   1  0 1 |  0   1
   1  1 0 |  0   1
   1  1 1 |  1   1

• Sum = A.B.Cin + (A+B+Cin).Cout’ (Cout’ = NOT Cout)
  – The A.B.Cin term includes 111
  – The (A+B+Cin) factor excludes 000
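The shared-logic form of Sum uses the complemented carry, Sum = A.B.Cin + (A+B+Cin).NOT(Cout). A small Python sketch can confirm that it matches the XOR definition on all eight input combinations (function names are illustrative):

```python
from itertools import product


def carry(a, b, cin):
    """Cout = A.B + Cin.(A+B)."""
    return (a & b) | (cin & (a | b))


def sum_shared(a, b, cin):
    """Sum built from the complemented carry: A.B.Cin + (A+B+Cin).NOT(Cout)."""
    not_cout = 1 - carry(a, b, cin)
    return (a & b & cin) | ((a | b | cin) & not_cout)


def shared_form_matches_xor():
    """True if the shared-logic Sum equals A xor B xor Cin for every input."""
    return all(
        sum_shared(a, b, cin) == a ^ b ^ cin
        for a, b, cin in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(shared_form_matches_xor())
```

The two corrections handle exactly the rows where Sum and NOT(Cout) differ: input 111 is added by the A.B.Cin term, and input 000 is removed by the (A+B+Cin) factor.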

One-Bit Full Adder: Transistor Implementation

• Cout = A.B + C.(A+B)
• Sum = A.B.C + (A+B+C).Cout’

[Figure: transistor-level schematic with inputs A, B, C and outputs Cout, Sum]

– Use inverters to get Cout and Sum
– C transistors close to output
– Cout delay: 2 inverting stages (1-stage possible?)
– Sum delay: 3 inverting stages (not an issue, though)

• 28 transistors

[WE92] p517, [Rab96] p390

One-Bit Full Adder: Inverted Inputs

• An observation: invert inputs => outputs invert
• Exploit this property: get rid of the inverter on the carry critical path

  Cin A B | Sum Cout
   0  0 0 |  0   0
   0  0 1 |  1   0
   0  1 0 |  1   0
   0  1 1 |  0   1
   1  0 0 |  1   0
   1  0 1 |  0   1
   1  1 0 |  0   1
   1  1 1 |  1   1

[Figure: an FA cell shown twice, once with true inputs and outputs and once with all inputs and outputs inverted]
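The inverting property can be checked exhaustively. This Python sketch assumes 0/1 integer bits and models inversion as 1 - x; the names are illustrative.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (sum, cout) of a one-bit full adder."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def inverting_property_holds():
    """True if inverting all three inputs inverts both outputs."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True


if __name__ == "__main__":
    print(inverting_property_holds())
```

Because the property holds for every row, a cell that produces inverted outputs can feed a neighbor that expects inverted inputs, with no inverter in between.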

Ripple Carry Adder: Inverting Property

• FA’ is similar to FA, but with no inverters on the outputs
  – Much faster (1-stage)
  – Disadvantage: not regular data path

[Figure: ripple-carry chain of FA’ cells for bits 0 to 3, with inputs A0..A3, B0..B3 and C0, outputs S0..S3 and carries C1..C4]

Summary: Ripple-Carry Adder

• Basic ripple carry: AND-OR gates
  – Area: 32 transistors (per bit position)
  – Delay: 2 stages of inverting logic (per bit position)
• Direct CMOS logic, share Cout’
  – Area: 28 transistors
  – Delay: 2 stages
• Use “inverting” property
  – Area: 27 (odd bits: 26, even bits: 28)
  – Delay: ~1 stage
• So far: transistor/logic manipulation
• Is that all we can do?!!


Carry-Lookahead Adder: Idea

• New look: carry propagation
• Idea:
  – Try to “predict” Ck earlier than Tc*k
  – Instead of passing through k stages, compute Ck separately using 1-stage CMOS logic
• Carry propagation: an example

  Bit position   7 6 5 4 3 2 1 0
  Carry          1 0 0 1 1 1 1
  A              0 1 0 0 1 1 0 1
  B            + 0 1 0 0 0 1 1 1
  Sum            1 0 0 1 0 1 0 0

Carry-Lookahead Adder (CLA): One Bit

• What happens to the propagating carry in bit position k?

  A B Cin | Cout
  0 0  -  |  0    (kill)
  0 1  C  |  C    (propagate)
  1 0  C  |  C    (propagate)
  1 1  -  |  1    (generate)

• p = A+B (or A ⊕ B)
• g = A.B

[Figure: transistor-level carry gate with inputs A, B, C and output Cout; paths labeled 0-propagate, 1-propagate, generate and kill]

[Rab96] p391

CLA: Propagation Equations

• If C4=1, then either:
  g3                 generated at bit pos 3
  g2.p3              generated at bit pos 2, propagated 3
  g1.p2.p3           generated at bit pos 1, propagated 2,3
  g0.p1.p2.p3        generated at bit pos 0, propagated 1,2,3
  Cin.p0.p1.p2.p3    input carry, propagated 0,1,2,3

• C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3

• Implement C4 as a one-stage CMOS logic: delay=1 (or is it?)
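The flat lookahead expression for C4 can be compared against a rippled carry over all operand combinations. This is a sketch that assumes p is computed as A xor B (the slides allow either OR or XOR); the function names are illustrative.

```python
from itertools import product


def cla_c4(p, g, cin):
    """C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3."""
    return (
        g[3]
        | (g[2] & p[3])
        | (g[1] & p[2] & p[3])
        | (g[0] & p[1] & p[2] & p[3])
        | (cin & p[0] & p[1] & p[2] & p[3])
    )


def ripple_c4(a_bits, b_bits, cin):
    """Reference: carry out of four rippled full adders (index 0 = LSB)."""
    carry = cin
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | (carry & (a | b))
    return carry


def lookahead_matches_ripple():
    """True if the lookahead C4 equals the rippled C4 for all 512 inputs."""
    for bits in product((0, 1), repeat=9):
        a_bits, b_bits, cin = bits[0:4], bits[4:8], bits[8]
        p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate (XOR form assumed)
        g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
        if cla_c4(p, g, cin) != ripple_c4(a_bits, b_bits, cin):
            return False
    return True


if __name__ == "__main__":
    print(lookahead_matches_ripple())
```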

CLA: Static Logic Implementation

[Figure: static CMOS gate computing C4 from p0..p3, g0..g3 and Cin; internal nodes are labeled with single letters from d to x]

[©Hauck], [Rab96] p405

CLA: Dynamic Logic Implementation

• Dynamic gate implementation:
  C4 = g3 + p3.(g2 + p2.(g1 + p1.(g0 + p0.Cin)))
• 6 transistors in series

[Figure: dynamic gate with inputs Cin, p0..p3, g0..g3 and output C4]

[©Hauck], [WE92] p529
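The nested form used by the dynamic gate is the flat sum-of-products C4 with common factors pulled out. A Python sketch can confirm the two agree, treating p and g as free 0/1 variables (names are illustrative):

```python
from itertools import product


def c4_flat(p, g, cin):
    """Flat form: g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3."""
    return (
        g[3]
        | (g[2] & p[3])
        | (g[1] & p[2] & p[3])
        | (g[0] & p[1] & p[2] & p[3])
        | (cin & p[0] & p[1] & p[2] & p[3])
    )


def carry_nested(p, g, cin):
    """Nested form, evaluated from the innermost term (g0 + p0.Cin) outward."""
    carry = cin
    for p_i, g_i in zip(p, g):
        carry = g_i | (p_i & carry)
    return carry


def forms_agree():
    """True if both forms give the same C4 for every (p, g, Cin) assignment."""
    for values in product((0, 1), repeat=9):
        p, g, cin = values[0:4], values[4:8], values[8]
        if c4_flat(p, g, cin) != carry_nested(p, g, cin):
            return False
    return True


if __name__ == "__main__":
    print(forms_agree())
```

The nested evaluation also works for any chain length, since each step only needs the previous carry.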

CLA: Dynamic Logic Implementation

• Can we reuse logic? Can we get C1, C2 and C3 from the same circuit?

[Figure: the dynamic C4 gate (inputs Cin, p0..p3, g0..g3) with internal nodes tapped as C1?, C2?, C3?]

• No!
  – C1, C2 and C3 may be floating (not precharged)
  – Charge sharing problem

[©Hauck]

CLA: Dynamic Logic Implementation

[Figure: four separate dynamic gates, one each for C1, C2, C3 and C4; the gate for Ck uses Cin, p0..pk-1 and g0..gk-1]

[WE92] p529

CLA: Basic Block (4 Bits) Architecture

• Block of 4-bit p, g, Cout

[Figure: four p,g cells take A0..A3 and B0..B3 and produce p0..p3, g0..g3 and S0..S3; the block takes C0 and produces C1, C2, C3 and C4]

CLA: N-Bit Architecture

• Put it all together:

[Figure: two 4-bit blocks (bits 0-3 and bits 4-7), each with four p,g cells and a Carry Generator; C0 enters the first block, C4 links the two blocks and C8 is the carry out]

CLA: 12-Bit Example

[Figure: three 4-bit blocks (bits 0-3, 4-7, 8-11), each with four p,g cells and a Carry Generator; inputs A0..A11, B0..B11 and C0 = 0; outputs S0..S11; block carries C4, C8 and C12]

• Operand values shown next to “A=” and “B=” in the figure: 01111101, 01101001, 11011010
• Values annotated over time (one 5-digit group per 4-bit block):
  T=0: 00000 00000 00000
  T=2: 01001 11110 01111
  T=3: 01001 00001 01111
  T=4: 01011 00001 01111

Summary: Carry Lookahead Adder

• CLA compared to ripple-carry adder:
  – Faster (“4 times”?), but delay still linear (w.r.t. # of bits)
  – Larger area
    o P, G signal generation
    o Carry generation circuits
    o Carry generation ckt for each bit position (no re-use)
• Limitation: cannot go beyond 4 bits of look-ahead
  – Large p,g fan-out slows down carry generation
• Next: Manchester carry chains
  – Tries to reuse logic by pre-charging each carry position


Recap: Carry Look-Ahead

• Charge sharing problem

[Figure: the dynamic C4 gate (inputs Cin, p0..p3, g0..g3) with internal nodes tapped as C1?, C2?, C3?]

Manchester Carry Chain: First Shot

• Improvement over CLA: precharge internal nodes to avoid the charge-sharing problem

[Figure: Manchester carry chain with inputs Cin, p0..p3, g0..g3, internal nodes C1, C2, C3 and output C4]

• Fastest way to do small adders
  – 6 transistors on the critical path

[©Hauck]

Manchester Carry Chain: Sizing

[Figure: RC model of the chain; discharge transistor MC and pass transistors M0..M4, nodes 1 to 6 with resistances R1..R6 and capacitances C1..C6, output Out]

\[ t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right) \]

[Plots: Speed (delay normalized by 0.69RC) versus k, and Area (in minimum size devices) versus k, for k from 1 to 3.0; “k” is the sizing factor]

[© Prentice Hall]
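The RC-chain delay expression, t_p = 0.69 times the sum over nodes i of C_i times the total resistance R_1 + ... + R_i, can be evaluated numerically. The sketch below assumes a uniform chain with unit R and C purely for illustration; the function names are not from the slides.

```python
def chain_delay(resistances, capacitances):
    """t_p = 0.69 * sum_i C_i * (R_1 + ... + R_i) for a series RC chain."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one R and one C per node")
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the source and node i
        total += c * upstream_r  # node i's capacitance discharges through it
    return 0.69 * total


def uniform_chain_delay(n, r=1.0, c=1.0):
    """Delay of n identical stages (illustrative assumption: equal R and C)."""
    return chain_delay([r] * n, [c] * n)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, uniform_chain_delay(n))
```

For identical stages the sum collapses to 0.69.R.C.n(n+1)/2, so the delay grows quadratically with chain length.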

Manchester Carry Chain: An Improvement

• Problem: Cin arrives late => move it closer to output
• Use bypass logic:

[Figure: Manchester carry chain (Cin, p0..p3, g0..g3, C4) with an added bypass path from Cin to C4 controlled by p0, p1, p2 and p3]

[©Hauck]

Manchester Carry Chain: the Improvement

• Direct implementation

[Figure: carry chain with inputs Cin, p0..p3, g0..g3, intermediate carries C1, C2, C3 and output C4]

• Carry bypass circuitry

[Figure: bypass path from Cin to C4 gated by p0, p1, p2 and p3]

• Advantages of the carry bypass circuitry
  – Only 5 series transistors
  – Less capacitance in internal nodes
  – Cin close to the output

[©Hauck]

Manchester Carry Chain: Summary

• Compared to CLA:
  – Smaller area
    o Pre-charge internal nodes
    o Reuse logic for intermediate carry signals
  – Cin close to the output
• Carry chain can be any length
  – Series propagate is slow (O(n^2) delay) => buffer every 4 bits
• Compact adder: good for up to 16 bits
• Using carries to compute sum slows down MCC
  – Use two carry chains: one for sum, one for carry propagation

[©Hauck]


Carry Bypass Adder: Idea

• The “bypass” idea is general
  – Not just for Manchester carry chain
  – The local carry chain could be a “ripple carry adder”

[Figure: block for bits i to i+k with Setup, Local Carry Chain and Sum stages; input carry Ci, output carry Ci+k+1 chosen by a “Bypass?” mux]

• Structure
  – Could be static, dynamic, pass transistor
  – Carry and sum paths shown in different colors
  – Bypass logic determines: “pass” or “kill/generate”?

Carry Bypass Adder: Cell Examples

• Local carry chain: static implementation, using ripple carry adder

[Figure: four FA cells in series with a bypass mux selected by p0.p1.p2.p3]

• Local carry chain: dynamic, Manchester (mux=wire!)

[Figure: Manchester carry chain with inputs Cin, p0..p3, g0..g3 and output C4, plus a bypass path gated by p0, p1, p2 and p3]

[Rab96] p398

Carry Bypass Adder: Cell Examples (cont.)

• Static (pass transistor logic), Manchester
  T1=(p0.p1.p2).p3
  T2=p3
  T3=p0.p1.p2.p3

[Figure: pass-transistor carry chain from C0 to C4 with inputs p0..p2, g0..g3 and control signals T1, T2, T3]

[WE92] p531

Carry Bypass Adder: the Structure and Timing

[Figure: 16-bit adder built from four blocks (bits 0-3, 4-7, 8-11, 12-15), each with Setup, Local Carry Chain and Sum stages; C0 enters the first block]

• Timing (critical path shown in different color):
  1- Setup
  2- Local carry generate/kill, MUX select line ready
  3- C0-C16 carry propagate (if applicable)

[Rab96] p.399

Carry Bypass Adder: Timing of a Sub-block

[Figure: the bits 8-11 block (Setup, Local Carry Chain, Sum), drawn once for each of the two cases below]

• For an intermediate stage, after setup:
  – If in pass mode
    o Local carry chain computes intermediate carries (possibly incorrectly)
    o At the same time, mux selection is set to pass
    o When the input carry arrives, intermediate carries might be recomputed
    o Meanwhile, input carry is sent to Cout
  – If not in pass mode (assume bit 10 generates)
    o Local carry chain computes intermediate carries (bits 10, 11 correct)
    o At the same time, mux selection is set to local
    o Meanwhile, output carry is sent to Cout correctly
    o When the input carry arrives, intermediate carries C8 and C9 (S8, S9, S10) will be recomputed correctly

Carry Bypass Adder: Timing

[Figure: 16-bit adder built from four blocks (bits 0-3, 4-7, 8-11, 12-15), each with Setup, Local Carry Chain and Sum stages; C0 enters the first block]

• Delay = tsetup + max { tselect , 4 x tFA } + 3 x tmux_pass + 3 x tFA + tsum
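The 16-bit delay expression, tsetup + max{tselect, 4 x tFA} + 3 x tmux_pass + 3 x tFA + tsum, can be written as a small Python function. The generalization to N bits in equal blocks of k bits (N/k - 1 mux passes, k - 1 FA delays in the last block) is an assumption made for this sketch, as are the parameter names.

```python
def bypass_adder_delay(n_bits, block_bits, t_setup, t_select, t_fa, t_mux_pass, t_sum):
    """Worst-case delay of a carry bypass adder made of equal-width blocks.

    Assumed generalization of the 16-bit, 4-bit-block formula:
    setup, then the first block (select or full ripple, whichever is slower),
    then one mux pass per remaining block, then a ripple through all but one
    bit of the last block, then the final sum.
    """
    if n_bits % block_bits != 0:
        raise ValueError("n_bits must be a multiple of block_bits")
    n_blocks = n_bits // block_bits
    first_block = max(t_select, block_bits * t_fa)
    mux_chain = (n_blocks - 1) * t_mux_pass
    last_block = (block_bits - 1) * t_fa
    return t_setup + first_block + mux_chain + last_block + t_sum


if __name__ == "__main__":
    # Unit delays, 16 bits in 4-bit blocks: 1 + 4 + 3 + 3 + 1.
    print(bypass_adder_delay(16, 4, 1, 1, 1, 1, 1))
```

With the block width fixed, only the mux term grows with N, so the delay is still linear in the number of bits but with a smaller slope than a full ripple.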

Carry Bypass Adder: Pros and Cons

• Speed:
  – Faster than ripple adder
  – Still linear!
• Area overhead:
  – Mux (setup?)
  – Not worth it for small adders (N<8)
  – 10-20% for large adders

[Plot: propagation delay versus number of bits for Ripple Adder and Bypass Adder, with 4..8 marked on the bits axis]

[Rab96] p.399


Carry Select Adder: the Idea

• Similar to bypass
  – Instead of “waiting” for the input carry, “precompute” the carry output
  – Compute Ci+k for both cases Ci=0 and Ci=1
  – When Ci arrives, select the appropriate result
  – Sum computed in one step after the intermediate carry signals are ready

[Figure: k-bit block with Setup (p,g), a 0-Carry propagation chain, a 1-Carry propagation chain, Multiplexers that pick the Carry Vector under control of Ci, output carry Ci+k, and Sum Generation]

[Rab96] p.400
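A behavioral Python sketch of the select idea: each block computes its result for both possible input carries, and the arriving carry only picks one. Bit lists use index 0 as the LSB; the block partition is passed in as a list of widths, and all names are illustrative.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple through one block; returns (sum_bits, carry_out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a | b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_sizes, c0=0):
    """Add LSB-first bit lists using precomputed 0-carry and 1-carry results."""
    if len(a_bits) != len(b_bits) or sum(block_sizes) != len(a_bits):
        raise ValueError("block sizes must cover both operands exactly")
    carry = c0
    result = []
    start = 0
    for size in block_sizes:
        a_blk = a_bits[start:start + size]
        b_blk = b_bits[start:start + size]
        # Both cases are independent of the real input carry (done "in parallel").
        sums0, cout0 = ripple_block(a_blk, b_blk, 0)
        sums1, cout1 = ripple_block(a_blk, b_blk, 1)
        # The multiplexer: the arriving carry selects one precomputed result.
        sums, carry = (sums1, cout1) if carry else (sums0, cout0)
        result.extend(sums)
        start += size
    return result, carry


def to_bits(value, n):
    """Hypothetical helper: integer to n-bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    bits, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8), [4, 4])
    print(from_bits(bits), cout)
```

In hardware the two ripple computations of every block run concurrently, so the only serial work left is one multiplexer per block.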

Linear Carry Select Adder: Structure

[Figure: 16-bit adder built from four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15); each block has Setup, 0-Carry and 1-Carry chains, a multiplexer and Sum generation; block carries C0, C4, C8, C12 and C16]

[Rab96] p.401

Linear Carry Select Adder: Timing

[Figure: the four-block linear carry select adder (bits 0-3, 4-7, 8-11, 12-15) with block carries C0, C4, C8, C12 and C16]

• Delay = 3 + 1 + 1 + 1 + 1 = 7 (16 bits)

[Rab96] p.401

Square Root Carry Select Adder: the Idea

• Later stages have to wait for the multiplexers in the earlier stages
• Why not give them bigger chunks of data to compute?
  – Balances the delay paths
  – Sub-linear delay (we will see why)

Square Root Carry Select Adder: the Structure

• Assuming the following delays: Setup=1, carry propagate=1/bit, mux=1

[Figure: 20-bit adder with blocks of increasing width: bits 0-1, 2-4, 5-8, 9-13 and 14-19; block carries C0, C2, C5, C9, C14 and C20; the blocks are annotated with times 3, 4, 5, 6 and 7]

• Delay from all paths = 8 (20 bits)

[Rab96] p.402
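The timing of the block structure can be sketched in Python under the stated unit delays (setup = 1, carry propagate = 1 per bit, mux = 1). The sketch additionally assumes that every block, including the first, ends in a multiplexer and that C0 is available at time 0; the function name is illustrative.

```python
def carry_select_timing(block_sizes, t_setup=1, t_carry=1, t_mux=1):
    """Return (chain_ready, carry_out) time lists for a carry select adder.

    chain_ready[i]: time at which block i has both its 0-carry and 1-carry
                    results, independent of the incoming carry.
    carry_out[i]:   time at which the selected carry leaves block i's mux.
    """
    chain_ready = [t_setup + t_carry * size for size in block_sizes]
    carry_out = []
    select_arrival = 0  # assumption: C0 is valid at time 0
    for ready in chain_ready:
        # The mux can only switch once both its data and its select are valid.
        out = max(ready, select_arrival) + t_mux
        carry_out.append(out)
        select_arrival = out
    return chain_ready, carry_out


if __name__ == "__main__":
    ready, carries = carry_select_timing([2, 3, 4, 5, 6])
    print(ready)    # local chains finish one time unit apart
    print(carries)  # each mux output is ready just as the next chain finishes
```

With block widths growing by one bit per stage, each local chain finishes exactly when the previous stage's selected carry arrives, so no stage waits on the other input.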

Square Root Carry Select Adder: Delay

• Assume
  – N-bit adder
  – P stages (delay directly depends on P)
  – First stage computes M bits

\[
N = M + (M+1) + (M+2) + \cdots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right)
\]

• For M<<N (e.g. N=64, M=2)
  – The first term dominates: \( N \approx P^2/2 \)
  – Therefore \( P \approx \sqrt{2N} \)

Carry Select Adder: Trade-offs

• Area overhead:
  – An additional carry path and a multiplexer (not the whole adder)
  – About 30% more than a ripple-carry
• Delay
  – Sub-linear (we can beat that too!)

[Plot: delay versus number of bits (0 to 60) for ripple adder, linear select and square root select; vertical axis from 0.0 to 40.0]

[© Prentice Hall]


Binary Carry-Lookahead or Brent-Kung Adder

• Idea: use binary tree for carry propagation => logarithmic delay

[Figure: a function F of inputs A0..A7 computed by a linear chain (tp ~ N) and by a binary tree (tp ~ log2(N))]

[© Prentice Hall]

Brent-Kung Adder

• Basic component: concatenation of a left (MSB side) pair with a right (LSB side) pair

  (g, p) = (gleft, pleft) • (gright, pright)
  g = gleft + pleft.gright
  p = pleft.pright

[©Hauck]
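The concatenation operator can be sketched in Python as a function on (g, p) pairs. The check below confirms by enumeration that the operator is associative, which is the property that allows the concatenations to be regrouped into a tree; the names are illustrative.

```python
from itertools import product


def concat(left, right):
    """(g, p) = (gleft, pleft) • (gright, pright); left is the MSB side."""
    g_left, p_left = left
    g_right, p_right = right
    g = g_left | (p_left & g_right)  # generated on the left, or passed through it
    p = p_left & p_right             # a carry propagates only if both sides pass it
    return g, p


def is_associative():
    """True if (x • y) • z == x • (y • z) for every choice of (g, p) pairs."""
    pairs = list(product((0, 1), repeat=2))
    return all(
        concat(concat(x, y), z) == concat(x, concat(y, z))
        for x in pairs
        for y in pairs
        for z in pairs
    )


if __name__ == "__main__":
    print(is_associative())
```

The operator is not commutative: the left operand must always be the more significant group.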

Brent-Kung Adder: Structure

• Define (Gi, Pi): generate and propagate for the least significant bits, 0 through i
  gi = Ai.Bi    pi = Ai ⊕ Bi
  (G0, P0) = (g0, p0)
  for i>0: (Gi, Pi) = (gi, pi) • (Gi-1, Pi-1)
                    = (gi, pi) • (gi-1, pi-1) • . . . • (g0, p0)
• Key to Brent-Kung adder: use tree structure to perform concatenations

[Figure: binary tree over bit positions 7..0: pairs 7-6, 5-4, 3-2, 1-0; then 7-4 and 3-0; then 7-0. Annotation: “C5? No! Doesn’t know about C0-3 yet!”]

[©Hauck]
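A Python sketch of a tree-structured prefix computation in the Brent-Kung style follows: an up-sweep builds the group pairs (1-0, 3-0, 7-0, ...) and a down-sweep fills in the remaining prefixes. It assumes a power-of-two width and computes the sum bits as pi xor Ci; the function names and the integer interface of `tree_add` are choices made for this example.

```python
def combine(left, right):
    """Concatenate an MSB-side (g, p) pair with an LSB-side (g, p) pair."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def prefix_tree(pairs):
    """Given pairs[i] = (g_i, p_i), return all (G_i, P_i) using a binary tree."""
    n = len(pairs)
    if n == 0 or n & (n - 1):
        raise ValueError("width must be a power of two")
    x = list(pairs)
    d = 1
    while d < n:  # up-sweep: groups of 2, 4, 8, ... bits
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: extend finished prefixes to the other positions
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2
    return x


def tree_add(a, b, n=8, c0=0):
    """Add two n-bit integers; returns (n-bit sum, carry out)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    prefix = prefix_tree(list(zip(g, p)))
    # Carry into bit i+1: generated within bits 0..i, or C0 propagated through them.
    carries = [c0] + [big_g | (big_p & c0) for big_g, big_p in prefix]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


if __name__ == "__main__":
    print(tree_add(0b01001101, 0b01000111))
```

The up-sweep alone yields the carries at positions 1, 3 and 7 of an 8-bit adder; the down-sweep is what supplies the others, such as the carry that needs both the 5-4 pair and the 3-0 group.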

Brent-Kung: the Complete Tree

[Figure: complete prefix tree taking (g0,p0) through (g7,p7) and producing the carries C0 through C7]

• tadd ~ log2(N)

[© Prentice Hall]

Brent-Kung: Timing

[Figure: 16-input Brent-Kung network with inputs x0..x15 and outputs s0..s15, drawn in levels 1 to 6]

[©Oxford U Press], [Par00] p.102

Brent-Kung Adder: Summary

• Area
  – On average, twice as large as ripple adder
  – Layout of the cells is very compact
• Delay
  – Logarithmic time
  – Once carry signals are ready, sum bits derived in const time
  – Good for wide adders

Comparing Adder Designs

[Plots: tp (sec) versus number of bits (0 to 20) and Area (mm2) versus number of bits (0 to 20), with curves for static, mirror, manchester, bypass, select and Brent-Kung adders]

[© Prentice Hall]


Combining Different Adders

[©Oxford U Press][Par00] p.103

Combining Different Adders

• Two-level carry skip adder
  – Delay = 8 cycles
  – Number of bits: 30

[Figure: two-level carry skip adder with blocks F, E, D, C, B, A between cout (t=8) and cin (t=0), second-level skip logic S2, and {Tproduce, Tassimilate} pairs {8, 1}, {7, 2}, {6, 3}, {5, 4}, {4, 5}, {3, 8}; widths shown: 7, 6, 5, 4, 3, 3, 2]

[©Oxford U Press], [Par00] p.113

Combining Different Adders

[Figure: 64 Bit Adder split into a 40 Bit Carry Select Adder on the MSB side (RA(63:24), RB(63:24), producing EA(63:24)) and a 24 Bit Differential Carry Lookahead Adder on the LSB side (RA(23:0), RB(23:0), producing EA(23:0)), linked by cout23; the results feed the TLB and Data Cache with their Compare blocks (real_add(40:0), hit/miss/data)]

© Dan Stasiak, IBM Rochester, 2001

Combining Different Adders

[Figure: the 40 Bit Adder Section (EA(24:63)) and the 24 Bit Adder Section (EA(0:23) & EA_L(0:23))]

© Dan Stasiak, IBM Rochester, 2001

Combining Different Adders

• Ripple+skip adder: delay=8. Max adder width?
  – Assume: p,g, ripple, skip signal, skipping: 1 unit delay
  – Carry signals
    o Pass mode: ready at time x through skip logic => limit # blocks
    o Local gen mode: blocks can process y bits and still have time to deliver locally generated carry by time x for the next block.
  – Sum signals
    o If in local generation mode, y is OK
    o If in pass mode, y not OK for left bits (e.g., bE receives cin at x=5, can process at most z=3 bits to meet the delay bound of 8 on the sum bits)

[Figure: ripple+skip adder with blocks bG, bF, bE, bD, bC, bB, bA between Cout and Cin and skip logic S between blocks, with numeric timing and width labels]

[©Oxford U Press], [Par00] p.112

CLA Static Logic: Trimmed Down

[Figure: static CMOS gate computing C1 from p0, g0 and Cin; internal nodes labeled h, j, k, s, t, u]

[©Hauck], [Rab96] p405