ECE 341 Lecture # 6
Instructor: Zeshan Chishti
October 15, 2014
Portland State University
Source: web.cecs.pdx.edu/~zeshan/ece341_lecture6.pdf


Lecture Topics

• Design of Fast Adders
  – Carry-Lookahead Adders (CLA)
  – Blocked Carry-Lookahead Adders

• Multiplication of Unsigned Numbers
  – Array Multiplier
  – Sequential Circuit Multiplier

• Reference:
  – Chapter 9: Sections 9.2 and 9.3


CLA Delay Calculation

Consider the expression:

$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$

• All the $G_i$ and $P_i$ functions can be obtained in parallel in one gate delay
• AND terms in each $c_{i+1}$ calculation require one additional gate delay
• ORing the AND terms in each $c_{i+1}$ calculation requires one additional gate delay

Therefore,
• Total delay in calculating carry outputs = 1 + 1 + 1 = 3 gate delays

Sum outputs require one additional XOR delay after carries are computed
• Total delay in calculating sum outputs = 3 + 1 = 4 gate delays

An n-bit CLA requires 4 gate delays, independent of n
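The flattened carry expression can be checked with a short Python sketch. It assumes the usual definitions $G_i = x_i y_i$ and $P_i = x_i + y_i$, which this slide does not restate; the names `cla_carries` and `to_bits` are illustrative only.

```python
def cla_carries(x_bits, y_bits, c0):
    """Return [c1, ..., cn] for LSB-first bit lists via the flattened lookahead sum."""
    n = len(x_bits)
    g = [x & y for x, y in zip(x_bits, y_bits)]  # assumed: G_i = x_i AND y_i
    p = [x | y for x, y in zip(x_bits, y_bits)]  # assumed: P_i = x_i OR y_i
    carries = []
    for i in range(n):
        terms = []
        # AND terms P_i ... P_{j+1} G_j for j = i down to 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            terms.append(term)
        # Last AND term: P_i ... P_0 c0
        term = c0
        for k in range(i + 1):
            term &= p[k]
        terms.append(term)
        # One OR over all AND terms gives c_{i+1}
        carries.append(int(any(terms)))
    return carries


def to_bits(value, n):
    """LSB-first list of the low n bits of value."""
    return [(value >> i) & 1 for i in range(n)]


print(cla_carries(to_bits(0b1011, 4), to_bits(0b0110, 4), 0))  # [0, 1, 1, 1]
```

Every carry is built from the inputs alone (one P/G level, one AND level, one OR level), which is where the 3 gate delays come from.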


CLA Fan-in Limitation

• Performing n-bit CLA in 4 gate delays, independent of n, is good only in theory
• In practice, CLA is limited by fan-in constraints

$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$

• The OR gate and the last AND gate in the expression for $c_{i+1}$ each require i+2 inputs
• For a 4-bit CLA, the MSB carry-out ($c_4$) requires a fan-in of 5
• 5 is the practical fan-in limit for most gates
• In order to add operands larger than 4 bits, we can cascade multiple CLAs
• A cascade of CLAs is called a Blocked Carry-Lookahead adder
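The i+2 fan-in count can be confirmed by listing the AND terms of $c_{i+1}$ symbolically. This is only a counting sketch; `carry_terms` and `fan_in` are hypothetical helper names.

```python
def carry_terms(i):
    """AND terms of c_{i+1}, each as a list of input names."""
    terms = []
    for j in range(i, -1, -1):
        # P_i ... P_{j+1} G_j
        terms.append([f"P{k}" for k in range(i, j, -1)] + [f"G{j}"])
    # P_i ... P_0 c0
    terms.append([f"P{k}" for k in range(i, -1, -1)] + ["c0"])
    return terms


def fan_in(i):
    """Inputs needed by the OR gate and by the widest AND gate of c_{i+1}."""
    terms = carry_terms(i)
    return {"or": len(terms), "widest_and": max(len(t) for t in terms)}


print(fan_in(3))  # {'or': 5, 'widest_and': 5}
```

For $c_4$ (i = 3) both gates need 5 inputs, which is why a single CLA block is kept to 4 bits under a fan-in limit of 5.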


Blocked Carry-Lookahead Adder

[Figure: 32-bit blocked CLA composed of eight 4-bit CLA blocks. The blocks take operand slices x3-0/y3-0, x7-4/y7-4, …, x31-28/y31-28 and produce s0 to s3, s4 to s7, …, s28 to s31. c0 enters the first block, the carry-outs c4, c8, …, c28 pass from each block into the next, and c32 leaves the last block.]

After input operands (X, Y and c0) are applied to the 32-bit adder:
• All the $P_i$ and $G_i$ terms in each CLA are calculated in parallel in 1 gate delay
• $c_4$ available after 3 gate delays
• $c_8$ available 2 gate delays after $c_4$ = 3 + (1*2) = 5 gate delays
• $c_{12}$ available (2*2) gate delays after $c_4$ = 3 + (2*2) = 7 gate delays
• $c_{16}$ available (3*2) gate delays after $c_4$ = 3 + (3*2) = 9 gate delays
• $c_{32}$ available (7*2) gate delays after $c_4$ = 3 + (7*2) = 17 gate delays
• $s_{28}, s_{29}, s_{30}, s_{31}$ available after 17 + 1 = 18 gate delays

Carry-outs ripple from one CLA block to the next. Can we avoid this rippling?
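The delay tally for the cascaded design follows one rule: 3 gate delays for the first block's carry-out, then 2 more per additional block. A minimal sketch of that arithmetic (the function name and return layout are illustrative):

```python
def cascaded_cla_delays(num_blocks, block_bits=4):
    """Gate delays at which each block carry-out and the last sum bits settle."""
    carry_out = {}
    for k in range(num_blocks):
        # Block 0: 1 (P/G) + 2 (AND-OR). Each later block waits for the
        # previous carry-out and then adds one AND level and one OR level.
        carry_out[block_bits * (k + 1)] = 3 + 2 * k
    # Carries inside the last block settle together with its carry-out;
    # the sum bits need one more XOR delay.
    last_sums = carry_out[block_bits * num_blocks] + 1
    return {"carry_out": carry_out, "last_sums": last_sums}


delays = cascaded_cla_delays(8)
print(delays["carry_out"][32], delays["last_sums"])  # 17 18
```

The carry delay grows linearly with the number of blocks, which is the rippling the slide asks about.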


Faster Blocked Carry-Lookahead adder

Key Idea: Generate the carry outputs $c_4, c_8, c_{12}, \ldots$ of CLA blocks in parallel, similar to how $c_1, c_2, c_3, c_4$ are generated in parallel within a CLA block

• Carry-out from a 4-bit block can be given as:
  $c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$

• This can be re-written as:
  $c_4 = G_0^{1} + P_0^{1} c_0$
  where $G_0^{1} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ and $P_0^{1} = P_3 P_2 P_1 P_0$

We can similarly compute $G_1^{1}, G_2^{1}, G_3^{1}, \ldots$

$G_i^{1}$ & $P_i^{1}$ are first-level generate & propagate functions, where i denotes the CLA block
• $G_k^{1} = 1$ implies that the kth CLA block generates a carry
• $P_k^{1} = 1$ implies that the kth CLA block propagates a carry

Carry-out of kth block:
  $G_k^{1} + P_k^{1} G_{k-1}^{1} + P_k^{1} P_{k-1}^{1} G_{k-2}^{1} + \cdots + P_k^{1} P_{k-1}^{1} \cdots P_1^{1} G_0^{1} + P_k^{1} P_{k-1}^{1} \cdots P_0^{1} c_0$

• For example:
  $c_{16} = G_3^{1} + P_3^{1} G_2^{1} + P_3^{1} P_2^{1} G_1^{1} + P_3^{1} P_2^{1} P_1^{1} G_0^{1} + P_3^{1} P_2^{1} P_1^{1} P_0^{1} c_0$
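The first-level functions can be exercised in a small Python sketch that computes $c_4, c_8, c_{12}, c_{16}$ for 16-bit operands. It assumes $G_i = x_i y_i$ and $P_i = x_i + y_i$ at the bit level; `block_gp` and `block_carries` are illustrative names, not part of the lecture.

```python
def block_gp(x_bits, y_bits):
    """First-level (G, P) of one 4-bit block; bit lists are LSB first."""
    g = [x & y for x, y in zip(x_bits, y_bits)]  # assumed: G_i = x_i AND y_i
    p = [x | y for x, y in zip(x_bits, y_bits)]  # assumed: P_i = x_i OR y_i
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def block_carries(x, y, c0, num_blocks=4):
    """Return [c4, c8, ...] for integer operands using first-level G/P functions."""
    gp = []
    for k in range(num_blocks):
        xb = [(x >> (4 * k + i)) & 1 for i in range(4)]
        yb = [(y >> (4 * k + i)) & 1 for i in range(4)]
        gp.append(block_gp(xb, yb))
    carries = []
    for k in range(num_blocks):
        # Carry-out of block k: G_k + P_k G_{k-1} + ... + P_k ... P_0 c0
        carry = 0
        prefix = 1  # running product P_k P_{k-1} ... P_{j+1}
        for j in range(k, -1, -1):
            carry |= prefix & gp[j][0]
            prefix &= gp[j][1]
        carry |= prefix & c0
        carries.append(carry)
    return carries


print(block_carries(0xFFFF, 0x0000, 1))  # [1, 1, 1, 1]: every block propagates c0
```

Each block carry depends only on the block-level G/P values and c0, so no carry has to wait for the block before it.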


Blocked CLA with First-level Propagates and Generates

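[Figure: 16-bit blocked CLA with first-level propagates and generates. Four 4-bit adders take x15-12/y15-12, x11-8/y11-8, x7-4/y7-4 and x3-0/y3-0 and produce s15-12, s11-8, s7-4 and s3-0. Each adder sends its block propagate and generate signals (labeled P k I and G k I in the figure, k = 0..3) to a shared carry-lookahead logic unit, which takes c0, supplies c4, c8 and c12 to the adders, and outputs c16.]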

After input operands (X, Y and c0) are applied to the above 16-bit adder:
• $P_i$ and $G_i$ terms within each CLA are calculated in parallel in 1 gate delay
• First-level generates ($G_k^{1}$) available after 1 + 2 = 3 gate delays
• Carry-outs of CLA blocks ($c_4, c_8, c_{12}, c_{16}$) available after 3 + 2 = 5 gate delays
• Carries within CLA blocks (such as $c_{15}$) available after 5 + 2 = 7 gate delays
• Sum outputs (such as $s_{15}$) available after 7 + 1 = 8 gate delays
• Compare this with the blocked CLA formed by cascading, where $c_{15}$ and $s_{15}$ required 9 and 10 gate delays respectively
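The comparison can be laid out stage by stage. The sketch below only restates the slide's arithmetic for the 16-bit case; both function names are illustrative.

```python
def two_level_cla16_delays():
    """Arrival times (gate delays) in the 16-bit adder with first-level lookahead."""
    bit_pg = 1                          # all bit-level P_i, G_i in parallel
    block_generate = bit_pg + 2         # first-level generates: AND level + OR level
    block_carries = block_generate + 2  # c4, c8, c12, c16 from the lookahead logic
    inner_carries = block_carries + 2   # carries inside a block, e.g. c15
    sums = inner_carries + 1            # final XOR, e.g. s15
    return {"c16": block_carries, "c15": inner_carries, "s15": sums}


def cascaded_cla16_delays():
    """Arrival times when four 4-bit CLA blocks are simply cascaded."""
    c12 = 3 + 2 * 2   # c4 after 3 delays, then 2 more per additional block
    c15 = c12 + 2     # AND-OR inside the last block once c12 arrives
    return {"c15": c15, "s15": c15 + 1}


print(two_level_cla16_delays())  # {'c16': 5, 'c15': 7, 's15': 8}
print(cascaded_cla16_delays())   # {'c15': 9, 's15': 10}
```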


Multiplication


Multiplication of Unsigned Numbers

Product of two n-bit numbers is at most a 2n-bit number

Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.
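The 2n-bit bound follows from multiplying the two largest n-bit values. A short derivation, with an n = 4 case added here purely as an illustration:

```latex
\[
(2^n - 1)(2^n - 1) = 2^{2n} - 2^{n+1} + 1 < 2^{2n}
\]
\[
n = 4:\quad 15 \times 15 = 225 = 11100001_2 \quad \text{(8 bits)}
\]
```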


Multiplication of Unsigned Numbers (contd.)

• In hand multiplication, we add the shifted versions of the multiplicand at the end (column-by-column)
  – An alternative is to accumulate partial products at each stage (row-by-row)

Multiplication logic for two n-bit numbers can be implemented as follows:

Initialize the partial product PP0 to a value of 0

Start from the LSB of the multiplier and proceed towards the MSB, one bit at a time. For each bit position i of the multiplier, perform the following step:

  If the ith bit of the multiplier is 1, shift the multiplicand left by i bit positions and add it to PPi in order to obtain PP(i+1); otherwise PP(i+1) = PPi

After n steps, the partial product PPn represents the final product
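The row-by-row accumulation can be sketched in a few lines of Python. The function name `shift_add_multiply` is illustrative; the variable `pp` plays the role of PPi.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit unsigned integers by accumulating partial products."""
    pp = 0                                # PP0
    for i in range(n):                    # LSB of the multiplier first
        if (multiplier >> i) & 1:
            pp += multiplicand << i       # PP(i+1) = PPi + shifted multiplicand
        # otherwise PP(i+1) = PPi
    return pp                             # PPn


print(shift_add_multiply(0b1101, 0b1011, 4))  # 143
```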


Multiplication of Unsigned Numbers

[Figure: Typical multiplication cell, built around a full adder (FA). Inputs: bit of incoming partial product (PPi), jth multiplicand bit, ith multiplier bit, carry in. Outputs: bit of outgoing partial product (PP(i+1)), carry out.]


Combinatorial Array Multiplier

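[Figure: 4 x 4 combinatorial array multiplier. Multiplicand bits m3..m0 enter at the top and multiplier bits q3..q0 drive the rows. The incoming partial product (PP0) is 0 0 0 0 and the carry-in of each row is 0. Successive rows produce PP1, PP2, PP3 and the final product p7, p6, ..., p0.]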

Multiplicand is shifted by displacing it through an array of adders

Each box represents a multiplication cell
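A bit-level Python sketch of such an array is given below. It assumes each cell ANDs its multiplicand and multiplier bits before feeding the full adder, which is the usual construction but is not spelled out on the slide; `cell` and `array_multiply` are illustrative names.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def cell(pp_in, m_j, q_i, carry_in):
    """Assumed cell: add (m_j AND q_i) to the incoming partial-product bit."""
    return full_adder(pp_in, m_j & q_i, carry_in)


def array_multiply(m, q, n=4):
    """Multiply two n-bit unsigned integers with n rows of n cells."""
    m_bits = [(m >> j) & 1 for j in range(n)]
    q_bits = [(q >> i) & 1 for i in range(n)]
    pp = [0] * n                      # PP0: all zeros, LSB first
    product = []
    for i in range(n):
        carry = 0                     # carry-in of each row is 0
        row = []
        for j in range(n):
            s, carry = cell(pp[j], m_bits[j], q_bits[i], carry)
            row.append(s)
        product.append(row[0])        # lowest bit of the row is final: p_i
        pp = row[1:] + [carry]        # remaining bits feed the next row
    product.extend(pp)                # last row supplies p_n .. p_(2n-1)
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply(0b1101, 0b1011))  # 143
```

The nested loops make the n x n cell count visible, which is the source of the gate count growing with the square of n.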


Combinatorial Array Multiplier (cont.)

• Array multipliers are highly inefficient:
  – Need n n-bit adders => gate count is proportional to $n^2$
  – Impractical for large numbers such as the 32-bit or 64-bit numbers typically used in computers
  – Perform only one function, namely, unsigned integer product

• Solution: Improve gate efficiency by using a mixture of combinatorial array techniques and sequential techniques
  – Instead of n n-bit adders, use one n-bit adder
  – Use a register to hold the accumulated partial product
  – This is called a sequential multiplier


Sequential Multiplication

• Recall the rule for generating partial products:
  – If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to the current partial product
  – Multiplicand is shifted left when being added to the partial product

Key Observation:

• Adding a left-shifted multiplicand to an unshifted partial product is equivalent to adding an unshifted multiplicand to a right-shifted partial product


Sequential Circuit Multiplier

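[Figure: Sequential circuit multiplier. Register A (initially 0, bits a n-1 .. a 0), the multiplier register Q (bits q n-1 .. q 0) and the carry flip-flop C shift right together. An n-bit adder adds A to the output of a MUX that selects between the multiplicand M (bits m n-1 .. m 0) and 0. A control sequencer, which reads q 0, drives the Add/Noadd control and the shift-right signal.]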


Sequential Multiplication Algorithm

• Initialization:
  – Load multiplicand in “M” register, multiplier in “Q” register
  – Initialize “C” and “A” registers to all zeroes

• Repeat the following steps “n” times, where “n” is the number of bits in the multiplier
  – If (LSB of Q register == 1)
      A = A + M (carry-out goes to “C” register)
  – Treat the C, A and Q registers as one contiguous register and shift that register’s contents right by one bit position

• After the completion of “n” steps
  – Register “A” contains high-order half of product
  – Register “Q” contains low-order half of product
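The register-level steps above can be simulated directly. This is a minimal sketch, with Python integers standing in for the C, A, Q and M registers; the function name `sequential_multiply` is illustrative.

```python
def sequential_multiply(multiplicand, multiplier, n=4):
    """Simulate the C/A/Q shift-and-add loop; returns the final (A, Q)."""
    mask = (1 << n) - 1
    m = multiplicand & mask               # M register
    c, a, q = 0, 0, multiplier & mask     # C and A start at zero, Q holds the multiplier
    for _ in range(n):
        if q & 1:                         # LSB of Q decides add / no add
            total = a + m
            a, c = total & mask, total >> n   # carry-out goes to C
        # Shift C, A, Q right by one position as one contiguous register
        combined = ((c << (2 * n)) | (a << n) | q) >> 1
        c = 0
        a = (combined >> n) & mask
        q = combined & mask
    return a, q


a, q = sequential_multiply(0b1101, 0b1011)
print(a, q, (a << 4) | q)  # 8 15 143
```

A single n-bit adder is reused on every pass, and the right shift of the partial product replaces the left shift of the multiplicand.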


Sequential Multiplication Example

M = 1 1 0 1 (multiplicand, 13), Q = 1 0 1 1 (multiplier, 11)

Step                       C   A         Q
Initial configuration      0   0 0 0 0   1 0 1 1
First cycle    A += M      0   1 1 0 1   1 0 1 1
               Shift       0   0 1 1 0   1 1 0 1
Second cycle   A += M      1   0 0 1 1   1 1 0 1
               Shift       0   1 0 0 1   1 1 1 0
Third cycle    No add      0   1 0 0 1   1 1 1 0
               Shift       0   0 1 0 0   1 1 1 1
Fourth cycle   A += M      1   0 0 0 1   1 1 1 1
               Shift       0   1 0 0 0   1 1 1 1

Product = A, Q = 1 0 0 0 1 1 1 1 (143)