Parallel prefix adders - Concordia...

35
Parallel prefix adders Kostas Vitoroulis, 2006. Presented to Dr. A. J. Al-Khalili. Concordia University.

Transcript of Parallel prefix adders - Concordia...

Page 1: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Parallel prefix adders

Kostas Vitoroulis, 2006.Presented to Dr. A. J. Al-Khalili.Concordia University.

Page 2: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Overview of presentation

Parallel prefix operationsBinary addition as a parallel prefix operationPrefix graphsAdder topologiesSummary

Page 3: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Parallel Prefix Operation

Terminology background:

Prefix: The outcome of the operation depends on the initial inputs.

Parallel: Involves the execution of an operation in parallel. This is done by segmentation into smaller pieces that are computed in parallel.

Operation: Any arbitrary primitive operator “ ° ” that is associative is parallelizable

it is fast because the processing is accomplished in a parallel fashion.

Page 4: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Example: Associative operations are parallelizable

Consider the logical OR operation: a + bThe operation is associative:a + b + c + d = ((( a + b ) + c) + d ) = (( a + b ) + ( c + d))

Serial implementation: Parallel implementation:

Page 5: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Operator: “ ° ”

Input is a vector:A = AnAn-1 … A1

Output is another vector:B = BnBn-1 … B1

whereB1 = A1B2 = A1 ° A2 …Bn = A1 ° A2 … ° An

this is the unary operator known as “scan” or “prefix sum”

Bn represents the operator being applied to all terms of the vector.

Mathematical Formulation: Prefix Sum

Page 6: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Example of prefix sumConsider the vector: A = AnAn-1 … A1 where element Ai is an integer

The “*” unary operator, defined as:*A = B

WithB = BnBn-1 … B1

B1 = A1

B2 = A1 * A2

B3 = A1 * A1 * A3

and ‘ * ’ here is the integer addition operation.

Page 7: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1

B1

2

B2

3

B3

5

B5

6

B6

4

B4

1

B1

Example of prefix sumCalculation of *A, where A = 6 5 4 3 2 1 yields:

B = *A = 21 15 10 6 3 1

Because the summation is associative the calculation can be done in parallel in the following manner:

2

B2

+

3

B3

+

5

B5

+

B1 = A1 = 1

6

B6

+

+

4

B4

+

+

B2 = A1 + A2 = 3B3 = (A1 + A2) + A= 6

3B6 = A6 +… +A1

= (A6 + A5) +((A4+A3) +(A2 +A1))

= 21

1

B1

2

B2

3

B3

5

B5

6

B6

4

B4

1

B1

2

B2

+

3

B3

5

B5

6

B6

4

B4

+

++

+

Parallel implementation versus Serial implementation

Page 8: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Binary Addition

Each stage ii adds bits ai, bi, ci-1 and produces bits si, ciThe following hold:

y3 y2 y1

x0x1x2x3+

y0

This is the pen and paper addition of two 4-bit binary numbers x and y. c represents the generated carries.s represents the produced sum bits.

A stage of the addition is the set of x and y bits being used to produce the appropriate sum and carry bits. For example the highlighted bits x2, y2 constitute stage 2 which generates carry c2 and sum s2 .

s0s1s2s3

c0c1c2c3

s4

ai bi ci Comment: Formal definition:

0 0 0 The stage “kills” an incoming carry. “Kill” bit:

“Propagate” bit:

“Generate” bit:

0 1 ci-1 The stage “propagates” an incoming carry

1 0 ci-1 The stage “propagates” an incoming carry

1 1 1 The stage “generates” a carry out

iii yxp ⊕=

iii yxk +=

iii yxg •=

Page 9: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Binary Addition

The carry ci generated by a stage ii is given by the equation:

This equation can be simplified to:

The “ai” term in the equation being the “alive” bit. The later form of the equation uses an OR gate instead of an XOR which is a more efficient gate when implemented in CMOS technology. Note that:

Where ki is the “kill” bit defined in the table above.

ai bi ci Comment: Formal definition:

0 0 0 The stage “kills” an incoming carry. “Kill” bit:

“Propagate” bit:

“Generate” bit:

0 1 ci-1 The stage “propagates” an incoming carry

1 0 ci-1 The stage “propagates” an incoming carry

1 1 1 The stage “generates” a carry out

( ) 11 −− ⋅⊕+⋅=⋅+= iiiiiiiii cyxyxcpgc

iii yxp ⊕=

iii yxk +=

iii yxg •=

( ) 11 −− ⋅+=⋅++⋅= iiiiiiiii cagcyxyxc

ii ka =

Page 10: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Carry Look Ahead adders

The CLA adder has the following 3-stage structure:

Pre-calculation of pi, gi for each stage

Calculation of carry ci for each stage.

Combine ci and pi of each stage to generate the sum bits si

Final sum.

Page 11: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Carry Look Ahead addersThe pre-calculation stage is implemented using the equations for pi, gi shown at a previous slide:

Alternatively using the “alive” bit:

Note the symmetry when we use the “propagate” or the “alive” bit… We can use them interchangeably in the equations!

x0y0

p0g0

x1y1

p1g1

x2y2

p2g2

x0y0

a0g0

x1y1

a1g1

x2y2

a2g2

Page 12: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Carry Look Ahead adders

The carry calculation stage is implemented using the equations produced when unfolding the recursive equation:

11 −− ⋅+=⋅+= iiiiiii cagcpgc

( )

Ketcgppgpg

gpgpgcpgcgpgc

gc

012122

011221222

0111

00

⋅⋅+⋅+=⋅+⋅+=⋅+=

⋅+==

g0p0

c0

g1p1

c1c2

g2p2

Carry generator block

Page 13: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Carry Look Ahead adders

The final sum calculation stage is implemented using the carry and propagate bits ci,pi:

If the ‘alive’ bit ai is used the final sum stage becomes more complex as implied by the equations above.

cinp0

s0

c2p3

s3

c1p2

s2

c0p1

s1

iiiiiii

iiiiii

yxawithcagsNote

yxpwithcps

+=⋅+=

⊕=⊕=

,:

,

1

1

Page 14: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Binary addition as a prefix sum problem.

( )( ) ( )0011 ,,, pgpgpg nnnn K−−

We define a new operator: “ ° ”Input is a vector of pairs of ‘propagate’ and ‘generate’ bits:

Output is a new vector of pairs:

Each pair of the output vector is calculated by the following definition:

),(),(:

),(),(),(

0000

11

pgPGWhere

PGpgPG iiiiii

=

= −−o

( )( ) ( )0011 ,,, PGPGPG nnnn K−−

operationsANDORthebeingwith

ppgpgpgpg yxyxxyyxx

,,

),(),(),(

⋅+

⋅⋅+=o

Page 15: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Binary addition as a prefix sum problem.

),(),(),(),( xxxxxxxxxxx pgppgpgpgpg =⋅⋅+=o

),(),(),(),(),(

1111

11

pgPGWherePGpgPG iiiiii

== −−o

Properties of operator “ ° ”:Associativity (hence parallelization)

Easy to prove based on the fact that the logical AND, OR operations are associative.

With the definition:

Gi becomes the carry signal at stage i of an adder. Illustration on next slide.

The operation is idempotent

Which implies

nmandjiWhere

PGPGPG jmjmninijiji

≥≥

= ),(),(),( :::::: o

Page 16: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Binary Addition as a prefix sum problem.

K

o

o

o

o

etcpppgppgpg

pppgpgpgPGpgPGppgpgPGpgPG

pgPGhaveWe

ppgpgpgpgPGpgPG

With

yxyxxyyxx

iiiiii

)),()),((),(),(),(

),(),(),(),(),(),(

:

),(),(),(),(),(),(

:

123123233

12312233223333

12122112222

1111

11

⋅⋅⋅⋅+⋅+=⋅⋅⋅+⋅+==

⋅⋅+===

⋅⋅+== −−

… The familiarcarry bit generatingequations for stage iiin a CLA adder.

),(),(:

0000 pgPGWhere

=

b3 b2 b1 b0

a0a1a2a3+

A stage i will generate a carry if gi=aibi

and propagate a carry if pi=XOR(ai,bi)

Hence for stage i:ci=gi+pici-1

Page 17: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Addition as a prefix sum problem.Conclusion:

The equations of the well known CLA adder can be formulated as a parallel prefix problem by employing a special operator “ ° ”.

This operator is associative hence it can be implemented in a parallel fashion.

A Parallel Prefix Adder (PPA) is equivalent to the CLA adder… The two differ in the way their carry generation block is implemented.

In subsequent slides we will see different topologies for the parallel generation of carries. Adders that use these topologies are called Parallel Prefix Adders.

Page 18: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Parallel Prefix AddersThe parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage which is the most intensive one:

Pre-calculation of Pi, Gi terms

Calculation of the carries.

This part is parallelizable to reduce time.

Simple adder to generate the sum

Straight forward as in the CLA adder

Prefix graphs can be used to describe the structure that performs this part.

Straight forward as in the CLA adder

Page 19: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Calculation of carries – Prefix GraphsThe components usually seen in a prefix graph are the following:

processing component: buffer component:

),(22 inin pg

( ) ( )21211

,, inininininoutout ppgpgpg ⋅⋅+=

( )11

, inin pg

( )outout pg ,( )outout pg ,

( )inin pg ,

( )outout pg ,( )outout pg ,

( ) ( )ininoutout pgpg ,, =

Page 20: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Prefix graphs for representation of Prefix addition

Example: serial adder carry generation represented by prefix graphs

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8)

c2c3c4c5c6c7c8

(p1, g1)

Page 21: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Key architectures for carry calculation:

1960: J. Sklansky – conditional adder1973: Kogge-Stone adder1980: Ladner-Fisher adder1982: Brent-Kung adder1987: Han Carlson adder1999: S. Knowles

Other parallel adder architectures:1981: H. Ling adder2001: Beaumont-Smith

Page 22: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1960: J. Sklansky – conditional adder

Page 23: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1960: J. Sklansky – conditional adder

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8)

c2c3c4c5c6c7c8

(p1, g1)

The Sklansky adder has:Minimal depthHigh fan-out nodes

Page 24: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1973: Kogge-Stone adder

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8)

c2c3c4c5c6c7c8

(p1, g1)

The Kogge-Stone adder has:Low depthHigh node count (implies more area).Minimal fan-out of 1 at each node (implies faster performance).

Page 25: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1980: Ladner-Fischer adder

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8) (p1, g1)

c2c3c4c5c6c7c8

The Ladner-Fischer adder has:Low depthHigh fan-out nodesThis adder topology appears the same as the Schlanskly conditional sum adder. Ladner-Fischer formulated a parallel prefix network design space which included this minimal depth case. The actual adder they included as an application to their work had a structure that was slightly different than the above.

Page 26: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1982: Brent-Kung adder

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8) (p1, g1)

c2c3c4c5c6c7c8

The Brent-Kung adder is the extreme boundary case of:Maximum logic depth in PP adders (implies longer calculation time).Minimum number of nodes (implies minimum area).

Page 27: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1987: Han Carlson adder

The Han-Carlson adder combines the Brent-Kung and Kogge-Stone structures into a hybrid structure.

EfficientSuitable for VLSI implementation.

Page 28: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1999: S. KnowlesKnowles proposed adders that trade off:

Depth, interconnect, area.These adders are bound by the Lander-Fischer (minimum depth) and Brent-Kung (minimum fanout) topologies.

Brent-Kung topology(Minimum fan-out)

Ladner-Fischer topology(Minimum depth, high fanout)

Knowles topologies(Varied fan-out at each level )

Page 29: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

An interesting taxonomy:

Harris[2003] presented an interesting 3-D taxonomy of the adders presented so far.

Each axis represents a characteristic of the adders: -Fanout

-Logic depth

-Wire connections

He also proposed the following structure:

Page 30: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

1981: H. Ling adderLing Adders are a different family of adders. They can still be formulated as prefix adders.

Ling adders differ from the “traditional” PP adders in that:They are based on a different set of equations.The new set of equations introduces the following tradeoffs:

Precalculation of Pi, Gi terms is based on more complexequations

Calculation of the carries is based on simpler equations

Final addition stage is more complex

Page 31: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

2001: Beaumont-Smith

c1

(p2, g2)(p3, g3)(p4, g4)(p5, g5)(p6, g6)(p7, g7)(p8, g8) (p1, g1)

c2c3c4c5c6c7c8

The Beaumont-Smith adders incorporate nodes that can accept more than a pair of inputs and produce the carry calculation.These ‘higher valency’ nodes are optimized circuits for a specific technology (CMOS).The above topology is a Beaumont-Smith tree based on the Kogge-Stone architecture

Page 32: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Summary (1/3)

The parallel prefix formulation of binary addition is a very convenient way to formally describe an entire family of parallel binary adders.

Page 33: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Summary (2/3)A parallel prefix adder can be seen as a 3-stage process:

There exist various architectures for the carry calculation part.Trade-offs in these architectures involve the

area of the adderits depththe fan-out of the nodesthe overall wiring network.

Pre-calculation of Pi, Gi terms

Calculation of the carries.

Simple adder to generate the sum

Page 34: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

Summary (3/3)

Variations of parallel adders have been proposed. These variations are based on:

Modifying the carry generation equations and reformulating the prefix definition (Ling)Restructuring the carry calculation trees based by optimizing for a specific technology (Beaumond-Smith)Other optimizations.

Page 35: Parallel prefix adders - Concordia Universityusers.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel... · We define a new operator: ... The parallel prefix adder employs the

References:Beaumont-Smith, Cheng-Chew Lim, “Parallel Prefix Adder Design”, IEEE, 2001

Han, Carlson, “Fast Area-Efficient VLSI Adders, IEEE, 1987

Dimitrakopoulos, Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders”, IEEE 2005

Kogge, Stone, “A Parallel Algorithm for the Efficient solution of a General Class of Recurrence equations”, IEEE, 1973

Simon Knowles, “A Family of adders”, IEEE, 2001

Ladner, Fischer, “Parallel Prefix Computation”, ACM, 1980

Brent, Kung, “A regular Layout for Parallel Adders”, IEEE, 1982

H. Ling, “High-Speed Binary Adder”, IBM J. Res. And Dev., 1980

J. Sklansky, “Conditional-Sum Addition Logic”, IRE transactions on computers, 1960

D. Harris, “A Taxonomy of Parallel Prefix Networks”, IEEE, 2003