Adder methodology and design using probabilistic multiple carry estimates


E.M. Ashmila, S.S. Dlay and O.R. Hinton

Abstract: A novel approach for designing estimated carry adders for use in asynchronous circuits is presented. It demonstrates that, by using the statistical probability of a carry being in a particular state, a 32-bit adder can be constructed whose speed improves for the majority of additions. The methodology shows that each additional carry introduced for carry prediction halves the proportion of mispredicted additions, which incur the full 32-bit delay, relative to the previous configuration. This adder methodology significantly reduces the addition time, and simulation and design show that the 32-bit ESTC adder using multiple carries can achieve substantial speed and/or area advantages over existing adder circuits. For example, using four carries for prediction, comparisons in terms of delay–area product show savings of about 41% over the carry select adder with ripple adder elements, and more than 26% over both carry lookahead and carry select adders based on carry lookahead elements.

1 Introduction

Addition is a fundamental arithmetic operation in almost all digital circuits and processors, and their performance is significantly influenced by the speed of the adders. High-speed adders are essential for high performance in digital applications; hence research into improving the performance of digital adders is an important field, and many high-speed adders have recently been introduced [1–3].

Adders can be used in both synchronous and asynchronous circuit design methodologies. Recently, asynchronous circuit designs have attracted particular attention because of the benefits they offer compared with synchronous circuits: improved delay–area product, lower power, reduced electromagnetic emissions, superior adaptability, and elimination of a global clock signal [4–14]. An improved delay–area product is achieved by including early completion logic, despite the area overhead of this logic [4]. In asynchronous design the performance is determined by the average addition delay rather than the worst-case delay [15, 16], and asynchronous design approaches have the potential to lead to a more modular design style [17, 18], which is suitable for VLSI technology.

Generally, the speed of addition is limited by the time taken for the carry to propagate across the word; hence high-speed performance can be achieved by reducing the length of the carry path by various methods [2, 19, 20]. A recently proposed multiple-operand redundant binary adder demonstrates how the carry propagation chain can be shortened by dividing the input operands into even and odd parts and adding them separately [2]. This method makes the addition speed independent of the word length, and achieves 6-operand parallel addition faster than traditional methods.

A significant speed improvement in the implementation of a parallel adder was introduced by the carry lookahead adder (CLA) developed by Weinberger and Smith [21]. The CLA is one of the fastest schemes for the addition of two numbers, since its delay grows with the logarithm of the operand width. Many other adder architectures have been proposed, including the carry save, carry select, and asynchronous self-timed adders; good coverage of these adders is given in [22]. Brent-Kung and Sklansky adders are discussed in [23]; they have a higher latency than the CLA, but this drawback is balanced by a more regular structure and a lower fan-out.

Research in digital adder design is ongoing, and many architectures have been developed in an attempt to speed up addition. The cost has been an increase in the silicon area needed for implementation and a loss of regularity in the structure of the arithmetic blocks, which is very important for VLSI design [19, 20, 24].

In the carry select adder (CSA) [25, 26] the problem of carry propagation delay is overcome by dividing the data word into two parts and completing the addition of the most significant (MS) and least significant (LS) parts simultaneously, evaluating the MS result for both possible values of its input carry. The CSA speeds up addition but has area and power consumption disadvantages, since the MS part uses dual carry ripple adders (CRAs) and the correct, or true, sum from the two MS adders is selected by a multiplexer. Attempts to reduce the additional area of the second MS adder in the CSA [27], and the use of dynamic logic, have resulted in some area and speed savings [28–30].
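The select principle can be illustrated with a small behavioural model. This is only a sketch under our own assumptions: a 32-bit word split into two 16-bit halves, no timing, and a hypothetical function name `csa_add`; it is not taken from [25, 26].

```python
MASK16 = 0xFFFF


def csa_add(a: int, b: int) -> tuple[int, int]:
    """Behavioural model of a 32-bit carry select adder (no timing).

    Returns (32-bit sum, carry out of bit 31).
    """
    a_lo, b_lo = a & MASK16, b & MASK16
    a_hi, b_hi = (a >> 16) & MASK16, (b >> 16) & MASK16

    # LS half: a single 16-bit addition with carry-in 0.
    lo = a_lo + b_lo
    carry_ls = lo >> 16  # true carry from the LS half to the MS half

    # MS half: two 16-bit additions evaluated "in parallel",
    # one assuming a carry-in of 0 and one assuming a carry-in of 1.
    hi_if_0 = a_hi + b_hi
    hi_if_1 = a_hi + b_hi + 1

    # Multiplexer: the true LS carry selects the correct MS result.
    hi = hi_if_1 if carry_ls else hi_if_0
    total = ((hi & MASK16) << 16) | (lo & MASK16)
    return total, hi >> 16
```

For example, `csa_add(0x0000FFFF, 0x00000001)` returns `(0x10000, 0)`: the LS carry of 1 selects the MS result that was precomputed for a carry-in of 1.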

Recently, Wallace et al. introduced a new adder called the estimated carry adder (ESTC) [31] and provided evidence that this adder can be competitive in speed and size with other adder designs. This adder dispenses with the dual adders of the CSA, and instead uses the statistical probability of a carry being in a particular state to design a 32-bit adder which operates at the speed of a 16-bit adder for the majority of additions.

© IEE, 2005

IEE Proceedings online no. 20045185

doi:10.1049/ip-cdt:20045185

Paper first received 22nd December 2004 and in revised form 31st March 2005

The authors are with the School of Electrical, Electronic & Computer Engineering, Merz Court, University of Newcastle upon Tyne, NE1 7RU, UK

IEE Proc.-Comput. Digit. Tech., Vol. 152, No. 6, November 2005 697

This paper extends the theory of the ESTC methodology and presents a novel methodology to achieve further performance benefits by using the statistical approach together with multiple carries to predict the correct carry from the LS to the MS half of a 32-bit ESTC. The paper is organised as follows. A brief description of a 32-bit ESTC adder is first presented, followed by the methodology for extending this approach to multiple carries. This methodology is then demonstrated with implementations of circuits using two and three carries for prediction, together with simulation results. Finally, conclusions and future work are addressed.

2 32-bit estimated carry adder (ESTC)

A 32-bit ESTC adder, shown in Fig. 1, works on the principle of predicting the carry signals and making them available earlier than a fully evaluated carry. Although it may seem similar to a CLA, its structure is in fact based on a CSA. A CLA speeds up addition by anticipating the output carry of each bit, through carry generation or carry propagation, computed from the inputs of each bit. The CSA shown in Fig. 2 completes the addition of the LS part and the MS parts (for carry-in values of 1 and 0) simultaneously; once the real carry out from the LS adder is known, the correct MS result is selected with a simple multiplexer. The ESTC adder eliminates one of the MS adders and the multiplexer of the CSA, and instead uses a control circuit to generate the carry out from the LS adder to the MS adder. This control circuit initially selects a probabilistically predicted carry which, if correct, results in early completion at the speed of a 16-bit adder, but if wrong requires the correct carry to propagate through from the LS part. ESTC adder performance therefore depends on effective carry prediction, and on the prediction logic requiring a lower area overhead than the CLA or CSA.
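The predict-then-correct behaviour can be sketched as a behavioural model, assuming a 16/16 split and the single-carry estimate $A_{15} \wedge B_{15}$ reviewed in Section 2.1. The sketch models only the logical outcome (the sum, and whether the short or long delay applies), not the control circuit or its timing; the function name `estc_add` is hypothetical.

```python
MASK16 = 0xFFFF


def estc_add(a: int, b: int) -> tuple[int, int, bool]:
    """Behavioural model of a 32-bit ESTC with single-carry prediction.

    Returns (32-bit sum, carry out of bit 31, short), where short is True
    when the predicted carry matched the true LS carry (short delay).
    """
    a_lo, b_lo = a & MASK16, b & MASK16
    a_hi, b_hi = (a >> 16) & MASK16, (b >> 16) & MASK16

    # Predicted carry into the MS half, available at once: A15 AND B15.
    c_est = ((a_lo >> 15) & 1) & ((b_lo >> 15) & 1)

    # Both halves start together; the MS half uses the estimate.
    lo = a_lo + b_lo
    hi = a_hi + b_hi + c_est

    c_true = lo >> 16  # true carry out of the LS half
    short = c_est == c_true
    if not short:
        # Misprediction: redo the MS half with the true carry (long delay).
        hi = a_hi + b_hi + c_true

    total = ((hi & MASK16) << 16) | (lo & MASK16)
    return total, hi >> 16, short
```

For instance, `estc_add(0xC000, 0x4000)` is a misprediction: $A_{15} \neq B_{15}$, so the estimate is 0, but the carry generated at bit 14 propagates out of the LS half. Unlike the CSA sketch, only one MS addition is needed when the estimate is right.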

2.1 Single carry prediction for 32-bit ESTC [31]

The concept of single carry prediction for a 32-bit ESTC adder was first introduced in [31], and a brief review is given below.

As shown in Fig. 1, the 32-bit ESTC is divided into two 16-bit halves: an LS adder ($A_{0-15}$ and $B_{0-15}$) and an MS adder ($A_{16-31}$ and $B_{16-31}$). An estimated value $C^E_{15}$ of the carry out from the LS to the MS adder can be evaluated from the MS bits of the first half, $C^E_{15} = A_{15} \wedge B_{15}$, and it is shown that

$$P(C^E_{15} = C^T_{15}) = P(A_{15} = B_{15}) + P(A_{15} \neq B_{15}) \times P(C^T_{14} = 0) = 0.5 + (0.5 \times 0.5) = 0.75 \qquad (1)$$

where $P(f)$ is the probability that the logical expression $f$ is true, and $C^T_{14}$ is the true value of the carry from the 14th to the 15th (most significant) stage of the LS adder. When $A_{15} = B_{15}$ the estimate is always correct; when they differ the estimate is 0, which is correct only if $C^T_{14} = 0$.

Thus the carry from the LS to the MS adder can be predicted correctly for the majority of additions (75%) by using the logic $A_{15} \wedge B_{15}$. In this situation the 32-bit adder operates at the speed of a 16-bit adder, with its two halves working in parallel; this is described as a short delay. For incorrect predictions (25%), the true carry out $C^T_{15}$ is applied to the MS adder, and the addition time is similar to the delay of a full 32-bit adder plus the time needed to detect the state and apply the correct value; this is called a long delay. Using these results, a carry control circuit has been designed and implemented to determine whether a short or long delay should be used.
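Equation (1) assumes that $C^T_{14}$ is 0 or 1 with equal probability. A quick Monte Carlo check on uniformly random 32-bit operands, using the real carry out of the LS half rather than that assumption, gives a fraction of short-delay additions close to 0.75. This is a sketch; the function name and the uniform-operand assumption are ours, and real workloads may not be uniform.

```python
import random


def short_delay_fraction(trials: int, seed: int = 0) -> float:
    """Fraction of random 32-bit additions for which the single-carry
    estimate A15 AND B15 equals the true carry out of the LS half."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.getrandbits(32)
        b = rng.getrandbits(32)
        lo_sum = (a & 0xFFFF) + (b & 0xFFFF)
        c_true = lo_sum >> 16                # C15^T
        c_est = (a >> 15) & (b >> 15) & 1    # C15^E = A15 AND B15
        hits += c_est == c_true
    return hits / trials


print(round(short_delay_fraction(200_000), 3))
```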

2.2 Multiple carry prediction for 32-bit ESTC

This paper shows how the concept of predicting carries can be extended successively to less significant carry bits in the LS half of the adder to increase the probability of correctly predicting the carry to the MS half. New control circuits are presented, together with performance comparisons with other commonly used architectures in terms of addition speed, area, and delay–area product.

2.2.1 Two carries for estimation: In this subsection the theory is extended by using one further carry in the LS adder to estimate the carry to the MS adder. The full truth table for all possible input combinations is given in Table 1. In Section 2.1, by comparing the states of the true carry ($C^T_{15}$) and the estimated carry ($C^E_{15}$) for all possible input combinations, it was found that the probability of the estimated carry $C^E_{15}$ being equal to the true carry $C^T_{15}$ is 0.75 when the following logical expression is used for prediction:

$$C^E_{15} = A_{15} \wedge B_{15}$$

Fig. 1 32-bit estimated carry adder

Fig. 2 32-bit carry select adder

However, if the predicted value of the carry from the 14th bit of the LS stage ($C^E_{14} = A_{14} \wedge B_{14}$) is also made available, then a better estimate of the carry to the MS adder can be expressed as

$$C^E_{15}(2) = C^E_{15} \vee \left(C^E_{14} \wedge (A_{15} \vee B_{15})\right) \qquad (2)$$

where the '(2)' in $C^E_{15}(2)$ indicates that the carry estimate is based on the two most significant stages of the LS adder. As a result, the proportion of additions for which the prediction is correct increases from 75% to 87.5%, as shown by the following:

$$P(C^E_{15}(2) = C^T_{15}) = P(A_{15} = B_{15}) + P(A_{15} \neq B_{15}) \times P(C^E_{14} = C^T_{14}) \qquad (3)$$

where

$$P(C^E_{14} = C^T_{14}) = P(A_{14} = B_{14}) + P(A_{14} \neq B_{14}) \times P(C^T_{13} = 0) = 0.5 + (0.5 \times 0.5) = 0.75 \qquad (4)$$

Hence, from (3) and (4),

$$P(C^E_{15}(2) = C^T_{15}) = 0.5 + (0.5 \times 0.75) = 0.875 \qquad (5)$$

A new carry control circuit has been designed using the logic combination given in (2), and is shown in Fig. 3. This non-minimised circuit adds three gates and widens the OR gate given in [31] from two inputs to three. It is clear from (5) that the average delay of the adder is reduced, since for 87.5% of additions the 32-bit adder operates as two 16-bit adders in parallel.
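Because the five inputs $C^T_{13}$, $A_{14}$, $B_{14}$, $A_{15}$ and $B_{15}$ fully determine Table 1, the probabilities in (1) and (5) can be checked by exhaustive enumeration, assuming, as the derivation does, that all 32 input combinations are equally likely. A minimal sketch (the function name is ours):

```python
from itertools import product


def table1_rows():
    """Enumerate Table 1 rows in the order (C13T, A14, B14, A15, B15)."""
    rows = []
    for c13, a14, b14, a15, b15 in product((0, 1), repeat=5):
        c14_t = (a14 & b14) | ((a14 ^ b14) & c13)    # true carry C14^T
        c14_e = a14 & b14                            # C14^E
        c15_t = (a15 & b15) | ((a15 ^ b15) & c14_t)  # true carry C15^T
        c15_e = a15 & b15                            # C15^E, single carry
        c15_e2 = c15_e | (c14_e & (a15 | b15))       # C15^E(2), eq. (2)
        # Column order matches Table 1.
        rows.append((c13, a14, b14, c14_t, c14_e, a15, b15, c15_t, c15_e, c15_e2))
    return rows


rows = table1_rows()
p1 = sum(r[7] == r[8] for r in rows) / len(rows)  # single-carry estimate
p2 = sum(r[7] == r[9] for r in rows) / len(rows)  # two-carry estimate
print(p1, p2)  # 0.75 0.875
```

The four rows where $C^E_{15}(2)$ still differs from $C^T_{15}$ all have $C^T_{13} = 1$, $A_{14} \neq B_{14}$ and $A_{15} \neq B_{15}$; these are the "Full 32-bit CPA" rows of the table.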

Table 1: Carry logic truth table for multiple carry prediction using bits $A_{14}$, $B_{14}$

| $C^T_{13}$ | $A_{14}$ | $B_{14}$ | $C^T_{14}$ | $C^E_{14}$ | $A_{15}$ | $B_{15}$ | $C^T_{15}$ | $C^E_{15}$ | $C^E_{15}(2)$ | Logic gate | Performing mode |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | $A_{14} \cdot B_{14}\,(A_{15} + B_{15})$ | Two 16-bit adders in parallel |
| 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | $A_{14} \cdot B_{14}\,(A_{15} + B_{15})$ | Two 16-bit adders in parallel |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | $C^T_{15}$ is applied | Full 32-bit CPA |
| 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | $C^T_{15}$ is applied | Full 32-bit CPA |
| 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | $C^T_{15}$ is applied | Full 32-bit CPA |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | $C^T_{15}$ is applied | Full 32-bit CPA |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | $A_{14} \cdot B_{14}\,(A_{15} + B_{15})$ | Two 16-bit adders in parallel |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | $A_{14} \cdot B_{14}\,(A_{15} + B_{15})$ | Two 16-bit adders in parallel |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | $A_{15} \cdot B_{15}$ | Two 16-bit adders in parallel |

$C^T_n$: true carry; $C^E_n = A_n \wedge B_n$: estimated carry based on the most significant stage of the LS adder only; $C^E_{15}(2)$: estimated carry out to the MS adder based on the two most significant stages of the LS adder

Fig. 3 Carry control circuit using two carries for prediction

The configuration of the carry control circuit in Fig. 3 has been used to design a new completion circuit for the 32-bit ESTC adder, shown in Fig. 4. This circuit generates the correct carry out from the LS adder to the MS adder. When a pulse is applied to the start input, it sets the RS flip-flop, which is held in the set state until a signal is returned when the addition is complete. At this point the flip-flop is reset and a new set of data can be applied to the adder. The start pulse places a high signal on the output of the second NOR gate, which feeds the selection circuit (the NAND and AND gates). These two gates control the two delay lines; specifically, they determine which delay is used for a given set of inputs. When the carry is estimated correctly, the carry out from the LS adder propagates to the MS adder using a short delay. However, if the estimated carry differs from the true carry, as detected by an XOR gate when the LS adder has finished, the circuit waits for the true carry to propagate through to the MS adder. In this case the adder performs the addition in a single 32-bit adder delay. Therefore, in the short-delay case the done signal is set after a delay comprising D1, the multiplexer and the OR gate, whereas in the long-delay case the result is available after a delay comprising D1, the multiplexer, D2 and the OR gate.
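Since asynchronous performance depends on the average rather than the worst-case delay, the effect of the prediction probability can be summarised by a simple expected-delay model. This is an idealisation of ours, not a formula from the paper: it assumes each addition takes either a fixed short or a fixed long delay, and the delay values used below are hypothetical placeholders rather than simulation results.

```python
def average_delay(p_correct: float, t_short: float, t_long: float) -> float:
    """Expected addition time when a fraction p_correct of additions
    completes with the short delay and the rest with the long delay."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    return p_correct * t_short + (1.0 - p_correct) * t_long


# Hypothetical delays in units of one 16-bit adder delay:
# short ~ 1 unit, long ~ 2 units (overheads of D1, D2, mux and OR ignored).
for carries, p in [(1, 0.75), (2, 0.875), (3, 0.9375)]:
    print(carries, average_delay(p, 1.0, 2.0))
```

Under these placeholder values the average delay falls from 1.25 to 1.125 to 1.0625 units; in practice the gain is offset by the extra gates in the control circuit, which lengthen the short and long delays slightly.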

2.2.2 Three carries for estimation: The concept of using two carries for estimation described in the previous Section can readily be extended to three carries. We develop this by replacing $C^E_{14}$ in (2) with an improved estimate $C^E_{14}(2)$, which uses $C^E_{13}$ as follows:

$$C^E_{15}(3) = C^E_{15} \vee \left(C^E_{14}(2) \wedge (A_{15} \vee B_{15})\right) \qquad (6)$$

where

$$C^E_{14}(2) = C^E_{14} \vee \left(C^E_{13} \wedge (A_{14} \vee B_{14})\right), \qquad C^E_N = A_N \wedge B_N$$

Using this logic, the probability of the estimated and the true carry being equal can be obtained from the following equations:

$$P(C^E_{15}(3) = C^T_{15}) = P(A_{15} = B_{15}) + P(A_{15} \neq B_{15}) \times P(C^E_{14}(2) = C^T_{14}) \qquad (7)$$

where

$$P(C^E_{14}(2) = C^T_{14}) = P(A_{14} = B_{14}) + P(A_{14} \neq B_{14}) \times P(C^E_{13} = C^T_{13}) = 0.5 + (0.5 \times 0.75) = 0.875 \qquad (8)$$

Hence, from (7) and (8),

$$P(C^E_{15}(3) = C^T_{15}) = 0.5 + (0.5 \times 0.875) = 0.9375$$

As a result, the probability of a correct prediction increases from 75% to 93.75% when three carries are used for prediction, and a new carry control circuit has been designed, as shown in Fig. 5.
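The recursive structure of (6) generalises to any number m of carries. The sketch below (function names ours) applies the recursion to the m most significant bit pairs and checks it exhaustively against the true ripple carry, assuming uniformly distributed, independent input bits and a uniformly distributed carry into the lowest pair used, as in the derivations of (4) and (8).

```python
from itertools import product


def estimate_carry(a_bits, b_bits, m):
    """Carry estimate C^E(m) out of the top bit pair, using the m most
    significant pairs. a_bits/b_bits list bits most significant first.

    C^E(1) = A AND B;  C^E(k) = (A AND B) OR (C^E_lower(k-1) AND (A OR B)).
    """
    a, b = a_bits[0], b_bits[0]
    if m == 1:
        return a & b
    lower = estimate_carry(a_bits[1:], b_bits[1:], m - 1)
    return (a & b) | (lower & (a | b))


def true_carry(a_bits, b_bits, c_in):
    """Ripple c_in (entering below the last pair) up through the top pair."""
    c = c_in
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        c = (a & b) | ((a ^ b) & c)
    return c


def p_correct(m):
    """Exhaustive probability over 2m input bits plus a uniform carry-in."""
    hits = total = 0
    for bits in product((0, 1), repeat=2 * m + 1):
        a_bits, b_bits, c_in = list(bits[:m]), list(bits[m:2 * m]), bits[-1]
        hits += estimate_carry(a_bits, b_bits, m) == true_carry(a_bits, b_bits, c_in)
        total += 1
    return hits / total


for m in (1, 2, 3, 4):
    print(m, p_correct(m))  # 0.75, 0.875, 0.9375, 0.96875
```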

Fig. 5 Carry control circuit using three carries for prediction

Fig. 4 Schematic of the control circuit for a 32-bit ESTCA using two carries for prediction


2.2.3 Using more carries for estimation: The results obtained using one, two, and three carries for prediction show that the percentages of incorrect estimation are 25%, 12.5%, and 6.25%, respectively. Each further carry introduced for prediction therefore halves the percentage of incorrect estimation, so the percentage of correct estimation increases by half of the previous structure's percentage of incorrect estimation. Consequently, the estimation can be extended further to less significant bits, with the same proportional gain at each step.
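The halving pattern follows directly from the recursion used in (3) to (8). Writing $p_m$ (our notation) for the probability that the estimate based on m carries is correct, and assuming as above that each bit pair is uniformly distributed and independent, a short derivation gives a closed form:

```latex
% p_m: probability that the estimate based on m carries equals the true carry
\begin{align*}
p_1 &= \tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{4},
  \qquad p_m = \tfrac{1}{2} + \tfrac{1}{2}\,p_{m-1} \quad (m \ge 2) \\
1 - p_m &= \tfrac{1}{2}\,(1 - p_{m-1})
  = \left(\tfrac{1}{2}\right)^{m-1}(1 - p_1) = 2^{-(m+1)} \\
1 - p_m &= 0.25,\ 0.125,\ 0.0625 \quad \text{for } m = 1, 2, 3
\end{align*}
```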

3 Results of carry prediction for ESTC

Typically, an adder's performance is characterised by its area and speed. Thus in this Section we focus on evaluating the delay and hardware cost of the circuits and comparing them with other designs. The delay–area product is used as a performance measure, as it is an important metric that highlights the trade-off between area and speed. Simulations have been performed for different types of 32-bit adders with different structures using PSPICE with parameters from a 0.125 μm CMOS technology.

3.1 Theoretical results

The general formulae for calculating the percentage probability of correct and incorrect prediction for each additional pair of inputs are

$$\text{probability of correct prediction} = \left[\frac{2^n - \left(2^{n/2}/2\right)}{2^n}\right] \times 100 \qquad (9)$$

$$\text{probability of incorrect prediction} = \left[\frac{2^{n/2}/2}{2^n}\right] \times 100 \qquad (10)$$

where $2^n$ is the number of possible input states, $2^{n/2}$ is the number of carry outputs dependent on the carry from the previous bit, and $n$ is the number of input bits used for prediction.

The probabilities of correct prediction obtained by including all the possible input bits of the LS adder for prediction are tabulated in Table 2 and presented graphically in Fig. 6. As Fig. 6 shows, there is a significant increase in the probability of a correct prediction when using up to around five carries for prediction; thereafter, the increase is only small. In theory, by using all of the carries in the LS adder this approach could approach 100% correct prediction, but beyond around five carries the gains do not justify the cost, since the control circuitry becomes more complex and more expensive in terms of silicon area.

3.2 Practical results

Simulations were performed for different types of 32-bit adder with different structures, and the results obtained for a 32-bit ESTC adder using a single carry for prediction are listed in Table 3.

Fig. 6 Probability of correct prediction for 32-bit ESTC adder

Table 2: Probabilities of correct prediction using multiple carries for prediction

| Carry for prediction | Input bits for prediction | Probability of a correct prediction | Probability of incorrect prediction |
|---|---|---|---|
| C15 | 2 | 0.75000 | 0.25000 |
| C14 | 4 | 0.87500 | 0.12500 |
| C13 | 6 | 0.93750 | 0.06250 |
| C12 | 8 | 0.96875 | 0.03125 |
| C11 | 10 | 0.98438 | 0.01562 |
| C10 | 12 | 0.99219 | 0.00781 |
| C9 | 14 | 0.99609 | 0.00391 |
| C8 | 16 | 0.99805 | 0.00195 |
| C7 | 18 | 0.99902 | 0.00098 |
| C6 | 20 | 0.99951 | 0.00049 |
| C5 | 22 | 0.99976 | 0.00024 |
| C4 | 24 | 0.99988 | 0.00012 |
| C3 | 26 | 0.99994 | 0.00006 |
| C2 | 28 | 0.99997 | 0.00003 |
| C1 | 30 | 0.99998 | 0.00002 |
| C0 | 32 | 0.99999 | 0.00001 |

Table 3: Comparison of 32-bit ESTC adder using a single carry for prediction with relevant designs

| Adder structure | Delay (ps) | No. of transistors | Delay–area product |
|---|---|---|---|
| 32-bit CLA/CLA | 100 | 3425 [32] | 342500 |
| 32-bit CSA/CLA | 121 | 2832 | 342672 |
| 32-bit CSA/CRA | 584 | 2424 | 1415616 |
| 32-bit ESTC/CLA | 166 | 1756 | 291496 |
| 32-bit CRA/CLA | 203 | 1680 | 341040 |
| 32-bit ESTC/CRA | 627 | 1530 | 959310 |
| 32-bit CRA/CRA | 1064 | 1408 | 1498112 |

CLA/CLA: carry lookahead adder with carry lookahead elements; CSA/CLA: carry select adder with carry lookahead elements; CSA/CRA: carry select adder with carry ripple elements; ESTC/CRA: estimated carry adder with carry ripple elements; ESTC/CLA: estimated carry adder with carry lookahead elements; CRA/CRA: carry ripple adder with carry ripple elements; CRA/CLA: carry ripple adder with carry lookahead elements

The comparison of the ESTC/CRA using a single carry for prediction with other adders shows that it offers a significant speed advantage of 41% over the CRA/CRA for only an 8.7% increase in hardware area. In addition, it is only 7% slower than the CSA/CRA but uses 37% less area. The results for adders based on CLA elements show that the CLA/CLA is still the fastest, but at the expense of an increase in area [32]. The ESTC/CLA is 18% faster than the CRA/CLA for an increase of only about 4.5% in area, and is 27% slower than the CSA/CLA but uses 38% less area. Although the CLA/CLA is the fastest adder, it occupies 95% more area than the ESTC/CLA. Comparing the ESTC/CRA with other adders in terms of delay–area product shows a saving of 36% over the CRA/CRA and 32% over the CSA/CRA. The results for the ESTC/CLA give a saving of over 14% over the CRA/CLA, CSA/CLA, and CLA/CLA designs.
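The delay–area products in Table 3 and the percentage savings quoted above are simple arithmetic on the tabulated delays and transistor counts, with the transistor count standing in for area as in the table. A sketch for checking them (names ours):

```python
# Table 3 data: adder structure -> (delay in ps, transistor count)
TABLE3 = {
    "CLA/CLA": (100, 3425),
    "CSA/CLA": (121, 2832),
    "CSA/CRA": (584, 2424),
    "ESTC/CLA": (166, 1756),
    "CRA/CLA": (203, 1680),
    "ESTC/CRA": (627, 1530),
    "CRA/CRA": (1064, 1408),
}


def delay_area(name: str) -> int:
    """Delay (ps) multiplied by transistor count."""
    delay, transistors = TABLE3[name]
    return delay * transistors


def saving_percent(new: str, old: str) -> float:
    """Percentage reduction in delay-area product of `new` relative to `old`."""
    return 100.0 * (1.0 - delay_area(new) / delay_area(old))


for old in ("CRA/CRA", "CSA/CRA"):
    print(f"ESTC/CRA vs {old}: {saving_percent('ESTC/CRA', old):.1f}%")
for old in ("CRA/CLA", "CSA/CLA", "CLA/CLA"):
    print(f"ESTC/CLA vs {old}: {saving_percent('ESTC/CLA', old):.1f}%")
```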

The comparison is also made for the ESTC using multiple carries for prediction; the delay, transistor count, and delay–area product computed for all ESTC configurations are listed in Table 4. The results are organised into two groups, for the ESTC/CRA and ESTC/CLA adders. The area requirement grows as more carries are used for prediction, while the delay is reduced. Since the speed advantage gained beyond C11 (that is, with more than five carries) is outweighed by the added complexity, the comparisons are carried out for up to five carries.

The performance comparisons of the multiple-carry ESTC with other relevant designs (Tables 3 and 4) show that considerable performance advantages are gained as each new carry is introduced for prediction. For example, using five carries for prediction, the ESTC/CRA adder offers a speed advantage of 50.1% over the CRA/CRA for only a 12.4% increase in area, and a speed advantage of 9.1% over the CSA/CRA while using 34.7% less area. Moreover, comparing the adders based on carry lookahead elements, the results show that the ESTC/CLA is 29% slower than the CLA/CLA [32] but uses 47% less area, and that the CSA/CLA is 13.4% faster but uses 56.6% more area than the ESTC/CLA. However, the ESTC/CLA is 31.1% faster than the CRA/CLA for a mere 7.6% increase in area.

In terms of delay–area product, the ESTC using four carries, for example, shows that the ESTC/CRA offers a saving of 44.2% over the CRA/CRA and 41% over the CSA/CRA, and the ESTC/CLA gives a saving of more than 26% over the CLA/CLA, CSA/CLA, and CRA/CLA designs. The graphs in Fig. 7 give a clear overview of the trade-offs obtained.
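Table 4 also shows where adding carries stops paying off: for both groups the delay–area product falls up to four carries and rises slightly at five, which is consistent with four carries being used as the headline example. The sketch below (names ours) picks the minimum and recomputes the four-carry savings against the Table 3 benchmarks; it uses the tabulated delay–area products directly, since these differ slightly from the product of the listed integer delay and transistor count.

```python
# Table 4: carries used -> (delay in ps, transistors, tabulated delay-area product)
TABLE4 = {
    "ESTC/CRA": {1: (627, 1530, 959310), 2: (571, 1546, 882550),
                 3: (544, 1556, 845842), 4: (533, 1568, 835509),
                 5: (531, 1582, 839647)},
    "ESTC/CLA": {1: (166, 1756, 291496), 2: (151, 1772, 267413),
                 3: (144, 1782, 255824), 4: (141, 1794, 252147),
                 5: (140, 1808, 252758)},
}

# Delay-area products of the benchmark adders, from Table 3
BASELINES = {"CRA/CRA": 1498112, "CSA/CRA": 1415616,
             "CLA/CLA": 342500, "CSA/CLA": 342672, "CRA/CLA": 341040}


def best_carry_count(adder: str) -> int:
    """Carry count with the smallest tabulated delay-area product."""
    rows = TABLE4[adder]
    return min(rows, key=lambda k: rows[k][2])


def saving_percent(product: int, baseline: str) -> float:
    """Percentage reduction of `product` relative to a Table 3 benchmark."""
    return 100.0 * (1.0 - product / BASELINES[baseline])


for adder, rows in TABLE4.items():
    k = best_carry_count(adder)
    print(adder, "best with", k, "carries:", rows[k][2])
```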

4 Conclusion

The authors have presented a novel methodology to significantly improve the speed performance of a 32-bit ESTC asynchronous adder. The methodology uses multiple carries to predict the correct carry from the LS to the MS half of a 32-bit adder. Theoretically, each new carry introduced for prediction halves the proportion of additions that incur the long delay, compared with the previous configuration. Therefore, in theory, the probability of a correct prediction can approach 100%, but the complexity and area overhead of the control circuits start to outweigh the speed advantages.

Simulation results show that the 32-bit ESTC adder using multiple carries can achieve substantial speed and area advantages over other adders.

Since a designer's main aim is to optimise circuit performance in terms of delay–area product, two control circuits for the 32-bit ESTC adder have been designed and presented in this paper, which offer major improvements in performance over the benchmark adders. For example, using a single carry for estimation, the 32-bit ESTC/CLA has a delay–area product more than 14% lower than the other CLA-based designs, and for an ESTC/CLA that uses four carries the saving rises to over 26%. The overall results demonstrate that the ESTC adder is simple and fast, and is suitable for implementation in VLSI technology.

Table 4: Comparison of 32-bit ESTC adder using multiple carries for prediction

| Adder structure | Carries used for prediction | Delay (ps) | No. of transistors | Delay–area product |
|---|---|---|---|---|
| 32-bit ESTC/CRA | Single carry | 627 | 1530 | 959310 |
| 32-bit ESTC/CRA | Two carries | 571 | 1546 | 882550 |
| 32-bit ESTC/CRA | Three carries | 544 | 1556 | 845842 |
| 32-bit ESTC/CRA | Four carries | 533 | 1568 | 835509 |
| 32-bit ESTC/CRA | Five carries | 531 | 1582 | 839647 |
| 32-bit ESTC/CLA | Single carry | 166 | 1756 | 291496 |
| 32-bit ESTC/CLA | Two carries | 151 | 1772 | 267413 |
| 32-bit ESTC/CLA | Three carries | 144 | 1782 | 255824 |
| 32-bit ESTC/CLA | Four carries | 141 | 1794 | 252147 |
| 32-bit ESTC/CLA | Five carries | 140 | 1808 | 252758 |

Fig. 7 Delay, area, and delay–area product comparison for the configurations of ESTC: (a) delay comparison; (b) area comparison; (c) delay–area product comparison

5 References

1 Sun, X.-G., Mao, Z.-G., and Lai, F.-C.: ‘A 64 bit parallel CMOS adderfor high performance processors’, Proc. IEEE Asia-Pacific Conf. onASIC, 2002, pp. 205–208

2 Sakamoto, M., Hamano, D., and Morisue, M.: ‘Design of a multiple-operand redundant binary adder’, Syst. Comput. Jpn., 2002, 33, (10),pp. 1–9

3 Fang, C.-J., Huang, C.-H., Wang, J.-S., and Yeh, C.-W.: ‘Fast andcompact dynamic ripple carry adder design’. Proc. IEEE Asia-Pacific Conf. on ASIC, 2002, pp. 25–28

4 Kinniment, D.J.: ‘An evaluation of asynchronous addition’, IEEETrans. Very Large Scale Integr. (VLSI) Syst., 1996, 4, (1),pp. 137–140

5 Kong, B.-S., Im, J.-D., Kim, Y.-C., Jang, S.-J., and Jun, Y.-H.: ‘CMOSdifferential logic family with self-timing and charge-recycling forhigh-speed and low-power VLSI’, IEE Proc., Circuits Devices Syst.,2003, 150, (1), pp. 45–50

6 Sklavos, N., and Koufopavlou, O.: ‘Asynchronous low power VLSIimplementation of the international data encryption algorithm’. 8thIEEE Int. Conf. on Electronics, Circuits and Systems, 2001, 3,pp. 1425–1428

7 Singh, M., and Nowick, S.M.: ‘Fine-grain pipelined asynchronousadders for high-speed DSP applications’. Proc. IEEE ComputerSociety Workshop on VLSI, System Design for a System-on-ChipEra, 2000, pp. 111–118

8 Beerel, P.A.: ‘Asynchronous circuits: an increasingly practical designsolution’. Proc. IEEE Computer Society Int. Symp. on QualityElectronic Design, 2002, pp. 367–372

9 Brunvand, E., Nowick, S., and Yun, K.: ‘Practical advances in asynchronous design and in asynchronous/synchronous interfaces’. Proc. IEEE Design Automation Conf., 1999, pp. 104–109

10 Allier, E., Fesquet, L., Renaudin, M., and Sicard, G.: ‘Low-power asynchronous A/D conversion’. Proc. Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation. 12th Int. Workshop, PATMOS, 2002, pp. 81–91

11 Siu, P.-L., Choy, C.-S., Butas, J., and Chan, C.F.: ‘A low power asynchronous DES’. Proc. IEEE Int. Symp. on Circuits and Systems, 2001, 4, pp. 538–541

12 Hauck, S.: ‘Asynchronous design methodologies: an overview’, Proc. IEEE, 1995, 83, (1), pp. 69–93

13 Johnson, D., and Akella, V.: ‘Design and analysis of asynchronous adders’, IEE Proc., Comput. Digit. Tech., 1998, 145, (1), pp. 1–7

14 Davis, A., and Nowick, S.M.: ‘Introduction to asynchronous circuit design’ (University of Utah Technical Report, Department of Computer Science, UUCS-97-013, Sept 1997)

15 Chou, W., Beerel, P., Ginosar, R., Kol, R., Myers, C., Rotem, S., Stevens, K., and Yun, K.: ‘Average-case optimized technology mapping of one-hot domino circuit’. Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80–91

16 Donaghy, D., Brackenbury, L., and Hall, S.: ‘Combining SOI technology and asynchronous design for power reduction’. Silicon-on-Insulator Technology and Devices X, Proc. Tenth Int. Symp. (Electrochem. Soc., 2001), pp. 337–342

17 Nowick, S.M.: ‘Design of a low-latency asynchronous adder using speculative completion’, IEE Proc., Comput. Digit. Tech., 1996, 143, (5), pp. 301–307

18 Cheng, F.-C., Unger, S.H., and Theobald, M.: ‘Self-timed carry-lookahead adders’, IEEE Trans. Comput., 2000, 49, (7), pp. 659–672

19 Andreev, B.D., Titlebaum, E., and Friedman, E.: ‘Tapered transmission gate chains for improved carry propagation’. Proc. IEEE 45th Midwest Symp. on Circuits and Systems, 2002, III, pp. 449–452

20 Han, K.-N., Han, S.-W., and Yoon, E.: ‘A new adder scheme with reduced P, G signal generations using redundant binary number system’. Proc. IEEE Int. Symp. on Circuits and Systems, 2000, 5, pp. 633–636

21 Weinberger, A., and Smith, J.L.: ‘A logic for high-speed addition’, National Bureau of Standards, Circular 591, 1958, pp. 3–12

22 Hwang, K.: ‘Computer arithmetic: principles, architecture, and design’ (John Wiley & Sons, New York, 1979)

23 Shoji, M.: ‘CMOS digital circuit technology’ (Prentice-Hall, Englewood Cliffs, New Jersey, 1988)

24 Franklin, M.A., and Pan, T.: ‘Performance comparison of asynchronous adders’. Proc. IEEE Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, ASYNC-94, 1994, pp. 117–125

25 Nagendra, C., Irwin, M., and Owens, R.: ‘Area-time-power trade-offs in parallel adders’, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 1996, 43, (10), pp. 689–702

26 Bedrij, O.J.: ‘Carry select adder’, IRE Trans. Electron. Comput., 1962, EC-11, pp. 340–346

27 Parhami, B.: ‘Computer arithmetic: algorithms and hardware designs’ (Oxford University Press, Oxford, UK, 2000)

28 Chang, T.Y., and Hsiao, M.J.: ‘Carry select adder using single ripple carry adder’, Electron. Lett., 1998, 34, pp. 2101–2103

29 Hashemian, R.: ‘A new design for high speed and high-density carry select adders’. Proc. 43rd IEEE Midwest Symp. on Circuits and Systems, 2000, 3, pp. 1300–1303

30 Liao, M.-J., Su, C.-F., Chang, C.-Y., and Wu, A.C.-H.: ‘A carry-select-adder optimization technique for high-performance Booth-encoded Wallace-tree multipliers’. Proc. IEEE Int. Symp. on Circuits and Systems, 2002, 1, pp. I-81–I-84

31 Wallace, W.F., Dlay, S., and Hinton, O.: ‘Probabilistic carry estimate for improved asynchronous adder performance’, IEE Proc., Comput. Digit. Tech., 2001, 148, (6), pp. 221–226

32 Wang, C.C., Tseng, Y.L., Lee, P.M., Lee, R.C., and Huang, C.J.: ‘A 1.25 GHz 32-bit tree-structured carry lookahead adder using modified ANT logic’, IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., 2003, 50, (9), pp. 1208–1216
