
Design and Analysis of CMOS Full Adders for Low Power and Low Frequency of Operation for Scavenged-Power Wireless Sensor Networks

Jerry Lam 100323125

December 18, 2007

Abstract

While many VLSI applications require or benefit greatly from low power consumption, scavenged-power wireless sensor networks have far more stringent power requirements, often in the sub-uW range. At these power levels, static and leakage power consumption can form a significant portion of the total power consumption of digital circuits, so traditional design methodologies must be modified to take this into account. Thus, the logic styles, voltage levels and transistor sizes used must be optimized for the expected frequency of operation and circuit complexity.

The results shown in this paper demonstrate that leakage power can dominate over switching power in certain regions of operation, but that both forms of power consumption can be reduced by using low supply voltages. Despite the increased delays that this may cause, good design techniques, including the proper selection of logic families and transistor sizes, can allow sensor node controller circuitry to operate at these levels. Finally, a design flow is suggested for the design of such circuits.

1 Introduction

As VLSI circuits become increasingly complex, the number of transistors in typical designs increases rapidly, greatly increasing power consumption. Although newer processes can offset this slightly with smaller and faster transistors, the optimization of power is increasingly becoming a vital design characteristic. This is particularly true in wireless applications, where the available power sources limit the amount of power and energy available for the circuit to consume.

In some applications, traditional power sources such as batteries or power leads cannot be used. This is typical of many biomedical wireless sensor networks, particularly in the case of implanted sensors, where such sources of power would greatly increase the size of the devices to impractical levels, or would interfere with the operation of the sensor itself.

The solution to this problem is to scavenge power from the environment, although this puts severe constraints on the device performance. For instance, [1, 2] describe a variety of energy scavenging techniques, for which typical power supply levels range from sub-uW to several mW (see [3] for an example).

RF transceivers have been shown to be able to operate in this power range (see [4] for instance), but only on an average basis. This forces the transceivers to operate only for short periods of time in order to transmit and receive data, while remaining in a low power sleep mode for the majority of the time. Thus, in even a simple sensor network with 2 elements, the two nodes must be synchronized so that they may be switched on and off together. Many schemes for obtaining time synchronization of wireless sensor networks have been proposed (for instance, see [5, 6]), but these algorithms are sufficiently complex to require implementation using digital logic. In order to allow the sensor node to scavenge sufficient power for the next radio transmission/reception, the logic must consume power at far lower levels than that being scavenged, thus forcing total power consumption levels to fall below the uW range. However, the logic used to implement such algorithms tends to be very simple, requiring only basic calculations based on the times of arrival of signals and the decoding of short messages. Thus, the logic can operate at low frequencies (under 1 MHz), allowing the use of very low supply voltages without too much effect from the resulting increased delay, as discussed in [7].


This paper looks at the design and analysis of a CMOS full adder under these design constraints. Since full adders are used in a variety of common digital circuits, such as DSP blocks and counters, and because they are fairly complex combinational logic circuits, full adders form a good basis for demonstrating the important design concerns that must be taken into account when designing digital circuits for use in wireless sensor networks. Since the exact task of the circuits being demonstrated in this paper is not specified, a general purpose static CMOS full adder is designed, although discussion is made as to when the design choices made are feasible and when other alternatives form better design choices.

2 Background

2.1 Sources of power dissipation

The power consumption of digital logic circuits can be divided into 4 main categories: static power, leakage power, dynamic power and short circuit power [8]. Static power comes from bias current that naturally flows through the logic circuits, even when they are not switching, due to the incomplete turn-off of transistors. This is found in logic families such as pseudo-NMOS logic or static differential split-level logic (SDSL) [9]. Leakage power comes from the sub-threshold power dissipation. The sub-threshold current is an exponential function of the drain, gate and source voltages, given by [7] as

$$I_{ds} = K e^{\frac{V_{gs} - V_t}{n V_T}} \left(1 - e^{-\frac{V_{ds}}{V_T}}\right) \tag{1}$$

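where K is a function of W/L, V_t is the threshold voltage, V_T is the thermal voltage, and n is the sub-threshold slope factor. The power consumption from static power and leakage power can be calculated by multiplying the current by the supply voltage.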
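To make the dependence in equation (1) concrete, the following is a minimal sketch that evaluates the sub-threshold current of a single "off" transistor and the corresponding leakage power. The prefactor `k` and slope factor `n` are hypothetical placeholder values, not fitted to the process used in this paper; only the 0.3 V threshold is taken from the text, and the thermal voltage assumes operation near room temperature.

```python
import math

THERMAL_VOLTAGE = 0.0259  # V, approximately kT/q near 300 K


def subthreshold_current(v_gs, v_ds, v_t, k=1e-7, n=1.5, thermal_v=THERMAL_VOLTAGE):
    """Equation (1): K * exp((Vgs - Vt) / (n * VT)) * (1 - exp(-Vds / VT)).

    k (amperes) and n are hypothetical illustration values.
    """
    gate_term = math.exp((v_gs - v_t) / (n * thermal_v))
    drain_term = 1.0 - math.exp(-v_ds / thermal_v)
    return k * gate_term * drain_term


def leakage_power(v_dd, v_t=0.3):
    """Leakage power of one 'off' NMOS: gate at 0 V, full supply across drain-source."""
    return v_dd * subthreshold_current(v_gs=0.0, v_ds=v_dd, v_t=v_t)


if __name__ == "__main__":
    for v_dd in (0.4, 0.8, 1.2):
        print(f"VDD = {v_dd:.1f} V -> leakage power ~ {leakage_power(v_dd):.3e} W")
```

This simple form ignores second-order effects such as drain-induced barrier lowering, so it understates how strongly leakage grows with the supply voltage in a real device.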

Dynamic power and short circuit power both arise when the gate switches states, and are thus related to the switching frequency of the input. It is described in [8] that dynamic power, which arises from the current needed to charge or discharge the capacitive load of the next stage, is given by

$$P_{dynamic} = K C f V_{DD}^2 \tag{2}$$

where K represents the “activity factor” corresponding to how frequently the gate changes state with respect to the main clock frequency f. Likewise, short circuit power, which arises from the temporary short circuit formed when all transistors in a conduction path are turned on in the middle of a transition, can be given by

$$P_{shortcircuit} = \beta f \left(V_{DD} - 2V_T\right)^3 \tag{3}$$

with $\beta$ being a proportionality constant and V_T here denoting the threshold voltage. It should be noted that operating at a supply voltage less than 2V_T will eliminate the short circuit power, as it will be impossible to turn on all transistors in a conduction path.

In conventional digital circuits, dynamic and short circuit power form the dominant sources of power dissipation [8]. In this paper, both forms will be grouped together under the term “switching power”, as it is difficult (and impractical) to separate the two when simulating a circuit.
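As an illustration of equations (2) and (3), the sketch below evaluates both switching components and shows the two effects discussed above: dynamic power scales with the square of the supply voltage, and the short circuit term vanishes once the supply falls below twice the threshold voltage. The activity factor, load capacitance and proportionality constant used here are hypothetical values chosen only for the example.

```python
def dynamic_power(activity, c_load, freq, v_dd):
    """Equation (2): activity factor * load capacitance * frequency * VDD^2."""
    return activity * c_load * freq * v_dd ** 2


def short_circuit_power(coeff, freq, v_dd, v_threshold):
    """Equation (3), clamped to zero when VDD is below twice the threshold voltage."""
    overdrive = v_dd - 2.0 * v_threshold
    if overdrive <= 0.0:
        return 0.0  # no conduction path can be fully on during a transition
    return coeff * freq * overdrive ** 3


if __name__ == "__main__":
    # Hypothetical illustration values (not taken from the paper's simulations).
    activity, c_load, coeff = 0.5, 1e-15, 1e-12
    freq, v_threshold = 50e3, 0.3
    for v_dd in (0.4, 0.6, 1.2):
        p_dyn = dynamic_power(activity, c_load, freq, v_dd)
        p_sc = short_circuit_power(coeff, freq, v_dd, v_threshold)
        print(f"VDD = {v_dd:.1f} V: dynamic = {p_dyn:.3e} W, short circuit = {p_sc:.3e} W")
```

With these definitions, lowering the supply from 1.2 V to 0.4 V reduces the dynamic term by a factor of 9, independent of the placeholder constants.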

2.2 Methods of reducing power consumption

As the reduction of power consumption is increasingly becoming a key goal in the design of digital circuits, much literature is devoted to the techniques available for reducing power consumption [7, 9]. These include system level optimizations, such as shutting off circuits that are not in use or minimizing the complexity of the algorithm being implemented, and lower level design choices, such as the method of implementation of the algorithm. Commonly discussed topics concerning the latter include the style of logic implemented, the effects of reduced supply voltages, and the sizing of transistors.

This paper examines the effects of these choices on a full adder for use in a wireless sensor network.


3 Simulations and Results

3.1 Overview of constraints and operating regions

As discussed previously, the 4 main types of power consumption are all proportional to the supply voltage, or a power of the supply voltage. Thus, decreasing the supply voltage levels will clearly decrease the power consumption. However, such decreases come at a cost of increased delays [7], as the decreased supply voltages lower the current able to drive additional loads, thus taking more time and limiting the maximum frequency of operation.

However, as discussed previously, the logic required for many time synchronization schemes is not too complicated, and may be able to operate at lower frequencies, ranging from several kHz to MHz or higher, depending on the complexity of the algorithm involved and the time resolution required. Thus, the increased delay that arises from low supply voltages can be tolerated. In addition, because the circuit does not require very fast rise or fall times, the driving strength of the adder can be made very weak, thus allowing the use of minimum size transistors. This will also decrease the dynamic power, as it is proportional to the gate area of the next stage, and is typically the dominant source of power consumption for conventional circuits.

The following section will look at the important design principles to be followed when designing an adder for these conditions. The design procedure will be carried through on a conventional static CMOS full adder with minimum size transistors, although discussion will be made as to when this is an appropriate decision, and when it is not.

3.2 Note on accuracy of results

All of the results presented in this paper were obtained from simulations using the Spectre simulator with the IBM 0.13 um RF kit, using standard transistors with a maximum voltage of 1.2 V and a threshold voltage of 0.3 V. As many of the simulations involve transistors in the sub-threshold region of operation, the accuracy of these results depends on the accuracy of the models in this region. Since this region is typically used far less often than the saturation or triode regions, the models may not be completely accurate there. This was observed when a different version of the models was used: the results obtained subsequently differed by a large degree, which could be attributable to subtle changes in the models used. Thus, to ensure complete accuracy, the circuits discussed should be manufactured and tested. This was not done due to time and resource limitations.

3.3 Comparisons of Logic Families

An algorithm for a logic circuit is just a behavioral description of the circuit: it does not specify how the circuit should be constructed to implement the function, or even how to represent different binary values. Given the many different issues that digital logic circuits face, a vast number of logic styles have emerged, including conventional static CMOS, pass logic families, differential logic families, and dynamic logic families [9], as well as customized designs [10, 11, 12] and other logic design methodologies, such as subthreshold logic [13] or MOS Current Mode Logic [14]. Due to time and space constraints, not every logic style and adder design has been evaluated, but rather a selection of a few designs. Discussion is made on the important characteristics of each family, and how they perform under low voltage and low frequency of operation, thus showing the important issues that must be dealt with when choosing a logic family. As the comparisons of the logic families cover supply voltages ranging from slightly above VT to the maximum VDD, subthreshold logic is not discussed, although it can provide the ultra-low power needed in a sensor network [13].

3.3.1 Conventional Static CMOS

Conventional static CMOS is one of the most common logic styles used, due to its relative simplicity, lack of static power dissipation, and decent performance. Conventional static CMOS logic is obtained by deriving the logic for a pull-down NMOS stage and its complement for the pull-up PMOS stage.

The truth table of a full adder is seen in Table 1.

Table 1: Truth table of a full adder

A  B  C  Sum  Carry
0  0  0   0     0
0  0  1   1     0
0  1  0   1     0
0  1  1   0     1
1  0  0   1     0
1  0  1   0     1
1  1  0   0     1
1  1  1   1     1

Figure 1: Schematic of first possible configuration of conventional static CMOS full adder

From this truth table, the following equations can be derived for the Sum and Carry outputs.

$$Carry = A\,(B + C) + BC \tag{4}$$

$$\overline{Carry} = \bar{A}\,(\bar{B} + \bar{C}) + \bar{B}\,\bar{C} \tag{5}$$

$$Sum = A\,(\bar{B}\,\bar{C} + BC) + \bar{A}\,(\bar{B}C + B\bar{C}) \tag{6}$$

$$\overline{Sum} = \bar{A}\,(\bar{B}\,\bar{C} + BC) + A\,(\bar{B}C + B\bar{C}) \tag{7}$$
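The equations above can be checked exhaustively against the truth table in Table 1. The short sketch below does this, reading equations (5) and (7) as the complemented Carry and Sum outputs (the pull-up and pull-down networks of a static CMOS gate implement complementary functions); the function names are illustrative only.

```python
from itertools import product


def carry_eq4(a, b, c):
    return (a and (b or c)) or (b and c)


def carry_bar_eq5(a, b, c):
    na, nb, nc = (not a), (not b), (not c)
    return (na and (nb or nc)) or (nb and nc)


def sum_eq6(a, b, c):
    na, nb, nc = (not a), (not b), (not c)
    return (a and ((nb and nc) or (b and c))) or (na and ((nb and c) or (b and nc)))


def sum_bar_eq7(a, b, c):
    na, nb, nc = (not a), (not b), (not c)
    return (na and ((nb and nc) or (b and c))) or (a and ((nb and c) or (b and nc)))


def check_equations():
    """Compare equations (4) to (7) with arithmetic addition for all 8 input rows."""
    for a, b, c in product((False, True), repeat=3):
        total = int(a) + int(b) + int(c)
        expected_sum = bool(total & 1)   # low bit of A + B + C
        expected_carry = total >= 2      # high bit of A + B + C
        assert carry_eq4(a, b, c) == expected_carry
        assert carry_bar_eq5(a, b, c) == (not expected_carry)
        assert sum_eq6(a, b, c) == expected_sum
        assert sum_bar_eq7(a, b, c) == (not expected_sum)
    return True


if __name__ == "__main__":
    print("Equations (4) to (7) match Table 1:", check_equations())
```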

Two possible implementations of the circuit exist: one seen in Figure 1 and one seen in Figure 2. The primary difference between the two lies in the position of the branching. Because of the reduced parasitics on this node in the second implementation, it is preferred. The power consumption of each over a range of frequencies can be seen in Figure 3, where the second implementation has 20% lower switching power. The static power for both remains the same. It should be noted that the results in this graph were obtained using an older set of models, and so the precise values cannot be compared to those obtained in later sections of this paper. However, the relative behavior of the circuits should not change too much from one set of models to the next.

3.3.2 Complementary Pass Logic

Complementary Pass Logic, or CPL, is a logic family which uses NMOS transistors as switches to pass certain signals, based on the values of other signals [9]. As it only uses NMOS transistors, a CPL device has a very weak pull-up, as can be seen in Figure 4, where the supply voltage is at 1.2 V. Although the expected maximum voltage is VDD - VT, this is based on the assumption of no current flow when the transistors are in the sub-threshold region of operation. Since the output loads are small, even the small sub-threshold current can continue to charge the next stage, raising the output above the expected value, provided sufficient time is given for the charging to occur. Thus, the circuit is functional, even with a supply voltage as low as 0.4 V.

Figure 2: Schematic of second possible configuration of conventional static CMOS full adder

Figure 3: Comparisons of the switching power consumption of the two possible configurations of conventional static CMOS full adder with ideal inputs and no load
[Plot: average switching power (W) vs. frequency of operation (Hz), logarithmic axes; series: Configuration 1, Configuration 2]

Figure 4: Plot of inverted sum and carry waveforms, showing weak pull-up outputs
[Plot: Sum and Carry waveforms of a CPL adder; voltage level (V) vs. time (s); series: Sum Inverse, Carry Inverse]

The weak pull-up output can pose a problem in circuits requiring a large noise margin, as the weakened logic 1 level becomes more susceptible to additive noise. The driving strength of the transistor is also weakened, since the pass transistors reduce the amount of current that is available to charge the next stage. In addition, if the voltage drop is too large, then the gate may hold the next stage in a state between the two logic levels, resulting in a short circuit current flow. This can be partially remedied by using a buffer to increase the maximum output value, at the expense of additional transistors and thus additional leakage current. The buffer can also be sized properly to reduce or eliminate the short circuit current resulting from weak outputs. Figure 5 shows the schematic of the CPL adder used, based on [9].

3.3.3 Domino Logic

A common issue with static logic is that output glitches can consume a lot of power, as the output can switch several times for a single set of input changes. To remedy this, dynamic logic families were created to guarantee a maximum number of output transitions per input change [9]. One example of this is found with dual rail domino logic. The logic features an NMOS stage, similar to that employed in the pull-down stage of a conventional static CMOS logic device. Surrounding this are two transistors connected to a main clock. When the clock is low, the top transistor is active, pre-charging the load. When the clock is high, the logic is allowed to pull down the output, if the inputs are at the correct value. Thus, each clock cycle has no more than 2 transitions, although there is always 1 transition to charge the output.

An example of a Domino adder can be seen in Figure 6. Simulations were run and showed that the adder was non-functional at lower frequencies (below 10 MHz). This is because the long data periods were sufficient to allow the output to discharge, even if the pull-down stage is inactive (see Figure 7). While this could be remedied by a larger output capacitance (in the form of wider transistors), this would cause the dynamic power to increase beyond the benefits caused by the glitch reduction.

3.3.4 Customized Logic

The final adder tested is that described in [12], which will be referred to as the 10T adder. Since it was observed that the two outputs of the adder can share some logic, the number of transistors that are needed to implement an adder can be reduced to 10. However, in simulations, it was observed that this resulted in weak pull-up and pull-down outputs (see Figure 9), and so the circuit was buffered. The schematic of the circuit tested can be seen in Figure 8.

Most notable about this circuit is the fact that it combines elements seen in conventional static CMOS and pass logic families. The circuit only needs 1 version of each input (either its true value or its complement), unlike the previous designs, which required both the value and its complement. This greatly reduces susceptibility to timing mismatches, which could cause glitches, and reduces the load on the previous stage.

Figure 5: Schematic of complementary pass logic CMOS full adder

Figure 6: Schematic of Domino adder

Figure 7: Output of the Domino adder, showing the carry output (top waveform) reverting back to a logic 1 about 5 us after the clock transition (lower waveform)

3.4 Unloaded Performance

To compare the performance of full adders, they were first tested with no loads and ideal input pulses (with a 1 ns rise and fall time). The inputs consisted of a series of 3 clocks, each with double the frequency of the last. This ensures that all possible input combinations are tested. In each case, measurements are done with the transistors kept at minimum size. Some of the effects of optimization of transistor sizes are discussed in a later section. In general, the benefits of transistor resizing do not tend to span many orders of magnitude, so the relative performance of each adder should still be similar. This precludes the need to optimize each transistor separately for each test case being used (with each scenario requiring a different set of optimizations).
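The reason three clocks with successively doubled frequencies exercise every input combination can be seen from a small sketch: sampled once per half-period of the fastest clock, the three signals simply count in binary. The discrete time step and the choice of which input receives which clock are assumptions made for illustration; the 1 ns edges used in the actual simulations are not modeled.

```python
def square_wave(period_steps, step):
    """Ideal square wave: low for the first half of each period, high for the second."""
    return int((step % period_steps) >= period_steps // 2)


def stimulus(steps=8):
    """Inputs (A, B, C) driven by clocks with periods of 8, 4 and 2 time steps."""
    return [
        (square_wave(8, t), square_wave(4, t), square_wave(2, t))
        for t in range(steps)
    ]


if __name__ == "__main__":
    for t, (a, b, c) in enumerate(stimulus()):
        print(f"t = {t}: A = {a}, B = {b}, C = {c}")
```

Over one period of the slowest clock the pattern visits each of the 8 rows of the truth table exactly once.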

The important measurements observed include the leakage power (the power drawn by the adder when no switching is taking place), the switching power (the average power needed to change the state of the device), as well as the rise/fall time and the maximum power. The latter measurement may be important in some applications, as the power supply must be able to supply this peak power if the operation of the device is to be maintained. Depending on the power scavenging mechanism and storage mechanism used, sharp spikes in power consumption may pose a problem. Note that the power-delay product, usually used as a method of evaluating the performance of a circuit, is ignored, since in general only the power dissipation is of concern, with long delays being tolerable to a certain extent.

Figure 10 shows the leakage power of the adders at different supply voltages. As can be seen, the conventional static CMOS adder performs the worst, with high power consumption. The CPL adder and the 10T have much lower leakage power, although they decrease at different rates. The 10T has a significantly lower power consumption, although with proper optimization the two may be made more comparable.

Figure 11 shows the rise/fall time of the adders at different supply voltages. Since each of the outputs has a different rise/fall time associated with it, only the worst one, which limits the performance of the entire circuit, is shown. As can be seen, the conventional static CMOS adder performs the worst, with very slow rise times, particularly at low voltages. Both the CPL adder and the 10T have much shorter rise/fall times, being able to operate in the MHz range even at low supply voltages. The exact value is of course subject to change with the addition of loads. Not shown here is the input to output delay, although it was observed that the delay and the rise/fall time track each other fairly consistently. Here, it can be seen that the CPL adder can operate at higher speeds than the 10T, due to the fact that the input signal passes through fewer pass stages in the CPL adder. Both adders could be made faster if the need to buffer the outputs was removed.

Figure 8: Schematic of buffered 10T adder

Figure 9: Plot of sum and carry waveforms for the unbuffered 10T adder showing weak pull-up and pull-down outputs

Figure 10: Plot of leakage power of the unloaded full adders
[Plot: Leakage Power vs. Supply Voltage; static power (W), logarithmic scale, vs. supply voltage (V); series: Conv. Static CMOS, CPL, 10T]

Figure 11: Plot of maximum rise and fall times of each adder.
[Plot: Maximum rise/fall time vs. Supply Voltage; maximum rise/fall time (s), logarithmic scale, vs. supply voltage (V); series: Compl. Static CMOS, CPL, 10T]

Figure 12 shows the maximum power consumption of the adders at different supply voltages. As can be seen, the conventional static CMOS adder performs the worst, with peak powers many orders of magnitude greater than the rest. The CPL adder and the 10T adder have very similar results.

The switching power consumption over a range of frequencies was also observed, and can be seen in Figures 13, 14, 15, and 16. As can be seen, the static CMOS adder consistently consumes 100 to 1000 times more power than the other two adders. The CPL and the 10T perform very similarly, with each adder outperforming the other in different circumstances.

As these results show, the optimal design for an adder will vary depending on the operating conditions (supply voltage, input frequency, etc.) and on which constraints are the most important.

3.5 Loaded Performance

To simulate the adders under more realistic conditions, the adders were tested with a load. This was done by constructing a 3-bit ripple carry adder, with inverters buffering the input signals, and with inverters at the output to simulate loads. The input signals were random binary signals, with random clock jitter and rise/fall time variations to model inexact signal arrivals. The result of this is to add glitches, which will affect the performance of the adder, since the output may change several times per input.
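For reference, the logical structure of the 3-bit ripple carry adder used as the loaded test case can be sketched behaviorally as below. This models only the Boolean function (each stage's carry output feeding the next stage's carry input); the input and output inverters, jitter, and glitching of the simulated circuit are not represented, and the helper names are illustrative.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder returning (sum, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(x_bits, y_bits, carry_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(x_bits, y_bits):
        sum_bit, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(sum_bit)
    return sum_bits, carry


def to_bits(value, width=3):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    bits, carry_out = ripple_carry_add(to_bits(5), to_bits(6))
    print("5 + 6 =", from_bits(bits) + (carry_out << 3))
```

Because the carry must propagate through all three stages in the worst case, late-arriving carries are one source of the glitches mentioned above.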

The results of the simulations can be seen in Figures 17, 18, and 19. As can be seen, the relative performance in the loaded simulation mirrors that of the unloaded simulations, as the conventional static CMOS adder performs much worse than either the CPL or the 10T adder, which have similar performances. It should be noted that the exact specifics change, depending on the scenario. For instance, the leakage power dominates power consumption in certain cases (particularly at low voltages and at low frequencies), while switching power dominates in others. Thus, when optimizing a circuit for power consumption, the operating conditions need to be taken into account to determine what to minimize.

Figure 12: Plot of maximum power of each adder
[Plot: Peak Power vs. Supply Voltage; peak power (W), logarithmic scale, vs. supply voltage (V); series: Compl. Static CMOS, CPL, 10T]

Figure 13: Plot of switching power for each adder at a supply voltage of 1.2 V
[Plot: Switching Power vs. Input Frequency at 1.2 V; power (W) vs. input frequency (Hz), logarithmic axes; series: Conv. Static CMOS, Compl. Pass Logic, 10T]

Figures 14, 15, and 16: [captions missing; three plots of power (W), logarithmic scale, over 1 kHz to 1 MHz; series: Conv. Static CMOS, Compl. Pass Logic, 10T]

Figure 17: Plot of results of a static CMOS adder using the loaded simulation at 50 kHz
[Plot: Power vs. Supply Voltage for Static CMOS Ripple Carry Adder; power (W), logarithmic scale, vs. supply voltage (V); series: Static Power, Switching Power at 10 kHz, Max Power]

Figure 18: Plot of results of a CPL adder using the loaded simulation at 50 kHz
[Plot: Power vs. Supply Voltage for CPL Ripple Carry Adder; power (W), logarithmic scale, vs. supply voltage (V); legible series: Static Power, Max Power]

3.6 Effect of changing transistor sizes

Figure 19: Plot of results of a 10T adder using the loaded simulation at 50 kHz
[Plot: Power vs. Supply Voltage for 10T Ripple Carry Adder; power (W), logarithmic scale, vs. supply voltage (V); legible series: Static Power, Max Power]

As can be seen in Table 2, changing the transistor sizes can change both the leakage and the switching power. These results were obtained for the loaded case.

Table 2: Effect of transistor sizes (in nm) on power consumption (in W) for a static CMOS adder

PMOS Width  PMOS Length  NMOS Width  NMOS Length  Leakage Power  Switching Power  Total Power  Max Power
160         120          160         120          6.35E-07       9.83E-07         1.62E-06     3.40E-04
240         120          240         120          9.31E-07       1.19E-06         2.12E-06     4.88E-04
240         180          240         180          7.80E-07       1.59E-06         2.37E-06     4.80E-04
160         180          160         180          5.45E-07       1.07E-06         1.61E-06     3.08E-04
160         120          160         180          6.19E-07       9.36E-07         1.56E-06     3.06E-04
160         180          160         120          5.39E-07       1.15E-06         1.69E-06     3.90E-04
160         200          160         200          5.53E-07       1.11E-06         1.67E-06     3.19E-04

As transistors are made longer, the leakage current decreases proportionally, but the switching power increases proportionally. It can be shown using elementary calculus that, when this is the case, the total power is minimized when the switching power and the leakage power are equal. If

$$P_{total} = K_1 L + \frac{K_2}{L} \tag{8}$$

then

$$\frac{dP_{total}}{dL} = K_1 - \frac{K_2}{L^2} \tag{9}$$

which is equal to 0 (thus minimizing $P_{total}$) when $L = \sqrt{K_2/K_1}$, thus making $K_1 L$ and $K_2/L$ equal.
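The result of equations (8) and (9) can be confirmed numerically. The sketch below uses hypothetical values for the constants K1 and K2 (they are not fitted to Table 2) and checks that the length given by the square root of K2/K1 both equalizes the two power terms and gives a lower total than nearby lengths.

```python
import math


def total_power(length, k1, k2):
    """Equation (8): switching term k1 * L plus leakage term k2 / L."""
    return k1 * length + k2 / length


def optimal_length(k1, k2):
    """Length at which the derivative in equation (9) is zero."""
    return math.sqrt(k2 / k1)


# Hypothetical constants for illustration only (length in arbitrary units).
K1 = 2.0e-9
K2 = 5.0e-5

if __name__ == "__main__":
    l_opt = optimal_length(K1, K2)
    print(f"Optimal length: {l_opt:.2f}")
    print(f"Switching term: {K1 * l_opt:.3e} W, leakage term: {K2 / l_opt:.3e} W")
    for scale in (0.8, 1.0, 1.25):
        print(f"L = {scale:.2f} * L_opt -> total = {total_power(scale * l_opt, K1, K2):.3e} W")
```

In practice neither term follows an exact proportionality, which is why the text that follows recommends refining the sizes using simulation results.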

From the results shown previously, this occurs with an increased transistor length of about 50%. As can be seen in the results, an increase in length by this amount reduces the total power consumption by 4%. Slightly more power can be saved by changing only the length of the PMOS. Thus, optimization using simulation results may be required to yield the best power. Making the transistors wider does not reduce overall power at all, since it increases both leakage and switching power. Wider transistors do result in lower delays, however, and so may be necessary in circuits with a low supply voltage where a few critical circuits need to operate at lower delays.

The length of the transistors also affects the performance of the CPL adder. As can be seen in Figure 20, increasing the length can improve the voltage drop of the pull-up stage slightly. The exact reason for this was not ascertained, but it is suspected that the slight changes to the threshold voltage caused by transistor size changes may be responsible. It should also be noted that the increase in size lowers the rise and fall time of the adder slightly.

The effect of transistor sizes on power consumption can be seen in Figure 21. This simulation was run using the loaded case at 50 kHz. Here, it can be seen that overall power consumption can be improved by making the transistors longer, as this results in a much lower leakage power. After a certain size, the switching power begins to dominate and the power increases slightly with further increases in transistor size. As can be seen in the figure, the optimal transistor size varies depending on the operating conditions, with a length of 0.250 um being the best with a supply voltage of 0.4 V, but with a length of 1 um being the best with a supply voltage of 1.2 V.

3.7 Layout and Extracted Simulations

Figure 20: Plot of CPL adder output waveforms with varied lengths showing different maximum voltages
[Plot: Inverse sum waveforms with different CPL transistor lengths; voltage (V) vs. time (s); series: L = 120 nm, L = 480 nm, L = 1500 nm]

Figure 21: Plot of CPL adder output waveforms with varied lengths showing different total power levels
[Plot: Total power at 50 kHz vs. transistor length with loaded CPL adder; power (W) vs. transistor length (nm), logarithmic axes; series: 1.2 V, 0.9 V, 0.6 V, 0.4 V]

Figure 22: Layout of static CMOS adder with minimum size transistors

The basic static CMOS full adder was given a layout, using a style similar to that of a standard cell, albeit with transistors stacked on top of each other. This version used minimum size transistors to minimize the area as much as possible, taking up an area 8 um long and 9 um tall, yielding a transistor density of 472000 transistors per square millimeter. As a result of making the design compact, the bottom 3 metal layers were needed for routing, making it more difficult to use this cell in a standard cell design, which normally uses the second and third layers for routing between cells. The layout can be seen in Figure 22.
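As a quick consistency check on the quoted figures, the cell area and density can be combined to recover the implied transistor count. This sketch assumes the density was computed simply as the transistor count divided by the cell's bounding-box area; the transistor count itself is not stated in the text.

```python
def implied_transistor_count(width_um, height_um, density_per_mm2):
    """Transistor count implied by a cell's dimensions and its quoted density."""
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)  # convert um to mm
    return density_per_mm2 * area_mm2


if __name__ == "__main__":
    count = implied_transistor_count(8, 9, 472000)
    print(f"Cell area: {8 * 9} um^2, implied transistor count: {count:.2f}")
```

Under that assumption, an 8 um by 9 um cell at 472000 transistors per square millimeter corresponds to roughly 34 transistors.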

The design, after passing DRC and LVS tests, was extracted and simulated. The simulation results can be seen in Figure 23. When compared to the results in Figure 17, it can be seen that the extracted results are several orders of magnitude lower than those of the schematic level simulations. No physical explanation can be given, as the added parasitics should yield a slightly higher power consumption. The most likely explanation is that the models used in simulating the extracted view transistors may not be completely accurate, as the results shown here are of the same magnitude as those seen in Figure 3.

3.8 Monte Carlo Simulations

To determine the effects that random process variations have on power consumption, the basic conventional static CMOS adder was tested under loaded conditions, with a supply voltage of 1.2 V and an input frequency of 50 kHz. Under these conditions, both static and dynamic power play a key role. The results can be seen in Figure 24. While these show only the variations in current, they are proportional to the power by a factor of 1.2 V.

Figure 23: Plot of results of extracted static CMOS adder using the loaded simulation at 50 kHz
[Plot: Power vs. Supply Voltage for Extracted Static CMOS Ripple Carry Adder; power (W), logarithmic scale, vs. supply voltage (V); legible series: Static Power, Max Power]

As can be seen, the total average current draw, which has a nominal value of 1.43 uA and a standard deviation of 0.27 uA, can vary by more than 30% of its nominal value, with an almost uniform distribution around the median, forcing the designer to give the entire circuit a large margin for power consumption if high yield is to be obtained. Similar results are observed for the leakage current draw, with a nominal value of 598 uA and a deviation of 230 uA; the maximum current, with a nominal value of 1420 uA and a deviation of 300 uA; and the deviation of the static current from state to state within the same run, with a nominal value of 62 uA and a deviation of 25 uA.
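The conversion from the reported average current statistics to power, and the size of the resulting design margin, can be sketched as follows. The nominal value, standard deviation and 1.2 V supply are taken from the text; the choice of a margin of three standard deviations is purely an assumption for illustration, and would need to be revisited given that the observed distribution is described as almost uniform rather than Gaussian.

```python
SUPPLY_VOLTAGE = 1.2       # V, from the Monte Carlo test conditions
NOMINAL_CURRENT = 1.43e-6  # A, reported nominal total average current
SIGMA_CURRENT = 0.27e-6    # A, reported standard deviation


def power_from_current(current, supply_voltage=SUPPLY_VOLTAGE):
    """Average power is the average supply current times the supply voltage."""
    return current * supply_voltage


def power_budget(margin_sigmas):
    """Power to budget for when allowing a margin of `margin_sigmas` deviations."""
    return power_from_current(NOMINAL_CURRENT + margin_sigmas * SIGMA_CURRENT)


RELATIVE_SIGMA = SIGMA_CURRENT / NOMINAL_CURRENT

if __name__ == "__main__":
    print(f"Nominal power: {power_from_current(NOMINAL_CURRENT):.3e} W")
    print(f"Relative standard deviation: {RELATIVE_SIGMA:.1%}")
    print(f"Budget with a 3-sigma margin (assumed): {power_budget(3):.3e} W")
```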

4 Conclusion and Discussion

As observed in the results shown in this paper, many important characteristics need to be considered in the design of digital circuits for low power and low frequency of operation. Most notable are the need to choose a supply voltage based on the frequency of operation, and the need to optimize for both leakage power and switching power. Since the conventional assumption that switching power always dominates power consumption no longer holds, all design issues, such as logic families and transistor sizes, must be taken into account to reduce both forms of power consumption. This must be done in consideration with all other issues, including output capacitance, rise/fall time, area, etc. Thus the design procedure becomes more complicated, and may require the use of automated tools for good optimization, although modifications to existing tools may be necessary, as typical design flows may underestimate leakage power.

4.1 Summary of observations

The comparisons between logic families show that the simple conventional static CMOS style has a high power consumption, even at low voltage levels, and has very high delays at lower voltage levels. The latter may be remedied slightly by using wider transistors, but this increases power dissipation. A far better method is to use different logic families, although customized circuits may perform better still. It should be noted that dynamic logic families are unable to operate at low frequencies, and so may not be appropriate for a low power design. Since the performance of each circuit is dependent on its operating conditions (supply voltage, input frequency, output load), it is impossible to choose a single best logic family. It must be chosen based on how the circuit is to be used.

As is evident from the results shown in this paper, the decision to focus on a conventional static CMOS adder was not appropriate for the scenarios discussed. However, in circumstances where simplicity of design or strong outputs are important, static CMOS may become a better design decision. In addition, certain circuits, such as NAND/NOR gates, are most effectively implemented using conventional static CMOS, rather than other families [9]. Thus, the exact nature of the circuit to be constructed is important to take into account when selecting a logic family.

The same can be said of the transistor sizes. Although low driving strength can be tolerated, the amount of tolerance for high rise/fall times is dependent on the circuit's clock frequency and the fan-out of the circuit. In addition, longer length transistors may be needed to reduce leakage power. Thus, there is no "best" transistor size for low voltage and low frequency of operation; it depends on how the device is to be used.

Figure 24: Plot of results from Monte Carlo simulations, showing average supply current (top left), maximum current (top right), average leakage current (bottom left), and standard deviation of the static current in a single run (bottom right)

The results also show the need to keep leakage power consumption low, as seen in the average power consumption of a simple 3-bit adder. As the average power of a static CMOS adder was in the low uW region, and a typical scavenged power voltage supply is only capable of producing power in the uW region [1], the circuit must be kept as simple as possible. The complexity constraint can be extended significantly by using low voltage logic and by switching to more efficient designs, but the power constraint will always impose significant design constraints such that all power dissipation types, both leakage and switching, must be considered.

4.2 Proposed design flow for design of sensor node controller circuitry

Based on the results obtained, the following design flow is proposed for the design of logic for scavenged power wireless sensor nodes.

The design algorithm must be chosen based on what purpose the node will serve. Thus, the required time resolution and accuracy, the number of nodes to be synchronized, as well as the synchronization algorithm will need to be chosen. Choosing tighter constraints will force the logic to become more complex, and may preclude the use of very low supply voltages, but may be required to reduce the power draw of the RF transmitters, or may be required due to the nature of the sensor node. Since these are typically more important in the design of a wireless sensor network, these issues must be given a higher priority than the design of the implementing logic.

The logic should be implemented in an HDL, and then synthesized using a fairly complete standard cell library. This breaks the implementing logic down into basic cells such as adders, multiplexers or simple gates, so that the circuits which draw the most power, either through a higher circuit complexity or a high frequency of operation, can be identified. These areas will become the focus of power reduction through custom optimization. Thus, it is important that the library used should contain complex modules: if the design were to be implemented using a simple library of NAND gates and inverters, power optimization would become very difficult. Ideally, the cell library used should be optimized for low power and low frequency of operation, and its blocks should use logic styles that are predicted to be optimal.
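The step of identifying the cells that draw the most power can be sketched as a simple ranking over a per-cell power report. The cell names, the power values and the 80% coverage threshold below are all hypothetical; no particular synthesis tool or report format is assumed.

```python
def rank_cells_by_power(cell_power):
    """Return (cell, total power) pairs sorted from highest to lowest power.

    cell_power maps a cell name to a (leakage_watts, switching_watts) tuple.
    """
    totals = {name: leak + sw for name, (leak, sw) in cell_power.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def optimization_targets(cell_power, coverage=0.8):
    """Pick the highest-power cells until they cover the given share of total power."""
    ranked = rank_cells_by_power(cell_power)
    grand_total = sum(power for _, power in ranked)
    targets = []
    running = 0.0
    for name, power in ranked:
        if running >= coverage * grand_total:
            break
        targets.append(name)
        running += power
    return targets


# Hypothetical per-cell report: (leakage power, switching power) in watts.
report = {
    "adder8": (40e-9, 120e-9),
    "mux4": (10e-9, 15e-9),
    "counter": (30e-9, 60e-9),
    "nand2": (2e-9, 3e-9),
}

for name, power in rank_cells_by_power(report):
    print(f"{name:>8}: {power * 1e9:.0f} nW")
print("custom optimization targets:", optimization_targets(report))
```

Summing leakage and switching power per cell reflects the point made earlier that neither term can be ignored; a cell with modest switching activity may still rank highly because of leakage.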

A main clock frequency and supply voltage (or multiples thereof) should be chosen, and global power optimization schemes should be picked. For example, any logic that is not required for 100% of the circuit operation should be shut down when not in use, assuming that doing so would be power efficient. The clock frequency should be chosen such that sufficient time is given for the most complex nodes to rise and fall, but should also meet timing resolution requirements. Since choosing different clock frequencies places different constraints on the logic, the synthesis stage may have to be repeated several times so that the logic can be optimized based on the different timing constraints. The supply voltage will need to be chosen based on the delay requirements and the clock frequency. Ideally, the lowest voltage possible should be picked that allows the circuit to function. However, other issues, such as the availability of multiple voltage power supplies and noise margin requirements, need to be considered.
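The supply voltage selection described above can be sketched as a search over a delay characterization table. The delay values, the timing margin and the voltage floor are hypothetical placeholders for data that would come from simulation; they are not results from this paper.

```python
def lowest_usable_vdd(delay_by_vdd, clock_hz, timing_margin=0.5, vdd_floor=0.0):
    """Return the lowest supply voltage that meets the timing budget, or None.

    delay_by_vdd  maps supply voltage (V) to critical path delay (s).
    timing_margin is the fraction of the clock period the delay may occupy.
    vdd_floor     is a lower bound on voltage, e.g. from noise margin requirements.
    """
    budget = timing_margin / clock_hz
    candidates = [
        vdd
        for vdd, delay in delay_by_vdd.items()
        if delay <= budget and vdd >= vdd_floor
    ]
    return min(candidates) if candidates else None


# Hypothetical characterization data: supply voltage (V) -> critical path delay (s).
delays = {0.3: 40e-6, 0.4: 6e-6, 0.6: 400e-9, 1.2: 20e-9}

print(lowest_usable_vdd(delays, clock_hz=50e3))                 # timing only
print(lowest_usable_vdd(delays, clock_hz=50e3, vdd_floor=0.5))  # with a noise margin floor
print(lowest_usable_vdd(delays, clock_hz=10e6))                 # faster clock
```

Since changing the clock frequency changes the timing budget, this search would be repeated for each candidate frequency, mirroring the repeated synthesis passes mentioned above.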

The logic of performance critical cells should then be optimized carefully. This should be done by taking into account the driving strength of the previous stage, the capacitive load on the next stage, and the supply voltage and clock frequency being used. This optimization will involve the selection of logic styles and transistor sizes to reduce total power consumption while staying within the design constraints. Since this will change the constraints on the next and previous stages, recursive optimization may be required, which should be done using automated tools, should they be available.

Finally, layout should be done to reduce area, while minimizing parasitic capacitance as much as possible. Since biomedical sensor nodes may be required to operate in environments where minimum area is crucial (such as implantable sensors) [2], area may be a high priority design constraint. But since parasitic capacitances may increase switching power, care must be taken not to degrade performance too much by layout design choices.

Throughout the design procedure, simulations of each cell and of the entire system should be performed to ensure that the power of the entire design falls within the required constraints. Since these simulation results may be dependent on models with limited accuracy, care must be taken to ensure that the simulation results make sense. Actual manufacturing of the chip is the only certain method to ensure the circuit is designed correctly.


References

[1] J. A. Paradiso and T. Starner. Energy scavenging for mobile and wireless electronics. IEEE Pervasive Computing, 4(1), 2005.

[2] K. A. Townsend, J. W. Haslett, T. K. K. Tsang, M. N. El-Gamal, and K. Iniewski. Recent advances and future trends in low power wireless systems for medical applications. Proceedings from the 5th International Workshop on System-on-Chip for Real-Time Applications, 2005.

[3] Kuan-Yu Lin, T. K. K. Tsang, M. Sawan, and M. N. El-Gamal. Radio-triggered solar and RF power scavenging and management for ultra low power wireless medical applications. Proceedings from the IEEE International Symposium on Circuits and Systems, 2006.

[4] Peter H. R. Popplewell, Victor Karam, Atif Shamim, John Rogers, and Calvin Plett. An injection-locked 5.2 GHz SoC transceiver with on-chip antenna for self-powered RFID and medical sensor applications. IEEE Symposium on VLSI Circuits, pages 88–89, June 2007.

[5] Liming He and Geng-Sheng Kuo. A novel time synchronization scheme in wireless sensor networks. IEEE 63rd Vehicular Technology Conference, 2:568–572, 2006.

[6] F. Zhang and G. Y. Deng. Probabilistic time synchronization in wireless sensor networks. Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, 2:980–984, 2005.

[7] Anantha P. Chandrakasan and Robert W. Brodersen. Low Power Digital CMOS Design. Kluwer Academic Publishers, 1995.

[8] Noureddine Chabini and Wayne Wolf. Synchronous sequential digital designs using retiming and supply voltage scaling. IEEE Transactions on VLSI, 13(10):1113–1126, October 2005.

[9] Dimitrios Soudris, Christian Piguet, and Costas Goutis, editors. Designing CMOS Circuits for Low Power. Kluwer Academic Publishers, 2002.

[10] Hung Tien Bui, Yuke Wang, and Yingtao Jiang. Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 49(1):25–30, January 2002.

[11] Yingtao Jiang, A. Al-Sheraidah, Yuke Wang, E. She, and Jin-Gyun Chung. A novel multiplexer-based low-power adder. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(7):345–348, July 2004.

[12] Jin-Fa Lin, Ming-Hwa Sheu, and Yin-Tsung Hwang. Low-power and low complexity full adder design for wireless base band application. Proceedings from the International Conference on Communications, Circuits and Systems, 4:2337–2341, June 2006.

[13] H. Soeleman, K. Roy, and B. Paul. Robust ultra-low power sub-threshold DTMOS logic. Proceedings of the International Symposium on Low Power Electronics and Design, pages 377–380, September 2000.

[14] S. Badel and Y. Leblebici. Breaking the power-delay tradeoff: Design of low-power high-speed MOS current-mode logic circuits operating with reduced supply voltage. IEEE International Symposium on Circuits and Systems, pages 1871–1874, May 2007.
