
VLSI DESIGN, 2001, Vol. 12, No. 2, pp. 125-138
Reprints available directly from the publisher. Photocopying permitted by license only.

(C) 2001 OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint.

Power Optimization of Delay Constrained Circuits

ANSHUMAN NAYAK (a,*), MALAY HALDAR (a), PRITH BANERJEE (b), CHUNHONG CHEN (c) and MAJID SARRAFZADEH (c)

(a) #L458, (b) #L463, (c) #L469, Technological Institute, 2145 Sheridan Road, Evanston, IL 60208

(Received 20 June 2000; In final form 3 August 2000)

We present a framework for combining Voltage Scaling (VS) and Gate Sizing (GS) techniques for power optimization. We introduce a fast heuristic for choosing gates for sizing and voltage scaling such that the total power is minimized under delay constraints. We also use a more accurate estimate of the power dissipation of the circuit by taking into account the short circuit power along with the dynamic power. A better model of the short circuit power is used which takes into account the load capacitance of the gates. Our results show that the combination of VS and GS performs better than either technique applied in isolation. An average power reduction of 73% is obtained when decisions are taken assuming dynamic power only. In contrast, the average power reduction is 77% when decisions include the short circuit power dissipation.

Keywords: Voltage scaling; Gate sizing; Low power; Digital signal processors; Short circuit power

1. INTRODUCTION

Advances in semiconductor technologies have led to chips with millions of transistors. As circuit density and speed increase, power dissipation has become one of the critical parameters in circuit design. The expanding and converging fields of computing and digital communications are creating new demands for high performance and programmable signal processing engines. Enhancing the performance capabilities of today's DSP systems implies a higher power consumption. Since the fastest growing area in the computing industry is the provision of high throughput DSP systems in a portable form, the operating time that the battery provides to these systems becomes a major design issue. Hence, a lot of research has been done on power reduction at various levels of design abstraction (such as system, architectural, logic and layout levels) [1], especially for portable DSP applications.

*Corresponding author. Tel.: (847) 467-4610, Fax: (847) 491-4455, e-mail: [email protected]
Tel.: (847) 467-4610, e-mail: [email protected]
Tel.: (847) 491-3641, e-mail: [email protected]
Tel.: (847) 491-7378, e-mail: [email protected]
Tel.: (847) 491-7378, e-mail: [email protected]


The average dynamic power consumed by a CMOS circuit is given by [1]

$$P_{avg} = 0.5 \, V_{dd}^{2} \, f \sum_{v} C(v) E(v) \tag{1}$$

where $f$ is the clock frequency, $V_{dd}$ the supply voltage, $C(v)$ the load capacitance of gate $v$, and $E(v)$ the switching activity at the output of gate $v$. Because the charging and discharging of capacitance is the most significant source of power dissipation in CMOS circuits, previous work optimizes power by considering three factors in a circuit: supply voltage, load capacitance and switching activity. However, most of these approaches deal with one factor at a time. In this work, we are interested in power optimization by reducing both the supply voltage and the load capacitance.
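As a rough illustration of Eq. (1), the following Python sketch sums the per-gate contributions and compares two supply voltages. The gate capacitances and switching activities are hypothetical values chosen only for the example; the 5.0 V and 3.5 V supplies and the 100 MHz clock are the values reported in the experimental section of this paper.

```python
def average_dynamic_power(vdd, f, gates):
    """Eq. (1): 0.5 * Vdd^2 * f * sum over gates of C(v) * E(v).

    gates is a list of (load capacitance in farads, switching activity) pairs.
    """
    return 0.5 * vdd ** 2 * f * sum(c * e for c, e in gates)


# Hypothetical gates: (C(v) in farads, E(v) in transitions per cycle).
gates = [(10e-15, 0.5), (25e-15, 0.2), (5e-15, 0.8)]

p_high = average_dynamic_power(5.0, 100e6, gates)  # all gates at 5.0 V
p_low = average_dynamic_power(3.5, 100e6, gates)   # all gates at 3.5 V

# Fractional saving; only the voltage ratio matters, the rest cancels.
reduction = 1.0 - p_low / p_high
print(f"dynamic power reduction: {reduction:.2%}")
```

Because the supply voltage enters quadratically and every other factor cancels, lowering all gates from 5.0 V to 3.5 V cuts the dynamic power by 1 - (3.5/5.0)^2 = 51%, whatever capacitance values are assumed.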

Since the dynamic power consumption is quadratically related to supply voltage, reducing supply voltage (or voltage-scaling) promises to be an effective technique for power saving. The basic problem with Voltage Scaling (VS) is the increased circuit delay, since the relation between delay ($t_d$) and supply voltage ($V_{dd}$) is given by [1]

$$t_d = \frac{C \times V_{dd}}{K \times (V_{dd} - V_T)^{2}} \tag{2}$$

where $C$ is the load capacitance, $V_T$ the threshold voltage, and $K$ a constant. If $V_{dd}$ is much greater than $V_T$, then the delay is almost inversely proportional to supply voltage. For supply voltage near the threshold voltage, however, the $V_T$ term causes the delay to increase rapidly. Another major overhead in using different supply voltages in a circuit is the additional level converters required at the interface and in the layout design. For this reason, it is advisable to restrict oneself to a dual-voltage approach where two supply voltages are available for power optimization.

Another technique for reducing power at the logic or transistor level is Gate Sizing (GS), which targets power optimization by reducing the load capacitance. Since the intrinsic resistance of the gate is inversely proportional to the size of the gate, down sizing a gate increases its delay. Gate sizing is well known to be a useful tool for reducing circuit delays in CMOS integrated circuits. Several methods have been proposed as solutions when the problem is posed as an area-delay tradeoff, such as in the work in [9-11].

From a general point of view, reducing either the supply voltage or the physical size of a gate, at logic level, leads to a gate delay increase which implies decreased slack time. In this sense, VS and GS can be effective for delay-constrained optimization only if the given circuit has significant timing slack available in some or all of its constituent gates. Because of the discrete nature of supply voltages or gate sizes, VS or GS alone tends to leave more slack unutilized [20], preventing effective power reduction. Further, slack used up by one technique could have been used by the other technique to give higher power reduction. This fact motivates us to opt for a combined VS and GS algorithm. We propose a fast heuristic for GS and VS which identifies the maximum number of gates for gate sizing or voltage scaling under the delay constraints so that the total power dissipation of the circuit is minimized.

Previous approaches have also attempted to minimize the total power using simultaneous voltage scaling and gate sizing [12]. But these approaches consider the dynamic power dissipation only, and neglect the role of the short-circuit power. However, this is not a valid assumption, as short-circuit power can account for up to 20% of the total power. Minimizing a power function that considers only the dynamic power, without any constraints on delay, would imply that all transistors must necessarily be minimum sized. However, a minimum-sized circuit does not necessarily correspond to a minimum power circuit, the effect being more pronounced when large loads are driven. Further, down sizing a gate might increase the short-circuit power of the fanout gates, which could be high enough to offset the decrease in the dynamic power. Most of the traditional models for short-circuit power neglect the effect of the load capacitance and are therefore inaccurate. In this work, we use a more accurate estimate for short-circuit power and minimize the total dynamic and short circuit power using a combined VS and GS technique. We also propose a fast algorithm which identifies more nodes for sizing or for voltage scaling.

Our optimization problem may be described as:

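$$\text{minimize} \quad Power(W, V) \tag{3}$$

$$\text{subject to} \quad Delay(W, V) \le T_{spec} \tag{4}$$

$$V_i = V_{high} \text{ or } V_{low}, \quad \forall \text{ gate } i \tag{5}$$

$$Maxsize(i) \ge w_i \ge Minsize(i) \tag{6}$$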

where both Power and Delay are functions of gate sizes ($W$) and supply voltages ($V$), $T_{spec}$ is the timing constraint, $V_{high}$ and $V_{low}$ are the two supply voltages, $V_i$ and $w_i$ are the supply voltage and size of gate $i$, respectively, and Minsize(i) and Maxsize(i) are given by the gate library. This is a delay-constrained power-minimization problem. In [16], a method which makes use of transistor reordering was described to address a similar problem. Since transistor reordering is simply intended to reduce the average number of transitions at internal nodes of gates, the resulting power reduction is very limited. In this work, we provide new cost models for delay and power with voltage scaling and gate sizing. Algorithms for VS alone, GS alone and combined VS and GS are proposed to optimize power. Experiments show that the combined VS and GS obtains the maximum power improvement.

For our work, we assume that switching activity is a constant for each node and is independent of gate delays. Switching activity is the measure of signal transitions per clock cycle. Switching activity at all nodes inside a circuit not only depends strongly on the topological structure and input patterns of the circuit, but may also vary with gate delay, which introduces glitching transitions. Therefore, the zero-delay model provides a lower bound on the activity. Under a general delay model, updating activities iteratively is computationally prohibitive. Fortunately, VS and GS do not change the circuit topology, and both tend to balance paths by reducing the slacks. This helps eliminate glitching to some extent. Intuitively, for the purpose of power reduction, the nodes with high switching activity are good candidates to work at the low supply voltage by VS (or to work with a small load capacitance by GS).

The remainder of the paper is organized as follows. Section 2 discusses delay and power modeling with both VS and GS. Sections 3 and 4 discuss the VS and GS problems in detail. In Section 5, we discuss an algorithm for combined VS and GS for power optimization. Finally, experimental results are described in Section 6.

2. TIMING AND POWER MODELS

Because of the nature of the problem shown in Eqs. (3)-(6), the general idea behind GS (or VS) is to iteratively select a set of gates to down size (or to reduce their supply voltages), so that the total power reduction is maximized and the timing constraints are met. Thus, a reasonably accurate timing/power model is required to estimate the delay and power consumption of a gate under a specific supply voltage and physical size. In this section we discuss the timing model, followed by the dynamic and the short-circuit power models that we use.

2.1. Timing Model

In most standard-cell libraries, the gate delay is defined as

$$d_i = \tau_i + c_i \frac{C_i^{load}}{w_i} \tag{7}$$

where $\tau_i$ is the intrinsic delay, $w_i$ and $C_i^{load}$ are the size and load capacitance of gate $i$ respectively, and $c_i$ is a constant. The load drive capability of gate $i$ increases with $w_i$. The internal capacitance of gate $i$, however, varies almost linearly with $w_i$. These together keep $\tau_i$ almost independent of $w_i$. $C_i^{load}$ is determined by the size of the fanout gates and wiring capacitances, i.e.,

$$C_i^{load} = c \sum_{j \in FO(i)} w_j + C_i^{wire} \tag{8}$$

where $FO(i)$ is the set of fanouts of gate $i$, $C_i^{wire}$ is the wiring capacitance, and $c$ is a constant. When ignoring the wiring capacitance, (7) can be written as

$$d_i = \tau_i + k_i \sum_{j \in FO(i)} w_j / w_i \tag{9}$$

where $k_i = c \cdot c_i$. Basically, (9) indicates that a larger gate is required for delay reduction if it drives more fanouts. Furthermore, it has been shown in [13] that the gate delay at supply voltage $V_{dd}$ is approximately proportional to $k V_{dd} / (V_{dd} - V_t)^2$, where $V_t$ is the threshold voltage, and $k$ is a constant. Assuming $d_i$ in (9) is the delay at $V_{high}$, the gate delay with size $w_i$ and supply voltage $V_i$ is given by

$$d_i(w_i, V_i) = \Big(\tau_i + k_i \sum_{j \in FO(i)} w_j / w_i\Big) \, \alpha_i \tag{10}$$

$$\text{where} \quad \alpha_i = \frac{V_i}{(V_i - V_t)^2} \cdot \frac{(V_{high} - V_t)^2}{V_{high}} \tag{11}$$

For the purpose of VS, $V_i$ can be either $V_{high}$ or $V_{low}$. From (10), reducing the supply voltage results in increased delay of the gate, while reducing the gate size does not always degrade the delay. The reason is that the loading and, hence, the delay of its fanins decreases with the reduced size of this gate.

2.2. Dynamic Power Dissipation

The dynamic power dissipated in a circuit corresponds to the power dissipated in charging and discharging capacitances in the circuit. The magnitude of this power for a gate $i$ driving a load capacitance $C_i^{load}$, with internal capacitance $C_i^{int} = c \cdot w_i$, operating under a clock frequency $f$ and having a probability $p_T$ of switching is given by

$$P_{dynamic} = 0.5 \, (C_{load} + C_{int}) \, V_{dd}^{2} \, f \, p_T \tag{12}$$

where $V_{dd}$ is the supply voltage. It can be seen that reducing the size of gate $i$ reduces the power consumption of both gate $i$ itself and its fanin gates.

2.3. Short Circuit Power Dissipation

Most transistor sizing methods have considered only the dynamic power dissipation. Recently, a few methods have also considered short circuit power using the formula

$$P_{sc} = \frac{\beta}{12} (V_{dd} - 2V_T)^{3} \, \tau \, f \, p_T \tag{13}$$

where $\beta$ is the MOS transistor gain factor, $\tau$ is the transition time of the input transition, and $f$ and $p_T$ are as defined earlier.

Equation (13) is inaccurate since it does not model the effect of the load capacitance on the short circuit power. The short circuit power dissipated by an inverter depends on the following parameters: the size of the n-transistor, $w_n$; the size of the p-transistor, $w_p$; the input rise time, $\tau$; and the output load capacitance, $C_L$.

A more appropriate model for short-circuit power dissipation has been proposed [14] to be:

$$P_{sc} \propto w_n^{0.75} \, w_p^{0.82} \, C_{load}^{-0.085} \, \tau^{1.49} \tag{14}$$

Assuming that $w_p = 2 \cdot w_n$, a modified model would be:

$$P_{sc} \propto w_i^{1.57} \, C_{load}^{-0.085} \, \tau^{1.49} \tag{15}$$

where $w_i$ is the width of gate $i$. The input transition time is modeled as:

$$\tau_i \propto R_i C_i \tag{16}$$

$$R_i = K \cdot 1/w_i \tag{17}$$

$$C_i = K_1 w_i + K_2 \tag{18}$$

where $R_i$ and $C_i$ are the drain resistance and capacitance of gate $i$ respectively and $K$, $K_1$ and $K_2$ are the constants of proportionality. The constants were evaluated assuming a 0.18 micron technology and a unit-sized gate's input capacitance equal to 0.097 fF and output resistance equal to 23.8 kΩ [15].
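The timing and power models of this section can be collected into a few small functions. The Python sketch below is only an illustration under stated assumptions: the threshold voltage `V_T` and the fixed capacitance `K2` are hypothetical values, mapping `K` and `K1` directly to the quoted unit-gate resistance (23.8 kΩ) and input capacitance (0.097 fF) is our own simplification, and the short-circuit expression of Eq. (15) is evaluated only up to an unknown constant factor (`scale`).

```python
def alpha(v, v_high, v_t):
    """Eq. (11): delay scaling factor of a gate at supply v, relative to v_high."""
    return (v / (v - v_t) ** 2) * ((v_high - v_t) ** 2 / v_high)


def gate_delay(tau, k, w, fanout_sizes, v, v_high, v_t):
    """Eq. (10): (tau + k * sum(w_j) / w) * alpha, wiring capacitance ignored."""
    return (tau + k * sum(fanout_sizes) / w) * alpha(v, v_high, v_t)


def dynamic_power(c_load, c_int, vdd, f, p_t):
    """Eq. (12): 0.5 * (C_load + C_int) * Vdd^2 * f * p_T."""
    return 0.5 * (c_load + c_int) * vdd ** 2 * f * p_t


def transition_time(w, K, K1, K2):
    """Eqs. (16)-(18) with unit proportionality: R = K / w, C = K1 * w + K2."""
    return (K / w) * (K1 * w + K2)


def short_circuit_power(w, c_load, tau, scale=1.0):
    """Eq. (15) up to a constant: w^1.57 * C_load^-0.085 * tau^1.49."""
    return scale * w ** 1.57 * c_load ** -0.085 * tau ** 1.49


V_HIGH, V_LOW = 5.0, 3.5  # supply voltages used in the experiments
V_T = 0.8                 # hypothetical threshold voltage

# Delay penalty of moving one gate from V_HIGH to V_LOW.
slowdown = alpha(V_LOW, V_HIGH, V_T)

# Transition time of a unit-sized gate versus a gate of twice the size.
K, K1, K2 = 23.8e3, 0.097e-15, 0.2e-15  # K2 is hypothetical
tau_unit = transition_time(1.0, K, K1, K2)
tau_double = transition_time(2.0, K, K1, K2)
print(slowdown, tau_unit, tau_double)
```

With these assumptions a gate at the low supply is slower than at the high supply (`slowdown` is greater than 1), and the smaller gate produces the longer transition time, which is the mechanism by which down sizing raises the short-circuit power of the gates it drives.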

3. VOLTAGE SCALING

Reducing the supply voltage, or voltage scaling (VS), promises to be an effective low-power technique since the dynamic power consumption is quadratically related to the supply voltage [2-8, 17]. While reducing the supply voltage of a whole circuit suffers from circuit speed loss, a low voltage applied only to non-critical paths of the circuit does not necessarily lead to performance degradation. The major overhead in using different supply voltages at different parts of a circuit is that level converters are required to eliminate the static current at their interface [4, 18]. However, the level converters introduce an additional power penalty. To avoid too many level converters, it is reasonable to use a dual-voltage approach in which only two supply voltages are available for the optimized circuits.

The typical dual-voltage approach is the Cluster Voltage Scaling (CVS) scheme [4]. Its basic idea is to use depth-first search from the primary outputs to find gates which may operate at a low supply voltage without violating the timing constraints of the circuit. A gate is not allowed to operate at a low voltage until all its transitive fanouts have been selected to do so. This, to a large extent, limits the effectiveness of the algorithm, since a gate with small slack does not imply that the slacks of all its transitive fanins are also small. A linear programming approach was also proposed [18] to address the dual-voltage problem. However, it is based on delay-balanced configurations whose generation is computationally very expensive. In [6, 19], a Two-Voltage Power-Optimization (TVPO) algorithm is proposed to reduce power by translating the power optimization problem into the Maximal-Weighted-Independent-Set (MWIS) problem and allowing as many gates as possible to work at $V_{low}$. The number of level converters at the boundary of high-voltage and low-voltage gates is reduced using the "constrained" Fiduccia-Mattheyses (F-M) algorithm [21]. Section 5 discusses the limitations of the MWIS approach, which has a high execution time due to slow convergence of the algorithm. We propose a path based heuristic which is faster than the MWIS approach. The number of nodes operating at a lower voltage is limited by the slack of the circuit.

4. GATE SIZING

Gate sizing consists of choosing, for each node of a technology mapped network, a gate implementation in the library so that the total power of the circuit is minimized without affecting the overall delay of the network, i.e., under some delay constraints. This is possible because gates on the non-critical paths of the network have a lot of slack, so they can be down sized to save power without violating the timing constraints. Figure 1 shows the effect of down sizing gate G on the total power of the circuit. On down sizing gate G, the input capacitance of gate G decreases. Hence, the load capacitance of the gates which are the fanins of gate G, i.e., gate G1, decreases. According to Eq. (12), this results in a decrease in the dynamic power of gate G1. As a consequence of down sizing gate G, the transition time of the signal at the output of gate G increases. This affects the gates which are the fanouts of gate G, as the time for which both the n and the p transistors are ON is increased. This results in an increase in the short-circuit power dissipated by the fanout nodes. Hence, if the number of fanouts is very high, then the total increase in short-circuit power dissipation may offset the decrease in dynamic power dissipation, resulting in an increase in the total power even though we have down sized gate G.

FIGURE 1 Effect of gate sizing on dynamic power and short circuit power. (Diagram labels: Gate Downsized, Dynamic Power Decreases, Transition Time, Short-circuit.)

Figure 2 shows the need for carefully choosing the gates for down sizing. If gate G is chosen for down sizing, then the corresponding decrease in slack of this gate will reduce the slack of its fanout gates, which could otherwise have been down sized. In contrast, if both the fanout gates G1 and G2 were down sized instead, we would obtain a greater reduction in power. Hence, gates which are part of fewer paths are better candidates for down sizing than gates which are part of a large number of paths. Again, since both dynamic and short-circuit power are directly proportional to switching activity, gates with a high switching activity should be down sized earlier. Section 5 describes an algorithm for combined voltage scaling and gate sizing.

FIGURE 2 Gates which are part of fewer paths should be down sized. (Figure annotations: three labels reading Slack 5.)

5. COMBINATION OF VOLTAGESCALING AND GATE SIZING

Since both VS and GS decrease the available slack in the circuit, it is better to apply the two techniques simultaneously rather than one after the other. In [12], a technique for power reduction by simultaneous VS and GS using a maximum weighted independent set (MWIS) approach has been proposed. Formulating the power optimization problem as a maximum weighted independent set of the sensitive transitive closure of the graph exposes several opportunities to reduce power. However, the time complexity of the algorithm is quite high. The algorithm attempts to reduce power dissipation by finding a set of nodes for which delay can be traded for power. The selected nodes are usually sized down or operated at a lower $V_{dd}$. This results in a lower power dissipation and increased delay for the node. To ensure that the increase in the delay of the nodes does not violate any critical path timing constraints, the delay at any step is increased by at most $\min\{\min_{v \in Q_m}(\Delta d(v)),\ S_{max} - S_{max-1}\}$. $S_{max}$ is the maximum slack available for any node in the graph and $S_{max-1}$ is the second largest slack available; $\min_{v \in Q_m}(\Delta d(v))$ is the minimum change in delay feasible among all the nodes of the graph. Only the nodes with the maximum slack are considered for a delay increase in each iteration. In a graph $G(V, E)$ where each node has a different slack, the number of iterations may be $O(V)$, as in each iteration the maximum slack is reduced to the next highest value. As each iteration does a transitive closure computation, the total time complexity may run up to $O(V^4)$. Furthermore, due to the discrete nature of the voltage scaling and gate sizing techniques, the possible delay increase may not equal $\epsilon$ exactly, where $\epsilon = \min\{\min_{v \in Q_m}(\Delta d(v)),\ S_{max} - S_{max-1}\}$. This pushes the number of iterations higher, increasing the complexity even beyond $O(V^4)$.
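The per-iteration bound used by the MWIS approach can be made concrete with a small Python sketch. It is only our reading of the expression above, not code from [12]: we take $Q_m$ to be the set of nodes that currently hold the maximum slack, and the node names, slacks and feasible delay changes are hypothetical.

```python
def step_bound(slacks, delta_d):
    """Largest delay increase allowed in one MWIS iteration.

    slacks  : dict node -> current slack
    delta_d : dict node -> smallest feasible delay change for that node
    Returns min( min over Q_m of delta_d, S_max - S_max-1 ), where Q_m is
    assumed to be the set of nodes holding the maximum slack.
    """
    distinct = sorted(set(slacks.values()), reverse=True)
    s_max = distinct[0]
    # If every node has the same slack there is no second value; use 0.
    s_next = distinct[1] if len(distinct) > 1 else 0.0
    q_m = [v for v, s in slacks.items() if s == s_max]
    return min(min(delta_d[v] for v in q_m), s_max - s_next)


# Hypothetical example: node a holds the maximum slack.
slacks = {"a": 5.0, "b": 3.0, "c": 3.0}
delta_d = {"a": 4.0, "b": 1.0, "c": 1.0}
eps = step_bound(slacks, delta_d)
print(eps)
```

In this example the gap to the second largest slack (2.0) is the binding term, so a single iteration only lowers the maximum slack to the next value. With many distinct slack values this is why the number of iterations can grow with the number of nodes.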

5.1. A Fast Heuristic

The principal reason behind the success of the MWIS based approach is that the algorithm is able to choose the maximum number of nodes to trade delay for power given the slacks along the paths. For example, consider Figure 3. The MWIS algorithm obtains the optimal solution because it selects the nodes $V_1$, $V_2$, $V_3$, $V_4$ over the nodes $V_5$, $V_6$ or $V_7$ to introduce delay. Our heuristic is guided by the same principle. The heuristic is based on the number of paths that pass through a node from any primary input to any primary output. The intuition is that if the number of paths that pass through a node is large, then introducing a delay at that node uses up the slack of a large number of nodes that lie on those paths. On the other hand, introducing delay at a node which has a small number of paths passing through it will affect the slacks of a small number of other nodes.

FIGURE 3 An example showing that our path based heuristic gives the optimum result. (Nodes $V_1$ to $V_7$; each node is annotated with a triple such as 0/3/3 and with its path count in parentheses.)

Returning to the example of Figure 3, the number of paths that pass through each node is shown in parentheses. For simplicity, the delay of each node is assumed to be 1. If we take into account the number of paths that pass through each node in selecting the nodes at which to introduce delay, giving more priority to nodes that have fewer paths passing through them, then we arrive at the same solution as the MWIS algorithm. Thus we use the number of paths that pass through each node in deciding where to introduce delay. Further, since the power dissipated at a node is directly proportional to the switching activity at the node, nodes with a high switching activity should be gate sized or voltage scaled first. This guides us to the following weight function for each node.

$$Weight(i) = \frac{Slack(i)^{\alpha} \cdot p_T(i)^{\beta}}{(\text{No. of Paths}(i))^{\gamma}} \tag{19}$$

where $p_T$ is the switching probability and $\alpha$, $\beta$, $\gamma$ were assumed to be 1. The weight function assigns a larger weight to gates which have larger slack, as these gates can be sized or voltage scaled by a large factor, giving more reduction in power. Also, gates with high switching activity are given a larger weight, as power reduction is directly proportional to the switching activity of the gates. Our path based heuristic assigns a lower weight to gates having a large number of paths passing through them, so that changing the slack of an individual gate does not reduce the slack of a large number of gates. The parameters $\alpha$, $\beta$, $\gamma$ were chosen to be 1 so that the effect of slack, switching activity and number of paths on the total power reduction could be studied. These parameters could be changed to obtain better solutions.
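The ranking induced by the weight function can be sketched in a few lines of Python. The functional form used here (slack and switching probability in the numerator, path count in the denominator, each raised to its own exponent) is our reading of Eq. (19) from the description above, and the three gates and their values are hypothetical.

```python
def node_weight(slack, p_t, num_paths, a=1.0, b=1.0, g=1.0):
    """Weight of a node: slack^a * p_T^b / (number of paths)^g.

    Larger slack and higher switching probability raise the weight;
    more paths through the node lower it. Exponents default to 1.
    """
    return (slack ** a) * (p_t ** b) / (num_paths ** g)


# Hypothetical gates: (slack, switching probability, paths through the gate).
nodes = {
    "g1": (4.0, 0.5, 2),
    "g2": (4.0, 0.5, 8),
    "g3": (1.0, 0.9, 2),
}

# Candidates for VS or GS, highest weight first.
ranked = sorted(nodes, key=lambda n: node_weight(*nodes[n]), reverse=True)
print(ranked)
```

Gates g1 and g2 have the same slack and activity, but g2 lies on four times as many paths and therefore drops to the end of the ranking, behind the low-slack but lightly shared g3.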


The heuristic is described next. Afterwards we describe the algorithm to calculate the number of paths that go through a node. Note that computing the number of paths going through a node is efficient. Moreover, as it is a property of the graph that does not change with the delays of the nodes, we need to calculate it only once, as opposed to the MWIS approach where the MWIS has to be recalculated after each iteration.

Algorithm 1 presents our combined VS and GS algorithm. This has the advantage that any slack left over by one of the techniques will be used by the other technique. Further, the technique which brings the maximum power reduction is used for each particular node. The algorithm finds the number of paths through each gate and uses this, together with the available slack of the node, to assign a weight to each node using Eq. (19). Gates which have a larger slack and have fewer paths passing through them are chosen first for VS or GS. The change in the total power per unit delay is calculated for these chosen gates. Since the main objective is to achieve the maximum power reduction, each gate is assigned VS or GS depending on which operation gives the larger power reduction per unit of slack consumed. The algorithm terminates when the available slack in the circuit is reduced so far that any further VS or GS operation would violate the timing constraints of the circuit.

ALGORITHM 1 Voltage Scaling/Gate Sizing

do
    compute Weight for each node
    for nodes with the maximum Weight
        if node_i can operate at Vlow so that delay <= Tspec
            (ΔP_VS/Δdelay) = change in total power per unit delay by VS
                where ΔP_VS is the reduction in power consumption due to the voltage scaling technique
                and Δdelay is the decrease in the available slack
        if node_i can be resized so that delay <= Tspec
            if total power reduction >= 0
                (ΔP_GS/Δdelay) = change in total power per unit delay by GS
                    where ΔP_GS is the reduction in power consumption due to the gate sizing technique
                    and Δdelay is the decrease in the available slack
        if (ΔP_VS/Δdelay) >= (ΔP_GS/Δdelay)
            apply VS on node_i
            update slacks on affected paths
        else
            apply GS on node_i
            update slacks on affected paths
    endfor
while (at least one node is changed)
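To make the control flow of Algorithm 1 concrete, here is a toy Python sketch of the greedy loop. It is a simplification under several assumptions of ours, not the SIS implementation: each node has a fixed delay and at most one VS option and one GS option, each given as a hypothetical (extra delay, power saving) pair; one node is changed per pass and all slacks are then recomputed; the path counts are supplied as input; and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def compute_slacks(succ, delay, t_spec):
    """Slack of each node: t_spec minus the longest path delay through it."""
    pred = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    order = list(TopologicalSorter(pred).static_order())
    arrival = {}  # longest delay from a primary input up to and including v
    for v in order:
        arrival[v] = delay[v] + max((arrival[u] for u in pred[v]), default=0.0)
    tail = {}  # longest delay from v (inclusive) to a primary output
    for v in reversed(order):
        tail[v] = delay[v] + max((tail[s] for s in succ[v]), default=0.0)
    return {v: t_spec - (arrival[v] + tail[v] - delay[v]) for v in succ}


def optimize(succ, delay, activity, paths, vs_opt, gs_opt, t_spec):
    """Greedy VS/GS selection; returns (choice per node, final delays)."""
    delay = dict(delay)
    choice = {}
    changed = True
    while changed:  # "while at least one node is changed"
        changed = False
        slack = compute_slacks(succ, delay, t_spec)
        weight = {v: slack[v] * activity[v] / paths[v]
                  for v in succ if v not in choice}
        for v in sorted(weight, key=weight.get, reverse=True):
            best = None
            # VS is tried first, so it wins ties (the >= test in Algorithm 1).
            for name, opt in (("VS", vs_opt.get(v)), ("GS", gs_opt.get(v))):
                if opt is None:
                    continue
                extra, saving = opt
                if 0 < extra <= slack[v] and saving > 0:
                    ratio = saving / extra  # power saved per unit of slack
                    if best is None or ratio > best[0]:
                        best = (ratio, name, extra)
            if best is not None:
                delay[v] += best[2]
                choice[v] = best[1]
                changed = True
                break  # recompute slacks before touching another node
    return choice, delay


# Hypothetical circuit: a -> b -> d and a -> c -> d; path a-b-d is critical.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
delay = {"a": 1.0, "b": 3.0, "c": 1.0, "d": 1.0}
activity = {v: 0.5 for v in succ}
paths = {"a": 2, "b": 1, "c": 1, "d": 2}
vs_opt = {"b": (1.0, 5.0), "c": (2.0, 4.0)}
gs_opt = {"c": (1.0, 3.0)}
choice, new_delay = optimize(succ, delay, activity, paths, vs_opt, gs_opt, 5.0)
print(choice, new_delay)
```

Only node c has slack in this example. Both of its options fit, but gate sizing saves more power per unit of slack consumed, so it is preferred; node b lies on the critical path and is left untouched, so the timing constraint of 5.0 still holds.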

Algorithm 2 gives a linear time algorithm to calculate the number of paths, which is used in the Weight function to choose the candidate nodes for VS or GS.

We now show that Algorithm 2 indeed gives the number of paths passing through a node. Consider the number of paths entering a particular node. Each of these paths must either pass through one of its predecessors or originate at one of its predecessors. Moreover, a path passing through a node has a unique predecessor along the path, as the graph is acyclic. Hence the number of paths entering a node is the sum of all paths going through or originating at its predecessors. A similar argument applies for paths leaving a node. Each path leaving a node must pass through or terminate at a successor. The number of entering paths for each node is computed by visiting the nodes in topologically sorted order and assigning the number of paths as the sum of the number of paths through the predecessor nodes, or originating at a predecessor node in case it is a primary input. The same algorithm can be applied to calculate the number of paths leaving a node by reversing the edges and applying a topological sort starting from the primary outputs. The total number of paths going through a node is then the number of ways to enter the node times the number of ways to leave the node, i.e., the product of the number of entering paths and the number of leaving paths.

ALGORITHM 2 Calculation of number of paths through a node

Input: Directed Acyclic Graph G(V, E)
Output: Number of paths passing through each node v ∈ V

for all v ∈ V
    if (v is primary input)
        incoming_paths[v] = 1;
    if (v is primary output)
        outgoing_paths[v] = 1;
Topologically sort vertices of G(V, E).
for each v ∈ V other than primary i/o in topological sorted order
    incoming_paths[v] = Σ_{u ∈ pred(v)} incoming_paths[u];
Reverse edges and topologically sort vertices of G(V, E)
for each v ∈ V other than primary i/o in topological sorted order
    outgoing_paths[v] = Σ_{u ∈ pred(v)} outgoing_paths[u];
for each v ∈ V other than primary i/o
    paths_going_through[v] = incoming_paths[v] × outgoing_paths[v];
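A compact Python version of Algorithm 2 is sketched below. It departs from the pseudocode in two small ways, both our own choices: instead of reversing the edges and sorting again it walks the same topological order backwards, and it reports a count for primary inputs and outputs as well. The graph encoding (a dict from each node to its list of successors, with every node present as a key) and the example circuit are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def count_paths_through(succ):
    """Number of input-to-output paths through each node of a DAG.

    succ maps every node to the list of its successors.
    """
    pred = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    # static_order() yields each node after all of its predecessors.
    order = list(TopologicalSorter(pred).static_order())

    incoming = {}  # paths from any primary input that end at v
    for v in order:
        incoming[v] = sum(incoming[u] for u in pred[v]) if pred[v] else 1

    outgoing = {}  # paths from v to any primary output
    for v in reversed(order):
        outgoing[v] = sum(outgoing[s] for s in succ[v]) if succ[v] else 1

    # Every way of entering v combines with every way of leaving it.
    return {v: incoming[v] * outgoing[v] for v in succ}


# Hypothetical circuit: inputs a, b feed gate c, which drives outputs d, e.
succ = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": [], "e": []}
through = count_paths_through(succ)
print(through)
```

The circuit has four input-to-output paths and all of them cross gate c, while each input and each output lies on two of them. Gate c would therefore receive the smallest weight among otherwise equal candidates.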

Since the calculation of the number of paths that pass through each node requires a traversal of the graph in topologically sorted order, the time complexity of the path count calculation is $O(E)$, where $E$ is the number of edges. This computation is required only once, at the beginning of the algorithm, as the number of paths passing through a node does not change. The time complexity of the slack calculation for affected paths in each iteration of the for loop in Algorithm 1 is $O(V)$, assuming the nodes are already in topologically sorted order. The body of the for loop in Algorithm 1 is executed whenever a node is sized or scaled. Hence the maximum number of times the for loop body is executed is $O(V)$, as each node is scaled or sized only once. Therefore the time complexity of the algorithm is $O(E + V \cdot V) = O(V^2)$. Note that the time complexity of the combined VS and GS algorithm using the MWIS approach is $O(rV^3)$, where $r$ is the number of iterations executed by the algorithm. Hence, the proposed heuristic is asymptotically faster than the MWIS approach by a factor of $rV$.

6. EXPERIMENTAL RESULTS

The experimental setup consists of the combined voltage scaling and gate sizing algorithm implemented in the SIS environment. Experiments were carried out on a set of MCNC benchmark circuits. Before running Algorithm 1 for voltage scaling and gate sizing, we performed technology mapping on the given circuit for the mosisO8.genlib library under minimum delay mode with SIS and used the resulting delay as the timing constraint, both for voltage scaling and gate sizing. The algorithm is applied first to nodes with a higher weight as defined by Eq. (19). This ensures that the maximum number of nodes is chosen for gate sizing. According to Algorithm 1, since only gates that do not violate the timing constraints on any path after down sizing or voltage scaling are accepted, there is no need for a post-processing step to resolve nodes with negative slacks. The power consumption was estimated based on a clock frequency of 100 MHz, threshold voltage of V and supply voltages of $V_{high}$ = 5.0 V and $V_{low}$ = 3.5 V. Exact values of the change in transition times were calculated using Eq. (16) through Eq. (18). The constants were evaluated assuming a 0.18 micron technology and a unit-sized gate's input capacitance equal to 0.097 fF and output resistance equal to 23.8 kΩ [15].

Table I shows the percentage reduction in totalpower using only voltage scaling technique. We seea power reduction of about 50% for circuit9symml when the total power is equal to thedynamic power and about 58% when short-circuitpower is also considered during the decision.Table II shows the percentage reduction in totalpower using only gate sizing technique when all

134 A. NAYAK et al.

TABLE Power reduction using VS technique only

Circuit

% Reduction#Total #of Vtow in powergates gates (dynamic power)

%Reductionin power

(dynamic+ short-(circuit power)

9symmlC1908C880apex7b9frglfrg2ili3i5i6i7rotterm

157 147 51.00540 481 50.99384 297 50.73307 156 50.93166 103 50.86124 92 50.78

1438 1152 50.5689 48 51.00

252 114 51.00505 306 51.00701 496 51.00828 558 41.12777 535 51.00364 320 51.00

58.4858.1958.0058.3057.5457.4558.3258.7558.2358.4959.0358.6358.0758.21

TABLE II Power reduction using GS technique only

Circuit

% Reduction% Reduction in power

#of in power (dynamic+ short-gates (dynamic power) (circuit power)

9symmlC1908C880apex7b9frg2ili3i6i7rotterm

157 47.78 54.58540 52.44 55.55384 57.61 60.18242 56.98 60.06166 41.57 45.98

1438 54.20 48.8989 62.47 63.68

252 59.99 60.19701 56.99 61.12828 51.30 57.10777 48.95 49.23364 46.52 51.99

gates operate on a single supply voltage. Figure 4shows the percentage reduction in power usinggate sizing graphically. We see a power reductionof about 47% for circuit 9symml when the totalpower is equal to the dynamic power and about54% when short-circuit power is also consider-ed during the decision. Figure 5 shows that acombined VS and GS approach gives more powerreduction than only VS. Table III gives thepercentage power reduction using our combinedVS and GS technique. A power reduction of ashigh as 80% is obtained for circuits like il. The

percentage power reduction is very high as thealgorithm finds out the maximum number ofnodes that are candidates for either VS or GSand do not violate the timing constraints. We canconclude that though VS and GS individually giveus high power reduction, we can get much higherreduction by using a combined approach as theslacks which are unutilized by one technique canbe used by the other technique. We have notconsidered the effect on power of the additionallevel converters that would be introduced due tothe dual voltages in the circuit. Figure shows that


[] % tage reduction

40

35"

30

25

20

15

10

5

09ymml 1908 apex7 frg2 alu2

FIGURE 4 Percentage power reduction with gate sizingtechnique.

down sizing a gate might not always result in totalpower reduction. Hence, a decision taken withonly the dynamic power into consideration wouldbe less accurate. We can see from Figure 6 that anadditional power reduction of as high as 6% canbe got by taking the short-circuit power in thedecision process. The improvement in powerreduction depends on the number of implementa-tions of the gates in the library. [12] definescompleteness of a gate library for gate sizing. Amore complete library would definitely improvethe flexibility of the algorithm. The execution timeof our algorithm using our fast heuristic for circuitC1908 is 85.87 seconds. The execution time using

FIGURE 5 Percentage power reduction with VS and with our combined VS and GS algorithm.

FIGURE 6 Power reduction for combined VS and GS with and without short-circuit power.

TABLE III Power reduction using VS and GS

Circuit   # Total gates   # of Vlow gates   % Reduction in    % Reduction in dynamic +
                                            dynamic power     short-circuit power
9symml    157             136               70.27             73.6
C1908     540             410               74.85             77.08
C880      384             264               77.46             80.46
apex7     307             205               76.98             80.95
b9        166             93                69.63             74.06
frg1      124             87                68.04             69.94
frg2      1438            1152              77.45             80.67
i1        89              42                81.3              84.13
i3        252             82                70.69             74.69
i5        505             303               78.54             81.62
i6        701             495               75.2              77.57
i7        828             560               68.33             74.05
rot       777             520               73.23             75.50
term      364             288               69.36             70.93
average                                     73.66             76.80


the MWIS approach [6] is reported as 117.7 seconds for Library A, 136.6 seconds for Library B, 256.6 seconds for Library C and 1485.7 seconds for Library D. We do not report a complete comparison with the combined VS and GS technique using the MWIS approach, as the gate libraries used there were different from those available to us. However, from the execution times and the complexity analysis presented in Section 5, it can be concluded that our algorithm is much faster than the MWIS algorithm.
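The observation that downsizing a gate does not always reduce total power can be illustrated with a small numerical sketch. The code below is not the load-aware short-circuit model used in this paper; it assumes the classical unloaded-inverter expression P_sc = (beta/12)(Vdd - 2Vt)^3 * tau * f together with a simple RC estimate of the output transition time, and every parameter value and name in it (`Params`, `r_unit`, `beta_fanout`, and so on) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Illustrative values only; none of them are taken from the paper."""
    vdd: float = 3.3            # supply voltage (V)
    vt: float = 0.7             # threshold voltage (V)
    freq: float = 100e6         # clock frequency (Hz)
    activity: float = 0.5       # switching activity of the nets involved
    c_in_unit: float = 10e-15   # input capacitance of a unit-size gate (F)
    r_unit: float = 5e3         # output resistance of a unit-size gate (ohm)
    beta_fanout: float = 1e-3   # summed gain factor of the fanout gates (A/V^2)


def dynamic_power(c_load, p):
    """Switching power dissipated in charging c_load: a * C * Vdd^2 * f."""
    return p.activity * c_load * p.vdd ** 2 * p.freq


def short_circuit_power(tau, p):
    """Assumed unloaded-inverter model: a * (beta/12) * (Vdd - 2Vt)^3 * tau * f."""
    return p.activity * (p.beta_fanout / 12.0) * (p.vdd - 2 * p.vt) ** 3 * tau * p.freq


def power_of_size(size, c_out, p):
    """Return (dynamic, short-circuit) power attributed to a gate of a given size.

    dynamic: power spent by the driver in charging this gate's input capacitance.
    short-circuit: power in the fanout gates, which grows with the output
    transition time tau = (r_unit / size) * c_out of this gate.
    """
    dyn = dynamic_power(size * p.c_in_unit, p)
    tau = (p.r_unit / size) * c_out
    return dyn, short_circuit_power(tau, p)


def downsizing_deltas(old_size, new_size, c_out, p):
    """Change in (dynamic, short-circuit) power when the gate is resized."""
    dyn_old, sc_old = power_of_size(old_size, c_out, p)
    dyn_new, sc_new = power_of_size(new_size, c_out, p)
    return dyn_new - dyn_old, sc_new - sc_old


if __name__ == "__main__":
    p = Params()
    for c_out in (100e-15, 20e-15):  # heavy and light output loads
        d_dyn, d_sc = downsizing_deltas(2.0, 1.0, c_out, p)
        verdict = "accept" if d_dyn + d_sc < 0 else "reject"
        print(f"C_out={c_out:.0e} F: d_dyn={d_dyn:.2e} W, "
              f"d_sc={d_sc:.2e} W, total-power decision: {verdict}")
```

Under these assumed numbers a dynamic-only decision accepts the downsizing move for both loads, because the dynamic term always falls. The total-power decision rejects it for the heavier load, where the slower output transition raises short-circuit power in the fanout by more than the dynamic saving.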

7. CONCLUSION

We have presented an effective framework for integrating voltage scaling and gate sizing techniques to obtain maximum power reduction. We have proposed a fast algorithm for choosing the maximum number of gates for voltage scaling and gate sizing. We have used a better model for short-circuit power dissipation and shown that combined voltage scaling and gate sizing yields an average power saving of 77%, which is greater than the power reduction achieved when the decisions are taken with only dynamic power.
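The two averages quoted for Table III can be re-derived from its per-circuit rows. The sketch below assumes that the fused digits of the extracted table split into 14 values per column as listed; the function names are illustrative.

```python
# Rows of Table III: (circuit, % reduction with dynamic power only,
# % reduction with dynamic + short-circuit power).
# Assumption: the digit runs in the extracted table split as shown here.
TABLE_III = [
    ("9symml", 70.27, 73.60),
    ("C1908", 74.85, 77.08),
    ("C880", 77.46, 80.46),
    ("apex7", 76.98, 80.95),
    ("b9", 69.63, 74.06),
    ("frg1", 68.04, 69.94),
    ("frg2", 77.45, 80.67),
    ("i1", 81.30, 84.13),
    ("i3", 70.69, 74.69),
    ("i5", 78.54, 81.62),
    ("i6", 75.20, 77.57),
    ("i7", 68.33, 74.05),
    ("rot", 73.23, 75.50),
    ("term", 69.36, 70.93),
]


def column_means(rows):
    """Arithmetic mean of each percentage column."""
    n = len(rows)
    dyn = sum(r[1] for r in rows) / n
    total = sum(r[2] for r in rows) / n
    return dyn, total


def largest_gain(rows):
    """Row with the largest extra reduction from including short-circuit power."""
    return max(rows, key=lambda r: r[2] - r[1])


if __name__ == "__main__":
    dyn, total = column_means(TABLE_III)
    name, d, t = largest_gain(TABLE_III)
    print(f"average reduction, dynamic only:            {dyn:.2f}%")
    print(f"average reduction, dynamic + short-circuit: {total:.2f}%")
    print(f"largest additional reduction: {name} ({t - d:.2f} points)")
```

With this split, the column means agree with the averages in the table to within rounding, and the largest per-circuit difference is just under six percentage points, in line with the "as much as 6%" quoted for Figure 6.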

References

[1] Chandrakasan, A. and Brodersen, R. (1995). Low-Power CMOS Digital Design, Kluwer Academic Publishers.

[2] Raje, S. and Sarrafzadeh, M., Variable voltage scheduling, International Symposium on Low Power Design, pp. 9-14, April, 1995.

[3] Chang, J. M. and Pedram, M., Energy minimization using multiple supply voltages, IEEE Transactions on VLSI Systems, 5(4), 1-8, December, 1997.

[4] Usami, K. and Horowitz, M., Cluster voltage scaling technique for low power design, International Symposium on Low Power Design, pp. 3-8, April, 1995.

[5] Usami, K. et al. (1997). Automated low power technique exploiting multiple supply voltages applied to a media processor, Custom Integrated Circuit Conference, pp. 131-134.

[6] Chen, C. and Sarrafzadeh, M., An effective algorithm for gate-level power-delay tradeoff using two voltages, International Conference on Computer Design, pp. 222-227, October, 1999.

[7] Raje, S. and Sarrafzadeh, M. (1997). Scheduling with multiple voltages, Integration, VLSI Journal 23, pp. 37-59.

[8] Usami, K. et al., Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques, ACM/IEEE Design Automation Conference, pp. 483-488, June, 1998.

[9] Shyu, J. M., Sangiovanni-Vincentelli, A. L., Fishburn, J. and Dunlop, A., Optimization-based transistor sizing, IEEE Journal of Solid-State Circuits, 23, 400-409, Apr., 1998.

[10] Sapatnekar, S. S., Rao, V. B., Vaidya, P. M. and Kang, S. M., An exact solution to the transistor sizing problem for CMOS circuits using convex optimization, IEEE Transactions on Computer-Aided Design, 12, 1621-1634, Nov., 1993.

[11] Berkelaar, M. R. and Jess, J. A. (1990). Gate Sizing in MOS digital circuits with linear programming, Proceedings of the European Design Automation Conference, pp. 217-221.

[12] Chen, C. and Sarrafzadeh, M., Power Reduction by Simultaneous Voltage Scaling and Gate Sizing, Asia Pacific DAC 2000, pp. 333-338.

[13] Chandrakasan, A., Sheng, S. and Brodersen, R., Low-power CMOS digital design, Journal of Solid-State Circuits, 27(4), 473-484, April, 1992.

[14] Sapatnekar, S. S. and Chuang, W., Power-Delay Optimizations in Gate Sizing.

[15] Jason Cong, Zhigang Pan, Lei He, Cheng-Kok Koh and Kei-Yong Khoo, Interconnect Design for Deep Submicron ICs, International Conference on Computer-Aided-Design, pp. 478-485, Nov., 1997.

[16] Prasad, S. C. and Roy, K. (1994). Circuit optimization for minimization of power consumption under delay constraint, Proc. of International Workshop on Low Power Design, pp. 15-20.

[17] Igarashi, M. et al. (1997). A low power design method using multiple supply voltages, Proc. of International Symposium on Low Power Electronics and Design, pp. 36-41.

[18] Sundararajan, V. and Parhi, K. K. (1999). Synthesis of Low Power CMOS VLSI circuits using dual supply voltages, Proc. of ACM/IEEE Design Automation Conference, pp. 72-75.

[19] Chen, C. and Sarrafzadeh, M. (1999). Provably Good Algorithm for Low Power Consumption with Dual Supply Voltages, Proc. of International Conference on Computer-Aided-Design, pp. 76-79.

[20] Chen, C., Yang, X. and Sarrafzadeh, M. (2000). Potential Slack: An Effective Metric of Combinational Circuit Performance, Proc. of International Conference on Computer-Aided-Design.

[21] Fiduccia, C. M. and Mattheyses, R. M. (1982). A linear time heuristic for improving network partitions, Proc. of ACM/IEEE Design Automation Conference, pp. 175-181.

Authors’ Biographies

Anshuman Nayak received his Bachelor’s degree in Electronics and Electrical Communication Engg. from the Indian Institute of Technology in 1998 and his Masters in Electrical and Computer Engg. from Northwestern University. He is currently pursuing his Ph.D. at Northwestern University. His research interests include system level design tools, logic synthesis, embedded systems and reconfigurable computing.

Malay Haldar received his Bachelor’s degree in Computer Science and Engg. from the Indian Institute of Technology in 1998 and his Masters in Electrical and Computer Engg. from Northwestern University. He is currently a doctoral student at Northwestern University. His research interests include system level design tools, embedded systems and reconfigurable computing.

Prithviraj Banerjee received his B.Tech. degree in Electronics and Electrical Engineering from the Indian Institute of Technology, Kharagpur, India, in August 1981, and the M.S. and Ph.D. degrees in Electrical Engineering from the University of Illinois at Urbana-Champaign in December 1982 and December 1984 respectively. Dr. Banerjee is currently the Walter P. Murphy Professor and Chairman of the Department of Electrical and Computer Engineering, and Director of the Center for Parallel and Distributed Computing, at Northwestern University in Evanston, Illinois. Prior to that he was the Director of the Computational Science and Engineering program, and Professor of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. Dr. Banerjee’s research interests are in Parallel Algorithms for VLSI Design Automation, Distributed Memory Parallel Compilers, and Compilers for Adaptive Computing, and he is the author of over 270 papers in these areas. Dr. Banerjee has received numerous awards and honors during his career. He became a Fellow of the ACM in 2000. He was the recipient of the 1996 Frederick Emmons Terman Award of ASEE’s Electrical Engineering Division sponsored by Hewlett-Packard. He was elected to the Fellow grade of IEEE in 1995. He received the University Scholar award from the University of Illinois in 1993, the Senior Xerox Research Award in 1992, the IEEE Senior Membership in 1990, the National Science Foundation’s Presidential Young Investigators’ Award in 1987, the IBM Young Faculty Development Award in 1986, and the President of India Gold Medal from the Indian Institute of Technology, Kharagpur, in 1981.

Chunhong Chen received the Ph.D. degree in electrical engineering from the Fudan University, Shanghai, China, in 1997. He is currently a postdoctoral fellow at Northwestern University, Evanston, IL. From 1997 to 1998, he was with the Hong Kong University of Science and Technology as a Research Associate. His current research focus is on logic-level and high-level synthesis for low power.

Majid Sarrafzadeh received his B.S., M.S. and Ph.D. in 1982, 1984, and 1987 respectively from the University of Illinois at Urbana-Champaign in Electrical and Computer Engineering. He joined Northwestern University as an Assistant Professor in 1987. Since 1997 he has been a Professor of Electrical Engineering and Computer Science at Northwestern University. His research interests lie in the area of VLSI CAD, design and analysis of algorithms and VLSI architecture. Dr. Sarrafzadeh is a Fellow of IEEE for his contribution to "Theory and Practice of VLSI Design". He received an NSF Engineering Initiation award, two distinguished paper awards in ICCAD, and the best paper award for physical design in DAC for his work in the area of High-Speed VLSI Clock Design. He has served on the technical program committee of numerous conferences in the area of VLSI Design and CAD, including ICCAD, EDAC and ISCAS. He has served as committee chair of a number of these conferences, including International Conference on CAD and International Symposium on Physical Design. He will be the general chair of the 1998 International Symposium on Physical Design. Professor Sarrafzadeh has published approximately 150 papers, is a co-editor of the book "Algorithmic Aspects of VLSI Layout" (1994 by World Scientific), co-author of the book "An Introduction to VLSI Physical Design" (1996 by McGraw Hill), and the author of an invited chapter in Encyclopedia of Electrical and Electronics Engineering in the area of VLSI Circuit Layout. This is planned for publication in 1997 by John Wiley & Sons, Inc. Dr. Sarrafzadeh is on the editorial board of the VLSI Design Journal, co-editor-in-chief of the International Journal of High-Speed Electronics, and an Associate Editor of IEEE Transactions on Computer-Aided Design. Dr. Sarrafzadeh has collaborated with many industries in the past ten years including IBM and Motorola.
