Manish Bhardwaj, Rex Min, Anantha P....


Page 1: Manish Bhardwaj, Rex Min, Anantha P. Chandrakasandelta.cs.cinvestav.mx/~pmejia/power/manishb01.pdf · Manish Bhardwaj, Rex Min, Anantha P. Chandrakasan Abstract— An increasingly


Quantifying and Enhancing Power-Awareness of VLSI Systems

Manish Bhardwaj, Rex Min, Anantha P. Chandrakasan

Abstract: An increasingly important figure-of-merit of a VLSI system is “power-awareness”, which is its ability to scale power consumption in response to changing operating conditions. These changes might be brought about by the time-varying nature of inputs, desired output quality or just environmental conditions. Regardless of whether they were engineered to be power-aware, systems display variations in power consumption as conditions change. This implies, by the definition above, that all systems are naturally power-aware to some extent. However, one would expect that some systems are “more power-aware” than others. Equivalently, we should be able to re-engineer systems to increase their power-awareness. In this paper, we attempt to define power-awareness quantitatively and show how such awareness can be enhanced using a systematic technique. We illustrate this technique by applying it to VLSI systems at several levels of the system hierarchy: multipliers, register-files, digital filters, dynamic voltage scaled processing and data-gathering wireless networks. As a result, the power-awareness of these systems can be significantly enhanced, leading to increases in battery lifetimes in the range of 60%-200%.

Keywords: Power-Aware, Energy-Aware, Scalable, Low-Power, Low-Energy, Metrics.

I. Introduction

Low power system design assuming a worst-case power dissipation scenario is being supplanted by a more comprehensive philosophy variously termed power-aware or energy-aware or energy-quality scalable design [1]. The basic idea behind these essentially identical approaches is to allow the system power to scale with changing conditions and quality requirements.

There are two main ways to motivate power-aware design and its emergence as an important paradigm. The first explains the importance of power-awareness as a consequence of the increasing emphasis on making systems more scalable. In this context, making a system scalable refers to enabling the user to trade off system performance parameters as opposed to hard-wiring them. Scalability is an important figure-of-merit since it allows the end-user to implement operational policy, which often varies significantly over the lifetime of the system. For example, consider the user of a portable multimedia terminal. At times, the user might want extremely high performance (say, high video quality) at the cost of reduced battery lifetime. At other times, the opposite might be true, i.e. the user might want bare minimum perceptual quality in return for maximizing battery lifetime. Such trade-offs can only be optimally realized if the system was designed in a power-aware manner. A related motivation for power-awareness is that a well designed system must gracefully degrade its quality and performance as the available energy resources are depleted [2]. Continuing our video example, this implies that as the expendable energy decreases, the system should gracefully degrade video quality (seen by the user as increased “blockiness”, for instance) instead of exhibiting a “cliff-like”, all-or-none behavior (perfect video followed by no video) [2], [3].

The authors are with the Department of EECS, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139. E-mail: [email protected] .

While the view above argues for power-awareness from a user-centric and user-visible perspective, one can also motivate this paradigm in more fundamental, system-oriented terms. With burgeoning system complexity and the accompanying increase in integration, there is more diversity in the operating scenarios than ever before. Hence, design philosophies that assume the system to be in the worst-case operating state most of the time are prone to yield sub-optimal results. In other words, even if there is little explicit user intervention, there is an imperative to track operational diversity and scale power consumption accordingly. This naturally leads to the concept of power-awareness. For instance, the embedded processor that decodes the video stream in a portable multimedia terminal can display tremendous workload diversity depending on the temporal correlation of the incoming video bit-stream. Hence, even if the user does not change quality criteria, the processor must exploit this operational diversity by scaling its power as the workload changes.

Since low-energy and low-power are intimately linked to power-awareness, it is important and instructive to provide a first-cut delineation of these concepts even at this introductory stage. This is to convince the reader that power-awareness as a metric and a design driver doesn’t devolve to traditional worst-case-centric low-power/low-energy design. As preliminary evidence of this, consider the system architect faced with the task of increasing the power-awareness of the portable multimedia terminal alluded to above. While the architect can claim that certain engineering reduces worst-case dissipation and/or overall energy consumption of the terminal and so on, these traditional measures still fall short of answering the related but different questions:

How well does the terminal scale its power with user or data or environment dictated changes? What prevents it from being arbitrarily proficient in tracking operational diversity? How can we quantify the benefits of such proficiency? How can we systematically enhance the system’s ability to scale its power? What are the costs of achieving such enhancements?

In this paper, we attempt to formally answer these questions. We initiate the process of formally understanding power awareness by using a multiplier as a simple but pedagogic example. This is followed by a more rigorous presentation of these concepts in the form of a metric which is shown to be fundamentally linked to the overall battery lifetime of a system (Section II). Methods to enhance the power-awareness of the system are discussed in Section III. Section IV demonstrates the efficacy of the proposed metric and enhancement techniques using register-files, digital filters and dynamic voltage scaling systems as examples. Section V summarizes the paper.

II. Quantifying Power-Awareness

A. Preliminaries

In this section we develop the basic power-awareness formalisms using a simple system, a 16x16-bit array multiplier [4], as an example. This will allow us to elucidate the essence of our arguments without getting bogged down by detail.

Consider a given system H that performs a certain set of operations F while obeying a set of constraints C. For the illustrative system, H would be the given implementation of a 16x16-bit array multiplier. While the set F would ideally contain all m-bit by n-bit multiplications, where m, n ∈ [1, 16], we restrict F to be the set of all m-bit by m-bit multiplications instead. We shall see the value of this restriction in the following discussion. Finally, the constraint might be simply one of fixed latency (i.e. H cannot take more than a given time, t, to perform F).

Given this information, we ask the following question:

The Power-Awareness Question: How well does the energy of a system, H, scale with changing operating scenarios?

Note that we use energy and not power in the statement above because energy allows us to seamlessly include latency constraints later on. Next, observe that our understanding of power-awareness can only be as exact as our understanding of operating “scenarios”. As one might expect, these scenarios can be characterized with arbitrarily high detail. For instance, in the case of the multiplier, we can define the scenario by the precision of the current multiplicands or the multiplicands themselves or even the current multiplicands and the previous multiplicands, since the power dissipation is a function of those too. In the interests of simplicity, we choose to characterize the set of scenarios S by the precision of the multiplicands. Normally, this would need a two-tuple since there are two multiplicands. But, by our choice of F, only one number (the precision of the two identical bit-width multiplicands) characterizes the scenario. Hence, H can find itself in one of 16 scenarios. We henceforth denote a scenario by s and the set of 16 scenarios by S.
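As a minimal sketch of how an operand pair could be mapped to one of the 16 precision scenarios, the Python fragment below assumes unsigned operands and takes the larger of the two operand precisions as the shared precision. Both choices, and the names `operand_precision` and `scenario`, are illustrative assumptions rather than details given in the text.

```python
def operand_precision(value: int) -> int:
    """Minimum number of bits needed for an unsigned operand (at least 1)."""
    if value < 0:
        raise ValueError("this sketch assumes unsigned operands")
    return max(1, value.bit_length())


def scenario(x: int, y: int, width: int = 16) -> int:
    """Scenario index s in 1..width for the operand pair (x, y).

    Assumption: since F holds only m-bit by m-bit multiplications, the pair
    is characterized by the larger of the two operand precisions.
    """
    s = max(operand_precision(x), operand_precision(y))
    if s > width:
        raise ValueError(f"operands need {s} bits, multiplier is {width} bits wide")
    return s
```

For example, under these assumptions `scenario(255, 256)` falls in the 9-bit scenario, because 256 needs nine bits.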

Having defined scenarios, we take the first step towards characterizing the power-awareness of H by tracing its energy behavior as it moves from one scenario to the other. For a 16-bit multiplier, we would do this by executing a large number of different scenarios and measuring the energy consumed by each scenario.¹ Henceforth, we refer to these energy vs. scenario curves of H simply as the “energy curves” of H.

Fig. 1. Multiplier energy as a function of input precision. [Plot: energy consumed (normalized units) versus input precision (bits); curve labeled E(16x16).]

Figure 1 shows the energy curve of our 16x16-bit array multiplier over 16 scenarios (which represent the precision of the multiplication). Note that the multiplier has a natural degree of power-awareness even though it was not explicitly designed for it. This is easy to understand since lower precision vectors lead to less switched capacitance than higher precision ones.

An energy-curve like the one in figure 1 is the first step to answering the power-awareness question. However, at this stage, it is difficult to answer the “how well” in the question by looking at a system by itself. Hence, instead of a single energy curve, we look at a few curves together to get a better understanding of the desirable properties of energy-curves.

Fig. 2. Energy curves of three different hypothetical systems. [Plot: normalized energy (0 to 1) versus scenarios; curves labeled H1, H2 and H3.]

Figure 2 plots the energy curves of three hypothetical systems H_1, H_2 and H_3 executing a certain, identical set of scenarios. If we had to judge the power-awareness of these systems from their energy curves, we would intuitively classify H_1 as the system that is the most unaware of the executing scenario. Such an undifferentiated energy curve might be expected if, for instance, these systems were implementing multiplication and H_1 was a 32-bit RISC processor (since the energy taken by the other parts would be so great that the actual precision of multiplication would have insignificant impact). H_2, on the other hand, definitely displays more energy differentiation than H_1 and is intuitively “more scalable”. Furthermore, since the energy of H_2 is strictly less than H_1, it seems unequivocally better. Similar arguments can be applied while comparing H_3 to H_1 and we conclude that H_3 is more scalable than H_1. However, these “intuitive” arguments break down when we try comparing H_3 with H_2. On the one hand, H_3 displays better scalability than H_2. On the other, its energy dissipation exceeds that of H_2 over a certain interval. For this reason, at this point in our development, it is unclear whether we should pick H_2 or H_3 as the more power-aware system. To help answer that question, it might help to think of the energy curve of the most desirable system, say H_perfect, executing the same operations under the same constraints as the three systems discussed above. In a second step, we could potentially compare the curves of H_2 and H_3 to that of H_perfect to decide which is more power-aware. It helps to state that:

¹ All multiplier energy curves that we discuss in this paper were derived by extensive (>1000 vectors) PowerMill(TM) simulations of multiplier SPICE netlists in a 0.35µ process.

The Perfectly Power-Aware System (I): A system H_perfect is defined as the most power-aware system iff for every scenario in S, H_perfect consumes only as much energy as its current scenario demands.²

It is clear from the above statement that we need to formally capture the concept of “only as much energy as a scenario demands”. To derive this energy for a given scenario, say s_1, we consider constructing a system H_s1 that is designed to execute this and only this scenario. The reasoning is that we should not hope that a given system H can ever consume less energy in a scenario than H_s1, a dedicated system which was specially designed to execute only that scenario. We often refer to the H_si systems as “point” systems because of their focused construction to achieve low energy for a particular scenario (or point) in the energy curve. Hence, in the context of power-awareness, the energy consumed by H_s1 is, in a sense, the lower bound on the dissipation of H while executing scenario s_1. Generalizing this statement,

Bounds on Efficiency of Tracking Scenarios: The energy consumed by a given system H while executing a scenario s_i cannot be lower than that consumed by a dedicated system H_si constructed to execute only that scenario s_i as efficiently as possible.

This leads to our next definition of H_perfect:

The Perfectly Power-Aware System (II): The perfect system, H_perfect, is as energy efficient as H_si while executing scenario s_i ∀ s_i ∈ S.

We denote the energy-curve of the perfect system by E_perfect. From a system perspective, the perfect system behaves as if it contains a collection of dedicated point systems, one for each scenario. When H_perfect has to execute a scenario s_i, it routes the scenario to the point system H_si. After H_si is done processing, the result is routed to the common system output. This abstraction of H_perfect as an ensemble of point systems is illustrated in figure 3.

² More formally, H_perfect is the most power-aware system iff for every scenario in S, H_perfect consumes only as much energy as demanded by its current operation ∈ F executing in the current scenario under constraints C. In our multiplier example, we have chosen to construct S such that it has a one-one correspondence with F and hence, it makes sense to talk about the “energy of a scenario” executing on H.

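Fig. 3. The Perfect System (H_perfect) can be viewed as an ensemble of point systems. [Block diagram: the input feeds a DEMUX that routes to one of the dedicated point systems H_s1, H_s2, ..., H_si, ..., H_s|S|; a MUX collects the result at the output; a scenario determining unit controls both.]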

The task of identifying the scenario by looking at the data input is carried out by the scenario determining block. Once this block has identified the scenario, it configures the mux and de-mux blocks such that data is routed to, and results routed from, the point system that corresponds to the current scenario. Note that if the energy costs of identifying the scenario, routing to and from a point system and activating the right point system are zero, then the energy consumption of H_perfect will indeed be equal to that of H_si for every scenario s_i. Since these costs are never zero in real systems, this implies that H_perfect is an abstraction and does not correspond to a physically realizable system. Its function is to provide a non-trivial lower bound for the energy-curve.

To construct the E_perfect curve for our 16-bit multiplier, we emulated the ensemble-of-points construction outlined above. The point systems in our example were 16 dedicated point multipliers (1x1-bit, 2x2-bit, ..., 16x16-bit) corresponding to H_s1 to H_s16. When a pair of multiplicands with precision i came by, we diverted them to H_si (i.e. the i x i-bit multiplier). Since we are deriving E_perfect, only the energy consumed by the H_si systems was taken into account. The E_perfect curve thus derived is plotted in figure 4, where the energy-curve of a single 16x16 multiplier is repeated for comparison.

Fig. 4. Comparing the 16x16 multiplier curve to the “perfect” curve (E_perfect), denoted by Ep in the plot. [Plot: energy consumed (normalized units) versus input precision (bits); curves labeled Ep and E(16x16).]

Note that E_perfect scales extremely well with precision since the scenarios are being executed on the best possible point systems that we could construct. Before we indulge in a more detailed comparison of the two curves, it is essential to note that the E_perfect curve really depends on the kind of “point” systems we allow. In the case of the multiplier, we allowed any i x i-bit multiplier. The set of point systems we allow is henceforth denoted by P. This set captures the resources available to engineer a power-aware system. Like the scenario and constraint sets, it can be specified with increasing rigor and detail. This new formalism, P, has two key purposes. Firstly, it gives a more fundamental basis to E_perfect. While it is not possible to talk about the “best possible energy curve”, it is indeed possible to talk about the “best possible energy curve for a specified P”. Secondly, P is also important when we discuss enhancing the power-awareness of H. In that context, P specifies exactly which building blocks are available to us for such an enhancement.

To quantify how power-aware our multiplier is, we plot the scenario efficiency ratio,

\[ \eta_i = \frac{E(H_{perfect}, s_i)}{E(H, s_i)} \qquad (1) \]

in figure 5.³

An η_i value of unity indicates that the system under consideration is as power-aware as it can be for that scenario. The smaller the η_i value, the worse the system’s awareness of scenario s_i. In the case of the multiplier, note that H tracks H_perfect fairly closely for higher precisions. This is to be expected since a 16x16-bit multiplier would be very efficient for scenarios where the operand precision is close to 16. For lower precision scenarios, H loses its ability to track H_perfect and can dissipate up to two orders of magnitude more energy than E_perfect. This is a recurring theme in system design. There are energy costs to pay when a single system (H) is used over diverse operating conditions and the η curve above quantifies those costs.
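Eqn. (1) can be sketched in a few lines of Python. The energy values below are made-up placeholders, not the simulated multiplier data behind figures 4 and 5, and the function name `scenario_efficiency` is a hypothetical choice.

```python
def scenario_efficiency(e_perfect, e_actual):
    """Return eta_i = E(H_perfect, s_i) / E(H, s_i) for each scenario."""
    if len(e_perfect) != len(e_actual):
        raise ValueError("energy curves must cover the same scenarios")
    return [ep / ea for ep, ea in zip(e_perfect, e_actual)]


# Hypothetical normalized energies for four scenarios (NOT measured data).
e_perfect = [0.1, 1.0, 8.0, 20.0]
e_actual = [10.0, 10.0, 16.0, 20.0]
eta = scenario_efficiency(e_perfect, e_actual)
```

With these placeholder numbers the first scenario has an efficiency near 0.01, which corresponds to H spending about two orders of magnitude more energy than the point system, while the last scenario reaches unity.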

³ The notation E(H, s_i) denotes the energy consumed by a system H while executing scenario s_i.

Fig. 5. The scenario efficiency or awareness (η_i) of a 16x16-bit multiplier. [Plot: η_i (0 to 1) versus input precision (bits).]

B. Defining Power-Awareness

In the preceding section, we quantified the power-awareness of a system on a scenario-by-scenario basis using the η function. Hence, we have partially answered the power-awareness question posed earlier. The η curve is a partial answer because it still doesn’t help us resolve the other question we posed in the last section: given the energy curves of two systems, can we determine which of the two is more power-aware? It is clear that to answer this question, we need to develop a measure of power-awareness that distills the entire η curve. Mathematically, our problem is one of mapping a vector η to a scalar φ (power-awareness) by a well defined function f,

φ = f(η)

Although there are infinitely many possibilities, we will describe those that have a useful system-level meaning and can be practically employed by system architects. A definition that reflects the average-case power-aware behavior of the system is its expected ability to track scenario changes,

\[ \phi_1 = E(\eta) = \frac{\sum_{i=1}^{|S|} \eta_i}{|S|} \qquad (2) \]

where |S| is the cardinality of set S and E(.) is the expectation operator, which should not be confused with energy. The physical interpretation of φ_1 is that if all scenarios were equally likely, H would track the scenario changes with an expected efficiency of φ_1. For the 16x16 multiplier, φ_1 = 0.501.
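A minimal sketch of eqn. (2) in Python follows. The helper name `phi_uniform` and the sample η values are illustrative assumptions; the reported figure of 0.501 comes from the multiplier simulations and is not reproduced here.

```python
def phi_uniform(eta):
    """phi_1 of eqn. (2): arithmetic mean of eta over all scenarios."""
    if not eta:
        raise ValueError("need at least one scenario")
    return sum(eta) / len(eta)


# Hypothetical per-scenario efficiencies (illustrative only).
sample_eta = [0.01, 0.1, 0.5, 1.0]
phi_1 = phi_uniform(sample_eta)
```

Because every scenario carries the same weight, a few very inefficient low-precision scenarios pull φ_1 down even when the system is near-perfect at full precision.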

Clearly, we can refine the definition of φ_1 to be more realistic if we had a sense of the likelihood that the system will reside in a particular scenario rather than just assuming all scenarios to be uniformly likely. For instance, figure 6 charts out the probability that a multiplier will be in a certain precision scenario when it is filtering a typical speech signal [5].

We call the curve in figure 6 a scenario-distribution curve (henceforth, we use d to denote scenario-distributions and d_i to denote the probability of occurrence of scenario s_i). We can now factor in d to arrive at a more reasonable value of the expected power-awareness of H:

\[ \phi_d = E(\eta) = \frac{\sum_{i=1}^{|S|} \eta_i d_i}{\sum_{i=1}^{|S|} d_i} \qquad (3) \]

Fig. 6. Typical multiplier usage pattern in speech filtering applications. [Plot: probability (0 to 0.18) versus input precision (bits).]

For the multiplier under the distribution d_speech, this turns out to be close to 0.42. Hence, if we were to observe the multiplier executing the speech filtering application at a randomly picked point in time, we would expect to see it dissipating about 140% more energy for a scenario compared to H_perfect.
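The weighted average of eqn. (3), and the arithmetic behind the "about 140% more energy" reading of 0.42, can be sketched as follows. The function names are hypothetical, and inverting the expected efficiency is the same rough reading the text uses rather than an exact expectation of the energy ratio.

```python
def phi_weighted(eta, d):
    """phi_d of eqn. (3): eta averaged under the scenario distribution d."""
    if len(eta) != len(d):
        raise ValueError("eta and d must cover the same scenarios")
    total = sum(d)
    if total <= 0:
        raise ValueError("distribution weights must sum to a positive value")
    return sum(e * p for e, p in zip(eta, d)) / total


def extra_energy_fraction(phi):
    """Rough extra energy relative to the perfect system, read as 1/phi - 1."""
    return 1.0 / phi - 1.0


# 1/0.42 - 1 is roughly 1.38, i.e. about 140% more energy.
overhead = extra_energy_fraction(0.42)
```

Dividing by the sum of d keeps the result correct even when the weights are given as unnormalized counts rather than probabilities.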

While a scenario’s frequency of occurrence is a fair indicator of its importance, it is not the only one. For instance, a scenario might have a low probability of occurrence, but when it does occur, the architect might want the system to track the change well. If we plug in this importance (m), we arrive at a more generalized version of eqn. (3),

\[ \phi_{d-m} = \frac{\sum_{i=1}^{|S|} \eta_i d_i m_i}{\sum_{i=1}^{|S|} d_i m_i} \qquad (4) \]

A very useful application of power-awareness as defined by eqn. (4) is in predicting and enhancing the battery lifetime of the system H. In the context of maximizing battery lifetime, the importance, m_i, of a scenario is simply the energy dissipated by that scenario,

\[ m_i = E(H, s_i) \]

Plugging this definition of importance into eqn. (4) and simplifying using eqn. (1), we get,

\[ \phi = \frac{\sum_{i=1}^{|S|} E(H_{perfect}, s_i)\, d_i}{\sum_{i=1}^{|S|} E(H, s_i)\, d_i} = E(\text{Normalized Battery Lifetime}) \qquad (5) \]

The interpretation above is important enough that we do not attach any sub-script and consider it the default definition of power-awareness unless specified otherwise. It is one of the most useful interpretations of power-awareness since it directly equates the metric to the expected battery lifetime of the system normalized to the lifetime of the perfect system. To see why this is so, note that the denominator is a summation of the expected energy consumption per scenario and hence equal to the expected energy consumed by the system H displaying a scenario distribution d. Similarly, the numerator is the expected energy dissipation of the perfect system. Since battery lifetime is inversely proportional to the energy consumed, φ as defined by eqn. (5) represents the normalized battery lifetime of the system H. For our 16-bit multiplier, it turns out that φ = 0.57, which implies that in a speech filtering application, this multiplier will have a battery lifetime that is about half that of the perfectly power-aware system. Note that in equating φ to the expected normalized lifetime of the system, we ignore second-order effects like the dependence of the battery capacity on the discharge pattern [6].
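The simplification from eqn. (4) to eqn. (5) can be written out step by step. The derivation below uses only eqns. (1) and (4) and the choice m_i = E(H, s_i); the final line introduces a symbol B for a fixed battery energy budget, which is an assumption added here for illustration and ignores the second-order battery effects mentioned above.

```latex
\begin{align*}
\phi
  &= \frac{\sum_{i=1}^{|S|} \eta_i \, d_i \, E(H, s_i)}
          {\sum_{i=1}^{|S|} d_i \, E(H, s_i)}
  && \text{eqn. (4) with } m_i = E(H, s_i) \\
  &= \frac{\sum_{i=1}^{|S|} \frac{E(H_{perfect}, s_i)}{E(H, s_i)} \, d_i \, E(H, s_i)}
          {\sum_{i=1}^{|S|} d_i \, E(H, s_i)}
  && \text{substituting eqn. (1)} \\
  &= \frac{\sum_{i=1}^{|S|} E(H_{perfect}, s_i) \, d_i}
          {\sum_{i=1}^{|S|} E(H, s_i) \, d_i}
  && \text{eqn. (5)} \\
  &= \frac{B \,/\, \sum_{i=1}^{|S|} E(H, s_i) \, d_i}
          {B \,/\, \sum_{i=1}^{|S|} E(H_{perfect}, s_i) \, d_i}
  && \text{operations per battery charge, } H \text{ over } H_{perfect}
\end{align*}
```

The last line reads φ as the expected number of operations H completes on a budget B, divided by the number the perfect system would complete on the same budget.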

Coming back to our original motivation, resolving which of H_2 or H_3 is more power-aware, we see that the question cannot be answered in the battery lifetime sense without specifying a scenario distribution. We can unambiguously answer which of H_2 or H_3 is more efficient for any specified d. Interestingly, if we lack scenario distribution information, and assume that all scenario distributions are equally likely, this statistical assumption is equivalent to assuming that all scenarios are equally likely. In this case d reduces to a uniform distribution.
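The dependence of the H_2 versus H_3 ranking on d can be illustrated with a small Python sketch of eqn. (5). All energy curves and distributions below are invented for illustration and are not read from figure 2; they are chosen only so that H_3 scales better than H_2 but exceeds it in the costlier scenarios.

```python
def phi_battery(e_perfect, e_actual, d):
    """Eqn. (5): expected energy of the perfect system over that of H under d."""
    num = sum(ep * p for ep, p in zip(e_perfect, d))
    den = sum(ea * p for ea, p in zip(e_actual, d))
    return num / den


# Hypothetical normalized energy curves over four scenarios (illustrative only).
E_PERFECT = [0.1, 0.2, 0.4, 0.8]
E_H2 = [0.5, 0.5, 0.6, 0.8]    # flatter curve, efficient in costly scenarios
E_H3 = [0.15, 0.3, 0.7, 1.0]   # scales better, but costlier at the high end

LOW_HEAVY = [0.6, 0.3, 0.08, 0.02]    # mostly cheap scenarios
HIGH_HEAVY = [0.02, 0.08, 0.3, 0.6]   # mostly costly scenarios

h3_wins_low = phi_battery(E_PERFECT, E_H3, LOW_HEAVY) > phi_battery(E_PERFECT, E_H2, LOW_HEAVY)
h2_wins_high = phi_battery(E_PERFECT, E_H2, HIGH_HEAVY) > phi_battery(E_PERFECT, E_H3, HIGH_HEAVY)
```

Under these made-up numbers H_3 is the more power-aware system when cheap scenarios dominate and H_2 is when costly scenarios dominate, which is why a distribution must be specified before the two can be ranked.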

III. Enhancing Power Awareness

A. Motivation

Enhancing the power-awareness of a system is composed of two well defined steps:
1. Engineering the best possible point systems.
2. Engineering the desired system using the point systems constructed in step (1) such that power-awareness is maximized.

In the context of a power-aware multiplier, the first task is understood easily. It involves engineering 1x1, 2x2, ..., 16x16-bit multipliers that are as efficient as possible while performing 1x1, 2x2, ..., 16x16-bit multiplications respectively. The second task, that of engineering a system using point systems, is illustrated by the multiplier shown in figure 7.

Note the overall similarity between this figure and the abstraction of H_perfect in figure 3. The ensemble of point systems was used as an abstract concept in the context of explaining H_perfect’s energy curve. In the present context, however, we are illustrating an actual physical realization of a system based on this concept. The basic idea is to detect the precision of the incoming operands using a zero detection circuit and then route them to the most suitable point system. In this 16-point realization, the matching is done trivially: multiplier operands which need a minimum precision of i bits are directed to an i x i-bit multiplier. Similarly, the output of the chosen multiplier is multiplexed to the system output. As we might expect though, this physical realization of H_perfect has significant overheads. Even if we were to ignore the area cost of having 16 point multipliers, and focus solely on the power-awareness, its energy curve wouldn’t be the same as E_perfect. This is because, while the scenario execution itself is the best possible, the energy costs of determining the scenario (the zero detection circuit), routing the multiplicands to the right point system and routing the result to the system output (the output mux) can be non-trivial.

Fig. 7. A physical realization that mimics the abstract H_perfect system by using an ensemble of 16 dedicated point multipliers and a zero-detection circuit as the scenario-detector. [Block diagram: operands X and Y feed a zero detection circuit and point multipliers 1x1, 2x2, ..., 16x16, whose outputs are multiplexed to X.Y.]

A system that uses a less aggressive ensemble in an effort to reduce the energy overhead of assembling point systems is shown in figure 8.

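[Figure 8: block diagram. Inputs X and Y pass through a Zero Detection Circuit to one of four point multipliers (16x16, 14x14, 11x11, 9x9); the output is X.Y.]

Fig. 8. The 4-point ensemble multiplier system.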

The basic operation of this multiplier ensemble is the same. The precision requirement of the incoming multiplicand pair is determined by the zero detection circuitry. Unlike the previous 16-point ensemble, this 4-point ensemble is not complete and hence mapping scenarios to point systems is not one-one. Rather, precision requirements of:
1. ≤ 9 bits are routed to the 9x9-bit multiplier,
2. 10, 11 bits are routed to the 11x11-bit multiplier,
3. 12-14 bits are routed to the 14x14-bit multiplier,
4. 15, 16 bits are routed to the 16x16-bit multiplier.

Similarly, the results are routed back from the activated multiplier to the system output. While scenarios are no longer executed on the best possible point systems (with the exception of 16, 14, 11 and 9 bit multiplications), this ensemble has the advantage that energy overheads of routing are significantly reduced over Hperfect. Also, while the scenario to point system mapping of the 4-point ensemble is not as simple as the one-one mapping, it is important to realize two things. Firstly, the energy dissipated by the extra gates needed for the slightly more involved mapping in the 4-point ensemble is low relative to that dissipated in the actual multiplication. Secondly, only 4 systems have to be informed of the mapping decision compared to 16 earlier. This reduction further offsets the slight increase in scenario mapping energy. The energy curve of the 4-point ensemble is plotted in figure 9 where it is compared to Eperfect and the energy curve of a single 16x16 multiplier.
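The routing rule of the 4-point ensemble can be sketched in a few lines of Python. This is only an illustration under stated assumptions: operands are taken to be unsigned, the required precision is taken to be the bit length of the larger operand, and the names POINT_MULTIPLIERS, required_precision and select_multiplier are hypothetical rather than taken from the implemented circuit.

```python
# Widths of the point multipliers in the 4-point ensemble of figure 8.
POINT_MULTIPLIERS = (9, 11, 14, 16)


def required_precision(x: int, y: int) -> int:
    """Smallest i such that both unsigned operands fit in i bits (at least 1).

    This plays the role of the zero detection circuit (assumption: unsigned
    operands, so leading zeros alone determine the precision).
    """
    return max(x.bit_length(), y.bit_length(), 1)


def select_multiplier(x: int, y: int) -> int:
    """Return the width of the smallest point multiplier that can handle x*y."""
    need = required_precision(x, y)
    for width in POINT_MULTIPLIERS:  # widths are listed in ascending order
        if need <= width:
            return width
    raise ValueError("operands need more than 16 bits")


# Example: a 10-bit operand paired with a 1-bit operand goes to the 11x11 unit.
chosen = select_multiplier(1 << 9, 1)
```

Because the widths are scanned in ascending order, each scenario lands on the smallest multiplier that can execute it, which is the many-to-one mapping listed above.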

[Figure 9: plot of energy consumed (normalized units) versus input precision (bits) for three curves: Ep, E(16x16) and E(16x16,14x14,11x11,9x9).]

Fig. 9. Energy curve of the 4-point multiplier system in figure 8 compared to the “perfect” curve and the conventional 16x16 multiplier curve.

It is not difficult to see the basic trade-off at work here. Increasing the number of point systems decreases the energy needed for the scenario execution itself but increases the energy needed to co-ordinate these point systems. Hence, it is intuitively reasonable to assume the existence of an optimal ensemble of point systems which strikes the right balance. Motivated by this possibility, we can now pose the problem of enhancing power-awareness thus:

Determining the Most Power-Aware System Practically Realizable (I): Can we construct a system Hoptimal as an ensemble of point systems drawn from P such that Hoptimal is unconditionally more power-aware than any other such constructed system?

Invoking property (9) discussed earlier, unconditional power-awareness only leads to partial ordering. Hence, the existence of a unique Hoptimal as defined above cannot be guaranteed. In other words, while it is possible to present a set of solutions that are unconditionally more power-aware than all other solutions, we cannot guarantee that this set will have only one member. In fact, this last condition is


highly unlikely to occur in practice - unless routing costs are very low or very high compared to scenario execution costs (in which cases the optimal ensembles would be the complete and single-point solutions respectively). Hence, in general, it is futile to search for an “optimal” ensemble of point systems that is unconditionally better than all other ensembles. Thus, we set our ambitions lower and ask a slightly different question:

Determining the Most Power-Aware System Practically Realizable (II): Can we construct a system Hoptimal as an ensemble of point systems drawn from P such that Hoptimal is more power-aware than any other such constructed system for a specified scenario distribution dgiven?

Since a specified scenario distribution dgiven imposes a total ordering on the power-awareness of all possible subsets of P, it is easy to prove the existence of an optimal system. Note that the proof based on total ordering is non-constructive i.e. it only tells us that Hoptimal exists but doesn’t help us determine what it is. This is unfortunate because a brute-force search for the optimal subset of P would require a number of operations that is exponential in |P| - a strategy that takes unacceptably long even for a modestly large P.

To see if there are algorithms that can find Hoptimal in non-exponential run-times we pose the problem more formally as follows.

B. Formal Statement of the Power Awareness Enhancement Problem

Given:
1. F : A system function to be realized.
2. S : A set of scenarios characterized by a scenario basis. For example, the basis in our multiplier example was the precision of the multiplicands.
3. P : A set of point systems available to realize F. Also, we denote the power-set of P, i.e. the set containing all the subsets of P, by P(P).
4. d : The scenario distribution,

$$d : S \to \mathbb{R}^{+} \cup \{0\}$$

that obeys the additional constraint,

$$\sum_{s_i \in S} d(s_i) = 1$$

expected of a distribution function.
5. e : The energy function,

$$e : S \times P \to \mathbb{R}^{+} \cup \{\infty\}$$

In other words, for any given pair (s_i, p_i), e gives us the energy consumed when scenario s_i is executed on point system p_i. For instance, the energy taken by a 4x4-bit multiplication is different on a 4-bit multiplier than, say, a 9-bit multiplier. If the scenario s_i cannot be executed on the point system p_i, an infinite cost is assigned to the pair (see footnote 4).

4 This has the nice property of preventing any infeasible ensemble

6. w : The energy overhead cost function,

$$w : \mathcal{P}(P) \to \mathbb{R}^{+}$$

Hence, w maps every subset of P to the sum of all energy spent in co-ordinating the points in the subset ensemble (routing energy, determining the scenario, mapping the scenario etc.).

Form of the Solution:
1. An ensemble of point-systems, h ∈ P(P), and,
2. A corresponding mapping,

$$g : S \to h$$

i.e. g maps each scenario to a point system in h. For instance, in the 4-point multiplier example above, g would specify that scenarios 1-9 execute on the 9-bit multiplier, 10 and 11 execute on the 11-bit multiplier and so on.

Measure of the Solution:
Since we are interested in the expected battery lifetime of a system, the measure of a proposed solution (h, g) is the expected energy consumption E(h) given by,

$$E(h) = w(h) + \sum_{i=1}^{|S|} e(s_i, g(s_i))\, d_i \qquad (6)$$

where d_i = d(s_i).

Note that like all models, the one for energy above can be made increasingly more precise. For instance, the interconnect energy will display some dependence on scenario distributions. Hence, the w function can take d as an argument and so on. However, we refrain from these refinements because our intent here is to use a realistic but simple model to analyze the complexity of finding a solution.
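As a concrete reading of measure (6), the following Python sketch evaluates E(h) for a toy multiplier problem. It assumes that g maps each scenario to the cheapest point system in h (which minimizes the sum for a fixed h). The energy model e(s, p) = p^2 and the overhead w(h) = 2|h| are hypothetical placeholders chosen only to make the example runnable; they are not the measured values behind figure 9.

```python
import math

SCENARIOS = range(1, 17)                # required precision in bits
D = {s: 1 / 16 for s in SCENARIOS}      # hypothetical uniform distribution d


def e(s, p):
    """Toy energy of scenario s on a p x p multiplier; infinite if infeasible."""
    return float(p * p) if s <= p else math.inf


def w(h):
    """Toy co-ordination overhead: a fixed cost per point system in h."""
    return 2.0 * len(h)


def expected_energy(h, scenarios, d, e, w):
    """E(h) = w(h) + sum_i e(s_i, g(s_i)) * d(s_i), with g picking the
    cheapest point system in h for every scenario."""
    if not h:
        return math.inf
    total = w(h)
    for s in scenarios:
        if d[s] == 0:
            continue                    # never occurs; also avoids 0 * inf
        total += d[s] * min(e(s, p) for p in h)
    return total


single = expected_energy({16}, SCENARIOS, D, e, w)
four_point = expected_energy({9, 11, 14, 16}, SCENARIOS, D, e, w)
infeasible = expected_energy({9, 11, 14}, SCENARIOS, D, e, w)   # math.inf
```

Under these toy numbers the infinite entries of e automatically rule out the ensemble that cannot execute 15- and 16-bit scenarios, as footnote 4 describes.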

Problem:
Determine a solution that minimizes the measure.

It seems very likely that the problem of finding Hoptimal as stated above belongs to the class of NP-complete problems. In other words, unless P = NP, we cannot hope to determine the construction of Hoptimal in polynomial time [7]. The proof of NP-completeness and suitable approximation algorithms to find Hoptimal are beyond the scope of this paper. At this point, it suffices to say that we are currently working with heuristics to determine Hoptimal and, as the application examples in the next section show, these heuristics yield good results. For example, in the case of the multiplier system and the speech distribution dspeech, the 4-point system described above was constructed using a greedy incremental algorithm and achieved a power-awareness of close to 0.9 compared to about 0.57 for the single point 16x16-bit multiplier. Finally, it is important to note that the re-engineered system must not violate any constraints that the original system was expected to obey unless the constraints are relaxed explicitly for the sake of increasing power-awareness.
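The text does not spell out the greedy incremental algorithm, so the sketch below shows only one plausible reading of it: start from the largest point system (the only feasible single-point ensemble) and keep adding whichever point system lowers the expected energy most, stopping when no addition helps. The toy energy model, the overhead of 3 units per point system and all function names are assumptions for illustration. An exhaustive search is included for comparison on this small instance.

```python
import math
from itertools import combinations

SCENARIOS = range(1, 9)                 # toy problem: precisions 1..8 bits
POINTS = tuple(range(1, 9))             # candidate n x n point multipliers
D = {s: 1 / 8 for s in SCENARIOS}       # hypothetical uniform distribution


def e(s, p):
    return float(p * p) if s <= p else math.inf   # toy execution energy


def w(h):
    return 3.0 * len(h)                           # toy overhead per point


def cost(h):
    """Expected energy of ensemble h with each scenario on its cheapest point."""
    return w(h) + sum(D[s] * min(e(s, p) for p in h) for s in SCENARIOS)


def greedy(points):
    h = {max(points)}                   # largest point keeps every scenario feasible
    best = cost(h)
    while True:
        candidates = [(cost(h | {p}), p) for p in points if p not in h]
        if not candidates:
            break
        c, p = min(candidates)
        if c >= best:                   # no single addition improves the measure
            break
        h.add(p)
        best = c
    return h, best


def brute_force(points):
    """Exhaustive search over all non-empty subsets: exponential in len(points)."""
    best_cost, best_h = math.inf, None
    for k in range(1, len(points) + 1):
        for h in combinations(points, k):
            c = cost(h)
            if c < best_cost:
                best_cost, best_h = c, set(h)
    return best_h, best_cost


greedy_h, greedy_cost = greedy(POINTS)
optimal_h, optimal_cost = brute_force(POINTS)
```

The greedy result can never beat the exhaustive optimum, but it needs only a polynomial number of cost evaluations, which is the reason for preferring a heuristic once |P| grows.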

of point systems. For instance, the ensemble {14-bit, 11-bit, 9-bit multiplier} is a subset of P, but is an infeasible solution since 15- and 16-bit multiplications cannot be executed on any of these point systems.


C. Reducing Area Costs Incurred in Enhancing Power-Awareness

Our focus in the preceding discussion was maximizing power-awareness without regard to implementation costs like area. While such an approach is acceptable for systems where power-awareness must be increased at all costs, it might need to be reformulated for those with area constraints. In these latter cases, the problem would be to find the most power-aware ensemble for a specified distribution while still obeying specified area constraints. If the area costs are significant enough, it is often beneficial to think of implementing an ensemble temporally rather than spatially. For example, instead of a spatial layout of 4 multipliers as illustrated earlier, we can imagine a temporal layout of these 4 multipliers. In other words, the same physical hardware is reconfigured to a 16, 14, 11 or 9-bit multiplier as desired. A possible solution is to selectively shut off parts of a 16-bit multiplier and make it behave like smaller multipliers. While such a solution may or may not save any energy in the case of multipliers (due to the overhead of latches and the latch control network), it is an important illustration of the fact that spatial mappings aren’t the only means to implement ensembles. In fact, our discussion of power-aware processors in the next section is a real world example of a system where a purely temporal ensemble increases power-awareness significantly.

If we reformulate the fitness measure of an ensemble to include its silicon real estate costs, we can expect that the optimal ensemble might be neither totally temporal nor totally spatial, but a hybrid. Continuing our multiplier example, it might mean that we end up with, say, 3 point multipliers, one or more of which are reconfigurable to differing extents. To find such an optimal, possibly hybrid, solution we must extend the spatial formulation of the problem (as stated in the last section) in two ways. Firstly, we must allow new point systems that correspond to temporally reconfigurable ensembles. In the multiplier example, this means including point systems like an n x n-bit multiplier that can be explicitly reconfigured as a more efficient r_0 x r_0, r_1 x r_1, ..., r_k x r_k-bit multiplier where r_i ∈ [1, n]. Secondly, we must factor in the energy costs of temporal reconfiguration. In simple models, these costs could be factored into the scenario execution energy itself. Hence, the e function that maps (s_i, p_i) pairs to energy values would not only include the cost of executing scenario s_i on point system p_i, but also the expected energy cost of possibly reconfiguring p_i to execute scenario s_i.

Finally, it is worth noting that although we motivated temporal and hybrid ensembles to reduce area costs, such ensembles might in fact outperform purely spatial ones in power-awareness even if we allow unlimited area for both. In other words, one should not expect area saving temporal ensembles to be always inferior to the best possible, area-unconstrained spatial ensemble. With some thought, this should not be surprising because in moving from spatial to temporal ensembles, we augment our set of point systems, allowing temporal ensembles a larger solution space to pick from.

IV. Practical Illustrations of Enhancing Power-Awareness

It is amply clear from the previous section that enhancing power-awareness by constructing ensembles of point systems carefully chosen from P is a general technique that can be used not just for multipliers but other systems as well. In this section, we shall illustrate how this ensemble idea can be applied to enhance the power-awareness of multiported register files, digital filters and a dynamic voltage scaled processor. In each case, we express the problem in terms of the framework we have developed above and characterize the power-awareness of the system. Then we use an ensemble construction to enhance power-awareness. It is interesting to note that these applications cover not just spatial ensembles, but purely temporal (processor example) and spatial-temporal hybrid ensembles (register files and adaptive digital filters) as well.

A. Power-Aware Register Files

A.1 Motivation

Architecture and VLSI technology trends point in the direction of increasing energy budgets for register files [8]. The key to enhancing the power-awareness of register files is the observation that over a typical window of operation, a microprocessor accesses a small group of registers repeatedly, rather than the entire register file. This locality of access is demonstrated by the 20 benchmarks comprising the SPEC92 suite that were run on a MIPS R3000 (figure 10). More than 75% of the time, no more than 16 registers were accessed by the processor in a 60-instruction window. Equally importantly, there was strong locality from window to window. More than 85% of the time, fewer than 5 registers changed from window to window.

[Figure 10: 3-D plot of probability versus distinct registers accessed, one curve per benchmark.]

Fig. 10. Distribution showing number of distinct registers accessed in a 60-instruction long window for the MIPS R3000 processor executing 20 benchmarks.

If we think of the number of registers the processor typically needs over a certain instruction window as a scenario, the curves in figure 10 are simply scenario distributions. When a processor uses n registers over a window, we would want the file to behave as if it were an n-register (i.e. n-word) file. This would lead to a register file architecture which


is significantly more power-aware than one where the file always behaves as, say, a 32-register file. The reason for this of course is that smaller files have lower costs of access because the switched bit-line capacitance is lower. Hence, from a power-awareness perspective, over any instruction window, we want to use a file that is as small as possible.
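To make the notion of a register-file scenario concrete, the sketch below derives a scenario distribution from an instruction trace. It is an illustration only: the trace format (one tuple of register numbers per instruction) and the use of non-overlapping windows are assumptions, and window_scenarios and scenario_distribution are hypothetical helper names.

```python
from collections import Counter


def window_scenarios(trace, k=60):
    """Number of distinct registers touched in each complete k-instruction window.

    trace: list with one iterable of register numbers per instruction.
    Windows are taken back to back without overlap (an assumption).
    """
    counts = []
    for start in range(0, len(trace) - k + 1, k):
        regs = set()
        for instr in trace[start:start + k]:
            regs.update(instr)
        counts.append(len(regs))
    return counts


def scenario_distribution(counts):
    """Empirical probability of each 'n distinct registers' scenario."""
    total = len(counts)
    hist = Counter(counts)
    return {n: c / total for n, c in sorted(hist.items())}


# Synthetic trace: 60 instructions using r1-r3, then 60 cycling through r0-r4.
trace = [(1, 2, 3)] * 60 + [(i % 5,) for i in range(60)]
counts = window_scenarios(trace)
dist = scenario_distribution(counts)
```

Each key of the resulting dictionary is a scenario (a register count) and each value is its probability, which is the form of the curves in figure 10.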

A.2 Modeling the problem

We model the problem of increasing the power-awareness of register files using the terminology developed in section III:

1. Function to be realized (F ): A 2^n-word x m-bit register file with r read ports and t write ports.
2. Set of scenarios (S): We use the number of registers accessed in an instruction window of length k to characterize scenarios. In picking k, one must remember that the longer the window, the larger the number of accessed registers, leading to less differentiation. A smaller window needs frequent changes to the scenario to point-system mapping, which has energy costs too. In this paper, we choose k = 60.
3. Point Systems Available (P ): We assume the availability of 1, 2, 4, ..., 2^n word x m-bit register files with r read ports and t write ports. Hence, the number of words is the only degree of freedom allowed. While it is possible to have more exotic point systems (different read and write ports, bit-widths etc.), our choice is reasonable and works well when practically implemented.
4. Scenario Distributions (d): The twenty register access profiles in figure 10 are the scenario distributions.
5. Energy function (e) and overhead energy (w): All register file results were obtained by generating layouts using a custom-written program, extracting the layouts into SPICE netlists, and simulating the netlists in PowerMill(TM) with test vectors. The register files themselves were implemented using NAND-style row decoding in dynamic logic with precharged address decoding lines, and use a standard cross-coupled inverter pair for static storage. The file that we use to illustrate power-aware engineering is a 32x4 bit, 3 read, 2 write port file. We chose m = 4 although, as long as m is not unreasonably large, it does not affect the results in any material way. This is because the bitline switched capacitance is essentially independent of m.

A.3 Results

A monolithic 32-word file has an awareness varying between 0.2 and 0.3 for the different distributions. Using a (16,8,4,4) ensemble as shown in figure 11 we increase awareness to between 0.5 and 0.8 for the different distributions.

The energy curves of the single point solution (a 32-word file) and the 4-point (16,8,4,4) ensemble are plotted in figure 12. Interpreted in terms of lifetime increase, the non-uniform 4-point ensemble increases lifetime by between 2 and 2.5 times for the twenty distributions used.

[Figure 11: block diagram. Address and Data lines pass through Bank Select Logic to four banks: Bank-0 (4 registers), Bank-1 (4 registers), Bank-2 (8 registers) and Bank-3 (16 registers).]

Fig. 11. This (16,8,4,4) ensemble of 4 register files increases power-awareness by twice for the d1 scenario distribution and about 2.5 times for the d2 distribution.

[Figure 12: plot of energy per access (nJ) versus registers accessed for two curves: E(32) and E(16,8,4,4).]

Fig. 12. Energy curves of a monolithic 32-word file and the non-uniform 4-point ensemble (16,8,4,4).

B. Power Aware Filters

B.1 Motivation

There are significant motivations for investigating power-aware filters. As an example, consider the adaptive equalization filters that are ubiquitous in communications ASICs. The filtering quality requirements depend strongly on the channel conditions (line lengths, noise and interference), the state of the system (training, continuous adaptation, freeze etc.), the standard dictated specifications and the quality of service (QoS) desired. All these considerations lead to tremendous scenario diversity which a power-aware filtering system can exploit [9].

B.2 Modeling the problem

1. Function to be realized (F ):

$$y[n] = \sum_{k=1}^{\text{Number of Taps}} h[k]\, x[n-k]$$


We have chosen a 64-tap, 24-bit filter.
2. Set of scenarios (S): We use the basis <Number of taps, Precision> to characterize the operational state that the system is in. The precision refers to both the data and coefficients.
3. Point Systems Available (P ): We assume the availability of all possible <Number of taps, Precision> filters. We pick distributed arithmetic (DA) filters as described in [10] because they allow the energy to scale with both taps and desired precision. A 4-tap DA filter is shown in figure 13. In each step, incoming and delayed data bits with the same weights are used to access a memory which has pre-computed combinations of coefficients h[n]. This precomputed value is then either added to or subtracted from a partially accumulated sum y[n]. The number of cycles needed is the same as the precision of the multiplicands. Thus filters that have to manage precisions lower than the maximum can scale their voltages lower and still meet deadlines. Hence, this filter architecture allows extremely fine-grained control over energy dissipation and is highly power-aware. The problem is that since it relies on lookups, it has to resort to partitioning and hybrid schemes to remain feasible as the number of taps grows [10].

[Figure 13: block diagram. The bits of x[n], x[n-1], x[n-2] and x[n-3] (LSB to MSB) address a 16-entry memory holding every partial sum of the coefficients h[0]..h[3] (0, h[0], h[1], h[1]+h[0], ..., h[3]+h[2]+h[1]+h[0]); the memory output feeds an Add/Sub unit with a x2 feedback path that accumulates y[n].]

Fig. 13. A 4-tap distributed arithmetic (DA) filter architecture.

4. Scenario Distributions (d): We model the desired filtering quality using a synthetic distribution centered around a <16-taps, 8-bit> scenario. Such a distribution will prevail, for instance, when the system is in the freeze mode with a high line quality and/or low SNR requirements.
5. Energy function (e) and overhead energy costs (w): We parametrically model the filters described since the nature of the DA architecture lends itself to a reasonably accurate energy model [5]. The energy curve that results from this model is shown in figure 15. Note that while energy scales about linearly with the number of taps, it scales in a quadratic manner with precision. This is because lower precision filters can scale their voltage.

[Figure 14: 3-D plot of probability d (scale x 10^-3) versus taps and precision (bits).]

Fig. 14. Probability distribution of anticipated adaptive filter quality needed when the system is in “freeze” mode with good line conditions and/or low SNR requirements.
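The distributed arithmetic scheme described for figure 13 can be sketched in Python as follows. This is a behavioral illustration, not the hardware of [10]: it assumes two's complement data, processes bits from MSB to LSB with a doubling accumulator (one plausible reading of the x2 and Add/Sub blocks), and pairs h[k] with x[n-k] for k = 0..3 as the figure labels suggest. The function names are hypothetical.

```python
def build_lut(h):
    """Precompute all partial sums: lut[a] = sum of h[k] for each set bit k of a."""
    return [
        sum(hk for k, hk in enumerate(h) if (a >> k) & 1)
        for a in range(1 << len(h))
    ]


def da_output(lut, samples, bits):
    """One output of a DA filter.

    samples[k] holds x[n-k] as a signed integer representable in `bits` bits
    (two's complement). One table lookup is performed per bit of precision.
    """
    acc = 0
    for j in range(bits - 1, -1, -1):           # MSB first
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> j) & 1) << k         # bit j of every sample
        term = lut[addr]
        # The sign bit carries negative weight, so its term is subtracted.
        acc = 2 * acc + (-term if j == bits - 1 else term)
    return acc


h = [3, -1, 4, 2]                # hypothetical coefficients
samples = [5, -3, 7, -8]         # x[n], x[n-1], x[n-2], x[n-3] in 4-bit range
lut = build_lut(h)
y = da_output(lut, samples, bits=4)   # equals 3*5 + (-1)*(-3) + 4*7 + 2*(-8)
```

The loop runs once per bit of precision, which is the property that lets lower-precision scenarios finish in fewer cycles. The table has 2^taps entries, which illustrates why large tap counts force partitioning.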

[Figure 15: 3-D plot of E(Perfect) (scale x 10^4) versus taps and precision (bits).]

Fig. 15. The “perfect” energy curve for 64-tap, 24-bit DA-based filtering.

B.3 Results

Before we illustrate suitable ensemble constructions that enhance power-awareness, it is instructive to look at the energy characteristics of the perfect system. Figure 16 plots the product of scenario energy and scenario probability for the perfect system (which would be an ensemble of 1536 (= 64 x 24) point systems). The scenario energy-probability product curve shows the energy consumed as a function of scenarios. Note that although energy consumption around the (16-tap, 8-bit) scenario is clearly prominent in figure 16, some high-precision, high-tap scenarios also account for significant contributions to the overall energy consumed. This is easily understood because although they occur infrequently (as seen in the distribution plot in figure 14), they consume significant energy when they do occur (as seen in the energy plot in figure 15).


[Figure 16: 3-D plot of EPerfect x d versus taps and precision (bits).]

Fig. 16. The perfect energy-probability curve for 64-tap, 24-bit DA-based filtering.

If we used a single, 64-tap, 24-bit filter (i.e. a one point ensemble), the resultant energy-probability product curve turns out to be the one plotted in figure 17. A rough comparison of the energies consumed by different scenarios in this system to that in the perfect system shows that the former is significantly non-optimal. In fact, the power-awareness of the single point system is only 0.17.

[Figure 17: 3-D plot of E(64,24) x d versus taps and precision (bits).]

Fig. 17. The energy-probability curve for a single 64-tap, 24-bit DA-based filter.

To find better ensembles, we programmed a brute-force exhaustive search algorithm that could find the best 4-point ensemble. Because its running time grows exponentially, exhaustive search became impractical beyond 4 points and a greedy heuristic took over. The optimal 4-point ensemble turns out to be ((64, 24), (64, 15), (58, 20), (51, 10)) (see footnote 5) as shown in figure 18.

Its energy-probability curve is plotted in figure 19. Note that although this ensemble is not perfect, it has a power-awareness of 0.52, which is over three times better than the single point ensemble.

Interestingly, our greedy heuristic revealed that if we include 4 more points - (30,17), (43,23), (64,7), (43,13) - in the above ensemble, we can increase the power awareness to 0.64.

5 (64,24) stands for 64-tap, 24-bit precision etc.

[Figure 18: block diagram. An Arbiter routes X[n] to one of four FIR filters (51-tap 10-bit, 58-tap 20-bit, 64-tap 24-bit, 64-tap 15-bit) to produce Y[n].]

Fig. 18. This 4-point ensemble of DA filters improves power-awareness by over three times over a single point system.

[Figure 19: 3-D plot of E(4-point ensemble) x d versus taps and precision (bits).]

Fig. 19. The energy-probability curve for the 4-point ensemble in figure 18. Note the similarity to the “perfect” curve in figure 16.

C. Power-Aware Processors

C.1 Motivation

Having looked at three examples of power-aware sub-systems (multipliers, register files and digital filters), we illustrate power-awareness at the next level of the system hierarchy - a power-aware processor that scales its energy with workload. Unlike previous examples, however, this one illustrates how an ensemble can be realized in a purely temporal rather than a spatial manner.

It is well known that processor workloads can vary significantly and it is highly desirable for the processor to scale its energy with the workload. A powerful technique that allows such power-awareness is dynamic frequency and voltage scaling [11]. The basic idea is to reduce energy in non-worst-case workloads by extending them to use all available time, rather than simply computing everything at the maximum clock speed and then going into an idle or sleep state. This is because using all available time allows one to lower the frequency of the processor which in turn allows scaling down the voltage, leading to significant energy savings [11], [12], [13].

In terms of the power-awareness framework that we have developed, a scenario would be characterized by the workload. The point systems would be processors designed to manage a specific workload. As the workload changes, we


would ideally want the processor designed for the instantaneous workload to execute it. It is clear that implementing such an ensemble spatially is meaningless; it must instead be realized temporally using a dynamic voltage scaling system. Before we look at such a system, we state the problem more concisely.

C.2 Modeling the problem

1. Function to be realized (F ): Any workload running on a given processor. In this case, the processor we use is the Intel StrongARM SA-1100. The workload variation comes from a variable tap filter running on the SA-1100 (the reader is referred to [12] for details of the actual setup).
2. Set of scenarios (S): We use the workload ∈ [0, 1] as a basis (with 0 for no workload to 1 for a completely utilized processor). Note that the workload requirement has a one-one mapping to a frequency and voltage requirement.
3. Point Systems Available (P ): A point system in this case would refer to the SA-1100 designed for a specific workload. Since we are interested in achieving power awareness through voltage scaling, this corresponds to an SA-1100 with a dedicated voltage and frequency (which are the minimum possible to achieve the workload). Also, due to an infinite number of scenarios, there are an infinite number of point systems - one for every workload between 0 and 1. Equivalently, in terms of voltages, there are an infinite number of point systems between 0 and V_dd^max, the latter being the highest voltage the SA-1100 can run at, which also corresponds to its highest frequency and a workload of unity.
4. Scenario distribution (d): We assume, for simplicity, that all workloads are equally probable. As we see below, such an assumption is pessimistic and in real applications, we can expect to see even better numbers for power-awareness.
5. Energy function (e) and energy overhead (w): The energy dissipated by the SA-1100 was physically measured.

C.3 Results

We now analyze an actually constructed system that recently demonstrated this power-awareness concept [12]. The overall setup is summarized in figure 20 adapted from [12]. The basic idea is that a power-aware operating system (µ-OS) running on the SA-1100 determines the current workload, scales the frequency accordingly and then instructs a switched regulator supply to scale the voltage accordingly. The reader is referred to [12] for the details of the setup and the dynamic voltage circuitry etc.

The DVS system uses a temporal ensemble of 32 point systems with voltage levels uniformly distributed between 0 and V_dd^max. The energy-curves of a non-aware i.e. fixed voltage system and the implemented dynamic voltage system are plotted in figure 21.
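The qualitative difference between the two energy curves can be illustrated with an idealized first-order model. The sketch below is not the measured SA-1100 behavior: it assumes that energy per operation scales as the square of the supply voltage, that the maximum clock frequency is proportional to the voltage, and that idling costs nothing. The function names are hypothetical. All energies are normalized to the full-workload case.

```python
import math


def fixed_voltage_energy(workload):
    """Run at full voltage, then idle for free (assumption): energy ~ workload."""
    return workload


def dvs_energy(workload, levels=32):
    """Pick the lowest of `levels` uniformly spaced voltages that still meets
    the workload (frequency assumed proportional to voltage), then pay
    workload * V^2 with V normalized to the maximum supply voltage."""
    level = min(levels, max(1, math.ceil(workload * levels)))
    v = level / levels
    return workload * v * v


# Sample both curves on a grid of workloads between 0 and 1.
grid = [i / 100 for i in range(101)]
fixed_curve = [fixed_voltage_energy(x) for x in grid]
dvs_curve = [dvs_energy(x) for x in grid]
```

In this model the two curves meet at a workload of 1 and the DVS curve lies below the fixed-voltage line everywhere else. The measured curves in figure 21 need not follow these idealized shapes, since a real processor has a minimum operating voltage and regulator losses.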

For uniform workload distributions, power-awareness improves from 0.63 for a fixed voltage system to 1.0 for the implemented dynamic voltage system. Note that although the 32-point ensemble is by no means perfect, it was chosen as a reference to define the power-awareness (since the

[Figure 20: block diagram with the blocks SA-1100 (running the µ-OS), Controller + Prog. Logic and Buck Regulator, and the signals Vddmax, Desired Supply Voltage (Digital Value), Variable Vdd and Vdd.]

Fig. 20. The dynamic voltage scaling (DVS) system used to enhance power-awareness.

[Figure 21: plot of normalized energy (0 to 1) versus workload (0 to 1) for two curves: E(Fixed Voltage) and E(DVS).]

Fig. 21. The energy curves of the fixed voltage and DVS system. Both curves have been normalized with respect to the maximum load case.

ratio of the power-awareness of one system to the other is independent of the perfect system). Hence, for uniform load distributions, DVS leads to battery lifetime increases of about 60%.
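The step from power-awareness figures to lifetime gains is simple arithmetic, sketched below. It assumes that, for a fixed scenario distribution, expected battery lifetime is inversely proportional to expected energy, so that the ratio of two power-awareness values equals the ratio of the corresponding lifetimes. The helper name lifetime_gain is hypothetical.

```python
def lifetime_gain(phi_old, phi_new):
    """Ratio of expected lifetimes of two systems with the given power-awareness.

    The energy of the perfect (reference) system cancels in the ratio, so the
    choice of reference does not matter.
    """
    return phi_new / phi_old


dvs_gain = lifetime_gain(0.63, 1.0)      # about 1.59, i.e. roughly 60% longer
filter_gain = lifetime_gain(0.17, 0.52)  # just over 3 for the 4-point filter ensemble
```

The first value reproduces the increase of about 60% quoted for DVS, and the second matches the "over three times" improvement reported for the 4-point filter ensemble.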

D. Power-Aware Data Gathering Wireless Networks

D.1 Motivation

Increasing levels of integration and advanced low-power techniques are enabling ad-hoc, wireless networks of microsensor nodes. Each node is composed of a sensor, analog pre-conditioning circuitry, A/D, processing elements (DSP, RISC, FPGA etc.) and a radio link, all powered by a battery. Replacing high quality macro-sensors with such networks has several advantages - robustness and fault-tolerance, autonomous operation for years, enhanced data quality and optimal cost-performance [14], [15]. Such data gathering networks are expected to find wide use in remote monitoring applications, intrusion detection, smart-medicine etc. An illustrative data gathering network is shown in figure 22. The network is live as long as it can guarantee that any source in region R will be sensed and the data relayed back to a fixed basestation. To accomplish this objective, different nodes take on different roles over the lifetime of the network as seen in the figure. A noteworthy point is that nodes must often change roles even if the source does not move. This is to enable energy drain to be spread throughout the network which leads to increased lifetimes. An assignment of roles to nodes that leads to data-gathering is termed a feasible role assignment (see footnote 6). A data-gathering strategy or collaborative strategy can be completely characterized by specifying a sequence of feasible role assignments and the time for which the assignment is sustained.

[Figure 22: network diagram showing the basestation B, the shaded region R, source positions S0, S1 and S2, and nodes numbered 1 to 10.]

Fig. 22. A sensor network gathering data from a circularly observable source (denoted by a ×) residing in the shaded region R. Live nodes are denoted by • and dead ones by ◦. The basestation is marked B. In this example we require that at least two nodes sense the source. When the source is at S0, nodes 1 and 7 assume the role of sensors and nodes 2 → 3 → 4 → 5 → 6 form the relay path for data from node 1 while nodes 7 → 8 → 9 → 5 → 6 form the relay path for data from node 7. Data might be aggregated into one stream at node 5. This is not the only feasible role assignment that allows the source to be sensed. For instance, node 10 could act as the second sensor instead of node 7 and 10 → 7 → 8 → 4 → 5 → 6 could form the corresponding relay path. Also, node 6 might aggregate the data instead of node 5 etc. Finally, note how the sensor, aggregator and relay roles must change as the source moves from S0 to S1.

A key challenge in unlocking the potential of data-gathering networks is attaining long lifetime despite the severely energy constrained nature of the network. For example, networks composed of ultra-compact nodes carrying less than 2 J of battery energy might be expected to last for 5-10 years [16]. It is possible to address these challenges by power-aware design. Data-gathering networks can be aware of the desired quality of gathered data, of changing source behavior, of the changing state of the network and finally of the environment in which they reside. In this section, we focus on this last aspect i.e. we tackle the problem of designing a power-aware data gathering network that tracks changes in the environment to maximize energy efficiency. It is well known that the transmit power can be scaled with changing noise power to maintain the

6We also require feasible role assignments to be non-redundant i.e.data from a sensor should not be routed via multiple links.

same SNR and hence the same link performance. A moreholistic approach is to view environmental variations asaffecting changes in the energy needed to process a bit(i.e. carry out some computation on it) versus the en-ergy needed to communicate it. A power-aware network isthen simply one that can track changes in the computation-to-communication energy ratio. For large ratios i.e. highcomputation costs, the network will favor unaggregated orraw sensor streams. Conversely, for low ratios i.e. highcommunication costs, aggregation will be favored. Hence,the challenge in power-aware data gathering is to deter-mine and execute the collaborative strategy that assignsroles optimally for a specified computation to communica-tion energy ratio.

D.2 Modelling the Problem

1. Function to be realized (F): Gathering data from a specified point source using a specified network. The network is specified by its topology (including the location of the basestation) and the initial energy in the nodes.

2. Set of scenarios (S): A scenario is characterized by the ratio of computation to communication energy.

3. Point systems available (P): A point system simply corresponds to a collaborative strategy as defined above.

4. Scenario distribution (d): We will analyze the power-awareness for uniform scenario distributions.

5. Energy function (e) and energy overhead (w): The energy needed for communicating a bit over a distance $d$ is modelled using a $d^n$ path-loss model as $\alpha_1 + \alpha_2 d^n$, where $n$ is typically between 2 and 4 [17]. The energy required to aggregate two bits into one is denoted by $\alpha_3$. Nominal values of $\alpha_1$, $\alpha_2$ and $\alpha_3$ are typically 150 nJ/bit, 10 pJ/bit/m$^2$ and 50 nJ/bit, respectively [16]. The computation to communication energy ratio is defined as $\alpha_3/\alpha_2$ because variations in channel noise mainly mandate changes in the power dissipation in the transmit amplifier.
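As a rough numerical illustration of the energy model above, the sketch below evaluates the per-bit communication energy and compares forwarding two raw bits against aggregating them first. It is a simplification under stated assumptions, not the model used to produce the results of this paper: the function names are hypothetical, $n = 2$ is assumed so that the nominal units of $\alpha_2$ apply, and the only costs counted are one aggregation of $\alpha_3$ and the transmissions over the remaining hops.

```python
ALPHA1 = 150e-9  # J/bit, distance-independent communication energy (nominal)
ALPHA2 = 10e-12  # J/bit/m^2, transmit amplifier coefficient (nominal, n = 2)
ALPHA3 = 50e-9   # J/bit, energy to aggregate two bits into one (nominal)


def comm_energy_per_bit(distance_m, n=2.0, alpha1=ALPHA1, alpha2=ALPHA2):
    """Energy to communicate one bit over distance_m: alpha1 + alpha2 * d^n."""
    return alpha1 + alpha2 * distance_m ** n


def energy_ratio(alpha3=ALPHA3, alpha2=ALPHA2):
    """Computation-to-communication energy ratio alpha3 / alpha2."""
    return alpha3 / alpha2


def aggregation_saves_energy(hop_distances_m, n=2.0, alpha1=ALPHA1,
                             alpha2=ALPHA2, alpha3=ALPHA3):
    """Compare two ways of moving two bits over the remaining hops.

    Raw: both bits cross every hop. Aggregated: pay alpha3 once, then a
    single bit crosses every hop. Simplifying assumption: no other costs.
    """
    downstream = sum(comm_energy_per_bit(d, n, alpha1, alpha2)
                     for d in hop_distances_m)
    raw = 2.0 * downstream
    aggregated = alpha3 + downstream
    return aggregated < raw


# One bit over a 100 m hop costs 150 nJ + 10 pJ/m^2 * (100 m)^2 = 250 nJ.
print(comm_energy_per_bit(100.0))
# At nominal values, aggregating before a single 100 m hop is cheaper.
print(aggregation_saves_energy([100.0]))
# With a tenfold higher computation cost, raw forwarding is cheaper.
print(aggregation_saves_energy([100.0], alpha3=10 * ALPHA3))
```

Under this simplified accounting, aggregation pays off whenever $\alpha_3$ is smaller than the energy of sending one bit over the remaining path, which is the qualitative trade-off between raw and aggregated streams described above.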

D.3 Results

To illustrate power-aware data gathering, we simulated an 8-node network for a whole range of computation to communication ratios centered about the nominal ratio. Specifically, the ratio was varied in steps of 3 dB, starting at 30 dB below nominal and ending at 24 dB above nominal. For each ratio, the optimal collaborative strategy was determined via linear programming. This strategy was executed by the network and the lifetime recorded. The inverse of the lifetime was used as a measure of the average data-gathering power dissipated in the network. Figure 23 shows the variation in the data-gathering power with changing scenarios. The impact of adapting the data-gathering strategy to track the energy ratio is clear: the power-aware network displays close to two orders of magnitude of dissipation diversity. Translated in terms of the proposed power-awareness metric, the temporal ensemble of collaborative strategies is 3.22 times more power-aware than an unaware network for a uniform scenario distribution. Rephrased, power-aware data-gathering can increase network lifetime by more than 3 times compared to an unaware network.
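The scenario sweep and the power measure described above can be sketched as follows. This is only an illustration of the bookkeeping, not the simulation itself: the helper names are hypothetical, the dB values are assumed to follow the usual $10 \log_{10}$ convention for energy ratios, and the linear program that finds the optimal strategy for each scenario is not reproduced.

```python
import math


def sweep_db(start_db=-30, stop_db=24, step_db=3):
    """Ratio offsets from nominal, in dB, for every simulated scenario."""
    return list(range(start_db, stop_db + 1, step_db))


def db_to_linear(offset_db):
    """Multiplier on the nominal ratio (assumes 10*log10 convention)."""
    return 10.0 ** (offset_db / 10.0)


def average_power_db(lifetime, nominal_lifetime):
    """Average data-gathering power relative to nominal, in dB.

    Power is taken as the inverse of lifetime, so a shorter lifetime
    than nominal yields a positive value.
    """
    return 10.0 * math.log10(nominal_lifetime / lifetime)


scenarios = sweep_db()
print(len(scenarios))            # 19 scenarios from -30 dB to +24 dB
print(db_to_linear(-30))         # 0.001 times the nominal ratio
print(average_power_db(0.1, 1))  # a tenfold shorter lifetime is +10 dB
```

With these step sizes the sweep contains 19 scenarios, each of which would be passed to the strategy optimizer in turn.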


Fig. 23. The energy curve of a power-aware data gathering network. (Horizontal axis: computation:communication energy in dB, 0 = nominal. Vertical axis: average data gathering power in dB, 0 = nominal.)

V. Conclusions

In this paper, our objective was two-fold: to quantify the increasingly important notion of power-awareness of a VLSI system and, having done that, to propose a systematic technique to enhance this quality.

The first step in quantifying the power-awareness of a general system was to develop the notion of a perfectly power-aware system ($H_{perfect}$). The awareness of this system was shown to be an upper bound on practically achievable power-awareness. In the next step, we proposed a power-awareness metric whose physical interpretation is the expected battery lifetime of a system normalized to the lifetime of the perfect system.

Next, the problem of enhancing power-awareness was treated formally using the concept of ensembles of point systems. We showed that constructing systems by intelligently putting together dedicated point systems could significantly enhance power-awareness. The basic factor that limited a monotonic increase in power-awareness as more and more point systems were put together was the increasing amount of energy spent in coordinating these point systems. Hence, the problem of finding an optimal subset of point systems that struck the right balance was formally proposed. While it is unlikely that this optimal subset can be found using polynomial time algorithms, greedy heuristics were seen to work reasonably well.

The technique of ensemble construction was illustrated using four different applications: multipliers, register files, digital filters and dynamic voltage scaled processors. Significant power-awareness improvements, leading to system battery lifetime improvements in the range of 60% to 200%, were seen.

It is our sincere hope that the power-awareness metric proposed here will be used to quantify this important aspect of VLSI systems and that the proposed framework will be employed by system architects to engineer systems that scale their power and energy requirements with changing operating scenarios, leading to significant improvements in overall battery lifetimes.

Acknowledgments

M. Bhardwaj is supported by an IBM Research Fellowship. R. Min is supported by an NDSEG Fellowship. This research is sponsored by the Defense Advanced Research Projects Agency (DARPA) Power Aware Computing/Communication Program and the Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-00-2-0551.

References

[1] A. Sinha, A. Wang, and A. P. Chandrakasan, Algorithmic Transforms for Efficient Energy Scalable Computation, Proceedings of the International Symposium on Low Power Electronics and Design, 2000.

[2] S. H. Nawab et al., Approximate Signal Processing, Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol. 15, no. 1/2, Jan. 1997, pp. 177-200.

[3] L. McMillan and L. A. Westover, A Forward-Mapping Realization of the Inverse Discrete Cosine Transform, Proceedings of the Data Compression Conference, March 1992, pp. 219-228.

[4] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective, Addison-Wesley Publishing Company, 1994.

[5] A. Sinha and A. P. Chandrakasan, Energy Efficient Filtering Using Adaptive Precision and Variable Voltage, 12th Annual IEEE ASIC Conference, Sept. 1999.

[6] T. Martin and D. Siewiorek, A Power Metric for Mobile Systems, Proceedings of the 1996 International Symposium on Low Power Electronics and Design, 1996, pp. 37-42.

[7] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Company, New York, NY, 1979.

[8] V. Zyuban and P. Kogge, The Energy Complexity of Register Files, Proc. 1998 International Symposium on Low Power Electronics and Design (ISLPED), 1998, pp. 305-310.

[9] C. J. Nicol et al., A Low Power 128-Tap Digital Adaptive Equalizer for Broadband Modems, IEEE Journal of Solid State Circuits, vol. 32, no. 11, Nov. 1997, pp. 1777-1789.

[10] R. Amirtharajah, T. Xanthopoulos, and A. P. Chandrakasan, Power Scalable Processing Using Distributed Arithmetic, Proc. 1999 International Symposium on Low Power Electronics and Design, 1999, pp. 170-175.

[11] V. Gutnik and A. P. Chandrakasan, Embedded Power Supply for Low-Power DSP, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, Dec. 1997, pp. 425-435.

[12] R. Min, T. Furrer, and A. P. Chandrakasan, Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks, Proceedings of the IEEE Computer Society Annual Workshop on VLSI (WVLSI '00), April 2000.

[13] T. Burd, T. Pering, A. Stratakos, and R. Brodersen, A Dynamic Voltage Scaled Microprocessor System, Proc. ISSCC 2000, pp. 294-295.

[14] G. Pottie, Wireless Sensor Networks, Information Theory Workshop, 1998, pp. 139-140.

[15] G. Pottie and W. Kaiser, Wireless Integrated Network Sensors, Communications of the ACM, vol. 43, no. 5, May 2000, pp. 51-58.

[16] W. Heinzelman, Application-Specific Protocol Architectures for Wireless Networks, Ph.D. thesis, Massachusetts Institute of Technology, 2000.

[17] T. Rappaport, Wireless Communications: Principles & Practice, Prentice-Hall, Inc., New Jersey, 1996.