Power Efficient Technology Decomposition and Mapping Under an Extended Power Consumption Model
8/3/2019 Power Efficient Technology Decomposition and Mapping Under an Extended Power Consumption Model
http://slidepdf.com/reader/full/power-efficient-technology-decomposition-and-mapping-under-an-extended-power 1/13
1110 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 1994
Power Efficient Technology Decomposition and Mapping Under an Extended Power Consumption Model
Chi-Ying Tsui, Massoud Pedram, Member, IEEE, and Alvin M. Despain, Member, IEEE
Abstract-We propose a new power consumption model that accounts for the power consumption at the internal nodes of a CMOS gate. Next, we address the problem of minimizing the average power consumption during the technology dependent phase of logic synthesis. Our approach consists of two steps. In the first step, we generate a NAND decomposition of an optimized Boolean network such that the sum of average switching rates for all nodes in the network is minimum. In the second step, we perform a power efficient technology mapping that finds a minimal power mapping for given timing constraints (subject to the unknown load problem).
I. INTRODUCTION
PORTABLE applications have shifted from conventional low performance products such as wristwatches and calculators
to high throughput and computationally intensive
products such as notebook computers and cellular phones.
The new applications require high speed yet low power
consumption, because for such products longer battery life
translates to extended use and better marketability. With
the convergence of telecommunications, computers, consumer
electronics, and biomedical technologies, the number of low-
power applications is expected to grow. Another driving force
behind design for low power is that excessive power con-
sumption is becoming the limiting factor in integrating more
transistors on a single- or multi-chip module due to cooling, packaging, and reliability problems. Exploring the trade-offs
between area, performance, and power during synthesis and
design is thus demanding more attention.
Prior Work
Many researchers (in the solid state circuits and technology areas) have been studying low-power/low-voltage design techniques. For example, research is being conducted in low power DRAM and SRAM designs, aggressive voltage scaling, and process optimization for active logic circuits, device modeling and simulation tools for low power circuits, low power analog circuit design, etc. Other researchers (in the computer architecture area) are exploring instruction set architectures and novel memory management schemes for low-power processor design using self-clocking, static, and dynamic power management strategies, etc. The computer-aided design community has recently started paying more attention to power estimation and low-power design.
Manuscript received August 16, 1993; revised March 18, 1994. This research was supported in part by the NSF's Research Initiation Award under contract No. MIW211668 and DARPA under contract No. J-FBI-91-194. This paper was recommended by Associate Editor R. E. Bryant.
The authors are with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089 USA.
IEEE Log Number 9401870.
Power consumption is a function of the switching of the load
capacitances, which in turn depends on the sequence of logic
values applied to the circuit inputs. It is impractical to exhaustively
simulate all possible input sequences, hence probabilistic
approaches are developed to estimate the switching activities.
If the set of all possible transitions at the inputs of the circuit
is known, we can consider each transition as an event with a
certain probability. Over this probability space, the transition
at an internal node of the circuit becomes a stochastic process [16]. The statistical information about the node activities can
then be captured by the expected number of transitions which
in turn depends on the signal and switching probabilities of
the node.
Several researchers have studied the problem of estimating
power consumption. Cirit [6] estimates the dynamic power
consumption at the transistor level based on the probability
of the drain of a transistor making a power-consuming tran-
sition. Burch et al. [4] introduce the concept of probability
waveforms. Given such waveforms at the primary inputs and
with some convenient partitioning of the circuit, they examine
every sub-circuit and derive corresponding waveforms at the internal circuit nodes. Najm et al. [15] use the notions of
transition probabilities and probabilistic simulation to estimate
the average current drawn by a circuit. These methods assume
inputs to sub-circuits are independent and thus do not account
for the reconvergent fanout and input correlations.
Ghosh et al. [9] propose symbolic simulation in order to
produce a set of Boolean functions that represent conditions for
switching at each gate in the circuit. A general delay model that
correctly computes the Boolean conditions causing glitches
is used and correlations due to the reconvergence of input
signals are taken into account.
Recently, research on synthesis for low power has been carried out. Algorithms that minimize power consumption during multi-level logic minimization [19], [17], state assignment for finite state machines [17], and placement [22] have been proposed.
0278-0070/94$04.00 © 1994 IEEE
All of the above power-estimation tools and low-power synthesis and design algorithms are based on a simple power-consumption model where power is consumed only during the charging and discharging of the external capacitances driven by the gates. This assumption underestimates the power
consumption of the circuit since it does not account for the
power consumption due to the charging and discharging of
the source/drain capacitance of the internal nodes of the gates.
Power consumption due to internal capacitances contributes
a significant portion of the total power consumption in the
circuit (see Section V). Any power-estimation tool ignoring
this factor will thus be inaccurate.
In this paper, we propose a new power-consumption model
that considers the internal power consumption and present a
technique to estimate the average internal power consumption
for dynamic DOMINO CMOS as well as static CMOS circuits.
We also propose a low-power technology decomposition pro-
cedure that produces a decomposed network with minimum
switching activities and a low-power technology mapping
algorithm that hides the high switching nodes inside gates
and reduces the fanout load driven by high switching activity
nodes.
Calculation of Switching Probabilities
Estimation of switching probabilities depends on the delay
model used. Under a zero delay model where gate delays are
assumed to be zero [9], signal transitions due to glitching are
ignored. This tends to underestimate the power consumption.
There is, however, a good correlation between the estimated
power under a zero delay model and a real delay model [19].
In the following, we assume a zero delay model (see Section
VI, however, for more discussion on issues regarding the use
of a real delay model).
Input correlations also affect the calculation of signal prob-
abilities. In combinational circuits such as datapath modules,
the input patterns are usually random. Thus, the primary inputs
can be assumed to be spatially and temporally uncorrelated.
What this means is that each circuit input is independent of
the others, and furthermore, the present value of the input is
independent of its previous value. In sequential circuits such as finite state machines, the present state bits are spatially
and temporally correlated, thus the independence assumption
does not hold. Reference [21] describes exact and approximate
methods to calculate signal and transition probabilities in
such circuits. In the remainder of this paper, we assume that
the circuit inputs are spatially uncorrelated (see Section VI,
however, for a method to approximate the spatial correlations
among circuit inputs).
For n-type dynamic circuits, the output is precharged to 1 and hence the switching probability is given by
[equation (1) missing from the source]
where $P_{O=0}$ is the probability that signal $O$ assumes value 0. For p-type dynamic circuits, the output is predischarged to 0 and hence the switching probability is given by
[equation (2) missing from the source]
where $P_{O=1}$ is the probability that signal $O$ assumes value 1.
For static circuits, if we assume that the present input signal value is independent of the previous value, then the switching probabilities are given by
[equation (3) missing from the source]
$P_{O_1(x \to y) \mid O_2(u \to v)} = P_{O_1 = x \mid O_2 = u}\, P_{O_1 = y \mid O_2 = v}$ (4)
where $P_{O_1 = x \mid O_2 = y}$ is the probability that signal $O_1$ assumes $x$ given that signal $O_2$ assumes $y$. The signal probability at the output of a node is calculated by first building an Ordered Binary-Decision Diagram (OBDD) [3] corresponding to the global function of the node and then performing a linear traversal of the OBDD using the procedure given in [14].
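As a rough illustration of the signal-probability step, the following sketch computes the probability that a small Boolean function evaluates to 1 when the inputs are spatially independent. It uses brute-force enumeration of input vectors as a stand-in for the OBDD traversal of [14], so it is only practical for a handful of inputs. The helper name `signal_probability` and the example function are hypothetical and not taken from the paper.

```python
from itertools import product

def signal_probability(func, input_probs):
    """Return P(func = 1) for spatially independent inputs.

    func        -- callable taking a tuple of 0/1 values (hypothetical helper)
    input_probs -- list with P(input_k = 1) for each input
    Brute-force enumeration: exponential in the number of inputs,
    unlike the linear OBDD traversal referenced in the text.
    """
    total = 0.0
    for vector in product((0, 1), repeat=len(input_probs)):
        # Probability of this input vector under the independence assumption.
        weight = 1.0
        for bit, p in zip(vector, input_probs):
            weight *= p if bit else 1.0 - p
        if func(vector):
            total += weight
    return total

# Example (assumed values): f = (a AND b) OR c
p_one = signal_probability(lambda v: (v[0] and v[1]) or v[2], [0.5, 0.5, 0.2])
```

For this example the result is 1 - (1 - 0.25)(1 - 0.2) = 0.4, which can be checked by hand.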
Organization of the Paper
The rest of this paper is organized as follows. In Section II we review the conventional power-consumption models and present an extended model that accounts for the internal power consumption. We also introduce the notion of charging and discharging functions for an internal node of a gate and present techniques that use these functions to estimate the internal power consumption. In Section III, we describe a procedure for constructing a NAND-decomposed network that has minimum total switching activity. Section IV presents a technology mapping paradigm for low power that exploits the power/delay trade-off curves to generate a mapped network with minimum power consumption subject to given timing constraints. Experimental results are presented in Section V. We discuss some issues and extensions in Section VI and give conclusions in Section VII.
II. POWER CONSUMPTION MODELING AND ANALYSIS
A Simple Power Consumption Model
Power consumption in CMOS circuits is caused by three sources: the charging and discharging of capacitive loads during output switching, the short circuit current that flows during output transitions, and leakage current. The last two sources can be made small with proper device and circuit design techniques [23]. We therefore concentrate on the dynamic power consumption.
The average power consumption for a gate $g$ in a synchronous CMOS circuit is given by
[equation (5) missing from the source]
where
[equation (6) missing from the source]
and $C_j$ is the gate capacitance of $j$, $C_{wire}$ is the wiring capacitance of the output net, $V_{dd}$ is the supply voltage, $T_{cycle}$ is the clock cycle time, and $E(switching)$ is the expected number of transitions at the output of $g$ per clock cycle. $E(switching)$ is a function of the design style, the logic function being computed, the delay model, and the switching probabilities of the primary inputs.
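Equations (5) and (6) did not survive extraction, so the sketch below assumes the standard dynamic-power form, $0.5\, C_{load} V_{dd}^2 E(switching) / T_{cycle}$, with the load taken as the sum of the fanout gate capacitances plus the wiring capacitance. This matches the symbols defined above and the structure of (7), but it is an assumption rather than a transcription of the paper's equations. Function names and numeric values are hypothetical.

```python
def load_capacitance(fanout_gate_caps, wire_cap):
    """Assumed form of the load: fanout gate capacitances plus wiring capacitance (farads)."""
    return sum(fanout_gate_caps) + wire_cap

def average_gate_power(c_load, vdd, t_cycle, expected_switching):
    """Assumed standard form: 0.5 * C_load * Vdd^2 * E(switching) / T_cycle (watts)."""
    return 0.5 * c_load * vdd ** 2 * expected_switching / t_cycle

# Illustrative numbers only: two fanout gates (20 fF, 30 fF), 10 fF of wire,
# a 5 V supply, a 50 ns clock, and 0.5 expected transitions per cycle.
c_load = load_capacitance([20e-15, 30e-15], 10e-15)
power = average_gate_power(c_load, vdd=5.0, t_cycle=50e-9, expected_switching=0.5)
```

With these values the gate dissipates about 7.5 microwatts. The point of the sketch is that power scales with the product of load and switching activity, not with load (area) alone.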
Fig. 1. Charging and discharging of an internal node of a 2-input NOR gate.
Since the gate capacitance is proportional to the gate area,
the traditional approach for minimizing the power consump-
tion has been to minimize the total gate area. However, the
power consumption also depends on the switching activities
of the gates. Indeed, we have to minimize the total weighted
switching activity in the circuit, i.e., $\sum C_{load} E(switching)$. Therefore, the minimum area solution is in general different
from the minimum power solution.
Power Consumption at Internal Nodes of a Gate
In a typical 0.8-μm technology, the diffusion capacitance is about 20% of the gate capacitance [1]. The charging and discharging of the source/drain capacitance of transistors in a gate thus makes a considerable contribution to the power consumption of the gate. In particular, the internal capacitances may be charged and discharged without any change in the gate output. This is illustrated in Fig. 1. For a two-input NOR gate, if the inputs are changing from 01 to 10, the output remains unchanged. However, when the inputs are 01, the capacitance of node A is charged, and it is then discharged when the inputs are changed to 10. This contributes to the power consumption in the circuit. The amount of power consumption depends on the frequency of charging and discharging and the source/drain capacitances (hence sizes of the transistors).
Internal power consumption is especially important for complex gates that have more internal nodes with higher parasitic capacitances. A complex gate can be logically decomposed into a network consisting of only two-input AND/OR gates and inverters. It may appear that the power consumption of internal nodes of the complex gate can be accounted for by calculating the power consumption of the external nodes in the corresponding decomposed network. This is incorrect, however, for the following reasons:
1) A complex gate may be decomposed in different ways (see Fig. 2(b)).
2) There does not exist a one-to-one correspondence between internal nodes of a complex gate and the external nodes of any given decomposition (internal node A of the four-input NAND gate shown in Fig. 2(a) is one such node).
3) The internal nodes of a complex gate may float while the external nodes of a given decomposition are always either low or high.
Fig. 2. (a) Four-input NAND gate and (b) two-input AND and inverter decompositions.
The power consumption at the internal node $n$ of gate $g$ in a static circuit is given by
$P^{int}_{g}(n) = 0.5\,\frac{V_{dd}^{2}}{T_{cycle}}\,C_{SD}^{n}\,E_n(charge/discharge)$ (7)
where $C_{SD}^{n}$ is the source/drain capacitance at $n$ and $E_n(charge/discharge)$ is the expected number of charge and discharge events at $n$ per clock cycle.¹
¹ Equation (7) assumes that the threshold voltage $V_t$ is negligible compared to $V_{dd}$. In general, if the internal node $n$ is in the nmos (or pmos) chain of the gate, then $V_{dd}$ in (7) should be replaced by $(V_{dd} - V_{tN})$ (or $(V_{dd} - V_{tP})$), where $V_{tN}$ (or $V_{tP}$) is the threshold voltage of the nmos (or pmos) transistor.
$C_{SD}^{n}$ and $E_n(charge/discharge)$ for internal nodes of gates in dynamic circuits are somewhat different and deserve more explanation. We will consider n-type dynamic circuits. Similar results can be derived for p-type dynamic circuits. Here, output nodes are charged to 1 during the precharge period while all the gate inputs that are fed from the previous stage are maintained at 0. Therefore, only the precharged output nodes have a direct charging path to $V_{dd}$. Consider Fig. 3. During the evaluation period, if $i_1$ is 1 and $i_2$ is 0, there will be charge sharing between output node $out$ and internal node $n$, and the voltage seen at $n$ will be equal to $\frac{C_{out}}{C_{out} + C_n} V_{dd}$.
Fig. 3. Charge sharing in dynamic circuits.
The power consumption due to charge sharing is thus equal to
[equation (8) missing from the source]
where $C_n^{eff} = \left(\frac{C_{out}}{C_{out} + C_n}\right)^2 C_n$ is the effective capacitance seen at $n$.
The effective capacitance of an internal node depends on how many internal nodes are connected to the precharged node at the same time. In Fig. 3, if $i_2$ is also 1, then $m$ is connected to $n$ and $out$. The effective capacitance seen at $n$ is thus smaller and is given by
$C_n^{eff} = \left(\frac{C_{out}}{C_{out} + C_n + C_m}\right)^2 C_n$
In general, the effective capacitance of a node can be approximated as
[equation (9) missing from the source]
where $C_j$ is the source/drain capacitance of internal node $j$ that lies on the path from $n$ to the precharged node. Charge sharing creates a drop in the output voltage and if this drop is greater than a threshold voltage, it may create a logic error or loss of noise margin in the subsequent stages. Therefore, for a dynamic circuit to operate properly, the voltage drop due to charge sharing must be small. This implies that $C_{out} \gg C_j$ and $C_n^{eff} \approx C_n$.
When the charge sharing problem is severe, designers often precharge the internal nodes of the pulldown NFET chain for n-type circuits. In this case, for each precharged node $n$, since it is precharged every cycle, $E_n(charge/discharge)$ is equal to $P_{low}(n)$.
An Extended Power Consumption Model
To capture the power consumption due to internal capacitances, (5) must be augmented as follows:
[equation missing from the source]
where $P^{int}_{g}(i)$ is given by either (7) or (8).
Calculation of Expected Number of Charge/Discharge Events
The internal power consumption depends on $C_{SD}^{i}$, which is determined by the size of transistors in the gate, and on $E_i(charge/discharge)$, which depends on the global function of $i$ in terms of the primary inputs, the switching probabilities of the primary inputs, and the internal structure of the gate. The main theorem for calculating $E_i(charge/discharge)$ is given next.
Theorem: The expected number of charge/discharge events per clock cycle at some internal node $i$ that is not precharged (or predischarged) is given by
[equation missing from the source]
where $P_{high}(i)$ and $P_{low}(i)$ denote the probabilities of $i$ being charged to $V_{dd}$ and discharged to $Gnd$.
Proof: We find $E(n)$, the expected number of patterns $h(f^*)l$ in a string of $n$ characters from the alphabet $\{h, f, l\}$, where $h$, $f$, $l$ denote the input combinations that give rise to charging, floating, and discharging conditions at node $i$, respectively, and $f^*$ means zero or more $f$ characters.
$E(n)$ is composed of two parts. The first part is the expected number of patterns $h(f^*)l$ in a string of $n - 1$ characters (i.e., $E(n - 1)$). The second part is the product of $P(h)$ and the probability $P((f^*)l, n - 1)$ that a string with $n - 1$ characters begins with pattern $(f^*)l$. The reason for this is that if the first character is an $l$ or $f$, it will not create any pattern in addition to those included in $E(n - 1)$; if the first character is an $h$, then any string of $n - 1$ characters that begins with pattern $(f^*)l$ will add one additional $h(f^*)l$ pattern to $E(n)$. Since
$P((f^*)l, n - 1) = P(l) + P(f)P(l) + \cdots + P(f)^{n-2}P(l)$
$E(n)$ is given by the following recurrence formula
$E(n) = E(n - 1) + P(h)\,P((f^*)l, n - 1)$
This recurrence formula can be solved to yield
[equation missing from the source]
Each character represents an input vector, and $n$ input vectors means $n$ cycles have elapsed. So, the expected number of charge/discharge events per clock cycle is given by
[equation missing from the source]
as $n$ approaches $\infty$. Because
[equation missing from the source]
this is equal to
[equation missing from the source]
□
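The recurrence in the proof can be checked numerically. The sketch below iterates $E(n) = E(n-1) + P(h)\,P((f^*)l, n-1)$ from $E(1) = 0$ and compares $E(n)/n$ with the limit $P(h)P(l)/(P(h)+P(l))$. That limit is derived here from the geometric sum together with $P(h)+P(f)+P(l)=1$. Since the theorem's final expression is missing from this copy, no claim is made about any constant factor the paper applies to obtain the charge/discharge count. The probabilities are illustrative assumptions.

```python
def expected_patterns(p_h, p_f, p_l, n):
    """E(n): expected number of h(f*)l patterns in n characters, with E(1) = 0."""
    e = 0.0
    prefix = 0.0   # P((f*)l, k): a k-character string starts with zero or more f's, then l
    f_power = 1.0  # P(f)^(k-1), the next term of the geometric sum
    for _ in range(2, n + 1):
        prefix += f_power * p_l   # add the term P(f)^(k-1) * P(l)
        f_power *= p_f
        e += p_h * prefix         # E(k+1) = E(k) + P(h) * P((f*)l, k)
    return e

# Assumed example probabilities for charging, floating, and discharging.
p_h, p_f, p_l = 0.5, 0.25, 0.25
n = 10000
rate = expected_patterns(p_h, p_f, p_l, n) / n
limit = p_h * p_l / (p_h + p_l)   # equals P(h)P(l) / (1 - P(f))
```

For these values `rate` approaches `limit` (about 0.1667) as `n` grows, consistent with the per-cycle reading of the proof.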
In the following, we show how to calculate $P_{high}(i)$ and $P_{low}(i)$. For both dynamic and static circuits, $P_{high}(n)$ and $P_{low}(n)$ are given by
[equations missing from the source]
where $F_c(n)$ and $F_d(n)$ are Boolean functions representing conditions for $n$ to be charged and discharged, respectively.
Let $TG(N, E)$ be the transistor graph of gate $g$ where each node represents an internal node of the gate. For dynamic circuits, $V_{dd}$ (or $Gnd$) is replaced by the precharged (or predischarged) node. Each edge that represents an n-transistor is labeled with the signal name of the gate of the corresponding transistor. Each edge that represents a p-transistor is labeled with the inverted gate signal of the corresponding p-transistor. From the transistor graph, logic functions that charge/discharge the internal nodes can be obtained as stated below.
Theorem: Let $O$ denote the gate output node, $G$ be the logic function of $O$, $paths(n_1, n_2)$ denote the set of all simple paths connecting $n_1$ and $n_2$ in $TG(N, E)$, and $L(P) = x_1 \wedge x_2 \wedge \cdots \wedge x_n$, where $P$ is a simple path consisting of edges $e_1, e_2, \ldots, e_n$ and $x_i$ is the label of edge $e_i$. The charging and discharging functions for internal node $n$ are given as follows (proof omitted):
n-type dynamic:
$CF_{N}^{dynamic}(n) = G \wedge \bigvee_{P \in paths(n,\ precharged\text{-}node)} L(P)$, (17)
[equations (18)-(23) missing from the source; one surviving fragment ranges over $P \in paths(n, Gnd)$]
static PMOS:
$DF_{PMOS}^{static}(n) = \overline{G} \wedge \bigvee_{P \in paths(n, O)} L(P)$ (24)
where $\wedge$ and $\vee$ denote Boolean AND and OR operations.
Fig. 4. The transistor graph and charging/discharging functions for a 3-input NOR gate. (TG: Transistor Graph; CF: Charging Function; DF: Discharging Function.)
Fig. 4 shows an example of the charging and discharging functions for a 3-input NOR gate.
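The path term $\bigvee_{P} L(P)$ can be illustrated on a small transistor graph. The sketch below models a 2-input static NOR gate with an assumed topology (series p-transistors Vdd-A-O with $i_1$ nearest Vdd, parallel n-transistors O-Gnd), since Fig. 1 and Fig. 4 are not available here. It makes the simplifying assumption that node A is charged whenever a conducting simple path to Vdd exists and discharged whenever one to Gnd exists. It does not reproduce equations (17)-(24), most of which are missing, and it does not exclude paths that pass through another supply rail, which this small graph does not contain. All names are hypothetical.

```python
from itertools import product

# Each edge is (node_u, node_v, label); label(i1, i2) is truthy when the
# transistor conducts. p-transistor edges carry the inverted gate signal.
EDGES = [
    ("Vdd", "A", lambda i1, i2: not i1),  # p-transistor driven by i1
    ("A", "O", lambda i1, i2: not i2),    # p-transistor driven by i2
    ("O", "Gnd", lambda i1, i2: i1),      # n-transistor driven by i1
    ("O", "Gnd", lambda i1, i2: i2),      # n-transistor driven by i2
]

def simple_paths(src, dst, edges):
    """Yield each simple path from src to dst as a list of edge labels."""
    def walk(node, visited, labels):
        if node == dst:
            yield list(labels)
            return
        for u, v, label in edges:
            if node in (u, v):
                nxt = v if node == u else u
                if nxt not in visited:
                    yield from walk(nxt, visited | {nxt}, labels + [label])
    yield from walk(src, {src}, [])

def connected(src, dst, edges, i1, i2):
    """OR over all simple paths of the AND of the edge labels, i.e. the OR of L(P)."""
    return any(all(lbl(i1, i2) for lbl in path)
               for path in simple_paths(src, dst, edges))

vectors = list(product((0, 1), repeat=2))
charge_vectors = [v for v in vectors if connected("A", "Vdd", EDGES, *v)]
discharge_vectors = [v for v in vectors if connected("A", "Gnd", EDGES, *v)]
```

Under these assumptions node A charges for inputs 00 and 01, discharges for 10, and floats for 11, which agrees with the 01-to-10 example given for the NOR gate earlier in the text.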
III. POWER EFFICIENT TECHNOLOGY DECOMPOSITION
It is difficult to come up with a decomposed network that leads to a minimum power-consumption implementation after power-efficient technology mapping since gate loading and mapping information are unknown at this stage. Nevertheless, we have observed that a decomposition scheme that minimizes the sum of the switching activities at the internal nodes of the network is a good starting point for power-efficient technology mapping. We illustrate this point with a simple example (Fig. 5(a)). A four-input AND gate can be decomposed into a tree of 2-input AND gates in two ways. These two decompositions have different total switching activities. Assuming an n-type dynamic circuit and independent inputs, let $P(a) = 0.7$, $P(b) = 0.3$, $P(c) = 0.3$, and $P(d) = 0.7$. The total switching activities for configurations A and B are 2.317 and 2.464, respectively. Configuration A appears to be better than configuration B since the sum of the switching activities at its internal nodes is smaller. Furthermore, if we assume that the cell library has 2- and 3-input AND gates, the minimum power mapping with a power value of 2.107 is obtained from configuration A (Fig. 5(b)). In this example, decomposition with lower switching activity leads to mapping with lower power consumption.
Fig. 5. An example showing the effect of technology decomposition on total switching activity.
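The numbers in this example can be reproduced with a short calculation. The tree shapes below are an assumption, because Fig. 5 is not available in this copy: configuration A is taken to be the chain ((a·b)·c)·d, configuration B the balanced tree (a·b)·(c·d), and the mapped solution a 3-input AND of a, b, c followed by a 2-input AND with d. The total is taken as the sum of the signal probabilities of the primary inputs and of every AND output, with unit loads. Under this reading the quoted values 2.317, 2.464, and 2.107 are recovered. Function names are hypothetical.

```python
def and_tree_activity(tree, probs):
    """Return (P(output = 1), list of AND-output probabilities) for a nested-tuple AND tree."""
    if isinstance(tree, str):          # a primary input
        return probs[tree], []
    p = 1.0
    internal = []
    for child in tree:                 # AND of independent children
        child_p, child_internal = and_tree_activity(child, probs)
        p *= child_p
        internal += child_internal
    return p, internal + [p]

def total_activity(tree, probs):
    """Sum of switching activities at the primary inputs and at every AND output."""
    _, internal = and_tree_activity(tree, probs)
    return sum(probs.values()) + sum(internal)

probs = {"a": 0.7, "b": 0.3, "c": 0.3, "d": 0.7}
config_a = ((("a", "b"), "c"), "d")   # assumed chain decomposition
config_b = (("a", "b"), ("c", "d"))   # assumed balanced decomposition
mapped_a = (("a", "b", "c"), "d")     # 3-input AND hides the a·b node

total_a = total_activity(config_a, probs)
total_b = total_activity(config_b, probs)
total_mapped = total_activity(mapped_a, probs)
```

The mapped value drops from 2.317 to 2.107 because the node a·b, with activity 0.21, becomes internal to the 3-input gate and no longer drives an external load.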
We denote the problem of generating a NAND-decomposed network with minimum total switching activity as the MINPOWER decomposition. The performance-oriented version of the above problem requires that the increase in the height of the decomposed network (compared to the undecomposed network) be bounded. We refer to this problem as the BOUNDED-HEIGHT MINPOWER decomposition.
Tree Decomposition
We describe algorithms for solving the MINPOWER decom-
position for a fanout-free logic function (i.e. a function that
has a tree realization).
The basic algorithm is similar to Huffman's algorithm [10] for constructing a binary tree with minimum average weighted path length. We denote the leaves of a binary tree by $v_1, v_2, \ldots, v_n$, the "path length" from the root to $v_i$ by $l_i$, and the weight of leaf $v_i$ by $w_i$. Assuming that the root is at level zero (the highest level), leaf $v_i$ is at level $l_i$. Given a set of weights $w_i$, there is a simple $O(n \log n)$-time algorithm due to Huffman for constructing a binary tree such that the cost function $\sum w_i l_i$ is minimum.
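A minimal sketch of Huffman's construction for the additive case, using Python's `heapq`, is shown below. It returns the leaf depths $l_i$ so that the cost $\sum w_i l_i$ can be evaluated directly. For readability it carries the list of leaves under each subtree, which makes this version quadratic in the worst case rather than $O(n \log n)$. The function name and example weights are hypothetical.

```python
import heapq
from itertools import count

def huffman_depths(weights):
    """Return the depth l_i of each leaf in a Huffman tree built over `weights`."""
    tie = count()  # tie-breaker so the heap never has to compare leaf lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)   # two smallest weights
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        for leaf in merged:                    # leaves under the new node sink one level
            depths[leaf] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return depths

weights = [5, 2, 1, 1]
depths = huffman_depths(weights)
cost = sum(w * l for w, l in zip(weights, depths))
```

For these weights the heaviest leaf ends up directly under the root and the two lightest leaves at depth 3, giving a cost of 15.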
In a more general setting, a weight combination function $F(x, y)$ (which is any symmetric function chosen as a binary operator on the weight space $U$) is used to produce the weight $W$ of internal nodes during tree construction. For each tree $T$, the tree cost function $G(W_1, W_2, \ldots, W_{n-1})$ gives the cost.² Parker [11] characterized a wide class of weight combination functions for which Huffman's algorithm produces optimal trees under the corresponding tree cost functions. We give some definitions first and then state Parker's main theorem.
Definition: A weight combination function $F$ is quasi-linear if $F(x, y) = \phi^{-1}(\lambda\phi(x) + \lambda\phi(y))$ where $\lambda$ is a nonzero constant and $\phi$ is a real-valued invertible function on the weight space $U$. (Note $F$ is symmetric, and conjugate under $\phi$ to the linear map $\lambda(x + y)$.)
Definition: A tree cost function $G: U^{n-1} \to R$ is Schur concave if
$(x_i - x_j)\left(\frac{\partial G}{\partial x_i} - \frac{\partial G}{\partial x_j}\right) \le 0$ (25)
for all $x_i, x_j \in U$, $i, j \in 1, \ldots, n - 1$.
Theorem: If the weight combination function $F$ is quasi-linear and the corresponding function $\phi$ is convex, positive (or concave, negative) and strictly monotone and $\lambda \ge 1$, then the Huffman tree will have the least cost when $G$ is any Schur concave function of the internal node weights [11].
² In Huffman's original paper, $F(x, y) = x + y$ and $G = \sum_{i=1}^{n-1} W_i$. It is easy to show that $G = \sum_{i=1}^{n} w_i l_i$.
If $F(x, y)$ does not satisfy the above conditions, then Huffman's algorithm may not produce the optimal solution. We propose the following $O(n^2 \log n)$ greedy algorithm to solve the decomposition problem for general weight combination functions.
Algorithm 3.1: For every pair $w_i$ and $w_j$ of the $n$ nonnegative weights $w_1, w_2, \ldots, w_n$, compute $F_{ij}(w_i, w_j)$ and store it in list $L$. Find the smallest $F_{ij}$, say $F_{12}$. Replace the two nodes by a single node having the weight $W_1 = F_{12}$ and two sons with weights $w_1$ and $w_2$. Eliminate all $F_{1k}(w_1, w_k)$ and $F_{2k}(w_2, w_k)$ from $L$. Compute $F_{1j}(W_1, w_j)$ and insert it into $L$. Do this recursively for the $n - 1$ weights $W_1, w_3, \ldots, w_n$. The final single node with weight $W_{n-1}$ is then the root of the binary tree.
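The greedy pairing of Algorithm 3.1 can be sketched as follows. For brevity this version recomputes every pairwise combination in each round, which costs $O(n^3)$ overall, instead of maintaining the list $L$ incrementally as the algorithm describes. The merge order is the same up to tie-breaking. The function name, the tuple-based tree encoding, and the choice of the product as the combination function in the example are illustrative assumptions.

```python
def greedy_decomposition(weights, combine):
    """Repeatedly merge the pair of nodes whose combined weight is smallest.

    Returns (tree, internal_weights), where tree is a nested tuple of leaf
    indices and internal_weights lists W_1 .. W_{n-1} in creation order.
    """
    nodes = [(w, i) for i, w in enumerate(weights)]  # (weight, subtree)
    internal = []
    while len(nodes) > 1:
        best = None
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                cost = combine(nodes[a][0], nodes[b][0])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merged = (cost, (nodes[a][1], nodes[b][1]))
        # Drop the two merged nodes and append their parent.
        nodes = [n for k, n in enumerate(nodes) if k not in (a, b)] + [merged]
        internal.append(cost)
    return nodes[0][1], internal

# Example: product combination with the input probabilities of the Fig. 5 example.
tree, internal = greedy_decomposition([0.7, 0.3, 0.3, 0.7], lambda x, y: x * y)
```

With the product as combination function, the two 0.3 inputs are paired first and the internal weights are 0.09, 0.063, and 0.0441, for a sum of 0.1971.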
To solve the MINPOWER tree decomposition problem, we must use appropriate weight combination and tree cost functions. We thus distinguish between two cases as follows.
Dynamic Circuits: For n-type circuits, dynamic gate outputs are precharged to 1 and switching occurs when the output changes to 0 during the evaluation phase. For a 2-input AND gate composed of a 2-input NAND gate and a static inverter, the inverter output is 0 during the precharge period and the switching probability (without input signal correlations) is given by
$W_O = w_{i_1} w_{i_2}$ (26)
where the $W_O$ and $w_{i_j}$ values are probabilities of the output and inputs assuming value 1.
For p-type circuits, dynamic gate outputs are predischarged to 0 and a transition occurs when the output evaluates to 1. The corresponding formula for the switching probability for a 2-input AND gate composed of a 2-input NAND gate and a static inverter is then given by
$W_{\bar O} = 1 - (1 - w_{\bar i_1})(1 - w_{\bar i_2})$ (27)
where the $W_{\bar O}$ and $w_{\bar i_j}$ values are probabilities of the output and inputs assuming value 0.
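Both (26) and (27) can be written in the quasi-linear form $\phi^{-1}(\lambda\phi(x) + \lambda\phi(y))$ with $\lambda = 1$, using $\phi(x) = -\log x$ for (26) and $\phi(x) = -\log(1 - x)$ for (27). The following sketch checks this numerically for one pair of values. The helper `quasi_linear` is a hypothetical name, and the inputs must lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def quasi_linear(phi, phi_inv, lam=1.0):
    """Build F(x, y) = phi_inv(lam * phi(x) + lam * phi(y))."""
    return lambda x, y: phi_inv(lam * phi(x) + lam * phi(y))

# (26): product of the probabilities of value 1, phi(x) = -log(x)
f_n = quasi_linear(lambda x: -math.log(x), lambda s: math.exp(-s))
# (27): 1 - (1 - x)(1 - y) on the probabilities of value 0, phi(x) = -log(1 - x)
f_p = quasi_linear(lambda x: -math.log(1.0 - x), lambda s: 1.0 - math.exp(-s))

x, y = 0.3, 0.7
n_type_direct = x * y
p_type_direct = 1.0 - (1.0 - x) * (1.0 - y)
```

Evaluating `f_n(x, y)` and `f_p(x, y)` reproduces the direct expressions up to floating-point rounding.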
The weight combination functions $F(w_{i_1}, w_{i_2}) = W_O$ or $F(w_{\bar i_1}, w_{\bar i_2}) = W_{\bar O}$ are used during the AND decomposition. The corresponding tree cost function $G$ is given by
$G = \sum_{i=1}^{n-1} W_i$ (28)
Lemma: $W_O$ and $W_{\bar O}$, given in (26) and (27) above, are quasi-linear functions.
Proof: For (26), since $w_i$ is within the range of [0,1] we can take $\phi(x) = -\log(x)$, which is a convex, positive, and decreasing function, and $\lambda = 1$. This shows that $W_O$ is quasi-linear. Similarly, for (27), since $w_{\bar i}$ is within the range of [0,1] we can take $\phi(x) = -\log(1 - x)$, which is a convex, positive, and increasing function, and $\lambda = 1$. This shows that $W_{\bar O}$ is quasi-linear. □
Lemma: $G$ given in (28) above is Schur concave.
Proof: Since $(W_i - W_j)\left(\frac{\partial G}{\partial W_i} - \frac{\partial G}{\partial W_j}\right) = 0$, $G$ is Schur concave. □
Theorem: MINPOWER tree decomposition for dynamic CMOS circuits with uncorrelated input signals can be solved optimally by Huffman's algorithm using the weight combination functions (26) and (27).³
Proof: Follows from Lemmas 3.2 and 3.3. □
If the input signals to the AND gate are correlated, (26) and (27) cannot be used as the weight combination function. The switching probabilities for n-type and p-type circuits are then given by
$W_O = w_{i_1} w_{i_2 \mid i_1}$ (29)
$W_{\bar O} = 1 - (1 - w_{\bar i_1})(1 - w_{\bar i_2 \mid i_1})$ (30)
respectively, where $w_{i_2 \mid i_1}$ ($w_{\bar i_2 \mid i_1}$) is the conditional probability of $i_2$ ($\bar i_2$) given $i_1$.
Lemma: $W_O$ and $W_{\bar O}$ given in (29) and (30) are not quasi-linear functions.
Proof: We present the proof for (29). The proof for the other case is similar.
One criterion for a function $F$ to be quasi-linear is increasingness, i.e., $F(u, x) < (>)\ F(u, y)$ if $x < (>)\ y$ [11]. However, in (29), we could have $w_{i_2} \le w_{i_3}$, yet $w_{i_2 \mid i_1} > w_{i_3 \mid i_1}$, thus $F(w_{i_1}, w_{i_2}) > F(w_{i_1}, w_{i_3})$. $W_O$ does not satisfy the increasingness property and hence is not quasi-linear. □
Since the $W_O$'s given in (29) and (30) are not quasi-linear, we use Algorithm 3.1 to solve the decomposition problem.
Static Circuits: For static CMOS circuits, we need to minimize the sum of the probabilities for output switching from 0 to 1 and 1 to 0. Thus, the weight combination function $W_O$ for a 2-input AND gate is equal to $W_{O,0\to1} + W_{O,1\to0}$ where
$W_{O,0\to1} = w_{i_1,0\to1}\, w_{i_2,0\to1} + w_{i_1,1\to1}\, w_{i_2,0\to1} + w_{i_1,0\to1}\, w_{i_2,1\to1}$, (31)
$W_{O,1\to0} = w_{i_1,1\to1}\, w_{i_2,1\to0} + w_{i_1,1\to0}\, w_{i_2,1\to1} + w_{i_1,1\to0}\, w_{i_2,1\to0}$. (32)
Indeed, a chain-like tree decomposition will be obtained.
If input signals are correlated, conditional probabilities between the input signal transitions have to be used in order to calculate the output transition probability. $W_{O,0\to1}$ and $W_{O,1\to0}$ are then given by
[equation (33) missing from the source]
$W_{O,1\to0} = w_{i_1,1\to1}\, w_{i_2,1\to0 \mid i_1,1\to1} + w_{i_1,1\to0}\, w_{i_2,1\to1 \mid i_1,1\to0} + w_{i_1,1\to0}\, w_{i_2,1\to0 \mid i_1,1\to0}$. (34)
Note that for static circuits, every internal node is assigned multiple weights (i.e., $w_{0\to1}$, $w_{1\to0}$). The weight combination uses all these weights through (31)-(34). This general class of problems cannot be solved using Huffman's algorithm, which applies when each internal node is given a single weight. Therefore, we resort to Algorithm 3.1.
Under the temporal independence assumption for the gate inputs, (31) and (32) can be replaced by $W_{O,0\to1} + W_{O,1\to0} = 2 w_{i_1} w_{i_2}(1 - w_{i_1} w_{i_2})$ while (33) and (34) can be replaced by $W_{O,0\to1} + W_{O,1\to0} = 2 w_{i_1} w_{i_2 \mid i_1}(1 - w_{i_1} w_{i_2 \mid i_1})$. The resulting weight combination functions $F(w_{i_1}, w_{i_2})$ are not quasi-linear, and hence Algorithm 3.1 must be used.
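The reduction of (31) and (32) to the closed form $2 w_{i_1} w_{i_2}(1 - w_{i_1} w_{i_2})$ can be checked numerically. The sketch below assumes temporal independence, so each transition weight is the product of the probabilities of the old and new values, and it evaluates the three-term sums for a 2-input AND gate with uncorrelated inputs. The term-by-term form of (31) and (32) used here is read from a damaged extraction and should be treated as an assumption. Function names are hypothetical.

```python
def static_and_weights(p1, p2):
    """Evaluate the three-term sums for a 2-input AND gate.

    p1, p2 are P(input = 1). Under temporal independence the weight of a
    transition a -> b on an input is P(input = a) * P(input = b).
    """
    def w(p, a, b):
        pa = p if a else 1.0 - p
        pb = p if b else 1.0 - p
        return pa * pb

    # Output rises: both inputs end at 1 and at least one started at 0.
    w01 = (w(p1, 0, 1) * w(p2, 0, 1) + w(p1, 1, 1) * w(p2, 0, 1)
           + w(p1, 0, 1) * w(p2, 1, 1))
    # Output falls: both inputs start at 1 and at least one ends at 0.
    w10 = (w(p1, 1, 1) * w(p2, 1, 0) + w(p1, 1, 0) * w(p2, 1, 1)
           + w(p1, 1, 0) * w(p2, 1, 0))
    return w01, w10

p1, p2 = 0.7, 0.3
w01, w10 = static_and_weights(p1, p2)
closed_form = 2 * p1 * p2 * (1 - p1 * p2)
```

Both sums equal $w_{i_1} w_{i_2}(1 - w_{i_1} w_{i_2})$, so their total matches the closed form quoted in the text (0.3318 for these inputs).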
Bounded-Height Tree Decomposition
Here, the objective is to construct a MINPOWER binary tree
for a given list of weights (switching probabilities) with the
restriction that the height of the tree (defined as $\max_i l_i$)
does not exceed a given integer L. The best known
algorithm for solving the BOUNDED-HEIGHT MINSUM problem is an
O(nL) algorithm due to Larmore and Hirschberg [13]. Their
approach (based on the PACKAGE-MERGE algorithm) transforms
the BOUNDED-HEIGHT MINSUM tree decomposition problem to
an instance of the COIN COLLECTOR'S problem.³ The PACKAGE
step in this algorithm uses Huffman's algorithm to merge
two items at level i to form a new item at level i + 1. The
MERGE step merges the newly formed items with the original
items at each level. Details of the algorithm can be found in [13].
For the general weight combination functions, the Larmore-
Hirschberg algorithm has to be modified as follows: In the
PACKAGE step, replace Huffman's algorithm by Algorithm 3.1;
the MERGE step is unchanged. Using the modified
Larmore-Hirschberg algorithm, the BOUNDED-HEIGHT MINPOWER
tree decomposition can be solved heuristically for the
general weight combination functions.
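The PACKAGE and MERGE steps can be sketched as follows. This is a minimal illustration of the plain BOUNDED-HEIGHT MINSUM case, in which two items are combined by adding their weights; it does not implement Algorithm 3.1 or the general weight combination functions, and the function name is an assumption made for this example.

```python
def length_limited_code_lengths(weights, max_len):
    """Leaf depths of a minimum weighted-path-length binary tree whose
    height does not exceed max_len (PACKAGE-MERGE with additive weights)."""
    n = len(weights)
    if n > 2 ** max_len:
        raise ValueError("too many leaves for the requested height bound")
    # Each item is (weight, indices of the original leaves it contains).
    leaves = sorted((w, (i,)) for i, w in enumerate(weights))
    current = list(leaves)
    for _ in range(max_len - 1):
        # PACKAGE: pair up adjacent items of the current level.
        packages = [
            (current[k][0] + current[k + 1][0],
             current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # MERGE: combine the packages with the original items.
        current = sorted(leaves + packages)
    # The 2(n - 1) cheapest items determine the depth of every leaf.
    depths = [0] * n
    for _, members in current[:2 * (n - 1)]:
        for i in members:
            depths[i] += 1
    return depths


unbounded_like = length_limited_code_lengths([1, 1, 2, 4], 3)
bounded = length_limited_code_lengths([1, 1, 2, 4], 2)
```

With a height bound of 3 the result coincides with the ordinary Huffman tree for these weights; tightening the bound to 2 forces a balanced tree at a higher weighted cost.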
³An instance (I, S) of the COIN COLLECTOR'S problem is defined as follows:
given a set I of items, each of which has a width and a weight, find a
subset of I whose widths sum to S and has minimum total weight.
IV. POWER EFFICIENT TECHNOLOGY MAPPING
The problem of technology mapping for low power can
then be stated as follows: Given a Boolean network representing
a combinational logic circuit optimized by technology
independent synthesis procedures and a target library, we bind
nodes in the network to gates in the library such that average
power consumption of the final implementation is minimized
and timing constraints are satisfied. A successful and efficient
solution to the minimum area mapping problem was suggested
in [12] and implemented in programs such as DAGON and
MIS. The idea is to reduce technology mapping to DAG
covering and to approximate DAG covering by a sequence of
tree coverings that can be performed optimally using dynamic
programming.
The traditional goal of technology mapping is to minimize the
total area (or delay) of the mapped circuit. In [5], a near-
optimal solution is presented for finding the minimum area
solution under delay constraints. Their approach consists of
two steps: In the first step, delay functions (which capture
arrival time-gate area trade-offs) at all nodes in the network
are generated. In the second step, a mapping solution based
on the computed delay functions and the required times at
the primary outputs is found. For a NAND-decomposed tree,
subject to load calculation errors, this two-step approach finds
the minimum area mapping satisfying all delay constraints if
such solution exists.
Our low-power technology mapper follows a procedure
similar to the above, with the difference that the objective
is to minimize the sum over all gates of the average power
consumed in the gate. The approach also consists of two steps:
First, a postorder traversal is used to determine a set of possible
arrival times and accumulated power consumptions at each
node of the network. Once the user specifies a single required
time, a second preorder traversal starting from the primary
outputs is performed to determine the mapping solution that
minimizes the average power subject to the required time
constraints.
Terminology
Consider a match g at node n of a NAND-decomposed
tree. The inputs to node n consist of nodes $n_i$ that fan out
to node n (that is, $n = \overline{n_1} + \overline{n_2}$ if n has two inputs
or $n = \overline{n_1}$ if n has a single input). The nodes that are
covered by match g are denoted by merged(n, g). The nodes
that are not in merged(n, g) but fan in to merged(n, g) are
denoted by inputs(n, g). The mapped-parent($n_i$) is some
node n for which there exists a matching gate g such that
$n_i \in$ inputs(n, g). Note that given node n and gate g matching
at n, inputs(n, g) are uniquely determined. However, $n_i$ may
have many distinct mapped parents.
With each node in the network we store a power-delay
curve. A point on the curve represents the arrival time at
the output of the node and the average power consumed in
its mapped transitive fanin cone (but excluding the power
consumed at the load driven by the node that has yet to be
decided by a later mapping). In addition to the power and
delay values, the matching gate and input bindings for the
match are also stored with each point on the curve. Points on
the curve represent various mapping solutions with different
trade-offs between average power and speed. A point $(t_1, p_1)$
is inferior if there exists another point $(t_2, p_2)$ on the curve
such that either $p_1 > p_2$ and $t_1 \ge t_2$, or $p_1 \ge p_2$ and
$t_1 > t_2$. All inferior points are dropped from the power-delay
curve.
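The inferiority test above can be sketched as a single sweep over the points sorted by arrival time. The function name and the sample points are hypothetical; each point is represented here simply as an (arrival_time, power) pair, without the gate and input bindings that the mapper also stores.

```python
def prune_inferior(points):
    """Return the non-inferior (arrival_time, power) points, sorted by time.

    A point (t1, p1) is dropped when some other point (t2, p2) satisfies
    p1 > p2 and t1 >= t2, or p1 >= p2 and t1 > t2.
    """
    kept = []
    best_power = float("inf")
    # Sorting by (time, power) means every earlier point is at least as fast,
    # so a point survives only if it is strictly cheaper than all of them.
    for t, p in sorted(set(points)):
        if p < best_power:
            kept.append((t, p))
            best_power = p
    return kept


curve = prune_inferior([(3, 10), (2, 12), (4, 10), (2, 15), (5, 7)])
```

Here (2, 15) is dominated by (2, 12) and (4, 10) is dominated by (3, 10), so three points remain, each trading a later arrival time for lower power.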
Arrival Time and Power Cost Calculation
We have adopted the pin-dependent SIS library delay model
for the calculation of the arrival time and power consumption.
Suppose that gate g has matched at node n; then the output
arrival time at n is given by
(35)
where $\tau_{i,g}$ is the intrinsic gate delay from input i to the output of
g, $R_{i,g}$ is the drive resistance of g corresponding to a signal
transition at input i, $C_n$ is the load capacitance seen at n,
$A(n_i, g_i, C_{n_i})$ is the arrival time at input i corresponding to
load $C_{n_i}$ seen at that input, and $g_i$ is the best match found
at input i.
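A minimal sketch of a pin-dependent arrival time calculation built from the quantities just defined is given below. It assumes the common form in which each input pin contributes its intrinsic delay, plus its drive resistance times the output load, plus the arrival time at that input, and the latest pin determines the output arrival time; this form, the function name, and the numbers are assumptions made for illustration.

```python
def arrival_time(pins, load_cap):
    """Output arrival time of a gate under an assumed pin-dependent model.

    pins: iterable of (intrinsic_delay, drive_resistance, input_arrival),
          one tuple per input pin of the matched gate.
    load_cap: load capacitance seen at the gate output.
    """
    return max(tau + r * load_cap + a_in for tau, r, a_in in pins)


# Two input pins with hypothetical delay parameters and input arrival times.
t_out = arrival_time([(1.0, 2.0, 3.0), (0.5, 1.0, 5.0)], load_cap=0.5)
```

The second pin dominates here because its input arrives late, even though its intrinsic delay and drive resistance are smaller.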
Under the simple power consumption model, the power cost
for gate g matching at node n is given by
(36)
where $E_n$ is the expected switching activity at node n, and
$P_{avg}(n_i, g_i)$ is the accumulated power cost at $n_i$ assuming a
matching gate $g_i$.
The input to the technology mapper is a 2-input NAND-
decomposed network of the optimized network from the
technology-independent phase. The switching activity $E_n$ of
each node n of the decomposed network is calculated prior
to the mapping.
Under the extended power consumption model, the power
cost is given by
(37)
where the internal-node term $P(j)$ is given by (7).
The typical gate library model contains information such as
input capacitance for each input pin, pin-dependent intrinsic
gate delay and the pin-dependent output drive. To support the
calculation of the internal power consumption, we add to this
data the following information: 1) the source/drain capacitance
for each internal node and 2) the charging and discharging
functions in terms of the gate inputs for each internal node.
When calculating the power cost at node n, the power
contribution from the output load driven by n is not included.
We have the following lemma, however, for the power cost
calculation:
Lemma 4.1: The power cost of a match g at node n given by
(37) is the exact power cost under a zero delay model.
Proof: Under a zero delay model, the switching activity at
node n only depends on the global function of the node in
terms of the circuit primary inputs and not its particular
implementation. Hence, $E_n$ is independent of the gate matching
at n. Furthermore, note that since the power consumption due
to the output load of n is calculated only when the fanout gate of
n is known, (37) is not subject to the unknown load problem.
The average power contribution of n's output load will
be included at mapped-parent(n). When the mapping reaches
a primary output, every point on the power-delay curve has a
power value equal to the total average power consumption of
the mapped tree minus the power consumption at the primary
output load. The power consumption at the primary output
load depends on $E_n$ at the output and the load capacitance
connected to it, which are both independent of the mapping
configuration. Hence, it only causes a fixed shift of the curve
along the power axis; it does not affect the selection of the
optimal point from the power-delay curve.
Tree Mapping
In this section, we focus on tree mapping. Later, we shall
describe the extension to DAG mapping. In particular, we
describe two tree-traversal operations applied to a NAND-
decomposed tree in order to obtain a technology mapping
solution that minimizes the average power consumption while
satisfying the timing constraints.
Tree Traversals: On the first traversal, we begin at the
leaf nodes of the NAND-decomposed tree. Since each node n
possesses a set of possible arrival time-average power points
that are reflected in its power-delay function, the power-delay
function at any mapped-parent(n) must also reflect these
possible arrival time-average power trade-offs. A postorder
traversal of the NAND-decomposed tree is performed, where
for each node n and for each gate g matching at n, a new
power-delay function is produced by appropriately merging
the power-delay functions at the inputs(n, g). Merging must
occur in the common region in order to ensure that the
resulting function reflects feasible matches at the inputs(n, g).
The power-delay functions for successive gates g matching at
n are then merged by applying a lower-bound merge operation
on the corresponding power-delay functions (see [5] for details
of these operations). At a given node n, the resulting power-
delay function will describe the arrival time-average power
trade-offs in propagating a signal from the tree inputs to the
output of n.
The second traversal begins when the mapping reaches
the root (primary output). The user is allowed to select the
arrival time-average power trade-off that is most suitable
for the application. Given the required time t at the root of
the tree, a suitable (t, p) point on the power-delay function
for the root node is chosen. The gate g matching at the
root that corresponds to this point and inputs(root, g) are,
thus, identified. The required times $t_i$ at inputs(root, g)
are computed from t and g. The preorder traversal resumes
at inputs(root, g), where $t_i$ is the constraining factor and a
matching gate $g_i$ with minimum power consumption satisfying
$t_i$ is sought.
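The two traversals can be sketched as follows under strong simplifications: every match has a fixed, load-independent delay and a fixed power cost, tree inputs arrive at time 0 with zero accumulated power, and input curves are combined by enumerating all point combinations rather than by the merge operations of [5]. The class, the function names, and the gate data are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

EPS = 1e-9  # tolerance for floating-point time comparisons


@dataclass(frozen=True)
class Match:
    gate: str      # hypothetical library gate name
    delay: float   # simplified, load-independent gate delay
    power: float   # power cost charged when this gate is selected
    inputs: tuple  # nodes feeding the match, i.e. inputs(n, g)


def prune(points):
    """Drop inferior (time, power, match) points; keep the rest sorted by time."""
    kept, best_power = [], float("inf")
    for pt in sorted(points, key=lambda x: (x[0], x[1])):
        if pt[1] < best_power:
            kept.append(pt)
            best_power = pt[1]
    return kept


def build_curves(matches_at, postorder):
    """First traversal: compute a power-delay curve for every node."""
    curves = {}
    for node in postorder:
        if node not in matches_at:          # tree input
            curves[node] = [(0.0, 0.0, None)]
            continue
        points = []
        for m in matches_at[node]:
            for combo in product(*(curves[i] for i in m.inputs)):
                t = max(c[0] for c in combo) + m.delay
                p = sum(c[1] for c in combo) + m.power
                points.append((t, p, m))
        curves[node] = prune(points)
    return curves


def select(curves, node, required, chosen):
    """Second traversal: pick the min-power point meeting `required`, recurse."""
    feasible = [pt for pt in curves[node] if pt[0] <= required + EPS]
    if not feasible:
        raise ValueError(f"no mapping of {node} meets required time {required}")
    _, power, m = min(feasible, key=lambda pt: pt[1])
    if m is not None:
        chosen[node] = m.gate
        for i in m.inputs:
            select(curves, i, required - m.delay, chosen)
    return power


# n1 = NAND(a, b); n2 = NAND(n1, c), or one complex gate covering both nodes.
matches_at = {
    "n1": [Match("NAND2", 1.0, 4.0, ("a", "b"))],
    "n2": [Match("NAND2", 1.0, 3.0, ("n1", "c")),
           Match("COMPLEX", 2.5, 5.0, ("a", "b", "c"))],
}
curves = build_curves(matches_at, ["a", "b", "c", "n1", "n2"])
fast, slow = {}, {}
fast_power = select(curves, "n2", 2.0, fast)
slow_power = select(curves, "n2", 3.0, slow)
```

With a required time of 2.0 only the two-gate solution is feasible; relaxing it to 3.0 lets the mapper choose the slower complex gate, which hides node n1 and costs less power.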
Optimality of the Tree Mapping Algorithm: The following
theorem can be stated.
Theorem: Under a zero delay model, the tree mapping
algorithm finds the minimum power consumption solution for
given delay constraints (subject to the error due to unknown
loads during the arrival time calculation).
Proof: From Lemma 4.1, under a zero delay model, the
power consumption calculation is exact and not subject to the
unknown load problem and hence is similar in nature to the
area calculation. The proof of optimality of the tree mapping
algorithm is thus similar to that of the area-delay mapping
algorithm.
DAG Mapping
Most real circuits are not trees, but general DAGs.
The problem of mapping a DAG, even for the unit fanout
model, is NP-hard [2]. Therefore, we resort to heuristics. During
the power-delay curve computation step, nodes are visited in
postorder. For each node, we compute the power-delay curve
as in the case of trees. However, if the input for a candidate
match at the node is coming from a multiple fanout node,
we divide the average power contribution of that input by
the fanout count of the input node. By reducing the average
power contribution, we favor a solution in which multiple
fanout nodes are preserved after mapping, which reduces logic
duplication and improves the final mapped average power.
This heuristic, which permits tree boundary crossing only for
nodes with relatively few fanouts, was also adopted by the
MIS mapper [7]. During the gate selection step, if we come to
a node that has already been mapped, we check if the mapped
solution at the node satisfies the timing requirement. If so,
we keep the mapping; otherwise, we replace it with another
solution from the power-delay curve that satisfies the current
timing requirement and has minimum average power.
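Both DAG heuristics can be sketched in a few lines. The function names and the point representation, an (arrival_time, power, gate) tuple, are assumptions made for this illustration.

```python
def input_power_contribution(accumulated_power, fanout_count):
    """Power charged to one candidate match for an input node.

    For a multiple fanout input, the accumulated power is divided by the
    fanout count so that each fanout branch carries an equal share.
    """
    return accumulated_power / max(1, fanout_count)


def reselect_if_needed(curve, current, required_time):
    """Gate selection at a node that may already have been mapped.

    curve: list of (arrival_time, power, gate) points for the node.
    current: the point chosen earlier, or None if the node is unmapped.
    Keeps the earlier choice when it meets the timing requirement; otherwise
    returns the minimum power point that does.
    """
    if current is not None and current[0] <= required_time:
        return current
    feasible = [pt for pt in curve if pt[0] <= required_time]
    if not feasible:
        raise ValueError("no point on the curve meets the required time")
    return min(feasible, key=lambda pt: pt[1])


share = input_power_contribution(12.0, 3)
curve = [(2.0, 9.0, "g1"), (3.0, 6.0, "g2")]
kept = reselect_if_needed(curve, (3.0, 6.0, "g2"), required_time=3.5)
replaced = reselect_if_needed(curve, (3.0, 6.0, "g2"), required_time=2.5)
```

In the second call the earlier choice g2 arrives too late for the new requirement, so it is replaced by the faster but more power-hungry g1.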
V. EXPERIMENTAL RESULTS
Table I shows the ratio of the internal power consumption
(due to the parasitic source/drain capacitances) to the power
consumption due to the external gate capacitances for a subset
of ISCAS-89 and MCNC-91 benchmarks. It is seen that if
internal power consumption is disregarded, then the power
consumption will be underestimated by an average of 16.9%.
To evaluate the effectiveness of the technology decomposition
and mapping for low power, we applied our algorithms
on the above benchmarks. Static CMOS circuits were used in
the experiments and all primary inputs were assumed to be
spatially and temporally independent. The delay was calculated
using the augmented pin-dependent SIS library delay model as
described in Section IV. We used an industrial library with
44 gates.
Table II shows the total switching activity of the decomposed
networks for different technology decomposition
methods. It is shown that the low power technology
decomposition reduces the total switching activity in the networks
by 5% over the conventional balanced-tree decomposition
method.
TABLE I
INTERNAL POWER CONSUMPTION VERSUS EXTERNAL POWER CONSUMPTION. EXTERNAL: POWER CONSUMPTION DUE TO GATE CAPACITANCE, INTERNAL: POWER CONSUMPTION DUE TO INTERNAL CAPACITANCE

circuit   external   internal   (internal/external)*100
C1908     207.0      49.0       23.6
C432      92.0       18.0       19.5
alu2      132.0      31.0       23.4
apex7     107.0      17.0       15.8
cordic    34.5       5.3        15.3
ex2       134.0      18.0       13.4
pair      760.0      128.0      16.8
parity    34.7       7.2        20.7
s349      59.4       12.3       20.7
s386      41.8       4.8        11.4
s400      75.8       12.2       16.0
s420      72.0       12.4       17.2
s444      76.4       12.4       16.2
s510      120.3      21.5       17.8
s641      79.0       10.8       13.6
s713      80.3       10.7       13.3
s832      122.7      18.5       15.0
x1        147.3      20.5       13.9
x3        353.0      58.1       16.4
avg.                            16.9

TABLE II
THE TOTAL SWITCHING ACTIVITY IN THE NETWORK FOR DIFFERENT DECOMPOSITION SCHEMES. BALANCED: BALANCED DECOMPOSITION, POWER EFFICIENT: POWER EFFICIENT DECOMPOSITION

circuit   balanced   power efficient   % reduction
C1908     672        657               2.23
C432      335        318               5.07
alu2      422        389               7.82
apex7     394        385               2.28
cordic    119        110               7.56
ex2       473        457               3.38
pair      2628       2491              5.21
parity    110        110               0
s349      207        206               0.48
s386      125        109               12.80
s400      247        239               3.24
s420      275        258               6.18
s444      252        245               2.78
s510      410        392               4.39
s641      300        282               6.00
s713      309        284               8.09
s832      385        352               8.57
x1        499        453               9.22
x3        1224       1192              2.61
avg.                                   5.15

TABLE III
TECHNOLOGY DECOMPOSITION AND MAPPING RESULTS. A: ad-map WITH BALANCED DECOMPOSITION; B: pd-map WITH BALANCED DECOMPOSITION

circuit   A area   A delay   A power   B area   B delay   B power
C1908     6091     26.1      282       6945     25.4      256
C432      2278     30.7      131       2429     30.1      110
alu2      4266     26.0      200       5246     29.0      163
apex7     2734     11.7      142       2990     11.6      124
cordic    820      6.4       47        921      6.6       39
ex2       3822     11.3      176       4371     9.8       152
pair      18380    28.2      1001      21583    29.8      888
parity    853      7.0       52        795      6.1       41
s349      1637     12.6      83        1753     13.1      71
s386      1579     11.4      64        2031     11.7      46
s400      1829     10.8      98        2056     11.3      88
s420      1785     15.2      98        2139     18.4      84
s444      1819     11.0      101       1993     12.2      88
s510      2869     20.4      164       3376     15.8      141
s641      2137     21.1      108       2297     20.2      89
s713      2119     21.5      108       2264     20.0      91
s832      3450     14.3      178       3941     11.0      141
x1        3683     9.4       204       4025     8.1       167
x3        9200     13.4      485       10509    11.5      411

TABLE IV
TECHNOLOGY DECOMPOSITION AND MAPPING RESULTS. C: pd-map WITH minpower-tdecomp; D: pd-map WITH bhminpower-tdecomp

circuit   C area   C delay   C power   D area   D delay   D power
C1908     7226     26.8      249       7058     25.6      254
C432      2506     30.3      106       2456     29.1      110
alu2      5451     29.0      161       5375     28.5      159
apex7     3123     11.9      123       3090     11.5      124
cordic    866      8.3       37        879      7.2       38
ex2       4522     10.2      150       4426     12.2      150
pair      21803    29.6      867       21896    29.7      878
parity    795      6.1       41        795      6.1       41
s349      1754     13.1      71        1765     13.1      72
s386      2114     11.1      45        2036     10.5      45
s400      2112     11.1      86        2034     11.4      87
s420      2290     19.4      81        2243     18.6      81
s444      2043     12.1      88        1999     12.1      88
s510      3442     15.2      138       3344     15.5      138
s641      2512     19.7      86        2372     20.1      88
s713      2513     21.8      87        2337     20.1      90
s832      4223     11.3      134       4024     10.6      138
x1        4293     8.1       159       4125     8.9       161
x3        10691    14.2      406       10643    12.7      409
Tables III and IV contain our experimental results using different
technology decomposition and mapping combinations.
All methods have the same starting point, that is, circuits
optimized by the SIS rugged script [18]. Method A uses the
area-delay mapping (ad-map) algorithm of [5] and methods B
to D use the power-delay mapping (pd-map) algorithm. Methods
A and B take in a NAND-decomposed network generated
by the conventional balanced-tree decomposition algorithm.
Method C uses the MINPOWER technology decomposition
(minpower-tdecomp), while method D uses the BOUNDED-
HEIGHT MINPOWER decomposition (bhminpower-tdecomp).
To see the impact of pd-map on the average power
consumption, we compare the results of methods A and B. It is
seen that with pd-map, the power consumption is improved
by an average of 15.9% over ad-map. The active cell area is
increased by an average of 12.5% while the circuit delay is
improved by 2.8%.
To illustrate the impact of minpower-tdecomp on
the average power consumption, we compare the results of
methods B and C. The power consumption is improved by
an average of 2.5% at the expense of a 3.6% increase in area and
a 3.4% degradation in performance. From B and D, we see that
bhminpower-tdecomp improves the power consumption by
an average of 1.2% over the conventional decomposition at
the expense of a 1.2% increase in area and a 1.8% degradation
in performance (note that bhminpower-tdecomp improves
performance by an average of 1.3% over minpower-tdecomp).
The best overall result is achieved when pd-map is applied
after the network is decomposed using minpower-tdecomp.
The average power consumption is reduced by an average of
17.9% at the expense of a 16.5% increase in area while the
average circuit delay is almost unchanged.
Comparing Tables I1 and IV , we observe that although the
reduction in power consumption after mapping is not directly
proportional to the reduction in the switching activities of
the network before mapping, networks with lower switching
activity often result in a mapped circuit with lower power
consumption. We believe the small gain of minpower-tdecomp
is due to the fact that most nodes in the optimized network
are relatively simple because of the fast extract and quick
decomposition operations performed prior to the technology
decomposition step. Therefore, minpower-tdecomp does
not have much freedom in improving the power efficiency
through NAND decomposition.
It is noteworthy that in most cases, in order to reduce the
power consumption, circuit area is increased. The explanation
is as follows. To minimize power consumption, pd-map hides
the high switching nodes inside the complex gates and/or
reduces the load capacitance of these nodes. In general, these
two operations do not produce a minimum area mapping.
To see how power consumption is actually reduced by
pd-map, we selected the s832 circuit and plotted the number
of nets and the average loading on nets versus the switching
rate. Figs. 6 and 7 summarize the results. From Fig. 6, we
see that pd-map tends to reduce the number of high switching
activity nets at the expense of increasing the number of low
switching activity nets. From Fig. 7, we see that for the
remaining high switching activity nets, pd-map tries to reduce
the average loading on nets. By doing so, pd-map minimizes
the total weighted switching activities and hence the total
power consumption in the circuit.
VI. DISCUSSIONS AND EXTENSIONS
We assumed a zero-delay model (which does not account
for power consumption due to glitches) during the technology
mapping phase. This shortcoming can be overcome by using a
real delay model (such as the SIS pin-dependent library delay
model) when calculating the expected number of transitions.
Using a real delay model, however, has several drawbacks.
The first is the huge computational effort to calculate the
possible glitches for each node. In [9], symbolic simulation
is used to produce a set of Boolean functions that represent
conditions for switching at each gate in the circuit at a
specific time instance. This procedure is exact but
requires exponential computation time and storage space. In
[20], a faster method of estimating the power consumption
including glitches is described using the notion of transition
probability. The transition probabilities are propagated from
the primary inputs to the circuit outputs using a linear time
algorithm. Although the latter method significantly reduces the
computation time and space requirement, it is still costly.
Fig. 6. Number of nets versus switching rate for s832 using ad-map versus pd-map.
Fig. 7. Average load per net versus switching rate for s832 using ad-map versus pd-map.
The second drawback is that under a real delay model, the
dynamic programming-based tree-mapping algorithm is not
guaranteed to find an optimum solution, even for a tree. The
mapping algorithm assumes that the current best solution is
derived from the best solutions stored at the fanin nodes of
the matching gate. This is true for power estimation under a
zero delay model, but not for that under a real delay model.
Consider Fig. 8 as an example. Let S1 and S2 be two mapping
candidates at n with the same delay. Let n have a larger
$E_n$(switching) value for S1. It appears that S2 is superior
to S1 and thus S1 is dropped from the power-delay curve
(Fig. 8(a)). However, when doing mapping at m, the mapped
parent of n, due to the difference in the transition waveform
timing for S1 and S2, the best solution at m may be coming
from S1 instead of S2 (Fig. 8(b)). So, if S1 is dropped from
the power-delay curve, the optimal solution may not be found.
The reason is that glitches depend on the relative timing of
various events on the transition waveforms of the gate inputs.
It is very expensive to store all partial solutions, so we have to
drop the locally inferior solutions. The dynamic programming
approach then acts only as a heuristic (even for trees).
Fig. 8. An example to illustrate complications due to a real delay model.
Another assumption we made was that primary inputs
are spatially uncorrelated. If this assumption is relaxed, we
must use the conditional probabilities to calculate the signal
probability at the outputs of intermediate nodes. It is, however,
too costly to consider conditional probabilities among all
subsets of primary inputs. Instead, we can consider only the
pairwise conditional probabilities and use the correlation
coefficient method proposed by Ercolani et al. [8] to approximate
the conditional probabilities among other subsets of primary
inputs.
Several extensions can be made to pd-map to improve
its effectiveness and versatility. First, pin permutation can be
considered during the gate matching procedure. In general,
library gates have pins that are functionally equivalent, which
means that inputs can be permuted on those pins without
changing the function of the gate output. These equivalent pins
may have different input pin loads and pin-dependent delays.
If high switching activity inputs are matched with pins that
have low input load, the power consumption can be reduced.
In particular, the internal power consumption depends on
the switching activities and the pin assignment of the input
signals. To find the minimum power pin assignment for gate g
matching at node n, we must solve the following optimization
problem:
$j \in$ internal-nodes(g)
where PP(n, g) is the set of all valid pin permutations
for gate g matching at node n, pins(g) is the set of pins
of g, $\pi(i)$ is the input that is mapped to pin i under pin
permutation $\pi$, and $E_j$(charge/discharge, $\pi$) is the expected
number of charge/discharge events for an internal node j under
pin permutation $\pi$. The optimum solution can be found by
enumerating all valid pin permutations. This is not so costly
since the number of pins for typical library gates is small (< 6).
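The enumeration can be sketched as below. The cost used here is an assumption made for illustration: each pin contributes its input capacitance times the switching activity of the signal assigned to it, and the internal-node contribution is supplied by a caller-provided function that defaults to zero. All pins are treated as functionally equivalent, which holds for gates such as NAND but not for every library gate.

```python
from itertools import permutations


def best_pin_assignment(pin_caps, activities, internal_cost=lambda perm: 0.0):
    """Exhaustively search pin permutations for the lowest assumed power cost.

    pin_caps[i]: input capacitance of pin i.
    activities[k]: switching activity of input signal k.
    internal_cost(perm): hypothetical internal-node cost of a permutation,
        where perm[i] is the index of the signal assigned to pin i.
    Returns (cost, perm) for the cheapest permutation.
    """
    best = None
    for perm in permutations(range(len(pin_caps))):
        cost = sum(pin_caps[i] * activities[perm[i]]
                   for i in range(len(pin_caps)))
        cost += internal_cost(perm)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best


# Three equivalent pins with increasing input load, three input signals.
cost, perm = best_pin_assignment([1.0, 2.0, 3.0], [0.5, 0.1, 0.3])
```

As expected, the most active signal is placed on the pin with the smallest input load and the least active signal on the pin with the largest load.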
The pin permutation can be directly incorporated in the
technology mapping process as follows. For each match g, all
equivalent pin permutations are generated and the corresponding
power and delay costs are calculated. Permutations that
result in inferior power-delay trade-off points will be dropped.
Permutations that result in non-inferior solutions are stored in
the power-delay trade-off curve along with the corresponding
power and delay values. This may lead to a large number of
points on the trade-off curve. In that case, we can restrict the
number of points on the curves to a user defined number M
and thus manage the space complexity (see Section 4.5).
The concept of the power-delay trade-off curve can be extended
to include the area trade-off. Instead of generating a set
of (power, delay) values, the trade-off curve can consist
of a set of (power, delay, area) values. One drawback of
the three dimensional trade-off space is that the number of
points that must be generated in order to find good mapping
solutions becomes very large. A balance between optimality
and computation cost has to be reached in order to determine
how many trade-off points should be stored.
VII. CONCLUDING REMARKS
We first described an extended power consumption model
that includes the power consumption at the internal nodes of
a gate. We then showed how to calculate the internal power
consumption based on the notion of charging and discharging
probabilities and transistor graphs. Experimental results show
that the internal power consumption is about 16.5% of the
power consumption due to gate capacitances and hence is
significant. Based on the extended power consumption model,
we presented algorithms for low power technology decomposition
and mapping. In particular, we generate networks with
minimum total switching activity and feed them to a delay
constrained power efficient technology mapper that hides the
highly active nodes inside the mapped gates. Experimental
results show that significant reduction in power consumption
is achieved without performance degradation.
Technology mapping for low power under a real delay
model was also addressed, where we showed that even for
trees, optimal mapping cannot be obtained. We also
considered signal-to-pin assignment for low power.
REFERENCES
[1] MOSIS data sheet, Information Sciences Institute, University of Southern California, 1992.
[2] R. K. Brayton, G. D. Hachtel, and A. L. Sangiovanni-Vincentelli, "Multilevel logic synthesis," Proc. IEEE, vol. 78, pp. 264-300, Feb. 1990.
[3] R. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput., vol. C-35, pp. 677-691, Aug. 1986.
[4] R. Burch, F. Najm, P. Yang, and D. Hocevar, "Pattern independent current estimation for reliability analysis of CMOS circuits," in Proc. 25th Design Automation Conf., June 1988, pp. 294-299.
[5] K. Chaudhary and M. Pedram, "A near-optimal algorithm for technology mapping minimizing area under delay constraints," in Proc. 29th Design Automation Conf., June 1992, pp. 492-498.
[6] M. A. Cirit, "Estimating dynamic power consumption of CMOS circuits," in Proc. IEEE Int. Conf. Computer-Aided Design, Nov. 1987, pp. 534-537.
[7] E. Detjens, G. Gannot, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, "Technology mapping in MIS," in Proc. IEEE Int. Conf. Computer-Aided Design, Nov. 1987, pp. 116-119.
[8] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, "Estimate of signal probability in combinational logic networks," in Proc. European Test Conf., 1989, pp. 132-138.
[9] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in Proc. 29th Design Automation Conf., June 1992, pp. 253-259.
[10] D. A. Huffman, "A method for the construction of minimum redundancy codes," in Proc. IRE, vol. 40, pp. 1098-1101, Sept. 1952.
[11] D. S. Parker, Jr., "Conditions for optimality of the Huffman algorithm," SIAM J. Computing, 9(3):470-489, August 1980.
[12] K. Keutzer, "Dagon: technology binding and local optimization by DAG matching," in Proc. 24th Design Automation Conf., pp. 341-347, June 1987.
[13] L. Larmore and D. S. Hirschberg, "A fast algorithm for optimal length-limited Huffman codes," J. Assoc. for Computing Machinery, vol. 37, no. 3, pp. 464-473, 1990.
[14] F. Najm, "Transition density, a stochastic measure of activity in digital circuits," in Proc. 28th Design Automation Conf., June 1991.
[15] F. N. Najm, R. Burch, P. Yang, and I. Hajj, "Probabilistic simulation for reliability analysis of CMOS VLSI circuits," IEEE Trans. Computer-Aided Design, vol. 9, no. 4, pp. 439-450, Apr. 1990.
[16] A. Papoulis, Probability, Random Variables and Stochastic Processes. McGraw-Hill, 1984.
[17] S. C. Prasad and K. Roy, "Circuit activity driven multilevel logic optimization for low power reliable operation," in Proc. European Design Automation Conf., pp. 368-372, Feb. 1993.
[18] H. Savoj and H. Y. Wang, "Improved scripts in MIS-II for logic minimization of combinational circuits," in Proc. Int. Workshop on Logic Synthesis, May 1991.
[19] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On average power dissipation and random pattern testability of CMOS combinational logic networks," in Proc. IEEE Int. Conf. Computer-Aided Design, Nov. 1992, pp. 402-403.
[20] C. Y. Tsui, M. Pedram, and A. Despain, "Efficient estimation of dynamic power consumption under real delay model," in Proc. IEEE Int. Conf. Computer-Aided Design, Nov. 1993, pp. 224-228.
[21] C. Y. Tsui, M. Pedram, and A. Despain, "Exact and approximate methods for calculating signal and transition probabilities in FSMs," in Proc. 31st Design Automation Conf., pp. 18-23, June 1994.
[22] H. Vaishnav and M. Pedram, "Pcube: A performance driven placement algorithm for low power designs," in Proc. EURO-DAC, pp. 72-77, Sept. 1993.
[23] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid State Circuits, vol. SC-19, pp. 468-473, Aug. 1984.
Chi-ying Tsui received his B.Sc. degree in electrical engineering from the University of Hong Kong in 1982 and the M.Sc. degree in computer engineering from the University of Southern California in 1989. Currently, he is a Ph.D. candidate in computer engineering at the University of Southern California working on VLSI design and CAD algorithms for high performance low power microprocessor design. His research interests include power analysis and optimization for CMOS circuits and hardware/software co-design for high performance low power processors.
Massoud Pedram (S'88-M'90) is an assistant professor of Electrical Engineering - Systems at the University of Southern California. He received his B.S. degree in Electrical Engineering from the California Institute of Technology in 1986 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 1989 and 1991, respectively. Dr. Pedram received the best paper award in the CAD track at the IEEE Int'l Conf. on Computer Design (1989) and the National Science Foundation's Research Initiation Award (1992). He has served on the Technical Program Committee of the ACM/IEEE Design Automation Conference since 1992 and was the General Chair of the 1994 International Workshop on Low Power Design. His research interests and contributions have been in the areas of logic synthesis, physical design and CAD for low power.
Alvin Despain (S'58-M'65) received his B.S. (1960), M.S. (1962), and Ph.D. (1966) degrees in Electrical Engineering from The University of Utah. He has been an assistant research professor at The University of Utah, an associate professor at Utah State University, a visiting associate professor at Stanford University, a professor at The University of California at Berkeley and has been at USC since 1989. Dr. Despain is a pioneer in the study of high performance computer systems for symbolic calculations. His research group builds experimental software and hardware systems including compilers, custom VLSI processors, and multiprocessor systems. The goal is to determine principles for the design of high performance computer systems. His research interests include computer architecture, multiprocessor and multicomputer systems, logic programming, and design automation. Dr. Despain is a member of IEEE, ACM, and AAAI.