EMPIRE: Empirical power/area/timing models for register files

Microprocessors and Microsystems 33 (2009) 295–300


Praveen Raghavan a,b,*, Andy Lambrechts a,b, Murali Jayapala a, Francky Catthoor a,b, Diederik Verkest a,b

a NES, IMEC vzw, 3001 Heverlee, Belgium

b ESAT, K.U. Leuven, VUB Brussels, Leuven 3000, Belgium

Article info

Article history: Available online 20 February 2009

Keywords: Power modeling, Area, Timing, Register file, Processor

0141-9331/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.micpro.2009.02.009

* Corresponding author. Address: NES, IMEC vzw, 3001 Heverlee, Belgium. E-mail address: [email protected] (P. Raghavan).

1 Switching activity includes all the switching inside the register file and its interface: word lines, bit lines, decoders, buffers, clock tree etc.

Abstract

With the growth of the embedded devices consumer market, power-efficient hardware is needed. Therefore, power-aware architectural exploration is one of the most crucial design steps. For such an exploration procedure, it is important to accurately model the power consumption of all main components of the embedded system. Registers and register files are among the highest power consumers of any programmable processor, but there is a lack of accurate and publicly available models. This paper provides such a power model for standard-cell-based register files for 130 and 90 nm technologies. The proposed model provides dynamic power, leakage power, area and timing information for register files in terms of key parameters like width, depth, activity, ports, and capacitive loading. It is shown that current models capture neither the correct absolute nor the correct relative trends present in register files. It is also shown that some key but often neglected parameters, like switching activity and load, have a larger influence for some register file sizes than for others. Therefore, using the Empire model, accurate architectural exploration is possible.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Growth of the embedded consumer market has been fueled by new possibilities provided by the ever-growing number of available transistors in technologies still following Moore's law. Architecture exploration allows the designer to push the limits of performance and energy efficiency and is therefore an important design step.

In the domain of embedded processors as well as general-purpose computing, power/energy has become one of the biggest design constraints. It has been shown in Ref. [1] that the register file is one of the main energy consumers in programmable processors. Register files are widely used in wireless and multimedia processors, ranging from VLIWs to streaming processors like Stream [2]. Processors like AD's BlackFin, TI's C6x series and Trimedia's TM3270 [3] are widely used VLIWs for multimedia applications.

There is a strong need for accurate power modeling of all processor components. Various power models like CACTI [4] and WATTCH [5] have been used to model the register file. However, for current technologies (130 nm or smaller) their absolute and relative accuracy is insufficient, and these models fail to take the switching activity¹ of the register file into account. Models like [6,7] and others do take the activity into account but do not model other essential parameters like leakage, which is growing as technology scaling continues.

In this paper, we introduce an empirical formula for computing dynamic power, leakage power, timing and area while taking into account the switching activity in both the address and data lines. Such an activity factor is crucial as it significantly influences the energy per access. For more accuracy, the influence of the load on the register file drivers on the energy consumption is taken into account. A separate model for this load, using the same design flow and libraries, is provided. The energy estimates presented in this paper are derived after complete layout, and simulations are performed after the parasitic capacitances have been extracted. This significantly improves the accuracy of the proposed power model for 130 and 90 nm register files.

Hence the contributions of this work are as follows:

1. Energy/access model for scaled technologies, characterized with low-level simulations (after layout).

2. Model incorporating switching activity.

3. Model taking into account load on the ports.

This paper is organized as follows: Section 2 presents the state of the art in power modeling for register files. Section 3 introduces the proposed framework for measurement and for deriving the model. Section 4 presents the proposed model for power, area and timing. Section 5 compares the proposed model against the state of the art in terms of accuracy. Sections 6 and 7 analyze the model and its scalability. Finally, Section 8 concludes this paper.


2. Related work

A lot of effort has been spent on developing models for register files, but only a few power models have been made public. WATTCH [5] is one of the most widely used power models, but the accuracy of the absolute numbers from the different versions of WATTCH is poor for current technologies, as it has been scaled outside of its intended range. Also, WATTCH does not take into account the activity of the data while modeling. It is shown in Section 5 that the energy consumption of register files heavily depends on the activity in the address and the data lines. Other models like [8,7] suffer from the same disadvantages: they do not consider activity and provide poor absolute numbers.

The model in [6] does take activity into account but does not model leakage, area and timing, which are modeled in this paper. Although the models in [6,4,8] are analytical, it is not clear if they have been verified against or have been derived from physical designs. The power model presented in [9] models only memories and does not support multiple access ports, a feature that is especially needed for VLIWs.

Analytical modeling of register files is typically done by separately modeling different components of the register file (decoder, bit lines, word lines, etc.). Such a characterization is becoming more difficult with scaling. As we scale down the technology, a large number of deep-submicron (DSM) issues need to be addressed. Therefore, the range of validity of these analytical models is limited.

It can be seen from [10] that driving interconnect in scaled technologies is one of the largest contributors to the net energy consumption of a SoC. Inside the processor, too, the energy cost of interconnect is increasing. Hence it is crucial that the model also accounts for the output load, i.e. the interconnect that has to be driven. To the best of our knowledge, there is no published model that takes the loading of the ports into account when estimating the energy consumption.

2 Hamming distance is defined as the number of bits that switch (from either 1 to 0 or from 0 to 1) in the data lines as well as the address lines from one cycle to another.

3. Modeling procedure

To obtain the most accurate model, data has to be collected by measurement of a physical implementation of a chip [11]. To be of practical use, a model has to be parameterizable over a wide range of different register files, varying in width, depth and number of ports, while maintaining accuracy. This accuracy can only be reached by using instances over the complete range, leading to a high number of required measurements. Taping out and fabricating many different register files for measurement is not practical or feasible due to cost constraints. High accuracy can, however, be obtained by doing a low-level simulation of a complete design. Therefore the different register files were designed and their power was estimated using low-level simulations. This takes into account the layout details and the detailed transistor characterization provided by the standard cell library. Fig. 1 shows the complete flow of the proposed modeling technique to obtain the power, area and timing model for register files. Optimized Synopsys DesignWare components from the UMC130 nm and UMC90 nm libraries were taken to make the register file. Layouts were generated for register files with a number of ports varying from 3 to 12, a depth varying from 4 to 64 words and a width varying from 8 to 128 bits. All these combinations of register files were designed, and the parasitic capacitances in the routing wires and the gate capacitances of each transistor were extracted from the layouts. The extracted netlist was then simulated in ModelSim with different activity factors to obtain the final power, timing and area estimates. These values were then used to perform curve fitting to obtain empirical formulae for the register file.

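To keep the design modular, the basic register file was modeled for a fan-out-of-four (FO4) load connected to the read and write ports of the register file. On top of this, an extra load is modeled using a parameterized buffer chain, using the same design libraries, for both 130 and 90 nm nodes.

$$
\begin{aligned}
E/\mathrm{Access}\ (\text{in J}) = \big[\, & 7.32\times10^{-12} - 3.12\times10^{-13}\,D - 1.97\times10^{-13}\,W \\
& - 8.88\times10^{-13}\,P + 8.95\times10^{-13}\,D\,W \\
& + 2.26\times10^{-14}\,D\,P + 1.74\times10^{-14}\,W\,P \\
& + 1.65\times10^{-15}\,D^{2} + 7.04\times10^{-16}\,W^{2} \\
& + 2.11\times10^{-14}\,P^{2} \,\big] \times (HD/P)
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
\text{Leakage Power}\ (\text{in } \mu\text{W}) = {} & 1.24\times10^{2} - 4.44\,D - 3.49\,W \\
& - 1.96\,P + 1.85\times10^{-1}\,D\,W \\
& + 4.96\times10^{-1}\,D\,P + 3.68\times10^{-1}\,W\,P \\
& + 1.5\times10^{-1}\,D^{2} + 1.08\times10^{-2}\,W^{2} \\
& + 5.86\times10^{-1}\,P^{2}
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
\text{Area}\ (\text{in } \mu\text{m}^{2}) = {} & 3.29\times10^{4} - 1.09\times10^{3}\,D - 8.83\times10^{2}\,W \\
& - 5.55\times10^{3}\,P + 5.35\times10^{1}\,D\,W \\
& + 1.42\times10^{2}\,D\,P + 3.68\times10^{2}\,W\,P \\
& + 1.50\times10^{-2}\,D^{2} + 1.08\times10^{-2}\,W^{2} \\
& + 5.86\times10^{-1}\,P^{2}
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\text{Timing}\ (\text{in ns}) = {} & 2.11\times10^{-1} + 1.76\times10^{-2}\,D + 7.40\times10^{-3}\,W \\
& + 4.7\times10^{-1}\,P + 4.98\times10^{-4}\,D\,W \\
& + 1.24\times10^{-3}\,D\,P + 1.07\times10^{-3}\,W\,P \\
& - 2.79\times10^{-4}\,D^{2} - 4.79\times10^{-5}\,W^{2} \\
& + 2.42\times10^{-2}\,P^{2}
\end{aligned}
\tag{4}
$$

Equation Set 1: UMC90 nm register file model equations.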

4. Analytical power model description

After completing over 100 register file designs for each technology node, the dynamic energy, leakage energy, timing, and area for each design were tabulated. Curve fitting was performed on each variable using depth, width and ports, as well as the activity factor, as independent input variables. Additional loads have been modeled separately to make the model more complete. A quadratic curve fit gives an acceptable error and hence is used to construct the empirical formulae. For all the designs, it is assumed that each of the ports of the register file drives a load of FO4. The energy consumption required for driving the extra load on top of the FO4 is modeled by a chain of buffers and is given by Eqs. (9) and (10) for the two technologies. It is assumed that the load driven by the register file is not so large that it cannot meet the timing constraints. The energy consumption of the clock tree network inside the register file has also been incorporated in the model.
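The quadratic fit described in this paragraph can be sketched as an ordinary least-squares problem over ten basis terms (a constant, D, W, P, the three cross products and the three squares), which is the shape shared by Eqs. (1)–(8). The sketch below is a minimal pure-Python illustration, not the authors' tooling: the paper does not say which fitting tool was used, the sample grid is an assumption based on the stated parameter ranges, and the "measurements" are synthetic values generated from made-up coefficients.

```python
from itertools import product


def features(d, w, p):
    """Quadratic basis in depth, width and ports (10 terms)."""
    return [1.0, d, w, p, d * w, d * p, w * p, d * d, w * w, p * p]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(samples):
    """Least-squares fit via the normal equations (A^T A) x = A^T y.

    samples: list of ((depth, width, ports), measured_value) pairs.
    """
    rows = [(features(*cfg), y) for cfg, y in samples]
    n = len(rows[0][0])
    ata = [[sum(f[i] * f[j] for f, _ in rows) for j in range(n)] for i in range(n)]
    aty = [sum(f[i] * y for f, y in rows) for i in range(n)]
    return solve(ata, aty)


def predict(coeffs, d, w, p):
    """Evaluate a fitted quadratic model at one configuration."""
    return sum(c * f for c, f in zip(coeffs, features(d, w, p)))


# Made-up coefficients standing in for real measurements (hypothetical).
TRUE = [0.2, 0.016, 0.017, 0.4, 6e-4, 1e-3, 1.6e-3, -1.7e-4, -2.4e-4, -1.7e-2]

# Assumed sample grid: 5 depths x 5 widths x 4 port counts = 100 designs.
GRID = product((4, 8, 16, 32, 64), (8, 16, 32, 64, 128), (3, 6, 9, 12))
samples = [((d, w, p), predict(TRUE, d, w, p)) for d, w, p in GRID]

fitted = fit_quadratic(samples)
```

With noise-free synthetic data the fitted model reproduces the generating model closely; with real measurements the residual of the fit is what determines whether a quadratic form is acceptable.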

As a first-order approximation, and to keep the problem tractable, it is assumed that the energy/access scales linearly with the Hamming distance between consecutively read/written words.² This assumption is valid as long as the amount of logic in the register file between the bit cell and the read port is small. Because there are only storage elements and few sequential logic elements inside the register file, a linear approximation of dynamic energy with the Hamming distance is valid. This assumption was validated by simulations with different Hamming distances.
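The Hamming distance used as the activity factor can be computed from a trace of consecutive accesses. The sketch below is illustrative only: it treats each access as a single integer holding the concatenated address and data bits, and it normalizes the toggle count by the number of bits so that the result lies between 0 and 1, which is how HD is quoted elsewhere in the paper (e.g. HD = 0.2). That normalization and the trace encoding are assumptions, not something the paper specifies.

```python
def hamming_distance(prev: int, curr: int) -> int:
    """Number of bit positions that differ between two words."""
    return bin(prev ^ curr).count("1")


def normalized_hd(trace, bits_per_access):
    """Average fraction of bits that toggle between consecutive accesses.

    trace: list of non-negative ints, each holding the address and data
           bits of one access (an assumed encoding, not from the paper).
    bits_per_access: total number of address + data bits per access.
    """
    if len(trace) < 2:
        return 0.0
    toggles = sum(hamming_distance(a, b) for a, b in zip(trace, trace[1:]))
    return toggles / (bits_per_access * (len(trace) - 1))


# Three 4-bit accesses: 0000 -> 1111 toggles 4 bits, 1111 -> 1010 toggles 2.
trace = [0b0000, 0b1111, 0b1010]
activity = normalized_hd(trace, bits_per_access=4)
```

A value obtained this way could then be passed as HD to the energy equations, under the same normalization assumption.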

Fig. 1. Power estimation framework. [Flow diagram; recoverable block labels: Optimized VHDL Code; UMC130nm and UMC90nm Library; Synopsys Physical Compiler; Optimized Layout; Parasitic Extraction; Backannotated Netlist; Activity Specification in Testbench; ModelSim Simulation; Area Estimate; Timing Estimate; Energy/Access; Buffer-Chain Energy Estimate; Empirical Area Formulae; Empirical Timing Formulae; Empirical Energy Formulae; Buffer Chain Energy Model; parameters: Depth, Width, Ports, Hamm. Dist, Cap. Load.]


For the empirical formulae presented in Sections 4.1 and 4.2 and given in Equation Sets 1 and 2, the following definitions are used:

D: number of words in the register file,

W: number of bits in one word,

P: total number of ports (Read + Write),

HD: total number of bits that switch (either from 1 to 0 or from 0 to 1) on the data and address lines from one read/write cycle to another,

LD: capacitive load on each port of the register file.

4.1. UMC90 nm register file

After applying curve fitting on the empirical data, the formulae given in Equation Set 1 were obtained for a register file in 90 nm at a Vdd of 0.9 V. The energy/access/port is given by Eq. (1). Leakage power, area and timing are given by Eqs. (2)–(4), respectively, and the energy per access for each variation in 90 nm at a Vdd of 0.9 V is illustrated in Fig. 2.

Fig. 2. Energy/access for a 90 nm register file with HD = 0.2. [Bar chart titled "90nm Register File": Energy/Access (0.00E+00 to 6.00E-11) versus Depth (4, 8, 16, 32, 64) and Width (8, 16, 32, 64), grouped by 3, 6, 9 and 12 ports.]

4.2. UMC130 nm register file

The same experiment has been performed for the UMC130 nm technology at a Vdd of 1.2 V. The corresponding formulae are given in Equation Set 2. The energy/access/port for a register file in 130 nm is given by Eq. (5). Leakage power, area and timing are given by Eqs. (6)–(8), respectively. Likewise, the energy per access for each variation in 130 nm at a Vdd of 1.2 V is displayed in Fig. 3.
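As an illustration of how the 130 nm formulae are meant to be used, the sketch below evaluates the energy/access/port of Eq. (5) for one configuration. The coefficients are transcribed from Eq. (5) as printed; the function and variable names are hypothetical, and HD is assumed to be the normalized activity quoted in the paper (e.g. 0.2). No claim is made that the resulting number matches the plotted values.

```python
# Coefficients of Eq. (5) (UMC 130 nm, 1.2 V), in joules, as printed.
# Order: 1, D, W, P, D*W, D*P, W*P, D^2, W^2, P^2
ENERGY_130NM = (
    2.23e-11, -8.06e-13, -5.89e-13, -3.35e-12,
    2.06e-14, 7.57e-14, 6.34e-14,
    2.48e-15, 9.93e-16, 8.72e-14,
)


def quadratic(coeffs, depth, width, ports):
    """Evaluate the ten-term quadratic form shared by Eqs. (1)-(8)."""
    terms = (
        1.0, depth, width, ports,
        depth * width, depth * ports, width * ports,
        depth ** 2, width ** 2, ports ** 2,
    )
    return sum(c * t for c, t in zip(coeffs, terms))


def energy_per_access_per_port(depth, width, ports, hd, coeffs=ENERGY_130NM):
    """Eq. (5): bracketed quadratic scaled by HD / P."""
    return quadratic(coeffs, depth, width, ports) * (hd / ports)


# Example: 32 words x 32 bits, 6 ports, HD = 0.2 (assumed normalized).
e_02 = energy_per_access_per_port(depth=32, width=32, ports=6, hd=0.2)
e_04 = energy_per_access_per_port(depth=32, width=32, ports=6, hd=0.4)
```

Because HD enters only as a multiplicative factor, doubling HD doubles the result, which is the linear-activity assumption of Section 4 in code form. Leakage, area and timing would reuse `quadratic` with the coefficients of Eqs. (6)–(8) and without the HD/P factor.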

5. Validation

To validate the curve-fitted empirical formulae presented inSection 4, they were compared against the actual implementations.


Fig. 3. Energy/access for a 130 nm register file with HD = 0.2. [Bar chart titled "130nm Energy/Access": Energy/Access (0.00E+00 to 1.40E-10) versus Depth (4, 8, 16, 32, 64) and Width, grouped by 3, 6, 9 and 12 ports.]

Table 1
Averaged error of models with respect to accurate simulation.

Model             130 nm    90 nm
Proposed model    6.87%     12.12%
WATTCH [5]        70.1%     330.0%


The proposed empirical model shows on average about 10% error compared to the value obtained from detailed simulation. This is shown for a Hamming distance of 0.2 and a load of FO4. Even when the Hamming distance and load are changed, the error does not change significantly.

Values obtained from the model are accurate since they are based on detailed designs that include layout extraction [11].
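The averaged error quoted in this section can be read as a mean absolute relative error over all simulated designs. The paper does not spell out the exact averaging, so the sketch below is one plausible interpretation; the sample numbers are invented for illustration and are not measurements from the paper.

```python
def mean_abs_pct_error(reference, estimates):
    """Mean absolute relative error, in percent, of estimates vs. reference."""
    if not reference or len(reference) != len(estimates):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(est - ref) / abs(ref) for ref, est in zip(reference, estimates))
    return 100.0 * total / len(reference)


# Hypothetical energy/access values in joules (not taken from the paper).
simulated = [2.0e-11, 4.0e-11, 8.0e-11]  # detailed post-layout simulation
modelled = [2.2e-11, 3.8e-11, 8.4e-11]   # empirical formula
error_pct = mean_abs_pct_error(simulated, modelled)
```

Here the individual errors are 10%, 5% and 5%, so `error_pct` comes out at roughly 6.7%.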

Compared to [6], which implements two instances and makes a linear approximation of the model, our model is based on over 100 implementations. Others like [8] also derive a model from a few instances. To evaluate the accuracy of the widely used WATTCH [5] model, the values from WATTCH were scaled to 130 nm and 90 nm technology. Table 1 shows the error margin of the proposed empirical model and the WATTCH model with respect to the simulation of the actual designs. The error values were obtained for the complete design range and averaged. It can be observed that, besides the absolute numbers, even the relative trends of WATTCH and the proposed model are different. Therefore, current models like WATTCH are skewed for relative comparisons between two register file implementations.

$$
\begin{aligned}
E/\mathrm{Access}\ (\text{in J}) = \big[\, & 2.23\times10^{-11} - 8.06\times10^{-13}\,D - 5.89\times10^{-13}\,W \\
& - 3.35\times10^{-12}\,P + 2.06\times10^{-14}\,D\,W \\
& + 7.57\times10^{-14}\,D\,P + 6.34\times10^{-14}\,W\,P \\
& + 2.48\times10^{-15}\,D^{2} + 9.93\times10^{-16}\,W^{2} \\
& + 8.72\times10^{-14}\,P^{2} \,\big] \times (HD/P)
\end{aligned}
\tag{5}
$$

$$
\begin{aligned}
\text{Leakage Power}\ (\text{in } \mu\text{W}) = {} & 5.43\times10^{1} - 1.76\,D - 1.62\,W \\
& - 8.42\,P + 8.55\times10^{-2}\,D\,W \\
& + 2.15\times10^{-1}\,D\,P + 1.61\times10^{-1}\,W\,P \\
& + 1.73\times10^{-3}\,D^{2} + 4.23\times10^{-3}\,W^{2} \\
& + 2.10\times10^{-1}\,P^{2}
\end{aligned}
\tag{6}
$$

$$
\begin{aligned}
\text{Area}\ (\text{in } \mu\text{m}^{2}) = {} & 7.36\times10^{4} - 2.37\times10^{3}\,D - 2.12\times10^{3}\,W \\
& - 1.21\times10^{4}\,P + 1.24\times10^{2}\,D\,W \\
& + 3.33\times10^{2}\,D\,P + 2.58\times10^{2}\,W\,P \\
& - 4.98\times10^{-1}\,D^{2} + 1.56\,W^{2} + 2.71\times10^{2}\,P^{2}
\end{aligned}
\tag{7}
$$

$$
\begin{aligned}
\text{Timing}\ (\text{in ns}) = {} & 1.90\times10^{-1} + 1.57\times10^{-2}\,D + 1.72\times10^{-2}\,W \\
& + 4.08\times10^{-1}\,P + 5.91\times10^{-4}\,D\,W \\
& + 1.10\times10^{-3}\,D\,P + 1.62\times10^{-3}\,W\,P \\
& - 1.69\times10^{-4}\,D^{2} - 2.39\times10^{-4}\,W^{2} \\
& - 1.74\times10^{-2}\,P^{2}
\end{aligned}
\tag{8}
$$

Equation Set 2: UMC130 nm register file model equations.

$$
\text{Total } E/\mathrm{Access} = E/\mathrm{Access} + 0.405 \times C
\tag{9}
$$

Equation Set 3: Energy consumption on the buffer chain for 90 nm.

$$
\text{Total } E/\mathrm{Access} = E/\mathrm{Access} + 0.72 \times C
\tag{10}
$$

Equation Set 4: Energy consumption on the buffer chain for 130 nm.

6. Analysis of the model

Figs. 4–6 show the normalized energy consumption per access of WATTCH and the proposed model when width, number of ports and depth are varied, respectively. The Hamming distance for the proposed model was taken to be 0.2 for all these comparisons. It can be seen from Fig. 6 that WATTCH roughly corresponds to the proposed model. For larger register files, the proposed model was close to WATTCH, whereas for smaller register files, WATTCH underestimates.

Fig. 4 shows that WATTCH is fairly accurate for relative comparisons when width alone is varied for given depth and port configurations. Even when the depth and ports are changed to a different configuration, the normalized curves for WATTCH and the proposed model follow similar trends.

Fig. 4. Relative comparison of proposed model with WATTCH (variation in width, 130 nm, HD = 0.2 and LD = FO4). [Plot: Normalized Energy/Access (0 to 18) versus Width (0 to 70); curves: Proposed Model 16 Deep 9 Ports; Wattch 16 Deep 9 Ports; Proposed Model 32 Deep 12 Ports; Wattch 32 Deep 12 Ports.]

Fig. 5. Relative comparison of proposed model with WATTCH (variation in ports, 90 nm, HD = 0.2 and LD = FO4). [Plot: Normalized Energy/Access (1 to 8) versus Ports (1 to 12); curves: Wattch 32 Deep 32 Wide; Proposed Model 32 Deep 32 Wide; Wattch 16 Deep 64 Wide; Proposed Model 16 Deep 64 Wide.]

Fig. 6. Relative comparison of proposed model with WATTCH (variation in depth, 90 nm, HD = 0.2 and LD = FO4). [Plot: Normalized Energy/Access (0 to 70) versus Depth (0 to 70); curves: Wattch 16 Wide 9 Ports; Wattch 64 Wide 12 Ports; Proposed Model 16 Wide 9 Ports; Proposed Model 64 Wide 12 Ports.]

From Fig. 5 it can be seen that WATTCH severely overestimates (relatively) the energy consumption when the number of ports is increased for a given depth and width.

If, for a given number of ports and width, the depth of the register file is varied (Fig. 6), WATTCH underestimates the energy consumption for increasing depth. For 130 nm, relative comparisons of register files in WATTCH are not accurate. This skewness is more pronounced in 90 nm technology. This can be seen from the slopes of the WATTCH curves, which are much higher than those of the proposed model.

Fig. 7 shows the linear increase of energy consumption as the Hamming distance of the input address and data is changed from 0 to 1. It was verified with back annotation that a linear approximation of the empirical formulae with Hamming distance is within 10%. From Fig. 7, it can be seen that depth has a larger influence when the Hamming distance is changed. Variation in ports does not have as much influence as depth.

As can be seen from the equations of the buffer chain and in Fig. 8, the energy consumption increases linearly with the capacitive load that has to be driven. As the influence of extra loading is purely linear and additive with the FO4 load energy, the influence of load on energy consumption is higher in the case of smaller register files (smaller depth, width, ports). In the case of larger register files, in terms of absolute numbers, loading has a (relatively) smaller influence.

Fig. 7. Comparison of proposed model with variation in the hamming distance (HD), (130 nm, LD = FO4). [Plot: Energy/Access (Joules, scale ×10^-11, 0 to 6) versus Hamming Distance (HD) from 0 to 1; curves: 64 Depth 32-bit Width 12 ports; 64 Depth 16-bit Width 9 ports; 32 Depth 32-bit Width 9 ports; 64 Depth 32-bit Width 12 ports.]

Fig. 8. Comparison of proposed model with capacitive loading (130 nm, HD = 0.2). [Plot: Total Energy/Access (Joules, scale ×10^-11) versus Extra Capacitance (F) from 0 to 7 ×10^-12, with the FO4 point marked; curves: 32 Depth 32-bit 12 ports; 64 Depth 16-bit 9 ports; 64 Depth 32-bit 12 ports; 64 Depth 64-bit 12 ports.]
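A small worked example may make the additive load term concrete. It uses the coefficients of Eqs. (9) and (10) as printed and assumes that C is the extra capacitance in farads and that energy is in joules, as the axes of Fig. 8 suggest. It is also worth noting, as an observation rather than a statement from the paper, that the coefficients 0.405 and 0.72 coincide with ½·Vdd² for Vdd = 0.9 V and 1.2 V. The base energies below are hypothetical.

```python
import math

LOAD_COEFF = {"90nm": 0.405, "130nm": 0.72}  # from Eqs. (9) and (10)
VDD = {"90nm": 0.9, "130nm": 1.2}            # supply voltages from Section 4


def total_energy_per_access(base_energy, extra_cap, node):
    """Eqs. (9)/(10): base energy plus a term linear in the extra load."""
    return base_energy + LOAD_COEFF[node] * extra_cap


def load_share(base_energy, extra_cap, node):
    """Fraction of the total energy that is due to the extra load."""
    load_energy = LOAD_COEFF[node] * extra_cap
    return load_energy / (base_energy + load_energy)


# Observation (not stated in the paper): each coefficient equals 0.5 * Vdd^2.
half_cv2_matches = all(
    math.isclose(LOAD_COEFF[node], 0.5 * VDD[node] ** 2) for node in LOAD_COEFF
)

# Hypothetical base energies for a small and a large 130 nm register file,
# both driving the same 5 pF of extra capacitance.
small_rf = load_share(1.0e-11, 5.0e-12, "130nm")
large_rf = load_share(1.0e-10, 5.0e-12, "130nm")
```

With these made-up numbers the extra load accounts for roughly a quarter of the total energy of the small register file but only a few percent for the large one, which is the trend described in the text.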

Models like WATTCH do not take activity or load into account when estimating energy consumption. The effect of the extra load can be incorporated in other models if the user inserts a power model for a buffer chain. The problem with such an approach is that the libraries of the two components (register file and buffer chain) would be incompatible.

7. Scalability of model

To test the scalability of the model, it was used outside the range of simulations used for curve fitting. When the model parameters were taken higher than the range of the simulations (for example 128-bit wide, 24 ports, 32 deep), the values obtained using the model were within reasonable error (15-20%). However, when the model was taken outside the range by reducing the size further (e.g. reducing the number of ports to 1, or reducing width to 4 and depth to 1), it produced incorrect numbers and cannot be used.
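These validity limits lend themselves to a simple guard in front of any implementation of the model. The sketch below encodes the fitted ranges from Section 3 (3 to 12 ports, depth 4 to 64, width 8 to 128) and the two extrapolation outcomes reported here; the function name and the three labels are hypothetical.

```python
# Parameter ranges covered by the simulated designs (Section 3).
FITTED_RANGE = {"depth": (4, 64), "width": (8, 128), "ports": (3, 12)}


def classify_configuration(depth, width, ports):
    """Classify a register file configuration against the fitted range.

    Returns:
      "fitted"       - all parameters inside the simulated range
      "extrapolated" - some parameter above the range (the paper reports
                       15-20% error for such a case)
      "unsupported"  - some parameter below the range (the paper reports
                       incorrect numbers)
    """
    values = {"depth": depth, "width": width, "ports": ports}
    if any(values[name] < low for name, (low, _) in FITTED_RANGE.items()):
        return "unsupported"
    if any(values[name] > high for name, (_, high) in FITTED_RANGE.items()):
        return "extrapolated"
    return "fitted"
```

Treating any below-range parameter as unsupported, even when another parameter is above range, is a conservative design choice of this sketch.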

The designs were optimized to respect the timing constraints of embedded systems, which run from 100 MHz to 800 MHz. Outside this timing range, the model can still be used for computing dynamic energy, but the area, leakage and timing equations do not hold. This was verified by constructing new designs with timing constraints of up to 1.5 GHz and obtaining dynamic energy, leakage, area and timing from low-level simulations. The values obtained for dynamic energy from the equations were within 10-25% error, whereas the computed area and leakage values were severely underestimated. Errors in dynamic energy also increased as the frequency increased, because the linear modelling of the drivers for the register file at higher frequency is inaccurate, especially for larger register files.

Values obtained from the model cannot be easily scaled to future technologies, but the framework created for deriving the model can easily be used with any new technology, given the technology library.

Scaling the Empire model to future technologies like 65 nm, 45 nm or lower is not advised, as the equations would no longer be accurate.

8. Conclusion

This paper presents publicly available empirical formulae for modeling energy/access, leakage power, area and timing for register files of different sizes. The proposed formulae are validated against complete designs and low-level simulations. The method is compared to popular models like WATTCH. DSM effects from scaling invalidate old models. It can be concluded that taking activity and load into account is very important for the modeling procedure. It was also shown that the relative as well as absolute trends change significantly when these extra parameters are taken into account.

References

[1] A. Lambrechts, P. Raghavan, A. Leroy, G. Talavera, T. VanderAa, M. Jayapala, F. Catthoor, D. Verkest, G. Deconinck, H. Corporaal, F. Robert, J. Carrabina, Power breakdown analysis for a heterogeneous NoC platform running a video application, in: Proceedings of the IEEE 16th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2005, pp. 179–184.

[2] W. Dally, U. Kapasi, B. Khailany, J. Ahn, A. Das, Stream processors: programmability with efficiency, ACM Queue 2 (1) (2004) 52–62.

[3] J.-W. van de Waerdt, S. Vassiliadis, S. Das, S. Mirolo, C. Yen, B. Zhong, C. Basto, J.-P. van Itegem, D. Amirtharaj, K. Kalra, P. Rodriguez, H. van Antwerpen, The TM3270 media-processor, in: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38), 2005, pp. 331–342.

[4] P. Shivakumar, N. Jouppi, CACTI 3.0: an integrated cache timing, power, and area model, Technical Report, COMPAQ Western Research Laboratory, 2001.

[5] D. Brooks, V. Tiwari, M. Martonosi, Wattch: a framework for architectural-level power analysis and optimizations, in: Proceedings of the 27th International Symposium on Computer Architecture (ISCA-27), 2000, pp. 83–94.

[6] L. Benini, D. Bruni, M. Chinosi, C. Silvano, V. Zaccaria, A power modeling and estimation framework for VLIW-based embedded system, ST Journal of System Research 3 (1) (2002) 110–118.

[7] K. Buyuksahin, P. Patra, F. Najm, Estima: an architectural-level power estimator for multi-ported pipelined register file, in: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2003, pp. 294–297.

[8] S. Rixner, W. Dally, B. Khailany, P. Mattson, U. Kapasi, J. Owens, Register organization for media processing, in: Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2000, pp. 375–386.

[9] M. Mamidipaka, K. Khouri, N. Dutt, M. Abadir, Analytical models for leakage power estimation of memory array structures, in: Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2004, pp. 146–151.

[10] H. DeMan, Ambient intelligence: giga-scale dreams and nano-scale realities, in: Keynote Speech at International Solid-State Circuits Conference (ISSCC), 2005.

[11] A. Balakrishnan, An experimental study of the accuracy of multiple power estimation models, Master's Thesis, University of Tennessee, Knoxville, TN, 2004.
