Low Power Functions


1/22

Approaches to Low-Power Implementations of DSP Systems

Class Advisor: Dr. Fakhraie
Presenter: Nariman Moezi
DSP Design & Implementation Course Seminar, Spring 2004


2/22

Outline

- Reduced two's complement representation
- Low-power scheduling techniques for embedded DSP software
- Low-power multipliers
  - Mitchell-based logarithmic multiplier
  - Power-aware pipelined multiplier


3/22

Reduced two's complement representation

Two's complement representation is widely used in the implementation of arithmetic operations.

If X has a small magnitude and switches between positive and negative values, its sign extension switches between strings of zeros and strings of ones.
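The cost of sign extension can be illustrated by counting bit flips on a bus. The sketch below is only an illustration: the helper names, the sample sequence and the two word widths (16 and 3 bits) are assumptions, and "reduced" here simply means keeping the low-order bits that are enough to hold the value.

```python
def to_twos_complement(value: int, width: int) -> int:
    """Return the unsigned bit pattern of `value` in `width`-bit two's complement."""
    return value & ((1 << width) - 1)


def toggles(values, width: int) -> int:
    """Count bit flips between consecutive words on a `width`-bit bus."""
    patterns = [to_twos_complement(v, width) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))


# Hypothetical sample data: small values that keep changing sign.
samples = [1, -1, 2, -2, 1, -1]

full = toggles(samples, 16)    # full-width words, sign extension included
reduced = toggles(samples, 3)  # only the 3 low-order bits are kept

print(f"16-bit toggles: {full}, 3-bit toggles: {reduced}")
```

Most of the flips in the full-width case come from the sign-extension bits, which carry no extra information for values this small.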

If X has magnitude less than 2^(m-1) (m


4/22

APPLICATION: Low-power FIR filter using reduced two's complement representation

Consider a hybrid-form adaptive FIR filter whose inputs are 5-level data symbols taking values in {-2, -1, 0, 1, 2}. Assume the coefficients are N-bit two's complement numbers.

Multiplying a coefficient by such a symbol requires only shift and complement operations.

Suppose we detect that the maximum magnitude of a coefficient H is less than 2^(m-2). Because the symbol magnitude is at most 2, the corresponding partial product P has a magnitude less than 2^(m-1).
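A minimal sketch of the shift-and-complement multiplication and of the magnitude bound, assuming Python integers stand in for the hardware words. The function name `partial_product` and the choice m = 6 are illustrative only.

```python
def partial_product(h: int, x: int) -> int:
    """Multiply coefficient h by a 5-level symbol x using only shift and negate."""
    if x not in (-2, -1, 0, 1, 2):
        raise ValueError("x must be one of -2, -1, 0, 1, 2")
    if x == 0:
        return 0
    p = h << 1 if abs(x) == 2 else h  # |x| = 2 is a one-bit left shift
    return -p if x < 0 else p         # a negative symbol negates the result


m = 6  # assumed reduced word length for this check
for h in range(-(2 ** (m - 2)) + 1, 2 ** (m - 2)):  # every h with |h| < 2^(m-2)
    for x in (-2, -1, 0, 1, 2):
        p = partial_product(h, x)
        assert p == h * x               # same result as a real multiplier
        assert abs(p) < 2 ** (m - 1)    # bound stated in the text
```

The loop checks every coefficient value allowed by the bound, so the two assertions confirm both the arithmetic and the claim that P fits in the reduced range.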


5/22

- Coefficient maximum-magnitude detection (an example with two taps and 6-bit coefficients)

- Partial-product generation using reduced two's complement representation


6/22

- As the adaptive filter updates the coefficients, the word length of the reduced representation changes, and so does the error introduced by using the reduced representation. We can build a compensation-vector correction path that imitates the error propagation in the accumulation path.

- A test chip was implemented in 0.25 um CMOS technology. It used a hybrid-form filter of 160 taps with 8 taps per hybrid section. The coefficient word length is 10 bits. When operating at 2.5 V with a 100 MHz clock, a 32% power saving was measured, as summarized in this table:


7/22

Low-Power Scheduling Techniques for Embedded DSP Software

- This section describes an instruction-level power model for a Fujitsu DSP processor and techniques for reducing the power consumption of this processor.
- The processor has a special architecture that allows instructions to be packed into pairs.
- The Booth multiplier on this processor is a major source of energy consumption for DSP programs.
- A micro-architectural power model of the on-chip Booth multiplier is therefore developed and analyzed for further power minimization.
- Based on this model, local code modification by operand swapping is used to reduce power consumption further.

(M. T. Lee et al., IEEE Trans. VLSI Syst., 1997)


8/22

The measured current for the four instructions sums to 204 mA.

The sum of the base costs (37.2 + 14.4 + 36.6 + 14.4) and the overhead costs of adjacent instructions (18.4 + 18.4 + 18.4 + 18.4) is only 176.2 mA, which underestimates the actual cost by 13.6%.

The difference of 27.8 mA between the two figures comes from the circuit-state overhead between the non-adjacent instructions 1 and 3.
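The arithmetic behind these figures can be reproduced directly. The numbers below are the ones quoted on the slide; the variable names are illustrative.

```python
measured_total = 204.0                         # mA, measured for the four instructions
base_costs = [37.2, 14.4, 36.6, 14.4]          # mA, per-instruction base cost
adjacent_overheads = [18.4, 18.4, 18.4, 18.4]  # mA, overhead between adjacent instructions

# Estimate from the instruction-level model without non-adjacent effects.
estimate = sum(base_costs) + sum(adjacent_overheads)

# What the model misses: the circuit-state overhead between instructions 1 and 3.
gap = measured_total - estimate
underestimate_pct = 100.0 * gap / measured_total

print(f"estimate = {estimate:.1f} mA, gap = {gap:.1f} mA, "
      f"underestimate = {underestimate_pct:.1f}%")
```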

This is due to a special design at the inputs of the multiplier: a latch between each operand and the multiplier retains the old value until the next multiply instruction is executed.

This overhead depends on the previous and current values of the input latches for each multiply operation.

An example of a sequence of four instructions where the overhead cost between instructions 1 and 3 cannot be ignored


9/22

Instruction packing for low power

A special feature of the target DSP processor is its ability to pack an ALU-type instruction and a data transfer instruction into one codeword for simultaneous execution.

The average current for packed instructions is only slightly higher than the average current for a sequence of the two unpacked instructions. Because the two instructions execute simultaneously, the packed form takes less time and therefore consumes less energy.

Comparison of energy consumed by packed and unpacked instructions


10/22

As for the overhead cost of MAC instructions: when a MAC is packed with a data transfer instruction, especially LAB, which changes the values in registers A and B that the MAC uses as inputs, the overhead cost varies widely (from 1.4 mA to 33.0 mA).

This wide variation is mainly due to the complex Booth multiplier implemented in the MAC unit.

The fundamental idea behind the Booth multiplier is to recode B using the skipping-over-1s technique.

For example, the 7-digit B value 0011110, which would need four additions of shifted A, can be recoded to a new value that requires only one addition and one subtraction (a digit of -1 marks the subtraction):

0011110 (weight = 4)  --recode-->  0 1 0 0 0 -1 0 (weight = 2)

Micro-architectural model for the Booth multiplier
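The recoding in this example can be checked with a short sketch of textbook radix-2 Booth recoding. This is not a model of the processor's actual multiplier; the function names are illustrative.

```python
def booth_recode(b: int, width: int) -> list[int]:
    """Radix-2 Booth recoding of a `width`-bit two's complement value.

    Returns signed digits in {-1, 0, 1}, most significant digit first.
    """
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(width):
        bit = (b >> i) & 1
        digits.append(prev - bit)  # -1 where a run of ones starts, +1 just past its end
        prev = bit
    return digits[::-1]


def weight(digits) -> int:
    """Number of nonzero digits, i.e. additions or subtractions of shifted A."""
    return sum(1 for d in digits if d != 0)


def value(digits) -> int:
    """Evaluate a signed-digit string (most significant digit first)."""
    result = 0
    for d in digits:
        result = 2 * result + d
    return result


recoded = booth_recode(0b0011110, 7)
print(recoded, "weight:", weight(recoded), "value:", value(recoded))
```

The recoded digits evaluate to 32 - 2 = 30, the same value as 0011110, while the number of add or subtract steps drops from four to two.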


11/22

We can reduce the number of additions and subtractions simply by swapping the operands in registers A and B, which can reduce the current. The table gives three experiments with operand swapping:

Another factor that determines the power consumption of the multiplier is switching activity.

For the Booth multiplier, the relevant characteristic of A is its switching activity; for B, it is the weight factor and the switching activity.
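A minimal sketch of the operand-swapping decision, assuming the only criterion is the Booth weight of the value placed in B. Switching activity, which the text also names as a factor, is deliberately left out, and the function names are hypothetical.

```python
def booth_weight(value: int, width: int) -> int:
    """Nonzero digits after radix-2 Booth recoding of a `width`-bit value.

    A Booth digit is nonzero exactly where a bit differs from its
    right-hand neighbour, so XOR with the value shifted left by one.
    """
    mask = (1 << width) - 1
    v = value & mask
    return bin((v ^ (v << 1)) & mask).count("1")


def order_operands(op1: int, op2: int, width: int) -> tuple[int, int]:
    """Return (A, B), placing the operand with the lower Booth weight in B."""
    if booth_weight(op1, width) < booth_weight(op2, width):
        return op2, op1  # swap: op1 is cheaper to recode, so it goes in B
    return op1, op2


a, b = order_operands(0b0011110, 0b0101010, 7)
print(f"A = {a:07b}, B = {b:07b}")
```

Here 0101010 has a Booth weight of 6 and 0011110 a weight of 2, so the sketch places 0011110 in B.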

Variation of measured current when swapping operands op1 and op2 in registers A and B for MAC:LAB instructions.


12/22

Average current drawn by MAC:LAB for different characteristics of consecutive values in A and B.

For a typical DSP application, MAC:LAB instructions are usually applied to a sequence of data in filter operations such as sum_i C_i * X_i.

Since only the coefficients C are known and there is no information about X, we treat C as the value in B. If the switching activity or weight factor of C is high, we can swap the operands.

Comparison of power consumption for 5 DSP programs under different scheduling techniques


13/22

Improved Mitchell-Based Logarithmic Multiplier for Low-Power DSP Applications

The technique of multiplying two numbers using logarithms is simple: take the logarithms of the two multiplicands, add the logarithms together, and then take the antilogarithm of the resulting sum.

Mitchell's method of calculating logarithms: assume N = 25 (decimal) = 11001 (binary). The MSB is bit 4, which gives a characteristic of 100 (binary), and the remaining bits (1001) give the fraction. This yields a logarithm of 100.1001 (binary) = 4.5625 (decimal). The correct value of log2(25) is 4.6439.

(Duncan J. McLaren et al., IEEE, 2003)
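The worked example for N = 25 can be reproduced with a few lines. This is a sketch of the approximation itself, not of the hardware; `mitchell_log2` is an illustrative name.

```python
import math


def mitchell_log2(n: int) -> float:
    """Mitchell's approximation of log2(n) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length() - 1         # characteristic: position of the leading 1
    x = (n - (1 << k)) / (1 << k)  # remaining bits read as a binary fraction
    return k + x


approx = mitchell_log2(25)
exact = math.log2(25)
print(f"Mitchell: {approx}, exact: {exact:.4f}, error: {exact - approx:.4f}")
```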


14/22

A binary number N can be written as:

Note that k represents the characteristic and x the binary fraction, with x in the range 0 <= x < 1. The true logarithm and the approximation using the Mitchell method are:

The logarithm of a product is equal to the sum of the logarithms of the multiplicands:

The antilogarithms of these two equations are:

To correct the error, the following is used:
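The equations on this slide did not survive extraction. The block below is a reconstruction from the standard Mitchell formulation, using the symbols k and x defined above; the subscripts 1 and 2 for the two multiplicands are an assumption, and the notation may differ from that of the cited paper.

```latex
% Binary number with characteristic k and fraction x
N = 2^{k}(1 + x), \qquad 0 \le x < 1

% True logarithm and Mitchell approximation
\log_2 N = k + \log_2(1 + x), \qquad \log_2 N \approx k + x

% Logarithm of a product, true and approximate
\log_2(N_1 N_2) = k_1 + k_2 + \log_2(1 + x_1) + \log_2(1 + x_2)
\log_2(N_1 N_2) \approx k_1 + k_2 + x_1 + x_2

% Antilogarithms: true product and Mitchell product
N_1 N_2 = 2^{k_1 + k_2}\,(1 + x_1 + x_2 + x_1 x_2)
(N_1 N_2)_{M} =
\begin{cases}
2^{k_1 + k_2}\,(1 + x_1 + x_2),   & x_1 + x_2 < 1 \\
2^{k_1 + k_2 + 1}\,(x_1 + x_2),   & x_1 + x_2 \ge 1
\end{cases}

% Error to be corrected: true product minus Mitchell product
N_1 N_2 - (N_1 N_2)_{M} =
\begin{cases}
2^{k_1 + k_2}\, x_1 x_2,              & x_1 + x_2 < 1 \\
2^{k_1 + k_2}\,(1 - x_1)(1 - x_2),    & x_1 + x_2 \ge 1
\end{cases}
```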


15/22

This shows that, to obtain the correct answer, an error correction factor should be added to the summation before the antilogarithm is calculated.

However, computing this factor exactly would be impractical. The approach is to average the value of the correction factor over a range of x values and add this average to the summation. This results in a multiplier of improved accuracy.

The two fractional parts are split into 8 ranges, from 0 to 1 in steps of 0.125. This means that the 3 most significant bits of x can be used to select the precalculated error correction factor.
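The averaging idea can be sketched as an 8-entry lookup table. This is an illustration under stated assumptions, not the cited paper's exact correction: it assumes the correction is applied per operand in the log domain, that each table entry is the mean of log2(1 + x) - x over its range, and all names are hypothetical.

```python
import math

BINS = 8  # ranges of width 0.125, selected by the 3 MSBs of the fraction


def build_correction_table(samples_per_bin: int = 256) -> list[float]:
    """Average the Mitchell log error log2(1 + x) - x over each range of x."""
    table = []
    for b in range(BINS):
        lo = b / BINS
        xs = [lo + (i + 0.5) / (samples_per_bin * BINS) for i in range(samples_per_bin)]
        table.append(sum(math.log2(1 + x) - x for x in xs) / samples_per_bin)
    return table


CORRECTION = build_correction_table()  # precalculated once


def split(n: int) -> tuple[int, float]:
    """Return the characteristic k and fraction x of a positive integer n."""
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def mitchell_log2(n: int) -> float:
    k, x = split(n)
    return k + x


def improved_log2(n: int) -> float:
    k, x = split(n)
    return k + x + CORRECTION[int(x * BINS)]  # int(x * 8) equals the 3 MSBs of x


plain_err = max(abs(math.log2(n) - mitchell_log2(n)) for n in range(1, 1024))
improved_err = max(abs(math.log2(n) - improved_log2(n)) for n in range(1, 1024))
print(f"worst-case log error: plain {plain_err:.4f}, corrected {improved_err:.4f}")
```

The correction does not help every input (for exact powers of two the uncorrected result is already exact), but it lowers the worst-case error over the range tested.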


16/22

To test the multiplier further, it was used as part of a real application, in this case a Finite Impulse Response (FIR) filter. The filter was an 11-tap low-pass FIR with a normalized cut-off frequency of 0.25. It was implemented in Verilog using the standard multiplier, the unmodified Mitchell multiplier and the improved Mitchell multiplier. The input was 16-bit and the output was 32-bit. The figure below shows the magnitude response of each of the three implementations.


17/22

Power-Aware Pipelined Multiplier Design Based on 2-Dimensional Pipeline Gating

Although Boolean multipliers are naturally power-aware with respect to changes in input precision, deeply pipelined designs do not have this benefit.

In unpipelined Boolean multipliers, a low-precision calculation (such as 0001 x 0001) dissipates much less power than a high-precision calculation (such as 1111 x 1111). Unpipelined Boolean multipliers are therefore naturally power-aware with respect to input precision.

In deeply pipelined designs, the number of registers is much larger than that of other elements, so these designs do not have this natural power awareness with respect to input precision.

(Jia Di, J. S. Yuan et al., GLSVLSI 2003)


18/22

To solve this problem and improve the power awareness of deeply pipelined multipliers, a novel technique, 2-dimensional pipeline gating, is proposed. It gates the clock to the registers in both the vertical and horizontal directions.


19/22

In a 4x4 multiplier, when the input precision is 4 (for example, when calculating 1111 x 1111), S is generated from all inner partial products. If the input precision is 2 (for example, when calculating 0011 x 0011), the partial products containing X2 or Y2 (the ones enclosed by a rectangle) can also be disabled.
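The effect of input precision on the partial-product array can be sketched as follows. This only models which partial products x_i AND y_j carry information for unsigned operands; it is not the gating circuit from the cited paper, and the function names are hypothetical.

```python
def precision(value: int) -> int:
    """Number of significant bits of an unsigned operand (0 for the value 0)."""
    return value.bit_length()


def active_partial_products(x: int, y: int, width: int = 4) -> list[tuple[int, int]]:
    """Index pairs (i, j) of partial products x_i AND y_j that must stay enabled.

    Bits at or above an operand's precision are zero, so every partial
    product that contains one of them could have its registers gated.
    """
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in the multiplier width")
    return [(i, j) for i in range(precision(x)) for j in range(precision(y))]


full = active_partial_products(0b1111, 0b1111)
low = active_partial_products(0b0011, 0b0011)
print(f"precision 4: {len(full)} of 16 active, precision 2: {len(low)} of 16 active")
```

With precision 2, none of the active index pairs contains bit 2 or bit 3 of either operand, which matches the partial products the slide marks as disabled.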


20/22


21/22


22/22

References

M. T. Lee, V. Tiwari, S. Malik, and M. Fujita, "Power analysis and minimization techniques for embedded DSP software," IEEE Trans. VLSI Syst., vol. 5, pp. 123-135, Mar. 1997.

Jia Di, J. S. Yuan, et al., "Power-Aware Pipelined Multiplier Design Based on 2-Dimensional Pipeline Gating," GLSVLSI '03, April 28-29, 2003.

Zhan Yu et al., "A Low Power Adaptive Filter Using Dynamic Reduced 2's-Complement Representation," IEEE Custom Integrated Circuits Conference, 2002.

Duncan J. McLaren et al., "Improved Mitchell-Based Logarithmic Multiplier for Low-Power DSP Applications," IEEE, 2003.
