DSP C5000 Chapter 13 Numerical Issues Copyright © 2003 Texas Instruments. All rights reserved.


ESIEE, Slide 2

Learning Objectives

Data formats
  Fixed point: integer and fractional numbers
  Use methods for handling multiplicative and accumulative overflow
  Floating point
  Block floating point
Comparison of formats


ESIEE, Slide 3

Data Formats and Numerical Issues

Common data sizes: 8, 16, 24, 32 bits
Fixed or floating point
For a given technology:
  Fixed point is faster and less expensive
  But fixed point programming is more difficult
Processors of the ’C5000 family are fixed point processors.
  But they can also execute floating point operations through software


ESIEE, Slide 4

Digital Representation of a Signal

Sampling
ADC: Analog to Digital Conversion
Quantization
Coding of the quantized value
Digital representation used in DSP


ESIEE, Slide 5

Digital Coding of Data and Arithmetic

Finite precision: representation uses a given number of bits
  Fixed point
  Floating point
  Block floating point


ESIEE, Slide 6

Interface ADC - DSP - DAC

[Figure: signal chain ADC -> DSP -> DAC]

Possible conversions:
  fixed point <-> floating point
  A-law or mu-law <-> linear law (compression-expansion)


ESIEE, Slide 7

Binary Representation of Signed Integers used in ADC-DAC or DSP in Fixed Point Format

2’s complement (digital processors)
1’s complement
Sign, magnitude
Offset binary


ESIEE, Slide 8

Fixed Point Arithmetic

2’s Complement Representation


ESIEE, Slide 9

Example of Size 3 Bits for Integers, Decimal and Binary Representations

Positive integers     Signed integers       Signed integers
                      (offset binary)       (sign + magnitude)
Decimal  Binary       Decimal  Binary       Decimal  Binary
   7     1 1 1           3     1 1 1           3     0 1 1
   6     1 1 0           2     1 1 0           2     0 1 0
   5     1 0 1           1     1 0 1           1     0 0 1
   4     1 0 0           0     1 0 0           0     0 0 0
   3     0 1 1          -1     0 1 1           0     1 0 0
   2     0 1 0          -2     0 1 0          -1     1 0 1
   1     0 0 1          -3     0 0 1          -2     1 1 0
   0     0 0 0          -4     0 0 0          -3     1 1 1

Weights of the binary columns: $2^2 \; 2^1 \; 2^0$


ESIEE, Slide 10

Example of Size 3 Bits for Integers, Decimal and Binary Representations

Signed integers
Decimal   1’s complement      2’s complement
   3      0 1 1               0 1 1
   2      0 1 0               0 1 0
   1      0 0 1               0 0 1
   0      0 0 0 or 1 1 1      0 0 0
  -1      1 1 0               1 1 1
  -2      1 0 1               1 1 0
  -3      1 0 0               1 0 1
  -4      (none)              1 0 0

Code y of a negative number x on N bits:
  1’s complement: $y = 2^N - |x| - 1$
  2’s complement: $y = 2^N - |x|$


ESIEE, Slide 11

Representation of Signed Integers in 2’s Complement Format

$$x \;\leftrightarrow\; b_{N-1} \dots b_k \dots b_0$$

$$x \ge 0: \quad x = \sum_{k=0}^{N-1} b_k 2^k$$

$$x < 0: \quad y = 2^N - |x|, \quad y = \sum_{k=0}^{N-1} b_k 2^k$$

$$x = -2^{N-1} b_{N-1} + \sum_{k=0}^{N-2} b_k 2^k$$
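The formulas on this slide can be checked with a few lines of Python. This is only a sketch: the function names `to_twos_complement` and `from_twos_complement` are hypothetical helpers, not part of any TI tool. The encoder uses y = 2^N - |x| for negative x, and the decoder uses the weighted sum in which the MSB carries the weight -2^(N-1).

```python
def to_twos_complement(x: int, n_bits: int) -> str:
    """Return the n_bits-wide 2's complement bit string of the integer x."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError(f"{x} does not fit in {n_bits} bits")
    # For x < 0 the stored code is y = 2^N - |x|
    y = x if x >= 0 else (1 << n_bits) - abs(x)
    return format(y, f"0{n_bits}b")


def from_twos_complement(bits: str) -> int:
    """Decode with x = -b_{N-1} 2^(N-1) + sum over k < N-1 of b_k 2^k."""
    n_bits = len(bits)
    value = -int(bits[0]) * (1 << (n_bits - 1))
    for k in range(n_bits - 1):
        value += int(bits[n_bits - 1 - k]) << k  # bits[-1] is b_0
    return value


print(to_twos_complement(-3, 3), from_twos_complement("101"))
```

With N = 3 the two helpers reproduce the 2’s complement column of the 3-bit table.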


ESIEE, Slide 12

Non-Integer Numbers Using Fixed Point

Format Qk: k fractional bits associated with negative powers of 2.

The binary representation of a number x in format Qk is the 2’s complement representation of the integer y:

$$y = x \, 2^k$$

Integer part, fractional part:

$$b_{N-1-k} \dots b_1 b_0 \; , \; b_{-1} \dots b_{-k}$$

$$x = -b_{N-1-k} 2^{N-1-k} + b_{N-2-k} 2^{N-2-k} + \dots + b_0 2^0 + b_{-1} 2^{-1} + \dots + b_{-k} 2^{-k}$$


ESIEE, Slide 13

Some Properties of 2’s Complement Representation

Max number $= 2^{N-1} - 1$
Min number $= -2^{N-1}$

Circular representation (OVM, SATD):

$$(2^{N-1} - 1) + 1 = 2^{N-1} \equiv -2^{N-1} \pmod{2^N}$$

Sign bit extension (SXM, SXMD)

Related status bits in C5000 DSP:
  OVM = OVerflow Mode on C54 DSPs
  SATD = SATuration mode of the D unit on C55 DSPs
  SXM = Sign eXtension Mode on C54 DSPs
  SXMD = Sign eXtension Mode of the D unit on C55 DSPs


ESIEE, Slide 14

Addition and Subtraction Using 2’s Complement

Simple hardware operator: to add 2 signed N-bit integers with a result of size N bits, it is sufficient to add the 2’s complement values, whatever the signs of the numbers.

[Figure: the 3-bit values 0, 1, 2, 3, -4, -3, -2, -1 arranged on a circle, with the carry and the overflow (OV=1) crossings marked; label "Overflow (intermediate)".]

Examples on 3 bits (carry bit, then 3-bit result):
  111 + 111 = 1 110
  010 + 001 = 0 011
  110 + 011 = 1 001
  110 + 001 = 0 111
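The four 3-bit additions on this slide can be replayed with a small Python model. This is only a sketch of the arithmetic, not of the C5000 ALU: the helper name `add_n_bits` is hypothetical, and the overflow flag is simply defined as "the true sum does not fit in N bits".

```python
def add_n_bits(a: int, b: int, n_bits: int):
    """Add two signed n_bits integers; return (result, carry, overflow)."""
    mask = (1 << n_bits) - 1
    raw = (a & mask) + (b & mask)      # plain binary addition of the two codes
    carry = raw >> n_bits              # bit that falls out of the N-bit word
    code = raw & mask                  # N-bit result code
    # Reinterpret the code as a signed value (MSB has weight -2^(N-1))
    result = code - (1 << n_bits) if code >> (n_bits - 1) else code
    overflow = int(result != a + b)    # true sum not representable on N bits
    return result, carry, overflow


for a, b in [(-1, -1), (2, 1), (-2, 3), (-2, 1), (3, 2)]:
    print(a, b, add_n_bits(a, b, 3))
```

The carry and the overflow are independent: (-2) + 3 produces a carry with a correct result, while 3 + 2 produces no carry but overflows to -3.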


ESIEE, Slide 15

Multiplying and Shifting in 2’s Complement

Simple hardware operator but more difficult than with a sign-magnitude representation.

The product of 2 N-bit numbers needs support for 2N-bit results.

Generally, the product register is of size 2N bits => 2 identical MSBs (1-bit left shift).

Booth algorithm (on 3 bits):
AB = -4A(b2-b1) - 2A(b1-b0) - A(b0-0)

Arithmetic right shift by k bits: sign bit extension necessary.


ESIEE, Slide 16

Sign eXtension Mode SXM or SXMD

With 2’s complement, when 16-bit data are loaded into a 32-bit accumulator, the sign bit is also extended.

This sign extension may be undesirable, e.g. when calculating 16-bit addresses.

The user can choose whether or not to use sign bit extension mode.
  SXM = Sign eXtension Mode bit in the status word ST1 in C54 DSPs.
  SXMD = Sign eXtension Mode bit for the D unit in the status word ST1_55 in C55 DSPs.


ESIEE, Slide 17

Sign Bit Extension

Example: data size 6 bits, accumulator size 12 bits

Data:                                                 1 0 1 0 0 1
Loading of ACCU with sign extension:     1 1 1 1 1 1  1 0 1 0 0 1
Loading of ACCU without sign extension:  0 0 0 0 0 0  1 0 1 0 0 1
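The 6-bit example above can be reproduced with a short Python sketch. The function `load_accumulator` and its `sign_extend` flag are hypothetical stand-ins for the effect of the SXM/SXMD bit; the code models only the resulting bit pattern.

```python
def load_accumulator(data: int, data_bits: int, acc_bits: int, sign_extend: bool) -> str:
    """Return the accumulator bit pattern after loading a data_bits-wide word."""
    code = data & ((1 << data_bits) - 1)
    if sign_extend and code >> (data_bits - 1):
        # Negative value: replicate the sign bit into all upper accumulator bits
        code |= ((1 << acc_bits) - 1) ^ ((1 << data_bits) - 1)
    return format(code, f"0{acc_bits}b")


print(load_accumulator(0b101001, 6, 12, sign_extend=True))
print(load_accumulator(0b101001, 6, 12, sign_extend=False))
```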


ESIEE, Slide 18

Addition Overflow

When adding 2 numbers of size N bits, the result may need N+1 bits.

Example for integers of N=3 bits:
  3+3 = 6 cannot be represented using 3 bits, but can be expressed using 4 bits.
  In format Q2 of N=3 bits, 0.75 + 0.5 = 1.25 cannot be represented using 3 bits; it needs 4 bits.

When adding M numbers of N bits, the result potentially needs N + log2(M) bits.


ESIEE, Slide 19

Using Saturation

Overflows in 2’s complement create unexpected sign changes and peaks that are difficult to filter.

Saturation arithmetic detects the overflow and replaces the result with a saturation value.

[Figure: example with max value = 0.75; horizontal axis from 0 to 1, vertical axis from -1 to 1; two curves: "Saturation at 0.75" and "2’s complement overflow".]


ESIEE, Slide 20

Setting Saturation Modes with OVM or SATD

The user can choose whether or not to use saturation mode by setting the corresponding mode bits.

OVM = OVerflow Mode bit in status word ST1 in C54 DSPs.
  If OVM = 1, on overflow:
    Positive results are saturated to 00 7FFF FFFF.
    Negative results are saturated to FF 8000 0000.

SATD = SATuration mode bit for the D unit in the status word ST1_55 in C55 DSPs.
  If SATD = 1 and M40 = 0: same as for C54 DSPs.
  If SATD = 1 and M40 = 1, on overflow:
    Positive results are saturated to 7F FFFF FFFF.
    Negative results are saturated to 80 0000 0000.


ESIEE, Slide 21

Saturation Mode for the A Unit in C55 DSPs

SATA = SATuration mode bit for the A-unit ALU in the status word ST3_55 in C55 DSPs.
  If SATA = 1 and a calculation in the A-unit results in an overflow:
    Positive results are saturated to 7FFF.
    Negative results are saturated to 8000.


ESIEE, Slide 22

Effect of 2’s Complement Overflow

As 2’s complement is a circular representation, if the final result fits in N bits, intermediate overflows do not alter the final result.
This is not the case for saturation.

Example with N = 3 bits:
  Calculate x = 3+2-4; the theoretical result is 1.
  With 2’s complement overflow:
    First calculate y = (3+2) = 011+010 = 101 = -3: overflow.
    Then (y-4) = 101+100 = 1 001 = 1 with carry = 1: correct result.
  With saturation:
    First calculate y = (3+2) = 3: saturation.
    Then (y-4) = 011+100 = 111 = -1: wrong result.

If a system has a unity gain, saturation should not be used.
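The x = 3+2-4 example can be replayed in Python to compare the two overflow rules. This is a sketch with hypothetical helper names (`wrap`, `saturate`, `accumulate`); it models N-bit arithmetic in general rather than a specific C5000 mode bit.

```python
def wrap(value: int, n_bits: int) -> int:
    """2's complement (circular) overflow: keep the value modulo 2^N."""
    half = 1 << (n_bits - 1)
    return (value + half) % (1 << n_bits) - half


def saturate(value: int, n_bits: int) -> int:
    """Saturation: clip to the largest or smallest representable value."""
    half = 1 << (n_bits - 1)
    return max(-half, min(half - 1, value))


def accumulate(terms, n_bits: int, overflow_rule) -> int:
    """Add the terms one by one, applying the overflow rule after each step."""
    acc = 0
    for t in terms:
        acc = overflow_rule(acc + t, n_bits)
    return acc


print(accumulate([3, 2, -4], 3, wrap))      # intermediate overflow is harmless
print(accumulate([3, 2, -4], 3, saturate))  # intermediate saturation corrupts the sum
```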


ESIEE, Slide 23

Example of 2’s Complement Binary Representations

Represent x = 1.75 using N=6 bits in format Q3
  Answer: 001.110 = 1 + 1/2 + 1/4

Represent x = -1.75 using N=6 bits in format Q3
  Answer: 110.010 = -4 + 2 + 1/4

Represent x = 1.805 using N=6 bits in format Q3
  Answer: 001.110 = 1 + 1/2 + 1/4
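The three answers can be verified with a small conversion routine. This is a sketch: `to_qk_bits` is a hypothetical helper that computes y = round(x 2^k) and prints its N-bit 2’s complement code with a binary point after the integer part. Note that Python's `round` rounds exact ties to the nearest even integer, which does not matter for these values.

```python
def to_qk_bits(x: float, n_bits: int, k: int) -> str:
    """Encode x in Qk on n_bits and return the bit string 'iii.fff'."""
    y = round(x * (1 << k))                       # integer y = round(x * 2^k)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= y <= hi:
        raise OverflowError(f"{x} does not fit in Q{k} on {n_bits} bits")
    bits = format(y & ((1 << n_bits) - 1), f"0{n_bits}b")
    return bits[: n_bits - k] + "." + bits[n_bits - k:]


for value in (1.75, -1.75, 1.805):
    print(value, to_qk_bits(value, 6, 3))
```

The value 1.805 is not representable exactly in Q3; it rounds to the same code as 1.75, which illustrates the quantization error of the format.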


ESIEE, Slide 24

Operations with Fractional Numbers using Fixed Point Format

Addition: align on same size N and align bits with same weight.

$$Q_k + Q_k = Q_k$$

Multiplication: product requires 2N bits.

$$Q_k \times Q_{k'} = Q_{k+k'}$$


ESIEE, Slide 25

Example of 2’s Complement Binary Operations

Data size N=6, format Q3

Product: 12 bits, format Q6
  Product 1.75 x 2.5 = 4.375
  Binary representation:
  001.110 x 010.100 = 000100.011000

Sum: 6 bits, format Q3
  Sum 1.75 + 1.5 = 3.25
  Binary representation:
  001.110 + 001.100 = 011.010
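The product and the sum above can be checked by working on the underlying integers. This is a sketch with hypothetical helpers `to_q` and `from_q`; it relies only on the rule that a Qk by Qk' product is in format Q(k+k') and that a sum keeps the format of its operands.

```python
K = 3  # number of fractional bits of the operands (format Q3)


def to_q(x: float, k: int) -> int:
    """Integer that represents x in format Qk."""
    return round(x * (1 << k))


def from_q(y: int, k: int) -> float:
    """Real value represented by the integer y in format Qk."""
    return y / (1 << k)


a, b, c = to_q(1.75, K), to_q(2.5, K), to_q(1.5, K)

product = a * b                  # Q3 x Q3 -> Q6, needs 2N = 12 bits
total = a + c                    # Q3 + Q3 -> Q3

print(format(b, "06b"))          # code of 2.5 in Q3
print(format(product, "012b"), from_q(product, 2 * K))
print(format(total, "06b"), from_q(total, K))
```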


ESIEE, Slide 26

Accumulator and Size of the Result

The final result of a calculation usually uses more than 16 bits (size of memory words).
  ACCUs use 32, 40, 56 ... bits.

If we want to save the result in a single memory word, the question is:
  Which group of N bits must be saved from the accumulator?
  Possibility of overflow and underflow
  Overflow during accumulation or during saving.


ESIEE, Slide 27

ACCUMULATOR

Possibility of overflow and underflow
Scaling when adding many products

[Figure: 40-bit accumulator layout: guard bits (bits 39-32), ACCU High (bits 31-16), ACCU Low (bits 15-0); a window of 16 bits to save.]


ESIEE, Slide 28

Saturation on Store Mode, SST Bit

SST = mode bit in PMST (C54) or ST3_55 (C55) status word.

If SST is set, the CPU saturates a shifted or unshifted accumulator value before storing it.

The saturation value depends on the value of the sign extension mode bit.

ACCU remains unchanged.


ESIEE, Slide 29

Example of Fixed Point Processing y(n) = x(n) + a1 y(n-1)

Data size N=16, product size 32 bits, accumulator size 40 bits.
The coefficient a1 is smaller than 1: format Q15. Format of data = Q15.

[Figure: 40-bit accumulator (bits 0-15, 16-31, 32-39) holding the product a1 y(n-1) in Q30; x(n) in Q15 is added to it; 16 bits of the sum are saved as y(n) in Q15.]
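A bit-level sketch of this recursion in Python may help. It is a behavioural model, not the slide's exact accumulator alignment: it assumes that x(n) is shifted left by 15 bits to line up with the Q30 product, that bits 30..15 of the sum are saved, and that the stored value is saturated to 16 bits. The name `iir_q15` is hypothetical.

```python
Q15_MAX, Q15_MIN = 0x7FFF, -0x8000


def iir_q15(samples, a1):
    """First-order recursion y(n) = x(n) + a1*y(n-1) on Q15 integers."""
    y_prev = 0
    output = []
    for x in samples:
        acc = a1 * y_prev                   # Q15 x Q15 product: format Q30
        acc += x << 15                      # align x(n) from Q15 to Q30
        y = acc >> 15                       # keep 16 bits: back to Q15 (truncation)
        y = max(Q15_MIN, min(Q15_MAX, y))   # saturate the stored value
        output.append(y)
        y_prev = y
    return output


# Impulse of amplitude 0.5 through a filter with a1 = 0.5
print(iir_q15([16384, 0, 0], 16384))
```

Python integers never overflow, so the 40-bit accumulator is not modelled; the sketch only shows the format bookkeeping (Q30 in the accumulator, Q15 in memory).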


ESIEE, Slide 30

Representation of Sum of Products

The basic sum of M products operation, for data and coefficients of size N bits:

$$y(n) = \sum_{k=0}^{M-1} b_k \, x(n-k)$$

Needs 2N bits for each product + log2(M) bits for the sum of M products, or maximum 2N + log2(M) bits.

The C5000 DSP has accumulators of size 32+8 bits that allow for the sum of 256 products without overflow.

If M > 256, scaling of the data may be necessary.


ESIEE, Slide 31

Solutions to Overflow

Multiplication overflow can be prevented by using pure fractional numbers (< 1).

Saturation of the result
Scaling of the inputs and use of fractional arithmetic
  But loss of precision
Use double precision or double word
  But decreases speed of calculation
Use DSP with larger accumulators.
  8 guard bits in the ’C5000 accumulators.
Design system with unity gain.
Use floating point


ESIEE, Slide 32

Products of Size 2N or 2N-1 Bits? 1 of 3

The product of 2 data values of size N bits can be stored using 2N-1 bits, except where the two most negative numbers are multiplied together.

Example of size N=3 bits for integer values:
  The integer values are between -4 and +3.
  All the products are between -16 and +15 and can be written on 2N-1 = 5 bits,
  except -4 x -4 = 16.

Example on N=16 bits and Q15 format:
  -1 x -1 = +1 cannot be written on 31 bits in Q30.


ESIEE, Slide 33

Products of Size 2N or 2N-1 Bits? 2 of 3

Consider the case of data of magnitude < 1 using N=16 bits, Q15 format.
  Their products are also of magnitude < 1 and can be expressed using 32 bits, format Q30, with 2 sign bits.

It is possible with the C5000 DSP to automatically eliminate one sign bit by a left shift of 1 bit, thus obtaining a Q31 result.
  If bit FRCT in ST1 is set to 1, products are automatically shifted left by 1 bit.


ESIEE, Slide 34

Products of Size 2N or 2N-1 Bits? 3 of 3

The exception -1 x -1 can be treated using the SMUL status bit, which saturates the result of the multiplication before accumulation. -1 is equal to 8000 in hexadecimal on 16 bits.

If SMUL=1, SATD or OVM=1, FRCT=1:
  The product of (1)8000 x (1)8000 is saturated to the positive number 7FFF FFFF after the multiplication and before accumulation in MAC or MAS instructions.

Consistent with ETSI-GSM specifications.
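The fractional product and its single exception can be modelled in Python. This is a behavioural sketch, not a description of the C5000 multiplier datapath: `q15_mul_frct` and `wrap32` are hypothetical helpers, the `smul` argument stands in for the combined effect of the mode bits listed above, and the unsaturated case only shows how the 32-bit pattern 8000 0000h reads as a Q31 word.

```python
Q15_MINUS_ONE = -0x8000   # -1.0 in Q15 (8000h)
Q31_MAX = 0x7FFFFFFF      # largest positive Q31 value


def wrap32(value: int) -> int:
    """Reinterpret value as a signed 32-bit word."""
    return (value + (1 << 31)) % (1 << 32) - (1 << 31)


def q15_mul_frct(a: int, b: int, smul: bool) -> int:
    """Q15 x Q15 product shifted left by 1 bit, giving a Q31 result."""
    product = (a * b) << 1            # Q30 -> Q31, removes the redundant sign bit
    if product > Q31_MAX:             # only reachable for (-1) x (-1)
        return Q31_MAX if smul else wrap32(product)
    return product


print(hex(q15_mul_frct(0x4000, 0x4000, smul=True)))          # 0.5 x 0.5 = 0.25
print(hex(q15_mul_frct(Q15_MINUS_ONE, Q15_MINUS_ONE, True)))  # saturated to 7FFF FFFF
print(q15_mul_frct(Q15_MINUS_ONE, Q15_MINUS_ONE, False))      # reads as -1 in Q31
```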


ESIEE, Slide 35

Fixed Point Programming

Perpetual compromise between dynamic range and precision constraints

Keep enough bits to represent the integer part of the result.

Keep enough bits in the fractional part to satisfy the precision.

Rounding results.


ESIEE, Slide 36

Entering Non-Integer Values using the Software Development Tools

The tools do not support fractions.

To store 0.707 in Q15 use:
  .word 32768*707/1000

To store 3.252 in Q13 use:
  .word 8192*3252/1000

Generally, to convert a real number x using 2’s complement representation with size N bits and format Qk:
  Calculate the integer $y = \mathrm{round}(x \, 2^k)$.
  The 2’s complement representation of y is the 2’s complement representation of x in format Qk.
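The two `.word` expressions can be compared with the rounding rule y = round(x 2^k) in Python. This is a sketch under one explicit assumption: that the assembler evaluates the expression with integer arithmetic and a truncating division, which is not stated on the slide. The helper names are hypothetical.

```python
def q_const_truncated(numerator: int, denominator: int, k: int) -> int:
    """Value of 2^k * numerator / denominator with integer (truncating) division."""
    return (1 << k) * numerator // denominator


def q_const_rounded(x: float, k: int) -> int:
    """Value of round(x * 2^k), the rule given for the general conversion."""
    return round(x * (1 << k))


print(q_const_truncated(707, 1000, 15), q_const_rounded(0.707, 15))
print(q_const_truncated(3252, 1000, 13), q_const_rounded(3.252, 13))
```

Under that assumption the two methods can differ by one LSB (as for 0.707 in Q15), because the integer expression truncates instead of rounding.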


ESIEE, Slide 37

Some More Stuff on Saturation

[Figure: contents of AC0 before and after SAT AC0; axis marks 0, 1, -1, 128, -128.]

Two saturation methods exist:
  Manual: using the SAT instruction (ACx only)
  Auto: using the SATA/SATD or OVM control bits
    SATA affects TAx registers (T0-3/AR0-7) in A unit
      ex: 7FFFh + 2 = 7FFFh
      ex: 8001h - 3 = 8000h
    SATD affects AC0-3 registers in D unit
      (ST1_55 M40 = 0): 00.7FFF.FFFF or FF.8000.0000
      (ST1_55 M40 = 1): 7F.FFFF.FFFF or 80.0000.0000
      - Affects ST0_55 ACxOV and can be tested


ESIEE, Slide 38

Rounding

How do you round this amount to the nearest $?
  $ 1.53
  - Add $0.50:                        + $ 0.50
  - Partial result:                     $ 2.03
  - Truncate result (to nearest $):     $ 2.

Biased rounding (ST2_55 RDM = 0), or round to the infinite:
  - Direct: ROUND AC0
  - Store: MOV uns(rnd(HI(saturate(AC0)))),*AR1

rnd() and ROUND perform the following operation: (add 1 to bit 15) and (truncate):
  (ACx+0x8000) & 0xFFFF0000

The instructions RND in C54 DSPs and ROUND in C55 DSPs round the content of the accumulator.

For the C55, there are 2 kinds of rounding, biased or unbiased, depending on the bit RDM in ST2_55.
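The "add half, then truncate" operation can be tried out in Python. This sketch models only the biased rounding of the upper 16 bits; `round_acc` is a hypothetical name, and `& ~0xFFFF` is used instead of `& 0xFFFF0000` because Python integers are unbounded, so negative accumulator values keep their sign. For the low 32 bits the effect is the same.

```python
def round_acc(acc: int) -> int:
    """Biased rounding to the upper word: add 1 at bit 15, clear bits 15..0."""
    return (acc + 0x8000) & ~0xFFFF


def round_dollars(cents: int) -> int:
    """Same idea in decimal: add 50 cents, then truncate to whole dollars."""
    return (cents + 50) // 100


print(round_dollars(153))            # $1.53 -> $2
print(hex(round_acc(0x00018000)))    # exactly halfway: rounds up
print(hex(round_acc(0x00017FFF)))    # just below halfway: rounds down
print(hex(round_acc(-0x18000)))      # negative halfway value also moves upward
```

The last line shows the bias: a value exactly halfway between two results always moves toward +infinity, whatever its sign.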


ESIEE, Slide 39

Other Useful Stuff...

Setting SMUL (ST3_55), FRCT and SATD (ST1_55) = 1 will saturate (-1 x -1) to 7FFF_FFFFh prior to adding/subtracting to/from the accumulator. This ensures a 1-cycle ETSI-compatible operation and prevents temporary overflow.

Absolute value:   ABS AC0,AC1
2’s complement:   NEG AC0,AC1
1’s complement:   NOT AC0,AC1
1-bit division:   SUBC Smem,ACx
Normalization:    MANT; EXP



ESIEE, Slide 40

Floating Point Arithmetic


ESIEE, Slide 41

Floating Point Representation

Number x -> Mantissa M and Exponent E:

$$x = M \, 2^{E}$$

If M is of size m bits and E is of size e bits, then x is of size N = m + e bits.

Range of positive numbers for $0.5 \le |M| < 1$ and 2’s complement representation of M and E:

$$\left[\; 2^{-1} \, 2^{-2^{e-1}} \;,\; \left(1 - 2^{-(m-1)}\right) 2^{\,2^{e-1}-1} \;\right]$$


ESIEE, Slide 42

Normalization of the Mantissa

The decomposition of a real value x into the product of a mantissa and an exponent term is not unique:
  $x = M_1 2^{E_1} = M_2 2^{E_2} = \dots$
  Example: $12.8 = 0.8 \times 2^4$ and also $12.8 = 1.6 \times 2^3$

M must be normalized to make the decomposition unique.

The normalization is a constraint applied to M, for example: $0.5 \le |M| < 1$
  The ratio of the limits of the interval must be smaller than 2 to have the same exponent.


ESIEE, Slide 43

Floating Point Representation

Non-linear scale:
  The precision decreases geometrically as the magnitude of the data increases.

[Figure: number line marked 0, xmin, 2xmin, 4xmin, 8xmin; each interval between consecutive marks contains $2^{m-2}$ values.]

For a given number of bits:
  the number of bits of the mantissa determines the precision;
  the number of bits of the exponent determines the dynamic range.


ESIEE, Slide 44

Floating Point Overflow or Underflow

Very unlikely to occur

Overflow: $x > 2^{\,2^{e-1}-1}$

Underflow: $x < 2^{-1} \, 2^{-2^{e-1}}$


ESIEE, Slide 45

Floating Point Addition Operator

It is necessary to denormalize the smaller number (B).
  Its mantissa is multiplied by $2^{E_b - E_a}$ before being added to $M_a$.

Loss of precision due to the rounding of the mantissa.

$$A + B = M_a 2^{E_a} + M_b 2^{E_b} = \left(M_a + M_b 2^{E_b - E_a}\right) 2^{E_a}$$


ESIEE, Slide 46

Floating Point Multiplication Operator

$$A \times B = M_a M_b \, 2^{E_a + E_b}$$

It is necessary to normalize $M_a M_b$.

1 extra bit would be necessary to prevent overflow of $E_a + E_b$.

2m-1 bits are necessary to represent $M_a M_b$.

If M is truncated to m bits, the absolute error increases rapidly.


ESIEE, Slide 47

Examples of Floating Point DSP

Some DSP devices of the C6000 family:
  C67xx devices support both single and double precision formats.

The C5000 DSPs are fixed point DSPs but can be programmed in floating point if necessary.


ESIEE, Slide 48

Example of Floating Point Representation

Represent x = 1.75 in floating point
  Use N=8, mantissa size m=5 bits, exponent size e=3 bits, M and E in 2’s complement
  Mantissa normalized to $0.5 \le |M| < 1$

Solution:
  E = 1, in binary representation: 001
  M = 0.875, in binary representation: 0.1110
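The solution can be reproduced with Python's `math.frexp`, which returns a mantissa normalized to 0.5 <= |M| < 1, the same convention as on these slides. The function `encode_float` is a hypothetical sketch for this toy format (m-bit mantissa in Q(m-1), e-bit exponent, both in 2’s complement); it is not an IEEE 754 encoder and does not handle x = 0.

```python
import math


def encode_float(x: float, m_bits: int, e_bits: int):
    """Return (mantissa_bits, exponent_bits), both as 2's complement strings."""
    mant, exp = math.frexp(x)                    # x = mant * 2**exp, 0.5 <= |mant| < 1
    m_int = round(mant * (1 << (m_bits - 1)))    # mantissa as a Q(m-1) integer
    if m_int == 1 << (m_bits - 1):               # rounding reached 1.0: renormalize
        m_int >>= 1
        exp += 1
    if not -(1 << (e_bits - 1)) <= exp < (1 << (e_bits - 1)):
        raise OverflowError("exponent does not fit in e_bits")
    mant_bits = format(m_int & ((1 << m_bits) - 1), f"0{m_bits}b")
    exp_bits = format(exp & ((1 << e_bits) - 1), f"0{e_bits}b")
    return mant_bits, exp_bits


print(math.frexp(12.8))           # the normalized decomposition of 12.8
print(encode_float(1.75, 5, 3))   # mantissa 0.1110, exponent 001
```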


ESIEE, Slide 49

Comparison of Fixed and Floating Point Formats

Fixed point: linear scale
  Absolute error more or less constant
  SNR decreases when the input decreases

Floating point: non-linear scale with a geometrical progression
  Relative error more or less constant
  SNR more or less constant over the full data range


ESIEE, Slide 50

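Quantization Error and SNR with Fixed-Point

$$d = x - \hat{x} \quad \text{(rounding)}$$

For $|x| \le x_{\max}$ and a quantization step $q = 2 x_{\max} 2^{-N}$:

$$E(d) = 0, \qquad E(d^2) = \sigma_d^2 = \frac{q^2}{12}$$

$$SNR_{dB} = 10 \log_{10} \frac{\sigma_x^2}{\sigma_d^2}$$

$$SNR_{dB} = 10 \log_{10} \sigma_x^2 + 6N - 10 \log_{10} x_{\max}^2 + 10 \log_{10} 3$$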


ESIEE, Slide 51

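Quantization Error with Floating Point

$$d = x - \hat{x} \quad \text{(rounding)}$$

$d_m$ = rounding error on mantissa

$d_r$ = relative error on x:

$$d_r = \frac{d}{x} = \frac{d_m}{M}$$

With an m-bit mantissa (sign included) normalized to $0.5 \le |M| < 1$:

$$|d_m| \le 2^{-m}, \qquad 0 \le |d_r| \le 2^{-m+1}$$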


ESIEE, Slide 52

Quantization Error and SNR with Floating Point

$$d = x - \hat{x} = x \, d_r, \qquad |d_r| \le 2^{-m+1}$$

For x random with "fast variations": $d_r$ is white noise uncorrelated with x.

$$\sigma_d^2 = \sigma_x^2 \, \sigma_{d_r}^2$$

$$SNR_{dB} \approx 6m + 1.44$$


ESIEE, Slide 53

Comparison of Fixed Point and Floating Point SNR

Example for N=16 bits: m=12, e=4

[Figure: SNR in dB (vertical axis, -20 to 100) versus signal power in dB (horizontal axis, -100 to 100); two curves: floating point SNR (marked 73 dB) and fixed point SNR (marked 86 dB).]


ESIEE, Slide 54

Comparison of Fixed Point and Floating Point

For N bits, with floating point format there is a compromise between dynamic range (E) and precision (M).

Example for N=32 bits:

                               Dynamic range   Precision
Fixed point, 32 bits           10^9            9 digits max
Floating point, m=24, e=8      10^77           7 digits

Dynamic range is defined as the ratio of the largest positive value to the smallest non-zero positive value.


ESIEE, Slide 55

Fixed Point vs. Floating Point

Fixed point:
  Simple operators of addition and multiplication
  But it is necessary to monitor overflow and underflow in order to keep precision and dynamic range at their best.

Floating point:
  Greater dynamic range and simpler programming
  More complex operators, so the performance in terms of speed or power consumption is not as good as that of fixed point DSPs.


ESIEE, Slide 56

IEEE 754 Floating Point Format 1 of 4

Most processors respect the IEEE 754 format for floating point representation of numbers.

IEEE format for N=32 bits:
32 bits = 1 bit (Sign bit) + 8 bits (Exponent) + 23 bits (Fraction)

Exponent: offset binary, offset = 127, exponent = expo - 127
Mantissa: sign-magnitude, normalized between 1.0...0 and 1.1...1
Hidden bit 1: only the fractional part (Fraction) is stored.
When exponent not equal to 0, |Mantissa| = 1.fraction

e.g.: x = 28 = 1.75 x 2^4 -> 0 10000011 1100...0

value = (-1)^sign * (1.fraction) * 2^(expo-127)
for non-zero exponent
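The x = 28 example can be checked with Python's standard `struct` module, which packs a number as an IEEE 754 single precision word. The helper names `ieee754_fields` and `ieee754_value` are hypothetical; the value formula is only applied to normalized numbers (expo between 1 and 254).

```python
import struct


def ieee754_fields(x: float):
    """Split the single precision encoding of x into (sign, expo, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    expo = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, expo, fraction


def ieee754_value(sign: int, expo: int, fraction: int) -> float:
    """value = (-1)^sign * (1.fraction) * 2^(expo-127), normalized numbers only."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (expo - 127)


sign, expo, fraction = ieee754_fields(28.0)
print(sign, format(expo, "08b"), format(fraction, "023b"))
print(ieee754_value(sign, expo, fraction))
```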


ESIEE, Slide 57

Dynamic Range of IEEE 754 Single Precision Floating Point Format 2 of 4

Largest positive number:
  Max exponent = 254-127 = 127
  Max mantissa = $2 - 2^{-23}$
  Max positive value = $(2 - 2^{-23}) \times 2^{127} \approx 2^{128}$

Smallest positive number (non-zero):
  Min exponent = 1-127 = -126
  Min mantissa = 1.0
  Min positive value = $1.0 \times 2^{-126}$

value = (-1)^sign * (1.fraction) * 2^(expo-127)
for non-zero exponent
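The two limits can be confirmed by decoding the corresponding bit patterns with Python's `struct` module. This is a sketch; `float_from_word` is a hypothetical helper, and the two words are the largest and smallest normalized single precision encodings (expo = 254 with all fraction bits set, and expo = 1 with a zero fraction).

```python
import struct


def float_from_word(word: int) -> float:
    """Interpret a 32-bit word as an IEEE 754 single precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


largest = float_from_word(0x7F7FFFFF)    # expo = 254, fraction = all ones
smallest = float_from_word(0x00800000)   # expo = 1, fraction = 0

print(largest, smallest)
print(largest == (2 - 2**-23) * 2.0**127, smallest == 2.0**-126)
```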


ESIEE, Slide 58

IEEE 754 Single Precision Floating Point Format, Special Cases 3 of 4

Zero: 32 bits are 0
Underflow: exponent < 1
Overflow: exponent > 254


ESIEE, Slide 59

IEEE 754 Floating Point Format 4 of 4

Double precision, 64 bits: 1+11+52
  Exponent offset binary: offset = 1023

Extended single precision, 43 bits: 1+11+31

Extended double precision, 79 bits: 1+15+63


ESIEE, Slide 60

Block Floating Point


ESIEE, Slide 61


This is not a DSP format.
This is a way of doing floating point operations efficiently on a fixed point DSP.

Natural approach for block operations such as the Fast Fourier Transform (FFT).
  See details in chapter 19.


ESIEE, Slide 62


A register contains the value of the exponent (constant) to be applied to a block of data: BLOCK EXPONENT

The mantissa is of size N bits.
Each data block is tested and scaled by the exponent in order to avoid overflows.

Useful when N is small (e.g.: N=16 bits)
  Limits the loss of precision due to the increase in dynamic range of floating point.
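The idea of one shared exponent per block can be sketched in Python. This is an illustration under stated assumptions, not the C5000 implementation: `block_normalize` is a hypothetical helper that finds the smallest right shift for which the largest magnitude of the block fits in an N-bit signed word, then applies that shift to every sample.

```python
def block_normalize(block, n_bits: int = 16):
    """Return (mantissas, exponent) so that block[i] ~= mantissas[i] * 2**exponent."""
    limit = (1 << (n_bits - 1)) - 1          # largest positive N-bit value
    peak = max(abs(v) for v in block)        # the block is tested as a whole
    exponent = 0
    while (peak >> exponent) > limit:        # smallest shift that avoids overflow
        exponent += 1
    mantissas = [v >> exponent for v in block]   # arithmetic shift keeps the sign
    return mantissas, exponent


print(block_normalize([40000, -12, 1000]))   # one sample exceeds 16 bits
print(block_normalize([100, -200]))          # block already fits: exponent 0
```

All samples in a block share the same scaling, so small samples lose low-order bits whenever a large sample forces a shift; this is the precision cost paid for the extra dynamic range.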
