C Optimization Techniques for StarCore SC140



Page 1: C Optimization Techniques for StarCore SC140

Motorola General Business Information

Version #: 1.1 Date: 11-12-2003

MOTOROLA and the Stylized M Logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners. © Motorola, Inc. 2001.

C Optimization Techniques for StarCore SC140

Created by Bogdan Costinescu

Page 2: C Optimization Techniques for StarCore SC140

Agenda

• C and DSP Software Development
• Compiler Extensions for DSP Programming
• Coding Style and DSP Programming Techniques
• DSP Coding Recommendations
• Conclusions

Page 3: C Optimization Techniques for StarCore SC140

What do optimizations mean?

• reduction in cycle count
• reduction in code size
• reduction in data structures
• reduction in stack memory size
• reduction of power consumption

• how do we measure if we did it right?

Page 4: C Optimization Techniques for StarCore SC140

Why C in DSP Programming?

• C is a "structured programming language designed to produce a compact and efficient translation of a program into machine language"
• high-level, but close to assembly
• large set of tools available
• flexible, maintainable, portable

=> reusability

Page 5: C Optimization Techniques for StarCore SC140

Compiler-friendly Architectures

• large set of registers
• orthogonal register file
• flexible addressing modes
• few restrictions
• native support for several data types

Page 6: C Optimization Techniques for StarCore SC140

Example - StarCore SC140

• Variable Length Execution Set architecture
  – one execution set can hold up to:
    • four DALU instructions
    • two AGU instructions
• scalable architecture with rich instruction set
• large orthogonal register file
  – 16 DALU
  – 16 + 4 + 4 AGU
• short pipeline => few restrictions
• direct support for integers and fractionals

Page 7: C Optimization Techniques for StarCore SC140

Is C a Perfect Match for DSPs?

• fixed-point not supported
• no support for dual-memory architectures
• no port I/O or special instructions
• no support for SIMD instructions
• vectors represented by pointers

Page 8: C Optimization Techniques for StarCore SC140

C Support for Parallelism

• basically, NONE
• C = a sequential programming language
  – cannot express actions in parallel
  – no equivalent of SIMD instructions
• the compiler has to analyze the scheduling of independent operations
  – the analysis depends on the coding style

Page 9: C Optimization Techniques for StarCore SC140

Can C be made efficient for DSPs?

YES, IF the programmer understands

• compiler extensions
• programming techniques
• compiler behavior

Page 10: C Optimization Techniques for StarCore SC140

Compiler Extensions

• intrinsics
  – efficient access to the target instruction set
• new data types
  – use processor-specific data formats in C
• pragmas
  – provide extra information to the compiler
• inline assembly
  – the solution to "impossible in C" problems

Page 11: C Optimization Techniques for StarCore SC140

Intrinsic functions (Intrinsics)

• special compiler-translated function calls
• map to one or more target instructions
• extend the set of operators in C
• define new types
  – single and double precision fractionals
  – extended precision fractionals
• emulation possible on other platforms

Page 12: C Optimization Techniques for StarCore SC140

Data Types

• DSPs may handle several data formats
  – identical with C basic types
    • integers
  – common length, but different semantics
    • fractionals on 16 and 32 bits
  – particular to the DSP (i.e. including the extension)
    • fractionals on 40 bits
• all DSP-specific data types come with a set of operators

Page 13: C Optimization Techniques for StarCore SC140

C Types for Fractionals

• conventions:
  – Word16 for 16-bit fractionals
  – Word32 for 32-bit fractionals
  – Word40 for 40-bit fractionals (32 bits plus 8 guard bits)
  – Word64 for 64-bit fractionals (no direct representation on SC140)
• intrinsics, not conventions, impose the fractional property on values
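
The last point can be illustrated on a host machine. The sketch below assumes that Word16 and Word32 are plain 16-bit and 32-bit integer typedefs, and it uses a hypothetical helper, q15_mult_sketch, as a stand-in for a fractional multiply intrinsic; it is not the compiler's own implementation. The same bit pattern gives a different result depending on which operation is applied to it.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

/* Plain C multiply: both operands are treated as integers. */
Word32 int_multiply(Word16 a, Word16 b)
{
    return (Word32)a * (Word32)b;
}

/* Hypothetical stand-in for a Q15 fractional multiply.
 * Each operand is read as value / 32768, and the product is
 * scaled back to Q15. The only overflow case, (-1.0) * (-1.0),
 * saturates to the largest positive value.
 * Note: relies on arithmetic right shift of negative values,
 * which is implementation-defined in C but common in practice. */
Word16 q15_mult_sketch(Word16 a, Word16 b)
{
    Word32 p = ((Word32)a * (Word32)b) >> 15;
    if (p > INT16_MAX) {
        p = INT16_MAX;
    }
    return (Word16)p;
}
```

With both operands equal to 0x4000, the integer multiply yields 0x10000000, while the fractional stand-in yields 0x2000, that is 0.5 * 0.5 = 0.25 in Q15.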

Page 14: C Optimization Techniques for StarCore SC140

Fractional Emulation

• needed on non-StarCore platforms
• maintains the meaning of fractional operations
• in some corner cases, the emulated results are not correct
  – e.g. mpysu uses 32 bits for results plus sign
• special Metrowerks intrinsics solve possible inconsistencies
  – e.g. mpysu_shr16

Page 15: C Optimization Techniques for StarCore SC140

Fractional Values and Operators

• introduced by fractional intrinsics
• operators standardized (ITU-T, ETSI)
  – addition, multiplication
  – saturation, rounding
  – scaling, normalization
• operate on specific data types

    Word16 add(Word16, Word16);
    Word32 L_add(Word32, Word32);
    Word40 X_add(Word40, Word40);
    Word64 D_add(Word64, Word64);
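
A minimal host-side sketch of what saturating fractional operators do is given below. The function names carry a _sketch suffix because they are illustrative stand-ins written for this example, modeled on the behavior of the ITU-T/ETSI basic operators; they are not the definitions shipped in prototype.h.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

/* 16-bit addition with saturation instead of wrap-around. */
Word16 add_sketch(Word16 a, Word16 b)
{
    Word32 s = (Word32)a + (Word32)b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (Word16)s;
}

/* 32-bit addition with saturation. */
Word32 L_add_sketch(Word32 a, Word32 b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) s = INT32_MAX;
    if (s < INT32_MIN) s = INT32_MIN;
    return (Word32)s;
}

/* Fractional multiply: Q15 * Q15 -> Q31, i.e. 2 * a * b.
 * Only (-32768) * (-32768) overflows and is saturated. */
Word32 L_mult_sketch(Word16 a, Word16 b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) p = INT32_MAX;
    return (Word32)p;
}

/* Multiply-accumulate: saturating add of acc and the Q31 product. */
Word32 L_mac_sketch(Word32 acc, Word16 a, Word16 b)
{
    return L_add_sketch(acc, L_mult_sketch(a, b));
}
```

For instance, add_sketch(32767, 1) stays at 32767 rather than wrapping to a negative value, which is the property that makes these operators suitable for signal processing code.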

Page 16: C Optimization Techniques for StarCore SC140

Pragmas

• standard way to communicate with compilers
• describe code or data characteristics
• pragmas do not influence correctness
• but they significantly influence performance

    #pragma align signal 8
    #pragma loop_unroll 4
    #pragma inline
    #pragma loop_count (20, 40, 4)

Page 17: C Optimization Techniques for StarCore SC140

Alignment pragma

• #pragma align *ptr 8
  – for parameters
• #pragma align ptr 8
  – for vector declaration and definition
• derived pointers may not inherit the alignment

    Word16 *p = aligned_ptr + 24;   /* not recommended */
    #define p (aligned_ptr + 24)    /* recommended */

• create a new function if you want to specify alignment for internal pointers

Page 18: C Optimization Techniques for StarCore SC140

Inline Assembly

• whole functions
• only one instruction
  – asm(" di");
• difficult access to local function symbols
  – globals can be accessed by their name
• the asm statements block the C optimization process

Page 19: C Optimization Techniques for StarCore SC140

Inline Assembly

• access to special target features not available through intrinsics
• special calling convention
  – the programmer has more freedom
• fast assembly integration with C
  – easy integration of legacy code
• two types
  – one instruction, e.g. asm(" di");
  – a whole function

Page 20: C Optimization Techniques for StarCore SC140

Inline Assembly - Example

    asm Flag GetOverflow(void)
    {
    asm_header
        return in $d0;
        .reg $d0,$d1;
    asm_body
        clr d0               ; extra cycle needed to allow DOVF to
                             ; be written even by the instructions
                             ; in the delay slot of JSRD
        bmtsts #$0004,EMR.L  ; test the Overflow bit from EMR
        move.w #1,d1
        tfrt d1,d0           ; if Overflow return 1, else return 0
    asm_end
    }

Page 21: C Optimization Techniques for StarCore SC140

Compiler Optimizations

• standard C optimizations
• low level (target) optimizations
• several optimization levels available
  – fastest code
  – smallest code
• global optimizations
  – all files are analyzed as a whole
  – the best performance

Page 22: C Optimization Techniques for StarCore SC140

DSP Application Success Factors

• high-performance, compiler-friendly target
• good optimizing compiler
• good C coding style in source code

=> the only factor tunable by the programmer is the C coding style

  – coding style includes programming techniques
  – no compiler can defeat GIGO (garbage in, garbage out)

Page 23: C Optimization Techniques for StarCore SC140

Programming Techniques

• described in the compiler's manual
  – styles to be used
  – pitfalls to be avoided
• explicitly eliminate data dependencies
• complement compiler's transformations that restructure the code

Page 24: C Optimization Techniques for StarCore SC140

SC140 Programming Techniques

• general DSP programming techniques
  – loop merging
  – loop unrolling
  – loop splitting
• specific to StarCore and its generation
  – split computation
  – multisample

Page 25: C Optimization Techniques for StarCore SC140

Loop Merging

• combines two loops into a single one
  – precondition: the same number of iterations
  – may eliminate inter-loop storage space
• increases cache locality
• decreases loop overhead
• increases DALU usage per loop iteration
• typically, reduces code size

Page 26: C Optimization Techniques for StarCore SC140

Loop Merging Example

    /* scaling loop */
    for (i = 0; i < SIG_LEN; i++)
    {
        y[i] = shr(y[i], 2);
    }
    /* energy computation */
    e = 0;
    for (i = 0; i < SIG_LEN; i++)
    {
        e = L_mac(e, y[i], y[i]);
    }

    /* Compute at the same time the */
    /* energy of the scaled signal  */
    e = 0;
    for (i = 0; i < SIG_LEN; i++)
    {
        Word16 temp;
        temp = shr(y[i], 2);
        e = L_mac(e, temp, temp);
        y[i] = temp;
    }
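
To check on a host machine that the merged loop is equivalent to the two original loops, something like the following sketch can be used. The helpers shr_sketch and L_mac_sketch are hypothetical stand-ins for the shr and L_mac intrinsics, written for this example only; the function names energy_two_loops and energy_merged are likewise illustrative.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

/* Stand-in for shr(): arithmetic right shift by n, 0 <= n < 16,
 * written without relying on shifting negative values. */
static Word16 shr_sketch(Word16 v, int n)
{
    return (Word16)(v < 0 ? ~(~v >> n) : v >> n);
}

/* Stand-in for L_mac(): acc + 2*a*b with 32-bit saturation. */
static Word32 L_mac_sketch(Word32 acc, Word16 a, Word16 b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) p = INT32_MAX;
    int64_t s = (int64_t)acc + p;
    if (s > INT32_MAX) s = INT32_MAX;
    if (s < INT32_MIN) s = INT32_MIN;
    return (Word32)s;
}

/* Original form: one loop scales, a second loop accumulates. */
Word32 energy_two_loops(Word16 *y, int n)
{
    Word32 e = 0;
    int i;
    for (i = 0; i < n; i++) {
        y[i] = shr_sketch(y[i], 2);
    }
    for (i = 0; i < n; i++) {
        e = L_mac_sketch(e, y[i], y[i]);
    }
    return e;
}

/* Merged form: scale and accumulate in the same iteration. */
Word32 energy_merged(Word16 *y, int n)
{
    Word32 e = 0;
    int i;
    for (i = 0; i < n; i++) {
        Word16 temp = shr_sketch(y[i], 2);
        e = L_mac_sketch(e, temp, temp);
        y[i] = temp;
    }
    return e;
}
```

Both functions perform the same operations on each sample in the same order, so the returned energy and the scaled buffer should match exactly.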

Page 27: C Optimization Techniques for StarCore SC140

Loop Unrolling

• repeats the body of a loop
• increases the DALU usage per loop step
• increases the code size
• depends on
  – data dependencies between iterations
  – data alignment of implied vectors
  – register pressure per iteration
  – number of iterations to be performed
• keeps the bit-exactness

Page 28: C Optimization Techniques for StarCore SC140

Loop Unrolling Example

    int i;
    Word16 signal[SIGNAL_SIZE];
    Word16 scaled_signal[SIGNAL_SIZE];

    /* ... */
    for (i = 0; i < SIGNAL_SIZE; i++)
    {
        scaled_signal[i] = shr(signal[i], 2);
    }
    /* ... */

    int i;
    Word16 signal[SIGNAL_SIZE];
    #pragma align signal 8
    Word16 scaled_signal[SIGNAL_SIZE];
    #pragma align scaled_signal 8

    /* ... */
    for (i = 0; i < SIGNAL_SIZE; i += 4)
    {
        scaled_signal[i+0] = shr(signal[i+0], 2);
        scaled_signal[i+1] = shr(signal[i+1], 2);
        scaled_signal[i+2] = shr(signal[i+2], 2);
        scaled_signal[i+3] = shr(signal[i+3], 2);
    }
    /* ... */
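
The unrolled loop above assumes that SIGNAL_SIZE is a multiple of four. When that cannot be guaranteed, a common approach is to add a short remainder loop, as in the host-side sketch below. The helper shr_sketch is a hypothetical stand-in for the shr intrinsic, and the alignment pragmas are omitted because they are target-specific.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */

/* Stand-in for shr(): arithmetic right shift by n, 0 <= n < 16. */
static Word16 shr_sketch(Word16 v, int n)
{
    return (Word16)(v < 0 ? ~(~v >> n) : v >> n);
}

/* Reference: one element per iteration. */
void scale_rolled(const Word16 *in, Word16 *out, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        out[i] = shr_sketch(in[i], 2);
    }
}

/* Unrolled by four, with a remainder loop for n not divisible by 4.
 * Assumes n >= 0. */
void scale_unrolled4(const Word16 *in, Word16 *out, int n)
{
    int i;
    int main_part = n - (n % 4);   /* largest multiple of 4 <= n */

    for (i = 0; i < main_part; i += 4) {
        out[i+0] = shr_sketch(in[i+0], 2);
        out[i+1] = shr_sketch(in[i+1], 2);
        out[i+2] = shr_sketch(in[i+2], 2);
        out[i+3] = shr_sketch(in[i+3], 2);
    }
    for (; i < n; i++) {           /* at most three leftover elements */
        out[i] = shr_sketch(in[i], 2);
    }
}
```

Because each output element is computed by the same operation in both versions, the results are bit-exact regardless of the unroll factor.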

Page 29: C Optimization Techniques for StarCore SC140

Compiler generated code

    ; Without loop unrolling: 6 cycles inside the loop
    loopstart3
    L5
    [
      add    #<2,d3                ;[20,1]
      move.l d3,r2                 ;[20,1]
      move.l <_signal,r1           ;[20,1]
    ]
      move.l <_scaled_signal,r4
      adda   r2,r1
    [
      move.w (r1),d4
      adda   r2,r4
    ]
      asrr   #<2,d4
      move.w d4,(r4)
    loopend3

    ; With loop unrolling: 2 cycles inside the loop.
    ; The compiler uses software pipelining.
    [
      move.4f (r0)+,d0:d1:d2:d3
      move.l  #_scaled_signal,r1
    ]
    [
      asrr #<2,d0
      asrr #<2,d3
      asrr #<2,d1
      asrr #<2,d2
    ]
    loopstart3
    [
      moves.4f d0:d1:d2:d3,(r1)+
      move.4f  (r0)+,d0:d1:d2:d3
    ]
    [
      asrr #<2,d0
      asrr #<2,d1
      asrr #<2,d2
      asrr #<2,d3
    ]
    loopend3
      moves.4f d0:d1:d2:d3,(r1)+

Page 30: C Optimization Techniques for StarCore SC140

Why use loop unrolling?

• increases DALU usage per loop step
• explicit specification of value reuse
• elimination of "uncompressible" cycles
  – destination the same as the source

Page 31: C Optimization Techniques for StarCore SC140

Loop unroll factor

• difficult to provide a formula
• depends on various parameters
  – data alignment
  – number of memory operations
  – number of computations
  – possible value reuse
  – number of iterations in the loop
• unroll the loop until you get no further gain
• two and four are typical unroll factors
  – three was also found in the max_ratio benchmark

Page 32: C Optimization Techniques for StarCore SC140

Loop Unroll Factor – example 1

    #include <prototype.h>

    #define VECTOR_SIZE 40

    Word16 example1(Word16 a[], Word16 incr)
    {
        short i;
        Word16 b[VECTOR_SIZE];

        for (i = 0; i < VECTOR_SIZE; i++)
        {
            b[i] = add(a[i], incr);
        }
        return b[0];
    }

Page 33: C Optimization Techniques for StarCore SC140

Loop Unroll Factor - example 2

    #include <prototype.h>

    #define VECTOR_SIZE 40

    Word16 example2(Word16 a[], Word16 incr)
    {
    #pragma align *a 8
        int i;
        Word16 b[VECTOR_SIZE];
    #pragma align b 8

        for (i = 0; i < VECTOR_SIZE; i++)
        {
            b[i] = add(a[i], incr);
        }
        return b[0];
    }

    ; compiler generated code
      move.4f (r0)+,d4:d5:d6:d7
      ...
    loopstart3
    [
      add d4,d8,d12
      add d5,d8,d13
      add d6,d8,d14
      add d7,d8,d15
      moves.4f d12:d13:d14:d15,(r1)+
      move.4f  (r0)+,d4:d5:d6:d7
    ]
    loopend3

Page 34: C Optimization Techniques for StarCore SC140

Loop Unroll Factor - example 3

    #include <prototype.h>

    #define VECTOR_SIZE 40

    Word16 example3(Word16 a[], Word16 incr)
    {
    #pragma align *a 8
        int i;
        Word16 b[VECTOR_SIZE];
    #pragma align b 8

        for (i = 0; i < VECTOR_SIZE; i++)
        {
            b[i] = a[i] >> incr;
        }
        return b[0];
    }

    ; compiler generated code
    loopstart3
    [
      move.4w d4:d5:d6:d7,(r1)+
      move.4w (r0)+,d4:d5:d6:d7
    ]
    [
      asrr d1,d4
      asrr d1,d5
      asrr d1,d6
      asrr d1,d7
    ]
    loopend3

Page 35: C Optimization Techniques for StarCore SC140

Split Computation

• typically used in associative reductions
  – energy
  – mean square error
  – maximum
• used to increase DALU usage
• increases the code size
• may require alignment properties
• may influence the bit-exactness

Page 36: C Optimization Techniques for StarCore SC140

Split Computation Example

    int i;
    Word32 e;
    Word16 signal[SIGNAL_SIZE];

    /* ... */
    e = 0;
    for (i = 0; i < SIGNAL_SIZE; i++)
    {
        e = L_mac(e, signal[i], signal[i]);
    }
    /* the energy is now in e */

    int i;
    Word32 e0, e1, e2, e3;
    Word16 signal[SIGNAL_SIZE];
    #pragma align signal 8

    /* ... */
    e0 = e1 = e2 = e3 = 0;
    for (i = 0; i < SIGNAL_SIZE; i += 4)
    {
        e0 = L_mac(e0, signal[i+0], signal[i+0]);
        e1 = L_mac(e1, signal[i+1], signal[i+1]);
        e2 = L_mac(e2, signal[i+2], signal[i+2]);
        e3 = L_mac(e3, signal[i+3], signal[i+3]);
    }
    e0 = L_add(e0, e1);
    e1 = L_add(e2, e3);
    e0 = L_add(e0, e1);
    /* the energy is now in e0 */
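
Why a split computation may influence bit-exactness can be seen with saturating arithmetic: saturation is not associative, so regrouping the additions can change the result. The host-side sketch below uses a mixed-sign dot product rather than the energy computation, because mixed signs make the effect easy to trigger. The helpers L_add_sketch and L_mac_sketch are hypothetical stand-ins for the intrinsics, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

/* Stand-in for L_add(): 32-bit addition with saturation. */
static Word32 L_add_sketch(Word32 a, Word32 b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) s = INT32_MAX;
    if (s < INT32_MIN) s = INT32_MIN;
    return (Word32)s;
}

/* Stand-in for L_mac(): acc + 2*a*b with saturation. */
static Word32 L_mac_sketch(Word32 acc, Word16 a, Word16 b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) p = INT32_MAX;
    return L_add_sketch(acc, (Word32)p);
}

/* Single accumulator: terms are added strictly in order. */
Word32 dot_sequential(const Word16 *x, const Word16 *y, int n)
{
    Word32 e = 0;
    int i;
    for (i = 0; i < n; i++) {
        e = L_mac_sketch(e, x[i], y[i]);
    }
    return e;
}

/* Four accumulators, recombined at the end (split computation). */
Word32 dot_split4(const Word16 *x, const Word16 *y, int n)
{
    Word32 e0 = 0, e1 = 0, e2 = 0, e3 = 0;
    int i;
    assert(n % 4 == 0);   /* loop count must be a multiple of the splits */
    for (i = 0; i < n; i += 4) {
        e0 = L_mac_sketch(e0, x[i+0], y[i+0]);
        e1 = L_mac_sketch(e1, x[i+1], y[i+1]);
        e2 = L_mac_sketch(e2, x[i+2], y[i+2]);
        e3 = L_mac_sketch(e3, x[i+3], y[i+3]);
    }
    e0 = L_add_sketch(e0, e1);
    e1 = L_add_sketch(e2, e3);
    return L_add_sketch(e0, e1);
}
```

With x = {32767, 32767, 32767, 32767} and y = {32767, 32767, -32767, -32767}, the sequential version saturates once on the way up and ends far from zero, while the split version saturates in both partial sums and returns -1. For small inputs where nothing saturates, both versions agree.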

Page 37: C Optimization Techniques for StarCore SC140

How many splits?

• depends on memory alignment
• loop count should be a multiple of the number of splits
• the splits have to be recombined to get the final result
  – decision depends on the nesting level

Page 38: C Optimization Techniques for StarCore SC140

Loop Splitting

• splits a loop into two or more separate loops
• increases cache locality in big loops
• decreases register pressure
• minimizes dependencies
• may create auxiliary storage
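
As an illustration of loop splitting and of the auxiliary storage it may create, the host-side sketch below splits one loop that both produces an output vector and accumulates an energy into two loops connected by a temporary buffer. All function names and the helpers with a _sketch suffix are hypothetical, written for this example only.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

static Word16 shr_sketch(Word16 v, int n)      /* arithmetic shift */
{
    return (Word16)(v < 0 ? ~(~v >> n) : v >> n);
}

static Word16 add_sketch(Word16 a, Word16 b)   /* saturating add */
{
    Word32 s = (Word32)a + (Word32)b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (Word16)s;
}

static Word32 L_mac_sketch(Word32 acc, Word16 a, Word16 b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) p = INT32_MAX;
    int64_t s = (int64_t)acc + p;
    if (s > INT32_MAX) s = INT32_MAX;
    if (s < INT32_MIN) s = INT32_MIN;
    return (Word32)s;
}

/* One big loop: the scaled value lives only in a local variable. */
Word32 process_single_loop(const Word16 *x, Word16 *out,
                           Word16 offset, int n)
{
    Word32 e = 0;
    int i;
    for (i = 0; i < n; i++) {
        Word16 t = shr_sketch(x[i], 2);
        out[i] = add_sketch(t, offset);
        e = L_mac_sketch(e, t, t);
    }
    return e;
}

/* Split into two loops: tmp[] is the auxiliary storage that
 * carries the scaled values from the first loop to the second. */
Word32 process_split_loops(const Word16 *x, Word16 *out, Word16 *tmp,
                           Word16 offset, int n)
{
    Word32 e = 0;
    int i;
    for (i = 0; i < n; i++) {
        tmp[i] = shr_sketch(x[i], 2);
        out[i] = add_sketch(tmp[i], offset);
    }
    for (i = 0; i < n; i++) {
        e = L_mac_sketch(e, tmp[i], tmp[i]);
    }
    return e;
}
```

Each loop in the split version needs fewer live values per iteration, at the cost of the extra tmp buffer and a second pass over the data.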

Page 39: C Optimization Techniques for StarCore SC140

Multisample

• complex transformation for nested loops
  – loop unrolling on the outer loop
  – loop merging of the inner loops
  – loop unrolling of the merged inner loop
• keeps bit-exactness
• does not impose alignment restrictions
• reduces the data move operations

Page 40: C Optimization Techniques for StarCore SC140

Multisample Example - Step 1

(1)

    Word32 acc;
    Word16 x[], c[];
    int i, j, N, T;
    assert((N % 4) == 0);
    assert((T % 4) == 0);

    for (j = 0; j < N; j++) {
        acc = 0;
        for (i = 0; i < T; i++) {
            acc = L_mac(acc, x[i], c[j+i]);
        }
        res[j] = acc;
    }

Outer loop unrolling

(2)

    for (j = 0; j < N; j += 4) {
        acc0 = 0;
        for (i = 0; i < T; i++)
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
        res[j+0] = acc0;
        acc1 = 0;
        for (i = 0; i < T; i++)
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
        res[j+1] = acc1;
        acc2 = 0;
        for (i = 0; i < T; i++)
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
        res[j+2] = acc2;
        acc3 = 0;
        for (i = 0; i < T; i++)
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        res[j+3] = acc3;
    }

Page 41: C Optimization Techniques for StarCore SC140

Multisample Example - Step 2

(2)

    for (j = 0; j < N; j += 4) {
        acc0 = 0;
        for (i = 0; i < T; i++)
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
        res[j+0] = acc0;
        acc1 = 0;
        for (i = 0; i < T; i++)
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
        res[j+1] = acc1;
        acc2 = 0;
        for (i = 0; i < T; i++)
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
        res[j+2] = acc2;
        acc3 = 0;
        for (i = 0; i < T; i++)
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        res[j+3] = acc3;
    }

Rearrangement

(3)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i++)
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
        for (i = 0; i < T; i++)
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
        for (i = 0; i < T; i++)
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
        for (i = 0; i < T; i++)
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Page 42: C Optimization Techniques for StarCore SC140

Multisample Example - Step 3

(3)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i++)
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
        for (i = 0; i < T; i++)
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
        for (i = 0; i < T; i++)
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
        for (i = 0; i < T; i++)
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Inner loop merging

(4)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i++) {
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Page 43: C Optimization Techniques for StarCore SC140

Multisample Example - Step 4

(4)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i++) {
            acc0 = L_mac(acc0, x[i], c[j+0+i]);
            acc1 = L_mac(acc1, x[i], c[j+1+i]);
            acc2 = L_mac(acc2, x[i], c[j+2+i]);
            acc3 = L_mac(acc3, x[i], c[j+3+i]);
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Inner loop unrolling

(5)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac(acc0, x[i+0], c[j+0+i]);
            acc1 = L_mac(acc1, x[i+0], c[j+1+i]);
            acc2 = L_mac(acc2, x[i+0], c[j+2+i]);
            acc3 = L_mac(acc3, x[i+0], c[j+3+i]);

            acc0 = L_mac(acc0, x[i+1], c[j+1+i]);
            acc1 = L_mac(acc1, x[i+1], c[j+2+i]);
            acc2 = L_mac(acc2, x[i+1], c[j+3+i]);
            acc3 = L_mac(acc3, x[i+1], c[j+4+i]);
            /* third loop body for x[i+2] */
            /* fourth loop body for x[i+3] */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Page 44: C Optimization Techniques for StarCore SC140

Multisample Example - Step 5

(5)

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac(acc0, x[i+0], c[j+0+i]);
            acc1 = L_mac(acc1, x[i+0], c[j+1+i]);
            acc2 = L_mac(acc2, x[i+0], c[j+2+i]);
            acc3 = L_mac(acc3, x[i+0], c[j+3+i]);

            acc0 = L_mac(acc0, x[i+1], c[j+1+i]);
            acc1 = L_mac(acc1, x[i+1], c[j+2+i]);
            acc2 = L_mac(acc2, x[i+1], c[j+3+i]);
            acc3 = L_mac(acc3, x[i+1], c[j+4+i]);
            /* third loop body for x[i+2] */
            /* fourth loop body for x[i+3] */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Explicit scalarization

(6)

    for (j = 0; j < N; j += 4) {
        acc0 = acc1 = acc2 = acc3 = 0;
        xx = x[0];   /* first sample; i is not yet set at this point */
        c0 = c[j+0]; c1 = c[j+1];
        c2 = c[j+2]; c3 = c[j+3];
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac(acc0, xx, c0);
            acc1 = L_mac(acc1, xx, c1);
            acc2 = L_mac(acc2, xx, c2);
            acc3 = L_mac(acc3, xx, c3);
            xx = x[i+1]; c0 = c[j+4+i];
            acc0 = L_mac(acc0, xx, c1);
            acc1 = L_mac(acc1, xx, c2);
            acc2 = L_mac(acc2, xx, c3);
            acc3 = L_mac(acc3, xx, c0);
            xx = x[i+2]; c1 = c[j+5+i];   /* c1 is free here; c0 is still needed */
            /* similar third and fourth loop */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Page 45: C Optimization Techniques for StarCore SC140

Multisample Example - Final Result

    for (j = 0; j < N; j += 4) {
        acc0 = acc1 = acc2 = acc3 = 0;
        xx = x[0];   /* first sample; i is not yet set at this point */
        c0 = c[j+0]; c1 = c[j+1];
        c2 = c[j+2]; c3 = c[j+3];
        for (i = 0; i < T; i += 4) {
            /* one cycle (4 macs + 2 moves) */
            acc0 = L_mac(acc0, xx, c0);
            acc1 = L_mac(acc1, xx, c1);
            acc2 = L_mac(acc2, xx, c2);
            acc3 = L_mac(acc3, xx, c3);
            xx = x[i+1]; c0 = c[j+4+i];
            /* one cycle (4 macs + 2 moves) */
            acc0 = L_mac(acc0, xx, c1);
            acc1 = L_mac(acc1, xx, c2);
            acc2 = L_mac(acc2, xx, c3);
            acc3 = L_mac(acc3, xx, c0);
            xx = x[i+2]; c1 = c[j+5+i];   /* c1 is free here; c0 is still needed */
            /* similar third and fourth loop */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }
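
A complete host-side version of the multisample transformation, with all four unrolled bodies written out and checked against the straightforward nested loop, might look like the sketch below. L_mac_sketch is a hypothetical stand-in for the L_mac intrinsic, and the function names corr_naive and corr_multisample are illustrative. Note the assumption stated in the comments: the look-ahead loads read one element past the data the naive loop needs, so the buffers are padded accordingly.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;   /* assumption: 16-bit container */
typedef int32_t Word32;   /* assumption: 32-bit container */

/* Stand-in for L_mac(): acc + 2*a*b with saturation. */
static Word32 L_mac_sketch(Word32 acc, Word16 a, Word16 b)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    if (p > INT32_MAX) p = INT32_MAX;
    int64_t s = (int64_t)acc + p;
    if (s > INT32_MAX) s = INT32_MAX;
    if (s < INT32_MIN) s = INT32_MIN;
    return (Word32)s;
}

/* Reference nested loop. Reads x[0..T-1] and c[0..N+T-2]. */
void corr_naive(const Word16 *x, const Word16 *c, Word32 *res,
                int N, int T)
{
    int i, j;
    for (j = 0; j < N; j++) {
        Word32 acc = 0;
        for (i = 0; i < T; i++) {
            acc = L_mac_sketch(acc, x[i], c[j+i]);
        }
        res[j] = acc;
    }
}

/* Multisample form. Because of the look-ahead loads it reads
 * x[0..T] and c[0..N+T-1], so both buffers need one extra
 * (padding) element compared with the reference. */
void corr_multisample(const Word16 *x, const Word16 *c, Word32 *res,
                      int N, int T)
{
    int i, j;
    assert((N % 4) == 0);
    assert((T % 4) == 0);
    for (j = 0; j < N; j += 4) {
        Word32 acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        Word16 xx = x[0];
        Word16 c0 = c[j+0], c1 = c[j+1], c2 = c[j+2], c3 = c[j+3];
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac_sketch(acc0, xx, c0);
            acc1 = L_mac_sketch(acc1, xx, c1);
            acc2 = L_mac_sketch(acc2, xx, c2);
            acc3 = L_mac_sketch(acc3, xx, c3);
            xx = x[i+1]; c0 = c[j+4+i];
            acc0 = L_mac_sketch(acc0, xx, c1);
            acc1 = L_mac_sketch(acc1, xx, c2);
            acc2 = L_mac_sketch(acc2, xx, c3);
            acc3 = L_mac_sketch(acc3, xx, c0);
            xx = x[i+2]; c1 = c[j+5+i];
            acc0 = L_mac_sketch(acc0, xx, c2);
            acc1 = L_mac_sketch(acc1, xx, c3);
            acc2 = L_mac_sketch(acc2, xx, c0);
            acc3 = L_mac_sketch(acc3, xx, c1);
            xx = x[i+3]; c2 = c[j+6+i];
            acc0 = L_mac_sketch(acc0, xx, c3);
            acc1 = L_mac_sketch(acc1, xx, c0);
            acc2 = L_mac_sketch(acc2, xx, c1);
            acc3 = L_mac_sketch(acc3, xx, c2);
            xx = x[i+4]; c3 = c[j+7+i];   /* look-ahead for next pass */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }
}
```

Each accumulator receives exactly the same sequence of multiply-accumulates as in the reference loop, which is why the transformation keeps bit-exactness; only the number of loads per MAC is reduced.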

Page 46: C Optimization Techniques for StarCore SC140

Search Techniques Adapted for StarCore SC140

• finding the maximum
• finding the maximum and its position
• finding the maximum ratio
• finding the position of the maximum ratio

Page 47: C Optimization Techniques for StarCore SC140

Maximum search

• reduction operation => split maximum

    #include <prototype.h>

    Word16 max_search(Word16 vector[], unsigned short int length)
    {
    #pragma align *vector 8
        signed short int i = 0;
        Word16 max0, max1, max2, max3;

        max0 = vector[i+0];
        max1 = vector[i+1];
        max2 = vector[i+2];
        max3 = vector[i+3];
        for (i = 4; i < length; i += 4)
        {
            max0 = max(max0, vector[i+0]);
            max1 = max(max1, vector[i+1]);
            max2 = max(max2, vector[i+2]);
            max3 = max(max3, vector[i+3]);
        }
        max0 = max(max0, max1);
        max1 = max(max2, max3);
        return max(max0, max1);
    }

Page 48: C Optimization Techniques for StarCore SC140

Maximum search - Comments

• presented solution works for both 16-bit and 32-bit values

• better solution for 16-bit (asm only)
    – based on max2 instruction (SIMD style)
    – fetches two 16-bit values as a 32-bit one
    – eight maxima per cycle
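
The SIMD idea behind max2 can be sketched in portable C. This is only an emulation for illustration, not the StarCore instruction: the names pack2, max2_emulated and max_search_packed are hypothetical, and the sketch assumes an even number of elements and a two's-complement target.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (high lane, low lane). */
uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Lane-wise signed maximum of two packed words: a portable stand-in for
   what a SIMD-style instruction does in one operation. */
uint32_t max2_emulated(uint32_t a, uint32_t b)
{
    int16_t a_hi = (int16_t)(a >> 16), a_lo = (int16_t)(a & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16), b_lo = (int16_t)(b & 0xFFFFu);
    return pack2(a_hi > b_hi ? a_hi : b_hi, a_lo > b_lo ? a_lo : b_lo);
}

/* Maximum of n 16-bit values, processed two lanes at a time. */
int16_t max_search_packed(const int16_t *v, unsigned n)
{
    assert(n >= 2 && n % 2 == 0);   /* assumption of this sketch */
    uint32_t best = pack2(v[0], v[1]);
    for (unsigned i = 2; i < n; i += 2)
        best = max2_emulated(best, pack2(v[i], v[i + 1]));
    /* Final reduction: combine the two lane maxima. */
    int16_t hi = (int16_t)(best >> 16), lo = (int16_t)(best & 0xFFFFu);
    return hi > lo ? hi : lo;
}
```

Each packed operation yields two partial maxima, so four of them in one execution set would account for the eight maxima per cycle quoted above.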

Page 49: C Optimization Techniques for StarCore SC140

Maximum Search – ASM with max2

    move.2l (r0)+n0,d4:d5    move.2l (r1)+n0,d6:d7
    move.2l (r0)+n0,d0:d1    move.2l (r1)+n0,d2:d3
    FALIGN
    LOOPSTART3
[
    max2 d0,d4    max2 d1,d5
    max2 d2,d6    max2 d3,d7
    move.2l (r0)+n0,d0:d1    move.2l (r1)+n0,d2:d3
]
    LOOPEND3
[
    max2 d0,d4    max2 d1,d5
    max2 d2,d6    max2 d3,d7
]

Page 50: C Optimization Techniques for StarCore SC140

Maximum position

• based on comparisons
    – around N cycles
    – in C, two maxima required
• based on maximum search
    – around N/2 cycles
• based on max2-based maximum search
    – around N/4 cycles
• care must be taken to preserve the original semantics
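
One way the semantics can drift is tie handling. The sketch below assumes the original loop returns the index of the first occurrence of the maximum, and shows a two-way split search that keeps that behaviour. Function names are hypothetical and plain C comparisons stand in for any target intrinsics.

```c
#include <assert.h>

/* Reference semantics: index of the FIRST occurrence of the maximum. */
unsigned max_pos_ref(const short *v, unsigned n)
{
    unsigned pos = 0;
    for (unsigned i = 1; i < n; i++)
        if (v[i] > v[pos]) pos = i;
    return pos;
}

/* Two independent partial maxima (even and odd indices), merged at the end. */
unsigned max_pos_split(const short *v, unsigned n)
{
    assert(n >= 2 && n % 2 == 0);   /* assumption of this sketch */
    unsigned p0 = 0, p1 = 1;
    for (unsigned i = 2; i < n; i += 2) {
        if (v[i]     > v[p0]) p0 = i;       /* strict '>' keeps the earliest index per lane */
        if (v[i + 1] > v[p1]) p1 = i + 1;
    }
    /* Merge: the larger value wins; on a tie the smaller index wins. */
    if (v[p1] > v[p0] || (v[p1] == v[p0] && p1 < p0))
        return p1;
    return p0;
}
```

Without the tie rule in the merge step, the input {1, 9, 9, 3} would report position 2 from the even lane instead of position 1.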

Page 51: C Optimization Techniques for StarCore SC140

Maximum ratio

• given Word16 a[ ] and Word16 b[ ] vectors of positive values, compute max{a[i]/b[i]}
• division is very expensive on StarCore
• ideas:
    – a/b < c/d <=> a*d < b*c
    – keep a[ ] and b[ ] intermixed
        • two 16-bit values form a 32-bit one
    – use cross multiplication (mpyus, mpysu)
• final solution in N cycles
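
The cross-multiplication idea can be sketched in portable C as follows. The function name is hypothetical, ordinary 32-bit products stand in for the mpyus/mpysu instructions, and the sketch assumes a[i] >= 0 and b[i] > 0 so that multiplying both sides keeps the direction of the inequality.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the largest ratio a[i]/b[i], computed without any division. */
unsigned max_ratio_index(const int16_t *a, const int16_t *b, unsigned n)
{
    assert(n >= 1);
    unsigned best = 0;
    for (unsigned i = 1; i < n; i++) {
        /* a[best]/b[best] < a[i]/b[i]  <=>  a[best]*b[i] < a[i]*b[best] */
        int32_t lhs = (int32_t)a[best] * b[i];
        int32_t rhs = (int32_t)a[i] * b[best];
        if (lhs < rhs)
            best = i;
    }
    return best;
}
```

The two 16x16 products fit in 32 bits, so no precision is lost compared with computing the quotients.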

Page 52: C Optimization Techniques for StarCore SC140

Max ratio position

• based on max ratio search, plus
    – movet, tfrt instructions
    – position is kept as pointer, not index

• software pipelining plus loop unrolling three times
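
Keeping the position as a pointer can be illustrated with a small portable sketch. The ratio_pair layout and the function name are assumptions made for this example. The conditional pointer update in the loop is the part the slide associates with the movet and tfrt instructions; software pipelining and unrolling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved layout: one (a, b) pair fits in a 32-bit word. */
typedef struct { int16_t a; int16_t b; } ratio_pair;

/* Pointer to the pair with the largest a/b (a >= 0 and b > 0 assumed). */
const ratio_pair *max_ratio_ptr(const ratio_pair *p, size_t n)
{
    assert(n >= 1);
    const ratio_pair *best = p;
    for (const ratio_pair *q = p + 1; q < p + n; q++) {
        /* Cross multiplication replaces the division. */
        if ((int32_t)best->a * q->b < (int32_t)q->a * best->b)
            best = q;   /* conditional pointer update, no index bookkeeping */
    }
    return best;
}
```

If an index is needed, it can be recovered once after the loop as `max_ratio_ptr(v, n) - v`.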

Page 53: C Optimization Techniques for StarCore SC140

DSP Coding Recommendations

• use loops with a fixed number of iterations
    – enables the detection of hardware loops
• provide extensive static information
    – pragmas, constants
• use only supported data types
• do not mix integer and fractional operations on the same value

Page 54: C Optimization Techniques for StarCore SC140

DSP Coding Recommendations (2)

• replace tests with computations
    – small tests are ok due to predicated execution
• use synonym operations if more flexible in the target architecture
• use modulo addressing
• alignment in data structures should be provided using field arrangement
• use custom calling conventions
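
As a small illustration of replacing a test with a computation, the sketch below wraps a circular-buffer index in two ways. The buffer length and function names are hypothetical. On the target, modulo addressing would let the address generation hardware do the wrap, so the mask here is only a portable stand-in.

```c
#include <assert.h>

#define BUF_LEN 8u   /* power of two, so the wrap-around reduces to a mask */

/* Test-based wrap: a compare and a conditional reset on every step. */
unsigned next_index_test(unsigned i)
{
    i++;
    if (i == BUF_LEN) i = 0;
    return i;
}

/* Computation-based wrap: same result, no test. */
unsigned next_index_mask(unsigned i)
{
    assert(i < BUF_LEN);
    return (i + 1u) & (BUF_LEN - 1u);
}
```

The mask form only works when the length is a power of two, which is an example of static information worth fixing at design time.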

Page 55: C Optimization Techniques for StarCore SC140

Software Development Notes

• SDN database contains coding guidelines
    – try different solutions for the same problem
    – analyze the generated assembly file
    – save the best solution in the SDN database
• SDN DB captures team experience
• leverage organizational development skills

Page 56: C Optimization Techniques for StarCore SC140

Speed versus Size

• typically cannot both be optimized
• follow the 80%-20% rule
    – speed optimize 20% of the code
    – size optimize 80% of the code
• loop merging may optimize both
• high register pressure may kill both
• multisample provides best speed
    – but also significant kernel size increase
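
Loop merging can be sketched with a generic example, not taken from the slides: two passes over the same vector are fused into one, which removes one loop overhead and one set of loads. Function names are hypothetical.

```c
#include <assert.h>

/* Two separate passes: two loop overheads, and v[] is read twice. */
void stats_separate(const short *v, unsigned n, long *sum, long *energy)
{
    assert(n >= 1);
    *sum = 0;
    *energy = 0;
    for (unsigned i = 0; i < n; i++) *sum += v[i];
    for (unsigned i = 0; i < n; i++) *energy += (long)v[i] * v[i];
}

/* Merged: one loop, each element loaded once and used twice. */
void stats_merged(const short *v, unsigned n, long *sum, long *energy)
{
    assert(n >= 1);
    long s = 0, e = 0;
    for (unsigned i = 0; i < n; i++) {
        short x = v[i];
        s += x;
        e += (long)x * x;
    }
    *sum = s;
    *energy = e;
}
```

The merged body needs more live values at once, which is where the register-pressure caveat above comes in.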

Page 57: C Optimization Techniques for StarCore SC140

Diet Speed Optimization

• compile only the time-consuming loops for speed
    – compile the rest for size

• try speed and size optimizations combined

• reduce the unroll factors in the multisample transformation

• avoid register pressure that creates spill code
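
A reduced unroll factor can be sketched as a two-sample version of the multisample kernel. This is an illustration under stated assumptions, not the presentation's code: plain integer multiply-accumulate stands in for L_mac, N and T are assumed even, c holds N+T-1 values, and the function names are hypothetical.

```c
#include <assert.h>

/* Reference: res[j] = sum over i of x[i] * c[j+i]. */
void corr_ref(const short *x, const short *c, long *res, unsigned N, unsigned T)
{
    for (unsigned j = 0; j < N; j++) {
        long acc = 0;
        for (unsigned i = 0; i < T; i++)
            acc += (long)x[i] * c[j + i];
        res[j] = acc;
    }
}

/* Two outputs per outer iteration: each loaded value feeds two MACs. */
void corr_2sample(const short *x, const short *c, long *res, unsigned N, unsigned T)
{
    assert(N % 2 == 0 && T % 2 == 0 && T >= 2);
    for (unsigned j = 0; j < N; j += 2) {
        long acc0 = 0, acc1 = 0;
        short c0 = c[j];                 /* sliding coefficient window */
        short c1;
        for (unsigned i = 0; i < T; i += 2) {
            short xx = x[i];
            c1 = c[j + i + 1];
            acc0 += (long)xx * c0;       /* x[i]   * c[j+i]   */
            acc1 += (long)xx * c1;       /* x[i]   * c[j+i+1] */
            xx = x[i + 1];
            c0 = c[j + i + 2];
            acc0 += (long)xx * c1;       /* x[i+1] * c[j+i+1] */
            acc1 += (long)xx * c0;       /* x[i+1] * c[j+i+2] */
        }
        res[j] = acc0;
        res[j + 1] = acc1;
    }
}
```

Compared with a factor of four, this version keeps only two accumulators and two coefficients live, trading some speed for a smaller kernel and lower register pressure.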

Page 58: C Optimization Techniques for StarCore SC140

C Optimization Limits

• effort concentrated on loops
    – no packed moves outside loops
    – loops and ifs delimit optimization blocks

Page 59: C Optimization Techniques for StarCore SC140

Conclusions

• complexity of future DSP applications requires development using C
• C can be transformed into efficient DSP C
    – compiler extensions are a must
• the coding style is the programmer’s key
• optimized C code is suitable for high performance applications
• assembly remains for critical sections