C Optimization Techniques for StarCore SC140
Motorola General Business Information
Version #: 1.1 Date: 11-12-2003
MOTOROLA and the Stylized M Logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners. © Motorola, Inc. 2001.
C Optimization Techniques for StarCore SC140
Created by Bogdan Costinescu
Agenda
• C and DSP Software Development
• Compiler Extensions for DSP Programming
• Coding Style and DSP Programming Techniques
• DSP Coding Recommendations
• Conclusions
What do optimizations mean?
• reduction in cycle count
• reduction in code size
• reduction in data structures
• reduction in stack memory size
• reduction of power consumption
• how do we measure if we did it right?
Why C in DSP Programming?
• C is a "structured programming language designed to produce a compact and efficient translation of a program into machine language"
• high-level, but close to assembly
• large set of tools available
• flexible, maintainable, portable
=> reusability
Compiler-friendly Architectures
• large set of registers
• orthogonal register file
• flexible addressing modes
• few restrictions
• native support for several data types
Example - StarCore SC140
• Variable Length Execution Set architecture
  – one execution set can hold up to:
    • four DALU instructions
    • two AGU instructions
• scalable architecture with rich instruction set
• large orthogonal register file
  – 16 DALU
  – 16 + 4 + 4 AGU
• short pipeline => few restrictions
• direct support for integers and fractionals
Is C a Perfect Match for DSPs?
• fixed-point not supported
• no support for dual-memory architectures
• no port I/O or special instructions
• no support for SIMD instructions
• vectors represented by pointers
C Support for Parallelism
• basically, NONE
• C = a sequential programming language
  – cannot express actions in parallel
  – no equivalent of SIMD instructions
• the compiler has to analyze the scheduling of independent operations
  – the analysis depends on the coding style
Can C be made efficient for DSPs?
YES, IF the programmer understands
• compiler extensions
• programming techniques
• compiler behavior
Compiler Extensions
• intrinsics
  – efficient access to the target instruction set
• new data types
  – use processor-specific data formats in C
• pragmas
  – provide extra information to the compiler
• inline assembly
  – the solution to the "impossible in C" problem
Intrinsic functions (Intrinsics)
• special compiler-translated function calls
• map to one or more target instructions
• extend the set of operators in C
• define new types
  – single and double precision fractionals
  – extended precision fractionals
• emulation possible on other platforms
Data Types
• DSPs may handle several data formats
  – identical with C basic types
    • integers
  – common length, but different semantics
    • fractionals on 16 and 32 bits
  – specific to the DSP (i.e. including the extension)
    • fractionals on 40 bits
• all DSP-specific data types come with a set of operators
C Types for Fractionals
• conventions:
  – Word16 for 16-bit fractionals
  – Word32 for 32-bit fractionals
  – Word40 for 40-bit fractionals (32 bits plus 8 guard bits)
  – Word64 for 64-bit fractionals (no direct representation on SC140)
• intrinsics, not conventions, impose the fractional property on values
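To make the "fractional" reading of these types concrete, the sketch below treats a Word16 as a Q15 value, i.e. the stored integer divided by 2^15. The typedefs are an assumed host-side mapping onto <stdint.h> types, and the helper names q15_to_double and double_to_q15 are hypothetical; they are not part of the StarCore toolchain.

```c
#include <stdint.h>

/* Assumed host-side stand-ins for the type conventions on the slide. */
typedef int16_t Word16;   /* 16-bit fractional (Q15) */
typedef int32_t Word32;   /* 32-bit fractional (Q31) */

/* Hypothetical helper: read a Q15 bit pattern as a real number in [-1, 1). */
static double q15_to_double(Word16 v)
{
    return (double)v / 32768.0;
}

/* Hypothetical helper: convert a real number to Q15, saturating at the ends. */
static Word16 double_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return (Word16)32767;
    if (scaled <= -32768.0) return (Word16)(-32768);
    return (Word16)scaled;   /* truncates toward zero */
}
```

The same 16 bits can be read as the integer 16384 or as the fraction 0.5; only the operations applied to the value decide which reading is meant.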
Fractional Emulation
• needed on non-StarCore platforms
• maintain the meaning of fractional operations
• in some limit cases, the emulated results are not correct
  – e.g. mpysu uses 32 bits for results plus sign
• special Metrowerks intrinsics to solve possible inconsistencies
  – e.g. mpysu_shr16
Fractional Values and Operators
• introduced by fractional intrinsics
• operators standardized (ITU-T, ETSI)
  – addition, multiplication
  – saturation, rounding
  – scaling, normalization
• operate on specific data types
    Word16 add(Word16, Word16);
    Word32 L_add(Word32, Word32);
    Word40 X_add(Word40, Word40);
    Word64 D_add(Word64, Word64);
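On a host without the intrinsics, the saturating behaviour of add and L_add can be approximated in portable C. The following is a minimal emulation sketch, assuming Word16 and Word32 map to int16_t and int32_t; it is not the toolchain's own emulation library, and it ignores side effects such as overflow flags.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */
typedef int32_t Word32;   /* assumed host mapping */

/* Emulated 16-bit addition: clamps to [-32768, 32767] instead of wrapping. */
static Word16 add(Word16 a, Word16 b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (Word16)sum;
}

/* Emulated 32-bit addition: same idea, computed in 64 bits. */
static Word32 L_add(Word32 a, Word32 b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (Word32)sum;
}
```

Saturation is what separates these operators from the plain C "+", which wraps on unsigned types and is undefined on signed overflow.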
Pragmas
• standard way to communicate with compilers
• describe code or data characteristics
• pragmas do not influence the correctness
• but significantly influence the performance
    #pragma align signal 8
    #pragma loop_unroll 4
    #pragma inline
    #pragma loop_count (20, 40, 4)
Alignment pragma
• #pragma align *ptr 8
  – for parameters
• #pragma align ptr 8
  – for vector declaration and definition
• derived pointers may not inherit the alignment
    Word16 *p = aligned_ptr + 24;   /* not recommended */
    #define p (aligned_ptr + 24)    /* recommended */
• create a new function if you want to specify alignment for internal pointers
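When prototyping on a host compiler that does not know #pragma align, C11 _Alignas can serve as a rough stand-in for the declaration form, and a runtime check can confirm what the pragma would promise. This is only a sketch under that assumption; signal_buf and is_aligned8 are hypothetical names.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */

#define SIGNAL_SIZE 40

/* C11 stand-in for "#pragma align signal 8" on a host compiler. */
static _Alignas(8) Word16 signal_buf[SIGNAL_SIZE];

/* Hypothetical helper: returns 1 if p lies on an 8-byte boundary, else 0. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}
```

Note that signal_buf + 24 is 48 bytes past an aligned base, so it is physically aligned; the point of the slide is that the compiler may not be able to prove this for a derived pointer variable.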
Inline Assembly
• whole functions
• only one instruction
  – asm(" di");
• difficult access to local function symbols
  – globals can be accessed by their name
• the asm statements block the C optimization process
Inline Assembly
• access to special target features not available through intrinsics
• special calling convention
  – the programmer has more freedom
• fast assembly integration with C
  – easy integration of legacy code
• two types
  – one instruction, e.g. asm(" di");
  – a whole function
Inline Assembly - Example

asm Flag GetOverflow(void)
{
asm_header
    return in $d0;
    .reg $d0,$d1;
asm_body
    clr d0                  ; extra cycle needed to allow DOVF to
                            ; be written even by the instructions
                            ; in the delay slot of JSRD
    bmtsts #$0004,EMR.L     ; test the Overflow bit from EMR
    move.w #1,d1
    tfrt d1,d0              ; if Overflow return 1, else return 0
asm_end
}
Compiler Optimizations
• standard C optimizations
• low level (target) optimizations
• several optimization levels available
  – fastest code
  – smallest code
• global optimizations
  – all files are analyzed as a whole
  – the best performance
DSP Application Success Factors
• high-performance, compiler-friendly target
• good optimizing compiler
• good C coding style in source code
=> the only factor tunable by the programmer is the C coding style
  – coding style includes programming techniques
  – no compiler can overcome GIGO (garbage in, garbage out)
Programming Techniques
• described in the compiler's manual
  – styles to be used
  – pitfalls to be avoided
• explicitly eliminate data dependencies
• complement the compiler's transformations that restructure the code
SC140 Programming Techniques
• general DSP programming techniques
  – loop merging
  – loop unrolling
  – loop splitting
• specific to StarCore and its generation
  – split computation
  – multisample
Loop Merging
• combines two loops into a single one
  – precondition: the same number of iterations
  – may eliminate inter-loop storage space
• increases cache locality
• decreases loop overhead
• increases DALU usage per loop iteration
• typically, reduces code size
Loop Merging Example

/* scaling loop */
for (i = 0; i < SIG_LEN; i++)
{
    y[i] = shr(y[i], 2);
}
/* energy computation */
e = 0;
for (i = 0; i < SIG_LEN; i++)
{
    e = L_mac(e, y[i], y[i]);
}

/* Compute at the same time the */
/* energy of the scaled signal  */
e = 0;
for (i = 0; i < SIG_LEN; i++)
{
    Word16 temp;
    temp = shr(y[i], 2);
    e = L_mac(e, temp, temp);
    y[i] = temp;
}
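The two forms of the example can be checked against each other on a host. The sketch below wraps them in functions and uses simplified stand-ins for shr and L_mac; these are assumptions for illustration (shr here only supports shift counts from 0 to 15), and the function names energy_two_loops and energy_merged are hypothetical.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */
typedef int32_t Word32;   /* assumed host mapping */

#define SIG_LEN 8

/* Simplified stand-in for shr: arithmetic right shift, valid for 0 <= n <= 15. */
static Word16 shr(Word16 v, Word16 n)
{
    /* Negative inputs are shifted via their complement so the result rounds
       toward minus infinity without implementation-defined behaviour. */
    return (Word16)(v >= 0 ? (v >> n) : ~((~v) >> n));
}

/* Stand-in for L_mac: acc + 2*a*b with 32-bit saturation. */
static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    if (prod > INT32_MAX) prod = INT32_MAX;   /* only for -32768 * -32768 */
    int64_t sum = (int64_t)acc + prod;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (Word32)sum;
}

/* Original form: one loop scales, a second loop accumulates the energy. */
static Word32 energy_two_loops(Word16 y[SIG_LEN])
{
    Word32 e = 0;
    int i;
    for (i = 0; i < SIG_LEN; i++)
        y[i] = shr(y[i], 2);
    for (i = 0; i < SIG_LEN; i++)
        e = L_mac(e, y[i], y[i]);
    return e;
}

/* Merged form: scale and accumulate in a single pass. */
static Word32 energy_merged(Word16 y[SIG_LEN])
{
    Word32 e = 0;
    int i;
    for (i = 0; i < SIG_LEN; i++) {
        Word16 temp = shr(y[i], 2);
        e = L_mac(e, temp, temp);
        y[i] = temp;
    }
    return e;
}
```

Because each element is scaled before it is accumulated in both versions, and the accumulation order is unchanged, the merged loop produces the same energy and the same scaled vector.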
Loop Unrolling
• repeats the body of a loop
• increases the DALU usage per loop step
• increases the code size
• depends on
  – data dependencies between iterations
  – data alignment of the vectors involved
  – register pressure per iteration
  – number of iterations to be performed
• keeps the bit-exactness
Loop Unrolling Example

int i;
Word16 signal[SIGNAL_SIZE];
Word16 scaled_signal[SIGNAL_SIZE];

/* ... */
for(i=0; i<SIGNAL_SIZE; i++)
{
    scaled_signal[i] = shr(signal[i], 2);
}
/* ... */

int i;
Word16 signal[SIGNAL_SIZE];
#pragma align signal 8
Word16 scaled_signal[SIGNAL_SIZE];
#pragma align scaled_signal 8

/* ... */
for(i=0; i<SIGNAL_SIZE; i+=4)
{
    scaled_signal[i+0] = shr(signal[i+0], 2);
    scaled_signal[i+1] = shr(signal[i+1], 2);
    scaled_signal[i+2] = shr(signal[i+2], 2);
    scaled_signal[i+3] = shr(signal[i+3], 2);
}
/* ... */
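The unrolled loop above is only correct when the iteration count is a multiple of four. When that cannot be guaranteed, one common approach is a short cleanup loop for the leftover elements. The sketch below illustrates this on a host; the function name scale_unrolled and the simplified shr stand-in (shift counts 0 to 15 only) are assumptions, not toolchain code.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */

/* Simplified stand-in for shr: arithmetic right shift, valid for 0 <= n <= 15. */
static Word16 shr(Word16 v, Word16 n)
{
    return (Word16)(v >= 0 ? (v >> n) : ~((~v) >> n));
}

/* Scale n samples by 1/4, unrolled by four, with a cleanup loop (n >= 0). */
static void scale_unrolled(const Word16 *in, Word16 *out, int n)
{
    int i;
    int main_len = n - (n % 4);   /* largest multiple of 4 not above n */

    for (i = 0; i < main_len; i += 4) {
        out[i + 0] = shr(in[i + 0], 2);
        out[i + 1] = shr(in[i + 1], 2);
        out[i + 2] = shr(in[i + 2], 2);
        out[i + 3] = shr(in[i + 3], 2);
    }
    for (; i < n; i++)            /* at most three leftover samples */
        out[i] = shr(in[i], 2);
}
```

The cleanup loop costs extra code size, which is one reason the number of iterations appears among the factors that decide whether unrolling pays off.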
Compiler generated code

; Without loop unrolling: 6 cycles
; inside the loop
loopstart3
L5
[
    add #<2,d3                  ;[20,1]
    move.l d3,r2                ;[20,1]
    move.l <_signal,r1          ;[20,1]
]
    move.l <_scaled_signal,r4
    adda r2,r1
[
    move.w (r1),d4
    adda r2,r4
]
    asrr #<2,d4
    move.w d4,(r4)
loopend3

; With loop unrolling: 2 cycles inside
; the loop.
; The compiler uses software pipelining
[
    move.4f (r0)+,d0:d1:d2:d3
    move.l #_scaled_signal,r1
]
[
    asrr #<2,d0
    asrr #<2,d3
    asrr #<2,d1
    asrr #<2,d2
]
loopstart3
[
    moves.4f d0:d1:d2:d3,(r1)+
    move.4f (r0)+,d0:d1:d2:d3
]
[
    asrr #<2,d0
    asrr #<2,d1
    asrr #<2,d2
    asrr #<2,d3
]
loopend3
    moves.4f d0:d1:d2:d3,(r1)+
Why to use loop unrolling?
• increases DALU usage per loop step
• explicit specification of value reuse
• elimination of "uncompressible" cycles
  – destination the same as the source
Loop unroll factor
• difficult to provide a formula
• depends on various parameters
  – data alignment
  – number of memory operations
  – number of computations
  – possible value reuse
  – number of iterations in the loop
• unroll the loop until there is no further gain
• two and four are typical unroll factors
  – a factor of three was also found in the max_ratio benchmark
Loop Unroll Factor - example 1

#include <prototype.h>

#define VECTOR_SIZE 40

Word16 example1(Word16 a[], Word16 incr)
{
    short i;
    Word16 b[VECTOR_SIZE];

    for(i = 0; i < VECTOR_SIZE; i++)
    {
        b[i] = add(a[i], incr);
    }
    return b[0];
}
Loop Unroll Factor - example 2

#include <prototype.h>

#define VECTOR_SIZE 40

Word16 example2(Word16 a[], Word16 incr)
{
#pragma align *a 8
    int i;
    Word16 b[VECTOR_SIZE];
#pragma align b 8

    for(i = 0; i < VECTOR_SIZE; i++)
    {
        b[i] = add(a[i], incr);
    }
    return b[0];
}

; compiler generated code
    move.4f (r0)+,d4:d5:d6:d7
    ...
loopstart3
[
    add d4,d8,d12
    add d5,d8,d13
    add d6,d8,d14
    add d7,d8,d15
    moves.4f d12:d13:d14:d15,(r1)+
    move.4f (r0)+,d4:d5:d6:d7
]
loopend3
Loop Unroll Factor - example 3

#include <prototype.h>

#define VECTOR_SIZE 40

Word16 example3(Word16 a[], Word16 incr)
{
#pragma align *a 8
    int i;
    Word16 b[VECTOR_SIZE];
#pragma align b 8

    for(i = 0; i < VECTOR_SIZE; i++)
    {
        b[i] = a[i] >> incr;
    }
    return b[0];
}

; compiler generated code
loopstart3
[
    move.4w d4:d5:d6:d7,(r1)+
    move.4w (r0)+,d4:d5:d6:d7
]
[
    asrr d1,d4
    asrr d1,d5
    asrr d1,d6
    asrr d1,d7
]
loopend3
Split Computation
• typically used in associative reductions
  – energy
  – mean square error
  – maximum
• used to increase DALU usage
• increases the code size
• may require alignment properties
• may influence the bit-exactness
Split Computation Example

int i;
Word32 e;
Word16 signal[SIGNAL_SIZE];

/* ... */
e = 0;
for(i = 0; i < SIGNAL_SIZE; i++)
{
    e = L_mac(e, signal[i], signal[i]);
}
/* the energy is now in e */

int i;
Word32 e0, e1, e2, e3;
Word16 signal[SIGNAL_SIZE];
#pragma align signal 8

/* ... */
e0 = e1 = e2 = e3 = 0;
for(i = 0; i < SIGNAL_SIZE; i+=4)
{
    e0 = L_mac(e0, signal[i+0], signal[i+0]);
    e1 = L_mac(e1, signal[i+1], signal[i+1]);
    e2 = L_mac(e2, signal[i+2], signal[i+2]);
    e3 = L_mac(e3, signal[i+3], signal[i+3]);
}
e0 = L_add(e0, e1);
e1 = L_add(e2, e3);
e0 = L_add(e0, e1);
/* the energy is now in e0 */
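Why split computation may change the result bit-for-bit can be seen with saturating arithmetic: saturation makes addition non-associative, so regrouping the terms can give a different sum. The sketch below is a hedged illustration on a host, using a dot product with two accumulators rather than the four-way energy loop above, so that intermediate saturation is easy to trigger. L_add and L_mac are simplified stand-ins, and dot_single and dot_split2 are hypothetical names.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */
typedef int32_t Word32;   /* assumed host mapping */

/* Stand-in for L_add: 32-bit saturating addition. */
static Word32 L_add(Word32 a, Word32 b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (Word32)sum;
}

/* Stand-in for L_mac: acc + 2*a*b with 32-bit saturation. */
static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    if (prod > INT32_MAX) prod = INT32_MAX;   /* only for -32768 * -32768 */
    return L_add(acc, (Word32)prod);
}

/* Single accumulator: terms are added strictly in index order. */
static Word32 dot_single(const Word16 *x, const Word16 *y, int n)
{
    Word32 e = 0;
    for (int i = 0; i < n; i++)
        e = L_mac(e, x[i], y[i]);
    return e;
}

/* Two accumulators (even and odd indices), recombined at the end; n must be even. */
static Word32 dot_split2(const Word16 *x, const Word16 *y, int n)
{
    Word32 e0 = 0, e1 = 0;
    for (int i = 0; i < n; i += 2) {
        e0 = L_mac(e0, x[i + 0], y[i + 0]);
        e1 = L_mac(e1, x[i + 1], y[i + 1]);
    }
    return L_add(e0, e1);
}
```

With x = {32767, 32767, 32767, 32767} and y = {32767, 32767, -32767, -32767}, the single accumulator saturates after the second term and ends at a large negative value, while the split version never saturates and returns 0. When no intermediate saturation occurs, both forms agree.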
How many splits?
• depends on memory alignment
• loop count should be a multiple of the number of splits
• the splits have to be recombined to get the final result
  – decision depends on the nesting level
Loop Splitting
• splits a loop into two or more separate loops
• increases cache locality in big loops
• decreases register pressure
• minimizes dependencies
• may create auxiliary storage
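As a small illustration of loop splitting and of the auxiliary storage it may need, the sketch below splits a gain-and-offset loop into two simpler loops joined by a temporary vector. The kernel itself, the names apply_fused and apply_split, and the mult and add stand-ins are assumptions made up for this example.

```c
#include <stdint.h>

typedef int16_t Word16;   /* assumed host mapping */

#define LEN 4

/* Stand-in for add: 16-bit saturating addition. */
static Word16 add(Word16 a, Word16 b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (Word16)sum;
}

/* Stand-in for mult: Q15 fractional multiply, (a*b) >> 15, saturated. */
static Word16 mult(Word16 a, Word16 b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p >= 0) ? (p >> 15) : ~((~p) >> 15);
    if (p > INT16_MAX) p = INT16_MAX;   /* only for -32768 * -32768 */
    return (Word16)p;
}

/* One loop doing both steps per element. */
static void apply_fused(const Word16 *in, Word16 *out, Word16 gain, Word16 offset)
{
    for (int i = 0; i < LEN; i++)
        out[i] = add(mult(in[i], gain), offset);
}

/* Split form: two simple loops joined by an auxiliary vector tmp[]. */
static void apply_split(const Word16 *in, Word16 *out, Word16 gain, Word16 offset)
{
    Word16 tmp[LEN];   /* auxiliary storage created by the split */
    for (int i = 0; i < LEN; i++)
        tmp[i] = mult(in[i], gain);
    for (int i = 0; i < LEN; i++)
        out[i] = add(tmp[i], offset);
}
```

Each split loop has fewer live values per iteration, which is where the lower register pressure comes from; the price is the extra tmp vector and a second pass over the data.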
Multisample
• complex transformation for nested loops
  – loop unrolling on the outer loop
  – loop merging of the inner loops
  – loop unrolling of the merged inner loop
• keeps bit-exactness
• does not impose alignment restrictions
• reduces the data move operations
Multisample Example - Step 1
Outer loop unrolling

Word32 acc;
Word16 x[], c[];
int i, j, N, T;
assert((N % 4) == 0);
assert((T % 4) == 0);

/* (1) original nested loops */
for (j = 0; j < N; j++) {
    acc = 0;
    for (i = 0; i < T; i++) {
        acc = L_mac(acc, x[i], c[j+i]);
    }
    res[j] = acc;
}

/* (2) outer loop unrolled by four */
for (j = 0; j < N; j += 4) {
    acc0 = 0;
    for (i = 0; i < T; i++)
        acc0 = L_mac(acc0, x[i], c[j+0+i]);
    res[j+0] = acc0;
    acc1 = 0;
    for (i = 0; i < T; i++)
        acc1 = L_mac(acc1, x[i], c[j+1+i]);
    res[j+1] = acc1;
    acc2 = 0;
    for (i = 0; i < T; i++)
        acc2 = L_mac(acc2, x[i], c[j+2+i]);
    res[j+2] = acc2;
    acc3 = 0;
    for (i = 0; i < T; i++)
        acc3 = L_mac(acc3, x[i], c[j+3+i]);
    res[j+3] = acc3;
}
Multisample Example - Step 2
Rearrangement

/* (3) code (2) from Step 1, with the initializations moved */
/*     before the inner loops and the stores moved after   */
for (j = 0; j < N; j += 4) {
    acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
    for (i = 0; i < T; i++)
        acc0 = L_mac(acc0, x[i], c[j+0+i]);
    for (i = 0; i < T; i++)
        acc1 = L_mac(acc1, x[i], c[j+1+i]);
    for (i = 0; i < T; i++)
        acc2 = L_mac(acc2, x[i], c[j+2+i]);
    for (i = 0; i < T; i++)
        acc3 = L_mac(acc3, x[i], c[j+3+i]);
    res[j+0] = acc0; res[j+1] = acc1;
    res[j+2] = acc2; res[j+3] = acc3;
}
Multisample Example - Step 3
Inner loop merging

/* (4) the four inner loops of code (3) merged into one */
for (j = 0; j < N; j += 4) {
    acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
    for (i = 0; i < T; i++) {
        acc0 = L_mac(acc0, x[i], c[j+0+i]);
        acc1 = L_mac(acc1, x[i], c[j+1+i]);
        acc2 = L_mac(acc2, x[i], c[j+2+i]);
        acc3 = L_mac(acc3, x[i], c[j+3+i]);
    }
    res[j+0] = acc0; res[j+1] = acc1;
    res[j+2] = acc2; res[j+3] = acc3;
}
Multisample Example - Step 4
Inner loop unrolling

/* (5) the merged inner loop of code (4) unrolled by four */
for (j = 0; j < N; j += 4) {
    acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
    for (i = 0; i < T; i += 4) {
        acc0 = L_mac(acc0, x[i+0], c[j+0+i]);
        acc1 = L_mac(acc1, x[i+0], c[j+1+i]);
        acc2 = L_mac(acc2, x[i+0], c[j+2+i]);
        acc3 = L_mac(acc3, x[i+0], c[j+3+i]);

        acc0 = L_mac(acc0, x[i+1], c[j+1+i]);
        acc1 = L_mac(acc1, x[i+1], c[j+2+i]);
        acc2 = L_mac(acc2, x[i+1], c[j+3+i]);
        acc3 = L_mac(acc3, x[i+1], c[j+4+i]);
        /* third loop body for x[i+2] */
        /* fourth loop body for x[i+3] */
    }
    res[j+0] = acc0; res[j+1] = acc1;
    res[j+2] = acc2; res[j+3] = acc3;
}
![Page 44: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/44.jpg)
Multisample Example - Step 5

Explicit scalarization (version 5 to version 6)

Version 5:

    for (j = 0; j < N; j += 4) {
        acc0 = 0; acc1 = 0; acc2 = 0; acc3 = 0;
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac(acc0, x[i+0], c[j+0+i]);
            acc1 = L_mac(acc1, x[i+0], c[j+1+i]);
            acc2 = L_mac(acc2, x[i+0], c[j+2+i]);
            acc3 = L_mac(acc3, x[i+0], c[j+3+i]);

            acc0 = L_mac(acc0, x[i+1], c[j+1+i]);
            acc1 = L_mac(acc1, x[i+1], c[j+2+i]);
            acc2 = L_mac(acc2, x[i+1], c[j+3+i]);
            acc3 = L_mac(acc3, x[i+1], c[j+4+i]);
            /* third loop body for x[i+2] */
            /* fourth loop body for x[i+3] */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }

Version 6:

    for (j = 0; j < N; j += 4) {
        acc0 = acc1 = acc2 = acc3 = 0;
        xx = x[0];              /* i is not yet set here, so index 0 is meant */
        c0 = c[j+0]; c1 = c[j+1]; c2 = c[j+2]; c3 = c[j+3];
        for (i = 0; i < T; i += 4) {
            acc0 = L_mac(acc0, xx, c0);
            acc1 = L_mac(acc1, xx, c1);
            acc2 = L_mac(acc2, xx, c2);
            acc3 = L_mac(acc3, xx, c3);
            xx = x[i+1]; c0 = c[j+4+i];

            acc0 = L_mac(acc0, xx, c1);
            acc1 = L_mac(acc1, xx, c2);
            acc2 = L_mac(acc2, xx, c3);
            acc3 = L_mac(acc3, xx, c0);
            xx = x[i+2]; c1 = c[j+5+i];   /* c1 is free now; c0 is still needed */
            /* similar third and fourth loop */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }
![Page 45: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/45.jpg)
Multisample Example - Final Result

    for (j = 0; j < N; j += 4) {
        acc0 = acc1 = acc2 = acc3 = 0;
        xx = x[0];              /* i is not yet set here, so index 0 is meant */
        c0 = c[j+0]; c1 = c[j+1]; c2 = c[j+2]; c3 = c[j+3];
        for (i = 0; i < T; i += 4) {
            /* one cycle (4 macs + 2 moves) */
            acc0 = L_mac(acc0, xx, c0);
            acc1 = L_mac(acc1, xx, c1);
            acc2 = L_mac(acc2, xx, c2);
            acc3 = L_mac(acc3, xx, c3);
            xx = x[i+1]; c0 = c[j+4+i];

            /* one cycle (4 macs + 2 moves) */
            acc0 = L_mac(acc0, xx, c1);
            acc1 = L_mac(acc1, xx, c2);
            acc2 = L_mac(acc2, xx, c3);
            acc3 = L_mac(acc3, xx, c0);
            xx = x[i+2]; c1 = c[j+5+i];   /* c1 is free now; c0 is still needed */
            /* similar third and fourth loop */
        }
        res[j+0] = acc0; res[j+1] = acc1;
        res[j+2] = acc2; res[j+3] = acc3;
    }
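The multisample steps can be checked on a host machine with a portable sketch. Everything below is illustrative rather than the author's code: `mac_sketch` is a hypothetical, non-saturating stand-in for the `L_mac` intrinsic, the sizes `T` and `N` are arbitrary multiples of 4, and the third and fourth loop bodies (elided on the slides) are filled in under the assumption that the coefficient registers keep rotating in the same pattern. Because the scalarized loop reads one element ahead, the sketch assumes `x` holds `T + 1` elements and `c` holds `N + T` elements.

```c
#include <assert.h>
#include <stdint.h>

#define T 8 /* number of taps, assumed to be a multiple of 4 */
#define N 8 /* number of outputs, assumed to be a multiple of 4 */

/* Hypothetical stand-in for L_mac: acc + 2*a*b, without saturation. */
static int32_t mac_sketch(int32_t acc, int16_t a, int16_t b) {
    return acc + 2 * (int32_t)a * (int32_t)b;
}

/* Straightforward form: one output sample per outer iteration. */
static void fir_reference(const int16_t *x, const int16_t *c, int32_t *res) {
    for (int j = 0; j < N; j++) {
        int32_t acc = 0;
        for (int i = 0; i < T; i++)
            acc = mac_sketch(acc, x[i], c[j + i]);
        res[j] = acc;
    }
}

/* Multisample form: four outputs at once, operands held in scalars. */
static void fir_multisample(const int16_t *x, const int16_t *c, int32_t *res) {
    assert(T % 4 == 0 && N % 4 == 0);
    for (int j = 0; j < N; j += 4) {
        int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        int16_t xx = x[0];
        int16_t c0 = c[j + 0], c1 = c[j + 1], c2 = c[j + 2], c3 = c[j + 3];
        for (int i = 0; i < T; i += 4) {
            acc0 = mac_sketch(acc0, xx, c0); acc1 = mac_sketch(acc1, xx, c1);
            acc2 = mac_sketch(acc2, xx, c2); acc3 = mac_sketch(acc3, xx, c3);
            xx = x[i + 1]; c0 = c[j + 4 + i];

            acc0 = mac_sketch(acc0, xx, c1); acc1 = mac_sketch(acc1, xx, c2);
            acc2 = mac_sketch(acc2, xx, c3); acc3 = mac_sketch(acc3, xx, c0);
            xx = x[i + 2]; c1 = c[j + 5 + i];

            acc0 = mac_sketch(acc0, xx, c2); acc1 = mac_sketch(acc1, xx, c3);
            acc2 = mac_sketch(acc2, xx, c0); acc3 = mac_sketch(acc3, xx, c1);
            xx = x[i + 3]; c2 = c[j + 6 + i];

            acc0 = mac_sketch(acc0, xx, c3); acc1 = mac_sketch(acc1, xx, c0);
            acc2 = mac_sketch(acc2, xx, c1); acc3 = mac_sketch(acc3, xx, c2);
            xx = x[i + 4]; c3 = c[j + 7 + i]; /* read-ahead for next pass */
        }
        res[j + 0] = acc0; res[j + 1] = acc1;
        res[j + 2] = acc2; res[j + 3] = acc3;
    }
}

/* Returns 1 when both forms produce identical results on sample data. */
static int fir_versions_agree(void) {
    int16_t x[T + 1], c[N + T];
    int32_t ref[N], fast[N];
    for (int i = 0; i < T + 1; i++) x[i] = (int16_t)(i + 1);
    for (int i = 0; i < N + T; i++) c[i] = (int16_t)(3 * i - 7);
    fir_reference(x, c, ref);
    fir_multisample(x, c, fast);
    for (int j = 0; j < N; j++)
        if (ref[j] != fast[j]) return 0;
    return 1;
}
```

Each group of four multiply-accumulates shares one sample `xx` and needs only one new coefficient, which is what lets a group pair with two loads in the "4 macs + 2 moves" pattern.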
![Page 46: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/46.jpg)
Search Techniques Adapted for StarCore SC140

• finding the maximum
• finding the maximum and its position
• finding the maximum ratio
• finding the position of the maximum ratio
![Page 47: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/47.jpg)
Maximum search

• reduction operation => split maximum

    #include <prototype.h>

    /* Assumes length is a multiple of 4 and at least 4. */
    Word16 max_search(Word16 vector[], unsigned short int length)
    {
    #pragma align *vector 8
        signed short int i = 0;
        Word16 max0, max1, max2, max3;

        max0 = vector[i+0];
        max1 = vector[i+1];
        max2 = vector[i+2];
        max3 = vector[i+3];
        for (i = 4; i < length; i += 4) {
            max0 = max(max0, vector[i+0]);
            max1 = max(max1, vector[i+1]);
            max2 = max(max2, vector[i+2]);
            max3 = max(max3, vector[i+3]);
        }
        /* combine the four partial maxima */
        max0 = max(max0, max1);
        max1 = max(max2, max3);
        return max(max0, max1);
    }
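The split works because maximum is associative and commutative: the vector can be partitioned into four interleaved streams, each reduced independently, and the partial results merged at the end. A portable sketch of the same idea, with `max16` as a hypothetical stand-in for the compiler's `max` intrinsic and standard `int16_t` in place of `Word16`, can be compared against a plain single-accumulator loop:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the max intrinsic. */
static int16_t max16(int16_t a, int16_t b) {
    return a > b ? a : b;
}

/* Single-accumulator reference: each step depends on the previous one. */
static int16_t simple_max(const int16_t *v, unsigned length) {
    assert(length >= 1);
    int16_t m = v[0];
    for (unsigned i = 1; i < length; i++)
        m = max16(m, v[i]);
    return m;
}

/* Split maximum: four independent partial maxima, merged after the loop. */
static int16_t split_max(const int16_t *v, unsigned length) {
    assert(length >= 4 && length % 4 == 0);
    int16_t m0 = v[0], m1 = v[1], m2 = v[2], m3 = v[3];
    for (unsigned i = 4; i < length; i += 4) {
        m0 = max16(m0, v[i + 0]);
        m1 = max16(m1, v[i + 1]);
        m2 = max16(m2, v[i + 2]);
        m3 = max16(m3, v[i + 3]);
    }
    return max16(max16(m0, m1), max16(m2, m3));
}
```

The four updates inside the loop have no dependencies on each other, so a compiler targeting four parallel ALUs is free to schedule them together.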
![Page 48: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/48.jpg)
Maximum search - Comments

• presented solution works for both 16-bit and 32-bit values
• better solution for 16-bit (asm only)
  – based on max2 instruction (SIMD style)
  – fetches two 16-bit values as a 32-bit one
  – eight maxima per cycle
![Page 49: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/49.jpg)
Maximum Search – ASM with max2

    move.2l (r0)+n0,d4:d5    move.2l (r1)+n0,d6:d7
    move.2l (r0)+n0,d0:d1    move.2l (r1)+n0,d2:d3
    FALIGN
    LOOPSTART3
    [ max2 d0,d4    max2 d1,d5
      max2 d2,d6    max2 d3,d7
      move.2l (r0)+n0,d0:d1    move.2l (r1)+n0,d2:d3 ]
    LOOPEND3
    [ max2 d0,d4    max2 d1,d5
      max2 d2,d6    max2 d3,d7 ]
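To make the SIMD-style idea concrete on a host machine, the following C sketch emulates a paired 16-bit maximum on values packed two per 32-bit word. It assumes that `max2` takes independent signed maxima of the upper and lower halves; `max2_emul`, `pack2` and `packed_max` are hypothetical names introduced only for this illustration and say nothing about the instruction's exact operand order or flags.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16-bit maximum computed on raw bit patterns: flipping the sign
   bit maps signed ordering onto unsigned ordering. */
static uint16_t max_s16_bits(uint16_t a, uint16_t b) {
    return ((a ^ 0x8000u) > (b ^ 0x8000u)) ? a : b;
}

/* Emulated paired maximum: upper and lower halves handled independently. */
static uint32_t max2_emul(uint32_t a, uint32_t b) {
    uint16_t hi = max_s16_bits((uint16_t)(a >> 16), (uint16_t)(b >> 16));
    uint16_t lo = max_s16_bits((uint16_t)(a & 0xFFFFu), (uint16_t)(b & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}

/* Packs two 16-bit samples into one 32-bit word, like a 32-bit fetch. */
static uint32_t pack2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Converts a 16-bit pattern back to a signed value without relying on
   implementation-defined narrowing. */
static int16_t to_s16(uint16_t u) {
    return (u >= 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Maximum of a vector, two samples per step (length assumed even). */
static int16_t packed_max(const int16_t *v, unsigned length) {
    assert(length >= 2 && length % 2 == 0);
    uint32_t best = pack2(v[0], v[1]);
    for (unsigned i = 2; i < length; i += 2)
        best = max2_emul(best, pack2(v[i], v[i + 1]));
    /* final reduction across the two halves */
    return to_s16(max_s16_bits((uint16_t)(best >> 16), (uint16_t)(best & 0xFFFFu)));
}
```

With four such paired operations issued together, as in the loop body of the assembly listing, eight comparisons complete per execution set, which matches the "eight maxima per cycle" figure.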
![Page 50: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/50.jpg)
Maximum position

• based on comparisons
  – around N cycles
  – in C, two maxima required
• based on maximum search
  – around N/2 cycles
• based on max2-based maximum search
  – around N/4 cycles
• care must be taken to preserve the original semantics
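One possible reading of "based on maximum search" is a two-pass approach: find the maximum value first with the fast split search, then locate it. The sketch below is an assumption about how that could look in portable C, not the author's implementation. It also illustrates the semantics concern: a comparison loop with a strict `>` returns the first index on ties, so the second pass must also stop at the first match.

```c
#include <assert.h>
#include <stdint.h>

/* Comparison-based reference: strict '>' keeps the first index on ties. */
static unsigned max_pos_compare(const int16_t *v, unsigned n) {
    assert(n >= 1);
    unsigned pos = 0;
    for (unsigned i = 1; i < n; i++)
        if (v[i] > v[pos])
            pos = i;
    return pos;
}

/* Two-pass variant: split maximum search, then a scan for the value. */
static unsigned max_pos_two_pass(const int16_t *v, unsigned n) {
    assert(n >= 4 && n % 4 == 0);
    int16_t m0 = v[0], m1 = v[1], m2 = v[2], m3 = v[3];
    for (unsigned i = 4; i < n; i += 4) {
        if (v[i + 0] > m0) m0 = v[i + 0];
        if (v[i + 1] > m1) m1 = v[i + 1];
        if (v[i + 2] > m2) m2 = v[i + 2];
        if (v[i + 3] > m3) m3 = v[i + 3];
    }
    int16_t a = m0 > m1 ? m0 : m1;
    int16_t b = m2 > m3 ? m2 : m3;
    int16_t m = a > b ? a : b;

    /* The maximum is known to be present, so this scan terminates;
       stopping at the first match preserves first-index semantics. */
    unsigned pos = 0;
    while (v[pos] != m)
        pos++;
    return pos;
}
```

Scanning from the end instead would return the last occurrence, which silently changes results for vectors with repeated maxima.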
![Page 51: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/51.jpg)
Maximum ratio

• given Word16 a[ ] and Word16 b[ ] vectors of positive values, compute max{a[i]/b[i]}
• division is very expensive on StarCore
• ideas:
  – a/b < c/d ⇔ a*d < b*c (valid because b and d are positive)
  – keep a[ ] and b[ ] intermixed
    • two 16-bit values form a 32-bit one
  – use cross multiplication (mpyus, mpysu)
• final solution in N cycles
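The cross-multiplication idea can be sketched in portable C as follows. This is a minimal illustration under stated assumptions: both vectors hold strictly positive values, the function name `max_ratio` and its output parameters are hypothetical, and no attempt is made to interleave the arrays or use the `mpyus`/`mpysu` instructions mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Finds the pair (a[i], b[i]) with the largest ratio a[i]/b[i] without
   dividing. Assumes all a[i] and b[i] are positive. */
static void max_ratio(const int16_t *a, const int16_t *b, unsigned n,
                      int16_t *num, int16_t *den) {
    assert(n >= 1);
    int16_t best_num = a[0];
    int16_t best_den = b[0];
    for (unsigned i = 1; i < n; i++) {
        /* best_num/best_den < a[i]/b[i]
           <=> best_num * b[i] < a[i] * best_den   (denominators positive) */
        int32_t lhs = (int32_t)best_num * b[i];
        int32_t rhs = (int32_t)a[i] * best_den;
        if (lhs < rhs) {
            best_num = a[i];
            best_den = b[i];
        }
    }
    *num = best_num;
    *den = best_den;
}
```

The products of two 16-bit values always fit in 32 bits, so the comparison is exact, whereas a fixed-point division would both cost more cycles and lose precision.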
![Page 52: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/52.jpg)
Max ratio position

• based on max ratio search, plus
  – movet, tfrt instructions
  – position is kept as pointer, not index
• software pipelining plus loop unrolling three times
![Page 53: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/53.jpg)
DSP Coding Recommendations

• use loops with a fixed number of iterations
  – enables the detection of hardware loops
• provide extensive static information
  – pragmas, constants
• use only supported data types
• do not mix integer and fractional operations on the same value
![Page 54: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/54.jpg)
DSP Coding Recommendations (2)

• replace tests with computations
  – small tests are ok due to predicated execution
• use synonym operations if more flexible in the target architecture
• use modulo addressing
• alignment in data structures should be provided using field arrangement
• use custom calling conventions
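Two of these recommendations can be illustrated at the C level. The sketch below is generic C, not StarCore-specific code: `count_above` replaces an `if` inside a loop with an arithmetic use of the comparison result, and `ring_next` wraps a circular-buffer index with a mask instead of a test, which is the source-level pattern that hardware modulo addressing serves. Both function names and the power-of-two buffer length `RING_LEN` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Test replaced with a computation: the comparison yields 0 or 1, which
   is added directly, so the loop body contains no branch in the source. */
static unsigned count_above(const int16_t *v, unsigned n, int16_t threshold) {
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++)
        count += (unsigned)(v[i] > threshold);
    return count;
}

/* Circular buffer of power-of-two length: wrap with a mask rather than
   with "if (idx == RING_LEN) idx = 0;". */
#define RING_LEN 8u

static unsigned ring_next(unsigned idx) {
    assert(idx < RING_LEN);
    return (idx + 1u) & (RING_LEN - 1u);
}
```

Whether a given compiler maps the masked index onto a modulo addressing mode depends on the toolchain, so the generated assembly is worth inspecting.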
![Page 55: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/55.jpg)
Software Development Notes

• SDN database contains coding guidelines
  – try different solutions for the same problem
  – analyze the generated assembly file
  – save the best solution in the SDN database
• SDN DB captures team experience
• leverage organizational development skills
![Page 56: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/56.jpg)
Speed versus Size

• typically, both cannot be optimized at once
• follow the 80%-20% rule
  – speed optimize 20% of the code
  – size optimize 80% of the code
• loop merging may optimize both
• high register pressure may hurt both
• multisample provides the best speed
  – but also a significant kernel size increase
![Page 57: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/57.jpg)
Diet Speed Optimization

• compile only the time-consuming loops for speed
  – compile the rest for size
• try speed and size optimizations combined
• reduce the unroll factors in the multisample transformation
• avoid register pressure that creates spill code
![Page 58: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/58.jpg)
C Optimization Limits

• effort concentrated on loops
  – no packed moves outside loops
  – loops and ifs delimit optimization blocks
![Page 59: C Optimization Techniques for StarCore SC140](https://reader035.fdocuments.us/reader035/viewer/2022062310/56815d3b550346895dcb3fd4/html5/thumbnails/59.jpg)
Conclusions

• the complexity of future DSP applications requires development in C
• C can be transformed into efficient DSP C
  – compiler extensions are a must
• the coding style is the programmer’s key
• optimized C code is suitable for high-performance applications
• assembly remains for critical sections