Optimization of C Code The C for Speed Ahmed Helmi Maaroufi Pang Hui EL 2310 Scientific Programming.


Optimization of C Code: The C for Speed

Ahmed Helmi Maaroufi, Pang Hui

EL 2310 Scientific Programming

Why optimize your C code?

• For less memory consumption.

• For faster computation speed.

These two optimization goals can sometimes conflict with each other, so you may have to find a balance between them.

First Things First!

• Optimize your algorithm before you even start writing code. There is no point optimizing code that is slow by design.

• Make the common case fast. Use a profiler to identify performance bottlenecks.

So what can we do?

• Line level
  – Peephole optimization

• Memory level
  – Properly arrange data types
  – Quick access to large arrays
  – Hot & cold data separation

• Function level
  – Strength reduction
  – Jumps/branches
  – Condition checking
  – Loop tricks

• Compiler level
  – Avoid memory aliasing
  – Function calls using global variables

Peephole Optimization

• Performed over a very small set of instructions in a segment of generated code.

• Works by recognizing sets of instructions that can be replaced by shorter or faster sets of instructions.

Peephole Optimization

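• In C++, for most classes, use the compound operators +=, -=, *=, and /= instead of +, -, *, and /, because the latter create temporary objects. For built-in C types, compilers generate the same code either way.

• In C++, for objects, use the prefix operator (++obj) instead of the postfix operator (obj++), which must keep a copy of the old value.

• Use the shift operators >> and << instead of integer multiplication and division by powers of two, where possible (for unsigned values; optimizing compilers already do this for constant operands).

• Testing whether a value is equal to zero is often cheaper than comparing two arbitrary numbers.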

Peephole Optimization

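• Before:

    x = y % 32;
    x = y * 8;
    x = y / w + z / w;
    if (a == b && c == d && e == f) {...}

• After:

    x = y & 31;
    x = y << 3;
    x = (y + z) / w;
    if (((a - b) | (c - d) | (e - f)) == 0) {...}

• These rewrites are exact only under assumptions: the first, second, and fourth require unsigned operands (for negative signed values, % and & differ, << is undefined, and the subtractions can overflow); the third requires floating-point operands and tolerates small rounding differences (with integer division, y = z = 1 and w = 2 give 0 before and 1 after).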
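The peephole rewrites above can be checked as a minimal sketch, assuming unsigned operands so that each pair is exactly equivalent. The function names (`mod32_fast`, `all_equal_fast`, and so on) are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>

/* y % 32 versus y & 31: 32 is 2^5, so the remainder is the low 5 bits. */
unsigned mod32_slow(unsigned y) { return y % 32u; }
unsigned mod32_fast(unsigned y) { return y & 31u; }

/* y * 8 versus y << 3: 8 is 2^3; both wrap modulo 2^N for unsigned. */
unsigned mul8_slow(unsigned y) { return y * 8u; }
unsigned mul8_fast(unsigned y) { return y << 3; }

/* Three short-circuit equality tests (possibly three branches). */
bool all_equal_slow(unsigned a, unsigned b, unsigned c,
                    unsigned d, unsigned e, unsigned f)
{
    return a == b && c == d && e == f;
}

/* One comparison against zero. Unsigned subtraction wraps, so
 * (a - b) == 0 exactly when a == b, and the OR is zero only if
 * all three differences are zero. */
bool all_equal_fast(unsigned a, unsigned b, unsigned c,
                    unsigned d, unsigned e, unsigned f)
{
    return ((a - b) | (c - d) | (e - f)) == 0;
}
```

Note that the fast equality test always evaluates all three subtractions; it trades short-circuit evaluation for fewer branches.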

Strength Reduction

• Replace expensive operations with equivalent but less expensive operations.

• Many compilers will do this for you automatically.

• The classic example of strength reduction converts "strong" multiplications inside a loop into "weaker" additions – something that frequently occurs in array addressing.

Strength Reduction

• Before:

    c = 7;
    for (i = 0; i < N; i++) {
        y[i] = c * i;
    }

• After:

    c = 7;
    k = 0;
    for (i = 0; i < N; i++) {
        y[i] = k;
        k = k + c;
    }
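As a sketch, the two versions of the strength-reduction example can be wrapped in functions and compared directly. The names `fill_multiply` and `fill_add` are hypothetical, and the sketch assumes c * n fits in an int.

```c
#include <stddef.h>

/* Before: one multiplication per iteration. */
void fill_multiply(int *y, size_t n, int c)
{
    for (size_t i = 0; i < n; i++)
        y[i] = c * (int)i;
}

/* After: k always holds c * i, so the multiplication is
 * replaced by one addition per iteration. */
void fill_add(int *y, size_t n, int c)
{
    int k = 0;
    for (size_t i = 0; i < n; i++) {
        y[i] = k;
        k = k + c;
    }
}
```

Both functions fill the array with 0, c, 2c, ...; an optimizing compiler will often perform this transformation on the first version by itself.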

Minimize jumps/branches

• The elimination of branching is an important concern with today's deeply pipelined processor architectures.

• Mispredicted branches often cost many cycles, because the pipeline must be flushed and refilled with instructions from the correct path.

Minimize jumps/branches

• Use inline functions for short functions to eliminate function-call overhead.

• Move loops inside functions instead of calling a function once per iteration.

• Iteration is preferred over recursion.

Minimize jumps/branches

• Before:

    for (i = 0; i < N; i++) {
        DoSomething(i);
    }

• After:

    void DoSomething(int n)
    {
        for (int i = 0; i < n; i++) {
            …
        }
    }
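A minimal sketch of both suggestions from this slide: a short helper declared `static inline`, and a loop moved inside the function so that one call processes a whole array. The names `scale`, `process_one`, and `process_all` are hypothetical.

```c
#include <stddef.h>

/* Short helper: static inline lets the compiler substitute the
 * body at each call site instead of emitting a call. */
static inline int scale(int v) { return 3 * v + 1; }

/* Before: the caller loops and makes one call per element. */
void process_one(int *data, size_t i)
{
    data[i] = scale(data[i]);
}

/* After: the loop lives inside the function, so a single call
 * processes the whole array. */
void process_all(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = scale(data[i]);
}
```

Calling `process_one(a, i)` for every i in a loop and calling `process_all(a, n)` once produce the same array; the second avoids n - 1 calls.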

Minimize Condition Checking

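• Checking a condition does no useful work by itself, so keep the number of checks to a minimum.

• Whenever possible, replace long if-else chains with a switch statement; compilers can often turn a switch over dense case values into a jump table.

• If a switch statement is not possible, put the most common clauses at the beginning of the if chain.

• Try to remove the "else" clause when one outcome is much more likely than the other.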
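The two ideas on this slide can be sketched as follows. The command codes and the functions `command_cost` and `clamp_negative` are hypothetical examples, not part of the original slides.

```c
/* Hypothetical command codes with small, consecutive values. */
enum { CMD_READ = 0, CMD_WRITE = 1, CMD_SEEK = 2, CMD_CLOSE = 3 };

/* A switch over dense case values: compilers can often lower this
 * to a jump table instead of a chain of comparisons. */
int command_cost(int cmd)
{
    switch (cmd) {
    case CMD_READ:  return 1;
    case CMD_WRITE: return 2;
    case CMD_SEEK:  return 4;
    case CMD_CLOSE: return 8;
    default:        return -1;
    }
}

/* Removing the else for a lop-sided condition: assign the common
 * result first, then override it only in the rare case. */
int clamp_negative(int x)
{
    int result = x;   /* common case: x is non-negative */
    if (x < 0)        /* rare case */
        result = 0;
    return result;
}
```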

Loop Tricks

• Loop unrolling
  – Reduce the number of iterations and replicate the body of the loop to reduce loop overhead.

• Loop jamming
  – Combine adjacent loops that iterate over the same range of the same variable.

• Early loop breaking
  – It is not always necessary to process the entire loop.

Loop Unrolling

• Before:

    for (int i = 0; i < 1000; i++)
        a[i] = b[i] + c[i];

• After:

    for (int i = 0; i < 1000; i += 2) {
        a[i] = b[i] + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }
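The example above relies on 1000 being even. A minimal sketch for an arbitrary length n, unrolled by an assumed factor of 4 with a cleanup loop for the leftover elements; the name `add_unrolled` is hypothetical.

```c
#include <stddef.h>

/* a[i] = b[i] + c[i] for i in [0, n), unrolled by 4. */
void add_unrolled(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;
    /* Main loop: four elements per iteration, one loop test per four. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
    }
    /* Cleanup loop: the remaining n % 4 elements. */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

The unroll factor is a trade-off: larger factors cut loop overhead further but grow the code, and GCC can also unroll loops itself (for example with `-funroll-loops`).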

Loop Jamming

• Before:

    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
    for (i = 0; i < MAX; i++)
        a[i][i] = 1.0;

• After:

    for (i = 0; i < MAX; i++) {
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
        a[i][i] = 1.0;
    }
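Another sketch of loop jamming: two loops that each read the same array over the same range are combined into a single pass, so the data is traversed once instead of twice. The functions `sum_and_max_two_loops` and `sum_and_max_jammed` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Before: two loops over the same range; the array is read twice. */
void sum_and_max_two_loops(const int *v, size_t n, long *sum, int *max)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];

    int m = INT_MIN;
    for (size_t i = 0; i < n; i++)
        if (v[i] > m)
            m = v[i];

    *sum = s;
    *max = m;
}

/* After: the two loop bodies are jammed into one loop. */
void sum_and_max_jammed(const int *v, size_t n, long *sum, int *max)
{
    long s = 0;
    int m = INT_MIN;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
        if (v[i] > m)
            m = v[i];
    }
    *sum = s;
    *max = m;
}
```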

Early Loop Breaking

• Before:

    found = false;
    for (i = 0; i < 10000; i++) {
        if (list[i] == -99) {
            found = true;
        }
    }
    if (found) printf("…");

• After:

    found = false;
    for (i = 0; i < 10000; i++) {
        if (list[i] == -99) {
            found = true;
            break;
        }
    }
    if (found) printf("…");
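When the search lives in its own function, returning from inside the loop has the same effect as setting the flag and breaking. A minimal sketch; the name `contains` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true as soon as target is found; the remaining
 * elements are never examined. */
bool contains(const int *list, size_t n, int target)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == target)
            return true;
    }
    return false;
}
```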

Memory and Cache

• Arrange structure members carefully to reduce padding. To keep each member aligned in memory, the compiler inserts unused padding bytes between members; grouping members of the same size together usually reduces this waste.

• Avoid casting where possible, particularly between integer and floating-point types. Integer and floating-point instructions often operate on different registers, so a cast requires a conversion and a copy between registers.

Memory and Cache

• Pass pointers to large objects instead of copying them.

• Access data in the same order as it is stored in physical memory. C stores arrays row by row (row-major order), so go row after row through your matrix.

• Use memcpy() to copy large arrays and memset() to clear them, instead of element-by-element loops.
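The row-after-row advice can be sketched with two traversals of the same matrix. The dimensions `ROWS` and `COLS` and both function names are assumptions for this illustration; the speed difference depends on the matrix size and the cache, so no timings are claimed here.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major traversal: m[i][0], m[i][1], ... are adjacent in memory,
 * so the inner loop walks memory sequentially. */
long sum_row_major(int m[ROWS][COLS])
{
    long s = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal: same result, but each inner step jumps
 * COLS * sizeof(int) bytes, which typically causes more cache misses. */
long sum_column_major(int m[ROWS][COLS])
{
    long s = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}
```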

Memory and Cache

• Before:

    int a[3][3][3];
    int b[3][3][3];
    ...
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            for (k = 0; k < 3; k++)
                b[i][j][k] = a[i][j][k];
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            for (k = 0; k < 3; k++)
                a[i][j][k] = 0;

• After:

    typedef struct {
        int element[3][3][3];
    } Three3DType;

    Three3DType a, b;
    ...
    b = a;
    memset(&a, 0, sizeof(a));

• Hot & Cold Data Separation: splitting your data structures into frequently accessed ("hot") and rarely accessed ("cold") sections.
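The copy-then-clear example from this slide can be sketched as a pair of functions: one using struct assignment as in the "After" version, and one using memcpy() directly on plain arrays. The names `copy_then_clear` and `copy_then_clear_arrays` are hypothetical.

```c
#include <string.h>

typedef struct {
    int element[3][3][3];
} Three3DType;

/* Struct assignment copies all 27 ints in one statement;
 * memset clears the whole source object in one call. */
void copy_then_clear(Three3DType *dst, Three3DType *src)
{
    *dst = *src;
    memset(src, 0, sizeof *src);
}

/* The same idea for plain arrays, without the wrapping struct. */
void copy_then_clear_arrays(int dst[3][3][3], int src[3][3][3])
{
    memcpy(dst, src, sizeof(int[3][3][3]));
    memset(src, 0, sizeof(int[3][3][3]));
}
```

Note that memset() must receive the address of the struct (`&a`), not the struct itself.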

Hot & Cold Data Separation

• Before:

    struct Customer {
        int ID;
        int AccountNumber;
        char Name[128];
        char Address[256];
    };
    struct Customer customers[1000];

• After:

    struct CustomerAccount {
        int ID;
        int AccountNumber;
        struct CustomerData *pData;
    };

    struct CustomerData {
        char Name[128];
        char Address[256];
    };

    struct CustomerAccount customers[1000];
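A sketch of why the split helps: a loop that searches by account number touches only the compact hot array, not the names and addresses. With 4-byte ints and 8-byte pointers (a typical 64-bit platform, assumed here), each `CustomerAccount` is 16 bytes, versus 392 bytes for the combined `Customer`. The function `find_by_account` is hypothetical.

```c
#include <stddef.h>

/* Cold data: large, rarely accessed fields. */
struct CustomerData {
    char Name[128];
    char Address[256];
};

/* Hot data: small, frequently accessed fields plus a pointer
 * to the cold part. */
struct CustomerAccount {
    int ID;
    int AccountNumber;
    struct CustomerData *pData;
};

/* Hot loop: scans only the compact CustomerAccount array.
 * Returns the matching ID, or -1 if no account matches. */
int find_by_account(const struct CustomerAccount *accounts, size_t n,
                    int number)
{
    for (size_t i = 0; i < n; i++)
        if (accounts[i].AccountNumber == number)
            return accounts[i].ID;
    return -1;
}
```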

Data Alignment

Before:

    struct structure1 {
        int id1;
        char name1;
        int id2;
        char name2;
        float percentage;
    };

After:

    struct structure1 {
        int id1;
        int id2;
        char name1;
        char name2;
        float percentage;
    };
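The effect of reordering can be observed with sizeof. In this sketch the two layouts are renamed `struct before` and `struct after` (hypothetical names) so both can exist in one file. Assuming 4-byte int and float with 4-byte alignment, the first layout is typically 20 bytes and the second 16, but the exact sizes are implementation-defined.

```c
#include <stddef.h>

/* Member order from the "Before" version. */
struct before {
    int id1;
    char name1;   /* typically followed by 3 padding bytes */
    int id2;
    char name2;   /* typically followed by 3 padding bytes */
    float percentage;
};

/* Member order from the "After" version: the chars are adjacent. */
struct after {
    int id1;
    int id2;
    char name1;
    char name2;   /* typically followed by 2 padding bytes */
    float percentage;
};

size_t before_size(void) { return sizeof(struct before); }
size_t after_size(void)  { return sizeof(struct after); }
```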

Compiler

• Write source code that the compiler can effectively optimize to turn into efficient executable code.

• It is therefore important to understand the capabilities and limitations of optimizing compilers.

Compiler Optimization in GCC

• Most compilers, including GCC, provide users with some control over which optimizations they apply.

• The simplest control is to specify the optimization level. Invoking GCC with the command-line flag ‘-O1’, ‘-O2’ or ‘-O3’ makes it apply progressively more aggressive sets of optimizations (without a flag, GCC uses ‘-O0’, which performs almost no optimization).

• Optimization may expand the program size and make the program more difficult to debug using standard debugging tools.

Safe Optimizations

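• Compilers must be careful to apply only safe optimizations to a program, that is, transformations that cannot change its behavior.

• In performing only safe optimizations, the compiler must assume that different pointers may be aliased, meaning they may point to the same memory location.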

Memory Aliasing

• Therefore, memory aliasing can severely limit the opportunities for a compiler to generate optimized code.

• Programmers using GCC must put more effort into writing programs in a way that simplifies the compiler’s task of generating efficient code.

Memory Aliasing

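    void twiddle1(int *xp, int *yp)
    {
        *xp += *yp;
        *xp += *yp;
    }

    void twiddle2(int *xp, int *yp)
    {
        *xp += 2 * *yp;
    }

The compiler cannot turn twiddle1 into the cheaper twiddle2: if xp and yp point to the same location, twiddle1 multiplies *xp by 4, while twiddle2 multiplies it by 3.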
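A sketch of the aliasing case, plus one way to remove the compiler's worry: the C99 `restrict` qualifier promises that the pointers do not alias, which permits the compiler to treat `twiddle3` (a hypothetical name) like `twiddle2`. Calling `twiddle3` with equal pointers is undefined behavior.

```c
/* Functions from the slide. */
void twiddle1(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

void twiddle2(int *xp, int *yp)
{
    *xp += 2 * *yp;
}

/* restrict: the programmer guarantees xp and yp never alias,
 * so the compiler may load *yp once and combine the additions. */
void twiddle3(int *restrict xp, int *restrict yp)
{
    *xp += *yp;
    *xp += *yp;
}
```

With `x = 1`, `twiddle1(&x, &x)` leaves x at 4 while `twiddle2(&x, &x)` leaves it at 3, which is exactly why the compiler must keep the two loads and stores in twiddle1.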

Function Calls Using Global Variables

• Most compilers do not try to determine whether a function is free of side effects. Instead, they assume the worst case and leave function calls intact.

• Code involving function calls can be optimized by a process known as inline substitution (inlining), which replaces a call with the body of the called function.

Function Calls Using Global Variables

    int f();
    int func1() { return f() + f() + f() + f(); }

If f has a side effect, for example modifying some part of the global program state, then changing the number of times it is called changes the program behavior. The compiler therefore cannot replace func1 with:

    int f();
    int func2() { return 4 * f(); }

Assume this case:

    int counter = 0;
    int f() { return counter++; }
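Putting the pieces of this slide together as a sketch shows the difference concretely. Starting from `counter = 0`, func1 returns 0 + 1 + 2 + 3 = 6 and leaves counter at 4, while func2 returns 0 and leaves counter at 1. The order in which func1 evaluates its four calls is unspecified, but the sum is the same either way.

```c
int counter = 0;

/* f has a side effect: every call modifies the global counter. */
int f(void) { return counter++; }

int func1(void) { return f() + f() + f() + f(); }
int func2(void) { return 4 * f(); }
```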

Questions?