
Page 1 (source: people.ucalgary.ca/~smithmr/2017webs/encm515_enel653_17/17_Le…)

EXCELlent way of looking at FIR optimization as a function of processor architecture

Assignment 3

Knowledge expected by midterm

Page 2

Start with basic FIR filter

float FIR_Filter(float newValue, float *FIFO, float *coeffs, int numTaps)
Registers: newValue → R4, FIFO → R8, coeffs → R12, numTaps → ?

Course exams: I WILL PROBABLY say "pretend numTaps comes in R16".

How to handle it in real life: write it in C++ first, see what the compiler does to handle this situation, then copy that.

Page 3

Careful: the compiler treats these two situations differently, as “it knows more in the second case”

float FIR_Filter_1(float newValue, float *FIFO, float *coeffs, int numTaps) {
    /* ... */
}

and

extern volatile float FIFO[];
extern volatile float coeffs[];
float FIR_Filter_2(float newValue, int numTaps) {
    /* ... */
}

Page 4

And it treats these three differently as well, perhaps also differently between debug and release modes:

extern volatile float FIFO[];
extern volatile float coeffs[];
#define numTaps 120
float FIR_Filter_2(float newValue) {
    /* ... */
}

extern volatile float FIFO[];
extern volatile float coeffs[];
volatile int numTaps = 120;
float FIR_Filter_2(float newValue) {
    /* ... */
}

extern volatile float FIFO[];
extern volatile float coeffs[];
int numTaps = 120;
float FIR_Filter_2(float newValue) {
    /* ... */
}
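As a sketch of what the compiler gets to see in the #define case, here is FIR_Filter_2 with its body filled in, using the same shift-then-multiply order as the Lab 1 filter. The array definitions are hypothetical (on the slide the arrays are only declared extern and live in another file), and the loops index the arrays instead of walking a pointer. Because numTaps is a compile-time constant here, an optimizing compiler may unroll the loops or set up a fixed-count loop; with volatile int numTaps it has to re-read numTaps from memory every time the source reads it, and with a plain global int numTaps it cannot assume the value is 120. The volatile arrays force every FIFO and coeffs access to actually be performed.

```c
/* Hypothetical definitions so the sketch compiles on its own; on the slide
   FIFO and coeffs are extern declarations of arrays defined elsewhere. */
#define numTaps 120                 /* trip count known at compile time */

volatile float FIFO[numTaps];       /* delay line: FIFO[0] oldest, FIFO[numTaps - 1] newest */
volatile float coeffs[numTaps];     /* coeffs[0] multiplies the newest sample */

float FIR_Filter_2(float newValue)
{
    /* Shift the delay line down one place; the oldest sample falls off FIFO[0]. */
    for (int count = 1; count < numTaps; count++)
        FIFO[count - 1] = FIFO[count];
    FIFO[numTaps - 1] = newValue;

    /* Multiply-accumulate, newest sample first, in the same order as the Lab 1 filter. */
    float sum = 0.0f;
    for (int count = 0; count < numTaps; count++)
        sum = sum + FIFO[numTaps - 1 - count] * coeffs[count];
    return sum;
}
```

Feeding in an impulse (1.0, then zeros) returns coeffs[0], coeffs[1], coeffs[2], ... on successive calls, which is a quick check that the FIFO runs in the intended direction.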

Page 5

Standard FIR filter from Lab 1

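float FIR_Filter(float newValue, float *FIFO, float *coeffs, int numTaps) {
    // Shift the FIFO down one place; the oldest sample falls off FIFO[0]
    for (int count = 1; count < numTaps; count++)
        FIFO[count - 1] = FIFO[count];

    float *FIFOpt = FIFO + numTaps - 1;   // Does C do pointer arithmetic?
    *FIFOpt = newValue;                   // newest sample goes in the last slot

    float sum = 0.0f;
    for (int count = 0; count < numTaps; count++)
        sum = sum + *FIFOpt-- * *coeffs++;   // newest sample times coeffs[0], walking back through the FIFO

    return sum;
}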

Page 6

Assume the processor architecture is von Neumann and can’t do a data fetch, an add or a multiplication in the same cycle.

Page 7

Now, STEP 1: increase the cycle time by 25% so that pt++ can be done in the same cycle as the fetch.
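A minimal worked version of the STEP 1 trade-off, as a C sketch. The per-tap operation counts are an assumption for illustration only (2 data fetches, 2 pointer updates, 1 multiply and 1 add, each in its own cycle, ignoring instruction fetches and loop overhead); the slide's own diagram may count differently. Times are kept in quarters of the original cycle so that a 25% longer cycle is exactly 5 quarters. The rule itself does not depend on the assumed counts: folding pt++ into the fetch pays off only if 1.25 × new cycles < old cycles, that is, if the new cycle count is below 80% of the old one.

```c
#include <stdbool.h>

/* Hypothetical per-tap operation counts for the von Neumann baseline:
   every operation occupies its own cycle (assumption, not from the slide). */
enum { DATA_FETCHES = 2, POINTER_UPDATES = 2, MULTS = 1, ADDS = 1 };

/* Baseline: nothing overlaps. */
int baseline_cycles_per_tap(void)
{
    return DATA_FETCHES + POINTER_UPDATES + MULTS + ADDS;
}

/* STEP 1: each pt++ rides along with its fetch, so pointer updates cost no extra cycles. */
int step1_cycles_per_tap(void)
{
    return DATA_FETCHES + MULTS + ADDS;
}

/* Time in quarters of the ORIGINAL cycle: a normal cycle is 4 quarters,
   the 25% longer STEP 1 cycle is 5 quarters. */
int time_in_quarters(int cycles, bool longer_cycle)
{
    return cycles * (longer_cycle ? 5 : 4);
}

/* STEP 1 wins only if new_cycles * 1.25 < old_cycles, i.e. new_cycles * 5 < old_cycles * 4. */
bool step1_pays_off(int old_cycles, int new_cycles)
{
    return time_in_quarters(new_cycles, true) < time_in_quarters(old_cycles, false);
}
```

Under these assumed counts the baseline costs 6 cycles (24 quarters) per tap and STEP 1 costs 4 longer cycles (20 quarters), roughly a 17% saving. If only 1 of 5 cycles had been saved, the longer clock would exactly cancel the gain.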

Page 8

Now, STEP 1: increase the cycle time by 25% so that pt++ can be done in the same cycle as the fetch.
STEP 2: change the pipeline to allow 1 math op to occur during the next fetch.

Page 9

UNROLL LOOP TO OPEN UP OTHER POSSIBLE PARALLEL INSTRUCTIONS
TOTALLY MEMORY / DAG 1 RESOURCE LIMITED
NEED TO CHANGE PROCESSOR ARCHITECTURE

Instead of a 1 cycle mult + a 1 cycle add, use a 2 cycle (pipelined) MACC instruction: Multiply / Accumulate

Page 10

Does a 1 or 2 cycle MACC improve performance?

Separate mult and add instructions:
• FETCH MULT INSTRUCTION
• DO MULT -- FETCH ADD INSTRUCTION
• DO ADD

Compared to a 2 cycle MACC:
• FETCH MACC INSTRUCTION
• DO MULT
• DO ADD
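The two sequences on this slide can be tallied directly. This small C sketch (the Cycle struct and function names are hypothetical) records, for each cycle, which resources are busy, reading "DO MULT -- FETCH ADD INSTRUCTION" as both happening in the same cycle. Both sequences take 3 cycles; the 2 cycle MACC simply needs one instruction fetch instead of two, so any gain has to come from what that freed fetch slot can be used for.

```c
#include <stddef.h>

/* One cycle of a schedule: which resources are busy in that cycle. */
typedef struct {
    int instr_fetch;  /* 1 if an instruction is fetched this cycle */
    int mult;         /* 1 if the multiplier is busy */
    int add;          /* 1 if the adder is busy */
} Cycle;

/* Separate mult and add instructions, as listed on the slide. */
static const Cycle separate_mult_add[] = {
    {1, 0, 0},  /* FETCH MULT INSTRUCTION */
    {1, 1, 0},  /* DO MULT -- FETCH ADD INSTRUCTION */
    {0, 0, 1},  /* DO ADD */
};

/* 2 cycle MACC, as listed on the slide. */
static const Cycle macc_2cycle[] = {
    {1, 0, 0},  /* FETCH MACC INSTRUCTION */
    {0, 1, 0},  /* DO MULT */
    {0, 0, 1},  /* DO ADD */
};

#define NUM_CYCLES(schedule) (sizeof(schedule) / sizeof((schedule)[0]))

/* Count how many cycles of a schedule use the instruction-fetch slot. */
size_t instruction_fetches(const Cycle *schedule, size_t n)
{
    size_t fetches = 0;
    for (size_t i = 0; i < n; i++)
        fetches += (size_t)schedule[i].instr_fetch;
    return fetches;
}
```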

Page 11

Assume a Harvard architecture with a floating-point MACC (SHARC).

Page 12

Harvard processor without the MACC
Colour each resource for an instruction

Page 13

Take advantage (carefully) of parallel DM and PM operations to fetch instructions earlier

Page 14

In principle, 4 cycles faster for twice around the loop, but data dependencies conflict.
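To see where one such conflict comes from, here is the Lab 1 multiply-accumulate loop unrolled twice, as a C sketch (the function name fir_sum_unrolled2 is hypothetical, and it indexes the arrays instead of walking pointers). The two multiplies in each pass are independent of each other, but both adds update the same sum, so the second add cannot start until the first has finished.

```c
/* Inner product of the Lab 1 FIR loop, unrolled twice.
   FIFO[numTaps - 1] is the newest sample and pairs with coeffs[0]. */
float fir_sum_unrolled2(const float *FIFO, const float *coeffs, int numTaps)
{
    float sum = 0.0f;
    int count = 0;

    for (; count + 1 < numTaps; count += 2) {
        /* These two multiplies do not depend on each other... */
        float product0 = FIFO[numTaps - 1 - count] * coeffs[count];
        float product1 = FIFO[numTaps - 2 - count] * coeffs[count + 1];

        /* ...but both adds go through sum, so the second must wait for the first. */
        sum = sum + product0;
        sum = sum + product1;
    }

    /* Odd numTaps: one tap is left over. */
    if (count < numTaps)
        sum = sum + FIFO[numTaps - 1 - count] * coeffs[count];

    return sum;
}
```

Keeping two separate partial sums would break that chain at the cost of one extra add at the end and a slightly different floating-point rounding order; that refinement is not on the slides.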

Page 15

You complete the analysis with separate Add and Mult instructions.

Page 16

Show the advantages of using a 2 cycle MACC instruction. Does a 1 cycle MACC offer any further advantage?

Page 17

Move over to a Super Harvard architecture with the instruction cache always in use. Start using the PM bus for data ops.

Page 18

• DON’T LOOK AT NEXT SLIDE UNTIL YOU HAVE TACKLED LAST SLIDE

Page 19

Loop of size 10 for twice around the loop
Key resource: FETCH INSTR, 8 / 10

Page 20

Using the cache ONLY when an instruction fetch and a data access conflict on the PM bus means we can have a smaller (cheaper) cache.

Page 21

Get more speed by UNROLLING THE LOOP 3 times and then thinking

Page 22

Re-roll the loop and execute it N-2 times

Page 23

Next step: MOVE TO VLIW instruction set
WHERE INSTR ALLOWS MATH-OP, dm and pm fetch at the same time

DOES NOT HAVE TO WAIT

Page 24

Next step: MOVE TO V-VLIW instruction set
WHERE INSTR ALLOWS + and *, dm and pm fetch at the same time

DOES NOT HAVE TO WAIT

IF USE V-VLIW INSTR: * + dm pm

then loop is 1 cycle

Page 25

The FIR loop looks like this:

• FETCH DATA1
• FETCH DATA2, DO MULT OF DATA1
• FETCH DATA3, DO MULT OF DATA2, ADD OF DATA1
• FETCH DATA4, DO MULT OF DATA3, ADD OF DATA2
• DO MULT OF DATA4, ADD OF DATA3
• ADD OF DATA4
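This schedule is what is usually called software pipelining (a term the slides do not use): once the first two fetches are under way, every step does a fetch, a multiply and an add for three different taps. The C sketch below spells it out as a prologue, a steady-state loop and an epilogue; the function name and the use of array indexing are assumptions. The steady-state body runs numTaps - 2 times, which matches re-rolling the loop and executing it N-2 times, and each pass holds the * + dm pm mix that a single V-VLIW instruction could issue. For an ordinary C compiler it is just sequential code.

```c
/* Software-pipelined version of the Lab 1 multiply-accumulate loop.
   Tap k pairs FIFO[numTaps - 1 - k] (newest first) with coeffs[k]. */
float fir_sum_pipelined(const float *FIFO, const float *coeffs, int numTaps)
{
    if (numTaps < 2)   /* too short to pipeline */
        return numTaps == 1 ? FIFO[0] * coeffs[0] : 0.0f;

    float sum = 0.0f;
    float x, h, product;

    /* FETCH DATA1 */
    x = FIFO[numTaps - 1];
    h = coeffs[0];

    /* FETCH DATA2, DO MULT OF DATA1 (multiply before the fetch overwrites x and h) */
    product = x * h;
    x = FIFO[numTaps - 2];
    h = coeffs[1];

    /* Steady state, runs numTaps - 2 times:
       ADD OF tap k-2, MULT OF tap k-1, FETCH tap k */
    for (int k = 2; k < numTaps; k++) {
        sum = sum + product;
        product = x * h;
        x = FIFO[numTaps - 1 - k];
        h = coeffs[k];
    }

    /* DO MULT OF last tap, ADD OF second-to-last tap */
    sum = sum + product;
    product = x * h;

    /* ADD OF last tap */
    sum = sum + product;
    return sum;
}
```

For the 4-tap case on this slide the loop body runs twice, once for each of the two middle lines of the schedule (FETCH DATA3 and FETCH DATA4).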

Page 26

Lab 2

• Programming VLIW assembly code (single cycle FIR hardware loop)

• Does C++ automatically switch to this mode in release mode if we pass dm and pm memory array pointers?

• If not, how do we make C++ switch to this mode?