Program design and analysis
© 2000 Morgan Kaufmann
Overheads for Computers as Components

Program design and analysis
Compilation flow.
Basic statement translation.
Basic optimizations.
Interpreters and just-in-time compilers.

Compilation
Compilation strategy (Wirth):
    compilation = translation + optimization
Compiler determines quality of code:
    use of CPU resources;
    memory access scheduling;
    code size.

Basic compilation phases
HLL -> parsing, symbol table -> machine-independent optimizations -> machine-dependent optimizations -> assembly

Statement translation and optimization
Source code is translated into intermediate form such as a CDFG (control/data flow graph).
CDFG is transformed/optimized.
CDFG is translated into instructions with optimization decisions.
Instructions are further optimized.
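
Arithmetic expressions
Expression: a*b + 5*(c-d)
DFG: a and b feed a * node; c and d feed a - node; the - result and the constant 5 feed a second * node; the two * results feed the + node.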
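
Arithmetic expressions, cont’d.
DFG nodes are numbered in the order they are translated: 1 is a*b, 2 is c-d, 3 is 5*(c-d), 4 is the final +.
code:
    ADR r4,a        ; node 1: get address of a
    LDR r1,[r4]     ; load a
    ADR r4,b
    LDR r2,[r4]     ; load b
    MUL r3,r1,r2    ; r3 = a*b
    ADR r4,c        ; node 2
    LDR r1,[r4]     ; load c
    ADR r4,d
    LDR r5,[r4]     ; load d
    SUB r6,r1,r5    ; r6 = c-d
    MOV r2,#5       ; node 3: MUL takes no immediate operand
    MUL r7,r6,r2    ; r7 = 5*(c-d)
    ADD r8,r7,r3    ; node 4: r8 = a*b + 5*(c-d)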

Control code generation
    if (a+b > 0)
        x = 5;
    else
        x = 7;
CDFG: the decision node a+b>0 branches to x=5 or to x=7.
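
Control code generation, cont’d.
CDFG blocks: 1 is the test a+b>0, 2 is x=5, 3 is x=7.
    ADR r5,a        ; block 1: test
    LDR r1,[r5]     ; load a
    ADR r5,b
    LDR r2,[r5]     ; load b
    ADD r3,r1,r2
    CMP r3,#0       ; set condition codes for the branch
    BLE label3      ; a+b <= 0: go to the else part
    MOV r3,#5       ; block 2: x = 5
    ADR r5,x
    STR r3,[r5]
    B stmtent
label3 MOV r3,#7    ; block 3: x = 7
    ADR r5,x
    STR r3,[r5]
stmtent ...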

Procedure linkage
Need code to:
    call and return;
    pass parameters and results.
Parameters and returns are passed on stack.
    Procedures with few parameters may use registers.

Procedure stacks
    proc1(int a) {
        proc2(5);
    }
Stack figure: proc1's frame, then proc2's frame in the direction of growth. FP is the frame pointer and SP is the stack pointer. The parameter 5 is accessed relative to SP.

ARM procedure linkage
APCS (ARM Procedure Call Standard):
    r0-r3 pass parameters into procedure. Extra parameters are put on stack frame.
    r0 holds return value.
    r4-r7 hold register variables.
    r11 is frame pointer, r13 is stack pointer.
    r10 holds limiting address on stack size to check for stack overflows.

Data structures
Different types of data structures use different data layouts.
Some offsets into data structure can be computed at compile time, others must be computed at run time.

One-dimensional arrays
C array name points to 0th element:
    a -> a[0]
         a[1] = *(a + 1)
         a[2]

Two-dimensional arrays
Row-major layout (N rows of M elements each):
    a[0,0]
    a[0,1]
    ...
    a[1,0]
    a[1,1]
    ...
Element a[i,j] is stored at a[i*M+j].
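The offset computation for a two-dimensional array can be sketched in C as follows. This is a minimal illustration, assuming a row-major array with M elements per row; `flat_index` and `row_major_matches` are hypothetical helper names, not part of the original overheads.

```c
#include <stddef.h>

/* Hypothetical helper: offset (in elements) of a[i][j] in a row-major
   array whose rows each hold m elements. */
size_t flat_index(size_t i, size_t j, size_t m)
{
    return i * m + j;
}

/* Check that the compiler's own a[i][j] addressing agrees with the
   formula for a small 3 x 4 array. Returns 1 on agreement, 0 otherwise. */
int row_major_matches(void)
{
    enum { N = 3, M = 4 };
    int a[N][M];
    const char *base = (const char *)a;

    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            if ((const char *)&a[i][j] !=
                base + flat_index(i, j, M) * sizeof(int))
                return 0;
    return 1;
}
```

When i and j are constants the compiler can fold `i*M + j` at compile time; when they are loop variables the multiply and add happen at run time.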

Structures
Fields within structures are static offsets:
    struct mystruct { int field1; char field2; };
    struct mystruct a, *aptr = &a;
Layout: aptr points to field1, which occupies 4 bytes; field2 follows at byte offset 4, i.e. *((char *)aptr + 4).
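A static field offset can be made visible with the standard `offsetof` macro. The sketch below assumes the two-field structure from this slide; `read_field2` is a hypothetical name used only for illustration.

```c
#include <stddef.h>

struct mystruct {
    int  field1;
    char field2;
};

/* Read field2 through a byte offset from the structure's base address,
   which is the address arithmetic behind aptr->field2. */
char read_field2(const struct mystruct *aptr)
{
    const char *base = (const char *)aptr;
    return *(base + offsetof(struct mystruct, field2));
}
```

The offset is a compile-time constant, so no run-time computation is needed beyond adding it to the base pointer. Its value is 4 only on targets where `int` is 4 bytes wide.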

Expression simplification
Constant folding:
    8+1 = 9
Algebraic:
    a*b + a*c = a*(b+c)
Strength reduction:
    a*2 = a<<1
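The three rewrites can be checked side by side in C. This is only a sketch: the function names are made up for the comparison, and unsigned arithmetic is assumed so that the shift and any wraparound are well defined.

```c
/* Each pair computes the same value; the second form is what an
   optimizer might emit in place of the first. */

/* Constant folding */
unsigned fold_before(void) { return 8 + 1; }
unsigned fold_after(void)  { return 9; }

/* Algebraic simplification: two multiplies become one */
unsigned algebraic_before(unsigned a, unsigned b, unsigned c) { return a*b + a*c; }
unsigned algebraic_after(unsigned a, unsigned b, unsigned c)  { return a*(b + c); }

/* Strength reduction: multiply by 2 becomes a shift */
unsigned strength_before(unsigned a) { return a * 2; }
unsigned strength_after(unsigned a)  { return a << 1; }
```

A compiler applies these only when it can prove the results are identical; for signed or floating-point operands the algebraic form in particular may not be a safe replacement.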
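
Dead code elimination
Dead code:
    #define DEBUG 0
    if (DEBUG) dbg(p1);
Can be eliminated by analysis of control flow, constant folding.
CDFG: the test on the constant 0 has a 1 edge leading to dbg(p1); and a 0 edge that bypasses it; only the 0 edge can be taken.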

Procedure inlining
Eliminates procedure linkage overhead:
    int foo(int a, int b, int c) { return a + b - c; }
    z = foo(w,x,y);
becomes
    z = w + x - y;

Loop transformations
Goals:
    reduce loop overhead;
    increase opportunities for pipelining and parallelism;
    improve memory system performance.

Loop unrolling
Reduces loop overhead, enables some other optimizations.
    for (i=0; i<4; i++)
        a[i] = b[i] * c[i];
becomes
    for (i=0; i<2; i++) {
        a[i*2] = b[i*2] * c[i*2];
        a[i*2+1] = b[i*2+1] * c[i*2+1];
    }
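For a trip count that is not known to be even, an unrolled loop needs a clean-up step. The following sketch generalizes the example to an arbitrary n; the function names are hypothetical and the unroll factor of 2 is simply carried over from the slide.

```c
/* Original loop: one multiply and one loop test per element. */
void mul_rolled(int *a, const int *b, const int *c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by 2: half as many loop tests and branches.
   The trailing statement handles the last element when n is odd. */
void mul_unrolled2(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
    }
    if (i < n)
        a[i] = b[i] * c[i];
}
```

The unrolled body is larger, so the saving in loop overhead is paid for in code size.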

Loop fusion and distribution
Fusion combines two loops into 1:
    for (i=0; i<N; i++) a[i] = b[i] * 5;
    for (j=0; j<N; j++) w[j] = c[j] * d[j];
becomes
    for (i=0; i<N; i++) {
        a[i] = b[i] * 5; w[i] = c[i] * d[i];
    }
Distribution breaks one loop into two.
Changes optimizations within loop body.
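The fusion example can be written as two C functions and compared. This is a sketch with hypothetical function names; it assumes the output arrays a and w do not overlap any of the inputs, which is what makes the fusion legal here.

```c
/* Two separate loops over the same index range. */
void separate_loops(int *a, const int *b, int *w,
                    const int *c, const int *d, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * 5;
    for (int j = 0; j < n; j++)
        w[j] = c[j] * d[j];
}

/* Fused form: one loop test per pair of statements. */
void fused_loop(int *a, const int *b, int *w,
                const int *c, const int *d, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}
```

Distribution is the same rewrite read in the opposite direction, from `fused_loop` to `separate_loops`.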

Array/cache problems
Prefetching.
Cache conflicts within a single array.
Cache conflicts between arrays.

1-D array and cache
Layout of 1-D array in cache:
    Line 0: A[0]  A[1]  A[2]  A[3]
    Line 1: A[4]  A[5]  A[6]  A[7]
    Line 2: A[8]  A[9]  A[10] A[11]
    Line 3: A[12] A[13] A[14] A[15]
Access pattern: X += A[I]

Two 1-D arrays and cache
Arrays don’t immediately conflict:
    Line 0: A[0]  A[1]  A[2]  A[3]     B[0]  B[1]  B[2]  B[3]
    Line 1: A[4]  A[5]  A[6]  A[7]     B[4]  B[5]  B[6]  B[7]
    Line 2: A[8]  A[9]  A[10] A[11]    B[8]  B[9]  B[10] B[11]
    Line 3: A[12] A[13] A[14] A[15]    B[12] B[13] B[14] B[15]
Access pattern: X += A[I] * B[I]

Conflicting 1-D arrays
Arrays immediately conflict in cache:
    Line 0: A[0]  A[1]  A[2]  A[3]     (B[0] B[1] B[2] B[3] map to this same line)
    Line 1: A[4]  A[5]  A[6]  A[7]
    Line 2: A[8]  A[9]  A[10] A[11]
    Line 3: A[12] A[13] A[14] A[15]
Access pattern: X += A[I] * B[I]
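Whether two arrays collide can be worked out from their starting addresses. The sketch below assumes a direct-mapped cache with four lines of four words each, matching the pictures; the constants, the word-addressed model, and the function names are all illustrative assumptions rather than a description of a real CPU.

```c
#include <stddef.h>

/* Assumed cache geometry: 4 lines of 4 words, direct-mapped. */
enum { WORDS_PER_LINE = 4, NUM_LINES = 4 };

/* Cache line that holds the word at word address addr. */
size_t cache_line(size_t addr)
{
    return (addr / WORDS_PER_LINE) % NUM_LINES;
}

/* 1 if A[i] and B[i] map to the same cache line, given the word
   addresses at which the two arrays start; 0 otherwise. */
int conflicts(size_t a_base, size_t b_base, size_t i)
{
    return cache_line(a_base + i) == cache_line(b_base + i);
}
```

With A at word 0 and B at word 16 (exactly one cache size apart) every pair A[i], B[i] conflicts, so each access in `X += A[I] * B[I]` evicts the line the other operand needs. Starting B at word 8 instead separates the two by two lines.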

2-D array and cache
Cache behavior depends on row/column access pattern:
    Line 0: A[0,0] A[0,1] A[0,2] A[0,3]
    Line 1: A[0,4] A[0,5] A[0,6] A[0,7]
    Line 2: A[1,0] A[1,1] A[1,2] A[1,3]
    Line 3: A[1,4] A[1,5] A[1,6] A[1,7]

Array/cache solutions
Move origin of the array in memory.
Pad the array.
Change the loop to access array elements in a different order.

Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
    Changes cache behavior.

Loop tiling example
    for (i=0; i<N; i++)
        for (j=0; j<N; j++)
            c[i] = a[i][j]*b[i];
becomes
    for (i=0; i<N; i+=2)
        for (j=0; j<N; j+=2)
            for (ii=i; ii<min(i+2,N); ii++)
                for (jj=j; jj<min(j+2,N); jj++)
                    c[ii] = a[ii][jj]*b[ii];
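A compilable version of the tiling example is sketched below. It assumes that the inner loops start at the tile origin (ii = i, jj = j) and that the bounds are clipped to N; the array size of 5, the `TILE` constant, and the helper `min_int` are assumptions added for the illustration.

```c
enum { N = 5, TILE = 2 };   /* N deliberately not a multiple of TILE */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Original loop nest: walks each row of a from end to end. */
void untiled(int c[N], int a[N][N], const int b[N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i] = a[i][j] * b[i];
}

/* Tiled loop nest: visits a in TILE x TILE blocks.
   The min_int bounds keep the last, partial tile inside the array. */
void tiled(int c[N], int a[N][N], const int b[N])
{
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            for (int ii = i; ii < min_int(i + TILE, N); ii++)
                for (int jj = j; jj < min_int(j + TILE, N); jj++)
                    c[ii] = a[ii][jj] * b[ii];
}
```

Both versions touch the same elements and leave the same values in c; only the order of the accesses, and therefore the cache behavior, differs.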

Array padding
Add array elements to change mapping into cache:
    before:
        a[0,0] a[0,1] a[0,2]
        a[1,0] a[1,1] a[1,2]
    after:
        a[0,0] a[0,1] a[0,2] a[0,3]
        a[1,0] a[1,1] a[1,2] a[1,3]
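The effect of padding on the cache mapping can be computed directly. This sketch assumes a direct-mapped cache with four-word lines and an array that starts at word address 0; the constants and the function name `line_of` are hypothetical.

```c
#include <stddef.h>

/* Assumed cache geometry: 4 lines of 4 words, direct-mapped. */
enum { LINE_WORDS = 4, CACHE_LINES = 4 };

/* Cache line holding element (i, j) of a row-major array that starts
   at word 0 and whose rows are row_stride words apart. */
size_t line_of(size_t i, size_t j, size_t row_stride)
{
    size_t word = i * row_stride + j;
    return (word / LINE_WORDS) % CACHE_LINES;
}
```

With three elements per row, `line_of(1, 0, 3)` is line 0, so the start of row 1 shares a line with row 0. Padding each row to four elements gives `line_of(1, 0, 4)` equal to line 1, so every row begins on its own line at the cost of one unused word per row.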

Register allocation
Goals:
    choose register to hold each variable;
    determine lifespan of variable in the register.
Basic case: within basic block.

Register lifetime graph
    w = a + b;    (t=1)
    x = c + w;    (t=2)
    y = c + d;    (t=3)
Figure: lifetimes of a, b, c, d, w, x, y plotted against time steps 1, 2, 3.
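The number of registers a basic block needs can be estimated from the lifetimes. The sketch below rests on a simplifying assumption that is not stated on the slide: a variable occupies a register from the first statement that references it through the last, and operands and result of one statement are all live together. The tables and the function name `max_live` are constructed for this illustration.

```c
enum { NUM_VARS = 7, NUM_STEPS = 3 };

/* First and last time step at which each variable is referenced,
   in the order a, b, c, d, w, x, y, read off the three statements. */
static const int first_use[NUM_VARS] = { 1, 1, 2, 3, 1, 2, 3 };
static const int last_use[NUM_VARS]  = { 1, 1, 3, 3, 2, 2, 3 };

/* Largest number of variables live in any single time step; under the
   stated assumption this is the number of registers required. */
int max_live(void)
{
    int best = 0;
    for (int t = 1; t <= NUM_STEPS; t++) {
        int live = 0;
        for (int v = 0; v < NUM_VARS; v++)
            if (first_use[v] <= t && t <= last_use[v])
                live++;
        if (live > best)
            best = live;
    }
    return best;
}
```

Under this model no time step has more than three live variables, so three registers are enough for the seven variables.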

Instruction scheduling
Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast.
In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands.

Reservation table
A reservation table relates instructions/time to CPU resources.
    Time/instr A B
    instr1 X
    instr2 X X
    instr3 X
    instr4 X

Software pipelining
Schedules instructions across loop iterations.
Reduces instruction latency in iteration i by inserting instructions from iteration i+1.

Software pipelining in SHARC
Example:
    for (i=0; i<N; i++)
        sum += a[i]*b[i];
Combine three iterations:
    Fetch array elements a, b for iteration i.
    Multiply a,b for iteration i-1.
    Compute dot product for iteration i-2.

Software pipelining in SHARC, cont’d
    /* first iteration's fetch and multiply performed outside loop */
    ai=a[0]; bi=b[0]; p=ai*bi;
    /* initiate loads used in second iteration; remaining loads
       will be performed inside the loop */
    ai=a[1]; bi=b[1];
    for (i=2; i<N; i++) {
        sum += p;            /* make sum using p from last iteration */
        p = ai*bi;           /* multiply for next iteration's sum */
        ai=a[i]; bi=b[i];    /* fetch for next cycle's multiply */
    }
    sum += p; p=ai*bi; sum +=p;
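The pipelined loop can be checked against the plain dot product in ordinary C. This is a sequential sketch, not SHARC code: the three statements in the loop body are written in an order that lets each one use the value left by the previous iteration, which is what the parallel hardware operations rely on. The function names are hypothetical, and the pipelined version assumes n is at least 2.

```c
/* Straightforward dot product, for reference. */
int dot_plain(const int *a, const int *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Software-pipelined dot product; assumes n >= 2. */
int dot_pipelined(const int *a, const int *b, int n)
{
    int sum = 0;
    int ai = a[0], bi = b[0];   /* prologue: fetch for iteration 0 */
    int p = ai * bi;            /* prologue: multiply for iteration 0 */
    ai = a[1]; bi = b[1];       /* prologue: fetch for iteration 1 */

    for (int i = 2; i < n; i++) {
        sum += p;               /* add for iteration i-2 */
        p = ai * bi;            /* multiply for iteration i-1 */
        ai = a[i]; bi = b[i];   /* fetch for iteration i */
    }

    sum += p;                   /* epilogue: drain the last two iterations */
    p = ai * bi;
    sum += p;
    return sum;
}
```

The prologue and epilogue are the price of the technique: they fill and drain the pipeline so that the steady-state loop body can overlap a fetch, a multiply, and an add from three different iterations.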

Software pipelining timing
Time runs down the page; each column is one iteration moving through the pipe.
    iteration i-2        iteration i-1        iteration i
    ai=a[i]; bi=b[i];
    p = ai*bi;           ai=a[i]; bi=b[i];
    sum += p;            p = ai*bi;           ai=a[i]; bi=b[i];
                         sum += p;            p = ai*bi;
                                              sum += p;

Instruction selection
May be several ways to implement an operation or sequence of operations.
Represent operations as graphs, match possible instruction sequences onto graph.
Figure: an expression graph (a * node feeding a + node) and three templates: MUL (a single *), ADD (a single +), and MADD (a * feeding a +).

Using your compiler
Understand various optimization levels (-O1, -O2, etc.)
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
    correctness;
    loss of hand-tweaked code.

Interpreters and JIT compilers
Interpreter: translates and executes program statements on-the-fly.
JIT compiler: compiles small sections of code into instructions during program execution.
    Eliminates some translation overhead.
    Often requires more memory.