Programmers Talk to Compilers
A Language to Describe the Optimization Search Space
Albert Cohen ALCHEMY Group, INRIA Futurs, France
IBM Workshop on Compilation and Architecture
Scalable On-Chip Parallel Computing
Context
Massive parallelism on a chip
Physically distributed, layered and heterogeneous resources
The structure and nature of the hardware are exposed to the software and need to be considered for correctness and/or performance
General-purpose applications need choice for scalable performance
Towards adaptive programs (multi-version, continuous optimization)
SW/HW negotiation, from load balancing to algorithm selection
Impact on
Programming models
Optimizing compilers
Component models
Run-time systems
THE X-LANGUAGE
A TOOL FOR EXPERT PROGRAMMERS TO DRIVE
PROGRAM OPTIMIZATION WHILE MAINTAINING
HIGH PRODUCTIVITY AND PORTABILITY
Goals of the X-Language
1. Compact representation of multiple program versions
→ Derive multiple or multi-version programs from a single source
→ Generate code at run-time if necessary
2. Explicit multiple optimization strategies
→ Rely on predefined transformation primitives
→ Declare high-level optimization goals rather than explicit transformations
3. Implement and apply custom optimizations
→ Custom transformations can be implemented by expert programmers
→ Derive decision trees automatically from abstract descriptions
4. Bring together individual transformations and actual performance measurements
→ Implement local/layered learning/search strategies
→ Couple with hardware counters, sampling mechanisms and phase detection
Key Design Ideas
Build on top of advanced multistage programming technology
Manipulate code expressions
    code c = ‘{ bar(42); ‘}
Splice code into code
    ‘{ foo(‘%(c)); ‘} // foo(bar(42));
Generate and run code
    run(c);
Cross-stage persistence
    int x = 42; code c = ‘{ foo(bar(x)); ‘}
Provide some form of reflection that does not alter observable semantics
(Assuming transformation legality)
Use code annotations: #pragma xlang
    #pragma xlang transformation [ scope_name ] node_name_regexp [ parameters ] [ additional_names ]
Example: #pragma xlang unroll loop1 4
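To make the annotation syntax concrete, here is a minimal sketch of how such a line could be split into a transformation name and its arguments. The `struct xlang_pragma` type, the `xlang_parse` function and the size limits are hypothetical names chosen for illustration, not part of the X-Language. The sketch does not try to tell the optional scope name apart from the node name: it only collects the tokens in order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define XLANG_MAX_ARGS 8    /* assumed limit, for illustration only */
#define XLANG_MAX_LEN  64   /* assumed limit, for illustration only */

/* Hypothetical in-memory form of one annotation. */
struct xlang_pragma {
    char transformation[XLANG_MAX_LEN];
    char args[XLANG_MAX_ARGS][XLANG_MAX_LEN];
    int  nargs;
};

/* Parse "#pragma xlang <transformation> <arg>..." into *out.
   Returns 1 on success, 0 if the line is not an xlang annotation. */
int xlang_parse(const char *line, struct xlang_pragma *out)
{
    char word1[XLANG_MAX_LEN], word2[XLANG_MAX_LEN];
    int consumed = 0;

    assert(line != NULL && out != NULL);
    out->nargs = 0;
    out->transformation[0] = '\0';

    /* The first three tokens must be "#pragma", "xlang" and a name. */
    if (sscanf(line, " %63s %63s %63s%n",
               word1, word2, out->transformation, &consumed) != 3)
        return 0;
    if (strcmp(word1, "#pragma") != 0 || strcmp(word2, "xlang") != 0)
        return 0;

    /* Remaining whitespace-separated tokens are kept as raw arguments. */
    line += consumed;
    while (out->nargs < XLANG_MAX_ARGS) {
        int n = 0;
        if (sscanf(line, " %63s%n", out->args[out->nargs], &n) != 1)
            break;
        out->nargs++;
        line += n;
    }
    return 1;
}
```

For the example annotation, `xlang_parse` would report the transformation `unroll` with the two arguments `loop1` and `4`.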
Features of the Language
Transformation primitives
Loop transformations
→ unrolling, strip-mining, distribution, fusion, coalescing, interchange, skewing, reindexing, hoisting, shifting, scalar promotion, privatization
Interprocedural transformations
→ inlining, cloning, partial evaluation, slicing
Compound transformations
Composition of code generators (multi-stage evaluation with splicing)
Sequence of annotation pragmas
Procedural abstraction (build custom transformations from primitives)
Control the application and parameters of each transformation
Static analyses (crude scalar data-flow information right now)
Dynamic analyses (only time measurement right now)
Example: Transformation Sequences
Each transformation regenerates annotations for the next transformation
#pragma xlang name loop1
for (i=m; i<n; i++)
    a[i] = b[i];
#pragma xlang stripmine loop1 4 loop1_2 loop1_3
#pragma xlang unroll loop1_2
−→
#pragma xlang name loop1
for (ii=m; ii+4<n; ii+=4) {
#pragma xlang name loop1_2
    for (i=ii; i<ii+4; i++)
        a[i] = b[i];
}
#pragma xlang name loop1_3
for (i=ii; i<n; i++)
    a[i] = b[i];
#pragma xlang unroll loop1_2
−→
#pragma xlang name loop1
for (ii=m; ii+4<n; ii+=4) {
#pragma xlang name loop1_2
    i = ii;   a[i] = b[i];
    i = ii+1; a[i] = b[i];
    i = ii+2; a[i] = b[i];
    i = ii+3; a[i] = b[i];
}
#pragma xlang name loop1_3
for (i=ii; i<n; i++)
    a[i] = b[i];
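The strip-mine and unroll sequence should leave the loop's result unchanged. The following sketch checks this on a small array, assuming `int` elements and a fixed test size. The function names are made up for this illustration, and the pragmas are dropped because a standard C compiler would ignore them anyway.

```c
#include <assert.h>

/* Original loop (loop1): copy b[m..n-1] into a[m..n-1]. */
void copy_plain(int *a, const int *b, int m, int n)
{
    for (int i = m; i < n; i++)
        a[i] = b[i];
}

/* Strip-mined by 4 with the inner loop (loop1_2) fully unrolled,
   followed by the remainder loop (loop1_3), using the slide's bounds. */
void copy_stripmined_unrolled(int *a, const int *b, int m, int n)
{
    int ii;
    for (ii = m; ii + 4 < n; ii += 4) {
        a[ii]     = b[ii];
        a[ii + 1] = b[ii + 1];
        a[ii + 2] = b[ii + 2];
        a[ii + 3] = b[ii + 3];
    }
    for (int i = ii; i < n; i++)
        a[i] = b[i];
}

/* Returns 1 if both versions write the same values for the range [m, n). */
int copies_agree(int m, int n)
{
    enum { N = 64 };
    int a1[N] = {0}, a2[N] = {0}, b[N];

    assert(0 <= m && n <= N);
    for (int i = 0; i < N; i++)
        b[i] = 3 * i + 1;
    copy_plain(a1, b, m, n);
    copy_stripmined_unrolled(a2, b, m, n);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

With the bound `ii+4<n`, a final full block of four iterations is handled by the remainder loop rather than the unrolled body, which is still correct.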
Example: Evaluating Multiple Versions
for (u=1; u<8; u++) {
    code c = ‘{
        #pragma xlang name loop1
        for (i=m; i<n; i++)
            a[i] = b[i];
        #pragma xlang stripmine loop1 u loop1_2 loop1_3
    ‘}
    run(c, &elapsed_time);
    // drive search/learning strategy from this evaluation
}
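The search loop above can be approximated in plain C. This sketch differs from the slide in one important way: it does not generate code at run time, but passes the strip-mining factor `u` as an ordinary parameter. It times each version with the standard `clock()` function and keeps the fastest. All function names are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Copy b[0..n-1] into a[0..n-1] with a strip-mined loop of factor u (u >= 1). */
void copy_stripmined(int *a, const int *b, int n, int u)
{
    int ii;
    assert(u >= 1);
    for (ii = 0; ii + u <= n; ii += u)
        for (int i = ii; i < ii + u; i++)
            a[i] = b[i];
    for (int i = ii; i < n; i++)   /* remainder iterations */
        a[i] = b[i];
}

/* Time `reps` runs of one version; returns elapsed CPU time in seconds. */
double time_version(int *a, const int *b, int n, int u, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy_stripmined(a, b, n, u);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Exhaustive search over u = 1..7, the same range as the slide's loop. */
int best_factor(int *a, const int *b, int n, int reps)
{
    int best_u = 1;
    double best_time = time_version(a, b, n, 1, reps);
    for (int u = 2; u < 8; u++) {
        double t = time_version(a, b, n, u, reps);
        if (t < best_time) {
            best_time = t;
            best_u = u;
        }
    }
    return best_u;
}
```

An exhaustive scan stands in for the search or learning strategy here. The measured times are noisy on a real machine, so the returned factor can vary between runs.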
Implementation
Multistage evaluation
Currently: syntactic hacks with strings and printf
Plan: MetaOCaml with offshoring (implicitly heterogeneous multistage program generation)
Transformation primitives
Currently: syntactic hacks with patterns and rewrite rules
Plan: Stratego
Compound transformations
Currently: limited by naming scheme and programmer’s understanding
Plan: go for goal-driven inference of transformation strategies
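The "strings and printf" approach to multistage evaluation can be pictured as plain text substitution. The sketch below splices one code fragment into a template at a hole marker. The marker `@@` and the `splice` function are assumptions made for this illustration and are not the X-Language's own syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOLE "@@"   /* hypothetical hole marker, not X-Language syntax */

/* Copy `tmpl` into `out`, replacing its first HOLE with `fragment`.
   Returns 1 on success, 0 if there is no hole or `out` is too small. */
int splice(char *out, size_t out_size, const char *tmpl, const char *fragment)
{
    const char *hole;
    int written;

    assert(out != NULL && out_size > 0);
    hole = strstr(tmpl, HOLE);
    if (hole == NULL)
        return 0;

    /* Text before the hole, then the fragment, then the text after it. */
    written = snprintf(out, out_size, "%.*s%s%s",
                       (int)(hole - tmpl), tmpl, fragment,
                       hole + strlen(HOLE));
    return written >= 0 && (size_t)written < out_size;
}
```

Splicing `bar(42)` into `foo(@@);` yields the text `foo(bar(42));`. Nothing in this scheme checks that the result is well-formed or well-typed code, which is the kind of guarantee a typed multistage language is meant to add.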
Full Example: Matrix Product in ATLAS
#pragma xlang name iloop
for (i=0; i<NB; i++)
#pragma xlang name jloop
    for (j=0; j<NB; j++)
#pragma xlang name kloop
        for (k=0; k<NB; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
// Simplified transformation sequence for IA64
// (excluding search engine, pipelining, prefetch and page copying)
#pragma xlang transform stripmine iloop NU NUloop
#pragma xlang transform stripmine jloop MU MUloop
#pragma xlang transform interchange kloop MUloop
#pragma xlang transform interchange jloop NUloop
#pragma xlang transform interchange kloop NUloop
#pragma xlang transform fullunroll NUloop
#pragma xlang transform fullunroll MUloop
#pragma xlang transform scalarize_in b in kloop
#pragma xlang transform scalarize_in a in kloop
#pragma xlang transform scalarize_in&out c in kloop
#pragma xlang transform hoist kloop.loads before kloop
#pragma xlang transform hoist kloop.stores after kloop
Full Example: Matrix Product in ATLAS
#pragma xlang name iloop
for (i=0; i<NB; i++) {
#pragma xlang name jloop
    for (j=0; j<NB; j+=4) {
#pragma xlang name kloop.loads
        { c_0_0 = c[i+0][j+0]; c_0_1 = c[i+0][j+1];
          c_0_2 = c[i+0][j+2]; c_0_3 = c[i+0][j+3]; }
#pragma xlang name kloop
        for (k=0; k<NB; k++) {
            { a_0 = a[i+0][k]; a_1 = a[i+0][k];
              a_2 = a[i+0][k]; a_3 = a[i+0][k]; }
            { b_0 = b[k][j+0]; b_1 = b[k][j+1];
              b_2 = b[k][j+2]; b_3 = b[k][j+3]; }
            { c_0_0=c_0_0+a_0*b_0; c_0_1=c_0_1+a_1*b_1;
              c_0_2=c_0_2+a_2*b_2; c_0_3=c_0_3+a_3*b_3; }
            // ...
        }
#pragma xlang name kloop.stores
        { c[i+0][j+0] = c_0_0; c[i+0][j+1] = c_0_1;
          c[i+0][j+2] = c_0_2; c[i+0][j+3] = c_0_3; }
    }
}
// Remainder code
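As a sanity check on the transformed kernel, the sketch below compares a reference triple loop with a version in which the j loop is strip-mined by 4 and fully unrolled, `c` is scalarized, and the loads and stores are hoisted out of the k loop. It assumes a small block size `NB` that is a multiple of 4, so no remainder code is needed, and it leaves out the i-loop blocking and everything elided in the slide. The function names are illustrative.

```c
#include <assert.h>

#define NB 8   /* assumed block size; must be a multiple of 4 in this sketch */

/* Reference triple loop (iloop / jloop / kloop). */
void matmul_ref(double c[NB][NB], double a[NB][NB], double b[NB][NB])
{
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

/* j strip-mined by 4 and unrolled; c kept in scalars across the k loop. */
void matmul_blocked(double c[NB][NB], double a[NB][NB], double b[NB][NB])
{
    assert(NB % 4 == 0);
    for (int i = 0; i < NB; i++) {
        for (int j = 0; j < NB; j += 4) {
            /* kloop.loads, hoisted before the k loop */
            double c0 = c[i][j],     c1 = c[i][j + 1];
            double c2 = c[i][j + 2], c3 = c[i][j + 3];
            for (int k = 0; k < NB; k++) {
                double a0 = a[i][k];
                c0 = c0 + a0 * b[k][j];
                c1 = c1 + a0 * b[k][j + 1];
                c2 = c2 + a0 * b[k][j + 2];
                c3 = c3 + a0 * b[k][j + 3];
            }
            /* kloop.stores, hoisted after the k loop */
            c[i][j]     = c0; c[i][j + 1] = c1;
            c[i][j + 2] = c2; c[i][j + 3] = c3;
        }
    }
}

/* Returns 1 if both versions produce the same result on integer-valued data. */
int matmul_versions_agree(void)
{
    double a[NB][NB], b[NB][NB], c1[NB][NB], c2[NB][NB];
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
            c1[i][j] = c2[i][j] = i * j;
        }
    matmul_ref(c1, a, b);
    matmul_blocked(c2, a, b);
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

The test data is integer-valued so that every sum is exact and the two results can be compared with `==`.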
Preliminary Results
Main Limitations
1. Hard to understand and keep track of transformation effects
→ Build and manage long sequences of transformations
→ Convince the expert programmer that it saves him time
2. Define custom transformations, beyond combination of existing primitive ones
→ General kind of program construction
→ Algorithm selection
OVERCOMING THESE LIMITATIONS
COMPLEX TRANSFORMATION SEQUENCES
Managing Complex Transformation Sequences
SpecFP 2000 on Alpha 21264C (EV68)
Speed-ups w.r.t. HP F90 and C compiler with KAP preprocessor
Base SPEC: -arch ev6 -fast -O5
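| Benchmark | Peak SPEC | Manual Optimizations |
|---|---|---|
| SWIM | 1.00 | 1.61 |
| GALGEL | 1.04 | 1.39 |
| WUPWISE | 1.20 | 2.90 |
| APPLU | 1.47 | 2.18 |
| APSI | 1.07 | 1.23 |
| FACEREC | 1.04 | 1.42 |
| AMMP | 1.18 | 1.40 |
| EQUAKE | 2.65 | 3.22 |
| MESA | 1.12 | 1.17 |
| MGRID | 1.59 | 1.45 |
| FMA3D | 1.32 | 1.09 |
| ART | 1.22 | 1.07 |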
Managing Complex Transformation Sequences
[Figure: graphs of the manual transformation sequences applied to SWIM (base 204s), GALGEL (base 171s), APPLU (base 214s) and APSI (base 378s). Each path is annotated with its effect on execution time in seconds (from -32s to +24s). Transformations shown: shifting, fusion, fission, tiling, padding, peeling, strip-mining, interchange, hoisting, scalar promotion, register promotion, privatization, unroll-and-jam, full unrolling, array forward substitution, array copy propagation, instruction splitting, floating-point reordering, software pipelining, data layout.]
Polyhedral Representation for Complex Sequences
Program and transformations represented as matrices
Each transformation amounts to minor coefficient changes w.r.t. the previous state
No code duplication before the final code generation step
[Figure: tool chain within ORC. Labels: source code, WHIRL, PreOpt, LNO, WOPT, RVI2, CG, binary; w2p and WLooG (conversion) between WHIRL and WRaP; URUK (transformations) producing the transformed WRaP; CLooG; PolyLib (polyhedra management library); PipLib (parametric integer linear programming solver).]
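The idea that a transformation is only a coefficient change can be illustrated on a depth-2 loop nest. In the sketch below a schedule is a 2x2 integer matrix mapping the iteration vector (i, j) to an execution date (t0, t1). Interchange and skewing each multiply it by a small matrix, and no loop code is touched until code generation. This is a deliberately reduced model with hypothetical names. It is not the representation or API of the tools named on the slide, which also track iteration domains and array accesses.

```c
#include <assert.h>

/* Schedule of a depth-2 loop nest: (t0, t1) = m * (i, j). */
struct sched2 {
    int m[2][2];
};

/* The original execution order: t0 = i, t1 = j. */
struct sched2 sched_identity(void)
{
    struct sched2 s = {{{1, 0}, {0, 1}}};
    return s;
}

/* Apply transformation t after schedule s: result = t.m * s.m. */
struct sched2 sched_compose(struct sched2 t, struct sched2 s)
{
    struct sched2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = t.m[i][0] * s.m[0][j] + t.m[i][1] * s.m[1][j];
    return r;
}

/* Loop interchange: swap the two time dimensions. */
struct sched2 sched_interchange(struct sched2 s)
{
    struct sched2 p = {{{0, 1}, {1, 0}}};
    return sched_compose(p, s);
}

/* Loop skewing: the inner date becomes t1 + f * t0. */
struct sched2 sched_skew(struct sched2 s, int f)
{
    struct sched2 k = {{{1, 0}, {f, 1}}};
    return sched_compose(k, s);
}

/* Execution date of iteration (i, j) under schedule s. */
void sched_apply(struct sched2 s, int i, int j, int t[2])
{
    assert(t != 0);
    t[0] = s.m[0][0] * i + s.m[0][1] * j;
    t[1] = s.m[1][0] * i + s.m[1][1] * j;
}
```

A sequence such as skewing followed by interchange composes into a single matrix, so a long sequence costs a few integer operations per step instead of one rewritten loop nest per step.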
Polyhedral Representation for Complex Sequences
Example: optimizing SWIM2000 for IA32
polydeps_save(BLOOP)
# Move and shift calc3 backwards
shift(enclose(C3L1),{'1','0','0'})
(...)
shift(C3L17,{'1'})
motion(enclose(C3L1),BLOOP)
(...)
motion(C3L17,BLOOP)
# Peel and shift to enable fusion
peel(enclose(C3L1,2),'3')
peel(enclose(C3L1_2,2),'N-3')
(...)
shift(enclose(C1L1_2_1_2_1),{'0','1','1'})
shift(enclose(C2L1_2_1_2_1),{'0','2','2'})
# Double fusion of the three nests
motion(enclose(C2L1_2_1_2_1),TARGET_2_1_2_1)
motion(enclose(C1L1_2_1_2_1),C2L1_2_1_2_1)
motion(enclose(C3L1_2_1_2_1),C1L1_2_1_2_1)
# Register blocking and unrolling (factor 2)
time_stripmine(enclose(C3L1_2_1_2_1,2),2,2)
time_stripmine(enclose(C3L1_2_1_2_1,1),4,2)
interchange(enclose(C3L1_2_1_2_1,2))
fullunroll(enclose(C3L1_2_1_2_1_2_1_2_1,2))
fullunroll(enclose(C3L1_2_1_2_1_2_1_2_1,1))
polydeps_check(BLOOP)
Managing Complex Transformation Sequences
Example: optimizing SWIM 2000
38% speed-up on AMD Athlon 64 3400+ at 2.2GHz
(Three main procedures inlined by Open64)
Automatic Transformation
Main SCoP
421 lines of code in the source program
112 statements and 215 array references in the polyhedral representation
Exact dependence analysis
Nesting depth: 3 original, 5 after tiling
441 matrices for all pairs of references at all depths
12s analysis time
Code generation
2267 lines of code (≈ 1500 after scalar optimization)
URGenT (improvement on CLooG, regeneration of Open64’s WHIRL): 4s
Back-end compilation: 22s
OVERCOMING THESE LIMITATIONS
PROGRAM CONSTRUCTION
Program Construction
Example: sorting network for even-odd merge sort of size 16
a_2 = t[0] < t[8] ? t[0] : t[8];
b_3 = t[0] < t[8] ? t[8] : t[0];
a_4 = t[4] < t[12] ? t[4] : t[12];
b_5 = t[4] < t[12] ? t[12] : t[4];
a_6 = a_2 < a_4 ? a_2 : a_4;
b_7 = a_2 < a_4 ? a_4 : a_2;
a_8 = b_3 < b_5 ? b_3 : b_5;
b_9 = b_3 < b_5 ? b_5 : b_3;
a_10 = b_7 < a_8 ? b_7 : a_8;
b_11 = b_7 < a_8 ? a_8 : b_7;
// ...
t[11] = b_95 < a_112 ? b_95 : a_112;
t[12] = b_95 < a_112 ? a_112 : b_95;
t[13] = b_87 < b_113 ? b_87 : b_113;
t[14] = b_87 < b_113 ? b_113 : b_87;
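Straight-line code of this kind comes from a fixed list of compare-exchange steps. The sketch below builds such a list with the standard iterative formulation of Batcher's odd-even merge sort and applies it with the same ternary min/max pattern. It is an assumption that this formulation stands in well for the generator behind the slide: it may emit the comparators in a different order than the code shown, and all names are illustrative.

```c
#include <assert.h>

#define MAX_COMPARATORS 128   /* assumed capacity, enough for 16 inputs */

struct comparator {
    int lo, hi;   /* after the step, t[lo] <= t[hi] */
};

/* Build Batcher's odd-even merge sort network for n inputs (n a power of two).
   Returns the number of comparators, or -1 if they do not fit in `net`. */
int build_network(int n, struct comparator *net, int capacity)
{
    int count = 0;
    for (int p = 1; p < n; p *= 2)
        for (int k = p; k >= 1; k /= 2)
            for (int j = k % p; j <= n - 1 - k; j += 2 * k)
                for (int i = 0; i <= k - 1 && i <= n - j - k - 1; i++)
                    if ((i + j) / (2 * p) == (i + j + k) / (2 * p)) {
                        if (count == capacity)
                            return -1;
                        net[count].lo = i + j;
                        net[count].hi = i + j + k;
                        count++;
                    }
    return count;
}

/* Apply the network: each step stores the min at lo and the max at hi. */
void apply_network(int *t, const struct comparator *net, int count)
{
    for (int c = 0; c < count; c++) {
        int x = t[net[c].lo], y = t[net[c].hi];
        t[net[c].lo] = x < y ? x : y;
        t[net[c].hi] = x < y ? y : x;
    }
}

/* Zero-one principle: a comparator network sorts every input
   if and only if it sorts every input made of 0s and 1s. */
int sorts_all_binary_inputs(int n)
{
    struct comparator net[MAX_COMPARATORS];
    int t[16];
    int count;

    assert(n >= 1 && n <= 16);
    count = build_network(n, net, MAX_COMPARATORS);
    if (count < 0)
        return 0;
    for (unsigned bits = 0; bits < (1u << n); bits++) {
        for (int i = 0; i < n; i++)
            t[i] = (int)((bits >> i) & 1u);
        apply_network(t, net, count);
        for (int i = 0; i + 1 < n; i++)
            if (t[i] > t[i + 1])
                return 0;
    }
    return 1;
}
```

Printing one pair of ternary assignments per comparator, instead of applying them in a loop, would produce unrolled code in the style of the slide.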
Program Construction
Define safe program combinators/generators: MetaOCaml + offshoring to C
Offers run-time program generation... at a high overhead cost
But is far from natural for application experts
let compare a b cont =
  let rec compare_rec a b l cont = match (a, b) with
      ([], []) -> cont l
    | ([], t) | (t, []) -> cont (l @ t)
    | (x :: c, y :: d) ->
      .<
        let a = if (.~x < .~y) then .~x else .~y in
        let b = if (.~x < .~y) then .~y else .~x
        in .~(compare_rec c d (l @ [.<a>.; .<b>.]) cont)
      >.
  in compare_rec a b [] cont
let rec merge l cont = match l with
    [] -> cont []
  | [x] -> cont [x]
  | [x; y] -> compare [x] [y] cont
  | _ ->
    merge (even l) (fun e -> merge (odd l) (fun o -> (compare (List.tl e) o (fun l -> cont ((List.hd e) :: l)))))
let sort n =
  let rec sort_rec l cont = match l with
      [] -> cont []
    | [x] -> cont [x]
    | _ ->
      sort_rec (even l) (fun e -> sort_rec (odd l) (fun o -> (merge (e @ o) cont))) in
  .< fun t -> .~(sort_rec (make_list .<t>. n) (make_array .<t>.)) >.
Conclusion: Future Optimizing Compilers
Research Directions
Compilers must do tedious things in a predictable manner... but should not try to be smart
→ Fully automatic framework for abstraction-penalty removal
→ Machine learning and rule-based system for architecture-aware optimizations
→ Let application experts tell what is important
Tightly coupled off-line and on-line optimization
→ Aggressive off-line analysis and narrowing of the optimization search-space
→ Low-overhead just-in-time/run-time transformations and code generation
Complement intermediate representations with program generators
→ Expose algebraic properties of the search space
→ Support global and complex transformation sequences
Progresses
SPIRAL
Tools for safe and efficient metaprogramming
Machine learning compilers
THANK YOU