ASSIST: A Feedback-Directed Optimization
source to source transformation tool for
HPC applications
William Jalby, Y. Lebras, Andres S. Charif-Rubial
UVSQ/ECR
11th Parallel Tools Workshop – 11/09/2017
ASSIST
Outline
1. Introduction: motivation, goals
2. ASSIST
• Requirements
• Implementation & Design
• Available Transformations
3. Examples and Experimental Results
• ASSIST PGO versus Intel PGO
• Other Transformations Applied to Real Applications
4. Conclusion
I - INTRODUCTION
Motivations
Combining source-level knowledge with static/dynamic performance analyses is very attractive for accurate performance diagnosis
Source code vs. actually executed code
Better understanding of memory-related issues (dependencies, array accesses)
Feedback-Directed Optimization (FDO) / Profile-Guided Optimization (PGO) is a well-known optimization approach used by compilers, but…
Little information on what is actually done
Limited performance information used (loop trip count, branch behavior)
Limited transformation power
Cannot be configured by the user
Goals
Basic idea: MAQAO is good at diagnosing performance problems; we need to go further and fix them.
ASSIST, an “auto-tuning” framework: for us, auto-tuning essentially means fully automated
Exploiting MAQAO’s metrics & knowledge
Detecting & exploiting information from source code
Transformation-driven framework: ideally, detect whether a transformation is beneficial or not
Full control on transformations
Help developers to maintain their code
Ensure portability
Ease code refactoring (e.g. change data types across a program)
Provide users with a means to supply extra information that cannot be encoded in the program (e.g. due to programming language limitations)
II - ASSIST
Implementation & Design
Optimization Process
Requirements
Compiler infrastructure requirements
Ability to manipulate the Abstract Syntax Tree (AST)
Ability to perform source-to-source transformations
Support for Fortran, C and C++
The ROSE compiler
Meets all these criteria
Robust for these languages
No equivalent when we started
Implementation & Design
ASSIST: Automatic Source-to-Source assISTant
Supports the following input languages
• Fortran 77, 90, 95, 2003 / C / C++03
Readable output
• Special effort on indentation and spaces
Easy to use with a simple user interface
• Annotations
• Configuration file
Target audience
• Users able to modify or annotate the code
• Application developers
Integrated as a MAQAO module
• Take advantage of the interconnection between the core (binary manipulation and analysis layers) and the modules
• Use the modules’ output to perform transformation(s)
• Extend MAQAO to source code manipulation
Available Transformations
Three types of transformations
AST modifier
• Unroll
• Full unroll
• Interchange
• Strip mining
• Tiling
• Loop/Function specialization
Directive(s) insertion
• Loop count (involving dynamic analyses)
Mix of both
• Block vectorization
User interface
Annotations – source code annotation
Configuration file – describes, line by line, which transformation is applied to which statement
Transformations
Specialization
Transformation type: AST modifier
Specializing integer parameters gives the compiler optimization opportunities:
• Constant propagation
• Partial Dead Code Elimination
• Loop unrolling, tiling, block vectorization, etc
Single values or ranges can be defined
Two distinct cases
• Loop specialization
• Function specialization
Transformations
Loop Specialization Example
• Set bounds
• Conservative: keep a generic version
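A minimal C sketch of loop specialization, assuming a hypothetical kernel whose trip count `n` is frequently 3: the specialized branch sets the loop bound to a constant, and the generic loop is conservatively kept for every other value. Function and parameter names are illustrative, not actual ASSIST output.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] += a * x[i] for i in [0, n). */
void axpy_specialized(size_t n, double a, const double *x, double *y)
{
    if (n == 3) {
        /* Specialized version: constant trip count, so the compiler
           can fully unroll or vectorize without a run-time bound. */
        for (size_t i = 0; i < 3; i++)
            y[i] += a * x[i];
    } else {
        /* Conservative fallback: the original generic loop. */
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }
}
```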
Transformations
Function Specialization
• Partial Dead Code Elimination
• More information available to perform further transformations
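The C sketch below illustrates function specialization on hypothetical functions (`combine_generic`, `combine_mode0` and `combine` are illustrative names): a clone is created for the frequent value `mode == 0`, the branch that can no longer execute is eliminated in that clone, and the call site dispatches to the clone while keeping the generic version for other values.

```c
/* Generic version: mode selects between two computations. */
double combine_generic(int mode, double a, double b)
{
    if (mode == 0)
        return a + b;
    else
        return a * b;
}

/* Clone specialized for mode == 0: the multiply branch is dead
   code for this value and has been removed. */
static double combine_mode0(double a, double b)
{
    return a + b;
}

/* Call site: use the clone for the frequent value, keep the
   generic version for all other values. */
double combine(int mode, double a, double b)
{
    return (mode == 0) ? combine_mode0(a, b) : combine_generic(mode, a, b);
}
```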
Transformations
Loop count
Loop-oriented transformation, type: directive insertion
Loop count knowledge enables the compiler to perform optimizations
• The compiler cannot always infer the loop trip count at compile time => it may refuse to vectorize
• Most of the time, it simplifies:
The control flow (fewer loop versions)
The choice of vectorization/unrolling
Requires dynamic feedback
Performed by VPROF (a MAQAO module)
Returns the number of iterations of loops (min, max & average)
Limitation
• Loop bounds are dataset-dependent
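VPROF works on the binary, and the C sketch below is not its implementation. It only illustrates, with a hypothetical `trip_stats` record, how the per-loop statistics it reports (minimum, maximum and average iteration counts over all executions of a loop) can be accumulated.

```c
#include <stdint.h>

/* Hypothetical per-loop record of observed trip counts. */
struct trip_stats {
    uint64_t min, max, sum, instances;
};

void trip_stats_init(struct trip_stats *s)
{
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->instances = 0;
}

/* Record one complete execution of the loop that ran `iters` iterations. */
void trip_stats_record(struct trip_stats *s, uint64_t iters)
{
    if (iters < s->min) s->min = iters;
    if (iters > s->max) s->max = iters;
    s->sum += iters;
    s->instances++;
}

/* Average trip count (integer division; 0 if the loop never ran). */
uint64_t trip_stats_avg(const struct trip_stats *s)
{
    return s->instances ? s->sum / s->instances : 0;
}
```

The average here is truncated to an integer; a real profiler may report it differently.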
Example
Dynamic feedback example
Original loop:
for (i = 0; i < NUM_KEYS; i++)
    key_buff_ptr[key_buff_ptr2[i]]++;
Extract of VPROF’s output (figure)
Exploiting the feedback:
maqao s2s \
    -vprof_xp=/home/ylebras/vprof_dir/vprof.csv \
    -bin=/home/ylebras/NBP3.3.1/NPB3.3.1-SER/bin/is.B.x
Returns a file with the corresponding directives:
#pragma loop_count max=134217728, min=134217728, avg=134217728
for (i = 0; i < NUM_KEYS; i++) {
    key_buff_ptr[key_buff_ptr2[i]]++;
}
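The final step, turning the measured trip counts into a directive, can be sketched as a small formatting function. The name `format_loop_count` is hypothetical, and the output assumes Intel's `loop_count` pragma with `max=`, `min=` and `avg=` clauses, as in the example above.

```c
#include <stdio.h>

/* Hypothetical helper: write a loop_count directive for a loop whose
   trip counts were measured by a profiler. Returns the length of the
   full directive (as snprintf does), even if `buf` was too small. */
int format_loop_count(char *buf, size_t size,
                      unsigned long max, unsigned long min, unsigned long avg)
{
    return snprintf(buf, size,
                    "#pragma loop_count max=%lu, min=%lu, avg=%lu",
                    max, min, avg);
}
```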
Transformations
Block Vectorization
Loop-oriented transformation, type: directive insertion & AST modifier
Decomposing the loop increases the vectorization ratio
The vectorization ratio is increased by:
• Forcing vectorization (“SIMD” directive)
• Avoiding dynamic or static loop peeling (using the UNALIGNED pragma)
If the loop bound is not known at compile time
• The loop will be specialized by checking the modulo of a given input
(Figure: loop not vectorized by the compiler; target: AVX2; loop body: double precision (DP).)
(Figure: the loop is decomposed into a vectorized block and a residual loop.)
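A C sketch of the loop decomposition behind block vectorization, assuming AVX2 with double-precision data (4 elements per 256-bit register, the hypothetical `VLEN`). `#pragma omp simd` stands in for the compiler-specific SIMD directive and only takes effect with OpenMP SIMD support enabled (e.g. GCC's `-fopenmp-simd`); the kernel itself is hypothetical. The modulo of the run-time bound splits the loop into a block whose trip count is a multiple of the vector length and a short residual loop.

```c
#include <stddef.h>

#define VLEN 4 /* doubles per AVX2 register (assumption: DP data) */

/* Hypothetical kernel: c[i] = a[i] * b[i] for i in [0, n). */
void mul_block_vectorized(size_t n, const double *a, const double *b, double *c)
{
    size_t rem = n % VLEN;    /* modulo check on the input bound */
    size_t main_n = n - rem;  /* largest multiple of VLEN <= n */

    /* Vector block: trip count is a multiple of VLEN. */
    #pragma omp simd
    for (size_t i = 0; i < main_n; i++)
        c[i] = a[i] * b[i];

    if (rem != 0) {
        /* Residual: fewer than VLEN scalar iterations. */
        for (size_t i = main_n; i < n; i++)
            c[i] = a[i] * b[i];
    }
}
```

The residual loop is skipped entirely when `n` is a multiple of `VLEN`.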
Example
Example of block vectorization performed in AVBP (target architecture: Skylake)
Original loop
Extract of CQA’s output
In this case, “nproduct” frequently takes the value 3
Exploiting the CQA feedback
Example
Example of block vectorization performed in AVBP (target architecture: Skylake, using AVX2)
Step 1: specialization of the loop
Step 2: apply block vectorization
Keep a generic version of the code
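The two steps can be sketched in C on a hypothetical kernel. The actual AVBP loop is Fortran and is not reproduced here, so `scale`, `nelem`, `w`, `v` and the array layout are assumptions. Step 1 specializes the inner loop for `nproduct == 3`; Step 2 marks the now constant-length block for vectorization (`#pragma omp simd` stands in for a SIMD directive); any other value goes to the generic version.

```c
#include <stddef.h>

/* Generic version, kept for any nproduct value. */
static void scale_generic(size_t nelem, size_t nproduct,
                          const double *w, const double *v, double *out)
{
    for (size_t e = 0; e < nelem; e++)
        for (size_t k = 0; k < nproduct; k++)
            out[e * nproduct + k] = w[e] * v[e * nproduct + k];
}

void scale(size_t nelem, size_t nproduct,
           const double *w, const double *v, double *out)
{
    if (nproduct == 3) {
        /* Step 1: specialization, the inner trip count is the constant 3.
           Step 2: the 3-element block is vectorized. */
        for (size_t e = 0; e < nelem; e++) {
            #pragma omp simd
            for (size_t k = 0; k < 3; k++)
                out[e * 3 + k] = w[e] * v[e * 3 + k];
        }
    } else {
        scale_generic(nelem, nproduct, w, v, out);
    }
}
```

With double precision on AVX2, a 3-element block occupies 3 of the 4 lanes of a 256-bit register.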
Results
CQA report before and after block vectorization
Before: The loop is partially vectorized (33% of SSE/AVX instructions are used in vector mode): only 50% of the vector length is used. 33% of SSE/AVX loads are used in vector mode. 33% of SSE/AVX stores are used in vector mode.
After: The loop is vectorized (all SSE/AVX instructions are used in vector mode), but on only 75% of the vector length.
Transformations
Configuration file sample
• File: Source file path
• Arch: Architectures to support.
• Target a loop by its line number or by a label attached on the loop
A way to annotate an application without adding directives to the source code
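ASSIST's configuration file syntax is not reproduced in these slides, so the C sketch below is purely hypothetical: it models one entry with the fields listed above (file, architecture, and a loop selected by line number or by label) and checks whether a given loop is targeted by it.

```c
#include <string.h>

/* Hypothetical in-memory form of one configuration entry. */
struct config_entry {
    const char *file;           /* source file path */
    const char *arch;           /* architecture the entry applies to */
    int line;                   /* loop line number, or -1 if unused */
    const char *label;          /* loop label, or NULL if unused */
    const char *transformation; /* e.g. "unroll" (illustrative) */
};

/* Returns 1 if the entry targets the loop at `line` (optionally
   carrying `label`) in `file` when compiling for `arch`. */
int entry_matches(const struct config_entry *e, const char *file,
                  const char *arch, int line, const char *label)
{
    if (strcmp(e->file, file) != 0 || strcmp(e->arch, arch) != 0)
        return 0;
    if (e->line >= 0 && e->line == line)
        return 1;
    return e->label != NULL && label != NULL && strcmp(e->label, label) == 0;
}
```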
III – Experimental Results
Results
Test cases
NPB-3.3.1-SER (Fortran77/C) https://www.nas.nasa.gov/publications/npb.html
• NAS Parallel Benchmarks
Applications
AVBP (Fortran95) http://www.cerfacs.fr/avbp7x/
• A parallel CFD code that solves the three-dimensional compressible Navier-Stokes equations on unstructured and hybrid grids
Yales2 (Fortran2003) https://www.coria-cfd.fr/index.php/YALES2
• YALES2 aims at solving two-phase combustion, from primary atomization to pollutant prediction, on massive complex meshes
Warp3D (Fortran77) http://www.warp3d.net/
• A research code for the solution of large-scale, 3-D solid models subjected to static and dynamic loads
ABINIT (Fortran90) https://www.abinit.org
• ABINIT is a software suite to calculate the optical, mechanical, vibrational, and other observable properties of materials
Results
Experimental setup
Compiled with icc 17.0.4
Intel Skylake (Intel® Xeon® Platinum 8170 CPU @ 2.10 GHz)
Multiple (around 30) executions, to be statistically meaningful and to avoid outliers
PGO performance comparison
Original version
ICC’s PGO
ASSIST’s PGO-like approach (loop count transformation)
Results of other transformations
Block Vectorization
Specialization
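The speedups reported in the following results are ratios of execution times relative to the original version (for ABINIT, 1.14 s / 0.65 s ≈ 1.75). How the roughly 30 runs are summarized is not stated; the C sketch below assumes the median, which is robust to outliers. Function names are illustrative.

```c
#include <stdlib.h>

/* Comparison function for qsort on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 timings (sorts the array in place). */
double median_time(double *t, size_t n)
{
    qsort(t, n, sizeof *t, cmp_double);
    return (n % 2) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
}

/* Speedup of a transformed version relative to the original. */
double speedup(double original_time, double transformed_time)
{
    return original_time / transformed_time;
}
```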
Results on NAS
Speedups of ICC’s PGO and of the loop count transformation, relative to the original version
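Number of loops processed with the loop count transformation
| Benchmark | Loops processed |
|---|---|
| BT.B | 34 |
| CG.B | 11 |
| DC.B | 5 |
| EP.B | 2 |
| FT.B | 6 |
| IS.B | 14 |
| LU.B | 49 |
| MG.B | 18 |
| SP.B | 79 |
| UA.B | 80 |
(Chart annotations: results not significant; many loop bounds have been hard-coded.)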
Results on AVBP, Yales2 & Warp3D
Speedups of ICC’s PGO and of the loop count transformation, relative to the original version
| Test case | Loops processed with the loop count transformation |
|---|---|
| 1D_COFFE | 122 |
| 3D_Cylinder | 162 |
| SIMPLE | 158 |
| NASA | 149 |
| test_68 | 57 |
Results on AVBP (model = SIMPLE)
Speedup by function before and after applying function/loop specialization and block vectorization
Results on AVBP (model = SIMPLE)
Execution time by function before and after applying function/loop specialization and block vectorization
Results on ABINIT (Ti-256)
Speedup of function specialization + tiling versus specialization alone versus ICC’s PGO, relative to the original version
| Version | Time (s) | Speedup |
|---|---|---|
| Original version | 1.14 | 1.00 |
| icc’s PGO | 1.14 | 1.00 |
| ASSIST specialization | 1.1 | 1.04 |
| ASSIST specialization + tiling | 0.65 | 1.75 |
IV - Conclusion
Conclusion
A framework performing selective source-to-source transformations/optimizations guided by static/dynamic performance analysis.
An open source FDO tool
• Harnessing static and dynamic analyses from MAQAO
• Defining transformations on a per-architecture basis, either automatically or by the user
• Transformations applied directly or through pragmas
Encouraging results
• Using the loop count transformation alone is already competitive with Intel’s PGO
• Block vectorization only needs a static analysis of the binary and provides significant speedups when the compiler fails to vectorize efficiently
• Automatic specialization improves both maintainability and performance
Future work
Enhance our FDO tool
• Keep working on function/loop specialization, both from annotations and automatically, using feedback from MAQAO tools
• Use more feedback data (hardware counters, static analyses)
• Enable the tool to launch MAQAO modules (autotuning mode) based on the detected opportunities
Unified view of source and binary level analyses
• Help application developers understand the gap between how the code should run and how it actually performs
Continue to work with our application developer partners on code maintainability features
Keep on adding other transformations based on MAQAO’s research work to detect more optimization opportunities
• Use multiple datasets as input
• Detect values for specialization
• …
Thanks
Any questions?
Requirements
Find a compiler infrastructure able to perform source-to-source transformations on Fortran, C and C++ code
| Infrastructure | License | C | C++ | Fortran | Source-to-source | Documentation | Weakness |
|---|---|---|---|---|---|---|---|
| GNU | OSI | ✓ | ✓ | ✓ | ~ | ~ | GPL license; misses information in AST |
| Cetus | GPL | ✓ | x | x | ✓ | ✓ | Handles only C |
| Par4All | MIT | ✓ | x | ✓ | ✓ | | Only for parallelism |
| LLVM | BSD | ✓ | ✓ | ~ | ~ | ~ | No Fortran when we started; now a first version of Flang |
| ROSE | BSD | ✓ | ✓ | ✓ | ✓ | ✓ | EDG license for C/C++ |
| Orio | BSD | ~ | x | x | ~ | x | Only a subset of C (… to other languages) |
Legend: ✓ requirement OK; ~ theoretically possible / weak; x requirement KO
Transformations
Unroll
• Unroll the body of a loop by a factor N
• Reduces the number of instructions that control the loop
• Reduces branch penalties
• Helps the compiler vectorize
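A minimal C example of unrolling by a factor N = 4 on a hypothetical array sum; a remainder loop handles trip counts that are not a multiple of 4. The additions are performed in the same order as in the original loop, so the result is unchanged.

```c
#include <stddef.h>

/* Sum of an array, unrolled by a factor of 4 with a remainder loop. */
double sum_unrolled4(const double *x, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    /* Main unrolled loop: one compare/branch per 4 elements. */
    for (; i + 4 <= n; i += 4) {
        s += x[i];
        s += x[i + 1];
        s += x[i + 2];
        s += x[i + 3];
    }
    /* Remainder: the last n % 4 elements. */
    for (; i < n; i++)
        s += x[i];
    return s;
}
```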
Transformations
Full Unroll
• The loop is replaced by its fully unrolled body
• Same advantages as unrolling
• Removes the loop overhead entirely
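For a loop with a small constant trip count, full unrolling can be illustrated in C as follows (a hypothetical 3-element dot product): the loop counter, the compare and the branch disappear.

```c
/* Original: loop with a small constant trip count. */
double dot3_loop(const double a[3], const double b[3])
{
    double s = 0.0;
    for (int i = 0; i < 3; i++)
        s += a[i] * b[i];
    return s;
}

/* Fully unrolled: the loop body is replicated for each iteration. */
double dot3_unrolled(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```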
Transformations
Interchange
• Better access pattern to array elements
• Moves from a column-major to a row-major traversal, or the inverse
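C stores arrays in row-major order (Fortran in column-major order), so in C the innermost loop should run over the last index. A minimal illustration on a hypothetical scaling kernel, with an illustrative size `N`:

```c
#include <stddef.h>

#define N 4 /* illustrative matrix size */

/* Before: inner loop walks down a column, stride N in row-major C. */
void scale_col_inner(double m[N][N], double a)
{
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            m[i][j] *= a;
}

/* After interchange: inner loop walks along a row, unit stride. */
void scale_row_inner(double m[N][N], double a)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            m[i][j] *= a;
}
```

Both versions compute the same result; only the memory access order changes.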
Transformations
Strip Mine
• Reorganizes a loop to iterate over blocks of data sized to fit in the cache
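Strip mining can be illustrated in C on a hypothetical AXPY loop: the single loop becomes a loop over strips of `STRIP` elements (an illustrative value, to be tuned to the cache) and an inner loop within each strip. On its own this only restructures the iteration space; the cache benefit appears when other loops or transformations reuse each strip while it is still in cache.

```c
#include <stddef.h>

#define STRIP 256 /* strip length, illustrative */

/* Strip-mined version of: for (i = 0; i < n; i++) y[i] = a * x[i] + y[i]; */
void axpy_strip_mined(size_t n, double a, const double *x, double *y)
{
    for (size_t is = 0; is < n; is += STRIP) {
        /* The last strip may be shorter than STRIP. */
        size_t end = (is + STRIP < n) ? is + STRIP : n;
        for (size_t i = is; i < end; i++)
            y[i] = a * x[i] + y[i];
    }
}
```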
Transformations
Tiling / Blocking
• Strip mining applied to two or more dimensions
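Tiling can be illustrated on a hypothetical out-of-place matrix transpose in C: both loops are strip-mined with an illustrative tile size `TILE`, and the strip loops are moved outward so that each `TILE` x `TILE` block of the source and destination is traversed while it fits in cache.

```c
#include <stddef.h>

#define TILE 32 /* tile edge, illustrative */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled transpose of an n x n row-major matrix: b = a^T. */
void transpose_tiled(size_t n, const double *a, double *b)
{
    for (size_t ii = 0; ii < n; ii += TILE)          /* tile rows */
        for (size_t jj = 0; jj < n; jj += TILE)      /* tile columns */
            for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                for (size_t j = jj; j < min_sz(jj + TILE, n); j++)
                    b[j * n + i] = a[i * n + j];
}
```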
Transformations
Generic Block Vectorization
• If the loop bound is not known
The loop will be specialized by checking the modulo of a given input
Results
AVBP (SIMPLE): block vectorization versus function or loop specialization, execution time and speedup (relative to the original version)
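| Function | Original: time (s) | Function specialization: speedup | Function specialization: time (s) | Loop specialization: speedup | Loop specialization: time (s) | Block vectorization (best case): speedup | Block vectorization (best case): time (s) |
|---|---|---|---|---|---|---|---|
| grad_4obj | 3.862 | 1.62 | 2.38 | 1.55 | 2.49 | 2.04 | 1.89 |
| scatter_o_add | 3.78 | 0.85 | 4.44 | 1.21 | 3.13 | 0.97 | 3.88 |
| scatter_add | 4.164 | 1 | 4.16 | 0.99 | 4.22 | 1.38 | 3.01 |
| scatter_o_sub | 2.63 | 0.98 | 2.69 | 1 | 2.62 | 1.21 | 2.17 |
| gather_o_cpy | 16.324 | 0.81 | 20.12 | 1.04 | 15.68 | 1.28 | 12.76 |
| balance_cor | 0.492 | 1 | 0.49 | 1 | 0.49 | 1.24 | 0.39 |
| central | 0.86 | 1.35 | 0.64 | 1.59 | 0.54 | 1.85 | 0.46 |
| central_nv | 0.945 | 1.6 | 0.59 | 1.21 | 0.78 | 2.65 | 0.36 |
| mass_product | 2.238 | 1.02 | 2.84 | 1.27 | 2.69 | 2.58 | 1.49 |
| laxwe | 2.278 | 0.79 | 2.23 | 0.83 | 1.8 | 1.51 | 0.88 |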