Investigating Adaptive Compilation using the MIPSpro Compiler
Keith D. Cooper Todd Waterman
Department of Computer Science
Rice University
Houston, TX USA
2
Motivation
• Despite astonishing increases in processor performance, certain applications still require a heroic compiler effort
  – Scientific applications: weather, earthquake, and nuclear physics simulations
• High-quality compilation is difficult
  – The solutions to many problems are NP-complete
  – Many decisions that impact performance must be made
  – The correct choice can depend on the target machine, source program, and input data
  – Exhaustively determining the correct choices is impractical
• Typical compilers use a single preset sequence of decisions
• How do we determine the correct sequence for each context?
3
Adaptive Compilation
• An adaptive compiler experimentally explores the decision space
  – Uses a process of feedback-driven iterative refinement
  – The program is compiled repeatedly with different sequences of optimization decisions
  – Performance is evaluated using either execution or estimation
  – Performance results are used to determine future sequences
• The sequence of compiler decisions is customized to consistently provide a high level of performance
• The compiler easily accounts for different input programs, target machines, and input data
• Can current compilers be used for adaptive compilation?
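The feedback loop described above can be sketched in a few lines. This is an illustrative skeleton, not any real compiler's interface: the callables `compile_with`, `measure`, and `propose_next` are hypothetical stand-ins for building the program with a given decision sequence, evaluating its performance (by execution or estimation), and using that feedback to pick the next sequence to try.

```python
# Sketch of a feedback-driven adaptive compilation loop. All names are
# illustrative; the three callables are supplied by the adaptive system.

def adapt(program, seq, compile_with, measure, propose_next, budget=20):
    """Return the best-performing decision sequence found in `budget` trials."""
    best_seq, best_time = None, float("inf")
    for _ in range(budget):
        executable = compile_with(program, seq)  # recompile with this sequence
        runtime = measure(executable)            # execute or estimate performance
        if runtime < best_time:
            best_seq, best_time = seq, runtime
        seq = propose_next(seq, runtime)         # feedback drives the next trial
    return best_seq, best_time
```

Because the loop only sees measured performance, the same driver adapts automatically to a different input program, target machine, or data set.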
4
Experimental Setup
• Searched for certain properties in a compiler
  – Produces high-quality executables
  – Performs high-level optimizations
  – Provides command-line flags that control optimization
• Selected the MIPSpro compiler
  – Initial experiments showed that changing blocking sizes could improve running times
• Loop blocking
  – A memory-hierarchy transformation that reorders array accesses to improve spatial and temporal locality
  – Major impact on array-based codes, including DGEMM, a general matrix-multiply routine
  – Allows comparison with ATLAS
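Loop blocking splits the classic triply-nested multiply into loops over tiles plus loops within a tile, so that a B x B block of each array is reused while it is still cache-resident. MIPSpro applies this transformation during compilation; the minimal Python sketch below only illustrates the loop structure, not the compiler's output.

```python
# Blocked (tiled) matrix multiply: the i/j/k loops are split into block
# loops (ii, jj, kk) and intra-block loops so that a B x B tile of each
# array stays in cache while it is reused.

def blocked_matmul(A, Bmat, n, B=50):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            for kk in range(0, n, B):
                # Multiply one tile; these loops touch only the
                # cache-resident blocks of A, Bmat, and C.
                for i in range(ii, min(ii + B, n)):
                    for k in range(kk, min(kk + B, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + B, n)):
                            C[i][j] += a * Bmat[k][j]
    return C
```

The result is identical to the unblocked loop nest; only the order of array accesses changes, which is why the choice of B affects cache misses rather than correctness.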
5
ATLAS
• Automatically Tuned Linear Algebra Software
• Goal is to achieve hand-coded performance for linear algebra kernels without a programmer modifying the code for each processor
  – The kernel is modified and parameterized once by a programmer
  – When ATLAS is installed on a machine, experiments are run to determine the proper parameters for the kernel
• Saves human time at the expense of additional machine time
• Adaptive compilation aims to take this tradeoff one step further
6
Adjusting Blocking Size
• Compare three versions of DGEMM
  – Compiled with MIPSpro with varying specified block sizes
  – Built by ATLAS
  – Compiled with MIPSpro using its built-in blocking heuristic
• Test machine: SGI MIPS R10000
  – 195 MHz processor
  – 256 MB memory
  – 32 KB L1 data cache
  – 1 MB unified L2 cache
7
DGEMM running time for 500 x 500 arrays
8
DGEMM running time for 1000 x 1000 arrays
9
DGEMM running time for 1500 x 1500 arrays
10
DGEMM running times for square matrices
11
Relative DGEMM running times
12
L1 Cache Misses for DGEMM
13
L2 Cache Misses for DGEMM
14
Adjusting Blocking Size
• The performance of MIPSpro using the built-in blocking heuristic drops off substantially when the array size reaches 900 x 900
  – Far more L1 cache misses
  – Fewer L2 cache misses
  – The heuristic uses a rectangular blocking size that increases as the total array size increases
• MIPSpro with adaptively chosen blocking sizes delivers performance close to ATLAS's level
  – Remains close as array size increases
  – Fewer L1 and L2 cache misses than ATLAS
• Similar results were observed for non-square matrices as well
15
Determining Blocking Size
• Exhaustively searching for blocking sizes is expensive
• Intelligent exploration can find very good blocking sizes while examining only a few candidates
• Our approach:
  – Determine the result for block size 50
  – Sample higher and lower block sizes in increments of ten until results fall more than 10% behind the best seen
  – Examine all of the block sizes within five of the best found in the previous step
• This approach always found the best block size in our experiments
• Quicker approaches could be devised at the expense of finding less ideal block sizes
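The three-phase search above can be sketched as follows. The callable `run_with_block_size` is a hypothetical placeholder for compiling DGEMM with a given block size and timing the result; the thresholds mirror the slide (probe 50, coarse steps of ten, a 10% cutoff, then a fine scan within five of the best).

```python
# Sketch of the block-size search: probe size 50, step outward in tens
# until a trial falls more than 10% behind the best seen, then examine
# every size within five of that best.

def find_block_size(run_with_block_size, start=50, coarse_step=10, fine_radius=5):
    times = {start: run_with_block_size(start)}
    best = start
    for step in (coarse_step, -coarse_step):     # sample higher, then lower
        size = start + step
        while size > 0:
            times[size] = run_with_block_size(size)
            if times[size] < times[best]:
                best = size
            if times[size] > 1.10 * times[best]: # >10% off the best: stop
                break
            size += step
    for size in range(max(1, best - fine_radius), best + fine_radius + 1):
        if size not in times:
            times[size] = run_with_block_size(size)
        if times[size] < times[best]:
            best = size
    return best, times[best]
```

Only a handful of compile-and-run trials are needed, which is what keeps the search time small relative to an exhaustive sweep.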
16
Search time required
17
Making Adaptive Compilation General
• Making adaptive compilation general will require changing how compilers work
• Adaptive compilation is limited by the decisions the compiler exposes
  – If the MIPSpro compiler only allowed blocking to be turned on and off, our experiments would not have been possible
• The interface between the adaptive system and the compiler needs to allow complex communication
  – Which transformations are applied
  – Granularity
  – Optimization scope
  – Detailed parameter settings
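One way to picture such an interface is as a structured request that the adaptive system hands the compiler for each trial. Every name and field below is hypothetical, invented only to illustrate the four kinds of information the slide lists; no existing compiler accepts this object.

```python
# Hypothetical request an adaptive system might pass to a compiler that
# exposes its decisions: which transformations run, at what scope and
# granularity, and with what detailed parameter settings.

from dataclasses import dataclass, field

@dataclass
class CompilationRequest:
    transformations: list          # applied in order, e.g. ["blocking"]
    scope: str = "whole-program"   # or "function", "loop-nest"
    granularity: str = "per-loop"  # how finely decisions may vary
    parameters: dict = field(default_factory=dict)

# Example: request blocking on one loop nest with a specific block size.
req = CompilationRequest(
    transformations=["blocking"],
    scope="loop-nest",
    parameters={"blocking_size": 60},
)
```

A flag-style interface collapses all of this to on/off switches; a richer request like the one sketched here is what lets the adaptive system vary a parameter such as the block size directly.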
18
Conclusions
• Adaptively selecting the appropriate blocking size for DGEMM provides performance close to ATLAS
  – The standard compiler's performance drops off for larger array sizes
  – Only a small portion of the possible block sizes needs to be examined
• Making adaptive compilation a successful technique for a wide variety of applications will require changes to the design of compilers
19
Extra slides begin here.
20
DGEMM running times for varying M
21
DGEMM running times for varying N
22
DGEMM running times for varying K