Investigating Adaptive Compilation using the MIPSpro Compiler
Keith D. Cooper Todd Waterman
Department of Computer Science
Rice University
Houston, TX USA
Motivation
• Despite astonishing increases in processor performance, certain applications still require a heroic compiler effort
  - Scientific applications: weather, earthquake, and nuclear physics simulations
• High quality compilation is difficult
  - Many of the underlying problems are NP-complete
  - Many decisions that impact performance must be made
  - The correct choice can depend on the target machine, source program, and input data
  - Exhaustively determining the correct choices is impractical
• Typical compilers use a single preset sequence of decisions
• How do we determine the correct sequence for each context?
Adaptive Compilation
• An adaptive compiler experimentally explores the decision space
  - Uses a process of feedback-driven iterative refinement
  - Program is compiled repeatedly with a different sequence of optimization decisions
  - Performance is evaluated using either execution or estimation
  - Performance results are used to determine future sequences
  - Sequence of compiler decisions is customized to always provide a high level of performance
  - Compiler easily accounts for different input programs, target machines, and input data
• Can current compilers be used for adaptive compilation?
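The feedback loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' system: the slides do not name a search strategy, so simple hill climbing over on/off decisions is swapped in here, and `adaptive_compile`, `evaluate`, and the toy cost model are all hypothetical names. In practice `evaluate` would compile the program with the given decisions and time (or estimate) the result.

```python
from typing import Callable, Tuple

Decisions = Tuple[bool, ...]  # one on/off choice per optimization decision


def adaptive_compile(
    evaluate: Callable[[Decisions], float],
    initial: Decisions,
    max_rounds: int = 10,
) -> Tuple[Decisions, float]:
    """Refine a decision sequence using measured performance as feedback.

    `evaluate` is a hypothetical callback: it stands for "compile with
    these decisions, then run or estimate" and returns a cost (lower is
    better). Hill climbing is an assumed strategy, chosen for brevity.
    """
    best, best_cost = initial, evaluate(initial)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            # Try the current best sequence with decision i flipped.
            candidate = best[:i] + (not best[i],) + best[i + 1:]
            cost = evaluate(candidate)
            if cost < best_cost:
                # Feedback: the result steers which sequences come next.
                best, best_cost = candidate, cost
                improved = True
        if not improved:
            break  # no single flip helps, so stop refining
    return best, best_cost


def toy_cost(d: Decisions) -> float:
    """Stand-in for a real measurement: decisions 0 and 2 help, 1 hurts."""
    return 10 - 3 * d[0] + 2 * d[1] - 1 * d[2]


if __name__ == "__main__":
    print(adaptive_compile(toy_cost, (False, True, False)))
```

Because the compiler is only reached through `evaluate`, the same loop works whether performance comes from execution or from an estimator.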
Experimental Setup
• Searched for certain properties in a compiler
  - Produces high quality executables
  - Performs high-level optimizations
  - Command-line flags that control optimization
• Selected the MIPSpro compiler
  - Initial experiments showed that changing blocking sizes could improve running times
• Loop Blocking
  - A memory hierarchy transformation that reorders array accesses to improve spatial and temporal locality
  - Major impact on array-based codes
    - Includes DGEMM, a general matrix multiply routine
    - Allows comparison with ATLAS
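To make the loop blocking transformation concrete, here is a small sketch of a matrix multiply before and after blocking. It is illustrative only: the function names and the default `block` value are assumptions, and pure Python will not show the cache benefit that a compiled DGEMM would. The point is the reordering: the blocked version does the same arithmetic, but works on one tile of each array at a time.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Straightforward triple loop: C[i][j] += A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a: Matrix, b: Matrix, block: int = 2) -> Matrix:
    """Same computation, iterated tile by tile.

    The three outer loops walk over tiles; the three inner loops stay
    inside one tile of each array, so recently used elements are
    revisited before they would be evicted from cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            for pp in range(0, k, block):
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, m)):
                        acc = c[i][j]
                        for p in range(pp, min(pp + block, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(matmul_blocked(a, b) == matmul_naive(a, b))
```

The `block` parameter plays the role of the blocking size that the experiments in these slides vary.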
ATLAS
• Automatically tuned linear algebra software
• Goal is to achieve hand-coded performance for linear algebra kernels without a programmer modifying the code for each processor
  - Kernel is modified and parameterized once by a programmer
  - When ATLAS is installed on a machine, experiments are run to determine the proper parameters for the kernel
• Saves human time at the expense of additional machine time
• Adaptive compilation aims to take this tradeoff one step further
Adjusting Blocking Size
• Compare three versions of DGEMM
  - Compiled with MIPSpro and varying specified block sizes
  - Built by ATLAS
  - Compiled with MIPSpro using built-in blocking heuristic
• Test machine: SGI MIPS R10000
  - 195 MHz processor
  - 256 MB memory
  - 32 KB L1 data cache
  - 1 MB unified L2 cache
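As a rough illustration of how block size relates to the cache sizes listed above, the sketch below applies a common rule of thumb that the slides do not state: the three square tiles touched by a blocked matrix multiply (one each from A, B, and C) should fit in the cache together. The function name `max_square_block` and the three-tile assumption are illustrative. Real behavior also depends on associativity, line size, and, for the unified L2, instruction traffic.

```python
import math

DOUBLE_BYTES = 8  # DGEMM operates on double-precision values


def max_square_block(cache_bytes: int, tiles_resident: int = 3) -> int:
    """Largest b such that `tiles_resident` b-by-b double tiles fit in cache.

    This is an assumed back-of-the-envelope bound, not a result from
    the slides.
    """
    return math.isqrt(cache_bytes // (tiles_resident * DOUBLE_BYTES))


l1_bound = max_square_block(32 * 1024)    # 32 KB L1 data cache
l2_bound = max_square_block(1024 * 1024)  # 1 MB unified L2 cache
print(l1_bound, l2_bound)
```

A bound like this only narrows the range worth exploring. The measured running times remain the deciding factor.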
DGEMM running time for 500 x 500 arrays
DGEMM running time for 1000 x 1000 arrays
DGEMM running time for 1500 x 1500 arrays
DGEMM running times for square matrices
Relative DGEMM running times
L1 Cache Misses for DGEMM
L2 Cache Misses for DGEMM
Adjusting Blocking Size
• The performance of MIPSpro using the built-in blocking heuristic drops off substantially when the array size reaches 900 x 900
  - Far more L1 cache misses
  - Fewer L2 cache misses
  - Heuristic uses a rectangular blocking size that increases as the total array size increases
• MIPSpro with adaptively chosen blocking sizes delivers performance close to ATLAS level
  - Remains close as array size increases
  - Fewer L1 and L2 cache misses than ATLAS
• Similar results were observed for non-square matrices as well
Determining Blocking Size
• Exhaustively searching for blocking sizes is expensive
• Intelligent exploration of blocking sizes can find very good blocking sizes while only examining a few block sizes
• Our approach:
  - Determine the result for block size 50
  - Sample higher and lower block sizes in increments of ten until results are more than 10% from optimal
  - Examine all of the block sizes within five of the best found in the previous step
• This approach always found the best block size in our experiments
• Quicker approaches could be found at the expense of finding less ideal block sizes
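The search described on this slide can be sketched as below. Several details are assumptions rather than facts from the slides: "more than 10% from optimal" is read as "more than 10% slower than the best time seen so far", the `min_size` and `max_size` bounds are added so the loops terminate, and `measure` is a hypothetical callback that would compile DGEMM with a given block size and return its running time.

```python
from typing import Callable, Dict, Tuple


def search_block_size(
    measure: Callable[[int], float],
    start: int = 50,
    step: int = 10,
    tolerance: float = 0.10,
    radius: int = 5,
    min_size: int = 1,
    max_size: int = 500,
) -> Tuple[int, Dict[int, float]]:
    """Coarse-then-fine search over block sizes (lower time is better)."""
    results: Dict[int, float] = {start: measure(start)}

    def best() -> int:
        return min(results, key=results.get)

    # Coarse phase: step away from the start in each direction until a
    # sample is more than `tolerance` worse than the best seen so far.
    for direction in (+1, -1):
        size = start + direction * step
        while min_size <= size <= max_size:
            results[size] = measure(size)
            if results[size] > (1 + tolerance) * results[best()]:
                break
            size += direction * step

    # Fine phase: examine every block size within `radius` of the best
    # coarse sample, skipping sizes that were already measured.
    center = best()
    low, high = max(min_size, center - radius), min(max_size, center + radius)
    for size in range(low, high + 1):
        if size not in results:
            results[size] = measure(size)

    return best(), results


def toy_time(block: int) -> float:
    """Stand-in for a real timing run, with its minimum at block size 63."""
    return 100.0 + (block - 63) ** 2


if __name__ == "__main__":
    chosen, samples = search_block_size(toy_time)
    print(chosen, len(samples))
```

With the toy timing function the search evaluates 14 block sizes instead of every size in the range, which illustrates how only a few block sizes need to be examined.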
Search time required
Making Adaptive Compilation General
• Making adaptive compilation general will require changing how compilers work
• Adaptive compilation is limited by the decisions the compiler exposes
  - If the MIPSpro compiler only allowed blocking to be turned on and off, our experiments would not have been possible
• The interface between adaptive system and compiler needs to allow complex communication
  - Which transformations are applied
  - Granularity
  - Optimization scope
  - Detailed parameter settings
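One way to picture the kind of interface this slide asks for is a small data structure that an adaptive system could hand to a compiler. Everything in this sketch is hypothetical: the class names, field names, and example values are invented for illustration and do not correspond to MIPSpro or any existing compiler interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TransformationRequest:
    """One decision the adaptive system wants the compiler to honor."""
    name: str                          # which transformation
    enabled: bool = True               # applied or not
    scope: str = "procedure"           # optimization scope (assumed values)
    granularity: str = "per_loop_nest" # how finely the choice applies
    parameters: Dict[str, int] = field(default_factory=dict)  # detailed settings


@dataclass
class CompilationPlan:
    """An ordered list of requests for one compilation."""
    requests: List[TransformationRequest] = field(default_factory=list)

    def describe(self) -> List[str]:
        lines = []
        for r in self.requests:
            state = "on" if r.enabled else "off"
            line = f"{r.name}: {state}, scope={r.scope}, granularity={r.granularity}"
            params = ", ".join(f"{k}={v}" for k, v in sorted(r.parameters.items()))
            if params:
                line += f", {params}"
            lines.append(line)
        return lines


if __name__ == "__main__":
    plan = CompilationPlan([
        TransformationRequest("loop_blocking", parameters={"block_size": 50}),
    ])
    print("\n".join(plan.describe()))
```

A plain on/off switch corresponds to using only the `enabled` field. The experiments in these slides needed at least the equivalent of `parameters` as well.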
Conclusions
• Adaptively selecting the appropriate blocking size for DGEMM provides performance close to ATLAS
  - The standard compiler’s performance drops off for larger array sizes
  - Only a small portion of possible block sizes needs to be examined
• Making adaptive compilation a successful technique for a wide variety of applications will require changes to the design of compilers
Extra slides begin here.
DGEMM running times for varying M
DGEMM running times for varying N
DGEMM running times for varying K