Optimizing Compilers CISC 673 Spring 2011 Inlining
University of Delaware • Computer & Information Sciences Department
Optimizing Compilers, CISC 673
Spring 2011: Inlining
John Cavazos, University of Delaware
Background
Inlining is important:
  Removes call overhead
  Enables optimization opportunities
Inlining can also be detrimental:
  Increased compilation time
  Increased register pressure
  Cache effects (larger code can cause more instruction-cache misses)
Interprocedural Optimization
Some optimizations are disrupted by calls
  e.g., constant propagation might stop at a call site
Possible solution: interprocedural optimization
  Optimization that involves more than one function
  Gets complicated (e.g., when the functions are not in the same file)
Inlining
Replace a function call with the body of the called function
Assumed to be beneficial up to a certain point
Enables optimizations
  Constant folding, common subexpression elimination, better global register allocation
The gains from these optimizations can outweigh the reduction in call overhead
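As a hedged illustration in Java (the names `scale`, `caller`, `callerInlined`, and `callerFolded` are hypothetical), the sketch below shows a caller before inlining, after the call is replaced by the callee's body, and after constant folding. Before inlining, an intraprocedural optimizer working on `caller` sees only an opaque call; after inlining, the whole expression is a compile-time constant.

```java
// Hypothetical example: what inlining exposes to intraprocedural optimization.
class InlineFoldDemo {
    static int scale(int x, int factor) {
        return x * factor;
    }

    // Before inlining: the optimizer of caller() sees only an opaque call,
    // so it cannot fold scale(4, 10) into a constant.
    static int caller() {
        return scale(4, 10) + 1;
    }

    // After inlining (done by hand here): the call is replaced by the callee's
    // body with formals rewritten to actuals (x -> 4, factor -> 10) ...
    static int callerInlined() {
        return 4 * 10 + 1;
    }

    // ... and constant folding then reduces the whole body to a literal.
    static int callerFolded() {
        return 41;
    }

    public static void main(String[] args) {
        System.out.println(caller() + " " + callerInlined() + " " + callerFolded());
    }
}
```

All three versions compute the same value; the last two need no call at all, and the folded one needs no arithmetic either.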
Inlining Advantages
Eliminates call disruption
  No register save/restore required
  Call overhead removed
Allows context-specific tailoring
  Eliminates the call barrier for analysis/optimizations
Inlining Disadvantages
Eliminates the benefits of procedure calls
  A call resets the state for register allocation; inlining increases register pressure
  Procedure calls (reuse) keep code size small
Compilation time increases
  Larger functions
  Code bloat
Inlining for Object-Oriented Languages
Plays a particularly important role in optimizing OO languages
  High ratio of calls (and call overhead)
  Many methods are short (e.g., setters/getters)
Issue: mapping virtual calls to concrete implementations
  Requires inserting a run-time type test
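A minimal sketch of guarded inlining, assuming hypothetical classes `Shape`, `Square`, and `Circle`, and assuming profiling has shown that `Square` is the usual receiver at this call site. The inlined body of `Square.area()` is protected by a run-time type test, and any other class falls back to the ordinary virtual call, so the result is unchanged.

```java
// Hypothetical classes used only to illustrate guarded inlining.
interface Shape { double area(); }

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class GuardedInline {
    // Original code: s.area() is a virtual call; its target depends on s's class.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    // After guarded inlining (assuming Square is the common receiver):
    // a run-time type test guards the inlined body; other classes take the slow path.
    static double totalInlined(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {   // run-time type test (guard)
                Square sq = (Square) s;
                sum += sq.side * sq.side;         // inlined body of Square.area()
            } else {
                sum += s.area();                  // fallback: ordinary virtual call
            }
        }
        return sum;
    }
}
```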
Inlining example
The Inlining Transformation Is Easy
The actual transformation is easy:
  Rewrite the call site with the callee's body
  Rewrite formal parameter names with the actual parameters
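The two rewrite steps can be sketched on a toy expression language (a hypothetical IR using Java 17 records and a sealed interface; `Expr`, `FunDef`, and `Inliner` are illustrative names, not any real compiler's API). The sketch assumes pure, non-recursive callees: because an actual argument may be copied into several uses of a formal, a real inliner would normally bind arguments with side effects to temporaries first, and it would bound recursive inlining.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy IR (hypothetical): a callee is a list of formal names plus one expression body.
sealed interface Expr permits Num, Var, Add, Mul, Call {}
record Num(int value) implements Expr {}
record Var(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}
record Call(String callee, List<Expr> args) implements Expr {}
record FunDef(List<String> formals, Expr body) {}

class Inliner {
    // Step 1: rewrite each call to a known function with the callee's body.
    static Expr inline(Expr e, Map<String, FunDef> functions) {
        if (e instanceof Add a) return new Add(inline(a.left(), functions), inline(a.right(), functions));
        if (e instanceof Mul m) return new Mul(inline(m.left(), functions), inline(m.right(), functions));
        if (e instanceof Call c) {
            List<Expr> args = c.args().stream().map(x -> inline(x, functions)).toList();
            FunDef f = functions.get(c.callee());
            if (f == null) return new Call(c.callee(), args);   // unknown callee: keep the call
            Map<String, Expr> binding = new HashMap<>();
            for (int i = 0; i < f.formals().size(); i++) {
                binding.put(f.formals().get(i), args.get(i));
            }
            // The substituted body may contain further calls, so inline it too
            // (this would not terminate for a recursive callee).
            return inline(substitute(f.body(), binding), functions);
        }
        return e;                                                // Num or Var
    }

    // Step 2: rewrite formal parameter names to the actual argument expressions.
    static Expr substitute(Expr e, Map<String, Expr> binding) {
        if (e instanceof Var v) return binding.getOrDefault(v.name(), v);
        if (e instanceof Add a) return new Add(substitute(a.left(), binding), substitute(a.right(), binding));
        if (e instanceof Mul m) return new Mul(substitute(m.left(), binding), substitute(m.right(), binding));
        if (e instanceof Call c) {
            return new Call(c.callee(), c.args().stream().map(x -> substitute(x, binding)).toList());
        }
        return e;                                                // Num
    }
}
```

For example, with `square(x) = x * x` and `sumsq(a, b) = square(a) + square(b)`, inlining `sumsq(3, y)` yields `3 * 3 + y * y`, which constant folding can then reduce further.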
The Inlining Decision Is Hard
It is a resource-constrained decision
  Code size: limits apply to both the whole program and each procedure
  Excessive code growth leads to excessive compilation time (important for JITs!)
Profitability depends on the specific context
  Can the callee be tailored and optimized?
Each decision affects the profitability of, and the resources available to, later decisions!
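One simple way to respect a code-size budget is sketched below under a hypothetical cost model: each call site carries an estimated benefit and a size cost, and sites are accepted greedily in order of benefit per unit of size. This is an illustrative strategy, not the method from the lecture, and it is not optimal in general (the problem resembles a knapsack problem), but it shows how each accepted site shrinks the budget left for later decisions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical cost model: estimated benefit and code-size cost per call site.
record CallSite(String name, double benefit, int sizeCost) {}

class BudgetedInliner {
    // Greedy by benefit per unit of size; every accepted site reduces the
    // budget remaining for the sites considered after it.
    static List<String> choose(List<CallSite> sites, int budget) {
        List<CallSite> sorted = new ArrayList<>(sites);
        sorted.sort(Comparator.comparingDouble((CallSite s) -> s.benefit() / s.sizeCost()).reversed());
        List<String> chosen = new ArrayList<>();
        int remaining = budget;
        for (CallSite s : sorted) {
            if (s.sizeCost() <= remaining) {
                chosen.add(s.name());
                remaining -= s.sizeCost();
            }
        }
        return chosen;
    }
}
```

With sites B (benefit 9, cost 3), A (benefit 10, cost 5), and C (benefit 4, cost 4) and a budget of 8, the pass accepts B and then A, leaving nothing for C.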
The Inlining Decision Is Hard
Consider a call graph
  Assign each edge a type from {inline, no-inline}
  The choice at each edge affects other decisions
  Each decision has a profit and a cost (in terms of resources)
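For a tiny acyclic call graph, the edge-labeling view can be explored exhaustively. The sketch below assumes a hypothetical cost model: procedure 0 is the root being compiled, each edge has an assumed profit, a call left in place costs `CALL_SIZE`, and an inlined call adds the callee's size after its own inlining decisions, so the cost of one edge depends on the labels of other edges. Enumerating all 2^E labelings is only feasible for toy graphs.

```java
import java.util.List;

// Brute-force sketch of the edge-labeling view (hypothetical sizes and profits).
// Procedures are numbered 0..n-1, procedure 0 is the root, and the graph is acyclic.
class EdgeLabeling {
    record Edge(int caller, int callee, int profit) {}

    static final int CALL_SIZE = 2;  // assumed size of a call sequence (linkage)

    // Size of procedure p under the labeling `mask` (bit i set = edge i inlined).
    // Inlining p->q copies q's post-inlining body, so decisions interact.
    static int size(int p, int[] base, List<Edge> edges, int mask) {
        int s = base[p];
        for (int i = 0; i < edges.size(); i++) {
            Edge e = edges.get(i);
            if (e.caller() != p) continue;
            s += ((mask >> i) & 1) == 1 ? size(e.callee(), base, edges, mask) : CALL_SIZE;
        }
        return s;
    }

    // Try all 2^E labelings; keep the most profitable one whose root fits the budget.
    // Returns 0 (inline nothing) if no labeling fits.
    static int bestMask(int[] base, List<Edge> edges, int budget) {
        int best = 0, bestProfit = -1;
        for (int mask = 0; mask < (1 << edges.size()); mask++) {
            if (size(0, base, edges, mask) > budget) continue;
            int profit = 0;
            for (int i = 0; i < edges.size(); i++) {
                if (((mask >> i) & 1) == 1) profit += edges.get(i).profit();
            }
            if (profit > bestProfit) { bestProfit = profit; best = mask; }
        }
        return best;
    }
}
```

For instance, with sizes {10, 8, 20}, edges 0→1 (profit 5), 1→2 (profit 4), 0→2 (profit 3), and a root budget of 30, inlining 0→1 alone (root size 22) or 1→2 alone (root size 14) fits, but inlining both grows the root to 40, so the best labeling inlines only 0→1.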
Inlining Decision Procedures
Some decisions are obvious
  Inline small procedures (code smaller than the linkage)
  Inline procedures called only once
Still lots of experimental work to do!
  Cavazos 2005, Waterman 2006; Cooper, Hall, & Torczon; Davidson & Holler
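The two obvious rules can be written as a single predicate. This is a hypothetical sketch: sizes are in arbitrary units, the method name is illustrative, and the "called only once" rule avoids code growth only when the original copy of the callee can then be deleted (for example, when it is not visible outside the compilation unit).

```java
// Hypothetical sketch of the two "obvious" inlining rules.
class ObviousRules {
    static boolean obviouslyInline(int calleeBodySize, int linkageSize, int callSitesOfCallee) {
        boolean smallerThanLinkage = calleeBodySize < linkageSize;  // body is smaller than the call it replaces
        boolean calledOnce = callSitesOfCallee == 1;                // body moves rather than being copied
        return smallerThanLinkage || calledOnce;
    }
}
```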
Adaptive Decision Making
How should we determine a good decision heuristic?
Cavazos proposed an adaptive solution
  Train a heuristic
  Specialized for a given hardware platform or benchmark
Prior art
  Ad hoc (manually constructed) heuristics based on program properties
  Ad hoc heuristics combined into a single test applied at each call site, in a fixed order
Proposed Solution
Use machine learning
  Features predict which methods to inline
  A heuristic function controls inlining
Tune the heuristic to:
  A different compilation scenario
  A different architecture
Applying Genetic Algorithms
Cross-validation
  Evolve a heuristic over a set of benchmarks
  Test on a different set of benchmarks
  Goal: high average performance
Self-validation
  Evolve a heuristic for one benchmark
  Goal: best performance for that benchmark
High-Performance Compiler
IBM Jikes RVM
  Java JIT compiler
  Tuned for server applications
  Commercial quality
  Used by several hundred researchers
  Over 100 publications, several papers on inlining
Default Inlining Heuristic
Small methods: always inline
Medium-sized methods: use a static heuristic (IBM)
Large methods: never inline
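Default Inlining Heuristic
    // Thresholds: set by the compiler writer (or tuned by a GA); method name is illustrative
    static int CALLEE_MAX_SIZE, ALWAYS_INLINE_SIZE, MAX_INLINE_DEPTH, CALLER_MAX_SIZE;

    static boolean shouldInline(int calleeSize, int callerSize, int inlineDepth) {
        if (calleeSize > CALLEE_MAX_SIZE) return false;    // large callee: never inline
        if (calleeSize < ALWAYS_INLINE_SIZE) return true;  // small callee: always inline
        if (inlineDepth > MAX_INLINE_DEPTH) return false;  // medium callee: limit nesting depth
        if (callerSize > CALLER_MAX_SIZE) return false;    // medium callee: limit caller growth
        return true;
    }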
Genetic Algorithms
Tune the parameters of the IBM heuristic
Individual: a vector of integers
Fitness: benchmark running time
Tuning time: a few hours per benchmark, a few days per suite
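A minimal genetic-algorithm sketch for tuning an integer vector, such as the four thresholds of the default heuristic. Every setting here is an assumption for illustration rather than the configuration used in the study: binary tournament selection, uniform crossover, a 10% per-gene mutation rate that resamples the gene within its bounds, and elitism (the best individual is copied unchanged). In the real setting, `runningTime` would compile and run a benchmark with the candidate thresholds.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Minimal GA sketch for tuning an integer parameter vector (hypothetical settings).
// Lower fitness is better, as with running time.
class GeneticTuner {
    static final double MUTATION_RATE = 0.1;  // assumed per-gene mutation probability

    static int[] evolve(ToDoubleFunction<int[]> runningTime, int[] lo, int[] hi,
                        int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        int[][] pop = new int[popSize][];
        for (int i = 0; i < popSize; i++) pop[i] = randomIndividual(lo, hi, rnd);

        for (int gen = 0; gen < generations; gen++) {
            double[] fit = evaluate(pop, runningTime);
            int[][] next = new int[popSize][];
            next[0] = pop[argMin(fit)].clone();                   // elitism: best survives unchanged
            for (int i = 1; i < popSize; i++) {
                int[] a = pop[tournament(fit, rnd)];
                int[] b = pop[tournament(fit, rnd)];
                int[] child = new int[lo.length];
                for (int g = 0; g < lo.length; g++) {
                    child[g] = rnd.nextBoolean() ? a[g] : b[g];   // uniform crossover
                    if (rnd.nextDouble() < MUTATION_RATE) {      // mutation: resample the gene
                        child[g] = lo[g] + rnd.nextInt(hi[g] - lo[g] + 1);
                    }
                }
                next[i] = child;
            }
            pop = next;
        }
        return pop[argMin(evaluate(pop, runningTime))];
    }

    static int[] randomIndividual(int[] lo, int[] hi, Random rnd) {
        int[] genes = new int[lo.length];
        for (int g = 0; g < lo.length; g++) genes[g] = lo[g] + rnd.nextInt(hi[g] - lo[g] + 1);
        return genes;
    }

    static double[] evaluate(int[][] pop, ToDoubleFunction<int[]> runningTime) {
        double[] fit = new double[pop.length];
        for (int i = 0; i < pop.length; i++) fit[i] = runningTime.applyAsDouble(pop[i]);
        return fit;
    }

    static int argMin(double[] fit) {
        int best = 0;
        for (int i = 1; i < fit.length; i++) if (fit[i] < fit[best]) best = i;
        return best;
    }

    // Binary tournament: pick two random individuals and keep the fitter one.
    static int tournament(double[] fit, Random rnd) {
        int i = rnd.nextInt(fit.length), j = rnd.nextInt(fit.length);
        return fit[i] <= fit[j] ? i : j;
    }
}
```

For experimentation, a synthetic stand-in such as `g -> Math.abs(g[0] - 23) + Math.abs(g[1] - 11) + Math.abs(g[2] - 5) + Math.abs(g[3] - 40)` can replace the benchmark run. Because of elitism, the best fitness in the population never gets worse from one generation to the next.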
Parameters Tuned by GA
Metric to Evaluate an Individual
Genetic Algorithms Primer
Scenarios and Metrics
Scenarios: Adaptive, Optimizing
Metrics: running time, total time (compilation time plus running time)
Experimental Setup
High-performance Java compiler: Jikes RVM 2.3.3
Platforms: Intel Pentium 4, 2.6 GHz; PowerPC G4, 500 MHz (not shown)
Training set: SPEC JVM98 benchmarks
Test set: DaCapo benchmarks + SPEC JBB
Adaptive Scenario (SPEC JVM98)
Adaptive Scenario (DaCapo + JBB)
Optimizing Scenario (SPEC JVM98)
Optimizing Scenario (DaCapo + JBB)
Self-Tuned Results
Conclusions
Outperforms a well-tuned heuristic
  37% total time reduction on Intel
  7% total time reduction on PowerPC
Automatically tunes the compiler heuristic for
  Different compilation scenarios
  Different architectures