Order from Chaos — the big picture — 1COMP 512, Rice University Copyright 2011, Keith D. Cooper...
-
Upload
aileen-golden -
Category
Documents
-
view
218 -
download
0
Transcript of Order from Chaos — the big picture — 1COMP 512, Rice University Copyright 2011, Keith D. Cooper...
Order from Chaos
— the big picture —
1COMP 512, Rice University
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Comp 512Spring 2011
COMP 512, Rice University
2
Optimization
The subject is confusing
• Whole notion of optimality
• Incredible number of transformations
• Odd, inconsistent terminology
Maybe this stuff is inherently hard
• Many intractable problems
• Many NP-complete problems
• Much overlap between problems and between solutions
• If optimization wasn’t confusing, why take COMP 512 ?
{ Value numberingRedundancy eliminationCommon subexpressions
Cooper McKinley, & Torczon cite 237 distinct papers in the survey!
COMP 512, Rice University
3
Optimization
A brief catalog (circa 1982 )
And this was before the literature exploded
COMP 512, Rice University
4
Optimization
The literature throws fuel on the fire
• Terminology is non-standard & non-intuitive
• Explanations are terse and incomplete
• Little comparative data that is believable
• No sense of perspective
• Papers give conflicting advice
An example – Is inline substitution profitable?
• Holler’s thesis: it almost always helps
• Hall’s thesis: it occasionally helps, but has lots of problems
• MacFarland’s thesis: it causes instruction cache misses
Reality lies somewhere in the middle
Waterman showed that program-specific heuristics win
{ RevivalPartially-dead codeForward propagation
} Not all those papers can be the best !
COMP 512, Rice University
5
Optimization
Improvement should be objective
• Easy to quantify
• Produce concrete improvements
• Taking measurements seems easy
Code either gets better or it gets worse
But, …
• Linear-time heuristics for hard problems
• Unforeseen consequences & poorly understood interactions
• “Obvious wins” have non-obvious downsides
• Multiple ways to achieve the same end
Experimental computer science takes a lot of work
COMP 512, Rice University
6
The Role of Comp 512
Bringing order out of chaos
• Provide a framework for thinking about optimization
• Differentiate analysis from transformation†
• Think about how things help, not what they do
Goal: a rational approach to the subject matter
• Objective criteria for evaluating ideas & papers
• Bring high school level science back into the game
†The Comp 512 Motto:Knowledge alone does not make code run faster. You have to change the code to make it run faster.
COMP 512, Rice University
7
Classic Taxonomy
Machine independent transformations
• Applicable across a broad range of machines
• Decrease ratio of overhead to real work
• Reduce running time or space
• Examples: dead code elimination
Machine dependent transformations
• Capitalize on specific machine properties
• Improve the mapping from IR to this machine
• Might use an exotic instruction (shift the reg. window for a loop)
• Example: instruction scheduling
COMP 512, Rice University
8
Classic Taxonomy
Distinction is not always clear
• Replacing multiply with shifts and adds
• Eliminating a redundant expression
The truth is somewhat muddled
• Machine independent means that we deliberately & knowingly ignore target-specific constraints
• Machine dependent means that we explicitly consider target-specific constraints
Redundancy elimination might fit in either category Versions that consider register pressure
COMP 512, Rice University
9
The Comp 512 Taxonomy
An effects-based classification (for speed)
• Five machine-independent ways to speed up code Eliminate a redundant computation Move code to a place where it executes less often Eliminate dead code Specialize a computation based on context Enable another transformation
• Three machine-dependent ways to speed up the code Manage or hide latency Take advantage of special hardware features Manage finite resources
For scalar optimization, this covers most of them
COMP 512, Rice University
10
Dead code
Dead code elim.
Partial d.c.e.
Constant propagation
Algebraic identities
The Comp 512 Taxonomy
Machine Independent
From §6 of Cooper, McKinley, & Torczon
Redundancy
Redundancy elim.
Partial red. elim.
Consolidation
Code motion
Loop-invariant c.m.
Consolidation
Global Scheduling [Click]
Constant propagation
Create opportunities
Reassociation
Replication
Specialization
Replication
Strength Reduction
Method caching
Heapstack allocation
Tail recursion elimination
COMP 512, Rice University
11
The Comp 512 Taxonomy
Machine Dependent
From §6 of Cooper, McKinley, & Torczon
Hide latency
Scheduling
Blocking references
Prefetching
Code layout
Data packing
Manage resources
Allocate (registers, tlb slots)
Schedule
Data packing
Coloring memory locations
Special features
Instruction selection
Peephole optimization
COMP 512, Rice University
12
What have we seen so far?
• Redundancy elimination LVN, SVN, DVN It is a category in taxonomy by itself
• Loop Unrolling Form of specialization
• Dead store elimination Form of dead code elimination
• Block Placement Form of latency hiding & resource management
• Inline substitution Form of specialization
COMP 512, Rice University
13
What about Fortran H? (Lecture 2)
• Eliminate a redundant computation Commoning
• Move code to a place where it executes less often Backward motion
• Eliminate dead code (must have done it, but don’t talk about it)
• Specialize a computation based on context Strength reduction
• Enable another transformation Reassociation
• Manage or hide latency
• Take advantage of special hardware features
• Manage finite resources Register allocation
}Didn’t really talk about these, if they did them
*
COMP 512, Rice University
14
Scope of Optimization (another axis)
Local
• Handles individual basic blocks Maximal length sequence of straight line code Each of the bi’s is a basic block
• Basic blocks are easy to analyze
• Can prove strongest results
• Code quality suffers at block boundaries
Local methods
• Value numbering, instruction scheduling
b4 b5
b6
b1
b3b2
*
COMP 512, Rice University
15
Superlocal
• Handles extended basic blocks
Sequence of blocks where each has a unique predecessor
Use results for bi to help with bj
• Analysis & transformation over larger region
• Fewer rough edges
• Can make it efficient by reusing results
Superlocal methods
• Value numbering, instruction scheduling
Scope of Optimization
b4 b5
b6
b1
b3b2
*
COMP 512, Rice University
16
Regional
• Arbitrary subset of blocks (loop nests, dominator subtrees)
• Use results from one block to improve others
• Limiting scope can increase focus on performance critical regions
• Can eliminate some global impediments
Regional methods
• Loop xforms (unroll, fuse, interchange,strip mining, blocking), DVNT, OSR,register promotion, prefetch insertion,software pipelining, trace scheduling ...
Scope of Optimization
b1
b2
b3
b4
b5
Remember Fortran H
*
COMP 512, Rice University
17
Whole procedure (global or intraprocedural )
• Handles entire procedure
• Make decisions based on global knowledge & global benefit
• No rough edges inside procedure (tied to compilation unit)
• Classic data-flow analysis
Global methods
• CSE (AVAIL, PRE, LCM), constant propagation, GCRA, dead code elim.,hoisting, copy coalescing, ...
Scope of Optimization
b4 b5
b6
b1
b3b2
COMP 512, Rice University
18
Whole program (interprocedural )
• Handles more than one procedure, up to entire program
• Create even larger scopes for optimization
• Limited interactions between procedures Parameters + global variables
• Analysis problems are harder
• Opportunities are different
• Issues for compiler structure
Interprocedural methods
• Inline substitution, procedure cloning,
constant propagation, using whole
program analysis to support global xforms
Scope of Optimization
b4 b5
b6
b1
b3b2
b4 b5
b6
b1
b3b2
b4 b5
b6
b1
b3b2
COMP 512, Rice University
19
Decision Complexity
Yet another axis on which to categorize optimizations
Constant timeLow-order
polynomial timeHard Problems
LVN, SVN, DVNT
Block placement
Tree balancing (ILP)
Loop unrolling Reassociation
Inline substitution
Spill Choice in RA
Copy Coalescing
COMP 512, Rice University
20
Near-term Roadmap
Data-flow analysis
• Deeper treatment of iterative data-flow theory
• Quick look at other solvers
Static single assignment form
• Construction and destruction of SSA
• Example algorithms that use SSA
Populating the Taxonomy
• More transformations, more transformations, more …
• Try to fill in the odd corners of the taxonomy