Optimizing Compilers for Modern Architectures
Other Applications of Dependence
Allen and Kennedy, Chapter 12
Overview
• So far, we’ve discussed dependence analysis in Fortran
• Dependence analysis can be applied to any language and translation context where arrays and loops are useful
• Application to C and C++
• Application to hardware design
Problems of C
• C as “typed assembly language” versus Fortran as “high performance language”
— C focuses more on ease of use and access to hardware operations
– Post-increments, pre-increments, register variables
— Fortran’s focus is on ease of optimization
Problems of C
• In many cases, optimization is not desired:
while (!(t = *p));
— An optimizer would move the load of *p outside the loop
• C++, as well as other newer languages, focuses more on simplified software development at the expense of optimizability
• Use of new languages has expanded into areas where optimization is required
Problems of C
• Pointers
— The memory locations accessed through a pointer are not clear
• Aliasing
— C does not guarantee that arrays passed into a subroutine do not overlap
• Side-effect operators
— Operators such as pre- and post-increment encourage a style where array operations are strength-reduced by the programmer
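The contrast can be sketched in C (function names are illustrative): both loops compute the same sum, but in the second the programmer has already strength-reduced the subscript into a side-effecting pointer increment, hiding the array access pattern from dependence analysis.

```c
#include <assert.h>

/* Array style: the subscript a[i] is visible to dependence analysis. */
int sum_array(const int a[], int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer style: the subscript has been strength-reduced by the
 * programmer into a post-incremented pointer; the compiler must
 * first recover the induction variable to analyze dependences. */
int sum_pointer(const int *p, int n) {
    int s = 0;
    while (n-- > 0)
        s += *p++;
    return s;
}
```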
Problems of C
• Loops
— Fortran loops provide rules and restrictions that simplify optimization
Pointers
• Two fundamental problems
— A pointer variable can point to different memory locations during its use
— A memory location can be accessed through more than one pointer variable at a given time, producing aliases for the location
• The result is much more difficult and expensive dependence testing
Pointers
• Without knowledge of all possible references to an array, compilers must assume dependence
• Analyzing the entire program to determine dependences is possible, but still unsatisfactory
• This leads to the use of compiler options / pragmas:
— Safe parameters
– All pointer parameters to a function point to independent storage
— Safe pointers
– All pointer variables (parameter, local, global) point to independent storage
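In modern C, the “safe parameters” assertion can be made per function with the C99 `restrict` qualifier; a minimal sketch (the function name is illustrative):

```c
#include <assert.h>

/* The caller promises that a and b do not overlap, so the compiler
 * may assume no dependence between the store to a[i] and the loads
 * of b[i], and reorder or vectorize the loop freely. */
void scale_add(int n, double * restrict a, const double * restrict b) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] + 2.0 * b[i];
}
```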
Naming and Structures
• In Fortran, a block of storage can be uniquely identified by a single name
• Consider these constructs:
p;
*p;
**p;
*(p+4);
*(&p+4);
Naming and Structures
• Troublesome structures, such as unions
— Naming problem
– What is the name of ‘a.b’?
— Different-sized objects can overlap the same storage
– Reduce references to a common unit of the smallest storage possible
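A minimal sketch of the overlap problem (names are illustrative): a write through one union member silently changes the value seen through the other, so the compiler must treat the members as aliases.

```c
#include <assert.h>

/* Different-sized objects overlapping the same storage. */
union overlay {
    unsigned int word;
    unsigned char bytes[sizeof(unsigned int)];
};

unsigned int poke_first_byte(void) {
    union overlay u;
    u.word = 0;          /* clears every byte of the storage */
    u.bytes[0] = 0xFF;   /* overwrites one byte of u.word */
    return u.word;       /* nonzero regardless of byte order */
}
```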
Loops
• Lack of constraints in C
— Jumping into the loop body is permitted
— The induction variable (if there is one) can be modified in the body of the loop
— The loop increment value may also be changed
— The conditions controlling the initiation, increment, and termination of the loop have no constraints on their form
Loops
• To rewrite a C loop as a DO loop:
— It must have one induction variable
— That variable must be initialized with the same value on all paths into the loop
— The variable must have one and only one increment within the loop
— The increment must be executed on every iteration
— The termination condition must test the induction variable
— No jumps into the loop body from outside
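A C loop meeting all of these conditions (function name illustrative) maps directly onto a Fortran DO loop:

```c
#include <assert.h>

/* One induction variable i, initialized once, incremented exactly
 * once per iteration, termination tied to i, no jumps into the
 * body: equivalent to DO i = 1, 4. */
int dot4(const int a[4], const int b[4]) {
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i] * b[i];
    return s;
}
```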
Scoping and Statics
• Create unique symbols for variables with the same name but different scopes
• Static variables
— Which procedures have access to the variable can be determined from the scope information
— If it contains an address, the contents of that address can be modified by any other procedure
Problematic C Dialects
• Use of pointers rather than arrays
• Use of side-effect operators
— Complicates the work of optimizers
— Need to be removed
• Use of address and dereference operators
Problematic C Dialects
• Requires enhancements to some transformations
— Constant propagation
– Treat address operators as constants and propagate them where essential
– Replace a generic pointer inside a dereference with the actual address
— Expression simplification and recognition
– Need stronger recognition, within an expression, of which variable is actually the ‘base variable’
Problematic C Dialects
— Conversion into array references
– It is useful to convert pointer dereferences into array references
— Induction variable substitution
– Must handle strength reduction of array references
– Expanding side-effect operators also requires changes
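A sketch of what the conversion recovers (function names illustrative): both functions double every element, but only the second exposes the array reference a[i] that dependence testing needs.

```c
#include <assert.h>

/* Pointer form: the pointer itself is the induction variable. */
void scale_ptr(int *p, int n) {
    while (n-- > 0)
        *p++ *= 2;
}

/* After conversion / induction variable substitution: the array
 * reference a[i] is explicit and analyzable. */
void scale_arr(int a[], int n) {
    for (int i = 0; i < n; i++)
        a[i] *= 2;
}
```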
C Miscellaneous
• Volatile variables
— Functions that use these variables are best left unoptimized
• Setjmp and longjmp
— Commonly used for error handling
— Store and reload the current state of the computation, which is complex when optimization is performed and variables are allocated to registers
— No optimization
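A minimal sketch of the pattern (function names are illustrative). The local is declared volatile because a compiler that kept it in a register could lose the update made between setjmp and longjmp; this is exactly why optimizers treat such functions conservatively.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

static void fail(void) {
    longjmp(env, 1);            /* unwind back to the setjmp point */
}

int run(void) {
    volatile int state = 0;     /* must survive the longjmp */
    if (setjmp(env) == 0) {     /* first return: value 0 */
        state = 1;
        fail();                 /* error path */
        return -1;              /* never reached */
    }
    return state;               /* resumes here after longjmp */
}
```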
C Miscellaneous
• Varargs and stdargs
— Variable number of arguments
— No optimization
Hardware Design: Overview
• Today, most hardware design is language-based
• Textual description of hardware in languages similar to those used to develop software
• The level of abstraction is moving from low-level detailed implementation toward high-level behavioral specification
• Key factor: compiler technology
Hardware Design: Overview
• Four levels of abstraction
— Circuit / physical level
– Diagrams of electronic components
— Logic level
– Boolean equations
— Register transfer level (RTL)
– Control state transitions and data transfers, timing
– Synthesis: conversion from RTL to its implementation
— System level
– Concentrates on behavior
– Behavioral synthesis
Hardware Design
• Behavioral synthesis is really a compilation problem
• Two fundamental tasks
— Verification
— Implementation
• Simulation of hardware is slow
Hardware Description Languages
• Verilog and VHDL
• Extensions in Verilog
— Multi-valued logic: 0, 1, x, z
– x = unknown state (e.g. a driver conflict), z = high impedance
– E.g. division by zero produces the x state
– Operations on x produce x -> can’t be executed directly
— Reactivity
– Changes are propagated automatically
– “always” statement -> continuous execution
– “@” operator -> blocks execution until one of the operands changes in value
Verilog
— Reactivity
always @(b or c)
  a = b + c;
— Objects
– A specific area of silicon
– A completely separate area on the chip
— Connectivity
– Continuous passing of information
– Input ports and output ports
Verilog
— Connectivity
module add(a, b, c);
output a;
input b, c;
integer a, b, c;
always @(b or c)
a = b + c;
endmodule
Verilog
• Instantiation
— Verilog only allows static instantiation:
integer x, y, z;
add adder1(x, y, z);
• Vector operations
— Viewing other data structures as vectors of scalars
Verilog
• Advantages
— No aliasing
— Restricted form of subscripts
— The entire hardware design is given to the compiler at one time
Verilog
• Disadvantages
— Non-procedural continuation semantics
— Lack of loops
– Loops are implicitly represented by always blocks and the scheduler
— Size
Optimizing simulation
• Philosophy
— Raise the level of abstraction
— Opt for less detail
• Inlining modules
— HDLs have two properties that make module inlining simpler:
– The whole design is reachable at one time
– Recursion is not permitted
Optimizing simulation
• Execution ordering
— The order in which statements are executed can have a dramatic effect on efficiency
— What is fast in hardware is not necessarily fast in software
— Grouping increases performance
— Execute blocks in topological order based on the dependence graph of individual array elements
– No memory overhead
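A toy illustration of statically ordered execution (all names are hypothetical): block i drives the signal that block i+1 reads, so one sweep in dependence (topological) order propagates a change through the whole chain with no change checks and no extra memory.

```c
#include <assert.h>

#define NBLOCKS 3

static int sig[NBLOCKS + 1];   /* sig[i+1] is driven by block i */

static void block(int i) {
    sig[i + 1] = sig[i] + 1;   /* stand-in for an always block */
}

int simulate(int input) {
    sig[0] = input;
    for (int i = 0; i < NBLOCKS; i++)   /* topological order */
        block(i);
    return sig[NBLOCKS];
}
```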
Dynamic versus Static Scheduling
• Dynamic scheduling
— Dynamically tracks changes in values and propagates them
— Mimics hardware
— Overhead of change checks
• Static scheduling
— Blindly sweeps through all values for all objects, regardless of any changes
— No need for change checks
Dynamic versus Static Scheduling
• If the circuit is highly active, static scheduling is more suitable
• In general, using dynamic scheduling guided by static analysis provides the best results
Fusing always blocks
• High cost of change checks motivates fusing always blocks
• The output of a design may change if blocks are fused incorrectly
Vectorizing always block
• Regrouping low-level operations to recover higher-level abstractions
• Vectorizing the bit operations
Two state versus four state
• Extra overhead in simulating four-state logic
• Few people like hardware that enters unknown states
• Two-state logic can be 3-5x faster
• Use two-valued logic wherever possible
• Finding the parts executable in two-state logic is difficult
• Use interprocedural analysis
Two state versus four state
• The test for detecting an unknown is low cost: 2-3 instructions
• Check for unknowns, but default quickly to two-state execution
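One common encoding that makes the check cheap (types and names below are illustrative): each signal carries a value word plus an “unknown” mask, and a single compare decides whether the two-state fast path applies.

```c
#include <assert.h>

typedef struct {
    unsigned val;   /* bit values */
    unsigned unk;   /* bit set => that bit is x */
} sig4;

sig4 and4(sig4 a, sig4 b) {
    sig4 r;
    if ((a.unk | b.unk) == 0) {      /* cheap test for unknowns */
        r.val = a.val & b.val;       /* two-state fast path */
        r.unk = 0;
    } else {
        /* a known 0 on either side forces a known 0 result bit */
        unsigned known0 = (~a.val & ~a.unk) | (~b.val & ~b.unk);
        r.unk = (a.unk | b.unk) & ~known0;
        r.val = a.val & b.val & ~r.unk;
    }
    return r;
}
```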
Rewriting block conditions
always @(posedge(clk)) begin
  sum = op1 ^ op2 ^ c_in;
  c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
end

always @(op1 or op2 or c_in) begin
  t_sum = op1 ^ op2 ^ c_in;
  t_c_out = (op1 & op2) | …
end
always @(posedge(clk)) begin
  sum = t_sum;
  c_out = t_c_out;
end
Basic Optimizations
• Raise level of abstraction
• Constant propagation and dead code elimination
• Common subexpression elimination
Synthesis Optimization
• Goal is to insert the details
• Analogous to standard compilers
• Harder than standard compilation
— Not targeted toward a fixed machine
— No single goal: minimize cycle time, area, power consumption
Basic Framework
• Selection outweighs scheduling
• Analogous to CISC code generation
• Builds on the body of tree-matching algorithms
• Needs constraints
Loop Transformations
for(i=0; i<100; i++) {
  t[i] = 0;
  for(j=0; j<3; j++)
    t[i] = t[i] + (a[i-j] >> 2);
}
for(i=0; i<100; i++) {
  o[i] = 0;
  for(j=0; j<100; j++)
    o[i] = o[i] + m[i][j] * t[j];
}
Loop Transformations
for(i=0; i<100; i++)
  t[i] = 0;
for(i=0; i<100; i++)
  o[i] = 0;
for(i=0; i<100; i++)
  for(j=0; j<3; j++)
    t[i] = t[i] + (a[i-j] >> 2);
for(i=0; i<100; i++)
  for(j=0; j<100; j++)
    o[i] = o[i] + m[i][j] * t[j];
Loop Transformations
for(i=0; i<100; i++)
  o[i] = 0;
for(i=0; i<100; i++) {
  t[i] = 0;
  for(j=0; j<3; j++)
    t[i] = t[i] + (a[i-j] >> 2);
  for(j=0; j<100; j++)
    o[j] = o[j] + m[j][i] * t[i];
}
Loop Transformations
for(i=0; i<100; i++)
  o[i] = 0;
a0 = a[0];
a1 = a[-1];
a2 = a[-2];
a3 = a[-3];
for(i=0; i<100; i++) {
  t = 0;
  t = t + (a0>>2) + (a1>>2) + (a2>>2) + (a3>>2);
  a3 = a2; a2 = a1; a1 = a0; a0 = a[i+1];
  for(j=0; j<100; j++)
    o[j] = o[j] + m[j][i] * t;
}
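The transformation sequence can be checked end-to-end. A hedged sketch: the earlier slides use three taps (j < 3) while the scalar-replaced version uses four scalars a0..a3, so the code below uses four taps throughout, and shrinks the bounds from 100 to 6 to keep the check small; the function names are mine.

```c
#include <assert.h>
#include <string.h>

#define N 6   /* shrunk from 100 */
#define T 4   /* taps, matching the four scalars a0..a3 */

/* Original form: temporary array t[], then matrix-vector product. */
static void original(const int *a, int m[N][N], int o[N]) {
    int t[N];
    for (int i = 0; i < N; i++) {
        t[i] = 0;
        for (int j = 0; j < T; j++)
            t[i] = t[i] + (a[i - j] >> 2);
    }
    for (int i = 0; i < N; i++) {
        o[i] = 0;
        for (int j = 0; j < N; j++)
            o[i] = o[i] + m[i][j] * t[j];
    }
}

/* After distribution, interchange, fusion, and scalar replacement:
 * the array t[] has become a rotating set of four scalars. */
static void transformed(const int *a, int m[N][N], int o[N]) {
    for (int i = 0; i < N; i++)
        o[i] = 0;
    int a0 = a[0], a1 = a[-1], a2 = a[-2], a3 = a[-3];
    for (int i = 0; i < N; i++) {
        int t = (a0 >> 2) + (a1 >> 2) + (a2 >> 2) + (a3 >> 2);
        a3 = a2; a2 = a1; a1 = a0; a0 = a[i + 1];
        for (int j = 0; j < N; j++)
            o[j] = o[j] + m[j][i] * t;
    }
}
```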
Control and Data Flow
• Von Neumann architecture
— Data movement among memory and registers
— Control flow encapsulated in the program counter and effected with branches
• Synthesized hardware
— Data movement among functional units
— Control flow is which functional unit should be active on what data at which time step
Control and Data Flow
• Wires
— Immediate transfer
• Latches
— Values held throughout one clock cycle
• Registers
— Like static variables in C
— Values held for one or more clock cycles
• Memories
Memory Reduction
• Memory access is slow compared to functional-unit access
• Application of techniques:
— Loop interchange
— Loop fusion
— Scalar replacement
— Strip mining
— Unroll and jam
— Prefetching
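As one example from the list above, a scalar-replacement sketch in C (names illustrative): the reference a[i] is invariant in the inner loop, so it is loaded into a scalar before the loop and stored back after, trading repeated memory accesses for one load and one store per outer iteration.

```c
#include <assert.h>

#define N 4

/* Matrix-vector accumulate with a[i] scalar-replaced by t. */
void matvec(int a[N], int m[N][N], const int x[N]) {
    for (int i = 0; i < N; i++) {
        int t = a[i];                 /* load once */
        for (int j = 0; j < N; j++)
            t = t + m[i][j] * x[j];   /* was: a[i] = a[i] + ... */
        a[i] = t;                     /* store once */
    }
}
```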