A High Performance Application Representation for Reconfigurable Systems
Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering
University of California
Santa Barbara, CA 93106-9560
{gong, wanggang, kastner}@ece.ucsb.edu
http://express.ece.ucsb.edu
June 22, 2004
6/21/2004
GONG et al: A High Performance Application Representation for Reconfigurable Systems 2
Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
Outline
- Reconfigurable computing systems
  - Reconfigurable computing systems
  - Challenges of application representations
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
Reconfigurable Computing Systems
- Standard programmable platforms
- Post-manufacturing customization
- Designs shift from physical chips to configuration files
- A software design flow
- Feature hardware speed with software flexibility
- Enable higher productivity
Application Representations
- A common application representation is needed to tame the complexity of system synthesis
- Requirements
  - Able to generate software code for microprocessors
  - Able to be easily translated to hardware configuration files
  - Allow a variety of transformations and optimizations to exploit the performance
Parallelism Exploration
- Fine grain parallelism
  - Multiple functional units
  - Issuing an operation to a free functional unit
  - Operations executed independently
- Coarse grain parallelism
  - Executing multiple threads
  - With occasional synchronization
- Reconfigurable computing systems support both fine and coarse grain parallelism
PDG + SSA
The PDG + SSA representation can be used for both hardware synthesis and software generation
The PDG and SSA forms are common representations for software generation
Here we concentrate on hardware synthesis
Outline
- Reconfigurable computing systems
- Compilation process
  - Overview
  - Constructing the PDG
  - Incorporating the SSA form
- Synthesizing to hardware
- Experimental results
- Concluding remarks
Overview
Program Dependence Graph
- PDG: Program Dependence Graph
- ENTRY node: the root node of a PDG
- PREDICATE nodes: producing predicate values from expressions
  - Diamond-shaped nodes 2, 3, and 4
- STATEMENTS nodes: an arbitrary set of operations
  - Circle nodes: 1, 4, 6, 7, and 8
- REGION nodes: summarizing all operations with the same control conditions together
  - House-shaped nodes R2, R3, R4 …
  - R3: the predicate value of 2 is True
- Edges represent dependencies
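The node and edge kinds listed above could be held in a structure like the following. This is only a minimal sketch: the type names, the fixed-size arrays, and the helper functions (`pdg_add_node`, `pdg_add_edge`, `pdg_count_children`) are illustrative assumptions, not the data structures used by the authors.

```c
/* Node kinds named on the slide. */
typedef enum { PDG_ENTRY, PDG_PREDICATE, PDG_STATEMENTS, PDG_REGION } PdgKind;

/* Edges represent dependencies: control or data (assumed split). */
typedef enum { PDG_CONTROL, PDG_DATA } PdgEdgeKind;

#define PDG_MAX_NODES 64   /* hypothetical capacity */
#define PDG_MAX_EDGES 256  /* hypothetical capacity */

typedef struct {
    PdgKind kind;
    const char *label;     /* e.g. "2" or "R3" */
} PdgNode;

typedef struct {
    int from, to;
    PdgEdgeKind kind;
    int when_true;         /* control edge leaving a PREDICATE: 1 if taken
                              when the predicate is true, 0 otherwise */
} PdgEdge;

typedef struct {
    PdgNode nodes[PDG_MAX_NODES];
    PdgEdge edges[PDG_MAX_EDGES];
    int n_nodes, n_edges;
} Pdg;

/* Append a node; returns its index, or -1 if the graph is full. */
int pdg_add_node(Pdg *g, PdgKind kind, const char *label) {
    if (g->n_nodes >= PDG_MAX_NODES) return -1;
    g->nodes[g->n_nodes].kind = kind;
    g->nodes[g->n_nodes].label = label;
    return g->n_nodes++;
}

/* Append an edge; returns its index, or -1 on a full graph or bad endpoint. */
int pdg_add_edge(Pdg *g, int from, int to, PdgEdgeKind kind, int when_true) {
    if (g->n_edges >= PDG_MAX_EDGES) return -1;
    if (from < 0 || from >= g->n_nodes || to < 0 || to >= g->n_nodes) return -1;
    PdgEdge e = { from, to, kind, when_true };
    g->edges[g->n_edges] = e;
    return g->n_edges++;
}

/* Count the nodes that are directly control dependent on `node`. */
int pdg_count_children(const Pdg *g, int node) {
    int count = 0;
    for (int i = 0; i < g->n_edges; ++i)
        if (g->edges[i].kind == PDG_CONTROL && g->edges[i].from == node)
            ++count;
    return count;
}
```

In this sketch, a REGION such as R3 would be attached to PREDICATE node 2 by a control edge with `when_true = 1`, and the statements executed under that condition would hang off R3.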
Constructing the PDG from the CDFG
- Implemented based on Ferrante’s algorithm
- Using the post-dominator tree

    val = pred;
    for (i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767)
            val = 32767;
        else if (val < -32768)
            val = -32768;
    }
    return val;
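Control dependences can be derived from post-dominance information as sketched below. This is a simplified, set-based illustration on a small control flow graph, not the authors' implementation: the `Cfg` type, the bitset encoding, and both function names are assumptions. It also assumes at most 32 nodes, that the exit node has the highest index, and that every node can reach the exit.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 32

typedef struct {
    int n;                     /* number of nodes; node n-1 is the exit */
    uint32_t succ[MAX_NODES];  /* succ[i] has bit j set for edge i -> j */
} Cfg;

static uint32_t bit(int i) { return (uint32_t)1 << i; }

/* Iterative dataflow: pdom[i] is the set of nodes that post-dominate i
   (every node post-dominates itself). */
void post_dominators(const Cfg *g, uint32_t pdom[]) {
    const int exit_node = g->n - 1;
    const uint32_t all = (g->n >= 32) ? UINT32_MAX : (bit(g->n) - 1u);
    for (int i = 0; i < g->n; ++i) pdom[i] = all;
    pdom[exit_node] = bit(exit_node);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < exit_node; ++i) {
            uint32_t meet = all;          /* intersect over all successors */
            for (int s = 0; s < g->n; ++s)
                if (g->succ[i] & bit(s)) meet &= pdom[s];
            const uint32_t next = meet | bit(i);
            if (next != pdom[i]) { pdom[i] = next; changed = true; }
        }
    }
}

/* cd[y] has bit x set when y is control dependent on x: y post-dominates
   some successor of x but does not strictly post-dominate x. */
void control_dependence(const Cfg *g, const uint32_t pdom[], uint32_t cd[]) {
    for (int y = 0; y < g->n; ++y) cd[y] = 0;
    for (int x = 0; x < g->n; ++x) {
        const uint32_t strict = pdom[x] & ~bit(x);
        for (int s = 0; s < g->n; ++s) {
            if (!(g->succ[x] & bit(s))) continue;
            const uint32_t dependents = pdom[s] & ~strict;
            for (int y = 0; y < g->n; ++y)
                if (dependents & bit(y)) cd[y] |= bit(x);
        }
    }
}
```

For an if/else diamond (node 0 branches to nodes 1 and 2, which both flow to exit node 3), nodes 1 and 2 come out control dependent on node 0, while node 3 depends on nothing.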
Constructing the PDG (cont’d)
The Static Single Assignment Form
- Each variable has exactly one assignment
- A variable is always referenced using the same name
- At join points of control conditions, special Ø nodes are inserted

Original form:

    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;

SSA form:

    val_2 = val_1 + diff;
    if (val_2 > 32767)
        val_3 = 32767;
    else if (val_2 < -32768)
        val_4 = -32768;
    val_5 = phi(val_2, val_3, val_4);
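The `phi` in the SSA fragment is not an executable C operation. One way to make the fragment runnable, shown here only as an illustrative sketch, is to resolve the Ø-node with the two branch predicates. The function names and the predicate names `p_hi` and `p_lo` are assumptions introduced for this example.

```c
/* Original (non-SSA) form of the saturating update from the slide. */
int clamp_step(int val, int diff) {
    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;
    return val;
}

/* SSA-style form: every val_k is assigned exactly once. */
int clamp_step_ssa(int val_1, int diff) {
    const int val_2 = val_1 + diff;
    const int p_hi = val_2 > 32767;    /* predicate of the first branch */
    const int p_lo = val_2 < -32768;   /* predicate of the second branch */
    const int val_3 = 32767;           /* computed whether or not it is used */
    const int val_4 = -32768;
    /* phi(val_2, val_3, val_4): commit one input according to the predicates */
    const int val_5 = p_hi ? val_3 : (p_lo ? val_4 : val_2);
    return val_5;
}
```

Both functions return the same value for any inputs whose sum does not overflow `int`; for example, `clamp_step_ssa(32000, 1000)` saturates to 32767.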
Extending the PDG with Ø-Nodes
The Program Representation
- Loop-independent Ø-nodes
  - Taking two or more input values and a predicate value
  - Committing one of the inputs depending on this predicate
- Loop-carried Ø-nodes
  - Inputs: the initial value, the loop-carried value, and a predicate value
  - Outputs: one to the iteration body, and the other to the loop exit
  - Directing proper values to proper outputs
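The behavior of a loop-carried Ø-node can be modeled in software as sketched below, using the saturating accumulation loop from the PDG construction slide. The slide does not say how the node chooses between the initial and the loop-carried input, so the `first` flag is an assumption, as are the `LoopPhiOut` type and both function names.

```c
#include <stdbool.h>

typedef struct {
    bool to_body;  /* true: value goes to the iteration body; false: loop exit */
    int value;
} LoopPhiOut;

/* Loop-carried phi: pick the initial value on entry (assumed `first` flag),
   otherwise the loop-carried value, then route it using the loop predicate. */
LoopPhiOut loop_phi(bool first, int init, int carried, bool continue_pred) {
    LoopPhiOut out;
    out.value = first ? init : carried;
    out.to_body = continue_pred;
    return out;
}

/* Models: val = pred; for (i = 0; i < len; ++i) { clamp(val += diff); } */
int run_clamped_sum(int pred, int diff, int len) {
    int i_carried = 0, val_carried = 0;
    bool first = true;
    for (;;) {
        const int i = first ? 0 : i_carried;   /* loop-carried phi for i */
        const bool p = i < len;                /* loop predicate */
        const LoopPhiOut v = loop_phi(first, pred, val_carried, p);
        if (!v.to_body)
            return v.value;                    /* value sent to the loop exit */
        const int val_2 = v.value + diff;      /* iteration body */
        val_carried = val_2 > 32767 ? 32767
                    : (val_2 < -32768 ? -32768 : val_2);
        i_carried = i + 1;
        first = false;
    }
}
```

With `len` equal to 0 the initial value is sent straight to the loop exit, so `run_clamped_sum(5, 3, 0)` returns 5.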
Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
  - Data-path elements
  - Ø-nodes
- Experimental results
- Concluding remarks
Synthesizing the Data-Path
- A one-to-one mapping is used
  - Different resource allocation and binding algorithms can be used (on-going work)
- Each operation has an operator and several operands
  - Operands are synthesized directly to wires in the circuit
  - Each variable in the SSA form has only one definition point
- PREDICATE nodes: synthesized to Boolean logic signals to control next-stage transitions and direct multiplexers to commit the correct value
Synthesizing Ø-nodes
- A loop-independent Ø-node is synthesized to a multiplexer. The multiplexer selects input values depending on the predicate values.
- For a loop-carried Ø-node, an additional switch is generated to direct the loop-exiting values.
Synthesize to Hardware
- Simplifications and optimizations
  - Removing unnecessary control dependencies
  - Cascading/expanding multipliers to obtain better performance
- Flip-flops are inserted
  - Guarantee that correct values will be available no matter which execution path is taken
Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
  - Setup and benchmarks
  - Results
- Concluding remarks
Setup and Benchmarks
- Benchmark suites
  - Functions from the MediaBench suite
  - Profiled using sample data
  - Only report conservative results
- Estimated execution time
  - Aggressive predicated execution
  - Only report conservative results
- Area
  - One-to-one mapping without resource sharing
  - Reported in numbers of FPGA slices
Estimated Execution Time
Estimated Execution Time (cont’d)
Estimated FPGA Area
Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
  - On-going/future work
Concluding Remarks
The PDG+SSA form supports a variety of transformations and enables both coarse and fine grain parallelism
A method to synthesize this form to hardware
This form gives faster execution time using similar area when compared with CFG and PSSA forms
On-going/Future work
Investigate transformations to create coarse grained parallelism using the PDG+SSA form
Augment the PDG+SSA form with architectural information to provide fast estimation.
Integrate resource sharing and other architectural synthesis techniques
Thank You
- Prof Ryan Kastner and Gang Wang
- All audiences