High Level Synthesis - islab.soe.uoguelph.ca
1
High Level Synthesis
Computer Aided Design for Reconfigurable Computer Systems
By: Garrett Reynolds
2
Contents
• What is HLS and why use HLS
• Synthesis – Scheduling & Allocation
• Problems
• HLS with Reconfigurable Datapath Components
• Resources
3
What is HLS
• HLS is the process of taking a behavioral description and automatically translating it into a structural description at the register transfer level (RTL)
• The process must take both user constraints and hardware constraints into consideration
• The structural description consists of functional units (FUs), memory elements and interconnections
4
Why use HLS?
• Shorter design cycle
– Get designs out the door sooner, at lower cost
• Fewer errors
• Large variety of solutions
– The designer can choose a design depending on different trade-offs
• Self-documenting
• Makes the technology available to more people
5
Synthesis
• Synthesis is broken down into 2 main tasks: scheduling and allocation
• Before synthesis can take place, an internal representation must be built from the program written in the input language
• The internal representation is usually a graph that captures both data flow and control flow
6
Synthesis
• A data flow graph (DFG) is a common way to represent the internal structure
• Operations are shown as nodes, and two nodes are linked if there is a data dependency between them
• Example: Y = max((A shr 1) + (B - (B shr 3)), B)
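As a rough illustration, the DFG for the example expression can be held as a dictionary from node to (operation, operands) and evaluated in topological order. The node names n1 to n5 and the OPS table are hypothetical. Each operation is assumed to take one control step, so the longest dependency chain gives the minimum number of c-steps when FUs are unlimited. The sketch needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical DFG for Y = max((A shr 1) + (B - (B shr 3)), B).
# node -> (operation, operands); operands are primary inputs or other nodes.
DFG = {
    "n1": ("shr1", ["A"]),        # A shr 1
    "n2": ("shr3", ["B"]),        # B shr 3
    "n3": ("sub", ["B", "n2"]),   # B - (B shr 3)
    "n4": ("add", ["n1", "n3"]),  # (A shr 1) + (B - (B shr 3))
    "n5": ("max", ["n4", "B"]),   # Y
}
OPS = {
    "shr1": lambda a: a >> 1,
    "shr3": lambda a: a >> 3,
    "sub": lambda a, b: a - b,
    "add": lambda a, b: a + b,
    "max": max,
}

def predecessors(node):
    """Data-dependency predecessors: operands that are themselves DFG nodes."""
    return [src for src in DFG[node][1] if src in DFG]

def topo_order():
    """Nodes ordered so every node comes after the nodes it depends on."""
    return list(TopologicalSorter({n: predecessors(n) for n in DFG}).static_order())

def evaluate(inputs):
    """Evaluate the DFG for the given primary inputs; returns all node values."""
    values = dict(inputs)
    for node in topo_order():
        op, srcs = DFG[node]
        values[node] = OPS[op](*(values[s] for s in srcs))
    return values

def critical_path_length():
    """Longest chain of dependent operations, assuming one c-step per operation."""
    depth = {}
    for node in topo_order():
        depth[node] = 1 + max((depth[p] for p in predecessors(node)), default=0)
    return max(depth.values())

print(evaluate({"A": 40, "B": 64})["n5"])
print(critical_path_length())
```

The chain B shr 3, subtract, add, max is the critical path, so this DFG needs at least 4 c-steps.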
7
Synthesis
• The DFG does not represent control flow such as loops and branches
• The DFG can be augmented with control nodes to form a control data flow graph (CDFG)
8
Scheduling
• The aim is to reduce the number of control steps (c-steps) needed to complete the program
• The first figure would require 23 control steps, whereas the second figure would require only 10
9
Scheduling
• To ensure efficient scheduling, a designer has to consider the interaction with allocation as well as the type of scheduling algorithm that will be used
– To schedule two operations in the same c-step, the designer needs to know whether they use the same FU
– An efficient schedule requires knowing the delays of the operations, and these are only known once both the FUs and the interconnections are defined
– The designer needs to know which operations can be done in parallel in order to know how many FUs should be used
10
Scheduling
• To overcome this problem, many different solutions have been proposed: limiting the number of FUs, scheduling and allocating at the same time, or allocating first
• Most scheduling algorithms fit into 2 general categories:
– Transformational and iterative/constructive
• Transformational algorithms can be very computationally expensive but usually reach a better solution
11
Scheduling
• Very common constructive algorithms are ASAP (as soon as possible) and ALAP (as late as possible) scheduling
• For ASAP, each operation is taken from the graph and placed in the earliest control step possible; ALAP places each operation in the latest control step that still meets the overall latency
• These approaches are very general and sometimes end up giving a longer-than-necessary solution
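A minimal ASAP/ALAP sketch for the same example expression, assuming unit-delay operations, unlimited FUs, and hypothetical node names n1 to n5. ALAP uses the ASAP latency as its deadline, and the difference ALAP - ASAP is each operation's mobility (its scheduling slack).

```python
# Example DFG (hypothetical names): node -> data predecessors.
PREDS = {
    "n1": [],            # A shr 1
    "n2": [],            # B shr 3
    "n3": ["n2"],        # B - (B shr 3)
    "n4": ["n1", "n3"],  # (A shr 1) + (B - (B shr 3))
    "n5": ["n4"],        # max(..., B) = Y
}

def successors(preds):
    """Invert the predecessor lists."""
    succ = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succ[p].append(n)
    return succ

def asap(preds):
    """Earliest c-step: one step after the latest predecessor (step 1 if none)."""
    step = {}
    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return step[n]
    for n in preds:
        visit(n)
    return step

def alap(preds, latency):
    """Latest c-step: one step before the earliest successor (latency if none)."""
    succ = successors(preds)
    step = {}
    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succ[n]), default=latency + 1) - 1
        return step[n]
    for n in preds:
        visit(n)
    return step

early = asap(PREDS)
late = alap(PREDS, latency=max(early.values()))
mobility = {n: late[n] - early[n] for n in PREDS}
print(early)
print(late)
print(mobility)
```

Only n1 (A shr 1) has slack: it can run in c-step 1 or 2 without lengthening the schedule.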
12
Scheduling
• To solve this problem, list scheduling can be used: each operation is given a priority criterion that is used to decide when it should be scheduled
• This criterion can be chosen from a number of different options, depending on the solution required
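A sketch of resource-constrained list scheduling for the example DFG. The assumptions are not from the slides: one shifter and one ALU, unit-delay operations, and "longest path to the end of the graph" as the priority criterion, which is only one of the possible options. Running the same scheduler with a flat priority shows how much the criterion matters.

```python
# Example DFG (hypothetical names): node -> (FU type, data predecessors).
DFG = {
    "n1": ("shifter", []),        # A shr 1
    "n2": ("shifter", []),        # B shr 3
    "n3": ("alu", ["n2"]),        # B - (B shr 3)
    "n4": ("alu", ["n1", "n3"]),  # add
    "n5": ("alu", ["n4"]),        # max
}

def path_to_sink(dfg):
    """Priority: length of the longest path from a node to the end of the graph,
    counting the node itself (nodes on the critical path score highest)."""
    succ = {n: [] for n in dfg}
    for n, (_, preds) in dfg.items():
        for p in preds:
            succ[p].append(n)
    memo = {}
    def visit(n):
        if n not in memo:
            memo[n] = 1 + max((visit(s) for s in succ[n]), default=0)
        return memo[n]
    return {n: visit(n) for n in dfg}

def list_schedule(dfg, resources, priority):
    """Fill each c-step with ready operations, highest priority first,
    without exceeding the number of FUs of each type."""
    step_of = {}
    step = 0
    while len(step_of) < len(dfg):
        step += 1
        # Ready: unscheduled ops whose predecessors all finished in earlier steps.
        ready = [n for n, (_, preds) in dfg.items()
                 if n not in step_of and all(step_of.get(p, step) < step for p in preds)]
        ready.sort(key=lambda n: priority[n], reverse=True)
        used = {fu: 0 for fu in resources}
        for n in ready:
            fu = dfg[n][0]
            if used[fu] < resources[fu]:
                used[fu] += 1
                step_of[n] = step
    return step_of

resources = {"shifter": 1, "alu": 1}
good = list_schedule(DFG, resources, path_to_sink(DFG))
naive = list_schedule(DFG, resources, {n: 0 for n in DFG})
print(good)
print(max(good.values()), max(naive.values()))
```

With the path-based priority, B shr 3 gets the single shifter first and the schedule keeps its 4-step critical-path length; with a flat priority, A shr 1 takes the shifter first and the schedule grows to 5 steps.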
13
Allocation
• 3 main goals
– Map operations to FUs
– Assign values to registers
– Provide the interconnections from the registers to the FUs via buses or multiplexers
• Concentrate on optimizing some cost while considering the user constraints, such as:
– Interconnect length
– Register, bus or multiplexer cost
– Critical path delay
• Operations can share the same FU as long as they are mutually exclusive (e.g. scheduled in different c-steps or in mutually exclusive branches); the same applies to memory and interconnect
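The sharing rule for memory can be illustrated for registers with the left-edge algorithm, a classic register-allocation heuristic that the slide does not name: two values may share a register if their lifetimes do not overlap. Assumptions: the schedule below is one valid schedule of the example DFG with one shifter and one ALU, a value occupies its register from the c-step that produces it to the last c-step that reads it, a register may be rewritten in the same c-step its old value is last read, and the primary inputs A, B and the output Y are left out.

```python
# Example DFG (hypothetical names): node -> data predecessors.
PREDS = {"n1": [], "n2": [], "n3": ["n2"], "n4": ["n1", "n3"], "n5": ["n4"]}
# One valid schedule with 1 shifter and 1 ALU: node -> c-step.
SCHEDULE = {"n2": 1, "n1": 2, "n3": 2, "n4": 3, "n5": 4}

def lifetimes_from_schedule(preds, schedule):
    """Value lifetime = (c-step that writes it, last c-step that reads it).
    Values never read inside the DFG (the output Y) are skipped."""
    life = {}
    for node, srcs in preds.items():
        for src in srcs:
            write, last_read = life.get(src, (schedule[src], schedule[src]))
            life[src] = (write, max(last_read, schedule[node]))
    return life

def left_edge(lifetimes):
    """Left-edge algorithm: sort values by write step, then fill one register at a
    time with non-overlapping values; returns a list of registers."""
    pending = sorted(lifetimes, key=lambda v: lifetimes[v][0])
    registers = []
    while pending:
        register, end, rest = [], None, []
        for value in pending:
            start, stop = lifetimes[value]
            # Mutually exclusive in time: the new value is written no earlier
            # than the c-step in which the previous value is last read.
            if end is None or start >= end:
                register.append(value)
                end = stop
            else:
                rest.append(value)
        registers.append(register)
        pending = rest
    return registers

regs = left_edge(lifetimes_from_schedule(PREDS, SCHEDULE))
for i, values in enumerate(regs):
    print(f"R{i}: {values}")
```

Four intermediate values fit into two registers, because at most two of them (n1 and n3) are live at the same time.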
14
Allocation
• Allocation methods fall into 2 categories: iterative/constructive and global
• The iterative technique chooses an operation, value or interconnect, makes the assignment, and repeats until the graph has been entirely covered
• How it makes each specific choice is decided by a set of rules from the designer
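A minimal sketch of an iterative/constructive binding step for operations. The designer's rule assumed here is hypothetical: visit operations in c-step order and reuse the first FU instance of the right type that is idle in that c-step, creating a new instance only when none is idle. The schedule (the unconstrained ASAP schedule of the example) and the shifter/ALU split are also assumptions.

```python
# ASAP schedule of the example DFG (hypothetical names): op -> c-step.
SCHEDULE = {"n1": 1, "n2": 1, "n3": 2, "n4": 3, "n5": 4}
# FU type needed by each operation (assumed: shifts on a shifter, the rest on an ALU).
FU_TYPE = {"n1": "shifter", "n2": "shifter", "n3": "alu", "n4": "alu", "n5": "alu"}

def iterative_bind(schedule, fu_type):
    """Assign each operation, one at a time, to an FU instance (type, index)."""
    instances = {}  # FU type -> list of busy-c-step sets, one set per instance
    binding = {}
    for op in sorted(schedule, key=lambda o: schedule[o]):
        kind, step = fu_type[op], schedule[op]
        pool = instances.setdefault(kind, [])
        for idx, busy in enumerate(pool):
            if step not in busy:  # rule: reuse the first idle instance
                busy.add(step)
                binding[op] = (kind, idx)
                break
        else:  # no idle instance of this type: allocate a new one
            pool.append({step})
            binding[op] = (kind, len(pool) - 1)
    return binding

binding = iterative_bind(SCHEDULE, FU_TYPE)
for op, (kind, idx) in sorted(binding.items()):
    print(f"{op} -> {kind}{idx}")
```

This rule binds the example onto two shifters and one ALU. A different rule, for example one that tries to reduce multiplexer inputs, would give a different datapath from the same schedule.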
15
Allocation
• Global allocation mainly uses graphs to formulate an optimal solution
• These graphs consist of nodes representing operations, values or connections; two nodes are connected by an arc if they are mutually exclusive, meaning they can share the same hardware
16
Allocation
• Once the graph is created, cliques can be found; each clique becomes one shared FU, register or bus
• If the objective is to minimize the hardware used, the algorithm must find the smallest number of cliques that covers the graph
• Finding the optimal solution is very costly, so these algorithms are usually greedy
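The clique approach can be sketched for FU allocation. The compatibility graph joins operations that need the same FU type and run in different c-steps, and each clique in the cover becomes one shared FU. The partitioning below is a simple greedy heuristic chosen for illustration, not a specific published algorithm. Finding the minimum clique cover is NP-hard in general, so the heuristic is not guaranteed to reach the optimum. Node names, FU types and c-steps are hypothetical and follow the ASAP schedule of the example DFG.

```python
from itertools import combinations

# Example operations (hypothetical): name -> (FU type, c-step).
OPS = {
    "n1": ("shifter", 1),
    "n2": ("shifter", 1),
    "n3": ("alu", 2),
    "n4": ("alu", 3),
    "n5": ("alu", 4),
}

def compatibility_graph(ops):
    """Arc between two operations if they can share one FU:
    same FU type and different c-steps."""
    adj = {n: set() for n in ops}
    for a, b in combinations(ops, 2):
        (ta, sa), (tb, sb) = ops[a], ops[b]
        if ta == tb and sa != sb:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_clique_partition(adj):
    """Seed each clique with the remaining node of highest degree, then add
    every remaining node that is adjacent to all current members."""
    remaining = set(adj)
    cliques = []
    while remaining:
        seed = max(sorted(remaining), key=lambda n: len(adj[n] & remaining))
        clique = {seed}
        candidates = sorted(remaining - {seed})
        for n in sorted(candidates, key=lambda n: -len(adj[n] & remaining)):
            if clique <= adj[n]:
                clique.add(n)
        cliques.append(clique)
        remaining -= clique
    return cliques

cliques = greedy_clique_partition(compatibility_graph(OPS))
for i, c in enumerate(cliques):
    print(f"FU{i}: {sorted(c)}")
```

Here the three ALU operations fall into one clique (one shared ALU), while the two shifts in c-step 1 are not compatible and need separate shifters.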
17
Problems
• HLS is capable of finding good solutions, but for specialized goals manual optimization is still necessary
• There is a lack of interactivity: the designer has limited control over the outcome of the design process
• Synthesis tools need to accept a wide variety of libraries
• HLS affects the layout through its architecture choices, so tools that can communicate across both levels are necessary
18
HLS with Reconfig Datapath Components
• Economakos (2006) proposes a solution that uses runtime reconfigurable (RTR) components during HLS
• Using a resource-constrained schedule, the list algorithm is used to schedule the RTR components
• A priority list based on the difference between the ALAP and ASAP values (the mobility) of each operation lets operations be scheduled appropriately
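The mobility-based priority can be sketched as follows. This is not the paper's implementation and does not model the reconfigurable components. It only shows list scheduling in which the operation with the smallest ALAP - ASAP difference goes first. The DFG y = (a*b)*(c*d) + e*f, its node names and the resource limits (2 multipliers, 1 adder) are hypothetical, and every operation is assumed to take one c-step.

```python
# Hypothetical DFG for y = (a*b)*(c*d) + e*f: node -> (FU type, predecessors).
DFG = {
    "m4": ("mul", []),            # e * f (listed first on purpose)
    "m1": ("mul", []),            # a * b
    "m2": ("mul", []),            # c * d
    "m3": ("mul", ["m1", "m2"]),  # (a*b) * (c*d)
    "a1": ("add", ["m3", "m4"]),  # y
}

def asap_alap(dfg):
    """ASAP steps, and ALAP steps using the ASAP latency as the deadline."""
    succ = {n: [] for n in dfg}
    for n, (_, preds) in dfg.items():
        for p in preds:
            succ[p].append(n)
    early, late = {}, {}
    def visit_early(n):
        if n not in early:
            early[n] = 1 + max((visit_early(p) for p in dfg[n][1]), default=0)
        return early[n]
    for n in dfg:
        visit_early(n)
    latency = max(early.values())
    def visit_late(n):
        if n not in late:
            late[n] = min((visit_late(s) for s in succ[n]), default=latency + 1) - 1
        return late[n]
    for n in dfg:
        visit_late(n)
    return early, late

def list_schedule(dfg, resources, priority):
    """Unit-delay list scheduling; a lower priority value is scheduled first."""
    step_of, step = {}, 0
    while len(step_of) < len(dfg):
        step += 1
        ready = [n for n, (_, preds) in dfg.items()
                 if n not in step_of and all(step_of.get(p, step) < step for p in preds)]
        used = dict.fromkeys(resources, 0)
        for n in sorted(ready, key=priority):
            fu = dfg[n][0]
            if used[fu] < resources[fu]:
                used[fu] += 1
                step_of[n] = step
    return step_of

early, late = asap_alap(DFG)
mobility = {n: late[n] - early[n] for n in DFG}
resources = {"mul": 2, "add": 1}
by_mobility = list_schedule(DFG, resources, priority=lambda n: mobility[n])
by_order = list_schedule(DFG, resources, priority=lambda n: 0)
print(mobility)
print(max(by_mobility.values()), max(by_order.values()))
```

Mobility-first scheduling finishes in 3 c-steps. Scheduling in plain list order lets the non-critical e*f take a multiplier in step 1 and stretches the schedule to 4.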
19
HLS with Reconfig Datapath Components
• In the worst case, using this heuristic can double the control step period
20
HLS with Reconfig Datapath Components
[Figure: schedules for two datapaths: (a) 2 multipliers, 1 adder; (b) 1 multiplier, 1 reconfigurable multiplier, 1 adder]
21
HLS with Reconfig Datapath Components
• Results showed that with 2 reconfigurable components they achieved an average speed increase of 53%, a figure that takes into account the worst-case scenario in which the algorithm doubles the control step size
22
Resources
• Economakos, G. (2006). High-Level Synthesis with Reconfigurable Datapath Components. Parallel and Distributed Processing Symposium, p. 4.
• Gajski, D., Ramachandran, L. (1994). Introduction to High-Level Synthesis. IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 44-54.
• McFarland, M., Parker, A., Camposano, R. (1988). Tutorial on High-Level Synthesis. IEEE Design Automation Conference, pp. 330-336.
23
Project Update
• Neural network with back propagation
• In the process of programming a simple neural network and getting ready for profiling