HPPS - Final - 06/14/2007
POLITECNICO DI MILANO
High Performance Processors and Systems
PdM – UIC joint master 2007
Instructor: Prof. Donatella Sciuto
HPPS @ PdM – June 2007
General Outline
DRESD
DReAMS: Alessandro Panella, Matteo Murgida
Operating System: Ivan Beretta
Design Flow: Antonio Piazzi
Polaris: Massimo Morandi, Marco Novati
HLR: Marco Maggioni
DRESD in a Nutshell
Dynamic Reconfigurability in Embedded System Design
DRESD @ PdM – June 2007
Outline
Reconfiguration
• Motivations
• Basic Definition
• SoC
Motivations
Increasing need for behavioral flexibility in embedded systems design
• Support of new standards, e.g. in media processing
• Addition of new features
Applications too large to fit on the device all at once
Speed up the overall computation of the final system
Reconfiguration
The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation.
Gerald Estrin, 1960
SoC Reconfiguration
[Figure: SoC reconfiguration, with labels "fix", "Partial", "Total" and "Embedded"]
Different Scenarios...
Single Device
Distributed System
DReAMS: Dynamic Reconfigurability Applied to Multi-FPGA Systems
Branch of the DRESD project
Inherits architectures and tools
Automatic workflow from VHDL system description to FPGA implementation
• VHDL parsing and system simulation
• System creation over a specific architecture
• Bitstream creation and download onto FPGAs
Outline
Problem description
Project goals and contributions
Project phases
What is partitioning?
Existing approaches
Going deeper into the problem
SPartA
• The framework
• The idea
• The algorithm
Experimental results
Future work
Problem description
Multi-FPGA rationale
• Large designs do not fit into a single chip
• High-performance parallelized applications
• Our case: apply dynamic reconfigurability
Need to break the initial design into several blocks
• One block corresponds to a single FPGA chip
• Which inputs/outputs?
• Which objectives?
• Which techniques?
Project goals and contributions
Analyze existing approaches
• Obtain a deep knowledge of this well-explored field
• Extract basic ideas for a new approach
• Obtain some terms of comparison
Define precisely which problem(s) we cope with
• Contextualize the problem
• Focus on our needs
Develop a new solution
• Theoretical background
• Implementation and evaluation
Project phases
First Phase [15th March – 12th April]
• Documentation: presentation (12/4), report
• Goals: analysis of the state of the art; produce some hints on a new approach
Second Phase [13th April – 17th May]
• Documentation: presentation (17/5), report
• Goals: precise definition of the problem; propose a new solution
Third Phase [18th May – 14th June]
• Documentation: presentation (14/6), final report
• Goal: implementation and evaluation of the proposed solution
What is partitioning?
Goal
• Divide a set of interrelated objects into a set of subsets
• Optimize one or more specific objectives
K-way partitioning
• Given a graph G = (V, E), partition V into k subsets V1...Vk that are pairwise disjoint and whose union is V.
• Balance constraint: |Vi| ≈ |V|/k
Aims at minimizing (or maximizing) an objective function
• Edge-cut
• Other objectives
In general: NP-complete
• Several heuristics that provide good results have been developed
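To make the k-way definition concrete, here is a minimal sketch that evaluates the edge-cut objective and the balance constraint for a given partition. The function names, the `tolerance` slack and the toy graph are assumptions made for illustration; they are not taken from the project.

```python
from collections import Counter

def edge_cut(edges, part):
    """Count the edges whose two endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k, tolerance=1):
    """Check that every |Vi| is within `tolerance` of |V|/k (assumed slack)."""
    sizes = Counter(part.values())
    target = len(part) / k
    return all(abs(sizes.get(i, 0) - target) <= tolerance for i in range(k))

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # keeps each triangle together
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}    # alternates vertices

print(edge_cut(edges, good), edge_cut(edges, bad))
```

Both partitions satisfy the balance constraint, but only the first one has a small edge-cut, which is why the objective function is needed on top of the constraint.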
Existing approaches - a glance
Traditional methods
• Kernighan–Lin and Fiduccia–Mattheyses heuristics
• Iterative-improvement algorithms: begin with an initial partition and iteratively improve it
• O(n^3) complexity
Iterative algorithms
• Genetic
• Simulated annealing
Multilevel algorithms
• Clustering -> Initial partitioning -> Refining
• MeTIS/hMETIS suite: best current results for partitioning large flattened graphs
Going deeper into the problem
Two kinds of multi-FPGA partitioning
Topology-aware
• Architecture topology is an input
• No optimization of the number of FPGAs needed
• Main task: association between the (larger) system graph and the (smaller) architectural graph
Topology-free
• Architecture topology is not provided
• Input: dimension and communication features of FPGAs
• Minimization of the number of FPGAs
• Place and route after partitioning
At the moment, we deal with the topology-free problem
SPartA: the framework
Input: VHDL system description
Output: several VHDL files, one for each block (FPGA)
Three main phases:
• Extract design from VHDL description
• “Real” partitioning phase (core)
• Build VHDL files
SPartA: the idea
Structural approach
• Fully exploits the design hierarchy
• Modules can be treated as single blocks
• Basis for expansion toward dynamic reconfigurability
Objectives
• Minimize cutsize
• Minimize the number of used FPGAs
• Preserve module integrity
SPartA: the algorithm 1/2
Recursive algorithm (deals with trees)
Starts from the TOP node
Precondition
• No leaves with dimension > FPGA size
At every moment, a node can be:
• COVERED, UNCOVERED or PARTIALLY COVERED
Stop condition
• Node TOP is COVERED
SPartA: the algorithm 2/2
OPEN ISSUE: Selecting the first node to be inserted into an empty partition
• Random node
• Node with overall max communication
• Node with max communication with its siblings
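The three candidate policies can be sketched as follows. This is only an illustration under stated assumptions: communication is modeled as a dictionary of weighted undirected edges, and the names `total_comm`, `sibling_comm` and `pick_first` are hypothetical, not SPartA's actual code.

```python
import random

def total_comm(node, comm):
    """Sum of the weights of all edges touching `node`."""
    return sum(w for (a, b), w in comm.items() if node in (a, b))

def sibling_comm(node, siblings, comm):
    """Sum of the weights of edges between `node` and its siblings only."""
    return sum(w for (a, b), w in comm.items()
               if node in (a, b) and (a if b == node else b) in siblings)

def pick_first(candidates, comm, policy, rng=random):
    """Select the first node for an empty partition under one of three policies."""
    ordered = sorted(candidates)  # deterministic tie-breaking
    if policy == "random":
        return rng.choice(ordered)
    if policy == "max_total":
        return max(ordered, key=lambda n: total_comm(n, comm))
    if policy == "max_sibling":
        sibs = set(ordered)
        return max(ordered, key=lambda n: sibling_comm(n, sibs - {n}, comm))
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical example: A, B, C are siblings; X lives elsewhere in the tree.
comm = {("A", "B"): 2, ("B", "C"): 3, ("A", "X"): 10, ("C", "X"): 1}
print(pick_first(["A", "B", "C"], comm, "max_total"))
print(pick_first(["A", "B", "C"], comm, "max_sibling"))
```

In this toy case the two deterministic policies disagree: A dominates overall traffic because of its link to X, while B communicates most with its siblings.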
Results 2/2
Complexity: exponential, due to the recursive nature of the algorithm
Execution time is nevertheless low (tens of seconds for a reasonably large design)
EXAMPLE
[Figure: original tree and partitioned tree]
Results 3/3
Evaluation metrics
• EDGECUT, FILLING and SPLITS
Evaluation of the three policies for node selection
• 18 different trees of varying size
Future work
Algorithm improvement
• Balancing of the last partition
• First node selection policies
• More refined “score” function for selecting a node (use closeness metrics)
Comparisons with existing algorithms
Expansion
• SPartA framework development
• Topology-aware partitioning
The end
Any questions?
Chimera
Multi-FPGAs Architecture Definition
Matteo Murgida
Outline
Introduction
• Problem description
• Project Goals
• State of the Art
Project in details
• Contributions
• Phases
• Results
What’s next
Problem Description
Architectural description of a distributed FPGA environment
3-layer architecture
Project Goals
Design the architecture of the most generic distributed system
• Node definition
• Interface definition
• Communication channel definition
Design a communication protocol
• Essential protocol
• Interrupt-based protocol
• Timeout improvement
State of the Art
CONFigurable ElecTronic TIssue (CONFETTI) by EPFL
• Cellular-based architecture
• PROs: high degree of parallelism, high computational power
• CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort
Splash 2 by IDA Supercomputing Center
• Architecture composed of a Sun SPARCstation host, an interface board and “Splash Array” boards
• PROs: again high parallelism and power
• CONs: a central host coordinates the computational units, no fault tolerance, no flexibility
Contributions
The proposed architecture:
• Allows several Spartan-3 Starter Boards to communicate and exchange data
• Is portable to different FPGAs with minimum effort
• Is the basic infrastructure that will allow external partial dynamic reconfiguration
Project Phases
First Phase, time window: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals: Digilent Spartan-3 Starter Board study; boards connection
Second Phase, time window: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals: communication between two Microblaze soft-processors; GPIO integration in the architecture
Third Phase, time window: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goals: interrupt handling, timeout handling; simple application as example
Board Study
How to use resources like switches, LEDs and connectors on the board
How to map an IP-Core port to a physical pin of the board
Choice of the A2 Expansion Connector to connect two boards
Microblaze Communication
Communication between two Microblaze soft-processors
Development of a display controller to visualize the data flow
GPIO Insertion
Higher architecture portability through the use of the GPIO IP-Core
Interrupt Controller Insertion
Communication protocol improvement by interrupt handling, to prevent the processor from busy waiting
Interrupt Controller is included in the architecture to permit multi-interrupt detection and handling
Timeout
Malfunctions due to interference on the communication channel lead to deadlocks
• Communication protocol is not reliable at all
Counter implementation, including the driver used by the processor to clear raised interrupts
Development of a simple application to verify the correctness of the proposed approach
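The idea of bounding a wait with a counter, so that a lost message cannot block a node forever, can be modeled in a few lines of software. This is a hedged sketch only: the real design uses a hardware counter raising an interrupt on the Microblaze, while here the tick loop, the retry policy and every name (`wait_for_ack`, `send_with_timeout`, `lossy_channel`) are assumptions for illustration.

```python
def wait_for_ack(poll, max_ticks):
    """Poll the channel once per tick; return the arrival tick, or None on timeout."""
    for tick in range(max_ticks):
        if poll():
            return tick
    return None

def send_with_timeout(poll_factory, max_ticks, max_retries):
    """Retry until an ack arrives within max_ticks; give up after max_retries attempts."""
    for attempt in range(1, max_retries + 1):
        if wait_for_ack(poll_factory(attempt), max_ticks) is not None:
            return attempt
    return None

def lossy_channel(attempt):
    """Hypothetical channel: the first two attempts are lost, the third is acked late."""
    polls = iter([False, False, True]) if attempt >= 3 else iter([])
    return lambda: next(polls, False)

def dead_channel(attempt):
    """Hypothetical channel that never answers."""
    return lambda: False

print(send_with_timeout(lossy_channel, max_ticks=5, max_retries=4))
print(send_with_timeout(dead_channel, max_ticks=5, max_retries=4))
```

Without the tick bound, the second call would spin forever, which is the deadlock the timeout is meant to remove.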
Results
A short demo ...
Future Work
Apply the proposed approach to external partial dynamic reconfiguration
Develop a co-simulation framework based on the VHDL/SystemC descriptions of distributed systems
• Receive as input the VHDL description of the system
• Build the VHDL description for every node
• Create the SystemC stub to allow inter-node communication
• Describe the communication in SystemC
• Co-simulate the VHDL/SystemC description
Questions
Operating System support for Reconfigurable SoC
Development of an OS architecture-independent layer for dynamic reconfiguration
Ivan Beretta
Outline
Introduction
• Problem description
• Project Goals
• State of the Art
Project in details
• Contributions
• Phases
• Results
What’s next
Problem description
Need for operating system support on Reconfigurable SoCs
• Simplified software development process
• Improved code portability
Lack of support for dynamically reconfigurable architectures
• Specific solutions for specific architectures
Need for an architecture-independent abstraction layer
Project Goal
Primary goals:
• Analysis of the State of the Art
• Definition of the new intermediate layer
• Physical implementation
Specific goals:
• Study of the solutions developed inside the DRESD group
• Comparison between existing solutions
• Recovery of one of the two implementations
• Hardware architecture generation using up-to-date tools on Xilinx Virtex-II Pro VP7
State of the Art
Caronte implementation (Alberto Donato, 2005)
Two kernel modules
• ICAP device driver
• IP-Core manager (IPCM)
State of the Art (cont’d)
YaRA implementation (Vincenzo Rana, 2006)
Multi-layered structure
• Four modules: Reconfiguration controller driver, MAC, LOL, Reconfiguration Library
• ROTFL architecture
Contributions
Limits of existing implementations
• Lack of portability, e.g. the YaRA solution is implemented on RAPTOR2000
• Reconfiguration process details visible from userspace
Definition of an architecture-independent middleware
• Improved portability: it works on different hardware architectures and with different Linux distributions
• Opportunity to optimize latencies
Phases
First phase: Layer definition
• Goal: Factorization of common features (boundaries of the new middleware; mapping of existing solutions on the functionalities)
• Motivation: Provide guidelines for actual implementation
Second phase: Implementation recovery
• Goal: Recovery of bootstrap process and kernel images
• Motivation: Full recovery of Caronte solution
Third phase: Architectures generation
• Goal: Synthesis of hardware architectures using up-to-date Xilinx tools and cores
• Motivation: Synthesis of hardware architectures using up-to-date Xilinx tools
First Phase: Layer definition
Definition of new layer boundaries
Factorization of existing features
Mapping of the required functionalities on existing implementations
Feature | Caronte Solution | YaRA Solution
Reconfiguration controller support | ICAP device driver | Reconfiguration Controller Driver
Dynamic address space assignment | IPCM Module | MAC module
Dynamic device registration and driver loading | IPCM Module | LOL module
API | Direct interaction with modules | Reconfiguration library
Module management (caching, placement...) | Not implemented | ROTFL architecture
Legend: ● = Both hardware and software ● = Hardware independent
Second Phase: Implementation Recovery
Bootstrap process from flash memory
[Figure: bootstrap memory map. 16 MB Flash from 0xe4000000 to 0xe4FFFFFF, with marked addresses 0xe42FFFFF, 0xe4F00000 and 0xe4F80000. 64 MB DDR SDRAM from 0x00000000 to 0x03FFFFFF, with marked address 0x00800000. Blocks: FPGA, PowerPC, BRAM, Bootloader, Bootmanager, Kernel and RAMDisk Image. Steps numbered 1 to 6.]
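As a quick sanity check of the address ranges in the memory map, the sketch below recomputes the size of each region and the offsets of the marked addresses. The variable and function names are assumptions for illustration, and no meaning is assigned to the marked addresses beyond what the figure shows.

```python
# Address ranges as given in the memory map (inclusive bounds).
FLASH = (0xE4000000, 0xE4FFFFFF)
DDR = (0x00000000, 0x03FFFFFF)

def size_mib(region):
    """Size of an inclusive address range, in MiB."""
    start, end = region
    return (end - start + 1) / 2**20

def offset_mib(address, region):
    """Offset of an address from the start of its region, in MiB."""
    return (address - region[0]) / 2**20

print(size_mib(FLASH), size_mib(DDR))
for marked in (0xE4F00000, 0xE4F80000):
    print(hex(marked), offset_mib(marked, FLASH))
print(hex(0x00800000), offset_mib(0x00800000, DDR))
```

The computed sizes agree with the 16 MB and 64 MB labels, so the end addresses in the figure are consistent with the stated capacities.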
Second Phase: Implementation Recovery (cont’d)
Several issues
• Neither the bootmanager nor the Linux kernel is on flash memory at the beginning
• Flash memory seen as read-only memory at runtime
• Need for an ad-hoc solution
Avmon command line interface
• Executed from DDR SDRAM memory
• FTP transfer of bootmanager and flash programming
• Also useful for kernel download
Kernel executable image
• Kernel image built using a cross-compiler
• ICAP and IPCM modules loaded at runtime
Third Phase: Architecture generation
Hardware architecture used in Second Phase no longer useful
• Synthesized with Xilinx ISE and EDK 6.1
Same hardware structure realized with updated cores and recent tool versions
• Synthesis with Xilinx ISE and EDK 7.1
• Synthesis with Xilinx ISE and EDK 9.1
Lack of device driver support and documentation to configure newest cores
Results: Implementation Recovery
Linux bootstrap from flash memory
Results: Implementation Recovery
Design summary for hardware architectures on Xilinx Virtex-II Pro VP7
Two main limitations
• Ethernet controller
• Necessity of a top-level design
Design too large for module-based reconfiguration
Resource | ISE/EDK 7.1 Used | ISE/EDK 7.1 Available | ISE/EDK 7.1 % | ISE/EDK 9.1 Used | ISE/EDK 9.1 Available | ISE/EDK 9.1 %
Slices | 4926 | 4928 | 99% | 5318 | 4928 | 107%
Flip-Flops | 5217 | 9856 | 52% | 5724 | 9856 | 58%
4-in LUTs | 6974 | 9856 | 70% | 6993 | 9856 | 70%
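The percentages in the design summary can be reproduced from the used and available counts. The short sketch below does so, assuming the reported values are truncated rather than rounded; the dictionary layout and function name are illustrative only.

```python
AVAILABLE = {"Slices": 4928, "Flip-Flops": 9856, "4-in LUTs": 9856}
USED = {
    "ISE/EDK 7.1": {"Slices": 4926, "Flip-Flops": 5217, "4-in LUTs": 6974},
    "ISE/EDK 9.1": {"Slices": 5318, "Flip-Flops": 5724, "4-in LUTs": 6993},
}

def utilization(used, available):
    """Utilization as an integer percentage, truncated (assumed reporting convention)."""
    return 100 * used // available

for tool, counts in USED.items():
    for resource, used in counts.items():
        pct = utilization(used, AVAILABLE[resource])
        over = max(0, used - AVAILABLE[resource])
        print(f"{tool:12} {resource:10} {pct:3d}%  over capacity by {over}")
```

The only figure above 100% is the slice count under ISE/EDK 9.1, which exceeds the device by 390 slices.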
What’s next
Device driver updates to support newest architectures
Intermediate layer implementation
• Opportunity to add some additional features: reconfiguration scheduler
• Opportunity to define a common device driver interface to simplify the creation of a new driver by the user
Integration of the middleware and the operating system support in a complete design flow
Questions
Outline
Introduction
• Problem description
• Project Goals
• State of the Art
Project in details
• Contributions
• Phases
• Results
What’s next
Problem description
• The user has to spread attention across many problems, some of them related to the implementation of the design.
• Users often know nothing about reconfigurable architecture generation.
Project Goals
• New design methodology tailored to support partially dynamically reconfigurable architectures
• Definition and implementation of a design framework able to:
• Support different design paradigms, e.g. Xilinx Module Based, Xilinx EAPR
• Hide the dirty work (due to the reconfiguration) from the application designer
• Support different architectural solutions, e.g. different communication infrastructures such as IBM CoreConnect or Wishbone
Contributions
• With our framework, all users (novice or not) can develop and debug their functionality on a reconfigurable architecture without analyzing all the problems related to that development methodology
Phases
• 1st phase (15 March – 15 April): Budgeting
• Study of the state of the art
• 2nd phase (15 April – 15 May): Realization phase
• Construction of the entire framework based on previously separate tools
• Implementation of an innovative workflow
• 3rd phase (15 May – 15 June): Project validation
• Definition of a new communication infrastructure and transfer protocol for the reconfigurable part
• Verification of the integration of the new infrastructure in the project
First Phase
• Study of the state of the art
• Standard reconfigurable design flow
• Xilinx Module Based and EAPR
• Caronte Design Flow
• EDK-based architecture
Self Reconfigurable Architecture
Second Phase 1/4
Construction of the entire framework based on previously separate tools
The user has to focus only on developing the IBM CoreConnect architecture and on writing the modules that implement the desired functionality
SYSTEM.VHD contains all information about the IBM CoreConnect architecture
Second Phase 2/4
ArchGen takes the system.vhd file, processes the architecture it contains and translates that static architecture into a dynamic one
FIX.VHD contains the instantiations of the processors (one or more) and all the components present in the IBM CoreConnect architecture
TOP.VHD contains the instantiation of the fix component and the information about the communication infrastructure
Second Phase 3/4
COMiC generates an NCD file which contains the information about the communication infrastructure, and an XDL file which contains the same information in text form
Second Phase 4/4
At this point we only have to collect all the information we need; through a parser we insert it into a new top.vhd, which will be the fixed part of the architecture. After that, only the reconfigurable modules written by the user remain to be managed.
Third Phase 1/3
Definition of a new communication infrastructure and transfer protocol for the reconfigurable part
An OPB bus based on 3-state buffers, used to link one or more modules to the fix part (created with ISE)
Third Phase 2/3
Use the ncd2xdl converter to obtain an XDL file which contains all parameters of our bus
Third Phase 3/3
Verify the integration of the new infrastructure in the project
Perfect integration in our process: we can use any bus type to connect the fix and reconfigurable parts
Results
• The framework answers the need for automation expressed by novice users and, more generally, helps all users who need a short time to market.
What’s next
• Our idea for future work is to schedule one or two working days to fix some bugs present in the project and to adjust the output of COMiC, which has to create an OPB replay bus.
Questions?
Polaris
Create an integrated HW/SW system to manage 2D reconfiguration
SW side:
• Maintain information on FPGA status
• Decide how to efficiently allocate tasks
HW side:
• Provide support for effective task allocation
• Perform 2D bitstream relocation
Management of 2D Reconfiguration in a Reconfigurable System
Massimo Morandi
Outline
Introduction
• Problem description
• Project Goals and Contributions
Project in details
• Phases
• Results
Future Work
Problem Description
New generation of FPGAs
• Virtex-4 and Virtex-5
• Allow bi-dimensional reconfiguration
This makes it possible to:
• Better exploit the reconfigurable area
• Obtain module performance optimizations
More complex management:
• Handle one more degree of freedom
• Avoid more fragmentation
• Perform good placement choices to keep the Task Rejection Rate (TRR) low
• Keep acceptable intra-module routing paths
Project Goals and Contributions
Analyze effects of 2D reconfiguration
• New advantages
• New problems
Examine possible solutions to new problems
• Explore literature to find promising ideas
• Evaluate those solutions in various scenarios
Propose a new solution
• Combining ideas from literature with new ones
• Obtaining a good cost-quality tradeoff
Project Phases
First Phase, time window: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals: general analysis of 2D reconfiguration; detailed description of the new problems
Second Phase, time window: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals: definition of desired features for a solution; analysis and evaluation of existing solutions
Third Phase, time window: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goal: propose a new combined solution to effectively handle problems of 2D reconfiguration
Setting and Advantages Definition
Definition of the setting:
• 2D self partial dynamical run-time reconfiguration
Analysis of the advantages of 2D reconfiguration
• In area usage and performance
2D Fragmentation Problem
Analysis of the 2D-fragmentation problem
• Area generally more fragmented
• Can nullify the area optimizations obtained
Placement Decisions
Analysis of the effects of 2D placement choices:
• Again, bad choices can lead to performance loss
Allocation manager
Definition of allocation manager desired features:
• Low TRR
• Low management overhead
• High routing efficiency
• Low fragmentation
Definition of allocation manager structure:
• Empty space manager: complete space or heuristic selection
• Fitter: general (FF, BL, BF, WF…) or focused (FA, RA…)
Most relevant works
Maintain complete information on empty space:
• KAMER: Keep All Maximally Empty Rectangles; apply a general fitting strategy
• CUR: maintain the Contour of a Union of Rectangles; apply a focused fitting strategy
Heuristically prune part of the information:
• KNER: Keep Non-overlapping Empty Rectangles; apply a general fitting strategy
• 2D-HASHING: keep non-overlapping empty rectangles in an optimized data structure; apply (exclusively) a general fitting strategy
Evaluation and Proposed Approach
High placement quality => high complexity
Lowest complexity => no focused fitting (bad especially for routing)
Proposed Approach
• Heuristic (KNER-like) empty space manager, to keep complexity low for use in a self-reconfigurable system
• Fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (chosen metric to minimize: Manhattan distance)
Structure of the allocation manager
Task, defined by:
• Arrival time, ASAP, (ALAP), H, W, Latency, Communicating Tasks
• Hosted in a queue which also adds a pointer to the rectangle where it is placed
Reconfigurable Device, represented as:
• Binary tree structure: each node is a Rectangle, each leaf is an empty Rectangle
• Navigation through pointers to left child, right child, next leaf and a function to find the previous leaf (for bookkeeping after split or merge)
Rectangle, defined by:
• X, Y, H, W
• Initially one: (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
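The data structure described above can be sketched as a binary tree of rectangles whose leaves are the empty areas. The code below is an illustration under explicit assumptions: it uses a plain first-fit search and a fixed two-way split, whereas the proposed fitter is focused on routing cost; the names `Rect`, `leaves`, `place` and `manhattan` are hypothetical, and the next-leaf pointers and merge step are omitted.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Rect:
    x: int
    y: int
    h: int
    w: int
    left: Optional["Rect"] = None    # empty area to the right of the placed task
    right: Optional["Rect"] = None   # empty area below the placed task
    task: Optional[str] = None       # task hosted at this node's top-left corner

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def leaves(node: Rect) -> Iterator[Rect]:
    """Yield the empty rectangles (the leaves) from left to right."""
    if node.is_leaf():
        yield node
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)

def place(root: Rect, task: str, h: int, w: int) -> Optional[Rect]:
    """First-fit placement: split the first leaf that is large enough."""
    for leaf in leaves(root):
        if leaf.h >= h and leaf.w >= w:
            leaf.task = task
            # Two non-overlapping remainders (either may have zero area).
            leaf.left = Rect(leaf.x + w, leaf.y, h, leaf.w - w)
            leaf.right = Rect(leaf.x, leaf.y + h, leaf.h - h, leaf.w)
            return Rect(leaf.x, leaf.y, h, w)
    return None  # rejected: counts toward the Task Rejection Rate

def manhattan(a: Rect, b: Rect) -> int:
    """Manhattan distance between the origins of two placed rectangles."""
    return abs(a.x - b.x) + abs(a.y - b.y)

device = Rect(0, 0, 10, 10)           # H = FPGA rows, W = FPGA columns
a = place(device, "A", 4, 6)
b = place(device, "B", 4, 4)
c = place(device, "C", 6, 10)
d = place(device, "D", 1, 1)          # no empty area left
print(a, b, c, d, manhattan(a, b))
```

A routing-focused fitter would, instead of taking the first fitting leaf, score each candidate leaf by the Manhattan distance to the tasks it communicates with and pick the cheapest one.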
The Placement Algorithm
Experimental Results
Benchmark of 100 randomly generated tasks:
• Size 5% to 25% of the FPGA, randomly interconnected
Execution time: 3x less than CUR, close to KNER
Communication cost: 3x less than KNER, close to CUR
Task Rejection Rate: all solutions quite close
Future Work
Apply the proposed solution to self reconfiguration:
• Adapt the algorithm to run on the internal processor
• Create a validation reconfigurable architecture
• Integrate the architecture with relocation
Tune the algorithm to improve results:
• Experiment with techniques to reduce TRR
• Try to optimize the code to lower the running time of the algorithm
Questions?
Relocation for 2D Reconfigurable Systems
Marco Novati
Outline
Introduction
• Problem description
• Project Goals
Project in details
• Phases
• Results
What’s next
Problem Description
Self Dynamical Runtime 2D Reconfiguration
• Xilinx Virtex-4 and Virtex-5
Relocation, different solutions
• Software (BAnMat, PARBIT)
• Hardware (REPLICA, BiRF)
We chose a hardware solution
• BiRF Square
Project Goals
Study of the new FPGA families
• Examination of Xilinx documentation on V4 and V5
Analysis of the new bitstream structure
• Generation of V4 and V5 bitstreams
Development of the new version of BiRF
• Implementation
• Validation
Phases
First Phase: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals: Xilinx documentation examination; V4 & V5 bitstream structure analysis
Second Phase: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals: implementation of BiRF Square; synthesis
Third Phase: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goals: verification & validation
Frame Addressing
New frame addressing:
• Possibility of addressing rows and columns
New Parser
CRC Calculation
Particular CRC value, used by Xilinx tools
Two versions of BiRF Square:
• Using the “predefined” value
• With actual CRC calculation
An optimized algorithm has been used
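The slides do not say which polynomial the bitstream CRC uses or which optimization was applied in BiRF Square. Purely as an illustration of what an optimized CRC computation commonly looks like, the sketch below shows the classic table-driven technique, which processes one byte per step instead of one bit. The polynomial here is the standard reflected CRC-32 (chosen only so the result can be checked against Python's `zlib`); it is an assumption and not necessarily the one used by Xilinx devices.

```python
import zlib

# Reflected CRC-32 polynomial (IEEE 802.3), used here only for illustration.
POLY = 0xEDB88320

def build_table():
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = build_table()

def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

sample = b"123456789"
print(hex(crc32_table(sample)), hex(zlib.crc32(sample)))
```

The trade-off is a small lookup table in exchange for eight times fewer iterations, which is the same kind of area-for-speed trade a hardware implementation would weigh.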
Synthesis results
On a Virtex-4 with speed grade -12
• General purpose version: max frequency of 160 MHz
• Specific version: max frequency of 290 MHz
Target Device
Validation Architecture
Results 1/2
BiRF Square
• Permits relocation in a self, partially and dynamically 2D-reconfigurable system
• The occupation ratio is relatively small
• Frequency more than acceptable
• Reduction of internal memory requirements
Results 2/2
Throughput of 7.3 MB/s
• The total configuration file size is about 1 MB
Considering an architecture with:
• 1/3 of the area as fixed part
• 2/3 as reconfigurable part, with 6 slots
Under this hypothesis:
• The size of a partial bitstream will be about 110 KB
• Relocation time of about 15 ms
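The two estimates follow from simple arithmetic, reproduced here as a check. The sketch assumes decimal units (1 MB = 1000 KB) and that bitstream size scales with area; both are assumptions, and the variable names are illustrative.

```python
THROUGHPUT_MB_S = 7.3        # measured relocation throughput
TOTAL_BITSTREAM_KB = 1000    # about 1 MB, assuming 1 MB = 1000 KB
RECONFIGURABLE_SHARE = 2 / 3
SLOTS = 6

# Size of one slot's partial bitstream, assuming size is proportional to area.
partial_kb = TOTAL_BITSTREAM_KB * RECONFIGURABLE_SHARE / SLOTS

# Time to relocate the rounded 110 KB figure at the measured throughput.
relocation_ms = 110 / (THROUGHPUT_MB_S * 1000) * 1000

print(f"partial bitstream ~ {partial_kb:.0f} KB")
print(f"relocation time   ~ {relocation_ms:.1f} ms")
```

Both results land on the rounded values quoted above: roughly 110 KB per slot and roughly 15 ms per relocation.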
What’s Next
Future improvements:
• Direct access to the memory (DMA): direct manipulation of the bitstream
• Portability
• Integration with ICAP: elimination of the relocation overhead (relocation time << reconfiguration time)
The final goal:
• Creation of a real architecture that exploits self partial and dynamical 2D-reconfiguration, with relocation
Questions
Outline
Introduction
• Problem description
• Project Goals
• State of the Art
Project in details
• Contributions
• HLR workflow: GraphGen, IsomorphClustering, SimpleLatency, Salomone
• Results
What’s next
Problem Description
What is High Level Reconfiguration...?
• Theoretical approach to dynamic reconfiguration...
Vision...
• Reconfigurability has many advantages...
Mission...
• Exploit these advantages to obtain the best performance...
How...?
• Adapting a system to this execution model, managing complexity and drawbacks...
Project Goal
Create a complete HLR workflow...
• From a real system specification to its reconfigurable execution model...
Define precise interfaces for each phase...
• To promote flexibility and future HLR research...
• To develop a complete toolchain...
Apply some algorithms regarding reconfigurability...
• To reuse past work...
State of the Art
Present state of HLR...
• Some ideas/concepts regarding clustering and scheduling...
• ... but not a complete and well-defined workflow.
• ... but a lot of work to do.
System specification analysis...
• PandA HW/SW framework to promote new ideas...
• Dynamic reconfigurability can be considered a branch of this research...
Contribution
Dynamic library loading system...
• Embedded into the GNU compilation toolchain
Porting of PandA libraries into Earendil...
• Suitable for future analysis...
HLR tools deployed onto Earendil...
• Cover each step of the workflow...
HLR workflow
Clustering (with Analysis)... 1st Month
Coloring... 2nd Month
Scheduling... 3rd Month
[Figure: HLR workflow. Blocks: GCC Frontend, PandA, Partitioning Algorithm, Clustered Graph, Metric Evaluation, Reconfigurable Clustered Graph, Scheduling Algorithm, Target Architecture Database (Area, Latency, Rec. Time, Power)]
GraphGen
GraphGen is the first step of the HLR toolchain...
• Takes as input a system specification or an algorithm...
• Produces a graph (CFG/BB/DFG/SDG)
Performs a high-level analysis step...
• Transforms the system description (C/C++/SystemC) into a representation suitable for further elaboration...
• Based on GCC and compiler theory...
• Uses PandA 0.4 functionalities to produce a statement-level graph...
IsomorphClustering
IsomorphClustering follows GraphGen in the HLR toolchain...
• Takes as input a statement-level graph...
• Produces a clustered graph...
Clustering phase...
• Aggregates nodes into configurations (basic units of reconfigurable execution)...
• Based on isomorphism: tries to find different instances of isomorphic templates...
• Different algorithms can also be applied...
SimpleLatency
SimpleLatency follows IsomorphClustering in the HLR toolchain...
• Takes as input a clustered graph...
• Adds latency information to each configuration...
• Produces a reconfigurable clustered graph with latency evaluations...
Coloring...
• “Colors” each cluster with useful evaluations for reconfigurability...
• Based on each cluster's internal critical path...
• Different metrics for different architectures...
• Connects HLR with real architectural parameters...
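A latency estimate based on a cluster's internal critical path can be sketched as the longest weighted path through the cluster's dependency graph. The code below is a minimal illustration, assuming the cluster is a directed acyclic graph with one latency value per node; the function name and the toy data are hypothetical and are not SimpleLatency's actual interface.

```python
from functools import lru_cache

def critical_path(latency, succs):
    """Longest latency-weighted path in a DAG (assumes no cycles).

    latency: dict mapping node -> latency of that node
    succs:   dict mapping node -> iterable of successor nodes
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = (longest_from(s) for s in succs.get(node, ()))
        return latency[node] + max(tails, default=0)

    return max(longest_from(n) for n in latency)

# Hypothetical cluster: a feeds b and c, which both feed d.
latency = {"a": 1, "b": 3, "c": 2, "d": 1}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(critical_path(latency, succs))
```

Swapping the per-node latency table is one simple way to obtain a different metric for a different target architecture.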
Salomone
Salomone is the last step in the HLR toolchain...
• Takes as input a reconfigurable clustered graph...
• Produces a schedule on an abstract reconfigurable architecture...
Scheduling...
• It is considered the core task of HLR...
• Maps each configuration onto an area portion...
• Adapts the system execution to the reconfigurable model...
• Based on a graph coloring algorithm...
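How graph coloring can map configurations onto area portions is sketched below with a generic greedy coloring. This is not Salomone's algorithm, whose details are not given here; it assumes a conflict graph in which two configurations are adjacent when they cannot share an area portion, and each color stands for one area portion. All names and the example graph are hypothetical.

```python
def greedy_coloring(conflicts):
    """Assign each node the smallest color not used by its neighbors.

    conflicts: dict mapping every node to the set of nodes it conflicts with
               (the relation must be symmetric).
    """
    color = {}
    # Visit high-degree nodes first; ties are broken by name for determinism.
    for node in sorted(conflicts, key=lambda n: (-len(conflicts[n]), n)):
        used = {color[m] for m in conflicts[node] if m in color}
        color[node] = next(c for c in range(len(conflicts)) if c not in used)
    return color

# Hypothetical conflict graph: c1 overlaps in time with c2 and c3.
conflicts = {"c1": {"c2", "c3"}, "c2": {"c1"}, "c3": {"c1"}, "c4": set()}
portion = greedy_coloring(conflicts)
print(portion, "area portions used:", max(portion.values()) + 1)
```

The number of colors used is the number of area portions required, which is the quantity a better heuristic would try to reduce.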
Results 1/3
Based on AES encryption...
Templates found with IsomorphClustering...
• Execution time... 123.94 s
Results 2/3
Salomone adapting and coloring...
• Execution time... 113.55 s
Results 3/3
Final scheduling...
What's next
Heuristic implementation for Salomone...
• To improve result quality in terms of the number of area portions...
A new metric for area/latency...
• Based on RTL logic synthesis evaluations...
Introduce feedback into the HLR workflow...
• Based on schedule evaluation...
New clustering and scheduling algorithms...
• Such as Napoleon...
Questions