HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

High Performance Processors and Systems

PdM – UIC joint master 2007

Instructor: Prof. Donatella Sciuto

HPPS @ PdM – June 2007

General Outline

DRESD

DReAMS: Alessandro Panella, Matteo Murgida

Operating System: Ivan Beretta

Design Flow: Antonio Piazzi

Polaris: Massimo Morandi, Marco Novati

HLR: Marco Maggioni


DRESD in a Nutshell: Dynamic Reconfigurability in Embedded System Design

DRESD @ PdM – June 2007

Outline

Reconfiguration: Motivations, Basic Definition, SoC

Motivations

Increasing need for behavioral flexibility in embedded systems design

Support of new standards, e.g. in media processing

Addition of new features

Applications too large to fit on the device all at once

Speed up the overall computation of the final system

Reconfiguration

The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation.

Gerald Estrin, 1960

SoC Reconfiguration

[Figure: SoC reconfiguration; labels: fix, Partial, Total, Embedded]

Different Scenarios...

Single Device, Distributed System

What’s next

DRESD, DReAMS

HLR: Marco Maggioni


Dynamic Reconfigurability Applied to Multi-FPGA Systems

DReAMS

Dynamic Reconfigurability Applied to Multi-FPGA Systems: branch of the DRESD project, inherits its architectures and tools

Automatic workflow from VHDL system description to FPGA implementation:
VHDL parsing and system simulation
System creation over a specific architecture
Bitstream creation and download onto FPGAs

DReAMS


Multi-FPGA Partitioning

Alessandro Panella

Outline

Problem description

Project goals and contributions

Project phases

What is partitioning?

Existing approaches

Going deeper into the problem

SPartA: the framework, the idea, the algorithm

Experimental results

Future work

Problem description

Multi-FPGA rationale:
Large designs do not fit into a single chip
High-performance parallelized applications
Our case: apply dynamic reconfigurability

Need to break the initial design into several blocks

One block corresponds to a single FPGA chip:
Which inputs/outputs?
Which objectives?
Which techniques?

Project goals and contributions

Analyze existing approaches: obtain a deep knowledge of this well-explored field, extract basic ideas for a new approach, obtain some terms of comparison

Define precisely which problem(s) we cope with: contextualize the problem, focus on our needs

Develop a new solution: theoretical background, implementation and evaluation

Project phases

First Phase [15th March – 12th April]
Documentation: presentation (12/4), report
Goals:
Analysis of the state of the art
Produce some hints on a new approach

Second Phase [13th April – 17th May]
Documentation: presentation (17/5), report
Goals:
Precise definition of the problem
Propose a new solution

Third Phase [18th May – 14th June]
Documentation: presentation (14/6), final report
Goal:
Implementation and evaluation of the proposed solution

What is partitioning?

Goal: divide a set of interrelated objects into a set of subsets, optimizing one or more specific objectives

K-way partitioning
• Given a graph G = (V, E), partition V into k subsets V_1, ..., V_k that are pairwise disjoint (V_i ∩ V_j = ∅ for i ≠ j) and whose union is V.
• Balance constraint: |V_i| ≈ |V|/k

Aims at minimizing (or maximizing) an objective function: edge-cut, or other objectives

In general the problem is NP-complete; several heuristics that provide good results have been developed
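The following minimal Python sketch makes these definitions concrete. It is only an illustration and is not part of SPartA. It checks the two k-way partitioning conditions and computes the edge-cut objective. The function names and the 10% balance tolerance are assumptions chosen for the example.

```python
def is_valid_partition(vertices, parts):
    """True if the blocks are pairwise disjoint and their union is V."""
    seen = set()
    for block in parts:
        if seen & block:                  # a vertex already appears in another block
            return False
        seen |= block
    return seen == set(vertices)


def edge_cut(edges, parts):
    """Number of edges whose endpoints lie in different blocks."""
    block_of = {v: i for i, block in enumerate(parts) for v in block}
    return sum(1 for u, v in edges if block_of[u] != block_of[v])


def is_balanced(vertices, parts, tolerance=0.1):
    """Balance constraint |V_i| ~ |V|/k, within an assumed relative tolerance."""
    target = len(vertices) / len(parts)
    return all(abs(len(block) - target) <= tolerance * target for block in parts)


V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
P = [{"a", "b"}, {"c", "d"}]
print(is_valid_partition(V, P), edge_cut(E, P), is_balanced(V, P))   # True 2 True
```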

Existing approaches - a glance

Traditional methods: Kernighan–Lin and Fiduccia–Mattheyses heuristics

Iterative-improvement algorithms: begin with an initial partition and iteratively improve it; O(n³) complexity
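The iterative-improvement idea can be shown with a deliberately simplified bisection refiner in the spirit of Kernighan–Lin. It is not the full KL algorithm. Full KL tentatively swaps pairs once per pass, locks the swapped vertices and keeps the best prefix of the pass. This sketch only applies the single best cut-reducing swap, repeatedly, until no swap helps. All names are hypothetical.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides of the bisection."""
    return sum(1 for u, v in edges if side[u] != side[v])


def swap_improve(edges, side):
    """Repeatedly apply the pairwise swap that most reduces the cut, until no
    swap helps. `side` maps each vertex to 0 or 1; swapping a pair keeps the
    two halves the same size. Returns (new_side, cut)."""
    side = dict(side)
    current = cut_size(edges, side)
    while True:
        best = None
        left = [v for v in side if side[v] == 0]
        right = [v for v in side if side[v] == 1]
        for a in left:
            for b in right:
                side[a], side[b] = 1, 0              # tentative swap
                cost = cut_size(edges, side)
                side[a], side[b] = 0, 1              # undo
                if cost < current and (best is None or cost < best[0]):
                    best = (cost, a, b)
        if best is None:                             # local optimum reached
            return side, current
        current, a, b = best
        side[a], side[b] = 1, 0


# Two triangles joined by the edge (3, 4), starting from a poor bisection.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
start = {1: 0, 2: 0, 4: 0, 3: 1, 5: 1, 6: 1}
print(cut_size(edges, start))          # 5
side, cut = swap_improve(edges, start)
print(cut)                             # 1
```

Each round re-evaluates every pair from scratch. This keeps the sketch short, but it is much slower than the incremental gain bookkeeping used by real KL/FM implementations.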

Iterative algorithms: genetic, simulated annealing

Multilevel algorithms: clustering -> initial partitioning -> refining; METIS/hMETIS suite: best current results for partitioning large flattened graphs

Going deeper into the problem

Two kinds of multi-FPGA partitioning:

Topology-aware: the architecture topology is an input; no optimization of the number of FPGAs needed; main task: association between the (larger) system graph and the (smaller) architectural graph

Topology-free: the architecture topology is not provided; input: dimension and communication features of the FPGAs; minimization of the number of FPGAs; place and route after partitioning

At the moment, we deal with the topology-free problem

SPartA: the framework

Input: VHDL system description

Output: several VHDL files, one for each block (FPGA)

Three main phases:
Extract design from VHDL description
“Real” partitioning phase (core)
Build VHDL files

SPartA: the idea

Structural approach: fully exploits the design hierarchy; modules can be treated as single blocks; basis for expansions toward dynamic reconfigurability

Objectives: minimize cutsize; minimize the number of used FPGAs; preserve module integrity

SPartA: the algorithm (1/2)

Recursive algorithm (deals with trees); starts from the TOP node

Precondition: no leaves with dimension > FPGA size

At every moment, a node can be COVERED, UNCOVERED or PARTIALLY COVERED

Stop condition: node TOP is COVERED
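Below is a minimal sketch of a recursive, structure-preserving covering of a design tree along these lines. It is a hypothetical illustration built on our own assumptions, not the actual SPartA algorithm:

- FPGAs are filled greedily.
- A subtree that does not fit in the current FPGA but fits in an empty one opens a new FPGA, which keeps the module intact.
- Only subtrees larger than a whole FPGA are split along the hierarchy.

The `Node` class, the `State` names and the sizes are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    UNCOVERED = "uncovered"
    PARTIALLY_COVERED = "partially covered"
    COVERED = "covered"


@dataclass
class Node:
    name: str
    size: int = 0                                  # area of a leaf module
    children: list = field(default_factory=list)
    state: State = State.UNCOVERED

    def total(self) -> int:
        """Area of the whole subtree rooted at this node."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)

    def leaves(self):
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def partition_tree(top: Node, fpga_size: int) -> list:
    """Return a list of partitions (one per FPGA), each a list of node names."""
    if any(leaf.size > fpga_size for leaf in top.leaves()):
        raise ValueError("precondition violated: a leaf is larger than the FPGA")
    parts = [[]]
    free = fpga_size                               # free area in the current FPGA

    def place(node: Node) -> None:
        nonlocal free
        need = node.total()
        if need <= free:                           # the whole module fits here
            parts[-1].append(node.name)
            free -= need
        elif need <= fpga_size:                    # fits only in an empty FPGA
            parts.append([node.name])
            free = fpga_size - need
        else:                                      # too big: split along the hierarchy
            for child in node.children:
                place(child)
                node.state = State.PARTIALLY_COVERED
        node.state = State.COVERED

    place(top)                                     # stop condition: TOP is COVERED
    return parts


top = Node("TOP", children=[
    Node("A", children=[Node("a1", 4), Node("a2", 3)]),
    Node("B", children=[Node("b1", 5), Node("b2", 2)]),
])
print(partition_tree(top, 8))    # [['A'], ['B']]
```

With a smaller FPGA (size 6), neither module fits whole, so the sketch falls back to the leaves and returns one partition per leaf.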

SPartA: the algorithm (2/2)

OPEN ISSUE: selecting the first node to be inserted into an empty partition:
Random node
Node with overall max communication
Node with max communication with its siblings
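The three candidate policies can be written as interchangeable selection functions. This is a sketch under our own assumptions. Communication is modeled as a dictionary of weighted node pairs, and "siblings" means the other candidates at the same tree level. The function and policy names are hypothetical.

```python
import random


def overall_comm(node, comm):
    """Total communication volume between `node` and every other node."""
    return sum(w for (u, v), w in comm.items() if node in (u, v))


def sibling_comm(node, siblings, comm):
    """Communication volume between `node` and the given sibling nodes only."""
    return sum(w for (u, v), w in comm.items()
               if (u == node and v in siblings) or (v == node and u in siblings))


def select_first(candidates, comm, policy, rng=random):
    """Pick the node that seeds an empty partition, using one of three policies."""
    if policy == "random":
        return rng.choice(candidates)
    if policy == "max_overall":
        return max(candidates, key=lambda n: overall_comm(n, comm))
    if policy == "max_siblings":
        return max(candidates,
                   key=lambda n: sibling_comm(n, set(candidates) - {n}, comm))
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical example: x talks a lot to a node outside the sibling group,
# while y talks most with its siblings.
siblings = ["x", "y", "z"]
comm = {("x", "y"): 1, ("y", "z"): 1, ("x", "ext"): 10}
print(select_first(siblings, comm, "max_overall"))   # x
print(select_first(siblings, comm, "max_siblings"))  # y
```

The example shows why the two communication-based policies can disagree. The first favors nodes that are "busy" anywhere in the design, while the second favors nodes that are likely to pull their siblings into the same FPGA.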

Results (2/2)

Complexity: exponential, due to the recursive nature of the algorithm; execution time is nevertheless low (tens of seconds for a reasonably large design)

Example:

[Figure: original tree and partitioned tree]

Evaluation metrics: EDGECUT, FILLING and SPLITS

Evaluation of the three policies for node selection on 18 different trees of varying size

Future work

Algorithm improvement: balancing of the last partition; first-node selection policies; a more refined “score” function for selecting nodes

Use closeness metrics

Comparisons with existing algorithms

Expansion: SPartA framework development; topology-aware partitioning

The end

ANY QUESTIONS?




Chimera: Multi-FPGA Architecture Definition

Matteo Murgida

Outline

Introduction: problem description, project goals, state of the art

Project in detail: contributions, phases, results

What’s next

Problem Description

Architectural description of a distributed FPGA environment

Three-layer architecture

Project Goals

Design the architecture of the most generic distributed system:
Node definition
Interface definition
Communication channel definition

Design a communication protocol:
Essential protocol
Interrupt-based protocol
Timeout improvement

State of the Art

CONFigurable ElecTronic TIssue (CONFETTI) by EPFL: cellular-based architecture
PROs: high degree of parallelism, high computational power
CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort

Splash 2 by IDA Supercomputing Center: architecture composed of a Sun SPARCstation host, an interface board and “Splash Array” boards
PROs: again, high parallelism and power
CONs: a central host coordinates the computational units, no fault tolerance, no flexibility

Contributions

The proposed architecture:

Allows several Spartan-3 Starter Boards to communicate and exchange data

It is portable to different FPGAs with minimum effort

It is the basic infrastructure that will allow external partial dynamic reconfiguration

Project Phases

First Phase, time window: 15th March – 12th April
Documentation: prj presentation (12/4), prj report
Goals:
Digilent Spartan-3 Starter Board study
Boards connection

Second Phase, time window: 13th April – 17th May
Documentation: prj presentation (17/5), prj report
Goals:
Communication between two MicroBlaze soft processors
GPIO integration in the architecture

Third Phase, time window: 18th May – 14th June
Documentation: prj presentation (14/6), prj report
Goals:
Interrupt handling, timeout handling
Simple application as example

Board Study

How to use resources like switches, LEDs and connectors on the board
How to map an IP-Core port to a physical pin of the board
Choice of the A2 Expansion Connector to connect two boards

MicroBlaze Communication

Communication between two MicroBlaze soft processors
Development of a display controller to visualize the data flow

GPIO Insertion

Higher architecture portability through the use of the GPIO IP-Core

Interrupt Controller Insertion

Communication protocol improved with interrupt handling, to prevent the processor from busy waiting. An interrupt controller is included in the architecture to allow the detection and handling of multiple interrupts

Timeout

Malfunctions due to interference on the communication channel lead to deadlocks

As a result, the communication protocol is not reliable at all

Timeout counter implementation, including the driver used by the processor to clear raised interrupts
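The timeout mechanism can be modeled in software to show how it removes the deadlock. This is a hypothetical sketch, not the Chimera driver:

- A down-counter raises an interrupt flag when it reaches zero.
- The "driver" clears the flag and the sender retransmits, up to a retry limit.
- The loop ticks stand in for elapsed clock cycles.
- `channel_ok` stands in for the physical link.

All class, function and parameter names are assumptions.

```python
class TimeoutCounter:
    """Down-counter that raises an interrupt flag when it reaches zero."""

    def __init__(self, reload_value):
        if reload_value < 1:
            raise ValueError("reload value must be at least 1")
        self.reload_value = reload_value
        self.value = reload_value
        self.irq = False

    def restart(self):
        self.value = self.reload_value
        self.irq = False

    def tick(self):
        if self.value > 0:
            self.value -= 1
            if self.value == 0:
                self.irq = True              # timeout interrupt raised

    def clear_irq(self):                     # what the driver does to lower the interrupt
        self.irq = False


def send_with_timeout(channel_ok, timeout_ticks=5, max_retries=3):
    """Send a word and wait for the acknowledge; on timeout, retransmit instead
    of deadlocking. channel_ok(attempt, tick) returns True when the acknowledge
    arrives. Returns the number of retries that were needed."""
    counter = TimeoutCounter(timeout_ticks)
    for attempt in range(max_retries + 1):
        counter.restart()
        tick = 0
        while not counter.irq:
            if channel_ok(attempt, tick):
                return attempt
            counter.tick()
            tick += 1
        counter.clear_irq()                  # timeout: lower the interrupt and retry
    raise TimeoutError("no acknowledge received")


# The first transmission is lost; the acknowledge for the retry arrives at tick 2.
print(send_with_timeout(lambda attempt, tick: attempt == 1 and tick == 2))   # 1
```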

Development of a simple application to verify the correctness of the proposed approach

Results

A short demo...

Future Work

Apply the proposed approach to external partial dynamic reconfiguration

Develop a co-simulation framework based on the VHDL/SystemC descriptions of distributed systems:
Receive as input the VHDL description of the system
Build the VHDL description for every node
Create the SystemC stub to allow inter-node communication
Describe the communication in SystemC
Co-simulate the VHDL/SystemC description

Questions




Operating System support for Reconfigurable SoC

Development of an OS architecture-independent layer for dynamic reconfiguration

Ivan Beretta

Outline

What’s next


Need for an operating system support on Reconfigurable SoCs

Simplified software development process
Improved code portability

Lack of support for dynamic reconfigurable architectures

Specific solutions for specific architectures

Need for an architecture-independent abstraction layer

Project Goal

Primary goals:
Analysis of the state of the art
Definition of the new intermediate layer
Physical implementation

Specific goals:
Study of the solutions developed inside the DRESD group
Comparison between existing solutions
Recovery of one of the two implementations
Generation of hardware architectures using up-to-date tools on the Xilinx Virtex-II Pro VP7

State of the Art

Caronte implementation (Alberto Donato, 2005): two kernel modules
ICAP device driver
IP-Core manager (IPCM)

State of the Art (cont’d)

YaRA implementation (Vincenzo Rana, 2006): multi-layered structure

Four modules: Reconfiguration controller driver, MAC, LOL, Reconfiguration Library
ROTFL architecture

Limits of existing implementations: lack of portability

E.g. YaRA solution implemented on RAPTOR2000

Reconfiguration process details visible from user space

Definition of an architecture-independent middleware

Improved portability: it works on different hardware architectures and with different Linux distributions

Opportunity to optimize latencies

Phases

First phase: Layer definition
Goal: factorization of common features
Boundaries of the new middleware
Mapping of existing solutions on the functionalities
Motivation: provide guidelines for the actual implementation

Second phase: Implementation recovery
Goal: recovery of the bootstrap process and kernel images
Motivation: full recovery of the Caronte solution

Third phase: Architectures generation
Goal: synthesis of hardware architectures using up-to-date Xilinx tools and cores
Motivation: synthesis of hardware architectures using up-to-date Xilinx tools

First Phase: Layer definition

Definition of new layer boundaries
Factorization of existing features
Mapping of the required functionalities on existing implementations

Feature                                          Caronte Solution                  YaRA Solution
Reconfiguration controller support               ICAP device driver                Reconfiguration Controller Driver
Dynamic address space assignment                 IPCM module                       MAC module
Dynamic device registration and driver loading   IPCM module                       LOL module
API                                              Direct interaction with modules   Reconfiguration library
Module management (caching, placement...)        Not implemented                   ROTFL architecture

Legend: ● = Both hardware and software; ● = Hardware independent

Second Phase: Implementation Recovery

Bootstrap process from flash memory

[Figure: bootstrap sequence (steps 1 to 6). Components: BRAM, PowerPC, FPGA bootloader, bootmanager, kernel and RAM disk image. Memories: 16 MB flash at 0xe4000000–0xe4FFFFFF (marked addresses 0xe42FFFFF, 0xe4F00000, 0xe4F80000) and 64 MB DDR SDRAM at 0x00000000–0x03FFFFFF (marked address 0x00800000)]

Second Phase: Implementation Recovery (cont’d)

Several issues:
No bootmanager nor Linux kernel on flash memory at the beginning
Flash memory seen as read-only memory at runtime
Need for an ad-hoc solution

Avmon command line interface:
Executed from DDR SDRAM memory
FTP transfer of the bootmanager and flash programming
Also useful for kernel download

Kernel executable image:
Kernel image built using a cross-compiler
ICAP and IPCM modules loaded at runtime

Third Phase: Architecture generation

Hardware architecture used in Second Phase no longer useful

Synthesized with Xilinx ISE and EDK 6.1

Same hardware structure realized with updated cores and recent tool versions

Synthesis with Xilinx ISE and EDK 7.1
Synthesis with Xilinx ISE and EDK 9.1

Lack of device driver support and documentation to configure the newest cores

Results: Implementation Recovery

Linux bootstrap from flash memory

Results: Implementation Recovery

Design summary for hardware architectures on Xilinx Virtex-II Pro VP7

Two main limitations:
Ethernet controller
Necessity of a top-level design

Design too large for module-based reconfiguration

Resource       Xilinx ISE/EDK 7.1              Xilinx ISE/EDK 9.1
               Used    Available    %          Used    Available    %
Slices         4926    4928         99%        5318    4928         107%
Flip-Flops     5217    9856         52%        5724    9856         58%
4-input LUTs   6974    9856         70%        6993    9856         70%
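The percentages in the design summary can be reproduced from the raw counts. They appear to be truncated rather than rounded: 5217/9856 is 52.9%, reported as 52%. The truncation rule is our inference. The short Python check below is only an arithmetic aid. It also flags the 9.1 slice count that exceeds the device.

```python
# (used, available) pairs from the Virtex-II Pro VP7 design summary
summary = {
    "ISE/EDK 7.1": {"Slices": (4926, 4928), "Flip-Flops": (5217, 9856), "4-input LUTs": (6974, 9856)},
    "ISE/EDK 9.1": {"Slices": (5318, 4928), "Flip-Flops": (5724, 9856), "4-input LUTs": (6993, 9856)},
}


def utilization(used: int, available: int) -> int:
    """Utilization in percent, truncated toward zero (e.g. 4926/4928 -> 99)."""
    return 100 * used // available


for flow, resources in summary.items():
    for resource, (used, available) in resources.items():
        pct = utilization(used, available)
        note = "  <- exceeds the device" if used > available else ""
        print(f"{flow:12} {resource:13} {pct:4d}%{note}")
```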

Device driver updates to support the newest architectures

Intermediate layer implementation: opportunity to add some additional features

Reconfiguration scheduler

Opportunity to define a common device driver interface to simplify the creation of a new driver by the user

Integration of the middleware and the operating system support in a complete design flow

Questions




Design Flow

Antonio Piazzi

Outline

What’s next

• The user has to spread their attention over many problems, some of them related to the implementation of the design.

• Often users know nothing about reconfigurable architecture generation, and they should not need to.

• New design methodology tailored to support partial dynamic reconfigurable architectures

• Definition and implementation of a design framework able to:

• Support different design paradigms, e.g. Xilinx Module Based, Xilinx EAPR

• Hide the dirty work (due to the reconfiguration) from the application designer

• Support different architectural solutions, e.g. different communication infrastructures such as IBM CoreConnect or Wishbone

• With our framework, all users (novice and expert) should be able to develop and debug their functionality on a reconfigurable architecture without having to analyze all the problems related to that development methodology

Phases

•1st phase (15 March – 15 April): Budgeting

•Study of the state of the art

•2nd phase (15 April – 15 May): Realization phase

•Construction of the entire framework based on previously separated tools

•Implementation of an innovative workflow

•3rd phase (15 May – 15 June): Project’s validation

• Definition of a new communication infrastructure and transfer protocol for the reconfigurable part

• Verify the integration of the new infrastructure in the project

First Phase

•Study of the state of the art

• Standard reconfigurable design flow

• Xilinx Module Based and EAPR

• Caronte Design Flow

• EDK-based architecture

Self Reconfigurable Architecture

Second Phase (1/4)

Construction of the entire framework based on previously separated tools

The user has to focus only on developing the IBM CoreConnect architecture and on writing the modules that implement the desired functionality

SYSTEM.VHD contains all information about the IBM CoreConnect architecture

ArchGen takes the system.vhd file, processes the architecture it contains and translates that static architecture into a dynamic one

FIX.VHD contains the instantiations of the processors (one or more) and of all the components present in the IBM CoreConnect architecture

TOP.VHD contains the instantiations of the fix component and the information about the communication infrastructure

COMiC generates an NCD file, which contains the information about the communication infrastructure, and an XDL file, which contains the same information in text form

At this point we only have to collect all the information we need and, through a parser, insert it into a new top.vhd, which will be the fixed part of the architecture. What remains is to manage the reconfigurable modules written by the user

Third Phase (1/3)

An OPB bus based on 3-state buffers, used to link one or more modules to the fixed part (created with ISE)

Definition of a new communication infrastructure and transfer protocol for the reconfigurable part

Use the ncd2xdl converter to obtain an XDL file that contains all the parameters of our bus

Full integration in our process: any bus type can be used to connect the fixed and the reconfigurable parts

Verify the integration of the new infrastructure in the project

Results

• The framework answers the need for automation of novice users and, more generally, helps all users who face a short time to market.

• As future work, we plan to spend one or two working days fixing some bugs present in the project and adjusting the output of COMiC, which has to create an OPB replay bus.

Questions?




Polaris

Create an integrated HW/SW system to manage 2D reconfiguration

SW side: maintain information on FPGA status; decide how to efficiently allocate tasks

HW side: provide support for effective task allocation; perform 2D bitstream relocation

Management of 2D Reconfiguration in a Reconfigurable System

Massimo Morandi

Outline

Introduction: problem description, project goals and contributions

Project in detail: phases, results

Future Work

New generation of FPGAs: Virtex-4 and Virtex-5
Allow bi-dimensional reconfiguration

This makes it possible to:
Better exploit the reconfigurable area
Obtain module performance optimizations

More complex management:
Handle one more degree of freedom
Avoid more fragmentation
Make good placement choices to keep a low task rejection rate (TRR)
Keep acceptable intra-module routing paths

Project Goals and Contributions

Analyze the effects of 2D reconfiguration: new advantages, new problems

Examine possible solutions to the new problems: explore the literature to find promising ideas; evaluate those solutions in various scenarios

Propose a new solution: combine ideas from the literature with new ones; obtain a good cost-quality tradeoff

Project Phases

First Phase, time window: 15th March – 12th April
Documentation: prj presentation (12/4), prj report
Goals:
General analysis of 2D reconfiguration
Detailed description of the new problems

Second Phase, time window: 13th April – 17th May
Documentation: prj presentation (17/5), prj report
Goals:
Definition of desired features for a solution
Analysis and evaluation of existing solutions

Third Phase, time window: 18th May – 14th June
Documentation: prj presentation (14/6), prj report
Goal: propose a new combined solution to effectively handle the problems of 2D reconfiguration

Setting and Advantages Definition

Definition of the setting: 2D self partial dynamic run-time reconfiguration

Analysis of the advantages of 2D reconfiguration in area usage and performance

2D Fragmentation Problem

Analysis of the 2D fragmentation problem: the area is generally more fragmented, which can nullify the area optimizations obtained

Placement Decisions

Analysis of the effects of 2D placement choices: again, bad choices can lead to performance loss

Allocation manager

Definition of allocation manager desired features:
Low TRR
Low management overhead
High routing efficiency
Low fragmentation

Definition of allocation manager structure:
Empty space manager: complete space, heuristic selection
Fitter: general (FF, BL, BF, WF…), focused (FA, RA…)

Most relevant works

Maintain complete information on empty space:
KAMER: Keep All Maximally Empty Rectangles; apply a general fitting strategy
CUR: maintain the Contour of a Union of Rectangles; apply a focused fitting strategy

Heuristically prune part of the information:
KNER: Keep Non-overlapping Empty Rectangles; apply a general fitting strategy
2D-HASHING: keep non-overlapping empty rectangles in an optimized data structure; apply (exclusively) a general fitting strategy

Evaluation and Proposed Approach

Proposed approach:
Heuristic (KNER-like) empty space manager, to keep complexity low for use in a self-reconfigurable system
Fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (the chosen metric to minimize is the Manhattan distance)

Evaluation: high placement quality => high complexity; lowest complexity => no focused fitting (bad especially for routing)
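A routing-focused fitter of this kind can be sketched as a choice among the candidate empty rectangles. The sketch picks the candidate that minimizes the total Manhattan distance between the new task and the already placed tasks it communicates with. It is a hypothetical illustration, not the Polaris implementation. Its assumptions are:

- Each task is placed at the lower-left corner of the chosen rectangle.
- Distances are measured between rectangle centers.
- Communication weights are ignored.

The `Rect` type and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def choose_rectangle(empty_rects, task_w, task_h, partners):
    """Among the empty rectangles able to host the task, return the one that
    minimizes the summed Manhattan distance to the communicating tasks.
    Returns None when no rectangle fits (the task is rejected)."""
    best, best_cost = None, None
    for r in empty_rects:
        if task_w > r.w or task_h > r.h:
            continue                                   # task does not fit here
        placed = Rect(r.x, r.y, task_w, task_h)        # lower-left corner placement
        cost = sum(manhattan(placed.center(), p.center()) for p in partners)
        if best_cost is None or cost < best_cost:
            best, best_cost = r, cost
    return best


empty = [Rect(0, 0, 4, 4), Rect(10, 0, 4, 4), Rect(0, 6, 1, 1)]
partner = Rect(12, 0, 2, 2)            # an already placed, communicating task
print(choose_rectangle(empty, 2, 2, [partner]))   # Rect(x=10, y=0, w=4, h=4)
```

A task with no communicating partners has zero cost everywhere. In that case the function degenerates to a first fit over the candidate list.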

Structure of the allocation manager

Task, defined by: arrival time, ASAP, (ALAP), H, W, latency, communicating tasks. Tasks are hosted in a queue, which also adds a pointer to the rectangle where each task is placed

Reconfigurable device, represented as a binary tree structure: each node is a rectangle, each leaf is an empty rectangle. Navigation through pointers to the left child, right child and next leaf, plus a function to find the previous leaf (for bookkeeping after a split or merge)

Rectangle, defined by: X, Y, H, W. Initially there is a single rectangle with (X, Y) = (0, 0), H = FPGA rows, W = FPGA columns
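The device representation can be sketched as a binary tree whose leaves are non-overlapping rectangles, split guillotine-style when a task is placed. This is a hypothetical simplification of the structure described above:

- Leaves are walked recursively instead of through explicit next/previous leaf pointers.
- Occupied areas stay in the tree as leaves marked with their task.
- Merging after task removal is not shown.
- First fit stands in for the fitting strategy.

All names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int


@dataclass
class TreeNode:
    rect: Rect
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    task: Optional[str] = None          # name of the hosted task, None if empty

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: TreeNode) -> Iterator[TreeNode]:
    """Left-to-right leaf walk (replaces the explicit next-leaf pointers)."""
    if node.is_leaf():
        yield node
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)


def split(leaf: TreeNode, w: int, h: int) -> TreeNode:
    """Guillotine split of an empty leaf: a w-wide column on the left (task at
    the bottom, empty space above it) and the remaining columns on the right."""
    r = leaf.rect
    column = TreeNode(Rect(r.x, r.y, w, r.h))
    leaf.left, leaf.right = column, TreeNode(Rect(r.x + w, r.y, r.w - w, r.h))
    column.left = TreeNode(Rect(r.x, r.y, w, h))
    column.right = TreeNode(Rect(r.x, r.y + h, w, r.h - h))
    return column.left


def place(root: TreeNode, name: str, w: int, h: int) -> Optional[Rect]:
    """First fit over the empty leaves; None means the task is rejected."""
    for leaf in leaves(root):
        if leaf.task is None and leaf.rect.w >= w and leaf.rect.h >= h:
            host = split(leaf, w, h)
            host.task = name
            return host.rect
    return None


def empty_rects(root: TreeNode) -> List[Rect]:
    """Non-overlapping empty rectangles (zero-area leaves are skipped)."""
    return [l.rect for l in leaves(root)
            if l.task is None and l.rect.w > 0 and l.rect.h > 0]


# Device with 8 rows and 10 columns: one rectangle at (0, 0), H = rows, W = cols.
root = TreeNode(Rect(0, 0, w=10, h=8))
print(place(root, "t1", 4, 3))   # Rect(x=0, y=0, w=4, h=3)
print(place(root, "t2", 5, 5))   # Rect(x=4, y=0, w=5, h=5)
print(place(root, "t3", 7, 7))   # None (rejected)
```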

The Placement Algorithm

Experimental Results

Benchmark of 100 randomly generated tasks: size from 5% to 25% of the FPGA, randomly interconnected

Execution time: 3x less than CUR, close to KNER
Communication cost: 3x less than KNER, close to CUR
Task rejection rate: all solutions quite close

Future Work

Apply the proposed solution to self reconfiguration:
Adapt the algorithm to run on the internal processor
Create a validation reconfigurable architecture
Integrate the architecture with relocation

Tune the algorithm to improve results:
Experiment with techniques to reduce TRR
Optimize the code to obtain a lower running time

Questions?



Relocation for 2D Reconfigurable Systems

Marco Novati

Outline

Introduction: problem description, project goals

Project in detail: phases, results

What’s next

Problem Description

Self dynamic runtime 2D reconfiguration: Xilinx Virtex-4 and Virtex-5

Relocation, different solutions:
Software (BAnMat, PARBIT)
Hardware (REPLICA, BiRF)

We chose a hardware solution: BiRF Square

Study of the new FPGA families: examination of Xilinx documentation on V4 and V5

Analysis of the new bitstream structure: generation of V4 and V5 bitstreams

Development of the new version of BiRF: implementation, validation

Phases

First Phase: 15th March – 12th April
Documentation: prj presentation (12/4), prj report
Goals:
Xilinx documentation examination
V4 & V5 bitstream structure analysis

Second Phase: 13th April – 17th May
Documentation: prj presentation (17/5), prj report
Goals:
Implementation of BiRF Square
Synthesis

Third Phase: 18th May – 14th June
Documentation: prj presentation (14/6), prj report
Goals:
Verification & Validation

Frame Addressing

New frame addressing: possibility of addressing rows and columns
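2D relocation boils down to rewriting the row and column fields of every frame address in a partial bitstream. The sketch below decodes, re-encodes and shifts a frame address word. The field layout is our assumption, taken from our reading of the Virtex-4 configuration user guide: top/bottom bit 22, block type 21:19, row 18:14, column (major) 13:6, minor 5:0. It must be verified before use. Virtex-5 uses a different layout. Crossing the top/bottom half of the device is not handled. All function names are hypothetical, and this is not the BiRF Square logic.

```python
# Assumed Virtex-4 frame address layout: field -> (bit offset, width).
FAR_FIELDS = {
    "top":    (22, 1),
    "block":  (19, 3),
    "row":    (14, 5),
    "column": (6, 8),
    "minor":  (0, 6),
}


def decode_far(word: int) -> dict:
    """Split a frame address word into its fields."""
    return {name: (word >> off) & ((1 << width) - 1)
            for name, (off, width) in FAR_FIELDS.items()}


def encode_far(fields: dict) -> int:
    """Pack the fields back into a word, rejecting values that overflow."""
    word = 0
    for name, (off, width) in FAR_FIELDS.items():
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << off
    return word


def relocate_far(word: int, d_row: int, d_col: int) -> int:
    """Move a frame address by (d_row, d_col), leaving block type, half and
    minor address unchanged."""
    fields = decode_far(word)
    fields["row"] += d_row
    fields["column"] += d_col
    return encode_far(fields)


far = encode_far({"row": 1, "column": 10, "minor": 3})
print(decode_far(relocate_far(far, 2, 5)))
```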

New Parser

CRC Calculation

Particular CRC value, used by Xilinx tools

Two versions of BiRF Square: one using the “predefined” value, one with actual CRC calculation

An optimized algorithm has been used
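The slides do not say which CRC variant or optimization BiRF Square uses. The sketch below therefore shows only the general idea of an optimized CRC: a byte-at-a-time table lookup that gives the same result as the bit-serial reference. As an assumption, it uses CRC-32C (Castagnoli, reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF). Which polynomial and which register bits Xilinx devices feed into their configuration CRC must be taken from the Xilinx configuration user guides.

```python
# Assumption: CRC-32C (Castagnoli), reflected form, init/final XOR 0xFFFFFFFF.
POLY = 0x82F63B78


def crc32c_bitwise(data: bytes) -> int:
    """Reference version: one shift/XOR step per input bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def _make_table():
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ (POLY if c & 1 else 0)
        table.append(c)
    return table


TABLE = _make_table()


def crc32c_table(data: bytes) -> int:
    """Optimized version: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32c_table(b"123456789")))   # 0xe3069283 (standard CRC-32C check value)
```

In hardware the same trade-off appears as processing a whole word per clock cycle instead of one bit. That is the kind of optimization that matters for a relocation filter working on the bitstream on the fly.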

Synthesis results

On a Virtex-4 with speed grade -12:
General purpose version: max frequency of 160 MHz
Specific version: max frequency of 290 MHz

Target Device

Validation Architecture

BiRF Square:
Permits applying relocation in a self, partially and dynamically 2D-reconfigurable system
The occupation ratio is relatively small
Frequency more than acceptable
Reduction of internal memory requirements

Throughput of 7.3 MB/s:

The total configuration file size is about 1 MB. Consider an architecture with:

1/3 of the area as fixed part, 2/3 as reconfigurable part with 6 slots

With these hypotheses, the size of a partial bitstream is about 110 KB, giving a relocation time of about 15 ms
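The estimate can be checked with a few lines of arithmetic. The assumptions are ours: 1 MB is taken as 1000 KB, the partial bitstream size scales linearly with the reconfigurable area, and header overhead is ignored. Each slot then holds 2/3 × 1/6 = 1/9 of the full bitstream.

```python
full_bitstream_kb = 1000.0                # "about 1 MB"
reconfigurable_fraction = 2 / 3
slots = 6
throughput_kb_per_s = 7.3 * 1000          # 7.3 MB/s

partial_kb = full_bitstream_kb * reconfigurable_fraction / slots
relocation_ms = partial_kb / throughput_kb_per_s * 1000

print(f"partial bitstream ~ {partial_kb:.0f} KB")    # ~ 111 KB
print(f"relocation time ~ {relocation_ms:.1f} ms")   # ~ 15.2 ms
```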

What’s Next

Future improvements:
Direct memory access (DMA)
Direct manipulation of the bitstream
Portability
Integration with ICAP
Elimination of the relocation overhead: relocation time << reconfiguration time

The final goal: creation of a real architecture that exploits self partial and dynamic 2D reconfiguration, with relocation

Questions




High Level Reconfiguration

Marco Maggioni

Outline

Introduction: problem description

Project Goals

State of the Art

Project in detail: contributions

HLR workflow: GraphGen, IsomorphClustering, SimpleLatency, Salomone

Results

What’s next


What is High Level Reconfiguration...? A theoretical approach to dynamic reconfiguration...

Vision... Reconfigurability has many advantages...

Mission... Exploit these advantages to obtain the best performance...

How...? Adapting a system to this execution model, managing its complexity and drawbacks...

Project Goal

Create a complete HLR workflow... from a real system specification to its reconfigurable execution model...

Define precise interfaces for each phase... to promote flexibility and future HLR research... to develop a complete toolchain...

Apply some algorithms regarding reconfigurability... to reuse past works...

State of the Art

Present state of HLR... some ideas/concepts regarding clustering and scheduling... but no complete and well-defined workflow... so a lot of work remains to be done.

System specification analysis... PandA HW/SW framework to promote new ideas... Dynamic reconfigurability can be considered a branch of this research...

Contribution

Dynamic library loading system... embedded into the GNU compilation toolchain

Porting of PandA libraries into Earendil... suitable for future analysis...

HLR tools deployed onto Earendil... covering each step of the workflow...

HLR workflow

Clustering (with analysis)... 1st month

Coloring... 2nd month

Scheduling... 3rd month

[Figure: HLR workflow; blocks: GCC frontend, PandA, partitioning algorithm, clustered graph, metric evaluation (area, latency, reconfiguration time, power), target architecture database, reconfigurable clustered graph, scheduling algorithm]

GraphGen

GraphGen is the first step of the HLR toolchain... It takes as input a system specification or an algorithm... and produces a graph (CFG/BB/DFG/SDG)

Performs the high-level analysis step... Transforms the system description (C/C++/SystemC) into a representation suitable for further elaboration... Based on GCC and compiler theory... Uses PandA 0.4 functionalities to produce a statement-level graph...

IsomorphClustering

IsomorphClustering follows GraphGen in the HLR toolchain...

Takes as input a statement-level graph... Produces a clustered graph...

Clustering phase... Aggregates nodes into configurations (the basic unit of reconfigurable execution)... Based on isomorphism, it tries to find different instances of isomorphic templates... Different algorithms can also be applied...

SimpleLatency

SimpleLatency follows IsomorphClustering in the HLR toolchain...

Takes as input a clustered graph... Adds latency information to each configuration... Produces a reconfigurable clustered graph with latency evaluations...

Coloring... “Colors” each cluster with evaluations useful for reconfigurability... Based on each cluster's internal critical path... Different metrics for different architectures... Connects HLR with real architectural parameters...
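The coloring step can be illustrated by computing each cluster's latency as the length of its internal critical path. This is a minimal sketch under stated assumptions, not the SimpleLatency code:

- Each operation inside a cluster has a known delay, a stand-in for an architecture-specific metric.
- Dependencies are given as predecessor sets.
- The data structures and names are hypothetical.

It uses `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter   # Python 3.9+


def cluster_latency(delays, preds):
    """Length of the critical path inside one cluster (configuration).

    delays: operation -> delay (an assumed, architecture-specific estimate)
    preds:  operation -> set of operations in the same cluster it depends on
    """
    graph = {op: preds.get(op, set()) for op in delays}
    finish = {}
    for op in TopologicalSorter(graph).static_order():   # predecessors first
        start = max((finish[p] for p in graph[op]), default=0)
        finish[op] = start + delays[op]
    return max(finish.values(), default=0)


# "Color" each cluster with its latency.
clusters = {
    "C1": ({"a": 2, "b": 3, "c": 1, "d": 4}, {"c": {"a", "b"}, "d": {"c"}}),
    "C2": ({"x": 5, "y": 5}, {}),
}
colored = {name: cluster_latency(d, p) for name, (d, p) in clusters.items()}
print(colored)   # {'C1': 8, 'C2': 5}
```

Swapping the delay table is where "different metrics for different architectures" would plug in.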

Salomone

Salomone is the last step in the HLR toolchain... Takes as input a reconfigurable clustered graph... Produces a schedule on an abstract reconfigurable architecture...

Scheduling... It is considered the core task of HLR... Maps each configuration onto an area portion... Adapts the system execution to the reconfigurable model... Based on a graph coloring algorithm...
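Graph-coloring-based mapping onto area portions can be sketched as follows. This is a hypothetical illustration, not Salomone's actual algorithm:

- We assume each configuration already has an execution interval.
- Two configurations conflict, so they need different area portions, when their intervals overlap.
- Colors are assigned greedily, visiting configurations by start time. For interval conflicts this greedy order uses the minimum number of colors.
- Each color stands for one area portion.

```python
def assign_area_portions(intervals):
    """intervals: configuration -> (start, end), half-open [start, end).
    Returns configuration -> area portion index (the color)."""
    order = sorted(intervals, key=lambda c: intervals[c][0])
    color = {}
    for c in order:
        start, end = intervals[c]
        # colors already taken by configurations whose execution overlaps c
        used = {color[o] for o in color
                if intervals[o][0] < end and start < intervals[o][1]}
        k = 0
        while k in used:
            k += 1
        color[c] = k
    return color


schedule = {"A": (0, 4), "B": (1, 3), "C": (4, 6), "D": (3, 5)}
portions = assign_area_portions(schedule)
print(portions)                        # {'A': 0, 'B': 1, 'D': 1, 'C': 0}
print(max(portions.values()) + 1)      # 2 area portions
```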

Based on AES encryption...

Templates found with IsomorphClustering... Execution time: 123.94 s

Salomone adapting and coloring... Execution time: 113.55 s

Final Scheduling...

What's next

Heuristic implementation for Salomone... to improve result quality in terms of number of area portions...

A new metric for area/latency... based on RTL logic synthesis evaluations...

Introduce feedback into the HLR workflow... based on schedule evaluation...

New clustering and scheduling algorithms... such as Napoleon...

Questions
