HPPS - Final - 06/14/2007


Page 1: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

High Performance Processors and Systems
PdM – UIC joint master 2007

Instructor: Prof. Donatella Sciuto

HPPS @ PdM – June 2007

Page 2: HPPS - Final - 06/14/2007

General Outline

DRESD
DReAMS
• Alessandro Panella
• Matteo Murgida
Operating System
• Ivan Beretta
Design Flow
• Antonio Piazzi
Polaris
• Massimo Morandi
• Marco Novati
HLR
• Marco Maggioni

Page 3: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

DRESD in a Nutshell
Dynamic Reconfigurability in Embedded System Design

DRESD @ PdM – June 2007

Page 4: HPPS - Final - 06/14/2007

Outline

Reconfiguration
• Motivations
• Basic Definition
• SoC

Page 5: HPPS - Final - 06/14/2007

Motivations

Increasing need for behavioral flexibility in embedded systems design
• Support of new standards, e.g. in media processing
• Addition of new features

Applications too large to fit on the device all at once

Speed up the overall computation of the final system

Page 6: HPPS - Final - 06/14/2007

Reconfiguration

The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation.

Gerald Estrin, 1960

Page 7: HPPS - Final - 06/14/2007

SoC Reconfiguration

[Figure: SoC reconfiguration; labels: fix, Partial, Total, Embedded]

Page 8: HPPS - Final - 06/14/2007

Different Scenarios...

• Single Device
• Distributed System


Page 10: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Dynamic Reconfigurability Applied to Multi-FPGA Systems

DReAMS

Page 11: HPPS - Final - 06/14/2007

DReAMS

Dynamic Reconfigurability Applied to Multi-FPGA Systems
• Branch of the DRESD project
• Inherits architectures and tools

Automatic workflow from VHDL system description to FPGA implementation
• VHDL parsing and system simulation
• System creation over a specific architecture
• Bitstream creation and download onto FPGAs

Page 12: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Multi-FPGA Partitioning

Alessandro Panella

Page 13: HPPS - Final - 06/14/2007

Outline

Problem description
Project goals and contributions
Project phases
What is partitioning?
Existing approaches
Going deeper into the problem
SPartA
• The framework
• The idea
• The algorithm
Experimental results
Future work

Page 14: HPPS - Final - 06/14/2007

Problem description

Multi-FPGA rationale
• Large designs do not fit into a single chip
• High-performance parallelized applications
• Our case: apply dynamic reconfigurability

Need to break the initial design into several blocks
• One block corresponds to a single FPGA chip
• Which inputs/outputs?
• Which objectives?
• Which techniques?

Page 15: HPPS - Final - 06/14/2007

Project goals and contributions

Analyze existing approaches
• Obtain a deep knowledge of this well-explored field
• Extract basic ideas for a new approach
• Obtain some terms of comparison

Define precisely which problem(s) we cope with
• Contextualize the problem
• Focus on our needs

Develop a new solution
• Theoretical background
• Implementation and evaluation

Page 16: HPPS - Final - 06/14/2007

Project phases

First Phase [15th March – 12th April]
• Documentation: presentation (12/4), report
• Goals:
  • Analysis of the state of the art
  • Produce some hints on a new approach

Second Phase [13th April – 17th May]
• Documentation: presentation (17/5), report
• Goals:
  • Precise definition of the problem
  • Propose a new solution

Third Phase [18th May – 14th June]
• Documentation: presentation (14/6), final report
• Goal:
  • Implementation and evaluation of the proposed solution

Page 17: HPPS - Final - 06/14/2007

What is partitioning?

Goal
• Divide a set of interrelated objects into a set of subsets
• Optimize one or more specific objectives

K-way partitioning
• Given a graph G=(V,E), partition V into k subsets V1...Vk that are pairwise disjoint and whose union is V
• Balance constraint: |Vi| ≈ |V|/k

Aims at minimizing (or maximizing) an objective function
• Edge-cut
• Other objectives

In general: NP-complete
• Several heuristics that provide good results have been developed
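The definition above can be made concrete with a small check. This is a minimal sketch, assuming an undirected graph given as an edge list and a partition given as a node-to-block mapping (which makes the blocks disjoint and their union equal to V by construction). The names `edge_cut` and `is_balanced`, the `tolerance` parameter and the example graph are illustrative assumptions, not part of the presentation.

```python
from collections import Counter

def edge_cut(edges, block_of):
    """Count the edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block_of[u] != block_of[v])

def is_balanced(block_of, k, tolerance=1):
    """Check |Vi| ≈ |V|/k: every block size is within `tolerance` of |V|/k."""
    sizes = Counter(block_of.values())
    target = len(block_of) / k
    return len(sizes) == k and all(abs(s - target) <= tolerance
                                   for s in sizes.values())

# Hypothetical 6-node graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # a 2-way partition

cut = edge_cut(edges, block_of)          # only edge (2, 3) crosses the blocks
balanced = is_balanced(block_of, k=2)    # both blocks hold 3 of the 6 nodes
```

A partitioning heuristic would search over `block_of` assignments for one that keeps `is_balanced` true while lowering `edge_cut`.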

Page 18: HPPS - Final - 06/14/2007

Existing approaches - a glance

Traditional methods
• Kernighan–Lin and Fiduccia–Mattheyses heuristics
  • Iterative-improvement algorithms
  • Begin with an initial partition and iteratively improve it
  • O(n^3) complexity

Iterative algorithms
• Genetic
• Simulated annealing

Multilevel algorithms
• Clustering -> Initial partitioning -> Refining
• METIS/hMETIS suite: best current results for partitioning large flattened graphs

Page 19: HPPS - Final - 06/14/2007

Going deeper into the problem

Two kinds of multi-FPGA partitioning
• Topology-aware
  • Architecture topology is an input
  • No optimization of the number of FPGAs needed
  • Main task: association between the (larger) system graph and the (smaller) architectural graph
• Topology-free
  • Architecture topology is not provided
  • Input: dimension and communication features of FPGAs
  • Minimization of the number of FPGAs
  • Place and route after partitioning

At the moment, we deal with the topology-free problem

Page 20: HPPS - Final - 06/14/2007

SPartA: the framework

Input: VHDL system description

Output: several VHDL files, one for each block (FPGA)

Three main phases:
• Extract design from VHDL description
• “Real” partitioning phase (core)
• Build VHDL files

Page 21: HPPS - Final - 06/14/2007

SPartA: the idea

Structural approach
• Fully exploits the design hierarchy
• Modules can be treated as single blocks
• Basis for extensions toward dynamic reconfigurability

Objectives
• Minimize the cut size
• Minimize the number of FPGAs used
• Preserve module integrity

Page 22: HPPS - Final - 06/14/2007

SPartA: the algorithm 1/2

Recursive algorithm (deals with trees)
• Starts from the TOP node
• Precondition
  • No leaves with dimension > FPGA size
• At every moment, a node can be:
  • COVERED, UNCOVERED or PARTIALLY COVERED
• Stop condition
  • Node TOP is COVERED
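The three node states and the precondition can be illustrated on a small design tree. The slides do not define how a state is derived, so the rule used here is an assumption: a leaf is COVERED once it is assigned to a partition, and an internal node aggregates the states of its children. The `Node` class and all names below are hypothetical; this is a sketch, not the SPartA implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    size: int = 0                          # area of a leaf module (assumed unit)
    children: List["Node"] = field(default_factory=list)
    partition: Optional[int] = None        # FPGA index once a leaf is assigned

def state(node: Node) -> str:
    """Assumed rule: leaves follow their assignment, inner nodes aggregate."""
    if not node.children:
        return "COVERED" if node.partition is not None else "UNCOVERED"
    states = {state(child) for child in node.children}
    if states == {"COVERED"}:
        return "COVERED"
    if states == {"UNCOVERED"}:
        return "UNCOVERED"
    return "PARTIALLY COVERED"

def precondition_holds(node: Node, fpga_size: int) -> bool:
    """Precondition from the slide: no leaf is larger than one FPGA."""
    if not node.children:
        return node.size <= fpga_size
    return all(precondition_holds(child, fpga_size) for child in node.children)

# Hypothetical design: TOP -> (m1 -> (a, b), c); only leaf a is assigned so far.
a = Node("a", size=30, partition=0)
b = Node("b", size=50)
c = Node("c", size=20)
top = Node("TOP", children=[Node("m1", children=[a, b]), c])
```

Under this rule the recursion would stop when `state(top)` returns `"COVERED"`, which matches the stop condition on the slide.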

Page 23: HPPS - Final - 06/14/2007

SPartA: the algorithm 2/2

OPEN ISSUE: Selecting the first node to be inserted into an empty partition
• Random node
• Node with overall max communication
• Node with max communication with its siblings
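The three candidate policies can be compared on a toy example. This is a minimal sketch, assuming communication is described by weighted, undirected edges between module names. The function names, the policy labels and the weights are illustrative assumptions, not taken from the SPartA code.

```python
import random

def total_comm(node, weights):
    """Sum of the weights of all edges touching `node`."""
    return sum(w for (u, v), w in weights.items() if node in (u, v))

def sibling_comm(node, siblings, weights):
    """Sum of the weights of edges between `node` and its siblings only."""
    return sum(w for (u, v), w in weights.items()
               if (u == node and v in siblings) or (v == node and u in siblings))

def pick_first(candidates, weights, policy, rng=random):
    """Choose the first node for an empty partition under one of three policies."""
    ordered = sorted(candidates)            # deterministic tie-breaking
    if policy == "random":
        return rng.choice(ordered)
    if policy == "max_total":
        return max(ordered, key=lambda n: total_comm(n, weights))
    if policy == "max_sibling":
        return max(ordered,
                   key=lambda n: sibling_comm(n, set(ordered) - {n}, weights))
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical weights; "x" is a module outside the sibling group.
weights = {("a", "b"): 2, ("a", "x"): 10, ("b", "c"): 3}
candidates = ["a", "b", "c"]

by_total = pick_first(candidates, weights, "max_total")      # a: 12, b: 5, c: 3
by_sibling = pick_first(candidates, weights, "max_sibling")  # a: 2, b: 5, c: 3
```

The example shows that the two communication-based policies can disagree: a node with heavy traffic to the rest of the design is not necessarily the one most tightly coupled to its siblings.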

Page 24: HPPS - Final - 06/14/2007

Results 2/2

Complexity: exponential, due to the recursive nature of the algorithm
Execution time nevertheless low (tens of seconds for a reasonably large design)

[Figure: example showing the original tree and the partitioned tree]

Page 25: HPPS - Final - 06/14/2007

Results 3/3

Evaluation metrics
• EDGECUT, FILLING and SPLITS

Evaluation of the three policies for node selection
• 18 different trees of varying size

Page 26: HPPS - Final - 06/14/2007

Results 3/3

Page 27: HPPS - Final - 06/14/2007

Future work

Algorithm improvement
• Balancing of the last partition
• First node selection policies
• More refined “score” function for selecting a node
  • Use closeness metrics

Comparisons with existing algorithms

Expansion
• SPartA framework development
• Topology-aware partitioning

Page 28: HPPS - Final - 06/14/2007

The end

ANY QUESTIONS?


Page 30: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Chimera
Multi-FPGAs Architecture Definition

Matteo Murgida

Page 31: HPPS - Final - 06/14/2007

Outline

Introduction
• Problem description
• Project Goals
• State of the Art

Project in detail
• Contributions
• Phases
• Results

What’s next

Page 32: HPPS - Final - 06/14/2007

Problem Description

Architectural description of a distributed FPGA environment
• 3-layer architecture

Page 33: HPPS - Final - 06/14/2007

Project Goals

Design the architecture of the most generic distributed system
• Node definition
• Interface definition
• Communication channel definition

Design a communication protocol
• Essential protocol
• Interrupt-based protocol
• Timeout improvement

Page 34: HPPS - Final - 06/14/2007

State of the Art

CONFigurable ElecTronic TIssue (CONFETTI) by EPFL
• Cellular-based architecture
• PROs: high degree of parallelism, high computational power
• CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort

Splash 2 by IDA Supercomputing Center
• Architecture composed of a Sun SPARCstation host, an interface board and “Splash Array” boards
• PROs: again high parallelism and power
• CONs: a central host coordinates the computational units, no fault tolerance, no flexibility

Page 35: HPPS - Final - 06/14/2007

Contributions

The proposed architecture:

Allows several Spartan-3 Starter Boards to communicate and exchange data

It is portable to different FPGAs with minimum effort

It is the basic infrastructure that will allow external partial dynamic reconfiguration

Page 36: HPPS - Final - 06/14/2007

Project Phases

First Phase, time window: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals:
  • Digilent Spartan-3 Starter Board study
  • Boards connection

Second Phase, time window: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals:
  • Communication between two MicroBlaze soft-processors
  • GPIO integration in the architecture

Third Phase, time window: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goals:
  • Interrupt handling, timeout handling
  • Simple application as example

Page 37: HPPS - Final - 06/14/2007

Board Study

• How to use resources such as switches, LEDs and connectors on the board
• How to map an IP-Core port to a physical pin of the board
• Choice of the A2 Expansion Connector to connect two boards

Page 38: HPPS - Final - 06/14/2007

MicroBlaze Communication

• Communication between two MicroBlaze soft-processors
• Development of a display controller to visualize the data flow

Page 39: HPPS - Final - 06/14/2007

GPIO Insertion

Higher architecture portability through the use of the GPIO IP-Core

Page 40: HPPS - Final - 06/14/2007

Interrupt Controller Insertion

• Communication protocol improved by interrupt handling, to prevent the processor from busy waiting
• An Interrupt Controller is included in the architecture to permit multi-interrupt detection and handling

Page 41: HPPS - Final - 06/14/2007

Timeout

Malfunctions due to interference on the communication channel lead to deadlocks

Without a timeout, the communication protocol is not reliable at all

Counter implementation, including the driver used by the processor to clear raised interrupts

Development of a simple application to verify the correctness of the proposed approach
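The effect of the timeout can be illustrated on a host machine. This is a minimal sketch in Python, not the MicroBlaze driver: a `threading.Event` stands in for the interrupt line, and the function name `receive` and the timeout value are illustrative assumptions. Without the timeout, a message lost to interference would leave the receiver waiting forever, which is the deadlock described above.

```python
import threading

def receive(data_ready: threading.Event, timeout_s: float) -> str:
    """Wait for the 'data ready' signal, but give up after timeout_s seconds."""
    if data_ready.wait(timeout=timeout_s):
        data_ready.clear()      # acknowledge, analogous to clearing the interrupt
        return "received"
    return "timeout"            # the caller can retry or report the fault

arrived = threading.Event()
arrived.set()                   # models an interrupt that was raised

lost = threading.Event()        # never set: models a message lost on the channel

results = (receive(arrived, 0.05), receive(lost, 0.05))
```

In the hardware version the counter plays the role of `timeout_s`: it raises its own interrupt when it expires, so the processor always regains control.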

Page 42: HPPS - Final - 06/14/2007

Results

A short Demo ...

Page 43: HPPS - Final - 06/14/2007

Future Work

Apply the proposed approach to external partial dynamic reconfiguration

Develop a co-simulation framework based on the VHDL/SystemC descriptions of distributed systems
• Receive as input the VHDL description of the system
• Build the VHDL description for every node
• Create the SystemC stub to allow inter-node communication
• Describe the communication in SystemC
• Co-simulate the VHDL/SystemC description

Page 44: HPPS - Final - 06/14/2007

Questions


Page 46: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Operating System support for Reconfigurable SoC

Page 47: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Development of an OS architecture-independent layer for dynamic reconfiguration

Ivan Beretta

Page 48: HPPS - Final - 06/14/2007

Outline

Introduction
• Problem description
• Project Goals
• State of the Art

Project in detail
• Contributions
• Phases
• Results

What’s next

Page 49: HPPS - Final - 06/14/2007

Problem description

Need for operating system support on Reconfigurable SoCs
• Simplified software development process
• Improved code portability

Lack of support for dynamically reconfigurable architectures
• Specific solutions for specific architectures

Need for an architecture-independent abstraction layer

Page 50: HPPS - Final - 06/14/2007

Project Goal

Primary goals:
• Analysis of the State of the Art
• Definition of the new intermediate layer
• Physical implementation

Specific goals:
• Study of the solutions developed inside the DRESD group
• Comparison between existing solutions
• Recovery of one of the two implementations
• Hardware architecture generation using up-to-date tools on Xilinx Virtex-II Pro VP7

Page 51: HPPS - Final - 06/14/2007

State of the Art

Caronte implementation (Alberto Donato, 2005)
• Two kernel modules
  • ICAP device driver
  • IP-Core manager (IPCM)

Page 52: HPPS - Final - 06/14/2007

State of the Art (cont’d)

YaRA implementation (Vincenzo Rana, 2006)
• Multi-layered structure
  • Four modules: Reconfiguration controller driver, MAC, LOL, Reconfiguration Library
  • ROTFL architecture

Page 53: HPPS - Final - 06/14/2007

Contributions

Limits of existing implementations
• Lack of portability
  • E.g. YaRA solution implemented on RAPTOR2000
• Reconfiguration process details visible from userspace

Definition of an architecture-independent middleware
• Improved portability
  • It works on different hardware architectures
  • It works with different Linux distributions
• Opportunity to optimize latencies

Page 54: HPPS - Final - 06/14/2007

Phases

First phase: Layer definition
• Goal: Factorization of common features
  • Boundaries of the new middleware
  • Mapping of existing solutions on the functionalities
• Motivation: Provide guidelines for actual implementation

Second phase: Implementation recovery
• Goal: Recovery of bootstrap process and kernel images
• Motivation: Full recovery of Caronte solution

Third phase: Architectures generation
• Goal: Synthesis of hardware architectures using up-to-date Xilinx tools and cores
• Motivation: Synthesis of hardware architectures using up-to-date Xilinx tools

Page 55: HPPS - Final - 06/14/2007

First Phase: Layer definition

• Definition of new layer boundaries
• Factorization of existing features
• Mapping of the required functionalities on existing implementations

Feature                                        | Caronte Solution                | YaRA Solution
Reconfiguration controller support             | ICAP device driver              | Reconfiguration Controller Driver
Dynamic address space assignment               | IPCM module                     | MAC module
Dynamic device registration and driver loading | IPCM module                     | LOL module
API                                            | Direct interaction with modules | Reconfiguration library
Module management (caching, placement...)      | Not implemented                 | ROTFL architecture

Legend: ● = Both hardware and software ● = Hardware independent

Page 56: HPPS - Final - 06/14/2007

Second Phase: Implementation Recovery

Bootstrap process from flash memory

[Figure: bootstrap sequence in six numbered steps. Labels: FPGA with PowerPC and BRAM (Bootloader); 16 MB Flash from 0xe4000000 to 0xe4FFFFFF, with marked addresses 0xe42FFFFF, 0xe4F00000 and 0xe4F80000 (Bootmanager, Kernel and RAMDisk Image); 64 MB DDR SDRAM from 0x00000000 to 0x03FFFFFF, with marked address 0x00800000]
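The address ranges shown for the two memories agree with their stated sizes. The check below is plain arithmetic on the first and last addresses visible on the slide; the helper name `region_size` is illustrative.

```python
MIB = 1024 * 1024

def region_size(first: int, last: int) -> int:
    """Number of bytes in the inclusive address range [first, last]."""
    return last - first + 1

flash_mib = region_size(0xE4000000, 0xE4FFFFFF) // MIB   # Flash range on the slide
sdram_mib = region_size(0x00000000, 0x03FFFFFF) // MIB   # DDR SDRAM range on the slide
```

The Flash range spans 0x01000000 bytes (16 MB) and the DDR SDRAM range spans 0x04000000 bytes (64 MB).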

Page 57: HPPS - Final - 06/14/2007

Second Phase: Implementation Recovery (cont’d)

Several issues
• Neither the bootmanager nor the Linux kernel is on flash memory at the beginning
• Flash memory seen as read-only memory at runtime
• Need for an ad-hoc solution

Avmon command line interface
• Executed from DDR SDRAM memory
• FTP transfer of the bootmanager and flash programming
• Also useful for kernel download

Kernel executable image
• Kernel image built using a cross-compiler
• ICAP and IPCM modules loaded at runtime

Page 58: HPPS - Final - 06/14/2007

Third Phase: Architecture generation

Hardware architecture used in the Second Phase no longer useful
• Synthesized with Xilinx ISE and EDK 6.1

Same hardware structure realized with updated cores and recent tool versions
• Synthesis with Xilinx ISE and EDK 7.1
• Synthesis with Xilinx ISE and EDK 9.1

Lack of device driver support and documentation to configure the newest cores

Page 59: HPPS - Final - 06/14/2007

Results: Implementation Recovery

Linux bootstrap from flash memory

Page 60: HPPS - Final - 06/14/2007

Results: Implementation Recovery

Design summary for hardware architectures on Xilinx Virtex-II Pro VP7

Two main limitations
• Ethernet controller
• Necessity of a top-level design

Design too large for module-based reconfiguration

Resource   | ISE/EDK 7.1: Used | Available | %   | ISE/EDK 9.1: Used | Available | %
Slices     | 4926              | 4928      | 99% | 5318              | 4928      | 107%
Flip-Flops | 5217              | 9856      | 52% | 5724              | 9856      | 58%
4-in LUTs  | 6974              | 9856      | 70% | 6993              | 9856      | 70%
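The percentages in the design summary can be reproduced from the used and available counts. This is a small sketch assuming the percentages are truncated to whole numbers, which is the rounding that matches all six values; the dictionary layout and names are illustrative.

```python
def utilization(used: int, available: int) -> int:
    """Resource utilization as a whole percentage, truncated."""
    return used * 100 // available

# (tool version, resource) -> (used, available), copied from the design summary
summary = {
    ("7.1", "Slices"): (4926, 4928),
    ("7.1", "Flip-Flops"): (5217, 9856),
    ("7.1", "4-in LUTs"): (6974, 9856),
    ("9.1", "Slices"): (5318, 4928),
    ("9.1", "Flip-Flops"): (5724, 9856),
    ("9.1", "4-in LUTs"): (6993, 9856),
}

percent = {key: utilization(u, a) for key, (u, a) in summary.items()}

# Entries that need more of a resource than the device provides.
overfull = [key for key, (u, a) in summary.items() if u > a]
```

Only the slice count of the ISE/EDK 9.1 build exceeds what the device offers, and the 7.1 build leaves just two slices free, which is consistent with the statement that the design is too large for module-based reconfiguration.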

Page 61: HPPS - Final - 06/14/2007

What’s next

Device driver updates to support the newest architectures

Intermediate layer implementation
• Opportunity to add some additional features
  • Reconfiguration scheduler
• Opportunity to define a common device driver interface to simplify the creation of a new driver by the user

Integration of the middleware and the operating system support in a complete design flow

Page 62: HPPS - Final - 06/14/2007

Questions


Page 64: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Design Flow

Antonio Piazzi

Page 65: HPPS - Final - 06/14/2007

Outline

Introduction
• Problem description
• Project Goals
• State of the Art

Project in detail
• Contributions
• Phases
• Results

What’s next

Page 66: HPPS - Final - 06/14/2007

Problem description

• The user has to spread his attention over many problems, some of them related to the implementation of the design.

• Users often do not know anything about reconfigurable architecture generation and they haven’t.

Page 67: HPPS - Final - 06/14/2007

Project Goals

• New design methodology tailored to support partially dynamically reconfigurable architectures

• Definition and implementation of a design framework able to

• Support different design paradigms, i.e. Xilinx Module Based, Xilinx EAPR

• Hide the dirty work (due to the reconfiguration) from the application designer

• Support different architectural solutions, i.e. different communication infrastructures such as IBM CoreConnect or Wishbone

Page 68: HPPS - Final - 06/14/2007

Contributions

• With our framework all users (novice or not) can develop and debug their functionality on a reconfigurable architecture without analyzing all the problems related to that development methodology

Page 69: HPPS - Final - 06/14/2007

Phases

• 1st phase (15 March – 15 April): Budgeting
  • Study of the state of the art

• 2nd phase (15 April – 15 May): Realization phase
  • Construction of the entire framework based on previously separate tools
  • Implementation of an innovative workflow

• 3rd phase (15 May – 15 June): Project validation
  • Definition of a new communication infrastructure and transfer protocol for the reconfigurable part
  • Verification of the integration of the new infrastructure in the project

Page 70: HPPS - Final - 06/14/2007

First Phase

• Study of the state of the art
  • Standard reconfigurable design flow
  • Xilinx Module Based and EAPR
  • Caronte Design Flow
  • EDK-based architecture

Page 71: HPPS - Final - 06/14/2007

Self Reconfigurable Architecture

Page 72: HPPS - Final - 06/14/2007

Second Phase 1/4

Construction of the entire framework based on previously separate tools

The user has to focus his attention only on the development of the IBM CoreConnect architecture and on writing the modules which implement his functionality

SYSTEM.VHD contains all the information about the IBM CoreConnect architecture

Page 73: HPPS - Final - 06/14/2007

Second Phase 2/4

ArchGen takes the system.vhd file, processes the architecture it contains and translates that static architecture into a dynamic one

FIX.VHD contains the instantiations of the processors (one or more) and of all the components present in the IBM CoreConnect architecture

TOP.VHD contains the instantiation of the fix component and the information about the communication infrastructure

Page 74: HPPS - Final - 06/14/2007

Second Phase 3/4

COMiC generates an NCD file which contains the information about the communication infrastructure and an XDL file which contains the same information in text form

Page 75: HPPS - Final - 06/14/2007

Second Phase 4/4

At this point we only have to collect all the information we need and, through a parser, insert it into a new top.vhd, which will be the fix part of our architecture. After that we only have to manage the reconfigurable modules written by the user

Page 76: HPPS - Final - 06/14/2007

Third Phase 1/3

Definition of a new communication infrastructure and transfer protocol for the reconfigurable part

An OPB bus based on 3-state buffers, used to link one or more modules to the fix part (created with ISE)

Page 77: HPPS - Final - 06/14/2007

Third Phase 2/3

Use the ncd2xdl converter to obtain an XDL file which contains all the parameters of our bus

Page 78: HPPS - Final - 06/14/2007

Third Phase 3/3

Verify the integration of the new infrastructure in the project

Perfect integration in our process: we can use any bus type to connect the fix and reconfigurable parts

Page 79: HPPS - Final - 06/14/2007

Results

• The framework answers the novice user's need for automation and, more generally, helps all users who need a short time to market.

Page 80: HPPS - Final - 06/14/2007

What’s next

• Our idea for future work is to schedule one or two working days to fix some bugs present in the project and to adjust the output of COMiC, which has to create an OPB replay bus.

Page 81: HPPS - Final - 06/14/2007

Questions?


Page 83: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Polaris

Page 84: HPPS - Final - 06/14/2007

Polaris

Create an integrated HW/SW system to manage 2D reconfiguration

SW side:
• Maintain information on FPGA status
• Decide how to efficiently allocate tasks

HW side:
• Provide support for effective task allocation
• Perform 2D bitstream relocation

Page 85: HPPS - Final - 06/14/2007

Management of 2D Reconfiguration in a Reconfigurable System

Massimo Morandi

Page 86: HPPS - Final - 06/14/2007

Outline

Introduction
• Problem description
• Project Goals and Contributions

Project in detail
• Phases
• Results

Future Work

Page 87: HPPS - Final - 06/14/2007

Problem Description

New generation of FPGAs
• Virtex-4 and Virtex-5
• Allow bi-dimensional reconfiguration

This makes it possible to:
• Better exploit the reconfigurable area
• Optimize module performance

More complex management:
• Handle one more degree of freedom
• Avoid increased fragmentation
• Make good placement choices to keep the Task Rejection Rate (TRR) low
• Keep acceptable intra-module routing paths

Page 88: HPPS - Final - 06/14/2007

Project Goals and Contributions

Analyze effects of 2D reconfiguration
• New advantages
• New problems

Examine possible solutions to new problems
• Explore literature to find promising ideas
• Evaluate those solutions in various scenarios

Propose a new solution
• Combining ideas from literature with new ones
• Obtaining a good cost-quality tradeoff

Page 89: HPPS - Final - 06/14/2007

Project Phases

First Phase, time window: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals:
  • General analysis of 2D reconfiguration
  • Detailed description of the new problems

Second Phase, time window: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals:
  • Definition of desired features for a solution
  • Analysis and evaluation of existing solutions

Third Phase, time window: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goal: propose a new combined solution to effectively handle the problems of 2D reconfiguration

Page 90: HPPS - Final - 06/14/2007

Setting and Advantages Definition

Definition of the setting:
• 2D self partial dynamic run-time reconfiguration

Analysis of the advantages of 2D reconfiguration
• In area usage and performance

Page 91: HPPS - Final - 06/14/2007

2D Fragmentation Problem

Analysis of the 2D-fragmentation problem
• Area generally more fragmented
• Can nullify the area optimizations obtained

Page 92: HPPS - Final - 06/14/2007

Placement Decisions

Analysis of the effects of 2D placement choices:
• Again, bad choices can lead to performance loss

Page 93: HPPS - Final - 06/14/2007

Allocation manager

Definition of allocation manager desired features:
• Low TRR
• Low management overhead
• High routing efficiency
• Low fragmentation

Definition of allocation manager structure:
• Empty space manager
  • Complete space
  • Heuristic selection
• Fitter
  • General (FF, BL, BF, WF…)
  • Focused (FA, RA…)

Page 94: HPPS - Final - 06/14/2007

Most relevant works

Maintain complete information on empty space:
• KAMER:
  • Keep All Maximally Empty Rectangles
  • Apply a general fitting strategy
• CUR:
  • Maintain the Contour of a Union of Rectangles
  • Apply a focused fitting strategy

Heuristically prune part of the information:
• KNER:
  • Keep Non-overlapping Empty Rectangles
  • Apply a general fitting strategy
• 2D-HASHING:
  • Keep Non-overlapping Empty Rectangles in an optimized data structure
  • Apply (exclusively) a general fitting strategy

Page 95: HPPS - Final - 06/14/2007

Evaluation and Proposed Approach

Evaluation
• High placement quality => high complexity
• Lowest complexity => no focused fitting (bad especially for routing)

Proposed Approach
• Heuristic (KNER-like) empty space manager, to keep complexity low for use in a self-reconfigurable system
• Fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (chosen metric to minimize: Manhattan distance)
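A fitting strategy focused on routing can be sketched as a cost-driven choice among empty rectangles. The slides only name the metric (Manhattan distance), so everything else here is an assumption: the cost is the sum of center-to-center distances to already placed communicating tasks, and a task is placed at the origin corner of the chosen rectangle. Function names and coordinates are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def center(x, y, h, w):
    """Center of an h-by-w block whose origin corner is at (x, y)."""
    return (x + w / 2, y + h / 2)

def best_position(task_h, task_w, free_rects, neighbor_centers):
    """Pick the empty rectangle that fits the task with the lowest routing cost.

    free_rects: list of (x, y, h, w) empty rectangles.
    neighbor_centers: centers of placed tasks that communicate with this one.
    Returns ((x, y), cost), or (None, None) if the task does not fit anywhere.
    """
    best, best_cost = None, None
    for x, y, h, w in free_rects:
        if task_h > h or task_w > w:
            continue                         # too small for this task
        c = center(x, y, task_h, task_w)
        cost = sum(manhattan(c, n) for n in neighbor_centers)
        if best_cost is None or cost < best_cost:
            best, best_cost = (x, y), cost
    return best, best_cost

# Hypothetical device state: three empty rectangles, one communicating neighbor.
free_rects = [(0, 0, 4, 4), (10, 0, 6, 6), (0, 10, 2, 2)]
neighbors = [(12.0, 2.0)]
pos, cost = best_position(3, 3, free_rects, neighbors)
```

A general strategy such as first fit would stop at the first rectangle that is large enough; the focused variant pays for scanning all candidates in exchange for shorter routing paths. A `(None, None)` result corresponds to a rejected task.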

Page 96: HPPS - Final - 06/14/2007

Structure of the allocation manager

Task, defined by:
• Arrival time, ASAP, (ALAP), H, W, Latency, Communicating Tasks
• Hosted in a queue which also adds a pointer to the rectangle where it is placed

Reconfigurable Device, represented as:
• Binary tree structure; each node is a Rectangle, each leaf is an empty Rectangle
• Navigation through pointers to left child, right child and next leaf, plus a function to find the previous leaf (for bookkeeping after a split or merge)

Rectangle, defined by:
• X, Y, H, W
• Initially one: (X,Y)=(0,0), H=FPGA rows, W=FPGA columns
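The device representation can be sketched as a binary tree of rectangles. This is an illustration under stated assumptions, not the Polaris code: a placed task occupies the origin corner of a leaf, the remaining area is split into at most two non-overlapping empty rectangles that become the children, and first fit is used for brevity. The next-leaf and previous-leaf pointers and the merge step mentioned on the slide are omitted.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Rect:
    x: int
    y: int
    h: int
    w: int
    left: Optional["Rect"] = None
    right: Optional["Rect"] = None
    task: Optional[str] = None      # set once a task occupies the origin corner

    def empty_leaves(self) -> Iterator["Rect"]:
        """Yield the empty rectangles, i.e. the nodes that hold no task."""
        if self.task is None:
            yield self
            return
        for child in (self.left, self.right):
            if child is not None:
                yield from child.empty_leaves()

    def place(self, name: str, h: int, w: int) -> bool:
        """First fit: use the first empty leaf large enough, then split it."""
        for leaf in self.empty_leaves():
            if h <= leaf.h and w <= leaf.w:
                leaf.task = name
                if leaf.w > w:      # strip to the right of the task, full height
                    leaf.left = Rect(leaf.x + w, leaf.y, leaf.h, leaf.w - w)
                if leaf.h > h:      # strip below the task, task width only
                    leaf.right = Rect(leaf.x, leaf.y + h, leaf.h - h, w)
                return True
        return False                # no leaf fits: the task is rejected

# Initially one rectangle: (X,Y)=(0,0), H=FPGA rows, W=FPGA columns (8x8 assumed).
device = Rect(0, 0, h=8, w=8)
placed = [device.place("t1", 4, 4), device.place("t2", 8, 4),
          device.place("t3", 5, 5)]
free_area = sum(r.h * r.w for r in device.empty_leaves())
```

The third task is rejected although 16 cells are still free, because the only empty rectangle left is 4x4. This is the kind of rejection that fragmentation causes and that the TRR metric counts.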

Page 97: HPPS - Final - 06/14/2007

The Placement Algorithm

Page 98: HPPS - Final - 06/14/2007

Experimental Results

Benchmark of 100 randomly generated tasks:
• Size from 5% to 25% of the FPGA, randomly interconnected

• Execution time: 3x lower than CUR, close to KNER
• Communication cost: 3x lower than KNER, close to CUR
• Task Rejection Rate: all solutions quite close

Page 99: HPPS - Final - 06/14/2007

Future Work

Apply the proposed solution to self reconfiguration:
• Adapt the algorithm to run on the internal processor
• Create a reconfigurable architecture for validation
• Integrate the architecture with relocation

Tune the algorithm to improve results:
• Experiment with techniques to reduce TRR
• Optimize the code to lower the running time of the algorithm

Page 100: HPPS - Final - 06/14/2007

Questions?


Page 102: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

Relocation for 2D Reconfigurable Systems

Marco Novati

Page 103: HPPS - Final - 06/14/2007

Outline

Introduction
• Problem description
• Project Goals

Project in detail
• Phases
• Results

What’s next

Page 104: HPPS - Final - 06/14/2007

Problem Description

Self dynamic runtime 2D reconfiguration
• Xilinx Virtex-4 and Virtex-5

Relocation, different solutions
• Software (BAnMat, PARBIT)
• Hardware (REPLICA, BiRF)

We chose a hardware solution
• BiRF Square

Page 105: HPPS - Final - 06/14/2007

Project Goals

Study of the new FPGA families
• Examination of Xilinx documentation on V4 and V5

Analysis of the new bitstream structure
• Generation of V4 and V5 bitstreams

Development of the new version of BiRF
• Implementation
• Validation

Page 106: HPPS - Final - 06/14/2007

Phases

First Phase: 15th March – 12th April
• Documentation: project presentation (12/4), project report
• Goals:
  • Xilinx documentation examination
  • V4 & V5 bitstream structure analysis

Second Phase: 13th April – 17th May
• Documentation: project presentation (17/5), project report
• Goals:
  • Implementation of BiRF Square
  • Synthesis

Third Phase: 18th May – 14th June
• Documentation: project presentation (14/6), project report
• Goals:
  • Verification & Validation

Page 107: HPPS - Final - 06/14/2007

Frame Addressing

New frame addressing:
• Possibility of addressing rows and columns

Page 108: HPPS - Final - 06/14/2007

New Parser

Page 109: HPPS - Final - 06/14/2007

CRC Calculation

Particular CRC value, used by Xilinx tools

Two versions of BiRF Square:
• Using the “predefined” value
• With actual CRC calculation

An optimized algorithm has been used

Page 110: HPPS - Final - 06/14/2007

Synthesis results

On a Virtex-4 with speed grade -12
  General purpose version: max frequency of 160 MHz
  Specific version: max frequency of 290 MHz

Page 111: HPPS - Final - 06/14/2007

Target Device

Page 112: HPPS - Final - 06/14/2007

Validation Architecture

Page 113: HPPS - Final - 06/14/2007

Results 1/2

BiRF Square
  Allows relocation in a self, partially and dynamically 2D-reconfigurable system
  The occupation ratio is relatively small
  Frequency more than acceptable
  Reduction of internal memory requirements

Page 114: HPPS - Final - 06/14/2007

Results 2/2

Throughput of 7.3 MB/s

A total configuration file is about 1 MB in size
Considering an architecture with:
  1/3 of the area as fixed part
  2/3 as reconfigurable part with 6 slots

Under these hypotheses
  Size of a partial bitstream will be about 110 KB
  Relocation time of about 15 ms
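The figures on this slide can be reproduced with a short calculation. The sketch assumes decimal units (1 MB = 10^6 bytes) and that bitstream size scales linearly with the area it configures; both are assumptions, not statements from the slides, and the function names are illustrative.

```python
THROUGHPUT_BYTES_PER_S = 7.3e6   # 7.3 MB/s, from the slide
FULL_BITSTREAM_BYTES = 1.0e6     # about 1 MB, from the slide
RECONFIGURABLE_FRACTION = 2 / 3  # 2/3 of the area is reconfigurable
SLOTS = 6                        # reconfigurable part split into 6 slots


def partial_bitstream_bytes():
    """Size of one slot's bitstream, assuming size is proportional to area."""
    return FULL_BITSTREAM_BYTES * RECONFIGURABLE_FRACTION / SLOTS


def relocation_time_ms(size_bytes):
    """Time to stream a bitstream of the given size through the relocator."""
    return size_bytes / THROUGHPUT_BYTES_PER_S * 1e3


if __name__ == "__main__":
    size = partial_bitstream_bytes()
    print(f"partial bitstream: {size / 1e3:.0f} KB")
    print(f"relocation time:   {relocation_time_ms(110e3):.1f} ms")
```

One slot comes out at roughly 111 KB, which the slide rounds to 110 KB, and 110 KB at 7.3 MB/s takes just over 15 ms.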

Page 115: HPPS - Final - 06/14/2007

What’s Next

Future improvements:
  Direct access to the memory (DMA)
    Direct manipulation of the bitstream
    Portability
  Integration with ICAP
    Elimination of the relocation overhead
    Relocation time << reconfiguration time

The final goal:
  Creation of a real architecture that exploits self partial and dynamical 2D-reconfiguration, with relocation

Page 116: HPPS - Final - 06/14/2007

Questions

Page 117: HPPS - Final - 06/14/2007

What’s next

DRESD
DReAMS

Alessandro Panella, Matteo Murgida

Operating System: Ivan Beretta

Design Flow: Antonio Piazzi

Polaris: Massimo Morandi, Marco Novati

HLR: Marco Maggioni

Page 118: HPPS - Final - 06/14/2007

POLITECNICO DI MILANO

High Level Reconfiguration

Marco [email protected]

Page 119: HPPS - Final - 06/14/2007

Outline

Introduction
  Problem description
  Project Goals
  State of the Art

Project in detail
  Contributions
  HLR workflow
    GraphGen
    IsomorphClustering
    SimpleLatency
    Salomone
  Results

What’s next

Page 120: HPPS - Final - 06/14/2007

Problem Description

What is High Level Reconfiguration...?
  Theoretical approach to dynamic reconfiguration...

Vision...
  Reconfigurability has many advantages...

Mission...
  Exploit these advantages to obtain the best performance...

How...?
  Adapting a system to this execution model while managing complexity and drawbacks...

Page 121: HPPS - Final - 06/14/2007

Project Goal

Create a complete HLR workflow...
  From a real system specification to its reconfigurable execution model...

Define precise interfaces for each phase...
  To promote flexibility and future HLR research...
  To develop a complete toolchain...

Apply some algorithms regarding reconfigurability...
  To reuse past work...

Page 122: HPPS - Final - 06/14/2007

State of the Art

Present of HLR...
  Some ideas/concepts regarding clustering and scheduling...
  ... but no complete and well-defined workflow...
  ... and a lot of work still to do.

System specifications analysis...
  PandA HW/SW framework to promote new ideas...
  Dynamic reconfigurability can be considered a branch of this research...

Page 123: HPPS - Final - 06/14/2007

Contribution

Dynamic library loading system...
  Embedded into the GNU compilation toolchain

Porting of PandA libraries into Earendil...
  Suitable for future analysis...

HLR tools deployed onto Earendil...
  Cover each step of the workflow...

Page 124: HPPS - Final - 06/14/2007

HLR workflow

Clustering (with Analysis)... 1st month
Coloring... 2nd month
Scheduling... 3rd month

[Figure: HLR workflow diagram. Labels: GCC Frontend, PandA, Partitioning Algorithm, Clustered Graph, Metric Evaluation, Reconfigurable Clustered Graph, Scheduling Algorithm, Area, Latency, Rec. Time, Power, Target Architecture, Database]

Page 125: HPPS - Final - 06/14/2007

GraphGen

GraphGen is the first step of the HLR toolchain...
  Takes as input a system specification or an algorithm...
  Produces a graph (CFG/BB/DFG/SDG)

Performs the high-level analysis step...
  Transforms the system description (C/C++/SystemC) into a representation suitable for further elaboration...
  Based on GCC and compiler theory...
  Uses PandA 0.4 functionalities to produce a statement-level graph...

Page 126: HPPS - Final - 06/14/2007

IsomorphClustering

IsomorphClustering follows GraphGen in the HLR toolchain...
  Takes as input a statement-level graph...
  Produces a clustered graph...

Clustering phase...
  Aggregates nodes into configurations (the basic unit of reconfigurable execution)...
  Based on isomorphism: tries to find different instances of isomorphic templates...
  Different algorithms can also be applied...
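The slides do not describe how IsomorphClustering detects isomorphic templates. As a rough illustration of the idea only, the sketch below groups graph nodes whose depth-limited input cones have the same canonical label, a much simpler stand-in for general subgraph isomorphism. The names `signature` and `find_templates`, the `ops`/`preds` graph encoding, and the treatment of all operands as commutative are assumptions made for this example.

```python
from collections import defaultdict


def signature(node, ops, preds, depth):
    """Canonical label of the depth-limited input cone rooted at `node`.

    Child labels are sorted, so operand order is ignored (an assumption
    that only holds for commutative operations).
    """
    if depth == 0 or not preds.get(node):
        return ops[node]
    children = sorted(signature(p, ops, preds, depth - 1)
                      for p in preds[node])
    return f"{ops[node]}({','.join(children)})"


def find_templates(ops, preds, depth=1):
    """Group nodes by cone signature; keep signatures seen more than once."""
    groups = defaultdict(list)
    for node in ops:
        if preds.get(node):  # only nodes with inputs can root a template
            groups[signature(node, ops, preds, depth)].append(node)
    return {sig: sorted(nodes)
            for sig, nodes in groups.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical statement-level graph: node -> operation, node -> inputs.
    ops = {"a": "ld", "b": "ld", "c": "xor",
           "d": "ld", "e": "ld", "f": "xor", "g": "add"}
    preds = {"c": ["a", "b"], "f": ["d", "e"], "g": ["c", "f"]}
    print(find_templates(ops, preds))
```

In the example graph the two `xor` nodes share the signature `xor(ld,ld)`, so they are reported as two instances of one template and could be served by a single configuration.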

Page 127: HPPS - Final - 06/14/2007

SimpleLatency

SimpleLatency follows IsomorphClustering in the HLR toolchain...
  Takes as input a clustered graph...
  Adds latency information to each configuration...
  Produces a reconfigurable clustered graph with latency evaluations...

Coloring...
  “Colors” each cluster with evaluations useful for reconfigurability...
  Based on each cluster's internal critical path...
  Different metrics for different architectures...
  Connects HLR with real architectural parameters...
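A latency estimate based on a cluster's internal critical path can be sketched as a longest-path computation over the cluster's acyclic graph. This is an illustration under assumptions, not the SimpleLatency code: the `cluster_latency` name, the per-node latency table and the successor-map encoding are hypothetical.

```python
from functools import lru_cache


def cluster_latency(latency, succs):
    """Longest latency-weighted path through a cluster's internal DAG.

    `latency` maps each node to its own delay; `succs` maps a node to the
    nodes that consume its result. The graph must be acyclic.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(s) for s in succs.get(node, ())), default=0)
        return latency[node] + tail

    return max(longest_from(n) for n in latency)


if __name__ == "__main__":
    # Hypothetical per-operation latencies for one target architecture.
    latency = {"ld": 2, "xor": 1, "mul": 3, "st": 2}
    succs = {"ld": ["xor", "mul"], "xor": ["st"], "mul": ["st"]}
    print(cluster_latency(latency, succs))
```

Swapping in a different `latency` table is one way to model the slide's point that different architectures call for different metrics.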

Page 128: HPPS - Final - 06/14/2007

Salomone

Salomone is the last step in the HLR toolchain...
  Takes as input a reconfigurable clustered graph...
  Produces a schedule on an abstract reconfigurable architecture...

Scheduling...
  Considered the core task of HLR...
  Maps each configuration onto an area portion...
  Adapts the system execution to the reconfigurable model...
  Based on a graph coloring algorithm...
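The slides say only that Salomone maps configurations onto area portions using a graph coloring algorithm. A minimal sketch of that general idea, assuming each configuration has a known execution interval and that two configurations conflict when their intervals overlap; the greedy strategy, the `assign_area_portions` name and the interval model are assumptions, and reconfiguration time is ignored.

```python
def assign_area_portions(intervals):
    """Greedy coloring of the overlap graph of configuration lifetimes.

    `intervals` maps a configuration name to a half-open (start, end)
    interval. Configurations that overlap in time receive different
    area portions; the portion index plays the role of the color.
    """
    portion = {}
    for name in sorted(intervals, key=lambda n: intervals[n]):
        start, end = intervals[name]
        used = {portion[other] for other in portion
                if intervals[other][0] < end and start < intervals[other][1]}
        color = 0
        while color in used:
            color += 1
        portion[name] = color
    return portion


if __name__ == "__main__":
    # Hypothetical execution windows for four configurations.
    schedule = assign_area_portions(
        {"C1": (0, 4), "C2": (2, 6), "C3": (4, 8), "C4": (6, 9)})
    print(schedule)
```

In the example, `C3` reuses the portion freed by `C1` and `C4` reuses the one freed by `C2`, so two area portions are enough for four configurations.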

Page 129: HPPS - Final - 06/14/2007

Results 1/3

Based on AES encryption...

Templates found with IsomorphClustering...
  Execution time... 123.94 s

Page 130: HPPS - Final - 06/14/2007

Results 2/3

Salomone adapting and coloring...
  Execution time... 113.55 s

Page 131: HPPS - Final - 06/14/2007

Results 3/3

Final scheduling...

Page 132: HPPS - Final - 06/14/2007

What's next

Heuristic implementation for Salomone...
  To improve result quality in terms of number of area portions...

A new metric for area/latency...
  Based on RTL logic synthesis evaluations...

Introduce feedback into the HLR workflow...
  Based on schedule evaluation...

New clustering and scheduling algorithms...
  Such as Napoleon...

Page 133: HPPS - Final - 06/14/2007

Questions