November 3, 2016 L.26.16 @ Politecnico di Milano...
Marco D. Santambrogio – [email protected]
Politecnico di Milano
Xilinx PYNQ Hackathon
November 3, 2016, L.26.16 @ Politecnico di Milano
[Figure: NECST Lab overview. Surviving labels: ORCA, Research, Affiliate program, Intern/Visiting, NGC, Alumni, Facilities, N2020, People, Environment, DReAMS, RIBS, NSW, NECST Courses, NECST RL Fair Event, NECSTmas, Green NECST, History, Banner]
Reconfigurable Computing
DReAMS
• FPGA based systems
• Exascale computing infrastructure
• CAD tools
• Physical design
• High-level analysis and PLs
To make great dreams come true we need: clear objectives, common goals, and talented, passionate and trustful people
That’s why we are here today
http://how-i-met-your-mother.wikia.com/wiki/Barney's_Blog
XPH on the Web
We really do want to build the XPH community!
Official website: http://xph.necst.it/
On Facebook: https://www.facebook.com/PynqHackathon
Xilinx Hackathon
• Hackathon
  – An event for digging into, and competing on, specific technologies through
  – several days (a marathon) of coding on interesting ideas and problems
L26.16 @ 5.15pm – Nov 16
L26.15 @ 5.15pm
Nov.: 16, 18, 29
Dec.: 2, 16, 20
Platinum
Gold
https://www.facebook.com/nerfitalia/
Silver
Via Grossich, 17
Reconfiguration
The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation.
Gerald Estrin, 1960
Reconfigurable Computing
Reconfigurable computing is defined as the study of computation using reconfigurable devices
Christophe Bobda, 2007
[Figure: Processor, Reconfigurable Computing and Full Custom compared on two axes, compilation time and performance, each ranging from low to high]
Reconfigurable Hardware
"Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware"
(K. Compton and S. Hauck, Reconfigurable Computing: a Survey of Systems and Software, 2002)
Evolution of implementation technologies
• Logic gates (1950s-60s)
• Regular structures for two-level logic (1960s-70s)
  – muxes and decoders, PLAs
• Programmable sum-of-products arrays (1970s-80s)
  – PLDs, complex PLDs
• Programmable gate arrays (1980s-90s)
  – densities high enough to permit an entirely new class of applications, e.g., prototyping, emulation, acceleration
Trend toward higher levels of integration
Gate Array Technology (IBM - 1970s)
• Simple logic gates
  – combine transistors to implement combinational and sequential logic
• Interconnect
  – wires to connect inputs and outputs to logic blocks
• I/O blocks
  – special blocks at periphery for external connections
• Add wires to make connections
  – done when chip is fabbed
• "mask-programmable"
  – construct any circuit
Field-Programmable Gate Arrays
• Logic blocks
  – to implement combinational and sequential logic
• Interconnect
  – wires to connect inputs and outputs to logic blocks
• I/O blocks
  – special logic blocks at periphery of device for external connections
• Key questions:
  – how to make logic blocks programmable?
  – how to connect the wires?
  – after the chip has been fabbed
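One common answer to the first key question is the look-up table (LUT): a small memory whose contents define the logic function, so that changing the function means rewriting the memory. The sketch below only illustrates that idea. The class name `Lut` and its methods are hypothetical and do not come from any vendor tool.

```python
class Lut:
    """A k-input look-up table; its truth table acts as the configuration."""

    def __init__(self, k: int, truth_table: int = 0):
        self.k = k
        self.truth_table = 0
        self.configure(truth_table)

    def configure(self, truth_table: int) -> None:
        # Reprogramming the block means rewriting its 2**k truth-table bits.
        self.truth_table = truth_table & ((1 << (1 << self.k)) - 1)

    def evaluate(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # The inputs form an address; input 0 is the least significant bit.
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return (self.truth_table >> index) & 1


lut = Lut(k=2)

lut.configure(0b1000)  # only address 3 (both inputs high) returns 1: AND
and_outputs = [lut.evaluate(a, b) for b in (0, 1) for a in (0, 1)]

lut.configure(0b0110)  # addresses 1 and 2 return 1: XOR
xor_outputs = [lut.evaluate(a, b) for b in (0, 1) for a in (0, 1)]

print(and_outputs, xor_outputs)
```

The same object implements AND and then XOR without any structural change, which is the property that lets a logic block be programmed after fabrication.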
Reconfigurable Architectures Characterization
• SoC (System on Chip)
  – Embedded vs External
  – Complete vs Partial
  – Dynamic vs Static
• SoMC (System on Multiple-Chip)
  – Embedded vs External
  – Complete vs Partial
  – Dynamic vs Static
[Figure: four panels, (a) to (d). Surviving labels: static, Partial, Complete, Embedded, Complete/Partial, Who]
The configuration bitstream
• Occupation must be determined only on the basis of
  – Number of configuration words
  – Initial Frame Address Register (FAR) value
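As a rough illustration of how those two quantities could be turned into a starting position and a size, the sketch below decodes a FAR value and counts the frames covered by a number of configuration words. The field layout and the 101-word frame length are assumptions based on the Xilinx 7-series family and must be checked against the configuration guide of the actual device. The names `decode_far` and `frames_spanned` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: 7-series style frames of 101 32-bit words.
WORDS_PER_FRAME = 101


@dataclass(frozen=True)
class FrameAddress:
    block_type: int
    top_bottom: int
    row: int
    column: int
    minor: int


def decode_far(far: int) -> FrameAddress:
    """Split a FAR value into fields (assumed 7-series bit layout)."""
    return FrameAddress(
        block_type=(far >> 23) & 0x7,   # bits 25:23
        top_bottom=(far >> 22) & 0x1,   # bit 22
        row=(far >> 17) & 0x1F,         # bits 21:17
        column=(far >> 7) & 0x3FF,      # bits 16:7
        minor=far & 0x7F,               # bits 6:0
    )


def frames_spanned(num_config_words: int,
                   words_per_frame: int = WORDS_PER_FRAME) -> int:
    """Number of frames touched, rounding up; padding frames are ignored."""
    return -(-num_config_words // words_per_frame)


start = decode_far(0x00400080)
frames = frames_spanned(1010)
print(start, frames)
```

The starting address says where the occupied area begins, and the frame count says how far it extends from there. Mapping that count back to columns depends on the device, which this sketch does not model.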
Xilinx FPGA and Configuration Memory
Programmable System on a Chip
• No longer just a bunch of reconfigurable elements
• DSPs, GPP, reconfigurable elements, etc.
Intel Stellarton
• Heterogeneous Multicore
  – An Intel Atom E6XX processor
    • # Cores: 1
    • # Threads: 2
    • L2 Cache: 512 KB
  – An Altera Field Programmable Gate Array
Complex Heterogeneous Systems
• Because of the complexity of the demand, the system has to be heterogeneous and able to autonomously adapt and evolve
  – FPGAs
  – DSPs
  – GPP (Multi-cores)
• Adaptive systems learn how they can be used to address a particular problem
  – Respond to user goals
  – Build self-performance models
  – Identify what they need to learn
  – Adapt to changing goals, resources, models, operating conditions
  – Gracefully adapt to failures
  – Optimize their own behavior
Heterogeneous Complex Systems
• Ryft ONE
  – Big Data infrastructure built on an FPGA-accelerated architecture
  – http://www.ryft.com/
• IBM Power8
  – Introducing the Coherent Accelerator Processor Interface (CAPI) port that is layered on top of PCI Express 3.0
  – http://www-304.ibm.com/webapp/set2/sas/f/capi/home.html
• Microsoft Catapult
  – Stratix V (Arria 10 FPGA)
  – http://research.microsoft.com/en-us/projects/catapult/
• OpenPower Foundation
  – http://openpowerfoundation.org/
Limits and Drawbacks
• Design flow: the need for a comprehensive framework that can guide designers through the whole implementation process is becoming stronger
• Reconfiguration times heavily impact the final solution's latency
Design flow
• Dynamically reconfigurable embedded systems are attracting increasing interest from both the scientific and the industrial world
  – The need for a comprehensive framework that can guide designers through the whole implementation process is becoming stronger
• There are several techniques to exploit partial reconfiguration, but...
  – Few approaches exist for frameworks and tools to design dynamically reconfigurable systems
    • They don’t take into consideration both the HW and the SW side of the final architecture
    • They are not able to support different devices
    • They cannot be used to design systems for different architectural solutions
SDx - Origin: Productivity Gap
Normal mortals cannot easily program massively parallel systems
SDx - Origin: Productivity gap from another angle
(David Thomas, Imperial College, UK)
[Figure: relative performance (0.25 to 256) versus design time (hour, day, week, month, year) for CPU, GPU and FPGA, with phases labelled Initial Design, Parallelisation and Clock Rate]
• FPGAs provide large speed-up and power savings – at a price!
  – Days or weeks to get an initial version working
  – Multiple optimisation and verification cycles to get high performance
Innovation: Evolution of Design Environments
[Figure: design environments arranged by abstraction (vertical axis) over time (horizontal axis)]
Legacy
• ISE, RTL-based design entry with IP library
Embedded CPU integration
• Microblaze, SDK, EDK
Raised abstraction for accelerators
• Vivado HLS
• SDNet (DSL PX)
• Block stitching and manual integration in plab or m in RTL
Simplified host integration & automated infrastructure creation
• SDSoC, SDNet, SDAccel
• Predefined methods for data transfer & automated implementation
Platform creation, monitoring & profiling, runtime OS, static and dynamic workload partitioning, cloud integration
Reconfiguration challenges
• Reconfiguration times heavily impact the final solution's latency
  – Hiding reconfiguration time is not sufficient!
• Possible solutions:
  – Trivial
    • Bitstream dimension reduction
  – Complex
    • Maximize the reuse of configured modules
    • Reconfiguration hiding
    • Alternative implementation (SW execution)
    • Relocation
Tasks reuse
• Reconfiguration times heavily impact the final solution's latency, therefore:
  – Do not only try to hide the reconfigurations
  – But also try to maximize the reuse of reconfigurable modules
Schedule length is on average at least 18.6% better than the shortest one and 19.7% better than the average.
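To make the reuse idea concrete, here is a minimal sketch that counts how many reconfigurations a task sequence needs when a module already loaded in a region is reused instead of being reloaded. It is not the scheduler behind the figures above. The least-recently-used eviction policy and the name `count_reconfigurations` are assumptions chosen only for illustration.

```python
def count_reconfigurations(task_sequence, num_regions):
    """Count reconfigurations when loaded modules are reused.

    Eviction policy (an assumption): replace the least recently used module.
    """
    loaded = []  # modules currently configured, least recently used first
    reconfigurations = 0
    for module in task_sequence:
        if module in loaded:
            # Reuse: the module is already on the device, no reconfiguration.
            loaded.remove(module)
        else:
            reconfigurations += 1
            if len(loaded) == num_regions:
                loaded.pop(0)  # evict the least recently used module
        loaded.append(module)
    return reconfigurations


sequence = ["A", "B", "A", "C", "A", "B"]
without_reuse = len(sequence)  # every task pays for a reconfiguration
with_reuse = count_reconfigurations(sequence, num_regions=2)
print(without_reuse, with_reuse)
```

With two regions the six tasks need four reconfigurations instead of six, because both later uses of module A find it already configured.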
Reconfiguration hiding
[Figure: tasks A to F, each labelled with its area/time (A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2), placed on area versus time schedules together with their reconfiguration ("Reconf") slots]
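The effect of hiding can be estimated with a small timing model. The sketch below assumes a chain of tasks that run one after another, a single reconfiguration port, and two regions used alternately, so that the next task can be configured while the current one runs. These assumptions and the function names are illustrative and are not taken from the slides.

```python
def makespan_serial(tasks):
    """Each task is configured and then executed, with no overlap."""
    return sum(reconf + run for reconf, run in tasks)


def makespan_hidden(tasks):
    """Overlap the next reconfiguration with the current execution.

    Assumptions: one reconfiguration port, two regions used alternately,
    tasks executed strictly in the given order.
    """
    reconf_end = 0
    exec_ends = []
    for index, (reconf, run) in enumerate(tasks):
        # The target region is free once the task two steps back has finished.
        region_free = exec_ends[index - 2] if index >= 2 else 0
        reconf_end = max(reconf_end, region_free) + reconf
        previous_exec_end = exec_ends[-1] if exec_ends else 0
        exec_ends.append(max(reconf_end, previous_exec_end) + run)
    return exec_ends[-1] if exec_ends else 0


# (reconfiguration time, execution time) per task, in arbitrary units
long_running = [(1, 3), (1, 3), (1, 3)]
reconf_bound = [(4, 1), (4, 1)]

print(makespan_serial(long_running), makespan_hidden(long_running))
print(makespan_serial(reconf_bound), makespan_hidden(reconf_bound))
```

When execution dominates, only the first reconfiguration remains visible (10 units instead of 12). When reconfiguration dominates, hiding recovers very little (9 units instead of 10), which is why hiding alone is not sufficient.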
Alternative implementation (SW execution)
• Object code implemented as hardware components does not always guarantee the best performance…
• Cryptography architecture
  – 1 GPP running Linux
  – 2 reconfigurable regions
  – 2 cryptography services (AES and DES)
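A possible way to decide between the two implementations is to compare the software time with the hardware time plus any reconfiguration that is still needed. The sketch below shows that comparison for the AES and DES services. All timing values are invented for illustration, and `choose_implementation` is a hypothetical helper, not part of the described architecture.

```python
def choose_implementation(service, configured, hw_time, sw_time, reconf_time):
    """Return "HW" or "SW" for one request of the given service.

    The hardware cost includes the reconfiguration time unless the service
    is already configured in one of the reconfigurable regions.
    """
    pending_reconf = 0 if service in configured else reconf_time[service]
    hw_cost = hw_time[service] + pending_reconf
    return "HW" if hw_cost < sw_time[service] else "SW"


# Illustrative numbers only (arbitrary time units).
hw_time = {"AES": 2, "DES": 2}
sw_time = {"AES": 10, "DES": 5}
reconf_time = {"AES": 6, "DES": 6}

configured = {"AES"}  # AES is already loaded, DES is not
aes_choice = choose_implementation("AES", configured, hw_time, sw_time, reconf_time)
des_choice = choose_implementation("DES", configured, hw_time, sw_time, reconf_time)
print(aes_choice, des_choice)
```

With these numbers AES runs in hardware because it is already configured, while a single DES request is cheaper in software than reconfiguring a region first.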
Relocation: The Problem
[Figure: people demanding functionalities; a set of available functionalities Fi, each labelled area/time (A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2); an FPGA with three reconfigurable regions RR1, RR2 and RR3; and the RFU implementations of A to F placed on those regions]
Relocation: Scenario
[Figure: a possible scenario. Functionalities A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2 (area/time) arrive over time; the area versus time schedule contains A, B and the pairs Rec. F/F, Rec. E/E, Rec. C/C, Rec. D/D; the RFU implementations of A to F are shown on RR1, RR2 and RR3]
Relocation: Motivation
[Figure: the same scenario revisited. RFU implementations are shown both with a single placement per module and with several placements of the same module across RR1, RR2 and RR3; the area versus time schedule contains A, B and the pairs Rec. C/C, Rec. F/F, Rec. E/E, Rec. D/D, with some reconfigurations marked R2]
Relocation: Rationale
• Bitstream relocation technique to:
  – speed up the overall system execution
  – reduce the amount of memory used to store partial bitstreams
  – achieve preemptive execution of cores
  – assign bitstream placement at runtime
Relocation: Virtual homogeneity
Slots  Modules  Bitstreams  Bitstreams with reloc.  % Memory saving
2      5        12          6                       50.0%
3      8        27          9                       75.0%
5      10       55          11                      80.0%
8      16       136         17                      87.5%
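The bitstream counts in the table are consistent with storing one bitstream per module per slot without relocation, and one per module with relocation, plus one extra bitstream per slot in both cases. That reading, including the guess that the extra one is a blank bitstream, is an assumption and is not stated on the slide. The sketch below recomputes the counts under it.

```python
def bitstream_counts(slots, modules):
    """Return (without relocation, with relocation, saving fraction).

    Assumption: each slot needs one bitstream per module plus one extra
    (for example a blank bitstream); with relocation one copy serves all slots.
    """
    per_slot = modules + 1
    without_relocation = slots * per_slot
    with_relocation = per_slot
    saving = 1 - with_relocation / without_relocation
    return without_relocation, with_relocation, saving


rows = [(2, 5), (3, 8), (5, 10), (8, 16)]
results = [bitstream_counts(slots, modules) for slots, modules in rows]
for (slots, modules), (before, after, saving) in zip(rows, results):
    print(slots, modules, before, after, f"{saving * 100:.1f}%")
```

Counting bitstreams this way gives a saving of 1 - 1/slots, which matches the 50.0%, 80.0% and 87.5% rows. For the 3-slot row it gives 66.7%, whereas the table lists 75.0%, so that entry may be measured differently or may be a typo.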
BiRF - Relocation management
• Create an integrated HW/SW system to manage relocation (1D and 2D) in reconfigurable architectures
  – Maintain information on FPGA status
  – Decide how to efficiently allocate tasks
  – Provide support for effective task allocation
  – Perform bitstream relocation
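The first two responsibilities can be sketched for the 1D case as a small slot manager that tracks which slots are occupied and picks a position for each new task. This is not BiRF's actual design. The class `SlotManager`, its first-fit policy and its method names are hypothetical, and the bitstream rewriting itself is not modelled: the returned slot index is only the target a relocation step would use.

```python
class SlotManager:
    """Track the occupation of a 1D array of reconfigurable slots."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # FPGA status: task name or None

    def find_free_run(self, width):
        """First-fit search for `width` adjacent free slots."""
        run_start, run_length = 0, 0
        for index, occupant in enumerate(self.slots):
            if occupant is None:
                if run_length == 0:
                    run_start = index
                run_length += 1
                if run_length == width:
                    return run_start
            else:
                run_length = 0
        return None

    def allocate(self, task, width):
        """Place a task and return its first slot, or None if it does not fit."""
        start = self.find_free_run(width)
        if start is None:
            return None
        for index in range(start, start + width):
            self.slots[index] = task
        return start  # target position for relocating the task's bitstream

    def release(self, task):
        self.slots = [None if occupant == task else occupant
                      for occupant in self.slots]


manager = SlotManager(4)
slot_a = manager.allocate("A", 2)
slot_b = manager.allocate("B", 1)
manager.release("A")
slot_c = manager.allocate("C", 2)
slot_d = manager.allocate("D", 2)
print(slot_a, slot_b, slot_c, slot_d, manager.slots)
```

Task C reuses the two slots freed by A, while D is rejected because only one free slot is left. A 2D version would track a grid instead of a list.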
Xilinx PYNQ Hackathon
Politecnico di Milano, DEIB
January 14-15, 2016
Marco D. Santambrogio <[email protected]>
Politecnico di Milano