The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José...
Transcript of The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José...
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671668
exploring Manycore Architectures for Next-GeneratiOn HPC systems
The MANGO Process for Designing andProgramming Multi-Accelerator Multi-
FPGA SystemsH2RC’18
Sunday, November 11th, 2018Dallas, Texas, USA
Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti
Email: [email protected]
Nov 11th, 2018
MANGO Contexto MANGO FETHPC-2014 project:
– is about manycore architecture exploration in HPC
o HPC quest for performance/power improvement– Trend in using heterogeneous components
• GPUs, manycores, and even FPGAs• Goal is to get closer to the Intrinsic Computational Efficiency (ICE)
– MANGO focuses on heterogeneity• How we combine heterogeneous components for the best achievement of computational efficiency• How to program/manage them for the best achievement of computational efficiency
o Emerging requirements on HPC systems:
– Predictability (QoS; time sensitivity)• Due to the merging of HPC with Big Data
– Capacity computing• Run as many application instances as possible
– MANGO addresses predictability and capacity computing• 3P model (Performance/Power/Predictability)
Future and Emerging Technologies
H2RC’18 2
Nov 11th, 2018
MANGO Contexto MANGO builds a prototyping system for 3P space explorationo Goals:
– Hardware• Develop a flexible prototype for rapid exploration of architectures• Explore new deeply heterogeneous manycore architectures• Real-time support exploring the PPP design space• Provide a unified and simple (homogeneous) access to the system via a smart
interconnect– Software
• Adapt programming models and compiler support to the new architectures• Develop the right resource manager to deal with the system
– Infrastructure• Provide new monitoring tools to the system• Provide new cooling techniques to the system
– Applications• Analyze impact of on a set of real applications• Support of video transcoding, medical imaging, security and surveillance applications
H2RC’18 3
Nov 11th, 2018
MANGO Prototype
o General-purpose nodes (Xeon+GPGPU) coupled with Heterogeneous nodes, HNs:– A large-scale cluster of high-capacity FPGAs– A robust, scalable interconnect for a multi-FPGA manycore system– Will enable FPGA acceleration at scale:
a key ingredient for the EsD roadmap– A continuum from FPGA emulation to the final physical platform (might be an ASIC manycore, FPGA,
mixed…) under a stable software environment
– Native isolation and partitioning mechanisms for QoS-aware capacity computing HPC applicationso Two-phase passive energy-efficient coolingo Demonstrated applications with stringent high-performance and QoS requirements
H2RC’18 4
Nov 11th, 2018
Consortium
H2RC’18 5
Nov 11th, 2018
Agendao MANGO Architecture
– HN Hardware and assembly– Heterogeneity– Network– Accelerator Interface– MANGO Design Flow
o FPGA resource utilizationo Conclusions
H2RC’18 6
Nov 11th, 2018
MANGO Architecture
H2RC’18 7
HN cluster
HN cl
uste
r(m
ulti-
FPGA
s+
mem
ory
+ IO
)
…
HN cluster
HN cluster
HN cluster
HN cluster
HN cluster
MANGO prototype
ACC1ACC1ACC1ACC1ACC1ACC1ACC1ACC2
ACC1ACC1ACC1ACC3
Pool of accelerators
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
FPGA0 FPGA1
FPGA2 FPGA3 FPGA4
MEM MEM
MEM
PCIe
FPGA0 FPGA1 FPGA2 FPGA3 FPGA4
DDR3 DDR4 DDR4 PCIe
Pool
of h
ardw
are
com
pone
nts
forH
N c
lust
er
UNIT(ACCELER
ATOR)
U2M
M2U
TABLETLBTILEREG
NI
ROUTERITEM
ROUTER
ITEM NI
MC
tile concept
Nov 11th, 2018
HN Hardware and Assembly
H2RC’18 8
Lego-like excercise
Nov 11th, 2018
Heterogeneity: Pool of Accelerators
H2RC’18 9
C
L1D
L2D
NI
TR
R
C
L1D
L2D
NI
TR
R
C
L1D
L2D
NI
TR
R
C
L1D
L2D
NI
TR
R . . .. . .
. . .
. . .
manycore: MIPS, RISCV
General purpose Custom made
GPU-like manycore DCT cores sub-block 3
DCT4
DCT32DCT16
DCT32
DCT8
DCT32DCT16
DCT32
DCT cores sub-block 1
DCT cores sub-block 2
DCT cores sub-block 0
X0 – X3 X4 – X7 X8 – X11 X12 – X15 X16 – X19 X20 – X23 X24 – X27 X28 – X31
512-bit input
Output multiplexer
Adder units
DCT4 DCT4 DCT4 DCT4 DCT4 DCT4 DCT4
DCT16
DCT8 DCT8 DCT8
DCT16
8 X DCT4
4 X DCT8
2 X DCT16
1 X DCT32
“00”
“01”
“10”
“11”
512-bit output
mode
HeterogeneityTile concept Element connection
H2RC’18 10
Accelerator(UNIT)
U2M
M2U
TABLETLBTILEREG
NI
DATAROUTER
ITEM ROUTER
ITEM NI
Data pathFlow controlControl path
MCI/O Cont
UNIT
U2M
TABLETLB
TILEREG
NI
M2UDATAROUTER
ITEM ROUTER
ITEM NI
MC
I/O Cont
Nov 11th, 2018
Heterogeneity, but regular layout
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
FPGA0 FPGA1
FPGA2 FPGA3 FPGA4
MEM MEM
MEM
PCIe
H2RC’18 11
Nov 11th, 2018
Data Networko DATA Routers connected among them in a 2D mesh
layout. o NI decouples network from tile components.o Support for different Virtual networks (VN) with
different number of Virtual channels (VC).o Support for dynamic assignment of bandwidth per VN.o Support for capacity computing
12
UNIT
U2M
M2U
TABLETLBTILEREG
NI
DATAROUTER
ITEM ROUTER
ITEM NI
app1 app1 app1 app3
app2 app2 app3 app3U2M
TABLETLBNI
M2UDATAROUTER
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
MEM MEM
MEM
PCIe
H2RC’18
Control Network• Used for configuration and
monitoring– MANGO Infrastructure– Accelerators or units
• Flexible and generic to let the accelerators be configured based on their complexities– Ad-hoc protocols
Nov 11th, 2018 H2RC’18 13
UNIT
U2M
M2U
TABLETLBTILEREG
NI
DATAROUTER
ITEM ROUTER
ITEM NI
UNITTLB
TILEREG
ITEM ROUTER
ITEM NI
I/O Cont
Nov 11th, 2018
Accelerator interfaceo Decouple the UNIT from the rest
of the MANGO platformo Allow the implementation of an
unique interface for every UNITo Unify memory access
– Byte, Half (16 bits), word (32 bits) & block (512 bits) memory access types
o Allow to map synchronization registers in the virtual memory address space
14H2RC’18
UNIT
U2M
M2U
TABLETLBTILEREG
NI
DATAROUTER
ITEM ROUTER
ITEM NI
UNIT
U2M
TABLETLBNI
M2U
Nov 11th, 2018
MANGO Design Flow
Verilog Code
Bitstream file proFPGAConfiguration file
Architecture Definitionfile
Architecture definition template
tools
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
FPGA0 FPGA1
FPGA2 FPGA3 FPGA4
MEM MEM
MEM
PCIe
H2RC’18 15
Architecture Definition File
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
16
ARCHITECTURE ID CONCEPT
Nov 11th, 2018 H2RC’18 16
Architecture Definition TemplateKey aspects• Ease to configure the Unit
type for every tile
17
PEAK NU+ NU+
HACC
PEAK PEAK
PEAKARM
NU+ NU+
NU+ ARM
PEAKNU+ HACC
MC
MC
MC
MC
PCI
NU+
PEAK
PEAK NU+ NU+
PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACC
PEAK
PEAK
HACC HACC HACC
MCMC
MC MC
PCI
Nov 11th, 2018 H2RC’18
Architecture Definition TemplateKey aspects• Ease to configure the Unit
type for every tile• Ease to attach Memory
Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile
18
PEAK NU+ NU+
PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACC
PEAK
PEAK
HACC HACC HACC
MCMC
MC MC
PCI
PEAK NU+ NU+
PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACC
PEAK
PEAK
HACC HACC HACC
MCMC
MC MC
ETH
MC
Nov 11th, 2018 H2RC’18
Architecture Definition TemplateKey aspects• Ease to configure the Unit
type for every tile• Ease to attach Memory
Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile
• Ease to configure the number of tiles
19
PEAK NU+ NU+
PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACC
PEAK
PEAK
HACC HACC HACC
MC
PCI PEAK NU+ NU+
PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACC
PEAK
PEAK
HACC HACC HACC
PEAK NU+ NU+
HACC
PEAK PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACCNU+
PEAK
HACC
PEAK NU+ NU+
HACC
PEAK PEAK
PEAKARM
NU+ NU+
NU+ ARM
HACCNU+
PEAK
HACC
MC
MCMC
Nov 11th, 2018 H2RC’18
Architecture Definition TemplateKey aspects• Ease to configure the Unit
type for every tile• Ease to attach Memory
Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile
• Ease to configure the number of tiles
• Ease to implement multi-FPGA designs
20
Zynq Z100 Virtex V2000TEthernet MC TR INI RouterNI
Nov 11th, 2018 H2RC’18
Architecture Definition TemplateKey aspects• Ease to configure the Unit type
for every tile• Ease to attach Memory
Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile
• Ease to configure the number of tiles
• Ease to implement multi-FPGA designs
• Easy to configure the number of virtual networks and virtual channels to achieve QoS guarantee
21Nov 11th, 2018 H2RC’18
ROUTERFlit[63:0]
FlitType[1:0]
ValidGo
broadcast
VN[N:0]VC[M:0]
North
East
Local
South
West
<M, N>
Architecture Definition TemplateKey aspects• Ease to configure the Unit type
for every tile• Ease to attach Memory
Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile
• Ease to configure the number of tiles
• Ease to implement multi-FPGA designs
• Easy to configure the number of virtual networks and virtual channels to achieve QoS guarantee
22Nov 11th, 2018 H2RC’18
Verilog Code
Architecture definition template
R
TR
ACC
NI
R
TR
ACC
NI
MC
ETH
R
TR
ACC
NI
R
TR
ACC
NI
MCETH
Architecture Definition File: Example
Nov 11th, 2018 H2RC’18 23
ARCHITECTURE ID 0 ARCHITECTURE ID 1
FPGA Resource utilization
3.29
10.722.47
4.94
1.34
5.23
Tile (without accelerator) Tile (with accelerator)
% Utilization
LUT LUTRAM FF
• Xilinx Virtex V2000T speedgrade -1
• Tile with Accelerator:– MIPS-based cache
coherent 2-core accelerator
• 32K L1I• 512K L1D• 1MB L2D
Nov 11th, 2018 H2RC’18 24
Nov 11th, 2018
Conclusionso The MANGO approach for supporting the
implementation of multiple accelerators on a multi-FPGA platform– High customization of the cluster
• Flexibility for architecture exploration– Flexibility in configuring an architecture
• Rapid architecture exploration– Effectivity in the system communications
• QoS guarantee– Effectivity in monitoring the system
o Percentage of resources needed per tile quite slow
H2RC’18 25
Nov 11th, 2018
Thank you for your attentiono Contact us at
– www.mango-project.eu
o Other directly related EU project– www.recipe-project.eu
26
o The MANGO project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671668.
o The content of this presentation reflects only the authors' views and the European Commission is not responsible for any use that may be made of the information it contains.
H2RC’18
Architecture Definition File
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
27Nov 11th, 2018 H2RC’18 27
Architecture Definition File
28
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
Nov 11th, 2018 H2RC’18 28
Architecture Definition File
29
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
Nov 11th, 2018 H2RC’18 29
Architecture Definition File
30
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
Nov 11th, 2018 H2RC’18 30
Architecture Definition File
31
oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork
Nov 11th, 2018 H2RC’18 31
Nov 11th, 2018
DMA Transferso DMA controller attached to every memory controller.o Programming parameters: base address, source tile, size
of the transfer, destination of the transfer, target address.o Possibilities:
H2RC’18 32
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
MEM MEM
MEM
PCIe
ACC1 ACC1
ACC1
ACC1
ACC3
ACC3 ACC3
ACC3
ACC2
ACC2
ACC2
ACC2
MEM MEM
MEM
PCIe
Data Network
• DATA Routers connected among them in a 2D mesh layout.
• NI decouples network from tile components.• Supported different Virtual networks (VN) with different
number of Virtual channels (VC).• Supported dynamic assignment of bandwidth per VN.
33
T0 T1 T2 T3
T4 T5 T6 T7
UNIT
U2M
M2U
TABLETLBTILEREG
NI
DATAROUTER
ITEM ROUTER
ITEM NI
app1 app1 app1 app3
app2 app2 app3 app3