The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José...

33
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671668 exploring Manycore Architectures for Next-GeneratiOn HPC systems The MANGO Process for Designing and Programming Multi-Accelerator Multi- FPGA Systems H2RC’18 Sunday, November 11th, 2018 Dallas, Texas, USA Rafael Tornero , José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: [email protected]

Transcript of The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José...

Page 1: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671668

exploring Manycore Architectures for Next-GeneratiOn HPC systems

The MANGO Process for Designing andProgramming Multi-Accelerator Multi-

FPGA SystemsH2RC’18

Sunday, November 11th, 2018Dallas, Texas, USA

Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti

Email: [email protected]

Page 2: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

MANGO Contexto MANGO FETHPC-2014 project:

– is about manycore architecture exploration in HPC

o HPC quest for performance/power improvement– Trend in using heterogeneous components

• GPUs, manycores, and even FPGAs• Goal is to get closer to the Intrinsic Computational Efficiency (ICE)

– MANGO focuses on heterogeneity• How we combine heterogeneous components for the best achievement of computational efficiency• How to program/manage them for the best achievement of computational efficiency

o Emerging requirements on HPC systems:

– Predictability (QoS; time sensitivity)• Due to the merging of HPC with Big Data

– Capacity computing• Run as many application instances as possible

– MANGO addresses predictability and capacity computing• 3P model (Performance/Power/Predictability)

Future and Emerging Technologies

H2RC’18 2

Page 3: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

MANGO Contexto MANGO builds a prototyping system for 3P space explorationo Goals:

– Hardware• Develop a flexible prototype for rapid exploration of architectures• Explore new deeply heterogeneous manycore architectures• Real-time support exploring the PPP design space• Provide a unified and simple (homogeneous) access to the system via a smart

interconnect– Software

• Adapt programming models and compiler support to the new architectures• Develop the right resource manager to deal with the system

– Infrastructure• Provide new monitoring tools to the system• Provide new cooling techniques to the system

– Applications• Analyze impact of on a set of real applications• Support of video transcoding, medical imaging, security and surveillance applications

H2RC’18 3

Page 4: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

MANGO Prototype

o General-purpose nodes (Xeon+GPGPU) coupled with Heterogeneous nodes, HNs:– A large-scale cluster of high-capacity FPGAs– A robust, scalable interconnect for a multi-FPGA manycore system– Will enable FPGA acceleration at scale:

a key ingredient for the EsD roadmap– A continuum from FPGA emulation to the final physical platform (might be an ASIC manycore, FPGA,

mixed…) under a stable software environment

– Native isolation and partitioning mechanisms for QoS-aware capacity computing HPC applicationso Two-phase passive energy-efficient coolingo Demonstrated applications with stringent high-performance and QoS requirements

H2RC’18 4

Page 5: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Consortium

H2RC’18 5

Page 6: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Agendao MANGO Architecture

– HN Hardware and assembly– Heterogeneity– Network– Accelerator Interface– MANGO Design Flow

o FPGA resource utilizationo Conclusions

H2RC’18 6

Page 7: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

MANGO Architecture

H2RC’18 7

HN cluster

HN cl

uste

r(m

ulti-

FPGA

s+

mem

ory

+ IO

)

HN cluster

HN cluster

HN cluster

HN cluster

HN cluster

MANGO prototype

ACC1ACC1ACC1ACC1ACC1ACC1ACC1ACC2

ACC1ACC1ACC1ACC3

Pool of accelerators

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

FPGA0 FPGA1

FPGA2 FPGA3 FPGA4

MEM MEM

MEM

PCIe

FPGA0 FPGA1 FPGA2 FPGA3 FPGA4

DDR3 DDR4 DDR4 PCIe

Pool

of h

ardw

are

com

pone

nts

forH

N c

lust

er

UNIT(ACCELER

ATOR)

U2M

M2U

TABLETLBTILEREG

NI

ROUTERITEM

ROUTER

ITEM NI

MC

tile concept

Page 8: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

HN Hardware and Assembly

H2RC’18 8

Lego-like excercise

Page 9: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Heterogeneity: Pool of Accelerators

H2RC’18 9

C

L1D

L2D

NI

TR

R

C

L1D

L2D

NI

TR

R

C

L1D

L2D

NI

TR

R

C

L1D

L2D

NI

TR

R . . .. . .

. . .

. . .

manycore: MIPS, RISCV

General purpose Custom made

GPU-like manycore DCT cores sub-block 3

DCT4

DCT32DCT16

DCT32

DCT8

DCT32DCT16

DCT32

DCT cores sub-block 1

DCT cores sub-block 2

DCT cores sub-block 0

X0 – X3 X4 – X7 X8 – X11 X12 – X15 X16 – X19 X20 – X23 X24 – X27 X28 – X31

512-bit input

Output multiplexer

Adder units

DCT4 DCT4 DCT4 DCT4 DCT4 DCT4 DCT4

DCT16

DCT8 DCT8 DCT8

DCT16

8 X DCT4

4 X DCT8

2 X DCT16

1 X DCT32

“00”

“01”

“10”

“11”

512-bit output

mode

Page 10: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

HeterogeneityTile concept Element connection

H2RC’18 10

Accelerator(UNIT)

U2M

M2U

TABLETLBTILEREG

NI

DATAROUTER

ITEM ROUTER

ITEM NI

Data pathFlow controlControl path

MCI/O Cont

UNIT

U2M

TABLETLB

TILEREG

NI

M2UDATAROUTER

ITEM ROUTER

ITEM NI

MC

I/O Cont

Page 11: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Heterogeneity, but regular layout

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

FPGA0 FPGA1

FPGA2 FPGA3 FPGA4

MEM MEM

MEM

PCIe

H2RC’18 11

Page 12: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Data Networko DATA Routers connected among them in a 2D mesh

layout. o NI decouples network from tile components.o Support for different Virtual networks (VN) with

different number of Virtual channels (VC).o Support for dynamic assignment of bandwidth per VN.o Support for capacity computing

12

UNIT

U2M

M2U

TABLETLBTILEREG

NI

DATAROUTER

ITEM ROUTER

ITEM NI

app1 app1 app1 app3

app2 app2 app3 app3U2M

TABLETLBNI

M2UDATAROUTER

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

MEM MEM

MEM

PCIe

H2RC’18

Page 13: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Control Network• Used for configuration and

monitoring– MANGO Infrastructure– Accelerators or units

• Flexible and generic to let the accelerators be configured based on their complexities– Ad-hoc protocols

Nov 11th, 2018 H2RC’18 13

UNIT

U2M

M2U

TABLETLBTILEREG

NI

DATAROUTER

ITEM ROUTER

ITEM NI

UNITTLB

TILEREG

ITEM ROUTER

ITEM NI

I/O Cont

Page 14: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Accelerator interfaceo Decouple the UNIT from the rest

of the MANGO platformo Allow the implementation of an

unique interface for every UNITo Unify memory access

– Byte, Half (16 bits), word (32 bits) & block (512 bits) memory access types

o Allow to map synchronization registers in the virtual memory address space

14H2RC’18

UNIT

U2M

M2U

TABLETLBTILEREG

NI

DATAROUTER

ITEM ROUTER

ITEM NI

UNIT

U2M

TABLETLBNI

M2U

Page 15: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

MANGO Design Flow

Verilog Code

Bitstream file proFPGAConfiguration file

Architecture Definitionfile

Architecture definition template

tools

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

FPGA0 FPGA1

FPGA2 FPGA3 FPGA4

MEM MEM

MEM

PCIe

H2RC’18 15

Page 16: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

16

ARCHITECTURE ID CONCEPT

Nov 11th, 2018 H2RC’18 16

Page 17: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit

type for every tile

17

PEAK NU+ NU+

HACC

PEAK PEAK

PEAKARM

NU+ NU+

NU+ ARM

PEAKNU+ HACC

MC

MC

MC

MC

PCI

NU+

PEAK

PEAK NU+ NU+

PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACC

PEAK

PEAK

HACC HACC HACC

MCMC

MC MC

PCI

Nov 11th, 2018 H2RC’18

Page 18: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit

type for every tile• Ease to attach Memory

Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile

18

PEAK NU+ NU+

PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACC

PEAK

PEAK

HACC HACC HACC

MCMC

MC MC

PCI

PEAK NU+ NU+

PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACC

PEAK

PEAK

HACC HACC HACC

MCMC

MC MC

ETH

MC

Nov 11th, 2018 H2RC’18

Page 19: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit

type for every tile• Ease to attach Memory

Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile

• Ease to configure the number of tiles

19

PEAK NU+ NU+

PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACC

PEAK

PEAK

HACC HACC HACC

MC

PCI PEAK NU+ NU+

PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACC

PEAK

PEAK

HACC HACC HACC

PEAK NU+ NU+

HACC

PEAK PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACCNU+

PEAK

HACC

PEAK NU+ NU+

HACC

PEAK PEAK

PEAKARM

NU+ NU+

NU+ ARM

HACCNU+

PEAK

HACC

MC

MCMC

Nov 11th, 2018 H2RC’18

Page 20: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit

type for every tile• Ease to attach Memory

Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile

• Ease to configure the number of tiles

• Ease to implement multi-FPGA designs

20

Zynq Z100 Virtex V2000TEthernet MC TR INI RouterNI

Nov 11th, 2018 H2RC’18

Page 21: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit type

for every tile• Ease to attach Memory

Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile

• Ease to configure the number of tiles

• Ease to implement multi-FPGA designs

• Easy to configure the number of virtual networks and virtual channels to achieve QoS guarantee

21Nov 11th, 2018 H2RC’18

ROUTERFlit[63:0]

FlitType[1:0]

ValidGo

broadcast

VN[N:0]VC[M:0]

North

East

Local

South

West

<M, N>

Page 22: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition TemplateKey aspects• Ease to configure the Unit type

for every tile• Ease to attach Memory

Controllers (MC) and I/O devices (PCIe, Ethernet) to any tile

• Ease to configure the number of tiles

• Ease to implement multi-FPGA designs

• Easy to configure the number of virtual networks and virtual channels to achieve QoS guarantee

22Nov 11th, 2018 H2RC’18

Page 23: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Verilog Code

Architecture definition template

R

TR

ACC

NI

R

TR

ACC

NI

MC

ETH

R

TR

ACC

NI

R

TR

ACC

NI

MCETH

Architecture Definition File: Example

Nov 11th, 2018 H2RC’18 23

ARCHITECTURE ID 0 ARCHITECTURE ID 1

Page 24: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

FPGA Resource utilization

3.29

10.722.47

4.94

1.34

5.23

Tile (without accelerator) Tile (with accelerator)

% Utilization

LUT LUTRAM FF

• Xilinx Virtex V2000T speedgrade -1

• Tile with Accelerator:– MIPS-based cache

coherent 2-core accelerator

• 32K L1I• 512K L1D• 1MB L2D

Nov 11th, 2018 H2RC’18 24

Page 25: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Conclusionso The MANGO approach for supporting the

implementation of multiple accelerators on a multi-FPGA platform– High customization of the cluster

• Flexibility for architecture exploration– Flexibility in configuring an architecture

• Rapid architecture exploration– Effectivity in the system communications

• QoS guarantee– Effectivity in monitoring the system

o Percentage of resources needed per tile quite slow

H2RC’18 25

Page 26: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

Thank you for your attentiono Contact us at

– www.mango-project.eu

o Other directly related EU project– www.recipe-project.eu

26

o The MANGO project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671668.

o The content of this presentation reflects only the authors' views and the European Commission is not responsible for any use that may be made of the information it contains.

H2RC’18

Page 27: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

27Nov 11th, 2018 H2RC’18 27

Page 28: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

28

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

Nov 11th, 2018 H2RC’18 28

Page 29: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

29

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

Nov 11th, 2018 H2RC’18 29

Page 30: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

30

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

Nov 11th, 2018 H2RC’18 30

Page 31: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Architecture Definition File

31

oSectionsoGeneraloFPGA/multi-FPGAoMemory devicesoI/O devicesoNetwork

Nov 11th, 2018 H2RC’18 31

Page 32: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Nov 11th, 2018

DMA Transferso DMA controller attached to every memory controller.o Programming parameters: base address, source tile, size

of the transfer, destination of the transfer, target address.o Possibilities:

H2RC’18 32

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

MEM MEM

MEM

PCIe

ACC1 ACC1

ACC1

ACC1

ACC3

ACC3 ACC3

ACC3

ACC2

ACC2

ACC2

ACC2

MEM MEM

MEM

PCIe

Page 33: The MANGO Process for Designing and Programming Multi ... · Rafael Tornero, José Flich, José María Martínez, Tomás Picornell, Vincenzo Scotti Email: ratorga@disca.upv.es. Nov

Data Network

• DATA Routers connected among them in a 2D mesh layout.

• NI decouples network from tile components.• Supported different Virtual networks (VN) with different

number of Virtual channels (VC).• Supported dynamic assignment of bandwidth per VN.

33

T0 T1 T2 T3

T4 T5 T6 T7

UNIT

U2M

M2U

TABLETLBTILEREG

NI

DATAROUTER

ITEM ROUTER

ITEM NI

app1 app1 app1 app3

app2 app2 app3 app3