Reconfigurable Computing with the Partitioned Global...

High-Performance Reconfigurable Computing Group University of Toronto Reconfigurable Computing with the Partitioned Global Address Space model Cascadia 2012 Ruediger Willenberg and Paul Chow August 14, 2012

Page 1: Reconfigurable Computing with the Partitioned Global ...willenbe/publications/Cascadia2012_talk... · High-Performance Reconfigurable Computing Group University of Toronto Reconfigurable

High-Performance Reconfigurable Computing Group

University of Toronto

Reconfigurable Computing with the Partitioned Global Address Space model

Cascadia 2012

Ruediger Willenberg and Paul Chow

August 14, 2012

Page 2

Parallelizing computation:

How to partition, communicate and synchronize data?

August 14, 2012 High-Performance Reconfigurable Computing Group ∙ University of Toronto

Page 3

Parallel Programming Models

Page 4

Partitioned Global Address Space

• Any thread can access any memory location, but:

• There is a visible difference between local and remote memory locations

• One-sided communication (remote read and write without local thread involvement)
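
The three properties above can be illustrated with a small single-process simulation in plain C. This is only a sketch under stated assumptions: `NUM_NODES`, `WORDS_PER_NODE`, `global_addr`, `pgas_put` and `pgas_get` are hypothetical names invented for illustration, and the "nodes" are just rows of one array rather than separate machines.

```c
#include <assert.h>

#define NUM_NODES 4        /* hypothetical number of threads/nodes */
#define WORDS_PER_NODE 8   /* hypothetical size of each node's partition */

/* Each node owns one partition; together they form the global address space. */
static int partition[NUM_NODES][WORDS_PER_NODE];

/* A global address is visibly split into (owning node, local offset). */
typedef struct {
    int node;
    int offset;
} global_addr;

/* Any index in the global space maps to exactly one owner. */
static global_addr to_global(int global_index) {
    assert(global_index >= 0 && global_index < NUM_NODES * WORDS_PER_NODE);
    global_addr a = { global_index / WORDS_PER_NODE, global_index % WORDS_PER_NODE };
    return a;
}

/* The local/remote distinction is explicit: 1 if the caller owns the location. */
static int is_local(int my_node, global_addr a) {
    return a.node == my_node;
}

/* One-sided write: the initiator deposits the value; the owner runs no code. */
static void pgas_put(global_addr dst, int value) {
    partition[dst.node][dst.offset] = value;
}

/* One-sided read: the initiator fetches the value; the owner runs no code. */
static int pgas_get(global_addr src) {
    return partition[src.node][src.offset];
}
```

In this sketch, node 0 can call `pgas_put(to_global(19), 42)` to write into node 2's partition, and `is_local(0, to_global(19))` reports that the access is remote. A real system would turn the remote case into network traffic.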

Page 5

Language Level PGAS:

Unified Parallel C (UPC) example

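#define N (100*THREADS)

// blocked layout: each thread owns one contiguous block of each array
shared [*] int v1[N], v2[N], sum[N];

int main(void)
{
    int i;

    // the affinity expression &v1[i] gives iteration i to the thread that owns v1[i]
    upc_forall(i=0; i<N; i++; &v1[i])
        sum[i]=v1[i]+v2[i]; // all work is local

    return 0;
}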

Others: Co-Array Fortran, Titanium (Java), Chapel (Cray), X10 (IBM)
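
Why "all work is local" holds in the UPC example can be checked with a plain C sketch of the blocked `[*]` layout and the affinity test that `upc_forall` applies. The helper names `block_size`, `owner_of` and `local_iterations` are hypothetical and are not part of UPC; the sketch only mimics the index arithmetic on a single machine.

```c
#include <assert.h>

/* Block size of a UPC-style [*] layout: ceil(n / threads). */
static int block_size(int n, int threads) {
    assert(n > 0 && threads > 0);
    return (n + threads - 1) / threads;
}

/* Thread that has affinity to element i under the blocked layout. */
static int owner_of(int i, int n, int threads) {
    assert(i >= 0 && i < n);
    return i / block_size(n, threads);
}

/* Number of iterations thread `me` would execute when the loop's affinity
   expression is the address of element i (the fourth upc_forall field). */
static int local_iterations(int me, int n, int threads) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (owner_of(i, n, threads) == me) {
            count++;
        }
    }
    return count;
}
```

With 4 threads and N = 400, every thread owns 100 consecutive elements of each array and executes exactly those 100 iterations. Because `v1`, `v2` and `sum` share the same layout, each iteration touches only elements owned by the executing thread.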

Page 6

Application Library Level PGAS:

Global Arrays

Page 7

Communication Level PGAS:

GASNet (Global Address Space Networking)

Others: ARMCI (Global Arrays), SHMEM (App level)

Page 8

Network Level PGAS:

Remote DMA (RDMA)

Examples: Infiniband, Myrinet, iWARP, RoCE

Page 9

CPUs+FPGAs: Co-processor Style

Page 10

CPUs+FPGAs: Symmetric Style

Page 11

What does "symmetric" mean?

• CPU code and FPGA components can both initiate data sends and requests

• Both use a similar or identical API to ease migration

• For distributed-memory/message-passing, TMD-MPI / ArchES-MPI implement this

• Our work strives to build a symmetric PGAS system based on GASNet

Page 12

GASNet Active Messages

Remote Write: Long Request Message

Page 13

GASNet Active Messages

Remote Read: Long Reply Message
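
The diagrams for these two slides did not survive extraction, so the following is only a rough single-process sketch of how a remote write and a remote read can be built from active messages. All names (`am_t`, `am_deliver`, `remote_write`, `remote_read`, the handlers) are hypothetical and are not the GASNet API. The sketch assumes the usual active-message semantics: a Long message carries a payload that is deposited at a given offset in the destination's segment before the handler runs, and a reply is sent from inside the request handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODES 2
#define SEG_WORDS 16

typedef struct am am_t;
typedef void (*am_handler)(int self, const am_t *msg);

struct am {
    int src;             /* initiating node */
    am_handler handler;  /* function run on the destination after delivery */
    const int *payload;  /* NULL for a message without payload */
    int nwords;          /* payload length in words */
    int dst_offset;      /* Long messages: where the payload lands */
    int args[3];         /* small handler arguments */
};

static int segment[NODES][SEG_WORDS];  /* per-node shared memory segment */
static int completed[NODES];           /* transfers finished at each node */

/* Delivery: deposit the payload (if any), then invoke the handler. */
static void am_deliver(int dst, const am_t *msg) {
    assert(dst >= 0 && dst < NODES);
    if (msg->payload != NULL) {
        assert(msg->dst_offset >= 0 && msg->nwords >= 0 &&
               msg->dst_offset + msg->nwords <= SEG_WORDS);
        memmove(&segment[dst][msg->dst_offset], msg->payload,
                (size_t)msg->nwords * sizeof(int));
    }
    msg->handler(dst, msg);
}

/* Handler that only records that data has arrived at this node. */
static void on_done(int self, const am_t *msg) {
    (void)msg;
    completed[self]++;
}

/* Remote write: one Long request carrying the data. */
static void remote_write(int me, int dst, int dst_offset, const int *data, int nwords) {
    am_t req = { .src = me, .handler = on_done, .payload = data,
                 .nwords = nwords, .dst_offset = dst_offset };
    am_deliver(dst, &req);
}

/* Request handler on the data owner: answer with a Long reply. */
static void on_read_request(int self, const am_t *msg) {
    int src_offset = msg->args[0], reply_offset = msg->args[1], nwords = msg->args[2];
    assert(src_offset >= 0 && nwords >= 0 && src_offset + nwords <= SEG_WORDS);
    am_t reply = { .src = self, .handler = on_done,
                   .payload = &segment[self][src_offset],
                   .nwords = nwords, .dst_offset = reply_offset };
    am_deliver(msg->src, &reply);
}

/* Remote read: payload-free request; the data comes back in the reply. */
static void remote_read(int me, int owner, int src_offset, int my_offset, int nwords) {
    am_t req = { .src = me, .handler = on_read_request, .payload = NULL,
                 .args = { src_offset, my_offset, nwords } };
    am_deliver(owner, &req);
}
```

In both directions the application code on the remote node is never involved; only handlers run there, which is what makes the communication one-sided.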

Page 14

GAScore FPGA component

Hardware Processing Element

Page 15

GAScore FPGA system

Page 16

BEE3 multi-FPGA system

Page 17

Next steps: Hardware

• External DRAM support (caching...?)

• Strided and scatter/gather transfers

• Messaging management for custom hardware cores
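
As a reminder of what strided and gather transfers compute, here is a minimal software sketch in C. It says nothing about how the planned hardware would implement them; `strided_copy` and `gather_ints` are hypothetical helper names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strided transfer: copy `count` blocks of `block_bytes` bytes each.
   Consecutive blocks start `src_stride` bytes apart in the source and
   `dst_stride` bytes apart in the destination. */
static void strided_copy(void *dst, size_t dst_stride,
                         const void *src, size_t src_stride,
                         size_t block_bytes, size_t count) {
    assert(block_bytes <= src_stride && block_bytes <= dst_stride);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < count; i++) {
        memcpy(d + i * dst_stride, s + i * src_stride, block_bytes);
    }
}

/* Gather: collect the elements at arbitrary indices into a dense buffer. */
static void gather_ints(int *dst, const int *src, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[idx[i]];
    }
}
```

A typical use of the strided form is extracting one column of a row-major matrix: the block is a single element and the source stride is the row length in bytes. Scatter is the mirror image of gather, writing `dst[idx[i]] = src[i]`.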

Page 18

Next steps: Hardware

Page 19

Programmable Active Message Sequencer (PAMS)

• Programmable/re-programmable through GASNet messages

• Controls/synchronizes custom hardware

• Handles reception and transmission of GASNet active messages

• Sequences based on: custom hardware state, timer, amount of received data, number of received messages of a specific type
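
One way to picture such a sequencer is as a tiny interpreter whose program is a table of wait and send steps, one wait kind per trigger listed above. The slides do not give an instruction set, so everything in this C sketch (`op_t`, `instr_t`, `state_t`, `pams_step`) is a hypothetical illustration rather than the actual PAMS design.

```c
#include <assert.h>

#define MSG_TYPES 4  /* hypothetical number of message types */

/* One wait kind per trigger listed on the slide, plus send and halt. */
typedef enum {
    WAIT_HW_DONE,    /* custom hardware state */
    WAIT_TIMER,      /* arg0 = tick to wait for */
    WAIT_BYTES,      /* arg0 = amount of received data */
    WAIT_MSG_COUNT,  /* arg0 = message type, arg1 = required count */
    SEND_MSG,        /* transmit an active message (only counted here) */
    HALT
} op_t;

typedef struct {
    op_t op;
    int arg0;
    int arg1;
} instr_t;

typedef struct {
    int hw_done;
    int now;
    int bytes_received;
    int msgs_received[MSG_TYPES];
    int msgs_sent;
} state_t;

/* Runs from *pc until a wait condition is unmet (returns 0, *pc stays on
   that wait) or HALT is reached (returns 1). */
static int pams_step(const instr_t *prog, int *pc, state_t *st) {
    for (;;) {
        const instr_t *in = &prog[*pc];
        switch (in->op) {
        case WAIT_HW_DONE:
            if (!st->hw_done) return 0;
            break;
        case WAIT_TIMER:
            if (st->now < in->arg0) return 0;
            break;
        case WAIT_BYTES:
            if (st->bytes_received < in->arg0) return 0;
            break;
        case WAIT_MSG_COUNT:
            assert(in->arg0 >= 0 && in->arg0 < MSG_TYPES);
            if (st->msgs_received[in->arg0] < in->arg1) return 0;
            break;
        case SEND_MSG:
            st->msgs_sent++;
            break;
        case HALT:
            return 1;
        }
        (*pc)++;
    }
}
```

Because the program is plain data (an array of `instr_t`), it could in principle be shipped to the sequencer inside a message, which matches the idea of reprogramming through GASNet messages.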

Page 20

Next steps: Toolchain

challenges for FPGAs in HPC

• PGAS languages without heterogeneity support (UPC, CAF, Titanium)

• PGAS languages without clear HLL-to-FPGA path (Chapel, X10)

• Lack of FPGA programming experts in HPC

Page 21

Next Steps: Toolchain

[Figure: toolchain flow. Labels: DSL application, C++ generated code, C++ PGAS Application, Heterogeneous C++ PGAS Library, GASNet Library, GASNet, CPU-based Host, Compile, Static generation (manual or C-to-gates), Dynamic generation, PAMS, Custom FPGA Hardware]

Page 22

Heterogeneous C++ PGAS library

• Concepts stolen from Global Arrays, Chapel, X10

• Specialized data classes for multi-dim. arrays, etc.

• Location and subgroup classes

• Distribution and layout types; assigned to arrays to define storage and computation patterns

• Can generate and distribute PAMS code at compile time as well as at runtime

• Can be used as a runtime library for code generation from Domain-Specific Languages (DSLs)

Page 23

Thank you for your attention!

Questions?
