Williams 1 MAPLD 2005/1001
A Linux-based Software Platform for the
Reconfigurable Scaleable Computing Project
John A. Williams*, Neil W. Bergmann*
Robert F. Hodson+
* School of ITEE, The University of Queensland
Brisbane, Australia
+NASA Langley Research Centre
Hampton, Virginia
Outline
- RSC Overview
  - Concept, participants
- Existing Technology
  - MicroBlaze, uClinux
- New Developments
  - Vision, Multiprocessing, MPI, NoC
- Status and outlook
  - Planned Investigations, Progress
Reconfigurable Scaleable Computing
- Features
  - Next-generation on-board computing platform
  - FPGA-based reconfigurable computer
  - Soft CPU cores + embedded Linux operating system
  - Hybrid SW/HW application environment
  - Hierarchical, scaleable computing network
- Selected for funding in 2004 H&RT call for proposals
Reconfigurable Scaleable Computing
- Participants
  - NASA LaRC (project lead, hardware design)
  - UQ (operating system, message passing libraries)
  - ASRC (system modeling, performance analysis)
  - Jefferson Labs (consulting)
  - StarBridge Systems (graphical design tools)
  - NASA Office of Logic Design
  - NSA
Reconfigurable Scaleable Computing
[Figure slide: image not included in transcript]
MicroBlaze
- 32-bit RISC, Harvard soft processor
- Targeted to Xilinx logic primitives
  - ~1000-1500 slices (10% of XC4V-LX25)
- Parameteriseable
  - Caches
  - ALU, FPU
- Memory/bus interfaces
  - Local memory bus (LMB)
  - On-chip Peripheral Bus (OPB)
  - Fast Simplex Links (FSL)
MicroBlaze
Logic utilisation in RPM prototype FPGA device (16K icache & 16K dcache)

Selected Device: 4vlx25ff668-10

Resource                        Used   Available   Utilisation
Number of Slices                1504       10752           13%
Number of Slice Flip Flops      1172       21504            5%
Number of 4 input LUTs          2238       21504           10%
Number of FIFO16/RAMB16s          24          72           33%
  Number used as RAMB16s          24
Number of DSP48s                   3          48            6%
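The percentages in the report can be reproduced from the raw counts. The sketch below is only a back-of-envelope check, not part of the original slides: it assumes the tool truncates rather than rounds (1504/10752 is about 13.99%, reported as 13%), and the helper `cores_that_fit` is a hypothetical name for a naive bound that ignores buses, peripherals and routing.

```python
# (used, available) pairs copied from the utilisation report above.
REPORT = {
    "Slices": (1504, 10752),
    "Slice Flip Flops": (1172, 21504),
    "4 input LUTs": (2238, 21504),
    "FIFO16/RAMB16s": (24, 72),
    "DSP48s": (3, 48),
}


def utilisation_percent(used, available):
    """Integer percentage, truncated (assumption: the report does not round)."""
    return used * 100 // available


def cores_that_fit(report):
    """Hypothetical naive bound: how many copies of this exact core each
    resource could hold, ignoring everything else on the chip."""
    return {name: available // used for name, (used, available) in report.items()}


if __name__ == "__main__":
    for name, (used, available) in REPORT.items():
        print(f"{name}: {utilisation_percent(used, available)}%")
    limits = cores_that_fit(REPORT)
    tightest = min(limits, key=limits.get)
    print(f"tightest resource: {tightest} ({limits[tightest]} copies)")
```

For this particular cache configuration on this device, the naive bound is set by block RAM rather than by slices, so cache size is the first parameter to revisit when packing more cores onto one FPGA.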
MicroBlaze, Linux and RSC
- Why?
  - Path for existing applications onto RSC
  - Standard platform improves design efficiency
    - Application development/debug
    - Multiprocessing/clustering
    - Software infrastructure
    - Interoperability (networking, file systems, …)
  - UQ research focus in rSoC
    - Integration of custom hardware (for speed) with conventional processor/OS modules (for flexibility)
MicroBlaze, Linux and RSC
- Why not?
  - Performance
    - FPGAs roughly 10x less efficient than fixed silicon
    - CPUs less efficient than custom hardware
      - A serialised abstraction of intrinsically parallel hardware
    - Less efficient than deeply embedded software
      - Abstraction incurs performance penalty
  - Stability/reliability
    - RSC is a data processing/computation platform
    - Not part of spacecraft survivability
- MicroBlaze and Linux are only part of the solution
Vision
- Heterogeneous multiprocessing
  - Multiple software tasks per processor
  - Multiple processors per chip/RPM
  - Hardware co-processors
  - Multiple RPMs per stack
  - Multiple stacks per system
- RSC is an exotic computing machine
  - How do we program it?
Vision
- Linux-based multiprocessing
  - To SW apps, RSC is a Linux cluster
  - Critical computation offloaded to hardware, EITHER
    - Co-processors to CPU nodes, OR
    - Peers in the computational network
  - Find the sweet spot
    - Runtime performance vs design effort
- RSC is an exotic computing machine
  - We must make it seem straightforward
Vision
- Make it look like Linux
- Build on enormous library of Linux knowledge, tools, apps, documentation, training and skills
- Ability to prototype realistic user apps on Linux desktop is tremendously valuable
MicroBlaze Multiprocessing
- Lots of processors give performance and reliability – parallelism is key
- MicroBlaze achieves 4-8x better MIPS/LUT than any other soft CPU architecture (in Xilinx FPGAs)
- We can put about 8 CPUs in an FPGA
  - What are the hardware architectural issues?
  - How to use it efficiently?
MicroBlaze Multiprocessing
- Implicit multiprocessing
  - SMP, looks like one fast processor
- Explicit multiprocessing
  - protoSMP, looks like many processors
- Multi-level multiprocessing
  - MPI, looks like a cluster
MicroBlaze Multiprocessing
- Symmetric Multiprocessing (SMP)
  - N CPUs as a single virtual machine
  - Implicit parallelism
    - Hidden by OS and hardware
  - Hardware support
    - Cache coherency
    - Memory architectures
    - Distributed interrupt dispatch
SMP vs ProtoSMP
[Figure: SMP – 1 virtual machine. Labels: MBlaze0, MBlaze1, MBlaze2, MBlaze3; INTC; Kernel Memory; Application Memory; Per-CPU data structures; I/O (serial, ethernet, …)]
MicroBlaze Multiprocessing
- ProtoSMP
  - N CPUs on shared bus
  - Private address zones within shared physical memory
  - Common shared memory region with IPC protocols
  - Shared memory multicomputing
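The protoSMP memory layout described above (one private zone per CPU plus a common shared region, all inside one physical memory) can be sketched as a simple address-map calculation. This is an illustration only: the base address, sizes, zone names and the helpers `partition_memory` and `owner_of` are hypothetical, and a real layout would also have to respect kernel link addresses and alignment constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    base: int
    size: int

    @property
    def end(self):
        # Exclusive end address of the zone.
        return self.base + self.size


def partition_memory(base, total_size, n_cpus, shared_size):
    """Split one physical memory into n_cpus private zones followed by
    one shared region used for inter-processor communication."""
    if n_cpus < 1 or not 0 < shared_size < total_size:
        raise ValueError("invalid partition parameters")
    private_size = (total_size - shared_size) // n_cpus
    zones = [
        Zone(f"image{i}", base + i * private_size, private_size)
        for i in range(n_cpus)
    ]
    zones.append(Zone("shared", base + n_cpus * private_size, shared_size))
    return zones


def owner_of(zones, address):
    """Return the name of the zone containing address, or None."""
    for zone in zones:
        if zone.base <= address < zone.end:
            return zone.name
    return None


if __name__ == "__main__":
    MIB = 1 << 20
    # Hypothetical numbers: 64 MiB at 0x80000000, 4 CPUs, 4 MiB shared.
    for zone in partition_memory(0x80000000, 64 * MIB, 4, 4 * MIB):
        print(f"{zone.name:7s} 0x{zone.base:08X} - 0x{zone.end - 1:08X}")
```

Each kernel image would be configured to see only its own zone plus the shared region, which is where the "memory waste" of this approach comes from: kernel and application memory are duplicated per CPU.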
SMP vs ProtoSMP
[Figure: ProtoSMP – N virtual machines. Labels: MBlaze0 to MBlaze3; Image 0 to Image 3, each with Kernel Memory and Application Memory; four INTC blocks; four Virt. I/O blocks; I/O (serial, ethernet, …)]
SMP vs ProtoSMP
- SMP
  - Pros
    - Implicit parallelism and inter-CPU comms
    - Efficient memory and cache re-use
  - Cons
    - Specialised hardware support (caches, distributed interrupts)
    - Requires kernel support
- ProtoSMP
  - Pros
    - Simplicity
    - Use existing HW components
    - No changes in kernel
  - Cons
    - Explicit parallelism and inter-CPU comms
    - Memory waste
    - Virtual IO model (N terminals)
RSC Network
- Parallel processing architectures are often limited by CPU/memory bandwidth and interprocess comms bandwidth.
- RSC has several potential bottlenecks: RPM memory, PCI backplane, interstack networks.
- Need to leave scope for high-speed comms, e.g. with Rocket I/O on FPGAs
RSC Network
- Useful if applications can be initially developed without regard to partitioning and communications
- Implies a uniform interprocess communications mechanism
- We choose MPI
MPI on MicroBlaze-uClinux
- MPI - Message Passing Interface
  - API for explicit message passing between processes
    - Multiple processes on one machine, or
    - Distributed across many machines
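To make "explicit message passing" concrete, here is a minimal sketch of the programming model using only the Python standard library. It is not the MPI API: the `Mailboxes` class and its `send`/`recv` methods are hypothetical stand-ins, and threads stand in for the separate processes that a real MPI job would use. What it does show is the style MPI imposes: each rank runs the same program, branches on its rank, and shares data only through explicit sends and receives.

```python
import queue
import threading


class Mailboxes:
    """One inbox per rank; a stand-in for a message-passing transport."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, dest, source, payload):
        self._inbox[dest].put((source, payload))

    def recv(self, rank):
        # Blocks until a message for this rank arrives.
        return self._inbox[rank].get()


def worker(rank, boxes, results):
    if rank == 0:
        # Rank 0 scatters the data, then gathers the partial sums.
        data = list(range(1, 101))
        n_workers = boxes.size - 1
        for w in range(1, boxes.size):
            boxes.send(w, 0, data[w - 1::n_workers])
        total = 0
        for _ in range(n_workers):
            _source, partial = boxes.recv(0)
            total += partial
        results["total"] = total
    else:
        # Every other rank sums its chunk and reports back to rank 0.
        _source, chunk = boxes.recv(rank)
        boxes.send(0, rank, sum(chunk))


def run(size):
    """Run `size` ranks (size >= 2) and return the total computed by rank 0."""
    boxes = Mailboxes(size)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(rank, boxes, results))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


if __name__ == "__main__":
    print(run(4))  # sum of 1..100 computed by three workers: 5050
```

Because the ranks never touch each other's data directly, the same program structure works whether the ranks sit on one machine or are spread across many, which is the property the slide highlights.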
MPI on MicroBlaze-uClinux
- MPICH implementation, Argonne National Labs
- MPICH2 – complete reimplementation of MPI conforming to MPI-2 standard
  - Layered implementation abstracting MPI application interface from underlying physical transport
  - Process Management Interface
MPI on MicroBlaze-uClinux
[Figure: MPICH2 layered architecture. Labels: Application; MPI, MPE; MPICH; ROMIO; ADIO (PVFS, GPFS, XFS, ...); ADI3 (CH3 Device, BG/L, Myrinet, IB, ...); CH3 (Sock, SHM, SSM, …)]
Source: http://www.sharcnet.ca/fw2003/slides/mpich2-details.ppt
MPI on MicroBlaze-uClinux
- MPICH2 on MicroBlaze
  - sock implementation over TCP/IP sockets
    - Starting point for RSC, with COTS demo MicroBlaze multiprocessing experiments
  - shm shared memory wrapper, great for SMP/protoSMP
  - Create new wrapper layer around RSC interconnect/NoC architecture once finalised
- Can hardware co-processors look like MPI?
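The idea of swapping the transport underneath an unchanged application layer can be sketched as follows. This is not MPICH2's CH3 channel interface: the `Channel` base class, `SockChannel`, `SharedQueueChannel` and `echo_upper` are hypothetical names chosen for illustration, and the "shared" channel is an in-process queue standing in for a real shared-memory region.

```python
import socket
import struct
from abc import ABC, abstractmethod
from collections import deque


class Channel(ABC):
    """Hypothetical transport interface seen by the upper layer."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> bytes: ...


class SockChannel(Channel):
    """Length-prefixed messages over a connected stream socket."""

    def __init__(self, sock):
        self._sock = sock

    def send(self, payload):
        self._sock.sendall(struct.pack("!I", len(payload)) + payload)

    def _read_exact(self, n):
        buf = b""
        while len(buf) < n:
            chunk = self._sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf

    def recv(self):
        (length,) = struct.unpack("!I", self._read_exact(4))
        return self._read_exact(length)

    def close(self):
        self._sock.close()


class SharedQueueChannel(Channel):
    """Stand-in for a shared-memory channel: two queues shared by both ends."""

    def __init__(self, outbox, inbox):
        self._outbox = outbox
        self._inbox = inbox

    def send(self, payload):
        self._outbox.append(bytes(payload))

    def recv(self):
        return self._inbox.popleft()


def make_sock_pair():
    left, right = socket.socketpair()
    return SockChannel(left), SockChannel(right)


def make_shared_pair():
    a_to_b, b_to_a = deque(), deque()
    return SharedQueueChannel(a_to_b, b_to_a), SharedQueueChannel(b_to_a, a_to_b)


def echo_upper(left: Channel, right: Channel, message: bytes) -> bytes:
    """Upper-layer code: identical whichever transport is plugged in."""
    left.send(message)
    right.send(right.recv().upper())
    return left.recv()


if __name__ == "__main__":
    left, right = make_sock_pair()
    print(echo_upper(left, right, b"ping"))
    left.close()
    right.close()
    print(echo_upper(*make_shared_pair(), b"ping"))
```

Under this kind of layering, a wrapper for the RSC interconnect, or for a hardware co-processor, would be one more `Channel` subclass; whether that maps cleanly onto MPICH2's actual device layer is the open question the slide raises.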
COTS Demo Platform
- Two ethernet ports per board, up to 4 MicroBlaze per board
- Four boards per demo cluster
- Variety of cluster configuration experiments
  - 4x uniprocessor
  - 4x4-way protoSMP
  - 4x2x2-way protoSMP
  - …
Status and Outlook
- Detailed SMP vs protoSMP feasibility study
  - Commenced Q2 2005
- MPICH2 port investigations commenced
  - Baseline implementation uniprocessor over TCP/IP
  - Work commenced Q2 2005
Conclusion
- MicroBlaze and uClinux are part of the solution
  - Those parts which are Linux look like desktop/cluster Linux
  - Deliberate decisions in trade of design vs runtime efficiency
- Looking ahead
  - Linux abstractions over RSC hardware
    - Intra-board, inter-board, inter-stack, …
  - Development and debug environments
  - Seamless integration with custom hardware
    - Viva, VHDL, …