12/9/04 1 Virtual Prototyping of Advanced Space System Architectures based on RapidIO Sponsor:...

112/9/04

Virtual Prototyping of Advanced SpaceSystem Architectures based on RapidIO

Sponsor:Sponsor:

Honeywell DSES Space, Clearwater, FL

Principal Investigator:Principal Investigator:

Dr. Alan D. George

Graduate Research AssistantsGraduate Research Assistants::

David Bueno, Ian Troxel, Chris Conger, Adam Leko

HCS Research Laboratory, ECE Department

University of Florida

212/9/04

Outline

I. Project Overview

II. BackgroundI. RapidIO (RIO)

II. Ground-Moving Target Indicator (GMTI)

III. Synthetic Aperture Radar (SAR)

III. Partitioning Methods

IV. Modeling Environment and Models

V. Experiments and Results

VI. Conclusions

312/9/04

Project Overview Simulative analysis of Space-Based Radar (SBR) systems based on

RapidIO interconnection networks RapidIO (RIO) is a high-performance, switched interconnect for embedded

systems High system scalability (boards, processors, memory units, sensors, etc.) Far better bisection bandwidth than existing bus-based technologies

Study optimal method of constructing scalable RIO-based systems for Ground Moving Target Indicator (GMTI) and Synthetic Aperture Radar (SAR) Identify system-level tradeoffs in system designs Discrete-event simulation of RapidIO network,

processing elements, and GMTI/SAR algorithms Identify limitations of RIO design for SBR Determine effectiveness of various GMTI/SAR

algorithm partitionings over RIO network

Image courtesy http://www.afa.org/magazine/aug2002/0802radar.asp

412/9/04

Background- RapidIO Three-layered, embedded system interconnect architecture

Logical – memory mapped I/O, message passing, and globally shared memory Transport – controls routing of packet from source to destination Physical – serial and parallel

Point-to-point, packet-switched interconnect Peak single-link throughput ranging from 2 to 64 Gb/s Focus on 16-bit parallel LVDS RIO implementation for satellite systems

Image courtesy G. Shippen, “RapidIO Technical Deep Dive 1: Architecture & Protocol,” Motorola Smart Network Developers Forum, 2003.

512/9/04

Background- GMTI

GMTI used to track moving targets on ground Estimated processing requirements range from

40 (aircraft) to 280 (satellite) GFLOPs GMTI broken into four stages:

Pulse Compression (PC) Doppler Processing (DP) Space-Time Adaptive Processing (STAP) Constant False-Alarm Rate detection (CFAR)

Incoming data organized as 3-D matrix (data cube) Data reorganization (“corner turn”) necessary between stages for processing efficiency Size of each cube dictated by Coherent Processing Interval (CPI)

PulseCompression

DopplerProcessing

Space-TimeAdaptive

Processing(STAP)

ConstantFalse Alarm

Rate(CFAR)

Receive Cube

Send Results

Corner Turn Partitioned along range dimension

Partitioned along pulse dimension

DATA CUBE

Ranges

612/9/04

GMTI – Parallel Partitioning Straightforward partitioning

Entire system works in parallel on a single data cube

All-to-all personalized communication used to perform corner-turn

Result latency must be ≤ 1 CPI

Staggered partitioning Processors work in small groups, one data

cube for each group Incoming data cubes are sent to groups via

round-robin distribution Result latency must be ≤ N × CPI

N = number of processor groups

Pipelined partitioning Processors work in small groups, each group

responsible for one stage of algorithm Corner turns “for free” Result latency must be ≤ N × CPI

N = number of stages

PC DP STAP CFAR

Straightforward Partitioning

Pipelined Partitioning

Data Cube0

Data Cube1

Data Cube2

Data Cube3

Data Cube4

Data Cube5

timestart

CPI 0 CPI 1 CPI 2 CPI 3 CPI 4

Staggered Partitioning

712/9/04

SAR Algorithm Flow and Partitioning SAR composed of 7 sub-tasks

Processing of 2-D radar data Each sub-task’s optimal data partitioning varies Processed out of global memory

Required data gathering and image processing times are extensive 16-second Coherent Processing Interval (CPI) Gigabytes of data, processed in smaller chunks (kB to MB range) Long processing time places short deadline on tolerable time to

report results

Data size stays constant throughout algorithm

Range-Pulse Compression

Polar Reformatting

Pulse FFT

Range FFT

Pulse FFT

Auto-focus

Magnitude Function

Range dimension

Pulse dimension

Range/pulse blocks

812/9/04

Model Library Overview RIO system modeling library created using Mission Level Designer (MLD), a

commercial discrete-event simulation modeling tool C++-based, block-level, hierarchical modeling tool

Algorithm modeling accomplished via script-based processing All processing nodes read from a global script file to determine

when/where to send data, and when/how long to compute

Our model library includes: RIO central-memory switch Compute node with RIO endpoint SBR traffic source/sink RIO logical message-passing layer Transport and parallel physical

layers

Model of Compute Nodewith RIO Endpoint

912/9/04

GMTI Processor Board Models System contains many processor boards connected via backplane Each processor board contains one RIO switch and four processors Processors modeled with three-stage

finite state machine Send data Receive data Compute

Behavior of processors controlledwith script files Script generator converts high-level

GMTI parameters to script Script is fed into simulations

Model ofFour-Processor Board

Scriptgenerator

SimulationGMTI & system parameters

Processor scriptsend…

receive…

1012/9/04

System Design Constraints 16-bit parallel 250MHz DDR RapidIO link

(1 GB/s) Expected basis of RadHard component performance

by time RIO and SBR ready to fly in ~2008 to 2010 Systems composed of processor boards

interconnected by RIO backplane 4 processors per board 8 Floating-Point Units (FPUs) per processor One 8-port central-memory switch per board; implies

4 connections to backplane per board

1112/9/04

7-Board System

Backplane and System Models

High bandwidth requirements for GMTI algorithm dictate architecture for SAR systems Expectation that systems must eventually support both SAR and GMTI Same backplane efficiently supports four-, five-, six-, and seven-board configurations

Can use smaller switches on backplane to conserve power if fewer than seven boards needed Three-board system possible using similar configuration with only two backplane switches

4-Switch Non-blocking Backplane

Backplane-to-Board 0, 1, 2, 3 Connections

Backplane-to-Board 4, 5, 6, and Data Source/GM Connections

1212/9/04

GMTI Performance Results Studied RapidIO system

designs for space-based GMTI Important conclusions:

Non-blocking backplane extremely important for GMTI

GMTI not sensitive to latency of individual packets Cut-through routing

unnecessary in switches RapidIO transmitter- and

receiver-controlled flow control perform almost identically

Straightforward partitioning method provides lowest latency, but with least efficient use of resources

Staggered partitioning (by board) method is very efficient, but with very high latency

Pipelined method is a compromise

CPI Latencies

32000 40000 48000 56000 64000

Number of ranges

Straightforward, 7boards

Staggered, 5 boards

Pipelined, 6 boards

Pipelined, 7 boards

Straightforward,7 boards

Staggered,5 boards

Pipelined,6 boards

Pipelined,7 boards

llel E

1312/9/04

SAR Performance Results Message-passing logical layer

involves higher overhead All operations have responses CPI latency is constant as chunk size

increases Low levels of contention in network for

all cases Logical I/O layer is inherently better

for current mapping (global-memory based) Approximately ½ of operations are writes,

which require no response Contention increases as chunk size

increases May be possible to use

synchronization to reduce contention, and double-buffering for latency hiding Small processing times reduce benefit of

double-buffering Smart use of double-buffering only on

certain tasks with long computation may maximize benefit of this approach

CPI Completion Latency vs. Chunk Size

170.000

180.000

190.000

200.000

210.000

220.000

64 128 256 512 1024

Per-Processor Chunk Size (KB)

CPI Completion Latency vs. Chunk Size

64 128 256 512 1024

Per-Processor Chunk Size (KB)

2xBuffer

1xBuffer

1412/9/04

Conclusions RapidIO is an emerging high-performance interconnect technology that

promises to be an important enabling-technology for future generations of latency- or throughput-bound applications Strong industry support, but very little scholarly research on RapidIO to date

Powerful SBR/RapidIO simulation and modeling capabilities developed at UF GMTI models SAR models RapidIO “Construction Set” Altia-MLD interface library

GMTI is extremely network-intensive Various partitioning options, with straightforward partitioning producing lowest latency Staggered partitioning has highest latency but uses network most efficiently

Network requirements of SAR easily handled by networks designed for GMTI Biggest challenge of SAR is memory requirements

Could benefit from further in-depth study on processor/memory architecture issues RIO Logical I/O layer found to benefit SAR CPI completion latency via responseless write operations

Well-suited to distributed-memory approach Double-buffering may improve performance of SAR application

Increases local memory requirements by factor of 2 to 3 Cut-through routing significantly benefits neither GMTI nor SAR in studies thus far

Experimental testbed for RIO system research is under development First step will be 2-node RIO testbed: Xilinx RIO cores and two small FPGA boards recently acquired Looking into options to build larger and more powerful testbed (comments welcome)

1512/9/04

Project Publications in 2004

RIO Systems Paper at HPEC’04 D. Bueno, C. Conger, A. Leko, I. Troxel and A. George, “Virtual

Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar,” Proc. of High-Performance Embedded Computing (HPEC) Workshop, MIT Lincoln Lab, Lexington, MA, Sep. 28-30, 2004.

RIO Networking Paper at LCN’04 D. Bueno, A. Leko, C. Conger, I. Troxel, and A. George,

“Simulative Analysis of the RapidIO Embedded Interconnect Architecture for Real-Time, Network-Intensive Applications,” Proc. of 29th IEEE Conference on Local Computer Networks (LCN) via the IEEE Workshop on High-Speed Local Networks (HSLN), Tampa, FL, Nov. 16-18, 2004.

1612/9/04

Future Work Potential follow-ons to current RapidIO work

Produce “Day in the life of SBR” simulation (with SAR and GMTI)

Develop GMTI-specific Altia GUI (the current Altia simulation is SAR-specific)

Develop “Day in the life of SBR” GUI Explore ways to increase the performance of SAR through

double-buffering and synchronization Examine the effects of per-node memory size on SAR and

GMTI traffic Development of RapidIO system testbed Couple RIO work with new research in reconfigurable

computing (RC) for advanced space systems Future projects are in planning stages

12/9/04 1 Virtual Prototyping of Advanced Space System Architectures based on RapidIO Sponsor:...

Documents

Transcript of 12/9/04 1 Virtual Prototyping of Advanced Space System Architectures based on RapidIO Sponsor:...

AN 836: RapidIO II Reference Design for Avalon-ST Pass ... · 1 RapidIO II Reference Design for Avalon®-ST Pass-Through Interface 1.1 Introduction The RapidIO II reference design

AN 568: RapidIO Interoperability With TI 6488 DSP ... · The Altera® RapidIO interoperability reference design provides a sample interface between the Altera RapidIO MegaCore® function

DSES-CREEESConnect_FINALDRAFT (2)

Serial RapidIO (SRIO) Subsystem

KeyStone Training Serial RapidIO (SRIO) Subsystem.

IDT Tsi572 Serial RapidIO Switch User Manual

RapidIO - sbtg

RapidIO Gen2 technology · semiconductor solutions 1 IDT ® SERIAL RAPIDIO GEN2 whItE PAPER RapidIO® Gen2 technology By Trevor Hiatt, Devashish Paul and Barry Wood Integrated Device

Ot6 Cardac and Pulmonary Dses

LogiCORE™ IP Serial RapidIO v5 - Xilinx · The complete LogiCORE™ IP RapidIO Endpoint solution consists of the Serial RapidIO Physical Layer, Buffer, and the RapidIO Logical Layer

Air and Droplet Borne Dses

Serial RapidIO Gen2 Endpoint v3 - XilinxSerial RapidIO Gen2 v3.2 4 PG007 October 1, 2014 Product Specification Introduction The LogiCORE IP Serial RapidIO Gen2 Endpoint Solution (SRIO

RapidIO Interconnect Specification Part 5: Globally Shared … · 2020. 6. 12. · RapidIO Trade Association 3 RapidIO Part 5: Globally Shared Memory Logical Specification Rev. 1.3

Using the Serial RapidIO Messaging Unit on PowerQUICC IIIUsing the Serial RapidIO Messaging Unit on PowerQUICC™ III, Rev. 0 Freescale Semiconductor 3 PowerQUICC III Serial RapidIO

RPI DSES-6070 HV7 Project Derk Philippona Fall 2008

RapidIO Technology Applied in Multifunction Phased Array Radar (MPAR) RapidIO Technology Applied in Multifunction Phased Array Radar (MPAR) (Introduction.

RapidIO Specification, Revision 3.0 RapidIOTM ... · This is Revision 2 of the errata for the RapidIO Specification Revision 3.0 and Revision 3.1. RapidIO specification revisions

LogiCORE IP Serial RapidIO Gen2 Endpoint v3Serial RapidIO Gen2 v3.1 4 PG007 December 18, 2013 Product Specification Introduction The LogiCORE IP Serial RapidIO Gen2 Endpoint Solution

Serial RapidIO Physical Layer Interface IP User’s Guide · Serial RapidIO Physical Layer Interface Lattice Semiconductor User’s Guide General Description The Serial RapidIO core

2017-08-12 DSES Open House Event Report