Post on 17-Dec-2015
112/9/04
Virtual Prototyping of Advanced SpaceSystem Architectures based on RapidIO
Sponsor:Sponsor:
Honeywell DSES Space, Clearwater, FL
Principal Investigator:Principal Investigator:
Dr. Alan D. George
Graduate Research AssistantsGraduate Research Assistants::
David Bueno, Ian Troxel, Chris Conger, Adam Leko
HCS Research Laboratory, ECE Department
University of Florida
212/9/04
Outline
I. Project Overview
II. BackgroundI. RapidIO (RIO)
II. Ground-Moving Target Indicator (GMTI)
III. Synthetic Aperture Radar (SAR)
III. Partitioning Methods
IV. Modeling Environment and Models
V. Experiments and Results
VI. Conclusions
312/9/04
Project Overview Simulative analysis of Space-Based Radar (SBR) systems based on
RapidIO interconnection networks RapidIO (RIO) is a high-performance, switched interconnect for embedded
systems High system scalability (boards, processors, memory units, sensors, etc.) Far better bisection bandwidth than existing bus-based technologies
Study optimal method of constructing scalable RIO-based systems for Ground Moving Target Indicator (GMTI) and Synthetic Aperture Radar (SAR) Identify system-level tradeoffs in system designs Discrete-event simulation of RapidIO network,
processing elements, and GMTI/SAR algorithms Identify limitations of RIO design for SBR Determine effectiveness of various GMTI/SAR
algorithm partitionings over RIO network
Image courtesy http://www.afa.org/magazine/aug2002/0802radar.asp
412/9/04
Background- RapidIO Three-layered, embedded system interconnect architecture
Logical – memory mapped I/O, message passing, and globally shared memory Transport – controls routing of packet from source to destination Physical – serial and parallel
Point-to-point, packet-switched interconnect Peak single-link throughput ranging from 2 to 64 Gb/s Focus on 16-bit parallel LVDS RIO implementation for satellite systems
Image courtesy G. Shippen, “RapidIO Technical Deep Dive 1: Architecture & Protocol,” Motorola Smart Network Developers Forum, 2003.
512/9/04
Background- GMTI
GMTI used to track moving targets on ground Estimated processing requirements range from
40 (aircraft) to 280 (satellite) GFLOPs GMTI broken into four stages:
Pulse Compression (PC) Doppler Processing (DP) Space-Time Adaptive Processing (STAP) Constant False-Alarm Rate detection (CFAR)
Incoming data organized as 3-D matrix (data cube) Data reorganization (“corner turn”) necessary between stages for processing efficiency Size of each cube dictated by Coherent Processing Interval (CPI)
PulseCompression
DopplerProcessing
Space-TimeAdaptive
Processing(STAP)
ConstantFalse Alarm
Rate(CFAR)
Receive Cube
Send Results
Corner Turn Partitioned along range dimension
Partitioned along pulse dimension
DATA CUBE
Ranges
Pu
lse
s
612/9/04
GMTI – Parallel Partitioning Straightforward partitioning
Entire system works in parallel on a single data cube
All-to-all personalized communication used to perform corner-turn
Result latency must be ≤ 1 CPI
Staggered partitioning Processors work in small groups, one data
cube for each group Incoming data cubes are sent to groups via
round-robin distribution Result latency must be ≤ N × CPI
N = number of processor groups
Pipelined partitioning Processors work in small groups, each group
responsible for one stage of algorithm Corner turns “for free” Result latency must be ≤ N × CPI
N = number of stages
time
1 CPI
PE #4
PE #3
PE #2
PE #1
PE #4
PE #3
PE #2
PE #1
PE #4
PE #3
PE #2
PE #1
PE #4
PE #3
PE #2
PE #1
PC DP STAP CFAR
Straightforward Partitioning
Pipelined Partitioning
Data Cube0
Data Cube1
Data Cube2
Data Cube3
Data Cube4
Data Cube5
timestart
CPI 0 CPI 1 CPI 2 CPI 3 CPI 4
Staggered Partitioning
712/9/04
SAR Algorithm Flow and Partitioning SAR composed of 7 sub-tasks
Processing of 2-D radar data Each sub-task’s optimal data partitioning varies Processed out of global memory
Required data gathering and image processing times are extensive 16-second Coherent Processing Interval (CPI) Gigabytes of data, processed in smaller chunks (kB to MB range) Long processing time places short deadline on tolerable time to
report results
Data size stays constant throughout algorithm
Range-Pulse Compression
Polar Reformatting
Pulse FFT
Range FFT
Pulse FFT
Auto-focus
Magnitude Function
Range dimension
Pulse dimension
Range/pulse blocks
812/9/04
Model Library Overview RIO system modeling library created using Mission Level Designer (MLD), a
commercial discrete-event simulation modeling tool C++-based, block-level, hierarchical modeling tool
Algorithm modeling accomplished via script-based processing All processing nodes read from a global script file to determine
when/where to send data, and when/how long to compute
Our model library includes: RIO central-memory switch Compute node with RIO endpoint SBR traffic source/sink RIO logical message-passing layer Transport and parallel physical
layers
Model of Compute Nodewith RIO Endpoint
912/9/04
GMTI Processor Board Models System contains many processor boards connected via backplane Each processor board contains one RIO switch and four processors Processors modeled with three-stage
finite state machine Send data Receive data Compute
Behavior of processors controlledwith script files Script generator converts high-level
GMTI parameters to script Script is fed into simulations
Model ofFour-Processor Board
Scriptgenerator
SimulationGMTI & system parameters
Processor scriptsend…
receive…
1012/9/04
System Design Constraints 16-bit parallel 250MHz DDR RapidIO link
(1 GB/s) Expected basis of RadHard component performance
by time RIO and SBR ready to fly in ~2008 to 2010 Systems composed of processor boards
interconnected by RIO backplane 4 processors per board 8 Floating-Point Units (FPUs) per processor One 8-port central-memory switch per board; implies
4 connections to backplane per board
1112/9/04
7-Board System
Backplane and System Models
High bandwidth requirements for GMTI algorithm dictate architecture for SAR systems Expectation that systems must eventually support both SAR and GMTI Same backplane efficiently supports four-, five-, six-, and seven-board configurations
Can use smaller switches on backplane to conserve power if fewer than seven boards needed Three-board system possible using similar configuration with only two backplane switches
4-Switch Non-blocking Backplane
Backplane-to-Board 0, 1, 2, 3 Connections
Backplane-to-Board 4, 5, 6, and Data Source/GM Connections
1212/9/04
GMTI Performance Results Studied RapidIO system
designs for space-based GMTI Important conclusions:
Non-blocking backplane extremely important for GMTI
GMTI not sensitive to latency of individual packets Cut-through routing
unnecessary in switches RapidIO transmitter- and
receiver-controlled flow control perform almost identically
Straightforward partitioning method provides lowest latency, but with least efficient use of resources
Staggered partitioning (by board) method is very efficient, but with very high latency
Pipelined method is a compromise
CPI Latencies
0
256
512
768
1024
1280
1536
32000 40000 48000 56000 64000
Number of ranges
La
ten
cy (
ms)
Straightforward, 7boards
Staggered, 5 boards
Pipelined, 6 boards
Pipelined, 7 boards
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Straightforward,7 boards
Staggered,5 boards
Pipelined,6 boards
Pipelined,7 boards
Para
llel E
ffici
ency
1312/9/04
SAR Performance Results Message-passing logical layer
involves higher overhead All operations have responses CPI latency is constant as chunk size
increases Low levels of contention in network for
all cases Logical I/O layer is inherently better
for current mapping (global-memory based) Approximately ½ of operations are writes,
which require no response Contention increases as chunk size
increases May be possible to use
synchronization to reduce contention, and double-buffering for latency hiding Small processing times reduce benefit of
double-buffering Smart use of double-buffering only on
certain tasks with long computation may maximize benefit of this approach
CPI Completion Latency vs. Chunk Size
170.000
180.000
190.000
200.000
210.000
220.000
64 128 256 512 1024
Per-Processor Chunk Size (KB)
Late
ncy
(ms)
MP
IO
CPI Completion Latency vs. Chunk Size
165
170
175
180
185
190
195
200
205
210
64 128 256 512 1024
Per-Processor Chunk Size (KB)
Re
sult
La
ten
cy (
ms)
2xBuffer
1xBuffer
1412/9/04
Conclusions RapidIO is an emerging high-performance interconnect technology that
promises to be an important enabling-technology for future generations of latency- or throughput-bound applications Strong industry support, but very little scholarly research on RapidIO to date
Powerful SBR/RapidIO simulation and modeling capabilities developed at UF GMTI models SAR models RapidIO “Construction Set” Altia-MLD interface library
GMTI is extremely network-intensive Various partitioning options, with straightforward partitioning producing lowest latency Staggered partitioning has highest latency but uses network most efficiently
Network requirements of SAR easily handled by networks designed for GMTI Biggest challenge of SAR is memory requirements
Could benefit from further in-depth study on processor/memory architecture issues RIO Logical I/O layer found to benefit SAR CPI completion latency via responseless write operations
Well-suited to distributed-memory approach Double-buffering may improve performance of SAR application
Increases local memory requirements by factor of 2 to 3 Cut-through routing significantly benefits neither GMTI nor SAR in studies thus far
Experimental testbed for RIO system research is under development First step will be 2-node RIO testbed: Xilinx RIO cores and two small FPGA boards recently acquired Looking into options to build larger and more powerful testbed (comments welcome)
1512/9/04
Project Publications in 2004
RIO Systems Paper at HPEC’04 D. Bueno, C. Conger, A. Leko, I. Troxel and A. George, “Virtual
Prototyping and Performance Analysis of RapidIO-based System Architectures for Space-Based Radar,” Proc. of High-Performance Embedded Computing (HPEC) Workshop, MIT Lincoln Lab, Lexington, MA, Sep. 28-30, 2004.
RIO Networking Paper at LCN’04 D. Bueno, A. Leko, C. Conger, I. Troxel, and A. George,
“Simulative Analysis of the RapidIO Embedded Interconnect Architecture for Real-Time, Network-Intensive Applications,” Proc. of 29th IEEE Conference on Local Computer Networks (LCN) via the IEEE Workshop on High-Speed Local Networks (HSLN), Tampa, FL, Nov. 16-18, 2004.
1612/9/04
Future Work Potential follow-ons to current RapidIO work
Produce “Day in the life of SBR” simulation (with SAR and GMTI)
Develop GMTI-specific Altia GUI (the current Altia simulation is SAR-specific)
Develop “Day in the life of SBR” GUI Explore ways to increase the performance of SAR through
double-buffering and synchronization Examine the effects of per-node memory size on SAR and
GMTI traffic Development of RapidIO system testbed Couple RIO work with new research in reconfigurable
computing (RC) for advanced space systems Future projects are in planning stages