Skau1 Approach to Scalable Parallel Processing For Space-Based Radar.
Skau 1Confidential and Proprietary
Approach to Scalable Parallel Processing for Space-Based Radar
Example: SAR and GMTI Partitioning/Mapping/Utilization
Basic SBR Signal Processing
[Figure: Basic GMTI Processing Flow. Blocks shown: Input, A/D, Jammer Nulling, Pulse Compression, Doppler Processing, Clutter Suppression STAP, CFAR, Output.]
[Figure: Basic SAR Processing Flow. Blocks shown: Input, A/D, Jammer Nulling, Range Compression, Range Correction & Azimuth Processing, Azimuth Filtering, Image Formation & Post Processing, Output.]
Example: SAR Processing Assumptions
• Assumed SAR Parameters
- 16 channels (12 Subarrays and 4 Auxiliary Channels)
- 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
- 500 µsec receive window (200,000 - 333,000 range cells per pulse)
- 1 kHz PRF (1 msec PRI)
- 16,384 pulses in azimuth (16.4 sec collection time)
- The last 1/2 of the collected samples are used as the first 1/2 of the samples for the next image (processed only through range pulse compression and stored)
- 1 beam formed for SAR image; input 16,384 ranges x 16,384 pulses
- 8 bits (1 Byte) per A/D sample
- 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
- 32 bits (4 Bytes) for internal data storage of real data
- 4/3 oversampling out of the Polyphase Channelizer
- 128 Subbands formed; only 40 processed
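The range-cell counts and collection time in these assumptions follow directly from the receive window, the sample period, and the pulse interval. The sketch below reproduces that arithmetic; it assumes the receive window is 500 µsec and a 1 msec PRI, and the function names are hypothetical, chosen only for illustration.

```python
# Sketch only: reproduces the slide's parameter arithmetic.
# Assumption: the receive window is 500 microseconds and the PRI is 1 msec.

WINDOW_S = 500e-6   # receive window
PRI_S = 1e-3        # pulse repetition interval


def range_cells_per_pulse(window_s: float, sample_period_s: float) -> int:
    """Number of range cells sampled during one receive window."""
    return round(window_s / sample_period_s)


def collection_time_s(num_pulses: int, pri_s: float = PRI_S) -> float:
    """Time needed to collect num_pulses pulses at the given PRI."""
    return num_pulses * pri_s


cells_low = range_cells_per_pulse(WINDOW_S, 2.5e-9)    # 400 Msamples/sec
cells_high = range_cells_per_pulse(WINDOW_S, 1.5e-9)   # roughly 650 Msamples/sec

print(cells_low, cells_high)
print(collection_time_s(16_384), collection_time_s(32_768))
```

The second print covers both the 16K and 32K pulse cases used in this deck.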
Example: SAR/GMTI Partitioning on BEP System
• Set local memory per processing element at 128 KBytes to be able to handle 16K FFTs for SAR mode (not double buffered)
• For the GMTI mode, the 128 KBytes of local memory can handle
- 256 pulses and 64 ranges per main memory access that can be processed in the pulse or Doppler dimension
or
- 8,000 ranges and 2 pulses worth of data per main memory access that can be processed in the range dimension
or
- any combination that maximizes throughput by “blocking,” i.e., effectively “caching”, and “striding” data for optimum performance
• Data can also be partitioned across beams, channels, and segments when they are independent variables relative to high level data flow processing
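A quick way to see why each of the data blocks above fits in 128 KBytes is to multiply out pulses x ranges x 8 Bytes. The sketch below does that check; the helper names and the candidate list are illustrative assumptions, not part of the original sizing.

```python
# Sketch only: checks whether a pulses x ranges block of complex data
# fits in the 128 KByte local memory assumed on this slide.

LOCAL_MEMORY_BYTES = 128 * 1024   # 128 KBytes per processing element
BYTES_PER_COMPLEX = 8             # 4 Bytes real + 4 Bytes imaginary


def block_bytes(pulses: int, ranges: int) -> int:
    """Size in bytes of a pulses x ranges block of complex samples."""
    return pulses * ranges * BYTES_PER_COMPLEX


def fits_local_memory(pulses: int, ranges: int) -> bool:
    """True if the block fits in local memory without double buffering."""
    return block_bytes(pulses, ranges) <= LOCAL_MEMORY_BYTES


candidates = {
    "16K-point FFT line (SAR)": (16_384, 1),
    "GMTI pulse/Doppler block": (256, 64),
    "GMTI range block": (2, 8_000),
}

for name, (pulses, ranges) in candidates.items():
    size = block_bytes(pulses, ranges)
    print(f"{name}: {size} bytes, fits = {fits_local_memory(pulses, ranges)}")
```

Double buffering would halve the usable block size, which is why the slide notes that the 16K FFT case is not double buffered.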
SAR Processing Flow/Global Memory Accessing (1 of 2)
• Range Pulse Compression
- Process 16,384 ranges for each pulse
- Store data from each pulse in global memory (16,384 ranges for each pulse)
• Cross-range FFT
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform cross-range FFT
- Store processed data (131.1 KBytes)
- Loop 16,384 times = 16,384 ranges / 1 range per loop
• Polar Reformatting
- Extract 2D Polar Reformat data: 128 pulses x 128 ranges x 8 Bytes = 131.1 KBytes
- Perform Polar Reformatting (small overlap to process a 126 x 126 patch)
- Store processed data
- Loop 16,908 times = 268.44 x 10^6 cells / 15.876 x 10^3 cells per patch
• Range FFT
- Extract range data: 16,384 ranges x 1 pulse x 8 Bytes = 131.1 KBytes
- Perform range FFT
- Store processed data (transposed), 131.1 KBytes
- Loop 16,384 times = 16,384 pulses / 1 pulse per loop
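The block sizes and loop counts on this slide can be reproduced with a few lines of arithmetic. The sketch below is one way to do it; the function names are hypothetical, and the patch calculation simply divides total image cells by useful cells per patch, as the slide does.

```python
# Sketch only: reproduces the per-loop block sizes and loop counts
# for the FFT passes and the polar-reformat patches.
import math

BYTES_PER_COMPLEX = 8  # 4 Bytes real + 4 Bytes imaginary


def fft_pass(n_lines: int, line_length: int, lines_per_loop: int = 1):
    """Return (bytes extracted per loop, number of loops) for one FFT pass."""
    block_bytes = line_length * lines_per_loop * BYTES_PER_COMPLEX
    loops = math.ceil(n_lines / lines_per_loop)
    return block_bytes, loops


def patch_pass(image_size: int, patch_in: int, patch_out: int):
    """Return (bytes per extracted patch, image cells / useful cells per patch)."""
    block_bytes = patch_in * patch_in * BYTES_PER_COMPLEX
    ratio = image_size * image_size / (patch_out * patch_out)
    return block_bytes, ratio


print(fft_pass(16_384, 16_384))        # 16K image, one line per loop
print(patch_pass(16_384, 128, 126))    # 128 x 128 extracted, 126 x 126 useful
print(fft_pass(32_768, 32_768, 32))    # 32K image, 32 lines per loop
```

Because the patch ratio is not a whole number, covering the full image would take one more patch than the truncated count.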
SAR Processing Flow/Global Memory Accessing (2 of 2)
• Cross-range FFT
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform cross-range FFT
- Store processed data
- Loop 16,384 times = 16,384 ranges / 1 range per loop
• Auto Focus
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform Auto Focus
- Store processed data (131.1 KBytes)
- Loop 16,384 times = 16,384 ranges / 1 range per loop
• Magnitude Function
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform Magnitude Function
- Store processed data (65.6 KBytes)
- Loop 16,384 times = 16,384 ranges / 1 range per loop
Note: All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., there is no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)
SAR Processing Flow/Global Memory Utilization
[Figure: global memory map across successive CPIs (CPI N-1, CPI N, CPI N+1) along the direction of platform motion; each CPI spans 16,384 pulses.]
• 1.1 GBytes of new data from current CPI collection
• 2.2 GBytes complex data: storage of pulse compressed range data for further processing
• 2.2 GBytes complex data: storage of cross-range FFT processed data through polar reformatting
• 2.2 GBytes complex data: storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding
Total global memory required for SAR (worst case) = 9.9 GBytes *
* This might be able to be reduced to 7.8 GBytes if 2D FFT can be done “in place.” **
2.1 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM
Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) Not exactly sure how this works with ECCM, collecting training samples, etc. Obviously, you do not want a big jammer to mess up the formation of the SAR image. How to null a big jammer out without affecting the image is a major consideration.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn’t required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.
Possible SAR Partitioning/Mapping/Utilization (32K x 32K Image)
SAR Processing - Assumptions
• Assumed SAR Parameters
- 16 channels (12 Subarrays and 4 Auxiliary Channels)
- 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
- 500 µsec receive window (200,000 - 333,000 range cells per pulse)
- 1 kHz PRF (1 msec PRI)
- 32,768 pulses in azimuth (32.8 sec collection time)
- The last 1/2 of the collected samples are used as the first 1/2 of the samples for the next image (processed only through range pulse compression and stored)
- 1 beam formed for SAR image; input 32,768 ranges x 32,768 pulses
- 8 bits (1 Byte) per A/D sample
- 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
- 32 bits (4 Bytes) for internal data storage of real data
- 4/3 oversampling out of the Polyphase Channelizer
- 128 Subbands formed; only 40 processed
• Assumed Processing Resources
- 32 processing nodes per board
- 64 GFLOPS peak throughput per board
- 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
- 32 MBytes local data memory per board
-- 8 MBytes local data memory per processing cluster per board
-- 2 MBytes local data memory per processing node
- 256 KBytes local memory per processor
- 32 GBytes Global Memory (worst case SAR requirement)
SAR Processing Flow/Global Memory Accessing (1 of 2)
• Range Pulse Compression
- Process 32,768 ranges for each pulse
- Store data from each pulse in global memory (32,768 ranges for each pulse)
• Cross-range FFT
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform cross-range FFT
- Store processed data (8.4 MBytes)
- Loop 1024 times = 32,768 ranges / 32 ranges per loop
• Polar Reformatting
- Extract 2D Polar Reformat data: 1012 pulses x 1012 ranges x 8 Bytes = 8.2 MBytes
- Perform Polar Reformatting (small overlap to process a 1010 x 1010 patch)
- Store processed data (8.2 MBytes)
- Loop 1060 times = 1073.74 x 10^6 cells / 1.012 x 10^6 cells per patch
• Range FFT
- Extract range data: 32,768 ranges x 32 pulses x 8 Bytes = 8.4 MBytes
- Perform range FFT
- Store processed data (transposed), 8.4 MBytes
- Loop 1024 times = 32,768 pulses / 32 pulses per loop
Note: This sizing assumes that the extracted data fills the 8 MBytes of external memory available on the Compute Cluster. It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored. The same thing is true for the patch size, except there is a little more bandwidth required because of the overlap needed to do the interpolation. This gets a little tricky because of the assumed in-place calculation.
SAR Processing Flow/Global Memory Accessing (2 of 2)
• Cross-range FFT
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform cross-range FFT
- Store processed data
- Loop 1024 times = 32,768 ranges / 32 ranges per loop
• Auto Focus
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform Auto Focus
- Store processed data (8.4 MBytes)
- Loop 1024 times = 32,768 ranges / 32 ranges per loop
• Magnitude Function
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform Magnitude Function
- Store processed data (4.2 MBytes)
- Loop 1024 times = 32,768 ranges / 32 ranges per loop
Note: All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., there is no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)
Note: This sizing assumes that the extracted data fills the 8 MBytes of external memory available on the Compute Cluster. It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored.
SAR Processing Flow/Global Memory Utilization
[Figure: global memory map across successive CPIs (CPI N-1, CPI N, CPI N+1) along the direction of platform motion; each CPI spans 32,768 pulses.]
• 4.3 GBytes of new data from current CPI collection
• 8.6 GBytes complex data: storage of pulse compressed range data for further processing
• 8.6 GBytes complex data: storage of cross-range FFT processed data through polar reformatting
• 8.6 GBytes complex data: storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding
Total global memory required for SAR (worst case) = 30.1 GBytes *
* This might be able to be reduced to 21.4 GBytes if 2D FFT can be done “in place.” **
2.9 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM
Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) I am not sure how this works with ECCM, collecting training samples, etc. Obviously, you wouldn’t want a big jammer to mess up the formation of the SAR image, but I am not sure how you null a big jammer out without affecting the image.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn’t required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.
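The 30.1 GByte worst-case figure for the 32K x 32K image can be reproduced by summing three full complex-image buffers plus a half-image of newly collected data. That breakdown is an interpretation of the memory map, not something the slide states as a formula, and the function name below is hypothetical.

```python
# Sketch only: one reading of the worst-case SAR global memory total.
# Assumption: three full complex-image buffers plus half an image of
# new data; "in place" 2D FFT is assumed to drop one full buffer.

BYTES_PER_COMPLEX = 8  # 4 Bytes real + 4 Bytes imaginary


def sar_global_memory_bytes(n: int, in_place_2d_fft: bool = False) -> int:
    """Global memory needed for an n x n complex SAR image."""
    image = n * n * BYTES_PER_COMPLEX     # one full complex image
    new_data = image // 2                 # half-CPI of newly collected data
    full_buffers = 2 if in_place_2d_fft else 3
    return full_buffers * image + new_data


worst_case = sar_global_memory_bytes(32_768)
in_place = sar_global_memory_bytes(32_768, in_place_2d_fft=True)

print(f"worst case: {worst_case / 1e9:.1f} GBytes")
print(f"in place:   {in_place / 1e9:.1f} GBytes")
```

With unrounded buffer sizes the in-place case comes out near 21.5 GBytes, within rounding of the figure on the slide.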
SAR Processing Flow/Board-Level Partitioning/Utilization
[Figure: three-stage board pipeline sharing a Global Memory. Stage 1: Polyphase Channelizer (PPC) Boards #1-#4. Stage 2: ECCM, AWC, & PP Combination Boards #1-#4. Stage 3: Range PC, Freq. Conv, Polar Reform, 2D FFT, Auto Focus, & Mag. Boards #1-#4. Other labels: Training Samples & Adaptive Weights; Output.]
• Stage throughput annotations
- Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 144 GOPS
- Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 96 GOPS
• Board functions
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 750 range cells per pulse and 40 subbands per channel; forms 1 beam; combines 40 subbands; each ECCM node outputs 8192 ranges to each of the next stage receiving nodes
- Each Range PC - Mag node performs range compression on a pulse by pulse basis and sends processed data to global memory; frequency conversion, polar reformatting, 2D FFT, Auto Focus, and Magnituding are performed working out of global memory, restoring processed data in global memory on a function by function basis
• Data rates
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec
- 1.15 GBytes/sec input to each PPC board
- 4 ch x 40 subbands x 1 pulse x 750 ranges x 8 Bytes output from each PPC board and input to each ECCM board every PRI (1 msec) = 0.96 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 1 beam x 8192 ranges x 8 Bytes input to each Range PC board from each ECCM board every PRI = 65.5 MBytes/sec; Aggregate BW = 1.05 GBytes/sec
- 524.8 MBytes/sec between each board and global memory; Aggregate BW = 2.1 GBytes/sec
• Global Memory
- Max. Memory = 32 GBytes; Used Memory = 30.1 GBytes
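The link bandwidths on this slide are products of channel, subband, range, and byte counts transferred once per PRI. The sketch below recomputes them under two assumptions that are my reading rather than stated on the slide: the PPC input uses 1 Byte per sample (from the 8-bit A/D), and "aggregate" means 4 sending boards x 4 receiving boards = 16 links. The helper name is hypothetical.

```python
# Sketch only: recomputes the per-PRI SAR link bandwidths.
# Assumptions: 1 kHz pulse rate, 1 Byte per A/D sample at the PPC input,
# and 16 board-to-board links (4 senders x 4 receivers) in each aggregate.
import math

PULSES_PER_SEC = 1000   # 1 msec PRI
LINKS = 4 * 4           # every sending board feeds every receiving board


def bytes_per_sec(*counts: int) -> int:
    """Bytes per second when the product of counts is sent every PRI."""
    return math.prod(counts) * PULSES_PER_SEC


ppc_input = bytes_per_sec(4, 288_000, 1)          # 4 ch x ranges x 1 Byte
ppc_to_eccm = bytes_per_sec(4, 40, 1, 750, 8)     # ch x subbands x pulse x ranges x Bytes
eccm_to_range_pc = bytes_per_sec(1, 8192, 8)      # beam x ranges x Bytes

print(f"PPC input per board:   {ppc_input / 1e9:.2f} GBytes/sec")
print(f"PPC -> ECCM per link:  {ppc_to_eccm / 1e9:.2f} GBytes/sec")
print(f"PPC -> ECCM aggregate: {ppc_to_eccm * LINKS / 1e9:.1f} GBytes/sec")
print(f"ECCM -> Range PC link: {eccm_to_range_pc / 1e6:.1f} MBytes/sec")
print(f"ECCM -> Range PC agg.: {eccm_to_range_pc * LINKS / 1e9:.2f} GBytes/sec")
```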
SAR Processing Flow/Node-Level Partitioning/Utilization
[Figure: Compute Board #N connected to Global Memory. The board holds Compute Clusters #1-#4; each cluster has an 8 MByte Local Data Memory and four nodes, CNA #1-#4.]
Possible Partitioning/Utilization for Range Pulse Compression, Frequency Conversion in Range, 2D FFT, Auto Focus, and Magnituding:
Ranges or Cross-Ranges M+1 to M+32 => Compute Cluster #1
  Ranges or Cross-Ranges M+1 to M+8 => CNA #1
  Ranges or Cross-Ranges M+9 to M+16 => CNA #2
  Ranges or Cross-Ranges M+17 to M+24 => CNA #3
  Ranges or Cross-Ranges M+25 to M+32 => CNA #4
Ranges or Cross-Ranges M+33 to M+64 => Compute Cluster #2
  Ranges or Cross-Ranges M+33 to M+40 => CNA #1
  Ranges or Cross-Ranges M+41 to M+48 => CNA #2
  Ranges or Cross-Ranges M+49 to M+56 => CNA #3
  Ranges or Cross-Ranges M+57 to M+64 => CNA #4
Ranges or Cross-Ranges M+65 to M+96 => Compute Cluster #3
  Ranges or Cross-Ranges M+65 to M+72 => CNA #1
  Ranges or Cross-Ranges M+73 to M+80 => CNA #2
  Ranges or Cross-Ranges M+81 to M+88 => CNA #3
  Ranges or Cross-Ranges M+89 to M+96 => CNA #4
Ranges or Cross-Ranges M+97 to M+128 => Compute Cluster #4
  Ranges or Cross-Ranges M+97 to M+104 => CNA #1
  Ranges or Cross-Ranges M+105 to M+112 => CNA #2
  Ranges or Cross-Ranges M+113 to M+120 => CNA #3
  Ranges or Cross-Ranges M+121 to M+128 => CNA #4
for 1 < M < 128
Similarly across all four (4) Compute boards with the associated change in indexing. (Polar Reformatting is similar except data is in Rng-XRng patches.)
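The assignment listed above is a regular block partition: 8 consecutive ranges (or cross-ranges) per CNA and 4 CNAs per cluster. A minimal sketch of that index mapping follows; the function name `assign` and the use of a 1-based offset k (the k in M+k) are illustrative choices, not part of the original.

```python
# Sketch only: maps the offset k in "M+k" (1..128) to a 1-based
# (compute cluster, CNA) pair using a plain block partition.

RANGES_PER_CNA = 8
CNAS_PER_CLUSTER = 4
CLUSTERS_PER_BOARD = 4
RANGES_PER_CLUSTER = RANGES_PER_CNA * CNAS_PER_CLUSTER    # 32
RANGES_PER_BOARD = RANGES_PER_CLUSTER * CLUSTERS_PER_BOARD  # 128


def assign(offset: int) -> tuple[int, int]:
    """Return (cluster, cna), both 1-based, for range offset M+offset."""
    if not 1 <= offset <= RANGES_PER_BOARD:
        raise ValueError("offset must be in 1..128")
    index = offset - 1
    cluster = index // RANGES_PER_CLUSTER + 1
    cna = (index % RANGES_PER_CLUSTER) // RANGES_PER_CNA + 1
    return cluster, cna


for k in (1, 8, 9, 32, 33, 97, 128):
    print(f"M+{k} -> cluster {assign(k)[0]}, CNA {assign(k)[1]}")
```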
Input to Board from ECCM nodes = 65.6 MBytes/sec per ECCM board x 4 ECCM boards = 262 MBytes/sec => perform range compression
Range Compression Output to Global Memory = 262 MBytes/sec
Post-Range Compression Processing out of global memory:
- Input to Board: 33.6 MBytes per fetch x 128 fetches per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board
- Output from Board: 33.6 MBytes per store x 128 stores per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board
Aggregate Bandwidth per board = 131.2 MBytes/sec x 4 functions = 524.8 MBytes/sec
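The global-memory traffic per board can be rebuilt from the fetch size, the number of fetches, and the 65.5 sec period. The sketch below assumes each fetch is four clusters' worth of 32,768 x 32 x 8 Byte blocks (about 33.6 MBytes), which is my reading of where that figure comes from; the variable names are hypothetical.

```python
# Sketch only: rebuilds the per-board global memory bandwidth.
# Assumption: one fetch = 4 clusters x (32,768 x 32 x 8 Bytes).

CLUSTERS_PER_BOARD = 4
FETCH_BYTES = CLUSTERS_PER_BOARD * 32_768 * 32 * 8   # about 33.6 MBytes
FETCHES_PER_FUNCTION = 128
PERIOD_S = 65.5            # period quoted on the slide
FUNCTIONS = 4              # functions counted in the slide's aggregate

one_way = FETCH_BYTES * FETCHES_PER_FUNCTION / PERIOD_S   # fetch or store
per_function = 2 * one_way                                # fetch plus store
per_board = per_function * FUNCTIONS

print(f"one way:      {one_way / 1e6:.1f} MBytes/sec")
print(f"per function: {per_function / 1e6:.1f} MBytes/sec")
print(f"per board:    {per_board / 1e6:.1f} MBytes/sec")
```

The result lands within rounding of the slide's per-board figure; the small difference comes from using unrounded byte counts.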
Example GMTI Partitioning/Mapping/Utilization
GMTI Processing - Assumptions
• Assumed GMTI Parameters
- 16 channels (12 Subarrays and 4 Auxiliary Channels)
- 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
- 256 pulses per CPI
- 1 kHz PRF (1 msec PRI)
- 500 µsec receive window (200,000 - 333,000 range cells per pulse) into the Polyphase Channelizer
- 4/3 oversampling out of the Polyphase Channelizer
- 128 Subbands formed; only 40 processed
- 6 beams formed
- 10 bits (2 Bytes) per A/D sample
- 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
- 32 bits (4 Bytes) for internal data storage of real data
• Assumed Processing Resources
- 32 processing nodes per board
- 64 GFLOPS peak throughput per board
- 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
- 32 MBytes local data memory per board
-- 8 MBytes local data memory per processing cluster per board
-- 2 MBytes local data memory per processing node
- 256 KBytes local memory per processor
- 32 GBytes Global Memory (worst case SAR requirement)
GMTI Processing Flow/Board-Level Partitioning/Utilization #1
[Figure: board pipeline sharing a Global Memory. Polyphase Channelizer (PPC) Boards #1-#4; ECCM, AWC, & PP Combination Boards #1-#3; Pulse Comp. Boards #1-#3; Doppler, STAP, & CFAR Board #1; Output.]
• Stage throughput
- ECCM, AWC, & PP Combination: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS
- Pulse Compression: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 101 GOPS
- Doppler, STAP, & CFAR: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Board functions
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing
- Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections
• Data rates
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec
- 4.61 GBytes/sec input to each PPC board
- 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes output from each PPC board and input to global memory every PRI (1 msec) = 3.85 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 16 ch x 256 pulses x 1000 ranges x 40 subbands x 8 Bytes input to each ECCM board from global memory every CPI = 5.1 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input to each Pulse Compression board from global memory every CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output from each Pulse Compression board to global memory every CPI = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI = 3.46 GBytes/sec
- Doppler Output @ 6.92 GBytes/sec
- STAP Input @ 6.92 GBytes/sec
- STAP Output @ 1.73 GBytes/sec
- CFAR Input @ 1.73 GBytes/sec
• Global Memory
- Max. Memory = 32 GBytes; Used Memory = 21 GBytes
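Most of the GMTI data rates on this slide have the form beams x pulses x ranges x 8 Bytes per 0.256 sec CPI. The sketch below recomputes several of them; the helper name is hypothetical, and the beam counts for the Doppler and STAP outputs are taken from the board description on the same slide.

```python
# Sketch only: recomputes per-CPI GMTI bandwidths for a 0.256 sec CPI.

CPI_S = 0.256            # 256 pulses at a 1 msec PRI
PULSES_PER_CPI = 256
BYTES_PER_COMPLEX = 8


def cpi_bandwidth(beams: int, ranges: int) -> float:
    """Bytes per second for beams x 256 pulses x ranges complex cells per CPI."""
    return beams * PULSES_PER_CPI * ranges * BYTES_PER_COMPLEX / CPI_S


rates = {
    "ECCM board output": cpi_bandwidth(6, 30_000),
    "Pulse Comp. board input": cpi_bandwidth(2, 90_000),
    "Pulse Comp. board output": cpi_bandwidth(2, 72_000),
    "Doppler input": cpi_bandwidth(6, 72_000),
    "Doppler output / STAP input": cpi_bandwidth(12, 72_000),
    "STAP output / CFAR input": cpi_bandwidth(3, 72_000),
}

for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f} GBytes/sec")
```

The unrounded values come out a few MBytes/sec below some of the slide's figures (for example 6.91 versus 6.92 GBytes/sec), which looks like rounding carried forward from the 3.46 GBytes/sec input.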
GMTI Processing Flow/Board-Level Partitioning/Utilization #2
[Figure: board pipeline with PPC boards feeding ECCM boards directly. Polyphase Channelizer (PPC) Boards #1-#4; ECCM, AWC, & PP Combination Boards #1-#3; Pulse Comp. Boards #1-#3; Doppler, STAP, & CFAR Board #1; Global Memory; Output.]
• Stage throughput
- ECCM, AWC, & PP Combination: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS
- Pulse Compression: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 101 GOPS
- Doppler, STAP, & CFAR: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Board functions
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM boards
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing
- Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections
• Data rates
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
- 4.61 GBytes/sec input to each PPC board
- 4 ch x 40 subbands x 1 pulse x 1000 ranges x 8 Bytes output from each PPC board to each ECCM board every PRI (1 msec) = 1.28 GBytes/sec per PPC board, i.e., 5.12 GBytes/sec into each ECCM board from the four PPC boards; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input from global memory to each Pulse Compression board every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output to global memory from each Pulse Comp. board every CPI (0.256 sec) = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI (0.256 sec) = 3.46 GBytes/sec
- Doppler Output @ 6.92 GBytes/sec
- STAP Input @ 6.92 GBytes/sec
- STAP Output @ 1.73 GBytes/sec
- CFAR Input @ 1.73 GBytes/sec
• Global Memory
- Max. Memory = 32 GBytes; Used Memory = 13.1 GBytes
GMTI Processing Flow/Board-Level Partitioning/Utilization #3
[Figure: board pipeline with combined ECCM and Pulse Compression boards. Polyphase Channelizer (PPC) Boards #1-#4; ECCM, AWC, & PPC Comb. & Pulse Comp. Boards #1-#5; Doppler, STAP, & CFAR Board #1; Global Memory; Output.]
• Stage throughput
- ECCM, AWC, & PPC Comb. & Pulse Comp.: Max. Sustained T-put per Stage = 240 GOPS; Used T-put per Stage = 202 GOPS
- Doppler, STAP, & CFAR: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Board functions
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM nodes
- Each ECCM/AWC/PP Combo/Pulse Compression Board processes 16 radar channels with 90,000 range cells per pulse and 40 subbands per channel; forms 1 or 2 beams (only 1 board forms 2 beams); combines 40 subbands; each ECCM node outputs 72,000 ranges to global memory for Doppler processing
- Each ECCM/Pulse Compression board outputs 6 beams with 72,000 ranges per pulse; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections
• Data rates
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
- 4.61 GBytes/sec input to each PPC board
- 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes input to ECCM boards from each PPC board every PRI (1 msec) = 3.07 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 72,000 ranges x 1 pulse x 8 Bytes output to global memory from the ECCM/Pulse Compression Boards every PRI (1 msec) = 3.45 GBytes/sec Aggregate BW
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to the Doppler/STAP/CFAR board from global memory every CPI = 3.46 GBytes/sec
- Doppler Output (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
- STAP Input (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
- STAP Output (3 beams x 72,000 ranges x 256 Dopplers) @ 1.73 GBytes/sec
- CFAR Input @ 1.73 GBytes/sec
• Global Memory
- Max. Memory = 128 GBytes; Used Memory = 8.66 GBytes
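One way to compare the GMTI partitioning options is by how much of each stage's sustained throughput is actually used. The sketch below computes that ratio from the board counts and the 48 GOPS sustained figure per board; treating "Max. Sustained T-put per Stage" as boards x 48 GOPS is an assumption that matches the quoted 144, 240, and 48 GOPS values, and the function name is hypothetical.

```python
# Sketch only: stage utilization = used throughput / sustained capacity.
# Assumption: stage capacity = number of boards x 48 GOPS sustained.

SUSTAINED_GOPS_PER_BOARD = 48


def stage_utilization(boards: int, used_gops: float) -> float:
    """Fraction of a stage's sustained throughput that is used."""
    return used_gops / (boards * SUSTAINED_GOPS_PER_BOARD)


stages = {
    "Options #1/#2, ECCM/AWC/PP (3 boards)": (3, 100.3),
    "Options #1/#2, Pulse Comp. (3 boards)": (3, 101.0),
    "Option #3, ECCM + Pulse Comp. (5 boards)": (5, 202.0),
    "Doppler/STAP/CFAR (1 board)": (1, 34.0),
}

for name, (boards, used) in stages.items():
    print(f"{name}: {stage_utilization(boards, used):.0%}")
```

Under this reading, merging ECCM and Pulse Compression onto five boards raises utilization of that stage compared with running them as two three-board stages.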
Backup Charts
Hypothetical Real-Time Adaptive Space-Based Radar Design
[Figure: system block diagram. Inputs: Search Requirements, Field-Requested Events, Track Revisit Requirements. Blocks: Radar Event Scheduler/Time Line Generator; Real-Time Waveform Designer; Beam Steering Controller; Onboard Processing Configurator; S/C Attitude Determination & Control; S/C Guidance, Navigation & Control; Exciter/Transmitter; Electronically Steerable Antenna Subsystem; Receiver; Programmable Onboard Processing Subsystem; Communication Subsystem (Uplink/Downlink). Signals: Mode/Context Control, Mode Control, Health & Status, Processed Data/Target Reports.]
Space-Based Radar
Advantages
- the ultimate “high ground”: radar “horizon” greater than for airborne or ground radar
- 24-hour all weather capability (IR sensors can’t see through clouds, optical sensors blind in dark)
- less vulnerable/more survivable than airborne assets
- once launched, lower logistics costs than airborne assets (fuel, ground support, fighter protection, etc.)
- continuous world-wide coverage with full constellation of satellites
- High Range Resolution (HRR) with frequency jumped burst waveforms
- SAR (Synthetic Aperture Radar)
- IFSAR (Interferometric SAR)
- DTED (Digital Terrain Elevation Data)
- DAR (Distributed Aperture Radar)
- multi-mission
-- GMTI (Joint STARS)
-- SAR (Joint STARS)
-- AMTI (AWACS [E3-A] & E-2C)
- GPIR (Ground Penetrating Imaging Radar)
- foliage penetration capability
- reduced downlink requirements with onboard processing in many cases
Disadvantages/Issues
- limited power generation and power dissipation capability in space
- limited aperture size (antenna dimensions)
- high altitude (R^4 losses)
- ionospheric effects
- steep look-down angle/Nadir Hole
- clutter Doppler/clutter Doppler spread
-- function of satellite-target geometry (earth background) and platform velocity
- optimal waveform design for target detection performance and clutter cancellation
-- frequency of operation (X, L, UHF/VHF)
-- polarization (Tx and Rx)
-- pulse width, PRF, CPI
-- range resolution/bandwidth
-- Doppler resolution
-- range ambiguities
-- Doppler ambiguities
- constellation
-- altitude, inclination, number of satellites, phasing
- environmental
-- radiation, micro-meteorites, etc.
- launch vehicle capability
- initial system cost
Considerations for Optimal Onboard Processing
Processor
- Processing Engine(s)
-- Heterogeneous vs. Homogeneous
-- GP vs. Accelerator
-- Custom ASIC vs. FPGA vs. PIM (Processor In Memory)
-- Clock Rate
-- Algorithm/Architecture Coupling Efficiency
-- Flexibility/Programmability vs. Performance
- Mode Switching
- Optimal Waveform Design
-- Performance
- Rad Hard vs. Rad Tolerant
-- Available Software Development Environment
- Number of Processing Engines
-- Number of Modules
-- Number of Processing Engines per Module
-- Parallelization Efficiencies
- Interconnect Capacity/Capability
- Overlap Processing and Communication
- Bandwidth
Memory Storage
- Amount (size)
- Location
- Access Bandwidth
- Utilization
-- Pipeline
-- Double Buffering
-- “Caching”/Data Blocking
Algorithm Implementation
- Analog vs. Digital vs. Optical
- Performance/Efficiency
- Optimum utilization of available arithmetic units
- Arithmetic Precision
-- Fixed Point vs. Floating Point
-- # of bits
-- Single vs. Double
-- Programmable
- Time Domain vs. Frequency Domain Pulse Compression
-- Optimal Size of FFT for Frequency Domain Conv.
- Radix Size(s) for FFTs
- Power of 2 FFTs vs. odd-point FFTs
- Method for Adaptive Weight Computation
- Data Flow vs. Other Implementation Paradigms
-- Medium/Coarse Grained vs. Fine Grained
- Partitioning/Parallelism
-- Data Partitioning
-- Algorithm Partitioning
-- Round Robin vs. Distributed Parallel
- Phantom Functions, e.g., corner turns
- Order of Execution of Functions
- Grouping of Functions
- Combination of Functions
- Minimize Memory Required
- Minimize Communication Requirements
- Fault Tolerance
- Real Time Execution
GMTI/SAR Mode Switching for SBR
[Figure: radar processing time line along the line of flight, alternating groups of GMTI CPIs (GMTI search & track beams) with SAR CPIs (SAR beams, images). Marked: location of last detection of a suspected Stop & Go target; mode processing context switch points.]
- Context switch from GMTI to Spotlight SAR to image the location of the last detection of a Stop-and-Go target which may have stopped
- Context switch from SAR back to normal search and track patterns
[Figure blocks: GMTI Application; SAR Application; SAGE Capture; Onboard Processor; Tracker (Tracks); Radar Event Scheduler; Radar Synchronizer/Controller; Beam Steering Controller; Radar Data; Detections (GMTI, SAR).]