Development of an FPGA-Based Two Transform Pulse Compressor



Perform a Two-Transform Pulse Compression using a received (reflected) signal and a reference signal.

The input signals are first phase corrected using a complex phase factor multiply.

Range compression is achieved by a cross-correlation of the received signal with the reference signal. The cross-correlation is implemented in the frequency domain as multiplication of the received signal by the complex conjugate of the reference.

For this, both phase-corrected input signals are transformed to the frequency domain using Fast Fourier Transforms (FFTs).

Provisions for a frequency domain correction are included as a complex multiply after the cross-correlation.

Following cross-correlation and error correction, an Inverse Fast Fourier Transform (IFFT) is used to obtain the time domain compressed signal.

An optional swath selection is used to select a desired portion of the output compressed signal.
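
The processing chain above can be sketched in a few lines. This is a floating-point illustration under stated assumptions, not the FPGA implementation: the function and parameter names (`pulse_compress`, `phase_rx`, `phase_ref`, `error_corr`, `swath`) are hypothetical, the per-sample phase arrays and the frequency-domain correction vector are placeholders for whatever corrections the system supplies, and a textbook recursive radix-2 FFT stands in for the fixed-point FFT cores.

```python
import cmath
import math


def fft(x, inverse=False):
    """Textbook recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def pulse_compress(rx, ref, phase_rx, phase_ref, error_corr, swath):
    """Hypothetical two-transform pulse compressor (floating-point sketch)."""
    # Phase correction: multiply each input sample by exp(j * phase[k]).
    rx_c = [s * cmath.exp(1j * p) for s, p in zip(rx, phase_rx)]
    ref_c = [s * cmath.exp(1j * p) for s, p in zip(ref, phase_ref)]
    # Transform both phase-corrected signals to the frequency domain.
    rx_f, ref_f = fft(rx_c), fft(ref_c)
    # Cross-correlation (received times conjugate of reference), followed by
    # the frequency-domain error correction: one complex multiply per bin.
    prod = [a * b.conjugate() * c for a, b, c in zip(rx_f, ref_f, error_corr)]
    # IFFT back to the time domain, then keep only the selected swath.
    compressed = ifft(prod)
    start, stop = swath
    return compressed[start:stop]


# Usage: a chirp-like reference and a copy circularly delayed by 10 samples.
N = 64
ref = [cmath.exp(1j * math.pi * k * k / N) for k in range(N)]
rx = [ref[(k - 10) % N] for k in range(N)]
out = pulse_compress(rx, ref, [0.0] * N, [0.0] * N, [1.0] * N, (0, N))
peak = max(range(N), key=lambda i: abs(out[i]))
print(peak)  # 10
```

Because the frequency-domain product implements a circular cross-correlation, the compressed peak lands at the delay between the received copy and the reference. Any zero-padding needed to avoid wrap-around in a real system is not modelled here.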

Create a high-throughput Two-Transform Pulse Compressor for use in wideband, real-time radar signal processor applications using Commercial Off-The-Shelf (COTS) Field Programmable Gate Array (FPGA) processor boards.

GOALS

• 2 Virtex™ II FPGA Processing Elements: XC2V6000 or XC2V8000
• 0 to 48 MBytes of Synchronous ZBT SRAM in 6 Memory Banks
• 0 to 256 MBytes of Synchronous DRAM in 1 Memory Bank
• PCI Bus: Rev 2.2 Compliant
• 5V Board: 32/64 Bit, 33 MHz, 5V or 3.3V Slot
• 3.3V Board: 32/64 Bit, 33/66 MHz, 3.3V Slot
• Automatic 32/64 Bit PCI Bus Recognition
• Host Software: NT 4.0 and 2000, Linux, Solaris
• API and Device Drivers
• VHDL Model of the System for Easy Development
• Accepts COTS High Speed WILDSTAR™ I/O Cards: WILDSTAR™ Data Port (WSDP™), FPDP, Myrinet™, 65 MHz A/D, and 1 GHz A/D
• 12 to 16 Million System Gates
• Virtex™ E FPGA is larger, faster, and uses less power than Virtex™ FPGA
• 150 MHz Board, FPGA and Memory Speed
• 4.8 GBytes/Sec Memory Bandwidth
• I/O Bandwidth: 66 MHz PCI, up to a theoretical maximum of 512 MBytes/Sec with 64 Bits; WILDSTAR™ PE to I/O Board, 3 GBytes/Sec; LAD Bus, 256 MBytes/Sec at 66 MHz/32 Bits
• Supports Internet Reconfiguration
• Program from Flash on Power Up
• Commercial Off the Shelf Product (COTS)


CONCEPTS

PERFORMANCE

SYSTEM COMPONENTS

DEMONSTRATION HARDWARE

WILDSTAR II™ PCI BOARD

DESIGN ANALYSIS

Integrated Sensors, Inc. (315)798-1377 www.sensors.com

Two-Transform Pulse Compressor Algorithm

[Figure: Two-Transform Pulse Compressor Algorithm block diagram. The Ch1 and Ref inputs each pass through a Phase Multiply Correction (MC) by a complex phase factor and a 1D FFT; the Ch1 spectrum is multiplied by the conjugate of the Ref spectrum (Conj *), followed by Error Compensation (MC), a 1D IFFT, and Rng Select (Swath Selection).]

[Figure: WILDSTAR™ II PCI board block diagram. Two Virtex™ II XC2V6000/8000 processing elements (PE 0 and PE 1), each with a 64 MB DDR SDRAM bank and an I/O card slot (I/O #0, I/O #1); twelve 4 MB DDR2 SRAM banks; a 64-bit, 66/133 MHz PCI bus interface; programmable oscillators, flash memory, and a master clock generator (PCLK, MCLK, ICLK) with differential and single-ended clock signals.]

Copyright 2002 Annapolis Micro Systems, Inc.

[Figure: Demonstration hardware. A router/interface feeds two WILDSTAR boards (Board #1 and Board #2), each with processing elements PE0 and PE1 and WSDP/FPGA I/O cards WSDP0 and WSDP1. Labelled data rates: 4 pulses of 64K samples each at 500 Msps (Ch1a/Refa, Ch1b/Refb), split into streams of 2 pulses of 64K samples each at 250 Msps, then into single 64K-sample pulses at 125 Msps. Each pulse passes through PRE-PROC and FFT stages, then Xmpy, EC, IFFT and Select. Collected results: 4 processed pulses.]

2 Wildstar II Boards Process 4 Simultaneous Pulses

Fixed Point Complex FFT Core

• Approximately 5:1 size reduction over a floating point core
• Multiply/accumulators are not the driving factor in size
• Can fit ~4 x 8-bit FFT cores in a single V6000 FPGA
• 4:1 hardware improvement over floating point
• 64K vector length; 8-bit input; 18-bit maximum bit width
• 4 FFT points/clock + latency
• 64K complex FFT @ 150 MHz: 109 µs
• 32K complex FFT @ 150 MHz: 55 µs
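
The quoted frame times follow from the 4 points/clock rate: an N-point frame streams through in N/4 clock cycles plus pipeline latency. The latency is excluded here because the poster does not give it. A small helper (the name `fft_frame_time_us` is hypothetical) reproduces the figures. It also shows the sustained rate, 4 x 150 MHz = 600 Msamples/s, which exceeds the 500 MHz sample-rate goal.

```python
def fft_frame_time_us(n_points, points_per_clock, clock_hz):
    """Time to stream one FFT frame through the core, excluding pipeline latency."""
    cycles = n_points / points_per_clock
    return cycles / clock_hz * 1e6


CLOCK_HZ = 150e6
POINTS_PER_CLOCK = 4

for n in (64 * 1024, 32 * 1024):
    t = fft_frame_time_us(n, POINTS_PER_CLOCK, CLOCK_HZ)
    print(f"{n // 1024}K complex FFT @ 150 MHz: {t:.1f} us")

# Sustained throughput in samples per second.
print(f"{POINTS_PER_CLOCK * CLOCK_HZ / 1e6:.0f} Msamples/s")
```

This prints 109.2 us and 54.6 us, matching the 109 µs and (rounded) 55 µs figures above.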

Floating Point vs Fixed Point Sizing

• Fairly consistent 5:1 ratio observed with FFT, complex multiply and add, divide, and square root cores
• Fixed point cores offer a ~5:1 size advantage over floating point cores

[Figure: Fixed Point FFT SQNRs vs Bit Width & FFT Length (Mode 1). x-axis: FFT Input/Max Bit Width (4 to 22); y-axis: SQNR, dB (-10 to 70); curves for 16K, 32K and 64K FFT lengths.]

Signal to Quantization Noise Ratio (SQNR)

• Analyzed using MATLAB FFT models with specified bit widths and truncations
• Uses a uniformly distributed noise input to the FFT
• Defined as

$$\mathrm{SQNR} = 10\log_{10}\left(\frac{\sum_k |X_{\mathrm{float}}(k)|^2}{\sum_k |X_{\mathrm{float}}(k) - X_{\mathrm{fixed}}(k)|^2}\right)\ \text{dB}$$

• 3 dB difference for each doubling of FFT length (1/2 bit)
• Bit growth through the FFT added
• Bit growth from 8/10-bit inputs appears to give reasonable SQNRs
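
The SQNR metric above can be explored with a small simulation. This is a sketch under stated assumptions, not the poster's MATLAB model. It rounds the outputs of every radix-2 stage to a fixed LSB of 2^-frac_bits and allows unlimited integer growth (no 18-bit maximum width and no saturation). It also keeps the twiddle factors in floating point and uses an illustrative 1024-point length. `quantize`, `sqnr_db` and the other names are hypothetical.

```python
import cmath
import math
import random


def quantize(z, frac_bits):
    """Round real and imaginary parts to a grid of 2**-frac_bits (no saturation)."""
    scale = 2 ** frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)


def fft(x, frac_bits=None):
    """Recursive radix-2 DIT FFT. If frac_bits is given, each stage's outputs
    are rounded to that LSB, modelling fixed-point arithmetic with full bit growth."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], frac_bits)
    odd = fft(x[1::2], frac_bits)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    if frac_bits is not None:
        out = [quantize(v, frac_bits) for v in out]
    return out


def sqnr_db(x_float, x_fixed):
    """10*log10( sum|X_float|^2 / sum|X_float - X_fixed|^2 )."""
    signal = sum(abs(v) ** 2 for v in x_float)
    noise = sum(abs(a - b) ** 2 for a, b in zip(x_float, x_fixed))
    return 10 * math.log10(signal / noise)


random.seed(0)
N = 1024
# Uniformly distributed complex noise input, quantized to 7 fractional bits
# (roughly an 8-bit signed input); both FFTs see the same quantized input.
x = [quantize(complex(random.uniform(-1, 1), random.uniform(-1, 1)), 7)
     for _ in range(N)]
x_float = fft(x)
results = {bits: sqnr_db(x_float, fft(x, frac_bits=bits)) for bits in (4, 8, 12)}
for bits, value in results.items():
    print(f"{bits} fractional bits: SQNR {value:.1f} dB")
```

In this model each extra fractional bit buys roughly 6 dB of SQNR, so the 8- and 12-bit runs differ by about 24 dB. Reproducing the poster's truncation to a maximum word width would need an extra scaling/saturation step that is not shown here.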

[Figure: Xcorr XmitSig IPR Comparison (64K, Mode2). x-axis: Cell (approximately 3.268 to 3.28 x 10^4); y-axis: Xcorr Signal^2 Power, 10log10 dB (-50 to 0); curves for 20, 18, 16, 14, 12 and 10 bits and floating point.]

ISLR loss alone can be a deceiving metric. Factors such as the impulse response (IPR) shape must also be considered, since the IPR can show severe truncation effects even with an apparently good ISLR.

PULSE COMPRESSOR ARCHITECTURE

FPGA Processors

• Offer high throughput and much higher density than DSP processors
• Reconfigurable processing
• COTS solution
• Low cost and much faster alternative to ASICs

Implemented in Annapolis Wildstar COTS Boards

• Powerful core design tools and libraries available for fast development and prototyping
• Includes high speed WSDP data interconnects

FPGAs Offer a Growth Path to Improved Processors

• 50 million gate parts
• Platform FPGAs including PPC processors, I/O and RAM are currently available

[Figure: FPGA growth path. Xilinx V6000 FPGA, 6M gates (V6000 parts: ECP demo); Xilinx V8000 FPGA, 8M gates (V8000 parts currently available); Xilinx 2VP50 “Platform” FPGA with 4 PowerPC processors, FPGA logic, RAM and high speed I/O (currently available); Xilinx FPGA with 50M gates (2007 technology). Annotations: part integration (4X), improved PPC speed (2X).]

WILDSTAR II™ ARCHITECTURE

[Figure: System architecture. A/D converters digitize the REF, Ch1, Ch2 and Ch3 channels, which pass through custom Router/Time Align/Interface boards with RAM buffers into the Range Compression Processor (~6 boards).]

SYSTEM ARCHITECTURE

[Figure: XmitSig ISLR Loss vs FFT Bit Width (64K, Mode2). x-axis: Mpy Bit Width (10 to 30); y-axis: Xcorr ISLR Loss, dB (-2.5 to 0).]

• 3 Wildstar II Virtex V6000 board nodes
• IBM PC servers to host the boards (6)
• 6 million gate parts

Status: Operational. 3 Wildstar II assemblies are complete and operating: 1 data driver and collector board and 2 processing boards.

FFT maximum bit widths of 18 bits appear to give less than 0.1 dB of ISLR loss. This width corresponds to space-efficient fixed point FPGA FFT core implementations using Xilinx parts with embedded 18x18 multipliers.
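
A quick worst-case calculation shows why truncation is needed at an 18-bit maximum width. Assuming full growth, each radix-2 stage can at most double a value's magnitude, so an N-point FFT can grow by log2(N) bits. With 8-bit inputs, a 64K FFT would therefore need about 24 bits. The helper name `full_growth_width` is hypothetical. Per-component (real or imaginary) growth through the twiddle multiply can exceed this magnitude bound.

```python
def full_growth_width(input_bits, n_points):
    """Word width needed to keep every radix-2 stage's 1-bit magnitude growth."""
    stages = n_points.bit_length() - 1  # log2(N) for a power-of-two N
    return input_bits + stages


MAX_WIDTH = 18  # maximum FFT bit width quoted on the poster

for n in (16 * 1024, 32 * 1024, 64 * 1024):
    full = full_growth_width(8, n)
    print(f"{n // 1024}K: {full} bits at full growth, "
          f"{full - MAX_WIDTH} bits dropped in the worst case to stay within {MAX_WIDTH}")
```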

Router / Time Alignment / Interface (Custom) Board Required

• The signal processor requires WSDP data input interfaces due to the high data rates
• Time align the Ref and Ch1, Ch2, Ch3... channels
• Buffer and rate reduce each channel into lower rate channels to match WSDP capabilities (800 MB/sec)
• Provide WSDP compatible output interfaces
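
The need for rate reduction can be checked with simple arithmetic. This check assumes that "complex, up to 8 bits per sample" means 8 bits each for I and Q, giving 2 bytes per complex sample. The poster does not state this interpretation. Under it, a 500 Msps channel carries 1000 MB/sec, which is above the 800 MB/sec WSDP capability, so the channel must be split into at least two lower-rate channels. `channels_needed` is a hypothetical helper.

```python
import math

WSDP_MB_PER_S = 800  # WSDP capability quoted on the poster


def channels_needed(sample_rate_msps, bytes_per_sample):
    """Minimum number of WSDP-rate channels needed to carry one input channel."""
    rate_mb_per_s = sample_rate_msps * bytes_per_sample
    return math.ceil(rate_mb_per_s / WSDP_MB_PER_S)


# Assumption: 8-bit I plus 8-bit Q, i.e. 2 bytes per complex sample.
n = channels_needed(500, 2)
print(n, 500 / n)  # number of channels, per-channel rate in Msps
```

With two channels, each carries 250 Msps (500 MB/sec), which is within the WSDP limit. The further split to 125 Msps per pulse shown in the demonstration hardware labels is not required by this bandwidth check alone.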

FPGA Node Configuration

• A processing node of 2 FPGA boards performs complete range compression on 2 range pulses, using a combination of Xilinx V2000E FPGAs on the WSDP I/O cards and V6000 FPGAs on the base cards
• Pass-through concept: each iteration node strips off the first 64K pulse samples to process and passes the remaining pulse data on to succeeding iteration nodes
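
The pass-through concept amounts to each node peeling its own pulse off the front of the incoming stream. Below is a minimal sketch in which the names `iteration_node` and `run_chain` are hypothetical and Python lists stand in for the streaming WSDP data path.

```python
from typing import List, Sequence, Tuple


def iteration_node(stream: Sequence[complex],
                   pulse_len: int) -> Tuple[Sequence[complex], Sequence[complex]]:
    """Strip off the first pulse_len samples to process; pass the rest downstream."""
    return stream[:pulse_len], stream[pulse_len:]


def run_chain(stream: Sequence[complex], n_nodes: int,
              pulse_len: int) -> Tuple[List[Sequence[complex]], Sequence[complex]]:
    """Feed the stream through n_nodes iteration nodes connected in series."""
    kept = []
    for _ in range(n_nodes):
        mine, stream = iteration_node(stream, pulse_len)
        kept.append(mine)  # this node would range-compress `mine`
    return kept, stream


# Usage: 4 pulses of 64K samples each, handled by a chain of 4 nodes.
PULSE_LEN = 64 * 1024
samples = [complex(i, 0) for i in range(4 * PULSE_LEN)]
pulses, leftover = run_chain(samples, 4, PULSE_LEN)
print(len(pulses), [p[0].real for p in pulses], len(leftover))
```

A practical benefit of this arrangement is that every node runs the same design and needs no knowledge of its position in the chain.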

FPGA IMPLEMENTATION via COREFIRE™

SYSTEM DESIGN GOALS

• Perform pulse compressions on input data in real time
• Up to 64K (16K, 32K, 64K) sample input pulses
• Up to 500 MHz data sample rate
• Data samples are complex, up to 8 bits per sample

Corefire™: Annapolis Micro Systems design tool

• Allows fast development of FPGA core designs using libraries of functional blocks

Interface board design illustrated: receives data pulses from the upstream splitter and performs FFT pre-processing.

FFT designs analyzed using metrics

• Bit widths and growth specified in models
• Performance evaluated using a synthetic bandlimited pulse
• Integrated sidelobe ratio (ISLR): energy in the peak compared to energy in the sidelobes; degradation in pulse compression manifests itself as higher sidelobe levels
• Impulse response shape and quantization effects considered for compressed pulses
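
The sidelobe ratio metric can be computed directly from a compressed pulse. This sketch assumes that the mainlobe consists of the samples within a chosen half-width of the peak, since the poster does not say how its mainlobe was bounded. It also assumes the result is reported in dB as sidelobe energy over mainlobe energy. `islr_db` and `mainlobe_halfwidth` are hypothetical names.

```python
import math


def islr_db(compressed, mainlobe_halfwidth):
    """Integrated sidelobe ratio in dB: 10*log10(sidelobe energy / mainlobe energy).

    The mainlobe is assumed to span +/- mainlobe_halfwidth samples around the peak.
    """
    power = [abs(v) ** 2 for v in compressed]
    peak = power.index(max(power))
    lo = max(0, peak - mainlobe_halfwidth)
    hi = min(len(power), peak + mainlobe_halfwidth + 1)
    main = sum(power[lo:hi])
    side = sum(power) - main
    if side <= 0:
        return float("-inf")  # no measurable sidelobe energy
    return 10 * math.log10(side / main)


# Usage: raising the sidelobes (a degraded compression) raises the ISLR.
clean = [0.01, 0.05, 1.0, 0.05, 0.01]
degraded = [0.05, 0.2, 1.0, 0.2, 0.05]
print(islr_db(clean, 0), islr_db(degraded, 0))
```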

[Figure: Rngcomp MATLAB vs Wildstar (scaling: 1051.1). x-axis approximately 3.265 to 3.29 x 10^4; y-axis: dB (30 to 65); MATLAB: blue, Wildstar: red.]

Multiple pulse design currently running: 4 simultaneous pulses processed on 2 Wildstar II boards

FFT Throughput Performance

• One point per clock
• Clock currently running at 81 MHz on speed grade -4 parts
• Anticipate speeds up to 133 MHz on speed grade -6 parts

A single Wildstar II board provides up to 16 million FPGA gates and 4.8 GBytes/sec I/O on WSDP ports

Four-node parallel processor implemented and operating; investigation continues into a faster operating clock and larger parts for increased bit widths and data precision

Three input channel architecture illustrated

FPGA growth path includes increased gate density and increased features for smaller designs with improved precision and capabilities