Post on 19-Jan-2016
description
Design and Performance of a PCI Interface with four 2 Gbit/s
Serial Optical Links
Stefan Haas, Markus Joos
CERN
Wieslaw Iwanski
Henryk Niewodnicznski Institute of Nuclear Physics
LECC, 13.-17. Sept. 2004, Boston
S. Haas, 14. Sept. '04LECC 2004 - 2 -
Outline
● Introduction● Interface Card Hardware● Firmware Description● Software● Performance Measurements● Summary
S. Haas, 14. Sept. '04LECC 2004 - 3 -
Introduction
● DAQ systems for current and future experiments depend on reliable high-speed data transmission
● S-LINK specification addresses this type of application:► Point-to-point data link, bandwidth 160 MB/s (32-bit @ 40 MHz)► Flow control (XON/XOFF)► Error detection (e.g. CRC), ► Self-test mode & return line signals► CMC mezzanine card format
● ATLAS Read-Out Link (ROL)► ROL implementation is based on S-LINK ► Connects front-end electronics interface modules (Read-Out
Drivers) to the Read-Out system (ROS)► ROS is based on commodity PCs and custom PCI interface
cards (ROBin)► ~1650 ROLs will be used in ATLAS
S. Haas, 14. Sept. '04LECC 2004 - 4 -
ROL Source Card
● High-speed Optical Link for ATLAS (HOLA)● Standard S-LINK mezzanine card● Industry standard pluggable (SFP) 850nm F/O transceiver● Serial link speed 2 Gb/s with 8B10B line encoding● Low-power: ~2W typical
SERDES
S-LINK Protocol
FPGA
CMC mezzanine Connector
160MB/s32bit @ 40MHz
Cage for SFP F/O Transceiver
S. Haas, 14. Sept. '04LECC 2004 - 5 -
Quad S-LINK PCI Interface (FILAR)
● FILAR Features:► Four 2 Gb/s HOLA link channels integrated on-board► 64-bit/66MHz PCI interface (3.3V slots only)► Move data between 4 link interfaces and the host PC memory► Based on S32PCI64 interface design: one slot for S-LINK
mezzanine card
● Applications: small readout systems for lab & test beam
● FPGA-based (in-system reconfigurable) ► PCI I/F implemented using a commercial PCI IP core
● Firmware versions:► Quad S-LINK receiver (S-LINK to PCI)► Quad S-LINK transmitter (PCI to S-LINK) ► Quad S-LINK data source (for performance measurements)
S. Haas, 14. Sept. '04LECC 2004 - 6 -
FILAR Hardware
HOLA Interface FPGASFP Fiber Optic Transceiver SERDES
PCI Interface FPGA 64-bit/66MHz PCI interface
3.3V only(!)
S. Haas, 14. Sept. '04LECC 2004 - 7 -
Receiver Firmware Operation
● Host processor:► 1) Fills a request FIFO on the
interface card with addresses of free memory buffer pages
► 5) Reads the results from the acknowledge FIFO and processes the data
● Interface card:► 2) Transfers data fragments
from S-LINK to host memory as bus master using PCI bursts of up to 1kB for maximum performance
► 3) Stores length, status and control words for received fragments in an acknowledge FIFO
► 4) Asserts an interrupt (optional)
● Protocol overhead of ~2 PCI single-cycles (SC) per data fragment and channel:
► Write address of buffer memory page► Read length and status of received fragment
S. Haas, 14. Sept. '04LECC 2004 - 8 -
Receiver Firmware Block Diagram
BACKENDCONTROL
LOGIC
BACKENDCONTROL
LOGIC
CONTROL&
STATUSREGISTERS
CONTROL&
STATUSREGISTERS
64-BIT
PCI
IP
CORE
64-BIT
PCI
IP
CORE
DMAENGINE
66 MHz64-bitPCI
66 MHz64-bitPCI
INPUTBUFFER
FIFO
PCIBURST
FIFO
REQUESTFIFO
S-LINK
ACK.FIFO
REQUESTFIFO
REQUESTFIFO
REQUESTFIFO
ACK.FIFOACK.FIFOACK.FIFO
S-LINKS-LINKS-LINK
INPUTBUFFER
FIFO
INPUTBUFFER
FIFO
INPUTBUFFER
FIFO
PCIBURST
FIFO
PCIBURST
FIFO
PCIBURST
FIFO
528MB/s
S. Haas, 14. Sept. '04LECC 2004 - 9 -
Firmware Optimization
● Single-cycles do not use the PCI bus efficiently● Performance optimized version receiver firmware was
developed (DMA protocol firmware): ► Interface card transfers request and acknowledge data using
DMA► CPU prepares a descriptor block with buffer addresses for one
or more channels in system memory► Firmware fetches the block using DMA and fills the on-board
request FIFOs► Firmware transfers a block with the length and status
information from the acknowledge FIFOs to the system memory using DMA when a threshold is reached
► Requires additional memory resources in the FPGA, only 3 receive channels can be implemented on the current hardware
● Reduces PCI bus overhead and CPU load
S. Haas, 14. Sept. '04LECC 2004 - 10 -
Software
● FILAR software package:► Linux device driver (loadable module)► Library provides easy to use programming API for applications ► Test and benchmarking programs
● Software written in C● Separate drivers for the different receiver firmware versions● Supports multiple channels & PCI cards● Interrupt driven: device driver is called when a predefined number
of fragments are available in any channel● Code optimised for maximising throughput
► Manage the card with minimal attention from the application layer► Reduce the number of context switches
● Fully integrated into the ATLAS DataFlow software● Requires cmem driver/library for allocation of contiguous memory ● Similar package available for the transmitter firmware
S. Haas, 14. Sept. '04LECC 2004 - 11 -
Measurement Setup
● PC with Supermicro server motherboard (ServerWorks GC-LE chipset)
● 4 independent 64-bit PCI bus segments
● Intel Xeon CPU (3 GHz)● S-LINK input channels driven
by HOLA data sources
I/O BRIDGEI/O BRIDGE
64-bit PCI66MHz
FILARFILARFILARFILAR
I/O BRIDGEI/O BRIDGE
FILARFILAR
NORTHBRIDGE
NORTHBRIDGE
CPU(XEON3GHz)
CPU(XEON3GHz)
Memory(DDR 266)
Memory(DDR 266)
HOLAS-LINK
I/O BRIDGEI/O BRIDGE
64-bit PCI66MHz
FILARFILARFILARFILAR
I/O BRIDGEI/O BRIDGE
FILARFILAR
NORTHBRIDGE
NORTHBRIDGE
CPU(XEON3GHz)
CPU(XEON3GHz)
Memory(DDR 266)
Memory(DDR 266)
HOLAS-LINK
● Chipset architecture is important to obtain the maximum performance
S. Haas, 14. Sept. '04LECC 2004 - 12 -
Performance: Single-Cycle Firmware
● FILAR receiver with SC firmware
● Sawtooth structure due to overhead for setting up a PCI burst (1kB)
● Performance for one channel is limited by link bandwidth
● Throughput with 3 channels is limited by PCI interface
● Maximum throughput is ~450MB/s
0.0
50.0
100.0
150.0
200.0
250.0
300.0
350.0
400.0
450.0
500.0
0 500 1000 1500 2000 2500 3000 3500 4000 4500
Length [byte]
Ag
gre
gat
e T
hrou
ghp
ut [
Mby
te/s
]
1 Chan SC 2 Chan SC 3 Chan SC
145MB/s per channel
187MB/s per channel
360MB/s @ 1kB
S. Haas, 14. Sept. '04LECC 2004 - 13 -
Performance: DMA Protocol Firmware
● FILAR receiver with DMA firmware
● Better performance than SC firmware, in particular for short fragments
● 25% improvement for 3 channels at 1kB fragment length
● Performance for long fragments is similar for both firmware versions
0.0
50.0
100.0
150.0
200.0
250.0
300.0
350.0
400.0
450.0
500.0
0 500 1000 1500 2000 2500 3000 3500 4000 4500
Length [byte]
Ag
gre
gat
e T
hro
ug
hp
ut
[Mb
yte/
s]
1 Chan DMA 2 Chan DMA 3 Chan DMA
440MB/s @ 1kB
S. Haas, 14. Sept. '04LECC 2004 - 14 -
Throughput: Multiple FILAR cards
0
200
400
600
800
1000
1200
0 500 1000 1500 2000 2500 3000 3500 4000 4500
Length [byte]
Ba
nd
wid
th [
Mb
yte
/s]
3 Chan. 6 Chan. 8 Chan.
● DMA protocol F/W● Maximum throughput
of 1.1GB/s with 3 receiver cards
● Throughput scales with the number of channels for fragments of 2kB and more
● For fragments of 500B and less the system is rate limited147MB/s per channel
145MB/s per channel
140MB/s per channel
S. Haas, 14. Sept. '04LECC 2004 - 15 -
Fragment Rate: Multiple FILAR cards
0
50
100
150
200
250
0 500 1000 1500 2000 2500 3000 3500 4000 4500
Fragment Length [byte]
Fra
gm
ent
Fre
qu
ency
[kH
z]
3 Chan. 6 Chan. 8 Chan..
● Received data fragment frequency per channel vs. fragment length
● Fragment rates of 100kHz can be sustained with 3 cards for fragments of less than 1kB
100kHz @ 1kB
S. Haas, 14. Sept. '04LECC 2004 - 16 -
S-LINK Transmitter Performance
0
50
100
150
200
250
300
350
400
0 500 1000 1500 2000 2500 3000 3500 4000 4500
Fragment size [byte]
Agg
rega
te T
hrou
ghpu
t [M
byte
/s]
1 Channel 2 Channels 3 Channels
● Transmitter connected to a FILAR receiver in another PC
● PCI interface is saturated with 2 active channels
● Maximum throughput obtained is 360MB/s
● PCI memory read performance is not as good as write
S. Haas, 14. Sept. '04LECC 2004 - 17 -
Summary
● FILAR high-performance PCI interface card with 4 on-board 2 Gb/s S-LINK channels (HOLA) has been designed
● Quad S-LINK receiver, transmitter and data source firmware versions have been developed and optimized
● Software package with Linux device driver and API library are integrated in the ATLAS DataFlow software
● Maximum throughput for one receiver card ~450MB/s● Aggregate data rate of > 1GB/s to system memory has been
measured with 3 receiver cards● Event rates of over 100kHz can be achieved for 1kB fragments● FILAR applications and users:
► Test readout of front-end electronics interface modules► ATLAS subdetector groups (LAr, SCT, TileCal, TRT, Pixel, LVL1 Calo),
DAQ & ROBin, MDT chamber tests► Readout system for the ATLAS combined test beam► Stable design, ~50 cards produced so far