
Authors:

G. Chiodini (1), B. Hall (2), S. Magni (3), D. Menasce (3), L. Uplegger (3), D. Zhang (2)

(1) I.N.F.N. Lecce, (2) FNAL, (3) I.N.F.N. Milano

Pomone is a general-purpose, low-cost and portable read-out system based on the industry-standard PCI protocol. It has been developed in the context of the BTeV experiment at the Fermilab Collider and is meant to be used in the pixel test beam of Summer 2003.


Pomone has been designed to meet the requirements of general test-stand hardware for testing detectors, both in a laboratory environment and at test beam facilities, for the BTeV experiment at Fermilab. The current implementation focuses on the following components:

• A PCI Test Adapter (PTA) plug-in card, compliant with the PCI protocol

• A Programmable Mezzanine Card (PMC)

• An external data-source subsystem: a Fermilab Pixel (FPIX) read-out chip (ROC)

• A host computer

The PMC is intended to work in conjunction with a PTA card to serve as a flexible platform for building small DAQ systems for testing detectors and subsystems. The PMC is designed around the Xilinx Virtex II FPGA, which serves as an interface between the PTA card resources and the external subsystem/detector.


The PTA is a PCI-compliant card featuring:

• An Altera FPGA controlling all functions

• A PCI target interface (slave only)

• Two SSRAM banks (1 Mb each)

• A daughter-card interface for all links, using IEEE 1386 mezzanine connectors

• A JTAG interface to upload FPGA code

• A USB interface


A host computer acting as a data sink (the current implementation supports the Linux operating system).


Both the PMC and the PTA cards have been developed by the ESE group at Fermilab. For more information on each of these hardware components, see the following site:

http://www-ese.fnal.gov/eseproj/BTeV/home_default.html

Data are produced by the external subsystem at a variable rate (depending upon the beam and interaction rate), while the host computer receives them at its own rate (depending upon the CPU clock and the processor's current activity). The PTA has therefore been programmed to act as an intermediate buffer to hold events, thus allowing for a continuous sustained data rate. The PTA can receive data at up to 30 MHz and the computer can digest data at about 2 Mb/s. This holds for the current FPGA configuration, which does not allow for DMA transfers. These values are well within specification for the coming test beam.

Principle of operation of the read-out

The FPGA on the PTA is programmed to direct the incoming data flow to one of the two internal 1 Mb SSRAM memory banks (bank 1). When a bank reaches a user-defined limit (size or timeout), the data flow is switched to the other bank (bank 2) and an interrupt is generated for the host PC. At that point data are flushed out from bank 1 to the host computer, while bank 2 continues to receive data from the external source. When bank 2 is full, another memory swap occurs, an interrupt is generated and the whole process iterates again.
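
The bank-swapping scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PTA firmware: the class name `PingPongBuffer`, the `limit` parameter (counted here in words) and the `interrupts` list standing in for the hardware interrupt are all hypothetical, and the timeout condition is left out.

```python
class PingPongBuffer:
    """Two banks; incoming words go to the active bank until it reaches `limit`."""

    def __init__(self, limit):
        self.limit = limit        # user-defined size limit (assumed unit: words)
        self.banks = [[], []]     # stand-ins for SSRAM bank 1 and bank 2
        self.active = 0           # index of the bank currently receiving data
        self.interrupts = []      # indices of full banks waiting to be read out

    def write(self, word):
        bank = self.banks[self.active]
        bank.append(word)
        if len(bank) >= self.limit:
            self.interrupts.append(self.active)  # "interrupt": this bank is ready
            self.active ^= 1                     # swap: new data go to the other bank

    def flush(self):
        """Host side: drain the oldest full bank and return its contents."""
        full = self.interrupts.pop(0)
        data, self.banks[full] = self.banks[full], []
        return data


buf = PingPongBuffer(limit=4)
for word in range(10):
    buf.write(word)
    if buf.interrupts:            # the host reacts to the pending interrupt
        print(buf.flush())
```

The sketch does not model what happens if the host fails to drain a bank before the other one fills up; in a real system that case would need explicit handling.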

[Figure a: data arrive at the PTA FPGA from the PMC and leave toward the host PC; the FPGA raises an interrupt.]

Two independent processes run on the host computer to manage the read-out, the producer and the consumer:

• The producer waits for interrupts: when one is received, it fetches data from the SSRAM bank that has reached the programmed limit and transfers them to a statically allocated memory on the host computer.

• The consumer continually checks for data available in the shared memory and transfers them to external storage. A block of data is ready to be transferred when the producer finally marks it as complete. This is where the event building actually takes place.
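
The hand-over between the two processes can be illustrated with a small sketch. It is written under several assumptions: Python threads stand in for the two host processes, a fixed-length list stands in for the statically allocated shared memory, and blocking `threading.Event` flags replace the polling described above. The names `N_SLOTS`, `slots`, `complete` and `free` are hypothetical.

```python
import threading

N_SLOTS = 4                                   # assumed number of preallocated blocks
slots = [None] * N_SLOTS                      # stand-in for the static shared memory
complete = [threading.Event() for _ in range(N_SLOTS)]  # set by the producer
free = [threading.Event() for _ in range(N_SLOTS)]      # set by the consumer
for flag in free:
    flag.set()                                # every slot starts out free


def producer(blocks):
    """Copy each incoming block into the next slot and mark it as complete."""
    for i, block in enumerate(blocks):
        s = i % N_SLOTS
        free[s].wait()                        # do not overwrite unread data
        free[s].clear()
        slots[s] = block
        complete[s].set()                     # the block is now ready for the consumer


def consumer(n_blocks, storage):
    """Move completed blocks, in order, to `storage` (stand-in for mass storage)."""
    for i in range(n_blocks):
        s = i % N_SLOTS
        complete[s].wait()
        complete[s].clear()
        storage.append(slots[s])
        free[s].set()                         # hand the slot back to the producer


blocks = [[i, i + 1] for i in range(0, 20, 2)]   # ten small dummy blocks
storage = []
t_prod = threading.Thread(target=producer, args=(blocks,))
t_cons = threading.Thread(target=consumer, args=(len(blocks), storage))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(storage == blocks)
```

Because the producer only touches slots flagged as free and the consumer only touches slots flagged as complete, the two sides never need to call each other directly.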

[Figure b: on the PTA card, the FPGA routes data from the PMC into SSRAM bank 1 and SSRAM bank 2 and raises an interrupt; on the host PC, the producer fills the statically allocated shared memory, which the consumer empties to mass storage.]

There are thus three processors working together to pipeline data out of the detector: the PMC, the PTA and the host PC. Both the PMC and the PTA are initialized by commands issued by the producer from the host PC.

[Figure: the processing elements involved: ROC, PMC FPGA, PTA FPGA and PC.]

Four programs on the host PC cooperate in the read-out: the producer, the consumer, a logger that centralizes the messages received from both the producer and the consumer, and a GUI that allows interaction with the user (the GUI is an optional component).


The read-out code has been designed upon the following guidelines:

• Code must be robust, in principle able to withstand changes of operating system environment, extensive refurbishing and the addition of algorithms. Developers should be able to track down changes over time.

• Code must be highly modular, able to accommodate different detectors with different hardware and software specifications. The backbone should essentially provide virtual methods to accommodate this diversity.

• Functionality of the code must be guaranteed also in environments with minimal resources (e.g. when no X11 graphics is available). The system should be able to perform even without a GUI for user interaction.

• The system should provide mechanisms to both initialize and read out the hardware. Since the incoming data rate is asynchronous with respect to the host computer clock, a data-rate compensation buffer must be provided to accommodate fluctuations in the sustained data rate.

• Components of the system should be loosely coupled: this allows for upgrades and changes of individual elements with little effort. It further ensures smoother deployment thanks to minimal cross dependencies.

• To allow for robustness and modularity, the code has been implemented in C++, and the overall code management is under the supervision of CVS in a Fermilab-based, periodically backed-up repository. Particular care has been exercised in the object-oriented design in order to achieve an effective decoupling of components.

• Should X11 resources not be available at any given time during a run (e.g. from remote institutions with a bad network connection), users can still efficiently run the system by means of a command-line oriented interface. A more sophisticated GUI is provided for convenience, but it is not crucial to successfully operate the read-out.

• In order to accommodate an intermediate data-rate compensation buffer, the system, as described before, is split into two main processes with a shared memory in the middle: the first process (called producer) gets hits out of the PTA card and places them in the shared memory, while the second (called consumer) continuously browses the shared memory to fetch completed blocks for the event builder, which assembles hits into events. Events are finally written by the consumer to external data storage.

• Additional elements of the system are:

A logger (centralized message logger), which receives messages from the producer and the consumer and writes them to a configurable output stream.

A controller (a command-line user interface). The producer, consumer and logger get user commands from a message-bus: users type commands at the controller prompt, which feeds them to a reserved message-bus that is constantly monitored by the above processes.

A sophisticated GUI, provided to allow users to interact efficiently with the system. It drives the behaviour of the producer, consumer and logger using the same message-bus mechanism as the controller.

Loose coupling among these components is thus accomplished by the use of intermediate buffers, shared memories statically allocated in the host computer, and the message-bus. No direct transaction occurs between them. Once a public interface describing their behaviour has been defined, the producer, consumer and logger are essentially independent programs.
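
The message-bus mechanism can be pictured with the following minimal sketch. It assumes an append-only list as the bus and a per-process read cursor; the class names `MessageBus` and `Process`, the `"all"` broadcast target and the command strings are hypothetical and are not taken from the Pomone code.

```python
class MessageBus:
    """Append-only list of (target, command) entries shared by all processes."""

    def __init__(self):
        self.messages = []

    def post(self, target, command):
        # Called by the controller or the GUI; it never calls a process directly.
        self.messages.append((target, command))


class Process:
    """A producer, consumer or logger that polls the bus for its commands."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.cursor = 0          # index of the next bus entry not yet seen
        self.state = "idle"

    def poll(self):
        """Read new bus entries and act on those addressed to this process."""
        handled = []
        while self.cursor < len(self.bus.messages):
            target, command = self.bus.messages[self.cursor]
            self.cursor += 1
            if target in (self.name, "all"):
                self.state = command
                handled.append(command)
        return handled


bus = MessageBus()
producer, consumer, logger = (Process(n, bus) for n in ("producer", "consumer", "logger"))
bus.post("all", "start")         # typed at the controller prompt, for example
bus.post("producer", "pause")
for process in (producer, consumer, logger):
    process.poll()
print(producer.state, consumer.state, logger.state)   # prints: pause start start
```

Since each process only reads the bus, a controller and a GUI can be swapped for one another without touching the producer, consumer or logger.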

[Figure: software components on the host PC. The detector (PMC + PTA) interrupts the producer, which fills the statically allocated data shared-memory; the consumer moves data from there to mass storage. The producer, consumer and logger are connected to the controller or GUI through the statically allocated message-bus.]

The end-user talks to the producer and to the consumer by means of commands sent to the message-bus, which is polled constantly by them to get orders. There is therefore no direct connection between the two processes.

[Figure: data flow. On the PTA, the FPGA receives data from the detector (PMC) and fills SSRAM 1 and SSRAM 2; on the host PC, an interrupt handler (with interrupt reset) serves the producer, which fills the shared memory read by the consumer.]

Time evolution of the read-out

t0: Events begin to flow from the detector to the PTA. This goes on until the first SSRAM bank exceeds a user-selectable threshold.

t1: An interrupt is raised by the FPGA to flag the memory-full status. Incoming data are then redirected to the second SSRAM bank and the producer can start to read out hits to the shared memory.

t2: The interrupt is reset. While new data feed SSRAM 2, SSRAM 1 is emptied by the producer and the shared memory receives data.

t3: As soon as SSRAM 2 becomes full, the system cycles again through the steps from t0 to t2. The producer shifts data from SSRAM 2 while SSRAM 1 gets fresh data.

t4: As soon as two blocks become available in the shared memory, the consumer fetches them and collects hits into events (event building). Events are then moved onto mass storage.

With this approach, the shared memory acts as a compensating buffer between an incoming data rate and an outgoing one. This architecture comes in very handy when events are defined as a 'collection of hits with the same time stamp'. In this case the dual-buffer approach becomes essential to develop an efficient event builder, since the event builder only needs to keep sorting hits from at most two adjacent data blocks in the shared memory to be sure that no hit belonging to an event is lost. See the next page for an explanation of the principle of operation of a real test bench where incoming data are categorized on a time-stamp basis rather than by position in the output buffer (this is the anticipated use case with pixel detectors at the coming BTeV test beam).

Our actual test stand consists of several detectors to be read out in a single data stream: an event is, in this case, defined as a collection of hits generated by different detectors at the same time (they are marked by the same time stamp). In order to vastly improve the efficiency and speed of the event-builder stage, a mechanism has been provided to synchronize the swapping of the SSRAM banks in the PTA cards. The first bank reaching the limit raises an interrupt, and the producer immediately issues a command to all other PTA cards to swap their SSRAM banks. In this way the consumer has to deal with blocks from the shared memory in which hits with the same time stamp are spread out, at most, over two consecutive blocks (corresponding to a swap operation).

[Figure: complete setup. The beam crosses detectors A, B, C, D, ... Y, Z, each read by its own ROC. Each pair of ROCs feeds Slot 1 and Slot 2 of a PMC (FPGA) attached to a PTA (FPGA, SSRAM 1, SSRAM 2). Through a PCI extender, the PTA cards reach the host computer, where interrupt handlers and the producer fill the shared memory, the consumer writes to mass storage and a GUI is available.]

Building an event thus becomes just a time-stamp reordering of, at most, two 1 Mb buffers, a task which imposes no particularly heavy burden on the read-out computer.
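
The synchronized swap can be illustrated as follows. This is a sketch under simplifying assumptions, not the actual producer: the names `PTACard` and `run` are hypothetical, the limit is a plain hit count, and a swap returns the contents of the closed bank immediately instead of waiting for a separate read-out step.

```python
class PTACard:
    """A card with two banks; hits are written to the active bank."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit            # assumed limit, counted in hits
        self.banks = [[], []]
        self.active = 0

    def write(self, hit):
        """Store a hit; return True when the active bank has reached the limit."""
        self.banks[self.active].append(hit)
        return len(self.banks[self.active]) >= self.limit

    def swap(self):
        """Switch banks and return the contents of the bank just closed."""
        closed = self.active
        self.active ^= 1
        data, self.banks[closed] = self.banks[closed], []
        return data


def run(cards, stream):
    """`stream` yields (card_index, time_stamp); returns the list of blocks built."""
    blocks = []
    for index, time_stamp in stream:
        if cards[index].write(time_stamp):
            # One card reached its limit: swap ALL cards so that every card
            # contributes to the same block boundary.
            blocks.append([h for card in cards for h in card.swap()])
    return blocks


cards = [PTACard("A", limit=3), PTACard("B", limit=3)]
stream = [(0, 10), (1, 10), (0, 11), (1, 11), (0, 12), (1, 12), (1, 13)]
blocks = run(cards, stream)
print(blocks)                  # [[10, 11, 12, 10, 11]]
print(cards[1].banks[1])       # [12, 13]
```

In this toy run, card A fills up first, so the hit with time stamp 12 from card B arrives after the swap and lands in the next block: an event is split over two consecutive blocks, but never over more.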

[Figure: contents of the shared memory, shown as lists of hits with their time stamps; events are labelled a to e.]

hit    time stamp
1      1754323
2      1754323
3      1754323
4      1857769
5      1857769
6      1869980
7      1754323
8      1869994
9      1870452
10     1857769
11     1754323
…      …
n-8    6773788
n-7    6773788
n-6    7843823
n-5    7845686
n-4    7843823
n-3    7948573
n-2    6773788
n-1    7845686
n      7843823

hit    time stamp
1      7996453
2      7843823
3      8656393
4      7843823
5      7843823
6      8667584
7      8783753
8      8847932
9      8783753
10     8783753
11     1754323
…      …

In this artist's conception of the shared memory, hits are shown with a color-coded representation.

Hits reach the shared memory in a loosely sparsified mode: nonetheless, hits with the same time stamp are not too far apart within a single block (a block is an image of the contents of all SSRAM #1 or all SSRAM #2 banks before a collective swap has occurred). Therefore, in this example, events a, b and c are fully contained in block #1, while event d is split between block #1 and block #2 (because the transfer from PMC to PTA was not completed when a swap was issued by the first PTA reaching the limit). Event e and the following ones will be fully contained in block #2. The event builder will thus use blocks #1 and #2, then discard #1, use #2 and #3, discard #2, and so on.
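
The sliding use of two adjacent blocks can be sketched as below. This is an illustration rather than the Pomone event builder: the function name `build_events` and the hit labels `h1` ... `h7` are hypothetical, and the code relies on the stated property that an event spans at most two consecutive blocks.

```python
from collections import defaultdict


def build_events(blocks):
    """Yield (time_stamp, hits), assuming an event spans at most two adjacent blocks."""
    previous = {}                                   # events opened in the previous block
    for block in blocks:
        current = defaultdict(list)
        for time_stamp, hit in block:
            if time_stamp in previous:
                previous[time_stamp].append(hit)    # completes an event split by a swap
            else:
                current[time_stamp].append(hit)
        # Everything opened in the previous block is now complete:
        # emit it in time-stamp order and discard that block.
        for time_stamp in sorted(previous):
            yield time_stamp, previous[time_stamp]
        previous = dict(current)
    for time_stamp in sorted(previous):             # flush the last block
        yield time_stamp, previous[time_stamp]


block_1 = [(1754323, "h1"), (1754323, "h2"), (1857769, "h3"), (7843823, "h4")]
block_2 = [(7843823, "h5"), (8783753, "h6"), (8783753, "h7")]
for time_stamp, hits in build_events([block_1, block_2]):
    print(time_stamp, hits)
```

Only two blocks are ever held at once, so the memory needed by the event builder stays bounded regardless of how long the run lasts.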


The basic unit of read-out is a PTA card, an intermediate component between a PMC and a host computer. A complete read-out system consists of many replicas of this elementary unit (as shown in the box where a complete setup is detailed).

Both FPGAs are pre-programmed; in the case of the PTA, the code has been custom designed and implemented by our group using the Altera Quartus product.

[Figure: overall layout. Several PTA cards connect the hardware data source to the producer; the consumer writes to external mass storage; the logger and the graphical user interface complete the system.]

Software technologies used in the read-out

To ensure ease of maintenance, portability and a smooth evolution path, the system has been built upon the following collection of libraries (almost all released under open-source licences):

• WinDriver: a commercial device-driver builder, by Jungo (http://www.jungo.com), upon which our own PCI device driver is built.

• Xerces: an XML parser. Our system configuration files are in XML syntax: methods are provided to parse, validate and transfer to memory initialization constants, geometrical or electrical parameters, etc. (http://xml.apache.org/xerces-c/)

• Qt: a library of classes to build complex GUIs (http://www.trolltech.com/documentation/index.html)

• Root: a data analysis package (http://root.cern.ch)

• Nienet: a GPIB device driver (http://www.ni.com/linux/ni488dl.htm)
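
As an illustration of the parse, validate and transfer-to-memory step for an XML configuration file, here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module instead of Xerces, and the element and attribute names (`pomone`, `pta`, `id`, `bankLimit`, `timeoutMs`) are invented for the example; the actual Pomone configuration schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration text; the real files may look entirely different.
CONFIG = """
<pomone>
  <pta id="0" bankLimit="4096" timeoutMs="500"/>
  <pta id="1" bankLimit="2048" timeoutMs="250"/>
</pomone>
"""


def load_config(text):
    """Parse the XML text, check the values and return a dict keyed by card id."""
    root = ET.fromstring(text.strip())
    cards = {}
    for node in root.findall("pta"):
        card_id = int(node.get("id"))
        limit = int(node.get("bankLimit"))
        timeout = int(node.get("timeoutMs"))
        if limit <= 0 or timeout <= 0:
            raise ValueError(f"card {card_id}: limits must be positive")
        cards[card_id] = {"bank_limit": limit, "timeout_ms": timeout}
    return cards


config = load_config(CONFIG)
print(config[1]["bank_limit"])   # 2048
```

Keeping such constants in a configuration file rather than in the code means that a different detector setup only requires a different file.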

The Pomone read-out, written in C++, is maintained in a centralized CVS source code repository at Fermilab. It has been designed and tested to work on dual-processor workstations: in general, the producer feeds the shared memory running on one processor, while the consumer fetches hits and executes the event building (which is a CPU-intensive task) on the other processor. Histograms for monitoring purposes are served through an IP socket: a histogram presenter client has been provided to allow users to monitor the DAQ activity from remote locations without placing an additional burden on the DAQ CPU load.

Pomone has been provided with extensive on-line documentation. We use Doxygen to produce a browsable Reference Guide. We have configured the Doxygen parser to provide both a Reference Guide and a User Guide in a single document, available through the Web in both HTML and PDF formats. This has proven to be an extremely valuable tool to help collaborators develop the code. Since the documentation is embedded in the source code as suitably formatted comment lines, code and documentation are guaranteed to stay in sync.

Links to files, classes and compound elements of the Pomone Reference Guide

Extensive User Guide, with schematics and drawings

Doxygen produces nice graphical inheritance-tree schematics, with hyperlinks to class definitions.

Every device used by the read-out system is accurately described and referenced in the on-line guide (where appropriate, links are provided to the original web site with up-to-date documentation).

The whole source code is suitably hyperlinked to allow easy and efficient browsing of the code.

Snapshots of the Pomone GUI

Schematics of the PMC card

Schematics of the PTA card

The PTA has been programmed using the Quartus software to generate suitable code for its Altera FPGA.
