(A Taste of) Data Acquisition, Triggers, and Controls Gregory Dubois-Felsmann Caltech CHEP 2003.
(A Taste of) Data Acquisition, Triggers, and Controls
Gregory Dubois-Felsmann, Caltech
CHEP 2003
Gregory Dubois-Felsmann - CHEP 2003 - Summary of DAQ, Trigger, and Controls
Routine disclaimer…
Much interesting material presented:
– About 35 talks (13.5 hours, ~100 MB as uploaded) from many experiments covering a rich variety of issues
– Many thanks to the speakers!
Fitting this into 25 minutes requires procedures familiar to the DAQ community:
– Feature extraction
– and unfortunately also triggering
… with a decidedly imperfect data acquisition system
– So I apologize in advance for the things I’ve missed or perhaps misunderstood!
Outline
• Overview of talks presented
• Some technological themes
• Trigger architectural issues
• The great challenge for the next years: scaling
• Conclusions
Talks presented
Monday parallel
W. Badgett CDF Run II Data Acquisition
R. Rechenmacher Run II DZERO DAQ / Level 3 Trigger System
S. Luitz The BaBar Event Building and Level-3 Trigger Farm Upgrade
R. Itoh Upgrade of Belle DAQ System
A. Polini The Architecture of the ZEUS Micro Vertex Detector DAQ and Second Level Global Track Trigger
J. Schambach STAR TOF Readout Electronics and DAQ
R. Divià Challenging the challenge: Handling data in the Gigabit/s range [ALICE]
J. Gutleber, L. Orsini XDAQ - Real Scale Application Scenarios [CMS et al.]
G. Lehmann The DataFlow of the ATLAS Trigger and Data Acquisition System
S. Stancu The use of Ethernet in the DataFlow of the ATLAS Trigger & DAQ
S. Gadomski Experience with multi-threaded C++ applications in the ATLAS DataFlow
Talks presented
Tuesday parallel
A. Ceseracciu A Modular Object Oriented Data Acquisition System for the Gravitational Wave AURIGA experiment
R. Mahapatra Cryogenic Dark Matter Search Remote Controlled DAQ
T. Steinbeck A Software Data Transport Framework for Trigger Applications on Clusters [ALICE]
T. Higuchi Development of PCI Bus Based DAQ Platform for Higher Luminosity Experiments [e.g., Super-Belle]
J. Mans Data Acquisition Software for CMS HCAL Testbeams
B. Lee An Impact Parameter Trigger for DØ
G. Comune The Algorithm Steering and Trigger Decision mechanism of the ATLAS High Level Trigger
V. Boisvert The Region of Interest Strategy for the ATLAS Second Level Trigger
S. Wheeler Supervision of the ATLAS High Level Triggers
Talks presented
Thursday parallel – session 5: DAQ and controls
J. Kowalkowski Understanding and Coping with Hardware and Software Failures in a Very Large Trigger Farm [BTeV]
M. Gulmini Run Control and Monitor System for the CMS Experiment
S. Kolos Online Monitoring Software Framework in the ATLAS Experiment
G. Watts DAQ Monitoring and Auto Recovery at DØ
K. Maeshima Online Monitoring for the CDF Run II Experiment
M. Gonzalez Berges The Joint COntrols Project Framework [CERN multi-expt.: LHC et al.]
S. Lüders The Detector Safety System for LHC Experiments
J. Hamilton A Generic Multi-node State Monitoring System [BaBar]
V. Gyurjyan FIPA Agent Based Network Distributed Control System [JLAB]
M. Elsing Configuration of the ATLAS Trigger System
In parallel with session 5a, unfortunately
Talks presented
Thursday parallel – session 5a: first level triggers
G. Grastveit FPGA Co-processor for the ALICE High Level Trigger
B. Scurlock A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger
P. Chumney Level-1 Regional Calorimeter System for CMS
Related plenary talks
F. Meijers The CMS Event Builder
M. Grothe Architecture of the ATLAS High Level Trigger Event Selection Software
Projects represented
• Strong emphasis on LHC, continuing recent trend
[Pie chart of projects represented, by number of talks. Categories: B-Factories, FNAL, LHC, Other Accelerators, Non-Accel.]
Some technological themes
• Triumph of C++ for HEP DAQ confirmed
– Along with Java for GUIs
• Triumph of commodity computing hardware (Intel IA-32) and operating system (Linux)
– Large-farms-of-small-boxes model confirmed
• [Near-]triumph of commodity networking hardware
– Fast and Gigabit Ethernet, standard commercial switches
More technological themes
• Continuation of long trend of reducing scope of application of custom hardware
– Yet most DAQ software is still custom in present experiments
• Serious efforts to find what is generic in DAQ programming
• Not just to isolate patterns (knowledge applicable by others) but also actual programming toolkits…
• Continuation of trend of moving offline code and/or frameworks into high level triggers
• New: widespread use of XML for non-event information (configuration, monitoring)
• Many of the major new challenges relate to scaling to huge farms
– Performance
– Operability, control, monitoring
Programming languages
• C++ has to a large extent proven itself
– Some current experiments have DAQ systems written from scratch almost entirely in C++, including real-time code running in an embedded RTOS environment
• E.g., BaBar: DataFlow, feature extraction, Level 3 trigger, and rest of online system written in “serious” OO C++ on VxWorks, Solaris, Linux
– Achieves virtually zero deadtime at 5.5 kHz L1Accept rate on 1997-vintage 300 MHz Motorola SBCs and 1.4 GHz Linux P-IIIs [Luitz]
– New projects are fairly uniformly continuing to adopt it for code in the event data flow path
– Caveats remain, though usually worth the cost:
• Executable size
• Dependency management seems to remain a challenge
• Non-trivial work in creating shareables
• Ease with which naïve users can write non-performant code
• Threading requires care (see below)
– Even see use in hardware trigger FPGA coding (see below) [Scurlock, CMS]
BaBar online and DAQ system
[Diagram of the BaBar online and DAQ system, from S. Luitz, “BaBar Farm Upgrade”, CHEP 2003, 3/24/03]
Programming languages II
• Java has emerged as the other major player
– Especially for graphical applications
• E.g., run control GUIs
– Good points:
• (Some say) ease of programming vs. C++
• Universal availability including rich GUI graphics library
• Simple API for remote object programming (RMI)
– Caveats:
• Performance (although results vary considerably)
• JVM quality-of-implementation, platform (non-)independence
• No other real competitors on the horizon except for niche applications
– Saw appearances of Python, LISP, etc.
Computing hardware and operating systems
• Farms of Intel IA-32 / Linux 1- or 2-CPU machines are the coin of the realm today; tomorrow?
– Linux will continue to be!
– Speakers had little to say about CPU chips except that they will buy the most cost-effective farms they can shortly before each major project goes into final commissioning
• Intel Itanium line may get some traction by then…
• “Buy-late” is a big Moore’s Law win, but see “scaling” concerns below…
• Linux success is particularly striking
– In use in essentially every HEP role, HLT through laptops
– Even making inroads in the embedded world [PCI DAQ component development for Super-KEK-B et al., Higuchi]
– Still some lingering attachment to other Unix flavors for disk servers (though Linux-based IDE RAID storage is also becoming very common, in the offline world, too)
– Linux is so successful in HEP that cross-platform portability may erode
Linux in the embedded DAQ card world
[Embedded slide: Takeo Higuchi (KEK), CHEP 2003, “PMC Processor”]
• RadiSys EPC-6315
– Equipped with Intel Pentium III 800 MHz.
– 512 kB secondary cache.
– 256 MB SDRAM with ECC.
– Chipset: RadiSys 82600.
– Bootable from CompactFlash. RedHat Linux 7.3 is running.
– 33-bit 33/66 MHz PCI bus interface.
[Photo labels: CompactFlash socket (boot); RJ-45 Ethernet port (slow control)]
Multithreading issues
• Language and library level:
– Need to stay aware of serialization from locking mechanisms used in outside libraries
• Example: C++ Standard Library containers’ memory pool by default uses a single lock; found to produce ×2 penalty in ATLAS HLT tests [Gadomski]
• O/S level:
– Linux is not a real-time operating system…
• Still no full implementation of POSIX threads
• Implementation of pthread “yield” operation interacts poorly with time slicing in scheduler (can’t reschedule immediately).
• Found to produce ×4 penalty in ATLAS tests [Gadomski]; kernel patch available
[See also under “offline code in the online world” below]
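The single-lock serialization point above can be illustrated with a small sketch. It is written in Python for brevity and is not the ATLAS code or the C++ Standard Library allocator. The class names, buffer size, and thread counts are all invented for the example. Because Python's interpreter lock hides the real timing penalty, the sketch counts lock acquisitions instead: a pool shared by all threads takes a lock on every allocate and release, while per-thread free lists take none.

```python
import threading

BUF_SIZE = 1024  # hypothetical fixed buffer size in bytes


class SharedPool:
    """One free list for all threads, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = []
        self.lock_acquisitions = 0  # counts how often threads had to serialize

    def allocate(self):
        with self._lock:
            self.lock_acquisitions += 1
            return self._free.pop() if self._free else bytearray(BUF_SIZE)

    def release(self, buf):
        with self._lock:
            self.lock_acquisitions += 1
            self._free.append(buf)


class PerThreadPool:
    """One free list per thread (thread-local), so no lock is ever taken."""

    def __init__(self):
        self._local = threading.local()
        self.lock_acquisitions = 0  # stays zero: there is no lock

    def _free_list(self):
        if not hasattr(self._local, "free"):
            self._local.free = []
        return self._local.free

    def allocate(self):
        free = self._free_list()
        return free.pop() if free else bytearray(BUF_SIZE)

    def release(self, buf):
        self._free_list().append(buf)


def exercise(pool, n_threads=4, n_cycles=1000):
    """Run allocate/release cycles from several threads; return the lock count."""

    def worker():
        for _ in range(n_cycles):
            pool.release(pool.allocate())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.lock_acquisitions


shared_locks = exercise(SharedPool())
local_locks = exercise(PerThreadPool())
print("lock acquisitions, shared pool:    ", shared_locks)
print("lock acquisitions, per-thread pool:", local_locks)
```

The trade-off in the per-thread variant is memory: buffers released by one thread are never reused by another.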
Networking
• Commodity networking hardware!
– The various flavors of Ethernet (Fast, Gigabit, and beyond) have become the almost unchallenged fabric of higher-level triggering and event building (one major exception: CMS still considering Myrinet)
– Standard protocols (TCP, UDP) also ascendant; some efforts to explore raw Ethernet
– All groups seem to be making some of the same discoveries, notably: Network switches are not simple, transparent devices!
• Flow control and buffering behavior must be understood in detail
• Vendors can be cagey about the details (proprietary internal arch.)
• Need good tools to monitor traffic behavior
Networking adventures
[Embedded slide: Ron Rechenmacher, Fermilab, 3/24/2003]
Summary
Commodity-based Ethernet DAQ built for D0
• 250 MB/s: 1 kHz of 250 KB events
• 63 sources and >80 targets
Commodity (Ethernet) systems
• wow, a lot of stuff can show up!
You need a TCP/IP expert or two
• People that can transcend boundaries
• “to the metal” understanding
• Infrastructure
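The headline bandwidth on the slide above is simply event rate times event size. A minimal Python sketch of that arithmetic follows. The function name is illustrative, KB and MB are taken as decimal units, and the even split across the 63 sources is an assumption made only to give a sense of scale, not a statement from the talk.

```python
def throughput_mb_per_s(rate_hz: float, event_size_kb: float) -> float:
    """Aggregate event-building bandwidth in MB/s (decimal: 1 MB = 1000 KB)."""
    return rate_hz * event_size_kb / 1000.0


# Figures quoted on the slide: 1 kHz of 250 KB events, 63 sources.
total = throughput_mb_per_s(rate_hz=1000, event_size_kb=250)

# Assumption for illustration only: data spread evenly over the 63 sources.
per_source = total / 63

print(f"aggregate: {total:.0f} MB/s")
print(f"per source (if evenly spread): {per_source:.2f} MB/s")
```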
Offline code in the online world
Use of offline code in high-level software triggers…
• application framework
• or even offline reconstruction code
• Several current experiments and most future ones are doing this
• Problems:
– Dependencies
– Performance – offline code has often not been exposed to the close scrutiny typical for online, and may have axes of flexibility at odds with high performance (CPU cycles and memory utilization)
– Multithreading – offline code is almost never written to be thread-safe. This presents a problem when thread parallelism is needed in a high level trigger
• Make a subset of the offline code, and its framework, thread-safe (ATLAS L2)?
• Replace threads with processes and a shared-memory data model (BaBar, D0)?
– Offline event loop model may not be directly usable (BaBar, D0, ATLAS)
• Benefits:
– Greatly simplifies incorporation of trigger algorithms in simulations
– Eases development and validation of trigger algorithms
Genericity: patterns and products
• We have always noticed: The same ideas keep coming up and the same problems have to be solved over and over again.
• We have always thought: There must be something to be gained from applying that knowledge. Can generic problems be solved with generic tools?
• We have tried in various ways:
– Identifying patterns: learning how to think about these common problems; building up expert knowledge that can be applied to “the next experiment”; learning lessons
• That’s what CHEP is all about…
– But we also aspire to reuse: applying a software product in more than one place
Reusing concrete products
• Sounds great; has a mixed history
– In some places this has come to work well:
• CERNLIB, GEANT4, ROOT are ubiquitous in HEP
– But there are real obstacles, chief among them:
• The difficulty of sharing code bases between experiments in different phases of development (example: divergence of BaBar and CDF versions of their originally shared application framework)
• “The devil is in the details”: often the high level features of a system seem generic (the patterns) but the implementation picks up experiment-specific features
– Sometimes this is because of concern with compatibility with historical code
– Sometimes it arises when the high-level architecture turns out to need to be driven by some low-level optimization
– Perhaps in principle the high-level design could be extended to cover both users’ needs, but the press of deadlines favors a “quick hack” that doesn’t require renegotiation
Reuse in the online environment
– Reuse has been perhaps less successful on average in the online and DAQ worlds.
• Often online code is prepared “later” in the construction of an experiment (since simu/reco code is usually already required at the proposal stage), thus under more time pressure.
• Online code tends to require more low-level optimizations. Often these come with serious tradeoffs that can limit the flexibility of a design, even to its in-house users.
– But perhaps the next round of experiments presents a rare opportunity to do better:
• The LHC experiments are on the same time scale and they still have a fair amount of time left.
• The use of a common language, O/S, and networking environment helps.
• There are some interesting projects under way!
Quest for generic online software
• Data acquisition:
– XDAQ (arising from the CMS project) [Gutleber/Orsini]
– So far mostly being used to provide a common platform for several subdetectors’ commissioning DAQ systems and ease their integration into the main DAQ
– Exploring collaboration with other expt’s
– Performance seems good.
– How CMS-free can it be kept over time, though?
[Embedded slide: CMS-TriDAS, 03-Mar-28]
Scope of XDAQ
• Environment for data acquisition applications
– communication over multiple network technologies concurrently
• e.g. input on Myrinet, output on TCP/IP over Ethernet
– configuration (parametrization) and control
• protocol and bookkeeping of information
– cross-platform deployment
• write once, use on every supported platform (Unix, RTOS)
– high-level provision of system services
• memory management, synchronized queues, tasks
– built-in efficiency enablers
• zero-copy and buffer loaning schemes usable by everyone
• Aim at creating interoperable DAQ systems
– ECAL, HCAL, Tracker, Muon local DAQs commonly managed
• Gain a common understanding of the problem domain
– terms, use-cases, priorities laid down in common documentation
Generic software for online
• Control and monitoring frameworks
Lots of projects in this area; a couple of examples (see Thursday program for more)
– Inherently fault-tolerant architectures [Kowalkowski]
• Motivated by BTeV, but is at a very generic level, with CS collaborators viewing it as a general research project
• In early stages, but worth watching…
– Generic monitoring frameworks
• Example: D0’s XML-based distributed monitoring [Watts]
[Embedded title slide: “Understanding and Coping with Hardware and Software Failures in a Very Large Trigger Farm”, the RTES group, presented by Jim Kowalkowski, Fermilab Computing Division; CHEP 2003, La Jolla, CA, 03.03.28]
XML
• A fairly new trend: XML is cropping up all over in online configuration and monitoring applications
– Perhaps surprising? Not-very-compact, textual representation!
– But we are willing to spend some CPU and network bandwidth here (since other things in the systems require so much more)
– Benefits:
• Avoids “private toy language” problem, when combined with scripting tools
• No more hand-written “run card”-type parsers
• Easy to parse in many languages, so one can pick an appropriate (and perhaps different) language for generating the data and for applying it.
• Very easy to transmit over a network (byte stream)
• Aids in a) using existing generic tools (editors, validators) and b) allowing new tools we build to be more generic within HEP
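As a small illustration of the “no more hand-written parsers” point, the sketch below reads a configuration fragment with Python's standard `xml.etree.ElementTree` module. The document itself is hypothetical: the element and attribute names (`daq_config`, `node`, `host`, `role`, `buffers`) are invented for the example and do not come from any experiment's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; the element and attribute names are
# invented for this example and do not come from any experiment's schema.
CONFIG_XML = """
<daq_config run_type="cosmics">
  <node host="farm01" role="event_builder" buffers="64"/>
  <node host="farm02" role="l3_filter" buffers="32"/>
</daq_config>
"""


def load_nodes(xml_text):
    """Parse the XML text and return one dict per <node> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "host": node.get("host"),
            "role": node.get("role"),
            "buffers": int(node.get("buffers")),
        }
        for node in root.findall("node")
    ]


nodes = load_nodes(CONFIG_XML)
for n in nodes:
    print(n["host"], n["role"], n["buffers"])
```

The same text could equally be generated by a tool written in another language and sent over a socket as a byte stream, which is the flexibility the slide points to.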
Triggering scope
Triggering
• Far too much information presented to cover in detail
– Remarkable things can now be done in commodity CPUs
• E.g., ZEUS second level tracking trigger / global tracking trigger [Polini]
– Silicon vertex tracking information becoming absorbed into tracking triggers ahead of “Level 3”
• ZEUS
• Fermilab (for B physics efficiency) [Lee, D0: hardware]
[Embedded slide: Bill Lee, 25 March 2003, “Conceptual Design”. Block diagram with a Fiber Road Card, Silicon Trigger Cards, and a Track Fit Card feeding L2CTT, with road data and SMT data as inputs. Steps shown:]
– L1CTT tracks in CFT define road in SMT
– Select SMT hits in roads
– Fit trajectory to L1CTT+SMT hits. Measure pT, impact parameter, azimuth
– Send results to L2
– Pass L1CTT information to L2
– Send SMT clusters to L3
ZEUS software tracking trigger
[Embedded slide: A. Polini, CHEP 2003, La Jolla, 23-28 March 2003, “The MVD Data Acquisition System and GTT”. Block diagram showing:]
– MVD HELIX front-end and patch-boxes, with a VME HELIX driver crate, sending analog data over analog links
– MVD VME readout: Crate 0 (MVD top, C+C master), Crate 1 (MVD bottom, C+C slave), Crate 2 (MVD forward, C+C slave), each with ADCM modules, NIM + latency clock + control, and a Lynx OS CPU
– Slow control and latency clock modules; Global First Level Trigger, busy, and error signals
– Central Tracking Detector read-out (CTD 2TP modules) and Forward Tracking / Straw Tube Tracker read-out (STT 2TP module), with VME TP connections carrying data from CTD and STT
– Global Tracking Trigger processors (GFLT rate 800 Hz) with GTT control and fan-out, on a Fast Ethernet/Gigabit network
– GSLT 2TP modules: TP connection to the Global Second Level Trigger, which issues the decision
– Main MVD DAQ server: local control and Event-Builder interface; VME CPU boot server and control
– Network connection to the ZEUS Event Builder (~100 Hz)
– ZEUS Run Control and Online Monitoring Environment
[Embedded slide: A. Polini, CHEP 2003, La Jolla, 23-28 March 2003]
Barrel algorithm description
Find tracks in the CTD, extrapolate into the MVD to resolve pattern recognition ambiguity
– Find segments in Axial and Stereo layers of CTD
– Match Axial Segments to get r-φ tracks
– Match MVD r-φ hits
– Refit r-φ track including MVD r-φ hits
After finding 2-D tracks in r-φ, look for 3-D tracks in z vs. axial track length, s:
– Match stereo segments to track in r-φ to get position for z-s fit
– Extrapolation to inner CTD layers
– If available use coarse MVD wafer position to guide extrapolation
– Match MVD z hits
– Refit z-s track including z hits
Constrained or unconstrained fit
– Pattern recognition better with constrained tracks
– Secondary vertices require unconstrained tracks
Unconstrained track refit after MVD hits have been matched
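The z-s refit step in the barrel algorithm above can be pictured with a simple sketch. The slide does not say how the fit is done, so this is not the ZEUS implementation. It assumes that the track is a straight line in the z-s plane (as for a helix in a uniform field), that the fit is an unweighted least-squares fit with a free intercept, and the hit values are invented.

```python
def fit_line(s_values, z_values):
    """Unweighted least-squares fit of z = z0 + slope * s; returns (z0, slope)."""
    n = len(s_values)
    if n < 2 or n != len(z_values):
        raise ValueError("need at least two (s, z) points of equal count")
    mean_s = sum(s_values) / n
    mean_z = sum(z_values) / n
    var_s = sum((s - mean_s) ** 2 for s in s_values)
    if var_s == 0:
        raise ValueError("all s values are identical; slope is undefined")
    cov_sz = sum((s - mean_s) * (z - mean_z) for s, z in zip(s_values, z_values))
    slope = cov_sz / var_s
    z0 = mean_z - slope * mean_s
    return z0, slope


# Hypothetical hits (cm): points from stereo-segment matching plus MVD z hits.
# They are constructed to lie exactly on z = 1.0 + 0.5 * s.
s_hits = [5.0, 10.0, 20.0, 30.0, 40.0]
z_hits = [3.5, 6.0, 11.0, 16.0, 21.0]

z0, slope = fit_line(s_hits, z_hits)
print(f"z0 = {z0:.3f} cm, dz/ds = {slope:.3f}")
```

A real trigger fit would weight each hit by its resolution, since MVD z hits are far more precise than CTD stereo positions.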
Hardware triggers
• Still indispensable at the first level
• Used in some places as adjuncts to second level
• Good progress on testing and production for LHC experiments
– [Scurlock]: Generate VHDL from C++ code: eases production of highly accurate board-level simulation of trigger
B. Lee An Impact Parameter Trigger for DØ
G. Grastveit FPGA Co-processor for the ALICE High Level Trigger
B. Scurlock A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger
P. Chumney Level-1 Regional Calorimeter System for CMS
The CMS – ATLAS choice
CMS baseline:
– Build full events at output of Level 1 (100 kHz, 1 MB events)
– Risk: this is a lot of data to handle
• Able to fall back to a partial-readout Level 2 model
ATLAS baseline:
– L2 trigger operates on “ROIs” – nominally 2% of event data – at output of Level 1 (75 kHz, 1 MB events, 20 kB ROI data)
– Full event build at L2 rate of ~1 kHz, sent to Event Filter (EF) farm
– Risk: not yet completely clear that small ROIs provide enough information
• Able to shift boundary between L2, EF somewhat
Both experiments finding present or readily foreseeable technology adequate at least at level of individual subsystems – full scale end-to-end tests beginning
F. Meijers The CMS Event Builder
G. Lehmann The DataFlow of the ATLAS Trigger and Data Acquisition System
V. Boisvert The Region of Interest Strategy for the ATLAS Second Level Trigger
… and many other talks on configuration and other details
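The difference between the two baselines is easiest to see as bandwidth. The Python sketch below multiplies out the rates and sizes quoted above. It uses decimal units and ignores protocol overhead, event-size fluctuations, and any traffic not mentioned on the slide, so the results are order-of-magnitude illustrations rather than design figures. The function and variable names are invented for the example.

```python
def bandwidth_gb_per_s(rate_khz: float, size_kb: float) -> float:
    """Bandwidth in GB/s from a rate in kHz and a data size in kB (decimal units)."""
    # kHz * kB = 1e3/s * 1e3 B = 1e6 B/s, so divide by 1e3 to get GB/s.
    return rate_khz * size_kb / 1000.0


# CMS baseline: full 1 MB events built at the 100 kHz Level 1 output rate.
cms_event_build = bandwidth_gb_per_s(rate_khz=100, size_kb=1000)

# ATLAS baseline: 20 kB of ROI data per event at 75 kHz into Level 2,
# then full 1 MB events built at the ~1 kHz Level 2 accept rate.
atlas_roi = bandwidth_gb_per_s(rate_khz=75, size_kb=20)
atlas_event_build = bandwidth_gb_per_s(rate_khz=1, size_kb=1000)

# ROI fraction of the event: 20 kB out of 1 MB.
roi_fraction = 20 / 1000

print(f"CMS event building:   {cms_event_build:.1f} GB/s")
print(f"ATLAS ROI to L2:      {atlas_roi:.1f} GB/s")
print(f"ATLAS event building: {atlas_event_build:.1f} GB/s")
print(f"ATLAS ROI fraction:   {roi_fraction:.0%}")
```

On these naive numbers the CMS event builder moves roughly forty times more data than the two ATLAS stages combined, which is the “lot of data to handle” risk, while ATLAS accepts the risk that 2% of the event may not be enough to decide on.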
Scaling
Many issues remain to be confronted fully in building systems with many thousands of CPUs!
• Fault tolerance
• Overseeing huge constantly-changing collections of active entities (too many for direct human oversight)
• Performance issues:
– Image activation
– Calibration constant loading
– Configuration
– Global knowledge updates required to keep system coherent
File server and/or database contention?
An exotic tidbit
• A familiar reassurance to nervous newcomers: “Go ahead, type whatever you want: the worst that can happen is that we might have to reboot the computer.”
• A cautionary tale from CDF:
– Observed unexpected losses of silicon detector readout channels
– Proposed explanation: “Vibrational resonances due to Lorentz forces on digital power lines to the front end chips. Limits trigger rates. Probably related to high deadtime setting up steady patterns.”
[Photos: good wire bond; broken bond after enduring vibrational resonance stress; close-up view. Test stand results simulating overloading the DAQ system, within a magnetic field.]
Net result: physical damage can be caused by changing trigger configuration!!!
Regretfully omitted
• Developments in front-end DAQ electronics
– STAR TOF readout [Schambach]
• Importance of development tools
– Network performance monitoring [many]
– Thread debugging [Gadomski]
– …
[Embedded title slide: “STAR TOF Readout Electronics and DAQ”, Joachim Schambach, University of Texas at Austin, for the STAR TOF Collaboration]
Conclusions
• Trigger/DAQ systems have kept up well with the demands of doing physics with the present generation of experiments
• Many new technologies and ideas have made this possible
• But we are entering an entirely new regime with experiments of the LHC scale and must take care that we are not overwhelmed by complexity
– We should try hard to find ways to get realistic advance looks at systems integration and scaling before the “last minute” bulk hardware buys…
• Very interesting research and reduction-to-practice work lies ahead in the next few years – looking forward especially to CHEP 2003 “+2”