J. Linnemann, MSU 1
The DØ Level 2Trigger
James T. LinnemannMichigan State University
For the DØ L2 Trigger Group(10 institutions!)
DPF Aug 12, 2000
Columbus, Ohio
J. Linnemann, MSU 2
RunII at the TevatronAccelerator upgrades will
produce >30 times the instantaneous
p-pbar beam luminosity of the previous run
Run I (1992-6)Operation
Run II (2000 +)Operation
Detector Input
Trigger Selection
Data Tape
7.5 MHz300 kHz
3 Hz 50 Hz
J. Linnemann, MSU 3
Trigger Strategy
Calor. Muon SiliconVertex
Cent.Tracker
Pre-Shower
e+,e- em clus track >> mip
γγγγ Em clus no track >> mip
µµµµ Mip µ track track
ττττ Had clus. track
νννν Miss ET
Jets Had clus.
HeavyFlavor
Had clus µ track displacedvertex
Detectors provide geometric objectsCombined by trigger systems to formphysics objects
Trigger decisions: count objectsmeasure object propertiesmeasure object relations
J. Linnemann, MSU 4
RunII Trigger SystemTrigger components and stages
Detectors
FrameworkDAQ
Tape
10 KhZ
1 KhZ
50 HzL3 Farm
Level 2
L2 Global
Level 1
Prec
isio
n re
adou
t
Fast readout128 trigger bits
Branching: 128 to N triggers
Streaming
outputrate
pre-processors
J. Linnemann, MSU 5
Trigger Organization
L2FW:Combined objects (e, µ, j)
L1FW: towers, tracks, correlations
L1CAL
L2STT
GlobalL2L2CTT
L2PS
L2Cal
L1CTT
L2Muon
L1Muon
L1FPD
Detector L1 Trigger L2 Trigger7.5 MHz 5-10 kHz
CAL
FPSCPS
CFT
SMT
Muon
FPD
1000 Hz
J. Linnemann, MSU 6
Level 2 Requirements• First event wide trigger decision• Combine sub-detector information into physics objects
(e, jets, µ, missing ET)• 10 kHz input rate � 100 µµµµsec time budget• 1 kHz accept rate � 90% rejection• Less than 5% deadtime• Event order must be maintained!• 16 front-end buffers for events awaiting decisions• Hardware support for monitoring and diagnosis
– And VME readback of all download, status
J. Linnemann, MSU 7
Stochastic Pipeline
• 2-Stage pipeline: Preprocessor and Global– Preprocessors use detector-specific L1 data– Global processor combines detectors
• Triggers map 1-to-1 with L1 (128 bits)• Each trigger programmable (physics objects and cuts)
• Full set of buffers (16) between stages– Busy raised by front ends not L2
• Readout driven by hardware framework• Muon, STT have more pipeline stages • Design verified by queueing simulations
J. Linnemann, MSU 8
Parallelism
• Preprocessors – Muon, tracking, and PS use “geographic parallelism”– Calorimeter preprocessor
• Find EM objects, jets and calculate missing ET in parallel• EM/jet/Missing ET event synchronous (“lockstep”)• Can the processing be done in event asynchronous way?
• Global– Parallel algorithm?
• Need a highly parallel algorithm or suffer from Amdahl’s law
– Global farm?• Must report answers in same order as arrival
J. Linnemann, MSU 9
DPM
VME to L3
Level 2 Crate (9U VME for Physics)
• Pre-processors and global have similar crates• Data processing is done by
– 500 MHz Alpha CPU cards (=25,000 cycles/event)– or processor specific hardware
MagicBus
VMEbus
custom processors
Slots for αα ααor
αα ααW
orker
Magic Bus Trans
Processors
αα ααAdm
inistrator
(J3)
J. Linnemann, MSU 10
Data Flow• In, Out by serial lines: 16MB/s Hotlinks, 106MB/s G-links
• MBT broadcasts to Alphas 320MB/s Magic Bus
(Muon PP only, DSPs)
Magic Bus Transceiver
Worker
Administrator
Mbus Broadcast
Level 3
(pre-processors only)
L2 Global
VBD
Detector L1 Source SLICtranslator
J. Linnemann, MSU 11
Alpha Processor Card• Based on DEC PC164 board
– Joint DØ/CDF Project• Contains:
– 500MHz 21164 Alpha CPU 2-4 instructions/cycle– 128Mb main memory– 267MB/s PCI
• VME interface• 320MB/s MBus interfaces (DMA+PIO) (all I/O > 1KHz)• Custom Backplane control lines, interrupts• Ethernet, EIDE Hard Drive (Linux)• 32 channel ECL output port (monitoring)
• Two functions• administrator controls and manages workers• workers process data
J. Linnemann, MSU 12
Other Level 2 Cards• Magic Bus Transceiver (MBT) 8 in, 2 out
– sends pre-processor outputs to global crate (Hotlinks)– broadcasts incoming pre-processor data (Hotlinks) to alphas (MBus)– interfaces with Hardware Trigger Framework
• SLIC (muon system) 16 in, 2 out– 5 DSPs, 1 master and 4 workers– does basic formatting and sorting of data
• Fiber Input Converter (FIC)– translates G-link to Hotlink for L1 Trigger
• Serial Fanout (SFO) 1-6 in, 12 out– fans out 160 MHz Hotlinks (including test stand data copies)
• Cable Input Converter (CIC) 12 in, 12 out– recovers signals for muon processor inputs
•• Bit3 VMEBit3 VME--PC controllerPC controller PCI/fiber in, VME/dualport– monitoring and initialization commercial cardcommercial card
•• VME Buffer Driver (VBD)VME Buffer Driver (VBD)– reads and stores data to send to Level 3 Run I Legacy cardRun I Legacy card
J. Linnemann, MSU 13
Alpha Environment• Two environments used:
– Alpha Linux for most running, debugging• modified Linux low level interrupt handler; physical addresses• turn off Linux while running--unless crash returns to debugger • run test software to debug crate cards in situ
– bare system still used for some low level testing• C++ but carefully---
– No dynamic memory during event loop– No RTTI, STL– Restricted use of virtual functions– Data I/O done for user
• No user I/O except via error logger during running– Run user code unaltered in simulator
• Relink with simulation version of interfaces
J. Linnemann, MSU 14
CPU
1 2
spare
3
VBD
4 5 6 7
STC
8
STC
9
STC
10
STC
11
STC
12
STC
13
FRC
14
STC
15
STC
16
TFC
20
TFC
1918
STC
17 21
spare
spare
spare
terminator
spare
terminator
Sector 1 Sector 2
L2STT Crate
LVDS serial links for communication between boards
J. Linnemann, MSU 15
Motherboardmain logic
L3 buffer control
link tx/rx
PCI busses for onboard
communication
SCL receiver
J. Linnemann, MSU 16
STT Card Flavors• Fiber Road Card
– fan out L1 tracker data– manage L3 buffers– arbitrate VME bus– FPGA based
• Silicon Trigger Card– preprocess Si data– associate hits with L1 tracks– FPGA based
• LVDS Rx, Tx cards– LVDS to PCI (132MB/s)– Input:
• Event building, buffering– Output:
• fanout
• Track Fit Card– fit trajectory to hits– DSP based; C program
• CPU (68K)– initialization– downloading– monitoring– Resets– VxWorks
• VBD (legacy)– L3 readout
J. Linnemann, MSU 17
Ready to run in Ready to run in 2001!2001!STT upgrade 1STT upgrade 1stst yearyear
Current Status• Baseline boards in production (or done)• Online Software (all under release control)
– Administrator, worker common code being tested• Similar stage with DSP code for SLIC cards
– Alpha device drivers written, being tuned– global script runner and example filters written– Most preprocessor algorithms in simulator
• Installation and commissioning has begun– Vertical slice verification now– Test stand running– Installing crates and cabling now– Baseline system test in fall 2000
J. Linnemann, MSU 18
Queuing Simulations
• Use RESQ package from IBM• Baseline system meets system requirements if:
– Median Preprocessor time of roughly 50 µsec– Median Global time of roughly 50 µsec– Avoid long tails in processing time– Buffers placed between all elements
• Concerns:– Correlation between Cal Workers– Retreat paths for Cal and Global
J. Linnemann, MSU 19
Deadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM time µs
Perce
nt de
adtim
e
50 µs jet process time40 µs jet process time30 µs jet process time20 µs jet process timeIdentical EM/Jet process time
0
2
4
6
8
10
12
0 10 20 30 40 50
Calorimeter PreprocessorMissing ET: (nearly) constant time 45 µsecEM, jet: vary between 20 and 50 µsec
% Deadtime for Event synchronous vs event asynchronous processing
Deadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM time µs
Perce
nt de
adtim
e50 µs jet process time40 µs jet process time30 µs jet process time20 µs jet process timeIdentical EM/Jet process time
0
2
4
6
8
10
12
0 10 20 30 40 50
J. Linnemann, MSU 20
More Global Workers?• Parallel algorithm suffers from Amdahl’s law• Additional workers
– event synchronous mode (lockstep)– event asynchronous mode (non-lockstep)
Deadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor time
Single global worker
Dual global workers, lockstepDual global workers, non-lockstep
µs
perc
ent d
eadt
ime
0
5
10
15
20
25
30
35
40
45
40 60 80 100 120 140 160
J. Linnemann, MSU 21
Processing of SMT Data• bad strip mask
– zero amplitude of flagged strips• pedestal/gain calibration
– chip-by-chip lookup table• clustering algorithm
– similar to offline, use 5 strips for centroid
cluster
centroid
strippuls
e he
ight
J. Linnemann, MSU 22
Track Fit Algorithm
• require hits in – at least 3 of the 4 SMT layers– same 30 degree sector– at most 2 adjacent barrels
• choose hits – closest to trajectory defined by CFT and beam
• linearized χ2 fit
J. Linnemann, MSU 23
STT Performance
• generate helical tracks• intersect with
– ideal detector geometry– nominal assembly tolerance– nominal + tent distortions
• reconstruct with ideal detector geometry• impact parameter resolution includes
– multiple scattering (50 µm GeV/pT)– beam spot size (30 µm)
J. Linnemann, MSU 24
Performance
2ε3 –(ε3)2 ε∝ f3f
10%7.5%4.5%12%9%
4.5%14%10%
4.5%
background pass rate
0.910.96120 µm11tent0.910.96108 µm4.6nominal1.001.0096 µm1ideal
1.5
0.880.94110 µm19tent0.880.9494 µm8nominal1.001.0076 µm1ideal
3.0
0.880.95100 µm30tent0.900.9586 µm11nominal1.001.0068 µm1ideal
∞
relative bb eff
relative b efficiency
impact par cut (2σ)
relative rate
geometrypT(GeV)
assume: luminous region =22 cm, three tracks with S>2, impact parameter of b-tracks =250 µm
Top Related