Download - The DØ Level 2 Trigger - Michigan State University · tent 11 120 µm 0.96 0.91 nominal 4.6 108 µm 0.96 0.91 ideal 1 96 µm 1.00 1.00 1.5 tent 19 110 µm 0.94 0.88 nominal 8 94

J. Linnemann, MSU 1

The DØ Level 2Trigger

James T. LinnemannMichigan State University

For the DØ L2 Trigger Group(10 institutions!)

DPF Aug 12, 2000

Columbus, Ohio

J. Linnemann, MSU 2

RunII at the TevatronAccelerator upgrades will

produce >30 times the instantaneous

p-pbar beam luminosity of the previous run

Run I (1992-6)Operation

Run II (2000 +)Operation

Detector Input

Trigger Selection

Data Tape

7.5 MHz300 kHz

3 Hz 50 Hz

J. Linnemann, MSU 3

Trigger Strategy

Calor. Muon SiliconVertex

Cent.Tracker

Pre-Shower

e+,e- em clus track >> mip

γγγγ Em clus no track >> mip

µµµµ Mip µ track track

ττττ Had clus. track

νννν Miss ET

Jets Had clus.

HeavyFlavor

Had clus µ track displacedvertex

Detectors provide geometric objectsCombined by trigger systems to formphysics objects

Trigger decisions: count objectsmeasure object propertiesmeasure object relations

J. Linnemann, MSU 4

RunII Trigger SystemTrigger components and stages

Detectors

FrameworkDAQ

Tape

10 KhZ

1 KhZ

50 HzL3 Farm

Level 2

L2 Global

Level 1

Prec

isio

n re

adou

t

Fast readout128 trigger bits

Branching: 128 to N triggers

Streaming

outputrate

pre-processors

J. Linnemann, MSU 5

Trigger Organization

L2FW:Combined objects (e, µ, j)

L1FW: towers, tracks, correlations

L1CAL

L2STT

GlobalL2L2CTT

L2PS

L2Cal

L1CTT

L2Muon

L1Muon

L1FPD

Detector L1 Trigger L2 Trigger7.5 MHz 5-10 kHz

CAL

FPSCPS

CFT

SMT

Muon

FPD

1000 Hz

J. Linnemann, MSU 6

Level 2 Requirements• First event wide trigger decision• Combine sub-detector information into physics objects

(e, jets, µ, missing ET)• 10 kHz input rate � 100 µµµµsec time budget• 1 kHz accept rate � 90% rejection• Less than 5% deadtime• Event order must be maintained!• 16 front-end buffers for events awaiting decisions• Hardware support for monitoring and diagnosis

– And VME readback of all download, status

J. Linnemann, MSU 7

Stochastic Pipeline

• 2-Stage pipeline: Preprocessor and Global– Preprocessors use detector-specific L1 data– Global processor combines detectors

• Triggers map 1-to-1 with L1 (128 bits)• Each trigger programmable (physics objects and cuts)

• Full set of buffers (16) between stages– Busy raised by front ends not L2

• Readout driven by hardware framework• Muon, STT have more pipeline stages • Design verified by queueing simulations

J. Linnemann, MSU 8

Parallelism

• Preprocessors – Muon, tracking, and PS use “geographic parallelism”– Calorimeter preprocessor

• Find EM objects, jets and calculate missing ET in parallel• EM/jet/Missing ET event synchronous (“lockstep”)• Can the processing be done in event asynchronous way?

• Global– Parallel algorithm?

• Need a highly parallel algorithm or suffer from Amdahl’s law

– Global farm?• Must report answers in same order as arrival

J. Linnemann, MSU 9

DPM

VME to L3

Level 2 Crate (9U VME for Physics)

• Pre-processors and global have similar crates• Data processing is done by

– 500 MHz Alpha CPU cards (=25,000 cycles/event)– or processor specific hardware

MagicBus

VMEbus

custom processors

Slots for αα ααor

αα ααW

orker

Magic Bus Trans

Processors

αα ααAdm

inistrator

(J3)

J. Linnemann, MSU 10

Data Flow• In, Out by serial lines: 16MB/s Hotlinks, 106MB/s G-links

• MBT broadcasts to Alphas 320MB/s Magic Bus

(Muon PP only, DSPs)

Magic Bus Transceiver

Worker

Administrator

Mbus Broadcast

Level 3

(pre-processors only)

L2 Global

VBD

Detector L1 Source SLICtranslator


Alpha Processor Card• Based on DEC PC164 board

– Joint DØ/CDF Project• Contains:

– 500MHz 21164 Alpha CPU 2-4 instructions/cycle– 128Mb main memory– 267MB/s PCI

• VME interface• 320MB/s MBus interfaces (DMA+PIO) (all I/O > 1KHz)• Custom Backplane control lines, interrupts• Ethernet, EIDE Hard Drive (Linux)• 32 channel ECL output port (monitoring)

• Two functions• administrator controls and manages workers• workers process data


Other Level 2 Cards• Magic Bus Transceiver (MBT) 8 in, 2 out

– sends pre-processor outputs to global crate (Hotlinks)– broadcasts incoming pre-processor data (Hotlinks) to alphas (MBus)– interfaces with Hardware Trigger Framework

• SLIC (muon system) 16 in, 2 out– 5 DSPs, 1 master and 4 workers– does basic formatting and sorting of data

• Fiber Input Converter (FIC)– translates G-link to Hotlink for L1 Trigger

• Serial Fanout (SFO) 1-6 in, 12 out– fans out 160 MHz Hotlinks (including test stand data copies)

• Cable Input Converter (CIC) 12 in, 12 out– recovers signals for muon processor inputs

•• Bit3 VMEBit3 VME--PC controllerPC controller PCI/fiber in, VME/dualport– monitoring and initialization commercial cardcommercial card

•• VME Buffer Driver (VBD)VME Buffer Driver (VBD)– reads and stores data to send to Level 3 Run I Legacy cardRun I Legacy card


Alpha Environment• Two environments used:

– Alpha Linux for most running, debugging• modified Linux low level interrupt handler; physical addresses• turn off Linux while running--unless crash returns to debugger • run test software to debug crate cards in situ

– bare system still used for some low level testing• C++ but carefully---

– No dynamic memory during event loop– No RTTI, STL– Restricted use of virtual functions– Data I/O done for user

• No user I/O except via error logger during running– Run user code unaltered in simulator

• Relink with simulation version of interfaces


CPU

1 2

spare

3

VBD

4 5 6 7

STC

8

STC

9

STC

10

STC

11

STC

12

STC

13

FRC

14

STC

15

STC

16

TFC

20

TFC

1918

STC

17 21

spare

spare

spare

terminator

spare

terminator

Sector 1 Sector 2

L2STT Crate

LVDS serial links for communication between boards


Motherboardmain logic

L3 buffer control

link tx/rx

PCI busses for onboard

communication

SCL receiver


STT Card Flavors• Fiber Road Card

– fan out L1 tracker data– manage L3 buffers– arbitrate VME bus– FPGA based

• Silicon Trigger Card– preprocess Si data– associate hits with L1 tracks– FPGA based

• LVDS Rx, Tx cards– LVDS to PCI (132MB/s)– Input:

• Event building, buffering– Output:

• fanout

• Track Fit Card– fit trajectory to hits– DSP based; C program

• CPU (68K)– initialization– downloading– monitoring– Resets– VxWorks

• VBD (legacy)– L3 readout


Ready to run in Ready to run in 2001!2001!STT upgrade 1STT upgrade 1stst yearyear

Current Status• Baseline boards in production (or done)• Online Software (all under release control)

– Administrator, worker common code being tested• Similar stage with DSP code for SLIC cards

– Alpha device drivers written, being tuned– global script runner and example filters written– Most preprocessor algorithms in simulator

• Installation and commissioning has begun– Vertical slice verification now– Test stand running– Installing crates and cabling now– Baseline system test in fall 2000


Queuing Simulations

• Use RESQ package from IBM• Baseline system meets system requirements if:

– Median Preprocessor time of roughly 50 µsec– Median Global time of roughly 50 µsec– Avoid long tails in processing time– Buffers placed between all elements

• Concerns:– Correlation between Cal Workers– Retreat paths for Cal and Global


Deadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM time µs

Perce

nt de

adtim

e

50 µs jet process time40 µs jet process time30 µs jet process time20 µs jet process timeIdentical EM/Jet process time

0

2

4

6

8

10

12

0 10 20 30 40 50

Calorimeter PreprocessorMissing ET: (nearly) constant time 45 µsecEM, jet: vary between 20 and 50 µsec

% Deadtime for Event synchronous vs event asynchronous processing

Deadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM timeDeadtime vs Average EM time µs

Perce

nt de

adtim

e50 µs jet process time40 µs jet process time30 µs jet process time20 µs jet process timeIdentical EM/Jet process time

0

2

4

6

8

10

12

0 10 20 30 40 50


More Global Workers?• Parallel algorithm suffers from Amdahl’s law• Additional workers

– event synchronous mode (lockstep)– event asynchronous mode (non-lockstep)

Deadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor timeDeadtime vs processor time

Single global worker

Dual global workers, lockstepDual global workers, non-lockstep

µs

perc

ent d

eadt

ime

0

5

10

15

20

25

30

35

40

45

40 60 80 100 120 140 160


Processing of SMT Data• bad strip mask

– zero amplitude of flagged strips• pedestal/gain calibration

– chip-by-chip lookup table• clustering algorithm

– similar to offline, use 5 strips for centroid

cluster

centroid

strippuls

e he

ight


Track Fit Algorithm

• require hits in – at least 3 of the 4 SMT layers– same 30 degree sector– at most 2 adjacent barrels

• choose hits – closest to trajectory defined by CFT and beam

• linearized χ2 fit


STT Performance

• generate helical tracks• intersect with

– ideal detector geometry– nominal assembly tolerance– nominal + tent distortions

• reconstruct with ideal detector geometry• impact parameter resolution includes

– multiple scattering (50 µm GeV/pT)– beam spot size (30 µm)


Performance

2ε3 –(ε3)2 ε∝ f3f

10%7.5%4.5%12%9%

4.5%14%10%

4.5%

background pass rate

0.910.96120 µm11tent0.910.96108 µm4.6nominal1.001.0096 µm1ideal

1.5

0.880.94110 µm19tent0.880.9494 µm8nominal1.001.0076 µm1ideal

3.0

0.880.95100 µm30tent0.900.9586 µm11nominal1.001.0068 µm1ideal

∞

relative bb eff

relative b efficiency

impact par cut (2σ)

relative rate

geometrypT(GeV)

assume: luminous region =22 cm, three tracks with S>2, impact parameter of b-tracks =250 µm