
DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM

I. Kisel (for the CBM Collaboration)

GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt, Germany
E-mail: [email protected]

Open Charm Event Selection

D± (cτ = 312 µm): D+ → K- π+ π+ (9.5%)
D0 (cτ = 123 µm): D0 → K- π+ (3.8%), D0 → K- π+ π+ π- (7.5%)
Ds± (cτ = 150 µm): Ds+ → K+ K- π+ (5.3%)
Λc+ (cτ = 60 µm): Λc+ → p K- π+ (5.0%)

No simple trigger primitive, like high pt, available to tag events of interest. The only selective signature is the detection of the decay vertex.
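To give a feeling for why the decay vertex is such a demanding signature, the short sketch below converts the cτ values listed above into mean lab-frame decay lengths using L = (p / m) · cτ. The masses are approximate PDG values and the momentum of 5 GeV/c is an arbitrary illustrative choice; neither is taken from the poster, and the function name is hypothetical.

```python
# c*tau values (micrometres) come from the decay list above.
C_TAU_UM = {"D+": 312.0, "D0": 123.0, "Ds+": 150.0, "Lambda_c+": 60.0}

# ASSUMPTION: approximate PDG masses in GeV/c^2, not given on the poster.
MASS_GEV = {"D+": 1.870, "D0": 1.865, "Ds+": 1.968, "Lambda_c+": 2.286}


def mean_decay_length_um(particle, momentum_gev):
    """Mean lab-frame decay length L = (p / m) * c*tau, in micrometres."""
    beta_gamma = momentum_gev / MASS_GEV[particle]
    return beta_gamma * C_TAU_UM[particle]


# ASSUMPTION: 5 GeV/c is a hypothetical momentum chosen only for illustration.
for name in C_TAU_UM:
    print(name, round(mean_decay_length_um(name, 5.0)), "um")
```

Under these assumptions the mean decay lengths stay well below a few millimetres, which is why the vertex has to be resolved with precise tracking rather than with a simple trigger primitive.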

Very efficient tracking algorithms are essential for the feasibility of the open charm event selection.

Up to 10^9 tracks/sec in the Silicon Tracker.

Develop algorithms which exploit the full potential of modern processors. First step:
- use SIMD instructions
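The idea behind this first step can be modelled in a few lines. The sketch below is only a conceptual model: Python does not issue SIMD instructions here, and the `Packed4` class and `extrapolate` function are hypothetical names introduced for illustration. It shows how the same arithmetic expression can be written once and applied either to a single value or to four values processed in lockstep, as a 4-wide SIMD register would do.

```python
import operator


class Packed4:
    """Four values handled in lockstep, mimicking one 4-wide SIMD register."""

    WIDTH = 4

    def __init__(self, lanes):
        self.lanes = tuple(float(v) for v in lanes)
        if len(self.lanes) != self.WIDTH:
            raise ValueError("expected exactly 4 lanes")

    def _apply(self, other, op):
        # A plain number is broadcast to all four lanes.
        if isinstance(other, Packed4):
            rhs = other.lanes
        else:
            rhs = (float(other),) * self.WIDTH
        return Packed4(op(a, b) for a, b in zip(self.lanes, rhs))

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    __radd__ = __add__  # addition and multiplication are commutative
    __rmul__ = __mul__


def extrapolate(x, tx, dz):
    """Straight-line extrapolation; identical code for floats and Packed4."""
    return x + tx * dz


one_track = extrapolate(1.0, 0.5, 2.0)
four_tracks = extrapolate(Packed4([1.0, 2.0, 3.0, 4.0]),
                          Packed4([0.5, 0.5, -1.0, 0.0]), 2.0)
```

The design point is that the algorithm itself is unchanged; only the data type decides whether one or four tracks are processed per operation.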

Best results were obtained with a Cellular Automaton based track finder with integrated Kalman filter track fit

allows the use of double-sided strip detectors even at high track densities

highly optimized code
- field approximated by polynomials
- compact, cache-efficient data
- most calculations SIMDized
- fast on standard PCs
- well adapted to next-generation many-core and wide-SIMD processors
- already ported to the IBM Cell processor and NVIDIA graphics cards

very fast when only hard quasi-primary tracks are reconstructed, as needed in the online first-level event selection of open charm candidates

supports reconstruction of soft tracks down to 100 MeV/c, as needed in the offline analysis
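The Kalman filter track fit mentioned above can be illustrated with a deliberately simplified example. The sketch below fits a straight line x(z) = x0 + tx · z to hits in a field-free region, with a two-parameter state (x, tx) and a symmetric 2×2 covariance matrix. This is an assumption made for brevity: the real fit works in a non-homogeneous magnetic field with a larger state vector, and the function name, hit format and starting covariance are hypothetical.

```python
def kalman_line_fit(hits, sigma):
    """Fit x(z) = x0 + tx * z to (z, x) hits; return (x, tx) at the last hit.

    sigma is the measurement error of each hit position.
    """
    z_prev = hits[0][0]
    x, tx = 0.0, 0.0               # uninformative starting state
    c00, c01, c11 = 1e6, 0.0, 1e6  # large initial covariance
    var = sigma * sigma

    for z, measured in hits:
        # Prediction: propagate state and covariance over the step dz.
        dz = z - z_prev
        x += tx * dz
        c00 += dz * (2.0 * c01 + dz * c11)
        c01 += dz * c11

        # Filtering: weigh the prediction against the measured position.
        s = c00 + var
        k0, k1 = c00 / s, c01 / s  # Kalman gain
        residual = measured - x
        x += k0 * residual
        tx += k1 * residual
        c11 -= k1 * c01            # uses c01 before it is updated
        c01 -= k0 * c01
        c00 -= k0 * c00
        z_prev = z

    return x, tx


# Hypothetical hits lying exactly on x = 1 + 0.5 * z.
demo_hits = [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0), (3.0, 2.5)]
x_fit, tx_fit = kalman_line_fit(demo_hits, sigma=0.1)
```

Each hit is added with a fixed, small number of arithmetic operations and no matrix inversion, which is the property that makes this kind of fit a natural candidate for SIMD processing.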

High Speed Tracking Algorithms

Source: CBM Progress Report, 2008.

[Figure: Cell: heterogeneous multi-core]
[Figure: Optimization steps for the track fit routine]
[Figure: Performance on different platforms]
[Figure: CPU time for track reconstruction and fit]
[Figure: Typical reconstructed Au+Au collision]
[Figure: Concept of SIMD]
[Plot labels: Intel P4, Cell, lxg1411, eh102, blade11bc4]

R&D Roadmap

Detailed simulation and co-optimization of the tracking system and the analysis algorithms

- alternate sensor types (single-sided sensors)

- alternate module layouts

Detailed studies of event selection algorithms

- open charm selector covering all relevant channels (D0,D±,Ds,Λc)

- design of multi-level event selection

Mathematical and computational optimization of all algorithms

Determine best platform for:
- Hit/Cluster finding
- Tracklet finding
- Tracking/Vertexing

Go beyond SIMDization (from scalars to vectors)

Address MIMDization (multi-threads, multi-cores and many-core systems)

Exploit the numerical throughput of dedicated purpose processors like GPUs (Graphics Processors)

Be ready for the emerging heterogeneous many-core systems

Re-design algorithms to run efficiently on all CPU/GPU architectures

Investigate new languages for the performance-critical core of algorithms, like OpenCL, Ct or CUDA
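The step from SIMD to MIMD in the roadmap above can be pictured as running independent events on several workers at once. The sketch below shows only that structure, using Python's standard thread pool. The event format (a list of transverse momenta), the selection cut and both function names are hypothetical, and CPython threads do not run such numerical code truly in parallel, so this is an illustration of the decomposition rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hard_tracks(event, pt_min=1.0):
    """Count tracks in one event at or above a hypothetical pt threshold."""
    return sum(1 for pt in event if pt >= pt_min)


def process_events(events, workers=4):
    """Process independent events on a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_hard_tracks, events))


# Hypothetical events, each a list of track pt values in GeV/c.
demo_events = [[0.2, 1.5, 3.0], [0.1, 0.4], [2.2, 1.0, 0.9, 5.1]]
demo_counts = process_events(demo_events)
```

Because events do not depend on each other, this level of parallelism combines with SIMD processing inside each event rather than competing with it.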

[Diagram: processor platforms]
- CPU/GPU: AMD Fusion
- Gaming: STI Cell
- GP CPU: Intel Larrabee
- GP GPU: Nvidia Tesla
- CPU: Intel XXX-cores
- FPGA: Xilinx
OpenCL?

CPU: SIMD, multi-core
GPU: Controller plus many ALU

Deutsche Physikalische Gesellschaft e.V.

Bochum 09

Tracking Challenge

Fixed-target heavy-ion experiment
10^7 Au+Au collisions/sec
~ 1000 charged particles/collision
Non-homogeneous magnetic field
Double-sided strip detectors
Track reconstruction in STS/MVD and displaced vertex search required in the first trigger level
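The track reconstruction required above is done, according to this poster, with a Cellular Automaton based track finder. As a rough illustration of that idea, the toy sketch below builds segments between hits on neighbouring stations, gives each segment a level equal to the length of the longest compatible chain ending in it, and then collects the longest chains as tracks. It assumes straight tracks in one coordinate, and the function name, hit format and both tolerances are hypothetical. A real cellular automaton updates all segments iteratively; the single ordered pass used here gives the same levels only because stations are processed from upstream to downstream.

```python
def find_tracks(hits, max_slope=2.0, max_kink=0.5):
    """Toy track finder. hits: list of (station, x); returns lists of hit indices."""
    # 1. Build segments between hits on neighbouring stations.
    segments = [(i, j, xj - xi)
                for i, (si, xi) in enumerate(hits)
                for j, (sj, xj) in enumerate(hits)
                if sj == si + 1 and abs(xj - xi) <= max_slope]
    segments.sort(key=lambda seg: hits[seg[0]][0])  # upstream segments first

    # 2. Level of a segment = length of the best chain of compatible
    #    neighbours (shared hit, similar slope) that ends in it.
    level = [1] * len(segments)
    parent = [None] * len(segments)
    for b, (b_left, _, b_slope) in enumerate(segments):
        for a, (_, a_right, a_slope) in enumerate(segments[:b]):
            compatible = a_right == b_left and abs(a_slope - b_slope) <= max_kink
            if compatible and level[a] + 1 > level[b]:
                level[b] = level[a] + 1
                parent[b] = a

    # 3. Collect tracks, longest chains first; each hit is used only once.
    used, tracks = set(), []
    for b in sorted(range(len(segments)), key=lambda s: -level[s]):
        chain, s = [segments[b][1]], b
        while s is not None:
            chain.append(segments[s][0])
            s = parent[s]
        if not used.intersection(chain):
            used.update(chain)
            tracks.append(chain[::-1])
    return tracks


# Two hypothetical straight tracks crossing four stations.
demo_hits = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0),
             (0, 10.0), (1, 9.0), (2, 8.0), (3, 7.0)]
demo_tracks = find_tracks(demo_hits)
```

All decisions are local (a segment only looks at its direct neighbours), which is what makes this class of algorithm attractive for parallel hardware.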

[Figure: Scalability on Intel multi-core CPUs]
[Figure: Porting to NVIDIA CUDA]

Cores, HW Threads, SIMD width:

N_speed-up = N_cores * (N_threads / 2) * W_SIMD
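As a worked example of the speed-up estimate, the sketch below evaluates the formula at face value. The poster does not define its terms beyond the labels "Cores", "HW Threads" and "SIMD width", so the parameter meanings, the function name and the example machine (8 cores, 2 hardware threads, SIMD width 4) are assumptions chosen for illustration.

```python
def ideal_speedup(n_cores, n_threads, simd_width):
    """Evaluate N_speed-up = N_cores * (N_threads / 2) * W_SIMD."""
    return n_cores * (n_threads / 2) * simd_width


# ASSUMPTION: a hypothetical machine, not one of the platforms on the poster.
demo_speedup = ideal_speedup(n_cores=8, n_threads=2, simd_width=4)
print(demo_speedup)
```

For this hypothetical machine the three factors multiply to 8 * 1 * 4 = 32, an upper bound that assumes the algorithm scales perfectly along all three axes.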


First level event selection is done in a processor farm fed with data from the event building network

[Diagram: processor farm; labels: FPGA, PC, Sub-Farm]

Winner of the DPG Poster Session 2009