Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment
-
Upload
holly-adams -
Category
Documents
-
view
21 -
download
0
description
Transcript of Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment
Many-Core ScalabilityMany-Core Scalabilityof the Online Event of the Online Event
ReconstructionReconstructionin the CBM Experimentin the CBM Experiment
Ivan KiselIvan KiselGSI, GermanyGSI, Germany
(for the CBM Collaboration)(for the CBM Collaboration)
CHEP-2010CHEP-2010Taipei, October 18-22, 2010Taipei, October 18-22, 2010
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 22/11/11
Tracking Challenge in CBM (FAIR/GSI, Germany)
• Fixed-target heavy-ion experiment• 107 Au+Au collisions/s• 1000 charged particles/collision• Non-homogeneous magnetic field• Double-sided strip detectors (85% combinatorial space points)
Track reconstruction in STS/MVD and displaced vertex search are required in the first trigger level.
Reconstruction packages:• track finding Cellular Automaton (CA)• track fitting Kalman Filter (KF)• vertexing KF Particle
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 33/11/11
Many-Core CPU/GPU ArchitecturesMany-Core CPU/GPU Architectures
2x4 cores 512 cores
1+8 cores32 cores
Intel/AMD CPU NVIDIA GPU
Intel MICA IBM Cell
Future systems are heterogeneousFuture systems are heterogeneous
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 44/11/11
Kalman Filter (KF) based Track FitKalman Filter (KF) based Track Fit
Track fit: Estimation of the track parameters at one or more hits along the track – Kalman-Filter (KF)
Detector layersHits
(r, C)
r – Track parameters C – Precision
Initialising
Prediction
Correction
Precision
11
22
33
r = { x, y, z, px, py, pz }
Position, direction and momentum
State vector
Nowadays the Kalman Filter is used in almost all HEP experiments
Nowadays the Kalman Filter is used in almost all HEP experiments
Kalman Filter: 1.Start with an arbitrary initialization.2.Add one hit after another. 3.Improve the state vector. 4.Get the optimal parameters after the last hit.
KF as a recursive least squares method
KF Block-diagram 11
22 33
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 55/11/11
Performance of the KF Track Fit on CPU/GPU Systems
scalar double single ->2 4 8 16 32
1.00
10.00
0.10
0.01
Scalability on different CPU architectures – speed-up 100
Data Stream Parallelism(10x)
Task Level Parallelism(100x)
2xCell SPE (16 )
Woodcrest ( 2 )Clovertown ( 4 )
Dunnington ( 6 )
SIMD Cores and Threads
Tim
e/T
rack
, s
Threads
Cores
Threads
SIMD
Real-time performance on different Intel CPU platforms Real-time performance on NVIDIA GPU graphic cards
Scalabilty
CPUGPU
The Kalman Filter algorithm performs at ns levelThe Kalman Filter algorithm performs at ns level
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 66/11/11
Scalability of the KF Track Fit on Scalability of the KF Track Fit on 2xNehalem = 8 Cores2xNehalem = 8 Cores
Aiming to develop a scheduler, we investigate scalability and CPU load.
Given n threads each filled with 10m tracks, run them on specific n logical cores with 1 thread per 1 logical core.
Each physical core of Nehalem has 2 HW threads (read/exe), which are considered by the operating system as logical cores.
For small groups of tracks the overhead becomes significant, while large groups of tracks use CPU more efficient.
Measure tracks throughput rather than time per track.
Core 1Core 2
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 77/11/11
Cellular Automaton (CA) as Track FinderCellular Automaton (CA) as Track Finder
Track finding: Wich hits in detector belong to the same track? – Cellular Automaton (CA)
0. Hits
1. Segments
1 2 3 42. Counters
3. Track Candidates
4. Tracks
Cellular Automaton:• local w.r.t. data• intrinsically parallel• extremely simple• very fast
Perfect for many-core CPU/GPU !
Cellular Automaton:• local w.r.t. data• intrinsically parallel• extremely simple• very fast
Perfect for many-core CPU/GPU !
Detector layers
Hits
4. Tracks (CBM)
0. Hits (CBM)
1000 Hits
1000 Tracks
Cellular Automaton:1.Build short track segments.2.Connect according to the track model, estimate a possible position on a track.3.Tree structures appear, collect segments into track candidates.4.Select the best track candidates.
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 88/11/11
CBM Cellular Automaton Track Finder770
TracksTop view Front view
Efficiency Scalability
Highly efficient reconstruction of 150 central collisions per second
Highly efficient reconstruction of 150 central collisions per second
Intel X5550, 2x4 cores at 2.67 GHz
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 99/11/11
CA Track Finder: First 4D Version
The beam will have no bunch structure, therefore 4D measurements (x, y, z, t).
• 100 mbias events have been divided into groups, each of them has been processed as a single event.• Each hit had an integer index (equal to the event number, later will represent the time measurement t ).• Each tracklet was constructed from hits with the same t.
The same efficiency. Slight increase of the processing time with larger time slices.
Reconstruction of time slices rather then events will be needed.
Ready for more realistic 4D simulations
Ready for more realistic 4D simulations
Events/group Speed, ev/s
1 62
2 67
4 62
5 61
10 54
20 47
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 1010/11/11
Consolidate Efforts: International Tracking Workshop
45 participants from Austria, China, Germany, India, Italy, Norway, Russia, Switzerland, UK and USA
Follow-up Workshop:10-11 March, 2011,
CERNSverre Jarp
Follow-up Workshop:10-11 March, 2011,
CERNSverre Jarp
21 October 2010, CHEP-201021 October 2010, CHEP-2010 Ivan Kisel, GSIIvan Kisel, GSI 1111/11/11
SummarySummary
• CBM event reconstruction is at the ms levelCBM event reconstruction is at the ms level• Strong many-core scalability Strong many-core scalability • Data parallelism vs. task parallelismData parallelism vs. task parallelism• Use SIMD unit, multi-threading, many-coresUse SIMD unit, multi-threading, many-cores• Parallelization is now becoming a standard in the CBM experimentParallelization is now becoming a standard in the CBM experiment