Heavy Ion Group
Andrzej Olszewski, September 13, 2010
Heavy Ion Group
Software & Computing, Data Distribution
Heavy Ion Collisions
• Large spread in event multiplicity from the change of impact parameter
• Collision geometry mostly produces peripheral (low-multiplicity) events
• But the highest multiplicities may reach 5x the average
• A central event comes randomly about every 30 collisions
• In production of detector simulations this is aggravated by event ordering, which places the most central events together at the end of production jobs
[Figure labels: Glauber; Central collision; HIJING; Peripheral collision]
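The dominance of peripheral events follows from geometry alone: the probability of an impact parameter b grows linearly with b, so small-b (central) collisions are rare. The sketch below illustrates only this point. It is not a Glauber calculation, the sharp cutoff `B_MAX_FM` is a hypothetical value, and "central" is simply defined here as the 1-in-30 fraction quoted on the slide.

```python
import math
import random

def sample_impact_parameter(b_max, rng):
    """Draw b with probability density proportional to b on [0, b_max]."""
    # Inverse-transform sampling: the cumulative distribution is (b / b_max)**2.
    return b_max * math.sqrt(rng.random())

def central_fraction(b_cut, b_max):
    """Geometric fraction of collisions with impact parameter below b_cut."""
    return (b_cut / b_max) ** 2

B_MAX_FM = 16.0  # hypothetical cutoff in fm, for illustration only
# Impact parameter below which one collision in 30 falls.
B_CENTRAL_FM = B_MAX_FM / math.sqrt(30.0)

rng = random.Random(2010)
n_events = 30000
n_central = sum(
    sample_impact_parameter(B_MAX_FM, rng) < B_CENTRAL_FM
    for _ in range(n_events)
)
observed_fraction = n_central / n_events
```

Under these assumptions the "central" cut sits at roughly 18% of the maximum impact parameter, which is why such events are sparse in a randomly ordered sample.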
Distinct Features
• Collision geometry parameters
  • Heavy Ion physics results are often presented as a function of centrality
  • Can be obtained from calculations based on the Glauber model, but better to use results from Glauber-based Monte Carlo models
  • Parameters need to be preserved through the chain of simulation production
• Special features of collisions of heavy ions, bulk matter objects
  • Different physics program requires specific reconstruction algorithms
  • Different collision properties require modifications or replacement of some of the standard ATLAS reconstruction algorithms
• High particle multiplicities
  • Heavy Ion collisions produce much higher particle multiplicities than pp
  • This leads to longer reconstruction times and critically high memory usage
  • Large event size requires more disk space per event than pp data
Event Generators
• Models Hijing and Hydjet used
  • Using code from official Genser distribution
  • Athena interface implemented for both models
• Problems?
  • Hijing code had a problem on SLC5
  • Fixed in June and partially validated (high statistics validation soon)
Simulation & Digitization
• Using standard Athena tools
  • Collision parameters transferred when using HeavyIonSimConfig.py
• Problems?
  • Long time (up to 24 hours) of detector simulations of a single central event still acceptable in production on Grid at this collision energy
  • Rate of simulation permanent failures ~0.1%
Reconstruction
• Using standard Athena tools
  • Heavy Ion specific modifications activated when using HeavyIonRecConfig.py
  • Collision parameters transferred
  • Trigger algorithms selected by HI menus
  • Heavy Ion specific algorithms from HeavyIonRec used
  • Modifications in standard reconstruction algorithms activated
• Problems?
  • No production problems in rel. 15
HI Algorithms
• HIGlobal: Global variables reconstruction
  – HICentrality: Event centrality
  – HIGlobalEt: Total Et
  – HIFlow: charged particle elliptic flow v2
  – HIGlobalNSiCluster: dNch/dη based on pixel cluster density
  – HIPixelTracklets: dNch/dη based on 2-point tracklets in Pixel detector
• HIJetRec: Jet reconstruction
  – extends standard JetRec + new background subtraction and fake jet rejection
• HIPhoton: Direct photon pre-analysis
  – based on pp photon algorithms, produces special ntuple for final analysis
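For orientation, the elliptic flow coefficient v2 listed under HIFlow is the second Fourier coefficient of the azimuthal particle distribution relative to the event plane, v2 = <cos 2(φ − Ψ)>. The sketch below shows that definition on a toy set of angles. It is not the HIFlow implementation: the function names are hypothetical, and it omits the event-plane resolution correction and autocorrelation removal that a real analysis would need.

```python
import math

def event_plane_angle(phis):
    """Second-order event-plane angle estimated from the Q-vector."""
    qx = sum(math.cos(2.0 * phi) for phi in phis)
    qy = sum(math.sin(2.0 * phi) for phi in phis)
    return 0.5 * math.atan2(qy, qx)

def elliptic_flow(phis, psi):
    """v2 = <cos 2(phi - psi)> averaged over the given azimuthal angles."""
    return sum(math.cos(2.0 * (phi - psi)) for phi in phis) / len(phis)

# Toy event: four particles along the plane at 0.3 rad, two perpendicular to it.
toy_phis = [0.3, 0.3 + math.pi, 0.3, 0.3 + math.pi,
            0.3 + math.pi / 2, 0.3 - math.pi / 2]
toy_psi = event_plane_angle(toy_phis)
toy_v2 = elliptic_flow(toy_phis, toy_psi)
```

In the toy event the in-plane particles each contribute +1 and the out-of-plane particles −1, so v2 comes out as (4 − 2)/6 = 1/3; an isotropic distribution would average to zero.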
pp Algorithms
• Trigger processing using dedicated HI menus
  – several trigger menus developed by Tomasz and Iwona Grabowska-Bołd
• Tracking running in newTracking mode
  – newTracking with modified cuts activated by doHeavyIon switch
  – to lower CPU and memory requirements
  – no lowPt, no BackTracking, no conversions
• Vertexing run in simple "DefaultFastFinding" mode
  – no V0 finder, no secondary vertices
• Regular jets off
  – Heavy Ion version of jet reconstruction run instead
  – no ETMiss, no ETFlow, no BTagging, no TauRec
• Calorimeter reconstruction based on cells
  – CaloTopoClusters activated for monitoring purposes
• Muon reconstruction on
  – no MuidLowPt
Simulation Production
• Official production at energy ECM = 2.75 TeV done in recent campaign with releases 15.6.X.Y
• Description of mc09 samples
  • Hijing, Hydjet minimum bias and central samples
  • Hijing with particle flow
  • 5-10k events
• Additional Hijing requests for increased statistics and more physics samples accepted and running now (actually waiting in the queue with low priority)
• Hijing tasks
• Hydjet tasks
Real Data Production Planning
• Total amount and rate of data taking
  – fit data to available storage and computing resources
• Reconstruction properties and requirements, data types and sizes
  – required CPU time and disk space for storage of reconstruction results
• Tier0, Tier1 and Group Grid resources available
  – input for production and data distribution strategy
• Software development and installation procedures
  – deadlines, possible scenarios for running tests and production
• Production strategy
  – which resources will be used in which step
• Analysis Model
  – where the data should be distributed
CPU/Mem in rel. 15/16 Reconstruction
• Rel. 15.6.9.13 has acceptable CPU and memory consumption, with 100% reconstruction job success
• Rel. 16.0.0 reconstruction on simulations (only) exceeds the available ~4 GB memory limit in 55% of jobs
  • Reason 1: increased memory consumption between releases due to a test run with tracking from min pT = 0.5 GeV, leading to a difference of 50 MB (at lower multiplicity) up to 700 MB in the most central events!
  • Reason 2: increased memory consumption by monitoring algorithms, adding 200 MB more at high multiplicity!
  • To reduce memory usage we may look for a compromise in tracking min pT and reduce monitoring requirements, or run reconstruction on simulations without monitoring altogether.
Data/Task                                      Release    CPU (RDO2ESD)            Wall              Mem max  File size/event
                                                          init  <event>  finalize  average50 /event  [GB]     ESD   CBNT  HIST
simulation-like, reconstruction + monitoring   15.6.9.13  136   53       3.5       68                3.4      7.8   8.7   1.2
simulation-like, no truth                      15.6.9.13  133   50       3.4       59                3.1      5.6   8.4   1.2
data-like, no trigger, no truth                15.6.9.13  109   44       2.7       51                2.5      5.5   8.4   1.1
simulation-like, reconstruction + monitoring   16.0.0     averages could not be taken because 55/100 jobs finished with memory allocation error
data-like, no trigger, no truth                16.0.0     55    73       3.8       81                3.0      6.8   -     0.9
Data Reconstruction
• Reconstruction strategy at Tier0
• Performance of Heavy Ion reconstruction with monitoring in rel. 15.6.9.13
  – 45 CPU s/event (no trigger, no truth), assuming <2 min/event panda wall time
• Tier0 capacity
  – current total CPU capacity of Tier0 farm (2500 cores!)
  – efficiency of CPU use at Tier0 (~100%!)
  – no additional CPU needed for other processing (file merging, etc.) at Tier0
• Calculated throughput: 30 PbPb events/hour/CPU core x 2500 cores x 24 h = 1,800k events/day = 20.8 Hz
• Expected rate of PbPb event data taking is > 60 Hz, so additional resources are needed
• Separate data by streaming at DAQ
  – Express stream with duplicated events used for prompt calibration at Tier0
    o Construct just one physics stream and reconstruct it promptly at Tier1 sites
    o Construct two exclusive physics streams: physics.prompt and physics.bulk
      Prompt stream to be reconstructed promptly at Tier0
      Bulk stream reconstructed at Tier1s with possibly updated software
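The throughput figure above is simple arithmetic on the quoted inputs (2500 cores, about 2 minutes of wall time per event, ~100% CPU efficiency). A minimal sketch of that calculation follows; the function names are hypothetical and the 120 s wall time is the slide's upper bound, not a measured value.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

def events_per_day(n_cores, wall_s_per_event, cpu_efficiency=1.0):
    """Events a farm can reconstruct per day at a given wall time per event."""
    per_core_per_hour = SECONDS_PER_HOUR / wall_s_per_event
    return n_cores * cpu_efficiency * per_core_per_hour * 24

def cores_needed(rate_hz, wall_s_per_event, cpu_efficiency=1.0):
    """Cores required to keep up with a sustained event rate."""
    return rate_hz * wall_s_per_event / cpu_efficiency

# Inputs taken from the slide: 2500 cores, 2 min/event.
tier0_daily = events_per_day(n_cores=2500, wall_s_per_event=120)
tier0_rate_hz = tier0_daily / SECONDS_PER_DAY
# Cores that a 60 Hz data-taking rate would require under the same assumptions.
cores_for_60hz = cores_needed(rate_hz=60, wall_s_per_event=120)
```

With these inputs the sketch reproduces 1,800k events/day and 20.8 Hz, and a sustained 60 Hz would correspond to 7200 cores, nearly three times the quoted Tier0 farm, which is the motivation for splitting the load with Tier1 sites.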
Data Distribution
• Resources available
  – General ATLAS disk resources on the Grid
  – Group disk resource for PHYS-HI group
    sites: BNL, CERN, CYFRONET, WEIZMANN
    Total promised: 115 TB, used 15 TB (so far)
• Data formats and sizes
  – RAW: 2.2 MB/event
  – ESD: 5.5 MB/event (and possible dESD with reduced size)
  – D3PD: 4.0 MB/event (measured with truth) (in development, so size will change)
• Distribution strategy
  – General ATLAS disk will be used to store RAW and ESD files from official production
  – Some RAW and ESD may be partially replicated to PHYS-HI group for group tests
  – D3PD are expected to be stored on PHYS-HI resources
  – Number of versions/copies will have to be adjusted to available resources
• Problems?
  – New CYFRONET storage hardware installation delays
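To see how the per-event sizes translate into disk needs, the sketch below converts event counts to terabytes and back. It assumes decimal units (1 TB = 10^6 MB) and uses hypothetical helper names; the D3PD size is the preliminary figure from the slide and is expected to change.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 10**6 MB)

# Per-event sizes in MB, as quoted on the slide.
EVENT_SIZE_MB = {"RAW": 2.2, "ESD": 5.5, "D3PD": 4.0}

def storage_tb(n_events, fmt, copies=1):
    """Disk space in TB for n_events of a given format and number of copies."""
    return n_events * EVENT_SIZE_MB[fmt] * copies / MB_PER_TB

def events_that_fit(free_tb, fmt, copies=1):
    """Number of events of a given format that fit into free_tb of disk."""
    return int(free_tb * MB_PER_TB / (EVENT_SIZE_MB[fmt] * copies))

# Group disk: 115 TB promised, 15 TB already used.
free_group_tb = 115 - 15
d3pd_capacity = events_that_fit(free_group_tb, "D3PD")
d3pd_capacity_two_copies = events_that_fit(free_group_tb, "D3PD", copies=2)
```

Under these assumptions the remaining 100 TB of group disk holds about 25 million D3PD events in one copy, or half that with two versions, which is why the number of versions and copies has to be tuned to the available space.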
Software Readiness
• Heavy Ion reconstruction code
  – Mostly ready, expect minor changes
  – D3PD making in full development
• Validation available and running
  – HIValidation package running tests and comparing to reference
  – HIInDetValidation package testing HI mode tracking
  – Tests on MC and Data in RecJobTransforms
  – We need more tests in Tier0 configuration
• Production scripts
  – Mostly ready solutions using Reco_trf.py command
  – HeavyIonD3PDMaker ready for interface in Reco_trf.py (David Cote)
  – Tested in private production with standard monitoring and some specials for trigger and tracking validation
• Athena release schedule
  – End of October Tier0 changes releases (from reprocessing schedule)
  – we need to have all updates in by that time
Issues & Plans
• PHYS-HI group has no access to CAF area
  – we may want to run calibration (tests) outside Tier0
  – we may want to run reconstruction tests on recent raw data
  – we shall ask for access and disk quota
• Group afs space at CERN?
  – for doing common development
  – for keeping group software release?
• Group releases and official group production?
  – probably not planned this year
• Panda group queues at sites with group space
  – I would like to request such queues with higher priorities for phys-hi group production at CYFRONET and WEIZMANN
  – Could be used for increased efficiency in group production of D3PDs with latest versions of software and for (group/user) analysis jobs
BACKUP
CPU/Mem in rel. 16 reconstruction
• Rel. 16.0.0 with tracking from min pT = 0.5 GeV increased memory consumption by 500 MB!
• Tracking with min pT = 1.0 GeV still does not reduce memory use enough to keep some jobs from crashing
Data/Task                                 Task  Release    CPU (RDO2ESD)           Wall              Mem max  File size/event
                                                           init  event  finalize   average50 /event  [GB]     ESD   CBNT  HIST
Hijing 2.75 TeV + fixed jet monitoring    167   15.6.9.13  136   53     3.5        68                3.4      7.8   8.7   1.2
Hijing 2.75 TeV, no truth                 176   15.6.9.13  133   50     3.4        59                3.1      5.6   8.4   1.2
Hijing 2.75 TeV, no trigger, no truth     181   15.6.9.13  109   44     2.7        51                2.5      5.5   8.4   1.1
Hijing 2.75 TeV, full reconstruction      268   16.0.0     averages could not be taken because 55/100 jobs finished with memory allocation error
Hijing 2.75 TeV, min pT = 1 GeV           268   16.0.0     averages could not be taken because 6/100 jobs finished with memory allocation error
Hijing 2.75, no trigger                   278   16.0.0     56    84     2.1        96                3.4      9.1   -     0.9
Hijing 2.75 TeV, no trigger, no truth     290   16.0.0     55    73     3.8        81                3.0      6.8   -     0.9
CPU/Mem in rel. 16 reconstruction
nohialgs: only standard algorithms but in heavy ion mode
notrigger: standard + heavy ion algorithms
nomonitor: standard + heavy ion + trigger algorithms
full: all algorithms with monitoring
16.0.0 RDO2ESD, high midcentral, seq 412
  nohialgs   Si2k: 3900.0  <cpu>: 4257.250 ms  <vmem>: 2635.484 MB
  notrigger  Si2k: 3900.0  <cpu>: 6863.750 ms  <vmem>: 2863.396 MB
  nomonitor  Si2k: 3900.0  <cpu>: 7454.250 ms  <vmem>: 3398.312 MB
  full       Si2k: 3900.0  <cpu>: 8062.750 ms  <vmem>: 3981.224 MB

15.6.12.1 RDO2ESD, high midcentral, seq 412
  nohialgs   Si2k: 3100.0  <cpu>: 2491.250 ms  <vmem>: 1887.424 MB
  notrigger  Si2k: 3100.0  <cpu>: 5223.000 ms  <vmem>: 2219.140 MB
  nomonitor  Si2k: 3100.0  <cpu>: 7099.500 ms  <vmem>: 2825.308 MB
  full       Si2k: 3100.0  <cpu>: 7315.000 ms  <vmem>: 3130.396 MB

16.0.0 RDO2ESD, lower midcentral, seq 411
  nohialgs   Si2k: 2200.0  <cpu>: 3198.250 ms  <vmem>: 1379.540 MB
  notrigger  Si2k: 1900.0  <cpu>: 6653.000 ms  <vmem>: 1652.096 MB
  nomonitor  Si2k: 2200.0  <cpu>: 7208.500 ms  <vmem>: 2078.940 MB
  full       Si2k: 1900.0  <cpu>: 8334.250 ms  <vmem>: 2261.072 MB

15.6.9.13 RDO2ESD, lower midcentral, seq 411
  nohialgs   Si2k: 1900.0  <cpu>: 2525.500 ms  <vmem>: 1319.764 MB
  notrigger  Si2k: 2200.0  <cpu>: 5721.500 ms  <vmem>: 1595.164 MB
  nomonitor  Si2k: 1900.0  <cpu>: 6628.750 ms  <vmem>: 2068.704 MB
  full       Si2k: 2200.0  <cpu>: 6982.250 ms  <vmem>: 2281.704 MB
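Summary lines of this shape are easy to compare mechanically. The sketch below parses the `Si2k / <cpu> / <vmem>` lines for the high midcentral event (seq 412) and computes the memory difference between releases per configuration. The regular expression and helper names are assumptions made for this illustration, not part of any Athena tool.

```python
import re

# One summary line: "<config> Si2k: <n> <cpu>: <t> ms <vmem>: <m> MB"
LINE_RE = re.compile(
    r"(?P<config>\w+)\s+Si2k:\s*(?P<si2k>[\d.]+)\s+"
    r"<cpu>:\s*(?P<cpu_ms>[\d.]+)\s*ms\s+"
    r"<vmem>:\s*(?P<vmem_mb>[\d.]+)\s*MB"
)

def parse_summary(text):
    """Map configuration name -> (cpu_ms, vmem_mb) for each summary line."""
    return {
        m["config"]: (float(m["cpu_ms"]), float(m["vmem_mb"]))
        for m in LINE_RE.finditer(text)
    }

REL16_HIGH = """
nohialgs Si2k: 3900.0 <cpu>: 4257.250 ms <vmem>: 2635.484 MB
notrigger Si2k: 3900.0 <cpu>: 6863.750 ms <vmem>: 2863.396 MB
nomonitor Si2k: 3900.0 <cpu>: 7454.250 ms <vmem>: 3398.312 MB
full Si2k: 3900.0 <cpu>: 8062.750 ms <vmem>: 3981.224 MB
"""

REL15_HIGH = """
nohialgs Si2k: 3100.0 <cpu>: 2491.250 ms <vmem>: 1887.424 MB
notrigger Si2k: 3100.0 <cpu>: 5223.000 ms <vmem>: 2219.140 MB
nomonitor Si2k: 3100.0 <cpu>: 7099.500 ms <vmem>: 2825.308 MB
full Si2k: 3100.0 <cpu>: 7315.000 ms <vmem>: 3130.396 MB
"""

rel16 = parse_summary(REL16_HIGH)
rel15 = parse_summary(REL15_HIGH)

# Extra virtual memory used by rel. 16.0.0 relative to 15.6.12.1, per configuration.
vmem_increase_mb = {cfg: rel16[cfg][1] - rel15[cfg][1] for cfg in rel16}
# Memory added by monitoring alone in rel. 16.0.0 (full minus nomonitor).
monitoring_cost_mb = rel16["full"][1] - rel16["nomonitor"][1]
```

For this event the full configuration uses about 851 MB more virtual memory in 16.0.0 than in 15.6.12.1, and monitoring alone accounts for about 583 MB in 16.0.0.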
Algorithm Walltime Comparison

Algorithm                               Wall without tracking fix   Wall with tracking fix
                                        in rel. 15.6.9.13           in rel. 15.6.9.13
ChronoStatSvc                           26 [min]                    37.5 [min]
InDetSiSpTrackFinder:execute            174 [s]                     11.7 [min] !
TrigSteer_EF:execute                    156 [s]                     147 [s]
TrigJetRec_Cone:execute                 113 [s]                     108 [s]
CaloTopoCluster:execute                 109 [s]                     102 [s]
StreamBS:execute                        100 [s]                     94.8 [s]
InDetAmbiguitySolver:execute            72.9 [s]                    105 [s]
InDetSiSpTrackFinderPixel:execute       72.1 [s]                    118 [s]
InDetSCTMonManager:execute              65.1 [s]                    92.6 [s]
CBNT_AthenaAware:execute                62.4 [s]                    59 [s]
InDetGlobalManager:execute              59.4 [s]                    91.6 [s]
egamma:execute                          58 [s]                      64.5 [s]
Athena...                               36.8 [s]                    105 [s]
InDetAmbiguitySolverPixel:execute       34.9 [s]                    32.9 [s]
egamma_ToolSvc.emshowersofte:execute    31 [s]                      46.3 [s]
CBNTAA_Truth:execute                    30.4 [s]                    29.1 [s]
InDetExtensionProcessor:execute         30.3 [s]                    48.4 [s]
InDetPriVxFinder:execute                28.6 [s]                    18.2 [s] !
MuonRPC_CablingSvc                      26.7 [s]                    24.9 [s]