Heavy Ion Group


Page 1: Heavy Ion Group

Andrzej Olszewski, September 13, 2010

Heavy Ion Group

Software & Computing: Data Distribution

Page 2: Heavy Ion Group

Heavy Ion Collisions

• Large spread in event multiplicity from the change of impact parameter
• Collision geometry mostly produces peripheral (low-multiplicity) events (illustrated in the sketch below)
• But the highest multiplicities may reach 5x the average
• A central event comes randomly every ~30 collisions
• In production of detector simulations this is aggravated by event ordering, which places the most central events together at the end of production jobs

[Slide figures: Glauber collision geometry – central collision and peripheral collision; HIJING]
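The ~1-in-30 frequency of central events follows from simple geometry. A minimal sketch (illustration only; b_max and the centrality cut below are arbitrary choices, not numbers from the slide):

```python
# Illustration only: for a purely geometric cross section dP/db ~ b, so the
# probability of an impact parameter below b0 is (b0/b_max)^2.  A "central"
# class covering the smallest ~18% of b then holds ~3% of all events,
# i.e. roughly one event in 30, as quoted above.
import random

B_MAX = 1.0                  # impact parameter in units of b_max (assumption)
B_CENTRAL = 0.18 * B_MAX     # hypothetical cut defining "central" events
N_EVENTS = 100_000

def sample_b():
    """Sample b with dP/db proportional to b (i.e. uniform in b^2)."""
    return B_MAX * random.random() ** 0.5

n_central = sum(sample_b() < B_CENTRAL for _ in range(N_EVENTS))
print(f"central fraction: {n_central / N_EVENTS:.3f} "
      f"(geometric expectation {(B_CENTRAL / B_MAX) ** 2:.3f} ~ 1/30)")
```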

Page 3: Heavy Ion Group

Distinct Features

• Collision geometry parameters
– Heavy Ion physics results are often presented as a function of centrality
– Centrality can be obtained from calculations based on the Glauber model, but it is better to use results from Glauber-based Monte Carlo models (see the sketch after this list)
– Parameters need to be preserved through the chain of simulation production

• Special features of collisions of heavy ions, bulk matter objects
– A different physics program requires specific reconstruction algorithms
– Different collision properties require modifications or replacement of some of the standard ATLAS reconstruction algorithms

• High particle multiplicities
– Heavy Ion collisions produce much higher particle multiplicities than pp
– This leads to longer reconstruction times and critically high memory usage
– Large event size requires more disk space per event than pp data
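As a hedged illustration of what a Glauber-based Monte Carlo does (this is not the ATLAS/HIJING production code; the Pb radius, diffuseness and NN cross section below are typical values assumed for the sketch):

```python
# Minimal Glauber Monte Carlo sketch: sample nucleon positions from a
# Woods-Saxon density, count participants for a random impact parameter,
# and use N_part as a simple centrality estimator.
import math
import random

A = 208             # Pb
R = 6.62            # Woods-Saxon radius [fm] (assumed typical value)
a = 0.546           # surface diffuseness [fm] (assumed typical value)
SIGMA_NN = 64.0     # inelastic NN cross section [mb], roughly right for 2.76 TeV
D2_MAX = SIGMA_NN / 10.0 / math.pi   # mb -> fm^2; nucleons interact if d^2 < D2_MAX

def sample_nucleus():
    """Sample A nucleon (x, y) positions from a Woods-Saxon profile (accept/reject)."""
    nucleons = []
    while len(nucleons) < A:
        r = 2.0 * R * random.random()
        if random.random() < r * r / (1.0 + math.exp((r - R) / a)) / (4.0 * R * R):
            phi = 2.0 * math.pi * random.random()
            cos_t = 2.0 * random.random() - 1.0
            sin_t = math.sqrt(1.0 - cos_t * cos_t)
            nucleons.append((r * sin_t * math.cos(phi), r * sin_t * math.sin(phi)))
    return nucleons

def n_part(b):
    """Number of participating nucleons for impact parameter b [fm]."""
    proj = [(x + b / 2.0, y) for x, y in sample_nucleus()]
    targ = [(x - b / 2.0, y) for x, y in sample_nucleus()]
    def wounded(p, other):
        return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 < D2_MAX for q in other)
    return sum(wounded(p, targ) for p in proj) + sum(wounded(t, proj) for t in targ)

# minimum-bias impact parameter: dP/db ~ b up to an (assumed) 20 fm cut-off
b = 20.0 * math.sqrt(random.random())
print(f"b = {b:.1f} fm  ->  N_part = {n_part(b)}")
```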


Page 4: Heavy Ion Group

Event Generators

• Models Hijing and Hydjet used
• Using code from official Genser distribution
• Athena interface implemented for both models

• Problems?
– Hijing code had a problem on SLC5
– Fixed in June and partially validated (high-statistics validation soon)


Page 5: Heavy Ion Group

Simulation & Digitization

• Using standard Athena tools
• Collision parameters transferred when using HeavyIonSimConfig.py

• Problems?
– The long detector-simulation time for a single central event (up to 24 hours) is still acceptable in production on the Grid at this collision energy
– Rate of permanent simulation failures: ~0.1%


Page 6: Heavy Ion Group

Reconstruction

• Using standard Athena tools
• Heavy Ion specific modifications activated when using HeavyIonRecConfig.py
– Collision parameters transferred
– Trigger algorithms selected by HI menus
– Heavy Ion specific algorithms from HeavyIonRec used
– Modifications in standard reconstruction algorithms activated

• Problems?
– No production problems in rel. 15


Page 7: Heavy Ion Group

HI Algorithms

• HIGlobal: Global variables reconstruction
– HICentrality: Event centrality
– HIGlobalEt: Total Et
– HIFlow: charged particle elliptic flow v2 (see the sketch after this list)
– HIGlobalNSiCluster: dNch/dη based on pixel cluster density
– HIPixelTracklets: dNch/dη based on 2-point tracklets in the Pixel detector

• HIJetRec: Jet reconstruction
– extends standard JetRec + new background subtraction and fake jet rejection
• HIPhoton: Direct photon pre-analysis
– based on pp photon algorithms, produces special ntuple for final analysis
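For orientation, a minimal sketch of an event-plane v2 estimate of the kind HIFlow computes; the actual HIFlow algorithm, its particle selection and its resolution corrections live in the HeavyIonRec packages and are not reproduced here:

```python
# Raw event-plane elliptic flow estimate (illustration only, no calibrations,
# no auto-correlation removal, no event-plane resolution correction).
import math

def event_plane_angle(phis, weights=None):
    """Second-order event-plane angle Psi_2 from particle azimuthal angles."""
    weights = weights or [1.0] * len(phis)
    qx = sum(w * math.cos(2 * phi) for w, phi in zip(weights, phis))
    qy = sum(w * math.sin(2 * phi) for w, phi in zip(weights, phis))
    return 0.5 * math.atan2(qy, qx)

def v2_raw(phis):
    """Raw v2 = <cos 2(phi - Psi_2)> for one event."""
    psi2 = event_plane_angle(phis)
    return sum(math.cos(2 * (phi - psi2)) for phi in phis) / len(phis)

# toy usage with a handful of azimuthal angles in radians
print(v2_raw([0.1, 0.3, 1.2, 2.9, 3.3, 4.4, 5.8, 6.0]))
```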


Page 8: Heavy Ion Group

pp Algorithms

• Trigger processing using dedicated HI menus
– several trigger menus developed by Tomasz and Iwona Grabowska-Bołd

• Tracking running in newTracking mode (summarized in the sketch after this list)
– newTracking with modified cuts activated by the doHeavyIon switch, to lower CPU and memory requirements
– no lowPt, no BackTracking, no conversions

• Vertexing run in simple "DefaultFastFinding" mode
– no V0 finder, no secondary vertices

• Regular jets off – Heavy Ion version of jet reconstruction run instead
– no ETMiss, no ETFlow, no BTagging, no TauRec

• Calorimeter reconstruction based on cells
– CaloTopoClusters activated for monitoring purposes

• Muon reconstruction on
– no MuidLowPt
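A plain-Python summary of the configuration described above, for readability only; apart from doHeavyIon, which is named on the slide, the keys below are illustrative placeholders and not real Athena job-option flag names:

```python
# Condensed view of the heavy-ion settings for the standard pp algorithms
# (hypothetical structure; the real settings are applied via HeavyIonRecConfig.py).
HEAVY_ION_PP_ALGO_SETTINGS = {
    "doHeavyIon": True,          # named on the slide: modified newTracking cuts
    "tracking": {"newTracking": True, "lowPt": False,
                 "backTracking": False, "conversions": False},
    "vertexing": {"mode": "DefaultFastFinding", "V0Finder": False,
                  "secondaryVertices": False},
    "jets": {"standardJetRec": False, "heavyIonJetRec": True,
             "ETMiss": False, "ETFlow": False, "BTagging": False, "TauRec": False},
    "calorimeter": {"cellBased": True, "CaloTopoClusters": "monitoring only"},
    "muons": {"enabled": True, "MuidLowPt": False},
}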


Page 9: Heavy Ion Group

Simulation Production

• Official production at energy ECM = 2.75 TeV done in recent campaign with releases 15.6.X.Y

• Description of mc09 samples
– Hijing, Hydjet minimum bias and central samples
– Hijing with particle flow
– 5-10k events

• Additional Hijing requests for increased statistics and more physics samples accepted and running now (actually waiting in the queue with low priority)

• Hijing tasks
• Hydjet tasks


Page 10: Heavy Ion Group

Real Data Production Planning

• Total amount and rate of data taking – fit data to available storage and computing resources

• Reconstruction properties and requirements, data types and sizes – required cpu time and disk space for storage of reconstruction results

• Tier0, Tier1 and Group Grid resources available – input for production and data distribution strategy

• Software development and installation procedures– deadlines, possible scenarios for running tests and production

• Production strategy – which resources will be used in which step

• Analysis Model – where the data should be distributed


Page 11: Heavy Ion Group


CPU/Mem in rel. 15/16 Reconstruction

• Rel. 15.6.9.13 has acceptable CPU and memory consumption, with 100% reconstruction job success
• Rel. 16.0.0 reconstruction on simulations (only) exceeds the available ~4 GB memory limit in 55% of jobs
• Reason 1: increased memory consumption between releases due to a test run with tracking from min pT = 0.5 GeV, leading to differences from 50 MB (at lower multiplicity) up to 700 MB in the most central events!
• Reason 2: increased memory consumption by monitoring algorithms, adding 200 MB more at high multiplicity!
• To reduce memory usage we may look for a compromise in the tracking min pT and reduce the monitoring requirements, or run reconstruction on simulations without monitoring altogether (see the estimate below).
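A back-of-envelope check of the memory argument above, using only the numbers quoted on this slide and in the table (a sketch, not a new measurement):

```python
# Rel. 15.6.9.13 peaks near 3.5 GB; adding up to ~0.7 GB from tracking down to
# min pT = 0.5 GeV and ~0.2 GB from monitoring pushes the most central events
# past the ~4 GB limit, consistent with 55% of rel. 16.0.0 jobs failing.
BASE_PEAK_GB = 3.5          # rel. 15.6.9.13 maximum memory (table below)
TRACKING_EXTRA_GB = 0.7     # worst case, most central events
MONITORING_EXTRA_GB = 0.2   # extra from monitoring algorithms
LIMIT_GB = 4.0              # available memory per job

worst_case = BASE_PEAK_GB + TRACKING_EXTRA_GB + MONITORING_EXTRA_GB
print(f"worst-case peak ~ {worst_case:.1f} GB vs {LIMIT_GB} GB limit -> "
      f"{'exceeded' if worst_case > LIMIT_GB else 'OK'}")
```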

Data / Task | Release | CPU (RDO2ESD) [s]: init / avg per event / finalize | Wall avg per event [s] | Mem max [GB] | File size [MB/event]: ESD / CBNT / HIST
simulation-like (reconstruction + monitoring) | 15.6.9.13 | 136 / 53 / 3.5 | 68 | 3.4 | 7.8 / 8.7 / 1.2
simulation-like, no truth | 15.6.9.13 | 133 / 50 / 3.4 | 59 | 3.1 | 5.6 / 8.4 / 1.2
data-like, no trigger, no truth | 15.6.9.13 | 109 / 44 / 2.7 | 51 | 2.5 | 5.5 / 8.4 / 1.1
simulation-like (reconstruction + monitoring) | 16.0.0 | averages could not be taken because 55/100 jobs finished with a memory allocation error
data-like, no trigger, no truth | 16.0.0 | 55 / 73 / 3.8 | 81 | 3.0 | 6.8 / - / 0.9

Page 12: Heavy Ion Group


Data Reconstruction

• Reconstruction strategy at Tier0
• Performance of Heavy Ion reconstruction with monitoring in rel. 15.6.9.13
– 45 CPU s/event (no trigger, no truth), assuming <2 min/event Panda wall time
• Tier0 capacity
– current total CPU capacity of the Tier0 farm (2500 cores!)
– efficiency of CPU use at Tier0 (~100%!)
– no additional CPU needed for other processing (file merging, etc.) at Tier0
• Calculated throughput: 30 PbPb events/hour/CPU core → 1,800k events/day = 20.8 Hz (see the estimate below)
• Expected rate of PbPb event data taking is >60 Hz, so additional resources are needed
• Separate data by streaming at DAQ
– Express stream with duplicated events used for prompt calibration at Tier0
o Construct just one physics stream and reconstruct it promptly at Tier1 sites
o Construct two exclusive physics streams: physics.prompt and physics.bulk
   Prompt stream to be reconstructed promptly at Tier0
   Bulk stream reconstructed at Tier1s with possibly updated software
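The throughput quoted above can be checked in a few lines, assuming the ~2 min/event Panda wall time and the 2500 Tier0 cores stated on the slide, with the whole farm dedicated to PbPb reconstruction:

```python
# Tier0 throughput estimate (sketch of the numbers on the slide).
WALL_PER_EVENT_S = 120.0    # ~2 min/event Panda wall time
CORES = 2500                # Tier0 farm capacity

events_per_hour_per_core = 3600.0 / WALL_PER_EVENT_S        # 30 events/hour/core
events_per_day = events_per_hour_per_core * CORES * 24.0    # 1.8M events/day
sustained_rate_hz = events_per_day / 86400.0                # ~20.8 Hz

print(f"{events_per_hour_per_core:.0f} events/hour/core, "
      f"{events_per_day / 1e6:.1f}M events/day, {sustained_rate_hz:.1f} Hz")
# An expected data-taking rate above 60 Hz is roughly 3x this capacity, hence
# the extra resources and the streaming options listed above.
```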

Page 13: Heavy Ion Group


Data Distribution

• Resources available
– General ATLAS disk resources on the Grid
– Group disk resource for the PHYS-HI group
   sites: BNL, CERN, CYFRONET, WEIZMANN
   Total promised: 115 TB, used: 15 TB (so far)
• Data formats and sizes (see the storage estimate after this list)
– RAW: 2.2 MB/event
– ESD: 5.5 MB/event (and possible dESD with reduced size)
– D3PD: 4.0 MB/event (measured with truth; in development, so the size will change)
• Distribution strategy
– General ATLAS disk will be used to store RAW and ESD files from official production
– Some RAW and ESD may be partially replicated to PHYS-HI group space for group tests
– D3PDs are expected to be stored on PHYS-HI resources
– Number of versions/copies will have to be adjusted to the available resources
• Problems?
– New CYFRONET storage hardware installation delays
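For scale, a small sketch turning the per-event sizes above into total disk volumes; the 10 million events assumed here is a hypothetical sample size, not a number from the slide, and the D3PD size is still evolving:

```python
# Rough storage estimate from the per-event sizes quoted above.
RAW_MB, ESD_MB, D3PD_MB = 2.2, 5.5, 4.0   # MB/event, from the slide
N_EVENTS = 10_000_000                      # assumed PbPb sample size

def total_tb(mb_per_event, n_events, copies=1):
    """Total volume in TB for n_events with the given number of replicas."""
    return mb_per_event * n_events * copies / 1.0e6

print(f"RAW : {total_tb(RAW_MB, N_EVENTS):5.0f} TB")
print(f"ESD : {total_tb(ESD_MB, N_EVENTS):5.0f} TB")
print(f"D3PD: {total_tb(D3PD_MB, N_EVENTS):5.0f} TB "
      f"(vs ~115 TB promised PHYS-HI group space)")
```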

Page 14: Heavy Ion Group


Software Readiness

• Heavy Ion reconstruction code
– Mostly ready, expect minor changes
– D3PD making in full development
• Validation available and running
– HIValidation package running tests and comparing to a reference
– HIInDetValidation package testing HI-mode tracking
– Tests on MC and data in RecJobTransforms
– We need more tests in the Tier0 configuration
• Production scripts
– Mostly ready; solutions using the Reco_trf.py command
– HeavyIonD3PDMaker ready for an interface in Reco_trf.py (David Cote)
– Tested in private production with standard monitoring and some specials for trigger and tracking validation
• Athena release schedule
– End of October: Tier0 changes releases (from the reprocessing schedule)
– We need to have all updates in by that time

Page 15: Heavy Ion Group

Issues & Plans

• PHYS-HI group has no access to CAF area
– we may want to run calibration (tests) outside Tier0
– we may want to run reconstruction tests on recent raw data
– we shall ask for access and disk quota

• Group afs space at CERN?
– for doing common development
– for keeping group software release?

• Group releases and official group production?
– probably not planned this year

• Panda group queues at sites with group space
– I would like to request such queues, with higher priorities for phys-hi group production, at CYFRONET and WEIZMANN
– Could be used for increased efficiency in group production of D3PDs with the latest versions of software, and for (group/user) analysis jobs


Page 16: Heavy Ion Group


BACKUP

Page 17: Heavy Ion Group


CPU/Mem in rel. 16 reconstruction

• Rel. 16.0.0 with tracking from min pT= 0.5 GeV increased memory consumption by 500 MB!

• Trying tracking with min pT = 1.0 GeV has still not reduced memory use enough to avoid some jobs crashing

Data / Task | Task | Release | CPU (RDO2ESD) [s]: init / avg per event / finalize | Wall avg per event [s] | Mem max [GB] | File size [MB/event]: ESD / CBNT / HIST
Hijing 2.75 TeV + fixed jet monitoring | 167 | 15.6.9.13 | 136 / 53 / 3.5 | 68 | 3.4 | 7.8 / 8.7 / 1.2
Hijing 2.75 TeV, no truth | 176 | 15.6.9.13 | 133 / 50 / 3.4 | 59 | 3.1 | 5.6 / 8.4 / 1.2
Hijing 2.75 TeV, no trigger, no truth | 181 | 15.6.9.13 | 109 / 44 / 2.7 | 51 | 2.5 | 5.5 / 8.4 / 1.1
Hijing 2.75 TeV, full reconstruction | 268 | 16.0.0 | averages could not be taken because 55/100 jobs finished with a memory allocation error
Hijing 2.75 TeV, min pT = 1 GeV | 268 | 16.0.0 | averages could not be taken because 6/100 jobs finished with a memory allocation error
Hijing 2.75 TeV, no trigger | 278 | 16.0.0 | 56 / 84 / 2.1 | 96 | 3.4 | 9.1 / - / 0.9
Hijing 2.75 TeV, no trigger, no truth | 290 | 16.0.0 | 55 / 73 / 3.8 | 81 | 3.0 | 6.8 / - / 0.9

Page 18: Heavy Ion Group


CPU/Mem in rel. 16 reconstruction

Configurations compared:
– nohialgs: only standard algorithms, but in heavy ion mode
– notrigger: standard + heavy ion algorithms
– nomonitor: standard + heavy ion + trigger algorithms
– full: all algorithms, with monitoring

16.0.0 RDO2ESD, high mid-central (seq 412)
– nohialgs: Si2k: 3900.0, <cpu>: 4257.250 ms, <vmem>: 2635.484 MB
– notrigger: Si2k: 3900.0, <cpu>: 6863.750 ms, <vmem>: 2863.396 MB
– nomonitor: Si2k: 3900.0, <cpu>: 7454.250 ms, <vmem>: 3398.312 MB
– full: Si2k: 3900.0, <cpu>: 8062.750 ms, <vmem>: 3981.224 MB

15.6.12.1 RDO2ESD, high mid-central (seq 412)
– nohialgs: Si2k: 3100.0, <cpu>: 2491.250 ms, <vmem>: 1887.424 MB
– notrigger: Si2k: 3100.0, <cpu>: 5223.000 ms, <vmem>: 2219.140 MB
– nomonitor: Si2k: 3100.0, <cpu>: 7099.500 ms, <vmem>: 2825.308 MB
– full: Si2k: 3100.0, <cpu>: 7315.000 ms, <vmem>: 3130.396 MB

16.0.0 RDO2ESD, lower mid-central (seq 411)
– nohialgs: Si2k: 2200.0, <cpu>: 3198.250 ms, <vmem>: 1379.540 MB
– notrigger: Si2k: 1900.0, <cpu>: 6653.000 ms, <vmem>: 1652.096 MB
– nomonitor: Si2k: 2200.0, <cpu>: 7208.500 ms, <vmem>: 2078.940 MB
– full: Si2k: 1900.0, <cpu>: 8334.250 ms, <vmem>: 2261.072 MB

15.6.9.13 RDO2ESD, lower mid-central (seq 411)
– nohialgs: Si2k: 1900.0, <cpu>: 2525.500 ms, <vmem>: 1319.764 MB
– notrigger: Si2k: 2200.0, <cpu>: 5721.500 ms, <vmem>: 1595.164 MB
– nomonitor: Si2k: 1900.0, <cpu>: 6628.750 ms, <vmem>: 2068.704 MB
– full: Si2k: 2200.0, <cpu>: 6982.250 ms, <vmem>: 2281.704 MB

Page 19: Heavy Ion Group


Algorithm Walltime Comparison

Algorithm | Wall without tracking fix (rel. 15.6.9.13) | Wall with tracking fix (rel. 15.6.9.13)
ChronoStatSvc | 26 min | 37.5 min
InDetSiSpTrackFinder:execute | 174 s | 11.7 min (!)
TrigSteer_EF:execute | 156 s | 147 s
TrigJetRec_Cone:execute | 113 s | 108 s
CaloTopoCluster:execute | 109 s | 102 s
StreamBS:execute | 100 s | 94.8 s
InDetAmbiguitySolver:execute | 72.9 s | 105 s
InDetSiSpTrackFinderPixel:execute | 72.1 s | 118 s
InDetSCTMonManager:execute | 65.1 s | 92.6 s
CBNT_AthenaAware:execute | 62.4 s | 59 s
InDetGlobalManager:execute | 59.4 s | 91.6 s
egamma:execute | 58 s | 64.5 s
Athena... | 36.8 s | 105 s
InDetAmbiguitySolverPixel:execute | 34.9 s | 32.9 s
egamma_ToolSvc.emshowersofte:execute | 31 s | 46.3 s
CBNTAA_Truth:execute | 30.4 s | 29.1 s
InDetExtensionProcessor:execute | 30.3 s | 48.4 s
InDetPriVxFinder:execute | 28.6 s | 18.2 s (!)
MuonRPC_CablingSvc | 26.7 s | 24.9 s