1
ATLAS DC2 Production on Grid3
M. Mambelli, University of Chicago
for the US ATLAS DC2 team
September 28, 2004, CHEP04
2
ATLAS Data Challenges Purpose
Validate the LHC computing model
Develop distributed production & analysis tools
Provide large datasets for physics working groups
Schedule
DC1 (2002-2003): full software chain
DC2 (2004): automatic grid production system
DC3 (2006): drive final deployments for startup
3
ATLAS DC2 Production
Phase I: Simulation (Jul-Sep 04)
  generation, simulation & pileup
  produced datasets stored on Tier1 centers, then CERN (Tier0)
  scale: ~10M events, 30 TB
Phase II: “Tier0 Test” @CERN (1/10 scale)
  Produce ESD, AOD (reconstruction)
  Stream to Tier1 centers
Phase III: Distributed analysis (Oct-Dec 04)
  access to event and non-event data from anywhere in the world, both in organized and chaotic ways (cf. D. Adams, #115)
4
ATLAS Production System Components
Production database
  ATLAS job definition and status
Supervisor (all Grids): Windmill (L. Goossens, #501)
  Job distribution and verification system
Data Management: Don Quijote (M. Branco, #142)
  Provides ATLAS layer above Grid replica systems
Grid Executors
  LCG: Lexor (D. Rebatto, #364)
  NorduGrid: Dulcinea (O. Smirnova, #499)
  Grid3: Capone (this talk)
5
ATLAS Global Architecture
[Diagram: the production database (prodDB at CERN) feeds a supervisor (Windmill) per grid, which communicates over Jabber or SOAP with the grid executors: Lexor (LCG), Dulcinea (NorduGrid), Capone (Grid3, this talk), and a legacy LSF executor. Data management is provided by Don Quijote (“DQ”) over the per-grid RLS catalogs; AMI holds metadata.]
6
Capone and Grid3 Requirements
Interface to Grid3 (GriPhyN VDT based)
Manage all steps in the job life cycle
  prepare, submit, monitor, output & register
Manage workload and data placement
Process messages from the Windmill Supervisor
Provide useful logging information to the user
Communicate executor and job state information to Windmill (ProdDB)
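The life-cycle requirement above can be pictured as a small state machine. The sketch below is illustrative only; the state names and transition table are invented for this example, not taken from the actual Capone code.

```python
from enum import Enum, auto

class JobState(Enum):
    """Hypothetical states covering the job life cycle Capone must manage."""
    RECEIVED = auto()       # job definition arrived from Windmill
    PREPARED = auto()       # transformation resolved, DAG built
    SUBMITTED = auto()      # handed to the grid (Condor-G)
    RUNNING = auto()        # executing on a remote CE
    OUTPUT_STAGED = auto()  # outputs transferred to the destination SE
    REGISTERED = auto()     # LFN/PFN recorded in RLS
    FINISHED = auto()
    FAILED = auto()

# Allowed forward transitions (illustrative, not Capone's actual tables)
TRANSITIONS = {
    JobState.RECEIVED: {JobState.PREPARED, JobState.FAILED},
    JobState.PREPARED: {JobState.SUBMITTED, JobState.FAILED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.OUTPUT_STAGED, JobState.FAILED},
    JobState.OUTPUT_STAGED: {JobState.REGISTERED, JobState.FAILED},
    JobState.REGISTERED: {JobState.FINISHED, JobState.FAILED},
}

def advance(state, new_state):
    """Reject transitions the life cycle does not allow."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Tracking state this way is also what lets the executor report job status back to Windmill at each step.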
7
Capone Execution Environment
GCE Server side
  ATLAS releases and transformations
    Pacman installation, done dynamically by grid-based jobs
  Execution sandbox
    Chimera kickstart executable
    Transformation wrapper scripts
  MDS info providers (required site-specific attributes)
GCE Client side (web service)
  Capone
  Chimera/Pegasus, Condor-G (from VDT)
  Globus, RLS and DQ clients
“GCE” = Grid Component Environment
8
Capone Architecture
Message interface
  Web Service
  Jabber
Translation layer
  Windmill schema
CPE (Process Engine)
Processes
  Grid3: GCE interface
  Stub: local shell testing
  DonQuijote (future)
[Diagram: message protocols (Web Service, Jabber) from Windmill and ADA feed the Translation layer and the CPE, which drives the Grid, Stub and DonQuijote process modules.]
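The layering above can be sketched as a simple dispatch chain: a supervisor message is translated from the Windmill schema into an internal request, then routed by the process engine to a backend. All class and field names here are hypothetical stand-ins, not the real Capone classes.

```python
def translate(windmill_msg: dict) -> dict:
    """Map a supervisor message (Windmill schema) to an internal request.
    The field names are invented for illustration."""
    return {"verb": windmill_msg["verb"].lower(),
            "jobs": windmill_msg.get("jobs", [])}

class StubProcess:
    """Local shell-testing backend: pretends every job succeeds."""
    def handle(self, request):
        return {"verb": request["verb"], "status": "ok",
                "n": len(request["jobs"])}

class CPE:
    """Process engine: routes translated requests to a registered backend
    (in Capone's terms: the GCE grid interface, the Stub, or DonQuijote)."""
    def __init__(self, backend):
        self.backend = backend

    def dispatch(self, windmill_msg):
        return self.backend.handle(translate(windmill_msg))
```

Swapping the Stub backend for the grid interface leaves the message path unchanged, which is the point of the layered design.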
9
Capone System Elements
GriPhyN Virtual Data System (VDS)
Transformation
  A workflow accepting input data (datasets) and parameters, and producing output data (datasets)
  Simple (executable) / Complex (DAG)
Derivation
  Transformation where the formal parameters have been bound to actual parameters
Directed Acyclic Graph (DAG)
  Abstract DAG (DAX) created by Chimera, with no reference to concrete elements in the Grid
  Concrete DAG (cDAG) created by Pegasus, where CE, SE and PFN have been assigned
Globus, RLS, Condor
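The VDS vocabulary above can be made concrete with a toy example (plain dictionaries, not the real Chimera/Pegasus APIs): a transformation with formal parameters, a derivation that binds them, and a planning step that pins a site and maps logical file names to physical ones.

```python
# A transformation: a workflow with formal (unbound) parameters.
transformation = {"name": "atlas.simul", "formal_params": ["events", "input_lfn"]}

def derive(transformation, bindings):
    """A derivation: the transformation with actual parameters bound."""
    missing = set(transformation["formal_params"]) - set(bindings)
    if missing:
        raise ValueError(f"unbound parameters: {missing}")
    return {"transformation": transformation["name"], "params": bindings}

def plan(derivation, rls, site):
    """Pegasus-style concretization: resolve the input LFN to a PFN via a
    catalog (standing in for RLS) and assign a site (CE/SE)."""
    lfn = derivation["params"]["input_lfn"]
    return {**derivation, "site": site, "input_pfn": rls[lfn]}
```

The abstract derivation is site-independent, which is what lets the same DAX be planned onto any Grid3 site; the catalog entries and site name below are invented for the example.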
10
Capone Grid Interactions
[Diagram: Capone, driven by Windmill (ProdDB), uses Chimera and Pegasus (with the VDC) to plan jobs, Condor-G (schedd, GridManager) to submit them to a CE gatekeeper and its worker nodes, gsiftp to move data to the SE, RLS and DonQuijote for data management, and MDS/GridCat/MonALISA for monitoring.]
11
A job in Capone (1, submission)
Reception
  Job received from Windmill
Translation
  Un-marshalling, ATLAS transformation
DAX generation
  Chimera generates the abstract DAG
Input file retrieval from RLS catalog
  Check RLS for input LFNs (retrieval of GUID, PFN)
Scheduling: CE and SE are chosen
Concrete DAG generation and submission
  Pegasus creates Condor submit files
  DAGMan invoked to manage remote steps
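The submission steps above form a linear pipeline in which any stage can fail and abort the job. A minimal sketch, with hypothetical stand-ins for two of the stages (the real steps also include translation, DAX generation and concrete-DAG submission):

```python
def submission_pipeline(job, stages):
    """Run the submission stages in order; stop at the first failure."""
    for stage in stages:
        ok, job = stage(job)
        if not ok:
            return "failed", job
    return "submitted", job

def lookup_inputs(job):
    """Stand-in for the RLS lookup: resolve every input LFN to a PFN."""
    pfns = [job["rls"].get(lfn) for lfn in job["input_lfns"]]
    if None in pfns:  # an input LFN missing from the catalog aborts the job
        return False, job
    return True, {**job, "input_pfns": pfns}

def schedule(job):
    """Stand-in for CE/SE selection (trivially pinned here)."""
    return True, {**job, "ce": "UC_ATLAS_Tier2"}
```

One virtue of the early-exit shape is visible in the DC2 statistics: a job whose inputs are not in RLS never reaches the grid at all.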
12
A job in Capone (2, execution)
Remote job running / status checking
  Stage-in of input files, creation of the POOL FileCatalog
  Athena (ATLAS code) execution
Remote Execution Check
  Verification of output files and exit codes
  Recovery of metadata (GUID, MD5sum, exe attributes)
Stage Out: transfer from CE site to destination SE
Output registration
  Registration of the output LFN/PFN and metadata in RLS
Finish
  Job completed successfully; Capone tells Windmill the job is ready for validation
Job status is sent to Windmill throughout execution
Windmill/DQ validate & register output in ProdDB
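The "Remote Execution Check" step can be sketched as follows: verify the exit code, verify that each expected output exists, and recover an MD5 checksum for later registration. This is a simplified illustration; the real check also captures GUIDs and executable attributes.

```python
import hashlib
import os

def check_outputs(exit_code, output_files):
    """Post-job check: return recovered metadata on success, None on failure.
    Sketch only; names and return shape are invented for this example."""
    if exit_code != 0:          # Athena (or a wrapper step) failed outright
        return None
    metadata = {}
    for path in output_files:
        if not os.path.exists(path):   # a declared output was never produced
            return None
        with open(path, "rb") as f:    # checksum recovered for RLS metadata
            metadata[path] = hashlib.md5(f.read()).hexdigest()
    return metadata
```

Only after this check succeeds does the job proceed to stage-out and RLS registration, so a silent failure on the worker node cannot register bogus outputs.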
13
Performance Summary (9/20/04)
Several physics and calibration samples produced
56K job attempts at the Windmill level
  9K of these aborted before grid submission: mostly RLS down or selected CE down
“Full” success rate: 66%
Average success rate after submission: 70%
  Includes subsequent problems at the submit host
  Includes errors from development
60 CPU-years consumed since July; 8 TB produced

Job status   Capone total
failed             18812
finished           37371
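The quoted "full" success rate follows directly from the job counts in the table:

```python
failed, finished = 18812, 37371
attempts = failed + finished           # the ~56K job attempts quoted above
full_success = 100 * finished / attempts
# full_success is about 66.5%, i.e. the "full" success rate of 66%
```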
14
ATLAS DC2 CPU usage (G. Poulard, 9/21/04)
  LCG: 41%, Grid3: 30%, NorduGrid: 29%
Total ATLAS DC2: ~1470 kSI2k.months, ~100000 jobs, ~7.94 million events, ~30 TB
15
Ramp up ATLAS DC2
[Plot: CPU-days per day, ramping up from mid-July to Sep 10]
16
Job Distribution on Grid3 (J. Shank, 9/21/04)
UTA_dpcc 17%, BNL_ATLAS 17%, UC_ATLAS_Tier2 14%, BU_ATLAS_Tier2 13%, IU_ATLAS_Tier2 10%, UCSanDiego_PG 5%, UM_ATLAS 4%, UBuffalo_CCR 4%, PDSF 4%, FNAL_CMS 4%, CalTech_PG 4%, Others 4%
17
Site Statistics (9/20/04)

 #  CE Gatekeeper         Total Jobs  Finished  Failed  Success Rate (%)
 1  UTA_dpcc                    8817      6703    2114             76.02
 2  UC_ATLAS_Tier2              6132      4980    1152             81.21
 3  BU_ATLAS_Tier2              6336      4890    1446             77.18
 4  IU_ATLAS_Tier2              4836      3625    1211             74.96
 5  BNL_ATLAS_BAK               4579      3591     988             78.42
 6  BNL_ATLAS                   3116      2548     568             81.77
 7  UM_ATLAS                    3583      1998    1585             55.76
 8  UCSanDiego_PG               2097      1712     385             81.64
 9  UBuffalo_CCR                1925      1594     331             82.81
10  FNAL_CMS                    2649      1456    1193             54.96
11  PDSF                        2328      1430     898             61.43
12  CalTech_PG                  1834      1350     484             73.61
13  SMU_Physics_Cluster          660       438     222             66.36
14  Rice_Grid3                   493       363     130             73.63
15  UWMadison                    516       258     258             50.00
16  FNAL_CMS2                    343       228     115             66.47
17  UFlorida_PG                  394       182     212             46.19

Average success rate by site: 70%
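The ~70% figure is the simple average of the per-site rates in the table (each site weighted equally, not by job count, which is why it differs from the 66% overall rate):

```python
# Per-site success rates copied from the table above.
rates = [76.02, 81.21, 77.18, 74.96, 78.42, 81.77, 55.76, 81.64, 82.81,
         54.96, 61.43, 73.61, 66.36, 73.63, 50.00, 66.47, 46.19]
avg = sum(rates) / len(rates)   # about 69.6, quoted as ~70%
```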
18
Capone & Grid3 Failure Statistics (9/20/04)
Total jobs (validated): 37713
Jobs failed: 19303
  Submission: 472
  Execution: 392
  Post-job check: 1147
  Stage out: 8037
  RLS registration: 989
  Capone host interruptions: 2725
  Capone succeeded, Windmill failed: 57
  Other: 5139
19
Production lessons
Single points of failure
  Production database
  RLS, DQ, VDC and Jabber servers
  One local network domain
  Distributed RLS
System expertise (people)
  Fragmented production software
  Fragmented operations (defining/fixing jobs in the production database)
Client (Capone submit) hosts
  Load and memory requirements for job management
  Load caused by job state checking (interaction with Condor-G)
  Many processes
  No client host persistency
    Need local database for job recovery: next phase of development
DOEGrids certificate or certificate revocation list expiration
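The missing client-host persistency could be addressed along the lines the slide suggests: a local database of job state, so a restarted submit host can re-attach to in-flight jobs instead of losing them. A minimal sketch using SQLite (schema and function names invented for illustration):

```python
import sqlite3

def open_job_store(path=":memory:"):
    """Open (or create) the local job-state store on the submit host."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
                    job_id TEXT PRIMARY KEY,
                    state  TEXT NOT NULL)""")
    return db

def record_state(db, job_id, state):
    """Persist each state change as it happens, not only at job end."""
    db.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?)", (job_id, state))
    db.commit()

def recover_in_flight(db):
    """After a crash: jobs to re-attach to (anything not finished/failed)."""
    cur = db.execute(
        "SELECT job_id FROM jobs WHERE state NOT IN ('finished', 'failed')")
    return [row[0] for row in cur]
```

With Capone host interruptions accounting for 2725 of the failures above, this kind of recovery was a natural candidate for the next phase of development.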
20
Production lessons (II)
Site infrastructure problems
  Hardware problems
  Software distribution, transformation upgrades
  File systems (NFS the major culprit); various solutions by site administrators
Errors in stage-out caused by poor network connections and gatekeeper load
  Fixed by adding I/O throttling and checking the number of TCP connections
Lack of storage management (e.g. SRM) on sites means submitters do some cleanup remotely
  Not a major problem so far, but we’ve not had much competition
Load on gatekeepers
  Improved by moving md5sum off the gatekeeper
Post-job processing
  Remote execution (mostly in pre/post job) is error prone
  Reason for a failure is difficult to understand
  No automated tools for validation
21
Operations Lessons
Grid3 iGOC and the US Tier1 developed an operations response model
Tier1 center
  core services
  “on-call” person always available
  response protocol developed
iGOC
  Coordinates problem resolution for Tier1 “off hours”
  Trouble handling for non-ATLAS Grid3 sites; problems resolved at weekly iVDGL operations meetings
Shift schedule (8-midnight since July 23)
  7 trained DC2 submitters
  Keeps queues saturated, reports site and system problems, cleans working directories
Extensive use of email lists
  Partial use of alternatives like Web portals, IM
22
Conclusions
Completely new system
Grid3 simplicity requires more functionality and state management on the executor submit host
  All functions of job planning, job state tracking, and data management (stage-in/out) are managed by Capone rather than by grid-system clients
  Exposed to all manner of grid failures: good for experience, but a client-heavy system
Major areas for upgrade to the Capone system
  Job state management and controls, state persistency
  Generic transformation handling for user-level production
23
Authors
GIERALTOWSKI, Gerald (Argonne National Laboratory)
MAY, Edward (Argonne National Laboratory)
VANIACHINE, Alexandre (Argonne National Laboratory)
SHANK, Jim (Boston University)
YOUSSEF, Saul (Boston University)
BAKER, Richard (Brookhaven National Laboratory)
DENG, Wensheng (Brookhaven National Laboratory)
NEVSKI, Pavel (Brookhaven National Laboratory)
MAMBELLI, Marco (University of Chicago)
GARDNER, Robert (University of Chicago)
SMIRNOV, Yuri (University of Chicago)
ZHAO, Xin (University of Chicago)
LUEHRING, Frederick (Indiana University)
SEVERINI, Horst (Oklahoma University)
DE, Kaushik (University of Texas at Arlington)
MCGUIGAN, Patrick (University of Texas at Arlington)
OZTURK, Nurcan (University of Texas at Arlington)
SOSEBEE, Mark (University of Texas at Arlington)
24
Acknowledgements
Windmill team (Kaushik De)
Don Quijote team (Miguel Branco)
ATLAS production group, Luc Goossens, CERN IT (prodDB)
ATLAS software distribution team (Alessandro de Salvo, Fred Luehring)
US ATLAS testbed sites and Grid3 site administrators
iGOC operations group
ATLAS Database group (ProdDB Capone-view displays)
Physics Validation group: UC Berkeley, Brookhaven Lab
More info
  US ATLAS Grid: http://www.usatlas.bnl.gov/computing/grid/
  DC2 shift procedures: http://grid.uchicago.edu/dc2shift
  US ATLAS Grid Tools & Services: http://grid.uchicago.edu/gts/